      Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

      Preprint

          Abstract

          Gradient-based temporal difference (GTD) algorithms are widely used in off-policy learning scenarios. Among them, the two time-scale TD with gradient correction (TDC) algorithm has been shown to have superior performance. In contrast to previous studies that characterized the non-asymptotic convergence rate of TDC only under independent and identically distributed (i.i.d.) data samples, we provide the first non-asymptotic convergence analysis for two time-scale TDC under a non-i.i.d. Markovian sample path and linear function approximation. We show that two time-scale TDC can converge as fast as O(log t / t^{2/3}) under diminishing stepsize, and can converge exponentially fast under constant stepsize, but at the cost of a non-vanishing error. We further propose a TDC algorithm with blockwise diminishing stepsize, and show that it asymptotically converges with an arbitrarily small error at a blockwise linear convergence rate. Our experiments demonstrate that such an algorithm converges as fast as TDC under constant stepsize, and still enjoys comparable accuracy to TDC under diminishing stepsize.
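To make the two time-scale structure concrete, below is a minimal sketch of the classic TDC update (Sutton et al., 2009) with linear features, paired with a blockwise-diminishing stepsize schedule in the spirit of the proposed variant. This is an illustration, not the paper's implementation: the function names, the geometric decay factor, the fixed block length, and the toy driver are all assumptions.

```python
import numpy as np

def tdc_step(theta, w, phi, phi_next, reward, rho, gamma, alpha, beta):
    """One two-time-scale TDC update with linear function approximation.

    theta: value-function weights (slow time-scale, stepsize alpha)
    w:     correction weights (fast time-scale, stepsize beta)
    phi, phi_next: feature vectors of current and next state
    rho:   importance ratio pi(a|s) / b(a|s) for off-policy correction
    """
    delta = reward + gamma * (phi_next @ theta) - phi @ theta  # TD error
    # Slow update: semi-gradient TD step plus the gradient-correction term.
    theta = theta + alpha * rho * (delta * phi - gamma * (phi @ w) * phi_next)
    # Fast update: w tracks the least-squares solution of
    # E[phi phi^T] w = E[delta phi].
    w = w + beta * rho * (delta - phi @ w) * phi
    return theta, w

def blockwise_stepsizes(num_blocks, block_len, alpha0, beta0, decay=0.5):
    """Blockwise-diminishing schedule (hypothetical parameterization):
    stepsizes are held constant within each block and shrunk geometrically
    across blocks, so each block converges linearly to a smaller error."""
    for k in range(num_blocks):
        alpha_k, beta_k = alpha0 * decay ** k, beta0 * decay ** k
        for _ in range(block_len):
            yield alpha_k, beta_k

if __name__ == "__main__":
    # Toy driver with synthetic features standing in for a Markovian
    # trajectory; rho is fixed to 1.0 (on-policy) purely for the demo.
    rng = np.random.default_rng(0)
    d = 4
    theta, w = np.zeros(d), np.zeros(d)
    phi = rng.standard_normal(d)
    for alpha, beta in blockwise_stepsizes(5, 200, alpha0=0.1, beta0=0.4):
        phi_next = rng.standard_normal(d)
        reward = 0.1 * float(phi @ np.ones(d))
        theta, w = tdc_step(theta, w, phi, phi_next, reward, 1.0,
                            0.95, alpha, beta)
        phi = phi_next
    print("theta:", theta)
```

The two time-scales appear in the choice beta > alpha (and, for diminishing schedules, beta decaying more slowly than alpha), so that the fast iterate w equilibrates to its target while theta moves slowly.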


                Author and article information

                Published: 26 September 2019 (preprint)
                arXiv: 1909.11907
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (open access)
                To appear in NeurIPS 2019
                Subject classes: cs.LG, stat.ML
                Keywords: Machine learning, Artificial intelligence
