      Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples

      Preprint

          Abstract

          Gradient-based temporal difference (GTD) algorithms are widely used in off-policy learning scenarios. Among them, the two time-scale TD with gradient correction (TDC) algorithm has been shown to have superior performance. In contrast to previous studies that characterized the non-asymptotic convergence rate of TDC only under independent and identically distributed (i.i.d.) data samples, we provide the first non-asymptotic convergence analysis for two time-scale TDC under a non-i.i.d. Markovian sample path and linear function approximation. We show that two time-scale TDC can converge as fast as O(log t / t^{2/3}) under diminishing stepsize, and can converge exponentially fast under constant stepsize, but at the cost of a non-vanishing error. We further propose a TDC algorithm with blockwise diminishing stepsize, and show that it asymptotically converges with an arbitrarily small error at a blockwise linear convergence rate. Our experiments demonstrate that such an algorithm converges as fast as TDC under constant stepsize, and still enjoys comparable accuracy to TDC under diminishing stepsize.
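To make the two time-scale structure concrete, below is a minimal sketch of the classic TDC update (Sutton et al., 2009) with linear features, paired with a blockwise-diminishing stepsize schedule in the spirit of the proposed variant. This is an illustration, not the paper's implementation: the function names, the geometric decay factor, the fixed block length, and the toy driver are all assumptions.

```python
import numpy as np

def tdc_step(theta, w, phi, phi_next, reward, rho, gamma, alpha, beta):
    """One two-time-scale TDC update with linear function approximation.

    theta: value-function weights (slow time-scale, stepsize alpha)
    w:     correction weights (fast time-scale, stepsize beta)
    phi, phi_next: feature vectors of current and next state
    rho:   importance ratio pi(a|s) / b(a|s) for off-policy correction
    """
    delta = reward + gamma * (phi_next @ theta) - phi @ theta  # TD error
    # Slow update: semi-gradient TD step plus the gradient-correction term.
    theta = theta + alpha * rho * (delta * phi - gamma * (phi @ w) * phi_next)
    # Fast update: w tracks the least-squares solution of
    # E[phi phi^T] w = E[delta phi].
    w = w + beta * rho * (delta - phi @ w) * phi
    return theta, w

def blockwise_stepsizes(num_blocks, block_len, alpha0, beta0, decay=0.5):
    """Blockwise-diminishing schedule (hypothetical parameterization):
    stepsizes are held constant within each block and shrunk geometrically
    across blocks, so each block converges linearly to a smaller error."""
    for k in range(num_blocks):
        alpha_k, beta_k = alpha0 * decay ** k, beta0 * decay ** k
        for _ in range(block_len):
            yield alpha_k, beta_k

if __name__ == "__main__":
    # Toy driver with synthetic features standing in for a Markovian
    # trajectory; rho is fixed to 1.0 (on-policy) purely for the demo.
    rng = np.random.default_rng(0)
    d = 4
    theta, w = np.zeros(d), np.zeros(d)
    phi = rng.standard_normal(d)
    for alpha, beta in blockwise_stepsizes(5, 200, alpha0=0.1, beta0=0.4):
        phi_next = rng.standard_normal(d)
        reward = 0.1 * float(phi @ np.ones(d))
        theta, w = tdc_step(theta, w, phi, phi_next, reward, 1.0,
                            0.95, alpha, beta)
        phi = phi_next
    print("theta:", theta)
```

The two time-scales appear in the choice beta > alpha (and, for diminishing schedules, beta decaying more slowly than alpha), so that the fast iterate w equilibrates to its target while theta moves slowly.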


                Author and article information

                Published: 26 September 2019 (preprint)
                arXiv: 1909.11907
                License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (open access)
                To appear in NeurIPS 2019
                Subject classes: cs.LG, stat.ML
                Keywords: Machine learning, Artificial intelligence
