ScienceOpen: research and publishing network

Is Open Access

Scalable Bilinear \(\pi\) Learning Using State and Action Features

Preprint

Author(s): Yichen Chen, Lihong Li, Mengdi Wang

Publication date (created): 26 April 2018

Abstract

Approximate linear programming (ALP) is one of the major algorithmic families for solving large-scale Markov decision processes (MDPs). In this work, we study a primal-dual formulation of the ALP and develop a scalable, model-free algorithm called bilinear \(\pi\) learning for reinforcement learning when a sampling oracle is provided. This algorithm enjoys a number of advantages. First, it adopts (bi)linear models to represent the high-dimensional value function and state-action distributions, using given state and action features. Its run-time complexity depends on the number of features, not the size of the underlying MDP. Second, it operates in a fully online fashion without having to store any samples, and thus has a minimal memory footprint. Third, we prove that it is sample-efficient, solving for the optimal policy to high precision with a sample complexity linear in the dimension of the parameter space.
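
For orientation, the primal-dual view mentioned in the abstract can be illustrated with the textbook linear program for a discounted MDP. This is a sketch under stated assumptions, not the paper's exact formulation: the discount factor \(\gamma\), initial distribution \(\xi\), reward \(r\), transition kernel \(P\), feature maps \(\phi\) and \(\psi\), and parameter vectors \(w\) and \(\theta\) are notation introduced here for illustration, and the setting and parameterization used in the paper may differ.

The primal LP over value functions \(v\) is

\[
\min_{v}\ (1-\gamma)\sum_{s}\xi(s)\,v(s)
\quad\text{s.t.}\quad
v(s)\ \ge\ r(s,a)+\gamma\sum_{s'}P(s'\mid s,a)\,v(s')\quad\text{for all }(s,a).
\]

Attaching a multiplier \(\mu(s,a)\ge 0\) to each constraint gives the saddle-point problem

\[
\min_{v}\ \max_{\mu\ge 0}\ (1-\gamma)\sum_{s}\xi(s)\,v(s)
+\sum_{s,a}\mu(s,a)\Bigl(r(s,a)+\gamma\sum_{s'}P(s'\mid s,a)\,v(s')-v(s)\Bigr).
\]

If both variables are restricted to linear models, for example \(v(s)=\phi(s)^{\top}w\) and \(\mu(s,a)=\psi(s,a)^{\top}\theta\), the objective is linear in \(w\) for fixed \(\theta\) and linear in \(\theta\) for fixed \(w\). The unknowns are then the parameter vectors \(w\) and \(\theta\), whose dimensions are set by the number of features rather than by the number of states and actions.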

Author and article information

Article

arXiv ID: 1804.10328

SO-VID: cf9c1bfd-34ee-4764-b77c-e1edb5b7ff44

License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/

Custom metadata

Categories: cs.LG, math.OC, stat.ML

ScienceOpen disciplines: Numerical methods, Machine learning, Artificial intelligence
