Metatrace Actor-Critic: Online Step-size Tuning by Meta-gradient Descent for Reinforcement Learning

Reinforcement learning (RL) has had many successes, but significant hyperparameter tuning is commonly required to achieve good performance. Furthermore, when nonlinear function approximation is used, non-stationarity in the state representation can lead to learning instability. A variety of techniques exist to combat this – most notably experience replay or the use of parallel actors. These techniques stabilize learning by making the RL problem more similar to the supervised setting. However, they come at the cost of moving away from the RL problem as it is typically formulated, that is, a single agent learning online without maintaining a large database of training examples.

To address these issues, we propose Metatrace, a meta-gradient descent based algorithm that tunes the step-size online. Metatrace leverages the structure of eligibility traces and can tune either a single scalar step-size or a separate step-size for each parameter. We empirically evaluate Metatrace for actor-critic on the Arcade Learning Environment. Results show that Metatrace can speed up learning and improve performance in non-stationary settings.
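To make the idea of online step-size tuning concrete, the following is a minimal sketch and not the Metatrace algorithm from the paper. It assumes linear TD(λ) value prediction (rather than actor-critic) on a hypothetical 5-state random walk, with a single scalar step-size α = exp(β) adjusted by a semi-gradient meta-update in the style of IDBD. The function names, the toy environment, all hyperparameter values, and the clipping of β are illustrative assumptions.

```python
import math
import random


def rms_error(w):
    """RMS error against the true random-walk values (i + 1) / 6."""
    n = len(w)
    return math.sqrt(sum((w[i] - (i + 1) / (n + 1)) ** 2 for i in range(n)) / n)


def run_td_lambda_meta(episodes=2000, lam=0.8, gamma=1.0,
                       alpha0=0.05, mu=0.01, seed=0):
    """Linear TD(lambda) with a scalar step-size tuned by meta-gradient descent.

    Illustrative sketch only: one-hot features, accumulating traces, and a
    semi-gradient meta-update on beta = log(alpha).
    """
    rng = random.Random(seed)
    n = 5
    w = [0.0] * n            # value weights (one per state, one-hot features)
    h = [0.0] * n            # running estimate of d w / d beta
    beta = math.log(alpha0)  # log step-size, so alpha stays positive
    # Assumed safeguard (not from the source): keep alpha in [1e-3, 0.1].
    beta_min, beta_max = math.log(1e-3), math.log(0.1)

    for _ in range(episodes):
        z = [0.0] * n        # eligibility trace, reset each episode
        s = n // 2           # start in the middle state
        while True:
            s_next = s + rng.choice((-1, 1))
            done = s_next < 0 or s_next >= n
            reward = 1.0 if s_next >= n else 0.0   # +1 only for exiting right
            v_next = 0.0 if done else w[s_next]

            # Accumulating trace: decay, then add the current feature vector.
            z = [gamma * lam * zi for zi in z]
            z[s] += 1.0

            delta = reward + gamma * v_next - w[s]  # TD error

            # Meta-update: descend delta^2 / 2 with respect to beta, using the
            # semi-gradient d delta / d beta ~= -phi . h (here phi . h == h[s]).
            phi_dot_h = h[s]
            beta += mu * delta * phi_dot_h
            beta = min(max(beta, beta_min), beta_max)
            alpha = math.exp(beta)

            # Weight update, and the matching recursion for h = d w / d beta.
            for i in range(n):
                step = alpha * z[i]
                w[i] += step * delta
                h[i] += step * (delta - phi_dot_h)

            if done:
                break
            s = s_next

    return w, math.exp(beta)


if __name__ == "__main__":
    weights, alpha = run_td_lambda_meta()
    print("learned values:", [round(x, 3) for x in weights])
    print("final step-size:", alpha)
    print("RMS error:", rms_error(weights))
```

In this sketch the step-size tends to grow while successive updates point in the same direction and to shrink once they start to cancel, which is the general behaviour that meta-gradient step-size methods aim for. A per-parameter variant would keep one β per weight instead of a single scalar.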

Bibtex

@article{young2019metatrace,
title={Metatrace: Online Step-Size Tuning by Meta-Gradient Descent for Reinforcement Learning Control},
author={Young, Kenny and Wang, Baoxiang and Taylor, Matthew E},
journal={International Joint Conference on Artificial Intelligence},
year={2019}
}

Related Research

Our NeurIPS 2021 Reading List


Y. Cao, K. Y. C. Lui, T. Durand, J. He, P. Xu, N. Mehrasa, A. Radovic, A. Lehrmann, R. Deng, A. Abdi, M. Schlegel, and S. Liu.

Computer Vision; Data Visualization; Graph Representation Learning; Learning And Generalization; Natural Language Processing; Optimization; Reinforcement Learning; Time series Modelling; Unsupervised Learning

Heterogeneous Multi-task Learning with Expert Diversity


G. Oliveira, and F. Tung.

Computer Vision; Natural Language Processing; Reinforcement Learning

Desired characteristics for real-world RL agents


P. Hernandez-Leal, and Y. Gao.

Reinforcement Learning

