On Principled Entropy Exploration in Policy Optimization

In this paper, we investigate Exploratory Conservative Policy Optimization (ECPO), a policy optimization strategy that improves exploration behavior while ensuring monotonic progress in a principled objective. ECPO conducts maximum entropy exploration within a mirror descent framework, but updates policies using reversed KL projection. This formulation bypasses undesirable mode-seeking behavior and avoids premature convergence to sub-optimal policies, while still supporting strong theoretical properties such as guaranteed policy improvement. Experimental evaluations demonstrate that the proposed method significantly improves practical exploration and surpasses the empirical performance of state-of-the-art policy optimization methods on a set of benchmark tasks.
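As a rough illustration of what an entropy-regularized mirror descent step can look like, the sketch below computes the closed-form maximizer of E_pi[r] + tau * H(pi) - lam * KL(pi || prev_policy) for a single discrete action distribution. This is a generic textbook form under stated assumptions, not necessarily the exact ECPO update: the function name, the coefficients `tau` and `lam`, and the toy rewards are hypothetical, and the KL projection onto a parametric policy class is not shown.

```python
import math

def entropy_regularized_md_step(prev_policy, rewards, tau=0.1, lam=1.0):
    """Maximize E_pi[r] + tau*H(pi) - lam*KL(pi || prev_policy) in closed form.

    Assumes a single discrete action distribution with prev_policy[a] > 0.
    All names and default values are illustrative, not taken from the paper.
    """
    # Stationarity gives log pi(a) = (r(a) + lam*log prev(a)) / (tau + lam) + const.
    logits = [(r + lam * math.log(p)) / (tau + lam)
              for r, p in zip(rewards, prev_policy)]
    m = max(logits)  # subtract the max before exponentiating, for stability
    weights = [math.exp(l - m) for l in logits]
    z = sum(weights)
    return [w / z for w in weights]

# Toy example: uniform previous policy, two near-optimal actions.
prev = [0.25, 0.25, 0.25, 0.25]
rewards = [1.0, 0.9, 0.0, 0.0]
target = entropy_regularized_md_step(prev, rewards)
```

In this toy case the resulting distribution keeps substantial probability on both near-optimal actions rather than collapsing onto one, which is the kind of behavior the entropy term is meant to encourage.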

Bibtex

@inproceedings{mei2019principled,
title={On principled entropy exploration in policy optimization},
author={Mei, Jincheng and Xiao, Chenjun and Huang, Ruitong and Schuurmans, Dale and M{\"u}ller, Martin},
booktitle={Proceedings of the 28th International Joint Conference on Artificial Intelligence},
pages={3130--3136},
year={2019},
organization={AAAI Press}
}

Related Research

Our NeurIPS 2021 Reading List

Y. Cao, K. Y. C. Lui, T. Durand, J. He, P. Xu, N. Mehrasa, A. Radovic, A. Lehrmann, R. Deng, A. Abdi, M. Schlegel, and S. Liu.

Computer Vision; Data Visualization; Graph Representation Learning; Learning And Generalization; Natural Language Processing; Optimization; Reinforcement Learning; Time Series Modelling; Unsupervised Learning

Research
Heterogeneous Multi-task Learning with Expert Diversity

G. Oliveira and F. Tung.

Computer Vision; Natural Language Processing; Reinforcement Learning

Research
Desired characteristics for real-world RL agents

P. Hernandez-Leal and Y. Gao.

Reinforcement Learning

Research
