Mohammad Norouzi on sequence prediction meets reinforcement learning

Neural sequence models have been remarkably successful across sequence prediction tasks, including machine translation and speech recognition. I will give an overview of teacher forcing, the predominant approach to training supervised neural sequence models. Then, I will present optimal completion distillation (OCD), which improves upon teacher forcing by training a sequence model on its own mistakes. Given a partial sequence generated by the model, we find the completions that minimize the total edit distance to the target and teach the model to mimic these optimal completions. OCD achieves state-of-the-art results for speech recognition on the WSJ dataset.

In the second half of the talk, I will focus on sequence prediction tasks that involve discovering latent programs as part of the optimization. I will present memory augmented policy optimization (MAPO), our approach that improves upon REINFORCE by expressing the expected return objective as a weighted sum of two terms: an expectation over a memory of high-reward trajectories, and a separate expectation over the trajectories outside the memory. MAPO achieves state-of-the-art results on the challenging WikiTableQuestions dataset.
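
To make the optimal-completion idea behind OCD more concrete, here is a minimal Python sketch written from the description in this abstract, not taken from the authors' code. Given a prefix the model has generated and the ground-truth target, it runs the standard edit-distance dynamic program and returns the set of next tokens that can start a completion with minimal total edit distance. The function name `optimal_next_tokens` and the `EOS` marker are hypothetical names chosen for illustration, and the published algorithm may differ in its details.

```python
from typing import Sequence, Set

EOS = "</s>"  # hypothetical end-of-sequence marker, assumed for this sketch


def optimal_next_tokens(prefix: Sequence[str], target: Sequence[str]) -> Set[str]:
    """Return the next tokens that begin a completion of `prefix`
    with minimal total edit distance to `target`."""
    # row[j] holds the edit distance between the prefix consumed so far
    # and the first j tokens of the target.
    row = list(range(len(target) + 1))
    for tok in prefix:
        new_row = [row[0] + 1]  # prefix token matched against an empty target
        for j in range(1, len(target) + 1):
            substitute = row[j - 1] + (tok != target[j - 1])
            delete = row[j] + 1          # drop the generated token
            insert = new_row[j - 1] + 1  # skip a target token
            new_row.append(min(substitute, delete, insert))
        row = new_row

    # Target prefixes that are closest to the generated prefix tell us
    # where to resume: the token right after each of them is optimal.
    best = min(row)
    optimal = set()
    for j, dist in enumerate(row):
        if dist == best:
            optimal.add(target[j] if j < len(target) else EOS)
    return optimal


if __name__ == "__main__":
    # After generating "sa" for the target "sunday", both "u" and "n"
    # lead to completions at edit distance 1.
    print(optimal_next_tokens(list("sa"), list("sunday")))
```

One plausible way to use the returned set during training is to spread the target probability evenly over its tokens at each step, so the model learns how to recover from whatever prefix it actually produced.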
