Dr. Jakob Foerster on self-play and zero-shot coordination in Hanabi
In recent years we have seen fast progress on a number of zero-sum benchmark problems in AI, such as Go, poker, and Dota. In contrast, success in the real world requires humans to collaborate and communicate with others in settings that are at least partially cooperative. Recently, the card game Hanabi has been established as a new benchmark environment to fill this gap. In particular, Hanabi is interesting to humans because it is entirely focused on theory of mind, i.e., the ability to reason over the intentions, beliefs, and point of view of other agents when observing their actions. This ability is particularly important in applications such as communication, assistive technologies, and autonomous driving.
This talk will provide an update on recent progress in this area. It will start out with novel state-of-the-art methods for the self-play setting. Next, it will introduce the Zero-Shot Coordination setting as a new frontier for multi-agent research. Finally, it will introduce Other-Play, a novel learning algorithm that allows agents to coordinate ad hoc and biases learning towards more human-compatible policies.