In this paper, we tackle the problem of weakly-supervised object grounding. Given an image and a set of queries extracted from its description, the goal is to localize each query in the image. In the weakly-supervised setting, ground-truth query groundings are not available at training time. We propose a novel approach to weakly-supervised object grounding based on iterative context reasoning, in which query representations and region representations are updated iteratively, each conditioned on the other. This iterative contextual refinement gradually resolves ambiguity and vagueness in the queries and regions, thus helping to resolve challenges in grounding. We show the effectiveness of our proposed model on two challenging video object grounding datasets.
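
The mutual update of query and region representations described above can be illustrated with a small sketch. This is not the paper's model: the function name `iterative_grounding`, the softmax cross-attention update, the L2 normalization, and the `num_steps` and `mix` parameters are all assumptions chosen only to show the general idea of two sets of features refining each other before a final matching step.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def l2_normalize(x, eps=1e-8):
    # Scale each row to unit length so repeated updates stay bounded.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + eps)


def iterative_grounding(queries, regions, num_steps=3, mix=0.5):
    """Hypothetical illustration of iterative mutual refinement.

    queries: (Q, D) array of query features.
    regions: (R, D) array of region features.
    Returns the index of the best region per query, shape (Q,),
    and the final cosine-similarity matrix, shape (Q, R).
    """
    q = l2_normalize(np.asarray(queries, dtype=float))
    r = l2_normalize(np.asarray(regions, dtype=float))
    for _ in range(num_steps):
        sim = q @ r.T                         # (Q, R) cosine similarities
        q_ctx = softmax(sim, axis=1) @ r      # region context for each query
        r_ctx = softmax(sim.T, axis=1) @ q    # query context for each region
        # Assumed update rule: blend each feature with its context.
        q = l2_normalize((1.0 - mix) * q + mix * q_ctx)
        r = l2_normalize((1.0 - mix) * r + mix * r_ctx)
    scores = q @ r.T
    return scores.argmax(axis=1), scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    best_region, scores = iterative_grounding(
        rng.normal(size=(4, 16)), rng.normal(size=(10, 16))
    )
    print(best_region.shape, scores.shape)
```

In this sketch no grounding labels are used at any point, which mirrors the weakly-supervised setting only loosely; how the actual model is trained and how it parameterizes the updates is not specified in the abstract.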


author = {Chen, Lei and Zhai, Mengyao and He, Jiawei and Mori, Greg},
year = {2019},
month = {10},
pages = {1407--1415},
title = {Object Grounding via Iterative Context Reasoning},
doi = {10.1109/ICCVW.2019.00177}