In this paper, we introduce an open-vocabulary model for image hashtag prediction, the task of mapping an image to its accompanying hashtags. Recent work shows that building an accurate hashtag prediction model requires modeling the user because of the self-expression problem, in which similar image content may be labeled with different tags by different users. To account for user behaviour, we propose a new model that extracts a representation of a user from their image history. Our model makes it possible to refine a user representation with new images, or to add a new user, without retraining. Because new hashtags appear constantly on social networks, we design the model with an open vocabulary so that it can also handle new hashtags without retraining. The model learns a cross-modal embedding between user-conditional visual representations and hashtag word representations. Experiments on a subset of the YFCC100M dataset demonstrate the efficacy of our user representation for user-conditional hashtag prediction and user retrieval. We further validate the open-vocabulary prediction ability of our model.
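
To make the idea concrete, the following is a minimal toy sketch of user-conditional, open-vocabulary scoring. It is not the architecture of the paper, which the abstract does not specify. It assumes, purely for illustration, that a user is represented by the running mean of their image features, that the image and user vectors are fused by element-wise addition, and that hashtags are ranked by cosine similarity to fixed word vectors. All function names, the 2-D features, and the example hashtags are hypothetical.

```python
import math

def update_user(user_state, image_feature):
    """Fold one new image into a running (sum, count) user state.
    Assumption: the user is summarized by the mean of their image features."""
    total, count = user_state
    return [t + x for t, x in zip(total, image_feature)], count + 1

def user_vector(user_state):
    """Mean image feature of the user's history (zeros if the history is empty)."""
    total, count = user_state
    return [t / count for t in total] if count else list(total)

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def rank_hashtags(image_feature, user_vec, hashtag_embeddings):
    """Rank hashtags for one image, conditioned on one user.
    Assumption: fusion is an element-wise sum; a learned model would differ."""
    query = [i + u for i, u in zip(image_feature, user_vec)]
    scored = [(cosine(query, emb), tag) for tag, emb in hashtag_embeddings.items()]
    return [tag for _, tag in sorted(scored, reverse=True)]

# Hypothetical 2-D features and hashtag word vectors.
empty = ([0.0, 0.0], 0)
state_a = update_user(update_user(empty, [1.0, 0.0]), [1.0, 0.0])
state_b = update_user(empty, [0.0, 1.0])
hashtags = {"#beach": [1.0, 0.0], "#mountain": [0.0, 1.0]}

image = [0.5, 0.5]  # the same ambiguous image for both users
ranking_a = rank_hashtags(image, user_vector(state_a), hashtags)
ranking_b = rank_hashtags(image, user_vector(state_b), hashtags)

# A new hashtag only needs a word vector; nothing is retrained.
hashtags["#surf"] = [0.9, 0.1]
ranking_a_new = rank_hashtags(image, user_vector(state_a), hashtags)
```

In this toy setting the same image is ranked differently for the two users, which mirrors the self-expression problem. Adding a user or a hashtag only adds a vector, which is the property the abstract describes as working without retraining.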


author = {Durand, Thibaut},
title = {Learning User Representations for Open Vocabulary Image Hashtag Prediction},
booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
month = {June},
year = {2020}