Authors: N. Mehrasa, T. Badamdorj, S. Liu, J. He

What's on your CVPR 2022 reading list? This year, we decided to share some of our favourite papers, organized by topic: self-supervised learning, unsupervised domain adaptation, knowledge distillation and contrastive learning. Our researchers selected the papers below and provided a summary of each, along with their thoughts on the impact of the work. If you're attending the 2022 Conference on Computer Vision and Pattern Recognition in person this year, we hope to see you at our booth #1020!


Self-supervised learning


Use All The Labels: A Hierarchical Multi-Label Contrastive Learning Framework

Shu Zhang, Ran Xu, Caiming Xiong, Chetan Ramaiah

by Xenon Team | Nazanin Mehrasa

Why do we recommend? 
Recently, self-supervised learning has shown promising performance in representation learning across domains such as computer vision and natural language processing. Contrastive learning is a subclass of self-supervised learning in which representations are learned by contrasting positive and negative samples. In this paper, the authors study contrastive learning for image datasets whose label space has a hierarchical structure. Most contrastive learning frameworks rely on a single supervisory signal to learn representations, typically defined by treating different views of the same image as positive samples (e.g. by using augmentation techniques). The authors instead propose an approach in which representations are learned by exploiting supervisory signals from the labels available in the dataset and the hierarchical relationships between them. Pairs of positive and negative samples are built using these supervisory signals. They introduce hierarchy-preserving losses that aim to learn an embedding function which preserves the label hierarchy in the embedding space. The paper demonstrates the effectiveness of the proposed approach on down-stream tasks such as category prediction, sub-category retrieval and clustering.
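
To make the idea of hierarchy-aware pairs concrete, here is a minimal sketch of how positives might be selected at different levels of a label hierarchy. It is an illustration only, not the authors' implementation: the function name, the tuple encoding of labels and the toy categories are all assumptions, and the hierarchy-preserving losses themselves are not shown.

```python
def positives_at_level(labels, anchor, level):
    """Return indices of samples that share the anchor's labels down to `level`.

    `labels` holds one tuple per sample, ordered from the coarsest label to the
    finest, e.g. (category, sub_category). This encoding is an assumption.
    """
    prefix = labels[anchor][:level]
    return [
        i for i, path in enumerate(labels)
        if i != anchor and path[:level] == prefix
    ]


# Toy dataset with a two-level hierarchy (hypothetical labels).
labels = [
    ("clothing", "shirt"),
    ("clothing", "shirt"),
    ("clothing", "dress"),
    ("shoes", "boot"),
]

coarse_positives = positives_at_level(labels, anchor=0, level=1)  # same category
fine_positives = positives_at_level(labels, anchor=0, level=2)    # same sub-category
```

Samples that match the anchor at a deeper level are a subset of those that match at a shallower level, which is the structure a hierarchy-preserving loss can exploit.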

Self-Supervised Models are Continual Learners

Enrico Fini, Victor G. Turrisi da Costa, Xavier Alameda-Pineda, Elisa Ricci, Karteek Alahari, Julien Mairal

by Xenon Team | Taivanbat Badamdorj

Why do we recommend? 
Self-supervised models are great for learning representations from large amounts of data. The learned representations can be used effectively for later down-stream tasks. However, prior works only consider training self-supervised models in an offline setting. In a practical setting, we may receive the data in a continuous stream. This paper shows that in this continual learning setting, the performance of self-supervised models degrades substantially. To address this, the authors aim to learn semantic concepts that are invariant to the state of the model. They therefore supervise the current model’s representation with a model from a previous time step, and show that this greatly improves performance.


Unsupervised domain adaptation 


Continual Test-Time Domain Adaptation

Qin Wang, Olga Fink, Luc Van Gool, Dengxin Dai

by Xenon Team | Siqi Liu

Why do we recommend? 
Domain shifts at test time can negatively affect the performance of a trained model. Although there are many existing solutions, they usually do not consider a continually changing test domain distribution, which is a common situation in practice. In this work, the authors try to address the problem of adapting to a continually changing distribution at test time for a pre-trained model without accessing the original training data. Their method is based on self-training but consists of several key techniques that are aimed at dealing with error accumulation and catastrophic forgetting. (1) They use a mean model averaged with exponential decay over all previous models to generate pseudo-labels. (2) They selectively use data augmentation when generating pseudo-labels based on an estimate of the gap between the training and test domains using the confidence of the pre-trained model. (3) They randomly restore some weights of the model to the original weights in the pre-trained model after each gradient update at test time. The proposed method shows better performance than baselines based on batch norm adaptation, self-training, and entropy minimization in the experiments.
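
Techniques (1) and (3) above can be sketched in a few lines. This is a toy illustration under stated assumptions rather than the authors' code: weights are plain Python lists, the gradient step is replaced by a stand-in, and the decay and restore probabilities are arbitrary values chosen for the example. Technique (2) is omitted.

```python
import random


def ema_update(teacher, student, decay=0.999):
    """Blend the student's weights into the teacher with exponential decay."""
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


def stochastic_restore(current, source, restore_prob=0.01, rng=random):
    """Reset each weight to its pre-trained value with probability restore_prob."""
    return [s if rng.random() < restore_prob else c
            for c, s in zip(current, source)]


source = [0.0, 1.0, -1.0, 0.5]   # pre-trained weights (toy values)
student = list(source)           # model adapted at test time
teacher = list(source)           # mean model that produces pseudo-labels
rng = random.Random(0)

for step in range(100):
    # Stand-in for one gradient update on pseudo-labels from the teacher.
    student = [w + 0.01 for w in student]
    # (1) Update the mean model from the freshly updated student.
    teacher = ema_update(teacher, student, decay=0.9)
    # (3) Randomly restore some student weights to their pre-trained values.
    student = stochastic_restore(student, source, restore_prob=0.05, rng=rng)
```

The averaged teacher changes more slowly than the student, which is what limits error accumulation in the pseudo-labels, while the random restoration keeps the student anchored to the pre-trained model.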


DINE: Domain Adaptation from Single and Multiple Black-box Predictors

Jian Liang, Dapeng Hu, Jiashi Feng, Ran He

by Xenon Team | Siqi Liu

Why do we recommend? 
Unsupervised domain adaptation (UDA) is the problem of adapting a model trained on data from one domain (source) to another domain (target) without access to the true labels in the latter. Traditionally, UDA methods assume access to either the source domain data or the details of the source domain model when performing the adaptation. In this paper, the authors study a different setting of UDA, where only the predictions from the source domain model on the target domain data are available. Specifically, they study multi-class classification, where only the predicted (top-K) labels and the corresponding probabilities are available. They use knowledge distillation to deal with the lack of the source domain model itself: the target domain (student) model is encouraged to align with a teacher that averages the source domain model's predictions, smoothed based on the provided probabilities, with those of past student models. Furthermore, they introduce two regularization terms in the loss for training the student model to encourage interpolation consistency and prediction diversity. Finally, they fine-tune the distilled model on the target domain data only to further improve the performance. In the experiments, their method achieves very good overall performance, not only against black-box UDA baselines but also against previous methods that need source domain data or model details.
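
One way to picture the smoothing and teacher averaging is the sketch below. It is an assumption-laden illustration, not the paper's exact formulation: the reported top-K probabilities are kept, the leftover probability mass is spread uniformly over the remaining classes, and the teacher is a running average with a hypothetical `momentum` parameter.

```python
def smooth_top_k(top_k, num_classes):
    """Expand top-K predictions into a full distribution over num_classes.

    `top_k` maps class index -> probability for the K reported classes.
    The leftover mass is shared equally among the unreported classes
    (a uniform split is assumed here for illustration).
    """
    kept = sum(top_k.values())
    rest = num_classes - len(top_k)
    fill = (1.0 - kept) / rest if rest > 0 else 0.0
    return [top_k.get(c, fill) for c in range(num_classes)]


def update_teacher(teacher_probs, student_probs, momentum=0.6):
    """Average the current teacher target with the latest student prediction."""
    return [momentum * t + (1.0 - momentum) * s
            for t, s in zip(teacher_probs, student_probs)]


# The black-box source model reports only its top-1 class and probability.
teacher = smooth_top_k({2: 0.7}, num_classes=4)

# Later, the student's own prediction on the same sample refines the target.
student_prediction = [0.05, 0.05, 0.85, 0.05]
teacher = update_teacher(teacher, student_prediction)
```

Because both inputs to the average are valid distributions, the updated teacher target still sums to one.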


Knowledge distillation


Knowledge distillation: A good teacher is patient and consistent

Lucas Beyer, Xiaohua Zhai, Amélie Royer, Larisa Markeeva, Rohan Anil, Alexander Kolesnikov

by Xenon Team | Taivanbat Badamdorj

Why do we recommend? 
In knowledge distillation, we emulate a teacher model with a student model. This is mainly done to reduce the size of the model while maintaining accuracy, but recent work has shown that we can even improve the performance of a model with the same architecture using the same technique. Thus, it’s a simple, effective, and general technique to improve a model’s performance. This paper doesn’t propose a new method for knowledge distillation, but instead identifies fundamental design choices that affect the efficacy of the final distilled model. They find that using the same heavily augmented input to the teacher and student and training for much longer than prior works produces superior results. They use cropped images with mix-up as their data augmentation, and train for thousands of epochs (rather than hundreds). While their student model might underperform the baseline in the early stages of training, it continues to improve and shows no signs of overfitting. While each design choice in this paper is trivial, together they lead to models that perform much better than the baseline models.
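
The "same input to teacher and student" idea can be sketched as follows. This is a toy example, not the authors' training code: the two models are hypothetical stand-in functions, inputs are short lists instead of image crops, and the KL divergence between the two softmax outputs is assumed as the distillation loss.

```python
import math


def mixup(x1, x2, lam):
    """Blend two inputs with mixing coefficient lam in [0, 1]."""
    return [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]


def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 everywhere."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def distillation_loss(teacher_fn, student_fn, x1, x2, lam):
    # The key point: both models see the *same* augmented input.
    mixed = mixup(x1, x2, lam)
    return kl_divergence(softmax(teacher_fn(mixed)), softmax(student_fn(mixed)))


# Hypothetical stand-ins for the teacher and student networks.
def teacher_fn(x):
    return [sum(x), -sum(x)]


def student_fn(x):
    return [0.5 * sum(x), 0.0]


loss = distillation_loss(teacher_fn, student_fn, [1.0, 2.0], [3.0, 4.0], lam=0.5)
```

In a real setup this loss would be minimized with respect to the student's parameters over a very long training schedule, with the teacher kept fixed.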


Contrastive learning


Contrastive Learning for Unsupervised Video Highlight Detection

Taivanbat Badamdorj, Mrigank Rochan, Yang Wang, Li Cheng

by Xenon Team | Taivanbat Badamdorj

Why do we recommend?
In our work, we aim to find the most interesting moments in videos. While most works require labeled data denoting which parts of each video are interesting, we learn to pick highlight moments without any labels. We do this by training a model to pick the most discriminative parts of each video, that is, the parts that help differentiate it from other videos. We then show that these moments are often highlight moments and achieve state-of-the-art results.