In this paper, we investigate dimensionality reduction (DR) maps in an information retrieval setting from a quantitative topology point of view. In particular, we show that no DR map can achieve perfect precision and perfect recall simultaneously; thus a continuous DR map must have imperfect precision. We further prove an upper bound on the precision of Lipschitz continuous DR maps. While precision is a natural measure in an information retrieval setting, it does not measure ‘how’ wrong the retrieved data is. We therefore propose a new measure based on the Wasserstein distance that comes with a similar theoretical guarantee. A key technical step in our proofs is a particular optimization problem of the $L_2$-Wasserstein distance over a constrained set of distributions. We provide a complete solution to this optimization problem, which may be of independent technical interest.
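As a purely illustrative reading of the precision/recall setting, the sketch below estimates neighbourhood precision and recall of an embedding empirically via k-nearest neighbours. The function names, the k-NN formulation, and the toy projection are our own assumptions; they are not the continuum quantities or bounds analyzed in the paper.

```python
import numpy as np

def knn_indices(X, k):
    """Indices of the k nearest neighbours of each point (excluding the point itself)."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)  # pairwise Euclidean distances
    np.fill_diagonal(d, np.inf)                                 # exclude self-matches
    return np.argsort(d, axis=1)[:, :k]

def precision_recall(X_high, X_low, k_high=10, k_low=10):
    """Average neighbourhood precision/recall of an embedding X_low of X_high."""
    nn_high = knn_indices(X_high, k_high)   # 'relevant' items: true high-dimensional neighbours
    nn_low = knn_indices(X_low, k_low)      # 'retrieved' items: neighbours in the embedding
    prec, rec = [], []
    for i in range(len(X_high)):
        overlap = len(set(nn_high[i]) & set(nn_low[i]))
        prec.append(overlap / k_low)    # fraction of retrieved points that are true neighbours
        rec.append(overlap / k_high)    # fraction of true neighbours that are retrieved
    return float(np.mean(prec)), float(np.mean(rec))

# Toy example: project 3-D points onto their first two coordinates.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
p, r = precision_recall(X, X[:, :2])
print(f"precision={p:.3f}, recall={r:.3f}")
```

Under this empirical proxy, a lossy projection typically retrieves some points that are not true high-dimensional neighbours, which is the imperfect-precision phenomenon the paper quantifies.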
BibTeX
@InProceedings{LuiDR2018,
Title = {Dimensionality Reduction has Quantifiable Imperfections: Two Geometric Bounds},
Author = {Kry Yik Chau Lui and Gavin Weiguang Ding and Ruitong Huang and Robert J. McCann},
Year = {2018},
Abstract = {In this paper, we investigate dimensionality reduction (DR) maps in an information retrieval setting from a quantitative topology point of view. In particular, we show that no DR map can achieve perfect precision and perfect recall simultaneously; thus a continuous DR map must have imperfect precision. We further prove an upper bound on the precision of Lipschitz continuous DR maps. While precision is a natural measure in an information retrieval setting, it does not measure `how' wrong the retrieved data is. We therefore propose a new measure based on the Wasserstein distance that comes with a similar theoretical guarantee. A key technical step in our proofs is a particular optimization problem of the $L_2$-Wasserstein distance over a constrained set of distributions. We provide a complete solution to this optimization problem, which may be of independent technical interest.},
Booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
Url = {}
}
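The abstract above also refers to the $L_2$-Wasserstein distance. As a minimal point of reference only (this is not the constrained optimization problem solved in the paper), the empirical $W_2$ distance between two one-dimensional samples of equal size reduces to the root-mean-square gap between their sorted values, since the optimal coupling in one dimension matches order statistics:

```python
import numpy as np

def wasserstein2_1d(x, y):
    """Empirical L2-Wasserstein distance between two equal-size 1-D samples.
    In 1-D the optimal coupling pairs sorted samples, so the distance is the
    root-mean-square gap between order statistics."""
    x, y = np.sort(x), np.sort(y)
    return float(np.sqrt(np.mean((x - y) ** 2)))

rng = np.random.default_rng(0)
a = rng.normal(loc=0.0, size=1000)
b = rng.normal(loc=1.0, size=1000)
print(wasserstein2_1d(a, b))  # close to 1.0 for a unit mean shift between standard normals
```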
Related Research
- Self-supervised Learning in Time-Series Forecasting — A Contrastive Learning Approach. T. Sylvain, L. Meng, and A. Lehrmann.
- Efficient CDF Approximations for Normalizing Flows. C.S. Sastry, A. Lehrmann, M. Brubaker, and A. Radovic. Transactions on Machine Learning Research (TMLR).
- PUMA: Performance Unchanged Model Augmentation for Training Data Removal. G. Wu, M. Hashemi, and C. Srinivasa. Association for the Advancement of Artificial Intelligence (AAAI).