Existing literature on adversarial training has largely focused on improving models' robustness. In this paper, we demonstrate an intriguing phenomenon about adversarial training: adversarial robustness, unlike clean accuracy, is highly sensitive to the input data distribution. Even a semantics-preserving transformation of the input data distribution can lead to drastically different robustness for an adversarially trained model, even when that model is both trained and evaluated on the new distribution.
We discover this sensitivity by analyzing the Bayes classifier's clean accuracy and robust accuracy, and extensive empirical investigation confirms our finding. Numerous neural networks trained on MNIST and CIFAR10 variants achieve comparable clean accuracies, yet they exhibit very different robustness when adversarially trained. This counter-intuitive phenomenon suggests that the input data distribution alone, not necessarily the task itself, can affect the adversarial robustness of trained neural networks. Lastly, we discuss practical implications for evaluating adversarial robustness, and make initial attempts to understand this complex phenomenon.
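One way to picture a "semantics-preserving transformation on the input data distribution" is a monotone pixel remapping that pushes values toward the extremes while keeping the image content recognizable. The sketch below is illustrative only; the function name `saturate` and its parameter `p` are assumptions for this example, not necessarily the exact construction used in the paper.

```python
import numpy as np

def saturate(x, p):
    """Saturation-style, semantics-preserving pixel transform.

    Maps pixel values in [0, 1] back into [0, 1], pushing them toward
    {0, 1} as p grows. p = 2 is exactly the identity; larger p yields a
    more saturated image whose semantic content is unchanged, but whose
    pixel distribution differs from the original.
    """
    y = np.sign(2.0 * x - 1.0) * np.abs(2.0 * x - 1.0) ** (2.0 / p)
    return y / 2.0 + 0.5

x = np.linspace(0.0, 1.0, 5)
print(saturate(x, 2))  # identity: [0.   0.25 0.5  0.75 1.  ]
print(saturate(x, 8))  # values pushed toward 0 and 1
```

Because the map is a monotone bijection on [0, 1], a human (and the Bayes classifier) sees essentially the same classification problem before and after the transform, which is what makes the resulting change in adversarial robustness surprising.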