Adversarial robustness is determined by the margins of the data points, i.e., the distances in the input space from the data points to the classifier's decision boundary. We study the connection between directly maximizing these margins and adversarial training. In particular, we show that the two objectives have aligned gradient directions. Furthermore, we show that directly maximizing margins is an improvement on adversarial training, in the sense that it can be interpreted as adversarial training with automatically selected, "correct" perturbation magnitudes that differ for each individual data point. Motivated by our theoretical analysis, we propose Max-Margin Adversarial (MMA) training, which maximizes the average margin. We demonstrate the effectiveness of the MMA training framework on the MNIST and CIFAR10 datasets. On both, MMA-trained models obtain state-of-the-art robustness under various ℓ∞ and ℓ2 attacks.
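To make the margin notion concrete: for a linear binary classifier f(x) = ⟨w, x⟩ + b, the ℓ2 input-space margin of a point x has the closed form |⟨w, x⟩ + b| / ‖w‖; for a general classifier, the margin can be estimated as the smallest perturbation magnitude at which an attack succeeds, found by bisection. The sketch below is illustrative only — the function names and the monotone attack oracle are our own assumptions, not the paper's implementation:

```python
import math


def linear_margin(w, b, x):
    """L2 distance from x to the hyperplane {z : <w, z> + b = 0}."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return abs(score) / math.sqrt(sum(wi * wi for wi in w))


def margin_estimate(attack_succeeds, eps_hi, iters=30):
    """Bisect for the smallest eps at which `attack_succeeds(eps)` is True.

    Assumes the oracle is monotone: if an attack of size eps succeeds,
    any larger budget also succeeds. The returned value is the margin
    up to a tolerance of eps_hi * 2**(-iters).
    """
    lo, hi = 0.0, eps_hi
    for _ in range(iters):
        mid = (lo + hi) / 2.0
        if attack_succeeds(mid):
            hi = mid  # attack works: margin is at most mid
        else:
            lo = mid  # attack fails: margin exceeds mid
    return hi


# Example: boundary x0 + x1 = 1; the point (1, 1) lies at distance 1/sqrt(2).
w, b, x = [1.0, 1.0], -1.0, [1.0, 1.0]
true_margin = linear_margin(w, b, x)

# A toy oracle that succeeds exactly when the budget reaches the true margin;
# the bisection recovers that margin without using the closed form.
estimated = margin_estimate(lambda eps: eps >= true_margin, eps_hi=2.0)
```

In this reading, adversarial training at a fixed ε pushes margins past ε for every point, while margin maximization effectively runs with a per-point ε equal to each point's current margin.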


@article{ding2018mma,
 title={Max-Margin Adversarial (MMA) Training: Direct Input Space Margin Maximization through Adversarial Training},
 author={Ding, Gavin Weiguang and Sharma, Yash and Lui, Kry Yik Chau and Huang, Ruitong},
 journal={arXiv preprint arXiv:1812.02637},
 year={2018},
}