A primary concern in every real-world application of AI is the model’s robustness. In the broadest sense, robustness refers to a model’s reliability outside of the ‘laboratory conditions’ used during design and testing, but it is difficult to get much purchase on the problem at this level of abstraction. For this reason, discussion of robustness in AI often focuses on a relatively well-defined subproblem that can benefit from specialized techniques and tools: the problem of handling malicious inputs. This blog post reviews several aspects of the malicious-input problem that specialized tools can address, discusses the relevance of these aspects to applied AI best practices, and surveys the advantages and drawbacks of leading toolkits for robustness monitoring and mitigation.

Let us define ‘malicious inputs’ as inputs whose values fall outside the generally expected range for the model’s domain. The issue with such inputs is that even models that are genuinely well-optimized for their domain may fail catastrophically on them. Work on robustness typically splits this issue into two use-cases: handling inputs designed by malicious users, and handling inputs that indicate changes in the data-generating environment. In the first use-case, known as adversarial robustness, we are concerned with inputs designed to elicit false predictions or unsound decisions from the model. In the second use-case, known as robustness to data-drift, we are concerned with unexpected changes to the target distribution that make the trained model obsolete.

The study of adversarial robustness is a thriving branch of fundamental research in AI, though the relationship between the formal problem of adversarial robustness and the applied problem of dealing with malicious inputs is complex. Considered formally, the adversarial robustness of a model is defined by its insensitivity to small changes in an input’s value: a model is robust on input x if no possible small change to x can radically change the model’s output.
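
One common way to write this definition down, offered as a sketch rather than the only formalization, assumes a classifier f, an input x, a norm on the input space (often the L-infinity norm), and a perturbation budget ε. None of these symbols come from the text above. Under those assumptions, f is robust at x with radius ε when:

```latex
\[
  \forall\, \delta \ \text{such that}\ \|\delta\|_p \le \varepsilon : \qquad f(x + \delta) = f(x)
\]
```

The budget ε is what makes ‘small’ precise. It is also the ‘epsilon’ that users of several toolkits surveyed at the end of this post must choose for themselves, since what counts as a small change depends on the data and the application.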

The motivation for this standard formal definition of robustness comes from the discovery that ordinary neural network models are extremely sensitive to targeted small perturbations of natural inputs (‘adversarial examples’). Within a fundamental research context, studying a model’s adversarial robustness is one promising way to study the relationship of the model’s ‘reasoning’ to human frames of reference. A growing body of research even suggests that training models against adversarial examples improves other measures of interpretability and alignment with human judgment. 

All that said, the benefit of this improved human alignment for a model’s reliability or its capacity to handle unexpected inputs is generally difficult to quantify, and adversarial robustness training can significantly reduce a model’s test accuracy. What makes robustness to adversarial examples crucial is something more direct: attackers can exploit real-world AI systems via small perturbations of the input.

While it’s important to remember that (despite its name) adversarial robustness doesn’t guarantee robustness to malicious inputs in general, it can deliver certain guarantees against malicious inputs generated with small-perturbation methods. These methods form the basis of the most reliable, cost-effective, and adaptable procedures for crafting malicious inputs, so countering them goes some way towards mitigating cyber threats in general. Furthermore, attacks based on small perturbations are especially worrisome because of their potential long-term utility for an attacker: since sufficiently small perturbations are invisible to human eyes, attackers may be able to continuously manipulate a system (e.g. a video recommendation engine) by infusing seemingly normal inputs with hidden signals that induce a judgment of their choice. For all these reasons, training for adversarial robustness is increasingly necessary for production models despite its limited scope.
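
To make the idea of a small-perturbation attack concrete, the sketch below applies a one-step, sign-of-the-gradient attack (in the style of the fast gradient sign method) to a logistic-regression model in plain Python. The model weights, the example input, and the synthetic dataset are all hypothetical and chosen purely for illustration. Real attacks target far larger models and are usually run through a dedicated toolkit.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sign(v):
    return (v > 0) - (v < 0)


def fgsm(w, b, x, y, eps):
    """Shift every feature by eps in the direction that increases the loss."""
    p = predict_proba(w, b, x)
    # For the cross-entropy loss, d(loss)/dx_i = (p - y) * w_i.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]


def robust_accuracy(w, b, data, eps):
    """Fraction of (x, y) pairs still classified correctly after the attack."""
    hits = 0
    for x, y in data:
        x_adv = fgsm(w, b, x, y, eps)
        hits += int((predict_proba(w, b, x_adv) > 0.5) == bool(y))
    return hits / len(data)


# Hypothetical model: 100 features, weights of +1 or -1, zero bias.
w = [1.0 if i % 2 == 0 else -1.0 for i in range(100)]
b = 0.0

# A class-1 input sitting 0.05 on the "correct" side of every feature.
x = [0.05 * wi for wi in w]
x_adv = fgsm(w, b, x, y=1, eps=0.1)

clean_p = predict_proba(w, b, x)      # logit is roughly +5
adv_p = predict_proba(w, b, x_adv)    # logit is roughly -5

# Sweep epsilon over a synthetic dataset to see how accuracy decays.
rng = random.Random(0)
data = []
for _ in range(200):
    y = rng.randint(0, 1)
    direction = 1.0 if y == 1 else -1.0
    features = [direction * 0.05 * wi + rng.gauss(0.0, 0.1) for wi in w]
    data.append((features, y))

sweep = {eps: robust_accuracy(w, b, data, eps) for eps in (0.0, 0.02, 0.05, 0.1)}
```

In this toy setting, a shift of only 0.1 per feature moves the logit from about +5 to about -5, so the predicted probability of the true class falls from roughly 0.993 to roughly 0.007. The `sweep` dictionary reports accuracy under attack at several perturbation budgets, which is one simple way to explore which epsilon is meaningful for a given model.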

In contrast to adversarial inputs, the problem of data-drift concerns ‘natural’ model failure, in other words, failure caused by genuinely unpredictable changes in the domain that generates the data. The changes in travel, shopping, and similar behavior that models faced during Covid are a prime example: the drift occurred despite best practices in sampling, training, and testing before production.

Notice that outside of such cases, the intuitive idea of ‘drift robustness’ quickly becomes ill-defined: to the degree that an instance of data-drift can be mitigated prior to production, there is not much to distinguish optimizing ‘drift robustness’ from good machine-learning practices at large. There is no method for predicting what is unpredictable, and making the most of what’s predictable is simply the work of machine learning.

While there might not be any such thing as a drift-robust model, this does not mean that there’s no such thing as drift robustness. Rather than being a property of a model or a training practice, robustness to data-drift is best seen as an important property of active AI platforms. An organization’s AI platform is robust to data-drift to the extent that a well-coordinated team continues overseeing it after it goes to production, monitoring for signs of data-drift and reserving resources and infrastructure for retraining models and acquiring additional data as needed. It is here that specialized toolkits can play a crucial role, by offering data scientists a suite of automated monitoring methods for detecting and reporting signs of data-drift.
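
As a minimal sketch of what such automated monitoring can look like, the example below compares a reference sample of each feature (for instance, the training data) with a recent production sample using a two-sample Kolmogorov-Smirnov test, and flags the features whose distributions differ. The feature names, sample sizes, and the 0.05 significance level are assumptions made for illustration, and the rejection threshold uses the standard large-sample approximation. Dedicated toolkits offer more careful and more varied tests.

```python
import math
import random


def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref, cur = sorted(reference), sorted(current)
    i = j = 0
    d = 0.0
    while i < len(ref) and j < len(cur):
        v = min(ref[i], cur[j])
        # Advance past every observation equal to v in both samples.
        while i < len(ref) and ref[i] == v:
            i += 1
        while j < len(cur) and cur[j] == v:
            j += 1
        d = max(d, abs(i / len(ref) - j / len(cur)))
    return d


def ks_critical_value(n, m, alpha=0.05):
    """Large-sample rejection threshold for the two-sample KS test."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0)) * math.sqrt((n + m) / (n * m))


def drifted_features(reference, current, alpha=0.05):
    """Return the names of features whose distribution appears to have shifted."""
    flagged = []
    for name, ref_values in reference.items():
        cur_values = current[name]
        d = ks_statistic(ref_values, cur_values)
        if d > ks_critical_value(len(ref_values), len(cur_values), alpha):
            flagged.append(name)
    return flagged


# Hypothetical features: one shifts sharply in production, one does not.
rng = random.Random(0)
reference = {
    "trip_distance": [rng.gauss(10.0, 2.0) for _ in range(1000)],
    "basket_size": [rng.gauss(5.0, 1.0) for _ in range(1000)],
}
current = {
    "trip_distance": [rng.gauss(6.0, 2.0) for _ in range(1000)],  # mean has dropped
    "basket_size": [rng.gauss(5.0, 1.0) for _ in range(1000)],    # same distribution
}

flagged = drifted_features(reference, current)
```

The shifted feature is flagged because its KS statistic is far above the threshold. Note that at a 0.05 significance level a stable feature will still be flagged occasionally by chance, so a production monitor checking many features would typically correct for multiple comparisons or require a drift signal to persist before raising an alert.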

Though the two use-cases of robustness discussed in this blog post do not overlap much, each represents an effort to find a discrete technical niche within the complex real-world problem of dealing with unexpected inputs. Each use-case is therefore closely associated with a family of measurable quantities, and consequently with a suite of techniques that benefits from partial automation through a toolkit. We end by surveying some of the leading toolkits currently on offer, including Borealis’ own Advertorch adversarial robustness research framework:

Adversarial Robustness Toolbox (ART)

Advantages:

  • Supports many frameworks, data types & machine learning tasks
  • Offers a wide range of adversarial robustness defenses

Drawbacks:

  • Focused on adversarial robustness
  • Users need to work out which epsilon makes sense for their model

Robustness Gym

Advantages:

  • Promises a “simple and extensible toolkit for robustness testing that supports the entire spectrum of evaluation methodologies, from adversarial attacks to rule-based data augmentations.”

Drawbacks:

  • Still in relatively early development

Advertorch by Borealis

Advantages:

  • A toolkit focused on enabling researchers studying adversarial robustness

Drawbacks:

  • Adversarial robustness only
  • Suitable for research rather than production



CleverHans

Advantages:

  • An adversarial example library for constructing attacks, building defenses, and benchmarking both
  • Supports JAX, PyTorch, and TF2

Drawbacks:

  • Not all attacks and defenses are supported for each model type
  • Users need to pick an epsilon value to use
  • Currently no real offering of defenses



Foolbox

Advantages:

  • A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX

Drawbacks:

  • Only identifies potential issues, doesn’t provide defenses
  • Users need to pick an epsilon value to use

Alibi Detect

Advantages:

  • Contains algorithms for outlier, adversarial-input, and drift detection
  • Supports drift detection for many types of data: Tabular, Image, Time Series, Text, Categorical Features, Online, Feature Level
  • Covers both adversarial robustness and data-drift

Evidently AI

Advantages:

  • Good interactive tool to help identify data-drift between two sets of data
  • Can be used during validation or as part of production monitoring

Drawbacks:

  • No defense or mitigation options

Fiddler AI

Advantages:

  • End-to-end production pipeline monitoring, including concept, prediction, feature and label drift
  • Comes with visualizations and drop-down selections to pinpoint the impact of drift

Drawbacks:

  • Does not deal with adversarial robustness out of the box
  • Not open source

Optimizing for adversarial robustness is an open research area defined by complex tradeoffs between provability, efficiency, coverage, and side-effects. In practice, this means that building and testing robust machine-learning systems will remain a challenge for businesses, especially where a high degree of accuracy truly matters. The toolkits that we’ve reviewed provide a selection of generally reliable techniques for enforcing adversarial robustness in a small radius around a model’s training data. As ML and AI become increasingly integral to different facets of society, we must keep the existing risks and limitations in mind.