- Evaluating Lossy Compression Rates of Deep Generative Models by Sicong Huang, Alireza Makhzani, Yanshuai Cao and Roger Grosse
- On Variational Learning of Controllable Representations for Text without Supervision by Peng Xu, Jackie CK Cheung, Yanshuai Cao
- Tails of Lipschitz Triangular Flows by Priyank Jaini, Ivan Kobyzev, Yaoliang Yu and Marcus A. Brubaker

and many members of the research team took the time to virtually attend ICML 2020. Now that the conference content is freely available online, it's a great time to look back and check out some of the highlights. In this post, four Borealis AI researchers describe the papers that they found most interesting or significant from the conference.

*Hrayr Harutyunyan, Kyle Reing, Greg Ver Steeg, Aram Galstyan*

by Peng Xu

**Related Papers:**

- Emergence of Invariance and Disentanglement in Deep Representations
- Information-Theoretic Analysis of Generalization Capability of Learning Algorithms
- $\mathcal{L}_{\text{DMI}}$: A Novel Information-theoretic Loss Function for Training Deep Nets Robust to Label Noise

**What problem does it solve?** Neural networks have an undesirable tendency to memorize information about noisy labels. This paper shows that, for any algorithm, low values of the mutual information between weights and training labels given inputs, $I(w : \pmb{y}|\pmb{x})$, correspond to reduced memorization of label-noise and better generalization bounds. Novel training algorithms are proposed to optimize this quantity, achieving impressive empirical performance on noisy data.

**Why is this important?** Deep neural networks tend to memorize their training labels, even when those labels are noisy. This generally hurts generalization and is particularly undesirable with noisy labels. Poor generalization due to label memorization is a significant problem because many large, real-world datasets are imperfectly labeled. From an information-theoretic perspective, this paper reveals the root of the memorization problem and proposes an approach that directly addresses it.

**The approach taken and how it relates to previous work**: Given a labeled dataset $S=(\pmb{x}, \pmb{y})$ for data $\pmb{x}=\{x^{(i)}\}_{i=1}^n$ and categorical labels $\pmb{y}=\{y^{(i)}\}_{i=1}^n$ and learning weights $w$, Achille & Soatto present a decomposition of the expected cross-entropy $H(\pmb{y}|\pmb{x}, w)$:

\[ H(\pmb{y} | \pmb{x}, w) = \underbrace{H(\pmb{y} | \pmb{x})}_{\text{intrinsic error}} + \underbrace{\mathbb{E}_{\pmb{x}, w}D_{\text{KL}}[p(\pmb{y}|\pmb{x})||f(\pmb{y}|\pmb{x}, w)]}_{\text{how good is the classifier}} - \underbrace{I(w : \pmb{y}|\pmb{x})}_{\text{memorization}}. \]

If the labels contain information beyond what can be inferred from inputs, the model may do well by memorizing the labels through the third term of the above equation. To demonstrate that $I(w:\pmb{y}|\pmb{x})$ is directly linked to memorization, this paper proves that any algorithm with small $I(w:\pmb{y}|\pmb{x})$ overfits less to label-noise in the training set. This theoretical result is also verified empirically, as shown in Figure 1. In addition, the information that weights contain about a training dataset $S$ has previously been linked to generalization (Xu & Raginsky), which can be tightened with small values of $I(w:\pmb{y}|\pmb{x})$.

To limit $I(w:\pmb{y}|\pmb{x})$, this paper first shows that the information in weights can be replaced by information in the gradients, and then introduces a variational bound on the information in gradients. The bound employs an auxiliary network that predicts gradients of the original loss without label information. Two ways of incorporating predicted gradients are explored: (a) using them in a regularization term for gradients of the original loss, and (b) using them to train the classifier.
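The gradient-prediction idea can be sketched in a few lines. The snippet below is a toy illustration, not the authors' implementation: the "auxiliary network" is a stand-in linear map `V`, and all names and sizes are assumptions. It shows variant (b), where the classifier is updated with a label-free predicted gradient instead of the label-dependent cross-entropy gradient:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Toy setup: 4 samples, 5 features, 3 classes, linear classifier W.
X = rng.normal(size=(4, 5))
W = rng.normal(size=(5, 3))
y = np.array([0, 2, 1, 0])
probs = softmax(X @ W)

# Label-dependent gradient of cross-entropy w.r.t. the logits.
grad_true = (probs - np.eye(3)[y]) / len(y)

# Hypothetical auxiliary predictor: estimates that gradient from the
# inputs alone, with no access to labels (a stand-in linear map here;
# the paper trains an auxiliary network for this).
V = rng.normal(size=(5, 3)) * 0.1
grad_pred = X @ V

# Variant (b): update the classifier with the label-free predicted
# gradient, so no label information flows into the weights.
W_new = W - 0.1 * X.T @ grad_pred
```

Since the update never touches `y`, the weights cannot memorize label noise through this path; the quality of the model then rests on how well the auxiliary predictor is trained.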

**Results**: The authors set up experiments with noisy datasets to see how well the proposed methods perform for different types and amounts of label noise. The simplest baselines are the standard cross-entropy (CE) and mean absolute error (MAE) loss functions. The next baseline is the forward correction approach (FW) proposed by Patrini *et al.*, in which the label-noise transition matrix is estimated and used to correct the loss function. Finally, they include the recently proposed determinant mutual information (DMI) loss of Xu *et al.*, which is the log-determinant of the confusion matrix between predicted and given labels. The effectiveness of the proposed algorithms is illustrated on versions of MNIST, CIFAR-10 and CIFAR-100 corrupted with various noise models, and on Clothing1M, a large-scale dataset with noisy labels, as shown in Figure 2.

*Kei Ota, Tomoaki Oiki, Devesh K. Jha, Toshisada Mariyama and Daniel Nikovski*

by Pablo Hernandez-Leal

**Related Papers:**

- Learning state representation for deep actor-critic control
- State representation learning for control: An overview
- Densely Connected Convolutional Networks

**What problem does it solve?** This paper starts from the question of whether learning good representations for states and using larger networks can help in learning better policies in deep reinforcement learning.

The paper notes that many dynamical systems can be described succinctly by *sufficient statistics* which can be used to accurately predict their future. However, the question remains whether RL problems with an intrinsically low-dimensional state (*i.e.,* with simple sufficient statistics) can benefit from *intentionally increasing* its dimensionality using a neural network with a good feature representation.

**Why is this important?** One of the major successes of neural networks in supervised learning is their ability to automatically acquire representations from raw data. However, in reinforcement learning the task is more complicated since policy learning and representation learning happen at the same time. For this reason, deep RL usually requires a large amount of data, potentially millions of samples or more. This limits the applicability of RL algorithms to real-world problems, for example, continuous control and robotics where that amount of data may not be practical to collect.

One might assume that increasing the dimensionality of the input would further complicate the learning process of RL agents. This paper argues that this is not the case, and that agents can learn more efficiently with high-dimensional representations than with the lower-dimensional state observations. The authors hypothesize that larger networks (with a larger search space) allow agents to learn more complex functions of states, ultimately improving sample efficiency.

**The approach taken and how it relates to previous work**: The area of state representation learning focuses on representation learning where learned features are low-dimensional, evolve through time, and are influenced by the actions of an agent. In this context, the authors highlight previous work by Munk *et al.* where the output of a neural network is used as input for a deep RL algorithm. The main difference is that the goal of Munk *et al.* is to learn a *compact* representation, in contrast to the idea of this paper, which is learning good higher-dimensional representations of state observations.

The paper proposes an Online Feature Extractor Network (OFENet) that uses neural networks to produce good representations that are used as inputs to a deep RL algorithm, see Figure 3.

OFENet is trained with the goal of preserving a sufficient statistic via an auxiliary task to predict future observations of the system. Formally, OFENet trains a feature extractor network for the states, $z_{o_t}=\phi_o(o_t)$, a feature extractor for the state-action, $z_{o_t,a_t}=\phi_{o,a}(o_t,a_t)$, and a prediction network $f_{pred}$ parameterized by $\theta_{pred}$. The parameters $\{\theta_o, \theta_{o,a}, \theta_{pred}\}$ are optimized to minimize the loss:

$$L=\mathbb{E}_{(o_t,a_t)\sim p,\pi} [||f_{pred}(z_{o_t,a_t}) - o_{t+1}||^2]$$

which is interpreted as minimizing the error in predicting the next observation.

The authors highlight the need for a network that can be optimized easily and produce meaningful high-dimensional representations. Their proposal is a variation of DenseNet, a densely connected network in which each layer's input is the concatenation of all previous layers' outputs. OFENet uses this DenseNet-style architecture and is learned in an online fashion, at the same time as the agent's policy, receiving the observation and action as input as depicted in Figure 4.
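As a rough sketch of these pieces (not the authors' code; the layer sizes, names, and stand-in random transitions are assumptions), a DenseNet-style MLP concatenates each layer's output onto its input, so the representation grows with depth, and the auxiliary loss penalizes the error in predicting the next observation:

```python
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

def dense_block(x, weights):
    """DenseNet-style MLP: each layer's output is concatenated onto
    its input, so the feature dimension grows with depth."""
    h = x
    for W, b in weights:
        h = np.concatenate([h, relu(h @ W + b)], axis=-1)
    return h

obs_dim, act_dim, hidden = 11, 3, 16

def make_weights(in_dim, n_layers):
    ws, d = [], in_dim
    for _ in range(n_layers):
        ws.append((rng.normal(scale=0.1, size=(d, hidden)), np.zeros(hidden)))
        d += hidden
    return ws

w_o = make_weights(obs_dim, 2)                           # phi_o
w_oa = make_weights(obs_dim + 2 * hidden + act_dim, 2)   # phi_{o,a}

o_t = rng.normal(size=(5, obs_dim))
a_t = rng.normal(size=(5, act_dim))

z_o = dense_block(o_t, w_o)                                  # z_{o_t}
z_oa = dense_block(np.concatenate([z_o, a_t], -1), w_oa)     # z_{o_t,a_t}

# Prediction head f_pred maps the state-action feature to the next
# observation; the auxiliary loss is the squared prediction error.
W_pred = rng.normal(scale=0.1, size=(z_oa.shape[-1], obs_dim))
o_next = rng.normal(size=(5, obs_dim))   # stand-in for real transitions
loss = np.mean(np.sum((z_oa @ W_pred - o_next) ** 2, axis=-1))
```

The RL algorithm would then consume `z_o` and `z_oa` in place of the raw observation and action, while this auxiliary loss trains the extractors online.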

**Results**: The paper evaluates 60 different architectures with varying connectivity, sizes and activation functions. The results showed that an architecture similar to DenseNet consistently achieved higher scores than the rest.

OFENet was evaluated with both off-policy (SAC and TD3) and on-policy (PPO) reinforcement learning algorithms on continuous control tasks. With all three algorithms, adding OFENet produced better results than omitting it.

Ablation experiments were performed to verify that just increasing the dimensionality of the state representation is not sufficient to improve performance. The key point is that generating *effective* higher dimensional representations, for example with OFENet, is required to obtain better performance.

*Rob Cornish, Anthony L. Caterini, George Deligiannidis, and Arnaud Doucet*

by Ivan Kobyzev

**Related Papers:**

- SurVAE Flows: Surjections to Bridge the Gap between VAEs and Flows
- A RAD approach to deep mixture models
- Augmented Neural ODEs

**What problem does it solve?** The key ingredient of a normalizing flow is a diffeomorphic function (*i.e.,* an invertible function which is differentiable and whose inverse is also differentiable). To model a complex target distribution, a normalizing flow transforms a simple base measure via multiple diffeomorphisms stacked together. However, diffeomorphisms preserve topology; hence, the topologies of the supports of the base and target distributions must be the same. This is problematic for real-world data distributions, which can have complicated topology (*e.g.,* they can be disconnected or have holes). The paper proposes to replace a single diffeomorphic map with a continuous family of diffeomorphisms to solve this problem.

**Why is this important?** It is generally believed that many distributions exhibit complex topology. Generative methods which are unable to learn different topologies will, at the very least, be less sample efficient in learning and potentially fail to learn important characteristics of the target distribution.

**The approach taken and how it relates to previous work**: Given a latent space $\mathcal{Z}$ and a target space $\mathcal{X}$, the paper considers a continuous family of diffeomorphisms $\{ F(\cdot, u): \mathcal{Z} \to \mathcal{X} \}_{u \in \mathcal{U}}$. The generative process of this model is given by

$$z \sim P_Z, \quad u \sim P_{U|Z}(\cdot|Z), \quad x = F(z,u),$$

which is illustrated in Figure 5. There is no closed-form expression for the likelihood $p_X(x)$; hence, to train the model one needs to use variational inference. This introduces an approximate posterior $q_{U|X} \approx p_{U|X}$ and constructs a variational lower bound on $p_X(x)$ which can be used for training. To increase expressiveness, one can then stack several layers of this generative process.
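The generative process can be illustrated with a deliberately simple toy example (the affine family $F$ and the two-point conditional for $u$ are illustrative assumptions, not the paper's learned model): mixing diffeomorphisms over $u$ can turn a unimodal base distribution into a bimodal one, which no single diffeomorphism could do.

```python
import numpy as np

rng = np.random.default_rng(0)

n = 10_000
z = rng.normal(size=n)   # base sample: one connected mode

# Index variable u drawn conditionally on z (a simple two-point
# conditional here; the paper uses a learned continuous family).
u = np.where(rng.random(n) < 1.0 / (1.0 + np.exp(-4.0 * z)), 1.0, -1.0)

def F(z, u):
    # Each F(., u) is an affine diffeomorphism of z; the *mixture*
    # over u need not preserve the topology of the base support.
    return 0.3 * z + 3.0 * u

x = F(z, u)

# x is bimodal: two clusters near +/-3, almost no mass near zero.
frac_near_zero = np.mean(np.abs(x) < 1.0)
```

A single affine map of a Gaussian stays unimodal; only by letting the diffeomorphism vary with $u$ does the support split into two components.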

The authors prove that, under some conditions on the family $F_u$, the model can represent a target distribution well even if its topology is complex. The downside, compared to other normalizing flows, is that the model doesn't allow exact density computation. However, estimates can be obtained through importance sampling.

**Results**: The performance of the method is demonstrated quantitatively against Residual Flows, on which its architecture is based. On MNIST and CIFAR-10 in particular it performs better than Residual Flow (Figure 6), improving the bits per dimension on the test set by a small but notable margin. On other standard datasets the improvements are even larger and, in some cases, state-of-the-art.

*Florian Wenzel, Kevin Roth, Bastiaan S. Veeling, Jakub Świątkowski, Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin*

by Mohamed Osama Ahmed

**Related Papers:**

- Simple and scalable predictive uncertainty estimation using deep ensembles
- Inconsistency of Bayesian Inference for Misspecified Linear Models, and a Proposal for Repairing It
- Bayesian deep learning and a probabilistic perspective of generalization

**What problem does it solve?** The paper studies the performance of Bayesian neural network (BNN) models and why they have not been widely adopted in industry. BNNs promise better generalization and better uncertainty estimates of predictions, and should enable new deep learning applications such as continual learning. Despite these promising benefits, they remain largely unused in practice. Most recent work on BNNs has focused on better approximations of the posterior. This paper instead asks whether the actual posterior itself is the problem, *i.e.,* whether it is even worth approximating.

**Why is this important?** If the actual posterior learned by a BNN is poor, then efforts to construct better approximations of it are unlikely to produce better results and could actually hurt performance. Instead, this would suggest that more effort should be directed towards fixing the posterior itself before attempting to approximate it better.

**The approach taken and how it relates to previous work**: Many recent BNN papers use the "cold posterior" trick. Instead of using the posterior $p(\theta|D) \propto \exp( -U(\theta) )$, where $U(\theta)= -\sum_{i=1}^{n} \log p(y_i|x_i,\theta)-\log p(\theta)$, they use $p(\theta|D) \propto \exp(-U(\theta)/T)$, where $T$ is a temperature parameter. If $T=1$, we recover the original posterior distribution. However, recent papers report good performance with a "cold posterior" where $T<1$. This causes the posterior to become sharper around its modes, and the limiting case $T=0$ corresponds to a maximum a posteriori (MAP) point estimate.
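The effect of tempering can be seen on a toy one-dimensional "posterior" (the double-well energy below is an illustrative assumption, not a BNN posterior): lowering $T$ concentrates probability mass around the modes.

```python
import numpy as np

# Toy double-well energy U(theta) with modes at theta = +/-1; the
# tempered density is proportional to exp(-U(theta)/T).
theta = np.linspace(-4.0, 4.0, 2001)
dtheta = theta[1] - theta[0]
U = (theta ** 2 - 1.0) ** 2

def tempered_density(T):
    p = np.exp(-(U - U.min()) / T)
    return p / (p.sum() * dtheta)    # normalize on the grid

p_warm = tempered_density(1.0)   # ordinary posterior, T = 1
p_cold = tempered_density(0.1)   # cold posterior, T < 1

# Cooling sharpens the density around its modes: the mass within
# 0.2 of a mode grows as T shrinks (and at T -> 0 all mass would sit
# on the MAP points).
near_modes = np.abs(np.abs(theta) - 1.0) < 0.2
mass_warm = (p_warm * near_modes).sum() * dtheta
mass_cold = (p_cold * near_modes).sum() * dtheta
```

The puzzle the paper investigates is why this sharpened, "less Bayesian" distribution predicts better than the $T=1$ posterior.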

This paper studies why the cold posterior trick is needed, *i.e.,* why the original BNN posterior is not good enough on its own. The paper investigates three factors:

- **Inference**: Monte Carlo methods are needed for posterior inference. Could the errors and approximations induced by the Monte Carlo methods cause problems? In particular, the paper studies issues such as inaccurate SDE simulations and minibatch noise.
- **Likelihood**: Since the likelihood function used for training BNNs is the same as the one used for SGD-trained models, this should not be a problem. However, the paper raises the point that recent deep learning models use "dirty likelihoods": batch normalization, dropout, and data augmentation may be causing problems.
- **Prior**: Most BNN work uses a Normal prior over the weights. The paper questions whether this is a good prior, a concern they call the "bad prior hypothesis": the priors currently used for BNN parameters may be inadequate, unintentionally introducing an incorrect bias into the posterior, and may be too strong, overruling the data as model complexity increases. To study this, the authors draw samples of the BNN parameters $\theta$ from the prior distribution and examine the predictive distribution that results from these randomly generated parameters.

**Results**: The experiments find that, consistent with previous work, the best predictive performance is achieved with cold posteriors, *i.e.,* at temperatures $T<1$. This can be seen in Figure 7. While it is still not fully understood why, cold posteriors are needed to get good performance from BNNs.

Further, the results suggest that neither inference nor the likelihood is the problem. Rather, the prior seems likely to be, at best, unintentionally and misleadingly informative. Indeed, current priors generally map all images to a single class. This is clearly unrealistic and undesirable behaviour for a prior. The effect can be seen in Figure 8, which shows the class distribution over the training set for two different samples from the prior.
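A minimal version of this prior predictive check might look as follows (the architecture, width, and input distribution are assumptions for illustration): sample weights from an isotropic standard Normal prior and look at the predicted class distribution over random inputs. In a deep, unscaled network the logits for different inputs become highly correlated, so one class tends to dominate.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_prior_net(depth=20, width=128, n_classes=10, in_dim=32):
    """Draw one network from an isotropic N(0, 1) prior over weights."""
    dims = [in_dim] + [width] * (depth - 1) + [n_classes]
    return [rng.normal(size=(a, b)) for a, b in zip(dims[:-1], dims[1:])]

def predict(net, x):
    h = x
    for W in net[:-1]:
        h = np.maximum(h @ W, 0.0)   # ReLU hidden layers
    return (h @ net[-1]).argmax(axis=1)

# 200 random "images": a single prior sample assigns most of them to
# the same class, so the prior predictive class distribution is far
# from uniform -- the behaviour Figure 8 illustrates.
x = rng.normal(size=(200, 32))
preds = predict(sample_prior_net(), x)
class_fraction = np.bincount(preds, minlength=10) / len(preds)
```

Repeating this with fresh prior samples typically changes *which* class dominates, but rarely removes the concentration itself.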

**Discussion**

To date there has been a significant amount of work on better approximations of the posterior in BNNs. While this is an important research direction for a number of reasons, this paper suggests that there are other directions we should be pursuing. This is highlighted clearly by the fact that the performance of BNNs is worse than that of single point estimates trained by SGD, and cold posteriors are currently required to improve it. While this paper hasn't given a definitive answer to the question of why cold posteriors are needed or why BNNs are not more widely used, it has clearly indicated some important directions for future research.

Foteini Agrafioti, Head, Borealis AI, explains why she believes Aiden, the AI-powered electronic trading platform developed by RBC Capital Markets and Borealis AI, is a scientific milestone for reinforcement learning and AI.

*The views expressed in this article are those of the interviewee and do not necessarily reflect the position of RBC or Borealis AI.*

**Jodie Wallis (JW)**:

Very simply put, explainability is about being able to detail how the AI came to the decision that it did in a given scenario, and what the drivers were behind that decision. Being able to explain how decisions are being made has always been important. But as the algorithms become more sophisticated and as AI starts to reach deeper and deeper into our decision-making processes, the need for explainability has become much more acute.

**JW**:

No. And that’s an important distinction. Explainability really comes in when we are using AI to make decisions or recommendations that affect people’s lives in some material way. If an algorithm is being used to make a credit decision on a customer, for example, or to decide who to hire or promote – that is a decision that will require explainability. But if I’m using AI and a recommendation engine to decide which pair of shoes to offer you in an online store, I don’t believe that kind of algorithm necessarily needs explaining.

**JW**:

I think one of the issues with explainability in AI is that it feels overwhelming and limiting at the same time. Many execs and IT leaders worry about the complexity and overhead they will need to create if they must explain all of their new models to numerous stakeholders before launching.

The problem with explainability is that the ease or difficulty with which you produce an explanation varies greatly with the type of algorithm you are using. The deeper the algorithm, the more difficult explainability is; the shallower the algorithm, the easier explainability becomes. And I think this has led some organizations to shy away from using certain types of deep learning algorithms.

**JW**:

It all starts with understanding which decisions and algorithms need to be explained and which do not. Right from the outset of the research, you need to know how important explainability is to the issue you are addressing. Does the action taken have a material impact on the life of an individual or individuals? If it’s not important, then the researcher or developer is free to explore any and all algorithms that might best fit their problem. But if explainability is going to be important, you will likely be limited in the types of algorithms you can use to solve that problem.

When we work with clients, that is almost always our first step – creating a framework to help decision-makers understand which actions require explainability and which do not.

**JW**:

No. And, frankly, I think the market is currently very immature in terms of the technical tools to help manage these aspects of responsible AI.

There are a few different schools of thought as to how you do explainability of deep algorithms. Some researchers and scientists are using reverse engineering techniques where they study the outputs and patterns of a sophisticated deep learning algorithm in order to create a less sophisticated model that is able to simulate those outputs in a more explainable way. The problem is that they are trading off a certain amount of accuracy in order to achieve explainability. But in some circumstances, that may be a worthwhile trade-off to make.

Ultimately, every situation will be different and there are no tools that truly ‘solve’ the explainability challenge. That’s why it is so important that designers and developers understand the need for explainability at the very start of the project – at the point where they can build it into the design.

**JW**:

I think governments and privacy commissioners will need to play a key role in this area. Some are already making inroads. In Europe, for example, the General Data Protection Regulation (GDPR) talks about a person’s right to “meaningful information about the logic” when automated decisions are being made about them. Individual regulators are also looking at the challenge – Singapore’s monetary authority, for example, has published guidelines around explainability. But, currently, regulation is still pretty nascent.

**JW**:

This is about putting explainability at the very start of the process. Before you go and start solving for a particular business problem, you really need to understand the ultimate need for explainability. There’s no use developing a cool and sophisticated new tool if the business is unable to use it because they can’t explain it to stakeholders. So it is critical that developers and designers understand what will require explaining and select their tools accordingly.

**JW**:

I believe business leaders recognize that explainability is one element of their responsible AI strategy and framework. If they are not already thinking about this, I would suggest the business community spend a bit of time creating smart policies around the explainability of algorithms and extending existing frameworks – like their Code of Business Ethics – into AI development.

That will lead to two key value drivers for businesses. The first is that organizations will be freer to develop really interesting value through AI solutions. But, at the same time, they will be contributing to the societal discourse around the need for explainability. And, given the growing importance of the topic to consumers, regulators and oversight authorities, that can only be a good thing.

Jodie Wallis is the managing director for Artificial Intelligence (AI) in Canada at Accenture. In her role, Jodie works with clients across all of Canada’s industries to develop AI strategies, discover sources of value and implement AI solutions. She also leads Accenture’s collaboration with business partners, government and academia and oversees Accenture’s investments in the Canadian AI ecosystem.

However, the pace of this change has brought with it some tough challenges, with recent failures in AI systems leading to mistrust and fear of the technology. In some instances, even among some of the world’s leading technology companies, it has led to a costly removal of AI products from the market. Many businesses are realizing that they need to slow down and invest in more responsible AI product development.

Building AI responsibly comes with numerous tradeoffs. A recent Borealis AI/RBC* survey found that while 77% of those currently using AI believe it is important for businesses to implement it in an ethical way, 93% say they experience barriers such as cost and lack of understanding when attempting to do so.

In putting issues such as fairness, stability, bias and explainability at the top of their agenda, business leaders are investing in a trusted partnership with their clients at the expense of speed to market. Doing the right thing comes at a cost; and in unregulated environments, businesses could be free to take risks that compromise society.

This is why I believe it is so important that the public, businesses and governments are educated about the risks involved in AI technologies and that product owners are held to account for ethical and transparent deployment of these technologies.

One particular area of concern to me is bias. I’ve seen too many examples of companies perpetuating racial or gender discrimination through poorly executed technologies such as facial recognition, and violating human rights through biased algorithms. In fact, our survey found that 88% of companies believe bias exists in their organization, but almost half (44%) do not understand the challenges that bias presents in AI. The most important thing to understand is that this technology is not neutral, and that we are responsible for removing bias at every step.

Companies should review every level of AI development to ensure that any potential bias has been addressed. The different levels could include:

- **Data level**: The data that serves as input to AI models for training may be collected in a way that under-represents certain groups. This is often the problem with face recognition systems, which tend to serve best the groups represented in their training data, though the problem is pervasive and not confined to face recognition.
- **Model level**: Bias can be introduced at any time during the development of an AI model through architecture decisions made by engineers. These biases may be unintentional, yet the impact on specific groups is the same. For instance, a model can be tuned to be more receptive to English accents to the detriment of other accents.
- **Application level**: Even when a completely unbiased model can be engineered, there is still risk in how the AI is applied in the real world. The ethical considerations of the product owners, together with regulation or internal controls (or the lack thereof), can play a major role in tipping the scale.

While AI is finding applications across different sectors, each industry is unique and AI’s impact on people’s lives and freedoms can vary widely.

As part of the Royal Bank of Canada (RBC), Borealis AI’s mandate is to advance the field of machine learning by bringing products to life for the financial services industry. Banking is a fundamental aspect of our society and one that plays a major role in helping people achieve financial health and stability. The economic prosperity of our communities is partially the responsibility of this sector. As such, any technological misstep may mean that people don’t reach their full potential - in starting a business, sending children to university, or building a house, for instance. Banks have a contract with society that requires them to be a fair and vested partner in its success.

Borealis AI has the privilege and responsibility of building products that touch the lives of millions of clients. As part of RBC, we are driven by the mission to help our clients thrive and communities prosper, and when it comes to AI this means putting human integrity first.

Over the years, we have developed research practices that ensure that AI is developed responsibly and are supported by RBC’s data and model governance rules. Whether we work with our regulators to understand risks, or we scrutinize our own AI systems with thorough validation, building things the right way means that we routinely trade off speed for considerate and equitable innovation.

It is also our belief that knowledge and opportunity should be shared and, for this reason, we have made the decision to contribute our research, publications and scientific code in this area to the community, as well as share RBC’s approach and expertise in governing and securing AI models which has evolved over decades of practice. Under the RESPECT AI program we are also convening a number of industry and academic leaders who are contributing their experience and offer practical advice on how to approach building AI responsibly.

At a time where technology evolves fast and puts pressure on the ability to govern and secure, it is imperative that we slow our pace down and come together in order to develop robust solutions to the new challenges we are presented with. We hope that RESPECT AI is a step in this direction and that this series opens up some honest dialogue, exchange and sharing of our collective experiences in building AI responsibly.

*Data were collected as part of the Maru BizPulse program, operated by Maru/Reports and Maru/Matchbox, which collects and tracks key metrics describing how Canadian businesses are feeling, thinking and behaving. The survey audience was made up of owners and senior decision-makers at Canadian businesses, with a particular focus on small and mid-sized businesses. The survey was fielded in September 2020. All sample was sourced through the Maru/Blue proprietary business panel and partners. A total of 622 responses were collected for this portion of the survey. For more information please visit www.marureports.com.*

The views expressed in this article are those of the interviewee and do not necessarily reflect the position of RBC or Borealis AI.

Are current privacy laws and regulations enough to ensure data privacy in AI?

**Ann Cavoukian (AC)**:

The challenge with many data privacy laws is that they do not reflect the dynamic and evolving nature of today's technology. In this era of AI, social media, phishing expeditions and data leaks, I would argue that what we really need are proactive measures around privacy.

I think we are just starting to reach the tip of the iceberg on data privacy and protection. The majority of the iceberg is still unknown and, in many cases, unregulated. And that means that, rather than waiting for the safety net of regulation to kick in, we need to be thinking more about algorithmic transparency and designing privacy into the process.

**AC**:

I mean baking privacy protective measures right into the code and algorithms. It’s really about designing programs and models with privacy as the default setting.

During my time as Data Privacy Commissioner for Ontario, I created ‘Privacy by Design’, a framework for helping organizations prevent privacy breaches by embedding privacy into the design process. More recently, I created an extensive module called ‘AI Ethics by Design’ which was specifically intended to deal with the need for algorithmic transparency and accountability. There are seven key principles that underpin the framework, supported by strong documentation to facilitate ethical design and data symmetry. These principles, based on the original privacy by design framework, include respect for privacy as a fundamental human right.

**AC**:

Absolutely. And I’m happy to see that facial recognition tools are routinely banned in various US states and across Canada. Your face is your most sensitive personal information. And, more often than not with these applications, nobody is obtaining individual consent before capturing facial images; there may not even be visible notification that facial recognition tools are being used.

From a privacy perspective, that’s terrible. The point of privacy laws is to provide people with control over their personal data. Applications like facial recognition take away all of that control. All that aside, the technology has also proven to be highly inaccurate and frequently biased; time and again, their use has been struck down in the courts of justice and public opinion.

**AC**:

I think it is absolutely critical to consumers; virtually every study and survey confirms that. Consider what happened early on in the pandemic. A number of governments tried to launch so-called ‘contact tracing’ apps that offered fairly weak privacy controls. Uptake was dismal. Even though the apps could be potentially life-saving for users, few were willing to share their personal information with the government or put it into a centralized repository.

What worked well, on the other hand, was the Apple/Google exposure notification API. In part, it was well adopted because it works on the majority of smart phones in use in North America today. But, more importantly, it is fully privacy protected. I have personally had a number of 1-on-1 briefings from Apple and was highly confident that the API collected no personally identifiable information or geolocation data. Around the world, Canada included, apps based on that API have seen tremendous uptake within the population.

Now, remember, this is for an app that helps people avoid the biggest health crisis to face modern civilization. If they are not willing to trade their privacy for that, you would be crazy to assume consumers would trade it away simply for convenience or service.

**AC**:

Not at all. We need to get away from this view where privacy must be traded for something. It’s not an either/or, zero-sum proposition, involving trade-offs. Far better to enjoy multiple positive gains by embedding both privacy AND AI measures — not one to the exclusion of the other.

I also think the environment is rapidly changing. Consider, for example, the efforts being made by the Decentralized Identity Foundation, a global technology consortium that is working to find new ways to ensure privacy while allowing data to be commercialized. Efforts like these suggest we are moving towards a world where privacy can be embedded into AI by default.

**AC**:

The AI community needs to remember that – above all – transparency is essential. People need to be able to see that their privacy has been baked into the code and program by design. I would argue that public trust in AI is currently very low. The only way to build that trust is by embedding privacy by design.

I think the same advice goes for business executives and privacy oversight leaders: don’t just accept algorithms without looking under the hood first. There are all kinds of potential issues – privacy and ethics related – that can arise when applying AI. As an executive, you need to be sure your organization and people are always striving to protect personal data.

Dr. Ann Cavoukian is recognized as one of the world’s leading privacy experts. Appointed as the Information and Privacy Commissioner of Ontario, Canada in 1997, Dr. Cavoukian served an unprecedented three terms as Commissioner. There she created Privacy by Design, a framework that seeks to proactively embed privacy into the design specifications of information technologies, networked infrastructure and business practices, thereby achieving the strongest protection possible. Dr. Cavoukian is presently the Executive Director of the Global Privacy and Security by Design Centre.

*The views expressed in this article are those of the interviewee and do not necessarily reflect the position of RBC or Borealis AI.*

Why should the AI community be focused on model validation?

**Sander Klous (SK)**:

It’s very tempting to run off and create all sorts of futuristic solutions using AI and Machine Learning. But without proper model validation and governance processes, creativity can very quickly turn into risk.

For example, I have seen examples of healthcare organizations launch algorithms that mysteriously strip people of their healthcare allowances; suddenly patients had to jump through hoops to demonstrate they were eligible, basically reversing the burden of proof. There are also examples of fraud detection algorithms at banks that generated too many false positives; fraud departments quickly became overwhelmed without the means to address the problem and it created frustration for customers.

In both cases, these unexpected outcomes should have been uncovered in the model validation process. Especially with these kind of new technological developments, where trustworthiness is still a fragile concept, positive experiences are the key to success.

**SK**:

Actually, quite the opposite. I’m not worried about AI becoming too smart. I’m worried about AI being too stupid right now. We tend to think AI can do a lot, but often it’s not as smart as we give it credit for. It does not magically solve any issues for you. It requires thorough process, robust validation, governance, controls and risk frameworks – amongst other things – to ensure we remove the ‘stupidity’ from these models. That’s not a future risk, it is something that needs to be addressed right now.

**SK**:

There are actually three lines of defense that normally come into play when we talk about model validation and risk management. The first line are the developers and designers themselves. They are the ones that need to be following the controls and considering the implications at the design level. The second line of defense is the risk function; it’s the risk function that needs to develop and drive adherence to those controls. There is also often a third line of defense that is served by an independent validator. These validators may be internal to the company or external advisors, depending on the circumstances. All three lines of defense need to work together to ensure proper model validation.

**SK**:

There are two big challenges. The first is a general lack of global standards around AI model validation. We see lots of different standards bodies working to come up with practical frameworks. But nothing is really mature yet. So it is very difficult for organizations to assess what ‘good’ looks like and then have that validated in the same way they would a financial statement, for example.

The other challenge comes down to process. Typically, the three lines of defense would work in a waterfall approach – design, followed by risk validation, followed by periodic independent auditing. But AI isn’t developed using a waterfall approach. And that means that it is becoming increasingly difficult to draw the lines of separation of duties around the three lines of defense.

**SK**:

That is certainly an ongoing problem and one that will take some time to resolve. As we saw with other similar regulation – like GDPR in Europe – it takes a lot of case law and a lot of collaboration to come up with a set of global standards. That can take years.

In the meantime, most organizations are creating their own set of validation standards and controls, largely based on industry good practices and evolving current standards. The problem is that the environment is continuously evolving and – until we have a set of global standards that can be audited – ‘good’ will continue to be a moving target.

Some organizations are creating their own ecosystems by collaborating with third parties and industry peers where there are common areas that they can all benefit from. For example, manufacturing companies who want to take a combined approach to validating specific parts of the process that other manufacturers would also use. This means that there could be standard validation for key aspects but not one overall industry approach that everyone adheres to.

**SK**:

To be successful at rapidly adopting AI solutions, the second and third lines of defense need to reinvent themselves. Risk managers and oversight professionals are starting to rethink their approach to model validation in an agile environment. Unfortunately for them, this may result in a reduction in efficiency as validation processes are run and re-run as the models evolve. But this is not a bad thing; risk managers tell me they understand the trade-off between their own efficiency and that of the business. Some would be willing to see their efficiency cut in half just to deliver a 10 percent efficiency boost to their data scientists.

I think we are also starting to see an interesting evolution in the accounting and auditing professions around this issue. KPMG firms have been working with a range of clients to help develop their own internal standards and controls. The experiences we gained in these activities are the foundation of our "AI in Control framework" – it helps organizations build and evaluate sound AI models, driving better adoption, confidence, and compliance. I believe that eventually – once there is a set of global standards – the auditing profession will play an essential role in providing the same type of independent validation they already deliver on financial statements.

**SK**:

I think we all really need to keep challenging each other. You can’t just accept models on face value; you need to stay sharp and have rock-solid processes and frameworks for model validation. This is all new territory and we don’t really know what the ultimate standards and frameworks will look like. And that means it requires more thought and more caution than other areas where the roadmap has already been created.

I would argue that the greatest challenge is doing all of that while still encouraging the type of creativity, innovation and problem solving that drew you to consider an AI solution in the first place. Balancing that need for creativity against the controls of model validation can be extremely difficult.

Sander is a Professor of Big Data ecosystems for business and society at the University of Amsterdam and D&A Leader for KPMG in the Netherlands. He has a PhD in high energy physics and worked for over a decade on a number of projects for CERN, the world’s largest physics institute in Geneva. His best-selling book, 'We are Big Data', was runner-up for the management book of the year award in 2015. His new book, Trust in a Smart Society, is a top selling management book in the Netherlands.

However, along with its myriad benefits, AI brings a host of new challenges which require enhanced governance processes and validation tools to ensure it is deployed safely and effectively within the enterprise.

With our combined expertise in AI safety, regulation, and model governance, Borealis AI and RBC have been navigating the complexities of this space to develop a robust, comprehensive AI validation process.

Model validation has played an integral role in banks’ traditional data analytics for many years. It helps to ensure that models perform as expected, identifies potential limitations and assumptions, and assesses possible negative impacts. Guidance from the US Federal Reserve dictates that “all model components—inputs, processing, outputs, and reports—should be subject to validation.”[1] Banks in Canada have to adhere to similar regulations[2] and have already developed extensive validation processes to meet these requirements and ensure that they manage model risk appropriately. However, the advent of AI poses a number of challenges for traditional validation techniques.

First, it is costly to validate the large volume and variety of data used by AI models. AI models can make use of significantly more variables—referred to as “features” in AI parlance—than conventional quantitative models, and ensuring the integrity and suitability of these large datasets requires more computational power and more attention from validators. This challenge is particularly acute for AI models that use unstructured natural-language data like news feeds and legal or regulatory filings, which require new validation tools as well as more resources. Moreover, AI modelers often use “feature engineering” to transform raw data prior to training, which further increases the dimensionality of the data that must be validated.

Second, the complexity of AI methodologies makes it more difficult for validators to predict how AI models will perform after they are deployed. Compared to conventional models with relatively few features, it is harder to determine how AI models will behave—and why they behave this way—across the full range of inputs these models could face once deployed. AI models’ complexity can also make it more difficult to explain the reasons behind these models’ behavior, which in turn can make it harder to identify biased or unfair predictions. Ensuring that models do not lead some groups of customers to be treated unfairly is an important part of the validation process.

Finally, the dynamic nature of many AI models also creates unique validation challenges. Conventional models are typically calibrated once using a fixed training dataset before being deployed. AI models, on the other hand, often continue to learn after deployment as more data become available, and model performance may degrade over time if these new data are distributed differently or are of lower quality than the data used during development. These models must be validated in a way that takes their adaptiveness into account and frequently monitored to ensure that they remain robust and reliable.

To meet these challenges, banks must develop new validation methods that are better equipped to deal with the scale, complexity, and dynamism of AI. Borealis AI and RBC’s model governance team have joined forces to research and develop a new toolkit that automates key parts of the validation process, provides a more comprehensive view of model performance, and explores new approaches in areas like adversarial robustness and fairness. This pathbreaking technology is designed from the ground up to overcome the unique challenges of AI. AI safety is central to everything we do at Borealis AI, much like strong governance and risk management practices are central to RBC. This research will help to support faster AI deployment and more agile model development, and it will provide validators with more comprehensive and systematic assessments of model performance.


However, the properties of distributions constructed with normalizing flows remain less well understood theoretically. One important property is that of *tail behavior*. We can think about a distribution as having two regions: the *typical set* and the *tails* which are illustrated in Figure 1. The typical set is what is most often considered; it's the area where the distribution has a significant amount of density. That is, if you draw samples or have a set of training examples they're generally from the typical set of the distribution. How accurately a model captures the typical set is important when we want to use distributions to, for instance, generate data which looks similar to the training data. Many papers show figures like Figure 2 which showcase how well a model matches the target distribution in regions where there's lots of density.

The tails of the distribution are basically everything else and, when working on an unbounded domain (like $\mathbb{R}^n$), correspond to asking how the probability density behaves as you go to infinity. We know that the probability density of a continuous distribution on an unbounded domain goes to zero in the limit, but the rate at which it goes to zero can vary significantly between different distributions. Intuitively, tail behaviour indicates how likely extreme events are, and this behaviour can be very important in practice. For instance, in financial modelling applications like risk estimation, return prediction and actuarial modelling, tail behaviour plays a key role.

This blog post discusses the tail behaviour of normalizing flows and presents a theoretical analysis showing that some popular normalizing flow architectures are actually unable to estimate tail behaviour. Experiments show that this is indeed a problem in practice and a remedy is proposed for the case of estimating heavy-tailed distributions. This post will omit the proofs and other formalities and instead will aim at providing a high level overview of the results. For readers interested in the details we refer them to the full paper which was recently presented at ICML 2020.

Let $\mathbf{X} \in \mathbb{R}^D$ be a random variable with a known and tractable probability density function $f_\mathbf{X} : \mathbb{R}^D \to \mathbb{R}$. Let $\mathbf{T}$ be an invertible function and $\mathbf{X} = \mathbf{T}(\mathbf{Y})$. Then using the change of variables formula, one can compute the probability density function of the random variable $\mathbf{Y}$:

\begin{align}

f_\mathbf{Y}(\mathbf{y}) & = f_\mathbf{X}(\mathbf{T}(\mathbf{y})) \left| \det \textrm{D}\mathbf{T}(\mathbf{y}) \right| , \tag{1}

\end{align}

where $\textrm{D}\mathbf{T}(\mathbf{y}) = \frac{\partial \mathbf{T}} {\partial \mathbf{y}}$ is the Jacobian of $\mathbf{T}$. Normalizing Flows are constructed by defining invertible, differentiable functions $\mathbf{T}$ which can be thought of as transforming the complex distribution of data into the simple base distribution, or "normalizing" it. The paper attempts to characterize the tail behaviour of $f_\mathbf{Y}$ in terms of $f_\mathbf{X}$ and properties of the transformation $\mathbf{T}$.
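As a concrete 1D sketch of Eq. (1), suppose the base variable $\mathbf{X}$ is standard normal and we pick the (hypothetical, purely illustrative) transformation $T(y) = \log(y)$, so that $\mathbf{Y}$ is log-normal. The change of variables then recovers the known closed-form density:

```python
import numpy as np
from scipy.stats import norm, lognorm

def flow_density(y, T, T_prime, base_logpdf):
    """1D change of variables (Eq. 1): f_Y(y) = f_X(T(y)) * |T'(y)|."""
    return np.exp(base_logpdf(T(y))) * np.abs(T_prime(y))

# T(y) = log(y) maps a log-normal Y to a standard normal X = T(Y).
y = np.linspace(0.1, 5.0, 50)
fy = flow_density(y, np.log, lambda v: 1.0 / v, norm.logpdf)

# Agrees with scipy's closed-form log-normal density.
assert np.allclose(fy, lognorm.pdf(y, s=1.0))
```

The same bookkeeping, with a log-determinant of a Jacobian in place of $|T'(y)|$, is exactly what multivariate flow implementations compute.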

Before we can do that though we need to formally define what we mean by tail behaviour. The basis for characterizing tail behaviour in 1D was provided in a paper by Emanuel Parzen. Parzen argued that tail behaviour could be characterized in terms of the *density-quantile function*. If $f$ is a probability density and $F : \mathbb{R} \to [0,1]$ is its cumulative distribution function then the quantile function is the inverse, *i.e.*, $Q = F^{-1}$ where $Q : [0,1] \to \mathbb{R}$. The density-quantile function $fQ : [0,1] \to \mathbb{R}$ is then the composition of the density and the quantile function $fQ(u) = f(Q(u))$ and is well defined for square integrable densities. Parzen suggested that the limiting behaviour of the density-quantile function captured the differences in the tail behaviour of distributions. In particular, for many distributions

\begin{equation}

\lim_{u\rightarrow1^-} \frac{fQ(u)}{(1-u)^{\alpha}} \tag{2}

\end{equation}

converges for some $\alpha > 0$. In other words, the density-quantile function asymptotically behaves like $(1-u)^{\alpha}$ and we denote this as $fQ(u) \sim (1-u)^{\alpha}$. (Note that here we consider the right tail, i.e., $u \to 1^-$, but we could just as easily consider the left tail, i.e., $u \to 0^+$.) We call the parameter $\alpha$ the *tail exponent* and Parzen noted that it characterizes how heavy a distribution is with larger values having heavier tails. Values of $\alpha$ between $0$ and $1$ are called light tailed and include things like bounded distributions. A value of $\alpha=1$ corresponds to some well known distributions like the Gaussian or Exponential distributions. Distributions with $\alpha > 1$ are called heavy tailed, *e.g.*, a Cauchy or student-T. More fine-grained characterizations of tail behaviour are possible in some cases but we won't go into those here.
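The tail exponent can be estimated numerically: since $fQ(u) \sim (1-u)^{\alpha}$, the slope of $\log fQ(u)$ against $\log(1-u)$ near $u = 1$ approximates $\alpha$. A small sketch using scipy distributions (the fitting window is an arbitrary choice):

```python
import numpy as np
from scipy.stats import norm, cauchy

def tail_exponent(dist, u_lo=0.999, u_hi=0.999999, n=200):
    """Estimate alpha in fQ(u) ~ (1-u)^alpha by a log-log slope near u = 1."""
    u = np.linspace(u_lo, u_hi, n)
    fQ = dist.pdf(dist.ppf(u))                    # density-quantile function
    slope, _ = np.polyfit(np.log1p(-u), np.log(fQ), 1)
    return slope

print(tail_exponent(norm))     # close to 1: Gaussian has alpha = 1
print(tail_exponent(cauchy))   # close to 2: Cauchy is heavy tailed, alpha = 2
```

The Gaussian estimate sits slightly below 1 because of a slowly varying logarithmic factor in its density-quantile function; the power-law part still dominates.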

Now, given the above and two 1D random variables, $\mathbf{Y}$ and $\mathbf{X}$ with tail exponents $\alpha_\mathbf{Y}$ and $\alpha_\mathbf{X}$, we can make a statement about the transformation $\mathbf{T}$ that maps between them. First, the transformation is given by $T(\mathbf{x}) = Q_\mathbf{Y}( F_\mathbf{X}( \mathbf{x} ) )$ where $F_\mathbf{X}$ denotes the CDF of $\mathbf{X}$ and $Q_\mathbf{Y}$ denotes the quantile function (i.e., the inverse CDF) of $\mathbf{Y}$. Second, we can then show that the derivative of this transformation is given by

\begin{equation}

T'(\mathbf{x}) = \frac{fQ_\mathbf{X}(u)}{fQ_\mathbf{Y}(u)} \tag{3}

\end{equation}

where $u=F_\mathbf{X}(\mathbf{x})$ and $fQ_\mathbf{X}$ and $fQ_\mathbf{Y}$ are the density-quantile functions of $\mathbf{X}$ and $\mathbf{Y}$ respectively.
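Eq. (3) is easy to verify numerically. Taking $\mathbf{X}$ Gaussian and $\mathbf{Y}$ Cauchy as an illustrative pair, the finite-difference derivative of $T = Q_\mathbf{Y} \circ F_\mathbf{X}$ matches the ratio of density-quantile functions:

```python
import numpy as np
from scipy.stats import norm, cauchy

# T maps a standard normal X to a Cauchy Y via T = Q_Y o F_X.
T = lambda x: cauchy.ppf(norm.cdf(x))

x = np.linspace(-2.0, 2.0, 9)
u = norm.cdf(x)
ratio = norm.pdf(norm.ppf(u)) / cauchy.pdf(cauchy.ppf(u))  # fQ_X(u) / fQ_Y(u)

eps = 1e-6
T_fd = (T(x + eps) - T(x - eps)) / (2 * eps)               # central difference T'(x)

assert np.allclose(T_fd, ratio, rtol=1e-4)
```

Note how the ratio blows up as $u \to 1$: the Cauchy tail is heavier, so the slope of $T$ must be unbounded, which is precisely the key result below.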

Now, given our characterization of tail behaviour we get that

\begin{equation}

T'(\mathbf{x}) \sim \frac{(1-u)^{\alpha_{\mathbf{X}}}}{(1-u)^{\alpha_{\mathbf{Y}}}} = (1-u)^{\alpha_{\mathbf{X}}-\alpha_{\mathbf{Y}}} \tag{4}

\end{equation}

and now we come to a key result. If $\alpha_{\mathbf{X}} < \alpha_{\mathbf{Y}}$ then, as $u \to 1$ we get that $T'(\mathbf{x}) \to \infty$. That is, if the tails of the target distribution of $\mathbf{Y}$ are heavier than those of the source distribution $\mathbf{X}$ then the slope of the transformation must be unbounded. Conversely, if the slope of $T(\mathbf{x})$ is bounded (i.e., $T(\mathbf{x})$ is Lipschitz) then the tail exponent of $\mathbf{Y}$ will be the same as $\mathbf{X}$, i.e., $\alpha_\mathbf{Y} = \alpha_\mathbf{X}$.

The above is an elegant characterization of tail behaviour and its relationship to the transformations between distributions, but it only applies to distributions in 1D. To generalize it to higher dimensional distributions, we consider the tail behaviour of the norm of a random variable, i.e., $\Vert \cdot \Vert$. Then the degree of heaviness of $\mathbf{X}$ can be characterized by the degree of heaviness of the distribution of the norm. Using this characterization we can then prove an analog of the above.

**Theorem 3** *Let $\mathbf{X}$ be a random variable with density function $f_\mathbf{X}$ that is light-tailed and $\mathbf{Y}$ be a target random variable with density function $f_\mathbf{Y}$ that is heavy-tailed. Let $T$ be such that $\mathbf{Y} = T(\mathbf{X})$, then $T$ cannot be a Lipschitz function.*

So what does this all mean for normalizing flows which are attempting to transform a Gaussian distribution into some complex data distribution? The results show that a Lipschitz transformation of a distribution cannot make it heavier tailed. Unfortunately, many commonly implemented normalizing flows are actually Lipschitz. The transformations used in RealNVP and Glow are known as affine coupling layers and they have the form

\begin{equation}

T(\mathbf{x}) = \left(\mathbf{x}^{(A)},\; \sigma(\mathbf{x}^{(A)}) \odot \mathbf{x}^{(B)} + \mu(\mathbf{x}^{(A)})\right) \tag{5}

\end{equation}

where $\mathbf{x} = (\mathbf{x}^{(A)},\mathbf{x}^{(B)})$ is a disjoint partitioning of the dimensions, $\odot$ is element-wise multiplication and $\sigma(\cdot)$ and $\mu(\cdot)$ are arbitrary functions. For transformations of this form, we can then prove the following:
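A minimal numpy sketch of such an affine coupling layer, with hypothetical single-matrix "networks" standing in for $\sigma(\cdot)$ and $\mu(\cdot)$ (real implementations use deep nets), and using the bounded $\exp(c\tanh(\cdot))$ scale parameterization from the RealNVP implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the coupling networks; W_s and W_m are arbitrary weights.
W_s = rng.normal(size=(2, 2))
W_m = rng.normal(size=(2, 2))

def sigma(xa):
    # Bounded scale: exp(c * tanh(.)) < exp(c), here with c = 2.
    return np.exp(2.0 * np.tanh(xa @ W_s))

def mu(xa):
    # Unconstrained (but Lipschitz) shift.
    return xa @ W_m

def coupling_forward(x):
    xa, xb = x[:, :2], x[:, 2:]          # disjoint partition (A, B) of dims
    return np.concatenate([xa, sigma(xa) * xb + mu(xa)], axis=1)

def coupling_inverse(y):
    ya, yb = y[:, :2], y[:, 2:]
    return np.concatenate([ya, (yb - mu(ya)) / sigma(ya)], axis=1)

x = rng.normal(size=(4, 4))
assert np.allclose(coupling_inverse(coupling_forward(x)), x)
```

Because $\mathbf{x}^{(A)}$ passes through unchanged and $\sigma > 0$, the layer is trivially invertible, and its Jacobian determinant is just the product of the scales; the boundedness of $\sigma$ is exactly what Theorem 4 below exploits.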

**Theorem 4** *Let $p$ be a light-tailed density and $T(\cdot)$ be a triangular transformation such that $T_j(x_j; ~x_{<j}) = \sigma_{j}\cdot x_j + \mu_j$. If $\sigma_j(x_{<j})$ is bounded above and $\mu_j(x_{<j})$ is Lipschitz continuous then the distribution resulting from transforming $p$ by $T$ is also light-tailed.*

The RealNVP paper uses $\sigma(\cdot) = \exp(NN(\cdot))$ and $\mu(\cdot) = NN(\cdot)$ where $NN(\cdot)$ is a neural network with ReLU activation functions. The translation function $\mu(\cdot)$ is hence Lipschitz since a neural network with ReLU activations is Lipschitz. The scale function $\sigma(\cdot)$, at first glance, is not bounded because the exponential function is unbounded. However, in practice this was implemented as $\sigma(\cdot) = \exp(c\tanh(NN(\cdot)))$ for a scalar $c$. This means that, as originally implemented, $\sigma(\cdot)$ *is* bounded above, i.e., $\sigma(\cdot) < \exp(c)$. Similarly, Glow uses $\sigma(\cdot) = \mathsf{sigmoid}(NN(\cdot))$, which is also clearly bounded above.

Hence, RealNVP and Glow are unable to represent heavier tailed distributions. Not all architectures have this property though, and we point out a few that can actually change tail behaviour, for instance SOS Flows.

To address this limitation with common architectures, we proposed using a parametric base distribution which is capable of representing heavier tails which we called *Tail Adaptive Flows* (TAF). In particular, we proposed the use of the student-T distribution as a base distribution with learnable degree-of-freedom parameters. With TAF the tail behaviour can be learned in the base distribution while the transformation captures the behaviour of the typical set of the distribution.
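The simplest version of the tail-adaptive idea can be sketched by learning the degrees of freedom of a student-T base by maximum likelihood. Here the transformation is assumed to be the identity for illustration; an actual TAF learns the flow parameters and the degrees of freedom jointly:

```python
import numpy as np
from scipy.stats import t as student_t
from scipy.optimize import minimize_scalar

# Synthetic heavy-tailed "data": samples from a student-T with df = 2.
data = student_t.rvs(df=2.0, size=5000, random_state=0)

# Negative log likelihood as a function of log(df); the log parameterization
# keeps the degrees of freedom positive during optimization.
nll = lambda log_df: -student_t.logpdf(data, df=np.exp(log_df)).sum()
res = minimize_scalar(nll, bounds=(-3.0, 5.0), method="bounded")

print(np.exp(res.x))   # learned degrees of freedom, close to the true value 2
```

In a full TAF the same learnable degree-of-freedom parameter sits in the base distribution while the coupling layers handle the typical set.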

We also explored these limitations experimentally. First, we created a synthetic dataset using a target distribution with heavy tails. After fitting with a normalizing flow, we can measure its tail behaviour. This can be done by estimating the density-quantile function and finding the value of $\alpha$ such that $(1-u)^{\alpha}$ approximates it near $u=1$. Our experimental results confirmed the theory. In particular, fitting a normalizing flow with RealNVP- or Glow-style affine coupling layers was fundamentally unable to change the tail exponent, even as more depth was added. Figure 4 shows an attempt to fit a model based on RealNVP-style affine coupling layers to a heavy-tailed distribution (student-T). No matter how many blocks of affine coupling layers are used, it is unable to capture the structure of the distribution and the measured tail exponents remain the same as those of the base distribution.

However, when using a tail adaptive flow the tail behaviour can be readily learned. Figure 5 shows the results of fitting a tail adaptive flow on the same target as above but with 5 blocks. This isn't entirely surprising as tail adaptive flows use a student T base distribution. However, SOS Flows is also able to learn the tail behaviour as predicted by the theory. This is shown in Figure 6.

We also evaluated TAF on a number of other datasets. For instance, Figure 7 shows tail adaptive flows successfully fitting the tails of Neal's Funnel, an important distribution which has heavier tails and exhibits some challenging geometry.

In terms of log likelihood on a test set, our experiments show that using TAF is effectively equivalent to not using TAF. However, this shouldn't be too surprising.

We know that normalizing flows are able to capture the distribution around the typical set and this is where most samples, even in the test set, are likely to be. Put another way, capturing tail behaviour is about understanding how frequently rare events happen and by definition it's unlikely that a test set will have many of these events.

This paper explored the behaviour of the tails of commonly used normalizing flows and showed that two of the most popular normalizing flow models are unable to learn tail behaviour that is heavier than that of the base distribution. It also showed that by changing the base distribution we are able to restore the ability of these models to capture tail behaviour. Alternatively, other normalizing flow models like SOS Flows are also able to learn tail behaviour.

So does any of this matter in practice? If the problem you're working on is sensitive to tail behaviour then absolutely and our work suggests that using an adaptive base distribution with a range of tail behaviour is a simple and effective way to ensure that your flow can capture tail behaviour. If your problem isn't sensitive to tail behaviour then perhaps less so. However, it is interesting to note that the seemingly minor detail of adding a $\tanh(\cdot)$ or replacing $\exp(\cdot)$ with sigmoid could significantly change the expressiveness of the overall model. These details have typically been motivated by empirically observed training instabilities. However our work connects these details to fundamental properties of the estimated distributions, perhaps suggesting alternative explanations for why they were empirically necessary.

Interested in how ownership and copyright protection for media content is impacted by the rise of social media, Xiaohong’s research is focused on image watermarking and image forgery detection. More specifically his research involves new deep neural network architectures for blind image watermarking based on information-theoretic principles.

He is a third year Ph.D. student at McMaster University, supervised by Dr. Jun Chen. Xiaohong is interested in a career in machine learning because teaching machines complex tasks formerly only accomplished by humans excites him.

The Borealis AI fellowship has provided him with the opportunity to continue his research and broaden its impact. The fellowship has also connected him with some of the most talented minds in ML and AI, who advise him on how to take his research and career further.

A fun fact about Xiaohong is that he has a musical side, knowing how to play the accordion.

Check out Xiaohong Liu’s Google Scholar.

Sedigheh is passionate about finding machine learning solutions that could positively impact important domains like healthcare. Her research involves predicting continuous-time Markov chains, with a focus on stochastic processes and simulations with applications to nucleic acid kinetics.

Sedigheh Zolaktaf received her BSc in Computer Engineering from Sharif University of Technology, Iran, in 2013, and MSc in Computer Science from the University of British Columbia, Canada, in 2015. She is currently a Ph.D. candidate in the Artificial Intelligence and Algorithms laboratories at the University of British Columbia. She chose a career in machine learning as it aligned with her interests in mathematics, coding and problem-solving.

The Borealis AI 2020 fellowship has provided support to Sedigheh by recognizing the importance of her work. This award also motivates her to continue her research in the area of stochastic processes and nucleic acid kinetics.

Outside of research, Sedigheh likes to stay active playing basketball and netball.

She is enthusiastic about the future of AI technologies and how they will intertwine with human decision making. Ibtihel Amara is focused on performing efficient analysis of Neural Network uncertainty. More specifically, her research looks into finding efficient uncertainty computation for edge devices. She also believes that ensuring trust and reliability are integrated into AI systems is paramount.

Ibtihel Amara is currently completing her Ph.D. at McGill University at the Center for Intelligent Machines (CIM). The Borealis AI fellowship has given her the opportunity to fully focus on her research goals and provided her with valuable encouragement and support that motivates her to dream big.

Ibtihel's hobbies are harmonious with her passion for technology. She enjoys spending her time gardening and finding ways to enhance urban agriculture with the help of AI.

AI is transforming industries. Whether it’s healthcare or global warming, cyber security or customer service, I’m constantly amazed and excited about the potential for machine learning to help businesses and society address some of today’s biggest challenges.

However, for modern AI to be performed properly and to succeed at scale, researchers and engineers need access to large datasets – the kind that are held by only a few companies worldwide. At the same time, the need to protect sensitive and private information is paramount.

To me, this is where the real opportunity lies. How can we ensure that AI is accessible to all in a safe and ethical manner?

At Borealis AI, we are championing the importance of Responsible AI by researching and developing practical solutions to enable a safer and more ethical adoption of AI technology. Responsible AI encompasses a wide range of considerations, including privacy, accountability, transparency and bias, and is critical to maintaining trust.

I recently recorded a panel discussion for Collision from Home where I touched on this opportunity, and the responsibility we have to ensure responsible AI for all. You can check out the Untapped Potential of AI recording in the video above.

Elahe received a BSc degree in Electrical Engineering from the Isfahan University of Technology in 2012, and a MASc in Electronic-Digital Systems from Amirkabir University of Technology (Tehran Polytechnic) in 2016. She is currently a second-year Ph.D. student at Concordia Institute for Information System Engineering (CIISE) in Montreal where her studies focus on machine learning and deep learning models in rehabilitation and assistive technologies under the supervision of Prof. Arash Mohammadi.

Outside of her research Elahe enjoys testing out new baking recipes in the kitchen and being out in nature.

Read more about Elahe's work on Google Scholar.

Chenyang completed his bachelor degree in mathematics at the Northwest Polytechnic in Xi’an, Shaanxi, China before moving to Canada in 2013. He studied at the University of Windsor, Ontario where he obtained a Bachelor in Computer Science before moving to the University of Alberta. Chenyang is currently studying for his PhD in Computer Science while fulfilling his passion for teaching as a teaching assistant at the U of A.

In his spare time, Chenyang enjoys watching documentaries and testing his strategy skills with online gaming. He also enjoys listening to classical music.

Read more about Chenyang's work on Google Scholar.

Canada is a pioneer of AI and Machine Learning (ML). However, there continues to be a lack of women in this field. Borealis AI’s collaboration with Athena Pathways will tackle the gender imbalance in AI, and in technology in general, by providing mentorship and internship opportunities to women starting their careers in these fields.

Athena Pathways has a near-term goal of enrolling 500 women in high-school and university courses, as well as providing internships, mentorships, and other workplace opportunities, to significantly increase the number of women working in the technology sector.

As part of its support, Borealis AI will work with female students across universities in British Columbia to add industry skills and experience to their studies through internships and provide them with job-seeking advice to prepare them for their careers as soon as they graduate. Borealis AI’s support will help improve gender diversity in the field of technology and will help address Canada’s needs in AI talent.

The Athena Pathways project also aims to mitigate risks in AI technology arising from the gender imbalance and misrepresentation among AI model creators.

Speaking of the project, Dr. Eirene Seiradaki, Director of Research Partnerships at Borealis AI, said: “We are proud to support Athena Pathways. We share their commitment to attracting more women to the field of AI. Our collaboration with Athena will enable us to provide ongoing support and training to women at the very start of their careers and encourage a more competitive talent landscape in Canada."

This project is part of Borealis AI’s ongoing program focused on women in AI and technology. Borealis AI recently announced its support for AI4Good Lab, a 7-week summer training program that annually brings together a cohort of 30 women from across Canada.

Anna completed her bachelor’s degree in Biophysics at Goethe University in Frankfurt, Germany in 2014. A highlight of Anna’s academic career is completing her Masters at the Perimeter Scholars International program in 2017. She is now a Ph.D. student at the Perimeter Institute for Theoretical Physics and the University of Waterloo under advisor Roger Melko.

Outside of the world of research, Anna is passionate about art and sports. In her spare time, she lets her creative and adventurous side shine by painting and finding new spots to surf.

Read more about Anna's work on Google Scholar.

High Performance Computing (HPC) infrastructure, with a distributed and fully automated environment, is extremely important when building modern AI models, especially when that research is applied in production environments like RBC’s, where datasets can be massive (~10 billion new client interactions every month).

Our objective was to build an AI infrastructure that could handle both research and production workloads, ensuring that Borealis AI’s research projects can transition to production efficiently. We believe in quick iterations, so this infrastructure is designed to be flexible and easy to use. It comprises two GPU clusters to accommodate the distinct needs of Borealis AI’s research and production work.

Across the research community, a growing number of HPC clusters use Slurm, an open-source resource scheduler and cluster-management system. AI researchers are familiar with this technology, and it was adopted at Borealis AI to facilitate use and reduce the learning curve for new users. Researchers joining us from academia can now quickly onboard onto our platform and start their research.
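For readers unfamiliar with Slurm, submitting work to such a cluster typically means writing a small batch script. The sketch below is a minimal, hypothetical example; the job name, resource sizes, and `train.py` script are illustrative, not Borealis AI’s actual configuration:

```shell
#!/bin/bash
#SBATCH --job-name=train-model    # illustrative job name
#SBATCH --gres=gpu:4              # request 4 GPUs on one node
#SBATCH --cpus-per-task=16        # CPU cores for data loading
#SBATCH --mem=128G                # host memory for the job
#SBATCH --time=24:00:00           # wall-clock limit

# Launch the (hypothetical) training script inside the allocation.
srun python train.py --config config.yaml
```

A researcher would submit this with `sbatch train.sh` and let the scheduler queue it until the requested GPUs become free.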

Building a powerful cluster is more than just stacking together GPUs. An AI cluster requires every component, including networking and storage, to operate in harmony and at high performance. With the AI community moving towards training larger models, an integrated system became more important than piling up servers. We built our cluster using AIRI based on NVIDIA's reference architecture, which provided us with a high-performance integrated solution and the flexibility to increase capacity efficiently.

Taking a machine learning model into production is not a trivial task. These applications must handle complexities, such as data reliability and stochasticity, that are absent from traditional software development. To manage this complexity, we designed a compute infrastructure based on industry standards and best practices. The emergence of Docker and Kubernetes has changed the way AI infrastructure is built; drawing on RBC’s vast expertise in managing the Red Hat OpenShift platform, Borealis AI built its production cluster using OpenShift, allowing developers to deploy containerized ML applications and services into production with GPU support.
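As a rough illustration of what a GPU-backed containerized service looks like on Kubernetes/OpenShift, here is a minimal, hypothetical pod spec. The image name and resource counts are invented, and real deployments would typically use a Deployment (or OpenShift DeploymentConfig) rather than a bare Pod:

```yaml
# Hypothetical example: a single-container pod requesting one GPU.
apiVersion: v1
kind: Pod
metadata:
  name: model-serving
spec:
  containers:
    - name: model
      image: registry.example.com/ml/model-serving:latest  # illustrative image
      resources:
        limits:
          nvidia.com/gpu: 1  # scheduled onto a GPU node via the NVIDIA device plugin
```

The `nvidia.com/gpu` resource limit is what lets the scheduler place the container on a node with a free GPU.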

Borealis AI is leveraging the power of this new infrastructure across a broad spectrum of projects, ranging from personal & commercial banking, to wealth management and capital markets. Prediction tasks in the finance industry are particularly challenging, because they are driven by massive datasets and require exhaustive analysis of multiple dependent axes, including data filtering, neural architecture search, hyperparameter optimization, dynamic targets, and path-dependent metrics. A thorough exploration of the resulting joint parameter space typically requires optimization of tens of thousands of configurations, or the equivalent of thousands of CPU years.
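As a back-of-the-envelope illustration of why such sweeps reach tens of thousands of configurations, the Python sketch below multiplies out a hypothetical joint search space; the axis names and grid sizes are invented for illustration, not the actual search space:

```python
# Illustrative only: the axes and sizes below are invented to show how a
# joint search space over model and data choices grows multiplicatively.
from math import prod

search_space = {
    "architecture": 8,      # candidate network architectures
    "learning_rate": 10,    # learning-rate grid points
    "data_filter": 6,       # data-filtering strategies
    "target_horizon": 5,    # prediction-target definitions
    "metric_window": 12,    # path-dependent evaluation windows
}

# Total configurations = product of the axis sizes.
n_configs = prod(search_space.values())
print(n_configs)  # 28800
```

Even five modest axes yield 28,800 configurations; finer grids or additional axes quickly push the total past the tens of thousands described above.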

Our new HPC infrastructure, with its distributed and fully automated environment, enables parallel execution of the above tasks in a matter of days. In production, it enables parallel online computation of complex feature representations and, as a consequence, ultra-fast responses in an environment that is primarily dominated by one factor: speed!

That’s something the organizers of the recent AI4Good Lab Industry Night were able to recreate, albeit in a virtual world, thanks to personalized avatars, virtual meeting rooms and real-time chats.

The purpose of the event was to give the all-women students of the AI4Good Lab a stronger sense of the research groups and companies that work in the AI space, and of the array of initiatives they can get involved in. It also gave the partners an opportunity to share more detailed information about themselves.

Borealis AI’s all-female team, along with other partners including CIFAR, IVADO, Amii, DeepMind and Accenture, participated in the AI4Good industry event, chatting with delegates about internships and fellowships, and offering advice on how to navigate the job market in the AI space. The team shared their thoughts on a wide range of topical issues, including ethical AI. They also provided information about AI research and products at Borealis AI, as well as various internship and job opportunities with the team.

The AI4Good team prepared avatars for everyone, using photos of the participants, and the delegates were able to virtually walk around and stand with each other while they chatted. Borealis AI’s room, designed by visual designer April Cooper, brought some nature and light to the space with the addition of a virtual tree!

Thanks to Maya Marcus-Sells, Executive Director of AI4Good Lab, and her colleague, Yosra Kazemi, for pulling the Industry Night together and giving us a much-needed chance to chat and further build the women in AI community.

If you would like a peek inside this year’s virtual Industry Night, a tour of the 3D booths, a look at Maya’s, Eirene’s, and April’s avatars enjoying the virtual shadow of the Borealis AI tree, or just want to virtually “feel” and “smell” the breeze through the branches of the Borealis AI tree, we’ve got you covered!

Click on the gallery below to see pics from the event.
