The views expressed in this article are those of the interviewee and do not necessarily reflect the position of RBC or Borealis AI.
Carole Piovesan (CP):
They certainly know they need to be thinking about it. But I think it’s one of those evolving challenges that executives really have trouble getting their heads around. They intuitively know that AI must be implemented ethically, but they don’t know exactly what that means, and they are not sure how to effect responsible innovation. It’s not an easy issue to deal with and, unfortunately, only a handful of leaders are having that conversation right now. More should be.
CP:
I think that – all too often – decision-makers are looking at one side of the coin without understanding the other. The problem is that, if you are only looking at the risks without also assessing the benefits you expect to achieve, you probably aren’t going to do anything. As with any informed decision-making, you need to see the full picture. If you’re not quantifying and reporting the benefits alongside the risks, you will probably end up stifling innovation. Of course, there are also those who only focus on the benefits, and that, too, can be a significant problem.
CP:
It’s always useful to think about risk in terms of categories and sources. With AI, the general categories of risk can span a range of areas including brand, reputation, regulatory compliance and other legal risks. The sources of that risk are more nuanced. Those risks could be in the data – whether the data is complete or robust enough; whether you have authority to use it; questions around ownership and so on – or they could be in the system itself including its governance and controls. One of the big challenges, therefore, is in identifying all the potential and foreseeable consequences of those risks and their sources.
CP:
Absolutely. The recently proposed Bill C-11 (known as the Digital Charter Implementation Act) will strongly influence the way Canadian companies manage and protect data – both customer data and organizational data. While the Act has been positioned as an update to the Personal Information Protection and Electronic Documents Act (PIPEDA), it goes much further, also including rules around ‘automated decision systems’ that will be very relevant to Canadian businesses using AI.
Globally, we are also seeing a great debate emerge around this topic. The EU has convened a high-level expert group on AI to look at this. In the US, the White House released a draft memo on a risk-based assessment of AI. The Global Partnership on AI has also been driving the discussion. And while the world has yet to achieve consensus on clear global guidelines, it is encouraging to see a healthy discussion around how to conduct valuable risk assessments of AI.
CP:
The regulatory agenda is certainly moving ahead. But you can’t just sit around and wait for regulation to be promulgated and interpreted on this topic. As recent media reports clearly illustrate, the public already has views on what is ethical behaviour and what is not.
That means that decision-makers need to be thinking about their governance right now. They need to work out how they plan to ensure oversight across the lifecycle of the system, with periodic documentation to demonstrate that they are being diligent.
That’s not a guarantee that nothing will go wrong. But if something does go wrong, it will put you in a much more defensible position legally and reputationally than if you had just ignored the risks and blamed a lack of regulation.
CP:
The most common barrier I see is a lack of coordination across functions. For instance, I think the legal teams and the technology and innovation teams need to start working much more collaboratively and at a much earlier stage in the development process – ideally at the ideation phase. That will allow both parties to work with the business to convert ideas into systems that deliver on business objectives while remaining aligned to the overall values of the organization.
The next challenge, however, is ensuring that everyone is talking the same language and understanding the same risks in the same way. Terms like ‘explainability’ mean different things to lawyers, business leaders, developers and customers. Having everyone on the same page from the beginning is critical to ensuring risk assessments are robust and holistic.
CP:
My clients are keen to experiment with emerging technologies and they are not willing to wait for regulation to arrive. So they are being very diligent about how they prepare and manage their risk assessments.
At the same time, they also recognize that this is about more than just privacy. And we are working with them to create a much broader approach to risk governance and assessment that is supported by an integrated team and integrated governance. At each step, we help them think through that risk versus benefit analysis that recognizes the unique context of each system.
My advice to clients is to go sit down with their innovation teams and decide what the organization is going to look like in five years. Then we work back from there to understand the risks and priorities going forward. If you are only thinking about where you are today, you’ll never be building for the future.
Carole is a partner and co-founder of INQ Data Law where she concentrates on AI, privacy, cyber readiness and data governance. As well as advising some of Canada’s leading companies on technology-related issues, Carole is the co-chair of the Exposure Notification application on behalf of the federal government; the co-chair of the data governance working group for the Data Governance Standardization Collaborative at the Standards Council of Canada; a member of the Data Governance Working Group for the Global Partnership on AI; and an advisor to the Law Commission of Ontario’s working group on AI in administrative decisions.
To motivate the factor graph approach to SAT solving, we will first consider how the difficulty of satisfiability scales with the problem size. If we are given a random formula with $V$ variables, and $C$ clauses, each of clause length $K$, how likely is it that there is a configuration of variables that will return $\text{SAT}$? Considering each of these parameters in turn:
The clause:variable ratio $C/V$ is a critical factor in determining the probability that an expression will be satisfiable. Consider increasing the number of clauses and the number of variables, but keeping this ratio constant. For very large numbers of clauses and variables, there is a distinct threshold below which almost all problems are satisfiable and above which almost all problems are not satisfiable. This is known as a phase change.
For the 2-SAT problem, the phase change can be proven to occur at a clause:variable ratio of $C/V = 1.0$. For 3-SAT it has been empirically observed at $C/V \approx 4.27$ (figure 3); this has not been proven mathematically, although lower and upper bounds of 3.52 and 4.49 respectively have been established.
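The phase change can be observed empirically even at toy scale. The following brute-force sketch (sizes chosen so that exhaustive enumeration is feasible; all function names are our own) estimates the fraction of random 3-SAT formulas that are satisfiable at a given clause:variable ratio. Well below the threshold nearly every formula is satisfiable; well above it, almost none are.

```python
import itertools
import random

def random_ksat(V, C, K, rng):
    # Each clause picks K distinct variables; each literal is negated
    # with probability 1/2 (negated=True means the literal is x-bar).
    return [[(v, rng.random() < 0.5) for v in rng.sample(range(V), K)]
            for _ in range(C)]

def satisfiable(V, clauses):
    # Brute force over all 2^V assignments -- only viable at toy sizes.
    for bits in itertools.product([False, True], repeat=V):
        if all(any(bits[v] != neg for v, neg in clause) for clause in clauses):
            return True
    return False

def sat_fraction(V, ratio, trials=20, seed=0):
    # Estimate the probability that a random 3-SAT formula at the given
    # clause:variable ratio is satisfiable.
    rng = random.Random(seed)
    C = int(ratio * V)
    return sum(satisfiable(V, random_ksat(V, C, 3, rng))
               for _ in range(trials)) / trials
```

For example, `sat_fraction(10, 1.0)` comes out near 1 while `sat_fraction(10, 8.0)` comes out near 0; resolving the sharp step between them at $C/V \approx 4.27$ would require larger $V$ than brute force allows.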
These results should be interpreted with some caution. This analysis concerns randomly generated expressions. For real problems, the clauses are not chosen randomly but relate to the structure of the underlying problem. Nonetheless, they provide some insight into the difficulty of SAT problems as a function of their constituent parts.
The clause:variable ratio is also important in determining the type of algorithms that we should apply to SAT problems. As we approach the phase change threshold, satisfying solutions are thought to be sparse and separated by large energy barriers (figure 4). In this regime, the search-based methods that we described in part II of this tutorial do not necessarily work well. However, by reframing the problem in terms of factor graphs, we can develop algorithms that function well in this part of the space.
In the following section we'll discuss the relation between SAT solving and factor graphs and show that satisfiability can be established using belief propagation. We'll also introduce the survey propagation algorithm, which is the method of choice for very difficult SAT problems that are expressed in terms of factor graphs.
Up until this point, we have treated SAT solving as a search for variable assignments that satisfy a formula. A slightly different viewpoint is to consider taking the logical $\text{OR}$ of $2^{V}$ copies of the SAT formula, where each copy substitutes one of the $2^{V}$ possible assignments of the variables. If this compound statement evaluates to $\text{true}$ then at least one of the possible assignments evaluates to $\text{true}$ and so the formula is satisfiable.
To express this idea mathematically, let's denote the $c^{th}$ clause as a function $\mbox{f}[\mathbf{x}_{\mathcal{S}_{c}}]$ of a subset $\mathcal{S}_{c}$ of the variables $\{x_{i}\}_{i=1}^{V}$ that returns $\text{true}$ or $\text{false}$. The SAT formula now looks like
\begin{equation}
\phi := \mbox{f}[\mathbf{x}_{\mathcal{S}_{1}}] \land \mbox{f}[\mathbf{x}_{\mathcal{S}_{2}}] \land \ldots \land \mbox{f}[\mathbf{x}_{\mathcal{S}_{C}}] \tag{1}
\end{equation}
and we can express the logical $\text{OR}$ing of all $2^{V}$ combinations of variables as:
\begin{equation}
\phi':= \bigvee_{x_{1}}\left[\bigvee_{x_{2}}\left[\ldots \bigvee_{x_{V}}\left[ \mbox{f}[\mathbf{x}_{\mathcal{S}_{1}}] \land \mbox{f}[\mathbf{x}_{\mathcal{S}_{2}}] \land \ldots \land \mbox{f}[\mathbf{x}_{\mathcal{S}_{C}}]\right]\right]\right] \tag{2}
\end{equation}
where the notation $\bigvee_{x_{1}}[\phi] = \phi\vert_{x_{1}=\text{true}}\lor \phi\vert_{x_{1}=\text{false}}$ logically $\text{OR}$s together two copies of the expression in which we have set $x_{1}$ to $\text{true}$ and $\text{false}$ respectively.
Of course, this is not very practical, because the resulting expression will have $2^{V}$ terms. However, this form elucidates a connection with graphical models. Instead of having the functions return $\text{true}$ or $\text{false}$, let's modify them so that they return the real numbers $1$ when $\text{true}$ and $0$ when $\text{false}$. Now we can write:
\begin{eqnarray}
\phi'&:=& \max_{x_{1}}\left[\max_{x_{2}}\left[\ldots \max_{x_{V}}\left[ \mbox{f}[\mathbf{x}_{\mathcal{S}_{1}}] \cdot \mbox{f}[\mathbf{x}_{\mathcal{S}_{2}}] \cdot \ldots \cdot \mbox{f}[\mathbf{x}_{\mathcal{S}_{C}}]\right]\right]\right] \nonumber \\
&=&\max_{x_{1},x_{2}\ldots x_{V}}\left[\prod_{c=1}^{C} \mbox{f}[\mathbf{x}_{\mathcal{S}_{c}}]\right] \tag{3}
\end{eqnarray}
where we have replaced the logical $\text{AND}$s by multiplication operations and the $\bigvee_{x}$ operations by a maximization over the two possible binary values of $x$. In a valid solution each function (clause) will return 1 and hence so will their product. In an invalid solution, one of the functions will return 0 and hence the product will be zero. It follows that if the maximum possible output of this product is 1, then at least one solution must be true and the formula is satisfiable.
We can now view the SAT problem in terms of finding the maximum likelihood solution in an undirected graphical model with cliques $\mathcal{S}_{c}$:
\begin{equation}\label{eq:SAT_as_Prob}
Pr(\mathbf{x}) = \frac{1}{Z}\prod_{c=1}^{C}\mbox{f}[\mathbf{x}_{\mathcal{S}_{c}}] \tag{4}
\end{equation}
where the normalizing constant $Z$ is known as the partition function. Since the unnormalized function yields $1$ for each valid solution and $0$ for each invalid solution, the partition function counts the number of valid solutions.
If maximizing the unnormalized distribution with respect to the binary variables $\{x_{i}\}_{i=1}^{V}$ yields $1$ then all the terms are simultaneously satisfiable and we have found a solution. As a concrete example, consider the formula:
\begin{equation}
\phi:= (x_{1}\lor \overline{x}_{2}) \land (\overline{x}_{1} \lor x_{3}) \land ({x}_{2} \lor x_{3} \lor \overline{x}_{4}) \land (\overline{x}_{3} \lor x_{5}) \land(\overline{x}_{3}\lor x_{4}\lor x_{5}). \tag{5}
\end{equation}
Solving the satisfiability problem for this formula is equivalent to maximizing the probability in the factor model in figure 5 with respect to the discrete variables $x_{i}$.
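For a formula this small, the max-product view can be checked directly by brute force. The sketch below encodes each clause of equation 5 as a 0/1 indicator function (reading the fourth clause as the disjunction $\overline{x}_{3}\lor x_{5}$, consistent with conjunctive normal form) and maximizes their product over all $2^{5}$ assignments:

```python
import itertools

# Each clause of equation 5 as a 0/1 indicator function f[x_Sc].
# (The fourth clause is read as the disjunction: not-x3 OR x5.)
clauses = [
    lambda x: x[1] or not x[2],
    lambda x: not x[1] or x[3],
    lambda x: x[2] or x[3] or not x[4],
    lambda x: not x[3] or x[5],
    lambda x: not x[3] or x[4] or x[5],
]

def product_of_clauses(x):
    # Product of the clause indicators: 1 if every clause is satisfied, else 0.
    return int(all(f(x) for f in clauses))

assignments = [dict(zip(range(1, 6), bits))
               for bits in itertools.product([False, True], repeat=5)]

max_value = max(product_of_clauses(x) for x in assignments)  # 1 iff satisfiable
Z = sum(product_of_clauses(x) for x in assignments)          # counts the solutions
```

Here the maximum of 1 confirms that the formula is satisfiable, and the unnormalized sum $Z$ counts the satisfying assignments, exactly as described for the partition function of equation 4.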
Inference in factor graphs can be tackled using belief propagation. If the factor graph takes the form of a chain or a tree (i.e. has no loops) then we can find the maximum likelihood solution exactly in polynomial time using the max-product algorithm. However, this is rarely the case and so we must rely on loopy belief propagation, which is not guaranteed to find the global maximum, but often finds a good solution in practice. Both these algorithms involve repeatedly passing messages between neighboring variables on the graph.
Note that this method can also be used for MaxSAT problems. Here, the assumption is that there is no satisfying solution and we aim to find the maximum number of clauses that can be simultaneously satisfied. It can also be trivially adapted to the weighted MaxSAT problem (in which there is a different penalty for each clause being left unsatisfied).
Since we have expressed the SAT problem as a probability distribution (equation 4), we could also consider computing the marginals of this distribution. The marginal distribution of a variable $x_{i}$ indicates the proportion of solutions in which $x_{i}$ evaluates to $\text{true}$ and $\text{false}$. If this distribution indicates that one or the other solution is almost certain, then we could just fix this value and hence reduce the complexity of the problem. This process is known as decimation. One approach to establishing satisfiability is hence to alternate between decimating and then finding the marginals again in the new simpler problem.
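At toy scale, these marginals can be computed exactly by enumeration rather than by the sum-product algorithm. The sketch below (again reading the fourth clause of equation 5 as a disjunction) computes the marginal probability that each variable is true across all satisfying assignments, which is the quantity decimation inspects:

```python
import itertools

# Clauses of equation 5 as indicator functions (fourth clause read as a disjunction).
clauses = [
    lambda x: x[1] or not x[2],
    lambda x: not x[1] or x[3],
    lambda x: x[2] or x[3] or not x[4],
    lambda x: not x[3] or x[5],
    lambda x: not x[3] or x[4] or x[5],
]

# Enumerate all satisfying assignments.
solutions = [dict(zip(range(1, 6), bits))
             for bits in itertools.product([False, True], repeat=5)
             if all(f(dict(zip(range(1, 6), bits))) for f in clauses)]

# Marginal probability that each variable is true among the solutions.
marginals = {i: sum(s[i] for s in solutions) / len(solutions)
             for i in range(1, 6)}

# The most polarized variable is the natural candidate to decimate (fix) first.
best = max(marginals, key=lambda i: abs(marginals[i] - 0.5))
```

In this example $x_{5}$ is the most polarized variable, so decimation would fix it first and then recompute the marginals of the smaller problem.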
The marginal distributions for all of the variables can be computed at the same time using the sum-product algorithm. Once more, if the graph structure is a chain or tree then this can be done both exactly and efficiently. However, for practical problems this is rarely the case and we must resort again to a loopy belief propagation algorithm. This provides an approximate estimate of the marginal distributions, which may in some circumstances be good enough to choose variables to decimate.
The sum-product algorithm has another fringe benefit in that it also computes the partition function (or an estimate thereof in the loopy case). Hence, we can use this approach to solve the model counting problem (#SAT problem) in which we want to know the total number of valid solutions.
A disadvantage of the max-product and sum-product approaches is that they do not fully explore the solution space. Their approach is analogous to gradient methods in optimization in that they converge on a local maximum, but may miss the best overall solution. This is particularly problematic when the clause:variable ratio approaches the phase change threshold; recall that here the satisfying solutions are thought to be sparse and separated by large energy barriers.
One approach to finding satisfying assignments in this regime is to use survey propagation. This can be thought of as a meta-belief propagation algorithm. In the sum-product algorithm, the messages passed provide information about the distribution over variable assignments. In survey propagation, the messages passed provide information about the messages in the sum-product algorithm. In a sense, survey propagation considers the full family of possible sum-product solutions simultaneously and so is less prone to getting stuck in local optima. It also provides a superior estimate of the true marginals with which it can better inform the decimation process. As before, survey propagation is alternated with decimation as we gradually fix variables and simplify the problem.
This concludes the discussion of SAT algorithms that we started in part II of this tutorial. To summarize, SAT problems get harder as the clause:variable ratio approaches the phase transition value and different methods are required. Hence we reformulated the satisfiability problem in terms of maximum likelihood on a factor graph model. We then discussed a series of methods based on max-product, sum-product, and survey propagation algorithms. These approaches also have the fringe benefit of being applicable to the MaxSAT, weighted MaxSAT and model counting problems.
In the next two sections, we change tack completely and discuss the extension of SAT methods to non-binary variables.
The reader may have some lingering doubt as to the utility of an optimization method that can only work with binary variables. However, the machinery of SAT solvers can be leveraged to use integer or floating point variables. We'll present two approaches. In this section, we consider converting problems that are not naturally binary into a binary form and using the standard SAT machinery to solve them. In the following section, we consider generalizing the SAT framework to allow continuous variables using satisfiability modulo theory solvers.
The former approach is very simple. We convert the non-binary variables to a binary form and express the constraints that tie them together as a series of binary relations. To help understand this, we adapt an example from Knuth (2015). Consider the problem of factoring a known integer $Z$: given this integer we aim to return two other integers $X$ and $Y$ that multiply together to give $Z$ or return $\text{UNSAT}$ if this is not possible.
First we express the variables in binary form so that, for example, $X=x_{3}x_{2}x_{1}$ where $x_{3}$ represents the most significant digit and $x_{1}$ the least, so that the decimal number 5 would be represented as $x_{3}x_{2}x_{1} = 101$. Then we note that to multiply two binary numbers we can use a variation of the method taught in schools (figure 6a). The sub-operations needed to perform this method can all be expressed in terms of logical relations. When we run a SAT solver on this problem, it will either succeed and return possible values for $X$ and $Y$ in binary form, or it will return $\text{UNSAT}$. In a more practical system we might want to add a further condition that neither $X$ nor $Y$ is 1, which would give a trivial factorization.
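As a toy illustration of the binary encoding (brute force over bit patterns rather than a genuine clause-level SAT encoding, which is what figure 6a describes), the sketch below searches directly over the bits of $X$ and $Y$:

```python
from itertools import product

def factor_via_bits(Z, n_bits=4):
    # Enumerate all bit patterns of X and Y; a SAT solver would instead
    # explore this space guided by clauses encoding binary multiplication.
    for x_bits in product([0, 1], repeat=n_bits):
        for y_bits in product([0, 1], repeat=n_bits):
            # x_bits[0] is the most significant digit, as in X = x3 x2 x1.
            X = sum(b << i for i, b in enumerate(reversed(x_bits)))
            Y = sum(b << i for i, b in enumerate(reversed(y_bits)))
            if X > 1 and Y > 1 and X * Y == Z:  # exclude trivial factorizations
                return X, Y
    return None  # "UNSAT": no non-trivial factorization at this bit width
```

For instance, `factor_via_bits(15)` returns `(3, 5)`, while `factor_via_bits(13)` returns `None` because 13 is prime.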
This type of approach is not limited to working with integers. We could similarly represent real numbers in binary form up to some pre-specified precision. However, one disadvantage of the approach is that it requires a lot of clauses to represent simple relationships between the non-binary variables. In the following section, we discuss SMT solvers which can work directly with integers or floating point numbers without converting them to a binary form.
Satisfiability modulo theories (SMT) problems generalize the SAT problem so that the binary variables are replaced by predicates that are taken from a particular family. A predicate is a binary-valued function of the non-binary variables. One common example of a family of predicates is linear inequalities such as $a + b > 0$. The associated SMT problem might consider expressions in conjunctive normal form like:
\begin{eqnarray}\label{eq:SMTExample}
\phi&:=& \left((10a+4b<20) \lor (10a-4b<10) \right) \land \nonumber \\
&&\hspace{1cm}\left((2a-b>5) \lor (b<-4) \right) \land \left( (4a+5b>0)\lor (b>-2)\right) \tag{6}
\end{eqnarray}
where we are asking whether there is any combination of continuous values $a, b$ that makes this expression evaluate to $\text{true}$ (figure 7).
To solve a problem of this type, we need an efficient algorithm that tests whether a conjunction (logical $\text{AND}$) of the predicates is satisfiable. This is known as a theory solver. For this example, the theory solver tests whether a set of linear inequalities is simultaneously feasible. Luckily, there is a well-established method to do this, which is the first step of the Simplex algorithm for linear programming. For the purposes of the remaining discussion, we'll treat the theory solver as a black box that returns $\text{true}$ if a set of inequalities is simultaneously feasible and $\text{false}$ otherwise.
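As a rough stand-in for this black box, one can approximate feasibility by sampling. A genuine theory solver would use an exact test such as the Simplex feasibility check; the coefficients below follow our reading of equation 6 and should be treated as illustrative:

```python
# One predicate per literal x1..x6 of the worked example (illustrative coefficients).
predicates = {
    1: lambda a, b: 10*a + 4*b < 20,
    2: lambda a, b: 10*a - 4*b < 10,
    3: lambda a, b: 2*a - b > 5,
    4: lambda a, b: b < -4,
    5: lambda a, b: 4*a + 5*b > 0,
    6: lambda a, b: b > -2,
}

def feasible(literals, lo=-10.0, hi=10.0, step=0.25):
    # Grid-sample (a, b) and report whether any point satisfies every
    # predicate in the conjunction. Sampling can miss thin feasible
    # regions, so this only sketches the theory solver's job.
    n = int(round((hi - lo) / step)) + 1
    grid = [lo + k * step for k in range(n)]
    return any(all(predicates[i](a, b) for i in literals)
               for a in grid for b in grid)
```

On this toy problem, `feasible([2, 4, 6])` reports the conflict encountered during the search, while `feasible([1, 3, 5])` confirms the final satisfying combination.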
In the lazy approach to SMT solving, we integrate the theory solver into the standard tree-search SAT solver approach. Continuing our worked example, we associate a binary literal $x_{i}$ with each of the six inequalities in equation 6 so the formula now looks like:
\begin{equation}
\phi:= (x_{1} \lor x_{2}) \land (x_{3}\lor x_{4}) \land (x_{5}\lor x_{6}) \tag{7}
\end{equation}
Then we proceed with the DPLL tree search (figure 8). As we search through the tree, we set $x_{1}$ to $\text{false}$, which means that by unit resolution $x_{2}$ must be $\text{true}$ for the first clause to be valid. Then we set $x_{3}$ to $\text{false}$ which means that $x_{4}$ must be $\text{true}$. At this point we have established that $x_{2}$ and $x_{4}$ must both be valid and so we use our theory solver to test if $x_{2}\land x_{4}$ is possible. It turns out that it is, and so we continue. We set $x_{5}$ to $\text{false}$ and this implies that $x_{6}$ must be $\text{true}$ and so we now use the theory solver to test if $x_{2}\land x_{4} \land x_{6}$ is feasible. These three inequalities cannot be simultaneously satisfied and so the theory solver returns $\text{false}$.
In principle, we could continue in this way, exploring the tree exhaustively until we either find a solution that is $\text{SAT}$ or we terminate and return $\text{UNSAT}$. However, as in conflict-driven clause learning, we can make the search considerably more efficient by adding a new term reflecting what we have learnt from the theory solver. In this case, we find that $x_2 \land x_{4} \land x_{6}$ cannot be $\text{true}$. Analyzing this more carefully, it turns out that this is because $x_{4}$ and $x_{6}$ are incompatible, so we add the term $\overline{x}_{4} \lor \overline{x}_{6}$ to give the new formula:
\begin{equation}
\phi':= (x_{1} \lor x_{2}) \land (x_{3}\lor x_{4}) \land (x_{5}\lor x_{6}) \land (\overline{x}_{4} \lor \overline{x}_{6}). \tag{8}
\end{equation}
Now we continue the search using the same procedure, but with the extended formula. Each time the theory solver reports that a subset of the constraints is incompatible, we add a corresponding clause to the SAT formula. We continue until we either exhaust the possibilities or find a solution. The full tree is illustrated in figure 9; it finds the solution $x_{1}\land x_{3}\land x_{5}$, and visual inspection of figure 7 confirms that these three regions do indeed have a common intersection.
Note that at the tree node in the yellow rectangle in figure 9, we eliminate an entire fork of the tree using the SAT solver alone, as a result of the added clauses. In a full-scale problem, this means that we can eliminate many parts of the search space using the SAT solver and hence reduce the number of times that we need to call the more expensive theory solver.
In conclusion, the lazy approach to SMT solving can be employed as long as we have an efficient theory solver that can evaluate whether an arbitrary conjunction of predicates is satisfiable. The SAT solver and the theory solver work in tandem. Conflicts found by the theory solver are added back to the formula as new clauses, which effectively prunes the search and reduces the number of times that we have to call the theory solver in the future.
This series of tutorials has introduced the basics of the satisfiability problem and the algorithms that are used to solve it. It is our opinion that this sophisticated machinery is underused in AI and that there is considerable scope for combining it with more mainstream machine learning techniques.
For further information about SAT solving, including the DPLL and CDCL algorithms, consult the Handbook of Satisfiability. A second useful resource is Donald Knuth's Fascicle 6. For further reading about random SAT problems and the phase change, the relation between SAT and factor graphs, and the survey propagation algorithm, consult Information, Physics, and Computation. For an extremely comprehensive list of applications of satisfiability, consult SAT/SMT by Example. If you want to practically use a SAT solver, this tutorial will help you get started.
The views expressed in this article are those of the interviewee and do not necessarily reflect the position of RBC or Borealis AI.
Greg Kircznow (GK):
The idea is to create some really good checks against cognitive bias in models. At its most basic, an independent effective challenge is about motivating a group of experts to check the work of another group. That is often the best tool available for ensuring models are working correctly. And the more complex the model, the more value that comes from having another set of experienced eyes verifying that everything is working as expected.
GK:
I think it’s very difficult for people to see their own biases and to know how to mitigate them. So, if you are a model developer you might need a second set of eyes – that independent effective challenge – to see what you are not seeing and identify problems before they come up in production.
GK:
It has been a challenge. I think there is the perception that anyone with a statistics degree can quickly develop expertise in the various kinds of machine learning models. But my experience has been that it is a steep learning curve. When I’m hiring, I’m looking for people with a high level of expertise because their job is to challenge the developers and that means they must be at the same level as the developers to do that.
GK:
The financial industry has been thinking about model validation for a long, long time. So, too, have the regulators. I believe that the model risk regulations are already broad enough and robust enough to encompass AI. The challenge, however, is that these regulations are largely principles-based which means we need to think carefully about what validation techniques we can use to ensure we abide by those principles.
GK:
No. Frankly, I think there is a popular misconception that explainability is a prerequisite for model robustness and model fairness. But I would argue that it’s not only not necessary, it’s also not sufficient for testing robustness or fairness. Simply put, I believe we can have complicated models that we can trust to be robust and fair, even if we don’t have an explainability technique for them. The reality is that there are lots of tests we can run to determine whether a model is robust and fair, and those tests do not rely on our ability to explain how the model arrived at that decision.
GK:
If we are talking about the technical view of explainability – being able to document how an algorithm arrived at a certain decision – then no. I don’t think most users really want to understand how these models work, any more than I want to understand how my mobile phone works. What users want to know is that there are people responsible for knowing the model is working; that there is a team of professionals monitoring it; and that, if they find problems, that they can fix the mistakes. Of course, there are special cases. Model developers and validators find explainability methods to be helpful tools, and people sometimes want to know if there is an action they could have taken that would have resulted in different model output. We consider these kinds of situations carefully.
GK:
We’re starting to recognize some of the limitations of current testing approaches. Consider, for example, the common approach of using holdout data sets for testing. That approach only demonstrates that the model works against a similar set of data. I think there is a wider recognition that we need to go deeper than that and think about how your model will perform sometime down the road. There’s a lot of research going into this area at the moment; I have personally been following teams working on developing common-sense tests for Natural Language Processing models. The findings have been eye-opening.
GK:
Many tools are being developed by different companies right now. The problem is that – for the financial services industry at least – these approaches are not fit for purpose for what we do. We need something that is really tailored to our sector and our business. That is why we are so excited to be working with Borealis AI to develop and apply novel tests. Right now, we’re focused on developing adversarial testing approaches. But we see lots of room for collaboration across the ecosystem to improve the way AI models are tested and challenged.
Greg serves as RBC’s Senior Director of AI Model Risk Management. As such, Greg is responsible for developing and delivering the bank’s broad range of AI model risk frameworks, overseeing risk management activities and managing a team of independent and effective challengers.
It’s one thing to do machine learning research as a solo practitioner; it’s another thing to do research at scale across a large team, building machine learning products. A team effort can lead to faster experiments, but can also bring bottlenecks and frustrations if things aren’t coordinated well. You might find yourself waiting for a compute job to start before you can change your code to run the next experiment. Or spending hours trying to reproduce someone’s experiment before you get comparable results.
This is where configuration, or config, files come in handy. Configuration files are files used to configure the parameters and settings of a program or application. Config files allow you to separate the code from the parameters of the machine learning pipeline to help produce repeatable outcomes. They explain how to read the data, what model to use, how to learn the model, and how to evaluate model performance – without interaction with the operator. It’s not easy to create great config files, however, and it takes time to understand how to build them properly. But they provide a set of advantages that make them worth considering in your machine learning workflows.
Many different formats of config files exist: YAML, JSON, INI, TOML, XML, etc. Tools like Hydra make it easy to manage large config files; Hydra provides the ability to compose a hierarchical configuration and override it through config files and the command line. Each format has its strengths and weaknesses, so you should choose the format that best fits your needs.
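As a minimal illustration (the file name, module names and parameter values here are invented for the example), a JSON config can be read with Python's standard library:

```python
import json

# A minimal JSON config, one of the formats listed above. In practice this
# would live in its own file (e.g. config.json) and be read with json.load.
config_text = '''
{
  "dataset": {"name": "mnist", "batch_size": 32},
  "model": {"type": "mlp", "num_layers": 3},
  "optimizer": {"type": "adam", "learning_rate": 0.001}
}
'''

config = json.loads(config_text)

# The training code reads parameters from the config instead of hard-coding
# them, so changing an experiment never requires touching the code.
num_layers = config["model"]["num_layers"]
```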
Creating a config file is relatively straightforward, but it requires a good understanding of the configuration framework you are working with.
If you want to train a model on a given dataset, you can use a config file with the following structure. This is only one example, and you can add more parameters. A user can easily see there are four main modules: dataset, model, metric, optimizer. Each module provides the parameters you want to be able to change. For example, you can easily increase the number of layers in your model from three to five by writing num_layers: 5.
dataset: 
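A fuller version of such a config might look like the sketch below; the module names follow the four listed above, while the individual parameter names and values are illustrative assumptions:

```yaml
dataset:
  name: mnist          # illustrative: which dataset to load
  batch_size: 32
model:
  type: mlp
  num_layers: 3        # increase to five by writing num_layers: 5
metric:
  name: accuracy
optimizer:
  type: adam
  learning_rate: 0.001
```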
Reproducibility is critical for deploying machine learning products. Teams need to be able to run an algorithm on different datasets and obtain the same (or very similar) results before putting an algorithm into production. Reproducibility quickens production pipelines because it reduces errors and ambiguity when the projects move from development to production. It also helps to create trust and credibility.
To reproduce something, you need to take a snapshot of it. Config files help your colleagues reproduce your experiment more easily in the future, which is good practice for both computer science and software engineering. Teams can spend more time making enhancements and improvements to your work instead of decoding the work you’ve already done. This helps you make progress faster.
Config files are also friendly with versioning tools like git. It’s easy to keep track of config files in the repository with the code and to version them. Version control allows you to keep a record of what changes were made and the commit message provides the reason why the change was made. If a change has unintended side effects, it is easy to review the history to see what change caused the effects.
Ideally, two or more people should be able to work on the same model at the same time to maximize efficiency. Config files help with this. It’s time-consuming to write out all the parameters you want to use; config files let team members simply run an experiment’s configuration instead of recreating it.
If you develop a machine learning pipeline entirely in code, it’s hard to change one part of the pipeline without breaking everything. But if you use config files to house different parts of your model, you can work with others on the same model in parallel without breaking the code for everyone else. One developer can run their experiment without impacting the entire model or those of their colleagues. For example, two people can train the same model on the same data but with different optimizers:
dataset:
  ...
model:
  ...
optimizer: sgd

dataset:
  ...
model:
  ...
optimizer: adam
Running a new experiment should not break previous experiments. Config files help you follow best practices for a continuous integration/continuous delivery cycle. Because the parameters of each experiment live in a config file, it’s easy to run several different experiments in parallel. This is particularly useful when running experiments on High-Performance Computing (HPC) clusters, where machines are shared with other teams and you have to wait for resources to free up. In this case, you often don’t know when the job with your experiment will run. If your experiment parameters are hardcoded, you need to wait for the experiment to start before moving to another experiment. Config files help here because the parameters are decoupled from the code: you can make progress on a new experiment while waiting for the first job to start, without breaking anything. It also means team members can add new features to the code while your job waits in the queue.
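As a minimal sketch of this decoupling (the file name and keys are hypothetical), the training code below reads all of its parameters from a config file, so a queued job picks up whatever the file contains when it finally starts:

```python
import json
import tempfile

# Hypothetical sketch: hyperparameters live in a config file, not in the code,
# so launching a new experiment never requires touching the code itself.
def load_config(path):
    with open(path) as f:
        return json.load(f)

def run_experiment(cfg):
    # Stand-in for a real training loop: just report the parameters used.
    return f"lr={cfg['optimizer']['lr']}, epochs={cfg['epochs']}"

# Write one config per experiment; several can wait in the queue at once.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"optimizer": {"lr": 0.01}, "epochs": 10}, f)
    path = f.name

print(run_experiment(load_config(path)))  # -> lr=0.01, epochs=10
```

To queue a second experiment you would only write a second config file; the code stays untouched while both jobs wait for resources.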
Config files also help make your work more easily available to others. Perhaps a colleague wants to incorporate your work into their work or to compare your version of the pipeline with theirs to see any differences. Config files make this much easier. And it never hurts to be able to share work that benefits others on your team.
Config files can help structure projects, which simplifies not only the project itself but also getting newcomers up to speed quickly.
If your domain is complicated, it’s difficult to understand a project from its code files alone. Config files provide a compartmentalized view of a project. They provide a way to reuse pieces of your configuration and make the project easier to understand, which shortens onboarding, as colleagues need not scroll through many pages to grasp the big picture.
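One way to reuse pieces of configuration, in the spirit of tools like Hydra, is to merge a shared base config with a small per-project override. The sketch below uses hypothetical keys:

```python
# Sketch of configuration reuse: a shared base config is recursively merged
# with a small per-project override, so projects only state what differs.
def merge(base, override):
    """Recursively merge `override` into `base`, returning a new dict."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge(out[key], value)
        else:
            out[key] = value
    return out

base = {"model": {"num_layers": 3, "dropout": 0.1}, "optimizer": {"lr": 0.001}}
project = {"model": {"num_layers": 5}}  # reuse everything else unchanged

cfg = merge(base, project)
print(cfg["model"])  # -> {'num_layers': 5, 'dropout': 0.1}
```

The override file stays tiny and readable, while the merged result is a complete configuration: that compartmentalization is what makes the big picture easy to see.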
Config files help researchers with project alignment and standardization, giving flexibility and easier code reuse across projects. Moving from one project or product to another can be a lot of work, and config files provide a neat way to package work already done in one space for reuse in another.
The benefits here aren’t only functional: config files help make projects look simpler and boost efficiency. Simplicity increases adoption and motivation, increasing return on the time and energy invested in creating products. If you make something easy to use, more people will use it.
Data and metadata representations between MS SQL Server and PostgreSQL are not homogeneous (see Table 1), so migrations between these databases have a reputation for being tedious work. Even so, my team decided that it was worth the effort because the alternative was to repurpose application code and for the team to learn a different software system, which would slow down our progress. Since each RDBMS uses its own set of data types, migrating data from one to the other requires a mapping to transform the data into a format the latter can understand. While the Open Database Connectivity (ODBC) standard goes a long way in providing such mappings, automatic migration is not guaranteed and the mappings often require tweaks. Some of the data in question was of the geography data type in MS SQL Server, which is not easily exported to its counterpart in PostgreSQL. Furthermore, stored procedures written in Transact-SQL (T-SQL), the variant of SQL used by MS SQL Server, needed to be translated into a procedural language supported by PostgreSQL (for example, PL/pgSQL) before they could be executed by the PostgreSQL engine.
Table 1. Differences between MS SQL Server and PostgreSQL.

| | MS SQL Server | PostgreSQL |
| --- | --- | --- |
| 1 | Available under a commercial license | Open source, released under the PostgreSQL License |
| 2 | Examples of data type differences: 1. fixed-length byte string: binary; 2. [1, 0, null] available with bit; 3. native support for spatial data via the geography type | Examples of data type differences: 1. fixed-length byte string: BYTEA; 2. [1, 0, null] available with boolean; 3. spatial data supported via the optional PostGIS package |
| 3 | Supports clustered and non-clustered indexes, among others | Supports B-trees and Generalized Search Tree (GiST) structures, among others |
| 4 | Table-valued functions and stored procedures supported in T-SQL, .NET languages, Python and others | User-defined functions supported in PL/pgSQL, Perl, Python and others |
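As an illustration of the mappings discussed above, a migration script might carry a lookup table like the following. The entries echo Table 1, the last one is a hypothetical extra, and real migrations cover many more types and usually need manual tweaks:

```python
# Illustrative sketch of the kind of type mapping a migration needs.
# Entries follow Table 1; "nvarchar" -> "text" is a hypothetical extra.
TYPE_MAP = {
    "binary": "bytea",         # fixed-length byte string
    "bit": "boolean",          # [1, 0, null]
    "geography": "geography",  # requires the optional PostGIS extension
    "nvarchar": "text",        # hypothetical extra mapping for illustration
}

def translate_column(mssql_type):
    """Map an MS SQL Server column type to a PostgreSQL counterpart."""
    try:
        return TYPE_MAP[mssql_type.lower()]
    except KeyError:
        raise ValueError(
            f"no automatic mapping for {mssql_type!r}; map it manually"
        )

print(translate_column("BIT"))  # -> boolean
```

The explicit failure for unmapped types mirrors the point above: automatic migration is not guaranteed, and the gaps are exactly where the tedious manual work lives.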
In non-regulated, non-secured environments, database migration could involve some or all of the following steps. We could have acquired a snapshot of our partner’s database by unencrypted email, public shared cloud storage, or even a thumb drive. There would not have been a requirement to track who needed access to what, when and why. Off-the-shelf tools like Ispirer MnMTK or Full Convert could have been used to migrate the database schema (including tables, views, stored procedures and functions) in one go. In the absence of such tools, the solution shown in Figure 1 might be set up. The database snapshot would be used to restore a MS SQL Server instance (Figure 2). Then, once software dependencies are resolved (via the public internet in the case of Figure 1), the migration to a PostgreSQL instance could be orchestrated by tools like SSMS or pgLoader.

But when data privacy is at stake, the game changes, calling for rigorous access controls and monitoring. Banks like RBC take immense care to ensure data is secure and protected. As the machine learning hub for RBC, Borealis AI is held to the same high standards to keep data safe and secure. Once access to the data was approved, snapshots could only be shared via authenticated shared network drives or internal cloud platforms. With looming deadlines, the migration tools mentioned above were off the table, as a request for admin access could take time to get approved. We ended up resolving software dependencies through secure software repositories, not the public internet. Stored procedures and functions were translated from T-SQL to PL/pgSQL. Finally, with the MS SQL Server and PostgreSQL instances set up on Borealis AI’s High-Performance Compute cluster, we could use SQL Server Management Studio to migrate the data. While deployment on this cluster is configured through custom YAML templates, an equivalent Docker ‘run’ command for the MS SQL Server instance is shown in Figure 3.

Surprisingly, working in a more restricted environment became an opportunity for creative problem solving and learning:

So as I sit here, in front of my improvised work-from-home desk in my quiet Toronto home, I recognize that this migration deepened my understanding of the ecosystem inside which I write code. I felt satisfaction in getting to know the system. It also invited me to reflect on why restrictions are necessary when managing data. And of course, the experience taught me how to migrate data from MS SQL Server to PostgreSQL.
Access to our finances has evolved from branches to online to mobile, and many of us use our digital personal banking and financial tools almost daily. We pay for groceries, gas and consumer goods with our debit or credit cards, we shop online, pay bills using online banking, and transfer money across accounts to progress towards our personal financial goals.
The data that results from our activity in the context of digital banking is multilayered. Every transaction and money movement carries a time stamp, a dollar amount, and sometimes, location information. The combination and history of these data points provides context, and therefore carries meaning. Selecting the right model, adapting it to this domain and customizing it for financial modelling presents a rich and interesting problem for researchers and engineers.
We’re exploring a few key areas of machine learning research to apply prediction solutions to personal banking.
Most banking data has a timestamp attached to it, denoting when an event such as a purchase took place. But those times are often irregular. A client may go several days without a transaction, and then have a cluster of purchases on a single weekend shopping trip. So, we need models that can handle irregular temporal sampling. We’re also working with data that goes back years, so we need models that can incorporate information from far in the past. Modelling long-range, irregular, time-based sequences is an interesting research challenge.
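One simple, commonly used way to expose irregular sampling to a model is to compute the gap between consecutive events as an input feature. The sketch below uses made-up transactions:

```python
from datetime import datetime

# Hypothetical transactions: a quiet weekday, then a weekend shopping burst.
events = [
    datetime(2021, 3, 1, 9, 15),   # coffee
    datetime(2021, 3, 1, 9, 40),   # groceries
    datetime(2021, 3, 6, 18, 5),   # weekend shopping trip
    datetime(2021, 3, 6, 18, 20),  # same trip, minutes later
]

# Gap (in hours) since the previous event; a sequence model can consume these
# alongside the other transaction features instead of assuming regular steps.
gaps_hours = [
    (b - a).total_seconds() / 3600 for a, b in zip(events, events[1:])
]
print([round(g, 2) for g in gaps_hours])  # -> [0.42, 128.42, 0.25]
```

The wildly different gaps (minutes versus days) are exactly the irregularity the models discussed below must handle.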
A rich history exists in machine learning models for temporal data. Standard approaches are built upon recurrent neural networks such as the LSTM (long short-term memory). These can be augmented with techniques for including long-range connections (Residual Networks) and deciding when to make predictions (Selective Networks). LSTMs have been used successfully in a wide range of applications, from language to transportation to healthcare to business process management.
Because LSTMs are especially suited to incorporating prior context into predicting future outcomes, they have been shown to be successful in other long-range sequence modelling tasks, like speech recognition, translation, sentiment analysis and transcription. For example, Google Translate uses LSTMs. Like banking, with its specific yet irregular time-based contexts, language is a complex “context” challenge in machine learning. LSTMs have also been applied to various parts of autonomous vehicle development.
Our team at Borealis AI has made novel research contributions to the literature on temporal data analysis that support the work we are doing in personal banking. We developed methods for modelling uncertainties in irregular time series data using point processes, which capture when activities are likely to occur. These methods are built in a variational autoencoder framework and use latent representations and non-linear functions to parametrize distributions over which event is likely to occur next in a sequence, and at what time (Mehrasa et al., CVPR 2019).
The temporal events we are modelling can be quite complex: different people can have different patterns of financial behaviour. The probability distributions that we use to model these uncertainties need to capture this variety. We have built state-of-the-art methods for capturing this variability based on normalizing flows that deform a base stochastic process for time series (Deng et al., NeurIPS 2020).
Finally, clients have their own patterns of behaviour. Adapting to them allows our models to provide personalized advice and service. Our work on learning user representations (Durand, CVPR 2020) contributes a model that extracts a representation of a user based on their historical data. Our model allows us to incrementally improve a user representation from new data without retraining the model, an important benefit for scalability.
We’re using methods such as these because they allow us to perform time series modelling and use the vast banking data available at RBC to make good predictions. We then use these predictions of future patterns to suggest future actions.
Meanwhile, machine learning research doesn’t stand still. Transformers, another type of machine learning method, have been shown to perform more accurately on this type of long-range sequence modelling task. Transformers are another way we could solve this problem, and we’re actively exploring how well they apply here to help us reach our goals.
Like most machine learning tasks, this work must account for potential biases. For example, we consider how variables like our clients’ gender may impact a prediction or suggestion. We also consider judgements we might make on what counts as discretionary versus non-discretionary spending. These are tied to values, and machines shouldn’t be put in a position of judgement.
At RBC, we’re mitigating risks such as bias by using appropriate features in our models, gathering feedback from our users, deploying automated analysis to test model behaviour, and employing thorough validation processes in line with regulation. You can learn more about our validation approach here.
(Deng et al., NeurIPS 2020) R. Deng, B. Chang, M. Brubaker, G. Mori, A. Lehrmann. Modeling Continuous Stochastic Processes with Dynamic Normalizing Flows. Neural Information Processing Systems (NeurIPS), 2020.
(Durand, CVPR 2020) T. Durand. Learning User Representations for Open Vocabulary Image Hashtag Prediction. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2020.
(Mehrasa et al., CVPR 2019) N. Mehrasa, A. Jyothi, T. Durand, J. He, L. Sigal, G. Mori. A Variational Auto-Encoder Model for Stochastic Point Processes. IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2019.
Shayegan Omidshafiei, Karl Tuyls, Wojciech M. Czarnecki, Francisco C. Santos, Mark Rowland, Jerome Connor, Daniel Hennes, Paul Muller, Julien Pérolat, Bart De Vylder, Audrunas Gruslys and Rémi Munos
by Pablo Hernandez
What do Rock-Paper-Scissors, Chess, Go, and StarCraft II have in common? They are multiplayer games that have taken a central role in artificial intelligence research, in particular in reinforcement learning and multi-agent learning. This paper asks how to define a measure on multiplayer games that can be used to build a taxonomy over them. Trivial proposals might not scale or be appropriate for many games, which makes this a highly complicated task. In this work, the authors use tools from game theory and graph theory to understand the structure of general-sum multiplayer games. In particular, they study graphs where the nodes correspond to strategies (or trained agents) and the edges correspond to interactions between nodes, quantified by the game’s payoffs. The paper concludes by shedding some light on another paramount and related question: how to automatically generate interesting (i.e., with desirable characteristics) environments (games) for learning agents.
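To make the graph view concrete, here is a toy sketch for Rock-Paper-Scissors, one of the games named above: nodes are pure strategies, and a directed edge points from a strategy to one that beats it. The encoding is ours for illustration, not the paper’s exact construction:

```python
from itertools import product

# Nodes are pure strategies; an edge u -> v means v beats u, i.e. v earns a
# positive payoff against u in the zero-sum payoff table.
BEATS = {"rock": "scissors", "scissors": "paper", "paper": "rock"}

def payoff(a, b):
    """Payoff to strategy a against b: +1 win, -1 loss, 0 tie."""
    if a == b:
        return 0
    return 1 if BEATS[a] == b else -1

strategies = ["rock", "paper", "scissors"]
edges = [(u, v) for u, v in product(strategies, repeat=2) if payoff(v, u) > 0]
print(sorted(edges))
```

The three edges form a cycle with no sink, which is exactly the kind of structural signature (here, intransitivity) that a graph-theoretic taxonomy of games can pick up.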
Alexander D’Amour, Katherine Heller, Dan Moldovan, Ben Adlam, Babak Alipanahi, Alex Beutel, Christina Chen, Jonathan Deaton, Jacob Eisenstein, Matthew D. Hoffman, Farhad Hormozdiari, Neil Houlsby, Shaobo Hou, Ghassen Jerfel, Alan Karthikesalingam, Mario Lucic, Yian Ma, Cory McLean, Diana Mincu, Akinori Mitani, Andrea Montanari, Zachary Nado, Vivek Natarajan, Christopher Nielson, Thomas F. Osborne, Rajiv Raman, Kim Ramasamy, Rory Sayres, Jessica Schrouff, Martin Seneviratne, Shannon Sequeira, Harini Suresh, Victor Veitch, Max Vladymyrov, Xuezhi Wang, Kellie Webster, Steve Yadlowsky, Taedong Yun, Xiaohua Zhai, D. Sculley
by Mehran Kazemi
When we solve an underdetermined system of linear equations (i.e., more unknowns than linearly independent equations), we obtain a class of solutions rather than a unique solution. This paper shows that a similar phenomenon often occurs in machine/deep learning: when we fit a model to a given independent and identically distributed (IID) dataset, several weight configurations achieve near-optimal held-out performance. This phenomenon is called underspecification, and it may arise due to the use of models with many parameters (sometimes more parameters than data points). Our learning process often selects a random weight configuration from the class of configurations achieving near-optimal held-out performance; as shown in this paper for many models and in many domains, each of these configurations encodes different inductive biases, which result in very different (and sometimes undesired) behaviors in production when the data distribution slightly changes. This paper is a must-read before deploying any machine learning model to production.
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko
by Thibaut Durand
This paper proposes a new framework to tackle the object detection problem. The authors introduce DETR (DEtection TRansformer), which replaces the complex hand-crafted object detection pipeline with a Transformer. Unlike most modern object detectors, DETR approaches object detection as a direct set prediction problem. It consists of a set-based global loss, which forces unique predictions via bipartite matching, and a Transformer encoder-decoder architecture. Given a fixed small set of learned object queries, DETR reasons about the relations of the objects and the global image context to directly output the final set of predictions in parallel. The authors also show that DETR can easily be extended to the panoptic segmentation task by using the same recipe for “stuff” and “things” classes.
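The bipartite matching at the heart of DETR’s set-based loss can be illustrated with a toy brute-force version. DETR itself uses the Hungarian algorithm, and the cost matrix below is made up:

```python
from itertools import permutations

# cost[i][j]: cost of matching prediction i to ground-truth object j.
# The matching assigns each prediction to a distinct ground truth so that
# the total cost is minimal -- the "unique predictions" property in the text.
cost = [
    [0.9, 0.1, 0.8],
    [0.2, 0.7, 0.6],
    [0.5, 0.4, 0.05],
]

def best_matching(cost):
    """Brute-force minimal-cost assignment; fine for toy-sized problems."""
    n = len(cost)
    return min(
        permutations(range(n)),
        key=lambda perm: sum(cost[i][perm[i]] for i in range(n)),
    )

print(best_matching(cost))  # -> (1, 0, 2): prediction i matches object perm[i]
```

Brute force is factorial in the number of objects, which is why DETR relies on the polynomial-time Hungarian algorithm for the same optimum.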
Ting Chen, Simon Kornblith, Mohammad Norouzi and Geoffrey Hinton
by Maryna Karpusha
Recent natural language processing models, such as BERT and GPT, show how accuracy can be improved with unsupervised learning. In this case, the model is first pre-trained on a large unlabelled dataset and then fine-tuned on a small amount of labeled data. Google Brain researchers, in their paper "SimCLR: A Simple Framework for Contrastive Learning of Visual Representations," show that a similar approach has great potential to improve the performance of computer vision models. SimCLR is a simple framework for contrastive visual representation learning. It differs from standard supervised learning on ImageNet in a few components: the choice of data augmentation, the use of a non-linear head at the end of the network, and the loss function selection. By carefully studying different design choices, the authors improve considerably over previous self-supervised, semi-supervised, and transfer learning methods. SimCLR provided strong motivation for further research in this direction and improved self-supervised learning for computer vision.
Hao Wu, Jonas Köhler, Frank Noé
by Andreas Lehrmann
Normalizing flows are a family of invertible generative models that transform a simple base distribution to a complex target distribution. In addition to traditional density estimation, they have recently been used in conjunction with reweighting schemes (e.g., importance sampling) to draw unbiased samples from unnormalized distributions (e.g., energy models). This paper proposes a variant of this idea in which the flow consists of an interwoven sequence of deterministic invertible functions and stochastic sampling blocks. The added stochasticity is shown to overcome the limited expressivity of deterministic flows, while the learnable bijections improve over the efficiency of traditional MCMC. Interestingly, the authors show how to compute exact importance weights without integration over all stochastic paths, enabling efficient asymptotically unbiased sampling.
Didrik Nielsen, Priyank Jaini, Emiel Hoogeboom, Ole Winther, Max Welling
by Marcus Brubaker
Variational Autoencoders (VAEs) and Normalizing Flows are two popular classes of generative models that allow for fast sample generation and either exact or tractable approximations of probability density. This paper provides a middle ground between these two models by showing how surjective and stochastic transformations can be incorporated and mixed with bijective transformations to construct a wider and more flexible range of distributions. While this construction loses the ability to compute exact densities, it allows for more accurate "local" variational posteriors which can be derived based on the specific transformation. The resulting SurVAE Flow generative model is similar in nature to a VAE in that likelihoods can only be approximated or bounded. However, instead of trying to learn a single (potentially very complex) variational posterior, SurVAE Flows instead learn a variational posterior for each step, which will generally be more accurate.
Emily M. Bender, Alexander Koller
by Yanshuai Cao
The words “meaning”, “semantics”, “understand” and “comprehend” are often used loosely in the broader AI literature, and sometimes even in NLP papers. This position paper precisely defines “form” and “meaning”, and argues that an NLP system trained only on form has a priori no way to learn meaning, regardless of the amount of training data or compute power. It is a modern take on the symbol grounding problem. The next paper in our list, “Experience Grounds Language” by Bisk et al., can be viewed as a follow-up and an attempt to answer some of the questions posed in this paper. Besides the clarity that Bender and Koller bring to the topic, this paper is also recommended because the authors leverage thought experiments to make convincing arguments without mathematical proofs or numerical experiments. This style is sometimes used in the physics literature but rarely in modern AI.
Yonatan Bisk, Ari Holtzman, Jesse Thomason, Jacob Andreas, Yoshua Bengio, Joyce Chai, Mirella Lapata, Angeliki Lazaridou, Jonathan May, Aleksandr Nisnevich, Nicolas Pinto, Joseph Turian
by Daniel Recoskie
This paper posits that exposure to a large textual corpus is not enough for a system to understand language. The authors argue that experience in the physical world (including social situations) is necessary for successful linguistic communication. Five "world scopes" are used as a framework in which to view NLP: corpus, internet, perception, embodiment, and social. The authors use these world scopes to create a roadmap towards true language understanding. The paper also reviews a large amount of linguistics and NLP literature in a way that is approachable even without any NLP background.
Clara Meister, Tim Vieira, Ryan Cotterell
by Layla El Asri
In this paper, the authors analyse the success of beam search as a decoding strategy in NLP. They show that beam search in fact optimizes a regularized maximum a posteriori objective, and that the induced behaviour relates to a concept known in cognitive science as the uniform information density hypothesis. According to this hypothesis, "where speakers have a choice between several variants to encode their message, they prefer the variant with more uniform information density". Based on this observation, the authors propose other regularizers and show how the performance of beam search for larger beams can be improved. Several decoding schemes have been proposed to alleviate some of the difficulties of language generation. This paper proposes an elegant framework that helps analyze the very popular method of beam search and understand why it produces high-quality text.
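To make the decoding procedure concrete, here is a toy beam search over a hypothetical character model whose next-token probabilities ignore context; the vocabulary and probabilities are invented for illustration:

```python
import math

# Hypothetical next-token log-probabilities, independent of context for brevity.
LOGPROB = {"a": math.log(0.5), "b": math.log(0.3), "<eos>": math.log(0.2)}

def beam_search(beam_width, max_len=3):
    """Keep the `beam_width` highest-scoring partial sequences at each step."""
    beams = [("", 0.0)]  # (sequence, cumulative log-probability)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq.endswith("<eos>"):
                candidates.append((seq, score))  # finished; carry forward
                continue
            for tok, lp in LOGPROB.items():
                candidates.append((seq + tok, score + lp))
        # Prune: keep only the top `beam_width` hypotheses.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams[0][0]

print(beam_search(beam_width=2))  # -> aaa
```

Maximizing cumulative log-probability is the maximum a posteriori objective the paper starts from; its contribution is showing which regularizer, added to this objective, beam search implicitly optimizes.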
Hyojin Bahng, Sanghyuk Chun, Sangdoo Yun, Jaegul Choo, Seong Joon Oh
by Ga Wu
While machine learning models can achieve remarkable accuracy on many classification tasks, it is hard to identify whether the models rely on data bias as a shortcut to successful prediction. Consequently, the models could extract biased observation representations and fail to generalize when the bias shifts. This paper describes an application where the Hilbert-Schmidt Independence Criterion (HSIC) is adopted for unbiased representation learning. Specifically, the authors propose to train a debiased representation by encouraging it to be statistically independent of intentionally biased representations. While this appears similar to Noise Contrastive Estimation (NCE), which also distinguishes a desired representation from others, the proposed approach is based on information theory instead of density estimation. This paper is one of several recent works pairing HSIC with deep learning, a direction that has received a surge of attention.
Florian Tramèr, Nicholas Carlini, Wieland Brendel, and Aleksander Madry
by Amir Abdi
The paper demonstrates that typical adaptive evaluations for adversarial defenses are incomplete. Accordingly, the authors detail the methodology of an appropriate adaptive attack and go through thirteen well-known adversarial defenses, only to show that all of them can be circumvented by careful and appropriate tuning of the attacks. The authors propose an informal no-free-lunch theorem: “for any proposed attack, it is possible to build a non-robust defense that prevents that attack”. Thus, they advise the community not to overfit to the proposed adaptive attacks and to use them only as sanity checks. Moreover, the paper advocates for “simpler” but “hand-designed” attacks that are as close as possible to straightforward gradient descent with an appropriate loss function.
The views expressed in this article are those of the interviewee and do not necessarily reflect the position of RBC or Borealis AI.
Holly Shonaman (HS):
Not at all. I agree that we are highly regulated and that privacy is central to those regulations. But I think most data-rich businesses now understand that privacy protection is about much more than simply meeting a regulatory hurdle.
Like many organizations, we need data from our customers in order to run our business. We need it to ensure our products and services are meaningful and valuable to them. If our customers don’t trust us with their data, it becomes very difficult for us to do our jobs and deliver value.
So, yes, we are always mindful of the regulatory aspects. But that’s not what guides us: our focus is on building trust with our customers, and privacy is central to that.
(HS):
My job is to consider how we are using and processing data in all aspects of the business. And in that respect, AI isn’t all that different from more conventional methods of data analytics.
However, there are clear nuances that surround the use of AI, particularly in a consumer setting. In part, it’s the scale and speed that AI can achieve. That makes privacy and reputational risks more difficult to assess and control.
But it’s also that the public conversation around AI remains mired in mistrust. People don’t trust that the data is accurate; they don’t trust it is free from bias; they don’t trust how their data is going to be used. They simply don’t believe that machine learning can replace human interactions.
(HS):
It comes down to literacy on the topic. I don’t believe people really know what AI is and what protections are around it. I would argue that, as a country, we need to have a much more robust conversation about AI and help Canadians understand what kinds of questions they should be asking. That will require some thinking at a national policy level. But it’s important that – like financial literacy and data literacy – Canadians gain some AI literacy as well.
(HS):
At RBC, data privacy is baked into our processes. Our role in the global Privacy Office is to ensure that AI developers and business leaders understand and assess privacy risks. For example, before launching any new product or initiative, we conduct a privacy risk impact assessment which looks at the entire endtoend process. If a risk is identified, we have a conversation about the types of controls that should be put in place.
Sometimes that means applying differential privacy techniques or limiting the amount of information that goes into the model. Or it could require further testing for things like the right level of data granularity, to ensure anonymous people in the data set cannot be identified based on the outcomes.
(HS):
I can’t overstate the trust aspect. One of my concerns is that if we overuse AI without understanding the full short- and long-term consequences of models, we could end up destroying any trust we build in the technology as a society.
The problem is that society is changing extraordinarily rapidly and that means the AI community can’t always assess the full impact of their models until the risks are all too apparent. Being able to stay on top of these shifts is our focus.
(HS):
I am very encouraged by the robustness of the way many management teams – including those at RBC – are approaching this issue. We have a very strong risk management team. And our board of directors and executives demand clarity on what we are doing to treat clients fairly and use their data appropriately.
Generally speaking, I think everyone is very happy to do things with more speed, better information and more efficiency. But they also recognize that if you have a fast car, you need strong brakes. In other words, companies need to have the ability to continuously assess these models and take them ‘offline’ if there is a problem.
(HS):
I would argue that it needs to start at the university and training level – we need to educate developers on ethical AI from the outset. It can’t just be all about code; developers need to understand the social, ethical and privacy issues that influence their field.
I also think bias and risk should always be top of mind. Developers need to try to think broadly about a range of potential short-, medium- and long-term scenarios and test against them. That’s not easy; it’s hard work to look into the future.
I would also encourage AI developers to be more front-and-centre, working with the business and the privacy team to talk about what they are doing, the problems they have identified, their data sources and their designs.
(HS):
Business leaders need to keep doing more of what they are already doing. They need to demand more transparency, more reporting and testing. Perhaps more importantly, leaders need to allow employees to find flaws in their models, and maybe even reward that.
I think we are also going to see a lot more focus on thirdparty verification and audits to ensure corporate models and controls are really up to the task. It’s good protection for the business and helps the organization understand the robustness of their own testing.
(HS):
Quite to the contrary. I actually believe that – if we get it right – privacy is the key to building trust in AI. It doesn’t matter if you are lending money or selling sweatpants; access to customer data is critical to being able to deepen your relationship with your customers, deliver a great experience to them, and serve them. If you don’t use their data respectfully to support the client relationship, they’ll lose interest in your business. If you breach their privacy or cross the ethical line, you lose their trust. So our focus and attention to privacy controls is actually what will allow us to move ahead with AI development in Canada.
As RBC’s Chief Privacy Officer, Holly Shonaman leads RBC’s global Privacy Risk Management program and provides compliance oversight in support of the bank’s leadership in digitally-enabled relationship banking. Ms. Shonaman has held various positions within RBC across the retail and commercial banking and wealth management divisions.
\begin{equation}\label{eq:example_cnf_conditioning}
\phi:= (x_{1} \lor x_{2} \lor x_{3}) \land (\overline{x}_{1} \lor x_{2} \lor x_{3}) \land (x_{1} \lor \overline{x}_{2} \lor x_{3}) \land (x_{1} \lor x_{2} \lor \overline{x}_{3}). \tag{1}
\end{equation}
where the notation $\lor$ represents an $\text{OR}$ operation and $\land$ represents an $\text{AND}$ operation. The satisfiability problem establishes whether there is any way to set the variables $x_{1},x_{2},x_{3}\in\{\text{true},\text{false}\}$ so that the formula $\phi$ evaluates to $\text{true}$.
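To make the problem concrete, here is a brute-force satisfiability check in Python. Real SAT solvers are far more sophisticated, but for three variables exhaustive enumeration is instructive. Clauses are written DIMACS-style, with the integer $i$ standing for $x_{i}$ and $-i$ for $\overline{x}_{i}$:

```python
from itertools import product

# The example formula from the text:
# (x1 v x2 v x3)(~x1 v x2 v x3)(x1 v ~x2 v x3)(x1 v x2 v ~x3)
phi = [[1, 2, 3], [-1, 2, 3], [1, -2, 3], [1, 2, -3]]

def is_satisfiable(clauses, num_vars):
    """Try every assignment; return a satisfying one, or None."""
    for bits in product([False, True], repeat=num_vars):
        assign = {i + 1: bits[i] for i in range(num_vars)}
        # A clause is satisfied when any literal matches its variable's value.
        if all(any(assign[abs(l)] == (l > 0) for l in clause)
               for clause in clauses):
            return assign
    return None

print(is_satisfiable(phi, 3) is not None)  # -> True: phi is satisfiable
```

Exhaustive search takes $2^{n}$ assignments for $n$ variables, which is exactly why the manipulations developed in the rest of this tutorial matter.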
In this tutorial we focus exclusively on the SAT solver algorithms that are applied to this problem. We'll start by introducing two ways to manipulate Boolean logic formulae. We'll then exploit these manipulations to develop algorithms of increasing complexity. We'll conclude with an introduction to conflict-driven clause learning, which underpins most modern SAT solvers.
SAT solvers rely on repeated algebraic manipulation of the formula that we wish to test for satisfiability. Two such manipulations are conditioning and resolution. In this section we will discuss each in turn.
In conditioning, we set a variable $x_{i}$ to a concrete value (i.e., $\text{true}$ or $\text{false}$). When we set $x_{i}$ to $\text{true}$, we can simplify the formula using two rules: (1) any clause containing the term $x_{i}$ is now satisfied and can be removed; (2) the term $\overline{x}_{i}$ is now $\text{false}$ and can be removed from any clause that contains it.
For example, consider the formula:
\begin{equation}\label{eq:example_cnf_conditioning_s}
\phi:= (x_{1} \lor x_{2} \lor x_{3}) \land (\overline{x}_{1} \lor x_{2} \lor x_{3}) \land (x_{1} \lor \overline{x}_{2} \lor x_{3}) \land (x_{1} \lor x_{2} \lor \overline{x}_{3}). \tag{2}
\end{equation}
When we set $x_{1}=$ $\text{true}$, this becomes
\begin{equation}\label{eq:example_cnf_conditioning2}
\phi \land x_{1} := (x_{2} \lor x_{3}). \tag{3}
\end{equation}
where the first, third and fourth clause have been removed as they are now satisfied (by rule 1) and the term $\overline{x}_{1}$ has been removed from the second clause as this term is now $\text{false}$ (by rule 2).
Similarly, when we condition by setting a variable to $\text{false}$ all clauses containing $\overline{x}_{i}$ disappear, as do any terms $x_{i}$ in the remaining clauses. Setting $x_{1}$ to $\text{false}$ in equation 2 gives:
\begin{equation}
\phi \land \overline{x}_{1} := (x_{2} \lor x_{3}) \land (\overline{x}_{2} \lor x_{3}) \land (x_{2} \lor \overline{x}_{3}). \tag{4}
\end{equation}
Note that variable $x_{i}$ must be either $\text{true}$ or $\text{false}$ and so:
\begin{eqnarray}
\phi &=& (\phi \land x_{i}) \lor (\phi \land \overline{x}_{i})\nonumber\\
&=& (x_{2} \lor x_{3}) \lor ((x_{2} \lor x_{3}) \land (\overline{x}_{2} \lor x_{3}) \land (x_{2} \lor \overline{x}_{3})).\nonumber \tag{5}
\end{eqnarray}
Here we apply the conditioning operation twice and the result is to remove the variable $x_{i}$ from the formula $\phi$, yielding two simpler formulae, $(\phi \land x_{i})$ and $(\phi \land \overline{x}_{i})$, which are logically $\text{OR}$ed together. Note, though, that the result is not in conjunctive normal form.
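The two conditioning rules are easy to express in code. The sketch below is our own illustration, not from the tutorial: it encodes a CNF formula as a list of clauses, each a set of signed integers where $i$ stands for $x_i$ and $-i$ for $\overline{x}_i$ (a DIMACS-style convention).

```python
def condition(formula, literal):
    """Simplify `formula` under the assumption that `literal` is true."""
    result = []
    for clause in formula:
        if literal in clause:
            continue                        # rule 1: clause satisfied, drop it
        result.append(clause - {-literal})  # rule 2: remove the now-false term
    return result

# Equation 2: (x1 v x2 v x3)(~x1 v x2 v x3)(x1 v ~x2 v x3)(x1 v x2 v ~x3)
phi = [{1, 2, 3}, {-1, 2, 3}, {1, -2, 3}, {1, 2, -3}]
print(condition(phi, 1))    # setting x1 true leaves only (x2 v x3), as in equation 3
print(condition(phi, -1))   # setting x1 false leaves the three clauses of equation 4
```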
The second common operation applied to Boolean formulae is resolution. Consider two clauses $c_{1}$ and $c_{2}$ where $x_{i}\in c_{1}$ and $\overline{x}_{i}\in c_{2}$. When we resolve by $x_{i}$, we replace these two clauses with a single clause $(c_{1}\setminus x_{i})\lor (c_{2}\setminus\overline{x}_{i})$. This clause is known as the resolvent and contains the remaining terms in $c_{1}$ and $c_{2}$ after $x_{i}$ and $\overline{x}_{i}$ are removed.
This is best illustrated with an example. Consider the formula:
\begin{equation}\label{eq:example_cnf_resolution}
\phi:= (x_{1}\lor x_{2} \lor \overline{x}_{3}) \land (\overline{x}_{2} \lor x_{4}) \land (x_{2} \lor x_{4}\lor x_{5}). \tag{6}
\end{equation}
We note that $x_{2}$ is in the first clause and $\overline{x}_{2}$ is in the second clause and so we can resolve with respect to $x_{2}$ by combining the remaining terms from the first and second clause:
\begin{equation}
\phi:= (x_{1}\lor \overline{x}_{3} \lor x_{4}) \land (x_{2} \lor x_{4}\lor x_{5}). \tag{7}
\end{equation}
Note that the third clause is unaffected by this operation.
The underlying logic is as follows. If $x_{2}$ is $\text{false}$, then for the first clause to be satisfied we must have $x_{1}\lor \overline{x}_{3}$. However, if $x_{2}$ is $\text{true}$, then for the second clause to be satisfied, we must have $x_{4}$. Since either $x_{2}$ or $\overline{x}_{2}$ must be the case, it follows that we must have $x_{1}\lor \overline{x}_{3} \lor x_{4}$.
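Using the same signed-integer clause encoding as before (an assumption of this sketch, not the article's notation), resolution is a one-liner:

```python
def resolve(c1, c2, var):
    """Resolve clause c1 (containing var) with c2 (containing -var)."""
    assert var in c1 and -var in c2
    return (c1 - {var}) | (c2 - {-var})   # the resolvent

# Equation 6: resolving (x1 v x2 v ~x3) with (~x2 v x4) on x2
# gives the resolvent (x1 v ~x3 v x4) seen in equation 7.
resolvent = resolve({1, 2, -3}, {-2, 4}, 2)
assert resolvent == {1, -3, 4}
```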
An important special case is unit resolution. Here, at least one of the clauses that we are resolving with respect to is a unit clause (i.e., only contains a single literal). For example,
\begin{equation}
\phi:= (x_{1}\lor \overline{x}_{3} \lor \overline{x}_{4}) \land x_{4}. \tag{8}
\end{equation}
Resolution between these two clauses works as normal. However, we can go further. Since we know that $x_{4}$ must be $\text{true}$ from the second clause, the effect of resolution here is the same as conditioning. We can remove all clauses containing $x_{4}$ and remove all terms $\overline{x}_{4}$ from the remaining clauses. So unit resolution can be seen as either a special case of resolution or as a conditioning operation depending how you look at it.
A unit resolution operation may create more unit clauses. In this case, we can repeatedly apply unit resolution to the expression and at each stage we eliminate one of the variables from consideration. This procedure is known as unit propagation.
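Unit propagation can be sketched as follows, again with clauses as sets of signed integers (our own encoding choice). On a contradiction the function returns `None` in place of the simplified formula:

```python
def unit_propagate(formula):
    """Repeatedly apply unit resolution until no unit clause remains.
    Returns (simplified formula, implied literals), or (None, literals)
    if a contradiction (an empty clause) is derived."""
    def condition(f, lit):  # same simplification rules as conditioning
        return [c - {-lit} for c in f if lit not in c]

    implied = []
    while True:
        unit = next((c for c in formula if len(c) == 1), None)
        if unit is None:
            return formula, implied
        (lit,) = unit
        implied.append(lit)
        formula = condition(formula, lit)
        if set() in formula:          # empty clause: contradiction
            return None, implied

# Equation 11's clauses: ~x3 & (x2 v x3) & (~x2 v x4) & (x3 v ~x4)
f, lits = unit_propagate([{-3}, {2, 3}, {-2, 4}, {3, -4}])
print(f, lits)   # None [-3, 2, 4] -- the contradiction x4 & ~x4
```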
We now present a series of algorithms that use conditioning and resolution to solve the satisfiability problem. In this section, we will use resolution to solve the 2-SAT problem and show why this can be solved in polynomial time. Then we'll introduce the directional resolution algorithm, which uses resolution to solve 3-SAT problems and above, but we'll see that this becomes more computationally complex. In the next section, we'll move to algorithms that primarily exploit the conditioning operation to solve SAT problems.
To solve a 2-SAT problem we first condition on an arbitrarily chosen variable. This sets off a unit propagation process (a chain of unit resolutions) in which variables are removed one by one. This continues until either the formula is satisfied or we are left with a contradiction $x_{i}\land \overline{x}_{i}$.
Worked example: This process is easiest to understand using a concrete example. Consider the following 2-SAT problem in four variables:
\begin{equation}
\phi:= (x_{1}\lor \overline{x}_{2}) \land (\overline{x}_{1}\lor \overline{x}_{3}) \land (x_{2} \lor x_{3}) \land (\overline{x}_{2} \lor x_{4}) \land (x_{3}\lor \overline{x}_{4}). \tag{9}
\end{equation}
We start with a single step of conditioning on an arbitrarily chosen variable. Here we'll choose $x_{1}$ and apply the formula $\phi = (\phi \land x_{1}) \lor (\phi \land \overline{x}_{1})$. We could work directly with this cumbersome expression, but in practice we set $x_{1}$ to $\text{true}$ and test for satisfiability. If this is not satisfiable, then we set $x_{1}$ to $\text{false}$ and try again; if neither is satisfiable, then the expression is not satisfiable as a whole.
Let's work through this process explicitly. Setting $x_{1}$ to $\text{true}$ gives:
\begin{equation}
\phi \land x_{1} =(x_{1}\lor \overline{x}_{2}) \land (\overline{x}_{1}\lor \overline{x}_{3}) \land (x_{2} \lor x_{3}) \land (\overline{x}_{2} \lor x_{4}) \land (x_{3}\lor \overline{x}_{4}) \land x_{1}. \tag{10}
\end{equation}
We now perform unit resolution with respect to $x_{1}$ which means removing any clauses that contain $x_{1}$ and removing $\overline{x}_{1}$ from the rest of the formula to get:
\begin{equation}
\phi \land x_{1} = \overline{x}_{3} \land (x_{2} \lor x_{3}) \land (\overline{x}_{2} \lor x_{4}) \land (x_{3} \lor \overline{x}_{4}). \tag{11}
\end{equation}
Notice that we are left with another unit clause $\overline{x}_{3}$ so we know $x_{3}$ must be $\text{false}$ and we can perform unit resolution again to yield:
\begin{equation}
\phi \land x_{1}\land \overline{x}_{3} = x_{2} \land (\overline{x}_{2} \lor x_{4})\land \overline{x}_{4}. \tag{12}
\end{equation}
This time, we have two unit clauses. We can perform unit resolution with respect to either. We'll choose $x_{2}$, so we now know that $x_{2}$ is $\text{true}$ and we get:
\begin{equation}
\phi \land x_{1}\land\overline{x}_{3}\land x_{2} = x_{4} \land \overline{x}_{4} =\text{false}. \tag{13}
\end{equation}
Clearly this is a contradiction, and so we conclude that the formula is not satisfiable if we set $x_{1}$ to $\text{true}$.
We now repeat this process with $x_{1}$ = $\text{false}$, which gives
\begin{eqnarray}
\phi \land \overline{x}_{1} &=&(x_{1}\lor \overline{x}_{2}) \land (\overline{x}_{1}\lor \overline{x}_{3}) \land (x_{2} \lor x_{3}) \land (\overline{x}_{2} \lor x_{4})\land (\overline{x}_{4} \lor x_{3}) \land \overline{x}_{1} \nonumber \\
&=& \overline{x}_{2} \land (x_{2} \lor x_{3}) \land (\overline{x}_{2} \lor x_{4}) \land (\overline{x}_{4} \lor x_{3}). \tag{14}
\end{eqnarray}
Once more, this leaves a unit clause $\overline{x}_{2}$, so we set $x_{2}$ to $\text{false}$ and perform unit resolution again to get
\begin{equation}
\phi \land \overline{x}_{1}\land \overline{x}_{2} = x_{3} \land (\overline{x}_{4} \lor x_{3}) \tag{15}
\end{equation}
which gives the unit clause $x_{3}$ and so we set $x_{3}$ to $\text{true}$. Now something different happens. The entire right-hand side disappears. Since there are no clauses left to be satisfied, the formula is satisfiable:
\begin{equation}
\phi \land \overline{x}_{1}\land \overline{x}_{2} \land x_{3}= \text{true} \tag{16}
\end{equation}
Note that the formula is satisfiable regardless of the value of $x_{4}$ (it appears on neither side of the equation) so we have found two satisfying solutions $\{\overline{x}_{1},\overline{x}_{2},x_{3},x_{4}\}$ and $\{\overline{x}_{1},\overline{x}_{2},x_{3},\overline{x}_{4}\}$.
Complexity: If there are $V$ variables, there are at most $V$ rounds of unit resolution for each of the two values of the initial conditioned variable. Each unit resolution procedure is linear in the number of clauses $C$ so the algorithm has total complexity $\mathcal{O}[CV]$.
It's possible to reach a case where the chain of unit propagation stops and we have to condition on one of the remaining variables to start it again. However, this only occurs when subsets of the variables have no interaction with one another and so it does not add to the complexity.
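The whole 2-SAT procedure described above can be sketched in a few lines. This is a simplified illustration (our own, using the signed-integer clause encoding); picking the lowest-indexed variable to condition on is one arbitrary choice among many, and restarting the loop on what remains handles the disconnected-subset case just mentioned:

```python
def solve_2sat(formula):
    """2-SAT by conditioning plus unit propagation, as in the text."""
    def condition(f, lit):
        return [c - {-lit} for c in f if lit not in c]

    def propagate(f, lit):
        """Condition on lit, then chase unit clauses until done."""
        trail = [lit]
        f = condition(f, lit)
        while set() not in f:                # empty clause = contradiction
            unit = next((c for c in f if len(c) == 1), None)
            if unit is None:
                return f, trail              # propagation chain has stopped
            (l,) = unit
            trail.append(l)
            f = condition(f, l)
        return None, None

    assignment = {}
    while formula:
        var = min(abs(l) for c in formula for l in c)  # arbitrary choice
        for lit in (var, -var):              # try true first, then false
            f, trail = propagate(formula, lit)
            if f is not None:
                break
        else:
            return None                      # contradiction both ways: UNSAT
        formula = f
        for l in trail:
            assignment[abs(l)] = l > 0
    return assignment

# Equation 9's worked example: x1 and x2 false, x3 true, x4 free.
print(solve_2sat([{1, -2}, {-1, -3}, {2, 3}, {-2, 4}, {3, -4}]))
```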
Now consider what happens if we apply the unit resolution approach above to a 3-SAT problem. When we condition on the first variable $x_{i}$, we remove clauses that contain $x_{i}$ and remove $\overline{x}_{i}$ from the rest of the clauses. Unfortunately, this doesn't create another unit clause (at best it just changes a subset of the 3-clauses to 2-clauses), and so it's not clear how to proceed.
Directional resolution is a method that uses resolution to tackle 3-SAT and above. The idea is to choose an ordering of the variables and then perform all possible resolution operations with each variable in turn before moving on. We continue until we find a contradiction or reach the end. In the latter case, we work back in the reverse order to find the values that satisfy the expression.
Worked example: Again, this is best understood via a worked example. Consider the formula:
\begin{eqnarray}
&\phi:= &(x_{1}\lor \overline{x}_{2} \lor \overline{x}_{4}) \land (\overline{x}_{1}\lor \overline{x}_{3}\lor \overline{x}_{5}) \land \nonumber \\
&&\hspace{1cm}(x_{2} \lor x_{3}\lor \overline{x}_{4}) \land (\overline{x}_{2} \lor \overline{x}_{4}\lor x_{5}) \land (\overline{x}_{3} \lor x_{4}\lor x_{5}).\nonumber \tag{17}
\end{eqnarray}
We sort the clauses into bins. Those containing $x_{1}$ or $\overline{x}_{1}$ are put in bin 1, any remaining clauses containing $x_{2}$ or $\overline{x}_{2}$ are put in bin 2, and so on:
\begin{eqnarray}
&& x_{1}: (x_{1}\lor \overline{x}_{2} \lor \overline{x}_{4}), (\overline{x}_{1}\lor \overline{x}_{3}\lor \overline{x}_{5}) \nonumber \\
&& x_{2}: (x_{2} \lor x_{3}\lor \overline{x}_{4}), (\overline{x}_{2} \lor \overline{x}_{4}\lor x_{5}) \nonumber \\
&& x_{3}: (\overline{x}_{3} \lor x_{4}\lor x_{5}) \nonumber \\
&& x_{4}: \nonumber \\
&& x_{5}: \tag{18}
\end{eqnarray}
We work through these bins in turn. For each bin we perform all possible resolutions and move the resulting generated clauses into subsequent bins. So for bin 1 we resolve the clauses $(x_{1}\lor \overline{x}_{2} \lor \overline{x}_{4})$ and $(\overline{x}_{1}\lor \overline{x}_{3}\lor \overline{x}_{5})$ with respect to $x_{1}$ to get the new clause $\color{BurntOrange} (\overline{x}_{2}\lor \overline{x}_{4}\lor \overline{x}_{3}\lor \overline{x}_{5})$. We add this to bin 2 as it contains a term in $x_{2}$:
\begin{eqnarray}
&& x_{1}: (x_{1}\lor \overline{x}_{2} \lor \overline{x}_{4}), (\overline{x}_{1}\lor \overline{x}_{3}\lor \overline{x}_{5}) \nonumber \\
&& x_{2}: (x_{2} \lor x_{3}\lor \overline{x}_{4}), (\overline{x}_{2} \lor \overline{x}_{4}\lor x_{5}), \color{BurntOrange}(\overline{x}_{2}\lor \overline{x}_{4}\lor \overline{x}_{3}\lor \overline{x}_{5}) \nonumber \\
&& x_{3}: (\overline{x}_{3} \lor x_{4}\lor x_{5}) \nonumber \\
&& x_{4}: \nonumber \\
&& x_{5}: \tag{19}
\end{eqnarray}
We then consider bin 2 and resolve the clauses with respect to $x_{2}$ in all possible ways. In bin 2 there is one clause containing $x_{2}$ and we can resolve it against the two clauses containing $\overline{x}_{2}$. This creates two new clauses. The second, $(x_{3}\lor \overline{x}_{4}\lor \overline{x}_{3}\lor \overline{x}_{5})$, contains both $x_{3}$ and $\overline{x}_{3}$ and so always evaluates to $\text{true}$; we discard it. The first is added to bin 3 since it contains a term in $x_{3}$:
\begin{eqnarray}
&& x_{1}: (x_{1}\lor \overline{x}_{2} \lor \overline{x}_{4}), (\overline{x}_{1}\lor \overline{x}_{3}\lor \overline{x}_{5}) \nonumber \\
&& x_{2}: (x_{2} \lor x_{3}\lor \overline{x}_{4}), (\overline{x}_{2} \lor \overline{x}_{4}\lor x_{5}), (\overline{x}_{2}\lor \overline{x}_{4}\lor \overline{x}_{3}\lor \overline{x}_{5}) \nonumber \\
&& x_{3}: (\overline{x}_{3} \lor x_{4}\lor x_{5}), \color{BurntOrange} (x_{3}\lor \overline{x}_{4} \lor x_{5}) \nonumber \\
&& x_{4}: \nonumber \\
&& x_{5}: \tag{20}
\end{eqnarray}
Now we consider bin 3. There are two clauses here. Resolving them with respect to $x_{3}$ creates $(x_{4}\lor \overline{x}_{4} \lor x_{5})$, which evaluates to $\text{true}$ since either $x_{4}$ or $\overline{x}_{4}$ must always be $\text{true}$, and so it is discarded and we are done. At this point, we can say that the formula is $\text{SAT}$ as we have not created any contradictions of the form $x_{i}\land\overline{x}_{i}$ during this resolution process.
Finding the certificate: To find an assignment that satisfies the expression, we work backwards through the bins, setting each variable to $\text{true}$ or $\text{false}$ in such a way that it satisfies the clauses in its bin. There are no clauses in bin 5 and so we are free to choose either value. We'll set $x_{5}$ to $\text{true}$. Similarly, there are no clauses in bin 4 and so we will arbitrarily set $x_{4}$ to $\text{true}$ as well. After these changes we have:
\begin{eqnarray}
&& x_{1}: (x_{1}\lor \overline{x}_{2} \lor \overline{x}_{4}), (\overline{x}_{1}\lor \overline{x}_{3}\lor \overline{x}_{5}) \nonumber \\
&& x_{2}: (x_{2} \lor x_{3}\lor \overline{x}_{4}), (\overline{x}_{2} \lor \overline{x}_{4}\lor x_{5}), (\overline{x}_{2}\lor \overline{x}_{4}\lor \overline{x}_{3}\lor \overline{x}_{5}) \nonumber \\
&& x_{3}: (\overline{x}_{3} \lor x_{4}\lor x_{5}), (x_{3}\lor \overline{x}_{4} \lor x_{5}) \nonumber \\
&& x_{4}: \text{true} \nonumber \\
&& x_{5}: \text{true} \tag{21}
\end{eqnarray}
Now we consider the third bin. We substitute in the values for $x_{4}$ and $x_{5}$ and see that both clauses evaluate to $\text{true}$, regardless of the value of $x_{3}$, so again, we can choose any value that we want. We'll set $x_{3}$ to $\text{false}$ to give:
\begin{eqnarray}
&& x_{1}: (x_{1}\lor \overline{x}_{2} \lor \overline{x}_{4}), (\overline{x}_{1}\lor \overline{x}_{3}\lor \overline{x}_{5}) \nonumber \\
&& x_{2}: (x_{2} \lor x_{3}\lor \overline{x}_{4}), (\overline{x}_{2} \lor \overline{x}_{4}\lor x_{5}), (\overline{x}_{2}\lor \overline{x}_{4}\lor \overline{x}_{3}\lor \overline{x}_{5}) \nonumber \\
&& x_{3}: \text{false}\nonumber \\
&& x_{4}: \text{true} \nonumber \\
&& x_{5}: \text{true} \tag{22}
\end{eqnarray}
Progressing to the second bin, we observe that the second and third clause are already satisfied by the previous assignments, but the first clause is not since $x_{3}$ is $\text{false}$ and $x_{4}$ is $\text{true}$. Consequently, we must satisfy this clause by setting $x_{2}$ to $\text{true}$:
\begin{eqnarray}
&& x_{1}: (x_{1}\lor \overline{x}_{2} \lor \overline{x}_{4}), (\overline{x}_{1}\lor \overline{x}_{3}\lor \overline{x}_{5}) \nonumber \\
&& x_{2}: \text{true} \nonumber \\
&& x_{3}: \text{false} \nonumber \\
&& x_{4}: \text{true} \nonumber \\
&& x_{5}: \text{true} \tag{23}
\end{eqnarray}
Finally, we consider the first bin. We note that the second clause is satisfied because $x_{3}$ is $\text{false}$ but the first clause is not and so to satisfy it, we must set $x_{1}$ to $\text{true}$ and now we have a satisfying example.
Complexity: The directional resolution procedure works, but is not especially efficient. For large problems, the number of clauses can expand very quickly: if there were $C$ clauses and half contained $x_{1}$ and the other half $\overline{x}_{1}$, then we could create $C^{2}/4$ new clauses in the first step. For a $K$-SAT problem, each of these clauses is larger than the original ones, with size $2(K-1)$.
It is possible to improve the efficiency. Any time we generate a unit clause, we can perform unit propagation which may eliminate many variables. Also in our example we organized the bins by the variable index, but this was an arbitrary choice. This order can have a big effect on the total computational cost and so careful selection can improve efficiency. However, even with these improvements, this approach is not considered viable for large problems.
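Directional resolution itself fits in a few lines. This sketch (our own, in the signed-integer clause encoding) processes one bucket of clauses per variable, discards tautological resolvents, and reports UNSAT if an empty resolvent ever appears:

```python
def directional_resolution(formula, order):
    """Return True (SAT) or False (UNSAT) by bucket elimination."""
    buckets = {v: [] for v in order}
    for clause in formula:                   # bin by earliest variable in the order
        buckets[min((abs(l) for l in clause), key=order.index)].append(clause)

    for v in order:
        pos = [c for c in buckets[v] if v in c]
        neg = [c for c in buckets[v] if -v in c]
        for c1 in pos:
            for c2 in neg:
                resolvent = (c1 - {v}) | (c2 - {-v})
                if not resolvent:
                    return False             # empty clause: contradiction
                if any(-l in resolvent for l in resolvent):
                    continue                 # tautology: discard
                buckets[min((abs(l) for l in resolvent),
                            key=order.index)].append(resolvent)
    return True

# A five-variable instance like the worked example is satisfiable:
phi = [{1, -2, -4}, {-1, -3, -5}, {2, 3, -4}, {-2, -4, 5}, {-3, 4, 5}]
print(directional_resolution(phi, [1, 2, 3, 4, 5]))   # True
```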
In this section, we will develop algorithms that are fundamentally centered around the conditioning operation (although they also have unit resolution embedded). We'll describe both the DPLL algorithm and clause learning algorithms which underpin most modern SAT solvers. To understand these methods, we first need to examine the connection between conditioning and tree search.
We'll use the running example of the following Boolean formula with $C=7$ clauses and $V=4$ variables:
\begin{eqnarray}\label{eq:SAT_working_example}
\phi&:=&(x_{1} \lor x_{2}) \land (x_{1} \lor \overline{x}_{2} \lor \overline{x}_{3} \lor x_{4}) \land (x_{1} \lor \overline{x}_{3} \lor \overline{x}_{4}) \land \nonumber \\
&&\hspace{0.5cm} (\overline{x}_{1} \lor x_{2} \lor \overline{x}_{3}) \land (\overline{x}_{1} \lor x_{2} \lor \overline{x}_{4}) \land (\overline{x}_{1} \lor x_{3} \lor x_{4}) \land (\overline{x}_{2} \lor x_{3}). \tag{24}
\end{eqnarray}
Consider conditioning on variable $x_{1}$ so that we have:
\begin{equation}
\phi = (\phi \land \overline{x}_{1}) \lor (\phi \land x_{1}). \tag{25}
\end{equation}
This equation makes the obvious statement that in any satisfying solution $x_{1}$ is either $\text{true}$ or $\text{false}$. We could first investigate the case where $x_{1}$ is $\text{false}$. If we establish this is $\text{SAT}$ then we are done, and if not we consider the case where $x_{1}$ is $\text{true}$. Taking this one step further, we could condition each of these two cases on $x_{2}$ to get:
\begin{equation}
\phi = ((\phi \land \overline{x}_{1}) \land \overline{x}_{2}) \lor ((\phi \land \overline{x}_{1}) \land x_{2}) \lor ((\phi \land x_{1})\land \overline{x}_{2} ) \lor ((\phi \land x_{1})\land x_{2} ). \tag{26}
\end{equation}
Now we can consider each of the four combinations $\{\overline{x}_{1}\overline{x}_{2}\},\{\overline{x}_{1}x_{2}\},\{x_{1}\overline{x}_{2}\}$ and $\{x_{1}x_{2}\}$ in turn, terminating when we find a solution that is $\text{SAT}$.
One way to visualise this process is as searching through a binary tree (figure 1). At each node of the tree we branch on one of the variables. When we reach a leaf, we have known values for each variable and we can just check if the solution is $\text{SAT}$.
This example was deliberately constructed to be pathological in that the first 14 combinations (or equivalently leaves of the tree) all make the formula evaluate to $\text{false}$. These are signified in the plot by red crosses. We number the clauses:
\begin{eqnarray}
&& 1:(x_{1} \lor x_{2}) \nonumber\\
&& 2:(x_{1} \lor \overline{x}_{2} \lor \overline{x}_{3} \lor x_{4}) \nonumber\\
&& 3:(x_{1} \lor \overline{x}_{3} \lor \overline{x}_{4}) \nonumber\\
&& 4:(\overline{x}_{1} \lor x_{2} \lor \overline{x}_{3}) \nonumber\\
&& 5:(\overline{x}_{1} \lor x_{2} \lor \overline{x}_{4}) \nonumber\\
&& 6:(\overline{x}_{1} \lor x_{3} \lor x_{4}) \nonumber\\
&& 7:(\overline{x}_{2} \lor x_{3}) \tag{27}
\end{eqnarray}
and for each leaf of the tree in figure 1, the clauses that were contradicted are indicated in grey. In this case, both of the last two combinations (leaves) satisfy the formula, and once we find the first one $(x_{1}, x_{2}, x_{3},\overline{x}_{4})$ we can return $\text{SAT}$.
Note, we have not yet obviously made the algorithm more efficient. We might still have to search all $2^{V}$ combinations of variables to establish satisfiability or lack thereof. However, viewing SAT solving as tree search is the foundation that supports more efficient algorithms.
We can immediately improve the efficiency of the binary search method by some simple bookkeeping. As we pass through the tree we keep track of which clauses are satisfied and which are not. As soon as we find one that is not satisfied, we do not need to explore further and we can backtrack. Similarly, if we find a situation where all of the clauses are already satisfied before we reach a leaf then we can return $\text{SAT}$ without exploring further. This means that the variables below this point can take any value.
In our worked example, when we pass down the first branch and set $x_{1}$ to $\text{false}$ and $x_{2}$ to $\text{false}$ we have already contradicted clause 1, which was $(x_{1} \lor x_{2})$, and so there is no reason to proceed further. Continuing in this way we only need to search a subset of the full tree (figure 2). We find the first satisfying solution when $x_{1}, x_{2}, x_{3}$ are all $\text{true}$ and need not continue to the leaf. As we saw from the full tree in figure 1, the setting of $x_{4}$ is immaterial.
We can also consider the tree search from an algebraic point of view. Each time we make a decision at a node in the tree, we are conditioning on a given variable. So when we set $x_{1}$ to $\text{false}$, the resulting formula is
\begin{eqnarray}\label{eq:sat_tree_cond}
\phi\land \overline{x}_{1} := x_{2} \land (\overline{x}_{2} \lor \overline{x}_{3} \lor x_{4}) \land (\overline{x}_{3} \lor \overline{x}_{4}) \land (\overline{x}_{2} \lor x_{3}), \tag{28}
\end{eqnarray}
where we have used the usual recipe of removing all clauses containing $\overline{x}_{1}$ and removing the term $x_{1}$ from the remaining clauses.
The Davis–Putnam–Logemann–Loveland (DPLL) algorithm takes tree search one step further by embedding unit propagation into the search algorithm (figure 3). For example, when we condition on $\overline{x}_{1}$ to yield the new expression in equation 28, we generate the unit clause $x_{2}$. We can perform unit resolution using $x_{2}$ to get:
\begin{eqnarray}
\phi\land \overline{x}_{1}\land x_{2} := (\overline{x}_{3} \lor x_{4}) \land (\overline{x}_{3} \lor \overline{x}_{4}) \land x_{3}, \tag{29}
\end{eqnarray}
which creates another unit clause $x_{3}$. Applying unit resolution again yields the contradiction $x_{4}\land \overline{x}_{4}$ and we need proceed no further.
To summarize, the DPLL algorithm consists of tree search, where we perform unit propagation whenever unit clauses are produced. Since unit resolution can be done in linear time, this is much more efficient than the tree search that it replaces.
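A compact (and deliberately naive) DPLL sketch in the same signed-integer clause encoding used earlier; real solvers add watched literals, branching heuristics and clause learning on top of this skeleton:

```python
def dpll(formula):
    """Tree search with unit propagation; returns True if satisfiable."""
    def condition(f, lit):
        return [c - {-lit} for c in f if lit not in c]

    while True:
        if set() in formula:
            return False                     # empty clause: conflict, backtrack
        if not formula:
            return True                      # no clauses left: satisfied
        unit = next((c for c in formula if len(c) == 1), None)
        if unit is None:
            break                            # no more unit propagation possible
        formula = condition(formula, next(iter(unit)))

    var = abs(next(iter(formula[0])))        # branch: try both polarities
    return dpll(formula + [{var}]) or dpll(formula + [{-var}])

# Equation 24's seven clauses over four variables:
phi = [{1, 2}, {1, -2, -3, 4}, {1, -3, -4}, {-1, 2, -3},
       {-1, 2, -4}, {-1, 3, 4}, {-2, 3}]
print(dpll(phi))                             # True
```

Branching is done here by appending a unit clause and recursing, which re-triggers the propagation loop at the top of the next call.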
Note that in our worked example, the unit propagation process always generated a contradiction or a $\text{SAT}$ solution. However, this is not necessarily the case in a larger problem. After unit resolution there will usually be non-unit clauses left containing the remaining variables, which have neither been conditioned on nor eliminated using unit resolution. At this point, we condition on the next available variable and continue down the tree, performing unit resolution when we can (figure 4).
The DPLL algorithm makes SAT solving by tree search much more efficient, but there can still be considerable wasted computation. Consider the case where we have set $x_{1}$ to $\text{false}$ and then set $x_{2}$ to $\text{false}$ (figure 5). However, imagine that there are clauses that mean that when $x_{2}$ is $\text{false}$, there is no way to set the variables $x_{3}$ and $x_{4}$ in a valid way. For example, the following combination of clauses will achieve this:
\begin{equation}
(x_{2} \lor x_{3}\lor x_{4}) \land (x_{2} \lor x_{3}\lor \overline{x}_{4}) \land (x_{2} \lor \overline{x}_{3}\lor x_{4}) \land (x_{2} \lor \overline{x}_{3}\lor \overline{x}_{4}). \tag{30}
\end{equation}
As we work through the subtree in the blue region in figure 5, we duly establish that there is no possible solution.
As we search through the tree, we will eventually come to another place where we set $x_{2}$ to $\text{false}$ and now we must work through exactly the same calculations again to establish that there is no valid solution (yellow region in figure 5). In a large problem this may happen many times.
Conflict-driven clause learning aims to reduce this redundancy. When a conflict occurs, the cause is found and we add a new clause to the original statement that prevents exploration of redundant subtrees. For example, in this simple case, we could add the clause $(x_{2})$, which would prevent exploration of trees where $x_{2}$ is $\text{false}$.
Unfortunately, the causes of a conflict are usually more complex than a single variable. To find the combinations of variables that are ultimately responsible for the conflict, we build a structure called an implication graph as we search through the tree.
Figure 6a provides a concrete example of a SAT problem where there are 11 clauses and 10 variables. Figure 6b illustrates the situation where we are midway through the DPLL search in which we have interleaved processes of conditioning (blue shaded areas) and unit resolution (yellow shaded areas). We have just established a conflict at clause 11 (at the blue arrow) which cannot be satisfied when we set $x_{5}$ to $\text{true}$.
Figure 6c is the implication graph associated with this point in the search, which contains all of the variables that we have established so far. The literals $\overline{x}_{1}, x_{2},x_{3}, x_{5}$ that we conditioned on are depicted with blue vertices and the literals $x_{4}, \overline{x}_{6},x_{10},x_{9}, x_{7}$ and $\overline{x}_{7}$ that resulted from unit propagation are shown as yellow vertices. Each edge depicts a contribution to the unit resolution process. For example, the edge between $\overline{x}_{1}$ and $x_{4}$ represents the fact that when $x_{1}$ is set to $\text{false}$, we must set $x_{4}$ to $\text{true}$. This is due to clause 1 and the edge is accordingly labelled with $c_{1}$. Similarly, we can see that $x_{10}$ has become $\text{true}$ by clause $c_{3}$ because previously in the search process $x_{1}$ and $x_{6}$ were both set to $\text{false}$.
You can see from this implication graph exactly how the conflict happened. When we condition on $x_{5}$, clause $c_{6}$ implied that $x_{7}$ must be $\text{true}$ given that $x_{2}$ and $x_{5}$ were both $\text{true}$, but clause $c_{11}$ implied that $x_{7}$ must be $\text{false}$ given that $x_{5}$ and $x_{10}$ were both $\text{true}$. So, one interpretation is that the conflict is inevitable given states $x_{2},x_{5}$ and $x_{10}$ that were inputs to these contradictory clauses.
However, this is not the only interpretation. For example, if $x_{10}$ is one of the proximal causes, then this was only set to $\text{true}$ because we previously set $x_{1}$ and $x_{6}$ to $\text{false}$. So maybe we should attribute the contradiction not to variables $x_{2},x_{5},x_{10}$ but to variables $x_{2},x_{5},\overline{x}_{6},\overline{x}_{1}$. We can use the implication graph to find alternative explanations. Any cut of the graph that separates the conditioning (blue) variables from the conflict defines an explanation (figure 7). The explanatory variables are the source vertices of the edges that were cut.
Having established a cause, we must now derive a new clause that prevents the SAT solver from exploring similar dead ends in the future. If the conflict was attributed to $\overline{x}_{1}, x_{2}, x_{5}$, then we would add the clause $(x_{1}\lor \overline{x}_{2} \lor \overline{x}_{5})$ to prevent this combination happening. We continue exploring the tree by jumping back up the tree structure to a sensible point and resuming with this new constraint.
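Turning an explanation into a learned clause is just negation of the offending partial assignment. In the signed-integer encoding used in our earlier sketches:

```python
def learned_clause(explanation):
    """Negate each literal in the conflicting partial assignment.
    E.g. the cause {~x1, x2, x5} (encoded {-1, 2, 5}) yields the
    blocking clause (x1 v ~x2 v ~x5), encoded {1, -2, -5}."""
    return {-lit for lit in explanation}

assert learned_clause({-1, 2, 5}) == {1, -2, -5}
```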
The previous discussion outlined the main ideas of conflict-driven clause learning algorithms, but there are many additional choices to be made in a modern system. For example, we must decide the order of variables to condition on. In our examples, we have done this in numerical order, but this choice was arbitrary and there is no particular reason to evaluate them in the same order as we go down different branches of the tree. Much work is devoted to developing heuristics for making this choice. For example, we might prioritize variables that are in short clauses, with the goal of triggering unit propagation earlier. Alternatively, we might prioritize variables that appear in many clauses, as this will simplify the expression a great deal.
There are also many other decisions to make. In CDCL, we must choose which of many potential explanations for a conflict is superior and decide exactly where we should jump back to in the tree. Some solvers periodically restart the solution process to avoid wasting the available computation time fruitlessly searching a single branch, and we must decide when exactly to perform these restarts. One approach to making these decisions is to use machine learning to guide the choices.
For example, Liang et al. (2016) developed a branching heuristic that uses a reward function to choose the order in which variables in a CDCL solver are considered. A reward function $r[i]$ is defined for each variable $x_{i}$:
\begin{equation}
r[i] \propto \frac{1}{numConflicts - lastConflict[i]} \tag{31}
\end{equation}
Here $numConflicts$ keeps track of the total number of conflicts the solver has encountered so far whereas $lastConflict[i]$ keeps track of the last time variable $x_{i}$ was involved in a conflict. From this we can see that a variable which was recently involved in a conflict would get a high reward. This reward is then incorporated into a score function:
\begin{equation}
Q[i] \longleftarrow (1-\alpha)Q[i] + \alpha r[i] \tag{32}
\end{equation}
At any iteration where a new variable must be selected for conditioning, the variable with the highest score is picked, provided that it is not currently already conditioned on. This is known as the conflict history branching heuristic.
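Equations 31 and 32 can be sketched as follows. The value of $\alpha$ is an assumed illustration, not taken from the paper, and note that a real implementation must guard the denominator against zero when a variable was involved in the current conflict (published CHB descriptions add a constant for this reason):

```python
ALPHA = 0.4   # step size: an assumed value for illustration

def chb_reward(num_conflicts, last_conflict_i):
    """Equation 31: recently conflicting variables earn larger rewards.
    Assumes num_conflicts > last_conflict_i so the denominator is positive."""
    return 1.0 / (num_conflicts - last_conflict_i)

def update_score(q_i, num_conflicts, last_conflict_i, alpha=ALPHA):
    """Equation 32: exponential moving average of the reward."""
    return (1 - alpha) * q_i + alpha * chb_reward(num_conflicts, last_conflict_i)

# A variable seen in the most recent conflict gets the largest reward:
print(update_score(0.0, num_conflicts=10, last_conflict_i=9))   # 0.4
```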
In the above formulation, the terms appearing in the reward will always be known and can hence be computed exactly. However, Liang et al. (2016) also dealt with a case where the reward function is defined such that the terms appearing in it have associated uncertainty. This new reward definition improves the branching heuristic, and the authors show how tools from the multi-armed bandit framework in reinforcement learning can be used to estimate the uncertainty and further improve performance.
In further work, Liang et al. (2017) show how gradient-based methods can be used to optimize another branching heuristic, one based on how many learnt clauses can be obtained from each decision. Other examples of machine learning in the SAT community include Nejati et al. (2017), where reinforcement learning is used to decide when to restart the solver.
In this blog, we introduced the resolution and conditioning operations. We then developed a series of algorithms based on these operations. We started by using resolution (via unit propagation) to efficiently solve 2-SAT and then investigated directional resolution for 3-SAT and above. We reframed SAT solving as tree search where conditioning is used at each branch in the tree. This led to the DPLL and CDCL algorithms.
For further information about SAT solving, including the DPLL and CDCL algorithms, consult the Handbook of Satisfiability. Most chapters are available on the internet if you search for their titles individually. A second useful resource is Donald Knuth's Fascicle 6.
In part III of this article, we'll investigate a completely different approach to SAT solving that relies on belief propagation in factor graphs. Finally, we'll show how the machinery of SAT solving can be extended to continuous variables by introducing satisfiability modulo theory (SMT) solvers.
With the explosive growth of AI products, we are beginning to see how AI can transform peoples’ lives. From online shopping to healthcare, and banking to climate change, the potential for AI is almost limitless. However, examples of misuse of data, biased algorithms and inaccurate decisions are fueling a general mistrust and skepticism of the technology. For society to embrace AI and ensure broader adoption of these technologies, we need to effectively evaluate model reliability and safety. Compared to the lab, the range of consequences is far wider if our models don’t perform as intended.
Building robust model governance tools is critical to model performance. They give us more ways to assess behaviour, and to ultimately gain a greater level of trust in AI. One example of this can be found in facial recognition.
A recent RBC Disruptors podcast exposed the dangers presented by biased facial recognition systems. It is widely known that machine learning algorithms tend to perform worse on images of women and people of colour [1]. To some extent this is due to biased datasets; however, we must be vigilant at all stages of the ML lifecycle. We need to rethink how we design, test, deploy, and monitor machine learning algorithms. Performance across diverse groups should be a property that models satisfy before they are deployed.
This article explains how we are researching automated model validation tools at Borealis AI. While facial recognition tasks are not currently in our model validation pipeline, the examples shown can be generalized across datasets and machine learning tasks: you just need to define the property you want to detect.
We build AI systems with the intent of capturing patterns inside of datasets. The patterns they learn, however, are determined by the dataset and the training algorithm used. Without explicit regularization, models can take shortcuts to achieve their learning objectives, and the result is illusory performance and undesirable behaviors.
One solution is to find more data, or find cleaner data. But that can be expensive, even if it is possible. What’s more, we don’t always recognize when data is contaminated.
The next best option is to ensure your model will not act adversely against a range of potential scenarios. But this is a bit of a “chicken and egg” situation: how can you do that without deploying your model in the real world first, if you only have so much data? The proactive answer is to run extensive tests. We begin from community-accepted definitions of desirable model behavior: for instance, good models should have consistent predictions around an input, and avoid making predictions based on protected attributes. We then run a search over the inputs and outputs to find violations of these properties, and return them for analysis. Actions can then be taken to improve the model, for example by retraining it to account for the violations.
At a high level, that is how our validation platform is being developed. Each test is essentially a mathematical expression which consists of the model, plus the desired property for which it is being assessed. One example is a test for adversarial robustness, as shown in figure 1.A. Here we are interested in knowing whether a tiny nudge (of size epsilon) to an input data point X can completely change a prediction made by the model. Having defined our property, we then run a solver over the expression to see if any failure examples can be found. If so, we return them to the user as examples of having failed this adversarial robustness test.
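The adversarial robustness test can be illustrated with a toy version in which random sampling inside the epsilon-ball stands in for the solver (the real platform runs a solver over the mathematical expression; every name below is hypothetical):

```python
import random

def robustness_test(model, x, eps, trials=1000, seed=0):
    """Look for an input inside the eps-ball around x whose predicted label
    differs from model(x).  Returns such a failure example, or None if the
    random search finds no violation -- which is evidence, not proof, of
    robustness; a solver can certify the whole region."""
    rng = random.Random(seed)
    label = model(x)
    for _ in range(trials):
        x_adv = [xi + rng.uniform(-eps, eps) for xi in x]
        if model(x_adv) != label:
            return x_adv          # failure example: the prediction flipped
    return None

# A toy linear classifier standing in for the model under test.
model = lambda x: int(x[0] + x[1] > 1.0)
print(robustness_test(model, [0.40, 0.40], eps=0.05))  # None: robust here
```

A point near the decision boundary, such as `[0.49, 0.49]`, instead yields a counterexample within the same epsilon-ball.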
Tests for other properties can be crafted in the same way, where the underlying theme always relies on a region around a point [2]. Varying the shape of the region corresponds to different properties such as in Figure 1.B. Our current research work involves developing methods capable of coming up with these shapes to test for notions such as fairness.
Changing the shape of the region results in a more complex search space for our solver to explore. As such, future research may involve looking into more powerful solvers: for instance, by using symmetry to avoid redundant areas of the search space.
As an ML lab within the highly regulated financial services industry, we need to demonstrate that we meet a set of strict compliance and regulatory standards. RBC’s longstanding model risk management policies and procedures form a great basis for managing the risks of AI. However, our testing methodologies have to keep pace with AI’s rapid evolution to ensure that we continue to deploy cutting-edge models safely.
Borealis AI recently launched a new online hub, RESPECT AI, providing resources to the AI community and business leaders to help them adopt responsible, safe and ethical AI practices. This program includes a focus on model governance and includes interviews with leading industry experts as well as learnings from our own research. We will continue to share our findings with our peers in the AI community, particularly in non-regulated industries where governance is far less mature.
AI is undeniably one of the biggest opportunities for our economy and innovation. However, with this opportunity comes risks that are too great to ignore. In order to meet these challenges, the AI community needs to work alongside industry, regulators, and government to push the boundaries and adapt or develop new AI validation methods tailored to the complexities of this space.
[1] Buolamwini, Joy and Timnit Gebru. "Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification." FAT (2018).
[2] Robey, Alexander et al. “Model-Based Robust Deep Learning: Generalizing to Natural, Out-of-Distribution Data.” (2020).
The views expressed in this article are those of the interviewee and do not necessarily reflect the position of RBC or Borealis AI.
Valentine Goddard (VG):
The debate about the social impacts of AI has evolved as the technology has evolved.
However, the pandemic is rapidly increasing the Digital Gap which is often linked to socioeconomic inequities. The conversation has shifted from an overemphasis on the economic efficiencies of AI to one about social resilience in the age of AI. We’re facing a historical opportunity to look into what kind of New Normal we want.
VG:
To start, I think diversity is key. We need to improve the diversity of perspectives that go into the development of AI and data analytics. I think awareness about the importance of this is slowly increasing.
Meanwhile, we’re also seeing social issues rise up the corporate agenda. An increasing number of organizations have created roles responsible for AI Ethics, but there is still much debate about what that means exactly.
I’m also encouraged by efforts to ensure that developers and employees are able to raise concerns about the fairness of the algorithms they are creating. Though we still encounter businesses that can improve when it comes to properly supporting their applied research teams in this area.
VG:
I have noticed an increasing desire on the part of businesses to participate in ‘AI for Good’ or ‘Data for Good’ initiatives. These are a great first step towards improving awareness around the social impacts of AI, but they are often just ‘one-offs’. What we need is a more sustainable approach.
That is why I have been advocating for greater partnership between the AI ecosystem and Civil Society Organizations (CSOs) as a way to drive participation and fairness in the development and implementation of AI. It will likely require more funding for CSOs as they move up the AI maturity curve, but this is not just about funding.
For those at the leading edge of AI development and governance, the inclusion of CSOs can deliver massive benefits – they can help bring diversity of perspective (seven in ten employees at socially focused nonprofits are women); they can help socialize the wider benefits of AI within their communities (often those most underserved by technology), and they can help identify emerging social issues related to the use of AI (and maybe even help solve them before they become problems).
There is also a growing body of research on the value of including CSOs in business decision-making. Academics suggest it can increase trust and accelerate the responsible adoption of new technologies. Others stress the value inherent in working with CSOs to collect relevant, high-quality data that can lead to more robust and socially beneficial results. Many simply highlight the need to include more democratic processes in regulatory innovation.
VG:
That largely depends on the sector you are working in and the stakeholders you touch, but the net should be cast wide. Before the pandemic disrupted our normal lives, my organization – AI Impact Alliance – conducted workshops throughout the year that led to the adoption by the United Nations of international public policy recommendations on the role of civil society and the Arts in digital governance. It’s essential to work towards more inclusive AI policy recommendations.
VG:
Two approaches are gaining greater adoption in the market. The first is to conduct ongoing social impact assessments on the AI models and technologies you are developing and implementing. There is no single guide to what goes into a social impact assessment, so companies will need to work with stakeholders and others to define what standards and KPIs they will measure.
We are also seeing greater adoption of social return on investment criteria, both within decisionmaking and in corporate reporting. Again, the standards vary depending on the sector and market. But they can be a useful tool for measuring progress.
VG:
I would argue that public institutions need to take a lead role in supporting education and digital literacy – particularly around the ethical, social, legal, economic and political implications of AI. They need to be encouraging the adoption of democratic AI processes and normative frameworks. They need to be tackling the roots of the digital divide. And they need to be supporting new forms of democratic participation and civic engagement in the field of AI.
The government can also play a role in incentivising businesses for responsible behaviour. They could take a heavy hand – making social impact assessments compulsory, for example – or they could take a more collaborative approach by ensuring the right stakeholders are at the table and that social return on investment is recognized and valued.
VG:
At an individual level, we’ve enjoyed tremendous support from developers, researchers and scientists who want to contribute to the debate about the impact of AI on society. I have also seen AI developers and researchers donate their time directly to CSOs to help build data literacy and management capacity.
I think more broadly, the AI community needs to continue to focus on addressing the root causes of digital inequality. I think we need to be aware of how the models we are developing for the digital economy can sometimes become a driver of ethical problems. And we need to be more supported when we see problems emerging.
I think the AI community wants to build socially responsible models and technologies. They just need the tools, frameworks and encouragement to go do it.
Valentine Goddard is the founder and executive director of AI Impact Alliance, an independent nonprofit organization operating globally, whose mission is to facilitate an ethical and responsible implementation of artificial intelligence. She is a member of the United Nations Expert Groups on The Role of Public Institutions in the Transformative Impact of New Technologies, and on a “Socially just transition towards sustainable development: The role of digital technologies on social development and wellbeing of all”. Ms. Goddard sits on several committees related to the ethical and social impact of AI and contributes to public policy recommendations related to the ethical and normative framework of AI.
This tutorial concerns the Boolean satisfiability or SAT problem. We are given a formula containing binary variables that are connected by logical relations such as $\text{OR}$ and $\text{AND}$. We aim to establish whether there is any way to set these variables so that the formula evaluates to $\text{true}$. Algorithms that are applied to this problem are known as SAT solvers.
The tutorial is divided into three parts. In part I, we introduce Boolean logic and the SAT problem. We discuss how to transform SAT problems into a standard form that is amenable to algorithmic manipulation. We categorize types of SAT solvers and present two naïve algorithms. We introduce several SAT constructions, which can be thought of as common subroutines for SAT problems. Finally, we present some applications; the Boolean satisfiability problem may seem abstract, but as we shall see it has many practical uses.
In part II of the tutorial, we will dig more deeply into the internals of modern SAT solver algorithms. In part III, we recast SAT solving in terms of message passing on factor graphs. We also discuss satisfiability modulo theory (SMT) solvers, which extend the machinery of SAT solvers to solve more general problems involving continuous variables.
The relevance of SAT solvers to machine learning is not immediately obvious. However, there are two direct connections. First, machine learning algorithms rely on optimization. SAT can also be considered an optimization problem and SAT solvers can find global optima without relying on gradients. Indeed, in this tutorial, we'll show how to fit both neural networks and decision trees using SAT solvers.
Second, machine learning techniques are often used as components of SAT solvers; in part II of this tutorial, we'll discuss how reinforcement learning can be used to speed up SAT solving, and in part III we will show that there is a close connection between factor graphs and SAT solvers and that belief propagation algorithms can be used to solve satisfiability problems.
In this section, we define a set of Boolean operators and show how they are combined into Boolean logic formulae. Then we introduce the Boolean satisfiability problem.
Boolean operators are standard functions that take one or more binary variables as input and return a single binary output. Hence, they can be defined by truth tables in which we enumerate every combination of inputs and define the output for each (figure 1). Common logical operators include:
A Boolean logic formula $\phi$ takes a set of $I$ variables $\{x_{i}\}_{i=1}^{I}\in\{$$\text{false}$,$\text{true}$$\}$ and combines them using Boolean operators, returning $\text{true}$ or $\text{false}$. For example:
\begin{equation}
\phi:= (x_{1}\Rightarrow (\lnot x_{2}\land x_{3})) \land (x_{2} \Leftrightarrow (\lnot x_{3} \lor x_{1})). \tag{1}
\end{equation}
For any combination of input variables $x_{1},x_{2},x_{3}\in\{$$\text{false}$,$\text{true}$$\}$, we could evaluate this formula and see if it returns $\text{true}$ or $\text{false}$. Notice that even for this simple example with three variables it is hard to see what the answer will be by inspection.
The Boolean satisfiability problem asks whether there is at least one combination of binary input variables $x_{i}\in\{$$\text{false}$,$\text{true}$$\}$ for which a Boolean logic formula returns $\text{true}$. When this is the case, we say the formula is satisfiable.
A SAT solver is an algorithm for establishing satisfiability. It takes the Boolean logic formula as input and returns $\text{SAT}$ if it finds a combination of variables that can satisfy it or $\text{UNSAT}$ if it can demonstrate that no such combination exists. In addition, it may sometimes return without an answer if it cannot determine whether the problem is $\text{SAT}$ or $\text{UNSAT}$.
To solve the SAT problem, we first convert the Boolean logic formula to a standard form that is more amenable to algorithmic manipulation. Any formula can be rewritten as a conjunction of disjunctions (i.e., the logical $\text{AND}$ of statements containing $\text{OR}$ relations). This is known as conjunctive normal form. For example:
\begin{equation}\label{eq:example_cnf}
\phi:= (x_{1} \lor x_{2} \lor x_{3}) \land (\lnot x_{1} \lor x_{2} \lor x_{3}) \land (x_{1} \lor \lnot x_{2} \lor x_{3}) \land (x_{1} \lor x_{2} \lor \lnot x_{3}). \tag{2}
\end{equation}
Each term in brackets is known as a clause and combines together variables and their complements with a series of logical $\text{OR}$s. The clauses themselves are combined via $\text{AND}$ relations.
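Before moving on, it may help to see how a formula in conjunctive normal form is typically represented in code. A common convention (used by the DIMACS file format) encodes the literal $x_i$ as the integer $i$ and its complement as $-i$; the sketch below applies it to equation 2:

```python
# DIMACS-style convention: the integer i stands for the literal x_i,
# and -i for its complement.  Equation 2 becomes:
phi = [[1, 2, 3], [-1, 2, 3], [1, -2, 3], [1, 2, -3]]

def evaluate(cnf, assignment):
    """A CNF formula holds iff every clause contains a satisfied literal.
    `assignment` maps a variable index to a bool."""
    return all(any(assignment[abs(l)] == (l > 0) for l in clause)
               for clause in cnf)

print(evaluate(phi, {1: True, 2: True, 3: True}))     # True
print(evaluate(phi, {1: False, 2: False, 3: False}))  # False
```

This flat list-of-clauses structure is why conjunctive normal form is so convenient for solvers: satisfying the formula reduces to satisfying every clause independently.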
The Tseitin transformation converts an arbitrary logic formula to conjunctive normal form. The approach is to (i) associate new variables with subparts of the formula using logical equivalence relations, (ii) restate the formula by logically $\text{AND}$ing these new variables together, and finally (iii) manipulate each of the equivalence relations so that they themselves are in conjunctive normal form.
This process is most easily understood with a concrete example. Consider the conversion of the formula:
\begin{equation}
\phi:= ((x_{1} \lor x_{2}) \Leftrightarrow x_{3}) \Rightarrow (\lnot x_{4}). \tag{3}
\end{equation}
Step 1: We associate new binary variables $y_{i}$ with the subparts of the original formula using the $\text{EQUIVALENCE}$ operator:
\begin{eqnarray}\label{eq:tseitin}
y_{1} &\Leftrightarrow &(x_{1} \lor x_{2})\nonumber \\
y_{2} &\Leftrightarrow &(y_{1} \Leftrightarrow x_{3}) \nonumber \\
y_{3} &\Leftrightarrow &\lnot x_{4}\nonumber \\
y_{4} &\Leftrightarrow &(y_{2} \Rightarrow y_{3}). \tag{4}
\end{eqnarray}
We work from the inside out (i.e., from the deepest brackets to the least deep) and choose subformulae that contain a single operator ($\lor, \land, \lnot, \Rightarrow$ or $\Leftrightarrow$).
Step 2: We restate the formula in terms of these relations. The full original statement is now represented by $y_{4}$ together with the definitions of $y_{1},y_{2},y_{3},y_{4}$ in equations 4. So the statement is $\text{true}$ when we combine all of these relations with logical $\text{AND}$ relations. Working backwards we get:
\begin{eqnarray}\label{eq:tseitin_stage2}
\phi&=& y_{4} \land (y_{4} \Leftrightarrow (y_{2} \Rightarrow y_{3})) \nonumber \\&&\hspace{0.4cm}\land (y_{3} \Leftrightarrow \lnot x_{4})\nonumber\\&& \hspace{0.4cm}\land (y_{2} \Leftrightarrow (y_{1} \Leftrightarrow x_{3}))\nonumber
\\&&\hspace{0.4cm}\land (y_{1} \Leftrightarrow (x_{1} \lor x_{2})). \tag{5}
\end{eqnarray}
This is getting closer to the conjunctive normal form as it is now a conjunction (logical $\text{AND}$) of different terms.
Step 3: We convert each of these individual terms to conjunctive normal form. In practice, there is a recipe for each type of operator:
\begin{eqnarray}
a \Leftrightarrow (\lnot b) & = & (a \lor b) \land (\lnot a \lor \lnot b) \nonumber \\
a \Leftrightarrow (b \lor c) &= & (a\lor \lnot b) \land (a \lor \lnot c) \land (\lnot a \lor b \lor c) \nonumber \\
a \Leftrightarrow (b \land c) & = & (\lnot a \lor b) \land (\lnot a \lor c) \land (a \lor \lnot b \lor \lnot c) \nonumber \\
a \Leftrightarrow (b \Rightarrow c) & = & (a \lor b) \land (a \lor \lnot c) \land (\lnot a \lor \lnot b \lor c) \nonumber \\
a \Leftrightarrow (b \Leftrightarrow c) & = & (\lnot a \lor \lnot b \lor c)\land (\lnot a \lor b \lor \lnot c) \land (a \lor \lnot b \lor \lnot c) \land (a\lor b\lor c).\nonumber \tag{6}
\end{eqnarray}
The first of these recipes is easy to understand. If $a$ is $\text{true}$ then the first clause is satisfied, but the second can only be satisfied by having $\lnot b$. If $a$ is $\text{false}$ then the second clause is satisfied, but the first clause can only be satisfied by $b$. Hence when $a$ is $\text{true}$, $\lnot b$ is $\text{true}$ and when $a$ is $\text{false}$, $\lnot b$ is $\text{false}$ and so $a \Leftrightarrow (\lnot b)$ as required.
The remaining recipes are not obvious, but you can confirm that they are correct by writing out the truth tables for the left and right sides of each expression and confirming that they are the same. Applying the recipes to equation 5 we get the final expression in conjunctive normal form:
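This truth-table check is easy to mechanize. The snippet below verifies the implication recipe from equation 6 by enumerating all eight assignments (the helper names are ours, for illustration only):

```python
from itertools import product

implies = lambda p, q: (not p) or q
iff = lambda p, q: p == q

def equivalent(f, g, n):
    """Two n-variable Boolean formulae are equal iff they agree on every
    combination of inputs."""
    return all(f(*v) == g(*v) for v in product([False, True], repeat=n))

# Check the implication recipe of equation 6: a <=> (b => c).
lhs = lambda a, b, c: iff(a, implies(b, c))
rhs = lambda a, b, c: (a or b) and (a or not c) and (not a or not b or c)
print(equivalent(lhs, rhs, 3))   # True
```

The same `equivalent` helper confirms each of the other recipes by swapping in the corresponding left and right sides.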
\begin{eqnarray}\label{eq:tseitin_stage3}
\phi\!\!&\!\!:=& y_{4} \land (y_4\lor y_2) \land (y_4 \lor \lnot y_3) \land (\lnot y_4 \lor \lnot y_2 \lor y_3)\nonumber \\
&&\hspace{0.4cm}\land (y_3 \lor x_4) \land (\lnot y_3 \lor \lnot x_4)\nonumber\\
&& \hspace{0.4cm}\land (\lnot y_2 \lor \lnot y_1 \lor x_3)\land (\lnot y_2 \lor y_1 \lor \lnot x_3) \land (y_2 \lor \lnot y_1 \lor \lnot x_3) \land (y_2\lor y_1\lor x_3)\nonumber \\&&
\hspace{0.4cm}\land (y_1\lor \lnot x_1) \land (y_1 \lor \lnot x_2) \land (\lnot y_1 \lor x_1 \lor x_2). \tag{7}
\end{eqnarray}
In the conjunctive normal form, each clause is a disjunction (logical $\text{OR}$) of variables and their complements. For neatness, we will write the complement $\lnot x$ of a variable as $\overline{x}$, so instead of writing:
\begin{equation}
\phi:= (x_{1} \lor x_{2} \lor x_{3}) \land (\lnot x_{1} \lor x_{2} \lor x_{3}) \land (x_{1} \lor \lnot x_{2} \lor x_{3}) \land (x_{1} \lor x_{2} \lor \lnot x_{3}), \tag{8}
\end{equation}
we write:
\begin{equation}\label{eq:example_cnf2}
\phi:= (x_{1} \lor x_{2} \lor x_{3}) \land (\overline{x}_{1} \lor x_{2} \lor x_{3}) \land (x_{1} \lor \overline{x}_{2} \lor x_{3}) \land (x_{1} \lor x_{2} \lor \overline{x}_{3}). \tag{9}
\end{equation}
We collectively refer to the variables and their complements as literals and so this formula contains literals $x_{1},\overline{x}_{1},x_{2},\overline{x}_{2}, x_{3}$ and $\overline{x}_{3}.$
When expressed in conjunctive normal form, we can characterise the problem in terms of the number of variables, the number of clauses and the size of those clauses. To facilitate this we introduce the following terminology:
SAT solvers are algorithms that establish whether a Boolean expression is satisfiable and they can be classified into two types. Complete algorithms guarantee to return $\text{SAT}$ or $\text{UNSAT}$ (although they may take an impractically long time to do so). Incomplete algorithms return $\text{SAT}$ or return $\text{UNKNOWN}$ (i.e. return without providing an answer). If they find a solution that satisfies the expression then all is good, but if they don't then we can draw no conclusions.
Here are two naïve algorithms that will help you understand the difference:
When a solver returns $\text{SAT}$ or $\text{UNSAT}$, it also returns a certificate, which can be used to check the result with a simpler algorithm. If the solver returns $\text{SAT}$, then the certificate will be a set of variables that obey the formula. These can obviously be checked by simply computing the formula with them and checking that it returns $\text{true}$ . If it returns $\text{UNSAT}$ then the certificate will usually be a complex data structure that depends on the solver.
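As a concrete baseline, here is the exhaustive-search idea as a tiny complete solver: it tries every assignment, returning $\text{SAT}$ with the satisfying assignment as its certificate, or $\text{UNSAT}$ after exhausting the space. It is exponential in the number of variables, so this is for illustration only:

```python
from itertools import product

def brute_force_sat(cnf, num_vars):
    """Naive complete solver: enumerate all 2^n assignments.  Clauses are
    lists of integers, with i meaning x_i and -i its complement."""
    for bits in product([False, True], repeat=num_vars):
        assignment = {i + 1: b for i, b in enumerate(bits)}
        if all(any(assignment[abs(l)] == (l > 0) for l in c) for c in cnf):
            return 'SAT', assignment   # the assignment doubles as the certificate
    return 'UNSAT', None

# (x1 v -x2) ^ (x2) is satisfiable; (x1) ^ (-x1) is not.
print(brute_force_sat([[1, -2], [2]], 2))   # ('SAT', {1: True, 2: True})
print(brute_force_sat([[1], [-1]], 1))      # ('UNSAT', None)
```

Checking a $\text{SAT}$ certificate is the cheap inner loop of this solver; producing a checkable $\text{UNSAT}$ certificate is what real solvers must work much harder for.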
First, the bad news. The SAT problem is proven to be NP-complete and it follows that there is no known polynomial algorithm for establishing satisfiability in the general case. An important exception to this statement is 2SAT, for which a polynomial algorithm is known. However, for 3SAT and above the problem is very difficult.
The good news is that modern SAT solvers are very efficient and can often solve problems involving tens of thousands of variables and millions of clauses in practice. In part II of this tutorial we will explain how these algorithms work.
Until now we have focused on the satisfiability problem, in which we try to establish if there is at least one set of literals that makes a given statement evaluate to $\text{true}$. We note that there are also a number of closely related problems:
UNSAT: In the UNSAT problem we aim to show that there is no combination of literals that satisfies the formula. This is subtly different from SAT, where algorithms return as soon as they find literals that show the formula is $\text{SAT}$, but may take exponential time if they cannot find a solution. For the UNSAT problem, the converse is true. The algorithm will return as soon as it establishes the formula is not $\text{UNSAT}$, but may take exponential time to show that it is $\text{UNSAT}$.
Model counting: In model counting (sometimes referred to as #SAT or #CSP), our goal is to count the number of distinct sets of literals that satisfy the formula.
MaxSAT: In MaxSAT, it may be the case that a formula is $\text{UNSAT}$ but we aim to find a solution that minimizes the number of clauses that are invalid.
Weighted MaxSAT: This is a variation of MaxSAT in which we pay a different penalty for each clause when it is invalid. We wish to find the solution that incurs the least penalty.
For the rest of this tutorial, we'll concentrate on the main SAT problem, but we'll return to these related problems in part III of this tutorial when we discuss factor graph methods.
Most of the remainder of part I of this tutorial is devoted to discussing practical applications of satisfiability problems. Based on the discussion thus far, the reader would be forgiven for being sceptical about how this rather abstract problem can find real-world uses. We will attempt to convince you that it can! However, before we can do this, it will be helpful to review commonly-used SAT constructions.
SAT constructions can be thought of as subroutines for Boolean logic expressions. A common situation is that we have a set of variables $x_{1},x_{2},x_{3},\ldots$ and we want to enforce a collective constraint on their values. In this section, we'll discuss how to enforce the constraints that they are all the same, that exactly one of them is $\text{true}$, that no more than $K$ of them are true or that exactly $K$ of them are true.
To enforce the constraint that a set of variables $x_{1},x_{2}$ and $x_{3}$ are either all $\text{true}$ or all $\text{false}$ we simply take the logical $\text{OR}$ of these two cases so we have:
\begin{equation}
\mbox{Same}[x_{1},x_{2},x_{3}]:= (x_{1}\land x_{2}\land x_{3})\lor(\overline{x}_{1}\land \overline{x}_{2}\land \overline{x}_{3}). \tag{10}
\end{equation}
Note that this is not in conjunctive normal form (the $\text{AND}$ and $\text{OR}$s are the wrong way around) but could be converted via the Tseitin transformation.
To enforce the constraint that only one of a set of variables $x_{1},x_{2}$ and $x_{3}$ is true and the other two are false, we add two constraints. First we ensure that at least one variable is $\text{true}$ by logically $\text{OR}$ing the variables together:
\begin{equation}
\phi_{1}:= x_{1}\lor x_{2} \lor x_{3}. \tag{11}
\end{equation}
Then we add a constraint that indicates that both members of any pair of variables cannot be simultaneously $\text{true}$:
\begin{equation}\label{eq:exactly_one}
\mbox{ExactlyOne}[x_{1},x_{2},x_{3}]:= \phi_{1}\land \lnot (x_{1}\land x_{2}) \land \lnot (x_{1}\land x_{3}) \land \lnot (x_{2}\land x_{3}) . \tag{12}
\end{equation}
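Generating these clauses mechanically is straightforward. A sketch (integer literals, with $-i$ denoting $\overline{x}_{i}$):

```python
from itertools import combinations

def exactly_one(variables):
    """Clauses forcing exactly one of `variables` true: the at-least-one
    clause phi_1 of equation 11, plus one pairwise-exclusion clause per
    pair as in equation 12 (not(x_a ^ x_b) rewritten as -x_a v -x_b)."""
    clauses = [list(variables)]
    clauses += [[-a, -b] for a, b in combinations(variables, 2)]
    return clauses

print(exactly_one([1, 2, 3]))
# [[1, 2, 3], [-1, -2], [-1, -3], [-2, -3]]
```

Note that this pairwise encoding grows quadratically in the number of variables; larger instances usually use more compact encodings such as the sequential counter described next.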
There are many standard ways to enforce the constraint that at least $K$ of a set of variables are $\text{true}$. We'll present one method which is a simplified version of the sequential counter encoding.
The idea is straightforward. If we have $J$ variables $x_{1},x_{2},\ldots x_{J}$ and wish to test if $K$ or more are $\text{true}$, we construct a $J\times K$ matrix containing new binary variables $r_{j,k}$ (figures 2b and d). The $j^{th}$ row of the table contains a count of the number of $\text{true}$ elements we have seen in $x_{1\ldots j}$. So, if we have seen 3 variables that are $\text{true}$ in the first $j$ elements, the $j^{th}$ row will start with 3 $\text{true}$ elements and finish with $K-3$ $\text{false}$ elements.
If at least $K$ variables are $\text{true}$, then the bottom-right variable $r_{J,K}$ in this table must be $\text{true}$, and so in practice we would add a clause $(r_{J,K})$ stating that this bottom-right element must be $\text{true}$ to enforce the constraint. When this element is $\text{false}$, the solver will search for a different solution where $\mathbf{x}$ does have at least $K$ $\text{true}$ elements, or return $\text{UNSAT}$ if it cannot find one. By the same logic, to enforce the constraint that there are fewer than $K$ $\text{true}$ elements, we add a clause $(\overline{r}_{J,K})$ stating that the bottom-right variable is $\text{false}$.
The table constructed in figure 2d also shows us how to constrain the data to have exactly $K$ $\text{true}$ values. In this case, we expect the bottom-right element to be $\text{true}$ but the element above it to be $\text{false}$, so we add the clause $(r_{J,K}\land \overline{r}_{J-1,K})$. Figure 3 provides more detail about how we add extra clauses to the SAT formula that build these tables.
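The semantics of the table can be checked directly. The sketch below computes $r_{j,k}$ row by row and confirms that $r_{j,k}$ holds exactly when at least $k$ of the first $j$ variables are $\text{true}$; it simulates what the added clauses encode, rather than generating the clauses themselves:

```python
def counter_table(x, K):
    """r[j][k] is true iff at least k of x[0..j-1] are true -- the table that
    the sequential counter clauses encode (row and column 0 are padding;
    'at least 0 true' always holds)."""
    J = len(x)
    r = [[k == 0 for k in range(K + 1)] for _ in range(J + 1)]
    for j in range(1, J + 1):
        for k in range(1, K + 1):
            # the count reaches k either without x_j, or with x_j on top of k-1
            r[j][k] = r[j - 1][k] or (x[j - 1] and r[j - 1][k - 1])
    return r

# With 3 of 4 inputs true, the bottom-right entry r[J][K] holds for K = 2:
r = counter_table([True, False, True, True], 2)
print(r[4][2])   # True: adding the unit clause (r_{J,K}) enforces "at least 2"
```

Each entry depends only on the row above it, which is what makes the clause-based version of this table compact.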
Armed with these SAT constructions, we'll now present two complementary ways of thinking about SAT applications. The goal is to inspire the novice reader to see the applicability to their own problems. In the next section, we'll consider SAT in terms of constraint satisfaction problems and in the section following that, we'll discuss it in terms of model fitting.
The constraint satisfaction viewpoint considers combinatorial problems where there are a very large number of potential solutions, but most of those solutions are ruled out by some prespecified constraints. To make this explicit, we'll consider the two examples of graph coloring and scheduling.
In the graph coloring problem (figure 4) we are given a graph consisting of a set of vertices and edges. We want to associate each vertex with a color in such a way that every pair of vertices connected by an edge have different colors. We might also want to know how many colors are necessary to find a valid solution. Note that this maps to our description of the generic constraint satisfaction problem; there are a large number of possible assignments of colors, but many of these are ruled out by the constraint that neighboring colors must be different.
To encode this as a SAT problem, we'll choose the number of colors $C$ to test. Then we create binary variables $x_{c,v}$ which will be $\text{true}$ if vertex $v$ is colored with color $c$. We then encode the constraint that each vertex can only have exactly one color using the construction $\mbox{ExactlyOne}[x_{\bullet, v}]$ from equation 12. We also add constraints to ensure that neighbours have different colors. Formally, this means that $x_{c,v}\Rightarrow \lnot x_{c,v'}$ for every color $c$ and neighbour $v'$ of vertex $v$.
Having set up the problem, we run the SAT solver. If it returns $\text{UNSAT}$ this means we need more colors. If it returns $\text{SAT}$ with a concrete coloring, then we have an answer. We can find the minimum number of colors required by using binary search over the number of colors to find the point where the problem changes from $\text{SAT}$ to $\text{UNSAT}$.
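The whole encoding fits in a few lines. The sketch below builds the CNF for a $C$-coloring and, standing in for a real SAT solver, checks satisfiability by brute force on a triangle graph (integer literals, with $-i$ for a complement):

```python
from itertools import product, combinations

def coloring_cnf(num_vertices, edges, C):
    """CNF for C-coloring: 'vertex v has color c' is variable v*C + c + 1."""
    var = lambda v, c: v * C + c + 1
    cnf = []
    for v in range(num_vertices):
        cnf.append([var(v, c) for c in range(C)])   # at least one color
        cnf += [[-var(v, a), -var(v, b)]            # at most one color
                for a, b in combinations(range(C), 2)]
    for (u, v) in edges:
        cnf += [[-var(u, c), -var(v, c)] for c in range(C)]  # neighbours differ
    return cnf

def solvable(cnf, n):
    """Brute-force satisfiability check, standing in for a real solver."""
    return any(all(any(bits[abs(l) - 1] == (l > 0) for l in cl) for cl in cnf)
               for bits in product([False, True], repeat=n))

triangle = [(0, 1), (1, 2), (0, 2)]
print(solvable(coloring_cnf(3, triangle, 2), 6))  # False: a triangle needs 3 colors
print(solvable(coloring_cnf(3, triangle, 3), 9))  # True
```

The flip from $\text{UNSAT}$ at $C=2$ to $\text{SAT}$ at $C=3$ is exactly the transition the binary search over colors looks for.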
The graph coloring problem is a rather artificial computer science example, but many realworld problems can similarly be expressed in terms of satisfiability. For example, consider scheduling courses in a university. We have a number of professors, each of whom teach several different courses. We have a number of classrooms. We have a number of possible timeslots in each classroom. Finally, we have the students themselves, who are each signed up to a different subset of courses. We can use the SAT machinery to decide which course will be taught in which classroom and in what timeslot so that no clashes occur.
In practice, this is done by defining binary variables describing the known relations between the real world quantities. For example, we might have variables $x_{i,j}$ indicating that student $i$ takes course $j$. Then we encode the relevant constraints: no teacher can teach two classes simultaneously, no student can be in two classes simultaneously, no room can host more than one class simultaneously, and so on. The details are left as an exercise to the reader, but the similarity to the graph coloring problem is clear.
A second way to think about satisfiability is in terms of function fitting. Here, there is a clear connection to machine learning in which we fit complex functions (i.e., models) to training data. In fact, there is a simple relationship between function fitting and constraint satisfaction; when we fit a model, we can consider the parameters as unknown variables, and each training data/label pair represents a constraint on the values those parameters can take. In this section, we'll consider fitting binary neural networks and decision trees.
Binary neural networks are nets in which both the weights and activations are binary. Their performance can be surprisingly good, and their implementation can be extremely efficient. We'll show how to fit a binary neural network using SAT.
Following Mezard and Mora (2008) we consider a one-layer binary network with $K$ neurons. The network takes a $J$-dimensional data example $\mathbf{x}$ with elements $x_{j}\in\{-1,1\}$ and computes a label $y\in\{-1,1\}$, using the function:
\begin{equation}\label{eq:one_layer}
y = \mbox{sign}\left[\sum_{j=1}^{J}\phi_{j}x_{j}\right] \tag{13}
\end{equation}
where the unknown model parameters $\phi_{j}$ are also binary and the function $\mbox{sign}[\bullet]$ returns $+1$ or $-1$ (figure 5) based on the sign of the summed terms.
Given a training set of $I$ data/label pairs $\{\mathbf{x}_{i},y_{i}\}$, our goal is to choose the model parameters $\phi_{j}$. We'll force all of the training examples to be classified correctly and so each training example/label pair can be considered a hard constraint on the parameters.
To encode these constraints, we create new variables $z_{i,j}$ that indicate whether the product $\phi_{j}x_{i,j}$ is positive. This happens when either both elements are positive or both are negative, so we can use the $\mbox{Same}[\phi_{j},x_{i,j}]$ construction. Note that for the rest of this discussion we'll revert to the convention that $x_{i,j}, y_{i}\in\{\text{false},\text{true}\}$.
The sum of the elements $z_{i,\bullet}$ determines the predicted label: it will be positive when more than half of the product terms evaluate to $\text{true}$, and negative when fewer than half are $\text{true}$. Hence, for the network to predict the correct output label $y_{i}$ we require
\begin{equation}
\left(y_{i} \land \mbox{AtLeastK}[\mathbf{z}_{i}]\right)\lor \left(\overline{y}_{i} \land \lnot\mbox{AtLeastK}[\mathbf{z}_{i}]\right) \tag{14}
\end{equation}
where $K=J/2$ and the vector $\mathbf{z}_{i}$ contains the product terms $z_{i,\bullet}$.
We have one such constraint for each training example and we logically $\text{AND}$ these together. When we run the SAT solver we are asking whether it is possible to find a set of parameters $\boldsymbol\phi$ for which all of these constraints are met.
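As a sanity check, the whole pipeline can be sketched by brute force. A real encoding would build the Same and AtLeastK constructions in CNF and call a SAT solver; here we simply enumerate all $2^J$ binary parameter vectors and test the hard constraints directly. The toy data below is purely illustrative, and $J$ is odd so the pre-activation sum is never exactly zero.

```python
from itertools import product

def all_correct(phi, X, y):
    """Hard constraint: every training pair must satisfy
    y_i = sign(sum_j phi_j * x_ij)."""
    return all(
        (sum(p * xj for p, xj in zip(phi, x)) > 0) == (label > 0)
        for x, label in zip(X, y)
    )

def fit_binary_net(X, y):
    """Brute-force the SAT question: enumerate all 2^J binary parameter
    vectors and return the first one satisfying every constraint,
    or None if the problem is UNSAT."""
    J = len(X[0])
    for phi in product([-1, 1], repeat=J):
        if all_correct(phi, X, y):
            return phi
    return None

# Toy training set, realizable by phi = (1, 1, -1).
X = [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, -1)]
y = [1, 1, 1, -1]
print(fit_binary_net(X, y))  # (1, 1, -1)
```

If `fit_binary_net` returned `None`, the constraints would be unsatisfiable, which is exactly the UNSAT answer a solver would give.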
It is easy to extend this example to multi-layer networks and to allow a certain amount of training error; we leave these extensions as exercises for the reader.
A binary decision tree also classifies data $\mathbf{x}_{i}$ into binary labels $y_{i}\in\{0,1\}$. Each data example $\mathbf{x}_{i}$ starts at the root. It then passes to either the left or right branch of the tree by testing one of its elements $x_{i,j}$. We'll consider binary data $x_{i,j}\in\{$$\text{false}$, $\text{true}$$\}$ and adopt the convention that the data example passes left if $x_{i,j}$ is $\text{false}$ and right if $x_{i,j}$ is $\text{true}$. This procedure continues, testing a different value of $x_{i,j}$ at each node in the tree until we reach a leaf node at which a binary output label is assigned.
Learning the binary decision tree can also be framed as a satisfiability problem. From a training perspective, we would like to select the tree structure so that the training examples $\mathbf{x}_{i}$ that reach each leaf node have labels $y_{i}$ that are all $\text{true}$ or all $\text{false}$ and hence the training classification performance is 100%.
We'll develop a simplified version of the approach of Narodytska et al. (2018). Incredibly, we can learn both the structure of the tree and which features to branch on simultaneously. When we run the SAT solver for a given number $N$ of tree nodes, it will search over the space of all tree structures and branching features, returning $\text{SAT}$ together with a concrete tree if it is possible to classify all the training examples correctly. By changing the number of tree nodes, we can find the point at which this problem turns from $\text{SAT}$ to $\text{UNSAT}$ and hence find the smallest possible tree that classifies the training data correctly.
We'll describe the SAT construction in two parts. First we'll describe how to encode the structure of the tree as a set of logical relations and then we'll discuss how to choose branching features that classify the data correctly.
Tree structure: We create $N$ binary variables $v_{n}$ that indicate if each of the $N$ nodes is a leaf. Similarly, we create $N^{2}$ binary variables $l_{m,n}$ indicating if node $n$ is the left child of node $m$ and $N^{2}$ binary variables $r_{m,n}$ indicating if node $n$ is the right child of node $m$. Then we build Boolean expressions to enforce the following constraints:
Any set of variables $v_{n}$, $l_{m,n}$, $r_{m,n}$ that obey these constraints form a valid tree, and we can find such a configuration with a SAT solver. Two such trees are illustrated in figure 6.
Classification: The second part of the construction ensures that the data examples $\mathbf{x}_{i}$ are classified correctly (figure 7). We introduce variables $f_{n,j}$ that indicate that node $n$ branches on feature $x_j$. We'll adopt the convention that when the branching variable $x_{j}$ is $\text{false}$ we will always branch left and when it is $\text{true}$ we will always branch right. In addition, we introduce variables $\hat{y}_{n}$ that will indicate if each leaf node classifies the data as $\text{true}$ or $\text{false}$ (their values will be arbitrary for non-leaf nodes).
We'll also create several bookkeeping variables that are needed to set this up as a SAT problem, but are not required to run the model once trained. We introduce ancestor variables $a^{l}_{nj}$ at each node $n$ which are $\text{true}$ if we branched left on feature $j$ at node $n$ or at any of its ancestors and similarly $a^{r}_{nj}$ if we branched right on feature $j$ at this node or any of its ancestors. Finally, we introduce variables $e_{i,n}$ that indicate that training example $\mathbf{x}_{i}$ reached leaf node $n$. Notice that this happens when $x_{ij}$ is $\text{false}$ everywhere $a^{l}_{nj}$ is $\text{true}$ (i.e., we branched left somewhere above on these left ancestor features) and $x_{ij}$ is $\text{true}$ everywhere $a^{r}_{nj}$ is $\text{true}$ (i.e., we branched right somewhere above on these right ancestor features).
Using these variables, we build Boolean expressions to enforce the following constraints:
Collectively, these constraints mean that all of the data must be correctly classified. When we logically $\text{AND}$ all of these constraints together and find a solution that is $\text{SAT}$, we retrieve a tree that classifies the data 100% correctly. By reducing the number of nodes until the point that the problem becomes $\text{UNSAT}$, we can find the most efficient tree that partitions the training data exactly.
This concludes part I of this tutorial on SAT solvers. We've introduced the SAT problem, shown how to convert it to conjunctive normal form and presented some standard SAT constructions. Finally, we've described several different applications which we hope will inspire you to see SAT as a viable approach to your own problems.
In the next part of this tutorial, we'll delve into how SAT solvers actually work. In the final part, we'll elucidate the connections between SAT solving and factor graphs. For those readers who still harbor reservations about the applicability of a method based purely on Boolean variables, we'll also consider (i) how to convert non-Boolean variables to binary form and (ii) methods to work with them directly using SMT solvers.
If you want to try working with SAT algorithms, then this tutorial will help you get started. For an extremely comprehensive list of applications of satisfiability, consult SAT/SMT by example. This may give you more inspiration for how to reframe your problems in terms of satisfiability.
This execution problem sounds straightforward but there are several complications. A naive approach might be to wait until the price seems "low enough" and then buy all the shares at once. Putting aside the question of how to define "low enough", this method has a huge drawback: executing a large order all at once creates a great deal of demand, which pushes up the price (market impact) and so worsens the final price achieved. Consequently, it could be more sensible to buy the shares gradually through the specified time period. But how many should the broker buy, and when?
To the savvy machine learning researcher, it will be obvious that this problem lends itself to a reinforcement learning formulation. The execution algorithm must make a series of sequential decisions about how many shares to buy at each time step and receives rewards in the form of low execution prices.
The structure of the rest of this article is as follows. First, we describe the order execution problem in more detail. This will necessitate a discussion of how modern financial markets work in practice and the limit order book. Then we provide a brief review of reinforcement learning and describe how it maps to this problem. Finally, we describe the practical details of the Aiden system.
Contemporary financial markets such as the TSX/NYSE/NASDAQ are limit order markets. This means that traders who wish to purchase shares can specify not only the volume they wish to purchase, but also the maximum price (limit) that they are prepared to pay. More formally, a limit order can be specified by the tuple $\{\tau, p, n\}$ where $\tau\in\{0,1\}$ specifies whether this is a buy or sell order, $p$ is the specified price limit, and $n$ is the maximum number of shares to be traded. The possible prices of the shares in the order book are discrete, and the smallest difference allowable between them is a tick.
The limit order book consists of the set of all current limit orders. It can be visualized as two histograms (figure 1). The first consists of the volume of the buy orders at each discrete price level and the second consists of the volumes of the sell orders. The highest buy order is known as the current bid price and the lowest sell order is known as the current ask price. The difference between the two is known as the bid-ask spread and the average of the two is known as the mid-price.
When a trader enters a buy limit order that is at or above the current ask price, the order will receive executions. The first trades will occur at the current ask price, but if the volume of the buy order exceeds the volume available at that price, the order will continue at the next price level. This process occurs until either the entire order has been fulfilled, or it reaches the specified limit. In this case, there are insufficient shares available for sale at or below this limit and so the order is only partially filled. Hence, the overall effect of placing a limit order is that the price is guaranteed to be within the specified limit, but the volume is not.
Any remaining unfulfilled part of the order is then added to the buy side of the limit order book and remains there until it is either (i) matched by a new sell-side order, (ii) expired (its time limit runs out), or (iii) removed by the trader. Orders are typically matched on a first-in / first-out basis at most trading venues; in this instance, any order placed below the current ask price will be placed last in the queue for that particular price level. A worked example of a limit order is given in figure 2.
In addition to limit orders it is possible to place a market order which is specified by the volume $n$ of shares that we wish to buy. Essentially, this means that the trader will buy all $n$ shares now at whatever prices are available on the sell side of the limit order book. So first the trader buys shares at the current ask price. If the volume of the buy order exceeds the volume available at the current ask price, the trader will continue fulfilling the order at the next best price and so on. Effectively, a market order is hence a limit order where the limit is $+\infty$. A worked example of a market order is given in figure 3.
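The matching logic for a market buy order described above can be sketched as follows. This is a hypothetical helper for illustration only; real venues add many details such as queue priority, partial cancellation and hidden liquidity.

```python
def execute_market_buy(asks, volume):
    """Fill a market buy order by walking the sell side of the book
    upward from the best (lowest) ask price.
    asks: list of (price, available_volume), sorted by price."""
    fills, remaining = [], volume
    for price, available in asks:
        if remaining == 0:
            break
        take = min(available, remaining)   # exhaust this price level
        fills.append((price, take))
        remaining -= take
    return fills, remaining  # remaining > 0 means the book ran dry

# Toy sell side: 100 shares at $10.00, 50 at $10.01, 200 at $10.02.
asks = [(10.00, 100), (10.01, 50), (10.02, 200)]
fills, left = execute_market_buy(asks, 200)
print(fills, left)
```

Note how a 200-share order eats two full price levels and part of a third, which is precisely the market-impact effect discussed in the next paragraph.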
Notice that for both the limit order and the market order, a large volume affects the current ask price: as the volume at one price level is exhausted, the ask price increases to the next level where there is non-zero volume. Hence, ceteris paribus, a large-volume buy order may have a large impact on the market, and the mid-price (a proxy for the current stock price) correspondingly increases.
Now that the limit order book has been explained, let's return to the problem of order execution. The goal is to buy a known volume $V$ of shares within a given time window $[0,T]$. This is typically referred to as the parent order (or meta order). At each time step $0\leq t < T$ the trader can place a limit order, remove an existing limit order, or place a market order, splitting the parent order into smaller child orders so as to minimize market impact. As the trader reaches the end of the execution timeframe, they can make use of more aggressively priced orders to complete their order, potentially at a higher cost.
How can a trader decide which action to take at each time step? Electronic markets release the contents of the limit order book in real time, and the trader can use this data as a basis for their decisions. Such market microstructure data comes in two different resolutions, referred to as level I and level II data respectively. Level I data includes the current bid price and associated volume, the current ask price and associated volume, and the price and volume of the last transaction. Level II data includes more details about the contents of the limit order book; for example, it might include the top ten current bid and ask orders and their associated volumes.
It's clear that this market microstructure data contains clues about when it might be a good idea to place an order. For example, if the ask price is decreasing over time, it might be worth using this momentum signal to delay buying shares. Similarly, if there is a lot more volume on the sell side than the buy side of the limit order book then this gives an insight into the current levels of supply and demand and this may similarly affect the decision to execute an order at this time or not. In addition, the time stamp and volume already executed should feed into the decision. If time is running out, the trader needs to place more aggressive orders to fulfil their obligation.
In this section we provide a brief recap of reinforcement learning (RL). RL is concerned with an agent that is interacting with an environment. At each timestep $t$, the state of the environment is captured in a state vector $\mathbf{s}_{t}$. The agent observes this state and chooses an action which is parameterized by the vector $\mathbf{a}_{t}$. Taking an action triggers two events. First the state changes to a new state $\mathbf{s}_{t+1}$ via the stochastic transition function $Pr(\mathbf{s}_{t+1}|\mathbf{s}_{t}, \mathbf{a}_{t})$. Second, a reward $r_{t}$ may be issued to the agent, where this reward depends on the unseen reward function $Pr(r_{t}|\mathbf{s}_{t}, \mathbf{a}_{t})$. The basic RL setup is shown in figure 4.
At any time $t'$ the agent might wish to maximize the total sum of future rewards $\sum_{t=t'}^{T}r_{t}$. However, rewards that happen sooner in time are often considered more important, and so instead it maximizes the discounted sum of rewards $\sum_{t=t'}^{T}\gamma^{t-t'}r_{t}$. Here $\gamma\in(0,1]$ controls how the rewards decrease in importance as they stretch into the future. So the goal of reinforcement learning is to learn how to choose actions that maximize the sum of the future discounted rewards.
Reinforcement learning is challenging for a number of reasons ranging from practical considerations and design choices to inherent limitations of the RL framework. First, the agent does not know either the transition function or the reward function and it must either implicitly or explicitly learn these. Second, these functions are stochastic, and so it may take a lot of experience to understand them. Third, the reward for an action may be temporally very distant from the action that caused it. This is known as the temporal credit assignment problem. For example, a win in chess may have been largely due to a brilliant move (action) that was made much earlier in the game, yet is only observed by the reward (winning the game) at the end.
Finally, reinforcement learning algorithms must balance exploration and exploitation. On the one hand, if the agent does not explore the state space and try different actions, it cannot get enough experience to learn a good strategy. On the other, once it has figured out how to receive a respectable reward, it might want to exploit this knowledge rather than explore the other regions of the stateaction space. A tradeoff between these two tendencies is inherent in any reinforcement learning algorithm.
Model-based methods try to predict what the next state and/or reward will be (i.e., the transition function and the reward function), so that they can look into the future and make sensible decisions that will ultimately result in high cumulative rewards. In contrast, model-free methods do not build a model of the environment or reward, but just directly map states to actions. Model-free methods can be divided into policy-based methods, which directly predict a probability distribution over the possible actions from the state, and value-based methods, which compute the relative value of every possible state-action pair and hence indirectly specify the best action for any state.
The Aiden system described in this article is a policy-based model-free method, and so it aims to take the state $\mathbf{s}_{t}$ and predict a probability distribution $Pr(\mathbf{a}_{t}|\mathbf{s}_{t}, \boldsymbol\theta)$ over which action $\mathbf{a}_{t}$ to take. Since the state space is high-dimensional and data is very limited, Aiden approximates this mapping using a neural network with parameters $\boldsymbol\theta$. The goal of learning is to ensure that these parameters lead to actions that result in high cumulative rewards.
Hopefully, it is becoming increasingly clear why reinforcement learning is well suited to the order execution problem. There is a reward (the average price at which the agent bought the shares), but the agent does not know the extent of this reward until it has completely fulfilled the order. There is a partially observed state which includes the market microstructure data, the elapsed time, and the remaining volume. Finally, there are a number of actions that can be taken at any time (placing limit orders, removing limit orders, placing market orders). It's clear that these actions affect the state by changing the configuration of the market and depleting the remaining volume.
In this context the goal of the reinforcement learning algorithm is to learn the policy; for a given observed state (market microstructure, elapsed time and remaining volume), it must learn to output a probability distribution over the possible actions (types of order). The algorithm draws from this distribution to determine what to do next. This in turn changes the state and so on.
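The mapping from state to action distribution, and the sampling step, can be illustrated with a stand-in linear policy. Aiden's actual policy is a recurrent neural network; the features, parameters, and action count below are purely hypothetical.

```python
import math
import random

def policy(state, theta):
    """Stand-in linear policy: one score per action (rows of theta),
    mapped to a probability distribution with a softmax."""
    scores = [sum(w * s for w, s in zip(row, state)) for row in theta]
    m = max(scores)                          # subtract max for stability
    exps = [math.exp(v - m) for v in scores]
    z = sum(exps)
    return [e / z for e in exps]

# Hypothetical state (2 features) and parameters for 3 order types.
state = [0.5, -1.0]
theta = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
probs = policy(state, theta)

# Draw the next action from the distribution, as described above.
action = random.choices(range(len(probs)), weights=probs)[0]
print([round(p, 3) for p in probs], action)
```

Taking the sampled action changes the market state, a new state vector is built, and the loop repeats.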
In this section we describe the main features of the Aiden reinforcement learning setup: the action space, the state and the reward functions. In the subsequent section we discuss the reinforcement learning algorithm itself.
In practice, Aiden does not directly specify the details of each order. Instead, it chooses between different high-level actions at each time step that correspond to different levels of aggressiveness as it works through the parent order using child orders. These range from crossing the spread (and so immediately executing some of the order) at one end of the spectrum to doing nothing or removing existing orders at the other. These actions form the input to a system that translates them into concrete limit orders.
Aiden's state is currently composed of several hundred market features and self-aware features. The market features are handcrafted functions that compute quantities of interest from the market microstructure data. Examples might include measurements of the liquidity, recent price changes, or whether there is an imbalance between the bid and ask volumes. The self-aware features relate to the history of previous actions that Aiden has taken. For example, they might include measurements of how aggressive Aiden has been in recent time steps, and how many shares Aiden still has to execute.
The rewards are chosen so that Aiden optimizes around a core trading objective, such as a benchmark. One commonly used benchmark is the volume-weighted average price (VWAP) of the market for the asset over the whole period. As the name suggests, this is the average price of all transactions in the limit order book, weighted by volume. Consequently, rewards are designed based on the difference between this market VWAP and the actual prices Aiden achieved. Of course, Aiden will not know the market VWAP until the end of the period, so, as is typical in reinforcement learning, the feedback is delayed.
Aiden is trained using policy gradient algorithms. As the name suggests, these compute the gradient of the expected discounted reward with respect to the parameters $\boldsymbol\theta$ of the network that takes the state $\mathbf{s}_{t}$ and outputs the policy $Pr(\mathbf{a}_{t}|\mathbf{s}_{t},\boldsymbol\theta)$ over actions $\mathbf{a}_{t}$. The gradient is used to find parameters that give better rewards. In practice, the aim is to maximize the following objective:
\begin{equation}
J[\boldsymbol\theta] = \mathbb{E}\left[\log[Pr(\mathbf{a}_{t}|\mathbf{s}_{t},\boldsymbol\theta)]\Psi_{t} \right], \tag{1}
\end{equation}
where the expectation denotes an empirical average over samples. For the simplest policy gradient algorithms, the function $\Psi_{t}$ might just be the total observed rewards.
Unfortunately, this basic policy gradient algorithm is notoriously unstable and so Aiden uses an actor-critic approach (see Sutton and Barto, 2018) to decrease the variance of the learning procedure. Here, the function $\Psi$ is changed so that it measures the difference between the observed rewards and the value function, which is essentially a prediction of what the total reward will be given that we are in the current state. The network that produces the policy is known as the actor (since it directly affects the environment) and the network that produces the value function is known as the critic (since it evaluates the actor's choices).
The Aiden architecture mainly consists of fully connected layers. However, in partially observable environments like a financial market we do not expect to observe the complete state of the world at each timestep. Therefore, it is common to add a recurrent layer to help deal with this problem. To this end, Aiden uses a recurrent architecture; at each time step it takes as input the market features, selfaware features and the input from the recurrent connection. From these Aiden produces three outputs. First, it produces a softmax output with probabilities over the action space (i.e., the actor). Second, it produces a single scalar output representing the value function (the critic), and third it produces a new recurrent vector to be passed to the next time step (figure 5).
Aiden exploits another trick to make learning more stable: it uses proximal policy optimization. This method changes the objective function to:
\begin{equation}
J[\boldsymbol\theta] = \mathbb{E}\left[\frac{Pr(\mathbf{a}_{t}|\mathbf{s}_{t},\boldsymbol\theta)}{Pr(\mathbf{a}_{t}|\mathbf{s}_{t},\boldsymbol\theta_{old})}\Psi_{t} \right], \tag{2}
\end{equation}
where the term $\boldsymbol\theta_{old}$ represents the parameters before the update and then clips this function to prevent very large changes in the policy (hence making it more stable). Defining:
\begin{equation}
f[\boldsymbol\theta] = \frac{Pr(\mathbf{a}_{t}|\mathbf{s}_{t},\boldsymbol\theta)}{Pr(\mathbf{a}_{t}|\mathbf{s}_{t},\boldsymbol\theta_{old})}, \tag{3}
\end{equation}
proximal policy optimization maximizes the following surrogate objective:
\begin{equation}
J[\boldsymbol\theta] = \begin{cases}\mathbb{E}\left[\min\left[ f[\boldsymbol\theta], 1+\epsilon\right]\Psi_{t}\right] &\quad \Psi_{t} > 0 \\
\mathbb{E}\left[\max\left[ f[\boldsymbol\theta], 1-\epsilon\right]\Psi_{t}\right] &\quad \Psi_{t} \leq 0,
\end{cases} \tag{4}
\end{equation}
where $\epsilon$ is a predefined threshold.
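Equation 4 can be written per sample as a small function. This is a sketch only; in a real implementation the ratio comes from the actor network's probabilities and $\Psi_t$ from the critic's advantage estimate.

```python
def clipped_surrogate(ratio, psi, eps=0.2):
    """Per-sample version of equation 4: clip the probability ratio
    f[theta] so that a single update cannot move the policy too far.
    ratio: Pr(a|s, theta) / Pr(a|s, theta_old); psi: advantage term."""
    if psi > 0:
        return min(ratio, 1 + eps) * psi   # cap the gain when ratio > 1+eps
    return max(ratio, 1 - eps) * psi       # cap the gain when ratio < 1-eps

# A very favourable ratio earns nothing beyond the clip threshold:
print(clipped_surrogate(3.0, 1.0))   # 1.2, not 3.0
print(clipped_surrogate(0.1, -1.0))  # -0.8, not -0.1
```

Because the surrogate is flat outside the clip region, its gradient there is zero, which is what prevents very large policy updates.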
In this section we discuss a few of the challenges of training a productionlevel system for the order execution problem like Aiden.
Generality: The algorithm is required to work in many situations. Hence, the Aiden algorithm only uses input features that can be found in any market. Moreover, different stocks vary in price, liquidity, volatility and other quantities. Consequently, the Aiden algorithm must normalize the input features so that the absolute magnitudes of price and volume observed in the market microstructure data are factored out.
Simulation: Reinforcement learning algorithms are notorious for the amount of data that they must consume to learn a successful strategy. Obviously, it's not practical to wait many years for the algorithm to train. Furthermore, we cannot put the algorithm into the real marketplace before it has learned which decisions to make to achieve good performance.
The solution to both of these problems is to build a training environment in which the market can be simulated based on observations of historical trading data. In this way, Aiden can train much faster than realtime and learn sensible policies without risking financial loss. This procedure can be sped up even more by training multiple RL agents who compete with one another in the same simulated market and learn from one another.
In this article we introduced the order execution problem and showed how it could be mapped to a reinforcement learning problem. We then described some features of Aiden, RBC's electronic execution platform. The scope for reinforcement learning in finance is huge since there are often many rapid decisions that need to be made and these need to take into account the present condition of the market. Aiden is one of the first steps in RBC's adoption of these technologies.
Now, can we lead on tackling its ethical and societal implications?
The news is flooded with examples of AI fails: algorithms that favour male job applicants over women, or image recognition software failing to correctly identify people of colour.
Dr. Foteini Agrafioti, the Head of Borealis AI and one of the country’s strongest voices on ensuring AI is ethical, is also the co-chair of Canada’s Advisory Council on AI. She led the RBC Disruptors conversation about battling bias in AI with Dr. Elissa Strome, Executive Director, Pan-Canadian AI Strategy at CIFAR, and Dr. Layla El Asri, Research Team Lead at Borealis AI and formerly Research Manager at Microsoft Research Montréal.
Here are their thoughts on what the scientific community, governments and ordinary citizens can do to confront bias in artificial intelligence, and position Canada as a leader in ethical AI.
Bias has long existed in our society – and so it exists in our data. El Asri sees this as an opportunity. Unlike our own unconscious bias, we can at least uncover bias in an algorithm. To do this, companies need to be auditing their AI for bias every step of the way, as the major labs are now doing. El Asri credited Canadian leaders, such as AI pioneer Yoshua Bengio, for developing a will in Canada’s tech community to develop AI in a responsible way.
Right now, artificial intelligence is being developed by a very narrow subset of society: mainly highly educated men who went to the same schools, and now live in the same cities. Only 18% of AI researchers are women, a fact that Strome called “terrible.” Organizations like CIFAR are working to bring more voices into the development of AI, with initiatives such as the AI for Good Summer Lab, a seven-week training program for undergraduate women in AI.
AI is only as good as the data it’s trained on. “If your data is not representative enough, your model is not going to work,” El Asri said. There needs to be more vigilance in ensuring data is representative — an area where Canada has a homegrown advantage. If you’re working with data collected in a multicultural country like ours, you’re likely working with data that represents different ethnic backgrounds. This kind of data will be essential to building technology that works for everyone, especially when it comes to something like health care.
Right now, it’s really just the tech community and policymakers talking about issues that are going to transform our society. We need to broaden that perspective, building in consultation with social scientists as an integral part of the development process. A recent CIFAR initiative brought together computer scientists and social scientists for a day to discuss the social, legal and ethical implications of AI. “The computer scientists were so eager to get their advice and insights,” Strome said. Similarly, El Asri noted that the AI and ethics committees at her former employer, Microsoft, are made up of people from different disciplines, including anthropologists and historians.
“There’s a lot of fear and misunderstanding and myths about AI,” Strome said. Over the next few years, it’s going to be critical to bring the public into the AI conversation. People need to be aware of the positive implications, as well as the risks, that AI will have on their lives. The better the next generation understands AI and its societal and ethical implications, the better prepared they’ll be to ask tough questions of their leaders. Agrafioti suggested that Canadian culture is particularly attuned to ensuring fairness, casting a critical eye on technology before implementing it. Our balance of technical expertise and social values is exactly what’s needed to make sure the product that gets to market is ethical.
AI has been advancing much faster than any government can regulate it — so it was big news this week when the OECD adopted a set of AI principles, which set values-based standards for developing AI. Our leaders have an incredibly important role to play in developing policy and regulations around the use of AI, both domestically and internationally. Strome noted that Canada’s solid international reputation could go a long way in urging the world to catch up. Last summer, Prime Minister Trudeau and President Macron announced a joint Canada-France initiative on an International Panel on AI to support and guide the responsible adoption of AI, grounded in human rights. The first symposium will be in Paris this fall.
Solving bias in machines will take a human touch — and there’s no country better positioned than Canada to take the reins.
Bias is nothing new. In fact, a recent Borealis AI/RBC survey found that 88 per cent of businesses believe bias exists within their own organization. Addressing this is a critical component of building corporate culture. But we have to eliminate bias in our technology, too.
Bias in AI has serious consequences. From wrongful arrests to unfair recruitment policies, a biased algorithm has the ability to negatively impact the freedom, privacy and security of individuals and society as a whole.
AI is not neutral. Bias usually exists because algorithms have been trained using inadequate or biased data, or because the architectures are skewed towards specific outcomes. As machine learning algorithms are increasingly used to determine important real-world outcomes such as loan approval, pay rates, and parole decisions, the AI community has a responsibility to account for that discrimination. But how?
Listen to our new podcast, led by RBC’s John Stackhouse and featuring Saadia Muzzaffar, entrepreneur and founder of TechGirls Canada; Ruha Benjamin, Associate Professor of African American Studies at Princeton University; and Foteini Agrafioti, Head of Borealis AI and Chief Science Officer at RBC. The discussion offers an enlightening and impassioned view on how society and businesses must tackle bias to ensure a fair, safe and trustworthy approach to AI.
Ruha Benjamin is Associate Professor of African American Studies at Princeton University, Founding Director of the Ida B. Wells JUST Data Lab, and author of the award-winning book Race After Technology: Abolitionist Tools for the New Jim Code (Polity 2019) and editor of Captivating Technology (Duke 2019), among many other publications. Ruha’s work investigates the social dimensions of science, medicine, and technology with a focus on the relationship between innovation and inequity, health and justice, knowledge and power. She is the recipient of numerous awards and fellowships, including from the American Council of Learned Societies, the National Science Foundation, the Institute for Advanced Study, and the President’s Award for Distinguished Teaching at Princeton. For more info, please visit ruhabenjamin.com
Saadia Muzzaffar is a tech entrepreneur, author and passionate advocate of responsible innovation, decent work for everyone, and prosperity of immigrant talent in STEM. She is the founder of TechGirls Canada, the hub for Canadian women in science, technology, engineering and math – and co-founder of Tech Reset Canada, a group of business people, technologists, and other residents advocating for innovation that is focused on the public good. In 2017, she was featured in Canada 150 Women, a book about 150 of the most influential and groundbreaking women in Canada. She is honoured to serve on the board of Women's Shelters Canada and the advisory board of the University of Guelph's Centre for Advancing Responsible and Ethical Artificial Intelligence (CARE-AI).
Facial recognition technology is now a part of our daily lives, personalizing our services and making identity verification easier. Yet a lack of clear restrictions on its usage creates ambiguity for Canadian businesses that are constantly seeking data insights to drive growth and maintain relevance with consumers in a platform-based world.
The pandemic has fuelled an explosion in the use of video, as we try to stay connected during the global lockdown. Workers are using Zoom and Webex daily, families are catching up over FaceTime or Skype, and we’re turning ever more frequently to social media like Instagram and TikTok for entertainment. Our faces are travelling everywhere, even when we’re not.
At the same time, Artificial Intelligence is becoming ever present in our lives as we spend more time online; sending us shopping and podcast recommendations, predicting our upcoming bills, and learning which shows we like to watch. Our faces have become a central part of this data wave, as we teach our phones to recognize us, sort our photos and even interpret our emotions. But facial recognition creates a different sort of data tool from web traffic and credit card histories – one that can assess identities, behaviours and social interactions. We’re no longer anonymous, whether sitting at our computers or taking a walk downtown.
When paired with AI, facial recognition offers incredible commercial applications that could increase the personalization of services and reduce friction in the verification of payments, health records or even voting. How will Canadian firms – big and small – choose to employ the potential of facial recognition, as we all strive to leverage technology and consumer insights? Are there clear regulations on the use of this data? On these questions, we don’t operate in a vacuum; the technology is being developed and the data put to use in various ways around the world. The drive for innovation in this space will test our resolve to ensure AI is used for good.
To do this right, Canadian businesses should avoid working in isolation. Canada is home to the world’s leaders in developing ethical AI. It’s here that the Montreal Declaration for the Responsible Development of AI was signed, the Privacy by Design certification was developed, and CIFAR’s AI & Society program was born. In this spirit, RBC and Borealis AI have launched RESPECT AI, a hub for firms to gain practical solutions for the responsible adoption of AI.
The number of global patents referencing “facial recognition” stands at 1,617, with over 100 new patents so far in 2020. Tech giants Google, Samsung and IBM dominate the filings (Apple ranks 7th). Only 27 of these patents are held by Canadian applicants.
Despite the data revolution, 53% of Canadian companies aren’t using AI to inform their business decisions, and among them, 6 out of 10 have no plans to do so soon. The other half feel that AI is central to their business growth, and most plan to expand their usage over the next two years.
Each day, over 100 million hours of video are streamed on Facebook, while more than 95 million uploads are made to Instagram. These images are tagged with names and locations, providing better training data for algorithms. Google is building software that can crawl all social media sites to identify a person’s face (and their associated activity) across all platforms.
Apple FaceID claims a 1 in 1,000,000 chance someone else could unlock your device with their face. Google’s FaceNet achieved 99.63% accuracy against a benchmark image data set, surpassing Facebook’s DeepFace at 97.35%. By comparison, the human eye is accurate 97.53% of the time.
Market for this technology could double within five years
The facial recognition market had revenue of about US$3.2 billion in 2019, with some forecasts calling for it to reach US$7 billion by 2024. Key sectors of growth will continue to be government and security, with rising usage among retail and e-commerce.
As a tool, facial recognition provides rapid identification of an individual. This can help companies provide a personalized experience to consumers, reduce friction points on verification to access secure materials, or assist law enforcement by rapidly identifying possible suspects from video.
Similar image recognition technology is already used to reduce pedestrian accidents in cities by monitoring traffic patterns. It’s also being applied in agriculture to distinguish weeds from crops for precision pesticide use. Yet, we’re likely most accustomed to using facial recognition on our phones to verify banking or email passwords, or to automatically sort our photos.
Increasing confidence in the accuracy of this technology has furthered its public uses. The US agency NIST estimates a 20x improvement in accuracy between 2014 and 2018, with the failure rate falling from 4.0% to 0.2%.
We can expect the list of potential applications to grow:
A subset of research is developing measures to counter the misuse of people’s images, such as deep fakes or identity fraud. Liveness detection software, for instance, aims to determine whether an image or video is true to the subject involved – in essence, good AI that can identify bad AI.
Many countries have no explicit legal or regulatory requirements related to facial recognition within their privacy regimes. In some places, this leaves interpretation open to discretion or abuse by government and business; as such, surveillance has become synonymous with facial recognition.
Here in Canada, we’ve seen pushback when its use has exceeded public comfort. When the RCMP’s association with Clearview AI – a firm with a database of 3 billion photos from Facebook and Instagram – became public, the Mounties had to set limits on its use. When Vancouver police attempted to use driver’s license photos to identify suspects in the Stanley Cup riot in 2011, the privacy commissioner required a court order on future use of such technology.
As this technology develops and pushes the limits of privacy, countries are navigating these challenges in real time. The adoption of Canada’s Digital Charter in 2019 – the federal government’s statement of intentions on digital security – suggests individuals can anticipate increased control over personal data and images under its “control and consent” principles. However, the roadmap remains unclear. In July 2020, 77 Canadian civil society groups called on the Trudeau government to: (i) ban use of the tech by federal law enforcement for surveillance and intelligence, (ii) launch public consultations on use of facial recognition, and (iii) update PIPEDA protections to specifically cover biometric data.
China, with its estimated 626 million surveillance cameras, has perhaps the strictest restrictions on private business use of biometric data. The central government, however, is exempt. Facial recognition is a key tool in its emerging national “social credit” system that scores personal public behaviour and penalizes “bad” practices (e.g. jaywalking or smoking in the wrong spot). These major investments have made China the world’s capital of facial recognition; since 2015, the majority of patents related to facial recognition and surveillance have come from Chinese applicants.
The European Union’s GDPR is the most advanced data privacy regime. It classifies the data harvested from facial recognition technology as biometric, a category that requires explicit consent from the subject prior to its collection.
In the US, four states – Washington, Illinois, Texas and California – have adopted laws on the protection of biometric data, including explicit opt-in clauses, while numerous cities have banned the use of facial recognition in public services, including policing. The most recent to do so was Portland, amid civil unrest. However, the Trump administration has sought rulings in federal court to proceed with facial scans at airport entry for all passengers, including non-US citizens. Currently being piloted at Los Angeles International Airport and Dallas-Fort Worth International Airport, facial recognition is used to identify potential criminals, passport fraud and people on no-fly lists.
Facial recognition differs from other biometrics – DNA, fingerprints – for two big reasons. First, it’s easy to collect; your image can be captured on video by anyone, anywhere. Second, it’s increasingly easy to verify against online images and with deep learning tools.
Think about it: how many times have you provided your fingerprint? Now how many images of your face are online? Governments alone have an enormous trove of reliable, labelled images, from your health card to your passport photos. Meantime, our penchant for posting images to social media – with our names, friends and locations – has created inadvertent, massive datasets for facial recognition.
A couple of years ago, Google stunned many observers by announcing development of an algorithm to track people’s social media activity across all platforms, simply by following their face. Google can do this with its proprietary “reverse image search” combined with its massive scale in crawling millions of websites at once. The ease of access to people’s faces makes all of this possible.
Despite advances in the technology, facial recognition wears a mask of mistrust, particularly along racial lines. Some of the original facial recognition systems and algorithms were shown to contain ethnic bias, with high levels of inaccuracy for non-white faces, due largely to input data skewed towards white males.
Any application of this technology must appreciate the potential for bias in the underlying data – particularly given its potential to negatively impact people. Trust in these tools remains divided; when asked about personal verification methods to access health records, 58% of White respondents in the US were comfortable using facial recognition. But this figure fell to 50% and 41% among Hispanic and Black respondents, respectively.
Put simply, tread lightly. Misuse of personal information can carry massive reputational and legal risks.
An RBC/Borealis AI survey of Canadian businesses revealed that their top motivations to invest in AI programs were (i) to reduce costs, (ii) to increase productivity and (iii) to increase sales. This is increasingly relevant amid the current economic recovery, as firms look to leverage any data advantages to create consumer relevance and new revenue.
Tech giants have inherent advantages in developing this technology, and have staked their claims in hundreds of global patents. In turn, they are seeking consumers of these data tools; for example, a retailer looking for information on shoppers who browse, but don’t buy; a restaurateur aiming to track frequency of visits to grant loyalty points; or a construction firm interested in gathering insights on worker activity on job sites.
Only 36% of US consumers trust tech companies to use facial recognition software responsibly.
Comfort with all these uses, however, is not yet widespread. A 2019 Pew research study found that only 36% of US consumers trust tech companies to use facial recognition software responsibly, and just 17% trust advertisers. When individuals are unsure of how their data is being used, firms risk running afoul of privacy and ethical practices.
Any organization or entrepreneur should consider:
Uncertainty about how to use AI responsibly could account for the stark divide among Canadian firms adopting it. Six in ten Canadian businesses feel that AI is mostly for larger organizations.
Consumers should have the right to know why and how firms use their likeness, and governments are responsible for ensuring it is done legally. Businesses that engage in facial recognition applications without appreciating the associated ethical questions risk strong blowback from consumers.
Canada has been a leader in supporting AI for good. How facial recognition technology is deployed will be an important test of adherence to such ideals. The pandemic has accelerated Canada’s move from conversation to action on digital ethics.
This article originally appeared as part of RBC Disruptors series, which offers insights about social, economic and technological trends in an age of disruption.
Many members of the research team took the time to virtually attend ICML 2020. Now that the conference content is freely available online, it's a great time to look back and check out some of the highlights. In this post, four Borealis AI researchers describe the papers that they found most interesting or significant from the conference.
Hrayr Harutyunyan, Kyle Reing, Greg Ver Steeg, Aram Galstyan
by Peng Xu
Related Papers:
What problem does it solve? Neural networks have the undesirable tendency to memorize information about noisy labels. This paper shows that, for any algorithm, low values of the mutual information between weights and training labels given inputs, $I(w : \pmb{y} \mid \pmb{x})$, correspond to a reduction in memorization of label-noise and better generalization bounds. Novel training algorithms are proposed to optimize for this and achieve impressive empirical performance on noisy data.
Why is this important? Even in the presence of noisy labels, deep neural networks tend to memorize the training labels. This generally hurts generalization performance and is particularly undesirable with noisy labels. Poor generalization due to label memorization is a significant problem because many large, real-world datasets are imperfectly labeled. From an information-theoretic perspective, this paper reveals the root of the memorization problem and proposes an approach that directly addresses it.
The approach taken and how it relates to previous work: Given a labeled dataset $S=(\pmb{x}, \pmb{y})$ for data $\pmb{x}=\{x^{(i)}\}_{i=1}^n$ and categorical labels $\pmb{y}=\{y^{(i)}\}_{i=1}^n$, and learned weights $w$, Achille & Soatto present a decomposition of the expected cross-entropy $H(\pmb{y} \mid \pmb{x}, w)$:
\[ H(\pmb{y} \mid \pmb{x}, w) = \underbrace{H(\pmb{y} \mid \pmb{x})}_{\text{intrinsic error}} + \underbrace{\mathbb{E}_{\pmb{x}, w}\, D_{\text{KL}}\!\left[p(\pmb{y} \mid \pmb{x}) \,\|\, f(\pmb{y} \mid \pmb{x}, w)\right]}_{\text{how good is the classifier}} - \underbrace{I(w : \pmb{y} \mid \pmb{x})}_{\text{memorization}}. \]
If the labels contain information beyond what can be inferred from inputs, the model may do well by memorizing the labels through the third term of the above equation. To demonstrate that $I(w : \pmb{y} \mid \pmb{x})$ is directly linked to memorization, this paper proves that any algorithm with small $I(w : \pmb{y} \mid \pmb{x})$ overfits less to label-noise in the training set. This theoretical result is also verified empirically, as shown in Figure 1. In addition, the information that weights contain about a training dataset $S$ has previously been linked to generalization (Xu & Raginsky), and those bounds can be tightened with small values of $I(w : \pmb{y} \mid \pmb{x})$.
To limit $I(w : \pmb{y} \mid \pmb{x})$, this paper first shows that the information in weights can be replaced by information in the gradients, and then introduces a variational bound on the information in gradients. The bound employs an auxiliary network that predicts gradients of the original loss without label information. Two ways of incorporating predicted gradients are explored: (a) using them in a regularization term for gradients of the original loss, and (b) using them to train the classifier.
Results: The authors set up experiments with noisy datasets to see how well the proposed methods perform for different types and amounts of label noise. The simplest baselines are the standard cross-entropy (CE) and mean absolute error (MAE) loss functions. The next baseline is the forward correction approach (FW) proposed by Patrini et al., where the label-noise transition matrix is estimated and used to correct the loss function. Finally, they include the recently proposed determinant mutual information (DMI) loss of Xu et al., which is the log-determinant of the confusion matrix between predicted and given labels. The proposed algorithms demonstrate their effectiveness on versions of MNIST, CIFAR-10 and CIFAR-100 corrupted with various noise models, and on Clothing1M, a large-scale dataset with noisy labels, as shown in Figure 2.
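As a concrete illustration of the noisy-label setting these experiments use, here is a minimal numpy sketch of symmetric (uniform) label-noise corruption. The function name and exact noise model are illustrative, not taken from the paper's code:

```python
import numpy as np

def corrupt_labels(labels, noise_rate, num_classes, seed=0):
    """Replace roughly a fraction `noise_rate` of labels with a uniformly
    random class, mimicking the symmetric label-noise setting used in
    noisy-label benchmarks."""
    rng = np.random.default_rng(seed)
    noisy = labels.copy()
    flip = rng.random(len(labels)) < noise_rate   # which examples get corrupted
    noisy[flip] = rng.integers(0, num_classes, size=int(flip.sum()))
    return noisy, flip

# Example: corrupt 40% of a 10-class label vector.
clean = np.zeros(10_000, dtype=int)
noisy, flipped = corrupt_labels(clean, noise_rate=0.4, num_classes=10)
```

A model that fits `noisy` perfectly must memorize the flipped labels, which is exactly the behaviour the paper's $I(w : \pmb{y} \mid \pmb{x})$ penalty discourages.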
Kei Ota, Tomoaki Oiki, Devesh K. Jha, Toshisada Mariyama and Daniel Nikovski
by Pablo HernandezLeal
Related Papers:
What problem does it solve? This paper starts from the question of whether learning good representations for states and using larger networks can help in learning better policies in deep reinforcement learning.
The paper mentions that many dynamical systems can be described succinctly by sufficient statistics which can be used to accurately predict their future. However, there remains the question of whether RL problems with intrinsically low-dimensional states (i.e., with simple sufficient statistics) can benefit from intentionally increasing their dimensionality using a neural network with a good feature representation.
Why is this important? One of the major successes of neural networks in supervised learning is their ability to automatically acquire representations from raw data. However, in reinforcement learning the task is more complicated, since policy learning and representation learning happen at the same time. For this reason, deep RL usually requires a large amount of data, potentially millions of samples or more. This limits the applicability of RL algorithms to real-world problems, for example, continuous control and robotics, where that amount of data may not be practical to collect.
One might assume that increasing the dimensionality of the input would further complicate the learning process of RL agents. This paper argues this is not the case, and that agents can learn more efficiently with high-dimensional representations than with the lower-dimensional state observations. The authors hypothesize that larger networks (with a larger search space) are one of the reasons agents can learn more complex functions of states, ultimately improving sample efficiency.
The approach taken and how it relates to previous work: The area of state representation learning focuses on representation learning where learned features are low-dimensional, evolve through time, and are influenced by the actions of an agent. In this context, the authors highlight previous work by Munk et al., where the output of a neural network is used as input for a deep RL algorithm. The main difference is that the goal of Munk et al. is to learn a compact representation, in contrast to the idea of this paper, which is learning good higher-dimensional representations of state observations.
The paper proposes an Online Feature Extractor Network (OFENet) that uses neural networks to produce good representations that are used as inputs to a deep RL algorithm, see Figure 3.
OFENet is trained with the goal of preserving a sufficient statistic via an auxiliary task to predict future observations of the system. Formally, OFENet trains a feature extractor network for the states, $z_{o_t}=\phi_o(o_t)$, a feature extractor for the state-action pairs, $z_{o_t,a_t}=\phi_{o,a}(o_t,a_t)$, and a prediction network $f_{pred}$ parameterized by $\theta_{pred}$. The parameters $\{\theta_o, \theta_{o,a}, \theta_{pred}\}$ are optimized to minimize the loss:
$$L=\mathbb{E}_{(o_t,a_t)\sim p,\pi} \left[ \left\| f_{pred}(z_{o_t},a_t) - o_{t+1} \right\|^2 \right]$$
which is interpreted as minimizing the prediction error of the next state.
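The auxiliary objective can be sketched with toy numpy stand-ins. Everything here is illustrative (the shapes, the tanh feature map, and the linear predictor are assumptions for the sketch, not OFENet's actual DenseNet architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-ins for the OFENet pieces (all shapes and maps are illustrative):
W_o = rng.normal(size=(64, 8)) * 0.1          # phi_o lifts an 8-dim observation to 64 dims
W_pred = rng.normal(size=(8, 64 + 2)) * 0.1   # f_pred maps [z_o, a] back to an 8-dim observation

def phi_o(o):
    """Feature extractor z_{o_t} = phi_o(o_t): a higher-dimensional representation."""
    return np.tanh(W_o @ o)

def f_pred(z_o, a):
    """Prediction network: estimates the next observation from (z_{o_t}, a_t)."""
    return W_pred @ np.concatenate([z_o, a])

def auxiliary_loss(o_t, a_t, o_next):
    """Squared prediction error || f_pred(z_{o_t}, a_t) - o_{t+1} ||^2."""
    return float(np.sum((f_pred(phi_o(o_t), a_t) - o_next) ** 2))

o_t, a_t, o_next = rng.normal(size=8), rng.normal(size=2), rng.normal(size=8)
loss = auxiliary_loss(o_t, a_t, o_next)
```

Minimizing this loss over the extractor's parameters pushes the high-dimensional features to retain whatever information is needed to predict the next observation; in the paper the same features are then fed to the RL algorithm.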
The authors highlight the need for a network that can be optimized easily and produce meaningful high-dimensional representations. Their proposal is a variation of DenseNet, a densely connected convolutional network whose output is the concatenation of previous layers' outputs. OFENet uses a DenseNet architecture and is learned in an online fashion, at the same time as the agent's policy, receiving observations and actions as input, as depicted in Figure 4.
Results: The paper evaluates 60 different architectures with varying connectivity, sizes and activation functions. The results showed that an architecture similar to DenseNet consistently achieved higher scores than the rest.
OFENet was evaluated with both on-policy (PPO) and off-policy (SAC and TD3) reinforcement learning algorithms on continuous control tasks. With all three algorithms, the addition of OFENet obtained better results than without it.
Ablation experiments were performed to verify that just increasing the dimensionality of the state representation is not sufficient to improve performance. The key point is that generating effective higher dimensional representations, for example with OFENet, is required to obtain better performance.
Rob Cornish, Anthony L. Caterini, George Deligiannidis, and Arnaud Doucet
by Ivan Kobyzev
Related Papers:
What problem does it solve? The key ingredient of a normalizing flow is a diffeomorphic function (i.e., an invertible function which is differentiable and whose inverse is also differentiable). To model a complex target distribution, a normalizing flow transforms a simple base measure via multiple diffeomorphisms stacked together. However, diffeomorphisms preserve topology; hence, the topologies of the supports of the base distribution and target distribution must be the same. This is problematic for real-world data distributions, which can have complicated topology (e.g., they can be disconnected, have holes, etc.). The paper proposes a way to replace a diffeomorphic map with a continuous family of diffeomorphisms to solve this problem.
Why is this important? It is generally believed that many distributions exhibit complex topology. Generative methods which are unable to learn different topologies will, at the very least, be less sample efficient in learning and potentially fail to learn important characteristics of the target distribution.
The approach taken and how it relates to previous work: Given a latent space $\mathcal{Z}$ and a target space $\mathcal{X}$, the paper considers a continuous family of diffeomorphisms $\{ F(\cdot, u): \mathcal{Z} \to \mathcal{X} \}_{u \in \mathcal{U}}$. The generative process of this model is given by
$$z \sim P_Z, \quad u \sim P_{U \mid Z}(\cdot \mid z), \quad x = F(z,u),$$
which is illustrated in Figure 5. There is no closed-form expression for the likelihood $p_X(x)$, hence to train the model one needs to use variational inference. This introduces an approximate posterior $q_{U \mid X} \approx p_{U \mid X}$, and constructs a variational lower bound on $p_X(x)$ which can be used for training. To increase expressiveness, one can then stack several layers of this generative process.
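To make the generative process concrete, here is a minimal numpy sketch; the specific map $F$ and the distributions are invented for illustration, not taken from the paper. Each $F(\cdot, u)$ is a simple invertible shift of the real line, yet marginalizing over the index $u$ produces a well-separated bimodal sample from a unimodal base, illustrating the extra expressiveness the index variable provides:

```python
import numpy as np

rng = np.random.default_rng(0)

def F(z, u):
    """A continuous family of diffeomorphisms of the real line, indexed by u.
    For each fixed u, z -> z + 4*tanh(4*u) is an invertible shift."""
    return z + 4.0 * np.tanh(4.0 * u)

n = 50_000
z = rng.normal(size=n)   # base sample      z ~ P_Z
u = rng.normal(size=n)   # index sample     u ~ P_{U|Z} (independent of z here)
x = F(z, u)              # generated sample x = F(z, u)
```

The samples `x` concentrate around two modes near $\pm 4$, even though every individual map in the family is a plain shift of the Gaussian base.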
The authors proved that, under some conditions on the family $F_u$, the model can represent a target distribution well, even if its topology is irregular. The downside, compared to other normalizing flows, is that the model doesn't allow for exact density computation. However, estimates can be computed through the use of importance sampling.
Results: The performance of the method is demonstrated quantitatively and compared against Residual Flows, on which its architecture is based. On MNIST and CIFAR-10 in particular, it performs better than Residual Flows (Figure 6), improving the bits per dimension on the test set by a small but notable margin. On other standard datasets the improvements are even larger and, in some cases, state-of-the-art.
Florian Wenzel, Kevin Roth, Bastiaan S. Veeling, Jakub Świątkowski, Linh Tran, Stephan Mandt, Jasper Snoek, Tim Salimans, Rodolphe Jenatton, Sebastian Nowozin
by Mohamed Osama Ahmed
Related Papers:
What problem does it solve? The paper studies the performance of Bayesian neural network (BNN) models and why they have not been adopted in industry. BNNs promise better generalization, better uncertainty estimates of predictions, and should enable new deep learning applications such as continual learning. But despite these potentially promising benefits, they remain widely unused in practice. Most recent work on BNNs has focused on better approximations of the posterior. However, this paper asks whether the actual posterior itself is the problem, i.e., is it even worth approximating?
Why is this important? If the actual posterior learned by BNN is poor then efforts to construct better approximations are unlikely to produce better results and could actually hurt performance. Instead this would suggest that more efforts should be directed towards fixing the posterior itself before attempting to construct better approximations.
The approach taken and how it relates to previous work: Many recent BNN papers use the "cold posterior" trick. Instead of using the posterior $p(\theta \mid D) \propto \exp(-U(\theta))$, where $U(\theta)= -\sum_{i=1}^{n} \log p(y_i \mid x_i,\theta) - \log p(\theta)$, they use $p(\theta \mid D) \propto \exp(-U(\theta)/T)$, where $T$ is a temperature parameter. If $T=1$, then we recover the original posterior distribution. However, recent papers report good performance with a "cold posterior" where $T<1$. This causes the posterior to become sharper around the modes, and the limiting case $T=0$ corresponds to a maximum a posteriori (MAP) point estimate.
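Posterior tempering is easy to see in a toy conjugate model. The sketch below is illustrative (the Gaussian model and grid-based computation are assumptions for the example, not the paper's setup): with prior $\theta \sim N(0,1)$ and likelihood $y_i \sim N(\theta,1)$, cooling with $T<1$ shrinks the posterior variance by exactly a factor of $T$:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(loc=1.0, scale=1.0, size=20)   # toy observations, n = 20

theta = np.linspace(-3.0, 4.0, 20001)
dx = theta[1] - theta[0]

def U(t):
    """Negative log joint -sum_i log p(y_i|t) - log p(t), constants dropped,
    for the conjugate model y_i ~ N(t, 1) with prior t ~ N(0, 1)."""
    nll = 0.5 * ((y[:, None] - t[None, :]) ** 2).sum(axis=0)
    return nll + 0.5 * t ** 2

def posterior_variance(T):
    """Variance of the tempered posterior p(t|D) proportional to exp(-U(t)/T),
    computed numerically on a grid."""
    logp = -U(theta) / T
    p = np.exp(logp - logp.max())
    p /= p.sum() * dx                       # normalize on the grid
    mean = (theta * p).sum() * dx
    return ((theta - mean) ** 2 * p).sum() * dx

v_warm = posterior_variance(1.0)   # analytic value is 1/(n+1) = 1/21
v_cold = posterior_variance(0.5)   # cooling sharpens the posterior
```

Because $U$ is quadratic here, the tempered posterior stays Gaussian with the same mean and variance $T/(n+1)$, which is the "sharper around the modes" effect described above.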
This paper studies why the cold posterior trick is needed. That is, why is the original posterior learned from BNN is not good enough on its own. The paper investigates three factors:
Results: The experiments find that, consistent with previous work, the best predictive performance is achieved with cold posteriors, i.e., at temperatures $T<1$. This can be seen in Figure 7. While it's still not fully understood why, cold posteriors are currently needed to get good performance with BNNs.
Further, the results suggest that neither inference nor the likelihood is the problem. Rather, the prior seems likely to be, at best, unintentionally and misleadingly informative. Indeed, current priors generally map all images to a single class. This is clearly unrealistic and undesirable behaviour for a prior. This effect can be seen in Figure 8, which shows the class distribution over the training set for two different samples from the prior.
Discussion
To date there has been a significant amount of work on better approximations of the posterior in BNNs. While this is an important research direction for a number of reasons, this paper suggests that there are other directions we should be pursuing. This is highlighted clearly by the fact that the performance of BNNs is worse than that of single point estimates trained by SGD, and cold posteriors are currently required to improve it. While this paper hasn't given a definitive answer to the question of why cold posteriors are needed or why BNNs are not more widely used, it has clearly indicated some important directions for future research.
Foteini Agrafioti, Head, Borealis AI, explains why she believes Aiden, the AI-powered electronic trading platform developed by RBC Capital Markets and Borealis AI, is a scientific milestone for reinforcement learning and AI.
However, along with its myriad benefits, AI brings a host of new challenges, which require enhanced governance processes and validation tools to ensure it is deployed safely and effectively within the enterprise.
With our combined expertise in AI safety, regulation, and model governance, Borealis AI and RBC have been navigating the complexities of this space to develop a robust, comprehensive AI validation process.
Model validation has played an integral role in banks’ traditional data analytics for many years. It helps to ensure that models perform as expected, identifies potential limitations and assumptions, and assesses possible negative impacts. Guidance from the US Federal Reserve dictates that “all model components—inputs, processing, outputs, and reports—should be subject to validation.”[1] Banks in Canada have to adhere to similar regulations[2] and have already developed extensive validation processes to meet these requirements and ensure that they manage model risk appropriately. However, the advent of AI poses a number of challenges for traditional validation techniques.
First, it is costly to validate the large volume and variety of data used by AI models. AI models can make use of significantly more variables—referred to as “features” in AI parlance—than conventional quantitative models, and ensuring the integrity and suitability of these large datasets requires more computational power and more attention from validators. This challenge is particularly acute for AI models that use unstructured naturallanguage data like news feeds and legal or regulatory filings, which require new validation tools as well as more resources. Moreover, AI modelers often use “feature engineering” to transform raw data prior to training, which further increases the dimensionality of the data that must be validated.
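As a concrete, hypothetical illustration of how feature engineering inflates the surface that must be validated, the sketch below expands 10 raw features into 65 engineered ones using squared and pairwise-interaction terms (a generic transformation, not RBC's actual pipeline):

```python
import numpy as np
from itertools import combinations

def engineer_features(X):
    """Illustrative feature engineering: keep the raw columns, then add
    squared terms and all pairwise interactions. Every engineered column
    is one more input a validator must vet for integrity and suitability."""
    cols = [X, X ** 2]
    for i, j in combinations(range(X.shape[1]), 2):
        cols.append((X[:, i] * X[:, j]).reshape(-1, 1))
    return np.hstack(cols)

X = np.random.default_rng(0).normal(size=(100, 10))  # 10 raw features
Z = engineer_features(X)   # 10 raw + 10 squared + 45 interactions = 65 features
```

Even this simple quadratic expansion multiplies the number of columns more than sixfold; the interaction count grows as $d(d-1)/2$, so validation effort scales much faster than the raw feature count.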
Second, the complexity of AI methodologies makes it more difficult for validators to predict how AI models will perform after they are deployed. Compared to conventional models with relatively few features, it is harder to determine how AI models will behave—and why they behave this way—across the full range of inputs these models could face once deployed. AI models’ complexity can also make it more difficult to explain the reasons behind these models’ behavior, which in turn can make it harder to identify biased or unfair predictions. Ensuring that models do not lead some groups of customers to be treated unfairly is an important part of the validation process.
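As one deliberately simplified example of the kind of fairness check a validator might run, the sketch below measures the gap in positive-prediction rates between two groups; the data and group labels are invented for illustration:

```python
# Illustrative fairness check: demographic parity gap between groups.
# Predictions and group labels are made up for the example.
import numpy as np

def parity_gap(predictions, group):
    """Largest difference in positive-prediction rate across groups."""
    rates = [predictions[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

preds = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
groups = np.array(["A"] * 5 + ["B"] * 5)
print(round(parity_gap(preds, groups), 2))  # 0.2: group A at 60%, group B at 40%
```

A large gap does not by itself prove unfairness, but it is the sort of systematic signal that prompts a validator to dig into why the model treats groups differently.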
Finally, the dynamic nature of many AI models also creates unique validation challenges. Conventional models are typically calibrated once using a fixed training dataset before being deployed. AI models, on the other hand, often continue to learn after deployment as more data become available, and model performance may degrade over time if these new data are distributed differently or are of lower quality than the data used during development. These models must be validated in a way that takes their adaptiveness into account and frequently monitored to ensure that they remain robust and reliable.
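One common way to monitor for the distribution shift described above is the Population Stability Index (PSI). The sketch below is a minimal version; the bin count and the 0.1 alert threshold are industry rules of thumb, not anything specific to RBC's process:

```python
# Minimal PSI drift monitor: compares a feature's live distribution
# against the distribution seen during development. The 0.1 threshold
# is a common rule of thumb, not a prescription.
import numpy as np

def psi(expected, actual, bins=10):
    edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf      # catch out-of-range live values
    e = np.histogram(expected, edges)[0] / len(expected) + 1e-6
    a = np.histogram(actual, edges)[0] / len(actual) + 1e-6
    return float(np.sum((a - e) * np.log(a / e)))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, 10_000)   # data used during development
live = rng.normal(0.5, 1.0, 10_000)    # shifted data seen in production
print(psi(train, train))               # 0.0: stable
print(psi(train, live) > 0.1)          # True: typically flagged for review
```

Running a check like this on a schedule, per feature and per model output, is one simple way to take a model's adaptiveness into account after deployment.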
To meet these challenges, banks must develop new validation methods that are better equipped to deal with the scale, complexity, and dynamism of AI. Borealis AI and RBC’s model governance team have joined forces to research and develop a new toolkit that automates key parts of the validation process, provides a more comprehensive view of model performance, and explores new approaches in areas like adversarial robustness and fairness. This path-breaking technology is designed from the ground up to overcome the unique challenges of AI. AI safety is central to everything we do at Borealis AI, much like strong governance and risk management practices are central to RBC. This research will help to support faster AI deployment and more agile model development, and it will provide validators with more comprehensive and systematic assessments of model performance.
The views expressed in this article are those of the interviewee and do not necessarily reflect the position of RBC or Borealis AI.
Jodie Wallis (JW):
Very simply put, explainability is about being able to detail how the AI came to the decision that it did in a given scenario, and what the drivers were behind that decision. Being able to explain how decisions are being made has always been important. But as the algorithms become more sophisticated and as AI starts to reach deeper and deeper into our decision-making processes, the need for explainability has become much more acute.
JW:
No. And that’s an important distinction. Explainability really comes in when we are using AI to make decisions or recommendations that affect people’s lives in some material way. If an algorithm is being used to make a credit decision on a customer, for example, or to decide who to hire or promote – that is a decision that will require explainability. But if I’m using AI and a recommendation engine to decide which pair of shoes to offer you in an online store, I don’t believe that kind of algorithm necessarily needs explaining.
JW:
I think one of the issues with explainability in AI is that it feels overwhelming and limiting at the same time. Many executives and IT leaders worry about the complexity and overhead involved in explaining all of their new models to numerous stakeholders before launch.
The problem with explainability is that the ease or difficulty with which you produce an explanation varies greatly with the type of algorithm you are using. The deeper the algorithm, the more difficult explainability is; the shallower the algorithm, the easier explainability becomes. And I think this has led some organizations to shy away from using certain types of deep learning algorithms.
JW:
It all starts with understanding which decisions and algorithms need to be explained and which do not. Right from the outset of the research, you need to know how important explainability is to the issue you are addressing. Does the action taken have a material impact on the life of an individual or individuals? If it’s not important, then the researcher or developer is free to explore any and all algorithms that might best fit their problem. But if explainability is going to be important, you will likely be limited in the types of algorithms you can use to solve that problem.
When we work with clients, that is almost always our first step – creating a framework to help decision-makers understand which actions require explainability and which do not.
JW:
No. And, frankly, I think the market is currently very immature in terms of the technical tools to help manage these aspects of responsible AI.
There are a few different schools of thought on how to achieve explainability for deep algorithms. Some researchers and scientists use reverse-engineering techniques: they study the outputs and patterns of a sophisticated deep learning algorithm in order to create a less sophisticated model that can simulate those outputs in a more explainable way. The problem is that they are trading off a certain amount of accuracy in order to achieve explainability. But in some circumstances, that may be a worthwhile tradeoff to make.
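A toy version of that surrogate idea might look like the sketch below, with an invented "black box" function and a one-feature threshold rule standing in for the simpler model; this is purely illustrative, not a production technique:

```python
# Toy surrogate-model sketch: fit a simple, explainable rule to imitate
# a complex model's outputs. The "black box" here is invented.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 5))

def black_box(X):
    # Stand-in for an opaque deep model's decision rule.
    return (np.tanh(2 * X[:, 0]) + 0.3 * X[:, 1] ** 2 > 0.2).astype(int)

y_bb = black_box(X)  # we imitate the model's outputs, not ground-truth labels

def fit_stump(X, y):
    """Find the single feature/threshold rule that best matches y."""
    best = (0, 0.0, 0.0)  # (feature, threshold, fidelity)
    for j in range(X.shape[1]):
        for t in np.quantile(X[:, j], np.linspace(0.05, 0.95, 19)):
            fid = np.mean((X[:, j] > t).astype(int) == y)
            if fid > best[2]:
                best = (j, t, fid)
    return best

feature, threshold, fidelity = fit_stump(X, y_bb)
print(f"surrogate: feature {feature} > {threshold:.2f}, fidelity {fidelity:.0%}")
```

The fidelity score, how often the simple rule agrees with the black box, is exactly the accuracy being traded away: a rule this simple explains most of the model's behaviour, but never all of it.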
Ultimately, every situation will be different and there are no tools that truly ‘solve’ the explainability challenge. That’s why it is so important that designers and developers understand the need for explainability at the very start of the project – at the point where they can build it into the design.
JW:
I think governments and privacy commissioners will need to play a key role in this area. Some are already making inroads. In Europe, for example, the General Data Protection Regulation (GDPR) talks about a person’s right to “meaningful information about the logic” when automated decisions are being made about them. Individual regulators are also looking at the challenge – Singapore’s monetary authority, for example, has published guidelines around explainability. But, currently, regulation is still pretty nascent.
JW:
This is about putting explainability at the very start of the process. Before you go and start solving for a particular business problem, you really need to understand the ultimate need for explainability. There’s no use developing a cool and sophisticated new tool if the business is unable to use it because they can’t explain it to stakeholders. So it is critical that developers and designers understand what will require explaining and select their tools accordingly.
JW:
I believe business leaders recognize that explainability is one element of their responsible AI strategy and framework. If they are not already thinking about this, I would suggest the business community spend a bit of time creating smart policies around the explainability of algorithms and extending existing frameworks – like their Code of Business Ethics – into AI development.
That will lead to two key value drivers for businesses. The first is that organizations will be freer to develop really interesting value through AI solutions. But, at the same time, they will be contributing to the societal discourse around the need for explainability. And, given the growing importance of the topic to consumers, regulators and oversight authorities, that can only be a good thing.
Jodie Wallis is the managing director for Artificial Intelligence (AI) in Canada at Accenture. In her role, Jodie works with clients across all of Canada’s industries to develop AI strategies, discover sources of value and implement AI solutions. She also leads Accenture’s collaboration with business partners, government and academia and oversees Accenture’s investments in the Canadian AI ecosystem.
The views expressed in this article are those of the interviewee and do not necessarily reflect the position of RBC or Borealis AI.
Ann Cavoukian (AC):
The challenge with many data privacy laws is that they do not reflect the dynamic and evolving nature of today's technology. In this era of AI, social media, phishing expeditions and data leaks, I would argue that what we really need are proactive measures around privacy.
I think we are just starting to reach the tip of the iceberg on data privacy and protection. The majority of the iceberg is still unknown and, in many cases, unregulated. And that means that, rather than waiting for the safety net of regulation to kick in, we need to be thinking more about algorithmic transparency and designing privacy into the process.
AC:
I mean baking privacy protective measures right into the code and algorithms. It’s really about designing programs and models with privacy as the default setting.
During my time as Data Privacy Commissioner for Ontario, I created ‘Privacy by Design’, a framework for helping organizations prevent privacy breaches by embedding privacy into the design process. More recently, I created an extensive module called ‘AI Ethics by Design’ which was specifically intended to deal with the need for algorithmic transparency and accountability. There are seven key principles that underpin the framework, supported by strong documentation to facilitate ethical design and data symmetry. These principles, based on the original privacy by design framework, include respect for privacy as a fundamental human right.
AC:
Absolutely. And I’m happy to see that facial recognition tools are routinely banned in various US states and across Canada. Your face is your most sensitive personal information. And, more often than not with these applications, nobody is obtaining individual consent before capturing facial images; there may not even be visible notification that facial recognition tools are being used.
From a privacy perspective, that’s terrible. The point of privacy laws is to provide people with control over their personal data. Applications like facial recognition take away all of that control. All that aside, the technology has also proven to be highly inaccurate and frequently biased; time and again, their use has been struck down in the courts of justice and public opinion.
AC:
I think it is absolutely critical to consumers; virtually every study and survey confirms that. Consider what happened early on in the pandemic. A number of governments tried to launch so-called ‘contact tracing’ apps that offered fairly weak privacy controls. Uptake was dismal. Even though the apps could be potentially lifesaving for users, few were willing to share their personal information with the government or put it into a centralized repository.
What worked well, on the other hand, was the Apple/Google exposure notification API. In part, it was well adopted because it works on the majority of smartphones in use in North America today. But, more importantly, it is fully privacy protected. I have personally had a number of one-on-one briefings from Apple and was highly confident that the API collected no personally identifiable information or geolocation data. Around the world, Canada included, apps based on that API have seen tremendous uptake within the population.
Now, remember, this is for an app that helps people avoid the biggest health crisis to face modern civilization. If they are not willing to trade their privacy for that, you would be crazy to assume consumers would trade it away simply for convenience or service.
AC:
Not at all. We need to get away from this view where privacy must be traded for something. It’s not an either/or, zero-sum proposition involving tradeoffs. Far better to enjoy multiple positive gains by embedding both privacy AND AI measures, not one to the exclusion of the other.
I also think the environment is rapidly changing. Consider, for example, the efforts being made by the Decentralized Identity Foundation, a global technology consortium that is working to find new ways to ensure privacy while allowing data to be commercialized. Efforts like these suggest we are moving towards a world where privacy can be embedded into AI by default.
AC:
The AI community needs to remember that – above all – transparency is essential. People need to be able to see that their privacy has been baked into the code and program by design. I would argue that public trust in AI is currently very low. The only way to build that trust is by embedding privacy by design.
I think the same advice goes for business executives and privacy oversight leaders: don’t just accept algorithms without looking under the hood first. There are all kinds of potential issues – privacy and ethics related – that can arise when applying AI. As an executive, you need to be sure your organization and people are always striving to protect personal data.
Dr. Ann Cavoukian is recognized as one of the world’s leading privacy experts. Appointed as the Information and Privacy Commissioner of Ontario, Canada in 1997, Dr. Cavoukian served an unprecedented three terms as Commissioner. There she created Privacy by Design, a framework that seeks to proactively embed privacy into the design specifications of information technologies, networked infrastructure and business practices, thereby achieving the strongest protection possible. Dr. Cavoukian is presently the Executive Director of the Global Privacy and Security by Design Centre.
However, the pace of this change has brought with it some tough challenges, with recent failures in AI systems leading to mistrust and fear of the technology. In some instances, even among some of the world’s leading technology companies, it has led to a costly removal of AI products from the market. Many businesses are realizing that they need to slow down and invest in more responsible AI product development.
Building AI responsibly comes with numerous tradeoffs. A recent Borealis AI/RBC* survey found that while 77% of those currently using AI believe it is important for businesses to implement it in an ethical way, 93% say they experience barriers such as cost and lack of understanding when attempting to do so.
In putting issues such as fairness, stability, bias and explainability at the top of their agenda, business leaders are investing in a trusted partnership with their clients at the expense of speed to market. Doing the right thing comes at a cost, and in unregulated environments, businesses could be free to take risks that compromise society.
This is why I believe it is so important that the public, businesses and governments are educated about the risks involved in AI technologies and that product owners are held to account for ethical and transparent deployment of these technologies.
One particular area of concern to me is bias. I’ve seen too many examples of companies perpetuating racial or gender discrimination through poorly executed technologies such as facial recognition, and violating human rights through biased algorithms. In fact, our survey found that 88% of companies believe bias exists in their organization, but almost half (44%) do not understand the challenges that bias presents in AI. The most important thing to understand is that this technology is not neutral, and that we are responsible for removing bias at every step.
Companies should review every level of AI development to ensure that any potential bias has been addressed. The different levels could include:
While AI is finding applications across different sectors, each industry is unique and AI’s impact on people’s lives and freedoms can vary widely.
As part of the Royal Bank of Canada (RBC), Borealis AI’s mandate is to advance the field of machine learning by bringing products to life for the financial services industry. Banking is a fundamental aspect of our society and one that plays a major role in helping people achieve financial health and stability. The economic prosperity of our communities is partially the responsibility of this sector. As such, any technological misstep may mean that people don’t reach their full potential – in starting a business, sending children to university, or building a house, for instance. Banks have a contract with society that requires them to be a fair and vested partner in its success.
Borealis AI has the privilege and responsibility of building products that touch the lives of millions of clients. As part of RBC, we are driven by the mission to help our clients thrive and communities prosper, and when it comes to AI this means putting human integrity first.
Over the years, we have developed research practices that ensure that AI is developed responsibly and are supported by RBC’s data and model governance rules. Whether we work with our regulators to understand risks, or we scrutinize our own AI systems with thorough validation, building things the right way means that we routinely trade off speed for considerate and equitable innovation.
It is also our belief that knowledge and opportunity should be shared. For this reason, we have decided to contribute our research, publications and scientific code in this area to the community, and to share RBC’s approach and expertise in governing and securing AI models, which has evolved over decades of practice. Under the RESPECT AI program, we are also convening a number of industry and academic leaders who are contributing their experience and offering practical advice on how to approach building AI responsibly.
At a time when technology evolves quickly and puts pressure on our ability to govern and secure it, it is imperative that we slow our pace and come together to develop robust solutions to the new challenges we face. We hope that RESPECT AI is a step in this direction and that this series opens up honest dialogue, exchange and sharing of our collective experiences in building AI responsibly.
*Data were collected as part of Maru BizPulse program, operated by Maru/Reports and Maru/Matchbox, which collects and tracks key metrics describing how Canadian businesses are feeling, thinking and behaving. The survey audience was made up of owners and senior decisionmakers with Canadian businesses, with a particular focus on small and midsized businesses. The survey was fielded in September 2020. All sample was sourced through the Maru/Blue proprietary business panel and partners. A total of 622 responses were collected for this portion of the survey. For more information please visit www.marureports.com.