In this post, we explore the concept of fairness in machine learning with Richard Zemel, Co-Founder and Director of Research at the Vector Institute for Artificial Intelligence; Industrial Research Chair in Machine Learning at the University of Toronto; and Senior Fellow at the Canadian Institute for Advanced Research.
The views expressed in this article are those of the interviewee and do not necessarily reflect the position of RBC or Borealis AI.
How do you describe fairness within a machine learning context?
Richard Zemel (RZ):
The reality is that there isn’t a single agreed-upon definition of fairness. Lots of fields talk about fairness – law, economics and human rights, for example – but each has its own definition. In machine learning, we like to boil things down into mathematical concepts that can be quantified and, ideally, optimized by a learning system. But it’s very difficult to reduce a multifaceted concept like fairness to a single quantifiable metric.
Is it possible to measure fairness in machine learning algorithms? Is there a quantifiable metric that people can measure and optimize?
There is no single, universal measure for fairness. I think the best we can do right now is to expose the different kinds of metrics that are out there and debate the assumptions that go into them. If your goal, for example, is to ensure that criteria are being applied equitably, fairness could be expressed at the individual level: are similar individuals treated similarly? Contrast this with a group-based definition where, if a learning system is making predictions about individuals, we check that prediction errors are balanced between different groups. These are definitions and measures that can be articulated and quantified, and the same could be said for a range of other definitions and justifications.
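To make the two views concrete, here is a minimal Python sketch of one possible measure for each. The individual-level check counts how often "similar" people (feature vectors within Euclidean distance `eps`) receive scores that differ by more than `delta`; the group-level check reports the largest gap in error rate between groups. The thresholds, the distance function and the function names are illustrative assumptions rather than definitions from the interview; deciding what counts as "similar" is itself one of the assumptions worth debating.

```python
import math
from itertools import combinations


def individual_unfairness(features, scores, eps=0.1, delta=0.1):
    """Fraction of similar pairs (distance <= eps) whose scores differ by more than delta.

    eps, delta and Euclidean distance are illustrative choices, not a standard.
    """
    similar_pairs = 0
    violations = 0
    for i, j in combinations(range(len(features)), 2):
        if math.dist(features[i], features[j]) <= eps:
            similar_pairs += 1
            if abs(scores[i] - scores[j]) > delta:
                violations += 1
    return violations / similar_pairs if similar_pairs else 0.0


def group_error_gap(y_true, y_pred, groups):
    """Largest difference in error rate between any two groups, plus per-group rates."""
    rates = {}
    for g in set(groups):
        idx = [k for k, gk in enumerate(groups) if gk == g]
        errors = sum(y_true[k] != y_pred[k] for k in idx)
        rates[g] = errors / len(idx)
    return max(rates.values()) - min(rates.values()), rates


# Two nearly identical people get very different scores: one violating pair.
print(individual_unfairness([[0.0, 1.0], [0.05, 1.0], [1.0, 0.0]], [0.9, 0.4, 0.2]))

# Group "a" has no errors, group "b" has two errors out of three.
gap, per_group = group_error_gap(
    [1, 0, 1, 0, 1, 0], [1, 0, 1, 0, 0, 1], ["a", "a", "a", "b", "b", "b"]
)
print(gap, per_group)
```

The two functions can disagree: a model can treat near-identical individuals consistently while still making errors at very different rates across groups, which is one reason a single number rarely settles the question.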
I think we are currently in a situation where there are multiple, equally valid measures, each with its own set of assumptions and its own definition of what it means to be fair and equitable. Rather than a single metric, I think we’re moving towards a panel of metrics that can each be quantified along the way.
Is the goal that everyone should be treated equally?
That’s an ongoing debate. There’s a distinction between equal and equitable. In an equal world, everyone would get the same loan at the same rate. In an equitable world, each person’s loan would be based on criteria that consider relevant information and are applied as transparently as possible. It’s not always sensible to treat everyone equally – banks can’t ignore criteria like credit ratings, propensity to repay, employment status and so on. But they can ensure that their criteria are fair and are applied equitably across all demographics.
Who should be developing these concepts and models around fairness?
I would argue that machine learning developers, researchers and scientists should, but not alone. There are a lot of questions and nuances that will require the participation of people from a range of disciplines. Consider, for example, the challenge of deciding which groups you want to measure, and what characteristics define a group in the first place. Different people have different ideas of what defines a group, and there are teams of social scientists now trying to answer those very questions. So, we really need to work in collaboration.
On the one hand, we need machine learning researchers and developers who are actively involved: people who want not just to develop algorithms in a mechanistic way, but to understand the context their algorithms will operate in. On the other hand, we need social scientists who want not just to create abstract notions and definitions, but to operationalize those concepts. We need both, and we need them working together.
What will it take to accelerate research in this area?
One of the challenges facing researchers in this field is the lack of available data sets. Fairness is inherently about individuals and people, and that means understanding their personal information. But it’s very difficult to find large public data sets with personal data, so we are left with either very small data sets or very old ones. The healthcare industry has similar challenges, since most healthcare data is personal in some way. But there, at least, the value proposition for individuals to share their data is somewhat different. My hope, however, is that fairness research is on the same path, and that we will get larger and larger data sets that we can use to better understand the nuances of these problems.
What are business leaders doing to better understand and measure fairness in their machine learning models?
For businesses, I think it’s really about having good definitions of fairness and good metrics to quantify them. Businesses are trying to move towards a ‘dashboard’ of relevant metrics that they can not only track but also use in various scenarios to assess how different decisions affect fairness in different ways.
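A dashboard of this kind could start as simply as reporting several group metrics side by side instead of collapsing them into one score. The sketch below assumes binary 0/1 labels and predictions and a single group attribute, and computes three commonly used group-level quantities: selection rate (related to demographic parity), false positive rate and false negative rate, along with the largest between-group gap for each. The choice of metrics and the names `fairness_dashboard` and `rate` are our illustration, not a standard described in the interview.

```python
import math

METRICS = ("selection_rate", "false_positive_rate", "false_negative_rate")


def rate(values):
    """Mean of 0/1 values; NaN when there is nothing to average."""
    return sum(values) / len(values) if values else float("nan")


def fairness_dashboard(y_true, y_pred, groups):
    """Per-group metrics plus the max-minus-min gap across groups for each metric."""
    per_group = {}
    for g in sorted(set(groups)):
        idx = [k for k, gk in enumerate(groups) if gk == g]
        per_group[g] = {
            # Share of the group receiving a positive decision.
            "selection_rate": rate([y_pred[k] for k in idx]),
            # Among true negatives, share wrongly predicted positive.
            "false_positive_rate": rate([y_pred[k] for k in idx if y_true[k] == 0]),
            # Among true positives, share wrongly predicted negative.
            "false_negative_rate": rate([1 - y_pred[k] for k in idx if y_true[k] == 1]),
        }
    gaps = {}
    for metric in METRICS:
        # Skip groups where the metric is undefined (e.g. no true negatives).
        values = [m[metric] for m in per_group.values() if not math.isnan(m[metric])]
        gaps[metric + "_gap"] = max(values) - min(values) if len(values) > 1 else float("nan")
    return per_group, gaps


y_true = [1, 0, 1, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
groups = ["a"] * 4 + ["b"] * 4
per_group, gaps = fairness_dashboard(y_true, y_pred, groups)
print(per_group)
print(gaps)
```

In this toy example, group "a" is selected more often and has more false positives, while group "b" has more false negatives. Showing all three gaps together makes that trade-off visible in a way a single combined score would hide.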
On the research side, a number of problems are being addressed today. For example, people are now looking at fairness not just as a function of what happens at a single point in time – one decision – but in terms of the impact of that decision on a person over time. We’re seeing good examples of this in recommender systems. There’s a notion that fairness is dynamic and needs to be examined through time. That’s just one of the many problems that researchers are trying to address today.
What can machine learning developers and business leaders do to advance the conversation?
I would argue that we need to start thinking more about the peculiarities that are often embedded in data sets. Some of our biggest challenges come down to problems inherent in the data sets and in the way the data was collected. And I don’t think it’s a challenge for machine learning developers alone – everyone is now doing more with data. I’m just not certain they are all examining their data as carefully as perhaps they should be. So, my very basic advice would be to look to your data sets.
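As one small, concrete starting point for looking at a data set, the sketch below reports how much of the data each group contributes and how often each group carries a positive label. Large imbalances in either can hint at sampling or labelling peculiarities worth investigating before a model is trained. The function name `audit_dataset` and the assumption of binary labels with a single group attribute are hypothetical; a real audit would also ask how and when the data was collected, which no summary statistic captures.

```python
from collections import Counter


def audit_dataset(labels, groups):
    """Per-group share of the data and positive-label rate (labels assumed 0/1)."""
    counts = Counter(groups)
    total = len(groups)
    positives = Counter(g for g, y in zip(groups, labels) if y == 1)
    summary = {}
    for g in sorted(counts):
        summary[g] = {
            # How much of the data set this group accounts for.
            "share_of_data": counts[g] / total,
            # How often this group's examples carry a positive label.
            "positive_label_rate": positives[g] / counts[g],
        }
    return summary


labels = [1, 1, 0, 1, 0, 0, 0, 0, 1, 0]
groups = ["a"] * 8 + ["b"] * 2
print(audit_dataset(labels, groups))
```

Here group "b" makes up only a fifth of the examples, so any rate estimated for it rests on two data points, a caution that applies directly to the small data sets mentioned earlier.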
About Richard Zemel
Richard is a Professor of Computer Science at the University of Toronto, where he has been a faculty member since 2000. Richard’s research contributions include foundational work on systems that learn useful representations of data without any supervision; methods for algorithmic fairness; learning with little data; and graph-based machine learning. His awards include an NVIDIA Pioneers of AI Award, a Young Investigator Award from the Office of Naval Research, and a Presidential Scholar Award.