Algorithms are to fairness as trains are to tracks

Deloitte Partner Rick Shaw reflects on his experience building algorithms and brings a mathematician’s perspective to designing automated systems that serve human interests.

I design, build and install algorithms, and I am often surprised by the aura of infallibility that surrounds these automated processes. Algorithms are often assumed to be fundamentally objective, unswayed by sentiment and able to improve on human decision-making.

Algorithms are an automated way of expressing an opinion based on a simplified view of the world. They are prone to unintentional bias from a number of sources: the data they are trained on, the conscious and unconscious leanings of their programmers, the interpretation and application of their outputs, and the unavoidable trade-off between maximising predictive power and meeting expectations of “fairness”. The aim of this note is to make clear the limitations inherent in algorithms and to suggest procedures that mitigate bias and achieve the desired outcomes from algorithmic processes.

Algorithm fairness: Selecting graduate hires

I will use an example of algorithmic filtering of graduate applicants to illustrate the issues that arise when using algorithms to make decisions. I simplify things a bit, so don’t worry about the detail. All that matters is that there are different cohorts with different attributes. The aim is to build fair algorithms.

Consider a large company that receives thousands of applications in any one year and uses an algorithm to select 100 applicants to shortlist for interview. The firm has a particular focus on Science, Technology, Engineering, and Mathematics (STEM) aptitude.

The algorithm has been “trained” on past data, which will indicate past applicants who were “successful”. “Successful” here means the applicant was offered and accepted a position and progressed within the firm.

If success is the only criterion for assessing candidates, the trained model will weight the different attributes to predict success, then rank applicants and select the 100 with the highest scores. Such a single-criterion algorithm is called “unconstrained”.
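As a minimal sketch of such an unconstrained selector (the attribute names, weights and `predicted_success` function are invented illustrations, standing in for a model trained on past data):

```python
# Sketch of an "unconstrained" selector: rank purely on predicted success.
# The attribute names, weights and scoring formula are hypothetical
# stand-ins for a model trained on past data.

def predicted_success(applicant):
    # Stand-in for the trained model: a weighted sum of attributes.
    weights = {"stem_aptitude": 0.6, "grades": 0.3, "experience": 0.1}
    return sum(w * applicant[attr] for attr, w in weights.items())

def shortlist(applicants, n=100):
    # Rank every applicant by predicted success and keep the top n;
    # no fairness constraint enters anywhere.
    ranked = sorted(applicants, key=predicted_success, reverse=True)
    return ranked[:n]
```

The only objective here is maximising predicted success; everything that follows in this note is about what happens when a second objective, fairness, is added.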

Direct and systemic bias

Let’s say the company believes that men have had an unfair advantage in past recruitment exercises. This could have arisen because recruitment criteria or practices favoured men (direct bias) or because systemic disadvantages in the country’s culture or education system disproportionately affected women (systemic bias). Two examples of historical systemic bias are: 1) fewer women were encouraged to study STEM subjects, and 2) workplace expectations favoured stereotypically male behaviours and attributes, such as aggression and risk-taking. In practice, it is difficult to separate the relative impact of these two sources of discrimination.

The company’s objective is to ensure that the selection algorithm is “fair”, i.e. that it does not disadvantage women, or that it does not discriminate between men and women. Statistically, this is a “constraint” on the algorithm. Note that simply excluding gender as a variable will not meet this objective, because other variables correlate with gender, and the systemic bias will not be addressed. For example, if the algorithm favours STEM aptitude, women are still at a disadvantage even if the algorithm is blind to gender.
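A toy demonstration of why blindness is not enough, under the invented assumption that historical systemic bias has depressed women’s measured STEM aptitude:

```python
# Toy demonstration: a gender-blind score can still disadvantage one group
# when an input (here, measured STEM aptitude) is correlated with gender in
# the historical data. All numbers are invented for illustration.

men = [{"stem": 80}, {"stem": 75}, {"stem": 85}]
women = [{"stem": 70}, {"stem": 65}, {"stem": 75}]  # depressed by systemic bias

def blind_score(applicant):
    # Gender never enters the formula...
    return applicant["stem"]

avg_men = sum(blind_score(a) for a in men) / len(men)
avg_women = sum(blind_score(a) for a in women) / len(women)
# ...yet the group averages differ, so ranking on this score still
# disadvantages women despite the algorithm being blind to gender.
```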

Fairness is in the eye of the beholder

The objective is to maximise successful candidates whilst satisfying fairness constraints. What is needed is a definition of “fairness” that can be coded into the algorithm. The challenge is that fairness (or justice etc.) can be defined in many ways and any definition requires trade-offs. Also, fairness depends on perspective: the employing firm, candidates and broader society may each have different definitions of fairness.

First, there is a trade-off between fairness of process and fairness of outcomes. An algorithm which is fair in the sense that it does not distinguish gender (fairness through blindness) will result in outcomes biased against women. An algorithm which focuses on fair outcomes requires explicitly using gender to mitigate direct and systemic bias against women (fairness through awareness).

In data science, there are 21 commonly accepted definitions of fairness. These include: each gender having an equal probability of being interviewed (group fairness or statistical parity); candidates with equal scores having equal predicted outcomes (calibration); the proportion of each gender interviewed who go on to be successful being equal (predictive parity); and so on.

Predictive parity and some other fairness measures require different standards for the genders (e.g. interviews for men who score above 80, and for women who score above 70). Many measures of fairness work at the cohort level and result in inequitable treatment at the individual level. That is, two candidates who differ only in gender may be treated differently.
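A group-specific cut-off like the one in the example might be sketched as follows (the thresholds 80 and 70 come from the example above; the function name is invented). It makes the individual-level inequity visible: two candidates with the same score but different genders receive different decisions.

```python
# Sketch: group-specific cut-offs. The thresholds 80 and 70 are taken from
# the example in the text; everything else is an invented illustration.

THRESHOLDS = {"M": 80, "F": 70}

def invite_to_interview(candidate):
    # Cohort-level fairness, individual-level inequity: a score of 75
    # is rejected for "M" but invited for "F".
    return candidate["score"] >= THRESHOLDS[candidate["gender"]]
```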

Inherent limitations in algorithms

We need to accept that algorithmic output is an opinion which reflects the programmer’s philosophy of fairness. Limitations in algorithmic systems are unavoidable and model fairness will not be achieved by more data or better models.

Fairness is a human judgement, and algorithm builders need a clear articulation of the corporate and social objectives their algorithms must meet. Algorithms are well suited to mechanical processes, which frees humans to complete tasks that require instinct and empathy.

Suggested approach: Bring in the humans

Let me use the graduate hire example to show how an acceptance of inherent limitations in computable systems presents an opportunity to optimise the complementary roles of AI and human activity.

In the example, algorithms built with a clearly articulated fairness constraint can apply objective and transparent criteria to reject candidates who are very unlikely to be successful and to accept candidates who are very likely to be successful. Human judgement can then focus on candidates who fall into neither of those two well-defined categories. This judgement is based on principles rather than rules and can adapt to the specific circumstances of the candidates under review.
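The triage described above might be sketched as follows (the band boundaries are invented assumptions; in practice they would be calibrated against the firm’s fairness constraint):

```python
# Sketch of the triage: the algorithm decides the clear cases, and the
# middle band is routed to human review. The band boundaries are invented
# assumptions, to be calibrated against the chosen fairness constraint.

REJECT_BELOW = 30   # very unlikely to be successful
ACCEPT_ABOVE = 85   # very likely to be successful

def triage(score):
    if score < REJECT_BELOW:
        return "reject"          # objective, transparent rejection
    if score > ACCEPT_ABOVE:
        return "accept"          # objective, transparent acceptance
    return "human review"        # principles-based judgement applies here
```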

Human ownership makes the overall process flexible enough to be fair and adds a control over inappropriate outputs. The underlying algorithm makes the process efficient and objectively selects candidates needing more detailed review. Synthesising algorithmic and human processes provides a powerful combination to strengthen workplace decision-making.
