Embedding fairness into algorithmic decision-making

Last month, the European Commission’s High-Level Expert Group on AI presented its ethics guidelines for trustworthy AI. The guidelines highlight fairness as one of four ethical principles for trustworthy AI and provide a pilot assessment list to help stakeholders verify that key ethical requirements have been met when operationalising AI.

One area of focus for the checklist is ensuring that unfair bias is avoided. Bias in algorithms often stems from the datasets used to train them and/or from the way the algorithm itself operates.

Biased datasets

Biases can arise from the data used to train algorithms. If datasets contain historical biases, then algorithms trained on them often project those biases into the future. For example, a hiring algorithm developed by Amazon was scrapped due to its gender bias. The bias arose because the 10 years’ worth of CVs used to train the algorithm were predominantly from male applicants.

It’s important to consider whether the datasets being used are biased. The EU’s pilot assessment checklist refers specifically to the need to consider the diversity and representativeness of users in the data, in both the development and deployment phases. If particular groups are under-represented, there is a risk that the algorithm will not categorise people in those groups effectively.
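As a concrete illustration of that kind of check, the short Python sketch below compares the group shares observed in a training set against benchmark shares for the intended user population. The column name, the 50/50 benchmark and the 80% flagging threshold are assumptions chosen for the example, not anything prescribed by the checklist.

```python
import pandas as pd

# Hypothetical training data with a 'gender' column; the column name and
# the benchmark shares below are illustrative assumptions.
train = pd.DataFrame({
    "gender": ["male"] * 800 + ["female"] * 200,
})

# Share of each group actually present in the training data
observed = train["gender"].value_counts(normalize=True)

# Benchmark shares for the population the model will be deployed on
expected = pd.Series({"male": 0.5, "female": 0.5})

# Flag groups whose representation falls well below the benchmark
report = pd.DataFrame({"observed": observed, "expected": expected})
report["under_represented"] = report["observed"] < 0.8 * report["expected"]
print(report)
```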

Resource allocation

Another challenge to fairness is that algorithms are typically designed to maximise resource efficiency without considering fairness as part of the resource allocation process.

As an example, the data analytics team at the University of Chicago Medicine developed an algorithm to improve hospital resource efficiency. The algorithm allocated a case manager to those patients most likely to be discharged soon after admission. The theory was that by removing undue blockages to these patients being discharged promptly, hospital resources (e.g. beds, staff time) could be freed up for other patients. The model was initially based on patients’ clinical data, but the development team found that adding ZIP code improved its accuracy in identifying patients likely to be discharged soon after admission.

However, it turned out that patients in the ZIP codes of poorer neighbourhoods were more likely to have longer lengths of stay, so ZIP code was effectively acting as a proxy for wealth. Had the algorithm been put into practice, it would have disadvantaged poorer neighbourhoods, with the undesirable result of hospitals devoting additional resources to getting more affluent patients out of hospital more quickly.
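One way to surface proxies like this is to test how well a candidate feature predicts the sensitive attribute on its own. The sketch below is a hypothetical illustration of that idea in Python; the ZIP codes, income bands and model choice are made-up assumptions, not details of the Chicago Medicine model.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: each patient has a ZIP code and an income band.
# Values are illustrative assumptions only.
rng = np.random.default_rng(0)
zips = rng.choice(["60601", "60621", "60637", "60653"], size=1000)
income_band = np.where(np.isin(zips, ["60621", "60653"]), "low", "high")

X = pd.DataFrame({"zip": zips})
y = income_band

# If a simple model can recover the sensitive attribute from the feature,
# the feature is acting as a proxy and deserves extra scrutiny.
model = make_pipeline(OneHotEncoder(handle_unknown="ignore"), LogisticRegression())
scores = cross_val_score(model, X, y, cv=5)
print(f"Mean accuracy predicting income band from ZIP code: {scores.mean():.2f}")
```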

Fortunately, in this case the data analytics team identified the proxy, realised it would produce an unfair outcome and did not deploy the algorithm. Instead, they focused on developing processes to ensure equity was explicitly considered throughout algorithm development, and went on to explore ways that algorithms could be used to identify and overcome inequity.

Preventing algorithmic discrimination on the grounds of characteristics such as wealth is important. The EU’s checklist recommends ensuring there is a mechanism for a wide range of stakeholders to flag issues related to bias, discrimination or poor performance of AI systems.

Tolerance for error

A further problem is that algorithms optimise resource efficiency subject to a defined tolerance for error. This can lead to unfair outcomes, particularly where the algorithm does not consider the relative costs of type I and type II errors to individuals and society.

Let’s consider an algorithm designed to identify whether employees are sufficiently productive, whose output decides whether or not an employee is dismissed. A type I error (false positive) would result in an underperforming employee being identified as productive and therefore retained by the company. A type II error (false negative), on the other hand, would result in a productive employee being identified as unproductive and consequently dismissed. Clearly the personal cost to the employee of a type II error is much greater: a productive employee would unfairly lose their job.
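One way to make this trade-off explicit is to attach costs to the two error types and choose the decision threshold that minimises the expected cost. The sketch below is a minimal illustration with simulated scores; the 10:1 cost ratio is an assumption for the example rather than a recommendation.

```python
import numpy as np

# Hypothetical predicted probabilities that each employee is productive,
# and the true labels (1 = productive). Values are simulated for illustration.
rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=500)
y_score = np.clip(y_true * 0.3 + rng.normal(0.5, 0.2, size=500), 0, 1)

# Relative costs: wrongly dismissing a productive employee (type II / false
# negative) is assumed far more costly than wrongly retaining an
# unproductive one (type I / false positive).
COST_FP = 1.0   # retain an unproductive employee
COST_FN = 10.0  # dismiss a productive employee

def expected_cost(threshold):
    y_pred = (y_score >= threshold).astype(int)
    fp = np.sum((y_pred == 1) & (y_true == 0))  # type I errors
    fn = np.sum((y_pred == 0) & (y_true == 1))  # type II errors
    return COST_FP * fp + COST_FN * fn

thresholds = np.linspace(0.05, 0.95, 19)
best = min(thresholds, key=expected_cost)
print(f"Cost-minimising decision threshold: {best:.2f}")
```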

Consciously considering how to balance type I vs type II error rates is an important part of ensuring algorithmic fairness. The EU checklist emphasises the importance of establishing mechanisms to define fairness in the AI system as well as the need to implement robust metrics to validate that algorithmic decision-making is fair.
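One concrete form such metrics can take is a comparison of error rates across groups, for example checking whether type II errors fall disproportionately on one group. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical predictions and outcomes, with a group label per person.
# All values here are illustrative assumptions.
df = pd.DataFrame({
    "group":  ["A"] * 6 + ["B"] * 6,
    "y_true": [1, 1, 0, 1, 0, 1,  1, 1, 1, 0, 1, 0],
    "y_pred": [1, 1, 0, 1, 0, 0,  1, 0, 0, 0, 1, 0],
})

# Large gaps between groups on these rates are a warning sign that the
# system's errors are not being borne equally.
for name, g in df.groupby("group"):
    fnr = ((g.y_pred == 0) & (g.y_true == 1)).sum() / (g.y_true == 1).sum()
    fpr = ((g.y_pred == 1) & (g.y_true == 0)).sum() / (g.y_true == 0).sum()
    print(f"group {name}: false negative rate={fnr:.2f}, false positive rate={fpr:.2f}")
```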

Managing the risk of algorithmic bias

Understanding and managing the risk of algorithmic bias is important. Fairness should be embedded throughout the design and utilisation phases of an algorithm. Open-source tools such as IBM’s AI Fairness 360 toolkit are being deployed to help developers quantitatively test for algorithmic fairness on an ongoing basis, as well as to implement strategies to address any unwanted biases that are identified.
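As a rough sketch of how such a toolkit might be used, the example below measures bias in a toy labelled dataset with AI Fairness 360’s dataset metrics and then applies its reweighing pre-processor. The data, column names and choice of privileged group are assumptions for illustration only.

```python
import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.metrics import BinaryLabelDatasetMetric
from aif360.algorithms.preprocessing import Reweighing

# Hypothetical toy data: 'sex' is the protected attribute (1 = privileged group,
# an assumption for the example) and 'hired' is the favourable outcome.
df = pd.DataFrame({
    "sex":   [1, 1, 1, 1, 0, 0, 0, 0],
    "score": [0.9, 0.8, 0.7, 0.4, 0.9, 0.6, 0.5, 0.3],
    "hired": [1, 1, 1, 0, 1, 0, 0, 0],
})

dataset = BinaryLabelDataset(
    df=df, label_names=["hired"], protected_attribute_names=["sex"]
)
privileged, unprivileged = [{"sex": 1}], [{"sex": 0}]

# Quantify bias in the labelled data before training anything.
metric = BinaryLabelDatasetMetric(
    dataset, unprivileged_groups=unprivileged, privileged_groups=privileged
)
print("Statistical parity difference:", metric.statistical_parity_difference())
print("Disparate impact:", metric.disparate_impact())

# One of the toolkit's mitigation strategies: reweight instances so that
# favourable outcomes are balanced across groups before model training.
reweighted = Reweighing(
    unprivileged_groups=unprivileged, privileged_groups=privileged
).fit_transform(dataset)
```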

Checklists are also emerging as an avenue to help stakeholders manage this risk. For example, the Ethical OS Toolkit uses a structured question set to help kick-start the conversation around identifying emerging areas of risk and social harm from algorithmic bias. The EU’s checklist for trustworthy AI is still in its pilot phase; interested stakeholders are encouraged to sign up and consider how they would implement the checklist within their organisation’s AI development activities.

With the rising adoption of AI, it’s pivotal to ensure algorithms can be embedded into society in a fair and ethical way by taking steps to identify and mitigate algorithmic biases. It’s also important to further engage the public in dialogue about the uses of their data and to improve the transparency and explainability of algorithmic decision-making.
