What is the meaning of “bias” in AI?

Ambre Davat
GRESEC / Chaire Ethique & IA (Institut MIAI)

In recent years, the word “bias” has been used frequently in the field of artificial intelligence (AI), and more precisely in machine learning. It is an “umbrella term” that can refer to a great variety of issues: for example, the lack of diversity among computer scientists, the unrepresentativeness of databases, or the presence of implicit stereotypes in statistical models. However, the concept of “bias”, its potential causes, and even its consequences are not always clearly defined.

Recent reviews have proposed classifications based on the sources of bias and how they affect the development and use of AI systems (Hovy, Prabhumoye 2021; Mehrabi et al. 2022). In this workshop, we wish to shed new light on bias by discussing the implicit norms that underlie it. Our hypothesis is that in the field of AI, three different visions of bias are conflated: bias as a divergence from scientific standards (methodological bias), bias as a divergence from the “rational” decision (cognitive bias), and bias as a divergence from an ideal society (socio-historical bias). We will also question the ambition of “unbiasing” AI.

Methodological bias concerns epistemological issues. In machine learning, it mostly depends on the quality of the data used to train AI models. Taken seriously, these issues raise several non-trivial questions related to objectivity and “scientific truth”. A naive understanding of science, however, may lead AI designers and users to think that data speak for themselves. In fact, data-driven sciences require very careful methodologies for collecting and interpreting data, methodologies that are often lacking in AI applications. The field of critical data studies emerged in order to denounce the fact that data are never “neutral” or “raw”, but always “cooked” (Iliadis, Russo 2016). From this standpoint, “unbiasing” AI will always be a continuous, never-ending process, much like scientific objectivity itself.

Cognitive bias is a concept initially proposed in psychology (Tversky, Kahneman 1974), before gaining great popularity in economics. It is defined as a systematic error of judgement, attributed to a priori knowledge and mental shortcuts. The standard for “unbiased” or “rational” decisions is that of “Homo economicus”, who seeks to maximize their own interests. Contextual and qualitative information (such as the subject’s emotions or the framing of sentences) is considered irrelevant (Gigerenzer 2018). In the field of AI, the reference to cognitive bias is ambiguous: it can be used to describe the limits of AI systems (most of which rely on statistical models and are unable to emulate reasoning), but also as an argument for replacing human decisions with algorithmic computation (Kahneman et al. 2016). “Unbiasing” AI may therefore be seen as a technocratic project, aimed at avoiding conflict and depoliticizing institutional decisions.

Socio-historical bias is frequently mentioned when discussing the social impacts of AI systems. This issue goes beyond data quality or representativeness: it is also a question of spurious and unwanted correlations. Famous examples of unfairness and algorithmic discrimination have contributed to the rise of AI ethics, but how to tackle these issues remains controversial. While several definitions of fairness and several ethical frameworks have been proposed, many authors have argued that fairness is not a technical issue but a political one (Binns 2018; D’Ignazio, Klein 2020; Green 2021). In this case, instead of “unbiasing” AI, we should speak of “rebiasing” AI in a way that is considered more ethical at a given time. Because of the high cost of training and testing AI, this “rebiased” AI will by definition be conservative.

References:

BINNS, Reuben, 2018. Fairness in Machine Learning: Lessons from Political Philosophy. In: Proceedings of the 1st Conference on Fairness, Accountability and Transparency [online]. PMLR. 21 January 2018. pp. 149–159. [Accessed 22 August 2022]. Available at: https://proceedings.mlr.press/v81/binns18a.html

D’IGNAZIO, Catherine and KLEIN, Lauren F., 2020. Data Feminism. Cambridge, MA, USA: MIT Press. Strong Ideas. ISBN 978-0-262-04400-4.

GIGERENZER, Gerd, 2018. The bias bias in behavioral economics. Review of Behavioral Economics. 2018. Vol. 5, no. 3–4, pp. 303–336.

GREEN, Ben, 2021. The Contestation of Tech Ethics: A Sociotechnical Approach to Technology Ethics in Practice. Journal of Social Computing. September 2021. Vol. 2, no. 3, pp. 209–225. DOI 10.23919/JSC.2021.0018.

HOVY, Dirk and PRABHUMOYE, Shrimai, 2021. Five sources of bias in natural language processing. Language and Linguistics Compass. 2021. Vol. 15, no. 8, p. e12432. DOI 10.1111/lnc3.12432.

ILIADIS, Andrew and RUSSO, Federica, 2016. Critical data studies: An introduction. Big Data & Society. 1 December 2016. Vol. 3, no. 2. DOI 10.1177/2053951716674238.

KAHNEMAN, Daniel, ROSENFIELD, Andrew M., GANDHI, Linnea and BLASER, Tom, 2016. Noise: How to Overcome the High, Hidden Cost of Inconsistent Decision Making. Harvard Business Review. October 2016.

MEHRABI, Ninareh, MORSTATTER, Fred, SAXENA, Nripsuta, LERMAN, Kristina and GALSTYAN, Aram, 2022. A Survey on Bias and Fairness in Machine Learning [online]. 25 January 2022. arXiv. arXiv:1908.09635. [Accessed 1 July 2022]. Available at: http://arxiv.org/abs/1908.09635

TVERSKY, Amos and KAHNEMAN, Daniel, 1974. Judgment under Uncertainty: Heuristics and Biases. Science. 1974. Vol. 185, no. 4157, pp. 1124–1131.