How safe are LLMs compared to search engine autocompletions? A bias check-up

Alina Leidinger
University of Amsterdam

To what extent do publicly and commercially available LLMs repeat and reinforce the biases found in commercial search engine autocompletions? The research examines the stereotype-evoking prompts made to Google, Yahoo! and DuckDuckGo autocompletion in 2022 where we found significant differences between the engines’ content moderation (Leidinger and Rogers, 2023). While Google and to a lesser extent DuckDuckGo moderate stereotypes, Yahoo! provides far more license to them. The stereotype-eliciting prompts were re-run across three LLMs in order to test for safe model behaviours, i.e., refusal to answer, with results that vary quite significantly, both between commercial and open source models. Notably often we find that models partially refuse (Rottger et al. 2023) highlighting a tension between model helpfulness and harmlessness (Askell et al. 2021).

References:

Askell, Amanda, et al. “A general language assistant as a laboratory for alignment.” arXiv preprint arXiv:2112.00861 (2021). Leidinger, Alina, and Richard Rogers. “Which Stereotypes Are Moderated and Under-Moderated in Search Engine Autocompletion?.” Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency. 2023.

Safiya Umoja Noble. 2018. Algorithms of oppression. New York University Press.

Röttger, Paul, et al. “Xstest: A test suite for identifying exaggerated safety behaviours in large language models.” arXiv preprint arXiv:2308.01263 (2023).

Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. 2023. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.

OpenAI. 2023a. https://platform.openai.com/docs/guides/moderation/overview Almazrouei, Ebtesam, et al. Falcon-40B: an open large language model with state-of-the-art performance. Technical report, Technology Innovation Institute, 2023.