Probing Language Models: A Case Study on Biases using the Census Dataset

Language models are widely applied, ranging from virtual personal assistants and web search to text generation tasks such as summarization, translation, and speech writing. Improved user interfaces for interacting with language models have contributed to their rapid popularization across society as a whole. One of the main issues when applying these models to real-world applications is that they are often used as black boxes and trained on datasets that reflect harmful societal biases, which can then be reproduced or even amplified by the model after training.

Harmful biases are difficult to detect and measure in a systematic way. They can target individuals on the basis of age, nationality, ethnic origin, gender, sexual orientation, disability, mental health condition, or a combination of these factors, among others. Harmful biases in machine learning models have already caused damage in society, for example when used to automate decisions in the legal system, negatively affecting Black people, and when used to automate job recommendations, favouring men over women for positions with leadership roles.

Given this scenario, where language models are applied to various tasks while still being trained on data with societal biases, how can one investigate and detect such harmful biases? There has been recent work on probing language models in a systematic way using Angluin's exact learning framework. The idea is to treat the language model as an oracle and employ algorithms from this framework to extract information from it. This provides the first steps towards a flexible approach for probing language models to detect group biases, and it can be enriched in many ways. First, regarding the selection of groups: the approach should allow focusing on a range of factors that are known to affect certain groups of the population negatively. Considering census data from countries such as France, Norway, the United Kingdom, and the USA, where a large range of factors can be analysed in combination, which groups are most affected by harmful biases in language models? Secondly, can we create a platform that simplifies the process of selecting the language models, the factors to be analysed, and the extent to which the probing process is applied?
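To make the oracle view concrete, the following is a minimal sketch in Python of how a masked language model could be queried over group attributes. It assumes the HuggingFace transformers library, the bert-base-uncased model, and an illustrative sentence template and attribute list; it only shows the query step and does not implement Angluin's exact learning algorithms themselves.

# Minimal sketch: querying a masked language model as an oracle over
# group attributes. Model name, template, and attribute list are
# illustrative assumptions, not part of the project description.
from transformers import pipeline

# The language model acts as the oracle: each call is one query, answered
# with the model's top predictions for the masked position.
oracle = pipeline("fill-mask", model="bert-base-uncased")

TEMPLATE = "The {attribute} person works as a {mask}."
ATTRIBUTES = ["young", "old", "Norwegian", "French", "female", "male"]  # toy factors

def query_oracle(attribute, top_k=5):
    # Ask which professions the model associates with a given group attribute.
    prompt = TEMPLATE.format(attribute=attribute, mask=oracle.tokenizer.mask_token)
    return [(r["token_str"], r["score"]) for r in oracle(prompt, top_k=top_k)]

if __name__ == "__main__":
    for attr in ATTRIBUTES:
        print(attr, query_oracle(attr))

Comparing the returned distributions across attributes (for example, "female" versus "male") gives a first, informal signal of group bias; the project would replace such ad-hoc comparisons with systematic queries driven by the exact learning framework and by factors derived from census data.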

See full description here.

Published Sep. 7, 2023 08:40 - Last modified Sep. 18, 2023 19:01

Supervisor(s)

Scope (credits)

60