Adding gender information to named entities for studying bias in the news

Named Entity Recognition (NER) is a central task in NLP that aims not only to identify proper names in running text but also to categorize them, e.g., as referring to a person, organization, or location. For Norwegian, the NorNE dataset is the basis for existing NER tools such as spaCy. In an applied setting, however, additional properties of entities may be of interest. For instance, knowing the gender of a person may enable monitoring of gender bias in news coverage. One possible direction is to couple the predictions of a combined gender and NER model with information from other models, such as targeted sentiment analysis.

The goal of this thesis is to 

  • (semi-)automatically enrich the NorNE annotations with gender information,
  • train and evaluate models that predict such gendered named entities, and
  • evaluate the resulting tool on a use case in current news coverage, as part of ongoing work at Amedia.
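
As a rough illustration of the first goal, the enrichment step could start from a simple first-name lookup, as in the sketch below. Everything in it is an assumption made for illustration: the BIO-style tags (B-PER, I-PER), the toy lexicon NAME_GENDER, the function name enrich_with_gender, and the output labels PER_F, PER_M and PER_UNK are hypothetical and not part of NorNE or of any planned method. Entities that end up as PER_UNK would be the natural candidates for manual review, which is where the "semi-" in semi-automatic comes in.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lexicon mapping lowercased first names to a gender label.
# A real project would need a much larger, carefully sourced name list.
NAME_GENDER: Dict[str, str] = {"erna": "F", "kari": "F", "jonas": "M", "ola": "M"}


def enrich_with_gender(
    tagged: List[Tuple[str, str]],
    lexicon: Dict[str, str] = NAME_GENDER,
) -> List[Tuple[str, str]]:
    """Rewrite B-PER/I-PER tags as PER_F, PER_M or PER_UNK per entity span.

    The gender is looked up from the first token of each person span only;
    all other tags are passed through unchanged.
    """
    enriched: List[Tuple[str, str]] = []
    current_label = ""  # gendered label of the person span we are inside, if any
    for token, tag in tagged:
        if tag == "B-PER":
            gender = lexicon.get(token.lower(), "UNK")
            current_label = f"PER_{gender}"
            enriched.append((token, f"B-{current_label}"))
        elif tag == "I-PER" and current_label:
            # Continuation tokens inherit the label chosen at the span start.
            enriched.append((token, f"I-{current_label}"))
        else:
            current_label = ""
            enriched.append((token, tag))
    return enriched


sentence = [
    ("Erna", "B-PER"),
    ("Solberg", "I-PER"),
    ("besøkte", "O"),
    ("Bergen", "B-GPE_LOC"),
]
print(enrich_with_gender(sentence))
```

A lookup of this kind ignores context, ambiguous names, and people referred to by surname only, so it is at most a starting point for annotation, not a substitute for the trained models described in the second goal.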

Supervision will be conducted as a collaboration between IFI (LTG), Amedia, and the MediaFutures Research Centre for Responsible Media Technology & Innovation, with supervisors from all affiliations. The precise details and scope of the thesis can be further decided in agreement between the supervisors and the candidate. 

The project presupposes a good balance of technical and linguistic expertise. Good programming skills, experience with machine learning and a solid background in NLP are relevant qualifications. 

