Named Entity Recognition (NER) is a central task in NLP that aims to both identify and categorize proper names in running text, e.g., whether a span of text refers to a person, organization, location, etc. For Norwegian, the NorNE dataset is the basis for existing NER tools such as spaCy. In an applied setting, however, additional information about entities may be of interest. For instance, knowing the gender of a person may enable monitoring of gender bias in news coverage. One possible direction is to couple the predictions of a combined gender and NER model with information from other models, such as targeted sentiment analysis.
The goal of this thesis is to
- (semi-)automatically enrich the NorNE annotations with gender information,
- train and evaluate models to predict such gendered named entities,
- and evaluate the resulting tool on a use case in current news coverage, connected to ongoing work at Amedia.
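As a rough illustration of what the first step could look like, the sketch below adds a gender suffix to person tags in a BIO-tagged sentence by looking up the first token of each person span in a name list. Everything in it is an assumption made for illustration: the tiny `NAME_GENDER` lexicon, the `enrich_with_gender` helper, and the `PER_F` / `PER_M` / `PER_UNK` label scheme are hypothetical and not part of NorNE. Spans that cannot be resolved automatically are marked `UNK` so that they can be passed on to manual annotation, which is where the "semi-" in semi-automatic comes in.

```python
# Hypothetical first-name lexicon, for illustration only. A real project
# would need a much larger and carefully sourced resource.
NAME_GENDER = {"erna": "F", "kari": "F", "jonas": "M", "ola": "M"}


def enrich_with_gender(tokens, tags, lexicon=NAME_GENDER):
    """Append a gender suffix to PER tags in a BIO-tagged sequence.

    The first token of each PER span is looked up in the lexicon. Spans whose
    first token is not found get the suffix "UNK". All other tags are copied
    unchanged.
    """
    enriched = []
    current_suffix = None  # suffix of the PER span we are currently inside
    for token, tag in zip(tokens, tags):
        if tag == "B-PER":
            current_suffix = lexicon.get(token.lower(), "UNK")
            enriched.append(f"B-PER_{current_suffix}")
        elif tag == "I-PER" and current_suffix is not None:
            enriched.append(f"I-PER_{current_suffix}")
        else:
            current_suffix = None
            enriched.append(tag)
    return enriched


if __name__ == "__main__":
    tokens = ["Erna", "Solberg", "besøkte", "Bergen", "med", "Petter"]
    tags = ["B-PER", "I-PER", "O", "B-LOC", "O", "B-PER"]
    for token, tag in zip(tokens, enrich_with_gender(tokens, tags)):
        print(f"{token}\t{tag}")
```

A first-name lookup of this kind is only a starting point: it cannot handle names used for more than one gender, surnames used on their own, or later mentions that depend on context, which is part of why trained models are evaluated in the second step.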
Supervision will be a collaboration between IFI (LTG), Amedia, and the MediaFutures Research Centre for Responsible Media Technology & Innovation, with supervisors from all three. The precise details and scope of the thesis can be decided further in agreement between the supervisors and the candidate.
The project presupposes a good balance of technical and linguistic expertise. Good programming skills, experience with machine learning and a solid background in NLP are relevant qualifications.