Question Answering for Norwegian

Question Answering (QA) is a central task within Natural Language Understanding, and QA datasets have become standard benchmarks for Large Language Models (LLMs) in recent years.


The first Norwegian QA dataset, NorQuAD, was recently released. The dataset was the result of a small-scale data annotation effort led by LTG. While the modeling results based on this dataset were promising, there remain many possible areas of research to improve QA benchmarking for Norwegian.

This thesis can take several directions, depending on the interests of the student (and may also be conducted by a team of two students). Some possible directions include:

assessing the effect of data augmentation approaches to enlarge the NorQuAD benchmark
adapting the NorQuAD benchmark for instruction-tuning of generative LLMs
developing a neural, generative QA model for Norwegian
and others
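
As a rough illustration of the instruction-tuning direction, the sketch below turns an extractive QA record into prompt/response pairs. It assumes SQuAD-style records (a "context" string plus a list of "qas" with "answers"); the prompt template, the function name to_instruction_examples, and the toy record are hypothetical and not taken from NorQuAD itself.

```python
from typing import Dict, List

# Hypothetical Norwegian prompt template; the wording is an assumption
# made for illustration and is not part of the NorQuAD release.
PROMPT_TEMPLATE = (
    "Les teksten og svar på spørsmålet.\n\n"
    "Tekst: {context}\n\n"
    "Spørsmål: {question}\n\n"
    "Svar:"
)


def to_instruction_examples(record: Dict) -> List[Dict[str, str]]:
    """Turn one SQuAD-style paragraph record into prompt/response pairs."""
    examples = []
    context = record["context"]
    for qa in record["qas"]:
        if not qa["answers"]:
            # Skip questions without a gold answer span.
            continue
        examples.append(
            {
                "prompt": PROMPT_TEMPLATE.format(
                    context=context, question=qa["question"]
                ),
                # Use the first annotated answer as the target text.
                "response": qa["answers"][0]["text"],
            }
        )
    return examples


# Invented toy record in the assumed SQuAD-style layout.
record = {
    "context": "Oslo er hovedstaden i Norge.",
    "qas": [
        {
            "question": "Hva er hovedstaden i Norge?",
            "answers": [{"text": "Oslo", "answer_start": 0}],
        }
    ],
}

examples = to_instruction_examples(record)
print(examples[0]["prompt"])
print(examples[0]["response"])
```

Other choices, such as sampling among several prompt templates or handling multiple gold answers per question, would be part of the design space the thesis could explore.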

Ivanova, Sardana; Andreassen, Fredrik Aas; Jentoft, Matias; Wold, Sondre & Øvrelid, Lilja (2023). NorQuAD: Norwegian Question Answering Dataset. In Alumäe, Tanel & Fishel, Mark (Eds.), Proceedings of the 24th Nordic Conference on Computational Linguistics (NoDaLiDa).

Published Oct. 3, 2023 16:31 - Last modified Oct. 9, 2023 16:08

Supervisor(s)

Sondre Wold Universitetet i Oslo
Vladislav Mikhailov Universitetet i Oslo
Lilja Øvrelid Universitetet i Oslo
