This thesis topic is no longer available

Semantic change detection for Norwegian

Figure: Time tensor, from Jurgens and Stevens (2009)

Words in human languages change their meaning over time. For example, the English word "cell" originally had only one sense: "solitary dwelling, as in monastery or prison". Later, it acquired an additional sense: "a small usually microscopic mass of protoplasm bounded externally by a semipermeable membrane". In the last two decades, the word has increasingly been used in a third sense, "mobile phone".

These changes (diachronic semantic shifts) can be detected automatically. This is often done by comparing the behavior of large-scale neural language models trained on texts from different time periods. The topic is covered by several survey papers, ACL workshops in 2019 and 2021, and a SemEval shared task in 2020. The findings of these studies matter both for general linguistics and for practical applications such as web search and digital humanities.
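One common way to compare static embeddings trained on two time periods can be sketched as follows. This is a minimal illustration, not a method prescribed by this project description: it assumes two embedding matrices that share a vocabulary and row order, aligns them with an orthogonal Procrustes rotation, and scores each word by the cosine distance between its two vectors. The function names (`procrustes_align`, `cosine_distance`, `shift_scores`) are hypothetical, and training the embeddings themselves is not shown.

```python
import numpy as np


def procrustes_align(source, target):
    """Rotate `source` onto `target` with the orthogonal Procrustes solution.

    Both arrays have shape (vocab_size, dim); row i is the same word in both.
    """
    # The SVD of the cross-covariance matrix yields the optimal rotation.
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two vectors."""
    return 1.0 - float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


def shift_scores(emb_old, emb_new, vocab):
    """Map each word to the cosine distance between its aligned vectors.

    Assumption: `emb_old` and `emb_new` were trained separately on an
    older and a newer corpus, restricted to the shared vocabulary `vocab`.
    """
    aligned = procrustes_align(emb_old, emb_new)
    return {
        word: cosine_distance(aligned[i], emb_new[i])
        for i, word in enumerate(vocab)
    }
```

Words with the highest scores are candidates for semantic change. Whether they have really shifted still has to be checked against human annotation.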

However, such work has mostly been done for English and a few other languages. This Master's thesis should address the lack of experimental results on unsupervised, data-driven detection of temporal semantic change for Norwegian. This includes two main tasks:

  1. Refining and studying human-annotated test sets of Norwegian words that have acquired a new sense or lost an old sense over time (the annotation is currently ongoing as a separate effort).
  2. Using modern neural architectures and raw Norwegian texts from different time periods to detect the diachronic shifts in the test set. In particular, both static distributional approaches (word2vec, fastText, etc.) and contextualized models (ELMo, BERT, etc.) must be evaluated.
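For the contextualized models in the second task, each occurrence of a word gets its own vector, so a change score has to compare two sets of token embeddings. The sketch below shows two simple measures that could be used for this: the distance between period prototypes (mean vectors) and the average pairwise distance between occurrences. It is an illustration under assumptions, not the project's prescribed method. The function names are hypothetical, and extracting the token embeddings from ELMo or BERT is not shown.

```python
import numpy as np


def _unit_rows(x):
    """Scale every row of `x` to unit length."""
    return x / np.linalg.norm(x, axis=1, keepdims=True)


def prototype_distance(tokens_old, tokens_new):
    """Cosine distance between the mean token embeddings of two periods.

    Each argument has shape (n_occurrences, dim) and holds the contextual
    embeddings of one target word in one time period (assumed input).
    """
    p_old, p_new = tokens_old.mean(axis=0), tokens_new.mean(axis=0)
    cos = p_old @ p_new / (np.linalg.norm(p_old) * np.linalg.norm(p_new))
    return 1.0 - float(cos)


def average_pairwise_distance(tokens_old, tokens_new):
    """Mean cosine distance over all (old occurrence, new occurrence) pairs."""
    sims = _unit_rows(tokens_old) @ _unit_rows(tokens_new).T
    return float(np.mean(1.0 - sims))
```

Either score can be used to rank the target words. The ranking can then be compared with the human-annotated test set, for example using a rank correlation.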

Prerequisites:

  • good knowledge of Norwegian
  • at least some competence in programming and machine learning
Keywords: natural language processing, semantic change, historical linguistics, history of language, computational semantics, word embeddings
Published Oct. 13, 2021 17:17 - Last modified Dec. 7, 2022 14:54

Supervisor(s)

Scope (credits)

60