Detection of cognates or borrowings

The bilingual lexicon induction task aims to identify pairs of words from two different languages that have the same meaning. For unrelated languages, this is done mainly on the basis of distributional information, i.e. the contexts in which words occur. For closely related languages, however, words with the same meaning often also have similar forms because they share a common origin. Such word pairs are called cognates. A possible thesis topic could be to identify cognate pairs on the basis of monolingual text corpora alone.
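As an illustration of the form-similarity signal mentioned above, the following is a minimal sketch of a simple orthographic baseline, not a method prescribed by this topic. It assumes that each monolingual corpus has already been reduced to a vocabulary list and proposes as cognate candidates those word pairs whose normalized Levenshtein similarity reaches a threshold. The function names, the threshold of 0.6 and the tiny Norwegian and Swedish word lists are hypothetical choices made for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    """Normalized edit similarity in [0, 1]; 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def cognate_candidates(vocab_a, vocab_b, threshold=0.6):
    """Return (word_a, word_b, score) pairs with similarity >= threshold, best first.

    Hypothetical baseline: vocab_a and vocab_b stand in for word lists
    extracted from two monolingual corpora; the threshold is arbitrary.
    """
    pairs = []
    for wa in vocab_a:
        for wb in vocab_b:
            score = similarity(wa, wb)
            if score >= threshold:
                pairs.append((wa, wb, score))
    return sorted(pairs, key=lambda p: (-p[2], p[0], p[1]))


if __name__ == "__main__":
    norwegian = ["skole", "hus", "vindu"]   # hypothetical toy vocabulary
    swedish = ["skola", "hus", "fönster"]   # hypothetical toy vocabulary
    for wa, wb, score in cognate_candidates(norwegian, swedish):
        print(f"{wa}\t{wb}\t{score:.2f}")
```

On these toy lists, the sketch pairs "hus" with "hus" (1.00) and "skole" with "skola" (0.80), but misses "vindu" and "fönster", which both mean "window" and are not similar in form. It also compares every pair of words, which does not scale to full corpus vocabularies. A thesis would presumably need to combine such a form-based signal with distributional evidence from the corpora and a more efficient candidate search.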

Besides cognates, borrowings, or loanwords, are common in most languages. While most modern borrowings come from English and are relatively transparent, older borrowings can be difficult to identify, e.g. words of Low German origin in Norwegian, or words of Russian origin in Finnish. A possible thesis topic could consist of automatically identifying borrowings between two given languages.

Published 6 Oct. 2023 10:37 - Last modified 6 Oct. 2023 10:37

