This thesis topic is no longer available

PoS tagging for Norwegian

The newly created Norwegian Dependency Treebank (NDT) contains manually created morphosyntactic analyses of Norwegian Bokmål and Nynorsk sentences. It has recently been used to train syntactic parsers for Norwegian. Before these parsers can be applied to running text, however, the text must first be PoS-tagged. The treebank contains information about parts of speech and morphological properties such as definiteness, gender and number, and may therefore be used to train and evaluate PoS taggers for both Norwegian varieties. To date, the most widely used PoS tagger for Norwegian has been the Oslo-Bergen tagger (OBT), which is rule-based.

This thesis will train and evaluate commonly used statistical PoS taggers on the task of Norwegian PoS tagging. It will examine how the granularity of the categories, i.e. tagsets of different sizes, affects the results, and how these results generalize to different types of data (e.g. news data and blog data). Finally, it will compare these results to those obtained using the rule-based Oslo-Bergen tagger mentioned above.
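To make the granularity question concrete, the following is a minimal sketch of how one tagger could be scored against a fine-grained and a coarse-grained tagset. It is not one of the taggers the thesis would evaluate: it uses a simple most-frequent-tag baseline, and the tag strings (`subst|be`, `verb|pres`, ...), the `|` separator and the toy sentences are made-up assumptions for illustration, not the actual NDT annotation format.

```python
from collections import Counter, defaultdict


def coarsen(tag):
    """Collapse a fine-grained tag to its part of speech.

    Assumes (hypothetically) that morphological features are appended
    after a '|' separator, e.g. 'subst|be' -> 'subst'.
    """
    return tag.split("|", 1)[0]


def with_coarse_tags(sentences):
    """Return a copy of the data in which every tag has been coarsened."""
    return [[(word, coarsen(tag)) for word, tag in sent] for sent in sentences]


def train_unigram(sentences):
    """Learn the most frequent tag per word, plus a fallback for unknown words."""
    per_word = defaultdict(Counter)
    overall = Counter()
    for sent in sentences:
        for word, tag in sent:
            per_word[word.lower()][tag] += 1
            overall[tag] += 1
    model = {w: counts.most_common(1)[0][0] for w, counts in per_word.items()}
    default = overall.most_common(1)[0][0]
    return model, default


def accuracy(model, default, sentences):
    """Token-level accuracy of the baseline on gold-tagged sentences."""
    correct = total = 0
    for sent in sentences:
        for word, gold in sent:
            predicted = model.get(word.lower(), default)
            correct += predicted == gold
            total += 1
    return correct / total


# Toy data with made-up tags; a real experiment would read the treebank.
train_fine = [
    [("hunden", "subst|be"), ("sover", "verb|pres")],
    [("katten", "subst|be"), ("sover", "verb|pres")],
    [("en", "det|ub"), ("hund", "subst|ub"), ("løper", "verb|pres")],
    [("hunden", "subst|be"), ("ser", "verb|pres"), ("katten", "subst|be")],
]
test_fine = [
    [("en", "det|ub"), ("katt", "subst|ub"), ("sover", "verb|pres")],
]

fine_model, fine_default = train_unigram(train_fine)
fine_acc = accuracy(fine_model, fine_default, test_fine)

coarse_model, coarse_default = train_unigram(with_coarse_tags(train_fine))
coarse_acc = accuracy(coarse_model, coarse_default, with_coarse_tags(test_fine))

print(f"fine-grained accuracy:   {fine_acc:.2f}")
print(f"coarse-grained accuracy: {coarse_acc:.2f}")
```

On this toy data the unseen word "katt" is tagged correctly only under the coarse tagset, because the fallback tag no longer has to guess definiteness. The same comparison, repeated with real taggers and with test data from different domains, is the kind of experiment described above.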

Published 14 Sep. 2014 19:09 - Last modified 24 Sep. 2014 12:11


Scope (credits)