Computational linguistic typology

Greenberg's universals of language state empirical facts or strong tendencies of syntactic structure that hold across the languages of the world, either without exception or with overwhelmingly more than chance frequency. Here are a few examples:

  • Universal 17: With overwhelmingly more than chance frequency, languages with dominant order VSO have the adjective after the noun.
  • Universal 18: When the descriptive adjective precedes the noun, the demonstrative and the numeral, with overwhelmingly more than chance frequency, do likewise.
  • Universal 25: If the pronominal object follows the verb, so does the nominal object.

The World Atlas of Language Structures (WALS) identifies almost 200 linguistic features (not only syntactic ones) and records their values for a sample of the world's languages. The sample varies by feature and in some cases exceeds 1000 languages.

Your goal will be to investigate some of these features/universals using syntactically annotated texts. You can use already parsed corpora from the Universal Dependencies project, or annotate your own dataset. It could also be interesting to see how different text genres lead to different distributions of features.
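
As a starting point, word-order features of this kind can be estimated by counting, for a given dependency relation, how often the dependent precedes or follows its head. The following is a minimal sketch in Python, assuming the input is text in the 10-column CoNLL-U format used by Universal Dependencies. The function names (make_row, parse_conllu, dependent_order) and the two-sentence sample are hypothetical and only serve as illustration; amod, det, ADJ and NOUN are standard UD labels.

```python
from collections import Counter


def make_row(idx, form, upos, head, deprel):
    """Build one tab-separated CoNLL-U token line (illustrative helper)."""
    return "\t".join(
        [str(idx), form, "_", upos, "_", "_", str(head), deprel, "_", "_"]
    )


def parse_conllu(text):
    """Yield sentences as lists of (id, upos, head, deprel) tuples."""
    sentence = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue  # sentence-level comment
        cols = line.split("\t")
        # Skip malformed lines, multiword token ranges ("1-2"),
        # empty nodes ("1.1") and tokens without a numeric head.
        if len(cols) != 10 or not cols[0].isdigit() or not cols[6].isdigit():
            continue
        sentence.append((int(cols[0]), cols[3], int(cols[6]), cols[7]))
    if sentence:
        yield sentence


def dependent_order(text, deprel, dep_upos=None, head_upos=None):
    """Count dependents of type `deprel` that precede or follow their head."""
    counts = Counter()
    for sentence in parse_conllu(text):
        upos_by_id = {tok_id: upos for tok_id, upos, _, _ in sentence}
        for tok_id, upos, head, rel in sentence:
            # Compare only the universal part of a relation like "obj:agent".
            if head == 0 or rel.split(":")[0] != deprel:
                continue
            if dep_upos is not None and upos != dep_upos:
                continue
            if head_upos is not None and upos_by_id.get(head) != head_upos:
                continue
            key = "dependent-first" if tok_id < head else "head-first"
            counts[key] += 1
    return counts


# Invented toy data: "the red house" and "la maison rouge".
SAMPLE = "\n".join([
    make_row(1, "the", "DET", 3, "det"),
    make_row(2, "red", "ADJ", 3, "amod"),
    make_row(3, "house", "NOUN", 0, "root"),
    "",
    make_row(1, "la", "DET", 2, "det"),
    make_row(2, "maison", "NOUN", 0, "root"),
    make_row(3, "rouge", "ADJ", 2, "amod"),
    "",
])

if __name__ == "__main__":
    print(dependent_order(SAMPLE, "amod", dep_upos="ADJ", head_upos="NOUN"))
```

On a real treebank, the text of a .conllu file would be passed in place of SAMPLE. For Universal 25, the same function could be called with the obj relation, once with dep_upos="PRON" and once with dep_upos="NOUN", to compare how often pronominal and nominal objects follow the verb. Running it separately on treebanks of different genres would give a first view of genre effects.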

Published Oct. 6, 2023 10:40 - Last modified Oct. 10, 2023 18:10

Supervisor(s)

Scope (credits)

60