This project is no longer available

Dependency Interconversion

Dependency representations have gained enormous popularity for a wide range of NLP tasks in the past decade. Ivanova et al. (2012) review some of the more widely used schemes for representing syntacto-semantic bi-lexical dependencies (for English) and seek to uncover similarities and differences among schemes. However, the focus of their work is on formal and representational aspects, rather than on linguistic content.

For example, even though most schemes for syntactic dependencies include a notion of subject, it is unknown to what degree this dependency type is defined uniformly across schemes, i.e. how often and under which conditions a dependent labeled as a ‘subject’ (in the interpretation of one individual dependency scheme) will be assigned the corresponding dependency type in other schemes. For other dependency types, and also for the choice of heads, establishing linguistic correspondences among the definitions assumed in each scheme is likely to be (potentially much) more intricate.

This project attempts to design, implement, and evaluate automated converters among common representation schemes, most notably the PTB- and PropBank-derived ‘mainstream’ syntactic and semantic representations (as used in various CoNLL Shared Tasks) and the syntactic and semantic dependencies derived from the HPSG analyses of the LinGO English Resource Grammar (ERG). This work has a qualitative (i.e. linguistic) and a quantitative component. For the latter, the recent release (Flickinger et al., 2012) of gold-standard ERG analyses for a large part of the same WSJ text that underlies the PTB makes possible, for the first time, a data-driven approach to aligning dependency triples across frameworks. The results of such quantitative analysis will require interpretation and will likely call for revisions and extensions grounded in linguistic knowledge about specific use patterns (of heads and dependency types) on either side of the conversion.

Concretely, the project could be conducted in three phases, either partly overlapping or sequential. (A) For a selection of pairs of dependency schemes, automatically establish alignments between dependency links (or, possibly, also paths of dependency links) over annotated WSJ data, to gather statistics over common and less common alignment patterns. (B) Design and build a conversion toolkit to automatically map dependency trees (or graphs) in one format into other formats, probably focussing on conversion from the ‘richer’ ERG representations to the ‘more coarse’ CoNLL or Stanford Dependency formats; most likely, this converter will combine some rule-based (i.e. heuristic) techniques with some machine learning. (C) Evaluate the accuracy and robustness of the converter, and use the converter to evaluate parsing results from the ERG against ‘standard’ annotations, i.e. PTB- and PropBank-derived dependencies.
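As a rough illustration of phase (A), the sketch below counts how often a dependency label in one scheme co-occurs with a label in another scheme on the same head-dependent pair. It is only a minimal sketch under strong assumptions: both analyses share the same tokenization, only single links (not paths) are aligned, and each head-dependent pair carries at most one label. The label names, the toy sentence, and the function names `align_labels` and `alignment_statistics` are hypothetical and are not taken from any of the schemes discussed here.

```python
from collections import Counter

# A dependency is a (head_index, dependent_index, label) triple; index 0 is the root.
# Hypothetical toy analyses of "The dog barks" in two made-up label inventories.
scheme_a = [(2, 1, "det"), (3, 2, "nsubj"), (0, 3, "root")]
scheme_b = [(2, 1, "SPEC"), (3, 2, "ARG1"), (0, 3, "ROOT")]


def align_labels(deps_a, deps_b):
    """Count label pairs for dependencies that share the same head and dependent."""
    labels_b = {(head, dep): label for head, dep, label in deps_b}
    counts = Counter()
    for head, dep, label_a in deps_a:
        # None marks a link with no identical head-dependent pair in scheme B,
        # for example because the two schemes choose different heads.
        label_b = labels_b.get((head, dep))
        counts[(label_a, label_b)] += 1
    return counts


def alignment_statistics(corpus_a, corpus_b):
    """Aggregate label alignment counts over parallel lists of sentences."""
    total = Counter()
    for deps_a, deps_b in zip(corpus_a, corpus_b):
        total.update(align_labels(deps_a, deps_b))
    return total


stats = alignment_statistics([scheme_a], [scheme_b])
for (label_a, label_b), n in stats.most_common():
    print(label_a, "->", label_b, n)
```

Frequent pairs in such a table would be candidates for simple relabeling rules in phase (B), while pairs with `None` on one side point to differences in head choice that would need path-based alignment or linguistic analysis.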

The project requires a combination of skills and interests, including the ability to automatically analyse existing dependency data sets (in various formats), scripting of alignment and conversion procedures, basic statistical analysis, and linguistic interpretation of core syntactic and semantic dependency analyses. Depending on individual expertise and expectations, these aspects can be foregrounded or backgrounded to various degrees; please discuss possible refinements of the project with Stephan Oepen or Lilja Øvrelid.

Published Jan. 3, 2013 22:35 - Last modified Oct. 25, 2019 12:18

Supervisor(s)

Scope (credits)

60