Unification-based grammatical frameworks (such as HPSG or LFG) have gained popularity in Natural Language Processing because they enable a linguistically adequate account of the rule systems of natural languages. However, efficient parsing and generation with large unification-based grammars remains an active field of research, owing in part to the complexity of the descriptive formalism. At the same time, many existing unification-based grammars (UBGs) capture languages that are context-free or ‘nearly context-free’.
This project will adapt, implement, and experimentally validate methods for converting UBGs into context-free grammars (CFGs). The aim is to produce either an equivalent grammar or one that approximates the original UBG to a certain degree, i.e. one that recognizes a superset of the original language and thus may also accept utterances that are ungrammatical according to the UBG. Finding the optimal degree of approximation is a balancing act, as including more information from the original UBG in the approximation can lead to exponential growth in the size of the CFG. At the same time, even a relatively crude CFG approximation may have practical value, for example as a ‘filter’ on full unification-based parsing (for improved efficiency), or as the backbone of probabilistic disambiguation and pruning (for better disambiguation).
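The trade-off between an equivalent and an approximating CFG can be illustrated on a toy scale. The sketch below is not the algorithm of Kiefer & Krieger (2004); it is a deliberately simplified illustration that assumes a single finite-valued agreement feature, a hand-written six-word grammar, and hypothetical helper names (to_cfg, recognizes). Enumerating the feature values yields an equivalent CFG, while dropping the feature yields a smaller CFG that accepts a superset of the original language.

```python
from functools import lru_cache
from itertools import product

NUM = ("sg", "pl")  # value set of the single toy feature (an assumption)

# Toy unification-style rules. A nonterminal is (category, feature), where the
# feature is a variable ("N"), a constant ("sg"/"pl"), or None. A terminal is
# a plain string. Reusing "N" within a rule expresses agreement.
UBG = [
    (("S", None), [("NP", "N"), ("VP", "N")]),
    (("NP", "N"), [("Det", "N"), ("Noun", "N")]),
    (("VP", "N"), [("Verb", "N")]),
    (("Det", "sg"), ["this"]), (("Det", "pl"), ["these"]),
    (("Noun", "sg"), ["dog"]), (("Noun", "pl"), ["dogs"]),
    (("Verb", "sg"), ["barks"]), (("Verb", "pl"), ["bark"]),
]

def atom(sym, binding, keep_features):
    """Map a toy UBG symbol to an atomic CFG symbol."""
    if isinstance(sym, str):
        return sym  # terminal
    cat, feat = sym
    if feat is None or not keep_features:
        return cat  # feature absent, or dropped by the approximation
    return f"{cat}_{binding.get(feat, feat)}"

def to_cfg(ubg, keep_features):
    """Expand every rule over all bindings of its feature variables."""
    cfg = set()
    for lhs, rhs in ubg:
        variables = sorted({s[1] for s in [lhs, *rhs]
                            if not isinstance(s, str)
                            and s[1] is not None and s[1] not in NUM})
        for values in product(NUM, repeat=len(variables)):
            binding = dict(zip(variables, values))
            cfg.add((atom(lhs, binding, keep_features),
                     tuple(atom(s, binding, keep_features) for s in rhs)))
    return cfg

def recognizes(cfg, words, start="S"):
    """Memoized top-down recognizer; assumes no empty rules or unary cycles."""
    words = tuple(words)
    by_lhs = {}
    for lhs, rhs in cfg:
        by_lhs.setdefault(lhs, []).append(rhs)

    @lru_cache(maxsize=None)
    def spans(sym, i, j):
        # True if sym derives words[i:j].
        if sym not in by_lhs:  # terminal symbol
            return j == i + 1 and words[i] == sym
        return any(seq(rhs, i, j) for rhs in by_lhs[sym])

    @lru_cache(maxsize=None)
    def seq(rhs, i, j):
        # True if the symbol sequence rhs derives words[i:j].
        if len(rhs) == 1:
            return spans(rhs[0], i, j)
        return any(spans(rhs[0], i, k) and seq(rhs[1:], k, j)
                   for k in range(i + 1, j))

    return spans(start, 0, len(words))

exact = to_cfg(UBG, keep_features=True)    # equivalent CFG
approx = to_cfg(UBG, keep_features=False)  # approximation: feature dropped

print(len(exact), len(approx))
for sentence in ("this dog barks", "these dog bark", "dog this barks"):
    toks = sentence.split()
    print(sentence, recognizes(exact, toks), recognizes(approx, toks))
```

In this toy setting the approximation accepts "these dog bark", which the agreement feature rules out, but it still rejects "dog this barks". A string rejected by the approximation can never be accepted by the original grammar, which is what makes such a CFG usable as a cheap pre-filter. With a single binary feature the equivalent CFG is only slightly larger; with several interacting features the enumeration multiplies out, which is the growth problem described above.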
This work will take the algorithm of Kiefer & Krieger (2004) and the LinGO English Resource Grammar (ERG) as its points of departure. The project requires a good understanding of unification-based grammar and solid programming experience; some knowledge of English syntax is an advantage. Possible implementation languages are C++ (preferred) and Common Lisp. Please contact Stephan Oepen for details.