This project is no longer available

Re-Ranking Candidate Translations

The LOGON Machine Translation system combines grammar-based parsing (of Norwegian), semantic transfer, and tactical generation (in English) to produce candidate translations of Norwegian inputs. The system often hypothesizes large numbers of possible translations, all systematically related to the input but some far less likely than others. Consider, for example, the following candidate translations for the Norwegian sentence "Den andre bratte veien mot Bergen er kort": (1) "The other steep path towards Bergen is short", (2) "The second steep road towards Bergen is short", (3) "The other steep path against Bergen is short", (4) "The other steep path towards Bergen is a card", and so on.

The LOGON backbone already incorporates stochastic components (e.g. parse selection and realization ranking) to "guide" the MT system through the search space of possible translations and, ultimately, to rank candidates according to their probability. This project aims to improve the ranking of translations as a post-processing step to LOGON. The project will leverage parallel Norwegian–English corpora and off-the-shelf SMT technology to determine the translation probability of candidate LOGON outputs. SMT scores and a range of LOGON-internal properties can be combined in a discriminative re-ranking model, in the spirit of Och and Ney (2002) and Och et al. (2004). The project assumes basic familiarity with rule-based as well as statistical approaches to MT and with MT evaluation metrics (BLEU and NIST, for example), together with a general interest in machine learning approaches to natural language processing. The project will have a substantial experimental component. Please contact Stephan Oepen to discuss this project further.
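To make the re-ranking idea concrete, the following is a minimal sketch of a log-linear re-ranker, the general form commonly used in discriminative re-ranking for MT: each candidate gets a score that is a weighted sum of feature values, and candidates are sorted by that score. It is not LOGON code. The feature names (`logon_score`, `smt_logprob`), the feature values, and the weights are invented purely for illustration; in an actual experiment the features would come from LOGON and an SMT system, and the weights would be tuned on held-out data.

```python
from typing import Dict, List, Tuple

Features = Dict[str, float]
Candidate = Tuple[str, Features]


def loglinear_score(features: Features, weights: Dict[str, float]) -> float:
    """Weighted sum of feature values; features without a weight count as 0."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def rerank(candidates: List[Candidate], weights: Dict[str, float]) -> List[Candidate]:
    """Return the candidates sorted from highest to lowest log-linear score."""
    return sorted(
        candidates,
        key=lambda cand: loglinear_score(cand[1], weights),
        reverse=True,
    )


# Hypothetical feature values (log-scale scores); NOT real system output.
CANDIDATES: List[Candidate] = [
    ("The other steep path against Bergen is short",
     {"logon_score": -0.8, "smt_logprob": -4.5}),
    ("The other steep path towards Bergen is short",
     {"logon_score": -1.0, "smt_logprob": -2.0}),
    ("The other steep path towards Bergen is a card",
     {"logon_score": -1.2, "smt_logprob": -6.0}),
]

# Hypothetical weights; in practice these would be learned, not hand-set.
WEIGHTS = {"logon_score": 1.0, "smt_logprob": 0.5}

# Ranking with the LOGON-internal feature only, then with both features.
baseline = rerank(CANDIDATES, {"logon_score": 1.0})
ranked = rerank(CANDIDATES, WEIGHTS)

if __name__ == "__main__":
    for text, feats in ranked:
        print(f"{loglinear_score(feats, WEIGHTS):6.2f}  {text}")
```

With these made-up numbers, the LOGON-only ranking puts the "against Bergen" candidate first, while adding the SMT feature moves the "towards Bergen is short" candidate to the top. This is the kind of effect the project would need to verify experimentally, for example with BLEU or NIST on a test set.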

Published 14 March 2011 11:27 - Last modified 25 Jan. 2016 12:50

Supervisor(s)

Student(s)

  • André Lynum

Scope (credits)

60