Automated essay scoring is the task of estimating the grade of an essay written by a second language learner. A related task, Native Language Identification, is the task of predicting the learner's native language from their written essays. Both tasks typically use machine learning methods such as SVM classifiers based on lexical and syntactic features. However, so far there has been no effort to jointly predict both native language and CEFR proficiency from the same essay.
This project requires familiarity with traditional machine learning techniques such as SVMs; knowledge of neural networks (e.g., from INF5860) is an advantage. The project will work with the TOEFL11 corpus and will investigate both SVM and neural architectures to jointly predict the grade and native language of written essays.
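
As a rough illustration of what joint prediction could look like on the SVM side, the sketch below trains one linear SVM per output (native language and proficiency) on shared character n-gram features using scikit-learn. It is only a starting point under stated assumptions: the essays and labels are invented toy data, not TOEFL11 material, and the feature choice and the use of `MultiOutputClassifier` are assumptions rather than the method the project prescribes.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.multioutput import MultiOutputClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Hypothetical toy essays (NOT taken from TOEFL11).
essays = [
    "I am agree that the students must to learn many facts.",
    "In my opinion, understanding concepts is more important than facts.",
    "I am thinking that the young peoples enjoy life more.",
    "Young people arguably enjoy life more than older people do.",
]

# One row per essay: [native language, proficiency]. Placeholder labels.
labels = np.array([
    ["DEU", "low"],
    ["DEU", "high"],
    ["HIN", "low"],
    ["HIN", "high"],
])

# Shared lexical features (character 1-3 grams), then one LinearSVC per output.
model = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(1, 3)),
    MultiOutputClassifier(LinearSVC()),
)
model.fit(essays, labels)

# Each prediction is a pair: (native language, proficiency).
pred = model.predict(["I am agree that the peoples must to travel."])
print(pred)
```

Note that `MultiOutputClassifier` fits the two classifiers independently on the same features, so it is a baseline rather than a truly joint model. A neural architecture with a shared encoder and two output layers would be one way to let the two tasks inform each other.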