Sentiment Analysis for Technical Discussion

This project specifically focuses on the IETF mailing list, which is the main channel of communication for all participants in the drafting of Internet protocols.

Sentiment analysis, or opinion mining, is a widely studied field in NLP. Its techniques have been applied to service reviews (hotels, restaurants, airlines), product reviews, and stock news. Among these varied datasets, we are especially interested in the opinions and thoughts of technicians and other professionals on design issues and topics, because their opinions can influence the final decisions on publishing new protocols or new versions of existing ones. Compared with sentiment analysis of social media and blogs, its application to technical forums is still at an early stage. A few studies analyze open source and developer forums, e.g., the Apache open source mailing lists, Stack Overflow discussions, mobile app reviews, and JIRA issue comments (Sentiment Analysis for SE). However, no dataset for identifying the sentiment expressed on the IETF mailing list is available yet.

Our first question is what kinds of sentiment can be observed on the IETF mailing list, so one contribution of this project is to create a dataset, e.g., by sampling emails from the mailing list, extracting sentiment expressions, and annotating them manually. Like software developers, IETF participants have their own ways of conveying opinions and expressing emotions. The table below shows examples of sentiment from the Apache open source mailing lists, where six categories of positive and four categories of negative sentiment were distinguished.

Table: sentiment examples from "Monitoring Sentiment in Open Source Mailing Lists"
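The sampling and extraction steps of the dataset creation described above could look roughly like the following minimal sketch. It assumes the list archive has been exported to messages that Python's standard `email`/`mailbox` modules can read. The function names, the naive sentence splitter, and the rule of dropping `>`-quoted reply lines (so each sentence is attributed to the person who wrote it) are illustrative assumptions, not a procedure fixed by this project. The `label` field is left empty for a human annotator.

```python
import random
import re
from email.message import Message


def message_text(msg: Message) -> str:
    """Return the first text/plain body of a message as a string."""
    for part in msg.walk():  # walk() also yields msg itself for single-part mail
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            try:
                return payload.decode(charset, errors="replace")
            except LookupError:  # unknown charset name in the header
                return payload.decode("utf-8", errors="replace")
    return ""


def candidate_sentences(text: str) -> list[str]:
    """Split an email body into sentences, skipping quoted reply lines (assumed rule)."""
    own_lines = [ln.strip() for ln in text.splitlines()
                 if ln.strip() and not ln.lstrip().startswith(">")]
    joined = " ".join(own_lines)
    # Naive split after ., ! or ? followed by whitespace; a real pipeline
    # would likely use a proper sentence tokenizer.
    return [s for s in re.split(r"(?<=[.!?])\s+", joined) if s]


def sample_for_annotation(messages: list[Message], k: int, seed: int = 0) -> list[dict]:
    """Randomly pick up to k messages and emit one unlabeled row per sentence."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    chosen = rng.sample(messages, min(k, len(messages)))
    rows = []
    for msg in chosen:
        for sentence in candidate_sentences(message_text(msg)):
            rows.append({
                "message_id": msg.get("Message-ID", ""),
                "sentence": sentence,
                "label": None,  # to be filled in by a human annotator
            })
    return rows
```

For example, `sample_for_annotation(list(mailbox.mbox("archive.mbox")), k=200)` would draw 200 messages from a hypothetical local mbox export named `archive.mbox`.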

Next, we aim to determine whether existing NLP methods, including lexicon-based analysis, machine learning, deep learning language models (e.g., BERT), hybrid approaches, and other techniques such as transfer learning, are effective at detecting sentiment in such data.
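To make the simplest of these methods concrete, here is a minimal lexicon-based sketch: a sentence's score is the sum of word weights from a lexicon, with a word's polarity flipped when a negator appears within a few tokens before it. The tiny lexicon, the negator list, the window size, and the function names are hypothetical placeholders for illustration only, not a resource used in this project; a real study would start from an established lexicon, and a learned model (e.g., a fine-tuned BERT classifier) would replace `lexicon_score` while being evaluated against the same manually annotated labels.

```python
import re

# Hypothetical toy lexicon: word -> polarity weight (illustrative values only).
LEXICON = {
    "good": 1.0, "great": 2.0, "agree": 1.0, "support": 1.0, "thanks": 1.0,
    "bad": -1.0, "broken": -2.0, "disagree": -1.0, "object": -1.0, "confusing": -1.0,
}
# Hypothetical negator list; straight apostrophes only.
NEGATORS = {"not", "no", "never", "don't", "doesn't", "isn't", "can't"}


def lexicon_score(sentence: str, window: int = 3) -> float:
    """Sum lexicon weights, flipping a word's sign if a negator precedes it."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = 0.0
    for i, tok in enumerate(tokens):
        weight = LEXICON.get(tok)
        if weight is None:
            continue
        # Look back at most `window` tokens for a negator.
        if any(t in NEGATORS for t in tokens[max(0, i - window):i]):
            weight = -weight
        score += weight
    return score


def label(sentence: str) -> str:
    """Map the numeric score to a coarse three-way sentiment label."""
    s = lexicon_score(sentence)
    if s > 0:
        return "positive"
    if s < 0:
        return "negative"
    return "neutral"
```

Words such as "broken" or "object" can be neutral technical vocabulary in protocol discussions, so a general-purpose lexicon may misfire on IETF text; an annotated dataset is what would allow that to be measured.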

Prerequisites: interest in sentiment analysis of open source mailing lists; machine learning for NLP (equivalent to IN5550).

