Text Clustering for Thematic Analysis in the Social Sciences

An important foundation of qualitative analysis in the social sciences is the identification of recurring themes in collections of text such as interview transcripts, diaries and, more recently, blog posts and online forums. Preparing a thematic analysis is a time-consuming process: the researcher first applies labels representing the authors' beliefs bottom-up, then groups these labels into themes and determines an ontological structure.

This project will investigate how language technology can help researchers label text more efficiently. For instance, texts can be represented in a vector-space model and clustered in order to find related expressions. These clusters of related text can then be presented to the researcher for refinement and labeling.
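
As a rough illustration of the clustering idea, the sketch below turns short text snippets into TF-IDF vectors and groups them with a simple single-pass "leader" clustering based on cosine similarity. It is written in Python for brevity, although the project itself may well use Java. The function names, the smoothed IDF weighting, the similarity threshold and the example snippets are all assumptions made for this illustration; the project does not prescribe any particular model or algorithm.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (term -> weight dict) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        tf = Counter(tokens)
        # Smoothed IDF (an illustrative choice) keeps every weight positive.
        vectors.append(
            {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0) for t, c in tf.items()}
        )
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def leader_cluster(vectors, threshold=0.3):
    """Single-pass clustering: each vector joins the most similar existing
    cluster (compared against that cluster's first member) if the similarity
    reaches the threshold; otherwise it starts a new cluster.
    Returns a list of clusters, each a list of document indices."""
    clusters = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, vectors[cluster[0]])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


if __name__ == "__main__":
    # Hypothetical snippets standing in for interview or forum text.
    snippets = [
        "I felt isolated after moving to the city",
        "Moving to the city left me feeling isolated",
        "Rent is so expensive that I cannot save money",
    ]
    for cluster in leader_cluster(tfidf_vectors(snippets)):
        print([snippets[i] for i in cluster])
```

In a tool built along these lines, each printed group would be shown to the researcher as a candidate for a shared label. A real system would probably need stop-word handling, a more robust clustering method and a way to fold the researcher's corrections back into the clusters.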

Prerequisites: Programming skills (preferably but not necessarily Java), familiarity with basic issues in natural language processing and an interest in developing software for text clustering.

