Scientific Background

Parsing human language is the process of analyzing its grammatical structure, i.e. working out ‘who did what to whom’. Arguably, parsing is a mandatory step towards interpretation, and it has been among the most central techniques in the relatively short history of human language technology.

Comparing approaches to parsing

Mirroring developments in other fields, notably Artificial Intelligence, approaches to parsing human language were originally anchored in formal grammars, i.e. hand-built systems of linguistic rules. Due to the cost of development and the computational complexity involved, rule-based parsers typically do not scale easily to free text. Increased adoption of statistical methods and the availability of large corpora of text annotated with grammatical structure since the mid-1990s led to a surge of parsers trained automatically on annotated corpora, notably the Penn Treebank (PTB; Marcus, Santorini, & Marcinkiewicz, 1993). Such statistical parsers can achieve good results on texts similar to the training material, but they are often difficult to adapt to different types of input (Gildea, 2001). The figure above gives a schematic view of this evolution along two dimensions, suggesting a classic trade-off: the ability to successfully process unrestricted inputs comes at the expense of output precision, by which we refer to both the granularity (or amount) of information available and the average correctness of outputs.

Emphasizing the vertical dimension of comparison, parsing approaches are often classified as ‘deep’ vs. ‘shallow’. These terms, however ubiquitous in the scientific discourse, remain at best poorly defined. A prototypical ‘shallow’ parser will not explicate the difference in the use of the preposition ‘in’ in examples like Cisco is interested in Tandberg vs. Tandberg is headquartered in Oslo. Conversely, a ‘deep’ parser would be expected to recover the underlying relations in a sentence like Cisco, CNN reports, sets out to acquire Tandberg in December. Here, the network maker is the agent of both the aiming (the semantics of the multi-word set out) and the acquiring. The arrows in the figure above symbolize recent and ongoing developments, with contemporary R&D now seeking to combine the strengths of both worlds: hybrid paradigms combine in-depth linguistic analysis with sophisticated statistical methods. A third dimension of variation is the cost of computation. More in-depth analysis and higher granularity of output information make the parsing problem harder, typically enlarging the search space of candidate solutions. With per-sentence parsing times varying between several milliseconds and a few seconds, algorithmic efficiency remains a technological barrier to the large-scale deployment of deeper analysis.
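As a rough, purely illustrative contrast, the two levels of analysis might look as follows for the examples just discussed. The representations below are invented simplifications and do not correspond to the output format of any particular parser:

```python
# Invented, simplified representations; no actual parser emits exactly these.

# A 'shallow' analysis (here: part-of-speech tags) treats both uses of
# the preposition "in" identically.
shallow_1 = [("Cisco", "NNP"), ("is", "VBZ"), ("interested", "JJ"),
             ("in", "IN"), ("Tandberg", "NNP")]
shallow_2 = [("Tandberg", "NNP"), ("is", "VBZ"), ("headquartered", "VBN"),
             ("in", "IN"), ("Oslo", "NNP")]

# A 'deep' analysis of "Cisco, CNN reports, sets out to acquire Tandberg
# in December" recovers normalized predicate-argument relations, with
# Cisco as the agent of both the aiming and the acquiring.
deep = [
    ("report",  {"agent": "CNN",   "content": "set-out"}),
    ("set-out", {"agent": "Cisco", "goal": "acquire"}),
    ("acquire", {"agent": "Cisco", "theme": "Tandberg", "time": "December"}),
]
```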

There is a wide range of user-centric HLT applications over Web content that motivate R&D towards semantic parsing as a general-purpose technology. Areas such as entity and relation extraction, ontology learning, social network analysis, and others already experience limitations imposed by the lack of reliable, large-scale analysis of grammatical structure. Returning to our application vision of opinion mining over user-generated content, parsing has been demonstrated to significantly improve sentiment analysis (Choi, Breck, & Cardie, 2006). A sentiment is a relation in which a source holds an evaluative judgment regarding a target. Sentiment analysis is traditionally applied at the document level, e.g. determining whether a product review is broadly positive or negative.
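A minimal sketch of this definition as a data structure (the class and field names are our own, introduced only for illustration, and are not drawn from any cited work):

```python
from dataclasses import dataclass

@dataclass
class Sentiment:
    """An evaluative judgment held by a source regarding a target.

    Field names are illustrative only.
    """
    source: str      # the opinion holder
    target: str      # the entity under evaluation
    polarity: str    # e.g. "positive" or "negative"
    strength: float  # e.g. a value in [0, 1]

# Document-level sentiment analysis collapses a whole text into one
# such judgment, e.g. a broadly positive product review:
review = Sentiment(source="reviewer", target="product",
                   polarity="positive", strength=0.8)
```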

Recently, however, the recognition of sentiment at the level of individual utterances (typically a one-sentence statement) has received more attention. This involves subtasks such as the recognition of opinion holders (entities), determining the strength and polarity of opinion expressions, and the joint extraction of opinion entities and their relations (Choi et al., 2006). It has been shown that the use of relational linguistic information is particularly important in expression-level sentiment analysis (Bunescu & Mooney, 2005; Choi et al., 2006; inter alia), and more generally that structural choices directly influence the perceived sentiment of an utterance (Greene & Resnik, 2009). Standard ‘bag-of-words’ approaches in information retrieval, however, are agnostic to structural relations in human language and essentially use identical representations for the sentences Cisco acquired Tandberg vs. Tandberg acquired Cisco (see the sketch after the examples below). Consider two examples from technology blogs:

  1. Google isn’t universally loved by newspaper execs and other content creators.
  2. Despite the overall attractiveness of the iPhone, it lacks too many vital features to replace the BlackBerry as the corporate weapon of choice.
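Before examining these examples in detail, the following minimal Python sketch demonstrates the ‘bag-of-words’ point above: once word order is discarded, the two Cisco/Tandberg sentences become indistinguishable.

```python
from collections import Counter

def bag_of_words(sentence: str) -> Counter:
    """Order-insensitive token counts: all structural relations are discarded."""
    return Counter(sentence.lower().split())

# The two sentences report opposite events, yet receive identical
# bag-of-words representations.
assert bag_of_words("Cisco acquired Tandberg") == \
       bag_of_words("Tandberg acquired Cisco")
```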

Example 1 above invokes the so-called passive construction (which is frequent in many languages), where the agent phrase newspaper execs and other content creators is not expressed as the syntactic subject (which in English tends to occur sentence-initially), but rather as an argument of the preposition by. The semantic ‘object’ of the love relation (i.e. Google), on the other hand, occupies the position of the syntactic subject. Furthermore, the negation expressed by the contracted auxiliary isn’t crucially takes scope over the complete sentiment expressed by the main verb and its arguments. Example 2 above, in contrast, involves a comparison of two smartphones and relies on a contrasting relation (provided by despite) between two independent statements. Without working out the grammatical structure of the utterance as a whole, including resolution of the pronoun it as referring to the iPhone and recognition of the inherent negation in words like lack, it would be impossible to say which device is favored. Such structural relations are captured by semantic analysis.
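To make this concrete, a hypothetical semantic analysis of example 1 could be rendered as a set of normalized relations along the following lines (again, the representation is invented for illustration):

```python
# Hypothetical semantic analysis of example 1; the representation is
# invented for illustration and is not the output of any actual parser.
analysis = {
    "predicate": "love",
    "agent": "newspaper execs and other content creators",  # from the by-phrase
    "theme": "Google",        # the syntactic subject of the passive
    "negated": True,          # scope of the contracted auxiliary "isn't"
    "modifier": "universally",
}

# An expression-level sentiment relation then falls out directly:
sentiment = {
    "source": analysis["agent"],
    "target": analysis["theme"],
    "polarity": "negative" if analysis["negated"] else "positive",
}
```

Note how the sentiment source and target fall out of the agent and theme relations, while the polarity hinges on correctly scoping the negation.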

There is currently insufficient knowledge for an empirical comparison of competing approaches. Recent parsing R&D for English has focused almost exclusively on a subset of the syntactic information available in the PTB and on its narrow subject matter: Wall Street Journal articles from the late 1980s. Extant parser evaluation metrics (some of which have been used widely) thus fail to provide insights into what it means for linguistic analysis to be more or less ‘deep’, and distinct approaches offer seemingly comparable performance on this limited task (Miyao, Sagae, & Tsujii, 2007). In contrast, user-generated Web content presents very different challenges, stemming in part from informal and more creative language use, and in part from ‘noise’ introduced in electronic communication. Likewise, no studies exist that systematically correlate linguistic adequacy with non-linguistic properties, for example robustness to diverse inputs, processing efficiency, and adaptability across domains.

Scalable (semantic) parsing technology is too expensive for any single player, corporate or academic, to develop alone. The interdisciplinary nature of the task, combining linguistics and computer science (and, in principle at least, also cognitive psychology), calls for the incremental and sustained development of formal theory, computational grammars, and annotated corpora, as well as carefully implemented technology. Therefore, a long-term perspective, collaborative development, a focus on task-, genre-, and domain-adaptable approaches, and the exchange and reuse of knowledge and resources are widely viewed as prerequisites to broader use of parsing in next-generation ICT solutions.

By Stephan Oepen, Lilja Øvrelid
Published Sep. 27, 2012 8:57 AM - Last modified Sep. 27, 2012 1:54 PM