Generative Lexicon Theory
The focus of research in Generative Lexicon Theory is the computational and cognitive modeling of natural language meaning: more specifically, how words and their meanings combine to make meaningful texts. This research has centered on developing a lexically oriented theory of semantics, grounded in a methodology drawn from formal and computational semantics. That is, we are looking at how word meaning in natural language might be characterized both formally and computationally, in order to account for the subtle use of words in different sentences as well as the creative use of words in novel contexts. One of the major goals of our current research, therefore, is to study polysemy, ambiguity, and sense-shifting phenomena in different languages.
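One way to make this concrete is the qualia structure at the heart of Generative Lexicon Theory, which attributes to each noun a formal, constitutive, telic, and agentive role. The sketch below is a minimal, illustrative encoding (the class and predicate names are our own, not a standard implementation); it shows how a predicate like "finish" can coerce a dot object such as "book" to the event in its telic role, so that "finish the book" is read as "finish reading the book".

```python
from dataclasses import dataclass

@dataclass
class Qualia:
    """The four qualia roles of Generative Lexicon Theory."""
    formal: str        # what kind of thing it is
    constitutive: str  # what it is made of
    telic: str         # its purpose or function
    agentive: str      # how it comes into being

# Illustrative entry for "book", a classic dot object
# (physical_object . information) in GL analyses.
book = Qualia(
    formal="physical_object . information",
    constitutive="pages, text",
    telic="read",
    agentive="write",
)

def coerced_sense(noun: Qualia, predicate: str) -> str:
    """Toy type coercion: aspectual predicates select the event
    in the noun's telic role; otherwise fall back to the formal role."""
    if predicate in {"begin", "finish", "enjoy"}:
        return noun.telic
    return noun.formal

print(coerced_sense(book, "finish"))  # -> read
```

This is of course a caricature of the theory, but it illustrates the kind of sense shifting the research aims to model compositionally rather than by enumerating senses.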
Communicating with Computers
The Communicating with Computers (CwC) program aims to reimagine computers not merely as tools, but as collaborators working toward common goals. To that end, we are exploring how ideas can be conveyed between humans and computers using various communicative modalities, such as language, gesture, visualization, and action, and how simple ideas can be composed into more complex ones and interpreted in context. The core of our CwC work is the modeling of composable object and event semantics in a multimodal, real-time simulation environment that serves as the semantic common ground between a human and a computer. This environment and its multimodal semantics support a number of shared tasks, including collaborative structure building, curating biological databases, and the composition of stories and music. We use multimodal simulation as the scaffold for automatically learning object and event properties, with the goal of interacting with robots and semi-autonomous systems.
Language Application Grid
The Language Application (LAPPS) Grid is an open web service platform for natural language processing (NLP) research and development. Together with researchers at Vassar College, Carnegie Mellon University, and the Linguistic Data Consortium at the University of Pennsylvania, we are working toward interoperability among language resources and NLP tools. Specifically, we are developing standards for the interchange of linguistic objects among tools with different input and output formats, building evaluation tools based on the Open Advancement framework, and creating an easy-to-use interface that lets users combine NLP tools from various sources into their own customized pipelines.
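The key idea is that heterogeneous tools become composable once each one reads and writes a shared annotation document. The sketch below is loosely modeled on that idea (the dictionary layout and service functions are illustrative, not the actual LAPPS interchange format or API): each "service" consumes the common document, appends its annotations, and passes the result on.

```python
def tokenize(doc):
    """Wraps a tokenizer: reads the text, appends Token annotations
    with character offsets into the shared document."""
    start = 0
    for word in doc["text"].split():
        begin = doc["text"].index(word, start)
        doc["annotations"].append(
            {"@type": "Token", "start": begin, "end": begin + len(word)}
        )
        start = begin + len(word)
    return doc

def count_tokens(doc):
    """A downstream service that consumes the tokenizer's output."""
    n = sum(a["@type"] == "Token" for a in doc["annotations"])
    doc["annotations"].append({"@type": "TokenCount", "count": n})
    return doc

def run_pipeline(text, services):
    """Chain services over one shared interchange document."""
    doc = {"text": text, "annotations": []}
    for service in services:
        doc = service(doc)
    return doc

result = run_pipeline("The LAPPS Grid composes tools .",
                      [tokenize, count_tokens])
print(result["annotations"][-1])  # -> {'@type': 'TokenCount', 'count': 6}
```

Because every service speaks the same document format, tools from different providers can be reordered or swapped without rewriting format converters for each pair, which is the interoperability goal described above.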
Integrating the Generative Lexicon into VerbNet
VerbNet is a comprehensive lexicon of English verbs that categorizes verbs according to the syntactic structures they allow and the semantic restrictions they place on their arguments. It also contains a basic event semantics in which events are divided into a beginning, middle, and end. We are currently augmenting the VerbNet representations of verbs with event structures from Generative Lexicon Theory. Specifically, we propose a compositional model in which an event consists of some number of subevents, each of which can itself be a predicate over the verb's arguments. We believe that more detailed models of subevent structure can help distinguish the meanings of verbs within the same VerbNet class, and help explain the semantics of verb polysemy in different contexts.
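To make the proposal concrete, the sketch below encodes one such decomposition. It is a schematic illustration, not VerbNet's actual predicate inventory: causative "break" ("Mary broke the window") is represented as an activity subevent, a transition, and a result state, each a predicate over the verb's arguments.

```python
from dataclasses import dataclass

@dataclass
class Subevent:
    index: str        # subevent label, e.g. "e1"
    predicate: str    # semantic predicate holding at this subevent
    args: tuple       # verb arguments the predicate applies to

@dataclass
class EventStructure:
    verb: str
    subevents: list

# Illustrative GL-style decomposition of causative "break"
# (predicate names are schematic, not VerbNet's actual ones).
break_causative = EventStructure("break", [
    Subevent("e1", "do", ("Agent", "Patient")),       # causing activity
    Subevent("e2", "change_of_state", ("Patient",)),  # transition
    Subevent("e3", "broken", ("Patient",)),           # result state
])

# The inchoative "The window broke" shares the transition and result
# but lacks the causing activity e1 -- one way subevent structure can
# distinguish related senses of the same verb.
break_inchoative = EventStructure("break", break_causative.subevents[1:])
```

Representing each reading as a different configuration of shared subevents is what lets the model treat such alternations compositionally rather than as unrelated senses.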
ISO-TimeML: Temporal Markup Language
TimeML is a robust specification language for events and temporal expressions in natural language. It is designed to address four problems in event and temporal expression markup: time-stamping events, ordering events with respect to one another, reasoning with contextually underspecified temporal expressions, and reasoning about the persistence of events. TimeML was developed in the context of three AQUAINT workshops and projects. The 2002 TERQAS workshop set out to enhance natural language question answering systems to answer temporally based questions about the events and entities in news articles; there, the first version of TimeML was defined and the TimeBank corpus was created as an illustration. TANGO was a follow-up workshop in which a graphical annotation tool was developed. The TARSQI project developed algorithms that tag events and time expressions in natural language texts and temporally anchor and order the events.
The TARSQI project allowed developers and analysts to sort and organize information in natural language texts based on its temporal characteristics. Specifically, we developed algorithms that tag mentions of events, tag and normalize time expressions, and temporally anchor and order the events. We also developed temporal reasoning algorithms that operate on the resulting event-time graph for each document. These include a graph query capability that will, for example, find when a particular event occurs or which events occur in a given time period; a temporal closure algorithm that allows more complete coverage of queries, by using the transitivity of temporal precedence and inclusion relations to insert additional links into the graph; and a timelining algorithm that provides chronological views, at various granularities, of an event graph as a whole or of a region of it. We also developed a capability to compare event graphs across documents, as well as a model of the typical durations of various kinds of events.
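The closure step can be sketched in a few lines. This is a deliberately simplified version, assuming only two transitive relations (BEFORE and INCLUDES) rather than the richer relation algebra and composition table a TARSQI-style reasoner works over:

```python
from itertools import product

def temporal_closure(links):
    """Saturate a set of (event, relation, event) links under the
    transitivity of BEFORE and INCLUDES, inserting the implied links."""
    links = set(links)
    changed = True
    while changed:
        changed = False
        # Snapshot the current links, then try composing every pair.
        for (a, r1, b), (c, r2, d) in product(tuple(links), repeat=2):
            if b == c and r1 == r2 and r1 in {"BEFORE", "INCLUDES"} and a != d:
                if (a, r1, d) not in links:
                    links.add((a, r1, d))
                    changed = True
    return links

graph = {("e1", "BEFORE", "e2"),
         ("e2", "BEFORE", "e3"),
         ("e3", "BEFORE", "e4")}
closed = temporal_closure(graph)
# ("e1", "BEFORE", "e3") and ("e1", "BEFORE", "e4") are now in the graph,
# so a query like "which events precede e4?" gets complete coverage.
```

The added links are what let a graph query answer questions the annotator never stated explicitly, which is exactly the coverage gain described above.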
ISO-Space: Spatiotemporal Reasoning
The goal of the ISO-Space research is to further the representational and algorithmic support for spatiotemporal reasoning from natural language text in the service of practical applications. One such task is tracking the movements of individuals; automated support for this task can be vital for national security. To build such support, we used lexical resources to integrate two existing annotation schemes, creating a new representation that captures, in a fine-grained manner, the movement of individuals through spatial and temporal locations. This integrated representation can be extracted automatically from natural language documents using symbolic and machine learning methods.
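As a rough illustration of what such a representation supports, the sketch below records motion events as mover/source/goal/time tuples, in the spirit of ISO-Space movement links (the class and field names are our own simplification, not the standard's), and chains them into a trajectory:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MoveLink:
    mover: str
    source: str
    goal: str
    time: str

# Illustrative capture of "John flew from Boston to New York on
# Monday, then drove to Albany on Tuesday."
moves = [
    MoveLink("John", "Boston", "New York", "Monday"),
    MoveLink("John", "New York", "Albany", "Tuesday"),
]

def trajectory(mover, links):
    """Chain a mover's locations into a path (naive: assumes the
    links are already in temporal order)."""
    path = []
    for link in links:
        if link.mover == mover:
            if not path or path[-1] != link.source:
                path.append(link.source)
            path.append(link.goal)
    return path

print(trajectory("John", moves))  # -> ['Boston', 'New York', 'Albany']
```

Extracting such tuples from text is the hard part; once extracted, ordering and querying them reduces to simple operations over the graph of movements.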
A second challenge we address is translating subjective verbal descriptions of spatial relations into metrically meaningful positional information, and extending this capability to spatiotemporal monitoring. Document collections, transcriptions, cables, and narratives routinely refer to objects moving through space over time. Integrating such information derived from textual sources into a geosensor data system can enhance the overall spatiotemporal representation of changing and evolving situations, such as when tracking objects through space with limited image data.