Uncategorized | Brandeis Department of Computer Science

March 27, 2013 · 4:30 pm

Natural Language Annotation for Machine Learning

James Pustejovsky and Amber Stubbs

Create your own natural language training corpus for machine learning. Whether you’re working with English, Chinese, or any other natural language, this hands-on book guides you through a proven annotation development cycle—the process of adding metadata to your training corpus to help ML algorithms work more efficiently. You don’t need any programming or linguistics experience to get started.

Using detailed examples at every step, you’ll learn how the MATTER Annotation Development Process helps you Model, Annotate, Train, Test, Evaluate, and Revise your training corpus. You also get a complete walkthrough of a real-world annotation project.

Define a clear annotation goal before collecting your dataset (corpus)
Learn tools for analyzing the linguistic content of your corpus
Build a model and specification for your annotation project
Examine the different annotation formats, from basic XML to the Linguistic Annotation Framework
Create a gold standard corpus that can be used to train and test ML algorithms
Select the ML algorithms that will process your annotated data
Evaluate the test results and revise your annotation task
Learn how to use lightweight software for annotating texts and adjudicating the annotations

This book is a perfect companion to O’Reilly’s Natural Language Processing with Python.

Publication Date: November 1, 2012

Pustejovsky, James and Amber Stubbs. Natural Language Annotation for Machine Learning. O’Reilly Publishers, 2012.

· 4:29 pm

Interpreting Motion: Grounded Representations for Spatial Language

Inderjeet Mani and James Pustejovsky

Interpreting Motion presents an integrated perspective on how language structures constrain concepts of motion and how the world shapes the way motion is linguistically expressed. Natural language allows for efficient communication of elaborate descriptions of movement without requiring a precise specification of the motion. Interpreting Motion is the first book to analyze the semantics of motion expressions in terms of the formalisms of qualitative spatial reasoning. It shows how motion descriptions in language are mapped to trajectories of moving entities based on qualitative spatio-temporal relationships. The authors provide an extensive discussion of prior research on spatial prepositions and motion verbs, devoting chapters to the compositional semantics of motion sentences, the formal representations needed for computers to reason qualitatively about time, space, and motion, and the methodology for annotating corpora with linguistic information in order to train computer programs to reproduce the annotation. The applications they illustrate include route navigation, the mapping of travel narratives, question-answering, image and video tagging, and graphical rendering of scenes from textual descriptions.

The book is written accessibly for a broad scientific audience of linguists, cognitive scientists, computer scientists, and those working in fields such as artificial intelligence and geographic information systems.

Publication Date: April 7, 2012

· 4:29 pm

The Generative Lexicon

James Pustejovsky

The Generative Lexicon presents a novel and exciting theory of lexical semantics that addresses the problem of the “multiplicity of word meaning”; that is, how we are able to give an infinite number of senses to words with finite means. The first formally elaborated theory of a generative approach to word meaning, it lays the foundation for an implemented computational treatment of word meaning that connects explicitly to a compositional semantics.

In contrast to the static view of word meaning (where each word is characterized by a predetermined number of word senses), which imposes a tremendous bottleneck on the performance capability of any natural language processing system, Pustejovsky proposes that the lexicon become an active and central component in the linguistic description. The essence of his theory is that the lexicon functions generatively: first, by providing a rich and expressive vocabulary for characterizing lexical information; then, by developing a framework for manipulating fine-grained distinctions in word descriptions; and finally, by formalizing a set of mechanisms for specialized composition of aspects of such descriptions of words, so that, as words occur in context, extended and novel senses are generated.

The subjects covered include the semantics of nominals (figure/ground nominals, relational nominals, and other event nominals); the semantics of causation (in particular, how causation is lexicalized in language, including causative/unaccusatives, aspectual predicates, experiencer predicates, and modal causatives); how semantic types constrain syntactic expression (such as the behavior of type shifting and type coercion operations); a formal treatment of event semantics with subevents; and a general treatment of the problem of polysemy.

Language, Speech, and Communication series.

Publication Date: January 9, 1998

Pustejovsky, J. The Generative Lexicon, MIT Press, Cambridge. 1995.

· 4:28 pm

Annotating, Extracting and Reasoning about Time and Events

Frank Schilder, Graham Katz and James Pustejovsky

This state-of-the-art survey comprises a selection of the material presented at the International Dagstuhl Seminar on Annotating, Extracting and Reasoning about Time and Events, held in Dagstuhl Castle, Germany, in April 2005. The seminar centered around an emerging de facto standard for time and event annotation: TimeML. It features nine papers that detail current research and discuss open problems concerning annotation, temporal reasoning, and event identification.

Publication Date: December 14, 2007

Schilder, F., G. Katz, and J. Pustejovsky, eds. Annotating, Extracting and Reasoning about Time and Events. Berlin: Springer, 2007.

· 4:28 pm

Semantics and the Lexicon

James Pustejovsky

This book integrates the research being carried out in the field of lexical semantics in linguistics with the work on knowledge representation and lexicon design in computational linguistics. It provides a stimulating and unique discussion between the computational perspective of lexical meaning and the concerns of the linguist for the semantic description of lexical items in the context of syntactic descriptions.

Publication Date: December 3, 2010, Edition: Softcover reprint of hardcover 1st ed. 1993

Pustejovsky, J. ed. Semantics and the Lexicon, Kluwer, Dordrecht, The Netherlands. 1993.

· 4:27 pm

Advances in Generative Lexicon Theory

James Pustejovsky, Pierrette Bouillon, Hitoshi Isahara and Kyoko Kanzaki

This collection of papers takes linguists to the leading edge of techniques in generative lexicon theory, the linguistic composition methodology that arose from the imperative to provide a compositional semantics for the contextual modifications in meaning that emerge in real linguistic usage. Today’s growing shift towards distributed compositional analyses evinces the applicability of GL theory, and the contributions to this volume, presented at three international workshops (GL-2003, GL-2005 and GL-2007), address the relationship between compositionality in language and the mechanisms of selection in grammar that are necessary to maintain this property. The core unresolved issues in compositionality, relating to the interpretation of context and the mechanisms of selection, are treated from varying perspectives within GL theory, including its basic theoretical mechanisms and its analytical viewpoint on linguistic phenomena.

Publication Date: December 19, 2012, Edition: 2013

Pustejovsky, James, Pierrette Bouillon, Hitoshi Isahara, Kyoko Kanzaki, Chungmin Lee. Advances in Generative Lexicon Theory. Springer, 2013.

· 4:25 pm

Lexical Semantics: The Problem of Polysemy

James Pustejovsky and Branimir Boguraev

Lexical ambiguity presents one of the most intractable problems for language processing studies and, not surprisingly, it is at the core of research in lexical semantics. Originally published as two special issues of the Journal of Semantics, this collection focuses on the problem of polysemy, from the point of view of practitioners of computational linguistics.

Publication Date: January 2, 1997

Pustejovsky, J. and B. Boguraev, eds. Lexical Semantics: The Problem of Polysemy, Oxford University Press, Oxford. 1997.

· 4:25 pm

The Language of Time: A Reader

Inderjeet Mani, James Pustejovsky and Robert Gaizauskas

This reader collects and introduces important work in linguistics, computer science, artificial intelligence, and computational linguistics on the use of linguistic devices in natural languages to situate events in time: whether they are past, present, or future; whether they are real or hypothetical; when an event might have occurred, and how long it could have lasted. Clear, self-contained editorial introductions to each area provide the necessary technical background for the non-specialist, explaining the underlying connections across disciplines.

Publication Date: August 11, 2005

Mani, I., J. Pustejovsky, and R. Gaizauskas (eds.) The Language of Time: Readings in Temporal Information Processing, Oxford University Press. 2005.

· 4:24 pm

Events as Grammatical Objects: The Converging Perspectives of Lexical Semantics and Syntax

Carol L. Tenny and James Pustejovsky

Researchers in lexical semantics, logical semantics, and syntax have traditionally employed different approaches in their study of natural languages. Yet recent research in all three fields has demonstrated a growing recognition that the grammars of natural languages structure and refer to events in particular ways. This convergence on the theory of events as grammatical objects is the motivation for this volume, which brings together premier researchers in these disciplines to address the topic of event structure. The works selected for this volume originated in a 1997 workshop, funded by the National Science Foundation, on Events as Grammatical Objects, from the Combined Perspectives of Lexical Semantics, Logical Semantics and Syntax.

Publication Date: April 1, 2001

Tenny, C. and J. Pustejovsky (eds.) Events as Grammatical Objects, Cambridge University Press. 2000.

· 4:20 pm

Lexical Semantics and Knowledge Representation

James Pustejovsky and Sabine Bergler

Recent work on formal methods in computational lexical semantics has had the effect of bringing many linguistic formalisms much closer to the knowledge representation languages used in artificial intelligence. Formalisms are now emerging which may be more expressive and formally better understood than many knowledge representation languages. The interests of computational linguists now extend to include such domains as commonsense knowledge, inheritance, default reasoning, collocational relations, and even domain knowledge. With such an extension of the normal purview of “linguistic” knowledge, one may question whether there is any logical justification for distinguishing between lexical semantics and commonsense reasoning. This volume explores the question from several methodological and theoretical perspectives. What emerges is a clear consensus that the notion of the lexicon and lexical knowledge assumed in earlier linguistic research is grossly inadequate and fails to address the deeper semantic issues required for natural language analysis.

Publication Date: October 8, 1992

Pustejovsky, J. and S. Bergler. (eds.) Lexical Semantics and Knowledge Representation, Springer Verlag, Berlin. 1992.
