James Pustejovsky and Amber Stubbs
Create your own natural language training corpus for machine learning. Whether you’re working with English, Chinese, or any other natural language, this hands-on book guides you through a proven annotation development cycle—the process of adding metadata to your training corpus to help ML algorithms work more efficiently. You don’t need any programming or linguistics experience to get started.
Using detailed examples at every step, you’ll learn how the MATTER Annotation Development Process helps you Model, Annotate, Train, Test, Evaluate, and Revise your training corpus. You also get a complete walkthrough of a real-world annotation project.
- Define a clear annotation goal before collecting your dataset (corpus)
- Learn tools for analyzing the linguistic content of your corpus
- Build a model and specification for your annotation project
- Examine the different annotation formats, from basic XML to the Linguistic Annotation Framework
- Create a gold standard corpus that can be used to train and test ML algorithms
- Select the ML algorithms that will process your annotated data
- Evaluate the test results and revise your annotation task
- Learn how to use lightweight software for annotating texts and adjudicating the annotations
This book is a perfect companion to O’Reilly’s Natural Language Processing with Python.
Publication Date: November 1, 2012
Pustejovsky, James and Amber Stubbs. Natural Language Annotation and Machine Learning. O’Reilly Publishers, 2012.
Inderjeet Mani and James Pustejovsky
Interpreting Motion presents an integrated perspective on how language structures constrain concepts of motion and how the world shapes the way motion is linguistically expressed. Natural language allows for efficient communication of elaborate descriptions of movement without requiring a precise specification of the motion. Interpreting Motion is the first book to analyze the semantics of motion expressions in terms of the formalisms of qualitative spatial reasoning. It shows how motion descriptions in language are mapped to trajectories of moving entities based on qualitative spatio-temporal relationships. The authors provide an extensive discussion of prior research on spatial prepositions and motion verbs, devoting chapters to the compositional semantics of motion sentences, the formal representations needed for computers to reason qualitatively about time, space, and motion, and the methodology for annotating corpora with linguistic information in order to train computer programs to reproduce the annotation. The applications they illustrate include route navigation, the mapping of travel narratives, question-answering, image and video tagging, and graphical rendering of scenes from textual descriptions.
The book is written accessibly for a broad scientific audience of linguists, cognitive scientists, computer scientists, and those working in fields such as artificial intelligence and geographic information systems.
Publication Date: April 7, 2012
The Generative Lexicon presents a novel and exciting theory of lexical semantics that addresses the problem of the “multiplicity of word meaning”; that is, how we are able to give an infinite number of senses to words with finite means. The first formally elaborated theory of a generative approach to word meaning, it lays the foundation for an implemented computational treatment of word meaning that connects explicitly to a compositional semantics.
In contrast to the static view of word meaning (where each word is characterized by a predetermined number of word senses) that imposes a tremendous bottleneck on the performance capability of any natural language processing system, Pustejovsky proposes that the lexicon becomes an active and central component in the linguistic description. The essence of his theory is that the lexicon functions generatively, first by providing a rich and expressive vocabulary for characterizing lexical information; then, by developing a framework for manipulating fine-grained distinctions in word descriptions; and finally, by formalizing a set of mechanisms for specialized composition of aspects of such descriptions of words, as they occur in context, so that extended and novel senses are generated.
The subjects covered include semantics of nominals (figure/ground nominals, relational nominals, and other event nominals); the semantics of causation (in particular, how causation is lexicalized in language, including causative/unaccusatives, aspectual predicates, experiencer predicates, and modal causatives); how semantic types constrain syntactic expression (such as the behavior of type shifting and type coercion operations); a formal treatment of event semantics with subevents; and a general treatment of the problem of polysemy.
Language, Speech, and Communication series.
Publication Date: January 9, 1998
Pustejovsky, J. The Generative Lexicon, MIT Press, Cambridge. 1995.
Frank Schilder, Graham Katz and James Pustejovsky
This state-of-the-art survey comprises a selection of the material presented at the International Dagstuhl Seminar on Annotating, Extracting and Reasoning about Time and Events, held in Dagstuhl Castle, Germany, in April 2005. The seminar centered around an emerging de facto standard for time and event annotation: TimeML. It features nine papers that detail current research and discuss open problems concerning annotation, temporal reasoning, and event identification.
Publication Date: December 14, 2007
Schilder, F., G. Katz, and J. Pustejovsky (eds.) Annotating, Extracting and Reasoning about Time and Events. Berlin: Springer, 2007.
This book integrates the research being carried out in the field of lexical semantics in linguistics with the work on knowledge representation and lexicon design in computational linguistics. It provides a stimulating and unique discussion between the computational perspective of lexical meaning and the concerns of the linguist for the semantic description of lexical items in the context of syntactic descriptions.
Publication Date: December 3, 2010, Edition: Softcover reprint of hardcover 1st ed. 1993
Pustejovsky, J. (ed.) Semantics and the Lexicon, Kluwer, Dordrecht, The Netherlands. 1993.
James Pustejovsky, Pierrette Bouillon, Hitoshi Isahara and Kyoko Kanzaki
This collection of papers takes linguists to the leading edge of techniques in generative lexicon theory, the linguistic composition methodology that arose from the imperative to provide a compositional semantics for the contextual modifications in meaning that emerge in real linguistic usage. Today’s growing shift towards distributed compositional analyses evinces the applicability of GL theory, and the contributions to this volume, presented at three international workshops (GL-2003, GL-2005 and GL-2007), address the relationship between compositionality in language and the mechanisms of selection in grammar that are necessary to maintain this property. The core unresolved issues in compositionality, relating to the interpretation of context and the mechanisms of selection, are treated from varying perspectives within GL theory, including its basic theoretical mechanisms and its analytical viewpoint on linguistic phenomena.
Publication Date: December 19, 2012, Edition: 2013
Pustejovsky, James, Pierrette Bouillon, Hitoshi Isahara, Kyoko Kanzaki, Chungmin Lee. Advances in Generative Lexicon Theory. Springer, 2013.
James Pustejovsky and Branimir Boguraev
Lexical ambiguity presents one of the most intractable problems for language processing studies and, not surprisingly, it is at the core of research in lexical semantics. Originally published as two special issues of the Journal of Semantics, this collection focuses on the problem of polysemy, from the point of view of practitioners of computational linguistics.
Publication Date: January 2, 1997
Pustejovsky, J. and B. Boguraev, eds. Lexical Semantics and the Problem of Polysemy, Oxford University Press, Oxford. 1997.
Inderjeet Mani, James Pustejovsky and Robert Gaizauskas
This reader collects and introduces important work in linguistics, computer science, artificial intelligence, and computational linguistics on the use of linguistic devices in natural languages to situate events in time: whether they are past, present, or future; whether they are real or hypothetical; when an event might have occurred, and how long it could have lasted. Clear, self-contained editorial introductions to each area provide the necessary technical background for the non-specialist, explaining the underlying connections across disciplines.
Publication Date: August 11, 2005
Mani, I., J. Pustejovsky, and R. Gaizauskas (eds.) The Language of Time: Readings in Temporal Information Processing, Oxford University Press. 2005.
Carol L. Tenny and James Pustejovsky
Researchers in lexical semantics, logical semantics, and syntax have traditionally employed different approaches in their study of natural languages. Yet recent research in all three fields has demonstrated a growing recognition that the grammars of natural languages structure and refer to events in particular ways. This convergence on the theory of events as grammatical objects is the motivation for this volume, which brings together premier researchers in these disciplines to specifically address the topic of event structure. The selection of works presented in this volume originated from a 1997 workshop funded by the National Science Foundation regarding Events as Grammatical Objects, from the Combined Perspectives of Lexical Semantics, Logical Semantics and Syntax.
Publication Date: April 1, 2001
Tenny, C. and J. Pustejovsky (eds.) Events as Grammatical Objects, Cambridge University Press. 2000.
James Pustejovsky and Sabine Bergler
Recent work on formal methods in computational lexical semantics has had the effect of bringing many linguistic formalisms much closer to the knowledge representation languages used in artificial intelligence. Formalisms are now emerging which may be more expressive and formally better understood than many knowledge representation languages. The interests of computational linguists now extend to include such domains as commonsense knowledge, inheritance, default reasoning, collocational relations, and even domain knowledge. With such an extension of the normal purview of “linguistic” knowledge, one may question whether there is any logical justification for distinguishing between lexical semantics and commonsense reasoning. This volume explores the question from several methodological and theoretical perspectives. What emerges is a clear consensus that the notion of the lexicon and lexical knowledge assumed in earlier linguistic research is grossly inadequate and fails to address the deeper semantic issues required for natural language analysis.
Publication Date: October 8, 1992
Pustejovsky, J. and S. Bergler (eds.) Lexical Semantics and Knowledge Representation, Springer Verlag, Berlin. 1992.