Corpus Processing for Lexical Acquisition

Branimir Boguraev and James Pustejovsky

The lexicon has emerged from the study of computational linguistics as a fundamental resource that enables a variety of linguistic processes to operate in the course of tasks ranging from language analysis and text processing to machine translation. Lexicon acquisition, therefore, plays an essential part in getting any natural language processing system to function in the real world. Computers that process natural language require a variety of lexical information in addition to what can be found in standard dictionaries. Moreover, machine-readable dictionaries of the conventional sort have been found to be inadequate for fully supporting realistic natural language processing tasks. This volume describes corpus processing techniques that can be used to extract the additional lexical information required.

Bringing together a balanced blend of the theoretical and practical, the contributions provide the most recent look at lexical acquisition techniques and practices. These include coping with unknown lexicalizations, task-driven lexical induction, categorization of lexical units, lexical semantics from corpus analysis, and measuring lexical acquisition.

The problems addressed reflect a host of topics, including recognition of open compounds, incremental acquisition of meanings from sentence usages, recognition of new senses of existing words, sense disambiguation, recognition of specific classes of words, and recognition and annotation of patterns of word use. Each is important to the overall language analysis process, and each employs text analysis techniques in a useful and theoretically motivated way.

Language, Speech, and Communication series

Publication Date: May 23, 1996

Boguraev, B. and J. Pustejovsky (eds.). Corpus Processing for Lexical Acquisition. MIT Press, 1996.


Introduction to Formal Language Theory

Robert N. Moll, Michael A. Arbib, A. J. Kfoury

This volume combines an introduction to formal language theory with issues in computational linguistics. The book begins with standard formal language material, including a discussion of regular, context-free, context-sensitive, and arbitrary phrase structure languages. This is followed by a discussion of the corresponding families of automata: finite-state, push-down, linear bounded, and Turing machines. Important topics introduced along the way include closure properties, normal forms, nondeterminism, basic parsing algorithms, and the theory of computability and undecidability. Special emphasis is given to the role of algebraic techniques in formal language theory through a chapter devoted to the fixed point approach to the analysis of context-free languages. Advanced topics in parsing are also emphasized in an unusually clear and precise presentation. A unique feature of the book is the two-chapter introduction to the formal theory of natural languages. Alternative schemes for representing natural language are discussed, in particular ATNs and GPSG. This book is part of the AKM Series in Theoretical Computer Science. “A Basis for Theoretical Computer Science,” also in the series, should provide the necessary background for this volume, which is intended to serve as a text for upper undergraduate and graduate-level students.

Publication Date: 1988

Moll, Robert, Michael Arbib, and Assaf Kfoury (with contributions by James Pustejovsky). Introduction to Formal Language Theory. Springer-Verlag, Berlin, 1988.
