Online Public Access Catalogue (OPAC)
Library, Documentation and Information Science Division

“A research journal serves that narrow borderland which separates the known from the unknown”
-P.C.Mahalanobis



Building and using comparable corpora / [edited by] Serge Sharoff...[et al.].

Material type: Text
Publication details: Berlin : Springer-Verlag, 2013.
Description: xii, 335 p. : illustrations (some color) ; 25 cm
ISBN:
  • 9783642201271
DDC classification:
  • 006.35 23 Sh531
Contents:
Overviewing Important Aspects of the Last Twenty Years of Research in Comparable Corpora. S.Sharoff, R.Rapp, P.Zweigenbaum.
Part I: Compiling and Measuring Comparable Corpora
  • Mining Parallel Documents Using Low Bandwidth and High Precision CLIR from the Heterogeneous Web. Simon Shi and Pascale Fung.
  • Automatic Comparable Web Corpora Collection and Bilingual Terminology Extraction for Specialized Dictionary Making. A.Gurrutxaga, I.Leturia, I.San Vicente, X.Saralegi.
  • Statistical Comparability: Methodological Caveats. R.Kohler.
  • Methods for Collection and Evaluation of Comparable Documents. M.Lestari Paramita, D.Guthrie, E.Kanoulas, R.Gaizauskas, P.Clough and M.Sanderson.
  • Measuring the Distance between Comparable Corpora between Languages. S.Sharoff.
  • Exploiting Comparable Corpora for Lexicon Extraction: Measuring and Improving Corpus Quality. B.Li, E.Gaussier.
  • Statistical Corpus and Language Comparison on Comparable Corpora. T.Eckart, U.Quasthoff.
  • Comparable Multilingual Patents as Large-scale Parallel Corpora. B.Lu and B.Tsou.
Part II: Using Comparable Corpora
  • Extracting Parallel Phrases from Comparable Data. S.Hewavitharana, S.Vogel.
  • Exploiting Comparable Corpora. D.S.Munteanu, D.Marcu.
  • Paraphrase Detection in Comparable Monolingual Corpora. L.Deleger, B.Cartoni, P.Zweigenbaum.
  • Information Network Construction and Alignment from Automatically Acquired Comparable Corpora. H.Ji, W.-P.Lin.
  • Bilingual Terminology Mining from Comparable Corpora. B.Daille, E.Morin.
  • The Place of Comparable Corpora in Providing Terminological Reference Information to Online Translators: A Strategic Framework. K.Kageura, T.Abekawa.
  • Old Needs, New Solutions: Comparable Corpora for Language Professionals. S.Bernardini, A.Ferraresi.
  • Exploiting the Incomparability of Comparable Corpora for Contrastive Linguistics and Translation Studies. S.Neumann, S.Hansen-Schirra.
Summary: The 1990s saw a paradigm change in the use of corpus-driven methods in NLP. In the field of multilingual NLP (such as machine translation and terminology mining) this implied the use of parallel corpora. However, parallel resources are relatively scarce: many more texts are produced daily by native speakers of any given language than are translated. This situation resulted in a natural drive towards the use of comparable corpora, i.e. non-parallel texts in the same domain or genre. Nevertheless, this research direction has not produced a single authoritative source suitable for researchers and students coming to the field. The proposed volume provides a reference source, identifying the state of the art in the field as well as future trends. The book is intended for specialists and students in natural language processing, machine translation and computer-assisted translation.
Holdings
Item type: Books
Current library: ISI Library, Kolkata
Call number: 006.35 Sh531
Status: Available
Barcode: 136351
Total holds: 0

Includes bibliographical references.



Library, Documentation and Information Science Division, Indian Statistical Institute, 203 B T Road, Kolkata 700108, INDIA
Phone no. 91-33-2575 2100, Fax no. 91-33-2578 1412, ksatpathy@isical.ac.in