Silvestre-Cerdà, Joan Albert; García-Martínez, Mercedes; Barrón-Cedeño, Alberto; Civera, Jorge; Rosso, Paolo. Extracción de corpus paralelos de la Wikipedia basada en la obtención de alineamientos bilingües a nivel de frase. In: Proceedings of the Workshop on Iberian Cross-Language Natural Language Processing Tasks (ICL 2011), pp. 14-21, CEUR-WS, 2011, ISSN: 1613-0073.
@inproceedings{Silvestre-Cerdà2011b,
title = {Extracción de corpus paralelos de la Wikipedia basada en la obtención de alineamientos bilingües a nivel de frase},
author = {Joan Albert Silvestre-Cerdà and Mercedes García-Martínez and Alberto Barrón-Cedeño and Jorge Civera and Paolo Rosso},
url = {http://hdl.handle.net/10251/27930},
issn = {1613-0073},
year = {2011},
date = {2011-01-01},
booktitle = {Proceedings of the Workshop on Iberian Cross-Language Natural Language Processing Tasks (ICL 2011)},
volume = {824},
pages = {14--21},
publisher = {CEUR-WS},
abstract = {This paper presents a proposal for extracting parallel corpora from Wikipedia on the basis of statistical machine translation techniques. We have used word-level alignment models from IBM in order to obtain phrase-level bilingual alignments between document pairs. We have manually annotated a test set of English-Spanish comparable documents in order to evaluate the model. The obtained results are encouraging.},
keywords = {Comparable Corpora, Parallel Sentences Extraction, Statistical machine translation},
pubstate = {published},
tppubtype = {inproceedings}
}