Europarl-ST

Europarl-ST is a multilingual speech translation corpus that contains paired audio-text samples for speech translation, constructed from the debates held in the European Parliament between 2008 and 2012.

For this initial release, the corpus contains samples both from and into 6 European languages (German, English, Spanish, French, Italian and Portuguese), for a total of 30 different translation directions.
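The count of 30 directions follows from pairing each of the 6 languages as a source with each of the 5 others as a target. A minimal sketch of that enumeration is shown below; the two-letter language codes are an assumption made for illustration (ISO 639-1 style) and are not necessarily the identifiers used inside the corpus.

```python
from itertools import permutations

# Two-letter codes are assumed here for illustration only; this README
# names the languages but does not specify the identifiers used on disk.
LANGUAGES = ["de", "en", "es", "fr", "it", "pt"]

# Every ordered (source, target) pair with source != target is one direction.
directions = list(permutations(LANGUAGES, 2))

# 6 source languages * 5 remaining target languages
print(len(directions))
for source, target in directions:
    print(f"{source} -> {target}")
```

Ordered pairs are used because translating from English into German and from German into English count as two separate directions.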

The full details of the corpus are available in the paper:
https://arxiv.org/abs/1911.03167

For more information about the activities of our research group, visit:
https://www.mllp.upv.es/

For any questions or comments regarding the corpus, don't hesitate to contact Javier Iranzo-Sánchez (jairsan@upv.es).

If you use the corpus in your research, please cite the following reference:

@inproceedings{europarlst,
  title = {Europarl-ST: A Multilingual Corpus For Speech Translation Of Parliamentary Debates},
  author = {Javier Iranzo-S\'{a}nchez and Joan Albert Silvestre-Cerd\`{a} and Javier Jorge and Nahuel Rosell\'{o} and Adri\`{a} Gim\'{e}nez and Albert Sanchis and Jorge Civera and Alfons Juan},
  note = "Accepted to ICASSP2020",
}
  