Europarl-ST is a multilingual speech translation corpus containing paired audio-text samples for speech translation, built from debates held in the European Parliament between 2008 and 2012. https://mllp.upv.es/europarl-st/

Europarl-ST

Europarl-ST is a multilingual Spoken Language Translation corpus containing paired audio-text samples for SLT from and into 9 European languages, for a total of 72 different translation directions. This corpus has been compiled using the debates held in the European Parliament in the period between 2008 and 2012.
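Each of the 9 languages can act as the source language, paired with any of the other 8 as the target, so the corpus covers 9 × 8 = 72 ordered translation directions. The short Python sketch below enumerates those pairs. The language codes are an assumption for illustration only, because this README does not list the languages. Check the corpus website for the authoritative list.

```python
from itertools import permutations

# Assumed ISO 639-1 codes for the 9 Europarl-ST languages (illustrative only;
# verify against https://mllp.upv.es/europarl-st/ before relying on them).
LANGUAGES = ["de", "en", "es", "fr", "it", "nl", "pl", "pt", "ro"]


def translation_directions(languages):
    """Return every ordered (source, target) pair where source != target."""
    return list(permutations(languages, 2))


directions = translation_directions(LANGUAGES)
print(len(directions))   # 72, i.e. 9 * 8
print(directions[:3])    # [('de', 'en'), ('de', 'es'), ('de', 'fr')]
```

Ordered pairs (permutations, not combinations) are used because en→de and de→en count as two distinct translation directions.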

Citation:

@inproceedings{jairsan2020a,
  author={J. {Iranzo-Sánchez} and J. A. {Silvestre-Cerdà} and J. {Jorge} and N. {Roselló} and A. {Giménez} and A. {Sanchis} and J. {Civera} and A. {Juan}},
  title={Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates},
  booktitle={Proc. of 2020 IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2020)},
  year={2020},
  pages={8229--8233}
}

Get the corpus

You can read more and download the corpus at: https://mllp.upv.es/europarl-st/