Europarl-ST is a multilingual Spoken Language Translation (SLT) corpus containing paired audio-text samples for translation from and into 9 European languages, for a total of 72 translation directions. The corpus was compiled from debates held in the European Parliament between 2008 and 2012.
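The figure of 72 directions follows from counting ordered (source, target) pairs of distinct languages: 9 × 8 = 72. A minimal sketch of that count is shown below. The language codes are an assumption, since the description above gives only the number of languages, and `translation_directions` is a hypothetical helper rather than part of any corpus tooling.

```python
from itertools import permutations

# Assumed ISO 639-1 codes for illustration only; the description above
# states that there are 9 languages but does not list them.
LANGUAGES = ["de", "en", "es", "fr", "it", "nl", "pl", "pt", "ro"]


def translation_directions(languages):
    """Return every ordered (source, target) pair of distinct languages."""
    return list(permutations(languages, 2))


directions = translation_directions(LANGUAGES)
print(len(directions))  # n * (n - 1) ordered pairs for n languages
```

Each direction is ordered, so English to Spanish and Spanish to English count as two separate entries.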
Citation:
@inproceedings{jairsan2020a,
  author={J. {Iranzo-Sánchez} and J. A. {Silvestre-Cerdà} and J. {Jorge} and N. {Roselló} and A. {Giménez} and A. {Sanchis} and J. {Civera} and A. {Juan}},
  title={Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates},
  booktitle={Proc. of 2020 IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2020)},
  year={2020},
  pages={8229--8233}
}
You can read more and download the corpus at: https://mllp.upv.es/europarl-st/