Interspeech 2021: Two MLLP articles accepted for publication at the largest event on spoken language processing

The articles “Towards simultaneous machine interpretation” and “Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization”, by Alejandro Pérez González de Martos, Gonçal Garcés Díaz-Munío and other MLLP researchers, have been accepted for publication at ISCA’s Interspeech 2021 conference (CORE A).

The 22nd Annual Conference of the International Speech Communication Association (Interspeech 2021), to be held this year in Brno (Czech Republic) on 30 August–3 September, is the world’s largest conference on spoken language processing, with over 1000 attendees, over 600 papers, and a CORE A conference ranking. Interspeech emphasizes interdisciplinary approaches addressing all aspects of speech science and technology, and this year it will be organized around the topic “Speech everywhere”.

We are proud to report that two articles by MLLP researchers have been accepted for publication at Interspeech 2021: one on speech-to-speech translation with voice cloning, and the other presenting a new large speech and text corpus for streaming ASR benchmarking and speech data filtering (and also introducing experiments on speech data verbatimization). Here are the details for each article:

“Towards simultaneous machine interpretation”
Full article: https://dx.doi.org/10.21437/Interspeech.2021-201
Alejandro Pérez-González-de-Martos, Javier Iranzo-Sánchez, Adrià Giménez Pastor, Javier Jorge, Joan-Albert Silvestre-Cerdà, Jorge Civera, Albert Sanchis, Alfons Juan
Automatic speech-to-speech translation (S2S) is one of the most challenging speech and language processing tasks, especially when considering its application to real-time settings. Recent advances in streaming Automatic Speech Recognition (ASR), simultaneous Machine Translation (MT) and incremental neural Text-To-Speech (TTS) make it possible to develop real-time cascade S2S systems with greatly improved accuracy. On the way to simultaneous machine interpretation, a state-of-the-art cascade streaming S2S system is described and empirically assessed in the simultaneous interpretation of European Parliament debates. We pay particular attention to the TTS component, particularly in terms of speech naturalness under a variety of response-time settings, as well as in terms of speaker similarity for its cross-lingual voice cloning capabilities.

“Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization”
Full article: https://dx.doi.org/10.21437/Interspeech.2021-1905
Europarl-ASR corpus website: https://www.mllp.upv.es/europarl-asr
Gonçal V. Garcés Díaz-Munío, Joan-Albert Silvestre-Cerdà, Javier Jorge, Adrià Giménez Pastor, Javier Iranzo-Sánchez, Pau Baquero-Arnal, Nahuel Roselló, Alejandro Pérez-González-de-Martos, Jorge Civera, Albert Sanchis, Alfons Juan
We introduce Europarl-ASR, a large speech and text corpus of parliamentary debates including 1300 hours of transcribed speeches and 70 million tokens of text in English extracted from European Parliament sessions. The training set is labelled with the Parliament’s non-fully-verbatim official transcripts, time-aligned. As verbatimness is critical for acoustic model training, we also provide automatically noise-filtered and automatically verbatimized transcripts of all speeches based on speech data filtering and verbatimization techniques. Additionally, 18 hours of transcribed speeches were manually verbatimized to build reliable speaker-dependent and speaker-independent development/test sets for streaming ASR benchmarking. The availability of manual non-verbatim and verbatim transcripts for dev/test speeches makes this corpus useful for the assessment of automatic filtering and verbatimization techniques. This paper describes the corpus and its creation, and provides off-line and streaming ASR baselines for both the speaker-dependent and speaker-independent tasks using the three training transcription sets. The corpus is publicly released under an open licence.

Keep an eye on the conference programme on the Interspeech 2021 website to learn when we will be presenting these articles.

Since the foundation of the MLLP research group (2014), MLLP members have published over 10 international journal articles (Neural Networks, 2021; IEEE-ACM Trans. Audio Speech Lang., 2018; Pattern Recognition Letters, 2015; …) and over 20 international conference papers (EMNLP 2020; Interspeech 2020; ICASSP 2020 [1][2]; …). You can browse all of the 200+ publications by MLLP researchers in the Publications section of our website.

We at the MLLP are very glad to participate in Interspeech 2021. We look forward to seeing you there!