New MLLP article on streaming speech translation with direct segmentation published in the INNS Neural Networks journal

The article “Streaming cascade-based speech translation leveraged by a direct segmentation model”, by Javier Iranzo-Sánchez and other MLLP researchers, has been accepted for publication in the 2021 Special Issue of the INNS journal Neural Networks.

The article’s authors, MLLP members Javier Iranzo-Sánchez, Javier Jorge, Pau Baquero-Arnal, Joan Albert Silvestre-Cerdà, Adrià Giménez, Jorge Civera, Albert Sanchis and Alfons Juan, have summarized it in the following abstract:

The cascade approach to Speech Translation (ST) is based on a pipeline that concatenates an Automatic Speech Recognition (ASR) system followed by a Machine Translation (MT) system. Nowadays, state-of-the-art ST systems are populated with deep neural networks that are conceived to work in an offline setup in which the audio input to be translated is fully available in advance. However, a streaming setup defines a completely different picture, in which an unbounded audio input gradually becomes available and at the same time the translation needs to be generated under real-time constraints. In this work, we present a state-of-the-art streaming ST system in which neural-based models integrated in the ASR and MT components are carefully adapted in terms of their training and decoding procedures in order to run under a streaming setup. In addition, a direct segmentation model that adapts the continuous ASR output to the capacity of simultaneous MT systems trained at the sentence level is introduced to guarantee low latency while preserving the translation quality of the complete ST system. The resulting ST system is thoroughly evaluated on the real-life streaming Europarl-ST benchmark to gauge the trade-off between quality and latency for each component individually as well as for the complete ST system.

The journal Neural Networks, published by the International Neural Network Society (INNS), covers all aspects of neural networks and related approaches to computational intelligence. With a JCR 2020 impact factor of 8.050, it’s a Q1 journal in the category of Computer Science – Artificial Intelligence.

Since the MLLP research group was founded in 2014, its members have published over 10 international journal articles (Neural Networks, 2021; IEEE-ACM Trans. Audio Speech Lang., 2018; Pattern Recognition Letters, 2015; …) and over 20 international conference papers (Interspeech 2021 [1][2]; EMNLP 2020; ICASSP 2020 [1][2]; …). You can browse all of the 200+ publications by MLLP researchers in the Publications section on our website.