InterSpeech 2020: MLLP article on “Improved Hybrid Streaming ASR” accepted for publication

The article “Improved Hybrid Streaming ASR with Transformer Language Models”, by Pau Baquero-Arnal, Javier Jorge and other MLLP researchers, has been accepted for publication at ISCA’s InterSpeech 2020 conference (CORE A).

The 21st Annual Conference of the International Speech Communication Association (InterSpeech 2020), to be held this year in Shanghai (China) on October 25–29, is the world’s largest technical conference focused on speech processing and applications, with over 1000 attendees, over 600 papers and a CORE A conference ranking. InterSpeech emphasizes interdisciplinary approaches addressing all aspects of speech science and technology, and this year it is organized around the theme “Cognitive Intelligence for Speech Processing”.

We’re proud to announce that the article “Improved Hybrid Streaming ASR with Transformer Language Models”, by MLLP researchers Pau Baquero-Arnal, Javier Jorge, Adrià Giménez, Joan Albert Silvestre-Cerdà, Javier Iranzo-Sánchez, Albert Sanchis, Jorge Civera and Alfons Juan, has been accepted for publication at the conference. The paper’s abstract follows:

Streaming ASR is gaining momentum due to its wide applicability, though it is still unclear how best to come close to the accuracy of state-of-the-art off-line ASR systems when the output must come within a short delay after the incoming audio stream. Following our previous work on streaming one-pass decoding with hybrid ASR systems and LSTM language models, in this work we report further improvements by replacing LSTMs with Transformer models. First, two key ideas are discussed so as to run these models fast during inference. Then, empirical results on LibriSpeech and TED-LIUM are provided showing that Transformer language models lead to improved recognition rates on both tasks. ASR systems obtained in this work can be seamlessly transferred to a streaming setup with minimal quality losses. Indeed, to the best of our knowledge, no better results have been reported on these tasks when assessed under a streaming setup.

Since the MLLP research group was founded in 2014, its members have published over 10 international journal articles (IEEE-ACM Trans. Audio Speech Lang., 2018; Pattern Recognition Letters, 51, 2015; …) and over 20 international conference papers (ICASSP 2020 [1][2]; Interspeech 2019; AMTA 2014; …). You can browse all of the 200+ publications by MLLP researchers in the Publications section of our website.

We at the MLLP are very glad to be taking part in InterSpeech 2020. We look forward to seeing you there!