The Universitat Politècnica de València’s VRAIN-MLLP consolidate their position as leaders in live automatic subtitling for TV with their first place at the IberSpeech-RTVE 2020 Challenge

The MLLP research group of VRAIN-Universitat Politècnica de València has taken first place in the IberSpeech-RTVE 2020 TV Speech-to-Text Challenge, repeating its 2018 victory. This consolidates UPV-VRAIN-MLLP technology as the best in Spain and among the best in the world for live automatic subtitling of TV and broadcast content.

[In Spanish: 6/4/2021, “Resultados del Reto Albayzín RTVE 2020”. More links at the end of this news item.]

IberSPEECH 2020 (11th Jornadas en Tecnologías del Habla and 7th Iberian SLTech Workshop), held online this time on 24–25 March 2021, is the main international conference focused on research and industry-university collaboration around speech and language technologies for Iberian languages. It is also the context in which the long-running Albayzín evaluation challenges are held. In this edition, the Albayzín 2020 evaluation challenges again focused on TV broadcast content, based on a large Spanish-language TV show corpus created in collaboration with the Spanish state broadcaster Radio y Televisión Española (RTVE).

For the second time, the MLLP research group of VRAIN-Universitat Politècnica de València participated in the IberSpeech-RTVE 2020 TV Speech-to-Text Challenge. This challenge focused on automatically transcribing different types of TV shows in Spanish. The results of this challenge were announced on 17 March at a live event hosted and broadcast online by RTVE.

We are happy to report that our state-of-the-art streaming speech recognition system took 1st place in this international challenge on automatic transcription of Spanish-language TV shows, repeating our success in the previous 2018 edition.

In the following chart we can see the results of the challenge by system (grouped by participant) in terms of Word Error Rate (WER):

As we can see, 1st place went to the MLLP’s primary system, which reduced the Word Error Rate (WER) to 16%, significantly lower than the figures obtained by the other participants. This figure was obtained on a blind test set made up of 56 hours of transcribed speech from 16 Spanish-language TV programmes with different characteristics. More details on the official results of the IberSpeech-RTVE 2020 TV Speech-to-Text Challenge can be found at the Albayzín 2020 website.
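For readers unfamiliar with the metric, WER is conventionally defined as the number of word substitutions, deletions and insertions needed to turn the system output into the reference transcript, divided by the number of words in the reference. The following is a minimal sketch of that standard definition, not the challenge's official scoring tool: the function name `word_error_rate` and the example sentences are our own illustrative assumptions, and the sketch splits on whitespace without the text normalization that official scoring typically applies.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Illustrative WER: word-level edit distance / number of reference words.

    Assumption: plain whitespace tokenization, no case or punctuation
    normalization (official evaluations usually normalize text first).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in the output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example: one substitution ("en" -> "de") and one deletion ("hoy").
    ref = "el tiempo en españa hoy"
    hyp = "el tiempo de españa"
    print(f"WER = {word_error_rate(ref, hyp):.0%}")
```

Under this definition, a WER of 16% means roughly 16 word-level errors for every 100 words in the reference transcript.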

The MLLP’s winning primary system (p-streaming_1500ms) was a deep neural network-based streaming speech recognition system trained with the MLLP’s TLK transLectures-UPV toolkit. This made the MLLP the only participant with its own state-of-the-art streaming speech recognition toolkit not based on Kaldi. Because the system is streaming-enabled, it could be deployed in production environments for automatic captioning of live media streams, with a theoretical delay of 1.5 seconds. Alongside the primary system, the MLLP’s contrastive system c2-streaming_600ms reduced the latency to 0.81 seconds while keeping the WER as low as 16.9%; that is, we obtained state-of-the-art latencies for high-quality live automatic captioning.

A complete description of the MLLP’s winning streaming ASR systems can be found in the IberSPEECH 2020 article “MLLP-VRAIN Spanish ASR Systems for the Albayzin-RTVE 2020 Speech-To-Text Challenge”, by the MLLP team.

This technology has been developed and deployed at Universitat Politècnica de València (UPV) since 2012 by the MLLP research group of VRAIN (Valencian Research Institute for Artificial Intelligence). It is currently in use for automatic multilingual subtitling of the UPV’s award-winning video lecture repository UPVmèdia and for live automatic subtitling of conferences and lectures with poliSubs. We are also collaborating with the Valencian public broadcaster À Punt Mèdia to bring this technology to their TV, radio and Internet programming.

We would like to thank the organizers of the IberSpeech-RTVE 2020 TV Speech-to-Text Challenge for their work: RTVE, and the Universidad de Zaragoza’s RTVE Chair, Engineering Research Institute of Aragon (I3A) and ViVoLab research laboratory. We look forward to participating in future editions of the challenge.


The VRAIN-MLLP’s 1st place in IberSpeech-RTVE 2020 in the news

