New MLLP article “Towards cross-lingual voice cloning in higher education” published in the IFAC Engineering Applications of Artificial Intelligence journal

The article “Towards cross-lingual voice cloning in higher education”, by Alejandro Pérez González de Martos and other MLLP researchers, has been accepted for publication in Volume 105 of the IFAC journal Engineering Applications of Artificial Intelligence.

The article’s authors, MLLP members Alejandro Pérez González de Martos, Gonçal Garcés Díaz-Munío, Adrià Giménez, Joan Albert Silvestre-Cerdà, Albert Sanchis, Jorge Civera, Manuel Jiménez, Carlos Turró and Alfons Juan, have summarized it in the following abstract:

The rapid progress of modern AI tools for automatic speech recognition and machine translation is leading to a progressive cost reduction to produce publishable subtitles for educational videos in multiple languages. Similarly, text-to-speech technology is experiencing large improvements in terms of quality, flexibility and capabilities. In particular, state-of-the-art systems are now capable of seamlessly dealing with multiple languages and speakers in an integrated manner, thus enabling lecturer’s voice cloning in languages they might not even speak. This work is to report the experience gained on using such systems at the Universitat Politècnica de València (UPV), mainly as a guidance for other educational organizations willing to conduct similar studies. It builds on previous work on the UPV’s main repository of educational videos, mèdiaUPV, to produce multilingual subtitles at scale and low cost. Here, a detailed account is given on how this work has been extended to also allow for massive machine dubbing of mèdiaUPV. This includes collecting 59 hours of clean speech data from UPV’s academic staff, and extending our production pipeline of subtitles with a state-of-the-art multilingual and multi-speaker text-to-speech system trained from the collected data. Our main result comes from an extensive subjective evaluation of this system by lecturers contributing to data collection. In brief, it is shown that text-to-speech technology is not only mature enough for its application to mèdiaUPV, but also needed as soon as possible by students to improve its accessibility and bridge language barriers.

The journal Engineering Applications of Artificial Intelligence, published by the International Federation of Automatic Control (IFAC), covers work on the practical application of AI methods in all branches of engineering. With a JCR 2020 impact factor of 6.212, it’s a Q1 journal in the category of “Computer Science, Artificial Intelligence” (as well as in “Engineering, Electrical & Electronic”, “Automation and Control Systems” and “Engineering, Multidisciplinary”).

Since the foundation of the MLLP research group in 2014, MLLP members have published over 10 international journal articles (Neural Networks, 2021; IEEE-ACM Trans. Audio Speech Lang., 2018; Pattern Recognition Letters, 2015; …) and over 20 international conference papers (Interspeech 2021 [1][2]; EMNLP 2020; ICASSP 2020 [1][2]; …). You can browse all of the 200+ publications by MLLP researchers in the Publications section on our website.

