2025
Iranzo-Sánchez, Jorge; Santamaría-Jordà, Jaume; Mas-Mollà, Gerard; Garcés Díaz-Munío, Gonçal V.; Iranzo-Sánchez, Javier; Jorge, Javier; Silvestre-Cerdà, Joan Albert; Giménez, Adrià; Civera, Jorge; Sanchis, Albert; Juan, Alfons: Speech Translation for Multilingual Medical Education Leveraged by Large Language Models. Journal Article, Forthcoming. Artificial Intelligence in Medicine.

Tags: Automatic Speech Recognition, domain adaptation, large language models, Machine Translation, oncology, Speech Translation

@article{Iranzo-Sánchez2025,
  title     = {Speech Translation for Multilingual Medical Education Leveraged by Large Language Models},
  author    = {Jorge Iranzo-Sánchez and Jaume Santamaría-Jordà and Gerard Mas-Mollà and Gonçal V. Garcés Díaz-Munío and Javier Iranzo-Sánchez and Javier Jorge and Joan Albert Silvestre-Cerdà and Adrià Giménez and Jorge Civera and Albert Sanchis and Alfons Juan},
  year      = {2025},
  date      = {2025-01-01},
  journal   = {Artificial Intelligence in Medicine},
  abstract  = {The application of large language models (LLMs) to speech translation (ST), or more generally to machine translation (MT), has recently provided excellent results, superseding conventional encoder-decoder MT systems in the general domain. However, this is not clearly the case when LLMs are used as MT systems to translate medical-related materials. In this respect, the provision of multilingual training materials for oncology professionals is a goal of the EU project Interact-Europe, in which this work was framed. To this end, cross-language technology adapted to the oncology domain was developed, evaluated and deployed for multilingual interspeciality medical education. More precisely, automatic speech recognition (ASR) and MT models were adapted to the oncology domain to translate English pre-recorded training videos, kindly provided by the European School of Oncology (ESO), into French, Spanish, German and Slovene. In this work, three categories of MT models adapted to the medical domain were assessed: bilingual encoder-decoder MT models trained from scratch, pre-trained large multilingual encoder-decoder MT models, and multilingual decoder-only LLMs. The experimental results underline the competitiveness in translation quality of LLMs compared to encoder-decoder MT models. Finally, the ESO speech dataset, comprising roughly 1,000 videos and 745 hours for the training and evaluation of ASR and MT models, was publicly released for the scientific community.},
  keywords  = {Automatic Speech Recognition, domain adaptation, large language models, Machine Translation, oncology, Speech Translation},
  pubstate  = {forthcoming},
  tppubtype = {article}
}
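As an illustration of the two main MT model families compared in this work (not the authors' actual systems, checkpoints or prompts), the following Python sketch uses the Hugging Face transformers pipeline to translate a medical-domain sentence with a pre-trained encoder-decoder MT model and, alternatively, with a decoder-only LLM prompted to translate; the LLM checkpoint name is a hypothetical placeholder.

from transformers import pipeline

source = "Adjuvant chemotherapy is recommended for stage III colon cancer."

# 1) Pre-trained encoder-decoder MT model (publicly available stand-in).
mt = pipeline("translation", model="Helsinki-NLP/opus-mt-en-fr")
print(mt(source)[0]["translation_text"])

# 2) Decoder-only LLM prompted to translate; "example-org/instruct-llm" is a
#    hypothetical placeholder, not a model evaluated in the paper.
llm = pipeline("text-generation", model="example-org/instruct-llm")
prompt = f"Translate the following sentence into French:\n{source}\nTranslation:"
print(llm(prompt, max_new_tokens=64)[0]["generated_text"])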
2020
Iranzo-Sánchez, Javier; Silvestre-Cerdà, Joan Albert; Jorge, Javier; Roselló, Nahuel; Giménez, Adrià; Sanchis, Albert; Civera, Jorge; Juan, Alfons: Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates. Inproceedings. Proc. of 45th Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2020), pp. 8229–8233, Barcelona (Spain), 2020.

Tags: Automatic Speech Recognition, Machine Translation, Multilingual Corpus, Speech Translation, Spoken Language Translation

@inproceedings{Iranzo2020,
  title     = {Europarl-ST: A Multilingual Corpus for Speech Translation of Parliamentary Debates},
  author    = {Javier Iranzo-Sánchez and Joan Albert Silvestre-Cerdà and Javier Jorge and Nahuel Roselló and Adrià Giménez and Albert Sanchis and Jorge Civera and Alfons Juan},
  url       = {https://arxiv.org/abs/1911.03167
               https://paperswithcode.com/paper/europarl-st-a-multilingual-corpus-for-speech
               https://www.mllp.upv.es/europarl-st/},
  doi       = {10.1109/ICASSP40776.2020.9054626},
  year      = {2020},
  date      = {2020-01-01},
  booktitle = {Proc. of 45th Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP 2020)},
  pages     = {8229--8233},
  address   = {Barcelona (Spain)},
  abstract  = {Current research into spoken language translation (SLT), or speech-to-text translation, is often hampered by the lack of specific data resources for this task, as currently available SLT datasets are restricted to a limited set of language pairs. In this paper we present Europarl-ST, a novel multilingual SLT corpus containing paired audio-text samples for SLT from and into 6 European languages, for a total of 30 different translation directions. This corpus has been compiled using the debates held in the European Parliament in the period between 2008 and 2012. This paper describes the corpus creation process and presents a series of automatic speech recognition, machine translation and spoken language translation experiments that highlight the potential of this new resource. The corpus is released under a Creative Commons license and is freely accessible and downloadable.},
  keywords  = {Automatic Speech Recognition, Machine Translation, Multilingual Corpus, Speech Translation, Spoken Language Translation},
  pubstate  = {published},
  tppubtype = {inproceedings}
}
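As a quick check of the figures in the abstract (illustration only; the language codes below are placeholders and do not reflect the corpus layout), pairing 6 languages in every ordered source-target combination yields the 30 translation directions mentioned above.

from itertools import permutations

# Six illustrative language codes; Europarl-ST covers 6 European languages.
langs = ["en", "fr", "de", "es", "it", "pt"]

# Each ordered (source, target) pair is one translation direction: 6 * 5 = 30.
directions = list(permutations(langs, 2))
print(len(directions))  # 30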