Publications

2020
Baquero-Arnal, Pau; Jorge, Javier; Giménez, Adrià; Silvestre-Cerdà, Joan Albert; Iranzo-Sánchez, Javier; Sanchis, Albert; Civera, Jorge; Juan, Alfons: Improved Hybrid Streaming ASR with Transformer Language Models. Inproceedings: Proc. of the 21st Annual Conf. of the Intl. Speech Communication Association (InterSpeech 2020), pp. 2127–2131, Shanghai (China), 2020.
Link: http://dx.doi.org/10.21437/Interspeech.2020-2770
Tags: hybrid ASR, language models, streaming, Transformer
Abstract: Streaming ASR is gaining momentum due to its wide applicability, though it is still unclear how best to come close to the accuracy of state-of-the-art off-line ASR systems when the output must come within a short delay after the incoming audio stream. Following our previous work on streaming one-pass decoding with hybrid ASR systems and LSTM language models, in this work we report further improvements by replacing LSTMs with Transformer models. First, two key ideas are discussed so as to run these models fast during inference. Then, empirical results on LibriSpeech and TED-LIUM are provided showing that Transformer language models lead to improved recognition rates on both tasks. ASR systems obtained in this work can be seamlessly transferred to a streaming setup with minimal quality losses. Indeed, to the best of our knowledge, no better results have been reported on these tasks when assessed under a streaming setup.
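The abstract above mentions running Transformer language models fast during inference. A standard trick for autoregressive Transformer scoring, and a reasonable mental model for this kind of speed-up, is to cache the self-attention keys and values of already-processed tokens so that extending a hypothesis by one token only computes attention for the newest token. The sketch below illustrates that caching idea with a single toy attention layer in NumPy; the dimensions, weights and token stream are placeholders, and this is not the implementation described in the paper.

```python
# Minimal sketch (not the paper's implementation) of key/value caching for an
# autoregressive Transformer LM: each new token attends to cached keys/values
# of the prefix instead of re-encoding the whole prefix at every step.
import numpy as np

rng = np.random.default_rng(0)
d = 16                                   # model dimension (illustrative)
W_q, W_k, W_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

class CachedSelfAttention:
    """Causal self-attention that appends to a key/value cache per step."""
    def __init__(self):
        self.K = np.empty((0, d))
        self.V = np.empty((0, d))

    def step(self, x):                   # x: embedding of the newest token, shape (d,)
        q = x @ W_q
        self.K = np.vstack([self.K, x @ W_k])
        self.V = np.vstack([self.V, x @ W_v])
        att = softmax(q @ self.K.T / np.sqrt(d))   # attends to all cached tokens
        return att @ self.V              # contextual representation of the new token

attn = CachedSelfAttention()
stream = rng.standard_normal((5, d))     # embeddings of 5 incoming tokens (toy data)
for t, x in enumerate(stream):
    h = attn.step(x)                     # only the new token is processed per step
    print(t, h.shape)
```

With the cache, each hypothesis extension costs attention over the stored prefix only, rather than a full re-run of the model over the whole history.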
Iranzo-Sánchez, Javier; Giménez Pastor, Adrià; Silvestre-Cerdà, Joan Albert; Baquero-Arnal, Pau; Civera Saiz, Jorge; Juan, Alfons: Direct Segmentation Models for Streaming Speech Translation. Inproceedings: 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP 2020), pp. 2599–2611, 2020.
Link: http://dx.doi.org/10.18653/v1/2020.emnlp-main.206
Tags: Segmentation, Speech Translation, streaming
Abstract: The cascade approach to Speech Translation (ST) is based on a pipeline that concatenates an Automatic Speech Recognition (ASR) system followed by a Machine Translation (MT) system. These systems are usually connected by a segmenter that splits the ASR output into, hopefully, semantically self-contained chunks to be fed into the MT system. This is especially challenging in the case of streaming ST, where latency requirements must also be taken into account. This work proposes novel segmentation models for streaming ST that incorporate not only textual but also acoustic information to decide when the ASR output is split into a chunk. An extensive and thorough experimental setup is carried out on the Europarl-ST dataset to prove the contribution of acoustic information to the performance of the segmentation model in terms of BLEU score in a streaming ST scenario. Finally, comparative results with previous work also show the superiority of the segmentation models proposed in this work.
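The abstract above describes segmentation models that combine textual and acoustic information to decide where the ASR output should be split. As an illustration of that idea (not the authors' model), the sketch below frames the decision as a binary split/no-split classifier over a sliding window of recent words plus a simple acoustic cue, here the pause after the current word; the vocabulary, embeddings and weights are made up for the example.

```python
# Minimal sketch of a streaming segmentation decision: textual features
# (embeddings of the last few ASR words) concatenated with an acoustic feature
# (pause duration), fed to a split/no-split scorer. Illustrative only.
import numpy as np

rng = np.random.default_rng(0)
emb_dim, window = 8, 4
vocab = {"the": 0, "house": 1, "is": 2, "red": 3, "<pad>": 4}   # toy vocabulary
E = rng.standard_normal((len(vocab), emb_dim))                   # toy embeddings

def features(words, pause_ms):
    """Concatenate window word embeddings with a simple acoustic feature."""
    ids = [vocab.get(w, vocab["<pad>"]) for w in words[-window:]]
    ids = [vocab["<pad>"]] * (window - len(ids)) + ids
    return np.concatenate([E[ids].ravel(), [pause_ms / 1000.0]])

# Untrained linear scorer; in practice its weights would be learned from
# segmented training data.
w = rng.standard_normal(window * emb_dim + 1)
b = 0.0

def p_split(words, pause_ms):
    z = features(words, pause_ms) @ w + b
    return 1.0 / (1.0 + np.exp(-z))      # probability of ending a chunk here

print(p_split(["the", "house", "is", "red"], pause_ms=450))
```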
2019
Baquero-Arnal, Pau; Iranzo-Sánchez, Javier; Civera, Jorge; Juan, Alfons: The MLLP-UPV Spanish-Portuguese and Portuguese-Spanish Machine Translation Systems for WMT19 Similar Language Translation Task. Inproceedings: Proc. of the Fourth Conference on Machine Translation (WMT19), pp. 179–184, Florence (Italy), 2019.
Links: https://www.aclweb.org/anthology/W19-5423/ ; https://www.mllp.upv.es/wp-content/uploads/2019/09/poster-2.pdf
Tags: Machine Translation, Neural Machine Translation, WMT19
Abstract: This paper describes the participation of the MLLP research group of the Universitat Politècnica de València in the WMT 2019 Similar Language Translation Shared Task. We have submitted systems for the Portuguese ↔ Spanish language pair, in both directions. They are based on the Transformer architecture, as well as on a novel architecture called 2D alternating RNN. Both systems have been domain-adapted through fine-tuning, which has been shown to be very effective.
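The abstract above reports that domain adaptation through fine-tuning was very effective. A minimal sketch of that general recipe, continuing training of an already-trained Transformer NMT model on in-domain parallel data with a small learning rate, is shown below; the tiny model, random batches and hyperparameters are placeholders and do not reproduce the MLLP-UPV systems.

```python
# Minimal sketch of domain adaptation by fine-tuning: resume training a
# converged general-domain NMT model on in-domain parallel data with a
# reduced learning rate. Model size and data are placeholders.
import torch
import torch.nn as nn

vocab, d_model, pad = 1000, 64, 0
model = nn.Transformer(d_model=d_model, nhead=4, num_encoder_layers=2,
                       num_decoder_layers=2, batch_first=True)
src_emb, tgt_emb = nn.Embedding(vocab, d_model), nn.Embedding(vocab, d_model)
out_proj = nn.Linear(d_model, vocab)
params = (list(model.parameters()) + list(src_emb.parameters())
          + list(tgt_emb.parameters()) + list(out_proj.parameters()))

# General-domain pretraining would have happened before this point; the
# fine-tuning stage just continues optimisation with a small learning rate.
opt = torch.optim.Adam(params, lr=1e-5)
loss_fn = nn.CrossEntropyLoss(ignore_index=pad)

for step in range(3):                        # a few in-domain updates (illustrative)
    src = torch.randint(1, vocab, (8, 20))   # placeholder in-domain source batch
    tgt = torch.randint(1, vocab, (8, 21))   # placeholder in-domain target batch
    tgt_in, tgt_out = tgt[:, :-1], tgt[:, 1:]
    mask = model.generate_square_subsequent_mask(tgt_in.size(1))
    h = model(src_emb(src), tgt_emb(tgt_in), tgt_mask=mask)
    loss = loss_fn(out_proj(h).reshape(-1, vocab), tgt_out.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(step, float(loss))
```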
2018
Valor Miró, Juan Daniel; Baquero-Arnal, Pau; Civera, Jorge; Turró, Carlos; Juan, Alfons: Multilingual videos for MOOCs and OER. Journal Article: Journal of Educational Technology & Society, 21 (2), pp. 1–12, 2018.
Links: https://www.mllp.upv.es/wp-content/uploads/2019/11/JETS2018MLLP.pdf ; https://www.j-ets.net/collection/published-issues/21_2
Tags: Machine Translation, MOOCs, multilingual, Speech Recognition, video lecture repositories
Abstract: Massive Open Online Courses (MOOCs) and Open Educational Resources (OER) are rapidly growing, but are not usually offered in multiple languages due to the lack of cost-effective solutions to translate the different objects comprising them, and particularly videos. However, current state-of-the-art automatic speech recognition (ASR) and machine translation (MT) techniques have reached a level of maturity which opens the possibility of producing multilingual video subtitles of publishable quality at low cost. This work summarizes the authors' experience in exploring this possibility in two real-life case studies: a MOOC platform and a large video lecture repository. Apart from describing the systems, tools and integration components employed for such purpose, a comprehensive evaluation of the results achieved is provided in terms of quality and efficiency. More precisely, it is shown that draft multilingual subtitles produced by domain-adapted ASR/MT systems reach a level of accuracy that makes them worth post-editing, instead of generating them ex novo, saving approximately 25%–75% of the time. Finally, the results reported on user multilingual data consumption reflect that multilingual subtitles have had a very positive impact in our case studies, boosting student enrolment, in the case of the MOOC platform, by 70% relative.
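The abstract above quotes relative figures (25%–75% of the subtitling time saved by post-editing, and a 70% relative boost in enrolment). For readers unfamiliar with how "relative" percentages are computed, the snippet below spells out the arithmetic; the numbers in it are hypothetical placeholders, not data from the paper.

```python
# Illustrative only: how "relative" savings and increases are computed.
# The example numbers are hypothetical, not figures from the paper.

def relative_saving(time_post_edit: float, time_from_scratch: float) -> float:
    """Fraction of time saved by post-editing draft subtitles
    instead of creating them ex novo."""
    return 1.0 - time_post_edit / time_from_scratch

def relative_increase(after: float, before: float) -> float:
    """Relative growth: a 70% relative boost means after = 1.7 * before."""
    return after / before - 1.0

if __name__ == "__main__":
    # Hypothetical: post-editing takes 30 min vs 120 min from scratch -> 75% saved.
    print(f"time saved: {relative_saving(30, 120):.0%}")
    # Hypothetical: enrolment grows from 1000 to 1700 students -> 70% relative.
    print(f"enrolment boost: {relative_increase(1700, 1000):.0%}")
```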
Iranzo-Sánchez, Javier; Baquero-Arnal, Pau; Garcés Díaz-Munío, Gonçal V.; Martínez-Villaronga, Adrià; Civera, Jorge; Juan, Alfons: The MLLP-UPV German-English Machine Translation System for WMT18. Inproceedings: Proc. of the Third Conference on Machine Translation (WMT18), Volume 2: Shared Task Papers, pp. 422–428, Brussels (Belgium), 2018.
Links: http://dx.doi.org/10.18653/v1/W18-6414 ; https://www.mllp.upv.es/wp-content/uploads/2018/11/wmt18_mllp-upv_poster.pdf
Tags: Data Selection, Machine Translation, Neural Machine Translation, WMT18 news translation
Abstract: This paper describes the statistical machine translation system built by the MLLP research group of Universitat Politècnica de València for the German>English news translation shared task of the EMNLP 2018 Third Conference on Machine Translation (WMT18). We used an ensemble of Transformer architecture-based neural machine translation systems. To train our system under "constrained" conditions, we filtered the provided parallel data with a scoring technique using character-based language models, and we added parallel data based on synthetic source sentences generated from the provided monolingual corpora.