@phdthesis{Arnal2023,
title = {Transformer models for Machine Translation and Streaming Automatic Speech Recognition},
author = {Baquero Arnal, Pau},
url = {https://www.upv.es/pls/oalu/sic_ted.mostrar_tesis?p_num_reg=12917},
year = {2023},
date = {2023-01-01},
school = {Universitat Politècnica de València},
note = {Advisors: Alfons Juan Ciscar and Hermann Ney},
keywords = {Automatic Speech Recognition, Neural Machine Translation, Transformer, Transformer Language Model},
pubstate = {published},
tppubtype = {phdthesis}
}
@inproceedings{Baquero-Arnal2020,
title = {Improved Hybrid Streaming ASR with Transformer Language Models},
author = {Baquero-Arnal, Pau and Jorge, Javier and Giménez, Adrià and Silvestre-Cerdà, Joan Albert and Iranzo-Sánchez, Javier and Sanchis, Albert and Civera, Jorge and Juan, Alfons},
url = {http://dx.doi.org/10.21437/Interspeech.2020-2770},
year = {2020},
date = {2020-01-01},
booktitle = {Proc. of 21st Annual Conf. of the Intl. Speech Communication Association (InterSpeech 2020)},
pages = {2127--2131},
address = {Shanghai (China)},
abstract = {Streaming ASR is gaining momentum due to its wide applicability, though it is still unclear how best to come close to the accuracy of state-of-the-art off-line ASR systems when the output must come within a short delay after the incoming audio stream. Following our previous work on streaming one-pass decoding with hybrid ASR systems and LSTM language models, in this work we report further improvements by replacing LSTMs with Transformer models. First, two key ideas are discussed so as to run these models fast during inference. Then, empirical results on LibriSpeech and TED-LIUM are provided showing that Transformer language models lead to improved recognition rates on both tasks. ASR systems obtained in this work can be seamlessly transferred to a streaming setup with minimal quality losses. Indeed, to the best of our knowledge, no better results have been reported on these tasks when assessed under a streaming setup.},
keywords = {hybrid ASR, language models, streaming, Transformer},
pubstate = {published},
tppubtype = {inproceedings}
}