@inproceedings{Baquero-Arnal2020,
title = {Improved Hybrid Streaming ASR with Transformer Language Models},
author = {Baquero-Arnal, Pau and Jorge, Javier and Giménez, Adrià and Silvestre-Cerdà, Joan Albert and Iranzo-Sánchez, Javier and Sanchis, Albert and Civera, Jorge and Juan, Alfons},
url = {http://dx.doi.org/10.21437/Interspeech.2020-2770},
year = {2020},
date = {2020-01-01},
booktitle = {Proc. of 21st Annual Conf. of the Intl. Speech Communication Association (InterSpeech 2020)},
pages = {2127--2131},
address = {Shanghai (China)},
abstract = {Streaming ASR is gaining momentum due to its wide applicability, though it is still unclear how best to come close to the accuracy of state-of-the-art off-line ASR systems when the output must come within a short delay after the incoming audio stream. Following our previous work on streaming one-pass decoding with hybrid ASR systems and LSTM language models, in this work we report further improvements by replacing LSTMs with Transformer models. First, two key ideas are discussed so as to run these models fast during inference. Then, empirical results on LibriSpeech and TED-LIUM are provided showing that Transformer language models lead to improved recognition rates on both tasks. ASR systems obtained in this work can be seamlessly transferred to a streaming setup with minimal quality losses. Indeed, to the best of our knowledge, no better results have been reported on these tasks when assessed under a streaming setup.},
keywords = {hybrid ASR, language models, streaming, Transformer},
pubstate = {published},
tppubtype = {inproceedings}
}
@inproceedings{Lafuente2017,
title = {Several approaches for tweet topic classification in COSET – IberEval 2017},
author = {Villar Lafuente, Carlos and Garcés Díaz-Munío, Gonçal},
url = {http://hdl.handle.net/10251/166361
http://ceur-ws.org/Vol-1881/COSET_paper_4.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proc. of 2nd Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017)},
pages = {36--42},
address = {Murcia (Spain)},
abstract = {[EN] These working notes summarize the different approaches we have explored in order to classify a corpus of tweets related to the 2015 Spanish General Election (COSET 2017 task from IberEval 2017). Two approaches were tested during the COSET 2017 evaluations: Neural Networks with Sentence Embeddings (based on TensorFlow) and N-gram Language Models (based on SRILM). Our results with these approaches were modest: both ranked above the "Most frequent" baseline, but below the "Bag-of-words + SVM" baseline. A third approach was tried after the COSET 2017 evaluation phase was over: Advanced Linear Models (based on fastText). Results measured over the COSET 2017 Dev and Test show that this approach is well above the "TF-IDF+RF" baseline.
[CA] "Alguns mètodes per a la classificació temàtica de tuits en COSET - IberEval 2017": Aquest article resumeix els diferents mètodes que hem explorat per a classificar un corpus de tuits sobre les eleccions generals d'Espanya de 2015 (tasca COSET 2017 del taller IberEval 2017). Analitzàrem dos mètodes durant les avaluacions de COSET 2017: xarxes neuronals amb vectorització ("embedding") a nivell de frase (basat en TensorFlow) i models de llenguatge d'n-grames (basat en SRILM). Els nostres resultats amb aquests mètodes van ser modests: ambdós quedaren per damunt del valor de referència d'"el més freqüent" ("Most frequent"), però per davall del valor de referència de "bossa de paraules+SVM" ("Bag-of-words+SVM"). Analitzàrem un tercer mètode quan ja havia acabat la fase d'avaluacions de COSET 2017: models lineals avançats (basat en fastText). Els resultats mesurats sobre els conjunts de validació i prova de COSET 2017 mostren que aquest mètode supera clarament el valor de referència "TF-IDF+RF".},
keywords = {COSET2017, language models, linear models, neural networks, sentence embeddings, text classification},
pubstate = {published},
tppubtype = {inproceedings}
}