Jorge, Javier ; Giménez, Adrià ; Silvestre-Cerdà, Joan Albert ; Civera, Jorge ; Sanchis, Albert ; Juan, Alfons: Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models. Journal Article, IEEE/ACM Transactions on Audio, Speech, and Language Processing, 30, pp. 148–161, 2021.
@article{Jorge2021b,
title = {Live Streaming Speech Recognition Using Deep Bidirectional LSTM Acoustic Models and Interpolated Language Models},
author = {Jorge, Javier and Giménez, Adrià and Silvestre-Cerdà, Joan Albert and Civera, Jorge and Sanchis, Albert and Juan, Alfons},
doi = {10.1109/TASLP.2021.3133216},
year = {2021},
date = {2021-11-23},
journal = {IEEE/ACM Transactions on Audio, Speech, and Language Processing},
volume = {30},
pages = {148--161},
abstract = {Although Long Short-Term Memory (LSTM) networks and deep Transformers are now extensively used in offline ASR, it is unclear how best offline systems can be adapted to work with them under the streaming setup. After gaining considerable experience in this regard in recent years, in this paper we show how an optimized, low-latency streaming decoder can be built in which bidirectional LSTM acoustic models, together with general interpolated language models, can be nicely integrated with minimal performance degradation. In brief, our streaming decoder consists of a one-pass, real-time search engine relying on a limited-duration window sliding over time and a number of ad hoc acoustic and language model pruning techniques. Extensive empirical assessment is provided on truly streaming tasks derived from the well-known LibriSpeech and TED talks datasets, as well as from TV shows from a large Spanish broadcasting station.},
keywords = {acoustic modelling, Automatic Speech Recognition, decoding, language modelling, neural networks, streaming},
pubstate = {published},
tppubtype = {article}
}
Villar Lafuente, Carlos ; Garcés Díaz-Munío, Gonçal: Several approaches for tweet topic classification in COSET – IberEval 2017. Inproceedings, Proc. of 2nd Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017), pp. 36–42, Murcia (Spain), 2017.
@inproceedings{Lafuente2017,
title = {Several approaches for tweet topic classification in COSET – IberEval 2017},
author = {Villar Lafuente, Carlos and Garcés Díaz-Munío, Gonçal},
url = {http://hdl.handle.net/10251/166361
http://ceur-ws.org/Vol-1881/COSET_paper_4.pdf},
year = {2017},
date = {2017-01-01},
booktitle = {Proc. of 2nd Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017)},
pages = {36--42},
address = {Murcia (Spain)},
abstract = {[EN] These working notes summarize the different approaches we have explored in order to classify a corpus of tweets related to the 2015 Spanish General Election (COSET 2017 task from IberEval 2017). Two approaches were tested during the COSET 2017 evaluations: Neural Networks with Sentence Embeddings (based on TensorFlow) and N-gram Language Models (based on SRILM). Our results with these approaches were modest: both ranked above the “Most frequent” baseline, but below the “Bag-of-words + SVM” baseline. A third approach was tried after the COSET 2017 evaluation phase was over: Advanced Linear Models (based on fastText). Results measured on the COSET 2017 Dev and Test sets show that this approach is well above the “TF-IDF+RF” baseline.
[CA] "Several approaches for tweet topic classification in COSET - IberEval 2017": This article summarizes the different approaches we have explored to classify a corpus of tweets about the 2015 Spanish general election (COSET 2017 task of the IberEval 2017 workshop). We analyzed two approaches during the COSET 2017 evaluations: neural networks with sentence-level embeddings (based on TensorFlow) and n-gram language models (based on SRILM). Our results with these approaches were modest: both ranked above the "Most frequent" baseline, but below the "Bag-of-words+SVM" baseline. We analyzed a third approach after the COSET 2017 evaluation phase had ended: advanced linear models (based on fastText). The results measured on the COSET 2017 validation and test sets show that this approach clearly outperforms the "TF-IDF+RF" baseline.},
keywords = {COSET2017, language models, linear models, neural networks, sentence embeddings, text classification},
pubstate = {published},
tppubtype = {inproceedings}
}