del-Agua, Miguel Ángel; Piqueras, Santiago; Giménez, Adrià; Sanchis, Alberto; Civera, Jorge; Juan, Alfons. ASR Confidence Estimation with Speaker-Adapted Recurrent Neural Networks. Inproceedings. Proc. of the 17th Annual Conf. of the ISCA (Interspeech 2016), pp. 3464–3468, San Francisco (USA), 2016.
@inproceedings{del-Agua2016,
title = {ASR Confidence Estimation with Speaker-Adapted Recurrent Neural Networks},
author = {Miguel Ángel del-Agua and Santiago Piqueras and Adrià Giménez and Alberto Sanchis and Jorge Civera and Alfons Juan},
doi = {10.21437/Interspeech.2016-1142},
year = {2016},
date = {2016-09-08},
booktitle = {Proc. of the 17th Annual Conf. of the ISCA (Interspeech 2016)},
pages = {3464--3468},
address = {San Francisco (USA)},
abstract = {Confidence estimation for automatic speech recognition has been very recently improved by using Recurrent Neural Networks (RNNs), and also by speaker adaptation (on the basis of Conditional Random Fields). In this work, we explore how to obtain further improvements by combining RNNs and speaker adaptation. In particular, we explore different speaker-dependent and -independent data representations for Bidirectional Long Short Term Memory RNNs of various topologies. Empirical tests are reported on the LibriSpeech dataset, showing that the best results are achieved by the proposed combination of RNNs and speaker adaptation.},
keywords = {BLSTM, Confidence measures, Recurrent Neural Networks, Speaker adaptation, Speech Recognition},
pubstate = {published},
tppubtype = {inproceedings}
}
Sanchez-Cortina, Isaias; Andrés-Ferrer, Jesús; Sanchis, Alberto; Juan, Alfons. Speaker-adapted confidence measures for speech recognition of video lectures. Journal Article. Computer Speech & Language, 37, pp. 11–23, 2016, ISSN: 0885-2308.
@article{SanchezCortina2016,
title = {Speaker-adapted confidence measures for speech recognition of video lectures},
author = {Isaias Sanchez-Cortina and Jesús Andrés-Ferrer and Alberto Sanchis and Alfons Juan},
url = {http://www.sciencedirect.com/science/article/pii/S0885230815000960
http://authors.elsevier.com/a/1SAsB39HpSHRc0},
issn = {0885-2308},
year = {2016},
date = {2016-01-01},
journal = {Computer Speech \& Language},
volume = {37},
pages = {11--23},
abstract = {Automatic Speech Recognition applications can benefit from a confidence measure (CM) to predict the reliability of the output. Previous works showed that a word-dependent naïve Bayes (NB) classifier outperforms the conventional word posterior probability as a CM. However, a discriminative formulation usually renders improved performance due to the available training techniques. Taking this into account, we propose a logistic regression (LR) classifier defined with simple input functions to approximate to the NB behaviour. Additionally, as a main contribution, we propose to adapt the CM to the speaker in cases in which it is possible to identify the speakers, such as online lecture repositories. The experiments have shown that speaker-adapted models outperform their non-adapted counterparts on two difficult tasks from English (videoLectures.net) and Spanish (poliMedia) educational lectures. They have also shown that the NB model is clearly superseded by the proposed LR classifier.},
keywords = {Confidence measures, Log-linear models, Online video lectures, Speaker adaptation, Speech Recognition},
pubstate = {published},
tppubtype = {article}
}