del-Agua, Miguel Ángel; Piqueras, Santiago; Giménez, Adrià; Sanchis, Alberto; Civera, Jorge; Juan, Alfons ASR Confidence Estimation with Speaker-Adapted Recurrent Neural Networks Inproceedings Proc. of the 17th Annual Conf. of the ISCA (Interspeech 2016), pp. 3464–3468, San Francisco (USA), 2016. Abstract | Links | BibTeX | Tags: BLSTM, Confidence measures, Recurrent Neural Networks, Speaker adaptation, Speech Recognition @inproceedings{del-Agua2016,
title = {ASR Confidence Estimation with Speaker-Adapted Recurrent Neural Networks},
author = {Miguel Ángel del-Agua and Santiago Piqueras and Adrià Giménez and Alberto Sanchis and Jorge Civera and Alfons Juan},
doi = {10.21437/Interspeech.2016-1142},
year = {2016},
date = {2016-09-08},
booktitle = {Proc. of the 17th Annual Conf. of the ISCA (Interspeech 2016)},
pages = {3464--3468},
address = {San Francisco (USA)},
abstract = {Confidence estimation for automatic speech recognition has been very recently improved by using Recurrent Neural Networks (RNNs), and also by speaker adaptation (on the basis of Conditional Random Fields). In this work, we explore how to obtain further improvements by combining RNNs and speaker adaptation. In particular, we explore different speaker-dependent and -independent data representations for Bidirectional Long Short Term Memory RNNs of various topologies. Empirical tests are reported on the LibriSpeech dataset, showing that the best results are achieved by the proposed combination of RNNs and speaker adaptation.},
keywords = {BLSTM, Confidence measures, Recurrent Neural Networks, Speaker adaptation, Speech Recognition},
pubstate = {published},
tppubtype = {inproceedings}
}
Confidence estimation for automatic speech recognition has been very recently improved by using Recurrent Neural Networks (RNNs), and also by speaker adaptation (on the basis of Conditional Random Fields). In this work, we explore how to obtain further improvements by combining RNNs and speaker adaptation. In particular, we explore different speaker-dependent and -independent data representations for Bidirectional Long Short Term Memory RNNs of various topologies. Empirical tests are reported on the LibriSpeech dataset, showing that the best results are achieved by the proposed combination of RNNs and speaker adaptation. |