Iranzo-Sánchez, Jorge; Iranzo-Sánchez, Javier; Giménez, Adrià; Civera, Jorge. Going Beyond Your Expectations in Latency Metrics for Simultaneous Speech Translation. In: ACL 2025, Vienna (Austria), forthcoming. Tags: latency metrics, Simultaneous Speech Translation
@inproceedings{Iranzo-SánchezACL2025,
title = {Going Beyond Your Expectations in Latency Metrics for Simultaneous Speech Translation},
author = {Jorge Iranzo-Sánchez and Javier Iranzo-Sánchez and Adrià Giménez and Jorge Civera},
url = {https://2025.aclweb.org/program/find_papers/},
year = {2025},
date = {2025-01-01},
booktitle = {ACL 2025},
address = {Vienna (Austria)},
abstract = {Current evaluation practices in Simultaneous Speech Translation (SimulST) systems typically involve segmenting the input audio and corresponding translations, calculating quality and latency metrics for each segment, and averaging the results. Although this approach may provide a reliable estimation of translation quality, it can lead to misleading values of latency metrics due to an inherent assumption that average latency values are good enough estimators of SimulST systems' response time. However, our detailed analysis of latency evaluations for state-of-the-art SimulST systems demonstrates that latency distributions are often skewed and subject to extreme variations. As a result, the mean in latency metrics fails to capture these anomalies, potentially masking the lack of robustness in some systems and metrics. In this paper, a thorough analysis of the results of systems submitted to recent editions of the IWSLT simultaneous track is provided to support our hypothesis and alternative ways to report latency metrics are proposed in order to provide a better understanding of SimulST systems' latency.},
keywords = {latency metrics, Simultaneous Speech Translation},
pubstate = {forthcoming},
tppubtype = {inproceedings}
}
Iranzo-Sánchez, Javier; Jorge, Javier; Pérez-González-de-Martos, Alejandro; Giménez, Adrià; Garcés Díaz-Munío, Gonçal V.; Baquero-Arnal, Pau; Silvestre-Cerdà, Joan Albert; Civera, Jorge; Sanchis, Albert; Juan, Alfons. MLLP-VRAIN UPV systems for the IWSLT 2022 Simultaneous Speech Translation and Speech-to-Speech Translation tasks. In: Proc. of 19th Intl. Conf. on Spoken Language Translation (IWSLT 2022), pp. 255–264, Dublin (Ireland), 2022. Tags: Simultaneous Speech Translation, speech-to-speech translation
@inproceedings{Iranzo-Sánchez2022b,
title = {MLLP-VRAIN UPV systems for the IWSLT 2022 Simultaneous Speech Translation and Speech-to-Speech Translation tasks},
author = {Javier Iranzo-Sánchez and Javier Jorge and Alejandro Pérez-González-de-Martos and Adrià Giménez and Garcés Díaz-Munío, Gonçal V. and Pau Baquero-Arnal and Joan Albert Silvestre-Cerdà and Jorge Civera and Albert Sanchis and Alfons Juan},
doi = {10.18653/v1/2022.iwslt-1.22},
year = {2022},
date = {2022-01-01},
booktitle = {Proc. of 19th Intl. Conf. on Spoken Language Translation (IWSLT 2022)},
pages = {255--264},
address = {Dublin (Ireland)},
abstract = {This work describes the participation of the MLLP-VRAIN research group in the two shared tasks of the IWSLT 2022 conference: Simultaneous Speech Translation and Speech-to-Speech Translation. We present our streaming-ready ASR, MT and TTS systems for Speech Translation and Synthesis from English into German. Our submission combines these systems by means of a cascade approach paying special attention to data preparation and decoding for streaming inference.},
keywords = {Simultaneous Speech Translation, speech-to-speech translation},
pubstate = {published},
tppubtype = {inproceedings}
}