Publications

Iranzo-Sánchez, Jorge; Iranzo-Sánchez, Javier; Giménez, Adrià; Civera, Jorge

Going Beyond Your Expectations in Latency Metrics for Simultaneous Speech Translation Inproceedings

ACL (Findings) 2025, pp. 18205–18228, Vienna (Austria), 2025.

Tags: latency metrics, Simultaneous Speech Translation

Iranzo-Sánchez, Jorge; Iranzo-Sánchez, Javier; Giménez, Adrià; Civera, Jorge; Juan, Alfons

MLLP-VRAIN UPV system for the IWSLT 2025 Simultaneous Speech Translation Translation Task Inproceedings

IWSLT 2025, pp. 340–346, Vienna (Austria), 2025.

Tags: Simultaneous Speech Translation

@inproceedings{Iranzo-Sánchez2025b,
title = {MLLP-VRAIN UPV system for the IWSLT 2025 Simultaneous Speech Translation Translation Task},
author = {Iranzo-Sánchez, Jorge AND Iranzo-Sánchez, Javier AND Giménez, Adrià AND Civera, Jorge AND Juan, Alfons},
url = {https://arxiv.org/pdf/2506.18828},
doi = {10.18653/v1/2025.iwslt-1.35},
year = {2025},
date = {2025-01-01},
booktitle = {IWSLT 2025},
pages = {340--346},
address = {Vienna (Austria)},
abstract = {This work describes the participation of the MLLP-VRAIN research group in the shared task of the IWSLT 2025 Simultaneous Speech Translation track. Our submission addresses the unique challenges of real-time translation of long-form speech by developing a modular cascade system that adapts strong pre-trained models to streaming scenarios. We combine Whisper Large-V3-Turbo for ASR with the multilingual NLLB-3.3B model for MT, implementing lightweight adaptation techniques rather than training new end-to-end models from scratch. Our approach employs document-level adaptation with prefix training to enhance the MT model's ability to handle incomplete inputs, while incorporating adaptive emission policies including a wait-k strategy and RALCP for managing the translation stream. Specialized buffer management techniques and segmentation strategies ensure coherent translations across long audio sequences. Experimental results on the ACL60/60 dataset demonstrate that our system achieves a favorable balance between translation quality and latency, with a BLEU score of 31.96 and non-computational-aware StreamLAAL latency of 2.94 seconds. Our final model achieves a preliminary score on the official test set (IWSLT25Instruct) of 29.8 BLEU. Our work demonstrates that carefully adapted pre-trained components can create effective simultaneous translation systems for long-form content without requiring extensive in-domain parallel data or specialized end-to-end training.},
keywords = {Simultaneous Speech Translation},
pubstate = {published},
tppubtype = {inproceedings}
}
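The abstract above names two emission policies, wait-k and RALCP, without spelling out how they decide what to output. As a rough illustration only, the sketch below shows one common formulation of each. The function names (`wait_k_should_write`, `ralcp`) and the `gamma=0.6` default are hypothetical. The RALCP part follows one reading of the relaxed-agreement idea: majority voting over beam candidates, position by position, where strict longest-common-prefix agreement corresponds to `gamma=1.0`. It is not the MLLP-VRAIN implementation.

```python
from collections import Counter
from typing import List, Sequence


def wait_k_should_write(num_read: int, num_written: int, k: int,
                        source_finished: bool) -> bool:
    """Wait-k decision (sketch): emit the next target token once the
    source has run k units ahead of the target; otherwise keep reading."""
    if source_finished:
        # No more input will arrive, so the remaining target is written out.
        return True
    return num_read - num_written >= k


def ralcp(candidates: Sequence[Sequence[str]], gamma: float = 0.6) -> List[str]:
    """Relaxed-agreement prefix (hypothetical sketch): extend the prefix
    while the most voted token at the next position is supported by at
    least a fraction gamma of all candidates."""
    if not candidates:
        return []
    n = len(candidates)
    alive = list(candidates)  # candidates consistent with the prefix so far
    prefix: List[str] = []
    pos = 0
    while True:
        tokens = [c[pos] for c in alive if len(c) > pos]
        if not tokens:
            break
        token, count = Counter(tokens).most_common(1)[0]
        if count / n < gamma:
            break  # agreement too weak: stop and wait for more input
        prefix.append(token)
        alive = [c for c in alive if len(c) > pos and c[pos] == token]
        pos += 1
    return prefix


if __name__ == "__main__":
    beams = [["the", "cat", "sat"], ["the", "cat", "is"], ["the", "dog", "sat"]]
    print(ralcp(beams, gamma=0.6))  # ['the', 'cat']
    print(ralcp(beams, gamma=1.0))  # ['the'] (strict common prefix)
    print(wait_k_should_write(num_read=5, num_written=2, k=3,
                              source_finished=False))  # True
```

Lowering `gamma` lets the system commit to tokens earlier, reducing latency, at the risk of emitting words that later beams would have revised. Raising it trades latency for stability.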


Iranzo-Sánchez, Javier; Jorge, Javier; Pérez-González-de-Martos, Alejandro; Giménez, Adrià; Garcés Díaz-Munío, Gonçal V; Baquero-Arnal, Pau; Silvestre-Cerdà, Joan Albert; Civera, Jorge; Sanchis, Albert; Juan, Alfons

MLLP-VRAIN UPV systems for the IWSLT 2022 Simultaneous Speech Translation and Speech-to-Speech Translation tasks Inproceedings

Proc. of 19th Intl. Conf. on Spoken Language Translation (IWSLT 2022), pp. 255–264, Dublin (Ireland), 2022.

Tags: Simultaneous Speech Translation, speech-to-speech translation