Gonçal V. Garcés Díaz-Munío 2 years ago
parent
commit
faa5582961
1 changed files with 1 additions and 1 deletions
  1. 1 1
      README.md

+ 1 - 1
README.md

@@ -27,7 +27,7 @@ Please refer to the publication:
 
 In the cascade approach to Speech Translation, an ASR system first transcribes the audio, and then the transcriptions are translated by a downstream MT system. 
 
-Standard MT systems are usually trained with sequences of around 100-150 tokens. Thus, if the audio is short, we can directly translate the transcription. However, when we have a long audio stream, the resulting transcription is many times longer than the maximum length seen by the MT model during training. This is why it is necessary to have a segmenter model that takes as input the stream of transcriped words, and outputs a stream of (hopefully) semantically self-contained segments, which are then translated independently by the MT model. The model presented here has been prepared to carry out segmentation in a streaming fashion.
+Standard MT systems are usually trained with sequences of around 100-150 tokens. Thus, if the audio is short, we can directly translate the transcription. However, when we have a long audio stream, the resulting transcription is many times longer than the maximum length seen by the MT model during training. This is why it is necessary to have a segmenter model that takes as input the stream of transcribed words, and outputs a stream of (hopefully) semantically self-contained segments, which are then translated independently by the MT model. The model presented here has been prepared to carry out segmentation in a streaming fashion.
 
 
 ## Get the code