@@ -82,8 +82,9 @@ GET THE DATA

Download the full Europarl-ASR speech and text corpus from:

-https://www.mllp.upv.es/europarl-asr/Europarl-ASR_v1.0.tar.gz
-
+https://www.mllp.upv.es/europarl-asr/Europarl-ASR_v1.0.tar.gz
+Size: 18 GiB
+SHA-256 checksum: 4d360170ef8f1d1ece55566eda4211274b27328427a3443061f43d80d3346e74

ADDITIONAL Europarl-ASR MATERIALS
---------------------------------
@@ -93,21 +94,23 @@ described in this document, we are making available for download the following
materials to facilitate the reproducibility of our experiments:

* The pretrained Europarl-ASR English-language n-gram language model, together
- with its vocabulary file:
-
- https://www.mllp.upv.es/europarl-asr/Europarl-ASR_v1.0_ngram_lm_and_vocab.tar.gz
+ with its vocabulary file:
+ https://www.mllp.upv.es/europarl-asr/Europarl-ASR_v1.0_ngram_lm_and_vocab.tar.gz
+ Size: 1.1 GiB
+ SHA-256 checksum: 2be8eb7918086a233545e6e5a0592b7ae83a09ffb5ce479b68e28329d710cd6a

* The Europarl-ASR English-language verbatim transcription guidelines, which
were applied to produce the manually revised verbatim transcriptions for the
- dev and test sets:
-
- https://www.mllp.upv.es/europarl-asr/Europarl-ASR_transcription_guidelines.pdf
+ dev and test sets:
+ https://www.mllp.upv.es/europarl-asr/Europarl-ASR_transcription_guidelines.pdf
+ Size: 309 KiB
+ SHA-256 checksum: 66dac867b76c984d9e583caab0a8fd7540a664017e88e9ec4190c90ab67ce8e6


CORPUS STRUCTURE AND CONTENTS
-----------------------------

-Total size: 20 GB
+Total size: 18 GiB

The data is organized in 3 main directories: "train" (training data), "dev"
(validation data) and "test" (evaluation data). Each directory contains the
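The SHA-256 checksums added in this change can be used to confirm that a download completed intact. The following is a minimal Python sketch that makes two assumptions. First, it assumes the files were saved locally under the same names as in the URLs. Second, the helper names `sha256_of_file` and `verify` are hypothetical and are not part of the Europarl-ASR distribution. The sketch reads each file in 1 MiB chunks, so the 18 GiB corpus archive is never loaded into memory at once.

```python
import hashlib
from pathlib import Path

# Expected SHA-256 digests, copied from the README entries in this diff.
# Assumption: local file names match the basenames of the download URLs.
EXPECTED_SHA256 = {
    "Europarl-ASR_v1.0.tar.gz":
        "4d360170ef8f1d1ece55566eda4211274b27328427a3443061f43d80d3346e74",
    "Europarl-ASR_v1.0_ngram_lm_and_vocab.tar.gz":
        "2be8eb7918086a233545e6e5a0592b7ae83a09ffb5ce479b68e28329d710cd6a",
    "Europarl-ASR_transcription_guidelines.pdf":
        "66dac867b76c984d9e583caab0a8fd7540a664017e88e9ec4190c90ab67ce8e6",
}


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hypothetical helper: hash a file in fixed-size chunks (1 MiB by default)
    so that very large archives never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Hypothetical helper: return True if the file's SHA-256 matches the
    published checksum for its file name; raise KeyError for unknown names."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published checksum for {path.name}")
    return sha256_of_file(path) == expected
```

For example, `verify(Path("Europarl-ASR_v1.0.tar.gz"))` returns `True` only if the local archive matches the published digest. On most Linux systems, running `sha256sum` on the downloaded file and comparing the result by eye gives the same check from the command line.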