
README added corpus download link

Gonçal 2 years ago
commit c50c3ddd73
1 changed file with 30 additions and 20 deletions

+ 30 - 20
README.md

@@ -4,7 +4,7 @@ v1.0<br />
 [www.mllp.upv.es/europarl-asr](www.mllp.upv.es/europarl-asr)
 
 A large English-language speech and text corpus of parliamentary debates for
-streaming ASR benchmarking and speech data filtering/verbatimization.
+streaming ASR benchmarking, speech data filtering and speech data verbatimization.
 
 Keywords: automatic speech recognition; speech corpus; speech data filtering;
 speech data verbatimization.
@@ -19,8 +19,9 @@ README CONTENTS
 ---------------
 
 - [Overview](#overview)
-- [Corpus structure and contents](#contents)
+- [Get the data](#get)
 - [Additional Europarl-ASR materials](#additional)
+- [Corpus structure and contents](#contents)
 - [Extended description](#description)
 - [Acknowledgements](#ack)
 - [Legal disclaimers](#legal)
@@ -58,6 +59,33 @@ tokens, Europarl-ASR also includes tools to add all English-language text from
 the DCEP Digital Corpus of the European Parliament.
 
 
+<a id="get"></a>GET THE DATA
+------------
+
+Download the full Europarl-ASR speech and text corpus from:
+
+https://www.mllp.upv.es/europarl-asr/Europarl-ASR_v1.0.tar.gz
+
+
+<a id="additional-materials"></a>ADDITIONAL Europarl-ASR MATERIALS
+---------------------------------
+
+In addition to the speech and text data included in the main release and
+described in this document, we are making available for download the following
+materials to facilitate the reproducibility of our experiments:
+
+* The pretrained Europarl-ASR English-language n-gram language model, together
+  with its vocabulary file:
+  
+  https://www.mllp.upv.es/europarl-asr/Europarl-ASR_v1.0_ngram_lm_and_vocab.tar.gz
+
+* The Europarl-ASR English-language verbatim transcription guidelines, which
+  were applied to produce the manually revised verbatim transcriptions for the
+  dev and test sets:
+  
+  https://www.mllp.upv.es/europarl-asr/Europarl-ASR_transcription_guidelines.pdf
+
+
 <a id="contents"></a>CORPUS STRUCTURE AND CONTENTS
 -----------------------------
 
@@ -213,24 +241,6 @@ Each "text" directory contains 2 subdirectories: "raw" (except in
   website and save it in compressed plain text (.txt.gz).
 
 
-<a id="additional-materials"></a>ADDITIONAL Europarl-ASR MATERIALS
----------------------------------
-
-https://www.mllp.upv.es/europarl-asr/Europarl-ASR_v1.0_ngram_lm_and_vocab.tar.gz<br />
-https://www.mllp.upv.es/europarl-asr/Europarl-ASR_transcription_guidelines.pdf
-
-In addition to the speech and text data included in the main release and
-described in this document, we are making available for download the following
-materials to facilitate the reproducibility of our experiments:
-
-* The pretrained Europarl-ASR English-language n-gram language model, together
-  with its vocabulary file.
-
-* The Europarl-ASR English-language verbatim transcription guidelines, which
-  were applied to produce the manually revised verbatim transcriptions for the
-  dev and test sets.
-
-
 <a id="description"></a>EXTENDED DESCRIPTION
 --------------------