@@ -20,7 +20,7 @@ README CONTENTS
- [Overview](#overview)
- [Citation](#citation)
-- [Get the data](#get)
+- [Get the data](#get-the-data)
- [Additional Europarl-ASR materials](#additional)
- [Corpus structure and contents](#contents)
- [Extended description](#description)
@@ -29,7 +29,7 @@ README CONTENTS
- [Licence](#licence)
-<a id="overview"></a>OVERVIEW
+OVERVIEW
--------
Europarl-ASR (EN) includes:
@@ -60,7 +60,7 @@ website from 1996 to 2020. Additionally, to increase text data up to 170M
tokens, Europarl-ASR also includes tools to add all English-language text from
the DCEP Digital Corpus of the European Parliament.
-<a id="citation"></a>CITATION
+CITATION
--------
Garcés Díaz-Munío, Gonçal V.; Silvestre-Cerdà, Joan Albert; Jorge, Javier; Giménez, Adrià; Iranzo-Sánchez, Javier; Baquero-Arnal, Pau; Roselló, Nahuel; Pérez-González-de-Martos, Alejandro; Civera, Jorge; Sanchis, Albert; Juan, Alfons. "Europarl-ASR: A Large Corpus of Parliamentary Debates for Streaming ASR Benchmarking and Speech Data Filtering/Verbatimization". In Proc. Interspeech 2021, Brno (Czech Republic), 2021 (in press).
@@ -77,7 +77,7 @@ abstract = {[EN] We introduce Europarl-ASR, a large speech and text corpus of pa
}
```
-<a id="get"></a>GET THE DATA
+GET THE DATA
------------
Download the full Europarl-ASR speech and text corpus from: