@@ -247,7 +247,7 @@ Each "text" directory contains 2 subdirectories: "raw" (except in

"raw" contains the raw text data for the corresponding set (`*.txt.gz`), and
its metadata (`*.csv`). In the cases of "dev" and "test", both the official
- non-verbatim transcriptions (*.orig.*) and the manually revised verbatim
+ non-verbatim transcriptions (`*.orig.*`) and the manually revised verbatim
transcriptions (`*.rev.*`) are included.

"prepro" contains the text data for the corresponding set, preprocessed for
@@ -256,7 +256,7 @@ Each "text" directory contains 2 subdirectories: "raw" (except in

Finally, "scripts" (only in "train/text/external") contains the script
get_DCEP.sh, which can be used to download the DCEP corpus from its original
- website and save it in compressed plain text (.txt.gz).
+ website and save it in compressed plain text (`.txt.gz`).


EXTENDED DESCRIPTION
@@ -368,10 +368,8 @@ LICENCE
corpus terms of use ( https://www.statmt.org/europarl/ ).

* Europarl-ASR data and code not covered by the previously mentioned licences
- © 2021 by Pau Baquero-Arnal, Jorge Civera, Gonçal V. Garcés Díaz-Munío,
- Adrià Giménez, Javier Iranzo-Sánchez, Javier Jorge, Alfons Juan, Alejandro
- Pérez-González-de-Martos, Nahuel Roselló, Albert Sanchis and Joan Albert
- Silvestre-Cerdà are licenced under CC BY 4.0. To view a copy of this
- licence, visit http://creativecommons.org/licenses/by/4.0/
+ © 2021 Universitat Politècnica de València are licenced under CC BY 4.0. To
+ view a copy of this licence, visit
+ http://creativecommons.org/licenses/by/4.0/

See the [LICENSE](https://mllp.upv.es/git-pub/ggarces/Europarl-ASR/src/master/LICENSE) file for the full licence texts.