@@ -182,8 +182,8 @@ whether it is in the train set or in the dev/test sets):
Finally, in "refs" (only in "dev" and "test") each file contains every speech
in the corresponding dev or test set, that is, the full reference for that
set. In each case, we will find 4 files, containing the official non-verbatim
-reference (*.orig.*) and the manually revised verbatim reference (*.rev.*), as
-transcriptions (*.ref) and as segment time marked files (*.stm). In all 4
+reference (`*.orig.*`) and the manually revised verbatim reference (`*.rev.*`), as
+transcriptions (`*.ref`) and as segment time marked files (`*.stm`). In all 4
cases, the text is presented preprocessed for evaluation (tokenized,
lowercased, punctuation removed...).

@@ -199,10 +199,10 @@ Each "text" directory contains 2 subdirectories: "raw" (except in
"train/external"), "prepro" (in all sets), or "scripts" (only in
"train/external").

- "raw" contains the raw text data for the corresponding set (*.txt.gz), and
- its metadata (*.csv). In the cases of "dev" and "test", both the official
+ "raw" contains the raw text data for the corresponding set (`*.txt.gz`), and
+ its metadata (`*.csv`). In the cases of "dev" and "test", both the official
non-verbatim transcriptions (*.orig.*) and the manually revised verbatim
- transcriptions (*.rev.*) are included.
+ transcriptions (`*.rev.*`) are included.

"prepro" contains the text data for the corresponding set, preprocessed for
training or evaluation (tokenized, lowercased, punctuation removed...). This