4 years ago · 51afc8c3b0
--- a/README.md
+++ b/README.md
@@ -36,18 +36,19 @@ Europarl-ASR (EN) includes:
 
				 
			
 
 #### Speech data
 
-* 1.3K hours of English-language annotated speech data.
+* 1300 hours of English-language annotated speech data.
+* 3 full sets of timed transcriptions: official non-verbatim transcriptions,
+  automatically noise-filtered transcriptions and automatically verbatimized
+  transcriptions.
 * 18 hours of speech data with both manually revised verbatim transcriptions
   and official non-verbatim transcriptions, split in 2 independent validation-
   evaluation partitions for 2 realistic ASR tasks (with vs. without previous
   knowledge of the speaker).
-* 3 full sets of timed transcriptions for the rest of the speech data
-  (training partition): official non-verbatim transcriptions, automatically
-  noise-filtered transcriptions and automatically verbatimized transcriptions.
+
 
 #### Text data
 
-* 70M tokens of English-language text data.
+* 70 million tokens of English-language text data.
 
 #### Pretrained language models
 
@@ -275,19 +276,19 @@ Europarl-ASR (EN) includes:
 
				 
			
 
 #### Speech data
 
-* 1.3K hours of English-language annotated speech data (33K speeches, 1K
+* 1300 hours of English-language annotated speech data (33K speeches, 1K
   speakers).
+* 3 full sets of timed transcriptions: official non-verbatim transcriptions,
+  automatically noise-filtered transcriptions and automatically verbatimized
+  transcriptions.
 * 18 hours of speech data with both manually revised verbatim transcriptions
   and official non-verbatim transcriptions, split in 2 independent validation-
   evaluation partitions for 2 realistic ASR tasks (with vs. without previous
   knowledge of the speaker).
-* 3 full sets of timed transcriptions for the rest of the speech data
-  (training partition): official non-verbatim transcriptions, automatically
-  noise-filtered transcriptions and automatically verbatimized transcriptions.
 
 #### Text data
 
-* 70M tokens of English-language text data.
+* 70 million tokens of English-language text data.
 
 #### Language models