2017
Villar Lafuente, Carlos; Garcés Díaz-Munío, Gonçal: "Several approaches for tweet topic classification in COSET – IberEval 2017". Inproceedings, Proc. of the 2nd Workshop on Evaluation of Human Language Technologies for Iberian Languages (IberEval 2017), pp. 36–42, Murcia (Spain), 2017.
Links: http://hdl.handle.net/10251/166361 ; http://ceur-ws.org/Vol-1881/COSET_paper_4.pdf
Tags: COSET2017, language models, linear models, neural networks, sentence embeddings, text classification
Abstract: These working notes summarize the different approaches we explored to classify a corpus of tweets related to the 2015 Spanish General Election (COSET 2017 task from IberEval 2017). Two approaches were tested during the COSET 2017 evaluations: Neural Networks with Sentence Embeddings (based on TensorFlow) and N-gram Language Models (based on SRILM). Our results with these approaches were modest: both ranked above the "Most frequent" baseline, but below the "Bag-of-words + SVM" baseline. A third approach was tried after the COSET 2017 evaluation phase was over: Advanced Linear Models (based on fastText). Results measured on the COSET 2017 Dev and Test sets show that this approach is well above the "TF-IDF + RF" baseline.
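A minimal sketch (not the authors' code) of the kind of fastText supervised classifier the abstract describes as the third approach; the training-file name, label names and hyperparameters below are illustrative assumptions.

import fasttext

# Training data: one tweet per line, prefixed with its topic label,
# e.g. "__label__topic1 El debate de esta noche ..."
# ("coset_train.txt" and the label names are hypothetical).
model = fasttext.train_supervised(
    input="coset_train.txt",
    epoch=25,          # illustrative hyperparameters, not the paper's
    lr=0.5,
    wordNgrams=2,      # word bigrams often help with short tweets
    dim=100,
)

labels, probs = model.predict("El debate de esta noche decidirá mi voto")
print(labels[0], float(probs[0]))   # predicted topic and its probability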
Piqueras, Santiago; Pérez, Alejandro; Turró, Carlos; Jiménez, Manuel; Sanchis, Albert; Civera, Jorge; Juan, Alfons: "Hacia la traducción integral de vídeo charlas educativas" [Towards the integral translation of educational video lectures]. Inproceedings, Proc. of III Congreso Nacional de Innovación Educativa y Docencia en Red (IN-RED 2017), pp. 117–124, València (Spain), 2017.
Links: http://ocs.editorial.upv.es/index.php/INRED/INRED2017/paper/view/6812
Tags: MOOCs, multilingual, translation
Abstract: More and more universities and educational institutions are investing in the production of technological resources for different uses in higher education. The MLLP research group has been working closely with the ASIC at UPV to enrich educational multimedia resources through the use of machine learning technologies, such as automatic speech recognition, machine translation and text-to-speech synthesis. In this work, developed under the framework of the UPV's Plan de Docencia en Red 2016-17, we present the application of innovative technologies to achieve the integral translation of educational videos.
2016
Silvestre-Cerdà, Joan Albert; Juan, Alfons; Civera, Jorge: "Different Contributions to Cost-Effective Transcription and Translation of Video Lectures". Inproceedings, Proc. of IX Jornadas en Tecnología del Habla and V Iberian SLTech Workshop (IberSpeech 2016), pp. 313–319, Lisbon (Portugal), 2016. ISBN: 978-3-319-49168-4.
Links: http://www.mllp.upv.es/wp-content/uploads/2016/11/poster.pdf ; http://www.mllp.upv.es/wp-content/uploads/2016/11/paper.pdf ; http://hdl.handle.net/10251/62194
Tags: Automatic Speech Recognition, Automatic transcription and translation, Machine Translation, Video Lectures
Abstract: In recent years, on-line multimedia repositories have experienced a strong growth that has consolidated them as essential knowledge assets, especially in the area of education, where large repositories of video lectures have been built in order to complement or even replace traditional teaching methods. However, most of these video lectures are neither transcribed nor translated due to a lack of cost-effective solutions that give accurate enough results. Solutions of this kind are clearly necessary to make these lectures accessible to speakers of different languages and to people with hearing disabilities, among many other benefits and applications. For this reason, the main aim of this thesis is to develop a cost-effective solution capable of transcribing and translating video lectures to a reasonable degree of accuracy. More specifically, we address the integration of state-of-the-art techniques in Automatic Speech Recognition and Machine Translation into large video lecture repositories to generate high-quality multilingual video subtitles without human intervention and at a reduced computational cost. We also explore the potential benefits of exploiting the information that we know a priori about these repositories, that is, lecture-specific knowledge such as speaker, topic or slides, to create specialised, in-domain transcription and translation systems by means of massive adaptation techniques. The proposed solutions have been tested in real-life scenarios by carrying out several objective and subjective evaluations, obtaining very positive results. The main outcome derived from this multidisciplinary thesis, the transLectures-UPV Platform, has been publicly released as open-source software and, at the time of writing, is serving automatic transcriptions and translations for several thousand video lectures in many Spanish and European universities and institutions.
del-Agua, Miguel Ángel; Piqueras, Santiago; Giménez, Adrià; Sanchis, Alberto; Civera, Jorge; Juan, Alfons: "ASR Confidence Estimation with Speaker-Adapted Recurrent Neural Networks". Inproceedings, Proc. of the 17th Annual Conf. of the ISCA (Interspeech 2016), pp. 3464–3468, San Francisco (USA), 2016.
Links: DOI: 10.21437/Interspeech.2016-1142
Tags: BLSTM, Confidence measures, Recurrent Neural Networks, Speaker adaptation, Speech Recognition
Abstract: Confidence estimation for automatic speech recognition has recently been improved by using Recurrent Neural Networks (RNNs), and also by speaker adaptation (on the basis of Conditional Random Fields). In this work, we explore how to obtain further improvements by combining RNNs and speaker adaptation. In particular, we explore different speaker-dependent and -independent data representations for Bidirectional Long Short-Term Memory RNNs of various topologies. Empirical tests are reported on the LibriSpeech dataset, showing that the best results are achieved by the proposed combination of RNNs and speaker adaptation.
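As a rough illustration of the model family used in this paper, the sketch below builds a small bidirectional LSTM (BLSTM) that maps a sequence of per-word feature vectors to a per-word confidence score in [0, 1]. It is a generic Keras sketch with toy dimensions and random placeholder data; the paper's actual architecture, features and speaker-adaptation scheme are not reproduced here.

import numpy as np
from tensorflow.keras import layers, models

T, F = 20, 32   # toy sizes: words per utterance, features per word

model = models.Sequential([
    layers.Input(shape=(T, F)),
    layers.Bidirectional(layers.LSTM(64, return_sequences=True)),
    layers.TimeDistributed(layers.Dense(1, activation="sigmoid")),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# x: (batch, T, F) per-word features; y: (batch, T, 1) with 1 = word
# correctly recognised, 0 = recognition error (random placeholders here).
x = np.random.rand(8, T, F).astype("float32")
y = np.random.randint(0, 2, size=(8, T, 1)).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
confidences = model.predict(x)   # per-word confidence scores in [0, 1]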
Silvestre-Cerdà, Joan Albert: "Different Contributions to Cost-Effective Transcription and Translation of Video Lectures". PhD Thesis, Universitat Politècnica de València, 2016. (Advisors: Alfons Juan Ciscar and Jorge Civera Saiz.)
Links: http://hdl.handle.net/10251/62194 ; http://www.mllp.upv.es/wp-content/uploads/2016/01/slides.pdf ; http://www.mllp.upv.es/wp-content/uploads/2016/01/thesis.pdf ; http://www.mllp.upv.es/phd-thesis-different-contributions-to-cost-effective-transcription-and-translation-of-video-lectures-by-joan-albert-silvestre-cerda-abstract/
Tags: Automatic Speech Recognition, Education, Language Technologies, Machine Translation, Massive Adaptation, Multilingualism, video lecture repositories, Video Lectures
Abstract: In recent years, online multimedia repositories have experienced a strong growth that has consolidated them as essential knowledge assets, especially in the area of education, where large repositories of video lectures have been built in order to complement or even replace traditional teaching methods. However, most of these video lectures are neither transcribed nor translated due to a lack of cost-effective solutions that provide accurate enough results. Solutions of this kind are clearly necessary to make these lectures accessible to speakers of different languages and to people with hearing disabilities. They would also facilitate lecture searchability and analysis functions, such as classification, recommendation or plagiarism detection, as well as the development of advanced educational functionalities like content summarisation to assist student note-taking. For this reason, the main aim of this thesis is to develop a cost-effective solution capable of transcribing and translating video lectures to a reasonable degree of accuracy. More specifically, we address the integration of state-of-the-art techniques in Automatic Speech Recognition and Machine Translation into large video lecture repositories to generate high-quality multilingual video subtitles without human intervention and at a reduced computational cost. We also explore the potential benefits of exploiting the information that we know a priori about these repositories, that is, lecture-specific knowledge such as speaker, topic or slides, to create specialised, in-domain transcription and translation systems by means of massive adaptation techniques. The proposed solutions have been tested in real-life scenarios by carrying out several objective and subjective evaluations, obtaining very positive results. The main technological outcome derived from this thesis, the transLectures-UPV Platform (TLP), has been publicly released as open-source software and, at the time of writing, is serving automatic transcriptions and translations for several thousand video lectures in Spanish and European universities and institutions.
Valor Miró, Juan Daniel; Turró, C.; Civera, J.; Juan, A.: "Generación eficiente de transcripciones y traducciones automáticas en poliMedia" [Efficient generation of automatic transcriptions and translations in poliMedia]. Inproceedings, Proc. of II Congreso Nacional de Innovación Educativa y Docencia en Red (IN-RED 2016), pp. 21–29, València (Spain), 2016.
Links: http://dx.doi.org/10.4995/INRED2016.2016.4276
Tags: Docencia en Red, e-learning, transcription, translation, video
Abstract: The use of educational videos in higher education has increased quickly for several educational applications, which has led to platforms and services such as poliMèdia at the Universitat Politècnica de València (UPV), which enables the creation, publication and dissemination of this educational multimedia content. Through various research projects, and specifically the EU project transLectures, the UPV has implemented a system that automatically generates subtitles in various languages for all poliMèdia videos. These subtitles are created by an automatic speech recognition and machine translation system that provides high accuracy in both recognition and translation of the main European languages. Transcriptions and translations are not only used to improve accessibility, but also enable the search and retrieval of video contents within the video portal. Thus, a user can locate the video, and the time within it, where a certain word is said, for later viewing. In this article we also extend previous work on the assessment of the review process, including transcription of French and translation from Spanish into Catalan.
Sanchez-Cortina, Isaias; Andrés-Ferrer, Jesús; Sanchis, Alberto; Juan, Alfons: "Speaker-adapted confidence measures for speech recognition of video lectures". Journal Article, Computer Speech & Language, 37, pp. 11–23, 2016. ISSN: 0885-2308.
Links: http://www.sciencedirect.com/science/article/pii/S0885230815000960 ; http://authors.elsevier.com/a/1SAsB39HpSHRc0
Tags: Confidence measures, Log-linear models, Online video lectures, Speaker adaptation, Speech Recognition
Abstract: Automatic Speech Recognition applications can benefit from a confidence measure (CM) to predict the reliability of the output. Previous works showed that a word-dependent naïve Bayes (NB) classifier outperforms the conventional word posterior probability as a CM. However, a discriminative formulation usually renders improved performance due to the available training techniques. Taking this into account, we propose a logistic regression (LR) classifier defined with simple input functions to approximate the NB behaviour. Additionally, as a main contribution, we propose to adapt the CM to the speaker in cases in which it is possible to identify the speakers, such as online lecture repositories. The experiments have shown that speaker-adapted models outperform their non-adapted counterparts on two difficult tasks from English (videoLectures.net) and Spanish (poliMedia) educational lectures. They have also shown that the NB model is clearly superseded by the proposed LR classifier.
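The discriminative idea in this article can be illustrated in a few lines of scikit-learn: a logistic regression classifier over simple input functions of the word posterior, trained to predict whether each recognised word is correct. The synthetic data and the two features below are illustrative assumptions, not the paper's exact input functions; speaker adaptation would roughly correspond to estimating or interpolating such a classifier per speaker.

import numpy as np
from sklearn.linear_model import LogisticRegression

# One row per recognised word: features are the word posterior and its
# logarithm; the label is 1 if the word was correctly recognised.
rng = np.random.default_rng(0)
posterior = rng.uniform(0.01, 1.0, size=1000)
X = np.column_stack([posterior, np.log(posterior)])
y = (posterior + 0.1 * rng.normal(size=1000) > 0.5).astype(int)  # toy labels

clf = LogisticRegression().fit(X, y)
confidence = clf.predict_proba(X)[:, 1]   # per-word confidence in [0, 1]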
Sánchez-Cortina, Isaías: "Confidence Measures for Automatic and Interactive Speech Recognition". PhD Thesis, Universitat Politècnica de València, 2016. (Advisors: Alfons Juan Ciscar and Alberto Sanchis Navarro.)
Links: http://hdl.handle.net/10251/61473 ; http://www.mllp.upv.es/phd-thesis-confidence-measures-for-automatic-and-interactive-speech-recognition-by-isaias-sanchez-cortina-abstract/
del-Agua, Miguel Ángel; Martínez-Villaronga, Adrià; Giménez, Adrià; Sanchis, Alberto; Civera, Jorge; Juan, Alfons: "The MLLP system for the 4th CHiME Challenge". Inproceedings, Proc. of the 4th Intl. Workshop on Speech Processing in Everyday Environments (CHiME 2016), pp. 57–59, San Francisco (USA), 2016.
Links: http://www.mllp.upv.es/wp-content/uploads/2017/11/DelAgua2016-The_MLLP_system_for_the_4th_CHiME_Challenge.pdf ; http://hdl.handle.net/10251/177497 ; http://spandh.dcs.shef.ac.uk/chime_workshop/chime2016/chime2016proceedings.pdf
Abstract: The MLLP's CHiME-4 system is presented in this paper. It has been built using the transLectures-UPV toolkit (TLK), developed by the MLLP research group, which makes use of state-of-the-art speech techniques. Our best system built for the CHiME-4 challenge consists of a combination of different sub-systems in order to deal with the variety of acoustic conditions. Each sub-system, in turn, follows a hybrid approach with different acoustic models, such as Deep Neural Networks or BLSTM networks.
2015
del-Agua, Miguel Ángel; Martínez-Villaronga, Adrià; Piqueras, Santiago; Giménez, Adrià; Sanchis, Alberto; Civera, Jorge; Juan, Alfons: "The MLLP ASR Systems for IWSLT 2015". Inproceedings, Proc. of the 12th Intl. Workshop on Spoken Language Translation (IWSLT 2015), pp. 39–44, Da Nang (Vietnam), 2015.
Links: https://aclanthology.org/2015.iwslt-evaluation.5/
Abstract: This paper describes the Machine Learning and Language Processing (MLLP) ASR systems for the 2015 IWSLT evaluation campaign. The English system is based on the combination of five different subsystems which consist of two types of Neural Network architectures (deep feed-forward and convolutional), two types of activation functions (sigmoid and rectified linear) and two types of input features (fMLLR and FBANK). All subsystems perform a speaker adaptation step based on confidence measures, the output of which is then combined with ROVER. This system achieves a Word Error Rate (WER) of 13.3% on the official IWSLT 2015 English test set.
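Since the result above is reported as Word Error Rate, a worked definition may be useful: WER = (substitutions + deletions + insertions) / number of reference words, computed from a word-level Levenshtein alignment. A self-contained sketch:

def wer(ref, hyp):
    """Word error rate via a standard edit-distance DP over words."""
    r, h = ref.split(), hyp.split()
    d = [[0] * (len(h) + 1) for _ in range(len(r) + 1)]
    for i in range(len(r) + 1):
        d[i][0] = i
    for j in range(len(h) + 1):
        d[0][j] = j
    for i in range(1, len(r) + 1):
        for j in range(1, len(h) + 1):
            cost = 0 if r[i - 1] == h[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(r)][len(h)] / len(r)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 deletion / 6 words = 0.1667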
Valor Miró, Juan Daniel; Silvestre-Cerdà, Joan Albert; Civera, Jorge; Turró, Carlos; Juan, Alfons: "Efficient Generation of High-Quality Multilingual Subtitles for Video Lecture Repositories". Inproceedings, Proc. of 10th European Conf. on Technology Enhanced Learning (EC-TEL 2015), pp. 485–490, Toledo (Spain), 2015. ISBN: 978-3-319-24258-3.
Links: http://link.springer.com/chapter/10.1007/978-3-319-24258-3_44 ; http://www.mllp.upv.es/wp-content/uploads/2016/03/paper.pdf
Tags: Automatic Speech Recognition, Docencia en Red, Efficient video subtitling, Polimedia, Statistical machine translation, video lecture repositories
Abstract: Video lectures are a valuable educational tool in higher education to support or replace face-to-face lectures in active learning strategies. In 2007 the Universitat Politècnica de València (UPV) implemented its video lecture capture system, resulting in a high-quality educational video repository, called poliMedia, with more than 10,000 mini lectures created by 1,373 lecturers. Also, in the framework of the European project transLectures, UPV has automatically generated transcriptions and translations in Spanish, Catalan and English for all videos included in the poliMedia video repository. transLectures' objective responds to the widely recognised need for subtitles to be provided with video lectures, as an essential service for non-native speakers and hearing-impaired persons, and to allow advanced repository functionalities. Although high-quality automatic transcriptions and translations were generated in transLectures, they were not error-free. For this reason, lecturers need to manually review video subtitles to guarantee the absence of errors. The aim of this study is to evaluate the efficiency of the manual review process from automatic subtitles in comparison with the conventional generation of video subtitles from scratch. The reported results clearly indicate the convenience of providing automatic subtitles as a first step in the generation of video subtitles, with significant time savings of up to almost 75% when reviewing subtitles.
Pérez González de Martos, Alejandro; Silvestre-Cerdà, Joan Albert; Valor Miró, Juan Daniel; Civera, Jorge; Juan, Alfons: "MLLP Transcription and Translation Platform". Miscellaneous, 2015. (Short paper for demo presentation accepted at the 10th European Conf. on Technology Enhanced Learning (EC-TEL 2015), Toledo (Spain), 2015.)
Links: http://hdl.handle.net/10251/65747 ; http://www.mllp.upv.es/wp-content/uploads/2015/09/ttp_platform_demo_ectel2015.pdf ; http://ectel2015.httc.de/index.php?id=722
Tags: Automatic Speech Recognition, Docencia en Red, Document translation, Efficient video subtitling, Machine Translation, MLLP, Post-editing, Video Lectures
Abstract: This paper briefly presents the main features of the MLLP's Transcription and Translation Platform, which uses state-of-the-art automatic speech recognition and machine translation systems to generate multilingual subtitles of educational audiovisual and textual content. It has proven to reduce user effort to as little as one third of the time needed to generate transcriptions and translations from scratch.
Valor Miró, Juan Daniel; Turró, C.; Civera, J.; Juan, A.: "Evaluación de la revisión de transcripciones y traducciones automáticas de vídeos poliMedia" [Evaluation of the review of automatic transcriptions and translations of poliMedia videos]. Inproceedings, Proc. of I Congreso Nacional de Innovación Educativa y Docencia en Red (IN-RED 2015), pp. 464–468, València (Spain), 2015.
Links: http://hdl.handle.net/10251/52755 ; http://www.mllp.upv.es/wp-content/uploads/2015/06/1574-3087-1-PB.pdf
Tags: Docencia en Red, user evaluations, Polimedia, translations, transcriptions
Khoury, Ihab; Giménez, Adrià; Juan, Alfons; Andrés-Ferrer, Jesús: "Window Repositioning for Printed Arabic Recognition". Journal Article, Pattern Recognition Letters, 51, pp. 86–93, 2015. ISSN: 0167-8655.
Links: http://dx.doi.org/10.1016/j.patrec.2014.08.009
Tags: Bernoulli HMMs, Printed Arabic Recognition, Repositioning, Sliding window
Abstract: Bernoulli HMMs are conventional HMMs in which the emission probabilities are modeled with Bernoulli mixtures. They have recently been applied, with good results, to off-line text recognition in many languages, in particular Arabic. A key idea that has proven very effective in this application of Bernoulli HMMs is the use of a sliding window of adequate width for feature extraction. This idea has allowed us to obtain very competitive results in the recognition of both Arabic handwriting and printed text. Indeed, a system based on it ranked first at the ICDAR 2011 Arabic recognition competition on the Arabic Printed Text Image (APTI) database. More recently, this idea has been refined by using repositioning techniques for extracted windows, leading to further improvements in Arabic handwriting recognition. In the case of printed text, this refinement led to an improved system which ranked second at the ICDAR 2013 second competition on APTI, only at a marginal distance from the best system. In this work, we describe the development of this improved system. Following evaluation protocols similar to those of the competitions on APTI, exhaustive experiments are detailed, from which state-of-the-art results are obtained.
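To illustrate the sliding-window feature extraction that this abstract credits as the key idea, the sketch below turns a binarised text-line image into a sequence of binary observation vectors, one per column position, of the kind a Bernoulli (mixture) HMM can emit. The window width and zero-padding policy are illustrative assumptions; repositioning would additionally re-centre each window vertically (for instance on its centre of mass) before flattening.

import numpy as np

def sliding_windows(binary_image, width=9):
    """One binary feature vector per column: the flattened contents of a
    width-column window centred on that column (zero-padded at the edges)."""
    H, W = binary_image.shape
    pad = width // 2
    padded = np.pad(binary_image, ((0, 0), (pad, pad)))
    return np.stack([padded[:, j:j + width].flatten() for j in range(W)])

line = (np.random.rand(30, 100) > 0.5).astype(np.uint8)  # toy binarised line
obs = sliding_windows(line)
print(obs.shape)   # (100, 270): a sequence of 100 binary vectors of size 270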
Khoury, Ihab: "Arabic Text Recognition and Machine Translation". PhD Thesis, Universitat Politècnica de València, 2015. (Advisors: Alfons Juan Ciscar and Jesús Andrés Ferrer.)
Links: http://hdl.handle.net/10251/53029 ; http://www.mllp.upv.es/phd-thesis-arabic-text-recognition-and-machine-translation-by-ihab-khoury-abstract/
Brouns, Francis; Serrano Martínez-Santos, Nicolás; Civera, Jorge; Kalz, Marco; Juan, Alfons: "Supporting language diversity of European MOOCs with the EMMA platform". Inproceedings, Proc. of the European MOOC Stakeholder Summit (EMOOCs 2015), pp. 157–165, Mons (Belgium), 2015.
Links: http://www.emoocs2015.eu/node/55
Tags: Automatic Speech Recognition, EMMA, Statistical machine translation
Abstract: This paper introduces the cross-language support of the EMMA MOOC platform. Based on a discussion of language diversity in Europe, we introduce the development and evaluation of automated translation of texts and subtitling of videos from Dutch into English. The development of an Automatic Speech Recognition (ASR) system and a Statistical Machine Translation (SMT) system is described. The resources employed and the evaluation approach are introduced. Initial evaluation results are presented. Finally, we provide an outlook into future research and development.
Valor Miró, Juan Daniel; Silvestre-Cerdà, Joan Albert; Civera, Jorge; Turró, Carlos; Juan, Alfons: "Efficiency and usability study of innovative computer-aided transcription strategies for video lecture repositories". Journal Article, Speech Communication, 74, pp. 65–75, 2015. ISSN: 0167-6393.
Links: http://www.sciencedirect.com/science/article/pii/S0167639315001016 ; http://www.mllp.upv.es/wp-content/uploads/2016/03/paper1.pdf
Tags: Automatic Speech Recognition, Computer-assisted transcription, Interface design strategies, Usability study, video lecture repositories
Abstract: Video lectures are widely used in education to support and complement face-to-face lectures. However, the utility of these audiovisual assets could be further improved by adding subtitles that can be exploited to incorporate added-value functionalities such as searchability, accessibility, translatability, note-taking, and discovery of content-related videos, among others. Today, automatic subtitles are prone to error and need to be reviewed and post-edited in order to ensure that what students see on-screen is of an acceptable quality. This work investigates different user interface design strategies for this post-editing task to discover the best way to incorporate automatic transcription technologies into large educational video repositories. Our three-phase study involved lecturers from the Universitat Politècnica de València (UPV) with videos available on the poliMedia video lecture repository, which currently holds over 10,000 video objects. Simply by post-editing automatic transcriptions in the conventional way, users needed almost half the time that would be required to generate the transcription from scratch. As expected, this study revealed that the time spent by lecturers reviewing automatic transcriptions correlated directly with the accuracy of said transcriptions. However, it is also shown that the average time required to perform each individual editing operation could be precisely derived and applied in the definition of a user model. In addition, the second phase of this study presents a transcription review strategy based on confidence measures (CM) and compares it to the conventional post-editing strategy. Finally, a third strategy, resulting from the combination of the CM-based strategy with massive adaptation techniques for automatic speech recognition (ASR), managed to improve the transcription review efficiency in comparison with the two aforementioned strategies.
2014
Piqueras Gozalbes, Santiago Romualdo: "Applying Machine Learning technologies to the synthesis of video lectures". Masters Thesis, Universitat Politècnica de València, 2014.
Links: http://hdl.handle.net/10251/53367
Abstract: Machine learning technologies have been applied and compared for the problem of training voice synthesis systems for subtitles in Spanish and English. A voice synthesis system in both languages has been developed for the video lecture platform poliMedia.
Valor Miró, Juan Daniel; Spencer, R.N.; Pérez González de Martos, A.; Garcés Díaz-Munío, G.; Turró, C.; Civera, J.; Juan, A.: "Evaluación del proceso de revisión de transcripciones automáticas para vídeos Polimedia" [Evaluation of the automatic transcription review process for Polimedia videos]. Inproceedings, Proc. of I Jornadas de Innovación Educativa y Docencia en Red (IN-RED 2014), pp. 272–278, València (Spain), 2014.
Links: http://hdl.handle.net/10251/40404 ; http://dx.doi.org/10.4995/INRED.2014 ; http://www.mllp.upv.es/wp-content/uploads/2015/04/paper1.pdf ; https://www.mllp.upv.es/wp-content/uploads/2019/09/poster.pdf
Tags: ASR, Docencia en Red, evaluations, Polimedia, transcriptions
Abstract: Video lectures are a tool of proven value and wide acceptance in universities, which is giving rise to video lecture platforms such as poliMèdia (Universitat Politècnica de València). transLectures is an EU project that generates high-quality automatic transcriptions and translations for the poliMèdia platform, and improves them by using massive adaptation and intelligent interaction techniques. In this paper we present the evaluation with lecturers carried out under the Docència en Xarxa 2012-2013 action plan with the aim of studying the process of transcription post-editing, in contrast with transcribing from scratch.
Serrano Martínez-Santos, Nicolás: "Interactive Transcription of Old Text Documents". PhD Thesis, Universitat Politècnica de València, 2014. (Advisors: Alfons Juan Ciscar and Jorge Civera Saiz.)
Links: http://hdl.handle.net/10251/37979
Giménez Pastor, Adrià Bernoulli HMMs for Handwritten Text Recognition PhD Thesis Universitat Politècnica de València, 2014, (Advisors: Alfons Juan Ciscar and Jesús Andrés Ferrer).
Links: http://hdl.handle.net/10251/37978 |
Alabau Gonzalvo, Vicent Multimodal interactive structured prediction PhD Thesis Universitat Politècnica de València, 2014, (Advisors: Francisco Casacuberta Nolla and Alberto Sanchis Navarro).
Links: http://hdl.handle.net/10251/35135 |
Serrano, Nicolás; Civera, Jorge; Sanchis, Alberto; Juan, A. Effective balancing error and user effort in interactive handwriting recognition Journal Article Pattern Recognition Letters, 37, pp. 135–142, 2014.
Links: http://dx.doi.org/10.1016/j.patrec.2013.03.010 |
Alabau, Vicent; Sanchis, Alberto; Casacuberta, Francisco Improving on-line handwritten recognition in interactive machine translation Journal Article Pattern Recognition, 47(3), pp. 1217–1228, 2014.
Links: http://dx.doi.org/10.1016/j.patcog.2013.09.035 |
Serrano, Nicolás; Giménez, Adrià; Civera, Jorge; Sanchis, Alberto; Juan, Alfons Interactive Handwriting Recognition with Limited User Effort Journal Article Intl. Journal on Document Analysis and Recognition (IJDAR), 17, pp. 47–59, 2014.
Links: http://dx.doi.org/10.1007/s10032-013-0204-5 |
Giménez, Adrià; Andrés-Ferrer, Jesús; Juan, Alfons Discriminative Bernoulli HMMs for isolated handwritten word recognition Journal Article Pattern Recognition Letters, 35(0), pp. 157–168, 2014, ISSN: 0167-8655, (Frontiers in Handwriting Processing).
Links: http://dx.doi.org/10.1016/j.patrec.2013.05.016
Tags: RIMES |
Giménez, Adrià; Khoury, Ihab; Andrés-Ferrer, Jesús; Juan, Alfons Handwriting word recognition using windowed Bernoulli HMMs Journal Article Pattern Recognition Letters, 35(0), pp. 149–156, 2014, ISSN: 0167-8655, (Frontiers in Handwriting Processing).
Links: http://dx.doi.org/10.1016/j.patrec.2012.09.002 http://hdl.handle.net/10251/37326
Tags: Sliding window |
Martínez-Villaronga, A.; del-Agua, M. A.; Silvestre-Cerdà, J. A.; Andrés-Ferrer, J.; Juan, A. Language model adaptation for lecture transcription by document retrieval Inproceedings Proc. of VIII Jornadas en Tecnología del Habla and IV Iberian SLTech Workshop (IberSpeech 2014), Las Palmas de Gran Canaria (Spain), 2014.
Links: http://www.mllp.upv.es/wp-content/uploads/2015/04/ibsp14-cameraReady.pdf |
Pérez-González-de-Martos, A.; Silvestre-Cerdà, J. A.; Rihtar, M.; Juan, A.; Civera, J. Using Automatic Speech Transcriptions in Lecture Recommendation Systems Inproceedings Proc. of VIII Jornadas en Tecnología del Habla and IV Iberian SLTech Workshop (IberSpeech 2014), Las Palmas de Gran Canaria (Spain), 2014.
Links: http://www.mllp.upv.es/wp-content/uploads/2015/04/lavie_is2014_camready1.pdf |
Valor Miró, Juan Daniel; Spencer, R. N.; Pérez González de Martos, A.; Garcés Díaz-Munío, G.; Turró, C.; Civera, J.; Juan, A. Evaluating intelligent interfaces for post-editing automatic transcriptions of online video lectures Journal Article Open Learning: The Journal of Open, Distance and e-Learning, 29(1), pp. 72–85, 2014.
Links: http://hdl.handle.net/10251/55925 http://dx.doi.org/10.1080/02680513.2014.909722 http://www.mllp.upv.es/wp-content/uploads/2015/04/author_version.pdf
Abstract: [EN] Video lectures are fast becoming an everyday educational resource in higher education. They are being incorporated into existing university curricula around the world, while also emerging as a key component of the open education movement. In 2007 the Universitat Politècnica de València (UPV) implemented its poliMèdia lecture capture system for the creation and publication of quality educational video content, and it now has a collection of over 10,000 video objects. In 2011 it embarked on the EU-subsidised transLectures project to add automatic subtitles to these videos in Spanish and other languages. This gives non-native speakers and the deaf and hard-of-hearing access to the educational content, and also enables advanced repository management functions. In this paper, following a short introduction to poliMèdia, transLectures and Docència en Xarxa (the UPV's action plan to boost the use of digital resources at the university), we discuss the three-stage evaluation process carried out with the collaboration of UPV lecturers to find the best interaction protocol for the task of post-editing automatic subtitles. |