Software by the MLLP

Free software for machine learning and language processing developed and released by MLLP members.

TLK: The transLectures-UPV Toolkit

for automatic speech recognition

TLP: The transLectures-UPV Platform

for the integration of automatic transcription and translation in MOOC and media repositories

GIDOC: GIMP-based Interactive transcription of old text DOCuments

A computer-assisted transcription prototype for handwritten text in old documents. Developed within the iTransDoc project.


  • Interactive transcription of old text documents.
  • System training.

Learn more and download

BibTeX for citations:

author = {N. Serrano and L. Tarazón and O. Ramos~Terrades and A. Juan},
title = {{The GIDOC prototype}},
booktitle = {Proc. of the 10th Intl. Workshop on Pattern Recognition in Information Systems (PRIS 2010)},
pages = {82--89},
year = {2010},
address = {Funchal (Portugal)}

TLM: The transLectures-Matterhorn Plug-in

The transLectures Matterhorn Plug-in provides a transLectures Matterhorn Service and a transLectures Matterhorn Custom Workflow that integrate the transLectures Platform tools into the Opencast Matterhorn platform. It has been developed and tested with Opencast Matterhorn 1.4.0, and it can be easily extended to support other versions. Developed within the transLectures project.

Learn more and download

BibTeX for citations:

author={The transLectures-UPV Team},
title={{TLM: The transLectures Matterhorn plug-in}},


Another Kit for building and using Bernoulli (and diagonal Gaussian) Hidden Markov Models (HMMs).


  • Free HMM-based toolkit for (handwritten) text (or speech) recognition.
  • Supports Bernoulli mixture and diagonal Gaussian mixture HMMs.
  • Core implemented as a dynamic library.

Learn more and download

BibTeX for citations:

author={Giménez Pastor, Adrià and del Agua Teba, Miguel Ángel and Andrés Ferrer, Jesús and Juan Ciscar, Alfons},
title={AK toolkit},



Software for training phrase-based Hidden semi-Markov Models for SMT. Learn more and download.


Similarity Word-Sequence Kernels for Sentence Clustering toolkit. Learn more and download.


A C++ library for Statistical Language Processing tasks. Learn more and download.

BibTeX for citations:

author = {Jesús Andrés-Ferrer and Alfons Juan},
title = {{A phrase-based hidden semi-Markov approach to machine translation}},
booktitle = {Proc. of the 13th Conf. of the European Association for Machine Translation (EAMT 2009)},
pages = {168--175},
year = {2009},
address = {Barcelona (Spain)}

author = {Jesús Andrés-Ferrer and Germán Sanchis-Trilles and Francisco Casacuberta},
title = {{Similarity Word-Sequence Kernels for Sentence Clustering}},
booktitle = {Proc. of the 8th Intl. Workshop on Statistical Pattern Recognition (S+SSPR 2010)},
pages = {610--619},
year = {2010},
address = {Cesme (Turkey)}

author={Jesús Andrés-Ferrer},

Bilingual Text Classification

A software package implementing statistical mixture models for bilingual text classification trained with the EM algorithm.

Learn more and download

BibTeX for citations:

author = {J. Civera},
title = {{Novel statistical approaches to text classification, machine translation and computer-assisted translation}},
school = {Universitat Politècnica de València},
year = {2008},
note = {Advisors: A. Juan and F. Casacuberta}