Software by the MLLP

Free software for machine learning and language processing developed and released by MLLP members.

TLK: The transLectures-UPV Toolkit

for automatic speech recognition

TLP: The transLectures-UPV Platform

for the integration of automatic transcription and translation in MOOC and media repositories

GIDOC: GIMP-based Interactive transcription of old text DOCuments

A computer-assisted transcription prototype for handwritten text in old documents. Developed within the project iTransDoc.


  • Interactive transcription of old text documents.
  • System training.

Learn more and download

TLM: The transLectures-Matterhorn Plug-in

The transLectures Matterhorn Plug-in provides a transLectures Matterhorn Service and a transLectures Matterhorn Custom Workflow that integrate the transLectures Platform tools into the Opencast Matterhorn platform. It was developed and tested on Opencast Matterhorn 1.4.0 and can be extended to support other versions. Developed within the project transLectures.

Learn more and download

Another Kit for the building and use of Bernoulli (and diagonal Gaussian) Hidden Markov Models (HMMs).


  • Free HMM-based toolkit for (handwritten) text (or speech) recognition.
  • Supports Bernoulli mixture and diagonal Gaussian mixture HMMs.
  • Core implemented as a dynamic library.
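To make the term "Bernoulli mixture" concrete, here is a minimal Python sketch of the emission log-probability that a single HMM state with a Bernoulli mixture would assign to one binary feature vector (for example, a binarized image column). It is an illustration only and not the toolkit's API: the function name, argument names, and the clamping constant `eps` are assumptions made for this example.

```python
import math

def bernoulli_mixture_log_prob(x, weights, prototypes, eps=1e-12):
    """Log-probability of binary vector x under a Bernoulli mixture.

    x          -- sequence of 0/1 values
    weights    -- strictly positive mixture coefficients summing to 1
    prototypes -- one list per component; prototypes[k][d] = P(x[d] = 1 | k)
    eps        -- hypothetical clamping constant to avoid log(0)
    """
    component_logs = []
    for w, p in zip(weights, prototypes):
        log_p = math.log(w)
        for x_d, p_d in zip(x, p):
            p_d = min(max(p_d, eps), 1.0 - eps)
            # Each dimension contributes p_d if the bit is 1, else 1 - p_d.
            log_p += math.log(p_d) if x_d else math.log(1.0 - p_d)
        component_logs.append(log_p)
    # Log-sum-exp over components for numerical stability.
    m = max(component_logs)
    return m + math.log(sum(math.exp(v - m) for v in component_logs))
```

In an HMM, a value like this would play the role of the per-state emission score inside the forward or Viterbi recursion; a diagonal Gaussian mixture would replace only the per-dimension term.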

Learn more and download

Software for training phrase-based Hidden semi-Markov Models for statistical machine translation (SMT). Learn more and download.


Similarity Word-Sequence Kernels for Sentence Clustering toolkit. Learn more and download.


A C++ library for Statistical Language Processing tasks. Learn more and download.

Bilingual Text Classification

A software package implementing statistical mixture models, trained with the EM algorithm, for bilingual text classification.
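As a rough illustration of training a mixture model with EM, the sketch below fits a mixture of multinomial (bag-of-words) components to documents given as word-count dictionaries. It is not the package's model or interface: the function name, the smoothing constant, and the initialization (document n starts in component n mod K) are assumptions chosen for this example, and representing a bilingual document as one bag over a joint vocabulary is likewise only an assumption.

```python
import math

def em_multinomial_mixture(docs, n_components, vocab_size, n_iters=10, smooth=0.01):
    """Fit a multinomial mixture with EM (requires n_iters >= 1).

    docs -- list of dicts mapping word id (0..vocab_size-1) to its count
    Returns (weights, word_probs, responsibilities).
    """
    K, V, N = n_components, vocab_size, len(docs)
    # Hypothetical initialization: document n starts in component n mod K.
    resp = [[1.0 if n % K == k else 0.0 for k in range(K)] for n in range(N)]
    for _ in range(n_iters):
        # M-step: re-estimate mixture weights and per-component word
        # probabilities from the current (soft) assignments.
        weights = [(sum(resp[n][k] for n in range(N)) + smooth) / (N + K * smooth)
                   for k in range(K)]
        word_probs = []
        for k in range(K):
            counts = [smooth] * V
            for n, doc in enumerate(docs):
                for w, c in doc.items():
                    counts[w] += resp[n][k] * c
            total = sum(counts)
            word_probs.append([c / total for c in counts])
        # E-step: posterior probability of each component given each document.
        for n, doc in enumerate(docs):
            logs = [math.log(weights[k])
                    + sum(c * math.log(word_probs[k][w]) for w, c in doc.items())
                    for k in range(K)]
            m = max(logs)
            exps = [math.exp(v - m) for v in logs]
            z = sum(exps)
            resp[n] = [e / z for e in exps]
    return weights, word_probs, resp
```

For classification rather than clustering, one common arrangement is to fit one such mixture per class and pick the class whose mixture gives a document the highest probability; whether the package does this is not stated here.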

Learn more and download

