Active2Trans: Active Interaction for Speech Transcription and Translation
Period: 1/2/2013 – 31/1/2016
Research project supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under reference no. TIN2012-31723.

The Interactive Paradigm (IP) has become an important research field within Pattern Recognition. It aims to develop collaborative systems in which users are assisted by automatic systems to produce correct solutions. Generally speaking, interactive systems are those in which system predictions are iteratively proposed to the user, and user interactions are devoted to fixing possible output errors. This iterative process finishes when the system prediction matches the result that the user has in mind. A relevant advantage of IP is that user feedback can be used to improve performance in two ways: adapting the system to a changing environment, and generating better predictions based on user-validated data. Another advantage is the opportunity to use multimodality as a natural property of interaction. This tight collaboration reduces the user effort needed to generate correct solutions.

Although there have been important research efforts in developing interactive systems in recent years, the systems available before this project offered very limited interaction features, mainly based on passive interaction protocols. Passive protocols are those in which the user decides which elements to amend, and the interaction usually follows a sequential left-to-right order. Moreover, adaptation techniques can only be exploited on the basis of user interactions.

Active protocols open up a wide range of interaction strategies and adaptation techniques. In active interaction, the system actively selects low-confidence parts of the solution and asks the user to supervise them. The objective is to exploit user supervision optimally, minimising both user effort and error rates. Alternatively, the user may require a specific degree of accuracy in the final outputs. In this case, the system estimates the accuracy while interacting with the user and asks the user to stop supervising once the required degree of accuracy has been achieved. Although active interaction has provided promising results in recent years, it has been only slightly explored.
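
As an illustration only, the active protocol described above can be sketched in a few lines of Python. This is not the project's implementation: the function name active_supervision, the ask_user callback, and the use of the mean word confidence as the accuracy estimate are all assumptions made for the example.

```python
from typing import Callable, List, Tuple


def active_supervision(
    words: List[str],
    confidences: List[float],
    ask_user: Callable[[int, str], str],
    target_accuracy: float,
) -> Tuple[List[str], int]:
    """Supervise the least confident words first and stop once the
    estimated accuracy reaches the target (hypothetical sketch)."""
    output = list(words)
    conf = list(confidences)
    supervised = 0
    # Visit positions from least to most confident, not left to right.
    for i in sorted(range(len(output)), key=lambda k: confidences[k]):
        # Assumption: accuracy is estimated as the mean word confidence.
        if sum(conf) / len(conf) >= target_accuracy:
            break
        output[i] = ask_user(i, output[i])  # user confirms or corrects
        conf[i] = 1.0  # a user-validated word is treated as correct
        supervised += 1
    return output, supervised


if __name__ == "__main__":
    reference = ["the", "cat", "sat", "on", "the", "mat"]
    hypothesis = ["the", "cat", "sad", "on", "the", "mat"]
    scores = [0.9, 0.8, 0.3, 0.95, 0.9, 0.5]
    # Simulated user who always returns the reference word.
    result, effort = active_supervision(
        hypothesis, scores, lambda i, w: reference[i], 0.85
    )
    print(result, effort)
```

In this toy run the simulated user is asked about only the two least confident words before the estimated accuracy exceeds the requested 0.85, whereas a left-to-right protocol would have presented the words in order regardless of confidence.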

Active2Trans aimed to investigate open research challenges in active interaction. The first one was to abandon the sequential left-to-right order in user supervision and to study alternative supervision strategies that reduce user effort and fully exploit user supervision. Thus, novel search algorithms were studied in order to produce predictions based on non-consecutive words. Moreover, new interaction strategies were evaluated in which the user effort could be better exploited. As a result, the system can benefit from different degrees of user supervision: supervised, semi-supervised, or unsupervised. The second challenge was to investigate adaptation techniques in which system models could be improved using both user-validated data and high-confidence system predictions. Finally, multimodal interaction by means of speech was explored as a more sophisticated mode of interaction, enhancing overall system performance and usability.
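
The idea behind the second challenge, adapting models with both user-validated data and high-confidence system predictions, can be sketched as a simple data selection step. The Segment type, the select_adaptation_data function, and the 0.9 threshold are hypothetical names and values chosen for the example, not details of the project.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    text: str             # transcription or translation of the segment
    confidence: float     # system confidence in [0, 1]
    user_validated: bool  # True if the user supervised this segment


def select_adaptation_data(
    segments: List[Segment], threshold: float = 0.9
) -> List[str]:
    """Keep segments that were validated by the user or that the system
    predicted with confidence at or above the (assumed) threshold."""
    return [
        s.text
        for s in segments
        if s.user_validated or s.confidence >= threshold
    ]
```

The selected texts would then be fed to whatever model adaptation procedure is in use; segments that are neither validated nor confidently predicted are left out so that likely errors do not reinforce themselves.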

The research tasks of Active2Trans were developed in tight collaboration with two related EU projects. This guaranteed that outstanding data resources were available for the successful achievement of the project objectives, and also allowed for technology transfer to real-life applications. The project has contributed significant advances to the state of the art in IP.