Quality requirements and metrics for the evaluation of MT: analysis and integration of expertise

SNSF Project n. 200020-113604 

October 2006 - September 2008

The present project is a continuation of project Nr. 200021-103318 in the field of evaluation of machine translation (MT) software. The main goal is to build upon the principles and implementation of the FEMTI tool for MT evaluation achieved in the initial project, in order to integrate into FEMTI the existing expertise regarding the definition of quality models for MT, as well as the selection of quality metrics among the numerous options that have been proposed. The work proposed here covers three main areas: case-study analysis of MT evaluation metrics in terms of correlations and scales; input of expertise into FEMTI's core correspondence (the 'generic contextual quality model') through appropriate expert interfaces; and FEMTI documentation for users, experts, and administrators. The main objectives are:

  1. To enhance the FEMTI web-based resource for MT evaluation with information about the behaviour of the most frequently used metrics, in particular about some of the recently proposed n-gram based automatic metrics, and about the correlations between metrics.
  2. To revive the process of consensus building and dissemination that was started by ISLE activities, through an expert group meeting and a paper presentation workshop.
  3. To document the FEMTI guidelines, in relation to recent updates of the ISO/IEC quality-related standards, and to document the FEMTI implementation in order to facilitate its maintenance after the end of the proposed project.


For more information, see the following web pages:

  • FEMTI guidelines, a Framework for MT Evaluation
  • Former ISLE Evaluation Working Group

Related publications

  • Estrella P., Popescu-Belis A. & King M. (2007) - A New Method for the Study of Correlations between MT Evaluation Metrics. Proceedings of TMI-07 (11th Conference on Theoretical and Methodological Issues in Machine Translation), Skövde, Sweden, 10 p.
  • Estrella P., Hamon O. & Popescu-Belis A. (2007) - How Much Data is Needed for Reliable MT Evaluation? Using Bootstrapping to Study Human and Automatic Metrics. Proceedings of Machine Translation Summit XI, Copenhagen, 8 p.
  • Popescu-Belis A., Estrella P., King M. & Underwood N. (2006) - A Model for Context-Based Evaluation of Language Processing Systems and Its Application to Machine Translation Evaluation. Proceedings of LREC 2006 (5th International Conference on Language Resources and Evaluation), Genoa, Italy, p. 691-696.
  • Estrella P., Popescu-Belis A. & Underwood N. (2005) - Finding the System that Suits You Best: Towards the Normalization of MT Evaluation. Proceedings of the 27th International Conference on Translating and the Computer (ASLIB), London, 24-25 November 2005.


Paula Estrella -- Last modified: Mon Aug 20 2007