Machine Translation Summit IX
Time | Title of presentation | Author(s) |
9:00-9:15 | Introduction to the workshop: Systematizing MT Evaluation | Organizers |
9:15-10:00 | Invited talk: Cross-domain Study of N-gram Co-occurrence Metrics | Chin-Yew Lin (USC/ISI, USA) |
10:00-10:30 | Break | |
10:30-11:15 | Granularity in MT Evaluation | Florence Reeder (MITRE, USA) and John S. White (Northrop-Grumman, USA) |
11:15-12:00 | Training a Super Model Look-Alike: Featuring Edit Distance, N-Gram Occurrence, and One Reference Translation | Eva Forsbom (Uppsala University, Sweden) |
12:00-13:30 | Lunch break | |
13:30-14:15 | Task-based MT Evaluation: Tackling Software, Experimental Design, & Statistical Models | Calandra Tate (University of Maryland, USA), Sooyon Lee (ARTI, Inc., USA), and Clare R. Voss (Army Research Laboratory, USA) |
14:15-15:00 | Evaluation Techniques Applied to Domain Tuning of MT Lexicons | Necip Fazil Ayan, Bonnie J. Dorr, and Okan Kolak (University of Maryland, USA) |
15:00-15:30 | Break | |
15:30-16:15 | Considerations of Methodology and Human Factors in Rating a Suite of Translated Sentences | Leslie Barrett (Transclick, Inc., USA) |
16:15-17:00 | Pragmatics-based Translation and MT Evaluation | David Farwell and Stephen Helmreich (New Mexico State University, USA) |
Estimating the quality of any machine translation system accurately is only possible if the evaluation methodology is robust and systematic. The Evaluation Work Group of the NSF- and EU-funded ISLE project has created a taxonomy that relates evaluation situations to measures for a variety of MT applications. The Framework for MT Evaluation in ISLE (FEMTI) is now available online at http://www.issco.unige.ch/projects/isle/femti/.
Matching these measures correctly with their appropriate evaluation tasks, however, is an area that needs further attention. For example, what effect do user needs have on the functionality characteristics specified in the FEMTI guidelines? To what extent are there unseen relationships between branches of the taxonomy? How can we judge when a given evaluation measure is appropriate? Issues that bear on these questions include the automation of MT evaluation, the extension to MT applications such as automated speech translation, and the evaluation of the very training corpora that an MT system relies on to improve output quality.
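To give a concrete sense of the kind of automatic measure at issue (in the spirit of the n-gram co-occurrence metrics treated in the invited talk), the sketch below computes a clipped n-gram precision between a candidate translation and a single reference. This is a minimal illustration only, not a metric taken from any workshop paper; the function name, tokenization, and example sentences are assumptions made for the sake of the example.

```python
from collections import Counter

def ngram_precision(candidate, reference, n=2):
    """Clipped n-gram precision of a candidate translation against one reference.

    Counts how many candidate n-grams also occur in the reference, crediting
    each reference n-gram at most as many times as it appears there.
    (Illustrative sketch only; not a metric defined by the workshop.)
    """
    cand_tokens = candidate.lower().split()
    ref_tokens = reference.lower().split()

    cand_ngrams = Counter(tuple(cand_tokens[i:i + n])
                          for i in range(len(cand_tokens) - n + 1))
    ref_ngrams = Counter(tuple(ref_tokens[i:i + n])
                         for i in range(len(ref_tokens) - n + 1))

    if not cand_ngrams:
        return 0.0

    # Clip each candidate n-gram count by its count in the reference.
    overlap = sum(min(count, ref_ngrams[gram])
                  for gram, count in cand_ngrams.items())
    return overlap / sum(cand_ngrams.values())


if __name__ == "__main__":
    hyp = "the cat sat on the mat"
    ref = "there is a cat on the mat"
    print(ngram_precision(hyp, ref, n=1))  # unigram precision: 4/6
    print(ngram_precision(hyp, ref, n=2))  # bigram precision: 2/5
```

Comparing how such scores track human judgments across domains, reference sets, and granularities (sentence, document, system) is exactly the sort of meta-evaluation question the workshop addresses.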
This workshop welcomes papers for 30-minute presentations on comparisons of MT evaluation measures, studies of the behavior of individual measures (i.e., meta-evaluation), new uses for measures, analysis of MT evaluation tasks with respect to measures, and related topics on this theme. We solicit submissions that address some of the following issues; however, any other topic related to MT testing and evaluation is also acceptable.
Machine Translation Evaluation Measures
Paper submission deadline: | May 11, 2003 |
Notification of acceptance: | June 30, 2003 |
Camera-ready version due: | July 31, 2003 |
Submissions consist of full papers up to 8 pages in length. Papers must be submitted electronically to Leslie Barrett (barrett@semanticdatasystems.com) and to Andrei Popescu-Belis (andrei.popescu-belis@issco.unige.ch), preferably in .pdf, .ps, .rtf, or .txt format.