EAGLES II Workshop
Evaluation in Language Engineering:
Standards and Sharing

Brussels, November 26th-27th, 1997

Please Note:
pre-registration is highly recommended, although on-site registration is possible.

The current round of EAGLES Evaluation work aims at dissemination and consensus building. To this end, we hope to bring together those who are actively involved in or interested in work on evaluation, and to create a forum within which such matters can be discussed. As a first step, the EAGLES Evaluation Group is pleased to announce a workshop to be held in Brussels, on November 26th and 27th, 1997.

In line with previous EAGLES work, the workshop will focus on the development of evaluation methodologies, seen from two different but complementary points of view:

Standardisation of evaluation design

The ISO 9126 standard
ISO/IEC Information technology - Software Product Evaluation: Quality Characteristics and Guidelines for their use. First edition, ISO 1991-12-15

ISO 9126 was issued in 1991 and concerns quality characteristics for the evaluation of software and guidelines for their use. A revised multi-part version is currently being produced; it introduces the distinction between internal quality, external quality and quality in use, and suggests appropriate metrics.

The production of standard quality characteristics for software as a generic product is clearly directly pertinent to the establishment of quality characteristics for Language Engineering products and systems.

The draft revised version of ISO 9126-1 will be presented by the document editor, Nigel Bevan, and technical feedback actively sought. Further information on ISO 9126 and related standards can be found in the paper on Quality in Use.

The EAGLES evaluation methodology

During the first round of EAGLES work (1993-1995) the Evaluation Group concentrated on producing a methodology for the design of evaluations of language engineering products. ISO 9126 was one of the main starting points for the work, although the group adapted and specialised the standard to take into account particular aspects of language engineering systems. Within an associated LRE project, TEMAA, a technique was developed for formalising the description of systems, of quality (sub)characteristics and associated metrics, and of users, through the use of feature structures; a prototype Evaluator's Workbench was also implemented.
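To give an intuition for the kind of formalisation involved, the following is a minimal sketch in which a system description and a user profile are encoded as simple attribute-value structures and checked against each other. All attribute names, values and thresholds here are invented for illustration; this does not reproduce the actual TEMAA formalism or the Evaluator's Workbench.

```python
# Toy feature-structure sketch (illustrative only, not the TEMAA formalism).
# A system is described by attribute -> measured value;
# a user profile states requirements as attribute -> acceptance predicate.

system = {
    "functionality.coverage": 0.92,    # fraction of test items handled (invented)
    "usability.setup_time_min": 15,    # minutes to install and configure (invented)
}

user_requirements = {
    "functionality.coverage": lambda v: v is not None and v >= 0.90,
    "usability.setup_time_min": lambda v: v is not None and v <= 30,
}

def unmet(system, requirements):
    """Return the attributes whose requirement the system fails to satisfy."""
    return [attr for attr, ok in requirements.items()
            if not ok(system.get(attr))]

failed = unmet(system, user_requirements)
print("adequate" if not failed else f"fails on: {failed}")  # prints "adequate"
```

The point of such a representation is that adequacy for a given user type reduces to checking measured values against that user's predicates, so different user profiles can be evaluated against the same system description.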

The theoretical framework developed within the context of EAGLES was put to practical test in both EAGLES and TEMAA.

The results of EAGLES I and of TEMAA will be presented, and feedback on future directions actively sought.

More detail on EAGLES I work, including a full copy of the final report and its annexes, can be found at the EWG home page.

Towards a reference system for speech technology

Performance assessment tests using opinion scales form the basis for rating the speech quality of all telephony transmission systems. Recent work has explored how these techniques may be adapted to evaluate other speech technologies such as Text to Speech and Automatic Speech Recognition systems. Denis Johnston of British Telecom will present this work, giving an overview and critical discussion of a variety of metrics, showing how and why this approach has been adopted, and describing how a new reference system has been defined.
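As a concrete illustration of an opinion-scale metric, the following sketch computes a Mean Opinion Score (MOS) from listener ratings on the standard 5-point opinion scale (1 = bad, 5 = excellent), the kind of measure conventionally used to rate telephony speech quality. The ratings below are invented example data, not results from the work presented.

```python
# Minimal MOS sketch (illustrative data): each listener rates a speech
# sample on a 5-point opinion scale; the MOS is the mean of the ratings.

ratings = [4, 5, 3, 4, 4, 5, 3, 4]  # one score per listener (invented)

mos = sum(ratings) / len(ratings)
print(f"MOS = {mos:.2f}")  # prints "MOS = 4.00"
```

Adapting such scales to technologies like speech synthesis or recognition raises the questions the talk addresses: what the listeners are asked to judge, and against which reference conditions the scores are anchored.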

Sharing evaluation know-how, resources and results

Coherent approaches to evaluation design
Karen Sparck Jones, who will present this section, has a long history of close involvement in evaluation, especially in the framework of the ARPA/DARPA programmes. She is co-author of a recent book on evaluation, which will serve as the basis for this session:

Karen Sparck Jones and Julia R. Galliers, "Evaluating Natural Language Processing Systems", Springer, 1996. (Lecture Notes in Computer Science 1083)

INUSE and usability evaluation 

The major objective of INUSE is to set up a network of Usability Support Centres across Europe to assist both companies and projects within the EC Telematics Applications programme. The basis of this support service is to provide a portfolio of state-of-the-art usability tools and methods which can be applied to a range of application areas. These may relate to user requirements generation (using methods produced by the related RESPECT project), prototyping and design, usability evaluation, and the implementation of new technology.

The project will be presented, and feedback actively sought.

More detail on the work of INUSE is available.

MEGATAQ and defining user needs 

The MEGATAQ objective is to provide Telematics Applications Projects (TAPs) with evaluation guidelines and consultancy. The evaluation techniques are based on a scientific understanding of users' needs and, when applied, can aid in the development and validation of telematics applications and services.

The project will be presented and feedback actively sought.

More information about MEGATAQ is available.


Participation in the Workshop