Some NLP evaluation websites

This page is currently being updated.
Thank you for your understanding.

AMARYLLIS : French evaluation campaign for information access/retrieval systems (in French). First edition: 1996-1997, second edition: 1998-1999.

ATIS (Air Travel Information System – DARPA Spoken Language Systems) : evaluation of spoken dialogue systems on a given task (1989-1995).

DIET (Diagnostic and Evaluation Tools for Natural Language Applications) : European project aiming to develop methods and tools for glass-box evaluation, for English, French and German (started 1997).

EAGLES (The Expert Advisory Group on Language Engineering Standards – Evaluation Workgroup) : European initiative, one of whose working groups has proposed a user-based methodology for evaluation. Two phases: EAGLES-I and, since 1996, EAGLES-II. (final report)

ELSE (Evaluation of Language and Speech Engineering) : European initiative aiming to define a generic methodology for black-box, semi-automatic, quantitative evaluation.

FRACAS (A Framework for Computational Semantics) : European project that has developed a set of 350 DQA tests (DQA : Declarative + Question + Yes/no answer) illustrating, in English, almost 100 basic semantic phenomena.

GRACE (Grammaires et Ressources pour les Analyseurs de Corpus et leur Evaluation – CNRS) : French evaluation initiative for morpho-syntactic taggers (1997).

JST-FRANCIL (Journées Scientifiques et Techniques du Réseau FRANCophone de l'Ingénierie de la Langue, an Aupelf-Uref programme) : summary of the Concerted Research Actions (ARC) on NLP evaluation (April 1997, published proceedings).

MUC (Message Understanding Conferences – DARPA) : American black-box evaluation campaigns on several tasks related to filling given templates with information extracted from texts (published proceedings: MUC-3 to MUC-7 in 1998).

SENSEVAL (Evaluating Word Sense Disambiguation Systems) : European project, automatic evaluation using a probabilistic algorithm and annotated corpora (September 1998, proceedings to be published). Sub-project ROMANSEVAL for Romance languages.

SUMMAC (First Automatic Text Summarization – DARPA) : three levels of summarization evaluation, user-based (May 1998).

TDT (Topic Detection and Tracking Project – NIST) : segmentation, detection and tracking of a given subject in an information flow. TDT-1 : 1997, TDT-2 : 1998. (TDT-2)

TEMAA (A Testbed Study of Evaluation Methodologies: Authoring Aids) : European project, aimed at developing a user-based evaluation framework, following the EAGLES action. Application to spell checkers.

TIPSTER Text Program : DARPA effort to advance the state of the art in text processing technologies; formally ended in the fall of 1998. Areas: document detection, information extraction, summarization.

TREC (Text REtrieval Conferences – NIST and DARPA) : evaluations in information retrieval. Main track and workshops on more exploratory tasks. Proceedings published for TREC-1 to TREC-7; TREC-8 ongoing.

TSNLP (Test Suites for Natural Language Processing) : European project aimed at developing systematic test suites to test the syntactic capabilities of NLP programs.