Some NLP evaluation websites
This page is currently being updated.
Thank you for your understanding.
AMARYLLIS : French evaluation campaign for information access/retrieval systems (in French). First edition: 1996-1997, second edition: 1998-1999.
ATIS (Air Travel Information System, DARPA Spoken Language Systems) : evaluation of spoken dialogue systems on a given task (1989-1995).
DIET (Diagnostic and Evaluation Tools for Natural Language Applications) : European project aiming to develop methods and tools for glass-box evaluation in English, French and German (started 1997).
EAGLES (Expert Advisory Group on Language Engineering Standards, Evaluation Working Group) : European initiative, one of whose working groups has proposed a user-based evaluation methodology. Two phases: EAGLES-I and, since 1996, EAGLES-II.
ELSE (Evaluation of Language and Speech Engineering) : European initiative aiming to define a generic methodology for black-box, semi-automatic, quantitative evaluation.
FRACAS (A Framework for Computational Semantics) : European project that has developed a set of 350 DQA tests (DQA: Declarative + Question + Yes/no answer), written in English, that illustrate almost 100 basic semantic phenomena.
GRACE (Grammaires et Ressources pour les Analyseurs de Corpus et leur Evaluation, CNRS) : French evaluation initiative for morpho-syntactic taggers (1997).
JST-FRANCIL (Journées Scientifiques et Techniques du Réseau FRANCophone de l'Ingénierie de la Langue, an Aupelf-Uref programme) : summary of the Concerted Research Actions (ARC) on NLP evaluation (April 1997, proceedings published).
MUC (Message Understanding Conferences, DARPA) : American black-box evaluation campaigns on several tasks involving filling predefined templates with information extracted from texts (proceedings published for MUC-3 through MUC-7, the latter in 1998).
SENSEVAL (Evaluating Word Sense Disambiguation Systems) : European project; automatic evaluation using a probabilistic algorithm and annotated corpora (September 1998, proceedings to be published). Its sub-project ROMANSEVAL covers Romance languages.
SUMMAC (First Automatic Text Summarization Conference, DARPA) : user-based evaluation of summarization at three levels (May 1998).
TDT (Topic Detection and Tracking Project, NIST) : segmentation, detection and tracking of a given topic in an information stream. TDT-1: 1997, TDT-2: 1998.
TEMAA (A Testbed Study of Evaluation Methodologies: Authoring Aids) : European project aimed at developing a user-based evaluation framework, building on the EAGLES work, and applying it to spelling checkers.
TIPSTER Text Program : DARPA effort to advance the state of the art in text processing technologies (document detection, information extraction, summarization); formally ended in fall 1998.
TREC (Text REtrieval Conferences, NIST and DARPA) : information retrieval evaluations, with a main track and workshops on more exploratory tasks. Proceedings published for TREC-1 through TREC-7; TREC-8 under way.
TSNLP (Test Suites for Natural Language Processing) : European project aimed at developing systematic test suites for testing the syntactic capabilities of NLP programs.