Some NLP evaluation websites

Andrei.Popescu-Belis@issco.unige.ch

This page is currently being updated.
Thank you for your understanding.

AMARYLLIS : French evaluation campaign for information access/retrieval systems (in French). First edition: 1996-1997, second edition: 1998-1999.
http://www.inist.fr/accueil/profran.htm

ATIS (Air Travel Information System – DARPA Spoken Language Systems) : evaluation of spoken dialogue systems on a given task (1989-1995).

DIET (Diagnostic and Evaluation Tools for Natural Language Applications) : European project aiming to develop methods and tools for glass-box evaluation in English, French and German (started 1997).
http://www.dfki.de/lt/projects/diet-e.html

EAGLES (The Expert Advisory Group on Language Engineering Standards – Evaluation Workgroup) : European initiative, one of whose working groups has proposed a user-based methodology for evaluation. Two phases: EAGLES-I and, since 1996, EAGLES-II.
http://www.ilc.pi.cnr.it/EAGLES/home.html
http://www.issco.unige.ch/projects/ewg96/ewg96.html (final report)
http://www.cst.ku.dk/projects/eagles2.html

ELSE (Evaluation of Language and Speech Engineering) : European initiative aiming to define a generic methodology for black-box, semi-automatic, quantitative evaluation.
http://www.limsi.fr/TLP/ELSE/

FRACAS (A Framework for Computational Semantics) : European project that developed a set of 350 DQA tests (DQA: Declarative + Question + Yes/no answer) illustrating, in English, almost 100 basic semantic phenomena.
http://www.cogsci.ed.ac.uk/~fracas

GRACE (Grammaires et Ressources pour les Analyseurs de Corpus et leur Evaluation – CNRS) : French evaluation initiative for morphosyntactic taggers (1997).
http://www.limsi.fr/TLP/grace/

JST-FRANCIL (Journées Scientifiques et Techniques du Réseau FRANCophone de l'Ingénierie de la Langue, an Aupelf-Uref programme) : summary of the Concerted Research Actions (ARC) on NLP evaluation (April 1997, proceedings published).
http://www.limsi.fr/Recherche/FRANCIL/frcl.html

MUC (Message Understanding Conferences – DARPA) : American black-box evaluation campaigns on several tasks involving filling predefined templates with information extracted from texts (proceedings published from MUC-3 to MUC-7, the latter held in 1998).
http://www.muc.saic.com/

SENSEVAL (Evaluating Word Sense Disambiguation Systems) : European project; automatic evaluation using a probabilistic algorithm and annotated corpora (September 1998, proceedings to be published). Its sub-project ROMANSEVAL covers Romance languages.
http://www.itri.brighton.ac.uk/events/senseval/
http://www.lpl.univ-aix.fr/projects/romanseval/

SUMMAC (First Automatic Text Summarization – DARPA) : user-based evaluation of summarization at three levels (May 1998).
http://www.itl.nist.gov/iaui/894.02/related_projects/tipster/sumslides.htm

TDT (Topic Detection and Tracking Project – NIST) : segmentation, detection and tracking of a given topic in a stream of information. TDT-1: 1997, TDT-2: 1998.
http://www.nist.gov/speech/tdt98/tdt98.htm (TDT-2)

TEMAA (A Testbed Study of Evaluation Methodologies: Authoring Aids) : European project aimed at developing a user-based evaluation framework, following on from the EAGLES work, and applying it to spell checkers.
http://www.cst.ku.dk/projects/temaa/temaa.html

TIPSTER Text Program : DARPA effort to advance the state of the art in text processing technologies (document detection, information extraction, summarization); formally ended in the fall of 1998.
http://www.itl.nist.gov/iaui/894.02/related_projects/tipster/

TREC (Text REtrieval Conferences – NIST and DARPA) : evaluation of information retrieval systems, with a main track plus workshops on more exploratory tasks. Proceedings published from TREC-1 to TREC-7; TREC-8 ongoing.
http://trec.nist.gov/

TSNLP (Test Suites for Natural Language Processing) : European project aimed at developing systematic test suites for assessing the syntactic capabilities of NLP programs.
http://cl-www.dfki.uni-sb.de/tsnlp/