TSNLP - test suites for NLP

The TSNLP (Test Suites for Natural Language Processing) project is also an LRE project. It started in December 1993 and ended in October 1995. The partners in the project are researchers from the University of Essex, DFKI GmbH (Saarbrücken), ISSCO (Geneva) and Aérospatiale (Paris). The project is concerned with the design and use of test suites in NLP (see also (Balkan94)).

The background of the project is an increasing demand for large, systematic and well-documented test suites for use in the evaluation and development of linguistic applications. However, no test suites currently meet these demands. Existing test suites are characterised by a lack of morphological, semantic and extragrammatical phenomena, poor systematicity in testing ill-formed constructions and the co-occurrence of different phenomena, scanty documentation and annotation, and little generality. The results of applying such test suites are sometimes hard to interpret, and the suites themselves are difficult to reuse.

The project's main effort consists in the production of guidelines for test suite construction and substantial amounts of test data in three languages, i.e. English, French and German. Various tools for generating test suites automatically and for lexicon replacement have also been developed during the course of the project. The results of the project are in the public domain. The main part of the test suites produced covers core syntactic phenomena and is intended for testing systems involving general syntactic processing. The remaining part has been developed for specific applications, e.g. parsers, grammar checkers and controlled language checkers.

According to the guidelines for constructing test suites developed and used by TSNLP, a limited vocabulary should be used, syntactic and semantic ambiguities should be avoided, and sentences should be short and simple. The latter can be achieved, for example, by using declarative sentences in the present tense and avoiding modifiers and adjuncts. To ensure that the test suite also includes ill-formed sentences for the phenomenon under test, the parameters of the phenomenon should be identified and varied systematically, so that the phenomenon is represented exhaustively.
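The idea of systematic parameter variation can be illustrated with a minimal sketch. It is not taken from the TSNLP tools: the phenomenon (English subject-verb agreement), the two-word lexicon and all names in the code are assumptions chosen for illustration. Enumerating every combination of the two number parameters yields both the well-formed and the ill-formed items.

```python
from itertools import product

# Hypothetical mini-lexicon, kept deliberately small as the guidelines suggest.
SUBJECTS = {"sg": "The manager", "pl": "The managers"}
VERBS = {"sg": "works", "pl": "work"}


def generate_agreement_items():
    """Vary subject number and verb number exhaustively.

    An item counts as well-formed only when both parameters agree;
    every other combination is recorded as negative (ill-formed) data.
    """
    items = []
    for subj_num, verb_num in product(SUBJECTS, VERBS):
        items.append({
            "string": f"{SUBJECTS[subj_num]} {VERBS[verb_num]}.",
            "subject_number": subj_num,
            "verb_number": verb_num,
            "well_formed": subj_num == verb_num,
        })
    return items


items = generate_agreement_items()
for item in items:
    mark = " " if item["well_formed"] else "*"
    print(mark, item["string"])
```

With two binary parameters the sketch produces four short, present-tense declarative sentences, two of which are marked as ill-formed.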

A key issue in TSNLP has been the development of an annotation scheme for test suites. The scheme is intended to make test suites more general and reusable. It records, among other things, the string of words, its length, its category, its functional analysis and its well-formedness.
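As a rough illustration of what one annotated test item might look like, the following sketch models the features listed above as a small record. The class name, field names and value formats are assumptions made for this example and do not reproduce the actual TSNLP annotation format.

```python
from dataclasses import dataclass, field


@dataclass
class AnnotatedItem:
    """Hypothetical record for one test item (not the TSNLP format)."""
    string: str                 # the string of words
    category: str               # syntactic category of the string, e.g. "S"
    well_formed: bool           # False for negative (ill-formed) data
    functions: dict = field(default_factory=dict)  # functional analysis

    @property
    def length(self) -> int:
        # Length in words, ignoring a sentence-final full stop.
        return len(self.string.rstrip(".").split())


item = AnnotatedItem(
    string="The manager works.",
    category="S",
    well_formed=True,
    functions={"subject": "The manager", "predicate": "works"},
)
print(item.length, item.category, item.well_formed)
```

Deriving the length from the string, rather than storing it separately, keeps the two values from drifting apart when the vocabulary of an item is replaced.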

The project points out the advantages of test suites over corpora. These include data focused on specific phenomena in isolation or in controlled combinations, data reflecting systematic variation over a phenomenon, non-redundant data, negative data, and annotation. However, it is stressed that the two types of resource should complement each other. Another issue is the relation between test suites and evaluation type. The project points out that test suites are useful not only for diagnostic and progressive evaluation. It envisages that they can also serve adequacy evaluation, provided they are designed in a way that takes the frequency and relevance of the tested phenomena into account. This is done by providing annotations for the relevance of a test phenomenon to a particular application type, and for its frequency and weighting for a particular system and text type. However, actually providing values for these last features has been outside the scope of TSNLP.
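How such relevance and frequency annotations might enter an adequacy score can be sketched as follows. TSNLP did not supply values for these features, so the weights, the phenomenon names and the scoring formula below are purely illustrative assumptions: each test result is weighted by the weight of its phenomenon, and the score is the weighted share of passed items.

```python
def weighted_pass_rate(results, weights):
    """Share of passed test items, weighted per phenomenon.

    results: list of (phenomenon, passed) pairs, one per test item.
    weights: mapping from phenomenon to a non-negative weight, e.g. a
             product of relevance and frequency (hypothetical values).
    """
    total = sum(weights[phenomenon] for phenomenon, _ in results)
    if total == 0:
        raise ValueError("total weight must be positive")
    passed = sum(weights[phenomenon] for phenomenon, ok in results if ok)
    return passed / total


# Invented results for an imaginary system under test.
results = [
    ("agreement", True),
    ("agreement", True),
    ("coordination", False),
    ("negation", False),
]
# Invented weights: agreement is assumed to matter most for this application.
weights = {"agreement": 3.0, "coordination": 1.0, "negation": 1.0}

unweighted = sum(ok for _, ok in results) / len(results)
weighted = weighted_pass_rate(results, weights)
print(unweighted, weighted)
```

In this invented case the system passes half of the items, but because the passed items belong to the phenomenon assumed to be most relevant, the weighted score is higher than the plain pass rate.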

