Grammars which are Re-usable to Automatically Analyze Languages

The aim of the EUREKA research project GRAAL (the acronym stands for ``Grammars which are Re-usable to Automatically Analyze Languages'') is to provide a linguistic toolbox consisting of Natural Language Processing (NLP) modules that can be used to build a variety of NLP applications.

The budget, spread over 4 years, exceeds 20 million ECUs. The project brings together the skills of more than 50 people (1300 man-months) from 11 companies and research organisations in 6 European countries:

  1. GSI-Erli (France), project co-ordinator and leader for France,
  2. AEROSPATIALE (France),
  3. EDF (France),
  4. FIAT (Italy),
  5. Institute for Language and Speech Processing (ILSP) (Greece),
  6. Instituto de Linguistica Teorica e Computacional (ILTEC) (Portugal),
  7. IRIT-CNRS (France),
  8. IRST (Italy),
  9. ISSCO (Switzerland),
  10. LINGSOFT (Finland),
  11. NOKIA (Finland).

This project will considerably reduce the cost of developing applications that use automatic language processing techniques. Today such applications can be developed only at the cost of considerable effort on both grammars and dictionaries. It is therefore important to reduce costs by capitalising on the various components and re-using them as much as possible.

1. The GRAAL ToolBox, with its own grammar writing formalism, should allow industrial partners to benefit from recent results in NLP research. Methods of analysis and representation which have already proved useful in research systems can now be applied in full-scale industrial systems, where their utility and dependability can be further demonstrated.

The theoretical foundations of the GRAAL formalism are those of Typed Feature Structures and of unification-based systems, extended with a limited set of constraints. The grammars can thus be `declarative', `reversible' and `modular'. Grammar modularity is the key concept: it allows the grammar designer to build a `core' grammar, which can then be modified by extensions specific to an application or a set of applications.
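To make the idea of unification concrete, here is a minimal sketch in Python. It is not the GRAAL formalism: it assumes feature structures can be modelled as plain nested dictionaries with string values, and it leaves out types, shared (re-entrant) values and the additional constraints mentioned above. The function name `unify` and the features `cat`, `agr`, `num` and `per` are hypothetical, chosen only for illustration.

```python
# Simplified, untyped feature structures: nested dicts with string atoms.
# Assumption: no types, no re-entrancy, no extra constraints.

def unify(a, b):
    """Return the unification of a and b, or None if they are incompatible."""
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # copy the top level so that the inputs are not modified
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # conflicting values for the same feature
                result[feature] = merged
            else:
                result[feature] = value  # feature only present in b
        return result
    # Atomic values (or an atom against a structure) unify only if equal.
    return a if a == b else None


# Hypothetical lexical entry and two descriptions to unify with it.
noun = {"cat": "N", "agr": {"num": "sg"}}
third_person_singular = {"agr": {"num": "sg", "per": "3"}}
plural = {"agr": {"num": "pl"}}

print(unify(noun, third_person_singular))
# {'cat': 'N', 'agr': {'num': 'sg', 'per': '3'}}
print(unify(noun, plural))
# None
```

Because unification only combines compatible information and its result does not depend on the order of its arguments, the same descriptions can in principle serve both analysis and generation, which is one way to read the `declarative' and `reversible' properties mentioned above.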

For dictionaries, GRAAL re-uses the results of the GENELEX Eureka project, regarding models as well as lexicographic resources.

2. A considerable part of this project is dedicated to development, implementation and maintenance tools. Just as there are workbenches for software development, our objective in this project is to implement a workbench for developing linguistic applications, which will make it possible to:

  1. finalize generic grammars (development, interactive finalizing, tests, validation, management of releases...),
  2. implement applications using these grammars (characterisation of applications, adaptation of generic dictionaries and grammars, integration),
  3. maintain these applications (quality monitoring, evolution management, non-regression tests...).

3. Beyond the development of basic tools, GRAAL aims to implement pilot applications for the partners who are also users (AEROSPATIALE, EDF, NOKIA, FIAT). These applications will include automatic text indexing using various types of reference files (thesauri, term collections...), knowledge extraction (aid in building terminological bases, thesauri or knowledge bases...), and computer-aided translation and machine translation, in particular the translation of simplified languages. The systems foreseen will initially be concerned primarily with French, English and Italian.
