Multilingual Corpora for Cooperation

MLCC is a corpus acquisition project under the EAGLES program for International Science and Technology Cooperation, funded by the EC Telematics program and the Swiss Federal Government.

The aim was to collect a set of texts representing a substantial improvement in the range, quantity and quality of corpus material available. Two subcorpora have been defined to meet current needs for multilingual data: a comparable set of texts in six languages and a parallel set of data in nine languages.

The comparable text collection includes financial newspaper articles from the early 1990s. The parallel data is taken from the Official Journal of the CEC, sub-series Written Questions to Parliament, and the Proceedings of the European Parliament.

The data has been converted to TEI-conformant SGML mark-up. Negotiations are underway for distribution of the data.

Participants: LTG (Edinburgh) and ISSCO, with coordination by CNR (Pisa)