This project is concerned with the extraction of information from
machine-readable texts, and more particularly information which can be
used to assist in the construction of computer systems able to process
such texts efficiently and accurately.
Results include methods for automatically pairing words and phrases in
translated texts, detecting passages within an extended pair of texts which
may have been incorrectly translated, analysing input whose structure is known
only approximately, and extending a partial grammar by adding rules that cover
previously unrecognized input.
The aim of the project was to (i) develop and evaluate methods for extracting the information contained in corpora from a restricted domain and (ii) compare the different representations in order to find semantic equivalences between these units automatically. Information was represented by means of semantic tagging.
Final report (gzipped PostScript version, written in French).