
Computerized tools

Document preparation tools

In what follows and throughout this report, the term "tools" is taken in its widest sense to mean anything which may help the work of the SdT as a whole. Planning and management tools are thus included, as well as tools more directly related to the production of translations.

One item which is not, strictly speaking, a computerized tool nonetheless plays an important role in facilitating the computerization of text handling: the EUROLOOK standard for document preparation. EUROLOOK is intended to ensure a uniform appearance of texts, and its use also facilitates conversion between the different text-processing systems in use. (Note that some of the requesting services use Word as their text-processing system rather than WordPerfect, which is standard throughout the SdT.)

At the end of 1994, the following tools were in use.

Word-processing
WordPerfect 5.2, integrated under Windows, is available on the PCs; Q-One is available on the terminals, which are being phased out.
Spelling checkers
spelling checkers are available for all Union languages.
Grammar checkers
the grammar checkers supplied with WordPerfect 5.2 are available, but are installed only on request.
Preparation and manipulation of tables
EXCEL
Converters
DIALOGIKA on UNIX/PC.
Preprocessing
the term preprocessing is used to mean the automatic replacement, in a source text, of terms and expressions by their translations in the target language. On the UNIX servers using the Q-One text processor, preprocessing was done by a program, "replace", which uses script files: bilingual files created manually in the Q-One editor. On the PCs, preprocessing is done using TMan, software created by AGL 3 to manage bilingual or multilingual lists of terminology and phraseology for automatic replacement in WordPerfect files. This tool, which has a limited distribution (c. 50 installations), is also used, primarily by AGL 3, for pre-EURODICAUTOM (q.v.) terminology processing and for computer-aided repetition analysis and multilingual concordancing of WordPerfect texts. Certain highly repetitive texts, such as the monthly Bulletin of Commission activities, are systematically preprocessed as soon as the source texts arrive. TMan is also being used jointly by AGL 3 and the Secretariat-General in pilot operations for the integrated multilingual production of structured texts.
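The replacement step described above can be illustrated with a minimal sketch. It assumes the bilingual list is available as a simple mapping from source expressions to target expressions; the function name `preprocess` and the sample entries are hypothetical and do not reflect the actual file formats of "replace" or TMan. Longer expressions are tried first, so that a multi-word phrase is not broken up by the replacement of one of its words.

```python
import re


def preprocess(text: str, glossary: dict[str, str]) -> str:
    """Replace source-language terms by their target-language equivalents.

    Hypothetical sketch: `glossary` maps source expressions to translations.
    Only whole words are matched, and longer expressions take precedence.
    """
    if not glossary:
        return text
    # Longest terms first, so the regex alternation prefers whole phrases.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(t) for t in terms) + r")\b")
    # A single pass means already-translated text is never replaced again.
    return pattern.sub(lambda m: glossary[m.group(1)], text)


glossary = {
    "European Parliament": "Parlement européen",
    "Parliament": "Parlement",
    "Council": "Conseil",
}
print(preprocess("The European Parliament and the Council", glossary))
```

Doing all replacements in one regular-expression pass, rather than one pass per term, avoids a later entry rewriting text that an earlier entry has already produced.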
Document comparison software
DOCUCOMP (on some PCs). Document comparison software is important for managing documents which may arrive in successive versions. The problems associated with version control are discussed further below.
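To show what document comparison involves, here is a minimal sketch using Python's standard `difflib` module. It is an illustration only, not a description of how DOCUCOMP works; the function name `changed_lines` is hypothetical. It compares two versions line by line and lists what was removed ("-") and what was added ("+").

```python
import difflib


def changed_lines(old: str, new: str) -> list[tuple[str, str]]:
    """List lines removed ('-') from and added ('+') to a document."""
    a, b = old.splitlines(), new.splitlines()
    changes: list[tuple[str, str]] = []
    matcher = difflib.SequenceMatcher(None, a, b)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # 'replace' counts as a removal followed by an addition.
        if tag in ("replace", "delete"):
            changes += [("-", line) for line in a[i1:i2]]
        if tag in ("replace", "insert"):
            changes += [("+", line) for line in b[j1:j2]]
    return changes


v1 = "Article 1\nArticle 2\nArticle 3"
v2 = "Article 1\nArticle 2 (amended)\nArticle 3"
print(changed_lines(v1, v2))
```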

Communications tools

Management tools

Translators' aids

EURODICAUTOM
the central terminology data base. EURODICAUTOM is very heavily used: to give an idea, it is accessed around 3,500 times a day.
PREDIC
An ad hoc system for preparing provisional (local) terminology and uploading it to the EURODICAUTOM mainframe.
TERMI
A server-based local terminology data base of limited functionality due for imminent replacement.
SYSTRAN
machine translation system. SYSTRAN is maintained and developed by D.G. XIII with assistance from the SdT. Any user inside the Commission, and some authorized users outside it, can access SYSTRAN by e-mail. The user may obtain a raw SYSTRAN translation with an automated search for references in the CELEX legislation data base, or a bilingual list of terms generated from a text by the SYSTRAN dictionaries. It should be noted that the EURODICAUTOM terminology is now integrated into the SYSTRAN dictionaries. There is also a service, used mainly by officials outside the SdT, whereby raw SYSTRAN output is rapidly post-edited by free-lances. Users of the search system for Parliamentary questions can also request an automatic SYSTRAN translation of the answer to a question.

Recent policy has been to concentrate on developing SYSTRAN as a tool for translating the less widely used languages into German, English and French. Along the same lines, there are Contracts of Association with some national administrations, e.g. the Greek government, for the development of certain language pairs.

DOCUMENTS
classification scheme for translation dossiers.
COM/SEC/C documents
References for translations. The abbreviations refer to the numbering schemes used for documents. COM documents are Commission documents, SEC documents are proposals for legislation, C documents are Council documents.
Documentary data bases
CELEX, ACTU, ECO1, PRC ...: of these, CELEX, a data base of legislation, court judgements and preparatory documents, is of primary importance.
External data bases
some fifty external data bases are available, including data bases of specialist and general newspapers and reviews.

It is worth noting that some of the available tools are used much more heavily than the rest. EURODICAUTOM and CELEX in particular are frequently consulted and much appreciated.

Multilingual generation tools

Maintenance, development and future plans

All of the computing machinery and tools listed above of course require maintenance. Many of the tools are also affected by the change from UNIX and Q-One to PCs running Windows and WordPerfect; their adaptation is either completed or under way. Other tools evolve over time or with the changing needs of the environment. Here, we simply pick out some of the important developments planned for the near future. This section also begins to take us into some of the problem areas connected with the present and future functioning of the SdT.

POETRY
Some modifications are needed to the pilot installation currently in use, after which it is planned to install the system throughout most of the Commission services. Building up a large electronic corpus of texts and translations is an essential pre-condition for the use of many new translation tools, of which translation memories are only the most obvious. Widespread use of POETRY will contribute to the creation of such a corpus.

However, encouraging the exchange of documents by electronic means does raise some problems. First, it is important that requests going through POETRY are also recorded in SUIVI. This is linked to one of the most common work-flow management problems caused by working electronically: because there is no physical object in the form of a pile of paper, people forget to send requests or send them twice, and sometimes forget that a document has been received. The solution here is a change in working habits, but experience with other electronic tools shows that such change can take time. Another problem is verifying that a request for translation has been authorised: signatures are not easily sent by electronic mail.

On a more mundane level, printing and photocopying may lead to organisational problems. Under the paper-based system, requesters prepare as many copies of a document as the number of languages into which they request translation, plus one extra. With an electronic system, this burden passes to the translation services, which must print out and copy the documents. The problem is aggravated by the difficulty the SdT has experienced in keeping secretarial staff. It would, of course, disappear if all translations were done entirely electronically. But several of the translators interviewed pointed out that even when they work on the screen, they like to have a paper copy of the translation at their side for reference: partly because scrolling backwards and forwards on the screen is tiresome and inefficient, partly to have an easily available overview of the text, and partly because looking at a screen all the time is very tiring.
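Returning to the problem of requests that are forgotten or sent twice: one safeguard is to check every incoming electronic request against a register of what has already been received. The sketch below is hypothetical; `RequestLog`, its fields and the document reference format are illustrative assumptions, not features of POETRY or SUIVI.

```python
from dataclasses import dataclass, field


@dataclass
class RequestLog:
    """Hypothetical register: document reference -> languages already requested."""
    received: dict[str, set[str]] = field(default_factory=dict)

    def record(self, doc_ref: str, languages: set[str]) -> set[str]:
        """Record a request and return only the languages not already requested.

        An empty result signals a duplicate request.
        """
        already = self.received.setdefault(doc_ref, set())
        new = set(languages) - already
        already |= new  # updates the set stored in the register in place
        return new


log = RequestLog()
print(log.record("DOC-1", {"FR", "DE"}))  # first request: both languages are new
print(log.record("DOC-1", {"DE", "EN"}))  # only EN is new; DE is a duplicate
```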

MULTIDOC
MULTIDOC is intended to integrate a number of existing documentary data bases (ACQUI, DOTBIB, GESPER, TRADOC, PC-BIB).

Document Server
This is a project to create a document server. In fact the server will be a virtual entity made up of all the hardware, software and organisational tools needed to manage electronic documents within the SdT. Eventually, the document server will be a repository for all original and reference documents, as well as for all translations.

A problem with the current functioning of the SdT that was mentioned very frequently is relevant to the document server. A document receives an identity when it arrives in the Translation Services and is given a translation number. In some cases, this number disappears when the document leaves the Services and is replaced by a COM, SEC or C number. Moreover, a single SdT document does not always correspond to a single COM or SEC document: a COM document may well consist of several SdT documents put together, and the converse also sometimes occurs. Furthermore, the text and its translations may subsequently be modified by the requester, by the Legal Services or by the Council. Sometimes, but by no means always, the modifications come back to the Translation Services. If they do not, the translations stored in the SdT archives do not correspond to the definitive version eventually published in the Official Journal. The translation may in any case be difficult to trace because of the change in numbering. The COM, SEC, C tool mentioned earlier is a partial response to this problem, but the only way to be absolutely sure that the definitive version is available for future reference would be to have access to the Official Journal versions, and perhaps to be able to search them by keyword rather than by document numbers, which are liable to change. Full-text search would also be useful. Unfortunately, making the text of the Official Journal available in electronic form for archiving and retrieval raises a number of legal problems, since the Publications Office has separate contracts with a number of different publishers for publication of the Official Journal. CELEX makes the official versions of some documents available, but the problem was nonetheless raised often enough to suggest that it is still felt to be pressing.
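Because one COM document may combine several SdT documents, and vice versa, any cross-reference between the two numbering schemes has to be many-to-many. A minimal sketch of such an index follows; the class `NumberMap` and the identifiers used in the example are hypothetical placeholders, not real SdT or COM numbers.

```python
from collections import defaultdict


class NumberMap:
    """Hypothetical cross-reference between SdT translation numbers and
    COM/SEC/C document numbers (a many-to-many relation)."""

    def __init__(self) -> None:
        self._to_official: defaultdict[str, set[str]] = defaultdict(set)
        self._to_sdt: defaultdict[str, set[str]] = defaultdict(set)

    def link(self, sdt_no: str, official_no: str) -> None:
        # Index both directions so either number can be used for lookup.
        self._to_official[sdt_no].add(official_no)
        self._to_sdt[official_no].add(sdt_no)

    def sdt_numbers(self, official_no: str) -> set[str]:
        # Return a copy so callers cannot alter the index by accident.
        return set(self._to_sdt.get(official_no, ()))

    def official_numbers(self, sdt_no: str) -> set[str]:
        return set(self._to_official.get(sdt_no, ()))


m = NumberMap()
m.link("SDT-A", "COM-X")  # one COM document built from two SdT documents
m.link("SDT-B", "COM-X")
print(sorted(m.sdt_numbers("COM-X")))
```

Keeping both directions indexed means a translation can be traced from the published number back to the SdT number and forward again, which is exactly what the change of numbering makes difficult.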

It should also be noted that the practice of modifying translations after they have left the Translation Services can be a source of considerable frustration for the translators, who sometimes see their work "corrected" in ways that even introduce grammatical errors. And since the published version is the official version, translators may later find themselves having to quote the offending passage!

SEI-BUD
This is an integrated document preparation system for the draft budget proposal. Three main actors are involved: D.G. XIX (the authors), the Publications Office (printing) and the SdT.

The document consists of some 1,700 pages, and until now it has been prepared by literally cutting and pasting on paper to introduce the modifications made each year. Adding to the burden, even in the literal sense, the operation was carried out on A3 paper. The whole text was then re-typed at the Publications Office.

A prototype of the system allows the whole document to be stored in a central repository as a single document, in the form of "editorial objects" coded in SGML. Each object contains a segment of text. Other information, such as which part of the whole the object belongs to, what language it is in, what version it is and who is working on it, is associated with the object. A set of filters allows the various actors to use their own tools: WinWord, WordPerfect, Excel, Interleaf. This avoids re-typing, and also allows translation of parts which are unlikely to change to proceed without waiting for the whole document to be finished.
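An editorial object can be pictured as a text segment carrying its own metadata. The sketch below assumes a tagged representation and uses XML (a subset of SGML) through Python's standard `xml.etree.ElementTree`; the element name `edobj`, its attribute names and the class `EditorialObject` are hypothetical and are not taken from the SEI-BUD prototype. The round trip stands in for the filters that let each actor work in a different tool.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class EditorialObject:
    """Hypothetical editorial object: one text segment plus its metadata."""
    part: str      # which part of the whole document the segment belongs to
    language: str
    version: int
    assignee: str  # who is currently working on the segment
    text: str

    def to_markup(self) -> str:
        elem = ET.Element("edobj", part=self.part, lang=self.language,
                          version=str(self.version), assignee=self.assignee)
        elem.text = self.text  # special characters such as '&' are escaped
        return ET.tostring(elem, encoding="unicode")

    @classmethod
    def from_markup(cls, markup: str) -> "EditorialObject":
        elem = ET.fromstring(markup)
        return cls(elem.get("part"), elem.get("lang"), int(elem.get("version")),
                   elem.get("assignee"), elem.text or "")


obj = EditorialObject("II.3", "fr", 2, "unit-x", "Crédits & paiements")
markup = obj.to_markup()
print(markup)
```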

The system was successfully put into operation for the first time in 1995. Reactions from the three main actors are positive.

D.G. XIX had to change its working habits from working on paper to working on an electronic document, and had to hire auxiliary typing staff to handle the data entry. Nonetheless, the advantage of having a clean electronic copy immediately available for internal distribution outweighed the minimal rise in cost.

For the SdT, the improvement was very noticeable. It proved possible to meet the deadlines easily, and since the document entered mainline production, it caused no major perturbation to the normal functioning of the service.

The Publications Office experienced the most problems, primarily because the printer was unable to adapt to processing SGML files instead of entering all the data from scratch. Consequently, the expected time savings did not materialize, although the deadlines were kept. Experience should help to iron out most of the difficulties.

The system will now be extended to include the other two Institutions, the Council and Parliament. It will also serve as a model for a new project, SEI-Leg (for legislation), which is to provide a central repository for Commission documents being worked on.

All in all, the experience proved the usefulness of storing documents in SGML, which makes them independent of the different word-processing systems used by the various actors and of changes over time arising from successive versions of those systems.

APEX
APEX is the management system used to co-ordinate free-lance translation. (We have already noted that at the end of 1994 about 17% of the total volume of translation was carried out by free-lances, and that this percentage grew to about 20% in 1995. In the longer term, the proportion may be as much as 30%.)

The procedure for selecting free-lance translators has recently changed. Previously, most translation units had a group of free-lances with whom they worked regularly. The free-lances were almost "distant colleagues", who became used to the texts dealt with by a particular service, to Commission procedures and in-house jargon, and to the specific terminology of the unit's work.

However, the volume of work passed to free-lances became so large that a new selection procedure, aimed at transparency, had to be put in place. Jointly with the translation services of Parliament, a call for expressions of interest was published. The three thousand or so individuals or agencies who responded were asked to translate a brief text for each of their languages, and those whose work was satisfactory are now recorded on a central register. This means that any translation unit in the SdT can, subject to price constraints, call on the services of any free-lance, as can the translation services of Parliament. We shall see later that the change in procedures has given rise to some disquiet amongst the Heads of Department and those responsible for central or group planning. Here we simply note that the change in procedure requires a re-design of APEX.

SUIVI
The re-design of SUIVI ("follow-up"), the work-flow management system used to manage the passage of documents through the Translation Services, is well under way. One feature of the re-design is that information can be downloaded into local versions used by the translation units to facilitate their own day-to-day management.

Page Counter
This project will investigate the use of a page counter. The question is less straightforward than it may seem to the non-translator.

First, not all pages of translation are equal. Some texts are a great deal easier to translate than others. For example, an experienced translator, dictating and working on a type of text with which he is very familiar, has been known to produce up to fifty pages of translation a day. The same translator, working on a different kind of text, may be lucky to produce one page, especially if he runs into terminology problems requiring considerable research, or if the author of the original does not write the most lucid prose.

Then there is the question of what exactly to count: many documents come in successive versions, and only the modifications have to be translated from one version to the next. If a three-hundred-page document contains five minor modifications, it would be misleading, to say the least, to claim that three hundred pages were translated. On the other hand, it might also be misleading to count only the volume of the modifications: translating one sentence may involve reading the whole section in which it is embedded, or even more, to get sufficient context.
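The two extremes discussed above can be made concrete by computing both figures for a new version: its full length, and the amount of inserted or replaced text. This is a sketch under stated assumptions: `words_per_page` is a conversion factor the caller must supply (no SdT standard is implied), the function name `page_counts` is hypothetical, deletions are not counted, and neither figure captures the effort of reading the surrounding context.

```python
import difflib


def page_counts(old: str, new: str, words_per_page: int) -> tuple[float, float]:
    """Return (pages in the new version, pages of inserted or replaced text)."""
    old_words, new_words = old.split(), new.split()
    # autojunk=False: otherwise very frequent words such as "the" may be
    # ignored when matching long documents, distorting the comparison.
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)
    changed = sum(j2 - j1 for tag, _, _, j1, j2 in matcher.get_opcodes()
                  if tag in ("replace", "insert"))
    return len(new_words) / words_per_page, changed / words_per_page


print(page_counts("a b c d", "a b x d", words_per_page=2))
```

In practice the true effort lies somewhere between the two numbers, which is precisely why a page counter is hard to design.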

Translator's Workbench Systems
A number of commercial products are currently being evaluated within the Translation Services.

Once again, though, the particular work context of the SdT creates special needs. The products currently on the market are primarily oriented towards a single-user environment, where one translator gradually builds up his own archive of texts. Another factor in a large translation service is that staff are frequently interrupted in their work. Not all tools tolerate interruptions well: sometimes a user has to finish a job completely, or abandon it and restart from scratch, instead of being able to leave it for a while in mid-task.

Within a very large translation service, creating a translation memory is far from straightforward. It would be naive to think that every translation produced could simply and automatically be used to feed the translation memory. The new translation may not be consistent with previous translations of the same or very similar text elements, and the previous translations may in fact be preferable. Two people in two different departments may be translating very similar texts at the same time; the probability that they will produce the same translation is very low, and adding both to the archive may needlessly increase the amount of redundant material retrieved when the memory is consulted. The new translation may be due for revision, in which case it would be inappropriate to archive it before revision; and if the reviser is not working on an electronic copy, capturing his revisions may in any case be difficult. Thus there is a series of questions to do with the validation of translations, which have consequences for how the system can be used.
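The validation questions listed above can be expressed as a simple gate in front of the memory. In this hypothetical sketch (the class `TranslationMemory` and its return values are illustrative assumptions, not a feature of any product under evaluation), unrevised translations are held back, exact duplicates are not stored twice, and a translation that differs from an existing one is set aside for a person to decide rather than silently added.

```python
from dataclasses import dataclass, field


@dataclass
class TranslationMemory:
    """Hypothetical validated memory mapping source segments to translations."""
    entries: dict[str, str] = field(default_factory=dict)
    conflicts: list[tuple[str, str, str]] = field(default_factory=list)

    def submit(self, source: str, target: str, revised: bool) -> str:
        """Decide what happens to a candidate segment pair."""
        if not revised:
            return "held"        # do not archive before revision
        existing = self.entries.get(source)
        if existing is None:
            self.entries[source] = target
            return "added"
        if existing == target:
            return "duplicate"   # avoid storing redundant material
        # Inconsistent with an earlier translation: a person must choose.
        self.conflicts.append((source, existing, target))
        return "conflict"


tm = TranslationMemory()
print(tm.submit("draft budget", "avant-projet de budget", revised=True))
print(tm.submit("draft budget", "projet de budget", revised=True))
```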

One suggestion was that it would be helpful to make a strong distinction between an archiving system, a working system, and the individual's own previous copies.

There are also, of course, all the storage and retrieval problems associated with a translation service that produces more than a million pages of translation a year. One of the people interviewed pointed out that there is already a storage problem: documents cannot be kept indefinitely in electronic archives that allow easy access, because there is not enough space. Occasionally, a document that has been moved from the live archives to tape archives is needed. It can be retrieved with the help of the computer staff, but this sometimes takes more time than is available.

Local Terminology Tools
Again, a number of tools for use on the PCs to allow a user to create and use his own local terminology base are under evaluation, and individual users or groups of users have already created local terminology bases on their own initiative. Intuitively, such local terminology bases are a valuable potential source of material for the central terminology resources. But questions of validation very similar to those raised above in the context of translation memory systems arise here too.

With terminology, additional factors come into play. Several of the people interviewed mentioned that translators were sometimes reluctant to share terminology. This is not only because of a proprietary attitude to the fruits of their own labours. A translator may know that he has been forced to produce a translation in a hurry, and be somewhat uncomfortable about some of the solutions he has adopted, or at least feel that he would have liked more time to mull the problems over or to do research. In these circumstances, he will obviously be unhappy with any system which automatically seizes his solutions to feed a common resource.

Conversely, because every translator knows that he has sometimes been forced by urgency to take short-cuts, he may lack confidence in other people's solutions: they too may, after all, not have had the time to search for the really good solution. He will want, at the very least, to know the source of a solution.

At the time of writing, a strategy for reconciling local data with general availability is being discussed. The proposal is that local data should have a structure which is a subset of the EURODICAUTOM structure, so that it can be uploaded immediately, and that individual translators should be encouraged to upload their local files into a central area. The uploaded data will then be validated by the terminologists before being included in the generally available terminology. The open question, of course, is still how to encourage translators to upload their material. One way, it is felt, is to ensure that the generally available terminology is updated speedily, so that individual translators can see the results of their collaboration made concrete, in the form of improved general resources, within a couple of months.
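The proposed flow (local entries restricted to a subset of the central structure, uploaded to a staging area, then validated by terminologists) might look like the sketch below. The field names in `CENTRAL_FIELDS` are hypothetical, since the actual EURODICAUTOM record structure is not given here, and `TermStore` is an illustrative name. The uploader is recorded as the source of each entry, which addresses the point that translators want to know where a solution came from.

```python
# Hypothetical field names; the real EURODICAUTOM structure is not specified here.
CENTRAL_FIELDS = {"term", "language", "equivalent", "domain", "definition", "source"}


class TermStore:
    """Hypothetical staging area plus validated central terminology."""

    def __init__(self) -> None:
        self.staging: list[dict[str, str]] = []  # uploaded, awaiting validation
        self.central: list[dict[str, str]] = []  # generally available

    def upload(self, entry: dict[str, str], author: str) -> None:
        # Local data must use only fields of the central structure.
        extra = set(entry) - CENTRAL_FIELDS
        if extra:
            raise ValueError(f"fields not in central structure: {sorted(extra)}")
        # Record the uploader as the source, overriding any value supplied.
        self.staging.append({**entry, "source": author})

    def validate(self, index: int) -> dict[str, str]:
        """Called by a terminologist: move one staged entry to the central base."""
        entry = self.staging.pop(index)
        self.central.append(entry)
        return entry


store = TermStore()
store.upload({"term": "t", "language": "en", "equivalent": "e"}, "translator A")
print(store.validate(0))
```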

New organisational structures
A very recent innovation is the creation of new organisational structures within which work on new tools can be concentrated. The aim is to develop and continuously test new tools by using them to translate suitable types of documents, in particular repetitive and/or urgent texts.

Two different but parallel structures are foreseen, a "Translation Workshop" ("Atelier de Traduction") in Brussels, and a "Modernisation Network" ("Réseau de Modernisation") in Luxembourg.

Translation workshop
The strategy here is to bring together translators and linguists from AGL (both assigned on a voluntary basis for one year), with some secretarial support, to form a new task force of some twenty staff. One of the main ideas is that the volunteers will become specialists in the use of new tools and will be able to pass on their expertise to their own units when they return.

Modernisation Network
The strategy here is that participants in the network stay in their own units, with a loose structure holding them together. The network is coordinated by a member of the AGL staff. The idea is to test how new tools, mostly preprocessing tools and translators' workbenches, can be integrated into the existing structures of the Service, thus avoiding the need to create new ones.

Inter-Institutional Collaboration
Inter-Institutional Collaboration touches on three main issues.

The first of these is recruitment and training, where the possibility of organising common entry examinations is being investigated, and where training courses are being jointly organized to ensure that each course can reach the critical mass required.

The second of these is concerned with complementarity in management, the idea being that the translation service of one Institution can help out that of another in time of need, or, for example, during the summer when staff are less available and when work loads are very variable across the Institutions.

The third concerns terminology, documentation and new computer aids. There is considerable activity in these areas.

Any translation service faces the problem of obtaining, validating and storing terminology. A very large translation service, dealing with documents in a great many different areas and needing to ensure consistency across large groups of people and large numbers of documents, faces the problem to an even greater degree. We shall return to some of these problems later.

Here, though, we should mention projects to alleviate some of the problems through collaboration with the other European Institutions. The Council of Ministers' terminology base, TIS, can already be consulted, and an update procedure is being implemented.

More ambitiously, a decision to create a single terminological data base for the European Union has been taken; implementing the decision will involve tackling and resolving a number of quite complex organisational issues.

The main issue in documentation is the creation of a single numbering scheme for document identification. (We have already mentioned the complications caused by COM, SEC and C numbers and their failure to correspond.) This too is quite a complex issue, and will have to be resolved at the level of the Secretary General.

There are also ongoing inter-institutional discussions on document archiving and translators' workbenches, which are intended to lead to common technical specifications and to joint calls for tenders for translators' workbench tools.

Information access and exchange
There are plans to create a bulletin board service accessible to all SdT staff, which will allow them to post notices and exchange information.

Pilot tests of Internet access have also been carried out and are continuing, although on a fairly limited scale.

A CD-ROM server will be put into prototype service in 1995.

Distribution of tools

Most tools are distributed freely to those who request them. However, some tools are perhaps best suited to use by specialists, and there is some feeling that their distribution should be more limited. Examples include some machine-aided translation tools which are not yet entirely satisfactory: they may be slow, and their interfaces are not very intuitive.

Some tools by their nature require different levels of access for different people. SUIVI is an obvious example. Another example is certain data management tools, such as terminology bases and translation archives, where some users will have read-only access, some will be able to add entries but not modify them, and so on.
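Graded access of this kind is commonly modelled as a set of permission flags combined into profiles. The sketch below uses Python's standard `enum.Flag`; the names `Access`, `allowed` and the three profiles are hypothetical. The read-only and add-but-not-modify profiles correspond to the text, while the full `EDITOR` profile is an assumed example of the "and so on".

```python
from enum import Flag, auto


class Access(Flag):
    """Hypothetical permissions on a shared resource such as a terminology base."""
    READ = auto()
    ADD = auto()
    MODIFY = auto()


READ_ONLY = Access.READ
CONTRIBUTOR = Access.READ | Access.ADD                # may add but not modify
EDITOR = Access.READ | Access.ADD | Access.MODIFY     # assumed full profile


def allowed(profile: Access, action: Access) -> bool:
    """True if the profile includes the requested action."""
    return action in profile


print(allowed(CONTRIBUTOR, Access.ADD), allowed(CONTRIBUTOR, Access.MODIFY))
```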

It should also be noted that where a variety of tools for doing roughly the same job are commercially available, for reasons of support and maintenance, a user cannot necessarily have the particular tool he would like, but only the tool which has been centrally approved. As might be expected, people do not always agree with the central decision, and may sometimes buy and install the tool of their choice.



ceditor@tnos.ilc.pi.cnr.it