The first phase in the design of the knowledge base is a general knowledge acquisition phase, in which we follow, to some degree, the guidelines for knowledge-based systems described, for example, in (Boose90). The goal is to:
In our work, this phase was divided into two stages: an initial inquiry stage and a detailed investigation stage, the latter including an analysis of technical documents from a specific domain. In the second stage, we interviewed and observed 12 technical writers, translators and decision makers at different phases of the production process for documentation and technical manuals over a 12-month period, according to the methods described in (Norman85) and (Card83) (cf. (Ross91) and (Ross92)). The interviews were first open and then semi-structured; the answers and observations were recorded in a protocol book. For the formalisation of the identified tasks, we used a modified question-answering procedure (Graesser78).
The formal model that guided our empirical research is that of (Flower80), which provides a general framework for the writing process. To it, we have added the knowledge acquisition phase in which the technical writer prepares the documentation process. We did so because we observed that the technical writer draws upon a plan or outline to divide the writing task into nearly independent subtasks, chosen so that they can be achieved separately, without constant attention to possible interactions between them. This plan acts as a structural constraint on the delineation and ordering of subtasks. The subtasks are not fully independent; instead, they are subject to contextual constraints, such as conventions of writing style, simplified natural language (syntax and semantics), and consistency of abbreviations and terminology. A further set of constraints can be characterised as resource constraints: the text may need to include particular references, figures and graphics, or certain abbreviations or jargon may have to be avoided. A detailed analysis of the production process is described in (Ross93); it consists of eight main phases:
Our focus is on phases 2-4 and 6, which constitute the actual writing process. In most cases, phase 1 is also executed by a technical writer, although decision makers are involved as well. Phase 5 is a control phase in which the product is checked against the planning phase and other constraints of the enterprise. The last two phases are performed by persons other than the technical writer.
Our belief is that a true purpose-driven information repository can only be approached asymptotically. Our construction method involves a stepwise folding in of different information clusters one at a time. We have to distinguish between:
We believe that appropriate interfaces can be designed and implemented with systems that deal with knowledge-based computational terminology, natural language processing and machine translation.
We have to distinguish between the knowledge acquisition phase that builds up a first instantiation of the information repository and the knowledge acquisition phase that tailors the information repository to a specific need. The former is normally done by a knowledge engineer, whereas the latter is done by the technical writer and corresponds to the phase in which the technical writer gathers all the information he needs in order to write a technical document (phase 2 above). This phase has two main tasks:
The first task involves the identification of specific relations among groups of concepts, the clustering of concepts and the modification of their representation for the envisaged tasks, and results in:
In this phase, one has to obey several constraints, such as norms that are, on the one hand, specified by the customer and, on the other hand, inherited from the domain, the applicable tools and the sources which supply information. In order to perform this task effectively, the technical writer needs appropriate support in the form of a sophisticated information retrieval tool and tools that allow for consistent treatment of terminology and norms.
The second main task of the technical writer is audience analysis. This task is crucial because technical writing is essentially a message planning task. The quality of the technical document depends on how well the audience analysis has been done. The main goals are to:
The output of this task is:
In practice, technical writers perform audience analysis intuitively. There is therefore a need for a methodology, supported by appropriate software tools, for adapting the knowledge to different audiences.
The knowledge acquisition phase for initialising the repository is concerned, on the one hand, with the conceptual analysis of the domain and, on the other hand, with building up the actual conceptual structure of the domain and with the conceptual engineering of the tasks that are relevant for documentation and manual production. For the former, we envisage re-using existing domain ontologies. In doing so, we also aim to demonstrate the feasibility of sharing knowledge resources across projects, which is essential to real-world applications. If the ontology contains links to linguistic realisations and formalised concept definitions, it is possible to define the relationships between the ontologies and the documentation models with their associated tasks. This, in turn, allows for a task-oriented view over the ontology.
The knowledge obtained in both acquisition phases is then processed in three steps for our purposes:
A further step would be to merge the results with existing interlingual ontologies in order to enable the link to language-dependent linguistic realisations (multilinguality).
The methodology adopted is based not on intuitive grounds about what is and is not `true' about the considered world, but instead on empirical and practical concerns, namely what data, information and knowledge the document production processes require in order to perform different tasks.
After the initial knowledge acquisition process, the resulting knowledge structures contain, declaratively and explicitly represented, those distinctions required both to control the document production process in various ways and to acquire new information in a guided and tailorable way. The latter is a crucial aspect of knowledge-based systems, especially in real-world industrial applications. Thus, the knowledge base maintains three different conceptual structures, i.e. a domain model, a document model and a task model, to permit reuse in other application domains. The links between the models specify the application-oriented relationships between the knowledge sources.
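As an illustration only, the separation into three conceptual structures connected by explicit links could be sketched as follows. All names used here (`KnowledgeBase`, `Link`, `add_link`, `related`, `swap_domain`) are hypothetical and are not taken from the system described in this paper; the models are reduced to plain sets of entry names.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

# Hypothetical sketch: an entry is addressed as (model name, entry name).
Entry = Tuple[str, str]


@dataclass(frozen=True)
class Link:
    """An application-oriented relationship between entries of two models."""
    source: Entry
    relation: str
    target: Entry


@dataclass
class KnowledgeBase:
    """Three separately maintained models plus the links between them."""
    domain: Set[str] = field(default_factory=set)
    document: Set[str] = field(default_factory=set)
    task: Set[str] = field(default_factory=set)
    links: List[Link] = field(default_factory=list)

    def _model(self, name: str) -> Set[str]:
        return {"domain": self.domain,
                "document": self.document,
                "task": self.task}[name]

    def add_link(self, source: Entry, relation: str, target: Entry) -> None:
        # A link may only connect entries that exist in their models.
        for model, entry in (source, target):
            if entry not in self._model(model):
                raise KeyError(f"{entry!r} is not in the {model} model")
        self.links.append(Link(source, relation, target))

    def related(self, model: str, entry: str) -> List[Tuple[str, Entry]]:
        """Return (relation, target) pairs for links leaving the entry."""
        return [(l.relation, l.target)
                for l in self.links if l.source == (model, entry)]

    def swap_domain(self, new_domain: Set[str]) -> "KnowledgeBase":
        """Reuse the document and task models with another domain model.

        Links that touch the old domain model are dropped; all others are kept.
        """
        kept = [l for l in self.links
                if "domain" not in (l.source[0], l.target[0])]
        return KnowledgeBase(set(new_domain), set(self.document),
                             set(self.task), kept)
```

Because the links are stored apart from the models, exchanging the domain model leaves the document and task models and the links between them intact, which is the kind of reuse the separation is meant to permit.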
For the actual representational devices of the repository, we are considering, besides classical knowledge representation formalisms, a formalism based on typed feature logics (cf. (Carpenter92)) which will allow for an appropriate (virtual) interface to unification-based natural language processing systems.
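To indicate why a feature-based formalism interfaces naturally with unification-based systems, the following minimal sketch shows unification of two feature structures. It is a deliberate simplification and not the formalism of (Carpenter92): it omits the type hierarchy and structure sharing (re-entrancy), represents feature structures as nested Python dictionaries with string atoms, and the function name `unify` and the example features are our own assumptions.

```python
def unify(a, b):
    """Unify two feature structures; return the result, or None on failure.

    Simplifying assumptions: a feature structure is either an atomic
    string or a dict mapping feature names to feature structures.
    Types and re-entrancy are not modelled.
    """
    if isinstance(a, dict) and isinstance(b, dict):
        result = dict(a)  # copy the top level so the inputs are not modified
        for feature, value in b.items():
            if feature in result:
                merged = unify(result[feature], value)
                if merged is None:
                    return None  # conflicting values for the same feature
                result[feature] = merged
            else:
                result[feature] = value
        return result
    # Atoms unify only with identical atoms; an atom never unifies with a dict.
    return a if a == b else None


# Information from a concept entry and from a lexical entry is combined.
concept = {"cat": "noun", "agr": {"num": "sg"}}
lexical = {"agr": {"num": "sg", "per": "3"}, "lemma": "valve"}
combined = unify(concept, lexical)
clash = unify(concept, {"agr": {"num": "pl"}})  # None: sg conflicts with pl
```

Compatible information from both structures is merged, whereas conflicting values for the same feature cause unification to fail.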
In the knowledge organisation phase, information from the identified knowledge sources is layered into appropriate description devices. The goal of the technical writer is to transform the outline from the knowledge acquisition phase into a coherent representation by obeying the same constraints as in the acquisition phase. This involves such processes as:
The output is normally a hierarchy of concepts which corresponds to the model of the document and which is formalisable in terms of a document planning language (DPL).
The document production phase consists mainly of writing and reviewing tasks. This phase is realised by linguistic and graphical means according to the kind of document and its envisaged layout. As an intermediate product, the technical writer constructs a block of technical text: he linearises sequences of characters divided into words, lines, paragraphs, sections, etc. He reviews the global organisation of the document, moving and refitting large units of text. Then, he focusses on coherence relations among smaller segments within an intermediate-scale frame of reference, such as sections and paragraphs.
The output of this phase is a coherent technical document. The process corresponds to a linguistic encoding of the document plan that was the output of the knowledge organisation phase. The main constraint is to write with the appropriate audience clearly in mind.
In this phase, the technical writer needs linguistic tools that report syntactic and stylistic errors, such as a terminology consistency checker, a grammar checker and a spelling checker. Well-established and useful tools are available on the market, but they only support revisions at or below the level of the sentence and not across sentence boundaries. We therefore claim that a discourse information tool is necessary. While much effort has been put into resolving the grammatical and semantic issues in document production, little is as yet known about the role of the explicit signalling of discourse factors in determining the ultimate effectiveness of the produced document.
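By way of illustration, a terminology consistency check that reports its findings sentence by sentence over a whole text could be sketched as follows. This is a hypothetical minimal example, not one of the tools referred to above: the function name `check_terminology`, the variant table and the naive sentence splitter are all assumptions, and the check does not attempt any discourse analysis.

```python
import re


def check_terminology(text, variants):
    """Report non-preferred term variants, sentence by sentence.

    `variants` maps each non-preferred variant (in lower case) to its
    preferred term. Returns a list of
    (sentence_index, variant, preferred_term) tuples.
    """
    # Naive splitting on sentence-final punctuation (an assumption; real
    # technical text with abbreviations would need a proper segmenter).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    findings = []
    for index, sentence in enumerate(sentences):
        lowered = sentence.lower()
        for variant, preferred in variants.items():
            # Match the variant only as a whole word or phrase.
            if re.search(r"\b" + re.escape(variant) + r"\b", lowered):
                findings.append((index, variant, preferred))
    return findings


text = "Open the shut-off valve. Then close the stop valve again."
findings = check_terminology(text, {"stop valve": "shut-off valve"})
```

Here the second sentence is flagged because it uses a variant of the term preferred in the first. Such a check operates on the whole text but is still purely lexical.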