One of the major benefits sought from the developing EAGLES evaluation framework is reusability. Reusable evaluation resources not only save the effort of reinventing the wheel, but also promote comparability and standardisation through the use of similar methods in different evaluations. The Parameterisable Test Bed (PTB) sketched in section Towards formalisation and automation of the main report foresees libraries of attributes and associated measures and methods being built up, together with guidelines on how to put them together to make up a new evaluation. Such libraries can be envisaged from the requirements side too. This section sketches, in the context of the application areas covered in the EAGLES work so far (grammar checkers, spelling checkers, and translators' aids), some aspects of reusability that should be relevant to the requirements analysis process, and could be included in an envisaged `PTB+'.
We will look at aspects of the requirements analysis process covered in the last section, and for each, consider what use can be made of commonalities, and what constraints we need to place on representations in an envisaged PTB+ to make this possible.
In the previous section, the first step was to construct a description of the task at a level that provides a reasonably valid representation of user requirements, separate from system considerations. It was suggested that it is useful to start from a simple process model of the situation, which represents the data flows and the rôles of data transforming processes in the setup.
For tasks which are essentially document transformation filters, this is relatively straightforward, since the state of the document before the filter is a given (dependent on the setup), and the required state of the document afterwards can usually be determined by analogy with human processes. The system under test can be evaluated, at this highest level, in terms of comparisons between two document types, the input and output of a process very much like that illustrated in Figure C.5.3. All proof-reading, including spelling checks and grammar checks, can be described at this level of abstraction by the same task model. Even translation can be accommodated to the same model. The task description might be given as
(F1) The filter transforms the text produced by the writer into the text required by the reader.
The commonality can be maintained at the level of the next process model, which includes co-operation between a human and the software in carrying out the task, and hence allows evaluation of the quality of advice. The task model at this level has the basic structure of a particular sub-type of text transformation systems, namely interactive or computer-assisted text transformation systems, and once again this is more or less applicable to both writers' aids and translators' aids.
Figure C.6 illustrates the relationships.
Figure C.5: Part of an Abstraction Hierarchy for Text Transformation Systems
The benefit of identifying the commonality is that once a common task model at a particular level of abstraction has been developed, requirements for particular quality characteristics that are relevant at that level can be associated with that model. They are then available as prompts suggesting the investigation of their relevance for new evaluations whose setups appear to match the existing task model. Identifying hierarchies of task models and associated requirements allows potentially relevant requirements to be inherited from higher levels. For instance, for all nodes underneath Semi-automatic in Figure C.6, it will be relevant to consider quality requirements to do with the suitability of the information presented by the system for the particular end-user. Some simple quality requirements can be fully expressed at this level, for instance, the requirement that the language of the information presented by the system be one in which the end-user is adequately proficient. This is independent of whether the information is about suggested corrections or suggested translations.
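The inheritance of requirements down the hierarchy of task models can be sketched as follows. This is only an illustrative rendering, not part of any EAGLES specification: the class, node and requirement names are invented for the purpose of the example.

```python
# Illustrative sketch: quality requirements attached to nodes of a task-model
# abstraction hierarchy, inherited by all descendant nodes.

class TaskNode:
    def __init__(self, name, parent=None, requirements=None):
        self.name = name
        self.parent = parent
        self.requirements = list(requirements or [])

    def all_requirements(self):
        """Requirements declared at this node plus those inherited from ancestors."""
        inherited = self.parent.all_requirements() if self.parent else []
        return inherited + self.requirements

transformation = TaskNode("Text transformation")
semi_automatic = TaskNode("Semi-automatic", parent=transformation,
                          requirements=["presentation language suits end-user"])
spelling = TaskNode("Spelling checking", parent=semi_automatic,
                    requirements=["suggested corrections are plausible"])

# A spelling-checker evaluation sees both its own requirement and the one
# inherited from the Semi-automatic node.
spelling.all_requirements()
```

The point of the sketch is that a requirement such as the suitability of the presentation language is stated once, at the Semi-automatic node, and need not be restated for each of its children.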
Task descriptions, such as (S1), (S2) and (F1), consist of a number of data processing rôles (writer, reader, filter, editor...) and data types (text, error...). Each carries parameters for any associated factors that affect requirements. These rôles and data types can be thought of as arranged in inheritance hierarchies, so that for example all data types that are subtypes or specialisations of text will have a parameter language; this serves as the basis for a simple quality requirement that can be applied to all systems of this type, namely that the nominal language which the system deals with should be that of the text it is required to process.
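The idea that every specialisation of the text data type inherits a language parameter, and that a generic quality requirement can be stated over it, might be rendered as follows. All class and field names here are invented for illustration.

```python
# Hypothetical sketch: data types arranged in an inheritance hierarchy, so that
# every subtype of Text carries a `language` parameter, supporting the generic
# requirement that the system's nominal language match that of the text.

from dataclasses import dataclass

@dataclass
class Text:                  # root data type: every text has a language
    language: str

@dataclass
class SourceDocument(Text):  # specialisations inherit the parameter
    domain: str

def language_requirement_met(system_language: str, text: Text) -> bool:
    """Generic quality requirement applicable to all text-processing systems."""
    return system_language == text.language

doc = SourceDocument(language="fr", domain="legal")
language_requirement_met("en", doc)  # an English-only checker fails on French text
```

Because the requirement is stated against the root type, it applies unchanged to any specialisation introduced lower in the hierarchy.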
The language of the end-user and of the system presentations are parameters that ought to live at the level of node Semi-automatic, together with the quality characteristics that depend only on them. More detailed requirements about the style or nature of the information belong lower down in the tree, where the task descriptions become more specific and requirements can be expressed in terms of suggestions for replacements, or diagnoses of errors.
New evaluations can be classified in a hierarchy like that in Figure C.6 by characterising the tree as a discrimination network based on differences in the values of parameters as well as the basic structures of the process models. If a new evaluation `fits' down to an existing level, but doesn't fit any of the children of that level, a new child node can be constructed. The methods by which its sibling nodes were decomposed from the common parent may be used as guidelines about how to fill in the task model and requirements at the new node.
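The classification procedure just described can be sketched in miniature: descend the tree while some child's parameter tests match the new evaluation, and where none matches, attach a new child node. The data structures and node names below are hypothetical, chosen only to illustrate the mechanism.

```python
# Minimal sketch of a discrimination tree over parameter values: fit a new
# evaluation as far down as possible, then create a new child where it stops.

def classify(node, params):
    """Return the deepest existing node whose tests the evaluation satisfies."""
    for child in node.get("children", []):
        test = child["test"]  # e.g. {"language": "en"}
        if all(params.get(k) == v for k, v in test.items()):
            return classify(child, params)
    return node

def add_child(parent, test, name):
    """Construct a new child node for an evaluation that fits no existing child."""
    child = {"name": name, "test": test, "children": []}
    parent.setdefault("children", []).append(child)
    return child

tree = {"name": "Spelling", "children": [
    {"name": "English spelling", "test": {"language": "en"}, "children": []},
    {"name": "French spelling", "test": {"language": "fr"}, "children": []},
]}

fit = classify(tree, {"language": "da"})  # fits only at the Spelling node itself
new_node = add_child(fit, {"language": "da"}, "Danish spelling")
```

In a fuller treatment the tests would discriminate on the structure of the process model as well as parameter values, but the descend-then-extend pattern is the same.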
For instance, if we have evaluations for spelling checkers in a number of languages (French and English, say), we may want to add one for a new language (Danish, say). At the very top level, as it happens, the text input and output to the filter process has a parameter language; we would realise at this point that none of the available values were suitable. Then we would work our way through the discrimination tree to the point where the difference in language makes a difference to a decomposition of a task description, somewhere below the Spelling node where the specific errors begin to be described at a language-specific level. Once a new value for the language parameter had been submitted, none of the children at this level would match, since each has values filled in for the specific language with which it deals. If, however, there are specific knowledge acquisition methods associated with the decomposition of the idea of spelling errors into English spelling errors and French spelling errors, such as corpus gathering and analysis methods, or even pointers to types of literature that might contain relevant information, these `intensional' descriptions can be reused for the new language.
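The reuse of intensional descriptions might look something like the following sketch, in which the acquisition methods are stored on the decomposition step itself rather than on any single language node. The method descriptions are placeholders, not actual EAGLES resources.

```python
# Hypothetical sketch: knowledge-acquisition methods attached to the
# decomposition from "spelling errors" to language-specific spelling errors,
# so that a sibling node for a new language can reuse them directly.

decomposition = {
    "parent": "Spelling errors",
    "acquisition_methods": ["error-corpus gathering and analysis",
                            "survey of orthographic literature"],
    "children": {"en": "English spelling errors",
                 "fr": "French spelling errors"},
}

def extend(decomposition, language, name):
    """Add a sibling for a new language; return the reusable acquisition methods."""
    decomposition["children"][language] = name
    return decomposition["acquisition_methods"]

methods = extend(decomposition, "da", "Danish spelling errors")
# The intensional description (the methods) is reused; only the extensional
# content (the Danish-specific errors themselves) must be acquired afresh.
```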
It is not yet clear what the best way of formalising the structure of task descriptions and requirements elements is. The foregoing discussion is clearly illustrative, suggestive even, rather than precise. A number of representations or approaches might be suitable: systemic networks, models of inheritance in the lexicon, straightforward object-oriented methods and classic knowledge representation techniques should all be considered in further elaboration of the sketches given here.