After this top-level definition of quality, we need to consider the systems we are interested in evaluating. Figure C.4 shows the place of the systems under consideration in the new task model:
Figure C.4: S3: Introduction of computational system.
At this level of analysis, another set of quality questions becomes relevant. The rôle of the human editor, and the relations between the advice from the system and that editor, become available for analysis. New quality requirements can be defined in terms of the newly introduced concepts. Further knowledge acquisition is required to determine how different types of human editor vary in the kind of advice they find useful. At this level of analysis, too, all sorts of non-functional quality characteristics of the system become relevant, from usability to compatibility with existing software environments.
This part of the modelling corresponds to the co-operation model in KADS, which specifies the interactions required between system and user. Here, KA methods of the sort discussed in section Knowledge acquisition for requirements analysis are needed to find out what particular kinds of editor (as end-user) need in the way of advice. This may mean consulting a teacher rather than a naive user, and it may require experiments.
Up to now, the requirements analysis has been entirely top-down. To decompose the basic recall and precision functionality requirement into useful sub-attributes, we need to take a partly bottom-up approach, based on the categories that are relevant to system performance. This must draw on prior experience of the kinds of system under consideration, and will therefore improve with repeated open-ended evaluation. For instance, it is only because we know something about the operation and limitations of spelling checkers that we might have a separate sub-attribute for their coverage of multi-word elements like ad hoc.
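The basic recall and precision requirement mentioned above can be made concrete with a minimal sketch; the function name and the example data here are hypothetical, not drawn from any particular evaluation:

```python
def recall_precision(flagged, true_errors):
    """Score a spelling checker's error flags against a reference.

    flagged: set of token positions the checker flagged as errors
    true_errors: set of token positions that actually contain errors
    """
    hits = flagged & true_errors
    recall = len(hits) / len(true_errors) if true_errors else 1.0
    precision = len(hits) / len(flagged) if flagged else 1.0
    return recall, precision

# Checker flags positions 2, 5, 9; the reference has errors at 2, 5, 7.
r, p = recall_precision({2, 5, 9}, {2, 5, 7})
# r = 2/3 (two of three real errors found), p = 2/3 (one false alarm)
```

Sub-attributes of the kind discussed here would correspond to computing these figures separately over different categories of test material, rather than over the test suite as a whole.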
For each task, we need to enumerate and classify the ways in which different systems fulfil it.
For instance, a problem-level requirement on spelling checkers for some users might be that customisations should be readily sharable between end-users. A reportable attribute called something like `sharability of user dictionaries' might take a nominal value for each of the ways that existing or envisageable systems satisfy the requirement, including values corresponding to various kinds of failure to satisfy it at a given level.
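The idea of a reportable attribute with a fixed set of nominal values can be sketched as a simple data structure; the class name and the particular values listed are illustrative assumptions, not part of any defined framework:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportableAttribute:
    """A named attribute whose measurements range over fixed nominal values."""
    name: str
    values: tuple  # one value per way a system can satisfy (or fail) the requirement

    def measure(self, observed):
        # A measurement is only valid if it is one of the defined nominal values.
        if observed not in self.values:
            raise ValueError(f"{observed!r} is not a defined value of {self.name!r}")
        return observed


# Hypothetical value set for the `sharability of user dictionaries' attribute.
sharability = ReportableAttribute(
    name="sharability of user dictionaries",
    values=(
        "shared central dictionary",
        "exportable/importable dictionary files",
        "manual re-entry only",
        "no user dictionary",
    ),
)
```

Note that failure modes ("manual re-entry only", "no user dictionary") appear as values alongside the ways of satisfying the requirement, as the text describes.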
For instance, in incorporating the results of checking back into the text, there are a number of options, including automatic incorporation, user incorporation, version control, access.... The idea of a spelling error can be subdivided according to the kind of text function in which it occurs, such as closed-class vocabulary or productive vocabulary (names falling somewhere in between), since different methods of error detection work for each. Specialist vocabulary raises the further question of how likely it is to make a difference. Development of these categories can be done to some extent a priori, on the basis of models of the structure of text and so on, and to some extent must be done empirically (cf. TEMAA).
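The subdivision of vocabulary into closed-class, productive, and intermediate name categories can be sketched as a toy classifier; the word lists and the heuristic for names are assumptions for illustration only:

```python
# Toy samples standing in for real lexical resources.
CLOSED_CLASS = {"the", "of", "and", "to", "in", "a"}
KNOWN_NAMES = {"Paris", "Smith"}


def vocabulary_category(token):
    """Assign a token to a vocabulary category, since different
    error-detection methods are appropriate for each."""
    if token.lower() in CLOSED_CLASS:
        return "closed-class"
    # Crude heuristic: capitalised tokens are treated as names,
    # the intermediate category noted in the text.
    if token in KNOWN_NAMES or token[:1].isupper():
        return "name (intermediate)"
    return "productive"


print(vocabulary_category("the"))       # closed-class
print(vocabulary_category("Paris"))     # name (intermediate)
print(vocabulary_category("runnning"))  # productive
```

In a real evaluation, the recall and precision of a checker would then be reported per category, giving the empirically grounded sub-attributes the text calls for.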
The problem of knowing what to consider testing as a factor can be alleviated in two ways. The most fundamental, which should never be ignored, is open-ended evaluation by people sensitive to the domain requirements and high-level quality considerations; the second, more related to what we can systematise in a framework, is the reuse and adaptation of factors that have proved useful in similar evaluations before. That is the issue considered next.