Considering both methodological attempts to define software evaluation and practical test reports in the broad software engineering area, one may roughly distinguish three principal motivations behind testing:
While software developers and users may share the principal motivation behind testing to a certain extent, the actual testing procedures will differ considerably. Despite this, and despite the lack of terminological consensus among leading software engineers, we now introduce some new user-oriented testing terminology in order to relate the extensive testing experience of software engineers to user-oriented evaluation. We shall distinguish three major test types: scenario testing, systematic testing and feature inspection.
Each of these will be discussed in turn below.
The term scenario entered software evaluation in the early 1990s. A scenario test aims at using a realistic user background for the evaluation of software. It is an instance of black box testing where the major objective is to assess the suitability of a software product for everyday routines. Briefly, it involves putting the system to its intended use by its envisaged type of user, performing a standardised task. Of all test types, it is the scenario test which is best suited to providing detailed empirical information on those attributes that make up the usability quality characteristic. Apart from understandability, learnability and operability, which are sub-characteristics of usability, scenario tests can also provide information on suitability, accuracy, interoperability, time behaviour, resource behaviour, changeability and adaptability.
Two different ways of performing scenario tests are reported in the software engineering literature: field tests and laboratory tests. These involve different testing environments, tasks, requirements of test systems, user participation, instruments, testing expertise and, last but not least, time and money constraints.
A field test is a type of scenario test in which the testing environment is the normal workplace of the user, who is observed by one or more evaluators taking notes, recording times, etc. An obvious advantage of field tests, as compared to laboratory tests, is that the test task can include problems of data transfer between the test system and existing systems. If the test task is to be treated as part of the daily organisational routine, the software system undergoing testing needs to be in a highly operable condition. The instruments commonly used in field tests range from simply observing users and noting their behaviour and interaction times on evaluation checklists, to pre- and post-testing interviews, think-aloud protocols, and, last but not least, logfile recording. The choice of instruments depends on a variety of factors such as time and money constraints, technical facilities and evaluation expertise.
There is an obvious connection between field testing and adequacy evaluation.
A laboratory test is an instance of a scenario test in which the testing environment includes a number of isolated users who perform a given task in a test laboratory, which offers a great variety of data collection techniques. Since there is much greater flexibility in the definition of the test task, laboratory tests are particularly useful if the system under test is not fully operable. The artificial environment in laboratory tests allows the use of a great number of technical instruments. Well-equipped laboratories offer one-way mirrors, video and audio recording facilities, as well as different logging programs. Whether a laboratory test is useful in the context of adequacy evaluation depends on the extent to which the tasks performed and the metrics used reveal information pertinent to the user's real-life needs.
The estimated costs of laboratory tests are reported to be around four times greater than those of comparable field tests. The major factor in this calculation is the very expensive maintenance of a laboratory with its various technical devices.
The term systematic testing subsumes all testing activities that examine the behaviour of software under specific conditions with particular results expected. Whereas the objectives behind scenario testing call for the integration of users into the testing exercise, systematic tests can be performed solely by software engineers and/or user representatives. Three objectives are particularly relevant for user-oriented testing: task-oriented testing, menu-oriented testing and benchmark testing. They are discussed in the following.
Task-oriented testing is performed to examine whether a piece of software actually fulfils pre-defined tasks. These tasks may either be stated in the requirements specification document, as in software development projects, or may be implied by third parties as, for instance, in consumer reports. The primary quality characteristic under investigation in task-oriented testing is functionality. The testing environment is normally the workplace of the evaluator and is in principle not relevant to the interpretation of results. Task-oriented testing can be carried out during the software development process at any stage of the software life-cycle as well as with any off-the-shelf software product. The costs of task-oriented testing are comparatively small, and the investment in person-hours depends on the number of tasks tested. Apart from the technical environment of the evaluator (hardware and software), no extra investment in testing equipment or instruments is normally necessary.
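The idea of checking software against pre-defined tasks can be illustrated with a minimal Python sketch. It is only one possible way of recording such a test, not a prescribed procedure: the names `Task`, `run_task_tests` and `word_count` are hypothetical and introduced purely for illustration, and `word_count` merely stands in for the software under test.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Task:
    """A pre-defined task, e.g. one taken from a requirements specification."""
    name: str
    run: Callable[[], object]  # exercises the software under test
    expected: object           # result the task is expected to produce


def run_task_tests(tasks: List[Task]) -> List[Tuple[str, bool]]:
    """Run each task once and record whether the expected result was obtained."""
    results = []
    for task in tasks:
        try:
            passed = task.run() == task.expected
        except Exception:
            # A crash counts as a task the software fails to fulfil.
            passed = False
        results.append((task.name, passed))
    return results


# Hypothetical stand-in for the software under test (assumption, not from the text).
def word_count(text: str) -> int:
    return len(text.split())


tasks = [
    Task("count words in a short sentence", lambda: word_count("a b c"), 3),
    Task("count words in empty input", lambda: word_count(""), 0),
]
report = run_task_tests(tasks)
```

The resulting `report` is a plain list of task names with a pass or fail flag, so the effort grows with the number of tasks, as noted above.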
Menu-oriented testing is carried out to test each program feature or function in sequence. It is prominent in many glass box and black box testing techniques. The software is examined in great detail; the evaluator follows every possible path of program execution, considering each individual function as it is sequentially offered in the menu bar. Thus, while in both scenario and task-oriented testing only those functions that are necessary to perform the test tasks are executed, in menu-oriented testing each function of the software is executed at least once. As with task-oriented testing, menu-oriented testing can be performed at any stage of the software life-cycle as well as with off-the-shelf products. The costs of menu-oriented testing mainly lie in the recruitment of excellent evaluation personnel, who are capable of the ad hoc generation of metrics and data.
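The requirement that each function be executed at least once, in the order the menu offers it, can be sketched as follows. This is an illustrative assumption about how such a run might be recorded: the menu is modelled as a dictionary from hypothetical labels to callables, and `run_menu_test` and `unexecuted` are names invented for this sketch.

```python
from typing import Callable, Dict, List


def run_menu_test(menu: Dict[str, Callable[[], object]]) -> Dict[str, str]:
    """Execute every menu entry once, in the order in which the menu lists it."""
    outcome = {}
    for label, action in menu.items():
        try:
            action()
            outcome[label] = "executed"
        except Exception as exc:
            # Record the failure but carry on, so later entries are still covered.
            outcome[label] = f"failed: {type(exc).__name__}"
    return outcome


def unexecuted(menu: Dict[str, Callable[[], object]],
               outcome: Dict[str, str]) -> List[str]:
    """List menu entries that have no recorded outcome yet."""
    return [label for label in menu if label not in outcome]


# Hypothetical menu of the software under test (assumption for illustration).
menu = {
    "File > New": lambda: [],
    "Edit > Undo": lambda: None,
    "Tools > Divide": lambda: 1 / 0,  # deliberately faulty function
}
outcome = run_menu_test(menu)
missing = unexecuted(menu, outcome)
```

Because failures are recorded rather than aborting the run, the evaluator still obtains an outcome for every menu entry, including little-used ones.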
Benchmark testing examines the performance of systems. The notion of performance can be applied to individual functions, to system modules or to the system as a whole. In the strict technical sense, a benchmark test is a measurement of system performance which cannot be affected by variables resulting from human involvement. Typical reporting instruments used in benchmark tests are checklists that cover the quality characteristic, the benchmark, the measurement technique and the results. If a benchmark involves the execution of more than one function, it is useful to construct a standard way of calling the procedures for testing purposes.
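A standard way of calling several procedures for benchmarking purposes might look like the following Python sketch. It assumes wall-clock time as the performance measure and uses the standard library's `time.perf_counter`; the function name `benchmark`, the `repeats` parameter and the two example procedures are hypothetical choices made for illustration.

```python
import statistics
import time
from typing import Callable, Dict, Sequence


def benchmark(procedures: Sequence[Callable[[], object]],
              repeats: int = 5) -> Dict[str, float]:
    """Call the procedures in a fixed order, `repeats` times, and time each pass."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        for procedure in procedures:
            procedure()  # no human interaction takes place inside the timed region
        timings.append(time.perf_counter() - start)
    return {
        "min_s": min(timings),
        "median_s": statistics.median(timings),
        "max_s": max(timings),
    }


# Hypothetical procedures standing in for functions of the system under test.
procedures = [
    lambda: sorted(range(1000, 0, -1)),
    lambda: sum(range(1000)),
]
result = benchmark(procedures, repeats=3)
```

Repeating the fixed call sequence and reporting minimum, median and maximum is one simple way of making the measurement technique explicit in the reporting checklist; the absolute figures depend on the machine and are therefore not shown here.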
All of the types of systematic testing described here are potentially of interest in the context of adequacy evaluation. Task-oriented testing can indicate whether a product does in fact do what a user wants it to do. Menu-oriented testing can be used to ensure that little-used, but perhaps nonetheless important, functionalities are not forgotten. Benchmark testing can be used to check whether a product meets a user's minimum requirements.
The aim of feature inspection is to be able to describe the technical features of a piece of software in as much detail as possible, so as to allow comparison between systems of the same type. Feature checklists are compiled in the awareness of the possibilities individual systems offer and with the aim of demonstrating the differences between similar tools. The results of feature inspection are meant to help consumers decide which of the systems on the market are most appropriate for their particular environment. Any feature checklist in the context of evaluation needs to be both standardised, in the sense that it should not be constrained by particular situational variables, and open, in the sense that it can cover different approaches to a problem without being prescriptive in nature.
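How a feature checklist can bring out the differences between similar tools may be sketched as below. The representation is an assumption made for illustration only: the checklist is a nested dictionary, the systems "Tool A" and "Tool B" and their features are invented, and a feature that a system's entry does not record appears as `None`, which keeps the checklist open to systems that approach a problem differently.

```python
from typing import Dict

# system name -> feature name -> recorded value
Checklist = Dict[str, Dict[str, object]]


def differing_features(checklist: Checklist) -> Dict[str, Dict[str, object]]:
    """Return only those features whose recorded values differ between systems."""
    features = sorted({f for values in checklist.values() for f in values})
    differences = {}
    for feature in features:
        # A feature missing from a system's entry is reported as None.
        row = {system: values.get(feature) for system, values in checklist.items()}
        if len({repr(value) for value in row.values()}) > 1:
            differences[feature] = row
    return differences


# Hypothetical checklist entries (invented for this sketch).
checklist = {
    "Tool A": {"spell checker": True, "max file size (MB)": 10, "batch mode": True},
    "Tool B": {"spell checker": True, "max file size (MB)": 50},
}
diffs = differing_features(checklist)
```

Here `diffs` retains "max file size (MB)" and "batch mode" but omits "spell checker", since both hypothetical systems record the same value for it.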