The choice of instruments depends on factors such as time and money constraints, testing experience and the testing environment. One also needs to consider which testing instruments work well in combination. Individual testing instruments used in isolation can, by their nature, only provide limited data; by combining two or more instruments, however, the data can be cross-interpreted and thus, in some sense, enriched. For example, a logging program will provide one kind of data, while videoing the user will provide another. Interpreting one set of data in the light of the other may well yield information that could not be garnered from either set interpreted independently.
There are basically two types of instrument: those used for data collection in the testing phase and those used for data reporting in the reporting phase.
In the testing phase, there are two major kinds of testing instrument: those that require manual data collection and those that collect data automatically. Among the most prominent manual testing instruments are questionnaires, checklists, interviews, observations and think-aloud protocols. Most automatic testing instruments are developed for specific types of application, e.g. to perform benchmark tests for translation memory programs and the like. The two major types of tool used for scenario testing go by many different names:
Data obtained from the recording of non-verbalised operations, i.e. all keystrokes and mouse activities, including incorrect inputs, provide useful information on attributes related to the usability and functionality of the software.
Successful testing and evaluation is not only a matter of choosing metrics and of finding the optimal combination of test type and instrument for the particular test environment; it also requires detailed, correct and adequate reporting of the test results. The reporting instruments of relevance for user-oriented testing are
Evaluation descriptions cover all factors that influence the overall evaluation procedure, including all details necessary to judge the performance of the test and to verify the interpretation of its results. There are four major factors that determine the type of evaluation and the corresponding testing exercise, listed here in descending order of importance:
Test problem reports provide developers with detailed descriptions of the problems that occur during testing. They are important instruments aimed at improving software under development, and are thus mostly relevant for diagnostic and progress evaluation rather than for those adequacy evaluation environments in which no feedback between evaluator and developer is planned or possible.
Result reports cover all details on the metrics applied and observations made during the different testing exercises and are normally provided as an appendix to the overall test documentation. They allow interested parties to look up the detailed results of the tests.
Assessment reports are top-level reporting instruments based on the detailed data provided in the appendices. Testing a software system produces a great number of individual results, documented mainly at the metric level. To gain a picture of the overall performance of the software, it is necessary to proceed from the specific results at the metric level, through reporting at the level of sub-characteristics, to a general statement at the top level of quality reporting, i.e. an evaluation of the system's performance in terms of its functionality, reliability, usability, efficiency, maintainability and portability.