As can be seen from the above, measures also differ in the way that values for the attributes are obtained. In some cases, the value is a fact, which can be discovered by reading the technical literature accompanying the product, by looking at the screen or by carrying out some other means of inspection. In other cases, the value depends on a human judgement. An example here is the guessability attribute mentioned above. In yet others, some sort of test is carried out in order to determine the value. For example, the percentage of cases in which the correct suggestion is the first offered will be determined by carrying out a test. In the interests of reliability, it is advisable to limit human intervention as much as possible; however, in the current state of the art, it is inevitable that some measures will rely on human involvement in determining the values to be assigned.
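To make the test-based case concrete, the following is a minimal sketch of how the percentage of cases in which the correct suggestion is offered first might be computed. The function name, the representation of a test case as a pair of (suggestion list, correct answer) and the sample data are all assumptions made for illustration; they are not taken from any particular product or test suite.

```python
def first_suggestion_accuracy(cases):
    """Return the percentage of cases whose first suggestion is the correct one.

    `cases` is a list of (suggestions, correct) pairs, where `suggestions`
    is the ordered list offered by the system under test (hypothetical format).
    """
    if not cases:
        raise ValueError("at least one test case is required")
    hits = sum(
        1
        for suggestions, correct in cases
        if suggestions and suggestions[0] == correct
    )
    return 100.0 * hits / len(cases)


# Invented sample data: four misspellings and the suggestions offered for each.
sample_cases = [
    (["receive", "recipe"], "receive"),      # correct suggestion offered first
    (["separate", "desperate"], "separate"), # correct suggestion offered first
    (["form", "from"], "from"),              # correct suggestion offered second
    (["definitely"], "definitely"),          # correct suggestion offered first
]

score = first_suggestion_accuracy(sample_cases)
print(f"{score:.1f}% of correct suggestions were offered first")
```

Because the value is computed mechanically from a fixed set of test cases, the only human involvement lies in constructing the cases and deciding what counts as the correct suggestion.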
This brings us back to the general issue of reliability. A measure is reliable if the same answer is obtained each time the same measure is applied to the same object of evaluation. This is very difficult to check when human intervention is involved. It is therefore very important that an appropriate method be chosen for carrying out the measurement. The next section offers some general considerations regarding methods, based on previous experience in software evaluation in general and on experience with the evaluation of language engineering systems in particular.
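As an illustration of the definition of reliability given above, the sketch below checks two things: whether an automated measure returns the same value on repeated application to the same object, and, for judgement-based measures, what proportion of judges agree on the most common value. Both function names and the simple agreement statistic are assumptions chosen for illustration; more refined agreement statistics exist and may well be preferable in practice.

```python
from collections import Counter


def is_repeatable(measure, obj, runs=5):
    """Apply `measure` to the same object several times and report
    whether every application produced the same value."""
    results = [measure(obj) for _ in range(runs)]
    return all(result == results[0] for result in results)


def agreement_rate(judgements):
    """Return the proportion of judgements equal to the most common value.

    A crude indicator only: it does not correct for chance agreement.
    """
    if not judgements:
        raise ValueError("at least one judgement is required")
    most_common_count = Counter(judgements).most_common(1)[0][1]
    return most_common_count / len(judgements)


# An automated measure (here simply the length of a string) is trivially repeatable.
print(is_repeatable(len, "object of evaluation"))

# Invented judgements from four judges rating the same attribute.
print(agreement_rate(["high", "high", "low", "high"]))
```

A low agreement rate suggests that the measure, as currently defined, leaves too much to individual judgement and that the method of obtaining the value needs to be specified more tightly.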