According to ISO 2382/1 (1984), information retrieval is defined as:
...[the] actions, methods and procedures for recovering stored data to provide information on a given subject.
Under actions, the ISO document subsumes text indexing, inquiry analysis and relevance analysis. Under data, it includes text, tables, diagrams, speech, video, etc., as well as hypermedia, which is singled out in order to distinguish non-linearly structured texts or parts of texts from linear texts (documents). With information, it associates the relevant knowledge needed to support problem solving, knowledge acquisition, etc. With subject, it associates a concept, as opposed to a character string (word).
The central problem of IR is the analysis and measurement of the relevance of the stored information, i.e. the relation between requested information and retrieved information. In other words, given an information inquiry, the IR system has to determine whether the stored information is relevant to that inquiry. Traditionally, this problem has been addressed by organising the search database as an inverted file of the significant character strings occurring in the stored texts, i.e. a file that records, for each string, where it occurs in the texts. Normally, these strings are natural language words, excluding determiners, conjunctions, prepositions and the like (known as stop-words). An inquiry to an IR system is then composed of these character strings combined by Boolean operators, together with some additional features such as contextual or positional operators. Because neither the stored texts nor the inquiries undergo any linguistic analysis of their semantics, such IR systems are largely domain independent.
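The inverted-file approach described above can be illustrated with a small Python sketch. It is not taken from any particular IR system: the stop-word list, the regular-expression tokenizer and the helper names (build_inverted_file, docs_with, near) are hypothetical choices made for illustration. Boolean AND, OR and NOT are modelled with Python set intersection, union and difference, and near stands in for a simple positional operator ("word A within k words of word B").

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stop-word list (determiners, conjunctions,
# prepositions); a real system would use a fuller, language-specific list.
STOP_WORDS = {"a", "an", "the", "and", "or", "but", "of", "in", "on", "to",
              "for", "with", "at", "by"}


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_inverted_file(docs):
    """Map each significant word to {doc_id: [word positions]}.

    Stop-words are not indexed, but they still count when numbering
    positions, so distances reflect the original text.
    """
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, word in enumerate(tokenize(text)):
            if word not in STOP_WORDS:
                index[word].setdefault(doc_id, []).append(pos)
    return index


def docs_with(index, word):
    """Set of document ids in which the word occurs (empty if not indexed)."""
    return set(index.get(word.lower(), {}))


def near(index, w1, w2, k):
    """Positional operator: documents where w1 and w2 occur within k words."""
    p1 = index.get(w1.lower(), {})
    p2 = index.get(w2.lower(), {})
    return {
        doc_id
        for doc_id in p1.keys() & p2.keys()
        if any(abs(i - j) <= k for i in p1[doc_id] for j in p2[doc_id])
    }


docs = {
    1: "Information retrieval and the analysis of relevance",
    2: "Relevance feedback in retrieval systems",
    3: "Indexing of stored texts",
}
index = build_inverted_file(docs)

# Boolean AND / NOT map onto set intersection / difference (OR would be |).
print(sorted(docs_with(index, "retrieval") & docs_with(index, "relevance")))  # [1, 2]
print(sorted(set(docs) - docs_with(index, "relevance")))                       # [3]
# Positional operator: "retrieval" within 3 words of "relevance".
print(sorted(near(index, "retrieval", "relevance", 3)))                        # [2]
```

Note that the match is purely on character strings: the index knows nothing about what "relevance" means, which is exactly why such a scheme is domain independent, and also why it cannot judge relevance beyond the presence and placement of words.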
Today, there are several approaches to improving IR systems in general, particularly approaches aimed at relevance analysis. Among these are, for example, dialogue components (menu-based or natural-language-based) and statistical approaches. The former mainly help users work with a given query language according to predefined user profiles (novice, expert, etc.), whereas the latter aim to improve the relevance analysis directly. There is also ongoing work on introducing NLP technology into IR systems (cf., for example, Salton90), which would be expected to improve such systems further. In addition, future IR systems will have to take into account new styles of publishing and electronic communication, including capabilities investigated in the field of virtual reality (cf. Sherman93).
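The text does not name a specific statistical method. As one illustration only, the following Python sketch assumes tf-idf term weighting, a widely used statistical scheme chosen here by us rather than taken from the source. Instead of the yes/no match of a Boolean query, every document receives a graded score: each query term contributes its count in the document multiplied by log(N / df), where N is the number of documents and df the number of documents containing the term. The function names and sample documents are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tf_idf_rank(docs, query):
    """Return (doc_id, score) pairs sorted by descending tf-idf score.

    score(d) = sum over query terms t of tf(t, d) * log(N / df(t)),
    where tf is the raw count of t in d, df(t) is the number of documents
    containing t, and N is the number of documents.
    """
    n = len(docs)
    tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter()
    for tf in tfs.values():
        df.update(tf.keys())  # each document counts once per term
    scores = {}
    for doc_id, tf in tfs.items():
        # Query terms that occur in no document (df == 0) are skipped.
        scores[doc_id] = sum(
            tf[t] * math.log(n / df[t]) for t in tokenize(query) if df[t]
        )
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


docs = {
    1: "retrieval of stored data",
    2: "retrieval retrieval relevance",
    3: "speech and video data",
}
ranking = tf_idf_rank(docs, "retrieval relevance")
print([doc_id for doc_id, _ in ranking])  # [2, 1, 3]
```

A term that occurs in every document receives weight log(1) = 0, so very common words are discounted automatically; this is one way in which a statistical approach refines relevance analysis beyond a plain Boolean match.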
Other problem areas in IR, closely related to the relevance problem, concern the measures of precision, recall and specificity. These may be glossed as follows (see section Precision, recall and specificity for further details):