TIM/ISSCO - ETI - Université de Genève - 40 Bvd du Pont-d'Arve - CH-1205 Genève - Switzerland
DIM - University Hospital of Geneva, 24 Micheli-du-Crest, 1211 Geneva - Switzerland
Published in: JADT2000, 5èmes Journées internationales d'Analyse statistique des Données Textuelles, Lausanne, pp. 35-42.
Pierrette Bouillon, Robert Baud, Gilbert Robert, Patrick Ruch
Lexical ambiguity is a fundamental problem in Information Retrieval (IR), especially in the medical domain. Many systems represent the content of a document by a subset of the words it contains, and therefore have to deal with the ambiguity of those words. In this paper, we propose a disambiguation method that combines existing medical terminological resources with statistical tools for linguistic annotation, with the aim of developing more satisfactory indexing techniques for patient reports. The method rests on three hypotheses: (i) syntax can help to distinguish the meanings of polyfunctional words (words that can belong to more than one part of speech); (ii) syntactic analysis can be performed by a probabilistic tagger based on a Hidden Markov Model (HMM); and, more daringly, (iii) the remaining semantic ambiguity can also be resolved (mutatis mutandis) by an HMM tagger.
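To make hypothesis (ii) concrete, the following is a minimal sketch of how an HMM tagger can resolve a polyfunctional word: Viterbi decoding selects the tag sequence that maximizes the product of tag-transition and word-emission probabilities, so the surrounding context decides whether "left" is read as an adjective or a verb. The tag set, vocabulary, the function name `viterbi` and every probability are invented toy values for illustration; they are not the tagger, resources or parameters used by the authors. Under hypothesis (iii), the same decoder would in principle run over semantic categories in place of part-of-speech tags.

```python
import math
from typing import Dict, List

# Hypothetical toy model: all probabilities are invented for illustration,
# not estimated from any corpus used in the paper.
TAGS = ["DET", "ADJ", "NOUN", "VERB"]

# P(first tag)
START: Dict[str, float] = {"DET": 0.6, "ADJ": 0.1, "NOUN": 0.2, "VERB": 0.1}

# P(next tag | previous tag); each row sums to 1.
TRANS: Dict[str, Dict[str, float]] = {
    "DET":  {"DET": 0.01, "ADJ": 0.40, "NOUN": 0.55, "VERB": 0.04},
    "ADJ":  {"DET": 0.01, "ADJ": 0.10, "NOUN": 0.80, "VERB": 0.09},
    "NOUN": {"DET": 0.10, "ADJ": 0.05, "NOUN": 0.15, "VERB": 0.70},
    "VERB": {"DET": 0.50, "ADJ": 0.10, "NOUN": 0.30, "VERB": 0.10},
}

# P(word | tag) over a partial toy vocabulary; "left" is ambiguous (ADJ or VERB).
EMIT: Dict[str, Dict[str, float]] = {
    "DET":  {"the": 0.9},
    "ADJ":  {"left": 0.3, "right": 0.3},
    "NOUN": {"kidney": 0.2, "patient": 0.3, "hospital": 0.2},
    "VERB": {"left": 0.2, "was": 0.3},
}

FLOOR = 1e-12  # tiny probability for (tag, word) pairs absent from EMIT


def log_emit(tag: str, word: str) -> float:
    """Log emission probability, with a floor for unseen pairs."""
    return math.log(EMIT[tag].get(word, FLOOR))


def viterbi(words: List[str]) -> List[str]:
    """Return the most probable tag sequence for `words` (log-space Viterbi)."""
    # best[i][tag] = (log prob of best path ending in `tag` at position i,
    #                 tag at position i-1 on that path)
    best = [{t: (math.log(START[t]) + log_emit(t, words[0]), None) for t in TAGS}]
    for word in words[1:]:
        prev = best[-1]
        column = {}
        for t in TAGS:
            # Best predecessor for tag t at this position.
            score, back = max((prev[p][0] + math.log(TRANS[p][t]), p) for p in TAGS)
            column[t] = (score + log_emit(t, word), back)
        best.append(column)
    # Backtrack from the highest-scoring final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for column in reversed(best[1:]):
        tag = column[tag][1]
        path.append(tag)
    return path[::-1]


if __name__ == "__main__":
    print(viterbi("the left kidney".split()))   # ['DET', 'ADJ', 'NOUN']
    print(viterbi("the patient left".split()))  # ['DET', 'NOUN', 'VERB']
```

Working in log space avoids numeric underflow on long sentences; a real tagger would estimate the transition and emission tables from an annotated corpus and use a proper smoothing scheme instead of the fixed floor assumed here.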