
[1] TIM/ISSCO - ETI - Université de Genève - 40 Bvd du Pont-d'Arve - CH-1205 Genève - Switzerland
[2] DIM - University Hospital of Geneva, 24 Micheli-du-Crest, 1211 Geneva - Switzerland

Indexing by statistical tagging

published in:
JADT2000, 5èmes Journées internationales d'Analyse statistique des Données Textuelles,
Lausanne, pp. 35-42.

Pierrette Bouillon [1], Robert Baud [2], Gilbert Robert [1], Patrick Ruch [2]

Abstract:

Lexical ambiguity is a fundamental problem in Information Retrieval (IR), especially in the medical domain. Many systems represent the content of a document by a subset of the words it contains, and are therefore exposed to the ambiguity of those words. In this paper, we propose a disambiguation method that combines existing medical terminological resources with statistical tools for linguistic annotation, in order to develop more satisfactory indexing techniques for patient reports. The main hypotheses guiding this method are that: (i) syntax can help to distinguish the meanings of polyfunctional words; (ii) syntactic analysis can be done by a probabilistic tagger (HMM, Hidden Markov Model); and, more daringly, (iii) the remaining semantic ambiguity can also be resolved (mutatis mutandis) by an HMM tagger.
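
As an illustration of hypothesis (ii), the sketch below shows how an HMM tagger can choose between the noun and verb readings of a polyfunctional word such as "reports" by Viterbi decoding. It is a minimal sketch and not the tagger used in the paper: the tag set, the vocabulary, every probability, and the names `TAGS`, `START`, `TRANS`, `EMIT`, `FLOOR` and `viterbi` are invented for the example.

```python
import math

# Toy model: tags, words and probabilities are invented for illustration only.
TAGS = ["DET", "NOUN", "VERB"]

# P(first tag)
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}

# P(next tag | previous tag)
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1},
}

# P(word | tag); "reports" is ambiguous between NOUN and VERB.
EMIT = {
    "DET": {"the": 1.0},
    "NOUN": {"patient": 0.5, "reports": 0.3, "pain": 0.2},
    "VERB": {"reports": 0.6, "shows": 0.4},
}

FLOOR = 1e-9  # small probability given to unseen word/tag pairs


def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    if not words:
        return []
    # scores[i][tag]: best log-probability of a tag path ending in `tag` at position i
    scores = [{
        t: math.log(START[t]) + math.log(EMIT[t].get(words[0], FLOOR))
        for t in TAGS
    }]
    back = []  # back[i][tag]: best previous tag for `tag` at position i + 1
    for word in words[1:]:
        prev = scores[-1]
        cur, ptr = {}, {}
        for t in TAGS:
            best_prev = max(TAGS, key=lambda p: prev[p] + math.log(TRANS[p][t]))
            cur[t] = (prev[best_prev] + math.log(TRANS[best_prev][t])
                      + math.log(EMIT[t].get(word, FLOOR)))
            ptr[t] = best_prev
        scores.append(cur)
        back.append(ptr)
    # Start from the best final tag and follow the back-pointers.
    tag = max(TAGS, key=lambda t: scores[-1][t])
    path = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        path.append(tag)
    return path[::-1]


print(viterbi("the patient reports pain".split()))
print(viterbi("the reports".split()))
```

With these toy numbers, "reports" is tagged VERB after the noun "patient" and NOUN directly after the determiner "the". The syntactic context alone separates the two uses of the word.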






Sabine Lehmann
Thursday, 22 June 2000, 11:35:42 MET DST