DOCUMENT RANKING USING AN ENRICHED THESAURUS
Abstract
A thesaurus may be viewed as a graph, and document retrieval algorithms can exploit this graph when both the documents and the query are represented by thesaurus terms. These retrieval algorithms measure the distance between the query and each document by using path lengths in the graph. Previous work with such strategies has shown that the hierarchical relations in the thesaurus are useful but the non‐hierarchical relations are not. This paper shows that when the query explicitly mentions a particular non‐hierarchical relation, the retrieval algorithm benefits from the presence of such relations in the thesaurus. Our algorithms were applied to the Excerpta Medica bibliographic citation database, whose citations are indexed with terms from the EMTREE thesaurus. We also created an enriched EMTREE by systematically adding non‐hierarchical relations from a medical knowledge base. We then ranked documents from Excerpta Medica against queries twice, once using EMTREE and once using the enriched EMTREE. Only when the query specifically mentioned a particular non‐hierarchical relation type did EMTREE enriched with that relation type lead to a ranking that corresponded better to an expert's ranking.
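The path-length idea described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the exact distance formula, so the sketch assumes the query-to-document distance is the average shortest-path length over all pairs of query and document terms. The toy terms, the single added "treats"-style edge, and all function names are hypothetical and are not taken from EMTREE.

```python
from collections import deque
from itertools import product


def build_graph(edges):
    """Build an undirected adjacency map from (term, term) pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def path_length(graph, source, target):
    """Shortest number of edges between two terms (breadth-first search)."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return float("inf")  # the terms are not connected


def set_distance(graph, query_terms, doc_terms):
    """Assumed measure: mean pairwise path length between the two term sets."""
    pairs = list(product(query_terms, doc_terms))
    if not pairs:
        return float("inf")
    return sum(path_length(graph, q, d) for q, d in pairs) / len(pairs)


def rank(graph, query_terms, documents):
    """Return document names ordered from nearest to farthest from the query."""
    return sorted(
        documents,
        key=lambda name: set_distance(graph, query_terms, documents[name]),
    )


# Hypothetical toy hierarchy (broader term, narrower term).
HIERARCHY = [
    ("medicine", "disease"),
    ("medicine", "therapy"),
    ("disease", "infection"),
    ("disease", "neoplasm"),
    ("therapy", "drug therapy"),
    ("drug therapy", "antibiotic"),
]
# Hypothetical non-hierarchical relation added for the enriched graph.
NON_HIERARCHICAL = [("antibiotic", "infection")]

plain = build_graph(HIERARCHY)
enriched = build_graph(HIERARCHY + NON_HIERARCHICAL)

query = ["infection"]
docs = {"doc_a": ["antibiotic"], "doc_b": ["neoplasm"]}

print(rank(plain, query, docs))
print(rank(enriched, query, docs))
```

In this toy setting the document indexed with "antibiotic" is five edges from the query term in the plain hierarchy but one edge away once the non-hierarchical edge is added, so the two graphs produce different rankings. Whether such a change improves agreement with an expert's ranking is the empirical question the paper addresses.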
Citation
RADA, R., BARLOW, J., POTHARST, J., ZANSTRA, P. and BIJSTRA, D. (1991), "DOCUMENT RANKING USING AN ENRICHED THESAURUS", Journal of Documentation, Vol. 47 No. 3, pp. 240-253. https://doi.org/10.1108/eb026879
Publisher
MCB UP Ltd
Copyright © 1991, MCB UP Limited