AUTOMATIC THESAURUS CONSTRUCTION AND THE RELATION OF A THESAURUS TO INDEXING TERMS
Abstract
My research over the last few years has been concerned with the use of automatically‐obtained keyword classifications for information retrieval. Such a classification can be described as a thesaurus, but those classifications which have been most successful in my experiments do not resemble the normal kind of manually‐constructed thesaurus, and the bases on which automatic and manual thesauri are constructed are quite different. Human beings explicitly consider the meanings of words in grouping them, but word meanings are not accessible to computers. Automatic word classification is therefore based on information about the distributional behaviour of words in documents, on the assumption that words which behave in similar ways in terms of document occurrences are semantically related. That is to say, groups of words which are based on the statistical associations of their members in documents should reflect their meaning relations, at least sufficiently for the purposes of retrieval.
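The distributional assumption described above can be illustrated with a minimal sketch. The abstract does not specify a similarity measure or a grouping procedure, so everything concrete here is an assumption made for illustration: the Jaccard (Tanimoto) coefficient over document-occurrence sets, the 0.5 threshold, the hypothetical function names, and the toy documents.

```python
from itertools import combinations


def occurrence_sets(documents):
    """Map each keyword to the set of document ids in which it occurs."""
    occ = {}
    for doc_id, keywords in enumerate(documents):
        for word in set(keywords):
            occ.setdefault(word, set()).add(doc_id)
    return occ


def jaccard(a, b):
    """Overlap of two occurrence sets (assumed measure, not from the source)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_pairs(documents, threshold=0.5):
    """Return keyword pairs whose document distributions are similar enough."""
    occ = occurrence_sets(documents)
    pairs = {}
    for w1, w2 in combinations(sorted(occ), 2):
        score = jaccard(occ[w1], occ[w2])
        if score >= threshold:
            pairs[(w1, w2)] = score
    return pairs


# Toy collection: each document is represented by its list of keywords.
docs = [
    ["boundary", "layer", "flow"],
    ["boundary", "layer", "wing"],
    ["flow", "wing"],
    ["boundary", "layer"],
]
pairs = similar_pairs(docs, threshold=0.5)
print(pairs)
```

In this toy collection only "boundary" and "layer" occur in exactly the same documents, so they are the only pair grouped together; no word meanings are consulted at any point.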
Citation
SPARCK JONES, K. (1970), "AUTOMATIC THESAURUS CONSTRUCTION AND THE RELATION OF A THESAURUS TO INDEXING TERMS", Aslib Proceedings, Vol. 22 No. 5, pp. 226-233. https://doi.org/10.1108/eb050241
Publisher
MCB UP Ltd
Copyright © 1970, MCB UP Limited