Abstract
A new method of document retrieval is presented, based on fundamental operations of fuzzy set theory and on the notion of a semantic disjunctive normal form. The concepts of semantic normal forms, namely the semantic disjunctive normal form and the semantic conjunctive normal form, are defined and their elementary properties are presented. The syntax and semantics of the proposed document retrieval language are given, and an algorithm for allocating documents to particular queries is described. The document retrieval strategy based on the semantic disjunctive normal form is illustrated with an example. A basic advantage of using fuzzy set theory to describe a document retrieval system is that it accounts, in a simple way, for differences in descriptor importance, for document search patterns, and for differences in the grades of formal relevance of individual documents to a given query. In such an information system, the documents with the highest grades of formal relevance to a given query are retrieved by applying simple fuzzy set operations.
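The idea of grading documents against a query in disjunctive normal form can be sketched with the classical Zadeh operators: AND as min, OR as max, NOT as 1 - x. This is a minimal sketch under assumptions: it uses an ordinary Boolean DNF rather than the paper's semantic DNF (which is not defined here), and the descriptors, grades and function names (`dnf_grade`, `retrieve`) are hypothetical.

```python
from typing import Dict, List, Tuple

# A literal is (descriptor, negated); a conjunct is a list of literals;
# a query in disjunctive normal form is a list of conjuncts.
Literal = Tuple[str, bool]
DNFQuery = List[List[Literal]]
Document = Dict[str, float]  # descriptor -> membership grade in [0, 1]


def literal_grade(doc: Document, lit: Literal) -> float:
    """Grade of a document for one descriptor, complemented if negated."""
    term, negated = lit
    grade = doc.get(term, 0.0)  # a missing descriptor has grade 0
    return 1.0 - grade if negated else grade


def dnf_grade(doc: Document, query: DNFQuery) -> float:
    """AND -> min within each conjunct, OR -> max across conjuncts."""
    return max(min(literal_grade(doc, lit) for lit in conj) for conj in query)


def retrieve(docs: Dict[str, Document], query: DNFQuery) -> List[Tuple[str, float]]:
    """Rank documents by grade of formal relevance, highest first."""
    scored = [(name, dnf_grade(doc, query)) for name, doc in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


docs = {
    "d1": {"fuzzy": 0.9, "retrieval": 0.7},
    "d2": {"fuzzy": 0.4, "retrieval": 0.8, "clustering": 0.6},
    "d3": {"clustering": 0.9},
}
# (fuzzy AND retrieval) OR (clustering AND NOT fuzzy)
query = [[("fuzzy", False), ("retrieval", False)],
         [("clustering", False), ("fuzzy", True)]]
for name, grade in retrieve(docs, query):
    print(name, round(grade, 2))  # d3 0.9, then d1 0.7, then d2 0.6
```

Retrieving "the documents of the highest grades" then amounts to taking the head of the ranked list or applying a grade threshold.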
Abstract
A new and promising approach to document clustering uses previously formed clusters of queries to cluster documents. Employing this approach in practice requires a similarity measure for queries. This requirement poses no problem in information retrieval systems in which both search request formulations and document representations are sets of weighted or unweighted index terms. In most operational retrieval systems, however, search request formulations are Boolean combinations of index terms. The author has already investigated similarity measures for search request formulations of this type and reported the results elsewhere; the present paper provides further results of these investigations. The novelty of the approach discussed here is the incorporation, into the previously described methodology, of a weighting mechanism that indicates the relative importance of particular attributes of a given Boolean search request formulation. The suggested modification is based on the standard probabilistic approach to information retrieval.
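As a rough illustration of weighting the attributes of Boolean queries before comparing them, the sketch below treats each conjunct of a query in disjunctive normal form as one attribute, weights it with the Robertson-Sparck Jones weight computed without relevance information, and compares two queries with a weighted Jaccard coefficient. All of this is an assumption for illustration, not the author's measure: the choice of conjuncts as attributes, the floor at zero, the Jaccard form, and the names (`prob_weight`, `query_similarity`) are hypothetical, and negated terms are not handled.

```python
import math
from typing import Dict, FrozenSet, Set

Conjunct = FrozenSet[str]    # one conjunct of a DNF query: terms joined by AND
Index = Dict[str, Set[int]]  # inverted file: term -> ids of documents containing it


def matching_docs(conj: Conjunct, index: Index, n_docs: int) -> Set[int]:
    """Ids of documents that contain every term of the conjunct."""
    docs = set(range(n_docs))
    for term in conj:
        docs &= index.get(term, set())
    return docs


def prob_weight(conj: Conjunct, index: Index, n_docs: int) -> float:
    """Robertson-Sparck Jones weight without relevance data, floored at 0."""
    n = len(matching_docs(conj, index, n_docs))
    return max(0.0, math.log((n_docs - n + 0.5) / (n + 0.5)))


def query_similarity(q1: Set[Conjunct], q2: Set[Conjunct], index: Index, n_docs: int) -> float:
    """Weighted Jaccard coefficient over the conjuncts of two DNF queries."""
    union_w = sum(prob_weight(c, index, n_docs) for c in q1 | q2)
    if union_w == 0.0:
        return 0.0
    shared_w = sum(prob_weight(c, index, n_docs) for c in q1 & q2)
    return shared_w / union_w


index = {"fuzzy": {0, 1, 2}, "retrieval": {1, 2, 3, 4, 5},
         "boolean": {2, 6}, "ranking": {6, 7}}
q1 = {frozenset({"fuzzy", "retrieval"}), frozenset({"boolean"})}
q2 = {frozenset({"fuzzy", "retrieval"}), frozenset({"ranking"})}
print(round(query_similarity(q1, q2, index, n_docs=10), 3))  # 0.333
```

Rare conjuncts receive larger weights, so two queries that share a highly specific conjunct come out more similar than two that share only a common one.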
Abstract
The need for an information retrieval technique that retains the appeal of Boolean retrieval schemes while also providing the advantages of ranked search output has been pointed out in the literature for many years. However, a previous attempt to incorporate a weighting mechanism into Boolean retrieval schemes in order to produce ranked lists of documents has not been fully successful. Specifically, further research has shown that the theory behind that approach suffers from disturbing ambiguities and inconsistencies, with equivalent Boolean search request formulations yielding different rankings of the retrieved documents. As a result of this more recent research, an alternative approach has been outlined. A closer analysis of this second approach, however, reveals that it too has some intrinsic weaknesses. The present paper reports the results of this new analysis and suggests a more rigorous methodology.
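The kind of inconsistency described above is easy to reproduce with a simple hypothetical weighting scheme in which AND is the product of term weights and OR is the probabilistic sum x + y - xy. This is not claimed to be the scheme criticised in the paper; it only shows how two Boolean-equivalent formulations, A AND (B OR C) and (A AND B) OR (A AND C), can rank the same two documents in opposite orders because these operators are not distributive. The document weights are invented.

```python
from typing import Dict

Weights = Dict[str, float]  # term -> weight of the term in a document


def p_and(x: float, y: float) -> float:
    return x * y  # product


def p_or(x: float, y: float) -> float:
    return x + y - x * y  # probabilistic sum


def factored(d: Weights) -> float:
    # A AND (B OR C)
    return p_and(d["A"], p_or(d["B"], d["C"]))


def distributed(d: Weights) -> float:
    # (A AND B) OR (A AND C): logically equivalent to factored()
    return p_or(p_and(d["A"], d["B"]), p_and(d["A"], d["C"]))


docs = {
    "x": {"A": 1.0, "B": 0.4, "C": 0.4},
    "y": {"A": 0.5, "B": 1.0, "C": 1.0},
}
for name, d in docs.items():
    print(name, round(factored(d), 2), round(distributed(d), 2))
# x 0.64 0.64
# y 0.5 0.75
```

Under the factored form x outranks y, while under the distributed form y outranks x. With min and max as AND and OR, distributivity holds, so this particular discrepancy would not arise.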
BRIAN VICKERY and ALINA VICKERY
Abstract
There is a huge amount of information and data stored in publicly available online databases that consist of large text files accessed by Boolean search techniques. It is widely held that these databases are used less than they could or should be, and that one reason is that potential users find it difficult to identify which databases to search, to use the various command languages of the hosts, and to construct the required Boolean search statements. This reasoning has stimulated considerable exploratory and development work on search interfaces that help inexperienced users gain effective access to these databases. The aim of our paper is to review aspects of the design of such interfaces: to indicate the requirements that must be met if maximum aid is to be offered to the inexperienced searcher; to spell out the knowledge that must be incorporated in an interface if such aid is to be given; to describe some of the solutions that have been implemented in experimental and operational interfaces; and to discuss some of the problems encountered. The paper closes with an extensive bibliography on online search aids that goes well beyond the items explicitly mentioned in the text. An index to software follows the bibliography at the end of the paper.
Abstract
The development of a given discipline in science and technology often depends on the availability of theories capable of describing the processes which control the field and of modelling the interactions between these processes. The absence of an accepted theory of information retrieval has been blamed for the relative disorder and the lack of technical advances in the area. The main mathematical approaches to information retrieval are examined in this study, including both algebraic and probabilistic models, and the difficulties which impede the formalization of information retrieval processes are described. A number of developments are covered where new theoretical understandings have directly led to the improvement of retrieval techniques and operations.
Abstract
This paper discusses a knowledge-based information retrieval model with a hierarchical thesaurus. The model computes the conceptual distance between a query and an object, both of which are indexed with weighted terms from the hierarchical thesaurus. The thesaurus is represented by a hierarchical-concept graph (HCG) in which nodes represent concepts and directed edges represent generalisation relationships. Rada et al. have developed a similar model; however, their model considered only a binary indexing scheme and produced some counter-intuitive results. Our proposed model extends theirs by allowing index terms and HCG edges to be weighted. A new concept mapping method is devised to overcome the counter-intuitive results of Rada's model. In addition, a scheme for allowing Boolean operators in user queries is provided, together with a formula for computing conceptual distance from negated index terms. Experimental results show that our model simulates human performance more closely than Rada's model.
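A conceptual distance over a concept graph can be sketched as a shortest-path computation. The sketch below assumes, for illustration only, that generalisation edges may be traversed in either direction, that an edge weight acts as a length, and that query and object term weights combine as a weighted average of pairwise path lengths, roughly in the spirit of Rada et al.'s averaged path length. It is not the paper's concept mapping method; the concepts, weights and function names (`shortest_paths`, `conceptual_distance`) are hypothetical, and negated terms are not handled.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple

Edge = Tuple[str, str, float]  # (narrower concept, broader concept, edge length)
Graph = Dict[str, List[Tuple[str, float]]]


def build_graph(edges: List[Edge]) -> Graph:
    """Index generalisation edges so they can be walked in both directions."""
    g: Graph = defaultdict(list)
    for child, parent, length in edges:
        g[child].append((parent, length))
        g[parent].append((child, length))
    return g


def shortest_paths(g: Graph, src: str) -> Dict[str, float]:
    """Dijkstra's algorithm: path length from src to every reachable concept."""
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, length in g[u]:
            nd = d + length
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def conceptual_distance(g: Graph, query: Dict[str, float], obj: Dict[str, float]) -> float:
    """Weighted average of pairwise path lengths between query and object terms."""
    total, norm = 0.0, 0.0
    for q, wq in query.items():
        dist = shortest_paths(g, q)
        for o, wo in obj.items():
            total += wq * wo * dist.get(o, float("inf"))
            norm += wq * wo
    if norm == 0.0:
        raise ValueError("term weights must be positive")
    return total / norm


g = build_graph([
    ("fuzzy retrieval", "retrieval models", 1.0),
    ("probabilistic retrieval", "retrieval models", 1.0),
    ("retrieval models", "information retrieval", 2.0),
    ("thesauri", "information retrieval", 2.0),
])
query = {"fuzzy retrieval": 1.0}
print(conceptual_distance(g, query, {"probabilistic retrieval": 1.0}))               # 2.0
print(conceptual_distance(g, query, {"probabilistic retrieval": 0.5, "thesauri": 0.5}))  # 3.5
```

With all term weights and edge lengths set to 1, this reduces to an unweighted average of path lengths, which is the binary case the weighted model is meant to generalise.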
Abstract
For reasons of technical convenience, current retrieval algorithms based on probabilistic reasoning are derived from models that assume patrons evaluate documents on a two-valued relevance scale. This paper extends the theory by describing a model that admits a more general relevance scale. The model permits the earlier theory to be re-examined as a special case of the one developed here, and it leads to a more satisfying interpretation of the ranking principle of the earlier models.
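One natural way to see how a two-valued model can be a special case of a graded one is to rank documents by their expected relevance grade. This sketch assumes that ranking rule purely for illustration; it is not necessarily the model developed in the paper, and the grades, probabilities and function names (`expected_grade`, `rank`) are hypothetical.

```python
from typing import Dict, List, Tuple

RelDist = Dict[float, float]  # relevance grade -> estimated probability


def expected_grade(dist: RelDist) -> float:
    """Expected relevance grade of a document under the estimated distribution."""
    if abs(sum(dist.values()) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return sum(grade * p for grade, p in dist.items())


def rank(docs: Dict[str, RelDist]) -> List[Tuple[str, float]]:
    """Order documents by decreasing expected relevance grade."""
    scored = [(name, expected_grade(dist)) for name, dist in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


graded = {
    "d1": {0.0: 0.2, 0.5: 0.3, 1.0: 0.5},
    "d2": {0.0: 0.1, 0.5: 0.8, 1.0: 0.1},
}
print(rank(graded))  # d1 (about 0.65) ahead of d2 (about 0.5)

# Binary special case: with grades {0, 1} the expected grade equals P(relevant),
# so this ranking coincides with ranking by probability of relevance.
binary = {0.0: 0.3, 1.0: 0.7}
print(expected_grade(binary))  # 0.7
```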
Ian G. Hendry, Peter Willett and Frances E. Wood
Abstract
This paper describes INSTRUCT, an interactive computer program which has been developed as a teaching aid for use within schools of librarianship and information science. The program demonstrates some of the techniques that have been suggested for implementing document retrieval systems in the future, and currently runs on a search file of 6,004 documents from the Library and Information Science Abstracts database. INSTRUCT has facilities for natural language query processing, including the use of a stop-word list, a stemming algorithm and a fuzzy-matching routine that allows the automatic identification of a range of word variants; the provision of ranked output using automatic term weighting and a nearest-neighbour searching procedure; and automatic relevance feedback using probabilistic relevance weights. The program is menu-driven and can be used by searchers with little or no training.
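Relevance feedback with probabilistic relevance weights can be sketched with the widely used Robertson-Sparck Jones weight (with the usual 0.5 corrections), summing the weights of matching query terms to rank documents. Whether INSTRUCT uses exactly this variant is not stated here, so treat this as an assumption; the tiny collection and the function names (`rsj_weight`, `feedback_rank`) are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple


def rsj_weight(N: int, n: int, R: int, r: int) -> float:
    """Robertson-Sparck Jones relevance weight with 0.5 corrections.

    N: documents in the collection, n: documents containing the term,
    R: documents judged relevant, r: relevant documents containing the term.
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))


def feedback_rank(docs: Dict[str, Set[str]], query: Set[str],
                  relevant: Set[str]) -> List[Tuple[str, float]]:
    """Reweight query terms from relevance judgements, then rank by summed weights."""
    N, R = len(docs), len(relevant)
    weights = {}
    for t in query:
        n = sum(1 for terms in docs.values() if t in terms)
        r = sum(1 for d in relevant if t in docs[d])
        weights[t] = rsj_weight(N, n, R, r)
    scores = {d: sum(weights[t] for t in query & terms) for d, terms in docs.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


docs = {
    "d1": {"stemming", "retrieval"},
    "d2": {"stemming", "fuzzy"},
    "d3": {"retrieval"},
    "d4": {"fuzzy", "retrieval"},
}
# The searcher marks d2 as relevant; both query terms then get weight log(5).
print(feedback_rank(docs, {"stemming", "fuzzy"}, relevant={"d2"}))
```

Terms that occur in the documents judged relevant more often than chance would predict receive positive weights and pull similar documents up the ranking in the next iteration.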
S.E. ROBERTSON and N.J. BELKIN
Abstract
It is often suggested that information retrieval systems should rank documents rather than simply retrieve a set. Two separate reasons are adduced for this: that relevance itself is a multi-valued or continuous variable, and that retrieval is an essentially approximate process. These two reasons lead to different ranking principles, one according to degree of relevance and the other according to probability of relevance. This paper explores the possibility of combining the two principles, but concludes that, although neither is adequate alone, no single all-embracing ranking principle can be constructed to replace them. The only general solution to the problem would be to find an optimal ranking by exploring the effect on the user of every possible ranking. However, some more practical approximate solutions appear possible.
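That the two principles can genuinely conflict is easy to show with two invented documents: one very likely to be relevant but only marginally so, the other a coin toss but highly relevant if relevant at all. The sketch assumes each document carries a separate estimate of probability of relevance and of degree of relevance; the field and function names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Estimate(NamedTuple):
    p_relevant: float  # estimated probability that the document is relevant at all
    degree: float      # estimated degree of relevance if it is relevant, 0..1


def rank_by_probability(docs: Dict[str, Estimate]) -> List[str]:
    return sorted(docs, key=lambda d: docs[d].p_relevant, reverse=True)


def rank_by_degree(docs: Dict[str, Estimate]) -> List[str]:
    return sorted(docs, key=lambda d: docs[d].degree, reverse=True)


docs = {
    "a": Estimate(p_relevant=0.9, degree=0.3),  # very likely relevant, but only marginally
    "b": Estimate(p_relevant=0.5, degree=0.9),  # uncertain, but highly relevant if relevant
}
print(rank_by_probability(docs))  # ['a', 'b']
print(rank_by_degree(docs))       # ['b', 'a']
```

Any fixed rule for merging the two estimates would settle such cases one way for every user, which is why an ordering that suits all users cannot be guaranteed by a single principle.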
Abstract
The megadimensional nature of the complex social systems of the twentieth century, and the increasing levels of interrelatedness, present the individual with a bewildering array of information sources and services.