Abstract
The paper describes research designed to improve automatic pre‐coordinate term indexing by applying powerful general‐purpose language analysis techniques to identify term sources in requests, and to generate variant expressions of the concepts involved for document text searching.
BRIAN VICKERY and ALINA VICKERY
Abstract
There is a huge amount of information and data stored in publicly available online databases that consist of large text files accessed by Boolean search techniques. It is widely held that less use is made of these databases than could or should be the case, and that one reason for this is that potential users find it difficult to identify which databases to search, to use the various command languages of the hosts and to construct the Boolean search statements required. This reasoning has stimulated a considerable amount of exploration and development work on the construction of search interfaces, to aid the inexperienced user to gain effective access to these databases. The aim of our paper is to review aspects of the design of such interfaces: to indicate the requirements that must be met if maximum aid is to be offered to the inexperienced searcher; to spell out the knowledge that must be incorporated in an interface if such aid is to be given; to describe some of the solutions that have been implemented in experimental and operational interfaces; and to discuss some of the problems encountered. The paper closes with an extensive bibliography of references relevant to online search aids, going well beyond the items explicitly mentioned in the text. An index to software appears after the bibliography at the end of the paper.
Abstract
The current method of information retrieval used for bibliographic and full‐text databases of journal articles assumes a semantic representation together with stemming, Boolean connectives and so on. This requires searchers to have a well‐defined idea of what it is that they are searching for. This is unhelpful for many categories of searcher, in particular expert browsers and non‐expert searchers. An alternative method is developed in this paper, based on the idea that articles advance an argument and that this argumentation can be represented in a manner which enables flexible and robust searching to be carried out.
Abstract
This paper examines the relevance of the expert systems approach to information retrieval. First, the purpose and nature of expert systems are outlined, and it is argued that only in domains with relatively limited and clear‐cut rule‐sets will such systems be viable in the near future. The expert systems approach is then related to the provision of interfaces for OPACs and IR systems. Intermediary systems, designed to clarify through dialogue the terms of a client's information need, are described, and the prospects for such systems are discussed. It is argued that relatively straightforward rule‐sets should suffice, and that useful systems may therefore be available fairly soon. Since they could facilitate access to general as well as to specialised information, the potential demand for such systems would seem to be enormous.
Stephen J. Wade and Peter Willett
Abstract
INSTRUCT is a multi‐user text retrieval system which was developed as an interactive teaching package for demonstrating modern information retrieval techniques, including natural language query processing, best match searching and automatic relevance feedback based on probabilistic term weighting. INSTRUCT has recently been extended and now additionally has facilities for query expansion using both relevance and term co‐occurrence data, for cluster‐based searching and for two browsing search strategies. These retrieval mechanisms are used to search a file of 26,280 titles and abstracts from the Library and Information Science Abstracts database; both menu‐based and command‐based searching are allowed.
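As a rough illustration of what best match searching with probabilistic term weighting can look like, the sketch below ranks a handful of toy documents by the summed weight of the query terms they contain. It is not INSTRUCT's implementation: the function names, the toy documents and the choice of the weight log((N - n + 0.5) / (n + 0.5)), a common collection-frequency weight used when no relevance information is available, are all assumptions made for this example.

```python
import math

def tokenize(text):
    # Hypothetical minimal tokenizer: lowercase, split on whitespace,
    # strip a few punctuation marks from the ends of each token.
    tokens = (w.strip(".,;:()").lower() for w in text.split())
    return [t for t in tokens if t]

def term_weights(docs):
    # Weight each term by how rare it is in the collection (assumed formula):
    # w = log((N - n + 0.5) / (n + 0.5)), N = documents, n = documents with the term.
    n_docs = len(docs)
    doc_freq = {}
    for doc in docs:
        for term in set(tokenize(doc)):
            doc_freq[term] = doc_freq.get(term, 0) + 1
    return {t: math.log((n_docs - n + 0.5) / (n + 0.5)) for t, n in doc_freq.items()}

def best_match(query, docs):
    # Score every document by the summed weights of the query terms it shares,
    # then return (score, document index) pairs, best match first.
    weights = term_weights(docs)
    query_terms = set(tokenize(query))
    scored = []
    for index, doc in enumerate(docs):
        shared = query_terms & set(tokenize(doc))
        scored.append((sum(weights[t] for t in shared), index))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return scored

# Toy collection, invented for this example.
DOCS = [
    "cluster based searching of abstracts",
    "probabilistic term weighting for retrieval",
    "term weighting and relevance feedback",
    "library catalogues online",
    "browsing search strategies",
]

ranking = best_match("probabilistic term weighting", DOCS)
for score, index in ranking:
    print(f"{score:.3f}  {DOCS[index]}")
```

Unlike a Boolean search, every document receives a score, so partial matches are ranked rather than excluded.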
Carmen Galvez, Félix de Moya‐Anegón and Víctor H. Solana
Abstract
Purpose
To propose a categorization of the different conflation procedures under the two basic approaches, non‐linguistic and linguistic techniques, and to justify the application of normalization methods within the framework of linguistic techniques.
Design/methodology/approach
Presents a range of term conflation methods that can be used in information retrieval. The uniterm and multiterm variants can be considered equivalent units for the purposes of automatic indexing. Stemming algorithms, segmentation rules, association measures and clustering techniques are well‐evaluated non‐linguistic methods, and experiments with these techniques show a wide variety of results. Alternatively, lemmatisation and syntactic pattern‐matching, through equivalence relations represented in finite‐state transducers (FST), are emerging methods for the recognition and standardization of terms.
Findings
The survey attempts to point out the positive and negative effects of the linguistic approach and its potential as a term conflation method.
Originality/value
Outlines the importance of FSTs for the normalization of term variants.
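The contrast between non‐linguistic and linguistic conflation described in this abstract can be sketched with a toy example. The code below is only an illustration under stated assumptions: `crude_stem` is a hypothetical suffix stripper (not the Porter algorithm), and `LEMMA_LEXICON` is a hand‐made lookup table standing in for the dictionary that a finite‐state transducer would encode. None of these names or data come from the paper.

```python
# Hypothetical suffix list, longest suffixes first so they are tried before shorter ones.
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "es", "s")

def crude_stem(word):
    # Non-linguistic conflation: strip the first matching suffix,
    # provided at least three characters of the word remain.
    w = word.lower()
    for suffix in SUFFIXES:
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w

# Assumed toy lexicon mapping inflected forms to a canonical lemma.
LEMMA_LEXICON = {
    "indexes": "index",
    "indices": "index",
    "indexing": "index",
}

def lemmatize(word):
    # Linguistic conflation: look the form up in the lexicon;
    # forms missing from the lexicon are returned unchanged.
    w = word.lower()
    return LEMMA_LEXICON.get(w, w)

def conflate(words, normalise):
    # Group term variants under the normalised form produced by `normalise`.
    groups = {}
    for w in words:
        groups.setdefault(normalise(w), []).append(w)
    return groups

VARIANTS = ["indexes", "indices", "indexing", "index"]
stem_groups = conflate(VARIANTS, crude_stem)
lemma_groups = conflate(VARIANTS, lemmatize)
print(stem_groups)
print(lemma_groups)
```

In this toy run the suffix stripper leaves "indices" in a group of its own, while the lexicon lookup merges all four variants. The lookup, in turn, can only normalise forms that the lexicon covers.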
ROY RADA, HAFEDH MILI, GARY LETOURNEAU and DOUG JOHNSTON
Abstract
An indexing language is made more accessible to searchers and indexers by the presence of entry terms or near‐synonyms. This paper first presents an evaluation of existing entry terms and then presents and tests a strategy for creating entry terms. The key tools in the evaluation of the entry terms are documents already indexed into the Medical Subject Headings (MeSH) and an automatic indexer. If the automatic indexer can better map the title to the index terms with the use of entry terms than without entry terms, then the entry terms have helped. Sensitive assessment of the automatic indexer requires the introduction of measures of conceptual closeness between the computer and human output. With the tools described in this paper, one can systematically demonstrate that certain entry terms have ambiguous meanings. In the selection of new entry terms another controlled vocabulary or thesaurus, called the Systematized Nomenclature of Medicine (SNOMED), was consulted. An algorithm for mapping terms from SNOMED to MeSH was implemented and evaluated with the automatic indexer. The new SNOMED‐based entry terms did not help indexing but did show how new concepts might be identified which would constitute meaningful amendments to MeSH. Finally, an improved algorithm for combining two thesauri was applied to the Computing Reviews Classification Structure (CRCS) and MeSH. CRCS plus MeSH supported better indexing than did MeSH alone.
Carmen Galvez and Félix de Moya‐Anegón
Abstract
Purpose
To evaluate the accuracy of conflation methods based on finite‐state transducers (FSTs).
Design/methodology/approach
Incorrectly lemmatized and stemmed forms may lead to the retrieval of inappropriate documents. Experimental studies to date have focused on retrieval performance, but very few on conflation performance. The process of normalization we used involved a linguistic toolbox that allowed us to construct, through graphic interfaces, electronic dictionaries represented internally by FSTs. The lexical resources developed were applied to a Spanish test corpus for merging term variants in canonical lemmatized forms. Conflation performance was evaluated in terms of an adaptation of recall and precision measures, based on accuracy and coverage, not actual retrieval. The results were compared with those obtained using a Spanish version of the Porter algorithm.
Findings
The conclusion is that the main strength of lemmatization is its accuracy, whereas its main limitation is the underanalysis of variant forms.
Originality/value
The report outlines the potential of transducers in their application to normalization processes.
Sameera Mohamed Al Zaidi, Shilpa Iyanna, Fauzia Jabeen and Khalid Mehmood
Abstract
Purpose
This paper aims to investigate the impact of situational factors and internal psychological states on employees’ decisions to perform voluntary pro-environmental behavior. This study used a model combining the theory of planned behavior, norm activation model and comprehensive action determination model. This study also explored the moderating role of habit (HAB) on the relationship between intention and actual voluntary pro-environmental behavior.
Design/methodology/approach
Data were collected through three waves of time-lagged survey questionnaires from 519 employees of public organizations in Abu Dhabi, United Arab Emirates.
Findings
Employees’ perceptions of corporate social responsibility (CSR) had a significant impact on intention to perform voluntary pro-environmental behavior, as did all other variables except perceived behavioral control. HABs related to pro-environmental behavior enhanced the relationship between intention and actual behavior.
Practical implications
The main factors influencing employees’ voluntary pro-environmental behavioral intentions were perceived CSR, personal moral norms, organizational citizenship behaviors toward the environment and attitude. Public organization planners, managers and practitioners can use these findings to improve their organization’s environmental performance, leveraging nonmandated actions.
Social implications
Employees can achieve a better work–life balance in organizations that have flexible CSR policies and sponsor social activities to improve public well-being and individuals’ life quality. Positive sense-making of corporate social activity helps employees develop social interactions with stakeholders, increasing their involvement in society and decreasing work stress.
Originality/value
This study sheds light on the factors influencing employees’ voluntary pro-environmental behavior. To the best of the authors’ knowledge, this is the first study of its kind to combine these three models to explain the variables affecting intent to perform voluntary pro-environmental behavior in the workplace.