Retrieval of bibliographic records using Apache Lucene
Abstract
Purpose
The aim of this research is to model and implement a software component for the retrieval of bibliographic records using the Apache Lucene retrieval engine.
Design/methodology/approach
An object-oriented methodology is used for modeling and implementing the bibliographic record retrieval engine. Modeling is carried out in a CASE tool that supports the Unified Modeling Language (UML 2.0), while the implementation uses the Java programming language and open-source components.
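To make the object-oriented approach more concrete, the Java sketch below models a bibliographic record as a hierarchy of record, fields and subfields. The class names (`RecordModel`, `BibRecord`, `Field`, `Subfield`) and the `values` lookup are assumptions for illustration only, not the classes of the described component. The sample data uses UNIMARC tags 200 (title) and 700 (personal name). It needs Java 16+ for `record` types.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical object model of a bibliographic record (names are illustrative,
// not taken from BISIS): a record holds fields, and each field holds subfields.
class RecordModel {
    record Subfield(char code, String value) {}

    record Field(String tag, List<Subfield> subfields) {}

    static final class BibRecord {
        private final List<Field> fields = new ArrayList<>();

        // Appends a field and returns this record, so calls can be chained.
        BibRecord add(Field field) {
            fields.add(field);
            return this;
        }

        // Returns the values of every subfield with the given field tag and
        // subfield code, e.g. values("200", 'a') for the UNIMARC title proper.
        List<String> values(String tag, char code) {
            List<String> result = new ArrayList<>();
            for (Field f : fields) {
                if (!f.tag().equals(tag)) continue;
                for (Subfield s : f.subfields()) {
                    if (s.code() == code) result.add(s.value());
                }
            }
            return result;
        }
    }

    public static void main(String[] args) {
        BibRecord rec = new BibRecord()
            .add(new Field("200", List.of(new Subfield('a', "Retrieval of bibliographic records"))))
            .add(new Field("700", List.of(new Subfield('a', "Smith"))));
        System.out.println(rec.values("200", 'a')); // [Retrieval of bibliographic records]
    }
}
```

Keeping the record model generic (tags and codes as plain data) is one way to avoid hard-wiring any particular bibliographic format into the classes.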
Findings
The result is a software component for the retrieval of bibliographic records that is independent of the bibliographic format used in cataloging. It offers great flexibility in configuring search types without the need to change the software implementation.
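One way to read the format-independence claim is that search types are data rather than code. The sketch below illustrates that reading under stated assumptions: a map from search-type prefixes (`TI` and `AU` are chosen for the example) to flattened subfield paths such as `200a`, and a generic routine `indexTerms` that collects index terms per search type. All names here are hypothetical. Adding a search type, or supporting another format, would then mean changing the map rather than the routine.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SearchTypeMapping {
    // One flattened subfield occurrence; "200a" means field 200, subfield a.
    record SubfieldValue(String path, String value) {}

    // Hypothetical mapping rules: search-type prefix -> subfield paths.
    static final Map<String, List<String>> RULES = Map.of(
        "TI", List.of("200a", "200e"),
        "AU", List.of("700a", "701a"));

    // Generic, format-agnostic step: for each search type, collect the values
    // of all subfields the rules map to it. Types with no values are omitted.
    static Map<String, List<String>> indexTerms(List<SubfieldValue> fields,
                                                Map<String, List<String>> rules) {
        Map<String, List<String>> terms = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> rule : rules.entrySet()) {
            List<String> values = new ArrayList<>();
            for (SubfieldValue sv : fields) {
                if (rule.getValue().contains(sv.path())) {
                    values.add(sv.value());
                }
            }
            if (!values.isEmpty()) {
                terms.put(rule.getKey(), values);
            }
        }
        return terms;
    }

    public static void main(String[] args) {
        List<SubfieldValue> rec = List.of(
            new SubfieldValue("200a", "Retrieval of bibliographic records"),
            new SubfieldValue("700a", "Smith"),
            new SubfieldValue("701a", "Jones"));
        Map<String, List<String>> terms = indexTerms(rec, RULES);
        System.out.println(terms.get("TI")); // [Retrieval of bibliographic records]
        System.out.println(terms.get("AU")); // [Smith, Jones]
    }
}
```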
Research limitations/implications
One of the constraints of this system relates to searching linking entry fields. The UNIMARC format defines fields used to link the item being cataloged to another bibliographic item; these fields may contain other fields, which can be termed secondary fields. In the proposed solution, secondary fields are treated like all other fields, so there is no information on whether a search term belongs to a secondary or a regular field.
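The limitation can be illustrated with a simplified, hypothetical model in which a linking field (UNIMARC 461 is used as an example) carries an embedded field. In actual UNIMARC records embedded fields are introduced through subfield $1; here they are modeled as a plain nested list, which is an assumption made for brevity. When the record is flattened for indexing, the embedded title yields the same `200a` entry as the record's own title, so a hit cannot be traced back to a secondary field.

```java
import java.util.ArrayList;
import java.util.List;

class LinkingFieldFlattening {
    record Subfield(char code, String value) {}

    // Simplified: a field may carry embedded (secondary) fields directly,
    // instead of encoding them through UNIMARC subfield $1.
    record Field(String tag, List<Subfield> subfields, List<Field> embedded) {}

    // Flattens fields into "tag+code=value" index entries. Embedded fields are
    // flattened exactly like regular ones, so an entry no longer records
    // whether it came from a secondary field.
    static List<String> flatten(List<Field> fields) {
        List<String> out = new ArrayList<>();
        for (Field f : fields) {
            for (Subfield s : f.subfields()) {
                out.add(f.tag() + s.code() + "=" + s.value());
            }
            out.addAll(flatten(f.embedded()));
        }
        return out;
    }

    public static void main(String[] args) {
        Field ownTitle = new Field("200", List.of(new Subfield('a', "Volume 2")), List.of());
        Field setLink = new Field("461", List.of(), List.of(
            new Field("200", List.of(new Subfield('a', "Collected works")), List.of())));
        System.out.println(flatten(List.of(ownTitle, setLink)));
        // [200a=Volume 2, 200a=Collected works]
    }
}
```

Preserving the distinction would require carrying the enclosing tag into the index entry (for example a `461/200a` path), which is a design choice the sketch deliberately omits.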
Practical implications
The proposed solution is integrated into version 4 of the BISIS library information system, which is in use at university, public and special libraries. This version improves system performance and the flexibility of the indexing process, and at the same time enables librarians to perform sophisticated and effective retrieval of bibliographic records.
Originality/value
The contribution of this work lies in the design of a customizable record retrieval component. The component is configured by an XML document that specifies mapping rules between subfields of the bibliographic record format and search types, so new mapping rules can be added without additional programming. In addition, particular attention has been paid to indexing subfields that contain punctuation marks with special semantic meaning for librarians, and to transliteration between Cyrillic and Latin scripts. The work is also original in its use of the Apache Lucene search engine, which facilitates building highly flexible and efficient retrieval systems.
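A hedged sketch of two ideas from this paragraph follows: reading mapping rules from an XML document, and transliterating Serbian Cyrillic to Latin so that a term can be matched regardless of the script it was entered in. The XML vocabulary (`mappings`, `prefix`, `subfield`) is invented for illustration and is not the schema of the described component; normalizing towards lower-case Latin is likewise an assumption. Only JDK classes are used (the DOM parser from `javax.xml.parsers`), so the sketch does not depend on Lucene.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

class XmlMappingAndTransliteration {
    // Reads hypothetical rules such as
    // <mappings><prefix name="TI"><subfield>200a</subfield></prefix></mappings>
    // into a map from search-type prefix to subfield paths.
    static Map<String, List<String>> parseRules(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, List<String>> rules = new LinkedHashMap<>();
            NodeList prefixes = doc.getElementsByTagName("prefix");
            for (int i = 0; i < prefixes.getLength(); i++) {
                Element prefix = (Element) prefixes.item(i);
                NodeList subfields = prefix.getElementsByTagName("subfield");
                List<String> paths = new ArrayList<>();
                for (int j = 0; j < subfields.getLength(); j++) {
                    paths.add(subfields.item(j).getTextContent().trim());
                }
                rules.put(prefix.getAttribute("name"), paths);
            }
            return rules;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Invalid mapping document", e);
        }
    }

    // Serbian Cyrillic lower-case letters and their Latin (Gaj) equivalents.
    private static final String CYR = "абвгдђежзијклљмнњопрстћуфхцчџш";
    private static final String[] LAT = {
        "a", "b", "v", "g", "d", "đ", "e", "ž", "z", "i", "j", "k", "l", "lj", "m",
        "n", "nj", "o", "p", "r", "s", "t", "ć", "u", "f", "h", "c", "č", "dž", "š"};

    // Lower-cases the term and maps each Cyrillic letter to Latin; all other
    // characters (Latin letters, digits, spaces) are kept unchanged.
    static String toLatin(String term) {
        StringBuilder sb = new StringBuilder();
        for (char ch : term.toLowerCase(Locale.ROOT).toCharArray()) {
            int k = CYR.indexOf(ch);
            sb.append(k >= 0 ? LAT[k] : String.valueOf(ch));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xml = "<mappings><prefix name=\"TI\"><subfield>200a</subfield>"
            + "<subfield>200e</subfield></prefix>"
            + "<prefix name=\"AU\"><subfield>700a</subfield></prefix></mappings>";
        System.out.println(parseRules(xml));        // {TI=[200a, 200e], AU=[700a]}
        System.out.println(toLatin("Љубав и рат")); // ljubav i rat
    }
}
```

Normalizing towards Latin is the simpler direction in this sketch because the letter mapping is deterministic, whereas Latin to Cyrillic is ambiguous: the Latin digraph "nj", for instance, can stand either for њ or for н followed by ј.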
Citation
Milosavljević, B., Boberić, D. and Surla, D. (2010), "Retrieval of bibliographic records using Apache Lucene", The Electronic Library, Vol. 28 No. 4, pp. 525-539. https://doi.org/10.1108/02640471011065355
Publisher
Emerald Group Publishing Limited
Copyright © 2010, Emerald Group Publishing Limited