HerCulB: content-based information extraction and retrieval for cultural heritage of the Balkans
ISSN: 0264-0473
Article publication date: 28 October 2020
Issue publication date: 12 December 2020
Abstract
Purpose
The purpose of this paper is to provide a methodology for the automatic annotation of a multimedia collection of intangible cultural heritage, mostly in the form of interviews. The assigned annotations provide a way to search the collection.
Design/methodology/approach
Annotation is based on automatic metadata extraction: named entities and topics are extracted from textual descriptions using a rule-based approach supported by vocabulary resources, a compiled domain-specific classification scheme and domain-oriented corpus analysis.
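The rule-based, vocabulary-supported annotation described above can be illustrated with a minimal sketch. It is not the authors' implementation: the paper encodes its rules as finite state transducers, whereas this sketch substitutes plain regular expressions and dictionary lookup. The names `GAZETTEER`, `TOPIC_KEYWORDS` and `annotate`, and all vocabulary entries, are hypothetical and chosen only for illustration.

```python
import re

# Hypothetical vocabulary resources (assumed for illustration, not from the paper).
GAZETTEER = {
    "Belgrade": "PLACE",
    "Sofia": "PLACE",
}
# Hypothetical keyword-to-topic mapping standing in for a classification scheme.
TOPIC_KEYWORDS = {
    "wedding": "customs",
    "harvest": "agriculture",
    "song": "music",
}


def annotate(text):
    """Return (entities, topics) found in a textual description."""
    found = []
    for name, label in GAZETTEER.items():
        # Whole-word match so a name is not matched inside a longer word.
        for match in re.finditer(r"\b" + re.escape(name) + r"\b", text):
            found.append((match.start(), match.group(0), label))
    found.sort()  # report entities in order of appearance
    entities = [(surface, label) for _, surface, label in found]

    lowered = text.lower()
    topics = sorted({
        topic
        for keyword, topic in TOPIC_KEYWORDS.items()
        # Allow simple suffixes such as plural endings after the keyword.
        if re.search(r"\b" + re.escape(keyword) + r"\w*", lowered)
    })
    return entities, topics


description = "An interview recorded in Belgrade about wedding songs."
entities, topics = annotate(description)
```

Here `entities` holds the recognised names with their labels and `topics` holds the assigned topic labels, which could then be stored as searchable metadata for the described item.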
Findings
The proposed methodology for automatic annotation of a collection of intangible cultural heritage, applied to the cultural heritage of the Balkans, achieves very good results: the F-measure is 0.87 for named entity annotation and 0.90 for topic annotation. The overall methodology makes it possible to encapsulate domain-specific and language-specific knowledge in collections of finite state transducers and leaves room for further improvements.
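For readers unfamiliar with the evaluation metric, the following sketch shows how an F-measure can be computed from a set of gold annotations and a set of predicted annotations. It assumes the balanced F1 variant (the harmonic mean of precision and recall); the abstract does not state which variant was used. The function name `f_measure` and the toy annotation sets are hypothetical and do not reproduce the paper's data or results.

```python
def f_measure(gold, predicted):
    """Balanced F-measure (F1) over two collections of annotations."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy annotations, invented purely for illustration: (document, text, label).
gold = {
    ("doc1", "Belgrade", "PLACE"),
    ("doc1", "Sofia", "PLACE"),
    ("doc2", "Skopje", "PLACE"),
}
predicted = {
    ("doc1", "Belgrade", "PLACE"),
    ("doc2", "Skopje", "PLACE"),
    ("doc2", "Nis", "PLACE"),
}
score = f_measure(gold, predicted)
```

In this toy case two of the three predictions are correct and two of the three gold annotations are found, so precision and recall are both 2/3 and the F-measure is also 2/3.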
Originality/value
Although cultural heritage plays a significant role in the development of the identity of a group or an individual, it is one of those specific domains that have not yet been fully explored for many languages. The paper proposes a methodology that can be used to incorporate natural language processing techniques into digital libraries of cultural heritage.
Keywords
Acknowledgements
This work has been carried out within the project III47003 of the Ministry of Science and Technological Development, Serbia.
Citation
Tanasijević, I. and Pavlović-Lažetić, G. (2020), "HerCulB: content-based information extraction and retrieval for cultural heritage of the Balkans", The Electronic Library, Vol. 38 No. 5/6, pp. 905-918. https://doi.org/10.1108/EL-03-2020-0052
Publisher
Emerald Publishing Limited
Copyright © 2020, Emerald Publishing Limited