
Unearthing historical insights: semantic organization and application of historical newspapers from a fine-grained knowledge element perspective

Shaodan Sun (School of Cyber Science and Engineering, Nanjing University of Science and Technology, Nanjing, China)
Jun Deng (School of Business and Management, Jilin University, Changchun, China)
Xugong Qin (School of Cyber Science and Engineering, Nanjing University of Science and Technology, Nanjing, China)

Aslib Journal of Information Management

ISSN: 2050-3806

Article publication date: 14 November 2023


Abstract

Purpose

This paper aims to improve the retrieval and use of historical newspapers by organizing their content semantically at the level of fine-grained knowledge elements. The goal is to unlock the latent value of newspaper content and to offer methodological guidance for research in the humanities.

Design/methodology/approach

Drawing on the semantic organization process and the concept of knowledge elements, this study proposes a holistic framework with four stages: knowledge element description, extraction, association and application. First, a semantic description model for knowledge elements is devised. Next, deep learning techniques are applied to entity recognition and relationship extraction, identifying the entities in the historical newspaper content and capturing the relationships among them. Finally, an online platform based on Flask is developed to enable the recognition of entities and relationships within historical newspapers.
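To make the description, extraction and association stages more concrete, the following is a minimal Python sketch under stated assumptions. The class names (`Entity`, `Relation`), the type labels, the `located_in` predicate and the dictionary lookup in `extract_entities` are all hypothetical illustrations; in particular, the lookup is only a stand-in for the deep learning recognizers used in the study, and the sample sentence is invented.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str   # surface form as it appears in the article
    label: str  # hypothetical type label, e.g. "PERSON" or "PLACE"
    start: int  # character offset where the mention begins
    end: int    # character offset just past the mention


@dataclass(frozen=True)
class Relation:
    head: Entity
    predicate: str  # hypothetical relation name
    tail: Entity


def extract_entities(text, gazetteer):
    """Stand-in for a trained recognizer: find every known name in the text."""
    found = []
    for name, label in gazetteer.items():
        pos = text.find(name)
        while pos != -1:
            found.append(Entity(name, label, pos, pos + len(name)))
            pos = text.find(name, pos + 1)
    return sorted(found, key=lambda e: e.start)


def associate(relations):
    """Index relations by head entity so linked elements can be looked up."""
    graph = defaultdict(list)
    for rel in relations:
        graph[rel.head.text].append((rel.predicate, rel.tail.text))
    return dict(graph)


# Invented example sentence and gazetteer, for illustration only.
text = "Zhang Wei opened a school in Changchun."
entities = extract_entities(text, {"Zhang Wei": "PERSON", "Changchun": "PLACE"})
graph = associate([Relation(entities[0], "located_in", entities[1])])
```

Keeping character offsets on each entity lets an application layer, such as a web front end, highlight mentions in the original article text.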

Findings

This article used the Shengjing Times·Changchun Compilation as the dataset for describing, extracting, associating and applying newspaper content. For knowledge element extraction, BERT + BS consistently outperforms Bi-LSTM, CRF++ and BERT alone in Recall and F1 scores, making it a favorable choice for entity recognition in this context. The Bi-LSTM-Pro model is particularly noteworthy: it achieves the highest scores across all metrics, with an especially high F1 score in knowledge element relationship recognition.
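The abstract does not state how Recall and F1 were computed. As a point of reference, the sketch below uses the conventional exact-match definition for entity recognition, in which a prediction counts as correct only if its span and type both match a gold annotation. The function name `prf1` and the sample spans are assumptions made for illustration, not the authors' evaluation code.

```python
def prf1(gold, predicted):
    """Return (precision, recall, f1) for exact-match entity evaluation.

    Each item is a hashable tuple such as (start, end, label).
    """
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented spans: one of the two predictions matches a gold annotation.
gold_spans = {(0, 9, "PERSON"), (29, 38, "PLACE")}
predicted_spans = {(0, 9, "PERSON"), (10, 16, "PLACE")}
scores = prf1(gold_spans, predicted_spans)
```

Because F1 is the harmonic mean of precision and recall, a model that improves Recall without losing much precision will generally also improve F1.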

Originality/value

Historical newspapers are more than artifacts; they are invaluable reservoirs of societal and historical memory. Semantic organization from a fine-grained knowledge element perspective can support semantic retrieval, semantic association, information visualization and knowledge discovery services for historical newspapers. In practice, this approach can help researchers uncover deeper insights within the historical and cultural context, broadening the scope of digital humanities research and practical applications.

Acknowledgements

The authors gratefully acknowledge the financial support of the National Social Science Fund, China: “Research on archival data resources mining and intelligent service under the cultural digitization strategy” (23ATQ001).

Citation

Sun, S., Deng, J. and Qin, X. (2023), "Unearthing historical insights: semantic organization and application of historical newspapers from a fine-grained knowledge element perspective", Aslib Journal of Information Management, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/AJIM-05-2023-0180

Publisher

Emerald Publishing Limited

Copyright © 2023, Emerald Publishing Limited
