Latent topics identification from the articles of Sri Lankan authors using LDA
Global Knowledge, Memory and Communication
ISSN: 2514-9342
Article publication date: 16 February 2023
Abstract
Purpose
The purpose of the study is to identify latent topics in 9102 Web of Science (WoS) indexed research articles by Sri Lankan authors, published in 2645 journals from 1989 to 2021, by applying Latent Dirichlet Allocation (LDA) to the abstracts. The article discusses the dominant topics in the corpus, the posterior probabilities of terms within the topics and the publication proportions of the topics.
Design/methodology/approach
Abstracts and other details of the studied articles are collected from the WoS database by the authors. The data are preprocessed before the analysis. After text preprocessing, the R package “ldatuning” is applied to decide the number of topics on the basis of statistical criteria. Based on four metrics, 20 latent topics are selected for extraction.
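As an illustration of how four metrics can point to a single topic count, the following is a minimal Python sketch, not the authors' R workflow. The metric names are those reported by “ldatuning”, two of which are conventionally maximized and two minimized. The scores are invented for illustration and are not results from the study, and averaging the rescaled metrics is an assumption made here; the abstract does not say how the four metrics were combined.

```python
# Hypothetical metric scores per candidate topic count k.
# These numbers are made up for illustration; they are NOT from the study.
scores = {
    "Griffiths2004": {10: -9100.0, 15: -8950.0, 20: -8800.0, 25: -8820.0},
    "CaoJuan2009": {10: 0.120, 15: 0.095, 20: 0.070, 25: 0.082},
    "Arun2010": {10: 410.0, 15: 380.0, 20: 350.0, 25: 362.0},
    "Deveaud2014": {10: 1.10, 15: 1.25, 20: 1.40, 25: 1.33},
}

# Metrics where a larger value is better; the other two are minimized.
MAXIMIZE = {"Griffiths2004", "Deveaud2014"}


def min_max(values):
    """Rescale a {k: score} mapping to the range [0, 1]."""
    lo, hi = min(values.values()), max(values.values())
    span = hi - lo
    return {k: (v - lo) / span if span else 0.0 for k, v in values.items()}


def choose_k(metric_scores):
    """Average the rescaled metrics (assumed rule) and return the best k."""
    combined = {}
    for name, by_k in metric_scores.items():
        for k, s in min_max(by_k).items():
            # Flip minimized metrics so that 1.0 always means "best".
            goodness = s if name in MAXIMIZE else 1.0 - s
            combined[k] = combined.get(k, 0.0) + goodness / len(metric_scores)
    return max(combined, key=combined.get), combined


best_k, combined = choose_k(scores)
print(best_k)
```

In practice the metric curves are often inspected visually rather than averaged, so the combined score here should be read as one possible decision rule.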
Findings
It is observed that medical science, agriculture, research and development and chemistry-related topics dominate the subject categories as a whole. “Irrigation” and “mortality and health care” show significant growth in publication proportion from 2019 to 2021. Among the most frequently occurring latent topics, terms such as “activity” and “acid” carry higher posterior probability.
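The abstract does not state how publication proportions were computed. One common convention, assumed here, is to average each topic's per-document probability over all articles published in a given year. The sketch below uses invented document-topic vectors with three hypothetical topics; none of the numbers come from the study.

```python
from collections import defaultdict

# Hypothetical records: (publication year, per-document topic probabilities).
# Values are invented for illustration only.
docs = [
    (2019, [0.7, 0.2, 0.1]),
    (2019, [0.1, 0.6, 0.3]),
    (2020, [0.2, 0.2, 0.6]),
    (2021, [0.5, 0.4, 0.1]),
    (2021, [0.3, 0.6, 0.1]),
]


def topic_proportions_by_year(records):
    """Return {year: [mean probability of each topic in that year]}."""
    sums = {}
    counts = defaultdict(int)
    for year, theta in records:
        if year not in sums:
            sums[year] = [0.0] * len(theta)
        for i, p in enumerate(theta):
            sums[year][i] += p
        counts[year] += 1
    return {
        year: [s / counts[year] for s in sums[year]]
        for year in sorted(sums)
    }


proportions = topic_proportions_by_year(docs)
for year, props in proportions.items():
    print(year, [round(p, 3) for p in props])
```

Comparing a topic's value across years then gives a simple view of growth or decline in its share of publications.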
Practical implications
Topic models make it possible to address high-level questions about a corpus quickly and efficiently without human intervention, and they are also helpful in information retrieval and document clustering. A distinctive feature of this study is that it shows how the growth of the universe of knowledge for a specific country can be studied using the LDA topic model.
Originality/value
This study will encourage further research in text analysis and information retrieval. The results give an understanding of how the publications of Sri Lankan authors have developed across subject areas and over time. Trends and intensity of publications by Sri Lankan authors on different latent topics help to trace their interests and the most actively pursued areas in different domains.
Keywords
Acknowledgements
The authors received no financial support for the research, authorship and/or publication of this article. The article is the authors’ original work and has not been previously published.
Statement and declaration: This is to certify that the authors have no affiliations with or involvement in any organization or entity with any financial or nonfinancial interest in the subject matter or materials discussed in this manuscript.
Citation
Ravikumar, S., Boruah, B.B. and Gayang, F.L. (2023), "Latent topics identification from the articles of Sri Lankan authors using LDA", Global Knowledge, Memory and Communication, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/GKMC-08-2022-0206
Publisher
Emerald Publishing Limited
Copyright © 2023, Emerald Publishing Limited