Heng-yang Lu, Jun Yang, Wei Fang, Xiaoning Song and Chongjun Wang
Abstract
Purpose
COVID-19 has become a global pandemic that has caused a large number of deaths and huge economic losses. These losses are caused not only by the virus but also by related rumors. Online social media are now highly popular, with billions of people using them to express opinions and propagate information. Rumors about COVID-19 posted on online social media usually spread rapidly, and it is hard to analyze and detect them by manual processing alone. The purpose of this paper is to propose a novel model called the Topic-Comment-based Rumor Detection model (TopCom) to detect rumors as soon as possible.
Design/methodology/approach
The authors conducted COVID-19 rumor detection on Sina Weibo, one of the most widely used Chinese online social media platforms. Using a web crawler, the authors constructed a dataset of COVID-19 microblogs posted from January 1 to June 30, 2020, including both rumors and non-rumors. The rumor detection task is regarded as a binary classification problem. The proposed TopCom model exploits topical memory networks to fuse latent topic information with the original microblogs, which addresses the sparsity problem caused by short-text microblogs. In addition, TopCom fuses comments with their corresponding microblogs to further improve detection performance.
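The fusion idea described above can be illustrated with a minimal sketch. It is not the TopCom architecture: the abstract does not specify how topical memory networks or the comment encoder are built, so the sketch assumes plain concatenation of a microblog vector, a topic-mixture vector and a mean comment vector, followed by a logistic classifier. All function names, parameters and the 0.5 threshold are hypothetical.

```python
import math


def mean_vector(vectors, dim):
    """Average a list of equal-length vectors; return zeros if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def fuse(post_vec, topic_vec, comment_vecs, comment_dim):
    """Assumed fusion step: concatenate post, topic and averaged comment features."""
    return list(post_vec) + list(topic_vec) + mean_vector(comment_vecs, comment_dim)


def rumor_probability(features, weights, bias):
    """Logistic score in (0, 1); higher means more likely to be a rumor."""
    score = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-score))


def classify(features, weights, bias, threshold=0.5):
    """Binary decision: 'rumor' or 'non-rumor' (threshold is an assumption)."""
    if rumor_probability(features, weights, bias) >= threshold:
        return "rumor"
    return "non-rumor"


# Toy example with made-up numbers, not data from the paper.
features = fuse(
    post_vec=[1.0, 0.0],
    topic_vec=[0.2, 0.8],
    comment_vecs=[[1.0, 3.0], [3.0, 5.0]],
    comment_dim=2,
)
label = classify(features, weights=[0.0] * len(features), bias=0.0)
```

A microblog without comments still yields a feature vector of the same length, because the comment part falls back to zeros. This matters for early detection, when few comments exist yet.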
Findings
Experimental results on a publicly available dataset and the proposed COVID-19 dataset show that TopCom is superior to the baselines in both performance and efficiency. The authors further randomly selected microblogs posted from July 1 to 31, 2020 for a case study, which also shows the effectiveness and application prospects of TopCom for detecting COVID-19 rumors automatically.
Originality/value
The originality of TopCom lies in fusing the latent topic information of original microblogs and their corresponding comments with DNN-based models for the COVID-19 rumor detection task. Its value is in helping to detect rumors automatically in a short time.
Heng-Yang Lu, Yi Zhang and Yuntao Du
Abstract
Purpose
Topic models have been widely applied to discover important information from vast amounts of unstructured data. Traditional long-text topic models such as Latent Dirichlet Allocation may suffer from the sparsity problem when dealing with short texts, which mostly come from the Web. These models also suffer from a readability problem when displaying the discovered topics. The purpose of this paper is to propose a novel model called the Sense Unit based Phrase Topic Model (SenU-PTM) that addresses both the sparsity and readability problems.
Design/methodology/approach
SenU-PTM is a novel phrase-based short-text topic model under a two-phase framework. The first phase introduces a phrase-generation algorithm that exploits word embeddings to generate phrases from the original corpus. The second phase introduces the new concept of a sense unit, a set of semantically similar tokens, and models topics on these units using the token vectors generated in the first phase. Finally, SenU-PTM infers topics based on these two phases.
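To make the notion of a sense unit concrete, the sketch below groups tokens whose vectors are close in cosine similarity. It is an illustration only: the abstract does not say how SenU-PTM actually forms sense units, so the greedy grouping rule, the function names, the 0.8 threshold and the toy two-dimensional vectors are all assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_sense_units(token_vectors, threshold=0.8):
    """Greedily group tokens: a token joins the first unit whose seed vector
    is at least `threshold` similar, otherwise it starts a new unit."""
    units = []  # each entry is (seed_vector, [member tokens])
    for token, vec in token_vectors.items():
        for seed, members in units:
            if cosine(seed, vec) >= threshold:
                members.append(token)
                break
        else:
            units.append((vec, [token]))
    return [members for _, members in units]


# Hypothetical token vectors standing in for phase-one output
# (tokens may be single words or generated phrases).
token_vectors = {
    "new_york": [1.0, 0.0],
    "nyc": [0.9, 0.1],
    "machine_learning": [0.0, 1.0],
    "deep_learning": [0.1, 0.9],
}
sense_units = build_sense_units(token_vectors)
```

Modeling topics over such groups rather than over individual tokens is one way the co-occurrence counts of a short text can become less sparse, since near-synonyms are counted together.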
Findings
Experimental results on two real-world, publicly available datasets show the effectiveness of SenU-PTM from the perspectives of topical quality and document characterization. The results reveal that modeling topics on sense units can address the sparsity of short texts and improve the readability of topics at the same time.
Originality/value
The originality of SenU-PTM lies in the new procedure of modeling topics on the proposed sense units with word embeddings for short-text topic discovery.