CA-CD: context-aware clickbait detection using new Chinese clickbait dataset with transfer learning method
Data Technologies and Applications
ISSN: 2514-9288
Article publication date: 29 August 2023
Issue publication date: 15 April 2024
Abstract
Purpose
A clickbait is a deceptive headline designed to boost ad revenue without presenting closely relevant content. Clickbait has numerous negative repercussions: it leaves viewers feeling tricked and unhappy, causes long-term confusion and can even attract cyber criminals. Automatic detection algorithms for clickbait have been developed to address this issue. However, existing clickbait detection techniques have two limitations: they assign only one semantic representation to the same term regardless of context, and the available Chinese datasets are limited. This study aims to address these limitations of automated clickbait detection for Chinese.
Design/methodology/approach
This study combines news headlines and news content to train the model to capture the probable relationship between the two. In addition, part-of-speech elements are used to generate the most appropriate semantic representation for clickbait detection, improving detection performance.
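As a rough illustration of how a headline and its article body might be joined into a single model input with aligned part-of-speech tags, consider the minimal sketch below. It is not the authors' implementation: the `[CLS]`/`[SEP]` markers, the segment ids, the `build_pair_input` helper, the placeholder tag `X` and the sample tokens and tags are all assumptions made for illustration, since the abstract does not name a specific encoder or tag set.

```python
from typing import List, Tuple

# Assumed BERT-style boundary markers; the abstract does not specify them.
CLS, SEP = "[CLS]", "[SEP]"


def build_pair_input(
    headline: List[Tuple[str, str]],
    content: List[Tuple[str, str]],
    max_len: int = 32,
) -> Tuple[List[str], List[int], List[str]]:
    """Join (token, pos) pairs from a headline and an article body.

    Returns the token sequence, segment ids (0 = headline, 1 = content)
    and a POS tag sequence aligned with the tokens. Hypothetical helper.
    """
    tokens = [CLS] + [tok for tok, _ in headline] + [SEP]
    pos = ["X"] + [tag for _, tag in headline] + ["X"]  # "X" marks special tokens
    segments = [0] * len(tokens)

    # Only the content is truncated; the headline is assumed to be short.
    room = max(max_len - len(tokens) - 1, 0)
    body = content[:room]

    tokens += [tok for tok, _ in body] + [SEP]
    pos += [tag for _, tag in body] + ["X"]
    segments += [1] * (len(body) + 1)
    return tokens, segments, pos


# Hypothetical pre-tokenized, pre-tagged input.
headline = [("震驚", "VV"), ("真相", "NN")]
content = [("記者", "NN"), ("報導", "VV"), ("事件", "NN")]
tokens, segments, pos = build_pair_input(headline, content, max_len=7)
```

A paired sequence of this kind lets a pretrained encoder attend across the headline and the body, which is one common way to model how closely the two are related; the POS tags could then be supplied as an additional feature per token.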
Findings
This research compiled a dataset of 20,896 Chinese clickbait news articles. The dataset contains news headlines, articles, categories and supplementary metadata. The proposed context-aware clickbait detection (CA-CD) model outperforms existing clickbait detection approaches on many criteria, demonstrating the efficacy of the proposed strategy.
Originality/value
The originality of this study resides in the newly compiled Chinese clickbait dataset and a contextual semantic representation-based clickbait detection approach that employs transfer learning. This method adjusts the semantic representation of each word based on its context, helping the model interpret the original meaning of news articles more precisely.
Acknowledgements
Funding: This research is based on work supported by the Taiwan Ministry of Science and Technology under Grant Nos. MOST 107-2410-H-006 040-MY3 and MOST 108-2511-H-006-009. We would also like to thank the “Higher Education SPROUT Project” and the “Center for Innovative FinTech Business Models” of National Cheng Kung University (NCKU), sponsored by the Ministry of Education, Taiwan, for partial research grant support.
Citation
Wang, H.-C., Maslim, M. and Liu, H.-Y. (2024), "CA-CD: context-aware clickbait detection using new Chinese clickbait dataset with transfer learning method", Data Technologies and Applications, Vol. 58 No. 2, pp. 243-266. https://doi.org/10.1108/DTA-03-2023-0072
Publisher
Emerald Publishing Limited
Copyright © 2023, Emerald Publishing Limited