The eigenspace-based fuzzy c-means (EFCM) method combines representation learning and clustering. The textual data are transformed into a lower-dimensional eigenspace using truncated singular value decomposition. Fuzzy c-means is then performed in the eigenspace to identify the centroid of each cluster. The topics are obtained by transforming the centroids back into the nonnegative subspace of the original space. In this paper, we extend the EFCM method for scalability using two approaches, namely single-pass and online. We call the resulting topic detection methods spEFCM and oEFCM, respectively.
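The EFCM pipeline described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `fuzzy_c_means` and `efcm_topics`, the fuzzifier value `m=1.5`, the random initialization, and the use of simple clipping at zero as the projection onto the nonnegative subspace are all assumptions made for illustration. The single-pass and online variants are not shown.

```python
import numpy as np

def fuzzy_c_means(X, c, m=1.5, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means on the rows of X; returns (centroids, memberships)."""
    rng = np.random.default_rng(seed)
    U = rng.random((X.shape[0], c))
    U /= U.sum(axis=1, keepdims=True)            # memberships of each row sum to 1
    for _ in range(n_iter):
        W = U ** m
        V = (W.T @ X) / W.sum(axis=0)[:, None]   # membership-weighted centroids
        d = np.linalg.norm(X[:, None, :] - V[None, :, :], axis=2)
        d = np.fmax(d, 1e-12)                    # avoid division by zero
        inv = d ** (-2.0 / (m - 1.0))
        U_new = inv / inv.sum(axis=1, keepdims=True)
        converged = np.abs(U_new - U).max() < tol
        U = U_new
        if converged:
            break
    return V, U

def efcm_topics(A, n_topics, rank, m=1.5):
    """A is a documents x terms matrix (e.g. TF-IDF). Returns topics x terms."""
    # Truncated SVD: keep the `rank` leading right singular vectors.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vt_k = Vt[:rank]
    Z = A @ Vt_k.T                               # documents in the eigenspace
    centroids, _ = fuzzy_c_means(Z, n_topics, m=m)
    # Map centroids back to term space; clipping at zero is an assumed
    # stand-in for the projection onto the nonnegative subspace.
    return np.maximum(centroids @ Vt_k, 0.0)

# Toy documents x terms matrix with two obvious groups of terms.
A = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [2, 2, 0, 0],
              [0, 0, 3, 1],
              [0, 0, 1, 3],
              [0, 0, 3, 3]], dtype=float)
topics = efcm_topics(A, n_topics=2, rank=2)
```

Each row of `topics` weights the terms of one topic, so the highest-weighted columns of a row can be read as that topic's top words.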

Findings

Our simulations show that both oEFCM and spEFCM run faster than EFCM on data sets that do not fit in memory, although the average coherence score decreases. For data sets that fit in memory as well as those that do not, oEFCM offers a better tradeoff between running time and coherence score than spEFCM.

Originality/value

This research produces a scalable topic detection method. In addition to being scalable, the developed method also runs faster on data sets that fit in memory.

Details

Data Technologies and Applications, vol. 55 no. 4

Type: Research Article

DOI:

ISSN: 2514-9288

Keywords

Article

Publication date: 9 January 2019

Topic features for machine learning-based sentiment analysis in Indonesian tweets

Hendri Murfi, Furida Lusi Siagian and Yudi Satria

Abstract

Purpose

The purpose of this paper is to analyze topics as alternative features for sentiment analysis in Indonesian tweets.

Design/methodology/approach

Given Indonesian tweets, sentiment analysis starts by extracting features from the tweets. The features are words or topics. The authors use non-negative matrix factorization to extract the topics and apply a support vector machine to classify the tweets into their sentiment classes.
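The approach above can be sketched with scikit-learn as follows. This is a minimal illustration rather than the authors' code: the helper name `fit_topic_svm`, the TF-IDF weighting, the linear kernel, the number of topics, and the toy English texts standing in for Indonesian tweets are all assumptions.

```python
from sklearn.decomposition import NMF
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

def fit_topic_svm(texts, labels, n_topics=2, seed=0):
    """Train an SVM on NMF topic features; returns a predict function."""
    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(texts)          # documents x words
    nmf = NMF(n_components=n_topics, init="nndsvda",
              random_state=seed, max_iter=500)
    W = nmf.fit_transform(X)                     # documents x topics
    clf = LinearSVC().fit(W, labels)             # classify on topic weights

    def predict(new_texts):
        # New texts go through the same vectorizer and topic model.
        return clf.predict(nmf.transform(vectorizer.transform(new_texts)))

    return predict

# Hypothetical toy data (placeholders, not from the paper's data sets).
train_texts = ["good great candidate", "great good leader",
               "good leader candidate", "bad awful candidate",
               "awful bad leader", "bad leader candidate"]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
predict = fit_topic_svm(train_texts, train_labels)
preds = predict(["good great", "bad awful"])
```

To use words and topics together, the word matrix `X` and the topic matrix `W` could be concatenated column-wise before training the classifier.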

Findings

The authors analyze the accuracy using two-class and three-class sentiment analysis data sets. Both data sets concern sentiment toward candidates in the Indonesian presidential election. The experiments show that the standard word features give better accuracy than the topic features for the two-class sentiment analysis. However, the topic features can slightly improve the accuracy obtained with the standard word features. The topic features can also improve the accuracy of the standard word features for the three-class sentiment analysis.

Originality/value

The standard textual data representation for sentiment analysis using machine learning is the bag-of-words model and its extensions, which are mainly produced by natural language processing. This paper applies topics as novel features for machine learning-based sentiment analysis in Indonesian tweets.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 12 no. 1

Type: Research Article

DOI:

ISSN: 1756-378X

Keywords

A scalable eigenspace-based fuzzy c-means for topic detection
