Publication date: 26 February 2025

Visualization algorithm based on FDR control testing for dimension reduction of textual data

Sung-Inn Pyo, Soohyun Ahn and Soon-Sun Kwon

Abstract

Purpose

Visualizing relations in textual data requires dimension reduction to make the output interpretable. However, traditional dimension reduction methods have limitations, such as the loss of feature information during extraction or projection and uncertain results caused by the mixing of word labels. In this study, we develop a textual data visualization algorithm that uses statistical methods to support statistical inference on the data. We also design the algorithm so that users can analyze textual data easily.

Design/methodology/approach

Unstructured data, such as textual data, is sensitive to the choice of analysis method. In addition, textual data is generally large and sparse. Considering these characteristics, we applied latent Dirichlet allocation (LDA) to separate the data into topics and minimize the loss of information, and false discovery rate (FDR) control to reduce the dimension in a statistical way.
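To make the FDR step concrete, here is a minimal sketch of one common FDR-controlling procedure, the Benjamini-Hochberg step-up rule. The abstract does not say which FDR procedure the authors use, so the choice of Benjamini-Hochberg is an assumption, and the function name, word pairs and p-values below are hypothetical illustrations rather than anything from the article.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected
    while controlling the false discovery rate at level alpha."""
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the hypotheses, ordered by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k (1-based) such that
    # p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis ranked at or below max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical word pairs with made-up p-values from some association test.
pairs = [("economy", "growth"), ("economy", "weather"),
         ("policy", "budget"), ("policy", "music"), ("growth", "music")]
p_values = [0.001, 0.008, 0.039, 0.041, 0.60]

keep = benjamini_hochberg(p_values, alpha=0.05)
# Only the pairs that survive the FDR threshold would be drawn.
retained_pairs = [pair for pair, flag in zip(pairs, keep) if flag]
```

In a visualization setting, only the relations that pass the threshold would be kept, which is one way dimension reduction can be carried out through testing rather than projection.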

Findings

The relations in textual data can be derived with a single click, and the output, organized into separated topics, can be interpreted without background information.

Originality/value

The algorithm was developed for the Korean language. However, it can be applied to any language without linguistic information. This study can serve as an example of the usage and workflow by which less well-known dimension reduction methods can replace traditional ones.

Details

Data Technologies and Applications, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2514-9288
