Search results

Article

Publication date: 13 March 2020

Predicting corporate credit rating based on qualitative information of MD&A transformed using document vectorization techniques

Jinwook Choi, Yongmoo Suh and Namchul Jung

Downloads

817

Abstract

Purpose

The purpose of this study is to investigate the effectiveness of qualitative information extracted from firms’ annual reports in predicting corporate credit ratings. In practice, qualitative information, such as published reports or management interviews, is known to be an important input to corporate credit rating assignment alongside quantitative information such as financial figures. Nevertheless, prior studies have rarely employed qualitative information when developing prediction models for corporate credit ratings, which leaves room for further research.

Design/methodology/approach

This study adopted three document vectorization methods, Bag-of-Words (BOW), Word to Vector (Word2Vec) and Document to Vector (Doc2Vec), to transform unstructured textual data into numeric vectors that machine learning (ML) algorithms can accept as input. For the experiments, we used a corpus of Management’s Discussion and Analysis (MD&A) sections from 10-K financial reports, together with financial variables and corporate credit rating data.
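The simplest of the three methods, BOW, can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the function names `build_vocabulary` and `bow_vector`, and the two example sentences are assumptions made purely for illustration.

```python
from collections import Counter


def build_vocabulary(documents):
    """Map each distinct lowercase token to a column index, in sorted order."""
    tokens = sorted({tok for doc in documents for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(tokens)}


def bow_vector(document, vocabulary):
    """Count how often each vocabulary term occurs in one document."""
    counts = Counter(document.lower().split())
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:  # tokens outside the vocabulary are ignored
            vector[vocabulary[tok]] = n
    return vector


# Hypothetical MD&A-style snippets, invented for illustration only.
docs = [
    "revenue increased and margins increased",
    "debt increased while revenue declined",
]
vocab = build_vocabulary(docs)
vectors = [bow_vector(d, vocab) for d in docs]
```

Each document becomes a fixed-length count vector over the shared vocabulary, which is the kind of numeric input an ML classifier can consume, possibly alongside financial variables.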

Findings

Experimental results from a series of multi-class classification experiments show that the predictive models trained on both financial variables and vectors extracted from MD&A data outperform the benchmark models trained only on traditional financial variables.

Originality/value

This study proposed a new approach for corporate credit rating prediction by using qualitative information extracted from MD&A documents as an input to ML-based prediction models. Also, this research adopted and compared three textual vectorization methods in the domain of corporate credit rating prediction and showed that BOW mostly outperformed Word2Vec and Doc2Vec.

Details

Data Technologies and Applications, vol. 54 no. 2

Type: Research Article

ISSN: 2514-9288

Article

Publication date: 14 May 2020

Identifying financial statement fraud with decision rules obtained from Modified Random Forest

Byungdae An and Yongmoo Suh

Downloads

1003

Abstract

Purpose

Financial statement fraud (FSF) committed by a company suggests that the company may not be in a healthy state. Detecting FSF is therefore important, because such companies tend to conceal bad information, which causes great losses to various stakeholders. The objective of this paper is to propose a novel approach to building a classification model for identifying FSF that shows high classification performance and from which human-readable rules can be extracted to explain why a company is likely to commit FSF.

Design/methodology/approach

Having prepared multiple sub-datasets to cope with the class imbalance problem, we build a set of decision trees for each sub-dataset. For each sub-dataset, we then select a subset of its trees as a model by removing every tree whose accuracy is below the average accuracy of all trees in the set. Finally, we select the model that shows the best accuracy among these models and call it MRF (Modified Random Forest). Given a new instance, we extract rules from the MRF model to explain whether the company corresponding to that instance is likely to commit FSF.
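The pruning and selection steps can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's algorithm: the "trees" are hypothetical one-feature threshold stumps, models are combined by simple majority vote, accuracy is measured on the same toy rows used for pruning, and rule extraction is omitted. All names (`stump`, `prune_below_average`, `select_best_model`) are invented here.

```python
def stump(threshold):
    """Hypothetical stand-in for a decision tree: predict 1 if x[0] > threshold."""
    return lambda x: int(x[0] > threshold)


def accuracy(predict, rows):
    """Fraction of (features, label) rows that `predict` labels correctly."""
    return sum(predict(x) == y for x, y in rows) / len(rows)


def prune_below_average(trees, rows):
    """Keep only trees whose accuracy is at least the mean accuracy of the set."""
    scores = [accuracy(t, rows) for t in trees]
    mean = sum(scores) / len(scores)
    return [t for t, s in zip(trees, scores) if s >= mean]


def majority_vote(trees):
    """Combine trees into one classifier by majority vote (ties go to class 0)."""
    def predict(x):
        votes = sum(t(x) for t in trees)
        return 1 if 2 * votes > len(trees) else 0
    return predict


def select_best_model(tree_sets, rows):
    """Prune each per-sub-dataset tree set, then return the most accurate one."""
    models = [prune_below_average(trees, rows) for trees in tree_sets]
    return max(models, key=lambda m: accuracy(majority_vote(m), rows))


# Toy data: one feature, label 1 = fraud, label 0 = non-fraud (invented values).
rows = [((0.9,), 1), ((0.8,), 1), ((0.2,), 0), ((0.1,), 0)]
tree_sets = [
    [stump(0.5), stump(-1.0), stump(0.85)],  # trees built on sub-dataset A
    [stump(0.5), stump(0.15)],               # trees built on sub-dataset B
]
best = select_best_model(tree_sets, rows)
```

In this toy run, the set for sub-dataset B is pruned down to its single perfect stump and wins the final selection. A real implementation would evaluate accuracy on held-out data rather than on the rows used for pruning.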

Findings

Experimental results show that the MRF classifier outperformed the benchmark models. The results also revealed that all the profit-related variables are among the most important indicators of FSF, and they identified two new variables related to gross profit that had not been assessed in previous studies on FSF.

Originality/value

This study proposed a method for building a classification model that shows outstanding performance and provides decision rules that can be used to explain the classification results. In addition, this paper suggested a new way to resolve the class imbalance problem.

Details

Data Technologies and Applications, vol. 54 no. 2

Type: Research Article

ISSN: 2514-9288

Article

Publication date: 14 May 2018

Recommending valuable ideas in an open innovation community: A text mining approach to information overload problem

Hanjun Lee, Keunho Choi, Donghee Yoo, Yongmoo Suh, Soowon Lee and Guijia He

Downloads

1621

Abstract

Purpose

Open innovation communities are a growing trend across diverse industries because they provide opportunities to collaborate with customers and exploit their knowledge effectively. Although open innovation communities can be strategic assets that help firms innovate, firms nonetheless face the challenge of information overload that arises from the nature of such communities. The purpose of this paper is to mitigate the problem of information overload in an open innovation environment.

Design/methodology/approach

This study chose MyStarbucksIdea.com (MSI) as a target open innovation community in which customers share their ideas. The authors analyzed a large data set collected from MSI utilizing text mining techniques including TF-IDF and sentiment analysis, while considering both term and non-term features of the data set. Those features were used to develop classification models to calculate the adoption probability of each idea.
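As an illustration of the term features mentioned above, here is a minimal TF-IDF sketch. It assumes one common weighting, term frequency times log(N / document frequency), and whitespace tokenization; the abstract does not state which variant the authors used, and the example ideas are invented.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per document using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical customer ideas, invented for illustration only.
ideas = ["free wifi in stores", "more vegan food in stores", "free refills"]
weights = tf_idf(ideas)
```

Terms that occur in few ideas (such as "wifi") receive higher weights than terms shared across many ideas (such as "in"), and a term that appears in every document gets a weight of zero.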

Findings

The results showed that both term and non-term features play important roles in predicting the adoptability of ideas and that the best classification accuracy was achieved by the hybrid classification models. In most cases, the precision of the classification models decreased as the number of recommendations increased, while their recall and F1 scores increased.
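The trade-off between precision and recall as more ideas are recommended can be seen in a small worked sketch. The ranked list below is invented (1 marks an adopted idea, ordered by a hypothetical model's predicted adoption probability), and `precision_recall_f1_at_k` is an illustrative helper, not a function from the paper.

```python
def precision_recall_f1_at_k(ranked_labels, k):
    """Score the top-k recommendations; k must be a positive integer."""
    total_adopted = sum(ranked_labels)
    hits = sum(ranked_labels[:k])  # adopted ideas among the top k
    precision = hits / k
    recall = hits / total_adopted if total_adopted else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Invented ranking: 7 of the 10 ideas were actually adopted.
ranked = [1, 1, 0, 1, 1, 0, 1, 0, 1, 1]
for k in (2, 5, 10):
    p, r, f1 = precision_recall_f1_at_k(ranked, k)
    print(f"k={k:2d}  precision={p:.2f}  recall={r:.2f}  F1={f1:.2f}")
```

With this particular list, precision falls as k grows while recall and F1 rise, which mirrors the pattern described in the findings. Other rankings can behave differently, which is consistent with the abstract's "in most cases".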

Originality/value

This research dealt with the problem of information overload in an open innovation context. A large number of customer opinions from an innovation community were examined and a recommendation system to mitigate the problem was proposed. Using the proposed system, the firm can get recommendations for ideas that could be valuable for its business innovation in the idea generation phase, thereby reducing information overload and enhancing the effectiveness of open innovation.

Details

Industrial Management & Data Systems, vol. 118 no. 4

Type: Research Article

ISSN: 0263-5577

Article

Publication date: 11 April 2016

Who creates value in a user innovation community? A case study of MyStarbucksIdea.com

Hanjun Lee and Yongmoo Suh

Downloads

1502

Abstract

Purpose

Successful open innovation requires that many ideas be posted by a number of users and that the posted ideas be evaluated to find ideas of high quality. As such, a successful open innovation community inherently faces an information overload problem. The purpose of this paper is to mitigate the information overload problem by identifying potential idea launchers, so that the firm can pay attention to their ideas.

Design/methodology/approach

This research chose MyStarbucksIdea.com as a target innovation community where users freely share their ideas and comments. We extracted basic features from idea, comment and user information and added further features obtained from sentiment analysis on ideas and comments. Those features are used to develop classification models to identify potential idea launchers, using data mining techniques such as artificial neural network, decision tree and Bayesian network.

Findings

The results show that the number of ideas posted and the number of comments posted are the most significant among the features. Most of the comment-related sentiment features were found to be meaningful in the prediction of idea launchers, while most of the idea-related sentiment features were not. In addition, this study shows classification rules for the identification of potential idea launchers.

Originality/value

This study dealt with the information overload problem in an open innovation context. A large volume of textual customer content from an innovation community was examined, and classification models to mitigate the problem were proposed using sentiment analysis and data mining techniques. Experimental results show that the proposed classification models can help the firm identify potential idea launchers for efficient business innovation.

Details

Online Information Review, vol. 40 no. 2

Type: Research Article

ISSN: 1468-4527
