Jinwook Choi, Yongmoo Suh and Namchul Jung
Abstract
Purpose
The purpose of this study is to investigate the effectiveness of qualitative information extracted from firms’ annual reports in predicting corporate credit ratings. In practice, qualitative information, represented by published reports or management interviews, is known to be an important source for assigning corporate credit ratings, alongside quantitative information represented by financial values. Nevertheless, prior studies have rarely employed qualitative information in developing prediction models for corporate credit ratings, which leaves room for further research.
Design/methodology/approach
This study adopted three document vectorization methods, Bag-Of-Words (BOW), Word to Vector (Word2Vec) and Document to Vector (Doc2Vec), to transform unstructured textual data into numeric vectors that Machine Learning (ML) algorithms can accept as input. For the experiments, we used a corpus of the Management’s Discussion and Analysis (MD&A) sections of 10-K financial reports, together with financial variables and corporate credit rating data.
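As an illustration of the simplest of the three methods, the following is a minimal BOW sketch using only the Python standard library. It is not the study's implementation: the tokenizer, the function names (`tokenize`, `build_vocabulary`, `bow_vector`), the two toy sentences and the financial values are all assumptions made for the example.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def build_vocabulary(documents):
    """Map each distinct token in the corpus to a column index (sorted for stability)."""
    tokens = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(tokens)}

def bow_vector(document, vocabulary):
    """Return the term-count vector of one document over the vocabulary."""
    counts = Counter(tokenize(document))
    vector = [0] * len(vocabulary)
    for tok, n in counts.items():
        if tok in vocabulary:
            vector[vocabulary[tok]] = n
    return vector

# Toy stand-ins for MD&A text; not taken from the study's corpus.
docs = [
    "Revenue increased and liquidity improved",
    "Revenue decreased and debt increased",
]
vocab = build_vocabulary(docs)
text_vectors = [bow_vector(d, vocab) for d in docs]

# Hypothetical financial variables (two made-up ratios per firm),
# concatenated with the text vector to form one input row per firm.
financial = [[0.42, 1.8], [0.67, 0.9]]
combined = [f + v for f, v in zip(financial, text_vectors)]
```

Each row of `combined` holds the financial variables followed by the word counts, which is one plausible way to feed both kinds of information to a classifier.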
Findings
Experimental results from a series of multi-class classification experiments show that the predictive models trained on both financial variables and vectors extracted from MD&A data outperform the benchmark models trained only on traditional financial variables.
Originality/value
This study proposed a new approach for corporate credit rating prediction by using qualitative information extracted from MD&A documents as an input to ML-based prediction models. Also, this research adopted and compared three textual vectorization methods in the domain of corporate credit rating prediction and showed that BOW mostly outperformed Word2Vec and Doc2Vec.
Abstract
Purpose
Financial statement fraud (FSF) committed by a company suggests that the company may not be in a healthy state. It is therefore important to detect FSF, since such companies tend to conceal bad information, which causes great losses to various stakeholders. The objective of the paper is to propose a novel approach to building a classification model for identifying FSF, one that shows high classification performance and from which human-readable rules can be extracted to explain why a company is likely to commit FSF.
Design/methodology/approach
Having prepared multiple sub-datasets to cope with the class imbalance problem, we build a set of decision trees for each sub-dataset; form a model for that sub-dataset by removing every tree whose accuracy is below the average accuracy of all trees in the set; and then select the model that shows the best accuracy among the resulting models. We call the resulting model MRF (Modified Random Forest). Given a new instance, we extract rules from the MRF model to explain whether the company corresponding to the new instance is likely to commit FSF or not.
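The pruning and selection steps described above can be sketched as follows. This is a hedged illustration, not the authors' code: tree training and rule extraction are omitted, the per-tree accuracies are made-up numbers, and the names `prune_trees`, `select_mrf` and `stand_in_accuracy` are hypothetical. How the pruned model's accuracy is measured is not stated in the abstract, so it is passed in as a callable.

```python
from statistics import mean

def prune_trees(tree_accuracies):
    """Keep the indices of trees whose accuracy is at least the set's average."""
    avg = mean(tree_accuracies)
    return [i for i, acc in enumerate(tree_accuracies) if acc >= avg]

def select_mrf(accuracies_by_subdataset, model_accuracy):
    """Prune the trees of every sub-dataset, then return the best pruned model.

    accuracies_by_subdataset: dict of sub-dataset name -> list of per-tree accuracies.
    model_accuracy: callable (name, kept tree indices) -> accuracy of the pruned model.
    """
    models = {name: prune_trees(accs)
              for name, accs in accuracies_by_subdataset.items()}
    best = max(models, key=lambda name: model_accuracy(name, models[name]))
    return best, models[best]

# Made-up per-tree accuracies for two hypothetical sub-datasets.
accuracies = {
    "sub1": [0.70, 0.82, 0.90],
    "sub2": [0.60, 0.85, 0.95, 0.78],
}

def stand_in_accuracy(name, kept):
    """Assumption: score a pruned model by the mean accuracy of its kept trees.
    A real evaluation would score the ensemble's votes on held-out data."""
    return mean(accuracies[name][i] for i in kept)

best_name, kept_trees = select_mrf(accuracies, stand_in_accuracy)
```

With these toy numbers, each sub-dataset keeps its two strongest trees, and the pruned model of `sub2` is selected because its stand-in score is higher.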
Findings
Experimental results show that the MRF classifier outperformed the benchmark models. The results also revealed that all the profit-related variables are among the most important indicators of FSF, and identified two new variables related to gross profit that had been overlooked in previous studies on FSF.
Originality/value
This study proposed a method of building a classification model that shows outstanding performance and provides decision rules that can be used to explain the classification results. In addition, this paper suggested a new way to resolve the class imbalance problem.
Hanjun Lee, Keunho Choi, Donghee Yoo, Yongmoo Suh, Soowon Lee and Guijia He
Abstract
Purpose
Open innovation communities are a growing trend across diverse industries because they provide opportunities to collaborate with customers and exploit their knowledge effectively. Although open innovation communities can be strategic assets that help firms innovate, firms nonetheless face the challenge of information overload that arises from the nature of such communities. The purpose of this paper is to mitigate the problem of information overload in an open innovation environment.
Design/methodology/approach
This study chose MyStarbucksIdea.com (MSI) as a target open innovation community in which customers share their ideas. The authors analyzed a large data set collected from MSI utilizing text mining techniques including TF-IDF and sentiment analysis, while considering both term and non-term features of the data set. Those features were used to develop classification models to calculate the adoption probability of each idea.
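To make the term features concrete, here is a minimal TF-IDF sketch in plain Python. It uses one common variant (relative term frequency times the natural log of inverse document frequency); the abstract does not say which weighting the authors used, and the function name `tf_idf` and the three toy ideas are assumptions.

```python
import math
from collections import Counter

def tf_idf(tokenized_docs):
    """Return one {term: weight} dict per document.

    weight = (count of term in doc / doc length) * ln(N / document frequency)
    """
    n_docs = len(tokenized_docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    weights = []
    for doc in tokenized_docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Toy ideas, already tokenized; not taken from the MSI data set.
ideas = [
    ["free", "wifi", "in", "stores"],
    ["more", "vegan", "food", "in", "stores"],
    ["free", "refill"],
]
weights = tf_idf(ideas)
```

Terms that occur in few ideas, such as `wifi`, receive higher weights than terms shared across ideas, such as `in`. Non-term features (for example, counts derived from the idea's metadata) would be appended to these weights before training a classifier.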
Findings
The results showed that term and non-term features play important roles in predicting the adoptability of ideas and the best classification accuracy was achieved by the hybrid classification models. In most cases, the precisions of classification models decreased as the number of recommendations increased, while the models’ recalls and F1s increased.
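The trade-off between precision and recall as the number of recommendations grows can be seen in a small worked example. The sketch below assumes ideas are ranked by predicted adoption probability and that the top k are recommended; the function name `precision_recall_f1_at_k` and the label sequence are hypothetical toy values, not results from the study.

```python
def precision_recall_f1_at_k(ranked_labels, k):
    """Compute precision, recall and F1 for the top-k recommendations.

    ranked_labels: 1 (adopted) or 0 (not adopted), sorted by predicted
    adoption probability, highest first.
    """
    hits = sum(ranked_labels[:k])
    total_adopted = sum(ranked_labels)
    precision = hits / k if k else 0.0
    recall = hits / total_adopted if total_adopted else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Toy ranking of ten ideas, six of which were actually adopted.
ranked = [1, 1, 0, 1, 0, 1, 1, 0, 1, 0]
metrics = {k: precision_recall_f1_at_k(ranked, k) for k in (2, 5, 10)}
```

On this toy ranking, recommending more ideas dilutes the top of the list, so precision falls from 1.0 at k=2 to 0.6 at k=10, while recall rises from one third to 1.0 because more of the adopted ideas are covered.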
Originality/value
This research dealt with the problem of information overload in an open innovation context. A large number of customer opinions from an innovation community were examined, and a recommendation system to mitigate the problem was proposed. Using the proposed system, the firm can get recommendations for ideas that could be valuable for its business innovation in the idea generation phase, thereby reducing information overload and enhancing the effectiveness of open innovation.
Abstract
Purpose
Successful open innovation requires that many ideas be posted by a number of users and that the posted ideas be evaluated to find ideas of high quality. As such, a successful open innovation community inherently has an information overload problem. The purpose of this paper is to mitigate the information overload problem by identifying potential idea launchers, so that the firm can pay attention to their ideas.
Design/methodology/approach
This research chose MyStarbucksIdea.com as a target innovation community where users freely share their ideas and comments. We extracted basic features from idea, comment and user information and added further features obtained from sentiment analysis on ideas and comments. Those features are used to develop classification models to identify potential idea launchers, using data mining techniques such as artificial neural network, decision tree and Bayesian network.
Findings
The results show that the number of ideas posted and the number of comments posted are the most significant among the features. Most of the comment-related sentiment features were found to be meaningful in the prediction of idea launchers, while most of the idea-related sentiment features were not. In addition, this study shows classification rules for the identification of potential idea launchers.
Originality/value
This study dealt with the information overload problem in an open innovation context. A large volume of textual customer content from an innovation community was examined, and classification models to mitigate the problem were proposed using sentiment analysis and data mining techniques. Experimental results show that the proposed classification models can help the firm identify potential idea launchers for its efficient business innovation.