Shrawan Kumar Trivedi, Shubhamoy Dey and Anil Kumar
Abstract
Purpose
Sentiment analysis and opinion mining are emerging areas of research for analyzing Web data and capturing users’ sentiments. This research aims to present sentiment analysis of an Indian movie review corpus using natural language processing and various machine learning classifiers.
Design/methodology/approach
In this paper, a comparative study of three machine learning classifiers (Bayesian, naïve Bayesian and support vector machine [SVM]) was performed. All the classifiers were trained on words/features extracted from the corpus using five different feature selection algorithms (Chi-square, info-gain, gain ratio, one-R and relief-F [RF] attributes), and these feature selection approaches were likewise compared. The classifiers and feature selection approaches were evaluated using different metrics (F-value, false-positive [FP] rate and training time).
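As a rough illustration of this kind of evaluation (a sketch with stand-in components, not the paper's exact setup), the scikit-learn snippet below compares a naïve Bayes classifier and an SVM on features chosen by two selectors, chi-square and mutual information (an info-gain analogue), reporting F-value, FP rate and training time on synthetic data:

```python
# Minimal sketch of a feature-selection / classifier comparison of the kind
# described above. Chi-square and mutual information stand in for the five
# selectors; the dataset is synthetic, not the Indian movie review corpus.
import time
import numpy as np
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, chi2, mutual_info_classif
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score, confusion_matrix

X, y = make_classification(n_samples=2000, n_features=500, n_informative=40,
                           random_state=0)
X = np.abs(X)  # chi2 and MultinomialNB require non-negative features

selectors = {"chi-square": chi2, "mutual-info": mutual_info_classif}
classifiers = {"naive Bayes": MultinomialNB(), "SVM": LinearSVC()}

for sel_name, score_fn in selectors.items():
    X_sel = SelectKBest(score_fn, k=100).fit_transform(X, y)
    X_tr, X_te, y_tr, y_te = train_test_split(X_sel, y, random_state=0)
    for clf_name, clf in classifiers.items():
        start = time.time()
        clf.fit(X_tr, y_tr)          # training time is one evaluation metric
        elapsed = time.time() - start
        pred = clf.predict(X_te)
        tn, fp, fn, tp = confusion_matrix(y_te, pred).ravel()
        print(f"{sel_name:12s} {clf_name:12s} "
              f"F1={f1_score(y_te, pred):.3f} "
              f"FP-rate={fp / (fp + tn):.3f} train={elapsed:.2f}s")
```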
Findings
The results of this study show that, for the maximum number of features, the RF feature selection approach was found to be the best, with better F-values, a low FP rate and less time needed to train the classifiers, whereas for the least number of features, one-R was better than RF. When the evaluation was performed for machine learning classifiers, SVM was found to be superior, although the Bayesian classifier was comparable with SVM.
Originality/value
This is a novel research where Indian review data were collected and then a classification model for sentiment polarity (positive/negative) was constructed.
Mohd Mustaqeem, Suhel Mustajab and Mahfooz Alam
Abstract
Purpose
Software defect prediction (SDP) is a critical aspect of software quality assurance, aiming to identify and manage potential defects in software systems. In this paper, we have proposed a novel hybrid approach that combines Grey Wolf Optimization with Feature Selection (GWOFS) and a multilayer perceptron (MLP) for SDP. The GWOFS-MLP hybrid model is designed to optimize feature selection, ultimately enhancing the accuracy and efficiency of SDP. Grey Wolf Optimization, inspired by the social hierarchy and hunting behavior of grey wolves, is employed to select a subset of relevant features from an extensive pool of potential predictors. This study investigates the key challenges that traditional SDP approaches encounter and proposes promising solutions to overcome time complexity and the curse of dimensionality.
Design/methodology/approach
The integration of GWOFS and MLP results in a robust hybrid model that can adapt to diverse software datasets. This feature selection process harnesses the cooperative hunting behavior of wolves, allowing for the exploration of critical feature combinations. The selected features are then fed into an MLP, a powerful artificial neural network (ANN) known for its capability to learn intricate patterns within software metrics. MLP serves as the predictive engine, utilizing the curated feature set to model and classify software defects accurately.
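A much-simplified sketch of the core idea, binary grey wolf optimisation wrapped around an MLP, is given below. It is illustrative only: the dataset, the sigmoid transfer function, the size penalty and all hyperparameters are assumptions, not the authors' GWOFS-MLP implementation.

```python
# Toy binary grey wolf optimiser for feature selection, with an MLP's
# cross-validated accuracy as the fitness signal. All settings illustrative.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = load_breast_cancer(return_X_y=True)
n_feats, n_wolves, n_iters = X.shape[1], 6, 10

def fitness(mask):
    """CV accuracy of an MLP on the selected features, minus a size penalty."""
    if mask.sum() == 0:
        return 0.0
    clf = MLPClassifier(hidden_layer_sizes=(16,), max_iter=300, random_state=0)
    acc = cross_val_score(clf, X[:, mask], y, cv=3).mean()
    return acc - 0.01 * mask.mean()

def binarise(pos):
    """Sigmoid transfer function turns continuous positions into 0/1 masks."""
    return rng.random(pos.shape) < 1 / (1 + np.exp(-10 * (pos - 0.5)))

pos = rng.random((n_wolves, n_feats))            # wolves live in [0, 1]^n_feats
best_mask, best_score = None, -np.inf

for t in range(n_iters):
    masks = binarise(pos)
    scores = np.array([fitness(m) for m in masks])
    if scores.max() > best_score:
        best_score, best_mask = scores.max(), masks[scores.argmax()]
    leaders = pos[np.argsort(scores)[::-1][:3]]  # alpha, beta, delta wolves
    a = 2 - 2 * t / n_iters                      # decreases linearly 2 -> 0
    for i in range(n_wolves):
        moves = []
        for leader in leaders:                   # encircling each leader
            A = 2 * a * rng.random(n_feats) - a
            C = 2 * rng.random(n_feats)
            moves.append(leader - A * np.abs(C * leader - pos[i]))
        pos[i] = np.clip(np.mean(moves, axis=0), 0, 1)

print(f"best fitness {best_score:.3f} with {best_mask.sum()} of {n_feats} features")
```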
Findings
The performance evaluation of the GWOFS-MLP hybrid model on a real-world software defect dataset demonstrates its effectiveness. The model achieves a remarkable training accuracy of 97.69% and a testing accuracy of 97.99%. Additionally, the receiver operating characteristic area under the curve (ROC-AUC) score of 0.89 highlights the model’s ability to discriminate between defective and defect-free software components.
Originality/value
Experimental implementations using machine learning-based techniques with feature reduction are conducted to validate the proposed solutions. The goal is to enhance SDP’s accuracy, relevance and efficiency, ultimately improving software quality assurance processes. The confusion matrix further illustrates the model’s performance, with only a small number of false positives and false negatives.
Qingqing Li, Ziming Zeng, Shouqiang Sun and Tingting Li
Abstract
Purpose
Aspect category-based sentiment analysis (ACSA) has been widely used in consumer preference mining and marketing strategy formulation. However, existing studies ignore the variability in features and the intrinsic correlation among diverse aspect categories in ACSA tasks. To address these problems, this paper aims to propose a novel integrated framework.
Design/methodology/approach
The integrated framework consists of three modules: text feature extraction and fusion, adaptive feature selection and category-aware decision fusion. First, text features from global and local views are extracted and fused to comprehensively capture the potential information in the different dimensions of the review text. Then, an adaptive feature selection strategy is devised for each aspect category to determine the optimal feature set. Finally, considering the intrinsic associations between aspect categories, a category-aware decision fusion strategy is constructed to enhance the performance of ACSA tasks.
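A minimal sketch of the adaptive feature selection step is shown below: each aspect category receives its own feature-set size k, chosen by validation F1. The synthetic multi-label data, chi-square scoring and logistic regression classifier are placeholders, not the paper's framework, and the fusion modules are omitted.

```python
# Per-category adaptive feature selection: pick a separate k per aspect
# category by validation F1. Data and components are illustrative only.
import numpy as np
from sklearn.datasets import make_multilabel_classification
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

X, Y = make_multilabel_classification(n_samples=1500, n_features=300,
                                      n_classes=5, random_state=0)
X_tr, X_va, Y_tr, Y_va = train_test_split(X, Y, random_state=0)

for cat in range(Y.shape[1]):            # one detector per aspect category
    best_k, best_f1 = None, -1.0
    for k in (25, 50, 100, 200):         # adaptive choice of feature count
        sel = SelectKBest(chi2, k=k).fit(X_tr, Y_tr[:, cat])
        clf = LogisticRegression(max_iter=1000).fit(sel.transform(X_tr),
                                                    Y_tr[:, cat])
        f1 = f1_score(Y_va[:, cat], clf.predict(sel.transform(X_va)))
        if f1 > best_f1:
            best_k, best_f1 = k, f1
    print(f"category {cat}: best k={best_k}, validation F1={best_f1:.3f}")
```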
Findings
Comparative experimental results demonstrate that the integrated framework can effectively detect aspect categories and their corresponding sentiment polarities from review texts, achieving a macroaveraged F1 score (Fmacro) of 72.38% and a weighted F1 score (F1) of 79.39%, with absolute gains of 2.93% to 27.36% and 4.35% to 20.36%, respectively, compared to the baselines.
Originality/value
This framework can simultaneously detect aspect categories and corresponding sentiment polarities from review texts, thereby assisting e-commerce enterprises in gaining insights into consumer preferences, prioritizing product improvements, and adjusting marketing strategies.
Faris Elghaish, Sandra Matarneh, Essam Abdellatef, Farzad Rahimian, M. Reza Hosseini and Ahmed Farouk Kineber
Abstract
Purpose
Cracks are prevalent signs of pavement distress found on highways globally. The use of artificial intelligence (AI) and deep learning (DL) for crack detection is increasingly considered as an optimal solution. Consequently, this paper introduces a novel, fully connected, optimised convolutional neural network (CNN) model using feature selection algorithms for the purpose of detecting cracks in highway pavements.
Design/methodology/approach
To enhance the accuracy of the CNN model for crack detection, the authors employed a fully connected deep learning layers CNN model along with several optimisation techniques. Specifically, three optimisation algorithms, namely adaptive moment estimation (ADAM), stochastic gradient descent with momentum (SGDM), and RMSProp, were utilised to fine-tune the CNN model and enhance its overall performance. Subsequently, the authors implemented eight feature selection algorithms to further improve the accuracy of the optimised CNN model. These feature selection techniques were thoughtfully selected and systematically applied to identify the most relevant features contributing to crack detection in the given dataset. Finally, the authors subjected the proposed model to testing against seven pre-trained models.
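The optimiser comparison can be illustrated with the hedged Keras sketch below; the toy architecture and random stand-in images are assumptions, not the paper's five-layer pavement-crack model.

```python
# Comparing ADAM, SGDM and RMSProp on a small CNN. Architecture and data
# (random stand-in "images") are illustrative only.
import numpy as np
import tensorflow as tf

X = np.random.rand(200, 64, 64, 3).astype("float32")  # stand-in images
y = np.random.randint(0, 2, 200)                       # crack / no-crack labels

def build_cnn():
    return tf.keras.Sequential([
        tf.keras.Input(shape=(64, 64, 3)),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.MaxPooling2D(),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])

optimisers = {
    "ADAM": tf.keras.optimizers.Adam(),
    "SGDM": tf.keras.optimizers.SGD(momentum=0.9),  # SGD with momentum
    "RMSProp": tf.keras.optimizers.RMSprop(),
}
for name, opt in optimisers.items():
    model = build_cnn()
    model.compile(optimizer=opt, loss="binary_crossentropy",
                  metrics=["accuracy"])
    hist = model.fit(X, y, epochs=3, validation_split=0.2, verbose=0)
    print(f"{name}: val accuracy={hist.history['val_accuracy'][-1]:.3f}")
```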
Findings
The study's results show that the accuracy of the CNN model with five deep learning layers is 97.4%, 98.2% and 96.09% for the three optimisers (ADAM, SGDM and RMSProp), respectively. Following this, eight feature selection algorithms were applied to the five-layer model to enhance accuracy, with particle swarm optimisation (PSO) achieving the highest F-score at 98.72. The model was then compared with other pre-trained models and exhibited the highest performance.
Practical implications
With an achieved precision of 98.19% and F-score of 98.72% using PSO, the developed model is highly accurate and effective in detecting and evaluating the condition of cracks in pavements. As a result, the model has the potential to significantly reduce the effort required for crack detection and evaluation.
Originality/value
The proposed method for enhancing CNN model accuracy in crack detection stands out for its unique combination of optimisation algorithms (ADAM, SGDM and RMSProp) with the systematic application of multiple feature selection techniques to identify relevant crack detection features, and for its benchmarking of the results against existing pre-trained models.
Jonathan S. Greipel, Regina M. Frank, Meike Huber, Ansgar Steland and Robert H. Schmitt
Abstract
Purpose
To ensure product quality within a manufacturing process, inspection processes are indispensable. One task of inspection planning is the selection of inspection characteristics. To optimize costs and benefits, key characteristics can be defined by which the product quality can be checked with sufficient accuracy. The manual selection of key characteristics requires substantial planning effort and becomes uneconomical when many product variants exist. This paper, therefore, aims to present a method for the efficient determination of key characteristics.
Design/methodology/approach
The authors present a novel Algorithm for the Selection of Key Characteristics (ASKC) based on an auto-encoder and a risk analysis. Given historical measurement data and tolerances, the algorithm clusters characteristics with redundant information and selects key characteristics based on a risk assessment. The authors compare ASKC with the Principal Feature Analysis (PFA) algorithm using artificial and historical measurement data.
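As a simplified stand-in for the clustering step (not the authors' ASKC: no auto-encoder and no risk analysis), the sketch below groups characteristics whose historical measurements are highly correlated and keeps one representative per group:

```python
# Redundancy-based key-characteristic selection via correlation clustering.
# Simplified stand-in for ASKC; data are synthetic measurements.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.default_rng(0)
base = rng.normal(size=(500, 4))                 # 4 latent process factors
mixing = rng.normal(size=(4, 12))
X = base @ mixing + 0.1 * rng.normal(size=(500, 12))  # 12 measured characteristics

corr_dist = 1 - np.abs(np.corrcoef(X, rowvar=False))  # distance: 1 - |corr|
np.fill_diagonal(corr_dist, 0.0)
Z = linkage(squareform(corr_dist, checks=False), method="average")
labels = fcluster(Z, t=0.3, criterion="distance")     # cut the dendrogram

for cluster in np.unique(labels):
    members = np.flatnonzero(labels == cluster)
    rep = members[X[:, members].var(axis=0).argmax()]  # keep one representative
    print(f"cluster {cluster}: characteristics {members.tolist()}, keep #{rep}")
```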
Findings
The authors find that ASKC delivers superior results to PFA. Findings show that the algorithms enable the cost-efficient selection of key characteristics while maintaining the informative value of the inspection with respect to quality.
Originality/value
This paper fills an identified gap in simplified inspection planning by providing a method for the efficient selection of key characteristics via ASKC.
Hendrik Kohrs, Benjamin Rainer Auer and Frank Schuhmacher
Abstract
Purpose
In short-term forecasting of day-ahead electricity prices, incorporating intraday dependencies is vital for accurate predictions. However, it quickly leads to dimensionality problems, i.e. ill-defined models with too many parameters, which require an adequate remedy. This study addresses this issue.
Design/methodology/approach
In an application for the German/Austrian market, this study derives variable importance scores from a random forest algorithm, feeds the identified variables into a support vector machine and compares the resulting forecasting technique to other approaches (such as dynamic factor models, penalized regressions or Bayesian shrinkage) that are commonly used to resolve dimensionality problems.
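A minimal sketch of the two-step idea, random forest importances feeding a support vector machine, is shown below on synthetic hourly "prices"; the lag structure and sizes are illustrative assumptions, not the study's German/Austrian market setup.

```python
# Derive variable importances from a random forest, then feed only the most
# important lags into an SVM forecaster. Data and lags are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
prices = np.sin(np.arange(3000) * 2 * np.pi / 24) + 0.3 * rng.normal(size=3000)

lags = np.arange(1, 169)                       # up to one week of hourly lags
X = np.column_stack([np.roll(prices, l) for l in lags])[168:]
y = prices[168:]
X_tr, X_te, y_tr, y_te = X[:-500], X[-500:], y[:-500], y[-500:]

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
top = np.argsort(rf.feature_importances_)[::-1][:10]   # importance profile
print("most informative lags (hours):", lags[top].tolist())

svm = SVR().fit(X_tr[:, top], y_tr)
mae = mean_absolute_error(y_te, svm.predict(X_te[:, top]))
print(f"SVM MAE on top-10 lags: {mae:.3f}")
```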
Findings
This study develops full importance profiles stating which hours of which past days have the highest predictive power for specific hours in the future. Using the profile information in the forecasting setup leads to very promising results compared to the alternatives. Furthermore, the importance profiles provide a possible explanation why some forecasting methods are more accurate for certain hours of the day than others. They also help to explain why simple forecast combination schemes tend to outperform the full battery of models considered in the comprehensive comparative study.
Originality/value
With the information contained in the variable importance scores and the results of the extensive model comparison, this study essentially provides guidelines for variable and model selection in future electricity market research.
Yongzheng Zhang, Evangelos Milios and Nur Zincir‐Heywood
Abstract
Purpose
Summarization of an entire web site with diverse content may lead to a summary heavily biased towards the site's dominant topics. The purpose of this paper is to present a novel topic‐based framework to address this problem.
Design/methodology/approach
A two‐stage framework is proposed. The first stage identifies the main topics covered in a web site via clustering and the second stage summarizes each topic separately. The proposed system is evaluated by a user study and compared with the single‐topic summarization approach.
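A rough sketch of the two-stage idea follows: k-means over TF-IDF vectors identifies topics, then the most central page per topic is extracted (standing in for the paper's key-phrase and key-sentence extraction). The toy documents and component choices are assumptions, not the paper's system.

```python
# Stage 1: cluster pages into topics; stage 2: summarize each topic by its
# most centroid-similar page. Toy stand-in for the proposed framework.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

pages = [
    "Our research group studies machine learning and data mining.",
    "Graduate admissions open in fall; apply with transcripts.",
    "New paper on neural networks accepted at the conference.",
    "Tuition fees and scholarship deadlines for applicants.",
]

vec = TfidfVectorizer(stop_words="english")
X = vec.fit_transform(pages)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

for c in range(2):                                    # per-topic summary
    idx = np.flatnonzero(km.labels_ == c)
    sims = (X[idx] @ km.cluster_centers_[c]).ravel()  # cosine-like similarity
    print(f"topic {c} key page: {pages[idx[sims.argmax()]]!r}")
```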
Findings
The user study demonstrates that the clustering‐summarization approach statistically significantly outperforms the plain summarization approach in the multi‐topic web site summarization task. Text‐based clustering based on selecting features with high variance over web pages is reliable; outgoing links are useful if a rich set of cross links is available.
Research limitations/implications
More sophisticated clustering methods than those used in this study are worth investigating. The proposed method should be tested on web content that is less structured than organizational web sites, for example blogs.
Practical implications
The proposed summarization framework can be applied to the effective organization of search engine results and faceted or topical browsing of large web sites.
Originality/value
Several key components are integrated for web site summarization for the first time, including feature selection, link analysis, and key phrase and key sentence extraction. Insight was gained into the respective contributions of links and content to topic‐based summarization. A classification approach is used to minimize the number of parameters.
Abstract
Purpose
The purpose of this study is to demonstrate that the variation between the set torque and the actual torque at which the actuator trips can be minimized using Taguchi's robust engineering methodology. The paper also aims to demonstrate the application of a feature selection approach for the identification of insignificant effects in unreplicated fractional factorial experiments.
Design/methodology/approach
The methodology used was design of experiments, with the set torque as the signal factor and the tripping torque as the response variable. The compounded noise factor was identified based on the type of operations and load variation, which are not under the manufacturer's control. The effects of five control factors (with two levels each) and two interactions were studied. The experiments were designed using an L8 orthogonal array.
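For readers unfamiliar with dynamic Taguchi characteristics, the sketch below computes the signal-to-noise ratio for one factor combination by fitting tripping torque = beta * set torque through the origin and taking S/N = 10 log10(beta^2 / MSE); the torque values are made up for illustration and are not the study's data.

```python
# Dynamic Taguchi S/N ratio for one run of the experiment: higher S/N means
# less variation between set torque and actual tripping torque. Toy numbers.
import numpy as np

set_torque = np.array([10.0, 20.0, 30.0, 10.0, 20.0, 30.0])   # signal levels M
trip_torque = np.array([10.4, 19.5, 30.9, 9.8, 20.6, 29.2])   # responses y

beta = (set_torque @ trip_torque) / (set_torque @ set_torque)  # slope via origin
mse = np.mean((trip_torque - beta * set_torque) ** 2)
sn = 10 * np.log10(beta**2 / mse)
print(f"beta={beta:.3f}, S/N={sn:.1f} dB")
```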
Findings
The result showed that the factors spring height, spring thickness, star washer position and the interaction between drive shaft length and spring height play a significant role in actuator performance. The implementation of the optimum combination of factors resulted in improving the overall capability indices, Cp from 0.52 to 2.12 and Cpk from 0.4 to 1.67.
Practical implications
This study provides valuable information to actuator manufacturers on optimizing actuator performance.
Originality/value
To the best of the author's knowledge, no study has been conducted using Taguchi's robust engineering methodology to optimize actuator performance. In addition, no attempt has been made in the past to identify insignificant factors and interactions using a feature selection approach for unreplicated fractional factorial experiments.
Tian Han, Bo‐Suk Yang and Zhong‐Jun Yin
Abstract
Purpose
The purpose of this paper is to demonstrate the efficiency of vibration signals for a fault diagnosis system for induction motors.
Design/methodology/approach
A fault diagnosis system for induction motors using vibration signals is designed based on pattern recognition. A genetic algorithm is used for feature reduction and neural network tuning.
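A toy sketch of the GA-plus-neural-network combination is given below: a genetic algorithm searches over binary feature masks, with an MLP's cross-validated accuracy as the fitness. The dataset, GA settings and fitness function are illustrative assumptions, not the authors' configuration.

```python
# Genetic algorithm for feature selection wrapped around an MLP classifier.
# Illustrative settings only; vibration features are stood in by digit pixels.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X, y = load_digits(return_X_y=True)
n_feats, pop_size, n_gens = X.shape[1], 10, 5

def fitness(mask):
    if mask.sum() == 0:
        return 0.0
    clf = MLPClassifier(hidden_layer_sizes=(32,), max_iter=200, random_state=0)
    return cross_val_score(clf, X[:, mask.astype(bool)], y, cv=3).mean()

pop = rng.integers(0, 2, size=(pop_size, n_feats))
for gen in range(n_gens):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[::-1][: pop_size // 2]]  # truncation selection
    children = []
    for _ in range(pop_size - len(parents)):
        a, b = parents[rng.integers(len(parents), size=2)]
        cut = rng.integers(1, n_feats)               # one-point crossover
        child = np.concatenate([a[:cut], b[cut:]])
        flip = rng.random(n_feats) < 0.02            # bit-flip mutation
        children.append(np.where(flip, 1 - child, child))
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print(f"selected {best.sum()} of {n_feats} features")
```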
Findings
The use of the genetic algorithm improves system performance by selecting significant features and optimizing the network structure. The efficiency of vibration signals is demonstrated.
Practical implications
Condition monitoring and fault diagnosis for induction motors are key parts of industrial maintenance, as motor faults often bring an entire production line to a standstill. In this paper, a fault diagnosis system for induction motors is proposed, based on pattern recognition and combining feature extraction, genetic algorithm and neural network techniques. The paper presents the complete procedure of a feature‐based fault diagnosis system and demonstrates the efficiency of the GA and of vibration signals for motor fault diagnosis. A real test was performed to validate the system's performance; the results indicate that the system is promising for real industrial application.
Originality/value
The use of a genetic algorithm for feature selection and neural network tuning, and the choice of vibration analysis for the fault diagnosis of induction motors.
Wenzhong Gao, Xingzong Huang, Mengya Lin, Jing Jia and Zhen Tian
Abstract
Purpose
The purpose of this paper is to design a short-term load prediction framework that can accurately predict the cooling load of office buildings.
Design/methodology/approach
A feature selection scheme and a stacking ensemble model were proposed to fulfill the cooling load prediction task. Firstly, abnormal data were identified by a data density estimation algorithm. Secondly, the crucial input features were clarified from three aspects (i.e. historical load information, time information and meteorological information). Thirdly, a stacking ensemble model combining a long short-term memory (LSTM) network and a light gradient boosting machine (LightGBM) was utilized to predict the cooling load. Finally, the performance of the proposed framework was verified by predicting the cooling load of office buildings using several evaluation indicators.
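A hedged sketch of the stacking step is shown below: an LSTM and a LightGBM model each predict the load, and a ridge meta-learner combines their outputs. The synthetic hourly loads, window length and architectures are assumptions, not the paper's configuration.

```python
# Stacking an LSTM and a LightGBM regressor with a ridge meta-learner.
# Data are synthetic hourly cooling "loads"; sizes are illustrative.
import numpy as np
import tensorflow as tf
import lightgbm as lgb
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(0)
load = np.sin(np.arange(2000) * 2 * np.pi / 24) + 0.2 * rng.normal(size=2000)
win = 24                                       # one day of history as input
X = np.stack([load[i:i + win] for i in range(len(load) - win)])
y = load[win:]
X_tr, X_te, y_tr, y_te = X[:-300], X[-300:], y[:-300], y[-300:]

lstm = tf.keras.Sequential([
    tf.keras.Input(shape=(win, 1)),
    tf.keras.layers.LSTM(32),
    tf.keras.layers.Dense(1),
])
lstm.compile(optimizer="adam", loss="mse")
lstm.fit(X_tr[..., None], y_tr, epochs=5, verbose=0)

gbm = lgb.LGBMRegressor(n_estimators=200).fit(X_tr, y_tr)

# Stack the two base predictions with a ridge meta-learner (in-sample here
# for brevity; out-of-fold predictions would be used in practice).
base_tr = np.column_stack([lstm.predict(X_tr[..., None], verbose=0).ravel(),
                           gbm.predict(X_tr)])
base_te = np.column_stack([lstm.predict(X_te[..., None], verbose=0).ravel(),
                           gbm.predict(X_te)])
meta = Ridge().fit(base_tr, y_tr)
print(f"stacked MAE: {mean_absolute_error(y_te, meta.predict(base_te)):.3f}")
```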
Findings
The identified input features improve the prediction performance. The prediction accuracy of the proposed model is superior to that of existing models. The stacking ensemble model is robust to weather forecasting errors.
Originality/value
The stacking ensemble model was used to fulfill the cooling load prediction task, overcoming the shortcomings of single deep learning models. The input features of the model, which receive little attention in most studies, are treated as an important step in this paper.