Abstract
Purpose
The purpose of this paper is to enhance the performance of spammer identification in online social networks. Hyperparameter tuning has been performed by researchers in the past to enhance the performance of classifiers. The AdaBoost algorithm belongs to a class of ensemble classifiers and is widely applied in binary classification problems. A single algorithm may not yield accurate results, but an ensemble of classifiers built from multiple models has been successfully applied to solve many classification tasks. The search space for an optimal set of parameter values is vast, so enumerating all possible combinations is not feasible. Hence, a hybrid modified whale optimization algorithm for spam profile detection (MWOA-SPD) model is proposed to find optimal values for these parameters.
Design/methodology/approach
In this work, the hyperparameters of AdaBoost are fine-tuned for the task of identifying spammers in social networks. The AdaBoost algorithm linearly combines several weak classifiers to produce a stronger one. The proposed MWOA-SPD model hybridizes the whale optimization algorithm and the salp swarm algorithm.
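As a rough illustration of optimizer-driven hyperparameter tuning, the sketch below searches AdaBoost's n_estimators and learning_rate with a plain whale-optimization-style loop on synthetic data. This is a minimal sketch, not the authors' MWOA-SPD hybrid: the salp swarm component, the modified update rules and the real Twitter features are omitted, and the population size, bounds and iteration count are arbitrary assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score

# Toy stand-in for a labeled spammer/legitimate profile feature matrix.
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

LOW = np.array([10.0, 0.01])    # lower bounds: (n_estimators, learning_rate)
HIGH = np.array([200.0, 2.0])   # upper bounds

def fitness(pos):
    """Cross-validated accuracy of AdaBoost at a candidate parameter point."""
    clf = AdaBoostClassifier(n_estimators=int(round(pos[0])),
                             learning_rate=float(pos[1]), random_state=0)
    return cross_val_score(clf, X, y, cv=3).mean()

rng = np.random.default_rng(0)
whales = rng.uniform(LOW, HIGH, size=(8, 2))        # candidate parameter vectors
best = max(whales, key=fitness).copy()
best_score = fitness(best)

T = 15
for t in range(T):
    a = 2 - 2 * t / T                               # control parameter: 2 -> 0
    for i in range(len(whales)):
        A = 2 * a * rng.random(2) - a
        C = 2 * rng.random(2)
        if rng.random() < 0.5:                      # encircling / random search phase
            ref = best if np.linalg.norm(A) < 1 else whales[rng.integers(len(whales))]
            whales[i] = ref - A * np.abs(C * ref - whales[i])
        else:                                       # spiral move toward the best whale
            l = rng.uniform(-1, 1)
            whales[i] = np.abs(best - whales[i]) * np.exp(l) * np.cos(2 * np.pi * l) + best
        whales[i] = np.clip(whales[i], LOW, HIGH)   # keep parameters inside bounds
        score = fitness(whales[i])
        if score > best_score:
            best_score, best = score, whales[i].copy()

print(f"best: n_estimators={int(round(best[0]))}, "
      f"learning_rate={best[1]:.3f}, cv accuracy={best_score:.4f}")
```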
Findings
The technique is applied to a manually constructed Twitter data set. It is compared with the existing optimization and hyperparameter tuning methods. The results indicate that the proposed method outperforms the existing techniques in terms of accuracy and computational efficiency.
Originality/value
The proposed method reduces server load by excluding complex features and retaining only lightweight ones. It aids in identifying spammers at an earlier stage, thereby offering users a more favorable environment.
Euripidis N. Loukis, Manolis Maragoudakis and Niki Kyriakou
Abstract
Purpose
The public sector has started exploiting artificial intelligence (AI) techniques, however, mainly for operational and much less for tactical or strategic-level tasks. The purpose of this study is to exploit AI for the highest strategic-level task of government: to develop an AI-based public sector data analytics methodology for supporting policymaking for one of the most serious and large-scale challenges that governments repeatedly face, the economic crises that lead to economic recessions (though the proposed methodology is of much more general applicability).
Design/methodology/approach
A public sector data analytics methodology has been developed that enables the exploitation of existing public and private sector data, through advanced processing using a big data-oriented AI technique, “all-relevant” feature selection, to identify characteristics of firms, as well as of their external environment, that affect (positively or negatively) their resilience to economic crisis.
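For readers who want to experiment with “all-relevant” selection, here is a minimal sketch using BorutaPy, the open-source Python reimplementation of the Boruta algorithm, on synthetic data standing in for firm-level indicators; the settings and data are illustrative assumptions, not the study's actual pipeline.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from boruta import BorutaPy

# Synthetic stand-in for firm characteristics (the study used Greek firms' data).
X, y = make_classification(n_samples=300, n_features=20, n_informative=6,
                           random_state=1)

rf = RandomForestClassifier(n_jobs=-1, max_depth=5, random_state=1)
boruta = BorutaPy(rf, n_estimators='auto', random_state=1)  # "all-relevant" selection
boruta.fit(X, y)   # BorutaPy expects plain numpy arrays

print("confirmed relevant feature indices:", np.where(boruta.support_)[0])
print("tentative feature indices:", np.where(boruta.support_weak_)[0])
```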
Findings
A first application of the proposed public sector data analytics methodology has been conducted, using Greek firms’ data concerning the economic crisis period 2009–2014, which has led to interesting conclusions and insights, revealing factors affecting the extent of sales revenue decrease in Greek firms during the above crisis period and providing a first validation of the methodology used in this study.
Research limitations/implications
This paper contributes to the advancement of two emerging digital government research domains that are highly important for society but minimally researched: public sector data analytics (and especially policy analytics) and government exploitation of AI. It exploits an AI feature selection algorithm, the Boruta “all-relevant” variables identification algorithm, which has been minimally exploited in the past for public sector data analytics, to support the design of public policies for addressing one of the most serious and large-scale economic challenges that governments repeatedly face: the economic crises.
Practical implications
The proposed methodology allows the identification of characteristics of firms, as well as of their external environment, that positively or negatively affect their resilience to economic crisis. This enables a better understanding of the kinds of firms that are most strongly hit by a crisis, which is quite useful for designing public policies to support them; at the same time, it reveals firms’ practices, resources, capabilities, etc. that enhance their ability to cope with economic crisis, informing the design of policies that promote them through educational and support activities.
Social implications
This methodology can be very useful for the design of more effective public policies for reducing the negative impacts of economic crises on firms, and therefore mitigating their negative consequences for the society, such as unemployment, poverty and social exclusion.
Originality/value
This study develops a novel approach to the exploitation of public and private sector data, based on an AI technique (“all-relevant” feature selection) minimally exploited for such purposes in the past, to support the design of public policies for addressing one of the most threatening disruptions that modern economies and societies repeatedly face, the economic crises.
Chih‐Fong Tsai and David C. Yen
Abstract
Purpose
Image classification, or more specifically, annotating images with keywords, is one of the important steps in image database indexing. However, current image retrieval research concentrates on how well conceptual categories can be represented by extracted low‐level features for effective classification. Consequently, image feature representation, including segmentation and low‐level feature extraction schemes, must be genuinely effective to facilitate the classification process. The purpose of this paper is to examine the effect on annotation effectiveness of using different (local) feature representation methods to map into conceptual categories.
Design/methodology/approach
This paper compares tiling (five and nine tiles) and regioning (five and nine regions) segmentation schemes, and the extraction of combinations of color, texture, and edge features, in terms of the effectiveness of a particular benchmark automatic image annotation setup. Differences in effectiveness between concrete and abstract conceptual categories or keywords are further investigated, and progress towards establishing a particular benchmark approach is also reported.
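As a toy illustration of the tiling idea, the sketch below splits an image into a 3×3 grid (nine tiles) and computes crude per-tile color and texture statistics. The actual study used richer color, texture, and edge descriptors and also regioning schemes, so this is only a schematic under those assumptions.

```python
import numpy as np

def tile_features(image, grid=(3, 3)):
    """Split an RGB image into grid tiles and compute simple per-tile
    color (channel means) and texture (grayscale variance) features."""
    h, w, _ = image.shape
    rows, cols = grid
    feats = []
    for r in range(rows):
        for c in range(cols):
            tile = image[r*h//rows:(r+1)*h//rows, c*w//cols:(c+1)*w//cols]
            color = tile.reshape(-1, 3).mean(axis=0)   # mean R, G, B per tile
            gray = tile.mean(axis=2)
            texture = gray.var()                       # crude texture proxy
            feats.extend([*color, texture])
    return np.array(feats)   # one vector per image, fed to the annotator/classifier

# Usage on a random "image"; a real pipeline would load images and train on them.
img = np.random.default_rng(0).integers(0, 256, (120, 160, 3)).astype(float)
print(tile_features(img).shape)   # 3*3 tiles * 4 features each -> (36,)
```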
Findings
In the context of local feature representation, the paper concludes that the combined color and texture features are the best to use for the five tiling and regioning schemes, and this evidence would form a good benchmark for future studies. Another interesting finding (but perhaps not surprising) is that when the number of concrete and abstract keywords increases or it is large (e.g. 100), abstract keywords are more difficult to assign correctly than the concrete ones.
Research limitations/implications
Future work could consider: conducting user‐centered evaluation instead of evaluation only against a chosen ground-truth dataset, such as Corel, since this might impact effectiveness results; using different numbers of categories for scalability analysis of image annotation, as well as larger numbers of training and testing examples; using Principal Component Analysis, Independent Component Analysis, or indeed machine learning techniques for low‐level feature selection; using other segmentation schemes, especially more complex tiling schemes and other regioning schemes; and using different datasets, other low‐level features and/or combinations of them, and other machine learning techniques.
Originality/value
This paper is a good start for analyzing the mapping between some feature representation methods and various conceptual categories for future image annotation research.
Abstract
Breast cancer (BC) is one of the leading cancers in the world; it is a malignant tumor, and middle-aged women are also at risk. However, identifying BC at an early stage can save many women’s lives. With advances in technology, this research uses the Random Forest Machine Learning (ML) algorithm for ranking features, and Support Vector Machine (SVM) and Naïve Bayes (NB) supervised classifiers for selecting the best optimized features and predicting BC. Prediction accuracy is estimated using the Wisconsin Breast Cancer dataset from the University of California Irvine (UCI) ML repository. All these operations were performed using Anaconda, an open-source distribution of Python. The proposed work resulted in an improvement in the accuracy of the NB and SVM classifiers. The performance of the proposed model is evaluated using classification accuracy, confusion matrix, mean, standard deviation, variance, and root mean-squared error.
The experimental results show that a 70-30 data split yields the best accuracy. SVM acts as a feature optimizer with the 12 best features, achieving 97.66% accuracy, an improvement of 1.17% after feature reduction. NB performs best with the 17 best features, achieving 96.49% accuracy, an improvement of 1.17% after feature reduction.
The study shows that the proposed model works very effectively compared with existing models with respect to accuracy measures.
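A minimal sketch of the rank-then-classify pipeline described above, using scikit-learn's bundled copy of the Wisconsin Breast Cancer data, a 70-30 split, Random Forest feature importances, and an SVM restricted to the 12 top-ranked features. The kernel choice and random seeds are assumptions, so the printed accuracy will not exactly reproduce the reported figures.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, confusion_matrix

X, y = load_breast_cancer(return_X_y=True)            # Wisconsin Breast Cancer data
X_tr, X_te, y_tr, y_te = train_test_split(            # 70-30 split, as in the study
    X, y, test_size=0.3, random_state=7)

rf = RandomForestClassifier(random_state=7).fit(X_tr, y_tr)
top12 = np.argsort(rf.feature_importances_)[::-1][:12]  # rank features, keep best 12

svm = SVC(kernel='linear').fit(X_tr[:, top12], y_tr)    # classify on reduced features
pred = svm.predict(X_te[:, top12])
print("accuracy:", accuracy_score(y_te, pred))
print(confusion_matrix(y_te, pred))
```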
Fatemeh Ehsani and Monireh Hosseini
Abstract
Purpose
As internet banking service marketing platforms continue to advance, customers exhibit distinct behaviors. Given the extensive array of options and the minimal barriers to switching to competitors, the concept of customer churn behavior has emerged as a subject of considerable debate. This study aims to delineate the scope of feature optimization methods for elucidating customer churn behavior within the context of internet banking service marketing. To achieve this goal, the authors aim to predict the attrition and migration of customers who use internet banking services using tree-based classifiers.
Design/methodology/approach
The authors used various feature optimization methods in tree-based classifiers to predict customer churn behavior using transaction data from customers who use internet banking services. First, the authors conducted feature reduction to eliminate ineffective features and project the data set onto a lower-dimensional space. Next, the authors used Recursive Feature Elimination with Cross-Validation (RFECV) to extract the most practical features. Then, the authors applied feature importance to assign a score to each input feature. Following this, the authors selected C5.0 Decision Tree, Random Forest, XGBoost, AdaBoost, CatBoost and LightGBM as the six tree-based classifier structures.
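A minimal sketch of the RFECV step, assuming a Random Forest base estimator and synthetic stand-in data; the study's actual transaction features and the six tree-based classifiers are not reproduced here. The fitted selector also exposes per-feature importances for the kept set, mirroring the feature importance step.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFECV

# Synthetic stand-in for internet-banking transaction features (churn vs. retained).
X, y = make_classification(n_samples=500, n_features=25, n_informative=8,
                           random_state=3)

# Recursively drop the weakest feature, scoring each subset by cross-validation.
selector = RFECV(RandomForestClassifier(random_state=3), step=1, cv=5,
                 scoring='accuracy')
selector.fit(X, y)

print("optimal number of features:", selector.n_features_)
print("importance scores of the kept features:",
      selector.estimator_.feature_importances_.round(3))
```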
Findings
This study confirmed that transaction data are a reliable resource for elucidating customer churn behavior within the context of internet banking service marketing. Experimental findings highlight the operational benefits and enhanced customer retention afforded by implementing feature optimization and leveraging a variety of tree-based classifiers. The results indicate the significance of feature reduction, feature selection and feature importance as the three feature optimization methods for understanding customer churn prediction. This study demonstrated that feature optimization can improve this prediction by increasing the accuracy and precision of tree-based classifiers and decreasing their error rates.
Originality/value
This research enhances the understanding of customer behavior on internet banking service platforms by predicting churn intentions. This study demonstrates how feature optimization methods influence customer churn prediction performance. The approach included feature reduction, feature selection and assessing feature importance to optimize transaction data analysis. Additionally, the authors performed feature optimization within tree-based classifiers to improve performance. The novelty of this approach lies in combining feature optimization methods with tree-based classifiers to effectively capture and articulate the customer churn experience in internet banking service marketing.
S. Punitha and K. Devaki
Abstract
Purpose
Predicting student performance is crucial in educational settings to identify and support students who may need additional help or resources. Understanding and predicting student performance is essential for educators to provide targeted support and guidance. By analyzing various factors like attendance, study habits, grades, and participation, teachers can gain insights into each student’s academic progress. This information helps them tailor their teaching methods to meet the individual needs of students, ensuring a more personalized and effective learning experience. By identifying patterns and trends in student performance, educators can intervene early to address any challenges and help students achieve their full potential. However, the complexity of human behavior and learning patterns makes it difficult to accurately forecast how a student will perform. Additionally, the availability and quality of data can vary, impacting the accuracy of predictions. Despite these obstacles, continuous improvement in data collection methods and the development of more robust predictive models can help address these challenges and enhance the accuracy and effectiveness of student performance predictions. Moreover, the scalability of existing models to different educational settings and student populations can be a hurdle; ensuring that the models are adaptable and effective across diverse environments is crucial for their widespread use and impact. This study therefore implements a student performance-based learning recommendation scheme that predicts students’ capabilities and suggests better materials such as papers, books, videos, and hyperlinks according to their needs, thereby enhancing the performance of higher education.
Design/methodology/approach
Thus, a predictive approach for student achievement is presented using deep learning. At the beginning, the data is accumulated from the standard database. Next, the collected data undergoes a stage where features are carefully selected using the Modified Red Deer Algorithm (MRDA). After that, the selected features are given to the Deep Ensemble Networks (DEnsNet), in which techniques such as Gated Recurrent Unit (GRU), Deep Conditional Random Field (DCRF), and Residual Long Short-Term Memory (Res-LSTM) are utilized for predicting the student performance. In this case, the parameters within the DEnsNet network are finely tuned by the MRDA algorithm. Finally, the results from the DEnsNet network are obtained using a superior method that delivers the final prediction outcome. Following that, the Adaptive Generative Adversarial Network (AGAN) is introduced for recommender systems, with these parameters optimally selected using the MRDA algorithm. Lastly, the method for predicting student performance is evaluated numerically and compared to traditional methods to demonstrate the effectiveness of the proposed approach.
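The DEnsNet, DCRF, and MRDA components are specific to this paper, but the general pattern of fusing recurrent branches can be sketched. Below is a hypothetical PyTorch two-branch ensemble (a GRU and an LSTM) that averages logits over a sequence of per-term student records; the dimensions, data, and fusion rule are illustrative assumptions, not the authors' architecture.

```python
import torch
import torch.nn as nn

class RecurrentEnsemble(nn.Module):
    """Toy two-branch ensemble: a GRU branch and an LSTM branch each score a
    sequence of per-term student records; their class logits are averaged."""
    def __init__(self, n_feats=6, hidden=16, n_classes=3):
        super().__init__()
        self.gru = nn.GRU(n_feats, hidden, batch_first=True)
        self.lstm = nn.LSTM(n_feats, hidden, batch_first=True)
        self.head_g = nn.Linear(hidden, n_classes)
        self.head_l = nn.Linear(hidden, n_classes)

    def forward(self, x):                      # x: (batch, time, features)
        hg, _ = self.gru(x)
        hl, _ = self.lstm(x)
        logits_g = self.head_g(hg[:, -1])      # last time step of each branch
        logits_l = self.head_l(hl[:, -1])
        return (logits_g + logits_l) / 2       # simple output-level fusion

model = RecurrentEnsemble()
x = torch.randn(4, 8, 6)                       # 4 students, 8 terms, 6 indicators
print(model(x).shape)                          # torch.Size([4, 3])
```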
Findings
The accuracy of the developed model is 7.66%, 9.91%, 5.3%, and 3.53% higher than HHO-DEnsNet, ROA-DEnsNet, GTO-DEnsNet, and AOA-DEnsNet for dataset-1, and 7.18%, 7.54%, 5.43%, and 3% higher than HHO-DEnsNet, ROA-DEnsNet, GTO-DEnsNet, and AOA-DEnsNet for dataset-2.
Originality/value
The developed model recommends appropriate learning materials within a short period, improving students’ learning ability.
Robert W. Messler, Suat Genc and Gary A. Gabriele
Abstract
Suggests that, without question, while every step in a systematic approach to the design of parts for assembly using integral snap‐fit features is important, none is more important than selecting locking features. After all, it is these features that hold the assembly together. While quite different in appearance and details of their operation, all integral locking features comprise a latch and a catch component to create a locking pair. Proper, no less optimum, function requires that such locking pairs be selected using a systematic approach. Presents that approach as a six‐step methodology, but first, defines and describes latch and catch components, bringing order to their apparent boundless variety. Demonstrates the utility of the methodology with a real‐life case study.
Gergely Orbán and Gábor Horváth
Abstract
Purpose
The purpose of this paper is to show an efficient method for the detection of signs of early lung cancer. Various image processing algorithms are presented for different types of lesions, and a scheme is proposed for the combination of results.
Design/methodology/approach
A computer aided detection (CAD) scheme was developed for the detection of lung cancer. It enables different lesion enhancer algorithms, each sensitive to a specific lesion subtype, to be used simultaneously. Three image processing algorithms are presented for the detection of small nodules, large nodules, and infiltrated areas. The outputs are merged, and the false detection rate is reduced with four separate support vector machine (SVM) classifiers. The classifier input comes from a feature selection algorithm selecting from various textural and geometric features. A total of 761 images were used for testing, including the database of the Japanese Society of Radiological Technology (JSRT).
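A minimal sketch of the false-positive reduction step, assuming candidate regions have already been described by textural/geometric feature vectors. The paper used four separate SVM classifiers (one per detection path), whereas this toy uses a single classifier on synthetic, imbalanced data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in: candidate regions from the lesion enhancers, described by
# textural/geometric features and labeled true lesion (1) vs. false detection (0).
X, y = make_classification(n_samples=600, n_features=12, weights=[0.8],
                           random_state=5)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=5)

svm = SVC(kernel='rbf', probability=True).fit(X_tr, y_tr)
scores = svm.predict_proba(X_te)[:, 1]
keep = scores > 0.5        # discard low-scoring candidates to cut false positives
print("candidates kept:", keep.sum(), "of", len(keep))
```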
Findings
The fusion of algorithms reduced false positives on average by 0.6 per image, while the sensitivity remained 80 per cent. On the JSRT database the system managed to find 60.2 per cent of lesions at an average of 2.0 false positives per image. The effect of using different result evaluation criteria was tested and a difference as high as 4 percentage points in sensitivity was measured. The system was compared to other published methods.
Originality/value
The study described in the paper proves the usefulness of lesion enhancement decomposition, while proposing a scheme for the fusion of algorithms. Furthermore, a new algorithm is introduced for the detection of infiltrated areas, possible signs of lung cancer, neglected by previous solutions.
Sharanabasappa and Suvarna Nandyal
Abstract
Purpose
In order to prevent accidents during driving, driver drowsiness detection systems have become a hot topic for researchers. Various types of features can be used to detect drowsiness: detection can be done using behavioral data, physiological measurements or vehicle-based data. The existing ensemble approach based on deep convolutional neural network (CNN) models analyzed behavioral data comprising eye, face or head movements captured using camera images or videos. However, the developed model suffered from high computational cost because it applied approximately 140 million parameters.
Design/methodology/approach
The proposed model applies feature selection methods such as ReliefF, Infinite, Correlation and Term Variance to the extracted feature parameters in order to retain only the most significant features. The selected features then undergo classification using an ensemble classifier.
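To make the feature-scoring idea concrete, here is a from-scratch sketch of the simpler binary Relief variant (ReliefF generalizes it to k neighbors and multiple classes). The toy data, iteration count, and ranking rule are illustrative assumptions, not the paper's pipeline.

```python
import numpy as np

def relief(X, y, n_iter=100, seed=0):
    """Minimal binary Relief: reward features that separate a sample from its
    nearest miss (other class) and penalize those differing from its nearest hit."""
    rng = np.random.default_rng(seed)
    X = (X - X.min(0)) / (X.max(0) - X.min(0) + 1e-12)   # scale features to [0, 1]
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        i = rng.integers(len(X))
        d = np.abs(X - X[i]).sum(axis=1)    # Manhattan distance to every sample
        d[i] = np.inf                       # exclude the sample itself
        hit = np.argmin(np.where(y == y[i], d, np.inf))   # nearest same-class sample
        miss = np.argmin(np.where(y != y[i], d, np.inf))  # nearest other-class sample
        w += np.abs(X[i] - X[miss]) - np.abs(X[i] - X[hit])
    return w / n_iter

# Toy drowsy/alert feature matrix; keep the top-ranked features for the ensemble.
rng = np.random.default_rng(1)
X = rng.random((200, 8))
y = (X[:, 2] + 0.1 * rng.random(200) > 0.5).astype(int)   # feature 2 is informative
print("feature ranking (best first):", np.argsort(relief(X, y))[::-1])
```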
Findings
The output of these models is classified into non-drowsiness or drowsiness categories.
Research limitations/implications
In this research work, higher-end cameras are required to collect videos, which is not cost-effective. Therefore, researchers are encouraged to use the existing datasets.
Practical implications
This paper overcomes the limitations of the earlier approach. The developed model uses complex deep learning models on a small dataset while also extracting additional features, thereby providing a more satisfying result.
Originality/value
Drowsiness can be detected at the earliest stage using the ensemble model, which helps reduce the number of accidents.
N. Venkata Sailaja, L. Padmasree and N. Mangathayaru
Abstract
Purpose
Text mining has been used for various knowledge discovery-based applications, and thus a lot of research has been contributed towards it. The latest trend in text mining research is the adoption of incremental learning, as it is economical when dealing with large volumes of information.
Design/methodology/approach
The primary intention of this research is to design and develop a technique for incremental text categorization using an optimized Support Vector Neural Network (SVNN). The proposed technique involves four major steps: pre-processing, feature extraction, feature selection and classification. Initially, the data is pre-processed based on stop word removal and stemming. Then, feature extraction is done by extracting semantic word-based features and Term Frequency-Inverse Document Frequency (TF-IDF). From the extracted features, the important features are selected using the Bhattacharya distance measure, and these features are given as input to the proposed classifier. The proposed classifier performs incremental learning using SVNN, wherein the weights are bounded within a limit using rough set theory. Moreover, for the optimal selection of weights in SVNN, the Moth Search (MS) algorithm is used. Thus, the proposed classifier, named Rough set MS-SVNN, performs text categorization for the incremental data given as input.
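The Rough set MS-SVNN itself is the authors' contribution, but the incremental-text-categorization pattern can be sketched with standard tools: a stateless hashing vectorizer plus a linear classifier updated batch by batch via partial_fit. The batches, labels, and model choice below are illustrative assumptions, not the proposed method.

```python
from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

# Stateless vectorizer: no vocabulary to grow, so memory stays flat as data arrives.
vec = HashingVectorizer(stop_words='english', n_features=2**16)
clf = SGDClassifier(random_state=0)

# Two toy batches of (texts, labels); 0 = sports, 1 = politics.
batches = [
    (["the striker scored twice", "parliament passed the bill"], [0, 1]),
    (["a late goal won the match", "the senate debated the law"], [0, 1]),
]
classes = [0, 1]                       # all classes must be declared up front
for texts, labels in batches:          # each new batch updates the same model
    clf.partial_fit(vec.transform(texts), labels, classes=classes)

print(clf.predict(vec.transform(["the bill became law"])))  # classify unseen text
```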
Findings
For the experimentation, the 20 Newsgroups dataset and the Reuters dataset are used. Simulation results indicate that the proposed Rough set-based MS-SVNN achieved 0.7743, 0.7774 and 0.7745 for precision, recall and F-measure, respectively.
Originality/value
In this paper, an online incremental learner is developed for text categorization. The text categorization is done by developing the Rough set MS-SVNN classifier, which classifies incoming texts based on the boundary condition evaluated by rough set theory and the optimal weights from MS. The proposed online text categorization scheme has the basic steps of pre-processing, feature extraction, feature selection and classification. Pre-processing is carried out to identify the unique words in the dataset, and features such as semantic word-based features and TF-IDF are obtained from the keyword set. Feature selection is done by setting a minimum Bhattacharya distance measure, and the selected features are provided to the proposed Rough set MS-SVNN for classification.