Kalyan Nagaraj, Biplab Bhattacharjee, Amulyashree Sridhar and Sharvani GS
Abstract
Purpose
Phishing is one of the major threats affecting businesses worldwide. Organizations and customers face hazards from phishing attacks because of anonymous access to vulnerable details. Such attacks often result in substantial financial losses. Thus, there is a need for effective intrusion detection techniques to identify and possibly nullify the effects of phishing. Classifying phishing and non-phishing web content is a critical task in information security protocols, and foolproof mechanisms have yet to be implemented in practice. The purpose of the current study is to present an ensemble machine learning model for classifying phishing websites.
Design/methodology/approach
A publicly available data set comprising 10,068 instances of phishing and legitimate websites was used to build the classifier model. Feature extraction was performed using a group of methods, and the relevant extracted features were used to build the model. A twofold ensemble learner was developed by feeding the results of a random forest (RF) classifier into a feedforward neural network (NN). Performance of the ensemble classifier was validated using k-fold cross-validation. The twofold ensemble learner was implemented as a user-friendly, interactive decision support system for classifying websites as phishing or legitimate.
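The k-fold validation step mentioned above can be sketched as follows. This is a minimal illustration and not the authors' implementation: the abstract does not state the value of k, so `k=5` is an assumption, and `majority_baseline` is a hypothetical stand-in for the RF-to-NN pipeline, which would be plugged in through the same `fit_predict` interface.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle sample indices and deal them into k nearly equal folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


def cross_validate(X, y, fit_predict, k=5, seed=0):
    """Hold out each fold once and return the per-fold accuracies."""
    folds = k_fold_indices(len(X), k, seed)
    accuracies = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predictions = fit_predict(
            [X[i] for i in train],
            [y[i] for i in train],
            [X[i] for i in held_out],
        )
        correct = sum(p == y[i] for p, i in zip(predictions, held_out))
        accuracies.append(correct / len(held_out))
    return accuracies


def majority_baseline(X_train, y_train, X_test):
    """Hypothetical stand-in model: always predicts the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return [majority] * len(X_test)


# Toy data (assumed): 20 samples, 14 labelled 1 and 6 labelled 0.
X = [[i] for i in range(20)]
y = [1] * 14 + [0] * 6
folds = k_fold_indices(len(X), 5)
scores = cross_validate(X, y, majority_baseline, k=5)
mean_accuracy = sum(scores) / len(scores)
```

Every sample is held out exactly once, so the mean of the per-fold accuracies estimates how the model would perform on unseen websites.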
Findings
Experimental simulations were performed to assess and compare the performance of the ensemble classifiers. The statistical tests indicated that the RF_NN model gave superior performance, with an accuracy of 93.41 per cent and a minimal mean squared error of 0.000026.
Research limitations/implications
The research data set used in this study is publicly available and easy to analyze. Comparative analysis with other recent real-time data sets must be performed to ensure that the model generalizes across various security breaches. Different variants of phishing threats must also be detected, rather than focusing solely on phishing website detection.
Originality/value
To the best of the authors' knowledge, the twofold ensemble model has not been applied to the classification of phishing websites in any previous study.
Biplab Bhattacharjee, Kavya Unni and Maheshwar Pratap
Abstract
Purpose
Product returns are a major challenge for e-businesses as they involve huge logistical and operational costs. Therefore, it becomes crucial to predict returns in advance. This study aims to evaluate different genres of classifiers for product return chance prediction and to further optimize the best-performing model.
Design/methodology/approach
An e-commerce data set with categorical attributes is used for this study. Feature selection based on the chi-square test yields a reduced feature set that serves as input for model building. Predictive models are built using individual classifiers, ensemble models and deep neural networks. For performance evaluation, a 75:25 train/test split and 10-fold cross-validation are used. To improve the predictability of the best-performing classifier, hyperparameter tuning is performed using different optimization methods such as random search, grid search, the Bayesian approach and evolutionary models (genetic algorithm, differential evolution and particle swarm optimization).
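Chi-square feature selection for categorical attributes can be illustrated with a small sketch. It computes the Pearson chi-square statistic between each attribute and the return label and ranks attributes by that score. The attribute names (`payment`, `colour`) and the toy rows are hypothetical, since the study's data are anonymized, and the authors' actual selection threshold is not given in the abstract.

```python
from collections import Counter


def chi_square_score(feature, labels):
    """Pearson chi-square statistic of one categorical feature against the labels."""
    n = len(feature)
    observed = Counter(zip(feature, labels))
    feature_totals = Counter(feature)
    label_totals = Counter(labels)
    score = 0.0
    for f_value, f_count in feature_totals.items():
        for l_value, l_count in label_totals.items():
            # Expected cell count if feature and label were independent.
            expected = f_count * l_count / n
            diff = observed.get((f_value, l_value), 0) - expected
            score += diff * diff / expected
    return score


def rank_features(columns, labels):
    """Return (name, score) pairs sorted from most to least associated."""
    scored = [(name, chi_square_score(values, labels)) for name, values in columns.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: 1 = order returned, 0 = order kept.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "colour": ["red", "blue", "red", "blue", "red", "blue", "red", "blue"],
    "payment": ["cod", "cod", "cod", "cod", "card", "card", "card", "card"],
}
ranked = rank_features(columns, labels)
```

In this toy case `payment` separates the two classes perfectly and ranks first, while `colour` is independent of the label and scores zero. A selection step would keep the top-ranked attributes as model inputs.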
Findings
A comparison of F1-scores revealed that the Bayesian approach outperformed all other optimization approaches in terms of accuracy. The predictability of the Bayesian-optimized model is further compared with that of other classifiers using experimental analysis. The Bayesian-optimized XGBoost model possessed superior performance, with accuracies of 77.80% and 70.35% for holdout and 10-fold cross-validation methods, respectively.
Research limitations/implications
Given the anonymized data, the effects of individual attributes on outcomes could not be investigated in detail. The Bayesian-optimized predictive model may be used in decision support systems, enabling real-time prediction of returns and the implementation of preventive measures.
Originality/value
There are very few reported studies on predicting the chance of order return in e-businesses. To the best of the authors’ knowledge, this study is the first to compare different optimization methods and classifiers, demonstrating the superiority of the Bayesian-optimized XGBoost classification model for returns prediction.
Rajesh Chidananda Reddy, Debasisha Mishra, D.P. Goyal and Nripendra P. Rana
Abstract
Purpose
The study explores the potential barriers to data science (DS) implementation in organizations and identifies the key barriers. The identified barriers were explored for their interconnectedness and characteristics. This study aims to help organizations formulate apt DS strategies by providing a close-to-reality DS implementation framework of barriers, in conjunction with extant literature and practitioners' viewpoints.
Design/methodology/approach
The authors synthesized 100 distinct barriers through a systematic literature review (SLR) under the individual, organizational and governmental taxonomies. Through semi-structured interviews with 48 industry experts, 14 key barriers were identified. The selected barriers were explored for their pair-wise relationships using interpretive structural modeling (ISM) and fuzzy Matrice d'Impacts Croisés Multiplication Appliquée à un Classement (MICMAC) analyses to formulate the hierarchical framework.
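The core ISM and MICMAC computation can be sketched in a few lines. The example below is a simplified, crisp (binary) version, not the fuzzy MICMAC variant used in the study: it takes an initial reachability matrix of pair-wise influences, derives the final reachability matrix by transitive closure, and classifies each barrier by its driving and dependence power. The five barriers `A` to `E`, their relationships and the midpoint threshold of `n / 2` are all assumptions made for illustration.

```python
def transitive_closure(matrix):
    """Final reachability matrix: if i reaches k and k reaches j, then i reaches j."""
    n = len(matrix)
    reach = [row[:] for row in matrix]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = 1
    return reach


def micmac_classify(names, matrix):
    """Map each barrier to (driving power, dependence power, category)."""
    reach = transitive_closure(matrix)
    n = len(names)
    midpoint = n / 2  # assumed cut-off between "low" and "high" power
    result = {}
    for idx, name in enumerate(names):
        driving = sum(reach[idx])                    # row sum
        dependence = sum(row[idx] for row in reach)  # column sum
        if driving > midpoint and dependence > midpoint:
            category = "linkage"
        elif driving > midpoint:
            category = "independent"
        elif dependence > midpoint:
            category = "dependent"
        else:
            category = "autonomous"
        result[name] = (driving, dependence, category)
    return result


# Hypothetical barriers: A influences B, B and C influence each other,
# C influences D, and E is unrelated. The diagonal is 1 by convention.
names = ["A", "B", "C", "D", "E"]
initial = [
    [1, 1, 0, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 1],
]
classification = micmac_classify(names, initial)
```

Here `A` comes out as an independent (driver) variable, `B` and `C` as linkage variables, `D` as dependent and `E` as autonomous, which mirrors the four categories reported in the findings.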
Findings
The lack of awareness and data-related challenges are identified as the most prominent barriers, followed by non-alignment with organizational strategy, lack of competency with vendors and premature governmental arrangements; all of these are classified as independent variables. The non-commitment of the top-management team (TMT), significant investment costs, lack of swiftness in change management and a low tolerance for complexity and initial failures are recognized as the linkage variables. Employee reluctance, mid-level managerial resistance, a dearth of adequate skills and knowledge and working in silos depend on the rest of the identified barriers. The perceived threat to society is classified as the autonomous variable.
Originality/value
The study augments theoretical understanding from the literature with the practical viewpoints of industry experts in enhancing the knowledge of the DS ecosystem. The research offers organizations a generic framework to combat hindrances to DS initiatives strategically.