Search results | Emerald Insight

Article

Publication date: 7 July 2020

A feature-centric spam email detection model using diverse supervised machine learning algorithms

Ammara Zamir, Hikmat Ullah Khan, Waqar Mehmood, Tassawar Iqbal and Abubakker Usman Akram

Abstract

Purpose

This research study proposes a feature-centric spam email detection model (FSEDM) based on a set of content, sentiment, semantic, user and spam-lexicon features. The purpose of this study is to exploit the role of sentiment features, along with the other proposed features, to evaluate the classification accuracy of machine learning algorithms for spam email detection.

Design/methodology/approach

Existing studies primarily exploit a content-based feature engineering approach; however, only a limited number of features are considered. In this regard, this research study proposes a feature-centric framework (FSEDM) based on existing and novel features of the email data set, which are extracted after pre-processing. Afterwards, diverse supervised learning techniques are applied to the proposed features in conjunction with feature selection techniques such as information gain, gain ratio and Relief-F to rank the most prominent features and classify the emails as spam or ham (not spam).
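As a rough illustration of two of the ranking criteria named above, the sketch below computes information gain and gain ratio for discrete features and sorts features by information gain. It is not the authors' implementation: the feature names (`has_link`, `negative_tone`) and the toy data are assumptions made for the example.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    total = len(values)
    return -sum((n / total) * log2(n / total) for n in Counter(values).values())


def information_gain(feature, labels):
    """Reduction in label entropy after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain normalised by the entropy of the feature itself."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


def rank_features(features, labels):
    """Return feature names sorted from highest to lowest information gain."""
    return sorted(features, key=lambda name: information_gain(features[name], labels), reverse=True)


# Hypothetical toy data: four emails, two binary features.
labels = ["spam", "spam", "ham", "ham"]
features = {
    "negative_tone": [1, 0, 1, 0],  # uninformative in this toy set
    "has_link": [1, 1, 0, 0],       # separates the classes perfectly
}
ranking = rank_features(features, labels)
```

Gain ratio divides by the feature's own entropy, which penalises features that split the data into many small groups.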

Findings

Analysis and experimental results indicated that the proposed model with sentiment analysis is a competitive approach for spam email detection. Using the proposed model, a deep neural network applied with sentiment features outperformed the other classifiers, with classification accuracy of up to 97.2%.

Originality/value

This research is novel in that no previous research has focused on sentiment analysis in conjunction with other email features for the detection of spam emails.

Details

The Electronic Library, vol. 38 no. 3

Type: Research Article

ISSN: 0264-0473

Open Access

Article

Publication date: 25 July 2022

Improving handwritten digit recognition using hybrid feature selection algorithm

Fung Yuen Chin, Kong Hoong Lem and Khye Mun Wong

Abstract

Purpose

The number of features in handwritten digit data is often very large due to the different aspects of personal handwriting, leading to high-dimensional data. Therefore, the use of a feature selection algorithm becomes crucial for successful classification modeling, because the inclusion of irrelevant or redundant features can mislead the modeling algorithms, resulting in overfitting and a decrease in efficiency.

Design/methodology/approach

Minimum redundancy and maximum relevance (mRMR) and recursive feature elimination (RFE) are two frequently used feature selection algorithms. While mRMR is capable of identifying a subset of features that are highly relevant to the targeted classification variable, it still has the weakness of capturing redundant features in the process. RFE, on the other hand, can effectively eliminate the less important features and exclude redundant ones, but the features it selects are not ranked by importance.
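The mRMR idea can be sketched as a greedy loop that, at each step, picks the feature with the highest relevance to the target minus its mean redundancy with the features already chosen. The version below is a minimal illustration for discrete features using mutual information; it covers only the mRMR step, not the SVM-based RFE stage, and the feature names and toy data are assumptions.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * log2(c * n / (px[x] * py[y])) for (x, y), c in pxy.items())


def mrmr_score(name, features, target, selected):
    """Relevance to the target minus mean redundancy with selected features."""
    relevance = mutual_information(features[name], target)
    if not selected:
        return relevance
    redundancy = sum(mutual_information(features[name], features[s]) for s in selected)
    return relevance - redundancy / len(selected)


def mrmr(features, target, k):
    """Greedily select up to k feature names by the mRMR difference criterion."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < k:
        # On ties, max() keeps the first candidate in insertion order.
        best = max(remaining, key=lambda name: mrmr_score(name, features, target, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Hypothetical toy data: f_b duplicates f_a, f_c carries complementary information.
target = [0, 0, 0, 1]
features = {
    "f_a": [0, 0, 1, 1],
    "f_b": [0, 0, 1, 1],
    "f_c": [0, 1, 0, 1],
}
chosen = mrmr(features, target, k=2)
```

In this toy set the duplicate `f_b` is as relevant as `f_a` but fully redundant with it, so the second pick goes to `f_c`.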

Findings

The hybrid method was exemplified in a binary classification between digits “4” and “9” and between digits “6” and “8” from a multiple features dataset. The results showed that the hybrid mRMR + support vector machine recursive feature elimination (SVMRFE) performs better than both the support vector machine (SVM) alone and mRMR.

Originality/value

In view of the respective strengths and deficiencies of mRMR and RFE, this study combined the two methods and used an SVM as the underlying classifier, anticipating that mRMR would make an excellent complement to SVMRFE.

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964

Article

Publication date: 2 February 2015

A feature selection approach for automatic e-book classification based on discourse segmentation

Jiunn-Liang Guo, Hei-Chia Wang and Ming-Way Lai

Abstract

Purpose

The purpose of this paper is to develop a novel feature selection approach for automatic text classification of large digital documents, namely e-books in an online library system. The main idea is to automatically identify discourse features in order to improve the feature selection process, rather than focussing on the size of the corpus.

Design/methodology/approach

The proposed framework is intended to automatically identify the discourse segments within e-books, capture the discourse subtopics that are cohesively expressed in those segments and treat these subtopics as informative and prominent features. The selected set of features is then used to train a support vector machine and perform the e-book classification task.

Findings

The evaluation of the proposed framework shows that identifying discourse segments and capturing subtopic features lead to better performance than two conventional feature selection techniques: TFIDF and mutual information. It also demonstrates that discourse features play important roles among textual features, especially for large documents such as e-books.
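For readers unfamiliar with the TFIDF baseline mentioned above, the following is a minimal sketch of term weighting and top-term selection. It uses one common variant (relative term frequency times the natural log of inverse document frequency) and assumes non-empty, whitespace-tokenised documents; the paper's exact weighting scheme is not stated here, and the sample documents are invented.

```python
from collections import Counter
from math import log


def tf_idf(documents):
    """Return one {term: weight} dict per document (tf * ln(N / df))."""
    n_docs = len(documents)
    tokenised = [doc.lower().split() for doc in documents]
    # Document frequency: number of documents containing each term.
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_terms(doc_weights, k):
    """Select the k highest-weighted terms of one document as features."""
    return sorted(doc_weights, key=doc_weights.get, reverse=True)[:k]


# Hypothetical toy corpus.
docs = [
    "library catalogue search",
    "library e-book lending",
    "neural network training",
]
weights = tf_idf(docs)
selected = top_terms(weights[0], 2)
```

Terms shared across documents, such as "library" here, receive lower weights than terms unique to one document, which is why they drop out of the selected features.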

Research limitations/implications

Automatically extracted subtopic features cannot be entered directly into the feature selection (FS) process; they require control of the threshold.

Practical implications

The proposed technique has demonstrated the promise of using discourse analysis to enhance the classification of large digital documents such as e-books, as against conventional techniques.

Originality/value

A new FS technique is proposed that can inspect the narrative structure of large documents and is new to the text classification domain. The other contribution is that it inspires the consideration of discourse information in future text analysis by providing more evidence through evaluation of the results. The proposed system can be integrated into other library management systems.

Details

Program, vol. 49 no. 1

Type: Research Article

ISSN: 0033-0337

Article

Publication date: 10 January 2020

Phishing web site detection using diverse machine learning algorithms

Ammara Zamir, Hikmat Ullah Khan, Tassawar Iqbal, Nazish Yousaf, Farah Aslam, Almas Anjum and Maryam Hamdani

Abstract

Purpose

This paper aims to present a framework to detect phishing websites using a stacking model. Phishing is a type of fraud that aims to access users’ credentials. The attackers access users’ personal and sensitive information for monetary purposes. Phishing affects diverse fields, such as e-commerce, online business, banking and digital marketing, and is ordinarily carried out by sending spam emails and developing websites that closely resemble the original websites. As people surf the targeted website, the phishers hijack their personal information.

Design/methodology/approach

Features of the phishing data set are analysed using feature selection techniques including information gain, gain ratio, Relief-F and recursive feature elimination (RFE). Two features are proposed combining the strongest and weakest attributes. Principal component analysis with diverse machine learning algorithms, including random forest (RF), neural network (NN), bagging, support vector machine, Naïve Bayes and k-nearest neighbour (kNN), is applied to the proposed and remaining features. Afterwards, two stacking models, Stacking1 (RF + NN + Bagging) and Stacking2 (kNN + RF + Bagging), are applied by combining the highest scoring classifiers to improve the classification accuracy.
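The stacking idea can be illustrated with a deliberately small sketch: several base learners are trained first, and a meta-learner is then trained on their predictions for held-out data. The base learners below are single-feature lookup "stumps" and the meta-learner is a lookup table, both stand-ins chosen to keep the example dependency-free; they are not the RF, NN or bagging models used in the paper, and the binary features and data are assumptions.

```python
from collections import Counter, defaultdict


def train_stump(rows, labels, feature_index):
    """Base learner: majority label observed for each value of one feature."""
    votes = defaultdict(Counter)
    for row, label in zip(rows, labels):
        votes[row[feature_index]][label] += 1
    default = Counter(labels).most_common(1)[0][0]
    table = {value: counts.most_common(1)[0][0] for value, counts in votes.items()}
    return lambda row: table.get(row[feature_index], default)


def train_stacking(base_rows, base_labels, meta_rows, meta_labels, n_features):
    """Train one stump per feature, then a meta-learner on their outputs."""
    base_models = [train_stump(base_rows, base_labels, i) for i in range(n_features)]
    votes = defaultdict(Counter)
    for row, label in zip(meta_rows, meta_labels):
        key = tuple(model(row) for model in base_models)
        votes[key][label] += 1
    default = Counter(meta_labels).most_common(1)[0][0]
    meta_table = {key: counts.most_common(1)[0][0] for key, counts in votes.items()}

    def predict(row):
        key = tuple(model(row) for model in base_models)
        return meta_table.get(key, default)

    return predict


# Hypothetical binary features per website, e.g. (ip_in_url, no_https, long_url).
P, L = "phishing", "legitimate"
base_rows = [(1, 0, 1), (1, 0, 1), (0, 1, 0), (0, 1, 0), (1, 1, 1), (0, 0, 0)]
base_labels = [P, P, L, L, P, L]
meta_rows = [(1, 0, 0), (0, 1, 1), (1, 1, 0), (0, 0, 1), (1, 1, 1)]
meta_labels = [P, L, P, L, P]
predict = train_stacking(base_rows, base_labels, meta_rows, meta_labels, n_features=3)
```

For the row `(1, 1, 0)` two of the three stumps vote "legitimate", yet the meta-learner has learned from the held-out data to trust the first stump, which is the difference between stacking and simple majority voting.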

Findings

The proposed features played an important role in improving the accuracy of all the classifiers. The results show that RFE plays an important role in removing the least important feature from the data set. Furthermore, Stacking1 (RF + NN + Bagging) outperformed all other classifiers in terms of classification accuracy, detecting phishing websites with 97.4% accuracy.

Originality/value

This research is novel in that no previous research has focussed on using feed forward NN and ensemble learners for detecting phishing websites.

Details

The Electronic Library, vol. 38 no. 1

Type: Research Article

ISSN: 0264-0473

Article

Publication date: 11 November 2021

Hybrid generative regression-based deep intelligence to predict the risk of chronic disease

Sandeep Kumar Hegde and Monica R. Mundada

Abstract

Purpose

Chronic diseases are considered one of the serious concerns and threats to public health across the globe. Diseases such as chronic diabetes mellitus (CDM), cardiovascular disease (CVD) and chronic kidney disease (CKD) are major chronic diseases responsible for millions of deaths. Each of these diseases is considered a risk factor for the other two. Therefore, noteworthy attention is being paid to reducing the risk of these diseases. A gigantic amount of medical data is generated in digital form from smart healthcare appliances in the current era. Although numerous machine learning (ML) algorithms have been proposed for the early prediction of chronic diseases, these algorithmic models are neither generalized nor adaptive when applied to new disease datasets. Hence, these algorithms have to process a huge amount of disease data iteratively until the model converges. This limitation may make it difficult for ML models to fit, leading them to produce imprecise results. A single algorithm may not yield accurate results. Nonetheless, an ensemble of classifiers built from multiple models that works on a voting principle has been successfully applied to solve many classification tasks. The purpose of this paper is to make early predictions of chronic diseases using a hybrid generative regression-based deep intelligence network (HGRDIN) model.

Design/methodology/approach

In this paper, a generative regression (GR) model is used in combination with a deep neural network (DNN) for the early prediction of chronic disease. The GR model obtains prior knowledge about the labelled data by analyzing the correlation between features and class labels. Hence, the weight assignment process of the DNN is influenced by the relationship between attributes rather than by random assignment. The knowledge obtained through these processes is passed as input to the DNN for further prediction. Since the inference about the input data instances is drawn at the DNN through the GR model, the model is named the hybrid generative regression-based deep intelligence network (HGRDIN).

Findings

The credibility of the implemented approach is rigorously validated using various parameters such as accuracy, precision, recall, F score and area under the curve (AUC) score. During the training phase, the proposed algorithm is constantly regularized using the elastic net regularization technique and also hyper-tuned using the various parameters such as momentum and learning rate to minimize the misprediction rate. The experimental results illustrate that the proposed approach predicted the chronic disease with a minimal error by avoiding the possible overfitting and local minima problems. The result obtained with the proposed approach is also compared with the various traditional approaches.
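The validation measures listed above have standard definitions for binary classification, sketched below from the confusion-matrix counts. The function name, the choice of `1` as the positive class and the sample predictions are assumptions for illustration; AUC is omitted because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Hypothetical labels: 1 = disease present, 0 = disease absent.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

Here there are three true positives, one false positive, one false negative and three true negatives, so all four measures come out at 0.75.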

Research limitations/implications

Diagnostic data are usually multi-dimensional in nature, so the performance of the ML algorithm degrades owing to overfitting and the curse of dimensionality. The experiment achieved an average accuracy of 95%. Hence, further analysis can be made to improve predictive accuracy by overcoming the curse of dimensionality.

Practical implications

The proposed ML model can mimic the behavior of the doctor's brain. These algorithms have the capability to replace clinical tasks. The accurate result obtained through the innovative algorithms can free the physician from the mundane care and practices so that the physician can focus more on the complex issues.

Social implications

Utilizing the proposed predictive model at the decision-making level for the early prediction of the disease is considered as a promising change towards the healthcare sector. The global burden of chronic disease can be reduced at an exceptional level through these approaches.

Originality/value

In the proposed HGRDIN model, a transfer learning approach is used in which the knowledge acquired through the GR process is applied to the DNN, identifying the possible relationship between the dependent and independent feature variables by mapping the chronic data instances to their corresponding target class before they are passed as input to the DNN. The results of the experiments illustrate that the proposed approach obtains superior performance to the existing conventional techniques in terms of various validation parameters.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 15 no. 1

Type: Research Article

ISSN: 1756-378X

Article

Publication date: 17 June 2008

RHC method for application to BMI‐based systems

Tohru Kawabe

Abstract

Purpose

The purpose of this paper is to present research on control methods for man‐machine systems with a brain machine interface (BMI). A concrete target system is, for instance, a car cruising system.

Design/methodology/approach

An improved receding horizon control (RHC) method for sampled‐data systems is used together with an adaptive digital‐to‐analog (DA) converter that switches its sampling functions according to the system status. A feature selection method based on kernel support vector machines with backward stepwise selection is also used for the BMI signals.
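Backward stepwise selection, mentioned above, starts from the full feature set and repeatedly drops the feature whose removal helps (or hurts least) according to some scoring function. The sketch below shows only that generic loop. The scoring function is a made-up stand-in: in the setting described it would be something like the validation accuracy of a kernel SVM on the candidate subset, and the channel names and usefulness values are assumptions.

```python
def backward_stepwise(features, score, min_features=1):
    """Drop features one at a time while the subset score does not decrease."""
    current = list(features)
    best = score(current)
    while len(current) > min_features:
        # Score every subset obtained by removing exactly one feature.
        candidates = [(score([f for f in current if f != drop]), drop) for drop in current]
        candidate_score, drop = max(candidates)
        if candidate_score < best:
            break  # every removal makes things worse, so stop
        best = candidate_score
        current.remove(drop)
    return current, best


# Hypothetical stand-in for a classifier-based score: summed usefulness
# of the kept signals minus a small cost per feature.
usefulness = {"ch_a": 0.5, "ch_b": 0.4, "ch_c": 0.0, "ch_d": 0.05}


def toy_score(subset):
    return sum(usefulness[f] for f in subset) - 0.1 * len(subset)


selected, final_score = backward_stepwise(list(usefulness), toy_score)
```

With this toy score the two weak channels are removed in turn and the loop stops once dropping either remaining channel would lower the score.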

Findings

This paper proposes a new, improved RHC method with an adaptive DA converter for application to BMI‐based systems. Simulations show that the proposed method is useful and effective for systems in which switching of control laws is indispensable.

Research limitations/implications

Although the proposed method is effective for BMI‐based systems with switching of control laws, a faster RHC algorithm will be needed to apply it to man‐machine systems with the BMI in practical use.

Practical implications

The basic concept or framework of the proposed method can be used for real man‐machine systems with the BMI, for example, car cruising systems and wheelchair systems.

Originality/value

The paper contributes to the development of a new, effective control method for BMI‐based man‐machine systems.

Details

Kybernetes, vol. 37 no. 5

Type: Research Article

ISSN: 0368-492X

Article

Publication date: 1 September 2021

Predicting user personality with social interactions in Weibo

Yuting Jiang, Shengli Deng, Hongxiu Li and Yong Liu

Abstract

Purpose

The purposes of this paper are to (1) explore how personality traits pertaining to the dominance influence steadiness compliance model manifest themselves in terms of user interaction behavior on social media and (2) examine whether social interaction data on social media platforms can predict user personality.

Design/methodology/approach

Social interaction data was collected from 198 users of Sina Weibo, a popular social media platform in China. Their personality traits were also measured via questionnaire. Machine learning techniques were applied to predict the personality traits based on the social interaction data.

Findings

The results demonstrated that the proposed classifiers had high prediction accuracy, indicating that our approach is reliable and can be used with social interaction data on social media platforms to predict user personality. “Reposting,” “being reposted,” “commenting” and “being commented on” were found to be the key interaction features that reflected Weibo users' personalities, whereas “liking” was not found to be a key feature.

Originality/value

The findings of this study are expected to enrich personality prediction research based on social media data and to provide insights into the potential of employing social media data for the purpose of personality prediction in the context of the Weibo social media platform in China.

Details

Aslib Journal of Information Management, vol. 73 no. 6

Type: Research Article

ISSN: 2050-3806

Article

Publication date: 3 April 2017

A hybrid grey based artificial neural network and C&R tree for project portfolio selection

Farshad Faezy Razi and Seyed Hooman Shariat

Abstract

Purpose

The purpose of this paper is twofold: the selection of project portfolios through hybrid artificial neural network algorithms, feature selection based on grey relational analysis, decision tree and regression; and the identification of the features affecting project portfolio selection using the artificial neural network algorithm, decision tree and regression. The authors also aim to classify the available options using the decision tree algorithm.

Design/methodology/approach

In order to achieve the research goals, a project-oriented organization was selected and studied. In all, 49 project management indicators were chosen from A Guide to the Project Management Body of Knowledge (PMBOK Guide), and the most important indicators were identified using a feature selection algorithm and decision tree. After the extraction of rules, decision rule-based multi-criteria decision making matrices were produced. Each matrix was ranked through grey relational analysis, similarity to ideal solution method and multi-criteria optimization. Finally, a model for choosing the best ranking method was designed and implemented using the genetic algorithm. To analyze the responses, stability of the classes was investigated.
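Grey relational analysis, one of the ranking methods named above, scores each alternative by how closely its normalised criteria follow an ideal reference sequence. The sketch below uses the textbook form with equal criterion weights, larger-is-better criteria and a distinguishing coefficient of 0.5; the paper's neural-network-derived weights are not reproduced, each criterion is assumed to vary across alternatives, and the project scores are invented.

```python
def grey_relational_grades(matrix, rho=0.5):
    """Grey relational grade of each row against the ideal (all-best) reference."""
    columns = list(zip(*matrix))
    # Min-max normalise each criterion so that 1.0 is the best observed value.
    normalised = [
        [(value - min(col)) / (max(col) - min(col)) for value, col in zip(row, columns)]
        for row in matrix
    ]
    # Absolute deviation from the reference sequence (1.0 on every criterion).
    deltas = [[1.0 - value for value in row] for row in normalised]
    d_min = min(min(row) for row in deltas)
    d_max = max(max(row) for row in deltas)
    grades = []
    for row in deltas:
        coefficients = [(d_min + rho * d_max) / (d + rho * d_max) for d in row]
        grades.append(sum(coefficients) / len(coefficients))
    return grades


# Hypothetical scores of three projects on two larger-is-better criteria.
projects = [
    [9, 8],  # project A
    [6, 7],  # project B
    [3, 2],  # project C
]
grades = grey_relational_grades(projects)
```

A higher grade means the project lies closer to the ideal on all criteria, so sorting by grade gives the ranking.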

Findings

The results showed that projects ranked based on neural network weights by the grey relational analysis method prove to be better options for the selection of a project portfolio. The process of identification of the features affecting project portfolio selection resulted in the following factors: scope management, project charter, project management plan, stakeholders and risk.

Originality/value

This study presents the most effective features affecting project portfolio selection which is highly impressive in organizational decision making and must be considered seriously. Deploying sensitivity analysis, which is an innovation in such studies, played a constructive role in examining the accuracy and reliability of the proposed models, and it can be firmly argued that the results have had an important role in validating the findings of this study.

Details

Benchmarking: An International Journal, vol. 24 no. 3

Type: Research Article

ISSN: 1463-5771

Article

Publication date: 28 February 2019

Improving the prediction accuracy in blended learning environment using synthetic minority oversampling technique

Gabrijela Dimic, Dejan Rancic, Nemanja Macek, Petar Spalevic and Vida Drasute

Abstract

Purpose

This paper aims to deal with the previously unknown prediction accuracy of students’ activity pattern in a blended learning environment.

Design/methodology/approach

To extract the most relevant activity feature subset, different feature-selection methods were applied. For different cardinality subsets, classification models were used in the comparison.

Findings

The experimental evaluation contradicts the hypothesis that reducing feature vector dimensionality increases prediction accuracy.

Research limitations/implications

The improvement in prediction accuracy in the described learning environment was based on applying the synthetic minority oversampling technique, which affected the results of the correlation-based feature-selection method.
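The core of the synthetic minority oversampling technique is to create new minority-class points by interpolating between an existing minority sample and one of its nearest minority neighbours. The sketch below shows that interpolation step only, under simplifying assumptions: numeric features, Euclidean distance, at least two minority samples, and invented sample points and parameter defaults.

```python
import random
from math import dist


def smote(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority-class tuples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # The k nearest minority neighbours of the chosen sample (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to neighbour
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class samples with two numeric activity features.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote(minority, n_new=6)
```

Because every synthetic point lies on a segment between two real minority samples, oversampling changes the feature correlations in the training set, which is consistent with the effect on correlation-based feature selection noted above.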

Originality/value

The major contributions of the research are the proposed methodology for selecting the optimal low-cardinality subset of students’ activities and the significant improvement in prediction accuracy in a blended learning environment.

Details

Information Discovery and Delivery, vol. 47 no. 2

Type: Research Article

ISSN: 2398-6247

Article

Publication date: 14 November 2016

A user’s personality prediction approach by mining network interaction behaviors on Facebook

Tsung-Yi Chen, Meng-Che Tsai and Yuh-Min Chen

Abstract

Purpose

For an enterprise, it is essential to win as many customers as possible. The key to successfully winning customers is often determined by understanding the personality characteristics of the object of communication in order to employ an effective communication strategy. An enterprise needs to obtain the personality information of target or potential customers. However, the traditional method for personality evaluation is extremely costly in terms of time and labor, and it cannot acquire customer personality information without their awareness. Therefore, the manner in which to effectively conduct automated personality predictions for a large number of objects is an important issue. The paper aims to discuss these issues.

Design/methodology/approach

The diverse social media that have emerged in recent years represent a digital platform on which users can publicly deliver speeches and interact with others. Thus, social media may be able to serve the needs of automated personality predictions. Based on user data of Facebook, the main social media platform around the world, this research developed a method for predicting personality types based on interaction logs.

Findings

Experimental results show that the Naïve Bayes classification algorithm combined with a feature selection algorithm produces the best performance for predicting personality types, with 70-80 percent accuracy.
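To make the classifier named above concrete, the following is a minimal categorical Naïve Bayes sketch with Laplace smoothing. It is an illustration only, not the study's pipeline: the two interaction features (posting level, commenting level), the two personality labels and the data are assumptions, and no feature selection step is included.

```python
from collections import Counter, defaultdict
from math import log


def train_naive_bayes(rows, labels, alpha=1.0):
    """Fit a categorical Naive Bayes model and return a predict function."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (label, feature index) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(label, i)][value] += 1
            values_seen[i].add(value)

    def predict(row):
        scores = {}
        for label, count in class_counts.items():
            score = log(count / len(labels))  # log prior
            for i, value in enumerate(row):
                # Laplace-smoothed conditional probability of this value.
                numerator = value_counts[(label, i)][value] + alpha
                denominator = count + alpha * len(values_seen[i])
                score += log(numerator / denominator)
            scores[label] = score
        return max(scores, key=scores.get)

    return predict


# Hypothetical interaction logs: (posting level, commenting level).
rows = [
    ("high", "high"), ("high", "low"), ("high", "high"),
    ("low", "low"), ("low", "high"), ("low", "low"),
]
labels = ["dominance"] * 3 + ["compliance"] * 3
predict = train_naive_bayes(rows, labels)
```

Smoothing keeps a feature value that was never seen with a class from forcing that class's probability to zero.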

Research limitations/implications

In this research, the dominance, inducement, submission, and compliance (DISC) theory was used to determine personality types. Some specific limitations were encountered. As Facebook was used as the main data source, it was necessary to obtain related data via Facebook’s API (FB API). However, the data types accessible via FB API are very limited.

Practical implications

This research serves to build a universal model for social media interaction, and can be used to propose an efficient method for designing interaction features.

Originality/value

This research has developed an approach for automatically predicting the personality types of network users based on their Facebook interactions.

Details

Online Information Review, vol. 40 no. 7

Type: Research Article

ISSN: 1468-4527