Search results
1 – 6 of 6
Shrawan Kumar Trivedi and Shubhamoy Dey
Abstract
Purpose
Email is a fast and inexpensive medium for sharing information, but unsolicited email (spam) is a constant problem in email communication. The rapid growth of spam creates a need for a reliable and robust spam classifier. This paper aims to present a study of evolutionary classifiers (genetic algorithm [GA] and genetic programming [GP]) with and without the help of an ensemble of classifiers method. In this research, the classifier ensemble has been developed with the adaptive boosting technique.
Design/methodology/approach
Text mining methods are applied to classify spam and legitimate emails. Two data sets (Enron and SpamAssassin) are used to test the classifiers concerned. Initially, pre-processing is performed to extract the features/words from the email files. An informative feature subset is then selected using a greedy stepwise feature subset search method. Using these informative features, a comparative study is performed first among the evolutionary classifiers and then against other popular machine learning classifiers (Bayesian, naive Bayes and support vector machine).
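As an illustration of the greedy stepwise idea mentioned above, the following is a minimal forward-selection sketch. It is not the authors' implementation: the abstract does not state the search direction, the subset merit function or the stopping rule, so the `toy_merit` function, the `WEIGHTS` table and the "stop when no single addition improves the score" rule are all assumptions made for this example.

```python
from typing import Callable, FrozenSet, Iterable, List


def greedy_stepwise(features: Iterable[str],
                    score: Callable[[FrozenSet[str]], float]) -> List[str]:
    """Forward greedy search: repeatedly add the single feature that most
    improves score(subset); stop when no addition improves it."""
    remaining = list(features)
    selected: List[str] = []
    best = score(frozenset())
    while remaining:
        # Score every candidate when added to the current subset.
        candidates = [(score(frozenset(selected + [f])), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda c: c[0])
        if top_score <= best:
            break  # no single addition helps, so the search ends
        selected.append(top_feature)
        remaining.remove(top_feature)
        best = top_score
    return selected


# Hypothetical per-word usefulness values, invented for illustration only.
WEIGHTS = {"free": 3.0, "winner": 2.0, "meeting": 1.5, "the": 0.1}


def toy_merit(subset: FrozenSet[str]) -> float:
    # Assumed merit: reward informative words, penalise subset size.
    return sum(WEIGHTS[f] for f in subset) - 0.5 * len(subset)


chosen = greedy_stepwise(WEIGHTS, toy_merit)
print(chosen)  # -> ['free', 'winner', 'meeting']
```

In practice the merit function would be a subset evaluator computed on training data; the toy version only shows how the search adds one feature at a time and discards the uninformative word.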
Findings
This study reveals that evolutionary algorithms are promising in classification and prediction applications, where genetic programming with adaptive boosting turns out to be not only an accurate classifier but also a sensitive one. Results show that GA initially performs better than GP, but after an ensemble of classifiers is applied (a large number of iterations), GP overtakes GA with significantly higher accuracy. Amongst all classifiers, boosted GP turns out to be good not only in classification accuracy but also in its low false positive (FP) rate, which is considered an important criterion in email spam classification. Greedy stepwise feature search is also found to be an effective method for feature selection in this application domain.
Research limitations/implications
The research implication of this study is a reduction in the cost incurred because of spam/unsolicited bulk email. Email is a fundamental necessity for sharing information among the units of an organization so that it can stay competitive with its business rivals. In addition, providing the best email services to customers is a continual hurdle for internet service providers. Although organizations and internet service providers are continuously adopting novel spam filtering approaches to reduce the number of unwanted emails, the desired effect has not been seen to a significant degree because of the cost of installation, customizability and the threat of misclassifying important emails. This research deals with all the issues and challenges faced by internet service providers and organizations.
Practical implications
In this research, the proposed models have not only provided excellent accuracy and sensitivity with a low FP rate and customizable capability, but have also helped reduce the cost of spam. The same models may also be used for other text mining applications, such as sentiment analysis, blog mining, news mining or other text mining research.
Originality/value
A comparison between GP and GA, with and without an ensemble, is presented in the spam classification application domain.
Shrawan Kumar Trivedi and Shubhamoy Dey
Abstract
Purpose
To be sustainable and competitive in the current business environment, it is useful to understand users’ sentiment towards products and services. This critical task can be achieved via natural language processing and machine learning classifiers. This paper aims to propose a novel probabilistic committee selection classifier (PCC) to analyse and classify the sentiment polarities of movie reviews.
Design/methodology/approach
An Indian movie review corpus is assembled for this study. Another publicly available movie review polarity corpus is also used to validate the results. The greedy stepwise search method is used to extract the features/words of the reviews. The performance of the proposed classifier is measured using different metrics, such as F-measure, false positive rate, receiver operating characteristic (ROC) curve and training time. Further, the proposed classifier is compared with other popular machine-learning classifiers, such as Bayesian, Naïve Bayes, Decision Tree (J48), Support Vector Machine and Random Forest.
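For readers unfamiliar with two of the metrics named above, the sketch below computes the F-measure and the false positive rate from confusion-matrix counts using their standard definitions. The counts in the usage lines are invented for illustration and are not results from the paper.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall (the F1 score)."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def false_positive_rate(fp: int, tn: int) -> float:
    """Share of actual negatives that were wrongly labelled positive."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Made-up counts for a 100-review test set (illustration only).
tp, fp, fn, tn = 40, 10, 10, 40
f1 = f_measure(tp, fp, fn)
fpr = false_positive_rate(fp, tn)
print(round(f1, 3), round(fpr, 3))  # -> 0.8 0.2
```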
Findings
The results of this study show that the proposed classifier is good at predicting the positive or negative polarity of movie reviews. The performance accuracy and ROC curve value of the PCC are found to be the most suitable among all the classifiers tested in this study. The classifier is also found to be efficient at identifying positive sentiments in reviews, giving low false positive rates for both the Indian Movie Review and Review Polarity corpora used in this study. The training time of the proposed classifier is found to be slightly higher than that of Bayesian, Naïve Bayes and J48.
Research limitations/implications
Only movie review sentiments written in English are considered. In addition, the proposed committee selection classifier is prepared only using the committee of probabilistic classifiers; however, other classifier committees can also be built, tested and compared with the present experiment scenario.
Practical implications
In this paper, a novel probabilistic approach is proposed and used for classifying movie reviews, and is found to be highly effective in comparison with other state-of-the-art classifiers. This classifier may be tested for different applications and may provide new insights for developers and researchers.
Social implications
The proposed PCC may be used to classify different product reviews, and hence may be beneficial to organizations to justify users’ reviews about specific products or services. By using authentic positive and negative sentiments of users, the credibility of the specific product, service or event may be enhanced. PCC may also be applied to other applications, such as spam detection, blog mining, news mining and various other data-mining applications.
Originality/value
The constructed PCC is novel and was tested on Indian movie review data.
Shrawan Kumar Trivedi, Shubhamoy Dey and Anil Kumar
Abstract
Purpose
Sentiment analysis and opinion mining are emerging areas of research for analyzing Web data and capturing users’ sentiments. This research aims to present sentiment analysis of an Indian movie review corpus using natural language processing and various machine learning classifiers.
Design/methodology/approach
In this paper, a comparative study between three machine learning classifiers (Bayesian, naïve Bayesian and support vector machine [SVM]) was performed. All the classifiers were trained on the words/features of the corpus extracted, using five different feature selection algorithms (Chi-square, info-gain, gain ratio, one-R and relief-F [RF] attributes), and a comparative study was performed between them. The classifiers and feature selection approaches were evaluated using different metrics (F-value, false-positive [FP] rate and training time).
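To make one of the listed feature selection criteria concrete, here is a small sketch of the standard chi-square statistic for a single term against a binary class, computed from a 2×2 contingency table. The term names and document counts are invented for this example; the abstract does not report the tooling or the counts actually used.

```python
def chi_square(a: int, b: int, c: int, d: int) -> float:
    """Chi-square score of a term for a two-class problem.

    a: positive reviews containing the term
    b: negative reviews containing the term
    c: positive reviews without the term
    d: negative reviews without the term
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: the term carries no usable signal
    return n * (a * d - b * c) ** 2 / denom


# Hypothetical counts over 50 positive and 50 negative reviews.
counts = {
    "excellent": (30, 10, 20, 40),  # skewed towards positive reviews
    "movie": (25, 25, 25, 25),      # evenly spread across both classes
}
scores = {term: chi_square(*table) for term, table in counts.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)  # -> ['excellent', 'movie']
```

A higher score means the term's presence depends more strongly on the class, so the top-ranked terms would be kept as features.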
Findings
The results of this study show that, for the maximum number of features, the RF feature selection approach was found to be the best, with better F-values, a low FP rate and less time needed to train the classifiers, whereas for the least number of features, one-R was better than RF. When the evaluation was performed for machine learning classifiers, SVM was found to be superior, although the Bayesian classifier was comparable with SVM.
Originality/value
This is a novel study in which Indian review data were collected and a classification model for sentiment polarity (positive/negative) was then constructed.
Khadija Ali Vakeel, K. Sivakumar, K.R. Jayasimha and Shubhamoy Dey
Abstract
Purpose
The purpose of this paper is to focus on failures in online flash sales (OFS) and to explore why consumers participate in an OFS even after experiencing service failure. It also examines the role of deal proneness, attribution, and emotions.
Design/methodology/approach
Using a mixed method approach to gain insights into this relatively unexplored phenomenon of OFS, this research uses netnography followed by a survey study.
Findings
The findings show that deal-prone customers tend to ignore service failures during OFS and re-participate in the future. In the context of OFS, failures attributed to internal locus of attribution (LOA) also have a negative effect on re-participation compared with failures attributed to external LOA. Furthermore, there is a three-way interaction among deal proneness, LOA, and past emotions. The results show that negative past emotions further exacerbate the impact of attribution on the link between deal proneness and re-participation.
Originality/value
In contrast with prior research, the paper shows that consumers participate even after service failure. The proposed difference between customers who experience different LOA and past emotions offers insights into their behavior after service failure in a new context of an online/electronic commerce event – flash sales. This paper specifically explores the role of internal LOA and finds that it has a more negative impact than external LOA on re-participation.
Shrawan Kumar Trivedi and Shubhamoy Dey
Abstract
Purpose
Email is an important medium for sharing information rapidly. However, spam, being a nuisance in such communication, motivates the building of a robust filtering system with high classification accuracy and good sensitivity towards false positives. In that context, this paper aims to present a combined classifier technique using a committee selection mechanism, where the main objective is to identify a set of classifiers whose individual decisions can be combined by a committee selection procedure for accurate detection of spam.
Design/methodology/approach
For training and testing of the relevant machine learning classifiers, text mining approaches are used in this research. Three data sets (Enron, SpamAssassin and LingSpam) have been used to test the classifiers. Initially, pre-processing is performed to extract the features associated with the email files. In the next step, the extracted features are taken through a dimensionality reduction method where non-informative features are removed. Subsequently, an informative feature subset is selected using genetic feature search. Thereafter, the proposed classifiers are tested on those informative features and the results compared with those of other classifiers.
Findings
For building the proposed combined classifier, three different studies have been performed. The first study identifies the effect of boosting algorithms on two probabilistic classifiers: Bayesian and Naïve Bayes. In that study, AdaBoost has been found to be the best algorithm for performance boosting. The second study was on the effect of different kernel functions on the support vector machine (SVM) classifier, where SVM with the normalized polynomial (NP) kernel was observed to be the best. The last study was on combining classifiers with committee selection, where the committee members were the best classifiers identified by the first study, i.e. Bayesian and Naïve Bayes with AdaBoost, and the committee president was selected from the second study, i.e. SVM with the NP kernel. Results show that combining the identified classifiers to form a committee machine gives excellent performance accuracy with a low false positive rate.
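The abstract does not spell out how the committee members and the president interact, so the following is only one plausible reading, sketched under a stated assumption: when all members agree, their shared decision is used, and when they disagree, the president decides. The keyword-based stand-in classifiers are hypothetical placeholders for the boosted Bayesian, boosted Naïve Bayes and SVM models named above.

```python
from typing import Callable, Sequence

# A classifier maps a message to 1 (spam) or 0 (legitimate).
Classifier = Callable[[str], int]


def committee_decision(members: Sequence[Classifier],
                       president: Classifier,
                       message: str) -> int:
    """Assumed rule: unanimous members decide; otherwise the president does."""
    votes = [member(message) for member in members]
    if votes and all(v == votes[0] for v in votes):
        return votes[0]
    return president(message)


# Hypothetical stand-ins for trained models (illustration only).
def member_a(text: str) -> int:
    return int("free" in text.lower())


def member_b(text: str) -> int:
    return int("winner" in text.lower())


def president(text: str) -> int:
    return int(text.count("!") >= 2)


members = [member_a, member_b]
print(committee_decision(members, president, "Free prize, winner!"))        # -> 1
print(committee_decision(members, president, "Free lunch at the meeting"))  # -> 0
print(committee_decision(members, president, "You are a winner!!"))         # -> 1
```

Other combination rules, such as weighted voting or letting the president break ties only, would fit the same description; the sketch fixes one choice purely to show the control flow.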
Research limitations/implications
This research is focused on the classification of email spam written in the English language. Only the body (text) parts of the emails have been used. Image spam has not been included in this work. We have restricted our work to email messages only. No other types of messages, such as short message service or multimedia messaging service messages, were part of this study.
Practical implications
This research proposes a method of dealing with the issues and challenges faced by internet service providers and organizations that use email. The proposed model provides not only better classification accuracy but also a low false positive rate.
Originality/value
The proposed combined classifier is a novel classifier designed for accurate classification of email spam.
Malgorzata Zieba, Susanne Durst and Christoph Hinteregger
Abstract
Purpose
The purpose of this study is to examine the effect of knowledge risk management (KRM) on organizational sustainability and the role of innovativeness and agility in this relationship.
Design/methodology/approach
The study presents the results of a quantitative survey performed among 179 professionals from knowledge-intensive organizations dealing with knowledge risks and their management in organizations. Data included in this study are from both private and public organizations located all over the world and were collected through an online survey.
Findings
The results have confirmed that innovativeness and agility positively impact the sustainability of organizations; agility also positively impacts organizational innovativeness. The partial influence of KRM on both innovativeness and agility of organizations has been confirmed as well.
Research limitations/implications
The paper findings contribute in different ways to the ongoing debates in the literature. First, they contribute to the general study of risk management by showing empirically its role in organizations in the given case of organizational sustainability. Second, by emphasizing the risks related to knowledge, this study contributes to emerging efforts highlighting the particular role of knowledge for sustained organizational development. Third, by linking KRM and organizational sustainability, this paper contributes empirically to building knowledge in this very recent field of study. This understanding is also useful for future development in the field of KM as a whole.
Originality/value
The paper lays the ground for both a deeper and more nuanced understanding of knowledge risks in organizations in general and regarding sustainability in particular. As such, the paper offers new food for thought for researchers dealing with the topics of knowledge risks, knowledge management and organizational risk management in general.