Shrawan Kumar Trivedi, Shubhamoy Dey and Anil Kumar
Abstract
Purpose
Sentiment analysis and opinion mining are emerging areas of research for analyzing Web data and capturing users’ sentiments. This research aims to present sentiment analysis of an Indian movie review corpus using natural language processing and various machine learning classifiers.
Design/methodology/approach
In this paper, a comparative study between three machine learning classifiers (Bayesian, naïve Bayesian and support vector machine [SVM]) was performed. All the classifiers were trained on words/features extracted from the corpus using five different feature selection algorithms (Chi-square, info-gain, gain ratio, one-R and relief-F [RF] attributes), and a comparative study was performed between them. The classifiers and feature selection approaches were evaluated using different metrics (F-value, false-positive [FP] rate and training time).
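As an illustration of one of the feature selection approaches named above, the following is a minimal sketch of Chi-square term ranking for a two-class review corpus. It is not the authors' implementation: the function names, the "pos"/"neg" label strings and the toy reviews are all assumptions made for the example.

```python
from collections import Counter


def chi_square(n11, n10, n01, n00):
    """Chi-square statistic for a 2x2 term/class contingency table.

    n11: positive docs containing the term, n10: negative docs containing it,
    n01: positive docs without the term,   n00: negative docs without it.
    """
    n = n11 + n10 + n01 + n00
    denom = (n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00)
    if denom == 0:
        return 0.0
    return n * (n11 * n00 - n10 * n01) ** 2 / denom


def rank_terms(docs, labels):
    """Rank vocabulary terms by Chi-square score, highest first."""
    pos_total = sum(1 for y in labels if y == "pos")
    neg_total = len(labels) - pos_total
    pos_df, neg_df = Counter(), Counter()
    for text, y in zip(docs, labels):
        # Count document frequency: each term once per document.
        target = pos_df if y == "pos" else neg_df
        target.update(set(text.lower().split()))
    scores = {}
    for term in set(pos_df) | set(neg_df):
        n11, n10 = pos_df[term], neg_df[term]
        scores[term] = chi_square(n11, n10, pos_total - n11, neg_total - n10)
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical toy reviews, not taken from the paper's corpus.
docs = ["great story great acting", "boring plot",
        "great songs", "boring and slow story"]
labels = ["pos", "neg", "pos", "neg"]
ranking = rank_terms(docs, labels)
```

Terms that occur in only one class ("great", "boring") score highest, while a term spread evenly across both classes ("story") scores zero; keeping only the top-ranked terms is what reduces the feature set before training a classifier.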
Findings
The results of this study show that, for the maximum number of features, the RF feature selection approach was found to be the best, with better F-values, a low FP rate and less time needed to train the classifiers, whereas for the least number of features, one-R was better than RF. When the evaluation was performed for machine learning classifiers, SVM was found to be superior, although the Bayesian classifier was comparable with SVM.
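The F-value and FP rate used to compare the classifiers can be computed from confusion-matrix counts. The sketch below uses the standard definitions; the counts are hypothetical and are not results from the paper.

```python
def f_value(tp, fp, fn):
    """Harmonic mean of precision and recall (F1)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def fp_rate(fp, tn):
    """Share of actual negatives that were wrongly labelled positive."""
    return fp / (fp + tn) if fp + tn else 0.0


# Hypothetical counts for illustration only.
tp, fp, fn, tn = 80, 10, 20, 90
f_score = f_value(tp, fp, fn)
false_positive_rate = fp_rate(fp, tn)
```

A classifier is preferred when its F-value is high and its FP rate is low, which is the sense in which the abstract describes one approach as "better" than another.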
Originality/value
This is a novel study in which Indian review data were collected and a classification model for sentiment polarity (positive/negative) was then constructed.
Abstract
The strategic management literature emphasizes the concept of business intelligence (BI) as an essential competitive tool. Yet the sustainability of the firms’ competitive advantage provided by BI capability is not well researched. To fill this gap, this study attempts to develop a model for successful BI deployment and empirically examines the association between BI deployment and sustainable competitive advantage. Taking the telecommunications industry in Malaysia as a case example, the research focuses particularly on the perceptions held by telecommunications decision makers and executives of the factors that influence successful BI deployment. The research further investigates the relationship between successful BI deployment and the sustainable competitive advantage of telecommunications organizations. Another important aim of this study is to determine the effect of moderating factors such as organization culture, business strategy, and use of BI tools on BI deployment and the sustainability of the firm’s competitive advantage.
This research uses a combination of resource-based theory and diffusion of innovation (DOI) theory to examine BI success and its relationship with the firm’s sustainability. The research adopts the positivist paradigm, and a two-phase sequential mixed method consisting of qualitative and quantitative approaches is employed. A tentative research model is first developed on the basis of an extensive literature review. The chapter presents a qualitative field study to fine-tune the initial research model. Findings from the qualitative method are also used to develop measures and instruments for the next, quantitative phase. The study includes a survey of a sample of business analysts and decision makers in telecommunications firms, which is analyzed by partial least squares-based structural equation modeling.
The findings reveal that some internal resources of the organizations, such as BI governance and the perceptions of BI’s characteristics, influence the successful deployment of BI. Organizations that practice good BI governance with strong moral and financial support from upper management have an opportunity to realize the dream of having successful BI initiatives in place. The scope of BI governance includes providing sufficient support and commitment in BI funding and implementation, laying out proper BI infrastructure and staffing, and establishing corporate-wide policy and procedures regarding BI. The perceptions about the characteristics of BI, such as its relative advantage, complexity, compatibility, and observability, are also significant in ensuring BI success. The most important results of this study indicate that, with BI successfully deployed, executives would use the knowledge provided for their necessary actions in sustaining the organizations’ competitive advantage in terms of economic, social, and environmental issues.
This study contributes significantly to the existing literature and will assist future BI researchers, especially those studying how to achieve sustainable competitive advantage. In particular, the model will help practitioners to identify the resources that they are likely to consider when deploying BI. Finally, the applications of this study can be extended through further adaptation in other industries and various geographic contexts.
Abstract
Purpose
Data science lacks a distinctive identity and a theory-informed approach, both for its own sake and to properly be applied conjointly to the social sciences. This paper’s purposes are twofold: to provide (1) data science an illustration of theory adoption, able to address explanation and support prediction/prescription capacities and (2) a rationale for identification of the key phenomena and properties of data science so that the data speak through a contextual understanding of reality, broader than has been usual.
Design/methodology/approach
A literature review and a derived conceptual research model for a push–pull approach (adapted for a data science study in the management field) are presented. A real location–allocation problem is solved through a specific algorithm and explained in the light of the adapted push–pull theory, serving as an instance for a data science theory-informed application in the management field.
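For readers unfamiliar with location–allocation problems, the sketch below shows a generic textbook formulation: choose the candidate site with the lowest demand-weighted distance, and allocate each demand point to its nearest open site. It is not the paper's push–pull algorithm, which the abstract does not specify; the coordinates, weights and function names are assumptions for illustration.

```python
from math import hypot


def total_cost(site, demand):
    """Sum of demand-weighted Euclidean distances from one site."""
    return sum(w * hypot(site[0] - x, site[1] - y) for (x, y, w) in demand)


def best_site(candidates, demand):
    """Location step: pick the candidate with the lowest total cost."""
    return min(candidates, key=lambda s: total_cost(s, demand))


def allocate(sites, demand):
    """Allocation step: assign each demand point to its nearest open site."""
    assignment = {}
    for (x, y, _w) in demand:
        nearest = min(sites, key=lambda s: hypot(s[0] - x, s[1] - y))
        assignment[(x, y)] = nearest
    return assignment


# Hypothetical demand points as (x, y, weight) and candidate sites as (x, y).
demand = [(0, 0, 1), (4, 0, 1), (0, 3, 2)]
candidates = [(0, 0), (4, 0), (0, 3)]

chosen = best_site(candidates, demand)
assignment = allocate([(0, 0), (4, 0)], demand)
```

In this toy case the heavier demand at (0, 3) pulls the single best site towards it; in push–pull terms the weights play the role of force degrees, although how the paper quantifies those forces is not stated in the abstract.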
Findings
This study advances knowledge on the definition of data science key phenomena as not just pure “data”, but interrelated data and dataset properties, as well as on the specific adaptation of the push–pull theory through its definition, dimensionality and interaction model, also illustrating how to apply the theory in theory-informed data science research. The proposed model contributes to the theoretical strengthening of data science, still an incipient area, and the solution of the location–allocation problem suggests the applicability of the proposed approach to broad data science problems, alleviating the criticism of the lack of explanation and the focus on pattern recognition in data science practice and research.
Research limitations/implications
The proposed algorithm requires the previous definition of a perimeter of interest. This aspect should be characterised as an antecedent to the model, which is a strong assumption. As for prescription, in this specific case, one has to take complementary actions, since theory, model and algorithm are not detached from in loco visits, market research or interviews with potential stakeholders.
Practical implications
This study offers a conceptual model for practical location–allocation problem analyses, based on the push–pull theoretical components. It therefore suggests a proper definition for each component (the object, the perspective, the forces, their degrees and the nature of the movement). The proposed model also has an algorithm for computational implementation, which visually describes and explains the interaction of the components, allowing further simulation (with estimated force degrees) for prediction.
Originality/value
First, this study identifies an overlap of push–pull theoretical approaches, which suggests theory adoption eventually as mere common sense, weakening further theoretical development. Second, this study elaborates a definition for the push–pull theory, a dimensionality and a relationship between its components. Third, a typical location–allocation problem is analysed in the light of the refactored theory, showing its adequacy for that class of problems. And fourth, this study suggests that the essence of a data science should be the study of contextual relationships among data, and that the context should be provided by the spatial, temporal, political, economic and social analytical interests.
Abstract
In the two years since the last Farnborough Air Show was held by the Society of British Aerospace Companies the aircraft industry has achieved an almost complete metamorphosis from the body blows in the form of major programme cancellations that almost felled it in 1965 to the very healthy position that it holds today.
Abstract
Mr S. S. Hall has been appointed General Divisional Manager of the Aerospace Division of the Plessey Dynamics Group, following the division of the Plessey Dynamics Group activities into separate aerospace and industrial businesses.
Abstract
Breast cancer (BC) is one of the leading cancers in the world. It is a malignant tumor, and the risk also extends to middle-aged women. However, identifying BC at an early stage can save most of these women’s lives. Taking advantage of advances in technology, this research uses the machine learning (ML) algorithm Random Forest to rank features, and the supervised classifiers Support Vector Machine (SVM) and Naïve Bayes (NB) to select the best optimized features and to predict BC. Prediction accuracy was estimated on the Wisconsin Breast Cancer dataset from the University of California Irvine (UCI) ML repository. All these operations were performed using Anaconda, an open-source distribution of Python. The proposed work improved the accuracy of the NB and SVM classifiers. The performance of the proposed model was evaluated using classification accuracy, confusion matrix, mean, standard deviation, variance, and root mean-squared error.
The experimental results show that a 70-30 data split gives the best accuracy. With SVM as the feature optimizer, the 12 best features yield 97.66% accuracy, an improvement of 1.17% after feature reduction. With NB as the feature optimizer, the 17 best features yield 96.49% accuracy, also an improvement of 1.17% after feature reduction.
The study shows that the proposed model works very effectively compared with existing models with respect to accuracy measures.
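The 70-30 split and three of the evaluation measures mentioned above (classification accuracy, confusion matrix and root mean-squared error) can be sketched with the Python standard library alone. This is an illustrative sketch, not the authors' code: the function names, the fixed seed and the toy labels are assumptions, and class labels are assumed to be encoded as 0 and 1.

```python
import random
from math import sqrt


def split_70_30(rows, seed=0):
    """Shuffle the rows and return (train, test) in a 70-30 proportion."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(round(0.7 * len(rows)))
    return rows[:cut], rows[cut:]


def confusion_matrix(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary labels encoded as 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
    return (tp + tn) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean-squared error between true and predicted labels."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


# Hypothetical labels for illustration; not the paper's predictions.
y_true = [1, 1, 0, 0, 1]
y_pred = [1, 0, 0, 0, 1]
train, test = split_70_30(range(10))
```

For 0/1 labels the RMSE is simply the square root of the error rate, so it carries the same information as accuracy on a different scale.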
Sirje Virkus and Emmanouel Garoufallou
Abstract
Purpose
Data science is a relatively new field which has gained considerable attention in recent years. This new field requires a wide range of knowledge and skills from different disciplines including mathematics and statistics, computer science and information science. The purpose of this paper is to present the results of the study that explored the field of data science from the library and information science (LIS) perspective.
Design/methodology/approach
Analysis of research publications on data science was made on the basis of papers published in the Web of Science database. The following research questions were proposed: What are the main tendencies in publication years, document types, countries of origin, source titles, authors of publications, affiliations of the article authors and the most cited articles related to data science in the field of LIS? What are the main themes discussed in the publications from the LIS perspective?
Findings
The highest contribution to data science comes from the computer science research community. The contribution of the information science and library science community is quite small. However, there has been a continuous increase in articles since 2015. The main document types are journal articles, followed by conference proceedings and editorial material. The top three journals that publish data science papers from the LIS perspective are the Journal of the American Medical Informatics Association, the International Journal of Information Management and the Journal of the Association for Information Science and Technology. The top five publishing countries are the USA, China, England, Australia and India. The most cited article has 112 citations. The analysis revealed that the data science field is quite interdisciplinary by nature. In addition to the field of LIS, the papers belonged to several other research areas. The reviewed articles fell into six broad categories: data science education and training; knowledge and skills of the data professional; the role of libraries and librarians in the data science movement; tools, techniques and applications of data science; data science from the knowledge management perspective; and data science from the perspective of health sciences.
Research limitations/implications
This study analyzed only research papers in the Web of Science database and therefore covers only a portion of the scientific papers published in the field of LIS. In addition, only publications with the term “data science” in the topic area of the Web of Science database were analyzed. Therefore, several relevant studies that are not reflected in the Web of Science database, or that were related to other keywords such as “e-science,” “e-research,” “data service,” “data curation” or “research data management,” are not discussed in this paper.
Originality/value
The field of data science has not previously been explored using bibliographic analysis of publications from the perspective of LIS. This paper helps to better understand the field of data science and the perspectives for information professionals.
Adrian Gepp, Martina K. Linnenluecke, Terrence J. O’Neill and Tom Smith
Abstract
This paper analyses the use of big data techniques in auditing, and finds that the practice is not as widespread as it is in other related fields. We first introduce contemporary big data techniques to promote understanding of their potential application. Next, we review existing research on big data in accounting and finance. In addition to auditing, our analysis shows that existing research extends across three other genealogies: financial distress modelling, financial fraud modelling, and stock market prediction and quantitative modelling. Auditing is lagging behind the other research streams in the use of valuable big data techniques. A possible explanation is that auditors are reluctant to use techniques that are far ahead of those adopted by their clients, but we refute this argument. We call for more research and a greater alignment to practice. We also outline future opportunities for auditing in the context of real-time information and in collaborative platforms and peer-to-peer marketplaces.
E.G. Sieverts, M. Hofstede, G. Lobbestael, B. Oude Groeniger, F. Provost and P. Šikovà
Abstract
In this article, the fifth in a series on microcomputer software for information storage and retrieval, test results of seven programs are presented and various properties and qualities of these programs are discussed. In this instalment of the series we discuss programs for information storage and retrieval which are primarily characterised by the properties of personal information managers (PIMs), hypertext programs, or best match and ranking retrieval systems. The programs reviewed in this issue are the personal information managers 3by5/RediReference, askSam, Dayflo Tracker, and Ize; Personal Librarian uses best match and ranking; the hypertext programs are Folio Views and the HyperKRS/HyperCard combination (askSam, Ize and Personal Librarian boast hypertext features as well). HyperKRS/HyperCard is only available for the Apple Macintosh. All other programs run under MS‐DOS; versions of Personal Librarian also run under Windows and some other systems. For each of the seven programs about 100 facts and test results are tabulated. The programs are also discussed individually.
Abstract
The former Minister of Aviation, Mr Roy Jenkins, announced the appointment of Mr George Chetwynd as a part‐time member of the Board of the British Overseas Airways Corporation for a period of three years. Mr Chetwynd is Director of the North East Development Council and a member of the Economic Planning Council for the Northern Region.