Search results | Emerald Insight

Open Access

Article

Publication date: 14 July 2022

Predicting sentiment and rating of tourist reviews using machine learning

Downloads: 8957

Abstract

Purpose

As the tourism industry becomes more vital for the success of many economies around the world, the importance of technology in tourism grows daily. As tourism grows in importance and popularity, the amount of significant data grows, too. On a daily basis, millions of people write their opinions, suggestions and views about accommodation, services, and much more on various websites. Well-processed and filtered data can provide a lot of useful information that can be used to improve tourists' experiences and to help them choose a hotel or a restaurant. Thus, the purpose of this study is to explore machine and deep learning models for predicting sentiment and rating from tourist reviews.

Design/methodology/approach

This paper used machine learning models such as Naïve Bayes, support vector machines (SVM), convolutional neural network (CNN), long short-term memory (LSTM) and bidirectional long short-term memory (BiLSTM) for extracting sentiment and ratings from tourist reviews. These models were trained to classify reviews into positive, negative, or neutral sentiment, and into one to five grades or stars. Data used for training the models were gathered from TripAdvisor, the world's largest travel platform. The models based on multinomial Naïve Bayes (MNB) and SVM were trained using the term frequency-inverse document frequency (TF-IDF) for word representations while deep learning models were trained using global vectors (GloVe) for word representation. The results from testing these models are presented, compared and discussed.
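As a rough illustration of the TF-IDF word representation mentioned above, the following sketch computes weights for a few made-up reviews. The weighting variant (term count divided by document length, times the natural log of N/df), the function name `tf_idf` and the sample reviews are assumptions for illustration; the abstract does not state which variant or tooling the authors used, and libraries commonly apply smoothed formulas.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per document.

    Assumed variant: tf = count / document length, idf = ln(N / df).
    This is an illustrative choice, not the paper's stated formula.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical reviews, not taken from the TripAdvisor dataset.
reviews = [
    "great hotel great staff",
    "dirty room rude staff",
    "great location",
]
vectors = tf_idf(reviews)
```

Terms that occur in fewer documents (such as "hotel" here) receive a larger idf factor than terms shared across reviews (such as "great"), which is what makes the representation useful as input to classifiers such as MNB or SVM.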

Findings

The machine and deep learning models achieved high accuracy in predicting positive, negative, or neutral sentiments and ratings from tourist reviews. The optimal model architecture for both classification tasks was a deep learning model based on BiLSTM. The study’s results confirmed that deep learning models are more efficient and accurate than the other machine learning algorithms tested.

Practical implications

The proposed models allow for forecasting the number of tourist arrivals and expenditure, gaining insights into the tourists' profiles, improving overall customer experience, and upgrading marketing strategies. Different service sectors can use the implemented models to get insights into customer satisfaction with the products and services as well as to predict the opinions given a particular context.

Originality/value

This study developed and compared different machine learning models for classifying customer reviews as positive, negative, or neutral, as well as predicting ratings with one to five stars based on a TripAdvisor hotel reviews dataset that contains 20,491 unique hotel reviews.

Details

Journal of Hospitality and Tourism Insights, vol. 6 no. 3

Type: Research Article

ISSN: 2514-9792

Open Access

Article

Publication date: 16 July 2024

Using predictive methods to assess observation and measure importance

William M. Briggs

Downloads: 128

Abstract

Purpose

This study aims to find suitable replacements for hypothesis testing and variable-importance measures.

Design/methodology/approach

This study explores under-used predictive methods.

Findings

The study finds that hypothesis testing can and should be replaced by predictive methods. Prediction is the only way to know if models have any value.

Originality/value

This is the first time predictive methods have been used to demonstrate measure and variable importance. Hypothesis testing can never prove the goodness of models. Only predictive methods can.

Details

Asian Journal of Economics and Banking, vol. 8 no. 3

Type: Research Article

ISSN: 2615-9821

Open Access

Article

Publication date: 3 July 2017

On predicting academic performance with process mining in learning analytics

Rahila Umer, Teo Susnjak, Anuradha Mathrani and Suriadi Suriadi

Downloads: 6592

Abstract

Purpose

The purpose of this paper is to propose a process mining approach to help in making early predictions to improve students’ learning experience in massive open online courses (MOOCs). It investigates the impact of various machine learning techniques in combination with process mining features to measure effectiveness of these techniques.

Design/methodology/approach

Students’ data (e.g. assessment grades, demographic information) and weekly interaction data based on event logs (e.g. video lecture interaction, solution submission time, time spent weekly) have guided this design. This study evaluates four machine learning classification techniques used in the literature (logistic regression (LR), Naïve Bayes (NB), random forest (RF) and K-nearest neighbor) to monitor weekly progression of students’ performance and to predict their overall performance outcome. Two data sets have been used: one with traditional features and a second with features obtained from process conformance testing.

Findings

The results show that the techniques used in the study are able to make predictions on the performance of students. Overall accuracy (F1-score, area under curve) of machine learning techniques can be improved by integrating process mining features with standard features. Specifically, the LR and NB classifiers outperform the other techniques in a statistically significant way.

Practical implications

Although MOOCs provide a platform for learning in a highly scalable and flexible manner, they are prone to early dropout and low completion rates. This study outlines a data-driven approach to improve students’ learning experience and decrease the dropout rate.

Social implications

Early predictions based on an individual’s participation can help educators provide support to students who are struggling in the course.

Originality/value

This study outlines the innovative use of process mining techniques in education data mining to help educators gather data-driven insight on student performances in the enrolled courses.

Details

Journal of Research in Innovative Teaching & Learning, vol. 10 no. 2

Type: Research Article

ISSN: 2397-7604

Open Access

Article

Publication date: 22 June 2023

A partial solution for the replication crisis in economics

William M. Briggs

Downloads: 1799

Abstract

Purpose

Important research once thought unassailable has failed to replicate. Not just in economics, but in all science. The problem is therefore not in dispute nor are some of the causes, like low power, selective reporting, the file drawer effect, publicly unavailable data and so forth. Some partially worthy solutions have already been offered, like pre-registering hypotheses and data analysis plans.

Design/methodology/approach

This is a review paper on the replication crisis, which is by now very well known.

Findings

This study offers another partial solution, which is to remind researchers that correlation does not logically imply causation. The effect of this reminder is to eschew “significance” testing, whether in frequentist or Bayesian form (like Bayes factors) and to report models in predictive form, so that anybody can check the veracity of any model. In effect, all papers could undergo replication testing.

Originality/value

The author argues that this, or any solution, will never eliminate all errors.

Details

Asian Journal of Economics and Banking, vol. 7 no. 2

Type: Research Article

ISSN: 2615-9821

Open Access

Article

Publication date: 12 June 2017

Using a naive Bayesian classifier methodology for loan risk assessment: Evidence from a Tunisian commercial bank

Aida Krichene

Downloads: 7302

Abstract

Purpose

Loan default risk or credit risk evaluation is important to financial institutions which provide loans to businesses and individuals. Loans carry the risk of being defaulted. To understand the risk levels of credit users (corporations and individuals), credit providers (bankers) normally collect vast amounts of information on borrowers. Statistical predictive analytic techniques can be used to analyse or to determine the risk levels involved in loans. This paper aims to address the question of default prediction of short-term loans for a Tunisian commercial bank.

Design/methodology/approach

The authors used a database of 924 credit files for loans granted to Tunisian industrial companies by a commercial bank in 2003, 2004, 2005 and 2006. The naive Bayesian classifier algorithm was applied, and the results show a correct classification rate of about 63.85 per cent. The default probability is explained by the variables measuring working capital, leverage, solvency, profitability and cash flow indicators.

Findings

The results of the validation test show a correct classification rate of about 58.66 per cent; nevertheless, the type I and type II error rates remain relatively high at 42.42 and 40.47 per cent, respectively. A receiver operating characteristic curve is plotted to evaluate the performance of the model. The area under the curve is about 69 per cent.
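To make the reported quantities concrete, here is a minimal sketch of how a correct classification rate and the two error rates can be derived from actual and predicted labels. The function name, the neutral labels `missed_default_rate` and `false_alarm_rate`, and the toy data are assumptions; the abstract does not say which of the two errors it calls type I and which type II, so the sketch does not assign those names.

```python
def error_rates(actual, predicted):
    """Compute rates from 0/1 labels, where 1 = default and 0 = non-default."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # defaults caught
    tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # healthy loans cleared
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # healthy loans flagged
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # defaults missed
    return {
        "correct_rate": (tp + tn) / len(pairs),
        # Share of actual defaulters that the model failed to flag.
        "missed_default_rate": fn / (tp + fn),
        # Share of healthy borrowers that the model wrongly flagged.
        "false_alarm_rate": fp / (tn + fp),
    }

# Hypothetical labels for eight loans, not the bank's data.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]
rates = error_rates(actual, predicted)
```

A model can have a moderate overall correct rate while both error rates stay high, which is the pattern the validation results describe.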

Originality/value

The paper highlights the fact that the Tunisian central bank obliged all commercial banks to conduct a survey to collect qualitative data for better credit rating of borrowers.

Keywords: ROC curve, Risk assessment, Default risk, Banking sector, Bayesian classifier algorithm.

Details

Journal of Economics, Finance and Administrative Science, vol. 22 no. 42

Type: Research Article

ISSN: 2077-1886

Open Access

Article

Publication date: 15 September 2017

Application of Bayesian networks in analysing tanker shipping bankruptcy risks

Grace W.Y. Wang, Zhisen Yang, Di Zhang, Anqiang Huang and Zaili Yang

Downloads: 2468

Abstract

Purpose

This study aims to develop an assessment methodology using a Bayesian network (BN) to predict the failure probability of oil tanker shipping firms.

Design/methodology/approach

This paper proposes a bankruptcy prediction model by applying the hybrid of logistic regression and Bayesian probabilistic networks.

Findings

The proposed model shows potential as a powerful tool for predicting the financial bankruptcy of shipping operators, and provides important insights to the maritime community into which performance measures should be taken to ensure shipping companies’ financial soundness in dynamic environments.

Research limitations/implications

The model and its associated variables can be expanded to include more factors for an in-depth analysis in future when the detailed information at firm level becomes available.

Practical implications

The results of this study can be applied to oil tanker shipping firms as a tool for predicting the bankruptcy rate.

Originality/value

Incorporating quantitative statistical measurement, the application of BN in financial risk management provides advantages in developing a powerful early warning system for shipping, which has unique characteristics, such as capital-intensive and mobile assets, that may lead to catastrophic consequences.

Details

Maritime Business Review, vol. 2 no. 3

Type: Research Article

ISSN: 2397-3757

Open Access

Article

Publication date: 28 July 2020

Sport analytics for cricket game results using machine learning: An experimental study

Kumash Kapadia, Hussein Abdel-Jaber, Fadi Thabtah and Wael Hadi

Downloads: 14381

Abstract

The Indian Premier League (IPL) is one of the more popular cricket tournaments in the world: its finances are growing each season, its viewership has increased markedly and the betting market for the IPL is growing significantly every year. Because cricket is a very dynamic game that changes ball by ball, bettors and bookies are incentivised to bet on match results. This paper investigates machine learning technology to deal with the problem of predicting cricket match results based on historical match data of the IPL. Influential features of the dataset have been identified using filter-based methods including Correlation-based Feature Selection, Information Gain (IG), ReliefF and Wrapper. More importantly, machine learning techniques including Naïve Bayes, Random Forest, K-Nearest Neighbour (KNN) and Model Trees (classification via regression) have been adopted to generate predictive models from distinctive feature sets derived by the filter-based methods. Two feature subsets were formulated, one based on home team advantage and the other based on the toss decision. The selected machine learning techniques were applied to both feature sets to determine a predictive model. Experimental tests show that tree-based models, particularly Random Forest, performed better in terms of accuracy, precision and recall metrics when compared to probabilistic and statistical models. However, on the toss feature subset, none of the considered machine learning algorithms performed well in producing accurate predictive models.
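Information Gain, one of the feature-selection criteria named above, can be sketched as the drop in label entropy after splitting on a feature. The function names and the four-match toy data are hypothetical and are not drawn from the IPL dataset or from the authors' tooling.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy within each feature value."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Hypothetical matches: the venue separates outcomes, the toss does not.
result = ["win", "loss", "win", "loss"]
venue = ["home", "away", "home", "away"]
toss = ["bat", "bat", "field", "field"]

ig_venue = information_gain(venue, result)
ig_toss = information_gain(toss, result)
```

In this toy case the venue feature removes all uncertainty about the result while the toss feature removes none, so a filter based on Information Gain would rank venue first.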

Details

Applied Computing and Informatics, vol. 18 no. 3/4

Type: Research Article

ISSN: 2634-1964

Open Access

Article

Publication date: 9 May 2022

Email classification analysis using machine learning techniques

Khalid Iqbal and Muhammad Shehrayar Khan

Downloads: 11986

Abstract

Purpose

In this digital era, email is the most pervasive form of communication between people. Many users become victims of spam emails, and their data are exposed.

Design/methodology/approach

Researchers contribute to solving this problem by focusing on advanced machine learning algorithms and improved models for detecting spam emails, but there is still a gap in features. Features also play an important role in achieving good results. To evaluate the performance of the applied classifiers, 10-fold cross-validation is used.
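The 10-fold cross-validation procedure can be sketched as follows: the sample indices are shuffled once, split into ten folds, and each fold serves as the test set exactly once while the other nine form the training set. The function name, the seed and the sample count of 25 are illustrative assumptions, not details from the paper.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # shuffle once, reproducibly
    # Deal indices round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i, test_idx in enumerate(folds):
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx

# Hypothetical dataset of 25 samples.
splits = list(k_fold_indices(25, k=10))
```

A classifier would be trained and scored once per split, and the ten scores averaged to give the reported accuracy.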

Findings

The results confirm that spam emails are correctly classified with an accuracy of 98.00% for the Support Vector Machine and 98.06% for the Artificial Neural Network, compared with the other applied machine learning classifiers.

Originality/value

In this paper, Point-Biserial correlation is applied to each feature with respect to the class label of the University of California Irvine (UCI) spambase email dataset to select the best features. Extensive experiments are conducted on the selected features by training the different classifiers.
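As a sketch of how a point-biserial correlation between one continuous feature and a binary class label can be computed, the code below uses the standard formula r = (M1 - M0) / s * sqrt(n1 * n0 / n^2), with s the population standard deviation of the feature. The function name and the toy values are assumptions; how the authors thresholded or ranked the resulting coefficients is not stated in the abstract.

```python
import math

def point_biserial(values, labels):
    """Point-biserial correlation between a numeric feature and 0/1 labels."""
    group1 = [v for v, y in zip(values, labels) if y == 1]
    group0 = [v for v, y in zip(values, labels) if y == 0]
    n1, n0 = len(group1), len(group0)
    n = n1 + n0
    mean1 = sum(group1) / n1
    mean0 = sum(group0) / n0
    mean_all = sum(values) / n
    # Population standard deviation of the feature (divide by n, not n - 1).
    std_all = math.sqrt(sum((v - mean_all) ** 2 for v in values) / n)
    return (mean1 - mean0) / std_all * math.sqrt(n1 * n0 / n ** 2)

# Hypothetical feature values for four emails; label 1 = spam, 0 = not spam.
feature = [0.0, 1.0, 2.0, 3.0]
is_spam = [0, 0, 1, 1]
r = point_biserial(feature, is_spam)
```

Features could then be ranked by the absolute value of this coefficient, keeping those most strongly associated with the class label.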

Details

Applied Computing and Informatics, vol. ahead-of-print no. ahead-of-print

Type: Research Article

ISSN: 2634-1964

Open Access

Article

Publication date: 10 June 2024

A comparative analysis of consumer credit risk models in Peer-to-Peer Lending

Lua Thi Trinh

Downloads: 1323

Abstract

Purpose

The purpose of this paper is to compare nine different models to evaluate consumer credit risk, which are the following: Logistic Regression (LR), Naive Bayes (NB), Linear Discriminant Analysis (LDA), k-Nearest Neighbor (k-NN), Support Vector Machine (SVM), Classification and Regression Tree (CART), Artificial Neural Network (ANN), Random Forest (RF) and Gradient Boosting Decision Tree (GBDT) in Peer-to-Peer (P2P) Lending.

Design/methodology/approach

The author uses data from P2P Lending Club (LC) to assess the efficiency of a variety of classification models across different economic scenarios and to compare the ranking results of credit risk models in P2P lending through three families of evaluation metrics.

Findings

The results from this research indicate that the risk classification models in the 2013–2019 economic period show greater measurement efficiency than for the difficult 2007–2012 period. Besides, the results of ranking models for predicting default risk show that GBDT is the best model for most of the metrics or metric families included in the study. The findings of this study also support the results of Tsai et al. (2014) and Teplý and Polena (2019) that LR, ANN and LDA models classify loan applications quite stably and accurately, while CART, k-NN and NB show the worst performance when predicting borrower default risk on P2P loan data.

Originality/value

The main contributions of the research to the empirical literature review include: comparing nine prediction models of consumer loan application risk through statistical and machine learning algorithms evaluated by the performance measures according to three separate families of metrics (threshold, ranking and probabilistic metrics) that are consistent with the existing data characteristics of the LC lending platform through two periods of reviewing the current economic situation and platform development.
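The three metric families mentioned above can be illustrated with one representative each: accuracy at a fixed cut-off (threshold), area under the ROC curve (ranking) and the Brier score (probabilistic). Choosing these three as representatives is an assumption made for illustration; the abstract does not list the specific metrics the author used, and the scores below are made up.

```python
def accuracy(y_true, scores, threshold=0.5):
    """Threshold metric: share of cases classified correctly at the cut-off."""
    hits = sum((s >= threshold) == (y == 1) for y, s in zip(y_true, scores))
    return hits / len(y_true)

def auc(y_true, scores):
    """Ranking metric: probability that a defaulter outscores a non-defaulter."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Count pairwise wins; ties count as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def brier_score(y_true, scores):
    """Probabilistic metric: mean squared error of predicted probabilities."""
    return sum((s - y) ** 2 for y, s in zip(y_true, scores)) / len(y_true)

# Hypothetical predicted default probabilities for four loans (1 = default).
y_true = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.7]

acc = accuracy(y_true, scores)
area = auc(y_true, scores)
brier = brier_score(y_true, scores)
```

Because each family measures a different property (decisions, ordering, calibration), a model ranking can differ from one family to another, which is why comparing models across all three is informative.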

Details

Journal of Economics, Finance and Administrative Science, vol. 29 no. 58

Type: Research Article

ISSN: 2077-1886

Open Access

Article

Publication date: 13 November 2019

Strong consistency of a kernel-based rule for spatially dependent data

Ahmad Younso, Ziad Kanaya and Nour Azhari

Downloads: 303

Abstract

We consider the kernel-based classifier proposed by Younso (2017). This nonparametric classifier allows for the classification of missing spatially dependent data. The weak consistency of the classifier has been studied by Younso (2017). The purpose of this paper is to establish strong consistency of this classifier under mild conditions. The classifier is discussed in a multi-class case. The results are illustrated with simulation studies and real applications.

Details

Arab Journal of Mathematical Sciences, vol. 26 no. 1/2

Type: Research Article

ISSN: 1319-5166
