Impact on recommendation performance of online review helpfulness and consistency

Jaeseung Park, Xinzhe Li, Qinglong Li, Jaekyeong Kim

Data Technologies and Applications

ISSN: 2514-9288

Article publication date: 8 September 2022

Issue publication date: 25 April 2023


Abstract

Purpose

Existing collaborative filtering algorithms may select insufficiently representative customers as neighbors of a target customer, so the recommendations they provide are not sufficiently accurate. This study aims to investigate how selecting influential and representative customers affects recommendation performance.

Design/methodology/approach

Some studies have shown that review helpfulness and consistency significantly affect purchase decision-making. Thus, this study focuses on customers who have written helpful and consistent reviews to select influential and representative neighbors. To achieve the purpose of this study, the authors apply a text-mining approach to analyze review helpfulness and consistency. In addition, they evaluate the performance of the proposed methodology using several real-world Amazon review data sets for experimental utility and reliability.

Findings

This study is the first to propose a methodology for investigating the effect of review consistency and helpfulness on recommendation performance. The experimental results confirmed that recommendation performance was better when neighbors were selected from customers who wrote consistent or helpful reviews than when neighbors were selected from all customers.

Originality/value

This study investigates the effect of review consistency and helpfulness on recommendation performance. Online reviews can enhance recommendation performance because they reflect the purchasing behavior of customers who consider reviews when purchasing items. The experimental results indicate that review helpfulness and consistency can enhance the performance of personalized recommendation services, increase customer satisfaction and increase confidence in a company.


Citation

Park, J., Li, X., Li, Q. and Kim, J. (2023), "Impact on recommendation performance of online review helpfulness and consistency", Data Technologies and Applications, Vol. 57 No. 2, pp. 199-221. https://doi.org/10.1108/DTA-04-2022-0172

Publisher: Emerald Publishing Limited

Copyright © 2023, Emerald Publishing Limited


1. Introduction

The online e-commerce market is growing explosively with recent developments in information and communication technology and the popularization of smartphones, and the volume of online shopping transactions is rising steadily. New items and services are released regularly, and accessibility and convenience for customers have improved. However, this creates an information overload problem: the cost of information search increases for customers making purchase decisions. In other words, selecting an item suited to a customer's preference from among many items is time-consuming and challenging. Recently, demand for online shopping has soared, yet customers face limitations in checking and experiencing their preferred items or services, which aggravates the information overload problem. Furthermore, many companies have difficulty generating profits because they have fewer opportunities to promote and display their items or services to the customers who prefer them and are likely to purchase them. Accordingly, a personalized recommendation service is essential for providing personalized items or services to customers. For example, global e-commerce companies such as Amazon, Netflix and Google provide personalized recommendation services to strengthen their sustainable corporate competitiveness. Amazon generates 35 per cent of its sales through items or services provided by personalized recommendation services, and Netflix delivers 75 per cent of all videos viewed by customers through personalized recommendation services. As such, personalized recommendation services can reduce the cost of searching for information and positively impact corporate revenue generation.

The collaborative filtering (CF) algorithm is the most widely used of the many recommender system approaches. CF algorithms rest on the following assumption: customers with similar preferences for certain items exhibit similar preferences for other items. Based on this assumption, the CF algorithm predicts preferences from the similarity between customers. It measures the similarity between the target customer and other customers, selects customers with high similarity as neighbors of the target customer and predicts the target customer's preferences according to the neighbors' preferences. The core idea of the CF algorithm is therefore to select a group of customers whose preferences are similar to those of the target customer; such customers are usually referred to as nearest neighbors. Nevertheless, existing CF algorithms may select less representative customers as neighbors of the target customer, which means the resulting recommendations are not accurate enough. With the development of the Internet and smart devices, unstructured data related to customers and transactions are continuously increasing. This rapid growth in data can improve the performance of a recommender system, but it can also degrade performance because of increased noise. Therefore, to reduce computing cost and provide an effective recommendation service, a strategy that improves recommendation performance by filtering only influential and meaningful data is required, alongside research that develops new recommendation algorithms. However, few studies so far have examined how changes in the input data affect recommender system performance.

Therefore, it is essential to investigate the impact of selecting influential and representative customers on recommendation performance. Recently, some studies have utilized review-related information as an additional feature for personalized recommendation services, since online reviews contain specific and reliable information that supports effective recommendations. Many previous studies have argued that such online reviews influence customers' purchase decision-making processes. One line of research argues that when consumers process online review information, they simultaneously process review texts and their attendant star ratings; in other words, consistency between a review text and its attendant star rating affects information decision-making. Other studies argue that helpful reviews have an essential influence on purchase decision-making. Based on these previous studies, the authors selected neighbor customers according to review consistency and helpfulness to address the problem of insufficiently representative neighbors. Review consistency indicates the consistency between a review text and its corresponding numerical rating. Review helpfulness indicates the proportion of helpful votes among the total votes on questions asking whether a review is helpful. The authors also investigated the effect of review sentiment valence on recommendation performance within the helpful and consistent review groups. They evaluated the performance of the proposed methodology using several real-world Amazon review data sets to ensure experimental effectiveness and reliability. The experimental results confirmed that recommendation performance was better when neighbors were selected from customers who wrote consistent or helpful reviews than when neighbors were selected from all customers. The contributions of this study are summarized as follows:

  1. This study investigated the effects of review consistency and helpfulness on recommendation performance. Online reviews can enhance recommendation performance because they reflect the purchasing behavior of customers who consider reviews when purchasing items.

  2. This study applied a text-mining tool to perform sentiment analysis of the review text. Review consistency was calculated as the consistency between the review text and the corresponding numerical rating. Review helpfulness was calculated by dividing helpful votes by total votes.

  3. This study conducted experiments using several real-world Amazon review data sets. The results indicate that reflecting review helpfulness and consistency on recommender systems can enhance the performance of personalized recommendation services, increase customer satisfaction and increase confidence in a company.

The remainder of this paper is structured as follows. Section 2 describes CF- and review-based recommender systems. Section 3 describes the proposed methodology. Section 4 describes the data sets, evaluation criteria and experimental results. Finally, Section 5 summarizes the research and describes future studies.

2. Related work

2.1 Collaborative filtering

Goldberg et al. (1992) first proposed the CF algorithm, which has since demonstrated some of the best performance among recommendation algorithms. The CF algorithm predicts preferences based on similarities between customers or items, on the basic assumption that customers with similar preferences for a particular item will show similar preferences for other items. Therefore, the similarity between the target customer and neighboring customers must be calculated when using the CF algorithm. However, as the number of neighboring customers increases, the computational cost increases. Previous studies have utilized K-nearest neighbor (KNN) algorithms or clustering techniques to classify groups having similar preferences and thereby address this issue. Furthermore, the CF algorithm can be divided into user-based CF and item-based CF, according to whether the group is formed from the items purchased by the customer or from the customers who purchased the item. Figure 1 shows the basic concept of the user-based CF algorithm. When the target customer is selected, the similarity between the target customer and the other customers is measured based on past purchase history. Similar purchase patterns indicate high similarity, so after measuring the similarity with all other customers, the algorithm selects the customers with the highest similarity as neighbors. For example, in Figure 1, the customer whose preferences are most similar to those of the target customer is Alice, who purchased Item 1, Item 3 and Item 5. The final phase of the CF algorithm is the generation of the recommendation list. Here, Item 8 is an item that Alice has purchased but that the target customer has not yet purchased, so Item 8 is recommended to the target customer. Item-based CF selects items similar to a target item and recommends them to customers who have not purchased those items. This study predicts preferences based on the user-based CF algorithm.
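To make the user-based CF flow above concrete, the following is a minimal Python sketch of the Alice example; the purchase sets and the Jaccard-style overlap measure are illustrative assumptions, not the paper's exact data or similarity function.

```python
# Minimal sketch of user-based CF on the toy example above (hypothetical data).
purchases = {
    "target": {"Item 1", "Item 3", "Item 5"},
    "Alice":  {"Item 1", "Item 3", "Item 5", "Item 8"},
    "Bob":    {"Item 2", "Item 4"},
}

def jaccard(a, b):
    """Overlap between two purchase sets (a stand-in for the paper's similarity)."""
    return len(a & b) / len(a | b) if a | b else 0.0

target = purchases["target"]
# Rank the other customers by similarity to the target customer.
neighbors = sorted(
    (name for name in purchases if name != "target"),
    key=lambda name: jaccard(target, purchases[name]),
    reverse=True,
)
best = neighbors[0]  # Alice in this toy example
# Recommend items the best neighbor bought that the target has not.
recommendations = purchases[best] - target
print(best, recommendations)  # Alice {'Item 8'}
```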

The previous studies using CF algorithms summarized in the accompanying table proposed methodologies for enhancing recommendation performance using purchase history data. However, they provided recommendation services by selecting neighbors using quantitative data such as ratings, click status and purchase status, which makes it challenging to select an influential and representative customer who is similar to the target customer as a neighbor. In addition, the computational cost increases and the computational speed decreases as customers' transaction data grow. This study investigates the impact on recommendation performance of selecting influential and representative customers when building CF algorithms. To do so, the authors selected neighbor customers who wrote consistent and helpful reviews and compared them with neighbors drawn from all customers.

2.2 Online review-based recommender system

Many previous studies have argued that online reviews significantly impact customers' purchase decision-making process. Thus, some studies have utilized review-related information as an additional feature in personalized recommendation methodologies. One early study used sentiment analysis to develop a model for estimating review sentiment and reflected the calculated sentiment score in the recommendation algorithm. This work was among the first to apply customer reviews to a recommender system and received considerable attention, showing that higher recommendation performance can be achieved when qualitative and quantitative data are considered simultaneously. Another study used sentiment analysis of customer reviews to classify customers as optimists or pessimists and reported better recommendation performance with this classification than with the traditional methodology. However, the review content itself was not reflected in the recommender system, so some information loss remained. A further study proposed a user review–enhanced CF recommendation methodology that reflected the reviews directly. It used the reviews of approximately 32 movies from an existing movie review ontology. The features of the reviews were derived using feature frequency-inverse review frequency, which is similar to term frequency-inverse document frequency; the user's sentiment polarity was reflected in each review's features, the similarity between users was then calculated and CF algorithms were built on this basis. Applied to Yahoo Movies data, the proposed methodology improved prediction accuracy by 6.18 to 8.24 per cent compared with traditional CF methods. The prediction performance was excellent; however, the content of the reviews was still largely disregarded. Other work considered online reviews and ratings together to enhance recommendation performance, using review data and quantitative ratings processed with text mining; the results showed that CF reflecting reviews outperformed traditional CF. Finally, one study proposed a recommendation methodology that combined online reviews and star ratings: a sentiment dictionary built from movie review data was used to calculate sentiment scores, new star ratings were generated by combining those scores with the original ratings and the recalculated ratings were fed into a CF algorithm. The results show that this methodology outperforms the traditional CF methodology.

Previous studies have shown new possibilities for enhancing recommendation performance by combining review texts and recommendation algorithms. However, the existing methodology does not adequately consider the expressiveness of review information or the influence and representativeness of neighboring customers. This means that the recommendation performance is not sufficiently accurate. Therefore, it is essential to investigate the impact of selecting influential and representative customers on recommendation performance when building recommendation algorithms. Some studies have suggested that review helpfulness and consistency significantly affect purchase decision-making. Thus, the authors focused on customers who had written helpful reviews and consistent reviews as influential and representative neighbors.

3. Methodology

This study investigates the impact of selecting influential and representative customers on recommendation performance when building recommendation algorithms. Figure 2 shows the framework of the proposed methodology. First, the authors preprocessed the Amazon review text data and produced customer profiles from the preprocessed review data. They then extracted influential and representative customers based on review consistency and helpfulness, classifying customer profiles into an overall group (from all reviews) and partial groups (from helpful and/or consistent reviews). With the CF recommendation technique, the authors used the similarity between customers to select the neighbors of the target customer. After selecting the neighbors, they generated the recommendation list based on these similarity values. Finally, they compared the recommendation performance between the overall and partial groups.

3.1 Data preprocessing module

The authors defined R = {r1, r2, …, rk} as the original data set for evaluating the proposed methodology. Each review includes five attributes [I, U, R, T, H], where I indicates the item attributes, U indicates the reviewer attributes, R indicates the metadata attributes (e.g. star rating), T indicates the review text attributes and H indicates the helpfulness score, measured as the ratio of helpful votes to the total number of votes, where H ∈ [0, 1]. Let M be a vector of helpfulness classification results, where Mi indicates whether review i is helpful. Previous studies have proposed a standard optimized threshold value θ for the reliable and effective classification of review helpfulness. Following them, the authors set θ = 0.6 and labeled a review as helpful if its helpfulness score Hi exceeded θ; otherwise, it was labeled as unhelpful. Thus, Mi is measured as follows:

(1) $M_i = \begin{cases} 1, & \text{if } H_i > \theta \\ 0, & \text{otherwise} \end{cases}$
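As a minimal illustration of Equation (1), the snippet below labels reviews as helpful using the 0.6 threshold described above; the vote counts are made-up example values.

```python
# Label reviews as helpful (1) or unhelpful (0) per Equation (1), threshold = 0.6.
THETA = 0.6

def helpfulness_label(helpful_votes, total_votes):
    h = helpful_votes / total_votes if total_votes else 0.0  # H in [0, 1]
    return 1 if h > THETA else 0

# Made-up example: 10 of 17 voters found the review helpful -> H ≈ 0.59 -> unhelpful.
print(helpfulness_label(10, 17))  # 0
print(helpfulness_label(15, 17))  # 1
```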
Subsequently, the authors preprocessed the review text and converted it to a structured format to measure review consistency. The original review text is divided into token units that a computer can process for text analysis; however, such tokens include noisy data that can distort the analysis results, so the noise must be removed through text preprocessing. First, punctuation, numbers and symbols were removed. Second, all review texts were converted to lowercase to avoid duplicate words. Third, stopwords, which occur frequently in review texts but carry little meaning, were removed. Finally, lemmatization was applied to convert words into their standard forms for analysis efficiency. Review consistency is calculated from the review text and the corresponding numerical star rating. Let N be a vector of predicted values for the review texts, where Ni indicates the sentiment score of review text i. A sentiment analysis technique is used to calculate the sentiment score of the preprocessed review text. Sentiment analysis automatically measures opinions in sentences or full texts, and most existing studies have utilized sentiment analysis techniques to measure the sentiment scores reflected in reviews; for example, one study calculated text sentiment scores using TextBlob, a Python-based text processing library. The authors applied Linguistic Inquiry and Word Count (LIWC) to measure the review sentiment scores. LIWC is a well-established, lexicon-based text analysis tool that is widely used in studies across various domains. Following the common strategy of previous studies, the authors measured the review text sentiment score using the following formula:
(2) $N_i = \dfrac{PW_i - NW_i}{TW_i},$
where PWi and NWi represent the counts of positive and negative words in review i, respectively, and TWi represents the total count of words in review text i. Calculated this way, the review text sentiment score takes a value between −1 and 1. The authors then rescale the sentiment score of each review text to a value between 1 and 5 according to the following formula: (max′ − min′) × [(x − min)/(max − min)] + min′. Here, x represents the original sentiment score of the review text; min and max represent the minimum and maximum values of the original sentiment score, respectively; and min′ and max′ represent the minimum and maximum values of the rescaled sentiment score, set to 1 and 5, respectively. An example of the preprocessed data set is shown in the accompanying customer–item feature vector table, where each row indicates a customer's preference for a particular item.
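To make Equation (2) and the rescaling step concrete, here is a small, self-contained Python sketch; the tiny positive/negative word lists stand in for the LIWC lexicon the authors actually used, and the helper names are illustrative only.

```python
import re

# Tiny stand-in lexicons; the paper uses the LIWC dictionary instead.
POSITIVE = {"great", "good", "love", "excellent"}
NEGATIVE = {"bad", "poor", "boring", "short"}
STOPWORDS = {"the", "a", "is", "and", "of", "this"}

def preprocess(text):
    """Lowercase, strip punctuation/numbers and drop stopwords (lemmatization omitted)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def sentiment_score(tokens):
    """Equation (2): (positive - negative word count) / total word count, in [-1, 1]."""
    if not tokens:
        return 0.0
    pw = sum(t in POSITIVE for t in tokens)
    nw = sum(t in NEGATIVE for t in tokens)
    return (pw - nw) / len(tokens)

def rescale(x, lo=-1.0, hi=1.0, new_lo=1.0, new_hi=5.0):
    """Min-max rescaling of the sentiment score to the 1-5 rating range.

    lo/hi assume the theoretical bounds of Equation (2); the paper's min/max
    could instead be the observed extremes in the data.
    """
    return (new_hi - new_lo) * (x - lo) / (hi - lo) + new_lo

tokens = preprocess("This is a great game but the battery life is poor.")
print(round(rescale(sentiment_score(tokens)), 2))  # a value between 1 and 5
```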

3.2 Profile producing module

The preprocessed customer profile is produced as follows. The final customer profile contains five attributes [I, U, R, M, N], where I indicates the item ID, U indicates the customer ID, R indicates the star rating, M indicates the helpfulness information and N indicates the sentiment score. This study aims to investigate the impact of selecting influential and representative customers on recommendation performance when building recommendation algorithms. The authors classify customer profiles into overall and partial groups based on review consistency and helpfulness to investigate the proposed methodology effectively.

The overall group is the group in which the target customer can select any customer as a neighbor. In other words, when a neighbor similar to the target customer is selected, all customers are candidates, without considering their influence or representativeness. The authors then produce a customer–item matrix based on the customers and items included in the overall reviews. The partial group is further divided into two groups. The first type of partial group includes only helpful reviews. Some studies argue that helpful reviews have an important influence on purchase decision-making, so it is vital to consider customers who have written helpful reviews when influential and representative neighbors are selected for target customers. Following previous studies, the authors labeled a review as helpful if its helpfulness score H was higher than 0.6 and as unhelpful otherwise, and they produced a customer–item matrix based on the customers and items included in the helpful reviews. The second type of partial group includes only reviews that are consistent between the review text and the star rating. Consistency between a review text and a star rating affects information decision-making, so it is essential to consider customers whose reviews are consistent when influential and representative neighbors are selected. The authors labeled the star rating and the sentiment score of each review as positive if their values were higher than 3 and as negative otherwise; a review was treated as consistent when the two labels agreed. They then produced a customer–item matrix based on the customers and items included in the consistent reviews. With rows corresponding to customers and columns to items, the customer–item matrix is defined as follows:

(3) $y_{ui} = \begin{cases} 1, & \text{if customer } u \text{ purchased item } i \\ 0, & \text{otherwise} \end{cases}$
Here, a value of 1 for y indicates that customer u purchased item i. Similarly, a value of 0 indicates that it was not purchased.
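The profile grouping and the binary matrix in Equation (3) can be sketched as follows; the column names and the consistency rule (the rating-based and sentiment-based labels must agree) follow the description above, while the sample records are invented.

```python
import pandas as pd

# Invented sample profiles: [item ID, user ID, star rating, helpful flag, sentiment score].
profiles = pd.DataFrame(
    [
        ("i1", "u1", 5, 1, 4.2),
        ("i2", "u1", 2, 0, 1.8),
        ("i1", "u2", 4, 1, 3.6),
        ("i3", "u2", 1, 1, 1.2),
    ],
    columns=["I", "U", "R", "M", "N"],
)

def user_item_matrix(df):
    """Binary purchase matrix y_ui per Equation (3)."""
    return pd.crosstab(df["U"], df["I"]).clip(upper=1)

overall = user_item_matrix(profiles)
helpful = user_item_matrix(profiles[profiles["M"] == 1])
# Consistent: the rating-based and sentiment-based labels (>3 = positive) agree.
consistent_mask = (profiles["R"] > 3) == (profiles["N"] > 3)
consistent = user_item_matrix(profiles[consistent_mask])
print(overall, helpful, consistent, sep="\n\n")
```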

3.3 CF recommendation module

The CF algorithm utilizes the preference information of neighboring customers to generate a recommendation list based on items that the target customer is likely to purchase. The algorithms are divided into two phases: neighbor customer selection and recommendation list generation.

Most CF algorithms are based on a similarity measure between target customers and other customers, where sim(a, b) denotes the similarity of customer a and customer b. The Pearson correlation coefficient measure and the cosine measure are usually used as the similarity measures in the CF algorithm, and their performance is known to be almost the same. In this study, the authors measured similarity based on the Pearson correlation coefficient. The similarity between customer a and customer b was calculated as follows:

(4) $\mathrm{sim}(a,b) = \dfrac{\sum_{i=1}^{N} P_{ai}\, P_{bi}}{\sqrt{\sum_{i=1}^{N} P_{ai}^{2}}\,\sqrt{\sum_{i=1}^{N} P_{bi}^{2}}},$
where a and b are customers, Pai is the current preference of customer a for item i and Pbi is the current preference of customer b for item i. The preference similarity value of two customers ranges from −1 to 1. The CF algorithm selects a group of similar customers, i.e. a neighbor group with high similarity values. Here, following the strategy of previous studies, the CF algorithm uses a neighborhood size of 40.
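A compact NumPy sketch of the similarity computation and the top-40 neighbor selection is given below. It follows Equation (4) as printed (a cosine-style form), although the text refers to Pearson correlation; the binary preference matrix and the small epsilon guard are assumptions added for illustration.

```python
import numpy as np

def similarity_matrix(y):
    """Equation (4): similarity between all pairs of customers.

    y is the binary customer-item matrix (rows = customers, columns = items).
    """
    norms = np.sqrt((y ** 2).sum(axis=1, keepdims=True))
    norms[norms == 0] = 1e-12            # guard against customers with no purchases
    normalized = y / norms
    return normalized @ normalized.T

def top_neighbors(sim, customer, size=40):
    """Indices of the most similar customers (neighborhood size 40 in the paper)."""
    order = np.argsort(-sim[customer])
    return [j for j in order if j != customer][:size]

# Toy binary purchase matrix (rows = customers, columns = items).
y = np.array([[1, 0, 1, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]], dtype=float)
sim = similarity_matrix(y)
print(top_neighbors(sim, customer=0, size=2))
```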

3.4 Preference prediction module

The second phase in the CF algorithm generates a recommendation list based on the preferences of neighbor customers recorded in the customer profile. The probability of target customer c purchasing item j is measured using the purchase likelihood score (PLS). The PLS is calculated as follows:

(5) $\mathrm{PLS}(c,j) = \dfrac{\sum_{u=1}^{M} P_{u,j}\,\mathrm{sim}(c,u)}{\sum_{u=1}^{M} \mathrm{sim}(c,u)},$
where Pu,j indicates neighbor customer u's preference for item j, set to 1 if customer u previously purchased item j and to 0 otherwise, sim(c, u) indicates the similarity between the target customer c and neighbor customer u and M is the number of neighbors. The CF algorithm generates a recommendation list consisting of the k items with the highest PLS scores that the target customer has not previously purchased. In most recommender system studies, a sensitivity analysis is used to determine the optimal k value: k typically starts at 1, and larger values are substituted one after another to find the value at which the CF algorithm performs best. However, because the purpose of this study was to compare the performance of the overall group and the partial groups, k was set to 10, a value widely used in previous studies.
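The following sketch implements Equation (5) and the top-k list for one target customer; it is illustrative code under the same assumptions as the previous sketch (binary purchases, a precomputed similarity matrix), not the authors' implementation.

```python
import numpy as np

def pls_scores(y, sim, target, neighbors):
    """Equation (5): similarity-weighted purchase likelihood of each item for `target`."""
    weights = sim[target, neighbors]          # sim(c, u) for each neighbor u
    prefs = y[neighbors]                      # P_{u,j}: neighbors' purchase rows
    denom = weights.sum() or 1e-12
    return (weights @ prefs) / denom

def recommend(y, sim, target, neighbors, k=10):
    """Top-k unpurchased items ranked by PLS score."""
    scores = pls_scores(y, sim, target, neighbors)
    scores[y[target] == 1] = -np.inf          # exclude already purchased items
    return np.argsort(-scores)[:k]

# Toy binary purchase matrix (rows = customers, columns = items).
y = np.array([[1, 0, 1, 0, 0],
              [1, 1, 1, 0, 1],
              [0, 1, 0, 1, 0]], dtype=float)
sim = np.array([[1.0, 0.8, 0.1],
                [0.8, 1.0, 0.2],
                [0.1, 0.2, 1.0]])
print(recommend(y, sim, target=0, neighbors=[1, 2], k=2))
```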

4. Experiments

4.1 Data set and evaluation protocols

To measure the performance of the proposed methodology, the authors used publicly accessible Amazon data sets collected between May 1996 and July 2014. Amazon item reviews have been used and analyzed in many previous studies, so adopting Amazon reviews allows fair comparisons with previous work, and the results of this study may provide practical insights into online e-commerce. The authors utilized six of the largest domains to evaluate the proposed methodology; descriptive statistics for the six domain data sets (D1–D6) are presented in the accompanying table. As illustrated by the Amazon review composition example, each customer profile contains (1) the ID of the reviewed item; (2) the helpfulness information, namely customer-provided helpful and unhelpful votes; (3) a star rating; (4) the published date, week and time; (5) the ID and name of the reviewer and (6) a text composed of a summary headline and detailed comments on the item. The review helpfulness voting distribution presented in Figure 3(a) shows a similar pattern across domains, in which helpful reviews receive more votes than unhelpful reviews. To alleviate biases caused by the "words of few mouths" phenomenon, the authors retained only reviews that received more than 10 helpfulness votes. Figure 3(b) shows the distribution of the review ratings: customers tended to provide positive feedback, which accounted for more than 70 per cent of all reviews, a phenomenon many previous studies have defined as positive bias. To overcome the data sparsity issue, the authors filtered the data set to contain only customers with at least 20 interactions. To compare recommendation performance effectively, the authors divided the data set into training and test sets: 80 per cent of each data set was used to train the CF algorithms and the remaining 20 per cent was used to evaluate recommendation performance.
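A minimal pandas sketch of the filtering and 80/20 split described above is shown below; the column names and the random split are assumptions for illustration, since the paper does not state how the split was drawn.

```python
import pandas as pd

def prepare(df, min_votes=10, min_interactions=20, train_frac=0.8, seed=42):
    """Filter reviews and customers as described above, then make an 80/20 split.

    Assumes columns: user, item, rating, total_votes.
    """
    df = df[df["total_votes"] > min_votes]                 # alleviate "words of few mouths"
    counts = df.groupby("user")["item"].transform("count")
    df = df[counts >= min_interactions]                    # reduce data sparsity
    train = df.sample(frac=train_frac, random_state=seed)  # assumed random 80/20 split
    test = df.drop(train.index)
    return train, test
```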

To evaluate the recommendation performance of the proposed methodology, the authors used precision, recall and the F1 score to measure recommendation accuracy. The F1 score is the harmonic mean of precision and recall; a high F1 score indicates a high predictive ability for the recommender system. The precision, recall and F1 score for the Top-K recommendation list are defined as follows:

(6) $\mathrm{Precision} = \dfrac{TP}{TP + FP},$
(7) $\mathrm{Recall} = \dfrac{TP}{TP + FN},$
(8) $F1\ \mathrm{score} = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}},$
where TP is the true positive (item relevant and recommended), FP is the false positive (item irrelevant and recommended) and FN is the false negative (item relevant and not recommended).
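For reference, a small sketch computing these Top-K metrics from a recommended list and the set of relevant (held-out) items; the variable names are illustrative.

```python
def precision_recall_f1(recommended, relevant):
    """Top-K precision, recall and F1 as in Equations (6)-(8)."""
    recommended, relevant = set(recommended), set(relevant)
    tp = len(recommended & relevant)
    precision = tp / len(recommended) if recommended else 0.0
    recall = tp / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1

# Example: 10 recommended items, 3 of which the customer actually bought in the test set.
print(precision_recall_f1(range(10), [2, 7, 9, 42]))  # (0.3, 0.75, ~0.429)
```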

Most recent studies have suggested measuring the diversity of the recommended items to avoid situations in which many customers are recommended the same items. Several metrics can be used to measure the diversity of recommendations. In this study, the authors measured diversity using Shannon entropy (SE), which has been widely used in several studies. SE is defined as follows:

(9) $SE = -\sum_{i=1}^{n} \left(p_i \times \log(p_i)\right),$
where pi is the proportion of recommendations that contain the ith item and n is the total number of items.
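A short sketch of the diversity metric, computed over all customers' Top-K lists, is shown below; the way the item shares p_i are obtained is an assumption about the implementation.

```python
import math
from collections import Counter

def shannon_entropy(recommendation_lists):
    """Equation (9): entropy of how recommendations are spread across items."""
    counts = Counter(item for rec in recommendation_lists for item in rec)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

# Example: two customers each get a Top-3 list; more overlap means lower entropy.
print(shannon_entropy([["a", "b", "c"], ["a", "b", "d"]]))
```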

4.3 Experimental results

In this section, the authors describe and discuss the experimental results of the study. Section 4.3.1 presents the impact of review helpfulness on recommendation performance: the authors classified the customer profiles into the overall review and helpful review groups and compared their recommendation performance. Section 4.3.2 presents the impact of review consistency on recommendation performance: customer profiles were likewise classified into the overall review group and the consistent review group and their recommendation performance compared. Additionally, the authors investigated the effects of review sentiment valence within the helpful and consistent review groups. They used real-world online reviews collected from Amazon.com to evaluate the proposed methodology, with the F1 score and SE metrics measuring recommendation performance. The authors implemented their experiments in Python, and all experiments were conducted on a computer with an Intel Core i9-9900KF CPU, 64 GB of memory and a GeForce RTX 2080 Ti GPU.

4.3.1 Impact of review helpfulness

In this section, the authors present the impact of review helpfulness on the performance of a recommender system using a real-world review data set. They classified customer profiles into overall review and helpful review groups. The overall review group allows all customers to be selected as neighbors of the target customer, whereas the helpful review group allows only customers who wrote helpful reviews to be selected as neighbors. Figure 4 shows the experimental results for the impact of review helpfulness on recommendation performance. "Overall Reviews" represents the traditional methodology that produces profiles including all customers, and "Helpful Reviews" represents customer profiles that include only helpful reviews. The results show that the CF algorithm achieves better recommendation performance when it uses customer profiles restricted to helpful reviews, regardless of the data set. The recommendation accuracy improved by 2.848 (D1), 2.733 (D2), 0.020 (D3), 0.100 (D4), 0.139 (D5) and 1.846 (D6) compared with the traditional methodology that includes all customer reviews. These results show that accuracy is better when customers who have written helpful reviews are selected as neighbors of the target customer. However, review helpfulness did not improve the diversity of the recommendations: diversity decreased by 0.067 (D1), 0.169 (D2), 0.124 (D3), 0.105 (D4), 0.079 (D5) and 0.170 (D6) compared with the traditional methodology. This is consistent with previous studies showing that the accuracy of a recommender system tends to trade off against the variety of recommended items.

Furthermore, to confirm the effectiveness of the experiment, the authors conducted paired t-tests to investigate whether there was a statistical difference in recommendation performance between the two groups for each data set. As shown in the t-test table for overall and helpful reviews, the difference in the means of the two groups was statistically significant at p < 0.001. Thus, helpfulness can be interpreted as an essential factor in enhancing recommendation performance.

The effect of sentiment valence on review helpfulness has been investigated by many researchers, but few studies have investigated the effect of sentiment valence on recommendation performance. Thus, the authors investigated the effect of sentiment valence within helpful reviews. Figure 5 shows the experimental results for the impact of sentiment valence on recommendation performance. The authors classified customer profiles into three groups: "Helpful Reviews" indicates customer profiles that include only helpful reviews, while "Helpful & Positive Reviews" and "Helpful & Negative Reviews" indicate profiles built from the positive and negative reviews, respectively, within the helpful reviews. The experimental results show that the sentiment valence of helpful reviews does not significantly affect recommendation performance. Customers appear to treat the helpfulness votes provided by third parties as sufficient for purchasing decisions and do not make the additional effort to explore the review content; in other words, whether a review is helpful matters for the purchase decision, but whether a helpful review is positive or negative does not. Additionally, to confirm the effectiveness of the experiment, the authors used one-way analysis of variance (ANOVA) to investigate whether there was a statistical difference in recommendation performance among the three groups for each data set. The results show that the means of the three groups were statistically significant at p < 0.1 for most data sets (see the ANOVA table for helpful reviews).

4.3.2 Impact of review consistency

Subsequently, the authors present the impact of review consistency on the performance of the recommender system. Customer profiles were classified into two groups: the overall review group and the consistent review group. The consistent review group includes only customers who wrote consistent reviews as candidate neighbors, where review consistency indicates the agreement between a review text and the corresponding numerical rating. Figure 6 shows the experimental results for the impact of review consistency on recommendation performance. "Overall Reviews" includes all customers, and "Consistent Reviews" includes only customers who wrote consistent reviews. The results show that the CF algorithm achieves better recommendation performance when it uses customer profiles with consistent reviews, regardless of the data set. The recommendation accuracy improved by 0.329 (D1), 0.696 (D2), 0.171 (D3), 0.048 (D4), 0.068 (D5) and 0.368 (D6) compared with the traditional methodology that includes all customer reviews. However, as with review helpfulness, review consistency did not improve the diversity of recommendations: the diversity of the consistent review group decreased by 0.019 (D1), 0.019 (D2), 0.017 (D3), 0.017 (D4), 0.018 (D5) and 0.022 (D6) compared with that of the overall review group. These results show that accuracy is better when customers who have written consistent reviews are selected as neighbors of the target customer.

Furthermore, to confirm the effectiveness of the experiment, the authors conducted paired t-tests to investigate whether there was a statistical difference in recommendation performance between the two groups for each data set. As shown in the t-test table for overall and consistent reviews, the difference in the means of the two groups was statistically significant (at least p < 0.05) for every data set. Thus, the authors concluded that consistent reviews are essential for enhancing recommendation performance.

As in the helpfulness experiments, the authors investigated the effect of sentiment valence on performance within consistent reviews. Figure 7 shows the experimental results for the impact of sentiment valence on recommendation performance. Customer profiles were classified into three groups: "Consistent Reviews" includes only consistent reviews, while "Consistent & Positive Reviews" and "Consistent & Negative Reviews" include the positive and negative reviews, respectively, within the consistent reviews. According to a previous study, customers process the consistency of review texts and star ratings when they read online review information. Unlike the review helpfulness vote, review consistency cannot be voted on by a third party, so customers must examine the review content, such as its sentiment valence, in consistent reviews. Most studies argue that positive reviews significantly impact purchasing decisions, and this study also confirms that positive reviews impact recommendation performance. Additionally, to confirm the effectiveness of the experiment, the authors used one-way ANOVA to investigate whether there was a statistical difference in recommendation performance among the three groups for each data set. The results show that the means of the three groups were statistically significant at p < 0.1 for most data sets (see the ANOVA table for consistent reviews).

5. Conclusions

A recommender system is a critical tool that e-commerce companies can use to pursue sustainable growth. Therefore, global companies such as Amazon, Netflix and Google offer recommendation services to their customers to gain a competitive advantage. However, traditional recommendation algorithms may select insufficiently influential and representative customers as neighbors of the target customer, which means the recommendation performance is not sufficiently accurate. Therefore, the authors investigated the impact of selecting influential and representative customers on recommendation performance. To do so, they selected neighbor customers who wrote consistent or helpful reviews and compared them with neighbors drawn from all customers, evaluating the proposed methodology on several real-world Amazon review data sets for effective and reliable experiments. The results showed that recommendation performance was better when neighbors who wrote consistent or helpful reviews were selected than when neighbors were selected from among all customers.

The summary and theoretical implications of this study are as follows. First, to improve the performance of recommendation algorithms, filtering only influential and meaningful data is required alongside research that develops new algorithms. However, few studies so far have examined how changes in the input data affect recommendation performance. Whereas previous studies have focused on enhancing recommendation performance by developing new algorithms, this study focused on the application of customer behavior data: it proposes a recommendation methodology that filters the review data that are essential to customers' purchasing decisions. Such studies can expand the scope of recommender system research. Second, the authors measured review helpfulness and consistency to produce influential and representative customer profiles. Some studies have argued that review helpfulness and consistency significantly affect purchase decision-making; accordingly, the authors enhanced recommendation performance by selecting as neighbors only those customers who wrote helpful or consistent reviews. The experimental results demonstrate that review helpfulness and consistency are essential for improving recommendation performance, and the authors also found that positive sentiment in consistent reviews affects recommendation performance. Third, the authors further investigated the impact of review helpfulness and consistency on recommendation accuracy and diversity. The experimental results showed excellent recommendation accuracy when customers with helpful or consistent reviews were filtered and selected as neighbors, regardless of the data set. However, recommendation diversity was slightly better when all customers were selectable as neighbors than when only the filtered customers were. These results are consistent with previous findings that the accuracy of a recommender system tends to decrease slightly as the diversity of recommended items increases.

These results provide e-commerce companies with the following practical implications. First, existing e-commerce companies use all customer transaction data to develop recommender systems, believing that more customer transactions yield better recommendation performance. However, the experiments show that too much customer transaction data may decrease recommendation performance. This study showed that both review helpfulness and consistency affect recommendation performance, so e-commerce companies should consider more options when developing recommender systems. E-commerce companies should recognize that more customer data does not necessarily improve recommendation accuracy; therefore, customer data should be regularly managed once it has accumulated to some extent. Furthermore, knowing which factors of the input data affect recommender system performance can guide the design of customer interfaces in the future. Second, global e-commerce websites apply deep learning and artificial intelligence technologies to personalized recommendation services, but it is challenging for most small e-commerce websites to apply such technologies because of development costs and a lack of technical human resources. This study addressed these concerns by building an effective recommendation methodology from online review sources and the traditional CF recommendation algorithm, which is still applied on many e-commerce websites because of its solid performance. The experiments demonstrated that the CF algorithm provides better recommendations when it uses customer data filtered by review consistency and helpfulness. Therefore, e-commerce practitioners should develop recommender systems appropriate to the size of their websites; in other words, even a small online e-commerce website can build an excellent recommender system through the effective integration of its resources.

This study has several limitations. First, this study was conducted using only Amazon review data sets, and generalization of the research results requires further study using data sets from various domains. Second, the algorithm used in the experiment is a traditional CF algorithm commonly used in the study of recommender systems. The authors are uncertain whether other algorithms, such as the recurrent neural network (RNN) or convolutional neural network (CNN), would result in the same findings. Therefore, further research is needed to determine whether the results of this study will hold when various algorithms, such as the CNN and RNN, are used. Finally, this study concludes only that review helpfulness and consistency affect recommendation performance. Future studies could identify various factors that affect recommendation performance using a series of real-world data sets.

Figures

Figure 1. Examples of user-based CF algorithms

Figure 2. Proposed methodology framework

Figure 3. Distributions of (a) review helpfulness and (b) review rating

Figure 4. Comparison of (a) accuracy and (b) diversity according to review helpfulness

Figure 5. Comparison of (a) accuracy and (b) diversity according to the impact of sentiment valence in helpful reviews

Figure 6. Comparison of (a) accuracy and (b) diversity according to the impact of review consistency

Figure 7. Comparison of (a) accuracy and (b) diversity according to the impact of sentiment valence in consistent reviews

Summary of previous studies using CF techniques

Domain | Methodology
Book | Clustering; KNN; Association rule, KNN
Movie | KNN; Clustering, KNN; Clustering
Music | Clustering; Clustering, regression; Clustering, neural network
E-commerce | KNN; Association rule, clustering; Clustering

An example of customer–item feature vector matrices

IURMN
34348513.5
5656658301.8
657796413.7
68437112.1

Notes: I = Item ID; U = customer ID; R = star rating; M = helpfulness score; N = sentiment score

Descriptive statistics of six domain data sets

Data set | Customers | Items | Reviews and ratings | Sparsity ratio (%)
D1: Video games | 24,303 | 10,672 | 231,780 | 99.91
D2: Baby | 19,445 | 7,050 | 170,792 | 99.87
D3: Beauty | 22,363 | 12,101 | 198,502 | 99.92
D4: Health and personal care | 38,609 | 18,534 | 346,355 | 99.95
D5: Cell phones and accessories | 27,879 | 10,429 | 194,439 | 99.93
D6: Sports and outdoors | 35,598 | 18,357 | 296,337 | 99.95

Amazon review composition example

Attribute | Attribute type | Description | Example
Item ID | String | Unique identifier ID for the item | B002I096AA
Total number of votes | Integer | Number of helpfulness votes in total | 17
Number of helpful votes | Integer | Number of helpful votes | 10
Star rating | Integer | A star rating value on a review | 3
Review time | String | An upload time on a review message | Thursday, January 19, 2012, 12 a.m.
Reviewer ID | String | Unique identifier ID for the customer | A1F9Z42CFF9IAY
Reviewer name | String | A reviewer name who wrote a review | T. Tom
Summary headline | String | Title of a review message | There is no sense of urgency in buying a Nintendo 3DS quite yet
Detailed comments | String | Contents of a review message | With no real interesting launch titles in the USA and a short battery life, as well as a few other negatives, there is no sense of urgency in buying a Nintendo 3DS yet for any but the most dedicated gamers.

t-Test for accuracy and diversity of overall and helpful reviews

Data set | Metric type | M (Overall) | M (Helpful) | SD (Overall) | SD (Helpful) | t
D1 (N = 1,457) | Accuracy | 0.003 | 0.010 | 0.020 | 0.038 | −7.170*
D1 (N = 1,457) | Diversity | 0.989 | 0.923 | 0.012 | 0.091 | 28.028*
D2 (N = 647) | Accuracy | 0.002 | 0.006 | 0.015 | 0.027 | −3.706*
D2 (N = 647) | Diversity | 0.986 | 0.819 | 0.018 | 0.174 | 24.506*
D3 (N = 1,136) | Accuracy | 0.017 | 0.017 | 0.054 | 0.052 | −4.181*
D3 (N = 1,136) | Diversity | 0.982 | 0.860 | 0.032 | 0.132 | 31.859*
D4 (N = 1,969) | Accuracy | 0.009 | 0.010 | 0.036 | 0.039 | −3.734*
D4 (N = 1,969) | Diversity | 0.980 | 0.877 | 0.029 | 0.110 | 42.765*
D5 (N = 322) | Accuracy | 0.017 | 0.020 | 0.048 | 0.056 | −3.825*
D5 (N = 322) | Diversity | 0.941 | 0.866 | 0.080 | 0.180 | 7.318*
D6 (N = 1,277) | Accuracy | 0.002 | 0.007 | 0.019 | 0.032 | −5.121*
D6 (N = 1,277) | Diversity | 0.977 | 0.811 | 0.031 | 0.162 | 36.933*

Notes: *p < 0.001; SD = standard deviation, M = mean

t-Test of accuracy and diversity for overall and consistent reviews

Data set | Metric type | M (Overall) | M (Consistent) | SD (Overall) | SD (Consistent) | t
D1 (N = 1,457) | Accuracy | 0.006 | 0.008 | 0.031 | 0.036 | −3.196**
D1 (N = 1,457) | Diversity | 0.983 | 0.964 | 0.020 | 0.040 | 23.160**
D2 (N = 647) | Accuracy | 0.003 | 0.005 | 0.020 | 0.027 | −3.020*
D2 (N = 647) | Diversity | 0.980 | 0.961 | 0.025 | 0.043 | 16.029**
D3 (N = 1,136) | Accuracy | 0.019 | 0.022 | 0.055 | 0.061 | −4.771**
D3 (N = 1,136) | Diversity | 0.973 | 0.957 | 0.038 | 0.054 | 18.980**
D4 (N = 1,969) | Accuracy | 0.008 | 0.008 | 0.036 | 0.036 | −4.565**
D4 (N = 1,969) | Diversity | 0.975 | 0.958 | 0.035 | 0.049 | 25.342**
D5 (N = 322) | Accuracy | 0.026 | 0.028 | 0.070 | 0.072 | −3.683*
D5 (N = 322) | Diversity | 0.944 | 0.926 | 0.063 | 0.070 | 9.116**
D6 (N = 1,277) | Accuracy | 0.007 | 0.010 | 0.037 | 0.040 | −4.603**
D6 (N = 1,277) | Diversity | 0.962 | 0.941 | 0.039 | 0.051 | 26.056**

Notes: *p < 0.05, **p < 0.001; SD = standard deviation, M = mean

ANOVA and the Scheffé multiple comparison test for helpful reviews

Data set | Metric type | Group type | M | SD | F | Scheffé
D1 (N = 1,457) | Accuracy | Helpful (a) | 0.010 | 0.038 | 12.040*** | a > c; c < b
D1 (N = 1,457) | Accuracy | Positive (b) | 0.011 | 0.038
D1 (N = 1,457) | Accuracy | Negative (c) | 0.005 | 0.025
D1 (N = 1,457) | Diversity | Helpful (a) | 0.923 | 0.091 | 374.821*** | a > b > c
D1 (N = 1,457) | Diversity | Positive (b) | 0.902 | 0.112
D1 (N = 1,457) | Diversity | Negative (c) | 0.744 | 0.301
D2 (N = 647) | Accuracy | Helpful (a) | 0.006 | 0.027 | 1.421
D2 (N = 647) | Accuracy | Positive (b) | 0.005 | 0.025
D2 (N = 647) | Accuracy | Negative (c) | 0.004 | 0.021
D2 (N = 647) | Diversity | Helpful (a) | 0.819 | 0.174 | 1.069
D2 (N = 647) | Diversity | Positive (b) | 0.811 | 0.182
D2 (N = 647) | Diversity | Negative (c) | 0.831 | 0.337
D3 (N = 1,136) | Accuracy | Helpful (a) | 0.017 | 0.052 | 24.059*** | a > c; b > c
D3 (N = 1,136) | Accuracy | Positive (b) | 0.016 | 0.050
D3 (N = 1,136) | Accuracy | Negative (c) | 0.006 | 0.026
D3 (N = 1,136) | Diversity | Helpful (a) | 0.860 | 0.132 | 25.000*** | a > c; b > c
D3 (N = 1,136) | Diversity | Positive (b) | 0.852 | 0.138
D3 (N = 1,136) | Diversity | Negative (c) | 0.796 | 0.359
D4 (N = 1,969) | Accuracy | Helpful (a) | 0.010 | 0.039 | 32.879*** | a > c; b > c
D4 (N = 1,969) | Accuracy | Positive (b) | 0.010 | 0.039
D4 (N = 1,969) | Accuracy | Negative (c) | 0.003 | 0.017
D4 (N = 1,969) | Diversity | Helpful (a) | 0.877 | 0.110 | 182.362*** | a > c; b > c
D4 (N = 1,969) | Diversity | Positive (b) | 0.867 | 0.121
D4 (N = 1,969) | Diversity | Negative (c) | 0.751 | 0.366
D5 (N = 322) | Accuracy | Helpful (a) | 0.020 | 0.056 | 10.110*** | a > c; b > c
D5 (N = 322) | Accuracy | Positive (b) | 0.019 | 0.054
D5 (N = 322) | Accuracy | Negative (c) | 0.005 | 0.024
D5 (N = 322) | Diversity | Helpful (a) | 0.866 | 0.180 | 21.160*** | a > c; b > c
D5 (N = 322) | Diversity | Positive (b) | 0.866 | 0.181
D5 (N = 322) | Diversity | Negative (c) | 0.947 | 0.185
D6 (N = 1,277) | Accuracy | Helpful (a) | 0.007 | 0.032 | 18.764*** | a > c; b > c
D6 (N = 1,277) | Accuracy | Positive (b) | 0.006 | 0.032
D6 (N = 1,277) | Accuracy | Negative (c) | 0.001 | 0.012
D6 (N = 1,277) | Diversity | Helpful (a) | 0.811 | 0.162 | 2.315
D6 (N = 1,277) | Diversity | Positive (b) | 0.805 | 0.173
D6 (N = 1,277) | Diversity | Negative (c) | 0.825 | 0.351

Notes: *p < 0.1, **p < 0.05, ***p < 0.001; ANOVA = analysis of variance, SD = standard deviation, M = mean

ANOVA and the Scheffé multiple comparison test for consistent reviews

Data set | Metric type | Group type | M | SD | F | Scheffé
D1 (N = 1,457) | Accuracy | Consistent (a) | 0.008 | 0.036 | 12.889*** | a > c; b > c
D1 (N = 1,457) | Accuracy | Positive (b) | 0.011 | 0.041
D1 (N = 1,457) | Accuracy | Negative (c) | 0.005 | 0.024
D1 (N = 1,457) | Diversity | Consistent (a) | 0.964 | 0.040 | 308.391*** | a > c; b > c
D1 (N = 1,457) | Diversity | Positive (b) | 0.956 | 0.045
D1 (N = 1,457) | Diversity | Negative (c) | 0.833 | 0.264
D2 (N = 647) | Accuracy | Consistent (a) | 0.005 | 0.027 | 0.116
D2 (N = 647) | Accuracy | Positive (b) | 0.005 | 0.026
D2 (N = 647) | Accuracy | Negative (c) | 0.006 | 0.029
D2 (N = 647) | Diversity | Consistent (a) | 0.961 | 0.043 | 56.596*** | a > c; b > c
D2 (N = 647) | Diversity | Positive (b) | 0.960 | 0.043
D2 (N = 647) | Diversity | Negative (c) | 0.872 | 0.290
D3 (N = 1,136) | Accuracy | Consistent (a) | 0.022 | 0.061 | 40.171*** | a > c; b > c
D3 (N = 1,136) | Accuracy | Positive (b) | 0.022 | 0.061
D3 (N = 1,136) | Accuracy | Negative (c) | 0.005 | 0.023
D3 (N = 1,136) | Diversity | Consistent (a) | 0.957 | 0.054 | 139.807*** | a > c; b > c
D3 (N = 1,136) | Diversity | Positive (b) | 0.955 | 0.056
D3 (N = 1,136) | Diversity | Negative (c) | 0.842 | 0.309
D4 (N = 1,969) | Accuracy | Consistent (a) | 0.008 | 0.036 | 4.243**
D4 (N = 1,969) | Accuracy | Positive (b) | 0.009 | 0.037
D4 (N = 1,969) | Accuracy | Negative (c) | 0.006 | 0.027
D4 (N = 1,969) | Diversity | Consistent (a) | 0.958 | 0.049 | 296.135*** | a > c; b > c
D4 (N = 1,969) | Diversity | Positive (b) | 0.957 | 0.051
D4 (N = 1,969) | Diversity | Negative (c) | 0.830 | 0.318
D5 (N = 322) | Accuracy | Consistent (a) | 0.028 | 0.072 | 11.363*** | a > c; b > c
D5 (N = 322) | Accuracy | Positive (b) | 0.028 | 0.071
D5 (N = 322) | Accuracy | Negative (c) | 0.007 | 0.031
D5 (N = 322) | Diversity | Consistent (a) | 0.926 | 0.070 | 1.584
D5 (N = 322) | Diversity | Positive (b) | 0.925 | 0.073
D5 (N = 322) | Diversity | Negative (c) | 0.941 | 0.189
D6 (N = 1,277) | Accuracy | Consistent (a) | 0.010 | 0.040 | 23.600*** | a > c; b > c
D6 (N = 1,277) | Accuracy | Positive (b) | 0.010 | 0.041
D6 (N = 1,277) | Accuracy | Negative (c) | 0.002 | 0.016
D6 (N = 1,277) | Diversity | Consistent (a) | 0.941 | 0.051 | 2.779*
D6 (N = 1,277) | Diversity | Positive (b) | 0.939 | 0.052
D6 (N = 1,277) | Diversity | Negative (c) | 0.929 | 0.230

Notes: *p < 0.1, **p < 0.05, ***p < 0.001; ANOVA = analysis of variance, SD = standard deviation, M = mean

Appendix

References

Acilar, A.M. and Arslan, A. (2009), “A collaborative filtering method based on artificial immune network”, Expert Systems with Applications, Vol. 36 No. 4, pp. 8324-8332.

Adomavicius, G. and Kwon, Y. (2011), “Improving aggregate recommendation diversity using ranking-based techniques”, IEEE Transactions on Knowledge and Data Engineering, Vol. 24 No. 5, pp. 896-911.

Aghakhani, N., Oh, O., Gregg, D.G. and Karimi, J. (2021), “Online review consistency matters: an elaboration likelihood model perspective”, Information Systems Frontiers, Vol. 23 No. 5, pp. 1287-1301.

Bennett, J. and Lanning, S. (2007), “The Netflix Prize”, Proceedings of KDD Cup and Workshop, CiteSeer, Association for Computing Machinery, New York, NY, pp. 35-38.

Bobadilla, J., Ortega, F., Hernando, A. and Gutiérrez, A. (2013), “Recommender systems survey”, Knowledge-Based Systems, Vol. 46, pp. 109-132.

Castells, P., Hurley, N.J. and Vargas, S. (2015), “Novelty and diversity in recommender systems”, in Ricci, F., Rokach, L. and Shapira, B. (Eds), Recommender Systems Handbook, Springer US, Boston, MA, pp. 881-918.

Cheung, M.Y., Luo, C., Sia, C.L. and Chen, H. (2009), “Credibility of electronic word-of-mouth: informational and normative determinants of on-line consumer recommendations”, International Journal of Electronic Commerce, Vol. 13 No. 4, pp. 9-38.

Cho, Y.H., Kim, J.K. and Kim, S.H. (2002), “A personalized recommender system based on web usage mining and decision tree induction”, Expert Systems with Applications, Vol. 23 No. 3, pp. 329-342.

Chung, N., Koo, C. and Kim, J.K. (2014), “Extrinsic and intrinsic motivation for using a booth recommender system service on exhibition attendees' unplanned visit behavior”, Computers in Human Behavior, Vol. 30, pp. 59-68.

Das, A.S., Datar, M., Garg, A. and Rajaram, S. (2007), “Google news personalization: scalable online collaborative filtering”, Proceedings of the 16th International Conference on World Wide Web, Association for Computing Machinery, New York, NY, pp. 271-280.

Dehdarirad, H., Ghazimirsaeid, J. and Jalalimanesh, A. (2020), “Scholarly publication venue recommender systems: a systematic literature review”, Data Technologies and Applications, Vol. 54 No. 2, pp. 169-191.

Dong-Hui, Y. and Guang, Y. (2013), “Bigger data set, better personalized recommendation performance?”, Proceedings of 2013 International Conference on Management Science and Engineering, IEEE, Harbin, China, pp. 28-35.

Du, J., Zheng, L., He, J., Rong, J., Wang, H. and Zhang, Y. (2020), “An interactive network for end-to-end review helpfulness modeling”, Data Science and Engineering, Vol. 5 No. 3, pp. 261-279.

Duong, T.N., Than, V.D., Tran, T.H., Dang, Q.H., Nguyen, D.M. and Pham, H.M. (2018), “An effective similarity measure for neighborhood-based collaborative filtering”, Proceedings of 2018 5th NAFOSTED Conference Information and Computer Science, IEEE, Ho Chi Minh City, Vietnam, pp. 250-254.

Ganu, G., Elhadad, N. and Marian, A. (2009), “Beyond the stars: improving rating predictions using review text content”, WebDB, CiteSeer, pp. 1-6.

García-Cumbreras, M.Á., Montejo-Ráez, A. and Díaz-Galiano, M.C. (2013), “Pessimists and optimists: improving collaborative filtering through sentiment analysis”, Expert Systems with Applications, Vol. 40 No. 17, pp. 6758-6765.

Goldberg, D., Nichols, D., Oki, B.M. and Terry, D. (1992), “Using collaborative filtering to weave an information tapestry”, Communications of the ACM, Vol. 35 No. 12, pp. 61-70.

Gonzales, A.L., Hancock, J.T. and Pennebaker, J.W. (2010), “Language style matching as a predictor of social dynamics in small groups”, Communication Research, Vol. 37 No. 1, pp. 3-19.

Ha, S.H. (2002), “Helping online customers decide through web personalization”, IEEE Intelligent Systems, Vol. 17 No. 6, pp. 34-43.

Ham, J., Lee, K., Kim, T. and Koo, C. (2019), “Subjective perception patterns of online reviews: a comparison of utilitarian and hedonic values”, Information Processing & Management, Vol. 56 No. 4, pp. 1439-1456.

He, R. and McAuley, J. (2016), “Ups and downs: modeling the visual evolution of fashion trends with one-class collaborative filtering”, Proceedings of the 25th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, pp. 507-517.

He, X., Liao, L., Zhang, H., Nie, L., Hu, X. and Chua, T.-S. (2017), “Neural collaborative filtering”, Proceedings of the 26th International Conference on World Wide Web, International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, pp. 173-182.

Herlocker, J.L., Konstan, J.A., Terveen, L.G. and Riedl, J.T. (2004), “Evaluating collaborative filtering recommender systems”, ACM Transactions on Information Systems (TOIS), Vol. 22 No. 1, pp. 5-53.

Hlee, S., Lee, J., Yang, S.-B. and Koo, C. (2019), “The moderating effect of restaurant type on hedonic versus utilitarian review evaluations”, International Journal of Hospitality Management, Vol. 77, pp. 195-206.

Hu, N., Koh, N.S. and Reddy, S.K. (2014), “Ratings lead you to the product, reviews help you clinch it? The mediating role of online review sentiments on product sales”, Decision Support Systems, Vol. 57, pp. 42-53.

Hug, N. (2020), “Surprise: a Python library for recommender systems”, Journal of Open Source Software, Vol. 5 No. 52, p. 2174.

Hurley, N. and Zhang, M. (2011), “Novelty and diversity in top-n recommendation-analysis and evaluation”, ACM Transactions on Internet Technology (TOIT), Vol. 10 No. 4, pp. 1-30.

Hyun, J., Ryu, S. and Lee, S.-Y.T. (2019), “How to improve the accuracy of recommendation systems: combining ratings and review texts sentiment scores”, Journal of Intelligence and Information Systems, Vol. 25 No. 1, pp. 219-239.

Jeon, B. and Ahn, H. (2015), “A collaborative filtering system combined with users' review mining: application to the recommendation of smartphone apps”, Journal of Intelligence and Information Systems, Vol. 21 No. 2, pp. 1-18.

Khodabandehlou, S., Golpayegani, S.A.H. and Rahman, M.Z. (2020), “An effective recommender system based on personality traits, demographics and behavior of customers in time context”, Data Technologies and Applications, Vol. 55 No. 1, pp. 149-174.

Kim, H.K., Kim, J.K. and Ryu, Y.U. (2009), “Personalized recommendation over a customer network for ubiquitous shopping”, IEEE Transactions on Services Computing, Vol. 2 No. 2, pp. 140-151.

Kim, H.K., Ryu, Y.U., Cho, Y. and Kim, J.K. (2012), “Customer-driven content recommendation over a network of customers”, IEEE Transactions on Systems, Man, and Cybernetics-Part A: Systems and Humans, Vol. 42 No. 1, pp. 48-56.

Kim, H.-N., Ji, A.-T., Ha, I. and Jo, G.-S. (2010a), “Collaborative filtering based on collaborative tagging for enhancing the quality of recommendation”, Electronic Commerce Research and Applications, Vol. 9 No. 1, pp. 73-83.

Kim, J., Choi, I. and Li, Q. (2021), “Customer satisfaction of recommender system: examining accuracy and diversity in several types of recommendation approaches”, Sustainability, Vol. 13 No. 11, p. 6165.

Kim, J.K., Cho, Y.H., Kim, W.J., Kim, J.R. and Suh, J.H. (2002), “A personalized recommendation procedure for internet shopping support”, Electronic Commerce Research and Applications, Vol. 1 No. 3-4, pp. 301-313.

Kim, J.K., Kim, H.K., Oh, H.Y. and Ryu, Y.U. (2010b), “A group recommendation system for online communities”, International Journal of Information Management, Vol. 30 No. 3, pp. 212-219.

Kim, K.-J. and Ahn, H. (2008), “A recommender system using GA K-means clustering in an online shopping market”, Expert Systems with Applications, Vol. 34 No. 2, pp. 1200-1209.

Krishnamoorthy, S. (2015), “Linguistic features for review helpfulness prediction”, Expert Systems with Applications, Vol. 42 No. 7, pp. 3751-3759.

Lee, D. and Hosanagar, K. (2019), “How do recommender systems affect sales diversity? A cross-category investigation via randomized field experiment”, Information Systems Research, Vol. 30 No. 1, pp. 239-259.

Lee, M., Jeong, M. and Lee, J. (2017), “Roles of negative emotions in customers' perceived helpfulness of hotel reviews on a user-generated review website: a text mining approach”, International Journal of Contemporary Hospitality Management, Vol. 29 No. 2, pp. 762-783.

Leung, C.W., Chan, S.C. and Chung, F.-L. (2006), “Integrating collaborative filtering and sentiment analysis: a rating inference approach”, Proceedings of the ECAI 2006 Workshop on Recommender Systems, CiteSeer, Riva del Garda, Italy, pp. 62-66.

Li, Q., Li, X., Lee, B. and Kim, J. (2021), “A hybrid CNN-based review helpfulness filtering model for improving e-commerce recommendation service”, Applied Sciences, Vol. 11 No. 18, p. 8613.

Linden, G., Smith, B. and York, J. (2003), “Amazon.com recommendations: item-to-item collaborative filtering”, IEEE Internet Computing, Vol. 7 No. 1, pp. 76-80.

Liu, H., He, J., Wang, T., Song, W. and Du, X. (2013), “Combining user preferences and user opinions for accurate recommendation”, Electronic Commerce Research and Applications, Vol. 12 No. 1, pp. 14-23.

Liu, X.F., Tse, C.K. and Small, M. (2010), “Complex network structure of musical compositions: algorithmic generation of appealing music”, Physica A: Statistical Mechanics and Its Applications, Vol. 389 No. 1, pp. 126-132.

McAuley, J., Targett, C., Shi, Q. and Van Den Hengel, A. (2015), “Image-based recommendations on styles and substitutes”, Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Association for Computing Machinery, New York, NY, pp. 43-52.

McSherry, D. (2003), “Balancing user satisfaction and cognitive load in coverage-optimised retrieval”, Proceedings of 23rd International Conference on Innovative Techniques and Applications of Artificial Intelligence, Springer, London, pp. 381-394.

Mitra, S. and Jenamani, M. (2021), “Helpfulness of online consumer reviews: a multi-perspective approach”, Information Processing & Management, Vol. 58 No. 3, p. 102538.

Moon, H.S., Ryu, Y.U. and Kim, J.K. (2019), “Enhanced collaborative filtering: a product life cycle approach”, Journal of Electronic Commerce Research, Vol. 20 No. 3, pp. 155-168.

Murty, M.N. and Jain, A.K. (1995), “Knowledge-based clustering scheme for collection management and retrieval of library books”, Pattern Recognition, Vol. 28 No. 7, pp. 949-963.

Ortega, F., Hernando, A., Bobadilla, J. and Kang, J.H. (2016), “Recommending items to group of users using matrix factorization based collaborative filtering”, Information Sciences, Vol. 345, pp. 313-324.

Panniello, U., Tuzhilin, A. and Gorgoglione, M. (2014), “Comparing context-aware recommender systems in terms of accuracy and diversity”, User Modeling and User-Adapted Interaction, Vol. 24 No. 1-2, pp. 35-65.

Park, D.H., Kim, H.K., Choi, I.Y. and Kim, J.K. (2012), “A literature review and classification of recommender systems research”, Expert Systems with Applications, Vol. 39 No. 11, pp. 10059-10072.

Peretz, I. (1989), “Clustering in music: an appraisal of task factors”, International Journal of Psychology, Vol. 24 No. 1-5, pp. 157-178.

Ren, G. and Hong, T. (2019), “Examining the relationship between specific negative emotions and the perceived helpfulness of online reviews”, Information Processing & Management, Vol. 56 No. 4, pp. 1425-1438.

Ricci, F., Rokach, L. and Shapira, B. (2011), “Introduction to recommender systems handbook”, in Kantor, P. (Ed.), Recommender Systems Handbook, Springer, Boston, MA, pp. 1-35.

Roy, G., Datta, B. and Mukherjee, S. (2019), “Role of electronic word-of-mouth content and valence in influencing online purchase behavior”, Journal of Marketing Communications, Vol. 25 No. 6, pp. 661-684.

Sarwar, B., Karypis, G., Konstan, J. and Riedl, J. (2001), “Item-based collaborative filtering recommendation algorithms”, Proceedings of the 10th International Conference on World Wide Web, Association for Computing Machinery, New York, NY, pp. 285-295.

Sarwar, B.M., Karypis, G., Konstan, J. and Riedl, J. (2002), “Recommender systems for large-scale e-commerce: scalable neighborhood formation using clustering”, Proceedings of the Fifth International Conference on Computer and Information Technology, CiteSeer, pp. 291-324.

Smith, B. and Linden, G. (2017), “Two decades of recommender systems at Amazon.com”, IEEE Internet Computing, Vol. 21 No. 3, pp. 12-18.

Smyth, B. and McClave, P. (2001), “Similarity vs. diversity”, Proceedings of 4th International Conference on Case-Based Reasoning, Springer, Berlin, Heidelberg, pp. 347-361.

Su, X. and Khoshgoftaar, T.M. (2009), “A survey of collaborative filtering techniques”, Advances in Artificial Intelligence, Vol. 2009, Article ID 421425.

Subramaniyaswamy, V. and Logesh, R. (2017), “Adaptive KNN based recommender system through mining of user preferences”, Wireless Personal Communications, Vol. 97 No. 2, pp. 2229-2247.

Tsai, W.-H., Rodgers, D. and Wang, H.-M. (2004), “Blind clustering of popular music recordings based on singer voice characteristics”, Computer Music Journal, Vol. 28 No. 3, pp. 68-78.

Ungar, L.H. and Foster, D.P. (1998), “Clustering methods for collaborative filtering”, AAAI Workshop on Recommendation Systems, Menlo Park, CA, pp. 114-129.

Wang, C., Chen, G. and Wei, Q. (2018), “A temporal consistency method for online review ranking”, Knowledge-Based Systems, Vol. 143, pp. 259-270.

Weng, S.-S. and Liu, M.-J. (2004), “Feature-based recommendations for one-to-one marketing”, Expert Systems with Applications, Vol. 26 No. 4, pp. 493-508.

Yang, S., Yao, J. and Qazi, A. (2020), “Does the review deserve more helpfulness when its title resembles the content? Locating helpful reviews by text mining”, Information Processing & Management, Vol. 57 No. 2, p. 102179.

Zeng, C., Xing, C.-X., Zhou, L.-Z. and Zheng, X.-H. (2004), “Similarity measure and instance selection for collaborative filtering”, International Journal of Electronic Commerce, Vol. 8 No. 4, pp. 115-129.

Zhang, R., Tran, T. and Mao, Y. (2012), “Opinion helpfulness prediction in the presence of ‘words of few mouths’”, World Wide Web, Vol. 15 No. 2, pp. 117-138.

Zhang, Z., Zhang, D. and Lai, J. (2014), “urCF: user review enhanced collaborative filtering”.

Zhou, L. and Chaovalit, P. (2008), “Ontology‐supported polarity mining”, Journal of the American Society for Information Science and Technology, Vol. 59 No. 1, pp. 98-110.

Zhou, T., Kuscsik, Z., Liu, J.G., Medo, M., Wakeling, J.R. and Zhang, Y.C. (2010), “Solving the apparent diversity-accuracy dilemma of recommender systems”, Proceedings of the National Academy of Sciences of the United States of America, Vol. 107 No. 10, pp. 4511-4515.

Zhou, Y. and Yang, S. (2019), “Roles of review numerical and textual characteristics on review helpfulness across three different types of reviews”, IEEE Access, Vol. 7, pp. 27769-27780.

Zhu, X., Shi, Y.-Y., Kim, H.-G. and Eom, K.-W. (2006), “An integrated music recommendation system”, IEEE Transactions on Consumer Electronics, Vol. 52 No. 3, pp. 917-925.

Acknowledgements

Funding: This research was supported by the Industrial Technology Innovation Program (20009050) and the Ministry of Trade, Industry & Energy (MOTIE, Korea).

Corresponding author

Qinglong Li can be contacted at: leecy@khu.ac.kr
