Hei Chia Wang, Yu Hung Chiang and Yi Feng Sun
Abstract
Purpose
This paper aims to improve a sentiment analysis (SA) system to help users (i.e. customers or hotel managers) understand hotel evaluations. There are three main purposes in this paper: designing an unsupervised method for extracting online Chinese features and opinion pairs, distinguishing different intensities of polarity in opinion words and examining the changes in polarity in the time series.
Design/methodology/approach
In this paper, a review analysis system is proposed to automatically capture the feature opinions of other tourists as presented in the review documents. In the system, a feature-level SA is designed to determine the polarity of these features. Moreover, an unsupervised method that uses a part-of-speech pattern clarification query and multi-lexicon SA is adopted to summarize all Chinese reviews.
Findings
The authors expect this method to help travellers search for what they want and make decisions more efficiently. The experimental results show the F-measure of the proposed method to be 0.628. It thus outperforms the methods used in previous studies.
Originality/value
The study is useful for travellers who want to quickly retrieve and summarize helpful information from the pool of messy hotel reviews. Meanwhile, the system will assist hotel managers to comprehensively understand service qualities with which guests are satisfied or dissatisfied.
Hei Chia Wang, Yu Hung Chiang and Yen Tzu Huang
Abstract
Purpose
In academic work, it is important to identify a specific domain of research. Many researchers may look to conference issues to determine interesting or new topics. Furthermore, conference issues can help researchers identify current research trends in their field and learn about cutting-edge developments in their area of specialization. However, so much conference information is published online that it can be difficult to navigate and analyze in a meaningful or productive way. Hence, the use of knowledge management (KM) could be a way to resolve these issues. In KM, ontology is widely adopted, but most ontology construction methods do not consider social information between target users. Therefore, this study aims to propose a novel method of constructing research topic maps using an open directory project (ODP) and social information.
Design/methodology/approach
The approach is to incorporate conference information (i.e. title, keywords and abstract) as sources and to consider the ways in which social information automatically produces research topic maps. The methodology can be divided into four modules: data collection, element extraction, social information analysis and visualization. The data collection module collects the required conference data from the internet and performs pre-processing. Then, the element extraction module extracts topics, associations and other basic elements of topic maps while considering social information. Finally, the results will be shown in the visualization module for researchers to browse and search.
Findings
This study yields three main findings. First, creating topic maps with the ODP category information can help capture a richer set of classification associations. Second, social information should be considered when constructing topic maps. This study includes the relationships among different authors and topics to support information in social networks. By considering social information, such as co-authorship/collaboration, this method helps researchers find research topics that are unfamiliar but interesting, or potential opportunities for future cooperation. Third, this study presents topic maps that show a clear and simple pathway through the domain knowledge of interest.
Research limitations/implications
First, this study collects and analyzes conference information, including the titles, keywords and abstracts of conference papers, so the data set must include all of this information. Second, the social information analysis covers only co-authorship associations (collabship associations); other social information could be extracted in future studies. Third, this study only analyzes the associations between topics. The intensity of associations is not discussed.
Originality/value
The study will have a great impact on learned societies because it bridges the gap between theory and practice. The study is useful for researchers who want to know which conferences are related to their research. Moreover, social networks can help researchers expand and diversify their research.
Hei-Chia Wang, Army Justitia and Ching-Wen Wang
Abstract
Purpose
The explosion of data due to the sophistication of information and communication technology makes it simple for prospective tourists to learn about previous hotel guests' experiences. These tourists prioritize the rating score when selecting a hotel. However, rating scores are less reliable for suggesting a personalized preference for each aspect, especially when only a limited number of ratings is available. This study aims to recommend ratings and personalized preference hotels using cross-domain and aspect-based features.
Design/methodology/approach
We propose an aspect-based cross-domain personalized recommendation (AsCDPR), a novel framework for rating prediction and personalized customer preference recommendations. We incorporate a cross-domain personalized approach and aspect-based features of items from the review text. We extracted aspect-based feature vectors from two domains using bidirectional long short-term memory and then mapped them by a multilayer perceptron (MLP). The cross-domain recommendation module trains MLP to analyze sentiment and predict item ratings and the polarities of the aspect based on user preferences.
Findings
Expanded by their synonyms, aspect-based features significantly improve the performance of sentiment analysis on the accuracy and F1-score metrics. AsCDPR outperforms matrix factorization, collaborative matrix factorization, EMCDPR and Personalized transfer of user preferences for cross-domain recommendation, with relatively low mean absolute error and root mean square error values of 1.3657 and 1.6682, respectively.
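The two error measures named in these findings, mean absolute error (MAE) and root mean square error (RMSE), can be illustrated with a minimal sketch. The ratings below are hypothetical and are not taken from the study; the function names are chosen for illustration only.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the average squared difference; penalises large errors more."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return math.sqrt(squared / len(actual))


# Hypothetical hotel ratings on a 1-5 scale (illustration only).
actual_ratings = [5, 3, 4, 1]
predicted_ratings = [4, 3, 2, 2]

mae = mean_absolute_error(actual_ratings, predicted_ratings)
rmse = root_mean_square_error(actual_ratings, predicted_ratings)
```

RMSE is never smaller than MAE on the same data, which is consistent with the two values reported above.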
Research limitations/implications
This study helps users by recommending hotels based on their priority preferences. Users do not need to read other people's reviews to capture the key aspects of items. This model could enhance system reliability in the hospitality industry by providing personalized recommendations.
Originality/value
This study introduces a new approach that embeds aspect-based features of items in a cross-domain personalized recommendation. AsCDPR predicts ratings and provides recommendations based on priority aspects of each user's preferences.
Tai-Chia Huang, Chia-Hsuan Hsieh and Hei-Chia Wang
Abstract
Purpose
Producing meeting documents requires an instantaneous recorder during meetings, which costs extra human resources and takes time to amend the file. However, a high-quality meeting document can enable users to recall the meeting content efficiently. The paper aims to discuss these issues.
Design/methodology/approach
An application based on this framework is developed to help the users find topics and obtain summarizations of meeting contents without extra effort. This app uses the Bluemix speech recognizer to obtain speech transcripts. It then combines latent Dirichlet allocation and a TextTiling algorithm with the speech script of meetings to detect boundaries between different topics and evaluate the topics in each segment. TextTeaser, an open API based on a feature-based approach, is then used to summarize the speech transcripts.
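The idea of detecting boundaries between topics in a transcript can be sketched as follows. This is a deliberately simplified lexical-cohesion check in the spirit of TextTiling, not the full algorithm and not the authors' implementation: it compares adjacent sentences by cosine similarity of their word counts and marks a boundary where similarity falls below a threshold. The function names, the threshold value and the sample sentences are all assumptions made for illustration.

```python
import math
from collections import Counter


def cosine(bag_a, bag_b):
    """Cosine similarity between two word-count bags."""
    dot = sum(count * bag_b[word] for word, count in bag_a.items() if word in bag_b)
    norm_a = math.sqrt(sum(c * c for c in bag_a.values()))
    norm_b = math.sqrt(sum(c * c for c in bag_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def topic_boundaries(sentences, threshold=0.1):
    """Return indices i such that a new topic is assumed to start at sentences[i]."""
    bags = [Counter(sentence.lower().split()) for sentence in sentences]
    return [
        i + 1
        for i in range(len(bags) - 1)
        if cosine(bags[i], bags[i + 1]) < threshold  # low lexical overlap
    ]


# Hypothetical meeting transcript split into sentences (illustration only).
transcript = [
    "budget review for the new budget",
    "the budget needs approval",
    "hiring plan for engineers",
    "engineers hiring starts soon",
]
boundaries = topic_boundaries(transcript)
```

In this toy transcript, the second and third sentences share no words, so the sketch places a single boundary before the third sentence.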
Findings
The results indicate that the machine-generated summaries are 85 percent similar to the records written by humans.
Originality/value
To reduce the human effort in generating meeting reports, this paper presents a framework to record and analyze meeting contents automatically by voice recognition, topic detection, and extractive summarization.
Hei-Chia Wang, Che-Tsung Yang and Yi-Hao Yen
Abstract
Purpose
Community question answering (CQA) websites provide an open and free way to share knowledge about general topics on the internet. However, inquirers may not obtain useful answers and those who are qualified to provide answers may also miss opportunities to share their expertise without any notice. To address this problem, the purpose of this paper is to provide the means for inquirers to access archived answers and to identify effective subject matter experts for target questions.
Design/methodology/approach
This paper presents a question answering promoter, called QAP, for the CQA services. The proposed QAP facilitates the use of filtered archived answers regarded as explicit knowledge and recommended experts regarded as sources of implicit knowledge for the given target questions.
Findings
The experimental results indicate that QAP can leverage knowledge sharing by refining archived answers based on credibility and distributing raised questions to qualified potential experts.
Research limitations/implications
This proposed method is designed for the traditional Chinese corpus.
Originality/value
This paper proposes an integrated framework of answer selection and expert finding that uses the bottom-up multipath evaluation algorithm, an underlying voting model, the agglomerative hierarchical clustering technique and feature approaches for measuring answer trustworthiness, identifying satisfied learners and assessing the credibility of repliers. Experiments using the corpus crawled from Yahoo! Knowledge Plus are conducted under designed scenarios, and the results are shown in fine detail.
Hei-Chia Wang, Martinus Maslim and Hung-Yu Liu
Abstract
Purpose
Clickbait is a deceptive headline designed to boost ad revenue without presenting closely relevant content. There are numerous negative repercussions of clickbait, such as causing viewers to feel tricked and unhappy, causing long-term confusion, and even attracting cyber criminals. Automatic detection algorithms for clickbait have been developed to address this issue. However, existing clickbait detection technologies are limited by having only one semantic representation for the same term and by the limited dataset available in Chinese. This study aims to solve the limitations of automated clickbait detection in the Chinese dataset.
Design/methodology/approach
This study combines both to train the model to capture the probable relationship between clickbait news headlines and news content. In addition, part-of-speech elements are used to generate the most appropriate semantic representation for clickbait detection, improving clickbait detection performance.
Findings
This research successfully compiled a dataset containing up to 20,896 Chinese clickbait news articles. This collection contains news headlines, articles, categories and supplementary metadata. The suggested context-aware clickbait detection (CA-CD) model outperforms existing clickbait detection approaches on many criteria, demonstrating the proposed strategy's efficacy.
Originality/value
The originality of this study resides in the newly compiled Chinese clickbait dataset and contextual semantic representation-based clickbait detection approach employing transfer learning. This method can modify the semantic representation of each word based on context and assist the model in more precisely interpreting the original meaning of news articles.
Hei Chia Wang, Yu Hung Chiang and Si Ting Lin
Abstract
Purpose
In community question and answer (CQA) services, because of user subjectivity and the limits of knowledge, the distribution of answer quality can vary drastically – from highly related to irrelevant or even spam answers. Previous studies of CQA portals have faced two important issues: answer quality analysis and spam answer filtering. Therefore, the purposes of this study are to filter spam answers in advance using two-phase identification methods and then automatically classify the different types of question and answer (QA) pairs by deep learning. Finally, this study proposes a comprehensive study of answer quality prediction for different types of QA pairs.
Design/methodology/approach
This study proposes an integrated model with a two-phase identification method that filters spam answers in advance and uses a deep learning method [recurrent convolutional neural network (R-CNN)] to automatically classify various types of questions. Logistic regression (LR) is further applied to examine which answer quality features significantly indicate high-quality answers to different types of questions.
Findings
There are four prominent findings. (1) This study confirms that conducting spam filtering before an answer quality analysis can reduce the proportion of high-quality answers that are misjudged as spam answers. (2) The experimental results show that answer quality is better when question types are included. (3) The analysis results for different classifiers show that the R-CNN achieves the best macro-F1 scores (74.8%) in the question type classification module. (4) Finally, the experimental results by LR show that author ranking, answer length and common words could significantly impact answer quality for different types of questions.
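The macro-F1 score mentioned in finding (3) averages the per-class F1 scores so that every question type counts equally, regardless of how many examples it has. The sketch below shows the standard computation; the labels and predictions are hypothetical and unrelated to the study's data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        if precision + recall:
            f1_scores.append(2 * precision * recall / (precision + recall))
        else:
            f1_scores.append(0.0)
    return sum(f1_scores) / len(f1_scores)


# Hypothetical question-type labels (illustration only).
true_types = ["fact", "fact", "opinion", "opinion", "howto"]
predicted_types = ["fact", "opinion", "opinion", "opinion", "howto"]

score = macro_f1(true_types, predicted_types)
```

Because each class contributes equally, a classifier that performs poorly on a rare question type is penalised more under macro-F1 than under plain accuracy.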
Originality/value
The proposed system is simultaneously able to detect spam answers and provide users with quick and efficient retrieval mechanisms for high-quality answers to different types of questions in CQA. Moreover, this study further validates that crucial features exist among the different types of questions that can impact answer quality. Overall, an identification system automatically summarises high-quality answers for each type of question from the pool of messy answers in CQA, which can be very useful in helping users make decisions.
Hei‐Chia Wang, Ya‐lin Chou and Jiunn‐Liang Guo
Abstract
Purpose
The paper's aim is to propose a core journal decision method, called the local impact factor (LIF), which can evaluate the requirements of the local user community by combining both the access rate and the weighted impact factor, and by tracking citation information on the local users' articles.
Design/methodology/approach
Many institutions with a limited budget can subscribe only to the most valuable journals for their users. The importance of a journal to a local community can be calculated in many ways. This paper takes both global and local access frequency and journal citations into consideration. The method of weighted web page link analysis is adopted.
Findings
This paper finds that the weighted page rank may be used efficiently in core journal decisions. Experimental results demonstrate that the proposed LIF suggests journals to local users more effectively than existing methods (i.e. impact factor or the local journal rank).
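For readers unfamiliar with weighted link analysis, the following is a minimal sketch of a generic weighted PageRank computed by power iteration. It is not the LIF formula from the paper: the edge weights, the damping factor of 0.85, the journal names and the function name are all assumptions made for illustration.

```python
def weighted_pagerank(edges, damping=0.85, iterations=100):
    """Rank nodes from weighted links.

    edges maps (source, target) pairs to positive weights,
    e.g. the number of citations from one journal to another.
    """
    nodes = sorted({node for edge in edges for node in edge})
    out_weight = {node: 0.0 for node in nodes}
    for (source, _), weight in edges.items():
        out_weight[source] += weight

    rank = {node: 1.0 / len(nodes) for node in nodes}
    for _ in range(iterations):
        # Rank held by nodes without outgoing links is spread evenly.
        dangling = sum(rank[n] for n in nodes if out_weight[n] == 0.0)
        base = (1.0 - damping) / len(nodes) + damping * dangling / len(nodes)
        new_rank = {node: base for node in nodes}
        for (source, target), weight in edges.items():
            # Each source passes on rank in proportion to the link weight.
            new_rank[target] += damping * rank[source] * weight / out_weight[source]
        rank = new_rank
    return rank


# Hypothetical citation counts between three journals (illustration only).
citations = {("A", "B"): 3, ("A", "C"): 1, ("B", "C"): 2, ("C", "A"): 1}
ranks = weighted_pagerank(citations)
```

The weights only change how a node's rank is split among its outgoing links; the ranks still sum to one, so they can be read as relative importance within the chosen set of journals.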
Research limitations/implications
This research requires the determination of the thesis scores, which needs authorisation from the authors. If the scores are not available, they may be subjectively assigned or retrieved from other resources.
Practical implications
A case study at National Cheng Kung University was conducted to show that the LIF can be used to help library managers evaluate the real demands of local community users.
Originality/value
Unlike existing research, this paper focuses on the utilisation and requirements of local community users and also finds the contributions of citation information to be significant and critical.
Jiunn-Liang Guo, Hei-Chia Wang and Ming-Way Lai
Abstract
Purpose
The purpose of this paper is to develop a novel feature selection approach for automatic text classification of large digital documents, namely e-books in an online library system. The main idea is to automatically identify the discourse features in order to improve the feature selection process, rather than focussing on the size of the corpus.
Design/methodology/approach
The proposed framework intends to automatically identify the discourse segments within e-books, capture proper discourse subtopics that are cohesively expressed in discourse segments and treat these subtopics as informative and prominent features. The selected set of features is then used to train and perform the e-book classification task based on the support vector machine technique.
Findings
The evaluation of the proposed framework shows that identifying discourse segments and capturing subtopic features leads to better performance, in comparison with two conventional feature selection techniques: TFIDF and mutual information. It also demonstrates that discourse features play important roles among textual features, especially for large documents such as e-books.
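As background on one of the two baselines named above, the sketch below computes a common variant of TF-IDF term weights (term frequency multiplied by the natural log of the inverse document frequency). The paper does not state which variant it used, so this formulation, the function name and the toy documents are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {term: weight} dict per tokenised document."""
    n_docs = len(documents)
    # Number of documents in which each term appears at least once.
    doc_freq = Counter(term for doc in documents for term in set(doc))
    weights = []
    for doc in documents:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical tokenised e-book snippets (illustration only).
books = [
    ["library", "catalogue", "library"],
    ["library", "novel"],
    ["library", "physics", "novel"],
]
weights = tf_idf(books)
```

A term that occurs in every document, such as "library" here, receives a weight of zero, which is why TF-IDF favours terms that discriminate between documents.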
Research limitations/implications
Automatically extracted subtopic features cannot be entered directly into the feature selection (FS) process; a threshold must be controlled.
Practical implications
The proposed technique has demonstrated the promise of using discourse analysis to enhance the classification of large digital documents such as e-books, compared with conventional techniques.
Originality/value
A new FS technique is proposed that can inspect the narrative structure of large documents, and it is new to the text classification domain. The other contribution is that it encourages the consideration of discourse information in future text analysis by providing more evidence through the evaluation of the results. The proposed system can be integrated into other library management systems.
Joana Baleeiro Passos, Daisy Valle Enrique, Camila Costa Dutra and Carla Schwengber ten Caten
Abstract
Purpose
The innovation process demands an interaction between environment agents, knowledge generators and policies of incentive for innovation and not only development by companies. Universities have gradually become the core of the knowledge production system and, therefore, their role regarding innovation has become more important and diversified. This study is aimed at identifying the mechanisms of university–industry (U–I) collaboration, as well as the operationalization steps of the U–I collaboration process.
Design/methodology/approach
This study is aimed at identifying, based on a systematic literature review, the mechanisms of university–industry (U–I) collaboration, as well as the operationalization steps of the U–I collaboration process.
Findings
The analysis of the 72 selected articles enabled identifying 15 mechanisms of U–I collaboration, proposing a new classification for such mechanisms and developing a framework presenting the operationalization steps of the interaction process.
Originality/value
In this paper, the authors screened nearly 1,500 papers and analyzed in detail 86 papers addressing U–I collaboration, mechanisms of U–I collaboration and operationalization steps of the U–I collaboration process. This paper provides a new classification for such mechanisms and develops a framework presenting the operationalization steps of the interaction process. This research contributes to both theory and practice by highlighting managerial aspects and stimulating academic research on such a timely topic.