Abstract
Purpose
Automatic text categorization has applications in several domains, for example e‐mail spam detection, sexual content filtering, directory maintenance, and focused crawling, among others. Most information retrieval systems contain several components which use text categorization methods. One of the first text categorization methods was designed using a naïve Bayes representation of the text, and a number of variations of naïve Bayes have since been discussed. The purpose of this paper is to evaluate naïve Bayes approaches to text categorization, introducing new competitive extensions to previous approaches.
Design/methodology/approach
The paper focuses on introducing a new Bayesian text categorization method based on an extension of the naïve Bayes approach. Some modifications to document representations are introduced based on the well‐known BM25 text information retrieval method. The performance of the method is compared to several extensions of naïve Bayes using benchmark datasets designed for this purpose. The method is compared also to training‐based methods such as support vector machines and logistic regression.
Findings
The proposed text categorizer outperforms state‐of‐the‐art methods without introducing new computational costs. It also achieves performance very similar to that of more complex methods based on criterion function optimization, such as support vector machines or logistic regression.
Practical implications
The proposed method scales well regarding the size of the collection involved. The presented results demonstrate the efficiency and effectiveness of the approach.
Originality/value
The paper introduces a novel naïve Bayes text categorization approach based on the well‐known BM25 information retrieval model, which offers a set of good properties for this problem.
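To make the core idea concrete, the following minimal sketch (in Python, assuming scikit-learn and NumPy are available; the toy corpus and the k1/b parameter values are hypothetical) feeds BM25-style term weights into a multinomial naïve Bayes classifier. It illustrates the general technique, not the authors' exact formulation.

    # Sketch: multinomial naive Bayes over BM25-weighted term frequencies.
    # Assumptions: generic BM25 weighting (k1, b); toy spam/ham corpus.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    docs = ["cheap pills buy now", "meeting agenda attached", "buy cheap meds"]
    labels = [1, 0, 1]  # 1 = spam, 0 = ham

    counts = CountVectorizer().fit_transform(docs).toarray().astype(float)

    k1, b = 1.2, 0.75
    doc_len = counts.sum(axis=1, keepdims=True)
    avg_len = doc_len.mean()
    df = (counts > 0).sum(axis=0)
    idf = np.log((len(docs) - df + 0.5) / (df + 0.5) + 1.0)

    # The BM25 weight replaces the raw term frequency in each document vector.
    bm25 = idf * counts * (k1 + 1) / (counts + k1 * (1 - b + b * doc_len / avg_len))

    clf = MultinomialNB().fit(bm25, labels)
    print(clf.predict(bm25))  # sanity check on the training documents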
V. Srilakshmi, K. Anuradha and C. Shoba Bindu
Abstract
Purpose
This paper aims to model a technique that categorizes the texts from huge documents. The progression in internet technologies has raised the count of documents accessible, and thus the documents available online have become countless. These text documents comprise research articles, journal papers, newspapers, technical reports and blogs. Such large documents are useful and valuable for processing real-time applications and are also used in several retrieval methods. Text classification plays a vital role in information retrieval technologies and is considered an active field for processing massive applications. The aim of text classification is to categorize large-sized documents into different categories on the basis of their contents. Numerous text-related tasks exist, such as profiling users, sentiment analysis and spam identification, each of which is considered a supervised learning problem and is addressed with a text classifier.
Design/methodology/approach
At first, the input documents are pre-processed using stop word removal and stemming so that the input is made effective and suitable for feature extraction. In the feature extraction process, features are extracted using the vector space model (VSM), and feature selection is then performed to select the most relevant features for text categorization. Once the features are selected, text categorization is carried out using a deep belief network (DBN). The training of the DBN is performed using the proposed grasshopper crow optimization algorithm (GCOA), which integrates the grasshopper optimization algorithm (GOA) and the crow search algorithm (CSA). Moreover, a hybrid weight bounding model is devised using the proposed GCOA and range degree. Thus, the proposed GCOA + DBN is used for classifying the text documents.
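The pre-processing and VSM steps described above can be sketched as follows (assuming scikit-learn and NLTK are available; the documents are toy examples, and the DBN classifier with GCOA training is beyond the scope of a short example):

    # Sketch of the pre-processing and vector space model (VSM) steps only.
    from nltk.stem import PorterStemmer
    from sklearn.feature_extraction.text import TfidfVectorizer

    stemmer = PorterStemmer()

    def preprocess(text):
        # Stemming; stop word removal is delegated to the vectorizer below.
        return " ".join(stemmer.stem(tok) for tok in text.lower().split())

    docs = ["Deep networks classify long documents",
            "Stemming reduces words to their roots"]
    vsm = TfidfVectorizer(stop_words="english")
    features = vsm.fit_transform(preprocess(d) for d in docs)
    print(features.shape)  # (n_documents, n_terms): input to feature selection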
Findings
The performance of the proposed technique is evaluated using accuracy, precision and recall, and is compared with existing techniques such as naive Bayes, k-nearest neighbors, support vector machine, deep convolutional neural network (DCNN) and Stochastic Gradient-CAViaR + DCNN. The proposed GCOA + DBN shows improved performance, with values of 0.959, 0.959 and 0.96 for precision, recall and accuracy, respectively.
Originality/value
This paper proposes a technique that categorizes the texts from massive documents. The findings show that the proposed GCOA-based DBN effectively classifies text documents.
N. Venkata Sailaja, L. Padmasree and N. Mangathayaru
Abstract
Purpose
Text mining has been used for various knowledge discovery based applications, and thus a lot of research has been contributed towards it. The latest research trend in text mining is the adoption of incremental learning, as it is economical when dealing with large volumes of information.
Design/methodology/approach
The primary intention of this research is to design and develop a technique for incremental text categorization using an optimized Support Vector Neural Network (SVNN). The proposed technique involves four major steps: pre-processing, feature extraction, feature selection and classification. Initially, the data is pre-processed based on stop word removal and stemming. Then, feature extraction is done by extracting semantic word-based features and Term Frequency and Inverse Document Frequency (TF-IDF). From the extracted features, the important features are selected using the Bhattacharya distance measure, and these features are given as input to the proposed classifier. The proposed classifier performs incremental learning using SVNN, wherein the weights are bounded within a limit using rough set theory. Moreover, for the optimal selection of weights in the SVNN, the Moth Search (MS) algorithm is used. Thus, the proposed classifier, named Rough set MS-SVNN, performs text categorization on the incremental data given as input.
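A minimal sketch of Bhattacharyya-distance-based feature selection, under the assumption that each feature is summarized by discrete class-conditional distributions (the toy histograms and the threshold below are hypothetical, and the paper's exact procedure may differ):

    # Sketch: Bhattacharyya-distance feature selection. Features whose
    # class-conditional distributions are far apart separate the classes
    # better and are kept.
    import numpy as np

    def bhattacharyya_distance(p, q, eps=1e-12):
        bc = np.sum(np.sqrt(p * q))   # Bhattacharyya coefficient
        return -np.log(bc + eps)      # distance; larger = more separable

    # Toy class-conditional histograms for two features (each row sums to 1).
    features = {"a": (np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.2, 0.7])),
                "b": (np.array([0.4, 0.3, 0.3]), np.array([0.35, 0.3, 0.35]))}

    scores = {name: bhattacharyya_distance(p, q)
              for name, (p, q) in features.items()}
    selected = [name for name, s in scores.items() if s > 0.1]  # toy threshold
    print(scores, selected)  # feature "a" is kept, "b" is discarded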
Findings
For the experimentation, the 20 Newsgroups dataset and the Reuters dataset are used. Simulation results indicate that the proposed Rough set based MS-SVNN achieved 0.7743, 0.7774 and 0.7745 for precision, recall and F-measure, respectively.
Originality/value
In this paper, an online incremental learner is developed for text categorization. The text categorization is done by developing the Rough set MS-SVNN classifier, which classifies incoming texts based on the boundary condition evaluated by rough set theory and the optimal weights from the MS algorithm. The proposed online text categorization scheme has the basic steps of pre-processing, feature extraction, feature selection and classification. Pre-processing is carried out to identify the unique words in the dataset, and features such as semantic word-based features and TF-IDF are obtained from the keyword set. Feature selection is done by setting a minimum Bhattacharya distance measure, and the selected features are provided to the proposed Rough set MS-SVNN for classification.
Abstract
Purpose
To provide an integrated perspective on the similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and to point to problems with the individual approaches and with automated classification as such.
Design/methodology/approach
A range of works dealing with automated classification of full‐text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application and characteristics of web pages.
Findings
The paper identifies the major similarities and differences between the three approaches: document pre‐processing and the utilization of web‐specific document characteristics are common to all of them, while the major differences lie in the algorithms applied and in whether the vector space model and controlled vocabularies are employed. Problems of automated classification are also recognized.
Research limitations/implications
The paper does not attempt to provide an exhaustive bibliography of related resources.
Practical implications
As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community have the information on how similar tasks are conducted in different communities.
Originality/value
To the author's knowledge, no review paper on automated text classification attempted to discuss more than one community's approach from an integrated perspective.
Waleed Zaghloul, Sang M. Lee and Silvana Trimi
Abstract
Purpose
The purpose of this paper is to compare the performance of neural networks (NNs) and support vector machines (SVMs) as text classifiers. SVMs are considered one of the best classifiers. NNs could be adopted as text classifiers if their performance is comparable to that of SVMs.
Design/methodology/approach
Several NNs are trained to classify the same set of text documents as SVMs, and their effectiveness is measured. The performance of the two tools is then statistically compared.
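The statistical comparison step might look like the following sketch (hypothetical per-category F1 scores; the paper's actual test procedure is not specified here), using a paired t-test from SciPy:

    # Sketch: paired t-test on per-category F1 scores of the two classifiers.
    from scipy.stats import ttest_rel

    nn_f1  = [0.81, 0.78, 0.83, 0.80, 0.79]   # neural network, per category
    svm_f1 = [0.82, 0.80, 0.82, 0.81, 0.80]   # SVM, same categories

    t_stat, p_value = ttest_rel(nn_f1, svm_f1)
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
    # A large p-value fails to reject equality, i.e. performance is comparable.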
Findings
For text classification (TC), the performance of NNs is statistically comparable to that of the SVMs even when a significantly reduced document size is used.
Practical implications
This research finds not only that NNs are very viable TC tools with performance comparable to SVMs, but also that they achieve this using a much reduced document size. The successful use of NNs in classifying reduced text documents would be a great advantage over other classification tools, as it can bring great savings in computation time and costs.
Originality/value
This paper is of value by showing statistically that NNs could be adopted as text classifiers with effectiveness comparable to SVMs, one of the best text classifiers currently used. This research is the first step towards utilizing NNs in text mining and its sub‐areas.
Carlos G. Figuerola, Angel Zazo Rodríguez and José Luis Alonso Berrocal
Abstract
Automatic categorisation can be understood as a learning process during which a program recognises the characteristics that distinguish each category or class from others, i.e. those characteristics which the documents should have in order to belong to that category. As yet few experiments have been carried out with documents in Spanish. Here we show the possibilities of elaborating pattern vectors that include the characteristics of different classes or categories of documents, using techniques based on those applied to the expansion of queries by relevance; likewise, the results of applying these techniques to a collection of documents in Spanish are given. The same collection of documents was categorised manually and the results of both procedures were compared.
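The pattern-vector idea resembles Rocchio-style relevance feedback; a minimal sketch under that assumption (toy term-frequency vectors, hypothetical beta/gamma weights) is given below. It illustrates the general technique rather than the authors' exact weighting scheme.

    # Sketch: Rocchio-style class pattern vectors built from toy tf vectors.
    import numpy as np

    def pattern_vector(in_class, out_class, beta=0.75, gamma=0.15):
        # Centroid of in-class documents, pushed away from other documents.
        return beta * in_class.mean(axis=0) - gamma * out_class.mean(axis=0)

    def classify(doc, patterns):
        # Assign the class whose pattern vector is most similar (cosine).
        sims = {c: doc @ v / (np.linalg.norm(doc) * np.linalg.norm(v) + 1e-12)
                for c, v in patterns.items()}
        return max(sims, key=sims.get)

    docs = np.array([[2., 0., 1.], [1., 0., 2.], [0., 3., 1.]])
    patterns = {"economy": pattern_vector(docs[:2], docs[2:]),
                "sports":  pattern_vector(docs[2:], docs[:2])}
    print(classify(np.array([1., 0., 1.]), patterns))  # -> "economy"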
Abstract
Purpose
The purpose of this research is to compare several machine learning techniques on the task of Asian language text classification, for languages such as Chinese and Japanese where no word boundary information is available in written text. The paper advocates a simple language modeling based approach for this task.
Design/methodology/approach
Naïve Bayes, maximum entropy model, support vector machines and language modeling approaches were implemented and applied to Chinese and Japanese text classification. To study the influence of word segmentation, different word segmentation approaches were applied to Chinese text, and a segmentation‐based approach was compared with the non‐segmentation‐based approach.
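A minimal sketch of the language-modeling approach for text without word boundaries, assuming character bigram models with add-one smoothing (the vocabulary size and toy snippets are hypothetical, and the paper's model may differ):

    # Sketch: one character bigram language model per class; a text goes to
    # the class whose model assigns it the highest log-likelihood.
    import math
    from collections import Counter

    def train_lm(texts, n=2):
        ngrams, contexts = Counter(), Counter()
        for t in texts:
            t = " " * (n - 1) + t
            for i in range(n - 1, len(t)):
                ngrams[t[i - n + 1:i + 1]] += 1
                contexts[t[i - n + 1:i]] += 1
        return ngrams, contexts

    def log_prob(text, model, n=2, vocab=5000):  # vocab size is hypothetical
        ngrams, contexts = model
        text = " " * (n - 1) + text
        return sum(math.log((ngrams[text[i - n + 1:i + 1]] + 1) /
                            (contexts[text[i - n + 1:i]] + vocab))
                   for i in range(n - 1, len(text)))

    models = {"sports": train_lm(["比赛 足球 冠军"]),
              "finance": train_lm(["股票 市场 利率"])}
    print(max(models, key=lambda c: log_prob("足球 比赛", models[c])))  # sports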
Findings
There were two findings. First, the experiments show that statistical language modeling can significantly outperform standard techniques, given the same set of features. Second, classification with word level features normally yields improved classification performance, but that performance is not monotonically related to segmentation accuracy: classification performance may initially improve with increased segmentation accuracy, but eventually it stops improving, and can in fact even decrease, beyond a certain level of segmentation accuracy.
Practical implications
Applying the findings to real web text classification is ongoing work.
Originality/value
The paper is very relevant to Chinese and Japanese information processing, e.g. webpage classification, web search.
Samia Nawaz Yousafzai, Hooria Shahbaz, Armughan Ali, Amreen Qamar, Inzamam Mashood Nasir, Sara Tehsin and Robertas Damaševičius
Abstract
Purpose
The objective is to develop a more effective model that simplifies and accelerates the news classification process using advanced text mining and deep learning (DL) techniques. A distributed framework utilizing Bidirectional Encoder Representations from Transformers (BERT) was developed to classify news headlines. This approach leverages various text mining and DL techniques on a distributed infrastructure, aiming to offer an alternative to traditional news classification methods.
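As a rough illustration, headline classification with BERT via the Hugging Face transformers library might be set up as below (the label count and headlines are placeholders; the distributed infrastructure and the fine-tuning loop are omitted, and the classification head here is untrained):

    # Sketch: BERT-based headline classification with Hugging Face transformers.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=4)  # e.g. politics/sport/tech/business

    headlines = ["Parliament passes new budget", "Striker signs record transfer"]
    batch = tokenizer(headlines, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    print(logits.argmax(dim=-1))  # predicted class ids (requires fine-tuning)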
Design/methodology/approach
This study focuses on the classification of distinct types of news by analyzing tweets from various news channels. It addresses the limitations of using benchmark datasets for news classification, which often result in models that are impractical for real-world applications.
Findings
The framework’s effectiveness was evaluated on a newly proposed dataset and two additional benchmark datasets from the Kaggle repository, assessing the performance of each text mining and classification method across these datasets. The results of this study demonstrate that the proposed strategy significantly outperforms other approaches in terms of accuracy and execution time. This indicates that the distributed framework, coupled with the use of BERT for text analysis, provides a robust solution for analyzing large volumes of data efficiently. The findings also highlight the value of the newly released corpus for further research in news classification and emotion classification, suggesting its potential to facilitate advancements in these areas.
Originality/value
This research introduces an innovative distributed framework for news classification that addresses the shortcomings of models trained on benchmark datasets. By utilizing cutting-edge techniques and a novel dataset, the study offers significant improvements in accuracy and processing speed. The release of the corpus represents a valuable contribution to the field, enabling further exploration into news and emotion classification. This work sets a new standard for the analysis of news data, offering practical implications for the development of more effective and efficient news classification systems.
Efthimia Mavridou, Konstantinos M. Giannoutakis, Dionysios Kehagias, Dimitrios Tzovaras and George Hassapis
Abstract
Purpose
Semantic categorization of Web services comprises a fundamental requirement for enabling more efficient and accurate search and discovery of services in the semantic Web era. However, to efficiently deal with the growing presence of Web services, more automated mechanisms are required. This paper aims to introduce an automatic Web service categorization mechanism, by exploiting various techniques that aim to increase the overall prediction accuracy.
Design/methodology/approach
The paper proposes the use of Error Correcting Output Codes on top of a Logistic Model Trees-based classifier, in conjunction with a data pre-processing technique that reduces the original feature-space dimension without affecting data integrity. The proposed technique is generalized so as to adhere to all Web services with a description file. A semantic matchmaking scheme is also proposed for enabling the semantic annotation of the input and output parameters of each operation.
Findings
The proposed Web service categorization framework was tested with the OWLS-TC v4.0, as well as a synthetic data set with a systematic evaluation procedure that enables comparison with well-known approaches. After conducting exhaustive evaluation experiments, categorization efficiency in terms of accuracy, precision, recall and F-measure was measured. The presented Web service categorization framework outperformed the other benchmark techniques, which comprise different variations of it and also third-party implementations.
Originality/value
The proposed three-level categorization approach is a significant contribution to the Web service community, as it allows the automatic semantic categorization of all functional elements of Web services that are equipped with a service description file.
José L. Navarro‐Galindo and José Samos
Abstract
Purpose
Nowadays, the use of WCMS (web content management systems) is widespread. The conversion of this infrastructure into its semantic equivalent (semantic WCMS) is a critical issue, as this enables the benefits of the semantic web to be extended. The purpose of this paper is to present FLERSA (Flexible Range Semantic Annotation), a tool for flexible range semantic annotation.
Design/methodology/approach
FLERSA is presented as a user‐centred annotation tool for Web content expressed in natural language. The tool has been built to illustrate how a WCMS called Joomla! can be converted into its semantic equivalent.
Findings
The development of the tool shows that it is possible to build a semantic WCMS through a combination of semantic components and other resources such as ontologies and emerging technologies, including XML, RDF, RDFa and OWL.
Practical implications
The paper provides a starting‐point for further research in which the principles and techniques of the FLERSA tool can be applied to any WCMS.
Originality/value
The tool allows both manual and automatic semantic annotations, as well as providing enhanced search capabilities. For manual annotation, a new flexible range markup technique is used, based on the RDFa standard, to support the evolution of annotated Web documents more effectively than XPointer. For automatic annotation, a hybrid approach based on machine learning techniques (Vector‐Space Model + n‐grams) is used to determine the concepts that the content of a Web document deals with (from an ontology which provides a taxonomy), based on previous annotations that are used as a training corpus.
Thomas Mandl and Christa Womser‐Hacker
Abstract
A framework for the long‐term learning of user preferences in information retrieval is presented. The multiple indexing and method‐object relations (MIMOR) model tightly integrates a fusion method and a relevance feedback processor into a learning model. Several black box matching functions can be combined into a linear combination committee machine which reflects the user's vague individual cognitive concepts expressed in relevance feedback decisions. An extension based on the soft computing paradigm couples the relevance feedback processor and the matching function into a unified retrieval system.
Marko Grobelnik and Dunja Mladenić
Abstract
Purpose
To present approaches and some research results of various research areas contributing to knowledge discovery from different sources, different data forms, on different scales, and for different purposes.
Design/methodology/approach
Contributes to knowledge management by applying knowledge discovery approaches that enable the computer to search for the relevant knowledge while humans give just broad directions.
Findings
Knowledge discovery techniques proved to be very appropriate for many problems related to knowledge management. Surprisingly, it is often the case that already relatively simple approaches provide valuable results.
Research limitations/implications
There are still many open problems and scalability issues that arise when dealing with real‐world data, especially in areas involving text and network analysis.
Practical implications
Each problem should be handled with care, taking into account different aspects and selecting/extending the most appropriate available methods or developing new approaches.
Originality/value
This paper provides an interesting collection of selected knowledge discovery methods applied in different contexts but all contributing in some way to knowledge management. Several of the reported approaches were developed in collaboration with the authors of the paper, with special emphasis on their usability for practical problems involving knowledge management.
Christopher Soo‐Guan Khoo, Armineh Nourbakhsh and Jin‐Cheon Na
Abstract
Purpose
Sentiment analysis and emotion processing are attracting increasing interest in many fields. Computer and information scientists are developing automated methods for sentiment analysis of online text. Most of the studies have focused on identifying sentiment polarity or orientation – whether a document, usually a product or movie review, carries a positive or negative sentiment. It is time for researchers to address more sophisticated kinds of sentiment analysis. This paper aims to evaluate a particular linguistic framework called appraisal theory for adoption in manual as well as automatic sentiment analysis of news text.
Design/methodology/approach
The appraisal theory is applied to the analysis of a sample of political news articles reporting on Iraq and the economic policies of George W. Bush and Mahmoud Ahmadinejad to assess its utility and to identify challenges in adopting this framework.
Findings
The framework was useful in uncovering various aspects of sentiment that should be useful for researchers, such as the appraisers and object of appraisal, bias of the appraisers and the author, type of attitude and manner of expressing the sentiment. Problems encountered include difficulty in identifying appraisal phrases and attitude categories because of the subtlety of expression in political news articles, lack of treatment of tense and timeframe, lack of a typology of emotions, and need to identify different types of behaviours (political, verbal and material actions) that reflect sentiment.
Originality/value
The study has identified future directions for research in automated sentiment analysis as well as sentiment analysis of online news text. It has also demonstrated how sentiment analysis of news text can be carried out.
Swati Garg, Shuchi Sinha, Arpan Kumar Kar and Mauricio Mani
Abstract
Purpose
This paper reviews 105 Scopus-indexed articles to identify the degree, scope and purposes of machine learning (ML) adoption in the core functions of human resource management (HRM).
Design/methodology/approach
A semi-systematic approach has been used in this review. Since ML research emerges from multiple disciplines and uses different methods and theoretical frameworks, this approach was considered appropriate, as it allows for a more detailed analysis of the literature.
Findings
The review suggests that HRM has embraced ML, albeit at a nascent stage, and that it is receiving attention largely from technology-oriented researchers. ML applications are strongest in the areas of recruitment and performance management, and the use of decision trees and text-mining algorithms for classification dominates all functions of HRM. For complex processes, ML applications are still at an early stage, requiring HR experts and ML specialists to work together.
Originality/value
Given the current focus of organizations on digitalization, this review contributes significantly to the understanding of the current state of ML integration in HRM. Along with increasing efficiency and effectiveness of HRM functions, ML applications improve employees' experience and facilitate performance in the organizations.
Lu Zhang, Pu Dong, Long Zhang, Bojiao Mu and Ahui Yang
Abstract
Purpose
This study aims to explore the dissemination and evolutionary path of online public opinion from a crisis management perspective. By clarifying the influencing factors and dynamic mechanisms of online public opinion dissemination, this study provides insights into attenuating the negative impact of online public opinion and creating a favorable ecological space for online public opinion.
Design/methodology/approach
This research employs bibliometric analysis and CiteSpace software to analyze 302 Chinese articles published from 2006 to 2023 in the China National Knowledge Infrastructure (CNKI) database and 276 English articles published from 1994 to 2023 in the Web of Science core set database. Through literature keyword clustering, co-citation analysis and burst terms analysis, this paper summarizes the core scientific research institutions, scholars, hot topics and evolutionary paths of online public opinion crisis management research from both Chinese and international academic communities.
Findings
The results show that the study of online public opinion crisis management in China and internationally is centered on the life cycle theory, which integrates knowledge from information, computer and system sciences. Although there are differences in political interaction and stage evolution, the overall evolutionary path is similar, and it develops dynamically in the “benign conflict” between the expansion of the research perspective and the gradual refinement of research granularity.
Originality/value
This study summarizes the research results of online public opinion crisis management from China and the international academic community and identifies current research hotspots and theoretical evolution paths. Future research can focus on deepening the basic theories of public opinion crisis management under the influence of frontier technologies, exploring the subjectivity and emotionality of web users using fine algorithms and promoting the international development of network public opinion crisis management theory through transnational comparison and international cooperation.
Shubhada Prashant Nagarkar and Rajendra Kumbhar
Abstract
Purpose
The purpose of this paper was to analyse text mining (TM) literature indexed in the Web of Science (WoS) under the “Information Science Library Science” subcategory. More specifically, it analyses the chronological growth of TM literature, and the major countries, institutions, departments and individuals contributing to TM literature. Collaboration in TM research is also analysed.
Design/methodology/approach
Bibliographic and citation data required for this research were retrieved from the WoS database. TM being a multidisciplinary field, the search was restricted to “Information Science Library Science” subcategory in the WoS. A comprehensive query statement covering all synonyms of “text mining” was prepared using the Boolean operator “OR”. Microsoft Excel and HistCite software were used for data analysis. Pajek and VoSviewer were used for data visualization.
Findings
It was found that the USA is the major producer of TM research literature, and the highest number of papers were published in the Journal of the American Medical Informatics Association. Columbia University ranked first both in the number of articles and in citations received among the top ten institutes publishing TM literature. It was also observed that six of the top ten institutional subdivisions are from medicine, medical informatics or biomedical information. H.C. Chen and C. Friedman were the most prolific authors.
Research limitations/implications
The paper analyses articles on TM published during 1999-2013 in WoS under the subcategory “Information Science Library Science”.
Originality/value
The paper is based on empirical data exclusively gathered for this research.
Yu‐Liang Chi and Hsiao‐Chi Chen
Abstract
Purpose
The purpose of this paper is to demonstrate how the semantic rules in conjunction with ontology can be applied for inferring new facts to dispatch news into corresponding departments.
Design/methodology/approach
Under a specific task domain, the proposed design comprises finding a glossary from electronic resources, gathering organization functions as controlled vocabularies, and linking relationships between the glossary and controlled vocabularies. Web ontology language is employed to represent this knowledge as ontology, and semantic web rule language is utilized to infer implicit facts among instances.
Findings
Document dispatching is highly domain dependent. Human perspectives, adopted as predefined knowledge for understanding document meanings, are important. Knowledge‐intensive approaches such as ontology can model and represent expertise as reusable components. Ontology and rules together extend inference capabilities over the semantic relationships between instances.
Practical implications
Empirical lessons reveal that ontology with semantic rules can be utilized to model human subjective judgement as knowledge bases. An example, including ontology and rules, based on news dispatching is provided.
Originality/value
An organization can classify and deliver documents to corresponding departments based on known facts by following the described procedure.
Tariq Soussan and Marcello Trovati
Abstract
Purpose
Social media has become a vital part of any institute’s marketing plan. Social networks benefit businesses by allowing them to interact with their clients, grow brand exposure through offers and promotions and find new leads. It also offers vital information concerning the general emotions and sentiments directly connected to the welfare and security of the online community involved with the brand. Big organizations can make use of their social media data to generate planned and operational decisions. This paper aims to look into the conversion of sentiments and emotions over time.
Design/methodology/approach
In this work, a model called sentiment urgency emotion detection (SUED) from previous work will be applied to tweets from two different periods of time, one before the start of the COVID-19 pandemic and one after it started, to monitor the conversion of sentiments and emotions over time. The model has been trained to improve its accuracy and F1 score so that the precision and the percentage of correctly predicted texts are high. This model will be tuned to improve results (Soussan and Trovati, 2020a; Soussan and Trovati, 2020b) and will be applied to a general business Twitter account of one of the largest supermarket chains in the UK, to see what sentiments and emotions can be detected and how urgent they are.
Findings
This will show the effect of the COVID-19 pandemic on the conversion of the sentiments, emotions and urgencies of the tweets.
Originality/value
Sentiments will be compared between the two periods to evaluate how sentiments and emotions vary over time, taking COVID-19 into consideration as an affecting factor. In addition, SUED will be tuned to enhance results, and the knowledge mined when turning data into decisions is crucial because it will aid the stakeholders handling the institute in evaluating the topics and issues that were most emphasized.
Sandra Maria Correia Loureiro, Ricardo Godinho Bilro and Arnold Japutra
Abstract
Purpose
This paper aims to explore the relationships between website quality – through consumer-generated media stimuli-, emotions and consumer-brand engagement in online environments.
Design/methodology/approach
Two independent studies are conducted to examine these relationships. Study 1, based on a sample of 366 respondents, uses a structural equation modelling approach to test the research hypotheses. Study 2, based on 1,454 online consumer reviews, uses text-mining technique to examine further the relationship between emotions and consumer-brand engagement.
Findings
The findings show that all the consumer-generated media stimuli are positively related to the dimensions of emotions. However, only pleasure and arousal are positively related to the three variables of consumer-brand engagement. The findings also show cognitive processing as the strongest dimension of consumer-brand engagement providing positive sentiments towards brands.
Practical implications
The findings provide marketers with an understanding of how valid, useful and relevant content (i.e. information/content) creates a greater emotional connection and drives consumer-brand engagement. Marketers should be aware that consumer-generated media stimuli influence consumers’ emotions and their reactions.
Originality/value
This study is one of the first to adapt and apply the S-O-R framework in explaining online consumer-brand engagement. It also adds to the brand engagement literature as the first study that combines a PLS-SEM approach with text-mining analysis to provide a better understanding of these relationships.
Jin Zhang, Yanyan Wang and Yuehua Zhao
Abstract
Purpose
The statistical method plays an extremely important role in quantitative research studies in library and information science (LIS). The purpose of this paper is to investigate the status of statistical methods used in the field, their application areas and the temporal change patterns during a recent 15-year period.
Design/methodology/approach
The research papers in six major scholarly journals from 1999 to 2013 in LIS were examined. Factors including statistical methods, application areas and time period were analyzed using quantitative research methods including content analysis and temporal analysis methods.
Findings
Research studies using statistical methods in LIS have increased steadily. Statistical methods were more frequently used to solve problems in the information retrieval area than in other areas, and inferential statistical methods were used more often than predictive and other statistical methods. Anomaly analysis of statistical method use was conducted, and four types of anomaly were specified.
Originality/value
The findings of this study can help educators, graduates and researchers in the field of LIS better understand the patterns and trends of the applications of statistical methods in this field, depict an overall picture of quantitative research studies in LIS from the perspective of statistical methods and discover the change patterns of statistical method applications in LIS between 1999 and 2013.
Abstract
Purpose
The aim of this study is to offer valuable insights to businesses and facilitate a better understanding of transformer-based models (TBMs), which are among the most widely employed generative artificial intelligence (GAI) models, garnering substantial attention due to their ability to process and generate complex data.
Design/methodology/approach
Existing studies on TBMs tend to be limited in scope, either focusing on specific fields or being highly technical. To bridge this gap, this study conducts robust bibliometric analysis to explore the trends across journals, authors, affiliations, countries and research trajectories using science mapping techniques – co-citation, co-words and strategic diagram analysis.
Findings
Identified research gaps encompass the evolution of new closed- and open-source TBMs; limited exploration across industries like education and disciplines like marketing; a lack of in-depth exploration of TBMs’ adoption in the health sector; scarcity of research on TBMs’ ethical considerations; and potential research on TBMs’ performance in diverse applications, like image processing.
Originality/value
The study offers an updated TBMs landscape and proposes a theoretical framework for TBMs' adoption in organizations. Implications for managers and researchers along with suggested research questions to guide future investigations are provided.
Chih-Hung Chung and Lu-Jia Chen
Abstract
Purpose
The purpose of this study is to explore the capabilities required by entry-level human resources (HR) professionals based on job advertisements by using text mining (TM) technique.
Design/methodology/approach
This study used TM techniques to explore the capabilities required by entry-level HR professionals based on job advertisements on HR agency 104’s website in Taiwan. Python was used to crawl the advertisements on the website, and 841 posts were collected. Next, the authors used TM to explore and understand hidden trends and patterns in the collected data.
Findings
The results of this study revealed four critical success factors (specific skills, educational level, experience and specific capabilities), five clusters and ten classifications.
Practical implications
The results can aid HR curriculum developers and educators in customizing and improving HR education curricula, such that HR students can develop capabilities required to secure employment in the current HR job market.
Originality/value
The results may facilitate the understanding of current trends in the HR job market and provide useful suggestions to HR curriculum developers for improving training and professional course design, such that students’ competitiveness is enhanced and their professional capabilities improved.
Andry Alamsyah and Raras Fitriyani Astuti
Abstract
Purpose
This study aims to analyze public discourse on decentralized finance (DeFi) and central bank digital currencies (CBDC) using advanced natural language processing (NLP) techniques to uncover key insights that can guide financial policy and innovation. This research seeks to fill the gap in the existing literature by applying state-of-the-art NLP models like BERT and RoBERTa to understand the evolving online discourse around DeFi and CBDC.
Design/methodology/approach
This study uses a multilabel classification using BERT and RoBERTa models alongside BERTopic for topic modeling. Data is collected from social media platforms, including Twitter and LinkedIn, as well as relevant documents, to analyze public sentiment and discourse. Model performance is evaluated based on accuracy, precision, recall and F1-scores.
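The topic-modeling step with the BERTopic library might be sketched as follows (the posts are toy placeholders; in practice, hundreds of documents or more are needed for stable topics, and the BERT/RoBERTa classifiers are trained separately):

    # Sketch: topic modeling with the BERTopic library on toy posts.
    from bertopic import BERTopic

    posts = ["CBDC pilots raise privacy questions",
             "DeFi lending protocols grow despite volatility",
             "Central banks study digital currency design",
             "Yield farming attracts new DeFi users",
             "Digital currency rollout faces legal hurdles",
             "Stablecoins blur the line between DeFi and CBDC",
             "Regulators weigh DeFi risk frameworks",
             "Financial inclusion drives CBDC interest"]

    topic_model = BERTopic(min_topic_size=2)   # small value for the toy corpus
    topics, probs = topic_model.fit_transform(posts)
    print(topic_model.get_topic_info())        # discovered topics and sizes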
Findings
RoBERTa outperforms BERT in classification accuracy and precision across all metrics, making it more effective in categorizing public discourse on DeFi and CBDC. BERTopic identifies five key topics frequently discussed, such as financial inclusion, competition and growth in DeFi, with important implications for policymakers.
Practical implications
The insights derived from this study provide valuable information for financial regulators and policymakers to develop more informed, data-driven strategies for implementing and regulating DeFi and CBDC. Public discourse analysis enables policymakers to understand emerging concerns and trends critical for crafting effective financial policies.
Originality/value
This study is among the first to use advanced NLP models, including RoBERTa and BERTopic, to analyze public discourse on DeFi and CBDC. It offers novel insights into the potential challenges and opportunities these innovations present. It contributes to the growing body of research on the intersection of digital financial technologies and public sentiment.
Jing Chen, Lu Zhang and Wenhai Qian
Abstract
Purpose
Attention to task-related information is a prerequisite for task completion. Comparing the cognition of attentive readers (AR) and inattentive readers (IAR) is of great value for improving reading services but has seldom been studied. To explore their cognitive differences, this study investigates effectiveness, efficiency and cognitive resource allocation strategies using eye-tracking technology.
Design/methodology/approach
A controlled user study with two types of tasks, fact-finding (FF) and content understanding (CU), was conducted to collect data including answers to tasks, fixation duration (FD), fixation count (FC), fixation duration proportion (FDP) and fixation count proportion (FCP). Twenty-four participants were placed into the AR or IAR group according to their fixation duration on paragraphs related to the task.
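The proportion measures can be made concrete with a small sketch (hypothetical fixation records): FDP is the share of total fixation duration spent on an area of interest (AOI), and FCP the share of fixation counts.

    # Sketch: computing FDP and FCP for an AOI from toy fixation records.
    fixations = [("task_paragraph", 420), ("navigation", 180),
                 ("task_paragraph", 510), ("other", 230)]  # (AOI, duration ms)

    def proportions(fixations, aoi):
        total_duration = sum(d for _, d in fixations)
        aoi_durations = [d for a, d in fixations if a == aoi]
        fdp = sum(aoi_durations) / total_duration   # duration proportion
        fcp = len(aoi_durations) / len(fixations)   # count proportion
        return fdp, fcp

    print(proportions(fixations, "task_paragraph"))  # (~0.69, 0.5)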
Findings
Two types of cognitive resource allocation strategies, question-oriented (QO) and navigation-assistant (NA), were identified according to the differences in FDP and FCP. In the FF task, although the QO strategy was applied by both groups, the AR group was significantly more effective and efficient. In the CU task, although the two groups were similar in effectiveness and efficiency, the AR group shifted its strategy to NA while the IAR group stuck to the QO strategy. Furthermore, an interesting phenomenon, “win by uncertainty”, was observed, implying that the IAR group may reach the correct answer through uncertain means, such as clues, domain knowledge or guessing, rather than task-related information.
Originality/value
This study takes a deep look into cognition from the perspective of attentiveness to task-related information. Identifying indicators of cognition helps to distinguish attentive and inattentive readers automatically in various tasks. The cognitive resource allocation strategies applied by readers shed new light on reading skill training. A typical reading phenomenon, “win by uncertainty”, was found and defined; understanding this phenomenon is of great value for satisfying readers’ information needs and enhancing their deep learning.
Koraljka Golub, Osma Suominen, Ahmed Taiye Mohammed, Harriet Aagaard and Olof Osterman
Abstract
Purpose
In order to estimate the value of semi-automated subject indexing in operative library catalogues, the study aimed to investigate five different automated implementations of an open source software package on a large set of Swedish union catalogue metadata records, with Dewey Decimal Classification (DDC) as the target classification system. It also aimed to contribute to the body of research on aboutness and related challenges in automated subject indexing and evaluation.
Design/methodology/approach
On a sample of over 230,000 records with close to 12,000 distinct DDC classes, an open source tool, Annif, developed by the National Library of Finland, was applied in the following implementations: a lexical algorithm, a support vector classifier, fastText, Omikuji Bonsai and an ensemble approach combining the former four. A qualitative study involving two senior catalogue librarians and three students of library and information studies was also conducted to investigate the value and inter-rater agreement of automatically assigned classes, on a sample of 60 records.
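The ensemble idea, combining per-class scores from several backends by weighted averaging, can be sketched as below. This is an illustration of the general technique, not Annif's actual implementation; the weights and scores are hypothetical.

    # Sketch of the ensemble idea: weighted averaging of backend scores.
    import numpy as np

    backend_scores = {                 # P(class | document) from each backend
        "lexical":  np.array([0.10, 0.60, 0.30]),
        "svc":      np.array([0.20, 0.50, 0.30]),
        "fasttext": np.array([0.15, 0.55, 0.30]),
        "omikuji":  np.array([0.05, 0.70, 0.25])}
    weights = {"lexical": 0.2, "svc": 0.3, "fasttext": 0.2, "omikuji": 0.3}

    combined = sum(weights[b] * s for b, s in backend_scores.items())
    print(combined.argmax())  # index of the suggested class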
Findings
The best results were achieved using the ensemble approach that achieved 66.82% accuracy on the three-digit DDC classification task. The qualitative study confirmed earlier studies reporting low inter-rater agreement but also pointed to the potential value of automatically assigned classes as additional access points in information retrieval.
Originality/value
The paper presents an extensive study of automated classification in an operative library catalogue, accompanied by a qualitative study of automated classes. It demonstrates the value of applying semi-automated indexing in operative information retrieval systems.
Gustavo Silva, Leandro F. Pereira, José Crespo Carvalho, Rui Vinhas da Silva and Ana Simoes
Abstract
Purpose
This study aims to conduct a pertinent assessment of the concept of business competitiveness and how Portugal can progress in that field, for the sake of becoming a more sustainable and wealth-creator economy.
Design/methodology/approach
The research was based on 65 in-depth interviews with experts from the Portuguese business ecosystem, who were asked to reflect on the state of the economy and the competitiveness of the country.
Findings
There is much room for improvement in almost all areas of activity, in particular by promoting an innovative, value-adding and exporting private sector and a lighter and more efficient public sector. The conclusions point to modernisation of the Portuguese economy as a way of making it more competitive in a highly competitive and demanding global scenario.
Originality/value
To the best of the authors’ knowledge, this is the first time that such a reflection with experts on the local Portuguese economy has been carried out, especially after the difficult period of COVID.
Mamta Kayest and Sanjay Kumar Jain
Abstract
Purpose
Document retrieval has become a hot research topic over the past few years, and increasing attention has been paid to browsing and synthesizing information from different documents. The purpose of this paper is to develop an effective document retrieval method, which focuses on reducing the time needed for the navigator to retrieve a whole document based on the contents, themes and concepts of documents.
Design/methodology/approach
This paper introduces an incremental learning approach for text categorization using a Monarch Butterfly optimization–FireFly optimization based Neural Network (MB–FF based NN). Initially, feature extraction is carried out on the pre-processed data using Term Frequency–Inverse Document Frequency (TF–IDF) and holoentropy to find the keywords of the document. Then, cluster-based indexing is performed using the MB–FF algorithm and, finally, document retrieval is done through a matching process with the modified Bhattacharya distance measure. In the MB–FF based NN, the weights of the NN are chosen using the MB–FF algorithm.
Findings
The effectiveness of the proposed MB–FF based NN is proven with an improved precision of 0.8769, recall of 0.7957, F-measure of 0.8143 and accuracy of 0.7815.
Originality/value
The experimental results show that the proposed MB–FF based NN is useful to companies that have a large workforce across the country.
Stanley Loh, José Palazzo M. de Oliveira and Fábio Leite Gastal
Abstract
This paper presents an approach for performing knowledge discovery in texts through qualitative and quantitative analyses of high‐level textual characteristics. Instead of applying mining techniques to attribute values, terms or keywords extracted from texts, the discovery process works over concepts identified in texts. Concepts represent real world events and objects, and they help the user to understand ideas, trends, thoughts, opinions and intentions present in texts. The approach combines a quasi‐automatic categorisation task (for qualitative analysis) with a mining process (for quantitative analysis). The goal is to find new and useful knowledge inside a textual collection through the use of mining techniques applied over concepts (representing text content). In this paper, an application of the approach to medical records of a psychiatric hospital is presented. The approach helps physicians to extract knowledge about patients and diseases. This knowledge may be used for epidemiological studies and for training professionals, and it may also be used to support physicians in diagnosing and evaluating diseases.
Haichao Dong, Siu Cheung Hui and Yulan He
Abstract
Purpose
The purpose of this research is to study the characteristics of chat messages by analysing a collection of 33,121 sample messages gathered from 1,700 sessions of conversations of 72 pairs of MSN Messenger users over a four-month period from June to September 2005. The primary objective of chat message characterization is to understand the properties of chat messages for effective message analysis, such as message topic detection.
Design/methodology/approach
From the study on chat message characteristics, an indicative term‐based categorization approach for chat topic detection is proposed. In the proposed approach, different techniques such as sessionalisation of chat messages and extraction of features from icon texts and URLs are incorporated for message pre‐processing. Naïve Bayes, Associative Classification, and Support Vector Machine are employed as classifiers for categorizing topics from chat sessions.
Findings
The indicative term‐based approach is superior to the traditional document frequency based approach for feature selection in chat topic categorization.
Originality/value
This paper studies the characteristics of chat messages and proposes an indicative term‐based categorization approach for chat topic detection.
Esther David, Maayan Zhitomirsky-Geffet, Moshe Koppel and Hodaya Uzan
Abstract
Purpose
Social network sites have been widely adopted by politicians in the last election campaigns. To increase the effectiveness of these campaigns the potential electorate is to be identified, as targeted ads are much more effective than non-targeted ads. Therefore, the purpose of this paper is to propose and implement a new methodology for automatic prediction of political orientation of users on social network sites by comparison to texts from the overtly political parties’ pages.
Design/methodology/approach
To this end, textual information on personal users’ pages is used as a source of statistical features. The authors apply automatic text categorization algorithms to distinguish between texts of users from different political wings. However, these algorithms require a set of manually labeled texts for training, which is typically unavailable in real life situations. To overcome this limitation the authors propose to use texts available on various political parties’ pages on a social network site to train the classifier. The political leaning of these texts is determined by the political affiliation of the corresponding parties. The classifier learned on such overtly political texts is then applied on the personal user pages to predict their political orientation. To assess the validity and effectiveness of the proposed methodology two corpora were constructed: personal Facebook pages of 450 Israeli citizens, and political parties Facebook pages of the nine prominent Israeli parties.
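The cross-corpus setup, training on overtly political party texts and predicting on personal pages, might be sketched as follows (toy texts; the authors' actual feature set and classifier may differ):

    # Sketch: train on party texts, predict on personal pages.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    party_texts  = ["lower taxes strong security",
                    "workers rights social welfare"]
    party_labels = ["right", "left"]

    clf = make_pipeline(TfidfVectorizer(), LinearSVC())
    clf.fit(party_texts, party_labels)

    personal_pages = ["we must expand social welfare programs"]
    print(clf.predict(personal_pages))  # -> ['left']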
Findings
The authors found that when a political tendency classifier is trained and tested on data in the same corpus, accuracy is very high. More significantly, training on manifestly political texts (political party Facebook pages) yields classifiers which can be used to classify non-political personal Facebook pages with fair accuracy.
Social implications
Previous studies have shown that targeted ads are more effective than non-targeted ads leading to substantial saving in the advertising budget. Therefore, the approach for automatic determining the political orientation of users on social network sites might be adopted for targeting political messages, especially during election campaigns.
Originality/value
This paper proposes and implements a new approach for automatic cross-corpora identification of political bias of user profiles on social network. This suggests that individuals’ political tendencies can be identified without recourse to any tagged personal data. In addition, the authors use learned classifiers to determine which self-identified centrists lean left or right and which voters are likely to switch allegiance in subsequent elections.
Abstract
Purpose
With the ever‐increasing volume of text data on the internet, it is important that documents are classified into manageable and easy-to-understand categories. This paper proposes the use of binary k‐nearest neighbour (BKNN) for text categorization.
Design/methodology/approach
The paper describes the traditional k‐nearest neighbour (KNN) classifier, introduces BKNN and outlines experimental results.
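One plausible reading of BKNN, sketched below as an assumption rather than the paper's exact formulation: documents become binary term-presence vectors, and neighbours are found with a cheap Hamming distance instead of floating-point similarity:

import numpy as np

def bknn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest binary vectors (Hamming distance)."""
    dists = np.count_nonzero(X_train != x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1]], dtype=np.uint8)  # toy corpus
y = np.array(["sports", "sports", "tech"])
print(bknn_predict(X, y, np.array([1, 0, 0], dtype=np.uint8), k=1))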
Findings
The experimental results indicate that BKNN requires much less CPU time than KNN, without loss of classification performance.
Originality/value
The paper demonstrates how BKNN can be an efficient and effective algorithm for text categorization.
Details
Keywords
Maayan Zhitomirsky-Geffet, Esther David, Moshe Koppel and Hodaya Uzan
Reliability and political bias of mass media has been a controversial topic in the literature. The purpose of this paper is to propose and implement a methodology for fully…
Abstract
Purpose
The reliability and political bias of mass media have been a controversial topic in the literature. The purpose of this paper is to propose and implement a methodology for the fully automatic evaluation of the political tendency of written media on the web, one that does not rely on subjective human judgments.
Design/methodology/approach
The underlying idea is to base the evaluation on a fully automatic comparison of the texts of articles on different news websites with overtly political texts of known political orientation. The authors also apply an alternative approach for evaluating political tendency based on the wisdom of the crowds.
Findings
The authors found that the learnt classifier can accurately distinguish between self-declared left and right news sites. Furthermore, news sites' political tendencies can be identified by an automatic classifier learnt from manifestly political texts, without recourse to any manually tagged data. The authors also show a high correlation between readers' perception of the bias (as a "wisdom of crowds" evaluation) and the classifier results for different news sites.
Social implications
The results are quite promising and could help settle the never-ending dispute on the reliability and bias of the press.
Originality/value
This paper proposes and implements a new approach for fully automatic (independent of human opinion/assessment) identification of political bias of news sites by their texts.
Details
Keywords
Issa Alsmadi and Keng Hoon Gan
Rapid developments in social networks and their usage in everyday life have caused an explosion in the amount of short electronic documents. Thus, the need to classify this type…
Abstract
Purpose
Rapid developments in social networks and their use in everyday life have caused an explosion in the number of short electronic documents, so the need to classify this type of document by content has significant implications for many applications. Classifying these documents into relevant classes according to their textual content is of practical interest for many reasons. Short-text classification is an essential step in many applications, such as spam filtering, sentiment analysis, Twitter personalization, customer reviews and many other applications related to social networks. Reviews on short text and its applications are limited. Thus, this paper aims to discuss the characteristics of short text and its challenges and difficulties in classification. The paper attempts to introduce all the stages of a typical classification pipeline, the techniques used in each stage and possible development trends in each stage.
Design/methodology/approach
The paper is a review of the main aspects of short-text classification and is structured according to the stages of the classification task.
Findings
This paper discusses related issues and approaches to these problems. Further research could be conducted to address the challenges of short texts and avoid poor classification accuracy. Low performance can be addressed with optimization techniques, such as genetic algorithms, which are powerful in enhancing the quality of selected features. Soft computing approaches such as fuzzy logic also make short-text classification a promising area of research.
Originality/value
Using a powerful short-text classification method significantly improves the efficiency of many applications. Current solutions still perform poorly, implying the need for improvement. This paper discusses related issues and approaches to these problems.
Details
Keywords
Claudia Vásquez Rojas, Eduardo Roldán Reyes, Fernando Aguirre y Hernández and Guillermo Cortés Robles
Strategic planning (SP) enables enterprises to plan management and operations activities efficiently in the medium and large term. During its implementation, many processes and…
Abstract
Purpose
Strategic planning (SP) enables enterprises to plan management and operations activities efficiently in the medium and long term. During its implementation, many processes and methods are applied manually and may be time consuming. The purpose of this paper is to introduce an automatic method for defining strategic plans by using text mining (TM) algorithms within a generic SP model especially suited to small- and medium-sized enterprises (SMEs).
Design/methodology/approach
Textual feedback was collected through a SWOT matrix during the implementation of an SP model in a company dedicated to the local distribution of food. A four-step TM process (performing acquisition, pre-processing, processing and validation tasks) is applied via a framework developed under the cloud computing paradigm in order to determine the strategic plans.
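An illustrative sketch of the four steps on invented SWOT entries, with TF-IDF and k-means as stand-ins for whatever algorithms the framework actually uses:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

feedback = [                                   # acquisition: SWOT entries
    "strong local supplier network",
    "delivery fleet is ageing",
    "new retail chains entering the region",
    "suppliers offer flexible contracts",
]
X = TfidfVectorizer(stop_words="english").fit_transform(feedback)  # pre-processing
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)  # processing
print(labels)  # validation: inspect the groupings against the SWOT matrix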
Findings
The use of categorization and clustering algorithms shows that the unstructured textual information produced during SP can be efficiently processed and capitalized on. The collected evidence reveals the potential to create strategic plans with less effort and time, to improve their relevance and to produce new technological resources accessible to SMEs.
Originality/value
An innovative framework especially suited to SMEs, based on the assumed synergy of coupling TM with a generic SP model.
Details
Keywords
Kwok-Kuen To and Ming Fai Pang
The purpose of this paper is to investigate how different arrangements, such as lesson structures and patterns of variation, enhance students’ genre awareness, their understanding…
Abstract
Purpose
The purpose of this paper is to investigate how different arrangements, such as lesson structures and patterns of variation, enhance students' genre awareness and their understanding of the genre features of informative texts, and how they can generate new learning.
Design/methodology/approach
This is an example of a learning study consisting of a design experiment, with the selected test criteria embedded in the design. The variation theory of learning served as the major guiding principle for the pedagogical design, lesson analysis and evaluation.
Findings
The findings of this study give support to variation theory being a powerful pedagogical tool for improving students’ understanding of informative texts and enabling them to generate new learning. Students in the target group who had more opportunities to encounter the “first contrast, next contrast and last generalisation” pattern of variation performed better than those in the comparison group, who were exposed to the “first generalisation, next contrast and last generalisation” pattern. The pure hierarchical lesson structure used for the target group was found to be more conducive to learning than the mixed structure (sequential–hierarchical structure) used in the comparison group.
Originality/value
Both the lesson structure and the patterns of variation and invariance used are extremely important in developing a powerful method for enhancing students' genre awareness, their understanding of the genre features of informative texts and their capacity to generate new learning.
Details
Keywords
Umama Rahman and Miraj Uddin Mahbub
The data created from regular maintenance activities of equipment are stored as text in industrial plants. The size of these data is increasing rapidly nowadays. Text mining…
Abstract
Purpose
The data created from regular maintenance activities of equipment are stored as text in industrial plants, and the size of these data is increasing rapidly. Text mining provides a chance to handle this huge amount of text data and extract meaningful information to improve various processes in an industrial environment. This paper presents the application of classification models to maintenance text records to classify failures and thereby improve maintenance programs in industry.
Design/methodology/approach
This paper is presented as an implementation study in which text mining approaches are used for the binary classification of text data. Two classification algorithms, naive Bayes and support vector machine (SVM), are applied to train and test the models on the labeled data; they were chosen because they perform well on text data for failure classification and are easy to handle. A methodology is proposed for the development of maintenance programs, including the classification of potential failures in advance by analyzing the regular maintenance data, as well as a comparison of the performance of both models on the data.
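A minimal sketch of that comparison on invented maintenance records; the shared bag-of-words features and training-set accuracy are illustrative simplifications:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

records = ["bearing noise and vibration", "routine lubrication done",
           "motor overheating shutdown", "scheduled filter change"]
labels = [1, 0, 1, 0]  # 1 = failure-related, 0 = routine (invented)

X = CountVectorizer().fit_transform(records)
for model in (MultinomialNB(), LinearSVC()):
    model.fit(X, labels)
    print(type(model).__name__, accuracy_score(labels, model.predict(X)))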
Findings
The accuracy of both models falls within acceptable limits, and performance evaluation of the models validates the results. The other performance measures also exhibit excellent values for both models.
Practical implications
The proposed approach gives the maintenance team an opportunity to know about an upcoming breakdown in advance, so that the necessary measures can be taken to prevent failure in an industrial environment. As predictive maintenance incurs high expense, the proposed approach could be a cheaper replacement for small and medium industrial plants.
Originality/value
Nowadays, maintenance is preventive rather than corrective. The proposed technique facilitates a proactive approach by minimizing the cost of additional maintenance steps. As predictive maintenance is efficient but expensive, the proposed method can minimize unnecessary maintenance operations and keep the budget under control. This is a significant way of developing maintenance programs and will prepare maintenance personnel for machine breakdowns.
Details
Keywords
Ko-Chiu Wu and Tsai-Ying Hsieh
The purpose of this paper is to investigate user experiences with a touch-wall interface featuring both clustering and categorization representations of available e-books in a…
Abstract
Purpose
The purpose of this paper is to investigate user experiences with a touch-wall interface featuring both clustering and categorization representations of available e-books in a public library to understand human information interactions under work-focused and recreational contexts.
Design/methodology/approach
Researchers collected questionnaires from 251 New Taipei City Library visitors who used the touch-wall interface to search for new titles. The authors applied structural equation modelling to examine relationships among hedonic/utilitarian needs, clustering and categorization representations, perceived ease of use (EU) and the extent to which users experienced anxiety and uncertainty (AU) while interacting with the interface.
Findings
Utilitarian users, who have an explicit idea of what they intend to find, tend to prefer the categorization interface, whereas hedonic-oriented users tend to prefer the clustering interface. Users reported EU regardless of which interface they engaged with. The results revealed that use of the clustering interface was negatively correlated with AU. Users seeking to satisfy utilitarian needs tended to emphasize the importance of perceived EU, whilst pleasure-seeking users were a little more tolerant of anxiety or uncertainty.
Originality/value
The Online Public Access Catalogue (OPAC) encourages library visitors to borrow digital books through the implementation of an information visualization system. This situation poses an opportunity to validate uses and gratifications theory. People with hedonic versus utilitarian needs displayed different risk-control attitudes and experienced different levels of uncertainty when using the interface. Knowledge about user interaction with such interfaces is vital when launching the development of a new OPAC.
Details
Keywords
Amirreza Ghadiridehkordi, Jia Shao, Roshan Boojihawon, Qianxi Wang and Hui Li
This study examines the role of online customer reviews through text mining and sentiment analysis to improve customer satisfaction across various services within the UK banking…
Abstract
Purpose
This study examines the role of online customer reviews through text mining and sentiment analysis to improve customer satisfaction across various services within the UK banking sector. Additionally, the study analyses sentiment trends over a five-year period.
Design/methodology/approach
Using DistilBERT and Support Vector Machine algorithms, customer sentiments were assessed through an analysis of 20,137 Trustpilot reviews of HSBC, Santander, and Tesco Bank from 2018 to 2023. Data pre-processing steps were implemented to ensure data integrity and minimize noise.
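A small sketch of the DistilBERT side, using the off-the-shelf SST-2 fine-tuned checkpoint from Hugging Face as a stand-in for the authors' model (the SVM comparison and the rating features are omitted); the review text is invented:

from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english")
print(sentiment(["The call centre kept me waiting for an hour."]))
# e.g. [{'label': 'NEGATIVE', 'score': 0.99...}]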
Findings
Both positive and negative sentiments provide valuable insights. The results indicate a high prevalence of negative sentiments related to customer service and communication, with HSBC and Santander receiving 90.8% and 89.7% negative feedback, respectively, compared to Tesco Bank’s 66.8%. Key areas for improvement include HSBC’s credit card services and call center efficiency, which experienced increased negative feedback during the COVID-19 pandemic. The findings also demonstrate that DistilBERT excelled in categorizing reviews, while the SVM model, when combined with customer ratings, achieved 96% accuracy in sentiment analysis.
Research limitations/implications
This study focuses on UK bank consumers of HSBC, Santander, and Tesco Bank. A multi-country or cross-cultural study may further enhance our understanding of the approaches and findings.
Practical implications
Online customer reviews become more informative when categorised by service sector. To enhance customer satisfaction, bank managers should pay attention to both positive and negative reviews, and track trends over time.
Originality/value
The uniqueness of this study lies in its exploration of the importance of categorisation in text-mining-based sentiment analysis, its focus on the influence of both positive and negative sentiments, and its emphasis on tracking sentiment trends over time.
Details
Keywords
The immense quantity of available unstructured text documents serves as one of the largest sources of information. Text classification can be an essential task for many purposes in…
Abstract
Purpose
The immense quantity of available unstructured text documents serves as one of the largest sources of information. Text classification can be an essential task for many purposes in information retrieval, such as document organization, text filtering and sentiment analysis. Ensemble learning has been extensively studied as a way to construct efficient text classification schemes with higher predictive performance and generalization ability. The purpose of this paper is to provide diversity among the classification algorithms of an ensemble, which is a key issue in ensemble design.
Design/methodology/approach
An ensemble scheme based on hybrid supervised clustering is presented for text classification. In the presented scheme, supervised hybrid clustering, based on the cuckoo search algorithm and k-means, is introduced to partition the data samples of each class into clusters so that training subsets with higher diversity can be provided. Each classifier is trained on the diversified training subsets, and the predictions of the individual classifiers are combined by the majority voting rule. The predictive performance of the proposed classifier ensemble is compared to conventional classification algorithms (such as naïve Bayes, logistic regression, support vector machines and the C4.5 algorithm) and ensemble learning methods (such as AdaBoost, bagging and random subspace) using 11 text benchmarks.
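A simplified sketch of the diversification step, with plain k-means standing in for the paper's cuckoo-search/k-means hybrid, naïve Bayes members and toy binary-labelled data:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import MultinomialNB

X = np.array([[3, 0], [2, 1], [3, 1], [0, 3], [1, 2], [0, 2]])  # toy features
y = np.array([0, 0, 0, 1, 1, 1])
n_clusters = 2

# Partition each class's samples into clusters.
clusters = {}
for c in np.unique(y):
    idx = np.where(y == c)[0]
    part = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X[idx])
    clusters[c] = [idx[part == k] for k in range(n_clusters)]

# Train one ensemble member per cluster index on a diversified subset.
members = []
for k in range(n_clusters):
    rows = np.concatenate([clusters[c][k] for c in clusters])
    members.append(MultinomialNB().fit(X[rows], y[rows]))

# Combine the members' predictions by majority vote (binary labels).
votes = np.array([m.predict(X) for m in members])
print((votes.mean(axis=0) >= 0.5).astype(int))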
Findings
The experimental results indicate that the presented classifier ensemble outperforms the conventional classification algorithms and ensemble learning methods for text classification.
Originality/value
The presented ensemble scheme is the first to use supervised clustering to obtain a diverse ensemble for text classification.
Details
Keywords
Ankie Visschedijk and Forbes Gibb
This article reviews some of the more unconventional text retrieval systems, emphasising those which have been commercialised. These sophisticated systems improve on conventional…
Abstract
This article reviews some of the more unconventional text retrieval systems, emphasising those which have been commercialised. These sophisticated systems improve on conventional retrieval by using either innovative software or hardware to increase retrieval speed, functionality, precision or recall. The software systems reviewed are: AIDA, CLARIT, Metamorph, SIMPR, STATUS/IQ, TCS, TINA and TOPIC. The hardware systems reviewed are: CAFS‐ISP, the Connection Machine, GESCAN, HSTS, MPP, TEXTRACT, TRW‐FDF and URSA.
Qingyu Zhang and Richard S. Segall
The purpose of this paper is to review and compare selected software for data mining, text mining (TM), and web mining that are not available as free open‐source software.
Abstract
Purpose
The purpose of this paper is to review and compare selected software for data mining, text mining (TM), and web mining that are not available as free open‐source software.
Design/methodology/approach
The selected software packages are compared in terms of their common and unique features. The software for data mining comprises SAS® Enterprise Miner™, Megaputer PolyAnalyst® 5.0, NeuralWare Predict® and BioDiscovery GeneSight®. The software for TM comprises CompareSuite, SAS® Text Miner, TextAnalyst, VisualText, Megaputer PolyAnalyst® 5.0 and WordStat. The software for web mining comprises Megaputer PolyAnalyst®, SPSS Clementine®, ClickTracks and QL2.
Findings
This paper discusses and compares the existing features, characteristics and algorithms of selected software for data mining, TM and web mining, respectively. These packages are also applied to available datasets.
Research limitations/implications
The limitation is the inclusion of selected software packages and datasets rather than the entire realm of each. This review could be used as a framework for comparing other data, text and web mining software.
Practical implications
This paper can be helpful for an organization or individual when choosing proper software to meet their mining needs.
Originality/value
Each software package selected for this research has its own unique characteristics, properties and algorithms. No other paper compares these selected packages both visually and descriptively across all three types of mining: data, text and web.
Details
Keywords
The purpose of this paper is to examine the role of big data text analytics as an enabler of knowledge management (KM). The paper argues that big data text analytics represents an…
Abstract
Purpose
The purpose of this paper is to examine the role of big data text analytics as an enabler of knowledge management (KM). The paper argues that big data text analytics represents an important means to visualise and analyse data, especially unstructured data, which have the potential to improve KM within organisations.
Design/methodology/approach
The study uses text analytics to review 196 articles published in two of the leading KM journals, Journal of Knowledge Management and Knowledge Management Research & Practice, in 2013 and 2014. The text analytics approach is used to process, extract and analyse the 196 papers to identify trends in terms of keywords, topics and keyword/topic clusters, demonstrating the utility of big data text analytics.
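An illustrative sketch of the keyword/topic step, assuming a bag-of-words model and LDA topics over a toy corpus; the authors' actual tooling is not specified here:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

articles = ["knowledge sharing in project teams",
            "big data analytics for decision support",
            "tacit knowledge transfer practices"]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(articles)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for topic in lda.components_:
    print([terms[i] for i in topic.argsort()[-3:]])  # top keywords per topic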
Findings
The findings show how big data text analytics can play a key enabling role in KM. Drawing on the 196 articles analysed, the paper shows the power of big data-oriented text analytics tools in supporting KM through the visualisation of data. In this way, the authors highlight the nature and quality of the knowledge generated through this method for efficient KM and the development of a competitive advantage.
Research limitations/implications
The research has important implications concerning the role of big data text analytics in KM, and specifically the nature and quality of knowledge produced using text analytics. The authors use text analytics to exemplify the value of big data in the context of KM and highlight how future studies could develop and extend these findings in different contexts.
Practical implications
Results contribute to understanding the role of big data text analytics as a means to enhance the effectiveness of KM. The paper provides important insights that can be applied to different business functions, from supply chain management to marketing management to support KM, through the use of big data text analytics.
Originality/value
The study demonstrates the practical application of big data tools for data visualisation and, with it, the improvement of KM.
Details
Keywords
Debasis Majhi and Bhaskar Mukherjee
The purpose of this study is to identify the research fronts by analysing highly cited core papers, adjusted for the age of a paper, in library and information science (LIS) where…
Abstract
Purpose
The purpose of this study is to identify the research fronts by analysing highly cited core papers, adjusted for the age of a paper, in library and information science (LIS), where natural language processing (NLP) is being applied significantly.
Design/methodology/approach
By mining international databases, 3,087 core papers that received at least 5% of the total citations were identified. From the mean age of these core papers and the total citations received, a CPT (citation/publication/time) value was calculated for each of 20 fronts to understand how much relative attention a front has been receiving from peers over time. One theme article was then identified from each of these 20 fronts.
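The abstract does not spell out the CPT formula; one plausible reading, offered purely as an assumption with invented numbers, is citations per core paper per year of mean paper age:

def cpt(total_citations, n_core_papers, mean_age_years):
    # Assumed interpretation of citation/publication/time; not the paper's definition.
    return total_citations / n_core_papers / mean_age_years

print(cpt(total_citations=240, n_core_papers=20, mean_age_years=8))  # 1.5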
Findings
Bidirectional encoder representations from transformers, with a CPT value of 1.608, followed by sentiment analysis, with a CPT of 1.292, received the highest attention in NLP research. Columbia University, New York, is the top institution, the Journal of the American Medical Informatics Association the top journal, the USA (followed by the People's Republic of China) the top country, and Xu, H. (University of Texas) the top author in these fronts. NLP applications are found to boost the performance of digital libraries and automated library systems in the digital environment.
Practical implications
Any of the research fronts identified in the findings of this paper may be used as a base by researchers who intend to perform extensive research on NLP.
Originality/value
To the best of the authors' knowledge, the methodology adopted in this paper is the first of its kind in which a meta-analysis approach has been used for understanding the research fronts of a subfield, such as NLP, within a broad domain such as LIS.
Details
Keywords
Dion Hoe‐Lian Goh, Alton Chua, Chei Sian Lee and Khasfariyati Razikin
Social tagging systems allow users to assign keywords (tags) to useful resources, facilitating their future access by the tag creator and possibly by other users. Social tagging…
Abstract
Purpose
Social tagging systems allow users to assign keywords (tags) to useful resources, facilitating their future access by the tag creator and possibly by other users. Social tagging has both proponents and critics, and this paper aims to investigate if tags are an effective means of resource discovery.
Design/methodology/approach
The paper adopts techniques from text categorisation: webpages and their associated tags are downloaded from del.icio.us, and Support Vector Machine (SVM) classifiers are trained to determine whether the documents can be assigned to their associated tags. Two text categorisation experiments were conducted. The first used only the terms from the documents as features, while the second included tags in addition to terms in its feature set. The performance metrics used were precision, recall, accuracy and F1 score. A content analysis was also conducted to uncover the characteristics of tags that are effective and ineffective for resource discovery.
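A toy sketch of the two conditions, terms only versus terms plus tags, with invented documents and training-set F1 standing in for the paper's evaluation protocol:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

docs = ["tutorial on css layout", "stock market analysis basics"]
tags = ["webdesign css", "finance investing"]  # invented del.icio.us tags
y = [0, 1]

# Condition 1: terms only; condition 2: terms plus tags appended as text.
for texts in (docs, [d + " " + t for d, t in zip(docs, tags)]):
    X = TfidfVectorizer().fit_transform(texts)
    print(f1_score(y, LinearSVC().fit(X, y).predict(X)))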
Findings
Results from the classifiers were mixed, and the inclusion of tags as part of the feature set did not result in a statistically significant improvement (or degradation) of the performance of the SVM classifiers. This suggests that not all tags can be used for resource discovery by public users, confirming earlier work that there are many dynamic reasons for tagging documents that may not be apparent to others.
Originality/value
The authors extend their understanding of social classification and its utility in sharing and accessing resources. Results of this work may be used to guide development in social tagging systems as well as social tagging practices.
Details
Keywords
Ramzi A. Haraty and Rouba Nasrallah
The purpose of this paper is to propose a new model to enhance the auto-indexing of Arabic texts. The model extracts new relevant words by relating those chosen by previous…
Abstract
Purpose
The purpose of this paper is to propose a new model to enhance the auto-indexing of Arabic texts. The model extracts new relevant words by relating those chosen by previous classical methods to new words using data mining rules.
Design/methodology/approach
The proposed model uses an association rule algorithm to extract frequent sets of related items, thereby extracting relationships between words in the texts to be indexed and words from texts that belong to the same category. The extracted word associations are sets of words that frequently appear together.
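A sketch of the frequent-set step using the Apriori implementation in mlxtend over toy word-presence transactions; the thresholds and the English stand-in terms are assumptions:

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Each row records which words occur in one same-category document.
transactions = pd.DataFrame(
    [[1, 1, 0], [1, 1, 1], [0, 1, 1]],
    columns=["economy", "market", "trade"]).astype(bool)

frequent = apriori(transactions, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "confidence"]])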
Findings
The proposed methodology shows significant enhancement in terms of accuracy, efficiency and reliability when compared to previous works.
Research limitations/implications
The stemming algorithm can be further enhanced. The Arabic language has many grammatical rules, and the more of these rules that are integrated into the stemming algorithm, the better the stemming will be. The stop-list can also be enhanced by adding more words that should not be taken into consideration in the indexing mechanism; numbers should be added to the list as well. Using a thesaurus would also help, because it links different words or phrases with the same meaning to each other, which improves the indexing mechanism. The authors also invite researchers to add more prerequisite texts to obtain better results.
Originality/value
In this paper, the authors present a full text-based auto-indexing method for Arabic text documents. The auto-indexing method extracts new relevant words by using data mining rules, which has not been investigated before. The method uses an association rule mining algorithm for extracting frequent sets containing related items to extract relationships between words in the texts to be indexed with words from texts that belong to the same category. The benefits of the method are demonstrated using empirical work involving several Arabic texts.
Details
Keywords
Alfredo Milani, Niyogi Rajdeep, Nimita Mangal, Rajat Kumar Mudgal and Valentina Franzoni
This paper aims to propose an approach for the analysis of user interests based on tweets, which can be used in the design of user recommendation systems. The extracted topics are…
Abstract
Purpose
This paper aims to propose an approach for the analysis of user interests based on tweets, which can be used in the design of user recommendation systems. The extracted topics are those seen positively by the user.
Design/methodology/approach
The proposed approach is based on the combination of sentiment extraction and classification analysis of tweets to extract topics of interest. The proposed hybrid method is original. The topic extraction phase uses a method based on semantic distance in the WordNet taxonomy, while sentiment extraction uses NLPcore.
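A toy sketch of the WordNet-distance idea via NLTK's path similarity; the candidate topic list, the first-synset choice and the example word are assumptions (run nltk.download("wordnet") once beforehand):

from nltk.corpus import wordnet as wn

def closest_topic(word, topics):
    # Assign the word to the candidate topic with the highest path similarity.
    w = wn.synsets(word)[0]
    return max(topics, key=lambda t: w.path_similarity(wn.synsets(t)[0]) or 0)

print(closest_topic("guitar", ["music", "politics", "sport"]))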
Findings
The algorithm has been extensively tested on real tweets generated by 1,000 users. The results are quite encouraging, outperform state-of-the-art results and confirm the suitability of combining sentiment and categorization for extracting topics of interest.
Research limitations/implications
The hybrid method combining sentiment extraction and classification to extract the topics users view positively represents a novel contribution with many potential applications.
Practical implications
The functionality of positive topic extraction is very useful as a component in the design of a recommender system based on user profiling from Twitter user behaviors.
Social implications
The proposed method can be applied broadly to short-text social networks, well beyond its application to tweets.
Originality/value
Few works have considered both sentiment analysis and classification to identify users' interests. The algorithm has been extensively tested on real tweets generated by 1,000 users, with quite encouraging results that outperform the state of the art.
Details
Keywords
Sheng-Qun Chen, Ting You and Jing-Lin Zhang
This study aims to enhance the classification and processing of online appeals by employing a deep-learning-based method. This method is designed to meet the requirements for…
Abstract
Purpose
This study aims to enhance the classification and processing of online appeals by employing a deep-learning-based method. This method is designed to meet the requirements for precise information categorization and decision support across various management departments.
Design/methodology/approach
This study leverages the ALBERT–TextCNN algorithm to determine the appropriate department for managing online appeals. ALBERT is selected for its advanced dynamic word representation capabilities, rooted in a multi-layer bidirectional transformer architecture and an enriched text vector representation. TextCNN is integrated to facilitate the development of multi-label classification models.
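A condensed sketch of how such a pairing can be wired up: ALBERT supplies contextual token embeddings, and parallel 1-D convolutions with max-pooling feed a multi-label head. The checkpoint name, filter sizes and label count are assumptions, not the paper's configuration:

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class AlbertTextCNN(nn.Module):
    def __init__(self, n_labels, hidden=768, n_filters=64, sizes=(2, 3, 4)):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("albert-base-v2")
        self.convs = nn.ModuleList(nn.Conv1d(hidden, n_filters, k) for k in sizes)
        self.head = nn.Linear(n_filters * len(sizes), n_labels)

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        h = h.transpose(1, 2)                       # (batch, hidden, seq_len)
        pooled = [torch.relu(c(h)).max(dim=2).values for c in self.convs]
        return self.head(torch.cat(pooled, dim=1))  # multi-label logits

tok = AutoTokenizer.from_pretrained("albert-base-v2")
batch = tok(["water supply complaint in district three"], return_tensors="pt")
logits = AlbertTextCNN(n_labels=5)(batch["input_ids"], batch["attention_mask"])
print(torch.sigmoid(logits))  # per-department probabilities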
Findings
Comparative experiments demonstrate the effectiveness of the proposed approach and its significant superiority over traditional classification methods in terms of accuracy.
Originality/value
The original contribution of this study lies in its utilization of the ALBERT–TextCNN algorithm for the classification of online appeals, resulting in a substantial improvement in accuracy. This research offers valuable insights for management departments, enabling enhanced understanding of public appeals and fostering more scientifically grounded and effective decision-making processes.