Abstract
Purpose
Automatic text categorization has applications in several domains, for example e‐mail spam detection, sexual content filtering, directory maintenance, and focused crawling, among others. Most information retrieval systems contain several components which use text categorization methods. One of the first text categorization methods was designed using a naïve Bayes representation of the text, and a number of variations of naïve Bayes have since been discussed. The purpose of this paper is to evaluate naïve Bayes approaches to text categorization, introducing new competitive extensions to previous approaches.
Design/methodology/approach
The paper focuses on introducing a new Bayesian text categorization method based on an extension of the naïve Bayes approach. Some modifications to document representations are introduced based on the well‐known BM25 text information retrieval method. The performance of the method is compared to several extensions of naïve Bayes using benchmark datasets designed for this purpose. The method is compared also to training‐based methods such as support vector machines and logistic regression.
Findings
The proposed text categorizer outperforms state‐of‐the‐art methods without introducing new computational costs. It also achieves performance very similar to that of more complex methods based on criterion function optimization, such as support vector machines or logistic regression.
Practical implications
The proposed method scales well regarding the size of the collection involved. The presented results demonstrate the efficiency and effectiveness of the approach.
Originality/value
The paper introduces a novel naïve Bayes text categorization approach based on the well‐known BM25 information retrieval model, which offers a set of good properties for this problem.
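To make the core idea concrete, the following minimal sketch (in Python, assuming scikit-learn and NumPy are available; the toy corpus and the k1/b parameter values are hypothetical) feeds BM25-style term weights into a multinomial naïve Bayes classifier. It illustrates the general technique, not the authors' exact formulation.

    # Sketch: multinomial naive Bayes over BM25-weighted term frequencies.
    # Assumptions: generic BM25 weighting (k1, b); toy spam/ham corpus.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    docs = ["cheap pills buy now", "meeting agenda attached", "buy cheap meds"]
    labels = [1, 0, 1]  # 1 = spam, 0 = ham

    counts = CountVectorizer().fit_transform(docs).toarray().astype(float)

    k1, b = 1.2, 0.75
    doc_len = counts.sum(axis=1, keepdims=True)
    avg_len = doc_len.mean()
    df = (counts > 0).sum(axis=0)
    idf = np.log((len(docs) - df + 0.5) / (df + 0.5) + 1.0)

    # The BM25 weight replaces the raw term frequency in each document vector.
    bm25 = idf * counts * (k1 + 1) / (counts + k1 * (1 - b + b * doc_len / avg_len))

    clf = MultinomialNB().fit(bm25, labels)
    print(clf.predict(bm25))  # sanity check on the training documents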
V. Srilakshmi, K. Anuradha and C. Shoba Bindu
Abstract
Purpose
This paper aims to model a technique that categorizes the texts from huge documents. The progression in internet technologies has raised the count of documents accessible, and thus the documents available online have become countless. These text documents comprise research articles, journal papers, newspapers, technical reports and blogs. Such large documents are useful and valuable for processing real-time applications and are also used in several retrieval methods. Text classification plays a vital role in information retrieval technologies and is considered an active field for processing massive applications. The aim of text classification is to categorize large-sized documents into different categories on the basis of their contents. Numerous text-related tasks exist, such as profiling users, sentiment analysis and spam identification, each of which is considered a supervised learning problem and is addressed with a text classifier.
Design/methodology/approach
At first, the input documents are pre-processed using stop word removal and stemming so that the input is made effective and suitable for feature extraction. In the feature extraction process, features are extracted using the vector space model (VSM), and feature selection is then performed to select the most relevant features for text categorization. Once the features are selected, text categorization is carried out using a deep belief network (DBN). The training of the DBN is performed using the proposed grasshopper crow optimization algorithm (GCOA), which integrates the grasshopper optimization algorithm (GOA) and the crow search algorithm (CSA). Moreover, a hybrid weight bounding model is devised using the proposed GCOA and range degree. Thus, the proposed GCOA + DBN is used for classifying the text documents.
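The pre-processing and VSM steps described above can be sketched as follows (assuming scikit-learn and NLTK are available; the documents are toy examples, and the DBN classifier with GCOA training is beyond the scope of a short example):

    # Sketch of the pre-processing and vector space model (VSM) steps only.
    from nltk.stem import PorterStemmer
    from sklearn.feature_extraction.text import TfidfVectorizer

    stemmer = PorterStemmer()

    def preprocess(text):
        # Stemming; stop word removal is delegated to the vectorizer below.
        return " ".join(stemmer.stem(tok) for tok in text.lower().split())

    docs = ["Deep networks classify long documents",
            "Stemming reduces words to their roots"]
    vsm = TfidfVectorizer(stop_words="english")
    features = vsm.fit_transform(preprocess(d) for d in docs)
    print(features.shape)  # (n_documents, n_terms): input to feature selection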
Findings
The performance of the proposed technique is evaluated using accuracy, precision and recall, and is compared with existing techniques such as naive Bayes, k-nearest neighbors, support vector machine, deep convolutional neural network (DCNN) and Stochastic Gradient-CAViaR + DCNN. The proposed GCOA + DBN shows improved performance, with values of 0.959, 0.959 and 0.96 for precision, recall and accuracy, respectively.
Originality/value
This paper proposes a technique that categorizes the texts from massive documents. The findings show that the proposed GCOA-based DBN effectively classifies text documents.
N. Venkata Sailaja, L. Padmasree and N. Mangathayaru
Abstract
Purpose
Text mining has been used for various knowledge discovery based applications, and thus a lot of research has been contributed towards it. The latest research trend in text mining is the adoption of incremental learning, as it is economical when dealing with large volumes of information.
Design/methodology/approach
The primary intention of this research is to design and develop a technique for incremental text categorization using an optimized Support Vector Neural Network (SVNN). The proposed technique involves four major steps: pre-processing, feature extraction, feature selection and classification. Initially, the data is pre-processed based on stop word removal and stemming. Then, feature extraction is done by extracting semantic word-based features and Term Frequency and Inverse Document Frequency (TF-IDF). From the extracted features, the important features are selected using the Bhattacharya distance measure, and these features are given as input to the proposed classifier. The proposed classifier performs incremental learning using SVNN, wherein the weights are bounded within a limit using rough set theory. Moreover, for the optimal selection of weights in the SVNN, the Moth Search (MS) algorithm is used. Thus, the proposed classifier, named Rough set MS-SVNN, performs text categorization on the incremental data given as input.
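A minimal sketch of Bhattacharyya-distance-based feature selection, under the assumption that each feature is summarized by discrete class-conditional distributions (the toy histograms and the threshold below are hypothetical, and the paper's exact procedure may differ):

    # Sketch: Bhattacharyya-distance feature selection. Features whose
    # class-conditional distributions are far apart separate the classes
    # better and are kept.
    import numpy as np

    def bhattacharyya_distance(p, q, eps=1e-12):
        bc = np.sum(np.sqrt(p * q))   # Bhattacharyya coefficient
        return -np.log(bc + eps)      # distance; larger = more separable

    # Toy class-conditional histograms for two features (each row sums to 1).
    features = {"a": (np.array([0.7, 0.2, 0.1]), np.array([0.1, 0.2, 0.7])),
                "b": (np.array([0.4, 0.3, 0.3]), np.array([0.35, 0.3, 0.35]))}

    scores = {name: bhattacharyya_distance(p, q)
              for name, (p, q) in features.items()}
    selected = [name for name, s in scores.items() if s > 0.1]  # toy threshold
    print(scores, selected)  # feature "a" is kept, "b" is discarded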
Findings
For the experimentation, the 20 Newsgroups dataset and the Reuters dataset are used. Simulation results indicate that the proposed Rough set based MS-SVNN achieved 0.7743, 0.7774 and 0.7745 for precision, recall and F-measure, respectively.
Originality/value
In this paper, an online incremental learner is developed for text categorization. The text categorization is done by developing the Rough set MS-SVNN classifier, which classifies incoming texts based on the boundary condition evaluated by rough set theory and the optimal weights from the MS algorithm. The proposed online text categorization scheme has the basic steps of pre-processing, feature extraction, feature selection and classification. Pre-processing is carried out to identify the unique words in the dataset, and features such as semantic word-based features and TF-IDF are obtained from the keyword set. Feature selection is done by setting a minimum Bhattacharya distance measure, and the selected features are provided to the proposed Rough set MS-SVNN for classification.
Abstract
Purpose
To provide an integrated perspective on the similarities and differences between approaches to automated classification in different research communities (machine learning, information retrieval and library science), and to point to problems with the individual approaches and with automated classification as such.
Design/methodology/approach
A range of works dealing with automated classification of full‐text web documents are discussed. Explorations of individual approaches are given in the following sections: special features (description, differences, evaluation), application and characteristics of web pages.
Findings
The paper identifies the major similarities and differences between the three approaches: document pre‐processing and the utilization of web‐specific document characteristics are common to all of them, while the major differences lie in the algorithms applied and in whether the vector space model and controlled vocabularies are employed. Problems of automated classification are also recognized.
Research limitations/implications
The paper does not attempt to provide an exhaustive bibliography of related resources.
Practical implications
As an integrated overview of approaches from different research communities with application examples, it is very useful for students in library and information science and computer science, as well as for practitioners. Researchers from one community have the information on how similar tasks are conducted in different communities.
Originality/value
To the author's knowledge, no review paper on automated text classification attempted to discuss more than one community's approach from an integrated perspective.
Waleed Zaghloul, Sang M. Lee and Silvana Trimi
Abstract
Purpose
The purpose of this paper is to compare the performance of neural networks (NNs) and support vector machines (SVMs) as text classifiers. SVMs are considered one of the best classifiers. NNs could be adopted as text classifiers if their performance is comparable to that of SVMs.
Design/methodology/approach
Several NNs are trained to classify the same set of text documents as SVMs, and their effectiveness is measured. The performance of the two tools is then statistically compared.
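The statistical comparison step might look like the following sketch (hypothetical per-category F1 scores; the paper's actual test procedure is not specified here), using a paired t-test from SciPy:

    # Sketch: paired t-test on per-category F1 scores of the two classifiers.
    from scipy.stats import ttest_rel

    nn_f1  = [0.81, 0.78, 0.83, 0.80, 0.79]   # neural network, per category
    svm_f1 = [0.82, 0.80, 0.82, 0.81, 0.80]   # SVM, same categories

    t_stat, p_value = ttest_rel(nn_f1, svm_f1)
    print(f"t = {t_stat:.3f}, p = {p_value:.3f}")
    # A large p-value fails to reject equality, i.e. performance is comparable.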
Findings
For text classification (TC), the performance of NNs is statistically comparable to that of the SVMs even when a significantly reduced document size is used.
Practical implications
This research finds not only that NNs are very viable TC tools with performance comparable to SVMs, but also that they achieve this using a much reduced document size. The successful use of NNs in classifying reduced text documents would be a great advantage over other classification tools, as it can bring great savings in computation time and costs.
Originality/value
This paper is of value by showing statistically that NNs could be adopted as text classifiers with effectiveness comparable to SVMs, one of the best text classifiers currently used. This research is the first step towards utilizing NNs in text mining and its sub‐areas.
Carlos G. Figuerola, Angel Zazo Rodríguez and José Luis Alonso Berrocal
Abstract
Automatic categorisation can be understood as a learning process during which a program recognises the characteristics that distinguish each category or class from others, i.e. those characteristics which the documents should have in order to belong to that category. As yet few experiments have been carried out with documents in Spanish. Here we show the possibilities of elaborating pattern vectors that include the characteristics of different classes or categories of documents, using techniques based on those applied to the expansion of queries by relevance; likewise, the results of applying these techniques to a collection of documents in Spanish are given. The same collection of documents was categorised manually and the results of both procedures were compared.
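The pattern-vector idea resembles Rocchio-style relevance feedback; a minimal sketch under that assumption (toy term-frequency vectors, hypothetical beta/gamma weights) is given below. It illustrates the general technique rather than the authors' exact weighting scheme.

    # Sketch: Rocchio-style class pattern vectors built from toy tf vectors.
    import numpy as np

    def pattern_vector(in_class, out_class, beta=0.75, gamma=0.15):
        # Centroid of in-class documents, pushed away from other documents.
        return beta * in_class.mean(axis=0) - gamma * out_class.mean(axis=0)

    def classify(doc, patterns):
        # Assign the class whose pattern vector is most similar (cosine).
        sims = {c: doc @ v / (np.linalg.norm(doc) * np.linalg.norm(v) + 1e-12)
                for c, v in patterns.items()}
        return max(sims, key=sims.get)

    docs = np.array([[2., 0., 1.], [1., 0., 2.], [0., 3., 1.]])
    patterns = {"economy": pattern_vector(docs[:2], docs[2:]),
                "sports":  pattern_vector(docs[2:], docs[:2])}
    print(classify(np.array([1., 0., 1.]), patterns))  # -> "economy"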
Abstract
Purpose
The purpose of this research is to compare several machine learning techniques on the task of Asian language text classification, for languages such as Chinese and Japanese where no word boundary information is available in written text. The paper advocates a simple language modeling based approach for this task.
Design/methodology/approach
Naïve Bayes, maximum entropy model, support vector machines and language modeling approaches were implemented and applied to Chinese and Japanese text classification. To study the influence of word segmentation, different word segmentation approaches were applied to Chinese text, and a segmentation‐based approach was compared with the non‐segmentation‐based approach.
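A minimal sketch of the language-modeling approach for text without word boundaries, assuming character bigram models with add-one smoothing (the vocabulary size and toy snippets are hypothetical, and the paper's model may differ):

    # Sketch: one character bigram language model per class; a text goes to
    # the class whose model assigns it the highest log-likelihood.
    import math
    from collections import Counter

    def train_lm(texts, n=2):
        ngrams, contexts = Counter(), Counter()
        for t in texts:
            t = " " * (n - 1) + t
            for i in range(n - 1, len(t)):
                ngrams[t[i - n + 1:i + 1]] += 1
                contexts[t[i - n + 1:i]] += 1
        return ngrams, contexts

    def log_prob(text, model, n=2, vocab=5000):  # vocab size is hypothetical
        ngrams, contexts = model
        text = " " * (n - 1) + text
        return sum(math.log((ngrams[text[i - n + 1:i + 1]] + 1) /
                            (contexts[text[i - n + 1:i]] + vocab))
                   for i in range(n - 1, len(text)))

    models = {"sports": train_lm(["比赛 足球 冠军"]),
              "finance": train_lm(["股票 市场 利率"])}
    print(max(models, key=lambda c: log_prob("足球 比赛", models[c])))  # sports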
Findings
There were two findings. First, the experiments show that statistical language modeling can significantly outperform standard techniques, given the same set of features. Second, classification with word level features normally yields improved classification performance, but that performance is not monotonically related to segmentation accuracy: classification performance may initially improve with increased segmentation accuracy, but eventually it stops improving, and can in fact even decrease, beyond a certain level of segmentation accuracy.
Practical implications
Applying the findings to real web text classification is ongoing work.
Originality/value
The paper is very relevant to Chinese and Japanese information processing, e.g. webpage classification, web search.
Samia Nawaz Yousafzai, Hooria Shahbaz, Armughan Ali, Amreen Qamar, Inzamam Mashood Nasir, Sara Tehsin and Robertas Damaševičius
Abstract
Purpose
The objective is to develop a more effective model that simplifies and accelerates the news classification process using advanced text mining and deep learning (DL) techniques. A distributed framework utilizing Bidirectional Encoder Representations from Transformers (BERT) was developed to classify news headlines. This approach leverages various text mining and DL techniques on a distributed infrastructure, aiming to offer an alternative to traditional news classification methods.
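As a rough illustration, headline classification with BERT via the Hugging Face transformers library might be set up as below (the label count and headlines are placeholders; the distributed infrastructure and the fine-tuning loop are omitted, and the classification head here is untrained):

    # Sketch: BERT-based headline classification with Hugging Face transformers.
    import torch
    from transformers import AutoTokenizer, AutoModelForSequenceClassification

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=4)  # e.g. politics/sport/tech/business

    headlines = ["Parliament passes new budget", "Striker signs record transfer"]
    batch = tokenizer(headlines, padding=True, truncation=True,
                      return_tensors="pt")
    with torch.no_grad():
        logits = model(**batch).logits
    print(logits.argmax(dim=-1))  # predicted class ids (requires fine-tuning)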
Design/methodology/approach
This study focuses on the classification of distinct types of news by analyzing tweets from various news channels. It addresses the limitations of using benchmark datasets for news classification, which often result in models that are impractical for real-world applications.
Findings
The framework’s effectiveness was evaluated on a newly proposed dataset and two additional benchmark datasets from the Kaggle repository, assessing the performance of each text mining and classification method across these datasets. The results of this study demonstrate that the proposed strategy significantly outperforms other approaches in terms of accuracy and execution time. This indicates that the distributed framework, coupled with the use of BERT for text analysis, provides a robust solution for analyzing large volumes of data efficiently. The findings also highlight the value of the newly released corpus for further research in news classification and emotion classification, suggesting its potential to facilitate advancements in these areas.
Originality/value
This research introduces an innovative distributed framework for news classification that addresses the shortcomings of models trained on benchmark datasets. By utilizing cutting-edge techniques and a novel dataset, the study offers significant improvements in accuracy and processing speed. The release of the corpus represents a valuable contribution to the field, enabling further exploration into news and emotion classification. This work sets a new standard for the analysis of news data, offering practical implications for the development of more effective and efficient news classification systems.
Efthimia Mavridou, Konstantinos M. Giannoutakis, Dionysios Kehagias, Dimitrios Tzovaras and George Hassapis
Abstract
Purpose
Semantic categorization of Web services comprises a fundamental requirement for enabling more efficient and accurate search and discovery of services in the semantic Web era. However, to efficiently deal with the growing presence of Web services, more automated mechanisms are required. This paper aims to introduce an automatic Web service categorization mechanism, by exploiting various techniques that aim to increase the overall prediction accuracy.
Design/methodology/approach
The paper proposes the use of Error Correcting Output Codes on top of a Logistic Model Trees-based classifier, in conjunction with a data pre-processing technique that reduces the original feature-space dimension without affecting data integrity. The proposed technique is generalized so as to adhere to all Web services with a description file. A semantic matchmaking scheme is also proposed for enabling the semantic annotation of the input and output parameters of each operation.
Findings
The proposed Web service categorization framework was tested with the OWLS-TC v4.0, as well as a synthetic data set with a systematic evaluation procedure that enables comparison with well-known approaches. After conducting exhaustive evaluation experiments, categorization efficiency in terms of accuracy, precision, recall and F-measure was measured. The presented Web service categorization framework outperformed the other benchmark techniques, which comprise different variations of it and also third-party implementations.
Originality/value
The proposed three-level categorization approach is a significant contribution to the Web service community, as it allows the automatic semantic categorization of all functional elements of Web services that are equipped with a service description file.
José L. Navarro‐Galindo and José Samos
Abstract
Purpose
Nowadays, the use of WCMS (web content management systems) is widespread. The conversion of this infrastructure into its semantic equivalent (semantic WCMS) is a critical issue, as this enables the benefits of the semantic web to be extended. The purpose of this paper is to present FLERSA (Flexible Range Semantic Annotation), a tool for flexible range semantic annotation.
Design/methodology/approach
FLERSA is presented as a user‐centred annotation tool for Web content expressed in natural language. The tool has been built to illustrate how a WCMS called Joomla! can be converted into its semantic equivalent.
Findings
The development of the tool shows that it is possible to build a semantic WCMS through a combination of semantic components and other resources such as ontologies and emerging technologies, including XML, RDF, RDFa and OWL.
Practical implications
The paper provides a starting‐point for further research in which the principles and techniques of the FLERSA tool can be applied to any WCMS.
Originality/value
The tool allows both manual and automatic semantic annotations, as well as providing enhanced search capabilities. For manual annotation, a new flexible range markup technique is used, based on the RDFa standard, to support the evolution of annotated Web documents more effectively than XPointer. For automatic annotation, a hybrid approach based on machine learning techniques (Vector‐Space Model + n‐grams) is used to determine the concepts that the content of a Web document deals with (from an ontology which provides a taxonomy), based on previous annotations that are used as a training corpus.
Thomas Mandl and Christa Womser‐Hacker
Abstract
A framework for the long‐term learning of user preferences in information retrieval is presented. The multiple indexing and method‐object relations (MIMOR) model tightly integrates a fusion method and a relevance feedback processor into a learning model. Several black box matching functions can be combined into a linear combination committee machine which reflects the user's vague individual cognitive concepts expressed in relevance feedback decisions. An extension based on the soft computing paradigm couples the relevance feedback processor and the matching function into a unified retrieval system.
Marko Grobelnik and Dunja Mladenić
Abstract
Purpose
To present approaches and some research results of various research areas contributing to knowledge discovery from different sources, different data forms, on different scales, and for different purposes.
Design/methodology/approach
Contributes to knowledge management by applying knowledge discovery approaches that enable the computer to search for the relevant knowledge while humans give just broad directions.
Findings
Knowledge discovery techniques proved to be very appropriate for many problems related to knowledge management. Surprisingly, it is often the case that already relatively simple approaches provide valuable results.
Research limitations/implications
There are still many open problems and scalability issues that arise when dealing with real‐world data, especially in areas involving text and network analysis.
Practical implications
Each problem should be handled with care, taking into account different aspects and selecting/extending the most appropriate available methods or developing new approaches.
Originality/value
This paper provides an interesting collection of selected knowledge discovery methods applied in different contexts but all contributing in some way to knowledge management. Several of the reported approaches were developed in collaboration with the authors of the paper, with special emphasis on their usability for practical problems involving knowledge management.
Christopher Soo‐Guan Khoo, Armineh Nourbakhsh and Jin‐Cheon Na
Abstract
Purpose
Sentiment analysis and emotion processing are attracting increasing interest in many fields. Computer and information scientists are developing automated methods for sentiment analysis of online text. Most of the studies have focused on identifying sentiment polarity or orientation – whether a document, usually a product or movie review, carries a positive or negative sentiment. It is time for researchers to address more sophisticated kinds of sentiment analysis. This paper aims to evaluate a particular linguistic framework called appraisal theory for adoption in manual as well as automatic sentiment analysis of news text.
Design/methodology/approach
The appraisal theory is applied to the analysis of a sample of political news articles reporting on Iraq and the economic policies of George W. Bush and Mahmoud Ahmadinejad to assess its utility and to identify challenges in adopting this framework.
Findings
The framework was useful in uncovering various aspects of sentiment that should be useful for researchers, such as the appraisers and object of appraisal, bias of the appraisers and the author, type of attitude and manner of expressing the sentiment. Problems encountered include difficulty in identifying appraisal phrases and attitude categories because of the subtlety of expression in political news articles, lack of treatment of tense and timeframe, lack of a typology of emotions, and need to identify different types of behaviours (political, verbal and material actions) that reflect sentiment.
Originality/value
The study has identified future directions for research in automated sentiment analysis as well as sentiment analysis of online news text. It has also demonstrated how sentiment analysis of news text can be carried out.
Swati Garg, Shuchi Sinha, Arpan Kumar Kar and Mauricio Mani
Abstract
Purpose
This paper reviews 105 Scopus-indexed articles to identify the degree, scope and purposes of machine learning (ML) adoption in the core functions of human resource management (HRM).
Design/methodology/approach
A semi-systematic approach has been used in this review. Since ML research emerges from multiple disciplines and uses different methods and theoretical frameworks, this approach was considered appropriate, as it allows for a more detailed analysis of the literature.
Findings
The review suggests that HRM has embraced ML, albeit at a nascent stage, and that it is receiving attention largely from technology-oriented researchers. ML applications are strongest in the areas of recruitment and performance management, and the use of decision trees and text-mining algorithms for classification dominates all functions of HRM. For complex processes, ML applications are still at an early stage, requiring HR experts and ML specialists to work together.
Originality/value
Given the current focus of organizations on digitalization, this review contributes significantly to the understanding of the current state of ML integration in HRM. Along with increasing efficiency and effectiveness of HRM functions, ML applications improve employees' experience and facilitate performance in the organizations.
Lu Zhang, Pu Dong, Long Zhang, Bojiao Mu and Ahui Yang
Abstract
Purpose
This study aims to explore the dissemination and evolutionary path of online public opinion from a crisis management perspective. By clarifying the influencing factors and dynamic mechanisms of online public opinion dissemination, this study provides insights into attenuating the negative impact of online public opinion and creating a favorable ecological space for online public opinion.
Design/methodology/approach
This research employs bibliometric analysis and CiteSpace software to analyze 302 Chinese articles published from 2006 to 2023 in the China National Knowledge Infrastructure (CNKI) database and 276 English articles published from 1994 to 2023 in the Web of Science core set database. Through literature keyword clustering, co-citation analysis and burst terms analysis, this paper summarizes the core scientific research institutions, scholars, hot topics and evolutionary paths of online public opinion crisis management research from both Chinese and international academic communities.
Findings
The results show that the study of online public opinion crisis management in China and internationally is centered on the life cycle theory, which integrates knowledge from information, computer and system sciences. Although there are differences in political interaction and stage evolution, the overall evolutionary path is similar, and it develops dynamically in the “benign conflict” between the expansion of the research perspective and the gradual refinement of research granularity.
Originality/value
This study summarizes the research results of online public opinion crisis management from China and the international academic community and identifies current research hotspots and theoretical evolution paths. Future research can focus on deepening the basic theories of public opinion crisis management under the influence of frontier technologies, exploring the subjectivity and emotionality of web users using fine algorithms and promoting the international development of network public opinion crisis management theory through transnational comparison and international cooperation.
Shubhada Prashant Nagarkar and Rajendra Kumbhar
Abstract
Purpose
The purpose of this paper was to analyse text mining (TM) literature indexed in the Web of Science (WoS) under the “Information Science Library Science” subcategory. More specifically, it analyses the chronological growth of TM literature, and the major countries, institutions, departments and individuals contributing to TM literature. Collaboration in TM research is also analysed.
Design/methodology/approach
Bibliographic and citation data required for this research were retrieved from the WoS database. TM being a multidisciplinary field, the search was restricted to “Information Science Library Science” subcategory in the WoS. A comprehensive query statement covering all synonyms of “text mining” was prepared using the Boolean operator “OR”. Microsoft Excel and HistCite software were used for data analysis. Pajek and VoSviewer were used for data visualization.
Findings
It was found that the USA is the major producer of TM research literature, and the highest number of papers were published in the Journal of the American Medical Informatics Association. Columbia University ranked first both in the number of articles and in citations received among the top ten institutes publishing TM literature. It was also observed that six of the top ten institutional subdivisions are from medicine, medical informatics or biomedical information. H.C. Chen and C. Friedman were the most prolific authors.
Research limitations/implications
The paper analyses articles on TM published during 1999-2013 in WoS under the subcategory “Information Science Library Science”.
Originality/value
The paper is based on empirical data exclusively gathered for this research.
Yu‐Liang Chi and Hsiao‐Chi Chen
Abstract
Purpose
The purpose of this paper is to demonstrate how the semantic rules in conjunction with ontology can be applied for inferring new facts to dispatch news into corresponding departments.
Design/methodology/approach
Under a specific task domain, the proposed design comprises finding a glossary from electronic resources, gathering organization functions as controlled vocabularies, and linking relationships between the glossary and controlled vocabularies. Web ontology language is employed to represent this knowledge as ontology, and semantic web rule language is utilized to infer implicit facts among instances.
Findings
Document dispatching is highly domain dependent. Human perspectives, adopted as predefined knowledge for understanding document meanings, are important. Knowledge‐intensive approaches such as ontology can model and represent expertise as reusable components. Ontology and rules together extend inference capabilities over the semantic relationships between instances.
Practical implications
Empirical lessons reveal that ontology with semantic rules can be utilized to model human subjective judgement as knowledge bases. An example, including ontology and rules, based on news dispatching is provided.
Originality/value
An organization can classify and deliver documents to corresponding departments based on known facts by following the described procedure.
Tariq Soussan and Marcello Trovati
Abstract
Purpose
Social media has become a vital part of any institute’s marketing plan. Social networks benefit businesses by allowing them to interact with their clients, grow brand exposure through offers and promotions and find new leads. It also offers vital information concerning the general emotions and sentiments directly connected to the welfare and security of the online community involved with the brand. Big organizations can make use of their social media data to generate planned and operational decisions. This paper aims to look into the conversion of sentiments and emotions over time.
Design/methodology/approach
In this work, a model called sentiment urgency emotion detection (SUED) from previous work will be applied to tweets from two different periods of time, one before the start of the COVID-19 pandemic and one after it started, to monitor the conversion of sentiments and emotions over time. The model has been trained to improve its accuracy and F1 score so that the precision and the percentage of correctly predicted texts are high. This model will be tuned to improve results (Soussan and Trovati, 2020a; Soussan and Trovati, 2020b) and will be applied to a general business Twitter account of one of the largest supermarket chains in the UK, to see what sentiments and emotions can be detected and how urgent they are.
Findings
This will show the effect of the COVID-19 pandemic on the conversion of the sentiments, emotions and urgencies of the tweets.
Originality/value
Sentiments will be compared between the two periods to evaluate how sentiments and emotions vary over time, taking COVID-19 into consideration as an affecting factor. In addition, SUED will be tuned to enhance results, and the knowledge mined when turning data into decisions is crucial because it will aid the stakeholders handling the institute in evaluating the topics and issues that were most emphasized.
Sandra Maria Correia Loureiro, Ricardo Godinho Bilro and Arnold Japutra
Abstract
Purpose
This paper aims to explore the relationships between website quality – through consumer-generated media stimuli-, emotions and consumer-brand engagement in online environments.
Design/methodology/approach
Two independent studies are conducted to examine these relationships. Study 1, based on a sample of 366 respondents, uses a structural equation modelling approach to test the research hypotheses. Study 2, based on 1,454 online consumer reviews, uses text-mining technique to examine further the relationship between emotions and consumer-brand engagement.
Findings
The findings show that all the consumer-generated media stimuli are positively related to the dimensions of emotions. However, only pleasure and arousal are positively related to the three variables of consumer-brand engagement. The findings also show cognitive processing as the strongest dimension of consumer-brand engagement providing positive sentiments towards brands.
Practical implications
The findings provide marketers with an understanding of how valid, useful and relevant content (i.e. information/content) creates a greater emotional connection and drives consumer-brand engagement. Marketers should be aware that consumer-generated media stimuli influence consumers’ emotions and their reactions.
Originality/value
This study is one of the first to adapt and apply the S-O-R framework in explaining online consumer-brand engagement. It also adds to the brand engagement literature as the first study that combines a PLS-SEM approach with text-mining analysis to provide a better understanding of these relationships.
Jin Zhang, Yanyan Wang and Yuehua Zhao
Abstract
Purpose
The statistical method plays an extremely important role in quantitative research studies in library and information science (LIS). The purpose of this paper is to investigate the status of statistical methods used in the field, their application areas and the temporal change patterns during a recent 15-year period.
Design/methodology/approach
The research papers in six major scholarly journals from 1999 to 2013 in LIS were examined. Factors including statistical methods, application areas and time period were analyzed using quantitative research methods including content analysis and temporal analysis methods.
Findings
Research studies using statistical methods in LIS have increased steadily. Statistical methods were more frequently used to solve problems in the information retrieval area than in other areas, and inferential statistical methods were used more often than predictive and other statistical methods. Anomaly analysis of statistical method use was conducted, and four types of anomaly were specified.
Originality/value
The findings of this study can help educators, graduates and researchers in the field of LIS better understand the patterns and trends of the applications of statistical methods in this field, depict an overall picture of quantitative research studies in LIS from the perspective of statistical methods and discover the change patterns of statistical method applications in LIS between 1999 and 2013.
Abstract
Purpose
The aim of this study is to offer valuable insights to businesses and facilitate a better understanding of transformer-based models (TBMs), which are among the most widely employed generative artificial intelligence (GAI) models, garnering substantial attention due to their ability to process and generate complex data.
Design/methodology/approach
Existing studies on TBMs tend to be limited in scope, either focusing on specific fields or being highly technical. To bridge this gap, this study conducts robust bibliometric analysis to explore the trends across journals, authors, affiliations, countries and research trajectories using science mapping techniques – co-citation, co-words and strategic diagram analysis.
Findings
Identified research gaps encompass the evolution of new closed- and open-source TBMs; limited exploration across industries like education and disciplines like marketing; a lack of in-depth exploration of TBMs’ adoption in the health sector; scarcity of research on TBMs’ ethical considerations; and potential research on TBMs’ performance in diverse applications, like image processing.
Originality/value
The study offers an updated TBMs landscape and proposes a theoretical framework for TBMs' adoption in organizations. Implications for managers and researchers along with suggested research questions to guide future investigations are provided.
Chih-Hung Chung and Lu-Jia Chen
Abstract
Purpose
The purpose of this study is to explore the capabilities required by entry-level human resources (HR) professionals based on job advertisements by using text mining (TM) technique.
Design/methodology/approach
This study used TM techniques to explore the capabilities required by entry-level HR professionals based on job advertisements on HR agency 104’s website in Taiwan. Python was used to crawl the advertisements on the website, and 841 posts were collected. Next, the authors used TM to explore and understand hidden trends and patterns in the collected data.
Findings
The results of this study revealed four critical success factors (specific skills, educational level, experience and specific capabilities), five clusters and ten classifications.
Practical implications
The results can aid HR curriculum developers and educators in customizing and improving HR education curricula, such that HR students can develop capabilities required to secure employment in the current HR job market.
Originality/value
The results may facilitate the understanding of current trends in the HR job market and provide useful suggestions to HR curriculum developers for improving training and professional course design, such that students’ competitiveness is enhanced and their professional capabilities improved.
Andry Alamsyah and Raras Fitriyani Astuti
Abstract
Purpose
This study aims to analyze public discourse on decentralized finance (DeFi) and central bank digital currencies (CBDC) using advanced natural language processing (NLP) techniques to uncover key insights that can guide financial policy and innovation. This research seeks to fill the gap in the existing literature by applying state-of-the-art NLP models like BERT and RoBERTa to understand the evolving online discourse around DeFi and CBDC.
Design/methodology/approach
This study uses a multilabel classification using BERT and RoBERTa models alongside BERTopic for topic modeling. Data is collected from social media platforms, including Twitter and LinkedIn, as well as relevant documents, to analyze public sentiment and discourse. Model performance is evaluated based on accuracy, precision, recall and F1-scores.
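The topic-modeling step with the BERTopic library might be sketched as follows (the posts are toy placeholders; in practice, hundreds of documents or more are needed for stable topics, and the BERT/RoBERTa classifiers are trained separately):

    # Sketch: topic modeling with the BERTopic library on toy posts.
    from bertopic import BERTopic

    posts = ["CBDC pilots raise privacy questions",
             "DeFi lending protocols grow despite volatility",
             "Central banks study digital currency design",
             "Yield farming attracts new DeFi users",
             "Digital currency rollout faces legal hurdles",
             "Stablecoins blur the line between DeFi and CBDC",
             "Regulators weigh DeFi risk frameworks",
             "Financial inclusion drives CBDC interest"]

    topic_model = BERTopic(min_topic_size=2)   # small value for the toy corpus
    topics, probs = topic_model.fit_transform(posts)
    print(topic_model.get_topic_info())        # discovered topics and sizes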
Findings
RoBERTa outperforms BERT in classification accuracy and precision across all metrics, making it more effective in categorizing public discourse on DeFi and CBDC. BERTopic identifies five key topics frequently discussed, such as financial inclusion, competition and growth in DeFi, with important implications for policymakers.
Practical implications
The insights derived from this study provide valuable information for financial regulators and policymakers to develop more informed, data-driven strategies for implementing and regulating DeFi and CBDC. Public discourse analysis enables policymakers to understand emerging concerns and trends critical for crafting effective financial policies.
Originality/value
This study is among the first to use advanced NLP models, including RoBERTa and BERTopic, to analyze public discourse on DeFi and CBDC. It offers novel insights into the potential challenges and opportunities these innovations present. It contributes to the growing body of research on the intersection of digital financial technologies and public sentiment.
Jing Chen, Lu Zhang and Wenhai Qian
Abstract
Purpose
Attention to task-related information is a prerequisite for task completion. Comparing the cognition of attentive readers (AR) and inattentive readers (IAR) is of great value for improving reading services but has seldom been studied. To explore their cognitive differences, this study investigates effectiveness, efficiency and cognitive resource allocation strategies using eye-tracking technology.
Design/methodology/approach
A controlled user study with two types of tasks, fact-finding (FF) and content understanding (CU), was conducted to collect data including answers to tasks, fixation duration (FD), fixation count (FC), fixation duration proportion (FDP) and fixation count proportion (FCP). Twenty-four participants were placed into the AR or IAR group according to their fixation duration on paragraphs related to the task.
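The proportion measures can be made concrete with a small sketch (hypothetical fixation records): FDP is the share of total fixation duration spent on an area of interest (AOI), and FCP the share of fixation counts.

    # Sketch: computing FDP and FCP for an AOI from toy fixation records.
    fixations = [("task_paragraph", 420), ("navigation", 180),
                 ("task_paragraph", 510), ("other", 230)]  # (AOI, duration ms)

    def proportions(fixations, aoi):
        total_duration = sum(d for _, d in fixations)
        aoi_durations = [d for a, d in fixations if a == aoi]
        fdp = sum(aoi_durations) / total_duration   # duration proportion
        fcp = len(aoi_durations) / len(fixations)   # count proportion
        return fdp, fcp

    print(proportions(fixations, "task_paragraph"))  # (~0.69, 0.5)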
Findings
Two types of cognitive resource allocation strategies, question-oriented (QO) and navigation-assistant (NA), were identified according to the differences in FDP and FCP. In the FF task, although the QO strategy was applied by both groups, the AR group was significantly more effective and efficient. In the CU task, although the two groups were similar in effectiveness and efficiency, the AR group shifted its strategy to NA while the IAR group stuck to the QO strategy. Furthermore, an interesting phenomenon, “win by uncertainty”, was observed, implying that the IAR group may reach the correct answer through uncertain means, such as clues, domain knowledge or guessing, rather than task-related information.
Originality/value
This study takes a deep look into cognition from the perspective of attentiveness to task-related information. Identifying indicators of cognition helps to distinguish attentive and inattentive readers automatically in various tasks. The cognitive resource allocation strategies applied by readers shed new light on reading skill training. A typical reading phenomenon, “win by uncertainty”, was found and defined; understanding this phenomenon is of great value for satisfying readers’ information needs and enhancing their deep learning.
Koraljka Golub, Osma Suominen, Ahmed Taiye Mohammed, Harriet Aagaard and Olof Osterman
Abstract
Purpose
In order to estimate the value of semi-automated subject indexing in operative library catalogues, the study aimed to investigate five different automated implementations of an open source software package on a large set of Swedish union catalogue metadata records, with Dewey Decimal Classification (DDC) as the target classification system. It also aimed to contribute to the body of research on aboutness and related challenges in automated subject indexing and evaluation.
Design/methodology/approach
On a sample of over 230,000 records with close to 12,000 distinct DDC classes, an open source tool, Annif, developed by the National Library of Finland, was applied in the following implementations: a lexical algorithm, a support vector classifier, fastText, Omikuji Bonsai and an ensemble approach combining the former four. A qualitative study involving two senior catalogue librarians and three students of library and information studies was also conducted to investigate the value and inter-rater agreement of automatically assigned classes, on a sample of 60 records.
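The ensemble idea, combining per-class scores from several backends by weighted averaging, can be sketched as below. This is an illustration of the general technique, not Annif's actual implementation; the weights and scores are hypothetical.

    # Sketch of the ensemble idea: weighted averaging of backend scores.
    import numpy as np

    backend_scores = {                 # P(class | document) from each backend
        "lexical":  np.array([0.10, 0.60, 0.30]),
        "svc":      np.array([0.20, 0.50, 0.30]),
        "fasttext": np.array([0.15, 0.55, 0.30]),
        "omikuji":  np.array([0.05, 0.70, 0.25])}
    weights = {"lexical": 0.2, "svc": 0.3, "fasttext": 0.2, "omikuji": 0.3}

    combined = sum(weights[b] * s for b, s in backend_scores.items())
    print(combined.argmax())  # index of the suggested class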
Findings
The best results were achieved using the ensemble approach that achieved 66.82% accuracy on the three-digit DDC classification task. The qualitative study confirmed earlier studies reporting low inter-rater agreement but also pointed to the potential value of automatically assigned classes as additional access points in information retrieval.
Originality/value
The paper presents an extensive study of automated classification in an operative library catalogue, accompanied by a qualitative study of automated classes. It demonstrates the value of applying semi-automated indexing in operative information retrieval systems.
Gustavo Silva, Leandro F. Pereira, José Crespo Carvalho, Rui Vinhas da Silva and Ana Simoes
Abstract
Purpose
This study aims to conduct a pertinent assessment of the concept of business competitiveness and how Portugal can progress in that field, for the sake of becoming a more sustainable and wealth-creator economy.
Design/methodology/approach
The research was based on 65 in-depth interviews with experts from the Portuguese business ecosystem, who were asked to reflect on the state of the economy and the competitiveness of the country.
Findings
There is much room for improvement in almost all areas of activity, in particular by promoting an innovative, value-adding and exporting private sector and a lighter and more efficient public sector. The conclusions point to modernisation of the Portuguese economy as a way of making it more competitive in a highly competitive and demanding global scenario.
Originality/value
To the best of the authors’ knowledge, this is the first time that such a reflection with experts on the local Portuguese economy has been carried out, especially after the difficult period of COVID.
Mamta Kayest and Sanjay Kumar Jain
Abstract
Purpose
Document retrieval has become a hot research topic over the past few years, and increasing attention has been paid to browsing and synthesizing information from different documents. The purpose of this paper is to develop an effective document retrieval method, which focuses on reducing the time needed for the navigator to retrieve a whole document based on the contents, themes and concepts of documents.
Design/methodology/approach
This paper introduces an incremental learning approach for text categorization using a Monarch Butterfly optimization–FireFly optimization based Neural Network (MB–FF based NN). Initially, feature extraction is carried out on the pre-processed data using Term Frequency–Inverse Document Frequency (TF–IDF) and holoentropy to find the keywords of the document. Then, cluster-based indexing is performed using the MB–FF algorithm and, finally, document retrieval is done through a matching process with the modified Bhattacharya distance measure. In the MB–FF based NN, the weights of the NN are chosen using the MB–FF algorithm.
Findings
The effectiveness of the proposed MB–FF based NN is proven with an improved precision of 0.8769, recall of 0.7957, F-measure of 0.8143 and accuracy of 0.7815.
Originality/value
The experimental results show that the proposed MB–FF based NN is useful to companies that have a large workforce across the country.
Stanley Loh, José Palazzo M. de Oliveira and Fábio Leite Gastal
Abstract
This paper presents an approach for performing knowledge discovery in texts through qualitative and quantitative analyses of high‐level textual characteristics. Instead of applying mining techniques to attribute values, terms or keywords extracted from texts, the discovery process works over concepts identified in texts. Concepts represent real world events and objects, and they help the user to understand ideas, trends, thoughts, opinions and intentions present in texts. The approach combines a quasi‐automatic categorisation task (for qualitative analysis) with a mining process (for quantitative analysis). The goal is to find new and useful knowledge inside a textual collection through the use of mining techniques applied over concepts (representing text content). In this paper, an application of the approach to medical records of a psychiatric hospital is presented. The approach helps physicians to extract knowledge about patients and diseases. This knowledge may be used for epidemiological studies and for training professionals, and it may also be used to support physicians in diagnosing and evaluating diseases.
Haichao Dong, Siu Cheung Hui and Yulan He
Abstract
Purpose
The purpose of this research is to study the characteristics of chat messages by analysing a collection of 33,121 sample messages gathered from 1,700 sessions of conversations of 72 pairs of MSN Messenger users over a four-month period from June to September 2005. The primary objective of chat message characterization is to understand the properties of chat messages for effective message analysis, such as message topic detection.
Design/methodology/approach
From the study on chat message characteristics, an indicative term‐based categorization approach for chat topic detection is proposed. In the proposed approach, different techniques such as sessionalisation of chat messages and extraction of features from icon texts and URLs are incorporated for message pre‐processing. Naïve Bayes, Associative Classification, and Support Vector Machine are employed as classifiers for categorizing topics from chat sessions.
Findings
The indicative term‐based approach is superior to the traditional document frequency based approach for feature selection in chat topic categorization.
Originality/value
This paper studies the characteristics of chat messages and proposes an indicative term‐based categorization approach for chat topic detection.
Esther David, Maayan Zhitomirsky-Geffet, Moshe Koppel and Hodaya Uzan
Abstract
Purpose
Social network sites have been widely adopted by politicians in the last election campaigns. To increase the effectiveness of these campaigns the potential electorate is to be identified, as targeted ads are much more effective than non-targeted ads. Therefore, the purpose of this paper is to propose and implement a new methodology for automatic prediction of political orientation of users on social network sites by comparison to texts from the overtly political parties’ pages.
Design/methodology/approach
To this end, textual information on personal users’ pages is used as a source of statistical features. The authors apply automatic text categorization algorithms to distinguish between texts of users from different political wings. However, these algorithms require a set of manually labeled texts for training, which is typically unavailable in real life situations. To overcome this limitation the authors propose to use texts available on various political parties’ pages on a social network site to train the classifier. The political leaning of these texts is determined by the political affiliation of the corresponding parties. The classifier learned on such overtly political texts is then applied on the personal user pages to predict their political orientation. To assess the validity and effectiveness of the proposed methodology two corpora were constructed: personal Facebook pages of 450 Israeli citizens, and political parties Facebook pages of the nine prominent Israeli parties.
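The cross-corpus setup, training on overtly political party texts and predicting on personal pages, might be sketched as follows (toy texts; the authors' actual feature set and classifier may differ):

    # Sketch: train on party texts, predict on personal pages.
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    party_texts  = ["lower taxes strong security",
                    "workers rights social welfare"]
    party_labels = ["right", "left"]

    clf = make_pipeline(TfidfVectorizer(), LinearSVC())
    clf.fit(party_texts, party_labels)

    personal_pages = ["we must expand social welfare programs"]
    print(clf.predict(personal_pages))  # -> ['left']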
Findings
The authors found that when a political tendency classifier is trained and tested on data in the same corpus, accuracy is very high. More significantly, training on manifestly political texts (political party Facebook pages) yields classifiers which can be used to classify non-political personal Facebook pages with fair accuracy.
Social implications
Previous studies have shown that targeted ads are more effective than non-targeted ads leading to substantial saving in the advertising budget. Therefore, the approach for automatic determining the political orientation of users on social network sites might be adopted for targeting political messages, especially during election campaigns.
Originality/value
This paper proposes and implements a new approach for automatic cross-corpora identification of political bias of user profiles on social network. This suggests that individuals’ political tendencies can be identified without recourse to any tagged personal data. In addition, the authors use learned classifiers to determine which self-identified centrists lean left or right and which voters are likely to switch allegiance in subsequent elections.
Abstract
Purpose
With the ever‐increasing volume of text data on the internet, it is important that documents are classified into manageable and easy-to-understand categories. This paper proposes the use of binary k‐nearest neighbour (BKNN) for text categorization.
Design/methodology/approach
The paper describes the traditional k‐nearest neighbour (KNN) classifier, introduces BKNN and outlines experimental results.
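One plausible reading of BKNN, sketched below as an assumption rather than the paper's exact formulation: documents become binary term-presence vectors, and neighbours are found with a cheap Hamming distance instead of floating-point similarity:

import numpy as np

def bknn_predict(X_train, y_train, x, k=3):
    """Majority vote among the k nearest binary vectors (Hamming distance)."""
    dists = np.count_nonzero(X_train != x, axis=1)
    nearest = np.argsort(dists)[:k]
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]

X = np.array([[1, 0, 1], [1, 1, 0], [0, 0, 1]], dtype=np.uint8)  # toy corpus
y = np.array(["sports", "sports", "tech"])
print(bknn_predict(X, y, np.array([1, 0, 0], dtype=np.uint8), k=1))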
Findings
The experimental results indicate that BKNN requires much less CPU time than KNN, without loss of classification performance.
Originality/value
The paper demonstrates how BKNN can be an efficient and effective algorithm for text categorization.
Details
Keywords
Maayan Zhitomirsky-Geffet, Esther David, Moshe Koppel and Hodaya Uzan
Reliability and political bias of mass media has been a controversial topic in the literature. The purpose of this paper is to propose and implement a methodology for fully…
Abstract
Purpose
The reliability and political bias of mass media have been a controversial topic in the literature. The purpose of this paper is to propose and implement a methodology for the fully automatic evaluation of the political tendency of written media on the web, one that does not rely on subjective human judgments.
Design/methodology/approach
The underlying idea is to base the evaluation on a fully automatic comparison of the texts of articles on different news websites with overtly political texts of known political orientation. The authors also apply an alternative approach for evaluating political tendency based on the wisdom of the crowds.
Findings
The authors found that the learnt classifier can accurately distinguish between self-declared left and right news sites. Furthermore, news sites' political tendencies can be identified by an automatic classifier learnt from manifestly political texts, without recourse to any manually tagged data. The authors also show a high correlation between readers' perception of the bias (as a "wisdom of crowds" evaluation) and the classifier results for different news sites.
Social implications
The results are quite promising and could help settle the never-ending dispute on the reliability and bias of the press.
Originality/value
This paper proposes and implements a new approach for fully automatic (independent of human opinion/assessment) identification of political bias of news sites by their texts.
Details
Keywords
Issa Alsmadi and Keng Hoon Gan
Rapid developments in social networks and their usage in everyday life have caused an explosion in the amount of short electronic documents. Thus, the need to classify this type…
Abstract
Purpose
Rapid developments in social networks and their use in everyday life have caused an explosion in the number of short electronic documents, so the need to classify this type of document by content has significant implications for many applications. Classifying these documents into relevant classes according to their textual content is of practical interest for many reasons. Short-text classification is an essential step in many applications, such as spam filtering, sentiment analysis, Twitter personalization, customer reviews and many other applications related to social networks. Reviews on short text and its applications are limited. Thus, this paper aims to discuss the characteristics of short text and its challenges and difficulties in classification. The paper attempts to introduce all the stages of a typical classification pipeline, the techniques used in each stage and possible development trends in each stage.
Design/methodology/approach
The paper is a review of the main aspects of short-text classification and is structured according to the stages of the classification task.
Findings
This paper discusses related issues and approaches to these problems. Further research could be conducted to address the challenges of short texts and avoid poor classification accuracy. Low performance can be addressed with optimization techniques, such as genetic algorithms, which are powerful in enhancing the quality of selected features. Soft computing approaches such as fuzzy logic also make short-text classification a promising area of research.
Originality/value
Using a powerful short-text classification method significantly improves the efficiency of many applications. Current solutions still perform poorly, implying the need for improvement. This paper discusses related issues and approaches to these problems.
Details
Keywords
Claudia Vásquez Rojas, Eduardo Roldán Reyes, Fernando Aguirre y Hernández and Guillermo Cortés Robles
Strategic planning (SP) enables enterprises to plan management and operations activities efficiently in the medium and large term. During its implementation, many processes and…
Abstract
Purpose
Strategic planning (SP) enables enterprises to plan management and operations activities efficiently in the medium and long term. During its implementation, many processes and methods are applied manually and may be time consuming. The purpose of this paper is to introduce an automatic method for defining strategic plans by using text mining (TM) algorithms within a generic SP model especially suited to small- and medium-sized enterprises (SMEs).
Design/methodology/approach
Textual feedback was collected through a SWOT matrix during the implementation of an SP model in a company dedicated to the local distribution of food. A four-step TM process (performing acquisition, pre-processing, processing and validation tasks) is applied via a framework developed under the cloud computing paradigm in order to determine the strategic plans.
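An illustrative sketch of the four steps on invented SWOT entries, with TF-IDF and k-means as stand-ins for whatever algorithms the framework actually uses:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

feedback = [                                   # acquisition: SWOT entries
    "strong local supplier network",
    "delivery fleet is ageing",
    "new retail chains entering the region",
    "suppliers offer flexible contracts",
]
X = TfidfVectorizer(stop_words="english").fit_transform(feedback)  # pre-processing
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)  # processing
print(labels)  # validation: inspect the groupings against the SWOT matrix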
Findings
The use of categorization and clustering algorithms shows that the unstructured textual information produced during SP can be efficiently processed and capitalized on. The collected evidence reveals the potential to create strategic plans with less effort and time, to improve their relevance and to produce new technological resources accessible to SMEs.
Originality/value
An innovative framework especially suited to SMEs, based on the assumed synergy of coupling TM with a generic SP model.
Details
Keywords
Kwok-Kuen To and Ming Fai Pang
The purpose of this paper is to investigate how different arrangements, such as lesson structures and patterns of variation, enhance students’ genre awareness, their understanding…
Abstract
Purpose
The purpose of this paper is to investigate how different arrangements, such as lesson structures and patterns of variation, enhance students' genre awareness and their understanding of the genre features of informative texts, and how they can generate new learning.
Design/methodology/approach
This is an example of a learning study consisting of a design experiment, with the selected test criteria embedded in the design. The variation theory of learning served as the major guiding principle for the pedagogical design, lesson analysis and evaluation.
Findings
The findings of this study give support to variation theory being a powerful pedagogical tool for improving students’ understanding of informative texts and enabling them to generate new learning. Students in the target group who had more opportunities to encounter the “first contrast, next contrast and last generalisation” pattern of variation performed better than those in the comparison group, who were exposed to the “first generalisation, next contrast and last generalisation” pattern. The pure hierarchical lesson structure used for the target group was found to be more conducive to learning than the mixed structure (sequential–hierarchical structure) used in the comparison group.
Originality/value
Both the lesson structure and the patterns of variation and invariance used are extremely important in developing a powerful method for enhancing students' genre awareness, their understanding of the genre features of informative texts and their capacity to generate new learning.
Details
Keywords
Umama Rahman and Miraj Uddin Mahbub
The data created from regular maintenance activities of equipment are stored as text in industrial plants. The size of these data is increasing rapidly nowadays. Text mining…
Abstract
Purpose
The data created from regular maintenance activities of equipment are stored as text in industrial plants, and the size of these data is increasing rapidly. Text mining provides a chance to handle this huge amount of text data and extract meaningful information to improve various processes in an industrial environment. This paper presents the application of classification models to maintenance text records to classify failures and thereby improve maintenance programs in industry.
Design/methodology/approach
This paper is presented as an implementation study in which text mining approaches are used for the binary classification of text data. Two classification algorithms, naive Bayes and support vector machine (SVM), are applied to train and test the models on the labeled data; they were chosen because they perform well on text data for failure classification and are easy to handle. A methodology is proposed for the development of maintenance programs, including the classification of potential failures in advance by analyzing the regular maintenance data, as well as a comparison of the performance of both models on the data.
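A minimal sketch of that comparison on invented maintenance records; the shared bag-of-words features and training-set accuracy are illustrative simplifications:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC
from sklearn.metrics import accuracy_score

records = ["bearing noise and vibration", "routine lubrication done",
           "motor overheating shutdown", "scheduled filter change"]
labels = [1, 0, 1, 0]  # 1 = failure-related, 0 = routine (invented)

X = CountVectorizer().fit_transform(records)
for model in (MultinomialNB(), LinearSVC()):
    model.fit(X, labels)
    print(type(model).__name__, accuracy_score(labels, model.predict(X)))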
Findings
The accuracy of both models falls within acceptable limits, and performance evaluation of the models validates the results. The other performance measures also exhibit excellent values for both models.
Practical implications
The proposed approach gives the maintenance team an opportunity to know about an upcoming breakdown in advance, so that the necessary measures can be taken to prevent failure in an industrial environment. As predictive maintenance incurs high expense, the proposed approach could be a cheaper replacement for small and medium industrial plants.
Originality/value
Nowadays, maintenance is preventive rather than corrective. The proposed technique facilitates a proactive approach by minimizing the cost of additional maintenance steps. As predictive maintenance is efficient but expensive, the proposed method can minimize unnecessary maintenance operations and keep the budget under control. This is a significant way of developing maintenance programs and will prepare maintenance personnel for machine breakdowns.
Details
Keywords
Ko-Chiu Wu and Tsai-Ying Hsieh
The purpose of this paper is to investigate user experiences with a touch-wall interface featuring both clustering and categorization representations of available e-books in a…
Abstract
Purpose
The purpose of this paper is to investigate user experiences with a touch-wall interface featuring both clustering and categorization representations of available e-books in a public library to understand human information interactions under work-focused and recreational contexts.
Design/methodology/approach
Researchers collected questionnaires from 251 New Taipei City Library visitors who used the touch-wall interface to search for new titles. The authors applied structural equation modelling to examine relationships among hedonic/utilitarian needs, clustering and categorization representations, perceived ease of use (EU) and the extent to which users experienced anxiety and uncertainty (AU) while interacting with the interface.
Findings
Utilitarian users, who have an explicit idea of what they intend to find, tend to prefer the categorization interface, whereas hedonic-oriented users tend to prefer the clustering interface. Users reported EU regardless of which interface they engaged with. The results revealed that use of the clustering interface was negatively correlated with AU. Users seeking to satisfy utilitarian needs tended to emphasize the importance of perceived EU, whilst pleasure-seeking users were a little more tolerant of anxiety or uncertainty.
Originality/value
The Online Public Access Catalogue (OPAC) encourages library visitors to borrow digital books through the implementation of an information visualization system. This situation poses an opportunity to validate uses and gratifications theory. People with hedonic versus utilitarian needs displayed different risk-control attitudes and experienced different levels of uncertainty when using the interface. Knowledge about user interaction with such interfaces is vital when launching the development of a new OPAC.
Details
Keywords
Amirreza Ghadiridehkordi, Jia Shao, Roshan Boojihawon, Qianxi Wang and Hui Li
This study examines the role of online customer reviews through text mining and sentiment analysis to improve customer satisfaction across various services within the UK banking…
Abstract
Purpose
This study examines the role of online customer reviews through text mining and sentiment analysis to improve customer satisfaction across various services within the UK banking sector. Additionally, the study analyses sentiment trends over a five-year period.
Design/methodology/approach
Using DistilBERT and Support Vector Machine algorithms, customer sentiments were assessed through an analysis of 20,137 Trustpilot reviews of HSBC, Santander, and Tesco Bank from 2018 to 2023. Data pre-processing steps were implemented to ensure data integrity and minimize noise.
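A small sketch of the DistilBERT side, using the off-the-shelf SST-2 fine-tuned checkpoint from Hugging Face as a stand-in for the authors' model (the SVM comparison and the rating features are omitted); the review text is invented:

from transformers import pipeline

sentiment = pipeline(
    "sentiment-analysis",
    model="distilbert-base-uncased-finetuned-sst-2-english")
print(sentiment(["The call centre kept me waiting for an hour."]))
# e.g. [{'label': 'NEGATIVE', 'score': 0.99...}]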
Findings
Both positive and negative sentiments provide valuable insights. The results indicate a high prevalence of negative sentiments related to customer service and communication, with HSBC and Santander receiving 90.8% and 89.7% negative feedback, respectively, compared to Tesco Bank’s 66.8%. Key areas for improvement include HSBC’s credit card services and call center efficiency, which experienced increased negative feedback during the COVID-19 pandemic. The findings also demonstrate that DistilBERT excelled in categorizing reviews, while the SVM model, when combined with customer ratings, achieved 96% accuracy in sentiment analysis.
Research limitations/implications
This study focuses on UK bank consumers of HSBC, Santander, and Tesco Bank. A multi-country or cross-cultural study may further enhance our understanding of the approaches and findings.
Practical implications
Online customer reviews become more informative when categorised by service sector. To enhance customer satisfaction, bank managers should pay attention to both positive and negative reviews, and track trends over time.
Originality/value
The uniqueness of this study lies in its exploration of the importance of categorisation in text-mining-based sentiment analysis, its focus on the influence of both positive and negative sentiments, and its emphasis on tracking sentiment trends over time.
Details
Keywords
The immense quantity of available unstructured text documents serves as one of the largest sources of information. Text classification can be an essential task for many purposes in…
Abstract
Purpose
The immense quantity of available unstructured text documents serves as one of the largest sources of information. Text classification can be an essential task for many purposes in information retrieval, such as document organization, text filtering and sentiment analysis. Ensemble learning has been extensively studied as a way to construct efficient text classification schemes with higher predictive performance and generalization ability. The purpose of this paper is to provide diversity among the classification algorithms of an ensemble, which is a key issue in ensemble design.
Design/methodology/approach
An ensemble scheme based on hybrid supervised clustering is presented for text classification. In the presented scheme, supervised hybrid clustering, based on the cuckoo search algorithm and k-means, is introduced to partition the data samples of each class into clusters so that training subsets with higher diversity can be provided. Each classifier is trained on the diversified training subsets, and the predictions of the individual classifiers are combined by the majority voting rule. The predictive performance of the proposed classifier ensemble is compared to conventional classification algorithms (such as naïve Bayes, logistic regression, support vector machines and the C4.5 algorithm) and ensemble learning methods (such as AdaBoost, bagging and random subspace) using 11 text benchmarks.
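A simplified sketch of the diversification step, with plain k-means standing in for the paper's cuckoo-search/k-means hybrid, naïve Bayes members and toy binary-labelled data:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.naive_bayes import MultinomialNB

X = np.array([[3, 0], [2, 1], [3, 1], [0, 3], [1, 2], [0, 2]])  # toy features
y = np.array([0, 0, 0, 1, 1, 1])
n_clusters = 2

# Partition each class's samples into clusters.
clusters = {}
for c in np.unique(y):
    idx = np.where(y == c)[0]
    part = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit_predict(X[idx])
    clusters[c] = [idx[part == k] for k in range(n_clusters)]

# Train one ensemble member per cluster index on a diversified subset.
members = []
for k in range(n_clusters):
    rows = np.concatenate([clusters[c][k] for c in clusters])
    members.append(MultinomialNB().fit(X[rows], y[rows]))

# Combine the members' predictions by majority vote (binary labels).
votes = np.array([m.predict(X) for m in members])
print((votes.mean(axis=0) >= 0.5).astype(int))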
Findings
The experimental results indicate that the presented classifier ensemble outperforms the conventional classification algorithms and ensemble learning methods for text classification.
Originality/value
The presented ensemble scheme is the first to use supervised clustering to obtain a diverse ensemble for text classification.
Details
Keywords
Ankie Visschedijk and Forbes Gibb
This article reviews some of the more unconventional text retrieval systems, emphasising those which have been commercialised. These sophisticated systems improve on conventional…
Abstract
This article reviews some of the more unconventional text retrieval systems, emphasising those which have been commercialised. These sophisticated systems improve on conventional retrieval by using either innovative software or hardware to increase retrieval speed, functionality, precision or recall. The software systems reviewed are: AIDA, CLARIT, Metamorph, SIMPR, STATUS/IQ, TCS, TINA and TOPIC. The hardware systems reviewed are: CAFS‐ISP, the Connection Machine, GESCAN, HSTS, MPP, TEXTRACT, TRW‐FDF and URSA.
Qingyu Zhang and Richard S. Segall
The purpose of this paper is to review and compare selected software for data mining, text mining (TM), and web mining that are not available as free open‐source software.
Abstract
Purpose
The purpose of this paper is to review and compare selected software for data mining, text mining (TM), and web mining that are not available as free open‐source software.
Design/methodology/approach
The selected software packages are compared in terms of their common and unique features. The software for data mining comprises SAS® Enterprise Miner™, Megaputer PolyAnalyst® 5.0, NeuralWare Predict® and BioDiscovery GeneSight®. The software for TM comprises CompareSuite, SAS® Text Miner, TextAnalyst, VisualText, Megaputer PolyAnalyst® 5.0 and WordStat. The software for web mining comprises Megaputer PolyAnalyst®, SPSS Clementine®, ClickTracks and QL2.
Findings
This paper discusses and compares the existing features, characteristics and algorithms of selected software for data mining, TM and web mining, respectively. These packages are also applied to available datasets.
Research limitations/implications
The limitation is the inclusion of selected software packages and datasets rather than the entire realm of each. This review could be used as a framework for comparing other data, text and web mining software.
Practical implications
This paper can be helpful for an organization or individual when choosing proper software to meet their mining needs.
Originality/value
Each software package selected for this research has its own unique characteristics, properties and algorithms. No other paper compares these selected packages both visually and descriptively across all three types of mining: data, text and web.
Details
Keywords
The purpose of this paper is to examine the role of big data text analytics as an enabler of knowledge management (KM). The paper argues that big data text analytics represents an…
Abstract
Purpose
The purpose of this paper is to examine the role of big data text analytics as an enabler of knowledge management (KM). The paper argues that big data text analytics represents an important means to visualise and analyse data, especially unstructured data, which have the potential to improve KM within organisations.
Design/methodology/approach
The study uses text analytics to review 196 articles published in two of the leading KM journals, Journal of Knowledge Management and Knowledge Management Research & Practice, in 2013 and 2014. The text analytics approach is used to process, extract and analyse the 196 papers to identify trends in terms of keywords, topics and keyword/topic clusters, demonstrating the utility of big data text analytics.
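An illustrative sketch of the keyword/topic step, assuming a bag-of-words model and LDA topics over a toy corpus; the authors' actual tooling is not specified here:

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

articles = ["knowledge sharing in project teams",
            "big data analytics for decision support",
            "tacit knowledge transfer practices"]
vec = CountVectorizer(stop_words="english")
X = vec.fit_transform(articles)

lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for topic in lda.components_:
    print([terms[i] for i in topic.argsort()[-3:]])  # top keywords per topic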
Findings
The findings show how big data text analytics can play a key enabling role in KM. Drawing on the 196 articles analysed, the paper shows the power of big data-oriented text analytics tools in supporting KM through the visualisation of data. In this way, the authors highlight the nature and quality of the knowledge generated through this method for efficient KM and the development of a competitive advantage.
Research limitations/implications
The research has important implications concerning the role of big data text analytics in KM, and specifically the nature and quality of knowledge produced using text analytics. The authors use text analytics to exemplify the value of big data in the context of KM and highlight how future studies could develop and extend these findings in different contexts.
Practical implications
Results contribute to understanding the role of big data text analytics as a means to enhance the effectiveness of KM. The paper provides important insights that can be applied to different business functions, from supply chain management to marketing management to support KM, through the use of big data text analytics.
Originality/value
The study demonstrates the practical application of big data tools for data visualisation and, with it, the improvement of KM.
Details
Keywords
Debasis Majhi and Bhaskar Mukherjee
The purpose of this study is to identify the research fronts by analysing highly cited core papers, adjusted for the age of a paper, in library and information science (LIS) where…
Abstract
Purpose
The purpose of this study is to identify the research fronts by analysing highly cited core papers, adjusted for the age of a paper, in library and information science (LIS), where natural language processing (NLP) is being applied significantly.
Design/methodology/approach
By mining international databases, 3,087 core papers that received at least 5% of the total citations were identified. From the mean age of these core papers and the total citations received, a CPT (citation/publication/time) value was calculated for each of 20 fronts to understand how much relative attention a front has been receiving from peers over time. One theme article was then identified from each of these 20 fronts.
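The abstract does not spell out the CPT formula; one plausible reading, offered purely as an assumption with invented numbers, is citations per core paper per year of mean paper age:

def cpt(total_citations, n_core_papers, mean_age_years):
    # Assumed interpretation of citation/publication/time; not the paper's definition.
    return total_citations / n_core_papers / mean_age_years

print(cpt(total_citations=240, n_core_papers=20, mean_age_years=8))  # 1.5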
Findings
Bidirectional encoder representations from transformers, with a CPT value of 1.608, followed by sentiment analysis, with a CPT of 1.292, received the highest attention in NLP research. Columbia University, New York, is the top institution, the Journal of the American Medical Informatics Association the top journal, the USA (followed by the People's Republic of China) the top country, and Xu, H. (University of Texas) the top author in these fronts. NLP applications are found to boost the performance of digital libraries and automated library systems in the digital environment.
Practical implications
Any of the research fronts identified in the findings of this paper may be used as a base by researchers who intend to perform extensive research on NLP.
Originality/value
To the best of the authors' knowledge, the methodology adopted in this paper is the first of its kind in which a meta-analysis approach has been used for understanding the research fronts of a subfield, such as NLP, within a broad domain such as LIS.
Details
Keywords
Dion Hoe‐Lian Goh, Alton Chua, Chei Sian Lee and Khasfariyati Razikin
Social tagging systems allow users to assign keywords (tags) to useful resources, facilitating their future access by the tag creator and possibly by other users. Social tagging…
Abstract
Purpose
Social tagging systems allow users to assign keywords (tags) to useful resources, facilitating their future access by the tag creator and possibly by other users. Social tagging has both proponents and critics, and this paper aims to investigate if tags are an effective means of resource discovery.
Design/methodology/approach
The paper adopts techniques from text categorisation: webpages and their associated tags are downloaded from del.icio.us, and Support Vector Machine (SVM) classifiers are trained to determine whether the documents can be assigned to their associated tags. Two text categorisation experiments were conducted. The first used only the terms from the documents as features, while the second included tags in addition to terms in its feature set. The performance metrics used were precision, recall, accuracy and F1 score. A content analysis was also conducted to uncover the characteristics of tags that are effective and ineffective for resource discovery.
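A toy sketch of the two conditions, terms only versus terms plus tags, with invented documents and training-set F1 standing in for the paper's evaluation protocol:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import f1_score

docs = ["tutorial on css layout", "stock market analysis basics"]
tags = ["webdesign css", "finance investing"]  # invented del.icio.us tags
y = [0, 1]

# Condition 1: terms only; condition 2: terms plus tags appended as text.
for texts in (docs, [d + " " + t for d, t in zip(docs, tags)]):
    X = TfidfVectorizer().fit_transform(texts)
    print(f1_score(y, LinearSVC().fit(X, y).predict(X)))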
Findings
Results from the classifiers were mixed, and the inclusion of tags as part of the feature set did not result in a statistically significant improvement (or degradation) of the performance of the SVM classifiers. This suggests that not all tags can be used for resource discovery by public users, confirming earlier work that there are many dynamic reasons for tagging documents that may not be apparent to others.
Originality/value
The authors extend their understanding of social classification and its utility in sharing and accessing resources. Results of this work may be used to guide development in social tagging systems as well as social tagging practices.
Details
Keywords
Ramzi A. Haraty and Rouba Nasrallah
The purpose of this paper is to propose a new model to enhance the auto-indexing of Arabic texts. The model extracts new relevant words by relating those chosen by previous…
Abstract
Purpose
The purpose of this paper is to propose a new model to enhance the auto-indexing of Arabic texts. The model extracts new relevant words by relating those chosen by previous classical methods to new words using data mining rules.
Design/methodology/approach
The proposed model uses an association rule algorithm to extract frequent sets of related items, thereby extracting relationships between words in the texts to be indexed and words from texts that belong to the same category. The extracted word associations are sets of words that frequently appear together.
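A sketch of the frequent-set step using the Apriori implementation in mlxtend over toy word-presence transactions; the thresholds and the English stand-in terms are assumptions:

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules

# Each row records which words occur in one same-category document.
transactions = pd.DataFrame(
    [[1, 1, 0], [1, 1, 1], [0, 1, 1]],
    columns=["economy", "market", "trade"]).astype(bool)

frequent = apriori(transactions, min_support=0.5, use_colnames=True)
rules = association_rules(frequent, metric="confidence", min_threshold=0.7)
print(rules[["antecedents", "consequents", "confidence"]])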
Findings
The proposed methodology shows significant enhancement in terms of accuracy, efficiency and reliability when compared to previous works.
Research limitations/implications
The stemming algorithm can be further enhanced. The Arabic language has many grammatical rules, and the more of these rules that are integrated into the stemming algorithm, the better the stemming will be. The stop-list can also be enhanced by adding more words that should not be taken into consideration in the indexing mechanism; numbers should be added to the list as well. Using a thesaurus would also help, because it links different words or phrases with the same meaning to each other, which improves the indexing mechanism. The authors also invite researchers to add more prerequisite texts to obtain better results.
Originality/value
In this paper, the authors present a full text-based auto-indexing method for Arabic text documents. The auto-indexing method extracts new relevant words by using data mining rules, which has not been investigated before. The method uses an association rule mining algorithm for extracting frequent sets containing related items to extract relationships between words in the texts to be indexed with words from texts that belong to the same category. The benefits of the method are demonstrated using empirical work involving several Arabic texts.
Details
Keywords
Alfredo Milani, Niyogi Rajdeep, Nimita Mangal, Rajat Kumar Mudgal and Valentina Franzoni
This paper aims to propose an approach for the analysis of user interests based on tweets, which can be used in the design of user recommendation systems. The extracted topics are…
Abstract
Purpose
This paper aims to propose an approach for the analysis of user interests based on tweets, which can be used in the design of user recommendation systems. The extracted topics are those seen positively by the user.
Design/methodology/approach
The proposed approach is based on the combination of sentiment extraction and classification analysis of tweets to extract topics of interest. The proposed hybrid method is original. The topic extraction phase uses a method based on semantic distance in the WordNet taxonomy, while sentiment extraction uses NLPcore.
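A toy sketch of the WordNet-distance idea via NLTK's path similarity; the candidate topic list, the first-synset choice and the example word are assumptions (run nltk.download("wordnet") once beforehand):

from nltk.corpus import wordnet as wn

def closest_topic(word, topics):
    # Assign the word to the candidate topic with the highest path similarity.
    w = wn.synsets(word)[0]
    return max(topics, key=lambda t: w.path_similarity(wn.synsets(t)[0]) or 0)

print(closest_topic("guitar", ["music", "politics", "sport"]))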
Findings
The algorithm has been extensively tested on real tweets generated by 1,000 users. The results are quite encouraging, outperform state-of-the-art results and confirm the suitability of combining sentiment and categorization for extracting topics of interest.
Research limitations/implications
The hybrid method combining sentiment extraction and classification to extract the topics users view positively represents a novel contribution with many potential applications.
Practical implications
The functionality of positive topic extraction is very useful as a component in the design of a recommender system based on user profiling from Twitter user behaviors.
Social implications
The proposed method can be applied broadly to short-text social networks, well beyond its application to tweets.
Originality/value
Few works have considered both sentiment analysis and classification to identify users' interests. The algorithm has been extensively tested on real tweets generated by 1,000 users, with quite encouraging results that outperform the state of the art.
Details
Keywords
Sheng-Qun Chen, Ting You and Jing-Lin Zhang
This study aims to enhance the classification and processing of online appeals by employing a deep-learning-based method. This method is designed to meet the requirements for…
Abstract
Purpose
This study aims to enhance the classification and processing of online appeals by employing a deep-learning-based method. This method is designed to meet the requirements for precise information categorization and decision support across various management departments.
Design/methodology/approach
This study leverages the ALBERT–TextCNN algorithm to determine the appropriate department for managing online appeals. ALBERT is selected for its advanced dynamic word representation capabilities, rooted in a multi-layer bidirectional transformer architecture and an enriched text vector representation. TextCNN is integrated to facilitate the development of multi-label classification models.
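A condensed sketch of how such a pairing can be wired up: ALBERT supplies contextual token embeddings, and parallel 1-D convolutions with max-pooling feed a multi-label head. The checkpoint name, filter sizes and label count are assumptions, not the paper's configuration:

import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

class AlbertTextCNN(nn.Module):
    def __init__(self, n_labels, hidden=768, n_filters=64, sizes=(2, 3, 4)):
        super().__init__()
        self.encoder = AutoModel.from_pretrained("albert-base-v2")
        self.convs = nn.ModuleList(nn.Conv1d(hidden, n_filters, k) for k in sizes)
        self.head = nn.Linear(n_filters * len(sizes), n_labels)

    def forward(self, input_ids, attention_mask):
        h = self.encoder(input_ids, attention_mask=attention_mask).last_hidden_state
        h = h.transpose(1, 2)                       # (batch, hidden, seq_len)
        pooled = [torch.relu(c(h)).max(dim=2).values for c in self.convs]
        return self.head(torch.cat(pooled, dim=1))  # multi-label logits

tok = AutoTokenizer.from_pretrained("albert-base-v2")
batch = tok(["water supply complaint in district three"], return_tensors="pt")
logits = AlbertTextCNN(n_labels=5)(batch["input_ids"], batch["attention_mask"])
print(torch.sigmoid(logits))  # per-department probabilities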
Findings
Comparative experiments demonstrate the effectiveness of the proposed approach and its significant superiority over traditional classification methods in terms of accuracy.
Originality/value
The original contribution of this study lies in its utilization of the ALBERT–TextCNN algorithm for the classification of online appeals, resulting in a substantial improvement in accuracy. This research offers valuable insights for management departments, enabling enhanced understanding of public appeals and fostering more scientifically grounded and effective decision-making processes.