Search results
Mohammad Amin Shayegan and Saeed Aghabozorgi
Abstract
Purpose
Pattern recognition systems often have to handle large training data sets that include duplicate and similar training samples. This leads to large memory requirements for storing and processing data, and it increases the time complexity of training algorithms. The purpose of the paper is to reduce the volume of the training part of a data set in order to increase system speed without any significant decrease in system accuracy.
Design/methodology/approach
A new technique for data set size reduction, based on a modified version of the frequency diagram approach, is presented. To reduce processing time, the proposed method compares the samples of a class with other samples in the same class, instead of comparing samples from different classes. It removes only those patterns that are similar to the class template generated for each class. No feature extraction was carried out, so that the proposed data size reduction technique could be assessed more precisely.
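The within-class comparison described above can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes binary images flattened to lists of 0/1 pixels and takes the class template to be the per-pixel frequency of "on" pixels within a class. The similarity measure and the `threshold` value are assumptions, since the abstract does not specify them, and the function names are hypothetical.

```python
from collections import defaultdict


def class_template(samples):
    """Frequency diagram: for each pixel, the fraction of samples in which it is on."""
    n = len(samples)
    return [sum(pixels) / n for pixels in zip(*samples)]


def similarity(sample, template):
    """Assumed measure: average template frequency of the state each pixel takes."""
    agree = [f if p else 1 - f for p, f in zip(sample, template)]
    return sum(agree) / len(agree)


def reduce_dataset(images, labels, threshold=0.9):
    """Return the sorted indices of the samples to keep.

    images: list of flattened binary images (0/1 pixels), all the same length.
    labels: class label for each image.
    threshold: assumed cut-off; samples more similar than this to their own
        class template are dropped.
    """
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    keep = []
    for indices in by_class.values():
        # Samples are compared only with the template of their own class.
        template = class_template([images[i] for i in indices])
        keep.extend(i for i in indices
                    if similarity(images[i], template) <= threshold)
    return sorted(keep)
```

Because each sample is scored only against the template of its own class, the cost grows linearly with the number of samples rather than with the number of cross-class pairs.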
Findings
Experiments on Hoda, one of the largest standard handwritten numeral optical character recognition (OCR) data sets, show a 14.88 percent decrease in data set volume without a significant decrease in performance.
Practical implications
The proposed technique is effective for reducing the size of all pictorial databases, such as OCR data sets.
Originality/value
State-of-the-art algorithms currently used for data set size reduction usually remove samples near class centers, or support vector (SV) samples between different classes. However, the samples near a class center carry valuable information about class characteristics, and they are necessary to build a system model. SVs are also important samples for evaluating system efficiency. Unlike the other available methods, the proposed technique keeps both the outlier samples and the samples close to the class centers.
Antonio Usai, Marco Pironti, Monika Mital and Chiraz Aouina Mejri
Abstract
Purpose
The aim of this work is to increase awareness of the potential of text mining to discover knowledge, and to further promote research collaboration between the knowledge management and information technology communities. Since its emergence, text mining has involved multidisciplinary studies, focused primarily on database technology, Web-based collaborative writing, text analysis, machine learning and knowledge discovery. However, owing to the large amount of research in this field, it is becoming increasingly difficult to identify existing studies and therefore to suggest new topics.
Design/methodology/approach
This article offers a systematic review of 85 academic outputs (articles and books) focused on knowledge discovery derived from the text mining technique. The systematic review is conducted by applying “text mining at the term level, in which knowledge discovery takes place on a more focused collection of words and phrases that are extracted from and label each document” (Feldman et al., 1998, p. 1).
Findings
The results revealed that the keywords associated with the main labels, i.e. knowledge discovery and text mining, can be grouped into two periods. From 1998 to 2009, the terms knowledge and text were always used. From 2010 to 2017, in addition to these terms, sentiment analysis, review manipulation, microblogging data and knowledgeable users were also frequently used. The terms present in the first decade are technical and engineering-oriented in nature, whereas a diverse range of fields such as business, marketing and finance emerged from 2010 to 2017 owing to a greater interest in the online environment.
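The grouping of extracted terms by period can be sketched as follows. This is an illustrative sketch only: the two periods come from the findings above, but the input format (a year paired with the terms already extracted for one document) and the name `terms_by_period` are assumptions, not the review's actual pipeline.

```python
from collections import Counter

# Periods reported in the findings.
PERIODS = {"1998-2009": (1998, 2009), "2010-2017": (2010, 2017)}


def terms_by_period(records, periods=PERIODS):
    """Count, per period, the number of documents in which each term appears.

    records: iterable of (year, terms) pairs, where terms are the words or
        phrases already extracted as labels for one document (assumed format).
    """
    counts = {name: Counter() for name in periods}
    for year, terms in records:
        for name, (start, end) in periods.items():
            if start <= year <= end:
                # Use a set so that a term counts once per document.
                counts[name].update({t.lower() for t in terms})
    return counts
```

Calling `terms_by_period(records)["2010-2017"].most_common(5)` would then list the five terms that label the most documents in the later period.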
Originality/value
This is a first comprehensive systematic review of knowledge discovery and text mining that itself uses a text mining technique at the term level, which helps to reduce redundant research and to avoid missing relevant publications.
Olalekan Oshodi, David J. Edwards, Ka Chi lam, Ayokunle Olubunmi Olanipekun and Clinton Ohis Aigbavboa
Abstract
Purpose
Construction economics scholars have emphasised the importance of construction output forecasting and have called for increased investment in infrastructure projects due to the positive relationship between construction output and economic growth. However, construction output tends to fluctuate over time. Excessive changes in the volume of construction output have a negative impact upon the construction sector, such as liquidation of construction companies and job losses. Information gleaned from extant literature suggests that fluctuation in construction output is a global problem. Evidence indicates that modelling of construction output provides information for understanding the factors responsible for these changes.
Methodology
An interpretivist epistemological lens is adopted to conduct a systematic review of published studies on modelling of construction output. A thematic analysis is then presented, and the trends and gaps in current knowledge are highlighted.
Findings
The review finds that the interest rate is the most common determinant of construction output. It also reveals that very little is known about the underlying factors stimulating growth in the volume of investment in maintenance construction works. Further work is required to investigate the efficacy of using non-linear techniques for construction output modelling.
Originality
This study provides a contemporary mapping of existing knowledge relating to construction output and provides insights into gaps in current understanding that can be explored by future researchers.