Search results

1 – 3 of 3

Article

Publication date: 29 April 2014

A new method for Arabic/Farsi numeral data set size reduction via modified frequency diagram matching

Mohammad Amin Shayegan and Saeed Aghabozorgi

Abstract

Purpose

Pattern recognition systems often have to handle the problem of large training data sets that include duplicate and similar training samples. This problem leads to large memory requirements for storing and processing data, and increases the time complexity of training algorithms. The purpose of the paper is to reduce the volume of the training part of a data set, in order to increase system speed without any significant decrease in system accuracy.

Design/methodology/approach

A new technique for data set size reduction, based on a version of the modified frequency diagram approach, is presented. To reduce processing time, the proposed method compares the samples of a class with other samples in the same class, instead of comparing samples from different classes. It removes only those patterns that are similar to the class template generated for each class. No feature extraction was carried out, so that the proposed data size reduction technique could be assessed more precisely.
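The within-class comparison described above can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the abstract does not define the modified frequency diagram, the similarity measure, or the removal threshold, so the per-pixel frequency template, the mean-agreement score, and the `threshold` parameter below are all assumptions chosen for illustration, as are the function names.

```python
from typing import Dict, List, Sequence

# A binary image: rows of 0/1 pixels (assumed input format, raw pixels only).
Image = Sequence[Sequence[int]]


def frequency_diagram(images: List[Image]) -> List[List[float]]:
    """Class template: for each pixel, the fraction of images where it is set."""
    n = len(images)
    rows, cols = len(images[0]), len(images[0][0])
    return [
        [sum(img[r][c] for img in images) / n for c in range(cols)]
        for r in range(rows)
    ]


def similarity(image: Image, template: List[List[float]]) -> float:
    """Mean per-pixel agreement between an image and the template, in [0, 1].

    Assumed measure: a set pixel scores the template frequency,
    an unset pixel scores one minus that frequency.
    """
    total, count = 0.0, 0
    for img_row, tpl_row in zip(image, template):
        for pixel, freq in zip(img_row, tpl_row):
            total += freq if pixel else 1.0 - freq
            count += 1
    return total / count


def reduce_class(images: List[Image], threshold: float) -> List[Image]:
    """Drop samples whose similarity to their own class template reaches threshold."""
    if not images:
        return []
    template = frequency_diagram(images)
    return [img for img in images if similarity(img, template) < threshold]


def reduce_dataset(
    dataset: Dict[str, List[Image]], threshold: float
) -> Dict[str, List[Image]]:
    """Apply the reduction class by class; samples are never compared across classes."""
    return {label: reduce_class(imgs, threshold) for label, imgs in dataset.items()}
```

Because each class is processed independently, the cost grows with the size of each class rather than with the number of cross-class pairs, which matches the stated motivation of reducing processing time. The threshold controls how aggressive the reduction is and would need to be tuned against recognition accuracy.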

Findings

Experiments on Hoda, one of the largest standard handwritten numeral optical character recognition (OCR) data sets, show a 14.88 percent decrease in data set volume without a significant decrease in performance.

Practical implications

The proposed technique is effective for reducing the size of all pictorial databases, such as OCR data sets.

Originality/value

State-of-the-art algorithms for data set size reduction usually remove samples near class centers, or support vector (SV) samples between different classes. However, the samples near a class center carry valuable information about class characteristics and are needed to build a system model. SVs are likewise important samples for evaluating system efficiency. Unlike the other available methods, the proposed technique keeps both the outlier samples and the samples close to the class centers.

Details

Kybernetes, vol. 43 no. 5

Type: Research Article

DOI:

ISSN: 0368-492X

Keywords

Article

Publication date: 31 May 2018

Knowledge discovery out of text data: a systematic review via text mining

Antonio Usai, Marco Pironti, Monika Mital and Chiraz Aouina Mejri

Abstract

Purpose

The aim of this work is to increase awareness of the potential of the technique of text mining to discover knowledge and further promote research collaboration between knowledge management and the information technology communities. Since its emergence, text mining has involved multidisciplinary studies, focused primarily on database technology, Web-based collaborative writing, text analysis, machine learning and knowledge discovery. However, owing to the large amount of research in this field, it is becoming increasingly difficult to identify existing studies and therefore suggest new topics.

Design/methodology/approach

This article offers a systematic review of 85 academic outputs (articles and books) focused on knowledge discovery derived from the text mining technique. The systematic review is conducted by applying “text mining at the term level, in which knowledge discovery takes place on a more focused collection of words and phrases that are extracted from and label each document” (Feldman et al., 1998, p. 1).
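As a rough illustration of what term-level text mining involves, the Python sketch below extracts words and short phrases from each document and uses the most frequent ones as that document's labels. It is not the procedure used in the review: the tiny stopword list, the two-word phrase limit, the frequency-based ranking, and all function names are assumptions made for the example.

```python
import re
from collections import Counter
from typing import Dict, List

# Hypothetical, deliberately tiny stopword list for the example.
STOPWORDS = {"the", "of", "and", "a", "to", "in", "on", "for", "is", "from"}


def extract_terms(text: str, max_len: int = 2) -> List[str]:
    """Return single words and phrases of up to max_len words.

    Phrases are built from adjacent tokens after stopwords are removed.
    """
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    terms = []
    for n in range(1, max_len + 1):
        for i in range(len(tokens) - n + 1):
            terms.append(" ".join(tokens[i:i + n]))
    return terms


def label_documents(docs: Dict[str, str], top_k: int = 3) -> Dict[str, List[str]]:
    """Label each document with its top_k most frequent terms."""
    return {
        doc_id: [term for term, _ in Counter(extract_terms(text)).most_common(top_k)]
        for doc_id, text in docs.items()
    }


def term_document_frequency(labels: Dict[str, List[str]]) -> Counter:
    """Count how many documents carry each term as a label."""
    return Counter(term for terms in labels.values() for term in terms)
```

Analysis then runs on the label sets rather than on the full text, for example by counting how often a label appears across documents published in a given period.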

Findings

The results revealed that the keywords associated with the main labels, that is, knowledge discovery and text mining, fall into two periods. From 1998 to 2009, the terms knowledge and text were always used. From 2010 to 2017, in addition to these terms, sentiment analysis, review manipulation, microblogging data and knowledgeable users were frequently used. The terms of the first period are technical and engineering in nature, whereas a diverse range of fields such as business, marketing and finance emerged from 2010 to 2017 owing to a greater interest in the online environment.

Originality/value

This is a first comprehensive systematic review of knowledge discovery and text mining that itself applies a text mining technique at the term level. It helps to reduce redundant research and to avoid missing relevant publications.

Details

Journal of Knowledge Management, vol. 22 no. 7

Type: Research Article

DOI:

ISSN: 1367-3270

Keywords

Article

Publication date: 15 May 2020

Construction output modelling: a systematic review

Olalekan Oshodi, David J. Edwards, Ka Chi Lam, Ayokunle Olubunmi Olanipekun and Clinton Ohis Aigbavboa

Abstract

Purpose

Construction economics scholars have emphasised the importance of construction output forecasting and have called for increased investment in infrastructure projects due to the positive relationship between construction output and economic growth. However, construction output tends to fluctuate over time. Excessive changes in the volume of construction output have a negative impact upon the construction sector, such as liquidation of construction companies and job losses. Information gleaned from extant literature suggests that fluctuation in construction output is a global problem. Evidence indicates that modelling of construction output provides information for understanding the factors responsible for these changes.

Methodology

An interpretivist epistemological lens is adopted to conduct a systematic review of published studies on modelling of construction output. A thematic analysis is then presented, and the trends and gaps in current knowledge are highlighted.

Findings

The interest rate is observed to be the most common determinant of construction output. The review also reveals that very little is known about the underlying factors stimulating growth in the volume of investment in maintenance construction works. Further work is required to investigate the efficacy of using non-linear techniques for construction output modelling.

Originality

This study provides a contemporary mapping of existing knowledge relating to construction output and provides insights into gaps in current understanding that can be explored by future researchers.

Details

Engineering, Construction and Architectural Management, vol. 27 no. 10

Type: Research Article

DOI:

ISSN: 0969-9988

Keywords
