Search results
Wei-Chao Lin, Shih-Wen Ke and Chih-Fong Tsai
Abstract
Purpose
Data mining is widely considered necessary in many business applications for effective decision-making. The importance of business data mining is reflected by the existence of numerous surveys in the literature focusing on the investigation of related works using data mining techniques for solving specific business problems. The purpose of this paper is to answer the following question: What are the widely used data mining techniques in business applications?
Design/methodology/approach
The aim of this paper is to examine related surveys in the literature and thus to identify the frequently applied data mining techniques. To ensure the recency, relevance and quality of the conclusions, the criterion for selecting related studies is that the works be published in reputable journals within the past 10 years.
Findings
There are 33 different data mining techniques employed in eight different application areas. Most of them are supervised learning techniques, and the application area where such techniques are most often seen is bankruptcy prediction, followed by customer relationship management, fraud detection, intrusion detection and recommender systems. Furthermore, the ten most widely used data mining techniques for business applications are the decision tree (including the C4.5 decision tree and the classification and regression tree), genetic algorithm, k-nearest neighbor, multilayer perceptron neural network, naïve Bayes and support vector machine as supervised learning techniques, and association rule, expectation maximization and k-means as unsupervised learning techniques.
Originality/value
The originality of this paper lies in surveying the past 10 years of survey and review articles about data mining in business applications to identify the most popular techniques.
Wei-Chao Lin, Shih-Wen Ke and Chih-Fong Tsai
Abstract
Purpose
This paper aims to introduce a prototype system called SAFQuery (Simple And Flexible Query interface). In many existing Web search interfaces, simple and advanced query processes are treated separately and cannot be issued interchangeably. In addition, after several rounds of queries for a specific information need, users might wish to re-examine the retrieval results of some previous queries or to slightly modify queries issued before. However, it is often hard to remember which queries have been issued. These factors make the current Web search process neither simple nor flexible.
Design/methodology/approach
In SAFQuery, the simple and advanced query strategies are integrated into a single interface, in which users can easily formulate query specifications when needed. Moreover, a query history is provided that displays past query specifications, which reduces the user's memory load.
Findings
The authors' user evaluation shows that most users had a positive experience when using SAFQuery. Specifically, they found it easy to use and reported that it simplifies the Web search task.
Originality/value
The proposed prototype system provides simple and flexible Web search strategies. Particularly, it allows users to easily issue simple and advanced queries based on one single query interface, interchangeably. In addition, users can easily input previously issued queries without spending time to recall what the queries are and/or to re-type previous queries.
Cheng-Che Shen, Ya-Han Hu, Wei-Chao Lin, Chih-Fong Tsai and Shih-Wen Ke
Abstract
Purpose
The purpose of this paper is to focus on examining the research impact of papers written with and without funding. Specifically, the citation analysis method is used to compare the general and funded papers published in two leading international conferences, which are ACM SIGIR and ACM SIGKDD.
Design/methodology/approach
The authors investigate the number of general and funded papers to see whether the number of funded papers is larger than the number of general papers. In addition, the total citations and the number of highly cited papers with and without funding are also compared.
Findings
The analysis results for ACM SIGIR papers show that in most cases the number of funded papers is larger than the number of general papers. Moreover, the total citations, the average number of citations per paper, and the number of highly cited papers all reveal the superiority of funded papers over general papers. However, the findings are somewhat different for the ACM SIGKDD papers. This may be because ACM SIGIR began much earlier than ACM SIGKDD, which relates to the maturity of the research problems addressed in these two conferences.
Originality/value
This paper is the first attempt to examine the research impact of general and funded research papers by the citation analysis method. The research impact of funding in other research areas can be further investigated by other analysis methods.
Shih-Wen Ke, Wei-Chao Lin, Chih-Fong Tsai and Ya-Han Hu
Abstract
Purpose
Conference publications are an important aspect of research activities. There are generally both oral presentations and poster sessions at large international conferences. One can hypothesise that, for the same conferences, the papers presented in oral sessions should have a higher research impact than the papers presented in poster sessions. However, there has been no related study examining the validity of this hypothesis. In other words, the difference in research impact between papers presented orally and those presented in poster sessions has not been discussed in the literature. Therefore, the purpose of this paper is to conduct a citation analysis to compare the research impact of papers presented in oral and poster sessions.
Design/methodology/approach
In this paper, data from three leading conferences in the field of computer vision are examined, namely CVPR (2011 and 2012), ICCV (2011) and ECCV (2012). Several types of citation-related statistics are collected, including the number of highly cited papers (i.e. high number of citations) presented in oral and poster sessions, the total citations of both types of papers, the average citations of oral and poster papers, and the average citations of each frequently cited paper of both types.
Findings
There are three main findings. First, a larger proportion of highly cited papers are from oral sessions than poster sessions. Second, the average number of citations per paper is larger for those presented in oral sessions than poster sessions. Third, the average number of citations for highly cited papers presented in oral sessions is not necessarily greater than for the ones presented in poster sessions.
Originality/value
The originality of this paper is that it is the first attempt to examine the differences of citation impacts of conference papers presented in oral and poster sessions. The findings of this study will allow future bibliometrics research to further explore this related issue for longer periods and different fields.
Wei-Chao Lin, Chih-Fong Tsai and Shih-Wen Ke
Abstract
Purpose
In many research areas, there are a variety of different types of academic publications, including journals, magazines and conferences, which provide outlets for researchers to present their findings. Generally speaking, although there are differences in the reviewing criteria and publication processes of different publication types, in the same research area, there is certainly overlap in terms of the problems addressed and the audience for different publication types. Therefore, the research impacts of different publication types in the same research area should be moderately or highly correlated. The paper aims to discuss these issues.
Design/methodology/approach
To test this hypothesis, the authors examine the correlation coefficient of citation impacts for different types of publications, in seven research areas of computer science, from 2000 to 2013. In particular, four related citation statistics are examined for each publication type: average citations per paper, average citations per year, average annual increase in individual h-index, and h-index.
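As a rough illustration of one of the citation statistics named above, the following sketch computes an h-index from a list of per-paper citation counts using the standard definition (the largest h such that h papers each have at least h citations). The function name and the sample counts are hypothetical and are not taken from the paper.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)  # most-cited paper first
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts for five papers (illustrative only)
sample_citations = [10, 8, 5, 4, 3]
print(h_index(sample_citations))  # 4
```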
Findings
The analysis results show only a partial correlation in terms of several specific citation measures for different publication types in the same research area. Moreover, the level of correlation of the citation impact between different publication types is different, depending on the research area.
Originality/value
The contribution of this paper is to investigate whether the research impact of different types of publications in the same area is correlated. The findings can help researchers and academics choose the most appropriate publication outlets.
Chih-Fong Tsai, Ya-Han Hu and Shih-Wen George Ke
Abstract
Purpose
Ranking relevant journals is critical for researchers choosing publication outlets, a choice that can affect their research performance. In the management information systems (MIS) field, many related studies have used surveys as a subjective method for identifying MIS journal rankings. However, very few consider objective methods, such as journals’ impact factors and h-indexes. The paper aims to discuss these issues.
Design/methodology/approach
In this paper, the top 50 journals as ranked by researchers’ perceptions are examined in terms of their correlation with rankings by impact factor and h-index. Moreover, a hybrid method based on the Borda count is used to combine these different rankings and produce new MIS journal rankings.
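The abstract does not give the details of the authors' Borda count variant, so the following is only a minimal sketch of the classic scheme, in which an item in position p of a ranking of n items earns n - p points (counting from zero) and items are re-ranked by total points. The journal labels, the three input rankings and the alphabetical tie-break are assumptions made for illustration.

```python
from collections import defaultdict


def borda_count(rankings):
    """Combine several rankings (best item first) into one by Borda count."""
    scores = defaultdict(int)
    for ranking in rankings:
        n = len(ranking)
        for position, item in enumerate(ranking):
            scores[item] += n - position  # top item earns n points, last earns 1
    # Highest total first; ties broken alphabetically (an assumed tie-break rule)
    return sorted(scores, key=lambda item: (-scores[item], item))


# Hypothetical journals A-D ranked by three methods (illustrative data only)
survey_ranking = ["A", "B", "C", "D"]
impact_factor_ranking = ["B", "A", "D", "C"]
h_index_ranking = ["B", "C", "A", "D"]

combined = borda_count([survey_ranking, impact_factor_ranking, h_index_ranking])
print(combined)  # ['B', 'A', 'C', 'D']
```

In this toy data, journal B wins the combined ranking with 11 points even though the survey ranked it second, which shows how the aggregate can differ from any single input ranking.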
Findings
The results show that there are low correlations between the subjective and objective MIS journal rankings. In addition, the new MIS journal rankings produced by the Borda count approach can be considered in future research.
Originality/value
The contribution of this paper is to apply the Borda count approach to combine different MIS journal rankings produced by subjective and objective methods. The new MIS journal rankings and previous studies can be complementary to allow researchers to determine the top-ranked journals for their publication outlets.
Wei-Chao Lin, Chih-Fong Tsai and Shih-Wen Ke
Abstract
Purpose
Churn prediction is a very important task for successful customer relationship management. In general, churn prediction can be achieved by many data mining techniques. However, during data mining, dimensionality reduction (or feature selection) and data reduction are two important data preprocessing steps. In particular, the aims of feature selection and data reduction are to filter out irrelevant features and noisy data samples, respectively. The purpose of this paper is to perform these data preprocessing tasks so that the mining algorithm produces good-quality mining results.
Design/methodology/approach
Based on a real telecom customer churn data set, seven different preprocessed data sets based on performing feature selection and data reduction by different priorities are used to train the artificial neural network as the churn prediction model.
Findings
The results show that performing data reduction first by self-organizing maps and feature selection second by principal component analysis allows the prediction model to provide the highest prediction accuracy. In addition, this priority allows the prediction model to learn more efficiently, since 66 and 62 percent of the original features and data samples are removed, respectively.
Originality/value
The contribution of this paper is to identify the better order in which to perform the two important data preprocessing steps for telecom churn prediction.
Chih-Fong Tsai, Shih-Wen Ke, Kenneth McGarry and Ming-Yi Lin
Abstract
Purpose
The purpose of this paper is to introduce a novel personal scientific document retrieval system. The most common approach taken for the storage of personal documents is to construct a hierarchical folder structure. Most users prefer searching for documents by manually traversing their organizational hierarchy until reaching the location where the target item is stored, then locating the specific documents within its directory or folder. However, this is very time-consuming, especially when the number of personal scientific documents is very large. Unfortunately, related personal information management (PIM) systems, which provide solutions for managing various types of personal information, have thus far made little progress at managing personal scientific documents.
Design/methodology/approach
In this paper, we introduce the design of a personal scientific document retrieval system, namely, LocalContent. It is composed of database indexing and retrieval stages. During indexing, term features are extracted from scientific documents using natural language processing techniques. The extracted terms are stored in an inverted index for later retrieval. For retrieval, LocalContent provides a graphical user interface that allows users to search their personal scientific documents.
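The indexing and retrieval stages described above can be sketched with a toy inverted index. This is not LocalContent's implementation: the simple regular-expression tokenizer stands in for the paper's natural language processing step, the AND-only query semantics are an assumption, and the function names and sample documents are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_inverted_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Return ids of documents containing every query term (AND semantics)."""
    terms = tokenize(query)
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())  # keep only documents with this term too
    return result


# Hypothetical personal archive (file names and contents are illustrative only)
documents = {
    "paper1.pdf": "Support vector machines for text retrieval",
    "paper2.pdf": "Neural networks for image retrieval",
    "paper3.pdf": "Text mining with neural networks",
}
index = build_inverted_index(documents)
print(search(index, "neural retrieval"))  # {'paper2.pdf'}
```

Storing document ids per term means a query only touches the posting sets of its own terms rather than scanning every document, which is the usual reason for choosing an inverted index.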
Findings
The evaluation results based on 20 different personal archives taken from 20 graduate students show that LocalContent is simple to use and can facilitate the search for relevant scientific documents. Moreover, these users were willing to have a system which provides specialized search functions like LocalContent to explore their personal scientific documents in the future.
Originality/value
LocalContent is a novel scientific document retrieval system that provides several distinctive functions, including displaying a content summary of the query term frequency in each specific section of the retrieved documents, querying by local section specification and providing a number of recommended keywords related to the query terms.
The problems of transient natural convection from a corrugated plate embedded in an enclosed porous medium is studied numerically. The non‐Darcian effects as well as the…
Abstract
The problem of transient natural convection from a corrugated plate embedded in an enclosed porous medium is studied numerically. The non‐Darcian effects as well as the acceleration terms are taken into consideration in the momentum equation. The governing equations in terms of vorticity, stream function and temperature are expressed in a body‐fitted coordinate system and solved numerically by the finite difference method. Results are presented in terms of streamlines and isotherms, and local and average Nusselt numbers, with the Darcy‐Rayleigh number ranging from 0 to 1000 and the Darcy number from 10⁻⁴ to 10⁻¹, for several aspect ratios of the cavity and plate positions. The flow and heat transfer characteristics for a corrugated plate and a flat plate, and the numerical results obtained with four different mathematical models, are also compared.