Artur Wiliński and Stanisław Osowski
Abstract
Purpose
The purpose of this paper is to identify, from gene expression array data, the genes that are most important for recognizing particular types of cancer.
Design/methodology/approach
The paper analyzes different techniques of gene selection, including correlation, statistical hypothesis testing, clustering and the linear support vector machine (SVM).
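As an illustration of the simplest of these techniques, the following minimal sketch ranks genes by the absolute Pearson correlation between their expression values and a binary class label. The function names, the toy data and the use of the absolute correlation as the ranking score are assumptions made for this example; the paper's own selection criteria may differ.

```python
import math


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant gene (or constant labels) carries no signal
    return cov / math.sqrt(vx * vy)


def rank_genes_by_correlation(expression, labels, top_k):
    """Return the indices of the top_k genes ranked by |correlation with label|.

    expression: list of samples, each a list of gene expression values.
    labels: one 0/1 class label per sample (hypothetical binary setting).
    """
    n_genes = len(expression[0])
    scores = []
    for g in range(n_genes):
        column = [sample[g] for sample in expression]
        scores.append((abs(pearson(column, labels)), g))
    # Highest score first; ties are broken by gene index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [g for _, g in scores[:top_k]]


# Toy data (invented for illustration): 4 samples, 3 genes, 2 classes.
expression = [
    [1.0, 5.0, 0.2],
    [1.2, 4.0, 0.9],
    [3.0, 5.5, 0.1],
    [3.3, 4.2, 0.8],
]
labels = [0, 0, 1, 1]
selected = rank_genes_by_correlation(expression, labels, top_k=2)
```

In this toy set, gene 0 tracks the class label closely and is ranked first. A multi-class problem would need a different score, for example a one-versus-rest variant of the same ranking.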
Findings
The correctness of the gene selection is verified by mapping the distribution of the selected genes onto the two‐coordinate system formed by the two most important principal components of the PCA transformation. Final confirmation of this approach comes from the classification results for the recognition of several types of cancer, obtained using a Gaussian kernel SVM.
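The PCA mapping described above can be sketched as follows. This is a minimal, dependency-free illustration that projects samples, described by a few selected genes, onto the two leading principal components. It uses power iteration with deflation, which is an assumption of this example and not necessarily the procedure used in the paper; the function names and toy data are likewise hypothetical.

```python
import math


def center(data):
    """Subtract the per-gene mean from every sample."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def covariance(centered):
    """Sample covariance matrix of already-centered data."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]


def leading_eigenpair(matrix, iterations=500):
    """Power iteration for the dominant eigenpair of a symmetric matrix.

    The all-ones start vector is an arbitrary choice; it would fail only
    if it were exactly orthogonal to the dominant eigenvector.
    """
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # matrix maps v to zero; keep the current direction
        v = [x / norm for x in w]
    mv = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
    eigenvalue = sum(v[i] * mv[i] for i in range(d))
    return eigenvalue, v


def pca_2d(data):
    """Project each sample onto the two leading principal components."""
    centered = center(data)
    cov = covariance(centered)
    d = len(cov)
    components = []
    for _ in range(2):
        lam, v = leading_eigenpair(cov)
        components.append(v)
        # Deflation: remove the component just found from the matrix.
        cov = [[cov[i][j] - lam * v[i] * v[j] for j in range(d)]
               for i in range(d)]
    return [[sum(r[j] * c[j] for j in range(d)) for c in components]
            for r in centered]


# Toy data (invented): 6 samples x 3 selected genes; first 3 rows are one class.
samples = [
    [1.0, 2.0, 0.5], [1.2, 2.1, 0.4], [0.9, 1.8, 0.6],
    [4.0, 5.0, 0.5], [4.2, 5.1, 0.6], [3.9, 4.8, 0.4],
]
points = pca_2d(samples)
```

If the selected genes are informative, samples of different classes should fall into visibly separate regions of the resulting two-coordinate plane, as the two toy classes do along the first coordinate here.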
Originality/value
The most significant genes selected in this way, when used for the SVM recognition of seven types of cancer, gave good classification accuracy. The presented methodology is of potential practical use in bioinformatics.