Minghu Ha, Jiqiang Chen, Witold Pedrycz and Lu Sun
Bounds on the rate of convergence of learning processes based on random samples and probability are one of the essential components of statistical learning theory (SLT). The…
Abstract
Purpose
Bounds on the rate of convergence of learning processes based on random samples and probability are one of the essential components of statistical learning theory (SLT). The constructive distribution‐independent bounds on generalization are the cornerstone of constructing support vector machines. Random sets and set‐valued probability are important extensions of random variables and probability, respectively. The paper aims to address these issues.
Design/methodology/approach
In this study, the bounds on the rate of convergence of learning processes based on random sets and set‐valued probability are discussed. First, the Hoeffding inequality is enhanced based on random sets, and then making use of the key theorem the non‐constructive distribution‐dependent bounds of learning machines based on random sets in set‐valued probability space are revisited. Second, some properties of random sets and set‐valued probability are discussed.
Findings
In the sequel, the concepts of the annealed entropy, the growth function, and VC dimension of a set of random sets are presented. Finally, the paper establishes the VC dimension theory of SLT based on random sets and set‐valued probability, and then develops the constructive distribution‐independent bounds on the rate of uniform convergence of learning processes. It shows that such bounds are important to the analysis of the generalization abilities of learning machines.
Originality/value
SLT is considered at present as one of the fundamental theories about small statistical learning.
Details
Keywords
Minghu Ha, Witold Pedrycz, Jiqiang Chen and Lifang Zheng
The purpose of this paper is to introduce some basic knowledge of statistical learning theory (SLT) based on random set samples in set‐valued probability space for the first time…
Abstract
Purpose
The purpose of this paper is to introduce some basic knowledge of statistical learning theory (SLT) based on random set samples in set‐valued probability space for the first time and generalize the key theorem and bounds on the rate of uniform convergence of learning theory in Vapnik, to the key theorem and bounds on the rate of uniform convergence for random sets in set‐valued probability space. SLT based on random samples formed in probability space is considered, at present, as one of the fundamental theories about small samples statistical learning. It has become a novel and important field of machine learning, along with other concepts and architectures such as neural networks. However, the theory hardly handles statistical learning problems for samples that involve random set samples.
Design/methodology/approach
Being motivated by some applications, in this paper a SLT is developed based on random set samples. First, a certain law of large numbers for random sets is proved. Second, the definitions of the distribution function and the expectation of random sets are introduced, and the concepts of the expected risk functional and the empirical risk functional are discussed. A notion of the strict consistency of the principle of empirical risk minimization is presented.
Findings
The paper formulates and proves the key theorem and presents the bounds on the rate of uniform convergence of learning theory based on random sets in set‐valued probability space, which become cornerstones of the theoretical fundamentals of the SLT for random set samples.
Originality/value
The paper provides a studied analysis of some theoretical results of learning theory.