Search results

1 – 1 of 1

Per page

10 20 50

(0)

Citations:

View access options

Article

Publication date: 8 June 2010

Solving two‐armed Bernoulli bandit problems using a Bayesian learning automaton

Ole‐Christoffer Granmo

The two‐armed Bernoulli bandit (TABB) problem is a classical optimization problem where an agent sequentially pulls one of two arms attached to a gambling machine, with each pull…

HTML

PDF (273 KB)

Downloads

703

Abstract

Purpose

The two‐armed Bernoulli bandit (TABB) problem is a classical optimization problem where an agent sequentially pulls one of two arms attached to a gambling machine, with each pull resulting either in a reward or a penalty. The reward probabilities of each arm are unknown, and thus one must balance between exploiting existing knowledge about the arms, and obtaining new information. The purpose of this paper is to report research into a completely new family of solution schemes for the TABB problem: the Bayesian learning automaton (BLA) family.

Design/methodology/approach

Although computationally intractable in many cases, Bayesian methods provide a standard for optimal decision making. BLA avoids the problem of computational intractability by not explicitly performing the Bayesian computations. Rather, it is based upon merely counting rewards/penalties, combined with random sampling from a pair of twin Beta distributions. This is intuitively appealing since the Bayesian conjugate prior for a binomial parameter is the Beta distribution.

Findings

BLA is to be proven instantaneously self‐correcting, and it converges to only pulling the optimal arm with probability as close to unity as desired. Extensive experiments demonstrate that the BLA does not rely on external learning speed/accuracy control. It also outperforms established non‐Bayesian top performers for the TABB problem. Finally, the BLA provides superior performance in a distributed application, namely, the Goore game (GG).

Originality/value

The value of this paper is threefold. First of all, the reported BLA takes advantage of the Bayesian perspective for tackling TABBs, yet avoids the computational complexity inherent in Bayesian approaches. Second, the improved performance offered by the BLA opens up for increased accuracy in a number of TABB‐related applications, such as the GG. Third, the reported results form the basis for a new avenue of research – even for cases when the reward/penalty distribution is not Bernoulli distributed. Indeed, the paper advocates the use of a Bayesian methodology, used in conjunction with the corresponding appropriate conjugate prior.

Details

International Journal of Intelligent Computing and Cybernetics, vol. 3 no. 2

Type: Research Article

DOI:

ISSN: 1756-378X

Keywords

Access

Year

All dates (1)
From To Go

Content type

Article (1)