Multi-Armed Bandit Library v0.1

IntervalEstimationGambler Class

Interval estimation gambler.

For a list of all members of this type, see IntervalEstimationGambler Members.


public class IntervalEstimationGambler : GamblerBase

Thread Safety

Public static (Shared in Visual Basic) members of this type are safe for multithreaded operations. Instance members are not guaranteed to be thread-safe.


Intuitively, the Interval Estimation algorithm makes an optimistic estimate of each lever's reward: the upper bound of a 100 (1 - alpha) % confidence interval on the lever's mean reward. The lever with the highest upper bound is then pulled. Notice that a smaller alpha widens the interval and therefore leads to more exploration. To compute the upper bound, the current implementation assumes that each lever's mean estimate is normally distributed.
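
The selection rule described above can be illustrated with a short, language-independent sketch. It is written in Python to stay self-contained and is not the library's C# implementation. The names upper_bound and select_lever are hypothetical, and two details are assumptions of the sketch rather than documented behaviour: the interval is treated as two-sided (quantile 1 - alpha/2), and a lever with fewer than two observations is given an infinite bound so that it is tried first.

```python
import math
from statistics import NormalDist


def upper_bound(rewards, alpha):
    """Upper end of a 100(1 - alpha)% normal confidence interval on the mean.

    Assumption of this sketch: the interval is two-sided, so the normal
    quantile is taken at 1 - alpha / 2. alpha must lie strictly in (0, 1).
    """
    n = len(rewards)
    if n < 2:
        # Assumption: with too few observations to estimate a variance,
        # treat the lever as maximally promising so that it gets pulled.
        return math.inf
    mean = sum(rewards) / n
    # Unbiased sample variance of the observed rewards.
    variance = sum((r - mean) ** 2 for r in rewards) / (n - 1)
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    # Standard error of the mean is sqrt(variance / n).
    return mean + z * math.sqrt(variance / n)


def select_lever(history, alpha):
    """Index of the lever with the highest upper bound (first one on ties)."""
    bounds = [upper_bound(rewards, alpha) for rewards in history]
    return bounds.index(max(bounds))


if __name__ == "__main__":
    # Two levers with the same empirical mean but different spread.
    history = [[1.0, 1.0, 1.0, 1.0], [0.0, 2.0, 0.0, 2.0]]
    print(select_lever(history, alpha=0.05))
```

Both levers in the usage example have the same empirical mean, so the choice is driven entirely by the width of the interval: the noisier lever has the larger upper bound and is the one selected. Lowering alpha raises the quantile and widens every interval, which is the mechanism behind the extra exploration mentioned above.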

The Interval Estimation algorithm is due to Kaelbling (1993), Learning in Embedded Systems (MIT Press).


Namespace: Bandit.Stochastic

Assembly: Bandit (in Bandit.dll)

See Also

IntervalEstimationGambler Members | Bandit.Stochastic Namespace