Interval estimation gambler.
For a list of all members of this type, see IntervalEstimationGambler Members.
System.Object
Gambler
GamblerBase
IntervalEstimationGambler
Public static (Shared in Visual Basic) members of this type are safe for multithreaded operations. Instance members are not guaranteed to be thread-safe.
Intuitively, the Interval Estimation algorithm makes an optimistic reward estimate for each lever, taking the upper bound of a 100(1 - alpha)%
confidence interval on the lever's mean reward. The lever with the highest estimated upper bound is then pulled. Note that a smaller alpha
widens the interval and therefore leads to more exploration. To compute the upper bound of the mean estimate, the current implementation assumes that the lever mean estimate is normally distributed.
The Interval Estimation algorithm is due to Kaelbling (1993), Learning in Embedded Systems (MIT Press).
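The selection rule described in these remarks can be illustrated with a minimal Python sketch. It is not the Bandit library's code: the function names `upper_bound` and `choose_lever`, the use of the one-sided normal quantile at level 1 - alpha, the sample standard error as the spread of the mean estimate, and the rule of first pulling any lever with fewer than two observations are all assumptions made for illustration.

```python
import math
from statistics import NormalDist


def upper_bound(rewards, alpha):
    """Optimistic estimate of a lever's mean reward (hypothetical helper).

    Assumes the mean estimate is normally distributed and uses the
    one-sided quantile at level 1 - alpha; needs at least two rewards
    and 0 < alpha < 1.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    # Unbiased sample variance of the observed rewards.
    variance = sum((r - mean) ** 2 for r in rewards) / (n - 1)
    z = NormalDist().inv_cdf(1.0 - alpha)
    # Standard error of the mean is sqrt(variance / n).
    return mean + z * math.sqrt(variance / n)


def choose_lever(history, alpha):
    """Return the index of the lever to pull next.

    history is a list with one list of observed rewards per lever.
    """
    # Assumption: levers with fewer than two observations are pulled
    # first, so that the sample variance is defined for every lever.
    for index, rewards in enumerate(history):
        if len(rewards) < 2:
            return index
    bounds = [upper_bound(rewards, alpha) for rewards in history]
    # Pull the lever with the highest estimated upper bound.
    return max(range(len(history)), key=bounds.__getitem__)
```

With `history = [[1.0, 1.0, 1.0], [0.0, 2.0]]` both levers have the same sample mean, but the second has the larger spread, so `choose_lever(history, 0.05)` selects index 1. Lowering alpha raises the quantile and therefore every upper bound, which favours levers whose estimates are still uncertain.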
Namespace: Bandit.Stochastic
Assembly: Bandit (in Bandit.dll)
IntervalEstimationGambler Members | Bandit.Stochastic Namespace