Interval estimation gambler.
For a list of all members of this type, see IntervalEstimationGambler Members.
Public static (Shared in Visual Basic) members of this type are safe for multithreaded operations. Instance members are not guaranteed to be thread-safe.
Intuitively, the Interval Estimation algorithm makes an optimistic reward estimate for each lever:
the upper bound of a 100 (1 - alpha) % confidence interval on the lever's mean reward. The lever with the highest estimated upper bound is then pulled. Notice that a smaller
alpha widens the confidence interval and therefore leads to more exploration. To compute the upper bound of the mean estimate, the current implementation relies on the assumption that the lever mean estimate is normally distributed.
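The selection rule described above can be sketched as follows. This is a minimal illustration in Python, not the code of the Bandit assembly: the function names `upper_bound` and `choose_lever` are hypothetical, the interval is assumed to be two-sided (so the quantile is taken at 1 - alpha / 2), and levers with fewer than two observations are assumed to be tried first.

```python
import math
from statistics import NormalDist, mean, stdev


def upper_bound(rewards, alpha):
    """Upper end of a 100 (1 - alpha) % confidence interval for the mean reward,
    assuming the mean estimate is normally distributed."""
    n = len(rewards)
    if n < 2:
        # Assumption: a lever without enough data to estimate a variance
        # gets an infinite bound, so it is pulled before any other lever.
        return math.inf
    # Assumption: two-sided interval, hence the 1 - alpha / 2 quantile.
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return mean(rewards) + z * stdev(rewards) / math.sqrt(n)


def choose_lever(history, alpha=0.05):
    """Return the index of the lever with the highest upper bound.

    history holds one list of observed rewards per lever.
    """
    bounds = [upper_bound(rewards, alpha) for rewards in history]
    return max(range(len(bounds)), key=bounds.__getitem__)
```

In this sketch, a lever whose observed rewards vary widely keeps a large upper bound and continues to be explored, while lowering alpha raises every bound and postpones the point at which the gambler settles on the best observed lever.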
The Interval Estimation algorithm is due to Kaelbling (1993), Learning in Embedded Systems (MIT Press).
Assembly: Bandit (in Bandit.dll)