Epsilon greedy strategy.
For a list of all members of this type, see EpsilonGreedyGambler Members.
System.Object
Gambler
GamblerBase
EpsilonGreedyGambler
Public static (Shared in Visual Basic) members of this type are safe for multithreaded operations. Instance members are not guaranteed to be thread-safe.
The epsilon greedy strategy is certainly the simplest strategy for the bandit problem. Intuitively, it pulls the lever with the highest estimated mean, except that with frequency epsilon it pulls a lever chosen at random instead.
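The selection rule described above can be sketched in a few lines. This is only an illustration in Python, not the code of EpsilonGreedyGambler: the function name epsilon_greedy_choice and its parameters are hypothetical, and the sketch assumes that the random lever is drawn uniformly from all levers (including the current best one) and that ties go to the lowest index.

```python
import random


def epsilon_greedy_choice(estimated_means, epsilon, rng=random):
    """Return the index of the lever to pull (hypothetical helper).

    estimated_means: current estimate of each lever's mean reward.
    epsilon: exploration frequency, a value in [0, 1].
    rng: any object with random() and randrange(), e.g. random.Random(seed).
    """
    if not estimated_means:
        raise ValueError("at least one lever is required")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")

    # Explore: with probability epsilon, pull a lever chosen uniformly
    # at random (assumption: the best lever may be drawn as well).
    if rng.random() < epsilon:
        return rng.randrange(len(estimated_means))

    # Exploit: otherwise pull the lever with the highest estimated mean;
    # max() keeps the first maximum, so ties go to the lowest index.
    return max(range(len(estimated_means)), key=lambda i: estimated_means[i])
```

With epsilon set to 0 the choice is purely greedy, and with epsilon set to 1 every pull is random; values in between trade exploitation against exploration.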
The epsilon greedy strategy seems to have first appeared in Learning from Delayed Rewards by Watkins (1989), PhD thesis, Cambridge University. The strategy is so simple that earlier uses are likely.
Namespace: Bandit.Stochastic
Assembly: Bandit (in Bandit.dll)
EpsilonGreedyGambler Members | Bandit.Stochastic Namespace