Epsilon greedy strategy.
For a list of all members of this type, see EpsilonGreedyGambler Members.
Public static (Shared in Visual Basic) members of this type are safe for multithreaded operations. Instance members are not guaranteed to be thread-safe.
The epsilon greedy strategy is arguably the simplest strategy for the bandit problem. Intuitively, it consists of always pulling the lever of highest estimated mean, except when a random lever is pulled with an
The epsilon greedy strategy seems to have first appeared in Learning from Delayed Rewards by Watkins (1989), PhD thesis, Cambridge University. The strategy is so simple that earlier uses are likely.
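The selection rule described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the EpsilonGreedyGambler API; the function names `epsilon_greedy_choice` and `update_mean` are hypothetical. It assumes the usual formulation in which a random lever is pulled with probability epsilon and the lever of highest estimated mean is pulled otherwise, with estimated means taken to be plain empirical averages.

```python
import random


def epsilon_greedy_choice(estimated_means, epsilon, rng=random):
    """Return the index of the lever to pull next (hypothetical helper).

    Assumption: exploration happens with probability `epsilon`,
    and the explored lever is drawn uniformly at random.
    """
    if not estimated_means:
        raise ValueError("at least one lever is required")
    if not 0.0 <= epsilon <= 1.0:
        raise ValueError("epsilon must be in [0, 1]")
    # Explore: with probability epsilon, pull a uniformly random lever.
    if rng.random() < epsilon:
        return rng.randrange(len(estimated_means))
    # Exploit: otherwise pull the lever with the highest estimated mean
    # (ties resolve to the lowest index).
    return max(range(len(estimated_means)), key=lambda i: estimated_means[i])


def update_mean(old_mean, pulls_so_far, reward):
    """Incrementally update an empirical mean after one more reward."""
    return old_mean + (reward - old_mean) / (pulls_so_far + 1)
```

With `epsilon` set to 0 the sketch is purely greedy, and with `epsilon` set to 1 it pulls levers uniformly at random; values in between trade exploration against exploitation.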
Assembly: Bandit (in Bandit.dll)