PART ONE
Tabular Solution Methods
Chapter 2
Multi-arm Bandits
The n-armed bandit is a classical problem in which we can experiment with and illustrate some classical reinforcement learning issues. You have n one-armed bandits (slot machines) and you try to discover which one yields the most payoff.
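To make the setup concrete, below is a minimal sketch of an n-armed bandit simulator in Python. The reward model is an assumption not stated above (each arm pays out from its own Gaussian distribution, as in the common 10-armed testbed); the class and method names are illustrative only.

import numpy as np

class NArmedBandit:
    """A sketch of an n-armed bandit: n levers, each with a hidden mean payoff."""

    def __init__(self, n=10, seed=None):
        self.rng = np.random.default_rng(seed)
        # True (hidden) mean reward of each arm; the agent never sees these.
        self.true_values = self.rng.normal(0.0, 1.0, size=n)

    def pull(self, arm):
        """Pull one lever and receive a noisy reward around its true mean."""
        return self.rng.normal(self.true_values[arm], 1.0)

# Example: pull each arm once. The agent's task is to learn, from samples
# like these, which arm has the highest mean payoff.
bandit = NArmedBandit(n=10, seed=0)
rewards = [bandit.pull(a) for a in range(10)]
print(rewards)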
In this chapter we study the evaluative aspect of reinforcement learning in a simplified setting, one that does not involve learning to act in more than one situation. This is called the nonassociative setting.