3.2 Goals and Rewards
The reward is a simple number, passed from the environment to the agent. The agent's goal is to maximize cumulative reward in the long run. We can state this informal idea precisely as the reward hypothesis:
That all of what we mean by goals and purposes can be well thought of as the maximization of the expected value of the cumulative sum of a received scalar signal (called reward).
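To make the hypothesis concrete, it can be written with the conventional symbols (assumed here, since the section itself introduces none): $R_t$ for the scalar reward at step $t$ and $G_t$ for the return up to a final step $T$. The agent should act so as to maximize the expected return:

$$
G_t = R_{t+1} + R_{t+2} + \cdots + R_T, \qquad \text{objective:}\ \max \mathbb{E}[G_t].
$$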
You must express your goal in terms of reward. Take care not to reward sub-goals. For example, in chess the reward has to be:
- victory = +1
- draw = 0
- loss = -1
If you reward sub-goals, such as capturing pawns or other pieces, your algorithm may find a way to achieve those sub-goals at the cost of the ultimate goal, for example winning material while losing the game.
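A minimal sketch of this principle in Python (the function name and outcome encoding are assumptions for illustration, not part of any chess library): the reward is non-zero only when the game ends, so intermediate events such as captures cannot be optimized as ends in themselves.

```python
# Hypothetical reward function for a chess-playing agent.
# Rewards are attached only to terminal outcomes; every intermediate
# event (captures, checks, positional gains) yields zero reward, so
# the agent is never incentivized to trade the ultimate goal for a
# sub-goal.

def chess_reward(game_over: bool, outcome: str = "") -> float:
    """Scalar reward for the current time step."""
    if not game_over:
        return 0.0   # nothing for captures, checks, etc.
    if outcome == "victory":
        return +1.0
    if outcome == "draw":
        return 0.0
    return -1.0      # loss
```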
It is the environment that gives the reward, not the agent itself. We define the environment as everything we do not have full control over; the agent can only interact with it through actions. This does not preclude the agent from defining for itself a kind of internal reward, or a sequence of internal rewards. Indeed, this is exactly what many reinforcement learning methods do.
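To illustrate this division of labor, here is a minimal interaction loop written against the Gymnasium library's standard interface (the environment CartPole-v1 and the random action choice are placeholders, not part of the section): the reward is returned by env.step, i.e. computed inside the environment, never by the agent.

```python
import gymnasium as gym

env = gym.make("CartPole-v1")        # the environment defines the task
obs, info = env.reset(seed=0)

done = False
while not done:
    action = env.action_space.sample()   # stand-in for a learned policy
    # The reward comes back from the environment; the agent can only
    # influence it indirectly, through its actions.
    obs, reward, terminated, truncated, info = env.step(action)
    done = terminated or truncated

env.close()
```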