3.3 Returns
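Let's define our goal formally. We want to maximize the expected return.
- Let $G_t$ be a specific function of the reward sequence; in the simplest case it is the sum of the rewards, $G_t = R_{t+1} + R_{t+2} + \dots + R_T$
- Let $T$ be the final time step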
There are two kinds of problems:
- episodic problems
  - the interaction breaks into episodes, each of which ends in a special state called the terminal state
  - each episode is followed by a reset to a standard starting state
- continuing problems
  - the opposite of episodic problems: the interaction goes on without a terminal state
  - in this case the final time step is $T = \infty$
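As a small illustration of the episodic case, the sketch below computes the return as a plain sum of rewards, $G_t = R_{t+1} + \dots + R_T$, for one finished episode. The function name `episodic_return` and the reward values are made up for this example, and `rewards[k]` is assumed to hold $R_{k+1}$.

```python
def episodic_return(rewards, t=0):
    """Plain (undiscounted) return G_t for one finished episode.

    Assumption for this sketch: rewards[k] holds R_{k+1}, so the list
    has length T and G_t is the sum of rewards[t:].
    """
    return sum(rewards[t:])


# Hypothetical episode with T = 4 time steps.
rewards = [1.0, 0.0, 2.0, 5.0]

g0 = episodic_return(rewards, t=0)  # R_1 + R_2 + R_3 + R_4
g2 = episodic_return(rewards, t=2)  # R_3 + R_4
```

In a continuing problem there is no finished episode to sum over, and with $T = \infty$ this plain sum may not be finite.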
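We will use another mathematical definition of the return, the discounted return, which handles both kinds of problems:

$$G_t = R_{t+1} + \gamma R_{t+2} + \gamma^2 R_{t+3} + \dots = \sum_{k=0}^{\infty} \gamma^k R_{t+k+1}$$

$\gamma$, with $0 \le \gamma \le 1$, is called the discount rate.
The discount rate determines the present value of future rewards. The higher $\gamma$ is, the more strongly the objective takes future rewards into account. With $\gamma = 0$ only the immediate reward $R_{t+1}$ counts.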
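To make the effect of the discount rate concrete, here is a minimal sketch that computes a discounted return over a finite list of rewards. The function name `discounted_return` and the reward values are hypothetical; `rewards[k]` is assumed to hold $R_{t+k+1}$. The loop relies on the standard identity $G_t = R_{t+1} + \gamma G_{t+1}$.

```python
def discounted_return(rewards, gamma):
    """Return sum over k of gamma**k * rewards[k].

    Assumption for this sketch: rewards[k] holds R_{t+k+1}.
    """
    g = 0.0
    # Work backwards through the rewards using G_t = R_{t+1} + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical rewards R_{t+1}, ..., R_{t+4}.
rewards = [1.0, 1.0, 1.0, 1.0]

myopic = discounted_return(rewards, gamma=0.0)        # only R_{t+1} counts
half = discounted_return(rewards, gamma=0.5)          # 1 + 0.5 + 0.25 + 0.125
undiscounted = discounted_return(rewards, gamma=1.0)  # plain sum of rewards
```

With a constant reward of 1 and $\gamma < 1$, the infinite sum converges to $1 / (1 - \gamma)$, so the return stays finite even when $T = \infty$.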