The Markov Property

3.5 The Markov Property

This chapter is about the state requirement to have certain properties and mainly the Markov one.

State: information available by the agent

The state is more than just a sensation the agent has to be aware of. It can be highly processed versions of original sensations, complex structures built up over time and more.

But the state signal should not be expected to inform the agent of everything about the environment or even everything that would be useful to it in making decision. For example if the agent is playing blackjack, it should only be aware of the visibles data.o

A state signal that succeeds in retaining all relevant information is said to be Markov or to have the Markov property.

To simplify our definition, we assume that there are a finite number of state and reward value so we can work in term of sum and probabilities rather than integrals and probability densities but it can be easily extends.

Consider how a general environment might respond at time $t+1$ to the action taken at time $t$ . In the moste general, causal case this reponse may depend on everything that has happened earlier. In this case the dynamics can be defined only by specifuing the complete probability distribution:

$\begin{align} P_r\{R_{t+1} = r, S_{t+1} = s'| S_0, A_0, R_1, ..., S_{t-1}, A_{t-1}, R_t, S_t, A_t\} \end{align}$

However, if the state has the Markov Property, we can simplify the equation:

$\begin{align} P_r\{R_{t+1} = r, S_{t+1} = s'| S_t, A_t\} \end{align}$

If an environment has the Markov property, then its one-step dynamics enable us to predict the next state and expected next reward. By iterating this equation, one can predict all future state and expected rewards from knowledge only of the current state. It also follows that Markov states provide the best possiblebasis for choosing actions.

Even when the state signal is non-Markov, it is still appropriate to think of the state in reinforcement learning as an approximation to a Markov state.

The Markov property is important in reinforcement learning because decision and values are assumed to be a function only of the current state. Even if your case is not strictly Markov, the theory developed for the Markov case are still helpful to understand the behavior of the algorithms.