The Agent-Environment Interface

3.1 The Agent-Environment Interface

Some definitions:

Agent: the learner and decision-maker
Environment: everything outside the agent that the agent interacts with
Action: the way the agent interacts with its environment
Reward: a special numerical value sent by the environment to the agent, which the agent tries to maximize over time

Figure 1: The agent-environment interaction in reinforcement learning

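The agent interacts with its environment at discrete time steps $t = 0, 1, 2, \dots$. At each time step $t$, the agent receives some representation of the environment's state and selects an action on that basis. One time step later, it receives a numerical reward and finds itself in a new state.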

Let

$S_t \in \mathcal{S}$
$A_t \in \mathcal{A}(S_t)$
$R_{t+1} \in \mathcal{R} \subset \mathbb{R}$

where:

$\mathcal{S}$ is the set of all possible states
$\mathcal{A}(S_t)$ is the set of actions available in state $S_t$
$R_{t+1}$ is the reward received at time step $t+1$ as a consequence of the action $A_t$
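
As a minimal sketch of this interaction, the loop below produces the sequence $S_0, A_0, R_1, S_1, A_1, R_2, \dots$ for a toy problem. The `LightSwitchEnvironment` class, its states, actions, and reward rule are hypothetical, invented only to make the loop runnable; `choose_action` stands in for whatever decision rule the agent uses.

```python
class LightSwitchEnvironment:
    """Hypothetical two-state environment, for illustration only."""

    def __init__(self):
        self.state = "off"  # S_0

    def available_actions(self, state):
        # A(s): here the same two actions are available in every state.
        return ("toggle", "wait")

    def step(self, action):
        # Apply A_t and return (R_{t+1}, S_{t+1}).
        if action == "toggle":
            self.state = "on" if self.state == "off" else "off"
        reward = 1.0 if self.state == "on" else 0.0
        return reward, self.state


def run_interaction(env, choose_action, num_steps):
    """Run the agent-environment loop and record (t, S_t, A_t, R_{t+1})."""
    trajectory = []
    state = env.state
    for t in range(num_steps):
        action = choose_action(state, env.available_actions(state))  # A_t
        reward, next_state = env.step(action)  # R_{t+1}, S_{t+1}
        trajectory.append((t, state, action, reward))
        state = next_state
    return trajectory


# Example: an agent that always toggles the switch.
trajectory = run_interaction(
    LightSwitchEnvironment(), lambda state, actions: "toggle", num_steps=3
)
```

Note that the reward stored with step $t$ is $R_{t+1}$: it is only known after the environment has responded to $A_t$.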

At each time step, the agent implements a mapping from states to probabilities of selecting each possible action. This mapping is called a policy.

$\pi_t$ is the agent's policy at time step $t$
$\pi_t(a \mid s)$ is the probability that $A_t = a$ if $S_t = s$
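
One simple way to represent such a policy, assuming small finite state and action sets, is a table of probabilities indexed by state and action. The sketch below is illustrative only: the state names, action names, and probabilities are hypothetical, and the helper functions are not part of any library.

```python
import random

# Hypothetical policy table: policy[s][a] holds pi(a | s).
policy = {
    "off": {"toggle": 0.9, "wait": 0.1},
    "on": {"toggle": 0.2, "wait": 0.8},
}


def action_probability(policy, state, action):
    """Return pi(a | s); actions missing from the table have probability 0."""
    return policy[state].get(action, 0.0)


def sample_action(policy, state, rng):
    """Draw A_t according to the distribution pi(. | S_t = state)."""
    actions = list(policy[state])
    weights = [policy[state][a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


def is_valid_policy(policy, tol=1e-9):
    """Check that each pi(. | s) is non-negative and sums to 1."""
    return all(
        all(p >= 0.0 for p in dist.values())
        and abs(sum(dist.values()) - 1.0) <= tol
        for dist in policy.values()
    )


rng = random.Random(0)  # seeded only to make runs repeatable
action = sample_action(policy, "off", rng)
```

The validity check reflects the fact that, for every state $s$, the values $\pi_t(a \mid s)$ form a probability distribution over the available actions.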

The framework is abstract and flexible, so it can be applied to many different reinforcement learning problems.