Reinforcement Learning Goal

Reinforcement Learning is just Machine learning via Reinforcement (god damn genius I know).

In order words, we are giving good or bad reinforcement every now and then to the agent, which will construct a function that will hopefully be at least directionally proportional to the rewards that us (the environment) is giving.

Untitled

This function is referred as the objective function, and it’s pure definition is the expected return for our trajectory $\tau$:

$$ J(\pi) = \mathbb E_{\tau \sim \pi} [R(\tau)] $$

Where $R(\tau)$ represents our cummulative return during trajectory $\tau$, sampled from policy $\pi$.

So now our goal is to simply, find an optimal policy (denoted by $\pi^*$) that maximizes this function:

$$ \pi^* = argmax_\pi J(\pi) $$

Pracically however, tools like Pytorch make approximations using minimizations, not maximization. You pass in a loss function and the engine will calculate the gradients (derivatives) and walk along the direction that minimizes it.

Tabular Algorithms

Brain (Neural Networks) Based Algorithms