SMU

Markov reward process (MRP) = Markov process + reward

reward

horizon

episode

Return
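
As a rough illustration of the return over one episode, here is a minimal Python sketch. It assumes the return is the discounted sum of the rewards collected from the start of the episode up to the horizon. The discount factor gamma, the function name discounted_return, and the sample rewards are assumptions made for illustration and do not come from these notes.

```python
def discounted_return(rewards, gamma=1.0):
    """Return r_0 + gamma * r_1 + gamma**2 * r_2 + ... for one episode.

    rewards: rewards observed in one episode (its length is the horizon).
    gamma:   assumed discount factor in [0, 1]; gamma = 1 means no discounting.
    """
    g = 0.0
    # Work backwards through the episode, using G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


# Hypothetical episode with horizon 3.
episode_rewards = [1.0, 0.0, 2.0]
print(discounted_return(episode_rewards, gamma=0.5))  # 1.0 + 0.5*0.0 + 0.25*2.0 = 1.5
print(discounted_return(episode_rewards))             # undiscounted sum = 3.0
```

With gamma = 1 the return is just the sum of rewards in the episode, which is only guaranteed to be finite when the horizon is finite.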

Value function

where