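SMU
Markov reward process = Markov process + reward
Terms: reward, horizon, episode, return, value function
Value function: $V(s) = \mathbb{E}[G_t \mid X_t = s]$, where the return is $G_t = \sum_{i=0}^{\infty} \gamma^i \, R(X_{t+i})$ and $\gamma$ is the discount factor.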
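
A minimal sketch of how the value function could be computed for a small Markov reward process. It assumes a finite state space, a known transition matrix `P` (not given in these notes), and a discount factor below 1. It relies on the recursion V(s) = R(s) + γ Σ P(s, s') V(s'), which follows from splitting the first term off the discounted return. The function name `evaluate_mrp` and the two-state example are hypothetical, chosen only for illustration.

```python
def evaluate_mrp(P, R, gamma, tol=1e-10, max_iters=10_000):
    """Approximate V(s) = E[G_t | X_t = s] by fixed-point iteration.

    P[s][t] is the assumed probability of moving from state s to state t,
    R[s] is the reward received in state s, gamma is the discount factor.
    """
    n = len(R)
    V = [0.0] * n
    for _ in range(max_iters):
        # One sweep of V <- R + gamma * P V
        V_new = [
            R[s] + gamma * sum(P[s][t] * V[t] for t in range(n))
            for s in range(n)
        ]
        # Stop once no state's value changes by more than tol
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new
    return V


# Hypothetical two-state example: state 0 pays reward 1 and moves to the
# absorbing state 1 (reward 0) with probability 0.5 at each step.
P = [[0.5, 0.5],
     [0.0, 1.0]]
R = [1.0, 0.0]
gamma = 0.9

V = evaluate_mrp(P, R, gamma)
print(V)
```

For this example the recursion can be solved by hand: V(1) = 0 and V(0) = 1 + 0.9 · 0.5 · V(0), so V(0) = 1 / 0.55, which the iteration should approach.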