SMU
Markov reward process + actions
P(s′ ∣ s, a) = P(X_{t+1} = s′ ∣ X_t = s, A_t = a)
Reward
R(s, a) = E[R_t ∣ X_t = s, A_t = a]
Policy
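As a rough illustration of the transition kernel P(s′ ∣ s, a) and reward function R(s, a) defined above, here is a minimal tabular sketch. The two-state, two-action example, all of its numbers, and the helper name `step` are assumptions made up for illustration and do not come from the notes.

```python
import numpy as np

# Hypothetical toy example: 2 states, 2 actions (all numbers are illustrative).
# P[s, a, s2] = P(X_{t+1} = s2 | X_t = s, A_t = a)
P = np.array([
    [[0.9, 0.1], [0.2, 0.8]],   # from state 0, under action 0 and action 1
    [[0.5, 0.5], [0.0, 1.0]],   # from state 1, under action 0 and action 1
])

# R[s, a] = E[R_t | X_t = s, A_t = a]
R = np.array([
    [1.0, 0.0],
    [0.0, 2.0],
])

# Every P[s, a, :] must be a probability distribution over next states.
assert np.allclose(P.sum(axis=2), 1.0)


def step(s, a, rng):
    """Sample X_{t+1} ~ P(. | s, a); return it with the expected reward R(s, a)."""
    s_next = int(rng.choice(P.shape[2], p=P[s, a]))
    return s_next, float(R[s, a])


rng = np.random.default_rng(0)
s_next, r = step(0, 1, rng)   # take action 1 in state 0
```

Storing P as an array indexed by (s, a, s′) keeps the conditioning on the action explicit: fixing a selects an ordinary Markov transition matrix P[:, a, :], which is the sense in which the model is a Markov reward process with actions added.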