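Training Rule to Learn Q

Note that Q and V* are closely related:

$$V^*(s) = \max_{a'} Q(s, a')$$

which allows us to write Q recursively as

$$Q(s_t, a_t) = r(s_t, a_t) + \gamma V^*(\delta(s_t, a_t)) = r(s_t, a_t) + \gamma \max_{a'} Q(s_{t+1}, a')$$

where $\delta(s_t, a_t) = s_{t+1}$ is the state reached by taking action $a_t$ in state $s_t$.

Nice! Let $\hat{Q}$ denote the learner's current approximation to Q. Consider the training rule

$$\hat{Q}(s, a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s', a')$$

where s' is the state resulting from applying action a in state s.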
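To make the training rule concrete, here is a minimal tabular sketch in Python. Everything about the environment is an assumption for illustration only: a hypothetical deterministic chain of five states (0 to 4) with state 4 as an absorbing goal, two actions (move left or right), a reward of 100 for entering the goal and 0 otherwise, a discount factor of 0.9, and purely random exploration. The names `delta`, `reward` and `q_learning` are hypothetical helpers, not part of any library. The only line that matters for the slide is the update, which applies $\hat{Q}(s, a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s', a')$ after each step.

```python
import random

# Hypothetical toy world (assumption, not from the slides):
# states 0..4 on a line, state 4 is an absorbing goal.
N_STATES = 5
GOAL = 4
ACTIONS = (-1, +1)  # move left, move right
GAMMA = 0.9         # discount factor (assumed value)


def delta(s: int, a: int) -> int:
    """Deterministic transition function: the state reached by taking a in s."""
    return min(max(s + a, 0), N_STATES - 1)


def reward(s: int, a: int) -> float:
    """Immediate reward r(s, a): 100 for entering the goal, 0 otherwise."""
    return 100.0 if delta(s, a) == GOAL else 0.0


def q_learning(episodes: int = 2000, seed: int = 0) -> dict:
    """Tabular Q-learning with the deterministic training rule."""
    rng = random.Random(seed)
    # Q-hat table, initialised to zero for every (state, action) pair.
    # Goal entries are never updated, so they act as a terminal value of 0.
    q_hat = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        s = rng.randrange(GOAL)  # random non-goal start state
        while s != GOAL:
            a = rng.choice(ACTIONS)  # random exploration
            r = reward(s, a)
            s_next = delta(s, a)
            # Training rule: Q-hat(s, a) <- r + gamma * max_a' Q-hat(s', a')
            q_hat[(s, a)] = r + GAMMA * max(q_hat[(s_next, b)] for b in ACTIONS)
            s = s_next
    return q_hat


if __name__ == "__main__":
    q = q_learning()
    for s in range(GOAL):
        print(s, round(q[(s, -1)], 2), round(q[(s, +1)], 2))
```

Because the world is deterministic and every (state, action) pair keeps being visited, the table should settle on the true Q values, which can be checked by hand from the recursive definition: moving right from states 3, 2, 1, 0 is worth 100, 90, 81 and 72.9, while moving left from those states is worth 81, 72.9, 65.61 and 65.61. Acting greedily with respect to the learned table therefore always moves right, toward the goal.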