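Training Rule to Learn Q

Note that Q and V* are closely related:

$$V^*(s) = \max_{a'} Q(s, a')$$

This allows us to write Q recursively as

$$Q(s_t, a_t) = r(s_t, a_t) + \gamma V^*(\delta(s_t, a_t)) = r(s_t, a_t) + \gamma \max_{a'} Q(s_{t+1}, a')$$

where $\delta(s_t, a_t) = s_{t+1}$ is the state reached by taking action $a_t$ in state $s_t$, and $\gamma$ is the discount factor.

Nice! Let $\hat{Q}$ denote the learner's current approximation to Q. Consider the training rule

$$\hat{Q}(s, a) \leftarrow r + \gamma \max_{a'} \hat{Q}(s', a')$$

where s' is the state resulting from applying action a in state s.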
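The training rule can be sketched in a few lines of Python. This is a minimal illustration only, not code from the slides: the three-state chain world, the names `delta`, `reward`, and `q_update`, the reward of 100, and the discount factor of 0.9 are all assumptions made for the example. It assumes a deterministic world, so each update simply overwrites the table entry for (s, a).

```python
from collections import defaultdict

ACTIONS = ("left", "right")   # hypothetical action set
GOAL = 2                      # hypothetical absorbing goal state
GAMMA = 0.9                   # assumed discount factor

def delta(s, a):
    """Assumed deterministic transition function on the chain 0 - 1 - 2."""
    if s == GOAL:
        return s
    return min(s + 1, GOAL) if a == "right" else max(s - 1, 0)

def reward(s, a):
    """Assumed reward: 100 for the move that enters the goal, else 0."""
    return 100.0 if s != GOAL and delta(s, a) == GOAL else 0.0

def q_update(q_hat, s, a, r, s_next, gamma=GAMMA):
    """Training rule: Q_hat(s, a) <- r + gamma * max_a' Q_hat(s', a')."""
    q_hat[(s, a)] = r + gamma * max(q_hat[(s_next, a2)] for a2 in ACTIONS)
    return q_hat[(s, a)]

# Q_hat starts at zero everywhere; sweep the non-goal states repeatedly.
q_hat = defaultdict(float)
for _ in range(50):
    for s in (0, 1):
        for a in ACTIONS:
            q_update(q_hat, s, a, reward(s, a), delta(s, a))

def v_star(s):
    """V*(s) = max_a' Q(s, a'), read off the learned table."""
    return max(q_hat[(s, a)] for a in ACTIONS)
```

In this toy world the table settles after a few sweeps, and `v_star` recovers the state values directly from the learned table, which is the relation between Q and V* stated above.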