Q Function

Define a new function very similar to $V^*$:

$$Q(s,a) = r(s,a) + \gamma V^*(\delta(s,a))$$

If the agent learns $Q$, it can choose the optimal action even without knowing $\delta$!

$$\pi^*(s) = \arg\max_a \left[ r(s,a) + \gamma V^*(\delta(s,a)) \right]$$

$$\pi^*(s) = \arg\max_a Q(s,a)$$

$Q$ is the evaluation function the agent will learn.
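As a rough illustration of the point above, the sketch below builds a Q table for a tiny deterministic world and then picks actions from the table alone. Everything concrete here is an assumption made for the example and not part of the slide: the four-state chain, the reward of 100 for entering the goal, the discount factor 0.9, and the names `delta`, `r`, and `greedy_action`. Value iteration is used only to obtain $V^*$ so that Q can be filled in from its definition; it is not the learning rule the agent itself would use.

```python
# Hypothetical toy world (an assumption for illustration): a chain of
# states 0..3 where state 3 is an absorbing goal.
GAMMA = 0.9
STATES = [0, 1, 2, 3]
ACTIONS = ["left", "right"]
GOAL = 3


def delta(s, a):
    """Deterministic transition function delta(s, a)."""
    if s == GOAL:
        return GOAL  # the goal is absorbing
    return max(s - 1, 0) if a == "left" else s + 1


def r(s, a):
    """Immediate reward r(s, a): 100 for entering the goal, else 0."""
    return 100.0 if s != GOAL and delta(s, a) == GOAL else 0.0


# Compute V* by value iteration. This step needs r and delta.
V = {s: 0.0 for s in STATES}
for _ in range(50):
    V = {
        s: max(r(s, a) + GAMMA * V[delta(s, a)] for a in ACTIONS)
        for s in STATES
    }

# Fill in Q from its definition: Q(s, a) = r(s, a) + gamma * V*(delta(s, a)).
Q = {
    (s, a): r(s, a) + GAMMA * V[delta(s, a)]
    for s in STATES
    for a in ACTIONS
}


def greedy_action(q_table, s):
    """pi*(s) = argmax_a Q(s, a); uses only the table, not r or delta."""
    return max(ACTIONS, key=lambda a: q_table[(s, a)])


policy = {s: greedy_action(Q, s) for s in STATES}
print(policy)
```

Note that `greedy_action` never calls `delta` or `r`: once the table is known, a lookup and a maximum over actions are enough. In the goal state both actions have the same Q value, so the choice there is arbitrary.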