"Machine Learning" presentation slides (15)
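Q Function

Define a new function very similar to $V^*$:

$$Q(s,a) \equiv r(s,a) + \gamma V^*(\delta(s,a))$$

If the agent learns $Q$, it can choose the optimal action even without knowing $\delta$!

$$\pi^*(s) = \operatorname*{argmax}_a \left[ r(s,a) + \gamma V^*(\delta(s,a)) \right]$$

$$\pi^*(s) = \operatorname*{argmax}_a Q(s,a)$$

$Q$ is the evaluation function the agent will learn.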
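The relationship between Q and the optimal policy can be checked on a small example. The following is a minimal sketch, assuming a hypothetical three-state deterministic world that is not from the slides. The names `delta`, `r`, `optimal_values`, and `greedy_action`, the reward of 100, and the discount 0.9 are illustrative assumptions. Here Q is computed directly from its definition only to show the equivalence. A learning agent would instead estimate Q from experience.

```python
GAMMA = 0.9                  # discount factor (assumed value)
STATES = [0, 1, 2]           # hypothetical chain; state 2 is an absorbing goal
ACTIONS = ["left", "right"]

def delta(s, a):
    """Deterministic transition function of the toy world."""
    if s == 2:
        return 2             # goal is absorbing
    return max(s - 1, 0) if a == "left" else s + 1

def r(s, a):
    """Immediate reward: 100 for entering the goal, otherwise 0."""
    return 100.0 if s != 2 and delta(s, a) == 2 else 0.0

def optimal_values(sweeps=100):
    """Approximate V* by repeated Bellman backups (value iteration)."""
    v = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        v = {s: max(r(s, a) + GAMMA * v[delta(s, a)] for a in ACTIONS)
             for s in STATES}
    return v

V_STAR = optimal_values()

# Q(s, a) = r(s, a) + gamma * V*(delta(s, a)), straight from the definition.
Q = {(s, a): r(s, a) + GAMMA * V_STAR[delta(s, a)]
     for s in STATES for a in ACTIONS}

def greedy_action(q, s):
    """pi*(s) = argmax_a Q(s, a); uses only the Q table, not delta or r."""
    return max(ACTIONS, key=lambda a: q[(s, a)])

if __name__ == "__main__":
    for s in STATES:
        print(s, greedy_action(Q, s), {a: Q[(s, a)] for a in ACTIONS})
```

Note that `greedy_action` consults only the Q table. Once Q is known, the agent needs neither `delta` nor `r` to act, which is the point the slide makes.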