About the authors:

王学宁 (WANG Xuening), male, born in 1976, is a Ph.D. candidate. His research interests include reinforcement learning and intelligent control. He has taken part in one key project and one youth project of the National Natural Science Foundation of China, as well as one National 863 Program project, and has published more than 10 papers, of which 3 are indexed by SCI and 5 by EI. E-mail: wxn9576@yahoo.com.cn

陈伟 (CHEN Wei), male, born in 1976, is a Ph.D. candidate. His research interests include robot localization and mapping, and machine learning. He has taken part in one key project of the National Natural Science Foundation of China.

张锰 (ZHANG Meng), male, born in 1972, received his master's degree from the School of Computer Science, National University of Defense Technology, in 2001. His research interest is command automation. He has won two second-class and three third-class military science and technology progress awards, and has published 12 papers in domestic and international journals, of which 1 is indexed by SCI and 3 by EI.