About the authors:

王学宁 (WANG Xuening), male, born in 1976, is a Ph.D. candidate. His research interests include reinforcement learning and intelligent control. He has taken part in one key project and one youth project of the National Natural Science Foundation of China, as well as one National 863 Program project, and has published more than 10 papers, of which 3 are indexed by SCI and 5 by EI. E-mail: wxn9576@yahoo.com.cn

陈伟 (CHEN Wei), male, born in 1976, is a Ph.D. candidate. His research interests include robot localization and mapping, and machine learning. He has taken part in one key project of the National Natural Science Foundation of China.

张锰 (ZHANG Meng), male, born in 1972, received his master's degree from the School of Computer Science, National University of Defense Technology, in 2001. His research interest is command automation. He has won two second-class and three third-class military science and technology progress awards, and has published 12 papers in domestic and international journals, of which 1 is indexed by SCI and 3 by EI.