智能系统学报 (CAAI Transactions on Intelligent Systems), Vol. 4, p. 212

[6] KOLTER J Z, ABBEEL P, NG A Y. Hierarchical apprenticeship learning with application to quadruped locomotion [C]//Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2008.
[7] RATLIFF N, BAGNELL J A, ZINKEVICH M A. Subgradient methods for maximum margin structured learning [C]//Workshop on Learning in Structured Outputs Spaces at ICML. Pittsburgh, USA, 2006.
[8] SYED U, BOWLING M, SCHAPIRE R E. Apprenticeship learning using linear programming [C]//Proceedings of the 25th International Conference on Machine Learning (ICML 2008). Helsinki, Finland, 2008: 1032-1039.
[9] SYED U, SCHAPIRE R E. A game-theoretic approach to apprenticeship learning [C]//Advances in Neural Information Processing Systems. Cambridge, USA: MIT Press, 2008.
[10] GRIMES D B, RAJESH D R, RAO R P N. Learning nonparametric models for probabilistic imitation [C]//Proceedings of Neural Information Processing Systems. Cambridge, USA: MIT Press, 2007: 521-528.
[11] ABBEEL P, COATES A, QUIGLEY M, et al. An application of reinforcement learning to acrobatic helicopter flight [C]//Proceedings of Neural Information Processing Systems. Cambridge, USA: MIT Press, 2007: 1-8.
[12] KOLTER J Z, RODGERS M P, NG A Y. A complete control architecture for quadruped locomotion over rough terrain [C]//IEEE International Conference on Robotics and Automation. Pasadena, USA, 2008: 811-818.
[13] REBULA J R, NEUHAUS P D, BONNLANDER B V, et al. A controller for the littledog quadruped walking on rough terrain [C]//2007 IEEE International Conference on Robotics and Automation. Roma, Italy, 2007: 1467-1473.
[14] KAELBLING L P, LITTMAN M L, MOORE A W. Reinforcement learning: a survey [J]. Journal of Artificial Intelligence Research, 1996, 4: 237-285.
[15] SUTTON R S, BARTO A G. Reinforcement learning: an introduction [M]. Cambridge, USA: MIT Press, 1998.
[16] COATES A, ABBEEL P, NG A Y. Reinforcement learning with multiple demonstrations [C]//The Twenty-first Annual Conference on Neural Information Processing Systems (NIPS 2007). Vancouver, Canada, 2007.
[17] TASKAR B, CHATALBASHEV V, KOLLER D, et al. Learning structured prediction models: a large margin approach [C]//Proceedings of the 22nd International Conference on Machine Learning. New York, USA: ACM, 2005: 896-903.
[18] TASKAR B, LACOSTE-JULIEN S, JORDAN M. Structured prediction via the extragradient method [C]//Proceedings of Neural Information Processing Systems. Vancouver, Canada, 2005: 1345-1352.
[19] SHOR N Z, KIWIEL K C, RUSZCZYNSKI A. Minimization methods for non-differentiable functions [M]. New York, USA: Springer-Verlag, 1985.
[20] TSOCHANTARIDIS I, JOACHIMS T, HOFMANN T, et al. Large margin methods for structured and interdependent output variables [J]. The Journal of Machine Learning Research, 2005, 6: 1453-1484.
[21] CHECHIK G, HEITZ G, ELIDAN G, et al. Max-margin classification of incomplete data [C]//Advances in Neural Information Processing Systems: Proceedings of the 2006 Conference. Cambridge, USA: MIT Press, 2007: 233-240.
[22] NEU G, SZEPESVARI C. Apprenticeship learning using inverse reinforcement learning and gradient methods [C]//Proceedings of Uncertainty in Artificial Intelligence. Vancouver, Canada, 2007: 295-302.

About the authors:

JIN Zhuojun, male, born in 1984, Ph.D. candidate. His main research interest is machine learning.

QIAN Hui, male, born in 1974, associate professor and member of the Intelligent Robotics Technical Committee of the Artificial Intelligence Society. His main research interests are artificial intelligence and computer vision.

CHEN Shenyi, male, born in 1980, Ph.D. candidate. His main research interest is machine learning.