[22] BOHEZ S, ABDOLMALEKI A, NEUNERT M, et al. Value constrained model-free continuous control[EB/OL]. (2019-02-12). https://arxiv.org/abs/1902.04623.
[23] ALTMAN E. Constrained Markov decision processes[M]. London: Chapman and Hall, 1999.
[24] DELCOMYN F. Neural basis of rhythmic behavior in animals[J]. Science, 1980, 210(4469): 492-498.
[25] MATSUOKA K. Sustained oscillations generated by mutually inhibiting neurons with adaptation[J]. Biological cybernetics, 1985, 52(6): 367-376.
[26] COHEN A H, HOLMES P J, RAND R H. The nature of the coupling between segmental oscillators of the lamprey spinal generator for locomotion: a mathematical model[J]. Journal of mathematical biology, 1982, 13(3): 345-369.
[27] BAY J S, HEMAMI H. Modeling of a neural pattern generator with coupled nonlinear oscillators[J]. IEEE transactions on biomedical engineering, 1987, BME-34(4): 297-306.
[28] ENDO G, MORIMOTO J, MATSUBARA T, et al. Learning CPG-based biped locomotion with a policy gradient method: application to a humanoid robot[J]. The international journal of robotics research, 2008, 27(2): 213-228.
[29] MATSUBARA T, MORIMOTO J, NAKANISHI J, et al. Learning CPG-based biped locomotion with a policy gradient method[C]//Proceedings of the 5th IEEE-RAS International Conference on Humanoid Robots. Tsukuba, Japan, 2005.
[30] DOYA K. Reinforcement learning in continuous time and space[J]. Neural computation, 2000, 12(1): 219-245.
[31] SARTORETTI G, PAIVINE W, SHI Yunfei, et al. Distributed learning of decentralized control policies for articulated mobile robots[J]. IEEE transactions on robotics, 2019, 35(5): 1109-1122.
[32] 方勇纯, 朱威, 郭宪. 基于路径积分强化学习方法的蛇形机器人目标导向运动[J]. 模式识别与人工智能, 2019, 32(1): 1-9.
FANG Yongchun, ZHU Wei, GUO Xian. Target-directed locomotion of a snake-like robot based on path integral reinforcement learning[J]. Pattern recognition and artificial intelligence, 2019, 32(1): 1-9.
[33] IJSPEERT A J, SCHAAL S. Learning attractor landscapes for learning motor primitives[M]//THRUN S, SAUL L K, SCHOLKOPF B. Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press, 2002: 1547-1554.
[34] SCHAAL S, PETERS J, NAKANISHI J, et al. Learning movement primitives[M]//DARIO P, CHATILA R. Robotics Research. The Eleventh International Symposium. Berlin, Germany: Springer, 2005.
[35] YU Wenhao, TURK G, LIU C K. Learning symmetric and low-energy locomotion[J]. ACM transactions on graphics, 2018, 37(4): 144-150.
[36] PENG Xuebin, BERSETH G, VAN DE PANNE M. Terrain-adaptive locomotion skills using deep reinforcement learning[J]. ACM transactions on graphics, 2016, 35(4): 81-88.
[37] BING Zhenshan, LEMKE C, JIANG Zhuangyi, et al. Energy-efficient slithering gait exploration for a snake-like robot based on reinforcement learning[EB/OL]. (2019-04-16). https://arxiv.org/abs/1904.07788v1.
[38] PENG Xuebin, VAN DE PANNE M. Learning locomotion skills using DeepRL: does the choice of action space matter?[C]//Proceedings of ACM SIGGRAPH/Eurographics Symposium on Computer Animation. Los Angeles, USA, 2017: 12-20.
[39] VAN HASSELT H. Double Q-learning[C]//Proceedings of the 23rd International Conference on Neural Information Processing Systems. Red Hook, USA, 2010: 2613-2621.
[40] HA D, SCHMIDHUBER J. World models[EB/OL]. (2018-05-09). https://arxiv.org/abs/1803.10122.
[41] EBERT F, FINN C, DASARI S, et al. Visual foresight: model-based deep reinforcement learning for vision-based robotic control[EB/OL]. (2018-12-03). https://arxiv.org/abs/1812.00568.
[42] FINN C, RAJESWARAN A, KAKADE S, et al. Online meta-learning[EB/OL]. (2019-07-03). https://arxiv.org/abs/1902.08438.
[43] MAHJOURIAN R, MIIKKULAINEN R, LAZIC N, et al. Hierarchical policy design for sample-efficient learning of robot table tennis through self-play[EB/OL]. (2019-02-17). https://arxiv.org/abs/1811.12927?context=cs.

About the authors:
GUO Xian, lecturer, Ph.D. His main research interests are bionic robot design and intelligent locomotion control. He has led one project funded by the National Natural Science Foundation of China and two provincial- or ministerial-level projects.
FANG Yongchun, professor and doctoral supervisor, dean of the College of Artificial Intelligence, Nankai University. His main research interests include visual control of robots, control of underactuated crane systems, locomotion control of bionic robots, and micro/nano manipulation. He has led projects including a National Key R&D Program project, key projects of the National Natural Science Foundation of China, a topic under the 12th Five-Year National Key Technology R&D Program, and a National Natural Science Foundation scientific instrument development project. His awards include the first prize of the Wu Wenjun Artificial Intelligence Natural Science Award, the Tianjin Patent Gold Award, the first prize of the Tianjin Natural Science Award, and a first prize for higher-education teaching achievements. He has published more than 100 academic papers.