No. 4 YIN Changsheng, et al.: A survey of multi-agent hierarchical reinforcement learning ·655·

[52] ZHU Feng, HU Xiaofeng. Overview and research prospect of battlefield situation assessment based on deep learning[J]. Military operations research and systems engineering, 2016, 30(3): 22–27.
[53] TIAN Yuandong, GONG Qucheng, SHANG Wenling, et al. ELF: an extensive, lightweight and flexible research platform for real-time strategy games[C]//31st Conference on Neural Information Processing Systems, California, USA, 2017: 2656–2666.
[54] MEHTA M, ONTAÑÓN S, AMUNDSEN T, et al. Authoring behaviors for games using learning from demonstration[C]//Proc of the 8th International Conference on Case-Based Reasoning, Berlin, Heidelberg, 2009: 12–20.
[55] JUSTESEN N, RISI S. Learning macromanagement in StarCraft from replays using deep learning[C]//IEEE 2017 Conference on Computational Intelligence in Games, New York, USA, 2017.
[56] WU Huikai, ZHANG Junge, HUANG Kaiqi. MSC: a dataset for macro-management in StarCraft II[DB/OL]. [2018-05-31]. http://cn.arxiv.org/pdf/1710.03131v1.
[57] BARTO A G, MAHADEVAN S. Recent advances in hierarchical reinforcement learning[J]. Discrete event dynamic systems, 2003, 13(4): 341–379.
[58] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. [2015-11-18]. https://arxiv.org/abs/1509.02971.
[59] DIBIA V, DEMIRALP C. Data2Vis: automatic generation of data visualizations using sequence to sequence recurrent neural networks[EB/OL]. [2018-11-02]. https://arxiv.org/abs/1804.03126.
[60] LOUIS S J, LIU Siming. Multi-objective evolution for 3D RTS micro[EB/OL]. [2018-03-08]. https://arxiv.org/abs/1803.02943.
[61] PENG Peng, WEN Ying, YANG Yaodong, et al. Multiagent bidirectionally-coordinated nets: emergence of human-level coordination in learning to play StarCraft combat games[EB/OL]. [2018-05-31]. http://cn.arxiv.org/pdf/1703.10069v4.
[62] SHAO Kun, ZHU Yuanheng, ZHAO Dongbin. StarCraft micromanagement with reinforcement learning and curriculum transfer learning[J]. IEEE transactions on emerging topics in computational intelligence, 2018(99): 1–12.
[63] LI Yaoyu, ZHU Yifan, YANG Fan. Inverse reinforcement learning based optimal schedule generation approach for carrier aircraft on flight deck[J]. Journal of national university of defense technology, 2013, 35(4): 171–175.
[64] CHEN Xiliang, ZHANG Yongliang. Research on tactical decision of army units based on deep reinforcement learning[J]. Military operations research and systems engineering, 2017, 31(3): 20–27.
[65] QIAO Yongjie, WANG Xinjiu, SUN Liang. A method for army command posts to auto-generate combat time scheduling[J]. Journal of China academy of electronics and information technology, 2017, 12(3): 278–284.
[66] DING Shifei, DU Wei, ZHAO Xingyu, et al. A new asynchronous reinforcement learning algorithm based on improved parallel PSO[J]. Applied intelligence, 2019, 49(12): 4211–4222.
[67] ZHENG Yanbin, LI Bo, AN Deyu, et al. Multi-agent path planning algorithm based on hierarchical reinforcement learning and artificial potential field[J]. Journal of computer applications, 2015, 35(12): 3491–3496.
[68] WANG Chong, JING Ning, LI Jun, et al. An algorithm of cooperative multiple satellites mission planning based on multi-agent reinforcement learning[J]. Journal of national university of defense technology, 2011, 33(1): 53–58.

About the authors:

YIN Changsheng, lecturer, Ph.D. His main research interests are machine learning and intelligent decision-making. He has published more than 20 academic papers and 3 monographs.

YANG Ruopeng, professor and doctoral supervisor. His main research interest is intelligent command. In recent years he has received one first prize and two third prizes of the Military Science and Technology Progress Award, published more than 40 academic papers, and authored more than 10 monographs.

ZHU Wei, associate professor. His main research interests are machine learning and intelligent decision-making.