No. 4 YIN Changsheng, et al.: A survey of multi-agent hierarchical reinforcement learning ·655·

[52] ZHU Feng, HU Xiaofeng. Overview and research prospect of battlefield situation assessment based on deep learning[J]. Military operations research and systems engineering, 2016, 30(3): 22–27.
[53] TIAN Yuandong, GONG Qucheng, SHANG Wenling, et al. ELF: an extensive, lightweight and flexible research platform for real-time strategy games[C]//31st Conference on Neural Information Processing Systems, California, USA, 2017: 2656–2666.
[54] MEHTA M, ONTAÑÓN S, AMUNDSEN T, et al. Authoring behaviors for games using learning from demonstration[C]//Proc of the 8th International Conference on Case-Based Reasoning, Berlin, Heidelberg, 2009: 12–20.
[55] JUSTESEN N, RISI S. Learning macromanagement in StarCraft from replays using deep learning[C]//IEEE 2017 Conference on Computational Intelligence in Games, New York, USA, 2017.
[56] WU Huikai, ZHANG Junge, HUANG Kaiqi. MSC: a dataset for macro-management in StarCraft II[DB/OL]. [2018-05-31]. http://cn.arxiv.org/pdf/1710.03131v1.
[57] BARTO A G, MAHADEVAN S. Recent advances in hierarchical reinforcement learning[J]. Discrete event dynamic systems, 2003, 13(4): 341–379.
[58] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. [2015-11-18]. https://arxiv.org/abs/1509.02971.
[59] DIBIA V, DEMIRALP C. Data2Vis: automatic generation of data visualizations using sequence to sequence recurrent neural networks[EB/OL]. [2018-11-02]. https://arxiv.org/abs/1804.03126.
[60] LOUIS S J, LIU Siming. Multi-objective evolution for 3D RTS micro[EB/OL]. [2018-03-08]. https://arxiv.org/abs/1803.02943.
[61] PENG Peng, WEN Ying, YANG Yaodong, et al. Multiagent bidirectionally-coordinated nets: emergence of human-level coordination in learning to play StarCraft combat games[EB/OL]. [2018-05-31]. http://cn.arxiv.org/pdf/1703.10069v4.
[62] SHAO Kun, ZHU Yuanheng, ZHAO Dongbin. StarCraft micromanagement with reinforcement learning and curriculum transfer learning[J]. IEEE transactions on emerging topics in computational intelligence, 2018(99): 1–12.
[63] LI Yaoyu, ZHU Yifan, YANG Fan. Inverse reinforcement learning based optimal schedule generation approach for carrier aircraft on flight deck[J]. Journal of national university of defense technology, 2013, 35(4): 171–175.
[64] CHEN Xiliang, ZHANG Yongliang. Research on tactical decision of army units based on deep reinforcement learning[J]. Military operations research and systems engineering, 2017, 31(3): 20–27.
[65] QIAO Yongjie, WANG Xinjiu, SUN Liang. A method for army command posts to auto-generate combat time scheduling[J]. Journal of China academy of electronics and information technology, 2017, 12(3): 278–284.
[66] DING Shifei, DU Wei, ZHAO Xingyu, et al. A new asynchronous reinforcement learning algorithm based on improved parallel PSO[J]. Applied intelligence, 2019, 49(12): 4211–4222.
[67] ZHENG Yanbin, LI Bo, AN Deyu, et al. Multi-agent path planning algorithm based on hierarchical reinforcement learning and artificial potential field[J]. Journal of computer applications, 2015, 35(12): 3491–3496.
[68] WANG Chong, JING Ning, LI Jun, et al. An algorithm of cooperative multiple satellites mission planning based on multi-agent reinforcement learning[J]. Journal of national university of defense technology, 2011, 33(1): 53–58.

About the authors:

YIN Changsheng, lecturer, Ph.D. His main research interests are machine learning and intelligent decision-making. He has published more than 20 academic papers and 3 monographs.

YANG Ruopeng, professor and doctoral supervisor. His main research interest is intelligent command. In recent years he has received one first prize and two third prizes of the Military Science and Technology Progress Award, published more than 40 academic papers, and authored more than 10 monographs.

ZHU Wei, associate professor. His main research interests are machine learning and intelligent decision-making.