CAAI Transactions on Intelligent Systems, Vol. 14, No. 1, Jan. 2019 (智能系统学报, 第14卷第1期, 2019年1月)
DOI: 10.11992/tis.201807010
Online publication: http://kns.cnki.net/kcms/detail/23.1538.TP.20181230.0904.002.html

Event-triggered reinforcement learning formation control for multi-agent

XU Peng1, XIE Guangming1,2,3, WEN Jiayan1,2, GAO Yuan1
(1. School of Electric and Information Engineering, Guangxi University of Science and Technology, Liuzhou 545006, China; 2. College of Engineering, Peking University, Beijing 100871, China; 3. Institute of Ocean Research, Peking University, Beijing 100871, China)

Abstract: Classical reinforcement learning for multi-agent formation consumes large amounts of communication and computing resources. This paper introduces an event-triggered control mechanism so that an agent's action decisions need not be made at a fixed period; instead, an agent updates its action only when an event-triggered condition is satisfied. The condition is designed from both the agent's cumulative reward and the deviation between its own reward and those of its neighbors, and the agents interact with one another to find the optimal joint policy that achieves the formation. Numerical simulation results demonstrate that the event-triggered reinforcement learning formation control algorithm effectively reduces the frequency of the agents' action decisions and their resource consumption while preserving system performance.

Keywords: reinforcement learning; multi-agent; event-triggered; formation control; Markov decision processes; swarm intelligence; action decisions; particle swarm optimization

CLC number: TP391.8    Document code: A    Article ID: 1673-4785(2019)01-0093-06

Chinese citation format: 徐鹏, 谢广明, 文家燕, 等. 事件驱动的强化学习多智能体编队控制[J]. 智能系统学报, 2019, 14(1): 93-98.
English citation format: XU Peng, XIE Guangming, WEN Jiayan, et al. Event-triggered reinforcement learning formation control for multi-agent[J]. CAAI transactions on intelligent systems, 2019, 14(1): 93-98.

Received: 2018-07-11. Published online: 2019-01-03.
Foundation items: National Key R&D Program of China (2017YFB1400800); National Natural Science Foundation of China (91648120, 61633002, 51575005, 61563006, 61563005); Key Laboratory of Industrial Process Intelligent Control Technology of Guangxi Higher Education Institutions (IPICT-2016-04).
Corresponding author: WEN Jiayan. E-mail: wenjiayan2012@126.com.
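The event-triggered rule described in the abstract, an agent re-deciding its action only when its cumulative reward deviates enough from its neighbors' rewards, can be illustrated with a minimal sketch. This is not the paper's actual algorithm (the exact condition is not given on this page); the threshold delta, the reward arguments, and the function name are all illustrative assumptions.

```python
import numpy as np

def should_update_action(cum_reward, neighbor_cum_rewards, delta=0.5):
    """Hypothetical event-triggered condition: fire when this agent's
    cumulative reward deviates from its neighbors' average by more than
    a threshold. The paper combines the cumulative reward with the
    reward deviation from neighbors; the exact form is assumed here."""
    deviation = abs(cum_reward - np.mean(neighbor_cum_rewards))
    return deviation > delta

# Usage: the agent queries its policy for a new action only when the
# event fires, instead of at every fixed control period.
if should_update_action(cum_reward=3.2, neighbor_cum_rewards=[2.1, 2.4, 2.0]):
    pass  # re-select the action from the policy / Q-table here
```

The point of such a condition is that between triggering instants the agent simply holds its previous action, which is what cuts the decision frequency and the communication load.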
Reinforcement learning is an algorithmic framework inspired by the way animals adapt effectively to their environments. Its basic idea is to interact with the environment through trial and error and, in the absence of a teacher signal, to find the optimal policy by maximizing the cumulative reward [1-3]. Reinforcement learning is now applied widely across industry, for example in autonomous driving, humanoid robotics, intelligent transportation, and multi-agent coordination; within this scope, reinforcement learning for multi-agent formation is an important research direction [4-5]. Reference [4] designs a Markov model with multi-action replay, under which multi-agent Q-learning converges to the optimal joint action policy (a schematic Q-learning update is sketched below). Reference [5] proposes a Q-value evaluation method whereby the agents ...
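Since this page only names the Q-learning convergence result of reference [4] without reproducing it, a minimal tabular Q-learning update is sketched here for orientation. The single-agent setting, grid size, learning rate alpha, discount gamma, and placeholder environment are all assumptions, not values from the paper; reference [4] extends this kind of update to joint actions across multiple agents.

```python
import numpy as np

n_states, n_actions = 25, 4            # assumed 5x5 grid world, 4 moves
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # assumed hyperparameters
Q = np.zeros((n_states, n_actions))
rng = np.random.default_rng(0)

def step(state, action):
    """Placeholder environment: random next state, reward 1 at the goal."""
    next_state = int(rng.integers(n_states))
    reward = 1.0 if next_state == n_states - 1 else 0.0
    return next_state, reward

state = 0
for _ in range(1000):
    # epsilon-greedy action selection
    if rng.random() < epsilon:
        action = int(rng.integers(n_actions))
    else:
        action = int(np.argmax(Q[state]))
    next_state, reward = step(state, action)
    # standard Q-learning temporal-difference update
    Q[state, action] += alpha * (reward + gamma * np.max(Q[next_state]) - Q[state, action])
    state = next_state
```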