[10] HOSU I A, REBEDEA T. Playing atari games with deep reinforcement learning and human checkpoint replay[EB/OL]. Bucharest, Romania: arXiv, 2016. [2019-10-21]. https://arxiv.org/pdf/1607.05077.pdf.
[11] ANDRYCHOWICZ M, WOLSKI F, RAY A, et al. Hindsight experience replay[C]//Proceedings of the 31st International Conference on Neural Information Processing Systems. Long Beach, USA, 2017: 5048-5058.
[12] 杨惟轶, 白辰甲, 蔡超, 等. 深度强化学习中稀疏奖励问题研究综述[J]. 计算机科学, 2020, 47(3): 182-191.
YANG Weiyi, BAI Chenjia, CAI Chao, et al. Survey on sparse reward in deep reinforcement learning[J]. Computer science, 2020, 47(3): 182-191.
[13] GULLAPALLI V, BARTO A G. Shaping as a method for accelerating reinforcement learning[C]//Proceedings of the 1992 IEEE International Symposium on Intelligent Control. Glasgow, UK, 1992: 554-559.
[14] HUSSEIN A, GABER M M, ELYAN E, et al. Imitation learning: A survey of learning methods[J]. ACM computing surveys, 2017, 50(2): 1-35.
[15] BENGIO Y, LOURADOUR J, COLLOBERT R, et al. Curriculum learning[C]//Proceedings of the 26th Annual International Conference on Machine Learning. Montreal, Quebec, Canada, 2009: 41-48.
[16] BURDA Y, EDWARDS H, PATHAK D, et al. Large-scale study of curiosity-driven learning[EB/OL]. California, USA: arXiv, 2018. [2019-10-30]. https://arxiv.org/pdf/1808.04355.
[17] 周文吉, 俞扬. 分层强化学习综述[J]. 智能系统学报, 2017, 12(5): 590-594.
ZHOU Wenji, YU Yang. Summarize of hierarchical reinforcement learning[J]. CAAI transactions on intelligent systems, 2017, 12(5): 590-594.
[18] PLAPPERT M, ANDRYCHOWICZ M, RAY A, et al. Multi-goal reinforcement learning: Challenging robotics environments and request for research[EB/OL]. California, USA: arXiv, 2018. [2019-11-1]. https://arxiv.org/pdf/1802.09464.pdf.
[19] 万里鹏, 兰旭光, 张翰博, 等. 深度强化学习理论及其应用综述[J]. 模式识别与人工智能, 2019, 32(1): 67-81.
WAN Lipeng, LAN Xuguang, ZHANG Hanbo, et al. A review of deep reinforcement learning theory and application[J]. Pattern recognition and artificial intelligence, 2019, 32(1): 67-81.
[20] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Playing atari with deep reinforcement learning[EB/OL]. London, UK: arXiv, 2013. [2019-11-1]. https://arxiv.org/pdf/1312.5602.pdf.
[21] MNIH V, KAVUKCUOGLU K, SILVER D, et al. Human-level control through deep reinforcement learning[J]. Nature, 2015, 518(7540): 529-533.
[22] WILLIAMS R J. Simple statistical gradient-following algorithms for connectionist reinforcement learning[J]. Machine learning, 1992, 8(3/4): 229-256.
[23] KONDA V R, TSITSIKLIS J N. Actor-critic algorithms[C]//Advances in Neural Information Processing Systems. Colorado, USA, 2000: 1008-1014.
[24] MNIH V, BADIA A P, MIRZA M, et al. Asynchronous methods for deep reinforcement learning[C]//Proceedings of the 33rd International Conference on Machine Learning. New York, USA, 2016: 1928-1937.
[25] SCHULMAN J, WOLSKI F, DHARIWAL P, et al. Proximal policy optimization algorithms[EB/OL]. California, USA: arXiv, 2017. [2019-11-3]. https://arxiv.org/pdf/1707.06347.pdf.
[26] LILLICRAP T P, HUNT J J, PRITZEL A, et al. Continuous control with deep reinforcement learning[EB/OL]. London, UK: arXiv, 2015. [2019-12-25]. https://arxiv.org/pdf/1509.02971.pdf.
[27] NG A Y, HARADA D, RUSSELL S. Policy invariance under reward transformations: Theory and application to reward shaping[C]//Proceedings of the Sixteenth International Conference on Machine Learning. Bled, Slovenia, 1999: 278-287.
[28] RANDLØV J, ALSTRØM P. Learning to drive a bicycle using reinforcement learning and shaping[C]//Proceedings of the Fifteenth International Conference on Machine Learning. Madison, USA, 1998: 463-471.
[29] JAGODNIK K M, THOMAS P S, VAN DEN BOGERT A J, et al. Training an actor-critic reinforcement learning controller for arm movement using human-generated rewards[J]. IEEE transactions on neural systems and rehabilitation engineering, 2017, 25(10): 1892-1905.
[30] FERREIRA E, LEFÈVRE F. Expert-based reward shaping and exploration scheme for boosting policy learning of dialogue management[C]//2013 IEEE Workshop on Automatic Speech Recognition and Understanding. Olomouc, Czech Republic, 2013: 108-113.
[31] NG A Y, RUSSELL S J. Algorithms for inverse reinforcement learning[C]//Proceedings of the Seventeenth International Conference on Machine Learning. Stanford, USA, 2000: 663-670.
[32] MARTHI B. Automatic shaping and decomposition of reward functions[C]//Proceedings of the 24th International Conference on Machine Learning. Corvallis, USA, 2007: 601-608.
[33] ROSS S, BAGNELL D. Efficient reductions for imitation