CAAI Transactions on Intelligent Systems (智能系统学报), Vol. 16 No. 5, Sep. 2021
DOI: 10.11992/tis.202108010

Key technologies and applications of distributed parallel computing for machine learning

CAO Ronghui1,2, TANG Zhuo1,2, ZUO Zhiwei1,2, ZHANG Xuedong1,2
(1. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082, China; 2. National Supercomputer Center in Changsha, Changsha 410082, China)

Abstract: The computation and iteration processes of machine learning and related algorithms are becoming increasingly complex, and sufficient computing power is key to deploying artificial intelligence applications effectively. This paper first proposes a task space-time scheduling algorithm for distributed heterogeneous environments that adapts to skewed data, which effectively improves the average efficiency of tasks such as machine learning model training. Second, it proposes an efficient resource management system and an energy-saving scheduling algorithm for distributed heterogeneous environments, which realize cross-domain migration of computing resources based on dynamic prediction together with dynamic voltage/frequency regulation, reducing the overall energy consumption of the system. Third, it constructs a distributed heterogeneous optimization environment suited to the iteration of machine learning and deep learning algorithms, and proposes a basic method of distributed parallel optimization for machine learning and graph iteration algorithms. Finally, an intelligent analysis system for domain-oriented applications is developed and deployed in manufacturing, transportation, education, medical care, and other fields, addressing the performance bottlenecks that commonly arise in efficient data collection, storage, cleaning, fusion, and intelligent analysis.

Keywords: machine learning; distributed computing; skew data; task space-time scheduling; resource management; energy-saving scheduling; cross-domain resource migration; parallel optimization; graph iteration algorithm; intelligent analysis system

CLC number: TP18  Document code: A  Article ID: 1673-4785(2021)05-0919-12

Citation: CAO Ronghui, TANG Zhuo, ZUO Zhiwei, et al. Key technologies and applications of distributed parallel computing for machine learning[J]. CAAI transactions on intelligent systems, 2021, 16(5): 919-930.

The wave of the artificial intelligence 2.0 era is arriving, with supercomputing and cloud computing as its computing infrastructure and with intelligence generated through big data analysis of massive empirical data[1-2]. The rapid growth of Internet and artificial intelligence applications poses great challenges for processing and analyzing massive data: the parallel computing capability of traditional data platforms,

Received: 2021-08-11.
Funding: National Key R&D Program of China (2018YFB1701400); National Natural Science Foundation of China (92055213, 61873090, L1924056, 62002114); Research, Development and Industrialization Project of a Smart Park Cloud Platform Driven by Financial and Industrial Data (XMHT20190205007); Key-Area Research and Development Program of Guangdong Province (XMHT20190205007); Shenzhen Science and Technology Program (JSGG20180507183023239).
Corresponding author: TANG Zhuo. E-mail: ztang@hnu.edu.cn.