
CAAI Transactions on Intelligent Systems, Vol. 14, No. 6, Nov. 2019
DOI: 10.11992/tis.201908011
Online publication: http://kns.cnki.net/kcms/detail/23.1538.tp.20191126.1539.002.html

System resource allocation for variable data streams (易变数据流的系统资源配置方法)

WANG Chunkai (1,2), ZHUANG Fuzhen (2), SHI Zhongzhi (2)

(1. Post-doctoral Research Center, China Reinsurance (Group) Corporation, Beijing 100033, China; 2. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190, China)

Abstract: A large-scale data stream management system (LSDSMS) usually consists of a relational query system (RQS) at the upper layer and a stream processing system (SPS) at the lower layer. When users submit queries to the RQS, system parameters often have to be configured dynamically according to the rate and distribution of the data streams. However, because data streams are variable, frequently changing the parameter configuration degrades the performance of the LSDSMS. To address this problem, we propose OrientStream+, a framework that automates parameter configuration in the LSDSMS. First, we design a mini-batch data stream transmission mechanism whose batch interval is the user-defined query latency threshold. Then, we introduce a multi-level pipeline cache that processes data streams sharing the same configuration in batches, and we compute exact query results from the timestamps of the data streams. We also introduce an incremental learning model with outlier detection to improve the prediction accuracy of OrientStream+. Finally, we implement the framework on the open-source SPS Storm and conduct extensive experiments. The results show that OrientStream+ can further reduce processing latency and improve the throughput of the LSDSMS.

Keywords: large-scale data stream management system; variable data stream; incremental learning; model prediction; parameter configuration; mini-batch processing; system performance; outlier detection

CLC number: TP311   Document code: A   Article ID: 1673-4785(2019)06-1278-08

Citation: WANG Chunkai, ZHUANG Fuzhen, SHI Zhongzhi. System resource allocation for variable data streams[J]. CAAI transactions on intelligent systems, 2019, 14(6): 1278–1285.

At present, many applications require large-scale continuous queries and analysis, such as microblog analysis in social networks, high-frequency trading monitoring in finance, and real-time recommendation in e-commerce[1-3]. These applications often need to respond quickly to user-submitted queries, which requires the large-scale data stream management system's queries over data streams

Received: 2019-08-15. Published online: 2019-11-27.
Foundation items: National Natural Science Foundation of China (U1836206, 61773361); China Postdoctoral Science Foundation (2019M650044).
Corresponding author: WANG Chunkai. E-mail: chunkai_wang@163.com.
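The abstract states that OrientStream+ uses the user-defined query latency threshold as the batch interval, processes data streams that share a configuration together, and relies on timestamps to produce exact query results. The Python sketch below illustrates only that general idea under our own assumptions; it is not the authors' implementation. The names `Record`, `config_id`, `mini_batches`, and `process_in_timestamp_order` are hypothetical, batches are assumed to be aligned tumbling windows whose width equals the threshold, and a single dictionary level stands in for the paper's multi-level pipeline cache.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Record:
    ts_ms: int      # event timestamp in milliseconds
    config_id: str  # hypothetical label of the parameter configuration to apply
    value: float


# batch number -> configuration id -> records (hypothetical layout)
Batches = Dict[int, Dict[str, List[Record]]]


def mini_batches(records: List[Record], latency_threshold_ms: int) -> Batches:
    """Cut the stream into aligned windows [k*T, (k+1)*T) with T equal to the
    latency threshold, and inside each window group records by configuration."""
    if latency_threshold_ms <= 0:
        raise ValueError("latency threshold must be positive")
    grouped = defaultdict(lambda: defaultdict(list))
    for r in records:
        grouped[r.ts_ms // latency_threshold_ms][r.config_id].append(r)
    # Convert to plain dicts so later lookups cannot create empty entries.
    return {batch: dict(groups) for batch, groups in grouped.items()}


def process_in_timestamp_order(batches: Batches) -> List[Record]:
    """Run each configuration group of each batch, then restore timestamp order
    so the output does not depend on the order the groups were processed in."""
    output: List[Record] = []
    for batch in sorted(batches):
        processed: List[Record] = []
        for config_id in sorted(batches[batch]):
            # Placeholder operator: a real query would transform the records here.
            processed.extend(batches[batch][config_id])
        processed.sort(key=lambda r: r.ts_ms)
        output.extend(processed)
    return output


if __name__ == "__main__":
    stream = [Record(5, "A", 1.0), Record(120, "B", 2.0),
              Record(40, "B", 3.0), Record(210, "A", 4.0)]
    result = process_in_timestamp_order(mini_batches(stream, 100))
    print([r.ts_ms for r in result])  # prints [5, 40, 120, 210]
```

Sorting inside each batch lets the per-configuration groups run independently while the emitted sequence still follows event time; across batches, order is preserved because batch numbers grow with timestamps and batches are emitted in numeric order.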

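The abstract also mentions an incremental learning model with outlier detection but does not specify either technique in this excerpt. As a hedged stand-in, the sketch below pairs online linear regression trained by stochastic gradient descent with a one-sided z-score test on prediction errors. The class `OutlierGatedRegressor`, its parameters (`lr`, `z_max`, `warmup`), and the choice of predicting one metric from a single input feature are all assumptions made for illustration, not details taken from the paper.

```python
import math


class OutlierGatedRegressor:
    """Online linear model y ~ w * x + b trained by SGD. A sample whose absolute
    prediction error lies more than z_max standard deviations above the running
    mean error is treated as an outlier and is not used for training."""

    def __init__(self, lr: float = 0.1, z_max: float = 3.0, warmup: int = 10):
        self.w, self.b = 0.0, 0.0
        self.lr, self.z_max, self.warmup = lr, z_max, warmup
        # Welford running statistics of the absolute errors of accepted samples
        self.n, self.mean, self.m2 = 0, 0.0, 0.0

    def predict(self, x: float) -> float:
        return self.w * x + self.b

    def _is_outlier(self, err: float) -> bool:
        if self.n < max(self.warmup, 2):
            return False  # too little history to judge
        std = math.sqrt(self.m2 / (self.n - 1))
        return std > 0.0 and (err - self.mean) / std > self.z_max

    def update(self, x: float, y: float) -> bool:
        """Train on (x, y) unless it is an outlier; return True if it was used."""
        residual = y - self.predict(x)
        err = abs(residual)
        if self._is_outlier(err):
            return False
        # Welford update of the error statistics
        self.n += 1
        delta = err - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (err - self.mean)
        # One SGD step on the squared error 0.5 * residual**2
        self.w += self.lr * residual * x
        self.b += self.lr * residual
        return True


if __name__ == "__main__":
    model = OutlierGatedRegressor()
    for i in range(2000):
        x = (i % 10) / 10
        model.update(x, 2 * x + 1)  # noise-free toy relation y = 2x + 1
    print(model.update(0.5, 100.0))  # prints False: the spike is rejected
    print(round(model.predict(0.5), 3))  # prints 2.0
```

Gating updates this way keeps a single anomalous measurement from pulling the model off course. The trade-off is that a genuine, abrupt shift in the data stream is rejected as well, so a real system would need a separate mechanism (for example, re-admitting samples after persistent rejections) to adapt to it.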