2) Detail query: As shown in Fig. 18, detail queries over online transaction logs return results within seconds across hundreds of billions of records. This function is mainly used for real-time detail lookup: given multidimensional conditions such as time, system, and site, it quickly and accurately locates the required records in TB-scale log data, with query latency consistently on the order of seconds. This greatly eases the work of operations and maintenance staff, saving considerable time and improving troubleshooting efficiency.

Fig. 18 Query details

6 Conclusion

This paper studied the application of SQL-on-Hadoop technology to network log analysis. We selected three of the most representative SQL query engines (Hive, Impala, and Spark SQL) and used the TPC-H benchmark to test and evaluate their decision-support capabilities. We then built a network log analysis platform for the securities industry that provides trillion-record log storage and an efficient, flexible query system, supporting applications for the centralized analysis and management of massive logs. SQL-on-Hadoop systems still have several open problems; in particular, improving query processing efficiency under limited resources and under specific data distributions requires further research.
