Network New Media Technology (网络新媒体技术), 2012

Figure 7. Visualization of the random forest model

... will have a far-reaching impact on development strategy. This paper has introduced some basic concepts and characteristics of big data and the scientific problems it faces, summarized some preliminary work from the Chinese Academy of Sciences Strategic Priority Research Program project "Research on Key Technologies and System Development of the Sea-Cloud Data System", and given an outlook on future research directions.

About the authors

Huang Zhexue (黄哲学, Joshua Zhexue Huang), male, Ph.D., is a research professor at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. His research focuses on data mining and machine learning.

Cao Fuyuan (曹付元), male, Ph.D., is a postdoctoral researcher at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. His research focuses on data mining and machine learning.

Li Junjie (李俊杰, Mark Junjie Li), male, Ph.D., is an assistant research professor at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. His research focuses on data mining and machine learning.

Chen Xiaojun (陈小军), male, Ph.D., is an assistant research professor at the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. His research focuses on data mining and machine learning.