
Chinese Academy of Sciences: Thematic CERN School of Computing, "T-CSC Data Storage" course teaching resources (lecture slides): Data storage and preservation-pres

Resource category: library; document format: PDF; pages: 102; file size: 559.67 KB

Data storage and preservation
Sébastien Ponce (sebastien.ponce@cern.ch), CERN
Thematic CERN School of Computing 2019

Outline

1 Storage devices
    Existing devices
2 Parallelizing files' storage
    Striping
    Introduction to Map/Reduce
3 Risks of data loss and corruption
4 Data consistency
    Checksums
    Practical usage
5 Data safety
    Redundancy
    Parity
    Erasure coding
6 Conclusion

1 Storage devices
    Existing devices

A variety of storage devices

Main differences
- Capacities from 1 GB to 10 TB per unit
- Prices from 1 to 300 for the same capacity
- Very different reliability
- Very different speeds

Typical numbers in 2019

Device  Capacity per unit  Latency  $/TB  Speed     Reliability
RAM     16 GB              10 ns    7000  10 GB/s   volatile
SSD     500 GB             10 µs    200   1 GB/s    poor
HD      6 TB               3 ms     25    150 MB/s  average
Tape    20 TB              100 s    20    500 MB/s  good
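Two consequences of the 2019 figures above can be checked with simple arithmetic: the price spread between the most and least expensive device, and the time needed to read one full unit at its nominal speed. The following is a minimal sketch, assuming decimal units (1 GB = 10^9 bytes) and sustained nominal speed with latency ignored; the names `DEVICES`, `price_ratio` and `full_read_hours` are illustrative and do not come from the slides.

```python
# Figures copied from the "Typical numbers in 2019" table.
# Each entry: (capacity in bytes, price in $/TB, speed in bytes/s).
# Assumption: decimal units, 1 GB = 10**9 bytes.
DEVICES = {
    "RAM":  (16 * 10**9,  7000, 10 * 10**9),
    "SSD":  (500 * 10**9, 200,  1 * 10**9),
    "HD":   (6 * 10**12,  25,   150 * 10**6),
    "Tape": (20 * 10**12, 20,   500 * 10**6),
}


def price_ratio(expensive, cheap):
    """Ratio of the $/TB price of two devices."""
    return DEVICES[expensive][1] / DEVICES[cheap][1]


def full_read_hours(name):
    """Hours needed to read one full unit at nominal speed (latency ignored)."""
    capacity, _price, speed = DEVICES[name]
    return capacity / speed / 3600


for name in DEVICES:
    print(f"{name:5s} full read: {full_read_hours(name):8.4f} h")
print("RAM vs Tape price ratio:", price_ratio("RAM", "Tape"))
```

Under these assumptions the RAM to tape price ratio comes out at 350, the same order as the "1 to 300" spread quoted above, and reading a full hard disk or a full tape takes roughly 11 hours in both cases.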

A variety of storage devices

You cannot have everything
[Figure: trade-off between cheap, reliability and speed, with RAM, SSD, HD and Tape positioned between the three]

Reliability in real world (CERN)

For disks
- probability of losing a disk per year: few %, up to 10%
    - with 60K disks, it's around 10 per day
    - and all files are lost
- one unrecoverable bit error in 10^14 bits read/written
    - for 10 GB files, that's one file corrupted per 1000 files written

For tapes
- probability of losing a tape per year: 10^-4
    - and you recover most of the data on it
    - net result is 10^-7 file loss per year
- one unrecoverable bit error in 10^19 bits read/written
    - for 10 GB files, that's one file corrupted per 100M files written
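The orders of magnitude quoted for disks and tapes can be reproduced with a few lines of arithmetic. This is a minimal sketch, assuming 1 GB = 10^9 bytes and taking 5% and 10% as sample annual disk failure rates within the quoted "few %, up to 10%" range; the function names are illustrative and not part of the course material.

```python
def disk_failures_per_day(n_disks, annual_failure_rate):
    """Expected number of disks lost per day in a fleet of n_disks."""
    return n_disks * annual_failure_rate / 365


def files_per_bit_error(file_size_bytes, bits_per_error):
    """Average number of files of a given size transferred per unrecoverable bit error."""
    return bits_per_error / (file_size_bytes * 8)


TEN_GB = 10 * 10**9  # assumption: decimal gigabytes

# Disk fleet of 60K units, at two sample failure rates (assumed values).
for rate in (0.05, 0.10):
    print(f"{rate:.0%} per year ->", round(disk_failures_per_day(60_000, rate), 1), "disks/day")

# One bit error per 10^14 bits (disk) and per 10^19 bits (tape), for 10 GB files.
print("disk:", files_per_bit_error(TEN_GB, 10**14), "files per bit error")
print("tape:", files_per_bit_error(TEN_GB, 10**19), "files per bit error")
```

With these inputs the disk loss rate lands between about 8 and 16 per day, consistent with "around 10 per day", and the bit error rates give about 1250 files per error on disk and 1.25 x 10^8 on tape, matching the "one per 1000" and "one per 100M" figures.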

2 Parallelizing files' storage
    Striping
    Introduction to Map/Reduce

Why parallelize storage?

To work around limitations:
- individual device speed (think disk)
    - a file is typically stored on a single device
- network cards' speed
    - 1 Gbit networks are still present
    - network congestion on a node reduces bandwidth per stream
- core network throughput
    - switches / routers are expensive
    - machines may have less throughput than their card(s) allow(s)
- hot data congestion
    - and the black hole it can generate, as slower transfers allow more transfers to accumulate
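The first two limitations in the list above can be illustrated with a deliberately simple bottleneck model: one stream gets the slower of the disk speed and its share of the node's network card. This is a rough sketch under stated assumptions, not a description of any real storage system: the network card is assumed to be shared equally between concurrent streams, the 150 MB/s disk speed is the 2019 hard disk figure from earlier in the slides, and `stream_rate` and `transfer_seconds` are hypothetical helper names.

```python
def stream_rate(disk_rate, nic_rate, streams_on_node=1):
    """Bytes/s seen by one stream: the slower of the disk and an equal share of the NIC."""
    return min(disk_rate, nic_rate / streams_on_node)


def transfer_seconds(size_bytes, rate):
    """Time to move size_bytes at a constant rate in bytes/s."""
    return size_bytes / rate


HD_RATE = 150e6       # bytes/s, hard disk speed quoted in the slides
NIC_1GBIT = 1e9 / 8   # bytes/s, a 1 Gbit network card
FILE_SIZE = 10e9      # bytes, a 10 GB file (decimal units assumed)

# A single stream from one node: the 1 Gbit card, not the disk, is the limit.
alone = transfer_seconds(FILE_SIZE, stream_rate(HD_RATE, NIC_1GBIT))

# Four concurrent streams on the same node: each gets a quarter of the card.
congested = transfer_seconds(FILE_SIZE, stream_rate(HD_RATE, NIC_1GBIT, 4))

# The same file split in four equal parts read in parallel from four nodes,
# assuming the reader itself is not a bottleneck.
parallel = transfer_seconds(FILE_SIZE / 4, stream_rate(HD_RATE, NIC_1GBIT))

print(alone, congested, parallel)
```

In this model the single stream takes 80 s, the congested one 320 s, and the four-way parallel read 20 s, which is the kind of gain that motivates spreading a file over several devices and nodes.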
