Structuring data for efficient I/O
Sébastien Ponce
sebastien.ponce@cern.ch
CERN
Thematic CERN School of Computing 2015
Overall Course Structure
- Structuring Data for efficient I/O
  - Data formats, data compression
  - Data addressing
- Many ways to Store Data
  - Storage devices and their specificities
  - Distributing and parallelizing storage
- Preserving data
  - Data consistency
  - Data safety
- Key ingredients to achieve efficient I/O
  - Synchronous vs asynchronous I/O
  - I/O optimizations and caching
Outline
1. Data format
   - Row vs Column
2. Compressing data
   - Compression algorithms
   - Efficiency and use cases
3. Data addressing
   - Hierarchical namespaces
   - Limitations
   - Flat namespaces
4. Stateful interfaces
   - POSIX
   - Limitations
   - Stateless interfaces
5. Conclusion
1. Data format (Row vs Column)
Data structure by example - scenario
Scenario
- You are measuring temperatures within a piece of the detector
- You have 10K sensors, and you take one measurement every minute
- After a month (30 days), you have 432M measurements
- That is about 1.7 GB (1.6 GiB) if you store them as single-precision floats (32 bits)
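The short Python sketch below checks the scenario's arithmetic. It assumes a 30-day month, which is what 432M = 10K × 1440 × 30 implies. The constant and function names are hypothetical. The 1.6 figure corresponds to binary gibibytes (2^30 bytes). In decimal gigabytes, the same volume is about 1.73 GB.

```python
# Hypothetical parameters taken from the scenario; a "month" is assumed to be 30 days.
SENSORS = 10_000
MEASURES_PER_DAY = 24 * 60      # one measurement per minute
DAYS = 30
BYTES_PER_FLOAT = 4             # single-precision float (32 bits)


def dataset_size(sensors=SENSORS, days=DAYS, bytes_per_value=BYTES_PER_FLOAT):
    """Return (number of measurements, raw size in bytes) for the scenario."""
    measures = sensors * MEASURES_PER_DAY * days
    return measures, measures * bytes_per_value


if __name__ == "__main__":
    n, size = dataset_size()
    print(f"{n:,} measurements")                        # 432,000,000 measurements
    print(f"{size / 1e9:.2f} GB = {size / 2**30:.2f} GiB")  # 1.73 GB = 1.61 GiB
```

These numbers cover only the raw float values. Any timestamps or per-file metadata would add to them.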
Data structure by example - row storage
Naive structure
- You arrange your sensors in a sequential order according to the detector geometry
- Each minute, you create a new “row” of data: 10K floats holding the temperatures given by the sensors, in that order

Time (min)  Sensor 1  Sensor 2  ...  Sensor c
0           a0        b0        ...  z0
1           a1        b1        ...  z1
...         ...       ...       ...  ...
n           an        bn        ...  zn

File content: a0 b0 ... z0 a1 b1 ... z1 ... an bn ... zn
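The Python sketch below makes the row layout concrete. It writes one row of single-precision floats per minute, in sensor order. As a result, the value of sensor s at minute t sits at byte offset (t × c + s) × 4, where c is the number of sensors. The function names, the 0-based indices and the little-endian byte order are illustrative assumptions, not part of the original design. An in-memory buffer stands in for the file.

```python
import io
import struct

FLOAT_SIZE = 4  # single-precision float, as in the scenario


def row_offset(t, s, n_sensors):
    """Byte offset of sensor s (0-based) at minute t (0-based) in a row-ordered file."""
    return (t * n_sensors + s) * FLOAT_SIZE


def write_rows(f, rows):
    """Append one row per minute, sensors in fixed order (a0 b0 ... z0 a1 b1 ...).

    Little-endian byte order is an assumption of this sketch.
    """
    for row in rows:
        f.write(struct.pack(f"<{len(row)}f", *row))


def read_value(f, t, s, n_sensors):
    """One measurement: a single seek plus a single 4-byte read."""
    f.seek(row_offset(t, s, n_sensors))
    (value,) = struct.unpack("<f", f.read(FLOAT_SIZE))
    return value


def read_row(f, t, n_sensors):
    """A whole minute is one contiguous block of n_sensors * 4 bytes."""
    f.seek(row_offset(t, 0, n_sensors))
    return list(struct.unpack(f"<{n_sensors}f", f.read(n_sensors * FLOAT_SIZE)))


if __name__ == "__main__":
    buf = io.BytesIO()  # stands in for the data file
    # Hypothetical data: 3 sensors, 2 minutes (values exactly representable in float32)
    write_rows(buf, [[20.5, 21.0, 19.25], [20.75, 21.5, 19.5]])
    print(read_row(buf, 0, 3))       # [20.5, 21.0, 19.25]
    print(read_value(buf, 1, 2, 3))  # 19.5
```

With this layout, reading a full minute is a single contiguous read. By contrast, one sensor's complete history is scattered through the file, with consecutive values c × 4 bytes apart.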