CSC 2018 Data Technologies Exercises
Andreas-Joachim Peters, CERN IT-ST
Exercises Link
Exercises Overview
1. IO system (1st hour)
• What do you know already?
• IOPS, bandwidth, latency & blocksize
• media and their characteristics
• cache & IO optimisation strategies
• how to debug IO problems
2. Redundancy Technology (2nd hour)
• Parity for RAID technology
3. Cloud Storage Technology (3rd + 4th hour)
• Scalability, Hashing, Indexing, Deduplication
Tutorial - Exercise 1
• A common user experience: “My IO intensive application does not run fast enough - why?”
• Three important questions to answer
  • what performance should we expect of our IO system?
  • how can we measure limitations?
  • how can we inspect the IO of our application?
• To answer this, we need a basic understanding of the IO system and some measurement and debugging tools
Interlude before we start …
Let’s see what you already know …
Please open this anonymous online poll with your phone or laptop: http://etc.ch/WAb5
We will repeat this poll at the end of the exercises and discuss the correct answers!
Linux IO System
[Figure: the Linux IO stack for local IO in non-virtual machines, annotated with measurement tools]
User space: application read/write, GLIBC
Kernel:
• System-call Interface (SCI)
• Virtual Filesystem Switch (VFS): the metadata cache is implemented here (not important for the exercises)
• Filesystems (XFS, EXT4, FS(x)): the data cache is implemented here (important for the exercises)
• Block Layer
• Device Drivers
Measurement Tools: strace (system calls), vmstat, iostat
Skip caching using direct IO
Linux Performance Tools
[Figure: map of Linux observability tools, static performance tools, and perf-tools/bcc tracing tools against the layers they inspect: Applications, System Libraries, System Call Interface, VFS, Sockets, Scheduler, Virtual Memory, Block Device Interface, Ethernet, Device Drivers, Firmware, and the hardware below (CPU, DRAM, I/O Bridge, I/O Controller, Network Controller, Disk, Port, Fan, Power Supply). Legible tool names include top, htop, ps, pidstat, latencytop, powertop, schedtool and tiptop.]
Local Access / Remote Access
IO Performance
[Figure: timeline in which latency is the time between the start and the end of an IO]
• Bandwidth = IO volume / time
• IOPS = IO operations / time
• Latency = time / IO operation
• Blocksize = payload / operation
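As a minimal sketch of how the four quantities relate, the snippet below derives them from a single measurement. The function name `io_metrics` and the sample numbers are illustrative assumptions, not part of the exercises, and the latency formula is taken as "time / IO operation", which only holds when operations are issued one after another.

```python
def io_metrics(total_bytes, operations, elapsed_seconds):
    """Derive the four IO metrics from one measurement run."""
    return {
        "bandwidth_bytes_per_s": total_bytes / elapsed_seconds,  # IO volume / time
        "iops": operations / elapsed_seconds,                    # IO operations / time
        "latency_s": elapsed_seconds / operations,               # time / IO operation
        "blocksize_bytes": total_bytes / operations,             # payload / operation
    }


# Hypothetical run: 1000 operations of 4096 bytes each, taking 2 seconds.
m = io_metrics(total_bytes=4096 * 1000, operations=1000, elapsed_seconds=2.0)
print(m)
```

Note that the definitions imply Bandwidth = IOPS × Blocksize, so for a fixed IOPS limit a larger blocksize gives more bandwidth.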
IO Type Categories
• Local: local storage device
• Remote: remote storage device; performance baseline often given by network
• Sequential: forward reading
• Random: seek + read
• sync / async
Examples: end-user analysis, big data analysis, video streaming, selective data analysis, bulk data analysis
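To make the difference between sequential (forward reading) and random (seek + read) access concrete, here is a small sketch that reads the same temporary file both ways. The helper names, the 4096-byte block size, and the file size are assumptions chosen for illustration; on a small cached file both variants will be fast, so this shows the access pattern rather than a realistic timing.

```python
import os
import random
import tempfile

BLOCK = 4096  # assumed block size for this illustration


def read_sequential(path, block=BLOCK):
    """Forward reading: consume the file block by block from start to end."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            total += len(chunk)
    return total


def read_random(path, block=BLOCK, seed=0):
    """Seek + read: visit every block exactly once, in shuffled order."""
    size = os.path.getsize(path)
    offsets = list(range(0, size, block))
    random.Random(seed).shuffle(offsets)
    total = 0
    with open(path, "rb") as f:
        for offset in offsets:
            f.seek(offset)
            total += len(f.read(block))
    return total


# Create a small scratch file (64 blocks), read it both ways, then clean up.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(BLOCK * 64))
    path = tmp.name
try:
    seq_bytes = read_sequential(path)
    rnd_bytes = read_random(path)
finally:
    os.remove(path)

print(seq_bytes, rnd_bytes)
```

Both functions transfer the same payload; only the order of the offsets differs, which is what makes random access expensive on media with high seek latency.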
Storage Media Characteristics

                              Tape          Disk     SSD       Memory
Streaming Bandwidth (MB/s)    250           200      1'000     20'000
Random IOPS (ops/s)           (not shown)   100      100'000   2'000'000
Latency (sec)                 6E+01         1E-02    1E-05     5E-07

Network Latency (RTT ms)
CERN LAN              0.5
CERN - Our Hostel     70
CERN - US             100
CERN - Australia      300

Disclaimer:
• numbers are indicative for enterprise devices
• not always symmetric for RO, WO, RW
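The indicative numbers can be turned into a back-of-the-envelope estimate. The sketch below assumes a deliberately simple model (each access pays the device latency once, then the data streams at full bandwidth) and assumes the bar values on the slide map to the media as Tape 250, Disk 200, SSD 1'000, Memory 20'000 MB/s; both the model and the name `read_time_s` are illustrative, not part of the exercises.

```python
# Indicative values read off the slide (the bar-to-medium mapping is an assumption).
MEDIA = {
    "Tape":   {"bandwidth_mb_s": 250,   "latency_s": 6e1},
    "Disk":   {"bandwidth_mb_s": 200,   "latency_s": 1e-2},
    "SSD":    {"bandwidth_mb_s": 1000,  "latency_s": 1e-5},
    "Memory": {"bandwidth_mb_s": 20000, "latency_s": 5e-7},
}


def read_time_s(medium, size_mb, accesses=1):
    """Rough model: every access pays the latency, the payload streams at full bandwidth."""
    m = MEDIA[medium]
    return accesses * m["latency_s"] + size_mb / m["bandwidth_mb_s"]


# Reading 1000 MB in one piece versus in 1000 separate accesses.
for name in MEDIA:
    one_piece = read_time_s(name, size_mb=1000)
    many_pieces = read_time_s(name, size_mb=1000, accesses=1000)
    print(f"{name:7s} {one_piece:12.4f} s {many_pieces:12.4f} s")
```

Under this model a disk needs about 5 seconds for one streaming read but about 15 seconds when the same volume is split into 1000 accesses, while memory barely notices the difference.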
Useful Linux Commands
• time: Realtime, CPU, System Time Measurement
• dd: Copy/Block IO tool
  • INPUT: if=<> (e.g. /dev/zero)
  • OUTPUT: of=<> (e.g. /dev/null)
  • Block size: bs=
  • Block count: count=
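For readers who want to see what a timed `dd` run does under the hood, the following is a rough Python analogue of `time dd if=/dev/zero of=/dev/null bs=4096 count=256`. It is a sketch under stated assumptions: the function name `dd_like` is hypothetical, it requires a Unix system providing /dev/zero and /dev/null, and it reports wall-clock time only, not the CPU and system time that `time` also prints.

```python
import time


def dd_like(src="/dev/zero", dst="/dev/null", bs=1024 * 1024, count=100):
    """Copy up to `count` blocks of `bs` bytes from src to dst.

    Returns (bytes_copied, elapsed_seconds).
    """
    copied = 0
    start = time.perf_counter()
    # buffering=0 disables Python's own buffering so each read/write is one system call.
    with open(src, "rb", buffering=0) as fin, open(dst, "wb", buffering=0) as fout:
        for _ in range(count):
            block = fin.read(bs)
            if not block:  # end of input
                break
            fout.write(block)
            copied += len(block)
    return copied, time.perf_counter() - start


copied, seconds = dd_like(bs=4096, count=256)
print(f"{copied} bytes copied in {seconds:.6f} s")
```

Because neither /dev/zero nor /dev/null touches a storage device, this measures the overhead of the system-call path itself, which makes it a useful upper bound before pointing `src` or `dst` at a real file.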