
Chinese Academy of Sciences: CERN Thematic School of Computing, "T-CSC Data Storage" course teaching resources (lecture notes), Practical vectorization booklet

Resource type: library document; format: PDF; pages: 50; file size: 1.18 MB

Practical vectorization

Sébastien Ponce
sebastien.ponce@cern.ch
CERN
Thematic CERN School of Computing 2022

Outline

1 Introduction
2 Measuring vectorization
3 Vectorization Prerequisite
4 Vectorizing techniques in C++
   - Autovectorization
   - Inline assembly
   - Intrinsics
   - Compiler extensions
   - Libraries
5 What to expect?

1 Introduction

Goal of this course

- Make the theory explained by Andrzej concerning SIMD and vectorization more concrete
- Detail the impact of vectorization
   - on your code
   - on your data model
   - on actual C++ code
- Give an idea of what to expect from vectorized code

SIMD - Single Instruction Multiple Data

Concept
- Run the same operation in parallel on multiple data
- The operation is as fast as in the single-data case
- The data live in a "vector"

Practically
A + B = R, applied to all elements at once:

   A1     B1     R1
   A2  +  B2  =  R2
   A3     B3     R3
   A4     B4     R4

Promises of vectorization

Theoretical gains
- Computation speedup corresponding to the vector width
- Note that it depends on the type of data
   - float vs double
   - shorts vs ints

Various units for various vector widths

Name             Arch  nb bits   nb floats/ints  nb doubles/longs
SSE [1] 4        X86   128       4               2
AVX [2]          X86   256       8               4
AVX [2] 2 (FMA)  X86   256       8               4
AVX [2] 512      X86   512       16              8
SVE [3]          ARM   128-2048  4-64            2-32

[1] Streaming SIMD Extensions
[2] Advanced Vector eXtensions
[3] Scalable Vector Extension

How to know what you can use

Manually
- Look for sse, avx, etc. in your processor flags

    lscpu | egrep 'mmx|sse|avx'
    Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm cpuid_fault epb pti ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts

Situation for Intel processors

[Figure: instruction sets supported by successive Intel processor generations: Nehalem (2009), Westmere (2010), Sandy Bridge (2012), Ivy Bridge (2013), Haswell (2014), Broadwell (2015), Knights Corner (2012), Knights Landing (2016) and Skylake (2017). Vector width grows from 128-bit (SSE*) through 256-bit (AVX, AVX2) to 512-bit (IMCI, AVX-512F/CD/ER/PF/BW/DQ/VL).]

2 Measuring vectorization

Am I using vector registers?

Yes, you are
- Vector registers are also used for scalar operations
- Remember Andrzej's picture [figure: a vector register with a "Used" part and a "Wasted" part]

Am I efficiently using vector registers?
- Here we have to look at the generated assembly code
- Looking for specific instructions
- Or for the use of specific names of registers
