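Chapter 6: Data-Level Parallelism in Vector, SIMD, and GPU Architectures
• SIMD architectures
  – Vector architectures
  – Multimedia SIMD instruction set extensions
  – Graphics processing units
• Vector architectures
• GPU
2021/2/1 Computer Architecture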
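Problems with Traditional Instruction-Level Parallelism Techniques
Main drawbacks of the traditional ways of exploiting ILP:
• Raising the pipeline clock frequency: a higher clock frequency sometimes causes CPI to rise as well (branches, other hazards)
• Instruction prefetch and decode: it is sometimes hard to prefetch and decode multiple instructions in every clock cycle
• Raising the cache hit rate: some compute-intensive applications (scientific computing) need large amounts of data with poor locality, and some programs process continuous media streams (multimedia), whose locality is also poor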
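Introduction
• SIMD architectures can exploit data-level parallelism effectively:
  – Matrix-oriented scientific computing
  – Image and sound processing
• SIMD is more energy-efficient than MIMD
  – Only one instruction fetch is needed per group of data operations
• SIMD is more attractive for PMDs (personal mobile devices)
• SIMD lets programmers continue to think sequentially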
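SIMD Parallelism
• Vector architectures
• Multimedia SIMD instruction set extensions
• Graphics Processor Units (GPUs)
• For x86 processors:
  – Two cores per chip are added every two years
  – SIMD width doubles every four years
  – The potential speedup from SIMD is twice that from MIMD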
[Figure: curves for MIMD, SIMD (32 b), SIMD (64 b), MIMD*SIMD (32 b), and MIMD*SIMD (64 b); horizontal axis 2003 to 2023, vertical axis ticks at 10, 100, and 1000.]
Figure 4.1 Potential speedup via parallelism from MIMD, SIMD, and both MIMD and SIMD over time for x86 computers. This figure assumes that two cores per chip for MIMD will be added every two years and the number of operations for SIMD will double every four years.
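The growth assumptions stated in the Figure 4.1 caption can be turned into a small calculation. The following is a minimal sketch, not a reproduction of the plotted curves: the function name `projected_speedup` and the 2003 baseline values (`base_cores=2`, `base_simd_ops=4`) are illustrative assumptions, not numbers read from the figure.

```python
def projected_speedup(year, base_year=2003, base_cores=2, base_simd_ops=4):
    """Return (MIMD, SIMD, MIMD*SIMD) potential speedup for a given year.

    Growth rules follow the figure caption; the baseline values are
    hypothetical and chosen only for illustration.
    """
    elapsed = year - base_year
    # MIMD: two more cores per chip every two years.
    cores = base_cores + 2 * (elapsed // 2)
    # SIMD: operations per instruction double every four years.
    simd_ops = base_simd_ops * 2 ** (elapsed // 4)
    return cores, simd_ops, cores * simd_ops


if __name__ == "__main__":
    for year in range(2003, 2024, 4):
        mimd, simd, both = projected_speedup(year)
        print(year, mimd, simd, both)
```

Under these rules MIMD grows linearly while SIMD grows exponentially, which is why the combined curve pulls away from the MIMD-only curve over time.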
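Supercomputers
• Definitions of a supercomputer:
  – The fastest machine in the world for a given task
  – Any machine costing more than $30 million
  – A machine capable of a trillion operations per second
  – A machine designed by Seymour Cray
• The CDC 6600 (Cray, 1964) is regarded as the first supercomputer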
CDC 6600
Seymour Cray, 1963
• A fast pipelined machine with 60-bit words
  – 128 Kword main memory capacity, 32 banks
• Ten functional units (parallel, unpipelined)
  – Floating point: adder, 2 multipliers, divider
  – Integer: adder, 2 incrementers, ...
• Hardwired control (no microcoding)
• Scoreboard for dynamic scheduling of instructions
• Ten Peripheral Processors for input/output
  – A fast multi-threaded 12-bit integer ALU
• Very fast clock, 10 MHz (FP add in 4 clocks)
• >400,000 transistors, 750 sq. ft., 5 tons, 150 kW, novel freon-based technology for cooling
• Fastest machine in the world for 5 years (until the 7600)
• Over 100 sold ($7-10M each)
IBM Memo on CDC 6600
Thomas Watson Jr., IBM CEO, August 1963:
“Last week, Control Data ... announced the 6600 system. I understand that in the laboratory developing the system there are only 34 people including the janitor. Of these, 14 are engineers and 4 are programmers... Contrasting this modest effort with our vast development activities, I fail to understand why we have lost our industry leadership position by letting someone else offer the world's most powerful computer.”
To which Cray replied: “It seems like Mr. Watson has answered his own question.”
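Supercomputer Applications
• Typical application areas
  – Military research (nuclear weapons development, cryptography)
  – Scientific research
  – Weather forecasting
  – Oil exploration
  – Industrial design (car crash simulation)
  – Bioinformatics
  – Cryptography
• All involve processing large data sets
• In the 1970s and 1980s, supercomputer = vector machine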
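Alternative Model: Vector Processing
• A vector processor provides higher-level operations: a single vector instruction can operate on N operands or N pairs of operands (the operands are vectors)
• SCALAR (1 operation): add r3, r1, r2 computes r3 = r1 + r2
• VECTOR (N operations): add.vv v3, v1, v2 computes v3 = v1 + v2 element by element over the vector length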
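The contrast between the scalar and the vector add can be illustrated with a small software model. This is only a sketch under simplifying assumptions: the helper names `add_scalar_loop` and `add_vector` are hypothetical, only add instructions are counted (a real scalar loop would also issue loads, stores, and a branch), and the vector is assumed to fit in one vector register.

```python
def add_scalar_loop(v1, v2):
    """Model N scalar adds: one 'add r3, r1, r2' issued per element pair."""
    if len(v1) != len(v2):
        raise ValueError("operand vectors must have the same length")
    result = []
    adds_issued = 0
    for a, b in zip(v1, v2):
        result.append(a + b)  # one scalar add instruction
        adds_issued += 1
    return result, adds_issued


def add_vector(v1, v2):
    """Model one 'add.vv v3, v1, v2': a single instruction covers all N pairs."""
    if len(v1) != len(v2):
        raise ValueError("operand vectors must have the same length")
    # The hardware performs N element-wise additions for this one instruction.
    return [a + b for a, b in zip(v1, v2)], 1


if __name__ == "__main__":
    x = [1, 2, 3, 4]
    y = [10, 20, 30, 40]
    print(add_scalar_loop(x, y))
    print(add_vector(x, y))
```

Both functions produce the same result vector; the difference is that the vector version issues one add instruction rather than N.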