Part I: Parallel Computer System Architectures
NHPCC (Hefei), USTC, CHINA. glchen@ustc.edu.cn

PVP: Parallel Vector Processors
+ MIMD, UMA, large grain.
+ A small number of powerful custom-designed Vector Processors (VP): ≥ 1 Gflop/s.
+ A custom-designed high-bandwidth crossbar switch.
+ A number of shared-memory (SM) modules.
+ A large number of vector registers and an instruction buffer, normally without caches.
+ Examples: Cray C-90/T-90, NEC SX-4, Galaxy-1, etc.
+ Typical structure:

      VP      VP     ...     VP
       |       |              |
    [       Crossbar Switch       ]
       |       |              |
      SM      SM     ...     SM

SMP: Symmetric Multiprocessors
+ MIMD, UMA, medium grain, higher DOP (Degree of Parallelism).
+ Commodity microprocessors with on-chip and off-chip caches.
+ A high-speed snoopy bus or crossbar switch.
+ Central shared memory.
+ Symmetric: each processor has equal access to SM (Shared Memory), I/O, and OS services.
+ Limited scalability, because the central shared memory and the bus become bottlenecks.
+ Examples: SGI Power Challenge, DEC AlphaServer 8400, Dawning-1, etc.
+ Typical structure:

     P/C     P/C     ...     P/C
       |       |              |
    [   Bus or Crossbar Switch    ]
       |       |              |
      SM      SM             I/O
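
The "equal access to shared memory" property can be illustrated with a minimal sketch. It assumes Python threads stand in for processors and a plain dictionary stands in for the central shared memory; the function name `smp_sum` and its parameters are hypothetical, chosen only for this illustration.

```python
import threading

def smp_sum(values, num_processors=4):
    """Sum `values` with workers that all address one shared memory."""
    shared = {"total": 0}      # stands in for the central shared memory (SM)
    lock = threading.Lock()    # serializes updates, loosely like bus arbitration

    def worker(rank):
        # Symmetric access: every worker can read any element of `values`.
        partial = sum(values[rank::num_processors])
        with lock:
            shared["total"] += partial

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(num_processors)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["total"]
```

Every worker funnels its update through the same lock, which mirrors in miniature why a single shared memory and bus limit how far an SMP can scale.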

Comparison of Five Commercial SMP Systems

| System characteristics | DEC AlphaServer 8400 5/440 | HP 9000/T600 | IBM RS/6000 R40 | Sun Ultra Enterprise 6000 | SGI Power Challenge XL |
|---|---|---|---|---|---|
| No. of processors | 12 | 12 | [?] | 30 | 36 |
| Processor type | 437 MHz Alpha 21164 | 180 MHz PA 8000 | 112 MHz PowerPC 604 | 167 MHz UltraSPARC | 195 MHz MIPS R10000 |
| Off-chip cache per processor | 4 MB | [?] | [?] | 512 KB | 4 MB |
| Max memory | 28 GB | [?] | 2 GB | 30 GB | 16 GB |
| Interconnect | Bus | Bus | Bus + Xbar | Bus + Xbar | Bus |
| Interconnect bandwidth | 2.1 GB/s | 960 MB/s | 1.8 GB/s | 2.6 GB/s | 1.2 GB/s |
| Internal disk | 192 GB | 168 GB | 38 GB | 63 GB | 144 GB |
| I/O channels | 12 PCI buses, each 133 MB/s | N/A | 2 MCA, each 160 MB/s | 30 Sbus, each 200 MB/s | 6 Power Channel-2 HIO, each 320 MB/s |
| I/O slots | 144 PCI slots | 112 HP-PB slots | 15 MCA slots | 45 Sbus slots | 12 HIO slots |
| I/O bandwidth | 1.2 GB/s | 1 GB/s | 320 MB/s | 2.6 GB/s | 320 MB/s per HIO slot |

Note: [?] marks a value that is not legible in the source. Rows containing [?] have only four legible values, so their column placement is a best reading. In the off-chip cache row the surviving values are 4 MB, 1 MB, 512 KB, 4 MB; the 1 MB value belongs to either the HP or the IBM column.

MPP: Massively Parallel Processors
+ MIMD, NUMA, medium/large grain.
+ A large number of commodity microprocessors.
+ A custom-designed high-bandwidth, low-latency communication network.
+ Physically distributed memory, which may or may not be logically shared.
+ May or may not have local disk.
+ Synchronized through blocking message-passing operations.
+ Examples: Intel Paragon, IBM SP2, Dawning-1000, etc.
+ Typical structure: each node holds a P/C and an LM (local memory) on an MB (memory bus), and attaches through an NIC to the custom-designed network.

    Node:  P/C -- MB -- LM          Node:  P/C -- MB -- LM
                  |                               |
                 NIC                             NIC
                  |                               |
    [              Custom-designed Network               ]
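
Synchronization through blocking message passing can be sketched as follows. This is a simulation under stated assumptions, not an MPP runtime: threads stand in for nodes, each `Node` keeps its own private `local_memory`, and a one-slot `queue.Queue` stands in for the network link. The names `Node`, `send`, `recv`, and `chain_sum` are hypothetical.

```python
import queue
import threading

class Node:
    """One simulated node: private local memory plus a one-slot inbox."""
    def __init__(self, rank, value):
        self.rank = rank
        self.local_memory = {"x": value}     # not visible to other nodes
        self.inbox = queue.Queue(maxsize=1)  # stands in for the NIC buffer

def send(dest, message):
    dest.inbox.put(message)      # blocks while the destination buffer is full

def recv(node):
    return node.inbox.get()      # blocks until a message has arrived

def chain_sum(values):
    """Pass a running sum from node 0 to the last node, one message per hop."""
    nodes = [Node(rank, v) for rank, v in enumerate(values)]
    result = {}

    def run(node):
        acc = node.local_memory["x"]
        if node.rank > 0:
            acc += recv(node)    # waits for the predecessor: this is the sync
        if node.rank == len(nodes) - 1:
            result["sum"] = acc
        else:
            send(nodes[node.rank + 1], acc)

    threads = [threading.Thread(target=run, args=(n,)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return result["sum"]
```

No node reads another node's memory; the only ordering between nodes comes from the blocking `recv`, which is the point the slide makes.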

Comparison of Three MPP Systems

| MPP models | Intel/Sandia ASCI Option Red | IBM SP2 | SGI/Cray Origin2000 |
|---|---|---|---|
| A large sample configuration | 9072 processors, 1.8 Tflop/s at SNL | 400 processors, 100 Gflop/s at MHPCC | 128 processors, 51 Gflop/s at NCSA |
| Available date | December 1996 | September 1994 | October 1996 |
| Processor type | 200 MHz, 200 Mflop/s Pentium Pro | 67 MHz, 267 Mflop/s POWER2 | 200 MHz, 400 Mflop/s MIPS R10000 |
| Node architecture and data storage | 2 processors, 32 to 256 MB of memory, shared disk | 1 processor, 64 MB to 2 GB local memory, 1-4.5 GB local disk | 2 processors, 64 MB to 256 GB of DSM and shared disk |
| Interconnect and memory model | Split 2D mesh, NORMA | Multistage network, NORMA | Fat hypercube, CC-NUMA |
| Node operating system | Light-weighted kernel (LWK) | Complete AIX (IBM Unix) | Microkernel Cellular IRIX |
| Native programming mechanism | MPI based on PUMA Portals | MPI and PVM | Power C, Power Fortran |
| Other programming models | Nx, PVM, HPF | HPF, Linda | MPI, PVM |
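
The sample configurations in this comparison can be checked with simple arithmetic: peak rate is the processor count times the per-processor peak. The sketch below uses only the processor counts and Mflop/s figures quoted in the comparison; the names `SYSTEMS` and `peak_flops` are hypothetical.

```python
# (processor count, per-processor peak in flop/s), as quoted in the comparison
SYSTEMS = {
    "ASCI Option Red": (9072, 200_000_000),
    "IBM SP2": (400, 267_000_000),
    "SGI/Cray Origin2000": (128, 400_000_000),
}

def peak_flops(name):
    """Aggregate peak rate in flop/s for one sample configuration."""
    processors, per_processor = SYSTEMS[name]
    return processors * per_processor

for name in SYSTEMS:
    print(f"{name}: {peak_flops(name) / 1e9:.1f} Gflop/s")
```

The products are about 1814.4, 106.8, and 51.2 Gflop/s, consistent with the quoted 1.8 Tflop/s, 100 Gflop/s, and 51 Gflop/s once rounding is taken into account.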

ASCI Option-Red System
[Figure: block diagram of the ASCI Option-Red system. Labeled partitions: I/O Nodes (on both sides), Service Nodes, Computation Nodes, System Nodes. Other labels: PCI attachments on the I/O nodes leading to Disks, Tapes, ATM, Ethernet, HiPPI, etc.; Boot Node with Boot RAID; Operator Station connected via Ethernet.]

High-Performance CPU Chips for MPP

| Attribute | Pentium Pro | PowerPC 620 | Alpha 21164A | [name not legible] | MIPS R10000 |
|---|---|---|---|---|---|
| Technology | BiCMOS | CMOS | CMOS | CMOS | CMOS |
| Transistors | 5.5M/15.5M | 7M | 9.6M | 5.4M | 6.8M |
| Clock rate | 150 MHz | 133 MHz | 417 MHz | 200 MHz | 200 MHz |
| Voltage | 2.9 V | 3.3 V | 2.2 V | 2.5 V | 3.3 V |
| I/D cache (L1) | 8 KB/8 KB | 32 KB/32 KB | 8 KB/8 KB | 16 KB/16 KB | 32 KB/32 KB |
| L2 cache | 256 KB (multi-chip module) | 1-128 MB (off-chip) | 96 KB (on-chip) | 16 MB (off-chip) | 16 MB (off-chip) |
| Pipeline depth | 14 stages | 4-8 stages | 7-9 stages | 9 stages | 5-7 stages |
| SPECint92 | 366 | 225 | 500 | 350 | 300 |
| Special features | CISC/RISC hybrid, 2-level speculative [...] | Short [...], large L1 caches | Highest clock rate and density with on-chip L2 cache | Multimedia and graphic instructions | MP cluster bus [...] up to [...] |

Partially legible rows (values in source order; column assignment uncertain):
+ Power: 20 W, 30 W, 30 W.
+ Word length: 32 bits, 64 bits, 64 bits, 64 bits.
+ Execution units: 5 units, 6 units, [...].
+ Superscalar: 4-way, 4-way, 4, 4-way.
+ SPECfp92: 283, 300, 2750 (possibly >750), 600.
+ SPECint95: 8.09, 11, NA, 74 (possibly 7.4).
+ SPECfp95: 6.70, 300, >17, NA, 15.

Note: the source names only four chips for five value columns; the fourth column heading is not legible. Text marked [...] is truncated in the source.

Microprocessor Families and Representative CPU Chips

Microprocessors
  General purpose
    CISC
      + Intel x86 Series: 86, 286, 386, 486, Pentium, Pentium Pro
      + Motorola Series: M 68x0 and 680x0
      + Digital: VAX (VLSI version)
    RISC
      + Digital Alpha Series: 21064, 21164, 21264
      + MIPS Series: R2000, R3000, R4000, R5000, R8000, R10000
      + HP/PA-RISC Series: PA 7300 and PA 8000
      + Sun SPARC Series: SPARC, MicroSPARC, SuperSPARC and UltraSPARC
      + PowerPC Series: 601, 603, 604e, 620, 630
  Embedded
    + Categories named in the figure: Embedded RISC, DSP Chips, Microcontrollers, Media Processors
    + Microcontrollers: Intel i960, IBM PowerPC 403GA
    + Other example groups listed: Digital SA-110, Motorola 68EC040; Hitachi SuperH, NEC R4300 (their pairing with the remaining categories is not recoverable from the source)

DSM: Distributed Shared-Memory
+ MIMD, NUMA, NORMA, large grain.
+ Memory is physically distributed, but system hardware and software present a single address space to application users.
+ DIR (cache directory) is used to support distributed coherent caches.
+ A custom-designed communication network.
+ Shared-memory programming style.
+ Examples: Stanford DASH, Cray T3D, etc.
+ Typical structure: each node holds a P/C, an LM (local memory), and a DIR on an MB (memory bus), and attaches through an NIC to the custom-designed network.

    Node:  P/C -- MB -- LM, DIR     Node:  P/C -- MB -- LM, DIR
                  |                               |
                 NIC                             NIC
                  |                               |
    [              Custom-designed Network               ]
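
How a cache directory keeps distributed caches coherent can be shown with a toy model. The sketch assumes a single home memory, a write-through policy, and invalidation of other sharers on every write; these are simplifications chosen for illustration and are not the protocol of DASH or any other machine named above. The class `ToyDSM` and its methods are hypothetical.

```python
class ToyDSM:
    """Toy directory-based coherence: one home memory, per-node caches."""

    def __init__(self, num_nodes):
        self.memory = {}     # block -> value in the home node's local memory
        self.sharers = {}    # directory: block -> set of nodes caching it
        self.caches = [dict() for _ in range(num_nodes)]

    def read(self, node, block):
        cache = self.caches[node]
        if block not in cache:
            # Miss: fetch from home memory and register in the directory.
            cache[block] = self.memory.get(block, 0)
            self.sharers.setdefault(block, set()).add(node)
        return cache[block]

    def write(self, node, block, value):
        # The directory says who else holds a copy; invalidate those copies.
        for other in self.sharers.get(block, set()) - {node}:
            self.caches[other].pop(block, None)
        self.sharers[block] = {node}
        self.memory[block] = value          # write-through, for simplicity
        self.caches[node][block] = value
```

A short usage example: after `dsm = ToyDSM(2)`, a `dsm.read(0, "x")` followed by `dsm.write(1, "x", 7)` removes node 0's stale copy, so the next `dsm.read(0, "x")` misses and returns the new value. The program sees one address space even though each cache is local.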

COW: Cluster of Workstations
+ MIMD, NUMA, coarse grain.
+ Distributed memory.
+ Each node of a COW is a complete computer (SMP or PC), sometimes called a headless workstation.
+ A low-cost commodity network.
+ There is always a local disk.
+ A complete OS resides on each node, whereas on an MPP node only a microkernel exists.
+ Examples: Berkeley NOW, Alpha Farm, FXCOW, etc.
+ Typical structure: each node holds a P/C and an M (memory) on an MB (memory bus); a Bridge links the MB to an IOB (I/O bus), which carries the LD (local disk) and the NIC.

    Node:  P/C -- MB -- M           Node:  P/C -- MB -- M
                  |                               |
                Bridge                          Bridge
                  |                               |
           LD -- IOB -- NIC                LD -- IOB -- NIC
                         |                               |
    [        Commercial Networks (Ethernet, ATM, etc.)        ]