
Computer Architecture: Multiprocessors

Multiprocessors and Issues in Multiprocessing

Flynn's Taxonomy of Computers
• Mike Flynn, "Very High-Speed Computing Systems," Proc. of IEEE, 1966
• SISD: Single instruction operates on single data element
• SIMD: Single instruction operates on multiple data elements (loop sketch below)
  – Array processor
  – Vector processor
• MISD: Multiple instructions operate on single data element
  – Closest form: systolic array processor, streaming processor
• MIMD: Multiple instructions operate on multiple data elements (multiple instruction streams)
  – Multiprocessor
  – Multithreaded processor
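To ground the SISD/SIMD distinction at the software level, the loop sketch referenced above is given here; it is an illustration added to these notes, not part of the original slides. The same element-wise add can execute one instruction per element (SISD view), or, after auto-vectorization for a SIMD ISA such as SSE/AVX/NEON, one instruction per group of elements (SIMD view).

```cpp
#include <cstddef>
#include <cstdio>

// Element-wise add: c[i] = a[i] + b[i].
// Compiled naively, each iteration issues one add on one data element.
// With auto-vectorization (e.g., -O3), the compiler may emit SIMD instructions
// that apply the same add to 4 or 8 elements at once.
void vec_add(const float* a, const float* b, float* c, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i)
        c[i] = a[i] + b[i];
}

int main() {
    float a[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[8] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[8];
    vec_add(a, b, c, 8);
    std::printf("c[0]=%g c[7]=%g\n", c[0], c[7]);  // both 9
    return 0;
}
```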

Why Parallel Computers?
• Parallelism: Doing multiple things at a time
• Things: instructions, operations, tasks
• Main Goal
  – Improve performance (execution time or task throughput)
    • Execution time of a program is governed by Amdahl's Law (formula below)
• Other Goals
  – Reduce power consumption
    • (4N units at freq F/4) consume less power than (N units at freq F)
    • Why?
  – Improve cost efficiency and scalability, reduce complexity
    • Harder to design a single unit that performs as well as N simpler units
  – Improve dependability: redundant execution in space
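The Amdahl's Law bullet above refers to the standard formula. Stated in the usual notation, where f is the parallelizable fraction of execution time and N is the number of processors:

$$\text{Speedup}(N) = \frac{1}{(1 - f) + \dfrac{f}{N}}, \qquad \lim_{N \to \infty} \text{Speedup}(N) = \frac{1}{1 - f}$$

For example, if 90% of the execution time is parallelizable (f = 0.9), speedup is bounded by 10x no matter how many processors are added.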

Types of Parallelism and How to Exploit Them
• Instruction Level Parallelism (sketch below)
  – Different instructions within a stream can be executed in parallel
  – Pipelining, out-of-order execution, speculative execution, VLIW
  – Dataflow
• Data Parallelism
  – Different pieces of data can be operated on in parallel
  – SIMD: Vector processing, array processing
  – Systolic arrays, streaming processors
• Task Level Parallelism
  – Different "tasks/threads" can be executed in parallel
  – Multithreading
  – Multiprocessing (multi-core)
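The instruction-level parallelism sketch referenced above is an illustrative assumption, not from the slides: within a single instruction stream, independent operations can be issued in parallel by a pipelined, out-of-order, or VLIW machine, while a dependent chain forces serialization.

```cpp
#include <cstdio>

// Instruction-level parallelism within one instruction stream.
int ilp_example(int a, int b, int c, int d) {
    int x = a + b;   // independent of y: the two adds can issue in the same cycle
    int y = c + d;   // independent of x
    int z = x * y;   // depends on both x and y: must wait for their results
    return z;
}

int main() {
    std::printf("%d\n", ilp_example(1, 2, 3, 4));  // prints 21
    return 0;
}
```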

Task-Level Parallelism: Creating Tasks
• Partition a single problem into multiple related tasks (threads)
  – Explicitly: Parallel programming (sketch below)
    • Easy when tasks are natural in the problem
      – Web/database queries
    • Difficult when natural task boundaries are unclear
  – Transparently/implicitly: Thread-level speculation
    • Partition a single thread speculatively
• Run many independent tasks (processes) together
  – Easy when there are many processes
    • Batch simulations, different users, cloud computing workloads
  – Does not improve the performance of a single task
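A minimal sketch of the explicit-partitioning case referenced above, assuming C++11 std::thread (the slides do not prescribe a particular API): one problem, summing an array, is split into two related tasks that run in parallel and whose partial results are then combined.

```cpp
#include <cstdio>
#include <numeric>
#include <thread>
#include <vector>

int main() {
    std::vector<int> data(1000000, 1);
    long long sum_lo = 0, sum_hi = 0;

    // Explicitly partition one problem into two related tasks (threads),
    // each summing half of the data.
    std::thread lo([&] {
        sum_lo = std::accumulate(data.begin(), data.begin() + data.size() / 2, 0LL);
    });
    std::thread hi([&] {
        sum_hi = std::accumulate(data.begin() + data.size() / 2, data.end(), 0LL);
    });
    lo.join();
    hi.join();

    std::printf("total = %lld\n", sum_lo + sum_hi);  // 1000000
    return 0;
}
```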

Multiprocessing Fundamentals

Multiprocessor Types
• Loosely coupled multiprocessors
  – No shared global memory address space
  – Multicomputer network
    • Network-based multiprocessors
  – Usually programmed via message passing (contrast sketched below)
    • Explicit calls (send, receive) for communication
• Tightly coupled multiprocessors
  – Shared global memory address space
  – Traditional multiprocessing: symmetric multiprocessing (SMP)
    • Existing multi-core processors, multithreaded processors
  – Programming model similar to uniprocessors (i.e., multitasking uniprocessor), except
    • Operations on shared data require synchronization
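The contrast referenced above is sketched here as a rough software analogue using C++ threads, purely for illustration: a real loosely coupled machine would use a message-passing library such as MPI, and a real tightly coupled machine shares memory in hardware. In the shared-memory style, threads read and write a shared variable directly; in the message-passing style, the only communication is through explicit send/receive calls.

```cpp
#include <condition_variable>
#include <cstdio>
#include <mutex>
#include <queue>
#include <thread>

// "Tightly coupled" style: communicate through a shared address space.
int shared_value = 0;                  // directly visible to all threads

// "Loosely coupled" style: communicate only by explicit send/receive.
std::queue<int> channel;
std::mutex m;
std::condition_variable cv;

void send_msg(int v) {                 // explicit send call
    std::lock_guard<std::mutex> lk(m);
    channel.push(v);
    cv.notify_one();
}

int recv_msg() {                       // explicit receive call (blocks until data arrives)
    std::unique_lock<std::mutex> lk(m);
    cv.wait(lk, [] { return !channel.empty(); });
    int v = channel.front();
    channel.pop();
    return v;
}

int main() {
    std::thread producer([] {
        shared_value = 42;             // shared memory: just write the variable
        send_msg(42);                  // message passing: send an explicit message
    });
    std::thread consumer([] {
        int msg = recv_msg();          // message passing: wait for the message
        std::printf("received %d, shared_value = %d\n", msg, shared_value);
    });
    producer.join();
    consumer.join();
    return 0;
}
```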

Main Issues in Tightly-Coupled MP
• Shared memory synchronization
  – Locks, atomic operations (sketch below)
• Cache consistency
  – More commonly called cache coherence
• Ordering of memory operations
  – What should the programmer expect the hardware to provide?
• Resource sharing, contention, partitioning
• Communication: Interconnection networks
• Load imbalance
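The shared-memory synchronization sketch referenced above assumes C++11 atomics and mutexes; the slides do not tie the point to any particular language. Two threads increment a shared counter: the unsynchronized version can lose updates, while an atomic operation or a lock makes the result correct.

```cpp
#include <atomic>
#include <cstdio>
#include <mutex>
#include <thread>

int unsafe_counter = 0;               // racy: lost updates are possible (undefined behavior, shown for contrast)
std::atomic<int> atomic_counter{0};   // hardware atomic read-modify-write, no lock needed
int locked_counter = 0;
std::mutex counter_lock;              // classic lock-protected shared data

void worker() {
    for (int i = 0; i < 100000; ++i) {
        ++unsafe_counter;             // data race: two cores may read-modify-write concurrently
        atomic_counter.fetch_add(1);  // atomic operation (e.g., lock-prefixed add or LL/SC)
        std::lock_guard<std::mutex> g(counter_lock);
        ++locked_counter;             // critical section guarded by the lock
    }
}

int main() {
    std::thread t1(worker), t2(worker);
    t1.join();
    t2.join();
    // unsafe_counter is often < 200000; the other two are exactly 200000.
    std::printf("unsafe=%d atomic=%d locked=%d\n",
                unsafe_counter, atomic_counter.load(), locked_counter);
    return 0;
}
```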

Aside: Hardware-based Multithreading
• Coarse grained
  – Quantum based
  – Event based (switch-on-event multithreading)
• Fine grained
  – Cycle by cycle
  – Thornton, "CDC 6600: Design of a Computer," 1970.
  – Burton Smith, "A pipelined, shared resource MIMD computer," ICPP 1978.
• Simultaneous
  – Can dispatch instructions from multiple threads at the same time
  – Good for improving execution unit utilization