xii Contents

7.4 Clusters and Other Message-Passing Multiprocessors 641
7.5 Hardware Multithreading 645
7.6 SISD, MIMD, SIMD, SPMD, and Vector 648
7.7 Introduction to Graphics Processing Units 654
7.8 Introduction to Multiprocessor Network Topologies 660
7.9 Multiprocessor Benchmarks 664
7.10 Roofline: A Simple Performance Model 667
7.11 Real Stuff: Benchmarking Four Multicores Using the Roofline Model 675
7.12 Fallacies and Pitfalls 684
7.13 Concluding Remarks 686
7.14 Historical Perspective and Further Reading 688
7.15 Exercises 688

APPENDICES

A Graphics and Computing GPUs A-2
A.1 Introduction A-3
A.2 GPU System Architectures A-7
A.3 Programming GPUs A-12
A.4 Multithreaded Multiprocessor Architecture A-25
A.5 Parallel Memory System A-36
A.6 Floating Point Arithmetic A-41
A.7 Real Stuff: The NVIDIA GeForce 8800 A-46
A.8 Real Stuff: Mapping Applications to GPUs A-55
A.9 Fallacies and Pitfalls A-72
A.10 Concluding Remarks A-76
A.11 Historical Perspective and Further Reading A-77

B Assemblers, Linkers, and the SPIM Simulator B-2
B.1 Introduction B-3
B.2 Assemblers B-10
B.3 Linkers B-18
B.4 Loading B-19
B.5 Memory Usage B-20
B.6 Procedure Call Convention B-22
B.7 Exceptions and Interrupts B-33
B.8 Input and Output B-38
B.9 SPIM B-40