Secrets of Performance: How to Improve Program Performance
Bin Lin, Development Manager, Microsoft Research, China
Ways to Improve Performance
• Faster hardware - $$$ = speed
• Use the right language and compiler optimizations
• Design a scalable application
  – Architectural design: cache -> RAM -> disk
  – Choose the right data structures and algorithms
• Tune code
  – Avoid slow OS APIs
  – Tune, measure, tune, measure, tune, measure…
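The "tune, measure" loop needs a repeatable way to time a piece of code. The following is a minimal sketch, not something from the original slides: `best_time_us` and `sum_vector` are hypothetical names, and taking the fastest of several runs is one common way to reduce noise, assumed here for illustration.

```cpp
#include <cassert>
#include <chrono>
#include <vector>

// Hypothetical helper: run `work` `repeats` times and return the fastest
// wall-clock time in microseconds. Keeping the minimum reduces noise
// caused by other processes on the machine.
template <typename Work>
double best_time_us(Work work, int repeats) {
    assert(repeats > 0);
    double best = -1.0;
    for (int i = 0; i < repeats; ++i) {
        auto start = std::chrono::steady_clock::now();
        work();
        auto stop = std::chrono::steady_clock::now();
        double us =
            std::chrono::duration<double, std::micro>(stop - start).count();
        if (best < 0.0 || us < best) best = us;
    }
    return best;
}

// Example workload to measure: sum every element of a vector.
long long sum_vector(const std::vector<int>& v) {
    long long total = 0;
    for (int x : v) total += x;
    return total;
}
```

A typical use is to time the code before and after each tuning change, for example `best_time_us([&] { result = sum_vector(v); }, 5)`, and keep the change only if the measured time improves.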
Design: A Case Study
• Design a scalable SMTP server
  – Scalability is the key
  – 2-CPU, 4-CPU, 8-CPU machines
  – Handle as many requests as possible, with relatively fast response time
Design: A Case Study
• A simple SMTP server

    // Read SMTP commands/data from sockets
    if (ReadFile( … )) {
        // various housekeeping removed…
    }
    // Parse SMTP recipients and other headers
    if (!ParseSMTPHeaders(…)) {
        // handle errors…
    }
    // Parse bodies
    if (!ParseSMTPBodies(…)) {
        // handle errors…
    }
Design: A Case Study (cont.)

    // Local delivery or routing
    if (LocalDelivery( … )) {
        Deliver( … );
    } else {
        Route( … );
    }
    // Send SMTP response through socket
    if (WriteFile(…)) {
        // various housekeeping skipped…
    }
Traditional Thread Architecture
[Diagram: SMTP request receiver (socket) dispatching to worker threads]
• 1 thread to receive and dispatch SMTP requests
• 64 worker threads doing:
  – Parse SMTP headers
  – Parse SMTP bodies
  – Local delivery
  – Routing
  – All in the same thread, sequentially…
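The receiver-plus-workers layout above can be sketched in portable C++ as a shared queue drained by a pool of threads. This is an illustration under stated assumptions, not the server's actual code: `Request`, `WorkerPool`, and the three empty stage functions are hypothetical stand-ins, and a real server would read from sockets rather than take strings.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical request type; a real server would carry a socket handle.
struct Request { std::string data; };

class WorkerPool {
public:
    explicit WorkerPool(int workers) {
        assert(workers > 0);
        for (int i = 0; i < workers; ++i)
            threads_.emplace_back([this] { run(); });
    }
    ~WorkerPool() { shutdown(); }

    // Called by the single receiver thread for each incoming request.
    void dispatch(Request r) {
        {
            std::lock_guard<std::mutex> lock(m_);
            queue_.push(std::move(r));
        }
        cv_.notify_one();
    }

    // Let workers drain the remaining queue, then join them.
    void shutdown() {
        {
            std::lock_guard<std::mutex> lock(m_);
            done_ = true;
        }
        cv_.notify_all();
        for (auto& t : threads_)
            if (t.joinable()) t.join();
    }

    int handled() const { return handled_.load(); }

private:
    void run() {
        for (;;) {
            Request r;
            {
                std::unique_lock<std::mutex> lock(m_);
                cv_.wait(lock, [this] { return done_ || !queue_.empty(); });
                if (queue_.empty()) return;  // shutting down, nothing left
                r = std::move(queue_.front());
                queue_.pop();
            }
            // Every stage runs sequentially on this one worker thread.
            parse_headers(r);
            parse_bodies(r);
            deliver_or_route(r);
            ++handled_;
        }
    }

    // Placeholder stages (assumptions): real parsing and delivery omitted.
    static void parse_headers(Request&) {}
    static void parse_bodies(Request&) {}
    static void deliver_or_route(Request&) {}

    std::mutex m_;
    std::condition_variable cv_;
    std::queue<Request> queue_;
    bool done_ = false;
    std::atomic<int> handled_{0};
    std::vector<std::thread> threads_;
};
```

With this shape, `WorkerPool pool(64);` mirrors the 64-worker configuration on the slide, and the receiver thread simply calls `pool.dispatch(...)` for each request it reads.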
The Evolution of Hardware
[Chart: relative performance (latency) of CPU, RAM, and disk over time, 1992 to 2000; vertical axis 0 to 800]
Bridge the Gap - Caches
• CPU L1 cache
  – 8K instruction cache, plus
  – 8K data cache
  – Closely coupled
  – 0.333 clock/instruction - practical 1 CPI
• CPU L2 cache
  – 512K static RAM
  – Coupled with full clock-speed, 64-bit cache bus
  – Latency: 4-1-1-1 - 7 clocks/instruction
• I/O caches (RAM-based file caches)
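Because these caches are small, the order in which code touches memory matters. As a hedged illustration (not from the slides), the two hypothetical functions below compute the same sum over a matrix stored row by row in one contiguous vector. The first walks memory in address order. The second jumps `cols` elements at every step, which on typical hardware tends to cause more cache misses for large matrices.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Sum a rows x cols matrix stored row-major in one contiguous vector,
// visiting elements in increasing address order (cache friendly).
long long sum_row_major(const std::vector<int>& m,
                        std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            total += m[r * cols + c];
    return total;
}

// Same result, but each inner step strides by `cols` elements, so
// consecutive accesses land far apart in memory.
long long sum_column_major(const std::vector<int>& m,
                           std::size_t rows, std::size_t cols) {
    assert(m.size() == rows * cols);
    long long total = 0;
    for (std::size_t c = 0; c < cols; ++c)
        for (std::size_t r = 0; r < rows; ++r)
            total += m[r * cols + c];
    return total;
}
```

Any speed difference depends on the matrix size and the machine, so it should be measured rather than assumed.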
The Price of Failure
• Let's look at the costs:
  – Assume 1 second to zero a register
  – L1 cache hit - 1 second (1x)
  – L2 cache hit - 4 seconds (plus 3 seconds extra work - 7x)
  – RAM hit - 25-150 seconds (24x-150x)
  – Disk or net hit - 3 weeks (2,000,000x)
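These ratios show why even rare misses dominate the average cost. The sketch below is a worked calculation under assumed numbers, not a measurement: `Level` and `expected_cost` are hypothetical names, and the 99.9% / 0.1% split is chosen only for illustration. It weights each level's cost (in the slide's scaled "seconds") by how often an access ends up there.

```cpp
#include <cassert>
#include <vector>

// One level of the hierarchy: the fraction of accesses served there and
// the cost of each such access, in the slide's scaled "seconds".
struct Level {
    double probability;
    double cost;
};

// Average cost per access: the probability-weighted sum of level costs.
double expected_cost(const std::vector<Level>& levels) {
    double total = 0.0;
    for (const Level& l : levels) {
        assert(l.probability >= 0.0 && l.probability <= 1.0);
        total += l.probability * l.cost;
    }
    return total;
}
```

For example, if 99.9% of accesses cost 1 and 0.1% go to disk at 2,000,000, the average is 0.999 × 1 + 0.001 × 2,000,000 = 2000.999, so one disk access in a thousand makes the average access about 2000 times slower than an L1 hit.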