Introduction

In the days of assembler language programming, experienced programmers estimated the execution speed of their source code by counting the number of assembly language instructions. On some architectures, such as RISC, most assembler instructions executed in one clock cycle each.
Other architectures featured wide variations in instruction-to-instruction execution speed, but experienced programmers were able to develop a good feel for average instruction latency. If you knew how many instructions your code fragment contained, you could estimate with accuracy the number of clock cycles their execution would consume. The mapping from source code to assembler was trivially one-to-one. The assembler code was the source code.

On the ladder of programming languages, C is one step higher than assembler language. C source code is not identical to the corresponding compiler-generated assembler code. It is the compiler’s task to bridge the gap from source code to assembler. The mapping of source-to-assembler code is no longer the one-to-one identity mapping. It remains, however, a linear relationship: Each source level statement in C corresponds to a small number of assembler instructions. If you estimate that each C statement translates into five to eight assembler instructions, chances are you will be in the ballpark.

C++ has shattered this nice linear relationship between the number of source level statements and compiler-generated assembly statement count. Whereas the cost of C statements is largely uniform, the cost of C++ statements fluctuates wildly. One C++ statement can generate three assembler instructions, whereas another can generate 300. Implementing high-performance C++ code has placed a new and unexpected demand on programmers: the need to navigate through a performance minefield, trying to stay on a safe three-instruction-per-statement path and to avoid usage of routes that contain 300-instruction land mines. Programmers must identify language constructs likely to generate large overhead and know how to code or design around them. These are considerations that C and assembler language programmers have never had to worry about.
The only exception may be the use of macros in C, but those are hardly as frequent as the invocations of constructors and destructors in C++ code. The C++ compiler might also insert code into the execution flow of your program “behind your back.” This is news to the unsuspecting C programmer migrating to C++ (which is where many of us are coming from).

The task of writing efficient C++ programs requires C++ developers to acquire new performance skills that are specific to C++ and that transcend the generic software performance principles. In C programming, you are not likely to be blindsided by hidden overhead, so it is possible to stumble upon good performance in a C program. In contrast, this is unlikely to happen in C++: You are not going to achieve good performance accidentally, without knowing the pitfalls lurking about.

To be fair, we have seen many examples of poor performance that were rooted in inefficient object-oriented (OO) design. The ideas of software flexibility and reuse have been promoted aggressively ever since OO moved into the mainstream. However, flexibility and reuse seldom go hand-in-hand with performance and efficiency. In mathematics, it would be painful to reduce every theorem back to basic principles. Mathematicians try to reuse results that have already been proven. Outside mathematics, however, it often makes sense to leverage special circumstances and to take shortcuts. In software design, it is acceptable under some circumstances to place higher priority on performance than reuse. When you implement the read() or write() function of a device driver, the known performance requirements are generally much more important to your software’s success than the possibility that at some point in the future it might be reused. Some performance problems in OO design are due to putting the emphasis on the wrong place at the wrong time.
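The earlier remark that the compiler can insert constructor and destructor calls “behind your back” can be made concrete with a small sketch. Everything named here (the `Tracked` type, `read_by_value`, `read_by_reference`, `count_hidden_calls`) is hypothetical and introduced only for illustration; it counts calls rather than assembler instructions, so it shows that hidden work exists, not how expensive it is on any particular compiler.

```cpp
#include <cassert>

// Hypothetical illustration type: counts the copy-constructor and
// destructor calls that the compiler issues on the programmer's behalf.
struct Tracked {
    static int copies;
    static int destructions;
    int value;

    explicit Tracked(int v) : value(v) {}
    Tracked(const Tracked& other) : value(other.value) { ++copies; }
    ~Tracked() { ++destructions; }
};

int Tracked::copies = 0;
int Tracked::destructions = 0;

// Pass by value: the parameter is copy-constructed from the caller's
// object and destroyed again once the call is over.
int read_by_value(Tracked t) { return t.value; }

// Pass by reference: no new object is created for the parameter.
int read_by_reference(const Tracked& t) { return t.value; }

struct HiddenCalls {
    int copies;
    int destructions;
};

// Reports how many copy-constructor and destructor calls were triggered
// by the single call statement marked below.
template <typename Fn>
HiddenCalls count_hidden_calls(Fn fn) {
    Tracked original(42);
    const int copies_before = Tracked::copies;
    const int destructions_before = Tracked::destructions;

    const int result = fn(original);  // the one statement under test
    assert(result == 42);
    (void)result;  // keeps builds with NDEBUG free of unused warnings

    // The counters are read before `original` itself is destroyed.
    return {Tracked::copies - copies_before,
            Tracked::destructions - destructions_before};
}
```

Calling `count_hidden_calls(read_by_value)` from `main` reports one copy and one destruction for a statement that looks identical at the call site to the by-reference version, which reports none. The two call statements read the same in the source, yet only one of them carries the extra work.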
Programmers should focus on solving the problem they have, not on making their current solution amenable to some unidentified set of possible future requirements.

Roots of Software Inefficiency

Silent C++ overhead is not the root of all performance evil. Even eliminating compiler-generated overhead would not always be sufficient. If that were the case, then every C program would enjoy automatic awesome performance due to the lack of silent overhead. Additional factors affect software performance in