1.3 Error, Accuracy, and Stability

Although we assume no prior training of the reader in formal numerical analysis, we will need to presume a common understanding of a few key concepts. We will define these briefly in this section.

Computers store numbers not with infinite precision but rather in some approximation that can be packed into a fixed number of bits (binary digits) or bytes (groups of 8 bits). Almost all computers allow the programmer a choice among several different such representations or data types. Data types can differ in the number of bits utilized (the wordlength), but also in the more fundamental respect of whether the stored number is represented in fixed-point (int or long) or floating-point (float or double) format.

A number in integer representation is exact. Arithmetic between numbers in integer representation is also exact, with the provisos that (i) the answer is not outside the range of (usually, signed) integers that can be represented, and (ii) that division is interpreted as producing an integer result, throwing away any integer remainder.

In floating-point representation, a number is represented internally by a sign bit s (interpreted as plus or minus), an exact integer exponent e, and an exact positive integer mantissa M. Taken together these represent the number

$$s \times M \times B^{e-E}$$   (1.3.1)

where B is the base of the representation (usually B = 2, but sometimes B = 16), and E is the bias of the exponent, a fixed integer constant for any given machine and representation. An example is shown in Figure 1.3.1.

Several floating-point bit patterns can represent the same number. If B = 2, for example, a mantissa with leading (high-order) zero bits can be left-shifted, i.e., multiplied by a power of 2, if the exponent is decreased by a compensating amount. Bit patterns that are "as left-shifted as they can be" are termed normalized. Most computers always produce normalized results, since these don't waste any bits of the mantissa and thus allow a greater accuracy of the representation. Since the high-order bit of a properly normalized mantissa (when B = 2) is always one, some computers don't store this bit at all, giving one extra bit of significance.

Arithmetic among numbers in floating-point representation is not exact, even if the operands happen to be exactly represented (i.e., have exact values in the form of equation 1.3.1). For example, two floating numbers are added by first right-shifting (dividing by two) the mantissa of the smaller (in magnitude) one, simultaneously increasing its exponent, until the two operands have the same exponent. Low-order (least significant) bits of the smaller operand are lost by this shifting. If the two operands differ too greatly in magnitude, then the smaller operand is effectively replaced by zero, since it is right-shifted to oblivion.
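By way of illustration (this is not one of the book's routines), a few lines of C using the standard library function frexp() print the normalized-fraction/exponent decomposition that underlies equation (1.3.1). Note that frexp() reports a fraction in [0.5, 1) times a power of 2, which is a scaled view of the stored mantissa rather than the raw bit fields of Figure 1.3.1.

/* Illustrative only: split a few numbers into a normalized fraction and a
   base-2 exponent with the standard library routine frexp(), showing the
   sign/mantissa/exponent idea of equation (1.3.1). */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double values[] = { 0.5, 3.0, 0.25, 1.0e-7 };
    int i, e;
    for (i = 0; i < 4; i++) {
        double m = frexp(values[i], &e);   /* values[i] == m * 2^e, 0.5 <= |m| < 1 */
        printf("%-12g = %+f * 2^%d\n", values[i], m, e);
    }
    return 0;
}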
The smallest (in magnitude) floating-point number which, when added to the floating-point number 1.0, produces a floating-point result different from 1.0 is termed the machine accuracy ε_m. A typical computer with B = 2 and a 32-bit wordlength has ε_m around 3 × 10^−8. (A more detailed discussion of machine characteristics, and a program to determine them, is given in §20.1.)
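The machine-characteristics program of §20.1 is not reproduced here, but as a rough stand-in the following C sketch estimates the machine accuracy of float and double by repeated halving; on a machine with IEEE single and double precision it prints values near 1.2 × 10^−7 and 2.2 × 10^−16.

/* Rough stand-in (not the routine of §20.1): halve a candidate epsilon until
   adding it to 1.0 no longer changes the result, for float and for double. */
#include <stdio.h>

int main(void)
{
    volatile float  ef = 1.0f, testf;
    volatile double ed = 1.0,  testd;

    do { ef *= 0.5f; testf = 1.0f + ef; } while (testf != 1.0f);
    do { ed *= 0.5;  testd = 1.0 + ed;  } while (testd != 1.0);

    /* the loops overshoot by one halving, so report twice the final value */
    printf("float  machine accuracy ~ %g\n", 2.0f * ef);
    printf("double machine accuracy ~ %g\n", 2.0  * ed);
    return 0;
}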
[Figure 1.3.1 appears here: the bit-level layout (sign bit, 8-bit exponent, 23-bit mantissa, with the high-order mantissa bit optionally a "phantom" bit) of the six numbers listed in the caption.]

Figure 1.3.1. Floating point representations of numbers in a typical 32-bit (4-byte) format. (a) The number 1/2 (note the bias in the exponent); (b) the number 3; (c) the number 1/4; (d) the number 10^−7, represented to machine accuracy; (e) the same number 10^−7, but shifted so as to have the same exponent as the number 3; with this shifting, all significance is lost and 10^−7 becomes zero; shifting to a common exponent must occur before two numbers can be added; (f) sum of the numbers 3 + 10^−7, which equals 3 to machine accuracy. Even though 10^−7 can be represented accurately by itself, it cannot accurately be added to a much larger number.

Roughly speaking, the machine accuracy ε_m is the fractional accuracy to which floating-point numbers are represented, corresponding to a change of one in the least significant bit of the mantissa. Pretty much any arithmetic operation among floating numbers should be thought of as introducing an additional fractional error of at least ε_m. This type of error is called roundoff error.

It is important to understand that ε_m is not the smallest floating-point number that can be represented on a machine. That number depends on how many bits there are in the exponent, while ε_m depends on how many bits there are in the mantissa.
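As a quick check of the point made in Figure 1.3.1 (again, not code from the book), the following fragment shows that 10^−7 is perfectly representable in single precision, yet adding it to 3.0 leaves 3.0 unchanged:

/* Illustration of Figure 1.3.1: in single precision, 1.0e-7 is representable
   on its own, but adding it to 3.0 leaves 3.0 unchanged, because its mantissa
   is shifted away during exponent alignment. */
#include <stdio.h>

int main(void)
{
    float small = 1.0e-7f;
    float big   = 3.0f;
    float sum   = big + small;

    printf("small       = %.8e\n", small);
    printf("big + small = %.8f\n", sum);
    printf("sum == big ?  %s\n", (sum == big) ? "yes" : "no");
    return 0;
}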
Roundoff errors accumulate with increasing amounts of calculation. If, in the course of obtaining a calculated value, you perform N such arithmetic operations, you might be so lucky as to have a total roundoff error on the order of √N ε_m, if the roundoff errors come in randomly up or down. (The square root comes from a random walk.) However, this estimate can be very badly off the mark for two reasons:

(i) It very frequently happens that the regularities of your calculation, or the peculiarities of your computer, cause the roundoff errors to accumulate preferentially in one direction. In this case the total will be of order N ε_m.

(ii) Some especially unfavorable occurrences can vastly increase the roundoff error of single operations. Generally these can be traced to the subtraction of two very nearly equal numbers, giving a result whose only significant bits are those (few) low-order ones in which the operands differed. You might think that such a "coincidental" subtraction is unlikely to occur. Not always so. Some mathematical expressions magnify its probability of occurrence tremendously. For example, in the familiar formula for the solution of a quadratic equation,

$$x = \frac{-b + \sqrt{b^2 - 4ac}}{2a}$$   (1.3.2)

the addition becomes delicate and roundoff-prone whenever ac ≪ b^2. (In §5.6 we will learn how to avoid the problem in this particular case.)
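Here is a minimal sketch of that cancellation, not the remedy developed in §5.6: it evaluates the small root of x^2 + 10^8 x + 1 = 0 first with formula (1.3.2) as written, then with the algebraically equivalent rearrangement x = 2c/(−b − √(b^2 − 4ac)), which avoids subtracting nearly equal numbers when b > 0.

/* Sketch of catastrophic cancellation in the quadratic formula (1.3.2),
   using a = 1, b = 1e8, c = 1, whose small root is very nearly -1e-8.
   The rearranged form 2c/(-b - sqrt(b*b - 4*a*c)) gives the same root
   but avoids the near-cancellation when b > 0.  (This is an illustrative
   rearrangement, not necessarily the routine developed in Section 5.6.) */
#include <stdio.h>
#include <math.h>

int main(void)
{
    double a = 1.0, b = 1.0e8, c = 1.0;
    double disc = sqrt(b*b - 4.0*a*c);

    double naive  = (-b + disc) / (2.0*a);   /* suffers cancellation    */
    double stable = (2.0*c) / (-b - disc);   /* same root, no cancel    */

    printf("naive  small root = %.16e\n", naive);
    printf("stable small root = %.16e\n", stable);
    return 0;
}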
Roundoff error is a characteristic of computer hardware. There is another, different, kind of error that is a characteristic of the program or algorithm used, independent of the hardware on which the program is executed. Many numerical algorithms compute "discrete" approximations to some desired "continuous" quantity. For example, an integral is evaluated numerically by computing a function at a discrete set of points, rather than at "every" point. Or, a function may be evaluated by summing a finite number of leading terms in its infinite series, rather than all infinitely many terms. In cases like this, there is an adjustable parameter, e.g., the number of points or of terms, such that the "true" answer is obtained only when that parameter goes to infinity. Any practical calculation is done with a finite, but sufficiently large, choice of that parameter.

The discrepancy between the true answer and the answer obtained in a practical calculation is called the truncation error. Truncation error would persist even on a hypothetical, "perfect" computer that had an infinitely accurate representation and no roundoff error. As a general rule there is not much that a programmer can do about roundoff error, other than to choose algorithms that do not magnify it unnecessarily (see discussion of "stability" below). Truncation error, on the other hand, is entirely under the programmer's control. In fact, it is only a slight exaggeration to say that clever minimization of truncation error is practically the entire content of the field of numerical analysis!

Most of the time, truncation error and roundoff error do not strongly interact with one another. A calculation can be imagined as having, first, the truncation error that it would have if run on an infinite-precision computer, "plus" the roundoff error associated with the number of operations performed.

Sometimes, however, an otherwise attractive method can be unstable. This means that any roundoff error that becomes "mixed into" the calculation at an early stage is successively magnified until it comes to swamp the true answer. An unstable method would be useful on a hypothetical, perfect computer; but in this imperfect world it is necessary for us to require that algorithms be stable, or, if unstable, that we use them with great caution.

Here is a simple, if somewhat artificial, example of an unstable algorithm: Suppose that it is desired to calculate all integer powers of the so-called "Golden Mean," the number given by

$$\phi \equiv \frac{\sqrt{5} - 1}{2} \approx 0.61803398$$   (1.3.3)

It turns out (you can easily verify) that the powers φ^n satisfy a simple recursion relation,

$$\phi^{n+1} = \phi^{n-1} - \phi^{n}$$   (1.3.4)

Thus, knowing the first two values φ^0 = 1 and φ^1 = 0.61803398, we can successively apply (1.3.4), performing only a single subtraction, rather than a slower multiplication by φ, at each stage.
Unfortunately, the recurrence (1.3.4) also has another solution, namely the value −(√5 + 1)/2. Since the recurrence is linear, and since this undesired solution has magnitude greater than unity, any small admixture of it introduced by roundoff errors will grow exponentially.
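A short sketch (not one of the book's routines) makes the growth easy to watch: it carries the recurrence (1.3.4) forward in single precision and compares each value against direct repeated multiplication by φ.

/* Illustrative only: apply the recurrence (1.3.4) in single precision and
   compare with direct repeated multiplication by phi.  The admixture of the
   unwanted solution grows until the recurrence values bear no resemblance
   to the true powers of the Golden Mean. */
#include <stdio.h>

int main(void)
{
    float phi  = 0.61803398f;
    float prev = 1.0f, curr = phi;   /* phi^0 and phi^1            */
    float direct = phi;              /* phi^n by multiplication    */
    int n;

    for (n = 2; n <= 20; n++) {
        float next = prev - curr;    /* recurrence (1.3.4)         */
        prev = curr;
        curr = next;
        direct *= phi;
        printf("n = %2d   recurrence = % .6e   direct = % .6e\n",
               n, curr, direct);
    }
    return 0;
}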
On a typical machine with 32-bit wordlength, (1.3.4) starts to give completely wrong answers by about n = 16, at which point φ^n is down to only 10^−4. The recurrence (1.3.4) is unstable, and cannot be used for the purpose stated.

We will encounter the question of stability in many more sophisticated guises, later in this book.

CITED REFERENCES AND FURTHER READING:

Stoer, J., and Bulirsch, R. 1980, Introduction to Numerical Analysis (New York: Springer-Verlag), Chapter 1.

Kahaner, D., Moler, C., and Nash, S. 1989, Numerical Methods and Software (Englewood Cliffs, NJ: Prentice Hall), Chapter 2.

Johnson, L.W., and Riess, R.D. 1982, Numerical Analysis, 2nd ed. (Reading, MA: Addison-Wesley), §1.3.

Wilkinson, J.H. 1964, Rounding Errors in Algebraic Processes (Englewood Cliffs, NJ: Prentice-Hall).