The single central idea in inverse theory is the prescription

$$ \text{minimize:}\quad \mathcal{A} + \lambda\mathcal{B} \tag{18.4.12} $$

for various values of 0 < λ < ∞ along the so-called trade-off curve (see Figure 18.4.1), and then to settle on a "best" value of λ by one or another criterion, ranging from fairly objective (e.g., making χ² = N) to entirely subjective. Successful methods, several of which we will now describe, differ as to their choices of A and B, as to whether the prescription (18.4.12) yields linear or nonlinear equations, as to their recommended method for selecting a final λ, and as to their practicality for computer-intensive two-dimensional problems like image processing.

They also differ as to the philosophical baggage that they (or rather, their proponents) carry. We have thus far avoided the word "Bayesian." (Courts have consistently held that academic license does not extend to shouting "Bayesian" in a crowded lecture hall.) But it is hard, nor have we any wish, to disguise the fact that B has something to do with a priori expectation, or knowledge, of a solution, while A has something to do with a posteriori knowledge. The constant λ adjudicates a delicate compromise between the two. Some inverse methods have acquired a more Bayesian stamp than others, but we think that this is purely an accident of history. An outsider looking only at the equations that are actually solved, and not at the accompanying philosophical justifications, would have a difficult time separating the so-called Bayesian methods from the so-called empirical ones, we think.

The next three sections discuss three different approaches to the problem of inversion, which have had considerable success in different fields. All three fit within the general framework that we have outlined, but they are quite different in detail and in implementation.
18.5 Linear Regularization Methods

What we will call linear regularization is also called the Phillips-Twomey method [1,2], the constrained linear inversion method [3], the method of regularization [4], and Tikhonov-Miller regularization [5-7]. (It probably has other names also, since it is so obviously a good idea.)
In its simplest form, the method is an immediate generalization of zeroth-order regularization (equation 18.4.11, above). As before, the functional A is taken to be the χ² deviation, equation (18.4.9), but the functional B is replaced by more sophisticated measures of smoothness that derive from first or higher derivatives.

For example, suppose that your a priori belief is that a credible u(x) is not too different from a constant. Then a reasonable functional to minimize is

$$ \mathcal{B} \propto \int [u'(x)]^2\,dx \propto \sum_{\mu=1}^{M-1} [u_\mu - u_{\mu+1}]^2 \tag{18.5.1} $$

since it is nonnegative and equal to zero only when u(x) is constant. Here u_µ ≡ u(x_µ), and the second equality (proportionality) assumes that the x_µ's are uniformly spaced. We can write the second form of B as

$$ \mathcal{B} = |\mathbf{B}\cdot\mathbf{u}|^2 = \mathbf{u}\cdot(\mathbf{B}^T\cdot\mathbf{B})\cdot\mathbf{u} \equiv \mathbf{u}\cdot\mathbf{H}\cdot\mathbf{u} \tag{18.5.2} $$

where u is the vector of components u_µ, µ = 1, ..., M, and B is the (M − 1) × M first difference matrix (blank entries in the matrices below are zero)

$$ \mathbf{B} = \begin{pmatrix} -1 & 1 & & & \\ & -1 & 1 & & \\ & & \ddots & \ddots & \\ & & & -1 & 1 \end{pmatrix} \tag{18.5.3} $$

and H is the M × M matrix

$$ \mathbf{H} = \mathbf{B}^T\cdot\mathbf{B} = \begin{pmatrix} 1 & -1 & & & \\ -1 & 2 & -1 & & \\ & \ddots & \ddots & \ddots & \\ & & -1 & 2 & -1 \\ & & & -1 & 1 \end{pmatrix} \tag{18.5.4} $$

Note that B has one fewer row than column. It follows that the symmetric H is degenerate; it has exactly one zero eigenvalue, corresponding to constant functions, any one of which makes B exactly zero.

If, just as in §15.4, we write

$$ A_{i\mu} \equiv R_{i\mu}/\sigma_i \qquad b_i \equiv c_i/\sigma_i \tag{18.5.5} $$

then, using equation (18.4.9), the minimization principle (18.4.12) is

$$ \text{minimize:}\quad \mathcal{A} + \lambda\mathcal{B} = |\mathbf{A}\cdot\mathbf{u} - \mathbf{b}|^2 + \lambda\,\mathbf{u}\cdot\mathbf{H}\cdot\mathbf{u} \tag{18.5.6} $$

This can readily be reduced to a linear set of normal equations, just as in §15.4: The components u_µ of the solution satisfy the set of M equations in M unknowns,

$$ \sum_{\rho}\left[\sum_i A_{i\mu}A_{i\rho} + \lambda H_{\mu\rho}\right] u_\rho = \sum_i A_{i\mu} b_i \qquad \mu = 1, 2, \ldots, M \tag{18.5.7} $$

or, in vector notation,

$$ (\mathbf{A}^T\cdot\mathbf{A} + \lambda\mathbf{H})\cdot\mathbf{u} = \mathbf{A}^T\cdot\mathbf{b} \tag{18.5.8} $$
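To make the prescription concrete, here is a minimal C sketch (ours, not a routine from this book) that assembles and solves the normal equations (18.5.8) for the first-difference regularizer H of equation (18.5.4). The function name is hypothetical, and the built-in Gaussian elimination merely stands in for the LU routines of Chapter 2 (ludcmp/lubksb):

```c
#include <math.h>
#include <stdlib.h>

/* Solve (A^T A + lambda H) u = A^T b, where A is N x M (row-major)
   and H is the first-difference regularizer of equation (18.5.4):
   diagonal (1,2,...,2,1), off-diagonals -1.  Sketch only: no error
   handling, dense O(M^2 N + M^3) arithmetic. */
void solve_regularized(int N, int M, const double *A, const double *b,
                       double lambda, double *u)
{
    double *S = calloc((size_t)M * M, sizeof *S);  /* A^T A + lambda H */
    double *r = calloc((size_t)M, sizeof *r);      /* A^T b */

    for (int mu = 0; mu < M; mu++) {
        for (int rho = 0; rho < M; rho++)
            for (int i = 0; i < N; i++)
                S[mu*M + rho] += A[i*M + mu] * A[i*M + rho];
        for (int i = 0; i < N; i++)
            r[mu] += A[i*M + mu] * b[i];
        /* add lambda H */
        S[mu*M + mu] += lambda * ((mu == 0 || mu == M-1) ? 1.0 : 2.0);
        if (mu > 0)   S[mu*M + (mu-1)] -= lambda;
        if (mu < M-1) S[mu*M + (mu+1)] -= lambda;
    }
    /* Gaussian elimination with partial pivoting; in production use
       the LU decomposition routines of Chapter 2 instead. */
    for (int k = 0; k < M; k++) {
        int p = k;
        for (int i = k+1; i < M; i++)
            if (fabs(S[i*M + k]) > fabs(S[p*M + k])) p = i;
        for (int j = 0; j < M; j++) {
            double t = S[k*M + j]; S[k*M + j] = S[p*M + j]; S[p*M + j] = t;
        }
        { double t = r[k]; r[k] = r[p]; r[p] = t; }
        for (int i = k+1; i < M; i++) {
            double f = S[i*M + k] / S[k*M + k];
            for (int j = k; j < M; j++) S[i*M + j] -= f * S[k*M + j];
            r[i] -= f * r[k];
        }
    }
    for (int k = M-1; k >= 0; k--) {    /* back substitution */
        u[k] = r[k];
        for (int j = k+1; j < M; j++) u[k] -= S[k*M + j] * u[j];
        u[k] /= S[k*M + k];
    }
    free(S); free(r);
}
```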
Equations (18.5.7) or (18.5.8) can be solved by the standard techniques of Chapter 2, e.g., LU decomposition. The usual warnings about normal equations being ill-conditioned do not apply, since the whole purpose of the λ term is to cure that same ill-conditioning. Note, however, that the λ term by itself is ill-conditioned, since it does not select a preferred constant value. You hope your data can at least do that!

Although inversion of the matrix (A^T·A + λH) is not generally the best way to solve for u, let us digress to write the solution to equation (18.5.8) schematically as

$$ \mathbf{u} = \left(\frac{1}{\mathbf{A}^T\cdot\mathbf{A} + \lambda\mathbf{H}}\cdot\mathbf{A}^T\cdot\mathbf{A}\right)\cdot\mathbf{A}^{-1}\cdot\mathbf{b} \qquad \text{(schematic only!)} \tag{18.5.9} $$

where the identity matrix in the form A·A⁻¹ has been inserted. This is schematic not only because the matrix inverse is fancifully written as a denominator, but also because, in general, the inverse matrix A⁻¹ does not exist. However, it is illuminating to compare equation (18.5.9) with equation (13.3.6) for optimal or Wiener filtering, or with equation (13.6.6) for general linear prediction. One sees that A^T·A plays the role of S², the signal power or autocorrelation, while λH plays the role of N², the noise power or autocorrelation. The term in parentheses in equation (18.5.9) is something like an optimal filter, whose effect is to pass the ill-posed inverse A⁻¹·b through unmodified when A^T·A is sufficiently large, but to suppress it when A^T·A is small.

The above choices of B and H are only the simplest in an obvious sequence of derivatives. If your a priori belief is that a linear function is a good approximation to u(x), then minimize

$$ \mathcal{B} \propto \int [u''(x)]^2\,dx \propto \sum_{\mu=1}^{M-2} [-u_\mu + 2u_{\mu+1} - u_{\mu+2}]^2 \tag{18.5.10} $$

implying

$$ \mathbf{B} = \begin{pmatrix} -1 & 2 & -1 & & & \\ & -1 & 2 & -1 & & \\ & & \ddots & \ddots & \ddots & \\ & & & -1 & 2 & -1 \end{pmatrix} \tag{18.5.11} $$

and

$$ \mathbf{H} = \mathbf{B}^T\cdot\mathbf{B} = \begin{pmatrix} 1 & -2 & 1 & & & & \\ -2 & 5 & -4 & 1 & & & \\ 1 & -4 & 6 & -4 & 1 & & \\ & \ddots & \ddots & \ddots & \ddots & \ddots & \\ & & 1 & -4 & 6 & -4 & 1 \\ & & & 1 & -4 & 5 & -2 \\ & & & & 1 & -2 & 1 \end{pmatrix} \tag{18.5.12} $$

This H has two zero eigenvalues, corresponding to the two undetermined parameters of a linear function.
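As a quick check of (18.5.12) (our arithmetic, not part of the original text): an interior column µ of the B in (18.5.11) has nonzero entries (−1, 2, −1) in rows µ − 2, µ − 1, µ, so the entries of H = B^T·B come out as

$$ H_{\mu\mu} = (-1)^2 + 2^2 + (-1)^2 = 6, \qquad H_{\mu,\mu\pm 1} = (2)(-1) + (-1)(2) = -4, \qquad H_{\mu,\mu\pm 2} = (-1)(-1) = 1, $$

reproducing the interior row (1, −4, 6, −4, 1); the modified rows at the top and bottom come from the truncated columns at the edges.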
If your a priori belief is that a quadratic function is preferable, then minimize

$$ \mathcal{B} \propto \int [u'''(x)]^2\,dx \propto \sum_{\mu=1}^{M-3} [-u_\mu + 3u_{\mu+1} - 3u_{\mu+2} + u_{\mu+3}]^2 \tag{18.5.13} $$

with

$$ \mathbf{B} = \begin{pmatrix} -1 & 3 & -3 & 1 & & & \\ & -1 & 3 & -3 & 1 & & \\ & & \ddots & \ddots & \ddots & \ddots & \\ & & & -1 & 3 & -3 & 1 \end{pmatrix} \tag{18.5.14} $$

and now

$$ \mathbf{H} = \mathbf{B}^T\cdot\mathbf{B} = \begin{pmatrix} 1 & -3 & 3 & -1 & & & & & \\ -3 & 10 & -12 & 6 & -1 & & & & \\ 3 & -12 & 19 & -15 & 6 & -1 & & & \\ -1 & 6 & -15 & 20 & -15 & 6 & -1 & & \\ & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \\ & & -1 & 6 & -15 & 20 & -15 & 6 & -1 \\ & & & -1 & 6 & -15 & 19 & -12 & 3 \\ & & & & -1 & 6 & -12 & 10 & -3 \\ & & & & & -1 & 3 & -3 & 1 \end{pmatrix} \tag{18.5.15} $$

(We'll leave the calculation of cubics and above to the compulsive reader.)

Notice that you can regularize with "closeness to a differential equation," if you want. Just pick B to be the appropriate sum of finite-difference operators (the coefficients can depend on x), and calculate H = B^T·B. You don't need to know the values of your boundary conditions, since B can have fewer rows than columns, as above; hopefully, your data will determine them. Of course, if you do know some boundary conditions, you can build these into B too.
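The recipe "pick B, then form H = B^T·B" is a one-loop computation. Here is a dense-matrix sketch (our own illustration; a real code would exploit the band structure of B and H):

```c
/* Form H = B^T B for an arbitrary (rows x M) difference operator B,
   both stored row-major.  Since B is banded, H is banded too; dense
   storage here is for clarity only. */
void build_H(int rows, int M, const double *B, double *H)
{
    for (int mu = 0; mu < M; mu++)
        for (int rho = 0; rho < M; rho++) {
            double sum = 0.0;
            for (int r = 0; r < rows; r++)
                sum += B[r*M + mu] * B[r*M + rho];
            H[mu*M + rho] = sum;
        }
}
```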
With all the proportionality signs above, you may have lost track of what actual value of λ to try first. A simple trick for at least getting "on the map" is to first try

$$ \lambda = \mathrm{Tr}(\mathbf{A}^T\cdot\mathbf{A})/\mathrm{Tr}(\mathbf{H}) \tag{18.5.16} $$

where Tr is the trace of the matrix (sum of diagonal components). This choice will tend to make the two parts of the minimization have comparable weights, and you can adjust from there.

As for what is the "correct" value of λ, an objective criterion, if you know your errors σᵢ with reasonable accuracy, is to make χ² (that is, |A·u − b|²) equal to N, the number of measurements. We remarked above on the twin acceptable choices N ± (2N)^{1/2}. A subjective criterion is to pick any value that you like in the range 0 < λ < ∞, depending on your relative degree of belief in the a priori and a posteriori evidence. (Yes, people actually do that. Don't blame us.)

Two-Dimensional Problems and Iterative Methods

Up to now our notation has been indicative of a one-dimensional problem, finding u(x) or u_µ = u(x_µ). However, all of the discussion easily generalizes to the problem of estimating a two-dimensional set of unknowns u_{µκ}, µ = 1,...,M, κ = 1,...,K, corresponding, say, to the pixel intensities of a measured image. In this case, equation (18.5.8) is still the one we want to solve.

In image processing, it is usual to have the same number of input pixels in a measured "raw" or "dirty" image as desired "clean" pixels in the processed output image, so the matrices R and A (equation 18.5.5) are square and of size MK × MK. A is typically much too large to represent as a full matrix, but often it is either (i) sparse, with coefficients blurring an underlying pixel (i, j) only into measurements (i ± few, j ± few), or (ii) translationally invariant, so that A_{(i,j)(µ,ν)} = A(i − µ, j − ν). Both of these situations lead to tractable problems.

In the case of translational invariance, fast Fourier transforms (FFTs) are the obvious method of choice. The general linear relation between underlying function and measured values (18.4.7) now becomes a discrete convolution like equation (13.1.1). If k denotes a two-dimensional wave-vector, then the two-dimensional FFT takes us back and forth between the transform pairs

$$ A(i-\mu,\,j-\nu) \Longleftrightarrow \widehat{A}(\mathbf{k}) \qquad b_{(i,j)} \Longleftrightarrow \widehat{b}(\mathbf{k}) \qquad u_{(i,j)} \Longleftrightarrow \widehat{u}(\mathbf{k}) \tag{18.5.17} $$

We also need a regularization or smoothing operator B and the derived H = B^T·B. One popular choice for B is the five-point finite-difference approximation of the Laplacian operator, that is, the difference between the value of each point and the average of its four Cartesian neighbors. In Fourier space, this choice implies

$$ \widehat{B}(\mathbf{k}) \propto \sin^2(\pi k_1/M) + \sin^2(\pi k_2/K) \qquad \widehat{H}(\mathbf{k}) \propto \left[\sin^2(\pi k_1/M) + \sin^2(\pi k_2/K)\right]^2 \tag{18.5.18} $$

In Fourier space, equation (18.5.7) is merely algebraic, with solution

$$ \widehat{u}(\mathbf{k}) = \frac{\widehat{A}^{\,*}(\mathbf{k})\,\widehat{b}(\mathbf{k})}{|\widehat{A}(\mathbf{k})|^2 + \lambda\widehat{H}(\mathbf{k})} \tag{18.5.19} $$

where the asterisk denotes complex conjugation. You can make use of the FFT routines for real data in §12.5.
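In code, the filter (18.5.19) is a single pass over the frequency array. The sketch below is ours (the function name and flat complex-array storage are assumptions); it takes transforms that have already been computed, e.g., with the FFT routines of Chapter 12:

```c
#include <complex.h>

/* Apply u_hat = conj(A_hat) b_hat / (|A_hat|^2 + lambda H_hat),
   equation (18.5.19), over n frequency samples.  H_hat is real and
   nonnegative, so for lambda > 0 the denominator can vanish only
   where A_hat and H_hat are both zero (the k = 0 mode). */
void regularized_deconvolve(int n, const double complex *A_hat,
                            const double complex *b_hat,
                            const double *H_hat, double lambda,
                            double complex *u_hat)
{
    for (int k = 0; k < n; k++) {
        double denom = creal(A_hat[k] * conj(A_hat[k])) + lambda * H_hat[k];
        u_hat[k] = conj(A_hat[k]) * b_hat[k] / denom;
    }
}
```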
Turn now to the case where A is not translationally invariant. Direct solution of (18.5.8) is now hopeless, since the matrix A is just too large. We need some kind of iterative scheme.

One way to proceed is to use the full machinery of the conjugate gradient method in §10.6 to find the minimum of A + λB, equation (18.5.6). Of the various methods in Chapter 10, conjugate gradient is the unique best choice because (i) it does not require storage of a Hessian matrix, which would be infeasible here, and (ii) it does exploit gradient information, which we can readily compute: The gradient of equation (18.5.6) is

$$ \nabla(\mathcal{A} + \lambda\mathcal{B}) = 2\left[(\mathbf{A}^T\cdot\mathbf{A} + \lambda\mathbf{H})\cdot\mathbf{u} - \mathbf{A}^T\cdot\mathbf{b}\right] \tag{18.5.20} $$

(cf. 18.5.8). Evaluation of both the function and the gradient should of course take advantage of the sparsity of A, for example via the routines sprsax and sprstx in §2.7. We will discuss the conjugate gradient technique further in §18.7, in the context of the (nonlinear) maximum entropy method. Some of that discussion can apply here as well.

The conjugate gradient method notwithstanding, application of the unsophisticated steepest descent method (see §10.6) can sometimes produce useful results, particularly when combined with projections onto convex sets (see below). If the solution after k iterations is denoted u^{(k)}, then after k + 1 iterations we have

$$ \mathbf{u}^{(k+1)} = \left[1 - \epsilon(\mathbf{A}^T\cdot\mathbf{A} + \lambda\mathbf{H})\right]\cdot\mathbf{u}^{(k)} + \epsilon\,\mathbf{A}^T\cdot\mathbf{b} \tag{18.5.21} $$

Here ε is a parameter that dictates how far to move in the downhill gradient direction. The method converges when ε is small enough, in particular satisfying

$$ 0 < \epsilon < \frac{2}{\text{max eigenvalue}\,(\mathbf{A}^T\cdot\mathbf{A} + \lambda\mathbf{H})} \tag{18.5.22} $$

There exist complicated schemes for finding optimal values or sequences for ε, see [7]; or, one can adopt an experimental approach, evaluating (18.5.6) to be sure that downhill steps are in fact being taken.
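One step of (18.5.21), written in the algebraically equivalent form u ← u + ε[A^T·(b − A·u) − λH·u], looks like this in C. This is a sketch of ours, with a dense A and the first-difference H of equation (18.5.4) for concreteness; an image code would use the sparse routines mentioned above:

```c
#include <stdlib.h>

/* One steepest-descent step for |A u - b|^2 + lambda u.H.u, with A
   an N x M row-major matrix and H the first-difference operator of
   equation (18.5.4).  Sketch only; no error handling. */
void descent_step(int N, int M, const double *A, const double *b,
                  double lambda, double eps, double *u)
{
    double *resid = malloc((size_t)N * sizeof *resid);
    double *grad  = malloc((size_t)M * sizeof *grad);

    for (int i = 0; i < N; i++) {                 /* resid = b - A u */
        resid[i] = b[i];
        for (int mu = 0; mu < M; mu++) resid[i] -= A[i*M + mu] * u[mu];
    }
    for (int mu = 0; mu < M; mu++) {              /* A^T resid - lambda H u */
        double g = 0.0;
        for (int i = 0; i < N; i++) g += A[i*M + mu] * resid[i];
        double Hu = ((mu == 0 || mu == M-1) ? 1.0 : 2.0) * u[mu];
        if (mu > 0)   Hu -= u[mu-1];
        if (mu < M-1) Hu -= u[mu+1];
        grad[mu] = g - lambda * Hu;
    }
    for (int mu = 0; mu < M; mu++) u[mu] += eps * grad[mu];
    free(resid); free(grad);
}
```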
In those image processing problems where the final measure of success is somewhat subjective (e.g., "how good does the picture look?"), iteration (18.5.21) sometimes produces significantly improved images long before convergence is achieved. This probably accounts for much of its use, since its mathematical convergence is extremely slow. In fact, (18.5.21) can be used with H = 0, in which case the solution is not regularized at all, and full convergence would be disastrous! This is called Van Cittert's method and goes back to the 1930s. A number of iterations on the order of 1000 is not uncommon [7].

Deterministic Constraints: Projections onto Convex Sets

A set of possible underlying functions (or images) {u} is said to be convex if, for any two elements u_a and u_b in the set, all the linearly interpolated combinations

$$ (1-\eta)\,\mathbf{u}_a + \eta\,\mathbf{u}_b \qquad 0 \le \eta \le 1 \tag{18.5.23} $$

are also in the set. Many deterministic constraints that one might want to impose on the solution u to an inverse problem in fact define convex sets, for example:

• positivity
• compact support (i.e., zero value outside of a certain region)
• known bounds (i.e., u_L(x) ≤ u(x) ≤ u_U(x) for specified functions u_L and u_U).

(In this last case, the bounds might be related to an initial estimate and its error bars, e.g., u₀(x) ± γσ(x), where γ is of order 1 or 2.) Notice that these, and similar, constraints can be either in the image space, or in the Fourier transform space, or (in fact) in the space of any linear transformation of u.

If C_i is a convex set, then P_i is called a nonexpansive projection operator onto that set if (i) P_i leaves unchanged any u already in C_i, and (ii) P_i maps any u outside C_i to the closest element of C_i, in the sense that

$$ |P_i\,\mathbf{u} - \mathbf{u}| \le |\mathbf{u}_a - \mathbf{u}| \qquad \text{for all } \mathbf{u}_a \text{ in } C_i \tag{18.5.24} $$

While this definition sounds complicated, examples are very simple: A nonexpansive projection onto the set of positive u's is "set all negative components of u equal to zero." A nonexpansive projection onto the set of u(x)'s bounded by u_L(x) ≤ u(x) ≤ u_U(x) is "set all values less than the lower bound equal to that bound, and set all values greater than the upper bound equal to that bound." A nonexpansive projection onto functions with compact support is "zero the values outside of the region of support."
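Each of these projections is a few lines of C. The following sketch is ours (the constraint arrays are illustrative placeholders); it also composes the three projections into one sweep of the product P₁P₂···P_m that appears in equation (18.5.25) below:

```c
/* Nonexpansive projections for an image u of n pixels. */
void project_positive(int n, double *u)
{
    for (int k = 0; k < n; k++)
        if (u[k] < 0.0) u[k] = 0.0;
}

void project_bounds(int n, const double *lo, const double *hi, double *u)
{
    for (int k = 0; k < n; k++) {
        if (u[k] < lo[k]) u[k] = lo[k];
        if (u[k] > hi[k]) u[k] = hi[k];
    }
}

void project_support(int n, const unsigned char *inside, double *u)
{
    for (int k = 0; k < n; k++)
        if (!inside[k]) u[k] = 0.0;   /* zero outside the support mask */
}

/* One sweep (P1 P2 P3) u, applying each projection in turn. */
void pocs_sweep(int n, const double *lo, const double *hi,
                const unsigned char *inside, double *u)
{
    project_support(n, inside, u);
    project_bounds(n, lo, hi, u);
    project_positive(n, u);
}
```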
The usefulness of these definitions is the following remarkable theorem: Let C be the intersection of m convex sets C₁, C₂, ..., C_m. Then the iteration

$$ \mathbf{u}^{(k+1)} = (P_1 P_2 \cdots P_m)\,\mathbf{u}^{(k)} \tag{18.5.25} $$

will converge to C from all starting points, as k → ∞. Also, if C is empty (there is no intersection), then the iteration will have no limit point. Application of this theorem is called the method of projections onto convex sets or sometimes POCS [7].

A generalization of the POCS theorem is that the P_i's can be replaced by a set of T_i's,

$$ T_i \equiv 1 + \beta_i(P_i - 1) \qquad 0 < \beta_i < 2 \tag{18.5.26} $$

A well-chosen set of β_i's can accelerate the convergence to the intersection set C.

Some inverse problems can be completely solved by iteration (18.5.25) alone! For example, a problem that occurs in both astronomical imaging and X-ray diffraction work is to recover an image given only the modulus of its Fourier transform (equivalent to its power spectrum or autocorrelation) and not the phase. Here three convex sets can be utilized: the set of all images whose Fourier transform has the specified modulus to within specified error bounds; the set of all positive images; and the set of all images with zero intensity outside of some specified region. In this case the POCS iteration (18.5.25) cycles among these three, imposing each constraint in turn; FFTs are used to get in and out of Fourier space each time the Fourier constraint is imposed.

The specific application of POCS to constraints alternately in the spatial and Fourier domains is also known as the Gerchberg-Saxton algorithm [8]. While this algorithm is nonexpansive, and is frequently convergent in practice, it has not been proved to converge in all cases [9]. In the phase-retrieval problem mentioned above, the algorithm often "gets stuck" on a plateau for many iterations before making sudden, dramatic improvements. As many as 10⁴ to 10⁵ iterations are sometimes necessary. (For "unsticking" procedures, see [10].) The uniqueness of the solution is also not well understood, although for two-dimensional images of reasonable complexity it is believed to be unique.

Deterministic constraints can be incorporated, via projection operators, into iterative methods of linear regularization. In particular, rearranging terms somewhat, we can write the iteration (18.5.21) as

$$ \mathbf{u}^{(k+1)} = (1 - \epsilon\lambda\mathbf{H})\cdot\mathbf{u}^{(k)} + \epsilon\,\mathbf{A}^T\cdot(\mathbf{b} - \mathbf{A}\cdot\mathbf{u}^{(k)}) \tag{18.5.27} $$

If the iteration is modified by the insertion of projection operators at each step,

$$ \mathbf{u}^{(k+1)} = (P_1 P_2 \cdots P_m)\left[(1 - \epsilon\lambda\mathbf{H})\cdot\mathbf{u}^{(k)} + \epsilon\,\mathbf{A}^T\cdot(\mathbf{b} - \mathbf{A}\cdot\mathbf{u}^{(k)})\right] \tag{18.5.28} $$

(or, instead of P_i's, the T_i operators of equation 18.5.26), then it can be shown that the convergence condition (18.5.22) is unmodified, and the iteration will converge to minimize the quadratic functional (18.5.6) subject to the desired nonlinear deterministic constraints. See [7] for references to more sophisticated, and faster converging, iterations along these lines.
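Putting the pieces together, one pass of the projected iteration (18.5.28) is just a regularized gradient step followed by a projection sweep. This sketch reuses the hypothetical descent_step and pocs_sweep routines sketched earlier:

```c
/* Prototypes of the sketches given earlier in this section. */
void descent_step(int N, int M, const double *A, const double *b,
                  double lambda, double eps, double *u);
void pocs_sweep(int n, const double *lo, const double *hi,
                const unsigned char *inside, double *u);

/* One iteration of equation (18.5.28): gradient step, then project. */
void constrained_step(int N, int M, const double *A, const double *b,
                      double lambda, double eps,
                      const double *lo, const double *hi,
                      const unsigned char *inside, double *u)
{
    descent_step(N, M, A, b, lambda, eps, u);   /* (18.5.27) */
    pocs_sweep(M, lo, hi, inside, u);           /* P1 P2 ... Pm */
}
```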
CITED REFERENCES AND FURTHER READING:

Phillips, D.L. 1962, Journal of the Association for Computing Machinery, vol. 9, pp. 84–97. [1]

Twomey, S. 1963, Journal of the Association for Computing Machinery, vol. 10, pp. 97–101. [2]

Twomey, S. 1977, Introduction to the Mathematics of Inversion in Remote Sensing and Indirect Measurements (Amsterdam: Elsevier). [3]

Craig, I.J.D., and Brown, J.C. 1986, Inverse Problems in Astronomy (Bristol, U.K.: Adam Hilger). [4]

Tikhonov, A.N., and Arsenin, V.Y. 1977, Solutions of Ill-Posed Problems (New York: Wiley). [5]

Tikhonov, A.N., and Goncharsky, A.V. (eds.) 1987, Ill-Posed Problems in the Natural Sciences (Moscow: MIR).

Miller, K. 1970, SIAM Journal on Mathematical Analysis, vol. 1, pp. 52–74. [6]

Schafer, R.W., Mersereau, R.M., and Richards, M.A. 1981, Proceedings of the IEEE, vol. 69, pp. 432–450.

Biemond, J., Lagendijk, R.L., and Mersereau, R.M. 1990, Proceedings of the IEEE, vol. 78, pp. 856–883. [7]

Gerchberg, R.W., and Saxton, W.O. 1972, Optik, vol. 35, pp. 237–246. [8]

Fienup, J.R. 1982, Applied Optics, vol. 21, pp. 2758–2769. [9]

Fienup, J.R., and Wackerman, C.C. 1986, Journal of the Optical Society of America A, vol. 3, pp. 1897–1907. [10]

18.6 Backus-Gilbert Method

The Backus-Gilbert method [1,2] (see, e.g., [3] or [4] for summaries) differs from other regularization methods in the nature of its functionals A and B. For B, the method seeks to maximize the stability of the solution u(x) rather than, in the first instance, its smoothness. That is,

$$ \mathcal{B} \equiv \mathrm{Var}\left[\widehat{u}(x)\right] \tag{18.6.1} $$