(Don't let this notation mislead you into inverting the full matrix W(x) + λS. You only need to solve for some y the linear system (W(x) + λS) · y = R, and then substitute y into both the numerators and denominators of 18.6.12 or 18.6.13.)

Equations (18.6.12) and (18.6.13) have a completely different character from the linearly regularized solutions to (18.5.7) and (18.5.8). The vectors and matrices in (18.6.12) all have size N, the number of measurements. There is no discretization of the underlying variable x, so M does not come into play at all. One solves a different N × N set of linear equations for each desired value of x. By contrast, in (18.5.8), one solves an M × M linear set, but only once. In general, the computational burden of repeatedly solving linear systems makes the Backus-Gilbert method unsuitable for other than one-dimensional problems.

How does one choose λ within the Backus-Gilbert scheme? As already mentioned, you can (in some cases should) make the choice before you see any actual data. For a given trial value of λ, and for a sequence of x's, use equation (18.6.12) to calculate q̂(x); then use equation (18.6.6) to plot the resolution functions δ̂(x, x′) as a function of x′. These plots will exhibit the amplitude with which different underlying values x′ contribute to the point û(x) of your estimate. For the same value of λ, also plot the function Var[û(x)] using equation (18.6.8). (You need an estimate of your measurement covariance matrix for this.)

As you change λ you will see very explicitly the trade-off between resolution and stability. Pick the value that meets your needs. You can even choose λ to be a function of x, λ = λ(x), in equations (18.6.12) and (18.6.13), should you desire to do so. (This is one benefit of solving a separate set of equations for each x.) For the chosen value or values of λ, you now have a quantitative understanding of your inverse solution procedure. This can prove invaluable if, once you are processing real data, you need to judge whether a particular feature, a spike or jump for example, is genuine, and/or is actually resolved. The Backus-Gilbert method has found particular success among geophysicists, who use it to obtain information about the structure of the Earth (e.g., density run with depth) from seismic travel time data.
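As a concrete illustration of the trade-off study just described, the sketch below (our own, not a routine from this book) assumes that the ingredients of equations (18.6.8) and (18.6.12) for the current value of x are already stored in ordinary C arrays: the vector R of integrated responses, the spread matrix W(x), the measurement covariance matrix S, and the data c. For each trial λ it solves (W(x) + λS) · y = R with a small Gaussian-elimination routine, normalizes to obtain q̂(x), and prints the estimate û(x) = q̂ · c together with Var[û(x)] = q̂ · S · q̂, so the resolution-versus-variance trade-off can be tabulated.

```c
#include <stdio.h>
#include <math.h>

#define N 4   /* number of measurements; illustrative size */

/* Solve the N x N system a.y = b by Gaussian elimination with partial
   pivoting.  a and b are destroyed; the solution is returned in y.     */
static void solve(double a[N][N], double b[N], double y[N])
{
    int i, j, k, piv;
    double t, m;

    for (k = 0; k < N; k++) {
        piv = k;
        for (i = k + 1; i < N; i++)
            if (fabs(a[i][k]) > fabs(a[piv][k])) piv = i;
        for (j = 0; j < N; j++) { t = a[k][j]; a[k][j] = a[piv][j]; a[piv][j] = t; }
        t = b[k]; b[k] = b[piv]; b[piv] = t;
        for (i = k + 1; i < N; i++) {
            m = a[i][k]/a[k][k];
            for (j = k; j < N; j++) a[i][j] -= m*a[k][j];
            b[i] -= m*b[k];
        }
    }
    for (i = N - 1; i >= 0; i--) {
        t = b[i];
        for (j = i + 1; j < N; j++) t -= a[i][j]*y[j];
        y[i] = t/a[i][i];
    }
}

/* Tabulate the resolution/variance trade-off at one value of x, given the
   spread matrix W(x), covariance S, response integrals R, and data c.     */
void bg_tradeoff(double W[N][N], double S[N][N], double R[N], double c[N],
                 double *lambda, int nlam)
{
    int i, j, k;
    double a[N][N], b[N], y[N], q[N], denom, uhat, var;

    for (k = 0; k < nlam; k++) {
        for (i = 0; i < N; i++) {                     /* form W(x) + lambda*S */
            for (j = 0; j < N; j++) a[i][j] = W[i][j] + lambda[k]*S[i][j];
            b[i] = R[i];                              /* solve() destroys b   */
        }
        solve(a, b, y);                    /* y = (W + lambda*S)^{-1} . R     */
        denom = 0.0;
        for (i = 0; i < N; i++) denom += R[i]*y[i];
        for (i = 0; i < N; i++) q[i] = y[i]/denom;    /* q-hat, eq. (18.6.12) */
        uhat = var = 0.0;
        for (i = 0; i < N; i++) {
            uhat += q[i]*c[i];                        /* u-hat(x) = q . c     */
            for (j = 0; j < N; j++) var += q[i]*S[i][j]*q[j];  /* eq. (18.6.8) */
        }
        printf("lambda = %-10g  uhat = %-12g  Var[uhat] = %g\n",
               lambda[k], uhat, var);
    }
}
```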
CITED REFERENCES AND FURTHER READING:
Backus, G.E., and Gilbert, F. 1968, Geophysical Journal of the Royal Astronomical Society, vol. 16, pp. 169–205. [1]
Backus, G.E., and Gilbert, F. 1970, Philosophical Transactions of the Royal Society of London A, vol. 266, pp. 123–192. [2]
Parker, R.L. 1977, Annual Review of Earth and Planetary Science, vol. 5, pp. 35–64. [3]
Loredo, T.J., and Epstein, R.I. 1989, Astrophysical Journal, vol. 336, pp. 896–919. [4]

18.7 Maximum Entropy Image Restoration

Above, we commented that the association of certain inversion methods with Bayesian arguments is more historical accident than intellectual imperative. Maximum entropy methods, so-called, are notorious in this regard; to summarize these methods without some, at least introductory, Bayesian invocations would be to serve a steak without the sizzle, or a sundae without the cherry. We should
also comment in passing that the connection between maximum entropy inversion methods, considered here, and maximum entropy spectral estimation, discussed in §13.7, is rather abstract. For practical purposes the two techniques, though both named maximum entropy method or MEM, are unrelated.

Bayes' Theorem, which follows from the standard axioms of probability, relates the conditional probabilities of two events, say A and B:

$$\mathrm{Prob}(A|B) = \frac{\mathrm{Prob}(A)\,\mathrm{Prob}(B|A)}{\mathrm{Prob}(B)} \qquad (18.7.1)$$

Here Prob(A|B) is the probability of A given that B has occurred, and similarly for Prob(B|A), while Prob(A) and Prob(B) are unconditional probabilities.

"Bayesians" (so-called) adopt a broader interpretation of probabilities than do so-called "frequentists." To a Bayesian, P(A|B) is a measure of the degree of plausibility of A (given B) on a scale ranging from zero to one. In this broader view, A and B need not be repeatable events; they can be propositions or hypotheses. The equations of probability theory then become a set of consistent rules for conducting inference [1,2]. Since plausibility is itself always conditioned on some, perhaps unarticulated, set of assumptions, all Bayesian probabilities are viewed as conditional on some collective background information I.

Suppose H is some hypothesis. Even before there exist any explicit data, a Bayesian can assign to H some degree of plausibility Prob(H|I), called the "Bayesian prior." Now, when some data D₁ comes along, Bayes' theorem tells how to reassess the plausibility of H,

$$\mathrm{Prob}(H|D_1 I) = \frac{\mathrm{Prob}(H|I)\,\mathrm{Prob}(D_1|HI)}{\mathrm{Prob}(D_1|I)} \qquad (18.7.2)$$

The factor in the numerator on the right of equation (18.7.2) is calculable as the probability of a data set given the hypothesis (compare with "likelihood" in §15.1). The denominator, called the "prior predictive probability" of the data, is in this case merely a normalization constant which can be calculated by the requirement that the probability of all hypotheses should sum to unity. (In other Bayesian contexts, the prior predictive probabilities of two qualitatively different models can be used to assess their relative plausibility.)

If some additional data D₂ comes along tomorrow, we can further refine our estimate of H's probability, as

$$\mathrm{Prob}(H|D_2 D_1 I) = \frac{\mathrm{Prob}(H|D_1 I)\,\mathrm{Prob}(D_2|H D_1 I)}{\mathrm{Prob}(D_2|D_1 I)} \qquad (18.7.3)$$

Using the product rule for probabilities, Prob(AB|C) = Prob(A|C) Prob(B|AC), we find that equations (18.7.2) and (18.7.3) imply

$$\mathrm{Prob}(H|D_2 D_1 I) = \frac{\mathrm{Prob}(H|I)\,\mathrm{Prob}(D_2 D_1|HI)}{\mathrm{Prob}(D_2 D_1|I)} \qquad (18.7.4)$$

which shows that we would have gotten the same answer if all the data D₁D₂ had been taken together.
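As a quick numerical check of equations (18.7.2) through (18.7.4), the toy program below (our own illustration, with invented numbers) updates the plausibility of a hypothesis H first with data D₁ and then with D₂, and confirms that the result is identical to a single update using the joint data D₂D₁.

```c
#include <stdio.h>

int main(void)
{
    /* Two exhaustive hypotheses H and notH; all numbers are invented
       purely to illustrate equations (18.7.2)-(18.7.4). */
    double pH = 0.5, pnotH = 0.5;            /* Prob(H|I), Prob(notH|I)       */
    double pD1_H = 0.8, pD1_notH = 0.3;      /* Prob(D1|HI), Prob(D1|notH I)  */
    double pD2_HD1 = 0.6, pD2_notHD1 = 0.5;  /* Prob(D2|H D1 I), etc.         */

    /* Update with D1 (equation 18.7.2); the denominator Prob(D1|I) is the
       normalization over both hypotheses. */
    double pD1      = pD1_H*pH + pD1_notH*pnotH;
    double pH_D1    = pH*pD1_H/pD1;
    double pnotH_D1 = pnotH*pD1_notH/pD1;

    /* Update with D2 (equation 18.7.3). */
    double pD2_D1  = pD2_HD1*pH_D1 + pD2_notHD1*pnotH_D1;
    double pH_D2D1 = pH_D1*pD2_HD1/pD2_D1;

    /* One-shot update with the joint data (equation 18.7.4), using the
       product rule Prob(D2 D1|HI) = Prob(D1|HI) Prob(D2|H D1 I). */
    double pD2D1_H    = pD1_H*pD2_HD1;
    double pD2D1_notH = pD1_notH*pD2_notHD1;
    double pD2D1      = pD2D1_H*pH + pD2D1_notH*pnotH;
    double pH_joint   = pH*pD2D1_H/pD2D1;

    printf("sequential: %.6f   joint: %.6f\n", pH_D2D1, pH_joint);
    return 0;
}
```

Both print 0.761905 for these inputs, as equation (18.7.4) promises.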
From a Bayesian perspective, inverse problems are inference problems [3,4]. The underlying parameter set u is a hypothesis whose probability, given the measured data values c and the Bayesian prior Prob(u|I), can be calculated. We might want to report a single "best" inverse u, the one that maximizes

$$\mathrm{Prob}(u|cI) = \frac{\mathrm{Prob}(c|uI)\,\mathrm{Prob}(u|I)}{\mathrm{Prob}(c|I)} \qquad (18.7.5)$$

over all possible choices of u. Bayesian analysis also admits the possibility of reporting additional information that characterizes the region of possible u's with high relative probability, the so-called "posterior bubble" in u.

The calculation of the probability of the data c, given the hypothesis u, proceeds exactly as in the maximum likelihood method. For Gaussian errors, e.g., it is given by

$$\mathrm{Prob}(c|uI) = \exp\!\left(-\tfrac{1}{2}\chi^2\right)\Delta u_1\,\Delta u_2 \cdots \Delta u_M \qquad (18.7.6)$$

where χ² is calculated from u and c using equation (18.4.9), and the Δu_µ's are constant, small ranges of the components of u whose actual magnitude is irrelevant, because they do not depend on u (compare equations 15.1.3 and 15.1.4).

In maximum likelihood estimation we, in effect, chose the prior Prob(u|I) to be constant. That was a luxury that we could afford when estimating a small number of parameters from a large amount of data. Here, the number of "parameters" (components of u) is comparable to or larger than the number of measured values (components of c); we need a nontrivial prior, Prob(u|I), to resolve the degeneracy of the solution.

In maximum entropy image restoration, that is where entropy comes in. The entropy of a physical system in some macroscopic state, usually denoted S, is the logarithm of the number of microscopically distinct configurations that all have the same macroscopic observables (i.e., consistent with the observed macroscopic state). Actually, we will find it useful to denote the negative of the entropy, also called the negentropy, by H ≡ −S (a notation that goes back to Boltzmann). In situations where there is reason to believe that the a priori probabilities of the microscopic configurations are all the same (these situations are called ergodic), the Bayesian prior Prob(u|I) for a macroscopic state with entropy S is proportional to exp(S) or exp(−H).

MEM uses this concept to assign a prior probability to any given underlying function u. For example [5-7], suppose that the measurement of luminance in each pixel is quantized to (in some units) an integer value. Let

$$U = \sum_{\mu=1}^{M} u_\mu \qquad (18.7.7)$$

be the total number of luminance quanta in the whole image. Then we can base our "prior" on the notion that each luminance quantum has an equal a priori chance of being in any pixel. (See [8] for a more abstract justification of this idea.) The number of ways of getting a particular configuration u is

$$\frac{U!}{u_1!\,u_2!\cdots u_M!} \;\propto\; \exp\!\left[-\sum_\mu u_\mu \ln(u_\mu/U) + \tfrac{1}{2}\Bigl(\ln U - \sum_\mu \ln u_\mu\Bigr)\right] \qquad (18.7.8)$$

Here the left side can be understood as the number of distinct orderings of all the luminance quanta, divided by the numbers of equivalent reorderings within each pixel, while the right side follows by Stirling's approximation to the factorial function.
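The Stirling approximation behind equation (18.7.8) is easy to verify numerically. The fragment below (our own illustration, with made-up pixel values) compares the exact logarithm of the multinomial coefficient, computed with lgamma(), against the dominant term −Σ_µ u_µ ln(u_µ/U); the two agree up to corrections that are small compared with the leading terms of order U, which is exactly what the next step in the text neglects.

```c
#include <stdio.h>
#include <math.h>

int main(void)
{
    /* A small "image": luminance quanta per pixel (illustrative values). */
    double u[] = {120, 30, 5, 250, 60, 35};
    int M = sizeof(u)/sizeof(u[0]), mu;
    double U = 0.0, exact = 0.0, negent = 0.0;

    for (mu = 0; mu < M; mu++) U += u[mu];

    /* Exact log of U!/(u_1! ... u_M!) via the log-gamma function. */
    exact = lgamma(U + 1.0);
    for (mu = 0; mu < M; mu++) exact -= lgamma(u[mu] + 1.0);

    /* Dominant Stirling term: minus the negentropy of equation (18.7.9). */
    for (mu = 0; mu < M; mu++) negent += u[mu]*log(u[mu]/U);

    printf("ln(multinomial) = %.3f   -H(u) = %.3f   difference = %.3f\n",
           exact, -negent, exact + negent);
    return 0;
}
```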
Taking the negative of the logarithm, and neglecting terms of order log U in the presence of terms of order U, we get the negentropy

$$H(u) = \sum_{\mu=1}^{M} u_\mu \ln(u_\mu/U) \qquad (18.7.9)$$

From equations (18.7.5), (18.7.6), and (18.7.9) we now seek to maximize

$$\mathrm{Prob}(u|c) \;\propto\; \exp\!\left(-\tfrac{1}{2}\chi^2\right)\exp[-H(u)] \qquad (18.7.10)$$

or, equivalently,

$$\text{minimize:}\quad -\ln\left[\mathrm{Prob}(u|c)\right] = \tfrac{1}{2}\chi^2[u] + H(u) = \tfrac{1}{2}\chi^2[u] + \sum_{\mu=1}^{M} u_\mu \ln(u_\mu/U) \qquad (18.7.11)$$

This ought to remind you of equation (18.4.11), or equation (18.5.6), or in fact any of our previous minimization principles along the lines of A + λB, where λB = H(u) is a regularizing operator. Where is λ? We need to put it in for exactly the reason discussed following equation (18.4.11): Degenerate inversions are likely to be able to achieve unrealistically small values of χ². We need an adjustable parameter to bring χ² into its expected narrow statistical range of N ± (2N)^{1/2}. The discussion at the beginning of §18.4 showed that it makes no difference which term we attach the λ to. For consistency in notation, we absorb a factor 2 into λ and put it on the entropy term. (Another way to see the necessity of an undetermined λ factor is to note that it is necessary if our minimization principle is to be invariant under changing the units in which u is quantized, e.g., if an 8-bit analog-to-digital converter is replaced by a 12-bit one.) We can now also put "hats" back to indicate that this is the procedure for obtaining our chosen statistical estimator:

$$\text{minimize:}\quad A + \lambda B = \chi^2[\hat u] + \lambda H(\hat u) = \chi^2[\hat u] + \lambda \sum_{\mu=1}^{M} \hat u_\mu \ln \hat u_\mu \qquad (18.7.12)$$

(Formally, we might also add a second Lagrange multiplier λ′U, to constrain the total intensity U to be constant.)
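To make the minimization principle (18.7.12) concrete, here is a small evaluation routine (an illustration only, with an explicitly stored response matrix and invented data). It computes A + λB = χ²[u] + λ Σ_µ u_µ ln u_µ for a trial image u, with χ² formed, as in equation (18.4.9), from the noise-scaled residuals between the data c_i and the model predictions Σ_µ R_iµ u_µ; independent errors σ_i are assumed here.

```c
#include <stdio.h>
#include <math.h>

#define N 3   /* number of measurements  */
#define M 4   /* number of image pixels  */

/* Evaluate the MEM objective chi^2[u] + lambda * sum_mu u_mu ln u_mu
   (equation 18.7.12) for a trial image u.  R, c, sigma are illustrative. */
double mem_objective(double R[N][M], double c[N], double sigma[N],
                     double u[M], double lambda)
{
    int i, mu;
    double chisq = 0.0, H = 0.0;

    for (i = 0; i < N; i++) {              /* chi^2 from scaled residuals   */
        double pred = 0.0, resid;
        for (mu = 0; mu < M; mu++) pred += R[i][mu]*u[mu];
        resid = (c[i] - pred)/sigma[i];
        chisq += resid*resid;
    }
    for (mu = 0; mu < M; mu++)             /* entropy term; u_mu must be > 0 */
        H += u[mu]*log(u[mu]);

    return chisq + lambda*H;
}

int main(void)
{
    double R[N][M] = {{1,1,0,0},{0,1,1,0},{0,0,1,1}};  /* made-up kernels */
    double c[N] = {5.0, 7.0, 6.0};
    double sigma[N] = {0.5, 0.5, 0.5};
    double u[M] = {2.0, 3.0, 4.0, 2.0};                /* trial image     */

    printf("objective = %g\n", mem_objective(R, c, sigma, u, 1.0));
    return 0;
}
```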
It is not hard to see that the negentropy, H(u), is in fact a regularizing operator, similar to u·u (equation 18.4.11) or u·H·u (equation 18.5.6). The following of its properties are noteworthy:

1. When U is held constant, H(u) is minimized for u_µ = U/M = constant, so it smooths in the sense of trying to achieve a constant solution, similar to equation (18.5.4). The fact that the constant solution is a minimum follows from the fact that the second derivative of u ln u is positive.

2. Unlike equation (18.5.4), however, H(u) is local, in the sense that it does not difference neighboring pixels. It simply sums some function f, here

$$f(u) = u \ln u \qquad (18.7.13)$$

over all pixels; it is invariant, in fact, under a complete scrambling of the pixels in an image. This form implies that H(u) is not seriously increased by the occurrence of a small number of very bright pixels (point sources) embedded in a low-intensity smooth background.

3. H(u) goes to infinite slope as any one pixel goes to zero. This causes it to enforce positivity of the image, without the necessity of additional deterministic constraints.

4. The biggest difference between H(u) and the other regularizing operators that we have met is that H(u) is not a quadratic functional of u, so the equations obtained by varying equation (18.7.12) are nonlinear. This fact is itself worthy of some additional discussion.

Nonlinear equations are harder to solve than linear equations. For image processing, however, the large number of equations usually dictates an iterative solution procedure, even for linear equations, so the practical effect of the nonlinearity is somewhat mitigated. Below, we will summarize some of the methods that are successfully used for MEM inverse problems.

For some problems, notably the problem in radio-astronomy of image recovery from an incomplete set of Fourier coefficients, the superior performance of MEM inversion can be, in part, traced to the nonlinearity of H(u). One way to see this [5] is to consider the limit of perfect measurements σ_i → 0. In this case the χ² term in the minimization principle (18.7.12) gets replaced by a set of constraints, each with its own Lagrange multiplier, requiring agreement between model and data; that is,

$$\text{minimize:}\quad \sum_j \lambda_j \Bigl(c_j - \sum_\mu R_{j\mu} u_\mu\Bigr) + H(u) \qquad (18.7.14)$$

(cf. equation 18.4.7). Setting the formal derivative with respect to u_µ to zero gives

$$\frac{\partial H}{\partial u_\mu} = f'(u_\mu) = \sum_j \lambda_j R_{j\mu} \qquad (18.7.15)$$

or, defining a function G as the inverse function of f′,

$$u_\mu = G\Bigl(\sum_j \lambda_j R_{j\mu}\Bigr) \qquad (18.7.16)$$

This solution is only formal, since the λ_j's must be found by requiring that equation (18.7.16) satisfy all the constraints built into equation (18.7.14). However, equation (18.7.16) does show the crucial fact that if G is linear, then the solution u contains only a linear combination of basis functions R_jµ corresponding to actual measurements j. This is equivalent to setting unmeasured c_j's to zero. Notice that the principal solution obtained from equation (18.4.11) in fact has a linear G.
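For the entropy form f(u) = u ln u the function G is explicit: f′(u) = 1 + ln u, so its inverse is G(z) = exp(z - 1), which is manifestly nonlinear. The tiny fragment below (illustrative only) confirms numerically that G undoes f′ and evaluates the slope G′(z) = exp(z - 1) at a small and a large argument, anticipating the behavior of the slope of G discussed in the next paragraph.

```c
#include <stdio.h>
#include <math.h>

/* For f(u) = u ln u:  f'(u) = 1 + ln u, so G, the inverse of f',
   is G(z) = exp(z - 1), an explicitly nonlinear function.          */
static double fprime(double u) { return 1.0 + log(u); }
static double G(double z)      { return exp(z - 1.0); }

int main(void)
{
    double u;
    for (u = 0.1; u <= 100.0; u *= 10.0)       /* G undoes f' */
        printf("u = %8.3f   G(f'(u)) = %8.3f\n", u, G(fprime(u)));
    /* The slope G'(z) = exp(z-1) is small for small (very negative)
       arguments and large for large ones.                            */
    printf("G'(-3) = %.4f   G'(3) = %.4f\n", G(-3.0), G(3.0));
    return 0;
}
```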
In the problem of incomplete Fourier image reconstruction, the typical R_jµ has the form exp(−2πi k_j · x_µ), where x_µ is a two-dimensional vector in the image space and k_j is a two-dimensional wave-vector. If an image contains strong point sources, then the effect of setting unmeasured c_j's to zero is to produce sidelobe ripples throughout the image plane. These ripples can mask any actual extended, low-intensity image features lying between the point sources. If, however, the slope of G is smaller for small values of its argument, larger for large values, then ripples in low-intensity portions of the image are relatively suppressed, while strong point sources will be relatively sharpened ("superresolution"). This behavior of the slope of G is equivalent to requiring f‴(u) < 0. For f(u) = u ln u, we in fact have f‴(u) = −1/u² < 0.

In more picturesque language, the nonlinearity acts to "create" nonzero values for the unmeasured c_j's, so as to suppress the low-intensity ripple and sharpen the point sources.

Is MEM Really Magical?

How unique is the negentropy functional (18.7.9)? Recall that that equation is based on the assumption that luminance elements are a priori distributed over the pixels uniformly. If we instead had some other preferred a priori image in mind, one with pixel intensities m_µ, then it is easy to show that the negentropy becomes

$$H(u) = \sum_{\mu=1}^{M} u_\mu \ln(u_\mu/m_\mu) + \text{constant} \qquad (18.7.17)$$

(the constant can then be ignored). All the rest of the discussion then goes through.

More fundamentally, and despite statements by zealots to the contrary [7], there is actually nothing universal about the functional form f(u) = u ln u. In some other physical situations (for example, the entropy of an electromagnetic field in the limit of many photons per mode, as in radio-astronomy) the physical negentropy functional is actually f(u) = −ln u (see [5] for other examples). In general, the question, "Entropy of what?" is not uniquely answerable in any particular situation. (See reference [9] for an attempt at articulating a more general principle that reduces to one or another entropy functional under appropriate circumstances.)

The four numbered properties summarized above, plus the desirable sign for the nonlinearity, f‴(u) < 0, are all as true for f(u) = −ln u as for f(u) = u ln u. In fact these properties are shared by a nonlinear function as simple as f(u) = −√u, which has no information theoretic justification at all (no logarithms!). MEM reconstructions of test images using any of these entropy forms are virtually indistinguishable [5].

By all available evidence, MEM seems to be neither more nor less than one usefully nonlinear version of the general regularization scheme A + λB that we have by now considered in many forms.
Its peculiarities become strengths when applied to the reconstruction from incomplete Fourier data of images that are expected to be dominated by very bright point sources, but which also contain interesting low-intensity, extended sources. For images of some other character, there is no reason to suppose that MEM methods will generally dominate other regularization schemes, either ones already known or yet to be invented.
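The claim that quite different f's behave alike is easy to explore numerically. The fragment below (our own illustration) tabulates f″ and f‴ for the three forms just mentioned, u ln u, −ln u, and −√u, confirming that each is convex (f″ > 0, so the constant image is a minimum of B) with f‴ < 0, the sharpening condition discussed above.

```c
#include <stdio.h>
#include <math.h>

/* Second and third derivatives of the three entropy forms in the text.
   Indices: 0 = u ln u, 1 = -ln u, 2 = -sqrt(u). */
static double f2(int which, double u)   /* f''(u) */
{
    switch (which) {
    case 0:  return 1.0/u;
    case 1:  return 1.0/(u*u);
    default: return 0.25/pow(u, 1.5);
    }
}
static double f3(int which, double u)   /* f'''(u) */
{
    switch (which) {
    case 0:  return -1.0/(u*u);
    case 1:  return -2.0/(u*u*u);
    default: return -0.375/pow(u, 2.5);
    }
}

int main(void)
{
    const char *name[] = {"u ln u", "-ln u", "-sqrt(u)"};
    double u;
    int k;

    for (k = 0; k < 3; k++) {
        printf("f(u) = %-9s:", name[k]);
        for (u = 0.5; u <= 8.0; u *= 2.0)
            printf("  f''=%7.4f f'''=%8.4f", f2(k, u), f3(k, u));
        printf("\n");
    }
    return 0;
}
```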
Algorithms for MEM

The goal is to find the vector u that minimizes A + λB where, in the notation of equations (18.5.5), (18.5.6), and (18.7.13),

$$A = |\mathbf{b} - \mathbf{A}\cdot u|^2 \qquad B = \sum_\mu f(u_\mu) \qquad (18.7.18)$$

Compared with a "general" minimization problem, we have the advantage that we can compute the gradients and the second partial derivative matrices (Hessian matrices) explicitly,

$$\nabla A = 2(\mathbf{A}^T\cdot\mathbf{A}\cdot u - \mathbf{A}^T\cdot b) \qquad \frac{\partial^2 A}{\partial u_\mu\,\partial u_\rho} = [2\mathbf{A}^T\cdot\mathbf{A}]_{\mu\rho}$$
$$[\nabla B]_\mu = f'(u_\mu) \qquad \frac{\partial^2 B}{\partial u_\mu\,\partial u_\rho} = \delta_{\mu\rho}\, f''(u_\mu) \qquad (18.7.19)$$

It is important to note that while A's second partial derivative matrix cannot be stored (its size is the square of the number of pixels), it can be applied to any vector by first applying A, then A^T. In the case of reconstruction from incomplete Fourier data, or in the case of convolution with a translation invariant point spread function, these applications will typically involve several FFTs. Likewise, the calculation of the gradient ∇A will involve FFTs in the application of A and A^T.
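For a small, explicitly stored response matrix the gradients of equation (18.7.19) look like the sketch below. This is an illustration only: it applies A^T·A·u by two successive matrix-vector products, which is exactly the structure one would replace by FFT-based convolutions for an image-sized problem, and it takes f(u) = u ln u so that f′(u) = 1 + ln u.

```c
#include <math.h>

/* Compute gradA[mu] = 2 (A^T.A.u - A^T.b)_mu  and  gradB[mu] = 1 + ln u_mu
   (equation 18.7.19, with f(u) = u ln u).  The n x m response matrix is
   stored row-major in a[i*m + mu]; w is caller-supplied workspace of
   length n.  In a real image-restoration problem the two matrix
   applications below would be FFT-based rather than explicit loops.      */
void mem_gradients(int n, int m, const double *a, const double *b,
                   const double *u, double *w, double *gradA, double *gradB)
{
    int i, mu;

    for (i = 0; i < n; i++) {            /* w = A.u - b      */
        double s = 0.0;
        for (mu = 0; mu < m; mu++) s += a[i*m + mu]*u[mu];
        w[i] = s - b[i];
    }
    for (mu = 0; mu < m; mu++) {         /* gradA = 2 A^T.w  */
        double s = 0.0;
        for (i = 0; i < n; i++) s += a[i*m + mu]*w[i];
        gradA[mu] = 2.0*s;
        gradB[mu] = 1.0 + log(u[mu]);    /* f'(u) = 1 + ln u */
    }
}
```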
While some success has been achieved with the classical conjugate gradient method (§10.6), it is often found that the nonlinearity in f(u) = u ln u causes problems. Attempted steps that give u with even one negative value must be cut in magnitude, sometimes so severely as to slow the solution to a crawl. The underlying problem is that the conjugate gradient method develops its information about the inverse of the Hessian matrix a bit at a time, while changing its location in the search space. When a nonlinear function is quite different from a pure quadratic form, the old information becomes obsolete before it gets usefully exploited.

Skilling and collaborators [6,7,10,11] developed a complicated but highly successful scheme, wherein a minimum is repeatedly sought not along a single search direction, but in a small- (typically three-) dimensional subspace, spanned by vectors that are calculated anew at each landing point. The subspace basis vectors are chosen in such a way as to avoid directions leading to negative values. One of the most successful choices is the three-dimensional subspace spanned by the vectors with components given by

$$e^{(1)}_\mu = u_\mu [\nabla A]_\mu \qquad e^{(2)}_\mu = u_\mu [\nabla B]_\mu$$
$$e^{(3)}_\mu = \frac{u_\mu \sum_\rho (\partial^2 A/\partial u_\mu\,\partial u_\rho)\, u_\rho [\nabla B]_\rho}{\sqrt{\sum_\rho u_\rho\,([\nabla B]_\rho)^2}} \;-\; \frac{u_\mu \sum_\rho (\partial^2 A/\partial u_\mu\,\partial u_\rho)\, u_\rho [\nabla A]_\rho}{\sqrt{\sum_\rho u_\rho\,([\nabla A]_\rho)^2}} \qquad (18.7.20)$$

(In these equations there is no sum over µ.) The form of the e^(3) has some justification if one views dot products as occurring in a space with the metric g_µν = δ_µν/u_µ, chosen to make zero values "far away"; see [6].

Within the three-dimensional subspace, the three-component gradient and nine-component Hessian matrix are computed by projection from the large space, and the minimum in the subspace is estimated by (trivially) solving three simultaneous linear equations, as in §10.7, equation (10.7.4). The size of a step Δu is required to be limited by the inequality

$$\sum_\mu (\Delta u_\mu)^2 / u_\mu < (0.1\ \text{to}\ 0.5)\, U \qquad (18.7.21)$$

Because the gradient directions ∇A and ∇B are separately available, it is possible to combine the minimum search with a simultaneous adjustment of λ so as finally to satisfy the desired constraint. There are various further tricks employed.

A less general, but in practice often equally satisfactory, approach is due to Cornwell and Evans [12]. Here, noting that B's Hessian (second partial derivative) matrix is diagonal, one asks whether there is a useful diagonal approximation to A's Hessian, namely 2A^T·A. If Λ_µ denotes the diagonal components of such an approximation, then a useful step in u would be

$$\Delta u_\mu = -\frac{1}{\Lambda_\mu + \lambda f''(u_\mu)}\,[\nabla A + \lambda \nabla B]_\mu \qquad (18.7.22)$$

(again compare equation 10.7.4). Even more extreme, one might seek an approximation with constant diagonal elements, Λ_µ = Λ, so that

$$\Delta u_\mu = -\frac{1}{\Lambda + \lambda f''(u_\mu)}\,[\nabla A + \lambda \nabla B]_\mu \qquad (18.7.23)$$

Since A^T·A has something of the nature of a doubly convolved point spread function, and since in real cases one often has a point spread function with a sharp central peak, even the more extreme of these approximations is often fruitful. One starts with a rough estimate of Λ obtained from the A_iµ's, e.g.,

$$\Lambda \sim \sum_i [A_{i\mu}]^2 \qquad (18.7.24)$$

An accurate value is not important, since in practice Λ is adjusted adaptively: If Λ is too large, then equation (18.7.23)'s steps will be too small (that is, larger steps in the same direction will produce even greater decrease in A + λB). If Λ is too small, then attempted steps will land in an unfeasible region (negative values of u_µ), or will result in an increased A + λB. There is an obvious similarity between the adjustment of Λ here and the Levenberg-Marquardt method of §15.5; this should not be too surprising, since MEM is closely akin to the problem of nonlinear least-squares fitting. Reference [12] also discusses how the value of Λ + λf″(u_µ) can be used to adjust the Lagrange multiplier λ so as to converge to the desired value of χ².

All practical MEM algorithms are found to require on the order of 30 to 50 iterations to converge. This convergence behavior is not now understood in any fundamental way.
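A single step of the constant-diagonal variant, equation (18.7.23), is simple enough to spell out. The sketch below is an illustration, not the published algorithm: it takes precomputed gradients (for instance from the mem_gradients sketch above), uses f″(u) = 1/u for f(u) = u ln u, and rejects any step that would drive a pixel negative, which in the adaptive scheme described above is the signal to increase Λ.

```c
/* One Cornwell-Evans-style update (equation 18.7.23) for f(u) = u ln u,
   so that f''(u) = 1/u.  gradA and gradB are the current gradients of
   equation (18.7.19); Lambda is the scalar Hessian approximation, to be
   adjusted adaptively by the caller.  Returns 1 if the step kept all
   pixels positive (and was applied), 0 if it was rejected.  Illustrative. */
int mem_ce_step(int m, double *u, const double *gradA, const double *gradB,
                double lambda, double Lambda)
{
    int mu;
    for (mu = 0; mu < m; mu++) {          /* dry run: check feasibility */
        double du = -(gradA[mu] + lambda*gradB[mu])
                    / (Lambda + lambda/u[mu]);
        if (u[mu] + du <= 0.0) return 0;  /* infeasible: raise Lambda   */
    }
    for (mu = 0; mu < m; mu++)            /* apply the step             */
        u[mu] += -(gradA[mu] + lambda*gradB[mu]) / (Lambda + lambda/u[mu]);
    return 1;
}
```

A driver would alternate such steps with adjustments of Λ, and of λ, until χ² settles into its target range.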
"Bayesian" versus "Historic" Maximum Entropy

Several more recent developments in maximum entropy image restoration go under the rubric "Bayesian" to distinguish them from the previous "historic" methods. See [13] for details and references.

• Better priors: We already noted that the entropy functional (equation 18.7.13) is invariant under scrambling all pixels and has no notion of smoothness. The so-called "intrinsic correlation function" (ICF) model (Ref. [13], where it is called "New MaxEnt") is similar enough to the entropy functional to allow similar algorithms, but it makes the values of neighboring pixels correlated, enforcing smoothness.

• Better estimation of λ: Above we chose λ to bring χ² into its expected narrow statistical range of N ± (2N)^{1/2}. This in effect overestimates χ², however, since some effective number γ of parameters are being "fitted" in doing the reconstruction. A Bayesian approach leads to a self-consistent estimate of this γ and an objectively better choice for λ.

CITED REFERENCES AND FURTHER READING:
Jaynes, E.T. 1976, in Foundations of Probability Theory, Statistical Inference, and Statistical Theories of Science, W.L. Harper and C.A. Hooker, eds. (Dordrecht: Reidel). [1]
Jaynes, E.T. 1985, in Maximum-Entropy and Bayesian Methods in Inverse Problems, C.R. Smith and W.T. Grandy, Jr., eds. (Dordrecht: Reidel). [2]
Jaynes, E.T. 1984, in SIAM-AMS Proceedings, vol. 14, D.W. McLaughlin, ed. (Providence, RI: American Mathematical Society). [3]
Titterington, D.M. 1985, Astronomy and Astrophysics, vol. 144, pp. 381–387. [4]
Narayan, R., and Nityananda, R. 1986, Annual Review of Astronomy and Astrophysics, vol. 24, pp. 127–170. [5]
Skilling, J., and Bryan, R.K. 1984, Monthly Notices of the Royal Astronomical Society, vol. 211, pp. 111–124. [6]
Burch, S.F., Gull, S.F., and Skilling, J. 1983, Computer Vision, Graphics and Image Processing, vol. 23, pp. 113–128. [7]
Skilling, J. 1989, in Maximum Entropy and Bayesian Methods, J. Skilling, ed. (Boston: Kluwer). [8]
Frieden, B.R. 1983, Journal of the Optical Society of America, vol. 73, pp. 927–938. [9]
Skilling, J., and Gull, S.F. 1985, in Maximum-Entropy and Bayesian Methods in Inverse Problems, C.R. Smith and W.T. Grandy, Jr., eds. (Dordrecht: Reidel). [10]
Skilling, J. 1986, in Maximum Entropy and Bayesian Methods in Applied Statistics, J.H. Justice, ed. (Cambridge: Cambridge University Press). [11]
Cornwell, T.J., and Evans, K.F. 1985, Astronomy and Astrophysics, vol. 143, pp. 77–83. [12]
Gull, S.F. 1989, in Maximum Entropy and Bayesian Methods, J. Skilling, ed. (Boston: Kluwer). [13]