16.322 Stochastic Estimation and Control, Fall 2004
Prof. Vander Velde

Lecture 22

Last time:

\hat{x} = \frac{\dfrac{\hat{x}_1}{\sigma_{\hat{x}_1}^2} + \sum_{k=N_1+1}^{N} \dfrac{z_k}{\sigma_k^2}}{\dfrac{1}{\sigma_{\hat{x}_1}^2} + \sum_{k=N_1+1}^{N} \dfrac{1}{\sigma_k^2}}

The estimate of x based on N data points can then be made without reprocessing the first N_1 points. Their effect can be included simply by starting with a pseudo observation which is equal to the estimate based on the first N_1 points and which has a variance equal to the variance of that estimate. The same is true of the variance of the estimate based on N:

\frac{1}{\sigma_{\hat{x}}^2} = \frac{1}{\sigma_{\hat{x}_1}^2} + \sum_{k=N_1+1}^{N} \frac{1}{\sigma_k^2}

A priori information about x can be included in exactly this way whether or not it was derived from previous measurements. Whatever the source of the prior information, it can be expressed as an a priori distribution f(x), or at least as an expected value and a variance. Take the expected value as a pseudo observation z_0 with variance \sigma_0^2, and accumulate this data with the actual data using the standard formulae. With the prior information included as a pseudo observation, the least squares estimate is formed just as if there were no prior information. The result, for normal variables at least, is identical to the estimators based on the conditional distribution of x.

Bayes' rule can be used to form the distribution f(x | z_1, z_2, ..., z_N) starting from the original a priori distribution f(x):

f(x \mid z_1) = \frac{f(x)\, f(z_1 \mid x)}{\int f(u)\, f(z_1 \mid u)\, du}

f(x \mid z_1, z_2) = \frac{f(x \mid z_1)\, f(z_2 \mid x)}{\int f(u \mid z_1)\, f(z_2 \mid u)\, du}

etc., if the measurements are conditionally independent.
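The following is a minimal numerical sketch of this Bayes recursion, written in Python with NumPy. It assumes a normal prior and normal, conditionally independent measurement noise; the grid limits, prior variance, noise variance, and measurement values are illustrative assumptions, not values from the lecture. For this all-normal case the resulting conditional mean agrees with the inverse-variance-weighted combination written at the top of the lecture.

import numpy as np

x = np.linspace(-5.0, 5.0, 2001)           # grid of candidate values of x
dx = x[1] - x[0]

# a priori distribution f(x): assumed normal with mean 0 and variance 4
prior = np.exp(-0.5 * x**2 / 4.0)
prior /= prior.sum() * dx

sigma_z = 1.0                               # assumed measurement noise standard deviation
post = prior.copy()
for z in [1.2, 0.7, 1.5]:                   # assumed measurement values z1, z2, z3
    likelihood = np.exp(-0.5 * (z - x)**2 / sigma_z**2)
    post = post * likelihood                # numerator of the Bayes recursion
    post /= post.sum() * dx                 # divide by the normalizing integral

mean = (x * post).sum() * dx                # conditional mean of f(x | z1,...,zN)
var = ((x - mean)**2 * post).sum() * dx
print(mean, var)                            # matches the inverse-variance-weighted combination here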

Two disadvantages relative to the previous method:

• More computation, unless you know each conditional density is going to be normal.
• Must provide f(x), the a priori distribution. This is both the advantage and the disadvantage of this method.

Other estimators include the effect of a priori information directly. Several estimators are based on the conditional probability distribution of x given the values of the observations. In this approach, we think of x as a random variable having some distribution. This troubles some people, since we know x is in fact fixed at some value throughout the experiment. However, the fact that we do not know what that value is is expressed in terms of a distribution of possible values for x. The extent of our a priori knowledge is reflected in the variance of the a priori distribution we assign.

Having an a priori distribution for x, and the values of the observations, we can in principle – and often in fact – calculate the conditional distribution of x given the observations. This is in fact the a posteriori distribution, f(x | z_1, ..., z_N). This distribution expresses the probability density for various values of x given the values of the observations and the a priori distribution. Having this distribution, one can define a number of reasonable estimates.

One is the minimum variance estimate – that value \hat{x} which minimizes the error variance

\mathrm{Var} = \int_{-\infty}^{\infty} (\hat{x} - x)^2\, f(x \mid z_1, \ldots, z_N)\, dx

But setting the derivative of this variance with respect to \hat{x} to zero shows the minimizing value of \hat{x} to be

\hat{x} = \int_{-\infty}^{\infty} x\, f(x \mid z_1, \ldots, z_N)\, dx

which is the conditional mean – the mean of the conditional distribution of x.
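As a numerical check of this statement, the following Python sketch evaluates the error-variance integral on a grid for a few candidate values of the estimate and confirms that the conditional mean gives the smallest value. The particular (skewed) density used as the conditional distribution is an illustrative assumption, not one derived from actual measurements.

import numpy as np

x = np.linspace(0.0, 10.0, 2001)
dx = x[1] - x[0]
post = x**2 * np.exp(-x)                    # assumed skewed conditional density, unnormalized
post /= post.sum() * dx

def err_var(xhat):
    # Var = integral of (xhat - x)^2 f(x | z1,...,zN) dx, evaluated on the grid
    return ((xhat - x)**2 * post).sum() * dx

cond_mean = (x * post).sum() * dx
for candidate in [cond_mean - 0.5, cond_mean, cond_mean + 0.5]:
    print(candidate, err_var(candidate))    # the middle value (the conditional mean) is smallest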

Another reasonable estimate of x based on this conditional distribution is the value at the maximum probability density. This can perfectly well be called the "maximum likelihood" estimate, though it is not necessarily the same as the maximum likelihood estimate we have just derived. Schweppe calls it the MAP ("maximum a posteriori probability") estimator. The two are related as follows.

The first is the x which maximizes

f(z_1, \ldots, z_N \mid x) = \frac{f(z_1, \ldots, z_N, x)}{f(x)}

The second is the x which maximizes

f(x \mid z_1, \ldots, z_N) = \frac{f(x, z_1, \ldots, z_N)}{f(z_1, \ldots, z_N)} = \frac{f(z_1, \ldots, z_N \mid x)\, f(x)}{f(z_1, \ldots, z_N)}

These two functions differ by the a priori distribution f(x), and the distribution of x is also involved in the joint distribution of the observations. Only if the distribution of x is flat for all x can we guarantee that these two functions will have maxima at the same value of x. But a flat distribution for x for all x is exactly the case in which there is no a priori knowledge about x. In that case all values of x have equal probability density, which is in fact an infinitesimal. We can consider f(x) in that case to be the limit of almost any convenient distribution as the variance → ∞; this limit is illustrated numerically below.

If, on the other hand, we do have some prior information about x, based on some previous measurements or on physical reasoning, f(x) will have some finite shape – often a normal shape – and the x which maximizes f(z_1, ..., z_N | x) will not maximize f(x | z_1, ..., z_N). In this case the latter choice of x is to be preferred, since it is the most probable value of x based on the a priori distribution of x and the values of the observations, whereas the former depends only on the observations.

We just note in passing that for a normal f(x | z_1, ..., z_N), or any other distribution which is symmetric about the maximum point, the most probable value is equal to the conditional mean, which is the minimum variance value of \hat{x}. In the absence of prior information, this is also the maximum likelihood estimate, which is the least weighted squares estimate. This we found to be a linear combination of the data. So in the case of a normal conditional distribution with no prior information, the optimum linear estimate based on the data is the minimum variance estimator. No nonlinear operation on the data can give a smaller variance estimate.
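The following Python sketch illustrates the relation between the two estimates for a normal prior and normal measurement noise: as the assumed prior variance grows (the flat-prior limit), the MAP value approaches the maximum likelihood value. The prior mean, the variances, and the measurement values are illustrative assumptions.

import numpy as np

z = np.array([2.1, 1.8, 2.4, 2.0])          # assumed measurements
sigma_z2 = 0.25                              # assumed measurement noise variance
m0 = 0.0                                     # assumed prior mean

x_ml = z.mean()                              # maximizes the likelihood f(z1,...,zN | x) alone
for sigma0_2 in [0.5, 5.0, 50.0, 5e6]:       # prior variance growing toward infinity (flat prior)
    x_map = (m0 / sigma0_2 + z.sum() / sigma_z2) / (1.0 / sigma0_2 + len(z) / sigma_z2)
    print(sigma0_2, x_map)                   # approaches x_ml as the prior flattens
print("ML:", x_ml)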

For non-normal distributions, one often defines a linear estimator and optimizes it based on minimum mean squared error. This gives the same estimate as that derived here. But in that case it may be that some nonlinear operation on the data could do better.

Also note that if f(x | z_1, ..., z_N) is known to be normal – as when all noises are normal, the initial distribution is normal, and only linear operations are involved – then the distribution is completely defined by its mean and variance, and only these parameters need be computed. But in any other case, updating a general distribution requires the entire distribution, and the estimation problem becomes infinite in dimension.

Estimators based on the conditional distribution of x are thus very satisfying theoretically, but are more complicated to derive than a simple least squares fit. We have found the least squares estimate to equal these other estimators if there is no a priori information. Fortunately it is possible to include a priori information in the least squares format in such a way as to recast the problem in the form of an equivalent estimation problem with no prior information. This is done by introducing the a priori information in the form of an additional pseudo observation. That this can be done is due to the fact that the least squares estimate is a linear combination of the observations – and thus it is possible to group the observations in any way.

Recursive formulation

(The other method is called batch processing.)

Suppose we had the estimate \hat{x}_0 based on any prior information and all measurements already taken, and its variance \sigma_0^2. Then we took just one more measurement z and wished to obtain an improved estimate of x immediately. The formula says

\hat{x}_1 = \frac{\dfrac{\hat{x}_0}{\sigma_0^2} + \dfrac{z}{\sigma_z^2}}{\dfrac{1}{\sigma_0^2} + \dfrac{1}{\sigma_z^2}}
          = \frac{\sigma_z^2 \hat{x}_0 + \sigma_0^2 z}{\sigma_0^2 + \sigma_z^2}
          = \hat{x}_0 + \frac{\sigma_0^2}{\sigma_0^2 + \sigma_z^2}\,(z - \hat{x}_0)
          = \hat{x}_0 + k\,(z - \hat{x}_0)
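A minimal Python sketch of this recursive update, processing one scalar measurement at a time; it also applies the gain and variance relations summarized just below. The initial pseudo observation, its variance, the measurement noise variance, and the measurement values are illustrative assumptions.

def update(x_hat, var0, z, var_z):
    """One recursive step: fold a single scalar measurement z into (x_hat, var0)."""
    k = var0 / (var0 + var_z)               # gain k = sigma_0^2 (sigma_0^2 + sigma_z^2)^-1
    x_new = x_hat + k * (z - x_hat)          # new estimate = old estimate + gain * residual
    var_new = (1.0 - k) * var0               # sigma_1^2 = (1 - k) sigma_0^2
    return x_new, var_new

x_hat, var = 0.0, 10.0                       # pseudo observation representing the prior information
for z in [1.2, 0.9, 1.1, 1.3]:               # measurements processed one at a time
    x_hat, var = update(x_hat, var, z, var_z=0.5)
    print(x_hat, var)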

New estimate = Old estimate + Gain × (Measurement residual)

This is the form of the modern recursive estimator. If this is formulated directly in the case where several parameters are being estimated, the (\sigma_0^2 + \sigma_z^2)^{-1} is a matrix inversion. However, if only one scalar measurement is being processed, the inversion is a scalar reciprocal.

\frac{1}{\sigma_1^2} = \frac{1}{\sigma_0^2} + \frac{1}{\sigma_z^2} = \frac{\sigma_0^2 + \sigma_z^2}{\sigma_0^2 \sigma_z^2}

\sigma_1^2 = \frac{\sigma_0^2 \sigma_z^2}{\sigma_0^2 + \sigma_z^2} = (1 - k)\,\sigma_0^2

k \equiv \sigma_0^2\,(\sigma_0^2 + \sigma_z^2)^{-1}

Extensions to this simple problem:

• Several estimated parameters instead of one
  o No conceptual difficulty
  o Get vector and matrix operations instead of scalar ones
• Dynamic parameters instead of static
  o No real difficulty if they obey a set of linear differential equations
• Several simultaneous measurements instead of one
  o No difficulty if they are linearly related to x
• Biased measurement noise
  o Estimate the bias
• Correlated measurement noise
  o Estimate the noise (different form of filter) or work with independent measurement differences
• Non-normal noises
  o Makes maximum likelihood difficult
  o Requires the full distribution rather than just the mean and variance
• Nonlinear system constraints or measurements
  o Makes things very difficult
  o Requires more information than the mean and variance

Statistics in State Space Formulation

In the state space formulation we depend on the concept of the shaping filter exclusively. Even if the input statistics are time varying, we suppose the input to be generated by passing white noise through a suitable time-varying shaping filter. In the non-stationary case it is not clear how to generate the shaping filter directly.
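As an illustration of the shaping-filter idea, the following Python sketch passes discrete white noise through a first-order filter to generate a correlated (first-order Gauss-Markov) process. The time constant, noise intensity, step size, and number of steps are illustrative assumptions; the discrete noise samples are scaled by sqrt(N/dt) so that they stand in for continuous white noise of intensity N.

import numpy as np

dt, T, intensity = 0.01, 1.0, 2.0            # step size, filter time constant, white-noise intensity N
steps = 5000
rng = np.random.default_rng(0)

v = 0.0
history = []
for _ in range(steps):
    n = rng.normal(0.0, np.sqrt(intensity / dt))   # discrete stand-in for white noise of intensity N
    v = v + dt * (-v / T + n)                      # Euler step of the shaping filter dv/dt = -v/T + n
    history.append(v)

print(np.var(history))                       # should come out near the steady-state value N*T/2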

Non-stationary, multiple-input, multiple-output case

Note that this model may represent the shaping filter for a random process alone, or a system driven by white noise, or a system driven by correlated noise with the shaping filter included.

Shaping filter:

\dot{v} = A v + B n
d = C v

Transfer function (state space model):

\dot{x} = F x + d
y = H x

Augmented system, with x' = [x; v]:

\dot{x}' = \begin{bmatrix} F & C \\ 0 & A \end{bmatrix} \begin{bmatrix} x \\ v \end{bmatrix} + \begin{bmatrix} 0 \\ B \end{bmatrix} n \equiv A' x' + B' n

y = \begin{bmatrix} H & 0 \end{bmatrix} x' \equiv C' x'

(Assembly of these block matrices is sketched in code below.) We will drop the primes and treat the input noise as white:

\dot{x}(t) = A(t) x(t) + B(t) n(t)
y(t) = C(t) x(t)

We will allow the possibility of a biased input noise, so the mean \overline{n}(t) is given, and the white part of n(t) is defined by

\overline{\left[ n(t_1) - \overline{n}(t_1) \right]\left[ n(t_2) - \overline{n}(t_2) \right]^T} = N(t_1)\,\delta(t_2 - t_1)
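A small Python/NumPy sketch of assembling the augmented matrices A', B', C' from the block structure above. The particular F, H, A, B, C used here are illustrative assumptions, chosen only to make the dimensions consistent (two system states, one shaping-filter state).

import numpy as np

F = np.array([[0.0, 1.0],
              [-2.0, -0.5]])                # system matrix (2 system states), illustrative
H = np.array([[1.0, 0.0]])                  # measurement matrix, illustrative
A = np.array([[-1.0]])                      # shaping-filter matrix (1 filter state), illustrative
B = np.array([[1.0]])                       # shaping-filter input matrix, illustrative
C = np.array([[1.0],
              [0.0]])                       # maps the filter state v into the disturbance d

# Augmented state x' = [x; v]
A_aug = np.block([[F, C],
                  [np.zeros((1, 2)), A]])
B_aug = np.vstack([np.zeros((2, 1)), B])
C_aug = np.hstack([H, np.zeros((1, 1))])
print(A_aug.shape, B_aug.shape, C_aug.shape)   # (3, 3), (3, 1), (1, 3)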

The elements of N(t) are the intensities, or spectral densities, of the components of n(t). Never call it the variance of the white noise!

Propagation of the mean

\dot{\overline{x}}(t) = A(t)\,\overline{x}(t) + B(t)\,\overline{n}(t)
\overline{y}(t) = C(t)\,\overline{x}(t)

So the mean is propagated directly by the system dynamics.

Propagation of the covariance matrix

\overline{[y(t)-\overline{y}(t)][y(t)-\overline{y}(t)]^T} = C(t)\, \overline{[x(t)-\overline{x}(t)][x(t)-\overline{x}(t)]^T}\, C(t)^T = C(t)\, X(t)\, C(t)^T

So we require the covariance matrix for the full state variable x(t). Can we derive a differential equation for the covariance matrix in the same way as we did in the error propagation section?

X(t) = E\left\{ [x(t)-\overline{x}(t)][x(t)-\overline{x}(t)]^T \right\}

For simplicity, define

\tilde{x}(t) = x(t) - \overline{x}(t)
\tilde{n}(t) = n(t) - \overline{n}(t)
\dot{\tilde{x}}(t) = A(t)\,\tilde{x}(t) + B(t)\,\tilde{n}(t)

Then

X(t) = \overline{\tilde{x}(t)\,\tilde{x}(t)^T}

\dot{X}(t) = \overline{\dot{\tilde{x}}(t)\,\tilde{x}(t)^T} + \overline{\tilde{x}(t)\,\dot{\tilde{x}}(t)^T}
           = \overline{[A(t)\tilde{x}(t) + B(t)\tilde{n}(t)]\,\tilde{x}(t)^T} + \overline{\tilde{x}(t)\,[\tilde{x}(t)^T A(t)^T + \tilde{n}(t)^T B(t)^T]}
           = A(t)\,X(t) + X(t)\,A(t)^T + B(t)\,\overline{\tilde{n}(t)\tilde{x}(t)^T} + \overline{\tilde{x}(t)\tilde{n}(t)^T}\,B(t)^T

What is the correlation between \tilde{n}(t) and \tilde{x}(t)? It is not zero, even though \tilde{n}(t) drives \dot{\tilde{x}}(t) and not \tilde{x}(t) directly. Because \tilde{n}(t) is a white noise, and thus is infinite in magnitude almost everywhere, it has a non-negligible effect on \tilde{x}(t) at the same point in time. But we need a more sophisticated calculus, i.e., the Ito calculus, to figure out what it is. So we cannot derive a differential equation for X(t) just by differentiating its definition.
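Although the analytical differential equation for X(t) cannot be obtained by naive differentiation, the covariance can always be estimated numerically by Monte Carlo simulation of the (Euler-discretized) state equation. The following Python sketch does this; the matrices A and B, the intensity N, the step size, and the number of runs are illustrative assumptions. Note the sqrt(N/dt) scaling of the discrete noise samples, consistent with N(t) being an intensity rather than a variance.

import numpy as np

A = np.array([[0.0, 1.0],
              [-2.0, -0.5]])                # assumed system matrix (primes dropped)
B = np.array([[0.0],
              [1.0]])                       # assumed input matrix
N_int = np.array([[1.0]])                   # assumed white-noise intensity matrix N(t)
dt, steps, runs = 0.001, 2000, 5000
rng = np.random.default_rng(0)

x = np.zeros((runs, 2))                     # all realizations start at the same state
for _ in range(steps):
    n = rng.normal(0.0, np.sqrt(N_int[0, 0] / dt), size=(runs, 1))
    x = x + dt * (x @ A.T + n @ B.T)        # Euler step of dx/dt = A x + B n for every run

X_mc = np.cov(x.T)                          # sample covariance approximates X(t) at t = steps*dt
print(X_mc)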

Instead of differentiating the definition directly, we can use a more round-about procedure. The response of the differential equation for \tilde{x}(t) can be expressed as

\tilde{x}(t) = \Phi(t, t_0)\,\tilde{x}(t_0) + \int_{t_0}^{t} \Phi(t, \tau)\, B(\tau)\, \tilde{n}(\tau)\, d\tau

where \Phi(t, \tau) is the transition matrix for the linear system with system matrix A(t). It satisfies the system homogeneous equation

\frac{d}{dt}\Phi(t, \tau) = A(t)\,\Phi(t, \tau)
\Phi(\tau, \tau) = I

This approach works in this case because we can write down the form of the solution to a set of linear differential equations. We cannot write down the form of the solution to nonlinear differential equations, so we cannot use this approach in the case of nonlinear systems. However, the Ito calculus does apply to nonlinear stochastic differential equations.
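For a time-invariant system matrix A, the transition matrix reduces to the matrix exponential, Phi(t, tau) = exp[A (t - tau)], and the two defining properties above are easy to check numerically. The following Python sketch uses scipy.linalg.expm; the matrix A and the times are illustrative assumptions, and for a time-varying A(t) the transition matrix would instead have to be integrated numerically.

import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0],
              [-2.0, -0.5]])                # assumed constant system matrix
t, tau = 1.5, 0.5

Phi = expm(A * (t - tau))                   # Phi(t, tau) for constant A
print(np.allclose(expm(A * 0.0), np.eye(2)))         # Phi(tau, tau) = I
dPhi = (expm(A * (t + 1e-6 - tau)) - Phi) / 1e-6     # finite-difference approximation of d/dt Phi(t, tau)
print(np.allclose(dPhi, A @ Phi, atol=1e-4))          # matches A Phi(t, tau)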
