(3.1)   I{q | M} = ∫_X κ{q | p(· | x)} p(x) dx,

where p(θ | x) = p(x | θ) q(θ)/p(x) and p(x) = ∫_Θ p(x | θ) q(θ) dθ.

Note that x here refers to the entire observation vector. It can have any dependency structure whatsoever (e.g., it could consist of n normal random variables with mean zero, variance one and correlation θ). Thus, when we refer to a model henceforth, we mean the probability model for the actual complete observation vector. Although somewhat nonstandard, this convention is necessary here because reference prior theory requires the introduction of (artificial) independent replications of the entire experiment.

The amount of information I{q | M} to be expected from observing x from M depends on the prior q(θ): the sharper the prior, the smaller the amount of information to be expected from the data. Consider now the information I{q | M^k} which may be expected from k independent replications of M. As k → ∞, the sequence of realizations {x_1, ..., x_k} would eventually provide any missing information about the value of θ. Hence, as k → ∞, I{q | M^k} provides a measure of the missing information about θ associated with the prior q(θ). Intuitively, a reference prior will be a permissible prior which maximizes the missing information about θ within the class P of priors compatible with any assumed knowledge about the value of θ.
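To make the behavior of I{q | M^k} concrete, the following sketch (ours, not from the paper) evaluates the expected information for a hypothetical Bernoulli model with a Beta(a, b) prior, where everything is available in closed form: the posterior given a success count s in k trials is Beta(a + s, b + k − s), and s has a Beta-Binomial marginal. The helper names kl_beta and expected_information are our own. The output exhibits both properties noted above: a sharper prior yields less expected information, and I{q | M^k} grows without bound as k → ∞.

```python
# A minimal numerical sketch (illustrative, not from the paper): for k
# Bernoulli(theta) trials under a Beta(a, b) prior q, the expected
# information I{q | M^k} = E_x[ KL(posterior || prior) ] has a closed form.
import numpy as np
from scipy.special import betaln, digamma, gammaln

def kl_beta(a1, b1, a2, b2):
    """KL divergence KL(Beta(a1, b1) || Beta(a2, b2))."""
    return (betaln(a2, b2) - betaln(a1, b1)
            + (a1 - a2) * digamma(a1)
            + (b1 - b2) * digamma(b1)
            + (a2 - a1 + b2 - b1) * digamma(a1 + b1))

def expected_information(k, a, b):
    """I{q | M^k} for q = Beta(a, b) and k Bernoulli trials."""
    s = np.arange(k + 1)
    # Log Beta-Binomial(k, a, b) marginal probabilities of the success count.
    log_w = (gammaln(k + 1) - gammaln(s + 1) - gammaln(k - s + 1)
             + betaln(a + s, b + k - s) - betaln(a, b))
    # Average the prior-to-posterior KL over the marginal of the data.
    return np.exp(log_w) @ kl_beta(a + s, b + k - s, a, b)

for k in (1, 10, 100, 1000):
    print(k,
          expected_information(k, 1, 1),     # diffuse (uniform) prior
          expected_information(k, 50, 50))   # sharp prior concentrated near 1/2
```

Running this shows the diffuse prior expecting strictly more information than the sharp one at every k, with both columns diverging (roughly like (1/2) log k) as k grows, which is exactly the divergence the definition below must accommodate.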
With a continuous parameter space, the missing information I{q | M^k} will typically diverge as k → ∞, since an infinite amount of information would be required to learn the value of θ. Likewise, the expected information is typically not defined on an unbounded set. These two difficulties are overcome with the following definition, which formalizes the heuristics described in Bernardo [10] and in Berger and Bernardo [7].

DEFINITION 7 [Maximizing Missing Information (MMI) Property]. Let M ≡ {p(x | θ), x ∈ X, θ ∈ Θ ⊂ ℝ} be a model with one continuous parameter, and let P be a class of prior functions for θ for which ∫_Θ p(x | θ) p(θ) dθ < ∞. The function π(θ) is said to have the MMI property for model M given P if, for any compact set Θ_0 ⊂ Θ and any p ∈ P,

(3.2)   lim_{k→∞} {I{π_0 | M^k} − I{p_0 | M^k}} ≥ 0,

where π_0 and p_0 are, respectively, the renormalized restrictions of π(θ) and p(θ) to Θ_0.

The restriction of the definition to a compact set typically ensures the existence of the missing information for given k. That the missing information will diverge for large k is handled by the device of simply insisting that the missing information for the reference prior be larger, as k → ∞, than the missing information for any other candidate p(θ).
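The criterion (3.2) can also be probed numerically. The sketch below (again ours, reusing the hypothetical Bernoulli setup from the previous example; the helpers normalize and information are our own) restricts the Jeffreys prior π(θ) ∝ θ^{−1/2}(1 − θ)^{−1/2}, which is known to be the Bernoulli reference prior, and a uniform candidate p(θ) to the compact set Θ_0 = [0.1, 0.9], renormalizes both, and tracks the difference I{π_0 | M^k} − I{p_0 | M^k} as k grows. Note that the definition only requires the limiting difference to be nonnegative, not the difference at each finite k.

```python
# A hedged numerical check of (3.2) on Theta_0 = [0.1, 0.9] for k
# Bernoulli trials; grid-based quadrature, so results are approximate.
import numpy as np
from scipy.integrate import trapezoid
from scipy.special import gammaln

theta = np.linspace(0.1, 0.9, 2001)          # grid on the compact set Theta_0

def normalize(density):
    """Renormalize a density restricted to Theta_0."""
    return density / trapezoid(density, theta)

def information(q0, k):
    """I{q0 | M^k}: expected prior-to-posterior KL for k Bernoulli trials."""
    total = 0.0
    for s in range(k + 1):                   # success count determines the posterior
        lik = theta**s * (1.0 - theta)**(k - s)
        log_choose = gammaln(k + 1) - gammaln(s + 1) - gammaln(k - s + 1)
        marg = trapezoid(lik * q0, theta)    # marginal prob. of one x with count s
        post = normalize(lik * q0)           # posterior density on the grid
        kl = trapezoid(post * np.log(post / q0), theta)
        total += np.exp(log_choose) * marg * kl
    return total

pi0 = normalize(theta**-0.5 * (1.0 - theta)**-0.5)   # restricted Jeffreys prior
p0 = normalize(np.ones_like(theta))                  # restricted uniform candidate

for k in (1, 5, 25, 100):
    print(k, information(pi0, k) - information(p0, k))
```

In this sketch the printed difference settles at a nonnegative value as k increases, consistent with the Jeffreys prior having the MMI property for the Bernoulli model; replacing pi0 by another candidate and rerunning gives a crude feel for why the maximization in Definition 7 singles out a particular prior.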