$$= \int_{\mathcal{X}} \kappa\{q \mid p(\cdot \mid x)\}\, p(x)\, dx, \tag{3.1}$$

where $p(\theta \mid x) = p(x \mid \theta)\, q(\theta)/p(x)$ and $p(x) = \int_{\Theta} p(x \mid \theta)\, q(\theta)\, d\theta$.

Note that $x$ here refers to the entire observation vector. It can have any dependency structure whatsoever (e.g., it could consist of $n$ normal random variables with mean zero, variance one and correlation $\theta$). Thus, when we refer to a model henceforth, we mean the probability model for the actual complete observation vector. Although somewhat nonstandard, this convention is necessary here because reference prior theory requires the introduction of (artificial) independent replications of the entire experiment.

The amount of information $I\{q \mid \mathcal{M}\}$ to be expected from observing $x$ from $\mathcal{M}$ depends on the prior $q(\theta)$: the sharper the prior, the smaller the amount of information to be expected from the data. Consider now the information $I\{q \mid \mathcal{M}^k\}$ which may be expected from $k$ independent replications of $\mathcal{M}$. As $k \to \infty$, the sequence of realizations $\{x_1, \ldots, x_k\}$ would eventually provide any missing information about the value of $\theta$. Hence, as $k \to \infty$, $I\{q \mid \mathcal{M}^k\}$ provides a measure of the missing information about $\theta$ associated with the prior $q(\theta)$. Intuitively, a reference prior will be a permissible prior which maximizes the missing information about $\theta$ within the class $\mathcal{P}$ of priors compatible with any assumed knowledge about the value of $\theta$.

With a continuous parameter space, the missing information $I\{q \mid \mathcal{M}^k\}$ will typically diverge as $k \to \infty$, since an infinite amount of information would be required to learn the value of $\theta$. Likewise, the expected information is typically not defined on an unbounded set.
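The growth of $I\{q \mid \mathcal{M}^k\}$ with $k$ can be checked numerically in a simple case. The following is a minimal sketch under assumptions that are not part of the text: the model is taken to be a single Bernoulli($\theta$) trial, the prior is restricted to the compact interval $[0.1, 0.9]$, and the integral over $\theta$ is replaced by a midpoint rule on `m` cells. The function names `log_binom_pmf` and `expected_information` are hypothetical. Because the number of successes is sufficient for $\theta$, the expected information from $k$ replications equals the expected information from the binomial count, which keeps the sum over data finite.

```python
import math


def log_binom_pmf(s, k, theta):
    """Log of the Binomial(k, theta) pmf at s, for 0 < theta < 1."""
    log_coef = math.lgamma(k + 1) - math.lgamma(s + 1) - math.lgamma(k - s + 1)
    return log_coef + s * math.log(theta) + (k - s) * math.log1p(-theta)


def expected_information(prior_density, k, lo=0.1, hi=0.9, m=400):
    """Approximate I{q | M^k} for k Bernoulli(theta) replications.

    Assumption: the prior is restricted to [lo, hi], renormalized, and
    discretized with the midpoint rule on m cells.
    """
    width = (hi - lo) / m
    thetas = [lo + (i + 0.5) * width for i in range(m)]
    weights = [prior_density(t) * width for t in thetas]
    total = sum(weights)
    weights = [w / total for w in weights]  # renormalized restriction

    # pmf[i][s] = p(s | theta_i), with s the number of successes in k trials
    pmf = [[math.exp(log_binom_pmf(s, k, t)) for s in range(k + 1)]
           for t in thetas]
    # marginal[s] = sum_i p(s | theta_i) q_i
    marginal = [sum(w * row[s] for w, row in zip(weights, pmf))
                for s in range(k + 1)]

    info = 0.0
    for w, row in zip(weights, pmf):
        for s in range(k + 1):
            if row[s] > 0.0:  # skip terms that underflowed to zero
                info += w * row[s] * math.log(row[s] / marginal[s])
    return info


if __name__ == "__main__":
    uniform = lambda t: 1.0
    for k in (1, 4, 16, 64):
        print(k, expected_information(uniform, k))
```

The printed values increase with $k$, in line with the divergence described above. The discretized prior has at most `m` support points, so the computed value can never exceed `log(m)`; the sketch is therefore only informative while $k$ is small relative to the grid resolution.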
These two difficulties, the divergence of the missing information and the possible nonexistence of the expected information on an unbounded set, are overcome with the following definition, which formalizes the heuristics described in Bernardo [10] and in Berger and Bernardo [7].

Definition 7 [Maximizing Missing Information (MMI) Property]. Let $\mathcal{M} \equiv \{p(x \mid \theta),\ x \in \mathcal{X},\ \theta \in \Theta \subset \mathbb{R}\}$ be a model with one continuous parameter, and let $\mathcal{P}$ be a class of prior functions for $\theta$ for which $\int_{\Theta} p(x \mid \theta)\, p(\theta)\, d\theta < \infty$. The function $\pi(\theta)$ is said to have the MMI property for model $\mathcal{M}$ given $\mathcal{P}$ if, for any compact set $\Theta_0 \subset \Theta$ and any $p \in \mathcal{P}$,

$$\lim_{k \to \infty} \bigl\{ I\{\pi_0 \mid \mathcal{M}^k\} - I\{p_0 \mid \mathcal{M}^k\} \bigr\} \ge 0, \tag{3.2}$$

where $\pi_0$ and $p_0$ are, respectively, the renormalized restrictions of $\pi(\theta)$ and $p(\theta)$ to $\Theta_0$.

The restriction of the definition to a compact set typically ensures the existence of the missing information for given $k$. That the missing information will diverge for large $k$ is handled by the device of simply insisting that the missing information for the reference prior be larger, as $k \to \infty$, than the missing information for any other candidate $p(\theta)$.
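A heuristic calculation may help show why the difference in (3.2) can have a finite limit even though each term diverges. The sketch below rests on assumptions that go beyond the text: the model is taken to be regular enough that the posterior is asymptotically normal, $i(\theta)$ denotes the Fisher information of a single replication of $\mathcal{M}$, $q_0$ is any sufficiently smooth positive density on $\Theta_0$, and $c = \int_{\Theta_0} \sqrt{i(\theta)}\, d\theta$ is assumed finite. Under these conditions the expected information admits a standard asymptotic expansion whose diverging term does not depend on the prior. Taking $\pi_0(\theta) = \sqrt{i(\theta)}/c$ as the candidate:

```latex
\begin{align*}
I\{q_0 \mid \mathcal{M}^k\}
  &= \tfrac{1}{2}\log\frac{k}{2\pi e}
   + \int_{\Theta_0} q_0(\theta)\,
     \log\frac{\sqrt{i(\theta)}}{q_0(\theta)}\,d\theta + o(1), \\[4pt]
\lim_{k\to\infty}\bigl\{ I\{\pi_0 \mid \mathcal{M}^k\}
                       - I\{p_0 \mid \mathcal{M}^k\} \bigr\}
  &= \int_{\Theta_0} \pi_0(\theta)\,
     \log\frac{\sqrt{i(\theta)}}{\pi_0(\theta)}\,d\theta
   - \int_{\Theta_0} p_0(\theta)\,
     \log\frac{\sqrt{i(\theta)}}{p_0(\theta)}\,d\theta \\
  &= \log c - \Bigl( \log c
     - \int_{\Theta_0} p_0(\theta)\,
       \log\frac{p_0(\theta)}{\pi_0(\theta)}\,d\theta \Bigr) \\
  &= \int_{\Theta_0} p_0(\theta)\,
     \log\frac{p_0(\theta)}{\pi_0(\theta)}\,d\theta \;\ge\; 0 .
\end{align*}
```

The last integral is a Kullback-Leibler divergence and hence nonnegative, so in this regular setting the prior proportional to $\sqrt{i(\theta)}$ would satisfy the inequality in (3.2). This is an illustration under the stated regularity assumptions, not a proof of the general case.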
