• Maximum-a-posteriori (MAP) inference: find the maximum-a-posteriori (MAP) probabilities $P^*_{A,\mathcal{I}}(\cdot)$ for the configurations over $A$, where
\[
\forall \sigma_A \in Q^A, \quad P^*_{A,\mathcal{I}}(\sigma_A) \triangleq \max_{\tau_B \in Q^B} \mu_{A \cup B, \mathcal{I}}(\sigma_A, \tau_B).
\]

All these fundamental inference problems can be described abstractly by a function $\theta \colon \mathcal{M} \to \mathbb{R}^K$. For instance, for marginal inference, $\mathcal{M}$ contains all MRF instances where $A$ is a subset of the vertices, $K = |Q|^{|A|}$, and $\theta(\mathcal{I}) = (\mu_{A,\mathcal{I}}(\sigma_A))_{\sigma_A \in Q^A}$; and for posterior or MAP inference, $\mathcal{M}$ contains all MRF instances where $A \uplus B$ is a subset of the vertices, $K = |Q|^{|A|}$, and $\theta(\mathcal{I}) = (\mu_{A,\mathcal{I}}(\sigma_A \mid \tau_B))_{\sigma_A \in Q^A}$ (for posterior inference) or $\theta(\mathcal{I}) = (P^*_{A,\mathcal{I}}(\sigma_A))_{\sigma_A \in Q^A}$ (for MAP inference).

One canonical approach for probabilistic inference is by sampling: sufficiently many independent samples are drawn (approximately) from the Gibbs distribution of the MRF instance, and an estimate of the target probabilities is calculated from these samples. Given a probabilistic inference problem $\theta(\cdot)$, we use $\mathcal{E}_\theta(\cdot)$ to denote an estimating function that approximates $\theta(\mathcal{I})$ using independent samples drawn approximately from $\mu_{\mathcal{I}}$. For the aforementioned problems of marginal, posterior, and MAP inference, such an estimating function $\mathcal{E}_\theta(\cdot)$ simply counts the frequency of the samples that satisfy certain properties.

The sampling cost of an estimator is captured in two aspects: the number of samples it uses and the accuracy of each individual sample it requires.

Definition 2.1 ($(N,\epsilon)$-estimator for $\theta$). Let $\theta \colon \mathcal{M} \to \mathbb{R}^K$ be a probabilistic inference problem and $\mathcal{E}_\theta(\cdot)$ an estimating function for $\theta(\cdot)$ that, for each instance $\mathcal{I} = (V, E, Q, \Phi) \in \mathcal{M}$, maps samples in $Q^V$ to an estimate of $\theta(\mathcal{I})$. Let $N \colon \mathbb{N}^+ \to \mathbb{N}^+$ and $\epsilon \colon \mathbb{N}^+ \to (0,1)$. For any instance $\mathcal{I} = (V, E, Q, \Phi) \in \mathcal{M}$ where $n = |V|$, the random variable $\mathcal{E}_\theta(X^{(1)}, \ldots, X^{(N(n))})$ is said to be an $(N,\epsilon)$-estimator for $\theta(\mathcal{I})$ if $X^{(1)}, \ldots, X^{(N(n))} \in Q^V$ are $N(n)$ independent samples drawn approximately from $\mu_{\mathcal{I}}$ such that $d_{\mathrm{TV}}(X^{(j)}, \mu_{\mathcal{I}}) \le \epsilon(n)$ for all $1 \le j \le N(n)$.
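To illustrate how such a frequency-counting estimating function $\mathcal{E}_\theta$ operates, the following is a minimal Python sketch for marginal inference. It is not taken from the paper; the function name `estimate_marginals` and the representation of samples as dictionaries are illustrative assumptions.

```python
from collections import Counter
from itertools import product

def estimate_marginals(samples, A, Q):
    """Frequency-counting estimator for marginal inference (a sketch).

    samples: N approximate samples X^(1), ..., X^(N) from mu_I, each a
             dict mapping every vertex to a value in the domain Q.
    A:       ordered list of the vertices whose marginal is queried.
    Q:       the finite domain, given as a sequence of values.

    Returns the empirical estimate of theta(I) = (mu_{A,I}(sigma_A)) for
    every configuration sigma_A in Q^A, i.e. a vector of K = |Q|^|A|
    frequencies.
    """
    counts = Counter(tuple(x[v] for v in A) for x in samples)
    n_samples = len(samples)
    return {sigma: counts[sigma] / n_samples
            for sigma in product(Q, repeat=len(A))}
```

If the samples are $N(n)$ independent draws, each within total variation distance $\epsilon(n)$ of $\mu_{\mathcal{I}}$, then the returned vector is an $(N,\epsilon)$-estimator for the marginal vector in the sense of Definition 2.1.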
2.3. Dynamic inference problem. We consider the inference problem where the input graphical model is changed dynamically: at each step, the current MRF instance $\mathcal{I} = (V, E, Q, \Phi)$ is updated to a new instance $\mathcal{I}' = (V', E', Q, \Phi')$. We consider general update operations for MRFs that can change both the underlying graph and all edge/vertex potentials simultaneously, where the update request is made by a non-adaptive adversary independently of the randomness used by the inference algorithm. Such updates are general enough to cover many applications, e.g. analyses of time series network data [CW+07, QS93, QS92, AQ+17] and learning algorithms for graphical models [Hin12, WJ08].

The difference between the original and the updated instances is measured as follows.

Definition 2.2 (difference between MRF instances). The difference between two MRF instances $\mathcal{I} = (V, E, Q, \Phi)$ and $\mathcal{I}' = (V', E', Q, \Phi')$, where $\Phi = (\phi_a)_{a \in V \cup E}$ and $\Phi' = (\phi'_a)_{a \in V' \cup E'}$, is defined as
\[
d(\mathcal{I}, \mathcal{I}') \triangleq \sum_{v \in V \cap V'} \left\lVert \phi_v - \phi'_v \right\rVert_1 + \sum_{e \in E \cap E'} \left\lVert \phi_e - \phi'_e \right\rVert_1 + |V \oplus V'| + |E \oplus E'|, \tag{2}
\]
where $A \oplus B = (A \setminus B) \cup (B \setminus A)$ stands for the symmetric difference between two sets $A$ and $B$, $\lVert \phi_v - \phi'_v \rVert_1 \triangleq \sum_{c \in Q} |\phi_v(c) - \phi'_v(c)|$, and $\lVert \phi_e - \phi'_e \rVert_1 \triangleq \sum_{c,c' \in Q} |\phi_e(c,c') - \phi'_e(c,c')|$.
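The quantity $d(\mathcal{I}, \mathcal{I}')$ of equation (2) is straightforward to compute directly from the two instances. The sketch below assumes, for illustration only, that vertices and edges are stored as Python sets and potentials as dictionaries.

```python
def instance_difference(I1, I2):
    """Compute d(I, I') of equation (2) between two MRF instances.

    Each instance is a tuple (V, E, Q, Phi): V a set of vertices, E a set
    of (hashable) edges, Q the domain as a sequence, and Phi a dict mapping
    each vertex v to a potential phi_v (dict Q -> float) and each edge e to
    a potential phi_e (dict (Q x Q) -> float).
    """
    V1, E1, Q, Phi1 = I1
    V2, E2, _, Phi2 = I2
    # ell_1 distances between potentials on the common vertices and edges.
    d = sum(abs(Phi1[v][c] - Phi2[v][c]) for v in V1 & V2 for c in Q)
    d += sum(abs(Phi1[e][c1, c2] - Phi2[e][c1, c2])
             for e in E1 & E2 for c1 in Q for c2 in Q)
    # Sizes of the symmetric differences |V ⊕ V'| and |E ⊕ E'|.
    d += len(V1 ^ V2) + len(E1 ^ E2)
    return d
```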
Given a probability vector specified by the function $\theta \colon \mathcal{M} \to \mathbb{R}^K$, the dynamic inference problem asks to maintain an estimator $\hat{\theta}(\mathcal{I})$ of $\theta(\mathcal{I})$ for the current MRF instance $\mathcal{I} = (V, E, Q, \Phi) \in \mathcal{M}$, with a data structure, such that when $\mathcal{I}$ is updated to $\mathcal{I}' = (V', E', Q, \Phi') \in \mathcal{M}$, the algorithm updates $\hat{\theta}(\mathcal{I})$ to an estimator $\hat{\theta}(\mathcal{I}')$ of the new vector $\theta(\mathcal{I}')$, or equivalently, outputs the difference between the estimators $\hat{\theta}(\mathcal{I})$ and $\hat{\theta}(\mathcal{I}')$.

It is desirable to have a dynamic inference algorithm that maintains an $(N,\epsilon)$-estimator for $\theta(\mathcal{I})$ for the current instance $\mathcal{I}$. However, the dynamic algorithm cannot be efficient if $N(n)$ and $\epsilon(n)$ change drastically with $n$ (so that significantly more samples or substantially more accurate samples may be needed when a new vertex is added), or if recalculating the estimating function $\mathcal{E}_\theta(\cdot)$ itself is expensive. We introduce a notion of dynamical efficiency for the estimators that are suitable for dynamic inference.
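To make the data-structure view of the dynamic inference problem concrete, here is a minimal interface sketch. It is emphatically not the paper's algorithm: the update below naively redraws all samples and re-evaluates $\mathcal{E}_\theta$, which is exactly the recomputation a dynamically efficient algorithm avoids by maintaining the samples and the estimate incrementally. All class and parameter names are hypothetical.

```python
class DynamicInference:
    """Interface sketch for the dynamic inference problem (illustrative).

    Maintains an estimator theta_hat(I) for the current instance I and, on
    each update I -> I', reports the difference between theta_hat(I) and
    theta_hat(I').
    """

    def __init__(self, instance, sampler, estimator, num_samples):
        self.instance = instance      # current MRF instance I
        self.sampler = sampler        # draws one approximate sample from mu_I
        self.estimator = estimator    # estimating function E_theta, taking a
                                      # list of samples (e.g. the frequency-
                                      # counting sketch above with A, Q fixed)
        self.num_samples = num_samples
        self.theta_hat = self._recompute()

    def _recompute(self):
        # Naive placeholder: redraw all N samples and re-evaluate E_theta.
        samples = [self.sampler(self.instance)
                   for _ in range(self.num_samples)]
        return self.estimator(samples)

    def update(self, new_instance):
        """Apply the update I -> I' and return the change in the estimator."""
        old = self.theta_hat
        self.instance = new_instance
        self.theta_hat = self._recompute()
        keys = set(old) | set(self.theta_hat)
        return {k: self.theta_hat.get(k, 0.0) - old.get(k, 0.0) for k in keys}
```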