THE GEOMETRIC FOUNDATIONS OF HAMILTONIAN MONTE CARLO 3

1.1 Markov Kernels

Consider a probability space, (Q, B(Q), ̟), with an n-dimensional sample space, Q, the Borel σ-algebra over Q, B(Q), and a distinguished probability measure, ̟. In a Bayesian application, for example, the distinguished measure would be the posterior distribution and our ultimate goal would be the estimation of expectations with respect to the posterior, E̟[f].

A Markov kernel, τ, is a map from an element of the sample space and the σ-algebra to a probability,

τ : Q × B(Q) → [0, 1],

such that the kernel is a measurable function in the first argument,

τ(·, A) : Q → [0, 1], ∀A ∈ B(Q),

and a probability measure in the second argument,

τ(q, ·) : B(Q) → [0, 1], ∀q ∈ Q.
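As a concrete illustration (not part of the paper itself), the kernel τ(q, ·) can be realized in code as a rule that, given a current point q, samples a new point. The sketch below uses a hypothetical random-walk Metropolis kernel whose invariant measure is a standard normal target; the function names and step size are illustrative choices, not anything prescribed by the text.

```python
import math
import random

def log_target(q):
    # Log-density of a standard normal target (normalizing constant omitted).
    return -0.5 * q * q

def kernel(q, step=0.5):
    # One draw from tau(q, .): propose a Gaussian perturbation of the
    # current point and accept or reject it so that the standard normal
    # measure is preserved (Metropolis accept/reject).
    proposal = q + random.gauss(0.0, step)
    if math.log(random.random()) < log_target(proposal) - log_target(q):
        return proposal
    return q  # on rejection the kernel places an atom at the current point
```

Iterating `kernel` from any initial point generates exactly the kind of Markov chain discussed below: each call is a sample from the probability measure τ(q, ·) attached to the current state q.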
By construction the kernel defines a map, τ : Q → P(Q), where P(Q) is the space of probability measures over Q; intuitively, at each point in the sample space the kernel defines a probability measure describing how to sample a new point. By averaging the Markov kernel over all initial points in the state space we can construct a Markov transition from probability measures to probability measures, T : P(Q) → P(Q), by

̟′(A) = (̟T)(A) = ∫_Q τ(q, A) ̟(dq), ∀A ∈ B(Q).

When the transition is aperiodic, irreducible, Harris recurrent, and preserves the target measure, ̟T = ̟, its repeated application generates a Markov chain that will eventually explore the entirety of ̟. Correlated samples, (q_0, q_1, . . . , q_N), from the Markov chain yield Markov Chain Monte Carlo estimators of any expectation (Roberts et al., 2004; Meyn and Tweedie, 2009). Formally, for any integrable function f ∈ L¹(Q, ̟) we can construct estimators,

f̂_N(q_0) = (1/N) Σ_{n=0}^{N} f(q_n).
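The estimator f̂_N can be sketched directly from this definition. The code below is a minimal illustration, not the paper's method: it uses a hypothetical autoregressive kernel that leaves the standard normal measure invariant (so ̟T = ̟ holds by construction) and averages f over the correlated draws q_0, …, q_N.

```python
import random

def ar1_kernel(q, rho=0.8):
    # Hypothetical autoregressive kernel: q' = rho*q + sqrt(1 - rho^2)*eps
    # with eps ~ N(0, 1). If q ~ N(0, 1) then q' ~ N(0, 1), so this
    # tau(q, .) preserves the standard normal target measure.
    return rho * q + (1.0 - rho * rho) ** 0.5 * random.gauss(0.0, 1.0)

def mcmc_estimate(f, kernel, q0, n):
    # hat{f}_N(q0): average f over the correlated draws q_0, q_1, ..., q_N
    # generated by repeated application of the kernel.
    q, total = q0, f(q0)
    for _ in range(n):
        q = kernel(q)
        total += f(q)
    return total / (n + 1)

# Estimate E[q^2] = 1 under the standard normal target.
random.seed(0)
estimate = mcmc_estimate(lambda q: q * q, ar1_kernel, 0.0, 50000)
```

Because the draws are correlated, the estimator converges more slowly than an independent-sample average; the ergodicity conditions above (aperiodicity, irreducibility, Harris recurrence) are what guarantee it converges at all.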