2.2 Filtrations Let(Q, F)be a measureable space. A filtration in discrete time is a sequence of r-algebras Ft) such that FtCy Ft Ft+ for all t=0.1... In continuous time, the second condition is replaced by Fs CFt for all s< t 3 Markov processes The idea of a Markov process is to capture the idea of a short-memory stochastic process: once its current state is known, past history is irrelevant from the point of view of predicting its future Definition. Let( Q, F)be a measurable space and let(P, E)be, respectively,a probability measure on and a filtration of this space. Let X be a stochastic process in discrete time on( Q, F). Then X is called a(P, E )-Markov process if 1. X is F-adapted, and
2.2 Filtrations Let (Ω, F) be a measureable space. A filtration in discrete time is a sequence of σ–algebras {Ft} such that Ft ⊂ F and Ft ⊂ Ft+1 for all t = 0, 1, . . .. In continuous time, the second condition is replaced by Fs ⊂ Ft for all s ≤ t. 3 Markov processes The idea of a Markov process is to capture the idea of a short-memory stochastic process: once its current state is known, past history is irrelevant from the point of view of predicting its future. Definition. Let (Ω, F) be a measurable space and let (P, F) be, respectively, a probability measure on and a filtration of this space. Let X be a stochastic process in discrete time on (Ω, F). Then X is called a (P, F)-Markov process if 1. X is F−adapted, and 11
2. For each t E Z+ and each Borel set B CB(R P(Xt+1∈BFt)=P(Xt+1∈B|a(Xt) ometimes when the probability measure and filtration are understood, we will talk simply of a Markov process Remark. Often the filtration F is taken to be that generated by the process X itself Proposition. Let(Q, F, P, E)be a filtered probability space and let X be a(P, F)- Markov process. Let t, k E Z+. Let f: R-R be a Borel function such that f(Xt+k) is integrable. Then EIf(Xt+)Ft]=EI(Xi+)o(Xi) 10 and hence there is a borel function g:Z+×Z+×R→ R such that, for each t, EI(Xu+k)Ft=g(t, k, Xt) 11 Proof. To show it for k:= 1, use that f(Xi+1) is a o(X++1)-measurable random variable and hence is the limit of a monotone increasing sequence of o(Xt+1)-simple random variables. But such random variables are linear combinations of indicator functions of sets X+1(B)with B a Borel set. This completes the proof for k=1 To prove it for arbitrary positive k, use induction. To prove it for k+ 1 assuming it true for k use the law of iterated The vector case is a simple extension of the scalar case. However, it is important that the definition of a vector Markov process is not that each component is Markov
2. For each t ∈ Z+ and each Borel set B ⊂ B (R) P (Xt+1 ∈ B|Ft) = P (Xt+1 ∈ B|σ (Xt)). (9) Sometimes when the probability measure and filtration are understood, we will talk simply of a Markov process. Remark. Often the filtration F is taken to be that generated by the process X itself. Proposition. Let (Ω, F, P, F) be a filtered probability space and let X be a (P, F)- Markov process. Let t, k ∈ Z+. Let f : R → R be a Borel function such that f (Xt+k) is integrable. Then, E [f (Xt+k)|Ft ] = E [f (Xt+k)|σ (Xt)] (10) and hence there is a Borel function g : Z+ × Z+ × R → R such that, for each t, E [f (Xt+k)|Ft ] = g (t, k, Xt). (11) Proof. To show it for k = 1, use that f (Xt+1) is a σ (Xt+1) −measurable random variable and hence is the limit of a monotone increasing sequence of σ (Xt+1) −simple random variables. But such random variables are linear combinations of indicator functions of sets X −1 t+1 (B) with B a Borel set. This completes the proof for k = 1. To prove it for arbitrary positive k, use induction. To prove it for k + 1 assuming it true for k, use the law of iterated expectations. The vector case is a simple extension of the scalar case. However, it is important that the definition of a vector Markov process is not that each component is Markov. 12
Instead, we require that all the relevant(for the future of X) bits of information in Ft are in the a-algebra generated by all the stochastic variables in Xt, i.e. o(Xt)is defined as the single a-algebra o(Xi.t, X2,t,,Xn,t). This means that, for each Borel function f:Rn→R EI(Xi+k)]= EI(X++k)lo(Xt) and hence that there is a Borel function g: Z+ X R"-R such that, for each t EIf(X+1ft=g(t, Xt (13 (The case k= l is so important that we stress it here by ignoring greater values of 3.0.1 Probability transition functions and time homogeneity Definition. Let(, F, P, E be a filtered probability space and let X be a(p F Markov process. Then, for each t=0, 1, 2... its probability transition function Qt:R×B(R)→[0.,1 is defined via Qt(x,B)=P(Xt∈BXt Note that any Markov process has a sequence of probability transition functions Note also that for each fixed t and a, Qt+1(, )is a probability measure on B(R) Meanwhile, if we fix B, Q++1(X(), B)is a random variable. Indeed, it is the con ditional probability of Xt+1 E B given Xt, i.e. Q++1(Xt, B)=EIx41(o(X. Moreover, the conditional expectation of any a(Xt+1)-measurable random variable (given X, is an integral with respect to the measure Qt+1 in the following sense
Instead, we require that all the relevant (for the future of X) bits of information in Ft are in the σ−algebra generated by all the stochastic variables in Xt , i.e. σ (Xt) is defined as the single σ−algebra σ ({X1,t, X2,t, ..., Xn,t}). This means that, for each Borel function f : R n → R m, E [f (Xt+k)|Ft ] = E [f (Xt+k)|σ (Xt)] (12) and hence that there is a Borel function g : Z+ × R n→ R m such that, for each t, E [f (Xt+1)|Ft ] = g (t, Xt). (13) (The case k = 1 is so important that we stress it here by ignoring greater values of k.) 3.0.1 Probability transition functions and time homogeneity Definition. Let (Ω, F, P, F) be a filtered probability space and let X be a (P, F)- Markov process. Then, for each t = 0, 1, 2... its probability transition function Qt : R×B (R) → [0, 1] is defined via Qt (x, B) = P (Xt ∈ B|Xt−1 = x). (14) Note that any Markov process has a sequence of probability transition functions. Note also that for each fixed t and x, Qt+1 (x, ·) is a probability measure on B (R). Meanwhile, if we fix B, Qt+1 (Xt (·), B) is a random variable. Indeed, it is the conditional probability of Xt+1 ∈ B given Xt , i.e. Qt+1 (Xt , B) = E h IX −1 t+1(B) ¯ ¯ ¯ σ (Xt) i . Moreover, the conditional expectation of any σ (Xt+1)-measurable random variable (given Xt) is an integral with respect to the measure Qt+1 in the following sense. 13
Proposition. Let(Q, F, P, F be a filtered probability space and let X be a (P, E)-Markov process. Let(Qt) be its probability transition functions and let ZEC(Q, o(X++1), P). Then, for each t=0, 1, E[ZIXi=/f(y)Q++1(Xt, dy) or, put differently, we have for each t=0, 1, .. and each a E[ZIX =a]=/f(y)Q++1(a, dy) (16) Proof We will show it first for an indicator variable Z=Ix-1(A) where AE B(R).Then f(y)=IA(y). We now need to show that the random variable/f(y)Qt+1(Xt, dy) qualifies as the conditional expectation E [ZXt. Clearly it is a(Xt)-measurable But does it integrate to the right thing? Well, let G E o(Xt) and recall that, by definition,Qu+1(X,A)=E((alo(XD).Hence f (y)Q+1(Xt, dy)P(dw) IA(y)Q++1(Xt, dy)P(d /-(x,4P)=/E(xa(x)P()=/x=aP() Meanwhile. since z (A) we obviously have zP(dw)=Ix (18)
Proposition. Let (Ω, F, P, F) be a filtered probability space and let X be a (P, F)-Markov process. Let hQti be its probability transition functions and let Z ∈ L1 (Ω, σ (Xt+1), P). Then, for each t = 0, 1, ... E [Z|Xt ] = Z R f (y) Qt+1 (Xt , dy) (15) or, put differently, we have for each t = 0, 1, ... and each x, E [Z|Xt = x] = Z R f (y) Qt+1 (x, dy). (16) Proof. We will show it first for an indicator variable Z = IX −1 t+1(A) where A ∈ B (R). Then f (y) = IA (y). We now need to show that the random variable Z R f (y) Qt+1 (Xt , dy) qualifies as the conditional expectation E [Z|Xt ] . Clearly it is σ (Xt) −measurable. But does it integrate to the right thing? Well, let G ∈ σ (Xt) and recall that, by definition, Qt+1 (Xt , A) = E ³ IX −1 t+1(A) |σ (Xt) ´ . Hence Z G Z R f (y) Qt+1 (Xt , dy) P (dω) = Z G Z R IA (y) Qt+1 (Xt , dy) P (dω) = = Z G Qt+1 (Xt , A) P (dω) = Z G E ³ IX −1 t+1(A) |σ (Xt) ´ P (dω) = Z G IX −1 t+1(A)P (dω). (17) Meanwhile, since Z = IX −1 t+1(A) we obviously have Z G ZP (dω) = Z G IX −1 t+1(A)P (dω) . (18) 14
To show the theorem for an arbitrary Z E C(Q, o(Xt+1), P), use the Monotone Convergence Theorem. L We now use the probability transition function to define a time homogeneous Markov Definition. Let(Q, F, P, )be a filtered probability space and let X be a(P, e Markov process. Let(Qt)t be its probability transition functions. If there is a Q such that Qt=Q for all t= 1, 2, . then X is called a time homogeneous Markov process Proposition. Let(@, F, P, e be a filtered probability space and let X be a time homogeneous(P, F)-Markov process. For any nonnegative integers k, t, let Yt+k C(Q, o(Xt+k), P). Then for each k=0, 1, .. there is a Borel function gk: R-R such that. for each t=0.1 EY++Ft= gk(XL) (19) In particular, there is a Borel function h such that, for each t=0, 1 E[t+1|F]=h(X) 3. 1 Finitestate markov chains in discrete time This is perhaps the simplest class of Markov processes. Let(Q2, F, P)be a probability space and let={x1,x2,……,xn} be a finite set.X:Z+→ a be a stochastic process. Denote by u the vector of probabilities that Xt= i and suppose there is
To show the theorem for an arbitrary Z ∈ L1 (Ω, σ (Xt+1), P), use the Monotone Convergence Theorem. We now use the probability transition function to define a time homogeneous Markov process. Definition. Let (Ω, F, P, F) be a filtered probability space and let X be a (P, F)- Markov process. Let hQti ∞ t=1 be its probability transition functions. If there is a Q such that Qt = Q for all t = 1, 2, ... then X is called a time homogeneous Markov process. Proposition. Let (Ω, F, P, F) be a filtered probability space and let X be a time homogeneous (P, F)-Markov process. For any nonnegative integers k, t, let Yt+k ∈ L 1 (Ω, σ (Xt+k), P). Then for each k = 0, 1, ... there is a Borel function gk : R → R such that, for each t = 0, 1, ... E [Yt+k|Ft ] = gk (Xt). (19) In particular, there is a Borel function h such that, for each t = 0, 1, ... E [Yt+1|Ft ] = h (Xt). (20) 3.1 Finite–state Markov chains in discrete time This is perhaps the simplest class of Markov processes. Let (Ω, F, P) be a probability space and let X = {x1, x2, . . . , xn} be a finite set. X : Z+ → X be a stochastic process. Denote by µt the vector of probabilities that Xt = xi and suppose there is 15
a sequence of matrices It such that H+1=4x (2 Then X is said to be a finite-state Markov chain in discrete time. Call It the probability transition matrix The interpretation of (21) is the following P(Xu+1=iXt=si)=Tt(i,j) If Tt=r we call the process and time-homogenenous or stationary If r is sufficiently well-behaved, then X has a unique stationary distribution Definition. Let X be a stationary finite-state Markov chain and let T be the time of the first visit to state j after t=0. Then state j is called recurrent(opposite transient) if P({T 1 if 8 is the largest integer P(T=n8 for some n 1X0=j=1 If there is no such 8>1, then i is called aperiodic Definition. A Markov chain is said to be aperiodic if all its states are aperiodic Definition. A state j can be reached from i if there exists an integer n >0 such that r"(i,j)>0
a sequence of matrices Γt such that µt+1 = Γtµt . (21) Then X is said to be a finite–state Markov chain in discrete time. Call Γt the probability transition matrix. The interpretation of (21) is the following. P(Xt+1 = xi |Xt = xj ) = Γt(i, j). If Γt = Γ we call the process and time–homogenenous or stationary. If Γ is sufficiently well–behaved, then X has a unique stationary distribution. Definition. Let X be a stationary finite–state Markov chain and let T be the time of the first visit to state j after t = 0. Then state j is called recurrent (opposite: transient) if P({T 1 if δ is the largest integer for which P({T = nδ for some n ≥ 1}|X0 = j) = 1. If there is no such δ > 1, then j is called aperiodic. Definition. A Markov chain is said to be aperiodic if all its states are aperiodic. Definition. A state j can be reached from i if there exists an integer n ≥ 0 such that Γ n (i, j) > 0. 16
where by rn(i, j)we mean the(i, j) element of the matrix Tn Definition, A set of states is said to be closed if no state outside it can be reached from any state in it Definition. A set of states is said to be ergodic if it is closed and no proper subset is closed Definition. A Markov chain is called irreducible if its only d set is the set of all states Theorem. Let X be a finite state stationary Markov chain with transition matrix T and suppose X irreducible and aperiodic. Then has a unique solution u* and this u* has the property that lim Tu=μ for all po such that u. 1=1 Proof. See Cinlar(1975).E 3.2 Finite-state markov chains in continuous time Let( F, P) be a probability space and let x=a1, 12,..., In be a finite set be a stochastic process. Denote by u(t)the vector of probabilities
where by Γn (i, j) we mean the (i, j) element of the matrix Γn . Definition. A set of states is said to be closed if no state outside it can be reached from any state in it. Definition. A set of states is said to be ergodic if it is closed and no proper subset is closed. Definition. A Markov chain is called irreducible if its only closed set is the set of all states. Theorem. Let X be a finite state stationary Markov chain with transition matrix Γ and suppose X irreducible and aperiodic. Then µ = Γµ µ · 1 = 1 has a unique solution µ ∗ and this µ ∗ has the property that lim t→∞ Γ tµ0 = µ ∗ for all µ0 such that µ · 1 = 1. Proof. See Cinlar (1975). 3.2 Finite–state Markov chains in continuous time Let (Ω, F, P) be a probability space and let X = {x1, x2, . . . , xn} be a finite set. Let X : R+ → X be a stochastic process. Denote by µ(t) the vector of probabilities 17
that X(t)=ri and suppose there is a matrix-valued function r (t) such that u(t) satisfies t)=r(t)(t) Then we call X a finite-state continuous time Markov chain and If r(t)=r we call the process and time-homogenenous or stationary 3.3 Poisson processes 3.3.1 The poisson distribution Intuitively, the Poisson distribution comes from taking the limit of a sum of bernoulli random var tables Definition, a random variable x is said to be bernoulli if there is a real number 0≤p≤1 such that P Definition. A random variable y is said to be binomial distribution if there is an integer n and a real number0≤p≤1 such that P(Y=6)=∑(x)少(-p where the binomial coefficient is defined as follows
that X(t) = xi and suppose there is a matrix-valued function Γ(t) such that µ(t) satisfies µ˙(t) = Γ(t)µ(t). Then we call X a finite-state continuous time Markov chain and If Γ(t) = Γ we call the process and time–homogenenous or stationary. 3.3 Poisson processes 3.3.1 The Poisson distribution Intuitively, the Poisson distribution comes from taking the limit of a sum of Bernoulli random variables. Definition. A random variable X is said to be Bernoulli if there is a real number 0 ≤ p ≤ 1 such that P({X = 1}) = p and P({X = 0}) = 1 − p. Definition. A random variable Y is said to be binomial distribution if there is an integer n and a real number 0 ≤ p ≤ 1 such that P({Y = k}) = Xn k=0 µ n k ¶ p k (1 − p) n−k where the binomial coefficient is defined as follows. µ n k ¶ = n! k!(n − k)! 18
Proposition. If iXi, i=l,., n is a collection of independent Bernoulli random variables. then y defined via is binomially distributed Proposition. Let X be binomial with parameters n, p. Then EIX Proof. Exercise Now imagine that we are on a fishing expedition. For some reason we dip the fishing ole into the water 10 times for one minute at a time. The probability of catching a fish during any one dipping is p, independently of whether I caught a fish in any previous dipping. Then the total number of fish caught is a binomial variable But what if I dip 20 times for half a minute, or 40 times for 15 seconds etc. Assume that the probability of catching a fish during a half-minute dip is p/2 and similarly for shorter dips. What happens in the limit? As we take limits, let the expected total number of fish caught A= np be constant and let n -00.(It follows that p-0.) Then it turns out that the distribution of the total number of fish caught tends to the Poisson distribution with parameter A Definition. A random variable x is said to be poisson distributed if there is a real number \>0 such that P({X=k})=e- Proposition Suppose two independent random variables X and Y are Poisson
Proposition. If {Xi , i = 1, . . . , n} is a collection of independent Bernoulli random variables, then Y defined via Y = Xn i=1 Xi is binomially distributed. Proposition. Let X be binomial with parameters n, p. Then E[X] = np. Proof. Exercise. Now imagine that we are on a fishing expedition. For some reason we dip the fishing pole into the water 10 times for one minute at a time. The probability of catching a fish during any one dipping is p, independently of whether I caught a fish in any previous dipping. Then the total number of fish caught is a binomial variable. But what if I dip 20 times for half a minute, or 40 times for 15 seconds etc. Assume that the probability of catching a fish during a half–minute dip is p/2 and similarly for shorter dips. What happens in the limit? As we take limits, let the expected total number of fish caught λ = np be constant and let n → ∞. (It follows that p → 0.) Then it turns out that the distribution of the total number of fish caught tends to the Poisson distribution with parameter λ. Definition. A random variable X is said to be Poisson distributed if there is a real number λ ≥ 0 such that P({X = k}) = e −λ λ k k! . Proposition Suppose two independent random variables X and Y are Poisson 19
distributed with parameters A and u, respectively. Then Z=X+y is Poisson distributed with parameter A+u. Proof. Use the characteristic function Definition. A continuous time stochastic process is a function X(t)where for each fixed t>0, X(t) is a random variable Definition. Let( Q, F, P, F) be a filtered probability space. A stochastic process IN(t,w); t20 is said to be a(P, E)-Poisson process with intensity A if ·Nisf- adapted. The trajectories of N are(with probability one)right continuous and piecewise continuous N(0)=0 ·△N(t)=0or1( with probability one) where f)=N(t)-N(t-) · For all s≤t,N(t)-N(s) is independent of F s N()-N(s) is Poisson distributed with parameter A(t-s), that is P(N(t)-N(s)=k|)=P(N()-N()=k)=c-A(-2(t-s Proposition. The time between jumps isis exponentially distributed. More pre- cisely, let T be defned via T=infN(t)>0)
distributed with parameters λ and µ, respectively. Then Z = X + Y is Poisson distributed with parameter λ + µ. Proof. Use the characteristic function. Definition. A continuous time stochastic process is a function X(t) where for each fixed t ≥ 0, X(t) is a random variable. Definition. Let (Ω, F, P, F) be a filtered probability space. A stochastic process {N(t, ω);t ≥ 0} is said to be a (P, F)–Poisson process with intensity λ if • N is F–adapted. • The trajectories of N are (with probability one) right continuous and piecewise continuous. • N(0) = 0. • ∆N(t) = 0 or 1 (with probability one) where ∆N(t) = N(t) − N(t−). • For all s ≤ t, N(t) − N(s) is independent of Fs. • N(t) − N(s) is Poisson distributed with parameter λ(t − s), that is P (N(t) − N(s) = k | Fs)) = P (N(t) − N(s) = k) = e −λ(t−s)λ k (t − s) k k! . Proposition. The time between jumps is is exponentially distributed. More precisely, let τ be defined via τ = inf t≥0 {N(t) > 0}. 20