BRANCHING PROCESSES

1. GALTON-WATSON PROCESSES

Galton-Watson processes were introduced by Francis Galton in 1889 as a simple mathematical model for the propagation of family names. They were reinvented by Leo Szilard in the late 1930s as models for the proliferation of free neutrons in a nuclear fission reaction. Generalizations of the extinction probability formulas that we shall derive below played a role in the calculation of the critical mass of fissionable material needed for a sustained chain reaction. Galton-Watson processes continue to play a fundamental role in both the theory and applications of stochastic processes.

First, an informal description: A population of individuals (which may represent people, organisms, free neutrons, etc., depending on the context) evolves in discrete time $n = 0, 1, 2, \dots$ according to the following rules. Each $n$th generation individual produces a random number (possibly 0) of individuals, called offspring, in the $(n+1)$st generation. The offspring counts $\xi_\alpha, \xi_\beta, \xi_\gamma, \dots$ for distinct individuals $\alpha, \beta, \gamma, \dots$ are mutually independent, and also independent of the offspring counts of individuals from earlier generations. Furthermore, they are identically distributed, with common distribution $\{p_k\}_{k \ge 0}$. The state $Z_n$ of the Galton-Watson process at time $n$ is the number of individuals in the $n$th generation. More formally,

Definition 1. A Galton-Watson process $\{Z_n\}_{n \ge 0}$ with offspring distribution $F = \{p_k\}_{k \ge 0}$ is a discrete-time Markov chain taking values in the set $\mathbb{Z}_+$ of nonnegative integers whose transition probabilities are as follows:

(1) $P\{Z_{n+1} = k \mid Z_n = m\} = p_k^{*m}.$

Here $\{p_k^{*m}\}$ denotes the $m$-th convolution power of the distribution $\{p_k\}$. In other words, the conditional distribution of $Z_{n+1}$ given that $Z_n = m$ is the distribution of the sum of $m$ i.i.d. random variables each with distribution $\{p_k\}$. The default initial state is $Z_0 = 1$.
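The convolution powers in the transition law are easy to tabulate directly. Below is a minimal Python sketch (the three-point offspring law $\{p_0, p_1, p_2\} = \{0.25, 0.25, 0.5\}$ is made up for illustration) that computes $P\{Z_{n+1} = k \mid Z_n = 3\}$ as the 3-fold convolution power of $\{p_k\}$:

```python
def convolve(p, q):
    """Convolution of two finite probability vectors."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def conv_power(p, m):
    """m-th convolution power {p_k^{*m}}: the law of a sum of m i.i.d. draws from p."""
    out = [1.0]  # point mass at 0 (the empty sum)
    for _ in range(m):
        out = convolve(out, p)
    return out

p = [0.25, 0.25, 0.5]   # illustrative offspring law {p_0, p_1, p_2}
p3 = conv_power(p, 3)   # P{Z_{n+1} = k | Z_n = 3} for k = 0, ..., 6
print(p3)
print(sum(p3))          # a probability distribution: sums to 1
```

Note, for instance, that $p_0^{*3} = p_0^3$ and $p_6^{*3} = p_2^3$, since three individuals produce 0 (resp. 6) offspring only if each produces 0 (resp. 2).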
Construction: A Galton-Watson process with offspring distribution $F = \{p_k\}_{k \ge 0}$ can be built on any probability space that supports an infinite sequence of i.i.d. random variables all with distribution $F$. Assume that these are arranged in a doubly infinite array, as follows:

$\xi_1^1, \xi_2^1, \xi_3^1, \dots$
$\xi_1^2, \xi_2^2, \xi_3^2, \dots$
$\xi_1^3, \xi_2^3, \xi_3^3, \dots$

etc.
Set $Z_0 = 1$, and inductively define

(2) $Z_{n+1} = \sum_{i=1}^{Z_n} \xi_i^{n+1}.$

The independence of the random variables $\xi_i^n$ guarantees that the sequence $(Z_n)_{n \ge 0}$ has the Markov property, and that the conditional distributions satisfy equation (1).

For certain choices of the offspring distribution $F$, the Galton-Watson process isn't very interesting. For example, if $F$ is the probability distribution that puts mass 1 on the integer 17, then the evolution of the process is purely deterministic: $Z_n = (17)^n$ for every $n \ge 0$. Another uninteresting case is when $F$ satisfies $p_0 p_1 > 0$ and $p_0 + p_1 = 1$. In this case the population remains at its initial size $Z_0 = 1$ for a random number of steps with a geometric distribution, then jumps to 0, after which it remains stuck at 0 forever afterwards. (Observe that for any Galton-Watson process, with any offspring distribution, the state 0 is an absorbing state.) To avoid having to consider these uninteresting cases separately in every result to follow, we make the following standing assumption:

Assumption 1. The offspring distribution is not a point mass (that is, there is no $k \ge 0$ such that $p_k = 1$), and it places positive probability on some integer $k \ge 2$. Furthermore, the offspring distribution has finite mean $\mu > 0$ and finite variance $\sigma^2 > 0$.

1.1. First Moment Calculation. The inductive definition (2) allows a painless calculation of the means $E Z_n$. Since the random variables $\xi_i^{n+1}$ are independent of $Z_n$,

$E Z_{n+1} = \sum_{m=0}^{\infty} E\Big[\sum_{i=1}^{m} \xi_i^{n+1} 1\{Z_n = m\}\Big] = \sum_{m=0}^{\infty} E\Big[\sum_{i=1}^{m} \xi_i^{n+1}\Big] P\{Z_n = m\} = \sum_{m=0}^{\infty} m \mu P\{Z_n = m\} = \mu E Z_n.$

Since $E Z_0 = 1$, it follows that

(3) $E Z_n = \mu^n.$

Corollary 1. If $\mu < 1$ then $P\{\tau > n\} \le \mu^n$, where $\tau$ denotes the extinction time of the process (defined in Section 1.3 below).
Proof. The event $\{\tau > n\}$ coincides with the event $\{Z_n \ge 1\}$. By Markov's inequality,

$P\{Z_n \ge 1\} \le E Z_n = \mu^n.$ □

1.2. Recursive Structure and Generating Functions. The Galton-Watson process $Z_n$ has a simple recursive structure that makes it amenable to analysis by generating function methods. Each of the first-generation individuals $\alpha, \beta, \gamma, \dots$ behaves independently of the others; moreover, all of its descendants (the offspring of the offspring, etc.) behave independently of the descendants of the other first-generation individuals. Thus, each of the first-generation individuals engenders an independent copy of the Galton-Watson process. It follows that a Galton-Watson process is gotten by conjoining to the single individual in the 0th generation $Z_1$ (conditionally) independent copies of the Galton-Watson process. The recursive structure leads to a simple set of relations among the probability generating functions of the random variables $Z_n$:

Proposition 2. Denote by $\varphi_n(t) = E t^{Z_n}$ the probability generating function of the random variable $Z_n$, and by $\varphi(t) = \sum_{k=0}^{\infty} p_k t^k$ the probability generating function of the offspring distribution. Then $\varphi_n$ is the $n$-fold composition of $\varphi$ with itself, that is,

(4) $\varphi_0(t) = t$ and
(5) $\varphi_{n+1}(t) = \varphi(\varphi_n(t)) = \varphi_n(\varphi(t)) \quad \forall\, n \ge 0.$

Proof. There are two ways to proceed, both simple. The first uses the recursive structure directly to deduce that $Z_{n+1}$ is the sum of $Z_1$ conditionally independent copies of $Z_n$. Thus,

$\varphi_{n+1}(t) = E t^{Z_{n+1}} = E \varphi_n(t)^{Z_1} = \varphi(\varphi_n(t)).$

The second argument relies on the fact that the generating function of the $m$th convolution power $\{p_k^{*m}\}$ is the $m$th power of the generating function $\varphi(t)$ of $\{p_k\}$. Thus,

$\varphi_{n+1}(t) = E t^{Z_{n+1}} = \sum_{m=0}^{\infty} E(t^{Z_{n+1}} \mid Z_n = m) P\{Z_n = m\} = \sum_{m=0}^{\infty} \varphi(t)^m P\{Z_n = m\} = \varphi_n(\varphi(t)).$

By induction on $n$, this is the $(n+1)$st iterate of the function $\varphi(t)$. □

Problem 1. (A) Show that if the mean offspring number $\mu := \sum_k k p_k < \infty$ then the expected size of the $n$th generation is $E Z_n = \mu^n$. (B) Show that if the variance $\sigma^2 = \sum_k (k - \mu)^2 p_k < \infty$ then the variance of $Z_n$ is finite, and give a formula for it.
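The recursion (2) and the mean formula $E Z_n = \mu^n$ are easy to check by simulation. A minimal Python sketch (the offspring law $\{p_0, p_1, p_2\} = \{0.25, 0.25, 0.5\}$, with $\mu = 1.25$, is an arbitrary choice for illustration):

```python
import random

def gw_path(n, offspring, z0=1):
    """Simulate Z_0, ..., Z_n via the recursion Z_{n+1} = sum of Z_n offspring draws."""
    path = [z0]
    for _ in range(n):
        path.append(sum(offspring() for _ in range(path[-1])))
    return path

# Illustrative offspring law {p_0, p_1, p_2} = {0.25, 0.25, 0.5}, so mu = 1.25.
offspring = lambda: random.choices([0, 1, 2], weights=[0.25, 0.25, 0.5])[0]
mu, n, trials = 1.25, 8, 20000

random.seed(1)
avg = sum(gw_path(n, offspring)[-1] for _ in range(trials)) / trials
print(avg)  # empirical E[Z_8]; should be close to mu**8, about 5.96
```

Averaging over many independent realizations, the sample mean of $Z_8$ should agree with $\mu^8$ up to Monte Carlo error.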
Properties of the Generating Function $\varphi(t)$: Assumption 1 guarantees that $\varphi(t)$ is not a linear function, because the offspring distribution puts mass on some integer $k \ge 2$. Thus, $\varphi(t)$ has the following properties:

(A) $\varphi(t)$ is strictly increasing for $0 \le t \le 1$.
(B) $\varphi(t)$ is strictly convex, with strictly increasing first derivative.
(C) $\varphi(1) = 1$.

1.3. Extinction Probability. If for some $n$ the population size $Z_n = 0$ then the population size is 0 in all subsequent generations. In such an event, the population is said to be extinct. The first time that the population size is 0 (formally, $\tau = \min\{n : Z_n = 0\}$, or $\tau = \infty$ if there is no such $n$) is called the extinction time. The most obvious and natural question concerning the behavior of a Galton-Watson process is: What is the probability $P\{\tau < \infty\}$ of extinction?

Proposition 3. The probability $\zeta$ of extinction is the smallest nonnegative root $t = \zeta$ of the equation

(6) $\varphi(t) = t.$

Proof. The key idea is recursion. Consider what must happen in order for the event $\tau < \infty$ of extinction to occur: Either (a) the single individual alive at time 0 has no offspring; or (b) each of its offspring must engender a Galton-Watson process that reaches extinction. Possibility (a) occurs with probability $p_0$. Conditional on the event that $Z_1 = k$, possibility (b) occurs with probability $\zeta^k$. Therefore,

$\zeta = p_0 + \sum_{k=1}^{\infty} p_k \zeta^k = \varphi(\zeta),$

that is, the extinction probability $\zeta$ is a root of the Fixed-Point Equation (6).

There is an alternative proof that $\zeta = \varphi(\zeta)$ that uses the iteration formula (5) for the probability generating function of $Z_n$. Observe that the probability of the event $Z_n = 0$ is easily recovered from the generating function $\varphi_n(t)$:

$P\{Z_n = 0\} = \varphi_n(0).$

By the nature of the Galton-Watson process, these probabilities are nondecreasing in $n$, because if $Z_n = 0$ then $Z_{n+1} = 0$. Therefore, the limit $\xi := \lim_{n \to \infty} \varphi_n(0)$ exists, and its value is the extinction probability for the Galton-Watson process.
The limit $\xi$ must be a root of the Fixed-Point Equation, because by the continuity of $\varphi$,

$\varphi(\xi) = \varphi\big(\lim_{n \to \infty} \varphi_n(0)\big) = \lim_{n \to \infty} \varphi(\varphi_n(0)) = \lim_{n \to \infty} \varphi_{n+1}(0) = \xi.$

Finally, it remains to show that $\xi$ is the smallest nonnegative root $\zeta$ of the Fixed-Point Equation. This follows from the monotonicity of the probability generating functions $\varphi_n$: Since
$\zeta \ge 0$,

$\varphi_n(0) \le \varphi_n(\zeta) = \zeta.$

Taking the limit of each side as $n \to \infty$ reveals that $\xi \le \zeta$. □

It now behooves us to find out what we can about the roots of the Fixed-Point Equation (6). First, observe that there is always at least one nonnegative root, to wit, $t = 1$, this because $\varphi(t)$ is a probability generating function. Furthermore, since Assumption 1 guarantees that $\varphi(t)$ is strictly convex, roots of equation (6) must be isolated. The next proposition asserts that there are either one or two roots, depending on whether the mean number of offspring $\mu := \sum_k k p_k$ is greater than one.

Definition 2. A Galton-Watson process with mean offspring number $\mu$ is said to be supercritical if $\mu > 1$, critical if $\mu = 1$, or subcritical if $\mu < 1$.

Proposition 4. If $\mu \le 1$ then the extinction probability is $\zeta = 1$; if $\mu > 1$ then $\zeta < 1$.

Proof. Suppose first that $\mu > 1$. If $p_0 = 0$ then the Fixed-Point Equation has roots $t = 0$ and $t = 1$, and because $\varphi(t)$ is strictly convex, there are no other positive roots. So suppose that $p_0 > 0$, so that $\varphi(0) = p_0 > 0$. Since $\varphi'(1) = \mu > 1$, Taylor's formula implies that $\varphi(t) < t$ for $t < 1$ sufficiently close to 1; thus there exists $t_* < 1$ with $\varphi(t_*) - t_* < 0$. Since $\varphi(0) - 0 = p_0 > 0$, the intermediate value theorem implies that $\varphi(t) - t$ has a root between 0 and $t_*$, and so the smallest nonnegative root satisfies $\zeta < 1$.

Suppose next that $\mu \le 1$. Then for $0 \le t < 1$ strict convexity gives $\varphi'(t) < \varphi'(1) = \mu \le 1$, and so $1 - \varphi(t) = \int_t^1 \varphi'(s)\,ds < 1 - t$; hence $\varphi(t) > t$ for every $t < 1$, and the only root in $[0, 1]$ is $t = 1$. □

1.4. Tail of the Extinction Time Distribution.

Proposition 5. Let $\tau$ be the extinction time of the Galton-Watson process.
(A) If $\mu < 1$ then there exists a constant $C > 0$ such that $P\{\tau > n\} \sim C \mu^n$ as $n \to \infty$.
(B) If $\mu = 1$ then $P\{\tau > n\} \sim 2/(\sigma^2 n)$ as $n \to \infty$.
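Since $P\{\tau > n\} = 1 - \varphi_n(0)$, the fixed-point characterization and the two tail rates can be explored numerically by iterating $t \mapsto \varphi(t)$ from $t = 0$. A short sketch (the particular offspring laws below are made up for illustration; the critical law $\{0.25, 0.5, 0.25\}$ has $\sigma^2 = 0.5$, and the supercritical law has smallest fixed point $\zeta = 0.5$):

```python
def phi(t, p):
    """Probability generating function of the offspring law p = [p_0, p_1, ...]."""
    return sum(pk * t ** k for k, pk in enumerate(p))

def survival(p, n):
    """Return [P{tau > m} for m = 0..n], i.e. 1 - phi_m(0), via the iteration (5)."""
    t, out = 0.0, []
    for _ in range(n + 1):
        out.append(1.0 - t)
        t = phi(t, p)
    return out

sup = survival([0.25, 0.25, 0.5], 200)    # mu = 1.25: supercritical
sub = survival([0.5, 0.3, 0.2], 40)       # mu = 0.7:  subcritical
crit = survival([0.25, 0.5, 0.25], 2000)  # mu = 1:    critical, sigma^2 = 0.5

print(1 - sup[200])         # extinction probability: smallest root of phi(t) = t, here 0.5
print(sub[40] / 0.7 ** 40)  # ratio stabilizes at a constant C: P{tau > n} ~ C mu^n
print(crit[2000] * 2000)    # slowly approaches 2 / sigma^2 = 4
```

The three printed values illustrate, respectively, Propositions 3-4 and parts (A) and (B) of Proposition 5 (the critical ratio converges only at rate $\log n / n$, so it is still slightly below 4 at $n = 2000$).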
Thus, in the subcritical case, the extinction time has an exponentially decaying tail, and hence finite moments of all orders. On the other hand, in the critical case the extinction time has infinite mean.

Proof. First note that $P\{\tau > n\} = P\{Z_n > 0\}$. Recall from the proof of Proposition 3 that $P\{Z_n = 0\} = \varphi_n(0)$; hence,

$P\{\tau > n\} = 1 - \varphi_n(0).$

This shows that the tail of the distribution is determined by the speed at which the sequence $\varphi_n(0)$ approaches 1. In the subcritical case, the graph of the generating function $\varphi(t)$ has slope $\mu < 1$ at $t = 1$, whereas in the critical case the slope is $\mu = 1$. It is this difference that accounts for the drastic difference in the rate of convergence.

Subcritical Case: Consider first the case where $\mu = \varphi'(1) < 1$. Recall from the proof of Proposition 3 that in this case the sequence $\varphi_n(0)$ increases and has limit 1. Thus, for $n$ large, $\varphi_n(0)$ will be near 1, and in this neighborhood the first-order Taylor series will provide a good approximation to $\varphi$. Consequently,

(8) $1 - \varphi_{n+1}(0) = 1 - \varphi(\varphi_n(0)) = 1 - \varphi\big(1 - (1 - \varphi_n(0))\big) = \varphi'(1)(1 - \varphi_n(0)) + O\big((1 - \varphi_n(0))^2\big) = \mu(1 - \varphi_n(0)) + O\big((1 - \varphi_n(0))^2\big).$

If not for the remainder term, we would have an exact equality $1 - \varphi_{n+1}(0) = \mu(1 - \varphi_n(0))$, which could be iterated to give

$1 - \varphi_n(0) = \mu^n (1 - \varphi_0(0)) = \mu^n.$

This would prove assertion (A). Unfortunately, the equalities are exact only in the special case where the generating function $\varphi(t)$ is linear. In the general case, the remainder term in the Taylor series expansion (8) must be accounted for. Because the generating function $\varphi(t)$ is convex, with derivative $\varphi'(1) = \mu$, the error in the approximation (8) is negative; in particular, for some constant $0 < C < \infty$,

$\mu(1 - \varphi_n(0)) - C(1 - \varphi_n(0))^2 \le 1 - \varphi_{n+1}(0) \le \mu(1 - \varphi_n(0)).$

The upper bound implies that $1 - \varphi_n(0) \le \mu^n$ (repeat the iteration argument above, replacing equalities by inequalities!).
Now divide through by $\mu(1 - \varphi_n(0))$ to get

$1 - C(1 - \varphi_n(0)) \le \frac{\mu^{-n-1}(1 - \varphi_{n+1}(0))}{\mu^{-n}(1 - \varphi_n(0))} \le 1 \implies 1 - C\mu^n \le \frac{\mu^{-n-1}(1 - \varphi_{n+1}(0))}{\mu^{-n}(1 - \varphi_n(0))} \le 1,$

where the bound $1 - \varphi_n(0) \le \mu^n$ was used in the second inequality (and the constant $C$ has absorbed a factor $1/\mu$). Thus, successive ratios of the terms $\mu^{-n}(1 - \varphi_n(0))$ are exceedingly close to 1, the error decaying geometrically. Since these errors sum, Weierstrass' Theorem on convergence of products implies that

$\lim_{n \to \infty} \frac{\mu^{-n}(1 - \varphi_n(0))}{1 - \varphi_0(0)} = \lim_{n \to \infty} \mu^{-n}(1 - \varphi_n(0)) =: C_F$
exists and is positive.

Critical Case: Exercise. (See Problem 4 below.) □

1.5. Asymptotic Growth Rate for Supercritical Galton-Watson Processes. It is not hard to see that if a Galton-Watson process $Z_n$ is supercritical (that is, the mean offspring number $\mu > 1$) then either $Z_n = 0$ eventually or $Z_n \to \infty$. Here is an informal argument for the case where $p_0 > 0$: Each time that $Z_n = K$, for some $K \ge 1$, there is chance $p_0^K$ that $Z_{n+1} = 0$. If somehow the process $Z_n$ were to visit the state $K$ infinitely many times, then it would have infinitely many chances to hit an event of probability $p_0^K$; but once it hits this event, it is absorbed in the state 0 and can never revisit state $K$. This argument can be made rigorous:

Problem 2. Prove that if a Markov chain has an absorbing state $z$, and if $x$ is a state such that $z$ is accessible from $x$, then $x$ is transient.

If $Z_n$ is supercritical, then it follows that with positive probability (= 1 − probability of extinction) $Z_n \to \infty$. How fast does it grow?

Theorem 6. There exists a nonnegative random variable $W$ such that

(9) $\lim_{n \to \infty} Z_n / \mu^n = W$ almost surely.

If the offspring distribution has finite second moment¹ and $\mu > 1$ then the limit random variable $W$ is positive on the event that $Z_n \to \infty$.

Given the Martingale Convergence Theorem, the convergence (9) is easy; however, (9) is quite difficult to prove without martingales. In section 2 below, I will prove an analogous convergence theorem for a continuous-time branching process.

1.6. Problems.

Problem 3. Suppose that the offspring distribution is nondegenerate, with mean $\mu \ne 1$, and let $\zeta$ be the smallest positive root of the Fixed-Point Equation. (A) Show that if $\mu \ne 1$ then the root $\zeta$ is an attractive fixed point of $\varphi$, that is, $\varphi'(\zeta) < 1$.

Problem 4. Suppose that the offspring distribution has mean $\mu = 1$ and finite variance $\sigma^2 > 0$. (A) Prove part (B) of Proposition 5, that is, $P\{\tau > n\} \sim 2/(\sigma^2 n)$ as $n \to \infty$. (B) Show that for each $x > 1$,

$\lim_{n \to \infty} P(\tau > n x \mid \tau > n) = C/x.$

¹Actually, it is enough that $\sum_{k \ge 3} p_k k \log k < \infty$: this is the Kesten-Stigum theorem.
HINT for part (A): The Taylor series approximation to $\varphi(t)$ at $\zeta = 1$ leads to the following approximate relationship, valid for large $n$:

$1 - \varphi_{n+1}(0) \approx 1 - \varphi_n(0) - \tfrac{1}{2}\varphi''(1)(1 - \varphi_n(0))^2,$

which at first does not seem to help, but on further inspection does. The trick is to change variables: if $x_n$ is a sequence of positive numbers that satisfies the recursion $x_{n+1} = x_n - b x_n^2$ then the sequence $y_n := 1/x_n$ satisfies $y_{n+1} = y_n + b + b^2/y_n + \cdots$.

Problem 5. There's a Galton-Watson process in my random walk! Let $S_n$ be the simple nearest-neighbor random walk on the integers started at $S_0 = 1$. Define $T$ to be the time of the first visit to the origin, that is, the smallest $n \ge 1$ such that $S_n = 0$. Define $Z_0 = 1$ and, for $k \ge 1$,

$Z_k = \sum_{n=0}^{T-1} 1\{S_n = k \text{ and } S_{n+1} = k+1\}.$

In words, $Z_k$ is the number of times that the random walk $S_n$ crosses from $k$ to $k+1$ before first visiting 0.

(A) Prove that the sequence $\{Z_k\}_{k \ge 0}$ is a Galton-Watson process, and identify the offspring distribution as a geometric distribution.
(B) Calculate the probability generating function of the offspring distribution, and observe that it is a linear fractional transformation. (See Ahlfors, Complex Analysis, ch. 1 for the definition and basic theory of LFTs. Alternatively, try the Wikipedia article.)
(C) Use the result of (B) to find out as much as you can about the distribution of $Z_k$.
(D) Show that $T = 2\sum_{k \ge 1} Z_k + 1$, so that $\sum_{k \ge 0} Z_k = (T+1)/2$ is the total number of individuals ever born in the course of the Galton-Watson process, and show that $\tau$ (the extinction time of the Galton-Watson process) is the maximum displacement $M$ from 0 attained by the random walk before its first return to the origin. What does the result of problem 4, part (B), tell you about the distribution of $M$?

2. YULE'S BINARY FISSION PROCESS

2.1. Definition and Construction. The Yule process is a continuous-time branching model, in which individuals undergo binary fission at random times.
It evolves as follows: Each individual, independently of all others and of the past of the process, waits an exponentially distributed time and then splits into two identical particles. (It is useful for the construction below to take the view that at each fission time the fissioning particle survives and creates one new clone of itself.) The exponential waiting times all have mean 1. Because the exponential random variables are mutually independent, the probability that two fissions will occur simultaneously is 0.

A Yule process started by 1 particle at time 0 can be built from independent Poisson processes as follows. Let $\{N_j(t)\}_{j \in \mathbb{N}}$ be a sequence of independent Poisson counting processes. Since the
interoccurrence times in a Poisson process are exponentially distributed with mean 1, the jump times in the Poisson process N_j(t) can be used as the fission times of the jth particle; at each such fission time a new particle is added to the population, and so a new Poisson process must be "activated". Thus, the time T_m at which the mth fission occurs can be defined as follows: set T_0 = 0 and

(10)	T_m = \min\{ t > T_{m-1} : \sum_{j=1}^{m} (N_j(t) - N_j(T_{m-1})) = 1 \}.

Thus, T_m is the first time after T_{m-1} that one of the first m Poisson processes jumps. The size Z_t of the population at time t is then

(11)	Z_t = m \quad \text{for } T_{m-1} \le t < T_m.

The same construction works for Yule processes with Z_0 = k \ge 2 particles: just change the definition of the fission times T_m to

(12)	T_m = \min\{ t > T_{m-1} : \sum_{j=1}^{m+k-1} (N_j(t) - N_j(T_{m-1})) = 1 \}.

Alternatively, a Yule process with Z_0 = k can be obtained by superposing k independent Yule processes Z_t^j, all with Z_0^j = 1; that is,

(13)	Z_t = \sum_{j=1}^{k} Z_t^j.

Problem 6. Show that by suitably indexing the Poisson processes in the first construction (12), one can deduce the superposition representation (13).

Problem 7. Calculate the mean E Z_t and variance var(Z_t) of the population size in a Yule process. For the mean you should get E Z_t = e^t. HINT: Condition on the time of the first fission.

2.2. Asymptotic Growth.

Theorem 7. Let Z_t be the population size at time t in a Yule process with Z_0 = 1. Then

(14)	Z_t / e^t \xrightarrow{a.s.} W,

where W has the unit exponential distribution.

The proof has two parts: first, it must be shown that Z_t / e^t converges to something; and second, it must be shown that the limit random variable W is exponentially distributed. The proof of almost sure convergence will be based on a careful analysis of the first passage times T_m defined by (10). Convergence of Z_t / e^t to a positive random variable W is equivalent to convergence of \log Z_t - t to a real-valued limit \log W. Since Z_t is a counting process (that is, it is nondecreasing in t and its only discontinuities are jumps of size 1), this in turn is equivalent to showing that there exists a finite random variable Y = -\log W such that

(15)	\lim_{m \to \infty} (T_m - \log m) = Y.

To accomplish this, we will use the following consequence of the construction (10).
Proposition 8. Let T_m be the fission times in a Yule process Z_t with Z_0 = k. Then the interoccurrence times \tau_m := T_m - T_{m-1} are independent, exponentially distributed random variables with expectations E\tau_m = 1/(m + k - 1).

Proof (Sketch). The random variable T_m is the first time after T_{m-1} at which one of the Poisson processes N_j(t), for 1 ≤ j ≤ m + k - 1, has a jump. Times between jumps in a Poisson process are exponentially distributed with mean 1, and jump times in independent Poisson processes are independent. Thus, the time until the next jump in m + k - 1 independent Poisson processes is the minimum of m + k - 1 independent exponentials, which is exponentially distributed with mean 1/(m + k - 1). This is not quite a complete argument, because the "start" times T_{m-1} are random. However, it is not difficult (exercise!) to turn the preceding into a rigorous argument by integrating out over the possible values of T_{m-1} and the possible choices for which Poisson processes jump at which times. □

The family of exponential distributions is closed under scale transformations: in particular, if Y is exponentially distributed with mean 1 and α > 0 is a scalar, then αY is exponentially distributed with mean α. Since the variance var(Y) of a unit exponential is 1, it follows that the variance var(αY) of an exponential with mean α is α². Consequently, if \tau_m = T_m - T_{m-1} is the time between the (m-1)th and the mth fission times in a Yule process with Z_0 = 1, then

(16)	E\tau_m = m^{-1} \quad \text{and} \quad \mathrm{var}(\tau_m) = m^{-2},

and so

(17)	E T_m = \sum_{k=1}^{m} k^{-1} \sim \log m \quad \text{and} \quad \mathrm{var}(T_m) = \sum_{k=1}^{m} k^{-2} \to \zeta(2) < \infty

as m → ∞, where \zeta(2) = \sum_{k=1}^{\infty} k^{-2}. In particular, the variance of T_m remains bounded as m → ∞, and so the distribution of T_m remains concentrated around log m. In fact, T_m - log m converges, to a possibly random limit, by the following general result about random series of independent random variables:

Theorem 9. Let X_j be independent random variables with means E X_j = 0 and finite variances var(X_j) = σ_j². Then

(18)	\sum_{j=1}^{\infty} \sigma_j^2 =: \sigma^2 < \infty \implies \lim_{n \to \infty} \sum_{j=1}^{n} X_j =: S

exists and is finite with probability one, and the limit random variable S has mean zero and variance σ².

A proof of Theorem 9, based on Wald's Second Identity, is given in Section 3 below. Modulo this, we have proved (15), and hence that W = \lim_{t \to \infty} Z_t / e^t exists and is finite and strictly positive with probability 1.
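The construction and the limit in Theorem 7 can be explored numerically. The sketch below (the names `yule_at`, `scaled`, `mean_w`, `tail_w` are ours, not from the text) uses the equivalent description of the construction: while m particles are alive, the time to the next fission is the minimum of m independent mean-1 exponential clocks, hence exponential with mean 1/m. It estimates the mean of Z_t e^{-t}, which should be near 1 since E Z_t = e^t (Problem 7), and the tail probability P(Z_t e^{-t} > 1), which should be near e^{-1} ≈ 0.368 if the limit W is unit exponential.

```python
import math
import random

random.seed(1)

def yule_at(t_end, z0=1):
    """Population Z_t of a Yule process at time t_end, started from z0.

    While m particles are alive, the next fission occurs after an
    exponential time with mean 1/m (the minimum of m mean-1 clocks).
    """
    m, t = z0, 0.0
    while True:
        t += random.expovariate(m)   # waiting time ~ Exp(rate m)
        if t > t_end:
            return m                 # no further fission before t_end
        m += 1                       # fission: one new particle

t = 5.0
scaled = [yule_at(t) * math.exp(-t) for _ in range(5000)]
mean_w = sum(scaled) / len(scaled)                   # ≈ 1, since E Z_t = e^t
tail_w = sum(w > 1.0 for w in scaled) / len(scaled)  # ≈ e^{-1} ≈ 0.368
```

Increasing t brings the scaled population Z_t e^{-t} closer to its exponential limit; already at t = 5 the agreement is within a few percent.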
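Proposition 8 also gives a direct way to sample the first passage times without simulating the whole process: with Z_0 = 1, T_m is a sum of independent exponentials τ_j with mean 1/j. A minimal sketch (the helper name is ours): since E T_m = Σ_{k=1}^m k^{-1} ≈ log m + 0.5772 (the Euler-Mascheroni constant) and, by (17) and Theorem 9, the variance of T_m - log m approaches ζ(2) ≈ 1.645, the sample mean and variance of T_m - log m should be near those values.

```python
import math
import random

random.seed(2)

def t_minus_log_m(m):
    """One sample of T_m - log m for a Yule process with Z_0 = 1:
    by Proposition 8 with k = 1, T_m = tau_1 + ... + tau_m for
    independent exponentials tau_j with mean 1/j.
    """
    total = 0.0
    for j in range(1, m + 1):
        total += random.expovariate(j)   # tau_j ~ Exp(rate j)
    return total - math.log(m)

samples = [t_minus_log_m(2000) for _ in range(2000)]
mean_y = sum(samples) / len(samples)
var_y = sum((y - mean_y) ** 2 for y in samples) / len(samples)
# mean_y ≈ 0.577 (since E T_m is the harmonic number H_m ≈ log m + 0.5772)
# var_y ≈ ζ(2) ≈ 1.645, consistent with (17) and Theorem 9
```

Along a single path, T_m - log m barely moves once m is large: the variance of its remaining fluctuation is Σ_{k>m} k^{-2} ≈ 1/m, which is why the limit Y in (15) exists.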