Theoretical Computer Science course teaching resources (reading material): Galton–Watson process - Branching.pdf

BRANCHING PROCESSES

1. GALTON-WATSON PROCESSES

Galton-Watson processes were introduced by Francis Galton in 1889 as a simple mathematical model for the propagation of family names. They were reinvented by Leo Szilard in the late 1930s as models for the proliferation of free neutrons in a nuclear fission reaction. Generalizations of the extinction probability formulas that we shall derive below played a role in the calculation of the critical mass of fissionable material needed for a sustained chain reaction. Galton-Watson processes continue to play a fundamental role in both the theory and applications of stochastic processes.

First, an informal description: A population of individuals (which may represent people, organisms, free neutrons, etc., depending on the context) evolves in discrete time $n = 0, 1, 2, \dots$ according to the following rules. Each $n$th-generation individual produces a random number (possibly 0) of individuals, called offspring, in the $(n+1)$st generation. The offspring counts $\xi_\alpha, \xi_\beta, \xi_\gamma, \dots$ for distinct individuals $\alpha, \beta, \gamma, \dots$ are mutually independent, and also independent of the offspring counts of individuals from earlier generations. Furthermore, they are identically distributed, with common distribution $\{p_k\}_{k \ge 0}$. The state $Z_n$ of the Galton-Watson process at time $n$ is the number of individuals in the $n$th generation. More formally,

Definition 1. A Galton-Watson process $\{Z_n\}_{n \ge 0}$ with offspring distribution $F = \{p_k\}_{k \ge 0}$ is a discrete-time Markov chain taking values in the set $\mathbb{Z}_+$ of nonnegative integers whose transition probabilities are as follows:

$$P\{Z_{n+1} = k \mid Z_n = m\} = p^{*m}_k. \tag{1}$$

Here $\{p^{*m}_k\}$ denotes the $m$th convolution power of the distribution $\{p_k\}$. In other words, the conditional distribution of $Z_{n+1}$ given that $Z_n = m$ is the distribution of the sum of $m$ i.i.d. random variables each with distribution $\{p_k\}$. The default initial state is $Z_0 = 1$.
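To make the transition rule (1) concrete, the following minimal sketch computes one row of the transition matrix as a convolution power. The offspring distribution used here and the helper names `convolve` and `convolution_power` are illustrative assumptions, not part of the notes.

```python
def convolve(p, q):
    """Law of X + Y for independent X ~ p and Y ~ q (lists indexed by value)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def convolution_power(p, m):
    """m-th convolution power p^{*m}; for m = 0 this is the point mass at 0."""
    result = [1.0]
    for _ in range(m):
        result = convolve(result, p)
    return result


# Hypothetical offspring distribution: p_0 = 1/4, p_1 = 1/4, p_2 = 1/2.
offspring = [0.25, 0.25, 0.5]

# Row m = 3 of the transition matrix: P{Z_{n+1} = k | Z_n = 3} for k = 0, ..., 6.
row = convolution_power(offspring, 3)
for k, prob in enumerate(row):
    print(k, prob)
```

The row for $m = 0$ is the point mass at 0, which reflects the fact that an empty population stays empty.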
Construction: A Galton-Watson process with offspring distribution $F = \{p_k\}_{k \ge 0}$ can be built on any probability space that supports an infinite sequence of i.i.d. random variables all with distribution $F$. Assume that these are arranged in a doubly infinite array, as follows:

$$\begin{array}{llll}
\xi^1_1, & \xi^1_2, & \xi^1_3, & \cdots \\
\xi^2_1, & \xi^2_2, & \xi^2_3, & \cdots \\
\xi^3_1, & \xi^3_2, & \xi^3_3, & \cdots
\end{array}$$

etc.
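One way such an array might be used is sketched below. The sketch assumes (this step is standard but is not shown in the text above) that generation $n+1$ is the sum of the first $Z_n$ entries of row $n+1$; the entries are drawn on demand rather than stored. The function names and the example distribution are hypothetical.

```python
import random


def sample_offspring(p, rng):
    """Draw one offspring count from the distribution p = [p_0, p_1, ...]."""
    u = rng.random()
    cumulative = 0.0
    for k, pk in enumerate(p):
        cumulative += pk
        if u < cumulative:
            return k
    return len(p) - 1  # guard against floating-point round-off


def simulate_gw(p, generations, rng):
    """Return [Z_0, Z_1, ..., Z_generations] with Z_0 = 1."""
    z = [1]
    for _ in range(generations):
        # Assumed rule: Z_{n+1} is the sum of the first Z_n entries of row n+1.
        z.append(sum(sample_offspring(p, rng) for _ in range(z[-1])))
    return z


rng = random.Random(0)  # fixed seed so that the run is reproducible
path = simulate_gw([0.25, 0.25, 0.5], 10, rng)
print(path)
```

Once a generation is empty the inner sum is empty as well, so the simulated path stays at 0 from then on.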

Proof. The event $\{\tau > n\}$ coincides with the event $\{Z_n \ge 1\}$. By Markov's inequality,

$$P\{Z_n \ge 1\} \le E Z_n = \mu^n.$$

1.2. Recursive Structure and Generating Functions. The Galton-Watson process $Z_n$ has a simple recursive structure that makes it amenable to analysis by generating function methods. Each of the first-generation individuals $\alpha, \beta, \gamma, \dots$ behaves independently of the others; moreover, all of its descendants (the offspring of the offspring, etc.) behave independently of the descendants of the other first-generation individuals. Thus, each of the first-generation individuals engenders an independent copy of the Galton-Watson process. It follows that a Galton-Watson process is obtained by conjoining to the single individual in the 0th generation $Z_1$ (conditionally) independent copies of the Galton-Watson process. The recursive structure leads to a simple set of relations among the probability generating functions of the random variables $Z_n$:

Proposition 2. Denote by $\varphi_n(t) = E\,t^{Z_n}$ the probability generating function of the random variable $Z_n$, and by $\varphi(t) = \sum_{k=0}^{\infty} p_k t^k$ the probability generating function of the offspring distribution. Then $\varphi_n$ is the $n$-fold composition of $\varphi$ with itself, that is,

$$\varphi_0(t) = t \quad \text{and} \tag{4}$$

$$\varphi_{n+1}(t) = \varphi(\varphi_n(t)) = \varphi_n(\varphi(t)) \quad \forall\, n \ge 0. \tag{5}$$

Proof. There are two ways to proceed, both simple. The first uses the recursive structure directly to deduce that $Z_{n+1}$ is the sum of $Z_1$ conditionally independent copies of $Z_n$. Thus,

$$\varphi_{n+1}(t) = E\,t^{Z_{n+1}} = E\,\varphi_n(t)^{Z_1} = \varphi(\varphi_n(t)).$$

The second argument relies on the fact that the generating function of the $m$th convolution power $\{p^{*m}_k\}$ is the $m$th power of the generating function $\varphi(t)$ of $\{p_k\}$. Thus,

$$\varphi_{n+1}(t) = E\,t^{Z_{n+1}} = \sum_{k=0}^{\infty} E\left(t^{Z_{n+1}} \mid Z_n = k\right) P(Z_n = k) = \sum_{k=0}^{\infty} \varphi(t)^k P(Z_n = k) = \varphi_n(\varphi(t)).$$

By induction on $n$, this is the $(n+1)$st iterate of the function $\varphi(t)$.

Problem 1.
(A) Show that if the mean offspring number $\mu := \sum_k k\,p_k < \infty$ then the expected size of the $n$th generation is $E Z_n = \mu^n$.
(B) Show that if the variance $\sigma^2 = \sum_k (k - \mu)^2 p_k < \infty$ then the variance of $Z_n$ is finite, and give a formula for it.
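The composition rule (5) and the mean formula in Problem 1(A) can be checked numerically for a small example. The sketch below is only a sanity check under assumed inputs: the offspring distribution, the evaluation point `t`, and the helper names `pgf`, `convolve`, and `next_generation` are all hypothetical. It computes the exact law of $Z_n$ by mixing convolution powers and compares $E\,t^{Z_n}$ with the $n$-fold iterate of $\varphi$.

```python
def pgf(p, t):
    """Evaluate sum_k p_k t^k by Horner's rule."""
    value = 0.0
    for pk in reversed(p):
        value = value * t + pk
    return value


def convolve(p, q):
    """Law of X + Y for independent X ~ p and Y ~ q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out


def next_generation(dist, p):
    """Law of Z_{n+1} from the law of Z_n: a mixture of the powers p^{*k}."""
    out = [0.0] * ((len(dist) - 1) * (len(p) - 1) + 1)
    power = [1.0]  # p^{*0}, the point mass at 0
    for k, weight in enumerate(dist):
        for j, q in enumerate(power):
            out[j] += weight * q
        if k + 1 < len(dist):
            power = convolve(power, p)  # now p^{*(k+1)}
    return out


p = [0.25, 0.25, 0.5]                       # hypothetical offspring law
mu = sum(k * pk for k, pk in enumerate(p))  # mean offspring number, 1.25
t = 0.3                                     # arbitrary evaluation point

dist = [0.0, 1.0]  # law of Z_0: point mass at 1
iterate = t        # phi_0(t) = t
for n in range(1, 5):
    dist = next_generation(dist, p)
    iterate = pgf(p, iterate)  # phi_n(t) = phi(phi_{n-1}(t))
    mean = sum(k * q for k, q in enumerate(dist))
    print(n, mean, mu ** n, pgf(dist, t), iterate)
```

In each printed row the second and third numbers should agree (the mean of $Z_n$ and $\mu^n$), as should the fourth and fifth ($E\,t^{Z_n}$ and the iterate of $\varphi$), up to floating-point error.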

Properties of the Generating Function $\varphi(t)$: Assumption 1 guarantees that $\varphi(t)$ is not a linear function, because the offspring distribution puts mass on some integer $k \ge 2$. Thus, $\varphi(t)$ has the following properties:
(A) $\varphi(t)$ is strictly increasing for $0 \le t \le 1$.
(B) $\varphi(t)$ is strictly convex, with strictly increasing first derivative.
(C) $\varphi(1) = 1$.

1.3. Extinction Probability. If for some $n$ the population size $Z_n = 0$ then the population size is 0 in all subsequent generations. In such an event, the population is said to be extinct. The first time that the population size is 0 (formally, $\tau = \min\{n : Z_n = 0\}$, or $\tau = \infty$ if there is no such $n$) is called the extinction time. The most obvious and natural question concerning the behavior of a Galton-Watson process is: What is the probability $P\{\tau < \infty\}$ of extinction?

Proposition 3. The probability $\zeta$ of extinction is the smallest nonnegative root $t = \zeta$ of the equation

$$\varphi(t) = t. \tag{6}$$

Proof. The key idea is recursion. Consider what must happen in order for the event $\tau < \infty$ of extinction to occur: Either (a) the single individual alive at time 0 has no offspring; or (b) each of its offspring must engender a Galton-Watson process that reaches extinction. Possibility (a) occurs with probability $p_0$. Conditional on the event that $Z_1 = k$, possibility (b) occurs with probability $\zeta^k$. Therefore,

$$\zeta = p_0 + \sum_{k=1}^{\infty} p_k \zeta^k = \varphi(\zeta),$$

that is, the extinction probability $\zeta$ is a root of the Fixed-Point Equation (6).

There is an alternative proof that $\zeta = \varphi(\zeta)$ that uses the iteration formula (5) for the probability generating function of $Z_n$. Observe that the probability of the event $Z_n = 0$ is easily recovered from the generating function $\varphi_n(t)$:

$$P\{Z_n = 0\} = \varphi_n(0).$$

By the nature of the Galton-Watson process, these probabilities are nondecreasing in $n$, because if $Z_n = 0$ then $Z_{n+1} = 0$. Therefore, the limit $\xi := \lim_{n \to \infty} \varphi_n(0)$ exists, and its value is the extinction probability for the Galton-Watson process.
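The limit of $\varphi_n(0)$ can be approximated by iterating $\varphi$ starting from 0. The sketch below does this for two assumed offspring distributions; the function names, tolerance, and iteration cap are hypothetical choices. For the first distribution the equation $\varphi(t) = t$ reads $2t^2 - 3t + 1 = 0$, with roots $1/2$ and $1$; for the second it reads $t^2 - 3t + 2 = 0$, with roots $1$ and $2$.

```python
def pgf(p, t):
    """Evaluate sum_k p_k t^k by Horner's rule."""
    value = 0.0
    for pk in reversed(p):
        value = value * t + pk
    return value


def extinction_probability(p, tol=1e-12, max_iter=100_000):
    """Iterate x -> phi(x) from x = 0, i.e. follow phi_n(0) until it stabilises."""
    x = 0.0
    for _ in range(max_iter):
        nxt = pgf(p, x)
        if abs(nxt - x) < tol:
            return nxt
        x = nxt
    return x  # not converged within max_iter; return the last iterate


# Hypothetical supercritical law (mean 1.25): smallest root is 1/2.
print(extinction_probability([0.25, 0.25, 0.5]))

# Hypothetical subcritical law (mean 0.75): smallest root in [0, 1] is 1.
print(extinction_probability([0.5, 0.25, 0.25]))
```

The stopping rule is a simple heuristic. When the mean offspring number equals 1 the iteration converges slowly, so the cap `max_iter` may be reached before the tolerance is met.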
The limit $\xi$ must be a root of the Fixed-Point Equation, because by the continuity of $\varphi$,

$$\varphi(\xi) = \varphi\left(\lim_{n \to \infty} \varphi_n(0)\right) = \lim_{n \to \infty} \varphi(\varphi_n(0)) = \lim_{n \to \infty} \varphi_{n+1}(0) = \xi.$$

Finally, it remains to show that $\xi$ is the smallest nonnegative root $\zeta$ of the Fixed-Point Equation. This follows from the monotonicity of the probability generating functions $\varphi_n$: Since

Thus, in the subcritical case, the extinction time has an exponentially decaying tail, and hence finite moments of all orders. On the other hand, in the critical case the extinction time has infinite mean.

Proof. First note that $P\{\tau > n\} = P\{Z_n > 0\}$. Recall from the proof of Proposition 3 that $P\{Z_n = 0\} = \varphi_n(0)$; hence,

$$P\{\tau > n\} = 1 - \varphi_n(0).$$

This shows that the tail of the distribution is determined by the speed at which the sequence $\varphi_n(0)$ approaches 1. In the subcritical case, the graph of the generating function $\varphi(t)$ has slope $\mu < 1$ at $t = 1$, whereas in the critical case the slope is $\mu = 1$. It is this difference that accounts for the drastic difference in the rate of convergence.

Subcritical Case: Consider first the case where $\mu = \varphi'(1) < 1$. Recall from the proof of Proposition 3 that in this case the sequence $\varphi_n(0)$ increases and has limit 1. Thus, for $n$ large, $\varphi_n(0)$ will be near 1, and in this neighborhood the first-order Taylor series will provide a good approximation to $\varphi$. Consequently,

$$\begin{aligned}
1 - \varphi_{n+1}(0) &= 1 - \varphi(\varphi_n(0)) \\
&= 1 - \varphi(1 - (1 - \varphi_n(0))) \\
&= 1 - \left(1 - \varphi'(1)(1 - \varphi_n(0))\right) + O\left((1 - \varphi_n(0))^2\right) \\
&= \mu(1 - \varphi_n(0)) + O\left((1 - \varphi_n(0))^2\right).
\end{aligned} \tag{8}$$

If not for the remainder term, we would have an exact equality $1 - \varphi_{n+1}(0) = \mu(1 - \varphi_n(0))$, which could be iterated to give

$$1 - \varphi_n(0) = \mu^n (1 - \varphi_0(0)) = \mu^n.$$

This would prove the assertion (A). Unfortunately, the equalities are exact only in the special case where the generating function $\varphi(t)$ is linear. In the general case, the remainder term in the Taylor series expansion (??) must be accounted for.

Because the generating function $\varphi(t)$ is convex, with derivative $\varphi'(1) = \mu$, the error in the approximation (8) is negative: in particular, for some constant $0 < C < \infty$,

$$\mu(1 - \varphi_n(0)) - C(1 - \varphi_n(0))^2 \le 1 - \varphi_{n+1}(0) \le \mu(1 - \varphi_n(0)).$$

The upper bound implies that $1 - \varphi_n(0) \le \mu^n$ (repeat the iteration argument above, replacing equalities by inequalities!).
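The upper bound can be observed numerically. The following sketch, for an assumed subcritical offspring distribution (the distribution and all names are hypothetical), tabulates $1 - \varphi_n(0)$ next to $\mu^n$ together with the ratio of the two. The inequality $1 - \varphi_{n+1}(0) \le \mu(1 - \varphi_n(0))$ says exactly that this ratio is nonincreasing in $n$.

```python
def pgf(p, t):
    """Evaluate sum_k p_k t^k by Horner's rule."""
    value = 0.0
    for pk in reversed(p):
        value = value * t + pk
    return value


p = [0.5, 0.25, 0.25]                       # hypothetical subcritical law
mu = sum(k * pk for k, pk in enumerate(p))  # mean offspring number, 0.75

tail = 1.0  # 1 - phi_0(0) = P{tau > 0}
rows = []
for n in range(31):
    rows.append((n, tail, mu ** n, tail / mu ** n))
    tail = 1.0 - pgf(p, 1.0 - tail)  # 1 - phi_{n+1}(0) = 1 - phi(phi_n(0))

# Print every fifth row: n, P{tau > n}, mu^n, and their ratio.
for n, tail_n, bound, ratio in rows[::5]:
    print(n, tail_n, bound, ratio)
```

In this example the ratio decreases from 1 and appears to level off at a positive value, which is the kind of behavior the bounds above describe.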
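Now divide through by $\mu(1 - \varphi_n(0))$ to get (the factor $1/\mu$ being absorbed into the constant $C$)

$$1 - C(1 - \varphi_n(0)) \le \frac{\mu^{-n-1}(1 - \varphi_{n+1}(0))}{\mu^{-n}(1 - \varphi_n(0))} \le 1 \quad \Longrightarrow \quad 1 - C\mu^n \le \frac{\mu^{-n-1}(1 - \varphi_{n+1}(0))}{\mu^{-n}(1 - \varphi_n(0))} \le 1.$$

Thus, successive ratios of the terms $\mu^{-n}(1 - \varphi_n(0))$ are exceedingly close to 1, the error decaying geometrically. Since these errors are summable, Weierstrass' Theorem on convergence of products implies that

$$\lim_{n \to \infty} \frac{\mu^{-n}(1 - \varphi_n(0))}{\mu^{-0}(1 - \varphi_0(0))} = \lim_{n \to \infty} \mu^{-n}(1 - \varphi_n(0)) := C_F$$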

HINT for part (A): The Taylor series approximation to $\varphi(t)$ at $\zeta = 1$ leads to the following approximate relationship, valid for large $n$:

$$1 - \varphi_{n+1}(0) \approx 1 - \varphi_n(0) - \tfrac{1}{2}\varphi''(1)(1 - \varphi_n(0))^2,$$

which at first does not seem to help, but on further inspection does. The trick is to change variables: if $x_n$ is a sequence of positive numbers that satisfies the recursion $x_{n+1} = x_n - b x_n^2$, then the sequence $y_n := 1/x_n$ satisfies $y_{n+1} = y_n + b + b^2/y_n + \cdots$.

Problem 5. There's a Galton-Watson process in my random walk! Let $S_n$ be the simple nearest-neighbor random walk on the integers started at $S_0 = 1$. Define $T$ to be the time of the first visit to the origin, that is, the smallest $n \ge 1$ such that $S_n = 0$. Define $Z_0 = 1$ and

$$Z_k = \sum_{n=0}^{T-1} \mathbf{1}\{S_n = k \text{ and } S_{n+1} = k + 1\}.$$

In words, $Z_k$ is the number of times that the random walk $S_n$ crosses from $k$ to $k + 1$ before first visiting 0.
(A) Prove that the sequence $\{Z_k\}_{k \ge 0}$ is a Galton-Watson process, and identify the offspring distribution as a geometric distribution.
(B) Calculate the probability generating function of the offspring distribution, and observe that it is a linear fractional transformation. (See Ahlfors, Complex Analysis, ch. 1 for the definition and basic theory of LFTs. Alternatively, try the Wikipedia article.)
(C) Use the result of (B) to find out as much as you can about the distribution of $Z_k$.
(D) Show that $T = \sum_{k \ge 1} Z_k$ is the total number of individuals ever born in the course of the Galton-Watson process, and show that $\tau$ (the extinction time of the Galton-Watson process) is the maximum displacement $M$ from 0 attained by the random walk before its first return to the origin. What does the result of problem 4, part (B), tell you about the distribution of $M$?

2. YULE'S BINARY FISSION PROCESS

2.1. Definition and Construction. The Yule process is a continuous-time branching model, in which individuals undergo binary fission at random times. It evolves as follows: Each individual, independently of all others and of the past of the process, waits an exponentially distributed time and then splits into two identical particles. (It is useful for the construction below to take the view that at each fission time the fissioning particle survives and creates one new clone of itself.) The exponential waiting times all have mean 1. Because the exponential random variables are mutually independent, the probability that two fissions will occur simultaneously is 0.

A Yule process started by 1 particle at time 0 can be built from independent Poisson processes as follows. Let $\{N_j(t)\}_{j \in \mathbb{N}}$ be a sequence of independent Poisson counting processes. Since the interoccurrence times in a Poisson process are exponential with mean 1, the jump times in the Poisson process $N_j(t)$ can be used as the fission times of the $j$th particle; at each such fission time, a new particle must be added to the population, and so a new Poisson process $N_k(t)$ must be "activated". Thus, the time $T_m$ at which the $m$th fission occurs can be defined as follows: set $T_0 = 0$ and

$$T_m = \min\Big\{t > T_{m-1} : \sum_{j=1}^{m} \left(N_j(t) - N_j(T_{m-1})\right) = 1\Big\}. \tag{10}$$

Thus, $T_m$ is the first time after $T_{m-1}$ that one of the first $m$ Poisson processes jumps. The size $Z_t$ of the population at time $t$ is then

$$Z_t = m \quad \text{for } T_{m-1} \le t < T_m. \tag{11}$$

The same construction works for a Yule process started with $Z_0 = k \ge 2$ particles: just change the definition of the fission times $T_m$ to

$$T_m = \min\Big\{t > T_{m-1} : \sum_{j=1}^{m+k-1} \left(N_j(t) - N_j(T_{m-1})\right) = 1\Big\}. \tag{12}$$

Alternatively, a Yule process with $Z_0 = k$ can be obtained by superposing $k$ independent Yule processes $Z^j_t$ all with $Z^j_0 = 1$, that is,

$$Z_t = \sum_{j=1}^{k} Z^j_t. \tag{13}$$

Problem 6. Show that by suitably indexing the Poisson processes in the first construction (12) one can deduce the superposition representation (13).

Problem 7. Calculate the mean $E Z_t$ and variance $\mathrm{var}(Z_t)$ of the population size in a Yule process. For the mean you should get $E Z_t = e^t$. HINT: Condition on the time of the first fission.

2.2. Asymptotic Growth.

Theorem 7. Let $Z_t$ be the population size at time $t$ in a Yule process with $Z_0 = 1$. Then

$$Z_t / e^t \xrightarrow{\text{a.s.}} W \tag{14}$$

where $W$ has the unit exponential distribution.

The proof has two parts: First, it must be shown that $Z_t / e^t$ converges to something; and second, it must be shown that the limit random variable $W$ is exponentially distributed. The proof of almost sure convergence will be based on a careful analysis of the first passage times $T_m$ defined by (10). Convergence of $Z_t / e^t$ to a positive random variable $W$ is equivalent to convergence of $\log Z_t - t$ to a real-valued limit $\log W$. Since $Z_t$ is a counting process (that is, it is nondecreasing in $t$ and its only discontinuities are jumps of size 1), convergence of $\log Z_t - t$ is equivalent to showing that there exists a finite random variable $Y = -\log W$ such that

$$\lim_{m \to \infty} (T_m - \log m) = Y. \tag{15}$$

To accomplish this, we will use the following consequence of the construction (10):

Proposition 8. Let $T_m$ be the fission times in a Yule process $Z_t$ with $Z_0 = k$. Then the interoccurrence times $\tau_m := T_m - T_{m-1}$ are independent, exponentially distributed random variables with expectations $E\tau_m = 1/(m + k - 1)$.

Proof (Sketch). The random variable $T_m$ is the first time after $T_{m-1}$ at which one of the Poisson processes $N_j(t)$, for $1 \le j \le m + k - 1$, has a jump. Times between jumps in a Poisson process are exponentially distributed with mean 1, and jump times in independent Poisson processes are independent. Thus, the time until the next jump in $m$ independent Poisson processes is the minimum of $m$ independent exponentials, which is exponentially distributed with mean $1/m$. This is not quite a complete argument, because the "start" times $T_m$ are random. However, it is not difficult (exercise!) to turn the preceding into a rigorous argument by integrating out over the possible values of $T_m$ and the possible choices for which Poisson processes jump at which times.

The family of exponential distributions is closed under scale transformations: In particular, if $Y$ is exponentially distributed with mean 1 and $\alpha > 0$ is a scalar, then $\alpha Y$ is exponentially distributed with mean $\alpha$. Since the variance $\mathrm{var}(Y)$ of a unit exponential is 1, it follows that the variance $\mathrm{var}(\alpha Y)$ of an exponential with mean $\alpha$ is $\alpha^2$. Consequently, if $\tau_m = T_m - T_{m-1}$ is the time between the $(m-1)$th and the $m$th fission times in a Yule process with $Z_0 = 1$, then by Proposition 8 (with $k = 1$)

$$E\tau_m = m^{-1} \quad \text{and} \quad \mathrm{var}(\tau_m) = m^{-2}, \tag{16}$$

and so

$$E T_m = \sum_{k=1}^{m} k^{-1} \sim \log m \quad \text{and} \quad \mathrm{var}(T_m) = \sum_{k=1}^{m} k^{-2} \to \zeta(2) < \infty \tag{17}$$

as $m \to \infty$, where $\zeta(2) = \sum_{k=1}^{\infty} k^{-2}$. In particular, the variance of $T_m$ remains bounded as $m \to \infty$, and so the distribution of $T_m$ remains concentrated around $\log m$. In fact, $T_m - \log m$ converges, to a possibly random limit, by the following general result about random series of independent random variables:

Theorem 9. Let $X_j$ be independent random variables with mean $E X_j = 0$ and finite variances $\mathrm{var}(X_j) = \sigma_j^2$. Then

$$\sum_{j=1}^{\infty} \sigma_j^2 := \sigma^2 < \infty \quad \Longrightarrow \quad \lim_{n \to \infty} \sum_{j=1}^{n} X_j := S \tag{18}$$

exists and is finite with probability one, and the limit random variable $S$ has mean zero and variance $\sigma^2$.

A proof of Theorem 9, based on Wald's Second Identity, is given in section 3 below. Modulo this, we have proved (15) (apply the theorem to the centered variables $X_j = \tau_j - 1/j$, and use the fact that $\sum_{j=1}^{m} 1/j - \log m$ converges as $m \to \infty$), and hence that $W = \lim_{t \to \infty} Z_t / e^t$ exists and is finite and strictly positive with probability 1.
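The convergence in (15) can be illustrated by simulation. The sketch below generates fission times directly from Proposition 8, summing independent exponential interoccurrence times with means $1/(m + k - 1)$, rather than from the Poisson-process construction. The function name `fission_times`, the seed, and the sample sizes are hypothetical choices.

```python
import math
import random


def fission_times(num_fissions, rng, k=1):
    """Return [T_1, ..., T_M] for a Yule process with Z_0 = k (Proposition 8)."""
    times = []
    t = 0.0
    for m in range(1, num_fissions + 1):
        # tau_m is exponential with mean 1/(m + k - 1), i.e. rate m + k - 1.
        t += rng.expovariate(m + k - 1)
        times.append(t)
    return times


rng = random.Random(0)  # fixed seed so that the run is reproducible
for _ in range(5):
    times = fission_times(20000, rng)
    # On each path, T_m - log m at m = 2000 and at m = 20000 should nearly agree.
    early = times[1999] - math.log(2000)
    late = times[19999] - math.log(20000)
    print(early, late)
```

Each printed pair comes from one sample path. The two entries should be close to each other, while the values differ from path to path, consistent with a random limit $Y$ in (15).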