Ch. 4 Asymptotic Theory

From the discussion in the last chapter, it is clear that determining the distribution of $h(X_1, X_2, \ldots, X_T)$ is by no means a trivial exercise. More often than not, the distribution cannot be determined exactly. Because of the importance of the problem, however, we are forced to develop approximations, which are the subject of this chapter.

This chapter covers the limit theorems. The term "limit theorems" refers to several theorems in probability theory known under the generic names "laws of large numbers" (LLN) and "central limit theorems" (CLT). These limit theorems constitute one of the most important and elegant chapters of probability theory and play a crucial role in statistical inference.

1 Consistency

In this section we introduce the concepts needed to analyze the behavior of a random variable indexed by the size of a sample, say $\hat{\theta}_T$, as $T \to \infty$.

1.1 Limits

Definition:
Let $\{b_T\}_{T=1}^{\infty}$, or just $\{b_T\}$, be a sequence of real numbers. If there exists a real number $b$ and if for every $\delta > 0$ there exists an integer $N(\delta)$ such that for all $T \ge N(\delta)$, $|b_T - b| < \delta$, then $b$ is the limit of the sequence $\{b_T\}$.

In this definition the constant $\delta$ can take on any positive real value, but it is the very small values of $\delta$ that give the definition its impact. By choosing a very small $\delta$, we ensure that $b_T$ gets arbitrarily close to its limit $b$ for all $T$ that are sufficiently large. When a limit exists, we say that the sequence $\{b_T\}$ converges to $b$ as $T$ tends to infinity, written as $b_T \to b$ as $T \to \infty$. We also write $b = \lim_{T \to \infty} b_T$. When no ambiguity is possible, we simply write $b_T \to b$ or $b = \lim b_T$.

Example:
Let
\[ a_T = \frac{2^T - (-1)^T}{2^T}. \]
Here $1 = \lim_{T \to \infty} a_T$, for
\[ |a_T - 1| = \left| \frac{2^T - (-1)^T}{2^T} - 1 \right| = \frac{1}{2^T}. \]
Since by the binomial theorem we have
\[ 2^T = (1 + 1)^T = 1 + T + \frac{T(T-1)}{2} + \cdots + 1 > T, \]
it follows that if we choose $N = 1/\delta$ or larger, then for $T > N$,
\[ |a_T - 1| = \frac{1}{2^T} < \frac{1}{T} < \frac{1}{N} \le \delta. \]
This completes the solution.

The concept of a limit extends directly to sequences of real vectors. Let $b_T$ be a $k \times 1$ vector with real elements $b_{Ti}$, $i = 1, \ldots, k$. If $b_{Ti} \to b_i$, $i = 1, \ldots, k$, then $b_T \to b$, where $b$ has elements $b_i$, $i = 1, \ldots, k$. An analogous extension applies to matrices.

Definition:
Given $g : \mathbb{R}^k \to \mathbb{R}^l$ ($k, l \in \mathbb{N}$) and $b \in \mathbb{R}^k$, the function $g$ is continuous at $b$ if for any sequence $\{b_T\}$ such that $b_T \to b$, $g(b_T) \to g(b)$.

The following definition compares the behavior of a sequence $\{b_T\}$ with the behavior of a power of $T$, say $T^{\lambda}$, where $\lambda$ is chosen so that $\{b_T\}$ and $\{T^{\lambda}\}$ behave similarly.

Definition:
(i). The sequence $\{b_T\}$ is at most of order $T^{\lambda}$, denoted $b_T = O(T^{\lambda})$, if for some finite real number $\Delta > 0$ there exists a finite integer $N$ such that for all $T \ge N$, $|T^{-\lambda} b_T| < \Delta$.
(ii). The sequence $\{b_T\}$ is of order smaller than $T^{\lambda}$, denoted $b_T = o(T^{\lambda})$, if for every real number $\delta > 0$ there exists a finite integer $N(\delta)$ such that for all $T \ge N(\delta)$, $|T^{-\lambda} b_T| < \delta$, i.e., $T^{-\lambda} b_T \to 0$.

As we have defined these notations, $b_T = O(T^{\lambda})$ if $\{T^{-\lambda} b_T\}$ is eventually bounded, whereas $b_T = o(T^{\lambda})$ if $T^{-\lambda} b_T \to 0$. Obviously, if $b_T = o(T^{\lambda})$, then
$b_T = O(T^{\lambda})$. Further, if $b_T = O(T^{\lambda})$, then for every $\xi > 0$, $b_T = o(T^{\lambda + \xi})$. When $b_T = O(T^0)$, it is simply (eventually) bounded and may or may not have a limit. We often write $O(1)$ in place of $O(T^0)$. Similarly, $b_T = o(1)$ means $b_T \to 0$.

If each element of a vector or matrix is $O(T^{\lambda})$ or $o(T^{\lambda})$, then that vector or matrix is $O(T^{\lambda})$ or $o(T^{\lambda})$.

Proposition:
Let $a_T$ and $b_T$ be scalars.
(i). If $a_T = O(T^{\lambda})$ and $b_T = O(T^{\mu})$, then $a_T b_T = O(T^{\lambda + \mu})$ and $a_T + b_T = O(T^{\kappa})$, where $\kappa = \max[\lambda, \mu]$.
(ii). If $a_T = o(T^{\lambda})$ and $b_T = o(T^{\mu})$, then $a_T b_T = o(T^{\lambda + \mu})$ and $a_T + b_T = o(T^{\kappa})$, where $\kappa = \max[\lambda, \mu]$.
(iii). If $a_T = O(T^{\lambda})$ and $b_T = o(T^{\mu})$, then $a_T b_T = o(T^{\lambda + \mu})$ and $a_T + b_T = O(T^{\kappa})$, where $\kappa = \max[\lambda, \mu]$.

1.2 Almost Sure Convergence

The stochastic convergence concept most closely related to the limit notions previously discussed is that of almost sure convergence. Recall that in discussing a real-valued random variable $b_T$, we are in fact talking about a mapping $b_T : S \to \mathbb{R}$. We let $s$ be a typical element of the sample space $S$ and call the real number $b_T(s)$ a realization of the random variable. Interest will often center on averages such as
\[ b_T(\cdot) = T^{-1} \sum_{t=1}^{T} Z_t(\cdot). \]

Definition:
Let $\{b_T(\cdot)\}$ be a sequence of real-valued random variables. We say that $b_T(\cdot)$ converges almost surely to $b$, written $b_T(\cdot) \xrightarrow{a.s.} b$, if there exists a real number $b$ such that $\Pr\{s : b_T(s) \to b\} = 1$. When no ambiguity is possible, we may simply write $b_T \xrightarrow{a.s.} b$.

A sequence $\{b_T\}$ converges almost surely if the probability of obtaining a realization of the sequence $\{Z_t\}$ for which convergence to $b$ occurs is unity. Equivalently,
the probability of observing a realization of $\{Z_t\}$ for which convergence to $b$ does not occur is zero. Failure to converge is possible but will almost never happen under this definition.

Proposition:
Given $g : \mathbb{R}^k \to \mathbb{R}^l$ ($k, l \in \mathbb{N}$) and any sequence of random $k \times 1$ vectors $\{b_T\}$ such that $b_T \xrightarrow{a.s.} b$, where $b$ is $k \times 1$, if $g$ is continuous at $b$, then $g(b_T) \xrightarrow{a.s.} g(b)$.

This result is one of the most important in this chapter, because consistency results for many of our estimators follow by simply applying this Proposition.

1.3 Convergence in Probability

A weaker stochastic convergence concept is that of convergence in probability.

Definition:
Let $\{b_T\}$ be a sequence of real-valued random variables. If there exists a real number $b$ such that for every $\delta > 0$, $\Pr(s : |b_T(s) - b| < \delta) \to 1$ as $T \to \infty$, then $b_T$ converges in probability to $b$, written as $b_T \xrightarrow{p} b$ or $\operatorname{plim} b_T = b$.

Example:
Let $\bar{Z}_T \equiv T^{-1} \sum_{t=1}^{T} Z_t$, where $\{Z_t\}$ is a sequence of random variables such that $E(Z_t) = \mu$, $Var(Z_t) = \sigma^2 < \infty$ for all $t$, and $Cov(Z_t, Z_\tau) = 0$ for $t \ne \tau$. Then $\bar{Z}_T \xrightarrow{p} \mu$ by the Chebyshev weak law of large numbers. See the plot in Hamilton, p. 184.

When the plim of a sequence of estimators (such as $\{\bar{Z}_T\}_{T=1}^{\infty}$) is equal to the true population parameter (in this case, $\mu$), the estimator is said to be consistent. Convergence in probability is also referred to as weak consistency, and since this has been the most familiar stochastic convergence concept in econometrics, the word "weak" is often simply dropped.
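The Chebyshev weak law in the example above can be illustrated numerically. The following is a minimal sketch, assuming i.i.d. Uniform(0, 1) draws (so that $\mu = 0.5$), a tolerance $\delta = 0.05$, and 500 replications. These choices and the function name `coverage` are illustrative assumptions and do not come from the notes. The function estimates $\Pr(|\bar{Z}_T - \mu| < \delta)$ by simulation.

```python
import random

def coverage(T, delta=0.05, reps=500, seed=0):
    """Estimate Pr(|Zbar_T - mu| < delta) for i.i.d. Uniform(0, 1) draws.

    The distribution, delta, reps and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    mu = 0.5  # population mean of Uniform(0, 1)
    hits = 0
    for _ in range(reps):
        zbar = sum(rng.random() for _ in range(T)) / T  # sample mean
        if abs(zbar - mu) < delta:
            hits += 1
    return hits / reps

if __name__ == "__main__":
    for T in (10, 100, 1000):
        print(T, coverage(T))
```

Under these assumptions the estimated probability should move toward one as $T$ grows, which is what convergence in probability of $\bar{Z}_T$ to $\mu$ requires for each fixed $\delta$.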
Theorem:
Let $\{b_T\}$ be a sequence of real-valued random variables. If $b_T \xrightarrow{a.s.} b$, then $b_T \xrightarrow{p} b$.

Proposition:
Given $g : \mathbb{R}^k \to \mathbb{R}^l$ ($k, l \in \mathbb{N}$) and any sequence of random $k \times 1$ vectors $\{b_T\}$ such that $b_T \xrightarrow{p} b$, where $b$ is $k \times 1$, if $g$ is continuous at $b$, then $g(b_T) \xrightarrow{p} g(b)$.

Example:
If $X_{1T} \xrightarrow{p} c_1$ and $X_{2T} \xrightarrow{p} c_2$, then $(X_{1T} + X_{2T}) \xrightarrow{p} (c_1 + c_2)$. This follows immediately, since $g(X_{1T}, X_{2T}) \equiv X_{1T} + X_{2T}$ is a continuous function of $(X_{1T}, X_{2T})$.

Example:
Consider an alternative estimator of the mean given by $\bar{Y}^*_T = [1/(T-1)] \sum_{t=1}^{T} Y_t$. This can be written as $c_{1T} \bar{Y}_T$, where $c_{1T} \equiv T/(T-1)$ and $\bar{Y}_T \equiv (1/T) \sum_{t=1}^{T} Y_t$. Under general conditions, the sample mean is a consistent estimator of the population mean, implying that $\bar{Y}_T \xrightarrow{p} \mu$. It is also easy to verify that $c_{1T} \to 1$. Since $c_{1T} \bar{Y}_T$ is a continuous function of $c_{1T}$ and $\bar{Y}_T$, it follows that $c_{1T} \bar{Y}_T \xrightarrow{p} 1 \cdot \mu = \mu$. Thus $\bar{Y}^*_T$ is also a consistent estimator of $\mu$.

Definition:
(i). The sequence $\{b_T\}$ is at most of order $T^{\lambda}$ in probability, denoted $b_T = O_p(T^{\lambda})$, if for every $\varepsilon > 0$ there exist a finite $\Delta_\varepsilon > 0$ and $N_\varepsilon \in \mathbb{N}$ such that for all $T \ge N_\varepsilon$, $\Pr\{s : |T^{-\lambda} b_T(s)| > \Delta_\varepsilon\} < \varepsilon$.
(ii). The sequence $\{b_T\}$ is of order smaller than $T^{\lambda}$ in probability, denoted $b_T = o_p(T^{\lambda})$, if $T^{-\lambda} b_T \xrightarrow{p} 0$.

Lemma (Product rule):
Let $A_T$ be $l \times k$ and let $b_T$ be $k \times 1$. If $A_T = o_p(1)$ and $b_T = O_p(1)$, then $A_T b_T = o_p(1)$.

Proof:
Each element of $A_T b_T$ is a sum of products of the form $o_p(T^0) O_p(T^0) = o_p(T^{0+0}) = o_p(1)$ and therefore is $o_p(1)$.
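The alternative-estimator example above rests on the identity $\bar{Y}^*_T = c_{1T} \bar{Y}_T$, so that $\bar{Y}^*_T - \bar{Y}_T = \bar{Y}_T/(T-1)$. The sketch below checks this on simulated data. The normal draws with mean 2 and standard deviation 1, the seed, and the function name `both_estimators` are assumptions made only for illustration.

```python
import random

def both_estimators(T, mu=2.0, sigma=1.0, seed=0):
    """Return (Ybar_T, Ybar*_T) for one simulated sample of size T.

    The N(mu, sigma^2) data-generating process is an illustrative assumption.
    """
    rng = random.Random(seed)
    ys = [rng.gauss(mu, sigma) for _ in range(T)]
    total = sum(ys)
    ybar = total / T          # usual sample mean
    ystar = total / (T - 1)   # alternative estimator from the example
    return ybar, ystar

if __name__ == "__main__":
    for T in (10, 100, 10000):
        ybar, ystar = both_estimators(T)
        c1T = T / (T - 1)
        # ystar and c1T * ybar agree up to floating-point rounding
        print(T, ybar, ystar, c1T * ybar, ystar - ybar)
```

Because $c_{1T} \to 1$ is a deterministic limit, the gap between the two estimators shrinks at rate $1/T$ whenever $\bar{Y}_T$ stays bounded in probability.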
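1.4 Convergence in rth Mean

A stronger condition than convergence in probability is convergence in the $r$th mean; mean square convergence is the case $r = 2$.

Definition:
Let $\{b_T\}$ be a sequence of real-valued random variables such that for some $r > 0$, $E|b_T|^r < \infty$. If there exists a real number $b$ such that $E(|b_T - b|^r) \to 0$ as $T \to \infty$, then $b_T$ converges in the $r$th mean to $b$, written as $b_T \xrightarrow{r.m.} b$.

Proposition (Generalized Chebyshev inequality):
Let $Z$ be a random variable such that $E|Z|^r < \infty$ for some $r > 0$. Then for every $\varepsilon > 0$,
\[ \Pr(|Z| \ge \varepsilon) \le \frac{E|Z|^r}{\varepsilon^r}. \]
When $r = 1$ we have Markov's inequality and when $r = 2$ we have the familiar Chebyshev inequality.

Theorem:
If $b_T \xrightarrow{r.m.} b$ for some $r > 0$, then $b_T \xrightarrow{p} b$.

Proof:
Since $E(|b_T - b|^r) \to 0$ as $T \to \infty$, $E(|b_T - b|^r) < \infty$ for all $T$ sufficiently large. It follows from the generalized Chebyshev inequality that, for every $\varepsilon > 0$,
\[ \Pr(s : |b_T(s) - b| \ge \varepsilon) \le \frac{E|b_T - b|^r}{\varepsilon^r}. \]
Hence
\[ \Pr(s : |b_T(s) - b| < \varepsilon) \ge 1 - \frac{E|b_T - b|^r}{\varepsilon^r} \to 1 \quad \text{as } T \to \infty, \]
since $b_T \xrightarrow{r.m.} b$. It follows that $b_T \xrightarrow{p} b$.

Without further conditions, no necessary relationship holds between convergence in the $r$th mean and almost sure convergence.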
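The generalized Chebyshev inequality can be checked exactly on a small discrete distribution. The sketch below assumes, purely for illustration, that $Z$ is uniform on $\{-2, -1, 0, 1, 2\}$ and uses exact rational arithmetic. The names `tail_prob` and `moment_bound` are hypothetical helpers, not notation from the notes.

```python
from fractions import Fraction

# Hypothetical example distribution: Z uniform on {-2, -1, 0, 1, 2}.
SUPPORT = [-2, -1, 0, 1, 2]
PROB = Fraction(1, len(SUPPORT))

def tail_prob(eps):
    """Exact Pr(|Z| >= eps)."""
    return sum(PROB for z in SUPPORT if abs(z) >= eps)

def moment_bound(eps, r):
    """Exact E|Z|^r / eps^r for a positive integer r."""
    moment = sum(PROB * abs(z) ** r for z in SUPPORT)
    return moment / Fraction(eps) ** r

if __name__ == "__main__":
    for r in (1, 2):  # r = 1: Markov, r = 2: Chebyshev
        for eps in (1, 2):
            print(r, eps, tail_prob(eps), moment_bound(eps, r))
```

For this distribution the bound is loose at $\varepsilon = 1$ and tighter at $\varepsilon = 2$, which shows that the inequality gives an upper bound on tail probabilities rather than an approximation to them.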
2 Convergence in Distribution

The most fundamental concept is that of convergence in distribution.

Definition:
Let $\{b_T\}$ be a sequence of scalar random variables with cumulative distribution functions $\{F_T\}$. If $F_T(z) \to F(z)$ as $T \to \infty$ for every continuity point $z$ of $F$, where $F$ is the (cumulative) distribution function of a random variable $Z$, then $b_T$ converges in distribution to the random variable $Z$, written as $b_T \xrightarrow{d} Z$.

When $b_T \xrightarrow{d} Z$, we also say that $b_T$ converges in law to $Z$, written as $b_T \xrightarrow{L} Z$, or that $b_T$ is asymptotically distributed as $F$, denoted as $b_T \overset{A}{\sim} F$. Then $F$ is called the limiting distribution of $b_T$.

Example:
Let $\{Z_t\}$ be i.i.d. random variables with mean $\mu$ and finite variance $\sigma^2 > 0$. Define
\[ b_T \equiv \frac{\bar{Z}_T - E(\bar{Z}_T)}{(Var(\bar{Z}_T))^{1/2}} = \frac{T^{-1/2} \sum_{t=1}^{T} (Z_t - \mu)}{\sigma} = \frac{\sqrt{T}(\bar{Z}_T - \mu)}{\sigma}. \]
Then by the Lindeberg-Levy central limit theorem, $b_T \overset{A}{\sim} N(0, 1)$. See the plot in Hamilton, p. 185.

The above definitions are unchanged if the scalar $b_T$ is replaced with a ($k \times 1$) vector $b_T$. A simple way to verify convergence in distribution of a vector is the following.

Proposition (Cramér-Wold device):
Let $\{b_T\}$ be a sequence of random $k \times 1$ vectors and suppose that for every real $k \times 1$ vector $\lambda$ (it suffices to consider $\lambda$ with $\lambda'\lambda = 1$), the scalar $\lambda' b_T \overset{A}{\sim} \lambda' z$, where $z$ is a $k \times 1$ vector with joint (cumulative) distribution function $F$. Then the limiting distribution function of $b_T$ exists and equals $F$.

Lemma:
If $b_T \xrightarrow{L} Z$, then $b_T = O_p(1)$.
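The Lindeberg-Levy example above can be illustrated by simulation. The sketch below assumes i.i.d. Uniform(0, 1) draws ($\mu = 1/2$, $\sigma^2 = 1/12$), $T = 50$, and 2000 replications, all of which are illustrative choices rather than values from the notes. It compares the empirical distribution function of the standardized mean $b_T$ with the standard normal distribution function at a few points.

```python
import math
import random

def standardized_means(T, reps, seed=0):
    """Simulate b_T = sqrt(T) * (Zbar_T - mu) / sigma for Uniform(0, 1) data."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1.0 / 12.0)  # moments of Uniform(0, 1)
    out = []
    for _ in range(reps):
        zbar = sum(rng.random() for _ in range(T)) / T
        out.append(math.sqrt(T) * (zbar - mu) / sigma)
    return out

def normal_cdf(z):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def empirical_cdf(values, z):
    """Fraction of simulated values that are <= z."""
    return sum(v <= z for v in values) / len(values)

if __name__ == "__main__":
    draws = standardized_means(T=50, reps=2000)
    for z in (-1.0, 0.0, 1.0):
        print(z, empirical_cdf(draws, z), normal_cdf(z))
```

Agreement at a handful of points is only a rough check; convergence in distribution is a statement about $F_T(z)$ at every continuity point of $F$ as $T \to \infty$.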
Lemma (Product rule):
Recall that if $A_T = o_p(1)$ and $b_T = O_p(1)$, then $A_T b_T = o_p(1)$. Hence, if $A_T \xrightarrow{p} 0$ and $b_T \xrightarrow{d} Z$, then $A_T b_T \xrightarrow{p} 0$.

Lemma (Asymptotic equivalence):
Let $\{a_T\}$ and $\{b_T\}$ be two sequences of random vectors. If $a_T - b_T \xrightarrow{p} 0$ and $b_T \xrightarrow{d} Z$, then $a_T \xrightarrow{d} Z$.

This result is helpful in situations in which we wish to find the asymptotic distribution of $a_T$ but cannot do so directly. Often, however, it is easy to find a $b_T$ that has a known asymptotic distribution and that satisfies $a_T - b_T \xrightarrow{p} 0$. This Lemma then ensures that $a_T$ has the same limiting distribution as $b_T$, and we say that $a_T$ is "asymptotically equivalent" to $b_T$.

Lemma:
Given $g : \mathbb{R}^k \to \mathbb{R}^l$ ($k, l \in \mathbb{N}$) and any sequence of random $k \times 1$ vectors $\{b_T\}$ such that $b_T \xrightarrow{L} z$, where $z$ is $k \times 1$, if $g$ is continuous (and does not depend on $T$), then $g(b_T) \xrightarrow{L} g(z)$.

Example:
Suppose that $X_T \xrightarrow{L} N(0, 1)$. Then the square of $X_T$ asymptotically behaves as the square of a $N(0, 1)$ variable: $X_T^2 \xrightarrow{L} \chi^2(1)$.

Lemma:
Let $\{x_T\}$ be a sequence of random ($n \times 1$) vectors with $x_T \xrightarrow{p} c$, and let $\{y_T\}$ be a sequence of random ($n \times 1$) vectors with $y_T \xrightarrow{L} y$. Then the sequence constructed from the sum $\{x_T + y_T\}$ converges in distribution to $c + y$, and the sequence constructed from the product $\{x_T' y_T\}$ converges in distribution to $c'y$.

Example:
Let $\{X_T\}$ be a sequence of random ($m \times n$) matrices with $X_T \xrightarrow{p} C$, and let $\{y_T\}$ be a sequence of random ($n \times 1$) vectors with $y_T \xrightarrow{L} y \sim N(\mu, \Omega)$. Then the limiting distribution of $X_T y_T$ is the same as that of $Cy$; that is, $X_T y_T \xrightarrow{L} N(C\mu, C\Omega C')$.

Lemma (Cramér $\delta$):
Let $\{x_T\}$ be a sequence of random ($n \times 1$) vectors such that $T^b (x_T - a) \xrightarrow{L} x$ for some $b > 0$. If $g(x)$ is a real-valued function with gradient $g'(a) \left(= \left. \frac{\partial g}{\partial x'} \right|_{x=a}\right)$, then
\[ T^b (g(x_T) - g(a)) \xrightarrow{L} g'(a) x. \]

Example:
Let $\{Y_1, Y_2, \ldots, Y_T\}$ be an i.i.d. sample of size $T$ drawn from a distribution with mean $\mu \ne 0$ and variance $\sigma^2$. Consider the distribution of the reciprocal of the sample mean, $S_T = 1/\bar{Y}_T$, where $\bar{Y}_T = (1/T) \sum_{t=1}^{T} Y_t$. We know from the CLT that $\sqrt{T}(\bar{Y}_T - \mu) \xrightarrow{L} Y$, where $Y \sim N(0, \sigma^2)$. Also, $g(y) = 1/y$ is continuous and differentiable at $y = \mu$. Let $g'(\mu) \left(= \partial g/\partial y \,|_{y=\mu}\right) = -1/\mu^2$. Then $\sqrt{T}[S_T - (1/\mu)] \xrightarrow{L} g'(\mu) Y$; in other words, $\sqrt{T}[S_T - (1/\mu)] \xrightarrow{L} N(0, \sigma^2/\mu^4)$.
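The reciprocal-of-the-mean example can be illustrated by simulation. The sketch below assumes normal draws with $\mu = 2$ and $\sigma = 1$, $T = 200$, and 2000 replications; these values and the function names are illustrative assumptions. It compares the sample variance of $\sqrt{T}[S_T - (1/\mu)]$ with the limiting variance $\sigma^2/\mu^4$.

```python
import math
import random

def scaled_reciprocal_errors(T, reps, mu=2.0, sigma=1.0, seed=0):
    """Simulate sqrt(T) * (1/Ybar_T - 1/mu) for N(mu, sigma^2) data."""
    rng = random.Random(seed)
    out = []
    for _ in range(reps):
        ybar = sum(rng.gauss(mu, sigma) for _ in range(T)) / T
        out.append(math.sqrt(T) * (1.0 / ybar - 1.0 / mu))
    return out

def sample_variance(xs):
    """Unbiased sample variance."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

if __name__ == "__main__":
    mu, sigma = 2.0, 1.0
    errs = scaled_reciprocal_errors(T=200, reps=2000, mu=mu, sigma=sigma)
    print("simulated variance:", sample_variance(errs))
    print("delta-method variance:", sigma ** 2 / mu ** 4)
```

The simulated variance should be close to, but not exactly equal to, $\sigma^2/\mu^4$: the delta method is a first-order approximation, and the simulation adds sampling noise on top of that.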
3 Martingales

Some very useful limit theorems pertain to martingale sequences.

Definition:
Let $\{X_t, t \in \mathcal{T}\}$ be a stochastic process defined on $(S, \mathcal{F}, P(\cdot))$ and let $\{\mathcal{F}_t\}$ be a sequence of $\sigma$-fields with $\mathcal{F}_{t-1} \subset \mathcal{F}_t \subset \mathcal{F}$ for all $t$ (i.e., $\{\mathcal{F}_t\}$ is an increasing sequence of $\sigma$-fields), satisfying the following conditions:
(i). $X_t$ is a random variable relative to $\mathcal{F}_t$ for all $t \in \mathcal{T}$.
(ii). $E(|X_t|) < \infty$ for all $t \in \mathcal{T}$.
(iii). $E(X_t | \mathcal{F}_{t-1}) = X_{t-1}$ for all $t \in \mathcal{T}$.
Then $\{X_t, t \in \mathcal{T}\}$ is said to be a martingale with respect to $\{\mathcal{F}_t, t \in \mathcal{T}\}$.

Example (of an increasing sequence of $\sigma$-fields):
Consider tossing a coin twice, so that $S = \{(HH), (HT), (TH), (TT)\}$. Define the function $X$, "the number of heads"; then $X((HH)) = 2$, $X((TH)) = 1$, $X((HT)) = 1$, and $X((TT)) = 0$. Further we see that $X^{-1}(2) = \{(HH)\}$, $X^{-1}(1) = \{(TH), (HT)\}$ and $X^{-1}(0) = \{(TT)\}$. In fact, it can be shown that the $\sigma$-field related to the random variable $X$ so defined is
\[ \mathcal{F} = \{S, \emptyset, \{(HH)\}, \{(TT)\}, \{(TH), (HT)\}, \{(HH), (TT)\}, \{(HT), (TH), (HH)\}, \{(HT), (TH), (TT)\}\}. \]

We further define the function $X_1$, "at least one head"; then $X_1((HH)) = X_1((TH)) = X_1((HT)) = 1$ and $X_1((TT)) = 0$. Further we see that $X_1^{-1}(1) = \{(HH), (TH), (HT)\} \in \mathcal{F}$ and $X_1^{-1}(0) = \{(TT)\} \in \mathcal{F}$. In fact, it can be shown that the $\sigma$-field related to the random variable $X_1$ so defined is
\[ \mathcal{F}_1 = \{S, \emptyset, \{(HH), (TH), (HT)\}, \{(TT)\}\}. \]

Finally we define the function $X_2$, "two heads"; then $X_2((HH)) = 1$ and $X_2((TH)) = X_2((HT)) = X_2((TT)) = 0$. Further we see that $X_2^{-1}(1) = \{(HH)\} \in \mathcal{F}$ and $X_2^{-1}(0) = \{(TH), (HT), (TT)\} \in \mathcal{F}$. In fact, it can be shown that the $\sigma$-field related to the random variable $X_2$ so defined is
\[ \mathcal{F}_2 = \{S, \emptyset, \{(HH)\}, \{(HT), (TH), (TT)\}\}. \]
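The $\sigma$-fields in the coin-tossing example can be enumerated mechanically. On a finite sample space, the $\sigma$-field generated by a random variable consists of all unions of its preimage sets. The sketch below uses this fact; the string encoding of outcomes and the helper name `sigma_field` are assumptions made for illustration.

```python
from itertools import combinations

# Sample space for two coin tosses; outcomes are encoded as strings.
S = ("HH", "HT", "TH", "TT")

def sigma_field(X):
    """Return the sigma-field generated by X on S as a set of frozensets."""
    # Group outcomes by the value of X: these preimages are the atoms.
    atoms = {}
    for s in S:
        atoms.setdefault(X(s), set()).add(s)
    blocks = [frozenset(a) for a in atoms.values()]
    # Every event is a union of atoms (the empty union is the empty set).
    events = set()
    for r in range(len(blocks) + 1):
        for combo in combinations(blocks, r):
            events.add(frozenset().union(*combo))
    return events

def number_of_heads(s):
    return s.count("H")

def at_least_one_head(s):
    return int("H" in s)

def two_heads(s):
    return int(s == "HH")

F = sigma_field(number_of_heads)
F1 = sigma_field(at_least_one_head)
F2 = sigma_field(two_heads)

if __name__ == "__main__":
    print(len(F), len(F1), len(F2))
    print(F1 <= F, F2 <= F)  # subset checks
```

Under this encoding, `F`, `F1` and `F2` reproduce the eight-, four- and four-element collections listed above, and both `F1` and `F2` are subsets of `F`.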