Ch. 2 Probability Theory

1 Descriptive Study of Data

1.1 Histograms and Their Numerical Characteristics

By descriptive study of data we refer to the summarization and exposition (tabulation, grouping, graphical representation) of observed data, as well as the derivation of numerical characteristics such as measures of location, dispersion and shape. Although the descriptive study of data is an important facet of modeling with real data in itself, in the present study it is mainly used to motivate the need for probability theory and statistical inference proper.

In order to make the discussion more specific, let us consider the after-tax personal income data of 23,000 households for 1999-2000 in the US. These data in raw form constitute 23,000 numbers between $5,000 and $100,000. This presents us with a formidable task in attempting to understand how income is distributed among the 23,000 households represented in the data. The purpose of descriptive statistics is to help us make some sense of such data. A natural way to proceed is to summarize the data by allocating the numbers into classes (intervals). The number of intervals is chosen a priori and depends on the degree of summarization needed. The result is the "Table of personal income in the US". The first column of the table shows the income intervals, the second column shows the number of incomes falling into each interval, and the third column the relative frequency for each interval. The relative frequency is calculated by dividing the number of observations in each interval by the total number of observations. The fourth column is the cumulative frequency. Summarizing the data in this table enables us to get some idea of how income is distributed among the various classes. If we plot the relative (cumulative) frequencies in a bar graph we get what is known as the (cumulative) histogram.

For further information on the distribution of income we could calculate various numerical characteristics describing the histogram's location, dispersion and shape. Such measures can be calculated directly in terms of the raw data. However, in the present case it is more convenient for expositional purposes to use the grouped data. The main reason for this is to introduce various concepts which will be reinterpreted in the context of probability.
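To illustrate how such a frequency table is constructed, the following sketch groups a batch of incomes into classes and computes the relative and cumulative frequencies of the third and fourth columns. The incomes here are simulated stand-ins, not the actual survey figures, and the ten equal-width classes are an arbitrary choice:

```python
import numpy as np

# Hypothetical stand-in for the income data (the actual 23,000 survey
# observations are not reproduced here): incomes between $5,000 and $100,000.
rng = np.random.default_rng(0)
incomes = rng.uniform(5_000, 100_000, size=23_000)

# Class intervals chosen a priori; here ten equal-width classes.
edges = np.linspace(5_000, 100_000, 11)
counts, _ = np.histogram(incomes, bins=edges)

relative = counts / counts.sum()   # third column: relative frequencies
cumulative = np.cumsum(relative)   # fourth column: cumulative frequencies

for lo, hi, f, cf in zip(edges[:-1], edges[1:], relative, cumulative):
    print(f"[{lo:9,.0f}, {hi:9,.0f}): rel. {f:.3f}  cum. {cf:.3f}")
```

Plotting `relative` (or `cumulative`) as a bar graph over the intervals yields the histogram (or cumulative histogram) described above.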
The mean as a measure of location takes the form

\bar{z} = \sum_{i=1}^{n} \phi_i z_i,

where \phi_i and z_i refer to the relative frequency and the midpoint of interval i. The mode as a measure of location refers to the value of income that occurs most frequently in the data set. Another measure of location is the median, referring to the value of income in the middle when incomes are arranged in ascending order. The best way to calculate the median is to plot the cumulative frequency graph.

Another important feature of the histogram is the dispersion of the relative frequencies around a measure of central tendency. The most frequently used measure of dispersion is the variance, defined by

v^2 = \sum_{i=1}^{n} (z_i - \bar{z})^2 \phi_i,

which is a measure of dispersion around the mean; v is known as the standard deviation. We can extend the concept of the variance to

m_k = \sum_{i=1}^{n} (z_i - \bar{z})^k \phi_i, \quad k = 3, 4, \ldots,

defining what are known as higher central moments. These higher moments can be used to get a better idea of the shape of the histogram. For example, the standardized forms of the third and fourth moments, defined by

SK = \frac{m_3}{v^3} \quad \text{and} \quad K = \frac{m_4}{v^4},

known as the skewness and kurtosis coefficients, measure the asymmetry and the peakedness of the histogram, respectively. In the case of a symmetric histogram, SK = 0, and the more peaked the histogram, the greater the value of K.
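These grouped-data formulas translate directly into a few lines of code. A minimal sketch, using made-up interval midpoints and relative frequencies rather than the actual income table:

```python
import numpy as np

# Interval midpoints z_i and relative frequencies phi_i from a grouped
# frequency table (illustrative values only).
z = np.array([10_000, 25_000, 40_000, 55_000, 70_000, 85_000])
phi = np.array([0.25, 0.30, 0.20, 0.12, 0.08, 0.05])
assert np.isclose(phi.sum(), 1.0)   # relative frequencies sum to one

z_bar = np.sum(phi * z)                  # mean (weighted by rel. freq.)
v2 = np.sum((z - z_bar) ** 2 * phi)      # variance
v = np.sqrt(v2)                          # standard deviation
m3 = np.sum((z - z_bar) ** 3 * phi)      # third central moment
m4 = np.sum((z - z_bar) ** 4 * phi)      # fourth central moment

SK = m3 / v**3                           # skewness coefficient
K = m4 / v**4                            # kurtosis coefficient
print(f"mean={z_bar:,.0f}, v={v:,.0f}, SK={SK:.3f}, K={K:.3f}")
```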
1.2 Looking Ahead

The most important drawback of descriptive statistics is that the study of the observed data enables us to draw conclusions which relate only to the data in hand. The temptation in analyzing the above income data is to attempt to make generalizations beyond the data in hand, in particular about the distribution of income in the US (not just the 23,000 households in the sample). This, however, is not possible in the descriptive statistics framework. In order to be able to generalize beyond the data in hand we need to "model" the distribution of income in the US, and not just describe the observed data in hand. Such a general model is provided by probability theory, to be considered in Section 2.

It turns out that the model provided by probability theory owes a lot to the earlier developed descriptive statistics. In particular, most of the concepts which form the basis of probability theory were motivated by the descriptive statistics concepts considered above. The concepts of measures of location, dispersion and shape, as well as the frequency curve, were transplanted into probability theory with renewed interpretations. The frequency curve, when reinterpreted, becomes a density function purporting to model observable real-world phenomena. As for the various measures, they will be reinterpreted in terms of the density function.
2 Probability

Why do we need probability theory in analyzing observed data? In the descriptive study of data considered in the last section, it was emphasized that the results cannot be generalized outside the observed data under consideration. Any question relating to the population from which the observed data were drawn cannot be answered within the descriptive statistics framework. In order to be able to do that we need the theoretical framework offered by probability theory. In effect, probability theory develops a mathematical model which provides the logical foundation of statistical inference procedures for analyzing observed data.

In developing a mathematical model we must first identify the important features, relations and entities in the real-world phenomena, and then devise the concepts and choose the assumptions with which to project a generalized description of these phenomena; an idealized picture of these phenomena. The model, as a consistent mathematical system, has "a life of its own" and can be analyzed and studied without direct reference to real-world phenomena. (Think of analyzing the population: we do not have to refer to the information in the sample.)

By the 1920s there was a wealth of results, and probability began to grow into a systematic body of knowledge. Although various people attempted a systematization of probability, it was the work of the Russian mathematician Kolmogorov which proved to be the cornerstone for a systematic approach to probability theory. Kolmogorov managed to relate the concept of probability to that of a measure in integration theory, and exploited to the full the analogies between set theory and the theory of functions on the one hand, and the concept of a random variable on the other. In a monumental monograph in 1933 he proposed an axiomatization of probability theory, establishing it once and for all as part of mathematics proper. There is no doubt that this monograph proved to be the watershed for the later development of probability theory, growing enormously in importance and applicability.
2.1 The Axiomatic Approach

The axiomatic approach to probability proceeds from a set of axioms (accepted without questioning as obvious), which are based on many centuries of human experience, and the subsequent development is built deductively using formal logical arguments, like any other part of mathematics such as geometry or linear algebra. In mathematics an axiomatic system is required to be complete, non-redundant and consistent. By complete we mean that the set of axioms postulated should enable us to prove every other theorem in the theory in question using the axioms and mathematical logic. The notion of non-redundancy refers to the impossibility of deriving any axiom of the system from the other axioms. Consistency refers to the non-contradictory nature of the axioms.

A probability model is by construction intended to be a description of a chance mechanism giving rise to observed data. The starting point of such a model is provided by the concept of a random experiment, describing a simplistic and idealized process giving rise to the observed data.

Definition 1:
A random experiment, denoted by E, is an experiment which satisfies the following conditions:
(a) all possible distinct outcomes are known a priori;
(b) in any particular trial the outcome is not known a priori; and
(c) it can be repeated under identical conditions.

The axiomatic approach to probability theory can be viewed as a formalization of the concept of a random experiment. In an attempt to formalize condition (a), that all possible distinct outcomes are known a priori, Kolmogorov devised the set S, which includes "all possible distinct outcomes" and has to be postulated before the experiment is performed.

Definition 2:
The sample space, denoted by S, is defined to be the set of all possible outcomes of the experiment E. The elements of S are called elementary events.
Example:
Consider the random experiment E of tossing a fair coin twice and observing the faces turning up. The sample space of E is

S = {(HT), (TH), (HH), (TT)},

with (HT), (TH), (HH), (TT) being the elementary events belonging to S.

The second ingredient of E relates to (b), and in particular to the various forms events can take. A moment's reflection suggests that there is no particular reason why we should be interested in elementary outcomes only. We might be interested in such events as A1, 'at least one H', and A2, 'at most one H', and these are not elementary events; in particular,

A1 = {(HT), (TH), (HH)}  and  A2 = {(HT), (TH), (TT)}

are combinations of elementary events. All such outcomes are called events associated with the same sample space S, and they are defined by combining elementary events. Understanding the concept of an event is crucial for the discussion which follows. Intuitively, an event is any proposition associated with E which may occur or not at each trial. We say that event A1 occurs when any one of the elementary events it comprises occurs. Thus, when a trial is made, only one elementary event is observed but a large number of events may have occurred. For example, if the elementary event (HT) occurs in a particular trial, A1 and A2 have occurred as well.

Given that S is a set whose members are the elementary events, this takes us immediately into the realm of set theory, and events can be formally defined to be subsets of S formed by set-theoretic operations ("∩", intersection; "∪", union; complementation) on the elementary events. For example,

A1 = {(HT)} ∪ {(TH)} ∪ {(HH)} = S − {(TT)} ⊂ S,
A2 = {(HT)} ∪ {(TH)} ∪ {(TT)} = S − {(HH)} ⊂ S.
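These set-theoretic manipulations are easy to mirror directly. The following sketch encodes the coin-toss events above as Python sets (the names S, A1, A2 follow the text) and checks the identities just stated:

```python
# Events as subsets of the sample space for the two-coin-toss experiment.
S = {"HT", "TH", "HH", "TT"}          # sample space
A1 = {"HT", "TH", "HH"}               # 'at least one H'
A2 = {"HT", "TH", "TT"}               # 'at most one H'

# Complements, unions and intersections of events are again events.
assert S - A1 == {"TT"}               # A1 is the complement of {(TT)}
assert A1 | A2 == S                   # union
assert A1 & A2 == {"HT", "TH"}        # intersection: 'exactly one H'

# A single trial yields one elementary event, yet many events occur:
outcome = "HT"
print(outcome in A1, outcome in A2)   # True True: both A1 and A2 occurred
```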
Two special events are S itself, called the sure event, and the impossible event ∅, defined to contain no elements of S, i.e. ∅ = { }; the latter is defined for completeness.

A third ingredient of E associated with (b) which Kolmogorov had to formalize was the idea of uncertainty related to the outcome of any particular trial of E. This he formalized in the notion of probabilities attributed to the various events associated with E, such as P(A1), P(A2), expressing the "likelihood" of occurrence of these events. Although attributing probabilities to the elementary events presents no particular mathematical problem, doing the same for events in general is not as straightforward. The difficulty arises because if A1 and A2 are events, Ā1 = S − A1, Ā2 = S − A2, A1 ∩ A2, A1 ∪ A2, etc., are also events, because the occurrence or non-occurrence of A1 and A2 implies the occurrence or not of these events. This implies that for the attribution of probabilities to make sense we have to impose some mathematical structure on the set of all events, say F, which reflects the fact that whichever way we combine these events, the end result is always an event. The temptation at this stage is to define F to be the set of all subsets of S, called the power set; surely, this covers all possibilities! In the above example, the power set of S takes the form

F = {S, ∅, {(HT)}, {(TH)}, {(HH)}, {(TT)}, {(HT),(TH)}, {(HT),(HH)}, {(HT),(TT)},
     {(TH),(HH)}, {(TH),(TT)}, {(HH),(TT)}, {(HT),(TH),(HH)}, {(HT),(TH),(TT)},
     {(HT),(HH),(TT)}, {(TH),(HH),(TT)}}.
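As a quick sanity check (a sketch, not part of the original text), the power set can be enumerated programmatically; with four elementary events it contains 2^4 = 16 events, exactly the sets listed above:

```python
from itertools import combinations

# Enumerate all subsets of the two-toss sample space.
S = ["HT", "TH", "HH", "TT"]
power_set = [set(c) for r in range(len(S) + 1)
             for c in combinations(S, r)]
print(len(power_set))   # 16: from the empty set up to S itself
```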
Sometimes, however, we are not interested in all the subsets of S; we then need to define a collection of events independently of the power set by endowing it with a mathematical structure which ensures that no inconsistencies arise. This is achieved by requiring that F has a special mathematical structure: it is a σ-field related to S.

Definition 3:
Let F be a set of subsets of S. F is called a σ-field if:
(a) if A ∈ F, then Ā ∈ F (closure under complementation);
(b) if Ai ∈ F, i = 1, 2, ..., then (∪_{i=1}^{∞} Ai) ∈ F (closure under countable unions).

Note that (a) and (b) taken together imply the following:
(c) S ∈ F, because A ∪ Ā = S;
(d) ∅ ∈ F (from (c), since ∅ is the complement of S); and
(e) if Ai ∈ F, i = 1, 2, ..., then (∩_{i=1}^{∞} Ai) ∈ F.

These suggest that a σ-field is a set of subsets of S which is closed under complementation, countable unions and intersections. That is, any of these operations on the elements of F will give rise to an element of F.

Example: If we are interested in events with one of each H or T, there is no point in defining the σ-field to be the power set, and Fc can do as well with fewer events to attribute probabilities to:

Fc = {{(HT),(TH)}, {(HH),(TT)}, S, ∅}.

Exercise: Check whether the set

F1 = {{(HT)}, {(TH),(HH),(TT)}, S, ∅}

is a σ-field or not; a programmatic check is sketched below.
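For finite collections of events these closure conditions can be verified mechanically. The following sketch (the helper is_sigma_field is hypothetical, introduced here for illustration) checks Fc against the definition, and running it on F1 settles the exercise; for a finite collection, closure under pairwise unions and intersections suffices:

```python
from itertools import combinations

def is_sigma_field(F, S):
    """Check closure of a finite collection F of events over sample space S
    under complementation and (finite) unions and intersections."""
    F = {frozenset(A) for A in F}
    if not all(A <= S for A in F):
        return False                       # every event must be a subset of S
    for A in F:
        if S - A not in F:                 # closure under complementation
            return False
    for A, B in combinations(F, 2):
        if A | B not in F or A & B not in F:   # closure under union/intersection
            return False
    return True

S = frozenset({"HT", "TH", "HH", "TT"})
Fc = [{"HT", "TH"}, {"HH", "TT"}, S, set()]
F1 = [{"HT"}, {"TH", "HH", "TT"}, S, set()]
print(is_sigma_field(Fc, S), is_sigma_field(F1, S))
```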
Let us turn our attention to the various collections of events (σ-fields) that are relevant for econometrics.

Definition 4:
The Borel σ-field B is the smallest collection of sets (called the Borel sets) that includes:
(a) all open sets of R;
(b) the complement R − B of any B in B;
(c) the union ∪_{n=1}^{∞} Bn of any sequence {Bn} of sets in B.

The Borel sets of R just defined are said to be generated by the open sets of R. The same Borel sets would be generated by all the open half-lines of R, all the closed half-lines of R, all the open intervals of R, or all the closed intervals of R. The Borel sets are a "rich" collection of events for which probabilities can be defined. To see how the Borel σ-field contains almost every conceivable subset of R starting from the closed half-lines, consider the following example.

Example:
Let S be the real line R = {x : −∞ < x < ∞}, and let J be the set of all half-closed intervals of the form Bx = (−∞, x], x ∈ R. Denote by σ(J) the σ-field generated by J, i.e. the smallest σ-field containing the events Bx. Then:
(1). Taking complements of Bx: R − Bx = {z : z > x} = (x, ∞) ∈ σ(J);
(2). Taking countable unions of Bx: ∪_{n=1}^{∞} (−∞, x − (1/n)] = (−∞, x) ∈ σ(J);
(3). Taking complements of (2): R − (−∞, x) = [x, ∞) ∈ σ(J);
(4). From (1), for y > x, [y, ∞) ∈ σ(J);
(5). From (4), taking the complement of (−∞, x] ∪ [y, ∞): (x, y) ∈ σ(J);
(6). ∩_{n=1}^{∞} (x − (1/n), x] = {x} ∈ σ(J).

This shows not only that σ(J) is a σ-field, but that it includes almost every conceivable subset of R; that is, it coincides with the σ-field generated by any of the collections of subsets of R mentioned above, which we denote by B, i.e. σ(J) = B, the Borel field on R.

Having solved the technical problem of attributing probabilities to events by postulating the existence of a σ-field F associated with the sample space S, Kolmogorov went on to formalize the concept of probability itself.

Definition 5:
A mapping P : F → [0, 1] is a probability measure on {S, F} provided that:
(a) P(∅) = 0;
(b) for any A ∈ F, P(Ā) = 1 − P(A);
(c) for any disjoint sequence {Ai} of sets in F (i.e., Ai ∩ Aj = ∅ for all i ≠ j), P(∪_{i=1}^{∞} Ai) = Σ_{i=1}^{∞} P(Ai).
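To make the axioms concrete, here is a small check (a sketch, not part of the original text) that the equal-probability assignment on the coin-toss power set satisfies (a), (b) and (c); with a finite F, countable additivity reduces to finite additivity:

```python
from itertools import combinations

# The classical measure on the power set of the two-coin-toss sample space:
# each of the four elementary events carries probability 1/4.
S = frozenset({"HT", "TH", "HH", "TT"})

F = [frozenset(c) for r in range(len(S) + 1)
     for c in combinations(S, r)]        # the sigma-field: all 16 subsets
P = {A: len(A) / len(S) for A in F}      # P(A) = (# outcomes in A) / 4

assert P[frozenset()] == 0                            # (a) P(empty) = 0
assert all(P[S - A] == 1 - P[A] for A in F)           # (b) complements
assert all(P[A | B] == P[A] + P[B]                    # (c) additivity over
           for A in F for B in F if not A & B)        #     disjoint events
print(P[frozenset({"HT"})], P[frozenset({"HT", "HH"})])   # 0.25 0.5
```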
Example:
Since {(HT)} ∩ {(HH)} = ∅,

P({(HT)} ∪ {(HH)}) = P({(HT)}) + P({(HH)}) = 1/4 + 1/4 = 1/2.

To summarize the argument so far: Kolmogorov formalized conditions (a) and (b) of the random experiment E in the form of the trinity (S, F, P(·)), comprising the set of all outcomes S (the sample space), a σ-field F of events related to S, and a probability function P(·) assigning probabilities to the events in F. For the coin example, if we choose the σ-field generated by the event 'the first is H and the second is T', namely

F = {{(HT)}, {(TH),(HH),(TT)}, ∅, S},

to be the σ-field of interest, P(·) is defined by

P(S) = 1, P(∅) = 0, P({(HT)}) = 1/4, P({(TH),(HH),(TT)}) = 3/4.

Because of its importance the trinity (S, F, P(·)) is given a name.

Definition 6:
A sample space S endowed with a σ-field F and a probability measure P(·) is called a probability space. That is, we call the triple (S, F, P) a probability space.

As far as condition (c) of E is concerned, yet to be formalized, it will prove of paramount importance in the context of the limit theorems in Chapter 4.

2.2 Conditional Probability

So far we have considered probabilities of events on the assumption that no information is available relating to the outcome of a particular trial. Sometimes, however, additional information is available in the form of the known occurrence of some event A. For example, in the case of tossing a fair coin twice we might know that in the first trial it was heads. What difference does this information make to the original triple (S, F, P)? Firstly, knowing that the first trial was a head, the set of all possible outcomes now becomes

SA = {(HT), (HH)},