6.042/18.062J Mathematics for Computer Science April 28, 2005
Srini Devadas and Eric Lehman
Lecture Notes

Random Variables

We've used probability to model a variety of experiments, games, and tests. Throughout, we have tried to compute probabilities of events. We asked, for example: what is the probability of the event that you win the Monty Hall game? What is the probability of the event that it rains, given that the weatherman carried his umbrella today? What is the probability of the event that you have a rare disease, given that you tested positive?

But one can ask more general questions about an experiment. How hard will it rain? How long will this illness last? How much will I lose playing 6.042 games all day? These questions are fundamentally different and not easily phrased in terms of events. The problem is that an event either does or does not happen: you win or lose, it rains or doesn't, you're sick or not. But these new questions are about matters of degree: how much, how hard, how long? To approach these questions, we need a new mathematical tool.

1 Random Variables

Let's begin with an example. Consider the experiment of tossing three independent, unbiased coins. Let C be the number of heads that appear. Let M = 1 if the three coins come up all heads or all tails, and let M = 0 otherwise. Now every outcome of the three coin flips uniquely determines the values of C and M. For example, if we flip heads, tails, heads, then C = 2 and M = 0. If we flip tails, tails, tails, then C = 0 and M = 1. In effect, C counts the number of heads, and M indicates whether all the coins match. Since each outcome uniquely determines C and M, we can regard them as functions mapping outcomes to numbers.
For this experiment, the sample space is:

    S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Now C is a function that maps each outcome in the sample space to a number as follows:

    C(HHH) = 3    C(THH) = 2
    C(HHT) = 2    C(THT) = 1
    C(HTH) = 2    C(TTH) = 1
    C(HTT) = 1    C(TTT) = 0

Similarly, M is a function mapping each outcome another way:

    M(HHH) = 1    M(THH) = 0
    M(HHT) = 0    M(THT) = 0
    M(HTH) = 0    M(TTH) = 0
    M(HTT) = 0    M(TTT) = 1
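Since C and M are literally functions on the sample space, they can be written down directly. Here is a minimal Python sketch (our own illustration, not part of the original notes; the names `sample_space`, `C`, and `M` mirror the text):

```python
from itertools import product

# Sample space: all 8 outcomes of three coin flips, e.g. ('H', 'T', 'H').
sample_space = list(product('HT', repeat=3))

def C(outcome):
    """Random variable C: the number of heads in the outcome."""
    return outcome.count('H')

def M(outcome):
    """Indicator M: 1 if all three coins match, 0 otherwise."""
    return 1 if len(set(outcome)) == 1 else 0

print(C(('H', 'T', 'H')))  # 2  (flip heads, tails, heads)
print(M(('T', 'T', 'T')))  # 1  (all tails: the coins match)
```

Each random variable is just an ordinary function whose argument is an outcome, exactly as the definition says.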
The functions C and M are examples of random variables. In general, a random variable is a function whose domain is the sample space. (The codomain can be anything, but we'll usually use a subset of the real numbers.) Notice that the name "random variable" is a misnomer; random variables are actually functions!

1.1 Indicator Random Variables

An indicator random variable (or simply an indicator or a Bernoulli random variable) is a random variable that maps every outcome to either 0 or 1. The random variable M is an example. If all three coins match, then M = 1; otherwise, M = 0.

Indicator random variables are closely related to events. In particular, an indicator partitions the sample space into those outcomes mapped to 1 and those outcomes mapped to 0. For example, the indicator M partitions the sample space into two blocks as follows:

    M = 1: {HHH, TTT}
    M = 0: {HHT, HTH, HTT, THH, THT, TTH}

In the same way, an event partitions the sample space into those outcomes in the event and those outcomes not in the event. Therefore, each event is naturally associated with a certain indicator random variable and vice versa: an indicator for an event E is an indicator random variable that is 1 for all outcomes in E and 0 for all outcomes not in E. Thus, M is an indicator random variable for the event that all three coins match.

1.2 Random Variables and Events

There is a strong relationship between events and more general random variables as well. A random variable that takes on several values partitions the sample space into several blocks. For example, C partitions the sample space as follows:

    C = 0: {TTT}
    C = 1: {TTH, THT, HTT}
    C = 2: {THH, HTH, HHT}
    C = 3: {HHH}

Each block is a subset of the sample space and is therefore an event. Thus, we can regard an equation or inequality involving a random variable as an event. For example, the event that C = 2 consists of the outcomes THH, HTH, and HHT.
The event C ≤ 1 consists of the outcomes TTT, TTH, THT, and HTT. Naturally enough, we can talk about the probability of events defined by equations and inequalities involving random variables. For example:

    Pr(M = 1) = Pr(TTT) + Pr(HHH)
              = 1/8 + 1/8
              = 1/4
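Because all eight outcomes are equally likely, such probabilities reduce to counting outcomes. A short sketch of that idea, assuming fair independent coins (the helper `pr` is our own, not from the notes):

```python
from fractions import Fraction
from itertools import product

outcomes = list(product('HT', repeat=3))      # 8 equally likely outcomes
C = lambda w: w.count('H')                    # number of heads
M = lambda w: 1 if len(set(w)) == 1 else 0    # all-match indicator

def pr(event):
    """Probability of an event, given as a predicate on outcomes."""
    return Fraction(sum(1 for w in outcomes if event(w)), len(outcomes))

print(pr(lambda w: M(w) == 1))  # 1/4, matching the computation above
print(pr(lambda w: C(w) <= 1))  # 1/2: the four outcomes TTT, TTH, THT, HTT
```

Representing an event as a predicate on outcomes matches the text's view of "C ≤ 1" as a set of outcomes.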
As another example:

    Pr(C ≥ 2) = Pr(THH) + Pr(HTH) + Pr(HHT) + Pr(HHH)
              = 1/8 + 1/8 + 1/8 + 1/8
              = 1/2

This is pretty wild; one normally thinks of equations and inequalities as either true or false. But when variables are replaced by random variables, there is a probability that the relationship holds!

1.3 Conditional Probability

Mixing conditional probabilities and events involving random variables creates no new difficulties. For example, Pr(C ≥ 2 | M = 0) is the probability that at least two coins are heads (C ≥ 2), given that not all three coins are the same (M = 0). We can compute this probability using the definition of conditional probability:

    Pr(C ≥ 2 | M = 0) = Pr(C ≥ 2 ∩ M = 0) / Pr(M = 0)
                      = Pr({THH, HTH, HHT}) / Pr({THH, HTH, HHT, HTT, THT, TTH})
                      = (3/8) / (6/8)
                      = 1/2

The expression C ≥ 2 ∩ M = 0 on the first line may look odd; what is the set operation ∩ doing between an inequality and an equality? But recall that, in this context, C ≥ 2 and M = 0 are events, which are sets of outcomes. So taking their intersection is perfectly valid!

1.4 Independence

The notion of independence carries over from events to random variables as well. Random variables R1 and R2 are independent if

    Pr(R1 = x1 ∩ R2 = x2) = Pr(R1 = x1) · Pr(R2 = x2)

for all x1 in the codomain of R1 and x2 in the codomain of R2.

As with events, we can formulate independence for random variables in an equivalent and perhaps more intuitive way: random variables R1 and R2 are independent if and only if

    Pr(R1 = x1 | R2 = x2) = Pr(R1 = x1)    or    Pr(R2 = x2) = 0
for all x1 in the codomain of R1 and x2 in the codomain of R2. In words, the probability that R1 takes on a particular value is unaffected by the value of R2.

As an example, are C and M independent? Intuitively, the answer should be "no". The number of heads, C, completely determines whether all three coins match; that is, whether M = 1. But to verify this intuition we must find some x1, x2 ∈ R such that:

    Pr(C = x1 ∩ M = x2) ≠ Pr(C = x1) · Pr(M = x2)

One appropriate choice of values is x1 = 2 and x2 = 1. In that case, we have:

    Pr(C = 2 ∩ M = 1) = 0    but    Pr(C = 2) · Pr(M = 1) = 3/8 · 1/4 ≠ 0

The notion of independence generalizes to a set of random variables as follows. Random variables R1, R2, ..., Rn are mutually independent if

    Pr(R1 = x1 ∩ R2 = x2 ∩ ... ∩ Rn = xn) = Pr(R1 = x1) · Pr(R2 = x2) ··· Pr(Rn = xn)

for all x1, ..., xn in the codomains of R1, ..., Rn.

A consequence of this definition of mutual independence is that the probability of an assignment to a subset of the variables is equal to the product of the probabilities of the individual assignments. Thus, for example, if R1, R2, ..., R100 are mutually independent random variables with codomain N, then it follows that:

    Pr(R1 = 9 ∩ R7 = 84 ∩ R23 = 13) = Pr(R1 = 9) · Pr(R7 = 84) · Pr(R23 = 13)

(This follows by summing over all possible values of the other random variables; we omit the details.)

1.5 An Example with Dice

Suppose that we roll two fair, independent dice. The sample space for this experiment consists of all pairs (r1, r2) where r1, r2 ∈ {1, 2, 3, 4, 5, 6}. Thus, for example, the outcome (3, 5) corresponds to rolling a 3 on the first die and a 5 on the second. The probability of each outcome in the sample space is 1/6 · 1/6 = 1/36 since the dice are fair and independent.

We can regard the numbers that come up on the individual dice as random variables D1 and D2. So D1(3, 5) = 3 and D2(3, 5) = 5.
Then the expression D1 + D2 is another random variable; let's call it T for "total". More precisely, we've defined:

    T(w) = D1(w) + D2(w)    for every outcome w

Thus, T(3, 5) = D1(3, 5) + D2(3, 5) = 3 + 5 = 8. In general, any function of random variables is itself a random variable. For example, √D1 + cos(D2) is a strange, but well-defined, random variable.
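The dice example can be sketched the same way as the coins (again our own illustration): D1, D2, T, and even the "strange" variable are all just functions on the 36 outcomes.

```python
import math
from itertools import product

# Sample space of two dice: all 36 pairs (r1, r2).
rolls = list(product(range(1, 7), repeat=2))

D1 = lambda r: r[0]   # number on the first die
D2 = lambda r: r[1]   # number on the second die

# Functions of random variables are themselves random variables:
T = lambda r: D1(r) + D2(r)                               # the total
strange = lambda r: math.sqrt(D1(r)) + math.cos(D2(r))    # odd, but well defined

print(T((3, 5)))  # 8, as computed above
```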
Let's also define an indicator random variable S for the event that the total of the two dice is seven:

    S(w) = 1 if T(w) = 7
    S(w) = 0 if T(w) ≠ 7

So S is equal to 1 when the sum is seven and is equal to 0 otherwise. For example, S(4, 3) = 1, but S(5, 3) = 0.

Now let's consider a couple of questions about independence. First, are D1 and T independent? Intuitively, the answer would seem to be "no" since the number that comes up on the first die strongly affects the total of the two dice. But to prove this, we must find integers x1 and x2 such that:

    Pr(D1 = x1 ∩ T = x2) ≠ Pr(D1 = x1) · Pr(T = x2)

For example, we might choose x1 = 3 and x2 = 2. In this case, we have

    Pr(T = 2 | D1 = 3) = 0

since the total cannot be 2 when one die alone shows 3. On the other hand, we have:

    Pr(T = 2) · Pr(D1 = 3) = Pr({(1, 1)}) · Pr({(3, 1), (3, 2), ..., (3, 6)})
                           = 1/36 · 6/36
                           ≠ 0

So, as we suspected, these random variables are not independent.

Are S and D1 independent? Once again, intuition suggests that the answer is "no": the number on the first die ought to affect whether or not the sum is equal to seven. But this time intuition turns out to be wrong! These two random variables actually are independent.

Proving that two random variables are independent takes some work. (Fortunately, this is an uncommon task; usually independence is a modeling assumption. Only rarely do random variables unexpectedly turn out to be independent.) In this case, we must show that

    Pr(S = x1 ∩ D1 = x2) = Pr(S = x1) · Pr(D1 = x2)    (1)

for all x1 ∈ {0, 1} and all x2 ∈ {1, 2, 3, 4, 5, 6}. We can work through all these possibilities in two batches:

• Suppose that x1 = 1. Then for every value of x2 we have:

    Pr(S = 1) = Pr({(1, 6), (2, 5), ..., (6, 1)}) = 1/6
    Pr(D1 = x2) = Pr({(x2, 1), (x2, 2), ..., (x2, 6)}) = 1/6
    Pr(S = 1 ∩ D1 = x2) = Pr({(x2, 7 − x2)}) = 1/36

Since 1/6 · 1/6 = 1/36, the independence condition is satisfied.
• Otherwise, suppose that x1 = 0. Then we have Pr(S = 0) = 1 − Pr(S = 1) = 5/6 and Pr(D1 = x2) = 1/6 as before. Now the event S = 0 ∩ D1 = x2 consists of 5 outcomes: all of (x2, 1), (x2, 2), ..., (x2, 6) except for (x2, 7 − x2). Therefore, the probability of this event is 5/36. Since 5/6 · 1/6 = 5/36, the independence condition is again satisfied.

Thus, the outcome of the first die roll is independent of the fact that the sum is 7. This is a strange, isolated result; for example, the first roll is not independent of the fact that the sum is 6 or 8 or any number other than 7. But this example shows that the mathematical notion of independent random variables, while closely related to the intuitive notion of "unrelated quantities", is not exactly the same thing.

2 Probability Distributions

A random variable is defined to be a function whose domain is the sample space of an experiment. Often, however, random variables with essentially the same properties show up in completely different experiments. For example, some random variables that come up in polling, in primality testing, and in coin flipping all share some common properties. If we could study such random variables in the abstract, divorced from the details of any particular experiment, then our conclusions would apply to all the experiments where that sort of random variable turned up. Such general conclusions could be very useful.

There are a couple of tools that capture the essential properties of a random variable but leave other details of the associated experiment behind. The probability density function (pdf) for a random variable R with codomain V is a function PDFR : V → [0, 1] defined by:

    PDFR(x) = Pr(R = x)

A consequence of this definition is that

    ∑ PDFR(x) = 1    (sum over all x ∈ V)

since the random variable always takes on exactly one value in the set V.

As an example, let's return to the experiment of rolling two fair, independent dice. As before, let T be the total of the two rolls.
This random variable takes on values in the set V = {2, 3, ..., 12}. A plot of the probability density function is shown below:
[Figure: bar chart of the pdf of T over V = {2, 3, ..., 12}; the bars rise from 1/36 at x = 2 to a peak of 6/36 at x = 7, then fall back to 1/36 at x = 12.]

The lump in the middle indicates that sums close to 7 are the most likely. The total area of all the rectangles is 1 since the dice must take on exactly one of the sums in V = {2, 3, ..., 12}.

A closely related idea is the cumulative distribution function (cdf) for a random variable R. This is a function CDFR : V → [0, 1] defined by:

    CDFR(x) = Pr(R ≤ x)

As an example, the cumulative distribution function for the random variable T is shown below:

[Figure: bar chart of the cdf of T over V = {2, 3, ..., 12}; the bars climb from 1/36 at x = 2 to 1 at x = 12, passing 1/2 near x = 7.]

The height of the i-th bar in the cumulative distribution function is equal to the sum of the heights of the leftmost i bars in the probability density function. This follows from the definitions of pdf and cdf:

    CDFR(x) = Pr(R ≤ x)
            = ∑ Pr(R = y)    (sum over y ≤ x)
            = ∑ PDFR(y)      (sum over y ≤ x)

In summary, PDFR(x) measures the probability that R = x, and CDFR(x) measures the probability that R ≤ x. Both the PDFR and CDFR capture the same information
about the random variable R (you can derive one from the other), but sometimes one is more convenient.

The key point here is that neither the probability density function nor the cumulative distribution function involves the sample space of an experiment. Thus, through these functions, we can study random variables without reference to a particular experiment. For the remainder of today, we'll look at three important distributions and some applications.

2.1 Bernoulli Distribution

Indicator random variables are perhaps the most common type because of their close association with events. The probability density function of an indicator random variable B is always

    PDFB(0) = p
    PDFB(1) = 1 − p

where 0 ≤ p ≤ 1. The corresponding cumulative distribution function is:

    CDFB(0) = p
    CDFB(1) = 1

This is called the Bernoulli distribution. The number of heads flipped on a (possibly biased) coin has a Bernoulli distribution.

2.2 Uniform Distribution

A random variable that takes on each possible value with the same probability is called uniform. For example, the probability density function of a random variable U that is uniform on the set {1, 2, ..., N} is:

    PDFU(k) = 1/N

And the cumulative distribution function is:

    CDFU(k) = k/N

Uniform distributions come up all the time. For example, the number rolled on a fair die is uniform on the set {1, 2, ..., 6}.
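The pdf and cdf machinery of this section can be tabulated directly for the dice total T from earlier, and the uniform distribution of a single fair die falls out the same way. A sketch under the same fair-dice assumptions (variable names are ours):

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # 36 equally likely outcomes
T = lambda r: r[0] + r[1]                      # total of the two dice

# PDF_T(x) = Pr(T = x) for every x in the codomain V = {2, ..., 12}.
pdf_T = {x: Fraction(sum(1 for r in rolls if T(r) == x), 36) for x in range(2, 13)}

# CDF_T(x) = Pr(T <= x): the running sum of the pdf.
cdf_T = {x: sum(pdf_T[y] for y in range(2, x + 1)) for x in range(2, 13)}

print(pdf_T[7])    # 1/6: the peak of the "lump" in the pdf plot
print(cdf_T[12])   # 1: the pdf sums to 1 over V

# A single die D1 is uniform: PDF(k) = 1/6 for every k in {1, ..., 6}.
pdf_D1 = {k: Fraction(sum(1 for r in rolls if r[0] == k), 36) for k in range(1, 7)}
print(set(pdf_D1.values()))  # {Fraction(1, 6)}
```

Note that neither dictionary mentions the sample space once it is built; that is exactly the sense in which a distribution forgets the underlying experiment.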
2.3 The Numbers Game

Let's play a game! I have two envelopes. Each contains an integer in the range 0, 1, ..., 100, and the numbers are distinct. To win the game, you must determine which envelope contains the larger number. To give you a fighting chance, I'll let you peek at the number in one envelope selected at random. Can you devise a strategy that gives you a better than 50% chance of winning?

For example, you could just pick an envelope at random and guess that it contains the larger number. But this strategy wins only 50% of the time. Your challenge is to do better.

So you might try to be more clever. Suppose you peek in the left envelope and see the number 12. Since 12 is a small number, you might guess that the other number is larger. But perhaps I'm sort of tricky and put small numbers in both envelopes. Then your guess might not be so good!

An important point here is that the numbers in the envelopes may not be random. I'm picking the numbers, and I'm choosing them in a way that I think will defeat your guessing strategy. I'll only use randomization to choose the numbers if that serves my end: making you lose!

2.3.1 Intuition Behind the Winning Strategy

Amazingly, there is a strategy that wins more than 50% of the time, regardless of what numbers I put in the envelopes!

Suppose that you somehow knew a number x between my lower number and my higher number. Now you peek in an envelope and see one or the other. If it is bigger than x, then you know you're peeking at the higher number. If it is smaller than x, then you're peeking at the lower number. In other words, if you know a number x between my lower and higher numbers, then you are certain to win the game.

The only flaw with this brilliant strategy is that you do not know x. Oh well.

But what if you try to guess x? There is some probability that you guess correctly. In this case, you win 100% of the time.
On the other hand, if you guess incorrectly, then you're no worse off than before; your chance of winning is still 50%. Combining these two cases, your overall chance of winning is better than 50%!

Informal arguments about probability, like this one, often sound plausible, but do not hold up under close scrutiny. In contrast, this argument sounds completely implausible, but is actually correct!

2.3.2 Analysis of the Winning Strategy

For generality, suppose that I can choose numbers from the set {0, 1, . . . , n}. Call the lower number L and the higher number H.
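Before working through the analysis, the strategy can be sanity-checked empirically. The sketch below (Python; the function names, the fixed seed, the trial count, and the adversarial choice L = 50, H = 51 are my own illustrative assumptions) simulates picking a uniform random half-integer threshold, peeking at one envelope at random, and guessing by comparison:

```python
import random

def play_round(L, H, n, rng):
    """One round: I hide L < H, you pick a random half-integer
    threshold x, peek at one envelope at random, and guess that
    the peeked number is larger exactly when it exceeds x."""
    x = rng.randrange(n) + 0.5   # uniform over 1/2, 3/2, ..., n - 1/2
    peek = rng.choice((L, H))    # peek at one envelope at random
    guess_peeked_is_larger = peek > x
    return guess_peeked_is_larger == (peek == H)

rng = random.Random(2005)        # fixed seed so the run is repeatable
n, L, H = 100, 50, 51            # adversarial: the numbers are as close as possible
trials = 200_000
wins = sum(play_round(L, H, n, rng) for _ in range(trials))
rate = wins / trials
print(rate)                      # should land near 1/2 + (H - L)/(2n) = 0.505
```

Even against this worst case, the observed win rate stays above 50%, matching the analysis that follows.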
Your goal is to guess a number x between L and H. To avoid confusing equality cases, you select x at random from among the half-integers:

    1/2, 3/2, 5/2, . . . , n − 1/2

But what probability distribution should you use? The uniform distribution turns out to be your best bet. An informal justification is that if I figured out that you were unlikely to pick some number, say 50 1/2, then I'd always put 50 and 51 in the envelopes. Then you'd be unlikely to pick an x between L and H and would have less chance of winning.

After you've selected the number x, you peek into an envelope and see some number p. If p > x, then you guess that you're looking at the larger number. If p < x, then you guess that the other number is larger.

Step 1: Find the sample space. You either choose x too low (x < L), too high (x > H), or just right (L < x < H). Then you either peek at the lower number (p = L) or the higher number (p = H). This gives a total of six possible outcomes:

    choice of x     peeked at   result   probability
    x too low       p = L       lose     L/2n
    x too low       p = H       win      L/2n
    x just right    p = L       win      (H − L)/2n
    x just right    p = H       win      (H − L)/2n
    x too high      p = L       win      (n − H)/2n
    x too high      p = H       lose     (n − H)/2n

Step 2: Define events of interest. The four outcomes in the event that you win are marked in the table above.

Step 3: Assign outcome probabilities. First, we assign edge probabilities. Your guess x is too low with probability L/n, too high with probability (n − H)/n, and just right with probability (H − L)/n. Next, you peek at either the lower or higher number with equal probability. Multiplying along root-to-leaf paths gives the outcome probabilities.

Step 4: Compute event probabilities. The probability of the event that you win is the