
6.042/18.062J Mathematics for Computer Science                    April 28, 2005
Srini Devadas and Eric Lehman                                      Lecture Notes

Random Variables

We've used probability to model a variety of experiments, games, and tests. Throughout, we have tried to compute probabilities of events. We asked, for example, what is the probability of the event that you win the Monty Hall game? What is the probability of the event that it rains, given that the weatherman carried his umbrella today? What is the probability of the event that you have a rare disease, given that you tested positive?

But one can ask more general questions about an experiment. How hard will it rain? How long will this illness last? How much will I lose playing 6.042 games all day? These questions are fundamentally different and not easily phrased in terms of events. The problem is that an event either does or does not happen: you win or lose, it rains or doesn't, you're sick or not. But these new questions are about matters of degree: how much, how hard, how long? To approach these questions, we need a new mathematical tool.

1 Random Variables

Let's begin with an example. Consider the experiment of tossing three independent, unbiased coins. Let C be the number of heads that appear. Let M = 1 if the three coins come up all heads or all tails, and let M = 0 otherwise. Now every outcome of the three coin flips uniquely determines the values of C and M. For example, if we flip heads, tails, heads, then C = 2 and M = 0. If we flip tails, tails, tails, then C = 0 and M = 1. In effect, C counts the number of heads, and M indicates whether all the coins match.

Since each outcome uniquely determines C and M, we can regard them as functions mapping outcomes to numbers. For this experiment, the sample space is:

    S = {HHH, HHT, HTH, HTT, THH, THT, TTH, TTT}

Now C is a function that maps each outcome in the sample space to a number as follows:

    C(HHH) = 3    C(THH) = 2
    C(HHT) = 2    C(THT) = 1
    C(HTH) = 2    C(TTH) = 1
    C(HTT) = 1    C(TTT) = 0

Similarly, M is a function mapping each outcome another way:

    M(HHH) = 1    M(THH) = 0
    M(HHT) = 0    M(THT) = 0
    M(HTH) = 0    M(TTH) = 0
    M(HTT) = 0    M(TTT) = 1
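For readers who like to check such definitions mechanically, here is a minimal Python sketch (not part of the original notes) that enumerates the eight outcomes and evaluates C and M on each; the names sample_space, C, and M are my own.

```python
from itertools import product

# Sample space of three independent coin flips, e.g. "HTH".
sample_space = ["".join(flips) for flips in product("HT", repeat=3)]

def C(outcome):
    """Number of heads in the outcome."""
    return outcome.count("H")

def M(outcome):
    """1 if all three coins match, 0 otherwise."""
    return 1 if outcome in ("HHH", "TTT") else 0

for w in sample_space:
    print(f"C({w}) = {C(w)}   M({w}) = {M(w)}")
```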

The functions C and M are examples of random variables. In general, a random variable is a function whose domain is the sample space. (The codomain can be anything, but we'll usually use a subset of the real numbers.) Notice that the name "random variable" is a misnomer; random variables are actually functions!

1.1 Indicator Random Variables

An indicator random variable (or simply an indicator or a Bernoulli random variable) is a random variable that maps every outcome to either 0 or 1. The random variable M is an example. If all three coins match, then M = 1; otherwise, M = 0.

Indicator random variables are closely related to events. In particular, an indicator partitions the sample space into those outcomes mapped to 1 and those outcomes mapped to 0. For example, the indicator M partitions the sample space into two blocks as follows:

    {HHH, TTT}        {HHT, HTH, HTT, THH, THT, TTH}
       M = 1                        M = 0

In the same way, an event partitions the sample space into those outcomes in the event and those outcomes not in the event. Therefore, each event is naturally associated with a certain indicator random variable and vice versa: an indicator for an event E is an indicator random variable that is 1 for all outcomes in E and 0 for all outcomes not in E. Thus, M is an indicator random variable for the event that all three coins match.

1.2 Random Variables and Events

There is a strong relationship between events and more general random variables as well. A random variable that takes on several values partitions the sample space into several blocks. For example, C partitions the sample space as follows:

    {TTT}    {TTH, THT, HTT}    {THH, HTH, HHT}    {HHH}
     C = 0        C = 1              C = 2          C = 3

Each block is a subset of the sample space and is therefore an event. Thus, we can regard an equation or inequality involving a random variable as an event. For example, the event that C = 2 consists of the outcomes THH, HTH, and HHT. The event C ≤ 1 consists of the outcomes TTT, TTH, THT, and HTT.

Naturally enough, we can talk about the probability of events defined by equations and inequalities involving random variables. For example:

    Pr(M = 1) = Pr(TTT) + Pr(HHH)
              = 1/8 + 1/8
              = 1/4
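Since the coins are fair and independent, each outcome has probability 1/8, and any event defined by C or M is just a subset of the sample space. Continuing the sketch above, a hypothetical prob helper (my own, not from the notes) makes these event probabilities one-liners:

```python
from fractions import Fraction

def prob(event):
    """Probability of an event, given as a predicate on outcomes."""
    return sum(Fraction(1, 8) for w in sample_space if event(w))

print(prob(lambda w: M(w) == 1))   # Pr(M = 1)  -> 1/4
print(prob(lambda w: C(w) == 2))   # Pr(C = 2)  -> 3/8
print(prob(lambda w: C(w) <= 1))   # Pr(C <= 1) -> 1/2
```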

As another example:

    Pr(C ≥ 2) = Pr(THH) + Pr(HTH) + Pr(HHT) + Pr(HHH)
              = 1/8 + 1/8 + 1/8 + 1/8
              = 1/2

This is pretty wild; one normally thinks of equations and inequalities as either true or false. But when variables are replaced by random variables, there is a probability that the relationship holds!

1.3 Conditional Probability

Mixing conditional probabilities and events involving random variables creates no new difficulties. For example, Pr(C ≥ 2 | M = 0) is the probability that at least two coins are heads (C ≥ 2), given that not all three coins are the same (M = 0). We can compute this probability using the definition of conditional probability:

    Pr(C ≥ 2 | M = 0) = Pr(C ≥ 2 ∩ M = 0) / Pr(M = 0)
                      = Pr({THH, HTH, HHT}) / Pr({THH, HTH, HHT, HTT, THT, TTH})
                      = (3/8) / (6/8)
                      = 1/2

The expression C ≥ 2 ∩ M = 0 on the first line may look odd; what is the set operation ∩ doing between an inequality and an equality? But recall that, in this context, C ≥ 2 and M = 0 are events, which are sets of outcomes. So taking their intersection is perfectly valid!

1.4 Independence

The notion of independence carries over from events to random variables as well. Random variables R1 and R2 are independent if

    Pr(R1 = x1 ∩ R2 = x2) = Pr(R1 = x1) · Pr(R2 = x2)

for all x1 in the codomain of R1 and x2 in the codomain of R2.

As with events, we can formulate independence for random variables in an equivalent and perhaps more intuitive way: random variables R1 and R2 are independent if and only if

    Pr(R1 = x1 | R2 = x2) = Pr(R1 = x1)    or    Pr(R2 = x2) = 0

for all x1 in the codomain of R1 and x2 in the codomain of R2. In words, the probability that R1 takes on a particular value is unaffected by the value of R2.
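The conditional probability from Section 1.3 can also be checked against the coin sketch; the snippet below assumes the hypothetical prob, C, and M helpers defined earlier.

```python
# Pr(C >= 2 | M = 0) = Pr(C >= 2 and M = 0) / Pr(M = 0)
numerator   = prob(lambda w: C(w) >= 2 and M(w) == 0)   # 3/8
denominator = prob(lambda w: M(w) == 0)                 # 6/8
print(numerator / denominator)                          # 1/2
```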

As an example of this definition, are C and M independent? Intuitively, the answer should be "no". The number of heads, C, completely determines whether all three coins match; that is, whether M = 1. But to verify this intuition we must find some x1, x2 ∈ R such that:

    Pr(C = x1 ∩ M = x2) ≠ Pr(C = x1) · Pr(M = x2)

One appropriate choice of values is x1 = 2 and x2 = 1. In that case, we have:

    Pr(C = 2 ∩ M = 1) = 0    but    Pr(C = 2) · Pr(M = 1) = 3/8 · 1/4 ≠ 0

The notion of independence generalizes to a set of random variables as follows. Random variables R1, R2, . . . , Rn are mutually independent if

    Pr(R1 = x1 ∩ R2 = x2 ∩ · · · ∩ Rn = xn) = Pr(R1 = x1) · Pr(R2 = x2) · · · Pr(Rn = xn)

for all x1, . . . , xn in the codomains of R1, . . . , Rn.

A consequence of this definition of mutual independence is that the probability of an assignment to a subset of the variables is equal to the product of the probabilities of the individual assignments. Thus, for example, if R1, R2, . . . , R100 are mutually independent random variables with codomain N, then it follows that:

    Pr(R1 = 9 ∩ R7 = 84 ∩ R23 = 13) = Pr(R1 = 9) · Pr(R7 = 84) · Pr(R23 = 13)

(This follows by summing over all possible values of the other random variables; we omit the details.)

1.5 An Example with Dice

Suppose that we roll two fair, independent dice. The sample space for this experiment consists of all pairs (r1, r2) where r1, r2 ∈ {1, 2, 3, 4, 5, 6}. Thus, for example, the outcome (3, 5) corresponds to rolling a 3 on the first die and a 5 on the second. The probability of each outcome in the sample space is 1/6 · 1/6 = 1/36 since the dice are fair and independent.

We can regard the numbers that come up on the individual dice as random variables D1 and D2. So D1(3, 5) = 3 and D2(3, 5) = 5. Then the expression D1 + D2 is another random variable; let's call it T for "total". More precisely, we've defined:

    T(w) = D1(w) + D2(w)    for every outcome w

Thus, T(3, 5) = D1(3, 5) + D2(3, 5) = 3 + 5 = 8. In general, any function of random variables is itself a random variable. For example, √D1 + cos(D2) is a strange, but well-defined, random variable.
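The dice experiment can be set up in the same style as the coin sketch; the names dice_space, D1, D2, T, and dice_prob below are my own illustration, not part of the notes.

```python
from fractions import Fraction
from itertools import product

dice_space = list(product(range(1, 7), repeat=2))   # the 36 outcomes (r1, r2)

def D1(w): return w[0]
def D2(w): return w[1]
def T(w):  return D1(w) + D2(w)   # a function of random variables is a random variable

def dice_prob(event):
    """Probability of an event over the two-dice sample space."""
    return sum(Fraction(1, 36) for w in dice_space if event(w))

print(T((3, 5)))                        # 8
print(dice_prob(lambda w: T(w) == 7))   # 1/6
```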

Let's also define an indicator random variable S for the event that the total of the two dice is seven:

    S(w) = 1 if T(w) = 7
    S(w) = 0 if T(w) ≠ 7

So S is equal to 1 when the sum is seven and is equal to 0 otherwise. For example, S(4, 3) = 1, but S(5, 3) = 0.

Now let's consider a couple of questions about independence. First, are D1 and T independent? Intuitively, the answer would seem to be "no", since the number that comes up on the first die strongly affects the total of the two dice. But to prove this, we must find integers x1 and x2 such that:

    Pr(D1 = x1 ∩ T = x2) ≠ Pr(D1 = x1) · Pr(T = x2)

For example, we might choose x1 = 3 and x2 = 2. In this case, we have

    Pr(T = 2 | D1 = 3) = 0

since the total cannot be only 2 when one die alone is 3. On the other hand, we have:

    Pr(T = 2) · Pr(D1 = 3) = Pr({(1, 1)}) · Pr({(3, 1), (3, 2), . . . , (3, 6)})
                           = 1/36 · 6/36
                           ≠ 0

So, as we suspected, these random variables are not independent.

Are S and D1 independent? Once again, intuition suggests that the answer is "no". The number on the first die ought to affect whether or not the sum is equal to seven. But this time intuition turns out to be wrong! These two random variables actually are independent.

Proving that two random variables are independent takes some work. (Fortunately, this is an uncommon task; usually independence is a modeling assumption. Only rarely do random variables unexpectedly turn out to be independent.) In this case, we must show that

    Pr(S = x1 ∩ D1 = x2) = Pr(S = x1) · Pr(D1 = x2)                    (1)

for all x1 ∈ {0, 1} and all x2 ∈ {1, 2, 3, 4, 5, 6}. We can work through all these possibilities in two batches:

  • Suppose that x1 = 1. Then for every value of x2 we have:

        Pr(S = 1) = Pr({(1, 6), (2, 5), . . . , (6, 1)}) = 1/6
        Pr(D1 = x2) = Pr({(x2, 1), (x2, 2), . . . , (x2, 6)}) = 1/6
        Pr(S = 1 ∩ D1 = x2) = Pr({(x2, 7 − x2)}) = 1/36

    Since 1/6 · 1/6 = 1/36, the independence condition is satisfied.

  • Otherwise, suppose that x1 = 0. Then we have Pr(S = 0) = 1 − Pr(S = 1) = 5/6 and Pr(D1 = x2) = 1/6 as before. Now the event S = 0 ∩ D1 = x2 consists of 5 outcomes: all of (x2, 1), (x2, 2), . . . , (x2, 6) except for (x2, 7 − x2). Therefore, the probability of this event is 5/36. Since 5/6 · 1/6 = 5/36, the independence condition is again satisfied.
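This exhaustive case check is exactly the kind of thing a few lines of code can confirm; the sketch below reuses the hypothetical dice_prob, D1, and T helpers from the earlier dice sketch and tests condition (1) at every pair of values.

```python
def S(w):
    """Indicator for the event that the total of the two dice is seven."""
    return 1 if T(w) == 7 else 0

# S and D1: the product rule holds for every pair of values, so they are independent.
print(all(
    dice_prob(lambda w: S(w) == s and D1(w) == d)
    == dice_prob(lambda w: S(w) == s) * dice_prob(lambda w: D1(w) == d)
    for s in (0, 1) for d in range(1, 7)
))   # True

# T and D1: the product rule already fails at T = 2, D1 = 3, so they are not independent.
print(dice_prob(lambda w: T(w) == 2 and D1(w) == 3))                      # 0
print(dice_prob(lambda w: T(w) == 2) * dice_prob(lambda w: D1(w) == 3))   # 1/216
```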

Thus, the outcome of the first die roll is independent of the fact that the sum is 7. This is a strange, isolated result; for example, the first roll is not independent of the fact that the sum is 6 or 8 or any number other than 7. But this example shows that the mathematical notion of independent random variables, while closely related to the intuitive notion of "unrelated quantities", is not exactly the same thing.

2 Probability Distributions

A random variable is defined to be a function whose domain is the sample space of an experiment. Often, however, random variables with essentially the same properties show up in completely different experiments. For example, some random variables that come up in polling, in primality testing, and in coin flipping all share some common properties. If we could study such random variables in the abstract, divorced from the details of any particular experiment, then our conclusions would apply to all the experiments where that sort of random variable turned up. Such general conclusions could be very useful.

There are a couple of tools that capture the essential properties of a random variable, but leave other details of the associated experiment behind.

The probability density function (pdf) for a random variable R with codomain V is a function PDF_R : V → [0, 1] defined by:

    PDF_R(x) = Pr(R = x)

A consequence of this definition is that

    ∑_{x∈V} PDF_R(x) = 1

since the random variable always takes on exactly one value in the set V.
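As a quick illustration of the definition (my own, not from the notes), the density of the coin-count C from Section 1 can be tabulated directly from the earlier coin sketch, and its values sum to 1 as required:

```python
from collections import Counter
from fractions import Fraction

# PDF_C(x) = Pr(C = x), computed by counting outcomes with C(w) = x.
counts = Counter(C(w) for w in sample_space)
pdf_C = {x: Fraction(n, 8) for x, n in sorted(counts.items())}

print(pdf_C)                # {0: 1/8, 1: 3/8, 2: 3/8, 3: 1/8}
print(sum(pdf_C.values()))  # 1
```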

As an example, let's return to the experiment of rolling two fair, independent dice. As before, let T be the total of the two rolls. This random variable takes on values in the set V = {2, 3, . . . , 12}. A plot of the probability density function is shown below:

    [Figure: bar plot of PDF_T(x) for x ∈ V = {2, 3, . . . , 12}, with bars rising from 1/36 at x = 2 to a peak of 6/36 at x = 7 and falling back to 1/36 at x = 12.]

The lump in the middle indicates that sums close to 7 are the most likely. The total area of all the rectangles is 1, since the dice must take on exactly one of the sums in V = {2, 3, . . . , 12}.

A closely related idea is the cumulative distribution function (cdf) for a random variable R. This is a function CDF_R : V → [0, 1] defined by:

    CDF_R(x) = Pr(R ≤ x)

As an example, the cumulative distribution function for the random variable T is shown below:

    [Figure: bar plot of CDF_T(x) for x ∈ V = {2, 3, . . . , 12}, with bars climbing from 1/36 at x = 2 to 1 at x = 12.]

The height of the i-th bar in the cumulative distribution function is equal to the sum of the heights of the leftmost i bars in the probability density function. This follows from the definitions of pdf and cdf:

    CDF_R(x) = Pr(R ≤ x)
             = ∑_{y≤x} Pr(R = y)
             = ∑_{y≤x} PDF_R(y)

In summary, PDF_R(x) measures the probability that R = x and CDF_R(x) measures the probability that R ≤ x. Both the PDF_R and CDF_R capture the same information about the random variable R (you can derive one from the other), but sometimes one is more convenient.
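In the same spirit as before, PDF_T and CDF_T can be tabulated from the dice enumeration; this sketch assumes the dice_space, T, and dice_prob helpers introduced earlier and builds the cdf as the running sum of the pdf.

```python
from itertools import accumulate

values = range(2, 13)   # V = {2, 3, ..., 12}
pdf_T = {x: dice_prob(lambda w, x=x: T(w) == x) for x in values}
cdf_T = dict(zip(values, accumulate(pdf_T[x] for x in values)))

print(pdf_T[7])    # 6/36, the peak of the lump
print(cdf_T[7])    # 21/36, the sum of the leftmost six bars of the pdf
print(cdf_T[12])   # 1
```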

The key point here is that neither the probability density function nor the cumulative distribution function involves the sample space of an experiment. Thus, through these functions, we can study random variables without reference to a particular experiment.

For the remainder of today, we'll look at three important distributions and some applications.

2.1 Bernoulli Distribution

Indicator random variables are perhaps the most common type because of their close association with events. The probability density function of an indicator random variable B is always

    PDF_B(0) = p
    PDF_B(1) = 1 − p

where 0 ≤ p ≤ 1. The corresponding cumulative distribution function is:

    CDF_B(0) = p
    CDF_B(1) = 1

This is called the Bernoulli distribution. The number of heads flipped on a (possibly biased) coin has a Bernoulli distribution.

2.2 Uniform Distribution

A random variable that takes on each possible value with the same probability is called uniform. For example, the probability density function of a random variable U that is uniform on the set {1, 2, . . . , N} is:

    PDF_U(k) = 1/N

And the cumulative distribution function is:

    CDF_U(k) = k/N

Uniform distributions come up all the time. For example, the number rolled on a fair die is uniform on the set {1, 2, . . . , 6}.
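Both distributions are easy to write down explicitly; the sketch below is my own illustration, with the Bernoulli parameterized as in the notes (p is the probability of 0) and the cdf built generically from any pdf.

```python
from fractions import Fraction

def bernoulli_pdf(p):
    """PDF of an indicator B with Pr(B = 0) = p, as defined above."""
    return {0: p, 1: 1 - p}

def uniform_pdf(N):
    """PDF of a random variable that is uniform on {1, 2, ..., N}."""
    return {k: Fraction(1, N) for k in range(1, N + 1)}

def cdf_from_pdf(pdf):
    """CDF(x) = sum of PDF(y) over y <= x."""
    total, cdf = 0, {}
    for x in sorted(pdf):
        total += pdf[x]
        cdf[x] = total
    return cdf

print(cdf_from_pdf(bernoulli_pdf(Fraction(1, 3))))  # {0: 1/3, 1: 1}
print(cdf_from_pdf(uniform_pdf(6)))                 # fair die: CDF_U(k) = k/6
```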

2.3 The Numbers Game

Let's play a game! I have two envelopes. Each contains an integer in the range 0, 1, . . . , 100, and the numbers are distinct. To win the game, you must determine which envelope contains the larger number. To give you a fighting chance, I'll let you peek at the number in one envelope selected at random. Can you devise a strategy that gives you a better than 50% chance of winning?

For example, you could just pick an envelope at random and guess that it contains the larger number. But this strategy wins only 50% of the time. Your challenge is to do better.

So you might try to be more clever. Suppose you peek in the left envelope and see the number 12. Since 12 is a small number, you might guess that the other number is larger. But perhaps I'm sort of tricky and put small numbers in both envelopes. Then your guess might not be so good!

An important point here is that the numbers in the envelopes may not be random. I'm picking the numbers and I'm choosing them in a way that I think will defeat your guessing strategy. I'll only use randomization to choose the numbers if that serves my end: making you lose!

2.3.1 Intuition Behind the Winning Strategy

Amazingly, there is a strategy that wins more than 50% of the time, regardless of what numbers I put in the envelopes!

Suppose that you somehow knew a number x between my lower and higher numbers. Now you peek in an envelope and see one or the other. If it is bigger than x, then you know you're peeking at the higher number. If it is smaller than x, then you're peeking at the lower number. In other words, if you know a number x between my lower and higher numbers, then you are certain to win the game.

The only flaw with this brilliant strategy is that you do not know x. Oh well.

But what if you try to guess x? There is some probability that you guess correctly. In this case, you win 100% of the time. On the other hand, if you guess incorrectly, then you're no worse off than before; your chance of winning is still 50%. Combining these two cases, your overall chance of winning is better than 50%!

Informal arguments about probability, like this one, often sound plausible, but do not hold up under close scrutiny. In contrast, this argument sounds completely implausible, but is actually correct!

2.3.2 Analysis of the Winning Strategy

For generality, suppose that I can choose numbers from the set {0, 1, . . . , n}. Call the lower number L and the higher number H.

Your goal is to guess a number x between L and H. To avoid confusing equality cases, you select x at random from among the half-integers:

    1/2, 3/2, 5/2, . . . , n − 1/2

But what probability distribution should you use? The uniform distribution turns out to be your best bet. An informal justification is that if I figured out that you were unlikely to pick some number, say 50 1/2, then I'd always put 50 and 51 in the envelopes. Then you'd be unlikely to pick an x between L and H and would have less chance of winning.

After you've selected the number x, you peek into an envelope and see some number p. If p > x, then you guess that you're looking at the larger number; if p < x, then you guess that the other number is larger.

Step 1: Find the sample space. You either choose x too low (x < L), too high (x > H), or just right (L < x < H). Then you either peek at the lower number (p = L) or the higher number (p = H). This gives a total of six possible outcomes.

    [Tree diagram: the first branch is the choice of x (too low with probability L/n, just right with probability (H − L)/n, too high with probability (n − H)/n); the second branch is the number peeked at (p = L or p = H, each with probability 1/2). The six leaves and their probabilities are: x too low, p = L (lose, L/2n); x too low, p = H (win, L/2n); x just right, p = L (win, (H − L)/2n); x just right, p = H (win, (H − L)/2n); x too high, p = L (win, (n − H)/2n); x too high, p = H (lose, (n − H)/2n).]

Step 2: Define events of interest. The four outcomes in the event that you win are marked in the tree diagram.

Step 3: Assign outcome probabilities. First, we assign edge probabilities. Your guess x is too low with probability L/n, too high with probability (n − H)/n, and just right with probability (H − L)/n. Next, you peek at either the lower or higher number with equal probability. Multiplying along root-to-leaf paths gives the outcome probabilities.

Step 4: Compute event probabilities. The probability of the event that you win is the sum of the probabilities of the four winning outcomes:

    Pr(win) = L/2n + (H − L)/2n + (H − L)/2n + (n − H)/2n
            = (n + H − L)/2n
            = 1/2 + (H − L)/2n

Since H > L, this is strictly greater than 1/2, no matter which numbers I put in the envelopes.
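As a final check, here is a small Monte Carlo sketch of the strategy (entirely my own illustration, not from the notes), with an adversarial pair L, H fixed in advance; the observed win rate should land near the 1/2 + (H − L)/2n guarantee rather than 1/2.

```python
import random

def win_rate(L, H, n, trials=200_000):
    """Estimate the win probability of the half-integer guessing strategy."""
    wins = 0
    for _ in range(trials):
        x = random.randrange(n) + 0.5   # uniform over 1/2, 3/2, ..., n - 1/2
        p = random.choice((L, H))       # peek at one envelope chosen at random
        guess_peeked_is_larger = p > x
        wins += (p == H) == guess_peeked_is_larger   # win if the guess is correct
    return wins / trials

L, H, n = 50, 51, 100   # an adversarial, nearly indistinguishable pair
print(win_rate(L, H, n))         # roughly 0.505
print(0.5 + (H - L) / (2 * n))   # the exact guarantee: 0.505
```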
