Mathematics for Computer Science: Lecture 22, Expected Value II

6.042/18.062J Mathematics for Computer Science          May 5, 2005
Srini Devadas and Eric Lehman

Lecture Notes: Expected Value II

1 The Number-Picking Game

Here is a game that you and I could play that reveals a strange property of expectation.

First, you think of a probability density function on the natural numbers. Your distribution can be absolutely anything you like. For example, you might choose a uniform distribution on 1, 2, ..., 6, like the outcome of a fair die roll. Or you might choose a binomial distribution on 0, 1, ..., n. You can even give every natural number a non-zero probability, provided that the sum of all probabilities is 1.

Next, I pick a random number z according to your distribution. Then, you pick a random number y_1 according to the same distribution. If your number is bigger than mine (y_1 > z), then the game ends. Otherwise, if our numbers are equal or mine is bigger (z ≥ y_1), then you pick a new number y_2 with the same distribution, and keep picking values y_3, y_4, etc. until you get a value that is strictly bigger than my number, z.

What is the expected number of picks that you must make? Certainly, you always need at least one pick, so the expected number is greater than one. An answer like 2 or 3 sounds reasonable, though one might suspect that the answer depends on the distribution. Let's find out whether or not this intuition is correct.

1.1 Analyzing the Game

The number of picks you must make is a natural-valued random variable. And, as we've seen, there is a nice formula for the expectation of a natural-valued random variable:

    Ex(# times you pick) = \sum_{k=0}^{\infty} Pr(# times you pick > k)        (1)

Suppose that I've picked my number z, and you have picked k numbers y_1, y_2, ..., y_k. There are two possibilities:

• If there is a unique largest number among our picks, then my number is as likely to be it as any one of yours. So with probability 1/(k+1), my number is larger than all of yours, and you must pick again.

• Otherwise, there are several numbers tied for largest. My number is as likely to be one of these as any of your numbers, so with probability greater than 1/(k+1) you must pick again.

In both cases, with probability at least 1/(k+1), you need more than k picks to beat me. In other words:

    Pr(# times you pick > k) ≥ 1/(k+1)        (2)

This suggests that in order to minimize your picks, you should choose a distribution such that ties are very rare. For example, you might choose the uniform distribution on {1, 2, ..., 10^100}. In this case, the probability that you need more than k picks to beat me is very close to 1/(k+1) for moderate values of k. For example, the probability that you need more than 99 picks is almost exactly 1%. This sounds very promising for you; intuitively, you might expect to win within a reasonable number of picks on average!

Unfortunately for intuition, there is a simple proof that the expected number of picks that you need in order to beat me is infinite, regardless of the distribution! Let's plug (2) into (1):

    Ex(# times you pick) = \sum_{k=0}^{\infty} 1/(k+1)
                         = ∞

This phenomenon can cause all sorts of confusion! For example, suppose you have a communication network where each packet of data has a 1/k chance of being delayed by k or more steps. This sounds good; there is only a 1% chance of being delayed by 100 or more steps. But the expected delay for the packet is actually infinite!

There is a larger point here as well: not every random variable has a well-defined expectation. This idea may be disturbing at first, but remember that an expected value is just a weighted average. And there are many sets of numbers that have no conventional average either, such as:

    {1, -2, 3, -4, 5, -6, ...}

Strictly speaking, we should qualify virtually all theorems involving expectation with phrases such as "...provided all expectations exist." But we're going to leave that assumption implicit. Fortunately, random variables without expectations don't arise too often in practice.
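To watch this happen numerically, here is a minimal Python sketch of the game (the helper names geometric and picks_to_beat are my own, not from the notes). It uses the geometric distribution Pr(Y = k) = 2^(-k) on the positive integers, which gives every natural number at least 1 a non-zero probability; each individual round then ends with probability 1, but the running average of picks never settles down:

    import random

    def geometric():
        # Sample Pr(Y = k) = 2**(-k) for k = 1, 2, 3, ...: count fair
        # coin flips up to and including the first heads.
        k = 1
        while random.random() < 0.5:
            k += 1
        return k

    def picks_to_beat():
        # One round: I draw z, then you keep drawing until you get a
        # value strictly bigger than z. Since Pr(Y > z) = 2**(-z) > 0
        # for every z, each round ends with probability 1.
        z = geometric()
        picks = 1
        while geometric() <= z:
            picks += 1
        return picks

    random.seed(42)
    for trials in (10**3, 10**4, 10**5):
        avg = sum(picks_to_beat() for _ in range(trials)) / trials
        print(f"{trials:>6} rounds: average picks = {avg:.1f}")

Every so often a very large z comes up and forces an enormous number of picks; that occasional blow-up is exactly how an infinite expectation manifests in simulation.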

2 The Coupon Collector Problem

Every time I purchase a kid's meal at Taco Bell, I am graciously presented with a miniature "Racin' Rocket" car together with a launching device which enables me to project my new vehicle across any tabletop or smooth floor at high velocity. Truly, my delight knows no bounds.

There are n different types of Racin' Rocket car (blue, green, red, gray, etc.). The type of car awarded to me each day by the kind woman at the Taco Bell register appears to be selected uniformly and independently at random. What is the expected number of kid's meals that I must purchase in order to acquire at least one of each type of Racin' Rocket car?

The same mathematical question shows up in many guises: for example, what is the expected number of people you must poll in order to find at least one person with each possible birthday? Here, instead of collecting Racin' Rocket cars, you're collecting birthdays. The general question is commonly called the coupon collector problem after yet another interpretation.

2.1 A Solution Using Linearity of Expectation

Linearity of expectation is somewhat like induction and the pigeonhole principle; it's a simple idea that can be used in all sorts of ingenious ways. For example, we can use linearity of expectation in a clever way to solve the coupon collector problem. Suppose there are five different types of Racin' Rocket, and I receive this sequence:

    blue  green  green  red  blue  orange  blue  orange  gray

Let's partition the sequence into 5 segments:

    | blue | green | green red | blue orange | blue orange gray |
      X_0     X_1      X_2          X_3             X_4

The rule is that a segment ends whenever I get a new kind of car. For example, the middle segment ends when I get a red car for the first time. In this way, we can break the problem of collecting every type of car into stages. Then we can analyze each stage individually and assemble the results using linearity of expectation.

Let's return to the general case where I'm collecting n Racin' Rockets. Let X_k be the length of the k-th segment. The total number of kid's meals I must purchase to get all n Racin' Rockets is the sum of the lengths of all these segments:

    T = X_0 + X_1 + ... + X_{n-1}

Now let's focus our attention on X_k, the length of the k-th segment. At the beginning of segment k, I have k different types of car, and the segment ends when I acquire a new type. When I own k types, each kid's meal contains a type that I already have with probability k/n. Therefore, each meal contains a new type of car with probability 1 - k/n = (n-k)/n. Thus, the expected number of meals until I get a new kind of car is n/(n-k), by the "mean time to failure" formula that we worked out last time. So we have:

    Ex(X_k) = n/(n-k)

Linearity of expectation, together with this observation, solves the coupon collector problem:

    Ex(T) = Ex(X_0 + X_1 + ... + X_{n-1})
          = Ex(X_0) + Ex(X_1) + ... + Ex(X_{n-1})
          = n/(n-0) + n/(n-1) + ... + n/3 + n/2 + n/1
          = n (1/n + 1/(n-1) + ... + 1/3 + 1/2 + 1/1)
          = n H_n

The summation on the next-to-last line is the n-th harmonic sum with the terms in reverse order. As you may recall, this sum is denoted H_n and is approximately ln n.

Let's use this general solution to answer some concrete questions. For example, the expected number of die rolls required to see every number from 1 to 6 is:

    6 H_6 = 14.7...

And the expected number of people you must poll to find at least one person with each possible birthday is:

    365 H_365 = 2364.6...
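As a sanity check on the n·H_n formula, here is a short Python simulation sketch (the function name collect_all is my own); it reproduces the two numbers above to within sampling error:

    import random

    def collect_all(n):
        # Buy kid's meals until all n car types have appeared; each
        # meal's type is chosen uniformly and independently at random.
        seen = set()
        meals = 0
        while len(seen) < n:
            seen.add(random.randrange(n))
            meals += 1
        return meals

    random.seed(0)
    for n, trials in ((6, 20000), (365, 2000)):
        avg = sum(collect_all(n) for _ in range(trials)) / trials
        h_n = sum(1 / k for k in range(1, n + 1))
        print(f"n = {n:>3}: simulated {avg:7.1f}   n*H_n = {n * h_n:7.1f}")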

3 Expected Value of a Product

Enough with sums! What about the expected value of a product of random variables? If R_1 and R_2 are independent, then the expected value of their product is the product of their expected values.

Theorem 1. For independent random variables R_1 and R_2:

    Ex(R_1 · R_2) = Ex(R_1) · Ex(R_2)

Proof. We'll transform the right side into the left side:

    Ex(R_1) · Ex(R_2)
        = ( \sum_{x \in Range(R_1)} x · Pr(R_1 = x) ) · ( \sum_{y \in Range(R_2)} y · Pr(R_2 = y) )
        = \sum_{x \in Range(R_1)} \sum_{y \in Range(R_2)} xy · Pr(R_1 = x) · Pr(R_2 = y)
        = \sum_{x \in Range(R_1)} \sum_{y \in Range(R_2)} xy · Pr(R_1 = x ∩ R_2 = y)

The second line comes from multiplying out the product of sums. Then we used the fact that R_1 and R_2 are independent. Now let's group terms for which the product xy is the same:

        = \sum_{z \in Range(R_1 · R_2)} \sum_{x, y: xy = z} xy · Pr(R_1 = x ∩ R_2 = y)
        = \sum_{z \in Range(R_1 · R_2)} z \sum_{x, y: xy = z} Pr(R_1 = x ∩ R_2 = y)
        = \sum_{z \in Range(R_1 · R_2)} z · Pr(R_1 · R_2 = z)
        = Ex(R_1 · R_2)

3.1 The Product of Two Independent Dice

Suppose we throw two independent, fair dice and multiply the numbers that come up. What is the expected value of this product?

Let random variables R_1 and R_2 be the numbers shown on the two dice. We can compute the expected value of the product as follows:

    Ex(R_1 · R_2) = Ex(R_1) · Ex(R_2)
                  = (7/2) · (7/2)
                  = 49/4 = 12 1/4

On the first line, we're using Theorem 1. Then we use the result from last lecture that the expected value of one die is 3 1/2.

3.2 The Product of Two Dependent Dice

Suppose that the two dice are not independent; in fact, suppose that the second die is always the same as the first. Does this change the expected value of the product? Is the independence condition in Theorem 1 really necessary?

As before, let random variables R_1 and R_2 be the numbers shown on the two dice. We can compute the expected value of the product directly as follows:

    Ex(R_1 · R_2) = Ex(R_1^2)
                  = \sum_{i=1}^{6} i^2 · Pr(R_1 = i)
                  = 1^2/6 + 2^2/6 + 3^2/6 + 4^2/6 + 5^2/6 + 6^2/6
                  = 91/6 = 15 1/6

The first step uses the fact that the outcome of the second die is always the same as the first. Then we expand Ex(R_1^2) using one of our formulations of expectation. Now that the dice are no longer independent, the expected value of the product has changed to 15 1/6. So the expectation of a product of dependent random variables need not equal the product of their expectations.
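Both computations are easy to confirm by simulation; the sketch below (assuming nothing beyond Python's standard random module) averages the product over many throws in each scenario:

    import random

    random.seed(1)
    trials = 200_000

    # Independent fair dice: Theorem 1 predicts
    # Ex(R1*R2) = (7/2)*(7/2) = 12.25.
    indep = sum(random.randint(1, 6) * random.randint(1, 6)
                for _ in range(trials)) / trials

    # Fully dependent dice (the second always equals the first):
    # Ex(R1*R2) = Ex(R1**2) = 91/6 = 15.1666...
    dep = sum(random.randint(1, 6) ** 2 for _ in range(trials)) / trials

    print(f"independent: {indep:.2f}   dependent: {dep:.2f}")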

3.3 Corollaries

Theorem 1 extends to a collection of mutually independent variables.

Corollary 2. If random variables R_1, R_2, ..., R_n are mutually independent, then

    Ex(R_1 · R_2 ··· R_n) = Ex(R_1) · Ex(R_2) ··· Ex(R_n)

The proof uses induction, Theorem 1, and the definition of mutual independence. We'll omit the details.

We now know the expected value of a sum or product of random variables. Unfortunately, the expected value of a reciprocal is not so easy to characterize. Here is a flawed attempt.

False Corollary 3. If R is a random variable, then

    Ex(1/R) = 1 / Ex(R)

As a counterexample, suppose the random variable R is 1 with probability 1/2 and is 2 with probability 1/2. Then we have:

    1 / Ex(R) = 1 / (1 · (1/2) + 2 · (1/2))
              = 2/3

    Ex(1/R) = (1/1) · (1/2) + (1/2) · (1/2)
            = 3/4

The two quantities are not equal, so the corollary must be false. But here is another false corollary, which we can actually "prove"!

False Corollary 4. If Ex(R/T) > 1, then Ex(R) > Ex(T).

"Proof". We begin with the if-part, multiply both sides by Ex(T), and then apply Theorem 1:

    Ex(R/T) > 1
    Ex(R/T) · Ex(T) > Ex(T)
    Ex(R) > Ex(T)

This "proof" is bogus! The first step is valid only if Ex(T) > 0. More importantly, we can't apply Theorem 1 in the second step because R/T and T are not necessarily independent. Unfortunately, the fact that Corollary 4 is false does not mean it is never used!
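The counterexample to False Corollary 3 is small enough to verify with exact rational arithmetic, as in this sketch using Python's fractions module:

    from fractions import Fraction

    half = Fraction(1, 2)

    # R is 1 with probability 1/2 and 2 with probability 1/2.
    ex_R = 1 * half + 2 * half                                # Ex(R) = 3/2
    ex_recip = Fraction(1, 1) * half + Fraction(1, 2) * half  # Ex(1/R) = 3/4

    print(1 / ex_R)    # 2/3
    print(ex_recip)    # 3/4 -- not equal, so the "corollary" fails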

3.3.1 A RISC Paradox

The following data is taken from a paper by some famous professors. They wanted to show that programs on a RISC processor are generally shorter than programs on a CISC processor. For this purpose, they made a table of program lengths for some benchmark problems, which looked like this:

    Benchmark          RISC   CISC   CISC / RISC
    E-string search     150    120       0.8
    F-bit test          120    180       1.5
    Ackerman            150    300       2.0
    Rec 2-sort         2800   1400       0.5
    Average                              1.2

Each row contains the data for one benchmark. The numbers in the first two columns are program lengths for each type of processor. The third column contains the ratio of the CISC program length to the RISC program length. Averaging this ratio over all benchmarks gives the value 1.2 in the lower right. The authors conclude that "CISC programs are 20% longer on average".

But there's a pretty serious problem here. Suppose we redo the final column, taking the inverse ratio, RISC / CISC instead of CISC / RISC:

    Benchmark          RISC   CISC   RISC / CISC
    E-string search     150    120       1.25
    F-bit test          120    180       0.67
    Ackerman            150    300       0.5
    Rec 2-sort         2800   1400       2.0
    Average                              1.1

By exactly the same reasoning used by the authors, we could conclude that RISC programs are 10% longer on average than CISC programs! What's going on?

3.3.2 A Probabilistic Interpretation

To shed some light on this paradox, we can model the RISC vs. CISC debate with the machinery of probability theory.

Let the sample space be the set of benchmark programs. Let the random variable R be the length of the RISC program, and let the random variable C be the length of the CISC program. We would like to compare the average length of a RISC program, Ex(R), to the average length of a CISC program, Ex(C).

To compare average program lengths, we must assign a probability to each sample point; in effect, this assigns a "weight" to each benchmark. One might like to weigh benchmarks based on how frequently similar programs arise in practice. But let's follow the original authors' lead. They assign each ratio equal weight in their average, so they're implicitly assuming that similar programs arise with equal probability. Let's do the same and make the sample space uniform. We can now compute Ex(R) and Ex(C) as follows:

    Ex(R) = 150/4 + 120/4 + 150/4 + 2800/4 = 805
    Ex(C) = 120/4 + 180/4 + 300/4 + 1400/4 = 500

So the average length of a RISC program is actually Ex(R)/Ex(C) = 1.61 times greater than the average length of a CISC program. RISC is even worse than either of the two previous answers would suggest!

In terms of our probability model, the authors computed C/R for each sample point and then averaged to obtain Ex(C/R) = 1.2. This much is correct. However, they interpret this to mean that CISC programs are longer than RISC programs on average. Thus, the key conclusion of this milestone paper rests on Corollary 4, which we know to be false!
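The entire paradox fits in a few lines; in the sketch below, the uniform sample space makes every expectation a plain average over the four benchmarks (the variable names are mine):

    # Program lengths from the benchmark table above, one entry per benchmark.
    risc = [150, 120, 150, 2800]
    cisc = [120, 180, 300, 1400]

    def mean(values):
        values = list(values)
        return sum(values) / len(values)

    print(mean(c / r for c, r in zip(cisc, risc)))  # Ex(C/R) = 1.2
    print(mean(r / c for r, c in zip(risc, cisc)))  # Ex(R/C) = 1.104...
    print(mean(risc) / mean(cisc))                  # Ex(R)/Ex(C) = 805/500 = 1.61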

3.3.3 A Simpler Example

The root of the problem is more clear in the following, simpler example. Suppose the data were as follows.

    Benchmark    Processor A   Processor B   B/A    A/B
    Problem 1          2             1       1/2     2
    Problem 2          1             2        2     1/2
    Average                                 1.25   1.25

Now the statistics for processors A and B are exactly symmetric. Yet, from the third column we would conclude that Processor B programs are 25% longer on average, and from the fourth column we would conclude that Processor A programs are 25% longer on average. Both conclusions are obviously wrong. The moral is that averages of ratios can be very misleading. More generally, if you're computing the expectation of a quotient, think twice; you're going to get a value ripe for misuse and misinterpretation.

4 The Total Expectation Theorem

Earlier we talked about conditional probability. For example, you might want to know the probability that someone was dealt at least two aces, given that they were dealt the ace of spades. Similarly, one can talk about conditional expectation. For example, you could determine the expected number that comes up on a fair die, given that the roll is even.

There are several ways to compute a conditional expectation, just as there are several ways to compute an ordinary expectation. In fact, the conditional expectation formulas are the same as the ordinary expectation formulas, except that all the probabilities become conditional probabilities. If R is a random variable and E is an event, then the expected value of R given that event E occurs is denoted Ex(R | E) and defined by:

    Ex(R | E) = \sum_{w \in S} R(w) · Pr(w | E)
              = \sum_{x \in Range(R)} x · Pr(R = x | E)

For example, let R be the number that comes up on a roll of a fair die, and let E be the event that the number is even. Let's compute Ex(R | E), the expected value of a die roll, given that the result is even:

    Ex(R | E) = \sum_{w \in {1, ..., 6}} R(w) · Pr(w | E)
              = 1·0 + 2·(1/3) + 3·0 + 4·(1/3) + 5·0 + 6·(1/3)
              = 4

Conditional expectation is really useful for breaking down the calculation of an expectation into cases. The breakdown is justified by an analogue to the Total Probability Theorem:

Theorem 5 (Total Expectation). Let E_1, ..., E_n be events that partition the sample space and have nonzero probabilities. If R is a random variable, then:

    Ex(R) = Ex(R | E_1) · Pr(E_1) + ... + Ex(R | E_n) · Pr(E_n)
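The even-roll example is easy to check directly; here is a minimal sketch with exact fractions (the names are mine):

    from fractions import Fraction

    outcomes = range(1, 7)                        # a fair die: Pr(w) = 1/6 each
    even = [w for w in outcomes if w % 2 == 0]    # the event E

    # Given E, each even outcome has conditional probability
    # (1/6) / (1/2) = 1/3, and the odd outcomes have probability 0.
    ex_given_even = sum(w * Fraction(1, 3) for w in even)
    print(ex_given_even)                          # 4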

For example, let R be the number that comes up on a fair die and E be the event that the result is even, as before. Then Ē is the event that the result is odd. So the Total Expectation Theorem says:

    Ex(R) = Ex(R | E) · Pr(E) + Ex(R | Ē) · Pr(Ē)
     7/2  =     4     ·  1/2  +  Ex(R | Ē) · 1/2

The only quantity here that we don't already know is Ex(R | Ē), which is the expected die roll, given that the result is odd. Solving this equation for this unknown, we conclude that Ex(R | Ē) = 3.

To prove the Total Expectation Theorem, we begin with a Lemma.

Lemma. Let R be a random variable, E be an event with positive probability, and I_E be the indicator variable for E. Then

    Ex(R | E) = Ex(R · I_E) / Pr(E)        (3)

Proof. Note that for any outcome, s, in the sample space,

    Pr({s} ∩ E) = 0 if I_E(s) = 0, and Pr({s} ∩ E) = Pr(s) if I_E(s) = 1,

and so

    Pr({s} ∩ E) = I_E(s) · Pr(s).        (4)

Now,

    Ex(R | E) = \sum_{s \in S} R(s) · Pr({s} | E)               (def of Ex(· | E))
              = \sum_{s \in S} R(s) · Pr({s} ∩ E) / Pr(E)       (def of Pr(· | E))
              = \sum_{s \in S} R(s) · I_E(s) · Pr(s) / Pr(E)    (by (4))
              = Ex(R · I_E) / Pr(E)                             (def of Ex(R · I_E))

Now we prove the Total Expectation Theorem. Since the events E_1, ..., E_n partition the sample space, every outcome s lies in exactly one E_i, so I_{E_1}(s) + ... + I_{E_n}(s) = 1 and therefore

    R = R · I_{E_1} + ... + R · I_{E_n}

Taking expectations of both sides, linearity of expectation and the Lemma give:

    Ex(R) = Ex(R · I_{E_1}) + ... + Ex(R · I_{E_n})
          = Ex(R | E_1) · Pr(E_1) + ... + Ex(R | E_n) · Pr(E_n)
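Both the Lemma and the theorem can be spot-checked on the running die example. In the sketch below (the names are my own), I_E is the indicator of an even roll:

    from fractions import Fraction

    sixth = Fraction(1, 6)
    outcomes = range(1, 7)

    def I_E(w):
        # Indicator variable for the event E = "the roll is even".
        return 1 if w % 2 == 0 else 0

    pr_E = sum(sixth for w in outcomes if I_E(w))        # Pr(E) = 1/2
    ex_R_IE = sum(w * I_E(w) * sixth for w in outcomes)  # Ex(R*I_E) = 2

    # Lemma (3): Ex(R | E) = Ex(R * I_E) / Pr(E) = 2 / (1/2) = 4.
    print(ex_R_IE / pr_E)

    # Total Expectation: Ex(R|E)*Pr(E) + Ex(R|E-bar)*Pr(E-bar)
    #                  = 4*(1/2) + 3*(1/2) = 7/2.
    print(Fraction(4) * pr_E + Fraction(3) * (1 - pr_E))

The printed values match the worked example: 4 for the conditional expectation and 7/2 for the overall mean.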
