6.042/18.] Mathematics for Computer Science April 26, 2005 Srini devadas and Eric Lehman Lecture notes Independence 1 Independent events Suppose that we flip two fair coins simultaneously on opposite sides of a room. Intu- itively the way one coin lands does not affect the way the other coin lands. The mathe- matical concept that captures this intuition is called independence. In particular, events A and B are independent if and only if Pr(A∩B)=Pr(A)·Pr(B Generally, independence is something you assume in modeling a phenomenon-or wish you could realistically assume. Many useful probability formulas only hold if certain events are independent, so a dash of independence can greatly simplify the analysis of a 1.1 Examples Let's return to the experiment of flipping two fair coins. Let A be the event that the first coin comes up heads, and let b be the event that the second coin is heads. If we assume that a and B are independent, then the probability that both coins come up heads is e Pr(A∩B)=Pr(A)·Pr(B) 11 On the other hand, let C be the event that tomorrow is cloudy and r be the event that omorrow is rainy. Perhaps Pr(C)=1 5 and Pr(r)=1/10 around here. If these events were independent, then we could conclude that the probability of a rainy, cloudy day was quite small Pr(R∩C)=Pr(B)·Pr(C) 11 Unfortunately, these events are definitely not independent; in particular, is cloudy. Thus, the probability of a rainy, cloudy day is actually 1/10
6.042/18.062J Mathematics for Computer Science April 26, 2005 Srini Devadas and Eric Lehman Lecture Notes Independence 1 Independent Events Suppose that we flip two fair coins simultaneously on opposite sides of a room. Intuitively, the way one coin lands does not affect the way the other coin lands. The mathematical concept that captures this intuition is called independence. In particular, events A and B are independent if and only if: Pr (A ∩ B) = Pr (A) · Pr (B) Generally, independence is something you assume in modeling a phenomenon— or wish you could realistically assume. Many useful probability formulas only hold if certain events are independent, so a dash of independence can greatly simplify the analysis of a system. 1.1 Examples Let’s return to the experiment of flipping two fair coins. Let A be the event that the first coin comes up heads, and let B be the event that the second coin is heads. If we assume that A and B are independent, then the probability that both coins come up heads is: Pr (A ∩ B) = Pr (A) · Pr (B) 1 1 = 2 · 2 1 = 4 On the other hand, let C be the event that tomorrow is cloudy and R be the event that tomorrow is rainy. Perhaps Pr (C) = 1/5 and Pr (R) = 1/10 around here. If these events were independent, then we could conclude that the probability of a rainy, cloudy day was quite small: Pr (R ∩ C) = Pr (R) · Pr (C) 1 1 = 5 · 10 1 = 50 Unfortunately, these events are definitely not independent; in particular, every rainy day is cloudy. Thus, the probability of a rainy, cloudy day is actually 1/10
2 1.2 Working with Independence There is another way to think about independence that you may find more intuitive According to the definition, events a and b are independent if and only if: Pr(A∩B)=Pr(A)Pr(B) The equation on the left always holds if Pr(B)=0. Otherwise, we can divide both sides by Pr(B) and use the definition of conditional probability to obtain an alternative definition of independence Pr(a b)=Pr(a) or Pr (B)=0 This equation says that events A and B are independent if the probability of A is unaf- fected by the fact that B happens. In these terms, the two coin tosses of the previous section were independent, because the probability that one coin comes up heads is un affected by the fact that the other came up heads. turning to our other example, the probability of clouds in the sky is strongly affected by the fact that it is raining. So, as we noted before, these events are not independent. 1.3 Some intuition Suppose that A and B are disjoint events, as shown in the figure below. Are these events independent? Let's check. On one hand, we know Pr(A∩B)=0 because an b contains no outcomes On the other hand we have Pr(4)·Pr(B)>0 except in degenerate cases where A or B has zero probability. Thus, disjointness and inde- pendence are very different ideas Here's a better mental picture of what independent events look like
2 Independence 1.2 Working with Independence There is another way to think about independence that you may find more intuitive. According to the definition, events A and B are independent if and only if: Pr (A ∩ B) = Pr (A) · Pr (B). The equation on the left always holds if Pr (B) = 0. Otherwise, we can divide both sides by Pr (B) and use the definition of conditional probability to obtain an alternative definition of independence: Pr (A | B) = Pr (A) or Pr (B) = 0 This equation says that events A and B are independent if the probability of A is unaffected by the fact that B happens. In these terms, the two coin tosses of the previous section were independent, because the probability that one coin comes up heads is unaffected by the fact that the other came up heads. Turning to our other example, the probability of clouds in the sky is strongly affected by the fact that it is raining. So, as we noted before, these events are not independent. 1.3 Some Intuition Suppose that A and B are disjoint events, as shown in the figure below. A B Are these events independent? Let’s check. On one hand, we know Pr (A ∩ B) = 0 because A ∩ B contains no outcomes. On the other hand, we have Pr (A) · Pr (B) > 0 except in degenerate cases where A or B has zero probability. Thus, disjointness and independence are very different ideas. Here’s a better mental picture of what independent events look like
A B The sample space is the whole rectangle. Event A is a vertical stripe, and event B is a horizontal stripe. Assume that the probability of each event is proportional to its area in the diagram. Now if A covers an a-fraction of the sample space, and B covers a B-fraction then the area of the intersection region is a. B In terms of probability Pr(A∩B)=Pr(A)·Pr(B) 1.4 An Experiment with Two Coins Suppose that we flip two independent, fair coins. Consider the following two events a= the coins match(both are heads or both are tails) b= the first coin is heads Are these independent events? Intuitively, the answer is"no". After all, whether or not the coins match depends on how the first coin comes up; if we toss HH, they match, but if we toss TH, then they do not. However, the mathematical definition of independence does not correspond perfectly to the intuitive notion of"unrelated"or"unconnected These events actually are independent! Claim 1. Events A and B are independent Proof. We must show that Pr(An B)=Pr(A)- Pr(B) Step 1: Find the Sample Space. As shown in the tree diagram below, there are four possible outcomes: HH, HT, TH, and TT
Independence 3 ✁✁✁✁ ✁✁✁✁ ✁✁✁✁ ✁✁✁✁ ✁✁✁✁ ✁✁✁✁ ✁✁✁✁ ✂✁✂✁✂✁✂✁✂ ✂✁✂✁✂✁✂✁✂ ✂✁✂✁✂✁✂✁✂ ✂✁✂✁✂✁✂✁✂ ✂✁✂✁✂✁✂✁✂ ✂✁✂✁✂✁✂✁✂ ✂✁✂✁✂✁✂✁✂ ✄✁✄✁✄✁✄✁✄✁✄✁✄✁✄✁✄✁✄✁✄✁✄ ☎✁☎✁☎✁☎✁☎✁☎✁☎✁☎✁☎✁☎✁☎✁☎ ☎✁☎✁☎✁☎✁☎✁☎✁☎✁☎✁☎✁☎✁☎✁☎ ✆✁✆✁✆✁✆✁✆ ✆✁✆✁✆✁✆✁✆ ✝✁✝✁✝✁✝✁✝ ✝✁✝✁✝✁✝✁✝ A B The sample space is the whole rectangle. Event A is a vertical stripe, and event B is a horizontal stripe. Assume that the probability of each event is proportional to its area in the diagram. Now if A covers an αfraction of the sample space, and B covers a βfraction, then the area of the intersection region is α β· . In terms of probability: Pr (A ∩ B) = Pr (A) · Pr (B) 1.4 An Experiment with Two Coins Suppose that we flip two independent, fair coins. Consider the following two events: A = the coins match (both are heads or both are tails) B = the first coin is heads Are these independent events? Intuitively, the answer is “no”. After all, whether or not the coins match depends on how the first coin comes up; if we toss HH, they match, but if we toss T H, then they do not. However, the mathematical definition of independence does not correspond perfectly to the intuitive notion of “unrelated” or “unconnected”. These events actually are independent! Claim 1. Events A and B are independent. Proof. We must show that Pr(A ∩ B) = Pr(A) · Pr(B). Step 1: Find the Sample Space. As shown in the tree diagram below, there are four possible outcomes: HH, HT, T H, and T T
H12°H14 1/2°HT TH1/4 1/2 probability event A vent B event coin coin2 col A∩B? Step 2: Define Events of Interest. The outcomes in event A(coins match) and event B (first coin is heads) are checked in the tree diagram above Step 3: Compute Outcome Probabilities. Since the coins are independent and fair, all edge probabilities are 1/2. We find outcome probabilities by multiplying edge probabili ties along each root-to-leaf path. All outcomes have probability 1/4 Step 4: Compute Event Probabilities. Now we can verify that Pr(AnB)= Pr(a) Pr(B) 1 P(4)=Pr(HH)+Pr(Tn)=4+4=2 Pr(B)=Pr(HH)+Pr(HT) P(4∩B)=P(HH)=4 Therefore, a and b are independent events as claimed 1.5 A Variation of the Two-Coin Experiment Suppose that we alter the preceding experiment so that the coins are independent, but not fair. In particular, suppose each coin is heads with probability p and tails with probability 1-p where p might not be 1 /2. As before, let a be the event that the coins match, and let B the event that the first coin is heads. Are events A and b independent for all values of The problem is worked out in the tree diagram below
4 Independence T H H H T 1/2 T 1/2 1/2 1/2 1/2 1/2 TT TH HT HH probability coin 1 coin2 event A: coins match? event B: 1st coin heads? 1/4 1/4 1/4 1/4 event A B? Step 2: Define Events of Interest. The outcomes in event A (coins match) and event B (first coin is heads) are checked in the tree diagram above Step 3: Compute Outcome Probabilities. Since the coins are independent and fair, all edge probabilities are 1/2. We find outcome probabilities by multiplying edge probabilities along each roottoleaf path. All outcomes have probability 1/4. Step 4: Compute Event Probabilities. Now we can verify that Pr (A ∩ B) = Pr (A) · Pr (B): 1 1 1 Pr(A) = Pr(HH) + Pr(T T) = + = 4 4 2 1 1 1 Pr(B) = Pr(HH) + Pr(HT) = + = 4 4 2 1 Pr(A ∩ B) = Pr(HH) = 4 Therefore, A and B are independent events as claimed. 1.5 A Variation of the TwoCoin Experiment Suppose that we alter the preceding experiment so that the coins are independent, but not fair. In particular, suppose each coin is heads with probability p and tails with probability 1 − p where p might not be 1/2. As before, let A be the event that the coins match, and let B the event that the first coin is heads. Are events A and B independent for all values of p? The problem is worked out in the tree diagram below
hT P(I-P) 、pTHp(l-p) 1 coin 1 probability even event coin 2 coin Ist coin A∩B? match? heads We can read event probabilities off the tree diagram Pr(4)=Pr(HH)+Pr(TT)=p2+(1-p)2=2p2-2p+1 Pr(B)=Pr(HH)+Pr(hT)=p+p(1-p)=p Pr(A∩B)=Pr(HH)=p2 Now events A and B are independent if and only if Pr(An B)= Pr(A)- Pr(B)or, equiv- alently (2p2-2p+1)·p=p2 Since both sides are multiples of p, one solution is p=0. Dividing both sides by p and simplifying leaves a quadratic equation 2p2-3p+1=0 According to the quadratic formula, the remaining solutions are p= l and p= 1/2 This analysis shows that events A and B are independent only if the coins are either fair or completely biased toward either heads or tails. Evidently, there was some dependence lurking at the fringes of the previous problem, but it was kept at bay because the coins The Ultimate Application Surprisingly, this has an application to Ultimate Frisbee. Here is an excerpt from the Tenth Edition rules
� � Independence 5 T H H H T T TT TH HT HH probability event A: coins match? event B: 1st coin heads? event A B? p p p 1-p 1-p 1-p (1-p) 2 p(1-p) p(1-p) p 2 coin 2 coin 1 We can read event probabilities off the tree diagram: 2 Pr (A) = Pr (HH) + Pr (T T) = p 2 + (1 − p) 2 = 2p − 2p + 1 Pr (B) = Pr (HH) + Pr (HT) = p 2 + p(1 − p) = p Pr (A ∩ B) = Pr (HH) = p 2 Now events A and B are independent if and only if Pr (A ∩ B) = Pr (A)·Pr (B) or, equivalently: 2 2p 2 − 2p + 1 = · p p Since both sides are multiples of p, one solution is p = 0. Dividing both sides by p and simplifying leaves a quadratic equation: 2p 2 − 3p + 1 = 0 According to the quadratic formula, the remaining solutions are p = 1 and p = 1/2. This analysis shows that events A and B are independent only if the coins are either fair or completely biased toward either heads or tails. Evidently, there was some dependence lurking at the fringes of the previous problem, but it was kept at bay because the coins were fair! The Ultimate Application Surprisingly, this has an application to Ultimate Frisbee. Here is an excerpt from the Tenth Edition rules:
Independence A. Representatives of the two teams each flip a disc. The representative of one team calls"same"or"different"while the discs are in the air. The team winning the flip has the choice of 1. Receiving or throwing the initial throw-off; or 2. Selecting which goal they wish to defend initially B. The team losing the flip is given the remaining choice. As we computed above, the probability that two flips match is Pr(4)=p2+(1-p)2 Below weve plotted this match probability as a function of p, the probability that one disc lands face-up Prob. that same wins 1/2 Frisbees are asymmetric objects with strong aerodynamic properties, so p is not likely to be 1/2. That plot shows that if there is any bias one way or the other, then saying"same wins more than half the time. In fact, even if frisbees land face up exactly half the time same"during the g"same"still wins half the time. Therefore, might as well always say (p= 1 2), then sayin fli 2 Mutual Independence We have defined what it means for two events to be independent. But how can we talk about independence when there are more than two events? For example, how can we say that the orientations of n coins are all independent of one another?
6 Independence A. Representatives of the two teams each flip a disc. The representative of one team calls ”same” or ”different” while the discs are in the air. The team winning the flip has the choice of: 1. Receiving or throwing the initial throwoff; or 2. Selecting which goal they wish to defend initially. B. The team losing the flip is given the remaining choice. As we computed above, the probability that two flips match is: Pr (A) = p 2 + (1 − p) 2 Below we’ve plotted this match probability as a function of p, the probability that one disc lands faceup. p 0 1 1/2 1 Prob. that "same" wins Frisbees are asymmetric objects with strong aerodynamic properties, so p is not likely to be 1/2. That plot shows that if there is any bias one way or the other, then saying “same” wins more than half the time. In fact, even if frisbees land face up exactly half the time (p = 1/2), then saying “same” still wins half the time. Therefore, might as well always say “same” during the opening flip! 2 Mutual Independence We have defined what it means for two events to be independent. But how can we talk about independence when there are more than two events? For example, how can we say that the orientations of n coins are all independent of one another?
Events E1,..., En are mutually independent if and only if for every subset of the events, the probability of the intersection is the product of the probabilities. In other words, all of the following equations must hold Pr(E2∩E)=Pr(E)Pr(E) for all distinct i, j Pr(E2∩E1∩Ek)=Pr(E)Pr(E)·Pr(Ek) for all distinct i, j, k Pr(E2∩E∩Ek∩E)=Pr(E)Pr(E)Pr(E)·Pr(E) for all distinct,j,k, Pr(E1∩…∩En)=Pr(E1)…Pr(En) As an example, if we toss 100 fair coins and let Ei be the event that the ith coin lands heads, then we might reasonably assume than E1,..., E1oo are mutually independent 1 DNA Testing This is testimony from the O J. Simpson murder trial on May 15, 1995 MR CLARKE: When you make these estimations of frequency- and I believe you touched a little bit on a concept called independence? DR COTTON: Yes, i did MR CLARKE: And what is that again? DR COTTON: It means whether or not you inherit one allele that you have is not- does not affect the second allele that you might get. That is, if you inherit a band at 5,000 base pairs, that doesnt mean you'll automatically or with some probability inherit one at 6,000. What you inherit from one parent is what you inherit from the other.( Got that?-eAld MR CLARKE: Why is that important? DR COTTON: Mathematically that's important because if that were not the case, it would be improper to multiply the frequencies between the different geneti MR CLARKE: How do you- well, first of all, are these markers independent that you ve described in your testing in this case? The jury was told that genetic markers in blood found at the crime scene matched Simpsons. Furthermore, the probability that the markers would be found in a randoml selected person was at most 1 in 170 million. This astronomical figure was derived from statistics such as:
Independence 7 Events E1, . . . , En are mutually independent if and only if for every subset of the events, the probability of the intersection is the product of the probabilities. In other words, all of the following equations must hold: Pr (Ei ∩ Ej ) = Pr (Ei) · Pr (Ej ) for all distinct i, j Pr (Ei ∩ Ej ∩ Ek) = Pr (Ei) · Pr (Ej ) · Pr (Ek) for all distinct i, j, k Pr (Ei ∩ Ej ∩ Ek ∩ El) = Pr (Ei) · Pr (Ej ) · Pr (Ek) · Pr (El) for all distinct i, j, k, l . . . Pr (E1 ∩ · · · ∩ En) = Pr (E1)· · ·Pr (En) As an example, if we toss 100 fair coins and let Ei be the event that the ith coin lands heads, then we might reasonably assume than E1, . . . , E100 are mutually independent. 2.1 DNA Testing This is testimony from the O. J. Simpson murder trial on May 15, 1995: MR. CLARKE: When you make these estimations of frequency— and I believe you touched a little bit on a concept called independence? DR. COTTON: Yes, I did. MR. CLARKE: And what is that again? DR. COTTON: It means whether or not you inherit one allele that you have is not— does not affect the second allele that you might get. That is, if you inherit a band at 5,000 base pairs, that doesn’t mean you’ll automatically or with some probability inherit one at 6,000. What you inherit from one parent is what you inherit from the other. (Got that? – EAL) MR. CLARKE: Why is that important? DR. COTTON: Mathematically that’s important because if that were not the case, it would be improper to multiply the frequencies between the different genetic locations. MR. CLARKE: How do you— well, first of all, are these markers independent that you’ve described in your testing in this case? The jury was told that genetic markers in blood found at the crime scene matched Simpson’s. Furthermore, the probability that the markers would be found in a randomlyselected person was at most 1 in 170 million. This astronomical figure was derived from statistics such as:
Independe 1 person in 100 has marker A ·1 person In50 marker B 1 person in 40 has marker C 1 person in 5 has marker D I person in 170 has marker E Then these numbers were multiplied to give the probability that a randomly-selected person would have all five markers Pr(A∩BnC∩D∩E)=Pr(4)Pr(B)·Pr(C).Pr(D).Pr(E) 10050405170 70.000.000 The defense pointed out that this assumes that the markers appear mutually indepen dently. Furthermore, all the statistics were based on just a few hundred blood samples The jury was widely mocked for failing to"understand"the dna evidence. If you were a juror, would you accept the l in 170 million calculation? 2.2 Pairwise Independence The definition of mutual independence seems awfully complicated- there are so many conditions! Here's an example that illustrates the subtlety of independence when more than two events are involved and the need for all those conditions. Suppose that we flip three fair, mutually-independent coins. Define the following events A1 is the event that coin 1 matches coin 2. A is the event that coin 2 matches coin 3. he event that coin 3 matches coin 1 Are Al, A2, As mutually independent? The sample space for this experiment is LHHH, HHT, HTH, HTT, THH, THT, TTH, TTTI Every outcome has probability(1/2)=1/8 by our assumption that the coins are mutu- ally independent
8 Independence • 1 person in 100 has marker A. • 1 person in 50 marker B. • 1 person in 40 has marker C. • 1 person in 5 has marker D. • 1 person in 170 has marker E. Then these numbers were multiplied to give the probability that a randomlyselected person would have all five markers: Pr (A ∩ B ∩ C ∩ D ∩ E) = Pr (A) · Pr (B) · Pr (C) · Pr (D) · Pr (E) 1 1 1 1 1 = 100 · 50 · 40 · 5 · 170 1 = 170, 000, 000 The defense pointed out that this assumes that the markers appear mutually independently. Furthermore, all the statistics were based on just a few hundred blood samples. The jury was widely mocked for failing to “understand” the DNA evidence. If you were a juror, would you accept the 1 in 170 million calculation? 2.2 Pairwise Independence The definition of mutual independence seems awfully complicated— there are so many conditions! Here’s an example that illustrates the subtlety of independence when more than two events are involved and the need for all those conditions. Suppose that we flip three fair, mutuallyindependent coins. Define the following events: • A1 is the event that coin 1 matches coin 2. • A2 is the event that coin 2 matches coin 3. • A3 is the event that coin 3 matches coin 1. Are A1, A2, A3 mutually independent? The sample space for this experiment is: {HHH, HHT, HT H, HT T, T HH, T HT, T T H, T T T} Every outcome has probability (1/2)3 = 1/8 by our assumption that the coins are mutually independent
To see if events Al, A2, and A] are mutually independent, we must check a sequence of equalities. It will be helpful first to compute the probability of each event A; Pr(A1)=Pr(HHH)+Pr(HHT)+Pr(TTH)+Pr(TTT) 1111 By symmetry, Pr(A2)=Pr(A3)=1/2 as well. Now we can begin checking all the equali ties required for mutual independence (A1n A2)=Pr(HHH)+Pr(TTT) 11 22 Pr(A1) Pr(A2) By symmetry, Pr(A1n A3)= Pr(A1) Pr(As)and Pr(A2 n A3)= Pr(A2). Pr(A3)must hold also. Finally, we must check one last condition Pr(A1∩A2∩A3)=Pr(HHH)+Pr(TTT) 11 8+8 ≠Pr(4)Pr(4)P(43)= The three events A1, A2, and A3 are not mutually independent, even though all pairs of events are independent! A set of events in pairwise independent if every pair is independent. Pairwise inde- pendence is a much weaker property than mutual independence. For example, suppose that the prosecutors in the O J. Simpson trial were wrong and markers A, B, C, D, and E appear only pairwise independently. Then the probability that a randomly-selected person has all five markers is no more than: Pr( A0B0CnD∩E)≤Pr(A∩E) =Pr(A)·Pr(E)
Independence 9 To see if events A1, A2, and A3 are mutually independent, we must check a sequence of equalities. It will be helpful first to compute the probability of each event Ai: Pr (A1) = Pr (HHH) + Pr (HHT) + Pr (T T H) + Pr (T T T) 1 1 1 1 = + + + 8 8 8 8 1 = 2 By symmetry, Pr (A2) = Pr (A3) = 1/2 as well. Now we can begin checking all the equalities required for mutual independence. Pr (A1 ∩ A2) = Pr (HHH) + Pr (T T T) 1 1 = + 8 8 1 = 4 1 1 = 2 · 2 = Pr (A1) Pr (A2) By symmetry, Pr (A1 ∩ A3) = Pr (A1) · Pr (A3) and Pr (A2 ∩ A3) = Pr (A2) · Pr (A3) must hold also. Finally, we must check one last condition: Pr (A1 ∩ A2 ∩ A3) = Pr (HHH) + Pr (T T T) 1 1 = + 8 8 1 = 4 1 = Pr ( � A1) Pr (A2) Pr (A3) = 8 The three events A1, A2, and A3 are not mutually independent, even though all pairs of events are independent! A set of events in pairwise independent if every pair is independent. Pairwise independence is a much weaker property than mutual independence. For example, suppose that the prosecutors in the O. J. Simpson trial were wrong and markers A, B, C, D, and E appear only pairwise independently. Then the probability that a randomlyselected person has all five markers is no more than: Pr (A ∩ B ∩ C ∩ D ∩ E) ≤ Pr (A ∩ E) = Pr (A) · Pr (E) 1 1 = 100 · 170 1 = 17, 000
Independence The first line uses the fact that An BncnDnE is a subset of An E.(We picked out the A and E markers because theyre the rarest. We use pairwise independence on the second line. Now the probability of a random match is 1 in 17,000--a far cry from 1 in 170 million! And this is the strongest conclusion we can reach assuming only pairwise Independence 3 The birthday Paradox Suppose that there are 100 students in a lecture hall. There are 365 possible birthdays, ignoring February 29. What is the probability that two students have the same birthday? 50%90% 99%? Let' s make some modeling assumptions For each student, all possible birthdays are equally likely. The idea underlying this assumption is that each student's birthday is determined by a random process in- olving parents, fate, and, um, some issues that we discussed earlier in the context of graph theory. Our assumption is not completely accurate, however; a dispropor tionate number of babies are born in August and September, for example. ( Counting back nine months explains the reason why Birthdays are mutually independent. This isn't perfectly accurate either. For exam- ple, if there are twins in the lecture hall, then their birthdays are surely not indepen dent We'll stick with these assumptions, despite their limitations. Part of the reason is to sim- plify the analysis. But the bigger reason is that our conclusions will apply to many situa tions in computer science where twins, leap days, and romantic holidays are not consid erations. Also in pursuit of generality, lets switch from specific numbers to variables. Let m be the number of people in the room, and let N be the number of days in a year 3.1 The Four-Step Method We can solve this problem using the standard four-step method. However, a tree diagram will be of little value because the sample space is so enormous. This time we'll have to proceed without the visual aid! Step 1: Find the Sample Space Let's number the people in the room from 1 to m. An outcome of the experiment is a sequence(b1,., bm)where b: is the birthday of the ith person. The sample space is the set of all such sequences S={(b1,…,bn)b∈{1,…,N}
10 Independence The first line uses the fact that A ∩ B ∩ C ∩ D ∩ E is a subset of A ∩ E. (We picked out the A and E markers because they’re the rarest.) We use pairwise independence on the second line. Now the probability of a random match is 1 in 17,000— a far cry from 1 in 170 million! And this is the strongest conclusion we can reach assuming only pairwise independence. 3 The Birthday Paradox Suppose that there are 100 students in a lecture hall. There are 365 possible birthdays, ignoring February 29. What is the probability that two students have the same birthday? 50%? 90%? 99%? Let’s make some modeling assumptions: • For each student, all possible birthdays are equally likely. The idea underlying this assumption is that each student’s birthday is determined by a random process involving parents, fate, and, um, some issues that we discussed earlier in the context of graph theory. Our assumption is not completely accurate, however; a disproportionate number of babies are born in August and September, for example. (Counting back nine months explains the reason why!) • Birthdays are mutually independent. This isn’t perfectly accurate either. For example, if there are twins in the lecture hall, then their birthdays are surely not independent. We’ll stick with these assumptions, despite their limitations. Part of the reason is to simplify the analysis. But the bigger reason is that our conclusions will apply to many situations in computer science where twins, leap days, and romantic holidays are not considerations. Also in pursuit of generality, let’s switch from specific numbers to variables. Let m be the number of people in the room, and let N be the number of days in a year. 3.1 The FourStep Method We can solve this problem using the standard fourstep method. However, a tree diagram will be of little value because the sample space is so enormous. This time we’ll have to proceed without the visual aid! Step 1: Find the Sample Space Let’s number the people in the room from 1 to m. An outcome of the experiment is a sequence (b1, . . . , bm) where bi is the birthday of the ith person. The sample space is the set of all such sequences: S = {(b1, . . . , bm) | bi ∈ {1, . . . , N}}