16.322 Stochastic Estimation and Control
Professor Vander Velde

Lecture 2

Last time: Given a set of $N$ events which are mutually exclusive and equally likely,

$$P(E) = \frac{n(E)}{N}$$

where $n(E)$ is the number of those events favorable to $E$.

Example: card games

Number of different 5-card poker hands $= \binom{52}{5} = 2{,}598{,}960$

Number of different 13-card bridge hands $= \binom{52}{13} = 635{,}013{,}559{,}600$

Poker probabilities:

1. $P(\text{one pair}) = \dfrac{13\binom{4}{2}\,\frac{12 \cdot 11 \cdot 10}{3!}\,\binom{4}{1}^3}{\binom{52}{5}} \approx 0.423$

2. $P(\text{two pair}) = \dfrac{\frac{13 \cdot 12}{2!}\,\binom{4}{2}^2\, 11\, \binom{4}{1}}{\binom{52}{5}} \approx 0.0476$

Independence

$P(AB) = P(A)P(B)$ when $A, B$ independent
$P(AB) = P(A)P(B|A)$ when $A, B$ dependent

Definition of independence of events:

$$P(A_1 A_2 A_3 \dots A_m) = P(A_1) P(A_2) \dots P(A_m) \quad \text{for all } m \text{ (pairwise, threewise, etc.)}$$

Definition of conditional probability:

$$P(B|A) = \frac{P(AB)}{P(A)}$$

$P(B|A)$ restricts our attention to the situation where $A$ has already occurred.

If $A, B$ are independent,

$$P(B|A) = \frac{P(A)P(B)}{P(A)} = P(B)$$
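As a quick numerical check of the card-counting results above, a short Python sketch (ours, with illustrative variable names) evaluates both poker probabilities directly:

```python
from math import comb, factorial

hands = comb(52, 5)  # 2,598,960 possible five-card hands

# One pair: 13 ranks for the pair, C(4,2) suit choices for it, then
# three distinct kicker ranks from the remaining 12, 4 suits each.
one_pair = 13 * comb(4, 2) * (12 * 11 * 10 // factorial(3)) * 4**3

# Two pair: C(13,2) rank pairs, C(4,2) suits for each pair, then one
# of the 11 remaining ranks and 4 suits for the fifth card.
two_pair = comb(13, 2) * comb(4, 2)**2 * 11 * 4

print(one_pair / hands)  # ~0.4226
print(two_pair / hands)  # ~0.0475
```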
Two useful results:

1. $$P(ABCD\dots) = P(A)P(B|A)P(C|AB)P(D|ABC)\dots$$

Derive this by letting $A = CD$. Then

$$P(BCD) = P(CD)P(B|CD) = P(C)P(D|C)P(B|CD)$$

2. If $A_1, A_2, \dots$ is a set of mutually exclusive and collectively exhaustive events, then

$$P(E) = P(EA_1) + P(EA_2) + \dots + P(EA_n) = P(A_1)P(E|A_1) + P(A_2)P(E|A_2) + \dots + P(A_n)P(E|A_n)$$

$$P(A+B) = P(A) + P(B) - P(AB)$$

We must subtract off $P(AB)$ because it is counted twice by the first two terms of the RHS.

$$P(A+B+C) = P(A) + P(B) + P(C) - P(AB) - P(AC) - P(BC) + P(ABC) \quad \text{(for three events)}$$

For four events, $P(A+B+C+D)$ would be the sum of the probabilities of the single events, minus the probabilities of the joint events taken two at a time, plus the probabilities of the joint events taken three at a time, minus the probability of the joint event taken four at a time.

Conclusion: If events are mutually exclusive, express the result in terms of the sum of events; if independent, in terms of joint events. Note that the expansion above reduces to just the sum of the probabilities of the events if the events are mutually exclusive.
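The inclusion-exclusion pattern is easy to verify by brute force on a small sample space of equally likely outcomes; a sketch (the particular events chosen are arbitrary illustrations):

```python
# Check P(A+B+C) = P(A)+P(B)+P(C) - P(AB)-P(AC)-P(BC) + P(ABC)
# on the twelve equally likely outcomes 0..11.
omega = set(range(12))
A = {w for w in omega if w % 2 == 0}  # even outcomes
B = {w for w in omega if w % 3 == 0}  # multiples of 3
C = {w for w in omega if w < 4}       # outcomes 0..3

def P(event):
    return len(event) / len(omega)

lhs = P(A | B | C)
rhs = (P(A) + P(B) + P(C)
       - P(A & B) - P(A & C) - P(B & C)
       + P(A & B & C))
print(lhs, rhs)  # both 0.75
```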
Example: Simplex system

Probability of component failure $= P(F_i)$.

$$P(\text{system failure}) = P(\text{any component fails}) = P(F_1 + F_2 + \dots + F_n)$$

Assume component failures are independent. Take advantage of this by working with the complementary event:

$$P(\text{system failure}) = 1 - P(\text{system works}) = 1 - P(W_1 W_2 \dots W_n) = 1 - \prod_i P(W_i) = 1 - \prod_i \bigl(1 - P(F_i)\bigr)$$

where $P(W_i) = 1 - P(F_i)$.

Example: $n$-bit message

Let $p_e$ be the bit error probability. A system may be able to tolerate up to $k$ independent errors and still decode the message. The probability of exactly $k$ errors is

$$P(k \text{ errors}) = \binom{n}{k} p_e^k (1 - p_e)^{n-k}$$
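Both failure-probability formulas are one-liners to evaluate; a minimal sketch (the component and bit-error probabilities are made-up illustrative values):

```python
from math import comb, prod

# Simplex system: P(failure) = 1 - prod_i (1 - P(F_i)),
# assuming independent component failures.
p_fail = [0.01, 0.02, 0.005]  # illustrative component failure probabilities
print(1 - prod(1 - p for p in p_fail))  # ~0.0347

# n-bit message: P(exactly k errors) = C(n,k) p_e^k (1-p_e)^(n-k),
# assuming independent bit errors.
def p_k_errors(n, k, p_e):
    return comb(n, k) * p_e**k * (1 - p_e)**(n - k)

print(p_k_errors(n=8, k=1, p_e=0.01))  # ~0.0746
```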
Conditional Independence

Sometimes events depend on more than one random quantity, some of which may be common to two or more events, the others different and independent. In such cases the concept of conditional independence is useful.

Example: Two measurements of the same quantity, $x$, each corrupted by additive independent noise:

$$m_1 = x + n_1, \quad m_2 = x + n_2$$

Since $m_1$ and $m_2$ both depend on the same quantity $x$, any events $E_1(m_1)$ and $E_2(m_2)$ which depend on $m_1$ and $m_2$ are clearly not independent:

$$P(E_1 E_2) \neq P(E_1) P(E_2) \text{ in general}$$

But if the value of $x$ is given, the statistical properties of $m_1$ and $m_2$ depend only on $n_1$ and $n_2$, which are independent. Thus the events $E_1$ and $E_2$, conditioned on a given value of $x$, would be independent. Intuitively we would then say

$$P(E_1 E_2 | x) = P(E_1 | x) P(E_2 | x)$$

In general, the statement

$$P(AB|E) = P(A|E)P(B|E)$$

is the definition of the conditional independence of the events $A$ and $B$, conditioned on the event $E$.

If $A$ and $B$ are conditionally independent given $E$, and $B$ is conditioned on $E$, further conditioning on $A$ should not change the probability of the occurrence of $B$, since if $E$ is given, $A$ and $B$ are conditionally independent (compare $P(B|A) = P(B)$ if $A, B$ are independent):

$$P(B|EA) = \frac{P(ABE)}{P(EA)} = \frac{P(E)P(AB|E)}{P(E)P(A|E)} = \frac{P(A|E)P(B|E)}{P(A|E)} = P(B|E)$$

Bayes' Theorem

An interesting application of conditional probability. Let the $A_i$ be mutually exclusive and collectively exhaustive. Then

$$P(A_k | E) = \frac{P(A_k E)}{P(E)} = \frac{P(A_k E)}{\sum_i P(A_i E)} = \frac{P(A_k) P(E|A_k)}{\sum_i P(A_i) P(E|A_i)} \quad \text{(Bayes' rule)}$$

This relation results directly from the definition of conditional probability and cannot be questioned. But over the years, the application of this relation to statistical inference has been subjected to a great deal of mathematical and philosophical criticism. Of late the notion has been more generally accepted, and it serves as the basis for a well-defined theory of estimation and decision-making.

The point of the controversy is the use of a probability function to express one's state of imperfect knowledge about something which is itself not probabilistic at all.
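The discrete form of Bayes' rule is a single normalization over the alternatives; a minimal sketch (the function name and demo numbers are ours):

```python
def bayes_posterior(priors, likelihoods):
    """Posteriors P(A_k|E) from priors P(A_k) and likelihoods P(E|A_k),
    for mutually exclusive, collectively exhaustive alternatives A_k."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # P(E), by the total probability theorem
    return [j / total for j in joint]

# Two arbitrary alternatives with equal priors:
print(bayes_posterior([0.5, 0.5], [0.9, 0.1]))  # [0.9, 0.1]
```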
Example: Acceptance testing

Components are manufactured by a process which is uncertain – some come out good and some bad. A test is devised such that a good component has an 80% chance of passing and a bad component has a 30% chance of passing. On the average, we think that the process produces 60% good components and 40% bad ones. If a component passes the test, what is the probability that it is a good one?

Note that whether a certain component is good or bad is really not probabilistic – it is either good or bad. But we don't know which it is. Our knowledge, based on all prior experience, intuition, or whatever, is expressed in the prior probabilities. Identifying $A_1 = G$ ("good") and $A_2 = B$ ("bad"):

$$P(G) = 0.6, \quad P(B) = 0.4$$
$$P(P|G) = 0.8, \quad P(P|B) = 0.3$$

$$P(G|P) = \frac{P(G)P(P|G)}{P(G)P(P|G) + P(B)P(P|B)} = \frac{0.6(0.8)}{0.6(0.8) + 0.4(0.3)} = \frac{0.48}{0.60} = 0.8$$

Suppose we have $n-1$ observations, summarized by $P(A_k | E_1 \dots E_{n-1})$. Take one more observation, $E_n$:

$$
\begin{aligned}
P(A_k | E_1 \dots E_n) &= \frac{P(A_k E_1 \dots E_n)}{P(E_1 \dots E_n)} = \frac{P(A_k E_1 \dots E_n)}{\sum_i P(A_i E_1 \dots E_n)} \\
&= \frac{P(E_1 \dots E_{n-1}) P(A_k | E_1 \dots E_{n-1}) P(E_n | A_k E_1 \dots E_{n-1})}{\sum_i P(E_1 \dots E_{n-1}) P(A_i | E_1 \dots E_{n-1}) P(E_n | A_i E_1 \dots E_{n-1})} \\
&= \frac{P(A_k | E_1 \dots E_{n-1}) P(E_n | A_k)}{\sum_i P(A_i | E_1 \dots E_{n-1}) P(E_n | A_i)}
\end{aligned}
$$

where the last step uses the conditional independence of the observations given the alternative. This is of the same form as the relation first written down, with $P(A_k | E_1 \dots E_{n-1})$ in place of $P(A_k)$. This says that Bayes' rule can be applied repetitively to account for any number of observations if the observations are conditionally independent – conditioned on the alternatives.
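To illustrate the repetitive application, suppose (a hypothetical extension of the acceptance-testing example, with test outcomes assumed conditionally independent given the component's quality) the same component passes the test twice:

```python
def bayes_posterior(priors, likelihoods):
    joint = [p * l for p, l in zip(priors, likelihoods)]
    return [j / sum(joint) for j in joint]

belief = [0.6, 0.4]    # prior: P(G), P(B)
pass_lik = [0.8, 0.3]  # P(pass|G), P(pass|B)

# The posterior after each test becomes the prior for the next; this is
# valid only because the outcomes are conditionally independent given G or B.
for _ in range(2):
    belief = bayes_posterior(belief, pass_lik)
    print(belief)  # after 1st pass: [0.8, 0.2]; after 2nd: [~0.914, ~0.086]
```

Each pass raises the probability that the component is good, precisely because the posterior after one observation serves as the prior for the next.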