J Appl.Pob.40,513-518(2003) Printed in israel O Applied Probability Trust 2003 THE COUPON-COLLECTOR'S PROBLEM REVISITED SHMUEL OREN * AND SHELDON M. ROSS, "s University of california, Berkeley Abstract Consider the classical coupon-collector's problem in which items of m distinct types arrive in sequence. An arriving item is installed in system i2 I if i is the smallest index uch that system i does not contain an item of the arrival,'s type. We study the expected number of items in system at the moment when system l first contains an item of each Keywords: Coupon-collector's problem; hyperharmonic numbers; Poissonization AMS 2000 Subject Classification: Primary 60C05 1. Introduction Consider the classical coupon-collector's problem with m distinct types of items. The items arrive in sequence, with the types of the successive items being independent random variables that are each equal to k with probability Pk, kai pk= 1. An arriving item is installed in systemis 1 if i is the smallest index such that system i does not contain an item of the arrival,'s type. Let U, j> 2, denote the number of unfilled types in systems when system 1 first contains an item of each type. Foata et al. [2]and Foata and Zeilberger [1], using nonelementary mathematics, obtained recursive formulae and generating functions for E[Um for the equally likely case, where Pk 1/m. In Section 2 we derive, using basic probability, the recursion and a closed-form expression for E[U for the equally likely case. The general case is considered in Section 3 where an exact expression and bounds for E[U are determined. Comments concerning computation, as well as a simulation approach, are also presented in Section 3. 2. The equally likely case Assume, in this section, that all Pk =1/m. Furthermore, assume that the problem ends when system I has one item of each type, and let a denote the event that at least j type-k coupons have arrived. With 1(A)denoting the indicator variable for the event A 1-1(A) eceived 20 January 2003 ial Engineering and Operations Research, University of Califomia, Berkeley Email address: smross@ieor. berkeley. edu Research supported by the National Science Foundation Grant ECS-0224779 with the University of California. 513
J. Appl. Prob. 40, 513-518 (2003) Printed in Israel ? Applied Probability Trust 2003 THE COUPON-COLLECTOR'S PROBLEM REVISITED ILAN ADLER,* SHMUEL OREN * AND SHELDON M. ROSS,* ** University of California, Berkeley Abstract Consider the classical coupon-collector's problem in which items of m distinct types arrive in sequence. An arriving item is installed in system i > 1 if i is the smallest index such that system i does not contain an item of the arrival's type. We study the expected number of items in system j at the moment when system 1 first contains an item of each type. Keywords: Coupon-collector's problem; hyperharmonic numbers; Poissonization AMS 2000 Subject Classification: Primary 60C05 1. Introduction Consider the classical coupon-collector's problem with m distinct types of items. The items arrive in sequence, with the types of the successive items being independent random variables that are each equal to k with probability pk, Ek=l Pk = 1. An arriving item is installed in system i > 1 if i is the smallest index such that system i does not contain an item of the arrival's type. Let UJ, j > 2, denote the number of unfilled types in system j when system 1 first contains an item of each type. Foata et al. [2] and Foata and Zeilberger [ 1], using nonelementary mathematics, obtained recursive formulae and generating functions for E[UT] for the equally likely case, where pk = 1/m. In Section 2 we derive, using basic probability, the recursion and a closed-form expression for E[Uj] for the equally likely case. The general case is considered in Section 3 where an exact expression and bounds for E[Uj] are determined. Comments concerning computation, as well as a simulation approach, are also presented in Section 3. 2. The equally likely case Assume, in this section, that all Pk = 1/m. Furthermore, assume that the problem ends when system 1 has one item of each type, and let Ak denote the event that at least j type-k coupons have arrived. With 1(A) denoting the indicator variable for the event A, m U? = E[1 - 1(Ak)]. k=l Received 10 December 2002; revision received 20 January 2003. * Postal address: Department of Industrial Engineering and Operations Research, University of California, Berkeley, CA 94720, USA. ** Email address: smross@ieor.berkeley.edu Research supported by the National Science Foundation Grant ECS-0224779 with the University of California. 513
. ADLER ETAL E[Um]=>[1-P(A)] [1-P(Am)] (1) Let bj.i denote the event that at least j type-m coupons arrive before the first coupon of type i arrives. Then P(A) Bii and the inclusion-exclusion probability equality give(forj2 2) ∑P(B"n1…Bm Using(1), this gives the following result Next, using basic probability arguments, we obtain a recursive expression for E[U l that was first presented in [l] and [2]. Let C be the event that at least j type-k coupons have already arrived at the moment when each of the item types l,..., k-l has arrived. Also, let X be the number of types 1 ,., k- l that have not yet arrived when the first coupon of type k arrives With PK= P(Ck), we obtain that =∑P P/+ ∑P where pk =(k-1)/k for k=1,2 Substituting A=C for j2 2 into(D)gives E[UM]=m[1-Pil. j>2 Thus, using(2)and (3), we obtain that E[U2]=m
I. ADLER ETAL. Thus, m E[Um] = E[1- P(Ak)] k=l =m[l - P(Am)]. (1) Let Be. denote the event that at least j type-m coupons arrive before the first coupon of J,I type i arrives. Then m-1 P(A) = P( Bji) and the inclusion-exclusion probability equality give (for j > 2) m-1 :(,~)_C-i)k+l P(BTmi P(A7) = (-1)k+ L ... B ) E P(Bj,, * BJ,ik k=1 il 2, m (m) i-j Next, using basic probability arguments, we obtain a recursive expression for E[Uj ] that was first presented in [1] and [2]. Let Ck be the event that at least j type-k coupons have already arrived at the moment when each of the item types 1,..., k - 1 has arrived. Also, let Xk be the number of types 1, ..., k - 1 that have not yet arrived when the first coupon of type k arrives. With Pk = P(Ck), we obtain that k-I k = P (Ck Xk =r) r)(X r r=O k-I -= ? ' pr+l k / -1 r=O k -k E -i (2) r=l where Pk = (k - 1)/k fork = 1, 2,.... Substituting Am = Cm for j > 2 into (1) gives E[Um] = ml - Pm], j 2. (3) Thus, using (2) and (3), we obtain that m r m E[U=] =- m - r - r=l k r=1 k=l 514
515 d,forj≥3, EU/]=m-∑P EIUR-JJ U内+-1 We have thus proven the followin LUNi Remark 1. Equating the two expressions for elU I given by Propositions I and 2 yields explicit expression for the hyperharmonic number, which is defined in [2] by the recursive formula given in Proposition 2 3. The general case: Poissonization In the general case, we suppose that each item is of type k with probability Pk, 2kI Pk=1 To analyze this case, let us start by assuming that, rather than stopping when system l is filled, items continue coming forever. Suppose also that successive items arrive at times distributed according to a Poisson process with rate 1. Under this scenario, the arrival processes of the distinct types are independent Poisson processes, with respective rates Pk, k= 1 Because 1-P(a' denotes the probability that there have been less than j type-k arrivals when system I becomes full, we obtain upon conditioning on the arrival time of the jth item of type k 1-P(A4)= e e Pi)dx (-1)! The expected number of unfilled slots in system is now obtained from ∑Ⅱ-P(A),j≥2 5) k=1 The following lemma will be used to obtain bounds on E[UnI Lemma 1. For positive values xi, Ti=(l -e-xi) is a Schur concave function of y
The coupon-collector's problem revisited and, for j > 3, m E[U7] =m-E Pi- k=l -i = m-E(1 -E[UJ-k I _ k=l E[U_1] m E[O We have thus proven the following. Proposition 2. We have m E[U2 ] = k k=l and, for j > 3, m E[Uk_ ] E[Uj?] =j E k=l Remark 1. Equating the two expressions for E[UJ'] given by Propositions 1 and 2 yields an explicit expression for the hyperharmonic number, which is defined in [2] by the recursive formula given in Proposition 2. 3. The general case: Poissonization In the general case, we suppose that each item is of type k with probability Pk, Lkml Pk = 1. To analyze this case, let us start by assuming that, rather than stopping when system 1 is filled, items continue coming forever. Suppose also that successive items arrive at times distributed according to a Poisson process with rate 1. Under this scenario, the arrival processes of the distinct types are independent Poisson processes, with respective rates Pk, k = 1, ..., m. Because 1 - P(Ak) denotes the probability that there have been less than j type-k arrivals when system 1 becomes full, we obtain upon conditioning on the arrival time of the jth item of type k that 1 P(Ak 0I pkPkX (pkx)'j- 1- P(Aj) pke-PX ( ) n - -e-Pix)dx, j > 2. (4) The expected number of unfilled slots in system j is now obtained from m E[UJ7] = Z[1 - P(A5)], j > 2. (5) k=1 The following lemma will be used to obtain bounds on E[UJ]. Lemma 1. For positive values xi, f1=l (1 - e-xi) is a Schur concave function of y = (Y1, , Yr), where yi = ln(xi). 515
516 L. ADLER ET AL. Proof. With y =In(x) Because In(x)in increasing in x, by the Ostrowski condition for Schur concavity(see [31)it suffices to show that rie ( x2)>x2e-(1-e)ifx1<x2 But this inequality follows because xe /(1-e)is a decreasing function of x Lower and upper bounds on E[U], fairly tight for va (1/m, 1/m,., 1/m), can be obtained from the inequalities (-e-mkrm-I ≤4-en)≤(1-e8) where mk=mini*lpi)and gk=(+k Pi)(m-l. That is, &k is the geometric mean of the alues Pi for i#k. The second inequality of(6) follows from Lemma I We obtain from(4)and (6)that 1-P(A)≤/pe e-gkxm-l x (Pkx) Pk pk Pk where A= rgk Pk. Substituting the preceding inequality into(5)and considering both equalities of() gives EU"1≤∑ (-1) We will now derive a second set of lower and upper bounds for E[U]. Let b, denote the event that at least j coupons of type k arrive before the first of type i arrives. Then, using the 3.2.3 of [SD), we obtain that P(A)=P(∪b P(Bk k((pk+ pi)/(Pk+ pi+ pr))
I. ADLER ETAL. Proof With y = In(x), -(1 -e-X) = xe-X ay Because ln(x) in increasing in x, by the Ostrowski condition for Schur concavity (see [3]) it suffices to show that xle-I (1-e-X2) > x2e-X2(1 -e-X) if XI p(Bj i) (8) i1 + P(M i 1 + Er:i,k J,r lB,) jk _ 1yF ? (Pk/(Pk + Pi))j i~k - r+ ri,k((Pk + Pi)/(Pk + Pi + Pr))' 516
he coupon-collector's problem revisited 517 where(8)follows from the conditional expectation inequality and(9)from P(Bk. B P时,B)=P(B (pk/(pk+pi+pr)j (Pk/(Pk+Pi) pk+ pi Pk+ Pi+ pr Therefore. we obtain our second upper bound for E[1=∑11-P(A分小 E[U]≤ (Pk/(Pk +Pi)) +ik(Pk+ Pi//(Pk+Pi+pr) To obtain a lower bound, let X; denote the time of the first type-i event, and let denote the time of the j th type-k event in the Poissonization scheme(which results in T and Xifor i# k being independent). Then, from(4) P(4=E∏ Using the well-known result that elf(X)g(X)> elf(X)elg(X)] whenever f and g ar increasing functions [4, p. 339], which easily generalizes to the product of any number of positive increasing functions, the preceding equation yields that P(45≥E-cr =∏IP(T>X) ∏[-P(T 150). the explicit expression(but not the recursive one) is computationally unstable
The coupon-collector's problem revisited where (8) follows from the conditional expectation inequality and (9) from P(BkrBki) P(Bj k Bk ) = n' ,ir j,r ,i' P(Bkj P(Bj,) (Pk/(Pk + Pi + Pr))j (Pk/(Pk + Pi))' = ( Pk + Pi ) Pk + Pi + Pr J Therefore, we obtain our second upper bound for E[Uj ] = km1[1 - P(Ak)]: E[Ujm] E[f(X)]E[g(X)] whenever f and g are increasing functions [4, p. 339], which easily generalizes to the product of any number of positive increasing functions, the preceding equation yields that 1-P(Ak) > E[1-e PiT"] ifk = P(Tk > Xi) i#k = H[ - p(Tf 150), the explicit expression (but not the recursive one) is computationally unstable. 517
518 . ADLER ETAL (ii For very large m, simulation can be employed to efficiently estimate E[Um]. The following simulation approach estimates 1-P(A) )by a conditional expectation estimator that conditions on the arrival time of the jth item of type k; the estimator is then further improved by the use Generate random numbers U1,., U let LI=mn(∏=1U)andL2=mn(∏=1(-U1) The preceding should be repeated many times, with the estimator of E[Um being the average [1 FOATA, D AND ZEILBERGER, D(2002). The collector's brotherhood problem using the Newman-Shepp symbolic [3] MARSHALL, A. w. AND OLKIN, I (1979). inequalities: Theory of Majorization and Its Applications(Math. Sci Eng. 143). Academic Press, New York [4] Ross, S M.(1996). Stochastic Processes, 2nd edn. John Wiley, New York [5] ROSS, S M.(2002). Probability Models for Computer Science. Academic Press, New York
I. ADLER ETAL. (iii) For very large m, simulation can be employed to efficiently estimate E[UT]. The following simulation approach estimates 1 - P(Ak) by a conditional expectation estimator that conditions on the arrival time of the jth item of type k; the estimator is then further improved by the use of antithetic variables. * Generate random numbers Ul,..., Uj; * let L1 = ln(Fnl Ui) and L2 = ln(Ji(l - U)); * set 1 m - V = E n (l -ePiLI/Pk) + I (l -ePiL21/p) k=l -i#:k i:k The preceding should be repeated many times, with the estimator of E[U ] being the average of the values of V obtained. References [ 1] FOATA, D. AND ZEILBERGER, D. (2002). The collector's brotherhood problem using the Newman-Shepp symbolic method. To appear in Algebra Univ. [2] FOATA, D., Guo-Niu, H. AND LASS, B. (2001). Les nombres hyperharmonique et la fratrie du collectionneur de vignettes. Sem. Lothar. Combinatoire 47, B47a (electronic). [3] MARSHALL, A. W. AND OLKIN, I. (1979). Inequalities: Theory of Majorization and Its Applications (Math. Sci. Eng. 143). Academic Press, New York. [4] Ross, S. M. (1996). Stochastic Processes, 2nd edn. John Wiley, New York. [5] Ross, S. M. (2002). Probability Models for Computer Science. Academic Press, New York. 518