Coded Caching under Arbitrary Popularity Distributions Jinbei Zhang',Xiaojun Lin,Xinbing Wangt Dept.of Electronic Engineering,Shanghai Jiao Tong University,China School of Electrical and Computer Engineering,Purdue University,USA Email:abelchina @sjtu.edu.cn,linx @purdue.edu,xwang8@sjtu.edu.cn Abstract-Caching plays an important role in reducing the large (compared to N),this local caching gain will not differ backbone traffic when serving high-volume multimedia content. significantly from 1 (i.e.,the baseline with no-caching).Note Recently,a new class of coded caching schemes have received sig- that the broadcast capability of the system is not exploited here nificant interest because they can exploit coded multi-cast oppor- tunities to further reduce backbone traffic.Without considering because each user can request a different file.In contrast,with file popularity,prior works have characterized the fundamental the coded caching scheme in [7],the worst-case transmission performance limits of coded caching through a deterministic rate at the upstream link is reduced toK(1The worst-case analysis.However,when heterogeneous file popularity is taken into account,there remain open questions regarding the additional factorwhich is referred to as the global fundamental limits of coded caching performance.In this work, caching gain in [7,suggests a significant improvement over for an arbitrary popularity distribution,we first derive a new the uncoded case when the global storage capability KM of information-theoretical lower bound on the expected transmission all users is comparable to,or larger than,N.The key idea rate of any coded caching schemes.We then show that a simple of [7]is to transmit coded packets so that multiple users can coded-caching scheme attains an expected transmission rate that benefit from the same broadcast packet.Thus,the broadcast is at most a constant factor away from the lower bound.Unlike other existing studies,the constant factor that we derived is capability in the system can be exploited even if different users independent of the popularity distribution. request different files.[7]further shows in an information- theoretic sense that the worst-case transmission rate of the coded caching scheme in [7]is at most a constant factor I.INTRODUCTION (specifically,12 times)away from the minimum possible.In As the amount of Internet traffic continues to grow,video is this sense,the performance of the coded caching scheme of [7] expected to dominate 69%of the overall traffic [2],which will is close to the fundamental limit for the system studied.The greatly stress the underlying communication infrastructure. works in [8,14-16]extend this idea to decentralized caching, Historically,caching has played a significant role in reducing hierarchical networks,multiple group-cast,and online caching, the bandwidth requirement for serving video traffic.By placing respectively. contents closer to,or even at the end-users,the bandwidth The studies cited above all focus on the deterministic worst- requirement at the upstream links can be greatly reduced.Most case,i.e.,not only does each user request distinct files,the of such studies of caching have focused on the case where performance of the system is studied against the worst-case uncoded video packets were stored and transmitted (see,e.g., request pattern.Arguably,if the popularity of the files are 3-6]and references therein). identical,the probability of each request pattern will vary Recently,a new class of caching schemes,called coded less significantly.Then,the worst-case performance may not caching [7-16],have gained significant interest because it differ significantly from the average-case performance [9].In can significantly reduce the upstream bandwidth requirement reality,however,the file popularity can differ significantly,and in systems with broadcast/multicast capabilities.Consider K thus some request patterns will occur much more frequently users requesting contents from one server through a shared than other request patterns.As a result,the average-case communication link with broadcast capability.Each user may performance can differ significantly from the worst-case bound request any one of the N files(N>K),but each user only has (see also the discussions at the end of Section I). a storage with size M<N.In the worst case,each user may request a distinct file.With conventional (uncoded)caching While the average-case performance of coded caching un- scheme,it is easy to see that the worst-case transmission rate der heterogeneous file popularity was studied in [9-13].the on the upstream link must be K(1-4).because each user optimality bounds obtained are substantially weaker than the can only cache fraction of all the contents.[7]refers to results in [7]because the gap between the achievable bound this factor (1-)as the local caching gain.Unless M is and the lower bound depends on various system parameters Specifically,in [9],contents are divided into groups with An earlier version of this paper has appeared in Information Theory and similar popularity.Each group is assigned a separate portion Applications Workshop,UCSD,Feb.2015 [1]. of the cache and uses the coded caching scheme of [8.The Copyright (c)2017 IEEE.Personal use of this material is permitted. However,permission to use this material for any other purposes must be gap between the corresponding transmission rate and the lower obtained from the IEEE by sending a request to pubs-permissions@ieee.org bound is found to increase with the total number of groups
1 Coded Caching under Arbitrary Popularity Distributions Jinbei Zhang† , Xiaojun Lin‡ , Xinbing Wang† †Dept. of Electronic Engineering, Shanghai Jiao Tong University, China ‡School of Electrical and Computer Engineering, Purdue University, USA Email: abelchina@sjtu.edu.cn, linx@purdue.edu, xwang8@sjtu.edu.cn Abstract—Caching plays an important role in reducing the backbone traffic when serving high-volume multimedia content. Recently, a new class of coded caching schemes have received significant interest because they can exploit coded multi-cast opportunities to further reduce backbone traffic. Without considering file popularity, prior works have characterized the fundamental performance limits of coded caching through a deterministic worst-case analysis. However, when heterogeneous file popularity is taken into account, there remain open questions regarding the fundamental limits of coded caching performance. In this work, for an arbitrary popularity distribution, we first derive a new information-theoretical lower bound on the expected transmission rate of any coded caching schemes. We then show that a simple coded-caching scheme attains an expected transmission rate that is at most a constant factor away from the lower bound. Unlike other existing studies, the constant factor that we derived is independent of the popularity distribution. I. INTRODUCTION As the amount of Internet traffic continues to grow, video is expected to dominate 69% of the overall traffic [2], which will greatly stress the underlying communication infrastructure. Historically, caching has played a significant role in reducing the bandwidth requirement for serving video traffic. By placing contents closer to, or even at the end-users, the bandwidth requirement at the upstream links can be greatly reduced. Most of such studies of caching have focused on the case where uncoded video packets were stored and transmitted (see, e.g., [3–6] and references therein). Recently, a new class of caching schemes, called coded caching [7–16], have gained significant interest because it can significantly reduce the upstream bandwidth requirement in systems with broadcast/multicast capabilities. Consider K users requesting contents from one server through a shared communication link with broadcast capability. Each user may request any one of the N files (N > K), but each user only has a storage with size M < N. In the worst case, each user may request a distinct file. With conventional (uncoded) caching scheme, it is easy to see that the worst-case transmission rate on the upstream link must be K(1 − M N ), because each user can only cache M N fraction of all the contents. [7] refers to this factor (1 − M N ) as the local caching gain. Unless M is An earlier version of this paper has appeared in Information Theory and Applications Workshop, UCSD, Feb. 2015 [1]. Copyright (c) 2017 IEEE. Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions@ieee.org. large (compared to N), this local caching gain will not differ significantly from 1 (i.e., the baseline with no-caching). Note that the broadcast capability of the system is not exploited here because each user can request a different file. In contrast, with the coded caching scheme in [7], the worst-case transmission rate at the upstream link is reduced to K(1− M N ) 1 1+KM/N . The additional factor 1 1+KM/N , which is referred to as the global caching gain in [7], suggests a significant improvement over the uncoded case when the global storage capability KM of all users is comparable to, or larger than, N. The key idea of [7] is to transmit coded packets so that multiple users can benefit from the same broadcast packet. Thus, the broadcast capability in the system can be exploited even if different users request different files. [7] further shows in an informationtheoretic sense that the worst-case transmission rate of the coded caching scheme in [7] is at most a constant factor (specifically, 12 times) away from the minimum possible. In this sense, the performance of the coded caching scheme of [7] is close to the fundamental limit for the system studied. The works in [8, 14–16] extend this idea to decentralized caching, hierarchical networks, multiple group-cast, and online caching, respectively. The studies cited above all focus on the deterministic worstcase, i.e., not only does each user request distinct files, the performance of the system is studied against the worst-case request pattern. Arguably, if the popularity of the files are identical, the probability of each request pattern will vary less significantly. Then, the worst-case performance may not differ significantly from the average-case performance [9]. In reality, however, the file popularity can differ significantly, and thus some request patterns will occur much more frequently than other request patterns. As a result, the average-case performance can differ significantly from the worst-case bound (see also the discussions at the end of Section II). While the average-case performance of coded caching under heterogeneous file popularity was studied in [9–13], the optimality bounds obtained are substantially weaker than the results in [7] because the gap between the achievable bound and the lower bound depends on various system parameters. Specifically, in [9], contents are divided into groups with similar popularity. Each group is assigned a separate portion of the cache and uses the coded caching scheme of [8]. The gap between the corresponding transmission rate and the lower bound is found to increase with the total number of groups
3 [10]studies a popularity model that can be viewed as an popularity distributions(see further discussions in Section IID). intermediate between worst-case analysis and average-case We establish this lower bound by a series of reduction and analysis.It models non-uniform popularity by assuming that merging steps that convert the original system with hetero- the file popularity has L different levels.Both the number of geneous popularity to other systems with uniform popularity. files at each level and the number of users requesting files Using these techniques,we are able to quantify the impact at each level are fixed.The authors then study the deter- of both "popular"files and "non-popular"files,which we ministic worst-case performance under such a setting.When believe is the main reason that we can obtain sharp constant- the number of users is large,this model can be seen as an factor characterizations even in the non-asymptotic settings. approximation of a stochastic-demand model.The theoretical (See further discussions at the end of Section IV.D.)These gap between the achievable transmission rate and the lower techniques may be of independent interest for future studies bound established in [10]increases as L3.The work in [13] of coded caching performance.Third,while our achievable is most related to ours.In [13],the authors propose a large scheme is similar to RLFU proposed [13](in the sense that class of achievable schemes,called RAP-GCC and RAP- for most parameter settings we divide the files into 2 groups, CIC,which place files in caches according to a popularity- correspondingly to popular and unpopular files,respectively), dependent caching distribution.However,because the optimal there are parameter settings where we need to divide the files caching distribution for RAP-GCC and RAP-CIC may not into 3 groups.This new modification turns out to be important have analytically tractable solutions in general,[13]then for attaining the constant-factor performance gap under all focuses on a sub-class of RAP-GCC,called RLFU (Random popularity distributions and non-asymptotic settings.Further, Least-Frequently-Used caching),to study the performance gap our division between popular and unpopular files uses a simple between the lower bound and achievable bound of the required threshold Ni.Specifically,suppose that the number of users is transmission rate.RLFU uses a clever and surprisingly simple K and the size of each user's storage is M.Then,the threshold caching distribution that divides the contents into 2 groups. Ni corresponds to the least-popular file among those whose Although both the lower bound and the achievable bound (by popularity is no smaller thanwhere M-max(M,3). RLFU)are given in [13]for arbitrary popularity distributions, Compared with the choice of the threshold (denoted by m) the gap between the achievable bound and the lower bound in [13 (either chosen as a function of the Zipf distribution is shown to be a constant only when the file popularity in the theoretical analysis,or obtained via a 1-dimensional follows a Zipf distribution.Note that the gaps estimated by optimization over an achievable bound).our choice of N the theoretical results of [13]depend on the parameters of the is in a simple closed-form that works for arbitrary popular Zipf distribution,and may also become large for certain ranges distributions (see further discussions in Section II).Finally, of the parameter values [13].Further,the constant factors are note that the RLFU scheme in [13]with the optimal threshold only shown in the asymptotic limit when the number of files m should in general attain a better performance than the and/or the number of users are large.Therefore,it remains an simpler choice of Ni in this paper.Thus,it implies that RLFU important open question what is the fundamental limit of the combined with appropriately dividing the files into 3 groups performance of coded caching for the more practical scenario under certain settings also attains an average transmission rate of heterogeneous file popularity,and whether one can find a that is away from the optimal by at most a constant factor, coded-caching scheme whose performance gap from the lower independently of the popularity distribution.(This immediately bound is independent of the popularity distribution even in the follows that the best scheme in the larger class of RAP-GCC non-asymptotic settings. will also attain a constant-factor gap.)However,the RLFU In this paper,we make the following contributions to answer scheme itself without the 3-group modification does not have the above open question.First,we show that a simple coded- this constant-factor performance guarantees under arbitrary caching scheme(which is a special case of RAP-GCC in [13] popularity distribution (see Section IID). and is also similar to RLFU)can attain an average transmission The remainder of this paper is as follows.We first present rate that is at most 55 times from the optimal.Although this the network model in Section II.Main results are summarized factor appears to be large,it is the first result in the literature in Section III.Followed is the analysis on the information with a constant-factor gap that is independent of the popularity theoretical lower bound in Section IV.The achievable scheme distributions and the system size.In contrast,as we discussed is analyzed in Section V.The gap between the lower bound earlier,the performance gap in prior studies could be arbitrar- and the achievable rate is derived in Section VI.Simulation ily large depending on either the number of groups [9],the results are presented in Section VII.Then,we conclude. number of levels [10],or the parameter of the Zipf distribution [11-13].Second,a key step towards this result is to use a new II.NETWORK MODEL construction in Section IV to establish a sharper lower bound In the following,we present the network model for a (in the order sense)on the achievable transmission rate of any video delivery system with both local caches and broadcast schemes.Specifically,our lower bound may be higher than the capabilities (see Figure 1). lower bound in [13]by an arbitrarily large factor under certain We assume that there are N distinct files from the setF= (F,F2,...,FN.The popularity of the file Fi is pi,where I Note that the conference version of this paper [1]not only shows a larger multiplicative factor,but also requires a small additive factor.The results in 1.Without loss of generality,we assume that the this paper use a more-refined analysis that eliminates the need for the additive file size is of unit length and the file popularity is decreasing factor. in the index,i.e.,P:≥Pi if i≤j
2 [10] studies a popularity model that can be viewed as an intermediate between worst-case analysis and average-case analysis. It models non-uniform popularity by assuming that the file popularity has L different levels. Both the number of files at each level and the number of users requesting files at each level are fixed. The authors then study the deterministic worst-case performance under such a setting. When the number of users is large, this model can be seen as an approximation of a stochastic-demand model. The theoretical gap between the achievable transmission rate and the lower bound established in [10] increases as L 3 . The work in [13] is most related to ours. In [13], the authors propose a large class of achievable schemes, called RAP-GCC and RAPCIC, which place files in caches according to a popularitydependent caching distribution. However, because the optimal caching distribution for RAP-GCC and RAP-CIC may not have analytically tractable solutions in general, [13] then focuses on a sub-class of RAP-GCC, called RLFU (Random Least-Frequently-Used caching), to study the performance gap between the lower bound and achievable bound of the required transmission rate. RLFU uses a clever and surprisingly simple caching distribution that divides the contents into 2 groups. Although both the lower bound and the achievable bound (by RLFU) are given in [13] for arbitrary popularity distributions, the gap between the achievable bound and the lower bound is shown to be a constant only when the file popularity follows a Zipf distribution. Note that the gaps estimated by the theoretical results of [13] depend on the parameters of the Zipf distribution, and may also become large for certain ranges of the parameter values [13]. Further, the constant factors are only shown in the asymptotic limit when the number of files and/or the number of users are large. Therefore, it remains an important open question what is the fundamental limit of the performance of coded caching for the more practical scenario of heterogeneous file popularity, and whether one can find a coded-caching scheme whose performance gap from the lower bound is independent of the popularity distribution even in the non-asymptotic settings. In this paper, we make the following contributions to answer the above open question. First, we show that a simple codedcaching scheme (which is a special case of RAP-GCC in [13] and is also similar to RLFU) can attain an average transmission rate that is at most 55 times from the optimal1 . Although this factor appears to be large, it is the first result in the literature with a constant-factor gap that is independent of the popularity distributions and the system size. In contrast, as we discussed earlier, the performance gap in prior studies could be arbitrarily large depending on either the number of groups [9], the number of levels [10], or the parameter of the Zipf distribution [11–13]. Second, a key step towards this result is to use a new construction in Section IV to establish a sharper lower bound (in the order sense) on the achievable transmission rate of any schemes. Specifically, our lower bound may be higher than the lower bound in [13] by an arbitrarily large factor under certain 1 Note that the conference version of this paper [1] not only shows a larger multiplicative factor, but also requires a small additive factor. The results in this paper use a more-refined analysis that eliminates the need for the additive factor. popularity distributions (see further discussions in Section III). We establish this lower bound by a series of reduction and merging steps that convert the original system with heterogeneous popularity to other systems with uniform popularity. Using these techniques, we are able to quantify the impact of both “popular” files and “non-popular” files, which we believe is the main reason that we can obtain sharp constantfactor characterizations even in the non-asymptotic settings. (See further discussions at the end of Section IV.D.) These techniques may be of independent interest for future studies of coded caching performance. Third, while our achievable scheme is similar to RLFU proposed [13] (in the sense that for most parameter settings we divide the files into 2 groups, correspondingly to popular and unpopular files, respectively), there are parameter settings where we need to divide the files into 3 groups. This new modification turns out to be important for attaining the constant-factor performance gap under all popularity distributions and non-asymptotic settings. Further, our division between popular and unpopular files uses a simple threshold N1. Specifically, suppose that the number of users is K and the size of each user’s storage is M. Then, the threshold N1 corresponds to the least-popular file among those whose popularity is no smaller than 1 KMr , where Mr = max(M, 3). Compared with the choice of the threshold (denoted by m˜ ) in [13] (either chosen as a function of the Zipf distribution in the theoretical analysis, or obtained via a 1-dimensional optimization over an achievable bound), our choice of N1 is in a simple closed-form that works for arbitrary popular distributions (see further discussions in Section III). Finally, note that the RLFU scheme in [13] with the optimal threshold m˜ should in general attain a better performance than the simpler choice of N1 in this paper. Thus, it implies that RLFU combined with appropriately dividing the files into 3 groups under certain settings also attains an average transmission rate that is away from the optimal by at most a constant factor, independently of the popularity distribution. (This immediately follows that the best scheme in the larger class of RAP-GCC will also attain a constant-factor gap.) However, the RLFU scheme itself without the 3-group modification does not have this constant-factor performance guarantees under arbitrary popularity distribution (see Section III). The remainder of this paper is as follows. We first present the network model in Section II. Main results are summarized in Section III. Followed is the analysis on the information theoretical lower bound in Section IV. The achievable scheme is analyzed in Section V. The gap between the lower bound and the achievable rate is derived in Section VI. Simulation results are presented in Section VII. Then, we conclude. II. NETWORK MODEL In the following, we present the network model for a video delivery system with both local caches and broadcast capabilities (see Figure 1). We assume that there are N distinct files from the set F = {F1, F2, ..., FN }. The popularity of the file Fi is pi P , where N i=1 pi = 1. Without loss of generality, we assume that the file size is of unit length and the file popularity is decreasing in the index, i.e., pi ≥ pj if i ≤ j
and transmission scheme 3,let r(K,Wi)denote the amount File N Popularity of broadcast transmission from the server that is needed to Server satisfy a request Wi.The expected rate under scheme 3 is therefore defined as ①File File Placement ② ③Transmission NK R(K,F,P)=∑r(K,W)Pw) (1) 1234 ②File N File i=1 Request ②File We wish to find the schedule that minimizes Request Rs(K,F,P).Define the optimal rate as User 1 User User K R(K,F,P)=min Ra(K,F,P) (2) Fig.1:An illustration of the network model. Unfortunately,finding the exact optimal schedule that achieves this optimal rate is very difficult [7-16].Like [7-16].our goal is to find a simple scheme whose achievable rate is as close There is one server who has all N files and who serves to the optimal rate R(K,F,P)as possible. these files to K users interested in these files.Each user has Remark:In [7],instead of studying the expected rate (1), a local cache with size M (again measured with respect to the authors focus on the worst-case rate,i.e., the unit-length of the files).The K users are connected to the server through a network with broadcast capability,i.e.,each maxrs(K,Wi) (3) Wi transmission from the server can be received by all users. Let be the optimal scheme that attains the minimum value Before users request any files,some of the contents are placed in the users'caches.This is called cache placement of (3),and let 3 be the scheme proposed in [7].Then [7] shows that and in practice is usually carried out during off-peak hours of maxw,r3(K,Wi) the network.Then,at each time,a user k will request file Fi maxw?(K,W≤12. (4) with probability pi,independently of all other users and files. If the user's local cache already has (some of)the content. However,in this paper since we are interested in the expected the request can be served locally.Otherwise,the server must rate given in (2),we would be interested in the gap transmit (via broadcast)contents not available from the local ΣrK.WPw cache.The goal is that every user should be able to reconstruct (5) the file that it requests with the information received from the ∑1r(K,W)P(WwW server and the cached content in its local cache. where*is the optimal scheme that attains the minimum value of)P(W).Note that the bound in (4 A.Definition of the Expected Transmission Rate does not imply that the expression in (5)is bounded by the same constant,especially when the probability P(Wi)varies In this subsection,we will define the expected rate needed significantly.In general,even if the bound in (4)holds,the from the server in serving the requests.Note that we do not expression in(5)can still be arbitrarily large.Thus,quantifying consider the transmission rate for cache placement. the performance gap in terms of the expected rate represents Let Wi [fi,fi2,...,fik}denote a request pattern. a new research problem where fij F is the requested file for the j-th user, 1NP4 the corresponding popularity distribution P. 2,+ Further,for any integer Obviously,given a set of files F and the files'corresponding popularity distribution P,there exists numerous caching and Nr:between 1 and N-1,let Ny(Nz)= >N Pi 2PN transmission schemes to meet users'request.For a caching Then,we have the following
3 File Placement User 1 File 1 ... File N User 2 ... User K File Request File Request File Transmission 1 2 3 2 2 2 Server Ͳ Ͳ Ͳ Ͳ Ͳ Ͳ File Popularity ... 1 2 3 ... 4 N Fig. 1: An illustration of the network model. There is one server who has all N files and who serves these files to K users interested in these files. Each user has a local cache with size M (again measured with respect to the unit-length of the files). The K users are connected to the server through a network with broadcast capability, i.e., each transmission from the server can be received by all users. Before users request any files, some of the contents are placed in the users’ caches. This is called cache placement and in practice is usually carried out during off-peak hours of the network. Then, at each time, a user k will request file Fi with probability pi , independently of all other users and files. If the user’s local cache already has (some of) the content, the request can be served locally. Otherwise, the server must transmit (via broadcast) contents not available from the local cache. The goal is that every user should be able to reconstruct the file that it requests with the information received from the server and the cached content in its local cache. A. Definition of the Expected Transmission Rate In this subsection, we will define the expected rate needed from the server in serving the requests. Note that we do not consider the transmission rate for cache placement. Let Wi = {fi1, fi2, ..., fiK} denote a request pattern, where fij ∈ F is the requested file for the j-th user, 1 ≤ j ≤ K. Note that there are NK such patterns. Let W be the set of all possible request patterns from K users, i.e., W = {W1, W2, ..., WNK }. Since each user can request one file from N files independently, the probability for event Wi is given by P(Wi) = Y K j=1 P(fij ) where P(fij ) is the probability for a user to request file fij . Note that in our original system, the probability for a user to request file Fj is pj . However, later in the analysis we will compare to another system with a different popularity distribution P. Hence, we use the notation W(K, F,P) to denote the set W of possible request patterns associated with the corresponding popularity distribution P. Obviously, given a set of files F and the files’ corresponding popularity distribution P, there exists numerous caching and transmission schemes to meet users’ request. For a caching and transmission scheme F, let rF(K, Wi) denote the amount of broadcast transmission from the server that is needed to satisfy a request Wi . The expected rate under scheme F is therefore defined as RF(K, F,P) = X NK i=1 rF(K, Wi)P(Wi). (1) We wish to find the schedule F that minimizes RF(K, F,P). Define the optimal rate as R(K, F,P) = min F RF(K, F,P). (2) Unfortunately, finding the exact optimal schedule that achieves this optimal rate is very difficult [7–16]. Like [7–16], our goal is to find a simple scheme F whose achievable rate is as close to the optimal rate R(K, F,P) as possible. Remark: In [7], instead of studying the expected rate (1), the authors focus on the worst-case rate, i.e., max Wi rF(K, Wi). (3) Let F ′ be the optimal scheme that attains the minimum value of (3), and let F be the scheme proposed in [7]. Then [7] shows that maxWi rF(K, Wi) maxW′ i rF′ (K, W′ i ) ≤ 12. (4) However, in this paper since we are interested in the expected rate given in (2), we would be interested in the gap PNK i=1 rF(K, Wi)P(Wi) PNK i=1 rF∗ (K, W′ i )P(W′ i ) , (5) where F ∗ is the optimal scheme that attains the minimum value of PNK i=1 rF(K, W′ i )P(W′ i ). Note that the bound in (4) does not imply that the expression in (5) is bounded by the same constant, especially when the probability P(Wi) varies significantly. In general, even if the bound in (4) holds, the expression in (5) can still be arbitrarily large. Thus, quantifying the performance gap in terms of the expected rate represents a new research problem. III. MAIN RESULTS In this section, we provide an overview of our main results. Given an arbitrary popularity distribution, our first result establishes a fundamental lower bound on the expected transmission rate for any coded caching scheme. Let [x]+ denote max(0, x). Recall that the files are indexed with nonincreasing popularity. Without loss of generality, we assume pN+1 = 0 (which ensures the existence of N1 defined next even when pN ≥ 1 KMr ). Let Mr = max(M, 3), and let N1 be the integer that satisfies pN1 ≥ 1 KMr and pN1+1 N1 pi 2/KMr + 1 2 . Further, for any integer Nx between 1 and N − 1, let Ny(Nx) = P i>Nx pi 2pNx + 1 2 . Then, we have the following
¥ Theorem /With K users requesting files independently in due to these Ni popular files is1-1) F according to the corresponding popularity distribution P. (assuming M >3).Thus,this lower bound is always less the lower bound on the expected transmission rate is given by than Note that using N in the place of N will not R(K,F,P)2i立max 1 (N1+N2-M) get a lower bound higher than either.To see this,if IM we do not consider the term N(N),the lower bound in the (6) max KPN2IN:+Nu(N:)-M] second term of Theorem 1 gives HKpN (N-M).Since Nz>N1+1 KpN N this lower bound can not be larger than The new lower bound in Theorem I is one of the main either.On the other hand.the lower bound given by contributions of the paper.It is sharper than those reported in Theorem 1 can be arbitrarily larger once the contribution for the literature [9,11-13]in the order sense.Specifically,we unpopular files is accounted.For example,when Ni NKPi=K-K∑sN,Kn≈K-Kg certain popularity distributions.Thus,this sharper bound is the When log N≥log KM,we would have≈O(K).In main reason behind the improved performance characterization other words,accounting for the impact of unpopular files can reported in this paper.We acknowledge that there will be increase the lower bound by a factor as large as O(log N).Due settings where our lower bound can be lower than that of [13]. to this reason,it is critical to use the improved lower bound However,even when this happens,the difference is at most a in Theorem 1 in order to prove constant-factor performance constant multiplicative factor since Theorem 2 establishes an gaps under arbitrary popularity distributions. achievable upper bound that is at most a constant factor away We next present an achievable scheme that can attain from our lower bound.Here.the index N in the first term of a corresponding upper bound.Our achievable scheme is a (6)plays an important role in most of the results in this paper. special case of RAP-GCC in [13]and is also similar to RLFU. Recall again that the popularity pi is non-increasing in the file Recall that each file is of unit length.In order to allow a index i.Roughly speaking,N is the index for the file whose portion of each file to be cached,we refer to a minimally popularity is aroundK.We may view all files iNi as the "unpopular" file has F "bits".We are most interested in the case of files.As readers will see in the proofs of Theorem 1 in Section large files,i.e.,when the bits are very small compared to the IV,we will first show that the expression NM+is a file size,and hence F+o0.Our achievable scheme is lower bound on the expected transmission rate for serving the similar to RLFU in most settings (i.e.,when N1>M and more popular files.Existing results in [9,11-13]use different M>3),but different in other settings (i.e.,when N3 and the R(K,F,P)≤min +〉Kp: number of files N is large.Then,using the approximation max1,M0)N, (7) that∑l}≈logV,it is easy to see that p≈ogN,and thus the threshold N satisfies NThe lower bound KpNa(N-M)+∑Kp) >N3
4 Theorem 1: With K users requesting files independently in F according to the corresponding popularity distribution P, the lower bound on the expected transmission rate is given by R(K, F,P) ≥ 1 11 max n 1 Mr (N1 + N2 − M), max Nx≥N1+1 KpNx [Nx + Ny(Nx) − M] o . (6) The new lower bound in Theorem 1 is one of the main contributions of the paper. It is sharper than those reported in the literature [9, 11–13] in the order sense. Specifically, we will show shortly that our lower bound can be higher than the lower bound in [13] by an arbitrarily large factor under certain popularity distributions. Thus, this sharper bound is the main reason behind the improved performance characterization reported in this paper. We acknowledge that there will be settings where our lower bound can be lower than that of [13]. However, even when this happens, the difference is at most a constant multiplicative factor since Theorem 2 establishes an achievable upper bound that is at most a constant factor away from our lower bound. Here, the index N1 in the first term of (6) plays an important role in most of the results in this paper. Recall again that the popularity pi is non-increasing in the file index i. Roughly speaking, N1 is the index for the file whose popularity is around 1 KMr . We may view all files i ≤ N1 as the “more popular” files, and all files i > N1 as the “unpopular” files. As readers will see in the proofs of Theorem 1 in Section IV, we will first show that the expression 1 11Mr [N1−M]+ is a lower bound on the expected transmission rate for serving the more popular files. Existing results in [9, 11–13] use different variations of this expression as the lower bound. However, since this expression neglects the contribution of the unpopular files, it results into poorer performance characterization. In contrast, we show that the contribution of the unpopular files can be accounted for by having N2 in the first term of (6). As readers will see in Section IV.D, here we introduce a novel “merging” procedure that merges multiple unpopular files into one file with the sum of the popularities, so that the new popularity is greater than or equal to 1/(KMr). In this way, N2 can be interpreted (approximately) as a lower bound on the number of such “merged” files. Thus, N1 and N2 combined produce the lower bound in the first term of Theorem 1. However, the first term of (6) would be trivial if N1 + N2 ≤ M. In that case, we will need to increase the threshold index N1 to a larger value Nx, in order to obtain a sharper lower bound. The second term in (6) takes care of this case. Details of the proof will be presented in Section IV. Difference from [13]: As we discussed above, a key difference between our lower bound and the prior results is that our lower bound accounts for the contribution of the unpopular files. As one can see from the example below, accounting for the contribution of the unpopular files to the lower bound is critical, because otherwise the lower bound may be looser by an arbitrarily factor. Take Zipf distribution with α = 1 as an example. Suppose that M ≥ 3 and the number of files N is large. Then, using the approximation that PN i=1 1 i ≈ log N, it is easy to see that pi ≈ 1 i log N , and thus the threshold N1 satisfies N1 ≈ KM log N . The lower bound due to these N1 popular files is 1 11 ( N1 M − 1) ≈ 1 11 ( K log N − 1) (assuming M ≥ 3). Thus, this lower bound is always less than 1 11 K log N . Note that using Nx in the place of N1 will not get a lower bound higher than 1 11 K log N either. To see this, if we do not consider the term Ny(Nx), the lower bound in the second term of Theorem 1 gives 1 11KpNx (Nx − M). Since KpNxNx ≈ K log N , this lower bound can not be larger than 1 11 K log N either. On the other hand, the lower bound given by Theorem 1 can be arbitrarily larger once the contribution for unpopular files is accounted. For example, when N1 ≤ KM, the impact from unpopular files, N2 Mr , can be approximated as P i>N1 Kpi = K − K P i≤N1 Kpi ≈ K − K log N1 log N . When log N ≫ log KM, we would have N2 Mr ≈ O(K). In other words, accounting for the impact of unpopular files can increase the lower bound by a factor as large as O(log N). Due to this reason, it is critical to use the improved lower bound in Theorem 1 in order to prove constant-factor performance gaps under arbitrary popularity distributions. We next present an achievable scheme that can attain a corresponding upper bound. Our achievable scheme is a special case of RAP-GCC in [13] and is also similar to RLFU. Recall that each file is of unit length. In order to allow a portion of each file to be cached, we refer to a minimally divisible portion of a file as a “bit”, and assume that each file has |F| “bits”. We are most interested in the case of large files, i.e., when the bits are very small compared to the file size, and hence |F| → +∞. Our achievable scheme is similar to RLFU in most settings (i.e., when N1 ≥ M and M ≥ 3), but different in other settings (i.e., when N1 N1 Kpi , KpN3 (N3 − M) + X i>N3 Kpi , (7)
TABLE I:Comparisons of our work and [13] Our work 13] Performance gap (between achievable achievable schemes and lower bounds)is derived on- System Performance gap (between schemes and lower bounds)is derived & ly for Zipf distribution and for asymptotic settings Performance for arbitrary distribution and system size. settings when the system size is large.Gap Constant-factor gap independent of the depends on the parameter a of the Zipf ga即 popularity distribution and system size. distribution,and may approach infinity as a approaches 1. Lower bound only accounts for the l most- Lower bound accounts for the contribution popular files.and thus may be arbitrarily Lower bound of both popular files and unpopular files. lower than our lower bound for certain popularity distributions. The achievable scheme divides all files into either 2 groups or 3 groups depend- The achievable bound is analyzed based ing on system settings.Using a simple on RLFU,which divides the files into 2 Achievable bound expression for N,this scheme is shown groups.As a result,RLFU may incur an to be constant-factor away from the lower arbitrarily large gap from the lower bound bound under arbitrary popularity distribu- under certain popularity distributions. tion. where N3 M]+1. serve file N3 in an uncoded manner when any users request .In the first term of (7),when N1>M and M 3,it.On the other hand,if RLFU caches all N3 =M+1 N1-M+ ax, INis an upper bound on the expected files,the average transmission rate can be lower bounded by transmission rate to serve the more popular files (i.e.,with R1,which corresponds to the rate index i≤Ni),and∑i>M,Kpn:is an upper bound on when there is only one file being requested.In contrast,by the expected transmission rate to serve the unpopular files.caching all files 1 to M]and partially caching file N3,our However,the expression Kp:may become rate is R3 KpNa (M+1-M).We note that,depending too loose in certain cases (e.g,when NM and distribution and non-asymptotic case in this paper,capturing M≥3). this subtle difference is critical for the overall results. Difference from(13/:Note that the above modification that From Theorem 1 and Theorem 2,we will show in Section divides the files into 3 groups in some settings turns out to be VI that the gap between the lower bound Ri and the upper crucial for attaining the constant-factor performance gap under bound Rup is bounded by a multiplicative constant. arbitrary popularity distributions,especially when the optimal Corollary I:With K users independently requesting files in transmission rate is small.The difference from RLFU occurs Faccording to the popularity distribution P,as F+, when N<M and when M is not an integer,i.e.,when the the lower bound R and the upper bound Rp can be bounded cache can hold all popular files and the cache size is not a by multiple of the file size.In this case,our optimal scheme is Rup≤55Rb (8) to cache all files1toLM」,cache M-LM」portion of file M+1,and not to cache all other files.To compare the Thus,the bounds differ at most by a multiplicative factor performance difference with RLFU:assuming that there are of 55.As we discuss in the introduction,although this factor M]+1 files.Let N3=M]+1.If RLFU caches at most may appear to be large,it is the first result in the literature M]files,the average transmission rate is R1=e(KPNa) with a constant-factor gap that is independent of the popularity when KpN=O(1).which is simply the rate for RLFU to distributions.In contrast,the gap (between upper-and lower-
5 TABLE I: Comparisons of our work and [13] Our work [13] System settings & Performance gap Performance gap (between achievable schemes and lower bounds) is derived for arbitrary distribution and system size. Constant-factor gap independent of the popularity distribution and system size. Performance gap (between achievable schemes and lower bounds) is derived only for Zipf distribution and for asymptotic settings when the system size is large. Gap depends on the parameter α of the Zipf distribution, and may approach infinity as α approaches 1. Lower bound Lower bound accounts for the contribution of both popular files and unpopular files. Lower bound only accounts for the l mostpopular files, and thus may be arbitrarily lower than our lower bound for certain popularity distributions. Achievable bound The achievable scheme divides all files into either 2 groups or 3 groups depending on system settings. Using a simple expression for N1, this scheme is shown to be constant-factor away from the lower bound under arbitrary popularity distribution. The achievable bound is analyzed based on RLFU, which divides the files into 2 groups. As a result, RLFU may incur an arbitrarily large gap from the lower bound under certain popularity distributions. where N3 = ⌊M⌋ + 1. In the first term of (7), when N1 ≥ M and M ≥ 3, [N1−M]+ max(1,M) = [N1−M]+ M is an upper bound on the expected transmission rate to serve the more popular files (i.e., with index i ≤ N1), and P i>N1 Kpi is an upper bound on the expected transmission rate to serve the unpopular files. However, the expression [N1−M]+ M + P i>N1 Kpi may become too loose in certain cases (e.g, when N1 N1 Kpi by roughly KpN1 . Thus, by setting pN1 ≈ 1 KM , this index N1 is chosen such that the net effect to the first term of upper bound (7) is approximately zero, and thus the first term in (7) is approximately minimized. This property was the main intuition behind our choice of N1. However, the analysis in the paper will account for all scenarios (not just N1 ≥ M and M ≥ 3). Difference from [13]: Note that the above modification that divides the files into 3 groups in some settings turns out to be crucial for attaining the constant-factor performance gap under arbitrary popularity distributions, especially when the optimal transmission rate is small. The difference from RLFU occurs when N1 < M and when M is not an integer, i.e., when the cache can hold all popular files and the cache size is not a multiple of the file size. In this case, our optimal scheme is to cache all files 1 to ⌊M⌋, cache M − ⌊M⌋ portion of file ⌊M⌋ + 1, and not to cache all other files. To compare the performance difference with RLFU: assuming that there are ⌊M⌋ + 1 files. Let N3 = ⌊M⌋ + 1. If RLFU caches at most ⌊M⌋ files, the average transmission rate is R1 = Θ(KpN3 ) when KpN3 = O(1), which is simply the rate for RLFU to serve file N3 in an uncoded manner when any users request it. On the other hand, if RLFU caches all N3 = ⌊M⌋ + 1 files, the average transmission rate can be lower bounded by R2 = 1 − M N3 = ⌊M⌋+1−M ⌊M⌋+1 , which corresponds to the rate when there is only one file being requested. In contrast, by caching all files 1 to ⌊M⌋ and partially caching file N3, our rate is R3 = KpN3 (⌊M⌋ + 1 − M). We note that, depending on the system setting, R3 can be an arbitrarily small fraction of min{R1, R2}. For example, for any integer H, by letting N = H, M = H− 1 H , and pi = 1 H−1− 1 KH2(H−1) (i=1,...,H− 1), pH = 1 KH2 , we have R1 = Θ( 1 H2 ), R2 = Θ( 1 H2 ) and R3 = Θ( 1 H3 ). Thus, as H goes to infinity, the ratio R3 min{R1,R2} goes to zero. Since our achievable scheme is order optimal, it implies that directly using RLFU is not order-optimal for this case. Hence, this part of dividing files into 3 groups (one group is cached in its entirety, one file is cached partially, other files are not cached) turns out to be critical for the constant-gap result. We acknowledge that the difference may be small under Zipf distribution. However, since we are focusing on arbitrary distribution and non-asymptotic case in this paper, capturing this subtle difference is critical for the overall results. From Theorem 1 and Theorem 2, we will show in Section VI that the gap between the lower bound Rlb and the upper bound Rup is bounded by a multiplicative constant. Corollary 1: With K users independently requesting files in F according to the popularity distribution P, as |F| → +∞, the lower bound Rlb and the upper bound Rup can be bounded by Rup ≤ 55Rlb. (8) Thus, the bounds differ at most by a multiplicative factor of 55. As we discuss in the introduction, although this factor may appear to be large, it is the first result in the literature with a constant-factor gap that is independent of the popularity distributions. In contrast, the gap (between upper- and lower-
bounds)estimated by the existing results can be arbitrarily is no larger than large depending on either the number of groups [9],the M 1 number of levels [10],or the parameter of the Zipf distribution K'(1- N1+K证 (9) [13].It is remarkable that such a simple coded caching N scheme,with a very simple choice of Ni,can achieve such a Now,suppose that the individual cache size M is much smaller strong performance guarantee,independently of the popularity than Ni,and the global cache size K'M is much larger than distribution. N(note that this is precisely the regime where coded caching As we mentioned earlier,although both our lower bound and will be most helpful[7).Then,we have 1-÷≈1and achievable bound have some similarity to those in [13],our 1+MM.Thus,the expression in (9)is approximately lower bound is sharper in the order sense because it accounts equal to N/M.The significance of this observation is that this for the contribution of unpopular files,and our achievable approximated expression is independent of K.In other words, bound is also tighter in the order sense because it divides in a suitable regime of interest,the exact popularity of the the files into 3 groups under some parameter settings.We "popular files"does not seem to matter!It is then plausible to believe that such difference is the main reason why we can argue that,even when we reduce the popularity values to attain constant-factor performance guarantees independent of in System 2,there is no substantial change in the lower-bound the popularity distribution,while the performance gap in [13] performance.Of course,this argument needs to be carefully only holds when the file popularity follows a Zipf distribution made.Further,we have to account for not only popular files, and when the system sizes approach infinity,and even then the but also unpopular files.The proofs in the next section will performance gap in [13]still depends on the parameter a of make this intuition precise. Zipf distribution(which may approach infinity as a approaches 1).A brief comparison with [13]is presented in Table I. IV.LOWER BOUND ON THE EXPECTED RATE We also note that,for certain ranges of the exponent a of the Zipf distributions,the performance characterization in [13] In this section,we present the proof of Theorem 1,i.e.,the for RLFU may be tighter than the 55 factor reported in(8). lower bound. Further,both our achievable scheme and RLFU are special The proof consists of two parts.Subsections A-C focus on cases of the larger class of RAP-GCC schemes reported in popular files 1 to Ni,and prove the part that R(K,F,P)> [13].Thus,the results in [13]and in this paper combined (N-M).where Mr =max{M,3).This proof is provide a more complete characterization of the performance composed of 5 steps.From the first step to the fourth one, guarantees for coded caching schemes across both Zipf and we map the original system into a series of reduced systems. non-Zipf distributions. whose information-theoretical rate is strictly smaller than previous ones.Then,we bound the rate needed for popular files in the fourth step.We note that the ideas used in the A.Main Intuition 1st to 4th reduction steps are similar to [9][13].In the fifth step,we introduce a novel "merging"technique in Subsection Before we present the proofs for these main results,we D to account for the unpopular files and prove the part that would like to illustrate the main intuition behind.First,con- R(KF,P)(N+N2-M).Finally,in Subsection sider only the "popular files"1 to N.i.e.,assuming that the E,we deal with the case when N+N2<M,and establish unpopular files N+1 to N are removed.Let us refer to the second term in(6).A brief summary of the constructed this system as "System 1".In our proof,we will consider systems are presented in Table II.As we discussed earlier, an alternate system where the popularity of all popular files while some of the techniques for quantifying the impact of is reduced toWe will refer to this alterate system 1 popular files are similar to [913],our treatment of unpopular as "System 2"(see Section IV-A).Intuitively,the average files is new and is the key reason for the sharper constant- transmission rate in System 2 is no larger than that in System factor characterization in our paper. 1.Further,since all files are with the same popularity in System 2,the average-case and the worst-case performance will not differ too much [9].Thus,one can then use System A.Reduction Steps I 2 2 to derive a lower bound on the average transmission rate, Recall that the set of files is given byF=[F1,F2,...,F} and compare it with an upper bound attained by an achievable and their popularity distribution is given by p scheme pi,p2,..,PN).Next,we will compare to a series of reduced However,the potential problem of this argument is that, systems with different sets of files and popularity distributions. when we reduce the popularity of all popular files to i. Again,let N be the integer defined in Theorem 1. some popularity values could be reduced by several orders of In the first constructed system,the set of files is given by magnitude.It is then unclear why the lower bound derived F={Fo,F1,F2,...,FN},where Fo denotes the empty file, from System 2 is still a reasonable lower bound for System which is introduced for ease of presentation.Its correspond- 1.The intuition behind this insensitivity can be explained as ing popularity distribution is P1={po,P1,p2,...,PN}and follows.Suppose that there are Kusers in System 1 that po-1-P In other words,we replace all unpopular request any of the popular files.Then,according to the result files FN+1,...,FN in the original system by the empty file in [7],the worst-case transmission rate to serve these K'users Fo,and reassign their popularity all to Fo.Intuitively,the
6 bounds) estimated by the existing results can be arbitrarily large depending on either the number of groups [9], the number of levels [10], or the parameter of the Zipf distribution [13]. It is remarkable that such a simple coded caching scheme, with a very simple choice of N1, can achieve such a strong performance guarantee, independently of the popularity distribution. As we mentioned earlier, although both our lower bound and achievable bound have some similarity to those in [13], our lower bound is sharper in the order sense because it accounts for the contribution of unpopular files, and our achievable bound is also tighter in the order sense because it divides the files into 3 groups under some parameter settings. We believe that such difference is the main reason why we can attain constant-factor performance guarantees independent of the popularity distribution, while the performance gap in [13] only holds when the file popularity follows a Zipf distribution and when the system sizes approach infinity, and even then the performance gap in [13] still depends on the parameter α of Zipf distribution (which may approach infinity as α approaches 1). A brief comparison with [13] is presented in Table I. We also note that, for certain ranges of the exponent α of the Zipf distributions, the performance characterization in [13] for RLFU may be tighter than the 55 factor reported in (8). Further, both our achievable scheme and RLFU are special cases of the larger class of RAP-GCC schemes reported in [13]. Thus, the results in [13] and in this paper combined provide a more complete characterization of the performance guarantees for coded caching schemes across both Zipf and non-Zipf distributions. A. Main Intuition Before we present the proofs for these main results, we would like to illustrate the main intuition behind. First, consider only the “popular files” 1 to N1, i.e., assuming that the unpopular files N1 + 1 to N are removed. Let us refer to this system as “System 1”. In our proof, we will consider an alternate system where the popularity of all popular files is reduced to 1 KMr . We will refer to this alternate system as “System 2” (see Section IV-A). Intuitively, the average transmission rate in System 2 is no larger than that in System 1. Further, since all files are with the same popularity in System 2, the average-case and the worst-case performance will not differ too much [9]. Thus, one can then use System 2 to derive a lower bound on the average transmission rate, and compare it with an upper bound attained by an achievable scheme. However, the potential problem of this argument is that, when we reduce the popularity of all popular files to 1 KMr , some popularity values could be reduced by several orders of magnitude. It is then unclear why the lower bound derived from System 2 is still a reasonable lower bound for System 1. The intuition behind this insensitivity can be explained as follows. Suppose that there are K′ users in System 1 that request any of the popular files. Then, according to the result in [7], the worst-case transmission rate to serve these K′ users is no larger than K′ (1 − M N1 ) 1 1 + K′M N1 . (9) Now, suppose that the individual cache size M is much smaller than N1, and the global cache size K′M is much larger than N1 (note that this is precisely the regime where coded caching will be most helpful [7]). Then, we have 1 − M N1 ≈ 1 and 1+ K′M N1 ≈ K′M N1 . Thus, the expression in (9) is approximately equal to N1/M. The significance of this observation is that this approximated expression is independent of K′ . In other words, in a suitable regime of interest, the exact popularity of the “popular files” does not seem to matter! It is then plausible to argue that, even when we reduce the popularity values to 1 KMr in System 2, there is no substantial change in the lower-bound performance. Of course, this argument needs to be carefully made. Further, we have to account for not only popular files, but also unpopular files. The proofs in the next section will make this intuition precise. IV. LOWER BOUND ON THE EXPECTED RATE In this section, we present the proof of Theorem 1, i.e., the lower bound. The proof consists of two parts. Subsections A-C focus on popular files 1 to N1, and prove the part that R(K, F,P) ≥ 1 11Mr (N1 − M), where Mr = max{M, 3}. This proof is composed of 5 steps. From the first step to the fourth one, we map the original system into a series of reduced systems, whose information-theoretical rate is strictly smaller than previous ones. Then, we bound the rate needed for popular files in the fourth step. We note that the ideas used in the 1st to 4th reduction steps are similar to [9][13]. In the fifth step, we introduce a novel “merging” technique in Subsection D to account for the unpopular files and prove the part that R(K, F,P) ≥ 1 11Mr (N1 + N2 − M). Finally, in Subsection E, we deal with the case when N1 + N2 ≤ M, and establish the second term in (6). A brief summary of the constructed systems are presented in Table II. As we discussed earlier, while some of the techniques for quantifying the impact of popular files are similar to [9][13], our treatment of unpopular files is new and is the key reason for the sharper constantfactor characterization in our paper. A. Reduction Steps 1 & 2 Recall that the set of files is given by F = {F1, F2, ..., FN } and their popularity distribution is given by P = {p1, p2, ..., pN }. Next, we will compare to a series of reduced systems with different sets of files and popularity distributions. Again, let N1 be the integer defined in Theorem 1. In the first constructed system, the set of files is given by F1 = {F0, F1, F2, ..., FN1 }, where F0 denotes the empty file, which is introduced for ease of presentation. Its corresponding popularity distribution is P1 = {p0, p1, p2, ..., pN1 } and p0 = 1 − PN1 i=1 pi . In other words, we replace all unpopular files FN1+1, ..., FN in the original system by the empty file F0, and reassign their popularity all to F0. Intuitively, the
TABLE II:Brief introduction of constructed systems satisfying property (a).We then map Wi to another request Systems Definition W:,such that the first N elements in W;remain the same as Map the“unpopular”files from N+1to in W:,while other elements are mapped to the empty file.Let System 1 N to a virtual empty file with their sum be the transmission rate to serve the request pattern W.. probability. By such a construction,we now verify property (b)holds. Reduce the popularity of files F1...FN to We wish to show that the probability with which the mapped System 2 K.Some users may request a same file. request patterns is W!={f1,f2...fk},where fk is the With probability (K1),all users request file requested by user k,is the same as the probability with the empty file.With probability 1-(K1), which the same pattern Wi=[f1,f2,...,fr}is requested in System 3 W(K,F1,P1).We first fix a user k and compare the probabili- exactly K3 users request K3 distinct files, ty that user k requests file fk.If f E [F1,F2,...,FN},since and other users request empty files. we did not change their popularity,the probability pw(f) In each request pattern,exactly K3 users request K3 distinct files,and other users that user k requests fk in the mapped request pattern is the System 4 request empty files. same as the probability pw,(f)that user k requests fr in W(K,F1,P1).If fr=0,according to our construction,the (K,F3,P3) Reduce the the popularity of files F1....,FN probability pw(f)that user k requests fr in the mapped in(K,F,P)to立 request pattern is equal to the probability that user k re- (K,F4,P) Merge the unpopular files into virtual files quests any file in [FN+1,...,FN}in W(K,F,P).which is with popularity 2KM and KM,~pNa≥1,which is a contra- W(K,F1,P1),respectively.Therefore,r and r'cannot be diction to1.Define a new popularity distribution directly compared in a pointwise manner,e.g.,the number P2=(1-ki Kit:Kht:Kir}over the files F1.In of request patterns in W(K,F,P)is larger than that in other words,compared to (K,F1,P1),in this new system, W(K,F1,P1). each non-empty file is requested with a smaller probability In the following,we will show that r'sr,i.e.,r'is KMIntuitively,its expected transmission rate should be even stochastic dominated by r.To do so,we will use Theorem 3.1 lower,which is stated below. in [19].Specifically,we need to find two coupled variablesf Lemma 2:Let R(K,F1,P2)be the minimum expected rate and r with the following properties: required to meet the requests by K users,each of which (a)r and f have the same distribution; randomly requests a file in Fi according to the popularity (b)r'and r'have the same distribution; distribution P2.We have (c)r≥ralmost surely.. Then,applying Theorem 3.1 in [19].we can conclude that R(K,F1,P1)>R(K,F1,P2). (11) r'E(r)=E(r'). This lemma is similar to the uniformization argument in It remains to show the existence of and which are the proof of Claim 2 in [9].The proof can also use a similar constructed as follows.For every WiE W(K,F,P),we let coupling idea as in Lemma 1.For completeness,we provide r=r and thus they must have exactly the same distribution, the proof in the appendix
7 TABLE II: Brief introduction of constructed systems Systems Definition System 1 Map the “unpopular” files from N1 + 1 to N to a virtual empty file with their sum probability. System 2 Reduce the popularity of files F1,..., FN1 to 1 KMr . Some users may request a same file. System 3 With probability π(K1), all users request the empty file. With probability 1 − π(K1), exactly K3 users request K3 distinct files, and other users request empty files. System 4 In each request pattern, exactly K3 users request K3 distinct files, and other users request empty files. (K, F3,P3) Reduce the the popularity of files F1,..., FN1 in (K, F,P) to 1 KMr . (K, F4,P4) Merge the unpopular files into virtual files with popularity ≥ 1 KMr and KMr · pN1 ≥ 1, which is a contradiction to PN i=1 pi = 1. Define a new popularity distribution P2 = {1 − N1 KMr , 1 KMr , 1 KMr , ..., 1 KMr } over the files F1. In other words, compared to (K, F1,P1), in this new system, each non-empty file is requested with a smaller probability 1 KMr . Intuitively, its expected transmission rate should be even lower, which is stated below. Lemma 2: Let R(K, F1,P2) be the minimum expected rate required to meet the requests by K users, each of which randomly requests a file in F1 according to the popularity distribution P2. We have R(K, F1,P1) ≥ R(K, F1,P2). (11) This lemma is similar to the uniformization argument in the proof of Claim 2 in [9]. The proof can also use a similar coupling idea as in Lemma 1. For completeness, we provide the proof in the appendix
With Lemma 1 and Lemma 2,we have proved that have K1≤L」,and R(K,F,P)>R(K,F1,P2).In the following analysis for the first part of Theorem 1,we will focus on R(K,F1,P2).Note P(K2K)≥2 (13) that the system (K,F1,P2)is precisely the "System 2"that we discussed in Section III-A.Next,we will derive a lower This follows from the result in [17],which shows that any bound on the average transmission rate of System 2,which median must lie in the interval [np],[np]],for a binomial also provides a lower bound on R(K,F,P).We will derive distribution B(n,p) this lower bound on the average transmission rate of System 2 by relating it to a lower bound on the worst-case transmission In other words,with probability no less than 0.5,no less rate.Note that since all files have equal popularity in System 2, than K users request non-empty files.Still,some of these the fact that its average transmission rate is at most a constant K users may request a common file.Next,we are interested factor away from its worst-case transmission rate is in fact in the number of distinct files that are requested.Denote this known from the results in [9]and [131.For example,we number as Kd. can obtain a lower bound on the average transmission rate Lemma 4:Given that there are K users requesting non- of System 2 from Theorem 2 in [9]by choosing c=1, empty files,the probability that Ka(i.e.,the number of distinct N=Ni there and by choosing K in [9]as the number files requested)is no smaller than min],K]}is of users requesting popular files.However,the lower bound greater than or equal to 0.91. derived in this way involves an expectation over Ki.Since later we will use System 2 again to deal with unpopular files, Proof:Clearly,we only need to consider K4,consider only those Kr users requesting non- techniques below and in Sections IV-B to IV-C are also similar empty files.Each user requests one file from the No non-empty to [913].although here we exploit the fact that pN to obtain the 1/11 factor (see Proposition 1 below). files uniformly randomly and independently.There are N possible request patterns for the Kr users,each of which is Towards this end,rather than only proving the lower bound equally likely.For some of these request patterns,the number for System 2.we instead prove a more general result that will of distinct files are smaller than K2K.The number of be used again in Subsections C-E.Specifically,consider a sys- such request patterns must be smaller than CKTo see tem(K,F P).Its file set isF={F,F,...,FNo},and the this,note that the first term is the number of ways to choose corresponding popularity distribution is p={po,p.),K2 files from the No non-empty files.The second term is the wherep0.Note that (K,F,P') number of ways that each user can choose one of the k2 files. becomes the"System 2"when No=Ni and p=K We thus have Proposition 1:With K users requesting files independently in F according to the corresponding popularity distribution P(Kd≤K2lKr)0.Therefore, users who request non-empty files.All Ii are i.i.d.distributed with mean Nop.The probability distribution for Kr is given No No-K2 K2 NogKa.Ka by No-K2 =(1+ No-K2 P(K =K1)=CK (Nop)K:(1-Nop)K-K: Recall that at the beginning of the proof,we have restricted our attention to K,≤K1andK,≥4.Since K2≌L2K,J, Lemma3:Define K1≌KNop.Since p≤t,we we have2≤K2≤是Kr.Hence,using Lemma3,we have
8 With Lemma 1 and Lemma 2, we have proved that R(K, F,P) ≥ R(K, F1,P2). In the following analysis for the first part of Theorem 1, we will focus on R(K, F1,P2). Note that the system (K, F1,P2) is precisely the “System 2” that we discussed in Section III-A. Next, we will derive a lower bound on the average transmission rate of System 2, which also provides a lower bound on R(K, F,P). We will derive this lower bound on the average transmission rate of System 2 by relating it to a lower bound on the worst-case transmission rate. Note that since all files have equal popularity in System 2, the fact that its average transmission rate is at most a constant factor away from its worst-case transmission rate is in fact known from the results in [9] and [13]. For example, we can obtain a lower bound on the average transmission rate of System 2 from Theorem 2 in [9] by choosing c = 1, Nl = N1 there and by choosing Kl in [9] as the number of users requesting popular files. However, the lower bound derived in this way involves an expectation over Kl . Since later we will use System 2 again to deal with unpopular files, we wish to obtain a lower bound that is a function of the total number of users K. The following derivation accounts for such technical details, and at the same time yields the factor 1/11 in (6). (In contrast, an explicit expression for such a factor is not provided in [13]). We note that some of the reduction techniques below and in Sections IV-B to IV-C are also similar to [9][13], although here we exploit the fact that pN1 ≈ 1 KMr to obtain the 1/11 factor (see Proposition 1 below). Towards this end, rather than only proving the lower bound for System 2, we instead prove a more general result that will be used again in Subsections C-E. Specifically, consider a system (K, F ′ ,P ′ ). Its file set is F ′ = {F0, F1, ..., FN0 }, and the corresponding popularity distribution is P ′ = {p0, p, ..., p}, where p ≤ 1 KMr and p0 = 1−N0p ≥ 0. Note that (K, F ′ ,P ′ ) becomes the “System 2” when N0 = N1 and p = 1 KMr . Proposition 1: With K users requesting files independently in F ′ according to the corresponding popularity distribution P ′ , the lower bound on the expected transmission is R(K, F ′ ,P ′ ) ≥ 1 11 Kp · (N0 − M), (12) for any M ≥ 0. From now till Subsection C, we will prove Proposition 1. Note that in the system (K, F ′ ,P ′ ), it is possible that some file is requested by multiple users. In Section IV-B, we will reduce it to the third system where every non-empty requested file is requested exactly once. Towards that end, we first characterize the number of distinct files requested in system (K, F ′ ,P ′ ). For a given system setting (K, F ′ ,P ′ ), let Ii = 1 if user i requests a non-empty file, and let Ii = 0 if user i requests the empty file. Denote Kr = PK i=1 Ii . Then, Kr is the number of users who request non-empty files. All Ii are i.i.d. distributed with mean N0p. The probability distribution for Kr is given by P(Kr = K1) = C K1 K (N0p) K1 (1 − N0p) K−K1 . Lemma 3: Define K1 , ⌊KN0p⌋. Since p ≤ 1 KMr , we have K1 ≤ ⌊ N0 Mr ⌋, and P(Kr ≥ K1) ≥ 1 2 . (13) This follows from the result in [17], which shows that any median must lie in the interval [⌊np⌋, ⌈np⌉], for a binomial distribution B(n, p). In other words, with probability no less than 0.5, no less than K1 users request non-empty files. Still, some of these K1 users may request a common file. Next, we are interested in the number of distinct files that are requested. Denote this number as Kd. Lemma 4: Given that there are Kr users requesting nonempty files, the probability that Kd (i.e., the number of distinct files requested) is no smaller than min{⌊ 1 2Kr⌋, ⌊ 1 2K1⌋} is greater than or equal to 0.91. Proof: Clearly, we only need to consider Kr ≤ K1 (because a larger value of Kr only increases the number of distinct files). When Kr = 1, 2, or 3, we have that ⌊ 1 2Kr⌋ equals 0 or 1. In this case, it is easy to see that this lemma holds, since there must be at least one distinct file requested. For Kr ≥ 4, consider only those Kr users requesting nonempty files. Each user requests one file from the N0 non-empty files uniformly randomly and independently. There are N Kr 0 possible request patterns for the Kr users, each of which is equally likely. For some of these request patterns, the number of distinct files are smaller than K2 , ⌊ 1 2Kr⌋. The number of such request patterns must be smaller than C K2 N0 ·K Kr 2 . To see this, note that the first term is the number of ways to choose K2 files from the N0 non-empty files. The second term is the number of ways that each user can choose one of the K2 files. We thus have P(Kd ≤ K2|Kr) 0. Therefore, N0 N0 − K2 N0−K2 = 1 + K2 N0 − K2 N0−K2 K2 ·K2 ≤ e K2 . Recall that at the beginning of the proof, we have restricted our attention to Kr ≤ K1 and Kr ≥ 4. Since K2 , ⌊ 1 2Kr⌋, we have 2 ≤ K2 ≤ 1 2Kr. Hence, using Lemma 3, we have
9 K,≤Ki≤l」≤3No.Therefore,.we have It remains to show that,if W;is chosen according to the PK≤KK0.91. ■ Rs(K,F,P)≥Rs(K,W3) (16) B.Reduction Step 3 and the result then follows. ■ Combing Lemma 3 and Lemma 4,we can show that,with probability no less than 0.455,the number of distinct files C.Reduction Step 4 the Lower Bound for Popular files requested is no smaller than K1].We now perform the We now consider the 4th system W4.In this system,there third reduction.For W(K,F,P'),we divide all the possible are always K3 users requesting K3 distinct non- request patterns into two subsets.The first subset contains empty files and all the other users request the empty file.We those request pattems such that the number of users requesting have distinct files is no smaller than K3.The other request Lemma 6:R(K,W3)=(1-(K1))R(K,Wa). patterns constitute the second subset.The sum probability for Proof:Recall that in the third system W3(K3,K1),with the second subset is denoted as (K1). probability (K1)the users all request the empty file,which We then construct the third system as follows:in the third requires zero rate.With probability 1-(K1),exactly K3 system,with probability (K1)the users will all request the users request exactly K3 distinct files.Each such pattern empty file.With probability 1-(K1),exactly K3 users will request exactly K3 distinct non-empty files from F1,...,FNo. cthprobabity爱器In the fourh ysem网a. all request patterns have exactly K3 users requesting exactly and all other users will request the empty file.Note that there K3 distinct files,and each pattern occurs with probability are exactly CA request patterns where exactly Ka users request K3 distinct non-empty files.We let each such request Therefore,the rate needed for the third systes pattern occur with equal probability 1-元(K1 linear combination of zero rate (with portion (K1))and the rate needed for the fourth system (with portion 1-(K1)). Let this third system be denoted by W3(K3,K1),and let The lemma then follows. ■ R(K,W3)be the corresponding minimum expected transmis- Next we focus on the system W4. sion rate.Then.we have the following lemma. Let Hi.i=1,2.ca be the CK choices of picking Lemma 5: K3 users out of the K users.In system W4,if in a request R(K,F,P)≥R(K,W3) (14)Wj,the users requesting distinct non-empty files are exactly in Hi,we denote it by WiE Hi.In other words,Hi is the set Proof:The proof uses coupling [19].For every Wie of request patterns in which the users requesting non-empty W(K,Fp).map it to a random Wi EW3(K3,K1)as files is exactly the same as in Hi.Note that there are AN= follows.If the number of users requesting non-empty files in No! (KKa such patterns in each Hi.We have the following Wi is less than K1,or the number of distinct non-empty files result. requested is less than K3≌L」,then in W:all users request Lemma 7:Consider systems W4 where there are always the empty file.Otherwise,we perform the mapping described exactly K3 users requesting distinct files in F and the below. other K-K3 users request the empty file.For any Hi. For every remaining W;with K>Ka,we conduct the i=1,2,.C,the following holds, following splitting procedure. For each non-empty file that is requested by some users, ∑(K,W)≥A 0」K3-KM randomly choose one user requesting it.Note that there (17) WEH 兴 are Kd such chosen users. Among the chosen users,randomly choose K3 of them. Note that Lemma 7 immediately implies that These K3 users now request distinct non-empty files,and we let all other users request the empty file. R(K,W4)≥ 」K-KM (18) Given any cache placement and transmission scheme 3, 岩 since W requests a subset of the files in Wi,we have Proof:Without loss of generality,suppose that Hi= rs(K,W)≥Ts(K,W): (15){1,1,...,1,0,0,...,0}.In other words,user 1,2.....K3 are
9 Kr ≤ K1 ≤ ⌊ N0 Mr ⌋ ≤ 1 3N0. Therefore, we have P(Kd ≤ K2|Kr) K2|Kr) = 1 − P(Kd ≤ K2|Kr) > 0.91. B. Reduction Step 3 Combing Lemma 3 and Lemma 4, we can show that, with probability no less than 0.455, the number of distinct files requested is no smaller than ⌊ 1 2K1⌋. We now perform the third reduction. For W(K, F ′ ,P ′ ), we divide all the possible request patterns into two subsets. The first subset contains those request patterns such that the number of users requesting distinct files is no smaller than K3 , ⌊ K1 2 ⌋. The other request patterns constitute the second subset. The sum probability for the second subset is denoted as π(K1). We then construct the third system as follows: in the third system, with probability π(K1) the users will all request the empty file. With probability 1 − π(K1), exactly K3 users will request exactly K3 distinct non-empty files from F1, ..., FN0 , and all other users will request the empty file. Note that there are exactly C K3 K A K3 N0 request patterns where exactly K3 users request K3 distinct non-empty files. We let each such request pattern occur with equal probability 1−π(K1) C K3 K A K3 N0 . Let this third system be denoted by W3(K3, K1), and let R(K, W3) be the corresponding minimum expected transmission rate. Then. we have the following lemma. Lemma 5: R(K, F ′ ,P ′ ) ≥ R(K, W3). (14) Proof: The proof uses coupling [19]. For every Wi ∈ W(K, F ′ ,P ′ ), map it to a random W ′ i ∈ W3(K3, K1) as follows. If the number of users requesting non-empty files in Wi is less than K1, or the number of distinct non-empty files requested is less than K3 , ⌊ K1 2 ⌋, then in W ′ i all users request the empty file. Otherwise, we perform the mapping described below. For every remaining Wi with Kd ≥ K3, we conduct the following splitting procedure. • For each non-empty file that is requested by some users, randomly choose one user requesting it. Note that there are Kd such chosen users. • Among the chosen users, randomly choose K3 of them. These K3 users now request distinct non-empty files, and we let all other users request the empty file. Given any cache placement and transmission scheme F, since W ′ i requests a subset of the files in Wi , we have rF(K, Wi) ≥ rF(K, W ′ i ). (15) It remains to show that, if Wi is chosen according to the distribution of W(K, F1,P2), then the resulting W ′ i has the same distribution as that of request patterns in W3(K3, K1). To see this, note that the probability with which W′ i requests no empty files is exactly 1−π(K1). Further, due to symmetry on the files and the users in W(K, F1,P2), along with the symmetry of our mapping, each pattern W ′ i that requests nonempty files must occur with equal probability, i.e., there are C K3 K A K3 N0 request patterns each of equal probability, that sums to the probability 1 − π(K1). We then conclude that each W′ i occurs with the same probability as in W3(K3, K1). Thus, with the coupling method [19], we have RF(K, F ′ ,P ′ ) ≥ RF(K, W3). (16) and the result then follows. C. Reduction Step 4 & the Lower Bound for Popular files We now consider the 4th system W4. In this system, there are always K3 , ⌊ K1 2 ⌋ users requesting K3 distinct nonempty files and all the other users request the empty file. We have Lemma 6: R(K, W3) = (1 − π(K1))R(K, W4). Proof: Recall that in the third system W3(K3, K1), with probability π(K1) the users all request the empty file, which requires zero rate. With probability 1 − π(K1), exactly K3 users request exactly K3 distinct files. Each such pattern occurs with probability 1−π(K1) C K3 K A K3 N0 . In the fourth system W4, all request patterns have exactly K3 users requesting exactly K3 distinct files, and each pattern occurs with probability 1 C K3 K A K3 N0 . Therefore, the rate needed for the third system is a linear combination of zero rate (with portion π(K1)) and the rate needed for the fourth system (with portion 1 − π(K1)). The lemma then follows. Next we focus on the system W4. Let Hi , i = 1, 2, ..., CK3 K be the C K3 K choices of picking K3 users out of the K users. In system W4, if in a request Wj , the users requesting distinct non-empty files are exactly in Hi , we denote it by Wj ∈ Hi . In other words, Hi is the set of request patterns in which the users requesting non-empty files is exactly the same as in Hi . Note that there are A K3 N0 = N0! (K−K3)! such patterns in each Hi . We have the following result. Lemma 7: Consider systems W4 where there are always exactly K3 users requesting distinct files in F ′ and the other K − K3 users request the empty file. For any Hi , i = 1, 2, ..., CK3 K , the following holds, X Wj∈Hi rF(K, Wj ) ≥ A K3 N0 · N0 K3 K3 − K3M N0 K3 . (17) Note that Lemma 7 immediately implies that R(K, W4) ≥ N0 K3 K3 − K3M N0 K3 . (18) Proof: Without loss of generality, suppose that Hi = {1, 1, ..., 1, 0, 0, ..., 0}. In other words, user 1, 2,..., K3 are
10 requesting distinct non-empty files.Each user has a cache,la- f(K3)≥f(1) beled Mi,M2,...,MKa,each of which has a common storage size M. =1总 There are No!permutations for the No files.For each per- mutation,we split it into subgroups,each with K3 files. =G%-M 1 (23) Denote r(i,j)as the rate needed to meet the users'requests if their request pattern is the same as the j-th subgroup in the ≥5K(%-M0 i-th permutation,i.e.,when the k-th user requests the k-th file Next,we consider the case KNop 5.By definition,we in the subgroup,k =1,2,...,K3. have For each permutation i,consider all the sub-groups (i.e., request patterns)as a whole.Recall that the cache con- K= tent is fixed when these request patterns vary.Consider a feasible cache placement and transmission scheme 3. 2K1- (since Ki is an integer) Based on the cached content Mi,...,MKa.and the transmis- 1 sions from the server for each request pattern (with rates =KNop]-2 (24) r(i,1),.r(i,).respectively),the Ka users together 1 must be able to reconstruct all K files.Hence. ≥2(Kop-1)-2 2KVop-1≥0.3KN0n, Ka ∑r6,)+∑M≥K No (19) where in the last step we have used KNop >5.Since M >3. =1 k=1 we have Summarizing over all No!permutations,we have f(K3)=K3- K3M No! r(位,)≥ K3-K3M )No!. (20) M ≥K3 1一 -1 K Note that there are AN request patterns Wji,while M (25) 2K3 1- (using Kp ≥M,≥3) there areNo!subgroups among all the No!permuta- tions.By symmetry2,each Wi E Hi appears an equal number of times in these subgroups.Hence,the number of times ≥05KNop-)(-3MKp) each Wi EHi appears in the summation in Equation(20) isN Hence. ≥5p(No-M0, where the last inequality is due to ∑w,emr(K,W) NaL等」 (05KNop-1).(1-MKp)-Kp(No-M) A rs(i,j) ·No 1 1 21 20.5kop-1)号-言n(o-0 2 」No Nol( No K3-K3) ≥品KMp-号+写KpM 21 Ka-KaM >0, (26) 兴 where the first inequality is due to KMp5 and KMp >0. We therefore conclude this lemma. ◆ Using both (23)and (25)into (22),we conclude that the Denote the right hand side of Equation(18)by f(K3).From minimum expected rate needed for W(K,F,P)is bounded Lemmas 5 and 6,the minimum expected rate can be bounded by by 1 R(K,F,P)≥R(K,W3) R(K,F,P)≥0.455·5po-M+ (27) =(1-π(K1)·R(K,W4) (22) 1 ≥(1-π(K1)f(K3). ≥立po-M+ Therefore,we have proved Proposition 1. Recall that K1≌LKNop]≤L0」and K3≌L号K小.We Applying Proposition 1 to System 2(K,F1,P2)(i.e.,with now consider two cases. No=Ni and p=),we immediately have the following If KNop≤5,it is easy to verify that lower bound for the expected transmission rate of System 2: 2 We note that this is similar to the symmetrization argument in[9. RKP≥i-AM (28)
10 requesting distinct non-empty files. Each user has a cache, labeled M1, M2, ..., MK3 , each of which has a common storage size M. There are N0! permutations for the N0 files. For each permutation, we split it into ⌊ N0 K3 ⌋ subgroups, each with K3 files. Denote r(i, j) as the rate needed to meet the users’ requests if their request pattern is the same as the j-th subgroup in the i-th permutation, i.e., when the k-th user requests the k-th file in the subgroup, k = 1, 2, ..., K3. For each permutation i, consider all the sub-groups (i.e., request patterns) as a whole. Recall that the cache content is fixed when these request patterns vary. Consider a feasible cache placement and transmission scheme F. Based on the cached content M1, ..., MK3 , and the transmissions from the server for each request pattern (with rates r(i, 1), ..., r(i, ⌊ N0 K3 ⌋), respectively), the K3 users together must be able to reconstruct all K3 · ⌊ N0 K3 ⌋ files. Hence, N0 K3 X j=1 rF(i, j) +X K3 k=1 Mk ≥ K3 · N0 K3 . (19) Summarizing over all N0! permutations, we have X N0! i=1 N0 K3 X j=1 rF(i, j) ≥ N0 K3 · K3 − K3M · N0!. (20) Note that there are A K3 N0 request patterns Wj ∈ Hi , while there are N0 K3 · N0! subgroups among all the N0! permutations. By symmetry2 , each Wj ∈ Hi appears an equal number of times in these subgroups. Hence, the number of times each Wj ∈ Hi appears in the summation in Equation (20) is N0 K3 ·N0! A K3 N0 . Hence, P Wj∈Hi rF(K, Wj ) A K3 N0 = 1 N0 K3 · N0! · X N0! i=1 N0 K3 X j=1 rF(i, j) ≥ 1 N0 K3 · N0! · N0!( N0 K3 K3 − K3M) = N0 K3 K3 − K3M N0 K3 . (21) We therefore conclude this lemma. Denote the right hand side of Equation (18) by f(K3). From Lemmas 5 and 6, the minimum expected rate can be bounded by R(K, F ′ ,P ′ ) ≥ R(K, W3) = (1 − π(K1)) · R(K, W4) ≥ (1 − π(K1))f(K3). (22) Recall that K1 , ⌊KN0p⌋ ≤ ⌊ N0 Mr ⌋ and K3 , ⌊ 1 2K1⌋. We now consider two cases. If KN0p ≤ 5, it is easy to verify that 2 We note that this is similar to the symmetrization argument in [9]. f(K3) ≥ f(1) = 1 − M N0 = 1 N0 (N0 − M) ≥ 1 5 Kp(N0 − M). (23) Next, we consider the case KN0p > 5. By definition, we have K3 = 1 2 K1 ≥ 1 2 K1 − 1 2 (since K1 is an integer) = 1 2 ⌊KN0p⌋ − 1 2 ≥ 1 2 (KN0p − 1) − 1 2 = 1 2 KN0p − 1 ≥ 0.3KN0p, (24) where in the last step we have used KN0p > 5. Since Mr ≥ 3, we have f(K3) = K3 − K3M N0 K3 ≥ K3 1 − M N0 K3 − 1 ≥ K3 1 − M 10 3Kp − 1 ≥ K3 1 − M 3 Kp (using 1 Kp ≥ Mr ≥ 3) ≥ (0.5KN0p − 1) · 1 − 1 3 MKp ≥ 1 5 Kp (N0 − M), (25) where the last inequality is due to (0.5KN0p − 1) · 1 − 1 3 MKp − 1 5 Kp (N0 − M) ≥ (0.5KN0p − 1) 2 3 − 1 5 Kp(N0 − M) ≥ 2 15 KN0p − 2 3 + 1 5 KpM > 0, (26) where the first inequality is due to KM p ≤ 1, and the third inequality is due to KN0p > 5 and KM p ≥ 0. Using both (23) and (25) into (22), we conclude that the minimum expected rate needed for W(K, F,P) is bounded by R(K, F ′ ,P ′ ) ≥ 0.455 · 1 5 Kp [N0 − M]+ ≥ 1 11 Kp [N0 − M]+ . (27) Therefore, we have proved Proposition 1. Applying Proposition 1 to System 2 (K, F1,P2) (i.e., with N0 = N1 and p = 1 KMr ), we immediately have the following lower bound for the expected transmission rate of System 2: R(K, F1,P2) ≥ 1 11Mr (N1 − M). (28)