
Computer Science and Technology (reference): Burstiness-Aware Resource Reservation for Server Consolidation in Computing Clouds



Burstiness-Aware Resource Reservation for Server Consolidation in Computing Clouds

Sheng Zhang, Member, IEEE, Zhuzhong Qian, Member, IEEE, Zhaoyi Luo, Jie Wu, Fellow, IEEE, and Sanglu Lu, Member, IEEE

IEEE Transactions on Parallel and Distributed Systems, DOI 10.1109/TPDS.2015.2425403. This article has been accepted for publication but has not been fully edited; content may change prior to final publication.

• S. Zhang, Z.Z. Qian, and S.L. Lu are with the State Key Laboratory for Novel Software Technology, Nanjing University, Nanjing, 210023, China. E-mail: {sheng, qzz, sanglu}@nju.edu.cn.
• Z.Y. Luo is with the David Cheriton School of Computer Science, University of Waterloo, Waterloo, N2L 3G1, Canada. E-mail: zhaoyi.luo@uwaterloo.ca.
• J. Wu is with the Department of Computer and Information Sciences, Temple University, Philadelphia, PA 19122, USA. E-mail: jiewu@temple.edu.

Abstract—In computing clouds, burstiness of virtual machine (VM) workloads widely exists in real applications: spikes usually occur aperiodically, with low frequency and short duration. Burstiness can be handled effectively by dynamically scaling up/down in a virtualization-based computing cloud; however, to minimize energy consumption, VMs are often highly consolidated onto the minimum number of physical machines (PMs). In that case, to meet the dynamic runtime resource demands of the VMs on a PM, some VMs have to be migrated to other PMs, which may cause performance degradation. In this paper, we investigate the burstiness-aware server consolidation problem from the perspective of resource reservation, i.e., reserving a certain amount of extra resources on each PM to avoid live migrations, and we propose a novel server consolidation algorithm, QUEUE. We first model the resource requirement pattern of each VM as a two-state Markov chain to capture burstiness; we then design a resource reservation strategy for each PM based on the stationary distribution of a Markov chain. Finally, we present QUEUE, a complete server consolidation algorithm with a reasonable time complexity. We also show how to cope with heterogeneous spikes and provide remarks on several extensions. Simulation and testbed results show that QUEUE improves the consolidation ratio by up to 45% with large spike size and around 30% with normal spike size compared with the strategy that provisions for peak workload, and that it achieves a better balance between performance and energy consumption than other commonly used consolidation algorithms.

Index Terms—Bursty workload, Markov chain, resource reservation, server consolidation, stationary distribution

1 INTRODUCTION

Cloud computing has been gaining more and more traction in the past few years, and it is changing the way we access and retrieve information [1]. The recent emergence of the virtual desktop [2] has further elevated the importance of computing clouds. As a crucial technique in modern computing clouds, virtualization enables one physical machine (PM) to host many performance-isolated virtual machines (VMs). This greatly benefits a computing cloud, where VMs running various applications are aggregated together to improve resource utilization. Previous work [3] has shown that the cost of energy consumption, e.g., power supply and cooling, occupies a significant fraction of the total operating costs of a cloud. Therefore, making optimal use of the underlying resources to reduce energy consumption is becoming an important issue [4], [5]. To cut back the energy consumption of clouds, server consolidation has been proposed to tightly pack VMs and thereby reduce the number of running PMs; however, VM performance may be seriously affected if VMs are not appropriately placed, especially in a highly consolidated cloud.

We observe that variability and burstiness of VM workloads widely exist in modern computing clouds, as evidenced by prior studies [4], [6], [7], [8], [9]. Taking a typical web server as an example, burstiness may be caused by a flash crowd of incoming requests. VMs should be provisioned with resources commensurate with their workload requirements [10], which becomes more complex when workload variation is taken into account. As shown in Fig. 1, two kinds of resource provisioning strategies are commonly used to deal with workload burstiness: provisioning for peak workload and provisioning for normal workload. Provisioning for peak workload is favourable for guaranteeing VM performance, but it undermines the elasticity advantage of virtualization and may lead to low resource utilization [1], [8], [9].

In contrast, provisioning for normal workload makes use of the elasticity of cloud computing. In this case, to meet the dynamic resource requirements of VMs, local resizing and live migration are the two pervasively used methods. Local resizing adaptively adjusts a VM's configuration according to its real-time resource requirement, with negligible time and computing overheads [11]. Live migration, on the other hand, moves one or more VMs to a relatively idle PM when local resizing cannot allocate enough resources. However, in a highly consolidated computing cloud, where resource contention among VMs is generally prominent, live migration may cause significant service downtime; furthermore, it incurs noticeable CPU usage on the host PM [12], which probably degrades the performance of co-located VMs.


Fig. 1. An example of workload with bursty spikes (workload over time, alternating between the normal workload $R_b$ and the peak workload $R_p$; the spike size is $R_e$).

In this paper, we propose to reserve some extra resources on each PM to accommodate bursty workloads [13]. In doing so, when a resource spike occurs, VMs can be quickly reconfigured to the new level of resource requirement through local resizing with minimal overheads, instead of being migrated to other PMs. Hence, the number of live migrations can be reduced considerably, and the overall performance of a computing cloud can be improved.

Specifically, we investigate the problem of minimizing the amount of extra resources reserved on each PM during server consolidation while the overall performance is probabilistically guaranteed. By "probabilistically guaranteed", we mean that the fraction of time within which the aggregated workload of a PM exceeds its physical capacity is not larger than a threshold. Imposing such a threshold, rather than conducting live migration upon every PM capacity overflow, is a way to tolerate minor fluctuations of resource usage (as in the case of CPU usage) and to break the tradeoff between utilization and performance. Our problem can then be formulated as an optimization, wherein the goal is to minimize the amount of resources reserved on each PM, and the constraint is that the capacity violation ratio of every PM is not larger than a predetermined threshold.

We use a two-state Markov chain to capture the burstiness of workload [7], and we also show how to learn the chain parameters. Inspired by the serving windows in queueing theory [14], we abstract the resources reserved on each PM for workload spikes as blocks. Denoting by $\theta(t)$ the number of busy blocks at time $t$ on a PM, we show that the sequence $\theta(0), \theta(1), \theta(2), \ldots$ has the Markov property, namely that the next state depends only on the current state and not on the past sequence of states. We then develop a novel server consolidation algorithm, QUEUE, based on the stationary distribution of this Markov chain. We also show how to further improve the effectiveness of QUEUE with a more careful treatment of heterogeneous workload spikes. Simulation and testbed results show that QUEUE improves the consolidation ratio by up to 45% with large spike size and around 30% with normal spike size compared with the strategy that provisions for peak workload, and that it achieves a better balance between performance and energy consumption than other commonly used consolidation algorithms. The contributions of our paper are three-fold.

(1) To the best of our knowledge, we are the first to quantify the amount of reserved resources with consideration of workload burstiness. We propose to use the two-state Markov chain model to capture workload burstiness, and we present a formal problem description and its NP-completeness.

(2) We develop a novel algorithm, QUEUE, for burstiness-aware resource reservation, based on the stationary distribution of a Markov chain. We also show how to cope with heterogeneous spikes to further improve the performance of QUEUE.

(3) Extensive simulations and testbed experiments are conducted to validate the effectiveness and advantages of QUEUE.

We now continue by presenting related work in Section 2 and our model in Section 3. The problem formulation is provided in Section 4. We show the details of QUEUE in Section 5. Heterogeneous spikes are handled in Section 6. Evaluation results are presented in Section 7. Before concluding the paper in Section 9, we discuss known issues and extensions of QUEUE in Section 8.

2 RELATED WORK

Most prior studies [3], [15], [16] on server consolidation focused on minimizing the number of active PMs from the perspective of bin packing. A heterogeneity-aware resource management system for dynamic capacity provisioning in clouds was developed in [17]. Stable resource allocation in geographically distributed clouds was considered in [18]. Network-aware virtual machine placement was considered in [19]. Scalable virtual network models were designed in [8], [20] to allow cloud tenants to explicitly specify computing and networking requirements to achieve predictable performance.

In a computing cloud, burstiness of workload widely exists in real applications, and it becomes an inevitable characteristic in server consolidation [1], [4], [6], [7], [21]. Some recent works [22], [23] used stochastic bin-packing (SBP) techniques to deal with variable workloads, where workload is modeled as a random variable. Other research [10], [24], [25] studied the SBP problem assuming that VM workload follows a normal distribution. Several other studies [26], [27] focused on workload prediction while the application runs. Different from these, in our model a lower limit of provisioning is set at the normal workload level, which effectively prevents VM interference caused by unpredictable behaviors of co-located VMs.

A Markov chain was used to inject burstiness into a traditional benchmark in [7]. Several works [5], [28], [29] studied modeling and dynamic provisioning of bursty workloads in cloud computing. A previous study [30] proposed to reserve a constant level of hardware resources on each PM to tolerate workload fluctuation, but how much resource should be reserved was not given. To the best of our knowledge, we are the first to quantify the amount of reserved resources with consideration of various, but distinct, workload burstiness.


Fig. 2. Two-state Markov chain. The "ON" state represents peak workload ($R_p$) while the "OFF" state represents normal workload ($R_b$); $p_{on}$ and $p_{off}$ are the state switch probabilities.

3 MODELING VIRTUAL MACHINE WORKLOAD

3.1 Two-state Markov Chain

It has been well recognized in previous studies [4], [6], [7] that VM workload is time-varying with bursty spikes, as shown in Fig. 1. Several works [9], [10], [22], [23], [24], [25] modeled the workload of a VM as a random variable, following the Bernoulli distribution in [9] or a normal distribution in [10], [24], [25]. Different from these works, we model the workload of a VM as a two-state Markov chain, which takes the additional dimension of time into consideration and thus describes the characteristics of spikes more precisely.

Fig. 2 shows an example. We denote the resource requirements of peak workload, normal workload, and a workload spike by $R_p$, $R_b$, and $R_e$, respectively, where $R_e = R_p - R_b$, as demonstrated in Fig. 1. The "ON" state represents peak workload while the "OFF" state represents normal workload. We use $p_{on}$ and $p_{off}$ to denote the state switch probabilities. More specifically, if a VM is in the ON state, then the probability of it switching to OFF at the next time step is $p_{off}$, and of remaining ON is $1 - p_{off}$. Similarly, if a VM is in the OFF state, then the probability of it switching to ON at the next time step is $p_{on}$, and of remaining OFF is $1 - p_{on}$. We emphasize that this model is able to describe the characteristics of spikes precisely: intuitively, $R_e$ captures the size of a spike, and $p_{on}$ captures the frequency of spike occurrence. Thus, each VM can be described by a four-tuple

$$V_i = (p_{on}^i, p_{off}^i, R_b^i, R_e^i), \quad \forall 1 \le i \le n, \qquad (1)$$

where $n$ is the number of VMs.

3.2 Learning Model Parameters

This subsection provides a simple strategy for cloud tenants to generate model parameters for their VM workloads. It consists of two phases.

First, a cloud tenant must have workload traces and guarantee that they are consistent with the realistic deployment in computing clouds. This can be achieved by tentatively deploying the VMs in a cloud for a relatively short period; the cloud system collects the resource usage traces and feeds them back to the tenants.

Second, given a VM workload trace, a cloud tenant generates a four-tuple. We use Fig. 3 as an illustration, where the solid black curve represents the workload over time. Given the predetermined $R_b$ and $R_e$, we conservatively round the solid black curve up to the dashed red curve. Denote by $W(t)$ the workload at time $t$ in the dashed red curve, e.g., $W(4) = R_p$ and $W(7) = R_b$.

Fig. 3. Given the predetermined $R_b$ and $R_e$, we conservatively round the solid black curve (workload profile) up to the dashed red curve (ON-OFF model), based on which we can calculate $p_{on}$ and $p_{off}$.

Denote by $S_{FF}$ the number of switches from state "OFF" to state "OFF" in two consecutive time slots during the time period of interest, and by $S_{FN}$ the number of switches from state "OFF" to state "ON" in two consecutive time slots during the same period; e.g., $S_{FF} = 5$ and $S_{FN} = 2$ in Fig. 3. Similarly, we can define $S_{NN}$ and $S_{NF}$, which are equal to 5 and 2, respectively, in the figure. It is then easy to see that

$$p_{on} = \frac{S_{FN}}{S_{FN} + S_{FF}}, \qquad p_{off} = \frac{S_{NF}}{S_{NF} + S_{NN}}.$$
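This counting procedure is straightforward to implement. Below is a minimal sketch (our own illustration, not code from the paper; the function name is hypothetical), assuming the trace is sampled at fixed time slots:

```python
from typing import List, Tuple

def learn_switch_probabilities(trace: List[float], r_b: float) -> Tuple[float, float]:
    """Estimate (p_on, p_off) from a workload trace, as in Section 3.2.

    Samples above the normal level R_b are conservatively rounded up to
    the ON (peak) state; all other samples are treated as OFF (normal).
    """
    states = ["ON" if w > r_b else "OFF" for w in trace]

    # Count the four kinds of switches between consecutive time slots.
    s_ff = s_fn = s_nn = s_nf = 0
    for prev, curr in zip(states, states[1:]):
        if prev == "OFF":
            if curr == "OFF":
                s_ff += 1   # S_FF: stayed at normal workload
            else:
                s_fn += 1   # S_FN: normal -> peak
        else:
            if curr == "ON":
                s_nn += 1   # S_NN: stayed at peak workload
            else:
                s_nf += 1   # S_NF: peak -> normal

    p_on = s_fn / (s_fn + s_ff) if s_fn + s_ff else 0.0
    p_off = s_nf / (s_nf + s_nn) if s_nf + s_nn else 0.0
    return p_on, p_off
```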
3.3 Potential Benefits

The two-state Markov chain model allows cloud tenants to flexibly control the tradeoff between VM performance and deployment cost by adjusting $R_b$ and $R_e$. When a tenant wants to maximize VM performance, the tenant should choose a large $R_b$ and a small $R_e$. As we will show later in this paper, multiple workload spikes may share some common physical resources. Thus, when the aggregated amount of workload spikes occurring simultaneously is larger than the amount of the shared common resources, capacity overflow happens and VM performance is probably affected. When a tenant wants to minimize deployment cost, the tenant should choose a small $R_b$ and a large $R_e$. By "deployment cost", we mean the fee paid by a cloud tenant to a cloud provider. Since physical resources are opportunistically shared among multiple workload spikes, the charge for a workload spike should be smaller than that for normal workload [9]. Therefore, decreasing $R_b$ helps tenants to reduce the deployment cost.

Our model is also a tradeoff between modeling complexity and precision. We could model time-varying workload by a Markov chain with three or even more states, which would capture the workload burstiness more precisely; however, the complexity of learning the model parameters and allocating physical resources would increase as well, which may complicate the interactions between cloud providers and tenants.


$n$: the number of VMs
$V_i$: the $i$-th VM
$R_b^i$: the normal workload size of $V_i$
$R_e^i$: the spike workload size of $V_i$
$R_p^i$: the peak workload size of $V_i$, with $R_p^i = R_b^i + R_e^i$
$p_{on}^i$: the probability of $V_i$ switching from OFF to ON
$p_{off}^i$: the probability of $V_i$ switching from ON to OFF
$m$: the number of PMs
$H_j$: the $j$-th PM
$C_j$: the physical capacity of $H_j$
$x_{ij}$: the variable indicating whether $V_i$ is placed on $H_j$
$\Phi_j$: the capacity overflow ratio of $H_j$
$\rho$: the threshold of the capacity overflow ratio

Fig. 4. Main notations for quick reference.

4 PROBLEM FORMULATION

We consider a computing cloud with a one-dimensional resource; for scenarios with multi-dimensional resources, we provide a few remarks in Section 8. There are $m$ physical machines in the computing cloud, and each PM is described by its physical capacity:

$$H_j = (C_j), \quad \forall 1 \le j \le m. \qquad (2)$$

We use a binary matrix $X = [x_{ij}]_{n \times m}$ to represent the result of placing $n$ VMs on $m$ PMs: $x_{ij} = 1$ if $V_i$ is placed on $H_j$, and 0 otherwise. We assume that the workloads of VMs are mutually independent. Let $W_i(t)$ be the resource requirement of $V_i$ at time $t$. According to the Markov chain model, we have

$$W_i(t) = \begin{cases} R_b^i & \text{if } V_i \text{ is in the "OFF" state at time } t, \\ R_p^i & \text{if } V_i \text{ is in the "ON" state at time } t. \end{cases}$$

Then, the aggregated resource requirement of the VMs on PM $H_j$ is $\sum_{i=1}^{n} x_{ij} W_i(t)$.

Let $CO_j^t$ indicate whether capacity overflow happens on PM $H_j$ at time $t$, i.e.,

$$CO_j^t = \begin{cases} 1 & \text{if } \sum_{i=1}^{n} x_{ij} W_i(t) > C_j, \\ 0 & \text{otherwise.} \end{cases}$$

Intuitively, the VM placement should guarantee that the capacity constraint is satisfied on each PM at the beginning of the time period of interest, i.e., $CO_j^0 = 0$, $\forall 1 \le j \le m$.

We can now define our metric for probabilistic performance guarantees, the capacity overflow ratio (COR), which is the fraction of time during which the aggregated workload of a PM exceeds its physical capacity. Denoting the capacity overflow ratio of PM $H_j$ as $\Phi_j$, we have

$$\Phi_j = \frac{\sum_{1 \le t \le T} CO_j^t}{T},$$

where $T$ is the length of the time period of interest. It is easy to see that a smaller $\Phi_j$ implies a better performance of $H_j$. The performance of each PM should be probabilistically guaranteed, so we require

$$\Phi_j \le \rho, \quad \forall 1 \le j \le m. \qquad (3)$$

Here, $\rho$ is a predetermined value serving as the threshold of the COR. The main notations are summarized in Fig. 4 for quick reference. Our problem can be stated as follows.

Problem 1 (Burstiness-Aware Server Consolidation, BASC): Given a set of $n$ VMs and a set of $m$ PMs, find a VM-to-PM mapping $X$ that minimizes the number of PMs used while making sure that (1) the initial placement satisfies the capacity constraint, and (2) the capacity overflow ratio of each PM is not larger than the threshold $\rho$. It can be formally formulated as follows:

$$\begin{aligned} \min \quad & \left|\left\{ j \;\middle|\; \sum_{i=1}^{n} x_{ij} > 0, \ 1 \le j \le m \right\}\right| \\ \text{s.t.} \quad & CO_j^0 = 0, \quad \forall 1 \le j \le m \\ & \Phi_j \le \rho, \quad \forall 1 \le j \le m \end{aligned} \qquad (4)$$

Here, $|S|$ denotes the cardinality of set $S$.
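As a concrete reading of the feasibility constraint $\Phi_j \le \rho$ in (4), the following sketch (illustrative only; the function name and the trace representation are our assumptions) computes the capacity overflow ratio of one PM from the workload traces of the VMs placed on it:

```python
from typing import List

def capacity_overflow_ratio(workloads: List[List[float]], capacity: float) -> float:
    """Compute the COR of one PM, per Section 4.

    workloads[i][t] is W_i(t) for the i-th VM placed on this PM;
    all traces are assumed to have the same length T.
    """
    T = len(workloads[0])
    overflows = 0
    for t in range(T):
        aggregated = sum(trace[t] for trace in workloads)  # sum_i x_ij W_i(t)
        if aggregated > capacity:                          # CO_j^t = 1
            overflows += 1
    return overflows / T                                   # Phi_j

# A placement is feasible for threshold rho if, on every PM j, there is
# no overflow at t = 0 and capacity_overflow_ratio(...) <= rho.
```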
In the following theorem, we prove that the BASC problem is NP-complete.

Theorem 1: The BASC problem is NP-complete.

Proof: We prove this theorem by reduction from the Bin Packing (BP) problem [31], which is NP-complete. The decision version of the BP problem is as follows: given $n$ items with sizes $s_1, s_2, \ldots, s_n \in (0, 1]$, can we pack them into no more than $k$ unit-sized bins?

Given an instance of the decision version of the BP problem, we can construct an instance of the decision version of our problem as follows: let $R_b^i = R_p^i = s_i$, $\forall 1 \le i \le n$; let $m = k$; let $C_j = 1$, $\forall 1 \le j \le m$; and let $\rho = 0$, i.e., capacity overflow is not allowed.

It is not hard to see that this construction can be finished in polynomial time; thus, we reduce solving the NP-complete BP problem to solving a special case of our problem, implying that our problem is NP-hard. It is easy to verify that the BASC problem is also in NP; the theorem follows immediately.

5 BURSTINESS-AWARE RESOURCE RESERVATION

In this section, we first present the main idea of our solution to the BASC problem; we then design a resource reservation strategy for a single PM, based on which we develop QUEUE, a complete server consolidation algorithm. In the end, we provide a concrete example to help readers better understand our algorithm.

5.1 Overview of QUEUE

We propose reserving a certain amount of physical resources on each PM to accommodate workload spikes. The main idea is to abstract the reserved spaces as blocks (i.e., serving windows in queueing theory). We give an informal illustration of the evolution process of our queueing system in Fig. 5.


Fig. 5. An illustration of the evolution process. (a) The original provisioning strategy for peak workload. (b) Gathering all $R_e$'s together to form a queueing system. (c) Reducing the number of blocks while still satisfying Equ. (3).

Initially, all VMs are provisioned with $R_b + R_e$, and each VM has its own block (denoted as $R_e$ in Fig. 5). A VM uses only its $R_b$ part during periods of normal workload; however, when a workload spike occurs, the extra $R_e$ part is put into use. We note that the collected $R_e$'s altogether form a queueing system: when a workload spike occurs in a VM, the VM enters the queueing system and occupies one of the idle blocks; when the spike disappears, the corresponding VM leaves the queueing system and releases the block. It is worth noting that there is no waiting space in the queueing system; thus, the PM capacity constraint would be violated if a workload spike occurred while all the blocks were occupied, which never happens when the number of blocks equals the number of co-located VMs (as shown in Fig. 5(b)). However, we may find that a certain number of blocks are idle for the majority of the time in Fig. 5(b), so we can reduce the number of blocks while incurring only very few capacity violations (as shown in Fig. 5(c)). Therefore, our goal becomes reserving the minimal number of blocks on each PM while the performance constraint in Equ. (3) is still satisfied.

5.2 Resource Reservation Strategy for a Single PM

In this subsection, we focus on resource reservation for a single PM. For the sake of convenience, we set the size of each block to the size of the maximum spike of all co-located VMs on a PM. In Section 6, we will present how to cope with heterogeneous workload spikes in an effort to further improve the performance of QUEUE. We also assume that all VMs have the same state switch probabilities, i.e., $p_{on}^i = p_{on}$ and $p_{off}^i = p_{off}$ for all $1 \le i \le n$. In Section 8, we will show how to cluster VMs when they have different state switch probabilities.

Suppose there are $k$ VMs on the PM of interest, and initially each VM $V_i$ occupies $R_b^i$ resources. We initialize the number of blocks reserved on this PM as $k$, and our objective is to reduce the number of blocks to $K$ ($K < k$) while the capacity overflow ratio does not exceed the threshold $\rho$. Let $\theta(t)$ be the number of busy blocks at time $t$, implying that there are $\theta(t)$ VMs in the ON state and $k - \theta(t)$ VMs in the OFF state. Let $O(t)$ and $I(t)$ denote the number of VMs that switch state from ON to OFF (i.e., VMs that leave the queueing system) and from OFF to ON (i.e., VMs that enter the queueing system) at time $t$, respectively. We use $\binom{x}{y}$ to denote "$x$ choose $y$". Since the workloads of VMs are mutually independent, we have

$$\Pr\{O(t) = x\} = \binom{\theta(t)}{x} p_{off}^x (1 - p_{off})^{\theta(t) - x}$$

and

$$\Pr\{I(t) = x\} = \binom{k - \theta(t)}{x} p_{on}^x (1 - p_{on})^{k - \theta(t) - x},$$

which suggest that both $O(t)$ and $I(t)$ follow binomial distributions:

$$O(t) \sim B(\theta(t), p_{off}), \qquad I(t) \sim B(k - \theta(t), p_{on}). \qquad (5)$$

Without loss of generality, we assume that the switch between two consecutive states of all VMs happens at the end of each time interval. Then we have the recursive relation of $\theta(t)$:

$$\theta(t + 1) = \theta(t) - O(t) + I(t). \qquad (6)$$

Combining Equs. (5) and (6), we see that the next state $\theta(t+1)$ depends only on the current state $\theta(t)$ and not on the past sequence of states $\theta(t-1), \theta(t-2), \ldots, \theta(0)$. Therefore, the stochastic process $\theta(0), \theta(1), \ldots$ with discrete time ($\{0, 1, 2, \ldots\}$) and discrete space ($\{0, 1, 2, \ldots, k\}$) is a Markov chain. The stochastic process is said to be in state $i$ if the number of busy blocks is $i$, and we denote by $p_{ij}$ the transition probability from state $i$ to state $j$. For integers $x \ge 0$ and $y$, if $y > x$ or $y < 0$, we let $\binom{x}{y}$ be 0. Then, $p_{ij}$ can be derived as follows:

$$\begin{aligned} p_{ij} &= \Pr\{\theta(t + 1) = j \mid \theta(t) = i\} \\ &= \sum_{r=0}^{i} \Pr\{O(t) = r,\ I(t) = j - i + r \mid \theta(t) = i\} \\ &= \sum_{r=0}^{i} \Pr\{O(t) = r \mid \theta(t) = i\} \times \Pr\{I(t) = j - i + r \mid \theta(t) = i\} \\ &= \sum_{r=0}^{i} \binom{i}{r} p_{off}^r (1 - p_{off})^{i-r} \binom{k - i}{j - i + r} p_{on}^{j-i+r} (1 - p_{on})^{k - j - r}. \end{aligned} \qquad (7)$$

In the above derivation, the first and second equalities are due to the definition of $p_{ij}$ and the recursive relation of $\theta(t)$, respectively. Observing that $O(t)$ and $I(t)$ are independent of each other, we get the third equality. The last equality is obtained by replacing $O(t)$ and $I(t)$ with their distributions.

The stochastic matrix $P = [p_{ij}]_{(k+1) \times (k+1)}$ is called the transition matrix of the Markov chain; note that it does not depend on the time $t$.
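Equ. (7) translates almost directly into code. The sketch below (our own illustration; `transition_matrix` is a hypothetical name) builds $P$, using the convention that an out-of-range binomial coefficient is zero:

```python
from math import comb

def transition_matrix(k: int, p_on: float, p_off: float) -> list:
    """Build the (k+1) x (k+1) transition matrix P of Equ. (7).

    State i means i blocks are busy; in one time slot, O(t) ~ B(i, p_off)
    spikes end and I(t) ~ B(k - i, p_on) spikes start.
    """
    P = [[0.0] * (k + 1) for _ in range(k + 1)]
    for i in range(k + 1):
        for j in range(k + 1):
            total = 0.0
            for r in range(i + 1):      # r VMs switch ON -> OFF
                e = j - i + r           # then e VMs must switch OFF -> ON
                if e < 0 or e > k - i:
                    continue            # (x choose y) = 0 out of range
                total += (comb(i, r) * p_off**r * (1 - p_off)**(i - r)
                          * comb(k - i, e) * p_on**e * (1 - p_on)**(k - i - e))
            P[i][j] = total
    return P

# Sanity check: every row of P is a probability distribution.
# assert all(abs(sum(row) - 1.0) < 1e-9 for row in transition_matrix(10, 0.2, 0.5))
```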


1045-9219 (c) 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information. This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TPDS.2015.2425403, IEEE Transactions on Parallel and Distributed Systems IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. –, NO. -, MONTH YEAR 6 p00 0 1 2 k p0k p02 p01 p12 p1k p10 p11 p2k p22 p21 p20 pk0 pk1 pk2 pkk Fig. 6. The transition graph of the stochastic process {θ(0), θ(1), ..., θ(t), ...}. The stochastic process is said to be in state i (1 ≤ i ≤ k) if the number of busy blocks is i. pij is the transition probability from state i to state j. (π (t) 0 , π (t) 1 , ..., π (t) k ) be the distribution of the chain at time t, i.e., π (t) h = P r{θ(t) = h}, ∀0 ≤ h ≤ k. For our chain, which is finite, π (t) is a vector of k+1 nonnegative entries such that ∑k h=0 π (t) h = 1. In linear algebra, vectors of this type are called stochastic vectors. Then, it holds that π (t+1) = π (t)P. Suppose π is a distribution over the state space {0, 1, 2, ..., k} such that, if the chain starts with an initial distribution π (0) that is equal to π, then after a transition, the distribution of the chain is still π (1) = π. Then the chain will stay in the distribution π forever: π P −→ π P −→ π P −→ · · · · · · Such π is called a stationary distribution. For our chain, we have the following theorem. Theorem 2: For the Markov chain defined in Fig. 6 and Equ. (7), given an arbitrary initial distribution π (0) , π (t) will converge to the same distribution π, which satisify π = πP, and limt→∞ (π (0)P t )h = πh, ∀0 ≤ h ≤ k. Proof: According to the Markov chain convergence theorem [14], it is sufficient to prove that, our chain is finite, aperiodic, and irreducible. Since the number of VMs that a single PM can host is finite (in our case, it is k), the state space of our chain is finite. As shown in Equ. (7), pii > 0 for any state i, so all states are aperiodic. Finally, from any state i, we can reach any other state j, so the transition graph is strongly connected, implying that our chain is irreducible. The theorem follows immediately. When the chain stays in the stationary distribution, we see that πh is equivalent to the proportion of times that the stochastic process is in state h. In our case, it means that πh denotes the proportion of time wherein the number of busy resource blocks is h. We then can derive the minimum number of blocks that keeps the capacity overflow ratio not larger than ρ. Denote by K the minimum number of blocks; we argue that K satisfies the following constraint: K ∑−1 h=0 πh < 1 − ρ ≤ ∑ K h=0 πh, (8) Algorithm 1 Calculating Minimum Blocks (CalMinBlk) Input: k, the number of co-located VMs on a PM; pon, the switch probability from “OFF” to “ON”; pof f , the switch probability from “ON” to “OFF”; ρ, capacity overflow ratio threshold Output: K, the minimum number of blocks that should be reserved on a PM 1: Calculate the transition matrix P using Equ. (7) 2: Prepare the coefficient matrix of the homogeneous system of linear equations described in Equ. (9) 3: Solve the the homogeneous system via Gaussian elimination and get the stationary distribution π 4: Calculate K from π using Equ. 
5.3 QUEUE

In this subsection, we present the complete server consolidation algorithm, QUEUE, which places a set of n VMs onto a set of m PMs.

As we mentioned before, we conservatively set the size of a single block to the size of the maximum spike of all the VMs on each PM, which may result in low utilization if the workload spikes of the co-located VMs differ too much. Therefore, in QUEUE, we tend to place VMs with similar R_e's on the same PM, in an effort to reduce the average size of a single block.


Algorithm 2 QUEUE
Input: V_1, V_2, ..., V_n, specifications of n VMs; H_1, H_2, ..., H_m, specifications of m PMs; p_on, the switch probability from "OFF" to "ON"; p_off, the switch probability from "ON" to "OFF"; ρ, the capacity overflow ratio threshold; d, the maximum number of VMs allowed on a PM
Output: X, a VM-to-PM placement matrix
1: // Preprocessing phase
2: MinN ← an array of size d + 1
3: MinN[0] ← 0
4: for each k ∈ [1, d] do
5:     MinN[k] ← CalMinBlk(k, p_on, p_off, ρ)
6: end for
7: // Sorting phase
8: Cluster VMs based on their R_e's
9: Sort clusters in descending order of R_e
10: In each cluster, sort VMs in descending order of R_b
11: Sort PMs in descending order of C
12: // FFD-based placement phase
13: X ← [0]_{n×m}
14: for each V_i in the sorted order do
15:     place V_i on the first H_j (in the sorted order) that satisfies Equ. (10), and set x_ij ← 1
16: end for
17: return X

Alg. 2 shows the details of QUEUE, which consists of three phases.

In the preprocessing phase (lines 1-6), we introduce an array named MinN, which stores the minimum number of blocks that need to be reserved on a PM, given k, p_on, p_off, and ρ. That is, if k VMs are placed on a PM, then we need to reserve MinN[k] blocks to bound the capacity overflow ratio. Without loss of generality, we assume that a single PM can host up to d VMs; thus we can calculate MinN[k] for all possible k ∈ [1, d] before VM placement. We also let MinN[0] = 0 for compatibility. Fig. 7 shows the array MinN under different settings of p_on, p_off, and ρ. We notice that, for the same k, when the capacity overflow ratio threshold ρ increases (from green circles to red triangles), MinN[k] decreases; when p_on increases (from green circles to blue squares), MinN[k] increases. These observations are consistent with our intuition.

Fig. 7. The minimum number, MinN[k], of blocks that should be reserved on a PM, given different settings of p_on, p_off, and ρ (series: p_on = 0.1, p_off = 0.5, ρ = 0.10; p_on = 0.1, p_off = 0.5, ρ = 0.05; p_on = 0.2, p_off = 0.5, ρ = 0.05).

During the sorting phase (lines 7-11), we first cluster all VMs so that VMs with similar R_e's are in the same cluster¹, and then sort these clusters in descending order of R_e. In each cluster, we sort the VMs in descending order of R_b. We also sort the PMs in descending order of physical capacity C. This is a cluster-level heuristic that lets co-located VMs have similar R_e's, so as to minimize the average size of blocks on all PMs (a short sketch of this phase is given below).

1. We first use linear time to find the maximum, maxR, and minimum, minR, of the n R_e's, then partition all R_e's into c clusters, where the i-th cluster contains those R_e's that satisfy minR + (i−1)/c · (maxR − minR) ≤ R_e < minR + i/c · (maxR − minR), for i = 1, 2, ..., c.
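The fragment below (a minimal sketch, not the paper's code; VM is a hypothetical record type) implements the equal-width bucketing of footnote 1 followed by the two-level descending sort. On the eight VMs of Section 5.4 with c = 2, it yields the order V5, V3, V1, V2, V4, V7, V6, V8 reported there.

```python
from dataclasses import dataclass

@dataclass
class VM:                 # hypothetical record type for this sketch
    rb: float             # normal workload R_b
    re: float             # spike size R_e

def sorting_phase(vms, c):
    """Sorting phase of QUEUE (Alg. 2, lines 8-10): bucket VMs by R_e into
    c equal-width clusters (footnote 1), order the clusters by R_e
    descending, and order VMs inside each cluster by R_b descending."""
    min_r = min(vm.re for vm in vms)
    max_r = max(vm.re for vm in vms)
    width = (max_r - min_r) / c or 1.0                 # guard: all R_e equal
    clusters = [[] for _ in range(c)]
    for vm in vms:
        i = min(int((vm.re - min_r) / width), c - 1)   # clamp R_e = max_r
        clusters[i].append(vm)
    ordered = []
    for cluster in reversed(clusters):                 # descending R_e ranges
        ordered.extend(sorted(cluster, key=lambda v: v.rb, reverse=True))
    return ordered
```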
In the third phase (lines 12-16), we adopt the First Fit Decreasing (FFD) [31] heuristic to place VMs on PMs. For each V_i in the sorted order, we place V_i on the first PM H_j that satisfies the following constraint:

    \max\{R_e^i, \max\{R_e^s \mid s \in T_j\}\} \times MinN[|T_j| + 1] + R_b^i + \sum_{s \in T_j} R_b^s \le C_j,    (10)

where T_j denotes the set of indices of the VMs that have already been placed on H_j, and C_j is the capacity of H_j. We note that the size of the reserved resources is the block size multiplied by the number of blocks, where the block size is conservatively set to the maximum R_e among all co-located VMs. Therefore, this constraint indicates that VM V_i can be placed on H_j if and only if the sum of the new size of the queueing system and the new total size of the R_b's does not exceed the physical capacity of H_j. If Equ. (10) holds, we set x_ij to 1. At the end of QUEUE, we return the VM-to-PM mapping result, i.e., X (a sketch of this placement loop is given after the complexity analysis below).

Finally, we present the complexity analysis of QUEUE. In the preprocessing phase, Alg. 1 is invoked at most d + 1 times. Recall that the time complexity of Alg. 1 is O(k^3); thus, this phase costs O(d^4) time. The simple clustering phase takes O(n) time; more sophisticated clustering techniques are out of the scope of this paper. The sorting phase takes O(n log n) time. The FFD-based placement phase takes O(mn) time. Overall, the time complexity of the complete consolidation algorithm is O(d^4 + n log n + mn).
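Continuing the sketch, the feasibility check of Equ. (10) inside the FFD loop might look as follows (illustrative; the PM record and the explicit d argument are our additions, and VM is the hypothetical type from the previous fragment).

```python
from dataclasses import dataclass, field

@dataclass
class PM:                                       # hypothetical record type
    capacity: float                             # C_j
    vms: list = field(default_factory=list)     # VMs already placed, T_j

def fits(pm, vm, min_n, d):
    """Feasibility check of Equ. (10) for placing vm on pm; also rejects
    PMs that already host d VMs (MinN has only d + 1 entries)."""
    if len(pm.vms) >= d:
        return False
    block = max([vm.re] + [v.re for v in pm.vms])   # conservative block size
    reserved = block * min_n[len(pm.vms) + 1]       # queueing-system size
    normal = vm.rb + sum(v.rb for v in pm.vms)      # total of the R_b's
    return reserved + normal <= pm.capacity

def ffd_place(sorted_vms, pms, min_n, d):
    """FFD-based placement phase (Alg. 2, lines 12-16); `pms` is assumed
    to be sorted in descending order of capacity (line 11)."""
    x = {}                                          # i -> j, i.e., x_ij = 1
    for i, vm in enumerate(sorted_vms):
        for j, pm in enumerate(pms):
            if fits(pm, vm, min_n, d):
                pm.vms.append(vm)
                x[i] = j
                break                               # first fit
    return x
```

With MinN = [0, 1, 1, 2, 2], d = 4, and the sorted VMs of Section 5.4 on three PMs of capacity 100, this check reproduces the placement of Fig. 8: V5, V3, V1, and V6 on H1, the rest on H2, and H3 unused.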

This article has been accepted for publication in a future issue of this journal,but has not been fully edited.Content may change prior to final publication.Citation information:DOI 10.1109/TPDS.2015.2425403,IEEE Transactions on Parallel and Distributed Systems IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS,VOL.-.NO.-,MONTH YEAR As we mentioned in Section 5.2,we assume that all VMs have the same state switch probabilities.The PM =25 R1=15 threshold p of capacity overflow ratio is set to 0.05.The R3=20 maximum number d of VMs allowed on a single PM is 4.We now present the details of running QUEUE. In the preprocessing phase,we are to generate the array MinN.Taking k =4 for example,according to Equ.(7),we can obtain the transition matrix P: PM =15 R=20 R=15 0.65610.29160.04860.00360.0001 0.36450.48600.1350 0.01400.0005 Normal Workload ☒Qucucing Block P 0.20250.45000.29500.05000.0025 0.11250.35000.3750 0.15000.0125 Fig.8.The final placement of our example using QUEUE, 0.06250.25000.3750 0.25000.0625 where only 2 PMs are needed.We conservatively round the block size up to the maximum spike size of all co- By solving m=TP,we have the stationary distribution: located VMs on a PM,e.g.,on PM H1,the size of each queueing block is max{R,R,R2,Re}=15. π=(0.4823.0.3858.0.1157.0.0154.0.0008) Since To +11 for every element zi and 1+x2+...+xg=k the queueing blocks,if resources are provisioned for For example,{1,2,4}and {1,3,3}are two possible 3- peak workload,then 3 PMs are needed to host these partitions of integer 7.Denote by pa(k)the number of 8 VMs,i.e.,V5,V3,and V6 are on H1;Vi,V2,and V4 are different g-partitions of an integer k,and by p(k)the on H2;and the remaining two VMs are on H3. number of all possible partitions of an integer k.For example,p3(7)=4,p(7)=15.According to [33],we 6 COPING WITH HETEROGENOUS SPIKES have the following recursive relations of pa(k): In this section,we present how to improve the consol- Pg(k)=Pg-1(k-1)+Pg(k-g). (11) idation performance of QUEUE through more careful Alg.3 shows the main steps to find the optimal treatment of heterogenous workload spikes. ordered partition.Without loss of generality,we assume Remember that,in the last section,we conservatively that,R,R2,..,R are sorted in descending order of round the block size up to the maximum spike size of their values.The array MinN is computed using Alg.1. all co-located VMs on each PM,as shown in Fig.8.It is Since the number of all possible partitions of an integer k easy to see that,this kind of rounding may waste some is very large(logp(k)=O(vk)[33]),enumerating them 1045-9219(c)2015 IEEE.Personal use is permitted,but republication/redistribution requires IEEE permission.See http://www.ieee.org/publications standards/publications/rights/index.html for more information

As we mentioned in Section 5.2, we assume that all VMs have the same state switch probabilities. The threshold ρ of the capacity overflow ratio is set to 0.05, and the maximum number d of VMs allowed on a single PM is 4. We now present the details of running QUEUE.

In the preprocessing phase, we generate the array MinN. Taking k = 4 for example, according to Equ. (7), we obtain the transition matrix

    P = \begin{pmatrix}
    0.6561 & 0.2916 & 0.0486 & 0.0036 & 0.0001 \\
    0.3645 & 0.4860 & 0.1350 & 0.0140 & 0.0005 \\
    0.2025 & 0.4500 & 0.2950 & 0.0500 & 0.0025 \\
    0.1125 & 0.3500 & 0.3750 & 0.1500 & 0.0125 \\
    0.0625 & 0.2500 & 0.3750 & 0.2500 & 0.0625
    \end{pmatrix}

By solving π = πP, we have the stationary distribution π = (0.4823, 0.3858, 0.1157, 0.0154, 0.0008). Since π_0 + π_1 < 1 − ρ ≤ π_0 + π_1 + π_2, we have K = 2, which is also the value of MinN[4]. Similarly, we can find that MinN[0] = 0, MinN[1] = MinN[2] = 1, and MinN[3] = 2 (see the green circles in Fig. 7).

In the clustering phase, these 8 VMs are first clustered into two groups: V_1, V_2, V_3, and V_5 form the first group, and the rest form the second group. In each cluster, we sort the VMs in descending order of their R_b's. The final order of the VMs is V_5, V_3, V_1, V_2, V_4, V_7, V_6, and V_8.

In the third phase of QUEUE, we first try to place V_5 on H_1. Since R_b^5 + R_e^5 < C_1, it succeeds and we set x_51 = 1. We then try to place V_3 on H_1. According to Equ. (10), we have

    max{R_e^3, max{R_e^s | s ∈ {5}}} × MinN[|{5}| + 1] + R_b^3 + R_b^5 < C_1,

so we set x_31 = 1. We then try to place V_1 on H_1, which also succeeds, and we set x_11 = 1. Similarly, we find that, in the final placement, V_5, V_3, V_1, and V_6 are placed on H_1, and the rest are placed on H_2. Fig. 8 shows the final placement of our example using QUEUE. Note that we conservatively round the block size up to the maximum spike size of all co-located VMs on a PM; e.g., on PM H_1, the size of each queueing block is max{R_e^5, R_e^3, R_e^1, R_e^6} = 15.

In contrast, without opportunistic resource sharing in the queueing blocks, if resources are provisioned for peak workload, then 3 PMs are needed to host these 8 VMs: V_5, V_3, and V_6 are on H_1; V_1, V_2, and V_4 are on H_2; and the remaining two VMs are on H_3.

Fig. 8. The final placement of our example using QUEUE, where only 2 PMs are needed (H_1 hosts V_5, V_3, V_1, V_6 with two queueing blocks of size 15; H_2 hosts V_2, V_7, V_8, V_4 with two queueing blocks of size 13).

6 COPING WITH HETEROGENEOUS SPIKES

In this section, we present how to improve the consolidation performance of QUEUE through a more careful treatment of heterogeneous workload spikes.

Remember that, in the last section, we conservatively round the block size up to the maximum spike size of all co-located VMs on each PM, as shown in Fig. 8. It is easy to see that this kind of rounding may waste some
physical resources. Let us take PM H_1 in Fig. 8 for example: by the current design of QUEUE, a total of 30 units of physical resources need to be reserved for workload spikes. However, instead of considering the four VMs (i.e., V_5, V_3, V_1, and V_6) together, we can partition them into two groups and consider each group separately. For instance, we choose to have V_5 and V_3 in the first group, and V_1 and V_6 in the second group. For the former group, since MinN[2] = 1, we only have to reserve one block with a size of max{R_e^5, R_e^3} = 15; for the latter group, we also have to reserve one block, with a size of max{R_e^1, R_e^6} = 13. In doing so, a total of 28 units of resources are reserved, which is less than in the previous case. We therefore have the following intuition: on each PM, we can try to partition the co-located VMs into several groups and consider them separately, so as to improve QUEUE by reducing the amount of resources reserved for workload spikes.

A key problem in achieving our goal is how to partition a set of k VMs into non-overlapping groups, i.e., how to partition an integer k, which is an interesting and important problem in number theory [32]. A g-partition of an integer k is a multi-set {x_1, x_2, ..., x_g} with x_i ≥ 1 for every element x_i and x_1 + x_2 + ··· + x_g = k. For example, {1, 2, 4} and {1, 3, 3} are two possible 3-partitions of the integer 7. Denote by p_g(k) the number of different g-partitions of an integer k, and by p(k) the number of all possible partitions of an integer k; for example, p_3(7) = 4 and p(7) = 15. According to [33], we have the following recursive relation for p_g(k):

    p_g(k) = p_{g-1}(k - 1) + p_g(k - g).    (11)
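A memoized implementation of Equ. (11) takes only a few lines (a sketch; the base cases are our choice, fixed by checking against p_3(7) = 4 and p(7) = 15):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def p_g(g, k):
    """Number of g-partitions of k (Equ. (11)): partitions of k into
    exactly g parts, each part >= 1."""
    if g == 0:
        return 1 if k == 0 else 0   # the empty partition only for k = 0
    if k < g:
        return 0                    # g positive parts need a sum >= g
    return p_g(g - 1, k - 1) + p_g(g, k - g)

def p(k):
    """Total number of partitions of k."""
    return sum(p_g(g, k) for g in range(1, k + 1))

assert p_g(3, 7) == 4 and p(7) == 15   # matches the paper's example
```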


Alg. 3 shows the main steps of finding the optimal ordered partition. Without loss of generality, we assume that R_e^1, R_e^2, ..., R_e^k are sorted in descending order of their values; the array MinN is computed using Alg. 1. Since the number of all possible partitions of an integer k is very large (log p(k) = O(√k) [33]), enumerating all of them would be time-consuming, so we use G to restrict the maximum number of groups.

Algorithm 3 Finding Optimal Partition (FidOptPat)
Input: MinN, an array that stores the minimum numbers of blocks that need to be reserved on a PM, with respect to k, p_on, p_off, and ρ; R_e^1, R_e^2, ..., R_e^k, a set of workload spikes sorted in descending order of their values; G, the maximum number of groups
Output: S, an ordered partition
1: S ← {k}
2: r_min ← MinN[k] × R_e^1
3: Let x_0 ← 0 for convenience
4: for g = 2 to G do
5:     Generate all g-partitions of k (using Equ. (11))
6:     for each g-partition {x_1, ..., x_g} do
7:         for each permutation x′_1, ..., x′_g of x_1, ..., x_g do
8:             r ← \sum_{i=1}^{g} MinN[x′_i] × \max\{R_e^j \mid \sum_{h=0}^{i-1} x′_h < j \le \sum_{h=0}^{i} x′_h\}
9:             if r < r_min then
10:                r_min ← r, S ← {x′_1, ..., x′_g}
11:            end if
12:        end for
13:    end for
14: end for
15: return S

We use S to record the best partition so far (line 1), and r_min to record the amount of resources needed by that partition (line 2). For each integer g (2 ≤ g ≤ G), we first generate all possible g-partitions of k using Equ. (11); then, for each permutation x′_1, ..., x′_g of a g-partition x_1, ..., x_g, we compute the amount of resources needed by this ordered partition (line 8) and compare it with r_min: if this ordered partition uses fewer resources than S, we update S and r_min (lines 9-11). Finally, the optimal ordered partition S is returned.

It takes O(p(k)) time to generate all possible partitions of an integer k [34]. Since p_g(k) ~ k^{g-1} / (g!(g-1)!), generating all possible g-partitions (1 ≤ g ≤ G) requires O(\sum_{g=1}^{G} k^{g-1} / (g!(g-1)!)) time. Taking G = 3 for example: since p_1(k) = O(1), p_2(k) = O(k), and p_3(k) = O(k^2), generating all possible g-partitions (1 ≤ g ≤ 3) requires O(k^2) time. For each possible permutation of a partition, evaluating the amount of resources needed requires O(k) time; thus, the total time complexity of Alg. 3 is O(k \sum_{g=1}^{G} k^{g-1} / (g-1)!). In practice, we can choose a proper G to achieve a balance between complexity and optimality.

We use PM H_2 in Fig. 8 as an example, where the sizes of the four spikes are 13, 10, 10, and 9. Without loss of generality, we let R_e^1 = 13, R_e^2 = 10, R_e^3 = 10, and R_e^4 = 9. We consider all ordered 1-partitions, 2-partitions, and 3-partitions of k = 4, i.e., G = 3. Fig. 9 shows the results; the optimal ordered partition is {2, 2}.

Ordered partition | Amount of resources reserved
{4}       | MinN[4] × 13 = 26
{1, 3}    | MinN[1] × 13 + MinN[3] × 10 = 33
{3, 1}    | MinN[3] × 13 + MinN[1] × 9 = 35
{2, 2}    | MinN[2] × 13 + MinN[2] × 10 = 23
{1, 1, 2} | MinN[1] × 13 + MinN[1] × 10 + MinN[2] × 10 = 33
{1, 2, 1} | MinN[1] × 13 + MinN[2] × 10 + MinN[1] × 9 = 32
{2, 1, 1} | MinN[2] × 13 + MinN[1] × 10 + MinN[1] × 9 = 32

Fig. 9. All possible ordered partitions on PM H_2 in Fig. 8, where the optimal ordered partition is {2, 2}.
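The following sketch mirrors Alg. 3 (illustrative, not the authors' code): it enumerates the g-partitions, scores every distinct ordering of each against the sorted spikes as in line 8, and keeps the cheapest ordered partition.

```python
from itertools import permutations

def g_partitions(k, g, smallest=1):
    """Yield all g-partitions of k as non-decreasing tuples."""
    if g == 1:
        if k >= smallest:
            yield (k,)
        return
    for first in range(smallest, k // g + 1):
        for rest in g_partitions(k - first, g - 1, first):
            yield (first,) + rest

def find_opt_partition(min_n, spikes, G):
    """Sketch of Alg. 3 (FidOptPat); spikes = R_e^1 >= ... >= R_e^k."""
    k = len(spikes)
    best, r_min = (k,), min_n[k] * spikes[0]           # lines 1-2: one group
    for g in range(2, G + 1):
        for part in g_partitions(k, g):
            for ordered in set(permutations(part)):    # distinct orderings
                r, start = 0, 0
                for x in ordered:                      # line 8: group cost
                    r += min_n[x] * max(spikes[start:start + x])
                    start += x
                if r < r_min:
                    r_min, best = r, ordered           # lines 9-11
    return best, r_min

# Fig. 9 example: PM H2 with spikes 13, 10, 10, 9 and MinN = [0, 1, 1, 2, 2]
print(find_opt_partition([0, 1, 1, 2, 2], [13, 10, 10, 9], G=3))
# -> ((2, 2), 23)
```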
Fig. 10. The computation cost of QUEUE (in milliseconds) with varying n and d (d = 16, 32, 64, 128). The cost of the actual placement varies with different physical configurations and thus is not included.

7 PERFORMANCE EVALUATION

In this section, we conduct extensive simulations and testbed experiments to evaluate the proposed algorithms under different settings and to reveal insights into the performance of the proposed design.

7.1 Simulation Setup

Two commonly used packing strategies are considered here, both of which use the First Fit Decreasing heuristic for VM placement. The first strategy provisions VMs for peak workload (FFD by R_p), while the second provisions VMs for normal workload (FFD by R_b). Provisioning for peak workload is usually applied for the initial VM placement [1], where cloud tenants choose the peak workload as the fixed capacity of the VM to guarantee application performance. On the other hand, provisioning for normal workload is usually applied in the consolidation process, since at runtime the majority of VMs are in the OFF state, i.e., most of the VMs only carry normal workloads.

We consider the situations both without and with live migration, where different metrics are used to evaluate the runtime performance. For experiments without live migration, where only local resizing is allowed to dynamically provision resources, we use the capacity overflow ratio (COR) defined in Section 4 as the performance metric. In our testbed experiments, we then add live migration to the system to simulate a more realistic computing cluster, in which the number of migrations reflects the quality of performance, and the number of active PMs reflects the level of energy consumption.

According to the results in Section 5.3, the time complexity of QUEUE is O(d^4 + n log n + mn). In Fig. 10, we present the experimental computation cost of QUEUE with reasonable d and n values. We see that our algorithm incurs very little overhead with moderate n and d values; the cost variation with respect to n is not even distinguishable at the millisecond level.

To evaluate the consolidation performance of QUEUE in different settings, we choose R_b and R_e uniformly at random from a certain range for each VM, and repeat the experiments multiple times for convergence. The capacity overflow ratio (COR) is used here as the metric of runtime performance. Since FFD by R_p never incurs capacity violations, it is not included in the performance assessment.

Figs. 11 and 12 show the packing results and the COR results, respectively. The common settings for the three sub-figures are as follows: ρ = 0.01, d = 16, p_on = 0.01, p_off = 0.09, and C_j ∈ [80, 100]. As we discussed in Section 3, p_on and p_off are set based on the observation that workload spikes usually occur with low frequency and short duration. Three settings of R_b and R_e are used for the experiments: R_b = R_e, R_b > R_e, and R_b < R_e.

Fig. 11. Packing results given by the three algorithms (number of PMs used versus number of VMs): (a) R_b = R_e; (b) R_b > R_e, R_b ∈ [12, 20], R_e ∈ [2, 10]; (c) R_b < R_e.

Fig. 12. Average COR of QUEUE and RB under varying numbers of VMs and capacity overflow ratio thresholds (from 0.1% to 100%): (a) R_b = R_e; (b) R_b > R_e; (c) R_b < R_e.

Compared with RP, QUEUE saves around 30% and 18% of the PMs in Figs. 11(a) and 11(b), respectively. FFD by R_b (denoted as RB) uses even fewer PMs, but its runtime performance is disastrous according to Fig. 12: the COR of RB is unacceptably high. With larger spike sizes (R_b < R_e), the packing result of QUEUE is better, because more PMs are saved compared with RP, and fewer additional PMs (for live migrations) are used compared with RB (see Fig. 11(c)). At the same time, with larger spike sizes the average COR of QUEUE is slightly higher, but it is still bounded by ρ (see Fig. 12(c)). The case of smaller spike sizes shows the opposite results.

We mention that there are very few PMs with CORs slightly higher than ρ in each experiment. This is because a Markov chain needs some time to enter its stationary distribution. Though we did not theoretically evaluate whether the chain constructed in Section 5 is rapid-mixing, in our experiments we find that the time period before the chain enters its stationary distribution is very short.
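This warm-up effect is easy to see in a minimal Monte Carlo sketch of a single PM (our construction, not the paper's simulator): k VMs flip between OFF and ON according to the two-state chain, and we count the slots in which more than K spikes are active. With the Section 5.4 parameters, the measured COR agrees with the stationary prediction π_3 + π_4 ≈ 0.0162, well under ρ = 0.05, after a short warm-up.

```python
import random

def simulate_cor(k, p_on, p_off, K, T=100_000, seed=1):
    """Empirical capacity overflow ratio of one PM hosting k VMs with
    K reserved blocks: the fraction of slots with more than K VMs ON."""
    rng = random.Random(seed)
    on = [False] * k              # all VMs start in the OFF (normal) state
    overflow = 0
    for _ in range(T):
        on = [(rng.random() >= p_off) if s else (rng.random() < p_on)
              for s in on]        # per-VM two-state transitions
        overflow += sum(on) > K
    return overflow / T

# k = 4, p_on = 0.1, p_off = 0.5, K = 2, as in Section 5.4.
print(simulate_cor(4, 0.1, 0.5, 2))   # close to 0.0162
```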
