Availability "Weak Point" Analysis over an SOA Deployment Framework

Lei Xie¹, Jing Luo², Jie Qiu², John A. Pershing³, Ying Li², Ying Chen²
¹ Department of Computer Science, Nanjing University, xielei@dislab.nju.edu.cn
² IBM China Research Lab, {jingluo, qiujie, lying, yingch}@cn.ibm.com
³ IBM T. J. Watson Research Center, Hawthorne, NY 10532, pershng@us.ibm.com

Abstract: Availability is one of the important factors to be considered for business-driven IT service management. This paper addresses the issue of analyzing what we call availability weak-points in an SOA deployment framework, leveraging workflow definitions to specify the high availability requirement at the business process level. In our weak-point analysis framework, we present an effective analysis methodology to calculate the optimal high availability solution with minimum cost, while meeting the business level availability requirements. We evaluate the weak-point analysis methodology, and show that our methodology can identify a near-optimal solution for availability enhancement over the SOA deployment framework.

I. INTRODUCTION

Service-Oriented Architecture (SOA) has opened up new opportunities for organizations seeking more flexibility and responsiveness to business demands over large-scale deployed IT infrastructures. Availability of computing resources is an important consideration for IT service management. Note, though, that the actual availability requirements are dictated by the various business processes and services that are supported by the IT infrastructure; the availability requirement of an individual resource exists simply to support the overall availability of the business processes and services. Business services today are not only doing more work but also have more users, often spread out across the globe and requiring near 24/7 availability.
The basic principle of high availability management for IT infrastructure is to eliminate single points of failure by providing redundancy, which can be implemented to varying degrees with a wide range of associated cost and performance considerations. Common high availability techniques include clustering [1], hot failover mechanisms [2] [3], recursive restartability [4], redundant arrays of independent disks (RAIDs), and other approaches. At the business process level of enterprise applications, the availability metric is actually an end-to-end availability; thus, the business process can be depicted as a workflow. For general applications the workflow crosses the typical three-tiered IT infrastructure: web tier, middleware tier, database tier. Therefore, according to the workflow specification, the availability of the chains of resources over the IT infrastructure forms the end-to-end availability.

Footnote: This work was done while the first author was an intern at IBM China Research Lab.

Note that, even if all single points of failure have been made redundant, some of these (redundant) resources still may not exhibit the necessary availability level to satisfy the requirements of the business processes. We refer to this situation as an availability weak-point, and it may be necessary to introduce even more redundancy in order to meet the availability requirements.

The key to delivering successful, robust solutions is determining the right level of high availability for the IT infrastructure [5]: not enough could result in costly outages, and too much could be an expensive waste. So it makes sense to perform availability analysis over the distributed IT infrastructure in conjunction with business level requirements, and then plan high availability solutions accordingly. Therefore, detecting and analyzing the availability weak-points in the SOA deployment topology is the premise for applying high availability (HA) solutions [1] [2] [3] over the IT infrastructure.
In this paper we propose a workflow-based weak-point analysis methodology over the SOA deployment framework. The novelty of our approach is a framework that analyzes the weak-points and gives indications for optimal HA solutions over the deployment topology. Using our framework, it can be determined which components from the topology need to be HA enhanced, and to what level they should be enhanced to satisfy the business-level HA requirements, while keeping the overall cost close to the minimum.

The rest of the paper is organized as follows. In Section II we describe the basic structure of our availability weak-point analysis framework. We introduce the workflow-based methodology for calculating a near-optimal solution in Section III. Section IV presents an experimental evaluation of the efficiency of our analysis framework. In Section V we review related work. Section VI concludes the paper.

II. THE WEAK-POINT ANALYSIS FRAMEWORK

The overall weak-point analysis framework is shown in Fig. 1. The framework includes the following three major
modules:

Fig. 1. Framework for Workflow based High Availability Analysis

Fig. 2. Workflow Mapping over the SOA Deployment Topology

The Workflow Specification Module extracts relevant information from the business process workflows. It takes as input the business level workflows, which specify the process flow at the business level using BPEL [6] structure files, together with the availability requirement for each business workflow. The availability requirement can be expressed as MTBF/(MTBF + MTTR), where MTBF represents mean time between failures and MTTR represents mean time to repair, so the availability requirement lies in the range from 0 to 1 (in reality, it probably lies in the range from 0.9 to 1). Then, this module maps the workflow from the service (business) level to the IT infrastructure level according to the SOA deployment topology. Finally, it produces the extracted workflow mapping matrix, which specifies the necessary information about the relevant resources for each workflow at the IT infrastructure level.

The Weak-Point Analysis Module performs weak-point analysis based on the mapping matrix created by the Workflow Specification Module, plus MTBF, MTTR, and cost metrics for the various IT resources. It calculates a near-optimal solution with minimum overall cost while meeting the business level availability requirements; this solution indicates which resources need to be HA enhanced, and suggests the size of the clusters for those resources. First, the current availability capability of each workflow is calculated from the component failure behavior parameters obtained from historical experience (such as MTBF and MTTR). Then, the module checks whether the availability requirement for each business workflow has been satisfied; for those unsatisfied workflows, the resources where the relevant services are deployed should have their availability enhanced through the use of clustering or a "hot standby" configuration.
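As a minimal sketch of the availability ratio MTBF/(MTBF + MTTR) used above, the following Python function computes the steady-state availability of a single resource. The function name, the choice of hours as the unit, and the sample figures are illustrative assumptions and do not come from the paper.

```python
def availability(mtbf_hours: float, mttr_hours: float) -> float:
    """Return MTBF / (MTBF + MTTR) for one resource.

    Both arguments must use the same time unit (hours is an assumption here).
    """
    if mtbf_hours < 0 or mttr_hours < 0:
        raise ValueError("MTBF and MTTR must be non-negative")
    if mtbf_hours + mttr_hours == 0:
        raise ValueError("MTBF + MTTR must be positive")
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical resource: fails on average every 990 hours, takes 10 hours to repair.
sample = availability(990.0, 10.0)
print(f"availability = {sample:.4f}")
```

A resource whose MTTR is zero has availability 1, and the value falls toward 0 as repair time dominates, which matches the stated range from 0 to 1.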
The Weak-Point Analysis Module calculates the optimal HA solution over the topology, subject to a utility function, producing the HA enhancement parameters for each relevant resource. This module utilizes a Lagrangian constrained optimization algorithm to achieve a near-optimal solution for HA enhancement; we describe this algorithm in detail in Section III.

The HA Pattern Mapping Module applies relevant HA patterns to the identified weak-point IT resources according to the optimal solution produced by the Weak-Point Analysis Module. These patterns may be generic (e.g., clustering, hot standby) or product-specific (e.g., DB2 High Availability and Disaster Recovery). This module finally produces an HA-enhanced deployment topology which satisfies the business level availability requirement at the minimum overall cost.

In our weak-point analysis framework, we specify the usage of the IT resources from the business process level, leveraging BPEL [6] to specify the business process workflows that define the processes and services we are interested in. Over the SOA deployment topology, we can further map the workflows to the application and IT infrastructure levels by inspecting the hosting and dependency relationships that are defined in the IT infrastructure. As Fig. 2 shows, the hosting relationships are specified over the SOA deployment topology; through the hosting relationship, Workflow 1 and Workflow 2 are mapped to the IT infrastructure level. Through this workflow mapping, we can extract the relevant resource list at the IT infrastructure level for each workflow. Based on the resource lists and HA requirements for each workflow over the SOA deployment topology, we can calculate the optimal HA enhancement solution for the current deployment topology.

III. WORKFLOW BASED WEAK-POINT ANALYSIS METHODOLOGY

The key contribution of our weak-point analysis methodology is the use of business-level workflow specifications to specify availability requirements and to map the flow of transactions through the IT infrastructure. In this section we describe our analysis methodology, which is responsible for recommending HA solutions such that business-level availability requirements are met while keeping the overall cost close to the minimum.

A. Workflow Specification

After analysis by the Workflow Specification Module described above, we have extracted the list of IT resources which are involved in each workflow. Now we give the following definitions. We assume there exist $n$ workflows over the SOA deployment topology, denoted by $W_1, W_2, W_3, \ldots, W_n$, and these workflows are specified with availability requirements
$P_1, P_2, P_3, \ldots, P_n$, where $0 < P_i < 1$. We also assume that there are $m$ IT resources in the infrastructure, denoted by $C_1, C_2, \ldots, C_m$. Each resource consists of a "stack" of hardware and software components; for instance, an x86 server, a Linux OS, and a WebSphere Application Server.

TABLE I
THE WORKFLOW-RESOURCE RELATIONSHIP MATRIX

            C1      C2      C3      ...     Cm
W1 (P1)     R1,1    R1,2    R1,3    ...     R1,m
W2 (P2)     R2,1    R2,2    R2,3    ...     R2,m
...         ...     ...     ...     ...     ...
Wn (Pn)     Rn,1    Rn,2    Rn,3    ...     Rn,m

Fig. 3. Example BPEL Workflow

We construct a matrix to capture the workflow-resource relationship. Table I shows the matrix; the relationship between workflow $W_i$ and resource $C_j$ is $R_{i,j}$, where $R_{i,j}$ is an integer value giving the number of references to resource $C_j$ from workflow $W_i$, and is set to 0 when resource $C_j$ is not included in the resource list of $W_i$. For example, Fig. 3 shows a workflow with two services, which are mapped to three IT infrastructure resources, $C_1$, $C_2$ and $C_3$, plus one resource $C_4$ which is not included in the workflow. Note that, at the application level, Component 1 depends on Component 2 to implement Service 1, and Component 2 depends on Component 3 to implement Service 2, so Service 1 relies on $C_1$, $C_2$ and $C_3$, while Service 2 relies on $C_2$ and $C_3$. We denote the availability capability of resource $C_j$ as $P(C_j)$. The availabilities of the two services are therefore $P(C_1) \cdot P(C_2) \cdot P(C_3)$ and $P(C_2) \cdot P(C_3)$; thus, the availability of the workflow is $P(C_1) \cdot P(C_2)^2 \cdot P(C_3)^2$, and the matrix row for Workflow 1 is set to $[1, 2, 2, 0]$. For standalone services which have no dependency relationships, we can simply set $R_{i,j}$ to 1 for all the referenced resources, and 0 for unreferenced resources.

B. Optimal Solution Calculation

Given the workflow-resource relationship matrix, we can calculate the current availability capability for each workflow according to its resource list.
Assume that the availabilities of the $m$ resources are $P(C_1), P(C_2), P(C_3), \ldots, P(C_m)$; these availabilities can be derived from historical measurements or, perhaps, from data obtained from the manufacturer. For this scenario, we assume that the relevant resources are all standalone at the initial step of the calculation, so we can calculate the current availability for each workflow as formula 1 shows:

\[
P(W_i) = \prod_{j=1}^{m} P(C_j)^{R_{i,j}} \tag{1}
\]

$P(W_i)$ is the current availability capability for workflow $W_i$. We compare it with the workflow's availability requirement $P_i$: if $P(W_i) \ge P_i$, the requirement is met; otherwise, the availability requirement is unsatisfied, and some resources in the resource list of workflow $W_i$ need to have their availability enhanced through the deployment of an HA pattern to meet the availability requirement. This is an optimization problem: which resources should be enhanced for availability to meet the availability requirement, while keeping the HA enhancement cost as low as possible?

A conventional method of addressing an optimization problem is to enumerate all possible solutions and compare their costs; however, this approach is computationally expensive for all but the simplest problems, and becomes intractable when the number of resources is large. The work in [7] proposes an approach that searches for the optimal solution through multi-tier system design, based on exhaustive iteration. In contrast, our weak-point analysis methodology calculates a near-optimal solution for HA enhancement using the method of Lagrange multipliers [8], which is computationally efficient.

Assume the number of workflows whose availability requirements have not yet been met is $n$; for workflow $W_i$ we define the enhancement parameter $PW_i$ as the factor by which that workflow's current availability needs to be enhanced to meet the availability requirement $P_i$:

\[
PW_i = \frac{P_i}{P(W_i)} \tag{2}
\]

By definition, $PW_i \ge 1$.
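Formulas 1 and 2 can be sketched in a few lines of Python. The reference row $[1, 2, 2, 0]$ is the Fig. 3 example from Section III-A; the resource availabilities, the requirement of 0.95, and the function names are hypothetical values chosen only for illustration.

```python
import math
from typing import Sequence


def workflow_availability(reference_row: Sequence[int],
                          resource_availability: Sequence[float]) -> float:
    """Formula 1: product over resources of P(C_j) raised to R_{i,j}."""
    if len(reference_row) != len(resource_availability):
        raise ValueError("row and availability list must have the same length")
    return math.prod(p ** r for p, r in zip(resource_availability, reference_row))


def enhancement_parameter(required: float, current: float) -> float:
    """Formula 2: PW_i = P_i / P(W_i); at least 1 when the requirement is unmet."""
    return required / current


# Row of the workflow-resource matrix for the Fig. 3 example workflow.
row = [1, 2, 2, 0]
# Hypothetical availabilities of the standalone resources C1..C4.
p_resources = [0.99, 0.98, 0.97, 0.95]
# Hypothetical business-level requirement P_i for this workflow.
requirement = 0.95

current = workflow_availability(row, p_resources)
pw = enhancement_parameter(requirement, current)
print(f"P(W) = {current:.4f}, requirement met: {current >= requirement}, PW = {pw:.4f}")
```

Because resource $C_4$ has a reference count of 0, its availability contributes a factor of 1 and does not affect the workflow.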
We also define the enhancement parameter for each resource as $PC_1, PC_2, \ldots, PC_m$; thus, we form the following constraints:

\[
\begin{aligned}
PW_1 &\le PC_1^{R_{1,1}} \cdot PC_2^{R_{1,2}} \cdots PC_m^{R_{1,m}} \\
PW_2 &\le PC_1^{R_{2,1}} \cdot PC_2^{R_{2,2}} \cdots PC_m^{R_{2,m}} \\
&\;\;\vdots \\
PW_i &\le PC_1^{R_{i,1}} \cdot PC_2^{R_{i,2}} \cdots PC_m^{R_{i,m}} \\
&\;\;\vdots \\
PW_n &\le PC_1^{R_{n,1}} \cdot PC_2^{R_{n,2}} \cdots PC_m^{R_{n,m}}
\end{aligned}
\tag{3}
\]

In other words, the overall availability enhancement for the IT resources within the workflow should be no less than the availability enhancement requirement for the workflow. We take the logarithm of the inequalities 3 to simplify the calculation, yielding:
\[
\begin{aligned}
\ln(PW_1) &\le R_{1,1} \cdot \ln(PC_1) + \cdots + R_{1,m} \cdot \ln(PC_m) \\
\ln(PW_2) &\le R_{2,1} \cdot \ln(PC_1) + \cdots + R_{2,m} \cdot \ln(PC_m) \\
&\;\;\vdots \\
\ln(PW_i) &\le R_{i,1} \cdot \ln(PC_1) + \cdots + R_{i,m} \cdot \ln(PC_m) \\
&\;\;\vdots \\
\ln(PW_n) &\le R_{n,1} \cdot \ln(PC_1) + \cdots + R_{n,m} \cdot \ln(PC_m)
\end{aligned}
\tag{4}
\]

We write $X_1, X_2, \ldots, X_m$ for $\ln(PC_1), \ln(PC_2), \ldots, \ln(PC_m)$. Each $X_i$ satisfies $0 \le X_i \le \ln\left(\frac{1}{P(C_i)}\right)$ because $1 \le PC_i \le \frac{1}{P(C_i)}$: the enhanced availability $P(C_i) \cdot PC_i$ cannot exceed 1. For the failover HA pattern, where only one primary server and one standby server exist in the cluster, we can adjust the upper bound to $\ln\left(\frac{1-(1-P(C_i))^{2}}{P(C_i)}\right)$, and we can adjust the lower bound from 0 to $\ln\left(\frac{1-(1-P(C_i))^{n_i}}{P(C_i)}\right)$ if we want the initial cluster size to be $n_i$ instead of 1. We also write $B_1, B_2, \ldots, B_n$ for $\ln(PW_1), \ln(PW_2), \ldots, \ln(PW_n)$. The following constraints should therefore be satisfied:

\[
\begin{aligned}
B_1 &\le R_{1,1} \cdot X_1 + \cdots + R_{1,m} \cdot X_m \\
B_2 &\le R_{2,1} \cdot X_1 + \cdots + R_{2,m} \cdot X_m \\
&\;\;\vdots \\
B_i &\le R_{i,1} \cdot X_1 + \cdots + R_{i,m} \cdot X_m \\
&\;\;\vdots \\
B_n &\le R_{n,1} \cdot X_1 + \cdots + R_{n,m} \cdot X_m \\
0 &\le X_1 \le \ln\left(\tfrac{1}{P(C_1)}\right) \\
0 &\le X_2 \le \ln\left(\tfrac{1}{P(C_2)}\right) \\
&\;\;\vdots \\
0 &\le X_m \le \ln\left(\tfrac{1}{P(C_m)}\right)
\end{aligned}
\tag{5}
\]

The above constraints form a continuous region for the solutions in the multi-dimensional space $S(X_1, X_2, X_3, \ldots, X_m)$. We utilize a utility function $f$ to depict the overall cost for HA enhancement, and we will prove that the closed lower boundaries of the solution space include the optimal solution with the minimum enhancement cost. Therefore we can obtain the optimal solution for the utility function by restricting the search to the closed lower boundaries of the constrained solution space.

Theorem 1: The closed lower boundaries of the solution region in the multi-dimensional space $S(X_1, X_2, X_3, \ldots, X_m)$ include the optimal solution $P_{opt}$.

Proof: Assume there exists an optimal solution point $P(X_1, X_2, \ldots, X_m)$ in the constraint space beyond the closed lower boundaries; we will show that there exists a solution point which is a better solution than point $P$, thereby proving that the optimal solution $P_{opt}$ is located on the closed lower boundaries of the constraint space.
Here we define the overall utility function for HA enhancement as $f$. We define $\Rightarrow_{X_i}$ as the mapping from point $P_1(X_1, \dots, X_i, \dots, X_m)$ to point $P_i(X_1, \dots, X_i', \dots, X_m)$ on the closed lower boundary $B_i$ along the decreasing direction of the $X_i$ dimension:

$$
P_1(X_1, \dots, X_i, \dots, X_m) \;\Rightarrow_{X_i}\; P_i(X_1, \dots, X_i', \dots, X_m).
$$

Because $0 < X_i' < X_i$ and the utility function $f$ is always positively correlated with the enhancement parameter $X_i$,

$$
f(P_i(X_1, \dots, X_i', \dots, X_m)) < f(P_1(X_1, \dots, X_i, \dots, X_m)).
$$

Therefore solution $P_i(X_1, \dots, X_i', \dots, X_m)$ has lower cost than $P_1(X_1, \dots, X_i, \dots, X_m)$. Thus the assumption that $P(X_1, X_2, \dots, X_m)$ is an optimal solution point is untenable, which proves that the optimal solution must lie on some closed lower boundary of the constraint space.

The closed lower boundaries of the constraint space can therefore be expressed with an equation $g(X_1, X_2, \dots, X_m) = 0$, where $g(X_1, X_2, \dots, X_m)$ can be a piecewise function describing the different closed boundaries.

The optimal HA enhancement solution is ultimately determined by the overall utility function. The utility function for a given resource $C_i$ depends on two parameters: $n_i$, the original HA cluster size of resource $C_i$ (for standalone resources, $n_i$ is set to 1), and $X_i$, the enhancement parameter for resource $C_i$. The utility function for resource $C_i$ can therefore be expressed as $f_i(n_i, X_i)$, and the overall cost is:

$$
f(X_1, X_2, \dots, X_m) = f_1(n_1, X_1) + f_2(n_2, X_2) + \dots + f_m(n_m, X_m) = \sum_{i=1}^{m} f_i(n_i, X_i)
\tag{6}
$$

As an example, the utility function $f_i(n_i, X_i)$ can be defined as:

$$
f_i(n_i, X_i) = E_i\,(n_i' - n_i)
\tag{7}
$$

In the above equation, $n_i'$ denotes the cluster size of resource $C_i$ after HA enhancement, and $E_i$ denotes the cost of availability enhancement per unit; it can include the initial fixed cost of purchasing hardware and software, and the annual maintenance cost.
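As a small illustration of equations 6 and 7, the sketch below sums the per-resource enhancement cost for a given set of enhanced cluster sizes. The function name `enhancement_cost` and the sample unit costs and cluster sizes are hypothetical and chosen only for illustration; a provider-specific utility function could replace the linear form used here.

```python
from typing import Sequence


def enhancement_cost(
    unit_costs: Sequence[float],
    original_sizes: Sequence[int],
    enhanced_sizes: Sequence[int],
) -> float:
    """Overall utility f = sum_i E_i * (n'_i - n_i), following equations 6 and 7."""
    if not (len(unit_costs) == len(original_sizes) == len(enhanced_sizes)):
        raise ValueError("all inputs must describe the same resources")
    return sum(
        e * (n_new - n_old)
        for e, n_old, n_new in zip(unit_costs, original_sizes, enhanced_sizes)
    )


# Hypothetical data: two standalone resources, each grown to a two-node cluster.
total = enhancement_cost([100.0, 50.0], [1, 1], [2, 2])
print(total)
```

Because the overall utility is a plain sum over resources, each resource's cost model can be swapped independently without changing the rest of the calculation.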
The utility function is determined by the business service providers, who want to provide appropriate IT resources to support their business services at appropriate cost; thus, it may vary according to their demands. We can now calculate $n_i'$ from $X_i$ and obtain the example utility function, as equation 8 shows:

$$
\left.
\begin{aligned}
P'(C_i) &= 1 - (1 - P(C_i))^{n_i'}\\
P'(C_i) &= P(C_i)\cdot PC_i\\
X_i &= \ln(PC_i)
\end{aligned}
\right\}
\;\Rightarrow\;
n_i' = \left\lceil \frac{\ln\left(1 - P(C_i)\cdot e^{X_i}\right)}{\ln\left(1 - P(C_i)\right)} \right\rceil
\tag{8}
$$

In the above formula, $P'(C_i)$ denotes the enhanced availability of resource $C_i$, and $P(C_i)$ denotes the availability of one single resource.

The optimal solution can therefore be calculated by minimizing the utility function subject to the constraint given by the equation $g(X_1, X_2, \dots, X_m) = 0$. Following the Lagrange multiplier method [8], we construct the auxiliary function $F(X_1, X_2, \dots, X_m, \lambda)$ to calculate the optimal solution, defining it as equation 9 shows, where $f(X_1, X_2, \dots, X_m)$ denotes the utility function and $g(X_1, X_2, \dots, X_m)$ denotes the function for the constraint space:
$$
F(X_1, X_2, \dots, X_m, \lambda) = f(X_1, X_2, \dots, X_m) + \lambda\cdot g(X_1, X_2, \dots, X_m)
\tag{9}
$$

By setting the following partial derivatives to zero, according to the Lagrange multiplier method, we obtain the optimal solution $(X_1, X_2, \dots, X_m)$. ($\frac{\partial}{\partial X}F$ denotes the partial derivative of $F$ with respect to the variable $X$.)

$$
\begin{aligned}
\frac{\partial}{\partial X_1} F(X_1, X_2, \dots, X_m, \lambda) &= 0\\
\frac{\partial}{\partial X_2} F(X_1, X_2, \dots, X_m, \lambda) &= 0\\
&\;\;\vdots\\
\frac{\partial}{\partial \lambda} F(X_1, X_2, \dots, X_m, \lambda) &= 0
\end{aligned}
\tag{10}
$$

From the optimal solution for resource HA enhancement $(X_1, X_2, \dots, X_m)$, we obtain the enhanced availabilities $(P'(C_1), P'(C_2), \dots, P'(C_m))$, and the exact HA solutions can be found (e.g., whether a cluster should be constructed and what the size of the cluster should be). Assume $n$ members are needed to support the HA cluster; the availability of the cluster is then:

$$
P'(C_i) = 1 - (1 - P(C_i))^{n}
\tag{11}
$$

According to the above formula, the size $n$ of the cluster can be calculated as follows:

$$
n = \left\lceil \frac{\ln\left(1 - P'(C_i)\right)}{\ln\left(1 - P(C_i)\right)} \right\rceil
\tag{12}
$$

Leveraging the domain information for the component, the HA cluster pattern can be generated and deployed into the topology.

C. Weight-based Optimization Approach to Reduce Calculation Complexity

Because the number of candidate resources for availability enhancement over the IT infrastructure can be large, the computational complexity of calculating the optimal solutions by solving equation 10 increases accordingly. We therefore propose a method to effectively reduce the number of candidate resources, in order to simplify the calculation.

The principle of our weight-based optimization approach is to select a subset of the IT resources, based on weight, for use in the optimal solution calculation. We note that, for those resources which are involved in more workflows with more critical availability requirements, enhancing the availability of these resources will yield better overall HA enhancement for the workflows, in a cost-efficient manner.
Therefore, we propose a weight-based method to select relevant resources as follows: for resource $C_j$, we define the weight of $C_j$ as:

$$
W(C_j) = \sum_{i=1}^{n} \left(R_{i,j}\cdot P_i\right)
\tag{13}
$$

In the above formula, $R_{i,j}$ denotes the integer value defined in the workflow-resource mapping matrix, and $P_i$ denotes the availability requirement of workflow $W_i$. In this way, the priority list of resources can be determined according to the weight. Resources which support more workflows, and more availability-critical workflows, have higher weights. According to the priority list, the top $k$ resources can be selected to calculate the HA solution. The calculated solution is a near-optimal solution restricted to the $k$ candidate resources that are taken into consideration, but the calculation complexity can be greatly reduced by the choice of $k$.

D. Computational Complexity Analysis

In this section, we analyze the computational complexity of the conventional exhaustive iteration method and of our optimal solution calculation method. Assume that there exist $n$ candidate resources which need to be HA enhanced, and that the upper bound for the cluster size of any resource is set to $k$ (which is necessary for the iteration method but not for our optimal solution calculation method). Then, for the iteration method, the computational complexity to arrive at the optimal solution is $\underbrace{k \cdot k \cdot \ldots \cdot k}_{n}$; that is, $O(k^{n})$. For our optimal solution calculation method, since the solution is calculated by solving equation 10, the computational complexity is bounded only by the number of variables in the equations, which gives polynomial complexity: that is, $O(n^{m})$, where $m$ is a constant. Our method therefore scales with the number of candidate resources, and has much lower computational complexity than the iteration method when the number of candidate resources is large.

E. Alternative Resource Selection

In some HA requirement analysis cases, users may not be able to confirm the exact components of a resource. For example, for a DB2 HA solution, the user may not be sure whether a hot-standby solution on an x86 platform will satisfy the HA requirement or whether a mainframe solution is better. In such cases the user may specify several candidate resource types for the resource. Based on the above analysis, we further propose an algorithm with alternative resource selection, as Algorithm 1 shows. Here we abstract our availability weak-point analysis methodology into a function WeakPointAnalysis(ResourceList, UtilityFunction, Topology, WorkflowList). As shown in Algorithm 1, we first generate all possible resource lists and the corresponding utility functions according to the candidate resource types specified by the user; we then use the function WeakPointAnalysis to calculate a solution for each of those resource lists. Finally, we choose the best solution among the candidate solutions.

F. Example

In this section, we show a detailed example to illustrate our optimal solution calculation. Fig. 4 shows an example topology for HA enhancement. There are two candidate standalone resources, C1 and C2, in the original topology which need to be availability enhanced, and resource C3 is already supported by a mainframe, which needs no HA
enhancement. There are two workflows, W1 and W2, which need to be availability enhanced, with $PW_1 = 95\%/(95\% \cdot 95\%) = 1.04$ and $PW_2 = 98\%/95\% = 1.03$; we therefore have the following inequality constraints:

$$
\begin{aligned}
\ln(1.04) &\le X_1 + X_2\\
\ln(1.03) &\le X_1\\
0 &\le X_1 \le \ln\left(\tfrac{1}{0.95}\right)\\
0 &\le X_2 \le \ln\left(\tfrac{1}{0.95}\right)
\end{aligned}
\tag{14}
$$

Algorithm 1 HA Weak Point Analysis Algorithm with Alternative Resource Selection

    MinCost = MaxNumber
    OptimalSolution = Null
    for every Resource Ci in ResourceList do
        if NumofCandidateResource(Ci) > 1 then
            // This resource has alternative component selection
            for every CandidateResource CRj in Resource Ci do
                CRj = GetCandidateResource(Ci)
                ResourceList = GenerateResourceList(CRj, Ci)
                UtilityFunction = GenerateUtilityFunction(CRj, Ci)
                AddtoResourceListPool(ResourceList)
                AddtoUtilityFunctionPool(UtilityFunction)
            end for
        end if
    end for
    for every ResourceList and counterpart UtilityFunction in ResourceListPool and UtilityFunctionPool do
        Cost, Solution = WeakPointAnalysis(ResourceList, UtilityFunction, Topology, WorkflowList)
        if Cost < MinCost then
            MinCost = Cost
            OptimalSolution = Solution
        end if
    end for
    Output MinCost, OptimalSolution

According to the example utility function in equation 7, we define the utility functions for C1 and C2 as follows, where the HA enhancement cost $E_1$ is \$100 and $E_2$ is \$50:

$$
f_1(1, X_1) = 100\cdot\left(\frac{\ln\left(1 - 0.95\cdot e^{X_1}\right)}{\ln(0.05)} - 1\right)
\tag{15}
$$

$$
f_2(1, X_2) = 50\cdot\left(\frac{\ln\left(1 - 0.95\cdot e^{X_2}\right)}{\ln(0.05)} - 1\right)
\tag{16}
$$

The overall utility function is therefore:

$$
f(X_1, X_2) = f_1(1, X_1) + f_2(1, X_2)
\tag{17}
$$

We use $g(X_1, X_2) = 0$ to denote the closed lower boundary of the solution space, as formula 18 shows:

$$
\left.
\begin{aligned}
\ln(1.04) &\le X_1 + X_2\\
\ln(1.03) &\le X_1\\
0 &\le X_1 \le \ln\left(\tfrac{1}{0.95}\right)\\
0 &\le X_2 \le \ln\left(\tfrac{1}{0.95}\right)
\end{aligned}
\right\}
\Rightarrow
g(X_1, X_2) =
\begin{cases}
X_1 - \ln(1.03), & X_2 \in \left(\ln(1.04) - \ln(1.03),\ \ln\left(\tfrac{1}{0.95}\right)\right)\\
X_1 + X_2 - \ln(1.04), & X_1 \in \left(\ln(1.03),\ \ln(1.04)\right)\\
X_2, & X_1 \in \left(\ln(1.04),\ \ln\left(\tfrac{1}{0.95}\right)\right)
\end{cases}
\tag{18}
$$

Fig. 5 shows the solution space of $(X_1, X_2)$ according to constraint 14.

Fig. 4. HA Enhancement for the Example Topology

Fig. 5. The solution space of $(X_1, X_2)$

In this example the dotted line in Fig. 5 depicts the range of possible optimal solutions $(X_1, X_2)$; thus, the function $g(X_1, X_2)$ can be reduced to:

$$
g(X_1, X_2) = X_1 + X_2 - \ln(1.04), \quad X_1 \in \left(\ln(1.03),\ \ln(1.04)\right)
\tag{19}
$$

We therefore obtain the auxiliary function according to equation 9:

$$
F(X_1, X_2, \lambda) = f(X_1, X_2) + \lambda\cdot g(X_1, X_2)
\tag{20}
$$

And we have:

$$
\begin{aligned}
\frac{\partial}{\partial X_1} F(X_1, X_2, \lambda) &= 0\\
\frac{\partial}{\partial X_2} F(X_1, X_2, \lambda) &= 0\\
\frac{\partial}{\partial \lambda} F(X_1, X_2, \lambda) &= 0
\end{aligned}
\tag{21}
$$
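The worked example can also be checked numerically. The following is a minimal sketch under stated assumptions, not the authors' implementation: instead of solving system 21 symbolically, it scans the boundary segment of equation 19, taken here as $X_1 + X_2 = \ln(1.04)$ to agree with constraint 14, evaluates the utility functions 15 to 17 with natural logarithms, and rounds the resulting fractional cluster sizes up as in equation 8. The helper names (`cost`, `cluster_size`, `scan_boundary`) and the grid resolution are hypothetical.

```python
import math

P = 0.95                      # availability of a single C1 or C2 instance (equations 15-16)
UNIT_COST = (100.0, 50.0)     # E1 and E2 in US$
LOWER_X1 = math.log(1.03)     # from ln(1.03) <= X1
TOTAL = math.log(1.04)        # from ln(1.04) <= X1 + X2


def cost(unit_cost, x):
    """Continuous utility of one resource, as in equations 15 and 16."""
    return unit_cost * (math.log(1 - P * math.exp(x)) / math.log(1 - P) - 1)


def cluster_size(x):
    """Integer cluster size: the fractional size of equation 8, rounded up."""
    return math.ceil(math.log(1 - P * math.exp(x)) / math.log(1 - P))


def scan_boundary(steps=1000):
    """Scan X1 over [ln 1.03, ln 1.04] with X2 = ln 1.04 - X1; return the cheapest point."""
    best = None
    for i in range(steps + 1):
        x1 = LOWER_X1 + (TOTAL - LOWER_X1) * i / steps
        x2 = TOTAL - x1
        total = cost(UNIT_COST[0], x1) + cost(UNIT_COST[1], x2)
        if best is None or total < best[0]:
            best = (total, x1, x2)
    return best


_, x1_opt, x2_opt = scan_boundary()
sizes = (cluster_size(x1_opt), cluster_size(x2_opt))
print(round(x1_opt, 4), round(x2_opt, 4), sizes)  # prints: 0.0296 0.0097 (2, 2)
```

Under these assumptions the scan settles on the corner $X_1 = \ln(1.03)$, because the more expensive resource C1 is enhanced only as far as workflow W2 requires, and both cluster sizes round up to 2.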
Finally we have the optimal solution $(X_1, X_2) = (0.0128, 0.0042)$, so we have $(n_1, n_2) = (\lceil 1.3 \rceil, \lceil 1.1 \rceil)$ as the near-optimal solution, that is, $(2, 2)$ in this example; Fig. 4 depicts the HA-enhanced topology. The reported values of $X_1$ and $X_2$ equal $\log_{10}(1.03)$ and $\log_{10}(1.04/1.03)$; expressed in natural logarithms, as used in constraint 14, the same point is approximately $(0.0296, 0.0097)$ and yields the same cluster sizes.

IV. EXPERIMENTAL EVALUATION

A. Experiment Settings

We illustrate the efficiency of our weak-point analysis framework using a simple experimental scenario. The fundamental parameters for resource availability and cost are specified in Table II. In this table we assign the failure behavior (MTTR, MTBF) and cost (ColdCost, ActiveCost, RepairCost) for each component of the three-tier stack of the specified resources.

| Component | ColdCost | ActiveCost | RepairCost | MTTR | MTBF |
|---|---|---|---|---|---|
| x86Server | $2400 | $2640 | $300 | 3600 sec | 75 days |
| POWERServer | $85000 | $93500 | $1500 | 3600 sec | 150 days |
| LinuxOS | $0 | $0 | $0 | 120 sec | 45 days |
| WindowsOS | $0 | $200 | $0 | 120 sec | 30 days |
| AIXOS | $0 | $400 | $0 | 240 sec | 100 days |
| WASServer | $0 | $100 | $0 | 100 sec | 30 days |
| IHSServer | $0 | $45 | $0 | 50 sec | 25 days |
| DB2Server | $0 | $60 | $0 | 80 sec | 30 days |

TABLE II COMPONENT FAILURE BEHAVIOR AND COSTS

| Resource | Server | OS | Hardware | maxSize |
|---|---|---|---|---|
| resource1 | IHSServer | LinuxOS | x86Server | 10 |
| resource2 | WASServer | LinuxOS | x86Server | 10 |
| resource3 | WASServer | AIXOS | POWERServer | 10 |
| resource4 | DB2Server | WindowsOS | x86Server | 10 |

TABLE III RESOURCE COMPONENTS

| BPEL Workflow | Application | Resource |
|---|---|---|
| workflow1 | EARComponent1 | resource1 |
| | EARComponent2 | resource2 |
| | EARComponent3 | resource3 |
| | DatabaseComponent1 | resource4 |
| workflow2 | EARComponent1 | resource1 |
| | DatabaseComponent1 | resource2 |
| | | resource4 |

TABLE IV WORKFLOW TO RESOURCE MAPPING

For the availability parameters, MTTR specifies the mean time to recover after each failure, and MTBF specifies the mean time between failures. The availability of a single component can be calculated as MTBF/(MTTR+MTBF).

The cost parameters specify various costs associated with the components: ColdCost specifies the cost when the component is powered off (e.g., as a cold spare), and ActiveCost specifies the cost when the component is powered on (e.g., as an active spare).
The cost difference may account for the electrical power costs that are incurred only when the hardware is powered on, or the software license cost if an application software component has a usage-based licensing scheme. RepairCost specifies the annual cost to repair the component when the component is down. The annual cost of a component is the sum of the annual cost to operate it in its operating mode and the initial cost of the component annualized by dividing by its useful lifetime in years. The annual cost of a resource is the sum of the annual costs of its components.

In this experimental scenario, we create four resources to host the relevant web services. Table III lists the components of the resources in our scenario. We specify two BPEL workflows over the application and resource topologies, as Table IV shows.

B. Performance Evaluation

Based on the above settings for the experimental scenario, we can form the utility functions and the workflow-resource mapping matrix. We obtain the high availability solution through two approaches for comparison: the exhaustive iteration method and the optimal solution calculation method. For exhaustive iteration, we generate the optimal solution by simply iterating over all possible solutions for the availability requirement and taking the solution with the minimum overall cost according to various cost modes. For the optimal solution calculation, we use MATLAB [13] to perform the calculation, leveraging the fmincon algorithm with medium-scale optimization [9] in the MATLAB optimization toolbox. We compare the two approaches in two respects: solution efficiency and computational complexity.

| No. of Resources | Exhaustive Iteration | Optimal Solution Calculation |
|---|---|---|
| 3 | $10^{3}$ | 56 |
| 5 | $10^{5}$ | 171 |
| 10 | $10^{10}$ | 295 |
| 15 | $10^{15}$ | 1516 |
| 20 | $10^{20}$ | 2021 |
| 30 | $10^{30}$ | 3011 |

TABLE V COMPUTATIONAL COMPLEXITY CHARACTERISTICS

For the solution efficiency, we compare the overall cost of the solutions found by the two different methods, as Fig. 6 shows.
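The exhaustive iteration baseline described above can be sketched as follows. This is an illustrative sketch only, not the code used in the experiments: it enumerates every cluster-size vector up to an upper bound, checks each workflow's availability requirement as the product of the clustered availabilities of the resources it uses, and keeps the cheapest feasible vector. The function name `exhaustive_search`, the linear cost model, and the two-resource input data are assumptions made for the example.

```python
import math
from itertools import product


def exhaustive_search(availability, unit_cost, workflows, max_size):
    """Try every cluster-size vector in {1..max_size}^n; keep the cheapest feasible one."""
    n = len(availability)
    best_cost, best_sizes, evaluated = math.inf, None, 0
    for sizes in product(range(1, max_size + 1), repeat=n):
        evaluated += 1
        # Availability of each resource once clustered: 1 - (1 - p)^size.
        clustered = [1 - (1 - p) ** s for p, s in zip(availability, sizes)]
        feasible = all(
            math.prod(clustered[j] for j in members) >= target
            for members, target in workflows
        )
        if not feasible:
            continue
        # Assumed linear cost: unit cost times the number of added cluster members.
        cost = sum(e * (s - 1) for e, s in zip(unit_cost, sizes))
        if cost < best_cost:
            best_cost, best_sizes = cost, sizes
    return best_cost, best_sizes, evaluated


# Hypothetical scenario: two resources with availability 0.95; workflow 0 uses
# both and requires 0.95, workflow 1 uses only resource 0 and requires 0.98.
best_cost, best_sizes, evaluated = exhaustive_search(
    availability=[0.95, 0.95],
    unit_cost=[100.0, 50.0],
    workflows=[((0, 1), 0.95), ((0,), 0.98)],
    max_size=3,
)
print(best_cost, best_sizes, evaluated)  # prints: 150.0 (2, 2) 9
```

The `evaluated` counter equals `max_size ** n`, which is the exponential growth that makes this baseline impractical as the number of candidate resources increases.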
From Fig. 6 we note that the overall cost rises as the availability requirement increases; exhaustive iteration can always achieve the optimal cost when the upper bound for cluster size is set large enough. Our optimal solution calculation achieves the minimum cost of the iteration method most of the time. The reason for any disparity is that we leverage the Lagrange multiplier method, which yields as a solution a vector with fractional values; however, the solution must be a vector of integer values (one cannot deploy 1.2 application servers!). Therefore, the solution we achieve may not be the actual optimal solution, but it is near-optimal and achieves the minimum overall cost in most cases.

For the computational complexity, we compare the number of operations needed to compute the solutions in Table V. In this
support system called Mounties that is designed for managing applications and resources using rule-based constraints in scalable mission-critical clustering environment.This paper is our initial efforts towards developing an automated design and deploy framework for the business driven IT management of availability VI.CONCLUSION In this paper we have proposed a workflow based high avail- ability analysis framework to do availability weak-point anal- Fig.6.Solution Efficiency Comparison ysis over an SOA deployment framework,and we have pre- sented a computing-efficient methodology to calculate the op- timal solution;minimizing the overall HA enhancement cost. experiment,we set the default upbound for cluster size to while satisfying the business level availability requirement. 10;the upper bound cannot be too small,since when the Experimental evaluation shows that our analysis methodology optimal solution value is beyond the upper bound,the iteration can achieve a near-optimal solution;our methodology out- method will not be able to find the optimal solution.As Table performs the conventional iteration method in computational V shows,as the number of candidate resources increases, complexity,using a highly compute-efficient approach. the computing complexity for the exhaustive iteration method ACKNOWLEDGMENTS increases exponentially,making the optimal solution extremely expensive for environments with merely tens of resources. The authors would like to thank Guerney Hunt.Jef- In comparison,for our optimal solution calculation method frey Kephart,Tamar Eilam,Alexander V.Konstantinou,and the computing complexity increases in a relatively slow man- Alexander A.Totok for giving comments and feedbacks to ner.When the number of resources reaches 30,the computing shape our vision,contribute ideas and improve this paper. 
[Fig. 6. Solution Efficiency Comparison: additional cost for availability (US$, x 10^4) versus availability requirement (%), for the conventional iteration method and the optimal solution calculation method.]

experiment, we set the default upper bound for cluster size to 10; the upper bound cannot be too small, because when the optimal solution lies beyond the upper bound, the iteration method cannot find it. As Table V shows, as the number of candidate resources increases, the computing complexity of the exhaustive iteration method increases exponentially, making the optimal solution extremely expensive to find even for environments with merely tens of resources. In comparison, the computing complexity of our optimal solution calculation method increases relatively slowly. When the number of resources reaches 30, its computing complexity is only 3011, compared to 10^30 for the exhaustive iteration method.

V. RELATED WORK

Research on availability analysis mainly focuses on design-time analysis and runtime analysis. The IBM WebSphere development group has described planning for availability in the enterprise IT infrastructure [5], showing how to plan and design availability solutions across the end-to-end project lifecycle. Researchers from Berkeley have proposed Pinpoint [10], a dynamic analysis methodology that automates problem determination in large, dynamic Internet services; it combines coarse-grained tagging of numerous real client requests at runtime with data mining techniques to determine the faulty components. Our work addresses availability analysis over an IT infrastructure at design time, leveraging business-level workflows to specify the high-level availability requirements.

The idea of business-driven IT management, which automates the design and configuration of IT systems to meet users' availability requirements, is relatively recent.
Researchers at HP Labs have proposed AVED [7], a proof-of-concept design automation engine that generates cost-effective solutions from high-level application requirements. They have also presented a business-aware policy-based IT management framework [11] that leverages SLAs and business objectives to manage IT resources effectively at runtime. Researchers at IBM's T.J. Watson Research Center have proposed a QoS-Aware Optimization Framework [12] that minimizes the number of machines subject to response time and throughput requirements, utilizing the cross-layer relationship from the business process level to the resource level. The work in [14] proposes a decision support system called Mounties, designed for managing applications and resources using rule-based constraints in scalable mission-critical clustering environments. This paper is our initial effort towards developing an automated design and deployment framework for the business-driven IT management of availability.

VI. CONCLUSION

In this paper we have proposed a workflow-based high availability analysis framework that performs availability weak-point analysis over an SOA deployment framework, and we have presented a computationally efficient methodology to calculate the optimal solution, minimizing the overall HA enhancement cost while satisfying the business-level availability requirement. Experimental evaluation shows that our analysis methodology can achieve a near-optimal solution and that it outperforms the conventional iteration method in computational complexity.

ACKNOWLEDGMENTS

The authors would like to thank Guerney Hunt, Jeffrey Kephart, Tamar Eilam, Alexander V. Konstantinou, and Alexander A. Totok for their comments and feedback, which helped shape our vision, contributed ideas, and improved this paper.

REFERENCES

[1] IBM International Technical Support Organization. WebSphere Application Server Network Deployment V6: High Availability Solutions, October 2005.
[2] IBM DB2 Universal Database. Data Recovery and High Availability Guide and Reference.
[3] M. Kamath, G. Alonso. Providing High Availability in Very Large Workflow Management Systems. The Fifth International Conference on Extending Database Technology (EDBT 96).
[4] G. Candea, A. Fox. Designing for High Availability and Measurability. Proceedings of the 1st Workshop on Evaluating and Architecting System Dependability, 2001.
[5] Rick Robinson, Alexandre Polozoff. IBM WebSphere Developer Technical Journal: Planning for Availability in the Enterprise. IBM Software Services for WebSphere.
[6] Business Process Execution Language for Web Services version 1.1 (http://www.ibm.com/developerworks/library/specification/ws-bpel/).
[7] G. (John) Janakiraman, Jose Renato Santos, Yoshio Turner. Automated Multi-Tier System Design for Service Availability. The First Workshop on Design of Self-Managing Systems (at DSN 2003), 22-25 June 2003, San Francisco, California.
[8] Dimitri P. Bertsekas. Constrained Optimization and Lagrange Multiplier Methods. ISBN: 1-886529-04-3. Publication: 1996, 410 pages.
[9] Optimization Toolbox For Use with MATLAB, User's Guide, Version 2 (http://www.mathworks.com/products/optimization/).
[10] Mike Y. Chen, Emre Kiciman, Eugene Fratkin, Armando Fox, Eric A. Brewer. Pinpoint: Problem Determination in Large, Dynamic Internet Services. DSN 2002: 595-604.
[11] Issam Aib, Mathias Sallé, Claudio Bartolini, Abdel Boulmakoul, Raouf Boutaba, Guy Pujolle. "Business-aware Policy-based Management". In Proc. 1st IEEE International Workshop on Business-Driven IT Management (BDIM '06), 7 April 2006, Vancouver, Canada.
[12] Chun Zhang, Rong N. Chang, Chang-Shing Perng, Edward So, Chunqiang Tang, Tao Tao. QoS-Aware Optimization of Composite-Service Fulfillment Policy. IEEE SCC 2007: 11-19.
[13] MATLAB - The Language of Technical Computing (http://www.mathworks.com/products/matlab/).
[14] Sameh A. Fakhouri, William F. Jerome, Vijay K. Naik, Ajay Raina, Pradeep Varma. Active Middleware Services in a Decision Support System for Managing Highly Available Distributed Resources. Middleware 2000: 349-371.