A Methodology for Analyzing Availability Weak Points in SOA Deployment Frameworks

Jing Luo (1), Ying Li (1), John A Pershing (2), Lei Xie (3), and Ying Chen (1)

(1) IBM China Research Lab, China, {jingluo, lying, yingch}@cn.ibm.com
(2) IBM T. J. Watson Research Center, Hawthorne, NY 10532, USA, pershing@alum.mit.edu
(3) Department of Computer Science, Nanjing University, China, xielei@dislab.nju.edu.cn

Manuscript received 1 April 2008, revised 29 October 2008, accepted 17 February 2009. The Associate Editor coordinating the review of this paper and approving it for publication was J.P. Martin-Flatin.

Abstract: The fundamental characteristics of SOA, loose coupling and on-demand integration, enable organizations to seek more flexibility and responsiveness from their business IT systems. However, this brings challenges to assure QoS, especially availability, which should be considered in an integrated way in an SOA environment. Traditionally, availability is measured for each IT resource, but within SOA environments, rather than being considered individually, availability should be analyzed from an end-to-end view from both business and IT perspectives. In this paper, to address the availability problem of SOA, we propose a methodology that analyzes availability weak points in SOA deployment frameworks, leveraging workflow definitions that specify availability requirements at business level. This methodology includes an effective way to calculate high-availability enhancement recommendations for a given SOA deployment topology with near-minimum cost, while meeting the business-level availability requirements. A prototype has been implemented as an extension to IBM’s SOA deployment framework. Its efficiency and performance are analyzed here.

I. INTRODUCTION

The Service-Oriented Architecture (SOA) provides on-demand integration capabilities by loosely composing one or more services. This loose coupling enables SOA to offer clear benefits [1] and opens up new opportunities for organizations to become more flexible and responsive. However, this architecture brings challenges to Information Technology (IT) management and complicates Quality of Service (QoS) measurement precisely because of its loosely coupled nature: it is frequently difficult to determine which systems and services are contributing to an SOA service, and how they may be failing to deliver the required quality of service. Traditionally, availability is measured for each IT resource. But within SOA environments, rather than merely considering the availability of individual IT resources, one must take an end-to-end viewpoint. Moreover, the relationships between business workflows and the supporting IT resources are complex and dynamic. For example, one service can be invoked by several business workflows, while each business workflow usually invokes multiple services.

To ensure availability, redundancy-based High Availability (HA) solutions are the primary approach, including clustering [2], hot-failover [3], recursive restartability [4], and Redundant Array of Independent Disks (RAID). However, these solutions are usually expensive and their cost, capabilities and implementation difficulties also vary greatly. Thus, it becomes quite difficult to plan HA solutions in a cost-effective manner. Traditionally, IT architects rely on experience to decide which HA solutions should be applied to which IT resources with what degree of redundancy. However, this experience-based approach to HA is difficult to apply to SOA environments, because the large number of involved IT systems and the complex relationships among them are beyond the comprehension of most humans.

Moreover, even if all single points of failure have been eliminated, some of the (redundant) IT resources still may not exhibit the necessary availability level to satisfy the requirements of the business, so it may be necessary to introduce even more redundancy in order to meet the availability requirements. Hence, the key to delivering a cost-effective HA architecture is to determine the right HA level for each IT system, based on the trade-off between its outage loss and redundancy cost: too little redundancy could result in costly outages, and too much could be an expensive waste.

In this paper, we define an availability weak point as an IT resource that is not providing sufficient availability to meet current business requirements, but for which we can provide a cost-effective HA enhancement in order to meet these availability requirements. Therefore, to apply HA solutions [2][3] over SOA environments in a cost-effective manner, identifying and analyzing availability weak points is the starting point. In a previous article [5], we proposed a workflow-based weak-point analysis methodology to address this challenge. The methodology determines which deployed IT resources need to have their availability enhanced, and to what extent, in order to satisfy the business-level availability requirements while keeping the overall cost close to the minimum. In this paper, we further refine and evaluate our methodology, and analyze the efficiency and performance of the prototype that implements it as an extension to IBM’s SOA deployment framework.
The rest of the paper is organized as follows. In Section II, we describe the basic structure of our availability weak-point analysis methodology. In Section III, we present our algorithm for calculating a near-optimal solution. In Section IV, we describe our implementation and evaluate experimental results. In Section V, we study related work. In Section VI, we conclude the paper and discuss future work.

II. WEAK-POINT ANALYSIS

In this work, we define a three-level workflow hierarchy to enable weak-point analysis: business workflow, application workflow, and IT resource workflow. First, we assume that business workflows are defined in some machine-readable format, such as Business Process Execution Language (BPEL) [6], and that a business workflow includes “pointers” (e.g., Web service references) to the services that support the various steps of this business workflow. As services are implemented by given applications, we define an application workflow as the application chain that supports the given business workflow. Second, as applications are supported by the underlying IT resources that host them, we assume that the hosting and dependency relationships among the various IT resources are also available in some machine-readable format, either as standard deployment documents from the design phase, or as the result of a discovery process running against the IT infrastructure. By analyzing the hosting and dependency relationships, an IT resource workflow is defined as the IT resource chain that supports a given application workflow, which in turn supports the given business workflow.

Based on these assumptions, our weak-point analysis methodology first constructs relationships between business workflows and IT resources by workflow mapping; then, based on these relationships, it calculates the optimized HA enhancement recommendation for the current SOA deployment topology. The main building blocks of our methodology are depicted in Fig. 1; they are grouped in three modules.

Fig. 1. Architecture for workflow-based weak-point analysis

A. Workflow Specification Module

The Workflow Specification Module maps business workflows to IT resources, where each business workflow is annotated with availability requirements. In this paper, availability requirements are defined by an uptime ratio, which represents the percentage of time a business workflow is available; for example, 99.9% means that end users tolerate a downtime of at most 86.4 seconds per day for this workflow (0.1% of the 86,400 seconds in a day). Such availability requirements are typically specified by business architects. The mapping is performed from the business level to the application and IT resource levels by inspecting the hosting and dependency relationships that are defined in the SOA deployment topology.

As Fig. 2 shows, through the hosting relationships specified over the SOA deployment topology, Workflow 1 and Workflow 2 are mapped to the IT resource level. However, in a more complex scenario, direct mapping based only on hosting relationships is inadequate: business workflow branching and implicit dependency discovery need to be considered.

Fig. 2. Workflow mapping over the SOA deployment topology

Business workflow branching describes a situation in which the business workflow contains conditional branches and the branch to be selected next depends on current conditions. For a business workflow that implements complex business functions, it is common to have several conditional branches. Fig. 3 illustrates a workflow with two conditional branches at points A and B. When this workflow runs, only one path through the workflow is executed. Therefore, modeling business workflows from an HA standpoint poses a problem in case of branching. On the one hand, the availability requirement is specified on the overall business workflow; on the other hand, only a subset of the service components is executed for any given runtime invocation, depending on branch conditions.
Under these circumstances, mapping business-level HA requirements to applications and IT resources is not straightforward. To deal with this problem, we break up a complex workflow into several sub-workflows, where each sub-workflow represents a path through the complex workflow. Thus, to guarantee the availability of a complex workflow, we only need to guarantee the same availability for all its sub-workflows. This technique can be applied to all types of workflow, including business workflows.

Fig. 3. Business workflow branching

As depicted in Fig. 3, the complex workflow is transformed into three separate sub-workflows. We treat each sub-workflow as a complete business workflow, which can be directly taken as input by the Weak-Point Analysis Module.

Based on a “flat” (i.e., non-branching) business workflow, our mapping mechanism constructs lower-level application workflows and IT resource workflows by deriving dependencies from the business workflows according to hosting and dependency (“uses”) relationships. In this mechanism, besides noting the explicit dependencies of the higher-level workflows (e.g., from Web service references), implicit dependencies are also used to construct the lower-level workflows. An implicit dependency is a relationship that is not expressed in the higher-level workflows, but should be taken into account in the lower-level workflows. When mapping from business workflows to application workflows, implicit business dependencies (which express the dependencies from applications to databases or to other application components in the application topology) should be considered for constructing the application workflows. Dependencies between Enterprise ARchive (EAR) modules and databases are a typical example. Thus, an application workflow is constructed as follows: an initial application workflow is constructed by following the explicit dependencies from the business workflow; then the related implicit dependencies are tracked down (e.g., from a prior discovery process) and inserted into the application workflow.
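The two steps described above, flattening a branching workflow into sub-workflows and then inserting implicit dependencies, can be sketched in a few lines of Python. This is only an illustrative sketch, not the prototype's implementation: the function names `enumerate_sub_workflows` and `build_application_workflow`, the dictionary-based graph representation, and all step and component names are assumptions made for this example, and the workflow graph is assumed to be acyclic.

```python
from typing import Dict, List


def enumerate_sub_workflows(successors: Dict[str, List[str]], start: str) -> List[List[str]]:
    """Return every start-to-end path of an acyclic workflow graph.

    Each returned path is one flat (non-branching) sub-workflow.
    """
    paths: List[List[str]] = []

    def walk(step: str, prefix: List[str]) -> None:
        path = prefix + [step]
        next_steps = successors.get(step, [])
        if not next_steps:            # no successor: this path is complete
            paths.append(path)
            return
        for next_step in next_steps:  # one recursive walk per branch
            walk(next_step, path)

    walk(start, [])
    return paths


def build_application_workflow(
    services: List[str],
    implemented_by: Dict[str, str],
    implicit_deps: Dict[str, List[str]],
) -> List[str]:
    """Follow explicit service-to-application links, then insert implicit dependencies.

    A component reached from several services is listed once per service,
    so repeated references remain visible to later analysis.
    """
    workflow: List[str] = []
    for service in services:
        seen = set()                  # guards against dependency cycles
        pending = [implemented_by[service]]
        while pending:
            component = pending.pop(0)
            if component in seen:
                continue
            seen.add(component)
            workflow.append(component)
            pending.extend(implicit_deps.get(component, []))
    return workflow


# Hypothetical branching workflow: after S1, either S2 or S3 runs, then S4.
branching = {"S1": ["S2", "S3"], "S2": ["S4"], "S3": ["S4"]}
sub_workflows = enumerate_sub_workflows(branching, "S1")

# Hypothetical wiring: Ear2 implicitly depends on a database.
implemented_by = {"Service1": "Ear1", "Service2": "Ear2"}
implicit_deps = {"Ear2": ["DB"]}
application_workflow = build_application_workflow(
    ["Service1", "Service2"], implemented_by, implicit_deps
)

print(sub_workflows)
print(application_workflow)
```

The same expansion could in principle be repeated one level down, with hosting relationships and implicit application dependencies in place of the service-to-application links, to obtain an IT resource workflow.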
For example, the business workflow of a J2EE application usually describes the dependency between Web modules and EAR modules; to construct an end-to-end application workflow, the implicit dependencies expressing the relationships between each EAR module and the referenced databases are analyzed, and the databases are added as part of the application workflow.

Similarly, implicit application dependencies should also be considered when an application workflow is mapped to an IT resource workflow. A typical example used for database HA solutions is that the actual database files of a database server are placed on a shared file system or Storage Area Network (SAN); thus, there is a dependency between the database server and the shared file system, which is not expressed in the application workflow. Fig. 4 shows an implicit business dependency between an EAR module and a database being inserted in the application workflow, and an implicit application dependency between the database server and a shared file system being inserted in the IT resource workflow.

Fig. 4. Implicit dependency discovery in workflow mapping

After the workflow has been mapped from the business level to the IT resource level, we extract the list of IT resources that are involved in each workflow. Then, the workflow-resource relationship matrix for weak-point analysis is created, which contains the necessary information for the relevant IT resources for each workflow.

We assume there exist $n$ business workflows over the SOA deployment topology, denoted $W_1, W_2, W_3, \ldots, W_n$. These workflows are specified with availability requirements $P_1, P_2, P_3, \ldots, P_n$, where $0 < P_i < 1$. We also assume that there are $m$ IT resources, denoted $C_1, C_2, \ldots, C_m$. Each resource consists of a “stack” of hardware and software components (e.g., an x86 server, a Linux operating system, and a WebSphere Application Server).

TABLE I
The workflow-resource relationship matrix

              $C_1$        $C_2$        $C_3$        ...   $C_m$
$W_1(P_1)$    $R_{1,1}$    $R_{1,2}$    $R_{1,3}$    ...   $R_{1,m}$
$W_2(P_2)$    $R_{2,1}$    $R_{2,2}$    $R_{2,3}$    ...   $R_{2,m}$
...           ...          ...          ...          ...   ...
$W_n(P_n)$    $R_{n,1}$    $R_{n,2}$    $R_{n,3}$    ...   $R_{n,m}$

Table I shows the workflow-resource relationship matrix; the relationship between business workflow $W_i$ and IT resource $C_j$ is $R_{i,j}$, where $R_{i,j}$ is an integer count of the number of references to IT resource $C_j$ from business workflow $W_i$. $R_{i,j}$ is set to 0 when resource $C_j$ is not included in the resource list of $W_i$. For example, Fig. 5 shows a business workflow with
Fig. 5. Example of BPEL workflow

two services, which are mapped to three IT resources, $C_1$, $C_2$ and $C_3$, plus one implicit resource $C_4$ that is not explicitly included in the business workflow. Note that, at the application level, Component 1 depends on Component 2 to implement Service 1, and Component 2 depends on Component 3 to implement Service 2; these dependencies are implicit business dependencies. We denote the availability capability of resource $C_i$ as $P(C_i)$; therefore, based on the implicit dependencies discovered above, the availabilities of the two services are $P(C_1) \cdot P(C_2) \cdot P(C_3)$ and $P(C_2) \cdot P(C_3)$. Thus, the availability of the workflow is $P(C_1) \cdot P(C_2)^2 \cdot P(C_3)^2$, and the matrix row for business workflow $W_1$ is set to $[1, 2, 2, 0]$. For a standalone service that has no dependency relationships, we can simply set $R_{i,j}$ to 1 for all its referenced resources, and 0 for its unreferenced resources.

B. Weak-Point Analysis Module

The Weak-Point Analysis Module uses the workflow-resource relationship matrix to calculate a near-optimal HA enhancement recommendation. Traditionally, HA analysis locates single points of failure in the IT infrastructure topology; for example, if no HA solution is applied to a Web server, then it is regarded as a single point of failure. This method is simple to apply but has limitations in current SOA environments. For example, even if an IT resource has been made redundant, it could still be a weak point, and more redundancy could be required to satisfy the availability requirements of the corresponding business workflows. Furthermore, the cost and HA capabilities of redundancy vary for different IT resource types; thus, it is critical to find the points in the IT infrastructure where it is most cost-effective to apply an HA solution. Based on the workflow-resource relationship matrix, weak-point analysis identifies these weak points in the IT infrastructure and calculates the cost-effective HA enhancement parameters.
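To make the dependency degrees in the workflow-resource relationship matrix concrete, the Fig. 5 example can be sketched in Python as follows. This is only an illustrative sketch: the function names, the dictionary layout and the numeric availabilities are our assumptions and do not come from the prototype; only the resulting row $[1, 2, 2, 0]$ and the product formula come from the text above.

```python
from collections import Counter
from math import prod

# Resources referenced by each service in the Fig. 5 example.
# C4 is the implicit resource that no service references.
RESOURCES = ["C1", "C2", "C3", "C4"]
SERVICE_DEPENDENCIES = {
    "Service 1": ["C1", "C2", "C3"],
    "Service 2": ["C2", "C3"],
}


def dependency_row(service_dependencies, resources):
    """Count how many times the workflow references each resource."""
    counts = Counter(r for deps in service_dependencies.values() for r in deps)
    return [counts[r] for r in resources]


def workflow_availability(row, availability):
    """Product of P(C_j) ** R_j over all resources of the workflow."""
    return prod(p ** r for p, r in zip(availability, row))


row = dependency_row(SERVICE_DEPENDENCIES, RESOURCES)
print(row)

# Hypothetical per-resource availabilities, for illustration only.
availability = [0.999, 0.995, 0.99, 0.9999]
print(workflow_availability(row, availability))
```

A resource with dependency degree 0, such as $C_4$ here, contributes a factor of 1 and therefore does not affect the workflow availability.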
At first sight, an exhaustive iteration method might seem appropriate for identifying the weak points and producing the HA enhancement parameters. The algorithm would be as follows: first, identify all possible enhancement solutions for the given deployment topology; next, select as candidates the solutions that satisfy the overall availability requirements; finally, among these candidates, choose the one with minimum cost as the best solution. Unfortunately, this method can only be applied to simple scenarios, because when the number of IT resources in the IT infrastructure grows linearly, the computational complexity grows exponentially; consequently, the exhaustive iteration method can hardly be applied to real-world scenarios. Moreover, in SOA environments, the IT infrastructure must be very flexible so that it can quickly adapt to changing business requirements; therefore, the HA analysis may be invoked frequently and should be processed quickly to provide a cost-effective solution.

To address the above challenges, we describe a weak-point analysis methodology in Section III. It utilizes a Lagrange multiplier method of constrained optimization to calculate the optimal HA enhancement recommendation over the SOA deployment topology subject to a utility function, and produces the HA enhancement parameters for each relevant IT resource.

Fig. 6. HA pattern mapping and transformation

C. HA Pattern Mapping Module

Based on the optimized HA enhancement recommendation, the HA Pattern Mapping Module applies relevant HA patterns to the identified weak points. These patterns may be generic (e.g., clustering, hot standby) or product-specific (e.g., DB2 HADR, High Availability Disaster Recovery). The ultimate goal of this module is to produce an HA-enhanced deployment topology that satisfies the business-level availability requirements at minimum overall cost.

In this module (see Fig. 6), each HA pattern is associated with an applicable IT resource type (e.g., a J2EE application server or a DB2 database), and provides transformation and configuration logic for applying the pattern. For each weak point identified in the IT infrastructure, a list of compatible HA patterns is generated using two matching mechanisms. The first is the applicable-type match: if the weak point is a single IT resource, then HA patterns whose applicable type
is equal to this resource type are considered compatible. The second is the pattern match: if the weak point is already a redundant HA solution (e.g., a cluster), then HA patterns that can generate this HA solution are considered compatible. From the list of matched HA patterns, one is selected (perhaps under the guidance of the software architect) and configured with the HA enhancement parameters. Then, the pattern transformation and configuration logic is used to generate an HA solution that is deployable in IT environments.

III. ALGORITHM

A key contribution of our weak-point analysis methodology is to attach availability requirements to the business workflow and then to map these workflows to the IT infrastructure, carrying the availability requirements down to the level of the individual IT resources, where they can be analyzed. In this section, we describe a methodology for making HA enhancement recommendations that meet the business-level availability objectives while keeping the overall cost close to the minimum. In this methodology, the current availability capability of each workflow is first calculated from component failure behavior parameters obtained from historical data and experience: Mean Time Between Failures (MTBF), Mean Time To Repair (MTTR), etc. Second, we check whether the availability requirements of each business workflow have been satisfied. Third, for the workflows whose requirements are not met, the relevant IT resources are identified as availability weak points, and appropriate HA patterns are recommended in order to meet the availability requirements.

A. Availability Optimization Problem

A simple example is depicted in Fig. 7(a) to illustrate the availability optimization problem. In this example, we have two business workflows ($W_1$ and $W_2$) and four underlying IT resources ($C_1$, $C_2$, $C_3$ and $C_4$). The workflow specification module constructs the workflow-resource relationship matrix (see Fig. 7(b)).
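The three steps of the methodology (compute, compare, identify) can be illustrated on the Fig. 7 example with a short Python sketch. It is a sketch under stated assumptions rather than the prototype's code: the matrix rows are the ones implied by the example's workflow constraints ($W_1$ uses all four resources, $W_2$ uses $C_1$, $C_2$ and $C_4$), the 0.999 requirements are taken from the same example, and the resource availabilities and the function name `unmet_workflows` are hypothetical.

```python
from math import prod

RESOURCES = ["C1", "C2", "C3", "C4"]
WORKFLOWS = ["W1", "W2"]

# Workflow-resource relationship matrix assumed for Fig. 7(b).
R = [
    [1, 1, 1, 1],  # W1
    [1, 1, 0, 1],  # W2
]
REQUIRED = [0.999, 0.999]  # availability requirement per workflow

# Hypothetical intrinsic availabilities of C1..C4 (illustration only).
P_C = [0.9999, 0.9998, 0.998, 0.9999]


def workflow_availability(row, p):
    """Product of P(C_j) ** R_{i,j} over the resources of one workflow."""
    return prod(pj ** rj for pj, rj in zip(p, row))


def unmet_workflows(matrix, p, required):
    """Map each workflow that misses its requirement to its candidate weak points."""
    unmet = {}
    for name, row, req in zip(WORKFLOWS, matrix, required):
        current = workflow_availability(row, p)  # step 1: current availability
        if current < req:                        # step 2: compare with requirement
            # step 3: every referenced resource is a candidate weak point
            unmet[name] = [RESOURCES[j] for j, rj in enumerate(row) if rj > 0]
    return unmet


print(unmet_workflows(R, P_C, REQUIRED))
```

With these assumed numbers only $W_1$ misses its requirement, because it is the only workflow that depends on the least available resource $C_3$. Deciding which of the candidate resources to enhance, and by how much, is the optimization problem defined in the rest of this section.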
In this example, finding an optimized HA enhancement recommendation requires the iterative testing of various HA solutions and different redundancy degrees: for each IT resource, various HA solutions could be applicable, and for each redundancy-based HA solution, different redundancy degrees should be explored.

More generally, to make an optimal HA enhancement recommendation for an SOA deployment topology, three optimization dimensions should be iteratively explored: i) every IT resource in the deployment topology could be an HA enhancement candidate; ii) a variety of HA solutions could be applied to a given IT resource; and iii) each of these HA solutions could rely on different redundancy degrees. Such an iterative exploration requires exponential computation time. Because of this computational complexity, it is inapplicable to large-scale IT infrastructures, which are common in real-world SOA environments.

Let us now rigorously define this availability optimization problem.

Fig. 7. SOA deployment topology example for problem definition: (a) SOA deployment topology example; (b) workflow-resource relationship matrix

We denote the $n$ business workflows over the SOA deployment topology as $W_1, W_2, W_3, \ldots, W_n$. These workflows are specified with availability requirements $P_1, P_2, P_3, \ldots, P_n$, where $0 < P_i < 1$. We also assume that there are $m$ IT resources, denoted by $C_1, C_2, \ldots, C_m$. We construct a workflow-resource relationship matrix in which the IT resources depended upon by each workflow are identified; each matrix entry $R_{j,i}$ expresses the dependency degree from workflow $W_j$ to IT resource $C_i$, as previously described. For instance, if IT resource $C_i$ is referenced three times by a given business workflow $W_j$, its dependency degree $R_{j,i}$ will be recorded as 3.

A resource vector can be defined for a given workflow $W_j$ as $(\langle C_1, R_{j,1} \rangle, \langle C_2, R_{j,2} \rangle, \ldots, \langle C_m, R_{j,m} \rangle)$. For simplicity, we express this as $(R_{j,1}, R_{j,2}, \ldots, R_{j,m})$. Based on the above definition, for a given IT resource $C_i$, its intrinsic availability capability (e.g., based on historical experience) is denoted by $P(C_i)$, and its cost is denoted by $h(C_i, n_i)$, where $n_i$ is the redundancy degree (e.g., cluster size) of this resource. For a given workflow $W_j$, its current availability capability is denoted by $P(W_j)$. These calculation functions will be discussed in Section III.B.

The optimization problem becomes finding a cost-effective HA enhancement recommendation described as $(\langle C_1, n'_1 \rangle, \langle C_2, n'_2 \rangle, \ldots, \langle C_m, n'_m \rangle)$, where $C_i$ denotes the original IT resource and $n'_i$ denotes the new redundancy degree for $C_i$. This enhancement recommendation keeps the overall cost minimal while satisfying the availability requirements of every business workflow. Formally, the availability optimization problem is defined as a constrained optimization problem as follows:
Find an HA enhancement recommendation $(\langle C_1, n'_1 \rangle, \langle C_2, n'_2 \rangle, \ldots, \langle C_m, n'_m \rangle)$ such that the overall enhancement cost, defined as

$$\mathit{Cost}_{\mathit{HAEnhance}} = \sum_{i=1}^{m} h(C_i, n'_i) - \sum_{i=1}^{m} h(C_i, n_i)$$

is minimized and the following conditions are satisfied:

(1) $\forall i \in [1, n]$, $P(W_i) = \prod_{j=1}^{m} P(C_j)^{R_{i,j}} > P_i$
(2) $\forall i \in [1, m]$, $\mathit{LowerBound}(C_i) \le n'_i \le \mathit{UpperBound}(C_i)$

Each enhancement parameter is defined as $X_i = n'_i / n_i$ to capture the enhancement degree for resource $C_i$; an enhancement cost function is defined as $f_i(n_i, X_i) = h(C_i, n'_i) - h(C_i, n_i)$ to calculate the cost of HA enhancement for resource $C_i$. To find an optimal enhancement solution, we need to iteratively explore different HA solutions for each IT resource and different redundancy degrees for each HA solution.

Returning to the example depicted in Fig. 7, finding an optimized HA enhancement recommendation boils down to solving the following constrained optimization problem:

Find an HA enhancement vector $(X_1, X_2, X_3, X_4)$ such that the HA enhancement cost

$$\mathit{Cost}_{\mathit{HAEnhance}} = F(X_1, X_2, X_3, X_4) = f_1(1, X_1) + f_2(1, X_2) + f_3(1, X_3) + f_4(1, X_4)$$

is minimized and the following conditions are satisfied:

(1) $P(W_1) = P(C_1) P(C_2) P(C_3) P(C_4) > 0.999$
(2) $P(W_2) = P(C_1) P(C_2) P(C_4) > 0.999$
(3) $1 \le X_1 \le 32$
(4) $1 \le X_2 \le 16$
(5) $1 \le X_3 \le 16$
(6) $1 \le X_4 \le 2$

In this problem, the bound constraints are set according to the resource types: 32 is the upper bound for the HA cluster of HTTP servers (resource $C_1$), 16 for the WAS application servers (resources $C_2$ and $C_3$), and 2 for the DB2 database server (resource $C_4$).

B. Calculations

Currently, the most frequently used [7][8] definition of availability is the uptime ratio, which is a close approximation of the steady-state availability value and represents the percentage of time a computer system is available throughout its useful lifetime.
This uptime ratio can be defined as follows:

$$\mathit{uptime\ ratio} = 1 - \frac{MTTR}{MTBF} \tag{1}$$

where MTTR is the expected time to recover from a failure, and MTBF is the expected time interval from one failure of a system to the next. With this definition, the uptime ratio lies in the range from 0 to 1 (in practice, one hopes that it is no lower than 0.9 at worst). We assume that MTBF, MTTR and the uptime ratio can be measured empirically or obtained directly from product documentation.

The availability calculations for IT resources, HA solutions, and workflows are defined as follows.

Usually, an IT resource is a hosting stack composed of different layers of resource components such as middleware, an operating system, and a physical server, with the "top" of the stack being the IT resource that we are really interested in. In a hosting stack, the failure of any component will generally result in the failure of everything "above" it in the stack, which in turn makes the services hosted by this IT resource unavailable. Thus, the availability of an IT resource is given by:

$$P(C_i) = \prod_{j=1}^{m} P(RC_j) \tag{2}$$

where $P(C_i)$ is the availability capability of the IT resource $C_i$, which includes $m$ resource components in its hosting stack, and $P(RC_j)$ is the availability capability of resource component $RC_j$.

An HA solution is usually composed of several basic IT resources. When one basic IT resource in an HA solution becomes unavailable, the other resources can take over its workload to guarantee service continuity. For example, if an application server cluster is composed of $m$ application servers, all providing the same capabilities and hosting the same applications, the overall service becomes unavailable only if all application servers fail. In this paper, we view an HA solution as a resource group; its availability can be calculated as follows:

$$P(RG) = 1 - \prod_{j=1}^{m} \bigl(1 - P(C_j)\bigr) \tag{3}$$

where $P(RG)$ is the availability capability of resource group $RG$, and $P(C_j)$ is the availability capability of IT resource $C_j$.

An IT resource workflow links resources that support loosely coupled services. In such a workflow, the failure of any resource results in the failure of the whole workflow. For example, a typical IT resource workflow is a three-tier Web hosting architecture including a Web server, an application server, and a database server. Thus the availability of a workflow is given by:

$$P(W) = \prod_{j=1}^{m} P(C_j) \tag{4}$$

where $P(W)$ is the availability capability of workflow $W$, which links $m$ IT resources, and $P(C_j)$ is the availability capability of IT resource $C_j$.

Fig. 8 depicts an example with three IT resources, one resource group, and an IT resource workflow derived from the business workflow. The availability capability of this business workflow is:

$$P(W) = P(RG_1) \cdot P(C_3) = \bigl(1 - (1 - P(C_1))(1 - P(C_2))\bigr) \cdot P(C_3) \tag{5}$$
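Equations (1) to (5) compose naturally, which the following Python sketch tries to show. It is a minimal illustration, not the prototype's implementation: the function names are ours, the MTTR/MTBF figures are invented for the example, and the assumed layout (two application-server stacks grouped as $RG_1$ and chained with one database stack) is our reading of the Fig. 8 example.

```python
from math import prod


def uptime_ratio(mttr, mtbf):
    """Equation (1): 1 - MTTR / MTBF, both given in the same time unit."""
    return 1.0 - mttr / mtbf


def stack_availability(component_availabilities):
    """Equation (2): a hosting stack is up only if every layer is up."""
    return prod(component_availabilities)


def group_availability(member_availabilities):
    """Equation (3): a resource group is down only if every member is down."""
    return 1.0 - prod(1.0 - p for p in member_availabilities)


def workflow_availability(resource_availabilities):
    """Equation (4): a workflow is up only if every linked resource is up."""
    return prod(resource_availabilities)


# Hypothetical MTTR/MTBF figures in hours (illustration only).
os_layer = uptime_ratio(1, 5000)
hardware = uptime_ratio(4, 20000)
was = stack_availability([uptime_ratio(2, 2000), os_layer, hardware])
db = stack_availability([uptime_ratio(3, 3000), os_layer, hardware])

# Assumed Fig. 8 layout: C1 and C2 form resource group RG1, chained with C3.
rg1 = group_availability([was, was])
print(workflow_availability([rg1, db]))  # Equation (5)
```

Note how grouping raises availability (the group is more available than either member), whereas stacking and chaining can only lower it.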
Fig. 8. Availability calculation example

C. Weak-Point Analysis Algorithm

Given the workflow-resource relationship matrix, we can calculate the current availability capability of each business workflow. As seen in Section II.B, we denote the availability of the $m$ IT resources as $P(C_1), P(C_2), P(C_3), \ldots, P(C_m)$. These availabilities can be calculated from the availability characteristics of the individual resource components, which are derived from historical measurements or the manufacturer's evaluation data. As captured in the workflow-resource relationship matrix, a workflow can depend on a given IT resource in several ways. Assume that the relevant IT resources appear several times in an IT resource workflow. We can then calculate the current availability of each workflow by using Equation 6:

$$P(W_i) = \prod_{j=1}^{m} P(C_j)^{R_{i,j}} \tag{6}$$

where $P(W_i)$ is the current availability capability of workflow $W_i$, and $R_{i,j}$ is the number of times resource $C_j$ is referenced by workflow $W_i$. We then compare the calculated availability with the workflow availability requirement $P_i$: if $P(W_i) \ge P_i$, the requirement is met; otherwise, the availability requirement is unsatisfied, and some resources in the resource list of workflow $W_i$ need to have their availability enhanced using some HA pattern. This is an optimization problem:

Find which resources should be enhanced so that the availability requirements are met, while keeping the HA enhancement cost as low as possible.

A simple way to address an optimization problem is to enumerate all possible solutions and compare their costs; however, this is computationally expensive for all but the simplest problems. Janakiraman et al. [9] propose an approach that searches for the optimal solution through multi-tier system design, based on exhaustive iteration.
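To give a feel for the enumeration baseline, the following Python sketch brute-forces the Fig. 7 example. It is an illustration under several assumptions of ours: the only HA pattern considered is a cluster of $n$ identical members with availability $1 - (1 - p)^n$, the cost is a per-node price times the number of added nodes, and the single-node availabilities and prices are invented. The matrix rows, the 0.999 requirements and the cluster-size upper bounds (32, 16, 16, 2) are those of the Fig. 7 example.

```python
from itertools import product
from math import prod

R = [[1, 1, 1, 1], [1, 1, 0, 1]]      # relationship matrix assumed for Fig. 7(b)
REQUIRED = [0.999, 0.999]             # availability requirement per workflow
UPPER = [32, 16, 16, 2]               # cluster-size upper bounds per resource
P_SINGLE = [0.99, 0.98, 0.98, 0.995]  # hypothetical single-node availabilities
UNIT_COST = [1.0, 3.0, 3.0, 10.0]     # hypothetical cost per added node


def cluster_availability(p, n):
    """Availability of a cluster of n identical members."""
    return 1.0 - (1.0 - p) ** n


def exhaustive_search():
    """Try every combination of cluster sizes and keep the cheapest feasible one."""
    best_cost, best_sizes, examined = None, None, 0
    for sizes in product(*(range(1, u + 1) for u in UPPER)):
        examined += 1
        p = [cluster_availability(pi, ni) for pi, ni in zip(P_SINGLE, sizes)]
        feasible = all(
            prod(pj ** rj for pj, rj in zip(p, row)) >= req
            for row, req in zip(R, REQUIRED)
        )
        if not feasible:
            continue
        cost = sum(c * (n - 1) for c, n in zip(UNIT_COST, sizes))
        if best_cost is None or cost < best_cost:
            best_cost, best_sizes = cost, sizes
    return best_sizes, best_cost, examined


sizes, cost, examined = exhaustive_search()
print(sizes, cost, examined)
```

Even this four-resource example examines $32 \times 16 \times 16 \times 2 = 16384$ combinations for a single HA pattern. The search space is the product of the per-resource bounds, so it grows exponentially with the number of resources, and it grows further when several HA patterns per resource are considered.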
In our weak-point analysis methodology, we represent the problem as a multivariate optimization problem and calculate a near-optimal HA enhancement recommendation using the Lagrange multiplier method [10], which is computationally more efficient than Janakiraman et al.'s approach.

Assume the number of workflows whose availability requirements have not yet been met is $n$. For workflow $W_i$ we define the enhancement parameter $PW_i$ as the factor by which that workflow's current availability needs to be enhanced to meet the availability requirement $P_i$:

$$PW_i = \frac{P_i}{P(W_i)} \tag{7}$$

By definition, $PW_i \ge 1$. We also define the enhancement parameters for the resources as $PC_1, PC_2, \ldots, PC_m$. This yields the following constraints:

$$
\begin{aligned}
PW_1 &\le PC_1^{R_{1,1}} \cdot PC_2^{R_{1,2}} \cdots PC_m^{R_{1,m}} \\
PW_2 &\le PC_1^{R_{2,1}} \cdot PC_2^{R_{2,2}} \cdots PC_m^{R_{2,m}} \\
&\;\;\vdots \\
PW_i &\le PC_1^{R_{i,1}} \cdot PC_2^{R_{i,2}} \cdots PC_m^{R_{i,m}} \\
&\;\;\vdots \\
PW_n &\le PC_1^{R_{n,1}} \cdot PC_2^{R_{n,2}} \cdots PC_m^{R_{n,m}}
\end{aligned} \tag{8}
$$

In other words, the overall availability enhancement of the IT resources within a workflow should be no less than the availability enhancement required for that workflow. To simplify the calculations, we take the logarithm of the inequalities in Equation 8:

$$
\begin{aligned}
\ln(PW_1) &\le R_{1,1} \cdot \ln(PC_1) + \cdots + R_{1,m} \cdot \ln(PC_m) \\
\ln(PW_2) &\le R_{2,1} \cdot \ln(PC_1) + \cdots + R_{2,m} \cdot \ln(PC_m) \\
&\;\;\vdots \\
\ln(PW_i) &\le R_{i,1} \cdot \ln(PC_1) + \cdots + R_{i,m} \cdot \ln(PC_m) \\
&\;\;\vdots \\
\ln(PW_n) &\le R_{n,1} \cdot \ln(PC_1) + \cdots + R_{n,m} \cdot \ln(PC_m)
\end{aligned} \tag{9}
$$

For notational convenience, we replace $\ln(PC_1), \ln(PC_2), \ldots, \ln(PC_m)$ by $X_1, X_2, \ldots, X_m$; we then have $0 \le X_i \le \ln\bigl(\frac{1}{P(C_i)}\bigr)$ because $1 \le PC_i \le \frac{1}{P(C_i)}$. For a failover HA pattern where only one primary server and one standby server exist in the cluster, we can adjust the upper bound to $\ln\bigl(\frac{1 - (1 - P(C_i))^2}{P(C_i)}\bigr)$. For cluster HA patterns, we can adjust the lower bound from 0 to $\ln\bigl(\frac{1 - (1 - P(C_i))^{n_i}}{P(C_i)}\bigr)$ if we want the initial cluster size to be $n_i$ instead of 1. We also substitute $B_1, B_2, \ldots, B_n$ for $\ln(PW_1), \ln(PW_2), \ldots, \ln(PW_n)$. Therefore the following constraints should be satisfied:
$$
\begin{aligned}
B_1 &\le R_{1,1} \cdot X_1 + \cdots + R_{1,m} \cdot X_m \\
B_2 &\le R_{2,1} \cdot X_1 + \cdots + R_{2,m} \cdot X_m \\
&\;\;\vdots \\
B_i &\le R_{i,1} \cdot X_1 + \cdots + R_{i,m} \cdot X_m \\
&\;\;\vdots \\
B_n &\le R_{n,1} \cdot X_1 + \cdots + R_{n,m} \cdot X_m \\
0 &\le X_1 \le \ln\left(\frac{1}{P(C_1)}\right) \\
0 &\le X_2 \le \ln\left(\frac{1}{P(C_2)}\right) \\
&\;\;\vdots \\
0 &\le X_m \le \ln\left(\frac{1}{P(C_m)}\right)
\end{aligned} \tag{10}
$$

The above constraints form a continuous region for the solutions in the multi-dimensional space $S(X_1, X_2, X_3, \dots, X_m)$. Let the utility function $f$ be the overall cost of the HA enhancement. Let us prove that the closed lower boundaries of the solution space include the optimal solution for the minimum enhancement cost (i.e., we can find the optimal solution for the utility function $f$ by restricting the search to the closed lower boundaries of the solution space).

Theorem 1: The closed lower boundaries of the solution region in the multi-dimensional space $S(X_1, X_2, X_3, \dots, X_m)$ include the optimal solution $P_{opt}$.

Proof: Assume there exists an optimal solution point $P(X_1, X_2, \dots, X_m)$ in the constraint space beyond the closed lower boundaries; we need to prove that there exists a solution point that is a better solution than point $P$, and therefore that the optimal solution $P_{opt}$ is located on the closed lower boundaries of the constraint space.

We define $\Rightarrow_{X_i}$ as the mapping from point $P_1(X_1, \dots, X_i, \dots, X_m)$ to point $P_i(X_1, \dots, X_i', \dots, X_m)$ on the closed lower boundary $B_i$ along the decreasing direction in the $X_i$ dimension:

$$P_1(X_1, \dots, X_i, \dots, X_m) \Rightarrow_{X_i} P_i(X_1, \dots, X_i', \dots, X_m).$$

Because $0 < X_i' < X_i$ and the utility function $f$ always has a positive correlation with the enhancement parameter $X_i$, it follows that

$$f(P_i(X_1, \dots, X_i', \dots, X_m)) < f(P_1(X_1, \dots, X_i, \dots, X_m)).$$

Therefore solution $P_i(X_1, \dots, X_i', \dots, X_m)$ has lower cost than $P_1(X_1, \dots, X_i, \dots, X_m)$. Thus the former assumption that $P(X_1, X_2, \dots, X_m)$ is an optimal solution point is untenable, which proves that the optimal solution exists on some closed lower boundary of the constraint space.
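To make the constraint system concrete, the following minimal sketch checks whether a candidate vector of enhancement parameters satisfies the workflow constraints and the upper bounds of Equation 10. The function names, the example matrix, and the availability figures are illustrative assumptions and do not come from the prototype; the workflow availability is assumed to be the product of the resource availabilities raised to the corresponding entries of the relationship matrix, as Equation 8 suggests.

```python
import math


def enhancement_targets(requirements, current):
    """B_i = ln(P_i / P(W_i)) for each workflow (log form of Equation 7)."""
    return [math.log(p_req / p_cur) for p_req, p_cur in zip(requirements, current)]


def satisfies_constraints(R, B, X, tol=1e-12):
    """Check B_i <= sum_j R[i][j] * X[j] for every workflow i."""
    return all(
        b <= sum(r * x for r, x in zip(row, X)) + tol
        for row, b in zip(R, B)
    )


def upper_bounds(availabilities):
    """X_j <= ln(1 / P(C_j)): an enhanced availability cannot exceed 1."""
    return [math.log(1.0 / p) for p in availabilities]


# Hypothetical example: two workflows over three resources.
R = [[1, 1, 0], [0, 1, 1]]
p_resources = [0.99, 0.95, 0.98]
# Assumed availability model: product of resource availabilities per workflow.
p_workflows = [math.prod(p ** r for p, r in zip(p_resources, row)) for row in R]
B = enhancement_targets([0.97, 0.97], p_workflows)

# Candidate: enhance only the second resource by a factor of 1.05.
X = [0.0, math.log(1.05), 0.0]
print(satisfies_constraints(R, B, X))
```

In this example the second resource appears in both workflows, so enhancing it alone is enough to satisfy both inequalities while staying under its upper bound.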
Therefore, the closed lower boundaries for the constraint space can be expressed with the equation:

$$g(X_1, X_2, \dots, X_m) = 0,$$

where $g(X_1, X_2, \dots, X_m)$ is a piecewise function that depicts the different closed boundaries. The optimized HA enhancement recommendation is eventually determined by the overall utility function. The utility function for a given resource $C_i$ is associated with two parameters: $n_i$, the original HA cluster size of resource $C_i$ (for standalone resources, $n_i$ is set to 1), and $X_i$, the enhancement parameter for resource $C_i$. Therefore, the utility function for resource $C_i$ can be expressed as $f_i(n_i, X_i)$, and the overall cost as:

$$f(X_1, X_2, \dots, X_m) = f_1(n_1, X_1) + f_2(n_2, X_2) + \cdots + f_m(n_m, X_m) = \sum_{i=1}^{m} f_i(n_i, X_i) \tag{11}$$

For example, the utility function $f_i(n_i, X_i)$ can be defined as:

$$f_i(n_i, X_i) = E_i (n_i' - n_i) \tag{12}$$

where $n_i'$ denotes the cluster size of resource $C_i$ after HA enhancement, and $E_i$ denotes the HA enhancement cost per unit (e.g., it can include the initial fixed cost for purchasing hardware and software, and the annual maintenance cost). The utility function is then determined by the business service providers who want to provide appropriate IT resources to support their business services at an appropriate cost; it may vary according to their demands. We can now calculate $n_i'$ according to $X_i$:

$$
\begin{aligned}
P'(C_i) &= 1 - (1 - P(C_i))^{n_i'} \\
P'(C_i) &= P(C_i) \cdot P_{C_i} \\
X_i &= \ln(P_{C_i}) \\
\Rightarrow \; n_i' &= \left\lceil \frac{\ln(1 - P(C_i) \cdot e^{X_i})}{\ln(1 - P(C_i))} \right\rceil
\end{aligned} \tag{13}
$$

In the above formula, $P'(C_i)$ denotes the enhanced availability for resource $C_i$ and $P(C_i)$ denotes the availability of one single resource. Therefore the optimized recommendation can be calculated with the utility function subject to the constraint with the equation $g(X_1, X_2, \dots, X_m) = 0$.
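As a small numeric illustration of Equations 12 and 13, the sketch below converts an enhancement parameter into a cluster size and a cost. The function names and the sample values (a single-member availability of 0.95 and a unit cost of 10) are assumptions chosen for illustration only.

```python
import math


def cluster_size(p_single, x):
    """Equation 13: smallest n' such that 1 - (1 - p)^n' >= p * e^x."""
    target = p_single * math.exp(x)
    if target >= 1.0:
        # x must stay below the upper bound ln(1 / p).
        raise ValueError("enhanced availability must stay below 1")
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p_single))


def enhancement_cost(unit_cost, n_before, n_after):
    """Equation 12: f_i = E_i * (n'_i - n_i)."""
    return unit_cost * (n_after - n_before)


# Hypothetical resource with availability 0.95, currently standalone (n = 1).
for x in (0.0, 0.04, 0.05):
    n_after = cluster_size(0.95, x)
    print(x, n_after, enhancement_cost(10, 1, n_after))
```

Because of the ceiling in Equation 13, the cost is a step function of the enhancement parameter, which is one reason the recommendation obtained from a continuous optimization is described as near-optimal rather than optimal.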
By using the Lagrange multiplier method [10], we construct the auxiliary function $F(X_1, X_2, \dots, X_m, \lambda)$ to calculate the optimized recommendation (see Equation 14), where $f(X_1, X_2, \dots, X_m)$ denotes the utility function and $g(X_1, X_2, \dots, X_m)$ denotes the function for the constraint space:

$$F(X_1, X_2, \dots, X_m, \lambda) = f(X_1, X_2, \dots, X_m) + \lambda \cdot g(X_1, X_2, \dots, X_m) \tag{14}$$

By calculating the following partial derivatives according to the Lagrange multiplier method, we finally get the optimized recommendation $(X_1, X_2, \dots, X_m)$:

$$
\begin{aligned}
\frac{\partial}{\partial X_1} F(X_1, X_2, \dots, X_m, \lambda) &= 0 \\
\frac{\partial}{\partial X_2} F(X_1, X_2, \dots, X_m, \lambda) &= 0 \\
&\;\;\vdots \\
\frac{\partial}{\partial \lambda} F(X_1, X_2, \dots, X_m, \lambda) &= 0
\end{aligned} \tag{15}
$$

With the optimized HA enhancement recommendation $(X_1, X_2, \dots, X_m)$, we can get the enhanced availabilities $(P'(C_1), P'(C_2), \dots, P'(C_m))$, and the exact HA solutions can be found (e.g., whether a cluster should be constructed and the size of that cluster). Assume we need $n$ members to support the HA cluster; the availability capability for the cluster is:
$$P'(C_i) = 1 - (1 - P(C_i))^{n} \tag{16}$$

According to Equation 13, the size of the cluster can be calculated as follows:

$$n = \left\lceil \frac{\ln(1 - P'(C_i))}{\ln(1 - P(C_i))} \right\rceil \tag{17}$$

Leveraging the domain information for the resource component, the HA cluster pattern can be generated and configured into the deployment topology.

D. Computational Complexity

In this section, we compare the computational complexity of the exhaustive iteration method with that of our weak-point analysis methodology.

Assume that there exist $n$ candidate IT resources that may need to be HA enhanced. We set the upper bound for the cluster size of any resource to $k$ (this is necessary for the iteration method, but not for our methodology). For the exhaustive iteration method, the computational complexity to reach the optimal solution is $\underbrace{k \cdot k \cdots k}_{n}$, i.e., $O(k^n)$. For our Lagrange multiplier-based method, the solution is calculated by solving the set of equations in Equation 15. The computational complexity is bounded only by the number of variables in the equations and is polynomial: $O(n^m)$, where $m$ is a constant. As a result, our method scales better and has a much lower computational complexity than the exhaustive iteration method when the number of candidate IT resources is large.

Furthermore, we use a weight-based optimization mechanism to reduce calculation complexity for very large-scale deployment topologies. Since the number of candidate IT resources for availability enhancement can be extremely large in such topologies, it is useful to have a way to reduce the number of candidate IT resources, in order to simplify the calculations required by Equation 15. The principle of our weight-based optimization mechanism is to select a subset of the IT resources, based on weights, for use in our weak-point analysis methodology.
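A weight-based selection of this kind can be sketched as follows, assuming the weight of a resource is the sum, over all workflows, of its relationship-matrix entry multiplied by the workflow's availability requirement (the definition given in Equation 18). The function names and the sample data are hypothetical.

```python
def resource_weights(R, requirements):
    """W(C_j) = sum_i R[i][j] * P_i, following the form of Equation 18."""
    m = len(R[0])
    return [sum(row[j] * p for row, p in zip(R, requirements)) for j in range(m)]


def top_q_resources(R, requirements, q):
    """Return the indices of the q highest-weight resources, heaviest first."""
    weights = resource_weights(R, requirements)
    ranked = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
    return ranked[:q]


# Hypothetical example: two workflows over three resources.
R = [[1, 1, 0], [0, 1, 1]]
requirements = [0.97, 0.999]
print(top_q_resources(R, requirements, 2))
```

Only the selected resources would then be passed to the constrained optimization, which shrinks the system in Equation 15 from m variables to q.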
We exploit the fact that enhancing the availability of the resources involved in more workflows with critical availability requirements will yield a better overall HA enhancement for the workflows. To this end, we propose the following mechanism to select relevant IT resources.

The weight for resource $C_j$ is defined as

$$W(C_j) = \sum_{i=1}^{n} (R_{i,j} \cdot P_i) \tag{18}$$

where $R_{i,j}$ denotes the integer value defined in the workflow-resource relationship matrix, and $P_i$ denotes the availability requirement of workflow $W_i$. The priority list of IT resources can then be determined according to their weights: the resources that support more workflows and more availability-critical workflows are given higher weights. According to the priority list, the top $q$ resources can be selected to calculate the HA enhancement recommendation; the calculated result is a near-optimal solution only for the $q$ candidate resources that are taken into consideration, but the computational complexity is reduced according to the selected number $q$ (i.e., the calculation is based only on the selected IT resources).

This weight-based optimization mechanism provides a flexible trade-off between quality and performance. For very large-scale deployment topologies, adapting to the dynamics of the environment usually requires making timely decisions to achieve business agility. In this scenario, weight-based optimization becomes important and useful to quickly generate a suboptimal enhancement recommendation, rather than finding the optimal solution too late (due to computation time). Moreover, in our experimental evaluation, we found that this performance optimization not only reduced computational complexity, but also generated the same result as the original weak-point analysis methodology when the top 60% of the IT resources were selected according to their weight. The reason could be that the weight calculation has properly indicated the importance of each resource.

E. Practical Considerations

In some cases of HA enhancement, substituting an IT resource with inherently better availability characteristics may be more appropriate than using a cluster or a failover solution. For example, rather than applying a hot-standby solution to an instance of DB2 running on an x86 platform, it may be preferable to replace it with an instance of DB2 running on z/OS on a mainframe.

Based on this observation, we propose an algorithm for selecting alternative IT resources. Here we abstract our availability weak-point analysis methodology into a function WeakPointAnalysis(). As shown in Algorithm 1, we first generate all possible resource lists and relevant utility functions according to the various candidate resource types specified by the user. Next, we use the function WeakPointAnalysis() to calculate various solutions according to these lists. This enables us to choose the best solution among all candidates.

As we saw in Section III.D, the exhaustive iteration method has an exponential computing complexity, in contrast to the polynomial computing complexity of our weak-point analysis methodology. Conversely, the exhaustive iteration method finds the optimal solution, whereas our methodology calculates a near-optimal solution. Consequently, there is a trade-off between these two methods. The exhaustive iteration method is preferable when computing complexity is low: when the topology contains only a limited number of resources and their maximum cluster size is not too large, we should use the exhaustive iteration method to make sure we calculate the optimal solution. When the topology contains many resources and/or the cluster size is large, our Lagrange multiplier-based method is more efficient. Based on this remark, we propose the algorithm COMB (see Algorithm 2).
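The trade-off behind COMB can be illustrated with the following sketch, which pairs a brute-force search over cluster sizes with a threshold-based dispatch. It is not the prototype's implementation: the function names, the cost model (every resource starts as a standalone instance), and the assumption that a workflow's availability is the product of its resources' availabilities raised to the relationship-matrix entries are all illustrative, and the Lagrange multiplier-based solver is represented only by a caller-supplied function named `approximate`.

```python
import itertools
import math


def cluster_availability(p_single, n):
    """Availability of an n-member cluster, as in Equation 16."""
    return 1.0 - (1.0 - p_single) ** n


def workflow_availability(row, avail):
    """Assumed model: product of resource availabilities raised to R[i][j]."""
    return math.prod(a ** r for a, r in zip(avail, row))


def exhaustive_search(p_single, unit_cost, R, requirements, k):
    """Try every combination of cluster sizes in 1..k (k**n candidates)."""
    best_cost, best_sizes = math.inf, None
    for sizes in itertools.product(range(1, k + 1), repeat=len(p_single)):
        avail = [cluster_availability(p, n) for p, n in zip(p_single, sizes)]
        feasible = all(
            workflow_availability(row, avail) >= req
            for row, req in zip(R, requirements)
        )
        if feasible:
            # Assumed cost model: each resource starts with one member.
            cost = sum(e * (n - 1) for e, n in zip(unit_cost, sizes))
            if cost < best_cost:
                best_cost, best_sizes = cost, sizes
    return best_sizes, best_cost


def comb(p_single, unit_cost, R, requirements, k, threshold, approximate):
    """Use exhaustive search when k**n is below the threshold, else approximate."""
    complexity = k ** len(p_single)
    if complexity < threshold:
        return exhaustive_search(p_single, unit_cost, R, requirements, k)
    return approximate(p_single, unit_cost, R, requirements)
```

With three resources and a maximum cluster size of 3, the search visits 27 combinations; at 20 resources the same bound already gives roughly 3.5 billion, which is where the dispatch to the approximate solver becomes useful.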
Algorithm 1 HA Weak-Point Analysis Algorithm with Alternative Resource Selection
Input: ResourceList, UtilityFunction, Topology, WorkflowList
Procedure:
  MinCost = MaxNumber
  OptimizedSolution = Null
  for every Resource Ci in ResourceList do
    if NumofCandidateResource(Ci) > 1 then
      // This resource has alternative component selection
      for every CandidateResource CRj in Resource Ci do
        CRj = GetCandidateResource(Ci)
        ResourceList = GenerateResourceList(CRj, Ci)
        UtilityFunction = GenerateUtilityFunction(CRj, Ci)
        AddtoResourceListPool(ResourceList)
        AddtoUtilityFunctionPool(UtilityFunction)
      end for
    end if
  end for
  for every ResourceList and counterpart UtilityFunction in ResourceListPool and UtilityFunctionPool do
    [Cost, Solution] = WeakPointAnalysis(ResourceList, UtilityFunction, Topology, WorkflowList)
    if Cost < MinCost then
      MinCost = Cost
      OptimizedSolution = Solution
    end if
  end for
Output: MinCost, OptimizedSolution

First, we evaluate the computing complexity $C$ for the current case, according to the number of IT resources and their maximum cluster sizes. Next, we compare it with a threshold value $C_{threshold}$ to decide whether to use the iteration method or our weak-point analysis methodology.

IV. IMPLEMENTATION AND LESSONS LEARNED

To assess the usefulness of our weak-point analysis methodology in real-life scenarios, we implemented it as an extension to IBM's SOA deployment framework [11].

A. SOA Deployment Framework

The underlying SOA deployment framework is a model-driven platform based on a core metamodel and various domain-specific metamodels. The core metamodel is a representation that captures the common aspects of the IT infrastructure configuration syntax, structure, and semantics. In this core metamodel, the type "Unit" is defined to capture the IT resource components (e.g., database systems or operating systems).
The function of a resource component is defined via Capabilities attached to its representative Unit; its requirements are defined by formal Requirement specifications on the Unit.

Algorithm 2 COMB
Input: ResourceList, WorkflowList, UtilityFunction
Procedure:
  Evaluate the computing complexity C for current case
  if C < Cthreshold then
    (n1, ..., nm) = IterationbasedCalculation(ResourceList, WorkflowList)
  else
    (n1, ..., nm) = OptimizedSolutionCalculation(ResourceList, UtilityFunction, WorkflowList)
  end if
Output: (n1, n2, ..., nm)

Two kinds of relationships are defined in the core metamodel: HostingLink and DependencyLink. HostingLink specifies that one Unit will be the host for another Unit (e.g., a server will be the host for an operating system). These links are restricted according to the Capabilities of the hosting Unit and the Requirements of the hosted Unit. DependencyLink specifies that one Unit (the source) has some (non-hosting) dependency on another Unit (the target); again, these are restricted according to the Requirements of the source and the Capabilities of the target. The core metamodel is further extended by domain-specific metamodels; for example, server, operating system, and database domains can be further defined. The defined HostingLink and DependencyLink are directly used for our workflow mapping as described in Section II. Based on these metamodels, an implementation of this SOA deployment framework has been built to help create SOA deployment topologies [11].

B. Weak-Point Analysis Tool

As depicted in Fig. 9, our weak-point analysis tool is added to the SOA deployment framework. We use BPEL to specify business workflows, which are further mapped over the SOA deployment topology.
Business workflows can be described in two ways: executable business workflows that model the actual behavior of a participant in a business interaction; or abstract business workflows that fulfill a descriptive role and usually hide some of the concrete operational details. BPEL business workflows refer here to abstract workflows, and therefore do not capture detailed interaction behaviors. As described in Sections II and III, the tool includes three modules. The Workflow Specification Module takes a deployment topology and business workflows expressed in BPEL as inputs, and constructs the workflow-resource relationship matrix. The Weak-Point Analysis Module uses this matrix to identify availability weak points in the deployment topology, and produces HA enhancement parameters. Finally, using these parameters, the HA Pattern Mapping Module generates proper patterns for each weak point and automatically transforms the SOA deployment topology. In the Weak-Point Analysis Module, two methods are implemented: the exhaustive iteration method and our weak-point analysis methodology