1 Are we connected?Optimal Determination of Source-destination Connectivity in Random Networks Luoyi Fu,Xinbing Wang Dept.of Elec.and Engin. Shanghai Jiao Tong University,China Email:[yiluofu,xwang8}@sjtu.edu.cn P.R.Kumar Dept.of Elec.and Comp.Engin. Texas A&M University,USA Email:prk.tamu @gmail.com Abstract-This paper investigates the problem of optimally this graph is connected with probability approaching one as determining source-destination connectivity in random networks. n-oo,if p(n),the probability p as a function of n,satisfies Viewing the network as a random graph,we start by investigating the ER Graph,as well structured graph where,interesting,the p(n)>and contains isolated nodes with probability problem appears to be open.The problem examined is that of approaching one as noo,if p(n)<1)inn.We will determining whether a given pair of nodes,a source S and a refer to the G(n,p)model as the ER graph,in conformity destination D,are connected by a path.Assuming that at each with common usage.In a wireless networking context,Gilbert step one edge can be tested to see if it exists or not,we determine [11],and,more recently,Penrose [12]and Gupta and Kumar an optimal policy that minimizes the total expected number of steps. [13],have studied geometric random graphs where nodes The optimal policy has several interesting features.In order have random uniform i.i.d.locations in a unit disk,and to establish connectivity of S and D,a policy needs to check showed that if the radio transmission range of each node is all edges on some path to see if they all exist,but to establish disconnectivity it has to check all edges on some cut to see if r(n)=,then the whole network is connected none of them exists.The optimal policy has the following form. with probability approaching one as noo,if and only if At each step it examines the condensation multigraph formed by c(n)+oo.Ever since,there has been much research effort contracting each known connected component to a single node, directed towards studying asymptotic connectivity of randomly and then checks an edge that is simultaneously on a shortest S-D distributed wireless networks [14]-[28]. path as well as in a minimum S-D cut.Among such edges,it chooses that which leads to the most opportunities for connection. The above works all focus on the connectivity of the Interestingly,for an ER graph with n nodes,where there is an entire network in the asymptotic regime where the number edge between two nodes with probability p,the optimal strategy of nodes goes to infinity.The focus in this paper is different does not depend on p or n,even though the entire graph itself undergoes a sharp transition from disconnectivity to connectivity in the following three ways:(i)It is non-asymptotic,(ii)it around p In n/n.The policy is efficiently implementable is specific to the specific realization of the random network requiring no more than 30log2n operations to determine which that is being studied,and,(iii)instead of studying the whole edge to test next. network's connectivity,the issue is the connectivity between The result also extends to some more general graphs,and a specified source S and destination D.It is also different in meanwhile provide useful insights into the connectivity determi- that rather than giving conditions for connectivity,we examine nation in random networks. the fundamental problem of designing a policy that determines Index Termns-Optimal,Connectivity,Random Graphs the connectivity of S and D in minimum expected number of steps,where at each step one edge can be chosen and tested I.INTRODUCTION to see if it exists.The policy we present concludes either with the discovery of a path or the discovery of a cut between S Connectivity of random networks has long been a topic and D.It is designed to reach a conclusion regarding one or of intensive study.A widely studied model,known as the the other after examining the fewest expected number of edges G(n,p)model,was proposed by Gilbert [1]in 1959.It refers in ER graphs. to a random network modeled as a graph with n nodes,with an edge existing independently between any pair of nodes The problem has prominent applications in different areas. with probability p.Erdos and Renyi [2].[3]showed that For example,modeling the unreliable network as an indepen- dent random graph,Cox et al.derive a strategy of testing the The early version of this paper is appeared in the Proceedings of ACM connectivity between network nodes with minimum expecting MobiHoc 2014 [40]. cost [31].A random graph can also encodes the structure of
1 Are we connected? Optimal Determination of Source-destination Connectivity in Random Networks Luoyi Fu, Xinbing Wang Dept. of Elec. and Engin. Shanghai Jiao Tong University, China Email: {yiluofu,xwang8}@sjtu.edu.cn P. R. Kumar Dept. of Elec. and Comp. Engin. Texas A&M University, USA Email: prk.tamu@gmail.com Abstract—This paper investigates the problem of optimally determining source-destination connectivity in random networks. Viewing the network as a random graph, we start by investigating the ER Graph, as well structured graph where, interesting, the problem appears to be open. The problem examined is that of determining whether a given pair of nodes, a source S and a destination D, are connected by a path. Assuming that at each step one edge can be tested to see if it exists or not, we determine an optimal policy that minimizes the total expected number of steps. The optimal policy has several interesting features. In order to establish connectivity of S and D, a policy needs to check all edges on some path to see if they all exist, but to establish disconnectivity it has to check all edges on some cut to see if none of them exists. The optimal policy has the following form. At each step it examines the condensation multigraph formed by contracting each known connected component to a single node, and then checks an edge that is simultaneously on a shortest S-D path as well as in a minimum S-D cut. Among such edges, it chooses that which leads to the most opportunities for connection. Interestingly, for an ER graph with n nodes, where there is an edge between two nodes with probability p, the optimal strategy does not depend on p or n, even though the entire graph itself undergoes a sharp transition from disconnectivity to connectivity around p = ln n/n. The policy is efficiently implementable, requiring no more than 30log2 n operations to determine which edge to test next. The result also extends to some more general graphs, and meanwhile provide useful insights into the connectivity determination in random networks. Index Terms—Optimal, Connectivity, Random Graphs I. INTRODUCTION Connectivity of random networks has long been a topic of intensive study. A widely studied model, known as the G(n, p) model, was proposed by Gilbert [1] in 1959. It refers to a random network modeled as a graph with n nodes, with an edge existing independently between any pair of nodes with probability p. Erdos and Renyi [2], [3] showed that The early version of this paper is appeared in the Proceedings of ACM MobiHoc 2014 [40]. this graph is connected with probability approaching one as n → ∞, if p(n), the probability p as a function of n, satisfies p(n) > (1+ϵ) ln n n , and contains isolated nodes with probability approaching one as n → ∞, if p(n) < (1−ϵ) ln n n . We will refer to the G(n, p) model as the ER graph, in conformity with common usage. In a wireless networking context, Gilbert [11], and, more recently, Penrose [12] and Gupta and Kumar [13], have studied geometric random graphs where nodes have random uniform i.i.d. locations in a unit disk, and showed that if the radio transmission range of each node is r(n) = √ ln n+c(n) nπ2 , then the whole network is connected with probability approaching one as n → ∞, if and only if c(n) → +∞. Ever since, there has been much research effort directed towards studying asymptotic connectivity of randomly distributed wireless networks [14]- [28]. The above works all focus on the connectivity of the entire network in the asymptotic regime where the number of nodes goes to infinity. The focus in this paper is different in the following three ways: (i) It is non-asymptotic, (ii) it is specific to the specific realization of the random network that is being studied, and, (iii) instead of studying the whole network’s connectivity, the issue is the connectivity between a specified source S and destination D. It is also different in that rather than giving conditions for connectivity, we examine the fundamental problem of designing a policy that determines the connectivity of S and D in minimum expected number of steps, where at each step one edge can be chosen and tested to see if it exists. The policy we present concludes either with the discovery of a path or the discovery of a cut between S and D. It is designed to reach a conclusion regarding one or the other after examining the fewest expected number of edges in ER graphs. The problem has prominent applications in different areas. For example, modeling the unreliable network as an independent random graph, Cox et al. derive a strategy of testing the connectivity between network nodes with minimum expecting cost [31]. A random graph can also encodes the structure of
3 a priced information,with each edge representing a piece of with high probability for large n,while for p<it information(data)and its testing cost corresponds to the price is disconnected with high probability.Thus,one may possibly of the data,then the optimal strategy enable us to successfully expect that the optimal strategy will depend on the value of query the information paying minimum price [32].Other appli- p.Very interestingly,however,the optimal policy does not cation is a social network graph where,say,an edge denotes depend on p or n at all! a first cousin relationship,and it is of interest to establish The main result is the optimality of the following simple whether two individuals from a large population are distant testing strategy:At each stage,form the condensation multi- cousins.We consider situations where determining whether graph by contracting each known connected component to a an edge exists between two individuals is a very expensive single node.It is a multigraph since the edges between nodes procedure involving costly genetic testing,outweighing any are inherited by the components to which they are contracted. computational cost.An edge could of course represent a In the set of edges that lie both on a shortest S-D path as well variety of other relationships that are expensive to check,for on the minimal S-D cut,test an edge that leads to the most example due to confidentiality or physical restrictions.In those opportunities for connectivity.The policy also has minimal scenarios where users do not have prior knowledge of the complexity:it requires no more than 30log2n operations to whole network structure,the link existence between nodes determine which edge to test next. is usually characterized with a preassigned probability,and The policy is illustrated for a four-node ER graph in Figure the algorithm for the problem gives us an efficient way to 1.The proof of optimality of the policy follows from the discover the relationship between nodes,which has significant proofs of optimality of the following three rules, which, use in link prediction [33.And by applying the algorithm when combined together give rise to the policy equivalently to different node pairs,the structure of the social networks described above in terms of the condensation multigraph: can also be revealed [34],[35].Also,social network routing Rule 1:The testing starts with a tactic that can lead to early can be performed on the basis of the paths discovered by termination by finding an S-D path:it first tests the edges,i.e., the algorithm [36].We note that we are not interested in one-hop paths,connecting the component containing S and the determining the shortest path between the two nodes.which is component containing D.Rule 2:If there are no such edges, a very well studied problem with an O(n log n)computational then it switches to checking for S-D disconnectivity by testing running time solution.We are interested in the optimal policy the edges on the S-D cut that contains the minimum number checking the fewest expected number of edges for determining of untested edges.Rule 3:Among the edges in that cut,test whether or not the two nodes are connected.However,we the edge that leads to most opportunities for connection.These do show that our optimal policy for determining which edge three rules optimally resolve the tension between checking for to check next requires a computational effort no more than paths and cuts.Based on the policy designed under ER graphs. 30log2n operations,where n is the number of nodes.Other we further extend it to certain types of more generalized graph potential applications include random sensor networks [37], structures consisting of additional deterministic structure P2P networks [38],and VANETs [39]. The rest of this paper is organized as follows.We give We start by considering the classic ER random graph,with literature review in Section 2 and introduce models and a designated source S and destination D.The independence of definitions in Section 3.In Section 4,we describe the policy edge occurrences and their equal probabilities turn out to give for determining S-D connectivity in an ER random graph.The ER graphs a great advantage in terms of solvability over other optimality of the policy is proved in Section 5.In Section 6,we more complicated graph structures.Subsequently we consider discuss the extension of the result to other graph models,and some slightly more general graphs in Section 5,where there is give concluding remarks and future directions in Section 7. some additional deterministic structure known a priori about Details of the proofs,when not provided in-line,are gathered the presence or absence of certain edges. in the Appendix. The question of whether S and D are connected can be resolved if one can either display an S-D path,or an S-D cut.However,we do not know a priori whether the graph is II.RELATED WORKS connected or not,thereby making it difficult to know what to do -try to find a path or try to find a cut.The optimal We note that there is no other work,to the best of our policy has to be dynamic,based on the presence or absence of knowledge,which addresses connectivity between a designat- previously tested edges.This inherent tension in the problem ed source and destination in random graphs.Rather,as noted of determining connectivity makes the problem somewhat earlier,it is asymptotic connectivity of the whole graph that challenging.Another difficulty is that we do not know the has been intensively studied.Inspired by Erdos and Renyi optimal solution for general graphs where some edges are [2].[3],there have been many works [4]-[10]dedicated known to exist,some known not to exist,and others existing to investigating connectivity of the entire network when the i.i.d.with probability p,which is what one generally has after number of nodes is sufficiently large. some steps of testing.Thus the proof of optimality is not based There has also been much work motivated by wireless on dynamic programming-like arguments.Yet another aspect networks where connectivity is determined by distance.Gilbert of interest is that the random graph exhibits a phase transition [11]initiated the study of random graphs when nodes are depending on the value of p.For p,the entire randomly distributed on a plane.Suppose that two nodes graph itself,and not just the particular S-D pair,is connected have an edge between them if and only they are less than
2 a priced information, with each edge representing a piece of information(data) and its testing cost corresponds to the price of the data, then the optimal strategy enable us to successfully query the information paying minimum price [32]. Other application is a social network graph where, say, an edge denotes a first cousin relationship, and it is of interest to establish whether two individuals from a large population are distant cousins. We consider situations where determining whether an edge exists between two individuals is a very expensive procedure involving costly genetic testing, outweighing any computational cost. An edge could of course represent a variety of other relationships that are expensive to check, for example due to confidentiality or physical restrictions. In those scenarios where users do not have prior knowledge of the whole network structure, the link existence between nodes is usually characterized with a preassigned probability, and the algorithm for the problem gives us an efficient way to discover the relationship between nodes, which has significant use in link prediction [33]. And by applying the algorithm to different node pairs, the structure of the social networks can also be revealed [34], [35]. Also, social network routing can be performed on the basis of the paths discovered by the algorithm [36]. We note that we are not interested in determining the shortest path between the two nodes, which is a very well studied problem with an O(n log n) computational running time solution. We are interested in the optimal policy checking the fewest expected number of edges for determining whether or not the two nodes are connected. However, we do show that our optimal policy for determining which edge to check next requires a computational effort no more than 30 log2 n operations, where n is the number of nodes. Other potential applications include random sensor networks [37], P2P networks [38], and VANETs [39]. We start by considering the classic ER random graph, with a designated source S and destination D. The independence of edge occurrences and their equal probabilities turn out to give ER graphs a great advantage in terms of solvability over other more complicated graph structures. Subsequently we consider some slightly more general graphs in Section 5, where there is some additional deterministic structure known a priori about the presence or absence of certain edges. The question of whether S and D are connected can be resolved if one can either display an S-D path, or an S-D cut. However, we do not know a priori whether the graph is connected or not, thereby making it difficult to know what to do – try to find a path or try to find a cut. The optimal policy has to be dynamic, based on the presence or absence of previously tested edges. This inherent tension in the problem of determining connectivity makes the problem somewhat challenging. Another difficulty is that we do not know the optimal solution for general graphs where some edges are known to exist, some known not to exist, and others existing i.i.d. with probability p, which is what one generally has after some steps of testing. Thus the proof of optimality is not based on dynamic programming-like arguments. Yet another aspect of interest is that the random graph exhibits a phase transition depending on the value of p. For p > (1+ϵ) ln n n , the entire graph itself, and not just the particular S-D pair, is connected with high probability for large n, while for p < (1−ϵ) ln n n , it is disconnected with high probability. Thus, one may possibly expect that the optimal strategy will depend on the value of p. Very interestingly, however, the optimal policy does not depend on p or n at all! The main result is the optimality of the following simple testing strategy: At each stage, form the condensation multigraph by contracting each known connected component to a single node. It is a multigraph since the edges between nodes are inherited by the components to which they are contracted. In the set of edges that lie both on a shortest S-D path as well on the minimal S-D cut, test an edge that leads to the most opportunities for connectivity. The policy also has minimal complexity: it requires no more than 30log2 n operations to determine which edge to test next. The policy is illustrated for a four-node ER graph in Figure 1. The proof of optimality of the policy follows from the proofs of optimality of the following three rules, which, when combined together give rise to the policy equivalently described above in terms of the condensation multigraph: Rule 1: The testing starts with a tactic that can lead to early termination by finding an S-D path: it first tests the edges, i.e., one-hop paths, connecting the component containing S and the component containing D. Rule 2: If there are no such edges, then it switches to checking for S-D disconnectivity by testing the edges on the S-D cut that contains the minimum number of untested edges. Rule 3: Among the edges in that cut, test the edge that leads to most opportunities for connection. These three rules optimally resolve the tension between checking for paths and cuts. Based on the policy designed under ER graphs, we further extend it to certain types of more generalized graph structures consisting of additional deterministic structure. The rest of this paper is organized as follows. We give literature review in Section 2 and introduce models and definitions in Section 3. In Section 4, we describe the policy for determining S-D connectivity in an ER random graph. The optimality of the policy is proved in Section 5. In Section 6, we discuss the extension of the result to other graph models, and give concluding remarks and future directions in Section 7. Details of the proofs, when not provided in-line, are gathered in the Appendix. II. RELATED WORKS We note that there is no other work, to the best of our knowledge, which addresses connectivity between a designated source and destination in random graphs. Rather, as noted earlier, it is asymptotic connectivity of the whole graph that has been intensively studied. Inspired by Erdos and Renyi [2], [3], there have been many works [4]- [10] dedicated to investigating connectivity of the entire network when the number of nodes is sufficiently large. There has also been much work motivated by wireless networks where connectivity is determined by distance. Gilbert [11] initiated the study of random graphs when nodes are randomly distributed on a plane. Suppose that two nodes have an edge between them if and only they are less than
⊙ neighbors.The condition of full connectivity was relaxed by Four-node E-R Dousse et.al.[18],which analyzed the scenario where the 包 Random Graph sink is connected(in a multihop fashion)to a set of nodes that O ⊙ span the entire network.The impact of node degree,density, Step1. ⊙ ① network dimension as well as boundary effect on connectivity ⊙ ⊙ is also investigated [14],[19]-[22].Due to the emergence of ad hoc networks,mesh networks and sensor networks,there ⊙ ⊙ has also been research interest in analysis of k-connectivity Step2. ① ① [23]-[30],where a network is k-connected if it contains at @ ® least k independent paths between any pair of distinct nodes. @ Step 3. ① ① III.MODEL AND DEFINITIONS ② ② ② @ Consider an ER Random Graph with n nodes.An edge oo o③ ⊙ exists between any pair of nodes with equal probability p, ① 个 independently of others.At each step,one can test a potential Step4. ⊙ @ ⊙ ② 包 (i.e.,untested)edge of the graph to see if it indeed exists.Two specific nodes are identified,labeled S and D.Our objective O ① D is to construct a sequential testing strategy terminating in the Step5. ①y minimal expected number of steps,to determine if S and D 2 ⊙ ⊙ o are connected or not.Termination occurs with Connectivity,when it has verified the existence of all D edges on some S-D path,or Step6. Disconnectivity,when it has verified the absence of all (2 ⊙ 2 ② edges on some S-D cut. We employ three terms,known edge,known non-edge and Fig.1.Illustration of the evolution of the optimal policy on a four node ER random graph.The line in red shows the edge selected potential edge (also sometimes called an "untested edge.").A for testing at each step.A full line indicates that the edge was known edge is one that has already been tested and found to found to exist after testing,while a dotted line indicates that the exist.Similarly,a known non-edge is one that has already been edge was found to not exist after testing.The policy terminates with tested but found to not exist.Finally,a potential edge is an the conclusion of either S-D connectivity (finding an S-D edge that is yet to be tested,which upon testing may turn out path)or S-D disconnectivity (finding an S-D cut).Possible terminations at Steps 1.3.4.5 and 6 are marked by an enclosing to exist or not exist.In Table I.we list other definitions used. square.For example,if edge 1D is found to exist at Step 3,then termination occurs by concluding connectivity as indicated by the square enclosing the graph on the left,while if it is found to not IV.POLICY FOR CHECKING S-D CONNECTIVITY IN ER exist,then the policy continues by testing edge 2D at Step 4. RANDOM GRAPHS We now present a policy*={rt,πt,.,π},which we call the alternating policy,for checking S-D connectivity a distance r from each other.Such an r is called the "range" in an ER random graph.We label the nodes other than the in a wireless network.Penrose [12]and Gupta and Kumar designated S and D as 1,2,...,n-2.Each maps Gt [13]have determined value of r as a function of n,for (defined in Table 1)to one of the remaining potential edges. values below which the network is connected,and for values It specifies which potential edge should be tested at time t,as above which the network is disconnected,with probability a function of what is known,denoted by 9t.about the random approaching one as noo.Xue and Kumar [15],and, graph at that time.Note that N =n(n-1)/2 is the upper subsequently,Balister,Bollobas,Sarkar and Walters [16]and bound on the number of edges in the graph,and thus the step Xue and Kumar [17],further considered the number of nearest by which the policy must necessarily have terminated. neighbors that nodes need to connect to,in order to ensure The policy has been defined in a brief manner in terms the network connectivity.Specifically.[15]proved that if each of the condensation multigraph in Section 1.Now we define node is connected to less than 0.074log n nearest neighbors, it in a detailed manner in terms of three rules more suitable then the network is asymptotically disconnected with prob- for illustration and proof. ability one as n increases,while if each node is connected to more than 5.1774log n nearest neighbors then the network is asymptotically connected with probability approaching one A.The Alternating Policy n* as n increases.The lower and upper bounds were further Rule 1:The policy tests a potential edge between Cs and improved to 0.3043logn and 0.5139logn,respectively [16].Cp as long as there exists one such direct,one-hop,potential The exact threshold function for 0-coverage [17]is found for edge between them.Clearly,if such a direct edge is found to wireless networks modeled as n points uniformly distributed exist,then the policy terminates with the finding that S and in a unit square,with every node connecting to its o nearest D are connected
3 S D 1 2 S D 1 2 S D 1 2 S D 1 2 S D 1 2 S D 1 2 S D 1 2 S D 1 2 S D 1 2 S D 1 2 S D 1 2 S D 1 2 S D 1 2 S D 1 2 S D 1 2 S D 1 2 S D 1 2 S D 1 2 S D 1 2 S D 1 2 S D 1 2 Fig. 1. Illustration of the evolution of the optimal policy on a fournode ER random graph. The line in red shows the edge selected for testing at each step. A full line indicates that the edge was found to exist after testing, while a dotted line indicates that the edge was found to not exist after testing. The policy terminates with the conclusion of either S − D connectivity (finding an S − D path) or S − D disconnectivity (finding an S − D cut). Possible terminations at Steps 1, 3, 4, 5 and 6 are marked by an enclosing square. For example, if edge 1D is found to exist at Step 3, then termination occurs by concluding connectivity as indicated by the square enclosing the graph on the left, while if it is found to not exist, then the policy continues by testing edge 2D at Step 4. a distance r from each other. Such an r is called the “range” in a wireless network. Penrose [12] and Gupta and Kumar [13] have determined value of r as a function of n, for values below which the network is connected, and for values above which the network is disconnected, with probability approaching one as n → ∞. Xue and Kumar [15], and, subsequently, Balister, Bollobas, Sarkar and Walters [16] and Xue and Kumar [17], further considered the number of nearest neighbors that nodes need to connect to, in order to ensure the network connectivity. Specifically, [15] proved that if each node is connected to less than 0.074log n nearest neighbors, then the network is asymptotically disconnected with probability one as n increases, while if each node is connected to more than 5.1774log n nearest neighbors then the network is asymptotically connected with probability approaching one as n increases. The lower and upper bounds were further improved to 0.3043log n and 0.5139log n, respectively [16]. The exact threshold function for θ-coverage [17] is found for wireless networks modeled as n points uniformly distributed in a unit square, with every node connecting to its ϕn nearest neighbors. The condition of full connectivity was relaxed by Dousse et. al. [18], which analyzed the scenario where the sink is connected (in a multihop fashion) to a set of nodes that span the entire network. The impact of node degree, density, network dimension as well as boundary effect on connectivity is also investigated [14], [19]- [22]. Due to the emergence of ad hoc networks, mesh networks and sensor networks, there has also been research interest in analysis of k-connectivity [23]- [30], where a network is k-connected if it contains at least k independent paths between any pair of distinct nodes. III. MODEL AND DEFINITIONS Consider an ER Random Graph with n nodes. An edge exists between any pair of nodes with equal probability p, independently of others. At each step, one can test a potential (i.e., untested) edge of the graph to see if it indeed exists. Two specific nodes are identified, labeled S and D. Our objective is to construct a sequential testing strategy terminating in the minimal expected number of steps, to determine if S and D are connected or not. Termination occurs with • Connectivity, when it has verified the existence of all edges on some S-D path, or • Disconnectivity, when it has verified the absence of all edges on some S-D cut. We employ three terms, known edge, known non-edge and potential edge (also sometimes called an “untested edge.”). A known edge is one that has already been tested and found to exist. Similarly, a known non-edge is one that has already been tested but found to not exist. Finally, a potential edge is an edge that is yet to be tested, which upon testing may turn out to exist or not exist. In Table I, we list other definitions used. IV. POLICY FOR CHECKING S-D CONNECTIVITY IN ER RANDOM GRAPHS We now present a policy π ∗ = {π ∗ 0 , π∗ 1 , . . . , π∗ N }, which we call the alternating policy, for checking S − D connectivity in an ER random graph. We label the nodes other than the designated S and D as 1, 2, . . . , n − 2. Each π ∗ t maps Gt (defined in Table 1) to one of the remaining potential edges. It specifies which potential edge should be tested at time t, as a function of what is known, denoted by Gt, about the random graph at that time. Note that N = n(n − 1)/2 is the upper bound on the number of edges in the graph, and thus the step by which the policy must necessarily have terminated. The policy π ∗ has been defined in a brief manner in terms of the condensation multigraph in Section 1. Now we define it in a detailed manner in terms of three rules more suitable for illustration and proof. A. The Alternating Policy π ∗ Rule 1: The policy tests a potential edge between CS and CD as long as there exists one such direct, one-hop, potential edge between them. Clearly, if such a direct edge is found to exist, then the policy terminates with the finding that S and D are connected
TABLE I NOTATIONS For convenience of definition of Rule 3 below,we also list the set of components {C1,C2:...C}(0<r <n-2)to Notation Definition which the edges belonging to this list C are connecting.Let us et The edge tested at step t. call this set of components as C,i.e.,C:={C1,C2,...,Cr}. The graph state known at step t.It is Figure 2(b)illustrates this set C of components too. a list of (i)those potential edges that have been tested and found to exist. Rule 3:In Rule 3 we further sharpen Rule 2 by specifying i.e,“known edges,”(ii)those potential which particular edge in C should be tested. edges that were tested and found to not Rule 2 when followed will lead to the following structural exist,called“non-edges,.”and(iii)the property,as we prove in Lemma 2.All edges in C are either remaining potential edges that have not between Cs and UCi,or all edges are between CD and yet been tested.G is abbreviated to g if there is no confusion (similarly for UC.(Certainly this is true at the outset for the ER graph, the notation below). and if an edge is chosen from C,as in Rule 2,then it retains The connected component containing that property).Suppose all edges in C connect to Cs,i.e., the source (or destination,or the i-th have one endpoint in Cs.Then,for each Ci,let ni be the component not containing the source or number of direct potential edges between Ci and CD.Sort Cs.t (or CD.t.Or Ci.) destination),in the graph comprised of the known edges at step t.Abbreviated the components in C based on ni in decreasing order. to Cs.Cp and Ci when there is no Suppose that n is the largest.Test any edge in C that confusion. connects Cs to C1.This is illustrated in Figure 2(c). A minimum cut for a graph state g, Algorithm 1 illustrates an algorithmic specification of the defined as a set which contains the proposed Alternating Policy.We note that due to the neat graph minimum number of potential edges at time t.which if they were found to not structure (all components other than Cs and Cp remain as exist,would lead to the conclusion that singletons)maintained under the edge testing process of the Cs and Cp are not connected by a policy,the algorithm can be efficiently implementable,with a path.Note that a minimum cut may be per-step computational complexity of 30log2 n to determine non-unique. which edge to be tested next,as we will disclose and prove in Lemma 5 in Section V. Rule 2:If there are no potential edges connecting Cs and Cp as specified in Rule 1,then choose an edge from the list C defined as follows. First list all the paths,comprised of known or potential edges,connecting Cs to Cp with the minimum number of potential edges.One such path is illustrated in Figure 2(a).Such a path must necessarily have potential edges only between connected components.If the path traverses only one component,say C1,on its path from Cs to Cp,then it must have exactly two potential edges on it,one connecting Cs to 月之几23之几32…≥n C1,and another connecting Ci to CD,as shown in Figure 2(a). (a (b) (c) The remaining portion of the path must only consist of known A known edge (inside Cs,CD,Ci,ete) A known non-edge edges within connected components Cs,C1 or Cp.The same A potential edge A potential edge along rule holds on the remaining potential shortest paths,each one shortest path traversing through only one component,say Ci(2<i<r), A potential edge that belongs to a minimum cut as well as a on its path from Cs to Cp. shortest path. Fix a minimum cut for the know state of the graph Fig.2.The illustration of Rules 2 and 3. g at that time.Figure 2(a)shows two cuts,one separating Cs from C1UC2U...Cr,and another separating CD from C1UC2U...Cr.(Suppose that,as in the case illustrated, the former is a minimum cut,and the latter contains more V.THE PROOF OF OPTIMALITY OF THE ALTERNATING edges.In the sequel we will show that shortest paths are of POLICY T* length at most two,and minimum cuts will be precisely of We now commence the proof of optimality of the Alter- this form,as long as the policy keeps testing the edges in the nating Policy Note that *is composed by three rules prescribed way.Obviously,it is true in the outset of an ER prescribed in Section III.A.Therefore,the problem of proving graph,where every component is a singleton with the shortest the optimality of the policy can be converted into that of path connecting S to D being of length no more than 2.) proving optimality of each of the three rules.In order to solve Among all the potential edges that are on the aforemen- this,we further partition the proof into three parts,the proofs tioned shortest paths,list only those potential edges that are of Rules 1,2,and 3,respectively.We will consider the graph also in the chosen minimum set..Define this as the list shown in Figure 2(c);the general case proceeds similarly.The C.Such a list is shown in Figure 2(b). edges meeting Rules 2 and 3 are the potential edges between
4 TABLE I NOTATIONS Notation Definition et The edge tested at step t. Gt The graph state known at step t. It is a list of (i) those potential edges that have been tested and found to exist, i.e., “known edges,” (ii) those potential edges that were tested and found to not exist, called “non-edges,” and (iii) the remaining potential edges that have not yet been tested. Gt is abbreviated to G if there is no confusion (similarly for the notation below). CS,t (or CD,t, or Ci,t) The connected component containing the source (or destination, or the i-th component not containing the source or destination), in the graph comprised of the known edges at step t. Abbreviated to CS, CD and Ci when there is no confusion. MG A minimum cut for a graph state G, defined as a set which contains the minimum number of potential edges at time t, which if they were found to not exist, would lead to the conclusion that CS and CD are not connected by a path. Note that a minimum cut may be non-unique. Rule 2: If there are no potential edges connecting CS and CD as specified in Rule 1, then choose an edge from the list L defined as follows. First list all the paths, comprised of known or potential edges, connecting CS to CD with the minimum number of potential edges. One such path is illustrated in Figure 2(a). Such a path must necessarily have potential edges only between connected components. If the path traverses only one component, say C1, on its path from CS to CD, then it must have exactly two potential edges on it, one connecting CS to C1, and another connecting C1 to CD, as shown in Figure 2(a). The remaining portion of the path must only consist of known edges within connected components CS, C1 or CD. The same rule holds on the remaining potential shortest paths, each traversing through only one component, say Ci (2 ≤ i ≤ r), on its path from CS to CD. Fix a minimum cut MG for the know state of the graph G at that time. Figure 2(a) shows two cuts, one separating CS from C1 ∪ C2 ∪ . . . Cr, and another separating CD from C1 ∪ C2 ∪ . . . Cr. (Suppose that, as in the case illustrated, the former is a minimum cut, and the latter contains more edges. In the sequel we will show that shortest paths are of length at most two, and minimum cuts will be precisely of this form, as long as the policy keeps testing the edges in the prescribed way. Obviously, it is true in the outset of an ER graph, where every component is a singleton with the shortest path connecting S to D being of length no more than 2.) Among all the potential edges that are on the aforementioned shortest paths, list only those potential edges that are also in the chosen minimum set MG. Define this as the list L. Such a list is shown in Figure 2(b). For convenience of definition of Rule 3 below, we also list the set of components {C1, C2, . . . , Cr} (0 ≤ r ≤ n − 2) to which the edges belonging to this list L are connecting. Let us call this set of components as C, i.e., C := {C1, C2, ..., Cr}. Figure 2(b) illustrates this set C of components too. Rule 3: In Rule 3 we further sharpen Rule 2 by specifying which particular edge in L should be tested. Rule 2 when followed will lead to the following structural property, as we prove in Lemma 2. All edges in L are either between CS and U r i=1Ci , or all edges are between CD and U r i=1Ci . (Certainly this is true at the outset for the ER graph, and if an edge is chosen from L, as in Rule 2, then it retains that property). Suppose all edges in L connect to CS, i.e., have one endpoint in CS. Then, for each Ci , let ni be the number of direct potential edges between Ci and CD. Sort the components in C based on ni in decreasing order. Suppose that n1 is the largest. Test any edge in L that connects CS to C1. This is illustrated in Figure 2(c). Algorithm 1 illustrates an algorithmic specification of the proposed Alternating Policy. We note that due to the neat graph structure (all components other than CS and CD remain as singletons) maintained under the edge testing process of the policy, the algorithm can be efficiently implementable, with a per-step computational complexity of 30 log2 n to determine which edge to be tested next, as we will disclose and prove in Lemma 5 in Section V. !"#$%&$'()!%*+%! !,&#-&!&#&.%*+%! & &U &6 &' & & &U &6 &' & & &U / n n r n ! " r n n n n !" #" $" &6 &' & !"#$%&$'()!%*+%!$,($!-%)#&+.!$#!(!/'&'/0/!10$!(.!2%))!(.!(! .,#3$%.$!"($,4 !"#$%&$'()!%*+%!()#&+! #&%!.,#3$%.$!"($, !5&!%*+%! 6'&.'*% 789!7:ˈ7'9!%$1; Fig. 2. The illustration of Rules 2 and 3. V. THE PROOF OF OPTIMALITY OF THE ALTERNATING POLICY π ∗ We now commence the proof of optimality of the Alternating Policy π ∗ . Note that π ∗ is composed by three rules prescribed in Section III. A. Therefore, the problem of proving the optimality of the policy can be converted into that of proving optimality of each of the three rules. In order to solve this, we further partition the proof into three parts, the proofs of Rules 1, 2, and 3, respectively. We will consider the graph shown in Figure 2(c); the general case proceeds similarly. The edges meeting Rules 2 and 3 are the potential edges between
Algorithm 1 The Optimal Testing Algorithm in particular those that traverse two classes,say Ci and Cj. Input:ER graph G(V,E),source S and destination D,the rest of will need to be tested,in addition to those between Cs and the the nodes labeled as 1,2,... components in Ci,and those between Cp and the components Output:The optimal testing policy 1:Initialize:Connected component Cs :={S).Cp:={D), in Ci.Therefore,extra steps are wasted on testing the edges C={1},C2={2},. between components.However,those edges will never need 2:Repeat until the S-D connectivity is determined. to be tested when a cut is chosen to make all the components 3:if a potential edge e exists between Cs and Cp then lie within one class.Also notice that according to Rule 2,a 4: test e. shorter path length of two can be selected if the cut is chosen 5:else 6: Ns :the number of potential edges between Cs and V\Cs. to make all the components to lie within one class while the 7: Np :the number of potential edges between Cp and path generated via components in two classes are of length V\CD. three. 8: ifNs≤Vp then Therefore,to prove Rule 2 in our scenario we only need to 9 i'=arg maxifthe number of potential edges between prove Lemma 3 stated as follows. CD and Ci. 10: Test an edge e between Cs and Ci-. Lemma 3:When there are no direct edges between Cs 11: if e exists then and Cp,listing all the potential shortest paths and sampling 12: Cs :Cs UCi the edges in the minimum set on them will lead to smaller 13: Relabel the connected components C1,C2,... expected cost than sampling any other edge first. 14 else 15: i"=arg max;fthe number of potential edges between Proof:See Appendix B. Cs and Ci. 16: Test an edge e between Cp and Ci 17: if e exists then B.Proof of Optimality of Rule 3 18: CD:=CD UCi+. The main idea of Rule 3 is that among all the components 19: Relabel the connected components C1,C2,... except Cs and Cp from the set [C1,...,Cr}at step t,the expected cost will be the smallest if we first pick the one with the largest number of direct edges to the other side. Cs and C1.Note that compared to the other Ci's,Ci has the Therefore,proving Rule 3 is equivalent to proving that in greatest chance of subsequently having an edge connected to choosing between any two components from [C1,...,Cr}at CD,if an edge is found between C1 and Cs.The main idea in step t,the expected cost will be smaller if we first pick the each proof is to introduce two policies,one always following component with a larger number of direct edges to the other the specified rule while the other one violates that rule once side.Without loss of generality,let us suppose that there are but obeys to the rule on the very next step.The optimality of only two components,C1 and C2.Based on Lemma 2,all the rule is proved by adopting two major techniques,i.e.,a the edges in the minimum cut.are always on the same carefully chosen stochastic coupling in several places,and the side.Let k and k21 be the number of edges fromg.that proof is completed by an induction. connect to C and C2,respectively,as shown in Figure 3(a). Similarly,let k12 and k22 be the number of direct edges Ci and C2 have to Cp.Assuming that k12 k22 in Figure 3,we A.Proof of Optimality of Rules I and 2 have the following lemma regarding Rule 3. The optimality of Rule 1 follows from the following: Lemma 4:Among all the potential edges that C and C2 Lemma I:Suppose that there are direct potential edges connect to in,it incurs smaller expected cost to first test between Cs and Cp.Then for any policy A that tests an edge the ones that Ci is connected to. other than such a direct potential edge,there is a policy A that Proof:We prove this by induction on the number of does test a direct potential edge and has a lower expected cost potential edges,in the graph of the type that results at each than A. step from following Rules 1 and 2.Based on Lemma 2,such Proof:See Appendix A. ■ type of graph contains singleton components with the edges Next,assuming that Rule 1 is always followed,we prove in the minimum cut all being between Cs and singletons,or the optimality of Rule 2.The following lemma establishes a all being between Co and singletons.Clearly,Rules 1,2 and preliminary property. 3 are all true for the graph with 2 potential edges.Assume Lemma 2:When the policy follows both Rules 1 and 2,all that Rule 3 is true for graphs with k potential edges.We now the edges in the minimum cut at any step will be between consider the case where the graph has k+1 edges.Similar U=Ci and Cs,or they will all be between U=Ci and Cp.to the proof of Rule 1,we introduce two policies A and A, Proof:A minimum set partitions the connected compo-both of which always follow Rules 1 and 2.A first tests an nents into two classes,with one class containing Cs and the edge connecting Cs to C1,as prescribed by Rule 3.If it finds other class containing Cp.An equivalent claim of the lemma an edge,then by Rule 1 it subsequently tests edges from Cl is that at any step t,as long as the policy follows Rules to CD.If not,according to Rule 2,the same cut continues to 1 and 2,all the components other than Cs or Cp will lie be minimum,and by induction,since the number of potential within one class.Suppose that at a certain step not all the edges is reduced by one after the first test,it continues to components lie in the same class.Then,in order to determine test the remaining edges between C and Cs.In contrast,the the S-D disconnectivity,the edges between components, policy A violates Rule 3 on the first test by testing an edge
5 Algorithm 1 The Optimal Testing Algorithm Input: ER graph G(V, E), source S and destination D, the rest of the nodes labeled as 1, 2, . . . Output: The optimal testing policy 1: Initialize: Connected component CS := {S}, CD := {D}, C1 := {1}, C2 := {2}, . . . 2: Repeat until the S-D connectivity is determined. 3: if a potential edge e exists between CS and CD then 4: test e. 5: else 6: NS := the number of potential edges between CS and V \CS. 7: ND := the number of potential edges between CD and V \CD. 8: if NS ≤ ND then 9: i ∗ = arg maxi{the number of potential edges between CD and Ci}. 10: Test an edge e between CS and Ci∗ . 11: if e exists then 12: CS := CS ∪ Ci∗ . 13: Relabel the connected components C1, C2, . . . 14: else 15: i ∗ = arg maxi{the number of potential edges between CS and Ci}. 16: Test an edge e between CD and Ci∗ . 17: if e exists then 18: CD := CD ∪ Ci∗ . 19: Relabel the connected components C1, C2, . . . CS and C1. Note that compared to the other Ci’s, C1 has the greatest chance of subsequently having an edge connected to CD, if an edge is found between C1 and CS. The main idea in each proof is to introduce two policies, one always following the specified rule while the other one violates that rule once but obeys to the rule on the very next step. The optimality of the rule is proved by adopting two major techniques, i.e., a carefully chosen stochastic coupling in several places, and the proof is completed by an induction. A. Proof of Optimality of Rules 1 and 2 The optimality of Rule 1 follows from the following: Lemma 1: Suppose that there are direct potential edges between CS and CD. Then for any policy A that tests an edge other than such a direct potential edge, there is a policy Ae that does test a direct potential edge and has a lower expected cost than A. Proof: See Appendix A. Next, assuming that Rule 1 is always followed, we prove the optimality of Rule 2. The following lemma establishes a preliminary property. Lemma 2: When the policy follows both Rules 1 and 2, all the edges in the minimum cut at any step will be between ∪ r i=1Ci and CS, or they will all be between ∪ r i=1Ci and CD. Proof: A minimum set partitions the connected components into two classes, with one class containing CS and the other class containing CD. An equivalent claim of the lemma is that at any step t, as long as the policy follows Rules 1 and 2, all the components other than CS or CD will lie within one class. Suppose that at a certain step not all the components lie in the same class. Then, in order to determine the S − D disconnectivity, the edges between components, in particular those that traverse two classes, say Ci and Cj , will need to be tested, in addition to those between CS and the components in Ci , and those between CD and the components in Cj . Therefore, extra steps are wasted on testing the edges between components. However, those edges will never need to be tested when a cut is chosen to make all the components lie within one class. Also notice that according to Rule 2, a shorter path length of two can be selected if the cut is chosen to make all the components to lie within one class while the path generated via components in two classes are of length three. Therefore, to prove Rule 2 in our scenario we only need to prove Lemma 3 stated as follows. Lemma 3: When there are no direct edges between CS and CD, listing all the potential shortest paths and sampling the edges in the minimum set on them will lead to smaller expected cost than sampling any other edge first. Proof: See Appendix B. B. Proof of Optimality of Rule 3 The main idea of Rule 3 is that among all the components except CS and CD from the set {C1, . . . , Cr} at step t, the expected cost will be the smallest if we first pick the one with the largest number of direct edges to the other side. Therefore, proving Rule 3 is equivalent to proving that in choosing between any two components from {C1, . . . , Cr} at step t, the expected cost will be smaller if we first pick the component with a larger number of direct edges to the other side. Without loss of generality, let us suppose that there are only two components, C1 and C2. Based on Lemma 2, all the edges in the minimum cut MGt are always on the same side. Let k11 and k21 be the number of edges from MGt that connect to C1 and C2, respectively, as shown in Figure 3(a). Similarly, let k12 and k22 be the number of direct edges C1 and C2 have to CD. Assuming that k12 ≥ k22 in Figure 3, we have the following lemma regarding Rule 3. Lemma 4: Among all the potential edges that C1 and C2 connect to in MGt , it incurs smaller expected cost to first test the ones that C1 is connected to. Proof: We prove this by induction on the number of potential edges, in the graph of the type that results at each step from following Rules 1 and 2. Based on Lemma 2, such type of graph contains singleton components with the edges in the minimum cut all being between CS and singletons, or all being between CD and singletons. Clearly, Rules 1, 2 and 3 are all true for the graph with 2 potential edges. Assume that Rule 3 is true for graphs with k potential edges. We now consider the case where the graph has k + 1 edges. Similar to the proof of Rule 1, we introduce two policies Ae and A, both of which always follow Rules 1 and 2. Ae first tests an edge connecting CS to C1, as prescribed by Rule 3. If it finds an edge, then by Rule 1 it subsequently tests edges from C1 to CD. If not, according to Rule 2, the same cut continues to be minimum, and by induction, since the number of potential edges is reduced by one after the first test, it continues to test the remaining edges between C1 and CS. In contrast, the policy A violates Rule 3 on the first test by testing an edge
D 0 (a) D D D @ D (b) k D D D k D s D s D D D D 尚 ② c (d9 a known non-edge A D D D D -apotential edge in a potential edge in B ------a potential edge in a potential edge in -a potential edge ing a potential edge in入 D D s D D a potential edge in y …a potential edge in n K2I四 K2© la=B=n=10=E=k22 161=k2-k2 l=k-1lyl=k-1 (b)(E,E)(c)EE)(d)(E,E9 (e)(Ec,C) The potential edges in the minimum cut at a certain step in Fig.4.The stochastic coupling of the edges under policy A(shown the evolution of the system in (a)(c))and policy A (shown in (b)(d)). A known edge at a certain step in the evolution of the system A known non-edge at a certain step in the evolution of the system Fig.3.The illustration of the four cases (E).(e,ce).(Ee)and(eE). 6,入,and n under policy A are the same as those in 6,入,y and n under policy A,as shown in Figures 4(c)and(d).The edges with different labels are defined in Table II.Since testing connecting to C2,and then follows the optimal policy for the each potential edge has two possible outcomes depending graph with k edges.We have four possible cases,which are on whether it exists or not,the corresponding sample paths listed as follows: generated under A and A are illustrated in Figures 5(a)and E:A finds an edge between Cs and C1,and subsequently (b),respectively.The nodes in the figures represent the tested no edges between C and Cp,and then the minimum cut turns edges,while the outcome of each tested edge is indicated by a out to be on the same side as Cs;or A finds an edge between label,1 if it exists,or 0 otherwise.Let PD.i (i=1,2,...)be Cs and Ci and subsequently an edge between Ci and Cp;or the ith path where Cs and Cp are found to be disconnected A finds no edges between Cs and Ci. under A (A),and let Pc.i be (i=1,2,...)the ith path where EcA:finds an edge between Cs and C1,and subsequently Cs and Cp are found to be connected under A(A).Note that no edges between Ci and Cp,and the minimum cut therefore the paths with the same label have the same probability.As an subsequently switches to the side of Cp. example,consider paths labeled as Pp.4 in both Figures 5(a) E:A finds an edge between Cs and C2,and subsequently and (b).We can see that the outcome 0 from node a in Figure no edges between C2 and Cp,and then the minimum cut turns 5(a)has the same probability as the outcome 0 from nodes B out to be on the same side as Cs:or A finds an edge between in Figure 5(b).The outcome 1 from node B in Figure 5(a)has Cs and C2 and subsequently an edge between C2 and Cp;or the same probability as the outcome I from node a in Figure A finds no edges between Cs and C2. 5(b).Nodes a and B have the same number of edges,and the Ec:A finds an edge between Cs and C2.and subsequently outcomes of the edges from nodes 6,6,and A are the same in no edges between C2 and Cp,and the minimum cut therefore both figures.Another example is the paths labeled with Pc.2 subsequently switches to the side of Cp. in both figures.It can be seen that the outcome 1 from node The four cases,i.e..(.E).(e,Ee),(.ge)and (E)in Figure 5(a)has the same probability as the outcome 1 from are illustrated in Figures 3(b),(c),(d)and (e),respectively.node s in Figure 5(b).Nodes and s have the same number of Consider Figure 3(b),which is the case (E,E).We will prove edges,and the outcomes of the edges emanating from nodes using a stochastic coupling argument that testing Ci first leads a and A are the same in both figures.Similar relationships to smaller expected cost.We couple the edges labeled with hold for the remaining paths with the same labels in Figures the same symbol (e.g.,B)tested under A and A,as shown in 4(a)and(b),and we therefore omit their explanations.As for Figures 4(a).(b),(c)and(d).Note in particular that,edges in the paths PD.1 and PD.2 in Figure 5(a)and paths Pp.1 and a and B under policy A are switched under policy A,and PD.2 in Figure 5(b),it is trivially true that any terminating edges in 6 and s under policy A are also switched under time belonging to the range [k12+2,1+k11+k12]under policy A.as shown in Figures 4(a)and(b).The edges in the paths PD.1 and PD.2 can also be found under the paths
6 The potential edges in the minimum cut at a certain step in the evolution of the system A known edge at a certain step in the evolution of the system A known non-edge at a certain step in the evolution of the system S D 1 2 S D 1 2 S D 1 2 S D 1 2 11 k 11 k 12 k 12 k 12 k 12 k 21 k 21 k 22 k 22 k 22 k 22 k S D 1 2 S D 1 2 S D 1 2 S D 1 2 11 k 11 k 12 k 12 k 12 k 12 k 21 k 21 k 22 k 22 k 22 k 22 k S D 1 2 11 k 12 k 21 k 22 k S D 1 S D 1 11 k S D 2 S D 2 21 k S D 1 S D 1 11 k S D 2 S D 2 21 k S D 1 S D 1 11 k S D 2 S D 2 S D 2 S D 2 S D 1 S D 1 11 k Fig. 3. The illustration of the four cases (Ee,E), (Eec ,E c ), (Ee,E c ) and (Eec ,E). connecting to C2, and then follows the optimal policy for the graph with k edges. We have four possible cases, which are listed as follows: Ee: Ae finds an edge between CS and C1, and subsequently no edges between C1 and CD, and then the minimum cut turns out to be on the same side as CS; or Ae finds an edge between CS and C1 and subsequently an edge between C1 and CD; or Ae finds no edges between CS and C1. Eec Ae: finds an edge between CS and C1, and subsequently no edges between C1 and CD, and the minimum cut therefore subsequently switches to the side of CD. E: A finds an edge between CS and C2, and subsequently no edges between C2 and CD, and then the minimum cut turns out to be on the same side as CS; or A finds an edge between CS and C2 and subsequently an edge between C2 and CD; or A finds no edges between CS and C2. E c : A finds an edge between CS and C2, and subsequently no edges between C2 and CD, and the minimum cut therefore subsequently switches to the side of CD. The four cases, i.e., (Ee,E), (Eec ,E c ), (Ee,E c ) and (Eec ,E) are illustrated in Figures 3(b), (c), (d) and (e), respectively. Consider Figure 3(b), which is the case (Ee,E). We will prove using a stochastic coupling argument that testing C1 first leads to smaller expected cost. We couple the edges labeled with the same symbol (e.g., β) tested under Ae and A, as shown in Figures 4 (a), (b), (c) and (d). Note in particular that, edges in α and β under policy Ae are switched under policy A, and edges in θ and ε under policy Ae are also switched under policy A, as shown in Figures 4 (a) and (b). The edges in 1 2 ! " # # #1 22 ! " " k 12 22 ! k k " 11 ! k "1 21 ! " k 1 (a) (b) 1 2 2 (c) (d) 2 a known non-edge a potential edge in ! " # " ! # $ ! % " ' & ' # $ " & % ! # ! % " $ ' & a potential edge in a potential edge in a potential edge in a potential edge in a potential edge in a potential edge in a potential edge in Fig. 4. The stochastic coupling of the edges under policy Ae (shown in (a)(c)) and policy A (shown in (b)(d)). δ, λ, γ and η under policy Ae are the same as those in δ, λ, γ and η under policy A, as shown in Figures 4 (c) and (d). The edges with different labels are defined in Table II. Since testing each potential edge has two possible outcomes depending on whether it exists or not, the corresponding sample paths generated under Ae and A are illustrated in Figures 5 (a) and (b), respectively. The nodes in the figures represent the tested edges, while the outcome of each tested edge is indicated by a label, 1 if it exists, or 0 otherwise. Let PD,i (i = 1, 2, . . .) be the ith path where CS and CD are found to be disconnected under Ae (A), and let PC,i be (i = 1, 2, . . .) the ith path where CS and CD are found to be connected under Ae (A). Note that the paths with the same label have the same probability. As an example, consider paths labeled as PD,4 in both Figures 5(a) and (b). We can see that the outcome 0 from node α in Figure 5(a) has the same probability as the outcome 0 from nodes β in Figure 5(b). The outcome 1 from node β in Figure 5(a) has the same probability as the outcome 1 from node α in Figure 5(b). Nodes α and β have the same number of edges, and the outcomes of the edges from nodes θ, δ, ε and λ are the same in both figures. Another example is the paths labeled with PC,2 in both figures. It can be seen that the outcome 1 from node θ in Figure 5(a) has the same probability as the outcome 1 from node ε in Figure 5(b). Nodes θ and ε have the same number of edges, and the outcomes of the edges emanating from nodes α and λ are the same in both figures. Similar relationships hold for the remaining paths with the same labels in Figures 4 (a) and (b), and we therefore omit their explanations. As for the paths PeD,1 and PeD,2 in Figure 5(a) and paths PbD,1 and PbD,2 in Figure 5(b), it is trivially true that any terminating time belonging to the range [k12 + 2, 1 + k11 + k12] under the paths PeD,1 and PeD,2 can also be found under the paths
P.and PD.2.Therefore,the only paths that have different PD.I probabilities are those shown in the form of bold lines under both A and A (specifically,path abe6 under A,and paths 0 0 0-1 aθeB6,a0eBeλ5 and aλBs6 under A.).Obviously,a0e dominates the paths a036,aoBe6 and aABe6,which indicates that testing C1 first will lead to smaller expected cost.This completes the proof of the case in Figure 3(b). 0 a 0 TABLE II DEFINITIONS OF THE EDGES FOR COUPLING A AND A Notation Definition a The first potential edge that policy A(A)tests. a B The first potential edge that policy A(A)tests between Cs and C2(C1). The first k22 potential edges that policy A(A) tests after an edge is found between C(C2)and Cs.Recall that K12 22. k22 potential edges to be tested by policy A(A) after B is found to exist. The k12-K22 remaining potential edges yet to d be tested by policy A(A)between CD and C1. a after no edges are found to exist in 6(e). The ki1-1 remaining potential edges yet to be tested by policy A (A)between Cs and C1,after end the first edge between Cs and Ci is found to not exist. The k21-1 remaining potential edges between Cs and C2. The potential edge between Ci and C2. (b) Next we turn to the case (c,E)shown in Figure 3(c).The Fig.5.Paths generated by the coupling of A(a)and A(b)for the case shown in Figure 3(b). paths generated under A and A using coupling are illustrated in Figures 6(a)and(b),respectively.Again,the paths with the same label have the same probability,and they can be checked leads to smaller cost than A does in case(E,Ee).Now consider as in the aforementioned case(.E).As for the paths PD.1 the last case,(E,E).This case implies that the minimum cut and PD.2 in Figure 6(a)and paths PD.and PD.2 in Figure switches to the side of Cp,after an edge is found between 6(b),it is trivially true that any terminating time belonging to C1 and Cs but no edges are found between C1 and Cp by the range[2+k21+k22+2,1+k12+k21+k22+1] A.According to Rule 2,it incurs smaller expected cost if A under the paths Pp.1 and PD.2 can also be found under the subsequently tests the edges between C2 and Cp rather than paths PD.1 and PD.2.The same conclusion also holds for those between C2 and Cs.Since we have already proved that the paths PC.1 and Pc.2 in Figure 6(a),and paths Pc.and the cost is smaller under policy A in case (E,E).it follows Pc in Figure 6(b).Hence,we highlight the paths that will that by transitivity that A leads to smaller expected cost than terminate in different time with bold lines under both A and A in case (c,E). ◆ A (specifically path aoc6 under A and paths a0c6B,aoBe In addition to the optimality proved above,implementing the and aABe6 under A.).Obviously,a06 terminates earlier policy also incurs a low computational complexity,as stated than a0δB,a0eδBcλas well as aABe6,which indicates below: that testing C first will lead to smaller expected cost.This completes the proof of the case in Figure 3(c). Lemma 5:The optimal policy is implementable with a Now we proceed to prove cases (E,Ee)and(Ec,E),shown computational complexity of no more than 30 log2 n oper- in Figures 3(d)and 3(e),respectively.We first consider case ations at each step to determine which edge is to be tested (E,e).This case implies that the minimum cut turns out to next. be on the same side of Cs,after an edge is found between C1 and Cs but no edges are found between Ci and Cp by Proof:The low complexity of the policy,of no more A.Due to Rule 2,it leads to smaller expected cost forA to than 30log2n operations per step.follows from the fact that subsequently test the edges between C2 and Cs rather than except for Cs and Cp,all other components are singletons, those between C2 and Cp.This means the expected cost of and moreover they too are of only three kinds,either having case(E,Ec)is smaller than that of case(Ec,Ec).Moreover,we a known non-edge to S,or a known non-edge to D,or have already proved that the expected cost is smaller under neither.Hence all computations involving these components policy A in case (c,).By transitivity,we conclude that A are extremely simple.According to the three rules,at each
7 PbD,1 and PbD,2. Therefore, the only paths that have different probabilities are those shown in the form of bold lines under both Ae and A (specifically, path αθc δ under Ae, and paths αθcβδ, αθcβ cλδ and α cλ cβεc δ under A.). Obviously, αθc δ dominates the paths αθcβδ, αθcβ cλδ and α cλ cβεc δ, which indicates that testing C1 first will lead to smaller expected cost. This completes the proof of the case in Figure 3(b). TABLE II DEFINITIONS OF THE EDGES FOR COUPLING Ae AND A Notation Definition α The first potential edge that policy Ae (A) tests. β The first potential edge that policy Ae (A) tests between CS and C2 (C1). θ The first k22 potential edges that policy Ae (A) tests after an edge is found between C1 (C2) and CS. Recall that k12 ≥ k22. ε k22 potential edges to be tested by policy Ae (A) after β is found to exist. δ The k12 − k22 remaining potential edges yet to be tested by policy Ae (A) between CD and C1, after no edges are found to exist in θ (ε). λ The k11 − 1 remaining potential edges yet to be tested by policy Ae (A) between CS and C1, after the first edge between CS and C1 is found to not exist. γ The k21 − 1 remaining potential edges between CS and C2. η The potential edge between C1 and C2. Next we turn to the case (Eec ,E c ) shown in Figure 3(c). The paths generated under Ae and A using coupling are illustrated in Figures 6(a) and (b), respectively. Again, the paths with the same label have the same probability, and they can be checked as in the aforementioned case (Ee,E). As for the paths PeD,1 and PeD,2 in Figure 6(a) and paths PbD,1 and PbD,2 in Figure 6(b), it is trivially true that any terminating time belonging to the range [k12 + k21 + k22 + 2, k11 + k12 + k21 + k22 + 1] under the paths PeD,1 and PeD,2 can also be found under the paths PbD,1 and PbD,2. The same conclusion also holds for the paths PeC,1 and PeC,2 in Figure 6(a), and paths PbC,1 and PbC,2 in Figure 6(b). Hence, we highlight the paths that will terminate in different time with bold lines under both Ae and A (specifically path αθc δ under Ae and paths αθc δβ, αθc δβcλ and α cλ cβεc δ under A.). Obviously, αθc δ terminates earlier than αθc δβ, αθc δβcλ as well as α cλ cβεc δ, which indicates that testing C1 first will lead to smaller expected cost. This completes the proof of the case in Figure 3(c). Now we proceed to prove cases (Ee,E c ) and (Eec ,E), shown in Figures 3(d) and 3(e), respectively. We first consider case (Ee,E c ). This case implies that the minimum cut turns out to be on the same side of CS, after an edge is found between C1 and CS but no edges are found between C1 and CD by Ae. Due to Rule 2, it leads to smaller expected cost for Ae to subsequently test the edges between C2 and CS rather than those between C2 and CD. This means the expected cost of case (Ee,E c ) is smaller than that of case (Eec ,E c ). Moreover, we have already proved that the expected cost is smaller under policy Ae in case (Eec ,E c ). By transitivity, we conclude that Ae end end end end end end end end end end end end end end end end end end end end end end end PD,1 PD,2 PD,3 PD,4 PD,1 PC,1 PC,2 PC,3 PC,4 PC,5 end PC,6 1 1 1 0 0 0 1 1 0 0 0 0 1 1 0 0 1 1 1 0 1 0 0 1 end end PC,1 PC,2 PC,3 PD,3 PC,4 PC,5 PC,6 PD,2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 PD4 PD 1 ˈ 1 ˆPDˈ 2 ˆPDˈ (a) (b) PD 2 ˈ ! # " " $ ! $ ! % % % & # ! & % $ $ $ $ " % % % ! # Fig. 5. Paths generated by the coupling of Ae (a) and A (b) for the case shown in Figure 3(b). leads to smaller cost than A does in case (Ee,E c ). Now consider the last case, (Eec ,E). This case implies that the minimum cut switches to the side of CD, after an edge is found between C1 and CS but no edges are found between C1 and CD by Ae. According to Rule 2, it incurs smaller expected cost if Ae subsequently tests the edges between C2 and CD rather than those between C2 and CS. Since we have already proved that the cost is smaller under policy Ae in case (Ee,E), it follows that by transitivity that Ae leads to smaller expected cost than A in case (Eec ,E). In addition to the optimality proved above, implementing the policy also incurs a low computational complexity, as stated below: Lemma 5: The optimal policy is implementable with a computational complexity of no more than 30 log2 n operations at each step to determine which edge is to be tested next. Proof: The low complexity of the policy, of no more than 30 log2 n operations per step, follows from the fact that except for CS and CD, all other components are singletons, and moreover they too are of only three kinds, either having a known non-edge to S, or a known non-edge to D, or neither. Hence all computations involving these components are extremely simple. According to the three rules, at each
a comparison is made to compare which side has the larger value.This takes 2log2n operations for either ®⑦ side that turns out to have the larger number (Therefore, 0 4log2 n operations are needed.).Hence,operations with 2②③s@4⑦0》7 2⊙@ nd a total number of (1+1+2)log2 n+4log2 n 8l0g2 n @ are involved in this part. For testing an edge located in a minimum cut at the @· current step,if this edge is found to not exist.all the necessary operations include one for reducing the number of potential edges in the current singleton by one,one @ for checking if this updated number is equal to zero. Moreover,if the number is zero,the operations needed to determine whether there is more singletons left include (a) one for the algorithm to terminate with the conclusion of finding disconnectivity if no more singletons are left and 2log2 n for the algorithm to take an maximum over the number of potential edges each singleton has to the other side);If the number is not zero,the algorithm proceeds ⑦。⑦大⊙a in the next step.Hence,operations with a total number of(2+1+2)log2 n 5 log2 n are involved in this part. Note that the second and third cases can occur on either side of the singletons at any step.Therefore,the total number of operations involved in these two parts should be doubled,i.e., 2.(8log2 n+5log2 n)=26l0g2 n.Hence,the total number ®Y of operations is 26l0g2 n+4l0g2 n 30l0g2 n. 圆 B0(( nd (b) C.Simulation Results and Discussion Figure 7 shows the number of steps,averaged over 100 Fig.6.Paths generated by the coupling of A(a)and A(b)for the simulations,that the optimal policy takes to establish S-D case shown in Figure 3(c). disconnectivity/connectivity in an ER graph with 50,100,500, 1000,3000 and 5000 nodes,as a function of p.As can been seen from all those figures,determining S-D connectivity at step an algorithm needs to (1)either check the existence of around its phase transition (p 0.03,0.02,0.003,0.002, a potential direct edge between Cs and Cp (if there is still 0.0007and0.0002 under the cases of50,100,500,1000, more than one direct potential edge left).or(2)determine the 3000 and 5000 nodes,respectively)can involve many steps. minimum cut set (if there are no direct edges)and (3)then Specifically,consider the 1000-node case for example.In pick from among the edges in that cut the one that leads to between disconnectivity at very low p(say 10-5)that takes the most connections on the other sides.For the number of about 999 steps to establish,and connectivity at very high p operations needed by an algorithm is the summation of the (around p=1)that can be established in I step,the value of operations involved in each of these three outcomes: p passes through a phase transition around p=0.002(slightly For the direct edge testing,one operation is needed by smaller than the value of p=Inn/n0.006 under the phase the algorithm to terminate if the edge is found to exist. transition to connectivity of the entire network),where it takes Three operations (one for reducing the number of direct a very large number of steps(about 15000)to determine if S potential edges by one,another one for checking if the and D are connected or not.Similar rules hold in all the other obtained number is equal to zero,and the last one for cases. determining that the next step will be used for finding a Note that the execution time of our algorithm can also be minimum cut)are needed in the case that a direct edge reflected by our simulation results on the number of tested is found not to exist.Therefore,at most four operations edges in the sense that it is the summation of the number of in total are needed. operations involved at each step for determining which edge For the testing of an edge located in a minimum cut at the to be tested next.In other words,the execution time is the current step,if the edge is found to exist,all the necessary summation of the computational complexity obtained under operations include the one for reducing the number of each step.Therefore,the entire execution time relies heavily on singleton components by one,one for increasing the the total number of edges tested,and,more specifically,turns number of potential edges in each of these singletons by out to be proportional to that number.According to Lemma one,2l0g2 n for taking a summation over the total number 5,it takes 30log2n operations at each step to determine of potential edges on both sides.Based on the summation,which edge to be tested next.Hence,the entire execution time
8 end end end end end end end end end PD,1 PD,3 PC,1 PC,2 PC,3 PC,4 PC,5 end PC,6 1 1 1 0 0 0 1 1 0 0 0 0 1 1 0 0 1 1 1 1 0 0 0 1 end end end end end end end end end end 0 0 1 1 1 1 1 1 1 1 0 0 0 0 0 0 PC,7 PD,2 PD,5 PD,6 PD,4 PC,8 PC,10 PC,9 PD 1 ˈ (a) PD 2 ˈ PC 1 ˈ PC 2 ˈ ! ! ! " " " # # # $ $ $ % & & & & ' ' ' end end end end end end end end end PD,1 PD,3 PC,1 PC,2 PC,3 PC,4 PC,5 end PC,6 1 1 1 0 0 0 1 1 0 0 0 0 1 1 0 0 1 1 1 1 0 0 0 1 end end end end end end end end end 0 0 1 1 1 1 1 1 1 0 0 0 0 0 PC,7 PD,2 PD,6 PD,4 PC,8 PC,9 PC,10 end end end 0 1 1 0 0 PD,5 1 ˆPDˈ 1 ˆPCˈ (b) 2 ˆPDˈ 2 ˆPCˈ ! ! ! " " " # # # $ $ $ % % % & & & & ' ' ' Fig. 6. Paths generated by the coupling of Ae (a) and A (b) for the case shown in Figure 3(c). step an algorithm needs to (1) either check the existence of a potential direct edge between CS and CD (if there is still more than one direct potential edge left), or (2) determine the minimum cut set (if there are no direct edges) and (3) then pick from among the edges in that cut the one that leads to the most connections on the other sides. For the number of operations needed by an algorithm is the summation of the operations involved in each of these three outcomes: • For the direct edge testing, one operation is needed by the algorithm to terminate if the edge is found to exist. Three operations (one for reducing the number of direct potential edges by one, another one for checking if the obtained number is equal to zero, and the last one for determining that the next step will be used for finding a minimum cut) are needed in the case that a direct edge is found not to exist. Therefore, at most four operations in total are needed. • For the testing of an edge located in a minimum cut at the current step, if the edge is found to exist, all the necessary operations include the one for reducing the number of singleton components by one, one for increasing the number of potential edges in each of these singletons by one, 2log2 n for taking a summation over the total number of potential edges on both sides. Based on the summation, a comparison is made to compare which side has the larger value. This takes 2log2 n operations for either side that turns out to have the larger number (Therefore, 4log2 n operations are needed.). Hence, operations with a total number of (1 + 1 + 2) log2 n+ 4 log2 n ≤ 8 log2 n are involved in this part. • For testing an edge located in a minimum cut at the current step, if this edge is found to not exist, all the necessary operations include one for reducing the number of potential edges in the current singleton by one, one for checking if this updated number is equal to zero. Moreover, if the number is zero, the operations needed to determine whether there is more singletons left include one for the algorithm to terminate with the conclusion of finding disconnectivity if no more singletons are left and 2log2 n for the algorithm to take an maximum over the number of potential edges each singleton has to the other side); If the number is not zero, the algorithm proceeds in the next step. Hence, operations with a total number of (2 + 1 + 2) log2 n ≤ 5 log2 n are involved in this part. Note that the second and third cases can occur on either side of the singletons at any step. Therefore, the total number of operations involved in these two parts should be doubled, i.e., 2 · (8 log2 n + 5 log2 n) = 26 log2 n. Hence, the total number of operations is 26 log2 n + 4 log2 n ≤ 30 log2 n. C. Simulation Results and Discussion Figure 7 shows the number of steps, averaged over 100 simulations, that the optimal policy takes to establish S-D disconnectivity/connectivity in an ER graph with 50, 100, 500, 1000, 3000 and 5000 nodes, as a function of p. As can been seen from all those figures, determining S-D connectivity at around its phase transition (p = 0.03, 0.02, 0.003, 0.002, 0.0007 and 0.0002 under the cases of 50, 100, 500, 1000, 3000 and 5000 nodes, respectively) can involve many steps. Specifically, consider the 1000-node case for example. In between disconnectivity at very low p (say 10−5 ) that takes about 999 steps to establish, and connectivity at very high p (around p = 1) that can be established in 1 step, the value of p passes through a phase transition around p = 0.002 (slightly smaller than the value of p = ln n/n ≈ 0.006 under the phase transition to connectivity of the entire network), where it takes a very large number of steps (about 15000) to determine if S and D are connected or not. Similar rules hold in all the other cases. Note that the execution time of our algorithm can also be reflected by our simulation results on the number of tested edges in the sense that it is the summation of the number of operations involved at each step for determining which edge to be tested next. In other words, the execution time is the summation of the computational complexity obtained under each step. Therefore, the entire execution time relies heavily on the total number of edges tested, and, more specifically, turns out to be proportional to that number. According to Lemma 5, it takes 30log2 n operations at each step to determine which edge to be tested next. Hence, the entire execution time
0.02 (b) (c) 1. 02 102 (d) (e) (0 Fig.7.Number of steps needed to determine S-D disconnectivity/connectivity in ER Graphs as a function of p with different number of nodes.(a)50 nodes:(b)100 nodes:(c)500 nodes:(d)1000 nodes:(e)3000 nodes;(f)5000 nodes. exhibits a similar results to the number of tested edges shown Therefore,the proof can be completed in a fashion similar to in our simulations,with only a difference of logarithmic factor.the proofs used in Lemmas 3 and 4. ■ VI.EXTENSION TO GENERAL GRAPHS In this section,we indicate extensions of the optimality B.(1,0,p)Random Graphs results to certain more general classes of random graphs. By a (1,0.p)random graph,we mean a graph that also initially contains"0"edges.A"0"edge is simply an edge that A.(1,p)Random Graphs we already know a priori to not exist,a"1"edge is an edge that By a (1.p)random graph,we mean a graph that initially we already know a priori to exist,and a "p"edge exists with contains two types of edges,“I”edges and“p”edges.The probability p.Unfortunately,determining the optimal policy former are edges that are known to exist,i.e.,exist with for all types of(1,0.p)random graphs appears to be intractable probability one,while the latter are potential edges that The Alternating Policy remains optimal for certain types of independently exist with probability p.The Alternating Policy (1.0.p)graph patterns-series graphs,parallel graphs,SP remains optimal under the condition that the sizes of all graphs,PS graphs,series of parallel of series(SPS)graphs and components,excluding Cs and CD,are the same. parallel of series of parallel(PSP)graphs.As shown in Figure Proof:It suffices to prove that following Rules 1,2 and 8,a series graph consists of n edges in series,with source 3 is optimal under(1.p)random graphs when the sizes of all and destination at the two ends.A parallel graph consists of components,excluding Cs and Cp,are the same.The proof n parallel edges between source and destination,as shown in of Rule 1 follows by directly applying the proof of Lemma 1 Figure 9.We proceed to define the other graphs mentioned since it holds for all types of(1.p)random graphs.Lemma 2 above,and specify the corresponding optimal policy in each can be applied here to prove that the graph maintains the neat case. structure where all the edges in the minimum cut at step t are either all between Cs and components,or they are all between )(○(○…○ Cp and components.When starting with an ER random graph, all the components,excluding Cs and Cp,remain singletons. Fig.8.A series graph For (1.p)random graphs,the components,excluding Cs and CD,at any step t preserve their size that is larger than 1. 1.Series of Parallel(SP)Graphs However,we can treat each component as a super singleton An SP graph consists of n parallel graphs labeled with the number of edges between Cs (Cp)and each super- P1,P2,...,Pn,with mi potential edges (i.e.,existing inde- singleton being Cs.super singleton(Cplsuper singleton). pendently with equal probability p)in the i-th parallel graph
9 10−3 10−2 10−1 100 0 20 40 60 80 100 120 140 160 180 p Number of steps taken by policy p=0.03 (a) 10−4 10−3 10−2 10−1 100 0 50 100 150 200 250 300 350 400 p Number of steps taken by policy p=0.02 (b) 10−4 10−3 10−2 10−1 100 0 500 1000 1500 2000 2500 3000 3500 4000 4500 5000 p Number of steps taken by policy p = 3 × 10−3 (c) 10−4 10−3 10−2 10−1 0 5000 10000 15000 p Number of steps taken by policy 999 p = 2 × 10−3 (d) 10−4 10−3 10−2 10−1 100 0 1 2 3 4 5 6 7 x 104 p Nunmber of steps taken by policy p = 7 × 10−4 (e) 10−4 10−3 10−2 10−1 100 0 0.2 0.4 0.6 0.8 1 1.2 1.4 1.6 1.8 2 x 105 p Number of steps taken by policy p = 2 × 10−4 (f) Fig. 7. Number of steps needed to determine S-D disconnectivity/connectivity in ER Graphs as a function of p with different number of nodes. (a) 50 nodes; (b) 100 nodes; (c) 500 nodes; (d) 1000 nodes; (e) 3000 nodes; (f) 5000 nodes. exhibits a similar results to the number of tested edges shown in our simulations, with only a difference of logarithmic factor. VI. EXTENSION TO GENERAL GRAPHS In this section, we indicate extensions of the optimality results to certain more general classes of random graphs. A. (1,p) Random Graphs By a (1,p) random graph, we mean a graph that initially contains two types of edges, “1” edges and “p” edges. The former are edges that are known to exist, i.e., exist with probability one, while the latter are potential edges that independently exist with probability p. The Alternating Policy remains optimal under the condition that the sizes of all components, excluding CS and CD, are the same. Proof: It suffices to prove that following Rules 1, 2 and 3 is optimal under (1,p) random graphs when the sizes of all components, excluding CS and CD, are the same. The proof of Rule 1 follows by directly applying the proof of Lemma 1 since it holds for all types of (1,p) random graphs. Lemma 2 can be applied here to prove that the graph maintains the neat structure where all the edges in the minimum cut at step t are either all between CS and components, or they are all between CD and components. When starting with an ER random graph, all the components, excluding CS and CD, remain singletons. For (1,p) random graphs, the components, excluding CS and CD, at any step t preserve their size that is larger than 1. However, we can treat each component as a super singleton with the number of edges between CS (CD) and each supersingleton being |CS| · |super singleton|(|CD|super singleton|). Therefore, the proof can be completed in a fashion similar to the proofs used in Lemmas 3 and 4. B. (1,0,p) Random Graphs By a (1,0,p) random graph, we mean a graph that also initially contains “0” edges. A “0” edge is simply an edge that we already know a priori to not exist, a “1” edge is an edge that we already know a priori to exist, and a “p” edge exists with probability p. Unfortunately, determining the optimal policy for all types of (1,0,p) random graphs appears to be intractable. The Alternating Policy remains optimal for certain types of (1,0,p) graph patterns – series graphs, parallel graphs, SP graphs, PS graphs, series of parallel of series (SPS) graphs and parallel of series of parallel (PSP) graphs. As shown in Figure 8, a series graph consists of n edges in series, with source and destination at the two ends. A parallel graph consists of n parallel edges between source and destination, as shown in Figure 9. We proceed to define the other graphs mentioned above, and specify the corresponding optimal policy in each case. S D Fig. 8. A series graph. 1. Series of Parallel (SP) Graphs An SP graph consists of n parallel graphs labeled P1,P2, . . . ,Pn, with mi potential edges (i.e., existing independently with equal probability p) in the i-th parallel graph
10 the fewest number of edges in the PS subgraph that contains s the fewest number of series graphs.We prove this in two steps. First,we have already proved in Theorem 2 that within the same PS subgraph,testing the series graph with the fewest number of edges leads to a smaller expected cost.Therefore, Fig.9.A parallel graph we only need to prove that between two PS subgraphs,we will obtain a smaller expected cost if we first test the series graph with the fewest number of edges in the one containing as shown in Figure 10.The n parallel graphs are arranged in a fewer number of series graphs.The proof follows by again series between a source and a destination. employing induction and stochastic coupling arguments in a way similar to the previous proofs. ■ P 2)The Optimal Policy for PSP Graphs:A PSP graph consists of n SP graphs arranged in parallel,as shown in Figure 13.The i-th SP graph labeled SPi has mi potential Fig.10.A series of parallel (SP)graph. edges Theorem I:The optimal policy is to test the parallel sub- graph with the fewest number of potential edges. Proof:See Appendix C. ◆ 2.Parallel of Series (PS)Graphs A PS graph consists of n series graphs labeled S1,S2,...,Sn.with mi potential edges in the ith series graph, as shown in Figure 11.The n series graphs are arranged in Fig.13.A parallel of series of parallel (PSP)graph. parallel between a source and a destination. Theorem 4:The optimal policy is to test any edge in that parallel graph which contains the fewest number of edges in the SP graph that contains the minimum number of parallel graphs. Proof:It suffices to prove that for any two SP subgrpahs, it is better to test any edge in that parallel graph which has the fewest number of edges in the SP subgraph that contains fewer Fig.11.A parallel of series (PS)graph. number of parallel graphs.The proof can be divided into two steps.First,we have already proved in Theorem 1 that within Theorem 2:The optimal policy is to test any edge on the the same SP subgraph,testing the series graph with the fewest series subgraph that contains the fewest number of potential number of edges leads to a smaller expected cost.Therefore, edges. we only need to prove that between two PS subgraphs,we Proof:It suffices to show that for any two series sub- obtain a smaller expected cost if we first test the series graph graphs,it is better to test the one with fewer number of with the fewest number of edges in the one containing a potential edges first.The proof can be completed in a manner fewer number of series graphs.The proof again follows by similar to the proof of Theorem 1. employing induction and stochastic coupling arguments in a 1)Series of Parallel of Series(SPS)Graphs:An SPS graph manner similar to the previous proofs. ■ consists of n PS graphs arranged in series,as shown in Figure 12.The i-th PS graphs labeled PSi contains mi potential VII.CONCLUSION AND FUTURE WORK edges. We have studied the problem of optimally determining S-D connectivity of random networks.We start by investigating the PS PS. PS Erdos-Renyi (ER)Graphs,where,interestingly,the problem ○…○ appears to be open even for such a well-structured graph.We have considered the class of classic ER Random Graphs with ○…○y a finite number n of nodes,for any value of n.Assuming that each testing of an edge has the same unit cost,we have Fig.12.A series of parallel of series(SPS)graph determined a policy for establishing whether a designated source and destination are connected with minimum expected Theorem 3:The optimal policy is to test any edge in that cost.Interestingly,though ER graphs exhibit a sharp transition series graph which has the fewest number of edges in the PS between disconnectivity of the entire graph and connectivity graph that contains the minimum number of series graphs. of the entire graph around p=Inn/n,the optimal policy turns Proof:It suffices to show that for any two PS subgrpahs,out to not depend on the specific value of p,or even of which it will be better to test any edge in that series graph which has side of the phase transition p lies.The policy simply contracts
10 S D Fig. 9. A parallel graph. as shown in Figure 10. The n parallel graphs are arranged in series between a source and a destination. S D Fig. 10. A series of parallel (SP) graph. Theorem 1: The optimal policy is to test the parallel subgraph with the fewest number of potential edges. Proof: See Appendix C. 2. Parallel of Series (PS) Graphs A PS graph consists of n series graphs labeled S1, S2, . . . , Sn, with mi potential edges in the ith series graph, as shown in Figure 11. The n series graphs are arranged in parallel between a source and a destination. S D Fig. 11. A parallel of series (PS) graph. Theorem 2: The optimal policy is to test any edge on the series subgraph that contains the fewest number of potential edges. Proof: It suffices to show that for any two series subgraphs, it is better to test the one with fewer number of potential edges first. The proof can be completed in a manner similar to the proof of Theorem 1. 1) Series of Parallel of Series (SPS) Graphs: An SPS graph consists of n PS graphs arranged in series, as shown in Figure 12. The i-th PS graphs labeled PSi contains mi potential edges. S D Fig. 12. A series of parallel of series (SPS) graph. Theorem 3: The optimal policy is to test any edge in that series graph which has the fewest number of edges in the PS graph that contains the minimum number of series graphs. Proof: It suffices to show that for any two PS subgrpahs, it will be better to test any edge in that series graph which has the fewest number of edges in the PS subgraph that contains the fewest number of series graphs. We prove this in two steps. First, we have already proved in Theorem 2 that within the same PS subgraph, testing the series graph with the fewest number of edges leads to a smaller expected cost. Therefore, we only need to prove that between two PS subgraphs, we will obtain a smaller expected cost if we first test the series graph with the fewest number of edges in the one containing a fewer number of series graphs. The proof follows by again employing induction and stochastic coupling arguments in a way similar to the previous proofs. 2) The Optimal Policy for PSP Graphs: A PSP graph consists of n SP graphs arranged in parallel, as shown in Figure 13. The i-th SP graph labeled SPi has mi potential edges. Fig. 13. A parallel of series of parallel (PSP) graph. Theorem 4: The optimal policy is to test any edge in that parallel graph which contains the fewest number of edges in the SP graph that contains the minimum number of parallel graphs. Proof: It suffices to prove that for any two SP subgrpahs, it is better to test any edge in that parallel graph which has the fewest number of edges in the SP subgraph that contains fewer number of parallel graphs. The proof can be divided into two steps. First, we have already proved in Theorem 1 that within the same SP subgraph, testing the series graph with the fewest number of edges leads to a smaller expected cost. Therefore, we only need to prove that between two PS subgraphs, we obtain a smaller expected cost if we first test the series graph with the fewest number of edges in the one containing a fewer number of series graphs. The proof again follows by employing induction and stochastic coupling arguments in a manner similar to the previous proofs. VII. CONCLUSION AND FUTURE WORK We have studied the problem of optimally determining S-D connectivity of random networks. We start by investigating the Erdos-Renyi (ER) Graphs, where, interestingly, the problem appears to be open even for such a well-structured graph. We have considered the class of classic ER Random Graphs with a finite number n of nodes, for any value of n. Assuming that each testing of an edge has the same unit cost, we have determined a policy for establishing whether a designated source and destination are connected with minimum expected cost. Interestingly, though ER graphs exhibit a sharp transition between disconnectivity of the entire graph and connectivity of the entire graph around p = ln n/n, the optimal policy turns out to not depend on the specific value of p, or even of which side of the phase transition p lies. The policy simply contracts