正在加载图片...
9 25.000 t-Geed山y 25.000 200.00 ★0 pSort 20.000 -PeSort Greedy -Greedy IntSort 20.000 ★-Sort --AdaSub y-PeSort 2150 00 15,000 B-IntSort 1s.000 --AdaSub 100.00 10.00 5.000 5,000 3 5678 (a) (b) (c) Fig.4.The expected cost of the adaptive testing strategies yielded by different algorithms.The r-coordinates of the figure follow the increasing order of the size of the uncertain graphs. VII.SIMULATIONS to approximate the expected cost of the strategy.To further In this section,we present our simulations on the perfor- eliminate the random noise in data,we designate 10 pairs of mance of the proposed algorithms on various datasets.We source and destination in each uncertain graph,and the final first introduce our simulation environment in the following results shown in the figures are the average costs incurred by and show the detailed results in subsequent sections. strategies among all pairs of source and destination. 3)Algorithms Involved in Performance Comparisons:To A.Simulation Settings evaluate the performance of our proposed algorithms,we 1)Simulation Datasets:We adopt three real life datasets in include three additional heuristics adapted from [19].We our simulations.The basic descriptions and statistics are listed briefly introduce the algorithms as follows: as follows: Greedy Algorithm(Greedy):The greedy algorithm that Citation Networks (from Microsoft Academic Graph tests the edges from low cost to high cost proposed in [40]):We extract six citation networks of different sub- Section VI. fields in Microsoft Academic Graph ranging from 749 Adaptive Submodular Algorithm (AdaSub):The nodes (1429 edges)to 273751 nodes (993025 edges)to Adaptive Submodular algorithm based on the Q-value generate six uncertain graphs. approach proposed in Section VI. Internet Peer to Peer Networks [38]:This dataset con- MDP-based Algorithm (MDP):The exact dynamic tains a snapshot of the Gnutella peer-to-peer file sharing programming algorithm applying the Markov Decision network in August 2002.We extract nine subnetworks Process framework proposed in Section 1. ranging from 1000 nodes (1700 edges)to 5000 nodes Optimistic Sort Algorithm [19](OpSort):The algo- (16469 edges)to form nine uncertain graphs. rithm yields a strategy that tests the edges following Twitter Ego Networks [39]:This dataset consists of ego the increasing order of c/p.OpSort is optimal when the networks in Twitter.We select 10 ego networks ranging uncertain graph is a parallel graph [19]. from 95 nodes (1376 edges)to 213 nodes (17930 edges) Pessimistic Sort Algorithm [19](PeSort):The algo- to create 10 uncertain graphs.Note that the ego networks rithm yields a strategy that tests the edges following the we use have high density. increasing order of c/(1-p).PeSort is optimal when the For each uncertain graph generated above,we use Jac- uncertain graph is a serial graph [19]. card's coefficient [12],which is an established metric for Intersection Sort Algorithm [19](IntSort):The algo- link prediction in social networks,to assign the existence rithm that tests the edge with the minimum cost that lies probabilities of the edges.Specifically,for an edge e=(,y) on the intersection of a shortest s-t path and a minimum in uncertain graph g,the existence probability of e is given as s-t cut in the uncertain graph under the current state. p(e)=r()nr()/r()Ur(y),where r()denotes the Due to the prohibitive time complexity of the MDP-based set of neighbors of a node in the graph.We construct the cost algorithm,we only apply it to a sequence of subnetworks with function of the uncertain graphs by assigning the cost of each 20 edges that are extracted from the citation networks. edge from a Gaussian distribution with mean 50 and standard deviation 10.The negative part of the distribution is truncated. 2)Calculation of the Performance Metric:The perfor- B.Evaluation of Proposed Algorithms mance metric in the simulations is the expected cost of the We plot the expected costs of the strategies derived by strategies derived by the algorithms.However,to calculate different algorithms on various uncertain graphs in Figures the exact expected cost of a strategy requires testing it on 4 and 5. all the possible underlying graphs of an uncertain graph,of From Figure 5,we can see that the Adaptive Submodular al- which the number is extremely large.Therefore,instead,we gorithm indeed yields near optimal testing strategies,of which first generate 1000 underlying graphs by sampling from the the expected cost is at most 8%higher than the minimum distribution given by the uncertain graph and then use the expected cost achieved by the MDP-based algorithm.On the average cost of a strategy incurs on the 1000 underlying graphs other hand,although the Greedy algorithm is relatively simple,9 1 2 3 4 5 6 Expected Cost 0 5,000 10,000 15,000 20,000 25,000 Dataset: Citation Networks Greedy OpSort PeSort IntSort AdaSub (a) 1 2 3 4 5 6 7 8 9 Expected Cost 0 5,000 10,000 15,000 20,000 25,000 Dataset: Internet P2P Networks Greedy OpSort PeSort IntSort AdaSub (b) 1 2 3 4 5 6 7 8 9 10 Expected Cost 0 50,000 100,000 150,000 200,000 Dataset: Twitter Ego Networks Greedy OpSort PeSort IntSort AdaSub (c) Fig. 4. The expected cost of the adaptive testing strategies yielded by different algorithms. The x-coordinates of the figure follow the increasing order of the size of the uncertain graphs. VII. SIMULATIONS In this section, we present our simulations on the perfor￾mance of the proposed algorithms on various datasets. We first introduce our simulation environment in the following and show the detailed results in subsequent sections. A. Simulation Settings 1) Simulation Datasets: We adopt three real life datasets in our simulations. The basic descriptions and statistics are listed as follows: • Citation Networks (from Microsoft Academic Graph [40]): We extract six citation networks of different sub- fields in Microsoft Academic Graph ranging from 749 nodes (1429 edges) to 273751 nodes (993025 edges) to generate six uncertain graphs. • Internet Peer to Peer Networks [38]: This dataset con￾tains a snapshot of the Gnutella peer-to-peer file sharing network in August 2002. We extract nine subnetworks ranging from 1000 nodes (1700 edges) to 5000 nodes (16469 edges) to form nine uncertain graphs. • Twitter Ego Networks [39]: This dataset consists of ego networks in Twitter. We select 10 ego networks ranging from 95 nodes (1376 edges) to 213 nodes (17930 edges) to create 10 uncertain graphs. Note that the ego networks we use have high density. For each uncertain graph generated above, we use Jac￾card’s coefficient [12], which is an established metric for link prediction in social networks, to assign the existence probabilities of the edges. Specifically, for an edge e = (x, y) in uncertain graph G, the existence probability of e is given as p(e) = |Γ(x) ∩ Γ(y)|/|Γ(x) ∪ Γ(y)|, where Γ(·) denotes the set of neighbors of a node in the graph. We construct the cost function of the uncertain graphs by assigning the cost of each edge from a Gaussian distribution with mean 50 and standard deviation 10. The negative part of the distribution is truncated. 2) Calculation of the Performance Metric: The perfor￾mance metric in the simulations is the expected cost of the strategies derived by the algorithms. However, to calculate the exact expected cost of a strategy requires testing it on all the possible underlying graphs of an uncertain graph, of which the number is extremely large. Therefore, instead, we first generate 1000 underlying graphs by sampling from the distribution given by the uncertain graph and then use the average cost of a strategy incurs on the 1000 underlying graphs to approximate the expected cost of the strategy. To further eliminate the random noise in data, we designate 10 pairs of source and destination in each uncertain graph, and the final results shown in the figures are the average costs incurred by strategies among all pairs of source and destination. 3) Algorithms Involved in Performance Comparisons: To evaluate the performance of our proposed algorithms, we include three additional heuristics adapted from [19]. We briefly introduce the algorithms as follows: • Greedy Algorithm (Greedy): The greedy algorithm that tests the edges from low cost to high cost proposed in Section VI. • Adaptive Submodular Algorithm (AdaSub): The Adaptive Submodular algorithm based on the Q-value approach proposed in Section VI. • MDP-based Algorithm (MDP): The exact dynamic programming algorithm applying the Markov Decision Process framework proposed in Section 1. • Optimistic Sort Algorithm [19] (OpSort): The algo￾rithm yields a strategy that tests the edges following the increasing order of c/p. OpSort is optimal when the uncertain graph is a parallel graph [19]. • Pessimistic Sort Algorithm [19] (PeSort): The algo￾rithm yields a strategy that tests the edges following the increasing order of c/(1−p). PeSort is optimal when the uncertain graph is a serial graph [19]. • Intersection Sort Algorithm [19] (IntSort): The algo￾rithm that tests the edge with the minimum cost that lies on the intersection of a shortest s-t path and a minimum s-t cut in the uncertain graph under the current state. Due to the prohibitive time complexity of the MDP-based algorithm, we only apply it to a sequence of subnetworks with 20 edges that are extracted from the citation networks. B. Evaluation of Proposed Algorithms We plot the expected costs of the strategies derived by different algorithms on various uncertain graphs in Figures 4 and 5. From Figure 5, we can see that the Adaptive Submodular al￾gorithm indeed yields near optimal testing strategies, of which the expected cost is at most 8% higher than the minimum expected cost achieved by the MDP-based algorithm. On the other hand, although the Greedy algorithm is relatively simple
<<向上翻页向下翻页>>
©2008-现在 cucdc.com 高等教育资讯网 版权所有