Fast Computation of Dense Temporal Subgraphs Shuai Ma, Renjun Hu, Luoshu Wang, Xuelian Lin, Jinpeng Huai SKLSDE Lab, beihang university china Beijing Advanced Innovation Center for Big Data and Brain Computing 北京航空航天大学 BEIHANGUNIVERSITY
Fast Computation of Dense Temporal Subgraphs Shuai Ma, Renjun Hu, Luoshu Wang, Xuelian Lin, Jinpeng Huai SKLSDE Lab, Beihang University, China Beijing Advanced Innovation Center for Big Data and Brain Computing 1
Graphs are dynamic The Soil Food Web OCIAL NETWORK A B C Aug 10 Oct 29 All images are downloaded from Google
Graphs are dynamic 2 *All images are downloaded from Google
Motivation Temporal graph: a continuous sequence of snapshots(each snapshot records the status of a graph at a specific timestamp) Dense temporal subgraph: a subgraph having heavy edge weights over a continuous time period Space M Space Mi Museu COLUMB Columbia K San-D Westfield Horton plaza Westfield Horton Plaza FTS Manchester Grand Live traffic, Faster Manchester Grand Live traffic Fastmmnn Hyatt San Diego a dense temporal subgraphs corresponds to a crowded area spanning over a continuous time period
Motivation 3 A dense temporal subgraphs corresponds to a crowded area spanning over a continuous time period. Temporal graph: a continuous sequence of snapshots (each snapshot records the status of a graph at a specific timestamp) Dense temporal subgraph: a subgraph having heavy edge weights over a continuous time period … … …
Outline The FDs problem: analyses and challenges a data-driven approach Experimental stud Summary
Outline ➢ The FDS problem: analyses and challenges ➢ A data-driven approach ➢ Experimental study ➢ Summary 4
Temporal graphs and subgraphs Temporal graph G(V, E, F)with T timestamps nodes and edges keep unchanged 235 edge weights constantly and regularly vary 叫252 with timestamps -7.-2 T snapshots: G,(V, E, F1, G2(V, E, F2) GT(V,E,F density(G)=36 Temporal subgraph H(Vs, Es, Fs, i, j time interval [,jc[1,T 7,5-2 subgraph (Vs, Es of (V,E denote H(V,, Fs,i, j as G[, j] density (H)=21 ohesive density density(G) sum of eage weights among all snapshots 5
Temporal graphs and subgraphs 5 Temporal graph G(V, E, F) with T timestamps • nodes and edges keep unchanged • edge weights constantly and regularly vary with timestamps • T snapshots: G1 (V, E, F1 ), G2 (V, E, F2 ), …, GT (V, E, FT) Temporal subgraph H(Vs , Es , Fs , i, j) • time interval [i, j] ⊆ [1, T] • subgraph (Vs , Es ) of (V, E) • denote H(V, E, Fs , i, j) as G[i, j] Cohesive density cdensity(G) • sum of edge weights among all snapshots cdensity(G)=36 cdensity(H)=21
Problem statement and complexity analysis > FDS: finding dense subgraphs find a connected temporal subgraph with the greatest cohesive density 235 N2).3.5.12351 7.77.72 density(H=44 density(G)=36 P Bogdanov, M. Mongiov, A K Singh. Mining heavy subgraphs in time-evolving networks. In /CDM, 2011. 6
Problem statement and complexity analysis P. Bogdanov, M. Mongiov, A. K. Singh. Mining heavy subgraphs in time-evolving networks. In ICDM, 2011. 6 ➢ FDS: finding dense subgraphs • find a connected temporal subgraph with the greatest cohesive density cdensity(H)=44 cdensity(G)=36
Problem statement and complexity analysis > FDS: finding dense subgraphs find a connected temporal subgraph with the greatest cohesive density Problem hardness[bogdanov et al. 111 the FDs problem is NP-complete, even for a temporal network with a single snapshot and with +1 or-1 edge weights only Our approximation hardness result the cohesive density of the optimal dense temporal subgraph is NP-hard to approximate within any constant factor. proof sketch: building an approximation factor preserving reduction from the net worth maximization problem, which is NP- hard to approximate within any constant factor P Bogdanov, M. Mongiov, A K Singh. Mining heavy subgraphs in time-evolving networks. In /CDM, 2011. 7
Problem statement and complexity analysis ➢ Problem hardness[Bogdanov et al. 11] P. Bogdanov, M. Mongiov, A. K. Singh. Mining heavy subgraphs in time-evolving networks. In ICDM, 2011. 7 ➢ Our approximation hardness result • the FDS problem is NP-complete, even for a temporal network with a single snapshot and with +1 or -1 edge weights only. • the cohesive density of the optimal dense temporal subgraph is NP-hard to approximate within any constant factor. ✓ proof sketch: building an approximation factor preserving reduction from the net worth maximization problem, which is NPhard to approximate within any constant factor. ➢ FDS: finding dense subgraphs • find a connected temporal subgraph with the greatest cohesive density
Challenges find a dense temporal determine a time interval [,j] subgraph find a dense subgraph given [,j Filter-and-Verification[ Bogdanov et al. 11 consider all time intervals [, j] and find dense subgraphs by fixing[i, j] each time filter [ j] if its upper bound of cohesive density is worse than the best cohesive density achieved prune 99%of a total of T*(T+1)2 time intervals T 447 1,414 14,142 T*T+1/2 10 10 10 108 unpruned 10 10 104 106 Filter-and-Verification is insuficient for large temporal graphs! A new and better algorithm design philosophy is needed P BogdanoV, M. Mongiov, A K Singh. Mining heavy subgraphs in time-evolving networks. In /CDM, 2011. 8
Challenges ➢ Filter-and-Verification[Bogdanov et al. 11] • consider all time intervals [i, j] and find dense subgraphs by fixing [i, j] each time • filter [i, j] if its upper bound of cohesive density is worse than the best cohesive density achieved • prune 99% of a total of T*(T+1)/2 time intervals P. Bogdanov, M. Mongiov, A. K. Singh. Mining heavy subgraphs in time-evolving networks. In ICDM, 2011. 8 T 141 447 1,414 ··· 14,142 T*(T+1)/2 104 105 106 ··· 108 # unpruned 102 103 104 ··· 106 Filter-and-Verification is insufficient for large temporal graphs! A new and better algorithm design philosophy is needed. determine a time interval [i, j] find a dense subgraph given [i, j] find a dense temporal subgraph
Outline The fds problem: analyses and challenges a data-driven approach Experimental stud Summary
Outline ➢ The FDS problem: analyses and challenges ➢ A data-driven approach ➢ Experimental study ➢ Summary 9
Main ideas Big graph friendly Employ hidden data statistics to explore k time intervals k is typically a small constant independent of T, e.g., 10 T 141 447 1,414 14,142 T*(T+12 10 106 108 unpruned 10 105 our approach k Our data-driven approach fIDEs step 1: identify k time intervals involved with dense subgraphs employing hidden data statistics and drawing characteristics of targeted time intervals step 2: compute dense subgraphs given time intervals v building the connections with the NWM problem, and exploiting effective and efficient optimization techniques 1000x faster while remain comparable quality of dense subgraphs 10
Main ideas ➢ Employ hidden data statistics to explore k time intervals • k is typically a small constant independent of T, e.g., 10 10 T 141 447 1,414 ··· 14,142 T*(T+1)/2 104 105 106 ··· 108 # unpruned 102 103 104 ··· 106 our approach k k k … k ➢ Our data-driven approach FIDES • step 1: identify k time intervals involved with dense subgraphs ✓ employing hidden data statistics and drawing characteristics of targeted time intervals • step 2: compute dense subgraphs given time intervals ✓ building the connections with the NWM problem, and exploiting effective and efficient optimization techniques Big graph friendly 1000x faster while remain comparable quality of dense subgraphs