Graph Search: a New Paradigm for Social Computing Shuai ma 京航宫航天大学 BEIHANG UNIVERSITY
Shuai Ma Graph Search: a New Paradigm for Social Computing
The soil Food Web 6 Graphs are everywhere, and quite a few are huge graphs 山东省高遮公略图一 员普2 ▲圆 原照R 2
2 Graphs are everywhere, and quite a few are huge graphs!
Graph Search- Why Bother? Graph Searching A+(SOL+A arCI Google/ File systems Databases World wide web Social Networks File systems -1960s: very simple search functionalities Databases-mid 1960S: SQL language s·鱼 World wide Web-1990'S: keyword search engines e f8 in& Social networks-late 1990s q⊙a Blv Graphs have more expressive power, compared with RDB& XML 2. Relationships become important for search - Google Knowledge Graph Graph search is a new paradigm for social computing! 3
Graph Search - Why Bother? 3 • File systems - 1960’s: very simple search functionalities • Databases - mid 1960’s:SQL language • World Wide Web - 1990’s:keyword search engines • Social networks - late 1990’s: File systems Databases World Wide Web Graph search is a new paradigm for social computing! Social Networks 1. Graphs have more expressive power, compared with RDB & XML. 2. Relationships become important for search – Google Knowledge Graph
Interesting Coincidence! ■S|GMoD+VLDB+|CDE 2000200120022003200420052006200720082009201020112012 Social computing Web 2.0 DB people started working on graphs at around the same time!
Interesting Coincidence! 4 DB people started working on graphs at around the same time! 0 5 10 15 20 25 30 35 40 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 SIGMOD + VLDB + ICDE Social computing & Web 2.0
Outline Application scenarios What is graph search? Three types of graph search Problems and challenges · Related techniques Summary
Outline 5 • Application scenarios • What is graph search? • Three types of graph search • Problems and challenges • Related techniques • Summary
Application Scenarios
6 Application Scenarios
Application Scenarios Complex object identification Data quality Real-life data is often dirty 1%0-5% of business data contains errors Dirty costs us businesses 600 billion dollars each year gartner Wrong price data in retail databases alone costs US consumers $2.5 billion annually Data cleaning tools deliver an overall business value of more than 600 million GBP each year at BT Data cleaning FORRESTER Data repairing Record matching(aka object identification, entity resolution, data deduplication) Complex object identification Modeling complex objects as graphs
• Data quality – Real-life data is often dirty: 1%–5% of business data contains errors – Dirty costs us businesses 600 billion dollars each year – Wrong price data in retail databases alone costs US consumers $2.5 billion annually – Data cleaning tools deliver an overall business value of more than ‘‘600 million GBP’’ each year at BT. • Data cleaning – Data repairing – Record matching (aka. object identification, entity resolution, data deduplication) • Complex object identification – Modeling complex objects as graphs Application Scenarios 7 Complex object identification
Application Scenarios Software plagiarism detection [131 Traditional plagiarism detection tools may not be applicable for serious software plagiarism problems A new tool based on graph pattern matching Represent the source codes as program dependence graphs [141 se graph pattern matching to detect plagiarism int sum(int array [], int court) ant 1, sun; 2: declaration. int sum o for (i= O; i< count: 1++)[ sum= add(sum, array[i])i 6: assignment, i=0(0: declaration, int count retm int add(int ant 5: control i< count 1: declaration. retum a+ b: 8. increment i++ 7: assignment, sum=0 9: assignment, sum=addo 3: declaration. int array 4: retum retum sum 10: call-site. add( sum. armaylil 8
Application Scenarios 8 • Traditional plagiarism detection tools may not be applicable for serious software plagiarism problems. • A new tool based on graph pattern matching – Represent the source codes as program dependence graphs [14] . – Use graph pattern matching to detect plagiarism. Software plagiarism detection [13]
Application Scenarios ransport routing【16 T Graph search is a common practice in transportation networks, due to the wide application of Location-Based Services Exam ple: Mark, a driver in the U.S. who wants to go from Irvine to Riverside in california If Mark wants to reach Riverside by his car in the shortest time, the problem can be expressed as the shortest path problem. Then by using existing methods, we can get the shortest path from irvine, ca to Riverside, CA traveling along State Route 261 If Mark drives a truck delivering hazardous materials may not be allowed to cross over some bridges or railroad crossings this time we can use a pattern graph containing specific route constraints(such as regular expressions)to find the optimal transport routes
Application Scenarios 9 • Graph search is a common practice in transportation networks, due to the wide application of Location-Based Services. • Example: Mark, a driver in the U.S. who wants to go from Irvine to Riverside in California. – If Mark wants to reach Riverside by his car in the shortest time, the problem can be expressed as the shortest path problem. Then by using existing methods, we can get the shortest path from Irvine, CA to Riverside, CA traveling along State Route 261. Transport routing [16] – If Mark drives a truck delivering hazardous materials may not be allowed to cross over some bridges or railroad crossings. This time we can use a pattern graph containing specific route constraints (such as regular expressions) to find the optimal transport routes
Application Scenarios Recommender systems [131 Recommendations have found its usage in many emerging specific applications, such as social matching systems Graph search is a useful tool for recommendations a headhunter wants to find a biologist (Bio)to help a group of software G engineers(sEs) analyze genetic data DM1 To do this, (s)he uses an expertise HRI Bio1 recommendation network g. as Bi depicted in g, where All Al2 v a node denotes a person labeled SE1 B102 Bio3 with expertise, and SE2 DM2 an edge indicates recommendation AlI DM Alk DMK e.g., HR, recommends Bio,, and Al, recommends DM1 10
Application Scenarios 10 • Recommendations have found its usage in many emerging specific applications, such as social matching systems. • Graph search is a useful tool for recommendations. Recommender systems [13] – A headhunter wants to find a biologist (Bio) to help a group of software engineers (SEs) analyze genetic data. – To do this, (s)he uses an expertise recommendation network G, as depicted in G, where ✓ a node denotes a person labeled with expertise, and ✓ an edge indicates recommendation, e.g., HR1 recommends Bio1 , and AI1 recommends DM1