P2P tutorial (ESEC 2001) c Peer-to-peer information systems: concepts and models,state-of-the-art, and future systems Karl Aberer Manfred Hauswirth ■P■ TU EPFL-DSC TU Wien-DSG Distributed Information Distributed Systems Group Systems Laboratory karl.aberer@epfl.ch M.Hauswirth@infosys.tuwien.ac.at Goals of the Tutorial Position the P2P paradigm in the design space of distributed systems Get a detailed overview of state-of-the-art P2P systems Understand the problem of decentralized data management in P2P systems Understand the research issues for future systems Detailed information on the new P-Grid approach for P2P systems 2001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 (c)2001 Karl Aberer,Manfred Hauswirth 1
P2P tutorial (ESEC 2001) (c) 2001 Karl Aberer, Manfred Hauswirth 1 Peer-to-peer information systems: concepts and models, state-of-the-art, and future systems Karl Aberer EPFL - DSC Distributed Information Systems Laboratory karl.aberer@epfl.ch Manfred Hauswirth TU Wien - DSG Distributed Systems Group M.Hauswirth@infosys.tuwien.ac.at © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 2 Goals of the Tutorial • Position the P2P paradigm in the design space of distributed systems • Get a detailed overview of state-of-the-art P2P systems • Understand the problem of decentralized data management in P2P systems • Understand the research issues for future systems • Detailed information on the new P-Grid approach for P2P systems
P2P tutorial (ESEC 2001) 和 Outline of the Tutorial ·What is P2P? P2P properties Types of P2P Systems -A historical view -Related approaches 。 State-of the-art systems Napster,Gnutella,Freenet ·P2 P Data Management -Gnutella,Freenet,Chord,CAN,P-Grid -Applications of P-Grid ·Gridella ·Managing trust Conclusions and References 2001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 Listen-P2P is around P2P systems get a lot of attention currently File-sharing systems Napster,Gnutella,Freenet,etc. -Conferences O'Reilly P2P conference 2001 (conferences.oreilly.com/p2p/) 2001 International Conference on Peer-to-Peer Computing (P2P2001) (www.ida.liu.se/conferences/p2p/p2p2001/) 。etc. P2P is nothing new -see Arpanet 2001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 (c)2001 Karl Aberer,Manfred Hauswirth
P2P tutorial (ESEC 2001) (c) 2001 Karl Aberer, Manfred Hauswirth 2 © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 3 Outline of the Tutorial • What is P2P? – P2P properties – Types of P2P Systems – A historical view – Related approaches • State-of the-art systems – Napster, Gnutella, Freenet • P2P Data Management – Gnutella, Freenet, Chord, CAN, P-Grid – Applications of P-Grid • Gridella • Managing trust • Conclusions and References © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 4 Listen — P2P is around • P2P systems get a lot of attention currently – File-sharing systems • Napster, Gnutella, Freenet, etc. – Conferences • O’Reilly P2P conference 2001 (conferences.oreilly.com/p2p/) • 2001 International Conference on Peer-to-Peer Computing (P2P2001) (www.ida.liu.se/conferences/p2p/p2p2001/) • etc. • P2P is nothing new – see Arpanet
P2P tutorial (ESEC 2001) 和 What is P2P? Every participating node acts as both a client and a server (servent") Every node "pays"its participation by providing access to (some of)its resources Properties: no central coordination no central database no peer has a global view of the system global behavior emerges from local interactions all existing data and services are accessible from any peer peers are autonomous peers and connections are unreliable 82001 Karl Abe ESEC/FSE 2001 What is P2P?...and what isn't? Clay Shirkey (The Accelerator Group): -"Peer-to-peer is a class of applications that take advantage of resources-storage,cycles,content,human presence- available at the edges of the Internet.Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses,peer-to-peer nodes must operate outside the DNS and have significant or total autonomy of central servers. P2P "litmus test: Does it allow for variable connectivity and temporary network addresses? Does it give the nodes at the edges of the network significant autonomy? P2P an application-level internet on top of the Internet 2001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 (c)2001 Karl Aberer,Manfred Hauswirth 3
P2P tutorial (ESEC 2001) (c) 2001 Karl Aberer, Manfred Hauswirth 3 © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 5 What is P2P? • Every participating node acts as both a client and a server (“servent”) • Every node “pays” its participation by providing access to (some of) its resources • Properties: – no central coordination – no central database – no peer has a global view of the system – global behavior emerges from local interactions – all existing data and services are accessible from any peer – peers are autonomous – peers and connections are unreliable © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 6 What is P2P? ... and what isn’t? • Clay Shirkey (The Accelerator Group): – “Peer-to-peer is a class of applications that take advantage of resources—storage, cycles, content, human presence— available at the edges of the Internet. Because accessing these decentralized resources means operating in an environment of unstable connectivity and unpredictable IP addresses, peer-to-peer nodes must operate outside the DNS and have significant or total autonomy of central servers.” – P2P “litmus test:” • Does it allow for variable connectivity and temporary network addresses? • Does it give the nodes at the edges of the network significant autonomy? • P2P ~ an application-level internet on top of the Internet
P2P tutorial (ESEC 2001) Types of P2P Systems 。E-commerce systems -eBay,B2B market places,B2B integration servers,... ·File sharing systems -Napster,Gnutella,Freenet,... Distributed Databases Mariposa [Stonebraker96],... ·Networks -Arpanet Mobile ad-hoc networks,Terminodes [Hubaux01],... 2001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 System Layers -where is P2P? ·Users User -Commerce and society is P2P 。Application layer uses QoS E-commerce systems can be P2P or centralized Application Information management -Directories are central but could exploits QoS be P2P Information Networks often are P2P Management Internet exploits Qos Network 2001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 (c)2001 Karl Aberer,Manfred Hauswirth 4
P2P tutorial (ESEC 2001) (c) 2001 Karl Aberer, Manfred Hauswirth 4 © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 7 Types of P2P Systems • E-commerce systems – eBay, B2B market places, B2B integration servers, … • File sharing systems – Napster, Gnutella, Freenet, … • Distributed Databases – Mariposa [Stonebraker96], … • Networks – Arpanet – Mobile ad-hoc networks, Terminodes [Hubaux01], … © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 8 User Application Information Management Network QoS Qos exploits QoS exploits uses System Layers — where is P2P? • Users – Commerce and society is P2P • Application layer – E-commerce systems can be P2P or centralized • Information management – Directories are central but could be P2P • Networks often are P2P – Internet
P2P tutorial (ESEC 2001) How much P2P is involved? P2P user P2P P2P interaction application information management eBay yes no no Napster yes yes no Gnutella, yes yes yes Freenet 2001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 9 P2P Cooperation Models 。Centralized model - global index held by a central authority (single point of failure) -direct contact between requestors and providers Example:Napster ·Decentralized model Examples:Freenet,Gnutella no global index,no central coordination,global behavior emerges from local interactions,etc. direct contact between requestors and providers(Gnutella)or mediated by a chain of intermediaries (Freenet) ·Hierarchical model introduction of "super-peers" -mix of centralized and decentralized model Example:DNS 2001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 10 (c)2001 Karl Aberer,Manfred Hauswirth 5
P2P tutorial (ESEC 2001) (c) 2001 Karl Aberer, Manfred Hauswirth 5 © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 9 How much P2P is involved? Gnutella, yes yes yes Freenet Napster yes yes no eBay yes no no P2P information management P2P application P2P user interaction © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 10 P2P Cooperation Models • Centralized model – global index held by a central authority (single point of failure) – direct contact between requestors and providers – Example: Napster • Decentralized model – Examples: Freenet, Gnutella – no global index, no central coordination, global behavior emerges from local interactions, etc. – direct contact between requestors and providers (Gnutella) or mediated by a chain of intermediaries (Freenet) • Hierarchical model – introduction of “super-peers” – mix of centralized and decentralized model – Example: DNS
P2P tutorial (ESEC 2001) 和 So what's new?P2P in a historical Context The original Internet was designed as a P2P system any 2 computers could send packets to each other no firewalls no network address translation no asymmetric connections(V.90,ADSL,cable,etc.) the back-then "killer apps"FTP and telnet are C/S but anyone could telnet/FTP anyone else servers acted as clients and vice versa cooperation was a central goal and "value":no spam or exhaustive bandwidth consumption .Typical examples of "old-fashioned P2P": -Usenet News -DNS The emergence of P2P can be seen as a renaissance of the original Internet model 82001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 11 Related Approaches Related distributed information system approaches: Event-based systems -Push systems Mobile agents Distributed databases 2001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 12 (c)2001 Karl Aberer,Manfred Hauswirth 6
P2P tutorial (ESEC 2001) (c) 2001 Karl Aberer, Manfred Hauswirth 6 © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 11 So what’s new? P2P in a historical Context • The original Internet was designed as a P2P system – any 2 computers could send packets to each other • no firewalls / no network address translation • no asymmetric connections (V.90, ADSL, cable, etc.) – the back-then “killer apps” FTP and telnet are C/S but anyone could telnet/FTP anyone else – servers acted as clients and vice versa – cooperation was a central goal and “value”: no spam or exhaustive bandwidth consumption • Typical examples of “old-fashioned P2P”: – Usenet News – DNS • The emergence of P2P can be seen as a renaissance of the original Internet model © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 12 Related Approaches Related distributed information system approaches: – Event-based systems – Push systems – Mobile agents – Distributed databases
P2P tutorial (ESEC 2001) r Event-based (publish/subscribe)Systems System model -Components (peers)interact by generating and receiving events - Components declare interest in receiving specific (patterns of) events and are notified upon their occurrence Supports a highly flexible interaction between loosely-coupled components XY Subscribe to X followed by Y 2001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 13 Event-based vs.Peer-to-Peer ·Common properties: -symmetric communication style dynamic binding between producers and consumers Subscription to events ~"passive"queries -EB:notification P2P:active discovery Subscription language supports more sophisticated queries and pattern matching (event patterns with time dependencies) Event-based systems typically have a specialized event distribution infrastructure EB:2 node types,P2P:1 node type -EB infrastructure must be deployed 2001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 14 (c)2001 Karl Aberer,Manfred Hauswirth 7
P2P tutorial (ESEC 2001) (c) 2001 Karl Aberer, Manfred Hauswirth 7 © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 13 Event-based (publish/subscribe) Systems • System model – Components (peers) interact by generating and receiving events – Components declare interest in receiving specific (patterns of) events and are notified upon their occurrence – Supports a highly flexible interaction between loosely-coupled components Subscribe to X followed by Y X Y XY © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 14 Event-based vs. Peer-to-Peer • Common properties: – symmetric communication style – dynamic binding between producers and consumers • Subscription to events ~ “passive” queries – EB: notification – P2P: active discovery • Subscription language supports more sophisticated queries and pattern matching (event patterns with time dependencies) • Event-based systems typically have a specialized event distribution infrastructure – EB: 2 node types, P2P: 1 node type – EB infrastructure must be deployed
P2P tutorial (ESEC 2001) 和 Push Systems ·,A set of designated broadcasters offer information that is pre-grouped in channels (weather,news,etc. Receivers subscribe to channels of their interest and receive channel information as it is being "broadcast"(timely distribution) Receivers may have to pay prior to receiving the information(pay-per-view,flat fee,etc.) 。 Pull→push 82001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 15 Push Systems vs.Peer-to-Peer Asymmetric communication style (P2P:symmetric) Focus is on timely data distribution not on discovery Filtering may be deployed to reduce data transmission requirements Subscription to channels is prerequisite Producer/consumer binding is static Push systems require a specialized distribution infrastructure -Push:3 node types,P2P:1 node type -Push infrastructure must be deployed 2001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 16 (c)2001 Karl Aberer,Manfred Hauswirth 8
P2P tutorial (ESEC 2001) (c) 2001 Karl Aberer, Manfred Hauswirth 8 © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 15 Push Systems • A set of designated broadcasters offer information that is pre-grouped in channels (weather, news, etc.) • Receivers subscribe to channels of their interest and receive channel information as it is being “broadcast” (timely distribution) • Receivers may have to pay prior to receiving the information (pay-per-view, flat fee, etc.) • Pull → push © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 16 Push Systems vs. Peer-to-Peer • Asymmetric communication style (P2P: symmetric) • Focus is on timely data distribution not on discovery • Filtering may be deployed to reduce data transmission requirements • Subscription to channels is prerequisite • Producer/consumer binding is static • Push systems require a specialized distribution infrastructure – Push: 3 node types, P2P: 1 node type – Push infrastructure must be deployed
P2P tutorial (ESEC 2001) 和 Mobile Agents A mobile agent is a computational entity that moves around in a network at its own volition to accomplish a task on behalf of its owner can cooperate with other agents -"learns"("Whom to visit next?") Mobility (heterogeneous network!) Weak:code,data Strong:code,data,execution Stack 2001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 Mobile Agents vs.Peer-to-Peer Very similar in terms of search and navigation P2P:the peers propagate requests(search,update) MA:the nodes propagate the agents Mobile agent ~"active"query Mobile agent systems require a considerably more sophisticated environment mobile code support (heavy) security (protect the receiving node from malicious mobile agents and vice versa) In many domains P2P systems can take over more apt for distributed data management less requirements(sending code requires much bandwidth,security,etc.) 2001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 18 (c)2001 Karl Aberer,Manfred Hauswirth 9
P2P tutorial (ESEC 2001) (c) 2001 Karl Aberer, Manfred Hauswirth 9 © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 17 Mobile Agents • A mobile agent is a computational entity that moves around in a network at its own volition to accomplish a task on behalf of its owner – can cooperate with other agents – “learns” (“Whom to visit next?”) • Mobility (heterogeneous network!) – Weak: code, data – Strong: code, data, execution Stack © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 18 Mobile Agents vs. Peer-to-Peer • Very similar in terms of search and navigation – P2P: the peers propagate requests (search, update) – MA: the nodes propagate the agents – Mobile agent ~ “active” query • Mobile agent systems require a considerably more sophisticated environment – mobile code support (heavy) – security (protect the receiving node from malicious mobile agents and vice versa) • In many domains P2P systems can take over – more apt for distributed data management – less requirements (sending code requires much bandwidth, security, etc.)
P2P tutorial (ESEC 2001) Distributed Databases Fragmenting large databases (e.g.,relational) over physically distributed nodes ● Efficient processing of complex queries (e.g., SQL)by decomposing them Efficient update strategies (e.g.,lazy vs.eager) Consistent transactions (e.g.,2 phase commit) Normally approaches rely on central coordination 2001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 9 Distributed Databases vs.Peer-to-Peer Data distribution is a key issue for P2P systems Approaches in distributed DB that address scalability LH*family of scalable hash index structures [Litwin97] Snowball:scalable storage system for workstation clusters [Vingralek98] Fat-Btree:a scalable B-Tree for parallel DB [Yokota 9] Approaches in distributed DB that address autonomy (and scalability) Mariposa:distributed relational DBMS based on an underlying economic model [Stonebraker96] 2001 Karl Aberer,Manfred Hauswirth ESEC/FSE 2001 20 (c)2001 Karl Aberer,Manfred Hauswirth 10
P2P tutorial (ESEC 2001) (c) 2001 Karl Aberer, Manfred Hauswirth 10 © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 19 Distributed Databases • Fragmenting large databases (e.g., relational) over physically distributed nodes • Efficient processing of complex queries (e.g., SQL) by decomposing them • Efficient update strategies (e.g., lazy vs. eager) • Consistent transactions (e.g., 2 phase commit) • Normally approaches rely on central coordination © 2001 Karl Aberer, Manfred Hauswirth ESEC/FSE 2001 20 Distributed Databases vs. Peer-to-Peer • Data distribution is a key issue for P2P systems • Approaches in distributed DB that address scalability – LH* family of scalable hash index structures [Litwin97] – Snowball: scalable storage system for workstation clusters [Vingralek98] – Fat-Btree: a scalable B-Tree for parallel DB [Yokota 9] • Approaches in distributed DB that address autonomy (and scalability) – Mariposa: distributed relational DBMS based on an underlying economic model [Stonebraker96]