Carnegie Mellon Peer-to-Peer 15-441

Scaling Problem
• Millions of clients ⇒ server and network meltdown

P2P System
• Leverage the resources of client machines (peers)
  – Computation, storage, bandwidth

Why p2p?
• Scaling: create a system whose capacity grows with the number of clients, automatically!
• Self-managing
  – This aspect is attractive for corporate/datacenter needs
  – e.g., Amazon’s 100,000-ish machines, Google’s 300k+
• Harness lots of “spare” capacity at end-hosts
• Eliminate centralization
  – Robust to failures, etc.
  – Robust to censorship, politics & legislation??
  – Create apps/services without having huge resources

Today’s Goal
• p2p is hot.
• There are tons and tons of instances
• But that’s not the point
• Identify fundamental techniques useful in p2p settings
• Understand the challenges
• Look at the (current!) boundaries of where p2p is particularly useful

Outline
• p2p file sharing techniques
  – Downloading: whole-file vs. chunks
  – Searching
    • Centralized index (Napster, etc.)
    • Flooding (Gnutella, etc.)
    • Smarter flooding (KaZaA, …)
    • Routing (Freenet, etc.)
• Uses of p2p: what works well, what doesn’t?
  – Servers vs. arbitrary nodes
  – Hard state (backups!) vs. soft state (caches)
• Challenges
  – Fairness, freeloading, security, …

Searching & Fetching
Human: “I want to watch that great 80s cult classic ‘Better Off Dead’”
1. Search: “better off dead” -> better_off_dead.mov or -> 0x539fba83ajdeadbeef
2. Locate sources of better_off_dead.mov
3. Download the file from them
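
The three steps above can be sketched as a toy, in-memory model. This is only an illustration under stated assumptions: the class `ToyNetwork`, its dictionaries, and the use of a SHA-1 digest as the content key are hypothetical choices made for the example, not something the slides prescribe.

```python
import hashlib


def content_key(data: bytes) -> str:
    """Derive a content key from the file bytes (assumption: SHA-1 hex digest)."""
    return hashlib.sha1(data).hexdigest()


class ToyNetwork:
    """Hypothetical in-memory stand-in for a p2p network."""

    def __init__(self):
        self.name_index = {}   # lowercase title -> content key
        self.sources = {}      # content key -> set of peers holding the file
        self.peer_store = {}   # (peer, content key) -> file bytes

    def publish(self, peer, title, data):
        """A peer announces that it holds `data` under `title`."""
        key = content_key(data)
        self.name_index[title.lower()] = key
        self.sources.setdefault(key, set()).add(peer)
        self.peer_store[(peer, key)] = data
        return key

    def search(self, query):
        """Step 1: map a human query to a content key (exact title match only)."""
        return self.name_index.get(query.lower())

    def locate(self, key):
        """Step 2: list the peers that claim to hold the content."""
        return sorted(self.sources.get(key, ()))

    def download(self, key):
        """Step 3: fetch from the first source whose bytes match the key."""
        for peer in self.locate(key):
            data = self.peer_store.get((peer, key))
            if data is not None and content_key(data) == key:
                return data
        return None
```

Keeping search, locate, and download as separate calls mirrors the slide: each step can be implemented by a different mechanism (central index, flooding, routing) without changing the other two.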

Searching
[Figure: nodes N1–N6 connected across the Internet. A publisher holds Key=“title”, Value=MP3 data…; a client issues Lookup(“title”) and needs to find it (?).]

Search Approaches
• Centralized
• Flooding
• A hybrid: flooding between “Supernodes”
• Structured
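
To make the flooding approach concrete, here is a minimal sketch of a TTL-limited flood over an overlay graph. The function name `flood_search`, the adjacency-dict representation, and the message-counting convention are all assumptions for illustration; real protocols differ in detail. A centralized index, by contrast, would answer the same query with a single lookup at one server.

```python
from collections import deque


def flood_search(neighbors, files, start, title, ttl):
    """Flood a query outward from `start`, at most `ttl` hops.

    neighbors: dict mapping node -> list of neighbor nodes (hypothetical overlay)
    files:     dict mapping node -> set of titles stored at that node
    Returns (nodes holding `title`, number of query messages sent).
    """
    visited = {start}
    frontier = deque([(start, ttl)])
    hits = []
    messages = 0
    while frontier:
        node, remaining = frontier.popleft()
        if title in files.get(node, ()):
            hits.append(node)
        if remaining == 0:
            continue  # TTL exhausted: do not forward any further
        for peer in neighbors.get(node, ()):
            messages += 1  # every forwarded copy costs one message
            if peer not in visited:  # duplicates are received but not re-flooded
                visited.add(peer)
                frontier.append((peer, remaining - 1))
    return hits, messages
```

The message count grows with the number of overlay edges reached, which is the cost that the hybrid “Supernodes” and structured approaches try to reduce; a small TTL limits that cost but can miss content that is farther away.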

Different types of searches
• Needles vs. Haystacks
  – Searching for top 40, or an obscure punk track from 1981 that nobody’s heard of?
• Search expressiveness
  – Whole word? Regular expressions? File names? Attributes? Whole-text search?
    • (e.g., p2p gnutella or p2p google?)
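
The levels of search expressiveness listed above can be contrasted with three small matchers. This is a sketch only: the function names and the attribute dictionary are hypothetical, and a real system would also have to decide where each kind of match is evaluated (at an index, or at every peer).

```python
import re


def whole_word_match(query, filename):
    """True if every word of the query appears as a whole word in the file name."""
    words = re.findall(r"[a-z0-9]+", filename.lower())
    return all(word in words for word in query.lower().split())


def regex_match(pattern, filename):
    """True if the regular expression matches somewhere in the file name."""
    return re.search(pattern, filename, re.IGNORECASE) is not None


def attribute_match(wanted, attributes):
    """True if every wanted attribute is present with the same value."""
    return all(attributes.get(name) == value for name, value in wanted.items())
```

For example, `whole_word_match("better dead", "better_off_dead.mov")` holds, while the partial word `"bet"` does not match; a regular expression such as `r"^better.*\.mov$"` can express constraints that whole-word matching cannot.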