Transcript
- Slide 1
- Characteristics of Current P2P File-Sharing Systems (with a brief excursion into network measurement tools) Stefan Saroiu P. Krishna Gummadi Steven Gribble University of Washington
- Slide 2
- Peer-to-Peer Frenzy. Both research and industrial excitement: CAN, Chord, Past, Tapestry, JXTA, Farsite, Publius, Morpheus, AudioGalaxy. Basic premise: a wide-area, distributed system; voluntary, ad-hoc, dynamic; home-user peers exchange information (mostly large files). Many proposals, yet nobody knows the participating peers' characteristics and behavior.
- Slide 3
- Napster & Gnutella. [Figure: Napster and Gnutella architectures, with napster.com as the central server in the Napster diagram. Legend: S = server, P = peer, Q = query, R = response, D = file download.]
- Slide 4
- Methodology. 2 stages: 1. periodically crawl Gnutella/Napster to discover peers and their metadata; 2. feed the output from the crawl into measurement tools: bottleneck bandwidth (SProbe), latency (SProbe), peer availability (LF), degree of content sharing (Napster crawler).
- Slide 5
- Network Bandwidth Scenarios: network measurements; dynamic server/peer selection; P2P overlay formation or application-level multicast; placement of content replicas.
- Slide 6
- Network Bandwidth. 1. Throughput: the number of bytes transferred during a fixed interval of time. 2. Available bandwidth: the maximum attainable throughput of a newly started flow. 3. Bottleneck bandwidth: the maximum throughput ideally obtained across the slowest link. Hard to measure: throughput, available bandwidth. Easier to measure: bottleneck bandwidth.
- Slide 7
- One-Packet Model. [Figure: traversal time versus packet size for one probing packet; slope = $1 / \text{bandwidth}_{\text{bottleneck}}$.]
- Slide 8
- Packet-Pair Model. The time dispersion between the two packets is set by the bottleneck bandwidth (it is inversely proportional to it): $\Delta t = \text{size}_{\text{packet}} / \text{bandwidth}_{\text{bottleneck}}$.
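As a minimal sketch of the packet-pair arithmetic: two back-to-back packets leave the bottleneck link separated by the time it takes to transmit one of them, so the bandwidth is the packet size divided by that gap. Taking the median over several pairs is an assumption added here to damp outliers; it is not described on the slide, and the function names are hypothetical.

```python
from statistics import median


def packet_pair_bandwidth(packet_size_bytes, dispersion_s):
    """Bottleneck bandwidth in bits/s from the gap between two back-to-back packets."""
    if dispersion_s <= 0:
        raise ValueError("dispersion must be positive")
    return packet_size_bytes * 8 / dispersion_s


def estimate_from_pairs(packet_size_bytes, dispersions_s):
    """Median of the per-pair estimates (an assumed way to reduce noise)."""
    return median(packet_pair_bandwidth(packet_size_bytes, d) for d in dispersions_s)


# Hypothetical example: 1500-byte packets; two clean pairs and one pair
# stretched by cross traffic.
bandwidth = estimate_from_pairs(1500, [0.012, 0.012, 0.030])
```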
- Slide 9
- Vital Properties of an Ideal Tool. Accurate. Fast: 1 min/measurement is too slow. Scalable: flooding the network will not work. Works in uncooperative environments: can't deploy software at both endpoints.
- Slide 10
- Properties of an Ideal Tool. Active: existing traffic might not be suitable. TCP/UDP based: ICMP is heavily filtered. Cross-traffic resilient: should detect and give up in the face of cross traffic. Works on asymmetric paths. Flexible to bandwidth changes. Controlled evaluations.
- Slide 11
- Current Tools. [Table: desired properties (rows) against tools (columns). Tools: Pathchar, pchar, clink, bprobe, pathrate, Nettimer, SProbe. Desired properties: Accurate; Fast; Uncooperative Environments (*); Scalable; TCP/UDP; Active; Cross-traffic (*); Asymmetric; Bandwidth changes; Controlled Evaluations.]
- Slide 12
- SProbe Uses TCP Tricks. From local host to remote host: no cooperation needed. [Figure: Local and Remote hosts exchanging SYN packets and RST packets.]
- Slide 24
- SProbe Uses TCP Tricks. From remote host to local host: involuntary cooperation of the application layer. [Figure: Local and Remote (Web) hosts; HTTP GET request, data packets, ACK (last data packet).]
- Slide 25
- SProbe's Accuracy
- Slide 26
- Slide 27
- More SProbe: bottleneck bandwidth, latency. Availability (LF): send a SYN packet, then receive: SYN/ACK (host active); RST (host inactive, but online); nothing (host offline).
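The availability rule can be written as a small lookup. This sketch only covers the classification step: the `reply` label is a hypothetical input produced by whatever code sends the SYN probe, and none of the names below come from LF itself.

```python
from enum import Enum
from typing import Optional


class HostState(Enum):
    ACTIVE = "active"                     # SYN/ACK received
    INACTIVE_ONLINE = "inactive, online"  # RST received
    OFFLINE = "offline"                   # nothing received


def classify_reply(reply: Optional[str]) -> HostState:
    """Map the reply to a SYN probe onto the three states listed on the slide.

    `reply` is assumed to be "SYN/ACK", "RST", or None when the probe timed out.
    """
    if reply is None:
        return HostState.OFFLINE
    if reply == "SYN/ACK":
        return HostState.ACTIVE
    if reply == "RST":
        return HostState.INACTIVE_ONLINE
    raise ValueError(f"unexpected reply: {reply!r}")
```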
- Slide 28
- P2P Characteristics How many peers are server-like? Who are the free-riders? Do peers tend to lie? How robust is the Gnutella overlay?
- Slide 30
- Higher Downstream Bandwidths
- Slide 31
- Most Peers have Cable Modem-like Bandwidths
- Slide 32
- Yes, Lots of Cable Modems
- Slide 33
- Closest 20% are 4X closer than furthest 20%
- Slide 34
- Two horizontal bands East Coast and Transoceanic Links
- Slide 35
- Availability. Periodic probes yield data like: [Figure: timeline of segments, each with a start and an end.]
- Slide 36
- Availability. Periodic probes yield data like: [Figure: timeline of segments with start and end marks, labelled 12 hours.] Divide the trace into two periods. Keep segments that start in the 1st period and end in the 1st or 2nd period. Draw conclusions only on segments no longer than the 2nd period.
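The filtering rule above can be sketched as follows. The session representation (start and end times, with `None` for an end that was never observed) and the 24-hour example trace are assumptions made for illustration.

```python
def usable_session_lengths(sessions, trace_start, trace_end):
    """Return the lengths of sessions that satisfy the filtering rule.

    sessions: iterable of (start, end) pairs; end is None when the session
    was still running when the trace stopped.
    """
    midpoint = trace_start + (trace_end - trace_start) / 2
    second_period = trace_end - midpoint
    lengths = []
    for start, end in sessions:
        if end is None or end > trace_end:
            continue  # end not observed, so the length is unknown
        if not (trace_start <= start < midpoint):
            continue  # must start inside the first period
        length = end - start
        if length <= second_period:
            lengths.append(length)  # only lengths the trace can fully observe
    return lengths


# Hypothetical 24-hour trace, times in hours.
observed = [(1, 2), (5, 20), (13, 14), (10, None), (2, 11)]
lengths = usable_session_lengths(observed, 0, 24)
```

Any session that starts in the first period and is no longer than the second period is guaranteed to end before the trace does, so short sessions are not over-represented relative to long ones within that range.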
- Slide 37
- Median Session is about one hour (same for both systems)
- Slide 38
- Gnutella/Napster Uptime
- Slide 39
- P2P Characteristics How many peers are server-like? Who are the free-riders? Do peers tend to lie? How robust is the Gnutella overlay?
- Slide 40
- Who Has the Files?
- Slide 41
- Slide 42
- Correlation of Free-Riding with B/W
- Slide 43
- P2P Characteristics How many peers are server-like? Who are the free-riders? Do peers tend to lie? How robust is the Gnutella overlay?
- Slide 44
- It's all about incentive!
- Slide 45
- Lack of Knowledge is Universal
- Slide 46
- P2P Characteristics How many peers are server-like? Who are the free-riders? Do peers tend to lie? How robust is the Gnutella overlay?
- Slide 47
- Power-Law Networks are here to Stay. Barabasi and Albert showed that networks which grow by continuous addition of new nodes and exhibit preferential attachment (the likelihood of connecting to a node depends on the node's degree) have a power-law distribution of vertex degree. Examples: Internet, WWW, Gnutella.
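A minimal simulation of growth with preferential attachment is sketched below. It is an illustration of the growth rule only, with hypothetical function names and parameters, and it says nothing about how the Gnutella topology was actually measured.

```python
import random
from collections import Counter


def grow_preferential(n_nodes, links_per_node=1, seed=0):
    """Grow a graph one node at a time; each new node links to existing nodes
    chosen with probability proportional to their current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]  # start from two connected nodes
    # Each node appears in `endpoints` once per incident edge, so a uniform
    # draw from this list is a degree-proportional draw over nodes.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        targets = set()
        while len(targets) < min(links_per_node, new):
            targets.add(rng.choice(endpoints))
        for target in sorted(targets):
            edges.append((new, target))
            endpoints.extend((new, target))
    return edges


def node_degrees(edges):
    """Count how many edges touch each node."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return degree


edges = grow_preferential(2000)
degree = node_degrees(edges)
# Histogram of degrees: many low-degree nodes, a few high-degree hubs.
histogram = Counter(degree.values())
```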
- Slide 48
- Resilience to Failures. Power-law networks (Cohen et al.): very resilient in the face of random node failures (a giant spanning cluster still exists); fairly resilient in the face of cascading failures; very vulnerable in the face of orchestrated attacks (towards high-degree nodes).
- Slide 49
- Gnutella, Fri Feb 16 05:21:52-05:23:22 PST, 1771 hosts. Popular sites: 212.239.171.174, adams-00-305a.Stanford.EDU, 0.0.0.0
- Slide 50
- 30% random failures: 1771 471 294 hosts, Fri Feb 16 05:21:52-05:23:22 PST
- Slide 51
- 4% orchestrated failures, Fri Feb 16 05:21:52-05:23:22 PST, 1771 - 63 hosts
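The contrast between random failures and orchestrated failures can be reproduced on a toy graph by removing nodes and measuring the largest connected component that remains. The hub-and-leaf overlay below is a hypothetical stand-in, not the crawled Gnutella topology, and all names are illustrative.

```python
from collections import deque


def toy_overlay(hubs=3, leaves_per_hub=10):
    """Build a small hub-and-leaf graph as an adjacency dict of sets."""
    adj = {}

    def link(a, b):
        adj.setdefault(a, set()).add(b)
        adj.setdefault(b, set()).add(a)

    for h in range(hubs):
        if h > 0:
            link(f"hub{h - 1}", f"hub{h}")  # chain the hubs together
        for leaf in range(leaves_per_hub):
            link(f"hub{h}", f"leaf{h}-{leaf}")
    return adj


def largest_component(adj, removed=frozenset()):
    """Size of the largest connected component after deleting `removed` nodes."""
    seen = set(removed)  # removed nodes are never visited
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for neighbour in adj[node]:
                if neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        best = max(best, size)
    return best


def highest_degree_nodes(adj, k):
    """The k nodes with the most neighbours (the orchestrated-attack targets)."""
    return set(sorted(adj, key=lambda n: len(adj[n]), reverse=True)[:k])


overlay = toy_overlay()
after_attack = largest_component(overlay, highest_degree_nodes(overlay, 3))
after_leaf_failures = largest_component(overlay, {"leaf0-0", "leaf1-0", "leaf2-0"})
```

Removing the three hubs leaves only isolated leaves, while removing three leaves keeps the rest of the overlay connected.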
- Slide 52
- Discussion. Heterogeneity: 3 orders of magnitude in bandwidth (50Kbps-100Mbps); 6 orders of magnitude in latency (10us-10s); >4 orders of magnitude in availability (1%-99.99%). Peers should not be treated as equals.
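The bandwidth and latency spans can be checked with a base-10 logarithm of the ratio between the two ends of each range. The availability figure is left out of this sketch because the slide does not say how that span is computed; the helper name is hypothetical.

```python
import math


def orders_of_magnitude(low, high):
    """Base-10 logarithm of the ratio between the ends of a range."""
    return math.log10(high / low)


bandwidth_span = orders_of_magnitude(50e3, 100e6)  # 50 Kbps to 100 Mbps, in bits/s
latency_span = orders_of_magnitude(10e-6, 10.0)    # 10 us to 10 s, in seconds
```

The bandwidth ratio is 2000, a little over three orders of magnitude, and the latency ratio is one million, or six orders.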
- Slide 53
- Cooperating, Well-Behaved Peers. Incentive: game-theoretic approaches for enforcing local behavior for global benefit. System enforcement: peers can measure each other's characteristics (SProbe) and enforce the reported ones; a reported 56Kbps peer should not download content at a higher speed.
- Slide 54
- Feedback to Current Proposals. CAN, Chord, Past: great memory and lookup algorithms, log(N) time and space, at the price of maintaining a rigid network structure (hypercubes, butterflies, Plaxton trees). It is unclear how the network structure is maintained given the heterogeneity and dynamics of peers. Conjecture: these networks will have a hard time stabilizing and will need lots of routine maintenance traffic.
- Slide 55
- Instead, Gnutella. Easy join procedure: this simplicity gave Gnutella its power-law shape. Easy-to-implement protocol (broadcast). Lots of maintenance traffic already, although the protocol has become smarter with its subsequent versions. Searching is a nightmare.
- Slide 56
- Document Popularity. Follows a Zipf distribution: long-tailed. Popular documents become more popular with Napster/Gnutella. Currently, queries need to be resubmitted in the hope that someone will answer. Wish-list based system.
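To give a feel for a Zipf popularity curve, the sketch below assigns the document of rank r a share proportional to 1 / r^s and sums the share of the most popular documents. The exponent s = 1 and the catalogue of 1000 documents are assumptions for illustration; the slide gives neither.

```python
def zipf_shares(n_docs, exponent=1.0):
    """Fraction of requests going to each rank under a Zipf law."""
    weights = [1.0 / rank ** exponent for rank in range(1, n_docs + 1)]
    total = sum(weights)
    return [w / total for w in weights]


shares = zipf_shares(1000)
top10_share = sum(shares[:10])       # share of requests for the 10 most popular documents
tail_share = sum(shares[100:])       # share spread over the 900 least popular documents
```

Under these assumptions the ten most popular documents draw well over a third of all requests, while the long tail still accounts for a substantial share.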
- Slide 57
- Wide-area Network Measurements. Sending a few packets can be identified as hostile behavior. Even a few SYN packets are sufficient to trigger software firewalls: a dialogue box pops up ("possible scan from washington.edu, click OK or Cancel"). Many confused, angry, threatening e-mails were sent to many people (security, root, Ed): active Internet measurements are not simple to perform.
- Slide 58
- Excerpt from e-mail Thank you for your reply. Unfortunately, I did not authorise anybody from washington.edu to attempt to crack into my computer. Attempting to break into computers is a crime in Australia. Please advise the names and contact details of the people involved in this "research" so that I can contact the Australian Federal Police, who will no doubt contact your Federal Bureau of Investigation to investigate this incident and institute criminal proceedings against those concerned.
- Slide 59
- Current Work. Quantify and show that current proposals are too rigid for the dynamics of Napster/Gnutella-like peers. Wish-list, delayed exchange system: a big distributed scheduling problem. SGet: a downloading tool with automatic server selection; no bandwidth is wasted.
- Slide 60
- Questions? Beautiful Sieg Hall Pride of UW