Characteristics of Current P2P File-Sharing Systems (with a brief excursion into network measurement tools) Stefan Saroiu P. Krishna Gummadi Steven Gribble University of Washington

Post on 20-Dec-2015

TRANSCRIPT

  • Slide 1
  • Characteristics of Current P2P File-Sharing Systems (with a brief excursion into network measurement tools) Stefan Saroiu P. Krishna Gummadi Steven Gribble University of Washington
  • Slide 2
  • Peer-to-Peer Frenzy. Both research and industrial excitement: CAN, Chord, Past, Tapestry, JXTA, Farsite, Publius, Morpheus, AudioGalaxy. Basic premise: wide-area, distributed system; voluntary, ad-hoc, dynamic; home-user peers exchange information (mostly large files). Many proposals, yet nobody knows the participating peers' characteristics and behavior.
  • Slide 3
  • Napster & Gnutella [Diagram: Napster, with peers grouped around the napster.com servers, next to Gnutella, with peers only. Legend: P = peer, S = server, Q = query, R = response, D = file download]
  • Slide 4
  • Methodology, 2 stages: 1. periodically crawl Gnutella/Napster to discover peers and their metadata; 2. feed output from crawl into measurement tools: bottleneck bandwidth (SProbe), latency (SProbe), peer availability (LF), degree of content sharing (Napster crawler)
  • Slide 5
  • Network Bandwidth Scenarios: network measurements; dynamic server/peer selection; P2P overlay formation or application-level multicast; placement of content replicas
  • Slide 6
  • Network Bandwidth: 1. Throughput: number of bytes transferred during a fixed interval of time. 2. Available bandwidth: the maximum attainable throughput of a newly started flow. 3. Bottleneck bandwidth: maximum throughput ideally obtained across the slowest link. Hard to measure: throughput, available bandwidth. Easier to measure: bottleneck bandwidth.
  • Slide 7
  • One-Packet Model [Plot: traversal time of a probing packet against packet size; slope = 1 / bottleneck bandwidth]
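The relation on this slide can be sketched as a straight-line fit: if traversal time grows linearly with packet size, the reciprocal of the fitted slope is a bottleneck bandwidth estimate. The function name `bottleneck_from_one_packet`, the units, and the sample values below are illustrative assumptions, not part of the talk or of any of the tools it names.

```python
def bottleneck_from_one_packet(samples):
    """Least-squares fit of traversal time (seconds) against packet size (bytes).

    Returns (bandwidth, latency): bandwidth is 1 / slope in bytes per second,
    latency is the intercept, i.e. the part of the delay that does not
    depend on packet size.
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two (size, time) samples")
    mean_size = sum(size for size, _ in samples) / n
    mean_time = sum(time for _, time in samples) / n
    sxx = sum((size - mean_size) ** 2 for size, _ in samples)
    if sxx == 0:
        raise ValueError("packet sizes must differ")
    sxy = sum((size - mean_size) * (time - mean_time) for size, time in samples)
    slope = sxy / sxx
    if slope <= 0:
        raise ValueError("traversal time does not grow with packet size")
    intercept = mean_time - slope * mean_size
    return 1.0 / slope, intercept


# Hypothetical noise-free samples: 10 ms fixed latency, 125,000 bytes/s bottleneck.
samples = [(size, 0.010 + size / 125_000) for size in (100, 500, 1000, 1500)]
bandwidth, latency = bottleneck_from_one_packet(samples)
```

Real measurements are noisy, so a practical tool would need many probes per size and some filtering before fitting; this sketch only shows the arithmetic.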
  • Slide 8
  • Packet-Pair Model [Diagram: two back-to-back packets crossing the bottleneck link]. Time dispersion is inversely proportional to bottleneck bandwidth: Δt = packet size / bottleneck bandwidth
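Rearranging the packet-pair relation gives bottleneck bandwidth = packet size / Δt. A minimal sketch of that arithmetic follows; the function name and the example numbers are assumptions chosen for illustration.

```python
def packet_pair_bandwidth(packet_size_bytes, dispersion_seconds):
    """Estimate bottleneck bandwidth in bits per second from one packet pair.

    dispersion_seconds is the spacing between the two packets after they
    have crossed the bottleneck link.
    """
    if dispersion_seconds <= 0:
        raise ValueError("dispersion must be positive")
    return packet_size_bytes * 8 / dispersion_seconds


# Hypothetical pair: 1500-byte packets arriving 12 ms apart (about 1 Mbps).
estimate_bps = packet_pair_bandwidth(1500, 0.012)
```

Cross traffic can stretch or compress the spacing, which is why the talk lists cross-traffic resilience among the desired tool properties.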
  • Slide 9
  • Vital Properties of an Ideal Tool. Accurate. Fast: 1 min/measurement is too slow. Scalable: flooding the network will not work. Works in uncooperative environments: can't deploy software at both endpoints.
  • Slide 10
  • Properties of an Ideal Tool. Active: existing traffic might not be suitable. TCP/UDP based: ICMP is heavily filtered. Cross-traffic resilient: should detect and give up in the face of cross traffic. Works on asymmetric paths. Flexible to bandwidth changes. Controlled evaluations.
  • Slide 11
  • Current Tools vs. Desired Properties [Table; per-cell marks not recoverable. Tools (columns): pathchar, pchar, clink, bprobe, pathrate, Nettimer, SProbe. Properties (rows): Accurate, Fast, Uncooperative Environments, Scalable, TCP/UDP, Active, Cross-traffic, Asymmetric, Bandwidth changes, Controlled Evaluations]
  • Slide 12
  • SProbe Uses TCP Tricks. From local host to remote host: no cooperation needed. [Diagram: Local and Remote exchanging SYN packets and RST packets]
  • Slide 24
  • SProbe Uses TCP Tricks. From remote to local: involuntary cooperation of the application layer. [Diagram: Local and Remote (Web), with an HTTP GET request, data packets, and an ACK (last data packet)]
  • Slide 25
  • SProbe's Accuracy
  • Slide 26
  • Slide 27
  • More SProbe: Bottleneck Bandwidth; Latency; Availability (LF): send a SYN packet and receive: SYN/ACK: host active; RST: host inactive, but online; nothing: host offline
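The three-way availability rule on this slide can be sketched as follows. This is not the LF tool: `classify_reply` and `probe` are hypothetical names, and `probe` approximates a bare SYN with an ordinary TCP connect from the standard `socket` module, which completes the handshake instead of stopping after the first reply.

```python
import socket


def classify_reply(reply):
    """Map the reply to a SYN probe onto the slide's three host states."""
    states = {
        "SYN/ACK": "active",
        "RST": "inactive but online",
        None: "offline",
    }
    if reply not in states:
        raise ValueError(f"unexpected reply: {reply!r}")
    return states[reply]


def probe(host, port, timeout=3.0):
    """Approximate the probe with a full TCP connect (an assumption of this sketch)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return classify_reply("SYN/ACK")  # handshake completed
    except ConnectionRefusedError:
        return classify_reply("RST")  # the host answered, but nothing listens
    except OSError:
        # Timeouts and unreachable hosts are both treated as "no reply" here.
        return classify_reply(None)
```

A call such as `probe("192.0.2.10", 6346)` (a documentation address and an assumed port) would return one of the three state strings.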
  • Slide 28
  • P2P Characteristics How many peers are server-like? Who are the free-riders? Do peers tend to lie? How robust is the Gnutella overlay?
  • Slide 30
  • Higher Downstream Bandwidths
  • Slide 31
  • Most Peers have Cable Modem-like Bandwidths
  • Slide 32
  • Yes, Lots of Cable Modems
  • Slide 33
  • Closest 20% are 4X closer than furthest 20%
  • Slide 34
  • Two horizontal bands: East Coast and Transoceanic Links
  • Slide 35
  • Availability: periodic probes yield data like: [Diagram: session segments, each with a start and an end]
  • Slide 36
  • Availability: periodic probes yield data like: [Diagram: session segments, each with a start and an end; 12 hours marked]. Divide the trace into two periods. Keep segments that start in the 1st period and end in the 1st or 2nd period. Draw conclusions only on segments no longer than the 2nd period.
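The filtering rule above can be sketched as a small function. The representation of a session as a `(start, end)` pair in hours, the name `usable_sessions`, and the two 12-hour periods in the example are assumptions made for illustration. One reading of the rule is that any session starting in the first period and no longer than the second period is guaranteed to end inside the trace, so short sessions are not over-counted.

```python
def usable_sessions(sessions, split, end):
    """Keep sessions suitable for session-length statistics.

    sessions: iterable of (start, end) times measured from the trace start.
    split:    boundary between the 1st and 2nd period.
    end:      end of the trace (end of the 2nd period).
    """
    max_length = end - split  # length of the 2nd period
    kept = []
    for start, stop in sessions:
        starts_in_first = 0 <= start < split
        ends_in_trace = start <= stop <= end
        if starts_in_first and ends_in_trace and (stop - start) <= max_length:
            kept.append((start, stop))
    return kept


# Hypothetical trace: two 12-hour periods, times in hours.
observed = [(1, 3), (5, 20), (13, 15), (10, 20)]
kept = usable_sessions(observed, split=12, end=24)
```

Here `(5, 20)` is dropped because it is longer than the second period and `(13, 15)` is dropped because it starts in the second period.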
  • Slide 37
  • Median Session is about one hour (same for both systems)
  • Slide 38
  • Gnutella/Napster Uptime
  • Slide 39
  • P2P Characteristics How many peers are server-like? Who are the free-riders? Do peers tend to lie? How robust is the Gnutella overlay?
  • Slide 40
  • Who Has the Files?
  • Slide 41
  • Slide 42
  • Correlation of Free-Riding with B/W
  • Slide 43
  • P2P Characteristics How many peers are server-like? Who are the free-riders? Do peers tend to lie? How robust is the Gnutella overlay?
  • Slide 44
  • It's all about incentive!
  • Slide 45
  • Lack of Knowledge is Universal
  • Slide 46
  • P2P Characteristics How many peers are server-like? Who are the free-riders? Do peers tend to lie? How robust is the Gnutella overlay?
  • Slide 47
  • Power-Law Networks are Here to Stay. Barabasi and Albert showed that networks which grow by continuous addition of new nodes and exhibit preferential attachment (likelihood of connecting to a node depends on the node's degree) have a power-law distribution of vertex degree. Examples: Internet, WWW, Gnutella.
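Growth with preferential attachment can be sketched in a few lines. This is a simplified variant in which every new node adds exactly one link; the function names, the seed, and the graph size are assumptions for illustration, not the setup used in the talk.

```python
import random
from collections import Counter


def preferential_attachment_graph(n_nodes, seed=0):
    """Grow a graph node by node; each new node links to one existing node
    picked with probability proportional to that node's current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]
    # Each node appears in this list once per incident edge, so a uniform
    # choice from it is a degree-proportional choice of node.
    endpoints = [0, 1]
    for new in range(2, n_nodes):
        target = rng.choice(endpoints)
        edges.append((new, target))
        endpoints.extend((new, target))
    return edges


def degree_histogram(edges):
    """Return a Counter mapping degree -> number of nodes with that degree."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


edges = preferential_attachment_graph(2000)
histogram = degree_histogram(edges)
```

Plotting `histogram` on log-log axes should show the heavy tail: most nodes have degree 1, while a few early nodes collect many links.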
  • Slide 48
  • Resilience to Failures. Power-law networks (Cohen et al.): very resilient in the face of random node failures (a giant spanning cluster still exists); fairly resilient in the face of cascading failures; very vulnerable in the face of orchestrated attacks (towards high-degree nodes).
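The contrast between random failures and attacks on high-degree nodes can be illustrated by measuring the largest connected component after removing nodes. The toy graph below (two linked hubs with five leaves each) and the name `largest_component` are assumptions for illustration, not data from the Gnutella crawl.

```python
def largest_component(edges, removed=()):
    """Size of the largest connected component once `removed` nodes are gone."""
    removed = set(removed)
    nodes = {v for edge in edges for v in edge} - removed
    adjacency = {v: set() for v in nodes}
    for a, b in edges:
        if a in nodes and b in nodes:
            adjacency[a].add(b)
            adjacency[b].add(a)
    best = 0
    seen = set()
    for start in nodes:
        if start in seen:
            continue
        # Depth-first search over one component.
        seen.add(start)
        stack = [start]
        size = 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adjacency[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best


# Toy graph: hubs 0 and 1 are linked, and each hub has five leaves.
edges = [(0, 1)]
edges += [(0, leaf) for leaf in range(2, 7)]
edges += [(1, leaf) for leaf in range(7, 12)]

intact = largest_component(edges)
after_leaf_failures = largest_component(edges, removed={2, 3, 7, 8})
after_hub_attack = largest_component(edges, removed={0, 1})
```

Losing four leaves leaves the remaining eight nodes connected, while removing only the two hubs leaves ten isolated nodes.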
  • Slide 49
  • Gnutella, Fri Feb 16 05:21:52-05:23:22 PST, 1771 hosts. Popular sites: 212.239.171.174, adams-00-305a.Stanford.EDU, 0.0.0.0
  • Slide 50
  • 30% random failures (Fri Feb 16 05:21:52-05:23:22 PST): 1771 471 294 hosts
  • Slide 51
  • 4% orchestrated failures (Fri Feb 16 05:21:52-05:23:22 PST): 1771 - 63 hosts
  • Slide 52
  • Discussion. Heterogeneity: 3 orders of magnitude of bandwidth (50 Kbps-100 Mbps); 6 orders of magnitude of latency (10 us-10 s); >4 orders of magnitude in availability (1%-99.99%). Peers should not be treated as equals.
  • Slide 53
  • Cooperating, Well-Behaved Peers. Incentive: game-theoretic approaches of enforcing local behavior for global benefit. System enforcement: peers can measure each other's characteristics (SProbe) and enforce the reported ones: a peer that reports 56 Kbps should not download content at a higher speed.
  • Slide 54
  • Feedback to Current Proposals. CAN, Chord, Past: great memory and lookup algorithms (log(N) time and space), at the price of maintaining a rigid network structure: hypercubes, butterflies, Plaxton trees. It is unclear how the network structure is maintained given the heterogeneity and dynamics of peers. Conjecture: these networks will have a hard time stabilizing and will need lots of routine maintenance traffic.
  • Slide 55
  • Instead: Gnutella. Easy join procedure: this simplicity gave Gnutella its power-law shape. Easy-to-implement protocol (broadcast). Lots of maintenance traffic already, although the protocol has become smarter with its subsequent versions. Searching is a nightmare.
  • Slide 56
  • Document Popularity. Follows a Zipf distribution (long-tailed). Popular documents become more popular with Napster/Gnutella. Currently, users need to resubmit queries in the hope that someone will answer. Wish-list based system.
  • Slide 57
  • Wide-area Network Measurements. Sending a few packets can be identified with hostile behavior. Even a few SYN packets are sufficient to trigger software firewalls: a dialogue box pops up: "possible scan from washington.edu, click OK or Cancel". Many confused, angry, threatening e-mails were sent to many people (security, root, Ed). Active Internet measurements are not simple to perform.
  • Slide 58
  • Excerpt from e-mail Thank you for your reply. Unfortunately, I did not authorise anybody from washington.edu to attempt to crack into my computer. Attempting to break into computers is a crime in Australia. Please advise the names and contact details of the people involved in this "research" so that I can contact the Australian Federal Police, who will no doubt contact your Federal Bureau of Investigation to investigate this incident and institute criminal proceedings against those concerned.
  • Slide 59
  • Current Work. Quantify and show that current proposals are too rigid for Napster/Gnutella-like peers' dynamics. Wish-list, delayed exchange system: a big distributed scheduling problem. SGet: a downloading tool with automatic server selection; no bandwidth is wasted.
  • Slide 60
  • Questions? [Photo caption: Beautiful Sieg Hall, Pride of UW]