Page 1: Peer-to-Peer

Peer-to-Peer

Credits:

Slides adapted from J. Pang, B. Richardson, I. Stoica, M. Cuenca, B. Cohen, S. Ramesh, D. Qui, R. Shrikant, Xiaoyan Li and R. Martin.

Page 2: Peer-to-Peer

Reminders

• HW1 due next class (9/14)

• PA1 to be assigned next class.

Page 3: Peer-to-Peer

Intro

• Quickly grown in popularity
  – Hundreds of file sharing applications
  – 35 million American adults use P2P networks -- 29% of all Internet users in US!
  – Audio/Video transfer now dominates traffic on the Internet
• But what is P2P?
  – Searching or location? -- DNS, Google!
  – Computers “Peering”? -- Server Clusters, IRC Networks, Internet Routing!
  – Clients with no servers? -- Doom, Quake!

Page 4: Peer-to-Peer

Intro (2)

• Fundamental difference: Take advantage of resources at the edges of the network
• What’s changed:
  – End-host resources have increased dramatically
  – Broadband connectivity now common
• What hasn’t:
  – Deploying infrastructure still expensive

Page 5: Peer-to-Peer

Overview

• Centralized Database
  – Napster
• Query Flooding
  – Gnutella
• Intelligent Query Flooding
  – KaZaA
• Swarming
  – BitTorrent

Page 6: Peer-to-Peer

The Lookup Problem

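[Figure: a Publisher holding Key=“title”, Value=MP3 data and a Client issuing Lookup(“title”) are attached to an Internet of nodes N1 to N6; the open question is which node the client should ask.]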

Page 7: Peer-to-Peer

The Lookup Problem (2)

• Common Primitives:
  – Join: how do I begin participating?
  – Publish: how do I advertise my file?
  – Search: how do I find a file?
  – Fetch: how do I retrieve a file?

Page 8: Peer-to-Peer

Next Topic...

• Centralized Database
  – Napster
• Query Flooding
  – Gnutella
• Intelligent Query Flooding
  – KaZaA
• Swarming
  – BitTorrent

Page 9: Peer-to-Peer

Napster: History

• In 1999, S. Fanning launches Napster

• Peaked at 1.5 million simultaneous users

• Jul 2001, Napster shuts down

Page 10: Peer-to-Peer

Napster: Overview

• Centralized Database:
  – Join: on startup, client contacts central server
  – Publish: reports list of files to central server
  – Search: query the server => return someone that stores the requested file
  – Fetch: get the file directly from peer
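The four primitives above can be sketched as a toy in-memory index. This is only an illustration, not Napster's actual protocol: the class name `CentralIndex`, its methods, and the `leave` operation are assumptions made for the example.

```python
from collections import defaultdict


class CentralIndex:
    """Toy stand-in for a Napster-style central server (hypothetical names)."""

    def __init__(self):
        # filename -> set of peer addresses that reported having it
        self.locations = defaultdict(set)

    def publish(self, peer_addr, filenames):
        # Publish: a peer reports its file list to the central server.
        for name in filenames:
            self.locations[name].add(peer_addr)

    def search(self, filename):
        # Search: a single dictionary lookup on the server.
        return sorted(self.locations.get(filename, set()))

    def leave(self, peer_addr):
        # The server holds all the state, so it must also forget departed peers.
        for peers in self.locations.values():
            peers.discard(peer_addr)


index = CentralIndex()
index.publish("123.2.21.23", ["X", "Y", "Z"])
index.publish("123.2.0.18", ["A", "X"])
print(index.search("X"))  # both peers reported X
print(index.search("A"))  # only 123.2.0.18 reported A
```

Fetch is deliberately absent from the sketch: the server only answers "who has it", and the transfer itself happens directly between peers.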

Page 11: Peer-to-Peer

Napster: Publish

[Figure: the peer at 123.2.21.23 announces “I have X, Y, and Z!” and sends a Publish message; the central server records insert(X, 123.2.21.23)...]

Page 12: Peer-to-Peer

Napster: Search

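[Figure: a client asks “Where is file A?”; its Query goes to the central server, whose Reply is search(A)-->123.2.0.18; the client then Fetches the file directly from 123.2.0.18.]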

Page 13: Peer-to-Peer

Napster: Discussion

• Pros:
  – Simple
  – Search scope is O(1)
  – Controllable (pro or con?)
• Cons:
  – Server maintains O(N) State
  – Server does all processing
  – Single point of failure

Page 14: Peer-to-Peer

Next Topic...

• Centralized Database
  – Napster
• Query Flooding
  – Gnutella
• Intelligent Query Flooding
  – KaZaA
• Swarming
  – BitTorrent

Page 15: Peer-to-Peer

Gnutella: History

• In 2000, J. Frankel and T. Pepper from Nullsoft released Gnutella

• Soon many other clients: Bearshare, Morpheus, LimeWire, etc.

• In 2001, many protocol enhancements including “ultrapeers”

Page 16: Peer-to-Peer

Gnutella: Overview

• Query Flooding:
  – Join: on startup, client contacts a few other nodes; these become its “neighbors”
  – Publish: no need
  – Search: ask neighbors, who ask their neighbors, and so on... when/if found, reply to sender.
  – Fetch: get the file directly from peer
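A minimal sketch of the Search step as a breadth-first flood over an overlay graph. The hop limit `ttl`, the duplicate suppression via `seen`, and the function name `flood_search` are assumptions made for illustration; the slides do not specify them.

```python
from collections import deque


def flood_search(neighbors, files, start, wanted, ttl):
    """Breadth-first flood of a query; returns (holders, messages_sent).

    neighbors: dict node -> list of neighbor nodes (the overlay graph)
    files:     dict node -> set of filenames stored at that node
    ttl:       maximum number of hops the query may travel (assumed limit)
    """
    seen = {start}                 # nodes that have already handled the query
    queue = deque([(start, ttl)])
    holders, messages = [], 0
    while queue:
        node, hops_left = queue.popleft()
        if node != start and wanted in files.get(node, set()):
            holders.append(node)   # this node would send a reply
        if hops_left == 0:
            continue               # hop limit reached: do not forward
        for peer in neighbors.get(node, []):
            messages += 1          # every forwarded copy costs a message
            if peer not in seen:   # duplicate copies are dropped on arrival
                seen.add(peer)
                queue.append((peer, hops_left - 1))
    return holders, messages


overlay = {
    "n1": ["n2", "n3"],
    "n2": ["n1", "n4"],
    "n3": ["n1", "n4"],
    "n4": ["n2", "n3", "n5"],
    "n5": ["n4"],
}
stored = {"n4": {"A"}, "n5": {"A"}}
print(flood_search(overlay, stored, "n1", "A", ttl=2))
print(flood_search(overlay, stored, "n1", "A", ttl=3))
```

Raising the hop limit from 2 to 3 finds the second copy of the file but also sends more messages, which is the trade-off behind the search-scope concern in the discussion that follows.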

Page 17: Peer-to-Peer

Gnutella: Search

[Figure: a node's Query “Where is file A?” floods from neighbor to neighbor; two nodes answer “I have file A.” and send a Reply back.]

Page 18: Peer-to-Peer

Gnutella: Discussion

• Pros:
  – Fully de-centralized
  – Search cost distributed
• Cons:
  – Search scope is O(N)
  – Search time is O(???)
  – Nodes leave often, network unstable

Page 19: Peer-to-Peer

Aside: Search Time?

Page 20: Peer-to-Peer

Aside: All Peers Equal?

[Figure: peers attached over very different links: 56kbps modems, 1.5Mbps DSL lines, and a 10Mbps LAN.]

Page 21: Peer-to-Peer

Next Topic...

• Centralized Database
  – Napster
• Query Flooding
  – Gnutella
• Intelligent Query Flooding
  – KaZaA
• Swarming
  – BitTorrent
• Unstructured Overlay Routing
  – Freenet
• Structured Overlay Routing
  – Distributed Hash Tables

Page 22: Peer-to-Peer

KaZaA: History

• In 2001, KaZaA created by Dutch company Kazaa BV

• Single network called FastTrack used by other clients as well: Morpheus, giFT, etc.

• Eventually protocol changed so other clients could no longer talk to it

Page 23: Peer-to-Peer

KaZaA: Overview

• “Smart” Query Flooding:
  – Join: on startup, client contacts a “supernode” ... may at some point become one itself
  – Publish: send list of files to supernode
  – Search: send query to supernode, supernodes flood query amongst themselves.
  – Fetch: get the file directly from peer(s); can fetch simultaneously from multiple peers
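The two-tier idea can be sketched as follows: ordinary clients publish to one supernode, and only supernodes forward queries to each other. The FastTrack protocol itself is not described in these slides, so the `Supernode` class and its methods are hypothetical.

```python
class Supernode:
    """Hypothetical supernode: indexes its own clients, forwards only to peers."""

    def __init__(self, name):
        self.name = name
        self.index = {}   # filename -> set of client addresses
        self.peers = []   # other supernodes this one is linked to

    def publish(self, client_addr, filenames):
        # Ordinary clients send their file list to their supernode.
        for f in filenames:
            self.index.setdefault(f, set()).add(client_addr)

    def search(self, filename, seen=None):
        # Answer from the local index, then forward among supernodes only.
        seen = set() if seen is None else seen
        if self.name in seen:
            return set()          # this supernode already handled the query
        seen.add(self.name)
        hits = set(self.index.get(filename, set()))
        for sn in self.peers:
            hits |= sn.search(filename, seen)
        return hits


s1, s2, s3 = Supernode("s1"), Supernode("s2"), Supernode("s3")
s1.peers, s2.peers, s3.peers = [s2], [s1, s3], [s2]
s1.publish("123.2.21.23", ["X"])
s2.publish("123.2.0.18", ["A"])
s3.publish("123.2.22.50", ["A"])
print(sorted(s1.search("A")))  # addresses indexed at s2 and s3
```

The flood touches three supernodes regardless of how many ordinary clients sit behind them, which is what makes the flooding "smart" compared with plain query flooding.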

Page 24: Peer-to-Peer

KaZaA: Network Design

“Super Nodes”

Page 25: Peer-to-Peer

KaZaA: File Insert

[Figure: the peer at 123.2.21.23 announces “I have X!” and sends a Publish message to its supernode, which records insert(X, 123.2.21.23)...]

Page 26: Peer-to-Peer

KaZaA: File Search

[Figure: a client asks “Where is file A?”; the Query is flooded among supernodes, and two Replies come back: search(A)-->123.2.0.18 and search(A)-->123.2.22.50.]

Page 27: Peer-to-Peer

KaZaA: Fetching

• More than one node may have the requested file...
• How to tell?
  – Must be able to distinguish identical files
  – Not necessarily same filename
  – Same filename not necessarily same file...
• Use Hash of file
  – KaZaA uses UUHash (MD5+CRC32 checksum of staggered blocks): fast, but not secure
  – Alternatives: full MD5, SHA-1
• How to fetch?
  – Get bytes [0..1000] from A, [1001..2000] from B
  – Alternative: Erasure Codes for greater redundancy.
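A small sketch of both ideas: identifying a file by a hash of its content rather than its name, and splitting a download into byte ranges across sources. It uses a full-file SHA-1 (one of the alternatives listed) rather than UUHash, and the helper names `content_id` and `split_ranges` are assumptions for illustration.

```python
import hashlib


def content_id(data: bytes) -> str:
    # Full-file SHA-1, not UUHash: slower, but every byte is covered.
    return hashlib.sha1(data).hexdigest()


def split_ranges(size: int, sources: list) -> list:
    """Assign contiguous, inclusive byte ranges of a file to each source."""
    if not sources or size <= 0:
        return []
    chunk = -(-size // len(sources))   # ceiling division
    plan = []
    for i, src in enumerate(sources):
        start = i * chunk
        if start >= size:
            break                      # more sources than bytes to fetch
        end = min(start + chunk, size) - 1
        plan.append((src, start, end))
    return plan


same_a = content_id(b"song bytes")
same_b = content_id(b"song bytes")       # different filename, same content
other = content_id(b"different bytes")   # same filename, different content
print(same_a == same_b, same_a == other)
print(split_ranges(2001, ["A", "B"]))
```

With a 2001-byte file and two sources, the plan reproduces the split on the slide: bytes 0..1000 from A and 1001..2000 from B.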

Page 28: Peer-to-Peer

KaZaA: Discussion

• Pros:
  – Tries to take into account node heterogeneity:
    • Bandwidth
    • Host Computational Resources
    • Host Availability (?)
  – Rumored to take into account network locality (latency)
• Cons:
  – Mechanisms easy to circumvent
  – Still no real guarantees on search scope or search time
  – Easy to intentionally corrupt content.

Page 29: Peer-to-Peer

Next Topic...

• Centralized Database
  – Napster
• Query Flooding
  – Gnutella
• Intelligent Query Flooding
  – KaZaA
• Swarming
  – BitTorrent

Page 30: Peer-to-Peer

BitTorrent: History

• In 2002, B. Cohen debuted BitTorrent
• Key Motivation:
  – Popularity exhibits temporal locality (Flash Crowds)
  – E.g., Slashdot effect, CNN on 9/11, new movie/game release
• Focused on Efficient Fetching, not Searching:
  – Distribute the same file to all peers
  – Single publisher, multiple downloaders
• Has some “real” publishers:
  – Blizzard Entertainment using it to distribute the beta of their new game
  – May 2006 deal with Warner Brothers for content distribution.

Page 31: Peer-to-Peer

BitTorrent Working

Page 32: Peer-to-Peer

BitTorrent: Publish/Join

Tracker/Seed

Page 33: Peer-to-Peer

BitTorrent: Fetch

Page 34: Peer-to-Peer

BitTorrent: Overview

• Swarming:
  – Join: contact centralized “tracker” server, get a list of peers.
  – Publish: Run a tracker server.
  – Search: Out-of-band. E.g., use Google to find a tracker for the file you want.
  – Fetch: Download chunks of the file from your peers. Upload chunks you have to them.

Page 35: Peer-to-Peer

Algorithms/Features in BT

• Pipelining of requests, to avoid delay in pieces being sent
• Piece Selection
  – Strict priority [within sub-pieces]
  – Rarest first
  – Random First piece
  – Endgame mode [for request of final subpieces]
• Choking, Optimistic unchoking, Anti-snubbing, Upload only
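Two of the piece-selection rules, "random first piece" and "rarest first", can be sketched as below. This is a simplified illustration, not the reference client's code: the function name `pick_piece`, the set-based bitfields, and the random tie-breaking are assumptions, and strict priority and endgame mode are left out.

```python
import random
from collections import Counter


def pick_piece(have, peer_bitfields, rng=random):
    """Choose the next piece index to request, or None if nothing is useful.

    have:           set of piece indices this peer already holds
    peer_bitfields: dict peer -> set of piece indices that peer advertises
    """
    counts = Counter()
    for pieces in peer_bitfields.values():
        counts.update(pieces - have)   # count only pieces we still need
    if not counts:
        return None
    if not have:
        # Random first piece: get something to trade as soon as possible.
        return rng.choice(sorted(counts))
    rarest = min(counts.values())
    candidates = sorted(p for p, c in counts.items() if c == rarest)
    return rng.choice(candidates)      # break ties randomly


swarm = {"p1": {0, 1, 2}, "p2": {0, 1}, "p3": {0, 3}}
print(pick_piece({0}, swarm))  # piece 2 or 3; each is held by a single peer
```

Requesting the least-replicated piece first spreads rare pieces through the swarm before the only peer holding them leaves.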

Page 36: Peer-to-Peer

Free-Riding with Optimistic Unchoking

• If a peer wants to free-ride in BT, the maximum download rate it will get is 1/(nu + 1) of the maximum rate

• This is also a very powerful result showing why BT performs so much better when compared to other P2P networks like Gnutella and Kazaa.
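To make the 1/(nu + 1) bound concrete, the sketch below simply evaluates it for a few values. It assumes that nu denotes the number of regular (reciprocating) upload slots per peer and that the "+ 1" is the single optimistic-unchoke slot; the slide does not define nu, so treat this reading as an assumption.

```python
from fractions import Fraction


def free_rider_share(nu: int) -> Fraction:
    # Assumption: each peer serves nu reciprocating peers plus one
    # optimistically unchoked peer; a free-rider can only win the latter.
    return Fraction(1, nu + 1)


for nu in (1, 4, 9):
    print(nu, free_rider_share(nu))
```

Under this reading, a peer that never uploads is limited to the optimistic slot, so its share of the maximum rate shrinks as the number of regular slots grows.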

Page 37: Peer-to-Peer

Notes on BitTorrent

• The BT network graph is random. Other ‘clever’ peer assignment algorithms have resulted only in network partitions
• Single point of failure (tracker)
• Lots of magic numbers
  – how many peers should a peer upload to?
  – how long should a peer wait in optimistic unchoke before moving on to other peers?
  – how many requests should be pipelined?
• “Starting multiple torrents at once is a bad idea” - Bram

Page 38: Peer-to-Peer

BitTorrent: Sharing Strategy

• Employ “Tit-for-tat” sharing strategy
  – “I’ll share with you if you share with me”
  – Be optimistic: occasionally let freeloaders download
    • Otherwise no one would ever start!
    • Also allows you to discover better peers to download from when they reciprocate
  – Meant to counter Prisoner’s Dilemma problem.
• Approximates Pareto Efficiency
  – Game Theory: “No change can make anyone better off without making others worse off”
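One way to picture tit-for-tat with optimistic unchoking is a periodic decision about which peers to upload to. The sketch below is an assumption-laden simplification: the function `choose_unchoked`, the default of four regular slots (one of the "magic numbers"), and the uniform random optimistic pick are all illustrative, not the client's actual algorithm.

```python
import random


def choose_unchoked(download_rates, regular_slots=4, rng=random):
    """Return (regular, optimistic) peers to unchoke for the next interval.

    download_rates: dict peer -> rate at which that peer recently uploaded
                    to us (0 for peers that gave us nothing).
    regular_slots:  assumed number of reciprocation slots.
    """
    ranked = sorted(download_rates, key=lambda p: (-download_rates[p], p))
    # Tit-for-tat: reward the peers that have been sharing with us the most.
    regular = [p for p in ranked if download_rates[p] > 0][:regular_slots]
    # Be optimistic: give one other peer a chance, even a newcomer or freeloader.
    rest = [p for p in ranked if p not in regular]
    optimistic = rng.choice(rest) if rest else None
    return regular, optimistic


rates = {"a": 50, "b": 10, "c": 0, "d": 30, "e": 0}
regular, optimistic = choose_unchoked(rates, regular_slots=2)
print(regular, optimistic)  # regular is ['a', 'd']; optimistic is one of b, c, e
```

The optimistic slot is what lets a brand-new peer with nothing to offer get its first pieces, and it is also how a peer discovers partners that might reciprocate at a higher rate.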

Page 39: Peer-to-Peer

BitTorrent: Summary

• Pros:
  – Works reasonably well in practice
  – Gives peers incentive to share resources; avoids freeloaders
• Cons:
  – Pareto Efficiency is a relatively weak condition
  – Central tracker server needed to bootstrap swarm (is this really necessary?)

Page 40: Peer-to-Peer

P2P: Summary

• Many different styles; remember pros and cons of each
  – centralized, flooding, swarming

Lessons learned:
  – Single points of failure are very bad
  – Flooding messages to everyone is bad
  – Underlying network topology is important
  – Not all nodes are equal
  – Need incentives to discourage freeloading
  – Privacy and security are important

Page 41: Peer-to-Peer

Demo of using a BitTorrent client against a .torrent file on http://www.legaltorrents.com/

and

Video of Interview with BitTorrent’s Creator Bram Cohen

http://24x7.com/blog/vlogs/bram_vlog.mov

Page 42: Peer-to-Peer

Extra Slides

Page 43: Peer-to-Peer

Only for Pirates and sharing home video?

• Legal downloads tripled in the first half of 2005, reaching 186 million.

• In May 2006, Warner Brothers announced it would use BitTorrent to distribute pay-for content: $1 per TV show, $4 to $5 per full-length movie.

• Claims “[Users] will be prevented from copying and distributing files they purchase through two mechanisms: one that requires them to enter a password before watching a file, and another that allows the file to be viewed only on the computer to which it was downloaded.”

Page 44: Peer-to-Peer

Allowing p2p a liability for ISPs?

From TMCnet News: Unauthorized p2p file sharing banned in Spain. “It's a criminal offense for ISPs to facilitate unauthorized downloading.”

“Spain's telco giant Telefonica reports 90% of usage on its broadband lines is Internet traffic, up from 15% five years ago. Of that 90%, a massive 71% is P2P traffic.”

Page 45: Peer-to-Peer

What can ISPs do?

• TCP/UDP port filtering?
• Deep Packet Inspection (DPI):
  – From lightreading.com: “Network operators worldwide spent US$96.8m on DPI in 2005, but the sector is poised to grow by more than 75 per cent this year, to about US$170m, and top US$586m in 2010.”

Page 46: Peer-to-Peer

KaZaA: Usage Patterns

• KaZaA is more than one workload!
  – Many files < 10MB (e.g., Audio Files)
  – Many files > 100MB (e.g., Movies)

from Gummadi et al., SOSP 2003

Page 47: Peer-to-Peer

KaZaA: Usage Patterns (2)

• What we saw:
  – A few big files consume most of the bandwidth
  – Many files are fetched once per client but still very popular
• Solution?
  – Caching!

from Gummadi et al., SOSP 2003