CS 4720 Peer to Peer Networking CS 4720 – Web & Mobile Systems

Upload: nathaniel-betterley

Post on 14-Dec-2015


Page 1: CS 4720 Peer to Peer Networking CS 4720 – Web & Mobile Systems

CS 4720

Peer to Peer Networking

CS 4720 – Web & Mobile Systems

Page 2: History of File Sharing

• P2P is the solution to a problem: how to get very large files to a lot of people in a timely fashion
• How did the question arise?
• "Freedom!" "Internet Media!" "Take it to the man!"
• … or sharing copies of copyrighted files…

Page 3: Bulletin Board Systems

• Ah, the good old days…
• Let's take a look at one now!
  – bbsmates.com / http://renegadebbs.info/telnet
• We can consider it the earliest form of a web service
• Usenet is a form of a BBS… kinda…
  – No central server, fully distributed, evolving mesh
  – "servers" copy info between themselves

Page 4: Napster

• The tech story of my undergrad years
• Debuted in Summer of 1999
  – That fall I was starting my sophomore year
  – I was taking:
    • CSC 112 (and lab) Fundamentals of Comp Science – B
    • MTH 112 Calculus II – B
    • MTH 117 Discrete Mathematics – A
    • THE 112 Introduction to the Theatre – A-
    • HMN 396 Individual Study (Medieval Themes in Modern Video Games) – A

Page 5: Napster Protocol

• Napster ran central servers that handled:
  – User authentication
  – Logging
  – Chat functionality
  – Making connections between clients
• A user would log in to Napster and the program would then populate their profile with all the songs/files they had available

Page 6: Napster Protocol

• <nick> "<filename>" <md5> <size> <bitrate> <frequency> <time>
  – <nick> is the user contributing the file
  – <filename> is the mp3 file contributed
  – <md5> is the hash of the mp3 file
  – <size> is the file size in bytes
  – <bitrate> is the mp3 bitrate in kbps
  – <frequency> is the sampling frequency in Hz
  – <time> is the play time in seconds
  – Example: foouser "generic band - generic song.mp3" b92870e0d41bc8e698cf2f0a1ddfeac7 443332 128 44100 60
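The record format above is easy to pull apart in code. Here is a minimal Python sketch, assuming one record per line and a filename that is always wrapped in double quotes. The `SharedFile` class and `parse_share_record` function are made-up names for illustration, not part of any Napster client.

```python
import shlex
from dataclasses import dataclass


@dataclass
class SharedFile:
    nick: str
    filename: str
    md5: str
    size_bytes: int
    bitrate_kbps: int
    frequency_hz: int
    play_time_s: int


def parse_share_record(line: str) -> SharedFile:
    """Parse '<nick> "<filename>" <md5> <size> <bitrate> <frequency> <time>'."""
    # shlex.split keeps the quoted filename together as one field.
    fields = shlex.split(line)
    if len(fields) != 7:
        raise ValueError(f"expected 7 fields, got {len(fields)}")
    nick, filename, md5, size, bitrate, frequency, play_time = fields
    return SharedFile(nick, filename, md5, int(size), int(bitrate),
                      int(frequency), int(play_time))


# The example record from the slide.
record = parse_share_record(
    'foouser "generic band - generic song.mp3" '
    'b92870e0d41bc8e698cf2f0a1ddfeac7 443332 128 44100 60'
)
print(record.filename, record.size_bytes, record.play_time_s)
```

Splitting on plain whitespace would break here, since the filename itself contains spaces; that is why the quotes matter.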

Page 7: Napster Protocol

• When a user did a search, it was just a DB lookup on Napster's servers
• Then Napster would establish a client-client connection to make the transfer happen
• Usually, this would be a simple TCP connection to that user's Napster data port (effectively like an FTP server)
  – If a firewall was involved, the sender would initiate the connection with the requester
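To make the "search is just a DB lookup" idea concrete, here is a toy Python sketch of a central index. Everything in it is an assumption for illustration: the `CentralIndex` class, the in-memory dictionaries standing in for the database, the substring match, and the addresses and port numbers. The point is only that the server hands back who has the file and where to reach them; the file itself never passes through it.

```python
class CentralIndex:
    """Toy stand-in for the central server's song database."""

    def __init__(self):
        self.addresses = {}  # nick -> (host, data_port) of that user's client
        self.shares = {}     # nick -> list of filenames that user shares

    def login(self, nick, host, data_port, filenames):
        # On login the client reports its address and everything it shares.
        self.addresses[nick] = (host, data_port)
        self.shares[nick] = list(filenames)

    def logout(self, nick):
        self.addresses.pop(nick, None)
        self.shares.pop(nick, None)

    def search(self, term):
        """Return (nick, filename, host, port) for each filename containing term."""
        term = term.lower()
        hits = []
        for nick, filenames in self.shares.items():
            host, port = self.addresses[nick]
            for name in filenames:
                if term in name.lower():
                    hits.append((nick, name, host, port))
        return hits


index = CentralIndex()
index.login("foouser", "10.0.0.5", 6699, ["generic band - generic song.mp3"])
index.login("baruser", "10.0.0.9", 6699, ["other band - other song.mp3"])

hits = index.search("generic")
print(hits)  # the requester would now open a TCP connection to that host and port
```

Once a user logs out, their files vanish from the search results, which is why the central server had to track who was online.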

Page 8: Why did this work?

• Universities and colleges were pushing HARD about how "connected" their campus was
• Many students got their first email address when they went to school starting in 1996 or so
• The speed was an INCREDIBLE jump over 14.4k, 28.8k, and 56k dial-up modems
• The direct connection made it really easy to send the files

Page 9: Why did this work?

• Napster (theoretically at the time) was in the clear
  – They didn't host any of the files… they just made them available
  – Feb 2001: 26.4 million users!
  – Eventually, the connection part of the whole deal was deemed "enabling technology" and that was the end of that in the Summer of 2001
• One of the big problems was that there was an identifiable target: Napster and Shawn Fanning

Page 10: The Solution

• Decentralize the network
• Truly create "the cloud" where the data would live
• Even though Napster was going strong until Summer of 2001, the foundation was already being laid for the next generation of sharing technology

Page 11: Kazaa, Morpheus, eDonkey, et al.

• Continuing the trend of odd names for programs comes the first set of decentralized sharing services

Page 12: The Link Between Them All

• The network was actually called "FastTrack"
• It was the most popular file sharing network in 2003 – estimates say that it even eclipsed Napster at its height
• FastTrack was an intentionally designed, corporate-funded, decentralized distribution network

Page 13: Nodes and Supernodes

• When you connected to the FastTrack network (through whatever program you used), you started as a node
• Nodes provide file information and download requests to supernodes
• Supernodes are responsible for indexing users' shares, performing queries, and keeping statistics
• When a connection is made, HTTP is used

Page 14: But you just said it was decentralized…

• And it is!
• Supernodes are regular nodes that are "promoted" to supernode status by other supernodes on the network
• As supernodes see that their ranks are diminishing, or if the bandwidth is hurting, they find an unsuspecting node and assimilate… I mean, promote it to supernode
• Supernodes could also self-announce

Page 15: Guess which nodes got promoted!

• Supernodes liked nodes with:
  – Lots of files
  – Lots of bandwidth
  – Lots of uptime
  – Low latency
  – Lots of computing power
• So… where do you find one of these machines?
• College students who leave their machines on overnight!
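One way to picture the promotion decision is as a score computed from the criteria listed above. The Python sketch below is purely illustrative: the `NodeStats` fields, the weights, and the sample machines are all invented here, and nothing in it reflects how FastTrack actually ranked candidates.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    name: str
    shared_files: int
    bandwidth_kbps: float
    uptime_hours: float
    latency_ms: float
    cpu_mhz: float


def promotion_score(node: NodeStats) -> float:
    # Hypothetical weights: more files, bandwidth, uptime, and CPU raise
    # the score; higher latency lowers it.
    return (node.shared_files / 1000
            + node.bandwidth_kbps / 1000
            + node.uptime_hours / 24
            + node.cpu_mhz / 1000
            - node.latency_ms / 100)


candidates = [
    NodeStats("dorm-pc", shared_files=2000, bandwidth_kbps=10000,
              uptime_hours=72, latency_ms=20, cpu_mhz=800),
    NodeStats("dialup-pc", shared_files=50, bandwidth_kbps=56,
              uptime_hours=1, latency_ms=300, cpu_mhz=400),
]

best = max(candidates, key=promotion_score)
print(best.name)
```

With any sensible weighting, the always-on dorm machine on the campus network wins easily over the dial-up machine.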

Page 16: University Response

• After all of that with Napster traffic, now the university machines themselves are the supernodes!
• IT staff did their best to throttle traffic, block ports, etc.
• In the end, the RIAA came knocking…

Page 17: Hashing and RIAA Response

• The FastTrack protocol had some weaknesses that left it open to attacks from the RIAA
• The hashing algorithm used to verify that a file was indeed a particular file was written to be fast and efficient… but not terribly accurate
• The RIAA seeded a ton of dummy files to drop the value of the network
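To see why a fast-but-inaccurate hash invites dummy files, consider a deliberately simplified Python sketch. It is not the FastTrack algorithm; it assumes the speed came from hashing only part of the file, and `PREFIX_BYTES`, `fast_partial_hash`, and the sample data are invented for illustration. Any file that matches in the hashed region gets the same fingerprint, no matter what the rest contains.

```python
import hashlib

PREFIX_BYTES = 1024  # hypothetical: how much of the file the "fast" hash reads


def fast_partial_hash(data: bytes) -> str:
    """Hash only the first PREFIX_BYTES bytes: quick, but blind to the rest."""
    return hashlib.md5(data[:PREFIX_BYTES]).hexdigest()


def full_hash(data: bytes) -> str:
    """Hash every byte: slower on big files, but catches any change."""
    return hashlib.md5(data).hexdigest()


# Two files that share a header but differ everywhere after it.
real_song = b"H" * PREFIX_BYTES + b"actual audio data" * 100
dummy_file = b"H" * PREFIX_BYTES + b"\x00" * 1700

same_fast = fast_partial_hash(real_song) == fast_partial_hash(dummy_file)
same_full = full_hash(real_song) == full_hash(dummy_file)
print(same_fast, same_full)
```

The partial hash calls the two files identical while the full hash tells them apart, so a network that trusts the partial hash will happily list the dummy as a copy of the real song.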

Page 18: Other Problems

• Remember how I said this was corporately funded?
• How would they make their money back?
• Malware and spyware!

Page 19: Kazaa Malware

• Cydoor (spyware): Collects information on the PC's surfing habits and passes it on to the company which created Cydoor.
• B3D (adware): An add-on which causes advertising popups if the PC accesses a website which triggers the B3D code.
• Altnet (adware): A distribution network for paid "gold" files.
• The Best Offers (adware): Tracks your browsing habits and internet usage to display advertisements similar to your interests.
• InstaFinder (hijacker): Redirects your URL typing errors to InstaFinder's web page instead of the standard search page.
• TopSearch (adware): Displays paid songs and media related to your search in Kazaa.
• RX Toolbar (spyware): The toolbar monitors all the sites you visit with Microsoft Internet Explorer and provides links to competitors' websites.
• New.net (hijacker): A browser plugin that lets you access several of its own unofficial Top Level Domain names, e.g., .chat and .shop. Its main purpose is to sell domain names such as www.record.shop, which is actually www.record.shop.new.net.

Page 20: Today

• The FastTrack network is still out there, but comparatively few people use it anymore
• The inventors of FastTrack?
• They're doing just fine.
• They created Skype.

Page 21: So what replaced FastTrack?

• BitTorrent
• Introduced by Bram Cohen in Summer of 2001
• Just a few months after FastTrack went online, actually
• At the time, though, there weren't any groups to host the trackers that were needed for the protocol to work
• That wouldn't change until early 2003

Page 22: What exactly is BitTorrent?

• From bittorrent.org:
  – BitTorrent is a free speech tool.
  – BitTorrent gives you the same freedom to publish previously enjoyed by only a select few with special equipment and lots of money.
  – You have something terrific to publish -- a large music or video file, software, a game or anything else that many people would like to have.
  – But the more popular your file becomes, the more you are punished by soaring bandwidth costs.
  – If your file becomes phenomenally successful and a flash crowd of hundreds or thousands try to get it at once, your server simply crashes and no one gets it.
  – There is a solution to this vicious cycle: BitTorrent
  – With BitTorrent free speech no longer has a high price.

Page 23: But does it cure the common cold?

• So… that was nice and all… but what is BitTorrent?
• Simply put, BT is a P2P file sharing protocol for sharing large amounts of data in a way where all nodes share not only demand but also supply
• There is no one node a file is downloaded from, and thus the load is theoretically evenly balanced

Page 24: The Basics

• It starts with one node who publishes a file
• A .torrent file is created, which simply has the connection information, file size, etc.
• Files are split into (usually) 256 KB pieces
• They are hashed (of course) to verify the contents after transmission
• The .torrent file is hosted on a tracker site
• The tracker HOSTS NO FILES other than the .torrent files
• But it does monitor traffic to connect nodes
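The split-and-hash step can be sketched in a few lines of Python. The BitTorrent specification uses SHA-1 for the per-piece hashes, so that is what appears here; the function names, the in-memory `payload`, and the 1 MiB sample size are assumptions for illustration. A real client would read the file from disk in chunks and store the hashes in the .torrent file.

```python
import hashlib

PIECE_LENGTH = 256 * 1024  # 256 KB, the usual piece size mentioned above


def split_into_pieces(data: bytes, piece_length: int = PIECE_LENGTH) -> list:
    """Cut the data into fixed-size pieces; the last one may be shorter."""
    return [data[i:i + piece_length] for i in range(0, len(data), piece_length)]


def piece_hashes(data: bytes, piece_length: int = PIECE_LENGTH) -> list:
    """SHA-1 digest of each piece, in order."""
    return [hashlib.sha1(p).digest() for p in split_into_pieces(data, piece_length)]


def verify_piece(piece: bytes, expected_digest: bytes) -> bool:
    """Check a downloaded piece against the digest published for it."""
    return hashlib.sha1(piece).digest() == expected_digest


payload = bytes(range(256)) * 4096          # 1 MiB of sample data
pieces = split_into_pieces(payload)
hashes = piece_hashes(payload)

good = verify_piece(pieces[0], hashes[0])
corrupted = pieces[0][:-1] + b"\x00"        # flip the last byte of piece 0
bad = verify_piece(corrupted, hashes[0])
print(len(pieces), good, bad)
```

Because every piece carries its own hash, a peer can check each piece the moment it arrives and re-request only the bad ones instead of the whole file.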

Page 25: The Basics

• The file originator is called the seed
• The seed pushes the .torrent file to the tracker
• The tracker provides the .torrent file for others to download
• When a peer downloads the .torrent to start downloading the file, it announces itself to everyone else that is downloading that file

Page 26: The Basics

• In general, BT works on a rarest piece first algorithm
• A peer will ask for the rarest piece for a given file from the seeder and will receive it
• The peer will then start hosting that "rarest piece," which theoretically is now NOT the rarest, and again asks for the rarest
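Rarest-first selection comes down to counting how many peers hold each piece and asking for the least common one you still need. The Python sketch below assumes each peer's holdings are known as a set of piece indices; `rarest_missing_piece` and the sample swarm are invented for illustration, and ties are broken by lowest index only to keep the result predictable.

```python
from collections import Counter


def rarest_missing_piece(my_pieces, peer_pieces):
    """Return the index of the missing piece held by the fewest peers.

    my_pieces: set of piece indices we already have.
    peer_pieces: dict mapping peer name -> set of piece indices it has.
    Returns None when no peer has anything we still need.
    """
    counts = Counter()
    for pieces in peer_pieces.values():
        counts.update(pieces)
    candidates = [idx for idx in counts if idx not in my_pieces]
    if not candidates:
        return None
    # Fewest holders first; lowest index breaks ties (an arbitrary choice here).
    return min(candidates, key=lambda idx: (counts[idx], idx))


swarm = {
    "seed": {0, 1, 2, 3},
    "peer_a": {0, 1},
    "peer_b": {0, 2},
}
mine = set()

first = rarest_missing_piece(mine, swarm)   # only the seed holds piece 3
mine.add(first)
swarm["me"] = mine                          # we now host that piece too
second = rarest_missing_piece(mine, swarm)
print(first, second)
```

After the first download, piece 3 has two holders instead of one, so it is no longer the scarcest piece in the swarm, which is exactly the effect described above.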

Page 27: The Basics

• This "rarest first" approach makes downloading different from any other downloading you've done before
• A standard HTTP request is a straight flow of data… sort of
• The packets carrying an HTTP response can arrive out of order and get put back together… so why not whole files?
• It works out fairly well to get data distributed

Page 28: The Basics

• Also, BT doesn't (necessarily) use a single port
• Multiple TCP connections can be opened randomly to keep the network strong
• The problem with this:
  – Speed of download is a bell curve, not constant
  – Partial seeding is possible
  – Streaming is pretty hard (although Bram says "it's coming")

Page 29: The Protocol

• All non "keep alive" messages start with one of the following:
  – 0 - choke
  – 1 - unchoke
  – 2 - interested
  – 3 - not interested
  – 4 - have
  – 5 - bitfield
  – 6 - request
  – 7 - piece
  – 8 - cancel

Page 30: The Protocol

• bitfield – set of indices indicating what the peer has
• have – successful download and check of a piece
• request – send index and offset of data wanted
• piece – index of and actual data of a piece
• cancel – stop transmission
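Here is a rough Python sketch of how these messages look on the wire. In the BitTorrent peer protocol, each message is a 4-byte big-endian length, then the 1-byte message ID from the list above, then the payload; a keep-alive is just a length of zero. A request carries a piece index, a byte offset, and a length. The `MessageId` enum and the helper function names are invented for this example.

```python
import struct
from enum import IntEnum


class MessageId(IntEnum):
    CHOKE = 0
    UNCHOKE = 1
    INTERESTED = 2
    NOT_INTERESTED = 3
    HAVE = 4
    BITFIELD = 5
    REQUEST = 6
    PIECE = 7
    CANCEL = 8


def encode_message(msg_id: MessageId, payload: bytes = b"") -> bytes:
    # The length prefix counts the ID byte plus the payload.
    return struct.pack(">IB", 1 + len(payload), msg_id) + payload


def encode_have(index: int) -> bytes:
    return encode_message(MessageId.HAVE, struct.pack(">I", index))


def encode_request(index: int, begin: int, length: int) -> bytes:
    return encode_message(MessageId.REQUEST,
                          struct.pack(">III", index, begin, length))


def decode_message(frame: bytes):
    """Return (MessageId, payload), or (None, b"") for a keep-alive."""
    (length,) = struct.unpack(">I", frame[:4])
    if length == 0:
        return None, b""
    return MessageId(frame[4]), frame[5:4 + length]


have_frame = encode_have(5)
msg_id, payload = decode_message(encode_request(1, 16384, 16384))
print(have_frame.hex(), msg_id.name, struct.unpack(">III", payload))
```

The choke, unchoke, interested, and not interested messages have no payload at all, so each one is only five bytes long.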

Page 31: The Protocol

• Interested / not interested – indicates whether a peer wants to start communicating with another peer
• Choke / unchoke – response from peer to interested party as to whether the connection will continue
• Used to manage the number of connections at any one time
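The interplay of these four messages is easiest to see as a few flags on each connection. In the BitTorrent peer protocol, every connection starts out choked and not interested in both directions, and data flows only when one side is interested and the other side is not choking it. The `PeerConnection` class below is a hypothetical Python sketch of that bookkeeping, not code from any real client.

```python
from dataclasses import dataclass


@dataclass
class PeerConnection:
    # Connections start out choked and not interested on both sides.
    am_choking: bool = True        # we are refusing to send to the peer
    am_interested: bool = False    # we want something the peer has
    peer_choking: bool = True      # the peer is refusing to send to us
    peer_interested: bool = False  # the peer wants something we have

    def handle(self, message_id: int) -> None:
        """Update our view of the peer from a message ID it sent us (0 to 3)."""
        if message_id == 0:        # choke
            self.peer_choking = True
        elif message_id == 1:      # unchoke
            self.peer_choking = False
        elif message_id == 2:      # interested
            self.peer_interested = True
        elif message_id == 3:      # not interested
            self.peer_interested = False

    def can_download(self) -> bool:
        return self.am_interested and not self.peer_choking

    def can_upload(self) -> bool:
        return self.peer_interested and not self.am_choking


conn = PeerConnection()
conn.am_interested = True   # we told the peer we want its pieces
conn.handle(1)              # the peer unchokes us
print(conn.can_download(), conn.can_upload())
```

A client limits its load by leaving most peers choked and unchoking only a handful at a time.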

Page 32: Snark

• Build your own BT client
• Or build it into your own app
• How might you use BT in an app?
• How might you use it in:
  – An enterprise architecture?
  – A service-oriented architecture?