Peer-to-peer networks
P2P Networks
Sanjoy Sanyal:www.itforintelligentfolks.blogspot.com
P2P networks
A set of technologies that enable the direct exchange of data or services between computers
[Figure: a client-server network (one server S connected to many clients C) beside a P2P network (peers P connected directly to each other)]
Network Effects: Promises & Challenges
Can have the following advantages… …however
• Scalability, as there is no central resource to exhaust
  – However: has to overcome the challenge of self-organization from a collection of unreliable peers with unreliable connections
• Aggregating resources can lead to excellent performance
  – However: has to avoid choking the network with the overhead of organizing messages
• Fault resilience, as there is no single point of failure
  – However: has to overcome reliability challenges caused by network congestion, isolated networks and unreachable nodes
Types of P2P Networks
P2P Systems
• File Sharing: Napster, Limewire (www.limewire.com), Aimster/Madster, Gnutella (gnutella.com), Morpheus (morpheus.com), Chord
• Collaboration: Instant Messaging, Groove, Multiplayer Games: Magi
• Distributed Computing: SETI@home (http://setiathome.berkeley.edu/), Grid.org
File Sharing is the one we will delve into in this session
Locating Content in P2P networks
Centralized Directory Approach
• Peers connect to a central directory where they publish information about the content that they have to share
• When the directory receives a request, it replies with a peer in the directory that matches the request
• Criteria such as proximity, bandwidth, capacity, congestion, health and frequency can guide the decision
• Scaling: the network can scale with a number of central servers
Flooded Request Approach
• A peer broadcasts a request to its directly connected peers, each of which broadcasts to its own directly connected peers, and so on through the network
• This continues until the request is answered or some broadcast limit is reached
• Scaling: generates a lot of ineffective network traffic, which prevents scaling
Document Routing Approach
• Each peer has helpful but only partially complete referral information. Each referral moves the requester closer to a peer that can satisfy the query
• Scaling: can scale effectively, as systems can complete a search within a bounded number of steps
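The difference between the first two approaches can be sketched in a few lines of Python. This is only an illustration under assumptions: the five-peer overlay, the file name and the function names are hypothetical, and the message count simply tallies every forward, including duplicates that a peer later discards.

```python
from collections import deque

# Hypothetical overlay: each peer lists its directly connected peers.
NEIGHBORS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "E"],
    "D": ["B"],
    "E": ["C"],
}
# Hypothetical shared content per peer.
FILES = {"A": set(), "B": set(), "C": set(), "D": {"song.mp3"}, "E": {"song.mp3"}}


def directory_lookup(directory, filename):
    """Centralized directory: one request to the index returns matching peers."""
    return sorted(directory.get(filename, []))


def flooded_lookup(start, filename, ttl):
    """Flooded request: forward hop by hop until the broadcast limit (ttl) is spent.

    Returns (peers that hold the file, number of messages sent).
    """
    seen = {start}
    queue = deque([(start, ttl)])
    hits, messages = [], 0
    while queue:
        peer, remaining = queue.popleft()
        if filename in FILES[peer]:
            hits.append(peer)
        if remaining == 0:
            continue  # broadcast limit reached on this path
        for neighbor in NEIGHBORS[peer]:
            messages += 1  # every forward costs a message, even a duplicate
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, remaining - 1))
    return sorted(hits), messages


# Build the central index from what each peer publishes.
directory = {}
for peer, names in FILES.items():
    for name in names:
        directory.setdefault(name, []).append(peer)
```

In this toy overlay the directory answers with a single request, while the flood from peer A needs a limit of at least 2 hops to reach D and E and sends several messages along the way, which hints at why flooding scales poorly.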
Napster – A quick history
• Jan 1999: Set up by Shawn Fanning (then 18)
• December 1999: Sued for copyright infringement
  – File screening system preventing downloads of specified files put in place
• July 2001: Shut down file sharing service following court orders
• May 2002: Purchased by German media conglomerate
  – Invested USD 85 million
• October 2003: Napster 2.0, a client-server system, goes live
  – Division of Roxio
Case Studies
• Napster
• Gnutella & KaZaA
• BitTorrent
Napster Protocol - Introduction
• Was not documented or published – reverse engineered by OpenNap (opennap.sourceforge.net)
• Uses the centralized directory model to locate content
• Communicates using TCP
• Does not use DNS to name peers
  – Uses nicknames: <nick> (another client) and <mynick> (this client)
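Napster Protocol Session
Participants: Napster Server, Client A, Client B
See next two slides for message structures
• Client A → Napster Server: Search Request
• Napster Server → Client A: Search Response
• Client A → Napster Server: Download Request
• Napster Server → Client A: Download Ack
• Client A ↔ Client B, in order:
  – Establish TCP/IP connection
  – “1”
  – “GET”
  – Peer response
  – Song data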
Napster message structures: server - client
• Client announcing to server the files it is willing to share
  – Code 100: <filename> <md5> <size> <bitrate> <frequency> <time>
  – The MD5 hash identifies the song and lets peers confirm that two files have identical content
• Client search request
  – Code 200: <filename> <artist name> <song> <max results> <line speed> <bitrate> <frequency>
• Server search response
  – Code 201: <filename> <md5> <size> <length> <nick> <ip> <link type>
• Download request
  – Code 203: <nick> <filename>
• Download ack
  – Code 204: <nick> <ip> <port> <filename> <md5> <linespeed>
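As a rough illustration of these structures, the sketch below builds the body of a code 100 and a code 203 message in Python. The codes and field order come from the slide; the space separator, the quoted file name and the function names are assumptions, and the on-the-wire framing (which the slide does not show) is left out.

```python
import hashlib

# Message codes taken from the slide.
SHARE, SEARCH, SEARCH_RESPONSE, DOWNLOAD_REQUEST, DOWNLOAD_ACK = 100, 200, 201, 203, 204


def md5_of(content: bytes) -> str:
    """Hex MD5 digest used to tell whether two files have identical content."""
    return hashlib.md5(content).hexdigest()


def share_message(filename, content, bitrate, frequency, seconds):
    """Code 100: <filename> <md5> <size> <bitrate> <frequency> <time>.

    Quoting the file name is an assumption so that names with spaces stay intact.
    """
    fields = [f'"{filename}"', md5_of(content), len(content), bitrate, frequency, seconds]
    return SHARE, " ".join(str(field) for field in fields)


def download_request(nick, filename):
    """Code 203: <nick> <filename>."""
    return DOWNLOAD_REQUEST, f'{nick} "{filename}"'
```

Two peers sharing byte-identical copies of a song produce the same `<md5>` field, which is what lets the directory treat them as sources for the same file.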
Napster message structures: peer-peer
• “1”: a single ASCII character
• “GET”: not HTTP GET; this is the Napster application protocol
• Peer response: <mynick> <filename> <offset>
  – The offset allows a transfer to be resumed at any place in the file
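The role of the offset can be shown with a small Python sketch. It assumes the three fields are separated by spaces and that the file name is quoted; both are assumptions for illustration, and the function names are hypothetical.

```python
def parse_peer_request(line: str):
    """Split '<mynick> "<filename>" <offset>' into its three fields.

    Quoting the file name is an assumption so that names with spaces survive.
    """
    nick, rest = line.split(" ", 1)
    filename, offset = rest.rsplit(" ", 1)
    return nick, filename.strip('"'), int(offset)


def remaining_bytes(content: bytes, offset: int) -> bytes:
    """Return the part of the file the requester still needs."""
    return content[offset:]
```

A client that already holds the first 4 bytes of a file would send an offset of 4 and receive only the rest, instead of starting the download again from the beginning.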
Gnutella
• Jan 1998: Justin Frankel developed Winamp, an audio player
  – Then he founded Nullsoft
• May 1999: Winamp brand & services acquired by AOL
• Early 2000: Gnutella was developed in 14 days
• March 2000: a prototype was published under a GNU General Public License
• Within hours (before AOL could react) the software had been downloaded thousands of times
Gnutella Protocol - Introduction
• Unlike Napster, has no centralized service
• Uses the flooded request approach
• Software running in each Gnutella peer is called a servant (spelled "servent" in the protocol specification, a blend of server and client)
• Peers use TCP/IP to communicate with each other
• Servant software was developed by several companies: BearShare, LimeWire, ToadNode
Gnutella Protocol – Finding a Servant
• Specialized hosts that cache IP addresses of servants are run by companies who develop Gnutella software
• A servant wishing to join the network contacts host cache servers and receives a list of prospective addresses
Gnutella message structures: descriptors
• Descriptor ID: uniquely identifies this descriptor message in the network
• Payload Descriptor: code identifying the type of message
  – 0x00 = ping: connection accept request
  – 0x01 = pong: connect accept OK
  – 0x40 = push: push file through firewall
  – 0x80 = query: file search request
  – 0x81 = queryhit: search response OK
• TTL (Time to Live): limits the maximum number of hops for this message
• Hops
• Payload Length
Each servant receiving a message decrements the TTL count and increments the Hop count before the message is forwarded. The maximum number of hops is 7.
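The header fields and the TTL/Hops rule can be sketched in Python as below. The field sizes (16-byte ID, one byte each for payload descriptor, TTL and hops, and a 4-byte little-endian payload length) follow the Gnutella 0.4 specification rather than the slide, and the function names are hypothetical.

```python
import os
import struct

# 16-byte descriptor ID, 1-byte payload descriptor, 1-byte TTL, 1-byte hops,
# 4-byte little-endian payload length (sizes per Gnutella 0.4, not the slide).
HEADER = struct.Struct("<16sBBBI")
QUERY = 0x80
MAX_HOPS = 7


def new_descriptor_id() -> bytes:
    """Random 16-byte ID so that the descriptor is unique in the network."""
    return os.urandom(16)


def pack_header(descriptor_id, payload_type, ttl, hops, payload_length):
    return HEADER.pack(descriptor_id, payload_type, ttl, hops, payload_length)


def forward(header: bytes):
    """Decrement TTL and increment hops; return None when the message must be dropped."""
    descriptor_id, payload_type, ttl, hops, length = HEADER.unpack(header)
    if ttl <= 1:
        return None  # TTL would reach 0, so the message is not forwarded
    return HEADER.pack(descriptor_id, payload_type, ttl - 1, hops + 1, length)
```

Throughout forwarding, TTL plus Hops stays equal to the initial TTL, so a message that starts with a TTL of 7 can never travel more than 7 hops.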
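Gnutella Protocol Session
Participants: Servant 1 (joining), Servant 2 (on network), Host Cache Server
• Servant 1 → Servant 2: Gnutella Connect
• Servant 2 → Servant 1: Gnutella OK
• Servant 1 → Servant 2: Ping
• Servant 2 → Servant 1: Pong <IP address>, <port>, <shared data>
• Servant 1 → Servant 2: Query <filename>
• Servant 2 → Servant 1: Queryhit <filename>
• File download: HTTP GET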
Gnutella Network Traffic
[Figure: peers A, B, C, D and E; a request for an .mp3 file is passed from peer to peer, followed by a Get for the .mp3]
Each peer broadcasts requests to its connected peers and so on.
The Pong descriptors may only be sent along the same path that carried the incoming Ping descriptor.
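One way to picture the reverse-path rule for Pongs is the Python sketch below: every peer remembers which neighbor a Ping arrived from and hands any matching Pong back to that neighbor. The `Node` class, the `connect` helper and the in-memory method calls standing in for network messages are all hypothetical simplifications.

```python
class Node:
    """A peer that remembers which neighbor each Ping came from."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []
        self.ping_source = {}  # descriptor ID -> neighbor the Ping arrived from
        self.pongs_received = []

    def receive_ping(self, descriptor_id, sender, ttl):
        if descriptor_id in self.ping_source:
            return  # duplicate Ping: already seen, do not forward again
        self.ping_source[descriptor_id] = sender
        if sender is not None:
            self.receive_pong(descriptor_id, origin=self.name)  # answer for myself
        if ttl > 1:
            for neighbor in self.neighbors:
                if neighbor is not sender:
                    neighbor.receive_ping(descriptor_id, self, ttl - 1)

    def receive_pong(self, descriptor_id, origin):
        if descriptor_id not in self.ping_source:
            return  # never saw the matching Ping: drop the Pong
        previous = self.ping_source[descriptor_id]
        if previous is None:
            self.pongs_received.append(origin)  # this peer sent the Ping
        else:
            previous.receive_pong(descriptor_id, origin)  # retrace the Ping's path


def connect(first, second):
    """Link two peers in both directions."""
    first.neighbors.append(second)
    second.neighbors.append(first)
```

With peers linked as A-B, A-C, B-D and C-E, a Ping started at A with `sender=None` brings back one Pong from each of the other four peers, and intermediate peers such as B only relay them.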
KaZaA
• KaZaA and FastTrack were created by Niklas Zennström, Janus Friis, and Priit Kasesalu (all of whom later went on to create Skype and, later still, Joost).
• KaZaA is owned by Sharman Networks, headquartered in Australia
KaZaA
• Based on Gnutella
• Uses SuperNodes: peers with powerful processors and high-bandwidth connections
• Peers connect to their local SuperNodes to upload information about files that they are sharing and to search
• Hybrid system between Napster and Gnutella, with similarities to DNS
BitTorrent
• April 2001: Developed by Bram Cohen
• Has become very popular
• CBC is the first public broadcaster in North America to make a full show available for download via BitTorrent
• However, not free from controversy
BitTorrent - introduction
• Peers run the BitTorrent client which implements the BitTorrent protocol
• To share, the peer creates a metadata file called the torrent
• The torrent file is shared with the BitTorrent tracker, a server which assists
How BitTorrent works
• For distributing a data file:
  – The peer treats the file as a number of identically sized pieces.
  – It creates a checksum for each piece (using the SHA1 hashing algorithm) and records it in the torrent file.
  – Peers that provide a complete file are called seeders.
• For sharing files:
  – Users download and open a torrent of interest with a BitTorrent client.
  – The client connects to the tracker(s) specified in the torrent file and receives a list of peers currently transferring pieces of the file(s).
  – The client connects to those peers to obtain the various pieces. Such a group of peers connected to each other to share a torrent is called a swarm.
• For efficiency:
  – Download speed is controlled by Torrent tracking servers, which monitor all swarm users.
  – Swarm users who share are rewarded with an increase in their allotted swarm bandwidth.
  – Those who leech and limit sharing are choked.
  – To help newcomers, the client reserves a portion of its available bandwidth for sending pieces to random peers.
• Checksums ensure that pieces are not corrupted.
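The piece checksums can be sketched in Python with the standard `hashlib` module. This is a minimal illustration: the function names and the tiny piece length are hypothetical, and the real torrent file format is not reproduced.

```python
import hashlib


def piece_hashes(data: bytes, piece_length: int):
    """Split data into fixed-size pieces and return one SHA-1 digest per piece.

    The last piece may be shorter than piece_length.
    """
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(index: int, piece: bytes, expected_hashes) -> bool:
    """Check a downloaded piece against the digest recorded in the torrent."""
    return hashlib.sha1(piece).digest() == expected_hashes[index]
```

Because each piece is checked on its own, a client can accept pieces from many different peers in any order and discard only the piece that fails its checksum.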
BitTorrent: How it differs from HTTP
• Requests
  – BitTorrent: makes many small data requests over different TCP sockets
  – HTTP: typically a single HTTP GET request over a single TCP socket
• Download order
  – BitTorrent: downloads in a random or "rarest-first" order
  – HTTP: downloads in a sequential manner
• Speed
  – BitTorrent: downloads can take time to rise to full speed, because it may take time for enough peer connections to be established and for a node to receive sufficient data to become an effective uploader
  – HTTP: rises to full speed very quickly and maintains this speed throughout
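A "rarest-first" order can be sketched in Python as follows. The data shapes (a set of needed piece indices and a mapping from peer name to the pieces that peer holds) and the tie-break on piece index are assumptions made for the example, not details from the slides.

```python
from collections import Counter


def rarest_first(needed, peer_pieces):
    """Order the pieces we still need by how few peers hold them.

    needed: set of piece indices we lack.
    peer_pieces: dict mapping a peer name to the set of piece indices it has.
    """
    availability = Counter()
    for pieces in peer_pieces.values():
        availability.update(pieces & needed)
    # Pieces nobody has cannot be requested yet, so they are left out.
    return sorted(availability, key=lambda piece: (availability[piece], piece))
```

For example, if piece 2 is held by one peer, piece 1 by two and piece 0 by three, the function lists them as 2, 1, 0, so the scarcest piece is fetched before its only holder can leave the swarm.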
Summary
• Fascinating History
• Untapped potential
• The story’s not over yet.