Peer to Peer (1)
References
Chapter 2.9 of Kurose and Ross
Papers
o OpenNap: Open Source Napster Server
o J. Liang, R. Kumar and K. Ross, Understanding KaZaA
Acknowledgements: Many of the figures are from other presentations, especially those of the original authors.
Client-Server Model
Let’s look at the client-server model
Servers are centrally maintained and administered
A client has fewer computing resources than a server
This is the way the web works
No interaction between clients
Client-Server Model
Disadvantages of the client-server model
o Reliability
  - The network depends on a possibly highly loaded server to function properly.
  - The server needs to be replicated to some extent to provide better reliability.
o Scalability
  - More users imply more demand for computing power, storage space and bandwidth
Peer-to-Peer Model
All nodes have the same functional capabilities and responsibilities
No reliance on central services or resources.
A node acts as both a “server” and a client.
Considered more scalable
Peer-to-Peer Model
Peer-to-peer systems provide access to information resources located on computers throughout the network.
Algorithms for the placement and subsequent retrieval of information are a key aspect of the design of P2P systems.
Why P2P?
The Internet has three valuable fundamental assets
o Information
o Computing resources
o Bandwidth
All of which are vastly underutilized, partly due to the traditional client-server model
Why P2P?
No single search engine can locate and catalog the ever-increasing amount of information on the Web in a timely way
Moreover, a huge amount of information is transient and not subject to capture by techniques such as Web crawling
o Google claims that it searches about 1.3 x 10^8 web pages
o Finding useful information in real time is increasingly difficult!
Why P2P?
Although miles of new fiber have been installed, the new bandwidth gets little use if everyone goes to Yahoo for content and to eBay
Instead, hot spots just get hotter while cold pipes remain cold
This is partly why most people still feel congestion on the Internet even though a single fiber’s bandwidth has increased by a factor of 10^6 since 1975, doubling every 16 months
Why P2P?
P2P can potentially eliminate the single-source bottleneck
P2P can be used to distribute data and control and load-balance requests across the Net
P2P potentially eliminates the risk of a single point of failure
P2P infrastructure allows direct access and shared space, and this can enable remote maintenance capability
Brief History
Generation 1 of P2P Systems
o Napster music exchange
Generation 2
o Freenet, Gnutella, Kazaa, BitTorrent
Generation 3
o Characterized by the emergence of middleware layers for the application-independent management of distributed resources on a global scale
  - Pastry, Tapestry, Chord, Kademlia
Environment Characteristics for Peer-to-Peer Systems
Unreliable environments
Peers connecting/disconnecting – network failures to participation
Random failures, e.g. power outages, cable and DSL failures, hackers
Personal machines are much more vulnerable than servers
Evaluating Peer-to-Peer Systems
A node’s database:
o What does a node need to save in order to operate properly/efficiently?
Success rate (if the file is in the network, what are the chances that a search will find it?)
Lookup cost:
o Time
o Communication (bandwidth usage)
Join/departure cost
Fault tolerance – resilience to faults
Resilience to denial-of-service attacks, security
Issues in File Sharing Services
Publish – How to insert a new file into the network
Lookup – Find a specific file
Retrieval – Getting a copy of a file
P2P File Sharing Software
Allows a user to open up a directory in their file system
o Anyone can retrieve a file from the directory
o Like a Web server
Allows the user to copy files from other users’ open directories:
o Like a Web client
Allows users to search nodes for content based on keyword matches:
o Like Google
Napster: How Did It Work
Application-level, client-server protocol over point-to-point TCP
Centralized directory server
Steps:
o Connect to the Napster server
o Give the server keywords to search the full list with.
o Select the “best” of the correct answers.
  - One approach is to select based on the response time of pings.
  - The shortest response time is chosen.
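The steps above can be sketched in a few lines of Python. This is only an illustration of the idea, not Napster's actual protocol: the `DirectoryServer` class, the `pick_fastest` helper, the peer addresses, and the ping times are all hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class DirectoryServer:
    """Hypothetical stand-in for the centralized directory."""
    # Maps a file name to the set of peer addresses that advertise it.
    index: dict = field(default_factory=dict)

    def publish(self, peer, filenames):
        # A peer uploads its file list together with its address.
        for name in filenames:
            self.index.setdefault(name, set()).add(peer)

    def search(self, keyword):
        # Return every (file name, peer) pair whose name contains the keyword.
        keyword = keyword.lower()
        return [(name, peer)
                for name, peers in self.index.items()
                if keyword in name.lower()
                for peer in sorted(peers)]


def pick_fastest(hits, ping_ms):
    """Choose the hit whose peer has the shortest measured ping time."""
    return min(hits, key=lambda hit: ping_ms[hit[1]])


server = DirectoryServer()
server.publish("10.0.0.1", ["homer_song.mp3"])
server.publish("10.0.0.2", ["homer_song.mp3", "other.mp3"])

hits = server.search("homer")
# Ping times (in ms) are made-up measurements for illustration.
best = pick_fastest(hits, {"10.0.0.1": 80.0, "10.0.0.2": 25.0})
print(best)
```

Note that only the lookup goes through the server; the file itself would then be fetched directly from the chosen peer.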
Napster: How Did It Work
1. File list and IP address are uploaded to the napster.com centralized directory.
2. User requests search at server (query and results).
3. User pings hosts that apparently have data. Looks for best transfer rate.
4. User chooses server and retrieves file.
Napster’s centralized server farm had a difficult time keeping up with traffic
Napster
The indexes were centralized, but users supplied the files, which were stored and accessed on their personal computers
Napster became very popular for music exchange
Napster
History:
5/99: Shawn Fanning (freshman, Northeastern U.) founds Napster Online music service
12/99: first lawsuit
3/00: 25% of UWisc traffic is Napster
2/01: US Circuit Court of Appeals: Napster knew users were violating copyright laws
7/01: # simultaneous online users: Napster 160K, Gnutella 40K, Morpheus (KaZaA) 300K
Napster
Judge orders Napster to pull plug in July ‘01
Other file sharing apps take over!
[Figure: traffic in bits per sec (axis from 0.0 to 8M) for gnutella, napster, and fastrack (KaZaA)]
Napster’s Downfall
Napster’s developers argued they were not liable for infringement of the copyrights
o Why? They were not participating in the copying process, which was performed entirely between users’ machines.
This argument was not accepted by the courts
o Why? The index servers were deemed an essential part of the process
Since the index servers were located at well-known addresses, their operators were unable to remain anonymous.
o Makes for an easy lawsuit target
Napster’s Downfall
A more fully distributed file sharing service spreads the responsibility across all of the users
o Makes the pursuit of legal remedies difficult
Napster: Discussion
Locates files quickly
Vulnerable to censorship and technical failure
Popular data become less accessible because of the load of the requests on a central server
People started to look for more distributed solutions to file-sharing as a result of Napster’s failure.
Gnutella
Napster’s legal problems motivated Gnutella, which does not use centralized indexes
The focus is on a decentralized method of searching for files
o Central directory server no longer the bottleneck
o More difficult to “pull plug”
Each application instance serves to:
o Store selected files
o Route queries from and to its neighboring peers
o Respond to queries if file stored locally
o Serve files
Gnutella
Gnutella history:
o 3/14/00: release by AOL, almost immediately withdrawn
o Became open source
o Many iterations to fix poor initial design (poor design turned many people off)
Issues:
o How much traffic does one query generate?
o How many hosts can it support at once?
o What is the latency associated with querying?
o Is there a bottleneck?
Gnutella: Searching
Searching by flooding:
• A Query packet might ask, "Do you have any content that matches the string ‘Homer’?"
o If a node does not have the requested file, then 7 (default set by Gnutella) of its neighbors are queried.
o If the neighbors do not have it, they contact 7 of their neighbors.
o Maximum hop count: 10 (this is called time-to-live, TTL)
o Reverse path forwarding for responses (not files)
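Flooding as described above can be sketched as a breadth-first search over the overlay. This is a simplified model under stated assumptions, not the Gnutella wire protocol: the function name `flood_query`, the dictionary-based `graph` and `files` structures, and the tiny four-peer network are all hypothetical. Following the slide, a peer that has a match answers and does not forward the query further.

```python
from collections import deque


def flood_query(graph, files, start, keyword, ttl=10, fanout=7):
    """Breadth-first flood: returns (peers with a match, query messages sent)."""
    seen = {start}            # peers that have already handled this query
    hits = []
    messages = 0
    queue = deque([(start, ttl)])
    while queue:
        peer, hops_left = queue.popleft()
        if any(keyword in name for name in files.get(peer, ())):
            hits.append(peer)  # this peer answers and does not forward
            continue
        if hops_left == 0:
            continue           # TTL exhausted: the query dies here
        for neighbor in graph.get(peer, [])[:fanout]:
            messages += 1      # every forwarded copy costs one message
            if neighbor not in seen:   # duplicate copies are dropped on arrival
                seen.add(neighbor)
                queue.append((neighbor, hops_left - 1))
    return hits, messages


# Hypothetical overlay: a - b, a - c, b - d, c - d; only d holds the file.
graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
files = {"d": ["Homer.mp3"]}

print(flood_query(graph, files, "a", "Homer", ttl=2))
print(flood_query(graph, files, "a", "Homer", ttl=1))
```

With a TTL of 2 the query reaches d; with a TTL of 1 it dies one hop short, which shows how the hop limit trades success rate against traffic.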
Gnutella: Searching
Downloading
• Peers respond with a “QueryHit” (contains contact info)
• File transfers use a direct connection using the HTTP protocol’s GET method
• When there is a firewall, a "Push" packet is used – reroutes via Push path
Gnutella: Discovering Peers
A peer has to know at least one other peer to send requests to.
Addresses of some peers have been published on a website.
When a peer enters the network, it contacts a designated peer and receives a list of other peers that have recently entered the network.
Gnutella: Discussion
Robust: The failure of a peer is not a failure of Gnutella.
Performance: Flooding leads to poor performance
Free riders: Those who get data but do not share data.
The model of Gnutella just presented was found not to be workable.
This led to models in which some peer nodes hold indexes.
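To see why flooding performs poorly, it helps to bound the traffic of a single query using the defaults quoted earlier (7 neighbors per hop, TTL of 10). The sketch below assumes the worst case in which no peer is ever reached twice; `max_query_messages` is a hypothetical helper name, and real networks would generate fewer messages because peers share neighbors.

```python
def max_query_messages(fanout=7, ttl=10):
    """Upper bound on query messages if no peer is reached twice."""
    # Hop k carries fanout**k messages; sum over hops 1..ttl.
    return sum(fanout ** hop for hop in range(1, ttl + 1))


print(max_query_messages())  # 329554456
```

Even as a loose upper bound, hundreds of millions of messages for one query explains the move toward designs where some nodes keep indexes.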
KaZaA: The Service
More than 3 million up peers sharing over 3,000 terabytes of content
More popular than Napster ever was
More than 50% of Internet traffic?
MP3s & entire albums, videos, games
Optional parallel downloading of files
Automatically switches to new download server when current server becomes unavailable
Provides estimated download times
KaZaA: The Service
A user can configure the maximum number of simultaneous uploads and maximum number of simultaneous downloads
Queue management at server and client
o Frequent uploaders can get priority in server queue
Keyword search
o User can configure “up to x” responses to keywords
Responses to keyword queries come in waves; stops when x responses are found
KaZaA: The Technology
Proprietary
Control data encrypted
Everything in HTTP request and response messages
KaZaA: Architecture
Each peer is either a supernode or is assigned to a supernode
o 56 min avg connect
o Each SN has about 100-150 children
o Roughly 30,000 SNs
Each supernode has TCP connections with 30-50 supernodes
o 23 min avg connect
KaZaA: Architecture
Nodes that have more connection bandwidth and are more available are designated as supernodes
Each supernode acts as a mini-Napster hub, tracking the content and IP addresses of its descendants
A supernode tracks only the content of its children.
Considered a cross between Napster and Gnutella
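The "mini-Napster hub" role can be sketched as a small per-supernode index. KaZaA's protocol is proprietary, so this is only an illustration of the idea; the `Supernode` class, its method names, and the addresses are hypothetical.

```python
class Supernode:
    """Hypothetical index kept by one supernode for its own children only."""

    def __init__(self):
        self.index = {}  # file name -> set of child IP addresses

    def register_child(self, ip, filenames):
        # A child reports the files it shares when it attaches.
        for name in filenames:
            self.index.setdefault(name, set()).add(ip)

    def remove_child(self, ip):
        # Forget a child that disconnected, dropping empty entries.
        for name in list(self.index):
            self.index[name].discard(ip)
            if not self.index[name]:
                del self.index[name]

    def search(self, keyword):
        # Keyword match against the names this supernode knows about.
        keyword = keyword.lower()
        return sorted((name, ip)
                      for name, ips in self.index.items()
                      if keyword in name.lower()
                      for ip in ips)


sn = Supernode()
sn.register_child("10.0.0.5", ["Homer.mp3", "notes.txt"])
sn.register_child("10.0.0.6", ["homer_live.mp3"])
print(sn.search("homer"))
sn.remove_child("10.0.0.5")
print(sn.search("homer"))
```

Because each index covers only about a hundred children, a child leaving affects one small table rather than a global directory.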
KaZaA: Finding Supernodes
List of potential supernodes included within software download
New peer goes through list until it finds operational supernode
o Connects, obtains more up-to-date list, with 200 entries
o Nodes in list are “close” to ON.
o Node then pings 5 nodes on list and connects with the one
If supernode goes down, node obtains updated list and chooses new supernode
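The join procedure above can be sketched as follows. The slide does not say how the node chooses among the 5 pinged supernodes, so picking the lowest ping time here is an assumption, as are the function name `choose_supernode` and the injected callables `is_up`, `fetch_list`, and `ping_ms`.

```python
def choose_supernode(bundled, is_up, fetch_list, ping_ms, probes=5):
    """Return the address of the supernode to attach to, or None."""
    # Walk the list shipped with the software until one entry responds.
    first = next((sn for sn in bundled if is_up(sn)), None)
    if first is None:
        return None
    # Ask that supernode for a fresher list and probe a few entries.
    candidates = fetch_list(first)[:probes]
    reachable = [sn for sn in candidates if is_up(sn)]
    if not reachable:
        return first  # fall back to the supernode that answered
    # ASSUMPTION: prefer the lowest ping time; the source does not specify.
    return min(reachable, key=ping_ms)


# Hypothetical network state for illustration.
up = {"sn2", "sn3", "sn4"}
pings = {"sn3": 40.0, "sn4": 15.0}
chosen = choose_supernode(
    bundled=["sn1", "sn2"],
    is_up=lambda sn: sn in up,
    fetch_list=lambda sn: ["sn3", "sn4", "sn5"],
    ping_ms=lambda sn: pings[sn],
)
print(chosen)
```

The same function could be called again when the current supernode goes down, using the most recently obtained list in place of the bundled one.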
KaZaA Queries
Node first sends query to supernode
o Supernode responds with matches
o If x matches found, done.
Otherwise, supernode forwards query to subset of supernodes
o If total of x matches found, done.
Otherwise, query further forwarded
o Probably by original supernode rather than recursively
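The query procedure above amounts to collecting matches until x are found, asking the node's own supernode first and other supernodes only if needed. A minimal sketch, assuming a hypothetical `search(supernode, keyword)` callable and made-up index contents; the forwarding is done iteratively by the original supernode, which the slide suggests is the likely behavior.

```python
def kazaa_query(keyword, x, home, others, search):
    """Collect up to x matches, asking the home supernode first."""
    matches = list(search(home, keyword))[:x]
    for supernode in others:
        if len(matches) >= x:
            break  # enough responses: stop forwarding
        for hit in search(supernode, keyword):
            if hit not in matches:
                matches.append(hit)
            if len(matches) >= x:
                break
    return matches


# Hypothetical per-supernode indexes ("file@child") for illustration.
index = {
    "sn_home": ["a.mp3@1"],
    "sn_b": ["a.mp3@2", "a.mp3@3"],
    "sn_c": ["a.mp3@4"],
}


def lookup(supernode, keyword):
    return [hit for hit in index[supernode] if keyword in hit]


print(kazaa_query("a.mp3", 2, "sn_home", ["sn_b", "sn_c"], lookup))
print(kazaa_query("a.mp3", 10, "sn_home", ["sn_b", "sn_c"], lookup))
```

With x set to 2 the query never reaches the third supernode, which is how the user-configured limit bounds the search traffic.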
Bootstrapping
How do I find out about a peer to begin with?
Use a bootstrapping node (or multiple bootstrapping nodes).
Summary
The use of centralized indexes in Napster lands you in legal woes
The use of Gnutella avoids legal woes but is painfully slow.
KaZaA is somewhere in between.
Can we do better?