2: application layer 1 cmpt 371 data communications and networking chapter 2 application layer - 2

Post on 18-Jan-2018


2: Application Layer 1

CMPT 371: Data Communications and Networking

Chapter 2: Application Layer - 2

2: Application Layer 2

Chapter 2 outline
  2.1 Principles of app layer protocols
  2.2 Web and HTTP
  2.3 FTP
  2.4 Electronic Mail: SMTP, POP3, IMAP
  2.5 DNS
  2.6 Content distribution
    Network Web caching
    Content distribution networks
    P2P file sharing

2: Application Layer 3

Content Distribution
Problem of a single server
  Bottleneck, single point of failure, …
Content distribution
  Distribute (replicate) content at different places
  Direct requests to appropriate places

2: Application Layer 4

Client-side Caching

2: Application Layer 5

Limit of Client-side Caching

The cache is not shared among clients!

2: Application Layer 6

Web caches (proxy server)

user sets browser: Web accesses via proxy
browser sends all HTTP requests to proxy
  object in cache: cache returns object
  else cache requests object from origin server, then returns object to client
Proxy: acts as both client and server; typically installed by ISP (university, company, residential ISP)
Goal: satisfy client request without involving origin server

[Figure: two clients exchange HTTP requests and responses with the proxy server, which in turn exchanges HTTP requests and responses with two origin servers]
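The proxy behaviour on this slide can be sketched in a few lines of Python. This is only an illustration, not a real HTTP proxy: `make_proxy` and `fetch_from_origin` are hypothetical names, and the sketch assumes cached objects never expire and are never evicted.

```python
def make_proxy(fetch_from_origin):
    """Build a request handler that caches objects by URL.

    fetch_from_origin is a caller-supplied function (hypothetical) that
    retrieves an object from the origin server.
    """
    cache = {}

    def handle(url):
        if url in cache:
            # Object in cache: return it without contacting the origin.
            return cache[url], "hit"
        # Otherwise fetch from the origin, store a copy, then return it.
        obj = fetch_from_origin(url)
        cache[url] = obj
        return obj, "miss"

    return handle
```

Only the first request for a given URL reaches the origin server; later requests are answered from the cache.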

2: Application Layer 7

More about Web caching
Why Web caching?
  Reduce response time for client requests.
  Reduce traffic on an institution’s access link.
  An Internet dense with caches enables “poor” content providers to deliver content effectively.

2: Application Layer 8

More about Web caching
Why is caching effective for the Web, even if cache space is quite limited? Zipf distribution:

$$p_j = \frac{1/j}{\sum_{k=1}^{|S|} 1/k}, \qquad j = 1, 2, \ldots, |S|$$

[Figure: access probability versus object ID (1 to 10)]
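To see why a Zipf-like popularity makes a small cache effective, the following sketch computes the access probabilities for |S| objects and the hit ratio of a cache that holds only the most popular ones. The function names are hypothetical, and the "ideal" cache that always keeps the top-ranked objects is an assumption made for illustration.

```python
def zipf_probabilities(num_objects):
    """Access probability of objects ranked 1..num_objects: p_j ~ 1/j."""
    weights = [1.0 / j for j in range(1, num_objects + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def ideal_hit_ratio(num_objects, cache_size):
    """Hit ratio if the cache always holds the cache_size most popular objects."""
    return sum(zipf_probabilities(num_objects)[:cache_size])
```

For example, with 10 objects a cache holding just the top 3 already serves more than 60% of requests under this model.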

2: Application Layer 9

Caching example (1)
Assumptions
  average object size = 100,000 bits
  avg. request rate from institution’s browsers to origin servers = 15/sec
  delay from institutional router to any origin server and back to router = 2 sec
Consequences
  utilization on LAN = 15%
  utilization on access link = 100%!
  total delay = Internet delay + access delay + LAN delay = 2 sec + some minutes + some milliseconds
[Figure: origin servers on the public Internet, connected over a 1.5 Mbps access link to an institutional network with a 10 Mbps LAN and an institutional cache]
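The utilization figures follow from dividing the offered load (request rate times object size) by the link capacity. A minimal check, using the numbers from the slide; the function name is hypothetical:

```python
def utilization(requests_per_sec, object_bits, link_bps):
    """Fraction of the link capacity consumed by the offered load."""
    return requests_per_sec * object_bits / link_bps


lan = utilization(15, 100_000, 10_000_000)    # 10 Mbps LAN
access = utilization(15, 100_000, 1_500_000)  # 1.5 Mbps access link
```

Here `lan` evaluates to 0.15 and `access` to 1.0, matching the 15% and 100% on the slide.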

2: Application Layer 10

Caching example (2)
Possible solution
  increase bandwidth of access link to, say, 10 Mbps
Consequences (if 10 Mbps)
  utilization on LAN = 15%
  utilization on access link = 15%
  total delay = Internet delay + access delay + LAN delay = 2 sec + some msecs + some msecs
  often a costly upgrade
[Figure: origin servers on the public Internet, connected over an access link upgraded from 1.5 to 10 Mbps to an institutional network with a 10 Mbps LAN and an institutional cache]

2: Application Layer 11

Caching example (3)
Install cache
  suppose hit rate is 0.4
Consequences
  40% of requests will be satisfied almost immediately
  60% of requests satisfied by origin server
  utilization of access link reduced to 60%, resulting in negligible delays (say 10 msec)
  total delay = Internet delay + access delay + LAN delay = 60% * 2 sec + 40% * 0.01 sec + some milliseconds < 1.3 secs
[Figure: origin servers on the public Internet, connected over a 1.5 Mbps access link to an institutional network with a 10 Mbps LAN and an institutional cache]
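The arithmetic behind this slide can be reproduced as follows. This is a sketch using the slide's own numbers; the variable names are hypothetical and the small LAN delay ("some milliseconds") is left out.

```python
HIT_RATE = 0.4          # fraction of requests served by the cache
ORIGIN_DELAY = 2.0      # seconds, router to origin server and back
CACHE_DELAY = 0.01      # seconds, the "say 10 msec" from the slide

# Only misses cross the access link, so its load drops to 60%.
access_utilization = (1 - HIT_RATE) * 15 * 100_000 / 1_500_000

# Average delay: misses pay the Internet delay, hits pay the cache delay.
average_delay = (1 - HIT_RATE) * ORIGIN_DELAY + HIT_RATE * CACHE_DELAY
```

The average delay comes to about 1.2 seconds, below the 1.3-second bound quoted on the slide and without the cost of upgrading the access link.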

2: Application Layer 12

More about Web caching
Problems of Web caching
  Extra space/machine (proxy)
  Inconsistency (out-of-date objects…)

2: Application Layer 13

Consistency of Cached Objects
Solution 1: no caching

2: Application Layer 14

Consistency of Cached Objects
Solution 2: manually update

2: Application Layer 15

Conditional GET
Goal: don’t send object if client has up-to-date cached version
client: specify date of cached copy in HTTP request
  If-modified-since: <date>
server: response contains no object if cached copy is up-to-date:
  HTTP/1.0 304 Not Modified

[Figure: client-server exchange.
  Object not modified: client sends HTTP request msg with If-modified-since: <date>; server replies HTTP/1.0 304 Not Modified.
  Object modified: client sends HTTP request msg with If-modified-since: <date>; server replies HTTP/1.0 200 OK with <data>.]
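The server-side decision can be sketched as below. This is not a real HTTP server: `respond` is a hypothetical function, the body is a placeholder, and the sketch assumes the object's modification time is a timezone-aware UTC datetime with whole seconds. The date parsing uses the standard library's `email.utils` helpers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def respond(last_modified, if_modified_since=None):
    """Return (status line, body) for a GET, honouring If-modified-since."""
    if if_modified_since is not None:
        cached_copy_date = parsedate_to_datetime(if_modified_since)
        if last_modified <= cached_copy_date:
            # Cached copy is up to date: send no object.
            return "HTTP/1.0 304 Not Modified", b""
    # No cached copy, or the object changed after the cached date.
    return "HTTP/1.0 200 OK", b"<data>"


# Example header value a client would send for a copy cached on 18 Jan 2018.
cached = datetime(2018, 1, 18, 12, 0, 0, tzinfo=timezone.utc)
header = format_datetime(cached, usegmt=True)
```

Calling `respond(cached, header)` yields the 304 status line, while an object modified after the cached date yields 200 OK with the data.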

2: Application Layer 16

Hierarchical cache
How to measure a cache? Hit Ratio (HR)
Calculation
  HR of Proxy 1 = 90%
  HR of Proxy 2 = 95%
  Independent
  Joint HR = 1 - (1 - 0.90)(1 - 0.95) = 99.5%
[Figure: clients, Proxy Server 1, Proxy Server 2, and origin server]

2: Application Layer 17

Hierarchical cache
How to measure a cache? Hit Ratio (HR)
Calculation
  HR of Proxy 1 = 90%
  HR of Proxy 2 = 95%
  Independent
  Joint HR = ?
How about average delay?
[Figure: clients, Proxy Server 1, Proxy Server 2, and origin server]
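One way to approach both questions is sketched below. It assumes a request goes to Proxy 1 first, then to Proxy 2 on a miss, then to the origin server, and that hits at the two proxies are independent. The per-tier delays are hypothetical values chosen only for illustration; the slides do not give them.

```python
def joint_hit_ratio(hr1, hr2):
    """A request misses overall only if it misses at both proxies."""
    return 1 - (1 - hr1) * (1 - hr2)


def average_delay(hr1, hr2, d_proxy1, d_proxy2, d_origin):
    """Expected delay, weighting each tier by the chance it serves the request."""
    return (hr1 * d_proxy1
            + (1 - hr1) * hr2 * d_proxy2
            + (1 - hr1) * (1 - hr2) * d_origin)


joint = joint_hit_ratio(0.90, 0.95)
# Hypothetical delays in seconds: 10 ms, 100 ms, and 2 s.
delay = average_delay(0.90, 0.95, 0.01, 0.1, 2.0)
```

With these assumed delays the joint hit ratio is 99.5% and the expected delay is under 30 ms, because only 0.5% of requests reach the origin server.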

2: Application Layer 18

Content distribution networks (CDNs)

Try: ping www.Microsoft.com, www.Netflix.com, www.ibm.com, www.apple.com
What did you see?
[Figure: origin server in North America feeds a CDN distribution node, which feeds CDN servers in S. America, Europe, and Asia]

2: Application Layer 19

Content distribution networks (CDNs)

Content replication
  CDN company (e.g., Akamai) installs hundreds of CDN servers throughout the Internet
    in lower-tier ISPs, close to users
  Content providers (e.g., Netflix) are the CDN company’s customers.
  CDN replicates its customers’ content in CDN servers. When a provider updates content, the CDN updates its servers.
[Figure: origin server in North America feeds a CDN distribution node, which feeds CDN servers in S. America, Europe, and Asia]

2: Application Layer 20

CDN example

origin server (www.foo.com)
  distributes HTML
  replaces http://www.foo.com/sports/ruth.gif
  with http://www.cdn.com/www.foo.com/sports/ruth.gif
CDN company (cdn.com)
  distributes gif files
  uses its authoritative DNS server to redirect requests
Steps
  1. HTTP request for www.foo.com/sports/sports.html (to origin server)
  2. DNS query for www.cdn.com (to CDN’s authoritative DNS server)
  3. HTTP request for www.cdn.com/www.foo.com/sports/ruth.gif (to nearby CDN server)
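The URL replacement performed by the origin server can be sketched as a small rewriting function. The function name is hypothetical, and the sketch only handles the scheme, host, and path, which is all the slide's example needs.

```python
from urllib.parse import urlsplit


def rewrite_for_cdn(url, cdn_host="www.cdn.com"):
    """Prefix the original host and path with the CDN's host name."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{cdn_host}/{parts.netloc}{parts.path}"
```

Applied to the image URL from the slide, this produces the CDN form, so the browser's next DNS query is for www.cdn.com rather than www.foo.com.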

2: Application Layer 21

More about CDNs
routing requests
  CDN creates a “map”, indicating distances between leaf ISPs and CDN nodes
  when query arrives at authoritative DNS server:
    server determines ISP from which query originates
    uses “map” to determine best CDN server
Caching vs. CDN
  Caching: pull (passive)
  CDN: push (active)
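The "map" lookup can be pictured as a table of distances keyed by the querying ISP. The sketch below is purely illustrative: the ISP names, CDN node names, and distance values are all made up, and "best" is assumed to mean smallest distance.

```python
# Hypothetical map: distance from each leaf ISP to each CDN node.
DISTANCE_MAP = {
    "isp-vancouver": {"cdn-north-america": 10, "cdn-europe": 140, "cdn-asia": 120},
    "isp-berlin": {"cdn-north-america": 130, "cdn-europe": 15, "cdn-asia": 180},
}


def best_cdn_server(isp, distance_map=DISTANCE_MAP):
    """Pick the CDN node with the smallest distance to the querying ISP."""
    distances = distance_map[isp]
    return min(distances, key=distances.get)
```

The authoritative DNS server would then answer the query with the address of the node returned here.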

2: Application Layer 22

Client-server architecture
server:
  always-on host
  permanent IP address
  server farms for scaling
clients:
  communicate with server
  may be intermittently connected
  may have dynamic IP addresses
  do not communicate directly with each other
[Figure: client/server]

2: Application Layer 23

Pure P2P architecture
  no always-on server
  arbitrary end systems directly communicate
  peers are intermittently connected and change IP addresses
  self scalability: new peers bring new resources
Three topics:
  File distribution
  Searching for information
  Case study: BitTorrent, Skype
[Figure: peer-peer]

2: Application Layer 24

P2P file sharing
Example
  Alice runs P2P client application on her notebook computer
  Intermittently connects to Internet
  Asks for “X.mp3”
  Application displays other peers that have a copy of X.mp3.
  Alice chooses one of the peers, Bob.
  File is copied from Bob’s PC to Alice’s notebook: HTTP
  While Alice downloads, other users can download from Alice.
  Alice’s peer is both a Web client and a transient Web server
All peers are servers = highly scalable!

2: Application Layer 25

File Distribution: Server-Client vs P2P
Question: How much time to distribute a file from one server to N peers?
  u_s: server upload bandwidth
  u_i: peer i upload bandwidth
  d_i: peer i download bandwidth
[Figure: server with upload rate u_s and a file of size F; peers 1..N with upload rates u_1..u_N and download rates d_1..d_N; network with abundant bandwidth]

2: Application Layer 26

File distribution time: server-client

Server transmission: must sequentially send (upload) N copies: NF/u_s time
Client: each must download the file; client i takes F/d_i time to download

Time to distribute F to N clients using client/server approach:

$$D_{cs} = \max\left\{ \frac{NF}{u_s},\ \frac{F}{\min_i d_i} \right\}$$

increases linearly in N (for large N)

2: Application Layer 27

File distribution time: P2P

Server transmission: must upload at least one copy: F/u_s time
Client: each must download a copy; client i takes F/d_i time to download, but also shares (uploads)
Clients (peers): as a whole must download NF bits; fastest possible overall download rate (limited by total upload capacity): u_s + \sum_i u_i

$$D_{P2P} = \max\left\{ \frac{F}{u_s},\ \frac{F}{\min_i d_i},\ \frac{NF}{u_s + \sum_{i=1}^{N} u_i} \right\}$$

increases linearly in N (for large N)?

2: Application Layer 28

Server-client vs. P2P: example
Client upload rate = u, F/u = 1 hour, u_s = 10u, d_min ≥ u_s
[Figure: minimum distribution time (axis 0 to 3.5) versus N (0 to 35) for P2P and client-server]
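The two lower bounds can be evaluated directly for this example. The sketch below uses hypothetical function names and picks units so that F = 1 and u = 1 (so F/u = 1 hour), with u_s = 10u and d_min = u_s, the smallest download rate the example allows.

```python
def client_server_time(file_size, server_up, download_rates):
    """D_cs = max(N*F/u_s, F/min(d_i))."""
    n = len(download_rates)
    return max(n * file_size / server_up, file_size / min(download_rates))


def p2p_time(file_size, server_up, upload_rates, download_rates):
    """D_P2P = max(F/u_s, F/min(d_i), N*F/(u_s + sum(u_i)))."""
    n = len(download_rates)
    return max(file_size / server_up,
               file_size / min(download_rates),
               n * file_size / (server_up + sum(upload_rates)))


N = 30
d_cs = client_server_time(1.0, 10.0, [10.0] * N)            # hours
d_p2p = p2p_time(1.0, 10.0, [1.0] * N, [10.0] * N)          # hours
```

For N = 30 the client-server bound is 3 hours while the P2P bound is 0.75 hours; as N grows, the P2P bound N/(10 + N) stays below 1 hour while the client-server bound keeps growing linearly.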

2: Application Layer 29

File distribution: BitTorrent

tracker: tracks peers participating in torrent
torrent: group of peers exchanging chunks of a file
P2P file distribution: Alice arrives, obtains list of peers from tracker, and begins exchanging file chunks with peers in torrent
[Figure: a new peer obtains the list of peers from the tracker and trades chunks with peers in the torrent]

2: Application Layer 30

BitTorrent (1)
file divided into 256KB chunks.
peer joining torrent:
  has no chunks, but will accumulate them over time
  registers with tracker to get list of peers, connects to subset of peers (“neighbors”)
while downloading, peer uploads chunks to other peers.
peers may come and go: churn
once peer has entire file, it may (selfishly) leave or (altruistically) remain

2: Application Layer 31

BitTorrent (2)
Requesting chunks
  at any given time, different peers have different subsets of file chunks
  periodically, a peer (Alice) asks each neighbor for the list of chunks that they have
  Alice sends requests for her missing chunks, rarest first
Sending chunks: tit-for-tat
  Alice sends chunks to the four neighbors currently sending her chunks at the highest rate
    re-evaluate top 4 every 10 secs
  every 30 secs: randomly select another peer, start sending it chunks
    newly chosen peer may join top 4
    “optimistically unchoke”

2: Application Layer 32

BitTorrent: Tit-for-tat
(1) Alice “optimistically unchokes” Bob
(2) Alice becomes one of Bob’s top-four providers; Bob reciprocates
(3) Bob becomes one of Alice’s top-four providers
With a higher upload rate, a peer can find better trading partners and get the file faster!

2: Application Layer 33

P2P Case study: Skype
  inherently P2P: pairs of users communicate.
  proprietary application-layer protocol (inferred via reverse engineering)
  hierarchical overlay with supernodes (SNs)
  Index maps usernames to IP addresses; distributed over SNs
[Figure: Skype clients (SC), supernodes (SN), and the Skype login server]

2: Application Layer 34

Peers as relays
Problem when both Alice and Bob are behind “NATs”.
  NAT prevents an outside peer from initiating a call to an inside peer (see later)
Solution:
  Using Alice’s and Bob’s SNs, a relay is chosen
  Each peer initiates a session with the relay.
  Peers can now communicate through NATs via the relay

2: Application Layer 35

P2P: searching for information
So many files. But where are they?
Index in P2P system: maps information to peer location (location = IP address & port number)

2: Application Layer 36

P2P: centralized directory
original “Napster” design
1) when peer connects, it informs central server:
  IP address
  content
2) Alice queries for “X.mp3”
3) Alice requests file from Bob
[Figure: peers, including Alice and Bob, register with the centralized directory server (1); Alice queries the server (2) and requests the file from Bob (3)]
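The three steps map naturally onto a small index keyed by file name. The class below is a hypothetical sketch of such a directory server, not Napster's actual design; peer locations are assumed to be (IP address, port) pairs and the addresses used are made up.

```python
from collections import defaultdict


class CentralDirectory:
    """Central index: file name -> set of peer locations that hold it."""

    def __init__(self):
        self._index = defaultdict(set)

    def register(self, peer_location, filenames):
        # Step 1: a connecting peer reports its address and content.
        for name in filenames:
            self._index[name].add(peer_location)

    def unregister(self, peer_location):
        # Called when a peer disconnects.
        for peers in self._index.values():
            peers.discard(peer_location)

    def query(self, filename):
        # Step 2: return the peers that hold the file.
        return sorted(self._index.get(filename, ()))


directory = CentralDirectory()
directory.register(("10.0.0.2", 6699), ["X.mp3", "Y.mp3"])   # Bob
locations = directory.query("X.mp3")                          # Alice's query
```

Step 3, the file transfer itself, then happens directly between Alice and the peer she picks from `locations`, without involving the directory.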

2: Application Layer 37

P2P: problems with centralized directory
  Single point of failure
  Performance bottleneck
  Copyright infringement
file transfer is decentralized, but locating content is highly centralized

2: Application Layer 38

P2P: decentralized directory
  Each peer is either a group leader or assigned to a group leader.
  Group leader tracks the content in all its children.
  Peer queries group leader; group leader may query other group leaders.
[Figure legend: ordinary peer; group-leader peer; neighboring relationships in overlay network]

2: Application Layer 39

More about decentralized directory
advantages of approach
  no centralized directory server
    location service distributed over peers
    more difficult to shut down
disadvantages of approach
  bootstrap node needed
  group leaders can get overloaded

2: Application Layer 40

P2P: Query flooding
Gnutella
  no hierarchy
  use bootstrap node to learn about others
  join message
Send query to neighbors
Neighbors forward query
If queried peer has object, it sends message back to querying peer
[Figure: a peer sending a join message into the overlay]

2: Application Layer 41

P2P: more on query flooding
Pros
  peers have similar responsibilities: no group leaders
  highly decentralized
  no peer maintains directory info
Cons
  excessive query traffic
  query radius: a query may fail to find content that is present in the network
  bootstrap node
  maintenance of overlay network
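Query flooding with a limited radius can be sketched as a breadth-first search over the overlay. This is an illustrative model, not the Gnutella wire protocol: the overlay and content tables, peer names, and the `ttl` parameter (maximum number of hops) are hypothetical.

```python
from collections import deque


def flood_query(overlay, content, start, wanted, ttl):
    """Return peers within ttl hops of start that hold the wanted object."""
    seen = {start}
    frontier = deque([(start, 0)])
    hits = []
    while frontier:
        peer, hops = frontier.popleft()
        if peer != start and wanted in content.get(peer, ()):
            hits.append(peer)          # queried peer answers the origin
        if hops == ttl:
            continue                   # query radius reached: stop forwarding
        for neighbor in overlay[peer]:
            if neighbor not in seen:   # do not forward duplicates
                seen.add(neighbor)
                frontier.append((neighbor, hops + 1))
    return hits


# Hypothetical chain overlay A - B - C - D, where only D holds the file.
overlay = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
content = {"D": {"X.mp3"}}
```

With `ttl=2` the query from A never reaches D and returns nothing even though the file exists; with `ttl=3` it finds D, at the cost of more query traffic.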

2: Application Layer 42

DHT: A New Story…
Motivation:
  Frustrated by popularity of all these “half-baked” P2P apps
  We can do better!
    Guaranteed lookup success for files in system
    Provable bounds on search time
    Provable scalability to millions of nodes

2: Application Layer 43

P2P: Content Addressing (Hash Routing)
Hash routing
  Given an object identifier I, calculate its hash value H = hash(I), and (hopefully) find it (or its location info) in peer H
Not a new idea
  Load balancing: hash IP address, re-direct to different servers
[Figure: an application calls put(key, data) and get(key) -> data on a hash table that is spread over many nodes]

2: Application Layer 44

Hash Routing
Two alternatives
  Node can cache each (existing) object that hashes within its range
  Pointer-based: level of indirection; node caches pointer to location(s) of object
What’s new in P2P?
  Dynamic overlay
    peer join/leave
    number of peers is not fixed
  Traditional hash function doesn’t work
    SHA-1
[Figure: nodes responsible for hash ranges 0-999, 1000-1999, 1500-4999, 4500-6999, 7000-8500, 8000-8999, 9000-9500, 9500-9999]
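One reading of "traditional hash function doesn't work" is that mapping a key to a peer by taking its hash modulo the number of peers breaks down when peers join or leave. The sketch below illustrates that reading; the modulo scheme and the function names are assumptions for illustration, not something the slides spell out.

```python
import hashlib


def node_for(key, num_nodes):
    """Traditional placement: SHA-1 of the key, modulo the number of nodes."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % num_nodes


def moved_fraction(keys, old_nodes, new_nodes):
    """Fraction of keys whose responsible node changes when the count changes."""
    moved = sum(1 for k in keys if node_for(k, old_nodes) != node_for(k, new_nodes))
    return moved / len(keys)


keys = [f"file{i}.mp3" for i in range(1000)]
fraction = moved_fraction(keys, 10, 11)   # one peer joins a 10-peer system
```

With modulo placement, a single join relocates the large majority of keys, which is unacceptable in an overlay where peers come and go constantly. Consistent hashing is designed to avoid this.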

2: Application Layer 45

Distributed Hash Table (DHT)
Challenges
  For each object, node(s) whose range(s) cover that object must be reachable via a “short” path
  Number of neighbors for each node should scale well (e.g., should not be O(N))
  Fully distributed (no centralized bottleneck/single point of failure)
  DHT mechanism should gracefully handle nodes joining/leaving
    need to repartition the range space over existing nodes
    need to reorganize neighbor set
    need bootstrap mechanism to connect new nodes into the existing DHT infrastructure

2: Application Layer 46

Case Studies
Structured overlay (P2P) systems: Consistent Hashing
  Chord
  CAN (Content Addressable Network)
Key Questions
  Q1: How is hash space divided “evenly” among existing nodes?
  Q2: How is routing implemented that connects an arbitrary node to the node responsible for a given object?
  Q3: How is the hash space repartitioned when nodes join/leave?
Let N be the number of nodes in the overlay
Let H be the size of the range of the hash function (when applicable)

2: Application Layer 47

Chord (from MIT, 2001)
Associate to each node and file a unique ID in a one-dimensional space (a ring)
  E.g., pick from the range [0, 2^m - 1]
  Usually the hash of the file or IP address
Properties:
  Routing table size is O(log N), where N is the total number of nodes
  Guarantees that a file is found in O(log N) hops

2: Application Layer 48

Consistent Hashing
A key is stored at its successor: the node with the next higher ID (key = hashed value of a file identifier)
[Figure: circular ID space with nodes N32, N90, N105 and keys K5, K20, K80; legend: K5 = key 5, N105 = node 105]
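The successor rule can be written with a sorted list of node IDs and a binary search. A minimal sketch with a hypothetical function name, using the node and key IDs from the figure; a key whose ID equals a node's ID is assumed to be stored on that node.

```python
from bisect import bisect_left


def successor(node_ids, key):
    """First node whose ID is >= key, wrapping around the circular ID space."""
    ring = sorted(node_ids)
    index = bisect_left(ring, key)
    return ring[index % len(ring)]


nodes = [32, 90, 105]
placement = {key: successor(nodes, key) for key in (5, 20, 80)}
```

Keys 5 and 20 land on N32 and key 80 on N90. A key larger than every node ID, such as 110, wraps around to N32.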

2: Application Layer 49

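Chord Basic Lookup
[Figure: ring with nodes N10, N32, N60, N90, N105, N120 and key K80; the query “Where is key 80?” travels around the ring and the answer “N90 has K80” is returned]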

2: Application Layer 50

Chord “Finger Table”
Entry i in the finger table of node n is the first node that succeeds or equals n + 2^i
In other words, the ith finger points 1/2^(m-i) of the way around the ring (for an m-bit ID space)
[Figure: fingers of node N80 spanning 1/2, 1/4, 1/8, 1/16, 1/32, 1/64, and 1/128 of the ring]

2: Application Layer 51

Chord Join
Assume a hash space [0..7]
Node n1 joins

Succ. Table of n1
i   id+2^i   succ
0   2        1
1   3        1
2   5        1

[Figure: ring with positions 0 to 7]

2: Application Layer 52

Chord Join

Node n2 joins

Succ. Table of n1
i   id+2^i   succ
0   2        2
1   3        1
2   5        1

Succ. Table of n2
i   id+2^i   succ
0   3        1
1   4        1
2   6        1

[Figure: ring with positions 0 to 7]

2: Application Layer 53

Chord Join

Nodes n0, n6 join

Succ. Table of n1
i   id+2^i   succ
0   2        2
1   3        6
2   5        6

Succ. Table of n2
i   id+2^i   succ
0   3        6
1   4        6
2   6        6

Succ. Table of n0
i   id+2^i   succ
0   1        1
1   2        2
2   4        6

Succ. Table of n6
i   id+2^i   succ
0   7        0
1   0        0
2   2        2

[Figure: ring with positions 0 to 7]

2: Application Layer 54

Chord Join

Nodes: n1, n2, n0, n6
Keys: f7, f1
[Figure: ring with positions 0 to 7 and the same four successor tables as on the previous slide; key 7 is stored at node 0 and key 1 at node 1]

2: Application Layer 55

Chord Routing
Upon receiving a query for a file id, a node first calculates the key (hash of the id)
Checks whether it stores the key locally
If not, forwards the query to the largest node in its successor table that does not exceed the key
[Figure: query(7) issued on the ring with nodes n0, n1, n2, n6, their successor tables as on the previous slides, and keys 7 and 1]
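The successor tables and the routing rule can be reproduced for the 3-bit example with nodes 0, 1, 2, and 6. The sketch below is a simplified model under stated assumptions: every node's table is computed from a global sorted list rather than learned through joins, "does not exceed the key" is interpreted with ring (modular) arithmetic, and all function names are hypothetical.

```python
from bisect import bisect_left


def successor(ring, ident):
    """First node whose ID is >= ident, wrapping around the ring."""
    return ring[bisect_left(ring, ident) % len(ring)]


def finger_table(ring, n, m):
    """Entry i is the successor of (n + 2**i) mod 2**m."""
    return [successor(ring, (n + 2 ** i) % 2 ** m) for i in range(m)]


def between(x, a, b, size, closed):
    """True if x is in the ring interval (a, b), or (a, b] when closed."""
    span = (b - a) % size or size
    dist = (x - a) % size or size
    return dist <= span if closed else dist < span


def lookup(nodes, start, key, m):
    """Return the list of nodes visited until the key's owner is reached."""
    ring, size = sorted(nodes), 2 ** m
    node, path = start, [start]
    for _ in range(len(ring) + 1):
        predecessor = ring[ring.index(node) - 1]
        if between(key, predecessor, node, size, closed=True):
            return path                      # this node stores the key
        fingers = finger_table(ring, node, m)
        next_node = fingers[0]               # immediate successor
        if not between(key, node, next_node, size, closed=True):
            # Farthest finger that still precedes the key on the ring.
            for finger in reversed(fingers):
                if between(finger, node, key, size, closed=False):
                    next_node = finger
                    break
        node = next_node
        path.append(node)
    return path


ring_nodes = [0, 1, 2, 6]
route = lookup(ring_nodes, start=1, key=7, m=3)
```

For query(7) starting at node 1, the route is node 1, then node 6, then node 0, which stores key 7. `finger_table(ring_nodes, 1, 3)` gives the succ column 2, 6, 6 shown for n1.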

2: Application Layer 56

Chord Summary

Routing table size?
  log N fingers
Routing time?
  Each hop is expected to halve the distance to the desired key, so expect O(log N) hops.
Note: so far only the basic Chord; many practical issues remain (not covered in this course, though…)

2: Application Layer 57

A few words about BitCoin (and other digital/virtual currency)
Two key issues for a currency
  Generation (where does it come from?)
  Distribution (how to use it, i.e., buy/sell transactions?)
BitCoin: open source P2P currency
  Mining (hashing)
  Verification

2: Application Layer 58

Chapter 2: Summary

Our study of network apps is now complete!
application service requirements:
  reliability, bandwidth, delay
client-server paradigm
Internet transport service model
  connection-oriented, reliable: TCP
  unreliable, datagrams: UDP
specific protocols:
  HTTP
  FTP
  SMTP, POP, IMAP
  DNS
content distribution
  caches, CDNs
  P2P

2: Application Layer 59

Chapter 2: Summary

More importantly: learned about protocols
typical request/reply message exchange:
  client requests info or service
  server responds with data, status code
message formats:
  headers: fields giving info about data
  data: info being communicated
control vs. data msgs
  in-band, out-of-band
centralized vs. decentralized
stateless vs. stateful
reliable vs. unreliable msg transfer
“complexity at network edge”: many protocols
security: authentication
