1 midterm review ee122 fall 2011 scott shenker ee122/ materials with thanks to jennifer rexford, ion...
Post on 21-Dec-2015
218 views
TRANSCRIPT
1
Midterm Review
EE122 Fall 2011
Scott Shenker
http://inst.eecs.berkeley.edu/~ee122/
Materials with thanks to Jennifer Rexford, Ion Stoica, Vern Paxsonand other colleagues at Princeton and UC Berkeley
Announcements
• Available after class
• I hate these review lectures….
2
Agenda
• Finish Web caching
• Midterm review
3
Finishing Up Web Caching
4
5
Improving HTTP Performance:
Caching• Many clients transfer same information
– Generates redundant server and network load– Clients experience unnecessary latency
Server
Clients
Backbone ISP
ISP-1 ISP-2
6
Improving HTTP Performance:
Caching: How• Response header:
– Expires – how long it’s safe to cache the resource– No-cache – ignore all caches; always get resource
directly from server
• If entry has not expired, cache returns it– Otherwise, it issues an if-modified-since
• Modifier to GET requests:– If-modified-since – returns “not modified” if
resource not modified since specified time
7
Improving HTTP Performance:
Caching: Why• Motive for placing content closer to client:
– User gets better response time– Content providers get happier users
o Time is money, really!– Network gets reduced load
• How well does caching work?– Very well, up to a limit– Large overlap in content– But many unique requests– sound familiar?
8
Improving HTTP Performance:
Caching with Reverse Proxies
Cache documents close to server decrease server load
• Typically done by content providers• Only works for static content
Clients
Backbone ISP
ISP-1 ISP-2
Server
Reverse proxies
9
Improving HTTP Performance:
Caching with Forward Proxies
Cache documents close to clients reduce network traffic and decrease latency
• Typically done by ISPs or corporate LANs
Clients
Backbone ISP
ISP-1 ISP-2
Server
Reverse proxies
Forward proxies
10
Improving HTTP Performance:
Caching w/ Content Distribution Networks
• Integrate forward and reverse caching functionality– One overlay network (usually) administered by one entity– e.g., Akamai
• Provide document caching– Pull: Direct result of clients’ requests – Push: Expectation of high access rate
• Also do some processing– Handle dynamic web pages– Transcoding
11
Improving HTTP Performance:
Caching with CDNs (cont.)
Clients
ISP-1
Server
Forward proxies
Backbone ISP
ISP-2
CDN
12
Improving HTTP Performance:
CDN Example – Akamai
• Akamai creates new domain names for each client content provider.– e.g., a128.g.akamai.net
• The CDN’s DNS servers are authoritative for the new domains
• The client content provider modifies its content so that embedded URLs reference the new domains.– “Akamaize” content– e.g.: http://www.cnn.com/image-of-the-day.gif becomes
http://a128.g.akamai.net/image-of-the-day.gif
• Requests now sent to CDN’s infrastructure…
13
Hosting: Multiple Sites Per Machine
• Multiple Web sites on a single machine– Hosting company runs the Web server on behalf of
multiple sites (e.g., www.foo.com and www.bar.com)
• Problem: GET /index.html– www.foo.com/index.html or www.bar.com/index.html?
• Solutions:– Multiple server processes on the same machine
o Have a separate IP address (or port) for each server
– Include site name in HTTP requesto Single Web server process with a single IP addresso Client includes “Host” header (e.g., Host: www.foo.com)o Required header with HTTP/1.1
14
Hosting: Multiple Machines Per Site
• Replicate popular Web site across many machines– Helps to handle the load– Places content closer to clients
• Helps when content isn’t cacheable
• Problem: Want to direct client to particular replica– Balance load across server replicas– Pair clients with nearby servers
15
Multi-Hosting at Single Location
• Single IP address, multiple machines– Run multiple machines behind a single IP address
– Ensure all packets from a single TCP connection go to the same replica
Load Balancer
64.236.16.20
16
Multi-Hosting at Several Locations
• Multiple addresses, multiple machines– Same name but different addresses for all of the replicas– Configure DNS server to return different addresses
Internet 64.236.16.20
173.72.54.131
12.1.1.1
Midterm Review
17
My General Philosophy on Tests
• I am not a sadist
• I am not a masochist
• For those of you who only read the slides at home:– If you don’t attend lectures, then it is your own damn
fault if you missed something….
• I believe in testing your understanding of the basics, not tripping you up on tiny details or making you calculate pi to 15 decimal places
18
General Guidelines
• Know the basics well, rather than focus on details– Study lecture notes and problem sets– Remember: you can use a crib sheet…..10pt font
• Read text only for general context and to learn certain details
• Just because I didn’t cover it in review doesn’t mean you don’t need to know it!
• Get plenty of sleep19
Things You Don’t Need to Know
• The details of how to fragment packets
• The details of any protocol header– Know semantics, but not syntax
• Any details of DNS, HTTP (thank Ganesh)– Just know that when you access a web page, you do a
DNS request and then an HTTP request
– DNS request, DNS reply, SYN, SYNACK, ACK, HTTP Request, HTTP Reply, FIN, FINACK, ACK
20
First half of course: Basics
• General background (3 lectures)– Basic design principles
• Idealized view of network (4 lectures)– Routing– Reliability
• Making this vision real (5 lectures)– IP, TCP, DNS, Web– Emphasize concepts, but deal with unpleasant realities
21
22
General Background
23
Overview of the Internet
• The Internet is a large complicated system that must meet an unprecedented variety of challenges– Scale, dynamic range, diversity, ad hoc, failures,
asynchrony, malice, and greed
• An amazing feat of engineering– Went against the conventional wisdom– Created a new networking paradigm
• In hindsight, some aspects of design are terrible– Will revisit when we do the clean slate design– But enormity of genius far outweighs the oversights
Internet’s Five Basic Design Decisions
1. Packet-switching
2. Best-effort service model
3. A single internetworking layer
4. Layering
5. The end-to-end principle (and fate-sharing)
24
25
Packet-Switching vs. Circuit-Switching
• Reliability advantage: since routers don’t know about individual conversations, when a router or link fails, it iseasy to fail over to a different path
• Efficiency advantage of packet-switching over circuit switching: Exploitation of statistical multiplexing
• Deployability advantage: easier for different parties to link their networks together because they’re not promising to reserve resources for one another
• Disadvantage: packet-switching must handle congestion– More complex routers (more buffering, sophisticated dropping)– Harder to provide good network services (e.g., delay and
bandwidth guarantees)
What service should Internet support?
• Strict delay bounds?– Some applications require them
• Guaranteed delivery?– Some applications are sensitive to packet drops
• No applications mind getting good service– Why not require Internet support these guarantees?
26
Important life lessons
• People (applications) don’t always need what they think they need
• People (applications) don’t always need what we think they need
• Flexibility often more important than performance– But typically only in hindsight!– Example: cell phones vs landlines
• Architect for flexibility, engineer for performance27
Applying lessons to Internet
• Requiring performance guarantees would limit variety of networks that could attach to Internet
• Many applications don’t need these guarantees
• And those that do? – Well, they don’t either (usually)– Tremendous ability to mask drops, delays
• And ISPs can work hard to deliver good service without changing the architecture
28
29
Kahn’s Rules for Interconnection
• Each network is independent and must not be required to change (why?)
• Best-effort communication (why?)
• Boxes (routers) connect networks
• No global control at operations level (why?)
Tasks in Networking (bottom up)
• Electrons on wire
• Bits on wire
• Packets on wire
• Deliver packets across local network– Local addresses
• Deliver packets across country– Global addresses
• Ensure that packets get there
• Do something with the data
30
Resulting Layers
• Electrons on wire (contained in next layer)
• Bits on wire (Physical)
• Packets on wire (contained in next layer)
• Deliver packets across local network (Link)– Local addresses
• Deliver packets across country (Internetwork)– Global addresses
• Ensure that packets get there (Transport)
• Do something with the data (Application)
31
Decisions and Their Principles
• How to break system into modules– Dictated by Layering
• Where modules are implemented– Dictated by End-to-End Principle
• Where state is stored– Dictated by Fate-Sharing
32
33
Who Does What?
• Five layers– Lower three layers implemented everywhere– Top two layers implemented only at hosts– What is top layer of router doing?
Transport
Network
Datalink
Physical
Transport
Network
Datalink
Physical
Network
Datalink
Physical
Application Application
Host A Host BRouter
What about switches?
34
Layer Encapsulation
Trans: Connection ID
Net: Source/Dest
Link: Src/Dest
Appl: Get index.html
User A User B
Common case: 20 bytes TCP header + 20 bytes IP header + 14 bytes Ethernet header = 54 bytes overhead
35
Pontifications….
General Rules of System Design
• System not scalable?– Add hierarchy– DNS, IP addressing
• System not flexible?– Add layer of indirection– DNS names (rather than using IP addresses as names)
• System not performing well?– Add caches– Web and DNS caching
36
The Paradox of Internet Traffic
• The majority of flows are short– A few packets
• The majority of bytes are in long flows– MB or more
• And this trend is accelerating…
37
A Common Pattern…..
• Distributions of various metrics (file lengths, access patterns, etc.) often have two properties:– Large fraction of total metric in the top 10%– Sizable fraction (~10%) of total fraction in low values
• Not an exponential distribution– Large fraction is in top 10%– But low values have very little of overall total
• Lesson: have to pay attention to both ends of dist.
38
39
Fundamental Tasks:Routing and Reliability
40
Routing
“Valid” Routing State
• Global routing state is “valid” if it produces forwarding decisions that always deliver packets to their destinations– Valid is my terminology, not standard
• Goal of routing protocols: compute valid state– But how can you tell if routing state if valid?
41
Necessary and Sufficient Condition
• Global routing state is valid if and only if:– There are no dead ends (other than destination)– There are no loops
42
How Can You Avoid Loops?
• Restrict topology to spanning tree– If the topology has no loops, packets can’t loop!
• Computation over entire graph– Can make sure no loops– Link-State
• Minimizing metric in distributed computation– Loops are never the solution to a minimization problem– Distance vector
• Won’t review LS/DV, but will review learning switch
43
Easiest Way to Avoid Loops
• Use a topology where loops are impossible!
• Take arbitrary topology
• Build spanning tree (algorithm covered later)– Ignore all other links (as before)
• Only one path to destinations on spanning trees
• Use “learning switches” to discover these paths– No need to compute routes, just observe them
44
A Spanning Tree
45
Flooding on a Spanning Tree
• If you want to send a packet that will reach all nodes, then switches can use the following rule:– Ignoring all ports not on spanning tree!
• Originating switch sends “flood” packet out all ports
• When a “flood” packet arrives on one incoming port, send it out all other ports
• This works because the lack of loops prevents the flooding from cycling back on itself
• Eventually all nodes will be covered, exactly once46
Flooding on Spanning Tree
47
This Enables Learning!
• There is only one path from source to destination
• Each switch can learn how to reach a another node by remembering where its flooding packets came from!
• If flood packet from Node A entered switch from port 4, then to reach Node A, switch sends packets out port 4
48
Learning from Flood Packets
49
Node A
Node A can be reached through this port
Node A can be reached through this port
Once a node has sent a flood message, all other switches know how to reach it….
50
Self-Learning Switch
When a packet arrives
• Inspect source ID, associate with incoming port
• Store mapping in the switch table
• Use time-to-live field to eventually forget mapping
A
B
C
D
Packet tells switch how to reach A.
51
Self Learning: Handling Misses
When packet arrives with unfamiliar destination
• Forward packet out all other ports
• Response will teach switch about that destination
A
B
C
D
When in doubt, shout!
52
General Rule
When switch receives a packet:
index the switch table using destination ID
if entry found for destination {
if dest on port from which packet arrived then drop packet
else forward packet on port indicated
} else flood
forward on all but the interface on which the frame arrived
Why do this?
Reliability Correctness Condition
• Packet is always resent if the previous transmission was lost or corrupted.
• Packet may be resent at other times.– Need not specify this portion…
• All the rest is just implementing this invariant
53
54
Core of Real Architecture
Addressing, Forwarding, TCP, DNS, Web
What Tasks Do We Need to Do?
• Read packet correctly
• Get packet to the destination
• Get responses to the packet back to source
• Carry data
• Tell host what to do with packet once arrived
• Specify any special network handling of the packet
• Deal with problems that arise along the path
55
Dealing with Problems
• Is packet caught in loop? – TTL
• Header Corrupted: – Detect with Checksum– What about payload checksum?
• Packet too large? – Deal with fragmentation– Split packet apart– Keep track of how to put together
56
IP Packet Structure
4-bitVersion
4-bitHeaderLength
8-bitType of Service
(TOS)16-bit Total Length (Bytes)
16-bit Identification3-bitFlags 13-bit Fragment Offset
8-bit Time to Live (TTL) 8-bit Protocol 16-bit Header Checksum
32-bit Source IP Address
32-bit Destination IP Address
Options (if any)
Payload
IPv4 and IPv6 Header Comparison
Version IHL Type of Service Total Length
Identification Flags Fragment Offset
Time to Live Protocol Header Checksum
Source Address
Destination Address
Options Padding
Version Traffic Class Flow Label
Payload Length Next Header Hop Limit
Source Address
Destination Address
IPv4 IPv6
Field name kept from IPv4 to IPv6
Fields not kept in IPv6
Name & position changed in IPv6
New field in IPv6
Summary of Changes
• Eliminated fragmentation (why?)
• Eliminated header length (why?)
• Eliminated checksum (why?)
• New options mechanism (next header) (why?)
• Expanded addresses (why?)
• Added Flow Label (why?)
59
Philosophy of Changes
• Don’t deal with problems: leave to ends– Eliminated fragmentation– Eliminated checksum– Why retain TTL?
• Simplify handling:– New options mechanism (uses next header approach)– Eliminated header length
o Why couldn’t IPv4 do this?
• Provide general flow label for packet– Not tied to semantics– Provides great flexibility
60
Comparison of Design Philosophy
Version IHL Type of Service Total Length
Identification Flags Fragment Offset
Time to Live Protocol Header Checksum
Source Address
Destination Address
Options Padding
Version Traffic Class Flow Label
Payload Length Next Header Hop Limit
Source Address
Destination Address
IPv4 IPv6
To Destination and Back (expanded)
Deal with Problems (greatly reduced)
Read Correctly (reduced)
Special Handling (similar)
Original Internet Addresses
• First eight bits: network address (/8)
• Last 24 bits: host address
Assumed 256 networks were more than enough!
62
63
Next Design: Classful Addressing
– Class A: if first byte in [0..127] assume /8 (top bit = 0)
o Very large blocks (e.g., MIT has 18.0.0.0/8)
– Class B: first byte in [128..191] assume /16 (top bits = 10)
o Large blocks (e.g,. UCB has 128.32.0.0/16)
– Class C: [192..223] assume /24 (top bits = 110)
o Small blocks (e.g., ICIR has 192.150.187.0/24)o (My house used to have a /25)
0******* ******** ******** ********
10****** ******** ******** ********
110***** ******** ******** ********
64
Classful Addressing (cont’d)
– Class D: [224..239] (top bits 1110)
o Multicast groups
– Class E: [240..255] (top bits 11110)
o Reserved for future use
• What problems can classful addressing lead to?– Only comes in 3 sizes– Routers can end up knowing about many class C’s (/24s)– Wasted address space
1110**** ******** ******** ********
11110*** ******** ******** ********
Today’s Addressing: CIDR
• CIDR = Classless Interdomain Routing
• Flexible division between network and host addresses
• Must specify both address and mask– Clarifies where boundary between addresses lies– Classful addressing communicate this with first few bits– CIDR requires explicit mask
65
66
CIDR Addressing
IP Address : 12.4.0.0 IP Mask: 255.254.0.0
00001100 00000100 00000000 00000000
11111111 11111110 00000000 00000000
Address
Mask
for hosts Network Prefix
Use two 32-bit numbers to represent a network. Network number = IP address + Mask
Written as 12.4.0.0/15 or 12.4/15
67
Obtaining a Block of Addresses• Allocation is also hierarchical
– Prefix: assigned to an institution– Addresses: assigned by the institution to their nodes
• Who assigns prefixes?– Internet Corporation for Assigned Names and Numbers
o Allocates large address blocks to Regional Internet Registrieso ICANN is politically charged
– Regional Internet Registries (RIRs)o E.g., ARIN (American Registry for Internet Numbers)o Allocates address blocks within their regionso Allocated to Internet Service Providers and large institutions ($$)
– Internet Service Providers (ISPs)o Allocate address blocks to their customers (could be recursive)
• Often w/o charge
68
Dynamic Host Configuration Protocol
arrivingclient
DHCP server203.1.2.5
DHCP discover(broadcast)
DHCP offer
DHCP request
DHCP ACK
(broadcast)
Why all the broadcasts?
(broadcast)
(broadcast)
69
Network Address Translation (NAT)
Before NAT…every machine connected to Internet had unique IP address
1.2.3.4
1.2.3.5
5.6.7.8
LAN
Clients
Server
Internet1.2.3.45.6.7.880 1001
dest addr src addrdst port
src port
5.6.7.8 1.2.3.4 80 1001
70
NAT (cont’d)
• Assign addresses to machines behind same NAT– Usually in address block 192.168.0.0/16
• Use port numbers to multiplex single address
192.2.3.4
192.2.3.5
5.6.7.8
Clients
Server
Internet
NAT
1.2.3.4
5.6.7.8 192.2.3.4 80 1001
192.2.3.4:1001 1.2.3.4:2000
5.6.7.8 1.2.3.4 80 2000
1.2.3.45.6.7.880 2000
5.6.7.8 192.2.3.480 1001
71
NAT (cont’d)
192.2.3.4
192.2.3.5
5.6.7.8
Clients
Server
Internet
NAT
1.2.3.4
192.2.3.4:1001 1.2.3.4:2000
5.6.7.8 1.2.3.4 80 2001
1.2.3.45.6.7.880 2001
5.6.7.8 192.2.3.580 1001
192.2.3.5:1001 1.2.3.4:2001
5.6.7.8 192.2.3.5 80 1001
• Assign addresses to machines behind same NAT– Usually in address block 192.168.0.0/16
• Use port numbers to multiplex single address
72
Forwarding
73
Scalability via Address Aggregation
Provider is given 201.10.0.0/21 (201.10.0.x .. 201.10.7.x)
201.10.0.0/22 201.10.4.0/24 201.10.5.0/24 201.10.6.0/23
Provider
Routers in the rest of the Internet just need to know how to reach 201.10.0.0/21. The provider can direct the
IP packets to the appropriate customer.
Each customergiven smaller prefix
Global Picture
74
201.10.0/21Port 1 201.11.0/21Port 2 202/8 Port 4……………..
201.10.0/22Port 1 201.10.4/24Port 2 201.10.5/24Port 3201.10.6/23Port 4
Router in Internet Core
Router in ISPOnly /21 listed in core
/22, /23, /24 only listed in ISP’s router
75
Aggregation Not Always Possible
201.10.0.0/21
201.10.0.0/22 201.10.4.0/24 201.10.5.0/24 201.10.6.0/23
Provider 1 Provider 2
Multi-homed customer with 201.10.6.0/23 has two providers. Other parts of the Internet need to know how
to reach these destinations through both providers. /23 route must be globally visible
Multihoming Global Picture
76
201.10.0/21Port 1 201.10.6/23Port 2201.11.0/21Port 3
……………..
201.10.0/22Port 1 201.10.4/24Port 2 201.10.5/24Port 3201.10.6/23Port 4
Router in Internet Core
Router in ISP1
201.10.6/23Port 1201.11.0/21Port 2201.12.0/21Port 3201.13.0/21Port 4
Router in ISP2
Simple Example
• 0** Port 1
• 100 Port 2
• 101 Port 1
• 11* Port 1
77
78
Prefix Tree
00*
000 001
0 101*
010 011
0 111*
110 111
0 110*
100 101
0 1
0**0 1
1**0 1
***0 1
P1
P2 P1
P1
79
More Compact Representation
10*
100
0
1**0
***
1
P2
P1
Record port associated with first match, and only over-ride when it matches
another prefix during walk down tree
This is longest prefix match (LPM)
If you ever leave path, you are done, last matched
prefix is answer
Forwarding Optimization
• LPM requires fewest entries and fewest bits walked
80
Longest Prefix Match Representation
• *** Port 1
• 100 Port 2
• If address matches both, then take longest match
81
82
Example
1 0 1 0 10 1
Prefix destined for Provider 1
Prefix destined for Provider 2
No packet will match more than one prefixAll paths reach a unique prefix
83
More Compact Representation
0
Prefix destined for Provider 1
Prefix destined for Provider 2
84
Transport
Role of Transport Layer
• Provide common end-to-end services for app layer– Deal with network on behalf of applications– Deal with applications on behalf of networks
• Could have been built into apps, but want common implementations to make app development easier– Since TCP runs on end host, this is about software
modularity, not overall network architecture
85
86
TCP Header
Source port Destination port
Sequence number
Acknowledgment
Advertised windowHdrLen Flags0
Checksum Urgent pointer
Options (variable)
Data
87
Example
• Packet arrives:– Seq: 2323– Ack: 4001– W=3000– [no payload]
• Appropriate response?– Seq: 4001, payload: 4001-8000– Seq: 2001, payload: 2001-5000– Seq: 4001, payload: 4001-5000– Seq: 5001, payload: 5001-6000– Seq: 8001, payload: 8001-9000
88
Advertised Window Limits Rate
• Sender can send no faster than W/RTT bytes/sec
• In ideal case, throughput = MIN [W/RTT, B]– Where B is bottleneck on path
89
Establishing a TCP Connection
• Three-way handshake to establish connection– Host A sends a SYN (open; “synchronize sequence
numbers”) to host B– Host B returns a SYN acknowledgment (SYN ACK)– Host A sends an ACK to acknowledge the SYN ACK
SYN
SYN ACK
ACK
A B
DataData
Each host tells its ISN to the other host.