1/14/2002 - iut.ac.irivut.iut.ac.ir/content/110/slides/communication_networks2002.pdf · 1/14/2002...
TRANSCRIPT
Goals
• Introduce the basics of networking from both a theoretical and a practical standpoint
• Foster the ability to understand research issues
Topics
• Transport Layer: Service Models, Protocols, Congestion Control
• Network Layer: Service Models, Routing algorithms, IPv6, Multicast
• Link Layer: Issues, performance, implementations
• Multimedia Services: Application requirements, traffic models, Quality of Service issues, transport protocols for adaptive and hard real time traffic
Prerequisites
• Knowledge of computer architectures. Topics include virtual memory, timers, scheduling, multiprogramming
• Strong programming ability in C
• User-level understanding of the UNIX operating system
• Ability to undertake substantial independent design projects
Resources
• Required Text:
  – Andrew Tanenbaum, Computer Networks, Third Edition, Prentice Hall, 1996
• Recommended Books:
  – Douglas E. Comer, Internetworking with TCP/IP Volume I: Principles, Protocols, and Architecture, 3rd edition, Prentice Hall, 1995.
  – Wright and Stevens, TCP/IP Illustrated, Vol 1. Addison Wesley
Lecture Topics
• History and motivation
• Network architecture
  – Layered models
  – Definitions and abstractions
  – OSI Reference Model
• Network design issues
  – Definitions
  – Components
  – Message, packet, and cell switching
  – Resource sharing
  – Functionality
  – Performance
Networks are Important!!!
• C/S: Client-Server Applications
• FTP: File Transfer Protocol
• Multimedia
• Telnet: Terminal Emulation
• WWW: World Wide Web
• Email: Electronic Mail
• … and many others …
NET.WORK.VIRGINIA
• ATM network with Internet access
• Over 400 sites with OC3, DS3, or DS1 service
• Service through Sprint and Vision Alliance (consortium led by Bell Atlantic)
[Figure: network map with OC3 and DS3 links; labels: Internet, Sprint ROA, Sprint RIC, Sprint WTN, SprintLink Router, Backbone/Internet Gateway, ESnet, vBNS, Internet2]
Network Architecture
• Network architecture
  – Guides the design and implementation of the network
  – Assists in coping with complexity
• Networks are typically modeled as a set of layered, cooperating processes
• The International Organization for Standardization (ISO) has developed the seven-layer Open Systems Interconnection (OSI) model
  – The OSI model is not strictly adhered to in actual implementations. It is used more as a set of guidelines.
A Simple Layered Model
[Figure: layers, top to bottom: Application Programs; Process-to-Process Channels; Host-to-Host Connectivity; Networking Hardware]
• Decomposes system into simpler, manageable components
• Provides a modular design
Multiple Abstractions for One Layer
• Process-to-process channel
  – Request/reply interaction
  – Stream of messages
[Figure: Application Programs over two side-by-side channels (Request/Reply Channel, Message Stream Channel), over Host-to-Host Connectivity, over Networking Hardware]
Functions Are Not Always “Layer-able”
• Some functions may need to interact with multiple layers
[Figure: Network Management shown alongside all four layers: Application Programs; Process-to-Process Channels; Host-to-Host Connectivity; Networking Hardware]
Layered Models … Generalized (1)
• Layer N
  – Provides services to layers N+1 and above
  – Uses services offered by layers N-1 and below
  – May ONLY interact with peer layer N entities via protocols
• Distinction between service, interface, and implementation
[Figure: two stacks, each with Layer N+1, Layer N, Layer N-1]
Layered Models … Generalized (2)
[Figure: Layer N at Node A and at Node B. Each Layer N entity uses services provided by lower layers and provides services to upper layers through service interfaces; the two peers communicate by a protocol over the peer-to-peer interface.]
Layered Models … Generalized (3)
• Protocols are rules for cooperation between peers
  – Peer-to-peer interfaces, e.g. Protocol X defines the interfaces
  – “Protocol” sometimes used to refer to the layer itself, e.g. the entity that realizes Protocol X
• Service access points (SAPs) adhering to an interface definition are needed between layers
  – Service or layer-to-layer interface
  – The services implemented by a protocol at layer X are accessed through its SAP. Think of SAP as a functional interface.
Interfaces and Protocols
• Three components of an interface
  – Set of visible abstract objects, and for each, a set of allowed operations with parameters
  – Set of rules governing sequences of operations
  – Encoding and formatting conventions required for operations and parameters
• Protocols are operationally equivalent, but are usually restricted to peer layers (interfaces are between adjacent layers)
OSI Terminology for Layering
• SAP: Service Access Point (where N+1 accesses N)
• IDU: Interface Data Unit (passed from N+1 to N)
• SDU: Service Data Unit (data from N+1)
• ICI: Interface Control Information (service type, etc.)
• PDU: Protocol Data Unit (exchanged by peer N entities)
[Figure: Layer N+1 passes an IDU (ICI + SDU) through the SAP to Layer N; Layer N prepends a header to the SDU to form the PDU]
OSI Reference Model
[Figure: two end systems, each with seven layers (Application, Presentation, Session, Transport, Network, Data Link, Physical), connected through an intermediate node that has only the Network, Data Link, and Physical layers]
Deviation from Strict Layering
• Example: Fiber Distributed Data Interface (FDDI)
  – LLC: Logical Link Control
  – MAC: Media Access Control
  – PHY: Physical
  – PMD: Physical Media Dependent
  – SMT: Station Management
[Figure: LLC and MAC within Data Link; PHY and PMD within Physical; SMT drawn alongside them]
Layered Model Example
• Typical protocol “stack” in a UNIX-based TCP/IP environment
[Figure: OSI layers (Application, Presentation, Session, Transport, Network, Data Link, Physical) alongside example protocols, top to bottom: HTTP, SMTP, NFS, Telnet, FTP, X; XDR; RPC; TCP, UDP; IP; Ethernet, FDDI]
Internet Protocol Graph
[Figure: protocol graph with applications (FTP, HTTP, ...) over TCP and UDP, over IP, over Net 1, Net 2, ..., Net n]
• Internet protocols (“TCP/IP”) really use a four-layer architecture
Advantages of Layering (1)
• Data hiding and encapsulation -- data structures, algorithms, etc. in a layer are not visible to other layers
• Decomposition -- complex systems can be decomposed into more easily understood pieces
• System can evolve since layers can be changed (as long as the service and interface do not change)
• Alternate services can be offered at layer N+1 that share the services of layer N
Advantages of Layering (2)
• Alternate implementations of a layer can co-exist
• A layer or sublayer can be simplified or omitted if some or all of its services are not needed
• Confidence in correct operation enhanced by testing each layer independently
Disadvantages of Layering
• Some functions (like FDDI station management) really need to access and operate at multiple layers
• Poorly conceived layers can lead to awkward and complex interfaces
• There may be performance penalties due to extra overhead of layers, for example memory-to-memory copies
• Design of (an older) layer N+1 may be sub-optimal given the properties of (a new) layer N
Physical Layer
• The physical layer provides a virtual link for transmitting a sequence of bits between any pair of nodes joined by a physical communication channel -- “virtual bit pipe”
• Synchronous or asynchronous
• Defines physical interface, signaling, cabling, connectors, etc.
• There may be variations at the physical level for a basic data link protocol (PMD specs)
  – IEEE 802.3 (Ethernet): 10Base5 (thick wire), 10Base2 (thin wire), 10BaseT (twisted pair)
Data Link Layer
• The data link layer is responsible for the error-free transmission of packets between “adjacent” or directly-connected nodes (OSI defn)
• The media access control (MAC) function is a sub-layer of the data link layer
  – Allows multiple nodes to share a common transmission medium
  – Supports addressing of nodes
• The logical link control (LLC) function is another sub-layer
  – Functions such as error recovery
Network Layer
• The network layer is responsible for getting a packet through the network from the source node to the destination node
  – Routing to select network path
  – Flow control or congestion control
  – Internetworking to allow transmission between different types of networks
• In a WAN or internetwork, the network layer requires cooperation among peers at intermediate nodes
• Network layer function is minimal in a LAN
• Key: Network layer provides host-to-host communication
Transport Layer (1)
• The transport layer provides network-independent, end-to-end message transfer between pairs of ports or sockets
• Ports are destination points for communication that are defined by software
  – Ports are identified by a transport address that identifies the host computer and the port identifier
  – Used to distinguish between multiple applications on one host
  – Established services, like FTP and HTTP, have “well-known” default port identifiers that can be obtained through a name service (RFC 1700)
• Key: Transport layer provides process-to-process communication.
Transport Layer (3)
• Transport layers typically provide one of two basic types of service:
  – Virtual circuit or connection-oriented service
    • Transmission Control Protocol (TCP)
  – Datagram or connection-less service
    • User Datagram Protocol (UDP)
Transport Layer: Virtual Circuits
• Virtual circuits are logical channels between a source and destination
• Connections are maintained for multiple packet or message transmissions until they are explicitly released
  – Network layer may still use dynamic routing
• Functions
  – Translate transport address to network address
  – Segment messages into packets for transmission
  – Pass packets to network layer for delivery
  – Reassemble packets at receiving end
Transport Layer: Datagrams
• Datagram communication is connectionless
• In effect, a new connection is established and released for each packet or message transmitted
  – Packet itself establishes and releases the “connection”
• Functions
  – Translate transport address to network address
  – Pass messages to network layer for delivery
  – Each message sent as a single packet
  – Upper layer responsible for re-ordering and error detection
Session Layer
• The session layer is responsible for establishing and maintaining virtual connections between pairs of processes in different hosts, possibly including service location and access rights
• Multiple sessions may be multiplexed over a single connection (provided by a lower layer)
Presentation Layer (1)
• The presentation layer represents information to applications so as to preserve semantics (meanings or values) while resolving syntactic (representation) differences
• In open systems, heterogeneous computers result in heterogeneous representations
  – Characters: ASCII, EBCDIC, Unicode
  – Integers: lengths, 1’s versus 2’s complement
  – Reals: fixed or floating point, different floating-point formats
  – Byte order: 01234567... or 67452301
  – Structured data
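The byte-order bullet above can be made concrete with a small sketch. It uses Python's standard `struct` module to serialize the 32-bit value 0x01234567 in both orders; the variable and function names are illustrative only and not part of the lecture.

```python
import struct


def encode_u32(value: int, big_endian: bool) -> bytes:
    """Serialize an unsigned 32-bit integer in the requested byte order."""
    fmt = ">I" if big_endian else "<I"
    return struct.pack(fmt, value)


def decode_u32(data: bytes, big_endian: bool) -> int:
    """Parse an unsigned 32-bit integer, assuming the given byte order."""
    fmt = ">I" if big_endian else "<I"
    return struct.unpack(fmt, data)[0]


value = 0x01234567
print(encode_u32(value, big_endian=True).hex())   # 01234567
print(encode_u32(value, big_endian=False).hex())  # 67452301

# Decoding with the wrong byte order preserves the bits but not the meaning.
wire = encode_u32(value, big_endian=True)
print(hex(decode_u32(wire, big_endian=False)))    # 0x67452301
```

Agreeing on one representation on the wire, and converting at each end, is the kind of syntactic difference the presentation layer is meant to resolve.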
Presentation Layer (2)
• Presentation layer may provide encryption and/or compression
• Comments on security
  – Information security (INFOSEC): security at this layer
  – Communications security (COMSEC): security at the physical or data link layer
Application Layer
• Network applications make up the application layer
• Protocol specific to each particular application
• Certain applications, like HTTP, NFS, FTP, and Telnet have been standardized
• Standards do not provide a fixed model for applications, but models do exist
  – Client-server versus peer-to-peer
  – Remote procedure call (RPC) versus message passing
Network Requirements
• Multiple viewpoints:
  – Network users
    • Performance that a user’s applications need, e.g., latency (delay) and loss rate
  – Network designers
    • Cost-effective design, e.g., network resources are efficiently utilized and fairly allocated
  – Network service providers
    • System that is easy to administer and manage, e.g., faults can be easily isolated and it is easy to account for use
Connectivity (1)
• Network building blocks
  – Nodes -- Workstations, PCs
  – Direct links -- twisted pair, coaxial cable, optical fiber, radio frequency link, …
    • Point-to-point
    • Multiple access (multiaccess)
[Figure: a point-to-point link and a multiple access link]
Connectivity (2)
• Indirect connectivity
  – Switched or routed networks allow indirectly connected nodes to communicate
  – Switches, routers, hubs, etc. are specialized nodes in the network
  – Switching network is the “cloud”
[Figure: Switched Network]
Connectivity (3)
• An internetwork or internet is a network of networks
  – Need internetworking devices: Routers
  – The Internet is a specific example of an internet.
[Figure: Internetwork]
Message versus Packet Switching (1)
• Networks may be classified by how they segment data for transmission and switching
  – Message-switched versus packet-switched
  – Most networks use packet switching (or cell switching)
• Messages
  – Have some higher level meaning, e.g. as a request for service or a reply
  – Encoded as a string of bits
Message versus Packet Switching (2)
• Packets
  – Messages may be decomposed into one or more packets for transmission, reconstructed at receiver
  – Lower layer entities may further decompose packets, for example: Ethernet frames, ATM cells
[Figure: a message split into packets with headers]
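As a rough illustration of decomposing a message into packets and reconstructing it at the receiver, the sketch below attaches a hypothetical header (sequence number and packet count) to each payload. The header layout and the function names `segment` and `reassemble` are assumptions made for this example, not a format defined in the lecture.

```python
import random


def segment(message: bytes, payload_size: int):
    """Split a message into packets of (sequence number, total count, payload)."""
    chunks = [message[i:i + payload_size]
              for i in range(0, len(message), payload_size)]
    return [(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def reassemble(packets):
    """Rebuild the message from packets that may have arrived out of order."""
    ordered = sorted(packets, key=lambda packet: packet[0])
    total = ordered[0][1] if ordered else 0
    if len(ordered) != total:
        raise ValueError("missing packets: cannot reconstruct the message")
    return b"".join(payload for _, _, payload in ordered)


message = b"request for service"
packets = segment(message, payload_size=4)
random.shuffle(packets)  # simulate out-of-order delivery
print(reassemble(packets) == message)
```

The sequence number in the header is what lets the receiver restore the original order; without it, packets that took different paths could not be put back together reliably.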
Sessions
• Messages usually occur as part of a longer transaction called a session
• Session properties
  – Message or packet arrival process (rate, variability)
  – Session holding time
  – Message or packet length distribution
  – Acceptable delay
  – Required reliability and security
  – Acceptable ordering of messages or packets
Circuit vs. Store-and-Forward Switching
• Two forms of switching for the messages or packets in a session are widely used
  – Circuit switching
  – Store-and-forward or, simply, “packet switching”
Circuit Switching
• Session s initiated with a request for a fixed transmission rate (bandwidth requirement) of r_s bits/sec
• Path created through the network
  – Each link in path allocates capacity of r_s bits/sec to s, e.g. using time-division multiplexing (TDM) or frequency division multiplexing (FDM)
  – Request is blocked if no path can be established
• Bandwidth dedicated to s for the life of the session
Efficiency of Circuit Switching
• Most data traffic is “bursty,” so links are not well utilized
• Circuit switching not widely used in data networks (except, inefficiently, for access)
  – Links are expensive
  – Sessions require significant portion of link capacity (only a few sessions can be supported)
  – Traffic is bursty, so utilization is low
[Figure: bursty traffic plotted against time]
Store-and-Forward Switching (1)
• No transmission rate allocation is dedicated at set-up
  – Differs from circuit switching
• Data transmitted at full link capacity, but links can be shared by multiple sessions on a demand basis
Store-and-Forward Switching (2)
• Advantages:
  – Link fully utilized if there is any data to transmit
  – Delay can be significantly reduced
  – Utilization can be significantly increased
• Disadvantages:
  – Greater variance in delay due to queuing delays
  – Flow control needed to prevent buffer overflows
Store-and-Forward Switching (3)
• How is information switched?
  – Message Switching: messages are sent intact without being broken into packets
  – Packet Switching: messages are broken into packets for transmission
  – Cell Switching: messages (or packets) are broken into fixed-size packets called cells
Store-and-Forward Switching (4)
• How are messages or packets routed through the network?
  – Virtual Circuit Routing: a path is established and used for the duration of the session
    • Connection-oriented or virtual circuit service
  – Dynamic Routing: each packet or message may traverse a different path through the network
    • Connection-less or datagram service
Geographic Extent (1)
• Networks may be classified by their geographic extent
  – DANs, LANs, MANs, and WANs
  – Useful classification for lower level protocols
  – Should be transparent to upper layer protocols
• DAN: Desk Area Network
  – Connects PC and peripherals
  – USB, Firewire
  – Medium to high data rates
  – Low-cost, high-volume, built-in interfaces
Geographic Extent (2)
• Local Area Networks (LANs)
  – Limited extent (10’s of meters to a few kilometers)
  – High data rates (megabits to gigabits per second)
  – Built-in interfaces in workstations, PCs
  – Low cost
  – Low delay
  – Examples: Ethernet, Token Ring, FDDI, ATM
Geographic Extent (3)
• Metropolitan Area Networks (MANs)
  – Medium extent (10’s of kilometers)
  – Medium data rates (kilobits to 100’s of megabits per second)
  – Special access equipment, often expensive
  – Examples: FDDI, ATM, DQDB
• Wide Area Networks (WANs)
  – Large extent (global)
  – Low speed (kilobits to 100’s of megabits per second)
  – Special access equipment, usually expensive
  – High latency
  – Examples: T1, T3, SMDS, ATM, OC-XXX links
Resource Sharing (1)
• Economics dictates that network resources must be shared or multiplexed among multiple users
  – Shared links
  – Shared network nodes (switches, hubs, etc.)
[Figure: three hosts attached to one switch and three hosts attached to another, with the two switches joined by a shared link]
Resource Sharing (2)
• Multiplexing schemes
  – Fixed
    • Time-division multiplexing (TDM) or synchronous time-division multiplexing (STDM)
    • Frequency division multiplexing (FDM)
  – On-demand
    • Statistical multiplexing, including asynchronous time-division multiplexing
Statistical Multiplexing
• Packets from all traffic streams are merged into a single queue and transmitted on-demand
  – Scheduling is typically first-come first-served (FCFS), but priority schemes are also used
  – T_SM = L/C seconds needed to transmit L-bit packet
  – May also maintain a separate queue for each traffic stream and service in a “round-robin” manner (skipping over an empty queue with no loss of transmission capacity)
Synchronous Time-Division Multiplexing
• Time on the channel is divided into m slots and each of m traffic streams is given one slot -- unused slots are wasted
  – Create m channels, each with capacity C/m
  – L-bit packet takes T_STDM = Lm/C seconds to transmit if packets are long compared to the length of a slot
  – L-bit packet takes T_STDM = L/C seconds to transmit if slots are of packet length, but must wait (m-1) slots between transmissions
Frequency Division Multiplexing
• Channel bandwidth W is subdivided into m channels and each of m traffic streams is given one channel
  – Create m channels, each with bandwidth W/m, or capacity C/m (ignoring guard bands between channels)
  – L-bit packet takes T_FDM = Lm/C seconds to transmit
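The transmission-time formulas on the three multiplexing slides can be compared side by side. The sketch below simply evaluates T_SM = L/C and T_STDM = T_FDM = Lm/C; the function names and the example numbers (a 1,000-bit packet, a 10 Mbps channel, 10 streams) are assumptions chosen for illustration, and queueing delay is ignored.

```python
def t_statistical(length_bits: float, capacity_bps: float) -> float:
    """Transmission time with statistical multiplexing: the full capacity C is used."""
    return length_bits / capacity_bps


def t_stdm(length_bits: float, capacity_bps: float, streams: int) -> float:
    """STDM with packets long compared to a slot: each stream sees capacity C/m."""
    return length_bits * streams / capacity_bps


def t_fdm(length_bits: float, capacity_bps: float, streams: int) -> float:
    """FDM: each stream gets a channel of capacity C/m (guard bands ignored)."""
    return length_bits * streams / capacity_bps


L, C, m = 1_000, 10e6, 10  # illustrative values only
print(f"statistical: {t_statistical(L, C) * 1e6:.0f} us")
print(f"STDM:        {t_stdm(L, C, m) * 1e6:.0f} us")
print(f"FDM:         {t_fdm(L, C, m) * 1e6:.0f} us")
```

With these numbers the fixed schemes take m times longer to transmit a single packet, which is the transmission-time penalty the comparison slide refers to.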
FDM, STDM vs. Statistical Multiplexing
• Statistical multiplexing has smaller average delay than either STDM or FDM
  – Channel capacity is wasted with STDM (wasted time slot) and FDM (wasted bandwidth) when a traffic stream is idle
  – Transmission time greater for STDM and FDM
• Advantages of STDM or FDM
  – Lower variance of delay: statistical multiplexing has lower average delay, but higher variance of delay
  – STDM and FDM eliminate the need to identify the traffic stream associated with each packet
Functionality (1)
• Network must support common services or process-to-process channels, for example
  – Request/reply channel for file access, digital libraries, etc.
  – Message stream channel for video and audio applications
Functionality (2)
• What can corrupt this functionality? What can go wrong?
  – Link or node failures
  – Errors at the bit or packet level
  – Arbitrary delays
  – Buffer overflows -- lost packets
  – Out of order delivery
  – Security -- eavesdropping, spoofing, etc.
Functionality (3)
• The key problem is to bridge
  – What the application expects and
  – What the underlying technology can provide
• Carries over to a layered model -- Layer N needs to provide
  – What Layer N+1 expects using
  – What Layer N-1 can provide
Distributed Algorithms (1)
• Peers must cooperate to perform network functions
• A distributed algorithm is decomposed into one or more local algorithms
• Each local algorithm proceeds based on the data received from other layers or peers, and the order in which the data is received
[Figure: four nodes, each with Network, Data Link, and Physical layers]
Distributed Algorithms (2)
• These algorithms are complex because underlying services may be unreliable
• Data may …
  – Never arrive (due to transmission error, overflow, etc.)
  – Arrive late (due to arbitrary network delay)
  – Arrive out of order (due to differing network paths)
• It may be impossible to ensure correct operation 100% of the time
  – Maximize probability of success
  – Detect errors
Maroon and Orange Armies (1)
[Figure: the Orange Army positioned between Maroon Army #1 and Maroon Army #2]
• Maroon Armies #1 and #2 must attack simultaneously to defeat the Orange Army
• Maroon Army #1 wants to send a messenger to Maroon Army #2 to set a time for the attack
Maroon and Orange Armies (2)
• The messenger must go through enemy territory (an unreliable communication channel)
• Problems …
  – May be delayed -- until after the attack time
  – May be captured -- so that message is never delivered
Maroon and Orange Armies (3)
• Possible solution: require Maroon Army #2 to send another messenger to acknowledge that the first messenger arrived with the message
  – Acknowledgment messenger may be delayed or captured
  – Maroon Army #2 would think that the attack is on, but Maroon Army #1 cannot know if it is on or not
• There is no possible solution to the problem with probability 1 of success
Maroon and Orange Armies (4)
• The attack can be synchronized with high probability
  – For example, send many messengers to increase the likelihood of one reaching Maroon Army #2
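To see why sending many messengers helps but never reaches certainty, consider a simplified model that is not part of the lecture: assume each messenger arrives independently with the same probability p. The chance that at least one of n messengers arrives is then 1 - (1 - p)^n. The function name and the value p = 0.5 below are illustrative assumptions.

```python
def p_at_least_one_arrives(p_arrive: float, messengers: int) -> float:
    """Probability that at least one of n independent messengers gets through.

    Assumes every messenger succeeds independently with probability p_arrive,
    which is a modeling simplification.
    """
    return 1.0 - (1.0 - p_arrive) ** messengers


for n in (1, 2, 5, 10):
    print(n, p_at_least_one_arrives(0.5, n))
```

The probability approaches 1 as n grows, yet for any finite n and any p below 1 it stays strictly less than 1, which matches the claim that no solution succeeds with probability 1.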
Performance
• Protocols and services define functionality, but not performance
  – Bandwidth, throughput, data rate, capacity, …
  – Latency, delay, …
  – Variability in latency and data rate important for some applications
  – Loss is sometimes a performance measure
• Performance is determined by
  – Underlying technologies
  – Protocol design
  – Protocol implementation
  – Use by the application or upper layer
Bandwidth
• Bandwidth is commonly used to indicate the amount of data that can be transferred in some unit of time
• Example: 10 megabits per second
  – 10^7 bits per second
  – 10^-7 seconds per bit (100 ns) -- the “bit width”
• Link versus end-to-end bandwidth may vary
[Figure: bit pattern 1 0 1, each bit 10^-7 s = 100 ns wide]
Latency (1)
• Latency is delay, i.e. the time it takes for a message to get from one point to another
• Round-trip time (RTT) is the time it takes to get to one point and receive a return back
• End-to-end versus link delay
• Components
  – Processing overhead -- e.g., software overhead
  – Transmission time -- depends on bandwidth and length of message
  – Propagation delay -- time for a bit to travel from one end of a link to another
  – Queueing delay -- time waiting for a shared link
Latency (2)
• Example
  – Processing overhead -- assume 1 µs
  – Transmission time
    • Assume L = 1,000 bit message
    • Assume C = 10 Mbps link
    • Transmission time: T = L/C = 100 µs
  – Propagation delay
    • Speed of light is c = 2×10^8 m/s in optical fiber
    • Assume D = 1 km (1000 m)
    • Propagation delay = D/c = 5 µs
  – Queueing delay -- assume 0
  – Latency is 1 + 100 + 5 = 106 µs (transmission time dominates in this example)
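The worked example above can be reproduced with a few lines of code. This is a minimal sketch that sums the four latency components in microseconds; the function and parameter names are made up for the example, while the input values are the ones assumed on the slide.

```python
def latency_us(processing_us: float, length_bits: float, rate_bps: float,
               distance_m: float, speed_mps: float,
               queueing_us: float = 0.0) -> float:
    """Sum of processing, transmission, propagation, and queueing delay (in µs)."""
    transmission_us = length_bits * 1e6 / rate_bps   # T = L / C
    propagation_us = distance_m * 1e6 / speed_mps    # D / c
    return processing_us + transmission_us + propagation_us + queueing_us


# Slide values: 1 µs overhead, 1,000-bit message, 10 Mbps link,
# 1 km of optical fiber at 2×10^8 m/s, no queueing.
total = latency_us(1.0, 1_000, 10e6, 1_000, 2e8)
print(f"latency is about {total:.0f} us")  # about 106 us
```

Changing one input at a time (for example a 1 Gbps link, or a 1,000 km distance) is a quick way to see which component dominates in a different setting.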
Latency (3)
• Dominating factors
  – Processing overhead can dominate for high data rate links over short distances with short messages
  – Transmission time can dominate for slower links or longer messages
  – Propagation delay is important with long links
  – Queuing delay can dominate in a congested network
Delay × Bandwidth Product
• The delay×bandwidth product is an important factor in protocol design
  – Determines the “size of the pipe”
  – Made large by
    • High delay, e.g. long propagation time
    • High bandwidth, e.g. a fast link
  – Large product means that a large amount of data must be sent to “fill the pipe” before the receiver can respond
[Figure: a pipe of length D (delay) and width B (bandwidth)]
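A quick calculation shows how the “size of the pipe” scales. The sketch below multiplies one-way delay by bandwidth; the first case reuses the 5 µs propagation delay and 10 Mbps link from the latency example, while the second case (50 ms, 155.52 Mbps) is a hypothetical long-haul link chosen only for contrast.

```python
def pipe_size_bits(delay_s: float, bandwidth_bps: float) -> float:
    """Delay × bandwidth product: bits in flight when the pipe is full."""
    return delay_s * bandwidth_bps


short_link = pipe_size_bits(5e-6, 10e6)       # values from the latency example
long_link = pipe_size_bits(50e-3, 155.52e6)   # hypothetical long, fast link

print(f"short link: about {short_link:.0f} bits in flight")
print(f"long link:  about {long_link / 8 / 1024:.0f} KiB in flight")
```

On the short link only a few dozen bits fit in the pipe, so the sender hears back almost immediately; on the long, fast link the sender must commit a large amount of data before any response can arrive.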
You should now be able to … (1)
• Define protocol, service access point, protocol data unit, service data unit
• Describe the structure and role of layers in a network architecture
• Cite advantages and disadvantages of a layered model for a network architecture
• List the seven layers in the OSI reference model and describe the basic functions of each layer
• Describe the three different perspectives on network design
You should now be able to … (2)
• Define the basic components of a network including links, nodes, and switches
• Describe the construction of an internet (with a lower case i)
• Distinguish between message, packet, and cell switching
• Distinguish between store-and-forward and circuit switching and cite advantages and disadvantages of each
• Define DAN, LAN, MAN, and WAN and describe their general characteristics
You should now be able to … (3)
• Describe how STDM, FDM, and statistical multiplexing enable resource sharing and cite advantages and disadvantages of STDM and FDM versus statistical multiplexing
• Define bandwidth and latency
• Calculate bandwidth given the time needed to transmit one bit
• Define the components of latency and describe factors that can increase latency
• Calculate latency given information about the components
Links at the Physical Layer
• Links can be implemented using a variety of physical media
  – Magnetic Media (sneaker net)
  – Twisted pair
  – Coaxial cable
  – Optical fiber
  – Radio waves
  – Infrared
• Different media, together with end electronics and optics, determine the relevant properties of the link
Physical Layer Properties
• Bit encoding -- how is information -- the 1’s and 0’s -- encoded?
• Full-duplex versus half-duplex operation
  – Full-duplex: data in both directions simultaneously
  – Half-duplex: data in one direction at a time
• Data rate -- how much information can be sent in a unit of time?
• Extent -- how long can the link be and still operate reliably?
Examples of Local Links
Service                     Bandwidth        Distances
Category 5 twisted pair     10-1000 Mbps     100 m
50-ohm coax (Thinwire)      10-100 Mbps      200 m
75-ohm coax (Thickwire)     10-100 Mbps      500 m
Multimode fiber             100 Mbps         2 km
Single-mode fiber           100-2400 Mbps    40 km
Examples of Leased Links
Service             Bandwidth
ISDN (B-channel)    64 Kbps
T1 (DS1)            1.544 Mbps
T3 (DS3)            44.736 Mbps
STS-3 (OC-3)        155.520 Mbps
STS-12 (OC-12)      622.080 Mbps
STS-24 (OC-24)      1.244160 Gbps
STS-48 (OC-48)      2.488320 Gbps
Encoding
• Encoding determines how information is represented by electrical, optical, or electromagnetic signals
• Examples
  – Non-return to zero (NRZ)
  – Non-return to zero inverted (NRZI)
  – Manchester
  – Block codes, e.g. 4B/5B
[Figure: two nodes, each with an adaptor; bits are passed between node and adaptor, and a signal travels between the adaptors]
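The first three encodings listed above can be sketched as functions from a bit sequence to a sequence of signal levels (0 = low, 1 = high). The conventions used here are assumptions, since the slide does not fix them: NRZI makes a transition on every 1 bit, and Manchester follows the IEEE 802.3 convention (1 is low-to-high, 0 is high-to-low); the opposite Manchester convention is also in use.

```python
def nrz(bits):
    """NRZ: a 1 is sent as a high level, a 0 as a low level."""
    return [1 if b else 0 for b in bits]


def nrzi(bits, initial_level=0):
    """NRZI (assumed convention): toggle the level for a 1, hold it for a 0."""
    level = initial_level
    levels = []
    for b in bits:
        if b:
            level ^= 1
        levels.append(level)
    return levels


def manchester(bits):
    """Manchester (IEEE 802.3 convention assumed): two half-bit levels per bit."""
    levels = []
    for b in bits:
        levels.extend((0, 1) if b else (1, 0))
    return levels


bits = [1, 0, 1, 1, 0]
print(nrz(bits))         # [1, 0, 1, 1, 0]
print(nrzi(bits))        # [1, 1, 0, 1, 1]
print(manchester(bits))  # [0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
```

Manchester guarantees a transition in the middle of every bit, which helps clock recovery, at the cost of two signal elements per bit. NRZ and NRZI use one element per bit but can produce long runs with no transition.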
Physical Layer “Bit Pipe”
• The Physical layer defines signal levels and timing so that it can deliver a stream of bits to the Data Link layer
• Signal bandwidth determines data rate limit
• Timing errors
  – Noise or distortion can lead to errors in timing
  – Sender and receiver clocks may differ -- “drift”
Asynchronous Transmission (1)
• Each transmission is synchronized
  – A start bit begins a transmission
  – A stop bit ends a transmission
  – Line stays in an idle state until the next start bit
• Samples timed from beginning of start bit
• Timing errors can accumulate up to ± one-half bit time over the entire character
[Figure: character bits 1 1 0 0 1 0 1 0 between a start bit and a stop bit, followed by idle and the next start bit; timing labels T/2 and T]
Asynchronous Transmission (2)
• The Physical layer can provide characters (n-bit units) to the Data Link layer
[Figure: n-bit characters separated by idle periods]
Asynchronous Transmission (3)
• Advantages
  – Simple timing mechanism
  – Inherent character framing
  – Adapts to different data rates (idle serves as fill)
• Disadvantages
  – Timing errors can occur if the line is noisy (e.g., missed start bit)
  – Overhead for stop and start bits
• Used in applications where performance can be reduced to reduce costs
  – Modems
  – PC serial ports
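The start/stop overhead mentioned above is easy to quantify. The sketch below frames one character using common serial-port conventions, which are assumptions here and not stated on the slides: the start bit is 0, the stop bit is 1, and data bits are sent least-significant bit first. The function names are illustrative.

```python
def frame_character(value: int, data_bits: int = 8):
    """Wrap one character in a start bit (0) and a stop bit (1), LSB first."""
    data = [(value >> i) & 1 for i in range(data_bits)]
    return [0] + data + [1]


def framing_efficiency(data_bits: int = 8, start_bits: int = 1,
                       stop_bits: int = 1) -> float:
    """Fraction of transmitted bits that carry data."""
    return data_bits / (data_bits + start_bits + stop_bits)


print(frame_character(0x41))  # [0, 1, 0, 0, 0, 0, 0, 1, 0, 1]
print(framing_efficiency())   # 0.8
```

With 8 data bits and one start and one stop bit, 20% of the line time is framing overhead, independent of the data rate.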
Synchronous Transmission (1)
• Used for high data rates, including T1 and other interoffice digital transmission lines
• Information is sent continuously
  – Receiver and repeater maintain synchronization between the incoming signaling rate and the local sample clock
  – Idle or fill characters inserted if line is idle
• Signal transitions (high-to-low or positive-to-negative) enable clock recovery, or synchronization
  – Some minimum occurrence of signal transitions is needed to maintain synchronization
Synchronous Transmission (2)
• Methods to ensure signal transitions
  – Source code restrictions
  – Dedicated timing bits
  – Bit insertion
  – Data scrambling
  – Forced bit errors
  – Line coding -- include transitions in the signals
Synchronous Transmission (3)
• The Physical layer can provide bits to the Data Link layer
• Data Link layer must extract frame boundaries
  – SYN = 0001 0110 (16H)
  – STX = 0000 0010 (02H)
[Figure: frame layout SYN SYN STX header packet ETX CRC SYN, beginning with the bits 00010110 00010110 00000010 ...]
Ensuring Signal Transitions (1)
• Source code restrictions
  – Disallow any codes that do not include a transition
  – Cannot provide a “clear channel”
• Dedicated timing bits
  – Use some bit transmissions just for timing
  – Example: Dataphone Digital Service
    • Use one bit out of every eight to guarantee a signal transition
  – Example: Synchronous modems
    • Periodically insert a SYNC character
  – Ethernet uses a timing sequence “preamble”. Needs an accurate local clock that can stay “drift-free” for a single packet transmission
Ensuring Signal Transitions (2)
• Bit insertion
  – Use bit transmissions for timing only when needed by inserting timing bits into the data bit stream
  – HDLC “bit stuffing”
    • 01111110 indicates the end of a data block (for framing)
    • Six consecutive 1’s must not be sent as data (may be mistaken as the end of a data block)
    • Sender inserts a 0 after every string of five consecutive 1’s; receiver must strip a 0 after five consecutive 1’s
  – Problems with extra bits and extra delay
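The stuffing rule described above can be sketched directly: the sender inserts a 0 after every run of five consecutive 1’s, and the receiver removes the bit that follows five consecutive 1’s. The function names are illustrative, and the receiver here assumes it is given only stuffed data between flags (it does not detect the 01111110 flag itself).

```python
def stuff(bits):
    """Sender side: insert a 0 after every run of five consecutive 1's."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 5:
            out.append(0)  # stuffed bit, never part of the data
            run = 0
    return out


def unstuff(bits):
    """Receiver side: drop the bit that follows five consecutive 1's."""
    out, run, i = [], 0, 0
    while i < len(bits):
        b = bits[i]
        out.append(b)
        run = run + 1 if b == 1 else 0
        if run == 5:
            i += 1  # skip the stuffed 0
            run = 0
        i += 1
    return out


data = [0, 1, 1, 1, 1, 1, 1, 0]  # data that happens to look like the flag
sent = stuff(data)
print(sent)                   # [0, 1, 1, 1, 1, 1, 0, 1, 0]
print(unstuff(sent) == data)  # True
```

The stuffed stream never contains six consecutive 1’s, so the flag pattern cannot appear inside the data. The cost is the extra bits and the data-dependent frame length noted on the slide.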
Ensuring Signal Transitions (3)
• Data scrambling
  – Similar to encryption/decryption
  – Prevents the transmission of repetitive patterns
  – With high probability, prevents long strings of 0’s (or 1’s) that would not have signal transitions
• Forced bit errors
  – Errors are introduced in code word to prevent long periods without a signal transition
• Line coding
  – Some line coding methods ensure regular transitions that can be used for clock recovery
  – Example: 4B/5B code used in FDDI
Fourier Series
• Even and Odd functions:
  – Even functions mirror around the Y-axis (cos θ)
  – Odd functions invert around the Y-axis (sin θ)
• A Fourier series expresses a time-domain function as a series of harmonics of sines and cosines.
  – The basic Fourier series expresses a periodic function.
  – Aperiodic functions involve a continuum of frequencies, which leads to the Fourier transform
• Why is it useful?
  – Hint: Tells you something about the frequency
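Fourier Series
• Take a look at: http://sunlightd.virtualave.net/Fourier/

$$g(t) = \frac{A_0}{2} + \sum_{n=1}^{\infty} A_n \sin(2\pi n f t) + \sum_{n=1}^{\infty} B_n \cos(2\pi n f t)$$

$$A_0 = \frac{2}{T}\int_0^T g(t)\,dt$$

$$A_n = \frac{2}{T}\int_0^T g(t)\sin(2\pi n f t)\,dt$$

$$B_n = \frac{2}{T}\int_0^T g(t)\cos(2\pi n f t)\,dt$$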
Nyquist’s Sampling Theorem
• The bit and the baud
  – Using multiple signaling levels: baud × log2(signaling levels) = bits/sec
• If a signal is passed through an arbitrary low pass filter of bandwidth B, it can be reconstructed by a sampler collecting 2B samples/sec
  – Reasoning: Frequency components above B are already filtered out, so sampling faster than 2B recovers nothing more.
• Max data rate = 2B log2 V bits/sec,
  – Where V is the number of discrete levels in the signal.
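The Nyquist limit is straightforward to evaluate. The sketch below computes 2B log2 V for a hypothetical 3,000 Hz noiseless channel; the function name and the example bandwidth are assumptions for illustration.

```python
import math


def nyquist_max_rate(bandwidth_hz: float, levels: int) -> float:
    """Maximum data rate of a noiseless channel: 2 * B * log2(V) bits/sec."""
    return 2 * bandwidth_hz * math.log2(levels)


for levels in (2, 4, 16):
    print(levels, nyquist_max_rate(3_000, levels))
```

Each doubling of the number of signal levels adds one bit per sample, so the limit grows only logarithmically with V; in this noiseless model nothing stops V from growing, which is where the noise limit on the following slide comes in.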
Thermal Noise: Shannon’s Theorem
• Nyquist theorem is used for noiseless channels.
  – Real world communication channels are noisy. At the minimum, you have thermal noise
• Signal to Noise ratio S/N is expressed in dB as 10 log10(S/N).
  – Recall the -3 dB cutoff point.
• Shannon’s Theorem:
  – Max data rate = B log2(1 + S/N) bits/sec, with S/N as a plain ratio (not in dB)
  – This is an upper bound independent of the number of signaling levels and sampling rate.
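Because S/N is usually quoted in dB but Shannon’s formula needs the plain ratio, a small helper avoids a common mistake. The sketch below converts dB to a ratio and evaluates B log2(1 + S/N); the 3,000 Hz bandwidth and 30 dB figure are hypothetical example values, and the function names are illustrative.

```python
import math


def db_to_ratio(db: float) -> float:
    """Convert a power ratio in dB to a plain ratio: 10^(dB / 10)."""
    return 10 ** (db / 10)


def shannon_capacity(bandwidth_hz: float, snr_db: float) -> float:
    """Upper bound on data rate of a noisy channel: B * log2(1 + S/N) bits/sec."""
    return bandwidth_hz * math.log2(1 + db_to_ratio(snr_db))


print(f"{db_to_ratio(-3):.3f}")                  # about 0.501, i.e. half power
print(f"{shannon_capacity(3_000, 30):.0f} bps")  # roughly 29,900 bps
```

For this example channel the bound is just under 30 kbps no matter how many signal levels are used, whereas the Nyquist formula alone would allow arbitrarily high rates by increasing V.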
Point-to-Point Protocols and Links (1)
[Figure: a module at Node A and a module at Node B connected by a “link”]
• Point-to-point protocols involve exactly two peer entities or modules that are connected by some “link”
• The modules must interact to ensure the proper transfer of information using the link
Point-to-Point Protocols and Links (2)
• For example, the link may be:
  – Physical link (e.g. RS-232 is a point-to-point protocol)
  – Virtual bit pipe (e.g. at the data link layer)
  – A connection or virtual connection (e.g. at the transport or session layer)
Data Link Control -- DLC (1)
• For each point-to-point link in a network there are two data link control (DLC) peer modules, one at each end
• DLC modules use a distributed algorithm to transfer packets
  – Received from and delivered to the network layer
• Usual objective is to deliver packets in order of arrival (from the network layer) without errors or repeated packets
[Figure: two nodes, each with Network, DLC, and Physical layers]
Data Link Control -- DLC (2)
• DLC modules must use the unreliable “virtual bit pipe” provided by the physical layer
Data Link Control -- DLC (3)
• DLC must:
  – Detect errors (using redundancy bits)
  – Request retransmission if data is lost (using automatic repeat request -- ARQ)
  – Perform framing (detect packet start and end)
  – Support initialization and disconnection operations
[Figure: frame = header + service data unit + trailer]
Data Link Control -- DLC (4)
• These functions require that extra bits be added to the packet to be transmitted
– Header bits are added to the front of each packet
– Trailer bits are added to the rear of each packet
– The header, packet from upper layer (service data unit), and trailer form a frame
Frame Format
• The packet from the upper layer is the service data unit (SDU)
• The frame (header, network layer packet, and trailer) is the protocol data unit (PDU)
• Note that the DLC does not care what is in the network layer packet, and the physical layer does not care what is in the frame generated by the data link layer
Error Detection
• Two types
– Error Detection Codes (e.g. CRC, Parity, Checksums)
– Error Correction Codes (e.g. Hamming, Reed-Solomon)
• Basic Idea
– Without redundancy, all bit combinations in a packet are valid, so a corrupted packet looks like any other
– Add redundant information to determine if errors have been introduced
• Why redundant?
Error Detection Codes
• Naïve scheme
– Send a duplicate copy of the message
• Problems
– Takes up too much space
– Poor performance: can’t even detect all 2-bit errors (the same bit flipped in both copies goes unnoticed)
Single Parity Checks
• Technically used for 1-bit error detection. Can also detect any odd number of bit errors.
• Involves adding an extra “parity” bit to the bit string
• Two varieties:
– Even Parity
– Odd Parity
• Basic Idea:
– For even parity, make the total number of 1’s in the bit string (including the parity bit) an even number. This determines the value of the parity bit. Odd parity makes the number of 1’s odd instead of even.
• Single Parity cannot reliably detect burst errors
– Burst errors cause errors in a sub-string of arbitrary length
– A burst error is as likely to cause an even number of errors as an odd number of errors
Two Dimensional Parity
• Each byte is protected by a parity bit
• The entire frame is protected by a parity byte
Data      Parity bit
1011110   1
1101001   0
0101001   1
1011111   0
0110100   1
0001110   1
1111011   0   (parity byte)
Two-Dimensional Parity Checks
• Arrange a string of bits as a two-dimensional array and compute parity over each row and each column of the array
• Can detect
– Any number of errors in a single row (an even number of errors is detected by column parity)
– Any number of errors in a single column (an even number of errors is detected by row parity)
– Does it protect against everything?
• Answer is no. Why? Hint: read between the lines. The guarantees above cover a single row or a single column
• What about burst errors?
– Need something stronger: CRC codes
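The row and column parities in the figure can be reproduced with a few lines of C. This is a minimal sketch: the function names `parity_bit` and `parity_byte` and the array `example_rows` are made up for illustration, and the rows are the six 7-bit data values from the figure.

```c
#include <stdint.h>

/* Even-parity bit for the low `nbits` bits of `value`: returns 1 when
 * the count of 1s is odd, so that data plus parity bit together hold
 * an even number of 1s. */
unsigned parity_bit(unsigned value, int nbits)
{
    unsigned p = 0;
    for (int i = 0; i < nbits; i++)
        p ^= (value >> i) & 1u;
    return p;
}

/* Column parity over all rows is simply the XOR of the rows. */
unsigned parity_byte(const uint8_t *rows, int nrows)
{
    unsigned acc = 0;
    for (int i = 0; i < nrows; i++)
        acc ^= rows[i];
    return acc;
}

/* The six 7-bit data rows from the figure. */
const uint8_t example_rows[6] = {
    0x5E, /* 1011110 */
    0x69, /* 1101001 */
    0x29, /* 0101001 */
    0x5F, /* 1011111 */
    0x34, /* 0110100 */
    0x0E  /* 0001110 */
};
```

Calling `parity_byte(example_rows, 6)` gives 0x7B (1111011), the parity byte in the figure. It also suggests an answer to the question on the slide: flipping the four bits at the corners of a rectangle changes two bits in each affected row and column, so every parity stays the same and the error goes undetected.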
CRC Codes
• Burst errors are hard to model -- three parameters are typically used to measure the effectiveness of a code for error detection
1. Minimum distance of the code
2. Burst-detecting capability
3. Probability that a random string is accepted as being error-free
Cyclic Redundancy Check
• Treat the (n+1)-bit message as a polynomial of degree n. The bits are the coefficients of the polynomial.
– 1101 = $1 \cdot x^3 + 1 \cdot x^2 + 0 \cdot x^1 + 1 \cdot x^0$
• Calculating CRC
– Sender and receiver agree on a divisor polynomial of degree k, e.g. $x^3 + x^2 + 1$. Call this C(x)
– Add k bits to the (n+1)-bit message such that the (n+1+k)-bit message is exactly divisible by the divisor
• Choice of divisor is very important.
– It determines the kind of errors that the CRC can guard against.
CRC Computation
• Given:
– Message: M(x)
– Divisor: C(x)
• Multiply M(x) by $x^k$, i.e. add k zeroes to the end of the message. Call this T(x)
• Divide T(x) by C(x).
• Subtract the remainder from T(x)
• The result is the message including the CRC
CRC Computation
• C(x) = $x^3 + x^2 + 1$
• M(x) = $x^7 + x^4 + x^3 + x$
• Subtraction: logical XOR operation
Generator 1101, Message 10011010000 (M(x) followed by k = 3 zeroes):
          11111001
1101 ) 10011010000
       1101
        1001
        1101
         1000
         1101
          1011
          1101
           1100
           1101
              1000
              1101
               101   Remainder
CRC Codes
• Note: The CRC is computed over the entire message, not a byte or a row/column.
• When a message+CRC arrives at the destination, divide it by the generator polynomial. If the remainder is 0, the message is taken to be intact; otherwise it has been corrupted.
– The bit patterns that cause a CRC to fail are not regular patterns such as random single-bit errors or burst errors. That’s why CRCs are strong.
– Try out an example in which you corrupt the CRC from the previous slide
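The division steps can be tried out in code. The sketch below does the modulo-2 long division one bit at a time; the function names `mod2_remainder` and `crc_append` are hypothetical, and the bit strings are assumed to fit in an `unsigned`.

```c
/* Remainder of `value` (an nbits-wide bit string) divided modulo 2 by
 * `gen`, a generator of degree k (k+1 bits, top bit set). */
unsigned mod2_remainder(unsigned value, int nbits, unsigned gen, int k)
{
    for (int i = nbits - 1; i >= k; i--) {
        if (value & (1u << i))
            value ^= gen << (i - k);   /* subtract (XOR) the aligned generator */
    }
    return value;                      /* only the low k bits can remain set */
}

/* Append a k-bit CRC to an n-bit message: T(x) = M(x) * x^k, then
 * fill the k zero bits with the remainder of T(x) / C(x). */
unsigned crc_append(unsigned msg, int n, unsigned gen, int k)
{
    unsigned t = msg << k;
    return t | mod2_remainder(t, n + k, gen, k);
}
```

With the values from the example slide, `mod2_remainder(0x9A << 3, 11, 0xD, 3)` yields 5 (binary 101), and dividing the frame returned by `crc_append(0x9A, 8, 0xD, 3)` leaves a remainder of 0. Flipping any single bit of that frame produces a nonzero remainder, which is how the receiver notices the corruption.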
Real World Example: Internet Checksum
• What’s a checksum?
– Take a guess: check sum!
– Another error detection scheme
• Treat message as a sequence of 16-bit integers
• Add these integers together using 16-bit one’s-complement arithmetic
• Take the one’s complement of the result
• Resulting 16-bit number is the checksum
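Example: Internet Checksum
#include <sys/types.h>   /* u_short, u_long */

/* buf: message as 16-bit words; count: number of 16-bit words */
u_short cksum(u_short *buf, int count)
{
    register u_long sum = 0;

    while (count--) {
        sum += *buf++;
        if (sum & 0xFFFF0000) {
            /* carry, so wrap around */
            sum &= 0xFFFF;
            sum++;
        }
    }
    /* one's complement of the 16-bit sum */
    return ~(sum & 0xFFFF);
}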
Data link layer: Services
• Choices
– Unacknowledged connectionless service
– Acknowledged connectionless service
– Acknowledged connection-oriented service
Framing
• Messages (datagrams, packets, …) are broken into frames at the link layer.
– Why?
• Frames are independently transmitted.
• Frame boundaries need to be preserved. Options:
– Frame length
– Start and stop characters or bits
– Invalid signaling (coding violations)
Reliable Transmission
• Why?
– Frame corruption can be severe
– CRCs are not enough. Recall CRCs don’t correct errors
• Two fundamental mechanisms
– Acknowledgment
– Timeout
• General idea is called ARQ (Automatic Repeat Request)
Stop-and-Wait ARQ (1)
• Stop-and-wait is the simplest ARQ (but not as simple as we might at first think)
• The sending DLC transmits a frame and then waits for a reply from the receiving DLC before sending the next frame.
• The receiving DLC replies with an acknowledgment (ack) if the frame is error-free, otherwise:
– May reply with a negative acknowledgment (nak)
– May wait for the sender to time out
Stop-and-Wait ARQ (2)
[Figure: timing diagram in which Node A sends frames a1 and a2 and Node B returns an ack for each]
D is the delay for receiving a frame. Round-Trip Time: RTT ≈ 2D
Stop-and-Wait ARQ (3)
• Since errors can occur in the return direction (B to A), the acks and naks must also be protected by a CRC
• The transmitted frame may be lost or delayed, or a reply may become corrupted, lost or delayed, so the sender must time out and retransmit the last packet
• This leads to a problem -- how does the receiver know if it is receiving a duplicate or the next packet (e.g., in the case of a lost ack)?
Stop-and-Wait ARQ (4)
[Figure: timing diagram in which Node A sends a1, the ack from Node B is lost, and Node A times out and sends again]
How does Node B know if the second frame is a1 or a2?
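[Figure: timing diagram with frames numbered 0 and 1 from Node A and replies ack(1) and ack(2) from Node B; ack(1) is lost and the duplicate frame 0 is discarded]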
Stop-and-Wait ARQ (3)
• A solution to the “duplicate” packet problem is to use sequence numbers
– The sender places a sequence number (SeqNum) in the frame header
– The receiver acknowledges with the next frame expected (NFE) value
• The SeqNum and NFE values require extra bits in the frame
• For stop-and-wait ARQ, sequence numbers can be modulo 2, i.e. just {0,1} or {even,odd}, if link frames stay in order
[Figure: frame layout: SeqNum | NFE | packet | CRC]
Stop-and-Wait ARQ (4)
Stop and Wait: Possible Scenarios
[Figure: four Sender/Receiver timelines, panels (a) to (d), showing Frame, ACK, and Timeout events]
Stop-and-Wait ARQ Algorithm (1)
• Algorithm at node A (sender) to send to node B:
1. SeqNum ← 0
2. Accept new packet and assign SeqNum to it
3. Send packet SeqNum with SeqNum as sequence number
4. Set timer for recently transmitted packet
5. If error-free ack from B and NFE > SeqNum, then SeqNum ← NFE, delete timer and go to step 2
6. If time-out then go to step 3
Stop-and-Wait ARQ Algorithm (2)
• Algorithm at node B (receiver) to receive from node A:
1. NFE ← 0, repeat steps 2 and 3 forever
2. If error-free frame received and SeqNum = NFE, then pass packet to higher level and NFE ← NFE + 1 (modulo 2)
3. At some bounded time after receiving error-free frame send request for NFE to A
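The two algorithms can be sketched as a pair of event handlers. This is an illustrative sketch only: the struct and function names are invented, timers and the channel are left out, and sequence numbers are kept modulo 2, so the sender's test "NFE > SeqNum" becomes "NFE differs from SeqNum".

```c
/* Receiver state: the next frame expected (NFE), kept modulo 2. */
struct sw_receiver {
    int nfe;
    int delivered;   /* count of packets passed to the higher layer */
};

/* Handle an error-free frame carrying sequence number seq.
 * Returns the NFE value to send back as the acknowledgment. */
int sw_receive(struct sw_receiver *r, int seq)
{
    if (seq == r->nfe) {            /* new packet: deliver and advance */
        r->delivered++;
        r->nfe = (r->nfe + 1) % 2;
    }                               /* otherwise a duplicate: discard */
    return r->nfe;
}

/* Sender state: sequence number of the packet currently outstanding. */
struct sw_sender {
    int seq;
};

/* Handle an error-free ack carrying nfe. Returns 1 if the outstanding
 * packet is acknowledged (sender moves on), 0 if it must keep waiting. */
int sw_on_ack(struct sw_sender *s, int nfe)
{
    if (nfe != s->seq) {            /* modulo-2 form of "NFE > SeqNum" */
        s->seq = nfe;
        return 1;
    }
    return 0;
}
```

In the lost-ack scenario, the sender times out and resends frame 0. The receiver sees SeqNum 0 while expecting 1, discards the duplicate, and repeats its request for frame 1, so the packet is delivered to the higher layer exactly once.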
Stop and Wait: Performance problems?
• No more than one packet in flight.
– That’s usually bad, here’s why
• Take a 10 Mbps network with a 50 ms round trip time
• Delay bandwidth = $10^7 \times 0.050$ = 500 Kbits
• In Stop and Wait, only one frame can be in flight. The max frame size is 1500 bytes
– Hence sending rate = 1500 × 8 ÷ 0.050 = 240 Kbps
– This is much less than the link capacity of 10 Mbps
Performance Problems
• Using the actual 10 Mbps Ethernet RTT of 50 µs (roughly)
• Delay bandwidth = $10^7 \times 50\ \mu s$ = 500 bits
• In Stop and Wait, only one frame can be in flight. The max Ethernet frame size is 1500 bytes
– Hence sending rate = 1500 × 8 ÷ 50 µs = 240 Mbps
– This is much greater than the link capacity of 10 Mbps
• What happened??
Performance Analysis
• Putting in numbers for 10 Mbps Ethernet
– Packet size: 1518 bytes
– ACK size: 64 bytes
– Ignore propagation time.
– Packet Tx time: 1.2144 ms
– Ack Tx time: 51.2 µs
• Efficiency = 1.2144 ms ÷ (1.2144 ms + 51.2 µs) = 95.95%
– More believable!
• Moral: If frame size exceeds the delay bandwidth product, the efficiency computation should be used.
– Why?
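The efficiency computation is short enough to write down directly. The sketch below assumes the sender is idle while the ack is transmitted and while signals propagate in each direction; the function names are illustrative.

```c
/* Fraction of time a stop-and-wait sender spends transmitting data. */
double stop_and_wait_efficiency(double frame_bits, double ack_bits,
                                double rate_bps, double one_way_delay_s)
{
    double t_frame = frame_bits / rate_bps;   /* frame transmission time */
    double t_ack   = ack_bits / rate_bps;     /* ack transmission time */
    return t_frame / (t_frame + t_ack + 2.0 * one_way_delay_s);
}

/* Delay-bandwidth product in bits. */
double delay_bandwidth_bits(double rate_bps, double rtt_s)
{
    return rate_bps * rtt_s;
}
```

`stop_and_wait_efficiency(1518 * 8, 64 * 8, 1e7, 0.0)` reproduces the 95.95% figure. Setting the one-way delay to 25 ms (the 50 ms round trip of the earlier slide) drops the result to roughly 2.4%, in line with the 240 Kbps estimate on a 10 Mbps link.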
Significance of Delay Bandwidth
• Delay bandwidth represents the amount of data that has left the transmitter and is still on the cable.
• Think of the cable as a pipe. Sending this much data keeps the pipe full
• Delay bandwidth also represents the upper bound on stability.
• More sophisticated ARQ algorithms try to match their sending rate to the dynamic delay bandwidth product
– Why is delay bandwidth dynamic?
Go-Back-n ARQ (1)
• Sliding window or go-back-n ARQ is used in many standard DLCs and transport protocols
• Go-back-n ARQ extends stop-and-wait ARQ
– Sender does not have to wait for ack before sending the next packet
– Receiver accepts only packets in order and periodically sends an ack with request number NFE, where NFE acknowledges all packets with sequence numbers less than NFE and requests the packet with sequence number NFE
Go-Back-n ARQ (2)
• Parameter n (or SWS for send window size) determines how many packets may be outstanding before a request (ack) is received
• With SWS = n = 1, go-back-n becomes stop-and-wait ARQ
Go-Back-n ARQ Variables
• Sender variables
– LFS: Last Frame Sent
– LAR: Last Acknowledgment Received
– SWS: Send Window Size
• Receiver variables
– LFA: Last Frame Acceptable
– NFE: Next Frame Expected
– RWS: Receive Window Size
Go-Back-n ARQ Example (1)
• Example of go-back-4 ARQ (SWS = 4)
• Excessive delays and small n cause sender to have to wait for acknowledgments
[Figure: go-back-4 timing diagram between Node A and Node B showing SeqNum, NFE, and delivered packets; the window advances through [0,3], [1,4], [2,5], [3,6], [4,7], [6,9]]
Window is [LAR, LAR+SWS-1]
Go-Back-n ARQ Example (2)
• Example of error in forward direction in go-back-4 ARQ
– All packets sent since the errored frame was sent must be retransmitted
– Sender must save and be ready to transmit the last n (SWS) packets
Go-Back-n ARQ Example (3)
[Figure: go-back-4 timing diagram between Node A and Node B showing SeqNum, NFE, and delivered packets; after a time-out at Node A, packets are retransmitted; the window advances through [0,3], [1,4], [2,5], [3,6]]
Go-Back-n ARQ Algorithm (1)
• Algorithm at node A (sender) to send to node B:
1. LAR ← 0, LFS ← -1
2. Do steps 3, 4, 5 in any order (with bounded delay)
3. If LFS < LAR + SWS - 1 and a packet is available, then accept the packet, assign it sequence number SeqNum ← LFS + 1, and set LFS ← SeqNum
4. If error-free frame from B with NFE > LAR, then LAR ← NFE
Go-Back-n ARQ Algorithm (2)
• Algorithm at node A (sender) to send to node B (continued):
5. If LAR ≤ LFS and no frame is being transmitted, then choose some number SeqNum, LAR ≤ SeqNum ≤ LFS, and transmit packet SeqNum with SeqNum as sequence number; packet LAR must be transmitted within a bounded delay if the value of LAR does not change
Go-Back-n ARQ Algorithm (3)
• Algorithm at node B (receiver) to receive from node A:
1. NFE ← 0, repeat steps 2 and 3 forever
2. If error-free frame received and SeqNum = NFE, then pass packet to higher level and NFE ← NFE + 1
3. At some bounded time after receiving error-free frame send request for NFE to A
• Correctness of go-back-n ARQ can be proven; sequence numbers may be modulo m, m > n, as long as frames are delivered in order of transmission
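The sender's window bookkeeping can be sketched in C as follows. The struct and function names are hypothetical, sequence numbers are left unbounded rather than taken modulo m, and the window is assumed to be [LAR, LAR+SWS-1] as stated on the example slide.

```c
/* Go-back-n sender bookkeeping, using the slides' variable names.
 * lar: lowest sequence number not yet acknowledged (window base)
 * lfs: last frame sent, -1 before anything is sent
 * sws: send window size n */
struct gbn_sender {
    int lar;
    int lfs;
    int sws;
};

/* Step 3: returns the sequence number assigned to a new packet,
 * or -1 if the window [lar, lar + sws - 1] is already full. */
int gbn_accept_packet(struct gbn_sender *s)
{
    if (s->lfs + 1 > s->lar + s->sws - 1)
        return -1;
    s->lfs += 1;
    return s->lfs;
}

/* Step 4: an error-free ack carrying nfe acknowledges every packet
 * with a sequence number below nfe and slides the window forward. */
void gbn_on_ack(struct gbn_sender *s, int nfe)
{
    if (nfe > s->lar)
        s->lar = nfe;
}

/* On time-out: number of packets to resend, starting from lar. */
int gbn_outstanding(const struct gbn_sender *s)
{
    return s->lfs - s->lar + 1;
}
```

Starting from `{0, -1, 4}`, four calls to `gbn_accept_packet` hand out sequence numbers 0 to 3 and a fifth is refused. After `gbn_on_ack(&s, 2)` the window becomes [2,5], so packets 4 and 5 can be accepted. A stale ack with a smaller NFE leaves LAR untouched.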
Reliable Transmission: A State Machine Perspective
Srinidhi Varadarajan
Principles of Reliable data transfer
• important in app., transport, link layers
• top-10 list of important networking topics!
• characteristics of unreliable channel will determine complexity of reliable data transfer protocol (rdt)
Reliable data transfer: getting started
Send side:
• rdt_send(): called from above (e.g., by app.). Passes data to deliver to receiver upper layer (receiver app)
• udt_send(): called by rdt, to transfer packet over unreliable channel to receiver
Receive side:
• rdt_rcv(): called when packet arrives on rcv-side of channel
• deliver_data(): called by rdt to deliver data to upper layer
Reliable data transfer: getting started
We’ll:
• incrementally develop sender, receiver sides of reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
– but control info will flow in both directions!
• use finite state machines (FSM) to specify sender, receiver
[Figure: FSM notation: the arrow from state1 to state2 is labeled with the event causing the state transition, above the actions taken on the transition]
state: when in this “state”, the next state is uniquely determined by the next event
Rdt1.0: reliable transfer over a reliable channel
• underlying channel perfectly reliable
– no bit errors
– no loss of packets
• separate FSMs for sender, receiver:
– sender sends data into underlying channel
– receiver reads data from underlying channel
Rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
– Need error detection: CRC, parity, …
• the question: how to recover from errors:
– acknowledgements (ACKs): receiver explicitly tells sender that packet was received correctly
– negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
– sender retransmits packet on receipt of NAK
– human scenarios using ACKs, NAKs?
• Telephone conversation: “OK”, “Could you repeat that please?”
• new mechanisms in rdt2.0 (beyond rdt1.0):
– error detection
– receiver feedback: control msgs (ACK,NAK) rcvr->sender
rdt2.0 has a fatal flaw!
What happens if ACK/NAK corrupted?
• sender doesn’t know what happened at receiver!
• can’t just retransmit: possible duplicate
What to do?
• sender ACKs/NAKs receiver’s ACK/NAK? What if sender ACK/NAK lost?
• retransmit, but this might cause retransmission of correctly received pkt!
Handling duplicates:
• sender adds sequence number to each pkt
• sender retransmits current pkt if ACK/NAK garbled
• receiver discards (doesn’t deliver up) duplicate pkt
stop and wait: Sender sends one packet, then waits for receiver response
rdt2.1: discussion
Sender:
• seq # added to pkt
• two seq. #’s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
– state must “remember” whether “current” pkt has 0 or 1 seq. #
Receiver:
• must check if received packet is duplicate
– state indicates whether 0 or 1 is expected pkt seq #
• note: receiver cannot know if its last ACK/NAK was received OK at sender
rdt2.2: a NAK-free protocol
• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
– receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt
[Figure: sender FSM]
rdt3.0: channels with errors and loss
New assumption: underlying channel can also lose packets (data or ACKs)
– checksum, seq. #, ACKs, retransmissions will be of help, but not enough
Q: how to deal with loss?
– sender waits until data or ACK is known to be lost, then retransmits
– How do you know when the data is lost?
Approach: sender waits “reasonable” amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
– retransmission will be duplicate, but use of seq. #’s already handles this
– receiver must specify seq # of pkt being ACKed
• requires countdown timer
Performance of rdt3.0
• rdt3.0 works, but performance stinks
• example: 1 Gbps link, 15 ms e-e prop. delay, 1KB packet:
$$T_{transmit} = \frac{8\ \text{kb/pkt}}{10^9\ \text{b/sec}} = 8\ \mu\text{sec}$$
$$U = \text{fraction of time sender busy sending} = \frac{8\ \mu\text{sec}}{30.016\ \text{msec}} \approx 0.00027$$
– 1KB pkt every 30 msec -> 33kB/sec throughput over 1 Gbps link
– network protocol limits use of physical resources!
Pipelined protocols
Pipelining: sender allows multiple, “in-flight”, yet-to-be-acknowledged pkts
– range of sequence numbers must be increased
– buffering at sender and/or receiver
• Two generic forms of pipelined protocols: go-Back-N, selective repeat
Go-Back-N
Sender:
• k-bit seq # in pkt header
• “window” of up to N consecutive unack’ed pkts allowed
• ACK(n): ACKs all pkts up to and including seq # n: “cumulative ACK”
– may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window
GBN: receiver extended FSM
receiver simple:
• ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
– may generate duplicate ACKs
– need only remember expectedseqnum
• out-of-order pkt:
– discard (don’t buffer) -> no receiver buffering!
– ACK pkt with highest in-order seq #
Problems with GBN
• Retransmits entire sender window on timeout
– Can cause excessive retransmissions
– Problem is exacerbated for networks with large “memory”, i.e. large delay bandwidth product
• Receiver throws away any out-of-order packets, even if they are received correctly.
– Forces retransmission
Selective Repeat
• receiver individually acknowledges all correctly received pkts
– buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
– sender timer for each unACKed pkt
• sender window
– N consecutive seq #’s
– again limits seq #s of sent, unACKed pkts
Selective repeat
sender
data from above:
• if next available seq # in window, send pkt
• else hold packet
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N-1]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
• Transmit any pending packets
receiver
pkt n in [rcvbase, rcvbase+N-1]
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]
• ACK(n)
otherwise:
• ignore
Selective repeat: dilemma
Example:
• seq #’s: 0, 1, 2, 3
• window size=3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)
Q: what relationship between seq # size and window size?
Out of Order Delivery
• What happens if the network delivers packets out of order?
– Send order != receive order
• Need a much larger, potentially infinite, sequence space
– Why?
Client-server paradigm
Client:
• initiates contact with server (“speaks first”)
• typically requests service from server
• for Web, client is implemented in browser; for e-mail, in mail reader
Server:
• provides requested service to client
• e.g., Web server sends requested Web page, mail server delivers e-mail
[Figure: client and server protocol stacks (application, transport, network, data link, physical) exchanging a request and a reply]
Application Layer Programming
API: application programming interface
• defines interface between application and transport layer
• sockets: Internet API
– two processes communicate by sending data into socket, reading data out of socket
Socket Interface. What is it?
• Gives a file-system-like abstraction to the capabilities of the network.
• Each transport protocol offers a set of services. The socket API provides the abstraction to access these services
• The API defines function calls to create, close, read and write to/from a socket.
Socket Abstraction
• The socket is the basic abstraction for network communication in the socket API
– Defines an endpoint of communication for a process
– Operating system maintains information about the socket and its connection
– Application references the socket for sends, receives, etc.
[Figure: Process A and Process B, each attached through ports (sockets) to the network]
What do you need for socket communication?
• Basically 4 parameters
– Source Identifier (IP address)
– Source Port
– Destination Identifier
– Destination Port
• In the socket API, this information is communicated by binding the socket.
Creating a socket
int socket(int domain, int type, int protocol)
• domain: protocol family, PF_INET or PF_UNIX
• type: communication semantics, SOCK_STREAM or SOCK_DGRAM
• protocol: usually UNSPEC (0, which selects the default protocol for the given type)
• The call returns an integer identifier called a handle
Binding a socket
int bind (int socket, struct sockaddr *address, int addr_len)
• This call is executed by:
– Server in TCP and UDP
• It binds the socket to the specified address. The address parameter specifies the local component of the address, e.g. IP address and UDP/TCP port
Socket Descriptors
• Operating system maintains a set of socket descriptors for each process
– Note that socket descriptors are shared by threads
• Three data structures
– Socket descriptor table
– Socket data structure
– Address data structure
Socket Descriptors
[Figure: an entry in the socket descriptor table (0:, 1:, 2:, ...) points to a socket data structure (proto family: PF_INET; service: SOCK_STREAM; local address; remote address), whose address fields point to an address data structure (address family: AF_INET; host IP: 128.173.88.85; port: 80)]
TCP Server Side: Listen
int listen (int socket, int backlog)
• This server-side call specifies the maximum number of pending connections on the given socket.
• While the server is processing a connection, up to “backlog” connections may be pending in a queue.
TCP Server Side: Passive Open
int accept (int socket, struct sockaddr *address, int *addr_len)
• This call is executed by the server.
• The call does not return until a remote client has established a connection.
• When it completes, it returns a new socket handle corresponding to the just-established connection
TCP Client Side: Active Open
int connect (int socket, struct sockaddr *address, int addr_len)
• This call is executed by the client. *address contains the remote address.
• The call attempts to connect the socket to a server. It does not return until a connection has been established.
• When the call completes, the socket “socket” is connected and ready for communication.
Sockets: Summary
• Client:
int socket(int domain, int type, int protocol)
int connect (int socket, struct sockaddr *address, int addr_len)
• Server:
int socket(int domain, int type, int protocol)
int bind (int socket, struct sockaddr *address, int addr_len)
int listen (int socket, int backlog)
int accept (int socket, struct sockaddr *address, int *addr_len)
Message Passing
• int send (int socket, char *message, int msg_len, int flags) (TCP)
• int sendto (int socket, void *msg, int len, int flags, struct sockaddr *to, int tolen); (UDP)
• int write(int socket, void *msg, int len); (TCP)
• int recv (int socket, char *buffer, int buf_len, int flags) (TCP)
• int recvfrom(int socket, void *msg, int len, int flags, struct sockaddr *from, int *fromlen); (UDP)
• int read(int socket, void *msg, int len); (TCP)
Summary of Basic Socket Calls
CLIENT                                          SERVER
connect()   -- Connect (3-way handshake) -->    accept() (returns new connection)
write()     -- Data -->                         read()
read()      <-- Data --                         write()
close()                                         close()
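The call sequence above can be exercised end to end in a single process over the loopback interface. This is a sketch under several assumptions: the function name `loopback_demo` is made up, the message "hello" is arbitrary, the system is POSIX with a working loopback interface, and error handling is reduced to returning -1. A real server and client would run as separate processes, and a robust reader would loop until all bytes arrive.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Runs the client and server ends in one process over loopback.
 * Returns 0 if the 5-byte message arrives intact, -1 otherwise. */
int loopback_demo(void)
{
    struct sockaddr_in addr;
    socklen_t len = sizeof(addr);
    char buf[8] = {0};
    int ok = -1;

    /* SERVER: socket(), bind(), listen() */
    int srv = socket(PF_INET, SOCK_STREAM, 0);
    if (srv < 0)
        return -1;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);            /* 0: let the OS pick a free port */
    if (bind(srv, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(srv, 1) < 0 ||
        getsockname(srv, (struct sockaddr *)&addr, &len) < 0) {
        close(srv);
        return -1;
    }

    /* CLIENT: socket(), connect(). The handshake completes in the kernel
     * and the connection waits in the listen queue until accept(). */
    int cli = socket(PF_INET, SOCK_STREAM, 0);
    if (cli >= 0 &&
        connect(cli, (struct sockaddr *)&addr, sizeof(addr)) == 0) {
        int conn = accept(srv, NULL, NULL);  /* new socket for this connection */
        if (conn >= 0) {
            if (write(cli, "hello", 5) == 5 &&
                read(conn, buf, 5) == 5 &&
                memcmp(buf, "hello", 5) == 0)
                ok = 0;
            close(conn);
        }
    }
    if (cli >= 0)
        close(cli);
    close(srv);
    return ok;
}
```

Note that `accept()` hands back a second descriptor for the established connection while the listening socket stays open for further clients, which is the "new connection" in the diagram.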
Network Byte Order
• Network byte order is most-significant byte first
• Byte ordering at a host may differ
• Utility functions
– htons(): Host-to-network byte order for a short word (2 bytes)
– htonl(): Host-to-network byte order for a long word (4 bytes)
– ntohs(): Network-to-host byte order for a short word
– ntohl(): Network-to-host byte order for a long word
Some Other “Utility” Functions
• gethostname() -- get name of local host
• getpeername() -- get address of remote host
• getsockname() -- get local address of socket
• getXbyY() -- get protocol, host, or service number using known number, address, or port, respectively
• getsockopt() -- get current socket options
• setsockopt() -- set socket options
• ioctl() -- retrieve or set socket information
Some Other “Utility” Functions
• inet_addr() -- convert “dotted” character string form of IP address to internal binary form
• inet_ntoa() -- convert internal binary form of IP address to “dotted” character string form
Address Data Structures
• sockaddr is a generic address structure
• sockaddr_in is a specific instance for the Internet address family
struct sockaddr {
    u_short sa_family;   // type of address
    char sa_data[14];    // value of address
};
struct sockaddr_in {
    u_short sin_family;        // type of address (AF_INET)
    u_short sin_port;          // protocol port number
    struct in_addr sin_addr;   // IP address
    char sin_zero[8];          // unused (set to zero)
};
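Filling in an Internet address ties the byte-order functions and the address structure together. The helper below is a sketch; its name `make_inet_address` is invented for this example, and the IP address and port used in the note that follows are the ones shown in the socket descriptor figure.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Build an Internet socket address from a dotted-quad string and a
 * port number given in host byte order. */
struct sockaddr_in make_inet_address(const char *dotted_ip, unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));               /* also clears sin_zero */
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);                  /* port in network byte order */
    addr.sin_addr.s_addr = inet_addr(dotted_ip);  /* already network byte order */
    return addr;
}
```

For `make_inet_address("128.173.88.85", 80)`, the two bytes of `sin_port` in memory are 0 then 80 regardless of the host's own byte order, and `ntohs()` converts the field back to 80. The result can be cast to `struct sockaddr *` and passed to `bind()` or `connect()`.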
Medium Access Links and Protocols
Three types of “links”:
• point-to-point (single wire, e.g. PPP, SLIP)
• broadcast (shared wire or medium; e.g., Ethernet, Wavelan, etc.)
• switched (e.g., telephone systems, switched Ethernet, ATM, etc.)
Point-to-Point protocols
• Telephone networks
– Switched hierarchy.
– Local Loop is the last-mile interface to customer premises equipment (generally referred to in the networking world as the source of all evil).
– Originally involved a physical connection between the sender and the receiver.
– Nowadays, telephone networks use circuit-switched medium access control
• Modems: Digital interface to the world of telephony
Modems: Signaling
• Modems:
– Work over low bandwidth telephone lines (3000 Hz)
• Signaling schemes: (why not just use digital bit patterns?)
– Possible choices:
• Amplitude modulation (AM)
• Frequency modulation (FM or FSK)
• Phase modulation (PSK)
Modems Signaling
• Modern modems use a combination of PSK and AM
• Create charts called constellation patterns.
– Multiple bits encoded per signal.
– Trellis encoding is used to minimize the chance of error. Errors cause loss of several bits
• Echo cancellation/suppression
– Needed for long-haul voice communication.
– Prevents full duplex
– In-band signaling at 2100 Hz is used to inhibit echo cancellation circuitry.
– Newer solution uses end-point resources for echo suppression.
RS-232C, RS449: Point-to-Point Communication
• RS-232C and RS449 specify physical layer point-to-point serial communication
• 25 or 9 pin connectors, 15m cable length
– <-3V = 1, >+4V = 0
– BW: 20 Kbps (originally; since upgraded to up to 115 Kbps)
– Main communication occurs using the RTS/CTS protocol.
• RS-449 is an upgraded RS-232C with 2 modes of communication
– Unbalanced mode is physically similar to RS-232C, with common ground signaling.
– Balanced mode uses independent ground. Data rate 2 Mbps with lengths up to 60m
Multiple Access protocols
• single shared communication channel
• two or more simultaneous transmissions by nodes: interference
– only one node can send successfully at a time
• multiple access protocol:
– distributed algorithm that determines how stations share channel, i.e., determine when station can transmit
– communication about channel sharing must use channel itself!
– what to look for in multiple access protocols:
• synchronous or asynchronous
• information needed about other stations
• robustness (e.g., to channel errors)
• performance
Multiple Access protocols
• claim: humans use multiple access protocols all the time
• class can "guess" multiple access protocols
– multiaccess protocol 1:
– multiaccess protocol 2:
– multiaccess protocol 3:
– multiaccess protocol 4:
MAC Protocols: a taxonomy
Three broad classes:
• Channel Partitioning
– divide channel into smaller “pieces” (time slots, frequency)
– allocate piece to node for exclusive use
• Random Access
– allow collisions
– “recover” from collisions
• “Taking turns”
– tightly coordinate shared access to avoid collisions
Goal: efficient, fair, simple, decentralized
Channel Partitioning MAC protocols: TDMA
TDMA: time division multiple access
• access to channel in "rounds"
• each station gets fixed length slot (length = pkt trans time) in each round
• unused slots go idle
• example: 6-station LAN, 1,3,4 have pkt, slots 2,5,6 idle
Channel Partitioning MAC protocols: FDMA
FDMA: frequency division multiple access
• channel spectrum divided into frequency bands
• each station assigned fixed frequency band
• unused transmission time in frequency bands goes idle
• example: 6-station LAN, 1,3,4 have pkt, frequency bands 2,5,6 idle
[Figure: frequency bands plotted against time]
Channel Partitioning (CDMA)
CDMA (Code Division Multiple Access)
• unique “code” assigned to each user; i.e., code set partitioning
• used mostly in wireless broadcast channels (cellular, satellite, etc.)
• all users share same frequency, but each user has own “chipping” sequence (i.e., code) to encode data
• encoded signal = (original data) × (chipping sequence)
• decoding: inner-product of encoded signal and chipping sequence
• allows multiple users to “coexist” and transmit simultaneously with minimal interference (if codes are “orthogonal”)
Random Access protocols
• When node has packet to send
– transmit at full channel data rate R.
– no a priori coordination among nodes
• two or more transmitting nodes -> “collision”
• random access MAC protocol specifies:
– how to detect collisions
– how to recover from collisions (e.g., via delayed retransmissions)
• Examples of random access MAC protocols:
– slotted ALOHA
– ALOHA
– CSMA and CSMA/CD
Slotted Aloha
• time is divided into equal size slots (= pkt trans. time)
• node with new arriving pkt: transmit at beginning of next slot
• if collision: retransmit pkt in future slots with probability p, until successful.
[Figure: sequence of Success (S), Collision (C), and Empty (E) slots]
Slotted Aloha efficiency
Q: what is max fraction slots successful?
A: Suppose N stations have packets to send
– each transmits in slot with probability p
– prob. successful transmission S is:
by single node: S = $p(1-p)^{N-1}$
by any of N nodes: S = Prob(only one transmits) = $N p (1-p)^{N-1}$
… choosing optimum p as N -> infty ... = 1/e = .37
At best: channel used for useful transmissions 37% of time!
Pure (unslotted) ALOHA
• unslotted Aloha: simpler, no synchronization
• pkt needs transmission:
– send without waiting for beginning of slot
• collision probability increases:
– pkt sent at t0 collides with other pkts sent in [t0-1, t0+1]
Pure Aloha (cont.)
P(success by given node) = P(node transmits) .
P(no other node transmits in [t0-1, t0]) . P(no other node transmits in [t0, t0+1])
$= p \, (1-p)^{N-1} (1-p)^{N-1}$
P(success by any of N nodes) $= N p \, (1-p)^{N-1} (1-p)^{N-1}$
… choosing optimum p and letting N -> infty ... = 1/(2e) = .18
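A numerical check of the pure Aloha limit, as a sketch. It assumes each of the other N-1 nodes must stay silent in both vulnerable intervals, giving N p (1-p)^(2(N-1)); setting the derivative to zero gives the maximizing p = 1/(2N-1). The function names are illustrative.

```python
import math


def pure_aloha_efficiency(n, p):
    """One node transmits; the other n-1 stay silent in both vulnerable intervals."""
    return n * p * (1 - p) ** (2 * (n - 1))


def best_pure_efficiency(n):
    """Efficiency at the maximizing probability p = 1/(2n - 1)."""
    return pure_aloha_efficiency(n, 1.0 / (2 * n - 1))


for n in (2, 10, 100, 10000):
    print(n, round(best_pure_efficiency(n), 4))
print("1/(2e) =", round(1 / (2 * math.e), 4))
```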
[Figure: throughput S (“goodput”, success rate) versus offered load G = Np, with curves for Pure Aloha and Slotted Aloha]
protocol constrains effective channel throughput!
CSMA: Carrier Sense Multiple Access
CSMA: listen before transmit:• If channel sensed idle: transmit entire pkt• If channel sensed busy, defer transmission
– Persistent CSMA: retry immediately with probability p when channel becomes idle (may cause instability)
– Non-persistent CSMA: retry after random interval
• human analogy: don’t interrupt others!
CSMA collisions
collisions can occur: propagation delay means two nodes may not yet hear each other’s transmission
collision: entire packet transmission time wasted
[Figure: spatial layout of nodes along ethernet]
note: role of distance and propagation delay in determining collision prob.
CSMA/CD (Collision Detection)
CSMA/CD: carrier sensing, deferral as in CSMA– collisions detected within short time– colliding transmissions aborted, reducing channel
wastage – persistent or non-persistent retransmission
• collision detection:– easy in wired LANs: measure signal strengths,
compare transmitted, received signals– difficult in wireless LANs: receiver shut off while
transmitting• human analogy: the polite conversationalist
“Taking Turns” MAC protocols
channel partitioning MAC protocols:– share channel efficiently at high load– inefficient at low load: delay in channel access, 1/N
bandwidth allocated even if only 1 active node! Random access MAC protocols
– efficient at low load: single node can fully utilize channel– high load: collision overhead
“taking turns” protocols look for best of both worlds!
“Taking Turns” MAC protocols
Polling:• master node “invites”
slave nodes to transmit in turn
• Request to Send, Clear to Send msgs
• concerns:– polling overhead – latency– single point of failure
(master)
Token passing:• control token passed from
one node to next sequentially.• token message• concerns:
– token overhead – latency– single point of failure (token)
Reservation-based protocolsDistributed Polling:• time divided into slots• begins with N short reservation slots
– reservation slot time equal to channel end-end propagation delay
– station with message to send posts reservation– reservation seen by all stations
• after reservation slots, message transmissions ordered by known priority
Summary of MAC protocols• What do you do with a shared media?
– Channel Partitioning, by time, frequency or code• Time Division,Code Division, Frequency Division
– Random partitioning (dynamic), • ALOHA, S-ALOHA, CSMA, CSMA/CD• carrier sensing: easy in some technologies (wire), hard in others
(wireless)• CSMA/CD used in Ethernet
– Taking Turns
• polling from a central site, token passing
LAN technologiesData link layer so far:
– services, error detection/correction, multiple access
Next: LAN technologies– addressing– Ethernet– hubs, bridges, switches– 802.11– PPP– ATM
LAN Addresses and ARP
32-bit IP address:
• network-layer address
• used to get datagram to destination network (recall IP network definition)
LAN (or MAC or physical) address:
• used to get datagram from one interface to another physically-connected interface (same network)
• 48 bit MAC address (for most LANs) burned in the adapter ROM
LAN Address (more)• MAC address allocation administered by IEEE• manufacturer buys portion of MAC address space
(to assure uniqueness)• Analogy:
(a) MAC address: like Social Security Number(b) IP address: like postal address
• MAC flat address => portability – can move LAN card from one LAN to another
• IP hierarchical address NOT portable– depends on network to which one attaches
Ethernet
IP Source: IP 130.245.20.1, Ethernet 0A:03:21:60:09:FA
IP Destination: IP 130.245.20.2, Ethernet 0A:03:23:65:09:FB
ARP Query: What is the Ethernet address of 130.245.20.2?
ARP Response: 0A:03:23:65:09:FB
Address Resolution Protocol (ARP)
• Maps IP addresses to Ethernet Addresses• ARP responses are cached
ARP protocol• A knows B's IP address, wants to learn
physical address of B • A broadcasts ARP query pkt, containing B's
IP address – all machines on LAN receive ARP query
• B receives ARP packet, replies to A with its (B's) physical layer address
• A caches (saves) IP-to-physical address pairs until information becomes old (times out) – soft state: information that times out (goes
away) unless refreshed
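The soft-state caching described above can be sketched as a table whose entries expire unless refreshed. This is an illustration only: the class name, the 20-minute default timeout, and the injectable clock are assumptions, not values taken from the slides.

```python
import time


class ArpCache:
    """Hypothetical IP-to-MAC cache whose entries time out unless refreshed."""

    def __init__(self, ttl_seconds=1200.0, clock=time.monotonic):
        self.ttl = ttl_seconds   # assumed timeout; real systems choose their own
        self.clock = clock
        self.entries = {}        # ip -> (mac, expiry time)

    def learn(self, ip, mac):
        """Store or refresh a mapping, e.g. after an ARP response arrives."""
        self.entries[ip] = (mac, self.clock() + self.ttl)

    def lookup(self, ip):
        """Return the cached MAC, or None if unknown or timed out."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, expiry = entry
        if self.clock() >= expiry:
            del self.entries[ip]  # soft state: stale information goes away
            return None
        return mac
```

A lookup that returns None is the point where a host would broadcast a new ARP query.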
Ethernet
“dominant” LAN technology:
• cheap: $20 for 100 Mbps!
• first widely used LAN technology
• Simpler, cheaper than token ring LANs and ATM
• Kept up with speed race: 10, 100, 1000 Mbps
Metcalfe’s Ethernet sketch
Ethernet Frame StructureSending adapter encapsulates IP datagram (or
other network layer protocol packet) in Ethernet frame
Preamble:• 7 bytes with pattern 10101010 followed by
one byte with pattern 10101011• used to synchronize receiver, sender clock
rates
Ethernet Frame Structure (more)
• Addresses: 6 bytes; frame is received by all adapters on a LAN and dropped if address does not match
• Type/length: indicates the higher layer protocol (mostly IP, but others may be supported, such as Novell IPX and AppleTalk)
• CRC: checked at receiver; if error is detected, the frame is simply dropped
Ethernet: uses CSMA/CD
A: sense channel, if idle
   then {
      transmit and monitor the channel;
      if detect another transmission
      then {
         abort and send jam signal;
         update # collisions;
         delay as required by exponential backoff algorithm;
         goto A
      }
      else {done with the frame; set collisions to zero}
   }
   else {wait until ongoing transmission is over and goto A}
Ethernet’s CSMA/CD (more)
Jam Signal: make sure all other transmitters are aware of collision; 48 bits;
Exponential Backoff:• Goal: adapt retransmission attempts to estimated
current load– heavy load: random wait will be longer
• first collision: choose K from {0,1}; delay is K x 512 bit transmission times
• after second collision: choose K from {0,1,2,3}…• after ten or more collisions, choose K from
{0,1,2,3,4,…,1023}
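The backoff rule above can be sketched as follows. The function and constant names are illustrative; the choice of K from {0, ..., 2^n - 1}, capped after ten collisions, follows the bullets.

```python
import random

SLOT_BIT_TIMES = 512  # one backoff unit, in bit transmission times


def backoff_bit_times(collisions, rng=random):
    """Pick K uniformly from {0, ..., 2^min(collisions, 10) - 1}; delay is K x 512."""
    exponent = min(collisions, 10)
    k = rng.randrange(2 ** exponent)
    return k * SLOT_BIT_TIMES


for n in (1, 2, 3, 10, 12):
    print(n, backoff_bit_times(n))
```

The range of possible delays doubles with each collision, so the expected wait grows when the channel is heavily loaded.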
Ethernet Technologies: 10Base2• 10: 10Mbps; 2: under 200 meters max cable length• thin coaxial cable in a bus topology
• repeaters used to connect up to multiple segments• repeater repeats bits it hears on one interface to its
other interfaces: physical layer device only!
10BaseT and 100BaseT• 10/100 Mbps rate; latter called “fast ethernet”• T stands for Twisted Pair• Hub to which nodes are connected by twisted pair,
thus “star topology”• CSMA/CD implemented at hub
10BaseT and 100BaseT (more)
• Max distance from node to Hub is 100 meters
• Hub can disconnect “jabbering” adapter
• Hub can gather monitoring information, statistics for display to LAN administrators
Gbit Ethernet• use standard Ethernet frame format• allows for point-to-point links and shared broadcast
channels• in shared mode, CSMA/CD is used; short distances
between nodes to be efficient• uses hubs, called “Buffered Distributors”• Full-Duplex at 1 Gbps for point-to-point links
Token Passing: IEEE802.5 standard
• 4 Mbps • max token holding time: 10 ms, limiting frame length
• SD, ED mark start, end of packet • AC: access control byte:
– token bit: value 0 means token can be seized, value 1 means data follows FC
– priority bits: priority of packet – reservation bits: station can write these bits to prevent stations
with lower priority packet from seizing token after token becomes free
Token Passing: IEEE802.5 standard
• FC: frame control used for monitoring and maintenance
• source, destination address: 48 bit physical address, as in Ethernet
• data: packet from network layer • checksum: CRC • FS: frame status: set by dest., read by sender
– set to indicate destination up, frame copied OK from ring – DLC-level ACKing
Interconnecting LANs
Q: Why not just one big LAN?
• Limited amount of supportable traffic: on single LAN, all stations must share bandwidth
• limited length: 802.3 specifies maximum cable length
• large “collision domain” (can collide with many stations)
• limited number of stations: 802.5 has token passing delays at each station
Hubs• Physical Layer devices: essentially repeaters
operating at bit levels: repeat received bits on one interface to all other interfaces
• Hubs can be arranged in a hierarchy (or multi-tier design), with backbone hub at its top
Hubs (more)
• Each connected LAN referred to as LAN segment• Hubs do not isolate collision domains: node may collide
with any node residing at any segment in LAN • Hub Advantages:
– simple, inexpensive device– Multi-tier provides graceful degradation: portions of
the LAN continue to operate if one hub malfunctions– extends maximum distance between node pairs
(100m per Hub)
Hub limitations• single collision domain results in no increase in max
throughput– multi-tier throughput same as single segment throughput
• individual LAN restrictions pose limits on number of nodes in same collision domain and on total allowed geographical coverage
• cannot connect different Ethernet types (e.g., 10BaseT and 100baseT)
Bridges• Link Layer devices: operate on Ethernet frames,
examining frame header and selectively forwarding frame based on its destination
• Bridge isolates collision domains since it buffers frames
• When frame is to be forwarded on segment, bridge uses CSMA/CD to access segment and transmit
Bridges (more)• Bridge advantages:
– Isolates collision domains resulting in higher total max throughput, and does not limit the number of nodes nor geographical coverage
– Can connect different type Ethernet since it is a store and forward device
– Transparent: no need for any change to hosts LAN adapters
Bridges: frame filtering, forwarding• bridges filter packets
– same-LAN -segment frames not forwarded onto other LAN segments
• forwarding: – how to know which LAN segment on which to
forward frame?– looks like a routing problem (more shortly!)
Interconnection Without Backbone
• Not recommended for two reasons:
- single point of failure at Computer Science hub
- all traffic between EE and SE must pass over CS segment
Bridge Filtering
• bridges learn which hosts can be reached through which interfaces: maintain filtering tables– when frame received, bridge “learns” location of
sender: incoming LAN segment– records sender location in filtering table
• filtering table entry: – (Node LAN Address, Bridge Interface, Time Stamp)– stale entries in Filtering Table dropped (TTL can be 60
minutes)
Bridge Filtering
• filtering procedure:
if destination is on LAN on which frame was received
   then drop the frame
   else { lookup filtering table
      if entry found for destination
         then forward the frame on interface indicated;
         else flood; /* forward on all but the interface on which the frame arrived */
   }
Bridge Learning: exampleSuppose C sends frame to D and D replies
back with frame to C
• C sends frame, bridge has no info about D, so floods to both LANs– bridge notes that C is on port 1 – frame ignored on upper LAN – frame received by D
Bridge Learning: example
• D generates reply to C, sends – bridge sees frame from D – bridge notes that D is on interface 2 – bridge knows C on interface 1, so selectively forwards frame out via interface 1
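The C-to-D exchange can be replayed with a small learning-bridge sketch. It is a simplification under stated assumptions: the class and method names are made up, the bridge is given three interfaces, and the time stamps and aging of filtering-table entries are left out.

```python
class LearningBridge:
    """Hypothetical bridge that learns sender locations and filters or floods."""

    def __init__(self, interfaces):
        self.interfaces = list(interfaces)
        self.table = {}  # node LAN address -> bridge interface

    def receive(self, src, dst, in_iface):
        """Return the interfaces on which the frame is forwarded."""
        self.table[src] = in_iface            # learn location of sender
        out = self.table.get(dst)
        if out is None:                       # unknown destination: flood
            return [i for i in self.interfaces if i != in_iface]
        if out == in_iface:                   # same segment: drop the frame
            return []
        return [out]                          # known: forward selectively


bridge = LearningBridge([1, 2, 3])
first = bridge.receive("C", "D", in_iface=1)   # D unknown, so flood
reply = bridge.receive("D", "C", in_iface=2)   # C known on interface 1
print(first, reply)
```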
Spanning Tree• The learning bridge fails when the network topology
has a loop. – Why?
• Loops are not necessarily bad. They provide redundancy that can be used to recover from failures
• To handle loops, bridges implement the spanning tree algorithm.– The spanning tree algorithm imposes a logical tree over
the physical topology– Data is only transferred along links that belong to the
spanning tree
Spanning Tree Algorithm• Each bridge has unique id (e.g., B1, B2, B3)
• Select bridge with smallest id as root
• Select bridge on each LAN closest to root as designated bridge (use id to break ties)
• Each bridge forwards frames over each LAN for which it is the designated bridge
[Figure: example extended LAN with bridges B1 to B7 interconnecting LANs A to K]
Spanning Tree Algorithm (contd.)
• Bridges exchange configuration messages called CBPDUs (Configuration Bridge Protocol Data Units)
– id for bridge sending the message
– id for what the sending bridge believes to be root bridge
– distance (hops) from sending bridge to root bridge
• Each bridge records the current best configuration message for each port
• Initially, each bridge believes it is the root
Spanning Tree Algorithm (contd.)• When a bridge learns that it is not the root it stops generating
configuration messages– in steady state, only root generates configuration messages
• When the bridge learns that it is not the designated bridge, it stops forwarding configuration messages– in steady state, only designated bridges forward config messages
• Root continues to periodically send config messages
• If any bridge does not receive successive config messages, it starts generating config messages claiming to be the root– This is used to recover from root failure
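How a bridge picks the "current best" configuration message can be sketched with tuple comparison. This assumes the usual ordering (smaller root id wins, then smaller distance, then smaller sender id), integer bridge ids, and a made-up function name; it illustrates the comparison step only, not a full spanning tree implementation.

```python
def best_config(own_id, received):
    """Return the (root_id, distance, sender_id) message this bridge would send.

    received: list of (root_id, distance, sender_id) tuples heard on its ports.
    """
    own = (own_id, 0, own_id)            # initially, each bridge believes it is the root
    best = min(list(received) + [own])   # tuple order: root id, distance, sender id
    if best == own:
        return own                       # still root: keeps generating messages
    root, distance, _sender = best
    return (root, distance + 1, own_id)  # one hop farther from the root


print(best_config(3, [(2, 0, 2), (1, 1, 5)]))
```

Here bridge 3 hears that bridge 5 is one hop from root 1, so it would advertise root 1 at distance 2.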
Limitations of Bridges• Do not scale
– spanning tree algorithm does not scale– single large broadcast domains do not scale
• Do not accommodate heterogeneity– Bridges support ethernet to ethernet, ethernet to 802.5
and 802.5 to 802.5.
• Caution: beware of transparency – Applications that assume that they are executing on a
single LAN will fail.– Latency increases in large LANs, so does jitter
WWF Bridges vs. Routers• both store-and-forward devices
– routers: network layer devices (examine network layer headers)
– bridges are Link Layer devices• routers maintain routing tables, implement
routing algorithms• bridges maintain filtering tables, implement
filtering, learning and spanning tree algorithms
Routers vs. Bridges
Bridges + and -
+ Bridge operation is simpler, requiring less processing bandwidth
- Topologies are restricted with bridges: a spanning tree must be built to avoid cycles
- Bridges do not offer protection from broadcast storms (endless broadcasting by a host will be forwarded by a bridge)
Routers vs. Bridges
Routers + and -
+ arbitrary topologies can be supported, cycling is limited by TTL counters (and good routing protocols)
+ provide firewall protection against broadcast storms
- require IP address configuration (not plug and play)
- require higher processing bandwidth
• bridges do well in small networks (few hundred hosts) while routers are used in large networks (thousands of hosts)
Ethernet Switches• layer 2 (frame) forwarding,
filtering using LAN addresses• Switching: A-to-B and A’-to-
B’ simultaneously, no collisions
• large number of interfaces• often: individual hosts, star-
connected into switch– Ethernet, but no collisions!
Ethernet Switches
• cut-through switching: frame forwarded from input to output port without waiting for assembly of entire frame
– slight reduction in latency
• combinations of shared/dedicated, 10/100/1000 Mbps interfaces
IEEE 802.11 Wireless LAN• wireless LANs: untethered (often mobile) networking• IEEE 802.11 standard:
– MAC protocol
– unlicensed frequency spectrum: 900 MHz, 2.4 GHz
• Basic Service Set (BSS)(a.k.a. “cell”) contains:– wireless hosts– access point (AP): base
station• BSS’s combined to form
distribution system (DS)
Ad Hoc Networks• Ad hoc network: IEEE 802.11 stations can
dynamically form network without AP• Applications:
– “laptop” meeting in conference room, car– interconnection of “personal” devices– battlefield
• IETF MANET (Mobile Ad hoc Networks) working group
IEEE 802.11 MAC Protocol: CSMA/CA
802.11 CSMA: sender- if sense channel idle for
DIFS sec.then transmit entire frame
(no collision detection)-if sense channel busy
then binary backoff
802.11 CSMA receiver:if received OK
return ACK after SIFS
IEEE 802.11 MAC Protocol802.11 CSMA Protocol:
others• NAV: Network Allocation
Vector• 802.11 frame has
transmission time field• others (hearing data) defer
access for NAV time units
Hidden Terminal effect• hidden terminals: A, C cannot hear each other
– obstacles, signal attenuation– collisions at B
• goal: avoid collisions at B• CSMA/CA: CSMA with Collision Avoidance
Collision Avoidance: RTS-CTS exchange
• CSMA/CA: explicit channel reservation– sender: send short
RTS: request to send– receiver: reply with
short CTS: clear to send
• CTS reserves channel for sender, notifying (possibly hidden) stations
• avoid hidden station collisions
Collision Avoidance: RTS-CTS exchange
• RTS and CTS short:– collisions less likely,
of shorter duration– end result similar to
collision detection• IEEE 802.11 allows:
– CSMA– CSMA/CA:
reservations– polling from AP
Point to Point Data Link Control
• one sender, one receiver, one link: easier than broadcast link:– no Media Access Control– no need for explicit MAC addressing– e.g., dialup link, ISDN line
• popular point-to-point DLC protocols:
– PPP (point-to-point protocol)
– HDLC: High level data link control (Data link used to be considered “high layer” in protocol stack!)
PPP Design Requirements [RFC 1547]
• packet framing: encapsulation of network-layer datagram in data link frame
– carry network layer data of any network layer protocol (not just IP) at same time
– ability to demultiplex upwards
• bit transparency: must carry any bit pattern in the data field
• error detection (no correction)
• connection liveness: detect, signal link failure to network layer
• network layer address negotiation: endpoint can learn/configure each other’s network address
PPP non-requirements
• no error correction/recovery• no flow control• out of order delivery OK • no need to support multipoint links (e.g.,
polling)
Error recovery, flow control, data re-ordering all relegated to higher layers!
PPP Data Frame• Flag: delimiter (framing)• Address: does nothing (only one option)• Control: does nothing; in the future possible
multiple control fields• Protocol: upper layer protocol to which frame
delivered (eg, PPP-LCP, IP, IPCP, etc)
PPP Data Frame• info: upper layer data being carried• check: cyclic redundancy check for error
detection
Byte Stuffing• “data transparency” requirement: data field
must be allowed to include flag pattern <01111110>– Q: is received <01111110> data or flag?
• Sender: adds (“stuffs”) extra < 01111110> byte after each < 01111110> data byte
• Receiver:– two 01111110 bytes in a row: discard first
byte, continue data reception– single 01111110: flag byte
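The stuffing rule as stated on this slide can be sketched as below, applied to the data field only. The function names are illustrative. Note the assumption: this follows the slide's simplified doubling scheme; deployed PPP framing (RFC 1662) escapes the flag with a separate control-escape byte instead.

```python
FLAG = 0x7E  # bit pattern 01111110


def stuff(payload: bytes) -> bytes:
    """Sender: add an extra flag byte after each flag-valued data byte."""
    out = bytearray()
    for b in payload:
        out.append(b)
        if b == FLAG:
            out.append(FLAG)
    return bytes(out)


def unstuff(stuffed: bytes) -> bytes:
    """Receiver: two flag bytes in a row mean one data byte; discard the first."""
    out = bytearray()
    i = 0
    while i < len(stuffed):
        b = stuffed[i]
        if b == FLAG:
            if i + 1 >= len(stuffed) or stuffed[i + 1] != FLAG:
                raise ValueError("single flag byte inside data field")
            i += 1  # skip the first byte of the pair
        out.append(b)
        i += 1
    return bytes(out)
```

A single unpaired flag byte is treated here as an error because, in this sketch, frame delimiters are assumed to have been removed before `unstuff` is called.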
PPP Data Control Protocol
Before exchanging network-layer data, data link peers must
• configure PPP link (max. frame length, authentication)
• learn/configure network layer information
– for IP: carry IP Control Protocol (IPCP) msgs (protocol field: 8021) to configure/learn IP address
Asynchronous Transfer Mode: ATM• 1980s/1990’s standard for high-speed
(155Mbps to 622 Mbps and higher) Broadband Integrated Service Digital Network architecture
• Goal: integrated, end-end transport to carry voice, video, data
– meeting timing/QoS requirements of voice,
video (versus Internet best-effort model)– “next generation” telephony: technical roots
in telephone world– packet-switching (fixed length packets,
called “cells”) using virtual circuits
ATM architecture
• adaptation layer: only at edge of ATM network– data segmentation/reassembly– roughly analogous to Internet transport layer
• ATM layer: “network” layer– cell switching, routing
• physical layer
ATM: network or link layer?Vision: end-to-end
transport: “ATM from desktop to desktop”– ATM is a network
technologyReality: used to
connect IP backbone routers – “IP over ATM”– ATM as switched
link layer, connecting IP routers
ATM Adaptation Layer (AAL)• ATM Adaptation Layer (AAL): “adapts” upper layers
(IP or native ATM applications) to ATM layer below• AAL present only in end systems, not in switches• AAL layer segment (header/trailer fields, data)
fragmented across multiple ATM cells – analogy: TCP segment in many IP packets
ATM Adaptation Layer (AAL) [more]
Different versions of AAL layers, depending on ATM service class:
• AAL1: for CBR (Constant Bit Rate) services, e.g., circuit emulation
• AAL2: for VBR (Variable Bit Rate) services, e.g., MPEG video
• AAL5: for data (e.g., IP datagrams)
AAL PDU
ATM cell
User data
AAL5 - Simple And Efficient AL (SEAL)
• AAL5: low overhead AAL used to carry IP datagrams– 4 byte cyclic redundancy check – PAD ensures payload multiple of 48bytes – large AAL5 data unit to be fragmented into 48-byte
ATM cells
ATM Layer
Service: transport cells across ATM network
• analogous to IP network layer
• very different services than IP network layer

                                       Guarantees?
Network       Service      Bandwidth           Loss  Order  Timing  Congestion
Architecture  Model                                                 feedback
Internet      best effort  none                no    no     no      no (inferred via loss)
ATM           CBR          constant rate       yes   yes    yes     no congestion
ATM           VBR          guaranteed rate     yes   yes    yes     no congestion
ATM           ABR          guaranteed minimum  no    yes    no      yes
ATM           UBR          none                no    yes    no      no
ATM Layer: Virtual Circuits• VC transport: cells carried on VC from source to
dest– call setup, teardown for each call before data can flow– each packet carries VC identifier (not destination ID)– every switch on source-dest path maintain “state” for
each passing connection– link,switch resources (bandwidth, buffers) may be
allocated to VC: to get circuit-like perf.• Permanent VCs (PVCs)
– long lasting connections
– typically: “permanent” route between two IP routers
• Switched VCs (SVC):– dynamically set up on per-call basis
ATM VCs• Advantages of ATM VC approach:
– QoS performance guarantee for connection mapped to VC (bandwidth, delay, delay jitter)
• Drawbacks of ATM VC approach:
– Inefficient support of datagram traffic
– one PVC between each source/dest pair does not scale ($N^2$ connections needed)
– SVC introduces call setup latency, processing
overhead for short lived connections
ATM Layer: ATM cell• 5-byte ATM cell header• 48-byte payload
– Why?: small payload -> short cell-creation delay for digitized voice
– halfway between 32 and 64 (compromise!)
Cell header
Cell format
ATM cell header• VCI: virtual channel ID
– will change from link to link thru net• PT: Payload type (e.g. RM cell versus data cell) • CLP: Cell Loss Priority bit
– CLP = 1 implies low priority cell, can be discarded if congestion
• HEC: Header Error Checksum– cyclic redundancy check
ATM Physical Layer (more)Two pieces (sublayers) of physical layer:• Transmission Convergence Sublayer (TCS): adapts
ATM layer above to PMD sublayer below• Physical Medium Dependent: depends on physical
medium being used
TCS Functions:– Header checksum generation: 8 bits CRC – Cell delineation– With “unstructured” PMD sublayer, transmission of idle
cells when no data cells to send
ATM Physical LayerPhysical Medium Dependent (PMD) sublayer• SONET/SDH: transmission frame structure (like a
container carrying bits); – bit synchronization; – bandwidth partitions (TDM); – several speeds: OC1 = 51.84 Mbps; OC3 = 155.52
Mbps; OC12 = 622.08 Mbps
• T1/T3: transmission frame structure (old telephone
hierarchy): 1.5 Mbps/ 45 Mbps• unstructured: just cells (busy/idle)
IP-Over-ATMClassic IP only• 3 “networks” (e.g., LAN
segments)• MAC (802.3) and IP
addresses
IP over ATM• replace “network” (e.g.,
LAN segment) with ATM network
• ATM addresses, IP addresses
ATMnetwork
EthernetLANs Ethernet
LANs
IP-Over-ATMIssues:• IP datagrams
into ATM AAL5 PDUs
• from IP addresses to ATM addresses– just like IP
addresses to 802.3 MAC addresses!
ATMnetwork
EthernetLANs
Datagram Journey in IP-over-ATM Network
• at Source Host:– IP layer finds mapping between IP, ATM dest address (using ARP)– passes datagram to AAL5– AAL5 encapsulates data, segments to cells, passes to ATM layer
• ATM network: moves cell along VC to destination
• at Destination Host:
– AAL5 reassembles cells into original datagram
– if CRC OK, datagram is passed to IP
ARP in ATM Nets• ATM network needs destination ATM address
– just like Ethernet needs destination Ethernet address
• IP/ATM address translation done by ATM ARP (Address Resolution Protocol)– ARP server in ATM network performs broadcast of
ATM ARP translation request to all connected ATM devices
– hosts can register their ATM addresses with server to avoid lookup
X.25 and Frame RelayLike ATM:• wide area network technologies • virtual circuit oriented • origins in telephony world• can be used to carry IP datagrams
– can thus be viewed as Link Layers by IP protocol
X.25• X.25 builds VC between source and destination for
each user connection• Per-hop control along path
– error control (with retransmissions) on each hop using LAP-B• variant of the HDLC protocol
– per-hop flow control using credits• congestion arising at intermediate node
propagates to previous node on path• back to source via back pressure
IP versus X.25
• X.25: reliable in-sequence end-to-end delivery
– “intelligence in the network”
• IP: unreliable, out-of-sequence end-end delivery– “intelligence in the endpoints”
• gigabit routers: limited processing possible• 2000: IP wins
Frame Relay• Designed in late ‘80s, widely deployed in the ‘90s• Frame relay service:
– no error control– end-to-end congestion control
Frame Relay (more)• Designed to interconnect corporate customer
LANs– typically permanent VC’s: “pipe” carrying
aggregate traffic between two routers– switched VC’s: as in ATM
• corporate customer leases FR service from public Frame Relay network (eg, Sprint, ATT)
Frame Relay (more)
• Flag bits, 01111110, delimit frame• address:
– 10 bit VC ID field– 3 congestion control bits
• FECN: forward explicit congestion notification (frame experienced congestion on path)
• BECN: congestion on reverse path• DE: discard eligibility
addressflags data CRC flags
Frame Relay -VC Rate Control• Committed Information Rate (CIR)
– defined, “guaranteed” for each VC– negotiated at VC set up time– customer pays based on CIR
• DE bit: Discard Eligibility bit– Edge FR switch measures traffic rate for each VC;
marks DE bit– DE = 0: high priority, rate compliant frame; deliver
at “all costs”– DE = 1: low priority, eligible for discard when
congestion
Frame Relay - CIR & Frame Marking
• Access Rate: rate R of the access link between source router (customer) and edge FR switch(provider); 64Kbps < R < 1,544Kbps
• Typically, many VCs (one per destination router) multiplexed on the same access trunk; each VC has own CIR
• Edge FR switch measures traffic rate for each VC; it marks (i.e., sets DE = 1 on) frames which exceed CIR (these may be later dropped)
Summary• principles behind data link layer services:
– error detection, correction– sharing a broadcast channel: multiple access– link layer addressing, ARP
• various link layer technologies– Ethernet– hubs, bridges, switches– IEEE 802.11 LANs– PPP– ATM– X.25, Frame Relay
Internetworking• Motivation
– Heterogeneity– Scale
• IP is the glue that connects heterogeneous networks, giving the illusion of a homogeneous one.
• Salient Features– Each host is identified by a unique 32 bit identifier.– Best Effort Service Model– Global Addressing Scheme
Network layer functions• transport packet from sending
to receiving hosts • network layer protocols in
every host, router
three important functions:• path determination: route taken
by packets from source to dest. Routing algorithms
• switching: move packets from router’s input to appropriate router output
• call setup: some network architectures require router call setup along path before data flows
[Figure: network / data link / physical layers in every router along the path; application / transport / network / data link / physical layers in the two end hosts]
The Internet Model• no call setup at network layer• routers: no state about end-to-end connections
– no network-level concept of “connection”• packets typically routed using destination host ID
– packets between same source-dest pair may take different paths
[Figure: two end systems, each with an application / transport / network / data link / physical stack: 1. Send data, 2. Receive data]
The Internet: Service Model• Connectionless
– Datagram based
• Best-effort delivery (unreliable service)– packets are lost– packets are delivered out of order– duplicate copies of a packet may be delivered– packets can be delayed for a long time
IP Internet
• Concatenation of Networks
• Protocol Stack
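[Figure: example internetwork with hosts H1 to H8 and routers R1 to R3 on Network 1 (Ethernet), Network 2 (Ethernet), Network 3 (FDDI) and Network 4 (point-to-point)]
[Figure: protocol stack along the path from H1 to H8: H1 (TCP / IP / ETH), R1 (IP over ETH and FDDI), R2 (IP over FDDI and PPP), R3 (IP over PPP and ETH), H8 (TCP / IP / ETH)]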
IP datagram format (each row is 32 bits)

ver | head. len | type of service | length
16-bit identifier | flgs | fragment offset
time to live | upper layer | Internet checksum
32 bit source IP address
32 bit destination IP address
Options (if any)
data (variable length, typically a TCP or UDP segment)

• ver: IP protocol version number
• head. len: header length (32 bit words)
• type of service: “type” of data
• length: total datagram length (bytes)
• 16-bit identifier, flgs, fragment offset: for fragmentation/reassembly
• time to live: max number remaining hops (decremented at each router)
• upper layer: upper layer protocol to deliver payload to
• Options: e.g., timestamp, record route taken, specify list of routers to visit
IP Fragmentation & Reassembly• network links have MTU
(max.transfer size) - largest possible link-level frame.– different link types,
different MTUs • large IP datagram divided
(“fragmented”) within net– one datagram becomes
several datagrams– “reassembled” only at
final destination– IP header bits used to
identify, order related fragments
fragmentation: in: one large datagramout: 3 smaller datagrams
reassembly
IP Fragmentation and Reassembly
One large datagram becomes several smaller datagrams

             length  ID  fragflag  offset
original     4000    x   0         0
fragment 1   1500    x   1         0
fragment 2   1500    x   1         1480
fragment 3   1040    x   0         2960
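The numbers in this example can be reproduced with a short sketch. It assumes a 20-byte header with no options and an MTU of 1500 bytes; the function name is illustrative. Offsets are given in bytes, as on the slide, whereas the real header field stores the offset in units of 8 bytes (185 and 370 here).

```python
def fragment(total_len, mtu, header_len=20):
    """Split a datagram of total_len bytes into fragments that fit the MTU."""
    data_len = total_len - header_len
    # Data per fragment must be a multiple of 8 bytes (except in the last one).
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < data_len:
        size = min(max_data, data_len - offset)
        more = offset + size < data_len   # fragflag = 1 if more fragments follow
        fragments.append(
            {"offset": offset, "fragflag": int(more), "length": size + header_len}
        )
        offset += size
    return fragments


for frag in fragment(4000, 1500):
    print(frag)
```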
ICMP: Internet Control Message Protocol
• used by hosts, routers, gateways to communicate network-level information
– error reporting: unreachable host, network, port, protocol
– echo request/reply (used by ping)
• network-layer “above” IP:
– ICMP msgs carried in IP datagrams
• ICMP message: type, code plus first 8 bytes of IP datagram causing error

Type  Code  description
0     0     echo reply (ping)
3     0     dest. network unreachable
3     1     dest host unreachable
3     2     dest protocol unreachable
3     3     dest port unreachable
3     6     dest network unknown
3     7     dest host unknown
4     0     source quench (congestion control - not used)
8     0     echo request (ping)
9     0     route advertisement
10    0     router discovery
11    0     TTL expired
12    0     bad IP header
IP Addressing: Introduction
• IP address: 32-bit identifier for host, router interface
• interface: connection between host, router and physical link
– routers typically have
multiple interfaces– host may have multiple
interfaces– IP addresses associated
with interface, not host, router
[Figure: interfaces 223.1.1.1, 223.1.1.2, 223.1.1.3, 223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.9, 223.1.3.1, 223.1.3.2, 223.1.3.27]
223.1.1.1 = 11011111 00000001 00000001 00000001
            (223)    (1)      (1)      (1)
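The dotted-decimal to binary conversion can be checked with a one-line helper; the function name is illustrative.

```python
def to_binary(dotted):
    """Write each of the four decimal octets as 8 binary digits."""
    return " ".join(format(int(octet), "08b") for octet in dotted.split("."))


print(to_binary("223.1.1.1"))
```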
IP Addressing• IP address:
– network part (high order bits)
– host part (low order bits) • What’s a network ?
(from IP address perspective)– device interfaces with
same network part of IP address
– can physically reach each other without intervening router
[Figure: network consisting of 3 IP networks (for IP addresses starting with 223, first 24 bits are network address); LAN interfaces 223.1.1.1 to 223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.9, 223.1.3.1, 223.1.3.2, 223.1.3.27]
IP Addressing
How to find the networks?
• Detach each interface from router, host
• create “islands” of isolated networks
[Figure: interconnected system consisting of six networks; interfaces 223.1.1.1 to 223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.6, 223.1.3.1, 223.1.3.2, 223.1.3.27, 223.1.7.0, 223.1.7.1, 223.1.8.0, 223.1.8.1, 223.1.9.1, 223.1.9.2]
IP Addresses
given notion of “network”, let’s re-examine IP addresses:
“class-full” addressing:

class  leading bits  rest of the 32 bits  address range
A      0             network, host        1.0.0.0 to 127.255.255.255
B      10            network, host        128.0.0.0 to 191.255.255.255
C      110           network, host        192.0.0.0 to 223.255.255.255
D      1110          multicast address    224.0.0.0 to 239.255.255.255
IP addressing: CIDR
• classful addressing:
  – inefficient use of address space, address space exhaustion
  – e.g., class B net allocated enough addresses for 65K hosts, even if only 2K hosts in that network
• CIDR: Classless InterDomain Routing
  – network portion of address of arbitrary length
  – address format: a.b.c.d/x, where x is # bits in network portion of address

11001000 00010111 0001000|0 00000000
      network part       | host part
200.23.16.0/23
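The a.b.c.d/x notation can be explored with Python's standard `ipaddress` module. The sketch below uses the 200.23.16.0/23 block from the slide; the two probe addresses are arbitrary values chosen for illustration.

```python
import ipaddress

net = ipaddress.ip_network("200.23.16.0/23")

print(net.prefixlen)      # 23 bits of network part
print(net.netmask)        # 255.255.254.0
print(net.num_addresses)  # 2**(32 - 23) = 512 addresses in the block

# Membership depends only on the first 23 bits of the address.
print(ipaddress.ip_address("200.23.17.200") in net)  # True
print(ipaddress.ip_address("200.23.18.1") in net)    # False
```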
IP addresses: how to get one?
Hosts (host portion):
• hard-coded by system admin in a file
• DHCP: Dynamic Host Configuration Protocol: dynamically get address: “plug-and-play”
  – host broadcasts “DHCP discover” msg
  – DHCP server responds with “DHCP offer” msg
  – host requests IP address: “DHCP request” msg
  – DHCP server sends address: “DHCP ack” msg
IP addresses: how to get one?
Network (network portion):
• get allocated portion of ISP’s address space:

ISP's block      11001000 00010111 00010000 00000000   200.23.16.0/20
Organization 0   11001000 00010111 00010000 00000000   200.23.16.0/23
Organization 1   11001000 00010111 00010010 00000000   200.23.18.0/23
Organization 2   11001000 00010111 00010100 00000000   200.23.20.0/23
...              ...                                   ...
Organization 7   11001000 00010111 00011110 00000000   200.23.30.0/23
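The split of the ISP's /20 block into eight /23 blocks can be reproduced with the standard `ipaddress` module. This is an illustrative sketch only; the variable names are made up here.

```python
import ipaddress

isp_block = ipaddress.ip_network("200.23.16.0/20")

# Extending the prefix from 20 to 23 bits leaves 3 bits to number
# the organizations, so there are 2**3 = 8 equal-sized sub-blocks.
org_blocks = list(isp_block.subnets(new_prefix=23))

for i, block in enumerate(org_blocks):
    print(f"Organization {i}: {block}")
```

Organization 0 comes out as 200.23.16.0/23 and Organization 7 as 200.23.30.0/23, matching the table.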
Hierarchical addressing: route aggregation
Hierarchical addressing allows efficient advertisement of routing information:
[Figure: Organization 0 (200.23.16.0/23), Organization 1 (200.23.18.0/23), Organization 2 (200.23.20.0/23), ..., Organization 7 (200.23.30.0/23) and Fly-By-Night-ISP, which tells the Internet “Send me anything with addresses beginning 200.23.16.0/20”. ISPs-R-Us tells the Internet “Send me anything with addresses beginning 199.31.0.0/16”.]
Hierarchical addressing: more specific routes
ISPs-R-Us has a more specific route to Organization 1
[Figure: Organization 0 (200.23.16.0/23), Organization 1 (200.23.18.0/23), Organization 2 (200.23.20.0/23), ..., Organization 7 (200.23.30.0/23). Fly-By-Night-ISP tells the Internet “Send me anything with addresses beginning 200.23.16.0/20”. ISPs-R-Us tells the Internet “Send me anything with addresses beginning 199.31.0.0/16 or 200.23.18.0/23”.]
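When two advertisements cover the same destination, routers prefer the longest matching prefix, which is how the more specific /23 route wins over the /20 aggregate. Below is a minimal sketch of that rule, assuming a router in the Internet that has learned the three prefixes advertised on the slide; the `ROUTES` list and `next_hop` function are hypothetical names.

```python
import ipaddress

# (advertised prefix, ISP that advertised it)
ROUTES = [
    (ipaddress.ip_network("200.23.16.0/20"), "Fly-By-Night-ISP"),
    (ipaddress.ip_network("199.31.0.0/16"), "ISPs-R-Us"),
    (ipaddress.ip_network("200.23.18.0/23"), "ISPs-R-Us"),
]


def next_hop(dest: str):
    """Return the ISP whose advertised prefix matches dest most specifically."""
    addr = ipaddress.ip_address(dest)
    matches = [(net, via) for net, via in ROUTES if addr in net]
    if not matches:
        return None  # no route known
    # Longest prefix match: the entry with the most network bits wins.
    return max(matches, key=lambda m: m[0].prefixlen)[1]


print(next_hop("200.23.18.7"))  # inside Organization 1's /23
print(next_hop("200.23.16.7"))  # only the /20 aggregate matches
```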
IP addressing: the last word...
Q: How does an ISP get block of addresses?
A: ICANN: Internet Corporation for Assigned Names and Numbers
  – allocates addresses
  – manages DNS
  – assigns domain names, resolves disputes
Getting a datagram from source to dest.
IP datagram:
  misc fields | source IP addr | dest IP addr | data
• datagram remains unchanged, as it travels source to destination
• addr fields of interest here
[Figure: three LANs (223.1.1.x, 223.1.2.x, 223.1.3.x) joined by one router; hosts A, B, E marked]

routing table in A
Dest. Net.   next router   Nhops
223.1.1                    1
223.1.2      223.1.1.4     2
223.1.3      223.1.1.4     2
Getting a datagram from source to dest.
  misc fields | 223.1.1.1 | 223.1.1.3 | data
Starting at A, given IP datagram addressed to B:
• look up net. address of B
• find B is on same net. as A
• link layer will send datagram directly to B inside link-layer frame
  – B and A are directly connected
[Figure: three LANs joined by one router; A = 223.1.1.1, B = 223.1.1.3]

routing table in A
Dest. Net.   next router   Nhops
223.1.1                    1
223.1.2      223.1.1.4     2
223.1.3      223.1.1.4     2
Getting a datagram from source to dest.
  misc fields | 223.1.1.1 | 223.1.2.2 | data
Starting at A, dest. E:
• look up network address of E
• E on different network
  – A, E not directly attached
• routing table: next hop router to E is 223.1.1.4
• link layer sends datagram to router 223.1.1.4 inside link-layer frame
• datagram arrives at 223.1.1.4
• continued…..
[Figure: three LANs joined by one router; A = 223.1.1.1, E = 223.1.2.2]

routing table in A
Dest. Net.   next router   Nhops
223.1.1                    1
223.1.2      223.1.1.4     2
223.1.3      223.1.1.4     2
Getting a datagram from source to dest.
  misc fields | 223.1.1.1 | 223.1.2.2 | data
Arriving at 223.1.1.4, destined for 223.1.2.2
• look up network address of E
• E on same network as router’s interface 223.1.2.9
  – router, E directly attached
• link layer sends datagram to 223.1.2.2 inside link-layer frame via interface 223.1.2.9
• datagram arrives at 223.1.2.2!!!
[Figure: three LANs joined by one router; A = 223.1.1.1, E = 223.1.2.2]

routing table in router
Dest. network   next router   Nhops   interface
223.1.1         -             1       223.1.1.4
223.1.2         -             1       223.1.2.9
223.1.3         -             1       223.1.3.27
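The lookup performed at host A in this walkthrough can be sketched as follows. It is a simplification that assumes every network uses a 24-bit network part, as in the example; `ROUTING_TABLE_A`, `network_of`, and `forward_from_a` are hypothetical names.

```python
# Routing table in A: destination network -> next router.
# None means the network is directly attached (no intervening router).
ROUTING_TABLE_A = {
    "223.1.1": None,
    "223.1.2": "223.1.1.4",
    "223.1.3": "223.1.1.4",
}


def network_of(addr: str) -> str:
    """Network part of an address, assuming the first 24 bits (three octets)."""
    return addr.rsplit(".", 1)[0]


def forward_from_a(dest: str) -> str:
    """IP address the link layer should deliver the frame to."""
    next_router = ROUTING_TABLE_A[network_of(dest)]
    # Same network: send straight to the destination; otherwise to the router.
    return dest if next_router is None else next_router


print(forward_from_a("223.1.1.3"))  # B, on A's own network
print(forward_from_a("223.1.2.2"))  # E, reached via router 223.1.1.4
```

Note that the datagram's destination field never changes; only the link-layer destination differs between the two cases.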
Routing
Routing protocol
Goal: determine “good” path (sequence of routers) thru network from source to dest.
Graph abstraction for routing algorithms:
• graph nodes are routers
• graph edges are physical links
  – link cost: delay, $ cost, or congestion level
• “good” path:
  – typically means minimum cost path
  – other def’s possible
[Figure: six-node graph, nodes A to F, with link costs between 1 and 5]
Routing Algorithm classification
Global or decentralized information?
Global:
• all routers have complete topology, link cost info
• “link state” algorithms
Decentralized:
• router knows physically-connected neighbors, link costs to neighbors
• iterative process of computation, exchange of info with neighbors
• “distance vector” algorithms
Static or dynamic?
Static:
• routes change slowly over time
Dynamic:
• routes change more quickly
  – periodic update
  – in response to link cost changes
A Link-State Routing Algorithm
Dijkstra’s algorithm
• net topology, link costs known to all nodes
  – accomplished via “link state broadcast”
  – all nodes have same info
• computes least cost paths from one node (“source”) to all other nodes
  – gives routing table for that node
• iterative: after k iterations, know least cost path to k dest.’s
Notation:
• c(i,j): link cost from node i to j. cost infinite if not direct neighbors
• D(v): current value of cost of path from source to dest. v
• p(v): predecessor node along path from source to v, that is next to v
• N: set of nodes whose least cost path definitively known
Dijkstra’s Algorithm
Initialization:
  N = {A}
  for all nodes v
    if v adjacent to A
      then D(v) = c(A,v)
      else D(v) = infty

Loop
  find w not in N such that D(w) is a minimum
  add w to N
  update D(v) for all v adjacent to w and not in N:
    D(v) = min( D(v), D(w) + c(w,v) )
  /* new cost to v is either old cost to v or known
     shortest path cost to w plus cost from w to v */
until all nodes in N
Dijkstra’s algorithm: example

Step  start N   D(B),p(B)  D(C),p(C)  D(D),p(D)  D(E),p(E)  D(F),p(F)
0     A         2,A        5,A        1,A        infinity   infinity
1     AD        2,A        4,D                   2,D        infinity
2     ADE       2,A        3,E                              4,E
3     ADEB                 3,E                              4,E
4     ADEBC                                                 4,E
5     ADEBCF

[Figure: six-node graph, nodes A to F, with link costs between 1 and 5]
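A minimal Python sketch of Dijkstra's algorithm follows. It uses a heap, the "more efficient implementation" variant, rather than the linear scan of the pseudocode. The edge costs in `GRAPH` are an assumption: the figure did not survive extraction, so they were chosen to be consistent with the example table (for instance D(B) = 2 via A, D(D) = 1 via A, D(E) = 2 via D).

```python
import heapq

# Assumed link costs (undirected), consistent with the example table.
GRAPH = {
    "A": {"B": 2, "C": 5, "D": 1},
    "B": {"A": 2, "C": 3, "D": 2},
    "C": {"A": 5, "B": 3, "D": 3, "E": 1, "F": 5},
    "D": {"A": 1, "B": 2, "C": 3, "E": 1},
    "E": {"C": 1, "D": 1, "F": 2},
    "F": {"C": 5, "E": 2},
}


def dijkstra(graph, source):
    """Return (D, p): least cost to each node and its predecessor."""
    dist = {v: float("inf") for v in graph}
    pred = {v: None for v in graph}
    dist[source] = 0
    done = set()               # the set N of the pseudocode
    heap = [(0, source)]
    while heap:
        d, w = heapq.heappop(heap)
        if w in done:
            continue           # stale heap entry
        done.add(w)
        for v, cost in graph[w].items():
            if v not in done and d + cost < dist[v]:
                dist[v] = d + cost   # D(v) = min(D(v), D(w) + c(w,v))
                pred[v] = w
                heapq.heappush(heap, (dist[v], v))
    return dist, pred


dist, pred = dijkstra(GRAPH, "A")
print(dist)
print(pred)
```

Ties may be broken in a different order than in the table (B and E both have cost 2), but the final costs and predecessors agree with its last entries: 2,A for B; 3,E for C; 1,A for D; 2,D for E; 4,E for F.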
Dijkstra’s algorithm, discussion
Algorithm computational complexity: n nodes
• each iteration: need to check all nodes, w, not in N
• n*(n+1)/2 comparisons: O(n^2)
• more efficient implementations possible:
  – O(n log n): Use a heap (sorted) to maintain interim table
Oscillations possible:
• e.g., link cost = amount of carried traffic
[Figure: four panels on a four-node network A, B, C, D with traffic loads 1, 1+e and e, labeled “initially”, “… recompute routing”, “… recompute”, “… recompute”; the link costs change after each recomputation]
Link State: Reliable Flooding
• Link State routers exchange information using Link State Packets (LSP).
• LSP contains
  – id of the node that created the LSP
  – cost of the link to each directly connected neighbor
  – sequence number (SEQNO)
  – time-to-live (TTL) for this packet
• Reliable flooding
  – store most recent LSP from each node
  – forward LSP to all nodes but one that sent it
  – generate new LSP periodically
    • increment SEQNO
  – start SEQNO at 0 when reboot
  – decrement TTL of each stored LSP
    • discard when TTL=0
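The store-and-forward rule of reliable flooding can be sketched as below. This is a simplified model under stated assumptions: the `LSP` and `Node` classes and their field names are hypothetical, and TTL aging, acknowledgements, and periodic regeneration are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LSP:
    origin: str        # id of the node that created the LSP
    seqno: int         # sequence number
    ttl: int           # time-to-live (not aged in this sketch)
    link_costs: tuple  # ((neighbor, cost), ...)


class Node:
    def __init__(self, name, neighbors):
        self.name = name
        self.neighbors = list(neighbors)
        self.database = {}  # origin -> most recent LSP seen

    def receive(self, lsp, sender):
        """Store lsp if it is new; return the neighbors to forward it to."""
        stored = self.database.get(lsp.origin)
        if stored is not None and stored.seqno >= lsp.seqno:
            return []  # duplicate or older copy: do not flood again
        self.database[lsp.origin] = lsp
        # Forward to all neighbors except the one that sent it.
        return [n for n in self.neighbors if n != sender]


b = Node("B", ["A", "C", "D"])
first = LSP("A", 1, 10, (("B", 2),))
print(b.receive(first, "A"))  # ['C', 'D']
print(b.receive(first, "C"))  # [] because the same SEQNO was already stored
```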
Distance Vector Routing Algorithm
iterative:
• continues until no nodes exchange info.
• self-terminating: no “signal” to stop
asynchronous:
• nodes need not exchange info/iterate in lock step!
distributed:
• each node communicates only with directly-attached neighbors
Distance Table data structure
• each node has its own
• row for each possible destination
• column for each directly-attached neighbor to node
Distance Table: example
[Figure: five-node graph A, B, C, D, E; link costs include c(E,A) = 1, c(E,B) = 8, c(E,D) = 2]

Distance table at E: cost to destination via neighbor

           via
D^E()      A     B     D
dest  A    1    14     5
      B    7     8     5
      C    6     9     4
      D    4    11     2

$D^E(C,D) = c(E,D) + \min_w \{D^D(C,w)\} = 2 + 2 = 4$
$D^E(A,D) = c(E,D) + \min_w \{D^D(A,w)\} = 2 + 3 = 5$ (loop!)
$D^E(A,B) = c(E,B) + \min_w \{D^B(A,w)\} = 8 + 6 = 14$ (loop!)
Distance table gives routing table

Distance table at E (cost to destination via neighbor):

           via
D^E()      A     B     D
dest  A    1    14     5
      B    7     8     5
      C    6     9     4
      D    4    11     2

Routing table at E (outgoing link to use, cost):

dest  A    A,1
      B    D,5
      C    D,4
      D    D,2
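Deriving the routing table amounts to taking the minimum of each row of the distance table. A short sketch using E's table, with the hypothetical names `DISTANCE_TABLE_E` and `routing_table`:

```python
# E's distance table: destination -> {via neighbor: cost}
DISTANCE_TABLE_E = {
    "A": {"A": 1, "B": 14, "D": 5},
    "B": {"A": 7, "B": 8, "D": 5},
    "C": {"A": 6, "B": 9, "D": 4},
    "D": {"A": 4, "B": 11, "D": 2},
}


def routing_table(distance_table):
    """For each destination pick the neighbor with the smallest cost."""
    table = {}
    for dest, row in distance_table.items():
        via = min(row, key=row.get)
        table[dest] = (via, row[via])
    return table


for dest, (via, cost) in routing_table(DISTANCE_TABLE_E).items():
    print(f"{dest}: {via},{cost}")
```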
Distance Vector Routing: overview
Iterative, asynchronous: each local iteration caused by:
• local link cost change
• message from neighbor: its least cost path change from neighbor
Distributed:
• each node notifies neighbors only when its least cost path to any destination changes
  – neighbors then notify their neighbors if necessary
Each node:
  wait for (change in local link cost or msg from neighbor)
  recompute distance table
  if least cost path to any dest has changed, notify neighbors
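The per-node loop can be simulated in a few lines. The sketch below is a simplification: it runs all nodes in synchronous rounds, whereas the real algorithm is asynchronous, and it keeps only each node's least-cost vector rather than a full distance table. The link costs in `LINKS` are an assumption, chosen to be consistent with the distance table example for node E.

```python
INF = float("inf")

# Assumed undirected link costs, consistent with E's distance table.
LINKS = {("A", "B"): 7, ("A", "E"): 1, ("B", "C"): 1,
         ("B", "E"): 8, ("C", "D"): 2, ("D", "E"): 2}


def adjacency(links):
    adj = {}
    for (u, v), cost in links.items():
        adj.setdefault(u, {})[v] = cost
        adj.setdefault(v, {})[u] = cost
    return adj


def distance_vector(links):
    """Iterate until no node's least-cost vector changes."""
    adj = adjacency(links)
    nodes = sorted(adj)
    # Initially each node knows only itself and its direct links.
    vec = {u: {d: 0 if d == u else adj[u].get(d, INF) for d in nodes}
           for u in nodes}
    changed = True
    while changed:
        changed = False
        sent = {u: dict(v) for u, v in vec.items()}  # vectors advertised this round
        for u in nodes:
            for d in nodes:
                if d == u:
                    continue
                # cost via neighbor w = c(u,w) + w's advertised cost to d
                best = min(adj[u][w] + sent[w][d] for w in adj[u])
                if best != vec[u][d]:
                    vec[u][d] = best
                    changed = True   # u would notify its neighbors
    return vec


vectors = distance_vector(LINKS)
print(vectors["E"])
```

The loop stops on its own once a full round produces no change, which mirrors the self-terminating property described earlier.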
Distance Vector: link cost changes
Link cost changes:
• node detects local link cost change
• updates distance table (line 15)
• if cost change in least cost path, notify neighbors (lines 23,24)
“good news travels fast”
[Figure: nodes X, Y, Z with link costs 1 (Y-Z) and 50 (X-Z); the X-Y cost changes from 4 to 1; algorithm terminates]
Distance Vector: link cost changes
Link cost changes:
• good news travels fast
• bad news travels slow - “count to infinity” problem!
[Figure: nodes X, Y, Z with link costs 1 (Y-Z) and 50 (X-Z); the X-Y cost changes from 4 to 60; algorithm continues on!]
Distance Vector: poisoned reverse
If Z routes through Y to get to X:
• Z tells Y its (Z’s) distance to X is infinite (so Y won’t route to X via Z)
• Does not work on larger loops
[Figure: nodes X, Y, Z with link costs 1 (Y-Z) and 50 (X-Z); the X-Y cost changes from 4 to 60; algorithm terminates]
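The effect of poisoned reverse on the three-node example can be illustrated with a small simulation. It is a simplified sketch, not the full algorithm: only Y's and Z's costs to X are tracked, the two nodes update in strict alternation, and the link costs (X-Y rising from 4 to 60, X-Z = 50, Y-Z = 1) are assumed from the figure. The function name `settle` is hypothetical.

```python
INF = float("inf")


def settle(poisoned_reverse, c_yx=60, c_zx=50, c_yz=1):
    """Alternate updates at Y and Z after c(Y,X) rises from 4 to 60.

    Returns (Y's cost to X, Z's cost to X, rounds until nothing changes).
    """
    d_y, d_z = 4, 5      # before the change: Y direct, Z via Y
    z_via_y = True
    rounds = 0
    while True:
        rounds += 1
        # Z advertises infinity to Y while it routes to X through Y.
        adv_z = INF if (poisoned_reverse and z_via_y) else d_z
        new_y = min(c_yx, c_yz + adv_z)
        y_via_z = c_yz + adv_z < c_yx
        # Y does the same toward Z.
        adv_y = INF if (poisoned_reverse and y_via_z) else new_y
        new_z = min(c_zx, c_yz + adv_y)
        z_via_y = c_yz + adv_y < c_zx
        if (new_y, new_z) == (d_y, d_z):
            return d_y, d_z, rounds
        d_y, d_z = new_y, new_z


print(settle(poisoned_reverse=False))
print(settle(poisoned_reverse=True))
```

Both runs end with Z reaching X directly at cost 50 and Y reaching X via Z at cost 51, but the run without poisoned reverse needs many more rounds because Y and Z keep raising their estimates in small steps.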
Comparison of LS and DV algorithms
Message complexity
• LS: with n nodes and an average of l links/node, each node sends O(nl) messages; total messages O(n^2 l)
• DV: exchange between neighbors only
  – convergence time varies
  – may be routing loops
  – count-to-infinity problem
Robustness: what happens if router malfunctions?
LS:
  – node can advertise incorrect link cost
  – each node computes only its own table
DV:
  – DV node can advertise incorrect path cost
  – each node’s table used by others
    • errors propagate thru network
Hierarchical Routing
Our routing study thus far - idealization
• all routers identical
• network “flat”
… not true in practice
scale: with 50 million destinations:
• can’t store all dest’s in routing tables!
• routing table exchange would swamp links!
administrative autonomy
• internet = network of networks
• each network admin may want to control routing in its own network
Hierarchical Routing
• aggregate routers into regions, “autonomous systems” (AS)
• routers in same AS run same routing protocol
  – “intra-AS” routing protocol
  – routers in different AS can run different intra-AS routing protocol
gateway routers
• special routers in AS
• run intra-AS routing protocol with all other routers in AS
• also responsible for routing to destinations outside AS
  – run inter-AS routing protocol with other gateway routers
Why different Intra- and Inter-AS routing?
Policy:
• Inter-AS: admin wants control over how its traffic is routed and who routes through its net.
• Intra-AS: single admin, so no policy decisions needed
Scale:
• hierarchical routing saves table size, reduced update traffic
Performance:
• Intra-AS: can focus on performance
• Inter-AS: policy may dominate over performance
Intra-AS and Inter-AS routing
Gateways:
• perform inter-AS routing amongst themselves
• perform intra-AS routing with other routers in their AS
[Figure: three autonomous systems A, B, C with routers labeled a, b, c, d; gateways A.a, A.c, B.a, C.b. Inset: inter-AS, intra-AS routing in gateway A.c, above network layer, link layer, physical layer]
Intra-AS and Inter-AS routing
[Figure: path from Host h1 to Host h2 across autonomous systems A, B, C (gateways A.a, A.c, B.a, C.b): Intra-AS routing within AS A, Inter-AS routing between A and B, Intra-AS routing within AS B]
Routing in the Internet
• The Global Internet consists of Autonomous Systems (AS) interconnected with each other:
  – Stub AS: small corporation
  – Multihomed AS: large corporation (no transit)
  – Transit AS: provider
• Two-level routing:
  – Intra-AS: administrator is responsible for choice
  – Inter-AS: unique standard
Internet AS Hierarchy
[Figure labels: Intra-AS border (exterior gateway) routers; Inter-AS interior (gateway) routers]
Intra-AS Routing
• Also known as Interior Gateway Protocols (IGP)
• Most common IGPs:
  – RIP: Routing Information Protocol
  – OSPF: Open Shortest Path First
  – IGRP: Interior Gateway Routing Protocol (Cisco proprietary)
RIP (Routing Information Protocol)
• Distance vector algorithm
• Included in BSD-UNIX Distribution in 1982
• Distance metric: # of hops (max = 15 hops)
  – Can you guess why?
• Distance vectors: exchanged every 30 sec via Response Message (also called advertisement)
• Each advertisement: routes for up to 25 destination nets
RIP (Routing Information Protocol)
[Figure: networks w, x, y, z and routers A, B, C, D]

Routing table in D
Destination Network   Next Router   Num. of hops to dest.
w                     A             2
y                     B             2
z                     B             7
x                     --            1
….                    ….            ....
RIP: Link Failure and Recovery
If no advertisement heard after 180 sec --> neighbor/link declared dead
  – routes via neighbor invalidated
  – new advertisements sent to neighbors
  – neighbors in turn send out new advertisements (if tables changed)
  – link failure info quickly propagates to entire net
  – poison reverse used to prevent ping-pong loops (infinite distance = 16 hops)
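How a RIP router might merge a neighbor's advertisement into its table can be sketched as follows. This is a simplified illustration, not the behavior of any particular implementation: timers and poison reverse are omitted, the function name `process_advertisement` is hypothetical, and the advertisement from A is invented for the example. The starting table is the one shown for router D.

```python
INFINITY = 16  # RIP treats 16 hops as unreachable


def process_advertisement(table, neighbor, advertised):
    """Merge a neighbor's distance vector into this router's table.

    table:      destination -> (next router, hops); None = directly attached
    advertised: destination -> hops as seen from the neighbor
    """
    for dest, hops in advertised.items():
        candidate = min(hops + 1, INFINITY)  # one extra hop to reach the neighbor
        current = table.get(dest)
        if (current is None                   # destination not known yet
                or candidate < current[1]     # strictly shorter route
                or current[0] == neighbor):   # our route already goes via this neighbor
            table[dest] = (neighbor, candidate)
    return table


# Routing table in D, as on the slide.
table_d = {"w": ("A", 2), "y": ("B", 2), "z": ("B", 7), "x": (None, 1)}

# Hypothetical advertisement received from A.
process_advertisement(table_d, "A", {"w": 1, "x": 1, "z": 4})
print(table_d)
```

With this made-up advertisement, D switches its route to z from B (7 hops) to A (5 hops) and leaves the other entries unchanged.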
RIP Table processing
• RIP routing tables managed by application-level process called routed (daemon)
• advertisements sent in UDP packets, periodically repeated
RIP Table example (continued)
Router: giroflee.eurocom.fr
• Three attached class C networks (LANs)
• Router only knows routes to attached LANs
• Default router used to “go up”
• Route multicast address: 224.0.0.0
• Loopback interface (for debugging)

Destination           Gateway               Flags  Ref    Use     Interface
--------------------  --------------------  -----  -----  ------  ---------
127.0.0.1             127.0.0.1             UH     0      26492   lo0
192.168.2.            192.168.2.5           U      2      13      fa0
193.55.114.           193.55.114.6          U      3      58503   le0
192.168.3.            192.168.3.5           U      2      25      qaa0
224.0.0.0             193.55.114.6          U      3      0       le0
default               193.55.114.129        UG     0      143454
OSPF (Open Shortest Path First)
• “open”: publicly available
• Uses Link State algorithm
  – LS packet dissemination
  – Topology map at each node
  – Route computation using Dijkstra’s algorithm
• OSPF advertisement carries one entry per neighbor router
• Advertisements disseminated to entire AS (via flooding)
OSPF “advanced” features (not in RIP)
• Security: all OSPF messages authenticated (to prevent malicious intrusion); OSPF messages are carried directly in IP datagrams (protocol 89), not over TCP or UDP
• Multiple same-cost paths allowed (only one path in RIP)
• For each link, multiple cost metrics for different TOS (e.g., satellite link cost set “low” for best effort; high for real time)
• Integrated uni- and multicast support:
  – Multicast OSPF (MOSPF) uses same topology data base as OSPF
• Hierarchical OSPF in large domains.
Hierarchical OSPF
• Two-level hierarchy: local area, backbone.
  – Link-state advertisements only in area
  – each node has detailed area topology; only knows direction (shortest path) to nets in other areas.
• Area border routers: “summarize” distances to nets in own area, advertise to other Area Border routers.
• Backbone routers: run OSPF routing limited to backbone.
• Boundary routers: connect to other ASs.
IGRP (Interior Gateway Routing Protocol)
• CISCO proprietary; successor of RIP (mid 80s)
• Distance Vector, like RIP
  – Hold time
  – Split Horizon
  – Poison Reverse
• several cost metrics (delay, bandwidth, reliability, load etc)
• routing updates are carried directly in IP datagrams (protocol 9), not over TCP
• EIGRP (Garcia-Luna): Loop-free routing via Diffusing Update Algorithm (DUAL), based on diffusing computation
  – Uses a mix of link-state and distance vector
Internet inter-AS routing: BGP
• BGP (Border Gateway Protocol): the de facto standard
• Path Vector protocol:
  – similar to Distance Vector protocol
  – each Border Gateway broadcasts to neighbors (peers) entire path (i.e., sequence of ASs) to destination
  – E.g., Gateway X may send its path to dest. Z:
    Path (X,Z) = X,Y1,Y2,Y3,…,Z
Internet inter-AS routing: BGP
Suppose: gateway X sends its path to peer gateway W
• W may or may not select path offered by X
  – cost, policy (don’t route via competitor’s AS), loop prevention reasons.
• If W selects path advertised by X, then:
  Path (W,Z) = W, Path (X,Z)
• Note: X can control incoming traffic by controlling its route advertisements to peers:
  – e.g., don’t want to route traffic to Z -> don’t advertise any routes to Z
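W's decision on a path offered by X can be sketched as below. This is an illustrative simplification: real BGP route selection weighs many more attributes, and the function name `consider_path` and the `avoid` policy set are hypothetical.

```python
def consider_path(me, advertised_path, avoid=frozenset()):
    """Return the path `me` would adopt, or None if the offer is rejected.

    advertised_path: sequence of ASs from the advertising peer to the destination
    avoid:           ASs that local policy refuses to route through
    """
    if me in advertised_path:
        return None                  # loop prevention: we are already on the path
    if set(avoid) & set(advertised_path):
        return None                  # policy: e.g. do not route via a competitor
    return [me] + list(advertised_path)  # Path(W,Z) = W, Path(X,Z)


print(consider_path("W", ["X", "Y1", "Y2", "Z"]))         # accepted, W prepended
print(consider_path("W", ["X", "W", "Z"]))                # rejected: loop
print(consider_path("W", ["X", "Y1", "Z"], avoid={"Y1"}))  # rejected: policy
```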
Internet inter-AS routing: BGP
• BGP messages exchanged using TCP.
• BGP messages:
  – OPEN: opens TCP connection to peer and authenticates sender
  – UPDATE: advertises new path (or withdraws old)
  – KEEPALIVE: keeps connection alive in absence of UPDATES; also ACKs OPEN request
  – NOTIFICATION: reports errors in previous msg; also used to close connection
Other Routing Techniques
• Hot-Potato Routing, a.k.a. Deflection Routing
  – Use the first available link irrespective of whether it leads to the destination or not.
• Cut-Through routing
  – Non-store-and-forward: routes before entire packet is received at the router.
  – Outgoing link is reserved. What happens if a fast link follows a slow link?
Reading
• Recommended
  – End-To-End Routing Behavior in the Internet, V. Paxson, SIGCOMM 1996. ftp://ftp.ee.lbl.gov/papers/routing.SIGCOMM.ps.Z
    • Due: 3/26/01
  – Persistent Route Oscillations in Inter-Domain Routing, K. Varadhan, R. Govindan, D. Estrin, ftp://ftp.isi.edu/ra/Publications/bgp_osc.ps.gz
    • Due: 3/28/01
  – http://netresearch.ics.uci.edu/agentos/related/routing : Contains information on CISCO Routing Protocols