1/14/2002

CS 5516: What is it about?

Srinidhi Varadarajan


Goals

• Introduce the basics of networking from both a theoretical and a practical standpoint

• Foster the ability to understand research issues


Means

• Traditional textbook model

• Class discussion

• Projects


Topics

• Transport Layer: Service Models, Protocols, Congestion Control

• Network Layer: Service Models, Routing algorithms, IPv6, Multicast

• Link Layer: Issues, performance, implementations

• Multimedia Services: Application requirements, traffic models, Quality of Service issues, transport protocols for adaptive and hard real-time traffic


Prerequisites

• Knowledge of computer architectures. Topics include virtual memory, timers, scheduling, multiprogramming

• Strong programming ability in C

• User-level understanding of the UNIX operating system

• Ability to undertake substantial independent design projects


Resources

• Required Text:
  – Andrew Tanenbaum, Computer Networks, Third Edition, Prentice Hall, 1996

• Recommended Books:
  – Douglas E. Comer, Internetworking with TCP/IP Volume I: Principles, Protocols, and Architecture, 3rd edition, Prentice Hall, 1995.
  – W. Richard Stevens, TCP/IP Illustrated, Vol. 1, Addison Wesley


Grading

• Final Exam: 30%
• Project 2: 30%
• Project 1: 20%
• Midterm: 20%

Introduction to Networks

Lecture Topics

• History and motivation
• Network architecture
  – Layered models
  – Definitions and abstractions
  – OSI Reference Model
• Network design issues
  – Definitions
  – Components
  – Message, packet, and cell switching
  – Resource sharing
  – Functionality
  – Performance

Networks are Important!!!

• Client-Server (C/S) Applications
• File Transfer Protocol (FTP)
• Multimedia
• Terminal Emulation (Telnet)
• World Wide Web (WWW)
• Electronic Mail (Email)
• … and many others …

[Chart: Internet Hosts. Source: http://www.isoc.org/guest/zakon/Internet/History/HIT.html]

[Chart: World Wide Web Sites. Source: http://www.isoc.org/guest/zakon/Internet/History/HIT.html]

NET.WORK.VIRGINIA

• ATM network with Internet access
• Over 400 sites with OC3, DS3, or DS1 service
• Service through Sprint and Vision Alliance (consortium led by Bell Atlantic)

[Diagram: Sprint ROA, Sprint RIC, and Sprint WTN connected by OC3 and DS3 links; a SprintLink router serves as the backbone/Internet gateway to the Internet, ESnet, vBNS, and Internet2]

Network Architecture

• Network architecture
  – Guides the design and implementation of the network
  – Assists in coping with complexity
• Networks are typically modeled as a set of layered, cooperating processes
• The International Organization for Standardization (ISO) has developed the seven-layer Open Systems Interconnection (OSI) model
  – The OSI model is not strictly adhered to in actual implementations; it serves more as a guideline.

A Simple Layered Model

[Diagram: layers, top to bottom: Application Programs / Process-to-Process Channels / Host-to-Host Connectivity / Networking Hardware]

• Decomposes system into simpler, manageable components
• Provides a modular design

Multiple Abstractions for One Layer

• Process-to-process channel
  – Request/reply interaction
  – Stream of messages

[Diagram: Application Programs over two alternative process-to-process channels, a Request/Reply Channel and a Message Stream Channel, both over Host-to-Host Connectivity and Networking Hardware]

Functions Are Not Always “Layer-able”

• Some functions may need to interact with multiple layers

[Diagram: Application Programs / Process-to-Process Channels / Host-to-Host Connectivity / Networking Hardware, with Network Management drawn alongside and spanning all layers]

Layered Models … Generalized (1)

• Layer N
  – Provides services to layers N+1 and above
  – Uses services offered by layers N-1 and below
  – May ONLY interact with peer layer N entities via protocols
• Distinction between service, interface, and implementation

[Diagram: two side-by-side stacks, each Layer N+1 / Layer N / Layer N-1]

Layered Models … Generalized (2)

[Diagram: Layer N at Node A and Layer N at Node B, connected by a peer-to-peer interface (protocol); at each node, Layer N uses services provided by lower layers and offers services to upper layers through service interfaces]

Layered Models … Generalized (3)

• Protocols are rules for cooperation between peers
  – Peer-to-peer interfaces, e.g. Protocol X defines the interfaces
  – “Protocol” is sometimes used to refer to the layer itself, e.g. the entity that realizes Protocol X
• Service access points (SAPs) adhering to an interface definition are needed between layers
  – Service or layer-to-layer interface
  – The services implemented by a protocol at layer X are accessed through its SAP. Think of a SAP as a functional interface.

Interfaces and Protocols

• Three components of an interface
  – Set of visible abstract objects, and for each, a set of allowed operations with parameters
  – Set of rules governing sequences of operations
  – Encoding and formatting conventions required for operations and parameters
• Protocols are operationally equivalent to interfaces (they have the same three components), but are usually restricted to peer layers, whereas interfaces are between adjacent layers
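To make the three components concrete, here is a minimal sketch of a hypothetical connection-oriented service interface. The abstract object is a connection; the allowed operations are `connect`, `send`, and `disconnect`; a sequencing rule forbids sending before connecting; and an encoding rule requires data to be raw bytes. The class and method names are illustrative assumptions, not taken from any standard.

```python
class ConnectionSAP:
    """Hypothetical service interface offered by layer N to layer N+1."""

    def __init__(self) -> None:
        self.connected = False
        self.sent = []  # stands in for handing data down to layer N

    def connect(self) -> None:
        # Sequencing rule: a connection cannot be opened twice.
        if self.connected:
            raise RuntimeError("already connected")
        self.connected = True

    def send(self, data: bytes) -> None:
        # Sequencing rule: send is only allowed on an open connection.
        if not self.connected:
            raise RuntimeError("send before connect")
        # Encoding rule: the parameter must be raw bytes.
        if not isinstance(data, (bytes, bytearray)):
            raise TypeError("data must be bytes")
        self.sent.append(bytes(data))

    def disconnect(self) -> None:
        if not self.connected:
            raise RuntimeError("not connected")
        self.connected = False


sap = ConnectionSAP()
sap.connect()
sap.send(b"hello")
sap.disconnect()
```

A peer protocol could be described with the same three parts, with the messages exchanged between peers taking the place of the operations.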

OSI Terminology for Layering

• SAP: Service Access Point (where N+1 accesses N)
• IDU: Interface Data Unit (passed from N+1 to N)
• SDU: Service Data Unit (data from N+1)
• ICI: Interface Control Information (service type, etc.)
• PDU: Protocol Data Unit (exchanged by peer N entities)

[Diagram: Layer N+1 passes an IDU (ICI + SDU) across the SAP to Layer N; Layer N uses the ICI, adds a header to the SDU, and sends header + SDU as a PDU]
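The relationship among these units can be sketched with a toy layer N. The header format here is a hypothetical one chosen for illustration: a 1-byte service type derived from the ICI followed by a 2-byte SDU length, both big-endian. Real protocols define their own header fields.

```python
import struct
from dataclasses import dataclass


@dataclass
class IDU:
    """Interface Data Unit passed from layer N+1 to layer N across the SAP."""
    ici: int    # Interface Control Information, here just a service type
    sdu: bytes  # Service Data Unit: the data layer N+1 wants delivered


def layer_n_send(idu: IDU) -> bytes:
    """Layer N keeps the SDU intact, builds its own header, and emits the PDU."""
    # Hypothetical header: 1-byte service type + 2-byte SDU length.
    header = struct.pack(">BH", idu.ici, len(idu.sdu))
    return header + idu.sdu  # PDU = header + SDU


def layer_n_receive(pdu: bytes) -> bytes:
    """The peer layer N entity strips the header and recovers the SDU."""
    _service_type, length = struct.unpack(">BH", pdu[:3])
    return pdu[3:3 + length]


pdu = layer_n_send(IDU(ici=1, sdu=b"GET /"))
print(pdu)                   # b'\x01\x00\x05GET /'
print(layer_n_receive(pdu))  # b'GET /'
```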

OSI Reference Model

[Diagram: two end hosts, each with all seven layers (Application, Presentation, Session, Transport, Network, Data Link, Physical), connected through an intermediate node that implements only the Network, Data Link, and Physical layers]

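Deviation from Strict Layering

• Example: Fiber Distributed Data Interface (FDDI)
  – LLC: Logical Link Control
  – MAC: Media Access Control
  – PHY: Physical
  – PMD: Physical Media Dependent
  – SMT: Station Management

[Diagram: LLC and MAC within the Data Link layer, PHY and PMD within the Physical layer, with SMT drawn alongside and spanning several sublayers]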

Layered Model Example

• Typical protocol “stack” in a UNIX-based TCP/IP environment

[Diagram: OSI layers mapped to protocols: Data Link/Physical: Ethernet, FDDI; Network: IP; Transport: TCP, UDP; Session: RPC; Presentation: XDR; Application: HTTP, SMTP, NFS, Telnet, FTP, X]

Internet Protocol Graph

[Diagram: application protocols (FTP, HTTP, ...) over TCP and UDP, both over IP, which runs over Net 1, Net 2, …, Net n]

• Internet protocols (“TCP/IP”) really use a four-layer architecture

Advantages of Layering (1)

• Data hiding and encapsulation -- data structures, algorithms, etc. in a layer are not visible to other layers
• Decomposition -- complex systems can be decomposed into more easily understood pieces
• System can evolve since layers can be changed (as long as the service and interface do not change)
• Alternate services can be offered at layer N+1 that share the services of layer N

Advantages of Layering (2)

• Alternate implementations of a layer can co-exist
• A layer or sublayer can be simplified or omitted if some or all of its services are not needed
• Confidence in correct operation enhanced by testing each layer independently

Disadvantages of Layering

• Some functions (like FDDI station management) really need to access and operate at multiple layers
• Poorly conceived layers can lead to awkward and complex interfaces
• There may be performance penalties due to extra overhead of layers, for example memory-to-memory copies
• Design of (an older) layer N+1 may be sub-optimal given the properties of (a new) layer N

Physical Layer

• The physical layer provides a virtual link for transmitting a sequence of bits between any pair of nodes joined by a physical communication channel -- “virtual bit pipe”
• Synchronous or asynchronous
• Defines physical interface, signaling, cabling, connectors, etc.
• There may be variations at the physical level for a basic data link protocol (PMD specs)
  – IEEE 802.3 (Ethernet): 10Base5 (thick wire), 10Base2 (thin wire), 10BaseT (twisted pair)

Data Link Layer

• The data link layer is responsible for the error-free transmission of packets between “adjacent” or directly-connected nodes (OSI definition)
• The media access control (MAC) function is a sub-layer of the data link layer
  – Allows multiple nodes to share a common transmission medium
  – Supports addressing of nodes
• The logical link control (LLC) function is another sub-layer
  – Functions such as error recovery

Network Layer

• The network layer is responsible for getting a packet through the network from the source node to the destination node
  – Routing to select network path
  – Flow control or congestion control
  – Internetworking to allow transmission between different types of networks
• In a WAN or internetwork, the network layer requires cooperation among peers at intermediate nodes
• Network layer function is minimal in a LAN
• Key: Network layer provides host-to-host communication

Transport Layer (1)

• The transport layer provides network-independent, end-to-end message transfer between pairs of ports or sockets
• Ports are destination points for communication that are defined by software
  – Ports are identified by a transport address that identifies the host computer and the port identifier
  – Used to distinguish between multiple applications on one host
  – Established services, like FTP and HTTP, have “well-known” default port identifiers that can be obtained through a name service (RFC 1700)
• Key: Transport layer provides process-to-process communication.
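The role of ports can be illustrated with a toy demultiplexer: the transport layer on one host maps each port identifier to the application bound to it and hands each arriving message to the right process. This is a sketch under simplifying assumptions: no real sockets, and the class name, method names, and host address are hypothetical. Ports 21 (FTP) and 80 (HTTP) are the well-known assignments.

```python
class TransportDemux:
    """Toy transport layer for one host: port number -> application inbox."""

    def __init__(self, host: str) -> None:
        self.host = host
        self.inboxes = {}  # port -> list of delivered messages

    def bind(self, port: int) -> None:
        """Attach an application to a port; each port serves one application."""
        if port in self.inboxes:
            raise OSError(f"port {port} already in use")
        self.inboxes[port] = []

    def deliver(self, dst: tuple, message: bytes) -> bool:
        """Deliver a message addressed to the transport address (host, port)."""
        host, port = dst
        if host != self.host or port not in self.inboxes:
            return False  # no such process; a real stack would report an error
        self.inboxes[port].append(message)
        return True


demux = TransportDemux("10.0.0.1")  # hypothetical host address
demux.bind(21)  # FTP
demux.bind(80)  # HTTP
demux.deliver(("10.0.0.1", 80), b"GET / HTTP/1.0")
print(demux.inboxes)  # {21: [], 80: [b'GET / HTTP/1.0']}
```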

Transport Layer (2)

[Diagram: Process A and Process B, each attached to the network through a port (socket)]

Transport Layer (3)

• Transport layers typically provide one of two basic types of service:
  – Virtual circuit or connection-oriented service
    • Transmission Control Protocol (TCP)
  – Datagram or connection-less service
    • User Datagram Protocol (UDP)

Transport Layer: Virtual Circuits

• Virtual circuits are logical channels between a source and destination
• Connections are maintained for multiple packet or message transmissions until they are explicitly released
  – Network layer may still use dynamic routing
• Functions
  – Translate transport address to network address
  – Segment messages into packets for transmission
  – Pass packets to network layer for delivery
  – Reassemble packets at receiving end
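The segment and reassemble functions can be sketched as follows, assuming each packet carries a sequence number so the receiver can restore order even if dynamic routing delivers packets out of order. The function names and the `(sequence number, payload)` packet representation are illustrative assumptions.

```python
import random


def segment(message: bytes, max_payload: int) -> list:
    """Split a message into (sequence number, payload) packets."""
    if max_payload <= 0:
        raise ValueError("max_payload must be positive")
    return [
        (seq, message[offset:offset + max_payload])
        for seq, offset in enumerate(range(0, len(message), max_payload))
    ]


def reassemble(packets: list) -> bytes:
    """Rebuild the message; sorting by sequence number undoes any reordering."""
    return b"".join(payload for _, payload in sorted(packets))


packets = segment(b"virtual circuits carry ordered data", max_payload=8)
random.shuffle(packets)  # simulate out-of-order arrival
print(reassemble(packets))  # b'virtual circuits carry ordered data'
```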

Transport Layer: Datagrams

• Datagram communication is connectionless
• New connection is established and released for each packet or message transmitted
  – Packet itself establishes and releases the “connection”
• Functions
  – Translate transport address to network address
  – Pass messages to network layer for delivery
  – Each message sent as a single packet
  – Upper layer responsible for re-ordering and error detection

Session Layer

• The session layer is responsible for establishing and maintaining virtual connections between pairs of processes in different hosts, possibly including service location and access rights

• Multiple sessions may be multiplexed over a single connection (provided by a lower layer)

Presentation Layer (1)

• The presentation layer represents information to applications so as to preserve semantics (meanings or values) while resolving syntactic (representation) differences
• In open systems, heterogeneous computers result in heterogeneous representations
  – Characters: ASCII, EBCDIC, Unicode
  – Integers: lengths, 1’s versus 2’s complement
  – Reals: fixed-point or floating-point, different floating-point formats
  – Byte order: 01234567... or 67452301
  – Structured data
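The byte-order difference is easy to demonstrate with Python's standard `struct` module: the same 32-bit integer yields opposite byte sequences in big-endian and little-endian layouts. Agreeing on one order for transmission (big-endian is the conventional "network byte order" on the Internet) is one way a presentation function resolves this difference.

```python
import struct

value = 0x01020304

big = struct.pack(">I", value)     # big-endian: most significant byte first
little = struct.pack("<I", value)  # little-endian: least significant byte first

print(big.hex())     # 01020304
print(little.hex())  # 04030201

# A receiver that knows the sender used big-endian decodes correctly,
# regardless of its own native byte order.
assert struct.unpack(">I", big)[0] == value
# Decoding with the wrong convention silently yields a different number.
assert struct.unpack("<I", big)[0] == 0x04030201
```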

Presentation Layer (2)

• The presentation layer may provide encryption and/or compression
• Comments on security
  – Information security (INFOSEC): security at this layer
  – Communications security (COMSEC): security at the physical or data link layer

Application Layer

• Network applications make up the application layer
• Protocol specific to each particular application
• Certain applications, like HTTP, NFS, FTP, and Telnet, have been standardized
• Standards do not provide a fixed model for applications, but models do exist
  – Client-server versus peer-to-peer
  – Remote procedure call (RPC) versus message passing

Network Requirements

• Multiple viewpoints:
  – Network users
    • Performance that a user’s applications need, e.g., latency (delay) and loss rate
  – Network designers
    • Cost-effective design, e.g., network resources are efficiently utilized and fairly allocated
  – Network service providers
    • System that is easy to administer and manage, e.g., faults can be easily isolated and it is easy to account for use

Connectivity (1)

• Network building blocks
  – Nodes -- Workstations, PCs
  – Direct links -- twisted pair, coaxial cable, optical fiber, radio frequency link, …
    • Point-to-point
    • Multiple access (multiaccess)

[Diagram: a point-to-point link between two nodes; a multiple-access link shared by several nodes]

Connectivity (2)

• Indirect connectivity
  – Switched or routed networks allow indirectly connected nodes to communicate
  – Switches, routers, hubs, etc. are specialized nodes in the network
  – Switching network is the “cloud”

[Diagram: Switched Network]

Connectivity (3)

• An internetwork or internet is a network of networks
  – Need internetworking devices: Routers
  – The Internet is a specific example of an internet.

[Diagram: Internetwork]

Message versus Packet Switching (1)

• Networks may be classified by how they segment data for transmission and switching
  – Message-switched versus packet-switched
  – Most networks use packet switching (or cell switching)
• Messages
  – Have some higher level meaning, e.g. as a request for service or a reply
  – Encoded as a string of bits

Message versus Packet Switching (2)

• Packets
  – Messages may be decomposed into one or more packets for transmission, reconstructed at receiver
  – Lower layer entities may further decompose packets, for example: Ethernet frames, ATM cells

[Diagram: a message divided into packets with headers]

Sessions

• Messages usually occur as part of a longer transaction called a session
• Session properties
  – Message or packet arrival process (rate, variability)
  – Session holding time
  – Message or packet length distribution
  – Acceptable delay
  – Required reliability and security
  – Acceptable ordering of messages or packets

Circuit vs. Store-and-Forward Switching

• Two forms of switching for the messages or packets in a session are widely used
  – Circuit switching
  – Store-and-forward or, simply, “packet switching”

Circuit Switching

• Session $s$ is initiated with a request for a fixed transmission rate (bandwidth requirement) of $r_s$ bits/sec
• Path created through the network
  – Each link in the path allocates capacity of $r_s$ bits/sec to $s$, e.g. using time-division multiplexing (TDM) or frequency division multiplexing (FDM)
  – Request is blocked if no path can be established
• Bandwidth dedicated to $s$ for the life of the session
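The set-up step can be sketched as a simple admission check: a session requesting $r_s$ bits/sec is admitted only if every link on the chosen path still has at least $r_s$ of unallocated capacity, in which case that capacity is reserved on each link for the life of the session. Path selection is not modeled, and the link names and capacities are hypothetical.

```python
def setup_circuit(path: list, free: dict, r_s: float) -> bool:
    """Reserve r_s bits/sec on every link of the path, or block the request."""
    if any(free[link] < r_s for link in path):
        return False  # blocked: some link cannot carry another r_s bits/sec
    for link in path:
        free[link] -= r_s  # capacity dedicated to this session
    return True


def release_circuit(path: list, free: dict, r_s: float) -> None:
    """Return the dedicated capacity when the session ends."""
    for link in path:
        free[link] += r_s


# Hypothetical links: A-B has 1.5 Mbps free, B-C has 0.1 Mbps free.
free = {"A-B": 1.5e6, "B-C": 0.1e6}
print(setup_circuit(["A-B", "B-C"], free, 64e3))  # True
print(setup_circuit(["A-B", "B-C"], free, 64e3))  # False: B-C has 36 kbps left
```

The reserved capacity stays allocated even when the session has nothing to send, which is the inefficiency discussed next.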

Efficiency of Circuit Switching

• Most data traffic is “bursty,” so links are not well utilized
• Circuit switching not widely used in data networks (except, inefficiently, for access)
  – Links are expensive
  – Sessions require significant portion of link capacity (only a few sessions can be supported)
  – Traffic is bursty, so utilization is low

[Figure: bursty traffic over time]

Store-and-Forward Switching (1)

• No transmission rate allocation is dedicated at set-up
  – Differs from circuit switching
• Data transmitted at full link capacity, but links can be shared by multiple sessions on a demand basis

Store-and-Forward Switching (2)

• Advantages:
  – Link fully utilized if there is any data to transmit
  – Delay can be significantly reduced
  – Utilization can be significantly increased
• Disadvantages:
  – Greater variance in delay due to queuing delays
  – Flow control needed to prevent buffer overflows

Store-and-Forward Switching (3)

• How is information switched?
  – Message Switching: messages are sent intact without being broken into packets
  – Packet Switching: messages are broken into packets for transmission
  – Cell Switching: messages (or packets) are broken into fixed-size packets called cells

Store-and-Forward Switching (4)

• How are messages or packets routed through the network?
  – Virtual Circuit Routing: a path is established and used for the duration of the session
    • Connection-oriented or virtual circuit service
  – Dynamic Routing: each packet or message may traverse a different path through the network
    • Connection-less or datagram service

Geographic Extent (1)

• Networks may be classified by their geographic extent
  – DANs, LANs, MANs, and WANs
  – Useful classification for lower level protocols
  – Should be transparent to upper layer protocols
• DAN: Desk Area Network
  – Connects PC and peripherals
  – USB, Firewire
  – Medium to high data rates
  – Low-cost, high-volume, built-in interfaces

Geographic Extent (2)

• Local Area Networks (LANs)
  – Limited extent (10’s of meters to a few kilometers)
  – High data rates (megabits to gigabits per second)
  – Built-in interfaces in workstations, PCs
  – Low cost
  – Low delay
  – Examples: Ethernet, Token Ring, FDDI, ATM

Geographic Extent (3)

• Metropolitan Area Networks (MANs)
  – Medium extent (10’s of kilometers)
  – Medium data rates (kilobits to 100’s of megabits per second)
  – Special access equipment, often expensive
  – Examples: FDDI, ATM, DQDB
• Wide Area Networks (WANs)
  – Large extent (global)
  – Low speed (kilobits to 100’s of megabits per second)
  – Special access equipment, usually expensive
  – High latency
  – Examples: T1, T3, SMDS, ATM, OC-XXX links

Resource Sharing (1)

• Economics dictates that network resources must be shared or multiplexed among multiple users
  – Shared links
  – Shared network nodes (switches, hubs, etc.)

[Diagram: two groups of three hosts, each group attached to a switch; the two switches are connected]

Resource Sharing (2)

• Multiplexing schemes
  – Fixed
    • Time-division multiplexing (TDM) or synchronous time-division multiplexing (STDM)
    • Frequency division multiplexing (FDM)
  – On-demand
    • Statistical multiplexing, including asynchronous time-division multiplexing

Statistical Multiplexing

• Packets from all traffic streams are merged into a single queue and transmitted on-demand
  – Scheduling is typically first-come first-served (FCFS), but priority schemes are also used
  – $T_{SM} = L/C$ seconds needed to transmit an $L$-bit packet on a channel of capacity $C$ bits/sec
  – May also maintain a separate queue for each traffic stream and service them in a “round-robin” manner (skipping over an empty queue with no loss of transmission capacity)
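The round-robin variant can be sketched as below: each traffic stream has its own queue, the scheduler visits the queues in turn, and an empty queue is skipped, so the link never sits idle while any packet is waiting. Stream names and packet labels are hypothetical; on the link, each transmission would take $L/C$ seconds.

```python
from collections import deque


def round_robin(queues: dict) -> list:
    """Return the transmission order produced by round-robin service."""
    order = []
    while any(queues.values()):
        for q in queues.values():
            if q:  # skip empty queues: no capacity is wasted on them
                order.append(q.popleft())
    return order


queues = {
    "s1": deque(["a1", "a2", "a3"]),
    "s2": deque([]),  # idle stream: never gets a wasted slot
    "s3": deque(["c1"]),
}
print(round_robin(queues))  # ['a1', 'c1', 'a2', 'a3']
```

Contrast this with STDM, where the idle stream `s2` would still own a slot in every cycle.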

Synchronous Time-Division Multiplexing

• Time on the channel is divided into $m$ slots and each of $m$ traffic streams is given one slot -- unused slots are wasted
  – Create $m$ channels, each with capacity $C/m$
  – An $L$-bit packet takes $T_{STDM} = Lm/C$ seconds to transmit if packets are long compared to the length of a slot
  – An $L$-bit packet takes $T_{STDM} = L/C$ seconds to transmit if slots are of packet length, but must wait $(m-1)$ slots between transmissions

Frequency Division Multiplexing

• Channel bandwidth $W$ is subdivided into $m$ channels and each of $m$ traffic streams is given one channel
  – Create $m$ channels, each with bandwidth $W/m$, or capacity $C/m$ (ignoring guard bands between channels)
  – An $L$-bit packet takes $T_{FDM} = Lm/C$ seconds to transmit

FDM, STDM vs. Statistical Multiplexing

• Statistical multiplexing has smaller average delay than either STDM or FDM
  – Channel capacity is wasted with STDM (wasted time slot) and FDM (wasted bandwidth) when a traffic stream is idle
  – Transmission time greater for STDM and FDM
• Advantages of STDM or FDM
  – Statistical multiplexing has lower average delay, but higher variance of delay
  – STDM and FDM eliminate the need to identify the traffic stream associated with each packet
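Plugging numbers into the transmission-time formulas makes the comparison concrete. This sketch assumes a hypothetical 1,000-bit packet, a 1 Mbps channel, and $m$ = 10 streams, with STDM slots short relative to the packet. It computes transmission time only; queueing, where statistical multiplexing's higher delay variance shows up, is not modeled.

```python
def t_sm(L: float, C: float) -> float:
    """Statistical multiplexing: the packet uses the full capacity C."""
    return L / C


def t_stdm(L: float, C: float, m: int) -> float:
    """STDM with slots short relative to the packet: effective capacity C/m."""
    return L * m / C


def t_fdm(L: float, C: float, m: int) -> float:
    """FDM: each stream gets a channel of capacity C/m (guard bands ignored)."""
    return L * m / C


L, C, m = 1_000, 1e6, 10
print(f"SM:   {t_sm(L, C) * 1e3:.1f} ms")     # SM:   1.0 ms
print(f"STDM: {t_stdm(L, C, m) * 1e3:.1f} ms")  # STDM: 10.0 ms
print(f"FDM:  {t_fdm(L, C, m) * 1e3:.1f} ms")   # FDM:  10.0 ms
```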

Functionality (1)

• Network must support common services or process-to-process channels, for example
  – Request/reply channel for file access, digital libraries, etc.
  – Message stream channel for video and audio applications

Functionality (2)

• What can corrupt this functionality? What can go wrong?
  – Link or node failures
  – Errors at the bit or packet level
  – Arbitrary delays
  – Buffer overflows -- lost packets
  – Out of order delivery
  – Security -- eavesdropping, spoofing, etc.

Functionality (3)

• The key problem is to bridge
  – What the application expects and
  – What the underlying technology can provide
• Carries over to a layered model -- Layer N needs to provide
  – What Layer N+1 expects using
  – What Layer N-1 can provide

Distributed Algorithms (1)

• Peers must cooperate to perform network functions
• A distributed algorithm is decomposed into one or more local algorithms
• Each local algorithm proceeds based on the data received from other layers or peers, and the order in which the data is received

[Diagram: four nodes, each running Network, Data Link, and Physical layers]

Distributed Algorithms (2)

• These algorithms are complex because underlying services may be unreliable
• Data may …
  – Never arrive (due to transmission error, overflow, etc.)
  – Arrive late (due to arbitrary network delay)
  – Arrive out of order (due to differing network paths)
• It may be impossible to ensure correct operation 100% of the time
  – Maximize probability of success
  – Detect errors

Maroon and Orange Armies (1)

[Diagram: Orange Army, Maroon Army #1, Maroon Army #2]

• Maroon Armies #1 and #2 must attack simultaneously to defeat the Orange Army
• Maroon Army #1 wants to send a messenger to Maroon Army #2 to set a time for the attack

Maroon and Orange Armies (2)

• The messenger must go through enemy territory (an unreliable communication channel)
• Problems …
  – May be delayed -- until after the attack time
  – May be captured -- so that message is never delivered

Maroon and Orange Armies (3)

• Possible solution: require Maroon Army #2 to send another messenger to acknowledge that the first messenger arrived with the message
  – Acknowledgment messenger may be delayed or captured
  – Maroon Army #2 would think that the attack is on, but Maroon Army #1 cannot know if it is on or not
• There is no possible solution to the problem with probability 1 of success

Maroon and Orange Armies (4)

• The attack can be synchronized with high probability
  – For example, send many messengers to increase the likelihood of one reaching Maroon Army #2?
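Sending several messengers helps because, if each one is captured independently with probability $p$ (an assumption; real captures may well be correlated), the chance that at least one of $k$ messengers gets through is $1 - p^k$. The sketch computes this, along with the number of messengers needed to reach a target probability. The probability approaches 1 but never reaches it, which matches the previous slide.

```python
import math


def p_delivered(p_capture: float, k: int) -> float:
    """Probability that at least one of k independent messengers arrives."""
    return 1.0 - p_capture ** k


def messengers_needed(p_capture: float, target: float) -> int:
    """Smallest k with 1 - p_capture**k >= target (for 0 < p_capture < 1)."""
    return math.ceil(math.log(1.0 - target) / math.log(p_capture))


print(p_delivered(0.5, 1))            # 0.5
print(p_delivered(0.5, 10))           # 0.9990234375
print(messengers_needed(0.5, 0.999))  # 10
```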

Performance

• Protocols and services define functionality, but not performance
  – Bandwidth, throughput, data rate, capacity, …
  – Latency, delay, …
  – Variability in latency and data rate important for some applications
  – Loss is sometimes a performance measure
• Performance is determined by
  – Underlying technologies
  – Protocol design
  – Protocol implementation
  – Use by the application or upper layer

Bandwidth

• Bandwidth is commonly used to indicate the amount of data that can be transferred in some unit of time
• Example: 10 megabits per second
  – $10^7$ bits per second
  – $10^{-7}$ seconds per bit (100 ns) -- the “bit width”
• Link versus end-to-end bandwidth may vary

[Figure: bits 1 0 1 on the line, each $10^{-7}$ s = 100 ns wide]
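The bit-width arithmetic generalizes directly: bit time is the reciprocal of bandwidth, and vice versa. A short sketch (the function names are illustrative):

```python
def bit_time(bandwidth_bps: float) -> float:
    """Seconds occupied by one bit on a link of the given bandwidth."""
    return 1.0 / bandwidth_bps


def bandwidth_from_bit_time(seconds_per_bit: float) -> float:
    """Bandwidth in bits/sec, given the time needed to transmit one bit."""
    return 1.0 / seconds_per_bit


print(bit_time(10e6) * 1e9)             # about 100 (ns per bit at 10 Mbps)
print(bandwidth_from_bit_time(100e-9))  # about 1e7 (bits/sec)
```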

Latency (1)

• Latency is delay, i.e. the time it takes for a message to get from one point to another
• Round-trip time (RTT) is the time it takes to get to one point and receive a return back
• End-to-end versus link delay
• Components
  – Processing overhead -- e.g., software overhead
  – Transmission time -- depends on bandwidth and length of message
  – Propagation delay -- time for a bit to travel from one end of a link to another
  – Queueing delay -- time waiting for a shared link

Latency (2)

• Example
  – Processing overhead -- assume 1 µs
  – Transmission time
    • Assume $L$ = 1,000 bit message
    • Assume $C$ = 10 Mbps link
    • Transmission time: $T = L/C$ = 100 µs
  – Propagation delay
    • Speed of light is $c = 2\times10^8$ m/s in optical fiber
    • Assume $D$ = 1 km (1000 m)
    • Propagation delay = $D/c$ = 5 µs
  – Queueing delay -- assume 0
  – Latency is 1 + 100 + 5 = 106 µs (transmission time dominates in this example)
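The worked example can be reproduced with a small calculator that sums the four components. The parameter names are assumptions, and queueing delay is taken as an input rather than modeled.

```python
def latency(
    msg_bits: float,
    bandwidth_bps: float,
    distance_m: float,
    signal_speed_mps: float = 2e8,  # speed of light in optical fiber
    processing_s: float = 0.0,
    queueing_s: float = 0.0,
) -> float:
    """One-way latency = processing + transmission + propagation + queueing."""
    transmission = msg_bits / bandwidth_bps
    propagation = distance_m / signal_speed_mps
    return processing_s + transmission + propagation + queueing_s


# Example from the slide: 1,000-bit message, 10 Mbps, 1 km of fiber, 1 us overhead.
t = latency(1_000, 10e6, 1_000, processing_s=1e-6)
print(f"{t * 1e6:.0f} us")  # 106 us
```

Changing one argument at a time (a 10 Gbps link, a 1,000 km link, a 1 MB message) shows how each component can come to dominate.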

Latency (3)

• Dominating factors
  – Processing overhead can dominate for high data rate links over short distances with short messages
  – Transmission time can dominate for slower links or longer messages
  – Propagation delay is important with long links
  – Queuing delay can dominate in a congested network
• The delay×bandwidth product is an important factor in protocol design
  – Determines the “size of the pipe”
  – Made large by
    • High delay, e.g. long propagation time
    • High bandwidth, e.g. a fast link
  – Large product means that a large amount of data must be sent to “fill the pipe” before the receiver can respond

[Figure: Delay × Bandwidth Product; the pipe has length D (delay) and width B (bandwidth)]
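The delay × bandwidth product is the number of bits "in flight" in the pipe. The sketch applies it to the 1 km, 10 Mbps fiber link from the latency example, then to a hypothetical long-haul link (50 ms one-way delay at 45 Mbps, roughly T3 speed) to show how quickly the product grows. Doubling the one-way figure gives roughly how much a sender can transmit during one round-trip time before any reply can arrive.

```python
def delay_bandwidth_bits(delay_s: float, bandwidth_bps: float) -> float:
    """Bits that fit in the pipe: one-way delay times bandwidth."""
    return delay_s * bandwidth_bps


# 1 km of fiber (5 us propagation) at 10 Mbps, as in the latency example.
print(delay_bandwidth_bits(5e-6, 10e6))  # about 50 bits

# Hypothetical long-haul link: 50 ms one-way delay at 45 Mbps.
bits = delay_bandwidth_bits(50e-3, 45e6)
print(bits / 8 / 1e3, "kB in flight one way")   # about 281 kB
print(2 * bits / 8 / 1e3, "kB per round trip")  # about 562 kB
```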

You should now be able to … (1)

• Define protocol, service access point, protocol data unit, service data unit
• Describe the structure and role of layers in a network architecture
• Cite advantages and disadvantages of a layered model for a network architecture
• List the seven layers in the OSI reference model and describe the basic functions of each layer
• Describe the three different perspectives on network design

You should now be able to … (2)

• Define the basic components of a network including links, nodes, and switches
• Describe the construction of an internet (with a lowercase i)
• Distinguish between message, packet, and cell switching
• Distinguish between store-and-forward and circuit switching and cite advantages and disadvantages of each
• Define DAN, LAN, MAN, and WAN and describe their general characteristics

You should now be able to … (3)

• Describe how STDM, FDM, and statistical multiplexing enable resource sharing and cite advantages and disadvantages of STDM and FDM versus statistical multiplexing
• Define bandwidth and latency
• Calculate bandwidth given the time needed to transmit one bit
• Define the components of latency and describe factors that can increase latency
• Calculate latency given information about the components

Physical Layer

Links at the Physical Layer

• Links can be implemented using a variety of physical media
  – Magnetic Media (sneaker net)
  – Twisted pair
  – Coaxial cable
  – Optical fiber
  – Radio waves
  – Infrared
• Different media, together with end electronics and optics, determine the relevant properties of the media

Physical Layer Properties

• Bit encoding -- how is information -- the 1’s and 0’s -- encoded?
• Full-duplex versus half-duplex operation
  – Full-duplex: data in both directions simultaneously
  – Half-duplex: data in one direction at a time
• Data rate -- how much information can be sent in a unit of time?
• Extent -- how long can the link be and still operate reliably?

Examples of Local Links

Service                    Bandwidth        Distance
Category 5 twisted pair    10-1000 Mbps     100 m
50-ohm coax (Thinwire)     10-100 Mbps      200 m
75-ohm coax (Thickwire)    10-100 Mbps      500 m
Multimode fiber            100 Mbps         2 km
Single-mode fiber          100-2400 Mbps    40 km

Examples of Leased Links

Service             Bandwidth
ISDN (B-channel)    64 Kbps
T1 (DS1)            1.544 Mbps
T3 (DS3)            44.736 Mbps
STS-3 (OC-3)        155.520 Mbps
STS-12 (OC-12)      622.080 Mbps
STS-24 (OC-24)      1.244160 Gbps
STS-48 (OC-48)      2.488320 Gbps

Encoding

• Encoding determines how information is represented by electrical, optical, or electromagnetic signals
• Examples
  – Non-return to zero (NRZ)
  – Non-return to zero inverted (NRZI)
  – Manchester
  – Block codes, e.g. 4B/5B

[Diagram: two node adaptors; bits are encoded into a signal on the link between them]
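The first three encodings can be sketched as functions that map a bit string to signal levels (1 = high, 0 = low). Where conventions vary, this sketch makes assumptions. NRZI here toggles the level on a 1 and holds it on a 0, starting from low. Manchester XORs each bit with a clock that is low in the first half of the bit and high in the second, so a 0 becomes low-then-high and a 1 becomes high-then-low; some texts and standards use the opposite polarity.

```python
def nrz(bits: str) -> list:
    """NRZ: 1 -> high, 0 -> low, one level per bit."""
    return [int(b) for b in bits]


def nrzi(bits: str, initial: int = 0) -> list:
    """NRZI: a 1 toggles the current level, a 0 keeps it."""
    level, out = initial, []
    for b in bits:
        if b == "1":
            level ^= 1
        out.append(level)
    return out


def manchester(bits: str) -> list:
    """Manchester: each bit XORed with a 0-then-1 clock (two half-bit levels)."""
    out = []
    for b in bits:
        d = int(b)
        out += [d ^ 0, d ^ 1]  # first half-bit, second half-bit
    return out


data = "0010110"
print(nrz(data))         # [0, 0, 1, 0, 1, 1, 0]
print(nrzi(data))        # [0, 0, 1, 1, 0, 1, 1]
print(manchester(data))  # [0, 1, 0, 1, 1, 0, 0, 1, 1, 0, 1, 0, 0, 1]
```

The trade-off shows up directly in the output. A long run of 0's gives NRZ and NRZI no transitions at all. Manchester has a transition in the middle of every bit, but it needs two signal elements per bit.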

Physical Layer “Bit Pipe”

• The Physical layer defines signal levels and timing so that it can deliver a stream of bits to the Data Link layer
• Signal bandwidth determines data rate limit
• Timing errors
  – Noise or distortion can lead to errors in timing
  – Sender and receiver clocks may differ -- “drift”

Asynchronous Transmission (1)

• Each transmission is synchronized
  – A start bit begins a transmission
  – A stop bit ends a transmission
  – Line stays in an idle state until the next start bit
• Samples timed from beginning of start bit
• Timing errors can accumulate up to ± one-half bit time over the entire character

[Diagram: start bit, data bits 1 1 0 0 1 0 1 0, stop bit, idle, next start bit; timing marks at T/2 and T]

Asynchronous Transmission (2)

• The Physical layer can provide characters (n-bit units) to the Data Link layer

[Diagram: n-bit characters separated by idle periods]

Asynchronous Transmission (3)

• Advantages
  – Simple timing mechanism
  – Inherent character framing
  – Adapts to different data rates (idle serves as fill)
• Disadvantages
  – Timing errors can occur if the line is noisy (e.g., missed start bit)
  – Overhead for stop and start bits
• Used in applications where performance can be reduced to reduce costs
  – Modems
  – PC serial ports
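Asynchronous character framing can be sketched on already-sampled bits, assuming the common serial-port conventions: the idle line is held at 1, each character starts with a 0 start bit, the 8 data bits are sent least-significant bit first, and one 1 stop bit follows. Mid-bit sampling and clock drift are not modeled. The sketch also quantifies the start/stop overhead: 2 of every 10 line bits.

```python
def frame_char(byte: int) -> list:
    """Start bit (0), 8 data bits LSB first, stop bit (1)."""
    data = [(byte >> i) & 1 for i in range(8)]
    return [0] + data + [1]


def deframe(line: list) -> list:
    """Recover bytes from a bit sequence that idles at 1 between characters."""
    out, i = [], 0
    while i < len(line):
        if line[i] == 1:  # idle: wait for a start bit
            i += 1
            continue
        char_bits = line[i + 1:i + 9]
        stop = line[i + 9] if i + 9 < len(line) else None
        if len(char_bits) < 8 or stop != 1:
            raise ValueError(f"framing error at bit {i}")
        out.append(sum(bit << k for k, bit in enumerate(char_bits)))
        i += 10
    return out


line = [1, 1] + frame_char(ord("H")) + [1, 1, 1] + frame_char(ord("i"))
print(bytes(deframe(line)))  # b'Hi'
print(f"overhead: {2 / 10:.0%} of line bits")  # overhead: 20% of line bits
```

Because the idle periods between characters can be any length, the receiver resynchronizes on every start bit. This is the "idle serves as fill" advantage listed above.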

Synchronous Transmission (1)

• Used for high data rates, including T1 and other interoffice digital transmission lines
• Information is sent continuously
  – Receiver and repeater maintain synchronization between the incoming signaling rate and the local sample clock
  – Idle or fill characters inserted if line is idle
• Signal transitions (high-to-low or positive-to-negative) enable clock recovery, or synchronization
  – A minimum density of signal transitions is needed to maintain synchronization

Synchronous Transmission (2)

• Methods to ensure signal transitions
  – Source code restrictions
  – Dedicated timing bits
  – Bit insertion
  – Data scrambling
  – Forced bit errors
  – Line coding -- include transitions in the signals

Synchronous Transmission (3)

• The Physical layer can provide bits to the Data Link layer
• Data Link layer must extract frame boundaries
  – SYN = 0001 0110 (16H)
  – STX = 0000 0010 (02H)

[Diagram: frame layout SYN SYN STX header packet ETX CRC SYN; the bit stream begins 00010110 00010110 00000010 ...]

Ensuring Signal Transitions (1)

• Source code restrictions
  – Disallow any codes that do not include a transition
  – Cannot provide a “clear channel”
• Dedicated timing bits
  – Use some bit transmissions just for timing
  – Example: Dataphone Digital Service
• Use one bit out of every eight to guarantee a signal transition
  – Example: Synchronous modems
• Periodically insert a SYNC character
  – Ethernet uses a timing sequence “preamble”. This needs an accurate local clock that stays “drift-free” for the duration of a single packet transmission

Ensuring Signal Transitions (2)

• Bit insertion
  – Use bit transmissions for timing only when needed by inserting timing bits into the data bit stream
  – HDLC “bit stuffing”
    • 01111110 indicates the end of a data block (for framing)
    • Six consecutive 1’s must not be sent as data (may be mistaken as the end of a data block)
    • Sender inserts a 0 after every string of five consecutive 1’s; receiver must strip a 0 after five consecutive 1’s
  – Problems with extra bits and extra delay
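The stuffing rule translates almost directly into code. For readability, this sketch works on strings of '0'/'1' characters and handles only the data between flags. A real HDLC receiver would also examine the bit that follows five 1's, so that it can recognize flags and abort sequences.

```python
FLAG = "01111110"


def stuff(bits: str) -> str:
    """Sender: insert a 0 after every run of five consecutive 1's."""
    out, run = [], 0
    for b in bits:
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            out.append("0")  # stuffed bit breaks up the run
            run = 0
    return "".join(out)


def unstuff(bits: str) -> str:
    """Receiver: drop the 0 that follows every run of five consecutive 1's."""
    out, run, i = [], 0, 0
    while i < len(bits):
        b = bits[i]
        out.append(b)
        run = run + 1 if b == "1" else 0
        if run == 5:
            i += 1  # skip the stuffed 0
            run = 0
        i += 1
    return "".join(out)


data = "0111111111100"
stuffed = stuff(data)
print(stuffed)                  # 011111011111000
print(FLAG + stuffed + FLAG)    # the flags cannot be confused with the data
print(unstuff(stuffed) == data)  # True
```

The two extra bits in this example show the overhead noted on the slide: a stuffed frame can be slightly longer than its data, and the exact length depends on the content.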

Ensuring Signal Transitions (3)

• Data scrambling
  – Similar to encryption/decryption
  – Prevents the transmission of repetitive patterns
  – With high probability, prevents long strings of 0’s (or 1’s) that would not have signal transitions
• Forced bit errors
  – Errors are introduced in code word to prevent long periods without a signal transition
• Line coding
  – Some line coding methods ensure regular transitions that can be used for clock recovery
  – Example: 4B/5B code used in FDDI
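The 4B/5B code maps each 4-bit data nibble to a 5-bit code word. The code words are chosen so that none has more than one leading 0 or more than two trailing 0's, so any sequence of code words contains at most three consecutive 0's. Combined with NRZI, where every 1 produces a transition, this bounds the time between transitions. The table below is the standard data-symbol table; control symbols are omitted. Sending the high nibble first is an illustrative choice.

```python
# 4B/5B data symbols: 4-bit nibble -> 5-bit code word.
FOUR_B_FIVE_B = {
    0x0: "11110", 0x1: "01001", 0x2: "10100", 0x3: "10101",
    0x4: "01010", 0x5: "01011", 0x6: "01110", 0x7: "01111",
    0x8: "10010", 0x9: "10011", 0xA: "10110", 0xB: "10111",
    0xC: "11010", 0xD: "11011", 0xE: "11100", 0xF: "11101",
}
DECODE = {code: nibble for nibble, code in FOUR_B_FIVE_B.items()}


def encode_4b5b(data: bytes) -> str:
    """Encode each byte as two code words (high nibble first, by assumption)."""
    return "".join(
        FOUR_B_FIVE_B[byte >> 4] + FOUR_B_FIVE_B[byte & 0xF] for byte in data
    )


def decode_4b5b(bits: str) -> bytes:
    """Map 5-bit code words back to nibbles and pair them into bytes."""
    nibbles = [DECODE[bits[i:i + 5]] for i in range(0, len(bits), 5)]
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))


def longest_zero_run(bits: str) -> int:
    """Length of the longest run of 0's (no transitions under NRZI)."""
    return max(len(run) for run in bits.split("1"))


data = b"\x00\x00\x21"
raw_bits = "".join(f"{b:08b}" for b in data)
encoded = encode_4b5b(data)
print(longest_zero_run(raw_bits))  # 18
print(longest_zero_run(encoded))   # 3
```

The cost is bandwidth: 5 line bits carry 4 data bits, which is 80% efficiency, compared with 50% for Manchester.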

Physical Layer (Part 2)

Srinidhi Varadarajan

Fourier Series

• Even and odd functions:
  – Even functions mirror around the Y-axis, $f(-\theta) = f(\theta)$ (e.g., $\cos\theta$)
  – Odd functions are mirrored and inverted, $f(-\theta) = -f(\theta)$ (e.g., $\sin\theta$)
• A Fourier series expresses a time-domain function as a series of harmonics of sines and cosines.
  – The basic Fourier series expresses a periodic function.
  – Aperiodic functions involve a continuum of frequencies, which leads to the Fourier transform
• Why is it useful?
  – Hint: it tells you about the frequency content of a signal

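Fourier Series

Take a look at:
  – http://sunlightd.virtualave.net/Fourier/

For a periodic function $g(t)$ with period $T$ and fundamental frequency $f = 1/T$:

$$g(t) = \frac{A_0}{2} + \sum_{n=1}^{\infty} A_n \sin(2\pi n f t) + \sum_{n=1}^{\infty} B_n \cos(2\pi n f t)$$

$$A_n = \frac{2}{T}\int_0^T g(t)\sin(2\pi n f t)\,dt \qquad B_n = \frac{2}{T}\int_0^T g(t)\cos(2\pi n f t)\,dt \qquad A_0 = \frac{2}{T}\int_0^T g(t)\,dt$$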
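To see what the coefficients $A_n$ and $B_n$ say about frequency content, the sketch below evaluates them for a periodic bit pattern. Each bit is treated as a rectangular pulse of height 1 (for a 1) or 0 (for a 0) and duration $T/N$, with $T = 1$. For such a pulse train the integrals have a closed form, and the sketch cross-checks it against a direct numerical (midpoint-rule) evaluation of the same integrals. The bit pattern 01100010 is an arbitrary example.

```python
import math


def fourier_coeffs(bits: str, n: int) -> tuple:
    """Closed-form (A_n, B_n), n >= 1, for the periodic pulse train given by bits."""
    N = len(bits)
    a = b = 0.0
    for k, bit in enumerate(bits):
        if bit == "1":  # pulse on [k/N, (k+1)/N] of the period
            t0 = 2 * math.pi * n * k / N
            t1 = 2 * math.pi * n * (k + 1) / N
            a += math.cos(t0) - math.cos(t1)
            b += math.sin(t1) - math.sin(t0)
    return a / (math.pi * n), b / (math.pi * n)


def fourier_coeffs_numeric(bits: str, n: int, steps: int = 80_000) -> tuple:
    """Midpoint-rule evaluation of (2/T) * integral of g(t) sin/cos(2 pi n f t), T = 1."""
    N = len(bits)
    a = b = 0.0
    for i in range(steps):
        t = (i + 0.5) / steps
        g = 1.0 if bits[int(t * N)] == "1" else 0.0
        a += g * math.sin(2 * math.pi * n * t)
        b += g * math.cos(2 * math.pi * n * t)
    return 2 * a / steps, 2 * b / steps


bits = "01100010"
A0 = 2 * bits.count("1") / len(bits)  # (2/T) * integral of g(t) over one period
print(A0)  # 0.75
for n in range(1, 5):
    A_n, B_n = fourier_coeffs(bits, n)
    print(n, round(math.hypot(A_n, B_n), 4))  # amplitude of the n-th harmonic
```

A channel that passes only the first few harmonics delivers only an approximation of the original pulse train. This is why bandwidth limits the usable data rate.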

Nyquist’s Sampling Theorem

• The bit and the baud
  – Using multiple signaling levels: baud × log2(signaling levels) = bits/sec
• If a signal is passed through an arbitrary low-pass filter of bandwidth $B$, it can be reconstructed by a sampler collecting $2B$ samples/sec
  – Reasoning: frequency components higher than $B$ have already been filtered out, and $2B$ samples/sec suffice to capture everything up to $B$.
• Max data rate = $2B \log_2 V$ bits/sec,
  – where $V$ is the number of discrete levels in the signal.
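A quick application of the Nyquist limit and the baud relation, using a hypothetical 3 kHz noiseless channel. With binary signaling the channel carries at most 6,000 bits/sec, and every doubling of the number of levels $V$ adds another $2B$ bits/sec.

```python
import math


def nyquist_max_rate(bandwidth_hz: float, levels: int) -> float:
    """Noiseless channel: max data rate = 2B log2 V bits/sec."""
    return 2 * bandwidth_hz * math.log2(levels)


def bit_rate(baud: float, levels: int) -> float:
    """bits/sec = baud * log2(signaling levels)."""
    return baud * math.log2(levels)


for V in (2, 4, 16):
    print(V, nyquist_max_rate(3000, V))  # prints 2 6000.0, 4 12000.0, 16 24000.0

print(bit_rate(2400, 16))  # 9600.0
```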

Thermal Noise: Shannon’s Theorem

• Nyquist’s theorem applies to noiseless channels.
– Real-world communication channels are noisy; at the minimum, there is thermal noise.
• The signal-to-noise ratio S/N is usually expressed in dB as 10 log10(S/N).
– Recall the −3 dB cutoff point.
• Shannon’s Theorem:
– Max data rate = B log2(1 + S/N) bits/sec, where S/N is the plain power ratio (not in dB)
– This is an upper bound, independent of the number of signaling levels and the sampling rate.
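Here is a worked example. The 3000 Hz bandwidth matches the telephone-line figure used for modems later in these notes, and the 30 dB signal-to-noise ratio is an assumed value for illustration.

$$10\log_{10}\frac{S}{N} = 30 \;\Rightarrow\; \frac{S}{N} = 10^{30/10} = 1000$$

$$\text{Max data rate} = B\log_2\!\left(1 + \frac{S}{N}\right) = 3000 \log_2(1001) \approx 3000 \times 9.967 \approx 29{,}900 \text{ bits/sec}$$

However many signaling levels are used, such a line cannot carry much more than about 30 kbps.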

[Figure: module at Node A and module at Node B, connected by a “link”]

Point-to-Point Protocols and Links (1)

• Point-to-point protocols involve exactly two peer entities or modules that are connected by some “link”

• The modules must interact to ensure the proper transfer of information using the link

Point-to-Point Protocols and Links (2)

• For example, the link may be:
– A physical link (e.g., RS-232 is a point-to-point protocol)
– A virtual bit pipe (e.g., at the data link layer)
– A connection or virtual connection (e.g., at the transport or session layer)

Data Link Control -- DLC (1)

• For each point-to-point link in a network there are two data link control (DLC) peer modules, one at each end

• DLC modules use a distributed algorithm to transfer packets
– Packets are received from, and delivered to, the network layer

• The usual objective is to deliver packets in the order they arrive from the network layer, without errors or repeated packets

[Figure: two protocol stacks (Network / DLC / Physical), one at each end of the link]

Data Link Control -- DLC (2)

• DLC modules must use the unreliable “virtual bit pipe” provided by the physical layer

Data Link Control -- DLC (3)

• DLC must:
– Detect errors (using redundancy bits)
– Request retransmission if data is lost (using automatic repeat request -- ARQ)
– Perform framing (detect packet start and end)
– Support initialization and disconnection operations

[Figure: a frame consists of a header, the service data unit, and a trailer]

Data Link Control -- DLC (4)

• These functions require that extra bits be added to the packet to be transmitted
– Header bits are added to the front of each packet
– Trailer bits are added to the rear of each packet
– The header, the packet from the upper layer (service data unit), and the trailer form a frame

[Figure: a frame consists of a header, the service data unit, and a trailer]

Frame Format
• The packet from the upper layer is the service data unit (SDU)
• The frame (header, network layer packet, and trailer) is the protocol data unit (PDU)
• Note that the DLC does not care what is in the network layer packet, and the physical layer does not care what is in the frame generated by the data link layer

Data Link Layer

Srinidhi Varadarajan

Data Link Layer: Functionality

• The data link layer must:
– Detect errors (using redundancy bits)
– Request retransmission if data is lost (using automatic repeat request -- ARQ)
– Perform framing (detect packet start and end)
– Support initialization and disconnection operations

[Figure: a frame consists of a header, the service data unit, and a trailer]

Data Link Layer

• These functions require that extra bits be added to the packet to be transmitted
– Header bits are added to the front of each packet
– Trailer bits are added to the rear of each packet
– The header, the packet from the upper layer (service data unit), and the trailer form a frame

[Figure: a frame consists of a header, the service data unit, and a trailer]

Frame Format
• The packet from the upper layer is the service data unit (SDU)
• The frame (header, network layer packet, and trailer) is the protocol data unit (PDU)
• Note that the data link layer does not care what is in the network layer packet, and the physical layer does not care what is in the frame generated by the data link layer

Error Detection

• Two types
– Error detection codes (e.g., CRC, parity, checksums)
– Error correction codes (e.g., Hamming, Reed-Solomon)
• Basic idea
– Without redundancy, all bit combinations in a packet are valid, so an error simply produces another valid-looking packet
– Add redundant information to determine whether errors have been introduced
• Why redundant?

Error Detection Codes

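• Naïve scheme
– Send a duplicate copy of the message
• Problems
– Takes up too much space (doubles the message size)
– Poor error-detection performance
• Can’t even detect all 2-bit errors: if the same bit is flipped in both copies, the copies still match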

Single Parity Checks
• Technically used for 1-bit error detection; can also detect any odd number of bit errors
• Involves adding an extra “parity” bit to the bit string
• Two varieties:
– Even parity
– Odd parity
• Basic idea:
– For even parity, choose the parity bit so that the total number of 1’s in the bit string (including the parity bit) is even. Odd parity makes the number of 1’s odd instead.
• Single parity cannot reliably detect burst errors
– Burst errors corrupt a substring of arbitrary length
– A burst error is as likely to cause an even number of errors as an odd number, and an even number of flipped bits leaves the parity unchanged
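Below is a minimal C sketch of computing the parity bit. The function names are hypothetical. A 7-bit word is used to match the two-dimensional example on the next slide, where 0x5E is the row 1011110.

```c
#include <assert.h>
#include <stdint.h>

/* Even-parity bit for the low `nbits` bits of `word`: returns 1 if the
 * data holds an odd number of 1's, so that data + parity bit together
 * always hold an even number of 1's. */
int even_parity_bit(uint32_t word, int nbits)
{
    assert(nbits > 0 && nbits <= 32);
    int ones = 0;
    for (int i = 0; i < nbits; i++)
        ones += (word >> i) & 1u;
    return ones & 1;
}

/* Odd parity is simply the complement of even parity. */
int odd_parity_bit(uint32_t word, int nbits)
{
    return !even_parity_bit(word, nbits);
}
```

Flipping one bit of the word changes its parity bit. Flipping two bits leaves it unchanged, which is exactly the weakness against bursts noted above.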

Two Dimensional Parity

• Each byte is protected by a parity bit
• The entire frame is protected by a parity byte

Data       Parity bits (even)
1011110    1
1101001    0
0101001    1
1011111    0
0110100    1
0001110    1
1111011    0    (parity byte)
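The table can be reproduced with a short C sketch. It assumes 7 data bits per row and even parity, which the table follows; `parity_2d` is a hypothetical name. XOR-ing all the rows together yields the column-parity byte in one step.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Two-dimensional even parity over rows of 7 data bits.
 * row_parity[i] receives the parity bit of rows[i]; *corner receives the
 * parity of the parity-bit column; the return value is the 7-bit
 * column-parity byte. */
uint8_t parity_2d(const uint8_t *rows, size_t nrows,
                  uint8_t *row_parity, uint8_t *corner)
{
    assert(row_parity != NULL && corner != NULL);
    uint8_t column = 0, all = 0;
    for (size_t i = 0; i < nrows; i++) {
        uint8_t r = rows[i] & 0x7F, p = 0;
        for (int b = 0; b < 7; b++)
            p ^= (r >> b) & 1;    /* XOR of the bits = even-parity bit */
        row_parity[i] = p;
        column ^= r;              /* XOR of the rows = column parities */
        all ^= p;
    }
    *corner = all;
    return column;
}
```

Called on the six data rows (0x5E, 0x69, 0x29, 0x5F, 0x34, 0x0E), it returns the row parity bits 1 0 1 0 1 1, the parity byte 0x7B (1111011), and a corner bit of 0. All of these match the table.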

Two-Dimensional Parity Checks

• Arrange a string of bits as a two-dimensional array and compute parity over each row and each column of the array
• Can detect
– Any number of errors in a single row (an even number of errors in that row is caught by the column parity)
– Any number of errors in a single column (an even number of errors in that column is caught by the row parity)
– Does it protect against everything?
• Answer is no. Why? Hint: read between the lines. Single row or column
• What about burst errors?
– Need something stronger: CRC codes

CRC Codes

• Burst errors are hard to model; three parameters are typically used to measure the effectiveness of a code for error detection:
1. Minimum distance of the code
2. Burst-detecting capability
3. Probability that a random string is accepted as being error-free

Cyclic Redundancy Check
• Treat the (n+1)-bit message as a polynomial of degree n. The bits are the coefficients of the polynomial.
– 1101 = 1·x^3 + 1·x^2 + 0·x^1 + 1·x^0
• Calculating the CRC
– Sender and receiver agree on a divisor polynomial of degree k, e.g. x^3 + x^2 + 1. Call this C(x).
– Add k bits to the (n+1)-bit message such that the (n+1+k)-bit message is exactly divisible by the divisor
• The choice of divisor is very important.
– It determines the kinds of errors that the CRC can guard against.

CRC Computation
• Given:
– Message: M(x)
– Divisor: C(x), of degree k
• Multiply M(x) by x^k, i.e. append k zeroes to the end of the message. Call this T(x).
• Divide T(x) by C(x).
• Subtract the remainder from T(x).
• The result is the message including the CRC.

CRC Computation

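• C(x) = x^3 + x^2 + 1 (generator 1101)
• M(x) = x^7 + x^4 + x^3 + x (message 10011010)
• Subtraction: logical XOR operation

Message with k = 3 zeroes appended: 10011010000
Long division by 1101 (each step XORs the generator into the leading bits):
  1001 XOR 1101 = 0100
  1001 XOR 1101 = 0100
  1000 XOR 1101 = 0101
  1011 XOR 1101 = 0110
  1100 XOR 1101 = 0001
  1000 XOR 1101 = 0101   (after bringing down the last three 0’s)
Quotient: 11111001    Remainder: 101
Transmitted message + CRC: 10011010101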
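The division can be sketched in C for short messages that fit in a 64-bit integer. This illustrates mod-2 long division and is not an optimized table-driven CRC. The function names and the 63-bit limit are assumptions of the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Remainder of M(x) * x^k divided by the generator, by mod-2 long
 * division. msg has msg_bits bits; gen has degree k (k+1 bits, top bit
 * set). Assumes msg_bits + k <= 63 so everything fits in a uint64_t. */
uint64_t crc_remainder(uint64_t msg, int msg_bits, uint64_t gen, int k)
{
    assert(k > 0 && msg_bits > 0 && msg_bits + k <= 63);
    uint64_t t = msg << k;                 /* T(x) = M(x) * x^k */
    for (int i = msg_bits + k - 1; i >= k; i--) {
        if (t & (1ULL << i))               /* leading bit set: XOR in gen */
            t ^= gen << (i - k);
    }
    return t;                              /* only the low k bits remain */
}

/* Receiver check: divide the received message + CRC by gen; a zero
 * remainder means no error was detected. */
int crc_ok(uint64_t frame, int frame_bits, uint64_t gen, int k)
{
    assert(k > 0 && frame_bits > k && frame_bits <= 63);
    for (int i = frame_bits - 1; i >= k; i--) {
        if (frame & (1ULL << i))
            frame ^= gen << (i - k);
    }
    return frame == 0;
}
```

`crc_remainder(0x9A, 8, 0xD, 3)` returns 5 (binary 101), matching the remainder above. `crc_ok(0x4D5, 11, 0xD, 3)` accepts the transmitted 10011010101. Flipping any single bit of 0x4D5, as the exercise on the next slide suggests, makes `crc_ok` return 0.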

CRC Codes
• Note: the CRC is computed over the entire message, not over a byte or a row/column.
• When a message + CRC arrives at the destination, divide it by the generator polynomial. If the remainder is 0, no error has been detected; otherwise the message has been corrupted.
– An error pattern E(x) escapes detection only if E(x) is itself divisible by the generator. With a well-chosen generator, this rules out common patterns such as single-bit errors and bursts no longer than the degree of the generator. That’s why CRCs are strong.
– Try out an example: corrupt some bits of the message + CRC in the previous slide and check the remainder.

Real World Example: Internet Checksum

• What’s a checksum?
– Take a guess: check sum!
– Another error detection scheme
• Treat the message as a sequence of 16-bit integers
• Add these integers together using 16-bit one’s-complement arithmetic
• Take the one’s complement of the result
• The resulting 16-bit number is the checksum

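Example: Internet Checksum

    #include <stdint.h>

    /* Internet checksum over `count` 16-bit words starting at buf. */
    uint16_t cksum(const uint16_t *buf, int count)
    {
        uint32_t sum = 0;

        while (count--) {
            sum += *buf++;
            if (sum & 0xFFFF0000) {
                /* carry out of the low 16 bits, so wrap around */
                sum &= 0xFFFF;
                sum++;
            }
        }
        return (uint16_t)~(sum & 0xFFFF);
    }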

Data Link Layer: Reliable Communication

Data link layer: Services

• Choices
– Unacknowledged connectionless service
– Acknowledged connectionless service
– Acknowledged connection-oriented service

Framing

• Messages (datagrams, packets, …) are broken into frames at the link layer.
– Why?
• Frames are independently transmitted.
• Frame boundaries need to be preserved, e.g. by using:
– A frame length field
– Start and stop characters or bit patterns
– Invalid signals (coding violations)
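To illustrate the first option, a frame length field, here is a C sketch with a hypothetical 2-byte big-endian length header. Real link layers define their own header formats, and the function names are made up for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write one frame: 16-bit big-endian payload length, then the payload.
 * Returns the number of bytes written. */
size_t frame_put(uint8_t *out, const uint8_t *payload, uint16_t len)
{
    out[0] = (uint8_t)(len >> 8);
    out[1] = (uint8_t)(len & 0xFF);
    memcpy(out + 2, payload, len);
    return (size_t)len + 2;
}

/* Parse one frame from buf[0..avail-1]. On success, sets *payload and
 * *len and returns the frame's total size; returns 0 if the buffer does
 * not yet hold a complete frame. */
size_t frame_get(const uint8_t *buf, size_t avail,
                 const uint8_t **payload, uint16_t *len)
{
    if (avail < 2)
        return 0;
    uint16_t n = (uint16_t)((buf[0] << 8) | buf[1]);
    if (avail < (size_t)n + 2)
        return 0;
    *payload = buf + 2;
    *len = n;
    return (size_t)n + 2;
}
```

A weakness of this scheme is that a single corrupted length field desynchronizes the receiver for the rest of the stream. That is one reason start/stop patterns are also used.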

Reliable Transmission

• Why?
– Frame corruption can be severe, and CRCs are not enough: recall that CRCs detect errors but don’t correct them
• Two fundamental mechanisms
– Acknowledgment
– Timeout
• The general idea is called ARQ (Automatic Repeat reQuest)

Stop-and-Wait ARQ (1)

• Stop-and-wait is the simplest ARQ (but not as simple as we might at first think)

• The sending DLC transmits a frame and then waits for a reply from the receiving DLC before sending the next frame.

• The receiving DLC replies with an acknowledgment (ack) if the frame is error-free; otherwise it:
– May reply with a negative acknowledgment (nak)
– May wait for the sender to time out

Stop-and-Wait ARQ (2)

[Figure: Node A sends frame a1 and waits; Node B returns an ack; only then does A send a2, which B also acks]

D is the delay for receiving a frame; round-trip time RTT ≈ 2D

Stop-and-Wait ARQ (3)
• Since errors can occur in the return direction (B to A), the acks and naks must also be protected by a CRC
• The transmitted frame may be lost or delayed, or a reply may become corrupted, lost or delayed, so the sender must time out and retransmit the last packet
• This leads to a problem: how does the receiver know whether it is receiving a duplicate or the next packet (e.g., in the case of a lost ack)?

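Stop-and-Wait ARQ (4)

[Figure: Node A sends a1; Node B’s ack is lost; A times out and sends a frame again]
How does Node B know if the second frame is a1 or a2?

[Figure: frames carry sequence numbers 0 and 1, and acks carry the next expected number, ack(1) and ack(2); when an ack(1) is lost and frame 0 is resent, Node B discards the duplicate]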

Stop-and-Wait ARQ (3)

• A solution to the “duplicate” packet problem is to use sequence numbers
– The sender places a sequence number (SeqNum) in the frame header
– The receiver acknowledges with the next frame expected (NFE) value
• The SeqNum and NFE values require extra bits in the frame
• For stop-and-wait ARQ, sequence numbers can be modulo 2, i.e. just {0,1} or {even,odd}, if link frames stay in order

Frame layout: SeqNum | NFE | packet | CRC

Stop-and-Wait ARQ (4)

Stop and Wait: Possible Scenarios

[Figure: four sender/receiver timelines, (a) to (d), showing frames, ACKs, and timeouts, including cases where a frame or an ACK is lost and the frame is retransmitted after the timeout]

Stop-and-Wait ARQ Algorithm (1)

• Algorithm at node A (sender) to send to node B:
1. SeqNum ← 0
2. Accept a new packet and assign SeqNum to it
3. Send packet SeqNum with SeqNum as sequence number
4. Set a timer for the recently transmitted packet
5. If an error-free ack arrives from B and NFE > SeqNum (with modulo-2 numbers: NFE ≠ SeqNum), then SeqNum ← NFE, delete the timer and go to step 2
6. If time-out, then go to step 3

Stop-and-Wait ARQ Algorithm (2)

• Algorithm at node B (receiver) to receive from node A:
1. NFE ← 0; repeat steps 2 and 3 forever
2. If an error-free frame is received and SeqNum = NFE, then pass the packet to the higher level and NFE ← NFE + 1 (modulo 2)
3. At some bounded time after receiving an error-free frame, send a request for NFE to A
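The two algorithms can be sketched in C with modulo-2 sequence numbers. The sketch makes three assumptions: frames and acks are reduced to their SeqNum and NFE fields, CRC checking has already discarded corrupted frames, and timers are left to the caller. With modulo-2 numbers, the sender’s test “NFE > SeqNum” becomes “NFE ≠ SeqNum”. The struct and function names are hypothetical.

```c
#include <assert.h>

struct saw_sender   { int seq; };   /* SeqNum of the outstanding packet */
struct saw_receiver { int nfe; };   /* next frame expected */

/* Receiver steps 2-3: deliver the frame only if it is the one expected,
 * then return the NFE to send back as the ack. *delivered is set to 1
 * for a new packet, 0 for a duplicate that was discarded. */
int saw_receive(struct saw_receiver *r, int seq, int *delivered)
{
    assert(seq == 0 || seq == 1);
    *delivered = (seq == r->nfe);
    if (*delivered)
        r->nfe = (r->nfe + 1) % 2;
    return r->nfe;
}

/* Sender step 5: an ack requesting a number other than the outstanding
 * SeqNum means the packet got through. Advance and return 1 (accept the
 * next packet); otherwise return 0 and keep waiting. */
int saw_ack(struct saw_sender *s, int nfe)
{
    if (nfe != s->seq) {
        s->seq = nfe;
        return 1;
    }
    return 0;
}
```

In the lost-ack case, the resent frame 0 reaches B while NFE = 1. B therefore discards it and sends the request for 1 again, which is how sequence numbers resolve the duplicate problem.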

Stop and Wait: Performance problems?
• No more than one packet in flight
– That’s usually bad; here’s why
• Take a 10 Mbps network with a 50 ms round trip time
• Delay × bandwidth = 10^7 × 0.050 = 500 Kbits
• In stop-and-wait, only one frame can be in flight. The max frame size is 1500 bytes
– Hence sending rate = 1500 × 8 ÷ 0.050 = 240 Kbps
– This is much less than the link capacity of 10 Mbps

Performance Problems

• Using the actual 10 Mbps Ethernet RTT of roughly 50 µs
• Delay × bandwidth = 10^7 × 50 µs = 500 bits
• In stop-and-wait, only one frame can be in flight. The max Ethernet frame size is 1500 bytes
– Hence sending rate = 1500 × 8 ÷ 50 µs = 240 Mbps
– This is much greater than the link capacity of 10 Mbps
• What happened?

Performance Analysis

• Putting in numbers for 10 Mbps Ethernet
– Packet size: 1518 bytes
– ACK size: 64 bytes
– Ignore propagation time
– Packet Tx time: 1518 × 8 bits ÷ 10 Mbps = 1.2144 ms
– Ack Tx time: 64 × 8 bits ÷ 10 Mbps = 51.2 µs
• Efficiency = packet Tx time ÷ (packet Tx time + ack Tx time) = 1.2144 ÷ 1.2656 = 95.95%
– More believable!
• Moral: if the frame size exceeds the delay × bandwidth product, the efficiency computation should be used instead.
– Why?
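The efficiency calculation can be written as a small C function. This sketch extends the slide’s formula with a round-trip propagation term (`rtt_prop`, an addition not in the slide) so that both earlier scenarios can be checked with one function. The function name is hypothetical.

```c
#include <assert.h>

/* Stop-and-wait link efficiency: the fraction of each cycle spent
 * sending data. One cycle = frame transmission + ack transmission +
 * round-trip propagation. Sizes in bits, rate in bits/sec, rtt_prop in
 * seconds (propagation only). */
double saw_efficiency(double frame_bits, double ack_bits,
                      double rate_bps, double rtt_prop)
{
    assert(rate_bps > 0);
    double t_frame = frame_bits / rate_bps;
    double t_ack = ack_bits / rate_bps;
    return t_frame / (t_frame + t_ack + rtt_prop);
}
```

With the slide’s Ethernet numbers and no propagation delay, it gives about 0.9595. For a 1500-byte frame on a 10 Mbps link with a 50 ms round trip (ack time ignored), it gives about 0.0234, i.e. roughly 234 kbps, close to the 240 kbps estimate earlier.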

Significance of Delay Bandwidth
• The delay × bandwidth product represents the amount of data that has left the transmitter and is still on the cable.
• Think of the cable as a pipe: sending this much data before the first ack returns keeps the pipe full.
• Delay × bandwidth also represents the upper bound on stability.
• More sophisticated ARQ algorithms try to match their sending rate to the dynamic delay × bandwidth product.
– Why is delay × bandwidth dynamic?

Sliding Window Protocols

• Keep the pipe full

• Send N packets before expecting the first ACK

Go-Back-n ARQ (1)
• Sliding window or go-back-n ARQ is used in many standard DLCs and transport protocols
• Go-back-n ARQ extends stop-and-wait ARQ
– The sender does not have to wait for an ack before sending the next packet
– The receiver accepts only packets in order and periodically sends an ack with request number NFE, where NFE acknowledges all packets with sequence numbers less than NFE and requests the packet with sequence number NFE

Go-Back-n ARQ (2)

• Parameter n (or SWS for send window size) determines how many packets may be outstanding before a request (ack) is received

• With SWS = n = 1, go-back-n becomes stop-and-wait ARQ

Go-Back-n ARQ Variables

• Sender variables
– LFS: Last Frame Sent
– LAR: Last Acknowledgment Received
– SWS: Send Window Size
• Receiver variables
– LFA: Last Frame Acceptable
– NFE: Next Frame Expected
– RWS: Receive Window Size

Go-Back-n ARQ Example (1)
• Example of go-back-4 ARQ (SWS = 4)
• Excessive delays and small n cause the sender to have to wait for acknowledgments

[Figure: timeline between Node A and Node B showing SeqNum, NFE, and the packets delivered, with the sender window advancing through [0,3], [1,4], [2,5], [3,6], [4,7], [6,9] as acks arrive]

Window is [LAR, LAR+SWS-1]

Go-Back-n ARQ Example (2)

• Example of error in forward direction in go-back-4 ARQ
– All packets sent since the errored frame was sent must be retransmitted
– The sender must save and be ready to transmit the last n (SWS) packets

Go-Back-n ARQ Example (3)

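[Figure: go-back-4 timeline between Node A and Node B with a frame in error; Node B keeps requesting NFE = 2 and discards the later frames, and after a time-out Node A goes back and retransmits from SeqNum 2; sender windows shown: [0,3], [1,4], [2,5], [3,6]]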

Go-Back-n ARQ Algorithm (1)
• Algorithm at node A (sender) to send to node B:
1. LAR ← 0, LFS ← −1
2. Do steps 3, 4, 5 in any order (with bounded delay)
3. If LFS + 1 < LAR + SWS and a packet is available, then accept the packet, assign it sequence number SeqNum ← LFS + 1, and set LFS ← SeqNum
4. If an error-free frame arrives from B with NFE > LAR, then LAR ← NFE

Go-Back-n ARQ Algorithm (2)

• Algorithm at node A (sender) to send to node B (continued):
5. If LAR ≤ LFS and no frame is being transmitted, then choose some number SeqNum, LAR ≤ SeqNum ≤ LFS, and transmit packet SeqNum with SeqNum as sequence number; packet LAR must be transmitted within a bounded delay if the value of LAR does not change

Go-Back-n ARQ Algorithm (3)
• Algorithm at node B (receiver) to receive from node A:
1. NFE ← 0; repeat steps 2 and 3 forever
2. If an error-free frame is received and SeqNum = NFE, then pass the packet to the higher level and NFE ← NFE + 1
3. At some bounded time after receiving an error-free frame, send a request for NFE to A
• Correctness of go-back-n ARQ can be proven; sequence numbers may be modulo m, m > n, as long as frames are delivered in order of transmission
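Here is a C sketch of the sender and receiver rules above. It uses unbounded sequence numbers for simplicity (no modulo m) and leaves timers and the channel to the caller. The sender keeps at most SWS unacknowledged packets, so sequence numbers stay within [LAR, LAR+SWS-1]. The struct and function names are hypothetical.

```c
#include <assert.h>

struct gbn_sender {
    int lar;   /* lowest unacknowledged SeqNum (LAR) */
    int lfs;   /* last frame sent (LFS) */
    int sws;   /* send window size */
};

/* Is there room in the window for a new packet? */
int gbn_can_send(const struct gbn_sender *s)
{
    return s->lfs + 1 < s->lar + s->sws;
}

/* Assign the next sequence number to a new packet. */
int gbn_send_new(struct gbn_sender *s)
{
    assert(gbn_can_send(s));
    return ++s->lfs;
}

/* A cumulative ack (request number NFE) slides the window. */
void gbn_on_ack(struct gbn_sender *s, int nfe)
{
    if (nfe > s->lar)
        s->lar = nfe;
}

/* Receiver: accept only the next in-order frame. Returns the NFE to send
 * back; *delivered says whether seq was passed to the higher level. */
int gbn_receive(int *nfe, int seq, int *delivered)
{
    *delivered = (seq == *nfe);
    if (*delivered)
        (*nfe)++;
    return *nfe;
}
```

For example, with SWS = 4 and frame 1 lost, B keeps answering NFE = 1 for frames 2 and 3. On time-out, A resends everything from LAR = 1 through LFS, and B then delivers 1 to 4 in order.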

Reliable Transmission: A State Machine Perspective

Srinidhi Varadarajan

Principles of Reliable Data Transfer
• Important in the application, transport, and link layers
• On the top-10 list of important networking topics!
• The characteristics of the unreliable channel determine the complexity of the reliable data transfer protocol (rdt)

Reliable data transfer: getting started

[Figure: send side and receive side of rdt, connected by an unreliable channel]
• rdt_send(): called from above (e.g., by the application); passed data to deliver to the receiver’s upper layer
• udt_send(): called by rdt to transfer a packet over the unreliable channel to the receiver
• rdt_rcv(): called when a packet arrives on the receive side of the channel
• deliver_data(): called by rdt to deliver data to the upper layer

Reliable data transfer: getting started

We’ll:
• incrementally develop the sender and receiver sides of a reliable data transfer protocol (rdt)
• consider only unidirectional data transfer
– but control info will flow in both directions!
• use finite state machines (FSMs) to specify the sender and receiver

[Figure: FSM notation; an arc from state1 to state2 is labeled with the event causing the state transition and the actions taken on the transition]
• state: when in this “state”, the next state is uniquely determined by the next event

Rdt1.0: reliable transfer over a reliable channel

• underlying channel perfectly reliable
– no bit errors
– no loss of packets
• separate FSMs for sender, receiver:
– sender sends data into underlying channel
– receiver reads data from underlying channel

Rdt2.0: channel with bit errors
• underlying channel may flip bits in packet
– Need error detection: CRC, parity, …
• the question: how to recover from errors:
– acknowledgements (ACKs): receiver explicitly tells sender that packet was received correctly
– negative acknowledgements (NAKs): receiver explicitly tells sender that packet had errors
– sender retransmits packet on receipt of NAK
– human scenarios using ACKs, NAKs?
• Telephone conversation: “OK”, “Could you repeat that, please?”
• new mechanisms in rdt2.0 (beyond rdt1.0):
– error detection
– receiver feedback: control msgs (ACK, NAK) rcvr->sender

rdt2.0: FSM specification

[Figure: sender FSM and receiver FSM]

rdt2.0: in action (no errors)

[Figure: sender FSM and receiver FSM]

rdt2.0: in action (error scenario)

[Figure: sender FSM and receiver FSM]

rdt2.0 has a fatal flaw!

What happens if ACK/NAK is corrupted?
• sender doesn’t know what happened at receiver!
• can’t just retransmit: possible duplicate

What to do?
• sender ACKs/NAKs receiver’s ACK/NAK? What if sender ACK/NAK lost?
• retransmit, but this might cause retransmission of correctly received pkt!

Handling duplicates:
• sender adds sequence number to each pkt
• sender retransmits current pkt if ACK/NAK garbled
• receiver discards (doesn’t deliver up) duplicate pkt

stop and wait: sender sends one packet, then waits for receiver response

rdt2.1: sender, handles garbled ACK/NAKs

rdt2.1: receiver, handles garbled ACK/NAKs

rdt2.1: discussion

Sender:
• seq # added to pkt
• two seq. #’s (0,1) will suffice. Why?
• must check if received ACK/NAK corrupted
• twice as many states
– state must “remember” whether “current” pkt has 0 or 1 seq. #

Receiver:
• must check if received packet is duplicate
– state indicates whether 0 or 1 is expected pkt seq #
• note: receiver cannot know if its last ACK/NAK was received OK at sender

rdt2.2: a NAK-free protocol

• same functionality as rdt2.1, using ACKs only
• instead of NAK, receiver sends ACK for last pkt received OK
– receiver must explicitly include seq # of pkt being ACKed
• duplicate ACK at sender results in same action as NAK: retransmit current pkt

[Figure: rdt2.2 sender FSM]

rdt3.0: channels with errors and loss

New assumption: underlying channel can also lose packets (data or ACKs)
– checksum, seq. #, ACKs, retransmissions will be of help, but not enough

Q: how to deal with loss?
– sender waits until data or ACK lost, then retransmits
– How do you know when the data is lost?

Approach: sender waits a “reasonable” amount of time for ACK
• retransmits if no ACK received in this time
• if pkt (or ACK) just delayed (not lost):
– retransmission will be duplicate, but use of seq. #’s already handles this
– receiver must specify seq # of pkt being ACKed
• requires countdown timer
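The rdt3.0 sender can be sketched in C as an explicit four-state FSM: wait for call 0, wait for ACK 0, wait for call 1, wait for ACK 1. The channel and the countdown timer are stand-ins, namely a send counter and a flag, and all names are hypothetical. As in rdt2.2, ACKs carry the sequence number being acknowledged. A corrupted or duplicate ACK is ignored, and recovery is left to the timeout.

```c
#include <assert.h>

enum rdt3_state { WAIT_CALL_0, WAIT_ACK_0, WAIT_CALL_1, WAIT_ACK_1 };

struct rdt3_sender {
    enum rdt3_state state;
    int seq;        /* seq # of the packet in flight (0 or 1) */
    int data;       /* payload of the packet in flight */
    int timer_on;   /* countdown timer running? */
    int sends;      /* number of udt_send() calls, for illustration */
};

/* Stand-in for the unreliable channel: just count transmissions. */
static void udt_send(struct rdt3_sender *s) { s->sends++; }

/* Event rdt_send(data) from above. Returns 0 if refused (still busy). */
int rdt3_send(struct rdt3_sender *s, int data)
{
    if (s->state != WAIT_CALL_0 && s->state != WAIT_CALL_1)
        return 0;                       /* waiting for an ACK */
    s->seq = (s->state == WAIT_CALL_0) ? 0 : 1;
    s->data = data;
    udt_send(s);
    s->timer_on = 1;                    /* start_timer */
    s->state = s->seq ? WAIT_ACK_1 : WAIT_ACK_0;
    return 1;
}

/* Event rdt_rcv(ack), with a corruption flag and the ACKed seq #. */
void rdt3_rcv_ack(struct rdt3_sender *s, int corrupt, int acknum)
{
    if (s->state != WAIT_ACK_0 && s->state != WAIT_ACK_1)
        return;                         /* stray ACK: ignore */
    if (corrupt || acknum != s->seq)
        return;                         /* let the timeout handle it */
    s->timer_on = 0;                    /* stop_timer */
    s->state = s->seq ? WAIT_CALL_0 : WAIT_CALL_1;
}

/* Event timeout: retransmit the packet in flight; the timer restarts. */
void rdt3_timeout(struct rdt3_sender *s)
{
    if (!s->timer_on)
        return;
    udt_send(s);
}
```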

rdt3.0 sender

rdt3.0 in action

rdt3.0 in action

Performance of rdt3.0

• rdt3.0 works, but performance stinks
• example: 1 Gbps link, 15 ms end-to-end propagation delay, 1 KB packet:
T_transmit = (8 kb/pkt) ÷ (10^9 b/sec) = 8 microsec
Utilization U = fraction of time sender busy sending = 8 microsec ÷ 30.016 msec ≈ 0.00027
– 1 KB pkt every 30 msec -> 33 kB/sec throughput over a 1 Gbps link
– the network protocol limits use of physical resources!

Pipelined protocols

Pipelining: sender allows multiple “in-flight”, yet-to-be-acknowledged pkts
– range of sequence numbers must be increased
– buffering at sender and/or receiver

• Two generic forms of pipelined protocols: go-Back-N, selective repeat

Go-Back-N

Sender:
• k-bit seq # in pkt header
• “window” of up to N consecutive unACKed pkts allowed
• ACK(n): ACKs all pkts up to and including seq # n, a “cumulative ACK”
– may receive duplicate ACKs (see receiver)
• timer for each in-flight pkt
• timeout(n): retransmit pkt n and all higher seq # pkts in window

GBN: sender extended FSM

GBN: receiver extended FSM

receiver simple:
• ACK-only: always send ACK for correctly-received pkt with highest in-order seq #
– may generate duplicate ACKs
– need only remember expectedseqnum
• out-of-order pkt:
– discard (don’t buffer) -> no receiver buffering!
– ACK pkt with highest in-order seq #

GBN in action

Problems with GBN

• Retransmits entire sender window on timeout
– Can cause excessive retransmissions
– Problem is exacerbated for networks with large “memory”, i.e. a large delay × bandwidth product
• Receiver throws away any out-of-order packets, even if they are received correctly
– Forces retransmission

Selective Repeat

• receiver individually acknowledges all correctly received pkts
– buffers pkts, as needed, for eventual in-order delivery to upper layer
• sender only resends pkts for which ACK not received
– sender timer for each unACKed pkt
• sender window
– N consecutive seq #’s
– again limits seq #s of sent, unACKed pkts

Selective repeat: sender, receiver windows

Selective repeat

sender
data from above:
• if next available seq # in window, send pkt
• else hold packet
timeout(n):
• resend pkt n, restart timer
ACK(n) in [sendbase, sendbase+N-1]:
• mark pkt n as received
• if n smallest unACKed pkt, advance window base to next unACKed seq #
• Transmit any pending packets

receiver
pkt n in [rcvbase, rcvbase+N-1]:
• send ACK(n)
• out-of-order: buffer
• in-order: deliver (also deliver buffered, in-order pkts), advance window to next not-yet-received pkt
pkt n in [rcvbase-N, rcvbase-1]:
• ACK(n)
otherwise:
• ignore
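Here is a C sketch of the receiver rules. To keep the example small, it assumes a window of N = 4 and unbounded sequence numbers below a fixed maximum; the names are hypothetical. A packet in the current window is buffered and acknowledged, and any in-order run starting at rcvbase is delivered. A packet from the previous window is acknowledged again, and anything else is ignored.

```c
#include <assert.h>
#include <string.h>

#define SR_N   4     /* receive window size N (assumed) */
#define SR_MAX 64    /* largest seq # tracked by this sketch */

struct sr_receiver {
    int rcvbase;
    int have[SR_MAX];        /* 1 if pkt n has been buffered */
    int data[SR_MAX];
    int delivered[SR_MAX];   /* payloads passed to the upper layer, in order */
    int ndelivered;
};

/* Handle pkt n carrying `data`. Returns the seq # to ACK, or -1 to ignore. */
int sr_receive(struct sr_receiver *r, int n, int data)
{
    assert(n >= 0 && n < SR_MAX);
    if (n >= r->rcvbase && n < r->rcvbase + SR_N) {
        if (!r->have[n]) {
            r->have[n] = 1;            /* buffer, even if out of order */
            r->data[n] = data;
        }
        while (r->rcvbase < SR_MAX && r->have[r->rcvbase]) {
            /* deliver the in-order run and advance the window */
            r->delivered[r->ndelivered++] = r->data[r->rcvbase];
            r->rcvbase++;
        }
        return n;                      /* ACK(n) */
    }
    if (n >= r->rcvbase - SR_N && n < r->rcvbase)
        return n;                      /* old pkt: ACK it again */
    return -1;                         /* otherwise: ignore */
}
```

If pkt 1 is lost while 2 and 3 arrive, both are buffered and ACKed individually. When the retransmitted pkt 1 arrives, packets 1 to 3 are delivered together and rcvbase moves to 4.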

Selective repeat in action

Selective repeat: dilemma

Example:
• seq #’s: 0, 1, 2, 3
• window size = 3
• receiver sees no difference in two scenarios!
• incorrectly passes duplicate data as new in (a)

Q: what relationship between seq # size and window size?

Out of Order Delivery

• What happens if the network delivers packets out of order?
– Send order != receive order
• Need a much larger, potentially infinite, sequence space
– Why?

Socket Programming

Srinidhi Varadarajan

Client-server paradigm

Client:
• initiates contact with server (“speaks first”)
• typically requests service from server
• for Web, client is implemented in browser; for e-mail, in mail reader

Server:
• provides requested service to client
• e.g., Web server sends requested Web page, mail server delivers e-mail

[Figure: client and server protocol stacks (application, transport, network, data link, physical) exchanging a request and a reply]

Application Layer Programming

API: application programming interface
• defines interface between application and transport layer
• sockets: Internet API
– two processes communicate by sending data into socket, reading data out of socket

Socket Interface. What is it?

• Gives a file-system-like abstraction to the capabilities of the network.
• Each transport protocol offers a set of services. The socket API provides the abstraction to access these services.
• The API defines function calls to create, close, read from, and write to a socket.

Socket Abstraction

• The socket is the basic abstraction for network communication in the socket API
– Defines an endpoint of communication for a process
– Operating system maintains information about the socket and its connection
– Application references the socket for sends, receives, etc.

[Figure: Process A and Process B, each attached to the network through ports (sockets)]

What do you need for socket communication?

• Basically 4 parameters
– Source identifier (IP address)
– Source port
– Destination identifier
– Destination port
• In the socket API, this information is communicated by binding the socket.

Creating a socket
int socket(int domain, int type, int protocol)

• The call returns an integer identifier called a handle (socket descriptor)
• domain: protocol family, PF_INET or PF_UNIX
• type: communication semantics, SOCK_STREAM or SOCK_DGRAM
• protocol: usually 0 (unspecified, so the default protocol for the type is used)

Binding a socket
int bind(int socket, struct sockaddr *address, int addr_len)

• This call is executed by:
– Server in TCP and UDP
• It binds the socket to the specified address. The address parameter specifies the local component of the address, e.g. IP address and UDP/TCP port

Socket Descriptors

• Operating system maintains a set of socket descriptors for each process
– Note that socket descriptors are shared by threads
• Three data structures
– Socket descriptor table
– Socket data structure
– Address data structure

Socket Descriptors

[Figure: entries 0, 1, 2, … of the socket descriptor table point to a socket data structure (proto family: PF_INET; service: SOCK_STREAM; local address; remote address; …), whose address fields point to an address data structure (address family: AF_INET; host IP: 128.173.88.85; port: 80)]

TCP Server Side: Listen
int listen(int socket, int backlog)

• This server-side call marks the socket as accepting connections and specifies how many pending connections may be queued on it.
• While the server is processing a connection, up to “backlog” connections may be pending in a queue.

TCP Server Side: Passive Open
int accept(int socket, struct sockaddr *address, int *addr_len)

• This call is executed by the server.
• The call does not return until a remote client has established a connection.
• When it completes, it returns a new socket handle corresponding to the just-established connection

TCP Client Side: Active Open
int connect(int socket, struct sockaddr *address, int addr_len)

• This call is executed by the client. *address contains the remote address.
• The call attempts to connect the socket to a server. It does not return until a connection has been established.
• When the call completes, the socket “socket” is connected and ready for communication.

Sockets: Summary

• Client:
int socket(int domain, int type, int protocol)
int connect(int socket, struct sockaddr *address, int addr_len)
• Server:
int socket(int domain, int type, int protocol)
int bind(int socket, struct sockaddr *address, int addr_len)
int listen(int socket, int backlog)
int accept(int socket, struct sockaddr *address, int *addr_len)
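Below is a minimal sketch of the client and server calls on the loopback interface. It follows the summary above, with two real-world adjustments. First, modern headers declare the length arguments as `socklen_t` rather than `int`. Second, binding to port 0 lets the OS pick a free port, which `getsockname()` then reports. The helper names are hypothetical, and error handling is reduced to returning -1.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Server side: socket() + bind() + listen() on 127.0.0.1. The port
 * chosen by the OS is returned (host byte order) through *port.
 * Returns the listening socket or -1. */
int tcp_listen_loopback(unsigned short *port)
{
    int s = socket(PF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);   /* 127.0.0.1 */
    addr.sin_port = htons(0);                        /* any free port */
    socklen_t len = sizeof addr;
    if (bind(s, (struct sockaddr *)&addr, sizeof addr) < 0 ||
        listen(s, 5) < 0 ||
        getsockname(s, (struct sockaddr *)&addr, &len) < 0) {
        close(s);
        return -1;
    }
    *port = ntohs(addr.sin_port);
    return s;
}

/* Client side: socket() + connect() to 127.0.0.1:port.
 * Returns the connected socket or -1. */
int tcp_connect_loopback(unsigned short port)
{
    int s = socket(PF_INET, SOCK_STREAM, 0);
    if (s < 0)
        return -1;
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(port);
    if (connect(s, (struct sockaddr *)&addr, sizeof addr) < 0) {
        close(s);
        return -1;
    }
    return s;
}
```

The kernel completes the 3-way handshake for connections waiting in the listen backlog. A single-threaded program can therefore call `tcp_connect_loopback()` first and `accept()` afterwards, then `write()` on one socket and `read()` on the other.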

Message Passing

• int send(int socket, char *message, int msg_len, int flags) (TCP)
• int sendto(int socket, void *msg, int len, int flags, struct sockaddr *to, int tolen); (UDP)
• int write(int socket, void *msg, int len); (TCP)
• int recv(int socket, char *buffer, int buf_len, int flags) (TCP)
• int recvfrom(int socket, void *msg, int len, int flags, struct sockaddr *from, int *fromlen); (UDP)
• int read(int socket, void *msg, int len); (TCP)
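For the UDP calls, here is a sketch of a single datagram round trip on the loopback interface. It uses `socklen_t` lengths as modern headers do, and the helper name is hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind one UDP socket to an OS-chosen loopback port, send it msg from a
 * second socket with sendto(), and read it back with recvfrom().
 * Returns the number of bytes received into buf, or -1 on error. */
ssize_t udp_loopback_roundtrip(const char *msg, char *buf, size_t buflen)
{
    int rx = socket(PF_INET, SOCK_DGRAM, 0);
    int tx = socket(PF_INET, SOCK_DGRAM, 0);
    ssize_t n = -1;
    struct sockaddr_in addr, from;
    socklen_t len = sizeof addr, fromlen = sizeof from;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    addr.sin_port = htons(0);                 /* any free port */
    if (rx >= 0 && tx >= 0 &&
        bind(rx, (struct sockaddr *)&addr, sizeof addr) == 0 &&
        getsockname(rx, (struct sockaddr *)&addr, &len) == 0 &&
        sendto(tx, msg, strlen(msg), 0,
               (struct sockaddr *)&addr, sizeof addr) >= 0)
        n = recvfrom(rx, buf, buflen, 0, (struct sockaddr *)&from, &fromlen);
    if (rx >= 0)
        close(rx);
    if (tx >= 0)
        close(tx);
    return n;
}
```

Unlike the TCP case, no `listen()`, `accept()`, or `connect()` is needed. Each `sendto()` names its destination, and `recvfrom()` reports where each datagram came from.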

Summary of Basic Socket Calls

[Figure: the client calls connect() and the server calls accept(); the 3-way handshake creates a new connection; data then flows with write() on one side and read() on the other, in both directions; both sides finish with close()]

Network Byte Order

• Network byte order is most-significant byte first
• Byte ordering at a host may differ
• Utility functions
– htons(): Host-to-network byte order for a short word (2 bytes)
– htonl(): Host-to-network byte order for a long word (4 bytes)
– ntohs(): Network-to-host byte order for a short word
– ntohl(): Network-to-host byte order for a long word
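This short C sketch shows why the conversion functions matter when building a header. After `htons()`/`htonl()`, the bytes in memory appear most-significant first on any host. The helper names and the 6-byte port-plus-address layout are hypothetical.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Store a 16-bit port and a 32-bit address in network byte order.
 * htons()/htonl() are no-ops on big-endian hosts and swap bytes on
 * little-endian ones, so this code is portable either way. */
void put_port_addr(unsigned char out[6], uint16_t port, uint32_t addr)
{
    uint16_t p = htons(port);
    uint32_t a = htonl(addr);
    memcpy(out, &p, sizeof p);
    memcpy(out + 2, &a, sizeof a);
}

/* Read them back into host byte order. */
void get_port_addr(const unsigned char in[6], uint16_t *port, uint32_t *addr)
{
    uint16_t p;
    uint32_t a;
    memcpy(&p, in, sizeof p);
    memcpy(&a, in + 2, sizeof a);
    *port = ntohs(p);
    *addr = ntohl(a);
}
```

Storing port 80 and address 128.173.88.85 (0x80AD5855) yields the bytes 00 50 80 AD 58 55, whatever the host’s own byte order.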

Some Other “Utility” Functions

• gethostname() -- get name of local host
• getpeername() -- get address of remote host
• getsockname() -- get local address of socket
• getXbyY() -- get protocol, host, or service number using known number, address, or port, respectively
• getsockopt() -- get current socket options
• setsockopt() -- set socket options
• ioctl() -- retrieve or set socket information

Some Other “Utility” Functions

• inet_addr() -- convert “dotted” character string form of IP address to internal binary form
• inet_ntoa() -- convert internal binary form of IP address to “dotted” character string form

Address Data Structures

• sockaddr is a generic address structure
• sockaddr_in is a specific instance for the Internet address family

    struct sockaddr {
        u_short sa_family;        // type of address
        char    sa_data[14];      // value of address
    };

    struct sockaddr_in {
        u_short        sin_family;   // type of address (AF_INET)
        u_short        sin_port;     // protocol port number
        struct in_addr sin_addr;     // IP address
        char           sin_zero[8];  // unused (set to zero)
    };
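The following sketch fills in a `struct sockaddr_in` with the standard `inet_addr()`, `htons()`, and `inet_ntoa()` calls. It uses the host and port from the socket-descriptor figure (128.173.88.85, port 80), and the helper name is hypothetical. Note that `inet_addr()` already returns the address in network byte order. It also cannot distinguish an error from the valid address 255.255.255.255.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>

/* Fill a sockaddr_in from a dotted-decimal string and a port number.
 * Returns 0, or -1 if the dotted string is invalid. */
int make_addr(struct sockaddr_in *sa, const char *dotted, unsigned short port)
{
    in_addr_t ip = inet_addr(dotted);      /* already in network order */
    if (ip == INADDR_NONE)
        return -1;
    memset(sa, 0, sizeof *sa);             /* also zeroes sin_zero */
    sa->sin_family = AF_INET;
    sa->sin_port = htons(port);
    sa->sin_addr.s_addr = ip;
    return 0;
}
```

A program would then pass `(struct sockaddr *)&sa` and `sizeof sa` to `connect()` or `bind()`. `inet_ntoa(sa.sin_addr)` turns the address back into the string "128.173.88.85".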

Physical Layer

Srinidhi Varadarajan


Medium Access Links and Protocols

Three types of “links”:
• point-to-point (single wire, e.g. PPP, SLIP)
• broadcast (shared wire or medium; e.g., Ethernet, Wavelan, etc.)
• switched (e.g., telephone systems, switched Ethernet, ATM, etc.)

Point-to-Point protocols

• Telephone networks
– Switched hierarchy
– The local loop is the last-mile interface to customer premises equipment (generally referred to in the networking world as the source of all evil)
– Originally involved a physical connection between the sender and the receiver
– Nowadays, telephone networks use circuit-switched medium access control
• Modems: digital interface to the world of telephony

Modems: Signaling

• Modems:
– Work over low-bandwidth telephone lines (3000 Hz)
• Signaling schemes (why not just use digital bit patterns?)
– Possible choices:
• Amplitude modulation (AM)
• Frequency modulation (FM or FSK)
• Phase modulation (PSK)

Modems Signaling

• Modern modems use a combination of PSK and AM
• Charts called constellation patterns show the allowed amplitude/phase combinations
– Multiple bits are encoded per signal
– Trellis encoding is used to minimize the chance of error; since each signal carries several bits, a single error causes the loss of several bits
• Echo cancellation/suppression
– Needed for long-haul voice communication
– Echo suppression prevents full-duplex operation
– In-band signaling at 2100 Hz is used to inhibit the echo cancellation circuitry
– A newer solution uses end-point resources for echo suppression

RS-232C, RS449: Point-to-Point Communication

• RS-232C and RS449 specify physical layer point-to-point serial communication
• 25- or 9-pin connectors, 15 m cable length
– < −3 V = 1, > +4 V = 0
– BW: 20 Kbps (originally; now upgraded to up to 115 Kbps)
– Main communication occurs using the RTS/CTS protocol
• RS-449 is an upgraded RS-232C with 2 modes of communication
– Unbalanced mode is physically similar to RS-232C, with common-ground signaling
– Balanced mode uses an independent ground; data rate 2 Mbps with lengths up to 60 m

Multiple Access protocols
• single shared communication channel
• two or more simultaneous transmissions by nodes: interference
– only one node can send successfully at a time
• multiple access protocol:
– distributed algorithm that determines how stations share the channel, i.e., determines when a station can transmit
– communication about channel sharing must use the channel itself!
– what to look for in multiple access protocols:
• synchronous or asynchronous
• information needed about other stations
• robustness (e.g., to channel errors)
• performance

Multiple Access protocols

• claim: humans use multiple access protocols all the time

• class can "guess" multiple access protocols
– multiaccess protocol 1:
– multiaccess protocol 2:
– multiaccess protocol 3:
– multiaccess protocol 4:

MAC Protocols: a taxonomy

Three broad classes:
• Channel Partitioning
  – divide channel into smaller “pieces” (time slots, frequency)
  – allocate piece to node for exclusive use
• Random Access
  – allow collisions
  – “recover” from collisions
• “Taking turns”
  – tightly coordinate shared access to avoid collisions

Goal: efficient, fair, simple, decentralized

Channel Partitioning MAC protocols: TDMA

TDMA: time division multiple access
• access to channel in "rounds"
• each station gets fixed length slot (length = packet transmission time) in each round
• unused slots go idle
• example: 6-station LAN, stations 1, 3, 4 have packets; slots 2, 5, 6 idle
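This Python sketch shows the fixed slot assignment and reproduces the 6-station example. It assumes slot k of each round belongs to station k; the function names are hypothetical.

```python
from __future__ import annotations


def tdma_round(num_stations: int, has_packet: set[int]) -> list[tuple[int, str]]:
    """One TDMA round: slot k (1-based) is reserved for station k, used or not."""
    return [
        (slot, "sent" if slot in has_packet else "idle")
        for slot in range(1, num_stations + 1)
    ]


def slot_owner(time_s: float, slot_len_s: float, num_stations: int) -> int:
    """Station (numbered 1..N) allowed to transmit at a given time."""
    slot_index = int(time_s // slot_len_s)
    return slot_index % num_stations + 1


if __name__ == "__main__":
    round_ = tdma_round(6, {1, 3, 4})
    idle = [slot for slot, outcome in round_ if outcome == "idle"]
    print("idle slots:", idle)
    print("utilization:", 1 - len(idle) / len(round_))
```

Half the round is wasted in this example even though three stations have data waiting. This is the low-load inefficiency of channel partitioning.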

Channel Partitioning MAC protocols: FDMA

FDMA: frequency division multiple access
• channel spectrum divided into frequency bands
• each station assigned fixed frequency band
• unused transmission time in frequency bands goes idle
• example: 6-station LAN, stations 1, 3, 4 have packets; frequency bands 2, 5, 6 idle

[Figure: frequency bands versus time]

Channel Partitioning (CDMA)

CDMA (Code Division Multiple Access)
• unique “code” assigned to each user; i.e., code set partitioning
• used mostly in wireless broadcast channels (cellular, satellite, etc.)
• all users share same frequency, but each user has own “chipping” sequence (i.e., code) to encode data
• encoded signal = (original data) X (chipping sequence)
• decoding: inner product of encoded signal and chipping sequence
• allows multiple users to “coexist” and transmit simultaneously with minimal interference (if codes are “orthogonal”)

CDMA Encode/Decode

CDMA: two-sender interference
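This minimal Python sketch applies the encode/decode rules above to two senders sharing the channel. It assumes perfectly synchronized chips, no noise, and the two orthogonal 8-chip codes shown, which are chosen for illustration.

```python
from __future__ import annotations


def encode(bits: list[int], code: list[int]) -> list[int]:
    """Spread each data bit (1 -> +1, 0 -> -1) over the M chips of the code."""
    return [(1 if bit else -1) * chip for bit in bits for chip in code]


def channel(*signals: list[int]) -> list[int]:
    """Simultaneous transmissions add chip by chip on the shared channel."""
    return [sum(chips) for chips in zip(*signals)]


def decode(received: list[int], code: list[int]) -> list[int]:
    """Inner product of each M-chip block with the receiver's code, divided by M."""
    m = len(code)
    bits = []
    for start in range(0, len(received), m):
        block = received[start:start + m]
        d = sum(r * c for r, c in zip(block, code)) / m
        bits.append(1 if d > 0 else 0)
    return bits


# Two orthogonal 8-chip codes: their inner product is 0.
CODE_A = [1, 1, 1, 1, -1, -1, -1, -1]
CODE_B = [1, -1, 1, -1, 1, -1, 1, -1]

if __name__ == "__main__":
    a_bits, b_bits = [1, 0, 1], [0, 0, 1]
    received = channel(encode(a_bits, CODE_A), encode(b_bits, CODE_B))
    print(decode(received, CODE_A), decode(received, CODE_B))
```

Because the codes are orthogonal, the other sender's contribution cancels in the inner product. Each receiver therefore recovers exactly its own bits from the summed signal.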

Random Access protocols

• When node has packet to send
  – transmit at full channel data rate R
  – no a priori coordination among nodes
• two or more transmitting nodes -> “collision”
• random access MAC protocol specifies:
  – how to detect collisions
  – how to recover from collisions (e.g., via delayed retransmissions)
• Examples of random access MAC protocols:
  – slotted ALOHA
  – ALOHA
  – CSMA and CSMA/CD

Slotted Aloha
• time is divided into equal size slots (= packet transmission time)
• node with new arriving packet: transmit at beginning of next slot
• if collision: retransmit packet in future slots with probability p, until successful

[Figure: Success (S), Collision (C), Empty (E) slots]

Slotted Aloha efficiency
Q: What is the maximum fraction of slots that are successful?
A: Suppose N stations have packets to send
  – each transmits in a slot with probability p
  – probability of a successful transmission, S:

by a single node: $S = p(1-p)^{N-1}$

by any of N nodes: $S = \Pr(\text{only one transmits}) = Np(1-p)^{N-1}$

… setting $dS/dp = 0$ gives the optimum $p = 1/N$, so $S = (1 - 1/N)^{N-1}$; as N -> infinity …

$S \to 1/e \approx 0.37$

At best: the channel is used for useful transmissions 37% of the time!
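The efficiency formula above can be evaluated numerically to watch it approach 1/e. This is a small Python sketch with hypothetical helper names.

```python
import math


def slotted_aloha_success(n: int, p: float) -> float:
    """Probability that exactly one of n stations transmits in a slot: N p (1-p)^(N-1)."""
    return n * p * (1 - p) ** (n - 1)


def best_efficiency(n: int) -> float:
    """Efficiency at the optimum p* = 1/n (from setting dS/dp = 0)."""
    return slotted_aloha_success(n, 1 / n)


if __name__ == "__main__":
    for n in (2, 5, 10, 100, 1000):
        print(f"N={n:5d}  p*={1 / n:.4f}  S={best_efficiency(n):.4f}")
    print(f"limit 1/e = {1 / math.e:.4f}")
```

The efficiency falls quickly from 0.5 at N = 2 toward the 0.37 limit. Even a handful of competing stations already costs most of the gap.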

Pure (unslotted) ALOHA

• unslotted Aloha: simpler, no synchronization
• when a packet needs transmission:
  – send without waiting for beginning of slot
• collision probability increases:
  – packet sent at t0 collides with other packets sent in [t0-1, t0+1] (time measured in packet transmission times)

Pure Aloha (cont.)

P(success by given node) = P(node transmits) · P(no other node transmits in [t0-1, t0]) · P(no other node transmits in [t0, t0+1])

$= p \cdot (1-p)^{N-1} \cdot (1-p)^{N-1} = p(1-p)^{2(N-1)}$

P(success by any of N nodes) $= Np(1-p)^{2(N-1)}$

… choosing the optimum $p = 1/(2N-1)$ and letting N -> infinity …

$= 1/(2e) \approx 0.18$

[Figure: S = throughput = “goodput” (success rate), 0 to 0.4, versus G = offered load = Np, 0 to 2.0, for slotted ALOHA and pure ALOHA]

Protocol constrains effective channel throughput!
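The figure's curves follow from the success probabilities derived on the previous slides. Hold the offered load G = Np fixed while N grows. Then $Np(1-p)^{N-1} \to Ge^{-G}$ (slotted) and $Np(1-p)^{2(N-1)} \to Ge^{-2G}$ (pure). This minimal Python sketch tabulates both; the function names are illustrative.

```python
import math


def slotted_throughput(g: float) -> float:
    """Slotted ALOHA: S = G * e^(-G), G = offered load (frames per slot)."""
    return g * math.exp(-g)


def pure_throughput(g: float) -> float:
    """Pure ALOHA: the vulnerable period is two frame times, so S = G * e^(-2G)."""
    return g * math.exp(-2 * g)


if __name__ == "__main__":
    print(" G    slotted  pure")
    for tenth in range(1, 21):
        g = tenth / 10
        print(f"{g:3.1f}   {slotted_throughput(g):.3f}   {pure_throughput(g):.3f}")
```

The table peaks at G = 1 for slotted ALOHA (S ≈ 0.37) and at G = 0.5 for pure ALOHA (S ≈ 0.18), matching the optima derived above.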

CSMA: Carrier Sense Multiple Access

CSMA: listen before transmit:
• If channel sensed idle: transmit entire packet
• If channel sensed busy, defer transmission
  – Persistent CSMA: retry immediately with probability p when channel becomes idle (may cause instability)
  – Non-persistent CSMA: retry after random interval
• human analogy: don’t interrupt others!

CSMA collisions

collisions can occur: propagation delay means two nodes may not yet hear each other’s transmission

collision: entire packet transmission time wasted

[Figure: spatial layout of nodes along Ethernet]

note: role of distance and propagation delay in determining collision probability

CSMA/CD (Collision Detection)

CSMA/CD: carrier sensing, deferral as in CSMA
  – collisions detected within short time
  – colliding transmissions aborted, reducing channel wastage
  – persistent or non-persistent retransmission
• collision detection:
  – easy in wired LANs: measure signal strengths, compare transmitted, received signals
  – difficult in wireless LANs: receiver shut off while transmitting
• human analogy: the polite conversationalist

CSMA/CD collision detection

“Taking Turns” MAC protocols

channel partitioning MAC protocols:
  – share channel efficiently at high load
  – inefficient at low load: delay in channel access, 1/N bandwidth allocated even if only 1 active node!

Random access MAC protocols:
  – efficient at low load: single node can fully utilize channel
  – high load: collision overhead

“taking turns” protocols look for best of both worlds!

“Taking Turns” MAC protocols

Polling:
• master node “invites” slave nodes to transmit in turn
• Request to Send, Clear to Send msgs
• concerns:
  – polling overhead
  – latency
  – single point of failure (master)

Token passing:
• control token passed from one node to next sequentially
• token message
• concerns:
  – token overhead
  – latency
  – single point of failure (token)

Reservation-based protocols
Distributed Polling:
• time divided into slots
• begins with N short reservation slots
  – reservation slot time equal to channel end-end propagation delay
  – station with message to send posts reservation
  – reservation seen by all stations
• after reservation slots, message transmissions ordered by known priority

Medium Access Protocols

Summary of MAC protocols
• What do you do with a shared medium?
  – Channel partitioning, by time, frequency or code
    • Time Division, Code Division, Frequency Division
  – Random partitioning (dynamic)
    • ALOHA, S-ALOHA, CSMA, CSMA/CD
    • carrier sensing: easy in some technologies (wire), hard in others (wireless)
    • CSMA/CD used in Ethernet
  – Taking turns
    • polling from a central site, token passing

LAN technologies
Data link layer so far:
  – services, error detection/correction, multiple access

Next: LAN technologies
  – addressing
  – Ethernet
  – hubs, bridges, switches
  – 802.11
  – PPP
  – ATM

LAN Addresses and ARP
32-bit IP address:
• network-layer address
• used to get datagram to destination network (recall IP network definition)

LAN (or MAC or physical) address:
• used to get datagram from one interface to another physically-connected interface (same network)
• 48-bit MAC address (for most LANs) burned in the adapter ROM

LAN Addresses and ARP
Each adapter on a LAN has a unique LAN address

LAN Address (more)
• MAC address allocation administered by IEEE
• manufacturer buys portion of MAC address space (to assure uniqueness)
• Analogy:
  (a) MAC address: like Social Security Number
  (b) IP address: like postal address
• MAC flat address => portability
  – can move LAN card from one LAN to another
• IP hierarchical address NOT portable
  – depends on network to which one attaches

Ethernet

Source: IP 130.245.20.1, Ethernet 0A:03:21:60:09:FA

Destination: IP 130.245.20.2, Ethernet 0A:03:23:65:09:FB

ARP Query: What is the Ethernet address of 130.245.20.2?

ARP Response: 0A:03:23:65:09:FB

Address Resolution Protocol (ARP)

• Maps IP addresses to Ethernet addresses
• ARP responses are cached

ARP protocol
• A knows B's IP address, wants to learn physical address of B
• A broadcasts ARP query packet, containing B's IP address
  – all machines on LAN receive ARP query
• B receives ARP packet, replies to A with its (B's) physical layer address
• A caches (saves) IP-to-physical address pairs until information becomes old (times out)
  – soft state: information that times out (goes away) unless refreshed
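The caching and soft-state behaviour described above can be sketched as a small Python class. The sketch assumes the caller passes in the current time and uses an illustrative timeout. `ArpCache` and its methods are hypothetical, not an operating-system API.

```python
from __future__ import annotations


class ArpCache:
    """Soft-state IP -> MAC cache: entries vanish unless refreshed before the timeout.

    Time is passed in explicitly (seconds) so behaviour is easy to test;
    the 1200 s default timeout is an illustrative assumption.
    """

    def __init__(self, timeout_s: float = 1200.0):
        self.timeout_s = timeout_s
        self._entries: dict[str, tuple[str, float]] = {}  # ip -> (mac, time learned)

    def update(self, ip: str, mac: str, now: float) -> None:
        """Record (or refresh) a mapping learned from an ARP reply."""
        self._entries[ip] = (mac, now)

    def lookup(self, ip: str, now: float) -> str | None:
        """Cached MAC, or None if absent or timed out (the caller must then broadcast a query)."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, learned = entry
        if now - learned >= self.timeout_s:
            del self._entries[ip]  # soft state: stale information simply goes away
            return None
        return mac
```

With the example addresses from the earlier slide: after A learns `130.245.20.2 -> 0A:03:23:65:09:FB`, lookups succeed until the timeout. A fresh ARP reply restarts the clock; if no reply arrives, the entry disappears and A must query again.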

Ethernet
“dominant” LAN technology:
• cheap: $20 for 100 Mbps!
• first widely used LAN technology
• simpler, cheaper than token ring LANs and ATM
• kept up with speed race: 10, 100, 1000 Mbps

[Figure: Metcalfe’s Ethernet sketch]

Ethernet Frame Structure
Sending adapter encapsulates IP datagram (or other network layer protocol packet) in Ethernet frame

Preamble:
• 7 bytes with pattern 10101010 followed by one byte with pattern 10101011
• used to synchronize receiver, sender clock rates

Ethernet Frame Structure (more)
• Addresses: 6 bytes; frame is received by all adapters on a LAN and dropped if address does not match
• Type/length: indicates the higher layer protocol, mostly IP, but others may be supported, such as Novell IPX and AppleTalk
• CRC: checked at receiver; if an error is detected, the frame is simply dropped
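This minimal Python sketch shows how a receiver could pull the address and type fields out of a frame. It assumes the preamble has already been removed (adapters normally do this) and does not check the CRC. The EtherType values for IPv4 (0x0800), ARP (0x0806), Novell IPX (0x8137) and AppleTalk (0x809B) are standard assignments. The rule that values up to 1500 mean an 802.3 length rather than a type is also standard, though not stated on the slide.

```python
import struct

ETHERTYPES = {0x0800: "IPv4", 0x0806: "ARP", 0x8137: "Novell IPX", 0x809B: "AppleTalk"}


def mac_to_str(raw: bytes) -> str:
    return ":".join(f"{b:02X}" for b in raw)


def parse_ethernet_header(frame: bytes) -> dict:
    """Split a preamble-stripped frame into destination, source, type/length and payload."""
    if len(frame) < 14:
        raise ValueError("frame shorter than the 14-byte Ethernet header")
    dst, src, type_len = struct.unpack("!6s6sH", frame[:14])
    # Values <= 1500 are an 802.3 length; larger values are treated here as a type
    # (strictly, types start at 0x0600).
    kind = "length" if type_len <= 1500 else "type"
    return {
        "dst": mac_to_str(dst),
        "src": mac_to_str(src),
        kind: type_len,
        "protocol": ETHERTYPES.get(type_len, "unknown") if kind == "type" else None,
        "payload": frame[14:],
    }
```

An adapter compares `dst` against its own address (and the broadcast address) and discards the frame on a mismatch, as described above.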

Ethernet: uses CSMA/CD

A: sense channel, if idle then {
       transmit and monitor the channel;
       if detect another transmission then {
           abort and send jam signal;
           update # collisions;
           delay as required by exponential backoff algorithm;
           goto A
       }
       else { done with the frame; set collisions to zero }
   }
   else { wait until ongoing transmission is over and goto A }

Ethernet’s CSMA/CD (more)

Jam Signal: make sure all other transmitters are aware of collision; 48 bits

Exponential Backoff:
• Goal: adapt retransmission attempts to estimated current load
  – heavy load: random wait will be longer
• first collision: choose K from {0,1}; delay is K x 512 bit transmission times
• after second collision: choose K from {0,1,2,3}
• …
• after ten or more collisions, choose K from {0,1,2,3,4,…,1023}
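The backoff rule fits in a few lines of Python. This is a sketch of the rule as stated on the slide: after the n-th collision, K is drawn from {0, ..., 2^min(n,10) - 1}. The function names are hypothetical, and an adapter's limit on the total number of attempts is not modelled.

```python
import random

SLOT_BITS = 512            # backoff unit from the slide: 512 bit times
BIT_TIME_US_10MBPS = 0.1   # one bit time at 10 Mbps, in microseconds


def backoff_k(collisions: int, rng: random.Random) -> int:
    """After the n-th collision pick K uniformly from {0, ..., 2^min(n, 10) - 1}."""
    if collisions < 1:
        raise ValueError("backoff is only needed after at least one collision")
    return rng.randrange(2 ** min(collisions, 10))


def backoff_delay_bits(collisions: int, rng: random.Random) -> int:
    """Delay in bit times: K x 512."""
    return backoff_k(collisions, rng) * SLOT_BITS


if __name__ == "__main__":
    rng = random.Random()
    for n in (1, 2, 3, 10, 12):
        delay_us = backoff_delay_bits(n, rng) * BIT_TIME_US_10MBPS
        print(f"after collision {n}: wait {delay_us:.1f} us")
```

At 10 Mbps one bit time is 0.1 µs, so one backoff unit of 512 bit times is 51.2 µs. The range of K doubles with each collision, up to the cap of 1023.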

Ethernet Technologies: 10Base2
• 10: 10 Mbps; 2: under 200 meters max cable length
• thin coaxial cable in a bus topology
• repeaters used to connect multiple segments
• repeater repeats bits it hears on one interface to its other interfaces: physical layer device only!

10BaseT and 100BaseT
• 10/100 Mbps rate; latter called “fast Ethernet”
• T stands for Twisted Pair
• nodes connected to a hub by twisted pair, thus “star topology”
• CSMA/CD implemented at hub

10BaseT and 100BaseT (more)
• Max distance from node to hub is 100 meters
• Hub can disconnect “jabbering” adapter
• Hub can gather monitoring information, statistics for display to LAN administrators

Gbit Ethernet
• uses standard Ethernet frame format
• allows for point-to-point links and shared broadcast channels
• in shared mode, CSMA/CD is used; distances between nodes must be short for it to be efficient
• uses hubs, called “Buffered Distributors”
• Full-duplex at 1 Gbps for point-to-point links

Token Passing: IEEE 802.5 standard

• 4 Mbps
• max token holding time: 10 ms, limiting frame length
• SD, ED mark start, end of packet
• AC: access control byte:
  – token bit: value 0 means token can be seized, value 1 means data follows FC
  – priority bits: priority of packet
  – reservation bits: station can write these bits to prevent stations with lower priority packet from seizing token after token becomes free

Token Passing: IEEE802.5 standard

• FC: frame control, used for monitoring and maintenance
• source, destination address: 48-bit physical address, as in Ethernet
• data: packet from network layer
• checksum: CRC
• FS: frame status: set by destination, read by sender
  – set to indicate destination up, frame copied OK from ring
  – DLC-level ACKing

Interconnecting LANs
Q: Why not just one big LAN?
• Limited amount of supportable traffic: on single LAN, all stations must share bandwidth
• limited length: 802.3 specifies maximum cable length
• large “collision domain” (can collide with many stations)
• limited number of stations: 802.5 has token passing delays at each station

Hubs
• Physical layer devices: essentially repeaters operating at bit level: repeat received bits on one interface to all other interfaces
• Hubs can be arranged in a hierarchy (or multi-tier design), with backbone hub at its top

Hubs (more)

• Each connected LAN referred to as LAN segment
• Hubs do not isolate collision domains: node may collide with any node residing at any segment in LAN
• Hub advantages:
  – simple, inexpensive device
  – multi-tier provides graceful degradation: portions of the LAN continue to operate if one hub malfunctions
  – extends maximum distance between node pairs (100 m per hub)

Hub limitations
• single collision domain results in no increase in max throughput
  – multi-tier throughput same as single segment throughput
• individual LAN restrictions pose limits on number of nodes in same collision domain and on total allowed geographical coverage
• cannot connect different Ethernet types (e.g., 10BaseT and 100BaseT)

Bridges
• Link layer devices: operate on Ethernet frames, examining frame header and selectively forwarding frame based on its destination
• Bridge isolates collision domains since it buffers frames
• When frame is to be forwarded on segment, bridge uses CSMA/CD to access segment and transmit

Bridges (more)
• Bridge advantages:
  – Isolates collision domains, resulting in higher total max throughput, and does not limit the number of nodes nor geographical coverage
  – Can connect different types of Ethernet since it is a store-and-forward device
  – Transparent: no need for any change to hosts’ LAN adapters

Bridges: frame filtering, forwarding
• bridges filter packets
  – same-LAN-segment frames not forwarded onto other LAN segments
• forwarding:
  – how to know on which LAN segment to forward frame?
  – looks like a routing problem (more shortly!)

Backbone Bridge

Interconnection Without Backbone

• Not recommended for two reasons:
  – single point of failure at Computer Science hub
  – all traffic between EE and SE must pass over CS segment

Bridge Filtering

• bridges learn which hosts can be reached through which interfaces: maintain filtering tables
  – when frame received, bridge “learns” location of sender: incoming LAN segment
  – records sender location in filtering table
• filtering table entry:
  – (Node LAN Address, Bridge Interface, Time Stamp)
  – stale entries in filtering table dropped (TTL can be 60 minutes)

Bridge Filtering

• filtering procedure:

if destination is on LAN on which frame was received
    then drop the frame
else { lookup filtering table
    if entry found for destination
        then forward the frame on interface indicated;
        else flood; /* forward on all but the interface on which the frame arrived */
}

Bridge Learning: example
Suppose C sends frame to D and D replies back with frame to C
• C sends frame, bridge has no info about D, so floods to both LANs
  – bridge notes that C is on port 1
  – frame ignored on upper LAN
  – frame received by D

Bridge Learning: example
• D generates reply to C, sends
  – bridge sees frame from D
  – bridge notes that D is on interface 2
  – bridge knows C on interface 1, so selectively forwards frame out via interface 1
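The filtering procedure and the learning example above fit in a short Python class. This is a sketch assuming a single bridge, string host names, and caller-supplied timestamps; `LearningBridge` and its methods are hypothetical names.

```python
from __future__ import annotations


class LearningBridge:
    """Self-learning bridge: (address -> interface, timestamp) table with TTL aging."""

    def __init__(self, interfaces: list[int], ttl_s: float = 3600.0):
        self.interfaces = interfaces
        self.ttl_s = ttl_s  # the slide suggests entries may live up to 60 minutes
        self.table: dict[str, tuple[int, float]] = {}

    def receive(self, src: str, dst: str, in_iface: int, now: float) -> list[int]:
        """Handle one frame; return the interfaces it is sent out on (empty = dropped)."""
        # Learn: the sender is reachable via the interface the frame arrived on.
        self.table[src] = (in_iface, now)

        entry = self.table.get(dst)
        if entry is not None and now - entry[1] >= self.ttl_s:
            del self.table[dst]  # stale entry dropped
            entry = None

        if entry is None:
            # Unknown destination: flood on all but the arrival interface.
            return [i for i in self.interfaces if i != in_iface]
        out_iface = entry[0]
        if out_iface == in_iface:
            return []  # destination is on the LAN the frame came from: filter it
        return [out_iface]
```

Replaying the example: `receive("C", "D", 1, now=0)` floods to `[2]` and records C on interface 1. `receive("D", "C", 2, now=1)` then records D on interface 2 and forwards only to `[1]`.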

Spanning Tree
• The learning bridge fails when the network topology has a loop.
  – Why?
• Loops are not necessarily bad. They provide redundancy that can be used to recover from failures
• To handle loops, bridges implement the spanning tree algorithm.
  – The spanning tree algorithm imposes a logical tree over the physical topology
  – Data is only transferred along links that belong to the spanning tree

Spanning Tree Algorithm
• Each bridge has unique id (e.g., B1, B2, B3)

• Select bridge with smallest id as root

• Select bridge on each LAN closest to root as designated bridge (use id to break ties)

• Each bridge forwards frames over each LAN for which it is the designated bridge

[Figure: example extended LAN; LANs A to K interconnected by bridges B1 to B7]

Spanning Tree Algorithm (contd.)
• Bridges exchange configuration messages called CBPDUs (Configuration Bridge Protocol Data Units), containing:
  – id of the bridge sending the message
  – id of what the sending bridge believes to be the root bridge
  – distance (hops) from sending bridge to root bridge

• Each bridge records the current best configuration message for each port

• Initially, each bridge believes it is the root

Spanning Tree Algorithm (contd.)
• When a bridge learns that it is not the root, it stops generating configuration messages
  – in steady state, only root generates configuration messages

• When the bridge learns that it is not the designated bridge, it stops forwarding configuration messages
  – in steady state, only designated bridges forward config messages

• Root continues to periodically send config messages

• If any bridge does not receive successive config messages, it starts generating config messages claiming to be the root
  – This is used to recover from root failure
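The message exchange converges to the tree defined by the selection rules on the Spanning Tree Algorithm slide. The Python sketch below computes that outcome centrally rather than by exchanging CBPDUs. It assumes a connected topology given as bridge id -> attached LANs, with hop count as the distance. It also keeps each non-root bridge's port toward the root active; the standard algorithm does this, but the slides leave it implicit. The topology in the usage note is made up.

```python
from __future__ import annotations

from collections import deque


def spanning_tree(bridges: dict[int, set[str]]):
    """Centralized sketch of the tree the spanning tree algorithm converges to.

    bridges maps bridge id -> set of LANs the bridge attaches to (assumed connected).
    Returns (root id, designated bridge per LAN, active ports per bridge).
    """
    root = min(bridges)  # smallest id becomes the root

    # Breadth-first search over the bridge/LAN graph: distance in bridge hops to the root.
    dist = {root: 0}
    queue = deque([root])
    while queue:
        b = queue.popleft()
        for lan in bridges[b]:
            for other, other_lans in bridges.items():
                if lan in other_lans and other not in dist:
                    dist[other] = dist[b] + 1
                    queue.append(other)

    # Designated bridge per LAN: closest to the root, smallest id breaks ties.
    all_lans = set().union(*bridges.values())
    designated = {
        lan: min((b for b in bridges if lan in bridges[b]), key=lambda b: (dist[b], b))
        for lan in all_lans
    }

    # Active ports: LANs the bridge is designated for, plus (for non-root bridges)
    # the root port, i.e. the LAN whose designated bridge is closest to the root.
    active = {b: {lan for lan, d in designated.items() if d == b} for b in bridges}
    for b in bridges:
        if b != root:
            root_port = min(
                (lan for lan in bridges[b] if designated[lan] != b),
                key=lambda lan: (dist[designated[lan]], designated[lan], lan),
            )
            active[b].add(root_port)
    return root, designated, active
```

Usage: for the hypothetical loop `{1: {"A", "B"}, 2: {"A", "C"}, 3: {"B", "C"}, 4: {"C", "D"}}`, B1 becomes root. B2 wins LAN C on the id tie with B3, and B3's port on LAN C is left inactive, which breaks the loop A-B-C.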

Limitations of Bridges
• Do not scale
  – spanning tree algorithm does not scale
  – single large broadcast domains do not scale
• Do not accommodate heterogeneity
  – Bridges support Ethernet to Ethernet, Ethernet to 802.5, and 802.5 to 802.5.
• Caution: beware of transparency
  – Applications that assume that they are executing on a single LAN will fail.
  – Latency increases in large LANs, and so does jitter

Bridges vs. Routers
• both store-and-forward devices
  – routers: network layer devices (examine network layer headers)
  – bridges are link layer devices
• routers maintain routing tables, implement routing algorithms
• bridges maintain filtering tables, implement filtering, learning and spanning tree algorithms

Routers vs. Bridges
Bridges + and -
+ Bridge operation is simpler, requiring less processing bandwidth
- Topologies are restricted with bridges: a spanning tree must be built to avoid cycles
- Bridges do not offer protection from broadcast storms (endless broadcasting by a host will be forwarded by a bridge)

Routers vs. Bridges

Routers + and -
+ arbitrary topologies can be supported; cycling is limited by TTL counters (and good routing protocols)
+ provide firewall protection against broadcast storms
- require IP address configuration (not plug and play)
- require higher processing bandwidth

• bridges do well in small networks (a few hundred hosts), while routers are used in large networks (thousands of hosts)

Medium Access Layer

Ethernet Switches
• layer 2 (frame) forwarding, filtering using LAN addresses
• Switching: A-to-B and A’-to-B’ simultaneously, no collisions
• large number of interfaces
• often: individual hosts, star-connected into switch
  – Ethernet, but no collisions!

Ethernet Switches
• cut-through switching: frame forwarded from input to output port without waiting for assembly of the entire frame
  – slight reduction in latency
• combinations of shared/dedicated, 10/100/1000 Mbps interfaces

Ethernet Switches (more)

[Figure: switch with dedicated and shared segments]

IEEE 802.11 Wireless LAN
• wireless LANs: untethered (often mobile) networking
• IEEE 802.11 standard:
  – MAC protocol
  – unlicensed frequency spectrum: 900 MHz, 2.4 GHz
• Basic Service Set (BSS) (a.k.a. “cell”) contains:
  – wireless hosts
  – access point (AP): base station
• BSSs combined to form distribution system (DS)

Ad Hoc Networks
• Ad hoc network: IEEE 802.11 stations can dynamically form network without AP
• Applications:
  – “laptop” meeting in conference room, car
  – interconnection of “personal” devices
  – battlefield

• IETF MANET (Mobile Ad hoc Networks) working group

IEEE 802.11 MAC Protocol: CSMA/CA

802.11 CSMA sender:
- if channel sensed idle for DIFS seconds, then transmit entire frame (no collision detection)
- if channel sensed busy, then binary backoff

802.11 CSMA receiver:
- if received OK, return ACK after SIFS

IEEE 802.11 MAC Protocol
802.11 CSMA Protocol: others
• NAV: Network Allocation Vector
• 802.11 frame has transmission time field
• others (hearing data) defer access for NAV time units

Hidden Terminal effect
• hidden terminals: A, C cannot hear each other
  – obstacles, signal attenuation
  – collisions at B
• goal: avoid collisions at B
• CSMA/CA: CSMA with Collision Avoidance

Collision Avoidance: RTS-CTS exchange

• CSMA/CA: explicit channel reservation
  – sender: send short RTS: request to send
  – receiver: reply with short CTS: clear to send
• CTS reserves channel for sender, notifying (possibly hidden) stations
• avoid hidden station collisions

Collision Avoidance: RTS-CTS exchange

• RTS and CTS short:
  – collisions less likely, of shorter duration
  – end result similar to collision detection
• IEEE 802.11 allows:
  – CSMA
  – CSMA/CA: reservations
  – polling from AP

Point to Point Data Link Control

• one sender, one receiver, one link: easier than broadcast link:
  – no Media Access Control
  – no need for explicit MAC addressing
  – e.g., dialup link, ISDN line
• popular point-to-point DLC protocols:
  – PPP (point-to-point protocol)
  – HDLC: High-level Data Link Control (data link used to be considered a “high layer” in the protocol stack!)

PPP Design Requirements [RFC 1547]
• packet framing: encapsulation of network-layer datagram in data link frame
  – carry network layer data of any network layer protocol (not just IP) at same time
  – ability to demultiplex upwards
• bit transparency: must carry any bit pattern in the data field
• error detection (no correction)
• connection liveness: detect, signal link failure to network layer
• network layer address negotiation: endpoints can learn/configure each other’s network address

PPP non-requirements

• no error correction/recovery
• no flow control
• out of order delivery OK
• no need to support multipoint links (e.g., polling)

Error recovery, flow control, data re-ordering all relegated to higher layers!

PPP Data Frame
• Flag: delimiter (framing)
• Address: does nothing (only one option)
• Control: does nothing; in the future possible multiple control fields
• Protocol: upper layer protocol to which frame delivered (e.g., PPP-LCP, IP, IPCP, etc.)

PPP Data Frame
• info: upper layer data being carried
• check: cyclic redundancy check for error detection

Byte Stuffing
• “data transparency” requirement: data field must be allowed to include flag pattern 01111110
  – Q: is received 01111110 data or flag?
• Sender: adds (“stuffs”) extra 01111110 byte after each 01111110 data byte
• Receiver:
  – two 01111110 bytes in a row: discard first byte, continue data reception
  – single 01111110: flag byte

Byte Stuffing

[Figure: flag byte pattern in data to send; flag byte pattern plus stuffed byte in transmitted data]
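This minimal Python sketch implements the byte-stuffing scheme exactly as described on the slide: every flag byte that appears in the data is doubled. It assumes one frame per call to `deframe`. PPP as standardized in RFC 1662 works differently: it escapes a flag byte with the control-escape byte 0x7D followed by the original byte XORed with 0x20. So this is the slide's simplified scheme, not PPP's wire format.

```python
FLAG = 0x7E  # 01111110


def stuff(data: bytes) -> bytes:
    """Sender: after every data byte equal to the flag, insert an extra flag byte."""
    out = bytearray()
    for b in data:
        out.append(b)
        if b == FLAG:
            out.append(FLAG)
    return bytes(out)


def frame(data: bytes) -> bytes:
    """Delimit the stuffed data with opening and closing flags."""
    return bytes([FLAG]) + stuff(data) + bytes([FLAG])


def deframe(stream: bytes) -> bytes:
    """Receiver: two flag bytes in a row -> one data byte; a single flag -> end of frame."""
    if not stream or stream[0] != FLAG:
        raise ValueError("frame must start with a flag byte")
    data = bytearray()
    i = 1
    while i < len(stream):
        b = stream[i]
        if b != FLAG:
            data.append(b)
            i += 1
        elif i + 1 < len(stream) and stream[i + 1] == FLAG:
            data.append(FLAG)  # stuffed pair: discard the first, keep the data byte
            i += 2
        else:
            return bytes(data)  # single flag: closing delimiter
    raise ValueError("no closing flag found")
```

Scanning left to right resolves the ambiguity. Data ending in 0x7E is sent as three flags in a row: the first pair is data and the third is the closing flag.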

PPP Data Control Protocol
Before exchanging network-layer data, data link peers must
• configure PPP link (max. frame length, authentication)
• learn/configure network layer information
  – for IP: carry IP Control Protocol (IPCP) msgs (protocol field: 8021) to configure/learn IP address

Asynchronous Transfer Mode: ATM
• 1980s/1990s standard for high-speed (155 Mbps to 622 Mbps and higher) Broadband Integrated Service Digital Network architecture
• Goal: integrated, end-end transport to carry voice, video, data
  – meeting timing/QoS requirements of voice, video (versus Internet best-effort model)
  – “next generation” telephony: technical roots in telephone world
  – packet-switching (fixed length packets, called “cells”) using virtual circuits

ATM architecture

• adaptation layer: only at edge of ATM network
  – data segmentation/reassembly
  – roughly analogous to Internet transport layer
• ATM layer: “network” layer
  – cell switching, routing
• physical layer

ATM: network or link layer?
Vision: end-to-end transport: “ATM from desktop to desktop”
  – ATM is a network technology
Reality: used to connect IP backbone routers
  – “IP over ATM”
  – ATM as switched link layer, connecting IP routers

ATM Adaptation Layer (AAL)
• ATM Adaptation Layer (AAL): “adapts” upper layers (IP or native ATM applications) to ATM layer below
• AAL present only in end systems, not in switches
• AAL layer segment (header/trailer fields, data) fragmented across multiple ATM cells
  – analogy: TCP segment in many IP packets

ATM Adaptation Layer (AAL) [more]
Different versions of AAL layers, depending on ATM service class:
• AAL1: for CBR (Constant Bit Rate) services, e.g., circuit emulation
• AAL2: for VBR (Variable Bit Rate) services, e.g., MPEG video
• AAL5: for data (e.g., IP datagrams)

[Figure: user data carried in an AAL PDU, which is carried in ATM cells]

AAL5 - Simple And Efficient AL (SEAL)

• AAL5: low overhead AAL used to carry IP datagrams
  – 4-byte cyclic redundancy check
  – PAD ensures payload multiple of 48 bytes
  – large AAL5 data unit to be fragmented into 48-byte ATM cells
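The padding AAL5 needs can be computed directly. This Python sketch assumes the standard 8-byte AAL5 trailer (1-byte UU, 1-byte CPI, 2-byte length, 4-byte CRC); the slide mentions only the CRC. `aal5_layout` is a hypothetical helper.

```python
CELL_PAYLOAD = 48  # bytes of payload per ATM cell
CELL_SIZE = 53     # 5-byte header + 48-byte payload
TRAILER = 8        # AAL5 trailer: UU (1), CPI (1), Length (2), CRC-32 (4)


def aal5_layout(datagram_len: int) -> dict:
    """Pad so that data + PAD + trailer fills a whole number of 48-byte cell payloads."""
    cells = -(-(datagram_len + TRAILER) // CELL_PAYLOAD)  # ceiling division
    pad = cells * CELL_PAYLOAD - datagram_len - TRAILER
    return {
        "cells": cells,
        "pad": pad,
        "bytes_on_wire": cells * CELL_SIZE,
        "efficiency": datagram_len / (cells * CELL_SIZE),
    }
```

Take a 1500-byte IP datagram. With the trailer it is 1508 bytes, so it needs 32 cells: 1536 bytes of payload, including 28 bytes of pad. That is 1696 bytes on the wire, about 88% efficiency.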

ATM Layer
Service: transport cells across ATM network
• analogous to IP network layer
• very different services than IP network layer

The Bandwidth, Loss, Order and Timing columns answer “Guarantees?”:

Network Architecture | Service Model | Bandwidth          | Loss | Order | Timing | Congestion feedback
Internet             | best effort   | none               | no   | no    | no     | no (inferred via loss)
ATM                  | CBR           | constant rate      | yes  | yes   | yes    | no congestion
ATM                  | VBR           | guaranteed rate    | yes  | yes   | yes    | no congestion
ATM                  | ABR           | guaranteed minimum | no   | yes   | no     | yes
ATM                  | UBR           | none               | no   | yes   | no     | no

ATM Layer: Virtual Circuits
• VC transport: cells carried on VC from source to dest
  – call setup, teardown for each call before data can flow
  – each packet carries VC identifier (not destination ID)
  – every switch on source-dest path maintains “state” for each passing connection
  – link, switch resources (bandwidth, buffers) may be allocated to VC: to get circuit-like performance
• Permanent VCs (PVCs)
  – long lasting connections
  – typically: “permanent” route between two IP routers
• Switched VCs (SVCs):
  – dynamically set up on per-call basis

ATM VCs
• Advantages of ATM VC approach:
  – QoS performance guarantee for connection mapped to VC (bandwidth, delay, delay jitter)
• Drawbacks of ATM VC approach:
  – Inefficient support of datagram traffic
  – one PVC between each source/dest pair does not scale (N^2 connections needed)
  – SVC introduces call setup latency, processing overhead for short-lived connections

ATM Layer: ATM cell
• 5-byte ATM cell header
• 48-byte payload
  – Why? Small payload -> short cell-creation delay for digitized voice
  – halfway between 32 and 64 (compromise!)
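A worked example of the cell-creation-delay argument. It assumes 64 kbps PCM voice, the standard telephone rate, which is not stated on the slide. The time to fill one payload, compared with the two sizes the compromise sat between, is:

```latex
t_{\text{fill}} = \frac{48 \times 8\ \text{bits}}{64{,}000\ \text{bits/s}} = 6\ \text{ms},
\qquad
t_{\text{fill}}^{(32)} = 4\ \text{ms},
\qquad
t_{\text{fill}}^{(64)} = 8\ \text{ms}
```

The price of a small payload is header overhead: 5/53, or about 9.4% of every cell.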

[Figure: cell header; cell format]

ATM cell header
• VCI: virtual channel ID
  – will change from link to link through the net
• PT: Payload type (e.g., RM cell versus data cell)
• CLP: Cell Loss Priority bit
  – CLP = 1 implies low priority cell, can be discarded if congestion
• HEC: Header Error Checksum
  – cyclic redundancy check
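This Python sketch splits a 5-byte cell header into its fields and checks the HEC. It assumes the UNI header layout: 4-bit GFC, 8-bit VPI, 16-bit VCI, 3-bit PT, 1-bit CLP, 8-bit HEC. It also assumes the ITU-T I.432 HEC definition: CRC-8 with generator x^8 + x^2 + x + 1 over the first four bytes, XORed with 0x55. The slide names only VCI, PT, CLP and HEC. As a check, the standard idle-cell header 00 00 00 01 has HEC 0x52. The function names are hypothetical.

```python
def hec(first4: bytes) -> int:
    """CRC-8 (generator x^8 + x^2 + x + 1) over the first 4 header bytes, XORed with 0x55."""
    crc = 0
    for byte in first4:
        crc ^= byte
        for _ in range(8):
            crc = ((crc << 1) ^ 0x07) & 0xFF if crc & 0x80 else (crc << 1) & 0xFF
    return crc ^ 0x55


def parse_uni_header(h: bytes) -> dict:
    """Split a 5-byte UNI cell header into GFC, VPI, VCI, PT, CLP and check the HEC."""
    if len(h) != 5:
        raise ValueError("ATM cell header is 5 bytes")
    word = int.from_bytes(h[:4], "big")
    return {
        "gfc": word >> 28,
        "vpi": (word >> 20) & 0xFF,
        "vci": (word >> 4) & 0xFFFF,
        "pt": (word >> 1) & 0x7,
        "clp": word & 0x1,  # 1 = low priority, discardable under congestion
        "hec_ok": hec(h[:4]) == h[4],
    }
```

Because the HEC covers only the header, a switch can verify it cell by cell. It can then rewrite the VCI for the next link and recompute the HEC.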

ATM Physical Layer (more)
Two pieces (sublayers) of physical layer:
• Transmission Convergence Sublayer (TCS): adapts ATM layer above to PMD sublayer below
• Physical Medium Dependent: depends on physical medium being used

TCS Functions:
  – Header checksum generation: 8-bit CRC
  – Cell delineation
  – With “unstructured” PMD sublayer, transmission of idle cells when no data cells to send

ATM Physical Layer
Physical Medium Dependent (PMD) sublayer
• SONET/SDH: transmission frame structure (like a container carrying bits)
  – bit synchronization
  – bandwidth partitions (TDM)
  – several speeds: OC1 = 51.84 Mbps; OC3 = 155.52 Mbps; OC12 = 622.08 Mbps
• T1/T3: transmission frame structure (old telephone hierarchy): 1.5 Mbps / 45 Mbps
• unstructured: just cells (busy/idle)

IP-Over-ATM
Classic IP only
• 3 “networks” (e.g., LAN segments)
• MAC (802.3) and IP addresses

IP over ATM
• replace “network” (e.g., LAN segment) with ATM network
• ATM addresses, IP addresses

[Figure: Ethernet LANs connected through an ATM network]

IP-Over-ATM
Issues:
• IP datagrams into ATM AAL5 PDUs
• from IP addresses to ATM addresses
  – just like IP addresses to 802.3 MAC addresses!

[Figure: Ethernet LANs connected through an ATM network]

Datagram Journey in IP-over-ATM Network

• at Source Host:
  – IP layer finds mapping between IP, ATM dest address (using ARP)
  – passes datagram to AAL5
  – AAL5 encapsulates data, segments into cells, passes to ATM layer
• ATM network: moves cell along VC to destination
• at Destination Host:
  – AAL5 reassembles cells into original datagram
  – if CRC OK, datagram is passed to IP

ARP in ATM Nets
• ATM network needs destination ATM address
  – just like Ethernet needs destination Ethernet address
• IP/ATM address translation done by ATM ARP (Address Resolution Protocol)
  – ARP server in ATM network performs broadcast of ATM ARP translation request to all connected ATM devices
  – hosts can register their ATM addresses with server to avoid lookup

X.25 and Frame Relay
Like ATM:
• wide area network technologies
• virtual circuit oriented
• origins in telephony world
• can be used to carry IP datagrams
  – can thus be viewed as link layers by IP protocol

X.25
• X.25 builds VC between source and destination for each user connection
• Per-hop control along path
  – error control (with retransmissions) on each hop using LAP-B
    • variant of the HDLC protocol
  – per-hop flow control using credits
    • congestion arising at intermediate node propagates to previous node on path
    • back to source via back pressure

IP versus X.25
• X.25: reliable in-sequence end-end delivery from end-to-end
  – “intelligence in the network”
• IP: unreliable, out-of-sequence end-end delivery
  – “intelligence in the endpoints”
• gigabit routers: limited processing possible
• 2000: IP wins

Frame Relay
• Designed in late ‘80s, widely deployed in the ‘90s
• Frame relay service:
  – no error control
  – end-to-end congestion control

Frame Relay (more)
• Designed to interconnect corporate customer LANs
  – typically permanent VCs: “pipe” carrying aggregate traffic between two routers
  – switched VCs: as in ATM
• corporate customer leases FR service from public Frame Relay network (e.g., Sprint, AT&T)

Frame Relay (more)

• Flag bits, 01111110, delimit frame
• address:
  – 10-bit VC ID field
  – 3 congestion control bits
    • FECN: forward explicit congestion notification (frame experienced congestion on path)
    • BECN: congestion on reverse path
    • DE: discard eligibility

[Figure: frame format: flags | address | data | CRC | flags]

Frame Relay - VC Rate Control
• Committed Information Rate (CIR)
  – defined, “guaranteed” for each VC
  – negotiated at VC set up time
  – customer pays based on CIR
• DE bit: Discard Eligibility bit
  – Edge FR switch measures traffic rate for each VC; marks DE bit
  – DE = 0: high priority, rate compliant frame; deliver at “all costs”
  – DE = 1: low priority, eligible for discard when congestion

Frame Relay - CIR & Frame Marking

• Access Rate: rate R of the access link between source router (customer) and edge FR switch (provider); R between 64 Kbps and 1,544 Kbps
• Typically, many VCs (one per destination router) are multiplexed on the same access trunk; each VC has its own CIR
• Edge FR switch measures traffic rate for each VC; it marks (i.e., sets DE = 1) frames which exceed CIR (these may later be dropped)
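The DE marking rule can be sketched as a per-VC budget check. The Python below is a simplification under stated assumptions: time is split into fixed measurement intervals, and each interval may carry CIR × interval bits with DE = 0. Once an interval's budget is exhausted, further frames in that interval are marked DE = 1. Real Frame Relay policing is defined by a committed burst size Bc over an interval Tc (CIR = Bc/Tc), plus an excess burst Be that this sketch does not model. `mark_frames` is a hypothetical name.

```python
from __future__ import annotations


def mark_frames(frames: list[tuple[float, int]], cir_bps: float, interval_s: float) -> list[int]:
    """Return the DE bit for each (arrival_time_s, size_bytes) frame of one VC.

    Simplified policing (an assumption): each measurement interval may carry
    cir_bps * interval_s bits with DE = 0; every arriving bit counts against the
    budget, and frames arriving once it is exceeded are marked DE = 1.
    """
    budget_bits = cir_bps * interval_s
    de_bits = []
    current_interval = None
    used_bits = 0.0
    for arrival_s, size_bytes in frames:
        interval = int(arrival_s // interval_s)
        if interval != current_interval:
            current_interval, used_bits = interval, 0.0  # fresh budget each interval
        used_bits += size_bytes * 8
        de_bits.append(0 if used_bits <= budget_bits else 1)
    return de_bits
```

With a 64 kbps CIR and 1-second intervals, a VC may send 8000 bytes per second unmarked. A third frame that pushes the total past that in the same second is marked discard-eligible rather than dropped.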

Summary
• principles behind data link layer services:
  – error detection, correction
  – sharing a broadcast channel: multiple access
  – link layer addressing, ARP
• various link layer technologies
  – Ethernet
  – hubs, bridges, switches
  – IEEE 802.11 LANs
  – PPP
  – ATM
  – X.25, Frame Relay

Network Layer: Internet Protocol

Internetworking
• Motivation
  – Heterogeneity
  – Scale
• IP is the glue that connects heterogeneous networks, giving the illusion of a homogeneous one.
• Salient Features
  – Each host is identified by a unique 32-bit identifier.
  – Best Effort Service Model
  – Global Addressing Scheme

Network layer functions
• transport packet from sending to receiving hosts
• network layer protocols in every host, router

three important functions:
• path determination: route taken by packets from source to dest. (routing algorithms)
• switching: move packets from router’s input to appropriate router output
• call setup: some network architectures require router call setup along path before data flows

[Figure: two end hosts (application, transport, network, data link, physical) connected through routers (network, data link, physical)]

The Internet Model
• no call setup at network layer
• routers: no state about end-to-end connections
  – no network-level concept of “connection”
• packets typically routed using destination host ID
  – packets between same source-dest pair may take different paths

[Figure: two end hosts (application, transport, network, data link, physical): 1. Send data, 2. Receive data]

The Internet: Service Model
• Connectionless
  – Datagram based
• Best-effort delivery (unreliable service)
  – packets may be lost
  – packets may be delivered out of order
  – duplicate copies of a packet may be delivered
  – packets can be delayed for a long time

IP Internet

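• Concatenation of Networks
• Protocol Stack

[Figure: hosts H1 to H8 on Network 1 (Ethernet), Network 2 (Ethernet), Network 3 (FDDI) and Network 4 (point-to-point), connected by routers R1, R2 and R3. Protocol stacks along the path from H1 to H8: H1 (TCP/IP/ETH), R1 (IP over ETH and FDDI), R2 (IP over FDDI and PPP), R3 (IP over PPP and ETH), H8 (TCP/IP/ETH)]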

IP datagram format (each header row is 32 bits wide)

• ver | head. len | type of service | length
• 16-bit identifier | flgs | fragment offset
• time to live | upper layer | Internet checksum
• 32-bit source IP address
• 32-bit destination IP address
• Options (if any), e.g. timestamp, record route taken, specify list of routers to visit
• data (variable length, typically a TCP or UDP segment)

Field notes:
– ver: IP protocol version number
– head. len: header length (in 32-bit words)
– type of service: "type" of data
– length: total datagram length (bytes)
– 16-bit identifier, flgs, fragment offset: used for fragmentation/reassembly
– time to live: max number of remaining hops (decremented at each router)
– upper layer: upper-layer protocol to deliver the payload to
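To make the field layout concrete, here is a minimal Python sketch (an illustration, not code from the course) that decodes the fixed 20-byte part of a header with the standard `struct` module and checks the Internet checksum. The helper names `internet_checksum` and `parse_ipv4_header` are hypothetical, and options are not decoded.

```python
import struct

def internet_checksum(data: bytes) -> int:
    """Complemented one's-complement sum of 16-bit words (the Internet checksum)."""
    if len(data) % 2:
        data += b"\x00"                           # pad to a whole number of 16-bit words
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF

def parse_ipv4_header(packet: bytes) -> dict:
    """Decode the fixed part of an IPv4 header; options (if any) are skipped."""
    (ver_ihl, tos, total_len, ident, flags_frag,
     ttl, proto, _csum, src, dst) = struct.unpack("!BBHHHBBH4s4s", packet[:20])
    header_len = (ver_ihl & 0x0F) * 4             # head. len counts 32-bit words
    return {
        "version": ver_ihl >> 4,
        "header_len": header_len,
        "tos": tos,
        "total_length": total_len,
        "identifier": ident,
        "flags": flags_frag >> 13,                # the 3 flag bits
        "fragment_offset": (flags_frag & 0x1FFF) * 8,  # field counts 8-byte units
        "ttl": ttl,
        "protocol": proto,                        # e.g. 6 = TCP, 17 = UDP
        # a header that already holds a correct checksum sums to 0xFFFF,
        # so the complemented result is 0
        "checksum_ok": internet_checksum(packet[:header_len]) == 0,
        "src": ".".join(map(str, src)),
        "dst": ".".join(map(str, dst)),
    }
```

The checksum covers only the header, which is why every router must recompute it after decrementing the time to live.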

IP Fragmentation & Reassembly
• network links have an MTU (max. transfer size): the largest possible link-level frame
– different link types have different MTUs
• a large IP datagram is divided ("fragmented") within the net
– one datagram becomes several datagrams
– fragments are "reassembled" only at the final destination
– IP header bits are used to identify and order related fragments

[Figure: fragmentation: in, one large datagram; out, 3 smaller datagrams. Reassembly happens at the destination.]

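IP Fragmentation and Reassembly

Example: a 4000-byte datagram (20-byte header plus 3980 bytes of data) crosses a link whose MTU is 1500 bytes.

One large datagram:
  ID=x  offset=0     fragflag=0  length=4000

becomes several smaller datagrams:
  ID=x  offset=0     fragflag=1  length=1500
  ID=x  offset=1480  fragflag=1  length=1500
  ID=x  offset=2960  fragflag=0  length=1040

Offsets are shown in bytes; the header's fragment offset field counts 8-byte units, so 1480 is carried as 185 and 2960 as 370.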
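The arithmetic behind this example can be sketched in Python. The function `fragment` is a hypothetical helper; it assumes a 20-byte header with no options and a datagram that is not already a fragment.

```python
def fragment(total_len: int, mtu: int, header_len: int = 20):
    """Split one IPv4 datagram into fragments that fit the MTU.

    Returns (offset_in_bytes, more_fragments_flag, fragment_length) tuples.
    Assumes every fragment carries a header of the same length as the original.
    """
    data_left = total_len - header_len
    # every fragment except the last must carry a multiple of 8 data bytes,
    # because the offset field counts 8-byte units
    max_data = (mtu - header_len) // 8 * 8
    fragments, offset = [], 0
    while data_left > 0:
        chunk = min(max_data, data_left)
        data_left -= chunk
        more = 1 if data_left > 0 else 0
        fragments.append((offset, more, chunk + header_len))
        offset += chunk
    return fragments

for off, flag, length in fragment(4000, 1500):
    print(f"offset={off} (field value {off // 8}) fragflag={flag} length={length}")
```

Called with 4000 and 1500, it yields the same three fragments as the example, and a datagram that already fits the MTU comes back as a single, unfragmented piece.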

ICMP: Internet Control Message Protocol

• used by hosts, routers and gateways to communicate network-level information
– error reporting: unreachable host, network, port, protocol
– echo request/reply (used by ping)
• network layer "above" IP:
– ICMP msgs are carried in IP datagrams
• ICMP message: type, code, plus the IP header and first 8 bytes of data of the datagram causing the error

Type  Code  description
0     0     echo reply (ping)
3     0     dest. network unreachable
3     1     dest host unreachable
3     2     dest protocol unreachable
3     3     dest port unreachable
3     6     dest network unknown
3     7     dest host unknown
4     0     source quench (congestion control, not used)
8     0     echo request (ping)
9     0     route advertisement
10    0     router discovery
11    0     TTL expired
12    0     bad IP header
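As an illustration of an ICMP message riding on IP, the sketch below builds the bytes of an echo request (type 8, code 0), the message ping sends. It is a hedged example: the function names are hypothetical, the identifier and sequence fields follow RFC 792, and actually sending the bytes would need a raw socket, which usually requires administrator privileges.

```python
import struct

def checksum(data: bytes) -> int:
    """Internet checksum: complemented one's-complement sum of 16-bit words."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                        # fold carries into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

ECHO_REQUEST, ECHO_REPLY = 8, 0               # ICMP types (both use code 0)

def icmp_echo_request(ident: int, seq: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request: type, code, checksum, identifier, sequence, data."""
    header = struct.pack("!BBHHH", ECHO_REQUEST, 0, 0, ident, seq)  # checksum 0 while summing
    csum = checksum(header + payload)
    return struct.pack("!BBHHH", ECHO_REQUEST, 0, csum, ident, seq) + payload

msg = icmp_echo_request(0x1234, 1, b"hello")
print(len(msg), checksum(msg))                # 13 0
```

A receiver validates the message the same way: summing the whole message, checksum included, gives a complemented result of 0.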

IP Addressing: Introduction

• IP address: 32-bit identifier for a host or router interface

• interface: connection between a host or router and a physical link
– routers typically have multiple interfaces
– a host may have multiple interfaces
– IP addresses are associated with interfaces, not with hosts or routers

[Figure: a router with interfaces 223.1.1.4, 223.1.2.9 and 223.1.3.27 joins three networks: hosts 223.1.1.1, 223.1.1.2, 223.1.1.3; hosts 223.1.2.1, 223.1.2.2; hosts 223.1.3.1, 223.1.3.2]

223.1.1.1 = 11011111 00000001 00000001 00000001
            (223)    (1)      (1)      (1)

IP Addressing
• IP address:
– network part (high-order bits)
– host part (low-order bits)
• What's a network? (from the IP address perspective)
– device interfaces with the same network part of the IP address
– can physically reach each other without an intervening router

[Figure: the same three LANs and router as on the previous slide]

A network consisting of 3 IP networks (for IP addresses starting with 223, the first 24 bits are the network address)
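The network-part test can be sketched with Python's standard `ipaddress` module. The /24 prefix length is taken from the note above about addresses starting with 223; the helper names `to_binary` and `same_network` are hypothetical.

```python
import ipaddress

def to_binary(addr: str) -> str:
    """Dotted-decimal address -> four space-separated 8-bit groups."""
    return " ".join(f"{int(octet):08b}" for octet in addr.split("."))

def same_network(a: str, b: str, prefix_len: int = 24) -> bool:
    """True if a and b share the same network part (the first prefix_len bits)."""
    network = ipaddress.ip_interface(f"{a}/{prefix_len}").network
    return ipaddress.ip_address(b) in network

print(to_binary("223.1.1.1"))                  # 11011111 00000001 00000001 00000001
print(same_network("223.1.1.1", "223.1.1.3"))  # True: same LAN, no router needed
print(same_network("223.1.1.1", "223.1.2.2"))  # False: a router is in between
```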

IP Addressing
How to find the networks?
• Detach each interface from its router or host
• create "islands" of isolated networks

[Figure: LANs 223.1.1.x, 223.1.2.x and 223.1.3.x, plus three router-to-router links numbered 223.1.7.0/223.1.7.1, 223.1.8.0/223.1.8.1 and 223.1.9.1/223.1.9.2]

Interconnected system consisting of six networks

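IP Addresses

Given the notion of "network", let's re-examine IP addresses:

"class-full" addressing (32-bit addresses):

class  leading bits  layout                         range
A      0             network (8 bits), host (24)    1.0.0.0 to 127.255.255.255
B      10            network (16 bits), host (16)   128.0.0.0 to 191.255.255.255
C      110           network (24 bits), host (8)    192.0.0.0 to 223.255.255.255
D      1110          multicast address              224.0.0.0 to 239.255.255.255

(the leading bits count as part of the network field)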
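A small Python sketch of the classful rule: only the leading bits of the first byte decide the class. The function name `address_class` is hypothetical, and addresses beginning with 1111 (outside the table) are simply reported as "other".

```python
def address_class(addr: str) -> str:
    """Classful category of an IPv4 address, decided by its leading bits."""
    first = int(addr.split(".")[0])
    if first >> 7 == 0b0:
        return "A"       # 0xxxxxxx
    if first >> 6 == 0b10:
        return "B"       # 10xxxxxx
    if first >> 5 == 0b110:
        return "C"       # 110xxxxx
    if first >> 4 == 0b1110:
        return "D"       # 1110xxxx: multicast
    return "other"       # 1111xxxx: not covered by the table

print(address_class("223.1.1.1"))   # C
```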

IP addressing: CIDR
• classful addressing:
– inefficient use of address space, address space exhaustion
– e.g., a class B net is allocated enough addresses for 65K hosts, even if only 2K hosts are in that network

• CIDR: Classless InterDomain Routing
– the network portion of an address has arbitrary length
– address format: a.b.c.d/x, where x is the # of bits in the network portion of the address

Example, 200.23.16.0/23:
11001000 00010111 0001000 | 0 00000000
network part (23 bits)    | host part (9 bits)
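Python's `ipaddress` module can show what a CIDR prefix such as 200.23.16.0/23 covers; the wrapper `describe` is a hypothetical helper for this illustration.

```python
import ipaddress

def describe(prefix: str):
    """Netmask, first and last address, and size of a CIDR block a.b.c.d/x."""
    net = ipaddress.ip_network(prefix)
    return (str(net.netmask), str(net.network_address),
            str(net.broadcast_address), net.num_addresses)

print(describe("200.23.16.0/23"))
# ('255.255.254.0', '200.23.16.0', '200.23.17.255', 512)
```

The /23 holds 2^9 = 512 addresses, compared with the 65,536 of a class B network.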

IP addresses: how to get one?

Hosts (host portion):
• hard-coded by the system admin in a file
• DHCP: Dynamic Host Configuration Protocol: dynamically get an address ("plug-and-play")
– host broadcasts a "DHCP discover" msg
– DHCP server responds with a "DHCP offer" msg
– host requests an IP address: "DHCP request" msg
– DHCP server sends the address: "DHCP ack" msg
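The four DHCP messages can be mimicked with a toy in-memory server. This is a hypothetical model for intuition only: the class name, the client identifiers and the 223.1.1.0/29 pool are assumptions, and real DHCP runs over UDP broadcasts with leases and timers that are omitted here.

```python
import ipaddress

class ToyDhcpServer:
    """Toy model of the discover/offer/request/ack exchange (no packets or timers)."""

    def __init__(self, pool: str):
        self.free = list(ipaddress.ip_network(pool).hosts())  # unassigned addresses
        self.offers = {}   # client id -> offered address
        self.leases = {}   # client id -> acknowledged address

    def discover(self, client: str):
        """Handle "DHCP discover": return the offered address (None if the pool is empty)."""
        if client in self.leases:
            return self.leases[client]
        if client not in self.offers and self.free:
            self.offers[client] = self.free.pop(0)
        return self.offers.get(client)

    def request(self, client: str, addr):
        """Handle "DHCP request": return the address as the "DHCP ack", else None."""
        if self.leases.get(client) == addr:
            return addr
        if self.offers.get(client) == addr:
            self.leases[client] = self.offers.pop(client)
            return addr
        return None

server = ToyDhcpServer("223.1.1.0/29")
offer = server.discover("host-1")              # DHCP discover -> DHCP offer
print(offer, server.request("host-1", offer))  # 223.1.1.1 223.1.1.1
```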

IP addresses: how to get one?
Network (network portion):
• get an allocated portion of the ISP's address space:

ISP's block      11001000 00010111 00010000 00000000   200.23.16.0/20
Organization 0   11001000 00010111 00010000 00000000   200.23.16.0/23
Organization 1   11001000 00010111 00010010 00000000   200.23.18.0/23
Organization 2   11001000 00010111 00010100 00000000   200.23.20.0/23
...              ...                                   ...
Organization 7   11001000 00010111 00011110 00000000   200.23.30.0/23

Hierarchical addressing: route aggregation

Hierarchical addressing allows efficient advertisement of routing information:

[Figure: Organization 0 (200.23.16.0/23), Organization 1 (200.23.18.0/23), Organization 2 (200.23.20.0/23), ..., Organization 7 (200.23.30.0/23) attach to Fly-By-Night-ISP, which tells the Internet: "Send me anything with addresses beginning 200.23.16.0/20". ISPs-R-Us tells the Internet: "Send me anything with addresses beginning 199.31.0.0/16".]
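The allocation and aggregation above can be checked with Python's `ipaddress` module: `subnets()` carves the ISP's /20 into eight /23 blocks, and `collapse_addresses()` merges them back into the single /20 the ISP advertises. This is a sketch, not a description of any ISP's tooling.

```python
import ipaddress

isp_block = ipaddress.ip_network("200.23.16.0/20")

# Allocation: carve the ISP's /20 into eight /23 blocks, one per organization.
org_blocks = list(isp_block.subnets(new_prefix=23))
for i, block in enumerate(org_blocks):
    print(f"Organization {i}: {block}")

# Aggregation: the eight /23s collapse back into the one /20 the ISP advertises.
aggregate = list(ipaddress.collapse_addresses(org_blocks))
print(aggregate)   # [IPv4Network('200.23.16.0/20')]
```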

Hierarchical addressing: more specific routes

ISPs-R-Us has a more specific route to Organization 1

[Figure: as before, but Organization 1 (200.23.18.0/23) now attaches to ISPs-R-Us. Fly-By-Night-ISP still advertises "Send me anything with addresses beginning 200.23.16.0/20"; ISPs-R-Us advertises "Send me anything with addresses beginning 199.31.0.0/16 or 200.23.18.0/23".]
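With both advertisements installed, a router holds two prefixes that cover Organization 1's addresses. Routers resolve such overlaps by longest prefix matching. Below is a simple linear-scan sketch; the table and helper name are illustrative, and real routers use specialized lookup structures such as tries.

```python
import ipaddress

# Advertisements from the figure: prefix -> where to send matching traffic
ROUTES = {
    "200.23.16.0/20": "Fly-By-Night-ISP",
    "199.31.0.0/16": "ISPs-R-Us",
    "200.23.18.0/23": "ISPs-R-Us",
}

def longest_prefix_match(routes, dest):
    """Return the next hop of the most specific prefix containing dest, or None."""
    addr = ipaddress.ip_address(dest)
    best_net, best_hop = None, None
    for prefix, hop in routes.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, hop
    return best_hop

print(longest_prefix_match(ROUTES, "200.23.18.7"))   # ISPs-R-Us (the /23 beats the /20)
print(longest_prefix_match(ROUTES, "200.23.20.7"))   # Fly-By-Night-ISP
```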

IP addressing: the last word...

Q: How does an ISP get a block of addresses?
A: ICANN: Internet Corporation for Assigned Names and Numbers
– allocates addresses
– manages DNS
– assigns domain names, resolves disputes

Getting a datagram from source to dest.

[Figure: host A (223.1.1.1) and host B (223.1.1.3) share network 223.1.1; host E (223.1.2.2) is on network 223.1.2; the router has interfaces 223.1.1.4, 223.1.2.9 and 223.1.3.27]

IP datagram: misc fields | source IP addr | dest IP addr | data

• the datagram remains unchanged as it travels from source to destination
• the addr fields are of interest here

Routing table in A:
Dest. Net.  next router  Nhops
223.1.1     -            1
223.1.2     223.1.1.4    2
223.1.3     223.1.1.4    2

Getting a datagram from source to dest.

[Figure: same network as on the previous slide]

Starting at A, given an IP datagram addressed to B:
• look up the net. address of B
• find that B is on the same net. as A
• the link layer will send the datagram directly to B inside a link-layer frame
– B and A are directly connected

Routing table in A:
Dest. Net.  next router  Nhops
223.1.1     -            1
223.1.2     223.1.1.4    2
223.1.3     223.1.1.4    2

Datagram: misc fields | 223.1.1.1 | 223.1.1.3 | data

Getting a datagram from source to dest.

[Figure: same network as on the previous slide]

Starting at A, dest. E:
• look up the network address of E
• E is on a different network
– A and E are not directly attached
• routing table: the next-hop router to E is 223.1.1.4
• the link layer sends the datagram to router 223.1.1.4 inside a link-layer frame
• the datagram arrives at 223.1.1.4
• continued...

Routing table in A:
Dest. Net.  next router  Nhops
223.1.1     -            1
223.1.2     223.1.1.4    2
223.1.3     223.1.1.4    2

Datagram: misc fields | 223.1.1.1 | 223.1.2.2 | data

Getting a datagram from source to dest.

[Figure: same network as on the previous slide]

Arriving at 223.1.1.4, destined for 223.1.2.2:
• look up the network address of E
• E is on the same network as the router's interface 223.1.2.9
– router and E are directly attached
• the link layer sends the datagram to 223.1.2.2 inside a link-layer frame via interface 223.1.2.9
• the datagram arrives at 223.1.2.2!!!

Datagram: misc fields | 223.1.1.1 | 223.1.2.2 | data

Routing table in the router:
Dest. network  next router  Nhops  interface
223.1.1        -            1      223.1.1.4
223.1.2        -            1      223.1.2.9
223.1.3        -            1      223.1.3.27
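The three forwarding steps can be traced with a small Python sketch. The tables are transcribed from the slides under two assumptions: networks are matched on their first 24 bits, and A's table gets an outgoing-interface column (A's only interface, 223.1.1.1) so that both tables share one shape. The names are hypothetical.

```python
# network (first 24 bits) -> (next router, outgoing interface)
# A next router of None means "directly attached: deliver to the destination itself".
TABLE_A = {
    "223.1.1": (None, "223.1.1.1"),
    "223.1.2": ("223.1.1.4", "223.1.1.1"),
    "223.1.3": ("223.1.1.4", "223.1.1.1"),
}
TABLE_ROUTER = {
    "223.1.1": (None, "223.1.1.4"),
    "223.1.2": (None, "223.1.2.9"),
    "223.1.3": (None, "223.1.3.27"),
}

def forward(table, dest: str):
    """Return (link-layer target, outgoing interface) for a datagram to dest."""
    network = dest.rsplit(".", 1)[0]          # first 24 bits: the network part
    next_router, interface = table[network]
    return (dest if next_router is None else next_router), interface

print(forward(TABLE_A, "223.1.1.3"))       # ('223.1.1.3', '223.1.1.1'): B is on A's network
print(forward(TABLE_A, "223.1.2.2"))       # ('223.1.1.4', '223.1.1.1'): E is reached via the router
print(forward(TABLE_ROUTER, "223.1.2.2"))  # ('223.1.2.2', '223.1.2.9'): router delivers to E
```

Note that only the link-layer target changes from hop to hop; the datagram's own source and destination addresses stay the same throughout.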

Internet Protocol: Routing Algorithms

Srinidhi Varadarajan

Routing

Graph abstraction for routing algorithms:
• graph nodes are routers
• graph edges are physical links
– link cost: delay, $ cost, or congestion level

Goal: determine a "good" path (sequence of routers) thru the network from source to dest.

Routing protocol

[Figure: example network with nodes A to F and link costs A-B 2, A-C 5, A-D 1, B-C 3, B-D 2, C-D 3, C-E 1, C-F 5, D-E 1, E-F 2]

• "good" path:
– typically means minimum cost path
– other def's possible

Routing Algorithm classification

Global or decentralized information?
Global:
• all routers have complete topology, link cost info
• "link state" algorithms
Decentralized:
• router knows physically-connected neighbors, link costs to neighbors
• iterative process of computation, exchange of info with neighbors
• "distance vector" algorithms

Static or dynamic?
Static:
• routes change slowly over time
Dynamic:
• routes change more quickly
– periodic update
– in response to link cost changes

A Link-State Routing Algorithm

Dijkstra's algorithm
• net topology, link costs known to all nodes
– accomplished via "link state broadcast"
– all nodes have same info
• computes least cost paths from one node ("source") to all other nodes
– gives routing table for that node
• iterative: after k iterations, know least cost paths to k dest.'s

Notation:
• c(i,j): link cost from node i to j; cost is infinite if i and j are not direct neighbors
• D(v): current value of cost of path from source to dest. v
• p(v): predecessor node along path from source to v, i.e., the node just before v on that path
• N: set of nodes whose least cost path is definitively known

Dijkstra's Algorithm

Initialization:
  N = {A}
  for all nodes v
    if v adjacent to A
      then D(v) = c(A,v); p(v) = A
      else D(v) = infty

Loop
  find w not in N such that D(w) is a minimum
  add w to N
  update D(v) for all v adjacent to w and not in N:
    D(v) = min( D(v), D(w) + c(w,v) )
    /* new cost to v is either old cost to v or known
       shortest path cost to w plus cost from w to v;
       if the second term is smaller, also set p(v) = w */
until all nodes in N

Dijkstra's algorithm: example

Step  N        D(B),p(B)  D(C),p(C)  D(D),p(D)  D(E),p(E)  D(F),p(F)
0     A        2,A        5,A        1,A        infinity   infinity
1     AD       2,A        4,D                   2,D        infinity
2     ADE      2,A        3,E                              4,E
3     ADEB                3,E                              4,E
4     ADEBC                                                4,E
5     ADEBCF

(a blank entry means the node is already in N)

[Figure: the example network from the Routing slide, nodes A to F]
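A direct Python transcription of the pseudocode, run on the example network, reproduces the final values in the table. It is a sketch: node names and link costs come from the example, and the linear scan for the minimum matches the O(n²) version rather than a heap-based one.

```python
INF = float("inf")

# Example network: node -> {neighbor: link cost}
GRAPH = {
    "A": {"B": 2, "C": 5, "D": 1},
    "B": {"A": 2, "C": 3, "D": 2},
    "C": {"A": 5, "B": 3, "D": 3, "E": 1, "F": 5},
    "D": {"A": 1, "B": 2, "C": 3, "E": 1},
    "E": {"C": 1, "D": 1, "F": 2},
    "F": {"C": 5, "E": 2},
}

def dijkstra(graph, source):
    """Least costs D and predecessors p from source, following the pseudocode."""
    nodes = set(graph)
    N = {source}
    D = {v: graph[source].get(v, INF) for v in nodes}
    D[source] = 0
    p = {v: source for v in graph[source]}
    while N != nodes:
        # find w not in N such that D(w) is a minimum (linear scan)
        w = min(nodes - N, key=lambda v: D[v])
        N.add(w)
        for v, cost in graph[w].items():
            if v not in N and D[w] + cost < D[v]:
                D[v] = D[w] + cost        # shorter path to v found through w
                p[v] = w
    return D, p

D, p = dijkstra(GRAPH, "A")
print(sorted(D.items()))  # [('A', 0), ('B', 2), ('C', 3), ('D', 1), ('E', 2), ('F', 4)]
print(sorted(p.items()))  # [('B', 'A'), ('C', 'E'), ('D', 'A'), ('E', 'D'), ('F', 'E')]
```

To read a route from p, walk backwards from the destination: p(F) = E, p(E) = D, p(D) = A, so A's least cost path to F is A, D, E, F (cost 4), and A's first hop toward F is D.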

Dijkstra's algorithm, discussion

Algorithm complexity: n nodes
• each iteration: need to check all nodes w not in N
• n(n+1)/2 comparisons: O(n²)
• more efficient implementations possible:
– O((n + m) log n) for m links: use a heap (priority queue) to maintain the interim table

Oscillations possible:
• e.g., link cost = amount of carried traffic

[Figure: four nodes A, B, C, D whose link costs (values such as 0, e, 1, 1+e, 2+e) equal the traffic they carry; shown initially and after three successive "recompute routing" steps. Each recomputation moves traffic onto the currently cheaper direction, which makes that direction expensive, so the routes oscillate.]

Link State: Reliable Flooding
• Link State routers exchange information using Link State Packets (LSPs).
• An LSP contains
– id of the node that created the LSP
– cost of the link to each directly connected neighbor
– sequence number (SEQNO)
– time-to-live (TTL) for this packet
• Reliable flooding
– store the most recent LSP from each node
– forward a newly stored LSP to all neighbors except the one that sent it
– generate a new LSP periodically
    • increment SEQNO
– start SEQNO at 0 on reboot
– decrement the TTL of each stored LSP
    • discard when TTL=0
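The flooding rules can be sketched as a breadth-first simulation in Python. Assumptions: the four-node topology is a hypothetical piece of the A to F example, each LSP finishes flooding before a newer one is generated, and TTL aging is left out.

```python
from collections import deque

def flood(neighbors, databases, lsp):
    """Flood one LSP from the node that created it.

    neighbors: node -> list of directly connected nodes
    databases: node -> {creator: (seqno, link_costs)}, updated in place
    lsp: (creator, seqno, link_costs)
    Returns the number of LSP transmissions on links.
    """
    creator, seqno, costs = lsp
    databases[creator][creator] = (seqno, costs)
    queue = deque([(creator, None)])     # (node about to forward, node it heard from)
    transmissions = 0
    while queue:
        node, heard_from = queue.popleft()
        for nbr in neighbors[node]:
            if nbr == heard_from:
                continue                 # do not send the LSP back where it came from
            transmissions += 1
            stored = databases[nbr].get(creator)
            if stored is None or seqno > stored[0]:
                databases[nbr][creator] = (seqno, costs)  # newer: store, then forward
                queue.append((nbr, node))
            # an equal or older SEQNO is a duplicate and is dropped
    return transmissions

NEIGHBORS = {"A": ["B", "D"], "B": ["A", "C", "D"], "C": ["B", "D"], "D": ["A", "B", "C"]}
dbs = {n: {} for n in NEIGHBORS}
sent = flood(NEIGHBORS, dbs, ("A", 1, {"B": 2, "D": 1}))
print(sent, all(db["A"][0] == 1 for db in dbs.values()))   # 7 True
```

The SEQNO comparison is what stops the flood: once every node holds the newest LSP, further copies are dropped instead of forwarded.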

Distance Vector Routing Algorithm

iterative:
• continues until no nodes exchange info
• self-terminating: no "signal" to stop

asynchronous:
• nodes need not exchange info/iterate in lock step!

distributed:
• each node communicates only with directly-attached neighbors

Distance Table data structure
• each node has its own
• row for each possible destination
• column for each directly-attached neighbor to node

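Distance Table: example

[Figure: network with link costs A-B 7, A-E 1, B-C 1, B-E 8, C-D 2, D-E 2]

Distance table $D^E$ at node E (cost to destination via each neighbor):

dest \ via   A    B    D
A            1    14   5
B            7    8    5
C            6    9    4
D            4    11   2

$D^E(C,D) = c(E,D) + \min_w D^D(C,w) = 2 + 2 = 4$

$D^E(A,D) = c(E,D) + \min_w D^D(A,w) = 2 + 3 = 5$   (loop! D's least cost path to A goes back through E)

$D^E(A,B) = c(E,B) + \min_w D^B(A,w) = 8 + 6 = 14$   (loop! B's least cost path to A also goes through E)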

Distance table gives routing table

Distance table $D^E$ at node E (cost to destination via each neighbor):

dest \ via   A    B    D
A            1    14   5
B            7    8    5
C            6    9    4
D            4    11   2

Routing table at E (outgoing link to use, cost):

dest   link, cost
A      A,1
B      D,5
C      D,4
D      D,2
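The Bellman-Ford step that fills each entry, and the row minimum that turns the distance table into a routing table, can be written in a few lines of Python. The neighbors' least costs are an assumption back-computed from the table's columns (column value minus the link cost from E), and the function names are hypothetical.

```python
LINK_COST_E = {"A": 1, "B": 8, "D": 2}          # c(E, neighbor)
NEIGHBOR_DV = {                                  # each neighbor's least costs
    "A": {"A": 0, "B": 6, "C": 5, "D": 3},
    "B": {"A": 6, "B": 0, "C": 1, "D": 3},
    "D": {"A": 3, "B": 3, "C": 2, "D": 0},
}

def distance_table(link_cost, neighbor_dv, destinations):
    """D^E(dest, via) = c(E, via) + min_w D^via(dest, w), one row per destination."""
    return {dest: {via: link_cost[via] + neighbor_dv[via][dest] for via in link_cost}
            for dest in destinations}

def routing_table(table):
    """For each destination pick the column (outgoing link) with the least cost."""
    return {dest: min(row.items(), key=lambda item: item[1])
            for dest, row in table.items()}

table = distance_table(LINK_COST_E, NEIGHBOR_DV, "ABCD")
print(table["A"])            # {'A': 1, 'B': 14, 'D': 5}
print(routing_table(table))  # {'A': ('A', 1), 'B': ('D', 5), 'C': ('D', 4), 'D': ('D', 2)}
```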

Distance Vector Routing: overview

Iterative, asynchronous: each local iteration is caused by:
• a local link cost change
• a message from a neighbor: its least cost path has changed

Distributed:
• each node notifies its neighbors only when its least cost path to any destination changes
– neighbors then notify their neighbors if necessary

Each node:
  wait for (change in local link cost or msg from neighbor)
  recompute distance table
  if least cost path to any dest has changed, notify neighbors
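To watch the iteration converge, here is a synchronous sketch in Python that repeats the Bellman-Ford update at every node until no distance changes. It simplifies the asynchronous protocol (all nodes update in lock-step rounds); the function name is hypothetical, and the link costs are those of the distance table example.

```python
INF = float("inf")

def distance_vector(links):
    """Synchronous distance-vector iteration until no vector changes.

    links: {(u, v): cost} for bidirectional links. Returns {node: {dest: cost}}.
    """
    neighbors = {}
    for (u, v), cost in links.items():
        neighbors.setdefault(u, {})[v] = cost
        neighbors.setdefault(v, {})[u] = cost
    nodes = sorted(neighbors)
    dv = {x: {d: (0 if d == x else INF) for d in nodes} for x in nodes}
    changed = True
    while changed:
        changed = False
        sent = {x: dict(dv[x]) for x in nodes}   # vectors advertised this round
        for x in nodes:
            for d in nodes:
                if d == x:
                    continue
                # D_x(d) = min over neighbors v of c(x,v) + D_v(d)
                best = min(c + sent[v][d] for v, c in neighbors[x].items())
                if best != dv[x][d]:
                    dv[x][d] = best
                    changed = True
    return dv

LINKS = {("A", "B"): 7, ("A", "E"): 1, ("B", "C"): 1,
         ("B", "E"): 8, ("C", "D"): 2, ("D", "E"): 2}
print(distance_vector(LINKS)["E"])   # {'A': 1, 'B': 5, 'C': 4, 'D': 2, 'E': 0}
```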

Distance Vector: link cost changes

Link cost changes:
• node detects a local link cost change
• updates its distance table
• if the cost change is in a least cost path, notify neighbors

[Figure: nodes X, Y, Z with link costs X-Y 4, Y-Z 1, X-Z 50; the X-Y cost drops to 1]

"good news travels fast": the algorithm terminates

Distance Vector: link cost changes

Link cost changes:
• good news travels fast
• bad news travels slow: the "count to infinity" problem!

[Figure: the same nodes X, Y, Z; the X-Y cost rises from 4 to 60]

the algorithm continues on!

Distance Vector: poisoned reverse

If Z routes through Y to get to X:
• Z tells Y its (Z's) distance to X is infinite (so Y won't route to X via Z)
• Does not work on larger loops (three or more nodes)

[Figure: the same nodes X, Y, Z; the X-Y cost rises from 4 to 60]

the algorithm terminates
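The three-node example can be simulated to compare plain distance vector with poisoned reverse. In this hypothetical sketch Y and Z alternately recompute their cost to X after the X-Y cost changes. Before the change, Y reached X directly at cost 4 and Z reached X through Y at cost 5.

```python
INF = float("inf")

def count_to_x(c_xy, c_yz, c_xz, dy, dz, poison):
    """Y and Z alternately recompute their cost to X until nothing changes.

    dy, dz: costs before the X-Y link changed to c_xy (Y direct, Z via Y).
    Returns (final dy, final dz, number of rounds in which something changed).
    """
    via_y, via_z = "X", "Y"
    rounds = 0
    while True:
        # Z's advertisement to Y: "infinite" if Z routes through Y (poisoned reverse)
        z_adv = INF if poison and via_z == "Y" else dz
        new_dy, new_via_y = min((c_xy, "X"), (c_yz + z_adv, "Z"))
        # Y's advertisement to Z, poisoned the same way
        y_adv = INF if poison and new_via_y == "Z" else new_dy
        new_dz, new_via_z = min((c_xz, "X"), (c_yz + y_adv, "Y"))
        if (new_dy, new_via_y, new_dz, new_via_z) == (dy, via_y, dz, via_z):
            return dy, dz, rounds
        dy, via_y, dz, via_z = new_dy, new_via_y, new_dz, new_via_z
        rounds += 1

print(count_to_x(60, 1, 50, 4, 5, poison=False))  # (51, 50, 24)
print(count_to_x(60, 1, 50, 4, 5, poison=True))   # (51, 50, 2)
print(count_to_x(1, 1, 50, 4, 5, poison=False))   # (1, 2, 1)
```

Without poisoning, Y and Z bounce the route between them, each exchange adding 2, for 24 rounds before Z's direct 50-cost link wins. With poisoned reverse, the same change settles in 2 rounds, and the good-news change (X-Y falling to 1) settles after a single round.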

Comparison of LS and DV algorithms

Message complexity
• LS: with n nodes and an average of l links/node, each node sends O(nl) messages; total messages O(n²l)
• DV: exchange between neighbors only
– convergence time varies
– may be routing loops
– count-to-infinity problem

Robustness: what happens if a router malfunctions?
LS:
– node can advertise incorrect link cost
– each node computes only its own table
DV:
– DV node can advertise incorrect path cost
– each node's table is used by others
    • errors propagate thru network

Hierarchical Routing

Our routing study thus far is an idealization:
• all routers identical
• network "flat"
... not true in practice

scale: with 50 million destinations:
• can't store all dest's in routing tables!
• routing table exchange would swamp links!

administrative autonomy
• internet = network of networks
• each network admin may want to control routing in its own network

Hierarchical Routing

• aggregate routers into regions, "autonomous systems" (AS)
• routers in same AS run same routing protocol
– "intra-AS" routing protocol
– routers in different ASs can run different intra-AS routing protocols

gateway routers
• special routers in AS
• run intra-AS routing protocol with all other routers in AS
• also responsible for routing to destinations outside AS
– run inter-AS routing protocol with other gateway routers

Why different Intra- and Inter-AS routing?

Policy:
• Inter-AS: admin wants control over how its traffic is routed and who routes through its net.
• Intra-AS: single admin, so no policy decisions needed

Scale:
• hierarchical routing saves table size, reduces update traffic

Performance:
• Intra-AS: can focus on performance
• Inter-AS: policy may dominate over performance

Intra-AS and Inter-AS routing

Gateways:
• perform inter-AS routing amongst themselves
• perform intra-AS routing with other routers in their AS

[Figure: three ASs, A, B and C, each containing several routers; gateway routers such as A.a, A.c, B.a and C.b link the ASs. Inside gateway A.c, inter-AS and intra-AS routing both run in the network layer, above the link layer and physical layer.]

Intra-AS and Inter-AS routing

[Figure: a datagram from host h1 to host h2 crosses three routing domains: intra-AS routing within AS A, inter-AS routing between A and B, intra-AS routing within AS B]

Routing in the Internet

• The Global Internet consists of Autonomous Systems (AS) interconnected with each other:
– Stub AS: small corporation
– Multihomed AS: large corporation (no transit)
– Transit AS: provider

• Two-level routing:
– Intra-AS: administrator is responsible for choice
– Inter-AS: unique standard

Internet AS Hierarchy

[Figure: inter-AS border (exterior gateway) routers connect the ASs to each other; intra-AS interior (gateway) routers sit inside each AS]

Intra-AS Routing

• Also known as Interior Gateway Protocols (IGP)

• Most common IGPs:
– RIP: Routing Information Protocol
– OSPF: Open Shortest Path First
– IGRP: Interior Gateway Routing Protocol (Cisco proprietary)

RIP (Routing Information Protocol)

• Distance vector algorithm
• Included in BSD-UNIX Distribution in 1982
• Distance metric: # of hops (max = 15 hops)
– Can you guess why?
• Distance vectors: exchanged every 30 sec via Response Message (also called advertisement)
• Each advertisement: routes for up to 25 destination nets

RIP (Routing Information Protocol)

[Figure: routers A, B, C and D connecting networks w, x, y and z; D attaches directly to network x]

Routing table in D:
Destination Network   Next Router   Num. of hops to dest.
w                     A             2
y                     B             2
z                     B             7
x                     --            1
....                  ....          ....

RIP: Link Failure and Recovery

If no advertisement is heard after 180 sec, the neighbor/link is declared dead
– routes via the neighbor are invalidated
– new advertisements are sent to neighbors
– neighbors in turn send out new advertisements (if their tables changed)
– link failure info quickly propagates to the entire net
– poison reverse is used to prevent ping-pong loops (infinite distance = 16 hops)
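The table update a RIP router performs on each advertisement can be sketched in Python. This is a simplified, hypothetical model: the function name and the advertisement contents are assumptions, and timers, split horizon and triggered updates are omitted; 16 stands for infinity as on the slide.

```python
RIP_INFINITY = 16   # hop count meaning "unreachable"

def process_advertisement(table, neighbor, advertisement):
    """Merge one neighbor's advertisement into a RIP-style table.

    table: dest -> (next_router, hops); advertisement: dest -> hops from neighbor.
    Returns True if the table changed.
    """
    changed = False
    for dest, hops in advertisement.items():
        new_hops = min(hops + 1, RIP_INFINITY)   # one extra hop to reach the neighbor
        current = table.get(dest)
        if current is None and new_hops == RIP_INFINITY:
            continue                             # do not learn an unreachable route
        # take a shorter route, or any news from the router we already use
        if current is None or new_hops < current[1] or current[0] == neighbor:
            if current != (neighbor, new_hops):
                table[dest] = (neighbor, new_hops)
                changed = True
    return changed

# Routing table in D (None = directly attached)
table_d = {"w": ("A", 2), "y": ("B", 2), "z": ("B", 7), "x": (None, 1)}
# Hypothetical advertisement from A: w and x are 1 hop from A, z is 4 hops
process_advertisement(table_d, "A", {"w": 1, "x": 1, "z": 4})
print(table_d["z"])   # ('A', 5): shorter than the old 7 hops via B
```

Accepting any news from the current next router is what lets a "16 hops" (unreachable) advertisement from that router invalidate the route.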

RIP Table processing

• RIP routing tables are managed by an application-level process called routed (daemon)

• advertisements are sent in UDP packets, periodically repeated

RIP Table example (continued)

Router: giroflee.eurocom.fr
• Three attached class C networks (LANs)
• Router only knows routes to attached LANs
• Default router used to "go up"
• Route multicast address: 224.0.0.0
• Loopback interface (for debugging)

Destination     Gateway          Flags  Ref  Use     Interface
--------------  ---------------  -----  ---  ------  ---------
127.0.0.1       127.0.0.1        UH     0    26492   lo0
192.168.2.      192.168.2.5      U      2    13      fa0
193.55.114.     193.55.114.6     U      3    58503   le0
192.168.3.      192.168.3.5      U      2    25      qaa0
224.0.0.0       193.55.114.6     U      3    0       le0
default         193.55.114.129   UG     0    143454

OSPF (Open Shortest Path First)

• "open": publicly available
• Uses Link State algorithm
– LS packet dissemination
– Topology map at each node
– Route computation using Dijkstra's algorithm

• OSPF advertisement carries one entry per neighbor router

• Advertisements disseminated to entire AS (via flooding)

OSPF "advanced" features (not in RIP)

• Security: OSPF messages can be authenticated (to prevent malicious intrusion); they are carried directly in IP datagrams (IP protocol 89), not over TCP connections

• Multiple same-cost paths allowed (only one path in RIP)

• For each link, multiple cost metrics for different TOS (e.g., satellite link cost set "low" for best effort; high for real time)

• Integrated uni- and multicast support:
– Multicast OSPF (MOSPF) uses same topology database as OSPF
• Hierarchical OSPF in large domains.

Hierarchical OSPF

[Figure: an OSPF AS divided into local areas joined by a backbone, with area border routers, backbone routers and a boundary router]

• Two-level hierarchy: local area, backbone.
– Link-state advertisements only in area
– each node has detailed area topology; only knows direction (shortest path) to nets in other areas.

• Area border routers: "summarize" distances to nets in own area, advertise to other Area Border routers.

• Backbone routers: run OSPF routing limited to backbone.

• Boundary routers: connect to other ASs.

IGRP (Interior Gateway Routing Protocol)

• CISCO proprietary; successor of RIP (mid 80s)
• Distance Vector, like RIP
– Hold time
– Split Horizon
– Poison Reverse

• several cost metrics (delay, bandwidth, reliability, load etc)

• exchanges routing updates directly in IP datagrams (IP protocol 9) rather than over TCP
• EIGRP (Garcia-Luna): loop-free routing via the Diffusing Update Algorithm (DUAL), based on diffusing computations
– Uses a mix of link-state and distance vector

Inter-AS routing

Internet inter-AS routing: BGP

• BGP (Border Gateway Protocol): the de facto standard

• Path Vector protocol:
– similar to Distance Vector protocol
– each Border Gateway broadcasts to its neighbors (peers) the entire path (i.e., sequence of ASs) to a destination
– e.g., Gateway X may send its path to dest. Z:

Path (X,Z) = X,Y1,Y2,Y3,…,Z

Internet inter-AS routing: BGP

Suppose: gateway X sends its path to peer gateway W

• W may or may not select the path offered by X
– cost, policy (don't route via a competitor's AS), loop prevention reasons.
• If W selects the path advertised by X, then:

Path (W,Z) = W, Path (X,Z)

• Note: X can control incoming traffic by controlling its route advertisements to peers:
– e.g., don't want to route traffic to Z -> don't advertise any routes to Z
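The path-vector decision can be sketched as a filter-then-choose step. The selection policy here is an assumption made for illustration: reject paths containing W's own AS (loop prevention), reject paths through a blocked AS (policy), then prefer the shortest AS path. Real BGP route selection weighs many more attributes, and the function name is hypothetical.

```python
def select_path(my_as, advertised_paths, blocked=frozenset()):
    """Pick one of the AS paths advertised by peers, then prepend my_as."""
    candidates = [path for path in advertised_paths
                  if my_as not in path                 # loop prevention
                  and not blocked.intersection(path)]  # policy: avoid blocked ASs
    if not candidates:
        return None                                    # no acceptable route to Z
    best = min(candidates, key=len)                    # assumed tie-breaker: shortest
    return [my_as] + best                              # Path(W,Z) = W, Path(X,Z)

offers = [["X", "Y1", "Y2", "Y3", "Z"],   # from X
          ["V", "W", "Z"],                # already contains W: would loop
          ["U", "Z"]]                     # shortest, but U is a competitor
print(select_path("W", offers, blocked=frozenset({"U"})))
# ['W', 'X', 'Y1', 'Y2', 'Y3', 'Z']
```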

Internet inter-AS routing: BGP

• BGP messages exchanged using TCP.
• BGP messages:
– OPEN: opens TCP connection to peer and authenticates sender
– UPDATE: advertises new path (or withdraws old)
– KEEPALIVE: keeps connection alive in absence of UPDATES; also ACKs OPEN request
– NOTIFICATION: reports errors in previous msg; also used to close connection
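Because TCP delivers a byte stream, each BGP message starts with a fixed header that lets the receiver find message boundaries. Per RFC 4271 the header is a 16-byte marker of all ones, a 2-byte total length and a 1-byte type (1 OPEN, 2 UPDATE, 3 NOTIFICATION, 4 KEEPALIVE); a KEEPALIVE is just that 19-byte header. Below is a minimal Python sketch with hypothetical helper names; message bodies are not decoded.

```python
import struct

BGP_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}
MARKER = b"\xff" * 16

def bgp_message(msg_type: int, body: bytes = b"") -> bytes:
    """Frame a BGP message: 16-byte marker, 2-byte total length, 1-byte type."""
    return MARKER + struct.pack("!HB", 19 + len(body), msg_type) + body

def bgp_type(message: bytes) -> str:
    """Name the type of one complete BGP message taken from the TCP stream."""
    if message[:16] != MARKER:
        raise ValueError("bad marker")
    length, msg_type = struct.unpack("!HB", message[16:19])
    if length != len(message):
        raise ValueError("length field does not match message size")
    return BGP_TYPES.get(msg_type, "unknown")

keepalive = bgp_message(4)                  # a KEEPALIVE is only the header
print(len(keepalive), bgp_type(keepalive))  # 19 KEEPALIVE
```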

Other Routing Techniques

• Hot-Potato Routing, a.k.a. Deflection Routing
– Use the first available link irrespective of whether it leads to the destination or not.
• Cut-Through routing
– Non-store-and-forward: routes before the entire packet is received at the router.
– Outgoing link is reserved. What happens if a fast link succeeds a slow link?

Reading

• Recommended
– End-To-End Routing Behavior in the Internet, V. Paxson, SIGCOMM 1996. ftp://ftp.ee.lbl.gov/papers/routing.SIGCOMM.ps.Z

• Due: 3/26/01
– Persistent Route Oscillations in Inter-Domain Routing, K. Varadhan, R. Govindan, D. Estrin, ftp://ftp.isi.edu/ra/Publications/bgp_osc.ps.gz

• Due: 3/28/01
– http://netresearch.ics.uci.edu/agentos/related/routing : Contains information on CISCO Routing Protocols