Advanced Networking Technologies (TU Ilmenau)
Chapter 7: Transport Layer Evolution
Advanced Networking (SS 17): 07 - Transport Layer Evolution
Content

• TCP congestion control schemes
• Multipath TCP
• SCTP
• SPDY and HTTP/2
• QUIC
TCP congestion control: What is the problem?

• Relies on packets getting lost!
• Root cause of all the buffer problems
• Degrades quality for other services
• Takes some time to measure with large buffers
• Assumes packets get lost due to congestion
  ■ What about wireless?
• Direct dependency on the bandwidth/delay product
  ■ Maximum window of 64 KB out of the box
  ■ Problems utilizing intercontinental links
  ■ Problems utilizing 10 Gb/s links
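A quick back-of-the-envelope sketch of the last two points (the RTT and link-speed figures are illustrative assumptions, not from the slides): with the classic 64 KB window, throughput is capped at window/RTT no matter how fast the link is.

```python
# The throughput of one TCP connection is bounded by window / RTT.
def max_throughput_mbps(window_bytes: float, rtt_s: float) -> float:
    """Upper bound on TCP throughput for a given window and round-trip time."""
    return window_bytes * 8 / rtt_s / 1e6

# Classic 64 KB window on an intercontinental path (~200 ms RTT):
print(max_throughput_mbps(64 * 1024, 0.200))  # ~2.6 Mb/s, far below 10 Gb/s
# Window needed to fill a 10 Gb/s link at 200 ms RTT (the BDP), in MB:
print(10e9 / 8 * 0.200 / 1e6)                 # 250.0, i.e. a 250 MB window
```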
TCP congestion control: Significance of the problem

• Wireless routers implemented TCP stitching ("normal" TCP on one side; I-TCP, METP, SNOOP or similar on the other)
• Nowadays:
  ■ Router vendors implement AQM strategies (discussed earlier)
  ■ Link layer has repeated transmits & PAUSE frames
  ■ WAN optimizers (decrease number of ACKs etc.)
➡ Everything is built around TCP
TCP evolution

• Can we let TCP evolve? Yes, but it takes time
  ■ New algorithms may make use of existing protocol fields
  ■ New fields via extensions
• RFCs take time to reach publication
• Need to be adopted by OSes & testers (chicken-or-egg problem)
• Must not break existing TCP algorithms
• Must not mess with fairness
• Major improvements these days: OS vendors "simply" implement new strategies
  ■ CTCP
  ■ CUBIC
  ■ BBR
TCP SACK option

• Introduces Selective Acknowledgments
• RFC 2018 from 1996
• Redefined in RFC 3517 from 2003 & RFC 6675 from 2012
• Supported by all main operating systems
• Negotiated during handshake
• Simple solution?! Problem solved?!
TCP SACK option – Implementation errors

OS                          | Misbehaviors (classes A1, A2, B–G)
FreeBSD 5.3-5.4             | ❌ ❌
FreeBSD 6.0-8.0             |
Linux 2.2.20-2.6.18         | ❌
Linux 2.6.31                |
MacOS X 10.5-10.6           |
OpenBSD 4.2-4.8             | ❌ ❌
OpenSolaris 2008.05-2009.06 | ❌ ❌
Solaris 10                  | ❌
Solaris 11                  | ❌
Windows 2000-2003           | ❌ ❌ ❌ ❌ ❌
Windows Vista-7             | ❌ ❌

➡ Degraded performance, but eventually consistent as timeouts delete SACK state (luckily)

Ekiz et al.: Misbehaviors in TCP SACK Generation, ACM SIGCOMM Computer Communication Review, 2011
TCP SACK option – Gain?

[Figure: average throughput [Mb/s] (0.00–4.00) vs. burst error rate (1E-07 to 1E-03) for Tahoe, Reno, NewReno, SACK, and Westwood+]

Nguyen et al.: An Implementation of the SACK-Based Conservative Loss Recovery Algorithm for TCP in ns-3 (extended version of WNS3), 2015
C-TCP

• Compound TCP, introduced by Microsoft in 2005
• 3 RFC drafts posted until November 2008
• Enabled by default in Windows server editions; must be enabled manually on clients
• Idea: two congestion windows
  ■ "Normal" loss-based one
  ■ One for delay, which also estimates the bottleneck queue
  ■ Summed up, i.e., win = min(cwnd + dwnd, awnd)
• Delay-based congestion window increases quickly when a long delay is observed
• Decreases to 0 afterwards to reach "normal" steady-state behavior
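A rough sketch of the window combination above. Only win = min(cwnd + dwnd, awnd) is from the slide; the function names, the backlog input, and the simple grow/shrink rules for dwnd are our assumptions (Tan et al. use a binomial increase and an estimated backlog threshold γ).

```python
def ctcp_send_window(cwnd: int, dwnd: int, awnd: int) -> int:
    """C-TCP sending window: loss-based cwnd plus delay-based dwnd,
    capped by the receiver's advertised window awnd."""
    return min(cwnd + dwnd, awnd)

def update_dwnd(dwnd: int, backlog: float, gamma: float) -> int:
    # If the estimated queue backlog exceeds gamma, the path looks congested:
    # shrink dwnd toward 0 so the flow falls back to normal TCP behaviour.
    if backlog >= gamma:
        return max(dwnd - int(backlog), 0)
    # Otherwise grow dwnd to exploit spare bandwidth (simplified here).
    return dwnd + 1

print(ctcp_send_window(cwnd=10, dwnd=5, awnd=12))  # advertised window caps: 12
```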
C-TCP – Window behavior

[Figure: window evolution over time t for TCP vs. CTCP, with the delay window DWND shown separately]

Tan et al.: A Compound TCP Approach for High-speed and Long Distance Networks, Microsoft Technical Report, 2005

C-TCP assumes a backlog of γ packets at the bottleneck
C-TCP – Throughput in lossy networks

[Figure: throughput (0–700 Mbps) vs. packet loss rate (0.01 down to 0) for Regular TCP, HSTCP, and CTCP]
C-TCP – Fairness

[Figure: bandwidth stolen (0–0.8) vs. packet loss rate (0.01 down to 0) for CTCP and HSTCP]
CUBIC

• Developed from an older, more complex algorithm: BIC
• Today's standard in Linux and MacOS
• Ideas:
  ■ Decrease queues in routers, send data at the expected bandwidth
  ■ Aggressively increase bandwidth periodically to probe for more
  ■ Scale the window using a cubic function
A. BIC Window Growth Function

Before delving into CUBIC, let us examine the features of BIC. The main feature of BIC is its unique window growth function.
Fig. 1 shows the growth function of BIC. When it gets a
packet loss event, BIC reduces its window by a multiplicative
factor β. The window size just before the reduction is set to the
maximum Wmax and the window size just after the reduction is
set to the minimum Wmin. Then, BIC performs a binary search
using these two parameters – by jumping to the “midpoint”
between Wmax and Wmin. Since packet losses have occurred at
Wmax, the window size that the network can currently handle
without loss must be somewhere between these two numbers.
However, jumping to the midpoint could be too much
increase within one RTT, so if the distance between the
midpoint and the current minimum is larger than a fixed
constant, called Smax, BIC increments the current window size
by Smax (linear increase). If BIC does not get packet losses at the
updated window size, that window size becomes the new
minimum. If it gets a packet loss, that window size becomes the
new maximum. This process continues until the window
increment is less than some small constant called Smin at which
point, the window is set to the current maximum. So the
growing function after a window reduction will be most likely
to be a linear one followed by a logarithmic one (marked as
“additive increase” and “binary search” respectively in Fig. 1).
If the window grows past the maximum, the equilibrium
window size must be larger than the current maximum and a
new maximum must be found. BIC enters a new phase called
“max probing.” Max probing uses a window growth function
exactly symmetric to those used in additive increase and binary
search – only in a different order: it uses the inverse of binary
search (which is logarithmic; its reciprocal will be exponential)
and then additive increase. Fig. 1 shows the growth function
during max probing. During max probing, the window grows
slowly initially to find the new maximum nearby, and after
some time of slow growth, if it does not find the new maximum
(i.e., packet losses), then it guesses the new maximum is further
away so it switches to a faster increase by switching to additive
increase where the window size is incremented by a large fixed
increment.
The good performance of BIC comes from the slow increase
around Wmax and linear increase during additive increase and
max probing.
B. CUBIC Window Growth Function

Although BIC achieves pretty good scalability, fairness, and stability in current high-speed environments, BIC's growth function can still be too aggressive for TCP, especially
under short RTT or low speed networks. Furthermore, the
several different phases of window control add a lot of
complexity in analyzing the protocol. We have been searching
for a new window growth function that while retaining most of
strengths of BIC (especially, its stability and scalability),
simplifies the window control and enhances its TCP
friendliness.
In this paper, we introduce a new high-speed TCP variant:
CUBIC. As the name of the new protocol represents, the
window growth function of CUBIC is a cubic function, whose
shape is very similar to the growth function of BIC. CUBIC is
designed to simplify and enhance the window control of BIC.
More specifically, the congestion window of CUBIC is
determined by the following function:
W_cubic(t) = C (t − K)³ + W_max        (1)

where C is a scaling factor, t is the elapsed time since the last
window reduction, W_max is the window size just before the last
window reduction, and K = ∛(W_max β / C), where β is a constant
multiplicative decrease factor applied for window reduction at the
time of a loss event (i.e., the window reduces to βW_max at the
time of the last reduction).
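Equation (1) as a small sketch, using the paper's form of K as stated above (RFC 8312 later writes K with (1 − β) in place of β); parameter defaults follow the paper's evaluation settings:

```python
def cubic_window(t: float, w_max: float, c: float = 0.4, beta: float = 0.8) -> float:
    """W_cubic(t) = C (t - K)^3 + W_max with K = cbrt(W_max * beta / C)."""
    k = (w_max * beta / c) ** (1.0 / 3.0)
    return c * (t - k) ** 3 + w_max

# The plateau of the cubic sits exactly at W_max (t = K):
k = (100 * 0.8 / 0.4) ** (1.0 / 3.0)
print(round(cubic_window(k, w_max=100), 6))  # 100.0
# Concave growth toward W_max below K, convex "max probing" above K:
print(cubic_window(k - 2, 100) < 100 < cubic_window(k + 2, 100))  # True
```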
Fig. 2 shows the growth function of CUBIC with the origin
at Wmax. The window grows very fast upon a window reduction,
but as it gets closer to Wmax, it slows down its growth. Around
Wmax, the window increment becomes almost zero. Above that,
CUBIC starts probing for more bandwidth in which the
window grows slowly initially, accelerating its growth as it
moves away from Wmax. This slow growth around Wmax
enhances the stability of the protocol, and increases the
utilization of the network while the fast growth away from Wmax
ensures the scalability of the protocol.
The cubic function ensures the intra-protocol fairness among
the competing flows of the same protocol. To see this, suppose
that two flows are competing on the same end-to-end path. The
Fig. 1: The Window Growth Function of BIC (additive increase, binary search, max probing around W_max)

Fig. 2: The Window Growth Function of CUBIC (steady-state behavior and max probing around W_max)
Rhee et al.: CUBIC: A New TCP-Friendly High-Speed TCP Variant, ACM SIGOPS Operating Systems Review 42(5), 2008
W_cubic = C (t − ∛(W_max β / C))³ + W_max
CUBIC – Throughput over time

III. PERFORMANCE EVALUATION

In this section, we present some performance results regarding the TCP friendliness and stability of CUBIC and other high-speed TCP variants. For CUBIC, we set β to 0.8, C to 0.4, and Smax to 160. We use NS-2 for simulation. The network topology is a dumbbell. For each simulation run, we run four flows of a high-speed protocol and four flows of regular long-term TCP SACK over the same end-to-end paths for the entire duration of the simulation; their starting times and RTTs are slightly varied to reduce the phase effect. About 10% of background traffic is added in both forward and backward directions of the dumbbell setup. For all the experiments, unless noted explicitly, the buffer size of Drop Tail routers is set to 100% of BDP.

Experiment 1: TCP Friendliness in Short-RTT Networks (simulation script available on the BIC web site):
We test five high speed TCP variants: CUBIC, BIC, HSTCP,
Scalable TCP, and HTCP. We set RTT of the flows to be around 10 ms and vary the bottleneck bandwidth from 20 Mbps to 1 Gbps. Fig. 5 shows the throughput ratio of the long-term TCP flows over the high-speed flows (or TCP friendly ratio) measured from these runs.
The surprising result is that BIC and STCP show even worse TCP friendliness over 20 Mbps than over 100 Mbps. However, we are still not sure of the exact reason for this result. Over 100 Mbps, all the high-speed protocols show reasonable friendliness to TCP. As the bottleneck bandwidth increases from 100 Mbps to 1 Gbps, the ratios for BIC, HSTCP and STCP drop dramatically, indicating unfair use of bandwidth with respect to TCP. Under all these environments, regular TCP can still use the full bandwidth. Scalable TCP shows the worst TCP friendliness in these tests, followed by BIC and HSTCP. CUBIC and HTCP consistently give good TCP friendliness.
Experiment 2: TCP Friendliness in Long-RTT Networks (Simulation script available in the BIC web site)
Although the TCP mode improves the TCP friendliness of
the protocol, it does so mostly for short RTT situations. When the BDP is very large with long RTT, the aggressiveness of the window growth function (more specifically, the congestion epoch length) has more decisive effect on the TCP friendliness. As the epoch gets longer, it gives more time for TCP flows to grow their windows.
An important feature of BIC and CUBIC is that it keeps the epoch fairly long without losing scalability and network utilization. Generally, in AIMD, a longer congestion epoch means slower increase (or a smaller additive factor). However, this would reduce the scalability of the protocol, and also the network would be underutilized for a long time until the window becomes fully open (Note that it is true only if the multiplicative decrease factor is large; but we cannot keep the multiplicative factor too small since that implies much slower convergence to the equilibrium). Unlike AIMD, CUBIC increases the window to (or its vicinity of) Wmax very quickly and then holds the window there for a long time. This keeps the scalability of the protocol high, while keeping the epoch long and utilization high. This feature is unique both in BIC and CUBIC.
In this experiment, we vary the bottleneck bandwidth from 20Mbps to 1Gbps, and set RTT to 100ms. Fig. 6 shows the throughput ratio of long-term TCP over high-speed TCP variants. Over 20 Mbps, all the high speed protocols show reasonable friendliness to TCP. As the bandwidth gets larger than 20 Mbps, the ratio drops quite rapidly. Overall, CUBIC shows a better friendly ratio than the other protocols.
Experiment 3: Stability (simulation script available on the BIC web site)

Fig. 5: TCP-Friendly Ratio in Short-RTT Networks [TCP/high-speed throughput ratio (0–1.2) vs. link speed (20–1000 Mbps) for CUBIC, BIC, HSTCP, STCP, HTCP]
Fig. 4: CUBIC window curves with competing flows (NS simulation in a network with 500Mbps and 100ms RTT), C = 0.4, β = 0.8.
Fig. 6: TCP-Friendly Ratio in Long-RTT Networks [TCP/high-speed throughput ratio (0–1.2) vs. link speed (20–1000 Mbps) for CUBIC, BIC, HSTCP, STCP, HTCP]
CUBIC – Maybe fair, but takes forever

[Figure: cwnd (packets) vs. time (s) for two flows, "Convergence 10 Mbit/s Bottleneck", "Convergence 250 Mbit/s Bottleneck", and a 500 Mbit/s run]

Fig. 2. Cubic TCP cwnd time histories following startup of a second flow. Bandwidth is 10 Mbit/s (top), 250 Mbit/s (middle) and 500 Mbit/s (bottom). RTT is 200 ms, queue size 100% BDP, no web traffic.
This effect is reinforced by changes to the AIMD backoff factor. Standard TCP flows back off cwnd by 0.5 on detecting packet loss. Strategies such as BIC-TCP and Cubic-TCP instead use a backoff factor of 0.8. As a result, flows release bandwidth more slowly when informed of congestion, again having the effect of slowing convergence.

B. Slow convergence implies prolonged unfairness.

One consequence of slow convergence is that periods of extreme unfairness between flows may persist for long periods, even in situations where flows do eventually converge to fairness. Such situations are masked when fairness results are presented purely in terms of long-term averages. However, this behaviour is immediately evident, for example, in the time histories shown in Figure 2, and it seems clear that it has important practical implications. For example, two identical file transfers may have very different completion times depending on the order in which they are started. Also, long-lived flows can gain a substantial throughput advantage at the expense of shorter-lived flows. The latter seems particularly problematic as the majority of TCP flows are short to medium sized, and so a single long-lived flow may potentially penalize a large number of users (akin to a form of denial of service).

With regard to the last point, the impact of a long-lived flow on a short-lived flow is illustrated, for example, in Figure 5. Here, we measure the completion time for a download versus the size of the download. Measurements are shown (i) for the baseline case where no other flow shares the bottleneck link and (ii) for the case where a single long-lived flow shares the link and competes for bandwidth. It can be seen that in the baseline situation, Cubic-TCP, standard TCP and H-TCP all exhibit similar completion times. It is perhaps initially surprising that standard TCP performs so well in this test, in view of concerns about performance in high-speed paths. However, we note that the link in this example is provisioned with a BDP of buffering. A standard TCP flow slow-starts to

Fig. 3. Ratio of throughputs of two Cubic TCP flows with the same RTT (also sharing the same bottleneck link and operating the same congestion control algorithm) as path propagation delay is varied. Flow throughputs are averaged over the last 200 s of each test run and so approximate asymptotic behaviour, neglecting initial transients. Results are shown for 10 Mbit/s and 250 Mbit/s bottleneck bandwidths. The bottleneck queue size is 100% BDP, no web traffic.

Fig. 4. Impact of web traffic on convergence. Evolution of mean bandwidth, averaged over 20 test runs, following startup of a second flow. 200 background web flows (100 in each direction). Link bandwidth is 250 Mbit/s, RTT is 200 ms, queue size 100% BDP.
Leith et al.: Experimental evaluation of Cubic-TCP, 2008
Reinventing congestion control: BBR

• Estimates bottleneck bandwidth and round-trip propagation time (BBR)
• Developed at Google, available in recent Linux kernels
• Goal: reduce bufferbloat by optimizing TCP
• Idea: "congestion-based" – observe how much data is in flight
• Idea: keep queues filled at the sender only
• Another old idea: also use delay information (but differently from New Vegas etc.)
• Not clocked by ACKs, but paced
• Following slides based on:
  ■ Cardwell et al.: BBR Congestion Control, IETF Meeting 97, Seoul, 2016
  ■ Cardwell et al.: BBR: Congestion-Based Congestion Control, ACM Queue, 2016
BBR: Working point with increasing bandwidth

[Figure: delivery rate and RTT vs. amount in flight, annotated with BDP ("Optimal" working point), BDP + BufSize ("Where loss-based CC starts controlling"), and "Where loss-based CC works in lossy networks"]
Phases in BBR: 1. Exponential BW search

• Exponential BW search
• Increase, then decrease exponentially
• Probes for max bandwidth by monitoring in-flight data & ACKs

[Figure: delivery rate and RTT vs. amount in flight, as on the previous slide]
Phases in BBR: 2. Drain queues

• Exponentially decrease in-flight data
• Clears queues fast again
• By monitoring in-flight data & ACKs

[Figure: delivery rate and RTT vs. amount in flight, as on the previous slide]
Phases in BBR: 3. Refresh measurements

• Periodically increase the send rate to probe for more bw
• Periodically decrease it to probe for the minimal RTT
• Remember: BW × RTT = max in-flight data being processed

[Figure: delivery rate and RTT vs. amount in flight, as on the previous slide]
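All three phases rely on the same two estimates. A minimal sketch of that bookkeeping (the class and names are our invention, not from the Linux implementation; real BBR uses windowed max/min filters rather than all-time extrema):

```python
class BbrEstimator:
    def __init__(self):
        self.btl_bw = 0.0             # max observed delivery rate (bytes/s)
        self.rt_prop = float("inf")   # min observed round-trip time (s)

    def on_ack(self, delivery_rate: float, rtt: float) -> None:
        # Windowed max/min filters in real BBR; plain max/min here.
        self.btl_bw = max(self.btl_bw, delivery_rate)
        self.rt_prop = min(self.rt_prop, rtt)

    def bdp(self) -> float:
        # BW * RTT = max data in flight the path absorbs without queueing.
        return self.btl_bw * self.rt_prop

est = BbrEstimator()
est.on_ack(delivery_rate=1.25e6, rtt=0.040)  # 10 Mb/s path, 40 ms RTT
est.on_ack(delivery_rate=1.10e6, rtt=0.055)  # queue-inflated sample filtered out
print(est.bdp())  # 50000.0 bytes in flight at the optimal working point
```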
BBR behavior compared to CUBIC

BBR and CUBIC: Start-up behavior

[Figure: data sent or ACKed (MBytes) and RTT (ms) over time during the STARTUP, DRAIN, and PROBE_BW phases; CUBIC (red), BBR (green), ACKs (blue)]
Fairness to RENO & CUBIC: Sharing deep buffers with loss-based CC

• At first CUBIC/Reno gains an advantage by filling deep buffers
• But BBR does not collapse: its bw and RTT probing tends to drive the system toward fairness
• Deep-buffer data point (8×BDP case): bw = 10 Mbps, RTT = 40 ms, buffer = 8 × BDP
  ➡ CUBIC 6.31 Mbps vs. BBR 3.26 Mbps
• CUBIC with a small advantage here, but about fair
  ■ Depending on parameters
BBR – Throughput in lossy networks

• BBR vs. CUBIC: synthetic bulk TCP test with 1 flow, bottleneck_bw = 100 Mbps, RTT = 100 ms
• BBR fully uses the bandwidth, despite high loss
• What does this mean for fairness?
All fixed? Maybe, maybe not yet

http://blog.cerowrt.org/post/birthday_problem/
Excursion: Why is everybody worried about fairness?

• Back in the very old days, people tried to optimize network power:

    P = (total throughput λ̂) / (average delay D)

• D is the delay average weighted by throughput!
• We want to maximize P using only locally observable information; the optimum, however, depends on:
  ■ Capacity of all links on the path
  ■ # of users sharing each link on the path
  ■ Message rate of all of these users
• Unfortunately this is impossible

J. Jaffe: Flow Control Power is Nondecentralizable, IEEE Transactions on Communications, 1981
Maximizing P [Kleinrock 78]

• Maximizing P = λ̂/D yields:

    P′ = (λ′D − λD′) / D²

    P_max ⇒ P′ = 0 ⇒ λ′D = λD′ ⇒ D = λ · dD/dλ ⇒ D/λ = dD/dλ

• For a single link (capacity μ) modelled by an M/M/1 system (not being overburdened):

    D = 1/(μ − λ),   dD/dλ = 1/(μ − λ)²

    D/λ = dD/dλ ⇒ μ − λ = λ ⇒ λ = μ/2
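The derivation can be checked numerically: for the M/M/1 link, P(λ) = λ/D(λ) = λ(μ − λ), which indeed peaks at λ = μ/2 with value μ²/4.

```python
def power(lam: float, mu: float) -> float:
    """Network power P = throughput / delay for an M/M/1 link."""
    assert lam < mu, "queue must not be overloaded"
    delay = 1.0 / (mu - lam)
    return lam / delay  # = lam * (mu - lam)

mu = 10.0
# Scan offered loads from 0.01 to 9.99 and pick the one with the highest power.
best = max((power(l / 100.0, mu), l / 100.0) for l in range(1, 1000))
print(best)  # (25.0, 5.0): P peaks at lambda = mu/2 with value mu^2/4
```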
Network power: Example network

• Two autonomous servers send over links with capacities μ1 and μ2; each sends at its single-link optimum λi = μi/2 (so Di = 2/μi)
• This gives us:

    λ̂ = (μ1 + μ2)/2
    D = λ̂⁻¹ (2λ1/μ1 + 2λ2/μ2) = 4/(μ1 + μ2)
    P = (1/8)(μ1 + μ2)²

• Optimal if μ1 = μ2
Network power: Counter example

• Assume μ1 >> μ2 (still two autonomous servers)
• Let only server 1 send data:

    λ̂ = μ1/2
    D = λ̂⁻¹ (2λ1/μ1 + 0/μ2) = 2/μ1
    P = μ1²/4, larger than (1/8)(μ1 + μ2)²
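The two schedules compared numerically (the capacities are illustrative): when μ1 >> μ2, letting only server 1 send beats the symmetric schedule, so the "fair" allocation is not the power-optimal one.

```python
def p_both(mu1: float, mu2: float) -> float:
    """Power when both servers send at their single-link optimum."""
    return (mu1 + mu2) ** 2 / 8.0

def p_only_first(mu1: float) -> float:
    """Power when only the fast server sends."""
    return mu1 ** 2 / 4.0

mu1, mu2 = 100.0, 1.0
print(p_both(mu1, mu2), p_only_first(mu1))  # 1275.125 2500.0
```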
Network power: Conclusion

• (Now) obvious: optimizing the overall network requires knowledge of the whole network ➡ would not scale
• More fundamental: "No performance criterion based on λ̂ and D is decentralizable."
  ■ Details in [Jaf81]
• Focusing on global optimization metrics may simply not be the right thing to do…
  ■ Fairness between flows is a local criterion
  ■ Seems more suited
Content

• TCP congestion control schemes
• Multipath TCP
• SCTP
• SPDY and HTTP/2
• QUIC
Multipath TCP – Motivation

• TCP connections are bound to a host's IP addresses
• IP address determines routing between hosts (unless IP spoofing)
• In many scenarios insufficient (resilience, bandwidth, mobility):
  ■ Mobile users: handoff between WiFi and LTE
  ■ Channel bundling: why can't I have two DSL connections and use both?
  ■ Data centers: Fat-Tree topologies, aggregating links, advantages with resilience and blocking

C. Paasch: Decoupled from IP, TCP is at last able to support multihomed hosts, ACM Queue, March 2014
Multipath – Blocking in Fat Trees

➡ No full control over routing, but at least over the lowest layer
Multipath TCP – Objectives

• Scenario:
  ■ Work in any scenario with multiple IP addresses
  ■ No source routing or similar (use the cross product of the hosts' IP addresses)
• Fully backward compatible:
  ■ No change to the socket API
  ■ No change to middleboxes (firewalls, NICs – think of TSO, WAN optimizers)
• No unfairness to non-multipath-aware TCP
• All built on TCP option headers
Multipath TCP – Architecture

[Diagram, identical stack on both endpoints:]

    Application
    ----------------- Socket API
    MPTCP                          \
    TCP 1  TCP 2  ...  TCP n        } Transport Layer
    Network Layer
Multipath TCP – Connection setup

• Subflows created after devices agree to use MPTCP (middlebox-safe)
• Keys exchanged during setup
  ■ Used to bind other sessions cryptographically to the master session
• Kb is echoed back for stateless operation
  ■ Why? State already existing?
• Token to identify the MPTCP session is derived from the keys

[Figure: initial connection handshake and establishment of additional connections]
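A sketch of the token derivation mentioned above, following RFC 6824 (the token is the most significant 32 bits of the SHA-1 hash of the key); the example key is made up:

```python
import hashlib

def mptcp_token(key: bytes) -> int:
    """MPTCP session token: high-order 32 bits of SHA-1(key), per RFC 6824."""
    digest = hashlib.sha1(key).digest()
    return int.from_bytes(digest[:4], "big")

# A subflow's MP_JOIN carries this token so the peer can locate the
# existing MPTCP session without revealing the key itself.
token = mptcp_token(bytes.fromhex("0102030405060708"))
print(f"{token:#010x}")
```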
Fun with middleboxes

• In NAT & firewall scenarios: the server cannot initiate a new connection
• ➡ Devices announce available addresses via the ADD_ADDR option
• The client may establish the second connection in this scenario
(More) Fun with middleboxes

• Middleboxes mess with TCP streams (aggregate, rewrite, etc.)
  ■ They rely on consistent sequence numbers per substream
  ■ Requires an additional mechanism to track the sequence in the overall stream
  ■ Absolutely independent from substream handling, i.e., ACKs are sent for substreams even if not expected in the aggregated stream
• Placement in the overall stream carried in a TCP option header?
  ■ Not feasible (aggregation, TSO)
• Sender specifies a fixed "mapping" of data to subflows
  ■ Receiver informed in advance
  ■ May be remapped for retransmits (if one flow dies)
  ■ Fall-back to single-flow TCP possible via an "infinite mapping"
MPTCP: Retransmits

• Obvious: needs to deal with retransmits at the subflow level
  ■ Middleboxes may introduce them
  ■ MPTCP instances may therefore use them too
• Subflows may fail (temporarily or permanently)
  ■ Needs retransmits over a different subflow
  ■ Implies changes in the mapping
  ■ The underlying TCP connection on the original flow still needs to retransmit
  ■ Would break the connection otherwise ➡ performance penalty
• Scheduling?
MPTCP: Congestion control (I)

• Observation: MPTCP "smears" congestion over the network
• Naïve solution for CC: use the congestion control of the subflows
  ■ Unfair advantage against regular TCP
  ■ Depends on the number of used flows
  ■ Also may not be optimal:
    Naïve: λ1 = λ2 = λ3
    Optimal: λ1 = λ3; λ2 = 0
• Also possible: measure & control subflows together
  ■ May lead to "flappiness", i.e., sudden load switches
  ■ May smear congestion too much

[Figure: example topology with flows λ1, λ2, λ3]
MPTCP: Congestion control (II)

• Loosely coupled subflows: RFC 6356 suggests, with each ACK:

    cwnd_i += #bytesAcked × MSS_i × min( α / cwnd_total , 1 / cwnd_i )

    α = cwnd_total × ( max_i(√cwnd_i / rtt_i) / Σ_i(cwnd_i / rtt_i) )²

• Losses still halve cwnd_i
• Throttle subflows so as not to exceed the rate of a "virtual" single TCP flow
  ■ Alpha controls the allowed violation of that condition
• Still: no load-balancing between interfering flows
• Several scenarios with unfairness towards Reno TCP and between MPTCP instances discovered (not Pareto-optimal)

C. Raiciu: Practical Congestion Control for Multipath Transport Protocols, Tech. Rep., 2009
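The coupled increase above can be sketched as follows (function and variable names are ours; cwnds in segments, rtts in seconds). Note that cwnd_total · max(√cwnd_i/rtt_i)² equals cwnd_total · max(cwnd_i/rtt_i²), which is how RFC 6356 states α.

```python
def lia_alpha(cwnds, rtts):
    """Aggressiveness factor alpha of the RFC 6356 linked-increases algorithm."""
    total = sum(cwnds)
    best = max(w / (r * r) for w, r in zip(cwnds, rtts))   # max cwnd_i / rtt_i^2
    denom = sum(w / r for w, r in zip(cwnds, rtts)) ** 2   # (sum cwnd_i / rtt_i)^2
    return total * best / denom

def lia_increase(i, cwnds, rtts, bytes_acked, mss):
    """Per-ACK cwnd increase for subflow i, capped by the uncoupled 1/cwnd_i."""
    alpha = lia_alpha(cwnds, rtts)
    return bytes_acked * mss * min(alpha / sum(cwnds), 1.0 / cwnds[i])

# Two identical subflows: each grows at half the rate of one regular TCP flow
# (1/cwnd = 0.1 per ACK), so together they take what a single flow would.
inc = lia_increase(0, cwnds=[10, 10], rtts=[0.1, 0.1], bytes_acked=1, mss=1)
print(round(inc, 6))  # 0.025
```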
MPTCP: Congestion control (III)

• Opportunistic linked-increases algorithm (OLIA)
• Also loosely coupled
• Addresses problems with Pareto optimality
• With each ACK:

    cwnd_i += ( (√cwnd_i / rtt_i) / Σ_i(cwnd_i / rtt_i) )²  +  α_i / cwnd_i

    (first term: like before! second term: controls aggressiveness for the subflow)

• α_i is positive for subflows that have not reached the estimated bandwidth/delay ratio

R. Khalili et al.: MPTCP Is Not Pareto-Optimal: Performance Issues and a Possible Solution, IEEE/ACM Transactions on Networking, 2013
Multipath TCP – Discussion

• Handshake:
  ■ Why is MP_CAPABLE sent three times?
  ■ Why is the second handshake based on a normal TCP handshake?
  ■ Why is it a four-way handshake?
• Scenarios:
  ■ Can I use MPTCP with a single NIC?
  ■ Can I use MPTCP with a single IP address?
  ■ Can I use MPTCP to increase performance if I have two DSL lines (with NAT)?
  ■ Does MPTCP help with delay problems?
  ■ Do applications need to be aware of MPTCP? What does MPTCP mean for "legacy" applications?
• Security:
  ■ IDS & firewalls?
  ■ SYN cookies?
Content

• TCP congestion control schemes
• Multipath TCP
• SCTP
• SPDY and HTTP/2
• QUIC
Stream Control Transmission Protocol (SCTP)

• Protocol developed to transport SS7 messages over IP
• Reliable and message-oriented
• Like a crossbreed of TCP and UDP
• First RFC 2960 in October 2000, current RFC 4960 (September 2007)
• Many shortcomings of TCP were already known during the design phase
• So, from scratch, SCTP supported:
  ■ Selective ACKs
  ■ Multistreaming
  ■ Multihoming
  ■ Heartbeats
  ■ TLV coding of extension headers
  ■ SYN-flood protection
  ■ Better protocol state handling (no half-open connections)
  ■ …
SCTP – Multistreaming

[Excerpt from IEEE Communications Magazine, April 2004:]

…upper-layer applications. In other words, the HOL effect is limited within the scope of individual streams, but does not affect the entire association.

Multistreaming and HOL blocking are illustrated in Fig. 4, where an SCTP association consisting of four streams is shown. Segments are identified by stream sequence numbers (SSNs) [1] that are unique within a stream, but different streams can have the same SSN. In the figure, SSN 11 in stream 1 has been delivered to the upper-layer application, and SSN 9 of the second stream is lost in the network; SSNs 10, 11, 12 are therefore queued in the buffer of the second stream, waiting for retransmitted SSN 9 to arrive. Arriving SSN 13 at stream 2 will also be queued. Similarly, SSN 4 of stream 3 is missing during the transmission, resulting in the blocking of SSNs 5, 6, and 7. For stream 4, SSN 21 is being delivered to the upper-layer application, while arriving SSN 23 will be queued in the buffer because of missing SSN 22. Note that when SSN 12 arrives at the buffer of stream 1, it can be delivered immediately even if the other streams are blocked. This illustrates that segments arriving on stream 1 can still be delivered to the upper-layer application, although streams 2 and 3 are (and stream 4 will be) blocked because of lost segments.

An example application of using SCTP multistreaming in Web browsing is shown in Fig. 5. Here, an HTML page is split into five objects: a Java applet, an ActiveX control, two images, and plain text. Instead of creating a separate connection for each object as in TCP, SCTP makes use of its multistreaming feature to speed up the transfer of HTML pages. By transmitting each object in a separate stream, the HOL effect between different objects can be eliminated. If one object is lost during the transfer, the others can still be delivered to the Web browser at the upper layer while the lost object is being retransmitted from the Web server. This results in a better response time to users while opening only one SCTP association for a particular HTML page.

CONGESTION CONTROL

SCTP congestion control is based on the well-proven rate-adaptive window-based congestion control scheme of TCP. This ensures that SCTP will reduce its sending rate during network congestion and prevent congestion collapse in a shared network. SCTP provides reliable transmission and detects lost, reordered, duplicate, or corrupt packets. It provides reliability by retransmitting lost or corrupt packets. However, there are several major differences between TCP and SCTP:

• SCTP incorporates a fast retransmit algorithm based on SACK gap reports similar to that of TCP SACK. This mechanism speeds up loss detection and increases the bandwidth utilization. One of the major differences between SCTP and TCP is that SCTP does not have an explicit fast recovery phase. SCTP achieves fast recovery automatically with the use of SACK [1].

• Compared to TCP, the use of SACK is mandatory in SCTP, which allows a more robust reaction in the case of multiple losses from a single window of data. This avoids a time-consuming slow start stage after multiple segment losses, thus saving bandwidth and increasing throughput.

• During slow start or congestion avoidance of SCTP, the congestion window (cwnd) is increased by the number of acknowledged bytes; in TCP it is increased by the number of ACK segments received. Since the TCP sender
■ Figure 3. An SCTP association consisting of four streams carrying data from one upper-layer application. [Diagram: application (source), streams 1–4 → SCTP stream buffers → IP/DLL/PHY → application (destination)]

■ Figure 4. An illustration showing HOL blocking of individual streams at the receiver. [Diagram: per-stream receive buffers with queued SSNs]
❑ Multi-streaming at the transport layer avoids head-of-line blocking
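The effect summarized above can be illustrated with a toy delivery model (a sketch only, not real SCTP: the sequence numbering and the round-robin stream assignment are invented for illustration):

```python
def tcp_like(arrived):
    """Single ordered stream: only the gap-free prefix reaches the application."""
    got = set(arrived)
    n = 0
    while n + 1 in got:
        n += 1
    return n

def sctp_like(arrived, stream_of):
    """Per-stream ordering: a gap stalls only the stream it belongs to."""
    got, deliverable = set(arrived), 0
    streams = {}
    for seq in sorted(stream_of):
        streams.setdefault(stream_of[seq], []).append(seq)
    for seqs in streams.values():
        for seq in seqs:
            if seq not in got:
                break                       # this stream waits for retransmission
            deliverable += 1                # the other streams keep delivering
    return deliverable

# 8 messages spread round-robin over 4 streams; message 2 is lost in transit
stream_of = {seq: (seq - 1) % 4 + 1 for seq in range(1, 9)}
arrived = [seq for seq in range(1, 9) if seq != 2]
print(tcp_like(arrived))              # 1: everything behind the gap is blocked
print(sctp_like(arrived, stream_of))  # 6: only stream 2 (messages 2 and 6) stalls
```

With one loss, the single ordered stream delivers only one message to the application, while the multi-stream receiver still delivers six.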
Advanced Networking (SS 17): 07 - Transport Layer Evolution
S. Fu: SCTP: State of the art in research, products, and technical challenges, IEEE Communications Magazine 42(4):64-76, May 2004
46
SCTP – Connection management
❑ Four-way handshake
❑ Server allocates state only AFTER the cookie echo
❑ INIT + INIT ACK may contain TLV-coded options
❑ What does this mean for extensibility? Think of the cookie mechanism
❑ Connection identified by two tags (cf. IPsec SA)
❑ Shutdown leads to an immediate packet flush
❑ No half-open connections
❑ Smaller protocol state machine
(Figure: SCTP four-way handshake and connection-close message sequences)
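The cookie mechanism can be sketched in a few lines (an illustrative model of the concept, not an actual SCTP stack; the cookie contents and the HMAC-SHA-256 choice are assumptions): the server puts the association parameters plus a MAC into the INIT ACK cookie and allocates state only when a valid COOKIE ECHO returns.

```python
import hashlib
import hmac
import os
import time

SECRET = os.urandom(32)  # server-local secret; real stacks rotate this periodically

def make_cookie(client_tag: int, server_tag: int) -> bytes:
    """State cookie for INIT ACK: association parameters + MAC, no server state yet."""
    body = (client_tag.to_bytes(4, "big") + server_tag.to_bytes(4, "big")
            + int(time.time()).to_bytes(8, "big"))
    return body + hmac.new(SECRET, body, hashlib.sha256).digest()

def verify_cookie(cookie: bytes) -> bool:
    """On COOKIE ECHO: recompute the MAC; only a valid cookie creates association state."""
    body, mac = cookie[:-32], cookie[-32:]
    return hmac.compare_digest(hmac.new(SECRET, body, hashlib.sha256).digest(), mac)

cookie = make_cookie(client_tag=0x1234, server_tag=0xCAFE)
forged = cookie[:-1] + bytes([cookie[-1] ^ 1])   # attacker flips one MAC bit
print(verify_cookie(cookie), verify_cookie(forged))  # True False
```

This is why the server survives INIT floods: until the cookie comes back, all "state" travels inside the packet.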
47
SCTP – Chunks (I)
❑ SCTP common header contains only port numbers, a "verification tag" (i.e. connection ID) & a CRC-32 checksum
❑ Any payload & protocol data is transported in "chunks"
❑ Used even for internal purposes, e.g. address configuration
❑ Multiple chunks may be aggregated in one packet
3.2 Counting Outstanding Bytes
As pointed out, cwnd has an influence on the network load and thus on the throughput. Therefore, the way the outstanding bytes, that limit cwnd, are counted is important and should be examined.

Looking at an SCTP packet containing several data chunks, the amount of user data can vary significantly with the size of the individual chunks (i.e. messages), assuming the same packet length.
Fig. 1. SCTP packet format (lengths in bytes): (a) IP header (20) | SCTP common header (12) | data chunk header (16) | 1436 bytes of user data; (b) IP header (20) | SCTP common header (12) | 33 data chunks, each with a 16-byte chunk header and 28 bytes of user data.
In Figure 1(b) the packet contains 33 DATA chunks with 28 bytes of user data each, adding up to 924 bytes of user data, compared to 1436 bytes in the packet in Figure 1(a). Both packets have a size of 1484 bytes. Whereas the overhead is just 1% in (a), the headers add up to 36% in (b) and can be more than 60% for even smaller user message sizes.

Therefore, we have to distinguish between the amount of data that is injected into the network and the user data that arrive at the application layer. Whereas the first has a direct impact on the network load, the second results in the goodput. Both depend on the number of packets (1) that are allowed by the cwnd.
NoOfPackets = ⌈ cwnd / Size_P ⌉    (1)

Calculating the size of a packet (Size_P), the headers for IP (H_IP) and SCTP (H_SCTP) have to be considered as well as the size of the DATA chunks (Size_Chunk):

Size_P = H_IP + H_SCTP + CPP · Size_Chunk    (2)

The number of chunks per packet (CPP) is calculated as

CPP = ⌊ (MTU − H_IP − H_SCTP) / (UMS + P_UMS + H_Chunk) ⌋    (3)

The average user message size (UMS) per packet and the corresponding padding bytes (P_UMS) feature the variable parts of the packets.
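A quick numerical check of Eqs. (1)-(3) (a sketch: header sizes H_IP = 20, H_SCTP = 12, H_Chunk = 16 are taken from Fig. 1 and MTU = 1500 is assumed; the resulting overhead figures come out close to, though not exactly, the percentages quoted above, which may count only part of the headers):

```python
import math

H_IP, H_SCTP, H_CHUNK = 20, 12, 16   # header sizes in bytes, per Fig. 1
MTU = 1500                           # assumed Ethernet MTU

def pad(ums):                        # chunks are padded to 4-byte multiples
    return (-ums) % 4

def chunks_per_packet(ums):          # Eq. (3)
    return (MTU - H_IP - H_SCTP) // (ums + pad(ums) + H_CHUNK)

def packet_size(ums):                # Eq. (2); returns (Size_P, CPP)
    cpp = chunks_per_packet(ums)
    return H_IP + H_SCTP + cpp * (H_CHUNK + ums + pad(ums)), cpp

def packets_needed(cwnd, ums):       # Eq. (1)
    size_p, _ = packet_size(ums)
    return math.ceil(cwnd / size_p)

for ums in (1436, 28):               # the two cases of Fig. 1
    size_p, cpp = packet_size(ums)
    overhead = 1 - cpp * ums / size_p
    print(f"UMS={ums:5d}: CPP={cpp:2d}, packet={size_p} B, "
          f"user data={cpp * ums} B, overhead={100 * overhead:.0f}%")
```

Both layouts produce 1484-byte packets, but the goodput differs by more than a third.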
I. Rüngeler et al.: Congestion and Flow Control in the Context of the Message-Oriented Protocol SCTP, Networking 2009
48
SCTP – Chunks (II)
❑ General chunk header format
❑ Well-known chunk types:
   0 - Payload Data (DATA)
   1 - Initiation (INIT)
   2 - Initiation Acknowledgement (INIT ACK)
   3 - Selective Acknowledgement (SACK)
   4 - Heartbeat Request (HEARTBEAT)
   …
❑ If the chunk type is unknown, the highest 2 bits of the chunk type code determine how to proceed:
   00 - Stop processing the rest of the SCTP packet
   01 - Stop and report an 'Unrecognized Chunk Type'
   10 - Skip this chunk and continue processing
   11 - Skip this chunk and continue processing, but report an error
(Chunk layout: Chunk type (8 bit) | Chunk flags (8 bit) | Chunk length (16 bit), followed by the chunk-specific value and padding of up to 3 bytes)
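A minimal parser for this chunk layout (a sketch; the example packet bytes are made up, only the header handling follows the format above):

```python
import struct

ACTIONS = {0b00: "stop", 0b01: "stop+report", 0b10: "skip", 0b11: "skip+report"}

def parse_chunk(buf, off=0):
    """Parse one chunk header: type (8), flags (8), length (16, incl. header)."""
    ctype, flags, length = struct.unpack_from("!BBH", buf, off)
    value = buf[off + 4 : off + length]
    next_off = off + length + ((-length) % 4)   # chunks are 4-byte aligned
    return ctype, flags, value, next_off

def unknown_type_action(ctype):
    """The top 2 bits of an unknown chunk type select the receiver's action."""
    return ACTIONS[ctype >> 6]

# Two chunks in one packet: a SACK (type 3) and an unknown type 0xC1
pkt = (struct.pack("!BBH", 3, 0, 8) + b"\x00" * 4
       + struct.pack("!BBH", 0xC1, 0, 5) + b"\xAA" + b"\x00" * 3)  # 3 pad bytes
t1, _, v1, off = parse_chunk(pkt)
t2, _, v2, _ = parse_chunk(pkt, off)
print(t1, unknown_type_action(t2))   # the top bits of 0xC1 are 11: skip+report
```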
49
SCTP – Data chunk
❑ Chunks may add their own header
❑ Example: Payload data chunk
❑ Flags carry the reorder requirement and fragmentation flags
❑ Payload Protocol Identifier is passed to the application transparently
❑ Stream fields are used to transport multiple data streams over an SCTP connection
(DATA chunk layout: Type = 0 (8 bit) | Chunk flags (8 bit) | Chunk length (16 bit) | Transmission Sequence Number (TSN, 32 bit) | Stream Identifier (16 bit) | Stream Sequence Number (16 bit) | Payload Protocol Identifier (32 bit) | User Data | padding (up to 3 bytes))
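Packing and unpacking these DATA chunk fields might look like this (a sketch; the flag handling is simplified to the U bit only, the B/E fragmentation bits are left out):

```python
import struct

def build_data_chunk(tsn, sid, ssn, ppid, payload, unordered=False):
    """DATA chunk (type 0): flags, length, TSN, stream id, stream seq, PPID, data."""
    flags = 0x04 if unordered else 0x00      # U bit; B/E bits omitted in this sketch
    length = 16 + len(payload)               # length covers the 16-byte header too
    padding = (-length) % 4                  # pad chunk to a 4-byte boundary
    return (struct.pack("!BBHIHHI", 0, flags, length, tsn, sid, ssn, ppid)
            + payload + b"\x00" * padding)

def parse_data_chunk(buf):
    ctype, flags, length, tsn, sid, ssn, ppid = struct.unpack_from("!BBHIHHI", buf)
    return dict(tsn=tsn, stream=sid, ssn=ssn, ppid=ppid, data=buf[16:length])

chunk = build_data_chunk(tsn=1000, sid=2, ssn=7, ppid=0, payload=b"hello")
print(parse_data_chunk(chunk))
```

The (Stream Identifier, Stream Sequence Number) pair is what carries the multi-streaming shown earlier: ordering is enforced per SID, while the TSN stays global for reliability.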
50
SCTP – Multiple paths
❑ Alternate paths are probed by HEARTBEAT messages including a 64-bit nonce
❑ Addresses are exchanged during the INIT sequence
❑ Allows secure setup of alternative paths
❑ Support for dynamic addresses added with RFC 5061
■ Addresses added and removed using authenticated chunks (iff globally addressable)
■ Still requiring verification
❑ Messages are only sent over the primary path
❑ Switch after failure detection
❑ Does not directly allow for load-sharing!
❑ Multipath SCTP: https://tools.ietf.org/html/draft-tuexen-tsvwg-sctp-multipath-13 (December 2016, but no significant changes lately)
51
SCTP – Current State
❑ Not widely deployed
❑ Many reasons:
❑ No killer feature
❑ Application developers must explicitly enable it
❑ Firewalls & NATs?
❑ RFC for NAT support not even done yet
http://www.caida.org/data/realtime/passive/?monitor=equinix-chicago-dirA&row=timescales&col=sources&sources=proto&graphs_sing=ts&counters_sing=bits&timescales=24&timescales=168&timescales=672&timescales=17520
52
Content
❑ TCP congestion control schemes
❑ Multipath TCP
❑ SCTP
❑ SPDY and HTTP/2
❑ QUIC
53
SPDY and HTTP/2
❑ Is this not application layer?!
http://www.caida.org/data/realtime/passive/?monitor=equinix-chicago-dirA
55
SPDY and HTTP/2
❑ In 2009, Google announced an HTTP successor: SPDY
❑ Goal: 50% reduction of page load time
❑ Includes HTTP header compression
❑ As of 2015 it is deprecated
❑ Now HTTP/2 is the gold standard (RFC 7540)
❑ Shares many of the ideas of SPDY
❑ Addressed key problem:
❑ HTTP/1.1 pipelining is broken due to misbehaving applications and head-of-line blocking
❑ In practice mostly disabled
❑ Problems with TCP congestion control
❑ Solution: build multi-stream support on top of TCP/TLS
■ Idea similar to SCTP but heavily optimized for web traffic & backward compatible with home routers
56
HTTP/2 – Binary encoding
Ilya Grigorik: High Performance Browser Networking, O'Reilly, 2013
❑ HTTP/2 emulates "normal" HTTP to the application
❑ Internal encoding uses binary data & compression
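The binary encoding can be illustrated with the 9-byte frame header from RFC 7540 (a sketch only: a real connection additionally needs the client preface and a SETTINGS exchange, and the HEADERS payload here is a single made-up HPACK byte):

```python
import struct

FRAME_TYPES = {0x0: "DATA", 0x1: "HEADERS", 0x4: "SETTINGS", 0x8: "WINDOW_UPDATE"}

def build_frame(ftype, flags, stream_id, payload):
    """Frame header: 24-bit length, 8-bit type, 8-bit flags, 31-bit stream id."""
    header = struct.pack("!I", len(payload))[1:]               # 24-bit length
    header += struct.pack("!BBI", ftype, flags, stream_id & 0x7FFFFFFF)
    return header + payload

def parse_frame(buf):
    length = int.from_bytes(buf[:3], "big")
    ftype, flags, sid = struct.unpack_from("!BBI", buf, 3)
    return FRAME_TYPES.get(ftype, "?"), flags, sid & 0x7FFFFFFF, buf[9:9 + length]

# Client-initiated streams use odd ids; a HEADERS frame on stream 1:
frame = build_frame(0x1, 0x04, 1, b"\x82")   # flags=END_HEADERS, one HPACK byte
print(parse_frame(frame))
```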
57
HTTP/2 – Streams
❑ Multiple streams may be interleaved
❑ Prevents head-of-line blocking
❑ Client-initiated streams carry odd numbers
❑ Proactive object delivery by the server over server-initiated streams
❑ Promises allow the server to advertise upcoming proactively pushed objects
❑ Streams may be prioritized
Ilya Grigorik: High Performance Browser Networking, O'Reilly, 2013
58
HTTP/2 – Performance (I)
❑ Obvious: object pushing & binary encoding optimize speed
Fig. 4. Page load time with an ADSL Livebox, 50 ms latency. (Figure: bar chart of page load time in seconds, scale 0-4.5 s, per Web site.)
time of 400 ms. There is still some benefit: the page load time decreases by 20% on average. Naturally, we expect to see worse performance on a 3G network. The reality is that there was not enough packet loss on the 3G network to influence the page load time. The recent study by AT&T on SPDY's performance in [20] stated that the performance of SPDY was worse than that of HTTP/1.1 over cellular networks. One would expect this to be valid for HTTP/2, as it is an evolution of SPDY.
Fig. 5. Page load time with a 3G modem, 400 ms latency. (Figure: bar chart of page load time in seconds, scale 0-25 s, HTTP/1.1 vs. HTTP/2 per Web site.)
3) Local Area Network tests: Latency. Because the majority of Internet browsing is moving to mobile devices, it is worthwhile to look at the influence of latency and packet loss on HTTP/2. To this end, we first vary the network latency on our local platform.

Figure 6 shows the page load time in HTTP/1.1 and HTTP/2 for various latency values. For each value, we plotted the minimum and maximum value, the lower and upper quartiles, along with the median. Interestingly, an increasing latency widens the difference between HTTP/1.1 and HTTP/2, which means that HTTP/2 reacts well to latency. This suggests that this positive influence might also occur on cellular networks, as they suffer from higher latency.

Packet loss. We saw that HTTP/2 reacts positively to high latency. But another important characteristic of cellular networks is significant packet loss. That is why we conduct a similar experiment, this time with a fixed latency and varying the
Fig. 6. Impact of latency, 0% loss. By pairs, left: HTTP/1.1, right: HTTP/2. (Figure: box plots of page load time in seconds, scale 0-14 s, versus latency 0-200 ms.)
packet loss. Figure 7 shows a poor behaviour: the higher the packet loss, the smaller the benefits of HTTP/2. Furthermore, the page load time ratio between HTTP/2 and HTTP/1.1 often exceeds 1, meaning that HTTP/2 takes longer than HTTP/1.1.

This can be explained as follows: HTTP/2 uses only one TCP connection to communicate between the client and the server. When this single connection suffers from packet loss, all streams running over this unique TCP connection are negatively impacted. In HTTP/1.1 the situation is different, as several TCP connections are open between the client and the server, and this mitigates the packet loss problem. AT&T in [20] already found similar results for SPDY, which is the ancestor of HTTP/2.
Fig. 7. Impact of packet loss, 100 ms latency. (Figure: HTTP/2 over HTTP/1.1 page-load-time ratio, scale 0-1.6, per Web site, at 0% and 6% loss.)
From an overall perspective, HTTP/2 decreases page load times because it gets past the head-of-line blocking issue by using multiplexing. However, several studies [9] [10] [20] have already stated that SPDY was negatively impacted by packet loss on cellular networks. This statement is likely to hold true for HTTP/2 because it keeps the same idea as SPDY of multiplexing requests over a single TCP connection. This problem stems from the underlying transport protocol, and as such only a switch to another transport protocol can solve it.
B. Evaluations on Server push and Priority
Besides the multiplexing and compression mechanisms, there is a second class of new features which is optional
H. Saxcé et al.: Is HTTP/2 Really Faster Than HTTP/1.1?, 18th IEEE Global Internet Symposium, 2015
59
HTTP/2 – Performance (II)
❑ Key question: Does the larger congestion control window outweigh the loss due to head-of-line blocking?
❑ Discuss: Why may HOL still occur?
❑ Discuss: What is the impact of loss and delay?
60
HTTP/2 – Performance (III)
61
HTTP/2 – Performance (IV)
62
Content
❑ TCP congestion control schemes
❑ Multipath TCP
❑ SCTP
❑ SPDY and HTTP/2
❑ QUIC
63
Quick UDP Internet Connections (QUIC)
❑ New transport layer protocol introduced by Google to remove shortcomings of SPDY/HTTP/2 over TCP
❑ Currently an IETF draft
❑ See https://tools.ietf.org/html/draft-ietf-quic-transport-04
❑ Goals:
❑ Multi-streaming without HOL
❑ Multi-homing
❑ Backward compatible
❑ Built-in security (i.e. TLS)
❑ Reduced latency through a simpler handshake
❑ Decoupling of the congestion control algorithm from the protocol
❑ FEC
64
QUIC in the protocol stack
❑ QUIC operates at session/transport/application layer
❑ UDP only used for backward compatibility (port 80 or 443)
❑ Sessions identified by a 64-bit connection ID
(Stack comparison: HTTP/2 | TLS 1.2 | TCP | IP, versus HTTP/2 API | QUIC | UDP | IP)
J. Iyengar: QUIC - Redefining Internet Transport
65
QUIC – Packet format (according to RFC draft)
❑ Long header:
(first octet: marker bit + 7-bit type (8 bit) | Connection ID (64 bit) | Packet Counter (32 bit) | Version (32 bit))
❑ Short header:
(flags (8 bit) | Connection ID (0 or 64 bit) | Packet Counter (8, 16 or 32 bit))
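A sketch of packing these two headers (field order and sizes as on the slide, which follows an early IETF draft; the flag values in the short header are invented for illustration, and only the 32-bit packet counter variant is shown):

```python
import struct

def long_header(ptype: int, conn_id: int, pkt_num: int, version: int) -> bytes:
    """Long header: marker bit + 7-bit type, 64-bit connection ID,
    32-bit packet counter, 32-bit version."""
    assert 0 <= ptype < 0x80
    return struct.pack("!BQII", 0x80 | ptype, conn_id, pkt_num, version)

def short_header(conn_id: int, pkt_num: int, omit_conn_id: bool = False) -> bytes:
    """Short header: flags octet, optional 64-bit connection ID, then the
    packet counter (8-, 16- or 32-bit in the draft; 32-bit here)."""
    flags = 0x43 if omit_conn_id else 0x03   # illustrative flag values only
    out = struct.pack("!B", flags)
    if not omit_conn_id:
        out += struct.pack("!Q", conn_id)
    return out + struct.pack("!I", pkt_num)

hdr = long_header(0x02, conn_id=0xDEADBEEF, pkt_num=1, version=0x51303334)
print(len(hdr), len(short_header(0xDEADBEEF, 2)))  # 17 13
```

Omitting the connection ID in the short header saves 8 bytes per packet once the 5-tuple unambiguously identifies the session.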
66
QUIC – Connection “establishment”
(Figure: handshake message sequences for TCP + TLS versus QUIC (equivalent to TCP + TLS))
0-RTT! No! Just no timeouts – properties may be cached “forever”
Magic?
67
QUIC – Actual connection establishment
❑ Indication by the server: alternate-protocol:443:quic,p=0.02
❑ Client initiates with version and server name
❑ Server "rejects", giving certificates, configuration & a "source-address token" to prevent spoofing
❑ Normal "0-RTT" handshake follows
❑ Always contains the source-address token
❑ Contains the server's DNS name
❑ Discuss:
❑ What does this handshake mean for DoS resistance?
❑ What does it mean for PFS?
❑ What happens if the first packet is reordered?
Server starts to commit resources
68
QUIC – Change in security model significant
Figure 1: Generic replay attack discovered by Daniel Kahn Gillmor in the IETF TLS working group discussion around TLS 1.3 [Res15b]. The 0-RTT data "request" could, e.g., be an HTTP request "POST /buy-something". (Message sequence between client, attacker, and server: the client sends 0-RTT key-exchange messages and a 0-RTT data "request"; the server accepts 0-RTT, processes the "request", and answers with key-exchange response messages; the attacker enforces a loss of state (e.g., a reboot) and replays the 0-RTT key-exchange messages and data "request"; the server now rejects 0-RTT after the state loss for security reasons; the client completes the final key exchange and resends the data "request" under the final key (to ensure reliable transmission), so the server processes the "request" again.)
Note that the contrived requirement that the attacker is able to reboot the server (while the client keeps waiting for a response) vanishes in a real-world scenario with distributed server clusters, where the attacker instead simply forwards the 0-RTT messages to two servers and drops the first server's response. The described attack hence in particular affects the cryptographic design of QUIC, which (among others) specifically targets settings with distributed clusters. Holding up the originally envisioned 0-RTT full replay protection being impossible, Langley and Chang write in the specification of July 2015 [LC15] (Rev 20150720) that this design is "destined to die" and will be replaced by (an adapted version of) the TLS 1.3 handshake. We, however, argue here that QUIC's strategy in Rev 20130620 still supports some kind of replay resistance, only at a different level. TLS 1.3, in contrast, forgoes any protection mechanisms and instead accepts replays as inevitable (on the channel level). Developers using TLS 1.3 are supposed to be provided with a different API call for sending 0-RTT data [Res16e, Appendix B.1], indicating its replayability, and are responsible for taking replays into account for such data.

There is, then, a significant conceptual gap between replays (of key-exchange messages and keys) on the key-exchange level, and the replay of user data faced on the level of the overall secure channel protocol in the 0-RTT setting. While the former can effectively be prevented within the key exchange protocol, this does not necessarily prevent the latter, which can be (and in practice is) induced by the network stack of the channel actively and automatically re-sending (presumably) rejected 0-RTT data under the main key. The latter type of logical, network-stack replays is hence fundamentally beyond what key exchange protocols can protect against.
M. Fischlin et al.: Replay Attacks on Zero Round-Trip Time: The Case of the TLS 1.3 Handshake Candidates, 2nd IEEE European Symposium on Security and Privacy (EuroS&P 2017)
69
QUIC – Countering opportunistic ACK attacks
❑ Danger of opportunistic ACKs: hostile client
❑ Uses HTTP to "download" a huge file
❑ Injects ACKs even though it has not received the data
❑ Server uses up much of its bandwidth
❑ TCP offers no protection itself
❑ QUIC does so by allowing servers to skip sequence ranges
❑ Design criterion
❑ May reduce the load induced by hostile clients
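A toy model of this countermeasure (a sketch: only the idea of skipping sequence numbers comes from the slide; the skip probability, class names, and detection logic are invented for illustration). The server occasionally skips a packet number it never sends; a client that ACKs a skipped number cannot have received it and must be lying:

```python
import random

class Sender:
    def __init__(self, seed=42):
        self.rng = random.Random(seed)   # fixed seed keeps the example repeatable
        self.next_pn = 1
        self.skipped = set()

    def send(self):
        if self.rng.random() < 0.1:      # occasionally skip a packet number
            self.skipped.add(self.next_pn)
            self.next_pn += 1
        pn = self.next_pn
        self.next_pn += 1
        return pn

    def on_ack(self, pns):
        if any(pn in self.skipped for pn in pns):
            raise ValueError("opportunistic ACK: peer ACKed a never-sent packet")

srv = Sender()
sent = [srv.send() for _ in range(50)]
srv.on_ack(sent)                          # honest client: only real numbers, fine
try:
    srv.on_ack(list(range(1, srv.next_pn)))  # hostile client blindly ACKs everything
except ValueError as e:
    print(e)
```

An honest client never notices the gaps (it simply never sees those numbers), while a client fabricating ACKs for the whole range trips over them.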
70
QUIC – Production but work in progress
❑ Latest value found: 9.05% of Google traffic is QUIC
❑ General standardization of QUIC – in progress
❑ Using BBR with QUIC – in progress
❑ FEC support – removed due to performance decrease
❑ Multihoming & multipath – not implemented yet
❑ Requirement due to some middleboxes: there must always be a WORKING fallback path to TCP
❑ Other applications?
❑ Currently very tight bundling to HTTP/2
❑ Various difficulties: first packet may be retransmitted silently, fallback requirement, privacy issues due to tracking of the connection ID?
❑ See https://tools.ietf.org/html/draft-kuehlewind-quic-applicability-00