Multipath TCP
Costin Raiciu, University Politehnica of Bucharest
Christoph Paasch, Université catholique de Louvain
Joint work with: Mark Handley and Damon Wischik (University College London); Olivier Bonaventure and Sébastien Barré (Université catholique de Louvain); and many, many others.
Networks are becoming multipath
– Mobile devices have multiple wireless connections
– Datacenters have redundant topologies
– Servers are multi-homed
How do we use these networks? TCP.
TCP is used by most applications; it offers byte-oriented reliable delivery and adjusts its load to network conditions.
[Labovitz et al. – Internet Interdomain Traffic – Sigcomm 2010]
TCP is single path
A TCP connection:
– uses a single path in the network, regardless of the network topology
– is tied to the source and destination addresses of the endpoints
This mismatch between network and transport creates problems.
Poor Performance for Mobile Users
A phone attached to a 3G cell tower can offload its traffic to WiFi. But when it does, all ongoing TCP connections die: each connection is tied to the address of the 3G interface.
Collisions in datacenters
[Al-Fares et al. – A Scalable, Commodity Data Center Network Architecture – Sigcomm 2008]
Single-path TCP collisions reduce throughput
[Raiciu et al. – Sigcomm 2011]
Multipath TCP
Multipath TCP (MPTCP) is an evolution of TCP that can effectively use multiple paths within a single transport connection
• Supports unmodified applications
• Works over today’s networks
• Standardized at the IETF (almost there)
Multipath TCP components
– Connection setup
– Sending data over multiple paths
– Encoding control information
– Dealing with (many) middleboxes
– Congestion control
[Raiciu et al. – NSDI 2012]
[Wischik et al. – NSDI 2011]
MPTCP Connection Management
Setting up the initial subflow looks like a regular TCP handshake, with extra options:
SYN, MP_CAPABLE X →
← SYN/ACK, MP_CAPABLE Y
This creates SUBFLOW 1 (with its own CWND, Snd.SEQNO and Rcv.SEQNO) for flow Y.
A second path then joins the existing connection:
SYN, JOIN Y →
← SYN/ACK, JOIN X
Now SUBFLOW 2 (again with its own CWND, Snd.SEQNO and Rcv.SEQNO) belongs to the same flow Y.
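The state created by this handshake can be sketched as follows. This is a toy model with illustrative names (MptcpConnection, Subflow); real MPTCP derives the join token from a hash of the key and authenticates joins with HMACs, which this sketch omits:

```python
# Toy model of MPTCP connection setup and subflow joining.
# Illustrative only: keeps just the structure shown above.

class Subflow:
    def __init__(self, src, dst):
        self.src, self.dst = src, dst   # (addr, port) endpoints
        self.cwnd = 10                  # per-subflow congestion window
        self.snd_seqno = 0              # per-subflow send sequence number
        self.rcv_seqno = 0              # per-subflow receive sequence number

class MptcpConnection:
    def __init__(self, local_key, remote_key):
        self.local_key = local_key      # "X", sent in SYN + MP_CAPABLE
        self.remote_key = remote_key    # "Y", from SYN/ACK + MP_CAPABLE
        self.subflows = []

    def initial_subflow(self, src, dst):
        # MP_CAPABLE handshake done: create subflow 1.
        self.subflows.append(Subflow(src, dst))

    def join(self, src, dst, token):
        # A JOIN names the existing flow so the peer can attach the
        # new subflow to the right connection.
        assert token == self.remote_key, "JOIN must name an existing flow"
        self.subflows.append(Subflow(src, dst))

conn = MptcpConnection(local_key="X", remote_key="Y")
conn.initial_subflow(("10.0.0.1", 4242), ("192.0.2.1", 80))  # e.g. WiFi
conn.join(("10.0.1.1", 4243), ("192.0.2.1", 80), token="Y")  # e.g. 3G
print(len(conn.subflows), "subflows in one MPTCP connection")
```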
TCP Packet Header (20 bytes fixed, plus 0–40 bytes of options):
| Source Port | Destination Port |
| Sequence Number |
| Acknowledgment Number |
| Header Length | Reserved | Code Bits | Receive Window |
| Checksum | Urgent Pointer |
| Options (0–40 bytes) |
| Data |
Sequence Numbers
Packets go over multiple paths:
– Need sequence numbers to put them back in sequence.
– Need sequence numbers to infer loss on a single path.
Options:
– One sequence space shared across all paths?
– One sequence space per path, plus an extra one to put data back in the correct order at the receiver?
• One sequence space per path is preferable:
– Loss inference is more reliable.
– Some firewalls/proxies expect to see all the sequence numbers on a path.
• The outer TCP header holds the subflow sequence numbers. Where do we put the data sequence numbers?
MPTCP Packet Header
The same 20-byte TCP header, but the ports, sequence number, acknowledgment number and receive window are per-subflow values. The data sequence number and data ACK travel in the TCP options (0–40 bytes).
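A simplified sketch of such a mapping: a data-sequence option says "subflow bytes starting at s carry connection-level bytes starting at d, for len bytes". The field widths below are illustrative, not the exact wire format of the real option:

```python
import struct

# Toy encoding of a data-sequence mapping option. Illustrative only:
# the real MPTCP option also carries a data ACK, flags and an
# optional checksum, with different field widths.

def pack_mapping(data_seq, subflow_seq, length):
    # 8-byte data seq, 4-byte subflow seq, 4-byte length (network order)
    return struct.pack("!QII", data_seq, subflow_seq, length)

def unpack_mapping(blob):
    return struct.unpack("!QII", blob)

opt = pack_mapping(data_seq=10000, subflow_seq=1000, length=1000)
print(unpack_mapping(opt))   # (10000, 1000, 1000)
```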
MPTCP Operation
Each data segment carries two sequence numbers: the subflow sequence number (SEQ) in the TCP header and the data sequence number (DSEQ) in the options.
1. A segment goes out on SUBFLOW 1: DATA, SEQ 1000, DSEQ 10000.
2. The next segment goes out on SUBFLOW 2: DATA, SEQ 5000, DSEQ 11000.
3. Subflow 1's segment is acknowledged: ACK 2000 on the subflow, together with DATA ACK 11000 for the connection as a whole.
4. If the segment on subflow 2 is lost, its data can be retransmitted on subflow 1 as DATA, SEQ 2000, DSEQ 11000: each subflow keeps its own consistent sequence space, while the data sequence numbers tie the whole flow together.
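A toy receiver-side sketch (illustrative names, no relation to the kernel code) of how the data sequence space puts bytes from different subflows back in order:

```python
# Toy connection-level reassembly: segments arrive on any subflow,
# each tagged with a data sequence number (DSEQ); the receiver
# releases bytes in DSEQ order regardless of which path they took.

class Reassembler:
    def __init__(self, next_dseq):
        self.next_dseq = next_dseq    # next in-order data sequence number
        self.buffered = {}            # dseq -> payload (out-of-order data)

    def on_segment(self, dseq, payload):
        self.buffered[dseq] = payload
        out = []
        while self.next_dseq in self.buffered:   # deliver in-order prefix
            data = self.buffered.pop(self.next_dseq)
            out.append(data)
            self.next_dseq += len(data)
        # returns in-order data plus the cumulative DATA ACK value
        return b"".join(out), self.next_dseq

rx = Reassembler(next_dseq=10000)
print(rx.on_segment(11000, b"B" * 1000))  # arrived first: buffered, no data
print(rx.on_segment(10000, b"A" * 1000))  # fills the gap: 2000 bytes out
```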
Multipath TCP Congestion Control
Packet switching 'pools' circuits: two circuits become a link. Multipath 'pools' links: two separate links become a pool of links. TCP controls how a link is shared. How should a pool be shared?
Design goal 1: Multipath TCP should be fair to regular TCP at shared bottlenecks.
To be fair, Multipath TCP should take as much capacity as TCP at a bottleneck link, no matter how many subflows it is using.
[Figure: a Multipath TCP flow with two subflows competing with a regular TCP flow at one bottleneck.]
Design goal 2: MPTCP should use efficient paths.
Consider three flows on a triangle of 12Mb/s links. Each flow has a choice of a 1-hop and a 2-hop path. How should it split its traffic?
– If each flow split its traffic 1:1, each would get 8Mb/s.
– Split 2:1 (favouring the 1-hop path): 9Mb/s each.
– Split 4:1: 10Mb/s each.
– Split ∞:1, i.e. all traffic on the 1-hop path: 12Mb/s each, the efficient outcome.
Design goal 3: MPTCP should get at least as much as TCP on the best path.
Design goal 2 says to send all your traffic on the least congested path: here, the 3G path (low loss, high RTT) rather than the WiFi path (high loss, small RTT). But the 3G path's high RTT means it gives low throughput, so goal 2 alone is not enough.
How does TCP congestion control work?
Maintain a congestion window w.
– Increase w for each ACK, by 1/w.
– Decrease w for each drop, by w/2.
How does MPTCP congestion control work?
Maintain a congestion window w_r, one window for each path, where r ∊ R ranges over the set of available paths.
– Increase w_r for each ACK on path r, by α / Σ_r w_r.
– Decrease w_r for each drop on path r, by w_r / 2.
The coupled increase term α / Σ_r w_r serves goal 2: since ACKs arrive on path r in proportion to w_r, windows grow in proportion to their size while drops shrink the windows on congested paths, so traffic shifts away from congestion. The parameter α serves goals 1 & 3: it is chosen so that the MPTCP flow takes no more than, but at least as much as, the best single-path TCP would at any bottleneck.
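A minimal sketch of this coupled rule with a fixed α; the actual algorithm (Wischik et al., NSDI 2011) adapts α from the windows and RTTs to meet goals 1 & 3, which this toy skips:

```python
# Toy coupled congestion control: one window per path, with the
# increase linked through the total window. Fixed alpha for
# simplicity; real MPTCP computes alpha from windows and RTTs.

def on_ack(windows, r, alpha=1.0):
    total = sum(windows.values())
    windows[r] += alpha / total        # coupled increase (TCP would use 1/w_r)

def on_drop(windows, r):
    windows[r] = max(windows[r] / 2.0, 1.0)   # multiplicative decrease

windows = {"wifi": 10.0, "3g": 10.0}
on_drop(windows, "wifi")               # the lossy path halves its window
for _ in range(100):                   # both paths keep receiving ACKs
    on_ack(windows, "wifi")
    on_ack(windows, "3g")
print(windows)                         # the 3g window stays larger:
                                       # traffic has shifted off the lossy path
```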
Applications of Multipath TCP
At a multihomed web server, MPTCP tries to share the 'pooled access capacity' fairly.
Setup: the server has two 100Mb/s access links. Initially, 2 TCPs share one link at 50Mb/s each, and 4 TCPs share the other at 25Mb/s each.
Now add MPTCP flows, each with one subflow on each link:
– 1 MPTCP: 2 TCPs @ 33Mb/s, 1 MPTCP @ 33Mb/s, 4 TCPs @ 25Mb/s.
– 2 MPTCPs: 2 TCPs @ 25Mb/s, 2 MPTCPs @ 25Mb/s, 4 TCPs @ 25Mb/s. The total capacity, 200Mb/s, is shared out evenly between all 8 flows.
– 3 MPTCPs: all 9 flows get 22Mb/s.
– 4 MPTCPs: all 10 flows get 20Mb/s.
It's as if they were all sharing a single 200Mb/s link: the two links can be said to form a 200Mb/s pool.
We confirmed in experiments that MPTCP nearly manages to pool the capacity of the two access links. Setup: two 100Mb/s access links, 10ms delay; 5 TCPs on one link and 15 TCPs on the other (20 flows), then 10 MPTCP flows added (30 flows). [Figure: throughput per flow (Mb/s) over time (min).]
MPTCP makes a collection of links behave like a single large pool of capacity: if the total capacity is C and there are n flows, each flow gets throughput C/n.
Multipath TCP can pool datacenter networks
Instead of using one path for each flow, use many random paths. Don't worry about collisions: just don't send (much) traffic on colliding paths, as the toy simulation below suggests.
Multipath TCP in data centers
MPTCP better utilizes the FatTree network
MPTCP on EC2
• Amazon EC2: infrastructure as a service
– We can borrow virtual machines by the hour
– These run in Amazon data centers worldwide
– We can boot our own kernel
• A few availability zones have multipath topologies
– 2–8 paths available between hosts not on the same machine or in the same rack
– Available via ECMP
Amazon EC2 Experiment
• 40 medium-CPU instances running MPTCP
• For 12 hours, we sequentially ran all-to-all iperf, cycling through:
– TCP
– MPTCP (2 and 4 subflows)
MPTCP improves performance on EC2. [Figure: per-connection throughput for TCP vs MPTCP; same-rack pairs shown separately.]
Implementing Multipath TCP in the Linux Kernel
– About 10,000 lines of code in the Linux kernel
– Initially started by Sébastien Barré
– Now three people actively working on Linux Kernel MPTCP: Christoph Paasch, Fabien Duchêne and Gregory Detal
– Freely available at http://mptcp.info.ucl.ac.be
MPTCP session creation
The application creates regular TCP sockets; the kernel creates the meta-socket underneath and handles the different MPTCP subflows as new ones are created.
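Because all of this happens behind the standard socket API, an unmodified application gets MPTCP for free. This ordinary client (example.com:80 is a placeholder endpoint) becomes multipath simply by running on an MPTCP-enabled kernel:

```python
import socket

# Plain TCP client code: no MPTCP-specific calls anywhere.
# On an MPTCP-enabled kernel, connect() creates the meta-socket and
# the initial subflow; the kernel adds further subflows over other
# interfaces without the application noticing.

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
s.connect(("example.com", 80))
s.sendall(b"GET / HTTP/1.0\r\nHost: example.com\r\n\r\n")
reply = s.recv(4096)    # these bytes may have traveled several paths
s.close()
```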
MPTCP Performance with Apache
100 simultaneous HTTP requests, 100,000 requests in total. [Figure: benchmark results for TCP vs MPTCP.]
MPTCP on multicore architectures
Flow-to-core affinity steers all packets from one TCP flow to the same core. MPTCP has lots of L1/L2 cache misses because the individual subflows are steered to different CPU cores.
Solution: send all packets from the same MPTCP session to the same CPU core. Based on the Receive Flow Steering implementation in Linux (author: Tom Herbert, Google).
Multipath TCP on Mobile Devices
MPTCP over WiFi/3G
[Figures: throughput of regular TCP over WiFi or 3G alone, compared with MPTCP using both interfaces at once.]
WiFi to 3G handover with Multipath TCP
A mobile node may lose its WiFi connection. Regular TCP will break! Some applications can recover from a broken TCP connection themselves (e.g. using the HTTP Range header). Thanks to the REMOVE_ADDR option, MPTCP handles this without any application support.
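What the handover amounts to can be sketched as follows; the event and helper names are illustrative, not the kernel's API:

```python
# Toy path manager for WiFi -> 3G handover. When the WiFi address
# disappears, MPTCP advertises REMOVE_ADDR for it and the connection
# continues on the remaining 3G subflow; unacknowledged data is
# retransmitted there. The application never sees a broken socket.

class PathManager:
    def __init__(self):
        self.subflows = {"wifi": "192.168.1.2", "3g": "10.20.30.4"}

    def on_interface_down(self, name):
        addr = self.subflows.pop(name)
        print(f"send REMOVE_ADDR({addr}); "
              f"retransmit unacked data on {list(self.subflows)}")
        assert self.subflows, "connection survives while one path remains"

pm = PathManager()
pm.on_interface_down("wifi")   # the transfer continues over 3G
```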
Related Work
• Multipath TCP has been proposed many times before
– First by Huitema (1995); also CMT, pTCP, M-TCP, …
• You can solve mobility differently
– At a different layer: Mobile IP, HTTP Range
– At the transport layer: Migrate TCP, SCTP
• You can deal with datacenter collisions differently
– Hedera (OpenFlow + centralized scheduling)
Multipath topologies need multipath transport
Multipath TCP can be used by unchanged applications over today's networks.
MPTCP moves traffic away from congestion, making a collection of links behave like a single pooled resource.
Backup Slides
Packet-level ECMP in datacenters
How does MPTCP congestion control work?
Maintain a congestion window w_r, one window for each path, where r ∊ R ranges over the set of available paths.
– Increase w_r for each ACK on path r, by α / Σ_r w_r.
– Decrease w_r for each drop on path r, by w_r / 2.
Design goals 1 & 3: at any potential bottleneck S that path r might be in, look at the best that a single-path TCP could get, and compare to what I'm getting.
How does MPTCP congestion control work?
Maintain a congestion window w_r, one window for each path, where r ∊ R ranges over the set of available paths.
– Increase w_r for each ACK on path r, by α / Σ_r w_r.
– Decrease w_r for each drop on path r, by w_r / 2.
Design goal 2: we want to shift traffic away from congestion. To achieve this, we increase windows in proportion to their size.
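A back-of-envelope equilibrium argument (a gloss on the rule above, assuming a loss probability p_r and round-trip time RTT_r on path r) for why this shifts traffic away from congestion:

```latex
% ACKs arrive on path r at rate roughly w_r / RTT_r, so per unit time:
\frac{dw_r}{dt}
  \;\approx\; \frac{w_r}{RTT_r}\cdot\frac{\alpha}{\sum_s w_s}
  \;-\; p_r\,\frac{w_r}{RTT_r}\cdot\frac{w_r}{2}
% Setting dw_r/dt = 0 at equilibrium:
w_r \;=\; \frac{2\alpha}{p_r \sum_s w_s} \;\propto\; \frac{1}{p_r}
```

At equilibrium each window is inversely proportional to its path's loss rate, so more traffic flows on the less congested paths.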