Circuit Emulation for Bulk Transfers in Distributed Storage and Clouds
DESCRIPTION
Assuming that the majority of in-cloud networking is Ethernet-based, at least at departure and entry points, it is widely recognized that TCP/UDP communications fail to achieve the necessary throughput during bulk transfers. While modern switches support maximum achievable throughput via the cut-through mode of operation, the practical benefit of this mode is diminished when the network is contended by multiple communicating parties. This research removes this problem by implementing circuits-over-packets emulation. Circuits are simply optimal schedules for communication sessions where each session gets exclusive access to the network. Transfers of chunks of Big Data, pieces of storage, VM images, etc. all fall under the category of bulk transfers.
TRANSCRIPT
.
Setting the Mood
• "It's time to get rid of TCP/UDP protocols in DCs"
• DCs/Clouds are closed worlds, brand new technologies are OK
• with bulk transfers (BigData, ...), the business value of a TCP/UDP alternative is high
• circuits are an alternative to packets
M.Zhanikeev -- [email protected] -- Circuit Emulation for Bulk Transfers in Dist. Storage and Clouds -- http://bit.do/marat140903 2/32...
.
Ethernet is the Best
Ethernet ...
... is the cheapest and most available technology with e2e support
• Fibre Channel (FC), SATA, etc. require expensive hardware, have low compatibility, and offer no e2e support
• FCoE = Ethernet, same problems, expensive hardware, no e2e support
• network virtualization is best fit for Ethernet
• disclaimer: one of the proposed models will work with optical networks as well
.
Ethernet is the Worst
Ethernet ...
... is the worst technology in terms of throughput
• CSMA/CD is the biggest throughput limitation
◦ not in modern switches, but still major problem in wireless
• contention problem cannot be easily resolved
• same applies to OBS/OPS optical technologies
.
Ethernet Contention
.
Ethernet and Contention
• whatever you do, Ethernet L2 domains cannot avoid contention
[Figure: two switch configurations, labeled "Qualitatively Identical"]
.
Parallel vs Sequential (2 flows)
[Figure: x-axis: transfer time in contention (s), 20 to 40; y-axis: transfer time by exclusive circuits (s), 20 to 40]
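As a point of reference for the two-flow comparison, the sketch below computes completion times on one shared link under two idealized assumptions: sequential exclusive access, and perfectly fair parallel sharing. The function name, units, and the fair-sharing model are assumptions made for illustration; the model has no contention losses, so it does not reproduce the measured results in the figure.

```python
def completion_times(volumes_gbit, capacity_gbps, mode):
    """Return per-flow completion times (s) on one shared link."""
    if mode == "sequential":
        # Each flow gets the whole link in turn (exclusive circuit).
        done, clock = [], 0.0
        for v in volumes_gbit:
            clock += v / capacity_gbps
            done.append(clock)
        return done
    if mode == "parallel":
        # Idealized fair sharing: active flows split the capacity equally.
        order = sorted(enumerate(volumes_gbit), key=lambda item: item[1])
        done = [0.0] * len(volumes_gbit)
        clock, sent, active = 0.0, 0.0, len(order)
        for idx, v in order:
            # Time for every active flow to reach volume v.
            clock += (v - sent) * active / capacity_gbps
            sent = v
            done[idx] = clock
            active -= 1
        return done
    raise ValueError(f"unknown mode: {mode}")


if __name__ == "__main__":
    print(completion_times([100, 100], 10, "sequential"))  # [10.0, 20.0]
    print(completion_times([100, 100], 10, "parallel"))    # [20.0, 20.0]
```

Even in this loss-free model the last flow finishes at the same time in both modes, while the first flow finishes earlier with exclusive access; any contention overhead in the parallel case only widens the gap.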
.
Ethernet Switches : Basic Facts
• cut-through versus store-and-forward
• cut-through is 10..15x better
• Cisco has advanced cut-through : +bytes versus routing decision tradeoff
• store-and-forward is subject to QoS classes
◦ L3 DSCP versus L2 CoS, AF, EF, BE, SBE models
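The difference between the two modes can be illustrated with per-hop serialization delay alone: a store-and-forward switch waits for the whole frame, a cut-through switch only for the first bytes it needs for a forwarding decision. The sketch below is a simplified model under that assumption; the function name, the frame size, and the lookahead values are illustrative, and lookup time and queueing are ignored, so it is not meant to reproduce the 10..15x figure.

```python
def forwarding_delay_us(frame_bytes, link_gbps, lookahead_bytes=None):
    """Per-hop delay (microseconds) before forwarding can start.

    lookahead_bytes=None models store-and-forward (whole frame received).
    Otherwise models cut-through (only the first lookahead_bytes received).
    """
    if lookahead_bytes is None:
        wait_bytes = frame_bytes
    else:
        wait_bytes = min(lookahead_bytes, frame_bytes)
    bits_per_us = link_gbps * 1000  # 1 Gbps = 1000 bits per microsecond
    return wait_bytes * 8 / bits_per_us


if __name__ == "__main__":
    sf = forwarding_delay_us(1500, 10)
    for lookahead in (64, 128, 256):  # assumed values
        ct = forwarding_delay_us(1500, 10, lookahead)
        print(lookahead, round(sf / ct, 1))
```

Reading more bytes before forwarding allows a richer routing decision but shrinks the advantage over store-and-forward, which is the tradeoff named in the slide.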
.
Switches : Modeling
[Figure: switch model; labels: C: Cut Through; Check, etc.; Q: Queue; D: Drop; QoS classes]
.
Proposal
.
Proposal : Circuits
Circuits ...
... are emulations which allow for exclusive access to an L2 domain by individual parties
• circuits-over-packets emulation
• cut-through mode for each circuit is guaranteed
• highest possible throughput
• NOTE: will work with the cheapest switches
• NOTE2: applies to optical networks as well (L2=lightpaths)
.
Implementation : 2 cases
• left: book-then-send, right: separate control layer
[Figure, left: SWITCH, NOC, Storage Node A, Storage Node B; Step 1: Book session; Step 2: Transfer bulk]
[Figure, right: Storage Node A, Storage Node B, two SWITCHes; booking segment and bulk segment]
.
Impl.: Centralized Case
[Figure: SWITCH, NOC, Storage Node A, Storage Node B; Step 1: Book session; Step 2: Transfer bulk]
• same network for booking and circuits
• inefficient but still valid/practical
• legacy-compatible, partial implementation, etc.
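The book-then-send sequence can be sketched as a booking service that hands out non-overlapping time slots on the shared L2 domain. This is a minimal illustration, not the implementation from the talk: the `Noc` class, its `book` method, and the units are hypothetical, and a single shared domain with a fixed capacity is assumed.

```python
class Noc:
    """Hypothetical booking service: one exclusive circuit at a time."""

    def __init__(self, capacity_gbps):
        self.capacity_gbps = capacity_gbps
        self.free_at = 0.0  # time (s) at which the L2 domain becomes idle

    def book(self, src, dst, volume_gbit, now):
        """Step 1: reserve the next exclusive slot; returns (start, end)."""
        start = max(now, self.free_at)
        end = start + volume_gbit / self.capacity_gbps
        self.free_at = end
        return start, end


noc = Noc(capacity_gbps=10)
# Step 2 (the bulk transfer itself) would run inside the returned slot.
print(noc.book("A", "B", 100, now=0.0))  # (0.0, 10.0)
print(noc.book("B", "A", 50, now=2.0))   # (10.0, 15.0)
```

The second request arrives at t=2 s but is granted the slot starting at t=10 s, after the first circuit has finished, so the two sessions never share the domain.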
.
Impl.: Distributed Case
[Figure: Storage Node A, Storage Node B, two SWITCHes; booking segment and bulk segment]
• book on one network, send on another
• legacy-incompatible
• contention-sensing possible → fully distributed models
• can also use sensing and contention control
.
Optimization
.
Optimization : Basics
• same for distributed and centralized models
◦ does not matter, optimization shows the overall utility of a heuristic
• practical optimization = formulation + heuristic
• given: demand matrix
• expected result: a routing table mapping demand to topology
.
Optimization : Basics
.
Optim. : OSPF → tuple notation
• OSPF is traditional in such optimizations, but too rigid for many practical cases
◦ too complex for lightpaths in optical networks
◦ no good heuristics for complex topologies
• OSPF notation is not very convenient
1. capacity constraints
2. flow preservation
3. contention/congestion metrics
• alternative: tuples ... for example ⟨s, d, v, t⟩ defines demand of traffic volume v at time t from source s to destination d
◦ this notation is much more flexible for several coming formulations
.
Optim. : Basic Tuple Notation
• nodes: source s, destination d, and others a, b, c
• individual demand tuple Ti = ⟨s, d, v, t⟩
• lightpath λ for optical networks
• time t, can be start time, start and end of a period, etc.
• we do not care about utility so far, just the notation, but utility is obvious in most cases
• → means "results in" or "leads to"
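The tuple notation maps directly onto a small record type. The sketch below is one possible encoding, with hypothetical names (`Demand`, `demand_matrix`) and an assumed unit for volume; it also shows how a list of tuples can be folded back into a demand matrix when a formulation needs one.

```python
from typing import NamedTuple


class Demand(NamedTuple):
    s: str    # source node
    d: str    # destination node
    v: float  # traffic volume (unit is an assumption, e.g. Gbit)
    t: float  # time, taken here as the start time in seconds


def demand_matrix(demands):
    """Aggregate demand tuples into a {(s, d): total volume} matrix."""
    matrix = {}
    for T in demands:
        key = (T.s, T.d)
        matrix[key] = matrix.get(key, 0.0) + T.v
    return matrix


if __name__ == "__main__":
    demands = [Demand("a", "b", 10, 0), Demand("a", "b", 5, 3), Demand("b", "c", 1, 0)]
    print(demand_matrix(demands))
```

Keeping each demand as its own tuple preserves the time field, which a plain matrix loses; the later formulations extend the tuple rather than the matrix.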
.
tOSPF : Traditional OSPF
Ti = ⟨s, d, v, t⟩ → ⟨s, a, b, ..., d⟩
Externals: Using the demand matrix, creates a set of per-link weights, which define a unique route for each demand item.
Internals: Per-link capacity constraint, in/out flow conservation constraint; unstable for large topologies and demand matrices.
• s source
• d destination
• a, b, c, ... intermediate nodes on e2e paths/routes
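How per-link weights determine a route can be sketched with a standard shortest-path search (Dijkstra's algorithm). This covers only the last step of the formulation: the weights are taken as given, and the optimization that derives them from the demand matrix under capacity and flow-conservation constraints is not shown. The function name and the example topology are hypothetical, and links are treated as directed.

```python
import heapq


def route(weights, s, d):
    """Shortest path from s to d for per-link weights {(u, v): w}."""
    adjacency = {}
    for (u, v), w in weights.items():
        adjacency.setdefault(u, []).append((v, w))
    heap = [(0, s, [s])]  # (cost so far, node, path taken)
    visited = set()
    while heap:
        cost, node, path = heapq.heappop(heap)
        if node == d:
            return path
        if node in visited:
            continue
        visited.add(node)
        for nxt, w in adjacency.get(node, []):
            if nxt not in visited:
                heapq.heappush(heap, (cost + w, nxt, path + [nxt]))
    return None  # d is unreachable from s


if __name__ == "__main__":
    weights = {("s", "a"): 1, ("a", "d"): 1, ("s", "b"): 1, ("b", "d"): 3, ("a", "b"): 1}
    print(route(weights, "s", "d"))  # ['s', 'a', 'd']
```

With equal-cost alternatives the route is unique only up to tie-breaking, which is one reason weight setting becomes delicate on large topologies.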
.
oOSPF : Optical OSPF w/out Switching
Ti = ⟨s, d, v, t⟩ → ⟨s, λ⟩
Externals: Using the demand matrix, maps each demand item onto an isolated lightpath.
Internals: Simple but inefficient because the number of e2e lightpaths is small.
• s source
• d destination
• λ a wavelength for a fixed e2e lightpath from source s to destination d
.
oOSPFs : Optical OSPF with Switching
Ti = ⟨s, d, v, t⟩ → ⟨s, λs, λa, λb, ...⟩
Externals: Using the demand matrix, maps each demand item onto a route of wavelengths.
Internals: Efficient, but suffers from the same problems as traditional OSPF.
• s source
• d destination
• λx an exit wavelength at a given node x
.
Proposal : Sensing Formulation
Ti = ⟨s, d, v, t1, t2⟩ → ⟨s, λ, t⟩
Externals: Using a matrix of loosely scheduled demand, creates a schedule of sequential sessions with exclusive access to paths.
Internals: Same approach for Ethernet (one wavelength) and optical networks.
• s source
• d destination
• t1 and t2 are the user-preferred range for the start of a session; a value t is picked between them
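One way to turn loosely scheduled demand into sequential exclusive sessions is a greedy pass over the demand tuples. The sketch below assumes a single shared path (the Ethernet case, one wavelength, here labeled λ = 0) and processes demands in order of t1; the ordering rule, the function name, and the handling of demands whose window is missed are assumptions, not the heuristic used in the talk.

```python
def schedule(demands, capacity_gbps, lam=0):
    """Greedy schedule of exclusive sessions on one shared path.

    demands: list of (s, d, v, t1, t2), v in Gbit, times in seconds.
    Returns (scheduled, rejected); scheduled items are (s, lam, t).
    """
    scheduled, rejected = [], []
    free_at = 0.0  # time at which the path becomes idle
    for s, d, v, t1, t2 in sorted(demands, key=lambda T: T[3]):
        t = max(t1, free_at)  # earliest start with exclusive access
        if t > t2:
            # The preferred start window cannot be met.
            rejected.append((s, d, v, t1, t2))
            continue
        free_at = t + v / capacity_gbps
        scheduled.append((s, lam, t))
    return scheduled, rejected


if __name__ == "__main__":
    demands = [("a", "b", 100, 0, 5), ("c", "d", 50, 0, 20), ("e", "f", 10, 1, 3)]
    print(schedule(demands, capacity_gbps=10))
```

A rejected demand is a natural candidate for a best-effort fallback rather than a hard failure.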
.
Heuristics
.
Centralized Case
[Figure: SWITCH, NOC, Storage Node A, Storage Node B; Step 1: Book session; Step 2: Transfer bulk]
• all optimization formulations except sensing
• very close to traditional OSPF
• same problems as in OSPF
• the biggest problem is to know the demand matrix in advance
.
Distributed Case
[Figure: Storage Node A, Storage Node B, two SWITCHes; booking segment and bulk segment]
• can be used for all formulations
• perfectly suited for the Sensing formulation
.
The Sensing Model
• contention methods in wireless and OBS will work
◦ in practice: sensing can be SNMP-like feedback on gate's status
◦ no sync among users is necessary
• same model for Ethernet (+virtual nets) and optical networks
• main advantage: the offload, no need to implement funny OSPF heuristics
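The sensing idea can be illustrated with a toy simulation in which each source polls a gate's status on its own unsynchronized schedule and starts sending only when the gate reports idle. Everything here is hypothetical: the `Gate` class stands in for the SNMP-like status feedback, the polling interval and random phases are assumptions, and real contention methods from wireless or OBS would add backoff and collision handling.

```python
import random


class Gate:
    """Hypothetical gate guarding an L2 domain; its status is readable by anyone."""

    def __init__(self):
        self.busy_until = 0.0

    def is_free(self, now):
        return now >= self.busy_until

    def occupy(self, now, duration):
        self.busy_until = now + duration


def simulate(sources, poll_interval=1.0, seed=0):
    """sources: {name: transfer duration (s)}; returns {name: start time}."""
    rng = random.Random(seed)
    gate = Gate()
    # Each source polls on its own random phase; no sync among users.
    next_poll = {name: rng.uniform(0, poll_interval) for name in sources}
    starts = {}
    while len(starts) < len(sources):
        pending = [name for name in sources if name not in starts]
        name = min(pending, key=next_poll.get)  # next polling event in time
        now = next_poll[name]
        if gate.is_free(now):
            gate.occupy(now, sources[name])
            starts[name] = now
        else:
            next_poll[name] = now + poll_interval  # sense again later
    return starts


if __name__ == "__main__":
    print(simulate({"A": 5.0, "B": 3.0, "C": 2.0}))
```

No source needs to know about the others: the only shared state is the gate status, which is what removes the need for a central routing heuristic.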
.
Realistic Gate/Sensing Model
• an approximate view of JGN topology
• two way = one way + ring
• Gates are created at optical/Ethernet border
• NOTE: already working for Ethernet
.
Wrapup
• circuit emulation is necessary for effective bulk transfers
◦ up to 40% faster in our lab tests
• intra-DC, DC-DC, federations, etc. -- all can benefit from circuits
• circuits formulated as OSPF are bad -- a Gate/Sensing model is better
• validity: worst case is the existing technology, but upper performance bound is very high
.
That’s all, thank you ...
.
Wait-n-Send Model
[Figure labels: Bulk size per transmission; Goodput; Response curve(s); 2 potential distributions in practice]
.
Utility of Waiting (curve)
• I called it Wait-n-See Curve
• source waits for some time for exclusive access -- sensing and accumulating bulk
• on timeout, the current bulk is released at best effort (fallback)
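The waiting behaviour in the bullets above can be sketched as a loop that senses at fixed intervals while bulk accumulates, and falls back to best effort once a timeout expires. The function name, its parameters, and the constant accumulation rate are assumptions chosen for illustration; the shape of the actual utility curve is not modeled.

```python
def wait_then_send(gate_free_at, timeout_s, initial_bulk_gbit=0.0,
                   arrival_rate_gbit_s=0.0, poll_s=1.0):
    """Wait for exclusive access while bulk accumulates; fall back on timeout.

    gate_free_at: time (s) from now at which the gate becomes idle.
    Returns (mode, send_time_s, bulk_gbit).
    """
    waited = 0.0
    while waited < timeout_s:
        if waited >= gate_free_at:  # sensing: the gate reports idle
            bulk = initial_bulk_gbit + waited * arrival_rate_gbit_s
            return "circuit", waited, bulk
        waited += poll_s
    # Timeout: release whatever has accumulated at best effort.
    bulk = initial_bulk_gbit + timeout_s * arrival_rate_gbit_s
    return "best_effort", timeout_s, bulk


if __name__ == "__main__":
    print(wait_then_send(3.0, 10.0, 4.0, 2.0))   # ('circuit', 3.0, 10.0)
    print(wait_then_send(30.0, 10.0, 4.0, 2.0))  # ('best_effort', 10.0, 24.0)
```

The worst case of this scheme is a best-effort transfer delayed by the timeout, so the choice of timeout bounds the cost of waiting.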