fundamental methods and heuristics for massive scale data
Post on 11-Nov-2021
Australia/China SKA Big Data Workshop, April 10th, 2017
Fundamental Methods and Heuristics for Massive Scale Data Distribution
Xiaoying Zheng
Shanghai Advanced Research Institute (SARI),
Chinese Academy of Sciences
About SARI
• Established in 2009
• Research areas
– frontier studies and advanced manufacturing
– information technology
– space technology
– energy and environment
– health sciences
[Map: SARI ("We're here"), shown alongside Shanghai Tech. edu and Shanghai SRF]
About Me
• Associate Professor at SARI, CAS
• Research focus
– Networking
modeling and performance evaluation of networks; peer-to-peer networks; content and service distribution; congestion control; network flow control and routing
– Cloud Computing
resource allocation and scheduling
Background
• SKA data will be transferred across the globe at Tbit/s scale
• The network connecting SKA data centers is expected to be more stable than the Internet and moderate in size by comparison
• The capacity bottleneck may lie equally at the core or at the edge
• Proposed approaches
– Swarming
– Back-pressure based multipath routing
Outline
• Approach 1: Swarming
• Approach 2: Back-pressure based multipath routing
• FuNet: an SDN-based network testbed at SARI
Approach 1: Swarming
• A big success for P2P file sharing
– Files are broken into many chunks
– Receivers help each other to receive chunks
• Apply swarming to infrastructure-based SKA data distribution
– More stable content servers and networks
A snapshot of swarming. The available file chunks at each host H1 to H4 are shown by shaded boxes. For example, H1 has chunks 1, 3, 8 and 10. The dashed lines show the current connections. For example, H1 is downloading chunks 2 and 4 from H2, and chunk 11 from H3.
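The chunk exchange in the snapshot can be sketched as a small round-based simulation. This is only an illustrative sketch, not the scheduler used in the talk: the function name `swarm`, the 12-chunk file, the rule of one download per host per round, and the chunk sets of H2 to H4 are all assumptions. Only H1's chunks (1, 3, 8, 10) and the fact that H2 holds chunks 2 and 4 and H3 holds chunk 11 come from the caption.

```python
import random

def swarm(initial, num_chunks, seed=0):
    """Round-based chunk exchange until every host holds every chunk.

    initial: dict host -> set of chunk ids held at the start (hypothetical input).
    In each round every incomplete host downloads one missing chunk from a
    peer that already held it at the start of the round.
    Returns (rounds, have, transfers).
    """
    rng = random.Random(seed)
    have = {h: set(c) for h, c in initial.items()}
    everything = set(range(1, num_chunks + 1))
    # Each chunk must exist somewhere, otherwise the swarm can never finish.
    if set().union(*have.values()) != everything:
        raise ValueError("some chunk is held by no host")
    rounds, transfers = 0, []
    while any(have[h] != everything for h in have):
        snapshot = {h: set(c) for h, c in have.items()}
        for h in sorted(have):
            missing = sorted(everything - snapshot[h])
            if not missing:
                continue
            chunk = rng.choice(missing)
            # Peers that can serve this chunk right now.
            peers = [p for p in sorted(snapshot) if p != h and chunk in snapshot[p]]
            sender = rng.choice(peers)
            have[h].add(chunk)
            transfers.append((h, chunk, sender))
        rounds += 1
    return rounds, have, transfers

# H1 follows the caption; the other chunk sets are made up for illustration.
INITIAL = {
    "H1": {1, 3, 8, 10},
    "H2": {2, 4, 6},
    "H3": {7, 9, 11},
    "H4": {5, 12},
}
rounds, have, transfers = swarm(INITIAL, num_chunks=12)
print(rounds, transfers[:3])
```

Receivers act as senders as soon as they hold a chunk, which is the "receivers help each other" property listed above.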
Swarming: transmission over multiple multicast trees
• Equivalent to using multiple Steiner trees to distribute different file chunks
– Nodes that are NOT interested in the file may also participate, to help a resource-limited distribution session and improve distribution efficiency
• Proposed problem: determine an optimal set of distribution trees as well as the data rate on each tree
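One way to write the proposed problem down is as a tree-packing linear program for throughput maximization. The notation is an assumption introduced here, not taken from the slides: \mathcal{T} is the set of all Steiner trees spanning the source and the receivers (possibly through out-of-session nodes), x_t is the data rate on tree t, E is the set of links, and c_e is the capacity of link e.

```latex
\begin{aligned}
\max_{x} \quad & \sum_{t \in \mathcal{T}} x_t \\
\text{s.t.} \quad & \sum_{t \in \mathcal{T} :\, e \in t} x_t \le c_e \qquad \forall e \in E, \\
& x_t \ge 0 \qquad \forall t \in \mathcal{T}.
\end{aligned}
```

The number of trees in \mathcal{T} grows exponentially with the network size, so under this formulation the trees cannot simply be enumerated for anything but toy networks.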
Swarming: example
A swarming example. (a) Node 1 distributes a file to nodes 2 and 3; node 4 is an out-of-session node. The numbers next to the links are the link capacities. (b) All possible distribution trees and the optimal solution for throughput maximization. The boxes represent file chunks. Three of the trees involve the out-of-session node 4.
Swarming: solutions
• Analogous to the min-cost path in unicast routing
• Use a min-cost Steiner tree in each time slot, and switch from tree to tree over time
• Solutions:
– Approximate min-cost Steiner tree search + column generation
– Or random min-cost Steiner tree search
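The tree-per-slot idea can be illustrated on a four-node network like the one in the example (node 1 as source, nodes 2 and 3 as receivers, node 4 out of session). This is a sketch under stated assumptions, not the approximate or random search named above: it finds the min-cost Steiner tree exactly by brute force, which only works for tiny graphs, and the link costs in `COST`, the helper names, and the price update in `schedule` (used links become more expensive) are hypothetical stand-ins for the link prices a column-generation scheme would supply.

```python
from itertools import combinations

def mst_cost(nodes, cost):
    """Prim's algorithm on the subgraph induced by `nodes`.
    Returns (total_cost, edges), or None if that subgraph is disconnected."""
    nodes = set(nodes)
    in_tree = {min(nodes)}
    edges, total = [], 0.0
    while in_tree != nodes:
        best = None
        for (u, v), c in cost.items():
            # Candidate: both endpoints allowed, exactly one already in the tree.
            if u in nodes and v in nodes and ((u in in_tree) != (v in in_tree)):
                if best is None or c < best[0]:
                    best = (c, (u, v))
        if best is None:
            return None
        total += best[0]
        edges.append(best[1])
        in_tree.update(best[1])
    return total, edges

def min_cost_steiner_tree(terminals, all_nodes, cost):
    """Exact search: try every subset of optional (out-of-session) nodes and
    keep the cheapest spanning tree. Exponential, for toy graphs only."""
    terminals = set(terminals)
    optional = sorted(set(all_nodes) - terminals)
    best = None
    for k in range(len(optional) + 1):
        for extra in combinations(optional, k):
            result = mst_cost(terminals | set(extra), cost)
            if result is not None and (best is None or result[0] < best[0]):
                best = result
    return best

def schedule(terminals, all_nodes, cost, slots, step=1.0):
    """Pick one min-cost tree per slot. The price update is a hypothetical
    rule chosen for illustration. Assumes the terminals are connected."""
    prices = dict(cost)
    chosen = []
    for _ in range(slots):
        _, edges = min_cost_steiner_tree(terminals, all_nodes, prices)
        chosen.append(sorted(edges))
        for e in edges:
            prices[e] += step
    return chosen

# Hypothetical undirected link costs; links through node 4 are cheap.
COST = {(1, 2): 3, (1, 3): 3, (2, 3): 3, (1, 4): 1, (2, 4): 1, (3, 4): 1}
best_cost, best_edges = min_cost_steiner_tree({1, 2, 3}, {1, 2, 3, 4}, COST)
trees = schedule({1, 2, 3}, {1, 2, 3, 4}, COST, slots=3)
print(best_cost, best_edges, trees)
```

With these costs the first tree relays through the out-of-session node 4, and the schedule moves to a different tree once the links of that tree have become more expensive.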
Approach 2: Back-pressure based multipath routing
• Motivation:
– Bandwidth on the network between SKA data centers is expensive
– Backbone traffic volumes vary over time
– Use the time-varying leftover bandwidth to transmit non-urgent data
Back-pressure based multi-path routing
• Difficulty: the leftover bandwidth is time-varying and unknown
– Traditional solution: traffic prediction techniques
• Our solution: back-pressure based multi-path routing
– Data packets are temporarily stored at intermediate datacenters, and forwarded to a neighbor node when spare residual bandwidth is available.
– Balance the buffers of two adjacent datacenter nodes as much as possible by pushing data across the link between the two nodes using the residual bandwidth, where the buffer size is regarded as the pressure of the buffer.
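The balancing rule for a single link can be sketched in a few lines. The function name `push_amount` and the choice to move half of the queue difference are assumptions made for illustration; the slides only say that the buffers are balanced as much as the residual bandwidth allows.

```python
def push_amount(q_u, q_v, residual):
    """Packets to move from node u to node v in one time slot.

    q_u, q_v: current buffer sizes ("pressures") at the two ends of the link.
    residual: leftover bandwidth of the link in this slot, in packets.
    Moving half of the difference equalizes the two buffers; the residual
    bandwidth caps how much can actually be moved.
    """
    if q_u <= q_v:
        return 0  # no pressure difference in this direction, send nothing
    return min(residual, (q_u - q_v) // 2)

# Plenty of bandwidth: the buffers end up balanced at 6 and 6.
moved_balanced = push_amount(10, 2, residual=100)
# Scarce bandwidth: only the residual can be moved.
moved_capped = push_amount(10, 2, residual=3)
# Pressure points the other way: nothing is pushed from u to v.
moved_none = push_amount(2, 10, residual=5)
print(moved_balanced, moved_capped, moved_none)
```

No traffic prediction is needed: the rule only looks at the two current buffer sizes and at the bandwidth that happens to be free in the slot.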
Example: step 1
• A queue is maintained at each data center node for each directional link and each transmission session; new packets are pushed into the queues at the source
Example: step 2
• Push packets across each link so as to balance the queues of the link as much as possible
Example: step 3
• Packets arrive at the next hop; packets that reach the sink are removed
Example: step 4
• Packets are re-allocated between queues according to the expected leftover bandwidth
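Steps 1 to 3 can be put together in a small time-slotted simulation. This is a simplified sketch under several assumptions: a hypothetical four-node topology (`LINKS`) with source "s", relays "a" and "b", and sink "d"; a single session with one queue per node rather than one queue per link and session; and leftover bandwidth drawn uniformly at random in each slot. Step 4 (re-allocation between queues) is not modeled because with one queue per node there is nothing to re-allocate.

```python
import random

# Hypothetical topology: two paths from source "s" to sink "d", plus a cross link.
LINKS = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "d"), ("b", "d")]

def simulate(slots, arrivals_per_slot=4, max_residual=6, seed=1):
    """Run the slotted loop and return (injected, delivered, queue)."""
    rng = random.Random(seed)
    queue = {"s": 0, "a": 0, "b": 0, "d": 0}
    injected = delivered = 0
    for _ in range(slots):
        # Step 1: new packets enter the queue at the source.
        queue["s"] += arrivals_per_slot
        injected += arrivals_per_slot
        # Step 2: push packets across each link toward the emptier end.
        # Decisions use the queue sizes at the start of the slot, so a
        # packet travels at most one hop per slot.
        snapshot = dict(queue)
        remaining = dict(queue)  # what each node may still send this slot
        for u, v in LINKS:
            if snapshot[u] < snapshot[v]:
                u, v = v, u
            residual = rng.randint(0, max_residual)  # unknown, time-varying
            moved = min(residual, (snapshot[u] - snapshot[v]) // 2, remaining[u])
            remaining[u] -= moved
            queue[u] -= moved
            queue[v] += moved
        # Step 3: packets that arrived at the sink are removed.
        delivered += queue["d"]
        queue["d"] = 0
    return injected, delivered, queue

injected, delivered, queue = simulate(200)
print(injected, delivered, queue)
```

Both paths are used without any explicit routing decision: whichever relay has the smaller backlog and the more leftover bandwidth in a slot receives more packets from the source.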
FuNet: an SDN-based network testbed
• CAS built a 1G/10G network testbed connecting 15 cities with 30 hosts, including Beijing, Shanghai, Hefei, Zhengzhou
• Accessible from/to GENI
• Supports the Protocol-Oblivious Forwarding (POF) protocol
– An open-source SDN protocol stack developed by Huawei
• Able to support prototype development for SKA data transmission
Thank you!
zhengxy@sari.ac.cn