
Australia/China SKA Big Data Workshop, April 10th, 2017

Fundamental Methods and Heuristics for Massive Scale Data Distribution

Xiaoying Zheng

Shanghai Advanced Research Institute (SARI), Chinese Academy of Sciences


About SARI

• Established in 2009

• Research areas

– frontier studies and advanced manufacturing

– information technology

– space technology

– energy and environment

– health sciences

[Map: SARI's location ("We're here"), with nearby sites labeled "Shanghai Tech. edu" and "Shanghai SRF"]

About Me

• Associate Professor at SARI, CAS

• Research focus

– Networking

modeling and performance evaluation of networks; peer-to-peer networks; content and service distribution; congestion control; network flow control and routing

– Cloud Computing

resource allocation and scheduling

Background

• SKA data will be transferred across the globe on a Tbit/s scale

• The network connecting SKA data centers is expected to be more stable than the Internet and moderate in size

• The capacity bottleneck is equally likely to be in the core or at the edge

• Proposed approaches

– Swarming

– Back-pressure based multipath routing

Outline

• Approach 1: Swarming

• Approach 2: Back-pressure based multipath routing

• FuNet: an SDN-based network testbed at SARI

Approach 1: Swarming

• A big success for P2P file sharing

– Files are broken into many chunks

– Receivers help each other to receive chunks

• Apply swarming to infrastructure-based SKA data distribution

– More stable content servers and networks

A snapshot of swarming. The available file chunks at each host H1 to H4 are shown by shaded boxes. e.g., H1 has chunks 1, 3, 8 and 10. The dashed lines show the current connections, e.g., H1 is downloading chunks 2 and 4 from H2, and chunk 11 from H3.
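The snapshot can be mimicked with a few sets. This is only an illustrative sketch: the chunk set of H1 and the downloads from H2 and H3 follow the caption, while the chunk sets of H2 to H4 and the helper names `useful_chunks` and `plan_downloads` are assumptions made for the example.

```python
# Chunk sets per host. Only H1's set comes from the caption;
# the sets for H2, H3 and H4 are hypothetical values for illustration.
have = {
    "H1": {1, 3, 8, 10},
    "H2": {2, 4, 8, 10},
    "H3": {1, 3, 11},
    "H4": {5, 6, 7, 9},
}


def useful_chunks(receiver, sender, have):
    """Chunks the sender holds that the receiver is still missing."""
    return sorted(have[sender] - have[receiver])


def plan_downloads(receiver, neighbors, have):
    """Assign each missing chunk to the first connected neighbor holding it."""
    plan = {}
    assigned = set()
    for peer in neighbors:
        picks = [c for c in useful_chunks(receiver, peer, have)
                 if c not in assigned]
        if picks:
            plan[peer] = picks
            assigned.update(picks)
    return plan


# H1 is connected to H2 and H3 (the dashed lines in the snapshot).
plan = plan_downloads("H1", ["H2", "H3"], have)
print(plan)
```

With these assumed sets, H1 fetches chunks 2 and 4 from H2 and chunk 11 from H3, matching the connections described in the caption.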

Swarming: transmitting over multiple multicast trees

• Equivalent to using multiple Steiner trees to distribute different file chunks

– Nodes that are NOT interested in the file may also participate, to help a resource-limited distribution session and improve distribution efficiency

• Proposed problem: determine an optimal set of distribution trees as well as the data rate on each tree
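One common way to write such a problem for throughput maximization is as a tree-packing linear program. The notation below is an assumption introduced for illustration and does not come from the slides: $\mathcal{T}$ is the set of candidate distribution trees, $x_t$ the data rate on tree $t$, $E$ the set of links, and $c_e$ the capacity of link $e$.

```latex
\begin{aligned}
\max_{x} \quad & \sum_{t \in \mathcal{T}} x_t \\
\text{s.t.} \quad & \sum_{t \in \mathcal{T} :\, e \in t} x_t \le c_e
  && \forall e \in E, \\
& x_t \ge 0 && \forall t \in \mathcal{T}.
\end{aligned}
```

The number of trees in $\mathcal{T}$ can grow exponentially with the network size, which is why the trees are typically generated on demand rather than enumerated.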

Swarming: example

A swarming example. (a) Node 1 distributes a file to nodes 2 and 3; node 4 is an out-of-session node. The numbers next to the links are the link capacities. (b) All possible distribution trees and the optimal solution for throughput maximization. The boxes represent file chunks. Three of the trees involve the out-of-session node 4.

Swarming: solutions

• Analogous to the min-cost path in unicast

• Use a min-cost Steiner tree in each time slot, and switch from tree to tree

• Solutions:

– Approximate min-cost Steiner tree search + column generation

– Or random min-cost Steiner tree search
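As a rough illustration of the approximate tree search, the sketch below uses the classic shortest-path heuristic: grow a tree from the source by repeatedly attaching the nearest remaining receiver. This is a swapped-in textbook technique, not necessarily the search used in the talk. The four-node topology mirrors the earlier example (node 1 as source, nodes 2 and 3 as receivers, node 4 out of session), but the link costs are hypothetical. In a column-generation setting the costs would typically be the current link prices from the master problem.

```python
import heapq


def nearest_terminal_path(graph, tree_nodes, remaining):
    """Multi-source Dijkstra from the current tree.

    Returns (cost, path) to the closest terminal in `remaining`.
    """
    dist = {n: 0.0 for n in tree_nodes}
    prev = {}
    heap = [(0.0, n) for n in tree_nodes]
    heapq.heapify(heap)
    done = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        if u in remaining:
            # Walk back to the tree node the path started from.
            path = [u]
            while path[-1] in prev:
                path.append(prev[path[-1]])
            return d, path[::-1]
        for v, cost in graph[u].items():
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    raise ValueError("a terminal is unreachable")


def approx_steiner_tree(graph, source, receivers):
    """Greedy shortest-path heuristic for a low-cost Steiner tree."""
    tree_nodes = {source}
    tree_edges = set()
    remaining = set(receivers) - tree_nodes
    total = 0.0
    while remaining:
        cost, path = nearest_terminal_path(graph, tree_nodes, remaining)
        tree_edges.update(zip(path, path[1:]))
        tree_nodes.update(path)
        remaining -= tree_nodes
        total += cost
    return tree_edges, total


# Hypothetical symmetric link costs; node 4 is the out-of-session helper.
graph = {
    1: {2: 3, 3: 3, 4: 1},
    2: {1: 3, 3: 3, 4: 1},
    3: {1: 3, 2: 3, 4: 2},
    4: {1: 1, 2: 1, 3: 2},
}
edges, total = approx_steiner_tree(graph, source=1, receivers=[2, 3])
print(sorted(edges), total)
```

With these assumed costs the heuristic routes both receivers through node 4, which illustrates how an out-of-session node can end up inside a distribution tree.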


Approach 2: Back-pressure based multi-path routing

• Motivation:

– Bandwidth on the network between SKA data centers is expensive

– Backbone traffic volumes vary over time

– Use the time-varying leftover bandwidth to transmit non-urgent data

Back-pressure based multi-path routing

• Difficulty: the leftover bandwidth is time-varying and unknown

– Traditional solution: traffic prediction techniques

• Our solution: back-pressure based multi-path routing

– Data packets are temporarily stored at intermediate datacenters and forwarded to a neighbor node when spare residual bandwidth is available

– Balance the buffers of two adjacent datacenter nodes as much as possible by pushing data across the link between them using the residual bandwidth, where the buffer size is regarded as the pressure of the buffer
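The balancing rule for a single link can be sketched as follows. This is a minimal interpretation under stated assumptions, not the exact rule from the talk: the function name `balance_push` is hypothetical, and "balance as much as possible" is read as moving half of the queue difference, capped by the residual bandwidth available in the current slot.

```python
def balance_push(q_up, q_down, residual):
    """Amount of data to push from the upstream to the downstream queue.

    q_up, q_down: current queue lengths (the "pressure") at the two nodes.
    residual: leftover link bandwidth available in this time slot.
    """
    if q_up <= q_down:
        # No positive pressure difference, so nothing is pushed.
        return 0.0
    # Moving half the difference equalizes the two queues;
    # the residual bandwidth may allow less than that.
    return min(residual, (q_up - q_down) / 2.0)


print(balance_push(10, 2, 3))    # bandwidth-limited push
print(balance_push(10, 2, 100))  # push that fully balances the queues
print(balance_push(2, 10, 5))    # downstream is fuller, no push
```

No bandwidth forecast is needed here: the rule only uses the queue lengths and whatever residual bandwidth is observed in the slot.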

Example: step 1

• A queue is maintained at each data center node for each directional link and each transmission session; new packets are pushed into the queues at the source

Example: step 2

• Push packets across each link so as to balance the queues of the link as much as possible

Example: step 3

• Packets arrive at the next hop and are removed at the sink

Example: step 4

• Packets are re-allocated between queues according to the expected leftover bandwidth
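Steps 1 to 3 can be put together in a toy simulation. The sketch below rests on simplifying assumptions that are not from the slides: a three-node chain (source, relay, sink), one queue per node instead of one per link and session, and leftover bandwidth drawn uniformly at random in each slot. Step 4 (re-allocation between queues) is not modeled because there is only one queue per node.

```python
import random


def simulate(num_slots, arrivals_per_slot, max_leftover, seed=0):
    """Toy back-pressure run on the chain source -> relay -> sink."""
    rng = random.Random(seed)
    links = [("source", "relay"), ("relay", "sink")]
    queue = {"source": 0.0, "relay": 0.0, "sink": 0.0}
    delivered = 0.0
    for _ in range(num_slots):
        # Step 1: new packets are pushed into the source queue.
        queue["source"] += arrivals_per_slot
        # Step 2: decide how much to push across each link, using the
        # queue lengths at the start of the slot and this slot's
        # (unknown in advance) leftover bandwidth.
        moves = []
        for u, v in links:
            leftover = rng.uniform(0.0, max_leftover)
            diff = queue[u] - queue[v]
            amount = min(leftover, diff / 2.0) if diff > 0 else 0.0
            moves.append((u, v, amount))
        # Step 3: packets arrive at the next hop and are removed at the sink.
        for u, v, amount in moves:
            queue[u] -= amount
            queue[v] += amount
        delivered += queue["sink"]
        queue["sink"] = 0.0
    return delivered, queue


delivered, queue = simulate(num_slots=50, arrivals_per_slot=4.0,
                            max_leftover=6.0)
print(delivered, queue)
```

Every packet is either delivered or still queued, so the delivered amount plus the remaining queue lengths equals the total injected (50 slots times 4.0 in this run).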


FuNet: an SDN-based network testbed

• CAS built a 1G/10G network testbed connecting 30 hosts in 15 cities, including Beijing, Shanghai, Hefei, and Zhengzhou

• Accessible from/to GENI

• Supports the Protocol-Oblivious Forwarding (POF) protocol

– An open-source SDN protocol stack developed by Huawei

• Able to support SKA data transmission prototype development


Thank you!

zhengxy@sari.ac.cn
