bigdata express: toward predictable, schedulable, and high

26
BigData Express: Toward Predictable, Schedulable, and High-Performance Data Transfer BigData Express Research Team November 10, 2018

Upload: others

Post on 20-Nov-2021

3 views

Category:

Documents


0 download

TRANSCRIPT

BigData Express: Toward Predictable, Schedulable, and High-Performance Data Transfer

BigData Express Research Team November 10, 2018

Many people’s hard work

FNAL: Qiming Lu, Liang Zhang, Sajith Sasidharan,

Wenji Wu, Phil DeMar

ESnet: Chin Guok, John Macauley, Inder Monga

iCAIR/StarLight: Se-young Yu, Jim-Hao Chen, Joe Mambretti

KISTI: Jin Kim, Seo-Young Noh,

Univ. of Maryland: Xi Yang, Tom Lehman

ORNL: Gary Liu

Acknowledgments

This work was supported by the U.S. DOE Office of Science ASCR network research program

Data Transfer Challenges in Big Data Era

10 Gbps40 Gbps

100 Gbps

1 Tbps

400 Gbps

Throughput

High-Performance Challenge

Time

Real-time data transfer(200-500ms)

Time Constraints on Data Transfer Challenge

Deadline-bounddata transfer

(application specific)

Backgrounddata transfer

(no explicit deadline)

• High-performance challenges

• Time-constraint challenges

Data Transfer – State of the Art

• Advanced data transfer tools and services developed– GridFTP, BBCP

– PhEDEx, LIGO Data Replicator, Globus Online

• Numerous enhancements– Parallelism at all levels• Multi-stream, Multicore, Multi-path parallelism

– Science DMZ architecture

– Terabit networks

Problems with Existing Data Transfer Tools & Services

• Disjoint end-to-end data transfer loop

• Cross-interference between data transfers

• Oblivious to user requirements (e.g., deadlines and Qos requirements)

• Inefficiencies arise with existing data transfer tools running on DTNs

• Distributed resource management model– Resource contention– Performance mismatch

Problem 1 – Disjoint end-to-end data transfer loop

a

High-endDTN transfer 1

LAN

b

c

WANSite 1 Site 2DTNsLAN

Low-endDTN

High-endDTN

a

High-endDTN

transfer 1

LAN

b

c

WANSite 1 Site 2DTNsLAN

a. without coordination

b. with coordination

Performance Mismatch !

Low-endDTN

High-endDTN

a

b

DTNs

s1

s2

br

LAN

flow 1

bandwidth bottleneck

br1

r1

r3

r2

r4

br2

WAN

flow 1

bandwidth bottleneck

a. Network congestion in LANwithout coordination

b. Network congestion in WANwithout coordination

a

b

DTNs

s1

s2

br

LANflow 1

bandwidth bottleneck

c. No network congestion in LANwith coordination

br1

r1

r3

r2

r4

br2

WAN

flow 1

bandwidth bottleneck

d. No network congestion in WANwith coordination

Problem 2 – Cross-interference between data transfers

Severe Cross-interference

Resource Contention

Poor Performance

• Degraded performance• High variability in data transfer performance

Problem 3 – Oblivious to user requirements

Time

Tasks

Task1

Task2

Time

Tasks

Task1

Task2

Deadline1 Deadline2 Deadline1 Deadline2

a. without deadline awareness b. with deadline awareness

• Data transfer jobs are scheduled on a first-come, first serve basis– Without deadline awareness

• Resources are shared fairly among data transfer jobs

Problem 4 – Inefficiencies arises when existing data transfer tools run on DTNs

• I/O locality on NUMA systems• Cache thrashing• Scheduling overheads…. 100GE NIC

NUMA NODE 1 NUMA NODE 2

Local I/Os StorageRemote I/Os

Cores Cores

I/O locality problem on NUMA systems

Need high-performance data transfer tool!

Our Solution - BigData Express

• BigData Express: a schedulable, predictable, and high-performance data transfer service– A peer-to-peer, scalable, and extensible data transfer model– A visually appealing, easy-to-use web portal– A high-performance data transfer engine– A time-constraint-based scheduler– On-demand provisioning of end-to-end network paths with guaranteed QoS– Robust and flexible error handling– CILogon-based security

BigData Express Major Components

• BigData Express Web Portal– Access to BigData Express services

• BigData Express Scheduler– Time-constraint-based scheduler– Co-scheduling DTN, storage, & network

• AmoebaNet– Network as a service– Rate control

• mdtmFTP– High-performance data transfer engine– http://mdtm.fnal.gov

• DTN Agent– Manage and configure DTNs– Collect & report DTN configuration and

status

• Storage Agent– Manage and configure storage systems– I/O estimation

• Data Transfer Launching Agent– Launch data transfer jobs– Support different data transfer

protocols

BigData Express -- Distributed

A Peer-to-Peer model

DTNs

Networks

DTNs

DTNs

DTNs

DTNs

Data Transfer Federation 1

Data Transfer Federation 3

Data Transfer Federation 2

BigData Express -- Flexible

• Flexible to set up data transfer federations

• Providing inherent support for incremental deployment

BDE SchedulerSDN Agent

Storage Agent

DTN AgentDTN Agent

SDN Agent

BDE Web Portal

Data Transfer Launching AgentData Transfer

Launching AgentData Transfer Launching Agent

Message Queue

SDN Agent(AmoebaNet)

DTN Agent

Storage AgentStorage Agent

BDE Scheduler

BigData Express -- Scalable

• BigData Express scheduler manages site resources through agents• Use MQTT as message bus

BDE SchedulerSDN Agent

Storage Agent

DTN AgentDTN Agent

SDN Agent

BDE Web Portal

Data Transfer Launching AgentData Transfer Launching Agent

Data Transfer Launching Agent

mdtmFTPPlugin

GridFTPPlugin

XrootDPlugin

Message Queue

SDN Agent(AmoebaNet)

DTN Agent

Storage AgentStorage Agent

BDE Scheduler

BigData Express -- Extensible

• Extensible Plugin framework to support various data transfer protocols• mdtmFTP, GridFTP, XrootD, …

BigData Express -- End-to-End Data Transfer Model

• Application-aware network serviceo On-demand programming

• Fast-provisioning of end-to-end network paths with guaranteed QoS

• Distributed resource negotiation & brokeringLAN WAN LAN

mdtmFTP

AmoebaNet AmoebaNetSENSE

Edge DTNStorage Edge DTN Storage

DTN Agent

Storage Agent

mdtmFTP

DTN Agent

Storage Agent

Web Portal

Scheduler

Data Transfer Launching Agent

Web Portal

Scheduler

Data Transfer Launching Agent

Site A - Smart E2EData Transfer Orchestrator

Site B - Smart E2EData Transfer Orchestrator

A End-to-End Data Transfer Loop with Guaranteed QoS

Resource negotiation & brokering

Resource

negotiation & brokering

Resource

negotiation & brokering

BigData Express – High Performance Data Transfer (I)

mdtmFTP is faster than existing data transfer tools, ranging from 8% to 9500%!@ESnet 100GE SDN Testbed,

Note 1: “-” indicates inability to get transfer to workNote 2: BBCP performance is very poor, we do not list its results hereNote 3: BBCP and FDT support 3rd party data transfer. But BBCP and FDT couldn’t run 3rd party data transfer on ESNET testbed due to testbed limitation

mdtmFTP FDT GridFTP BBCPLarge file data transfer (1 X 100G) 74.18 79.89 91.18 Poor performanceFolder data transfer (30 x 10G) 192.19 217 320.17 Poor performanceFolder data transfer (Linux 3.12.21) 10.51 - 1006.02 Poor performance

Time-to-completion (Seconds) – Client/Server mode

mdtmFTP FDT GridFTP BBCPLarge file data transfer (1 X 100G) 34.976 N/A 106.84 N/AFolder data transfer (30 x 10G) 95.61 N/A - N/AFolder data transfer (Linux 3.12.21) 9.68 N/A - N/A

Time-to-completion (Seconds) – 3rd party mode

Lower is better

Lower is better

BigData Express – High Performance Data Transfer (II)

mdtmFTP is faster than GridFTP, ranging from 40% to 114%!@StarLight 100GE Testbed

StarLight 100GE Testbed

BigData Express -- Three Types of Data Transfer

• Real-time data transfer

• Deadline-bound data transfer

• Best-effort data transfer

BigData Express – Mechanism SummaryProblems with existing data transfer tools BigData Express Solutions

• Disjoint end-to-end data transfer loop

• Cross-interference between data transfers

• Oblivious to user requirements

• Inefficiencies arises when existing data transfer tools run on DTNs

• Distributed resource negotiation & brokering• Co-scheduling of DTN, storage, & networking• On-demand provisioning of end-to-end

network path with guaranteed QoS

• Time-constraint-based scheduler

• Three classes of data transfer

• mdtmFTP – A high-performance data transfer engine

• Admission control• Rate control

• Time-constraint-based scheduler

BigData Express vs. Globus OnlineFeatures BigData Express Globus Online

Architecture • Distributed service• Flexible to set up data transfer federations • Centralized service

Supported Protocols

• Extensible plugin framework to support multiple protocols: o mdtmFTPo GridFTP, XrootD, SRM (coming soon)

• GridFTP

SDN Support• Yes, Network as a service• Fast-provisioning end-to-end network paths with

guaranteed QoS• Not in production

Supported Data Transfers• Real-time data transfer• Deadline-bound data transfer• Best-effort data transfer

• Best-effort data transfer

Error Handling • Checksum• Retransmit

• Checksum• Retransmit

BigData Express SC18 DEMO

4/3

4/6

FNAL Border router

ESNET

40GE

vlan 3619

49

50

BDE4

Pica8 P3930

40GE

FNAL

SENSE Service AmoebaNet

bde-hp2.fnal.gov

yosemite.fnal.gov

BDE Web Protal

BDE Scheduler

47

48

65

66

Testbed Switch

100GE

Border Router

vlan 3619

DTN

UMD

SENSE Service

180-147.research.maxgigapop.net

BDE Web Protal

BDE Scheduler

STP

10.36.19.15

vlan 3619SENSE Path

180-149.research.maxgigapop.net

DTN

180-148.research.maxgigapop.net

BDE3

ProductionSwitch

Production Switch

40GE

HP Z91000

KISTI

AmoebaNet

BDE Web Protal

BDE Scheduler

DTN3DTN2

134.75.125.77

134.75.125.78 134.75.125.79

192.2.2.8 192.2.2.910GE 10GE

StarLight

vlan 1662STP

4/1

4/2

4/3

4/4

4/5

4/6

BDE1 BDE2

Pica8 P5101

40GE 40GE40GE

7374

192.2.2.1 192.2.2.2

77

KREONET

4/1

vlan 1662

STP

STP

StarLight

165.124.33.157

BDE Web Protal

BDE Scheduler

DTN 165.124.33.142

DTN

CENI162.244.229.52Ottawa

DTN

162.244.229.116Hanover

100GE 100GE

UVA

145.100.132.188

BDE Web Protal

BDE Scheduler

DTN 145.100.132.187

vlan

203

8

40GE

KSTAR

BDE Web Protal

BDE Scheduler

DTN3DTN2

203.230.120.130

203.230.120.127

100GE 100GE

10.36.19.11

203.230.120.128

203.230.120.227 203.230.120.228

10.250.38.107

10.250.38.53

BigData Express -- Deployment• Asia– KISTI, South Korea

• https://sc-demo-01.sdfarm.kr:2888/– KSTAR, South Korea

• Https://203.230.120.130:8080

• Europe– University of Amsterdam, Netherlands

• https://bde-01.lab.uvalight.net/

• North America– Fermilab

• https://Yosemite.fnal.gov:5000– StarLight, Northwestern University

• https://starlight.bigdataexpress.website/– UMD/MAX, University of Maryland, College Park

• https://180-147.research.maxgigapop.net/

Next Stage R&D Plan – Functional Perspective

DTN Agent

SDN Agent

Storage Agent mdtmFTP

PluginGridFTP

PluginXrootD Plugin

Data Transfer Launching Agent

DataBase

Configuration managementWebPortal Account

ManagementBigData Express

REST APIs

Error handlingScheduler

Rucio, Adios-based scientific applications, other scientific workflows

More information about BigData Express

http://bigdataexpress.fnal.gov

Contact: [email protected]

This document was prepared by BigData Express using the resources of the Fermi National Accelerator Laboratory (Fermilab), a U.S. Department of Energy, Office of Science, HEP User Facility. Fermilab is managed by Fermi Research Alliance, LLC (FRA), acting under Contract No. DE-AC02-07CH11359.