Data Reservoir: Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Page 1: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Data Reservoir: Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Mary Inaba, Makoto Nakamura, Kei Hiraki

University of Tokyo

AWOCA 2003

Page 2: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Today’s Topic

• New infrastructure for data intensive scientific research

• Problems of using the Internet

Page 3: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

One day, I was surprised

One professor (Dept. of Astronomy) said: "The network is for e-mail and paper exchange. FedEx is for REAL data exchange." (They use DLT tapes and airplanes.)

Page 4: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Huge Data Producers

AKEBONO Satellite

Radio Telescope in NOBEYAMA

SUBARU telescope

KAMIOKANDE (Nobel Prize)

High Energy Accelerator

A lot of data suggests a lot of scientific truth, through computation. Now, we can compute.

Data Intensive Research

Page 5: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Huge Data Transfer (inquiry to Profs.)

Current state: data transfer by DLT, EVERY WEEK.

Expected data size in a few years:
• 10 GB/day for satellite data
• 50 GB/day for high energy accelerator
• 50 PB tape archive for Earth Simulation

Observatories are shared by many researchers; hence, we NEED to bring the data to the lab somehow.

Does the network help?
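
As a rough back-of-the-envelope check (not part of the slides), the daily volumes above can be turned into the average line rate they would need. The sketch assumes decimal units (1 GB = 10^9 bytes) and a transfer spread evenly over 24 hours; the function name is only illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

def sustained_mbps(gigabytes_per_day):
    """Average rate in Mbps needed to move a daily volume (decimal GB)."""
    bits_per_day = gigabytes_per_day * 1e9 * 8
    return bits_per_day / SECONDS_PER_DAY / 1e6

# Daily volumes quoted on the slide.
for label, gb in [("satellite data", 10), ("high energy accelerator", 50)]:
    print(f"{label}: {gb} GB/day needs about {sustained_mbps(gb):.2f} Mbps sustained")
```

The averages are small compared with a 1 Gbps line, so the hard part is sustaining high throughput over long distance, not raw capacity.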

Page 6: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Super-SINET backbone

[Map labels: Hokkaido Univ.; Tohoku Univ.; KEK; Tsukuba Univ.; Univ. of Tokyo; NAO; NII; Titech; Waseda; ISAS; Kyoto Univ.; Doshisha Univ.; Nagoya Univ.; Okazaki Labs; Osaka Univ.; Kyushu Univ.; Optical Cross-connect]

• Started January 2002
• Network for universities and institutes
• Combination of a 10 Gbps ordinary line and several 1 Gbps project lines (physics, genome, Grid, etc.)

Page 7: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Currently

It is not so easy to transfer HUGE data over long distances while fully utilizing the bandwidth, because:

• TCP/IP is popularly used, and for TCP/IP latency is the problem.
• Disk I/O speed (50 MB/sec)

       …

Page 8: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Recall HISTORY: Infrastructure for Scientific Research Projects

• Utilization of computing systems at the time

– From the birth of an electronic computer

• Numerical computation ⇒ Tables, Equations ①
• Supercomputing (vector) ⇒ Simulation ② ③
• Servers ⇒ Database, Data-mining, Genome ④
• Internet ⇒ Information Exchange, Documentation ⑤

Scientific researchers always utilize top-end systems

① EDSAC   ② CDC-6600   ③ CRAY-1   ④ SUN Fire 15000   ⑤ 10G Switch

Page 9: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Frontier of Information Processing

New transition period: balance of computing systems

– Very high-speed network
– Large scale disk storage
– Cluster computers

New infrastructure for Data Intensive Research

[Figure labels: CPU (GFLOPS); Memory (GB); Network Interface (Gbps); Remote Disks; Local Disks]

Page 10: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Research Projects with Data Reservoir

Name | Project | Domestic Connection | Overseas Connection | Current amount of traffic

Hideyuki Sakai | High Energy Polarimeter SMART | RIKEN, RCNP | CERN, Brookhaven | CERN LEP 70 DAT/month; CERN LHC 100 MB/sec; Brookhaven 100 Mbps; RCNP Accelerator 50 GB/day

Yoshiaki Sobue | Radio telescope (VLBI) | Nobeyama Radio Observatory | Max Planck Observatory | VLBI data 200 GB

Sadanori Okamura | Sloan Digital Sky Survey | National Astronomical Observatory | Fermi Lab. | Survey data: 10 TB; data exchange between Fermi Lab

Kazuo Makishima | Satellite observation of early universe | ISAS, Hiroshima Univ., Saitama Univ. | NASA, European Space Agency | Current satellite 1 GB/day

Toshio Yamagata | Simulation of Global Change | Frontier Research System for Global Change | N/A | 1 simulation 10 TB; currently, data archive system with 50 PBytes

Tomio Kobayashi | :JC ATLAS Experiment | KEK, Kyoto Univ., Univ. of Tsukuba | CERN | CERN LHC 100 MB/sec

Takashi Onaka | Infra-red observation satellite | IRIS, Nagoya Univ. | ESA receiving site (Sweden) | Downlink …... 200 MB; data exchange within a minutes

Jun'ichiro Makino | Astronomical simulation by GRAPE-6 | National Astronomical Observatory | Advanced Study, Princeton Univ.; Museum of Natural History | Maximum throughput: 100 MB/s; 1 simulation: 10 TB

Hiroaki Aihara | KEK b-factory | KEK, Nagoya Univ. | Princeton Univ. | Raw data: 600 GB/day; data exchange: 10 GB/day

Sadanori Okamura | SUBARU telescope | National Astronomical Observatory | Hawaii Observatory | 100 GB/day; peak bandwidth 0.5 GB/sec (4 Gbps)

Page 11: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Basic Architecture

[Figure: two Data Reservoirs, each with cache disks and local file accesses, joined by a high-latency, very-high-bandwidth network. Labels: Distributed Shared File (DSM-like architecture); physically addressed, parallel and multi-stream transfer.]

Page 12: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Data intensive scientific computation through Super-SINET

[Figure labels: Very high-speed network; Data Reservoirs; Data analysis at University of Tokyo; Belle Experiments; CERN; X-ray astronomy satellite ASUKA; SUBARU Telescope; Nobeyama Radio Observatory (VLBI); Nuclear experiments; Digital Sky Survey; Local accesses; Distributed shared files]

Page 13: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Design Policy

• Modification of disk handler under VFS layer
• Direct access to raw device for efficient data transfer
• Multi-level striping for scalability
• Use of iSCSI protocol
• Local file accesses through LAN
• Global disk transfer through WAN
• Single file image
• File system transparency

[Figure: software stack. Labels: Application; File System; md (RAID) driver; SCSI driver (sd, sg, st); iSCSI driver; iSCSI daemon; Data Server (sg); SCSI driver (mid); SCSI driver (low); Disks]

Page 14: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

File accesses on Data Reservoir

[Figure labels: Scientific Detectors; User Programs; File Servers; IP Switches; Disk Servers; 1st level striping; 2nd level striping; Disk access by iSCSI]
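
The slides name two levels of striping but do not give the block mapping. As a purely illustrative sketch, assuming plain round-robin striping at both levels (first across file servers, then across the disk servers behind each file server), a logical block could be located as follows. The function `locate_block` and its parameters are hypothetical and not taken from the Data Reservoir implementation.

```python
def locate_block(block, n_file_servers, n_disk_servers_per_fs):
    """Map a logical block number to (file server, disk server, local block).

    Assumed layout: round-robin at both levels. This is an illustration,
    not the mapping used by Data Reservoir.
    """
    file_server = block % n_file_servers               # 1st level striping
    block_on_fs = block // n_file_servers
    disk_server = block_on_fs % n_disk_servers_per_fs  # 2nd level striping
    local_block = block_on_fs // n_disk_servers_per_fs
    return file_server, disk_server, local_block

# Consecutive blocks spread over 2 file servers with 2 disk servers each.
for b in range(8):
    print(b, locate_block(b, 2, 2))
```

With such a layout, consecutive blocks land on different servers, so a large sequential transfer keeps every disk server busy in parallel.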

Page 15: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

File accesses on Data Reservoir: User’s View

[Figure labels: Scientific Detectors; User Programs; File Servers; IP Switches; Disk Servers; 1st level striping; 2nd level striping; Disk access by iSCSI]

Page 16: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Global Data Transfer

[Figure labels: Scientific Detectors; User Programs; File Servers; IP Switches; Disk Servers; iSCSI Bulk Transfer; Global Network]

Page 17: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Implementation (File Server)

[Figure: software stack. Labels: Application; System Call; NFS; EXT2; Linux RAID; sd Driver; sg Driver; iSCSI driver; TCP/UDP; IP; Network]

Page 18: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Implementation (Disk Server)

[Figure: software stack. Labels: Application Layer; iSCSI daemon; System Call; Data Stripe; dr Driver; sg Driver; iSCSI Driver; SCSI Driver; TCP; IP; Network; Disks]

Page 19: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Performance evaluation of Data Reservoir

1. Local experiment: 1 Gbps model (basic performance)

2. 40 km experiments: 1 Gbps model, U. of Tokyo ⇔ ISAS

3. 1600 km experiments: 1 Gbps model
   • 26 ms latency (Tokyo ⇔ Kyoto ⇔ Osaka ⇔ Sendai ⇔ Tokyo)
   • High-quality network (Super-SINET Grid project lines)

4. US-Japan experiments
   1. 1 Gbps model
   2. U. of Tokyo ⇔ Fujitsu Lab. America (Maryland, USA)
   3. U. of Tokyo ⇔ SCinet (Maryland, USA)

5. 10 Gbps experiments: compare four different switch configurations
   1. Extreme Summit 7i, trunked 8 Gigabit Ethernets
   2. RiverStone RS16000, trunked 8 and 12 1000BASE-SX
   3. Foundry BigIron, 10GBASE-LR modules
   4. Extreme BlackDiamond, trunked 8 1000BASE-SX
   5. Foundry BigIron, trunked 2 10GBASE-LR
   • Bottleneck (8 Gbps): trunking 8 Gigabit Ethernets

Page 20: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Performance Comparison to ftp (40 km)

• ftp: optimal performance (minimum disk head movements)
• iSCSI: queued operation
• iSCSI transfer is 55% faster than ftp on a single TCP stream

[Chart: FTP 1GB file transfer (DISK to DISK); rate (MB/s), axis 0 to 35; series: 1GB file default, 1GB file tune]

[Chart: iSCSI transfer (DISK to DISK); rate (MB/s), axis 0 to 45, for Queue 1, 2, 4, 8, 16; series: AVERAGE, MAX, MIN]

Page 21: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

1600 km experiment system

• 870 Mbps file transfer BW

Univ. of Tokyo (CISCO 6509)
   ↓ 1G Ether (Super-SINET)
Kyoto Univ. (Extreme Black Diamond)
   ↓ 1G Ether (Super-SINET)
Osaka Univ. (CISCO 3508)
   ↓ 1G Ether (Super-SINET)
Tohoku Univ. (Jumper fiber)
   ↓ 1G Ether (Super-SINET)
Univ. of Tokyo (Extreme Summit 7i)

Page 22: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Network for 1600 km experiments

[Figure labels: IBM servers; Univ. of Tokyo; Tohoku Univ. (Sendai); Kyoto Univ.; Osaka Univ.; link lengths 550 mile, 300 mile, 250 mile; 1000 mile line, GbE]

• Grid project networks of Super-SINET
• One-way latency 26 ms

Page 23: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Transfer speed on 1600 km experiment

[Bar chart: transfer rate (Mbps), axis 0 to 1000, by system configuration (file servers * disk servers * disks per disk server).
Configurations: 1*4*8, 1*4*(2+2), 1*4*4, 1*2*8, 1*2*(2+2), 1*2*4, 1*1*8, 1*1*(2+2), 1*1*4
Bar values (Mbps): 870, 828, 812, 737, 700, 707, 499, 478, 493]

Maximum bandwidth by SmartBits = 970 Mbps
Overheads of headers ~ 5%
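
To put the best result in context, a small calculation (an added illustration, using only the figures on this slide: 970 Mbps measured maximum, roughly 5% header overhead, 870 Mbps best file transfer rate) shows how close the transfer came to the usable payload rate. Treating the 5% as a flat reduction of the line rate is an assumption.

```python
LINE_MAX_MBPS = 970.0      # maximum bandwidth measured by SmartBits
HEADER_OVERHEAD = 0.05     # approximate header overhead from the slide
BEST_TRANSFER_MBPS = 870.0

def payload_ceiling(line_mbps, overhead):
    """Usable payload rate once header overhead is subtracted."""
    return line_mbps * (1.0 - overhead)

ceiling = payload_ceiling(LINE_MAX_MBPS, HEADER_OVERHEAD)
print(f"payload ceiling: {ceiling:.1f} Mbps")
print(f"share of ceiling reached: {BEST_TRANSFER_MBPS / ceiling:.1%}")
print(f"share of raw line rate:   {BEST_TRANSFER_MBPS / LINE_MAX_MBPS:.1%}")
```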

Page 24: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

10 Gbps experiment

• Local connection of two 10 Gbps models
• 10GBASE-LR or 8 to 12 1000BASE-SX
• 24 disk servers + 6 file servers
– Dell 1650, 1.26 GHz Pentium III × 2, 1 GB memory, ServerSet III HE-SL
– NetGear GE NIC
– Extreme Summit 7i (Trunking)
– Extreme Black Diamond 6808
– Foundry Big Iron (10GBASE-LR)
– RiverStone RS-16000

11.7 Gbps transfer BW

Page 25: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Performance on 10 Gbps model

• 300 GBytes file transfer (iSCSI streams)
• 5% header loss due to TCP/IP, iSCSI
• 7% performance loss due to trunking
• Uneven use of disk servers

[Chart: throughput (Gbps), axis 0.0 to 8.0, vs. number of disk servers (4, 8, 16, 24)]

100 GB file transfer in 2 minutes
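
As a quick sanity check on that last figure (an added illustration, assuming decimal gigabytes and exactly 120 seconds), the average rate implied by "100 GB in 2 minutes" can be computed directly:

```python
def average_gbps(gigabytes, seconds):
    """Average transfer rate in Gbps for a volume given in decimal GB."""
    return gigabytes * 8 / seconds

print(f"{average_gbps(100, 120):.2f} Gbps")  # 100 GB moved in 2 minutes
```

This gives about 6.7 Gbps.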

Page 26: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

US-Japan Experiments at SC2002 Bandwidth Challenge

92% Usage of Bandwidth using TCP/IP

Page 27: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Brief Explanation of TCP/IP

Page 28: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

User’s View

TCP is a PIPE

[Figure: input data "abcde" goes into TCP, crosses the Internet as a byte stream, and comes out of TCP as the same data in the same order.]

Page 29: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

TCP’s View

[Figure: byte stream "abcde" passing from TCP, through the Internet, to TCP]

• Check whether all data has arrived
• Re-order when the arrival order is wrong
• Ask for a “re-send” when data is missing
• Speed control
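
The receiver-side duties listed on this slide can be sketched in a few lines. This is a toy model, not real TCP: segments are assumed to be (sequence number, payload) pairs that never partially overlap, and the function name `reassemble` is hypothetical.

```python
def reassemble(segments):
    """Rebuild a byte stream from (sequence_number, payload) segments that
    may arrive out of order or duplicated.

    Returns (data_delivered_in_order, next_expected_sequence_number).
    """
    pending = {}          # segments that arrived ahead of a gap
    next_seq = 0          # first byte not yet delivered to the application
    out = bytearray()
    for seq, payload in segments:
        if seq >= next_seq:                 # ignore old duplicates
            pending.setdefault(seq, payload)
        while next_seq in pending:          # deliver everything now contiguous
            chunk = pending.pop(next_seq)
            out += chunk
            next_seq += len(chunk)
    return bytes(out), next_seq

# Out-of-order arrival plus one duplicate still yields "abcde".
print(reassemble([(2, b"cd"), (0, b"ab"), (4, b"e"), (0, b"ab")]))
# A missing segment stalls delivery; the receiver still expects byte 2.
print(reassemble([(0, b"ab"), (4, b"e")]))
```

In the second call the stream stops at byte 2, which is the point where a real receiver would keep acknowledging the same position until the sender re-sends.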

Page 30: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

TCP’s features

• Keep data until the “Acknowledgement” arrives.

• Speed control (congestion control) without knowing the state of the routers.

Use a buffer (window); when an ACK arrives from the receiver, new data is moved into the buffer.

Make the buffer (window) smaller when congestion is suspected.

Page 31: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Window Size and Throughput

Roughly speaking,

Throughput = Window Size / RTT

(RTT: Round Trip Time)

Hence, a longer RTT needs a larger window size for the same throughput.
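
A worked example of the relation Throughput = Window Size / RTT may help. The numbers are illustrative: a 1 Gbps target, a 200 msec RTT (the RTT used in the single-stream analysis later in the talk), and the classic 65535-byte TCP window limit without window scaling. Function names are only for this sketch.

```python
def window_bytes_needed(throughput_bps, rtt_s):
    """Window size (bytes) needed to sustain a throughput at a given RTT."""
    return throughput_bps * rtt_s / 8

def throughput_bps(window_bytes, rtt_s):
    """Throughput (bits/s) achievable with a given window and RTT."""
    return window_bytes * 8 / rtt_s

# Filling 1 Gbps at 200 ms RTT needs about 25 MB in flight.
print(window_bytes_needed(1e9, 0.200) / 1e6, "MB")
# A 65535-byte window at the same RTT yields only about 2.6 Mbps.
print(throughput_bps(65535, 0.200) / 1e6, "Mbps")
```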

Page 32: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Congestion Control

AIMD: Additive Increase, Multiplicative Decrease

[Plot: window size vs. time; start phase (window doubled every round trip), then AIMD phase]

Accelerate gradually after congestion has occurred; slow down rapidly when congestion is suspected.
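
The sawtooth in the plot can be reproduced with a very small simulation. This is a simplified sketch, not the real TCP state machine: the window is counted in packets, one loop iteration is one round trip, congestion is assumed to occur whenever the window exceeds a fixed `capacity`, and the start phase ends at the first loss.

```python
def simulate_window(rounds, capacity):
    """Return the window size (in packets) at the start of each round trip."""
    window = 1
    slow_start = True
    trace = []
    for _ in range(rounds):
        trace.append(window)
        if window > capacity:             # congestion: a loss in this round
            window = max(window // 2, 1)  # multiplicative decrease
            slow_start = False
        elif slow_start:
            window *= 2                   # start phase: double per round trip
        else:
            window += 1                   # additive increase: +1 per round trip
    return trace

print(simulate_window(12, 20))
```

The trace climbs quickly to 32, is halved to 16, and then creeps up by one packet per round trip, which is the slow climb the next slide identifies as the problem on long-latency paths.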

Page 33: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Another Problem

Denote a “network with long latency and wide bandwidth” as an LFN (Long Fat pipe Network).

An LFN needs a large window size, but since each increment is triggered by an ACK, the speed of increment is also SLOW.

(LFN suffers under AIMD)
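
To see how slow, here is a rough estimate of the time needed to regain full speed after a single loss. It is a simplified model with stated assumptions: the window is halved once, then grows by one 1500-byte packet per round trip, and the link rate and RTT values are only examples.

```python
def recovery_seconds(bandwidth_bps, rtt_s, packet_bytes=1500):
    """Time to grow from half the full window back to the full window
    at +1 packet per RTT (simplified AIMD model)."""
    full_window_packets = bandwidth_bps * rtt_s / (8 * packet_bytes)
    round_trips_needed = full_window_packets / 2
    return round_trips_needed * rtt_s

# 1 Gbps with 200 ms RTT: roughly 28 minutes to recover from one loss.
print(recovery_seconds(1e9, 0.200) / 60, "minutes")
# The same link with 1 ms RTT recovers in a few hundredths of a second.
print(recovery_seconds(1e9, 0.001), "seconds")
```

Recovery time grows with the square of the RTT in this model, because both the window to regain and the time per increment scale with RTT.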

Page 34: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Network Environment

The bottleneck is about 600 Mbps. Note that 600 Mbps < 1 Gbps.

Page 35: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

92% using TCP/IP is good, but we still have a PROBLEM

Several streams make progress only after other streams finish

Page 36: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Fastest and slowest stream in the worst case

• The slowest stream is 3 times slower than the fastest.
• Even after other streams finish, throughput did not recover.

[Plot: sequence number vs. time]

Page 37: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Hand-made Tools

• DR Gigabit Network Analyzer
– Needs accurate time stamps with 100 ns accuracy
– Dumps full packets

• Comet Delay and Drop: pseudo Long Fat pipe Network (LFN)

On Gigabit Ethernet, a packet is sent every 12 μsec

Page 38: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Programmable NIC(Network Interface Card)

Page 39: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

DR Giga Analyzer

Page 40: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Comet Delay and Drop

Page 41: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Unstable Throughput

• We examined long-distance data transfer; throughput ranged from 8 Mbps to 120 Mbps (when using a Gigabit Ethernet interface).

Page 42: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Fast Ethernet is very stable

Page 43: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Analysis of a single stream: number of packets with 200 msec RTT

Page 44: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Packet Distribution

[Plot: number of packets per msec vs. time (sec)]

Page 45: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Packet Distribution of Fast Ethernet

[Plot: number of packets per msec vs. time (sec)]

Page 46: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Gigabit Ethernet interface vs. Fast Ethernet interface

Even at the same “20 Mbps”, the behavior of 20 Mbps on a Gigabit Ethernet interface and 20 Mbps on a Fast Ethernet interface is completely different.

Gigabit Ethernet is very bursty. Routers might not like this.

Page 47: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

2 problems

• Once packets are sent in bursts, the router sometimes cannot cope.

(Unlucky streams are slow, lucky streams are fast)

Especially when the bottleneck is below 1 Gbps.

• More than 80% of the time, the sender does not send anything.

Page 48: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Problem of implementation

At 1 Gbps, supposing an Ethernet packet of 1500 B, one packet should be sent every 12 μsec.

On the other hand, the UNIX kernel timer is 10 msec.
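
The mismatch can be quantified with the slide's own numbers (1 Gbps, 1500-byte packets, 10 msec timer). The sketch below counts only the 1500 payload bytes per packet, as the slide does, and ignores preamble and inter-packet gap; the function names are only illustrative.

```python
def packet_interval_us(link_bps, frame_bytes=1500):
    """Time in microseconds to put one frame on the wire."""
    return frame_bytes * 8 / link_bps * 1e6

def packets_per_tick(link_bps, tick_s=0.010, frame_bytes=1500):
    """How many frames fit into one kernel timer tick at line rate."""
    return tick_s / (frame_bytes * 8 / link_bps)

print(packet_interval_us(1e9), "microseconds per packet")
print(packets_per_tick(1e9), "packets per 10 ms tick")
```

One timer tick spans more than 800 packet times, so a sender paced by that timer can only release packets in large bursts rather than evenly spaced.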

Page 49: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

IPG (Inter Packet Gap)

• Transmitter is always on.

• When no packet is being sent, it is in the idle state.

• Each frame is followed by at least 12 bytes of IPG at the sender (IEEE 802.3).

• Tunable with the e1000 driver (8 bytes to 1023 bytes).

Page 50: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

IPG tuning for short distance

                    IPG 8 bytes    IPG 1023 bytes
Fast Ethernet       94.1 Mbps      56.7 Mbps
Gigabit Ethernet    941 Mbps       567 Mbps

Suppose the Ethernet frame is 1500 bytes; then 1508 : 2523 is approximately 567 : 941. These results agree with theory. (Gigabit Ethernet has already been perfectly tuned for short-distance data transfer.)
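
The ratio argument can be checked numerically. The sketch follows the slide's simplified model, in which each packet occupies (1500 + IPG) byte times on the wire, and scales the throughput measured at IPG 8 to predict the value at IPG 1023. The function name is hypothetical.

```python
def predicted_mbps(measured_mbps, frame_bytes, measured_ipg, new_ipg):
    """Scale a measured throughput to a different inter-packet gap,
    assuming each packet costs (frame_bytes + IPG) byte times."""
    return measured_mbps * (frame_bytes + measured_ipg) / (frame_bytes + new_ipg)

# Predictions for IPG 1023 from the IPG 8 measurements on the slide.
print(f"Gigabit Ethernet: {predicted_mbps(941, 1500, 8, 1023):.1f} Mbps (measured 567)")
print(f"Fast Ethernet:    {predicted_mbps(94.1, 1500, 8, 1023):.1f} Mbps (measured 56.7)")
```

The predictions come out within about 1% of the measured values.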

Page 51: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

IPG tuning for Long Distance

Page 52: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

MAX,MIN,Average, Standard Deviation of Throughput

FastEther

Page 53: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Some patterns of throughput change

Page 54: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Detail (Slow Start Phase)

Page 55: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Packet Distribution

Page 56: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

But

• They are like an ad-hoc patch.

What is the essential Problem?

Page 57: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

One big problem

• A good MODEL does not exist.

Old-type MODELs do not work well, such as:

– queueing theory: M/M/1
– packet distribution: Poisson distribution

Experiments say they are not good.

Currently, simulation and using a real network are the only ways to check.

(No theoretical background)
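
For reference, this is what the classical M/M/1 model predicts for a single router queue with Poisson arrivals and exponential service times. The formulas are the standard textbook ones; the example rates are arbitrary. The bursty traffic measured earlier in the talk violates the Poisson assumption, which is why these predictions are reported not to match experiment.

```python
def mm1(arrival_rate, service_rate):
    """Standard M/M/1 results.

    Returns (utilization, mean number in system, mean time in system).
    """
    if arrival_rate >= service_rate:
        raise ValueError("queue is unstable: arrival rate must be below service rate")
    rho = arrival_rate / service_rate
    mean_in_system = rho / (1 - rho)
    mean_time = 1 / (service_rate - arrival_rate)
    return rho, mean_in_system, mean_time

# Example: 8 packets per unit time arriving at a server that handles 10.
print(mm1(8, 10))
```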

Page 58: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

What is the difference from the telephone network?

AUTONOMY

Page 59: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

For the telephone network,

• The telephone company knows, manages and controls the whole network.

• End nodes do not have to do heavy jobs, such as congestion control.

Page 60: Data Reservoir:  Utilization of Multi-Gigabit Backbone Network for Data-Intensive Research

Current Trend(?)

• Analyze NETWORK using Game Theory.

• Nash Equilibrium