p2p data copy program bbcp - slac national accelerator ... · • test novel data transfer...

15
CHEP 2001 1 P2P Data Copy Program bbcp http://www.slac.stanford.edu/~abh/CHEP2001/p2p_bbcp.htm Andrew Hanushevsky, Artem Trunov, Les Cottrell Stanford Linear Accelerator Center September, 2001 Produced under contract DE-AC03-76SF00515 between Stanford University and the Department of Energy

Upload: others

Post on 31-Aug-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: P2P Data Copy Program bbcp - SLAC National Accelerator ... · • Test novel data transfer algorithms C++ component design • Assess high level, low cost security ssh-based authentication

CHEP 2001 1

P2P Data Copy Programbbcp

http://www.slac.stanford.edu/~abh/CHEP2001/p2p_bbcp.htm

Andrew Hanushevsky, Artem Trunov, Les CottrellStanford Linear Accelerator Center

September, 2001

Produced under contract DE-AC03-76SF00515 between Stanford University and the Department of Energy

Page 2: P2P Data Copy Program bbcp - SLAC National Accelerator ... · • Test novel data transfer algorithms C++ component design • Assess high level, low cost security ssh-based authentication

CHEP 2001 2

Why bbcp?• Assess peer-to-peer technology

Fast deployment (minimal administration)• Test novel data transfer algorithms

C++ component design• Assess high level, low cost security

ssh-based authentication (control path)Single use passwords (data path)

• Test familiar syntax to expedite useSyntax same as for scp

Page 3: P2P Data Copy Program bbcp - SLAC National Accelerator ... · • Test novel data transfer algorithms C++ component design • Assess high level, low cost security ssh-based authentication

CHEP 2001 3

Peer-to-Peer Technology• No identifiable client or server

Any node can act as data source or sink• Well established for file sharing

Napster, Aimster, Gnutella, etc.In many ways, rcp and scp similar to P2P

• Well suited for fast service deploymentIf you have the program you have the service

◆ Usually no need for administrators to get involved

Page 4: P2P Data Copy Program bbcp - SLAC National Accelerator ... · • Test novel data transfer algorithms C++ component design • Assess high level, low cost security ssh-based authentication

CHEP 2001 4

Peer-to-Peer Architecture

bbcp

bbcp

Data

bbcp

Data

Page 5: P2P Data Copy Program bbcp - SLAC National Accelerator ... · • Test novel data transfer algorithms C++ component design • Assess high level, low cost security ssh-based authentication

CHEP 2001 5

Peer-to-Peer & Firewalls

bbcp

bbcp

Data

bbcp

Data

Normal connection modeNormal connection modeReverse Reverse ((--z) z) connection modeconnection mode

sshconnection

sshconnection

Alternatively, in /etc/servicesbbcpfirstbbcpfirst & bbcplastbbcplast

define acceptable port rangebbcpbbcp auto-detects definition

Page 6: P2P Data Copy Program bbcp - SLAC National Accelerator ... · • Test novel data transfer algorithms C++ component design • Assess high level, low cost security ssh-based authentication

CHEP 2001 6

Novel Algorithms• Data pipelining

Multiple streams “simultaneously” pushed◆ Automatically adapts to router traffic shaping◆ Maximum rate controlled via data “clocking”

• Coordinated buffersAll buffers same-sized from end to end

◆ Data queues never over- or under-filled

• Page aligned buffersAllows direct I/O on many filesystems

◆ Veritas

Page 7: P2P Data Copy Program bbcp - SLAC National Accelerator ... · • Test novel data transfer algorithms C++ component design • Assess high level, low cost security ssh-based authentication

CHEP 2001 7

Algorithm Design

MultiplexorMultiplexor DemultiplexorDemultiplexorSerial data out

clockEqual Slices

Parallel Transfer

Serial data in

Page 8: P2P Data Copy Program bbcp - SLAC National Accelerator ... · • Test novel data transfer algorithms C++ component design • Assess high level, low cost security ssh-based authentication

CHEP 2001 8

Security• Low cost, simple and effective security

Leveraging widely deployed infrastructure◆ If you can ssh there you can copy data

• Sensitive data is encryptedOne time passwords and control information

• Bulk data is not encryptedPrivacy sacrificed for speed

• Minimal sharing of informationSource and Sink do not reveal environment

Page 9: P2P Data Copy Program bbcp - SLAC National Accelerator ... · • Test novel data transfer algorithms C++ component design • Assess high level, low cost security ssh-based authentication

CHEP 2001 9

Protocol Interactions

bbcp

bbcp

Data

bbcp

Data

ssh bbcp snkssh bbcp src

send source file list, pswdtarget host, & port number

send data target & pswdreceive port number

login with pswd

1

4

3

2

5

Plain TextPlain TextEncryptedEncrypted

6 Sink asks for file list

7

8

Source sends list with handles

Sink requests data using handle

Page 10: P2P Data Copy Program bbcp - SLAC National Accelerator ... · • Test novel data transfer algorithms C++ component design • Assess high level, low cost security ssh-based authentication

CHEP 2001 10

Invocation• Familiar syntax

bbcp [ options ] source [ source [ … ] ] target• Sources and target can be anything

[[username@]hostname:]]path/dev/zero or /dev/null

• Easy but powerfulCan gather data from multiple hostsMany usability and performance options

•• http://www.http://www.slacslac..stanfordstanford..eduedu/~/~abhabh//bbcpbbcp

Page 11: P2P Data Copy Program bbcp - SLAC National Accelerator ... · • Test novel data transfer algorithms C++ component design • Assess high level, low cost security ssh-based authentication

CHEP 2001 11

Usability Features• Serial Input & Output

Source & target can even be a pipe or tape drive• Auto-resume failed copies

Only un-copied data transferred after failure• Preserve group ownership and file times• Auto-create directories• Command line include files that list files to be copied• History log file• Periodic progress messages• Transfer rate limiting• MD5 Checksums• Compression• Multiple performance tuning options• And more …

http://www.http://www.slacslac..stanfordstanford..eduedu/~/~abhabh//bbcp

Optional

bbcp

Page 12: P2P Data Copy Program bbcp - SLAC National Accelerator ... · • Test novel data transfer algorithms C++ component design • Assess high level, low cost security ssh-based authentication

CHEP 2001 12

The Inside Details

Link

Link

Link

Link Link

Link

Link

Link

Node Node

OutQZlib

MD5

MD5

MD5

MD5

MD5

MD5

Source

OptionalCompression

OptionalChecksum

OptionalChecksumPipeline

Data

InQ

Data

Zlib

Target

OptionalCompression

Agentsshssh sshssh

Page 13: P2P Data Copy Program bbcp - SLAC National Accelerator ... · • Test novel data transfer algorithms C++ component design • Assess high level, low cost security ssh-based authentication

CHEP 2001 13

The Results

linear fit

log fit

bbcp can perform within 5% of iperf reported maximum performance

For all results visit: For all results visit: http://wwwhttp://www--iepmiepm..slacslac..stanfordstanford..eduedu/monitoring/bulk//monitoring/bulk/bbcpbbcp.html.html

Page 14: P2P Data Copy Program bbcp - SLAC National Accelerator ... · • Test novel data transfer algorithms C++ component design • Assess high level, low cost security ssh-based authentication

CHEP 2001 14

Future Work• Reduce start-up overhead

Make overhead largely independent of streams• Real-time Adaptation

Transfer unit size and number of streams◆ Investigating network feedback monitors

• Real-time loggingIncorporate netlogger

Page 15: P2P Data Copy Program bbcp - SLAC National Accelerator ... · • Test novel data transfer algorithms C++ component design • Assess high level, low cost security ssh-based authentication

CHEP 2001 15

Conclusion• bbcp algorithms are effective

Excellent performanceAllow data serialization

• A different approachEast vs West

◆ East: Transfer data in balance and harmony◆ West: Blast the data as fast as you can