CHEP 2001 1
P2P Data Copy Program
bbcp
http://www.slac.stanford.edu/~abh/CHEP2001/p2p_bbcp.htm
Andrew Hanushevsky, Artem Trunov, Les Cottrell
Stanford Linear Accelerator Center
September, 2001
Produced under contract DE-AC03-76SF00515 between Stanford University and the Department of Energy
Why bbcp?
• Assess peer-to-peer technology
    Fast deployment (minimal administration)
• Test novel data transfer algorithms
    C++ component design
• Assess high level, low cost security
    ssh-based authentication (control path)
    Single use passwords (data path)
• Test familiar syntax to expedite use
    Syntax same as for scp
Peer-to-Peer Technology
• No identifiable client or server
    Any node can act as data source or sink
• Well established for file sharing
    Napster, Aimster, Gnutella, etc.
    In many ways, rcp and scp similar to P2P
• Well suited for fast service deployment
    If you have the program you have the service
        ◆ Usually no need for administrators to get involved
Peer-to-Peer Architecture
[Diagram labels: bbcp (3), Data (2)]
Peer-to-Peer & Firewalls
[Diagram labels: bbcp (3), Data (2), ssh connection (2)]
Normal connection mode
Reverse (-z) connection mode
Alternatively, in /etc/services
    bbcpfirst & bbcplast define acceptable port range
    bbcp auto-detects definition
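The /etc/services option can be pictured with a short sketch. It is not bbcp's own code (bbcp is written in C++); it only assumes the standard services file layout of `name port/protocol` and looks for the two entry names given on the slide. The function name and the port numbers in the sample are hypothetical.

```python
def bbcp_port_range(services_text):
    """Return (first, last) from bbcpfirst/bbcplast entries, or None."""
    ports = {}
    for line in services_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        fields = entry.split()
        if len(fields) < 2 or "/" not in fields[1]:
            continue
        name, port_proto = fields[0], fields[1]
        if name in ("bbcpfirst", "bbcplast"):
            port, _proto = port_proto.split("/", 1)
            ports[name] = int(port)
    if "bbcpfirst" in ports and "bbcplast" in ports:
        return ports["bbcpfirst"], ports["bbcplast"]
    return None


# Hypothetical entries; the port numbers are made up for illustration.
SAMPLE = """\
ssh        22/tcp
bbcpfirst  5031/tcp   # first port bbcp may use
bbcplast   5040/tcp   # last port bbcp may use
"""

port_range = bbcp_port_range(SAMPLE)
```

A firewall administrator would then only need to open the ports between the two values.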
Novel Algorithms
• Data pipelining
    Multiple streams “simultaneously” pushed
        ◆ Automatically adapts to router traffic shaping
        ◆ Maximum rate controlled via data “clocking”
• Coordinated buffers
    All buffers same-sized from end to end
        ◆ Data queues never over- or under-filled
• Page aligned buffers
    Allows direct I/O on many filesystems
        ◆ Veritas
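As a rough illustration of the buffer points, the sketch below allocates a pool of equal-sized buffers that each start on a page boundary. It assumes that anonymous memory maps are page aligned, which holds on common platforms; the helper names are hypothetical and this is not how bbcp itself allocates memory.

```python
import ctypes
import mmap

PAGE = mmap.PAGESIZE


def make_buffers(count, size):
    """Allocate `count` equal-sized, page-aligned buffers."""
    # Round the requested size up to a whole number of pages.
    size = -(-size // PAGE) * PAGE
    # Anonymous mmap regions start on a page boundary.
    return [mmap.mmap(-1, size) for _ in range(count)]


def address_of(buf):
    """Return the start address of a writable buffer."""
    return ctypes.addressof(ctypes.c_char.from_buffer(buf))


buffers = make_buffers(4, 100_000)
```

Keeping every buffer the same size means a full buffer on one end always fits a free buffer on the other.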
Algorithm Design
[Diagram: Serial data in → Multiplexor (clock, equal slices) → Parallel Transfer → Demultiplexor → Serial data out]
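The multiplexor and demultiplexor idea can be sketched in a few lines. This is a simplified model under stated assumptions: slices are dealt round-robin, each slice carries its byte offset so the receiver can place it, and plain lists stand in for network streams. None of these details are taken from bbcp's source.

```python
def multiplex(data, streams, slice_size):
    """Deal equal-sized slices of `data` round-robin onto `streams` lanes."""
    lanes = [[] for _ in range(streams)]
    for index, start in enumerate(range(0, len(data), slice_size)):
        # Tag each slice with its offset so the receiver can reorder it.
        lanes[index % streams].append((start, data[start:start + slice_size]))
    return lanes


def demultiplex(lanes, total_size):
    """Rebuild the serial byte stream from slices arriving on any lane."""
    out = bytearray(total_size)
    for lane in lanes:
        for offset, chunk in lane:
            out[offset:offset + len(chunk)] = chunk
    return bytes(out)


payload = bytes(range(256)) * 40          # 10240 bytes of sample data
lanes = multiplex(payload, streams=4, slice_size=1024)
rebuilt = demultiplex(lanes, len(payload))
```

Because every slice names its own offset, the order in which lanes deliver their data does not matter.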
Security
• Low cost, simple and effective security
    Leveraging widely deployed infrastructure
        ◆ If you can ssh there you can copy data
• Sensitive data is encrypted
    One time passwords and control information
• Bulk data is not encrypted
    Privacy sacrificed for speed
• Minimal sharing of information
    Source and Sink do not reveal environment
Protocol Interactions
[Diagram labels: bbcp (3), Data (2); numbered steps 1 to 8; legend: Plain Text, Encrypted]
Step labels (numbers for the first five steps are not attached to labels in the transcript):
    ssh bbcp snk: send data target & pswd, receive port number
    ssh bbcp src: send source file list, pswd, target host, & port number
    login with pswd
    6 Sink asks for file list
    7 Source sends list with handles
    8 Sink requests data using handle
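The "login with pswd" step relies on a password that is good for one use only. A minimal sketch of such a check follows; the class name, the password format, and the choice to leave the password valid after a failed attempt are all assumptions made for illustration, not bbcp's actual behavior.

```python
import hmac
import secrets


def new_password():
    """Create a random password for a single transfer (hypothetical format)."""
    return secrets.token_hex(16)


class OneTimeLogin:
    """Accept a given password exactly once."""

    def __init__(self, password):
        self.password = password
        self.used = False

    def login(self, candidate):
        # Constant-time comparison avoids leaking how much of a guess matched.
        ok = (not self.used) and hmac.compare_digest(candidate, self.password)
        if ok:
            self.used = True
        return ok


pswd = new_password()          # would travel over the encrypted ssh path
sink = OneTimeLogin(pswd)
```

In this model the password only ever crosses the encrypted control path, so the unencrypted data path never needs a reusable secret.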
Invocation
• Familiar syntax
    bbcp [ options ] source [ source [ … ] ] target
• Sources and target can be anything
    [[username@]hostname:]path
    /dev/zero or /dev/null
• Easy but powerful
    Can gather data from multiple hosts
    Many usability and performance options
• http://www.slac.stanford.edu/~abh/bbcp
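To make the source and target notation concrete, here is a small parser for the `[[username@]hostname:]path` form. It is a sketch, not bbcp's parser: the rule that a slash before the first colon marks a local path is an assumption borrowed from common scp-style conventions, and the host and user names in the examples are hypothetical.

```python
from typing import NamedTuple, Optional


class FileSpec(NamedTuple):
    user: Optional[str]
    host: Optional[str]
    path: str


def parse_spec(spec):
    """Split '[[username@]hostname:]path' into its parts."""
    head, sep, path = spec.partition(":")
    # No colon, or a slash before the colon, means a plain local path.
    if not sep or "/" in head:
        return FileSpec(None, None, spec)
    user, at, host = head.rpartition("@")
    return FileSpec(user if at else None, host, path)


remote = parse_spec("abh@host.example.org:/data/file1")
local = parse_spec("/dev/null")
```

Here `remote` carries a user, a host, and a path, while `local` has only a path.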
Usability Features
• Serial Input & Output
    Source & target can even be a pipe or tape drive
• Auto-resume failed copies
    Only un-copied data transferred after failure
• Preserve group ownership and file times
• Auto-create directories
• Command line include files that list files to be copied
• History log file
• Periodic progress messages
• Transfer rate limiting
• MD5 Checksums
• Compression
• Multiple performance tuning options
• And more …
http://www.slac.stanford.edu/~abh/bbcp
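The auto-resume feature can be illustrated with a local-file sketch. It assumes the partial target is a correct prefix of the source and simply continues from the target's current size; the slides do not say how bbcp decides what is already copied, so treat this as one possible approach. The function name and sample sizes are hypothetical.

```python
import os
import tempfile


def resume_copy(src_path, dst_path, chunk_size=1 << 20):
    """Copy only the bytes of src that dst does not yet hold; return count."""
    done = os.path.getsize(dst_path) if os.path.exists(dst_path) else 0
    copied = 0
    with open(src_path, "rb") as src, open(dst_path, "ab") as dst:
        src.seek(done)                    # skip what was already transferred
        while True:
            chunk = src.read(chunk_size)
            if not chunk:
                break
            dst.write(chunk)
            copied += len(chunk)
    return copied


payload = bytes(range(256)) * 20          # 5120 bytes of sample data
with tempfile.TemporaryDirectory() as tmp:
    src_file = os.path.join(tmp, "src.dat")
    dst_file = os.path.join(tmp, "dst.dat")
    with open(src_file, "wb") as f:
        f.write(payload)
    with open(dst_file, "wb") as f:
        f.write(payload[:2000])           # simulate an interrupted copy
    first = resume_copy(src_file, dst_file)
    second = resume_copy(src_file, dst_file)
    with open(dst_file, "rb") as f:
        result = f.read()
```

The second call finds nothing left to copy, which is the behavior the slide describes for a completed transfer.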
The Inside Details
[Diagram: Source and Target Nodes joined by multiple Links. Labeled components: Data, InQ, OutQ, optional compression (Zlib), optional checksum (MD5), optional checksum pipeline (MD5), and an Agent connected via ssh]
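The optional compression and checksum stages can be sketched with the standard zlib and MD5 routines. The ordering used here (checksum over the uncompressed data, then compression, with the reverse on the target) is an assumption for illustration; the diagram labels alone do not fix the order, and the function names are hypothetical.

```python
import hashlib
import zlib


def send(blocks):
    """Compress blocks for the wire and checksum the original data."""
    md5 = hashlib.md5()
    packer = zlib.compressobj()
    wire = []
    for block in blocks:
        md5.update(block)
        wire.append(packer.compress(block))
    wire.append(packer.flush())           # emit any buffered output
    return wire, md5.hexdigest()


def receive(wire, expected_md5):
    """Decompress the wire data and verify it against the checksum."""
    md5 = hashlib.md5()
    unpacker = zlib.decompressobj()
    out = bytearray()
    for piece in wire:
        data = unpacker.decompress(piece)
        md5.update(data)
        out += data
    tail = unpacker.flush()
    md5.update(tail)
    out += tail
    if md5.hexdigest() != expected_md5:
        raise ValueError("checksum mismatch")
    return bytes(out)


blocks = [b"a" * 1000, b"b" * 500]
wire, digest = send(blocks)
restored = receive(wire, digest)
```

Both stages work block by block, so neither end has to hold the whole file in memory.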
The Results
[Plot legend: linear fit, log fit]
bbcp can perform within 5% of iperf reported maximum performance
For all results visit: http://www-iepm.slac.stanford.edu/monitoring/bulk/bbcp.html
Future Work
• Reduce start-up overhead
    Make overhead largely independent of streams
• Real-time Adaptation
    Transfer unit size and number of streams
        ◆ Investigating network feedback monitors
• Real-time logging
    Incorporate netlogger
Conclusion
• bbcp algorithms are effective
    Excellent performance
    Allow data serialization
• A different approach
    East vs West
        ◆ East: Transfer data in balance and harmony
        ◆ West: Blast the data as fast as you can