bulk data transfer activities we regard data transfers as “first class citizens,” just like...
TRANSCRIPT
![Page 1: Bulk Data Transfer Activities We regard data transfers as “first class citizens,” just like computational jobs. We have transferred ~3 TB of DPOSS data](https://reader035.vdocuments.us/reader035/viewer/2022081807/5a4d1b4d7f8b9ab0599a616f/html5/thumbnails/1.jpg)
Bulk Data Transfer Activities We regard data transfers as “first class citizens,” just like computational jobs. We have transferred ~3 TB of DPOSS data (2611 x 1.1 GB files) from SRB to UniTree using 3 different pipeline configurations. The pipelines are built using Condor and Stork scheduling technologies. The whole process is managed by DAGMan.
2 We used the experimental DiskRouter tool instead of Globus GridFTP for cache-to-cache transfers. We obtained an end-to-end throughput (from SRB to UniTree) of 20 files per hour (5.95 MB/sec).
Unitree not responding
Diskrouter reconfigured and restarted
1 We used the native file transfer mechanisms for each underlying system: SRB, Globus GridFTP, and UniTree for the transfers. We described each data transfer with a five stage pipeline, resulting in a 5x2611 node workflow (DAG) managed by DAGMan. We obtained an end-to-end throughput (from SRB to UniTree) of 11 files per hour (3.2 MB/sec).
SRB Server UniTree Server
SDSC Cache NCSA Cache
SRB get
Globus-url-copy
MSS put
Submit Site
A
B C
D
Move X from C to D
Move X from A to B
Remove X from A
Move X from B to C
Remove X from B
Move X from C to D
Move X from A to B
Remove X from A
Move X from B to C
Remove X from B
Move X from C to D
Move X from A to B
Remove X from A
Move X from B to C
Remove X from B
Move X from C to D
Move X from A to B
Remove X from A
Move X from B to C
Remove X from B
DAG File
SRB Server UniTree Server
NCSA Cache
SRB getMSS put
Submit Site
A
C
D
3 We skipped the SDSC cache, and performed direct SRB transfers from SRB server to NCSA cache.
We described each data transfer with a three stage pipeline, resulting in a 3x2611 node workflow (DAG). We obtained an end-to-end throughput (from SRB to UniTree) of 17 files per hour (5.00 MB/sec).
SRB server problem
DAG File
Move X from A to C
Move X from C to D
Remove X from C
Move X from A to C
Move X from C to D
Remove X from C
Move X from A to C
Move X from C to D
Remove X from C
Move X from A to C
Move X from C to D
Remove X from C
Unitree maintenancePDQ Expedition
Condor is a specialized workload management system for compute-intensive jobs. Condor provides a job queuing mechanism, scheduling policy, priority scheme, resource monitoring, and resource management. Condor chooses when and where to run jobs based upon a policy, carefully monitors their progress, and ultimately informs the user upon completion. http://www.cs.wisc.edu/condor
What a batch system means for computational jobs, Stork means the same for data placement activities (ie. transfer, replication, reservations, staging) in Grid: it schedules, runs, monitors data placement jobs and ensures that they complete.Stork can interact with heterogeneous middleware and end-storage systems easily and recover from failures successfully.Stork makes data placement a first class citizen of Grid computing.http://www.cs.wisc.edu/condor/stork
DAGManDAGman (Directed Acyclic Graph Manager) is a meta-scheduler for Condor. It manages dependencies between jobs at a higher level than the Condor Scheduler. DAGMan can now also interact with Stork.http://www.cs.wisc.edu/condor/dagman
DiskRouterMoves large amounts of data efficiently (on the order of terabytes)Uses disk as a buffer to aid in large data transfersPerforms application level routingIncreases network throughput by using multiple sockets and setting tcp buffer sizes explicitlyhttp://www.cs.wisc.edu/condor/diskrouter
GridFTP:High performance, secure, reliable data transfer protocol from Globushttp://www.globus.org/datagrid/gridftp.html
SRB: Storage Resource Broker Client-Server middleware that provides a uniform interface for connecting to heterogeneous data resourceshttp://www.npaci.edu/DICE/SRB
UniTree:NCSA’s High-speed, high-capacity mass storage systemhttp://www.ncsa.uiuc.edu/Divisions/CC/HPDM/unitree
SRB server maintenance
SDSC cache reboot & UW CS Network outage