Connecting arbitrary data sources to the grid
Shunde Zhang
Australian Research Collaboration Service (ARCS)
eResearch SA
School of Computer Science, University of Adelaide
Background
Australian Research Collaboration Service
A successor of APAC (Australian Partnership for Advanced Computing)
Services
– HPC
– Data
– Collaboration tools: AccessGrid, EVO, Plone, Drupal, Sakai
ARCS Data Fabric
ARCS Data Fabric (cont.)
A national service
Provided to all Australian researchers
Based on iRODS
The Problem
Interoperability with “The Grid”
– “The Grid”: Globus, gLite, Condor, etc.
– Data sources
  • GridFTP-compatible: dCache
  • Non-GridFTP-compatible: iRODS, SRB
Possible solutions
– “Manual” copy (or do it in a PBS script)
– Copy queue
The Problem (cont.)
Movement of massive data
– Both ends use the same software (talk the same protocol)
– Different systems are used (talk different protocols)
– Efficiency
Possible solutions
– Transfer via an intermediate point
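One way to picture "transfer via an intermediate point" is a relay that reads from one system and writes to the other without staging the whole file. The Java sketch below assumes both ends are already exposed as plain streams; the class name `Relay` and the buffer size are illustrative, not from the talk. It only shows that the intermediate point needs a fixed-size buffer rather than space for the whole data set.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class Relay {
    /**
     * Copies everything from the source stream to the sink stream through one
     * fixed-size buffer, so the intermediate point never holds more than
     * bufferSize bytes at a time. Returns the number of bytes relayed.
     */
    public static long relay(InputStream source, OutputStream sink, int bufferSize) throws IOException {
        if (bufferSize <= 0) {
            throw new IllegalArgumentException("bufferSize must be positive");
        }
        byte[] buf = new byte[bufferSize];
        long total = 0;
        int n;
        while ((n = source.read(buf)) != -1) { // -1 marks the end of the source
            sink.write(buf, 0, n);
            total += n;
        }
        sink.flush();
        return total;
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for connections to system A (source) and system B (sink).
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        long n = relay(new ByteArrayInputStream(data), sink, 4096);
        System.out.println(n + " bytes relayed");
    }
}
```

In a real deployment the two streams would be connections speaking different protocols; the relay itself stays protocol-agnostic.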
A solution - old-fashioned
AWS Import/Export for Amazon S3
– Ship the hard disks by courier
Our Solution - GridFTP
De facto standard
– Compatible with the Grid and many grid clients
Efficiency
– Parallel transfer
– Data channel reuse
– Large file transfer in small blocks
Compatible with many file transfer services
– Monitoring
– Scheduling
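To make "large file transfer in small blocks" and "parallel transfer" concrete, here is a hypothetical planner that cuts a file into fixed-size blocks and deals them round-robin across a number of data channels. The class, the round-robin policy and the parameter names are assumptions for illustration; a real GridFTP server may hand out blocks dynamically instead.

```java
import java.util.ArrayList;
import java.util.List;

public class BlockPlanner {
    /** One block of the file: where it starts, how long it is, and which data channel carries it. */
    public static final class Block {
        public final long offset;
        public final long length;
        public final int stream;

        Block(long offset, long length, int stream) {
            this.offset = offset;
            this.length = length;
            this.stream = stream;
        }

        @Override
        public String toString() {
            return "stream " + stream + ": bytes [" + offset + ", " + (offset + length) + ")";
        }
    }

    /** Cuts [0, fileSize) into blockSize pieces and deals them round-robin over `streams` channels. */
    public static List<Block> plan(long fileSize, long blockSize, int streams) {
        if (blockSize <= 0 || streams <= 0) {
            throw new IllegalArgumentException("blockSize and streams must be positive");
        }
        List<Block> blocks = new ArrayList<>();
        long index = 0;
        for (long off = 0; off < fileSize; off += blockSize) {
            long len = Math.min(blockSize, fileSize - off); // the last block may be short
            blocks.add(new Block(off, len, (int) (index % streams)));
            index++;
        }
        return blocks;
    }

    public static void main(String[] args) {
        for (Block b : plan(10_000, 4096, 2)) {
            System.out.println(b);
        }
    }
}
```

With a 10,000-byte file, 4,096-byte blocks and two streams, `main` lists three blocks alternating between stream 0 and stream 1, the last one only 1,808 bytes long.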
An overview of the GridFTP protocol
Based on FTP with extensions
Third-party transfer
– Intermediate point not needed
Security: GSI
Extended block mode
– Parallel transfer
– Striped transfer
– Partial transfer
Reliable and restartable
TCP and UDP
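Extended block mode (MODE E) is what enables parallel, striped and partial transfer: every block on a data channel carries a header with its own byte count and file offset, so the receiver can place blocks correctly whatever order they arrive in. As we read the GridFTP specification (GFD.20), the header is 17 bytes: an 8-bit descriptor, a 64-bit byte count and a 64-bit offset. The sketch below encodes and decodes such a header in network byte order; the class name is hypothetical, and the EOD flag value of 8 should be checked against the specification before relying on it.

```java
import java.nio.ByteBuffer;

public class ModeEHeader {
    public static final int HEADER_SIZE = 17; // 1-byte descriptor + 8-byte count + 8-byte offset
    public static final int DESC_EOD = 8;     // "end of data on this channel" (value as we read GFD.20)

    public final int descriptor; // flag bits, 0..255
    public final long count;     // number of payload bytes following the header
    public final long offset;    // where those bytes belong in the file

    public ModeEHeader(int descriptor, long count, long offset) {
        this.descriptor = descriptor;
        this.count = count;
        this.offset = offset;
    }

    /** Serialises the header; ByteBuffer is big-endian (network byte order) by default. */
    public byte[] encode() {
        return ByteBuffer.allocate(HEADER_SIZE)
                .put((byte) descriptor)
                .putLong(count)
                .putLong(offset)
                .array();
    }

    public static ModeEHeader decode(byte[] raw) {
        ByteBuffer buf = ByteBuffer.wrap(raw, 0, HEADER_SIZE);
        int descriptor = buf.get() & 0xFF; // the descriptor is an unsigned byte
        long count = buf.getLong();
        long offset = buf.getLong();
        return new ModeEHeader(descriptor, count, offset);
    }

    public static void main(String[] args) {
        // A 64 KiB block that belongs at byte 1 MiB of the file.
        ModeEHeader back = decode(new ModeEHeader(0, 64 * 1024, 1L << 20).encode());
        System.out.println(back.count + " bytes at offset " + back.offset);
    }
}
```

Because the offset is 64 bits wide, blocks of files well beyond 4 GB (such as the 8 GB and 16 GB test files) can be addressed directly.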
The Architecture
GridFTP interface
Generic File System Framework
Data Source Plugin
Data Source
Generic File System Framework
FileSystem creates FileSystemConnection
FileSystemConnection creates FileObject
FileObject creates RandomAccessFileObject
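The "creates" chain of the framework can be illustrated with a toy in-memory plugin. The interfaces here are deliberately trimmed stand-ins that reuse the framework's names (for example, `connect(String)` replaces `createFileSystemConnection(GSSCredential)`, and `open()` replaces `getRandomAccessFileObject`), and `MemoryFileSystem` is hypothetical. The point is only who creates whom, not Griffin's real signatures.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Trimmed stand-ins for the framework interfaces (hypothetical signatures).
interface FileSystem {
    void init() throws IOException;
    FileSystemConnection connect(String user);
    void exit();
}

interface FileSystemConnection {
    FileObject getFileObject(String path);
    void close();
}

interface FileObject {
    boolean exists();
    long length();
    RandomAccessFileObject open() throws IOException;
}

interface RandomAccessFileObject {
    void seek(long offset);
    int read(byte[] b) throws IOException;
    void close();
}

/** Toy plugin whose "data source" is a map from path to file contents. */
class MemoryFileSystem implements FileSystem {
    private final Map<String, byte[]> files = new HashMap<>();

    public void init() {
        files.put("/data/hello.txt", "hello grid".getBytes(StandardCharsets.UTF_8));
    }

    public void exit() {
        files.clear();
    }

    // FileSystem creates a FileSystemConnection (one per session; user is unused here).
    public FileSystemConnection connect(String user) {
        return new FileSystemConnection() {
            // FileSystemConnection creates a FileObject for a path.
            public FileObject getFileObject(String path) {
                return new FileObject() {
                    public boolean exists() { return files.containsKey(path); }
                    public long length() { return exists() ? files.get(path).length : 0; }

                    // FileObject creates a RandomAccessFileObject over its bytes.
                    public RandomAccessFileObject open() throws IOException {
                        byte[] data = files.get(path);
                        if (data == null) throw new IOException("no such file: " + path);
                        return new RandomAccessFileObject() {
                            private int pos = 0;
                            public void seek(long offset) { pos = (int) Math.max(0, Math.min(offset, data.length)); }
                            public int read(byte[] b) {
                                if (pos >= data.length) return -1; // end of file
                                int n = Math.min(b.length, data.length - pos);
                                System.arraycopy(data, pos, b, 0, n);
                                pos += n;
                                return n;
                            }
                            public void close() { }
                        };
                    }
                };
            }
            public void close() { }
        };
    }
}

public class FrameworkChainDemo {
    public static void main(String[] args) throws IOException {
        FileSystem fs = new MemoryFileSystem();
        fs.init();
        FileSystemConnection conn = fs.connect("alice");
        FileObject file = conn.getFileObject("/data/hello.txt");
        RandomAccessFileObject in = file.open();
        in.seek(6); // skip "hello "
        byte[] buf = new byte[16];
        int n = in.read(buf);
        System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8)); // prints: grid
        in.close();
        conn.close();
        fs.exit();
    }
}
```

The GridFTP layer only ever talks to the four interfaces, so swapping `MemoryFileSystem` for an iRODS or local file system plugin would not change the calling code.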
FileSystem interface
public String getSeparator();
public void init() throws IOException;
public FileSystemConnection createFileSystemConnection(GSSCredential credential)
    throws FtpConfigException, IOException;
public void exit();
FileSystemConnection interface
public FileObject getFileObject(String path);
public String getHomeDir();
public String getUser();
public void close() throws IOException;
public boolean isConnected();
public long getFreeSpace(String path);
FileObject interface
public String getName();
public String getPath();
public boolean exists();
public boolean isFile();
public boolean isDirectory();
public int getPermission();
public String getCanonicalPath() throws IOException;
public FileObject[] listFiles();
public long length();
public long lastModified();
public RandomAccessFileObject getRandomAccessFileObject(String type) throws IOException;
public boolean delete();
public FileObject getParent();
public boolean mkdir();
public boolean renameTo(FileObject file);
public boolean setLastModified(long t);
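For a local file system plugin, most `FileObject` methods map directly onto `java.io.File`. The sketch below is a hypothetical adaptor over a subset of the interface (`getPermission` and `getRandomAccessFileObject` are left out to keep it short); it is not Griffin's own local file system plugin. One design choice worth noting: `listFiles()` returns an empty array rather than `null` when the path is not a readable directory, which spares the GridFTP layer a null check, while `getParent()` still returns `null` at the root.

```java
import java.io.File;
import java.io.IOException;

// Subset of the FileObject interface (getPermission and getRandomAccessFileObject omitted).
interface FileObject {
    String getName();
    String getPath();
    boolean exists();
    boolean isFile();
    boolean isDirectory();
    String getCanonicalPath() throws IOException;
    FileObject[] listFiles();
    long length();
    long lastModified();
    boolean delete();
    FileObject getParent();
    boolean mkdir();
    boolean renameTo(FileObject file);
    boolean setLastModified(long t);
}

/** Hypothetical local file system adaptor: each call delegates to java.io.File. */
public class LocalFileObject implements FileObject {
    private final File file;

    public LocalFileObject(File file) { this.file = file; }

    public String getName() { return file.getName(); }
    public String getPath() { return file.getPath(); }
    public boolean exists() { return file.exists(); }
    public boolean isFile() { return file.isFile(); }
    public boolean isDirectory() { return file.isDirectory(); }
    public String getCanonicalPath() throws IOException { return file.getCanonicalPath(); }
    public long length() { return file.length(); }
    public long lastModified() { return file.lastModified(); }
    public boolean delete() { return file.delete(); }
    public boolean mkdir() { return file.mkdir(); }
    public boolean setLastModified(long t) { return file.setLastModified(t); }

    public FileObject[] listFiles() {
        File[] children = file.listFiles(); // null if not a directory or on I/O error
        if (children == null) return new FileObject[0];
        FileObject[] result = new FileObject[children.length];
        for (int i = 0; i < children.length; i++) {
            result[i] = new LocalFileObject(children[i]);
        }
        return result;
    }

    public FileObject getParent() {
        File parent = file.getParentFile(); // null at the root
        return parent == null ? null : new LocalFileObject(parent);
    }

    public boolean renameTo(FileObject target) {
        // Only meaningful within the same plugin; a cross-plugin move would need a copy.
        return file.renameTo(new File(target.getPath()));
    }
}
```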
RandomAccessFileObject interface
public void seek(long offset) throws IOException;
public int read() throws IOException;
public int read(byte[] b) throws IOException;
public int read(byte[] b, int off, int len) throws IOException;
public void close() throws IOException;
public String readLine() throws IOException;
public void write(int b) throws IOException;
public void write(byte[] b) throws IOException;
public void write(byte[] b, int off, int len) throws IOException;
public long length() throws IOException;
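The method list above mirrors `java.io.RandomAccessFile` almost exactly, so a local adaptor can delegate call for call. In this hypothetical sketch the mode string (`"r"` or `"rw"`) stands in for the `type` argument of `getRandomAccessFileObject`, which is an assumption about what that argument means. The `main` method writes two blocks out of order, the way blocks can arrive over parallel channels, and relies on `seek` to put each one at its offset.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

interface RandomAccessFileObject {
    void seek(long offset) throws IOException;
    int read() throws IOException;
    int read(byte[] b) throws IOException;
    int read(byte[] b, int off, int len) throws IOException;
    void close() throws IOException;
    String readLine() throws IOException;
    void write(int b) throws IOException;
    void write(byte[] b) throws IOException;
    void write(byte[] b, int off, int len) throws IOException;
    long length() throws IOException;
}

/** Hypothetical local adaptor: every method maps 1:1 onto java.io.RandomAccessFile. */
public class LocalRandomAccessFileObject implements RandomAccessFileObject {
    private final RandomAccessFile raf;

    /** mode is "r" or "rw", as for RandomAccessFile (assumed meaning of the `type` argument). */
    public LocalRandomAccessFileObject(File file, String mode) throws IOException {
        this.raf = new RandomAccessFile(file, mode);
    }

    public void seek(long offset) throws IOException { raf.seek(offset); }
    public int read() throws IOException { return raf.read(); }
    public int read(byte[] b) throws IOException { return raf.read(b); }
    public int read(byte[] b, int off, int len) throws IOException { return raf.read(b, off, len); }
    public void close() throws IOException { raf.close(); }
    public String readLine() throws IOException { return raf.readLine(); }
    public void write(int b) throws IOException { raf.write(b); }
    public void write(byte[] b) throws IOException { raf.write(b); }
    public void write(byte[] b, int off, int len) throws IOException { raf.write(b, off, len); }
    public long length() throws IOException { return raf.length(); }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("griffin-demo", ".bin");
        f.deleteOnExit();
        LocalRandomAccessFileObject out = new LocalRandomAccessFileObject(f, "rw");
        out.seek(6);                                     // the second block arrives first
        out.write("world".getBytes(StandardCharsets.US_ASCII));
        out.seek(0);                                     // then the first block
        out.write("hello ".getBytes(StandardCharsets.US_ASCII));
        out.seek(0);
        System.out.println(out.readLine());              // prints: hello world
        out.close();
    }
}
```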
The Implementation - Griffin
Clients: GridFTP client, grid job submission system, data transfer service
Griffin
– GridFTP interface
– Generic file system framework
– Adaptors: adaptor for iRODS, adaptor for local file system, other adaptors
Data sources: iRODS, local file system, other data sources
Features
GridFTP protocol version 1
Java-based
– Spring Framework
– OS-independent
Lightweight, stand-alone, self-contained
– No need to install the Globus Toolkit
Two plugins included
– iRODS plugin
– Local file system plugin
Open source (Apache 2 & GPL)
Parallel transfer with Griffin
Client to Griffin: WAN
Griffin to Data Source: LAN/localhost
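With parallel transfer, several data channels from the client deliver different blocks of the same file at once, and the receiving side has to place each block at its offset. The sketch below simulates that for a local file: one worker per stream writes its blocks with `FileChannel.write(ByteBuffer, long)`, a positional write that does not move a shared file pointer, so workers need not coordinate seeks. The class name and the in-memory `source` array standing in for the network streams are assumptions; a plugin for a non-local data source would need its own thread-safety strategy.

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelReassembly {
    /**
     * Simulates `streams` parallel data channels: stream s carries blocks s, s + streams, ...
     * Each worker writes its blocks at their file offsets using positional writes.
     */
    public static void reassemble(Path target, byte[] source, int blockSize, int streams) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(streams);
        try (FileChannel ch = FileChannel.open(target, StandardOpenOption.CREATE,
                StandardOpenOption.WRITE, StandardOpenOption.TRUNCATE_EXISTING)) {
            List<Future<?>> workers = new ArrayList<>();
            for (int s = 0; s < streams; s++) {
                final int stream = s;
                workers.add(pool.submit(() -> {
                    for (long off = (long) stream * blockSize; off < source.length;
                            off += (long) streams * blockSize) {
                        int len = (int) Math.min(blockSize, source.length - off);
                        ByteBuffer block = ByteBuffer.wrap(source, (int) off, len);
                        long pos = off;
                        while (block.hasRemaining()) {
                            pos += ch.write(block, pos); // a write may be partial
                        }
                    }
                    return null;
                }));
            }
            for (Future<?> w : workers) {
                w.get(); // rethrows any worker failure
            }
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[100_000];
        new Random(42).nextBytes(data);
        Path out = Files.createTempFile("griffin-parallel", ".bin");
        reassemble(out, data, 4096, 4);
        System.out.println(java.util.Arrays.equals(data, Files.readAllBytes(out))); // prints: true
        Files.delete(out);
    }
}
```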
Authentication
GSI
– iRODS plugin
User mapping
– Local file system plugin
– XML file
  • Maps the GSI identity (certificate DN) to the internal user management system
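The XML user-mapping file can be pictured as a list of certificate DN to local account pairs. The element and attribute names below (`<users>`, `<user dn="..." name="..."/>`) are a hypothetical format, not Griffin's actual schema, and the DNs are made up. The sketch parses the file with the JDK's DOM parser, disables DOCTYPE declarations as a precaution against XML external entity attacks, and returns `null` for an unmapped DN so the caller can refuse the login.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class UserMap {
    private final Map<String, String> dnToUser = new HashMap<>();

    /** Parses a mapping file of the assumed form <users><user dn="..." name="..."/></users>. */
    public static UserMap parse(String xml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // The mapping file never needs a DOCTYPE; refusing one blocks external entities.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        NodeList users = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getElementsByTagName("user");
        UserMap map = new UserMap();
        for (int i = 0; i < users.getLength(); i++) {
            Element e = (Element) users.item(i);
            map.dnToUser.put(e.getAttribute("dn").trim(), e.getAttribute("name").trim());
        }
        return map;
    }

    /** Local account for a certificate DN, or null if the DN is not mapped (deny access). */
    public String lookup(String certificateDn) {
        return dnToUser.get(certificateDn.trim());
    }

    public static void main(String[] args) throws Exception {
        String xml = "<users>"
                + "<user dn=\"/C=AU/O=Example Org/CN=Jane Doe\" name=\"jdoe\"/>"
                + "</users>";
        UserMap map = parse(xml);
        System.out.println(map.lookup("/C=AU/O=Example Org/CN=Jane Doe")); // prints: jdoe
        System.out.println(map.lookup("/C=AU/O=Example Org/CN=Mallory"));  // prints: null
    }
}
```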
Use case
Integration of the Grid and Data Fabric
– iRODS plugin for Data Fabric
– Third-party transfer to cluster (Globus GridFTP)
Tested with
– Globus.org
– globus-url-copy (5.0 and 4.x)
– Globus GridFTP GUI
Performance Evaluation
Server: two quad-core Xeon 3.16 GHz CPUs, 16 GB memory
Client: IBM xSeries 346 with two hyper-threaded Intel Xeon 3.20 GHz CPUs, 4 GB memory
Network
– 1 Gbps LAN
– WAN: two 10 Gbps links
Transfer: 256 MB, 512 MB, 1 GB, 2 GB, 4 GB, 8 GB, 16 GB
– iCommands
– globus-url-copy
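When reading the result charts, it helps to know the floor the 1 Gbps LAN puts under each transfer. The sketch below computes the time to move each test size at exactly 1 Gbps, assuming binary units (1 GB = 2^30 bytes) and ignoring TCP/IP and GridFTP overhead, so measured times can only be longer. For example, 16 GB needs at least about 137 s.

```java
public class LineRateBound {
    /** Seconds needed to move `bytes` at `bitsPerSecond`, ignoring all protocol overhead. */
    public static double seconds(long bytes, double bitsPerSecond) {
        return bytes * 8.0 / bitsPerSecond;
    }

    public static void main(String[] args) {
        double lan = 1e9; // 1 Gbps LAN
        String[] labels = {"256MB", "512MB", "1GB", "2GB", "4GB", "8GB", "16GB"};
        for (int i = 0; i < labels.length; i++) {
            long bytes = (256L << 20) << i; // 256 MiB, doubled at each step
            System.out.printf("%-6s >= %7.2f s%n", labels[i], seconds(bytes, lan));
        }
    }
}
```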
Evaluation Setup - Griffin vs iCommands
Client (globus-url-copy) → Griffin (Jargon adaptor) → iRODS → local file system
Client (iCommands) → iRODS → local file system
Evaluation Result Chart - Griffin vs iCommands
Evaluation Setup - Griffin vs Globus GridFTP
Client (globus-url-copy) → Griffin (local FS adaptor) → local file system
Client (globus-url-copy) → Globus GridFTP server → local file system
Evaluation Result Chart - Griffin vs Globus GridFTP
Related work
Client library
– SAGA/jSAGA
– Apache Commons VFS
Data transfer service
– Stork
– PAFTP
Globus
– XIO
– DSI
Griffin vs. Globus GridFTP
| Griffin | Globus GridFTP |
|---|---|
| Java | C |
| OS-independent | *nix |
| Simple, standalone | Complex |
Conclusion
A generic solution to connect arbitrary data sources to the grid
– Data in/out of the grid
– Data transfer between different data sources
Java-based implementation
– Standalone, lightweight
– Pluggable
– Does not depend on Globus
Future work
Currently working on a plugin for MongoDB
Java NIO
UDP
Striped transfer
MongoDB plugin
MongoDB
– NoSQL database
– Stores JSON-style documents
– GridFS component
  • Stores files
Plugin for Griffin
– Read/write files via GridFS
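GridFS stores each file as a sequence of fixed-size chunk documents, each carrying its sequence number `n`, so a GridFS-backed `RandomAccessFileObject` would have to turn a `seek` offset into a chunk number and a position inside that chunk. The hypothetical helper below shows only that arithmetic; it does not call the MongoDB driver, and the 255 KiB chunk size in `main` is an assumption (the default has changed between MongoDB versions and can be set per file).

```java
public class GridFsChunkMath {
    /** Sequence number n of the chunk that holds byte `offset`. */
    public static long chunkIndex(long offset, int chunkSize) {
        return offset / chunkSize;
    }

    /** Inclusive range [first, last] of chunk numbers touched by reading `len` bytes at `offset`. */
    public static long[] chunkRange(long offset, long len, int chunkSize) {
        if (len <= 0) {
            throw new IllegalArgumentException("len must be positive");
        }
        return new long[] {chunkIndex(offset, chunkSize), chunkIndex(offset + len - 1, chunkSize)};
    }

    /** Position of byte `offset` inside its chunk. */
    public static int offsetInChunk(long offset, int chunkSize) {
        return (int) (offset % chunkSize);
    }

    public static void main(String[] args) {
        int chunkSize = 255 * 1024; // assumed chunk size in bytes
        long offset = 1_000_000;
        long len = 300_000;
        long[] r = chunkRange(offset, len, chunkSize);
        System.out.println("fetch chunks " + r[0] + ".." + r[1]
                + ", starting at byte " + offsetInChunk(offset, chunkSize) + " of the first");
    }
}
```

A plugin would then fetch only chunks `first` through `last` instead of streaming the whole file from the beginning.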
Acknowledgements
ARCS funded
Current Status
ARCS production service
Used to transfer data in/out of the ARCS Data Fabric
Website
– https://projects.arcs.org.au/trac/griffin
Thank you!
Questions/Comments?