Building an Extensible Storage Ecosystem with WOS
DESCRIPTION
In this presentation from the DDN User Meeting at SC13, Erik Deumens from SSERCA describes how the institution is sharing data with WOS from DDN. Watch the video presentation: http://insidehpc.com/2013/11/13/ddn-user-meeting-coming-sc13-nov-18/

TRANSCRIPT
Building an Extensible Storage Ecosystem with WOS
Dr. Erik Deumens, SSERCA
SC'13 DDN User Meeting
SSERCA
• Sunshine State Education & Research Computing Alliance
  o Members: FIU, FSU, UCF, UF, UM, USF
  o Affiliates: FAMU, FAU, FIT, UNF
  o Glue: Florida LambdaRail regional network provider
• Enable and enhance
  o collaborative research
  o for faculty and their teams in the state
• Making them more competitive
  o by providing advanced cyberinfrastructure
Proposal Vision and Overview
The researchers and their collaborations are the central focus driving all design aspects of the proposed extensible storage environment.
Intellectual Merit
• Address the needs of working researchers head-on
• Not centered on a particular hardware or software design
• Naturally extensible
• Intrinsically sustainable
• Inclusive of new approaches
Broader Impacts
• Open to all communities
• Provide a framework to explore and broaden a data-centric research environment
• Provide a long-term roadmap to address archival storage and transitioning data to it
• Link campus and NSF XSEDE (eXtreme Science and Engineering Discovery Environment) resources in a flexible way
Project Vision
• What challenges are addressed?
• What will the proposed project build with NSF funding?
• How are XSEDE resources leveraged?
• Features of the architecture
  o Sustainable
  o Extensible
  o Flexible and adaptable
• What can others build leveraging this NSF-funded project?
Challenges for Storage Providers
• Multiple sources, multiple sizes of data
  o Instrument data
  o Spreadsheets
• Multiple places to store data
  o Campus systems
  o Cloud systems (Google Drive, Dropbox, etc.)
• Multiple actions and timescales in the data life cycle
  o Analysis: compute- and data-intensive
  o Distribution: web site accessibility
    § general and restricted
  o Life cycle management: initial, maturing, archiving
Principles
Create:
• An effective environment for researchers
  o to work collaboratively
  o with complex workflows
  o involving large and small data
We propose to bring the essence and simplicity of cloud infrastructure to research:
Interactivity and instant gratification. Think of something, and start doing it!
Proposal: XDESE
The eXtreme Digital Extensible Storage Ecosystem (XDESE)
• An ecosystem is more complex than an environment
• NSF-funded and -supported core
  o Distributed by design
  o Multi-access, multi-protocol, multi-owner
  o Leverages XSEDE resources
  o XRAC allocation process adapted for data
    § defined quota for a defined time span
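The allocation model above, a defined quota for a defined time span, can be sketched as a simple check. This is an illustrative sketch only; `DataAllocation` and its fields are invented names, not an actual XSEDE or XRAC API:

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class DataAllocation:
    """Hypothetical XRAC-style data allocation: a quota for a time span."""
    quota_bytes: int
    start: date
    end: date
    used_bytes: int = 0

    def can_store(self, size: int, today: date) -> bool:
        """A write is allowed only inside the time span and under quota."""
        in_window = self.start <= today <= self.end
        under_quota = self.used_bytes + size <= self.quota_bytes
        return in_window and under_quota

# Example: a 10 TB allocation for calendar year 2014.
alloc = DataAllocation(quota_bytes=10 * 10**12,
                       start=date(2014, 1, 1), end=date(2014, 12, 31))
print(alloc.can_store(2 * 10**12, date(2014, 6, 1)))   # True: in window, under quota
print(alloc.can_store(2 * 10**12, date(2015, 2, 1)))   # False: span expired
```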
Storage Architecture
[Diagram: a researcher reaches the XDESE sites (e.g., XDESE-FIU and XDESE-UCF) over the Internet through Data Gateways; XSEDE XRAC provides authentication and authorization; data replication runs between the sites, which connect to XSEDE resources such as Stampede and Kraken.]
Proposal: XDESE (2)
• Extensible with other funding
  o Geographically: campus and regional add-ons
    § plug-and-play racks
  o Organizationally: multiple communities
    § astrophysics, religion, archeology, ...
  o Functionally: add new protocols and formats
  o Public data: NSF funded
  o Restricted data: funded from other sources
  o Archival data and data repositories
XDESE Extension Architecture
• Basic concept
  o A WOScore storage system at a remote location
  o WOScore provides
    § data replication and motion
    § policy- and demand-based
  o Add a WOSaccess gateway to provide local
    § CIFS (personal) and NFS (organizational)
  o Add a WOS GS bridge gateway to provide local
    § GPFS on GridScaler or Lustre on ExaScaler
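Policy- and demand-based replication, as described above, can be sketched as a small placement rule. The site names and the specific rules here are invented for illustration; WOScore's actual policy engine is not specified in the slides:

```python
# Illustrative sketch: replicate to every zone the policy requires, plus any
# zone that has recently read the object (demand), never back to the source.
def replica_targets(obj_zone: str, policy_zones: list[str],
                    demand_zones: set[str]) -> list[str]:
    """Return the list of zones that should receive a replica."""
    targets = []
    for zone in policy_zones:               # policy-based placement
        if zone != obj_zone and zone not in targets:
            targets.append(zone)
    for zone in sorted(demand_zones):       # demand-based placement
        if zone != obj_zone and zone not in targets:
            targets.append(zone)
    return targets

# Object lives at UF; policy requires copies at FSU and USF;
# a gateway at UCF has been reading it heavily.
print(replica_targets("UF", ["FSU", "USF"], {"UCF", "UF"}))
# → ['FSU', 'USF', 'UCF']
```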
Extension Architecture
SSERCA XDESE Internet Campus
WOS GS Bridge HPC
WOS Access
campus net NFS/CIFS
XSEDE
WOS GS Bridge
Stampede
Leverage XSEDE Resources
• Users store and maintain data in XDESE
  o Long-term project data
  o In support of collaboration
    § meaning easy access for many people
    § fine control over who can see and do what
  o Not intended for temporary data
  o XSEDE storage resources are suitable for that
Leverage XSEDE Resources (2)
• Transfer data to XSEDE processors
  o Stampede, Kraken, etc.
  o Bulk transfer
  o Complex data flows including data selection
  o XDESE will respond from multiple sites
    § improved performance, reliability, flexibility
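Responding from multiple sites amounts to picking the best replica for each request. A minimal sketch, assuming the simplest possible policy (lowest measured latency); the site names and numbers are made up:

```python
# Pick the replica site that is currently closest to the requester.
def best_site(latency_ms: dict[str, float]) -> str:
    """Return the site name with the lowest measured latency."""
    return min(latency_ms, key=latency_ms.get)

print(best_site({"FIU": 22.5, "UF": 8.1, "FSU": 14.0}))   # UF
```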
Leverage XSEDE Resources (3)
• Option 1: data transfer to an XSEDE scratch file system
  o During computation on XSEDE systems
  o Optimal performance is obtained
Leverage XSEDE Resources (4)
• Option 2: an XSEDE compute job controls the data
  o The program can control data selection from the XDESE storage
  o and initiate transfer of selected parts to and from it
  o XDESE storage (DDN WOScore) will optimize data location among the distributed XDESE storage nodes
  o One of the extensions can be used for further optimization
Partnerships
• Network partner: FLR
  o Provides transport
  o Performance optimization with SDN and OpenFlow
  o Provides connection to Internet2 and XSEDEnet
• Storage system vendor: DDN
  o Provides hardware, system software, and expertise
  o Builds the extension racks
• Software interfaces
  o Data transfer: Globus Online
ddn.com ©2012 DataDirect Networks. All Rights Reserved.
SSERCA XDESE Storage Solution
[Diagram: the SSERCA Storage Cloud connects Florida State University, University of Florida, Florida International University, University of South Florida, University of Central Florida, and University of Miami, serving SSERCA end-users statewide.]
XDESE Building Block
At each SSERCA site: storage server
• 2.1 PB raw
WOS6000 Cabinet
WOS6000 storage server
• 12 drawers
• 180 TB per drawer (2 nodes)
• 2.1 PB raw capacity
• Policy-based data protection
  o Overhead ranges from 100% to 20%
  o Replication: 100% overhead
  o RAID-like encoding: 20% overhead
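The cabinet numbers above can be checked with a few lines of arithmetic: 12 drawers at 180 TB each give the quoted ~2.1 PB raw, and the protection overhead determines how much of that is usable:

```python
# Checking the WOS6000 figures: raw capacity, then usable capacity under
# the two protection policies (replication vs. RAID-like encoding).
drawers = 12
tb_per_drawer = 180
raw_tb = drawers * tb_per_drawer          # 2160 TB ≈ 2.1 PB raw
print(raw_tb)                             # 2160

def usable_tb(raw: float, overhead: float) -> float:
    """Overhead is extra space spent on protection: usable * (1 + overhead) = raw."""
    return raw / (1 + overhead)

print(round(usable_tb(raw_tb, 1.0)))      # replication, 100% overhead → 1080 TB
print(round(usable_tb(raw_tb, 0.2)))      # encoding, 20% overhead → 1800 TB
```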
Resource Details
• Primary data interface to the web
  o WOScloud (Dropbox-like; REST over SSL, OAuth)
  o WOSshare (Amazon S3-like; S3 = Simple Storage Service; REST interface, BitTorrent)
• Generic server for Globus Online transfers
  o DDN customization needed for optimal speed
  o Initially a simple NFS client via WOSaccess
• Interface to SSERCA campus HPCs
  o Grid/ExaScaler to stage to GPFS/Lustre
  o Later, read via NFS
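To make the "S3-like, REST over SSL" interface concrete, here is a sketch of building (not sending) an object-store PUT request. The host name, bucket layout, and path scheme are hypothetical; the slides do not specify the actual WOSshare endpoint:

```python
# Hypothetical S3-style REST PUT against a WOSshare-like endpoint.
from urllib.request import Request

def make_put_request(host: str, bucket: str, key: str, data: bytes) -> Request:
    """Build (but do not send) an object-store PUT request over SSL."""
    url = f"https://{host}/{bucket}/{key}"
    req = Request(url, data=data, method="PUT")
    req.add_header("Content-Length", str(len(data)))
    return req

req = make_put_request("wosshare.example.sserca.org", "alice-project",
                       "run42/spectrum.csv", b"wavelength,intensity\n")
print(req.get_method())   # PUT
print(req.full_url)
```

A real deployment would add OAuth or S3-style signature headers before sending, which is the part that requires the authentication interface discussed later in the talk.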
Hardware Architecture
• At the 6 SSERCA sites
  o Object storage at all 6 sites
  o Web server with data control panel
• Data transfer mechanisms over FLR
  o XSEDEnet and Internet2
• Extension racks at other locations
  o Object storage
  o OpenFlow-capable network infrastructure
  o Provide multiple data path options to local campus resources, like NFS and CIFS access
  o Optional: compute resources with scratch storage
XDESE: Extending and Complementing XSEDE Storage
XDESE offers
• Easy user interface
• Composability of data flows and workflows
• Multiple authentication domains
• Ability to easily share data
• Easy ingestion of instrument data
User focused!
XDESE Storage and XSEDE Compute
• Full integration with XSEDE compute resources
• Easy data transfer is part of data and workflow
• The extensibility includes the option to
  o install a WOS GS bridge gateway
  o at XSEDE compute site(s)
  o for improved performance
  o it works like a hierarchical file system
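The hierarchical-file-system behavior mentioned above can be sketched as a simple staging rule: recently used data lives on the fast parallel file system at the compute site, cold data stays in XDESE object storage. The threshold and tier names are illustrative assumptions, not part of the WOS GS bridge design:

```python
# Minimal HSM-style tiering rule: hot data on compute-site scratch,
# cold data in the distributed object store.
def tier_for(days_since_access: int, hot_days: int = 30) -> str:
    """Return the storage tier a file should occupy."""
    return "gpfs-scratch" if days_since_access <= hot_days else "xdese-object"

print(tier_for(3))     # gpfs-scratch
print(tier_for(400))   # xdese-object
```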
Authentication Interface
• To be successful, compatibility with multiple campus systems is also required
  o Need to design a simple system
  o Must allow users to manage multiple identities easily
    § XSEDE, XDESE, local campus, Google Drive, Dropbox, Amazon S3, etc.
    § Globus Online supports transfer across authentication domains
    § other tools like BitTorrent play a role too
Performance and Innovation for Science & Engineering Applications
• Performance, scalability, extensibility, sustainability
• Describe the general use case
  o Example from the humanities and social sciences
• Select some strong science and engineering application(s)
• Innovation: explore archival strategies
Sustainable and Extensible
• Distributed from inception
  o Basic functionalities will be tested and supported
• Extensible simply by adding an XDESE rack
  o Like the NSF-funded GENI project and GENI racks
  o Multiple vendors can supply the racks
  o Learn once from XDESE, apply everywhere
  o A path for even the smallest institutions
    § leverage NSF-funded resources and get started quickly
    § a single faculty member can start working with XDESE
Use Case: Generic Researcher
Alice works on a project that involves
• data from an instrument and
• more data generated by analysis and modeling
Use Case: Setup
• Alice gets an XDESE allocation
• She arranges data to flow to the storage from the instrument
  o If the data flow demands it, she can set up a staging rack (needs funds) with specs and support from XDESE
Use Case: Data and Workflow
• With the XDESE data & work control station
  o Looks like Galaxy: https://main.g2.bx.psu.edu/
• She controls data and workflow
  o Orchestrates data movement
  o Gets all data in the right place
  o The right place is where the software and compute capability are: XSEDE resources or on campus
• Tools execute the movement
  o Globus Online, etc.
Use Case: Results
• The results can be viewed with tools from the location specified in the flow
• Collaborators can get accounts and access to her allocation
• Multiple ways to access the data are available
• Further visualization and other processing can easily be orchestrated
Use Case: Lifecycle Management
• She can prepare the data for long-term sharing
• Tools for creating metadata are provided
  o Rules for lifecycle management can be set up, e.g. an iRODS interface
  o Data can be annotated and recorded, e.g. in the Dataverse Network
• Transition data to compatible systems
  o Campus libraries
  o Discipline-specific societies
Innovation: Archival Strategies
• Proposed architecture
  o XDESE provides an efficient path for exploring options
  o Institutions and libraries can buy an XDESE rack
    § dedicated to archival storage
    § data transfer in and out is supported
    § establish criteria for users to deposit data
      • e.g. pass a data quality test of sufficient metadata
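The deposit criterion above ("pass a data quality test of sufficient metadata") could be as simple as checking for required fields. A minimal sketch; the field list is an assumption for illustration, not an XDESE specification:

```python
# Accept a deposit only if every required metadata field is present
# and non-empty. The required set is a hypothetical example.
REQUIRED = {"title", "creator", "date", "description", "license"}

def sufficient_metadata(record: dict) -> bool:
    """Return True if the record passes the metadata quality test."""
    return all(str(record.get(field, "")).strip() for field in REQUIRED)

ok = {"title": "Spectra 2013", "creator": "Alice", "date": "2013-11-18",
      "description": "Raw instrument spectra", "license": "CC-BY"}
print(sufficient_metadata(ok))                      # True
print(sufficient_metadata({"title": "Spectra"}))    # False
```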
Thank You