Post on 14-Dec-2015
RDA
Data Support Section
Topics
1. What is it?
2. Who cares?
3. Why does the RDA need CISL?
4. What is on the horizon?
1. What is it?
Research Data Archive (RDA)
• 600+ datasets that are significant to many NCAR and university scientists
• Archive work began over 40 years ago
• Branded as RDA in 2003
• Generally focused on atmospheric and oceanic environmental measurements, or analyzed products derived from them
• Critical data for weather and climate studies
2. Who cares?
• Growth in user access via the web, 2001-08
  • Promoted with more online data and better interfaces
• Consistent user access from the MSS
  • Represents provision to NCAR computers
• 26-year record for filling one-off data requests
  • Decreasing as web access increases in recent years
Over 6000 Unique Users in 2008
Rely heavily on CISL infrastructure and experts:
• Secure and reliable MSS/HPSS storage
• Disk to support web services
• Networks to bring data in and distribute out to users
• Computing platforms to prepare and serve the RDA
• DSS is geoscience educated; need technical advice/support
Current metrics
Storage:
• Primary: 400+ TB, 4+M files
• All: 800+ TB (backup/working/etc.)
• Disk: 40 TB on SAN
Servers and laptops:
• Servers (8): mix of SunOS and Linux
• About 12 laptops/desktops
Data movement and growth
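As a rough reading of the storage figures above, dividing the primary archive volume by its file count gives an average file size. This is a minimal sketch, assuming decimal units (1 TB = 10^12 bytes) and treating the "400+ TB" and "4+M files" lower bounds as exact values, which they are not:

```python
# Back-of-the-envelope average file size for the primary RDA archive.
# Assumptions: decimal units, and the slide's lower bounds taken as exact.
TB = 10**12  # bytes per terabyte (decimal)
MB = 10**6   # bytes per megabyte (decimal)

primary_bytes = 400 * TB
primary_files = 4_000_000

avg_file_mb = primary_bytes / primary_files / MB
print(f"average file size: about {avg_file_mb:.0f} MB")
```

Under these assumptions the primary archive averages on the order of 100 MB per file.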
3. Why does the RDA need CISL?
Complete User Community: Advantages
• Fast access to online data (a limited part of the RDA)
• Access to all RDA content metadata
• Access to RDA data processing services
Complete User Community: Disadvantages
• Slow access to MSS data (delayed mode)
• Have to create a separate RDA account and log in
• Data processing requests take a long time to finish
• Slow download speeds for some users
HPC User Community: Advantages
• Access to full RDA
• Fast computing
• No login required
HPC User Community: Disadvantages
• No access to online data
• Use MSS as a file server
• No direct access to RDA metadata
• No direct access to RDA data processing services
• Require separate account to access RDA web server
HPC User Community: Improvements
• Fast access to full RDA
• Access to all RDA content metadata
• Access to RDA data processing services
• Single CISL account
• Single “first point of contact”
Complete User Community: Improvements
• Fast access to full RDA
• Expanded data processing services available
• Single CISL account, no separate RDA account
• Faster download speeds with grid-based tools, e.g. GridFTP
• Single “first point of contact” for user support
Resolved all the disadvantages
New challenges:
• GPFS and HPSS don’t have generic file use logging
  • Need for metrics & services
• HPSS doesn’t have sophisticated file access control
  • Some RDA assets have limited access policies
• Abandon a functional RDA registration system: retool a 20K+ user DB
• Build command line tools to integrate RDA services into HPC environment
• Of course, there will be more!
Big transition while maintaining normal RDA content growth and services
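The file use logging challenge above can be illustrated with a small sketch. It assumes a hypothetical wrapper that writes one record per file delivery and then derives the kind of metrics the archive reports (unique users, bytes served per dataset). The record format, function names, user names, and dataset identifiers are invented for illustration; none of them come from GPFS, HPSS, or any RDA tool.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timezone


def log_access(log, user, dataset, path, nbytes):
    """Append one access record (hypothetical CSV format) to an open log."""
    writer = csv.writer(log)
    stamp = datetime.now(timezone.utc).isoformat()
    writer.writerow([stamp, user, dataset, path, nbytes])


def summarize(log):
    """Return (unique user count, bytes served per dataset) from the log."""
    users = set()
    bytes_by_dataset = defaultdict(int)
    for _stamp, user, dataset, _path, nbytes in csv.reader(log):
        users.add(user)
        bytes_by_dataset[dataset] += int(nbytes)
    return len(users), dict(bytes_by_dataset)


# Usage with an in-memory log; a real service would append to a file.
log = io.StringIO()
log_access(log, "alice", "dsA", "/archive/dsA/file1", 1_000)
log_access(log, "bob", "dsA", "/archive/dsA/file2", 2_500)
log_access(log, "alice", "dsB", "/archive/dsB/file1", 400)

log.seek(0)  # rewind before reading the records back
unique_users, served = summarize(log)
print(unique_users, served)
```

Keeping the log as flat records is one possible design choice: the same file can feed both usage metrics and later auditing of assets with limited access policies.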
4. What is on the horizon?
• Transition off all SunOS to Linux
• Move SAN storage to GPFS GLADE
• Put more data online in GLADE (on the order of 130 TB)
  • Fast access path internal and external
• Transition ALL RDA from MSS to HPSS
• Implement more on-demand products
  • Data extraction and computing across TB datasets
• Must be successful in GLADE, with HPSS, and using a scalable DA compute environment
Questions
1. What is it?
2. Who cares?
3. Why does the RDA need CISL?
4. What is on the horizon?