
RDA Data Support Section

Upload: lee-hamilton

Post on 14-Dec-2015


Page 1: RDA Data Support Section. Topics 1.What is it? 2.Who cares? 3.Why does the RDA need CISL? 4.What is on the horizon?

RDA

Data Support Section

Page 2

Topics

1. What is it?

2. Who cares?

3. Why does the RDA need CISL?

4. What is on the horizon?

Page 3

1. What is it?

Research Data Archive (RDA)

600+ datasets that are significant to many NCAR and university scientists

Archive work began over 40 years ago

Branded as RDA in 2003

Generally focused on atmospheric and oceanic environmental measurements, or analyzed products derived from them

Critical data for weather and climate studies

Page 4

Who cares?

Growth in user access via the web, 2001-2008
• Promoted by more online data and better interfaces

Consistent user access from the MSS (Mass Storage System)
• Represents data provision to NCAR computers

26-year record of filling one-off data requests
• Decreasing in recent years as web access grows

Over 6,000 unique users in 2008

Page 5

Why does the RDA need CISL?

The RDA relies heavily on CISL infrastructure and experts:
• Secure and reliable MSS/HPSS storage
• Disk to support web services
• Networks to bring data in and distribute it out to users
• Computing platforms to prepare and serve the RDA

The DSS is geoscience-educated; we need technical advice and support.

Current metrics

Storage:
• Primary: 400+ TB, 4+ million files
• All (backup, working, etc.): 800+ TB
• Disk: 40 TB on SAN

Servers and laptops:
• 8 servers, a mix of SunOS and Linux
• About 12 laptops/desktops

Data movement and growth
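As a rough sanity check on the storage metrics, the figures imply an average primary file size of about 100 MB and roughly twice the primary volume stored across all copies. The sketch below does that arithmetic, assuming decimal units (1 TB = 10^12 bytes) and treating the slide's lower bounds ("400+ TB", "4+M files", "800+ TB") as exact values, so the results are approximations only.

```python
# Back-of-envelope arithmetic on the slide's storage metrics.
# The slide gives lower bounds ("400+ TB", "4+M files", "800+ TB"),
# so these results are rough ratios, not exact measurements.
# Assumes decimal units: 1 TB = 10**12 bytes, 1 MB = 10**6 bytes.

TB = 10**12
MB = 10**6

primary_bytes = 400 * TB      # primary archive copy
primary_files = 4_000_000     # number of primary files
all_bytes = 800 * TB          # all copies: backup, working, etc.


def average_file_size_mb(total_bytes: int, n_files: int) -> float:
    """Mean file size in MB, assuming every file is counted in total_bytes."""
    return total_bytes / n_files / MB


def copy_ratio(all_copies: int, primary: int) -> float:
    """How many times the primary volume is stored across all copies."""
    return all_copies / primary


if __name__ == "__main__":
    avg = average_file_size_mb(primary_bytes, primary_files)
    ratio = copy_ratio(all_bytes, primary_bytes)
    print(f"average primary file size ~ {avg:.0f} MB")
    print(f"all copies / primary ~ {ratio:.1f}x")
```

Because the inputs are lower bounds, the true average file size could differ in either direction; the ratio mainly shows that backup and working copies roughly double the footprint.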

Page 6

Complete User Community advantages:

Fast access to online data – limited part of RDA

Access to all RDA content metadata

Access to RDA data processing services

Complete User Community disadvantages:

Slow access to MSS data – delayed mode

Have to create a separate RDA account and log in

Data processing requests take a long time to finish

Slow download speeds for some users

HPC User Community advantages:

Access to full RDA

Fast computing

No login required

HPC User Community disadvantages:

No access to online data

Use MSS as a file server

No direct access to RDA metadata

No direct access to RDA data processing services

Require separate account to access RDA web server

Page 7

HPC User Community improvements:

Fast access to full RDA

Access to all RDA content metadata

Access to RDA data processing services

Single CISL account

Single “first point of contact”

Complete User Community improvements:

Fast access to full RDA

Expanded data processing services available

Single CISL account; no separate RDA account

Faster download speeds with grid-based tools, e.g. GridFTP

Single “first point of contact” for user support

Resolves all the disadvantages

New challenges:

GPFS and HPSS don’t have generic file-use logging
• Need for metrics and services

HPSS doesn’t have sophisticated file access control
• Some RDA assets have limited access policies

Abandon a functional RDA registration system; retool a 20K+ user DB

Build command-line tools to integrate RDA services into the HPC environment

Of course, there will be more!

Big transition while maintaining normal RDA content growth and services
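The first new challenge (no generic file-use logging in GPFS or HPSS, hence a need for metrics) could be approached by having RDA-supplied access tools record each file read themselves. What follows is a minimal, hypothetical sketch of that idea only: the AccessEvent record, the record_access and summarize functions, the dataset ID "ds000.0", and the /glade/hypothetical paths are all illustrative assumptions, not an existing RDA or CISL interface. It keeps events in memory and aggregates unique users, reads, and bytes per dataset.

```python
# Hypothetical sketch: tool-side file-use logging for RDA usage metrics.
# Assumption: since GPFS and HPSS provide no generic file-use log here,
# RDA-supplied access tools record each read themselves. All names
# (AccessEvent, record_access, summarize) are illustrative only.
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class AccessEvent:
    user: str        # CISL account name of the requester
    dataset: str     # RDA dataset identifier (hypothetical format)
    path: str        # file path on GLADE or HPSS (hypothetical)
    nbytes: int      # bytes delivered
    when: datetime   # UTC timestamp of the access


def record_access(log: list, user: str, dataset: str, path: str, nbytes: int) -> None:
    """Append one access event; a real tool would write to durable storage."""
    log.append(AccessEvent(user, dataset, path, nbytes, datetime.now(timezone.utc)))


def summarize(log: list) -> dict:
    """Per-dataset metrics: unique users, number of reads, and total bytes."""
    users = defaultdict(set)
    reads = defaultdict(int)
    volume = defaultdict(int)
    for ev in log:
        users[ev.dataset].add(ev.user)
        reads[ev.dataset] += 1
        volume[ev.dataset] += ev.nbytes
    return {
        ds: {"unique_users": len(users[ds]), "reads": reads[ds], "bytes": volume[ds]}
        for ds in users
    }


if __name__ == "__main__":
    log = []
    record_access(log, "alice", "ds000.0", "/glade/hypothetical/a.grb", 1_000)
    record_access(log, "bob", "ds000.0", "/glade/hypothetical/b.grb", 2_000)
    record_access(log, "alice", "ds000.0", "/glade/hypothetical/a.grb", 1_000)
    print(summarize(log))
```

A design caveat: logging in the tools only captures accesses made through those tools, not direct file-system reads, so a production version would also need durable log storage and some agreement on which access paths count toward the metrics.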

Page 8

What is on the horizon?

Transition all servers off SunOS to Linux

Move SAN storage to GPFS GLADE

Put more data online in GLADE (O 130TB)
• Fast access path, internal and external

Transition ALL RDA data from MSS to HPSS

Implement more on-demand products
• Data extraction and computing across TB-scale datasets

Must be successful in GLADE, with HPSS, and using a scalable DA compute environment

Page 9

Questions

1. What is it?

2. Who cares?

3. Why does the RDA need CISL?

4. What is on the horizon?