eosdis remote data store -...

31
1 Building Cost Effective Remote Data Building Cost Effective Remote Data Storage Capabilities for NASA’s EOSDIS Storage Capabilities for NASA’s EOSDIS Stephen Marley Stephen Marley Mike Moore & Bruce Clark Mike Moore & Bruce Clark 8 8 - - Apr Apr - - 2003 2003 IEEE/NASA MSST2003 IEEE/NASA MSST2003

Upload: others

Post on 02-Jun-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

1

Building Cost Effective Remote Data Building Cost Effective Remote Data Storage Capabilities for NASA’s EOSDISStorage Capabilities for NASA’s EOSDIS

Stephen MarleyStephen MarleyMike Moore & Bruce ClarkMike Moore & Bruce Clark

88--AprApr--20032003IEEE/NASA MSST2003IEEE/NASA MSST2003

2

AgendaAgenda

Overview of the EOS MissionOverview of the EOS MissionRemote Data Store ProgramRemote Data Store ProgramModeling Petabyte Data StoresModeling Petabyte Data Stores

3

EOSDIS MissionEOSDIS Mission

4

Mission CapacitiesMission Capacities

01000020000300004000050000600007000080000

19971998 19992000 20012002 20032004 20052006 20072008

MFL

OPS

& G

ranu

le C

ount

010002000300040005000600070008000900010000

GB

/Day

& T

B

DAAC Processing Capacity (MFLOPS) Cumulative Granules ('000)DAACDistribution (GB/Day) Cumulative Archive (TB)

5

AgendaAgenda

Overview of the EOS MissionOverview of the EOS MissionRemote Data Store ProgramRemote Data Store ProgramModeling Petabyte Data StoresModeling Petabyte Data Stores

6

Remote Data StoreRemote Data Store

Remote Data Store (RDS) GoalRemote Data Store (RDS) Goal“To provide automated, secure, seamless and efficient remote on-line redundant storage, recovery and access of operational EOS mission data products.”

NASA GoalNASA Goal— Establish a partnership with industry to develop and evolve distributed

scientific data services for a petabyte scale data set Relationship to Existing EOS Data ServicesRelationship to Existing EOS Data Services— EOSDIS DAACs remain the primary data production; archive and

distribution service provider for the EOS Missions— RDS provides additional support to the EOSDIS mission

1. RDS will provide an additional capability for the remote storage of part of a national data resource

2. RDS will begin to research emerging data storage & access technologies for large distributed scientific data sets

3. Provide additional data service capacity for data managed by theEOSDIS DAACs

7

RDS ConceptRDS Concept

EMSNetEMSNet

DPL

DPLDPL

DPL

StorageDAAC Traffic

RDS TrafficECS – EOSDIS Core SystemDPL – Data Pool extension to ECSRDS – Remote Data Store

NSIDC

LP GES

LaRC

RDSFairmont, WV

ECSECS

ECSECS Public IP WAN

Public IP WAN

8

RDS Expectations RDS Expectations

AutomatedAutomated –– Operations are a key cost driver for longOperations are a key cost driver for long--term data archives. To the term data archives. To the maximum extent practicable, the RDS needs to leverage storage mamaximum extent practicable, the RDS needs to leverage storage management tools nagement tools to minimize operational maintenance activitiesto minimize operational maintenance activities

Secure Secure –– This data is considered a National Data Asset, and as such needThis data is considered a National Data Asset, and as such needs to be s to be secure from both physical and electronic attack whether intentiosecure from both physical and electronic attack whether intentional or otherwisenal or otherwise

Seamless Seamless –– This means storage location independent accessThis means storage location independent access

EfficientEfficient –– Not all data will need to be accessible with same level of servNot all data will need to be accessible with same level of service. ice. Efficient access is defined as providing the most cost effectiveEfficient access is defined as providing the most cost effective level of service level of service consistent with data useconsistent with data use

Remote Remote –– Implying an IP WAN scale of distributionImplying an IP WAN scale of distribution

OnOn--lineline –– As technology evolves the definition of what onAs technology evolves the definition of what on--line means can get line means can get blurred. For our purposes, we define onblurred. For our purposes, we define on--line to imply access latencies not to line to imply access latencies not to exceed a few seconds, but potentially as fast as millisecondsexceed a few seconds, but potentially as fast as milliseconds

Redundant Storage, Recovery & AccessRedundant Storage, Recovery & Access –– Moving beyond mere backup or Moving beyond mere backup or mirroring. Redundant storage implies an intelligence to the datamirroring. Redundant storage implies an intelligence to the data duplication that duplication that reflects varying levels of data criticality and access loads. Threflects varying levels of data criticality and access loads. This intelligence allows is intelligence allows flexibility in the way the system can respond to individual storflexibility in the way the system can respond to individual storage, recovery and age, recovery and access requestsaccess requests

9

Key Operational Goals for RDSKey Operational Goals for RDS

The RDS will provide two types of service in support of the The RDS will provide two types of service in support of the EOSDIS Mission:EOSDIS Mission:

a. Data Integrity Services– The RDS will coordinate Remote Data Backup and Recovery

Services as well as Redundant Shared Data Pool Data Storage across the EOSDIS Data Center Enterprise.

b. Data Access Services– The RDS will present a consolidated user view of the Data Center

Enterprise holdings that are both unified and seamless:Unified – meaning that only a single reference for each product is

presented to the end-user or application independent of the number of copies of the product that are stored in the enterprise for redundancy and load sharing purposes;

Seamless – meaning that the physical location (both geographic and storage technology) of products is transparent to the end-user or application.

10

RDS VisionRDS Vision

DAAC

ArchivePersistent

Archive

User I/F

Catalogue

Data Grid-like Transport

Catalogue Interoperability

Phase 2 User View

Phase 1 User View

Phase 1 Data Exchange

Phase 2-4 Data Exchange

Phase 3-4 Information

Exchange

RDSTransientArchive

GSFCDPL Interoperable

Data & InformationExchangePhase 3-4

Phase 4 On-line Archive

11

Key Research Goals for RDSKey Research Goals for RDS

Vendor Independence Vendor Independence — Science Data Archives are evolving from short-term “Mission” focused storage to long-

term “Data Asset Management”— Single vendor “fixed” solutions become unviable because technology will evolve

beyond any given storage technology or vendor solution — The RDS needs to be built on open industry standards that will permit the integration of

new technology.— This will permit NASA to use the RDS solution as a research test-bed for new

technologies. Examples for research include:1. Heterogeneous Storage over IP; Object-based Storage Devices (OSD) and Services

Enterprise Information ManagementEnterprise Information Management— Volumes of data available for research are increasing. — Need to reference data in a way meaningful to the end user and not the storage system— The RDS needs to provide an environment which enables the implementation of

Enterprise Information Management services on top of the storage services. Examples for research in this area include:

1. Data Grid; Data Pool

12

Architecture ConsiderationsArchitecture Considerations

When defining the solution architecture, the following key driveWhen defining the solution architecture, the following key drivers need to be rs need to be integrated together. integrated together.

— Persistent, secure physical storage: Both long term and short term storage solutions must provide a high data integrity environment, that is tolerant of hardware and application software failure. In addition, although the data itself is not sensitive, it is part of a national data set and considered a national resource, and so needs to be securely stored and accessed.

— Cost effective persistent storage: The scale of the data being archived is tremendous. The total cost of ownership (TCO), therefore, becomes a critical issue. Not just the costs of physical disk, but also the server architecture and management environment are all important drivers.

— Higher performance transient storage: EOS data access patterns vary considerably as the data ages. Data is accessed by user and applications at a higher frequency within the first 30 days of acquisition. This was the one of the drivers for the generation of Data Pools at the DAACs

— Enterprise Data Management: Although the individual DAACs define the contents of the Data Pools (i.e. what data is promoted from the ECS archive), the management of the data across the enterprise for data and service integrity needs to be managed through software at an enterprise level, and not just through inter-DAAC agreement or management mandate.

— Enterprise Information Management: Sitting above the Enterprise Data Management, this layer provides the domain or science context that justifies having the data available to a broader community. This layer presents the information space that the data represents to the end-user, in terms that the end user recognizes.

13

Candidate ArchitectureCandidate Architecture

RAID Mgmt. RAID Mgmt. RAID Mgmt.

Persistent Storage Mgmt

Enterprise Data Management

Enterprise Information Management

TransientStorage Mgmt

RAID Mgmt.

Enterprise Storage

Management

14

Key CapabilitiesKey Capabilities

Layer Capabilities Enterprise Information Mgmt

Application View; Storage Repository Abstraction Thematic Abstraction; Knowledge Based Access; Data Authorship

Layer Capabilities Enterprise Data Mgmt

Homogeneous Storage View; Policy Mgmt

Enterprise Storage Management

Persistent Storage Systems Transient Storage Systems Layer Capabilities Layer Capabilities

SAN FS Management

Shared Server

Enterprise Device Mgmt

Storage Network Management

Object-based Storage Devices

Large Block “Fixed Content”; Abstracted File System

SAN Fabric Management

SAN Connectivity

Layer Capabilities RAID Management Logical Volume Management

Duplexing; mirroring; flash store; Striping

Layer Capabilities RAID Hardware

Storage Buffering

15

StandardsStandards

During a multiphase development such as RDS it is critical for tDuring a multiphase development such as RDS it is critical for the longhe long--term term success of the program to adhere to appropriate standards througsuccess of the program to adhere to appropriate standards throughout all hout all phases. phases.

TelecommunicationsTelecommunications— By adhering to defined telecommunication and network standards in the early stages

of implementation, RDS will be better positioned to evolve to inter-SAN communications technologies as they mature.

Storage Interconnectivity Storage Interconnectivity — All storage technologies implemented under RDS will be expected to conform to

current IEEE and SNIA standards, with a strategy that takes into account the evolving nature of existing (or draft) iSCSI, FCIP and iFCP storage interconnectivity standards as well as other standard initiatives as they emerge.

Storage Management Storage Management — RDS intends to leverage emergent initiatives like those supported by SNIA in support

of its enterprise storage management goals (e.g. Bluefin)

Applications/GISApplications/GIS— It is anticipated that data services based on FGDC metadata standards and OGC

applications standards will be implemented on the Data Pools within the next 2 years. It is expected that RDS will utilize these services.

16

Technology/Standards MaturityTechnology/Standards Maturity

The level of maturity of distributed, heterogeneous The level of maturity of distributed, heterogeneous storage technology currently falls short of the storage technology currently falls short of the ultimate needs for RDS. ultimate needs for RDS. The need for RDS is immediate, and so the The need for RDS is immediate, and so the deployment of the RDS systems and functionality will deployment of the RDS systems and functionality will use a phased approach that follows the onuse a phased approach that follows the on--going going evolution and maturity of storage technologies that evolution and maturity of storage technologies that support or compliment the RDS mission. support or compliment the RDS mission.

17

Phasing RoadmapPhasing RoadmapLayer Phase 1 Phase 2 Phase 3 Phase 4

Enterprise Information Management

No

Prototype solution

Yes

Preliminary Implementation

Yes

Leverage Integrated Storage & Data Management

Yes

Leverage shared file system & integrated DAAC Archives

Enterprise Data Management

No

Prototype solution

Management by Operator Procedure

Optional

If cost and interoperability technology plan permit

Yes

Integrate across RDS/DAAC Enterprise

Yes

Integrate existing DAAC Archives

Transient Storage

Optional

If cost and interoperability technology plan permit

Yes

Prototype Interoperability

Yes

Implement Interoperability

Prototype shared File System

Yes

Implement Shared File System

Persistent Storage

Yes Initial Archive Capacity

Yes Supplemental Capacity

Yes Supplemental Capacity

Yes Supplemental Capacity

RAID Management

Yes Yes Yes Yes

RAID Hardware

Yes Yes Yes Yes

18

Next StepsNext Steps

RDS project will begin to test some of the RDS project will begin to test some of the underlying assumptions and conclusions of this underlying assumptions and conclusions of this paper. paper. Initial SANInitial SAN--based transient archive solution is in based transient archive solution is in place place GridGrid--enabled Content Addressable Storage (CAS) enabled Content Addressable Storage (CAS) based persistent archive prototype is currently based persistent archive prototype is currently being prepared for deployment. being prepared for deployment. Will provide a significant testWill provide a significant test--bed capability bed capability against which to evaluate emerging storage and against which to evaluate emerging storage and storage management solutions.storage management solutions.

19

AgendaAgenda

Overview of the EOS MissionOverview of the EOS MissionRemote Data Store ProgramRemote Data Store ProgramModeling Petabyte Data StoresModeling Petabyte Data Stores

20

Modeling Petabyte ArchivesModeling Petabyte Archives

The cost of RDS needs to take into account not The cost of RDS needs to take into account not only the cost of the acquisition but also the cost only the cost of the acquisition but also the cost of operations and the cost of maintenance. of operations and the cost of maintenance. Total Cost of Ownership ApproachTotal Cost of Ownership Approach— Technology refresh is a way of life— Take advantage of technology refresh— Identify the real costs of operations into the future

21

Total Cost of OwnershipTotal Cost of Ownership

TCO Elements Include:TCO Elements Include:a. Hardware: includes racks, servers, disks, cabling,

acquisition costs, and maintenance costs.b. Software: includes acquisition costs, maintenance costs,

license management, upgrades, and monitoring.c. Personnel: includes operations staff, maintenance support

staff, vendor staff, and consulting staff.d. Availability: includes component MTBF, system MTBF, and

enterprise MTBFe. Performance: includes time to first byte, total volumes

served, and time to complete.f. Facilities: includes equipment floor space, power, cooling,

and staff floor space.

22

TCO Contribution by PhaseTCO Contribution by Phase

Contributing Element

Phase 1 Phase 2 Phase 3 Phase 4

Hardware High

Initial Acquisitions

Neutral

Improved Cost Performance of Storage Environment

Lower

Improved Cost Performance of Storage Environment

Low

Improved Cost Performance of Storage Environment

Software High

Initial Acquisitions

High

Functionality Evolution

Lower

Functionality Enhancements

Low

Functionality Complete

Personnel High

Establishing Operations

Neutral

Improved cost performance of operations

Neutral

Distributed data management costs incurred

Low

Operations routine

Availability Neutral

Distributed Data Storage. Backup Only

Lower

Site Interoperability Established

Low

Site Interoperability Routine

Low

Site Interoperability Routine

Performance Neutral

Sites Unconnected

Lower

Site Interoperability Established

Low

Network cost performance improvements

Low

Network cost performance improvements

Facility High

Establishing Operations

Neutral

Improved cost performance as build-out costs depreciate & storage footprint improves

Lower

Improved cost performance as build-out costs depreciate, automation increases, & storage footprint improves

Low

Improved cost performance as build-out costs depreciate, automation increases & storage footprint improves

23

Mission Model DescriptionMission Model Description

Mission Modeling Variables Mission Modeling Variables ––factors that affect the factors that affect the volume of data being generated at the Data Centersvolume of data being generated at the Data Centers— Mission Assumptions:

1. Satellites: Terra; Aqua; Aura; ICEsat (significant volume products only)

2. Products: L0 – L1A (archive only); L1B-L4 (Archive & Production)

3. Volumes: Mission daily production baselines4. Duration: Design Life of Mission x 1.55. Reprocessing: 7x production (retained for 6 months)

24

Predicted Data VolumesPredicted Data Volumes

01,000,0002,000,0003,000,0004,000,0005,000,0006,000,0007,000,0008,000,0009,000,000

Jan-

98

Jan-

00

Jan-

02

Jan-

04

Jan-

06

Jan-

08

Jan-

10

GB

GSFC EDC LaRC

NSIDC Grand Total

0500,000

1,000,0001,500,0002,000,0002,500,0003,000,000

Jan-

98

Jan-

00

Jan-

02

Jan-

04

Jan-

06

Jan-

08

Jan-

10

GSFC EDC LaRC

NSIDC Grand Total

Persistent Volume – Growth projected into the future.

Transient Volume – Capacity for Reprocessing

25

Capacity Ramp UpCapacity Ramp Up

0

2,000,000

4,000,000

6,000,000

8,000,000

10,000,000

12,000,000Ja

n-03

Jan-

05

Jan-

07

Jan-

09

GB

Stor

ed a

t RDS

05

10152025303540

Jan-

03

Jan-

04

Jan-

05

Jan-

06

Jan-

07

Jan-

08

Jan-

09

Jan-

10

Data

Tra

nsfe

r (O

C ra

ting)

Communication Capacity – User access modeled at 1x production

RDS Total Storage – Notional plan to achieve 100% mid-2010

26

Staffing ModelStaffing Model

Staffing levels associated with hardware NOT Staffing levels associated with hardware NOT volumesvolumes— Assumes a significant improvement in the enterprise

management of disk systems— Assumes a technology refresh rate that permits

“genocidal sparing”

Activity Count Shift Work

Disk Rack Mgmt. 5 Racks/Staff YesTape Mgmt. 1 Staff YesCommunications Mgmt 1 Staff NoStorage Policy Mgmt. 1 Staff NoSystem/Install Engineer 1 Staff NoCenter Manager 1 Staff NoSystem / Storage Administrators 2 Staff Yes

27

Modeling ScenariosModeling Scenarios

Scenario

Longer-Term

Archive

Short-Term

Archive

Comment

1 Near-Line

On-line (SCSI)

Traditional static data architecture. Adequate support for “re-processing campaign” access, poor support for extensive random archive access applications.

2 Off-Line

On-line (SCSI)

Used when the archive is accessed in large “campaign” sized chunks.

3 Near-line

20% On-line (SCSI) 80% Near-Line

Same concept as Scenario 1, but for data access patterns that drop off even more steeply.

4 On-Line (SCSI)

On-line (SCSI)

The high performance solution for the fastest possible access at all times to all data.

5 On-Line (ATA)

On-Line (SCSI)

Compromise between Scenario 1 & Scenario 4. Provides higher performance than near-line solutions at a better cost performance than all SCSI.

28

Hardware CostsHardware Costs

0

1,000,000

2,000,000

3,000,000

4,000,000

5,000,000

6,000,000

7,000,000

8,000,000

Jan-

03

Jul-0

3

Jan-

04

Jul-0

4

Jan-

05

Jul-0

5

Jan-

06

Jul-0

6

Jan-

07

Jul-0

7

Jan-

08

Jul-0

8

Jan-

09

Jul-0

9

Jan-

10

Jul-1

0

Cost

s ($

/Qua

rter)

Scenario 1 Scenario 2 Scenario 3Scenario 4 Sceanrio 5

29

Total Cost PerformanceTotal Cost Performance

0.01

0.10

1.00

10.00Ja

n-03

Jul-0

3

Jan-

04

Jul-0

4

Jan-

05

Jul-0

5

Jan-

06

Jul-0

6

Jan-

07

Jul-0

7

Jan-

08

Jul-0

8

Jan-

09

Jul-0

9

Jan-

10

Jul-1

0

$/G

B

Scenario 1 Scenario 2 Scenario 3Scenario 4 Sceanrio 5

30

Modeling Preliminary ConclusionsModeling Preliminary Conclusions

The results of taking a Total Cost of Ownership The results of taking a Total Cost of Ownership (TCO) approach to model large(TCO) approach to model large--scale (multiscale (multi--petabyte) scientific data archives has led to the petabyte) scientific data archives has led to the following conclusions:following conclusions:— On-line storage staffing needs is the most critical cost

factor in future data center TCO.— Facilities costs are and will remain an insignificant cost

consideration at 5-10% of the TCO.— ATA vs. SCSI can offer considerable savings in the near-

term for on-line storage, and will be competitive with near-line tape storage in the longer-term.

— In the longer-term, if staffing is manageable, the TCO for data storage will become relatively insensitive to the storage media choices.

31

SummarySummary

The development of longThe development of long--term data storage term data storage solutions for EOS data that supports the changing solutions for EOS data that supports the changing needs of the science communities is an onneeds of the science communities is an on--going going activityactivityConstantly changing technology is a way of life, Constantly changing technology is a way of life, and should not hinder the start of this workand should not hinder the start of this workRDS represents a solution path that provides both RDS represents a solution path that provides both a practical solution path using today’s a practical solution path using today’s technology, and a platform from which to evolve technology, and a platform from which to evolve solutions as technology changes solutions as technology changes