The EOSDIS Data Server

Jeanne Behnke
Earth Science Data Information Systems/NASA GSFC
Code 423, Greenbelt MD 20771
+1-301-614-5326

Presented at the THIC Meeting at the Naval Surface Warfare Center Carderock
Bethesda MD
October 3, 2000
A Big Mission!

• EOS = Earth Observing System
  – Centerpiece of NASA's Earth Science Enterprise
  – Collects Earth remote sensing data for a 15-year global change research program
• EOSDIS = EOS Data and Information System
  – Software architecture is designed to receive, process, archive and distribute several terabytes of science data on a daily basis
  – User community consists of several thousand science and non-science users
  – 7 major facilities across the US: the Distributed Active Archive Centers (DAACs)
EOS Terra in orbit
Terra Image
TERRA Image - MODIS
Cyclone Hudah in the Indian Ocean, taken March 29, 2000
TERRA Image - ASTER
Reno, Nevada, taken April 18, 2000
TERRA Image - MISR
Landsat Browse Image
EOSDIS Concept

EOSDIS has 3 segments:
• Networks
• Flight Ops
• Science Data Processing System

EOSDIS is composed of several geographically distributed elements that will appear as a single, integrated, logical entity.

EOSDIS is working with NOAA and other agencies to ensure long-term availability of Earth science data.
Distributed Active Archive Centers for EOS Data

[Map of the seven DAAC sites across the US: JPL, ASF, EDC, LaRC, GSFC, NSIDC, ORNL]
Predicted Data Volumes for 2000

• Expect launch of the EOS-Aqua and ADEOS satellites this year
• 260 different data products and sets of raw instrument data
• 1.6 TB of processed data stored daily by end of 2000

Data     Archive    # of       Archive    # of Granules   Distribution   Distribution
Center   Volumes    Granules   Volumes    (cumulative     via Network    via Tape
         (GB/day)   per Day    (TB/year)  per year)       (GB/day)       (GB/day)
EDC         522       6,886       190       2,513,390          194            159
GSFC        688       5,545       251       2,023,925          226            226
LaRC        312       2,945       114       1,074,925          102            102
NSIDC        22       1,083         8         395,295            6              6
Total     1,544      16,459       563       6,007,535          528            493
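The yearly columns follow directly from the daily rates (365 days, decimal units). A small illustrative C++ check of the table's arithmetic, not part of ECS:

    #include <cstdio>

    // Illustrative sanity check on the table above (not ECS code): derive
    // the yearly archive volume from the daily rate, assuming 365 days.
    int main() {
        struct Site { const char* name; double gbPerDay; };
        const Site sites[] = {
            {"EDC", 522}, {"GSFC", 688}, {"LaRC", 312}, {"NSIDC", 22}
        };
        double totalTb = 0;
        for (const Site& s : sites) {
            double tbPerYear = s.gbPerDay * 365.0 / 1000.0; // GB/day -> TB/year
            std::printf("%-6s %4.0f GB/day -> %3.0f TB/year\n",
                        s.name, s.gbPerDay, tbPerYear);
            totalTb += tbPerYear;
        }
        // ~564 TB/year, matching the table's 563 TB total within rounding.
        std::printf("Total  %4.0f TB/year\n", totalTb);
        return 0;
    }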
SDPS System Goals

• Flexible, scalable, reliable
• Use Open System standards
• Support standard interfaces to Earth science data to enable coordinated data analysis
• Maximize the use of COTS packages and respond to technological advances and techniques
• Accommodate inevitable change and new additions
• Architecture to support these goals:
  – EOSDIS Core System (ECS)
ECS Context Diagram

[Diagram: the Data Server Subsystem sits at the center of ECS, connected to the Client (internal and external users), Data Management, Interoperability, Management, Planning, Data Processing, and Ingest Subsystems. Data Providers deliver data collections through Ingest; users reach the Data Server through search and access, acquire, and discovery/advertisement interfaces; other flows include user registration, order status, user profiles, data processing requests, plans, and on-demand requests.]
Data Server Subsystem
"Heart of the ECS system"

• Object-oriented C++ on a multiplatform environment of SUNs and SGIs
• Three Software Configuration Items (CIs), sketched below:
  – Science Data Server CI
    • DBMS, geospatial search, inventory
  – Storage Management CI
    • Manages all peripherals, including robotic silos
  – Data Distribution CI
    • Places data in the distribution location
• The Ingest Subsystem CI is also significant to data archiving
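A minimal C++ sketch of how the three CIs divide the work. Class and method names here are illustrative assumptions, not the actual ECS interfaces:

    #include <string>
    #include <vector>

    // Illustrative interfaces only -- the real ECS class names differ.
    // Science Data Server CI: inventory and geospatial search via the DBMS.
    class ScienceDataServer {
    public:
        // Find inventory granules intersecting a lat/lon bounding box.
        virtual std::vector<std::string> searchInventory(
            const std::string& dataType,
            double minLat, double minLon,
            double maxLat, double maxLon) = 0;
        virtual ~ScienceDataServer() {}
    };

    // Storage Management CI: manages peripherals, including robotic silos.
    class StorageManagement {
    public:
        virtual void retrieveFromArchive(const std::string& granuleId,
                                         const std::string& stagingPath) = 0;
        virtual ~StorageManagement() {}
    };

    // Data Distribution CI: places staged data in the distribution location
    // (network push/pull area, or media such as tape and CD-ROM).
    class DataDistribution {
    public:
        virtual void distribute(const std::string& stagingPath,
                                const std::string& destination) = 0;
        virtual ~DataDistribution() {}
    };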
COTS Packages

• System uses ~75 off-the-shelf packages from commercial and government sources
• Principal COTS that impact the design:
  – Sybase/SQS - relational DBMS and spatial query system
  – AMASS - file storage management system for robotic storage devices
  – Autosys - scheduling software for the processing system
  – Tivoli - system management tools
  – HP OpenView - graphical tool for system management
  – RogueWave - libraries used to map components to objects
  – DCE - distributed computing environment
  – ClearCase - CM tool to manage completion of different builds
  – Remedy - trouble-ticketing software used across the project
Data Server Subsystem
Hardware Context Diagram

[Diagram: Data Server Manager (SGIs); Repository Manager (SGIs) with the STK Data Repository, disk storage, and the Inventory Database; Data Distribution Manager (SUNs) with push/pull staging and distribution peripherals (tape drives, CD-ROM). The configuration is similar at all sites.]
Data Repository

Archive Robotic Storage

DAAC    Make/Model                  Qty   Drive Type                    Media Capacity   Total # of Media in Silos
GSFC    StorageTek STK Powderhorn    4    13 D3 drives, 10 9840 drives  50 GB / 40 GB    12,000
NSIDC   StorageTek STK Powderhorn    1    3 D3 drives                   50 GB               600
EDC     StorageTek STK Powderhorn    3    14 D3 drives, 8 9840 drives   50 GB / 40 GB     7,700
LaRC    StorageTek STK Powderhorn    2    8 D3 drives, 8 9840 drives    50 GB / 40 GB     3,100

Distribution Systems

DAAC    Make/Model            Qty   Drive Type   Media Capacity
GSFC    Exabyte tape drives    8    8mm tape     50 GB
        CD-ROM writers         2    CD-ROM       600 MB
NSIDC   Exabyte tape drives    2    8mm tape     50 GB
        CD-ROM writers         2    CD-ROM       600 MB
EDC     Exabyte tape drives    8    8mm tape     50 GB
        CD-ROM writers         2    CD-ROM       600 MB
        D3 tape drive          1    D3 tape      50 GB
LaRC    Exabyte tape drives    2    8mm tape     50 GB
        CD-ROM writers         2    CD-ROM       600 MB
STK Silo
Drive Cabinet
Typical Cartridge Media for Silo
Mass Storage I/O System

• Consists of silos, RAID disk, and server hosts
  – Capable of a sustained throughput of 40 MB/s at all times (about 3.5 TB of data per day)
• Able to push 3 to 4 times as much data because of the double-buffer mechanism in the storage management system design (sketched below)
  – Minimizes stress on the robotics
  – Creates our own persistent cache
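A minimal sketch of the double-buffer idea (threading model and buffer management simplified; this is illustrative, not the actual storage management code):

    #include <condition_variable>
    #include <mutex>
    #include <vector>

    // The tape reader fills one buffer while the distributor drains the
    // other, so neither the silo robotics nor the drives sit idle waiting
    // on the network or staging disk.
    struct DoubleBuffer {
        std::vector<char> buf[2];
        bool full[2] = {false, false};
        std::mutex m;
        std::condition_variable cv;
    };

    // Producer: reads chunks from the tape drive into alternating buffers.
    void tapeReader(DoubleBuffer& db, int chunks) {
        for (int n = 0, i = 0; n < chunks; ++n, i ^= 1) {
            std::unique_lock<std::mutex> lk(db.m);
            db.cv.wait(lk, [&db, i] { return !db.full[i]; }); // free buffer?
            // ... read the next chunk from the silo/tape into db.buf[i] ...
            db.full[i] = true;
            db.cv.notify_all();
        }
    }

    // Consumer: drains alternating buffers to the push/pull staging disk,
    // which then doubles as a persistent cache for repeat requests.
    void distributor(DoubleBuffer& db, int chunks) {
        for (int n = 0, i = 0; n < chunks; ++n, i ^= 1) {
            std::unique_lock<std::mutex> lk(db.m);
            db.cv.wait(lk, [&db, i] { return db.full[i]; });  // full buffer?
            // ... write db.buf[i] out to staging disk or the network ...
            db.full[i] = false;
            db.cv.notify_all();
        }
    }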
Mass Storage I/O System

• Utilize volume groups (collections of tapes managed together); see the sketch after this list
  – Group tapes by science data type (for example, all Landsat data in a silo is kept together on a specified collection of tapes)
  – Enables load balancing
  – Assures minimum performance levels
  – Allows logical management of the archive
• Additional information in two poster papers at this conference:
  – Fault Tolerant Design
  – Scalable Architecture for Maximizing Concurrency
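A sketch of the volume-group bookkeeping, assuming a simple map from science data type to a named group of tapes (names assumed; not the actual AMASS/ECS code):

    #include <map>
    #include <string>
    #include <vector>

    // Each tape belongs to one group and each group holds one science data
    // type, so a data type can be located, load-balanced, and managed as a
    // logical unit within the silo.
    class VolumeGroups {
        std::map<std::string, std::vector<std::string>> groups; // type -> tape IDs
    public:
        void addTape(const std::string& dataType, const std::string& tapeId) {
            groups[dataType].push_back(tapeId);
        }
        // All tapes holding a given science data type, e.g. "Landsat7".
        const std::vector<std::string>& tapesFor(const std::string& dataType) {
            return groups[dataType];
        }
    };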
Archive Operations

• Strive for an automated archive system
  – Continuous connection to the archive systems by operations personnel
  – At least two operations personnel at each site
    • Principal activities include error notification, backup, monitoring, and problem resolution
  – Strive for lights-out administration
• Support several modes at each site for system upgrade
  – One operational mode and two test modes (see the sketch below)
• Scheduled maintenance includes hardware monitoring, media monitoring, format and cleanup
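One way to picture the mode separation (a sketch under assumed names, not actual ECS code): each mode gets disjoint resources, so an upgrade can be exercised in a test mode without touching the operational system.

    #include <string>

    // One operational mode plus two test modes per site (names assumed).
    enum class Mode { Operational, Test1, Test2 };

    // Resources (staging areas, databases, queues) are kept disjoint per mode.
    std::string stagingAreaFor(Mode m) {
        switch (m) {
            case Mode::Operational: return "/staging/ops";  // live system
            case Mode::Test1:       return "/staging/ts1";  // upgrade checkout
            case Mode::Test2:       return "/staging/ts2";  // upgrade checkout
        }
        return "";
    }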
Conclusion
So How Are We Doing?

Archive Size to Date

DAAC    Type                             Archive Size
EDC     Landsat 7 / ASTER / MODIS Land   20 TB (274 GB/day)
NSIDC   MODIS Land (snow products)        9 TB (25 GB/day)
GSFC    MODIS L1 / Atmos / Ocean         50 TB (300 GB/day)
LaRC    MISR                             18 TB (88 GB/day)