nist big data | nist
TRANSCRIPT
NASAFrom Data Curation to Data A l ti A Bi D t
Science Mission
Directorate
NASAAnalytics: A Big Data Experiment in NASA Earth Science
Tsengdar J. Lee, Ph.D.Weather Data Analysis Program ManagerHigh-End Computing Manager
Turning Observations into Knowledge Products
2
Data Acquisition to Data Access
Data Acquisition
Tracking &
Distribution, Access, Interoperability &
Reuse
Flight Operations, Data Capture, Initial
Processing & Backup Archive
Data Transport to DAACs
Science Data Processing, Data
Mgmt., Data Archive & Distribution
SpacecraftData Relay
Satellite (TDRS) Research
Education
Value-AddedProviders
Interagency
NASA Integrated Services
Data Processing & Mission
Control
EOSDIS ScienceData Systems
(DAACs)
GroundStations
g yData Centers
InternationalPartners
Network (NISN)
Mission Services
Control WWWIP
InternetROSES NRA
Polar Gro nd Stations
Use in EarthSystem Models
BenchmarkingDSS
ScienceTeams Measurement
T
3
Polar Ground Stations
TECHNOLOGY
Teams
Where Can We Research/Play with Big Data Problems?
• Ian Foster talked about data processing workflow all the way to data distribution but why stop there?
• Michael Stonebraker talked about moving compute to d t b t th i t d h ll (I ’tdata but there is a tremendous challenge. (I can’t even do it at one NASA center).
Funding ($M)
TOPS1999
NEX Precursors and Development HistoryTOPS
IDU (AERO)(Pl i d S h d li SOA)
1999
2000-02
0.3
1.5(Planning and Scheduling, SOA)
REASoN(Automated data acquisition, first external user = NPS, NEED: Web Interface, Data summarization) 2003-07
1.5
4.5
Inv
R & A(NEED: Flexible analysis
capabilities)
Applied Sci(NOAA, EPA, 20+ partners)
ACCESS(OGC, Web-based maps)2008-09 7.1
vestment
AIST(Anomaly Detection and Analysis Framework)
ARRA/vTOPS/Water (Social Network, Big Data, Technology
prototyping)2009-11 15.5
ts
NASA Earth ExchangeNEX
(Community, Supercomputing,Collaboration)
NASA Earth ExchangeNEX
(Community, Supercomputing,Collaboration)
AIST (1)(Workflow Capabilities)
ACCESS (1)
R & A (11)CMS, NCA, LCLUC,CC,TE, IDS
A li d S i (9)
2012-152.6 (FY12)
))ACCESS (1)(Data and tool management)
Applied Sci. (9)(ECOFOR, WATER)HECC (2)
(NEX Infrastructure)
Access to community/knowledge (240 members) Access to ready-to-use data (24 products, 350TB)
NEX: the Big Data experiment and work completed to date
Access to models/analysis toolsClimate, weather, carbon, hydrology
Access to workflows to build upon
(VisTrails)
Ready for sophisticated users, needs proper integration for the rest…..
NEX SCIENCE USERSEARTH SCIENCECOMMUNITY USERS NEX HPC USERS• Access to Portal
• Access to compute resources• Access to NAS
supercomputing resources
Portal Sandbox NAS HPC
supercomputing resources
• Workflow Management
Point of entry to NEX collaborative environment
• Project Information• Collaboration and Social
Networking
Virtualized NEXcompute environment
Environment forComputing at scale
• Execution of Jobs migrated from SandboxSt f lt i NEX D t
Domain Platform
Infrastructure Platform
g• Provenance• Rich semantic search• Data/Model/Tool access (API)
Networking • Document Publication• Resource Requests• Data Discovery
• Storage of results in NEX Data Management environment
Infrastructure Platform• Virtualization Support• Model and Analytic Tool
Execution
Runs on NAS web servers
Runs on NAS supercomputers and storage
C t
Data Management
• Data acquisition and pre-processing
• Data storageComponent Architecture
• Data storage
Runs on dedicated NEX servers and storage
NEX Portal
Search capabilities
Who is doing what whereScience network through
abstracts and papersWho's the expertWorkflowsArchived seminars
Reporting capabilities
Annual reportsHi hli htHighlightsPublicationsSpatial distribution of funding
Virtual Institute
• Following the lead from Astrobiology and NASA Lunar Science Institutes to create a virtual institute, and ffoffer• Summer short courses• Seminars• Conferences• Presentations
Have each funded project• Have each funded project do one or two seminars that can be archived
Benefits
Science 2007 GRL 2010
Would have taken less than a week in NEX Instead of oneweek in NEX, Instead of one year it took to repeat the analysis
Efficient use of resourcesLowering barriers to entryInterdisciplinary workTransparency, repeatability, extensibilityextensibilityCost reduction
TravelHardwareIT personnelData acquisitionNetwork costs
30 years of change analysis (CC)Global Land Survey from Landsat (LCLUC)
Benefits (Unprecedented Opportunities)
Global monthly Landsat (WELD, MEASURES)Biophysical products from Landsat (CMS)
AGB, t/ha
Crop water management with LandsatWorldView-2, 50cm, 8 bands (NGA), , ( )
11
Big Data Metric?
Space-time-resolution metric(higher spatial resolution, larger extent,longer time-series studies)
NEX Experiments Demonstrate Value of Collaborative Environment
• In a first application of NEX, a research team from around the U.S. used the
i t t dj i denvironment to adjoin and atmospherically correct a mosaic of 9,000 LandsatThematic Mapper scenes and retrieve global gvegetation density at a 30-meter resolution.
• The entire processing of the nearly 340 billion pixels in the composite took just athe composite took just a few hours on the Pleiades supercomputer, allowing the team to experiment with new algorithms and
• NEX’s collaboration and knowledge-sharing platform for the Earth science community combines supercomputing, Earth system modeling, workflow management and NASA remote sensing data feeds toproducts within just a few
days.
POC: Petr Votava petr votava@nasa gov (650) 604-4675;
management, and NASA remote sensing data feeds to deliver a complete work environment for users to explore/analyze large datasets, run modeling codes, collaborate, and share results.
POC: Petr Votava, [email protected], (650) 604-4675; Ramakrishna Nemani, [email protected], (650) 604-6185, NASA Ames Research Center
High End Computing Capability Project
Future Work
• Data ManagementHow to load and offload data in a multioHow to load and offload data in a multi-tier storage environment?
oDistributed Multi-site Analytics?• Workflow Reuse
oWhy create your own if a similaranalytics was done before?analytics was done before?
oUse workflow to capture the data provenance?
• Workflow DiscoveryoWhy not mine the literature to uncover
the workflow?the workflow?