http://esd.lbl.gov/BWC/ Deb Agarwal (UCB and LBNL) Catharine van Ingen (MSFT) Berkeley Water Center Microsoft TCI IndoFlux Meeting, Chennai, India, July 13, 2006 Designing CyberInfrastructure to Support End Science

http://esd.lbl.gov/BWC/

Deb Agarwal (UCB and LBNL)

Catharine van Ingen (MSFT)

Berkeley Water Center

Microsoft TCI

IndoFlux Meeting, Chennai, India, July 13, 2006

Designing CyberInfrastructure to Support End Science

Project Motivation

Data is now being gathered into common data archives

Data archives provide an opportunity for cross-discipline and cross-site investigations

Data analysis techniques which worked well on small data sets often do not scale

Current CS tools have evolved in support of other disciplines – Investigate their ability to facilitate data analysis

Building BWC Water Cyberinfrastructure to Connect Data, Resources, and People

Distributed Data Sets

Science Portal

Data Harvesting and Transformations

Data Cleaning, Models, Analysis Tools

Computational Resources

Web Service Interface to Data and Tools

Data Providers: Host Ameriflux, Climate Data, Statsgo Soils Data, MODIS products

Web-based Workbench access

Tools: Statistical, Graphical

LAI, Temp, Fpar, Veg Index, Surf Refl, NPP, Albedo

Choose Ameriflux Area/Transect, Time Range, Data Type

Gap Fill, A technique

Gap Fill, B technique

Design Workflow

Statistical &graphical analysis

Canoak Model Site 9

Data harvest Sites 1-16

Canoak Model Site 1

Version control

Network display LAI

Statistical & Graphical analysis

Data Cleaning Tools

Data Mining and Analysis Tools

Modeling Tools

Visualization Tools

Ecology Toolbox

Compute Resources

Carbon Community Workbench

Climate, Statsgo, MODIS

Import other Datasets

Knowledge Generation Tools

Approach

Work closely with the end scientists to define, prototype, and test the system

Provide a solution that leverages both server-based and local desktop/laptop environments

Leverage commercial tools to the extent possible

Some Critical Capabilities

Support for versioning of data sets

Work with multiple data sets

Advanced data selection and plotting capabilities

Select data relative to an event

Simple calculation across any specified date range

Statistical information available

Plots - scatter, diurnal, time series, probability density function, tiled, correlation

Ability to access capabilities from desktop
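Two of the capabilities listed above, selecting data relative to an event and a simple calculation across a date range, can be sketched in a few lines of Python. This is only an illustration under assumptions: the daily records, the event date, and the function names (`select_range`, `select_around_event`, `range_mean`) are hypothetical and are not part of the BWC system.

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical daily observations: (day, air temperature in degrees C).
observations = [(date(2004, 6, 1) + timedelta(days=i), 15.0 + i) for i in range(10)]

def select_range(records, start, end):
    """Return records whose date falls in the inclusive range [start, end]."""
    return [(d, v) for d, v in records if start <= d <= end]

def select_around_event(records, event_day, days_before, days_after):
    """Return records inside a window positioned relative to an event date."""
    return select_range(
        records,
        event_day - timedelta(days=days_before),
        event_day + timedelta(days=days_after),
    )

def range_mean(records, start, end):
    """Simple calculation (mean) across a specified date range."""
    values = [v for _, v in select_range(records, start, end)]
    return mean(values) if values else None

event = date(2004, 6, 5)  # hypothetical event, e.g. a storm
window = select_around_event(observations, event, days_before=1, days_after=2)
window_mean = range_mean(observations, date(2004, 6, 4), date(2004, 6, 7))
```

Expressing the event window in terms of the plain date-range selection keeps both capabilities on one code path.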

Data Pipeline

ORNL Ameriflux Site

CSV Files

BWC SQL Server Database

Data Cube

Excel Pivot Table and Chart
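The pipeline stages above (CSV files, a relational database, an aggregated view) can be imitated end to end with the Python standard library. This is a rough sketch, not the BWC implementation: `sqlite3` stands in for SQL Server, a `GROUP BY` query stands in for the cube and pivot table, and the column names and numbers are made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical CSV export; real Ameriflux column names and values will differ.
CSV_TEXT = """site,year,month,precip_mm
Tonzi,2004,1,80.0
Tonzi,2004,7,0.0
Metolius,2004,1,60.0
Metolius,2004,7,20.0
"""

def load_csv(conn, text):
    """Parse CSV rows and insert them into a relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS obs "
        "(site TEXT, year INTEGER, month INTEGER, precip_mm REAL)"
    )
    rows = [
        (r["site"], int(r["year"]), int(r["month"]), float(r["precip_mm"]))
        for r in csv.DictReader(io.StringIO(text))
    ]
    conn.executemany("INSERT INTO obs VALUES (?, ?, ?, ?)", rows)
    return len(rows)

def totals_by_site(conn):
    """Aggregate along the site dimension, roughly as a pivot table would."""
    cur = conn.execute(
        "SELECT site, SUM(precip_mm) FROM obs GROUP BY site ORDER BY site"
    )
    return dict(cur.fetchall())

conn = sqlite3.connect(":memory:")  # in-memory stand-in for the database
loaded = load_csv(conn, CSV_TEXT)
site_totals = totals_by_site(conn)
```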

Data Cleaning and Versioning

BWC SQL Server Database

Excel spreadsheet of current data

Investigator updated spreadsheet
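One way to picture the cleaning and versioning loop above is a store that keeps every version, so an investigator's corrected spreadsheet becomes a new version rather than overwriting the old values. The class below is a hypothetical sketch in Python; the `VersionedDataset` name, the date keys, and the `-9999.0` missing-value marker are assumptions for illustration, not details taken from the BWC database.

```python
from dataclasses import dataclass, field

@dataclass
class VersionedDataset:
    """Keeps every version of a data set; edits never overwrite old values."""
    versions: list = field(default_factory=list)

    def commit(self, values, note):
        """Store a full copy of the data as a new version; return its number."""
        self.versions.append({"values": dict(values), "note": note})
        return len(self.versions)

    def get(self, version=None):
        """Return a copy of the requested version (latest by default)."""
        index = len(self.versions) if version is None else version
        return dict(self.versions[index - 1]["values"])

    def changed_keys(self, old, new):
        """List keys whose values differ between two versions."""
        a, b = self.get(old), self.get(new)
        return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))

ds = VersionedDataset()
v1 = ds.commit({"2004-06-01": 12.5, "2004-06-02": -9999.0}, "initial load")
cleaned = ds.get(v1)
cleaned["2004-06-02"] = 13.1  # investigator replaces an assumed missing-value marker
v2 = ds.commit(cleaned, "investigator update")
```

Storing full copies is wasteful for large data sets; it is used here only because it makes the version history easy to follow.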

Analysis Services Data Cube

An organized view of the data

A multi-dimensional view into the data

Can integrate multiple data sources

Define measures and dimensions

Measure – a value you want to be able to plot

Dimension – an axis you want to be able to use to select data and as a plot axis

Calculations – define new measures
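The measure, dimension, and calculation vocabulary above can be illustrated without an OLAP server. The Python sketch below is a toy, not Analysis Services: the fact rows, the `roll_up` and `add_calculation` helpers, and the `temp_f` calculated measure are all hypothetical.

```python
from collections import defaultdict

# Hypothetical fact rows: dimensions (site, month) plus two measures.
FACTS = [
    {"site": "A", "month": 1, "precip_mm": 10.0, "temp_c": 5.0},
    {"site": "A", "month": 2, "precip_mm": 20.0, "temp_c": 7.0},
    {"site": "B", "month": 1, "precip_mm": 30.0, "temp_c": 1.0},
    {"site": "B", "month": 2, "precip_mm": 40.0, "temp_c": 3.0},
]

def roll_up(facts, dimensions, measure, agg=sum):
    """Aggregate one measure over the chosen dimensions."""
    groups = defaultdict(list)
    for row in facts:
        key = tuple(row[d] for d in dimensions)
        groups[key].append(row[measure])
    return {key: agg(values) for key, values in groups.items()}

def add_calculation(facts, name, func):
    """Define a new measure computed from existing fields of each row."""
    return [{**row, name: func(row)} for row in facts]

by_site = roll_up(FACTS, ["site"], "precip_mm")
by_month = roll_up(FACTS, ["month"], "precip_mm")
with_f = add_calculation(FACTS, "temp_f", lambda r: r["temp_c"] * 9 / 5 + 32)
```

The same measure (`precip_mm`) is sliced along two different dimensions simply by changing the `dimensions` argument.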

Precipitation trends and totals

Summer precipitation:

Tonzi and Vaira ~ 2% of total

Metolius ~ 24% of total

Walker Branch ~ 40% of total

[Figure: Precipitation Trends for 2004. Precipitation (mm), 0 to 300, by month (1 to 12), with series for Tonzi, Vaira, Metolius, and Walker.]

*Plot created by Gretchen Miller of UC Berkeley
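The summer percentages on this slide are a ratio of summer precipitation to the annual total. A minimal Python sketch of that arithmetic follows; it assumes summer means June through August, which the slide does not state, and it uses made-up monthly values rather than the data behind the plot.

```python
def summer_fraction(monthly_mm, summer_months=(6, 7, 8)):
    """Share of annual precipitation that falls in the summer months."""
    total = sum(monthly_mm.values())
    if total == 0:
        return 0.0
    summer = sum(monthly_mm.get(m, 0.0) for m in summer_months)
    return summer / total

# Made-up monthly totals in mm (NOT the values behind the slide).
example = {m: 50.0 for m in range(1, 13)}
example.update({6: 5.0, 7: 0.0, 8: 5.0})
share = summer_fraction(example)
```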

Other applications

[Figure: Temperature at North American Sites. Average temperature in °C, -10 to 30, versus latitude, 20 to 80.]

*Plot created by Gretchen Miller of UC Berkeley

Observations by latitude

[Figure: Temperature at North American Sites. Average temperature in °C, -30 to 30, by month (Jan to Dec), with legend entries 31.5, 40.0, 49.9, and 70.5.]

*Plot created by Gretchen Miller of UC Berkeley

Observations by ecosystem type

[Figure: Average NEE. NEE (mmol m-2 s-1), -6 to 2, by month (Jan to Dec), with series for deciduous broadleaf forest, evergreen needleleaf forest, and mixed forest.]

*Plot created by Gretchen Miller of UC Berkeley

Some Lessons Learned so Far

Data naming and unit consistency is critical to easy ingest of large amounts of data

Commercial tools do not necessarily provide all the right analysis capabilities directly

Scaling capabilities of the tools not yet clear

We will need tools to aid in notification of PIs
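The first lesson, that naming and unit consistency matters for ingest, could be addressed with a normalization step applied to every incoming column. The Python sketch below is one hypothetical approach: the alias table, the canonical names, and the supported units are invented for illustration and do not describe the BWC ingest code.

```python
# Hypothetical alias and unit tables; real site files will need their own.
NAME_ALIASES = {"ta": "air_temp", "tair": "air_temp", "precip": "precip", "ppt": "precip"}
TO_CANONICAL = {
    ("air_temp", "degC"): lambda v: v,
    ("air_temp", "K"): lambda v: v - 273.15,
    ("precip", "mm"): lambda v: v,
    ("precip", "cm"): lambda v: v * 10.0,
}

def normalize(name, unit, value):
    """Map a site-specific column to a canonical name and canonical unit."""
    canonical = NAME_ALIASES.get(name.strip().lower())
    if canonical is None:
        raise KeyError(f"unknown variable name: {name!r}")
    convert = TO_CANONICAL.get((canonical, unit))
    if convert is None:
        raise ValueError(f"no conversion for {canonical} in {unit!r}")
    return canonical, convert(value)
```

Failing loudly on an unknown name or unit, instead of guessing, makes inconsistencies visible at ingest time rather than during analysis.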

Portal Deployment

Behind the portal is a collection of databases and data cubes

Distribution for ease of use: users only see the data of interest; private data remains stable

Distribution for scaling: smaller queries on smaller databases take fewer resources; larger databases and cubes can be replicated across machines

Batch-job-like infrastructure for managing very long running queries
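The deployment ideas above, sending small queries to small databases and queueing long-running ones, can be sketched as follows. Everything here is a hypothetical Python illustration: the catalog contents, the database names `subset_db` and `full_db`, and the `route` and `BatchQueue` helpers are assumptions, not the portal's actual design.

```python
from collections import deque

# Hypothetical catalog: database name -> set of sites it holds.
CATALOG = {
    "subset_db": {"Tonzi", "Vaira"},
    "full_db": {"Tonzi", "Vaira", "Metolius", "Walker Branch"},
}

def route(sites, catalog=CATALOG):
    """Pick the smallest database that contains every requested site."""
    wanted = set(sites)
    candidates = [name for name, held in catalog.items() if wanted <= held]
    if not candidates:
        raise LookupError("no database covers the requested sites")
    return min(candidates, key=lambda name: len(catalog[name]))

class BatchQueue:
    """First-in, first-out queue for queries expected to run a long time."""

    def __init__(self):
        self._jobs = deque()

    def submit(self, query):
        """Queue a query and return the current queue length."""
        self._jobs.append(query)
        return len(self._jobs)

    def run_next(self, execute):
        """Run the oldest queued query with the supplied callable."""
        return execute(self._jobs.popleft())
```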

Acknowledgements

Science Team: Dennis Baldocchi, Bev Law, Gretchen Miller

Cyberinfrastructure: Matt Rodriguez, Monte Goode

Microsoft: Tony Hey, Nolan Li

Oak Ridge National Lab: CDIAC personnel

Berkeley Water Center: Yoram Rubin, Susan Hubbard

URLs and Connection Coordinates

Web Site: http://esd.lbl.gov/BWC

Blog: http://dsd.lbl.gov/BWC/amfluxblog

[email protected]

http://esd.lbl.gov/BWC/