Metadata and Associated Tools for the 5th Coupled Model Intercomparison Project Sarah Callaghan: British Atmospheric Data Center Sylvia Murphy: NOAA/CIRES CISL Seminar, August 29 th , National Center for Atmospheric Research, Boulder CO


Page 1: Sarah Callaghan: British Atmospheric Data Center Sylvia Murphy: NOAA/CIRES CISL Seminar, August 29 th, National Center for Atmospheric Research, Boulder

Metadata and Associated Tools for the 5th Coupled Model Intercomparison Project

Sarah Callaghan: British Atmospheric Data Center

Sylvia Murphy: NOAA/CIRES

CISL Seminar, August 29th, National Center for Atmospheric Research, Boulder CO

Page 2:

CMIP5 Metadata and the Metafor project

Sarah Callaghan (Metafor project manager)

[email protected]

With many thanks to, in particular but not limited to: V. Balaji, Philip Bentley, Cecelia DeLuca, Sebastien Denvil, Gerry Devine, Mark Elkington, Rupert W. Ford, Eric Guilyardi, Michael Lautenschlager, Bryan Lawrence, Mark Morgan, Marie-Pierre Moine, Sylvia Murphy, Charlotte Pascoe, Hans Ramthun, Paul Slavin, Lois Steenman-Clark, Frank Toussaint, Allyn Treshansky, and Sophie Valcke

and many other colleagues from the Global Organisation for Earth System Science Portals, Earth System Curator and the Earth System Grid Federation

Page 3:

CMIP5: Fifth Coupled Model Intercomparison Project

Global community activity under the auspices of the World Meteorological Organisation (WMO) via the World Climate Research Programme (WCRP)

Aim:

– to address outstanding scientific questions that arose as part of the IPCC AR4 process,

– to improve understanding of climate, and

– to provide estimates of future climate change that will be useful to those considering its possible consequences.

Method: a standard set of model simulations in order to:

– evaluate how realistic the models are in simulating the recent past,

– provide projections of future climate change on two time scales, near term (out to about 2035) and long term (out to 2100 and beyond), and

– understand some of the factors responsible for differences in model projections, including quantifying some key feedbacks such as those involving clouds and the carbon cycle

Page 4:

Climate model and experiment documentation

What is it?

• List of climate model properties

• Whys and wherefores of simulations

• Conformance to experimental protocol

• Standard to describe and compare within a Model Intercomparison Project

aka “metadata”: data describing data

What for?

• Archive, locate, assess, make sense of climate model data

Page 5:

CMIP5 numbers!

Simulations:

~90,000 years
~60 experiments within CMIP5
~20 modelling centres (from around the world) using
~several model configurations each
~2 million output “atomic” datasets
~10s of petabytes of output
~2 petabytes of CMIP5 requested output
~1 petabyte of CMIP5 “replicated” output

Which will be replicated at a number of sites (including ours), arriving now!

Of the replicants:

~220 TB decadal
~540 TB long term
~220 TB atmos-only

~80 TB of 3-hourly data
~215 TB of ocean 3D monthly data!
~250 TB for the cloud feedbacks!
~10 TB of land-biochemistry (from the long term experiments alone).

(May 2011: All these data output volumes are probably a factor of 2 too low!!!)
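As a quick sanity check on the replicated volumes, the three experiment categories can be summed. This is only a sketch using the approximate figures quoted on this slide; the variable names are illustrative.

```python
# Approximate replicated volumes quoted on the slide, in terabytes.
replicated_tb = {
    "decadal": 220,
    "long term": 540,
    "atmos-only": 220,
}

total_tb = sum(replicated_tb.values())
total_pb = total_tb / 1000  # decimal petabytes (1 PB = 1000 TB)

print(f"replicated total: ~{total_tb} TB (~{total_pb:.2f} PB)")

# The May 2011 note suggests the volumes may be a factor of 2 too low.
corrected_tb = 2 * total_tb
print(f"with factor-of-2 correction: ~{corrected_tb} TB")
```

The sum of roughly 980 TB is consistent with the ~1 petabyte of replicated output quoted above.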

Page 6:

Why the focus on metadata in CMIP5?

From “Data Storage and Distribution: Lessons from the CMIP3” Karl Taylor, 2009 http://wcrp.ipsl.jussieu.fr/Workshops/Downscaling/Documents/Presentations/Taylor_CMIP3_lessons1.pdf

• How can the process be improved?

– ingest model documentation and expt. details into a searchable database

• Summary of lessons learned in previous MIPs

– Require some model documentation prior to accepting model output for distribution.

Page 7:

The CMIP5 questionnaire

A year into the project, METAFOR became “a major international focal point for earth system modelling metadata definition” (Karl Taylor, PCMDI)

Metafor was tasked by WGCM/CMIP to define, collect and provide the CMIP5 model metadata

This is when life really started to get interesting!

Metafor's original objective was:

“... to develop a Common Information Model (CIM) to describe climate data and the models that produce it in a standard way, and to ensure the wide adoption of the CIM”

Page 8:

What is the CIM?

• The CIM (Common Information Model) is a domain model of the concepts and relationships used in climate modeling

– It includes descriptions not only of climate data, but also of the models that generated and/or used that data, the simulations that those models implemented, the experiments for which those simulations were run, the people/institutions that were involved and why they bothered

– It tries to describe the full provenance of climate modeling artifacts

• It's a metadata model that can be paired with climate modeling artifacts

• It's an emerging standard

• It's the core of a related set of tools and services

• It's the structure around which the CMIP5 metadata is based

Page 9:

What does the CIM look like?

[Diagram of the CIM packages: Software, Activity, Data, Grids, Quality, Shared ISO]

Page 10:

What is the CIM?

[Diagram labels: Experiment, Simulation; Input: Coupling, Output: Data; Model, Model; Requirement (1..*); Conformance (0..*); SoftwareComponent (Name, Properties, Description, Coupling Framework); 0..1 Parent; 0..* Child; What, Why, How]

http://metaforclimate.eu/trac/browser/CIM/tags/version-1.5
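One possible reading of the diagram labels can be sketched with plain Python dataclasses. This is an illustration only: the class and field names below are assumptions made for the sketch, not the CIM schema, and the multiplicities are taken from the labels (1..* requirements, 0..* conformances, 0..1 parent, 0..* children).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SoftwareComponent:
    """A model or sub-model; names here are illustrative, not the CIM schema."""
    name: str
    description: str = ""
    properties: Dict[str, str] = field(default_factory=dict)
    # 0..1 parent; excluded from repr/compare to avoid parent/child cycles.
    parent: Optional["SoftwareComponent"] = field(default=None, repr=False, compare=False)
    # 0..* children
    children: List["SoftwareComponent"] = field(default_factory=list)

    def add_child(self, child: "SoftwareComponent") -> "SoftwareComponent":
        child.parent = self
        self.children.append(child)
        return child


@dataclass
class Experiment:
    name: str
    requirements: List[str]  # 1..* requirements


@dataclass
class Simulation:
    experiment: Experiment
    model: SoftwareComponent
    # requirement name -> how the simulation conformed to it (0..* conformances)
    conformances: Dict[str, str] = field(default_factory=dict)

    def unaddressed_requirements(self) -> List[str]:
        """Requirements of the experiment with no recorded conformance."""
        return [r for r in self.experiment.requirements if r not in self.conformances]


# Hypothetical example data.
model = SoftwareComponent("ExampleESM", description="hypothetical coupled model")
atmos = model.add_child(SoftwareComponent("Atmosphere", properties={"grid": "example"}))
exp = Experiment("example-historical", requirements=["forcing", "duration"])
sim = Simulation(exp, model, conformances={"forcing": "applied as specified"})
print(sim.unaddressed_requirements())  # ['duration']
```

In this reading the experiment carries the requirements, the simulation records how it conformed to them, and the model is a tree of software components.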

Page 11:

Controlled vocabularies and how they were created

Page 12:

Controlled vocabularies

http://metaforclimate.eu/trac/browser/controlled_vocabularies/trunk/Software/Atmosphere_bdl.mm

The CIM provides the structure for the questionnaire, while the controlled vocabularies provide the content.

The controlled vocabularies can be customised for other domains, allowing the CMIP5 questionnaire to be reused for those domains.
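The split between structure and content can be sketched as follows: a generic function turns whatever vocabulary it is given into questionnaire entries, so swapping the vocabulary changes the questions without changing the code. The vocabulary terms below are invented placeholders, not the actual Metafor controlled vocabularies, and the function names are hypothetical.

```python
# Hypothetical controlled vocabulary: component -> property -> allowed values.
atmosphere_cv = {
    "Atmosphere": {
        "ConvectionScheme": ["mass-flux", "adjustment", "other"],
        "TimeSteppingMethod": ["leapfrog", "semi-implicit", "other"],
    }
}


def build_questions(cv):
    """Turn a vocabulary into a flat list of questionnaire entries."""
    questions = []
    for component, properties in cv.items():
        for prop, allowed in properties.items():
            questions.append(
                {"component": component, "property": prop, "choices": list(allowed)}
            )
    return questions


def validate_answer(question, answer):
    """An answer is valid only if it is one of the vocabulary's terms."""
    return answer in question["choices"]


questions = build_questions(atmosphere_cv)
for q in questions:
    print(f"{q['component']} / {q['property']}: choose one of {q['choices']}")
```

Passing a vocabulary for a different domain to `build_questions` would yield a different questionnaire from the same structure.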

Page 13:

CMIP5 questionnaire

Page 14:

Completing the questionnaire

CMIP5 Questionnaire: Summary Page

[Screenshot of an institute's Summary (or hub) page, with sections for Files/References/Responsible parties, Grids, Models, Simulations and Platform]

From the summary page you can add and edit new platform, model and grid descriptions and also navigate to the different elements of the questionnaire.

[Tabs shown: Simulations, Inputs, Platform, Model, Grids]

Grey tab text indicates that this element still needs to be created on the summary page. Only then can the tab be used for navigation.

[Diagram: Suggested order for filling in the questionnaire (label: Simulation)]

Page 15:


CMIP5 Users

24 modelling groups, 25 platforms being described, 44 models, 65 grids, and 223 simulations

CAWCR - Centre for Australian Weather and Climate Research
CCCMA - Canadian Centre for Climate Modelling and Analysis
CCSM - Community Climate System Model
CMA-BCC - Beijing Climate Center, China Meteorological Administration
CMCC - Centro Euro-Mediterraneo per I Cambiamenti Climatici
CNRM-CERFACS - Centre National de Recherches Meteorologiques - Centre Europeen de Recherche et Formation Avancees en Calcul Scientifique
EC-Earth - Europe
FIO - The First Institute of Oceanography, SOA, China
GCESS - College of Global Change and Earth System Science, Beijing Normal University
GFDL - Geophysical Fluid Dynamics Laboratory
INM - Russian Institute for Numerical Mathematics
IPSL - Institut Pierre Simon Laplace
LASG - Institute of Atmospheric Physics, Chinese Academy of Sciences, China
MIROC - University of Tokyo, National Institute for Environmental Studies, and Japan Agency for Marine-Earth Science and Technology
MOHC - UK Met Office Hadley Centre
MPI-M - Max Planck Institute for Meteorology
MRI - Japanese Meteorological Institute
NASA GISS - NASA Goddard Institute for Space Studies, USA
NCAR - US National Centre for Atmospheric Research
NCAS - UK National Centre for Atmospheric Science
NCC - Norwegian Climate Centre
NIMR - Korean National Institute for Meteorological Research
QCCCE-CSIRO - Queensland Climate Change Centre of Excellence and Commonwealth Scientific and Industrial Research Organisation
RSMAS - University of Miami - RSMAS

Page 16:

Getting help with the questionnaire

• Contact the questionnaire help team

– [email protected]

– We want to improve the questionnaire so please tell us how you are getting on and what you would like to change.

• Book an online training session for your team

• More help documentation is available in the questionnaire – soon this will include help videos

Page 17:

What other things do Metafor and the CIM do?

• CIM Viewer – given an ID, display a document

• CIM Query tool – given a query, return a result set

• CIM Differencing tool

• CIM Document tracking

• CIM Document validator

Aim: compose these services into portals which navigate through these options in “user-community-friendly” ways.

The aim is to have services and tools that can be integrated into institutional portals as well as accessed through the Metafor portal.
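The two simplest services, "given an ID, display a document" and "given a query, return a result set", can be sketched with an in-memory store. This is not the Metafor implementation or its web API; the class, method names and sample documents are all hypothetical.

```python
class DocumentStore:
    """Toy in-memory stand-in for a CIM document repository (hypothetical)."""

    def __init__(self):
        self._docs = {}

    def add(self, doc_id, doc):
        self._docs[doc_id] = doc

    def view(self, doc_id):
        """Viewer: given an ID, return the document."""
        return self._docs[doc_id]

    def query(self, **criteria):
        """Query tool: return the IDs of documents matching every criterion."""
        return [
            doc_id
            for doc_id, doc in self._docs.items()
            if all(doc.get(key) == value for key, value in criteria.items())
        ]


store = DocumentStore()
store.add("doc-1", {"type": "model", "institute": "EXAMPLE-A"})
store.add("doc-2", {"type": "simulation", "institute": "EXAMPLE-A"})
store.add("doc-3", {"type": "model", "institute": "EXAMPLE-B"})

print(store.view("doc-2"))
print(store.query(type="model"))
```

A portal would then be a composition of such calls, for example a query followed by a view of each hit.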

Page 18:

The Metafor portal

http://www.purl.org/org/esmetadata/cim/portal

Page 19:

Page 20:

Query Tool

Page 21:

Differencing Tool

• CIM Differencing differs from other types of feature comparison tools because...

– There will be several variants of comparisons depending on the type of CIM instances being compared and the type of information being requested.

– The set of features being compared is potentially an order of magnitude larger than those typically found in online catalogs (hundreds vs. tens).

– CIM instances have a very rich structure to draw on. Sometimes this helps; other times it is a hindrance.

• So...

– Only small focused comparisons across the same document type will be supported

– And users should be able to constrain how the results are presented to them in real-time
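The two constraints above (same document type only, and a small user-chosen set of features) can be sketched as a simple comparison function. This is a minimal illustration over flat dictionaries, not the actual CIM differencing tool; the function name, document layout and sample values are assumptions.

```python
def diff_documents(doc_a, doc_b, properties=None):
    """Compare two flat property dicts of the same document type.

    `properties` optionally restricts the comparison to a small, focused
    set of features; by default every property except "type" is compared.
    """
    if doc_a["type"] != doc_b["type"]:
        raise ValueError("only documents of the same type can be compared")
    if properties is None:
        properties = sorted((set(doc_a) | set(doc_b)) - {"type"})
    differences = {}
    for key in properties:
        a, b = doc_a.get(key), doc_b.get(key)
        if a != b:
            differences[key] = (a, b)
    return differences


# Hypothetical model descriptions.
model_a = {"type": "model", "name": "ExampleESM-1", "levels": 38, "calendar": "360_day"}
model_b = {"type": "model", "name": "ExampleESM-2", "levels": 60, "calendar": "360_day"}

print(diff_documents(model_a, model_b, properties=["levels", "calendar"]))
```

Restricting `properties` is one simple way to let a user constrain what is shown when the full feature set runs to hundreds of entries.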

Page 22:

What can be done in the future

Questionnaire pages are totally driven from the mindmaps:

– Update mindmaps, change content.

– New mindmaps, new questions!

Seeing other applications (beyond CMIP5) with:

– Statistical Downscaling (beyond Metafor)

– Ensembles (collecting EU Ensembles metadata)

– Extending for Impact Assessment Models (UK MIRP project)

– Other extensions: possible new US activity on dynamical cores

Software Improvements:

– Deployment within national infrastructure

– Metafor portal, based on “cleaner” ingestion of CIM documents, and a RESTful web service layer providing search, validation etc.

– Integration with ESGF

Page 23:

Life after Metafor

• Metafor formally finishes in September 2011

• The CIM tools and services will be handed over to IS-ENES to develop and maintain

– We want to develop a community and ecosystem for the tools to flourish

• Community governance for CIM and Controlled Vocabularies:

– “Standards Committee” under WCRP/CMIP

• CIM use beyond CMIP5:

– statistical downscaling CV

– CMIP5 metrics

– library to ingrain CIM generation within GCMs

Page 24:

Watch the Metafor Cartoon

http://www.youtube.com/watch?v=76MCRXK4Itc

Page 25:

Earth System Curator and Model Metadata Discovery and Display for CMIP5

Sylvia Murphy and Cecelia DeLuca (NOAA/CIRES)

CISL Seminar, Boulder, CO

August 29, 2011

Page 26:

Outline

• Background (CIM within ESG):

– What is Curator?

– What is the Earth System Grid (ESG)?

– Metadata and the Curator Project

– Trackback display features

• Background (CIM within ESMF):

– The Earth System Modeling Framework (ESMF)

– How ESMF is implementing the CIM

• Live Demonstration of CMIP5 model metadata in ESG

Page 27:

What is Curator?

• The Curator project collaboratively develops software infrastructure to support end-to-end modeling in the Earth sciences.

– Funded initially by NSF in 2005

– Now supported by NASA, NOAA GIP, and NSF CDI and TeraGrid funds

• Curator collaborates with many groups across the U.S. and internationally: ESG, the NOAA Geophysical Fluid Dynamics Laboratory (GFDL), the DOE Program for Climate Model Diagnosis and Intercomparison (PCMDI), METAFOR, and many others.

• Project Objectives:

– Span the gaps between modeling and data services.

– Use metadata to document models.

– Automate routine processes with workflow software.

– Develop software infrastructure that can facilitate the governance of community software projects within the Earth sciences.

• One important focus is preparing for CMIP5. Curator’s role in CMIP5 is to serve as a liaison between METAFOR and the Earth System Grid (ESG) and to implement the display of CMIP5 metadata in the ESG Gateway.

Page 28:

What is the Earth System Grid (ESG)?

• The Earth System Grid (ESG) is a network of nodes for federated data access and related services that supports research on Earth’s climate and its impacts.

• Goals

– Make data more useful for researchers and policy makers.

– Meet the needs of international climate projects for distributed databases, data access and data movement.

– Provide a universal, Web-based data access portal for multi-model, observational, and reanalysis data collections.

– Provide a wide range of climate data-analysis tools and diagnostic methods to international and U.S. climate centers.

• The Earth System Grid - Center for Enabling Technologies (ESG-CET) is funded by the U.S. Department of Energy as part of the SciDAC (Scientific Discovery through Advanced Computing) program.

Content courtesy of Dean Williams (PCMDI) and Don Middleton (NCAR) from the ESG website and “Cyberinfrastructure and the Global Environmental Data Challenge”, Feb 2011, e-Science Institute, Edinburgh

Page 29:

ESG’s Federated Architecture

Image courtesy of Luca Cinquini (NASA/NOAA) and used with the permission of Don Middleton (NCAR) from “Cyberinfrastructure and the Global Environmental Data Challenge”, Feb 2011, e-Science Institute, Edinburgh

Page 30:

Display Features: Tabs and Component Trees

Curator display in ESG showing metadata from a CMIP5 run.

Page 31:

Display Features: Pop-up Definitions

Page 32:

The Earth System Modeling Framework (ESMF)

• ESMF is high-performance software infrastructure that is used by a broad spectrum of weather, climate, and related models. It enables models to be organized as sets of components representing physical domains and processes, such as atmospheres, oceans, and land masses.

• The components can be reused in different contexts and shared by multiple research and operational centers. ESMF also provides toolkits for common modeling functions, so modelers don't need to develop those utilities independently.

• One of these utilities is an Attribute class that can be used to make models self-describing. It represents metadata as name-value pairs, organized in packages that reflect current community standards (ISO, Climate and Forecast, CIM).

Page 33:

ESMF and the CIM

• ESMF is implementing the CIM as a series of nestable Attribute Packages that can then be exported at model initialization as CIM XML.

• This work is ongoing, but as of ESMF release 5.2.0r, which was released in July 2011, ESMF supports:

– General component description

– Simulation properties

– Responsible parties

– ISO citations

– Platform descriptions

– Couplings/Inputs

– Field descriptions

– Custom attributes

• These features are currently being implemented into the Community Earth System Model (CESM).
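The general idea of nested packages of name-value pairs exported as XML can be sketched in Python with the standard library. This is not the ESMF API, and the element and attribute names below are invented for illustration; they do not follow the CIM XML schema.

```python
import xml.etree.ElementTree as ET


def package_to_xml(name, attributes, children=()):
    """Serialize one package of name-value pairs, with nested child packages."""
    element = ET.Element("package", {"name": name})
    for key, value in attributes.items():
        attr = ET.SubElement(element, "attribute", {"name": key})
        attr.text = str(value)
    for child in children:
        element.append(child)
    return element


# Hypothetical metadata: a platform package nested inside a component package.
platform = package_to_xml("Platform", {"machine": "example-hpc"})
component = package_to_xml(
    "Component",
    {"ShortName": "atm", "ModelType": "atmosphere"},
    children=[platform],
)

xml_text = ET.tostring(component, encoding="unicode")
print(xml_text)
```

Because the model itself holds the name-value pairs, the same export step can run at every initialization, so the metadata stays in step with the configuration that actually ran.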

Page 34:

Automated Metadata Workflows in CESM

• The XML file that CESM generates using ESMF can automatically be ingested into ESG since it is in the same format as the metadata generated by the CMIP5 questionnaire.

• The advantage of having the model generate metadata is that it can be generated and customized more easily.

• Example output at right

[Screenshot of example output]

Page 35:

Future Work

• Finalize display of CMIP5 model metadata

• Explore more sustainable technologies for the CIM-to-display conversion

• Leverage Curator metadata capabilities in other projects (e.g. a shared data analysis and visualization workspace, the National Climate Prediction and Projections Platform, a dynamical core workshop)

• Explore a joint implementation of the CIM portal and Curator trackback interface in future versions of ESG

Page 36:

Live Demo of Curator Display in ESG

View at: http://www.earthsystemgrid.org/home.htm

Page 37:

Questions?

METAFOR: http://metaforclimate.eu

Earth System Curator: http://curator.ucar.edu

ESG: http://earthsystemgrid.org/

PCMDI: http://www-pcmdi.llnl.gov/

ESMF: http://www.earthsystemmodeling.org/

[email protected]@noaa.gov