![Page 1: Summary Report from Thursday, 3 March 2011 Pine Room Data Integration Breakout Group](https://reader035.vdocuments.us/reader035/viewer/2022070418/56815752550346895dc4f68b/html5/thumbnails/1.jpg)
Summary Report from Thursday, 3 March 2011: Pine Room Data Integration Breakout Group
Geo-Data Informatics (GDI) Workshop: Exploring the Life Cycle, Citation and Integration of Geo-Data
Discussion Prompt
In your view/experience, what parts of data integration implementations/applications or frameworks are well established (or not) in your discipline(s), and what are the common gaps?
Moderator: Cyndy Chandler (WHOI, BCO-DMO)
Rapporteur: Chris Mattmann (NASA JPL, USC)
Discussion notes kept at TWC-hosted TitanPad site
Participants
• Bob Arko (Lamont-Doherty Earth Observatory)
• Joanne Luciano (TWC, RPI)
• Anna Milan (National Geophysical Data Center)
• Bob Simons (NOAA)
• Brian Wee (NEON, Inc.)
• Leslie Hsu (LDEO)
• Roland Viger (USGS)
• James Wilson (James Madison University)
• Tom Narock (NASA/GSFC)
• Cathy Constable (SIO, UCSD)
• Ruth Duerr (NSIDC)
• Yoori Choi (CUAHSI)
• Lee Allison (Arizona Geological Survey)
• Erin Robinson (ESIP)
• Kavitha Chandrasekar (Indiana University)
• Bob Detrick (NSF)
• Clifford Jacobs (NSF)
• Leonard Jonson (NSF)
Data Integration
• What does that mean?
Combining more than one data source into a single data object. Different from display of multiple data sources in a single view.
Example: a database join
Time series data sets made up of a variety of sources of data often require data integration.
Data aggregation and interoperability are related concepts.
Group did not come to consensus.
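The database-join example above can be sketched in a few lines. This is only an illustration of the working definition, not something the group produced: the two sources, their variable names, and all values are hypothetical, and an inner join on the observation date is just one of several possible join strategies.

```python
from datetime import date

# Hypothetical readings from two independent sources, keyed by observation date.
buoy_temperature_c = {
    date(2011, 3, 1): 4.1,
    date(2011, 3, 2): 4.3,
    date(2011, 3, 3): 4.0,
}
satellite_chlorophyll_mg_m3 = {
    date(2011, 3, 2): 0.82,
    date(2011, 3, 3): 0.79,
    date(2011, 3, 4): 0.91,
}


def inner_join(left, right):
    """Combine two keyed series into one list of rows.

    Only keys present in both inputs are kept, as in a database inner join.
    """
    shared_keys = sorted(left.keys() & right.keys())
    return [(key, left[key], right[key]) for key in shared_keys]


# The result is a single data object, not two series shown side by side.
integrated = inner_join(buoy_temperature_c, satellite_chlorophyll_mg_m3)

for day, temperature, chlorophyll in integrated:
    print(day.isoformat(), temperature, chlorophyll)
```

Plotting the two dictionaries on a shared time axis would be "display in a single view"; the joined rows are the integrated product, and dates covered by only one source drop out of it.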
Geo Disciplines Represented
• Geology
• Hydrology
• Oceanography
• Geophysics
• Geography
• Marine geology and geophysics
• Space science
• Air quality
• Computational neuroscience
• Multi-disciplinary or discipline-agnostic: data management, computer science and archive
Geo-Data Integration
• What aspects are well established or not?
• Identify common gaps
• For many projects, two common themes emerged as being associated with some level of success in the ability to do data integration:
– ‘Long-term’ commitment of funding support
– Active engagement of funding managers
Examples:
Unidata (Atmospheric Sciences)
CUAHSI (Hydrography)
IRIS (Earthquake)
US JGOFS, US GLOBEC, US WOCE (Ocean Sciences)
ODP (Ocean Drilling)
NEON
Support for Data Integration
Development of community of practice
• Infrastructure to foster communication (workshops)
• Mentoring of students and early career PIs
• Development of tools (e.g. Unidata developed NetCDF, which has been adopted by many communities)
• Education and training
• The persistence and recognition of a ‘named’ community can enable funds to flow from some agencies to researchers
Support for Data Integration
• Some communities agreed on common data formats that facilitated data integration
• Pressures from funding agencies or community needs resulted in common software tools
• Some communities identified ‘primary’ or ‘core’ variables (e.g. common, essential measurements)
Summary
• ‘Long-term’ funding support enables development of a community-of-practice that fosters communication, education and training, development and adoption of common tools and identification of core measurements. Communities-of-Practice can divide up the labor and work collaboratively to address shared challenges (economy of scale).
Additional Observations
• Tension between local and global (single PI to coordinated project to national to international). An awareness of global use of data could help with subsequent data integration.
• Early planning/specs for data management are important, but it has traditionally been difficult to obtain funding for them.
Gaps
• Lack of awareness/understanding that keeping data ‘alive’ (usable) is not free
• Many people think data stewardship and data preservation are "solved problems" (they are not).
• "Bit-level preservation" has been solved, but what is the useful lifespan of those files? What effort is required to make the archived data compatible with all the latest tools and technology? The ability to use a dataset declines over time without ongoing attention to ensure that it still meets current access requirements.
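The distinction between intact bits and usable data can be illustrated with a fixity check, one common way of verifying bit-level preservation. This is a minimal sketch, not a description of any archive's actual procedure; the file contents and the function names `sha256_digest` and `bits_intact` are hypothetical.

```python
import hashlib


def sha256_digest(payload: bytes) -> str:
    """Return the SHA-256 checksum of a byte string as hexadecimal text."""
    return hashlib.sha256(payload).hexdigest()


def bits_intact(payload: bytes, expected_digest: str) -> bool:
    """Report whether the bytes still match the checksum recorded at ingest."""
    return sha256_digest(payload) == expected_digest


# Hypothetical archived file contents and the checksum stored alongside them.
archived_bytes = b"station,date,temp_c\nA1,2011-03-03,4.0\n"
recorded_digest = sha256_digest(archived_bytes)

# Passing this check shows only that no bit has changed since ingest.
print(bits_intact(archived_bytes, recorded_digest))
```

A passing check says nothing about whether current tools can still read the format, or whether the metadata still explains what `temp_c` means. Those are the ongoing costs the bullets above describe.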
Gaps
• Historical or legacy data (originating PI is no longer active in the research community)
• No national policy for scientific preservation
• Different disciplines have different interpretations of features in a dataset
• Lack of guidelines for best practices regarding metadata required to document model results
– software, methodology, inputs, outputs, etc.
Gaps
• Misconception that you create metadata one time, and it's forever good
– not a true statement
– somehow the metadata needs to be updated
– systems and the infrastructure need to support this
– metadata needs to evolve over time
Suggestion
Group agreed that ESIP would be an appropriate community in which to continue these discussions and to begin the much-needed planning and development of cross-disciplinary solutions to address the gaps and improve infrastructure for geo-data integration.
Additional Comments
• NRC study done 7-8 years ago about the loss of data and samples in the geosciences:
Geoscience Data and Collections: National Resources in Peril
http://www.nap.edu/openbook.php?record_id=10348&page=R1
Additional Comments
• Marine Metadata Interoperability (MMI) http://marinemetadata.org/
Collection of ‘Guides’ on topics including Semantic Web technologies, controlled vocabularies, ontologies, standards, metadata best practices, and much more.
• MMI Ontology Registry and Repository (ORR) is a web application through which you can create, update, access, and map ontologies and their terms. http://mmisw.org/orr/#b
Additional Comments
• CUAHSI: Hydrologic Ontology System (funded by NSF)
http://his.cuahsi.org/ontologyfiles.html
http://water.sdsc.edu/hiscentral/startree.aspx
• "Data Management Plan" template available from CUAHSI (February 2011) at http://www.cuahsi.org/his-dmp.html. It includes data inventory, data and metadata standards, data management life cycle, etc.
Additional Comments
• ELIXIR: European life science infrastructure for biological information. http://www.bbsrc.ac.uk/science/international/elixir.aspx
• Its Mission: To construct and operate a sustainable infrastructure for biological information in Europe to support life science research and its translation to medicine and the environment, the bio-industries and society.