Methods for Data Discovery – Portals
• Portal facilitates access to and also assimilation of data
• Portal is not simply a web site: it offers services such as data reformatting, subsetting, brokering, etc.
• Portal is not just a collection of information and links: portal takes you elsewhere through a service
• Portal answers questions: abstracts data or does simple analysis
• Identify phases:
  – Phase 1: need a simple presence (web page) to start: avoid initial overreaching
• Could be multiple portals/interfaces
• Define discovery
  – Identifying what you know you want
  – Also, importantly, “accidental” discoveries that derive from the broad scope of disciplines and nations
  – PIs want “definitive datasets”: vetted for quality, coverage, etc.
• Metadata is key
  – In US, 10% of all IT spending is for metadata generation
  – 85% of data is unstructured
  – Need a new means, other than a list returned from a search, to present the data to the users
• Vetted datasets
  – Desired and useful
  – Danger of cliques taking control
  – Root of ‘vet’ also leads to ‘veto’; overreaching?
• A desired interface: a list that is classified and aggregated
• Who are the users? Don’t forget education and outreach community
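
The "classified and aggregated" list mentioned above could look something like the following minimal sketch. It groups flat search results by one classification field and counts each group. The record layout, the `discipline` field, and the sample titles are all hypothetical and chosen only for illustration; the slides do not specify a schema.

```python
from collections import defaultdict

def classify_and_aggregate(results, key="discipline"):
    """Group flat search results by a classification field.

    Returns a list of (category, titles) pairs, largest group first,
    with ties broken alphabetically. Records lacking the field are
    collected under "unclassified".
    """
    groups = defaultdict(list)
    for record in results:
        groups[record.get(key, "unclassified")].append(record["title"])
    return sorted(groups.items(), key=lambda item: (-len(item[1]), item[0]))

# Hypothetical search results (not real datasets).
results = [
    {"title": "Sea ice extent", "discipline": "cryosphere"},
    {"title": "Ice core chemistry", "discipline": "cryosphere"},
    {"title": "Ocean salinity profiles", "discipline": "oceanography"},
    {"title": "Untitled station log"},
]

for category, titles in classify_and_aggregate(results):
    print(f"{category} ({len(titles)})")
    for title in titles:
        print(f"  - {title}")
```

Grouping by a different field (for example a hypothetical `nation` key) only requires changing the `key` argument, which is one way a single result set could serve several audiences.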
• IPY legacy:
  – Need long-term stewardship of metadata and data
• Define audiences: scientists and public
  – Public needs access to information products
• Phase 0: list of datasets and data centers
• Phase 1: metadata for datasets
• Phase 2: publications
• Phase 3: services: visualizations
• Start with a single data center (?) NSIDC?
• Stages:
  – 1. IPY project honeycomb charts: identify sources of data
    • Done by 2007
    • Science base
    • Dataflows:
      – Regional focus, discipline focus which point to archive or individuals
  – 2. Complementary portals (links)
  – 3. Services that allow discovery (esp. databases) of unexpected connections
    • Search – access
    • Interactive – community tools
    • Visualization
    • Integrative
• Portal must be accessible through search engines (Google)
• Alignment of commercial interests with IPY
• GoogleBase as a metadata service
• Target audiences: scientists and education and outreach
  – Also recognize that
• Not designing a portal; actually designing a process
• Portal captures user interaction and uses this to enhance future use (e.g. Amazon)
• Need to address ontology, metadata design, data collection design early in the process; counterpoint: we don’t have enough a priori information to design
• Data managers come up with good plans, but implementation is spotty, unless compelled
• Location is a common element that could tie discovery and integration together
• Involve projects in classifying the honeycomb and building the initial lists in Stage 1
• Addendums following group discussion
  – Who is going to do this? (Implementation plan)
    • Agencies
    • National Committees
    • PIs
    • DIS
    • Arctic Council working groups
    • International bodies
    • NGOs
  – Use lessons learned from groups like ice coring, oceanographers, etc. who are already good at sharing data
  – All of this goes into the “funding agency data management letter”; can this be articulated in time?
  – Letter needs to go to agency IPY point of contact.
  – Three questions
    • Who is responsible for IPY?
    • How will info be used?
    • Where will info go? (ipy.org)
  – Create metadata to describe portals
    • AMD is an example for metadata and services descriptions
    • Enable search of portals
    • Annotate with keywords to limit search results
      – Geographic focus
      – Stakeholders
      – Disciplines
  – Create an online mechanism for users to input list of portals and annotate them; that is, put the burden on the community
    • Suggestions: use GCMD and AMD
    • Use this to solicit feedback and ideas that are desired by the user community
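
As a rough illustration of the idea of describing portals with metadata and annotating them with keywords to limit search results, here is a minimal sketch. It does not model GCMD or AMD record formats; the `PortalRecord` class, its field names, the `search_portals` helper, and the example.org URLs are all assumptions made up for this example. The three keyword sets follow the categories listed above (geographic focus, stakeholders, disciplines).

```python
from dataclasses import dataclass, field

@dataclass
class PortalRecord:
    """Hypothetical metadata record describing one portal."""
    name: str
    url: str
    geographic_focus: set = field(default_factory=set)
    stakeholders: set = field(default_factory=set)
    disciplines: set = field(default_factory=set)

def search_portals(portals, **facets):
    """Return portals whose keyword sets contain every requested value.

    Each keyword argument names a keyword field of PortalRecord, e.g.
    disciplines="oceanography". Matching is case-insensitive.
    """
    matches = []
    for portal in portals:
        if all(
            value.lower() in {k.lower() for k in getattr(portal, facet)}
            for facet, value in facets.items()
        ):
            matches.append(portal)
    return matches

# Placeholder entries; names and URLs are illustrative only.
portals = [
    PortalRecord("Arctic Ocean Portal", "https://example.org/arctic-ocean",
                 {"Arctic"}, {"scientists"}, {"oceanography"}),
    PortalRecord("Polar Education Portal", "https://example.org/polar-edu",
                 {"Arctic", "Antarctic"}, {"educators", "public"},
                 {"glaciology", "oceanography"}),
]

for portal in search_portals(portals, disciplines="oceanography",
                             stakeholders="scientists"):
    print(portal.name, portal.url)
```

A community input mechanism of the kind suggested above would amount to letting users append new `PortalRecord` entries and edit the keyword sets, so the annotation burden falls on contributors rather than on a central curator.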