17 Nov 2003, Australia VO - ATNF
Metadata and Registries: Describing and Finding VO Resources
R. Hanisch (1), R. Plante (2), G. Greene (1), A.E. Linde (3), T. McGlynn (4), W. O’Mullane (5), A.M.S. Richards (6), R. Williams (7), R. Williamson (2), E.C. Auden (8), K.T. Noddle (3)
1) Space Telescope Science Institute
2) National Center for Supercomputing Applications
3) University of Leicester
4) NASA Goddard Space Flight Center
5) The Johns Hopkins University
6) Jodrell Bank Observatory
7) California Institute of Technology
8) Mullard Space Science Laboratory
THE US NATIONAL VIRTUAL OBSERVATORY
Resource Metadata
• A resource is any VO entity that can be described and given a name and unique identifier
  – Data collection (archive)
  – Catalog or collection of catalogs
  – Organization
  – Software packages
  – Bandpass filter functions
  – Services
• Services are VO resources that can be invoked by a user or software agent to perform some action on their behalf
• Metadata describes VO resources. This metadata generally includes information the user or a computer program needs to determine if a resource is of interest and how a service is invoked.
Resource Metadata
• Resource metadata is described by
  – A prose document that defines concepts independent of an encoding scheme
  – XML Schemas that encode metadata and metadata relationships
• Draws on Dublin Core metadata
  – An interdisciplinary standard for core resource metadata: http://dublincore.org
• Can be categorized
  – Identity
  – Curation
  – General content
  – Collection/service content
  – Data quality
  – Service invocation
Resource Metadata Example
(Required keywords shown in red on the original slide)
Identity metadata
  Title              Sloan Digital Sky Survey
  ShortName          SDSS
  Identifier         ivo://stsci.edu/mast/sdss
Curation metadata
  Publisher          Space Telescope Science Institute/MAST
  PublisherID        ivo://stsci.edu/mast
  Creator            Sloan Digital Sky Survey Consortium
  Creator.Logo       http://archive.stsci.edu/images/sdss_logo.gif
  Contributor        Sloan Digital Sky Survey Consortium
  Date               2001-06-15
  Version            SDSS EDR
  ReferenceURL       http://archive.stsci.edu/sdss/index.html
  Contact.Name       Archive Branch, Space Telescope Science Institute
  Contact.Address    3700 San Martin Drive, Baltimore, MD 21218 USA
  Contact.Email      [email protected]
  Contact.Telephone  +1-410-338-4547
General content metadata
  Subject            galaxies, quasars, stars, CCD photometry, spectroscopy, redshift, sky surveys
  Description        The Sloan Digital Sky Survey is using a dedicated 2.5-m telescope and a large format CCD camera to obtain images of over 10,000 square degrees of high Galactic latitude sky in five broad bands (u', g', r', i' and z', centered at 3540, 4770, 6230, 7630, and 9130 Å, respectively)…
  Source             2002AJ....123..485S
  Type               Survey, Catalog, EPOResource
  ContentLevel       Research
  Relationship       mirror-of
  RelationshipID     ivo://sdss.org/sdss/edr
Collection and service content metadata
  Facility                             Apache Point Observatory, Sloan 2.5-m Telescope
  Instrument                           Five-band clocked CCD camera
  Coverage.Spatial                     polygon (FK5, 145.17, 1.25, 235.9, 1.25, 235.9, -1.25, 145.17, 1.25) or
                                       polygon (FK5, 250.71, 66.29, 267.0, 66.29, 267.0, 52.15, 250.71, 66.29) or
                                       polygon (FK5, 350.43, 1.17, 360.0, 1.17, 360.0, -1.25, 350.43, -1.25) or
                                       polygon (FK5, 0.0, 1.17, 56.37, 1.17, 56.37, -1.25, 0.0, -1.25)
  Coverage.RegionOfRegard              0.0001
  Coverage.Spectral                    Optical
  Coverage.Spectral.Bandpass           u’, g’, r’, i’, z’
  Coverage.Spectral.MinimumWavelength  400.e-9
  Coverage.Spectral.MaximumWavelength  850.e-9
  Coverage.Temporal.StartTime          1999-12-25
  Coverage.Temporal.StopTime           2001-07-15
  Coverage.Depth                       3.e-6
  Coverage.ObjectDensity               6.e4
  Coverage.ObjectCount                 2.e7
  Coverage.SkyFraction                 0.01
  Resolution.Spatial                   0.00028
  Resolution.Spectral                  5000
  Resolution.Temporal                  120
  UCD                                  Not Provided
  Format                               text/xml
  Rights                               Public
Data quality metadata
  DataQuality              A
  Uncertainty.Photometric  3.e-7
  Uncertainty.Spatial      0.00003
  Uncertainty.Spectral     1.e-11
  Uncertainty.Temporal     0.1
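The Identifier and PublisherID values in this example both follow the ivo://authority/resource-key pattern. A minimal sketch of splitting such an identifier is given here; the function name is hypothetical, and the sketch ignores any stricter validation rules a real registry might apply.

```python
from urllib.parse import urlsplit


def split_ivo_identifier(identifier):
    """Split an ivo:// identifier into (authority, resource_key)."""
    parts = urlsplit(identifier)
    if parts.scheme != "ivo" or not parts.netloc:
        raise ValueError(f"not an ivo:// identifier: {identifier!r}")
    # The path carries the resource key; drop its leading slash.
    return parts.netloc, parts.path.lstrip("/")


# Values copied from the SDSS example record.
resource = split_ivo_identifier("ivo://stsci.edu/mast/sdss")
publisher = split_ivo_identifier("ivo://stsci.edu/mast")
print(resource)
print(publisher)
```

Both identifiers share the authority stsci.edu, which is one way a program could tie a resource to its publisher.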
Resource Metadata Example (continued)
Service metadata
  Service.InterfaceURL      http://archive.stsci.edu/cgi-bin/sdss/catalog.html
  Service.BaseURL           http://archive.stsci.edu/cgi-bin/sdss/catalog
  Service.HTTPResults       text/xml
  Service.StandardID        ivo://ivoa.net/Services/ConeSearch
  Service.StandardURL       ivo://www.ivoa.net/Documents/REC/ConeSearch.html
  Service.MaxSearchRadius   0.2
  Service.MaxReturnRecords  5000
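The service metadata is what lets a program invoke the service without human help. As a rough sketch, a client could combine Service.BaseURL with the Cone Search parameters RA, DEC, and SR (decimal degrees) and honor Service.MaxSearchRadius. The function name and the range checks are assumptions made for illustration; no request is sent.

```python
from urllib.parse import urlencode

# Values copied from the service metadata of the example record.
BASE_URL = "http://archive.stsci.edu/cgi-bin/sdss/catalog"
MAX_SEARCH_RADIUS = 0.2  # Service.MaxSearchRadius, in degrees


def cone_search_url(ra, dec, radius, base_url=BASE_URL, max_radius=MAX_SEARCH_RADIUS):
    """Build a Cone Search GET URL; RA, DEC, and SR are in decimal degrees."""
    if not 0.0 <= ra <= 360.0 or not -90.0 <= dec <= 90.0:
        raise ValueError("position out of range")
    if radius > max_radius:
        raise ValueError(f"radius {radius} exceeds the advertised maximum {max_radius}")
    # Append to an existing query string if the base URL already has one.
    separator = "&" if "?" in base_url else "?"
    return base_url + separator + urlencode({"RA": ra, "DEC": dec, "SR": radius})


print(cone_search_url(180.0, 0.5, 0.1))
```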
Resource Metadata: XML Schema
• Classes of resources: Organization, DataCollection, Service, Registry
  – Specific classes inherit from generic <Resource>
• Organized into separate schemas:
  – Core resource metadata: VOResource
  – Various extension schemas containing specific types
• Capable of describing…
  – Data centers, research organizations, missions, observatories
  – Data collections, archives
  – VO standard services: Cone Search, Simple Image Access
  – Existing Browser/CGI-based services
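The inheritance idea can be pictured with ordinary classes. The sketch below mirrors only the four class names from the slide; the field names (title, identifier, subjects, facility, interface_url) are illustrative assumptions and are not taken from the VOResource schema.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """Generic resource: the metadata every specific class shares."""
    title: str
    identifier: str
    subjects: list = field(default_factory=list)


@dataclass
class Organization(Resource):
    pass


@dataclass
class DataCollection(Resource):
    facility: str = ""  # illustrative extension field


@dataclass
class Service(Resource):
    interface_url: str = ""  # illustrative extension field


@dataclass
class Registry(Resource):
    pass


sdss = DataCollection(
    title="Sloan Digital Sky Survey",
    identifier="ivo://stsci.edu/mast/sdss",
    facility="Apache Point Observatory, Sloan 2.5-m Telescope",
)
# Code written against the generic class also accepts the specific one.
print(isinstance(sdss, Resource))
```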
The Role of Resource Registries
• Used to discover and locate resources—data and services—that can be used in a VO application
• Registry: a list of resource descriptions
  – Expressed as structured metadata to enable automated processing and searching
• Registries are themselves VO Resources
Registry Requirements
• Allow user to select resources that are likely to pertain to a scientific question
• Select resources based on characteristics…
  – Type of resource: catalogs, image archives, EPO, services
  – Coverage in space, time, and frequency
  – Where data comes from, who curates it
• Dynamic: resources will come and go
• Distributed: Should not depend on a single point of failure or single view of the VO.
• Preserve the data providers’ control over their data
  – Curators control what gets registered, content, updates
  – Allow integration with existing resource management
• Allow extension to new types of resources
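Selecting by characteristics reduces, in the simplest case, to a type match plus an interval-overlap test on coverage. The sketch below assumes a flat dictionary per resource with hypothetical key names; the SDSS values come from the example record, and the second record is invented for contrast.

```python
def overlaps(lo_a, hi_a, lo_b, hi_b):
    """True when two closed intervals share at least one point."""
    return lo_a <= hi_b and lo_b <= hi_a


def select(resources, wanted_type, min_wavelength, max_wavelength):
    """Return titles of resources of wanted_type covering part of the range (metres)."""
    return [
        r["title"]
        for r in resources
        if wanted_type in r["types"]
        and overlaps(r["min_wavelength"], r["max_wavelength"], min_wavelength, max_wavelength)
    ]


resources = [
    # Values from the SDSS example record.
    {"title": "Sloan Digital Sky Survey", "types": {"Survey", "Catalog", "EPOResource"},
     "min_wavelength": 400e-9, "max_wavelength": 850e-9},
    # Hypothetical record, invented for contrast.
    {"title": "Example Radio Catalog", "types": {"Catalog"},
     "min_wavelength": 0.03, "max_wavelength": 0.21},
]

# Catalogs with coverage somewhere between 500 nm and 600 nm.
print(select(resources, "Catalog", 500e-9, 600e-9))
```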
IVOA Registry Working Group (RWG)
• Common approach to registries
• Work packages
  – Science requirements and use cases
  – Resource metadata
  – Registry interfaces
  – Prototyping
• Distributed model for registries
Registry Model
[Diagram, built up over several slides. Participants: Data Centers, VO Projects, and Specialized Portals & Services. Registry types: Local Publishing Registry, Local Searchable Registry, Full Searchable Registry. Connections labeled: harvest (pull), replicate, selective harvesting. Client Applications send search queries.]
NVO Prototype Registry
• To support a Data Inventory Service (DIS): What is known about a position in the sky?
  – Use a registry to locate and query standard services:
    • Cone Search Services: querying catalogs
    • Simple Image Access Services: querying image archives and cutout services
• Components
  – Publishing Registries
  – Searchable Registry
  – Resource Metadata
  – Harvesting Protocol
  – Populated with service descriptions
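The Data Inventory Service idea, asking every registered service about one sky position, can be sketched as a fan-out over registry results. The sketch assumes the registry search has already returned (kind, base URL) pairs; those pairs, the example.org URLs, and the function names are hypothetical. Cone Search takes RA, DEC, and SR; Simple Image Access takes POS and SIZE.

```python
from urllib.parse import urlencode


def query_url(kind, base_url, ra, dec, size):
    """Build a positional query URL for one registered service."""
    if kind == "cone":
        params = {"RA": ra, "DEC": dec, "SR": size}      # Cone Search
    elif kind == "sia":
        params = {"POS": f"{ra},{dec}", "SIZE": size}     # Simple Image Access
    else:
        raise ValueError(f"unsupported service kind: {kind!r}")
    # Keep the comma in POS readable instead of percent-encoding it.
    return base_url + "?" + urlencode(params, safe=",")


def inventory_urls(services, ra, dec, size):
    """Return one query URL per service found in the registry."""
    return [query_url(kind, base_url, ra, dec, size) for kind, base_url in services]


# Hypothetical registry search result: (kind, base URL) pairs.
services = [
    ("cone", "http://example.org/catalog"),
    ("sia", "http://example.org/images"),
]
for url in inventory_urls(services, 180.0, 0.5, 0.1):
    print(url)
```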
Publishing Registries: getting information into registries
• Two publishing registries established at Caltech and NCSA.
• Motivation:
  – Register Simple Image Access Services
  – Develop techniques for easy registration
• Resource descriptions stored as XML documents using VOResource schema
Harvesting Interface
• Adopted Open Archives Initiative (OAI) Protocol for Metadata Harvesting
  – HTTP/CGI-based protocol for exposing metadata to harvesters (e.g. searchable registries)
• Advantages:
  – Existing, field-tested design we didn’t have to re-invent
  – Fairly easy to implement
  – Existing tools for emitting and harvesting metadata
  – Exposes our metadata to larger digital library community
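To make the harvesting interface concrete: an OAI harvester issues plain HTTP requests such as verb=ListRecords and reads back XML. The sketch below builds such a request and pulls record identifiers out of a response. The endpoint URL and the trimmed sample response are hypothetical, oai_dc is used only because every OAI repository must support it, and no network call is made.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", since=None):
    """Build a ListRecords request; `since` becomes the OAI `from` argument."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if since is not None:
        params["from"] = since  # only records changed on or after this date
    return base_url + "?" + urlencode(params)


def record_identifiers(response_xml):
    """Extract the header identifier of every record in a ListRecords response."""
    root = ET.fromstring(response_xml)
    return [e.text for e in root.iterfind(".//oai:header/oai:identifier", OAI_NS)]


# Hypothetical endpoint and a hand-written, heavily trimmed response.
print(list_records_url("http://example.org/oai", since="2003-11-01"))
sample = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>ivo://stsci.edu/mast/sdss</identifier></header></record>
  </ListRecords>
</OAI-PMH>"""
print(record_identifiers(sample))
```

The `from` argument is what keeps a re-harvest cheap: the harvester asks only for records changed since its last visit.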
Models for Registering Resources
• Curator uses another site’s registry
  – Good for a few resources whose descriptions are fairly static
  – e.g. at NCSA: http://nvo.ncsa.uiuc.edu/nvoregistration.html
• VORegistry-in-a-box:
  – Deployable package that allows a data provider to run own registry “out of the box”: http://nvo.ncsa.uiuc.edu/VO/software
  – Good for larger number of resources that might be updated often
• Curator builds own OAI interface
  – Good for very large number of resources
  – Automate XML generation using site’s existing information management tools
Searchable Registry
• Searchable Registry was set up at JHU/STScI: http://skyserver.pha.jhu.edu/devel/registry
• OAI harvester collects resource descriptions
  – from Publishing Registries at Caltech & NCSA
  – Loads data into relational database
• SOAP Web Service interface: http://skyserver.pha.jhu.edu/devel/registry/registry.asmx
  – Searching
    • Currently provides specialized querying useful for DIS
  – Re-harvest request
    • To get updated records from publishing registries
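The load-then-search pattern described here can be sketched with an in-memory SQLite database. The table layout, column names, and service-type labels are assumptions for illustration and say nothing about the prototype's actual schema; the second record is invented.

```python
import sqlite3

# Hypothetical table layout, chosen only for this sketch.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE resource ("
    "identifier TEXT PRIMARY KEY, title TEXT, service_type TEXT, base_url TEXT)"
)


def load(records):
    """Insert or refresh harvested records, keyed by identifier."""
    conn.executemany("INSERT OR REPLACE INTO resource VALUES (?, ?, ?, ?)", records)
    conn.commit()


def find_services(service_type):
    """Return (title, base_url) for every service of the given type."""
    rows = conn.execute(
        "SELECT title, base_url FROM resource WHERE service_type = ? ORDER BY title",
        (service_type,),
    )
    return rows.fetchall()


load([
    ("ivo://stsci.edu/mast/sdss", "Sloan Digital Sky Survey", "ConeSearch",
     "http://archive.stsci.edu/cgi-bin/sdss/catalog"),
    # Invented record for contrast.
    ("ivo://example.org/images", "Example Image Archive", "SimpleImageAccess",
     "http://example.org/images"),
])
print(find_services("ConeSearch"))
```

Because the identifier is the primary key, calling load() again after a re-harvest replaces stale rows instead of duplicating them.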
Registry Model
[Diagram of the prototype, built up over two slides. Local Publishing Registries at Caltech and NCSA; a Full Searchable Registry at JHU/STScI connected to them by harvest (pull); the Data Inventory Service (DIS) connected to the searchable registry by "search for services"; Cone Search Services and Simple Image Access services run by Data Providers.]
Summary
• We built a working prototype registry system to support an end-user VO service
  – Distributed Publishing and Searchable components
  – Encoded descriptions using emerging VO XML standard schemas
  – OAI Harvesting Standard deployed easily
  – Used to discover Cone Search and SIA services
• What’s next: Interoperable registries IVOA-wide
  – Implement newly agreed-upon Resource Metadata standard and VOResource XML schema
  – Demonstrate harvesting and replication
  – Populate registries with broad base of VO resources
  – Standardize registry query interfaces