some problems with standard geospatial metadata...simon cox, bruce simons, nick car 12 march 2015...
TRANSCRIPT
Simon Cox, Bruce Simons, Nick Car
12 March 2015
LAND AND WATER FLAGSHIP
Some problems with standard geospatial metadata
This presentation
• Asks some questions
• Does not provide all the answers
• … but suggests some directions …
Problems with metadata | Nick Car 2 |
Outline
• ANZLIC and GeoNetwork
• Where did ANZLIC come from?
• Records
• Uses of metadata
• UML vs XML
• RDF
• RDF vocabularies
ANZLIC Metadata
Where did ANZLIC come from?
● ANZLIC a profile of ISO 19115:2003
● ISO 19115 designed by a committee (horse designed by committee = camel)
○ US FGDC metadata a strong precedent
○ requirements collected in the 1990s
○ image and map librarians
> dawn of the internet, dataset == file
> 10,000s datasets in standard series, metadata == digital ‘index cards’
Problem #1: Data ≠ Datasets?
• When cataloguing books, maps, images, even files, the card-index metaphor is OK
• A discrete record for each item of data
• Now we expect to access data at a variety of granularities, so the dataset/metadata record paradigm no longer applies
• It is a sea of data, and should be matched by a sea of metadata (maybe in the same place)
Breaking it down
• Structural decomposition
• Functional decomposition
Lawrence, Lowry, Miller, Snaith & Woolf, Information in environmental data grids. Phil. Trans. A, 2009
Problem #2: One record can’t serve all purposes
• But one ‘record’ is all you got!
ISO metadata was formalized as UML classes
GeoNetwork stores metadata as XML documents in a text database (Lucene)
Problem #3: Documents package text, not objects
• Instances of UML classes = Objects
• XML document = serialization for transport
• Treating the XML document as ‘canonical’ makes a basic category error:
➢ XML validation ≠ quality control
➢ if you only intend to manage it as text, why bother with a UML analysis?
For object-oriented behavior, the serialized form must be ‘un-marshalled’ for processing
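The gap between a document that parses and an object that makes sense can be sketched in Python. This is only an illustration under stated assumptions: the element names (`record`, `title`, `extent`) are a made-up, heavily simplified stand-in for ISO 19115 XML, and `BoundingBox`, `MetadataRecord` and `unmarshal` are hypothetical names, not part of GeoNetwork or any standard.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Simplified, hypothetical XML; real ISO 19115 encodings are far more verbose.
# The south/north values are deliberately swapped.
XML_TEXT = """
<record>
  <title>Soil moisture grid</title>
  <extent west="112.0" east="154.0" south="-10.0" north="-44.0"/>
</record>
""".strip()


@dataclass
class BoundingBox:
    west: float
    east: float
    south: float
    north: float

    def problems(self) -> list:
        """Object-level checks that a well-formedness check cannot express."""
        found = []
        if self.south > self.north:
            found.append("south is greater than north")
        if not (-90.0 <= self.south <= 90.0 and -90.0 <= self.north <= 90.0):
            found.append("latitude out of range")
        return found


@dataclass
class MetadataRecord:
    title: str
    extent: BoundingBox


def unmarshal(xml_text: str) -> MetadataRecord:
    """Turn the serialized form back into objects."""
    root = ET.fromstring(xml_text)  # raises ParseError if not well-formed
    ext = root.find("extent")
    box = BoundingBox(
        *(float(ext.get(k)) for k in ("west", "east", "south", "north"))
    )
    return MetadataRecord(title=root.findtext("title"), extent=box)


record = unmarshal(XML_TEXT)      # the XML parses without complaint
print(record.extent.problems())   # the object reports the swapped latitudes
```

The document is perfectly good XML, yet the record it describes is wrong; the error only becomes visible once the text has been turned back into objects with behavior.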
Metadata creation
Problem #4: Index cards are not infrastructure
• Metadata-entry paradigm encourages record counting as a KPI
• Surely there are better measures of usefulness?
• How can we know, if it is not part of a joined-up architecture?
What does everyone else do?
1. Specialist systems for specialized communities
– Is spatial special? Do we want our spatial data in the mainstream?
2. Don’t bother with metadata, just index the content
– The original strategy of the search engines
– Google Knowledge Graph now works with entities, not text
– (shame the entities don’t have persistent URIs …)
3. Metadata annotations
– schema.org – semantic-web-lite
4. What about the Data Repositories?
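As an illustration of option 3, a dataset description using schema.org terms might be built as JSON-LD along the following lines. This is a sketch only: the dataset, its identifier and its coordinates are invented, and only the schema.org type and property names (`Dataset`, `Place`, `GeoShape`, `spatialCoverage`, `box`) come from the vocabulary itself.

```python
import json

# Hypothetical dataset; only the schema.org types and property names are real.
annotation = {
    "@context": "https://schema.org",
    "@type": "Dataset",
    "@id": "http://example.org/dataset/soil-moisture",
    "name": "Soil moisture grid",
    "description": "Example gridded soil moisture product.",
    "keywords": ["soil", "moisture"],
    "spatialCoverage": {
        "@type": "Place",
        "geo": {
            "@type": "GeoShape",
            # schema.org 'box': lower corner then upper corner,
            # each given as latitude longitude
            "box": "-44.0 112.0 -10.0 154.0",
        },
    },
}

# Typically embedded in a web page inside
# <script type="application/ld+json"> ... </script>
json_ld = json.dumps(annotation, indent=2)
print(json_ld)
```

The annotation travels with the page that describes the data, so a general-purpose crawler can pick it up without knowing anything about spatial catalogues.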
Research Data Repositories
• Still a lot of variation
• RIF-CS
• MARC
• Dublin Core
• Data Catalog Vocabulary (DCAT)
• RDF vocabularies?
• DC, DCAT
• FOAF, PROV-O, VoID, SKOS, ADMS, LOCN
INSPIRE profile of DCAT-AP
INSPIRE metadata record as RDF
Problems with metadata | Nick Car 23 |
RDF benefits
• Standard vocabularies used in the broader community
• Intrinsically object/resource oriented
• URIs for keys - linked data
• Open world – missing information doesn’t make it invalid
• No intrinsic granularity
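Several of these properties can be illustrated without any RDF library by treating a graph as a plain set of (subject, predicate, object) triples. The sketch below assumes invented `example.org` resources; the predicate URIs are from Dublin Core Terms and DCAT. Real work would use an RDF library and a proper serialization rather than Python tuples, and `describe` is a hypothetical helper.

```python
DCT = "http://purl.org/dc/terms/"
DCAT = "http://www.w3.org/ns/dcat#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # invented namespace for the example resources

# Statements published by a catalogue
catalogue = {
    (EX + "dataset/1", RDF_TYPE, DCAT + "Dataset"),
    (EX + "dataset/1", DCT + "title", "Soil moisture grid"),
}

# Statements from another source: about the same dataset, and about one
# part of it. Nothing forces metadata to stop at the "dataset" level.
annotations = {
    (EX + "dataset/1", DCAT + "keyword", "soil"),
    (EX + "dataset/1", DCT + "hasPart", EX + "dataset/1/tile/42"),
    (EX + "dataset/1/tile/42", DCT + "title", "Tile 42"),
}

# URIs are the keys, so merging two graphs is just a set union.
graph = catalogue | annotations


def describe(graph, subject):
    """Collect everything the graph says about one resource."""
    out = {}
    for s, p, o in sorted(graph):
        if s == subject:
            out.setdefault(p, []).append(o)
    return out


# The tile has a title and nothing else; under an open-world reading
# that is incomplete information, not an invalid record.
print(describe(graph, EX + "dataset/1/tile/42"))
```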
Summary
ANZLIC + GeoNetwork:
☹ Record-oriented metadata doesn’t match granularity of data
☹ Each record must serve multiple functions
☹ Object oriented design, but serialization-oriented processing
☹ Incentive to create records, not architecture
☹ Not aligned with anyone else’s metadata
RDF?:
☺ Graph of metadata to match graph of data
☺ Targeted metadata subsets can be constructed using SPARQL
☺ Intrinsically resource-oriented
☺ Part of web of Linked Data
☺ Standard RDF vocabularies
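The idea of constructing a targeted subset can be sketched as a pattern match over triples. To be clear about the substitution: `match` below is a hypothetical, standard-library-only stand-in for a SPARQL engine, not SPARQL itself, and the data is invented. The SPARQL text is included only to show roughly what the equivalent query might look like.

```python
DCT = "http://purl.org/dc/terms/"
EX = "http://example.org/"  # invented namespace for the example resources

graph = {
    (EX + "dataset/1", DCT + "title", "Soil moisture grid"),
    (EX + "dataset/1", DCT + "license", EX + "licence/cc-by"),
    (EX + "dataset/1", DCT + "creator", EX + "person/a"),
    (EX + "dataset/2", DCT + "title", "Stream gauge readings"),
    (EX + "dataset/2", DCT + "creator", EX + "person/b"),
}

# Roughly the SPARQL this imitates (shown for comparison, not executed):
SPARQL_EQUIVALENT = """
PREFIX dct: <http://purl.org/dc/terms/>
CONSTRUCT { ?d dct:title ?t } WHERE { ?d dct:title ?t }
"""


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return {
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    }


# A "discovery" view needs only titles; a "usage" view might select
# licences instead. Both are subsets of the same graph.
discovery_view = match(graph, p=DCT + "title")
for subject, _, title in sorted(discovery_view):
    print(subject, title)
```

Each purpose gets its own view from one underlying graph, rather than one record being stretched to serve every purpose.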
Thank you
Land and Water Flagship
Nick Car, Research Engineer
t +61 7 3833 5600 e [email protected]
Land and Water Flagship
Simon Cox, Research Scientist
t +61 3 9252 6342 e [email protected] w people.csiro.au/C/S/Simon-Cox