DCMI Workshop on Metadata and Search
Vendor Panel Presentation
Bradley P. Allen
[email protected]
http://www.siderean.com
Copyright © 2003 Siderean Software LLC. All rights reserved.
Overview
• Our perspective is that of a Semantic Web application vendor
• Our belief is that faceted search will be the first killer application of the Semantic Web
• Our goal is to show how this is possible and what the benefits are
• But first, some general statements…
Tools that leverage Dublin Core
• Do supportable tools exist that take advantage of Dublin Core and other metadata standards to enhance search results?
• Yes, our work is a case in point
• Also relevant:
• Weblog CMS
• RSS aggregators
• Other RDF applications
What's missing?
• What do people need to be able to do to actually use metadata effectively on their intranets?
• Start using what’s out there
• Data in relational tables
• CMS-generated metadata
• A lot of metadata is lying around unexploited
Are Dublin Core guidelines sufficient?
• What additional specifications are needed?
• None: DC is an excellent minimal vocabulary that has achieved broad acceptance
• What we need are best practices, e.g.:
• Encouraging resource values over literal values for DC attributes as good style
• dc:subject using controlled vocabularies
• dc:creator using authority records
• dc:date using temporal hierarchies
• Implementing DCMI validation services
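The first practice above can be illustrated with a minimal sketch that models RDF triples as plain Python tuples. The `example.org` document and concept URIs and the labels are hypothetical; only the Dublin Core and RDF Schema property URIs are real. It shows why resource values group cleanly where free-text literals fragment.

```python
# Minimal sketch: literal vs. resource values for dc:subject.
# All example.org URIs and labels below are hypothetical.
DC_SUBJECT = "http://purl.org/dc/elements/1.1/subject"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Literal style: each record carries its own free-text string,
# so spelling variants become distinct subject values.
literal_triples = [
    ("http://example.org/doc/1", DC_SUBJECT, "Semantic Web"),
    ("http://example.org/doc/2", DC_SUBJECT, "semantic web"),
    ("http://example.org/doc/3", DC_SUBJECT, "Semantic-Web"),
]

# Resource style: every record points at one concept URI from a
# controlled vocabulary; the display label is stated once.
CONCEPT = "http://example.org/vocab/semantic-web"
resource_triples = [
    ("http://example.org/doc/1", DC_SUBJECT, CONCEPT),
    ("http://example.org/doc/2", DC_SUBJECT, CONCEPT),
    ("http://example.org/doc/3", DC_SUBJECT, CONCEPT),
    (CONCEPT, RDFS_LABEL, "Semantic Web"),
]


def group_by_subject(triples):
    """Map each dc:subject value to the list of documents that use it."""
    groups = {}
    for s, p, o in triples:
        if p == DC_SUBJECT:
            groups.setdefault(o, []).append(s)
    return groups


literal_groups = group_by_subject(literal_triples)
resource_groups = group_by_subject(resource_triples)
```

With literals, the three documents fall into three separate groups; with a shared concept resource they fall into one, which is what a facet over dc:subject needs.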
Is XML the primary coding language?
• Is it being used for Dublin Core and other metadata applications?
• Yes, for all the right reasons
• Open standards
• Leverage of existing tools
• What other encoding methods are being used?
• RDF/N3 for some RDF-based applications
Our application: Seamark
• A navigation engine built on three key ideas
• Metadata represented in Resource Description Framework (RDF) is aggregated from existing enterprise content and data
• Faceted metadata retrieval turns the RDF into a navigation web service
• Web services make navigation applications easy to install and integrate with existing Web applications
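The second idea, faceted retrieval over aggregated metadata, can be sketched roughly as follows. This is not Seamark's implementation; the triples, the prefixed property names, and the function names are hypothetical and chosen only to show the technique: select the resources that satisfy the current constraints, then count the remaining property values among them so they can be offered as navigation choices.

```python
from collections import Counter, defaultdict

# Hypothetical aggregated metadata as (subject, property, value) triples.
TRIPLES = [
    ("doc1", "dc:creator", "allen"),
    ("doc1", "dc:subject", "rdf"),
    ("doc2", "dc:creator", "allen"),
    ("doc2", "dc:subject", "search"),
    ("doc3", "dc:creator", "powers"),
    ("doc3", "dc:subject", "rdf"),
]


def matching(triples, constraints):
    """Return the subjects that carry every (property, value) constraint."""
    by_subject = defaultdict(set)
    for s, p, o in triples:
        by_subject[s].add((p, o))
    wanted = set(constraints)
    return {s for s, pairs in by_subject.items() if wanted <= pairs}


def facet_counts(triples, constraints):
    """Return the matching subjects and, per property, a count of each value."""
    hits = matching(triples, constraints)
    counts = defaultdict(Counter)
    for s, p, o in triples:
        if s in hits:
            counts[p][o] += 1
    return hits, counts


hits, counts = facet_counts(TRIPLES, [("dc:subject", "rdf")])
```

After constraining on `dc:subject = rdf`, `counts["dc:creator"]` lists the creators still reachable, which is the information a faceted interface shows next to each refinement link.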
Faceted search and RDF: why?
• Enabling more effective retrieval is a major goal for the Semantic Web
• RDF is a superb foundation for faceted search
• RDF as an open standard for metadata exchange
• RDF Schema as a framework for defining facets
• The Semantic Web will enable faceted search to become pervasive
• Widespread sharing and reuse of ontologies, vocabularies and DC instance data becomes possible
• The blogosphere as an existence proof
• “View Source” for the Semantic Web
Seamark, Dublin Core, and CVs
• Enables Dublin Core
• Using RDF encodings of DC
• Handles controlled vocabularies
• Using emerging RDF-based standards like TIF(S)
• Supports building and maintaining controlled vocabularies
• Concepts and terms represented as resources and encoded in RDF in the same way as other content
• Therefore the same tools apply
Seamark’s search interface
• Use of flat or hierarchical controlled vocabularies
• Transparency and customizability of results ranking
• Parametric search with customizable pull-down menus
Lookups into large CVs in Seamark
• Use of standard vocabularies represented in RDF (e.g. LC’s Thesaurus for Graphic Materials)
• Faceted search over controlled vocabulary terms
• Syndication of CVs, instance data and ontologies for sharing
Query processing in Seamark
• Based on XML for Retrieval By Reformulation (XRBR)
• A query language that
• Provides support for query reformulation and refinement while minimizing roundtrips
• Supports a stateless protocol for faceted metadata retrieval with SOAP as a transport mechanism
• Handles very large result sets gracefully
• Think of XRBR as an application profile in the digital library sense
• Specifies a view over heterogeneous metadata schemas with hints as to its interpretation and display
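The stateless, refinement-oriented style described above can be sketched as follows. This is not XRBR syntax or Seamark's protocol; the request and response shapes, field names, and records are all hypothetical. The point is only that each request carries its full set of constraints, and each response returns a bounded page plus suggested refinements, so a client can narrow a query without the server holding session state.

```python
# Hypothetical records; in practice these would come from aggregated metadata.
RECORDS = [
    {"id": "doc1", "creator": "allen", "subject": "rdf"},
    {"id": "doc2", "creator": "allen", "subject": "search"},
    {"id": "doc3", "creator": "powers", "subject": "rdf"},
]


def answer(records, request):
    """Answer one self-contained request; nothing is remembered between calls."""
    constraints = request.get("constraints", {})
    offset = request.get("offset", 0)
    limit = request.get("limit", 10)
    hits = [r for r in records
            if all(r.get(k) == v for k, v in constraints.items())]
    # Suggested refinements: values of unconstrained fields among the hits.
    refinements = {}
    for r in hits:
        for field, value in r.items():
            if field != "id" and field not in constraints:
                by_value = refinements.setdefault(field, {})
                by_value[value] = by_value.get(value, 0) + 1
    return {
        "total": len(hits),                                    # size of full result set
        "page": [r["id"] for r in hits[offset:offset + limit]],  # only one page is returned
        "refinements": refinements,
    }


def refine(request, field, value):
    """Build the next request from the previous one by adding a constraint."""
    constraints = dict(request.get("constraints", {}))
    constraints[field] = value
    return {**request, "constraints": constraints, "offset": 0}


first_request = {"limit": 2}
first = answer(RECORDS, first_request)
second_request = refine(first_request, "subject", "rdf")
second = answer(RECORDS, second_request)
```

Because refinement counts arrive with the results, the client can offer the next narrowing step without an extra roundtrip, and paging keeps large result sets from being sent whole.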
Query processing in Seamark
• Disambiguation
• Suggestions provide this implicitly
• Query expansion and concept mapping
• RDF models plus XRBR structure queries provide a general mechanism for this
• Entity extraction
• XSLT extensions at import augment raw metadata with additional extracted attributes
• Natural language processing
• Direct manipulation now; QA to come
Searching across collections
• Metadata aggregation using RDF provides a general platform for federated search
• We can directly leverage emerging SW approaches to:
• Thesaurus mapping
• tif:concept-equivalence
• Schema mapping
• rdfs:subPropertyOf
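Schema mapping with rdfs:subPropertyOf can be sketched as a small inference step: a statement made with a sub-property also holds for its super-property, so a query phrased against Dublin Core finds records from collections that use local schemas. The `ex:` properties, the document identifiers, and the simplifying assumption of at most one super-property per property are hypothetical; RDF Schema itself allows several.

```python
# Hypothetical rdfs:subPropertyOf statements: local property -> super-property.
SUBPROPERTY_OF = {
    "ex:author": "dc:creator",
    "ex:illustrator": "dc:contributor",
    "ex:leadAuthor": "ex:author",
}

# Hypothetical metadata from two collections with different schemas.
TRIPLES = [
    ("a:doc1", "dc:creator", "Allen"),
    ("b:doc9", "ex:leadAuthor", "Powers"),
]


def superproperties(prop, mapping):
    """Yield prop, then every property reachable by following the mapping."""
    seen = set()
    while prop is not None and prop not in seen:  # 'seen' guards against cycles
        seen.add(prop)
        yield prop
        prop = mapping.get(prop)


def expand(triples, mapping):
    """Add the triples implied by rdfs:subPropertyOf to the stated ones."""
    out = set()
    for s, p, o in triples:
        for q in superproperties(p, mapping):
            out.add((s, q, o))
    return out


expanded = expand(TRIPLES, SUBPROPERTY_OF)
creators = sorted(o for s, p, o in expanded if p == "dc:creator")
```

A single query on `dc:creator` now returns creators from both collections, even though the second collection never used that property directly.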
Setup and maintenance
• Installation and configuration for Windows, Linux and Mac OS X
• Administration
• Simple web-based administration interface for aggregating feeds and specifying initial queries
• Training
• 135-page tutorial
• Extensive on-line API documentation
• Courses
• One-day on-site introduction
Setup and maintenance
• Shelley Powers, “Practical RDF”, O'Reilly & Associates, 2003:
• “... the application is easily installed and configured, and comes with considerable documentation”
• “What I was most impressed with about the product, though, was how quickly and easily it integrated my RDF/XML data … into a sophisticated query engine with little or no effort.”
Seamark’s administration interface
• Users can specify URLs serving RDF to load into a given model
• … then load them manually or on a scheduled basis
• Alternatively, queries can be executed against an SQL database
• XSLT stylesheets transform XML documents and SQL result sets into RDF
• Aggregated models can be dumped to RDF
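The step from SQL result sets to RDF can be sketched as below. Seamark is described as doing this with XSLT stylesheets; this Python version is a stand-in for illustration only. The table, its columns, the `example.org` base URI, and the choice of N-Triples as output are assumptions, and the sketch uses an in-memory SQLite database so that it runs on its own.

```python
import sqlite3

DC = "http://purl.org/dc/elements/1.1/"


def escape(text):
    """Escape a string for use inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def rows_to_ntriples(rows, base_uri):
    """Turn (id, title, creator) rows into N-Triples lines using DC properties."""
    lines = []
    for doc_id, title, creator in rows:
        subject = f"<{base_uri}{doc_id}>"
        lines.append(f'{subject} <{DC}title> "{escape(title)}" .')
        lines.append(f'{subject} <{DC}creator> "{escape(creator)}" .')
    return lines


# Hypothetical source table standing in for existing enterprise data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, title TEXT, creator TEXT)")
conn.executemany(
    "INSERT INTO docs (title, creator) VALUES (?, ?)",
    [("Faceted search", "B. Allen"), ('A "quoted" title', "S. Powers")],
)
rows = conn.execute("SELECT id, title, creator FROM docs ORDER BY id").fetchall()
ntriples = rows_to_ntriples(rows, "http://example.org/docs/")
conn.close()
```

Each row becomes one resource with a URI derived from its primary key, which is one simple way to make relational data addressable for aggregation alongside other RDF sources.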
Sites using Seamark