
Page 1: DCMI Workshop on Metadata and Search Vendor Panel Presentation Bradley P. Allen ballen@siderean.com

DCMI Workshop on Metadata and Search

Vendor Panel Presentation

Bradley P. Allen

ballen@siderean.com

http://www.siderean.com


Copyright © 2003 Siderean Software LLC. All rights reserved.

Overview

• Our perspective is that of a Semantic Web application vendor

• Our belief is that faceted search will be the first killer application of the Semantic Web

• Our goal is to show how this is possible and what the benefits are

• But first, some general statements…


Tools that leverage Dublin Core

• Do supportable tools exist that take advantage of Dublin Core and other metadata standards to enhance search results?

• Yes, our work is a case in point

• Also relevant:

• Weblog CMS

• RSS aggregators

• Other RDF applications


What's missing? 

• What do people need to be able to do to actually use metadata effectively on their intranets?

• Start using what’s out there

• Data in relational tables

• CMS-generated metadata

• A lot of metadata is lying around unexploited


Are Dublin Core guidelines sufficient?

• What additional specifications are needed?

• None: DC is an excellent minimal vocabulary that has achieved broad acceptance

• What we need are best practices, e.g.:

• Encouraging resource values over literal values for DC attributes as good style

• dc:subject using controlled vocabularies

• dc:creator using authority records

• dc:date using temporal hierarchies

• Implementing DCMI validation services
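The recommendations above can be expressed as a simple check. The following is a minimal sketch in Python. It assumes that DC statements are held as (subject, property, value) triples and that a resource value is an http(s) URI. The vocabulary and authority URIs under `example.org`, the `RULES` table, and the `validate` function are all hypothetical. This is not an actual DCMI validation service, only the kind of rule such a service might apply.

```python
# Hypothetical sketch: checking DC statements against the best practices above.
DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical controlled vocabulary and authority file, identified by URIs.
SUBJECT_VOCAB = {"http://example.org/vocab/maps", "http://example.org/vocab/portraits"}
AUTHORITY_FILE = {"http://example.org/authority/person-42"}

# Properties that should take resource values, and the set each value must come from.
RULES = {
    DC + "subject": SUBJECT_VOCAB,
    DC + "creator": AUTHORITY_FILE,
}


def is_resource(value):
    """Treat http(s) URIs as resource values; anything else counts as a literal."""
    return value.startswith(("http://", "https://"))


def validate(triples):
    """Return (subject, property, value, problem) tuples for every rule violation."""
    problems = []
    for s, p, o in triples:
        allowed = RULES.get(p)
        if allowed is None:
            continue  # no best-practice rule for this property (e.g. dc:title)
        if not is_resource(o):
            problems.append((s, p, o, "literal value; expected a resource"))
        elif o not in allowed:
            problems.append((s, p, o, "resource not in the controlled vocabulary"))
    return problems


record = [
    ("http://example.org/doc/1", DC + "title", "Vendor panel presentation"),
    ("http://example.org/doc/1", DC + "subject", "http://example.org/vocab/maps"),
    ("http://example.org/doc/1", DC + "subject", "Maps"),
    ("http://example.org/doc/1", DC + "creator", "http://example.org/authority/unknown"),
]

for _, prop, value, problem in validate(record):
    print(prop.rsplit("/", 1)[-1], repr(value), "->", problem)
```

The dc:title literal passes because no rule covers it. The literal subject "Maps" and the unknown creator URI are both reported.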


Is XML the primary coding language?

• Is it being used for Dublin Core and other metadata applications?

• Yes, for all the right reasons

• Open standards

• Leverage of existing tools

• What other encoding methods are being used?

• RDF/N3 for some RDF-based applications
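The sketch below shows the two encodings side by side. It is minimal Python, standard library only, and writes one hypothetical DC description as RDF/XML and as N3. The resource URI and values are made up, and only literal values are handled to keep it short.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"
# Register prefixes so the XML output uses rdf: and dc: instead of ns0:, ns1:.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def to_rdf_xml(about, props):
    """Serialize (dc_element, literal) pairs describing one resource as RDF/XML."""
    root = ET.Element(f"{{{RDF_NS}}}RDF")
    desc = ET.SubElement(root, f"{{{RDF_NS}}}Description", {f"{{{RDF_NS}}}about": about})
    for name, value in props:
        ET.SubElement(desc, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


def to_n3(about, props):
    """Serialize the same statements in N3, escaping backslashes and quotes."""
    lines = [f"@prefix dc: <{DC_NS}> ."]
    for name, value in props:
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'<{about}> dc:{name} "{escaped}" .')
    return "\n".join(lines)


props = [("title", "Vendor Panel Presentation"), ("date", "2003")]
print(to_rdf_xml("http://example.org/doc/1", props))
print(to_n3("http://example.org/doc/1", props))
```

Both outputs carry the same two statements. The XML form can be handled by generic XML tooling, while the N3 form is terser to read and write by hand.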


Our application: Seamark

• A navigation engine built on three key ideas:

• Metadata represented in Resource Description Framework (RDF) is aggregated from existing enterprise content and data

• Faceted metadata retrieval turns the RDF into a navigation web service

• Web services make navigation applications easy to install and integrate with existing Web applications
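Faceted metadata retrieval can be illustrated in a few lines:

1. Index the aggregated triples by resource.
2. Filter by the selected facet values.
3. Return value counts for each facet over the remaining results, so a user only sees refinements that lead somewhere.

This is a minimal Python sketch under those assumptions. The data is hypothetical, and `dc:`-prefixed strings stand in for full property URIs. It is not Seamark's engine.

```python
from collections import Counter, defaultdict

# Hypothetical aggregated metadata as (resource, property, value) triples.
TRIPLES = [
    ("doc1", "dc:type", "Report"), ("doc1", "dc:creator", "Smith"), ("doc1", "dc:date", "2002"),
    ("doc2", "dc:type", "Report"), ("doc2", "dc:creator", "Jones"), ("doc2", "dc:date", "2003"),
    ("doc3", "dc:type", "Image"), ("doc3", "dc:creator", "Smith"), ("doc3", "dc:date", "2003"),
]
FACETS = ["dc:type", "dc:creator", "dc:date"]


def index(triples):
    """Map each resource to a {property: set(values)} description."""
    desc = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        desc[s][p].add(o)
    return desc


def navigate(desc, constraints):
    """Return matching resources plus, for each facet, value counts among the matches.

    `constraints` is a dict {property: value}; a resource matches only if it has
    every constrained value.
    """
    matches = sorted(
        r for r, props in desc.items()
        if all(v in props.get(p, ()) for p, v in constraints.items())
    )
    counts = {f: Counter(v for r in matches for v in desc[r].get(f, ())) for f in FACETS}
    return matches, counts


desc = index(TRIPLES)
matches, counts = navigate(desc, {"dc:creator": "Smith"})
print(matches)                    # ['doc1', 'doc3']
print(dict(counts["dc:date"]))    # {'2002': 1, '2003': 1}
```

Refining further, e.g. by adding `"dc:date": "2003"` to the constraints, narrows the matches to `['doc3']`. The counts are returned together with the matches, so a navigation interface can build its next menu from a single response.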


Faceted search and RDF: why?

• Enabling more effective retrieval is a major goal for the Semantic Web

• RDF is a superb foundation for faceted search

• RDF as an open standard for metadata exchange

• RDF Schema as a framework for defining facets

• The Semantic Web will enable faceted search to become pervasive

• Widespread sharing and reuse of ontologies, vocabularies and DC instance data becomes possible

• The blogosphere as an existence proof

• “View Source” for the Semantic Web


Seamark, Dublin Core, and CVs

• Supports Dublin Core

• Using RDF encodings of DC

• Handles controlled vocabularies

• Using emerging RDF-based standards like TIF(S)

• Supports building and maintaining controlled vocabularies

• Concepts and terms represented as resources and encoded in RDF in the same way as other content

• Therefore the same tools apply
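The point that concepts are resources like any other can be shown with a single triple list. That list holds both document metadata and vocabulary entries, and one pattern matcher queries both. This is a minimal Python sketch. The `cv:prefLabel` and `cv:broader` property names are hypothetical placeholders, not terms from TIF(S), and the data is invented.

```python
# One hypothetical triple store holding both content metadata and a controlled vocabulary.
STORE = [
    # Content: documents whose dc:subject values are concept resources.
    ("doc1", "dc:subject", "concept:maps"),
    ("doc2", "dc:subject", "concept:portraits"),
    # Vocabulary: concepts described with the same (subject, property, value) shape.
    # The cv:* property names are illustrative placeholders, not TIF terms.
    ("concept:maps", "cv:prefLabel", "Maps"),
    ("concept:portraits", "cv:prefLabel", "Portraits"),
    ("concept:maps", "cv:broader", "concept:cartographic-materials"),
    ("concept:cartographic-materials", "cv:prefLabel", "Cartographic materials"),
]


def match(store, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    ]


def label(store, concept):
    """Look up a concept's preferred label with the same matcher used for content."""
    hits = match(store, s=concept, p="cv:prefLabel")
    return hits[0][2] if hits else concept


# The same match() answers a content question...
subjects = [o for _, _, o in match(STORE, s="doc1", p="dc:subject")]
print([label(STORE, c) for c in subjects])  # ['Maps']
# ...and a vocabulary question about the concept that the content points to.
broader = [o for _, _, o in match(STORE, s=subjects[0], p="cv:broader")]
print([label(STORE, c) for c in broader])   # ['Cartographic materials']
```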


Seamark’s search interface

[Screenshot of the search interface, with callouts:]

• Use of flat or hierarchical controlled vocabularies

• Transparency and customizability of results ranking

• Parametric search with customizable pull-down menus
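A common way to present a hierarchical vocabulary as a facet is to roll counts up the hierarchy. A broader term's count then covers its narrower terms, without counting any document twice. The sketch assumes a simple child-to-parent map with invented terms and assignments. A flat vocabulary is just the case where that map is empty. It illustrates the idea and is not taken from Seamark.

```python
from collections import Counter

# Hypothetical hierarchical vocabulary: narrower concept -> broader concept.
BROADER = {
    "Maps": "Cartographic materials",
    "Charts": "Cartographic materials",
    "Portraits": "Pictorial works",
}

# Hypothetical subject assignments for the documents in the current result set.
RESULT_SUBJECTS = {
    "doc1": ["Maps"],
    "doc2": ["Charts"],
    "doc3": ["Portraits"],
    "doc4": ["Maps", "Charts"],
}


def ancestors(concept):
    """Yield the concept itself and every broader concept above it (cycle-safe)."""
    seen = set()
    while concept is not None and concept not in seen:
        seen.add(concept)
        yield concept
        concept = BROADER.get(concept)


def rolled_up_counts(result_subjects):
    """Count each document once under every concept at or above its subjects."""
    counts = Counter()
    for subjects in result_subjects.values():
        reached = {c for s in subjects for c in ancestors(s)}  # set: no double counting
        counts.update(reached)
    return counts


counts = rolled_up_counts(RESULT_SUBJECTS)
print(counts["Cartographic materials"])  # 3: doc1, doc2 and doc4 (doc4 counted once)
print(counts["Maps"])                    # 2
```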


Lookups into large CVs in Seamark

[Screenshot of vocabulary lookup, with callouts:]

• Use of standard vocabularies represented in RDF (e.g., LC’s Thesaurus of Graphical Materials)

• Faceted search over controlled vocabulary terms

• Syndication of CVs, instance data and ontologies for sharing
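Interactive lookup into a large vocabulary, such as type-ahead over term labels, needs something faster than a linear scan. One simple approach is a sorted index searched with binary search via the standard `bisect` module. This Python sketch assumes the labels fit in memory. The class name and sample labels are illustrative only.

```python
import bisect


class TermIndex:
    """Sorted, case-insensitive index of vocabulary labels for prefix lookup."""

    def __init__(self, labels):
        # Keep (folded_label, original_label) pairs sorted by the folded form.
        self._entries = sorted((label.casefold(), label) for label in labels)
        self._keys = [key for key, _ in self._entries]

    def lookup(self, prefix, limit=10):
        """Return up to `limit` labels starting with `prefix`, in sorted order."""
        key = prefix.casefold()
        # Binary search finds the first candidate in O(log n); all matches are contiguous.
        start = bisect.bisect_left(self._keys, key)
        results = []
        for folded, label in self._entries[start:start + limit]:
            if not folded.startswith(key):
                break
            results.append(label)
        return results


terms = TermIndex(["Maps", "Map collections", "Manuscripts", "Portraits", "Marine art"])
print(terms.lookup("ma"))   # ['Manuscripts', 'Map collections', 'Maps', 'Marine art']
print(terms.lookup("MAP"))  # ['Map collections', 'Maps']
```

Sorting once at load time keeps each keystroke cheap even for vocabularies with many thousands of terms.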


Query processing in Seamark

• Based on XML for Retrieval By Reformulation (XRBR)

• A query language that:

• Provides support for query reformulation and refinement while minimizing roundtrips

• Supports a stateless protocol for faceted metadata retrieval with SOAP as a transport mechanism

• Handles very large result sets gracefully

• Think of XRBR as an application profile in the digital library sense

• Specifies a view over heterogeneous metadata schemas with hints as to its interpretation and display
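The properties listed above can be sketched as a query object that carries its complete state in every request. Those properties are reformulation without server-side state, few roundtrips, and bounded result pages. The Python below is a hypothetical illustration. The `Query` fields and the `<query>`/`<constraint>` element names are invented and are not XRBR syntax, and the SOAP envelope is left out.

```python
from dataclasses import dataclass, replace
import xml.etree.ElementTree as ET


@dataclass(frozen=True)
class Query:
    """Complete query state sent with every request (hypothetical, not XRBR syntax)."""
    constraints: tuple = ()  # sorted (property, value) pairs
    offset: int = 0          # index of the first result to return
    limit: int = 10          # page size, so a huge result set is never sent whole

    def refine(self, prop, value):
        """Reformulate by adding a constraint; paging restarts at the first result."""
        pairs = tuple(sorted(set(self.constraints) | {(prop, value)}))
        return replace(self, constraints=pairs, offset=0)

    def relax(self, prop, value):
        """Reformulate by dropping a constraint."""
        pairs = tuple(c for c in self.constraints if c != (prop, value))
        return replace(self, constraints=pairs, offset=0)

    def next_page(self):
        return replace(self, offset=self.offset + self.limit)

    def to_xml(self):
        root = ET.Element("query", offset=str(self.offset), limit=str(self.limit))
        for prop, value in self.constraints:
            ET.SubElement(root, "constraint", property=prop).text = value
        return ET.tostring(root, encoding="unicode")

    @staticmethod
    def from_xml(text):
        root = ET.fromstring(text)
        pairs = tuple(sorted((c.get("property"), c.text or "") for c in root.findall("constraint")))
        return Query(pairs, int(root.get("offset")), int(root.get("limit")))


def handle(request_xml, records):
    """Stateless handler: everything it needs arrives in the request itself."""
    q = Query.from_xml(request_xml)
    hits = [r for r in records if all(r.get(p) == v for p, v in q.constraints)]
    return {"total": len(hits), "page": hits[q.offset:q.offset + q.limit]}


records = [{"id": i, "dc:type": "Report" if i % 2 else "Image"} for i in range(25)]
q = Query(limit=5).refine("dc:type", "Report")
first = handle(q.to_xml(), records)
second = handle(q.next_page().to_xml(), records)
print(first["total"], [r["id"] for r in first["page"]])  # 12 [1, 3, 5, 7, 9]
print([r["id"] for r in second["page"]])                 # [11, 13, 15, 17, 19]
```

The server needs nothing beyond the request, so any replica can answer any step of a session. Returning `total` with the first page also saves a separate count request.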


Query processing in Seamark

• Disambiguation

• Suggestions provide this implicitly

• Query expansion and concept mapping

• RDF models plus XRBR structure queries provide a general mechanism for this

• Entity extraction

• XSLT extensions at import augment raw metadata with additional extracted attributes

• Natural language processing

• Direct manipulation now; question answering (QA) to come
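Import-time entity extraction amounts to scanning raw metadata text and adding what is found as extra attributes. The slide describes doing this with XSLT extensions. The minimal sketch below swaps in Python regular expressions instead. The `ex:` property names and the patterns are hypothetical stand-ins for a real extractor.

```python
import re

# Hypothetical extractors: added property -> pattern. Simple regular expressions
# stand in here for a real entity-extraction component.
EXTRACTORS = {
    "ex:mentionsYear": re.compile(r"\b(?:19|20)\d{2}\b"),
    "ex:mentionsEmail": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
}


def augment(record_uri, raw_triples, text_property="dc:description"):
    """Return the raw triples plus triples for entities found in the text property."""
    added = []
    for s, p, o in raw_triples:
        if s != record_uri or p != text_property:
            continue
        for prop, pattern in EXTRACTORS.items():
            # dict.fromkeys removes duplicate matches while keeping their order.
            for value in dict.fromkeys(pattern.findall(o)):
                added.append((s, prop, value))
    return list(raw_triples) + added


raw = [
    ("doc1", "dc:title", "Workshop slides"),
    ("doc1", "dc:description", "Slides from 2003, revised 2003; contact info@example.org."),
]
for triple in augment("doc1", raw)[len(raw):]:
    print(triple)
```

The raw statements pass through unchanged, and the extracted attributes become ordinary triples that faceted navigation can use like any other metadata.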


Searching across collections

• Metadata aggregation using RDF provides a general platform for federated search

• We can directly leverage emerging Semantic Web approaches to:

• Thesaurus mapping: tif:concept-equivalence

• Schema mapping: rdfs:subPropertyOf
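Both mappings can be applied as simple inference over aggregated triples. The sketch below implements the RDF Schema entailment for `rdfs:subPropertyOf`: transitivity, plus copying each statement to its super-property. It treats `tif:concept-equivalence` as a symmetric, transitive link. That reading of the TIF property, the `a:`/`b:` collection prefixes, and the data are assumptions for illustration.

```python
# Triples from two hypothetical collections with different schemas and thesauri.
TRIPLES = [
    ("a:doc1", "a:author", "Smith"),
    ("b:item7", "b:photographer", "Jones"),
    ("a:doc1", "dc:subject", "a:Maps"),
    ("b:item7", "dc:subject", "b:Cartography"),
    # Mapping statements.
    ("a:author", "rdfs:subPropertyOf", "dc:creator"),
    ("b:photographer", "rdfs:subPropertyOf", "dc:creator"),
    ("a:Maps", "tif:concept-equivalence", "b:Cartography"),
]


def infer_subproperties(triples):
    """Apply rdfs:subPropertyOf transitivity and statement propagation to a fixpoint."""
    facts = set(triples)
    while True:
        sub = {(s, o) for s, p, o in facts if p == "rdfs:subPropertyOf"}
        # Transitivity: p sub q and q sub r => p sub r.
        new = {(p, "rdfs:subPropertyOf", r) for p, q in sub for q2, r in sub if q == q2}
        # Propagation: (s p o) and p sub q => (s q o).
        new |= {(s, q, o) for s, p, o in facts for p2, q in sub if p == p2}
        if new <= facts:
            return facts
        facts |= new


def equivalents(triples, concept, prop="tif:concept-equivalence"):
    """Return all concepts linked to `concept` via `prop`, following links both ways."""
    links = {}
    for s, p, o in triples:
        if p == prop:
            links.setdefault(s, set()).add(o)
            links.setdefault(o, set()).add(s)
    seen, stack = {concept}, [concept]
    while stack:
        for nxt in links.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


facts = infer_subproperties(TRIPLES)
print(sorted(o for s, p, o in facts if p == "dc:creator"))  # ['Jones', 'Smith']
concepts = equivalents(TRIPLES, "a:Maps")
print(sorted(s for s, p, o in facts if p == "dc:subject" and o in concepts))  # ['a:doc1', 'b:item7']
```

A query on `dc:creator` now finds authors from both collections. A subject search on `a:Maps` also reaches `b:item7` through the equivalent concept.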


Setup and maintenance

• Installation and configuration for Windows, Linux and Mac OS X

• Administration

• Simple web-based administration interface for aggregating feeds and specifying initial queries

• Training

• 135-page tutorial

• Extensive online API documentation

• Courses

• One-day on-site introduction


Setup and maintenance

• Shelley Powers, “Practical RDF”, O'Reilly & Associates, 2003:

• “... the application is easily installed and configured, and comes with considerable documentation”

• “What I was most impressed with about the product, though, was how quickly and easily it integrated my RDF/XML data … into a sophisticated query engine with little or no effort.”


Seamark’s administration interface

[Screenshot of the administration interface, with callouts:]

• Users can specify URLs serving RDF to load into a given model

• … then load them manually or on a scheduled basis

• Alternatively, queries can be executed against an SQL database

• XSLT stylesheets transform XML documents and SQL result sets into RDF

• Aggregated models can be dumped to RDF
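The relational path can be sketched end to end with Python's built-in `sqlite3`: run a query, map result-set columns to DC properties, and dump the triples. Here Python code stands in for the XSLT step the slide describes, and N-Triples is chosen as the dump format. The table, the column mapping, and the URI base are all hypothetical.

```python
import sqlite3

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical mapping from result-set columns to DC properties.
COLUMN_MAP = {"title": DC + "title", "author": DC + "creator", "year": DC + "date"}
BASE = "http://example.org/report/"  # hypothetical URI base for row identifiers


def rows_to_triples(cursor, key_column="id"):
    """Turn each row of a SQL result set into (subject, predicate, literal) triples."""
    names = [d[0] for d in cursor.description]
    for row in cursor:
        record = dict(zip(names, row))
        subject = BASE + str(record[key_column])
        for column, prop in COLUMN_MAP.items():
            if record.get(column) is not None:  # skip NULLs rather than emit empty literals
                yield subject, prop, str(record[column])


def to_ntriples(triples):
    """Dump triples as N-Triples, escaping backslashes, quotes and line breaks."""
    lines = []
    for s, p, o in triples:
        lit = o.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n").replace("\r", "\\r")
        lines.append(f'<{s}> <{p}> "{lit}" .')
    return "\n".join(lines) + "\n"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reports (id INTEGER PRIMARY KEY, title TEXT, author TEXT, year INTEGER)")
conn.executemany(
    "INSERT INTO reports (title, author, year) VALUES (?, ?, ?)",
    [("Annual review", "Smith", 2002), ('The "faceted" approach', "Jones", None)],
)
cursor = conn.execute("SELECT id, title, author, year FROM reports ORDER BY id")
print(to_ntriples(rows_to_triples(cursor)), end="")
```

The same `to_ntriples` dump works for triples aggregated from any source, which is the sense in which an aggregated model can be written back out as RDF.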


Sites using Seamark
