Taxonomy Strategies

June 9, 2014. Copyright 2014 Taxonomy Strategies. All rights reserved.

The Search for Meaning and Semantics: Taxonomies Get It Done

Joseph Busch – Why Semantics Matter

Taxonomy Strategies: The business of organized information

Agenda

• Why semantics matter (… a quick review from 2001)
• What is semantic search, SKOS and Linked Data?
• Some semantic search examples?

Why Semantics Matter
May 20, 2001

When you own a Rembrandt you can spell his name any way you want.

But when you want to find a Rembrandt … you better spell his name correctly.

Vocabulary resources can help find the right artist even if their name is typed incorrectly.
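
As a minimal sketch of this idea, a mistyped name can be matched against a controlled list of artist names using approximate string matching. The artist list, the `suggest` function and the 0.8 similarity cutoff are illustrative assumptions, not part of any Getty service.

```python
import difflib

# Hypothetical mini-vocabulary of artist names (illustration only).
ARTISTS = ["Rembrandt van Rijn", "Peter Paul Rubens", "Johannes Vermeer", "Frans Hals"]


def suggest(query, names=ARTISTS, cutoff=0.8):
    """Return vocabulary names that contain a word close to a word in the query."""
    # Index every lowercase word of every name back to the full name.
    token_to_names = {}
    for name in names:
        for token in name.lower().split():
            token_to_names.setdefault(token, []).append(name)

    hits = []
    for word in query.lower().split():
        # get_close_matches ranks candidates by difflib's similarity ratio.
        for token in difflib.get_close_matches(word, list(token_to_names), n=3, cutoff=cutoff):
            for name in token_to_names[token]:
                if name not in hits:
                    hits.append(name)
    return hits


if __name__ == "__main__":
    print(suggest("Rembrant"))
```

A production system would more likely match against the variant names stored in the vocabulary itself; string similarity is only a fallback for spellings nobody has recorded yet.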

Users cannot type in the complex queries needed to find all the relevant items... But this can be done automatically.
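
One way to picture "done automatically": the system looks up the term the user typed, collects every known variant of the same name, and builds the long OR query on the user's behalf. The variant list, the `build_query` function and the quoted-phrase OR syntax below are assumptions for illustration; real search engines each have their own query syntax.

```python
# Hypothetical name variants for one artist record (illustration only).
VARIANTS = {
    "Rembrandt van Rijn": ["Rembrandt", "Rembrandt Harmensz. van Rijn", "Rembrant"],
}


def build_query(user_term, variants=VARIANTS):
    """Expand a typed term into an OR query over all variants of the matching name."""
    wanted = user_term.strip().lower()
    for preferred, alternates in variants.items():
        names = [preferred, *alternates]
        if wanted in (name.lower() for name in names):
            # Quote each variant so multi-word names are searched as phrases.
            return " OR ".join(f'"{name}"' for name in names)
    # Unknown term: search for it unchanged.
    return f'"{user_term.strip()}"'


if __name__ == "__main__":
    print(build_query("rembrant"))
```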

Complex queries are even more important when you search the entire web.

So you find Rembrandt the Dutch guy...

… And not Rembrandt the toothpaste.

Getty Vocabularies Linked Data Services
February 19, 2014

Agenda

• Why semantics matter
• What is semantic search, SKOS and Linked Data?
• Some semantic search examples?

Search Failure

• 19% Character errors. (Young, et al)
• 40% Vocabulary errors. (Seaman. Norgard, et al)
• 20% Index confusion.
• 21% Successful. (Nielsen)

Semantic search solution

• Semantic search improves search accuracy by inferring the contextual meaning of terms via:
  – Disambiguation
  – Part of speech (POS) analysis
  – Synonyms, variations and quasi-synonyms
  – Concept matching
  – Natural language query analysis
  – Key sentence detection
• Generate more consistent content to search on.
• Correct user errors.
• Map the language of users to the language of the target content.
• Augment search results with linked data.

What do semantics do for search?

Function | Description
Related search | Query corrections … did you mean?
Concept search | Query expansion with synonyms, abbreviations, acronyms, etc. … do you also want?
Ontology-based search | Query expansion with narrower or broader terms; scoping exhaustive search results
Faceted search | Dynamic filtering of search results; online shopping
Clustering | Dynamically bucketing search results into pre-defined categories
Stored queries | RSS feeds, alerts, SDI (selective dissemination of information), etc.
Personalization | Weighting search results based on explicit profiles and implicit data (where you’ve been and what you’ve done)
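
To make the ontology-based search row concrete, here is a small sketch of query expansion with narrower terms: a search on a broad term also covers everything beneath it in the hierarchy. The `NARROWER` map and its terms (apart from Parking and Fringe parking, which appear later in this deck) are hypothetical, as is the `expand` function.

```python
# Hypothetical broader -> narrower links (illustration only).
NARROWER = {
    "Parking": ["Fringe parking", "On-street parking"],
    "Fringe parking": ["Park and ride lots"],
}


def expand(term, narrower=NARROWER):
    """Return the term followed by all of its narrower terms, depth first."""
    seen = []
    stack = [term]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against cycles in a badly formed hierarchy
        seen.append(current)
        # Reverse so children are visited in their listed order.
        stack.extend(reversed(narrower.get(current, [])))
    return seen


if __name__ == "__main__":
    print(expand("Parking"))
```

Expanding with broader terms works the same way over the inverse map; whether to expand up, down, or both is a recall versus precision choice.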

What is SKOS?

• Provides the basis for any user, tool, or program to identify, define and link concept vocabularies.

Relationship | Definition
Concept | A unit of thought, an idea, meaning, or category of objects or events. A Concept is independent of the terms used to label it.
Preferred Label | A preferred lexical label for the resource, such as a term used in a digital asset management system.
Alternate Label | An alternative label for the resource, such as a synonym or quasi-synonym.
Broader Concept | Hierarchical link between two Concepts where one Concept is more general than the other.
Narrower Concept | Hierarchical link between two Concepts where one Concept is more specific than the other.
Related Concept | Link between two Concepts where the two are inherently "related", but neither is in any way more general than the other.

Subject | Predicate | Object
lc:sh85052028 | skos:prefLabel | Fringe parking
lc:sh85052028 | skos:altLabel | Park and ride systems
lc:sh85052028 | skos:altLabel | Park and ride
lc:sh85052028 | skos:altLabel | Park & ride
lc:sh85052028 | skos:altLabel | Park-n-ride
trt:Brddf | skos:prefLabel | Fringe parking
trt:Brddf | skos:altLabel | Park and ride
trt:Brddf | skos:altLabel | P&R system
trt:Brdd | skos:broader | Parking

[Figure: concept graph. The two concepts lc:sh85052028 and trt:Brddf are each drawn with a prefLabel link to "Fringe parking" and altLabel links to "Park and ride systems", "Park and ride", "Park & ride", "Park-n-ride" and "P&R system". A broader link leads to trt:Brdd, whose prefLabel is "Parking".]
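
The label statements in the table above can be sketched as plain (subject, predicate, object) tuples, with an index from every label back to the concepts that carry it. This is one simple way to map the user's wording ("Park & ride") onto concept identifiers in two different vocabularies. The function names are illustrative assumptions; a real deployment would normally load the data into an RDF library or triple store.

```python
from collections import defaultdict

# Label triples transcribed from the table above.
TRIPLES = [
    ("lc:sh85052028", "skos:prefLabel", "Fringe parking"),
    ("lc:sh85052028", "skos:altLabel", "Park and ride systems"),
    ("lc:sh85052028", "skos:altLabel", "Park and ride"),
    ("lc:sh85052028", "skos:altLabel", "Park & ride"),
    ("lc:sh85052028", "skos:altLabel", "Park-n-ride"),
    ("trt:Brddf", "skos:prefLabel", "Fringe parking"),
    ("trt:Brddf", "skos:altLabel", "Park and ride"),
    ("trt:Brddf", "skos:altLabel", "P&R system"),
]

LABEL_PREDICATES = {"skos:prefLabel", "skos:altLabel"}


def build_label_index(triples):
    """Map each case-folded label to the set of concepts that use it."""
    index = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate in LABEL_PREDICATES:
            index[obj.casefold()].add(subject)
    return index


def concepts_for(label, index):
    """Return the concept identifiers labelled with the given text, sorted."""
    return sorted(index.get(label.casefold(), set()))


if __name__ == "__main__":
    index = build_label_index(TRIPLES)
    print(concepts_for("park and ride", index))
```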

Why SKOS? According to Alistair Miles* (SKOS co-author)

• Ease of combination with other standards
  – Vocabularies are used in a great variety of contexts.
    – E.g., databases, faceted navigation, website browsing, linked open data, spellcheckers, etc.
  – Vocabularies are re-used in combination with other vocabularies.
    – E.g., Library of Congress Subject Headings + Transportation Research Thesaurus; USPS states + USPS ZIP codes + US Congressional districts; etc.
• Flexibility and extensibility to cope with variations in structure and style
  – Variations between types of vocabularies
    – E.g., list vs. classification scheme
  – Variations within types of vocabularies
    – E.g., Z39.19-2005 monolingual controlled vocabularies and the Transportation Research Thesaurus

* Head of Epidemiological Informatics at Oxford University Wellcome Trust Centre for Human Genetics (formerly OUP Senior Computing Officer)

Why SKOS? (2)

• Publish managed vocabularies so they can readily be consumed by applications
  – Identify the concepts
    – What are the named entities?
  – Describe the relationships
    – Labels, definitions and other properties
  – Publish the data
    – Convert data structure to standard format
    – Put files on an HTTP server (or load statements into an RDF server)
• Ease of integration with external applications
  – Use web services to use or link to a published concept, or to one or more entire vocabularies.
    – E.g., Google Maps API, NY Times article search API, linked open data, etc.
• A W3C standard like HTML, CSS, XML and RDF, RDFS, and OWL.
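
The "convert data structure to standard format" step can be sketched as writing one concept out in Turtle, a common RDF text serialization, ready to be placed on a web server. The `to_turtle` function is a hypothetical helper; the `lc:` namespace URI and the English language tag are assumptions made for the example, and the SKOS namespace is the standard W3C one.

```python
SKOS_NS = "http://www.w3.org/2004/02/skos/core#"
# Assumed namespace for the lc: prefix used in this deck.
LC_NS = "http://id.loc.gov/authorities/subjects/"


def quote(text):
    """Wrap text in double quotes, escaping backslashes and quotes for Turtle."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'


def to_turtle(concept_id, pref_label, alt_labels=(), lang="en"):
    """Serialize one concept with its labels as a Turtle document."""
    lines = [
        f"@prefix skos: <{SKOS_NS}> .",
        f"@prefix lc: <{LC_NS}> .",
        "",
        f"lc:{concept_id} a skos:Concept ;",
        f"    skos:prefLabel {quote(pref_label)}@{lang}",
    ]
    for label in alt_labels:
        lines[-1] += " ;"  # more statements follow for the same subject
        lines.append(f"    skos:altLabel {quote(label)}@{lang}")
    lines[-1] += " ."  # close the final statement
    return "\n".join(lines)


if __name__ == "__main__":
    print(to_turtle("sh85052028", "Fringe parking", ["Park and ride", "Park & ride"]))
```

Hand-built strings keep the sketch dependency-free; for anything beyond a demonstration, an RDF library that handles escaping and validation is the safer choice.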

Agenda

• Why semantics matter
• What is semantic search, SKOS and Linked Data?
• Some semantic search examples?

Taxonomy browser

Taxonomy-powered search results

Oracle.com top-level taxonomy

[Figure: taxonomy diagram. Node labels: Audience, Products, Location, Organization, Content Type, Product Line, Application, Technology, Industry Solution, Person. Link types in the legend: "Has a" and "Is a".]

Oracle event finder
http://events.oracle.com/

• Filter on Location and Language
• More filters based on this result
• Results shown on Google Maps UI
• Subscribe to RSS feed based on the criteria set on this page
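
The filtering behaviour described in these callouts can be sketched in a few lines: apply the selected facet values, then recount the remaining values so the interface can offer "more filters based on this result". The event records and function names are hypothetical and do not reflect how the Oracle site is implemented.

```python
from collections import Counter

# Hypothetical event records tagged with two facets (illustration only).
EVENTS = [
    {"title": "Database workshop", "location": "London", "language": "English"},
    {"title": "Cloud day", "location": "Paris", "language": "French"},
    {"title": "Java meetup", "location": "London", "language": "English"},
    {"title": "Middleware forum", "location": "Paris", "language": "English"},
]


def filter_events(events, **selected):
    """Keep only the events that match every selected facet value."""
    return [
        event for event in events
        if all(event.get(facet) == value for facet, value in selected.items())
    ]


def facet_counts(events, facet):
    """Count how many of the given events carry each value of a facet."""
    return Counter(event[facet] for event in events)


if __name__ == "__main__":
    in_paris = filter_events(EVENTS, location="Paris")
    print(facet_counts(in_paris, "language"))
```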

APS Taxonomy browser

Linked data example

A faceted taxonomy of concepts in physics

• APS Taxonomy: Broad Subject Areas; Methods & Theories; Phenomena; Physical Systems
• Physical Systems: Astronomical systems; Atomic-scale objects; Beams; Complex systems; Dynamical systems; Electric & magnetic fields; Engineered materials; Fundamental particles; Gases (delete); Information systems; Liquids (delete); Materials; Nonlinear system; Nuclei; Plasma; Quasiparticles
• Materials by Composition; Materials by Dimensionality; Materials by Property; Materials by Structure
• Elements by Group: Group 1; Group 2; Group 3; Group 4; Group 5; Group 6; Group 7; Group 8; Group 9; Group 10; Group 11; Group 12; Group 13; Group 14; Group 15; Group 16; Group 17; Group 18
• Elements of the periodic table, and common isotopes:
  – Cadmium; Copernicium; Mercury; Zinc
  – 194Hg; 196Hg; 198Hg; 199Hg; 200Hg; 201Hg; 202Hg; 204Hg

Paper submission tagging (prototype)

QUESTIONS

Joseph A Busch
Mobile 415-377-7912

[email protected]

Session description

• Semantic search is a phrase increasingly used in the popular as well as the professional literature. What does it look like, and how will it work? Panelists will present their visions of semantic search. The program is designed to be interactive, with audience participation: suggestions for functions and features they see in the future.
  – What is semantic search?
  – What are the components of semantic search?
  – How can it be used in libraries?