THE EVOLVING SEMANTIC WORLD Barbara McGlamery Taxonomist Martha Stewart Living Omnimedia


Post on 26-Dec-2015


THE EVOLVING SEMANTIC WORLD

Barbara McGlamery, Taxonomist, Martha Stewart Living Omnimedia

ABOUT ME

Master's in Library and Information Science, Long Island University

New York Public Library: Branch librarian; NYPL for the Performing Arts, Drama reference

Entertainment Weekly: Data Manager

Time Inc.: Senior Data Manager, Taxonomist, Metadata Architect, Ontologist

Martha Stewart Living Omnimedia: Taxonomist

AGENDA

What is the Semantic Web? Big “S” and little “s” semantics

What we used to believe: Time Inc. & the theory of overkill

What we know now: Martha Stewart and the theory that less is more

Where we’re going: Leaner and meaner (but more standards)

WHAT IS THE SEMANTIC WEB?

“The Semantic Web is a web of data…. (it) provides a common framework that allows data to be shared and reused across applications, enterprise, and community boundaries.”

--W3C

“The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation.”

--Tim Berners-Lee, James Hendler, and Ora Lassila, Scientific American, 2001

“The Semantic Web is about making knowledge machine and human-readable”

--Amit Agarwal, http://www.labnol.org/internet/web-3-concepts-explained/8908/

Web 1.0: Connections

Web 2.0: Collaboration

Web 3.0: Intelligence

Big S semantic web

Little s semantic web

BIG S SEMANTIC WEB

“…big "S" web technologies provide a framework for describing data on a web page when the data on the website is published. If data is read or captured, because the data's semantic meaning has already been described, you don't have to go through the process of understanding the meaning of the data after the fact.”

--Sean Martin, CEO of Cambridge Semantics

LITTLE S SEMANTICS

Little "s" web technologies capture and filter data with no description or understanding of the data provided after the capture process. The process of understanding the meaning of that data starts once data capture has happened. People have to intervene to provide the context and meaning for language on the web.

--Sean Martin, CEO of Cambridge Semantics

Big S: W3C approved standard

Little s: Looser groups of unaffiliated standards

BIG S SEMANTICS

ESSENTIALS OF BIG S SEMANTIC WEB

URI – Uniform Resource Identifier

RDF – Resource Description Framework

OWL – Web Ontology Language

Semantic reasoner (inference engine)

URI – UNIFORM RESOURCE IDENTIFIER

Way to identify things: images, pages of text, locations

De-referenceable

Freebase: http://www.freebase.com/view/en/will_smith

• URIs are unique, no two are the same

• Will Smith: http://www.freebase.com/view/en/will_smith

RDF – RESOURCE DESCRIPTION FRAMEWORK

Framework used to describe relationships between objects

Extends and formalizes XML

Subject>Predicate>Object

RDF – RESOURCE DESCRIPTION FRAMEWORK

Subject>Predicate>Object

Subject: http://ew.com/PersonsTax/Will_Smith

Predicate: http://ew.com/EntertainmentOnt/leadPerformanceIn

Object: http://ew.com/EntertainmentTax/Movies/Bad_Boys

Will Smith >>> is the lead actor >>> Bad Boys
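The Subject>Predicate>Object pattern on this slide can be sketched in a few lines of plain Python. This is only an illustration, not how any production system stored its data: the `Triple` type and the `to_ntriples` helper are hypothetical names, the three URIs are the ones from the slide, and the printed line follows the N-Triples convention of wrapping each URI in angle brackets and ending the statement with a period.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (all URIs here)."""
    subject: str
    predicate: str
    obj: str


def to_ntriples(t: Triple) -> str:
    # N-Triples style: each URI in angle brackets, statement ends with " ."
    return f"<{t.subject}> <{t.predicate}> <{t.obj}> ."


# URIs taken from the slide above.
will_smith_in_bad_boys = Triple(
    "http://ew.com/PersonsTax/Will_Smith",
    "http://ew.com/EntertainmentOnt/leadPerformanceIn",
    "http://ew.com/EntertainmentTax/Movies/Bad_Boys",
)

print(to_ntriples(will_smith_in_bad_boys))
```

Because every part of the statement is a URI rather than a free-text label, two systems that agree on the URIs agree on what the statement means.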

OWL – WEB ONTOLOGY LANGUAGE

…designed to be used by applications that need to process the content of information instead of just presenting it to humans

-- W3C

OWL – WEB ONTOLOGY LANGUAGE

Metadata model

Extends RDF to further define properties

Ex: Equivalent relationships

[Figure: Person A >>> is married to >>> Person B, and Person B >>> is married to >>> Person A]

SEMANTIC REASONER

Software able to infer logical consequences from a set of asserted facts

Follows inference rules specified by OWL properties:

Inverse

Transitive

Symmetric

Functional/Inverse functional

Equivalent
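To make "infer logical consequences from a set of asserted facts" concrete, here is a toy forward-chaining sketch covering three of the property types listed above (symmetric, inverse, transitive). It is an assumption-laden illustration, not how Jena or any real reasoner is implemented: the function `infer`, the predicate names `isMarriedTo`, `hasLeadPerformer` and `locatedIn`, and the sample facts are all made up for the example.

```python
def infer(facts, symmetric=(), inverse=None, transitive=()):
    """Return the asserted facts plus everything the three rule types entail.

    facts:      iterable of (subject, predicate, object) tuples
    symmetric:  predicates where (a p b) implies (b p a)
    inverse:    dict mapping predicate p to q where (a p b) implies (b q a)
    transitive: predicates where (a p b) and (b p c) imply (a p c)
    """
    inverse = dict(inverse or {})
    # An inverse declaration works in both directions.
    inverse.update({q: p for p, q in list(inverse.items())})
    known = set(facts)
    while True:
        new = set()
        for s, p, o in known:
            if p in symmetric:
                new.add((o, p, s))
            if p in inverse:
                new.add((o, inverse[p], s))
            if p in transitive:
                for s2, p2, o2 in known:
                    if p2 == p and s2 == o:
                        new.add((s, p, o2))
        if new <= known:  # nothing new was derived: fixpoint reached
            return known
        known |= new


# Hypothetical asserted facts.
asserted = [
    ("Person_A", "isMarriedTo", "Person_B"),
    ("Will_Smith", "leadPerformanceIn", "Bad_Boys"),
    ("Brooklyn", "locatedIn", "New_York_City"),
    ("New_York_City", "locatedIn", "New_York_State"),
]

inferred = infer(
    asserted,
    symmetric={"isMarriedTo"},
    inverse={"leadPerformanceIn": "hasLeadPerformer"},
    transitive={"locatedIn"},
)

# Print only the facts that were derived rather than asserted.
for fact in sorted(inferred - set(asserted)):
    print(fact)
```

The loop keeps applying the rules until a pass adds nothing new, which is why one asserted "is married to" statement is enough to answer the question from either spouse's side.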

PUTTING IT ALL TOGETHER

Ontology: rule set

Classes and properties

Taxonomy: application of rule set

Tags and relationships

Everything is a statement: Subject>Predicate>Object

Ex: Will Smith is lead performer in Bad Boys

BENEFITS OF RDF/OWL

Persistent URIs

Verifiable XML

Unambiguous Relationships

Polyhierarchy

Interoperability

LIMITATIONS OF RDF/OWL

Difficult to propagate across web

Challenge to integrate with legacy systems

Expensive queries

No “Killer App”

SEMANTIC WEB LAYER CAKE

LITTLE S SEMANTICS

RDFa - Resource Description Framework (in) Attributes

W3C recommendation that adds a set of attribute-level extensions to XHTML for embedding rich metadata within Web documents

Easy to implement

Not HTML 5 compliant

RDFA: BEST BUY

LINKED OPEN DATA 2007

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

LINKED OPEN DATA 2010

“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

MICROFORMATS

Semantic markup which seeks to re-use existing HTML/XHTML class attributes to structure data

Easy to implement

Limited formats
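Since microformats carry their meaning in ordinary `class` attributes, a consumer only needs an HTML parser to read them. The sketch below uses Python's standard `html.parser` and the hRecipe class names `fn` and `ingredient`; the `MicroformatCollector` class and the sample recipe markup are invented for illustration, and a real page would need more robust handling than this.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}

# Made-up sample markup using hRecipe class names.
SAMPLE = """
<div class="hrecipe">
  <h1 class="fn">Lemon Bars</h1>
  <ul>
    <li class="ingredient">2 cups flour</li>
    <li class="ingredient">4 lemons</li>
  </ul>
</div>
"""


class MicroformatCollector(HTMLParser):
    """Collect the text of elements that carry one of the wanted classes."""

    def __init__(self, wanted):
        super().__init__()
        self.wanted = set(wanted)
        self.found = {name: [] for name in wanted}
        self._stack = []  # one entry per open element: its wanted classes

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        hits = [c for c in classes if c in self.wanted]
        for c in hits:
            self.found[c].append("")  # start a new value for this element
        self._stack.append(hits)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Text belongs to the innermost open element only.
        if self._stack:
            for c in self._stack[-1]:
                self.found[c][-1] += data


collector = MicroformatCollector(["fn", "ingredient"])
collector.feed(SAMPLE)
collector.close()
print(collector.found)
```

The markup stays valid, ordinary HTML, which is the "easy to implement" half of the trade-off; the "limited formats" half is that only the agreed class names for a handful of content types mean anything.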

MICROFORMATS: BON APPÉTIT

MICRODATA

A WHATWG HTML5 specification used to nest semantics within existing content on web pages

Officially supported by Bing, Yahoo, & Google

Can embed other markup languages like RDFa, microformats, and Dublin Core

Not well-known (yet)
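As a rough sketch of what "nesting semantics within existing content" looks like, the function below wraps a recipe in microdata attributes (`itemscope`, `itemtype`, `itemprop`). The function name `recipe_to_microdata` and the sample recipe are hypothetical; `Recipe`, `name` and `recipeIngredient` are taken from the current schema.org vocabulary, which may differ from the property names in use when these slides were written.

```python
from html import escape


def recipe_to_microdata(name, ingredients):
    """Render a recipe as HTML annotated with schema.org microdata."""
    lines = ['<div itemscope itemtype="https://schema.org/Recipe">']
    lines.append(f'  <h1 itemprop="name">{escape(name)}</h1>')
    lines.append("  <ul>")
    for item in ingredients:
        lines.append(f'    <li itemprop="recipeIngredient">{escape(item)}</li>')
    lines.append("  </ul>")
    lines.append("</div>")
    return "\n".join(lines)


# Made-up sample data.
print(recipe_to_microdata("Lemon Bars", ["2 cups flour", "4 lemons"]))
```

The visible page is unchanged; the attributes only add machine-readable labels to elements that were already there.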

MICRODATA: STEVE: THE MUSEUM SOCIAL TAGGING PROJECT

OPEN GRAPH PROTOCOL

Facebook-created markup language that turns any web page into an Open Graph object, allowing any page to become a Facebook page

I “Like” you

Good for targeted advertising

Limited in scope
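Open Graph Protocol markup lives in `<meta property="og:..." content="...">` tags in the page head, and the protocol names four required properties: `og:title`, `og:type`, `og:image` and `og:url`. The sketch below reads those tags with Python's standard `html.parser` and reports which required ones are missing. The `OpenGraphParser` class, the `missing_required` helper and the sample page (including its example.com URL) are assumptions made for illustration.

```python
from html.parser import HTMLParser

REQUIRED = ("og:title", "og:type", "og:image", "og:url")

# Made-up sample page; og:image is deliberately left out.
SAMPLE_PAGE = """
<html><head>
<meta property="og:title" content="Lemon Bars" />
<meta property="og:type" content="article" />
<meta property="og:url" content="https://example.com/recipes/lemon-bars" />
</head><body></body></html>
"""


class OpenGraphParser(HTMLParser):
    """Collect og:* properties from <meta> tags."""

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        content = attributes.get("content")
        if prop.startswith("og:") and content is not None:
            self.properties[prop] = content


def missing_required(properties):
    """List the required Open Graph properties that a page does not set."""
    return [name for name in REQUIRED if name not in properties]


og = OpenGraphParser()
og.feed(SAMPLE_PAGE)
og.close()
print(og.properties)
print(missing_required(og.properties))
```

A check like this could run site-wide before launch, which fits the small, fixed scope of the protocol.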

OGP: MARTHA STEWART

BACK-OF-THE-NAPKIN COMPARISON

Features: RDF/OWL, RDFa, MF, MD, OGP

W3C standard: X X X

Extensible: X X X

Pre-existing Vocabs: X X

Uses URIs: X X

Easy to implement: X X X X

HTML 5 compliant: X X X

Inferencing: X

STATUS REPORT ON S SEMANTIC WEB

Linked Open Data graph growing

Many countries have developed government sites with rich semantics

Development of Semantic search

More widespread adoption of lighter semantics

WHERE WE MIGHT BE GOING

Pharmaceutical industry identifies trends across clinical studies, and not just within them

News industry better targets content by locale

Department of Defense using it to make better decisions in the field

Utilized in advertising to drive more and more revenue

WHAT WE USED TO BELIEVE

TIME INC. AND TOPICS

TIME INC

Largest magazine media company in U.S.

48 websites worldwide

Websites attract more than 50M unique visitors each month

Domains include lifestyle, entertainment, style, news, sports, and business

Early adopter (2005-2006) of SW technologies

GOALS

Enhance data integrity

Improve editorial efficiency

Create contextual presentation of content

Develop relationships that cannot be derived from content

Share resources among titles

Improve search and facilitate guided navigation

CHALLENGES

Aging CMS with sites on different versions

Many different domains

Scalability to accommodate volume of data and development of complex relationships

Lack of resources, money, and time


Star Wars: Episode I -- The Phantom Menace

Episode 1

Episode I

Phantom Menace

Star Wars Episode I The Phantom Menace

Star Wars Episode I: The Phantom Menace

Star Wars prequel

Star Wars: Episode 1 -- The Phantom Menace

Star Wars: Episode i -- the Phantom Menace

Star Wars: Episode I: The Phantom Menace

Star Wars: Episode I--The Phantom Menace

Star Wars: Episode I--The Phantom Menance

Star Wars: Episode One -- The Phantom Menace

Star Wars: The Phantom Menace

Star Wars: The Phantom Menace -- Episode I

The Phantom Menace

The Phanton Menace

WHY WE NEED CONTROLLED VOCABULARIES (OR WHY FREEFORM KEYWORDS JUST DON’T WORK)

Star Wars: Episode I -- The Phantom Menace
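A controlled vocabulary boils down to mapping every variant an editor might type onto one preferred term. The sketch below shows that idea with a handful of the free-form keywords from this slide, misspellings included; the names `PREFERRED`, `VARIANTS`, `normalize` and `to_preferred` are hypothetical, and a real vocabulary tool would manage far more than one term.

```python
import re

PREFERRED = "Star Wars: Episode I -- The Phantom Menace"

# A few of the free-form variants listed on the slide, misspellings included.
VARIANTS = [
    "Episode 1",
    "Phantom Menace",
    "Star Wars prequel",
    "Star Wars: Episode I--The Phantom Menance",
    "The Phanton Menace",
]


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences don't matter."""
    return re.sub(r"\s+", " ", text).strip().casefold()


# Every known variant, and the preferred term itself, points at one entry.
INDEX = {normalize(v): PREFERRED for v in VARIANTS + [PREFERRED]}


def to_preferred(keyword):
    """Return the controlled term for a keyword, or None if it is unknown."""
    return INDEX.get(normalize(keyword))


print(to_preferred("  the PHANTON   menace "))
print(to_preferred("Casino Royale"))
```

Returning `None` for an unknown keyword, instead of accepting it, is the point: new terms go through a librarian rather than straight into the tag pool.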

WHAT STANDARD TO ADOPT?

RDF: Flexible; Scalable; Fits business needs; New technology but industry standard

Microformats: Easy to implement; No inferencing; Solved some business needs but not all; No standards; Limited formats

SEARCH FOR VENDORS

In 2005, few commercial RDF/OWL tools were available that fit our needs

Open source reasoners like Jena and a proprietary design seemed more cost-effective and realistic

TOPICS

Time Ontologies for Publishing, Inference, Classification and Semantics

WHAT IS TOPICS?

Librarian Tool – allows librarians to create resources and properties

Relationship Tool - generates unambiguous connections between data

Classification Tool - allows editors to add uniform, structured metadata to content

Semantic reasoner - finds new facts from existing data

Query Engine - manages logical retrieval of data

TECHNICAL DETAILS OF SYSTEM

Java application

Jena semantic reasoner

Joseki query engine

Sybase database

SITES

AOL Home

InStyle

Entertainment Weekly

People

This Old House

AOL HOME

Features

Faceted browse

Related content

REAL SIMPLE

Features

Faceted browse

Related content

INSTYLE

Features

Faceted browse

Related content

Improved search

Navigational taxonomy

ENTERTAINMENT WEEKLY

Features

Aggregated content

Related content

Improved search

Sharing of resources among titles

PEOPLE

Features

Aggregated content

Related content

Improved search

Sharing of resources among titles

THIS OLD HOUSE

Features

Aggregated content

Navigational taxonomy

Improved search

Related content

Faceted browse

THIS OLD HOUSE

STRENGTHS OF TOPICS

Utilizes URIs

Sharable

Create once use many times

Unambiguous relationships

Facilitates aggregation of content

Controlled SEO keywords


WEAKNESSES OF TOPICS

Creates massive database of RDF triples

Expensive to query

Based on unsupported open source code (Jena)

Polyhierarchy makes it difficult to create navigational taxonomies


QUERY RESULTS

set taxons [TII_TOPICS_GET_ENTITY "MediaProductsTax:MovieCasinoRoyale"]

WHAT WE KNOW NOW

MARTHA STEWART AND LITTLE “S” SEMANTICS

MARTHA STEWART LIVING OMNIMEDIA

MSLO is a publishing, broadcasting, and merchandising business

Extensive cross-promotion of content and products

3 websites and numerous digital apps

Domains include home, food, weddings, and healthy living

GOALS

Enhance data integrity

Improve editorial efficiency

Share resources among titles and types of content

Create contextual presentation of content

Improve search and facilitate guided navigation

CHALLENGES

Between CMSs: Vignette to Drupal 6

Limited resources, time, money: Working on new CMS

Fuzzy business requirements: Unclear plan for redesign

SEARCH FOR STANDARDS

RDF/OWL

RDFa

Microformats

Microdata

Open Graph Protocol

DECISIONS DECISIONS

RDF/OWL

Expensive to implement

No easy HTML 5 implementation

No business reason to undertake such a large endeavor

Roadblocks (Lots), LOE (Great), Time (Massive), Resources (Plenty)

DECISIONS DECISIONS

RDFa: No easy HTML 5 implementation

Microformats: Useful for recipes but limited formats

Microdata: Useful for recipes, but new and untested

Open Graph Protocol: Facebook use only, but critical to deploy ASAP

JUST ENOUGH SEMANTICS

Now

Microformats: Google Rich Snippets and Recipe search

OGP: Site-wide implementation

Next up

Probably Microdata from Schema.org: Google approved; integration of other formats; shiny and new, untested

LESSONS LEARNED (SO FAR)

Educate the troops

Buy-in from senior leadership

Loose, but coherent implementation plan

Concise, easy-to-reach business goals to start

One content type to start, then branch out

WHAT’S NEXT FOR MARTHA

Microdata deployed across all sites

Development of more sophisticated relationships with our content

Roll out of more robust faceted search

Integration of all content types into topic pages

FUTURE OF SEMANTIC WEB

Move from web of objects to web of data

More personalized experiences

Positive impact on content management costs

Classifying content well allows for unanticipated uses and users; cataloging allows for audience targeting.

QUESTIONS?

Barbara McGlamery, Taxonomist, Martha Stewart Living Omnimedia