smxeastbarbarastarr2012

Post on 15-Dec-2014

DESCRIPTION

Schema 101, Why the New Metadata Matters. "From a Search Engine Perspective" SMX East 2012

TRANSCRIPT

Why Metadata Matters: From a Search Engine Perspective.

Schema 101

By: Barbara Starr

Twitter: @BarbaraStarr

Email: bstarr@Ontologica.us

• Pursued a doctorate in Artificial Intelligence from South Africa in the 80's.

• Recruited to build intelligent/predictive trading systems on Wall Street

• Migrated to government-based contracts, several of which turned into real-world products like

– SIRI (PAL from DARPA)

– WATSON (Acquaint - IBM Watson Labs was a team member)

• From the vantage of a semantic technologist, I keenly watched the evolution of the Semantic Web.

• “Shocked into the real world” when working as a consultant @ Overstock

• Today - Educator, Consultant, Developer.

Meta Information: ME

By: Barbara Starr

Twitter: @BarbaraStarr

Email: bstarr@Ontologica.us

LinkedIn: http://www.linkedin.com/in/barbarastarr

My favorite author: Isaac Asimov

Favorite book: I, Robot

Favorite character: MULTIVAC

Additional Meta Information

For the purpose of this talk:

same-as

MY ROBOT or Artificially Intelligent Entity or Search Engine

SEARCH ENGINE POINT OF VIEW

How can I exploit metadata or “semantic search”?

SEARCH ENGINE POINT OF VIEW

Rich Snippets (2009)

tiles

SearchMonkey (2008)

I can directly extract information to enhance SERP displays
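As a rough illustration of "directly extracting" information, here is a minimal sketch that pulls flat `itemprop` values out of a page fragment and turns them into a one-line snippet. The sample markup, the class name `ItempropExtractor`, and the helper `render_snippet` are all assumptions made up for this example; real crawlers handle nested items, multiple values, and malformed HTML, which this sketch does not.

```python
from html.parser import HTMLParser

# Hypothetical page fragment marked up with schema.org microdata.
SAMPLE_HTML = """
<div itemscope itemtype="http://schema.org/Product">
  <span itemprop="name">Trail Running Shoe</span>
  <span itemprop="price">59.99</span>
  <meta itemprop="priceCurrency" content="USD">
</div>
"""


class ItempropExtractor(HTMLParser):
    """Collects flat itemprop name/value pairs (nested items are not handled)."""

    def __init__(self):
        super().__init__()
        self.properties = {}
        self._open_prop = None  # itemprop whose text content is being read

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        prop = attr_map.get("itemprop")
        if prop is None:
            return
        if "content" in attr_map:
            # Value carried in an attribute, e.g. <meta itemprop=... content=...>
            self.properties[prop] = attr_map["content"]
        else:
            # Value carried in the element's text content
            self._open_prop = prop
            self.properties[prop] = ""

    def handle_data(self, data):
        if self._open_prop is not None:
            self.properties[self._open_prop] += data.strip()

    def handle_endtag(self, tag):
        self._open_prop = None


def extract_properties(html):
    """Return a dict of itemprop -> value found in the given HTML."""
    parser = ItempropExtractor()
    parser.feed(html)
    return parser.properties


def render_snippet(props):
    """Build a one-line, SERP-style summary from extracted properties."""
    return f"{props['name']} - {props['priceCurrency']} {props['price']}"


print(render_snippet(extract_properties(SAMPLE_HTML)))
```

The point of the sketch is only that marked-up values can be read without guessing at the page layout; the display format is arbitrary.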

SEARCH ENGINE POINT OF VIEW

I can search directly on consumed metadata!

SEARCH ENGINE POINT OF VIEW

I can provide direct answers to queries by searching on consumed, verified and validated information

SEARCH ENGINE POINT OF VIEW

I can even aggregate answers or deduce them (like a timeline of events)
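A toy sketch of the last three slides, assuming consumed metadata is held as simple (subject, property, value) facts. The entity names, dates, and the functions `direct_answer` and `timeline` are invented for illustration and do not describe how any real search engine stores or queries its data.

```python
from datetime import date

# Hypothetical facts, as if consumed from structured markup on several pages.
FACTS = [
    ("ExampleCorp", "foundingDate", date(2001, 5, 1)),
    ("ExampleCorp", "location", "Phoenix"),
    ("ExampleCorp Launch Party", "startDate", date(2001, 6, 15)),
    ("ExampleCorp Anniversary", "startDate", date(2011, 5, 1)),
]


def direct_answer(subject, prop):
    """Search directly on consumed metadata: return the first matching value."""
    for s, p, value in FACTS:
        if s == subject and p == prop:
            return value
    return None


def timeline(prop):
    """Aggregate every fact with the given date property into a sorted timeline."""
    dated = [(value, s) for s, p, value in FACTS if p == prop]
    return sorted(dated)


print(direct_answer("ExampleCorp", "location"))
for when, what in timeline("startDate"):
    print(when.isoformat(), what)
```

A direct answer is a single lookup; the timeline is an aggregation over many facts that no single page needs to state in full.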

SEARCH ENGINE POINT OF VIEW

I can even use it in conjunction with machine learning techniques, e.g. to train other components

I can detect relevancy signals, i.e. what content to show to which audience

I can use it to assist in interpreting a user query

Penn Treebank tagset

?

SEARCH ENGINE POINT OF VIEW

Really interesting in terms of exposing long-tail content too. It makes things findable for me when pages are published with structured markup!

I meant the beer brewer in Arizona

SEARCH ENGINE POINT OF VIEW

I’m a Search Engine Robot

I could really use this stuff. And it is like the Tower of Babel out there!

Syntax: Microdata, Microformats, RDFa

Ontology: Vocabulary or lexicon

Multiple conflicting vocabularies that I will have to align internally, and multiple syntax formats as well.

Prior to Schema.org

GoodRelations for e-commerce
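To make the "align internally" burden concrete, here is a minimal sketch of mapping properties from several vocabularies onto one internal field set. The `ALIGNMENT` table, the internal field names `title` and `price`, and the `align` function are assumptions for this example; the property names are taken from schema.org, GoodRelations, and the hProduct microformat, but a real alignment would be far larger and would also deal with differing data models, not just differing names.

```python
# Assumed mapping from (vocabulary, property) to an internal field name.
ALIGNMENT = {
    ("schema.org", "name"): "title",
    ("schema.org", "price"): "price",
    ("goodrelations", "gr:name"): "title",
    ("goodrelations", "gr:hasCurrencyValue"): "price",
    ("hproduct", "fn"): "title",
    ("hproduct", "price"): "price",
}


def align(vocabulary, record):
    """Translate one extracted record into internal field names.

    Properties with no known mapping are dropped.
    """
    aligned = {}
    for prop, value in record.items():
        internal = ALIGNMENT.get((vocabulary, prop))
        if internal is not None:
            aligned[internal] = value
    return aligned


# The same hypothetical product, published in two different vocabularies.
from_gr = align("goodrelations", {"gr:name": "Widget", "gr:hasCurrencyValue": "9.99"})
from_hp = align("hproduct", {"fn": "Widget", "price": "9.99"})
print(from_gr == from_hp)
```

Every additional vocabulary adds rows to this table, which is the maintenance cost the slide is complaining about.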

SEARCH ENGINE POINT OF VIEW

Time to get Serious!

What has been the history?

Percentage of URLs with embedded metadata in various formats

Five-fold increase between March, 2009 and October, 2010

Another five-fold increase between October 2010 and January, 2012

RDFa exploded in 2012 (Source: Peter Mika, Yahoo)

Current state of metadata on the Web

• 31% of webpages, 5% of domains contain some metadata

– Analysis of the Bing Crawl (US crawl, January, 2012)

– RDFa is most common format

• By URL: 25% RDFa, 7% microdata, 9% microformat

• By eTLD (PLD): 4% RDFa, 0.3% microdata, 5.4% microformat

– Adoption is stronger among large publishers

• Especially for RDFa and microdata

• See also

– P. Mika, T. Potter. Metadata Statistics for a Large Web Corpus, LDOW 2012

– H. Mühleisen, C. Bizer. Web Data Commons - Extracting Structured Data from Two Large Web Corpora, LDOW 2012

What’s been the History

Linked Open Data exploded from 2007 thru 2010

[Figures: LOD Cloud snapshots for Oct 2007, Nov 2007, Sept 2008, March 2009 and Sept 2010]

SEARCH ENGINE POINT OF VIEW

Align and consume many vocabularies that may not be of interest to search engines?

Rather mandate vocabulary and syntax - microdata

A Search Engine alliance has the power to MANDATE vocabulary and syntax!

Sample portion

SEARCH ENGINE POINT OF VIEW

On the other hand, not wise to ignore standards bodies like W3C

No mandate on Syntax

SEARCH ENGINE POINT OF VIEW

Did I tell you I don’t like spam?

SEARCH ENGINE POINT OF VIEW

Make sure you are not cloaking by feeding one set of information to me and another to human users!

Ensure your data feeds match information with the structured markup or “metadata” on your web pages.
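A minimal sketch of the feed-versus-markup check advised above, assuming both sources have already been reduced to plain dictionaries with the same field names. The function `find_mismatches` and the sample values are hypothetical; this is not a description of how any search engine or merchant tool performs the comparison.

```python
def find_mismatches(feed_item, page_markup):
    """Return {field: (feed_value, page_value)} for every field that differs.

    A field present in the feed but absent from the page counts as a mismatch,
    with None as the page value.
    """
    mismatches = {}
    for field, feed_value in feed_item.items():
        page_value = page_markup.get(field)
        if page_value != feed_value:
            mismatches[field] = (feed_value, page_value)
    return mismatches


# Hypothetical product as submitted in a data feed and as marked up on the page.
feed_item = {"name": "Widget", "price": "9.99", "availability": "InStock"}
page_markup = {"name": "Widget", "price": "12.99"}

print(find_mismatches(feed_item, page_markup))
```

Running a check like this before publishing is one way to catch a stale price or a missing property before a crawler does.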


SEARCH ENGINE POINT OF VIEW

Serving RELEVANT ANSWERS is IMPERATIVE, and central to my very being!

SEARCH ENGINE POINT OF VIEW

ELSE I AM

SEARCH ENGINE POINT OF VIEW

X

SEARCH ENGINE POINT OF VIEW

Adding context in search verticals really helps me serve up relevant information (seriously increases my recall), as does geospatial information.

Consumed information - Structured Data Dashboard

Google’s “Search Verticals”

Notice any correlations? I would advise you to!

Oh, and be sure to check out Moore's law

SEARCH ENGINE POINT OF VIEW

I also have a pretty good understanding of big data and web intelligence so I can leverage them!

SIRI

“Amazing fact: same amount of computing to answer one Google Search query as all the computing done -- in flight and on the ground -- for the entire Apollo program!”

SEARCH ENGINE POINT OF VIEW

I can leverage metadata for better image search

SIRI

I can combine it with computer vision techniques.

I can enhance the user’s shopping experience.

SEARCH ENGINE POINT OF VIEW

Know rather than Recognize?

INTRODUCING THE KNOWLEDGE GRAPH

Symbolic reasoning vs stochastic reasoning (the latter is more like NLP or PageRank)

SEARCH ENGINE POINT OF VIEW

Talk of increase in screen real estate and CTR?

And if you thought the knowledge graph was cool, check out the knowledge carousel!

SEARCH ENGINE POINT OF VIEW

Thank you for your time!

And just by the by, this technology is still in its nascent stages. Can you imagine what I will be able to do soon?

Barbara Starr

Email: bstarr@ontologica.us

Twitter: @BarbaraStarr

Resources to help you! Make sure to use them wisely!

Resources at this point in time

Caveat: Some training may be required for some of the tools

Programming Languages:

JavaScript: Microdatajs, Live microdata

PHP: MicrodataPHP

Ruby: RDF Microdata, RDF Lib plugin

Perl

Ruby: RDF Microdata Gem, Mida

Java: Sindice any23 library

Publishing

Form-based tools: Schema Creator, Microdata generator

Standalone tools: Web.instadata

Editors: Topbraid Composer, Protege

Platforms: Drupal, Joomla, Wordpress (about 7 of them), Virtuoso, Topbraid Composer

Validators, Testers and More: Check.rdfa.info, Sindice Inspector, Rich Snippets Testing Tool, Bing Validator, Structured Data Linter, Online Parser/viewer and RSS generator, Validator.nu, Google Structured Data Tester

Resources at this point in time

GoodRelations: Resources, generators, validators, more, ….

Franz new tool: soon to be released for SEO

Other Semantic Web Resources

OpenCalais: can extract information about people, places and things

AlchemyAPI: named entity extraction, topic recognition, keyword tagging, more ….

Cogito: Expert System

Franz Inc.: Gruff

Many more….

For more info contact:

Barbara Starr

Twitter: @BarbaraStarr

Email: bstarr@Ontologica.us

LinkedIn: http://www.linkedin.com/in/barbarastarr

