freebase - semantic technologies 2010 code camp

Freebase: A socially managed semantic database. Jamie Taylor, SemTech 2010 Data Camp

Uploaded by jamie-taylor on 20-Aug-2015




Freebase has Many Types of Things


12 Million Topics


[Diagram: topics linked by relations such as education, nationality, contained-by, contains, member-of, event, albums, and label]

400 Million Relations
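Each of those relations links one topic to another, so the graph can be thought of as a large set of (subject, relation, object) links. The sketch below is a toy in-memory model, not Freebase's actual graph store; the sample topics and the helper names `build_index` and `follow` are hypothetical. It shows how a relation such as contained-by can be followed repeatedly to walk from a city up to its country.

```python
from collections import defaultdict

# Toy graph: each relation is stored as (subject, relation, object).
# These few links are illustrative, not taken from a Freebase dump.
TRIPLES = [
    ("San Francisco", "contained-by", "California"),
    ("California", "contained-by", "United States"),
    ("Shakira", "nationality", "Colombia"),
]

def build_index(triples):
    """Index links by (subject, relation) so outgoing edges can be looked up quickly."""
    index = defaultdict(list)
    for subject, relation, obj in triples:
        index[(subject, relation)].append(obj)
    return index

def follow(index, start, relation):
    """Return every topic reachable from `start` by repeatedly following `relation`."""
    seen = []
    frontier = [start]
    while frontier:
        node = frontier.pop()
        # .get() avoids inserting empty entries into the defaultdict
        for nxt in index.get((node, relation), []):
            if nxt not in seen:
                seen.append(nxt)
                frontier.append(nxt)
    return seen

index = build_index(TRIPLES)
print(follow(index, "San Francisco", "contained-by"))  # ['California', 'United States']
```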


What’s in Freebase?


http://www.myspace.com/shakira

http://www.facebook.com/shakira

http://twitter.com/shakira

http://www.daylife.com/topic/Shakira

http://www.bestbuy.com/site/She+Wolf…

http://www.guardian.co.uk/music/shakira

http://www.last.fm/music/Shakira

http://www.netflix.com/RoleDisplay/Shakira/20046629


99% pure

All data undergoes rigorous QA before load

Major focus is reconciliation

Use sampling to assure 99% accuracy

Data that does not meet 99% accuracy is not loaded
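The sampling rule above can be expressed as a simple load gate: draw a random sample from a candidate batch, check each sampled fact, and load the batch only if the measured accuracy reaches 99%. This is a minimal sketch under stated assumptions: `is_correct` stands in for whatever human or automated check was actually used, and the default sample size of 500 is hypothetical. It uses a plain point estimate; a stricter process might require a confidence bound to clear the threshold instead.

```python
import random

def estimate_accuracy(records, is_correct, sample_size, seed=0):
    """Estimate the fraction of correct records from a simple random sample."""
    if not records:
        raise ValueError("cannot estimate the accuracy of an empty batch")
    rng = random.Random(seed)  # seeded so an audit can be reproduced
    sample = rng.sample(records, min(sample_size, len(records)))
    correct = sum(1 for record in sample if is_correct(record))
    return correct / len(sample)

def should_load(records, is_correct, sample_size=500, threshold=0.99):
    """Accept a batch only if its sampled accuracy meets the 99% threshold."""
    return estimate_accuracy(records, is_correct, sample_size) >= threshold

# Hypothetical batch: 1,000 candidate facts, 20 of which are wrong.
batch = [{"ok": i % 50 != 0} for i in range(1000)]
print(should_load(batch, lambda fact: fact["ok"]))
```

When the sample size is at least the batch size, the whole batch is checked and the estimate becomes exact.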


What's been built on Freebase?


Up to 100,000 Queries a Day

Quarterly dumps of the graph: http://download.freebase.com
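Because the dumps are large, they are best processed as a stream, one line at a time. The sketch below assumes a tab-separated layout of four columns (source, property, destination, value, with destination or value possibly empty); that layout and the sample rows are assumptions for illustration, so check the format notes that ship with a dump before relying on it. It counts how often each property occurs.

```python
from collections import Counter

def count_properties(lines):
    """Count property occurrences in a tab-separated dump.

    Assumed layout per line: source <TAB> property <TAB> destination <TAB> value.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        counts[fields[1]] += 1
    return counts

# Hypothetical rows in the assumed layout.
SAMPLE = [
    "/en/blade_runner\t/type/object/type\t/film/film\t\n",
    "/en/blade_runner\t/film/film/directed_by\t/en/ridley_scott\t\n",
    "/en/blade_runner\t/type/object/name\t/lang/en\tBlade Runner\n",
    "\n",
]
print(count_properties(SAMPLE).most_common(3))
```

An open file object can be passed in place of the list, for example `count_properties(open(path, encoding="utf-8"))` with a hypothetical local `path`, since iterating a file yields one line at a time.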


Users extend the data model

Users contribute data


The Freebase Commons

American football · Anime/Manga · Architecture · Astronomy · Automotive · Aviation · Awards · Baseball · Basketball · Bicycles · Biology · Boats · Broadcast · Business · Celebrities · Chemistry · Comics · Common · Computers · Conferences · Cricket · Data World · Digicams · Education · Engineering · Event · Clothing and Textiles · Fictional Universes · Film · Food & Drink · Freebase · Games · Geology · Government · Hobbies and Interests · Ice Hockey · Influence · Internet · Language · Law · Library · Location · Martial Arts · Measurement Unit · Media Common · Medicine · Metaweb Types · Meteorology · Military · Music · Olympics · Opera · Organization · People · Geography · Projects · Protected Places · Publishing · Radio · Rail · Religion · Royalty · Soccer · Spaceflight · Sports · Symbols · Tennis · Theater · Time · Transportation · Travel · TV · Video Games · Visual Art

Top-level domains

schema = vocabulary


The Scope of Schema

10,448 Properties describing 4,936 Types* organized into 641 Domains (77 Commons)

*types with 10 or more instances
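The property/type/domain hierarchy is visible in the IDs themselves: a property such as /film/film/release_date_s belongs to the type /film/film, which belongs to the domain /film. A minimal sketch that splits an ID along that hierarchy, assuming the rule "last segment is the property name, the path before it is the type, and the path before that is the domain" (the `PropertyId` record and `parse_property_id` helper are hypothetical):

```python
from typing import NamedTuple

class PropertyId(NamedTuple):
    domain_id: str   # e.g. "/film"
    type_id: str     # e.g. "/film/film"
    name: str        # e.g. "release_date_s"

def parse_property_id(prop_id: str) -> PropertyId:
    """Split a property ID into its domain, type and short property name."""
    type_id, _, name = prop_id.rpartition("/")
    domain_id, _, _ = type_id.rpartition("/")
    if not domain_id or not name:
        raise ValueError(f"expected /domain/type/property, got {prop_id!r}")
    return PropertyId(domain_id, type_id, name)

for pid in ["/film/film/release_date_s", "/film/film/directed_by"]:
    print(parse_property_id(pid))
```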


[Chart: Type Instances. Number of instances per type (log scale, 1 to 100,000,000) plotted against type rank (0 to 11,000)]

Strength through Exemplars

>10 instances, 4,936 types

1424 Commons


MQL

[{ "name" : null, "type" : "/film/film"}]

Metaweb Query Language
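An MQL query is plain JSON, so a client only has to wrap it in a request envelope and send it over HTTP. The sketch below builds such a request URL for the query above. The `{"query": ...}` envelope and the `api.freebase.com/api/service/mqlread` endpoint are recalled from the 2010-era API and should be treated as assumptions; the public Freebase services have since been retired, so the code only constructs the URL and does not send it. Note that JSON `null` is written as Python `None`.

```python
import json
import urllib.parse

# Assumed 2010-era read endpoint; the service is no longer available.
MQLREAD_URL = "http://api.freebase.com/api/service/mqlread"

def mqlread_url(query):
    """Wrap an MQL query in a {"query": ...} envelope and encode it as a GET URL."""
    envelope = {"query": query}
    return MQLREAD_URL + "?" + urllib.parse.urlencode({"query": json.dumps(envelope)})

# The "all films" query from the slide; None serializes as JSON null.
films = [{"name": None, "type": "/film/film"}]
print(mqlread_url(films))
```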


[{
  "name": null,
  "type": "/film/film",
  "directed_by": {"id": "/en/george_lucas"},
  "starring": [{
    "actor": {"id": "/en/harrison_ford"}
  }]
}]

MQL
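MQL is query-by-example: the query has the shape of the answer, `null` marks the slots to fill in, and every value that is given acts as a constraint. To make those semantics concrete, here is a toy evaluator over a small hand-written list of films. It is not the real MQL engine, which ran on the server against the Freebase graph. It supports only nested objects, `null`, list patterns and literal equality, and the film records and any IDs not in the query are illustrative.

```python
# Sentinel for "does not satisfy the pattern" (None is already MQL's null).
NO_MATCH = object()

def match(pattern, value):
    """Return a filled-in copy of `pattern` if `value` satisfies it, else NO_MATCH."""
    if pattern is None:
        return value  # null: no constraint, report whatever is there
    if isinstance(pattern, dict):
        if not isinstance(value, dict):
            return NO_MATCH
        result = {}
        for key, sub in pattern.items():
            if key not in value:
                if sub is None:
                    result[key] = None  # requested but absent
                    continue
                return NO_MATCH
            got = match(sub, value[key])
            if got is NO_MATCH:
                return NO_MATCH
            result[key] = got
        return result
    if isinstance(pattern, list):
        if not isinstance(value, list):
            return NO_MATCH
        if not pattern:
            return list(value)  # [] asks for every value
        # [ {...} ]: keep matching elements; at least one must match
        hits = []
        for item in value:
            got = match(pattern[0], item)
            if got is not NO_MATCH:
                hits.append(got)
        return hits or NO_MATCH
    return value if pattern == value else NO_MATCH  # literal equality

def run(query, records):
    """Evaluate a top-level [ {...} ] query against a list of records."""
    rows = []
    for record in records:
        got = match(query[0], record)
        if got is not NO_MATCH:
            rows.append(got)
    return rows

def film(name, director, *actors):
    """Build an illustrative film record shaped like the query."""
    return {"name": name, "type": "/film/film", "directed_by": {"id": director},
            "starring": [{"actor": {"id": a}} for a in actors]}

FILMS = [
    film("Star Wars", "/en/george_lucas", "/en/harrison_ford", "/en/mark_hamill"),
    film("American Graffiti", "/en/george_lucas", "/en/harrison_ford"),
    film("THX 1138", "/en/george_lucas", "/en/donald_pleasence"),
    film("Blade Runner", "/en/ridley_scott", "/en/harrison_ford"),
]

query = [{"name": None, "type": "/film/film",
          "directed_by": {"id": "/en/george_lucas"},
          "starring": [{"actor": {"id": "/en/harrison_ford"}}]}]
print([row["name"] for row in run(query, FILMS)])  # ['Star Wars', 'American Graffiti']
```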


[{
  "name": null,
  "type": "/film/film",
  "directed_by": {"id": "/en/george_lucas"},
  "starring": [{
    "actor": {
      "name": null,
      "film": [{
        "film": {"id": "/en/the_great_escape"}
      }]
    }
  }]
}]

Result: Donald Pleasence, THX 1138
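The nested query walks two hops: from George Lucas's films to their actors, then from each actor to the other films they appeared in, keeping only actors who were also in The Great Escape. A minimal sketch of the same join over a flat table of (actor, film, director) rows follows; the rows are a small hand-picked sample rather than Freebase data, and `shared_cast` is a hypothetical helper.

```python
# Illustrative sample rows: (actor, film, director of that film).
PERFORMANCES = [
    ("Donald Pleasence", "THX 1138", "George Lucas"),
    ("Robert Duvall", "THX 1138", "George Lucas"),
    ("Harrison Ford", "Star Wars", "George Lucas"),
    ("Donald Pleasence", "The Great Escape", "John Sturges"),
    ("Steve McQueen", "The Great Escape", "John Sturges"),
]

def shared_cast(rows, director, other_film):
    """Return sorted (actor, film) pairs where `film` was directed by `director`
    and `actor` also appears in `other_film`."""
    other_cast = {actor for actor, film, _ in rows if film == other_film}
    return sorted({(actor, film) for actor, film, film_director in rows
                   if film_director == director and actor in other_cast})

print(shared_cast(PERFORMANCES, "George Lucas", "The Great Escape"))
# [('Donald Pleasence', 'THX 1138')]
```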


Freebase Suggest


{
  "/type/object/name": "Blade Runner",
  "/type/object/type": "/film/film",
  "/film/film/starring/actor": ["Harrison Ford", "Rutger Hauer"],
  "/film/film/director": "Ridley Scott",
  "/film/film/release_date_s": "1981"
}

Reconciliation

[
  {
    "id": "/guid/9202a8c04000641f8000000000009e89",
    "name": ["Blade Runner", "Bladerunner"],
    "score": 1.4320519,
    "match": true,
    "type": ["/common/topic", "/film/film", "/media_common/adapted_work", "/award/award_winning_work"]
  },
  {
    "id": "/guid/9202a8c04000641f80000000002643d0",
    "name": ["Blade"],
    "score": 0.48852453,
    "match": false,
    "type": ["/common/topic", "/film/film", "/award/award_winning_work", "/award/award_nominated_work"]
  }
]

http://data.labs.freebase.com/recon/
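A client consuming a reconciliation response like the one above mainly needs to decide which candidate, if any, to accept. The sketch below uses only the fields shown on the slide (`id`, `name`, `score`, `match`) and prefers a candidate the service flagged as a match. The optional `min_score` fallback is a hypothetical policy of our own, not part of the recon service, and the sketch does not show how requests were sent to the service.

```python
import json

def pick_match(candidates, min_score=None):
    """Return the id of the best candidate flagged "match"; if none is flagged,
    optionally fall back to the top-scoring candidate at or above `min_score`."""
    flagged = [c for c in candidates if c.get("match")]
    if flagged:
        return max(flagged, key=lambda c: c["score"])["id"]
    if min_score is not None and candidates:
        best = max(candidates, key=lambda c: c["score"])
        if best["score"] >= min_score:
            return best["id"]
    return None  # leave unmatched for human review

# The two candidates from the slide, with the "type" lists omitted.
response = json.loads("""[
  {"id": "/guid/9202a8c04000641f8000000000009e89",
   "name": ["Blade Runner", "Bladerunner"], "score": 1.4320519, "match": true},
  {"id": "/guid/9202a8c04000641f80000000002643d0",
   "name": ["Blade"], "score": 0.48852453, "match": false}
]""")
print(pick_match(response))  # /guid/9202a8c04000641f8000000000009e89
```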


Topic Blocks


Topic API

http://www.freebase.com/experimental/topic/standard?id=/en/ncis

Shortcut to building Topic displays

Two forms:

basic (names, types, description)

standard (basic + keys, properties)
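Topic API requests are plain GET URLs keyed by topic ID. A small helper that builds them is sketched below. The standard-form URL pattern is copied from the slide; the basic-form path is inferred from the two forms listed above and is an assumption. The experimental endpoint no longer exists, so this only constructs URLs. `topic_url` is a hypothetical helper name.

```python
from urllib.parse import quote

# Pattern taken from the slide's example URL; "basic" is an inferred variant.
TOPIC_API = "http://www.freebase.com/experimental/topic/{form}?id={id}"

def topic_url(topic_id, form="standard"):
    """Build a Topic API URL for the "basic" or "standard" form of a topic."""
    if form not in ("basic", "standard"):
        raise ValueError("form must be 'basic' or 'standard'")
    # Keep "/" literal so IDs such as /en/ncis read as they do on the slide
    return TOPIC_API.format(form=form, id=quote(topic_id, safe="/"))

print(topic_url("/en/ncis"))  # http://www.freebase.com/experimental/topic/standard?id=/en/ncis
```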


Gridworks


Acre Development Environment


Getting Started++

• Freebase Documentation Hub
  • http://www.freebase.com/docs
• Developer Mailing List
  • http://lists.freebase.com/mailman/listinfo/freebase-discuss
  • http://freebase.markmail.org
• Real-time help on IRC
  • Freenode #freebase
• Freebase Happenings
  • http://blog.freebase.com
• About the Graph Store
  • Google: "ACM SIGMOD schema last tuple store"