Freebase New York Workshop 10 Dec 2009

Upload: metawebrobert

Post on 08-May-2015

DESCRIPTION

Intro slides for NY Freebase Workshop on Dec 10, 2009

TRANSCRIPT

Page 1: Ny Freebase Workshop 10 Dec 2009

Freebase New York Workshop

10 Dec 2009

Page 2: Ny Freebase Workshop 10 Dec 2009

Metaweb confidential – do not distribute

Presenters

Robert Cook

Jamie Taylor

Will Moffat

Page 3: Ny Freebase Workshop 10 Dec 2009

Today’s Workshop

9:30 – Intro

10:30 – Prepackaged Freebase solutions

12:30 – Lunch

1:15 – Connecting your data to Freebase

2:30 – Freebase in the data service ecosystem

3:30 – Wrap up, “office hours”

Page 4: Ny Freebase Workshop 10 Dec 2009

Agenda

Intro to Freebase

Freebase as an identity directory

The Freebase platform

Page 5: Ny Freebase Workshop 10 Dec 2009

Metaweb

Technology company based in San Francisco

~60 person team of engineers and business people

Venture funded, with long-term outlook

Focused on Freebase.com platform

Page 6: Ny Freebase Workshop 10 Dec 2009

Freebase is a database of entities

One entity per thing in the world

Stable, long-lived identifiers

Inclusive policy

Practical data

Focus on available data

People, places, products, etc.

/en/sienna_miller

/en/sony_dsc_s750

/en/frost_nixon_2008

Data to build apps

Names, images, descriptions

Dates, measurements and relationships

Page 7: Ny Freebase Workshop 10 Dec 2009

Actresses (37,079)

Page 8: Ny Freebase Workshop 10 Dec 2009

Football Players (16,568)

Page 9: Ny Freebase Workshop 10 Dec 2009

Cheeses (488)

Page 10: Ny Freebase Workshop 10 Dec 2009

Musical Instruments (1,034)

Page 11: Ny Freebase Workshop 10 Dec 2009

Airports (11,556)

Page 12: Ny Freebase Workshop 10 Dec 2009

TV Programs (33,630)

arrested_develop

Page 13: Ny Freebase Workshop 10 Dec 2009

Related entities are connected, forming a graph

Current stats:

• ~10M entities

• ~275M facts

• Continuous data input, cleanup, and syncing

• ~1,800 “types”

—Celebrity

—Movie

—TV show

—Book

—Company

—Location

—Sports team

—Product

—Etc.
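
One way to picture "entities connected into a graph" is as a set of (subject, property, object) facts indexed by subject. The sketch below is only an illustration: the entity IDs and the property names `acted_in` and `directed_by` are hypothetical placeholders, not real Freebase data or schema.

```python
from collections import defaultdict

# Hypothetical facts as (subject, property, object) triples.
# IDs and property names are placeholders, not real Freebase data.
FACTS = [
    ("/en/example_actor", "type", "/film/actor"),
    ("/en/example_actor", "acted_in", "/en/example_film"),
    ("/en/example_film", "type", "/film/film"),
    ("/en/example_film", "directed_by", "/en/example_director"),
]


def build_graph(facts):
    """Index facts by subject so each entity lists its outgoing edges."""
    graph = defaultdict(list)
    for subject, prop, obj in facts:
        graph[subject].append((prop, obj))
    return graph


def neighbors(graph, entity_id):
    """Return IDs reachable from entity_id in one hop, skipping type facts."""
    return [obj for prop, obj in graph[entity_id] if prop != "type"]


graph = build_graph(FACTS)
print(neighbors(graph, "/en/example_actor"))
```

Following `neighbors` repeatedly walks from an actor to a film to its director, which is the kind of traversal a graph of related entities makes possible.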

Page 14: Ny Freebase Workshop 10 Dec 2009

Each entity contains rich, structured metadata

Page 15: Ny Freebase Workshop 10 Dec 2009

Entities are language independent
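
A rough sketch of what language independence could look like in application code: one entity ID with names keyed by language. The entity, its names, and the fallback rule are assumptions made for illustration only.

```python
# Hypothetical entity: the ID and names are placeholders, not Freebase data.
ENTITY = {
    "id": "/en/example_city",
    "names": {"/lang/en": "Example City", "/lang/fr": "Ville Exemple"},
}


def display_name(entity, lang, fallback="/lang/en"):
    """Return the name in the requested language, else the fallback, else the ID."""
    names = entity["names"]
    return names.get(lang) or names.get(fallback) or entity["id"]


print(display_name(ENTITY, "/lang/fr"))
print(display_name(ENTITY, "/lang/de"))  # no German name, so the fallback is used
```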

Page 16: Ny Freebase Workshop 10 Dec 2009

As a writeable graph, Freebase gets better over time

• Add (or remove) entities

• Add (or remove) metadata (facts, keys, translations, etc.)

• Extend and improve the schemas

Page 17: Ny Freebase Workshop 10 Dec 2009

Bulk data into Freebase

15 person group dedicated to algorithmic data import, processing, and tools development

Reconciliation, reconciliation, reconciliation

Critical part of everything we do

Automate wherever possible

Crowdsource for tasks requiring human judgment (semi-automated)

Pipelined, ongoing syncing with large external sources (Wikipedia, partners, etc.)

Page 18: Ny Freebase Workshop 10 Dec 2009

Reconciliation

Guaranteeing one entity per thing in the world
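
As a minimal sketch of the idea (not Metaweb's actual pipeline, which the slides do not describe), reconciliation can be thought of as matching an incoming record against existing entities and only merging automatically when the match is unique. The matching rule here, normalized name plus type, and the sample directory are assumptions.

```python
import unicodedata


def normalize(name):
    """Lowercase, strip accents and collapse whitespace for comparison."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(stripped.lower().split())


def reconcile(record, directory):
    """Return the ID of the uniquely matching entity, or None.

    record: dict with "name" and "type".
    directory: dict mapping entity ID -> dict with "name" and "type".
    """
    key = (normalize(record["name"]), record["type"])
    matches = [
        entity_id
        for entity_id, entity in directory.items()
        if (normalize(entity["name"]), entity["type"]) == key
    ]
    # Zero or several candidates are left for human judgment.
    return matches[0] if len(matches) == 1 else None


# Hypothetical directory entries, for illustration only.
DIRECTORY = {
    "/en/example_cheese": {"name": "Example Cheese", "type": "/food/cheese"},
    "/en/example_airport": {"name": "Example Airport", "type": "/aviation/airport"},
}

print(reconcile({"name": "  example   CHEESE ", "type": "/food/cheese"}, DIRECTORY))
```

Returning `None` for ambiguous or missing matches mirrors the split described earlier in the deck: automate where possible, and route the rest to people.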

Page 19: Ny Freebase Workshop 10 Dec 2009

Reconciliation

Page 20: Ny Freebase Workshop 10 Dec 2009

Reconciliation

Page 21: Ny Freebase Workshop 10 Dec 2009

Reconciliation

Page 22: Ny Freebase Workshop 10 Dec 2009

“US Politicians who have taken more than $30K from foreign companies”

Page 23: Ny Freebase Workshop 10 Dec 2009

Freebase is open

Page 24: Ny Freebase Workshop 10 Dec 2009

Open platform means more data

Creative Commons Attribution (CC-BY) licensing

Apps

Robust set of APIs

HTTP/REST

SLAs for higher volume users (typically >100K API calls per day)

Hosted developer platform for building tools and apps on top of the data

Page 25: Ny Freebase Workshop 10 Dec 2009

External site data and/or keys

TV episode (715,032)

The TVDB, TV Rage, etc.

Beer (3,100)

The Oxford Bottled Beers Database

Page 26: Ny Freebase Workshop 10 Dec 2009

A global community is actively improving it

Curating existing data

sprocketonline: Jet Engines

spatialed: Hummingbirds

tfmorris: Maritime museums

Creating new data sets

Page 27: Ny Freebase Workshop 10 Dec 2009

The community is defining new schemas

Top-level domains:

∙ American football
∙ Anime/Manga
∙ Architecture
∙ Astronomy
∙ Automotive
∙ Aviation
∙ Awards
∙ Baseball
∙ Basketball
∙ Bicycles
∙ Biology
∙ Boats
∙ Broadcast
∙ Business
∙ Celebrities
∙ Chemistry
∙ Comics
∙ Common
∙ Computers
∙ Conferences
∙ Cricket
∙ Data World
∙ Digicams
∙ Education
∙ Engineering
∙ Event
∙ Clothing and Textiles
∙ Fictional Universes
∙ Film
∙ Food & Drink
∙ Freebase
∙ Games
∙ Geology
∙ Government
∙ Hobbies and Interests
∙ Ice Hockey
∙ Influence
∙ Internet
∙ Language
∙ Law
∙ Library
∙ Location
∙ Martial Arts
∙ Measurement Unit
∙ Media Common
∙ Medicine
∙ Metaweb Types
∙ Meteorology
∙ Military
∙ Music
∙ Olympics
∙ Opera
∙ Organization
∙ People
∙ Geography
∙ Projects
∙ Protected Places
∙ Publishing
∙ Radio
∙ Rail
∙ Religion
∙ Royalty
∙ Soccer
∙ Spaceflight
∙ Sports
∙ Symbols
∙ Tennis
∙ Theater
∙ Time
∙ Transportation
∙ Travel
∙ TV
∙ Video Games
∙ Visual Art

Page 28: Ny Freebase Workshop 10 Dec 2009

Agenda

Intro to Freebase

Freebase as an identity directory

The Freebase platform

Page 29: Ny Freebase Workshop 10 Dec 2009

Everybody is creating entities

Topic pages

User profiles

Artist pages

Other fans

Relevant apps

Page 30: Ny Freebase Workshop 10 Dec 2009

Millions of users are helping them

(Movies, Celebrities, Companies, Products, etc.)

@robcook (Person)

#sxsw09 (Event)

Page 31: Ny Freebase Workshop 10 Dec 2009

Will Smith (Actor)

Freebase is connecting these entities together

/index.html?curid=154698

/name/nm0000226

/BandsAndArtists/S/Smith,_Will

willsmith.com

/artist/Will+Smith

/Will-Smith/e/B000APUOJC

/people/s/will_smith

/RoleDisplay/86971

/artist/Will+Smith

/WillSmith

/music/Will+Smith
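
The external keys listed above can be thought of as rows in a directory that all resolve to the same entity. The sketch below uses three keys copied from the slide; the namespace labels (`site_a`, `site_b`, `site_c`) and the Freebase ID `/en/will_smith` are assumptions, since the slide does not name the sites or the ID.

```python
# Keys are copied from the slide; the namespace labels and the
# Freebase ID "/en/will_smith" are assumptions made for illustration.
KEY_DIRECTORY = {
    ("site_a", "/name/nm0000226"): "/en/will_smith",
    ("site_b", "/index.html?curid=154698"): "/en/will_smith",
    ("site_c", "/people/s/will_smith"): "/en/will_smith",
}


def resolve(namespace, key):
    """Look up the entity ID for an external key, or None if unknown."""
    return KEY_DIRECTORY.get((namespace, key))


def equivalent_keys(namespace, key):
    """Return every other external key that points at the same entity."""
    entity_id = resolve(namespace, key)
    if entity_id is None:
        return []
    return sorted(
        other
        for other, eid in KEY_DIRECTORY.items()
        if eid == entity_id and other != (namespace, key)
    )


print(equivalent_keys("site_a", "/name/nm0000226"))
```

With such a directory, a site that knows only its own key for an entity can find the matching pages on other sites.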

Page 32: Ny Freebase Workshop 10 Dec 2009

An entity directory can power new applications

Page 33: Ny Freebase Workshop 10 Dec 2009

Example:

1. Each film review is tagged with the corresponding movies in Freebase, e.g. The Incredibles (film) and Alfie (film)

2. When the page loads, it grabs data from Freebase (images, film info and links) to enhance the article

3. Freebase also returns links to related WSJ film reviews the user might enjoy (based on genre, director, actors, release year, etc.)

4. A Freebase search box allows the user to quickly find any film review in the WSJ archives
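
The "related reviews" step in this example could be approximated by counting attributes two films share. This is a sketch under stated assumptions, not the WSJ application's logic: the scoring rule, the field names, and the film records are all hypothetical.

```python
def shared_attributes(a, b):
    """Count attribute values two film records have in common."""
    score = 0
    for field in ("genres", "directors", "actors"):
        score += len(set(a.get(field, [])) & set(b.get(field, [])))
    if a.get("year") is not None and a.get("year") == b.get("year"):
        score += 1
    return score


def related_reviews(film_id, films, limit=3):
    """Rank other films by shared attributes, dropping those with none."""
    target = films[film_id]
    scored = [
        (shared_attributes(target, other), other_id)
        for other_id, other in films.items()
        if other_id != film_id
    ]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [other_id for _, other_id in ranked[:limit]]


# Hypothetical film records, for illustration only.
FILMS = {
    "film_a": {"genres": ["comedy"], "directors": ["director_x"],
               "actors": ["actor_1", "actor_2"], "year": 2004},
    "film_b": {"genres": ["comedy"], "directors": ["director_y"],
               "actors": ["actor_2"], "year": 2004},
    "film_c": {"genres": ["drama"], "directors": ["director_x"],
               "actors": [], "year": 1999},
    "film_d": {"genres": ["documentary"], "directors": ["director_z"],
               "actors": [], "year": 1980},
}

print(related_reviews("film_a", FILMS))
```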

Page 34: Ny Freebase Workshop 10 Dec 2009

Agenda

Intro to Freebase

Freebase as an identity directory

The Freebase platform

Page 35: Ny Freebase Workshop 10 Dec 2009

Freebase architecture

Page 36: Ny Freebase Workshop 10 Dec 2009

Query editor

Page 37: Ny Freebase Workshop 10 Dec 2009

Querying Freebase

“Russian cosmonauts”

[{
  "type": "/spaceflight/astronaut",
  "name": null,
  "/people/person/nationality": "russia"
}]
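
As a rough sketch of how a client might have packaged this query for an HTTP/REST read call, the code below wraps it in a `{"query": ...}` envelope and URL-encodes it. The endpoint URL is an assumption (the slides do not give one), and nothing is actually sent over the network.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumed endpoint; the slides do not give a URL.
MQLREAD_URL = "http://api.freebase.com/api/service/mqlread"


def build_mqlread_url(query):
    """Wrap an MQL query in a {"query": ...} envelope and URL-encode it."""
    envelope = json.dumps({"query": query})
    return MQLREAD_URL + "?" + urlencode({"query": envelope})


cosmonauts = [{
    "type": "/spaceflight/astronaut",
    "name": None,  # serialized as null: asks for the value to be filled in
    "/people/person/nationality": "russia",
}]

url = build_mqlread_url(cosmonauts)
print(url)

# Round-trip check: decoding the URL recovers the original envelope.
decoded = json.loads(parse_qs(urlsplit(url).query)["query"][0])
print(decoded == {"query": cosmonauts})
```

Building the JSON with `json.dumps` rather than by hand avoids quoting mistakes such as stray typographic quotes, which make a query invalid.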

Page 38: Ny Freebase Workshop 10 Dec 2009

Querying Freebase

“Tropical storms in the 90s”

[{
  "type": "/meteorology/tropical_cyclone",
  "name": null,
  "formed>=": "1990",
  "a:formed<": "2000"
}]
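
To make the key syntax in this query concrete, the sketch below evaluates the same kind of constraints against local records: a comparison operator suffix (`>=`, `<`) turns a key into a filter, and a prefix such as `a:` lets the same property appear twice in one JSON object. This is a simplified local model, not the Freebase query engine; the storm records are hypothetical, and dates are compared as plain strings.

```python
import operator
import re

OPERATORS = {">=": operator.ge, "<=": operator.le, ">": operator.gt, "<": operator.lt}
# Optional "prefix:", then a property name, then an optional comparison operator.
KEY_PATTERN = re.compile(r"^(?:\w+:)?(?P<prop>[\w/]+?)(?P<op>>=|<=|>|<)?$")


def parse_key(key):
    """Split a query key into (property, operator or None)."""
    match = KEY_PATTERN.match(key)
    if match is None:
        raise ValueError("unrecognized query key: " + key)
    return match.group("prop"), match.group("op")


def matches(record, query):
    """Return True if the record satisfies every constraint in the query."""
    for key, wanted in query.items():
        if wanted is None:
            continue  # null means "return this value", not a filter
        prop, op = parse_key(key)
        actual = record.get(prop)
        if actual is None:
            return False
        compare = OPERATORS[op] if op else operator.eq
        if not compare(actual, wanted):
            return False
    return True


QUERY = {
    "type": "/meteorology/tropical_cyclone",
    "name": None,
    "formed>=": "1990",
    "a:formed<": "2000",
}

# Hypothetical records, for illustration only.
STORMS = [
    {"type": "/meteorology/tropical_cyclone", "name": "Storm A", "formed": "1989-09-10"},
    {"type": "/meteorology/tropical_cyclone", "name": "Storm B", "formed": "1992-08-16"},
    {"type": "/meteorology/tropical_cyclone", "name": "Storm C", "formed": "2003-09-06"},
]

print([s["name"] for s in STORMS if matches(s, QUERY)])
```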

Page 39: Ny Freebase Workshop 10 Dec 2009

Querying Freebase

“French actresses born pre-WWII”

[{
  "type": "/film/actor",
  "name": null,
  "/people/person/gender": "female",
  "/people/person/date_of_birth<=": "1939",
  "/people/person/nationality": "France",
  "sort": "/people/person/date_of_birth"
}]

Page 40: Ny Freebase Workshop 10 Dec 2009

ACRE

Server-side JavaScript + webpage templating

WSJ (and other) applications developed

Advanced APIs

Code sharing – programmer ecosystem

Page 41: Ny Freebase Workshop 10 Dec 2009

ACRE IDE

Page 42: Ny Freebase Workshop 10 Dec 2009

Other platform services

Freebase suggest

Lucene-based topic search interface

Blob store (text, image thumbnailing)

Reconciliation service

Extended MQL

Page 43: Ny Freebase Workshop 10 Dec 2009

www.freebase.com

blog.freebase.com

twitter.com/fbase

[email protected]