Journalism BBC MMIX BBC Dynamic Semantic Publishing [DSP] Jem Rayfield : Senior Technical Architect BBC Future Media and Technology

Upload: jem-rayfield

Post on 11-May-2015

DESCRIPTION

BBC Dynamic Semantic Publishing. The transformational technology strategy that the BBC Future Media & Technology department is using to evolve from a relational content model and static publishing framework to a fully dynamic semantic publishing (DSP) architecture. Supporting BBC World Cup 2010, BBC Sport and BBC Olympics 2012 online.

http://www.bbc.co.uk/worldcup/
http://news.bbc.co.uk/sport/
http://www.bbc.co.uk/2012/

TRANSCRIPT

Page 1: Dsp bbc-jem rayfield-semtech2011

Journalism BBC MMIX

BBC Dynamic Semantic Publishing [DSP]

Jem Rayfield : Senior Technical Architect

BBC Future Media and Technology

Page 2: Dsp bbc-jem rayfield-semtech2011

Outline

BBC News Online

BBC World Cup 2010

BBC Sport 2011

BBC Olympics 2012

Page 3: Dsp bbc-jem rayfield-semtech2011

Radio since 1922, TV since 1930, Web since 1994

Page 4: Dsp bbc-jem rayfield-semtech2011

http://bbc.co.uk/news

online

Page 5: Dsp bbc-jem rayfield-semtech2011

BBC News [Static Publishing]

Page 6: Dsp bbc-jem rayfield-semtech2011

Static News Architecture

Page 7: Dsp bbc-jem rayfield-semtech2011

BBC CPS/CMS

Asset Authoring

Page 8: Dsp bbc-jem rayfield-semtech2011

BBC CPS/CMS

Index Authoring

Page 9: Dsp bbc-jem rayfield-semtech2011

Static News: The Good

1) Simple

2) Scales cheaply

3) Difficult to break [bad rendering logic etc.]

4) Handles high load

Page 10: Dsp bbc-jem rayfield-semtech2011

Static News: The Bad

1) Relational taxonomic meta model

2) Static! Inflexible! SSI!

3) Document publishing

4) Content non re-usable

5) Content non repurpose-able

6) Difficult to personalize

7) Publication per output

Page 11: Dsp bbc-jem rayfield-semtech2011

BBC World Cup 2010

http://bbc.co.uk/worldcup

Page 12: Dsp bbc-jem rayfield-semtech2011

World Cup 2010

1. 32 teams, 8 groups, 736 players: 776 pages

2. Fixtures & Results, Groups & Teams pages

3. Too many web pages for too few journalists

4. Improve the publishing system to help achieve all of this
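
The 776 figure on this slide appears to be one page per team, group and player. A minimal sketch of that arithmetic follows; the counts come from the slide, while the one-page-per-entity rule and the names `ENTITY_COUNTS` and `total_entity_pages` are assumptions made for illustration.

```python
# Counts taken from the slide; "one page per entity" is an assumption.
ENTITY_COUNTS = {"team": 32, "group": 8, "player": 736}


def total_entity_pages(counts):
    """Sum one page per entity across all entity types."""
    return sum(counts.values())


print(total_entity_pages(ENTITY_COUNTS))  # 776
```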

Page 13: Dsp bbc-jem rayfield-semtech2011

Page Per Player

http://news.bbc.co.uk/sport/football/world_cup_2010/groups_and_teams/team/england/wayne_rooney

Page 14: Dsp bbc-jem rayfield-semtech2011

Page Per Team

Page 15: Dsp bbc-jem rayfield-semtech2011

Page Per Group

Page 16: Dsp bbc-jem rayfield-semtech2011

Semantic publishing

USER EXPERIENCE

ONTOLOGY

TRIPLE STORE

Page 17: Dsp bbc-jem rayfield-semtech2011

Rationale

• Automated content publishing

• Huge increase in content breadth (number of manageable pages)

• Content re-use and re-purposing, increasing reach

• Simplified content management

• Journalist headcount reduction

• Multi-dimensional entry points and semantic navigation

• Improved user experience with high levels of user engagement

• Dynamic, state (time|event) and semantic driven page layout

• Personalized content

• Open data and APIs

Page 18: Dsp bbc-jem rayfield-semtech2011

Dynamic Semantic Architecture [DSP]

Page 19: Dsp bbc-jem rayfield-semtech2011

API Stack

Page 20: Dsp bbc-jem rayfield-semtech2011

Highly Scalable Clustered BigOWLIM

• Horizontally scalable

• No single point of failure

• Fault tolerant

Page 21: Dsp bbc-jem rayfield-semtech2011

Plenty of Caching

Page 22: Dsp bbc-jem rayfield-semtech2011

Extendable Domain Driven Asset Tagging

Page 23: Dsp bbc-jem rayfield-semtech2011

Open Ontology/Dataset reuse

Event | Geonames | FOAF | etc.

Page 24: Dsp bbc-jem rayfield-semtech2011

World Cup ontology

Page 25: Dsp bbc-jem rayfield-semtech2011

Graffiti: Suggest -> Tag [Player]

Page 26: Dsp bbc-jem rayfield-semtech2011

Graffiti: Suggest -> Tag [Location] (Geonames)

Page 27: Dsp bbc-jem rayfield-semtech2011

Tag player

Infer team

Infer competition

Happy Journalist
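
The tag-then-infer idea can be sketched with a tiny in-memory triple set. This is only an illustration: in the deck the inference is done by OWL reasoning inside the triple store, and every identifier and predicate name below (`tagged`, `memberOf`, `competesIn`, the asset and player IDs) is hypothetical.

```python
# Hypothetical identifiers and predicates, for illustration only.
TRIPLES = {
    ("asset:match-report-1", "tagged", "player:wayne_rooney"),
    ("player:wayne_rooney", "memberOf", "team:england"),
    ("team:england", "competesIn", "competition:world_cup_2010"),
}


def objects(subject, predicate, triples):
    """Return every object o such that (subject, predicate, o) is asserted."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def inferred_tags(asset, triples):
    """Return the explicit player tags plus the teams and competitions they imply."""
    tags = set()
    for player in objects(asset, "tagged", triples):
        tags.add(player)
        for team in objects(player, "memberOf", triples):
            tags.add(team)
            tags |= objects(team, "competesIn", triples)
    return tags


print(sorted(inferred_tags("asset:match-report-1", TRIPLES)))
```

The journalist supplies one tag; the team and competition associations follow from the data already in the store.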

Page 28: Dsp bbc-jem rayfield-semtech2011

World Cup statistics

• 750+ dynamic aggregations/pages (Player, Squad, Group, etc.)

• Average unique page requests a day: 2 million+

• Average BigOWLIM SPARQL queries a day: 1 million

• Hundreds of RDF statement updates/inserts per minute, including sports statistics, with full OWL reasoning and associated inference

• Multi data center, fully resilient, clustered 6-node triple store
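
To put the daily averages in per-second terms, a small worked calculation helps. It uses only the two figures quoted on this slide and treats "2 million +" as exactly 2 million; real traffic will peak well above a flat daily average, so these are rough orders of magnitude rather than capacity numbers.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(daily_count):
    """Convert a daily total into a flat average rate per second."""
    return daily_count / SECONDS_PER_DAY


page_rate = per_second(2_000_000)   # unique page requests (slide figure)
query_rate = per_second(1_000_000)  # BigOWLIM SPARQL queries (slide figure)

print(f"{page_rate:.1f} page requests/s, {query_rate:.1f} SPARQL queries/s")
print(f"{query_rate / page_rate:.2f} SPARQL queries per page request")
```

On these averages there is roughly one SPARQL query for every two page requests, which is consistent with the deck's emphasis on caching in front of the triple store.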

Page 29: Dsp bbc-jem rayfield-semtech2011

BBC Sport Online Refresh

http://bbc.co.uk/sport

Page 30: Dsp bbc-jem rayfield-semtech2011

Sport Refresh : Stealth Infra upgrade [DSP]

http://bbc.co.uk/sport1/hi/football/teams/c/chelsea

Page 31: Dsp bbc-jem rayfield-semtech2011

REST API

Content negotiation: JSON RDF, XML RDF, Turtle

Publicly accessible (with SSL cert)

GET /sport/football/teams/<TEAM>
Accept: application/rdf+json

GET /sport/football/<COMPETITION>
Accept: application/rdf+xml

GET /assets/<ASSET>
Accept: text/rdf+n3

Etc.
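
Content negotiation here means the same resource path returns a different RDF serialization depending on the `Accept` header. The sketch below only builds the request objects and does not send them, since the slide notes that access requires an SSL certificate. The base URL is the one shown in the deck's Chelsea examples; the helper name `build_team_request` and the short format keys are assumptions.

```python
from urllib.request import Request

# Base URL as shown in the deck's example requests.
BASE_URL = "https://api.live.bbc.co.uk/dsp"

# Media types listed on the slide, keyed by hypothetical short names.
ACCEPT_BY_FORMAT = {
    "json": "application/rdf+json",
    "xml": "application/rdf+xml",
    "n3": "text/rdf+n3",
}


def build_team_request(team, fmt="json"):
    """Build (but do not send) a GET request for a football team resource."""
    url = f"{BASE_URL}/sport/football/teams/{team}"
    return Request(url, headers={"Accept": ACCEPT_BY_FORMAT[fmt]}, method="GET")


req = build_team_request("chelsea", "n3")
print(req.get_method(), req.full_url, req.get_header("Accept"))
```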

Page 32: Dsp bbc-jem rayfield-semtech2011

GET https://api.live.bbc.co.uk/dsp/sport/football/teams/chelsea
Accept: text/rdf+n3

<http://www.chelseafc.com/>
    domain:documentType <http://www.bbc.co.uk/things/document-types/homepage> ,
        <http://www.bbc.co.uk/things/document-types/external> .

<http://www.bbc.co.uk/sport/football/teams/chelsea>
    domain:documentType <http://www.bbc.co.uk/things/document-types/bbc-document> ,
        <http://www.bbc.co.uk/things/document-types/homepage> .

<http://www.bbc.co.uk/things/2acacd19-6609-1840-9c2b-b0820c50d281#id>
    a sport:CompetitiveSportingOrganisation ;
    domain:canonicalName "Chelsea"^^<xsd:string> ;
    domain:document <http://www.chelseafc.com/> ,
        <http://www.bbc.co.uk/sport/football/teams/chelsea> ;
    domain:externalId <http://dbpedia.org/resource/Chelsea_F.C.> ,
        <urn:sports-stats:137316635> ;
    domain:name "Chelsea" ;
    domain:shortName "Chelsea"^^<xsd:string> ;
    sport:competesIn <http://www.bbc.co.uk/things/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id> .

<http://dbpedia.org/resource/Chelsea_F.C.>
    domain:externalIdType <http://www.bbc.co.uk/things/external-id-types/dbpedia> .

<urn:sports-stats:137316635>
    domain:externalIdType <http://www.bbc.co.uk/things/external-id-types/bbc-sport-stats> .

<http://www.bbc.co.uk/things/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id>
    domain:canonicalName "Premier League"^^<xsd:string> ;
    domain:externalId <urn:sports-stats:118996114> ;
    sport:competitionType <http://www.bbc.co.uk/things/competition-types/domestic-league> .

Page 33: Dsp bbc-jem rayfield-semtech2011

GET https://api.live.bbc.co.uk/dsp/sport/football/teams/chelsea
Accept: application/rdf+json

{
  "http:\/\/www.chelseafc.com\/": {
    "http:\/\/www.bbc.co.uk\/ontologies\/domain\/documentType": [
      { "value": "http:\/\/www.bbc.co.uk\/things\/document-types\/homepage", "type": "uri" },
      { "value": "http:\/\/www.bbc.co.uk\/things\/document-types\/external", "type": "uri" }
    ]
  },
  "http:\/\/www.bbc.co.uk\/things\/2acacd19-6609-1840-9c2b-b0820c50d281#id": {
    "http:\/\/www.bbc.co.uk\/ontologies\/domain\/externalId": [
      { "value": "http:\/\/dbpedia.org\/resource\/Chelsea_F.C.", "type": "uri" },
      { "value": "urn:sports-stats:137316635", "type": "uri" }
    ],
    "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type": [
      { "value": "http:\/\/www.bbc.co.uk\/ontologies\/sport\/CompetitiveSportingOrganisation", "type": "uri" }
    ],
    "http:\/\/www.bbc.co.uk\/ontologies\/domain\/name": [
      { "value": "Chelsea", "type": "literal" }
    ],
    "http:\/\/www.bbc.co.uk\/ontologies\/sport\/competesIn": [
      { "value": "http:\/\/www.bbc.co.uk\/things\/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id", "type": "uri" }
    ],
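
The RDF/JSON layout is subject, then predicate, then a list of objects, so a consumer can find resources of a given type with plain dictionary lookups. The sketch below parses a two-predicate excerpt of the response on this slide, closed off so that it is valid JSON; the helper names `subjects_of_type` and `first_value` are illustrative and not part of any BBC API.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SPORT = "http://www.bbc.co.uk/ontologies/sport/"
DOMAIN = "http://www.bbc.co.uk/ontologies/domain/"

# Excerpt of the slide's response, with closing braces added so it parses.
BODY = r'''
{
  "http:\/\/www.bbc.co.uk\/things\/2acacd19-6609-1840-9c2b-b0820c50d281#id": {
    "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type": [
      {"value": "http:\/\/www.bbc.co.uk\/ontologies\/sport\/CompetitiveSportingOrganisation", "type": "uri"}
    ],
    "http:\/\/www.bbc.co.uk\/ontologies\/domain\/name": [
      {"value": "Chelsea", "type": "literal"}
    ]
  }
}
'''


def subjects_of_type(graph, type_uri):
    """Return every subject that has rdf:type equal to type_uri."""
    return [
        subject
        for subject, predicates in graph.items()
        if any(obj["value"] == type_uri for obj in predicates.get(RDF_TYPE, []))
    ]


def first_value(graph, subject, predicate):
    """Return the first object value for (subject, predicate), or None."""
    objs = graph[subject].get(predicate, [])
    return objs[0]["value"] if objs else None


graph = json.loads(BODY)
teams = subjects_of_type(graph, SPORT + "CompetitiveSportingOrganisation")
names = [first_value(graph, team, DOMAIN + "name") for team in teams]
print(names)  # ['Chelsea']
```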

Page 34: Dsp bbc-jem rayfield-semtech2011

PHP -> EasyRDF -> API

PHP render layer consumes RDF from the REST API via EasyRDF (http://www.aelius.com/njh/easyrdf/)

EasyRDF: open PHP library (primary committer: Nicholas Humfrey, BBC)

// Request options passed to the HTTP client.
protected function getOptions() {
    return array(
        "config" => array("usecert" => true),
        "headers" => array(
            "Accept" => "application/rdf+json",
            "X-Expect" => "http://www.bbc.co.uk/things/platforms/hiweb"
        )
    );
}

// Fetch the team resource and load the response body into an EasyRDF graph.
$options = $this->getOptions();
$response = $this->get("https://api.test.bbc.co.uk/dsp/sport/football/teams/chelsea", $options);
$this->data = new EasyRdf_Graph("http://www.bbc.co.uk", $response->getBody());
$teams = $this->data->allOfType("sport:CompetitiveSportingOrganisation");

Page 35: Dsp bbc-jem rayfield-semtech2011

But... “Our website is the API”

http://www.bbc.co.uk/programmes/

Program “The Carpenters’ Story”

HTML => http://www.bbc.co.uk/programmes/b011rf7f

RDF => http://www.bbc.co.uk/programmes/b007cllb.rdf

Sport .RDF coming soon…
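
The "website is the API" convention shown for /programmes is that the RDF view lives at the page URL plus a `.rdf` suffix. A minimal sketch of that mapping, assuming the suffix rule holds for any programme page; the function name `rdf_url` is hypothetical.

```python
def rdf_url(html_url):
    """Return the RDF representation URL for a /programmes page URL."""
    # Assumes the ".rdf" suffix convention shown on the slide.
    return html_url.rstrip("/") + ".rdf"


print(rdf_url("http://www.bbc.co.uk/programmes/b007cllb"))
```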

Page 36: Dsp bbc-jem rayfield-semtech2011

Augment architecture with a Content Store

1. Atomic content assets stored in MarkLogic XML store

2. XML content queryable via XQuery

3. Content Assets searchable

4. Sports statistics searchable/queryable via XQuery

5. Ontological SPARQL via BigOWLIM, asset XQuery via MarkLogic
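
The split of responsibilities on this slide can be sketched as a two-step lookup: the semantic layer answers "which assets are about this thing?" and the content store returns the asset bodies. The in-memory structures below are hypothetical stand-ins for a SPARQL query against BigOWLIM and an XQuery against MarkLogic; none of the IDs, tags or XML element names come from the deck.

```python
import xml.etree.ElementTree as ET

# Stand-in for the triple store: (asset id, tagged concept) pairs.
ASSET_TAGS = [
    ("asset-1", "team:chelsea"),
    ("asset-2", "team:arsenal"),
    ("asset-3", "team:chelsea"),
]

# Stand-in for the XML content store: asset id -> XML document.
CONTENT_STORE = {
    "asset-1": "<story><headline>Chelsea win</headline></story>",
    "asset-2": "<story><headline>Arsenal draw</headline></story>",
    "asset-3": "<story><headline>Chelsea sign striker</headline></story>",
}


def asset_ids_tagged(tag):
    """Semantic step: which assets are tagged with this concept?"""
    return [asset_id for asset_id, t in ASSET_TAGS if t == tag]


def headline(asset_id):
    """Content step: pull one field out of the stored XML asset."""
    return ET.fromstring(CONTENT_STORE[asset_id]).findtext("headline")


def aggregate(tag):
    """Combine both steps into a simple aggregation for one concept."""
    return [headline(asset_id) for asset_id in asset_ids_tagged(tag)]


print(aggregate("team:chelsea"))
```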

Page 37: Dsp bbc-jem rayfield-semtech2011

API Stack

Page 38: Dsp bbc-jem rayfield-semtech2011

Ontology aware NLP

GATE + Ontotext

Page 39: Dsp bbc-jem rayfield-semtech2011

Euro 2012

Dynamic semantic aggregation pages for

8 Venues

4 Groups

16 Teams

336 Players

Page 40: Dsp bbc-jem rayfield-semtech2011

Olympics 2012 http://www.bbc.co.uk/2012/

Page 41: Dsp bbc-jem rayfield-semtech2011

Olympics 2012 – The requirements

1. Page per athlete [10,000+], page per country [200+], page per discipline [400-500], page per venue. A lot of output…

2. Almost real time statistics and live event pages

3. Time coded, metadata annotated, on demand video, 58,000 hours of content

1. Far too many web pages for far too few journalists

2. DSP annotation architecture to automate content aggregation

Page 42: Dsp bbc-jem rayfield-semtech2011

BBC Sport: Open Sport Ontology

http://www.bbc.co.uk/ontologies/sport

Page 43: Dsp bbc-jem rayfield-semtech2011

More…. BBC Open Ontologies

Programmes : http://www.bbc.co.uk/ontologies/programmes

Wildlife : http://www.bbc.co.uk/ontologies/wildlife/

Page 44: Dsp bbc-jem rayfield-semtech2011

Platform future

• Entire BBC Sport site re-engineered and domain modeled using RDF framework

• Geospatial (GeoSPARQL) powered news aggregations. Stories about London or Berlin…

• News event and time based asset aggregations

• Additional domain modeling and extensions (business, wildlife, programmes, etc.)

• Replicated triple store to facilitate a public facing BBC SPARQL endpoint and API

• SportML and BBC Sport ontology mapping

Page 45: Dsp bbc-jem rayfield-semtech2011

Questions? [email protected]