dsp bbc-jem rayfield-semtech2011

Post on 11-May-2015


DESCRIPTION

BBC Dynamic Semantic Publishing. The transformational technology strategy that the BBC Future Media & Technology department is using to evolve from a relational content model and static publishing framework to a fully dynamic semantic publishing (DSP) architecture, supporting BBC World Cup 2010, BBC Sport and BBC Olympics 2012 online.

http://www.bbc.co.uk/worldcup/
http://news.bbc.co.uk/sport/
http://www.bbc.co.uk/2012/

TRANSCRIPT

Journalism BBC MMIX

BBC Dynamic Semantic Publishing [DSP]

Jem Rayfield : Senior Technical Architect

BBC Future Media and Technology

Outline

BBC News Online

BBC World Cup 2010

BBC Sport 2011

BBC Olympics 2012

Radio since 1922, TV since 1930, Web since 1994

http://bbc.co.uk/news

online

BBC News [Static Publishing]

Static News Architecture

BBC CPS/CMS

Asset Authoring

BBC CPS/CMS

Index Authoring

Static News: The Good

1) Simple

2) Scales cheaply

3) Difficult to break [bad rendering logic, etc.]

4) Handles high load

Static News: The Bad

1) Relational taxonomic meta model

2) Static! Inflexible! SSI!

3) Document publishing

4) Content not reusable

5) Content not repurposable

6) Difficult to personalize

7) Publication per output

BBC World Cup 2010

http://bbc.co.uk/worldcup

World Cup 2010

1. 32 teams, 8 groups, 736 players: 776 pages

2. Fixtures & Results, Groups & Teams pages

3. Too many web pages for too few journalists

4. Improve the publishing system to help achieve all of this
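
The 776 figure on this slide appears to be the sum of the entity counts, that is, one page per team, group and player. A quick check in Python (the variable names are illustrative and not from the talk):

```python
# One aggregation page per entity, using the counts from the slide.
teams, groups, players = 32, 8, 736

pages = teams + groups + players
players_per_squad = players // teams

print(pages)              # 776
print(players_per_squad)  # 23
```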

Page Per Player

http://news.bbc.co.uk/sport/football/world_cup_2010/groups_and_teams/team/england/wayne_rooney

Page Per Team

Page Per Group

Semantic publishing

USER EXPERIENCE

ONTOLOGY

TRIPLE STORE

Rationale

• Automated content publishing

• Huge increase in content breadth (number of manageable pages)

• Content re-use and re-purposing, increasing reach

• Simplified content management

• Journalist headcount reduction

• Multi-dimensional entry points and semantic navigation

• Improved user experience with high levels of user engagement

• Dynamic, state (time|event) and semantic driven page layout

• Personalized content

• Open data and APIs

Dynamic Semantic Architecture [DSP]

API Stack

Highly Scalable Clustered BigOWLIM

• Horizontally scalable

• No single point of failure

• Fault tolerant

Plenty of Caching

Extendable Domain Driven Asset Tagging

Open Ontology/Dataset reuse: Event | GeoNames | FOAF | etc.

World Cup ontology

Graffiti: Suggest -> Tag [Player]

Graffiti: Suggest -> Tag [Location] (Geonames)

Tag player -> Infer team -> Infer competition

Happy Journalist
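
The tag-once, infer-the-rest idea can be sketched in a few lines of Python. This is only an illustration of forward inference over triples; the talk describes OWL reasoning inside BigOWLIM, which is not reproduced here. The identifiers below (`asset:1`, `playsFor`, `competesIn`, `about`) are assumptions made up for the example, not terms taken from the BBC ontologies.

```python
# Hypothetical facts already in the store: (subject, predicate, object).
facts = {
    ("player:rooney", "playsFor", "team:england"),
    ("team:england", "competesIn", "competition:world-cup-2010"),
}

# Hypothetical predicates along which an "about" tag is propagated.
PROPAGATE = ("playsFor", "competesIn")


def infer_about(asset, tagged_thing, facts):
    """Return (asset, "about", thing) for the tag and everything reachable from it."""
    about = {tagged_thing}
    frontier = [tagged_thing]
    while frontier:
        current = frontier.pop()
        for subject, predicate, obj in facts:
            if subject == current and predicate in PROPAGATE and obj not in about:
                about.add(obj)
                frontier.append(obj)
    return {(asset, "about", thing) for thing in about}


# The journalist tags the story with the player only.
triples = infer_about("asset:1", "player:rooney", facts)
for triple in sorted(triples):
    print(triple)
```

With a single manual tag, the story also becomes a candidate for the team and competition aggregation pages, which is the effect the slide summarises.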

World Cup statistics

• 750+ Dynamic aggregations/pages (Player, Squad, Group, etc.)

• Average unique page requests a day : 2 million +

• Average BigOWLIM SPARQL queries a day : 1 million

• 100s of RDF statement updates/inserts per minute with full OWL reasoning and associated inference, including sports statistics

• Multi data center fully resilient, clustered 6 node triple store
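
To put the daily averages above into per-second terms, here is a small worked calculation. It uses the slide's figures as lower bounds and says nothing about peak load, which the talk does not quote.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

page_requests_per_day = 2_000_000   # "2 million +" on the slide
sparql_queries_per_day = 1_000_000  # "1 million" on the slide

pages_per_second = page_requests_per_day / SECONDS_PER_DAY
queries_per_second = sparql_queries_per_day / SECONDS_PER_DAY

print(round(pages_per_second, 1))    # 23.1
print(round(queries_per_second, 1))  # 11.6
```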

BBC Sport Online Refresh

http://bbc.co.uk/sport

Sport Refresh : Stealth Infra upgrade [DSP]

http://bbc.co.uk/sport1/hi/football/teams/c/chelsea

REST API

Content negotiation: RDF/JSON, RDF/XML, Turtle

Publicly accessible (with SSL cert)

GET /sport/football/teams/<TEAM>
Accept: application/rdf+json

GET /sport/football/<COMPETITION>
Accept: application/rdf+xml

GET /assets/<ASSET>
Accept: text/rdf+n3

etc.
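
As a minimal sketch of the content negotiation described on this slide, the Python below builds a request with the appropriate Accept header but does not send it. The base URL is taken from the examples elsewhere in this talk; the service required a client certificate and may no longer be reachable. The short format keys (`json`, `xml`, `n3`) and the `build_request` helper are assumptions for illustration.

```python
from urllib.request import Request

# Base URL as it appears in the talk's examples (may no longer be live).
BASE_URL = "https://api.live.bbc.co.uk/dsp"

# Media types listed on the slide, keyed by illustrative short names.
MEDIA_TYPES = {
    "json": "application/rdf+json",
    "xml": "application/rdf+xml",
    "n3": "text/rdf+n3",
}


def build_request(path, fmt):
    """Build (but do not send) a GET request asking for one RDF serialisation."""
    return Request(BASE_URL + path, headers={"Accept": MEDIA_TYPES[fmt]}, method="GET")


request = build_request("/sport/football/teams/chelsea", "json")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

The same resource path is used for every serialisation; only the Accept header changes.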

GET https://api.live.bbc.co.uk/dsp/sport/football/teams/chelsea
Accept: text/rdf+n3

<http://www.chelseafc.com/>
    domain:documentType <http://www.bbc.co.uk/things/document-types/homepage> ,
        <http://www.bbc.co.uk/things/document-types/external> .

<http://www.bbc.co.uk/sport/football/teams/chelsea>
    domain:documentType <http://www.bbc.co.uk/things/document-types/bbc-document> ,
        <http://www.bbc.co.uk/things/document-types/homepage> .

<http://www.bbc.co.uk/things/2acacd19-6609-1840-9c2b-b0820c50d281#id>
    a sport:CompetitiveSportingOrganisation ;
    domain:canonicalName "Chelsea"^^<xsd:string> ;
    domain:document <http://www.chelseafc.com/> ,
        <http://www.bbc.co.uk/sport/football/teams/chelsea> ;
    domain:externalId <http://dbpedia.org/resource/Chelsea_F.C.> ,
        <urn:sports-stats:137316635> ;
    domain:name "Chelsea" ;
    domain:shortName "Chelsea"^^<xsd:string> ;
    sport:competesIn <http://www.bbc.co.uk/things/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id> .

<http://dbpedia.org/resource/Chelsea_F.C.>
    domain:externalIdType <http://www.bbc.co.uk/things/external-id-types/dbpedia> .

<urn:sports-stats:137316635>
    domain:externalIdType <http://www.bbc.co.uk/things/external-id-types/bbc-sport-stats> .

<http://www.bbc.co.uk/things/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id>
    domain:canonicalName "Premier League"^^<xsd:string> ;
    domain:externalId <urn:sports-stats:118996114> ;
    sport:competitionType <http://www.bbc.co.uk/things/competition-types/domestic-league> .

GET https://api.live.bbc.co.uk/dsp/sport/football/teams/chelsea
Accept: application/rdf+json

{
  "http:\/\/www.chelseafc.com\/": {
    "http:\/\/www.bbc.co.uk\/ontologies\/domain\/documentType": [
      { "value": "http:\/\/www.bbc.co.uk\/things\/document-types\/homepage", "type": "uri" },
      { "value": "http:\/\/www.bbc.co.uk\/things\/document-types\/external", "type": "uri" }
    ]
  },
  "http:\/\/www.bbc.co.uk\/things\/2acacd19-6609-1840-9c2b-b0820c50d281#id": {
    "http:\/\/www.bbc.co.uk\/ontologies\/domain\/externalId": [
      { "value": "http:\/\/dbpedia.org\/resource\/Chelsea_F.C.", "type": "uri" },
      { "value": "urn:sports-stats:137316635", "type": "uri" }
    ],
    "http:\/\/www.w3.org\/1999\/02\/22-rdf-syntax-ns#type": [
      { "value": "http:\/\/www.bbc.co.uk\/ontologies\/sport\/CompetitiveSportingOrganisation", "type": "uri" }
    ],
    "http:\/\/www.bbc.co.uk\/ontologies\/domain\/name": [
      { "value": "Chelsea", "type": "literal" }
    ],
    "http:\/\/www.bbc.co.uk\/ontologies\/sport\/competesIn": [
      { "value": "http:\/\/www.bbc.co.uk\/things\/5cd4682a-7643-f445-8b1f-bcbaf450bc89#id", "type": "uri" }
    ],
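
The RDF/JSON layout in this response is subject, then predicate, then a list of objects with `value` and `type`. A consumer can walk it with nothing more than a JSON parser. The sketch below is an assumption-laden illustration: `body` is a trimmed copy of the Chelsea resource with the braces closed (the slide's own output is cut off), and `all_of_type` is a hypothetical helper written for this example.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SPORT = "http://www.bbc.co.uk/ontologies/sport/"
DOMAIN = "http://www.bbc.co.uk/ontologies/domain/"

# Trimmed sample: only the Chelsea resource, with its type and name.
body = """
{
  "http://www.bbc.co.uk/things/2acacd19-6609-1840-9c2b-b0820c50d281#id": {
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#type": [
      {"value": "http://www.bbc.co.uk/ontologies/sport/CompetitiveSportingOrganisation", "type": "uri"}
    ],
    "http://www.bbc.co.uk/ontologies/domain/name": [
      {"value": "Chelsea", "type": "literal"}
    ]
  }
}
"""


def all_of_type(graph, type_uri):
    """Return the subjects whose rdf:type values include type_uri."""
    return [
        subject
        for subject, predicates in graph.items()
        if any(obj["value"] == type_uri for obj in predicates.get(RDF_TYPE, []))
    ]


graph = json.loads(body)
teams = all_of_type(graph, SPORT + "CompetitiveSportingOrganisation")
names = [graph[team][DOMAIN + "name"][0]["value"] for team in teams]
print(names)  # ['Chelsea']
```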

PHP -> EasyRDF -> API

PHP render layer consumes RDF from the REST API via EasyRDF (http://www.aelius.com/njh/easyrdf/)

EasyRDF: open PHP library (primary committer: Nicholas Humfrey, BBC)

// Request options: client certificate plus RDF/JSON content negotiation
protected function getOptions() {
    return array(
        "config" => array("usecert" => true),
        "headers" => array(
            "Accept" => "application/rdf+json",
            "X-Expect" => "http://www.bbc.co.uk/things/platforms/hiweb"
        )
    );
}

// Fetch the team's RDF from the API and load it into an EasyRdf graph
$options = $this->getOptions();
$response = $this->get("https://api.test.bbc.co.uk/dsp/sport/football/teams/chelsea", $options);
$this->data = new EasyRdf_Graph("http://www.bbc.co.uk", $response->getBody());
$teams = $this->data->allOfType("sport:CompetitiveSportingOrganisation");

But... “Our website is the API”

http://www.bbc.co.uk/programmes/

Program “The Carpenters’ Story”
HTML => http://www.bbc.co.uk/programmes/b011rf7f
RDF => http://www.bbc.co.uk/programmes/b007cllb.rdf

Sport .rdf coming soon...

Augment architecture with a Content Store

1. Atomic content assets stored in MarkLogic XML store

2. XML content queryable via XQuery

3. Content Assets searchable

4. Sports statistics searchable/queryable via XQuery

5. Ontological queries via SPARQL on BigOWLIM; asset queries via XQuery on MarkLogic
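
The division of labour between the two stores can be sketched as a two-step lookup: ask the triple store which assets are about a thing, then fetch those assets from the content store. The Python below uses in-memory stand-ins for both stores. Every identifier, predicate and document in it is hypothetical, and the real system would issue SPARQL and XQuery rather than list and dictionary lookups.

```python
# In-memory stand-ins for the two stores; all identifiers are hypothetical.
TRIPLES = [
    ("asset:101", "about", "team:chelsea"),
    ("asset:102", "about", "team:arsenal"),
    ("asset:103", "about", "team:chelsea"),
]

CONTENT = {
    "asset:101": "<story><headline>Match report</headline></story>",
    "asset:102": "<story><headline>Transfer news</headline></story>",
    "asset:103": "<story><headline>Injury update</headline></story>",
}


def assets_about(thing):
    """Triple-store side: ids of assets tagged with this thing (SPARQL in the talk)."""
    return [s for s, p, o in TRIPLES if p == "about" and o == thing]


def fetch_assets(asset_ids):
    """Content-store side: the XML for each asset id (XQuery in the talk)."""
    return [CONTENT[asset_id] for asset_id in asset_ids]


def aggregate(thing):
    """Collect the asset documents for one aggregation page."""
    return fetch_assets(assets_about(thing))


page = aggregate("team:chelsea")
print(len(page))  # 2
```

Keeping the ontology in one store and the atomic assets in another means the page is assembled at request time from whatever is currently tagged, rather than being published once per output.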

API Stack

Ontology aware NLP

GATE + Ontotext

Euro 2012

Dynamic semantic aggregation pages for

8 Venues

4 Groups

16 Teams

336 Players

Olympics 2012 http://www.bbc.co.uk/2012/

Olympics 2012 – The requirements

1. Page per Athlete [10,000+], Page per Country [200+], Page per Discipline [400-500], Page per Venue. A lot of output.

2. Almost real time statistics and live event pages

3. Time coded, metadata annotated, on demand video, 58,000 hours of content

1. Far too many web pages for far too few journalists

2. DSP annotation architecture to automate content aggregation

Open Sport Ontology

BBC Sport: http://www.bbc.co.uk/ontologies/sport

More BBC Open Ontologies

Programmes : http://www.bbc.co.uk/ontologies/programmes

Wildlife : http://www.bbc.co.uk/ontologies/wildlife/

Platform future

• Entire BBC Sport site re-engineered and domain modeled using an RDF framework

• Geospatial (GeoSPARQL) powered news aggregations, e.g. stories about London or Berlin

• News event and time based asset aggregations

• Additional domain modeling and extensions (business, wildlife, programmes, etc.)

• Replicated triple store to facilitate a public facing BBC SPARQL endpoint and API

• SportML and BBC Sport ontology mapping

Questions? jem.rayfield@bbc.co.uk
