From publisher to platform: how the Guardian used content, search, and open source to build a...

Post on 09-May-2015

DESCRIPTION

Last year The Guardian launched The Open Platform, a suite of services and tools that enable content partners and developers to build applications leveraging The Guardian's rich content. This talk covers how The Guardian opened up its content, enriched it, and reached new markets with its platform strategy. We cover the background platform strategy, technical architecture, implementation of Solr, and how the new release of the Guardian's Open Platform, launched May 20th, 2010, has embraced disruption in the media space while at the same time accelerating revenue.

TRANSCRIPT

21 May 2010, Apache Lucene EuroCon

From publisher to platform

How the Guardian used content, search, and open source to build a powerful new business model

Stephen Dunn, Guardian News and Media

To become the world's leading liberal voice.

“To secure the financial and editorial independence of The Guardian in perpetuity.”

“To promote freedom in the press and liberal journalism globally.”

Swine flu

Keyword page

Twitter updates

Content partnerships

Audio

Video

Data API

Live blogs

Comment

Mobile site

iPhone app

Newspapers

2010

2009

★ 1.5M pages and counting

★ 250M+ pages/month

★ 30M visitors/month

★ 4x Webby award winner (best newspaper site)

2. Addressable

★ Resources are “about” something - ready for the social web.

★ We live in “the age of point-at-things” (Coates 2005)

The hackable guardian.co.uk

http://www.guardian.co.uk/....

/technology/internet

/technology/all

/environment/climatechange

/environment/climatechange+business/globaleconomy

/technology/internet/rss

/technology/all/rss

/environment/climatechange+business/globaleconomy/rss
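The URL scheme on this slide can be sketched in a few lines of code. This is only an illustration, assuming (as the slide fragments suggest) that tag paths are combined with `+` and that appending `/rss` to a page path gives the matching feed. The helper names `tag_page` and `rss_feed` are hypothetical and not part of any Guardian library.

```python
BASE = "http://www.guardian.co.uk"  # site root as shown on the slide


def tag_page(*tag_paths: str) -> str:
    """Build a page URL from one or more tag paths such as 'technology/internet'.

    Assumption: several tags are combined by joining their paths with '+'.
    """
    if not tag_paths:
        raise ValueError("at least one tag path is required")
    cleaned = [path.strip("/") for path in tag_paths]
    return BASE + "/" + "+".join(cleaned)


def rss_feed(*tag_paths: str) -> str:
    """Assumption: appending '/rss' to a page URL gives the feed for that page."""
    return tag_page(*tag_paths) + "/rss"


if __name__ == "__main__":
    print(tag_page("technology/internet"))
    print(rss_feed("environment/climatechange", "business/globaleconomy"))
```

The point of the slide is that a reader can guess these URLs by hand; the functions only make the pattern explicit.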

[Figure: Site traffic growth. Unique Users, Sep 2005 to Jan 2009, on an axis running from 3,750,000 to 30,000,000. Annotations: Pre-project, First release, Final release, 36M]

“How I stopped worrying about my website and learned to love the whole Internet.”

Matt McAlister

The Open Strategy

OPEN IN

Bring in data and apps from the Internet

OPEN OUT

Enable partners to build applications using Guardian content and services for other digital platforms

“Our most interesting experiments lie in combining what we know with the experience, opinions and expertise of the people who want to participate rather than passively receive.”

OPEN OUT

Allow partners to build applications using Guardian content and services for other digital platforms

OPEN IN

Bring in data and apps from the Internet

BETA

The suite of services enabling partners to build applications with the Guardian

BETA

CONTENT API

A service for selecting and collecting content from the Guardian for re-use

DATA STORE

A directory of useful data curated by Guardian editors

POLITICS API

Open database of candidates, voting records, constituencies, election results, live data on election day

BETA

BETA

CONTENT API

A service for selecting and collecting content from the Guardian for re-use

[Diagram: Guardian database, CMS, Search engine, REST API, Your App Here!]
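To make "Your App Here!" concrete, the sketch below builds a search request URL for a REST content API. The slides do not give an endpoint or parameter names: the host `content.guardianapis.com`, the `/search` path, and the `q`, `tag` and `api-key` parameters are assumptions based on the publicly documented Content API and should be checked against the current documentation. `build_search_url` is a hypothetical helper, and no network request is made.

```python
from typing import Optional
from urllib.parse import urlencode

# Assumed endpoint; it does not appear in the slides.
SEARCH_ENDPOINT = "https://content.guardianapis.com/search"


def build_search_url(query: str, api_key: str, tag: Optional[str] = None) -> str:
    """Return a search URL for free-text `query`, optionally limited to one tag.

    Parameter names ('q', 'tag', 'api-key') are assumptions, see the note above.
    """
    params = [("q", query)]
    if tag is not None:
        params.append(("tag", tag))
    params.append(("api-key", api_key))
    # urlencode percent-encodes values, so spaces and slashes are safe to pass in.
    return SEARCH_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("swine flu", api_key="YOUR-KEY"))
```

An application would fetch this URL over HTTP and parse the response; that step is left out so the example stays independent of any particular HTTP client.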

3 Tiers of access, 3 Revenue models

BESPOKE: Take, reformat, augment our content. Same access as Guardian. Revenue model to be negotiated. Combination of Media, Fees, Downloads.

APPROVED: Take our full article content, with an advert. Guardian keeps ad revenue, you keep rest-of-page revenue

KEYLESS: Take our headlines. You keep associated revenues

What this means

OPEN OUT: Developers can now access our full content APIs on demand with keys post-approved.

We are now positioning the platform as a place to do business with us.

So rapid scalability, reliability, and performance are now core requirements.

2 Open In

CONTENT API

A service for selecting and collecting content from the Guardian for re-use

DATA STORE

A directory of useful data curated by Guardian editors

POLITICS API

Open database of candidates, voting records, constituencies, election results, live data on election day

MICROAPPS

A framework for integrating 3rd party applications into guardian.co.uk.

What this means

OPEN IN: Partners can now more easily integrate into our core

The Open Platform will become key to our commercial future.

From Publisher to Platform

★ Seeking massive growth, but no longer only broadcasting content

★ User/partner engagement & contribution on journalism, data, software, applications, revenue and ads

★ Supporting developers and partners with data and APIs needs scalability, reliability, speed

[Diagram: three Web servers, three App servers, CMS, Data feeds, Oracle, Memcached]

Why RDBMS?

5 years ago, fewer alternatives

Understand operations procedures

Can easily recruit DBAs / devs

Developer/ops tools

Business critical system: a safe choice

[Figure: Unique Users, Sep 2005 to Jan 2009, on an axis running from 3,750,000 to 30,000,000]

[Figure: Unique Users, May 2008 to Jan 2009, on an axis running from 12,250,000 to 28,000,000]

What’s going on?

★ We tag our content (multifaceted)

★ Guardian.co.uk is a faceted browse through our tag-space, with editorial teams “spotlighting” key resources on selected nodes.

★ A search-like architecture can apply multiple facets in a query faster than an RDBMS
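The last point can be illustrated with a toy inverted index. This is a sketch of the general idea behind search-style faceting, not the Guardian's or Solr's implementation: each tag maps to the set of content ids that carry it, so a multi-tag query is a set intersection rather than a chain of relational joins. The class `TagIndex`, its methods, and the sample ids are hypothetical.

```python
from __future__ import annotations

from collections import defaultdict


class TagIndex:
    """Toy inverted index mapping each tag to the ids of content carrying it."""

    def __init__(self) -> None:
        self._postings = defaultdict(set)

    def add(self, content_id: str, tags: list[str]) -> None:
        for tag in tags:
            self._postings[tag].add(content_id)

    def query(self, *tags: str) -> set[str]:
        """Return the ids that carry every requested tag."""
        if not tags:
            return set()
        # Intersect starting from the smallest posting set to keep work low.
        postings = sorted((self._postings.get(t, set()) for t in tags), key=len)
        result = set(postings[0])
        for posting in postings[1:]:
            result &= posting
        return result

    def facet_counts(self, *tags: str) -> dict[str, int]:
        """Count the remaining tags inside the current result set."""
        matches = self.query(*tags)
        counts = {}
        for tag, ids in self._postings.items():
            overlap = len(ids & matches)
            if tag not in tags and overlap:
                counts[tag] = overlap
        return counts


if __name__ == "__main__":
    index = TagIndex()
    index.add("a1", ["environment/climatechange", "business/globaleconomy"])
    index.add("a2", ["environment/climatechange", "technology/internet"])
    index.add("a3", ["technology/internet"])
    print(index.query("environment/climatechange", "business/globaleconomy"))
    print(index.facet_counts("environment/climatechange"))
```

Adding a further facet only adds one more intersection, which is one way to see why the cost of such queries stays fairly uniform.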

We used Solr/Lucene

Can perform complex queries, including full text search

We can change the schema with no downtime.

On our dataset most queries are of a similar cost

Scales very well horizontally

Replication makes it easy to work in the cloud
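As an illustration of the kind of query this enables, the sketch below builds a Solr select URL that combines full text search with tag filters and facet counts. `q`, `fq`, `rows`, `wt`, `facet` and `facet.field` are standard Solr request parameters; the host, the `/solr/select` path and the field name `tag` are assumptions, since the slides do not show the Guardian's schema or deployment. `faceted_query_url` is a hypothetical helper and no request is sent.

```python
from urllib.parse import urlencode

# Hypothetical Solr location; the real deployment is not described in the slides.
SOLR_SELECT = "http://localhost:8983/solr/select"


def faceted_query_url(text: str, tags: list, rows: int = 10) -> str:
    """Build a select URL: full text query, one filter query per tag, tag facets."""
    params = [("q", text), ("rows", str(rows)), ("wt", "json")]
    # Each fq narrows the result set without affecting relevance scoring.
    params += [("fq", 'tag:"%s"' % tag) for tag in tags]
    # Ask Solr to return counts for the (assumed) 'tag' field.
    params += [("facet", "true"), ("facet.field", "tag")]
    return SOLR_SELECT + "?" + urlencode(params)


if __name__ == "__main__":
    print(faceted_query_url(
        "climate talks",
        ["environment/climatechange", "business/globaleconomy"],
    ))
```

Because every request is a plain HTTP GET against a read-only index, the same URL can be served by any replica, which fits the horizontal scaling point above.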

[Diagram: Solr, Content API, Cloud, EC2]

[Diagram: Core, Web servers, App server, CMS, Memcached, rdbms, and five Solr instances]

MICROAPPS

A framework for integrating 3rd party applications into guardian.co.uk.

Simple REST/HTTP framework allows lightweight development

Applications proxied for performance

Apps generally hosted in the cloud, hot deployment into production

Open in?
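The slide does not say how the proxy works. One plausible reading of "proxied for performance" is a caching layer between the site and the externally hosted apps, and the sketch below shows that idea only. The names `Microapp` and `CachingProxy`, the time-to-live policy, and the example app are all assumptions for illustration.

```python
import time
from typing import Callable, Dict, Tuple

# A microapp is modelled as a function from a request path to an HTML fragment.
Microapp = Callable[[str], str]


class CachingProxy:
    """Serve microapp fragments from a short-lived cache (assumed behaviour)."""

    def __init__(
        self,
        app: Microapp,
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._app = app
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache: Dict[str, Tuple[float, str]] = {}

    def get(self, path: str) -> str:
        now = self._clock()
        cached = self._cache.get(path)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]  # still fresh: the external app is not called
        fragment = self._app(path)
        self._cache[path] = (now, fragment)
        return fragment


def example_app(path: str) -> str:
    """Stand-in for a third-party app that would normally be reached over HTTP."""
    return "<div class='microapp'>fragment for %s</div>" % path


if __name__ == "__main__":
    proxy = CachingProxy(example_app, ttl_seconds=30)
    print(proxy.get("/quiz/today"))
```

With a layer like this, a slow or briefly unavailable external app costs at most one request per path per time-to-live window, rather than one per page view.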

[Diagram: Core, Web servers, App server, CMS, Memcached, rdbms, a Proxy, and Apps on external hosting (app engine etc.)]

[Diagram: Web servers, App servers, CMS, Memcached, rdbms; six Solr instances in the cloud (EC2); a Proxy and Apps on external hosting (app engine etc.); labelled OPEN IN and OPEN OUT]

Thank you

http://www.guardian.co.uk/open-platform

Twitter: @openplatform @cuica (Stephen Dunn)
