martin stabe, interactive producer, financial times

Post on 13-Dec-2014


The data workflow: past, present, future

Martin Stabe, Financial Times

News:Rewired, May 27, 2011

“Computer assisted reporting”

• As the phrase suggests, harks back to an era when computerised analysis was rare
  – History can be traced to the 1950s, esp elections
• Key examples from the 1980s, esp US legal stories
  – Bringing social science methods to journalism
    • Statistics
    • Polling
    • GIS
    • Social network analysis

“Enterprise Joins”

• “Enterprise”
  – US journo jargon for a story between ‘off-diary’ and ‘investigative’
• “Join”
  – Database jargon for combining records from two tables
  – Using common content to locate common fields across tables
  – May be complex, using ‘lookup tables’
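A join through a lookup table, as described above, can be sketched roughly as follows. This is a minimal illustration using Python's built-in sqlite3 module. The tables (contracts, donations, supplier_names), their columns and every row are hypothetical placeholders, not data from the talk.

```python
import sqlite3

# Every table, column and value here is a made-up placeholder for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contracts (supplier_id TEXT, value INTEGER);
    CREATE TABLE donations (donor_name TEXT, amount INTEGER);
    -- Lookup table: translates the key used in one dataset
    -- into the key used in the other.
    CREATE TABLE supplier_names (supplier_id TEXT, donor_name TEXT);

    INSERT INTO contracts VALUES ('S1', 1000), ('S2', 2000), ('S3', 3000);
    INSERT INTO donations VALUES ('Donor A', 50), ('Donor C', 70);
    INSERT INTO supplier_names VALUES
        ('S1', 'Donor A'), ('S2', 'Donor B'), ('S3', 'Donor C');
""")

# Two joins: contracts -> lookup table -> donations.
# An inner join keeps only suppliers that appear in both datasets.
rows = conn.execute("""
    SELECT c.supplier_id, d.donor_name, c.value, d.amount
    FROM contracts AS c
    JOIN supplier_names AS l ON l.supplier_id = c.supplier_id
    JOIN donations AS d ON d.donor_name = l.donor_name
    ORDER BY c.supplier_id
""").fetchall()

for row in rows:
    print(row)
```

Supplier S2 drops out because its lookup name has no matching donation record. Rows that survive the join are leads to check, not findings.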

“Enterprise Joins”

• In other words, finding stories by linking two datasets, esp those not originally intended to be linked

• Often centred on common geographical records used across government
  – Postcodes (very good in UK)
  – Statistical output areas
  – Administrative or electoral geographies
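Linking two datasets on postcode usually means first forcing both sides into one format. The sketch below assumes UK-style postcodes, whose final three characters form the inward code. The records, field names and postcode strings are invented placeholders, and the helper names are hypothetical.

```python
def normalise_postcode(raw: str) -> str:
    """Upper-case, drop all whitespace, then put one space before the
    three-character inward code. Assumes the input is a complete postcode."""
    compact = "".join(raw.split()).upper()
    return f"{compact[:-3]} {compact[-3:]}"


def outward_code(raw: str) -> str:
    """Return the part before the space, useful for grouping by district."""
    return normalise_postcode(raw).split(" ")[0]


# Placeholder records standing in for two separately collected datasets.
schools = [
    {"name": "School A", "postcode": "ab1 2cd"},
    {"name": "School B", "postcode": "EF34GH"},
]
inspections = [
    {"postcode": "AB1  2CD", "rating": "good"},
    {"postcode": "ef3 4gh", "rating": "poor"},
]

# Index one dataset by the cleaned key, then look up each record of the other.
ratings = {normalise_postcode(r["postcode"]): r["rating"] for r in inspections}
linked = [
    {
        "name": s["name"],
        "postcode": normalise_postcode(s["postcode"]),
        "rating": ratings.get(normalise_postcode(s["postcode"])),
    }
    for s in schools
]

for record in linked:
    print(record)
```

The same pattern would apply to output areas or ward codes, with a lookup table in place of the string clean-up.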

“Interviewing data”

• Database queries are like questions to an interviewee

• Data can be a reluctant source. “Dirty” data: artifacts of
  – Data entry errors
  – Lack of coding conventions
  – Esoteric systems for storing stray data
  – Discrete collection (eg local authorities, government departments)
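Two of the problems listed above, inconsistent naming and stray values in numeric fields, can be handled with a small clean-up pass. This is only a sketch: the authority names, the spelling map and the list of "missing" markers are hypothetical and would have to be built from the real data.

```python
from typing import Optional

# Hypothetical map from spellings seen in the raw data to one agreed label.
CANONICAL_NAMES = {
    "authority a": "Authority A",
    "auth. a": "Authority A",
    "authority b": "Authority B",
}

# Assumed markers for "no value"; real data will have its own.
MISSING = {"", "n/a", "na", "-"}


def clean_name(raw: str) -> str:
    """Collapse whitespace and case, then map to the agreed label if known."""
    key = " ".join(raw.split()).lower()
    return CANONICAL_NAMES.get(key, raw.strip())


def clean_number(raw: str) -> Optional[float]:
    """Strip thousands separators; return None for missing-value markers."""
    text = raw.strip().replace(",", "")
    if text.lower() in MISSING:
        return None
    return float(text)


raw_rows = [("  Authority A", "1,200"), ("AUTH. A", "300"), ("authority  b", "n/a")]
cleaned = [(clean_name(name), clean_number(value)) for name, value in raw_rows]

for row in cleaned:
    print(row)
```

Unknown names pass through unchanged rather than being guessed at, so they can be reviewed by hand.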

Adding interactivity

• “Data is only useful if it is personal – I want to find out about schools in my area, restaurants near me and so on – or when it reveals something remarkable.” – Bella Hurrell

“The canvas for CAR”

• “The Web is the canvas for CAR, better than any other platform we’ve come up with as an industry. It has every advantage that should be available to the CAR practitioners, including unlimited depth, the ability to customize or personalize and the luxury of designing a database so that it will truly be useful to readers. Some papers get this, or are beginning to realize it.” – Derek Willis

“A fundamental change”

• “Newspapers need to stop the story-centric worldview. … So much of what local journalists collect day-to-day is structured information: the type of information that can be sliced-and-diced, in an automated fashion, by computers. Yet the information gets distilled into a big blob of text -- a newspaper story -- that has no chance of being repurposed.”– Adrian Holovaty

The data workflow

• Obtain data
  – Open data releases
  – Advanced search
  – Screen scraping
  – Freedom of Information Act
  – APIs, Web
• Clean, analyse and warehouse data
  – Excel
  – Google Refine
  – Google Fusion Tables
  – Visokio Omniscope (or Tableau)
  – Stata (or SPSS, SAS, R)
  – ArcView (or other GIS tools)
  – MySQL (or other database manager)
• Publish data
  – Google Fusion Tables
  – Static XML (via FTP)
  – Dynamic XML (via PHP)
    • Parsed by ActionScript in Flash
    • Parsed by JavaScript

The data workflow

• Visualising a complex dataset
  – Bank debt exposure data
    • Monitor site for updates
    • CSV source
    • Clean in Excel
    • Import to MySQL database
    • Generate SQL query
    • Publish XML
    • Parse with ActionScript
    • Publish with Flash

• Newsrewired\BIS_monitoring.PNG
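The middle of that pipeline (CSV source, database import, SQL query, XML output) can be sketched in a few lines. This is an illustration, not the FT's actual code. It substitutes Python's built-in sqlite3 for MySQL, and the CSV columns, country labels and figures are invented placeholders rather than real exposure data.

```python
import csv
import io
import sqlite3
import xml.etree.ElementTree as ET

# Placeholder CSV; in practice this would be the downloaded source file.
CSV_TEXT = """lender,borrower,exposure
Country A,Country B,10.5
Country A,Country C,4.5
Country B,Country C,2.0
"""


def load_csv(conn: sqlite3.Connection, text: str) -> None:
    """Import the CSV rows into a database table, trimming stray whitespace."""
    conn.execute("CREATE TABLE exposure (lender TEXT, borrower TEXT, amount REAL)")
    reader = csv.DictReader(io.StringIO(text))
    conn.executemany(
        "INSERT INTO exposure VALUES (?, ?, ?)",
        [(r["lender"].strip(), r["borrower"].strip(), float(r["exposure"]))
         for r in reader],
    )


def totals_as_xml(conn: sqlite3.Connection) -> str:
    """Run the SQL query and publish the result as a small XML document."""
    root = ET.Element("exposures")
    query = "SELECT lender, SUM(amount) FROM exposure GROUP BY lender ORDER BY lender"
    for lender, total in conn.execute(query):
        ET.SubElement(root, "lender", name=lender, total=str(total))
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
load_csv(conn, CSV_TEXT)
xml_text = totals_as_xml(conn)
print(xml_text)
```

The XML string is what a front end, whether ActionScript or JavaScript, would then fetch and parse.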

The data workflow: the future

• Shifted from static to dynamic output

• Next step is automating the input side
  – Source APIs
  – Web scraping
  – “The web as database”

• Adding social media on input and output
  – Crowdsourcing (Guardian MP expenses)
  – Games and viral promotion (NYT budget cutter)
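Automating the input side by web scraping might look something like the sketch below, which pulls the rows out of an HTML table using only Python's standard library. The HTML is an inline placeholder so the example stays self-contained. In a real workflow the page would be fetched first, for instance with urllib.request. The class name TableScraper is hypothetical.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collects the text of every <td>/<th> cell, one list per <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None outside <tr>
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Placeholder page content standing in for a scraped source.
SAMPLE_HTML = """
<table>
  <tr><th>Name</th><th>Value</th></tr>
  <tr><td>Item A</td><td> 1 </td></tr>
  <tr><td>Item B</td><td>2</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(SAMPLE_HTML)

# Treat the first row as the header and turn the rest into records.
header, *records = scraper.rows
data = [dict(zip(header, record)) for record in records]

for item in data:
    print(item)
```

Once the page is reduced to records like these, it can feed the same clean, store and publish steps as any other source, which is roughly what “the web as database” implies.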

Cleaning data

www.ft.com/interactive

www.martinstabe.com

martin.stabe@ft.com

@martinstabe
