data journalism 101
DESCRIPTION
Data Journalism 101 workshop, presented by AP data journalist Serdar Tumgoren on April 29, 2014 to Bay Area journalists. Organized by the Society of Professional Journalists - Northern California chapter.
TRANSCRIPT
![Page 1: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/1.jpg)
Data Journalism 101
![Page 2: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/2.jpg)
What is data journalism?
![Page 3: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/3.jpg)
DJ in the wild
![Page 4: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/4.jpg)
![Page 5: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/5.jpg)
![Page 6: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/6.jpg)
![Page 8: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/8.jpg)
What is data journalism?
![Page 9: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/9.jpg)
“Wrangling, vetting and visualizing data to bring forth news stories in the public interest that we never would have found otherwise.” - Garance Burke, AP data journalist
![Page 10: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/10.jpg)
“A data journalist is anyone ... who can fluently work with this primary source [data]. It’s the same as a traditional reporter, who should know how to hunt down human sources and interview them.” - Me (I know, so lame to quote yourself)
![Page 11: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/11.jpg)
“Data journalism is a form of reporting that makes use of structured data (e.g. spreadsheets, databases) as a key component of researching and telling stories.” - Chad Skelton, data journalist at Vancouver Sun and journalism instructor
![Page 12: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/12.jpg)
“Data can be the source of data journalism, or it can be the tool with which the story is told, or it can be both. Like any source, it should be treated with scepticism; and like any tool, we should be conscious of how it can shape and restrict the stories that are created with it.” - Paul Bradshaw, Data Journalism Handbook
![Page 13: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/13.jpg)
Step-by-Step Guide on How To Become a Journicorn
![Page 14: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/14.jpg)
Step 1: Master the Basics
In no particular order:
Excel, MySQL, Postgres, SPSS, R, Javascript, Linux, Python, Ruby, QGIS, pdftk, ARCGIS, Ruby on Rails, Django, Backbone, Node, Hadoop, Mongo, C, Algol, Hypercard, Can, You, Tell, I’m, Just, Making, Shit, Up, Now?
![Page 15: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/15.jpg)
Don’t try to be a Journicorn.
(Hint: They don’t exist.)
![Page 16: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/16.jpg)
Be a journalist who uses data.
Data is just another source.
![Page 17: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/17.jpg)
Start with a Question, then Data
● Are housing prices going up?
● Do reports of falling crime bear out across the entire city?
● Are developers helping to finance campaigns of politicians who approved their projects?
● Are public employee salaries on the rise?
![Page 18: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/18.jpg)
Data sources
● Public agencies (local, county, state, federal)
● Data.gov sites
● Social networking sites (often APIs)
● Nonprofits/industry experts
● Academic institutions
● Manually gathered
![Page 19: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/19.jpg)
Databases of Databases
● Paid
  ○ Accurint ($)
  ○ Nexis ($)
● Free
  ○ BRB
  ○ Online Searches
  ○ Libraries
![Page 20: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/20.jpg)
Not everything is on the web.
A whole world of data may never see the light of day on government websites. How do you find it?
● Government forms provide clues
● Government employees
● Software contracts and manuals
![Page 21: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/21.jpg)
Useful datasets
● Building permits
● Campaign finance
● Corporate records
● Election
● Inspections
● Planning & Zoning
● Land records
● Etc. Etc.
![Page 22: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/22.jpg)
Open Records Laws
● Know and understand your rights
● Try to negotiate first
● Seek expert advice (CalAware, CFAC)
● Don’t go fishing; craft targeted requests
● Follow through on requests
![Page 23: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/23.jpg)
FOIA Resources
● RCFP Letter Generator
● RCFP Open Gov Guide
● FOIA Machine
![Page 24: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/24.jpg)
So I’ve found data. Now what?
![Page 25: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/25.jpg)
Understand the Data.
● What is the origin of the data?
● What do the fields mean?
● What rules surround the data?
● Seek expert advice and sanity checks.
![Page 26: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/26.jpg)
Wrangle the Data.
● What format is the source data?
● How do I convert the data for my tool of choice?
● Explore the data. Is it dirty?
● What cleanups are needed to answer my question?
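A minimal sketch of what a cleanup pass can look like, assuming Python is the tool of choice. The sample rows, the column names (`county`, `amount`) and the specific kinds of dirt (stray whitespace, inconsistent capitalization, dollar signs, blanks) are invented for illustration; real data will have its own quirks.

```python
import csv
import io

# Hypothetical sample of "dirty" source data; the column names are invented.
RAW = """county,amount
 Alameda ,"$1,200"
alameda,300
SAN MATEO,"$2,500"
San Mateo,
"""

def clean_row(row):
    """Normalize one row: tidy the county name and parse the dollar amount."""
    county = row["county"].strip().title()
    amount_text = row["amount"].replace("$", "").replace(",", "").strip()
    amount = float(amount_text) if amount_text else None  # keep blanks as None
    return {"county": county, "amount": amount}

def clean(raw_text):
    """Read CSV text and return a list of cleaned rows."""
    reader = csv.DictReader(io.StringIO(raw_text))
    return [clean_row(r) for r in reader]

rows = clean(RAW)
for r in rows:
    print(r)
```

Blank amounts are kept as `None` rather than turned into zero, so a missing value cannot quietly pass for a real one later in the analysis.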
![Page 27: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/27.jpg)
Sort, Filter, Sum, etc.
● Spreadsheets can take you far.
● Aggregate functions in SQL.
● Patterns and outliers in stats programs.
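As a rough illustration of aggregate functions in SQL, here is a sketch using Python's built-in `sqlite3` module. The `permits` table, its columns and the sample values are hypothetical; the point is only how `COUNT`, `SUM` and `MAX` with `GROUP BY` summarize records per group.

```python
import sqlite3

# Hypothetical building-permit records: (neighborhood, permit value in dollars).
PERMITS = [
    ("Mission", 50000),
    ("Mission", 150000),
    ("Sunset", 20000),
    ("Sunset", 30000),
    ("Sunset", 1000000),
]

def summarize(records):
    """Load records into an in-memory database and aggregate by neighborhood."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE permits (neighborhood TEXT, value INTEGER)")
    conn.executemany("INSERT INTO permits VALUES (?, ?)", records)
    query = """
        SELECT neighborhood, COUNT(*), SUM(value), MAX(value)
        FROM permits
        GROUP BY neighborhood
        ORDER BY SUM(value) DESC
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result

summary = summarize(PERMITS)
for neighborhood, count, total, largest in summary:
    print(neighborhood, count, total, largest)
```

In this made-up sample, the `MAX` column shows that one very large permit accounts for most of a neighborhood's total, the kind of outlier worth a phone call.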
![Page 28: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/28.jpg)
Add tools as needed.
Tools are abundant, free and paid.
Knowledge is abundant, freely shared*.
(*see IRE-L/NICAR-L)
![Page 29: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/29.jpg)
Keep reporting.
Most often data is a starting point or supplement. Check conclusions in the real world and circle back to refine and qualify data analyses.
![Page 30: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/30.jpg)
If you’re a visual person...
...confounded by the last few bits (like me)...
![Page 31: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/31.jpg)
The Data Journalism Process
Talk to people
“What data do I need to answer my question?”
Get The Data
Clean The Data
Check The Data
Interview The Data
Interview People
Display The Data
Tell The Story
![Page 32: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/32.jpg)
Quick Hit Data Wrangling
![Page 33: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/33.jpg)
Story idea is the key.
Most stats were already available and supported or confirmed by reporting. But we wanted county breakdowns for 2013 (most recent full year of granular data). So...
![Page 34: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/34.jpg)
Data wrangling ain’t pretty.
We got (dirty) data for 2013.
● copy/paste -> Excel = Fail
● pdftk -> CSV -> Excel = Fail
● pdftk -> CSV -> python -> Excel = Success
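The slides do not show what the Python step did. The sketch below is one guess at the kind of repair such a step might perform, assuming the extracted text carried repeated page headers, page-number lines and blank lines. The sample text and the `County`/`Cases` columns are invented.

```python
import csv
import io

# Hypothetical text extracted from a PDF: headers repeat and page lines intrude.
EXTRACTED = """County,Cases
Alameda,12
Page 1 of 2
County,Cases
Contra Costa,7

Marin,3
"""

def repair(text, header=("County", "Cases")):
    """Keep one header row plus every row with the expected number of columns."""
    kept = [list(header)]
    for row in csv.reader(io.StringIO(text)):
        if len(row) != len(header):   # page-number lines and blank lines
            continue
        if tuple(row) == header:      # repeated page headers
            continue
        kept.append(row)
    return kept

def to_csv(rows):
    """Serialize rows back to CSV text that a spreadsheet can open."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()

clean_rows = repair(EXTRACTED)
print(to_csv(clean_rows))
```

Rows are dropped only when they fail a simple, stated rule, which makes it easy to count what was discarded and confirm nothing real went missing.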
![Page 35: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/35.jpg)
Check the data.
A few strategies to ensure accuracy:
● Manually calculate a sample of subtotals, compare to calculated results.
● Compare totals to summary stats from third party.
● Have someone else check your work.
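The first two strategies can be sketched in a few lines of Python. Everything here is hypothetical: the rows, the hand-checked subtotal and the third-party total stand in for figures you would gather yourself.

```python
from collections import defaultdict

# Hypothetical cleaned rows: (county, count).
ROWS = [("Alameda", 12), ("Alameda", 5), ("Marin", 3), ("Marin", 4)]

def subtotals(rows):
    """Sum counts per county."""
    totals = defaultdict(int)
    for county, count in rows:
        totals[county] += count
    return dict(totals)

def mismatches(computed, expected):
    """Return {key: (computed, expected)} for every figure that disagrees."""
    return {
        key: (computed.get(key), value)
        for key, value in expected.items()
        if computed.get(key) != value
    }

computed = subtotals(ROWS)
hand_checked = {"Alameda": 17}   # worked out by hand for a sample of counties
third_party_total = 24           # hypothetical figure from another source

problems = mismatches(computed, hand_checked)
total_matches = sum(computed.values()) == third_party_total
print(problems, total_matches)
```

An empty `problems` dictionary and a matching total are reassuring but not proof; the third strategy, a second pair of eyes, still applies.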
![Page 36: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/36.jpg)
Keep a Data Diary
● Document data sources
● Document field descriptions, quirks, etc.
● Document data cleaning process
● Document analysis
![Page 37: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/37.jpg)
Remember.
Journicorns don’t exist.
![Page 38: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/38.jpg)
The Data Padawan
● See data as another source.
● Find and master tools, as needed.
● Write stories.
● Keep learning.
● Rinse and repeat.
● The end.
![Page 39: Data Journalism 101](https://reader034.vdocuments.us/reader034/viewer/2022042500/54c64eff4a7959ad7b8b45a0/html5/thumbnails/39.jpg)
Join the Community
If you do nothing else, sign up for IRE-L and NICAR-L.
Also, shameless plug for PythonJournos.