An introduction to Data Journalism
Data journalism - setting the stage
Anders Pedersen @anpe
@SchoolOfData
Open Knowledge
Open Knowledge is a worldwide non-profit network of people passionate about openness, using advocacy, technology and training to unlock information and enable people to work with it to create and share knowledge.
Evidence is power
School of Data works to empower civil society organizations, journalists and citizens with the skills they need to use data effectively – in their effort to create better societies.
Target audience
We work mostly with change makers: NGOs and journalists.
We empower them to use data effectively to advance their cause and mission through a combination of training and long-term support.
Why School of Data
School of Data is a critical component of the open data ecosystem:
● provides tools and training to empower people to use open data for good - especially to people new to open data;
● supports outreach and engagement by creating a supportive community of learners and mentors - working with Open Knowledge Foundation Local Groups;
● creates opportunities for people and communities to use open data to make an impact;
● works both with governments, to open up data, and with data users such as journalists and NGOs.
● Data expeditions - short online and offline gatherings where a group of people with different backgrounds tackle a data-related problem
● Data clinics - hands-on support working directly with people’s data
● Mentoring - local mentors working with local communities
● Online content - tutorials and walkthroughs
● Offline resources, e.g. Data Journalism Handbook
● We work globally, with a focus on the following regions: Latin America, Sub-Saharan Africa and the Middle East, and Europe
● School of Data is translated into Spanish and Portuguese
● Future: French, Greek and Italian
● Over 10 fellows working in countries like Egypt, Lebanon, Uganda, Mexico, Costa Rica, Brazil, etc.
Data Journalism:
Setting the stage
Where do gun owners live?
Complex stories can now be told
Budget information that readers can understand
But be aware of complexity!
How quickly will the ambulance arrive?
Source: http://visualoop.com/media/2012/11/How-fast-is-LAFD-where-you-live-750x298.jpg
Enables you to focus locally
And how about the fire truck?
Fire fighter response times in London
Granularity is king
Tip: the story is almost always buried in granular data
Source: Mapumental
Granularity is king
Who benefits from government subsidies?
Who is benefiting from government contracts?
Source: http://usual-suppliers.pudo.org/
Data journalism is also text mining
● U.K. MP expenses – 700,000 documents in PDF-format
● Wikileaks Iraq war data – 391,832 structured records, each including a text description
● Wikileaks diplomatic cables – 251,287 cables, each a few pages long
● NSA files leaked by Snowden – 50,000 to 200,000 documents, according to the NSA
A text document also contains data
Source: Jonathan Stray, Overview project
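One way to treat text documents as data is to count which terms recur across them. The sketch below is only an illustration: the three expense-claim snippets are made up, and `term_counts` is a hypothetical helper, not part of the Overview project or any tool named in these slides.

```python
import re
from collections import Counter


def term_counts(text, min_length=4):
    """Count words of at least min_length letters (case-insensitive)."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(w for w in words if len(w) >= min_length)


# Hypothetical snippets standing in for real extracted documents.
documents = [
    "Claim for cleaning of the moat and garden maintenance",
    "Claim for second home mortgage interest",
    "Garden maintenance and cleaning claim",
]

# Merge the per-document counts into one tally for the whole collection.
totals = Counter()
for doc in documents:
    totals.update(term_counts(doc))

# Print the most frequent terms across all documents.
print(totals.most_common(3))
```

With hundreds of thousands of documents, the same idea needs extraction first (for example OCR on PDFs), but the counting step stays this simple.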
Telling clear stories
Where do companies live?
Company ownership networks
Where do people live?
Source: Where nobody lives, http://mapsbynik.tumblr.com/post/82791188950/nobody-lives-here-the-nearly-5-million-census
Demographics: Where nobody lives
Using statistics can help you find stories
Stories in statistics: regression analysis and outliers → test fraud cases
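A minimal sketch of the regression-and-outliers idea, assuming made-up scores for eight schools: fit a least-squares line predicting this year's score from last year's, then flag schools whose result sits far from the line. The data, the function names and the threshold of two standard deviations are all assumptions for illustration. A flagged school is a lead to investigate, not proof of fraud.

```python
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def flag_outliers(xs, ys, threshold=2.0):
    """Return indexes whose residual exceeds threshold standard deviations."""
    slope, intercept = fit_line(xs, ys)
    residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    spread = pstdev(residuals)
    if spread == 0:
        return []  # every point lies exactly on the line
    return [i for i, r in enumerate(residuals) if abs(r) > threshold * spread]


# Hypothetical average test scores per school (same school order in both lists).
last_year = [50, 55, 60, 65, 70, 75, 80, 85]
this_year = [52, 56, 61, 66, 95, 76, 81, 86]

suspicious = flag_outliers(last_year, this_year)
print(suspicious)
```

Here only the school at index 4, which jumped from 70 to 95, is flagged.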
Condition: Machine-readable data
Nothing beats a good CSV file
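To show why a CSV file is so convenient, here is a minimal sketch using Python's standard `csv` module. The budget figures, the column names and the `total_by` helper are hypothetical; a real file would be opened from disk rather than from an in-memory string.

```python
import csv
import io
from collections import defaultdict

# Hypothetical budget extract; in practice this would be a file on disk.
RAW = """department,year,amount
Health,2013,1200
Education,2013,950
Health,2014,1300
"""


def total_by(rows, key, value):
    """Sum the numeric column `value`, grouped by the column `key`."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += float(row[value])
    return dict(totals)


# DictReader uses the header row to turn each line into a dictionary.
rows = list(csv.DictReader(io.StringIO(RAW)))
totals = total_by(rows, "department", "amount")
print(totals)
```

The same few lines do not work on a scanned PDF or a stack of paper, which is why machine-readable data is a precondition.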
Good data is rarely available
How we often get important data
Government official: “Please receive our annual audit reports in this stack of papers.”
Hard copies = hard work!
Crowd cleaning of data
When data is messy: Readers can help extract and clean data
Crowd cleaning of data
Readers can annotate documents
Mapping people, power and money
Source: “Who is in charge” created by CIVIO (Spain), http://quienmanda.es/
Mapping relationships
Who is friending whom?
What is in a picture? Matching faces to names
Source: vg.no mapping the royal family network in Norway (left), Dirty Energy Money (right)
Connected China
Source: “Who is in charge” created by CIVIO (Spain), http://quienmanda.es/
Data on relationships
Crowd collection of data
Readers can help collect data
A clear bar chart is often all you need
Spending: make readers understand
Where to find the data?
The data journalism tool box
● Extraction and scraping
○ Tabula
○ Scraperwiki
○ Online OCR
● Data cleaning
○ Open Refine
○ Spreadsheets - yes, you cannot live without them
● Visualisation
○ DataWrapper - http://datawrapper.de/
○ D3.js - http://d3js.org/
The Data Journalism Handbook
School of Data
The tools you need
Mailing lists
Thank you!
Stay in touch:
[email protected] | [email protected]
@anpe | @SchoolOfData