Data journalism: how to get data and process them?
DESCRIPTION
Workshop on data journalism given at the DataDays, organised by the Open Knowledge Foundation, on 17 February 2014.
TRANSCRIPT
Workshop on Data Journalism
February 17, 2014, Ghent
How to get the data and
how to process them?
Lorenzo Pellizzari
About me …
Get the data
Receive it
Advanced search techniques
Scrape it
How to get the data?
Receive it
Analyzing the War Logs (Associated Press)
Advanced search techniques: Google
79,300,000 results
5 results
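A minimal sketch of how such a narrowed search can be assembled, assuming the familiar Google operators for exact phrases (quotes), `site:` and `filetype:`. The phrase, domain and file type below are hypothetical placeholders, not the query shown on the slide, and the helper names are made up for illustration.

```python
from urllib.parse import urlencode


def build_google_query(phrase, site=None, filetype=None):
    """Combine an exact phrase with optional site: and filetype: operators."""
    parts = ['"{}"'.format(phrase)]  # quotes ask for the exact phrase
    if site:
        parts.append("site:" + site)  # restrict results to one domain
    if filetype:
        parts.append("filetype:" + filetype)  # restrict results to one file type
    return " ".join(parts)


def google_search_url(query):
    """Return a search URL that can be pasted into a browser."""
    return "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical example values (assumptions, not taken from the workshop)
query = build_google_query("annual report", site="example.org", filetype="xls")
print(query)
print(google_search_url(query))
```

Each added operator removes results that do not match, which is how a search with millions of hits can shrink to a handful.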
Advanced search techniques: SPARQL
http://dbpedia.org/sparql
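A rough sketch of querying this endpoint from Python, using only the standard library. The query itself (people born in Ghent) is a hypothetical illustration, not the one shown in the workshop. The `format` parameter is an assumption about the DBpedia server; the standard alternative is an HTTP `Accept` header. The sample response is made-up data in the SPARQL JSON results layout, so the parsing step can be tried offline.

```python
import json
import urllib.request
from urllib.parse import urlencode

ENDPOINT = "http://dbpedia.org/sparql"

# Hypothetical example query (assumption): ten people born in Ghent
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?person WHERE { ?person dbo:birthPlace dbr:Ghent } LIMIT 10
"""


def sparql_request_url(endpoint, query):
    """Build a GET URL; the SPARQL protocol passes the text as 'query'."""
    params = {"query": query, "format": "application/sparql-results+json"}
    return endpoint + "?" + urlencode(params)


def rows_from_results(payload):
    """Flatten SPARQL JSON results into a list of {variable: value} dicts."""
    variables = payload["head"]["vars"]
    rows = []
    for binding in payload["results"]["bindings"]:
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


def run_query(endpoint, query):
    """Send the query over the network (not called in this sketch)."""
    with urllib.request.urlopen(sparql_request_url(endpoint, query)) as resp:
        return rows_from_results(json.load(resp))


# Made-up sample response, so the parsing can be checked without a network
SAMPLE = """{"head": {"vars": ["person"]},
 "results": {"bindings": [
   {"person": {"type": "uri",
               "value": "http://dbpedia.org/resource/Example_Person"}}]}}"""

request_url = sparql_request_url(ENDPOINT, QUERY)
rows = rows_from_results(json.loads(SAMPLE))
print(rows)
```

Calling `run_query(ENDPOINT, QUERY)` would perform the live request; the result depends on the current state of DBpedia.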
Advanced search techniques: SPARQL
http://latemar.science.unitn.it/spacetime/spacetime.html
Freedom of Information laws
Scrape your data
“Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites.” (Wikipedia)
http://www-news.iaea.org/
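To make the definition concrete, here is a minimal scraping sketch using Python's standard `html.parser` module. It extracts the rows of an HTML table into lists of strings. The HTML snippet is made-up sample data, not the markup of any real site, and the class name `TableScraper` is a hypothetical helper. A real scraper would first download the page and would have to cope with messier markup.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row currently being read
        self._cell = None   # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a table cell
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


# Made-up sample page (assumption): a small table of dated events
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Event</th></tr>
  <tr><td>2014-01-05</td><td>Sample event A</td></tr>
  <tr><td>2014-02-11</td><td>Sample event B</td></tr>
</table>
"""

scraper = TableScraper()
scraper.feed(SAMPLE_HTML)
for row in scraper.rows:
    print(row)
```

Once the rows are plain lists, they can be written to a CSV file and loaded into a spreadsheet or a statistics package for analysis.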
Process the data
Poll: "What Analytics, Data mining, Big Data software you used in the past 12 months for a real project (not just evaluation)" [798 voters]
http://www.kdnuggets.com/
The software for data analysis
Share of R- or SAS-related posts to Stack Overflow by week.
http://r4stats.com/articles/popularity/
Example: ABC News
Scraping: main data coming from government websites
Variety of reports: Data on salt and water
FOI: Data on chemical releases
Interactive map of gas wells and leases in Australia
http://datajournalismhandbook.org/
Example: ABC News
• A web developer and designer
• A lead journalist
• A part-time researcher with expertise in data extraction, Excel spreadsheets and data cleaning
• A part-time junior journalist
• A consultant executive producer
• An academic consultant with expertise in data mining, graphic visualization and advanced research skills
• The services of a project manager and the administrative assistance of the ABC’s multi-platform unit
• Importantly we also had a reference group of journalists and others whom we consulted on a needs basis
http://datajournalismhandbook.org/