Data journalism: how to get data and process them?

Workshop on Data Journalism, February 17, 2014, Ghent. How to get the data and how to process them? Lorenzo Pellizzari

Upload: lorenzo-pellizzari

Post on 22-Nov-2014


DESCRIPTION

Workshop on data journalism given at the DataDays, organised by the Open Knowledge Foundation, on 17 February 2014.

TRANSCRIPT

Page 1: DataJournalism: How To get data and process them?

Workshop on Data Journalism

February 17, 2014, Ghent

How to get the data and how to process them?

Lorenzo Pellizzari

Page 2

About me …

Page 3

How to get the data?

Get the data

• Receive it

• Advanced search techniques

• Scrape it

Page 4

1. Receive it

Analyzing the War Logs (Associated Press)

Page 5

2. Advanced search techniques: Google

79,300,000 results

5 results
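Narrowing a search from millions of results to a handful usually comes from combining search operators. The following is a minimal sketch, assuming the widely documented exact-phrase quotes and the `site:` and `filetype:` operators. The function name `build_search_url`, the search phrase, and the domain are hypothetical placeholders, not the query shown on the slide.

```python
from urllib.parse import urlencode


def build_search_url(phrase, site=None, filetype=None):
    """Combine an exact phrase with optional site: and filetype: operators."""
    parts = ['"{}"'.format(phrase)]           # quotes request an exact phrase
    if site:
        parts.append("site:" + site)          # restrict results to one domain
    if filetype:
        parts.append("filetype:" + filetype)  # restrict results to one file type
    query = " ".join(parts)
    return query, "https://www.google.com/search?" + urlencode({"q": query})


# Hypothetical example: spreadsheets about air quality on a placeholder domain.
query, url = build_search_url("air quality", site="example.org", filetype="xls")
print(query)
print(url)
```

Each added operator removes results that do not match, which is why a well-targeted query can return only a few documents.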

Page 6

2. Advanced search techniques: SPARQL

http://dbpedia.org/sparql
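As a rough illustration of what querying this endpoint can look like, the sketch below sends a SPARQL query over HTTP using only the Python standard library. The query itself is an assumption made up for this example (the ten most populous Belgian cities known to DBpedia), not the query shown in the workshop, and the helper names `build_request` and `run_query` are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

ENDPOINT = "http://dbpedia.org/sparql"  # endpoint URL given on the slide

# Illustrative query (an assumption, not taken from the slides).
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?city ?population WHERE {
  ?city a dbo:City ;
        dbo:country dbr:Belgium ;
        dbo:populationTotal ?population .
}
ORDER BY DESC(?population)
LIMIT 10
"""


def build_request(query, endpoint=ENDPOINT):
    """Prepare a GET request that asks for results in SPARQL JSON format."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query, endpoint=ENDPOINT):
    """Send the query and return the result rows (needs network access)."""
    with urlopen(build_request(query, endpoint), timeout=30) as response:
        data = json.load(response)
    return data["results"]["bindings"]


request = build_request(QUERY)
print(request.full_url)

# To actually contact the endpoint, uncomment:
# for row in run_query(QUERY):
#     print(row["city"]["value"], row["population"]["value"])
```

The network call is kept in a separate function so the request can be inspected before anything is sent.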

Page 7

2. Advanced search techniques: SPARQL

Page 8

2. Advanced search techniques: SPARQL

http://latemar.science.unitn.it/spacetime/spacetime.html

Page 9

3. Freedom of Information laws

Page 10

3. Freedom of Information laws

Page 11

4. Scrape your data

“Web scraping (web harvesting or web data extraction) is a computer software technique of extracting information from websites.” (Wikipedia)

http://www-news.iaea.org/
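To make the definition concrete, here is a minimal scraping sketch that pulls the rows of an HTML table into Python lists using only the standard library. The class `TableScraper`, the function `scrape_table`, and the sample HTML are invented for illustration; the sample is not taken from the IAEA site, and a real page would first have to be downloaded and would likely need site-specific handling.

```python
from html.parser import HTMLParser


class TableScraper(HTMLParser):
    """Collect the text of every table cell, grouped by row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None  # drop any cell left open by malformed HTML


def scrape_table(html):
    """Return all table rows found in an HTML string."""
    parser = TableScraper()
    parser.feed(html)
    parser.close()
    return parser.rows


# Made-up sample page standing in for a downloaded web page.
SAMPLE_HTML = """
<table>
  <tr><th>Date</th><th>Event</th></tr>
  <tr><td>2014-01-10</td><td>Example event A</td></tr>
  <tr><td>2014-02-03</td><td>Example event B</td></tr>
</table>
"""

rows = scrape_table(SAMPLE_HTML)
for row in rows:
    print(row)
```

Once the rows are plain lists, they can be written to CSV or loaded into a spreadsheet for cleaning and analysis.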

Page 12

4. Scrape your data

Page 13

4. Scrape your data

Page 14

Poll: "What Analytics, Data mining, Big Data software you used in the past 12 months for a real project (not just evaluation)" [798 voters]

Process the data

http://www.kdnuggets.com/

Page 15

The software for data analysis

Share of R- or SAS-related posts to Stack Overflow by week.

http://r4stats.com/articles/popularity/

Page 16

The software for data analysis

Page 17

Example: ABC News

Scraping: Main data coming from government websites

Variety of reports: Data on salt and water

FOI: Data on chemical releases

Interactive map of gas wells and leases in Australia

http://datajournalismhandbook.org/

Page 18

Example: ABC News

• A web developer and designer

• A lead journalist

• A part-time researcher with expertise in data extraction, Excel spreadsheets and data cleaning

• A part-time junior journalist

• A consultant executive producer

• An academic consultant with expertise in data mining, graphic visualization and advanced research skills

• The services of a project manager and the administrative assistance of the ABC’s multi-platform unit

• Importantly, we also had a reference group of journalists and others whom we consulted on a needs basis

http://datajournalismhandbook.org/

Page 19