From unstructured data to structured journalism
Giuseppe Futia
Nexa Center for Internet and Society, Politecnico di Torino (DAUIN)
April 12, 2016
Master in Giornalismo "Giorgio Bocca" di Torino
Nexa Center for Internet & Society at Politecnico di Torino
Website: http://nexa.polito.it/
Communication Manager
Website, social media, mailing list
Research Fellow
GitHub account: https://github.com/giuseppefutia
Start with Why
Presentation of Jonathan Stray
(Journalist, data scientist)
YouTube Video:
https://www.youtube.com/watch?v=z4wHiv4bs-Y
Who said What?
Best tool for multilingual journalists
#newsHack 2016
organized by BBC Connected Studio
Team
• 1 Product manager
• 1 Software engineer
• 2 Researchers
• And journalists…?
New York Times, BBC, Washington Post
Source: Poynter.org
Using "machine learning," technologists at news outlets around the world are helping newsrooms eliminate extra time-consuming tasks and giving humans more time to do what they do best: reporting the news (Poynter.org)
Juicer BBC News Labs
Linked Data Cloud
Source: https://en.wikipedia.org/wiki/Linked_data
Knowledge Map Washington Post
Panama papers leak
Source: Wired.com
• 11.5 million documents
– 4.8 million emails
– 4 million database entries
– 2 million PDFs
– 1 million images
– 320,000 text documents
• 100 news organisations and 400 journalists
Panama papers processing
• Sort and organise the files
• Index these files
• Extract all of the metadata
• Investigate the data from a big-data and analytics perspective
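The first three steps above can be illustrated with a minimal Python sketch. It is not the pipeline used for the Panama papers: the file names, their contents, and the helper functions sort_by_type, extract_metadata and build_index are all hypothetical, and the "files" are kept in memory so the example runs on its own.

```python
import re
from collections import defaultdict

# Hypothetical in-memory stand-ins for leaked files: (file name, text content).
FILES = [
    ("letter_001.txt", "Transfer approved by the director"),
    ("mail_017.eml", "Please confirm the transfer today"),
    ("scan_204.pdf", "Director signature page"),
]


def sort_by_type(files):
    """Group file names by extension (the 'sort and organise' step)."""
    groups = defaultdict(list)
    for name, _ in files:
        ext = name.rsplit(".", 1)[-1].lower()
        groups[ext].append(name)
    return dict(groups)


def extract_metadata(files):
    """Collect simple per-file metadata: file type and word count."""
    return {
        name: {"type": name.rsplit(".", 1)[-1].lower(), "words": len(text.split())}
        for name, text in files
    }


def build_index(files):
    """Build an inverted index: lowercase word -> set of files containing it."""
    index = defaultdict(set)
    for name, text in files:
        for word in re.findall(r"[a-z]+", text.lower()):
            index[word].add(name)
    return index


groups = sort_by_type(FILES)
metadata = extract_metadata(FILES)
index = build_index(FILES)

# Look up which files mention a given word.
print(sorted(index["transfer"]))
```

An inverted index of this kind is what makes a large collection searchable: a query for a word returns the matching files directly instead of rescanning every document.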
Panama papers result
• The final database: 30 per cent of the original data size
• Extract entities: first names and surnames
• Analytics to find how these names relate to the documents
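The last two points can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the documents and the names in them are invented, and a naive regular expression (two consecutive capitalised words) stands in for real named-entity recognition, which the original investigation would have needed.

```python
import re
from collections import defaultdict

# Invented sample documents; the names are placeholders, not real data.
DOCS = {
    "doc1": "Payment authorised by Mario Rossi on behalf of Anna Bianchi.",
    "doc2": "Anna Bianchi requested a new account.",
    "doc3": "Meeting notes, no people mentioned.",
}

# Naive assumption: a person name is two consecutive capitalised words.
NAME_PATTERN = re.compile(r"\b([A-Z][a-z]+) ([A-Z][a-z]+)\b")


def extract_names(text):
    """Return the (first name, surname) pairs found in a text."""
    return NAME_PATTERN.findall(text)


def names_to_documents(docs):
    """Map each full name to the set of document ids that mention it."""
    mentions = defaultdict(set)
    for doc_id, text in docs.items():
        for first, last in extract_names(text):
            mentions[f"{first} {last}"].add(doc_id)
    return mentions


mentions = names_to_documents(DOCS)

# Rank names by how many documents mention them (ties broken alphabetically).
ranking = sorted(mentions, key=lambda name: (-len(mentions[name]), name))
for name in ranking:
    print(name, sorted(mentions[name]))
```

The name-to-documents mapping is the simplest form of the analytics step: it shows which people recur across files and which documents connect them.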
TellMeFirst http://tellmefirst.polito.it
Public Contracts http://public-contracts.nexacenter.org/
Data journalism as a framework
BBC News Labs Project
“To help news organisations curate stories that scale, adapt and connect across platforms and use cases”
Thanks!
GitHub Repository
https://github.com/giuseppefutia/