data, big data, open data

data, big data, open data. Vincenzo Patruno. Rome, 29 January 2013. Technological innovation, web and statistics

Upload: vincenzo-patruno

Post on 01-Dec-2014


TRANSCRIPT

Page 1: data, big data, open data

data, big data, open data Vincenzo Patruno

Rome, 29 January 2013

Technological innovation, web and statistics

Page 2: data, big data, open data

A world of data

Page 3: data, big data, open data

Obama’s Election Victory

Page 4: data, big data, open data

By combining disparate data sources on potential donors, volunteers and voters (email, postal, telephone, mobile and social contacts with historical voting records, polling and fundraising data), they built a single view of individuals that informed their strategies for raising funds, mobilizing volunteers and securing votes.

Creating a “single source of truth”

Source: http://connect.icrossing.co.uk/obamas-big-data-election-victory_9423
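As a minimal sketch of what building a "single view" can look like, the snippet below merges records from several sources on a shared key. The three source lists, their field names and the choice of a normalized email address as the key are all assumptions made for illustration; the campaign's actual data model is not described in the source.

```python
from collections import defaultdict

# Hypothetical records from three separate sources (illustrative only).
email_list = [{"email": "Ann@Example.org", "name": "Ann"}]
voting_history = [{"email": "ann@example.org", "voted_2008": True}]
donations = [
    {"email": "ann@example.org", "amount": 50},
    {"email": "bob@example.org", "amount": 20},
]


def build_single_view(*sources):
    """Merge records that share the same normalized email into one profile."""
    profiles = defaultdict(dict)
    for source in sources:
        for record in source:
            # Assumed matching rule: case-insensitive, whitespace-trimmed email.
            key = record["email"].strip().lower()
            fields = {k: v for k, v in record.items() if k != "email"}
            profiles[key].update(fields)
    return dict(profiles)


single_view = build_single_view(email_list, voting_history, donations)
print(single_view["ann@example.org"])
```

Real record linkage is usually harder than an exact key match, since the same person may appear with different emails, addresses or spellings across sources.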

Page 5: data, big data, open data

Demographics and data collected by fieldwork on the campaign trail were added to the mix, allowing predictive modelling to score people on their likelihood to donate or vote for the Democrats.

Channels of communication were optimized, and the type of messaging was tailored to maximize the likelihood of response.

Profiling and predicting

Source: http://connect.icrossing.co.uk/obamas-big-data-election-victory_9423
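Scoring people on their likelihood to respond can be illustrated with a logistic model. The sketch below is only an assumption about the general technique: the feature names, weights and bias are invented for the example, and a real model would be fitted to historical data.

```python
import math

# Hypothetical weights and bias (not taken from the source).
WEIGHTS = {"voted_2008": 1.2, "past_donor": 0.8, "contacted_by_volunteer": 0.5}
BIAS = -1.0


def score(person):
    """Return a probability-like score in (0, 1) from a logistic model."""
    z = BIAS + sum(w * float(person.get(f, 0)) for f, w in WEIGHTS.items())
    return 1.0 / (1.0 + math.exp(-z))


people = [
    {"id": "a", "voted_2008": 1, "past_donor": 1, "contacted_by_volunteer": 1},
    {"id": "b", "voted_2008": 0, "past_donor": 0, "contacted_by_volunteer": 1},
]

# Rank people from most to least likely to respond.
ranked = sorted(people, key=score, reverse=True)
for person in ranked:
    print(person["id"], round(score(person), 3))
```

A ranking like this is what would let a campaign decide whom to contact first and through which channel.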

Page 6: data, big data, open data

The power of localised networks and neighbourhoods

Using centralized data to provide geo-targeted insight, campaign volunteers could base themselves in the areas that mattered most, talking to the voters they had got to know since the start of the 2008 campaign.

Deliver their message from within communities

The impact of this saw them receive double the votes they achieved in 2008 in the marginal states.

Source: http://connect.icrossing.co.uk/obamas-big-data-election-victory_9423

Turning data into the human touch

Page 7: data, big data, open data

More than two million small donors contributed over 427 million dollars to his campaign.

About 55% of the funds raised came from donations under 200 dollars.

Source: http://connect.icrossing.co.uk/obamas-big-data-election-victory_9423

Turning data into the human touch

Page 8: data, big data, open data

Regular polling of states like Ohio throughout the campaign provided valuable data for the team to process and analyze trends.

For example, the analysts could track the impact of the three TV debates on the Democratic vote in real time and were able to identify specific segments to target with campaign material – split by region, demographics and the profile scoring that had been modeled in the new database. One Democrat official commented that they scenario-tested the election 66,000 times every night in order to calculate predicted outcomes for swing states.

Campaign resource was then allocated appropriately to persuade the undecided voters most likely to pledge their allegiance to Obama.

By the time election day came around, the Democrats had a clear idea of how voting in the swing states was looking.

Source: http://connect.icrossing.co.uk/obamas-big-data-election-victory_9423

Focus on the swing states
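Running the election tens of thousands of times is, in essence, a Monte Carlo simulation. The sketch below shows the idea under strong assumptions: the per-state win probabilities and the count of votes treated as already secured are invented, and states are simulated independently, which real models would not do.

```python
import random

# Hypothetical inputs: (win probability, electoral votes) per swing state.
SWING_STATES = {"Ohio": (0.55, 18), "Florida": (0.50, 29), "Virginia": (0.52, 13)}
SAFE_VOTES = 237      # assumed votes already secured elsewhere
VOTES_TO_WIN = 270


def simulate(n_runs, seed=0):
    """Estimate the probability of reaching VOTES_TO_WIN over n_runs trials."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_runs):
        total = SAFE_VOTES
        for p_win, votes in SWING_STATES.values():
            # Each state is an independent coin flip (a simplifying assumption).
            if rng.random() < p_win:
                total += votes
        if total >= VOTES_TO_WIN:
            wins += 1
    return wins / n_runs


estimate = simulate(66_000)
print(f"Estimated win probability: {estimate:.3f}")
```

Re-running the simulation each night with updated polling inputs would show how the predicted outcome shifts over time.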

Page 9: data, big data, open data

Data science involvement in the election wasn’t just restricted to the candidates’ teams.

Nate Silver used sabermetrics to accurately predict the outcome of all 50 state votes.

Source: http://connect.icrossing.co.uk/obamas-big-data-election-victory_9423

Page 10: data, big data, open data

Big Data – What Is It?

Volume. Variety. Velocity.

Variability. Complexity.

The first three of these, the three “Vs” of Big Data, were originally posited by Gartner’s Doug Laney in a 2001 research report.


Page 11: data, big data, open data

“It’s difficult to imagine the power that you’re going to have when so many different sorts of data are available”

Tim Berners-Lee

Page 12: data, big data, open data
Page 13: data, big data, open data

Data never sleeps

Page 14: data, big data, open data
Page 15: data, big data, open data
Page 16: data, big data, open data
Page 17: data, big data, open data

Source: http://ipcarrier.blogspot.it/2010/12/facebook-world.html

Facebook World

Page 18: data, big data, open data

http://youtu.be/xJXOavGwAW8

Page 19: data, big data, open data

The Data Deluge

Page 20: data, big data, open data

The MOBI methodology combines online measurement, cloud computing and market research to provide live consumer sentiment data around brands, products and purchase-influencing factors, using decision-supported information from millions of unsolicited opinions.

Mass Opinion Business Intelligence (MOBI) analyzes and classifies comments made online and distills the information into a pre-defined, structured database.

http://en.wikipedia.org/wiki/WiseWindow

Page 21: data, big data, open data

Financial Services Industry: Bloomberg and WiseWindow use social media and big data to improve investment returns.

http://en.wikipedia.org/wiki/WiseWindow

Page 22: data, big data, open data

Natural disasters: Twitter was a richer and more up-to-date source of information about the 5.8 magnitude quake in Virginia.

Page 23: data, big data, open data

Twitter traffic after the Japan earthquake

http://youtu.be/PThAriHjk10

Page 24: data, big data, open data

Automotive Industry: Big data analysis of social media comments can predict trends in automotive equipment failures.

Page 25: data, big data, open data

Telecommunications: T-Mobile used big data integrated with its transaction systems and social media to dramatically cut customer defections in one quarter.

Page 26: data, big data, open data

Energy/Utility Industry: GE is going to use social media reports to track outages faster and better.

Page 27: data, big data, open data

Advertising Industry: Dachis Group used big data analysis of social media to create a more up-to-date and accurate ranking of the competitive position of engagement at large companies.

Page 28: data, big data, open data

Marketing: Nestlé is using social media listening and analytics to engage at scale in the market through its big-data-powered central command center.

Page 29: data, big data, open data

Education Industry: DoSomething.org engaged 200,000 people worldwide on Facebook to combat bullying in schools and analyzed their sentiments.

Page 30: data, big data, open data

Criminal Justice: Police departments around the United States now use social media analysis extensively to fight crime.

Page 31: data, big data, open data

Health Care Industry: Using social media and big data to track cholera outbreaks in Haiti faster and more accurately.

Page 32: data, big data, open data

Application Programming Interface (API)

Page 33: data, big data, open data

API

Page 34: data, big data, open data

API

Page 35: data, big data, open data

API

API

Page 36: data, big data, open data

http://apistat.istat.it/?q=gettable&dataset=DCIS_POPORESBIL&dim=82,0,0,0&lang=0&tr=&te=

query string

API
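The query string in the URL above is a list of `name=value` pairs after the `?`. The sketch below rebuilds that URL from a dictionary and parses it back using Python's standard library, without making any network request. The meaning of the individual parameters (`dim`, `lang`, `tr`, `te`) is not stated in the slides and is not assumed here.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE = "http://apistat.istat.it/"

# Parameters copied from the URL on the slide.
params = {
    "q": "gettable",
    "dataset": "DCIS_POPORESBIL",
    "dim": "82,0,0,0",
    "lang": "0",
    "tr": "",
    "te": "",
}

# safe="," keeps the commas in "dim" unescaped, as in the original URL.
url = BASE + "?" + urlencode(params, safe=",")
print(url)

# Parsing goes the other way: from a URL back to named parameters.
parsed = parse_qs(urlsplit(url).query, keep_blank_values=True)
print(parsed["dataset"])
```

Changing a single value in `params`, for example `dataset`, is how a client would ask the same API for a different table.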

Page 37: data, big data, open data

API

Page 38: data, big data, open data

E.g.:

https://stream.twitter.com/1.1/statuses/sample.json

https://dev.twitter.com/

http://developers.facebook.com/
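A streaming endpoint of this kind returns an open-ended sequence of JSON objects rather than a single document. The sketch below assumes one JSON object per line, with blank keep-alive lines in between, and uses an in-memory sample in place of the live stream, which requires authentication. The `id` and `text` fields are illustrative.

```python
import json


def iter_statuses(lines):
    """Yield one decoded JSON object per non-empty line of a stream."""
    for line in lines:
        line = line.strip()
        if not line:  # assumed keep-alive lines are skipped
            continue
        yield json.loads(line)


# Hypothetical sample standing in for the live, authenticated stream.
sample_stream = [
    '{"id": 1, "text": "hello"}',
    "",
    '{"id": 2, "text": "big data"}',
]

texts = [status["text"] for status in iter_statuses(sample_stream)]
print(texts)
```

Because `iter_statuses` is a generator, it processes each message as it arrives instead of waiting for the stream to end.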

Page 39: data, big data, open data

http://cs.croakun.com/

Page 40: data, big data, open data

[…]

Page 41: data, big data, open data

50% work

10% spare time activities

7% TV and Radio

5% pointless babble

3% politics

Thanks, Piet!

Page 42: data, big data, open data

http://youtu.be/iReY3W9ZkLU

Page 43: data, big data, open data

1. Big Data is Only About Massive Data Volume

Generally speaking, experts consider petabytes of data as the starting point for Big Data, although this volume indicator is a moving target. Therefore, while volume is important, the next two “Vs” are better individual indicators.

Variety refers to the many different data and file types that are important to manage and analyze more thoroughly, but for which traditional relational databases are poorly suited. Some examples of this variety include sound and movie files, images, documents, geo-location data, web logs, and text strings.

Velocity is about the rate of change in the data and how quickly it must be used to create real value. Traditional technologies are especially poorly suited to storing and using high-velocity data, so new approaches are needed. The more quickly the data in question is created and aggregated, and the more swiftly it must be used to uncover patterns and problems, the greater the velocity and the more likely it is that you have a Big Data opportunity.

Top 5 Myths about Big Data

Page 44: data, big data, open data

2. Big Data Means Hadoop

Hadoop is the Apache open-source software framework for working with Big Data. It was derived from Google technology and put into practice by Yahoo and others. But Big Data is too varied and complex for a one-size-fits-all solution. While Hadoop has surely captured the greatest name recognition, it is just one of three classes of technologies well suited to storing and managing Big Data. The other two classes are NoSQL and Massively Parallel Processing (MPP) data stores. (See myth number five for more about NoSQL.) Examples of MPP data stores include EMC’s Greenplum, IBM’s Netezza, and HP’s Vertica.

Top 5 Myths about Big Data
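The Google technology behind Hadoop includes the MapReduce programming model. The sketch below illustrates that model in plain Python with a word count; it is not Hadoop's API, it runs in a single process, and the function names are chosen for the example.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for each word in a document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


documents = ["big data open data", "open government"]
pairs = [pair for doc in documents for pair in map_phase(doc)]
counts = reduce_phase(shuffle(pairs))
print(counts)
```

In a real cluster the map and reduce steps run in parallel on many machines, each working on its own slice of the data.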

Page 45: data, big data, open data

3. Big Data Means Unstructured Data

Big Data is probably better termed “multi-structured” as it could include text strings, documents of all types, audio and video files, metadata, web pages, email messages, social media feeds, form data, and so on. The consistent trait of these varied data types is that the data schema isn’t known or defined when the data is captured and stored. Rather, a data model is often applied at the time the data is used.

Top 5 Myths about Big Data
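Applying the data model only when the data is used is often called "schema on read". The sketch below assumes raw records stored as JSON strings with differing fields; the records and field names are invented for illustration.

```python
import json

# Raw records are stored as captured, with no schema enforced on write.
raw_store = [
    '{"user": "ann", "text": "hello", "lang": "en"}',
    '{"user": "bob", "geo": [45.07, 7.69]}',
]


def read_with_schema(raw_records, fields):
    """Project each raw record onto the fields a given analysis needs."""
    for raw in raw_records:
        record = json.loads(raw)
        # Fields missing from a record come back as None instead of failing.
        yield {field: record.get(field) for field in fields}


rows = list(read_with_schema(raw_store, ["user", "lang"]))
print(rows)
```

A different analysis could read the same store with a different field list, for example `["user", "geo"]`, without any change to how the data was saved.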

Page 46: data, big data, open data

4. Big Data is for Social Media Feeds and Sentiment Analysis

Simply put, if your organization needs to broadly analyze web traffic, IT system logs, customer sentiment, or any other type of digital shadows being created in record volumes each day, Big Data offers a way to do this. Even though the early pioneers of Big Data have been the largest, web-based, social media companies -- Google, Yahoo, Facebook -- it was the volume, variety, and velocity of data generated by their services that required a radically new solution rather than the need to analyze social feeds or gauge audience sentiment.

Top 5 Myths about Big Data

Page 47: data, big data, open data

5. NoSQL means No SQL

NoSQL means “not only” SQL because these types of data stores offer domain-specific access and query techniques in addition to SQL or SQL-like interfaces. Technologies in this NoSQL category include key-value stores, document-oriented databases, graph databases, big table structures, and caching data stores. The specific native access methods to stored data provide a rich, low-latency approach, typically through a proprietary interface. SQL access has the advantage of familiarity and compatibility with many existing tools, although this usually comes at some expense of latency, driven by the translation of the query into the native “language” of the underlying system.

For example, Cassandra, the popular open source key-value store offered in commercial form by DataStax, not only includes native APIs for direct access to Cassandra data, but also offers CQL (its SQL-like interface) as its emerging preferred access mechanism. It’s important to choose the right NoSQL technology to fit both the business problem and the data type, and the many categories of NoSQL technologies offer plenty of choice.

Top 5 Myths about Big Data
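The contrast between native key-based access and SQL-style access can be sketched with the standard library alone. Here a Python dictionary stands in for a key-value store and SQLite stands in for a SQL interface; neither is Cassandra or CQL, and the keys and records are invented for the example.

```python
import sqlite3

# Native-style access: a direct lookup by key (a dict as stand-in store).
kv_store = {
    "user:1": {"name": "Ann", "city": "Roma"},
    "user:2": {"name": "Bob", "city": "Milano"},
}
native_result = kv_store["user:1"]["name"]

# SQL-style access to the same data, using in-memory SQLite as a stand-in.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?, ?)",
    [(k, v["name"], v["city"]) for k, v in kv_store.items()],
)
# A declarative query by attribute, which a plain key lookup cannot express.
sql_result = conn.execute(
    "SELECT name FROM users WHERE city = ?", ("Roma",)
).fetchone()[0]
conn.close()

print(native_result, sql_result)
```

The key lookup is direct but only works when the key is known, while the SQL query is more flexible at the cost of an extra interpretation layer.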

Page 48: data, big data, open data
Page 49: data, big data, open data

http://youtu.be/0eUeL3n7fDs

Page 50: data, big data, open data

http://www.whitehouse.gov/sites/default/files/microsites/ostp/big_data_press_release_final_2.pdf

Page 51: data, big data, open data

“Data scientist”

Page 52: data, big data, open data

http://open.nasa.gov/blog/2012/10/04/what-is-nasa-doing-with-big-data-today/

Page 53: data, big data, open data

Can Big Data be used to measure economic, social and environmental phenomena?

Page 54: data, big data, open data

Sample surveys

Administrative archives

Page 55: data, big data, open data

Price statistics

Page 56: data, big data, open data

Significance magazine, August 2012

Big Data and City Living – what can it do for us?

Page 57: data, big data, open data

Big Data Sources

Administrative

Sensors

Transactional

Behavioural

Tracking Devices

Page 58: data, big data, open data
Page 59: data, big data, open data

Web Scraping

Page 60: data, big data, open data

Examples

http://www.comune.torino.it/ambiente/aria/qualita_aria/dati_aria/valori_annuali_pm10.shtml

https://scraperwiki.com/scrapers/valori_pm10_in_comune_di_torino/

Web Scraping
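Web scraping a page like the PM10 table above means turning HTML markup back into rows of data. The sketch below uses Python's built-in `html.parser` on a small inline fragment; the fragment, station name and value are hypothetical, and the real page layout may differ.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect the text of every table cell, row by row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            if not self.rows:  # tolerate cells that appear before any <tr>
                self.rows.append([])
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self.rows[-1][-1] += data.strip()


# Hypothetical page fragment standing in for a downloaded page.
page = """
<table>
  <tr><th>Station</th><th>Year</th><th>PM10</th></tr>
  <tr><td>Station A</td><td>2011</td><td>48</td></tr>
</table>
"""

parser = TableParser()
parser.feed(page)
header, *records = parser.rows
print(header, records)
```

In practice the page would first be downloaded, and the scraper would need updating whenever the site changes its markup.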

Page 61: data, big data, open data

Milan, 13 December 2012

Examples

http://thebiobucket.blogspot.it/2011/10/little-webscraping-exercise.html#more

Web Scraping

Page 62: data, big data, open data
Page 63: data, big data, open data

Examples

http://www.metoffice.gov.uk/climate/uk/stationdata/armaghdata.txt

Web Scraping
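Plain-text station files are easier to scrape than HTML, but they still need parsing. The sketch below assumes whitespace-separated rows beginning with a four-digit year and a month, `---` for missing values, and trailing marker characters on some values. The sample lines and column choice are hypothetical and are not copied from the actual file.

```python
def to_float(token):
    """Convert a token to float, treating '---' as missing (assumed convention)."""
    token = token.rstrip("*#")  # assumed markers for flagged values
    return None if token == "---" else float(token)


def parse_station_lines(lines):
    """Parse rows whose first two fields are a 4-digit year and a month."""
    records = []
    for line in lines:
        parts = line.split()
        # Skip headers and notes: data rows are assumed to start with a year.
        if len(parts) < 4 or not (parts[0].isdigit() and len(parts[0]) == 4):
            continue
        records.append({
            "year": int(parts[0]),
            "month": int(parts[1]),
            "tmax": to_float(parts[2]),
            "tmin": to_float(parts[3]),
        })
    return records


# Hypothetical sample lines in the assumed layout.
sample = [
    "Armagh",
    "   yyyy  mm   tmax    tmin",
    "   1853   1    ---     ---",
    "   2012   7   18.1*   10.9",
]

station_records = parse_station_lines(sample)
print(station_records)
```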

Page 64: data, big data, open data

http://elezionistorico.interno.it/

Page 65: data, big data, open data
Page 66: data, big data, open data
Page 67: data, big data, open data

Open Data rests on the observation that public data has been produced with public money, that is, the community's money. It is therefore to the community that the data must be returned.

Open Data

Page 68: data, big data, open data

Data freely accessible to everyone in an open format, without copyright, patent or other control restrictions that limit its use.

Open Data

Page 69: data, big data, open data

A model of governance, at central and local level, based on openness (participation and collaboration) and on transparency towards citizens

Open Government

Page 70: data, big data, open data

The initiatives

Page 71: data, big data, open data

The initiatives

Page 72: data, big data, open data

Open Data

Government Data

Community Data

Corporate Data

Open Data

Page 73: data, big data, open data

Community Data

Page 74: data, big data, open data

Corporate Data

Page 75: data, big data, open data
Page 76: data, big data, open data

Data catalogues

Page 77: data, big data, open data

E.g. http://www.istat.it/it/files/2012/12/Tavole_XLS.zip

Open Data formats

Page 78: data, big data, open data

Metadata

title

description

url

source

licence

category

date

territory

Data catalogues
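The metadata fields listed on this slide map naturally onto a small record type. The sketch below is one possible representation, assuming every field is a plain string; the example values are placeholders and do not describe a real dataset.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class CatalogueEntry:
    """One dataset description, using the metadata fields from the slide."""
    title: str
    description: str
    url: str
    source: str
    licence: str
    category: str
    date: str
    territory: str


# Placeholder values for illustration only.
entry = CatalogueEntry(
    title="Example dataset",
    description="Placeholder description",
    url="http://example.org/dataset.zip",
    source="Example agency",
    licence="Example licence",
    category="Example category",
    date="2013-01-29",
    territory="Example territory",
)

# Serializing to JSON makes the entry easy to publish or exchange.
entry_json = json.dumps(asdict(entry), indent=2)
print(entry_json)
```

Keeping the same fields for every dataset is what lets a catalogue be searched and filtered, for example by territory or licence.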

Page 79: data, big data, open data

Volume Sources

Relationships Context

Page 80: data, big data, open data

Hospital admissions

Data Integration

Page 81: data, big data, open data

Hospital admissions

Causes of death

Health expenditure

Maps

Environmental data

Industries by ATECO code

Municipal resolutions

Regional measures

Building permits

Declarations by politicians

Criminal records registry

Data Integration

Page 82: data, big data, open data

Hospital admissions

Causes of death

Health expenditure

Geographic data

Environmental data

Industries by ATECO code

Municipal resolutions

Regional measures

Building permits

Declarations by politicians

Criminal records registry

Data Integration
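Integrating datasets like those listed above usually comes down to joining them on a shared key. The sketch below assumes each dataset is keyed by a common territory code; the codes and values are invented for illustration.

```python
# Hypothetical datasets keyed by an assumed shared territory code.
hospital_admissions = {"A01": 1200, "B02": 3400}
pm10_annual_mean = {"A01": 41.0, "C03": 30.5}


def integrate(**datasets):
    """Outer-join several {code: value} mappings on their keys."""
    codes = sorted(set().union(*datasets.values()))
    return {
        code: {name: data.get(code) for name, data in datasets.items()}
        for code in codes
    }


integrated = integrate(admissions=hospital_admissions, pm10=pm10_annual_mean)
for code, row in integrated.items():
    print(code, row)
```

An outer join keeps territories that appear in only one dataset, with `None` marking the missing values, so gaps in coverage stay visible.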

Page 83: data, big data, open data

RDF

Page 84: data, big data, open data

LOD Cloud

Page 85: data, big data, open data

LOD Cloud

Page 86: data, big data, open data

Semantic Web

Linked Open Data
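RDF, the data model behind Linked Open Data, describes everything as subject, predicate, object triples. The sketch below models triples as plain Python tuples, with simple pattern matching and a simplified N-Triples-style output. The URIs under `example.org` are hypothetical, and a real application would use an RDF library and published vocabularies.

```python
# Hypothetical URIs; real datasets would use their own vocabularies.
EX = "http://example.org/"

triples = [
    (EX + "Torino", EX + "locatedIn", EX + "Piemonte"),
    (EX + "Torino", EX + "label", "Torino"),  # object is a literal value
    (EX + "Piemonte", EX + "locatedIn", EX + "Italia"),
]


def match(store, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def to_ntriples(triple):
    """Render one triple in a simplified N-Triples style (no escaping)."""
    s, p, o = triple
    obj = f"<{o}>" if o.startswith("http") else f'"{o}"'
    return f"<{s}> <{p}> {obj} ."


located = match(triples, p=EX + "locatedIn")
for triple in located:
    print(to_ntriples(triple))
```

Because every resource is named by a URI, triples published by different organizations can refer to the same thing, which is what allows separate datasets to be linked.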

Page 87: data, big data, open data

@vincpatruno


Thank you for your attention!

http://www.vincenzopatruno.org