Post on 18-Aug-2020
USE OF GEOSPATIAL AND WEB DATA FOR OECD STATISTICS
CCSA SPECIAL SESSION ON SHOWCASING BIG DATA 1 OCTOBER 2015
Paul Schreyer Deputy-Director, Statistics Directorate, OECD
OECD APPROACH
• OECD:
  – Facilitator of discussion on new data sources for NSOs
  – OECD’s own use of new data sources
• From Big Data to Smart Data
  – Not every New data source is Big
  – Not every Big data source is New
Business value analysis: why are we working on this?
• More granularity or coverage of existing data (e.g. spatial disaggregation)
• New output (e.g. measuring trust, inequalities)
• Greater timeliness – nowcasting
• Increased impact – analysis supporting OECD mission, possibility to link areas
• Increased responsiveness – capacity to address new topics quickly, respond to what-if questions
Business process analysis: Necessary capabilities
– Capacity to identify, evaluate and access new data sources
– Command of methodology
– Proven quality and metadata frameworks
– Suitable IT infrastructures
– Established legal and ethical frameworks
– Skills and training capacity
4 types of new sources and examples of use cases
Types of new sources:
• Web crawling, web scraping
• Content analysis
• Mobility studies
• Sensor and geospatial data
Examples of use cases:
* Online real estate prices (OECD GOV)
* Measuring trade restrictiveness by scraping and analysing trade laws (OECD TAD)
* African Economic Outlook (AEO): Civil tensions and political governance indicators (OECD DEV)
* Big Data Measures of Human Well-Being – Evidence from US Google Index (OECD STD)
* Measure transport reliability from geolocalisation logs (ITF)
* Air quality and land cover data (OECD GOV)
* Enriching the metropolitan database using geo-spatial data (OECD GOV)
* PIAAC log file data (OECD EDU)
EXAMPLE 1 ENVIRONMENTAL INDICATORS Using geospatial data (satellite data)
Average population exposure to air pollution (PM2.5)
Key messages that the indicator should communicate:
– Where air pollution is above recommended levels
– Where improvements in air quality have happened
– Linking air pollution to health
Source: Raster (satellite observations)
Ground-based stations
Advantages
• Direct measures
• Offer regular levels of air pollution over time
• More pollutants are available
Disadvantages
• Low coverage in developing countries
• Uneven coverage within and across countries
• PM2.5 concentration rarely monitored
• Site selection, measurement techniques, and reporting methods differ across regions and countries

Satellite observations
Advantages
• Global coverage
• Consistent method to compute air pollution in cities, regions and countries
• Consistent time-series data, spanning more than a decade
Disadvantages
• Modelled data
• Satellite observations are less precise for bright surfaces (snow or desert)
• Current data are on a multi-year average, evaluation of short-term events often unavailable
Basic methodology
Satellite observations
• Raster: van Donkelaar et al. (2014)
• Resolution: ~10 km2
• Years: 1998-2012
1. The satellite-based values of air pollution are multiplied by the population living in the area (using a 1 km2 resolution grid)
2. The exposure to air pollution in a region is given by the sum of the population-weighted values of PM2.5 in the 1 km2 grid cells falling within the boundaries of the region
3. Finally, dividing this aggregated value by the total population in the region, we obtain the average exposure to PM2.5 concentration in a region
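The three methodology steps can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name, the toy 2x2 grids and the boolean region mask are hypothetical, and the real rasters would first have to be resampled to a common grid, which is not shown here.

```python
def population_weighted_exposure(pm25, population, in_region):
    """Average PM2.5 exposure for one region (illustrative sketch).

    pm25, population: 2-D lists on the same grid (assumed already aligned).
    in_region: 2-D list of booleans, True for cells inside the region.
    """
    weighted_sum = 0.0
    total_population = 0.0
    for row_pm, row_pop, row_in in zip(pm25, population, in_region):
        for concentration, people, inside in zip(row_pm, row_pop, row_in):
            if not inside:
                continue
            # Step 1: concentration multiplied by the population of the cell.
            # Step 2: summed over the cells that fall within the region.
            weighted_sum += concentration * people
            total_population += people
    if total_population == 0:
        return None  # no residents in the region, exposure undefined
    # Step 3: divide by the total population of the region.
    return weighted_sum / total_population


# Made-up toy values, not OECD data.
pm25 = [[10.0, 20.0], [30.0, 40.0]]
population = [[100, 300], [0, 50]]
in_region = [[True, True], [True, False]]

exposure = population_weighted_exposure(pm25, population, in_region)
print(exposure)  # (10*100 + 20*300 + 30*0) / 400 = 17.5
```

Cells with no residents contribute nothing to the result, which is why the indicator reflects where people live rather than the plain spatial average of the raster.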
• 68% of the urban population in OECD countries (376 million people) are exposed to pollution above the WHO’s recommended levels.
• OECD estimates show wide variation in PM2.5 exposure levels across cities within countries, with the largest variation in Mexico, Italy, Japan and Korea
Levels and trends in OECD cities
Figure: metropolitan minimum, country average and metropolitan maximum by country, with the number of cities in parentheses (for example Mexico (33), Italy (11), Japan (36), Korea (10)); labelled cities range from Mérida and Palermo to Oslo and Dublin.
Source: Brezzi and Sanchez-Serra (2014)
Other example: raster sources used for land cover
• Europe: Corine land cover
  – Resolution: 25 metres
  – Years: 2000-06
  – Classif. of urban land: 44 land urban classes
• USA: National land cover dataset (NLCD)
  – Resolution: 30 metres
  – Years: 2001-06
  – Classif. of urban land: 21 land cover classes
• Japan: Japan National Land Service Information data
  – Resolution: 100 metres
  – Years: 1997-2006
  – Classif. of urban land: 11 land cover classes
• World: MODIS 500 Map of Global Urban Extent
  – Resolution: 500 metres
  – Years: 2008
  – Classif. of urban land: 17 land cover classes Water
…feeds into the OECD Regional Well-Being Database
Links: Regional Well-Being database; Regional Well-Being web tool
EXAMPLE 2 TRADE POLICY ANALYSIS Using qualitative data from government websites
Basic idea
Traditionally:
• Policy questionnaires to countries
• ‘Manual’ screening of government websites
New:
• Machine-based monitoring of government websites
• Automatic check for changes or additions to rules and regulations
Test case: qualitative information for the OECD’s trade restrictiveness information and index
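A minimal sketch of what an automatic check for changed or added pages could look like, assuming the monitored pages have already been downloaded. The URLs, page texts and function names below are hypothetical, and content hashing is just one simple way to notice that a page differs from a stored snapshot; the slides do not say how the OECD pilot does it.

```python
import hashlib


def fingerprint(text):
    """Return a SHA-256 hex digest of the page text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def detect_changes(previous, current):
    """Compare stored fingerprints with freshly fetched page texts.

    previous: dict mapping URL -> fingerprint from the last run.
    current:  dict mapping URL -> page text from this run.
    Returns (changed_urls, added_urls), both sorted.
    """
    changed, added = [], []
    for url, text in current.items():
        if url not in previous:
            added.append(url)          # page not seen before
        elif previous[url] != fingerprint(text):
            changed.append(url)        # page content differs
    return sorted(changed), sorted(added)


# Hypothetical snapshot from an earlier run.
previous = {
    "https://example.org/law/1": fingerprint("Foreign equity is capped at 49%."),
    "https://example.org/law/2": fingerprint("Licences are issued by the ministry."),
}
# Hypothetical texts fetched now.
current = {
    "https://example.org/law/1": "Foreign equity is capped at 25%.",
    "https://example.org/law/2": "Licences are issued by the ministry.",
    "https://example.org/law/3": "A local presence is required.",
}

changed, added = detect_changes(previous, current)
print(changed, added)
```

A hash only says that something changed, not what; flagged pages would then go through a text comparison to locate the affected paragraphs.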
Text comparison - Initial discovery
• Run a text comparison between the original document and the new updated document
• Detect and flag specific paragraphs changed or updated inside long documents
Text comparison - Advanced discovery
• Changes in rules and regulations can also happen through new pages
• Use ‘big data’ techniques to compare in-house structured information to the universe of laws and regulations in a given country
• Work on text definitions similar to the original ones to help identify potentially relevant documents
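The initial-discovery step, flagging changed paragraphs inside a long document, can be illustrated with Python's standard `difflib` module. This is a sketch under assumptions: paragraphs are taken to be separated by blank lines, the sample legal text is invented, and the helper names are hypothetical rather than the tooling used in the pilot.

```python
import difflib


def split_paragraphs(text):
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]


def flag_changed_paragraphs(original, updated):
    """Return (tag, old_paragraphs, new_paragraphs) for every non-equal span.

    tag is one of 'replace', 'delete' or 'insert', as reported by
    difflib.SequenceMatcher.get_opcodes().
    """
    old = split_paragraphs(original)
    new = split_paragraphs(updated)
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((tag, old[i1:i2], new[j1:j2]))
    return changes


# Invented sample text for illustration only.
original = (
    "Article 1. Foreign equity is capped at 49%.\n\n"
    "Article 2. Licences are issued by the ministry.\n\n"
    "Article 3. Appeals go to the tribunal."
)
updated = (
    "Article 1. Foreign equity is capped at 25%.\n\n"
    "Article 2. Licences are issued by the ministry.\n\n"
    "Article 3. Appeals go to the tribunal.\n\n"
    "Article 4. A local presence is required."
)

changes = flag_changed_paragraphs(original, updated)
for tag, old_paras, new_paras in changes:
    print(tag, old_paras, new_paras)
```

Here Article 1 is reported as a replacement and Article 4 as an insertion, while the unchanged articles are skipped.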
IT Tools
How?
Web-crawling: scripts to systematically scan governmental websites where regulations can be found (federal, provincial, regional, etc.).
Web-scraping: scripts to extract the relevant information in documents, possibly based on articles and paragraphs (text analysis).
Document conversion: most laws and regulations are published as PDF, and sometimes in other formats; these need to be converted to text documents before text analysis can run.
Text comparison: tools and dictionaries to compare the text of updated documents with the original text and to calculate similarity coefficients with other documents, in a variety of languages, with the option to also use proximity of similar words.
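The text-comparison item mentions similarity coefficients between documents. The slides do not say which coefficient is used, so the sketch below picks one common choice, cosine similarity over word counts, purely as an assumption; the document names and texts are made up.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def cosine_similarity(text_a, text_b):
    """Cosine similarity between the word-count vectors of two texts."""
    counts_a = Counter(tokenize(text_a))
    counts_b = Counter(tokenize(text_b))
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty text is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_candidates(reference, candidates):
    """Return (score, name) pairs sorted from most to least similar."""
    scored = [(cosine_similarity(reference, text), name)
              for name, text in candidates.items()]
    return sorted(scored, reverse=True)


# Hypothetical reference definition and candidate documents.
reference = "foreign equity restrictions on telecommunications services"
candidates = {
    "doc_a": "new restrictions on foreign equity in telecommunications",
    "doc_b": "annual report on fisheries quotas",
}

ranked = rank_candidates(reference, candidates)
print(ranked[0][1])  # name of the most similar candidate
```

Word counts ignore word order and synonyms; the proximity of similar words mentioned above would need a richer representation than this.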
Promising results on French legal texts (Legifrance)
Web scraping / Text analysis
Summary
• Significant potential
• Use cases and pilots provide really important reality checks
• Smart data and multiple sources, not necessarily big data
• Initiatives have sprung up in many parts of OECD
• Need to be accompanied by overall strategy being developed at OECD
Thank you!