

Colour key: green bold = increase of more than 10%; green = 0-10% increase; red = 0-10% decrease; red bold = decrease of more than 10%.

OpenStreetMap and Wikipedia: A Method for Identifying Cultural Ecosystem Services

Programme: MSc Geographic Information Science

Name of Student: Christopher French

Supervisors: Muki Haklay and Gianfranco Gliozzo

UCL Department of Civil, Environmental and Geomatic Engineering, Gower St, London, WC1E 6BT

Introduction

The development of Web 2.0 technologies[1] has given rise to a valuable source of geographic content called volunteered geographic information (VGI)[2], making way for a multitude of open source applications. This project focuses on two of these, OpenStreetMap (OSM) and Wikipedia, with the aim of using them to identify and analyse distributions of cultural ecosystem services (CESs). Ecosystem services are the benefits, material and immaterial, that people experience from environments[3]. This work targets CESs, which include religious and spiritual enrichment, recreational or aesthetic experiences, tourism, and benefits with cognitive, educational or scientific value. Research in the field is growing, which is of great importance because there are strong linkages between CESs and health, welfare and social relations; however, CES supply is declining while demand is rising.

Aims and Objectives

The aim was to carry out spatial and temporal analysis similar to the methods used in public participatory ecosystem service research[4], but using VGI data from OSM and Wikipedia instead, in order to build on current knowledge and test the credibility of these data sources in this research field. A further aim was to improve upon previous OSM data extraction and processing methods[5] in terms of time and computational intensity, while delivering equally robust and valuable results.

The chosen study area is South Wales. Its extent is diverse, encompassing major coastal cities, The Valleys, the Brecon Beacons National Park and the Wye Valley, which is an area of outstanding natural beauty (AONB).

References

[1] O’Reilly, T. (2005). What is Web 2.0: Design Patterns and Business Models for the Next Generation of Software. http://www.oreilly.com/pub/a/web2/archive/what-is-web-20.html?page=1.

[2] Goodchild, M. (2007). Citizens as sensors: The world of volunteered geography. GeoJournal. 69, 211-221.

[3] Millennium Ecosystem Assessment. (2005). Ecosystems and Human Well-Being. Millennium Ecosystem Assessment. Washington, DC: World Resources Institute.

[4] Brown, G., Fagerholm, N. (2015). Empirical PPGIS/PGIS mapping of ecosystem services: A review and evaluation. Ecosystem Services. 13, 119-133.

[5] Mooney, P., Corcoran, P. (2012a). Characteristics of Heavily Edited Objects in OpenStreetMap. Future Internet. 4, 285-305.

[6] Korner, P. (2011). GitHub: Source code for the OpenStreetMap history splitter. https://github.com/MaZderMind/osm-history-splitter.

[7] OSM (2016). Full OSM history dump. http://planet.openstreetmap.org/planet/full-history.

[8] Django (2016). Documentation: GeoDjango. https://docs.djangoproject.com/en/1.10/ref/contrib/gis.

[9] MEA (Millennium Ecosystem Assessment). (2005). Ecosystems and Human Well-being: Synthesis. Island Press, Washington, D.C., USA.

[10] Alessa, L., Kliskey, A., Brown, G. (2008). Social-ecological hotspots mapping: a spatial approach for identifying coupled social–ecological space. Landsc. Urban Plann. 85, 27-39.

Methodology

The osm-history-splitter extraction software[6] was compiled on the Elastic Compute Cloud (EC2) from Amazon Web Services, allowing faster download and processing of the full OSM history file[7]. Processing was sped up further by using Python code to split the edits into sub-bounding boxes based on their associated tag, grouping edits that share the same tag (e.g. natural=peak). The grouped edits could then be imported into the database faster than loading them individually, as had been done previously. GeoDjango[8] was also used to create and populate the databases, which avoided writing lengthy SQL.
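The grouping step described above can be sketched roughly as follows. This is a minimal illustration, not the poster's actual code: the function names, the `(lon, lat, tag)` input shape and the 0.1-degree cell size are all assumptions made for the example.

```python
from collections import Counter
from math import floor


def group_edits_by_tag_and_cell(edits, cell_w=0.1, cell_h=0.1):
    """Count edits per (tag, grid column, grid row).

    edits: iterable of (lon, lat, tag) tuples (assumed input shape).
    cell_w, cell_h: width and height of a sub-bounding box in degrees
    (hypothetical values, not taken from the poster).
    """
    counts = Counter()
    for lon, lat, tag in edits:
        col = floor(lon / cell_w)
        row = floor(lat / cell_h)
        counts[(tag, col, row)] += 1
    return counts


def cell_bbox(col, row, cell_w=0.1, cell_h=0.1):
    """Return (min_lon, min_lat, max_lon, max_lat) for a grid cell."""
    return (col * cell_w, row * cell_h, (col + 1) * cell_w, (row + 1) * cell_h)


# Made-up sample edits: two peaks close together and one place of worship.
sample_edits = [
    (-3.43, 51.88, "natural=peak"),
    (-3.44, 51.89, "natural=peak"),
    (-3.18, 51.48, "amenity=place_of_worship"),
]
grouped = group_edits_by_tag_and_cell(sample_edits)
```

Each key in `grouped` then stands for one sub-bounding box per tag, so the database receives one record per box with a count rather than one record per edit.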

The CES categories were selected by reviewing key papers to find the most commonly used categories. These were: recreation, aesthetics, spiritual and religion, education and cultural heritage. Grouping tags into categories was challenging because no classification scheme exists, so it had to be carried out subjectively. This process would have benefited from a sensitivity analysis, principal component analysis or peer review to add validity to the decision making.

Methodological framework of the data preparation and analysis.
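The subjective grouping of tags into CES categories can be pictured as a lookup table. The sketch below is illustrative only: the tags are real OSM tags, but their assignment to categories here is hypothetical, since the poster does not list the mapping that was actually used.

```python
from collections import Counter

# Hypothetical tag-to-category mapping, for illustration only.
TAG_TO_CES = {
    "leisure=park": "recreation",
    "tourism=viewpoint": "aesthetics",
    "natural=peak": "aesthetics",
    "amenity=place_of_worship": "spiritual and religion",
    "amenity=school": "education",
    "historic=castle": "cultural heritage",
}


def category_shares(tags):
    """Return the percentage of categorised edits in each CES category.

    Tags without an entry in TAG_TO_CES are ignored.
    """
    counts = Counter(TAG_TO_CES[t] for t in tags if t in TAG_TO_CES)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: 100.0 * n / total for category, n in counts.items()}


# Made-up sample: four categorised edits and one uncategorised edit.
sample_tags = [
    "leisure=park",
    "natural=peak",
    "tourism=viewpoint",
    "historic=castle",
    "highway=residential",
]
shares = category_shares(sample_tags)
```

Keeping the mapping in one table like this would also make a sensitivity analysis straightforward, because alternative mappings can be swapped in and the shares recomputed.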

Kernel density estimation (KDE) of all cultural ecosystem services

Results

Figure 2 - Total yearly OSM edits relating to CESs, June 2006 - June 2016

Figure 4 - Total OSM edits relating to each of the CESs, split into 2-year time periods, June 2006 - June 2016

Figure 6 - Percentage of OSM edits in each CES category, June 2006 - June 2016

Figure 3 - Total Wikipedia article revisions relating to CESs, June 2006 - June 2016

Figure 5 - Total Wikipedia article revisions relating to each of the CESs, split into 2-year time periods, June 2006 - June 2016

Figure 7 - Percentage of Wikipedia article revisions in each CES category, June 2006 - June 2016

Figure 8 - Kernel density estimation raster surface displaying all OSM edits relating to CESs, June 2006 - June 2016

Figure 9 - Kernel density estimation raster surface displaying all Wikipedia article revisions relating to CESs, June 2006 - June 2016

KDE and hotspots of each cultural ecosystem service category - OSM (panels: Recreation, Aesthetics, Spiritual and Religion, Education, Cultural Heritage)

Figure 10 - Kernel densities and hotspots of the individual CES categories for OSM, June 2006 to June 2016

KDE and hotspots of each cultural ecosystem service category - Wikipedia (panels: Recreation, Aesthetics, Spiritual and Religion, Education, Cultural Heritage)

OSM

| Area | Recreation | Aesthetics | Spiritual and Religion | Education | Cultural Heritage |
|---|---|---|---|---|---|
| Overall study area (%) | 10.0 | 20.1 | 18.5 | 24.2 | 27.2 |
| Urban and suburban areas (%) | 12.7 | 13.6 | 41.8 | 19.5 | 12.4 |
| Brecon Beacons National Park (%) | 8.7 | 18.6 | 5.4 | 20.4 | 46.9 |
| Wye Valley (AONB) (%) | 9.4 | 18.8 | 9.9 | 33.1 | 28.7 |

Wikipedia

| Area | Recreation | Aesthetics | Spiritual and Religion | Education | Cultural Heritage |
|---|---|---|---|---|---|
| Overall study area (%) | 20.4 | 22.8 | 12.6 | 27.7 | 16.5 |
| Urban and suburban areas (%) | 25.8 | 18.7 | 20.8 | 21.2 | 13.4 |
| Brecon Beacons National Park (%) | 25.3 | 27.4 | 10.5 | 23.2 | 13.7 |
| Wye Valley (AONB) (%) | 13.6 | 37.3 | 1.7 | 39.0 | 8.5 |

Figure 12 - The proportion of CES-related OSM edits and Wikipedia revisions in urban areas, national parks and AONBs compared to the total study area.
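The comparison against the total study area can be worked through with the OSM figures from the table. The sketch below rests on one assumption: it reads the poster's colour key as a difference in percentage points from the overall study area. The poster does not state whether the key refers to percentage points or relative change, so the classes produced here may not match the original colouring.

```python
CATEGORIES = [
    "Recreation",
    "Aesthetics",
    "Spiritual and Religion",
    "Education",
    "Cultural Heritage",
]

# OSM shares (%) copied from the Figure 12 table.
OSM_SHARES = {
    "Overall study area": [10.0, 20.1, 18.5, 24.2, 27.2],
    "Urban and suburban areas": [12.7, 13.6, 41.8, 19.5, 12.4],
    "Brecon Beacons National Park": [8.7, 18.6, 5.4, 20.4, 46.9],
    "Wye Valley (AONB)": [9.4, 18.8, 9.9, 33.1, 28.7],
}


def colour_class(diff):
    """Map a percentage-point difference to a colour class (assumed reading of the key)."""
    if diff > 10:
        return "green bold"
    if diff >= 0:
        return "green"
    if diff >= -10:
        return "red"
    return "red bold"


def compare_to_overall(shares, baseline="Overall study area"):
    """Classify each area's share against the baseline row, category by category."""
    base = shares[baseline]
    return {
        area: {
            category: colour_class(value - base_value)
            for category, value, base_value in zip(CATEGORIES, values, base)
        }
        for area, values in shares.items()
        if area != baseline
    }


osm_classes = compare_to_overall(OSM_SHARES)
```

For example, the urban share of spiritual and religion edits (41.8%) is 23.3 percentage points above the overall share (18.5%), while the Brecon Beacons share (5.4%) is 13.1 points below it.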

• Population is not the main contributing factor for the number of edits/revisions; many other complex factors relating to CES values and benefits are involved.

• Correlations exist between the spatial and temporal trends of OSM edits relating to aesthetics and education (fig 10), and of Wikipedia revisions relating to aesthetics and recreation (fig 11). Comparing the results to the literature reveals mixed outcomes[9].

• Anomalous cultural heritage OSM edits made between June 2014 and June 2016 by an individual user have skewed the results greatly (figures 4 and 6).

• Differences between the datasets partly explain the contrasts in the spatial distributions of CESs. There were ten times as many OSM points (even after grouping into sub-bounding boxes) as Wikipedia articles, but the range in the number of article revisions was far higher. As a result, the inverse distance weighting (IDW) and KDE maps give very different results depending on the search radius used.

• KDE outperformed IDW*, producing clearer and more natural areas of high values, and, as expected, is more fitting for analysing social data of this kind[10].

• VGI data has great potential for further ecosystem service research in this direction, offering easy access to larger datasets than are achievable through PPGIS methods, but it lacks information on contributor demographics.
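The sensitivity of both surfaces to the search radius can be seen in a small sketch. This is a plain-Python illustration under stated assumptions, not the GIS tooling used for the poster: the Gaussian kernel, the IDW power of 2 and the sample points are all hypothetical choices.

```python
from math import exp, hypot, pi


def kde(x, y, points, bandwidth):
    """Weighted Gaussian kernel density at (x, y).

    points: iterable of (px, py, weight), e.g. a location and its edit count.
    bandwidth: kernel bandwidth, playing the role of the search radius.
    """
    total = 0.0
    for px, py, weight in points:
        u = hypot(x - px, y - py) / bandwidth
        total += weight * exp(-0.5 * u * u) / (2 * pi)
    return total / bandwidth ** 2


def idw(x, y, points, radius, power=2):
    """Inverse distance weighted value at (x, y) from points within `radius`.

    Returns None when no point lies inside the search radius.
    """
    numerator = 0.0
    denominator = 0.0
    for px, py, value in points:
        d = hypot(x - px, y - py)
        if d > radius:
            continue
        if d == 0:
            return value  # exactly on a sample point
        w = 1.0 / d ** power
        numerator += w * value
        denominator += w
    return numerator / denominator if denominator else None


# Made-up points: two low counts close together and one high count further away.
sample_points = [(0.0, 0.0, 5), (1.0, 0.0, 1), (4.0, 0.0, 20)]

kde_small = kde(0.5, 0.0, sample_points, bandwidth=1.0)
kde_large = kde(0.5, 0.0, sample_points, bandwidth=5.0)
idw_small = idw(0.5, 0.0, sample_points, radius=1.0)
idw_large = idw(0.5, 0.0, sample_points, radius=5.0)
```

With the small radius, the distant high-count point has almost no influence on either estimate; with the large radius it is pulled in, which is one way a dataset with a wide range of counts can produce very different maps as the radius changes.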

Future work

• Focus on analysing trends in the number of contributors instead of the edits themselves, and implement contributor edit limits or another method for dealing with vandalism and anomalous edits/revisions.

• Develop the Python code further to split data into square bounding boxes by different horizontal and vertical scale factors, for accurate overlay with external datasets.

• Incorporate a principal component analysis and/or sensitivity analysis into the categorising of OSM tags and Wikipedia articles into different CES groups.

• Utilise the additional Wikipedia article metadata provided by the Geonames API (e.g. rank) to further the analysis. This was not possible in this project due to time constraints.
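One possible shape for the first item above, counting contributors and capping each contributor's edits, is sketched here. The `(user, category)` input shape, the function names and the cap of 50 are assumptions for illustration; the poster does not specify how such a limit would be set.

```python
from collections import Counter


def capped_edit_counts(edits, cap):
    """Count edits per category, with each user contributing at most `cap` edits per category.

    edits: iterable of (user, category) tuples (assumed input shape).
    """
    per_user = Counter(edits)
    totals = Counter()
    for (user, category), n in per_user.items():
        totals[category] += min(n, cap)
    return totals


def contributor_counts(edits):
    """Count distinct contributors per category."""
    users = {}
    for user, category in edits:
        users.setdefault(category, set()).add(user)
    return {category: len(names) for category, names in users.items()}


# Made-up sample: one very active user dominates the cultural heritage edits.
sample_edits = (
    [("user_a", "cultural heritage")] * 500
    + [("user_b", "cultural heritage")] * 3
    + [("user_c", "education")] * 4
)
capped = capped_edit_counts(sample_edits, cap=50)
contributors = contributor_counts(sample_edits)
```

In this sample the raw cultural heritage count of 503 drops to 53 once the cap is applied, and the contributor count shows that only two users were involved.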

Key Findings

• OSM edits and Wikipedia revisions relating to spiritual and religion are far lower in the Brecon Beacons and the Wye Valley (fig 12), most likely because this category relies on religious buildings, which are mostly found in populated areas.

* The two IDW maps, which were inferior to figures 8 and 9, are not included in this poster.

Figure 11 - Kernel densities and hotspots of the individual CES categories for Wikipedia, June 2006 to June 2016