a new initiative for tiling, stitching and processing geospatial big data in distributed computing...
Tiling and Stitching raster data - GIS data processing in distributed computing environment
Angéla Olasz, Binh Nguyen Thai, Dániel Kristóf
Institute of Geodesy, Cartography and Remote Sensing (FÖMI), Directorate of Geoinformation
Content of this talk
1. Introduction
2. IQmulus and IQLib short introduction
3. Defining Geospatial Big Data
4. Aspects of requirements and comparison of existing solutions
5. Design and implementation
6. Conclusion/future work
GIS data processing in distributed computing environment • ISPRS Prague, 12-19 July 2016
Introduction
Our goal is to find a solution for processing big geospatial data in a distributed ecosystem. The environment should run algorithms, services and processing modules without any limitation on the implementation programming language, on the data partitioning strategy, or on how data is distributed among computational nodes, so that existing GIS processing scripts can be run.
As a first step we focus on raster data representation:
(i) decomposition = Tiling (& Stitching), and
(ii) distributed processing.
Before building this prototype system, we have:
1. analyzed data decomposition patterns,
2. defined the common GIS user requirements on processing environments of Big Geospatial Data,
3. tried to identify Geospatial Big Data with the help of the 4 "V"s,
4. compared existing solutions on selected aspects.
IQmulus
A High-volume Fusion and Analysis Platform for Geospatial Point Clouds, Coverages and Volumetric Data Sets
"IQmulus will leverage the information hidden in large heterogeneous geospatial data sets and make them a practical choice to support reliable decision making."
4-year FP7 EU Research Project, November 2012 to November 2016
12 European partners, 7 countries
www.IQmulus.eu
Defining Geospatial Big Data
To better understand the main attributes of geospatial big data, note that it is hard to delineate the point at which data starts to “exceed the capability of spatial computing technology”. The amount of data that can be processed is use-case specific; there are some good examples (Evans et al., 2014) where the authors tried to identify the differences between Geospatial Data and Geospatial Big Data.
The digital representations of continuous space can be grouped into 4 or 5 types: vector, raster, 3D representation, network (and geolocation-aware media).
For each type we collected characteristics on: formats, GIS operations, and the 3 "V"s of Big Data (Volume, Velocity, Variety), and also on Visualization.
Aspects of requirements and comparison of existing solutions
We have collected the most popular frameworks supporting distributed computing with GIS data and investigated the capabilities of each framework in the following aspects:
• Input/output data types supported by or suitable for that particular framework
• Supported GIS processing (or executable languages)
• Data management flexibility: supervision of the data distribution
• Scalability potential
• Supported OS/platform dependencies
• Database model
• GIS case studies, projects …
Aspects of requirements and benchmarking of existing solutions
Most current processing frameworks follow the same methodology as Hadoop and use the same data storage concept as HDFS. From a processing point of view, one of the biggest disadvantages is the data partitioning mechanism performed by the HDFS file system and the distributed processing programming model.
In most cases we would like to have full control over our data partitioning and distribution mechanism.
Existing GIS algorithms (written in Python, MATLAB, R, etc.) cannot be executed in such frameworks as they are, or with only small modifications.
We decided to develop our own distributed processing framework.
Initiative for a new framework
IQLib - Objectives
Source: https://github.com/posseidon/IQLib/ (IQLib specification)
The main goal is to allow an actor (human or machine code) to organize huge data sets describing geographical survey areas.
IQLib supports the creation of semantic data aggregations within large data sets, and can be used to overcome scalability limitations of processing algorithms.
IQLib’s core functionality is:
1. Tiling: the decomposition of a survey area in which data points are either associated with polygons (regular or irregular), or grouped according to temporal attributes, or grouped into equally sized chunks, or a mixture of the above.
2. Stitching: IQLib should provide the functionality to stitch the output data files into a single large file.
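The "equally sized chunks" case and the matching stitching step can be illustrated with a minimal sketch. It is not taken from the IQLib specification: the names `tile_raster` and `stitch_tiles` are hypothetical, and a plain nested list stands in for a real raster band.

```python
from typing import Dict, List, Tuple

Grid = List[List[int]]
Offset = Tuple[int, int]  # (row, column) of a tile's upper-left cell


def tile_raster(grid: Grid, tile_rows: int, tile_cols: int) -> Dict[Offset, Grid]:
    """Cut a 2D grid into chunks of at most tile_rows x tile_cols cells."""
    if tile_rows <= 0 or tile_cols <= 0:
        raise ValueError("tile dimensions must be positive")
    n_rows = len(grid)
    n_cols = len(grid[0]) if grid else 0
    tiles: Dict[Offset, Grid] = {}
    for r0 in range(0, n_rows, tile_rows):
        for c0 in range(0, n_cols, tile_cols):
            # Edge tiles are smaller when the grid size is not a multiple
            # of the tile size.
            tiles[(r0, c0)] = [row[c0:c0 + tile_cols]
                               for row in grid[r0:r0 + tile_rows]]
    return tiles


def stitch_tiles(tiles: Dict[Offset, Grid], n_rows: int, n_cols: int) -> Grid:
    """Write every tile back at its recorded offset to rebuild one grid."""
    out = [[0] * n_cols for _ in range(n_rows)]
    for (r0, c0), tile in tiles.items():
        for dr, row in enumerate(tile):
            out[r0 + dr][c0:c0 + len(row)] = row
    return out


# Toy 4 x 5 raster, tiled, processed tile by tile, then stitched.
raster = [[r * 5 + c for c in range(5)] for r in range(4)]
tiles = tile_raster(raster, tile_rows=2, tile_cols=3)
processed = {offset: [[v * 2 for v in row] for row in tile]
             for offset, tile in tiles.items()}
result = stitch_tiles(processed, n_rows=4, n_cols=5)
```

Keeping the upper-left offset with each tile is what makes stitching a simple write-back; the per-tile step here (doubling each value) is only a placeholder for a real processing service.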
High level concept of IQLib processing framework
Modules
As a result IQLib has three major modules; each module is responsible for a major step in GIS data processing.
Data Catalogue module: responsible for storing metadata corresponding to survey areas. A survey area contains all the datasets that are logically related to the inspection area, and the catalogue stores all the available, known and useful information on those data for processing.
Tiling & stitching module: Tiling algorithms usually process raw data; the tiled data are then distributed across processing nodes. Stitching usually runs after the processing services have successfully done their job. Metadata of tiled datasets are registered in the Data Catalogue, so we always know the parents of tiled data.
Distributed processing module: responsible for running processing services on tiled datasets.
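How the Data Catalogue might keep track of the parents of tiled data can be sketched as follows. This is an illustration only, not the IQLib data model: `DatasetRecord`, `DataCatalogue`, the field names and the dataset identifiers are all assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DatasetRecord:
    dataset_id: str
    survey_area: str
    parent_id: Optional[str] = None  # None for raw (untiled) data
    metadata: Dict[str, str] = field(default_factory=dict)


class DataCatalogue:
    """In-memory stand-in for a metadata store (hypothetical)."""

    def __init__(self) -> None:
        self._records: Dict[str, DatasetRecord] = {}

    def register(self, record: DatasetRecord) -> None:
        # A tile may only be registered once its parent is known.
        if record.parent_id is not None and record.parent_id not in self._records:
            raise KeyError(f"unknown parent: {record.parent_id}")
        self._records[record.dataset_id] = record

    def lineage(self, dataset_id: str) -> List[str]:
        """Return the chain of IDs from a dataset up to its raw ancestor."""
        chain: List[str] = []
        current: Optional[str] = dataset_id
        while current is not None:
            chain.append(current)
            current = self._records[current].parent_id
        return chain

    def tiles_of(self, parent_id: str) -> List[str]:
        """Return the IDs of all datasets derived directly from parent_id."""
        return sorted(r.dataset_id for r in self._records.values()
                      if r.parent_id == parent_id)


catalogue = DataCatalogue()
catalogue.register(DatasetRecord("scene-001", "area-A",
                                 metadata={"format": "GeoTIFF"}))
catalogue.register(DatasetRecord("scene-001/tile-0-0", "area-A",
                                 parent_id="scene-001"))
catalogue.register(DatasetRecord("scene-001/tile-0-1", "area-A",
                                 parent_id="scene-001"))
```

With the parent link stored per record, the stitching step can ask the catalogue which tiles belong to a source dataset instead of relying on file naming.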
Modules - status
Data Catalogue module: We have defined and implemented the data model and the data/metadata access procedures. After the final approval phase it will be released as open source.
Tiling & stitching module: Pre-defined tiling and stitching methods tailored for processing algorithms.
Distributed processing module: Using existing processing algorithms without any modifications or very little adjustments.
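One way to run an existing algorithm unchanged is to treat it as an external command and call it once per tile. The sketch below assumes this approach and is not the IQLib implementation: `run_on_tile` and `process_tiles` are hypothetical names, local worker threads stand in for computational nodes, and a one-line Python command stands in for a real GIS script.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List


def run_on_tile(command: List[str], tile_path: str) -> str:
    """Run an external processing command on one tile and return its stdout."""
    completed = subprocess.run(
        command + [tile_path],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the script fails
    )
    return completed.stdout.strip()


def process_tiles(command: List[str], tile_paths: List[str],
                  workers: int = 4) -> Dict[str, str]:
    """Apply the same command to every tile, several tiles at a time."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        outputs = pool.map(lambda path: run_on_tile(command, path), tile_paths)
        return dict(zip(tile_paths, outputs))


# Stand-in for an existing script: prints its argument in upper case.
demo_command = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]
results = process_tiles(demo_command, ["tile_0_0.tif", "tile_0_1.tif"])
```

Because the script only receives a tile path on its command line, the same wrapper would work for a Python, R or MATLAB program, provided the interpreter is available on the node.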
Conclusion
IQLib Specification is available on GitHub at https://github.com/posseidon/iqlib.
Under development:
• Tiling & Stitching methods ecosystem.
• Distributed processing system.
Going to be published as open source soon.
Related papers
• Olasz A. and Nguyen Thai B. 2016. Development of a new framework for Distributed Processing of Geospatial Big Data, FOSS4G 2016 Bonn Academic Track
• Nguyen Thai B. and Olasz A. 2015. Raster data partitioning for supporting distributed GIS processing, Proceedings of ISPRS Geospatial Week, Montpellier, 28.09.-02.10.2015, ISPRS Vol. XL-3/W3, pp. 543-550.
• Olasz A. and Nguyen Thai B. 2014. Decision support on distributed computing environment (IQmulus) Proceedings of the 3rd Open Source Geospatial Research & Education Symposium OGRS 2014
Institute of Geodesy, Cartography and Remote Sensing (FÖMI), Directorate of Geoinformation
5. Bosnyák sqr., BUDAPEST, HUNGARY 1149
www.fomi.hu, www.iqmulus.eu
Thank you for your attention!
Angéla Olasz, olasz.angela@fomi.hu
Please find more details in our paper in ISPRS Annals 2016:
Binh Nguyen Thai, ntb@inf.elte.hu