

Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings

Volume 15 Seoul, South Korea Article 53

2015

Development of Data Archiving And Distribution System For The Philippines' LIDAR Program Using Object Storage Systems

Ken Abryl Eleazar Salanio
Data Archiving and Distribution Component, DREAM/Phil-LiDAR 1 Program, Rm 312-316, Juinio Hall, National Engineering Center, University of the Philippines

Carlo Santos
Data Archiving and Distribution Component, DREAM/Phil-LiDAR 1 Program, Rm 312-316, Juinio Hall, National Engineering Center, University of the Philippines

Ruby Magturo
Data Archiving and Distribution Component, DREAM/Phil-LiDAR 1 Program, Rm 312-316, Juinio Hall, National Engineering Center, University of the Philippines

Gene Paul Quevedo
Data Archiving and Distribution Component, DREAM/Phil-LiDAR 1 Program, Rm 312-316, Juinio Hall, National Engineering Center, University of the Philippines

Kenan Virtucio
Data Archiving and Distribution Component, DREAM/Phil-LiDAR 1 Program, Rm 312-316, Juinio Hall, National Engineering Center, University of the Philippines

This Paper is brought to you for free and open access by ScholarWorks@UMass Amherst. It has been accepted for inclusion in Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings by an authorized editor of ScholarWorks@UMass Amherst. For more information, please contact [email protected].

Recommended Citation
Salanio, Ken Abryl Eleazar; Santos, Carlo; Magturo, Ruby; Quevedo, Gene Paul; Virtucio, Kenan; Langga, Kenneth; Tupas, Mark Edwin; and Paringit, Enrico (2015) "Development of Data Archiving And Distribution System For The Philippines' LIDAR Program Using Object Storage Systems," Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings: Vol. 15, Article 53.
DOI: https://doi.org/10.7275/R50R9MKM
Available at: https://scholarworks.umass.edu/foss4g/vol15/iss1/53

Development of Data Archiving And Distribution System For The Philippines' LIDAR Program Using Object Storage Systems

Authors
Ken Abryl Eleazar Salanio, Carlo Santos, Ruby Magturo, Gene Paul Quevedo, Kenan Virtucio, Kenneth Langga, Mark Edwin Tupas, and Enrico Paringit

This paper is available in Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings: https://scholarworks.umass.edu/foss4g/vol15/iss1/53

DEVELOPMENT OF DATA ARCHIVING AND DISTRIBUTION SYSTEM FOR THE PHILIPPINES' LIDAR PROGRAM USING OBJECT STORAGE SYSTEMS

Ken Abryl Eleazar Salanio¹, Carlo Santos¹, Ruby Magturo¹, Gene Paul Quevedo¹, Kenan Virtucio¹, Kenneth Langga¹, Mark Edwin Tupas¹, and Enrico Paringit¹

¹Data Archiving and Distribution Component, DREAM/Phil-LiDAR 1 Program
Rm 312-316, Juinio Hall, National Engineering Center, University of the Philippines, Diliman, Quezon City, Philippines
Email: [email protected]

ABSTRACT

The Philippines' Department of Science and Technology, in collaboration with Higher Education Institutions led by the University of the Philippines, has embarked on a program to produce hazard maps for most major river systems in the Philippines. In recognition of the utility of LiDAR and its derived data sets, a concurrent program on resource assessment was also initiated, aimed at producing essential products that can be used in urban planning, resource planning, and other purposes that these geospatial data can serve.

As a result, both programs produce very large data sets that must be stored and distributed. Compiling them into one large file, or a small number of large files, is impractical, and storing them in current spatial content management systems and geoportal solutions also poses a challenge.

In this study we present a system for storing and delivering LiDAR and LiDAR-derived data using the Ceph object storage system and a spatial content management system derived from GeoNode. This approach is driven by our requirement that the system be scalable yet robust without deviating much from the current file-system-based storage structure.

Key Words: LiDAR, Data Archiving, Object Storage, Ceph, GeoNode

FOSS4G Seoul, South Korea | September 14th – 19th , 2015


1. INTRODUCTION

The Philippines is situated in the Pacific Ring of Fire and the Pacific Typhoon Belt. Thus, it is prone to numerous natural hazards, particularly earthquakes, landslides, floods, and storm surges. This demands mapping of the Philippines' physical environment for preparedness and handling of natural disasters.

Mapping using Light Detection and Ranging (LiDAR) technology is suitable for these purposes because of the high-resolution data it produces. For this reason, the University of the Philippines, with the support of the Department of Science and Technology (DOST) and in cooperation with Higher Education Institutions (HEIs) across the country, organized two concurrent programs that use LiDAR technology, drawing on the success of the Disaster Risk and Exposure Assessment for Mitigation (DREAM) LiDAR program. The Phil-LiDAR 1 program extends the original DREAM program, producing flood hazard and assessment maps, while the Phil-LiDAR 2 program produces resource maps from the LiDAR data acquired and processed by the first program. These endeavors aim to produce essential products such as DEMs, orthophotos, and LAS files that can be used for hazard mapping, urban planning, and resource planning, among others.

These programs produce a large amount of data that needs to be distributed and archived at a swift pace. As with other LiDAR operations, handling large swaths of spatial data as a single unit is not an option; hence data sets are organized in contiguous blocks, subdivided into files, grouped by river systems and local government units, and stored centrally in a file-based system.

Storing these data sets proves to be a challenge even for existing spatial content management systems and geoportal solutions. These systems mostly handle vector and raster data and do not natively support storage of point cloud data. Scaling is also a problem because of the sheer volume of data produced. The common approach, as in the program’s legacy storage system, is to use a centralized Network Attached Storage (NAS) server to manage several attached storage disks so that they appear as one big file system. This file system is then shared over the network for access from each staff member’s workstation.

To facilitate the storage and retrieval of data sets for the program’s archiving and data distribution process, we developed a system for storing LiDAR and LiDAR-derived data using the Ceph object storage system. Ceph’s support for a distributed environment reduces the risk of a single point of failure. It also has methods for maintaining data integrity among its storage nodes, which is a vital requirement for archiving. A spatial content management system derived from GeoNode provides a user interface for accessing available data.


2. RELATED WORK

Recent developments in digital acquisition technologies have enabled the collection of high-resolution spatial data sets. With the use of LiDAR, it is possible to collect billions of individual point measurements of the earth’s surface (Crosby et al., 2011). Aside from mapping, these high-accuracy data sets have several other applications, including floodplain mapping, hydrology, geomorphology, forest inventory, urban planning, and landscape ecology (Chen, 2007).

To store these data, file-based storage systems are often used. In a file-based storage system, data sets are stored as files managed by an operating system. Each file has a unique path described in the file system. Navigation is done by browsing the directory structure or using a search tool or a file system navigator (like Nautilus or Windows Explorer) installed in the operating system. This system is usually implemented using a Network Attached Storage (NAS) platform to enable sharing of files over a network of computers. NAS is a specialized dedicated server designed to provide file access and storage to several client computers (Levine, 1998).

Figure 1. Sample File Directory Tree

The complexity of the directory structure increases with the number of data sets and the processes involved (e.g. raw data, derived semi-processed data, processed data, versioning, etc.). Searching for the needed data set requires manually overlaying coverages and retrieving associated data based on folder and filename conventions. Phil-LiDAR 1 uses a tiled approach for organizing and managing spatial data, as described in a number of studies (Chen, 2007; David et al., 2008; Lewis et al., 2012).

GIS-enabled relational database management systems (RDBMS) can also be used for storing geospatial data sets (Fox et al., 2013). These are spatially enabled databases designed to index geographically referenced data sets. The entries are stored in a table, with the spatial index created on a separate layer. This index is referenced when the database handles geospatial queries. Another approach is to design the RDBMS with specialized spatial columns. Attempts to store LiDAR data this way include Lewis et al. (2012), who used Well-Known Binary point geometry, and Ramsey (2013), who devised a spatial data type for storing point cloud clusters.

However, the acquired spatial data, together with the derived data, add up to terabytes or petabytes of storage. Because the index is external to the data in traditional RDBMSs, every update incurs additional indexing overhead. Such updates are computationally intensive and degrade performance as data size increases (Fox et al., 2013). This produces scalability and query time issues (Al-Naami et al., 2014).

Another way of archiving spatial data sets is to use a combination of LiDAR flat tiles, an RDBMS, and a cloud computing or distributed infrastructure. The RDBMS manages the metadata, while the LiDAR tiles are stored in a dedicated storage server. Processing of these data sets is done by high-performance compute servers. Both the storage and compute servers are provisioned by a cloud computing infrastructure. One example is the OpenTopography System (OpenTopography, 2015). It is built upon a multi-tier service-oriented architecture (SOA) comprising various software services and hardware resources. This consists of a front-end portal for serving data, as well as handling authentication. Actual data sets are stored in ASPRS LAS format on a dedicated storage server. Metadata is stored in an IBM DB2 database, and other data sets are stored on the SDSC Cloud platform (San Diego Supercomputer Center, 2015). On-demand processing and visualization requests are handled by a dedicated large-memory, multiprocessor system.

While OpenTopography’s approach is highly appealing, implementing it in the Philippine setting raises several concerns. First, internet speeds in the Philippines are slow and erratic. Even if a high-performance data center is set up, its resources will not be fully utilized if access to it is sluggish. Second, it requires the program to heavily upgrade its IT infrastructure, which can take time and valuable resources to set up.

Another alternative is to use object-based storage systems. Object-based storage manages data as objects, as opposed to the file hierarchy of a file system. Objects are storage containers with a file-like interface, effectively representing a convergence between file and block storage technologies, thus offering storage that is secure and easy to share across platforms, but also high-performance and scalable (Mesnier et al., 2003). Several object storage technologies exist, examples of which are Amazon S3 (Amazon Web Services, 2015), OpenStack Swift (SwiftStack, Inc., 2015), and Ceph (Inktank Storage, Inc., 2015).


3. SIGNIFICANCE AND OBJECTIVES

Because LiDAR data sets have such high value, they need to be secured in a more robust archiving solution. One possible solution is migrating from a file-based storage system to a more suitable distributed infrastructure. This requires that the data sets first be transferred into a storage system capable of scaling to terabytes and petabytes. Object storage systems are capable of filling this role. This paper discusses Phil-LiDAR’s attempt at implementing a LiDAR archiving and distribution system using an object storage system that meets the needs of its unique organizational structure.

4. METHODOLOGY

Among the available object storage technologies, Ceph was chosen because it is open source, is compatible with widely used cloud services such as OpenStack and Amazon’s AWS, and supports a broad spectrum of programming languages. It runs on commodity hardware and is designed to be self-healing and self-managing (Inktank Storage, Inc., 2015).

To manage the data uploaded into the object storage, a geospatial content management solution is needed. GeoNode (GeoNode Development Team, 2013) is an open-source, web-based application and platform for geospatial information systems (GIS) content management, written in Django. Built on top of GeoServer (Open Source Geospatial Foundation, 2014), it is able to store and share rasters and vectors, and it uses GeoExplorer (Boundless, 2015) to display the managed data sets. GeoNode not only provides a web interface for upload, display, and download of vectors and rasters but also allows sharing of these data sets using Open Geospatial Consortium (OGC) standard web services. Additional features can be developed on any of the open-source platforms it uses.

In Section 4.1, this paper discusses the web portal and GeoNode-based spatial content management system we designed, the Lidar Portal for Archiving and Distribution. Section 4.2 describes how it interfaces with the Ceph object storage, as well as Ceph's architecture. Section 4.3 outlines the process flow for archiving spatial data sets.


4.1 Lidar Portal for Archiving and Distribution (LiPAD)

Figure 2. LiPAD Feature and Component Diagram

The Lidar Portal for Archiving and Distribution (LiPAD) is the customized GeoNode used for managing the archived data. GeoNode has native upload and download modules, but these are limited to rasters and vectors. Uploads are stored in the GeoServer instance running in the backend. These uploaded data can then be downloaded by users, via the Layers and Maps modules, with the permission of the owner. Users can also combine these layers into a montage using the Maps module. For user authentication and management, GeoNode has the People and Groups modules. These modules are linked to the in-office Active Directory, a directory service that stores user credentials for a network of computers.


Figure 3. Map Tiles Grid Selection

To provide access to the tiled data stored in Ceph object storage, we customized GeoNode with a user interface consisting of the Map Tiles (Figure 3) and Data Cart (Figure 4) modules, grouped under the Datastore module. Map Tiles shows a gridded shapefile of the Philippines, representative of the tiled data. The shapefile is color-coded to show which areas have available data. Tiles can be selected by clicking each tile, by holding the SHIFT key while dragging the mouse to draw a bounding box, or by submitting a bounding box in EPSG:900913 decimal degrees format.
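The bounding-box selection can be illustrated with a minimal Python sketch. It is not taken from LiPAD: the function name `tiles_in_bbox` is hypothetical, the box is assumed to have already been converted to metres in the tiling coordinate system, and each tile is assumed to be identified by the kilometre coordinates of its upper-left corner.

```python
import math

TILE_SIZE_M = 1000  # 1 km by 1 km tiles, as used by the archive


def tiles_in_bbox(min_e, min_n, max_e, max_n):
    """Return the upper-left corners (easting_km, northing_km) of all tiles
    overlapped by a bounding box.

    Assumption: coordinates are metres in the tiling coordinate system.
    A box submitted in another system must be reprojected first (not shown).
    """
    if min_e >= max_e or min_n >= max_n:
        raise ValueError("bounding box must have positive width and height")
    first_col = math.floor(min_e / TILE_SIZE_M)
    last_col = math.ceil(max_e / TILE_SIZE_M) - 1
    first_row = math.floor(min_n / TILE_SIZE_M)
    last_row = math.ceil(max_n / TILE_SIZE_M) - 1
    tiles = []
    for col in range(first_col, last_col + 1):
        for row in range(first_row, last_row + 1):
            # The tile's west edge is `col` km and its north edge is `row + 1` km.
            tiles.append((col, row + 1))
    return tiles
```

A box that touches a grid line only along its upper or right edge does not pull in the neighbouring tile, which keeps a box drawn exactly around one tile from selecting four.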

Figure 4. Data Cart Item List


Submitting the selected tiles adds all available data associated with each selected tile to the data cart. Once the user is satisfied with the cart’s contents, a download request can be made with a click of a button. The request is then processed in the background by contacting Ceph via a RESTful API to retrieve the listed data.
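As a rough illustration of the cart step, the sketch below collects the object names available for a set of selected tiles. The function `add_tiles_to_cart` and the in-memory `catalog` dictionary are hypothetical stand-ins; the paper does not describe how LiPAD stores the tile-to-object mapping.

```python
def add_tiles_to_cart(cart, selected_tiles, catalog):
    """Append every available object of each selected tile to the cart.

    `catalog` maps a tile key to the list of object names stored for it
    (a hypothetical structure used only for this sketch). Tiles without
    data are skipped and duplicate objects are not added twice.
    """
    for tile in selected_tiles:
        for object_name in catalog.get(tile, []):
            if object_name not in cart:
                cart.append(object_name)
    return cart


# Example: one tile with two products, one tile with no data, one repeat.
catalog = {
    (449, 1271): ["E449N1271_LAZ.laz", "E449N1271_DTM.tif"],
    (450, 1271): ["E450N1271_DSM.tif"],
}
cart = add_tiles_to_cart([], [(449, 1271), (451, 1271), (449, 1271)], catalog)
```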

4.2 Ceph Object Storage

Figure 5. Ceph Object Storage Features

Ceph’s REST API provides an interface for programmatically accessing the stored objects (e.g. download, upload, view details). This interface, also known as the Object Gateway (OGW), can be used through HTTP queries and Python scripts.
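A minimal sketch of such an HTTP query in Python follows. It only builds the request and does not send it. The paper does not state which gateway API was used, so the sketch assumes the gateway's Swift-compatible interface under the `/swift/v1` path with token authentication; the endpoint, container name, token, and the helper `build_object_request` are all hypothetical.

```python
from urllib.parse import quote
from urllib.request import Request


def build_object_request(endpoint, container, object_name, token):
    """Build (but do not send) an HTTP GET request for one stored object.

    Assumptions: a Swift-compatible gateway path of /swift/v1 and an
    authentication token passed in the X-Auth-Token header.
    """
    url = "{}/swift/v1/{}/{}".format(
        endpoint.rstrip("/"),
        quote(container, safe=""),
        quote(object_name, safe=""),
    )
    return Request(url, headers={"X-Auth-Token": token}, method="GET")


# Hypothetical endpoint and container, for illustration only.
request = build_object_request(
    "http://ceph.example.org/", "lidar-tiles", "E449N1271_LAZ.laz", "secret-token"
)
```

Passing the resulting object to `urllib.request.urlopen` would perform the download against a reachable gateway.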

Figure 6. Ceph Architecture


The Ceph architecture comprises multiple Object Storage Devices (OSDs) managed by at least one monitor node. The monitor node is responsible for ensuring high availability of the OSDs. A client node, which is often the object gateway, interfaces with the monitor nodes to handle requests for authentication, upload and download of objects, and status checks.

4.3 Archiving Process Flow

Figure 7. Archiving Process Flow

While acquired data goes through different processing depending on the desired output, this section focuses on the archiving of LiDAR data and its derivatives. Once the LiDAR data has been processed and validated for archiving, it is transferred to a virtualized host running a script that cuts the data into 1 km by 1 km tiles. As a naming convention and for the purpose of spatial indexing, we use the EPSG:32651 Easting and Northing values of the upper-left corner of the tile, together with the data type, as the object’s name (e.g. E449N1271_LAZ.laz for LAZ, E449N1271_DTM.tif for DTM, E449N1271_DSM.tif for DSM, or E449N1271_ORTHO.tif for Orthophoto). Once uploaded to Ceph, these tiles can be queried directly by object name. The tiled data sets are uploaded into Ceph using another script. When it finishes, the uploading script produces a CSV file containing the relevant metadata of each tile, which is then fed into the platform for metadata storage.
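The naming convention and the metadata file can be sketched in Python as follows. This is an illustration rather than the program's actual script: it assumes the numbers in the object name are whole kilometres (so E449N1271 has its upper-left corner at 449,000 m east and 1,271,000 m north), and the helper names and the two CSV columns are hypothetical, since the paper does not list the metadata fields.

```python
import csv
import io
import math

# File extensions per data type, taken from the examples in the text.
EXTENSIONS = {"LAZ": "laz", "DTM": "tif", "DSM": "tif", "ORTHO": "tif"}


def tile_object_name(easting_m, northing_m, data_type):
    """Name the 1 km tile that contains a point, using its upper-left corner.

    Assumption: name components are whole kilometres in EPSG:32651. A point
    lying exactly on a horizontal grid line is assigned to the tile above it.
    """
    easting_km = math.floor(easting_m / 1000)
    northing_km = math.floor(northing_m / 1000) + 1  # north (upper) edge
    return "E{}N{}_{}.{}".format(
        easting_km, northing_km, data_type, EXTENSIONS[data_type]
    )


def metadata_csv(object_names):
    """Write a small CSV listing each object and its data type.

    The column set is hypothetical; a real upload script would likely
    record more fields.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["object_name", "data_type"])
    for name in object_names:
        stem = name.rsplit(".", 1)[0]           # e.g. "E449N1271_LAZ"
        writer.writerow([name, stem.split("_", 1)[1]])
    return out.getvalue()
```

Because the name encodes the tile position, a client that knows a coordinate and a data type can compute the object name directly instead of searching a directory tree.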

Automating the tiling and the upload on the same virtual host that houses the pre-existing file-based storage reduces the overhead introduced by working on the data sets from a workstation separate from the server. This overhead may come in the form of transferring or copying the files to the workstation’s own storage disk, or simply opening the files over the network on a NAS.


5. FUTURE OUTLOOK

We envision a future in which the archiving and distribution capabilities of the program run on a decentralized and distributed computing infrastructure, reducing the risk of a single point of failure. This will also enable the automation of some of the process flows, such as updating the gridded map data to reflect the objects stored in Ceph.
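One way such a grid update could work is to derive tile availability from a listing of object names. The Python sketch below assumes the listing is already available as a list of strings and that names follow the tile naming pattern used for archiving (for example E449N1271_LAZ.laz); the function `tile_availability` is hypothetical and not part of the described system.

```python
import re
from collections import defaultdict

# Matches names such as "E449N1271_LAZ.laz" or "E449N1271_ORTHO.tif".
NAME_PATTERN = re.compile(r"^E(\d+)N(\d+)_([A-Z]+)\.[A-Za-z0-9]+$")


def tile_availability(object_names):
    """Map each tile (easting, northing) to the sorted data types stored for it."""
    found = defaultdict(set)
    for name in object_names:
        match = NAME_PATTERN.match(name)
        if match is None:
            continue  # ignore objects that do not follow the tile naming pattern
        easting, northing, data_type = match.groups()
        found[(int(easting), int(northing))].add(data_type)
    return {tile: sorted(types) for tile, types in found.items()}
```

The resulting dictionary could then drive the color-coding of the gridded map, with tiles absent from the dictionary shown as having no data.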

The use of Ceph allows the system to easily adapt to a distributed environment because of its compatibility with Amazon’s AWS and OpenStack. Meanwhile, computing processes that can be parallelized, such as indexing and querying of data, can make use of technologies such as Hadoop (The Apache Software Foundation, 2014), a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models.

6. SUMMARY

Advances in geospatial technologies have enabled the acquisition of high-density, high-accuracy data sets with significantly detailed surface information of terrain and surface objects within a short period of time (Al-Naami et al., 2014). This results in large data sets that need to be stored for archiving, retrieval, and distribution. However, conventional storage systems, namely native file systems and spatial databases, have problems scaling to the sizes of the data sets produced by current GIS acquisition technologies (Fox et al., 2013).

In this paper, we presented the use of Ceph Object Storage coupled with a customized GeoNode to store and manage LiDAR and LiDAR-derived data sets. The customized GeoNode serves as both the front-end interface and the content management platform, while Ceph stores the large data sets subdivided into 1 km by 1 km tiles. This setup paves the way for migrating from a file-system-based infrastructure to a distributed computing platform.

7. REFERENCES

Ackermann, F., 1999. Airborne laser scanning - present status and future expectations. ISPRS Journal of Photogrammetry and Remote Sensing 54 (2-3), 64-67.

Al-Naami, K.M., Seker, S., and Khan, L., 2014. GISQF: An Efficient Spatial Query Processing System. 2014 IEEE 7th International Conference on Cloud Computing (CLOUD), pp. 681-688.

Amazon Web Services, 2015. AWS | Amazon Simple Storage Service (S3) - Online Cloud Storage for Data & Files. Retrieved July 2015, from https://aws.amazon.com/s3/


Boundless, 2015. GeoExplorer - GeoExplorer. Retrieved July 2015, from http://suite.opengeo.org/opengeo-docs/geoexplorer/

Chen, Q., 2007. Airborne lidar data processing and information extraction. Photogrammetric Engineering & Remote Sensing, 73(2), 91-95.

Crosby, C.J., Arrowsmith, J R., Nandigam, V., and Baru, C., 2011. A Geoinformatics Approach To Online Access And Processing Of LIDAR Topography Data. In R. Keller and C. Baru, Eds., Geoinformatics: Cyberinfrastructure for the Solid Earth Sciences, pp. 251-265. London: Cambridge University Press.

David, N., Mallet, C., and Bretar, F., 2008. Library concept and design for lidar data processing. GEOgraphic Object Based Image Analysis (GEOBIA) Conference, Calgary, Canada.

Fox, A., Eichelberger, C., Hughes, J., and Lyon, S., 2013. Spatio-temporal indexing in non-relational distributed databases. 2013 IEEE International Conference on Big Data, pp. 291–299.

GeoNode Development Team, 2013. About GeoNode - GeoNode 2.0 documentation. Retrieved July 2015, from http://docs.geonode.org/en/master/organizational/about.html#about

Inktank Storage, Inc., 2015. Welcome to Ceph - Ceph Documentation. Retrieved July 2015, from http://ceph.com/docs/master/

Isenburg, M., 2012. LASzip: Lossless compression of LiDAR Data. European LiDAR Mapping Forum.

Jones, T., 2010. Ceph: A Linux petabyte-scale distributed file system. Retrieved July 2015, from http://www.ibm.com/developerworks/library/l-ceph/

Levine, R., 1998. NAS Advantages: A VARs View. Retrieved July 2015, from http://www.infostor.com/index/articles/display/55961/articles/infostor/volume-2/issue-4/news-analysis-trends/nas-advantages-a-vars-view.html

Lewis, P., McElhinney, C., and McCarthy, T., 2012. LiDAR data management pipeline; from spatial database population to web-application visualization. Conference Proceedings at Com.Geo 2012, Washington DC, USA.


Mesnier, M., Ganger, G. R., and Riedel, E., August 2003. Object-Based Storage. IEEE Communications Magazine, pp. 84–90.

Open Source Geospatial Foundation, 2014. About - GeoServer. Retrieved July 2015, from http://geoserver.org/about/

OpenTopography, 2015. NSF OpenTopography Facility | About. Retrieved July 2015, from http://www.opentopography.org/index.php/about/

Ramsey, P., 2013. LIDAR in PostgreSQL with PointCloud. Available online: http://boundlessgeo.com/wp-content/uploads/2013/10/pgpointcloud-foss4-2013.pdf

San Diego Supercomputer Center, 2015. SDSC Cloud. Retrieved July 2015, from https://cloud.sdsc.edu/hp/index.php

SwiftStack, Inc., 2015. OpenStack Swift | Enterprise Storage from SwiftStack. Retrieved July 2015, from https://swiftstack.com/openstack-swift/

The Apache Software Foundation, 2014. Welcome to Apache™ Hadoop®!. Retrieved July 2015, from https://hadoop.apache.org/
