

VO services with JPEG2000 client-server visualisation: Astronomy Data Services at Pawsey Supercomputing Centre.

Vyacheslav V. Kitaeff1,2, Daniel Marrable2, J.T. Mararecki1, Andreas Wicenec1, Chen Wu1, Jenni Harrison2

1 International Centre for Radio Astronomy Research, The University of Western Australia, M468, 35 Stirling Hwy, Crawley 6009, WA, Australia
2 Pawsey Supercomputing Centre, 26 Dick Perry Ave, Kensington 6151, WA, Australia

Abstract. There is an immense, internationally significant collection of radio astronomy data in Australia, generated by organisations such as CSIRO and ICRAR, which are also playing an active role in building the Square Kilometre Array (SKA). Australia has constructed two of the three official SKA pathfinders, the Australian SKA Pathfinder (ASKAP) and the Murchison Widefield Array (MWA), so the collection of data will grow in the near future. The Commonwealth (Super Science) has made a considerable infrastructure investment to support Data Intensive Sciences within the Pawsey Supercomputing Centre (Pawsey), MWA and ASKAP. Scientists use the co-located high-performance compute and data stores to facilitate their research. Research Data Services (RDS) is an investment to support Data Intensive Sciences, such as the MWA GLEAM survey, by providing an infrastructure to store large datasets; RDS already hosts many petabytes of MWA data. The Astronomy Data Services (ADS) project has developed a solution to provide public access to astronomy data stored on RDS infrastructure. Together with IVOA services, such as TAP, SIAP and ADQL, JPEG2000 encoding of imagery data and subsequent streaming client-server visualisation using the JPIP protocol have been enabled.

1. Introduction

There are three strategic sections across the five affiliated facilities at Pawsey: Supercomputing, Data, and Visualisation. Pawsey is home to a number of supercomputing resources, including the petascale supercomputer Magnus, which comprises 2,976 twelve-core Intel Xeon E5-2690 v3 "Haswell" processors, delivering 1,097 TeraFLOPS of computing power.

The storage capacity of Pawsey has recently been enhanced through the Australian Government’s National Collaborative Research Infrastructure Strategy (NCRIS), which invested in the Research Data Storage Infrastructure (RDSI), recently renamed to RDS. The Pawsey node of the national RDSI comprises a DDN GRIDScaler cluster with approximately four petabytes of high-availability storage and a further multi-petabyte DMF tape storage backend.

Pawsey is part of the National eResearch Collaboration Tools and Resources (NeCTAR) research cloud federation and has made cloud computing infrastructure available to researchers through Pawsey and the other participating nodes, which together provide up to 30,000 CPU cores nationally. The WA NeCTAR Research Cloud features 46 IBM System X 3755 M3 servers as compute nodes. Each node has 64 cores at 2.3 GHz, 256 GB RAM, and 6× 10 Gbps links for storage and external access, giving a grand total of 2,944 cores and 11.5 TB of memory, with 216 terabytes of storage. This makes on-demand cloud-based computing available to researchers through OpenStack, allowing them to instantiate virtual machines and volume storage as needed.
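As a rough illustration only (not a description of the actual ADS deployment scripts), the following Python sketch shows how a service VM and a persistent volume might be provisioned on such an OpenStack cloud using the openstacksdk cloud layer; the cloud profile, image, flavor, sizes and names are placeholders.

    # Hypothetical sketch: provision a service VM and a persistent volume on a
    # NeCTAR-style OpenStack cloud. All names below are placeholders.
    import openstack

    conn = openstack.connect(cloud="nectar")  # credentials read from clouds.yaml

    # Launch a VM to host one of the ADS services.
    server = conn.create_server(
        name="ads-vo-services",
        image="Ubuntu 20.04 LTS",   # placeholder image name
        flavor="m1.large",          # placeholder flavor
        wait=True,
    )

    # Create a mountable, persistent volume (e.g. for a visualisation cache)
    # and attach it to the VM.
    volume = conn.create_volume(size=500, name="ads-vis-cache", wait=True)
    conn.attach_volume(server, volume, wait=True)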

The Astronomy Data Services at Pawsey project leverages the large-capacity storage made available through RDS, and the cloud computing provided through NeCTAR, to publish radio astronomy data.

2. Astronomy Data Services at Pawsey

The purpose of the project is to build a sustainable astronomy data service that helps researchers deliver high-impact research outcomes. The primary goal of the project’s activities is to realise the astronomy ’Data as a Service’ vision for the various astronomy (and related) project collections. This investment will ensure that the data services delivered by the RDS-funded storage operators, the Pawsey Supercomputing Centre (Pawsey) and NCI, are consistent, interoperable and of high quality, and provide sustainable long-term value aligned with the astronomy community’s key objectives. Specifically, this NCRIS-supported activity will enable internationally significant research outcomes by adapting and aligning hardware, software and human interfaces to address astronomy-specific challenges, using services provided by NCI and Pawsey. The MWA GLEAM radio astronomy survey, which has already produced 100 TB of data to be published, has been used as the pilot prototype.

In general, the activities involve two kinds of support for astronomy data as a service: a) data publishing through data services, and b) user and community support for access to data.

3. Architecture

Figure 1 shows the top-level architecture of ADS@Pawsey. The raw data from a telescope is transported via a network to the Pawsey SCC. In the case of the MWA there is a dedicated 10 Gb/s optical-fibre link between the Pawsey SCC and the Murchison Radio-astronomy Observatory (Tingay et al. 2013). The data is then processed in a pipeline on one of the supercomputers and temporarily held on staging storage at Pawsey. The final data products are then typically moved onto storage for further analysis and data reduction. Such storage can be outside Pawsey, e.g. at ICRAR, where the data is prepared as a collection for publishing. The movement and management of data is done using NGAS (Wu et al. 2013).
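As an illustrative sketch (not the actual ADS ingest code), the following Python snippet shows how a file might be archived into, and later retrieved from, an NGAS server over its HTTP command interface; the host name is a placeholder and the exact parameters may differ between NGAS deployments.

    # Rough sketch of interacting with an NGAS server over HTTP.
    import requests

    NGAS = "http://ngas-host.example.org:7777"   # placeholder NGAS server URL

    # Archive a FITS file via the QARCHIVE command.
    with open("image_mosaic.fits", "rb") as f:
        r = requests.post(
            f"{NGAS}/QARCHIVE",
            params={"filename": "image_mosaic.fits",
                    "mime_type": "application/octet-stream"},
            data=f,
        )
    r.raise_for_status()

    # Later, retrieve the file again by its NGAS file identifier.
    r = requests.get(f"{NGAS}/RETRIEVE",
                     params={"file_id": "image_mosaic.fits"}, stream=True)
    with open("image_mosaic_copy.fits", "wb") as out:
        for chunk in r.iter_content(chunk_size=1 << 20):
            out.write(chunk)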

All the services are configured and deployed on virtual machines for transferability. The core services consist of three virtual machines: Data Ingest, Virtual Observatory Services, and Visualisation.

The Data Ingest VM runs a copy of NGAS. The ingest of data into RDS can be triggered by either the NGAS Client or another NGAS instance. RDS is currently managed by the Mediaflux1 software. NGAS accesses RDS by means of the Mediaflux programmatic API, wrapped in Python NGAS plug-in scripts.

1https://www.sgi.com/products/storage/idm/mediaflux.html


[Figure 1: architecture diagram (draft 150701). Elements shown include the Telescope; Pawsey High Performance Computing with Staging Storage; Pawsey Data Storage on RDS (Mediaflux) holding the GLEAM and other survey collections; Radio Astronomy Data Collections (ICRAR); NGAS; the Data Ingest, VO Services and Visualisation virtual machines on the NeCTAR cloud with mountable, persistent logical volumes (Buffer, Visualisation Cache); the Metadata DB and Catalogues DB; the TAP, SIAP, ADQL and WebUI services; the SkuareView JPEG2000 Encoder and SkuareView JPIP Server; and web-browser and JPIP-enabled clients.]

Figure 1. High-level architecture of Astronomy Data Services at Pawsey Supercomputing Centre.


FITS files scheduled for ingest are first transferred to a temporary Buffer; their metadata is extracted and populated into the Metadata Database, or imported as a catalogue in VOTable form and stored in the Catalogue DB. The files are then moved into RDS, and the returned unique identifier is registered in the database.
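The following sketch illustrates the flavour of the metadata-extraction step, reading FITS header keywords with astropy; the keyword selection is illustrative and the record_metadata() helper is hypothetical, standing in for the actual NGAS plug-in and database layer.

    # Illustrative sketch of metadata extraction during ingest.
    from astropy.io import fits

    def extract_metadata(path):
        """Pull a few header keywords commonly used for VO publication."""
        with fits.open(path) as hdul:
            hdr = hdul[0].header
            return {
                "object":   hdr.get("OBJECT"),
                "ra":       hdr.get("CRVAL1"),
                "dec":      hdr.get("CRVAL2"),
                "date_obs": hdr.get("DATE-OBS"),
                "naxis1":   hdr.get("NAXIS1"),
                "naxis2":   hdr.get("NAXIS2"),
            }

    meta = extract_metadata("buffer/image_mosaic.fits")
    # record_metadata(meta, file_id="image_mosaic.fits")  # hypothetical DB helper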

Large FITS images and mosaics can be pre-encoded for recurring visualisation purposes. The Encoder converts FITS images or image cubes into JPEG2000 multi-component code-streams stored as jp2 or jpx files in the Visualisation Cache. The images are encoded lossily with optimised parameters, giving a typical compression ratio of 1:10 and up to 16,384 fidelity/resolution layers, while remaining visually indistinguishable from the original for browsing or quick data analysis (Peters & Kitaeff 2014). The Visualisation VM runs the SkuareView software (Kitaeff et al. 2012), which provides a streaming JPIP service to any JPIP-enabled client (Kitaeff et al. 2015). Currently, a JPIP-enabled version of Aladin (Bonnarel et al. 2000), built using the KDU software2, is available; other JPIP clients, such as kdu_show, can be used as well. The encoding of images can also be triggered dynamically for FITS files that are already in the archive; such images can either be downloaded or placed into the Visualisation Cache for remote visualisation.
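The sketch below is not the SkuareView Encoder (which is built on the Kakadu SDK); it only illustrates the general idea using the open-source glymur JPEG2000 bindings, assuming a 2D FITS image rescaled to 16-bit integers (glymur cannot encode floating-point data directly). The compression settings are placeholders, not the optimised ADS parameters.

    # Stand-in illustration of FITS-to-JPEG2000 encoding (not the ADS encoder).
    import numpy as np
    import glymur
    from astropy.io import fits

    with fits.open("image_mosaic.fits") as hdul:
        data = np.nan_to_num(hdul[0].data.astype(np.float64))

    # Rescale floating-point pixels to 16-bit integers before encoding.
    lo, hi = data.min(), data.max()
    span = (hi - lo) or 1.0
    scaled = ((data - lo) / span * 65535).astype(np.uint16)

    glymur.Jp2k(
        "image_mosaic.jp2",
        data=scaled,
        cratios=[10],   # roughly 1:10 lossy compression, as cited in the text
        numres=8,       # number of resolution levels (placeholder value)
    )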

The VO Services virtual machine runs the IVOA TAP, SIAP and ADQL services, as well as the WebUI server.
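As an illustration of how researchers might consume these services, the following pyvo sketch runs an ADQL cone-search query against a TAP endpoint and an image query against a SIAP endpoint; the service URLs, table name and coordinates are placeholders rather than the actual ADS endpoints.

    # Sketch of querying the published data through standard VO protocols.
    import pyvo

    tap = pyvo.dal.TAPService("https://example.pawsey.org.au/tap")  # placeholder

    adql = """
        SELECT TOP 100 *
        FROM gleam.catalogue
        WHERE 1 = CONTAINS(POINT('ICRS', ra, dec),
                           CIRCLE('ICRS', 201.365, -43.019, 1.0))
    """
    results = tap.search(adql)
    print(results.to_table())

    # Image access via the SIAP endpoint (again, a placeholder URL).
    sia = pyvo.dal.SIAService("https://example.pawsey.org.au/sia")
    images = sia.search(pos=(201.365, -43.019), size=1.0)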

Currently, the instances of the above-mentioned VMs are running on the NeCTAR cloud infrastructure. Because the Pawsey NeCTAR node is co-located with the RDS@Pawsey storage and shares the same 10 Gb/s network, latency between the services and the storage is minimal.

4. Conclusion

Astronomy Data Services at Pawsey Supercomputing Centre provides a generic framework for publishing large astronomy data archives on RDS infrastructure. Virtualisation on a compute cloud provides a high degree of transferability and an easy way of deploying the services. The SkuareView JPIP service provides an effective way of visualising and interacting with large multi-dimensional images.

References

Bonnarel, F., Fernique, P., Bienayme, O., Egret, D., Genova, F., Louys, M., Ochsenbein, F., Wenger, M., & Bartlett, J. G. 2000, ArXiv Astrophysics e-prints. astro-ph/0002109

Kitaeff, V., Cannon, A., Wicenec, A., & Taubman, D. 2015, Astronomy and Computing, 12, 229

Kitaeff, V. V., Wu, C., Wicenec, A., Cannon, A. D., & Vinsen, K. 2012, in Proceedings of the 2012 Workshop on High-Performance Computing for Astronomy Data (New York, NY, USA: ACM), Astro-HPC ’12, 25. URL http://doi.acm.org/10.1145/2286976.2286984

Peters, S., & Kitaeff, V. 2014, Astronomy and Computing, 6, 41

Tingay, S., et al. 2013, Publications of the Astronomical Society of Australia, 30, 21

Wu, C., Wicenec, A., Pallot, D., & Checcucci, A. 2013, Experimental Astronomy, 36, 679. 1308.6083

2http://www.kakadusoftware.com/