TRANSCRIPT
-
The GPIES Data Cruncher: An Automated Data Processing System for the Gemini Planet Imager Exoplanet Survey
Jason J. Wang, Pauline Arriaga, Marshall D. Perrin, Dmitry Savransky, James R. Graham, Christian Marois, Julien Rameau, Jean-Baptiste Ruffio, and the GPI Team
Summary:
• The Data Cruncher can automatically process all science and calibration data from the GPI Exoplanet Survey and more
• Sensitivity curves and multiple PSF subtraction products are produced one hour after the data are available
• The Super Data Cruncher can also run on a supercomputing cluster and reprocess the entire campaign in a few hours
Acknowledgements: This research was supported in part by NASA NNX15AD95G, NASA NNX11AD21G, NSF AST-0909188, and the University of California LFRP-118057. The GPI project has been supported by Gemini Observatory, which is operated by AURA, Inc., under a cooperative agreement with the NSF on behalf of the Gemini partnership: the NSF (USA), the National Research Council (Canada), CONICYT (Chile), the Australian Research Council (Australia), MCTI (Brazil) and MINCYT (Argentina).
References:
Marois, C., Correia, C., Galicher, R., et al. 2014, Proc. SPIE, 9148.
Perrin, M. D., Maire, J., Ingraham, P., et al. 2014, Proc. SPIE, 9147.
Wang, J. J., Ruffio, J.-B., De Rosa, R. J., et al. 2015, Astrophysics Source Code Library, record ascl:1506.001.
Crunchable Data
GPI Exoplanet Survey Science
• 1 hour H-band integral field spectroscopy planet search
• 10 minute H-band snapshot broadband imaging polarimetry
• 1 hour H-band deep broadband imaging polarimetry
GPIES Follow-up
• Multi-epoch deep follow-up observations in multiple bands
GPI Queue Programs
• All coronagraphic data taken for GPIES members’ queue programs
Calibrations
• All calibration data taken by GPI (which are publicly available)
Super Data Cruncher
[Figure: weak scaling test, runtime in hours vs. number of datasets and nodes]
• Runs on NERSC’s Edison supercomputer (5576 nodes, 133,824 cores, 357 TB RAM)
• Uses MPI for inter-node communication (see the sketch below)
• < 100 lines of code needed to implement the Super Data Cruncher
• Reprocesses the entire campaign in a few hours
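To illustrate how an MPI layer can spread datasets over many nodes in well under 100 lines, here is a minimal sketch using mpi4py; the dataset list file and the reduction function are hypothetical stand-ins, not the actual Super Data Cruncher code.

```python
# Minimal sketch: distribute datasets across MPI ranks with mpi4py.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

def reduce_dataset(dataset):
    """Placeholder for running the full reduction on one dataset."""
    print("Rank {} processing {}".format(rank, dataset))

if rank == 0:
    # Rank 0 reads the list of campaign datasets (hypothetical file name).
    with open("campaign_datasets.txt") as f:
        datasets = [line.strip() for line in f if line.strip()]
else:
    datasets = None

# Broadcast the full list, then let each rank take every size-th dataset.
datasets = comm.bcast(datasets, root=0)
for dataset in datasets[rank::size]:
    reduce_dataset(dataset)

comm.Barrier()
if rank == 0:
    print("Reprocessing complete on {} ranks".format(size))
```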
Reduced Data Products
[Diagram: raw data are taken at the summit, stored and synced via Dropbox, logged in a MySQL database, and quality checked]
• All data products produced within ~1 hour of the data being available
• All data are synced to Dropbox for accessibility
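A minimal sketch of the logging step above (new files appearing in the synced directory are recorded in a database and given a basic quality flag); sqlite3 stands in for the campaign MySQL database, and the directory, table layout, and quality check are assumptions for illustration.

```python
# Minimal sketch: log newly synced FITS files into a local database.
import glob
import os
import sqlite3

from astropy.io import fits  # assumed available for reading FITS headers

DB_PATH = "gpies_log.db"          # hypothetical local log database
SYNC_DIR = "/data/dropbox/raw"    # hypothetical Dropbox-synced directory

conn = sqlite3.connect(DB_PATH)
conn.execute("""CREATE TABLE IF NOT EXISTS raw_files
                (filename TEXT PRIMARY KEY, object TEXT, ok INTEGER)""")

for path in sorted(glob.glob(os.path.join(SYNC_DIR, "*.fits"))):
    name = os.path.basename(path)
    # Skip files that have already been logged.
    if conn.execute("SELECT 1 FROM raw_files WHERE filename = ?", (name,)).fetchone():
        continue
    try:
        header = fits.getheader(path)
        target = header.get("OBJECT", "UNKNOWN")
        ok = 1  # a real quality check would inspect the data themselves
    except Exception:
        target, ok = "UNKNOWN", 0  # unreadable file fails the quality check
    conn.execute("INSERT INTO raw_files VALUES (?, ?, ?)", (name, target, ok))

conn.commit()
conn.close()
```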
[Figure: example reduced data products, including datacubes (spectral cube, polarimetry cube), PSF-subtracted images (pyKLIP: ADI+SDI, pyKLIP: ADI, pyKLIP: ADI+SDI w/ methane, cADI), contrast curves, and calibrations]
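One of the PSF-subtraction products shown above is produced with pyKLIP (Wang et al. 2015). A minimal sketch of a pyKLIP reduction along the lines of its public documentation follows; the input file pattern and reduction parameters are placeholders, not the GPIES campaign settings.

```python
# Minimal sketch: KLIP PSF subtraction of GPI datacubes with pyKLIP.
import glob

import pyklip.instruments.GPI as GPI
import pyklip.parallelized as parallelized

filelist = glob.glob("reduced/*_spdc.fits")   # hypothetical spectral datacubes
dataset = GPI.GPIData(filelist)

# Angular + spectral differential imaging with a few KL mode cutoffs.
parallelized.klip_dataset(dataset,
                          outputdir="klip_output/",
                          fileprefix="example",
                          mode="ADI+SDI",
                          annuli=9,
                          subsections=4,
                          movement=1,
                          numbasis=[1, 5, 10, 20, 50])
```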
Architecture
Processing Controller
• High-level Python logic that controls dataflow through the various pipelines
• Uses queues to communicate between threads and monitors for synchronization (see the sketch at the end of this section)
Processing Backend
• GPI DRP (Perrin et al. 2014)
• TLOCI (Marois et al. 2014)
• pyKLIP (Wang et al. 2015)
• cADI (UdeM pipeline)
Network Interface
• Web Socket or MPI
Realtime Scanner
• Queues new datasets for processing and updates the GPIES Wiki
Reprocessor
• Queries database to find and process existing datasets on demand (see the sketch below)
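A minimal sketch of the Reprocessor pattern, querying a database of logged files and handing the matches to a processing queue on demand; sqlite3 again stands in for the MySQL database, and the table, column, and target names are assumptions.

```python
# Minimal sketch: queue existing datasets for reprocessing on demand.
import sqlite3
from queue import Queue

processing_queue = Queue()  # consumed by the processing threads

def reprocess_object(db_path, object_name):
    """Queue every logged, good-quality file of a given target."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT filename FROM raw_files WHERE object = ? AND ok = 1",
        (object_name,)).fetchall()
    conn.close()
    for (filename,) in rows:
        processing_queue.put(filename)
    return len(rows)

# Example: queue all good frames of a hypothetical target.
n = reprocess_object("gpies_log.db", "HR 8799")
print("Queued {} files".format(n))
```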
• Written in Python with some pipeline components written in IDL
• Highly modularized, multithreaded, and asynchronous
[Data flow diagram: connections labeled New Files, Check for bad files, Send Commands, Query for data, and Save Reduced Data Products]
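Finally, a minimal sketch of the queue-based, multithreaded control pattern described under Processing Controller; the worker count, pipeline dispatch function, and dataset names are illustrative only.

```python
# Minimal sketch: worker threads pull datasets from a shared queue.
import threading
from queue import Queue

work_queue = Queue()

def run_pipelines(dataset):
    """Placeholder for dispatching a dataset through the reduction pipelines."""
    print("Reducing", dataset)

def worker():
    # Each worker pulls datasets off the queue until it sees a sentinel.
    while True:
        dataset = work_queue.get()
        if dataset is None:
            work_queue.task_done()
            break
        run_pipelines(dataset)
        work_queue.task_done()

threads = [threading.Thread(target=worker, daemon=True) for _ in range(4)]
for t in threads:
    t.start()

for dataset in ["S20150101S0001", "S20150101S0002"]:  # hypothetical datasets
    work_queue.put(dataset)
for _ in threads:
    work_queue.put(None)  # one sentinel per worker

work_queue.join()
```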