Developing JWST Pipelines at STScI, Robert Jedrzejewski
Who we are
• The Science Software Branch at STScI
• 16 members
• Most have an astronomy background
• 6 have PhDs
• Combined experience in group: 125 years
• Combined experience at STScI: 200 years
What we do
• Develop HST calibration pipelines
• STSDAS/TABLES
• PyRAF, PyFITS, STScI_Python
• HST Exposure Time Calculators
• Other smaller projects (Gemini/GOODS/Hubble Legacy Archive/GoogleSky/JWST Backplane Stability…)
Our preferred development model
• Python!
• We find we can be extremely productive writing in Python
• Speed is occasionally an issue, so we use C extensions when necessary
• Very little pipeline code requires performance optimization
Development style
• Use version control (Subversion)
• Use regression tests + nightly builds + web reporting tools
• Trac for problem tracking/wiki for information dissemination
• Unit/doc tests
• Multiple platforms (Linux/Mac/Solaris/Windows)
How we did HST pipelines
• Calfoc, calfos, calhrs, calwfpc, calwp2
– First generation pipelines, written in SPP, read GEIS files
• Calstis, calnic(a/b)
– Second generation, written in C using hstio (which wraps IRAF imio libraries) to read multiple extension FITS files
• Calacs
– Borrowed much code from calstis imaging
• Calwfc3
– Borrowed much code from calacs, calnic
• Calcos
– Third generation, written in Python (+ C where needed)
• Later pipelines were more likely to be used by IDTs for calibrating ground test data
More on HST pipelines
• Pipeline operation is data-driven
– Calibration steps as header keywords:
• FLATCORR=PERFORM/OMIT/COMPLETE/SKIPPED
– Reference file names as header keywords:
• FLATFILE=oref$g2342212_flt.fits
• This decouples some of the intelligence from the code
– No need to rebuild code if step or reference file changes
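A minimal sketch of what "data-driven" could look like in Python, assuming the header is a plain dictionary. The DARKCORR/DARKFILE pair, the step functions, and the STEPS table are illustrative assumptions, not the actual pipeline code; only FLATCORR, FLATFILE, and the keyword values come from the slide.

```python
# Hypothetical step implementations; real steps would operate on pixel data.
def dark_correct(data, reference_file):
    return f"{data} dark-subtracted with {reference_file}"

def flat_correct(data, reference_file):
    return f"{data} flat-fielded with {reference_file}"

# Each entry: (step keyword, reference-file keyword, function to run).
# The table itself is an assumption made for this sketch.
STEPS = [
    ("DARKCORR", "DARKFILE", dark_correct),
    ("FLATCORR", "FLATFILE", flat_correct),
]

def run_pipeline(header, data):
    """Run only the steps whose header keyword is PERFORM, then mark them COMPLETE."""
    for step_key, ref_key, func in STEPS:
        if header.get(step_key) != "PERFORM":
            continue  # OMIT, COMPLETE, SKIPPED, or missing: leave the data alone
        data = func(data, header[ref_key])
        header[step_key] = "COMPLETE"
    return data

header = {
    "DARKCORR": "OMIT",
    "FLATCORR": "PERFORM",
    "FLATFILE": "oref$g2342212_flt.fits",
}
result = run_pipeline(header, "raw image")
```

Changing which steps run, or which reference file a step uses, only means editing the header values; the code stays untouched.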
Multidrizzle
• Multidrizzle is used by the ACS and WFPC2 pipelines to combine images with small position offsets (dithered), removing cosmic rays
• It is a Python application that can be used with ACS, STIS, WFPC2, NICMOS and WFC3 data
• This breaks from our ‘tradition’ of having 1 calibration pipeline program for each instrument
How we see the JWST Pipelines
• A series of calibration steps
[Diagram labels: Input stage, Calibration Step, Output stage, Reference File]
Early design ideas
• No need to have separate pipeline programs for each JWST instrument
– Many calibration steps depend on detector, and JWST instruments use detectors of the same type
– We can use the same code, instead of having to replicate it (and maintain it) in more than one place
– Some calibration steps will probably be identical for all JWST data (e.g. the MASKCORR step, where a static mask from a reference file is applied to the DQ array of the data)
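As an illustration of a step that needs no instrument-specific code, here is a sketch of a MASKCORR-like function. It assumes that "applying" the mask means a bitwise OR of flag bits into the DQ array, and it uses nested lists in place of real image arrays; the flag values are hypothetical.

```python
BAD_PIXEL = 1   # hypothetical flag bit
SATURATED = 2   # hypothetical flag bit

def maskcorr(dq, mask):
    """Return a new DQ array with the static mask's flag bits OR-ed in."""
    if len(dq) != len(mask) or any(len(d) != len(m) for d, m in zip(dq, mask)):
        raise ValueError("DQ array and mask must have the same shape")
    return [[d | m for d, m in zip(dq_row, mask_row)]
            for dq_row, mask_row in zip(dq, mask)]

dq = [[0, SATURATED], [0, 0]]                   # flags already set on the data
static_mask = [[BAD_PIXEL, BAD_PIXEL], [0, 0]]  # stand-in for the reference file
new_dq = maskcorr(dq, static_mask)
```

Nothing in the function depends on which instrument produced the data, which is the point of sharing it.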
Try not to make the mistakes we made with HST
• Use the same keywords for the same quantities
• Use the same file/association structure
• Use the same algorithms to do the same calibration
– Unless a team shows that a given algorithm does not work for their instrument
– Even then, try to keep as much code common as possible, only breaking out the code that is different
– Sometimes it is possible to encapsulate the differences in the reference files, keeping the code the same
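One way to picture the last point: the function below is shared, and everything instrument-specific lives in the reference data passed to it. The gain step, the instrument names, and the numbers are placeholders invented for this sketch.

```python
def apply_gain(counts, reference):
    """Convert raw counts to electrons using the gain from the reference data."""
    return [c * reference["gain"] for c in counts]

# Hypothetical reference data, standing in for per-instrument reference files.
REFERENCE = {
    "INSTRUMENT_A": {"gain": 2.0},
    "INSTRUMENT_B": {"gain": 1.5},
}

electrons_a = apply_gain([10, 20], REFERENCE["INSTRUMENT_A"])
electrons_b = apply_gain([10, 20], REFERENCE["INSTRUMENT_B"])
```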
JWST Pipelines (continued…)
• Python gives us object-oriented capabilities
– ‘input_stage’ and ‘output_stage’ are objects that encapsulate information on their state and on how to calibrate themselves
– For example, they might be NIRSpec IFU data objects, or MIRI imaging data objects
– When executing a given step, they may use their own custom method, or else defer to a method that they inherit from a more ‘generic’ datatype
– E.g. MIRI imaging data and NIRCam imaging data may both use the flatfield() method of the JWSTImagingData class, from which they both inherit
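The class layout described above might be sketched as follows. Only JWSTImagingData and flatfield() are named on the slide; the base class, the subclass names, the history log, and the flat-field arithmetic are assumptions made to keep the example runnable.

```python
class JWSTData:
    """Hypothetical generic data object: pixel values plus a log of applied steps."""
    def __init__(self, pixels):
        self.pixels = list(pixels)
        self.history = []

class JWSTImagingData(JWSTData):
    def flatfield(self, flat):
        """Generic imaging flat field: divide each pixel by the flat value."""
        self.pixels = [p / f for p, f in zip(self.pixels, flat)]
        self.history.append("flatfield (generic imaging)")

class MIRIImagingData(JWSTImagingData):
    pass  # defers to the inherited flatfield()

class NIRCamImagingData(JWSTImagingData):
    pass  # defers to the inherited flatfield()

class NIRSpecIFUData(JWSTData):
    def flatfield(self, flat):
        """Custom method; the arithmetic is a placeholder, not a real IFU algorithm."""
        self.pixels = [p / f for p, f in zip(self.pixels, flat)]
        self.history.append("flatfield (IFU-specific)")

miri = MIRIImagingData([2.0, 4.0])
ifu = NIRSpecIFUData([2.0, 4.0])
for data in (miri, ifu):
    data.flatfield([2.0, 2.0])  # same call, each object picks its own method
```

The pipeline code calling flatfield() does not need to know which kind of data object it was handed.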
JWST Pipelines (continued…)
• The inheritance hierarchy encapsulates information about what is the same and what is different about JWST data types
– We can mix in behaviors from different types of object, as necessary
– But, to the extent possible, we try to keep as much the same as we can
– The people who inherit this project will thank us
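Mixing in behaviors could be done with Python's multiple inheritance. The sketch below is one possible arrangement; every class and method name in it is hypothetical.

```python
class DataBase:
    """Hypothetical common base: keeps a log of the steps that were run."""
    def __init__(self):
        self.history = []

class ImagingMixin:
    def flatfield(self):
        self.history.append("imaging flatfield")

class SpectroscopyMixin:
    def extract_spectrum(self):
        self.history.append("spectral extraction")

class IFUData(ImagingMixin, SpectroscopyMixin, DataBase):
    """Hypothetical IFU type that picks up behavior from both mixins."""

cube = IFUData()
cube.flatfield()
cube.extract_spectrum()
```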
What goes in?
• IDTs and instrument teams at STScI will figure out:
– Which steps are needed, and their ordering
– Which instruments/modes use the steps
– What each step does
– What calibration reference data are needed
– What tests the code needs to pass
Facilitating the process
• Calibration data will be in a “public” repository
• This will include:
– Code
– Test data
– Documentation
Facilitating…
• We will encourage everyone to try out our algorithms as we develop them
• And we encourage everyone to contribute their own algorithms
• We’ll handle keeping teams synchronized by versioning and providing different builds
– E.g. Team A may still be testing build X, when Team B needs to test the next stage of functionality in build X.1
– When Team B is ready to test the functionality in build X.1, there may already be build X.2 (which includes the functionality in build X.1 as well as new functionality)
– In the end, all the teams will test the same code
Facilitating
• How do we know that the code does the ‘right’ thing?
– Teams provide test data with test results
– Then we know that the result is correct because it reproduces team-supplied answers
– Test results could be actual data (e.g. FITS files)
• Pixels in pipeline-calibrated data should be identical within +/-
– Or results of analysis
• Aperture photometry should be the same to within +/-
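A pixel-level comparison against team-supplied answers could be as simple as the sketch below. The slide leaves the tolerance unspecified, so the abs_tol value here is purely a placeholder, and nested lists stand in for image data read from FITS files.

```python
import math

def pixels_match(result, expected, abs_tol):
    """True if both images have the same shape and every pixel agrees within abs_tol."""
    if len(result) != len(expected):
        return False
    for r_row, e_row in zip(result, expected):
        if len(r_row) != len(e_row):
            return False
        for r, e in zip(r_row, e_row):
            if not math.isclose(r, e, rel_tol=0.0, abs_tol=abs_tol):
                return False
    return True

expected = [[1.0, 2.0], [3.0, 4.0]]        # team-supplied answer
close = [[1.0, 2.0000001], [3.0, 4.0]]     # pipeline output, tiny difference
far = [[1.0, 2.1], [3.0, 4.0]]             # pipeline output, real difference

passes = pixels_match(close, expected, abs_tol=1e-6)
fails = pixels_match(far, expected, abs_tol=1e-6)
```

The same pattern would apply to analysis results such as aperture photometry, with a tolerance agreed with the team.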
Interfacing with other languages
• If teams develop code that does a lot of fancy processing, we can try to include it by wrapping
• Python talks to C/C++ using C extensions
• An existing C function can be wrapped so that Python objects can be passed to C/C++, and C objects passed back to Python
• We can wrap relatively simple C functions
– Arguments are arrays or primitive datatypes (integer/float/string…)
– No objects as arguments
– Structs are OK, as long as they are simple (flat)
– Play nice with memory
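To show the kinds of arguments that cross the Python/C boundary easily, here is a sketch using the standard-library ctypes module. This is a different mechanism from the hand-written C extensions the slide refers to, chosen only because it runs without a compiler. The C routine called is memmove, which ctypes exposes directly; the PixelStats struct is hypothetical.

```python
import ctypes

class PixelStats(ctypes.Structure):
    """Hypothetical flat struct: primitive fields only, no nested pointers."""
    _fields_ = [("npix", ctypes.c_int), ("mean", ctypes.c_double)]

def to_c_array(values):
    """Copy a Python sequence of floats into a contiguous C double array."""
    return (ctypes.c_double * len(values))(*values)

def c_copy(src):
    """Call the C memmove routine to copy one C array into a new one."""
    dst = (ctypes.c_double * len(src))()          # memory owned by Python
    ctypes.memmove(dst, src, ctypes.sizeof(src))  # byte count from the source array
    return dst

pixels = to_c_array([1.0, 2.0, 3.0])
copied = c_copy(pixels)
stats = PixelStats(npix=len(copied), mean=sum(copied) / len(copied))
```

Both arrays are allocated and freed on the Python side, which is one way to keep memory ownership simple.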
Wishlists
• We don’t need to feel constrained by HST
• What are the biggest deficiencies in HST?
– Best reference files and best calibration steps can be determined by querying a service
• Don’t need to rely on HST archive to find these out
– Reference files can be downloaded as needed
– Even calibration code can be updated as needed (don’t need to wait 6 months for the next STSDAS release)