Developing JWST Pipelines at STScI

Robert Jedrzejewski

Who we are

• The Science Software Branch at STScI

• 16 members

• Most have an astronomy background

• 6 have PhDs

• Combined experience in group: 125 years

• Combined experience at STScI: 200 years

What we do

• Develop HST calibration pipelines

• STSDAS/TABLES

• PyRAF, PyFITS, STScI_Python

• HST Exposure Time Calculators

• Other smaller projects (Gemini/GOODS/Hubble Legacy Archive/GoogleSky/JWST Backplane Stability…)

Development Experience

• Python

• Java

• C/C++

• Fortran

• spp/cl

• IDL

• (Perl/Assembly/Tcl…)

Our preferred development model

• Python!

• We find we can be extremely productive writing in Python

• Speed is occasionally an issue, so we use C extensions when necessary

• Very little pipeline code requires performance optimization

Development style

• Use version control (Subversion)

• Use regression tests + nightly builds + web reporting tools

• Trac for problem tracking/wiki for information dissemination

• Unit/doc tests

• Multiple platforms (Linux/Mac/Solaris/Windows)

How we did HST pipelines

• Calfoc, calfos, calhrs, calwfpc, calwp2
  – First generation pipelines, written in spp, read GEIS files

• Calstis, calnic(a/b)
  – Second generation, written in C using hstio (which wraps IRAF imio libraries) to read multiple extension FITS files

• Calacs
  – Borrowed much code from calstis imaging

• Calwfc3
  – Borrowed much code from calacs, calnic

• Calcos
  – Third generation, written in Python (+ C where needed)

• Later pipelines were more likely to be used by IDTs for calibrating ground test data

More on HST pipelines

• Pipeline operation is data-driven
  – Calibration steps as header keywords:
    • FLATCORR=PERFORM/OMIT/COMPLETE/SKIPPED
  – Reference file names as header keywords
    • FLATFILE=oref$g2342212_flt.fits

• This decouples some of the intelligence from the code
  – No need to rebuild code if step or reference file changes
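The data-driven idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the HST code: the header is a plain dict standing in for a FITS header, the reference file is an in-memory list rather than a file on disk, and `STEPS`, `REFERENCE_DATA`, `flat_correct`, and `run_pipeline` are hypothetical names. Only the FLATCORR and FLATFILE keywords and their values come from the slide.

```python
# Minimal sketch: calibration steps switched on and off by header keywords.
# Assumption: a dict stands in for the FITS header, and reference "files"
# are kept in memory so the example is self-contained.

REFERENCE_DATA = {"oref$g2342212_flt.fits": [2.0, 4.0]}  # stand-in for files on disk


def flat_correct(pixels, reference_name):
    """Hypothetical flat-field step: divide each pixel by the flat value."""
    flat = REFERENCE_DATA[reference_name]
    return [p / f for p, f in zip(pixels, flat)]


# Hypothetical step table: (step keyword, reference-file keyword, function).
STEPS = [("FLATCORR", "FLATFILE", flat_correct)]


def run_pipeline(header, pixels):
    """Run only the steps marked PERFORM, then mark each one COMPLETE."""
    for step_key, ref_key, func in STEPS:
        if header.get(step_key) != "PERFORM":
            continue  # OMIT, COMPLETE, SKIPPED, or absent: leave the data alone
        pixels = func(pixels, header[ref_key])
        header[step_key] = "COMPLETE"
    return pixels


header = {"FLATCORR": "PERFORM", "FLATFILE": "oref$g2342212_flt.fits"}
result = run_pipeline(header, [10.0, 20.0])
```

Setting FLATCORR to OMIT, or pointing FLATFILE at a different reference file, changes what the pipeline does without touching the code.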

Multidrizzle

• Multidrizzle is used by the ACS and WFPC2 pipelines to combine images with small position offsets (dithered), removing cosmic rays

• It is a Python application that can be used with ACS, STIS, WFPC2, NICMOS and WFC3 data

• This breaks from our ‘tradition’ of having 1 calibration pipeline program for each instrument

How we see the JWST Pipelines

• A series of calibration steps

(Diagram labels: Input stage, Calibration Step, Reference File, Output stage)

Early design ideas

• No need to have separate pipeline programs for each JWST instrument
  – Many calibration steps depend on the detector, and JWST instruments use detectors of the same type
  – We can use the same code, instead of having to replicate it (and maintain it) in more than one place
  – Some calibration steps will probably be identical for all JWST data (e.g. the MASKCORR step, where a static mask from a reference file is applied to the DQ array of the data)
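A step like MASKCORR needs nothing instrument-specific, which a short sketch makes concrete. The slide does not say how the mask is applied; the sketch assumes the DQ array holds integer bit flags and that the static mask is combined with it by bitwise OR. The function name `maskcorr` and the flag values are hypothetical.

```python
# Minimal sketch of a MASKCORR-style step.
# Assumptions: DQ values are integer bit flags, the mask is merged with
# bitwise OR, and nested lists stand in for 2-D image arrays.

BAD_PIXEL = 1   # hypothetical flag value
SATURATED = 2   # hypothetical flag value


def maskcorr(dq, static_mask):
    """Return a new DQ array with the static mask flags OR-ed in."""
    if len(dq) != len(static_mask):
        raise ValueError("DQ array and mask must have the same number of rows")
    return [
        [d | m for d, m in zip(dq_row, mask_row)]
        for dq_row, mask_row in zip(dq, static_mask)
    ]


dq = [[0, SATURATED], [0, 0]]
mask = [[BAD_PIXEL, BAD_PIXEL], [0, 0]]
flagged = maskcorr(dq, mask)
```

Flags already present in the DQ array survive, so a pixel can be both saturated and statically bad. Nothing in the function refers to a particular instrument; only the mask differs.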

Try not to make the mistakes we made with HST

• Use the same keywords for the same quantities

• Use the same file/association structure

• Use the same algorithms to do the same calibration
  – Unless a team shows that a given algorithm does not work for their instrument
  – Even then, try and keep as much code common as possible, only breaking out the code that is different
  – Sometimes it is possible to encapsulate the differences in the reference files, keeping the code the same

JWST Pipelines (continued…)

• Python gives us object-oriented capabilities
  – ‘input_stage’ and ‘output_stage’ are objects that encapsulate information on their state and on how to calibrate themselves
  – For example, they might be NIRSPEC IFU data objects, or MIRI imaging data objects
  – When executing a given step, they may use their own custom method, or else defer to a method that they inherit from a more ‘generic’ datatype
  – E.g. MIRI imaging data and NIRCAM imaging data may both use the flatfield() method of the JWSTImagingData class, from which they both inherit
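A minimal sketch of that inheritance, assuming a simple class layout: `JWSTImagingData` and `flatfield()` are named on the slide, while `JWSTData`, `MIRIImagingData`, `NIRCamImagingData`, `NIRSpecIFUData`, and the `history` list are hypothetical names chosen for illustration. The arithmetic is a placeholder, not a real calibration algorithm.

```python
# Minimal sketch: data objects that know how to calibrate themselves.
# Only JWSTImagingData and flatfield() are named in the slides; every other
# class and attribute here is a hypothetical illustration.


class JWSTData:
    """Hypothetical base class holding pixel values and a processing log."""

    def __init__(self, pixels):
        self.pixels = list(pixels)
        self.history = []


class JWSTImagingData(JWSTData):
    def flatfield(self, flat):
        """Generic flat-field step shared by the imaging modes."""
        self.pixels = [p / f for p, f in zip(self.pixels, flat)]
        self.history.append("JWSTImagingData.flatfield")
        return self


class MIRIImagingData(JWSTImagingData):
    """Inherits flatfield() unchanged."""


class NIRCamImagingData(JWSTImagingData):
    """Inherits flatfield() unchanged."""


class NIRSpecIFUData(JWSTData):
    def flatfield(self, flat):
        """Custom method; the body is a placeholder for a mode-specific algorithm."""
        self.pixels = [p / f for p, f in zip(self.pixels, flat)]
        self.history.append("NIRSpecIFUData.flatfield")
        return self


miri = MIRIImagingData([10.0, 20.0]).flatfield([2.0, 4.0])
ifu = NIRSpecIFUData([10.0, 20.0]).flatfield([2.0, 4.0])
```

The calling code is the same in both cases; which `flatfield()` runs is decided by the type of the data object.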

JWST Pipelines (continued…)

• The inheritance hierarchy encapsulates information about what is the same and what is different about JWST data types
  – We can mix in behaviors from different types of object, as necessary
  – But we try to keep as much the same as possible
  – The people who inherit this project will thank us
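Mixing in behaviors could look something like the sketch below, which uses Python's multiple inheritance. All class and method names here (`JWSTData`, `FlatFieldMixin`, `WavelengthMixin`, `ImagingData`, `IFUData`) are hypothetical, and the steps only record that they ran.

```python
# Minimal sketch: composing data types from small mixin classes.
# All names are hypothetical; each step just logs that it was applied.


class JWSTData:
    def __init__(self):
        self.history = []


class FlatFieldMixin:
    def flatfield(self):
        self.history.append("flatfield")
        return self


class WavelengthMixin:
    def wavelength_calibrate(self):
        self.history.append("wavelength_calibrate")
        return self


class ImagingData(FlatFieldMixin, JWSTData):
    """Imaging data: flat-fielding only."""


class IFUData(FlatFieldMixin, WavelengthMixin, JWSTData):
    """IFU data: flat-fielding plus a wavelength step."""


ifu = IFUData().flatfield().wavelength_calibrate()
image = ImagingData().flatfield()
```

Each behavior lives in one place, and a new data type is assembled by listing the behaviors it needs.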

What goes in?

• IDTs and instrument teams at STScI will figure out:
  – Which steps are needed, and their ordering
  – Which instruments/modes use the steps
  – What each step does
  – What calibration reference data are needed
  – What tests the code needs to pass

Facilitating the process

• Calibration data will be in a “public” repository

• This will include:
  – Code
  – Test data
  – Documentation

Facilitating…

• We will encourage everyone to try out our algorithms as we develop them

• And we encourage everyone to contribute their own algorithms

• We’ll handle keeping teams synchronized by versioning and providing different builds
  – E.g. Team A may still be testing build X, when Team B needs to test the next stage of functionality in build X.1
  – When Team B is ready to test the functionality in build X.1, there may already be build X.2 (which includes the functionality in build X.1 as well as new functionality)
  – In the end, all the teams will test the same code

Facilitating

• How do we know that the code does the ‘right’ thing?
  – Teams provide test data with test results
  – Then we know that the result is correct because it reproduces team-supplied answers
  – Test results could be actual data (e.g. FITS files)
    • Pixels in pipeline-calibrated data should be identical within +/-
  – Or results of analysis
    • Aperture photometry should be the same to within +/-
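A regression check of this kind can be sketched as a comparison against the team-supplied answer within an absolute tolerance. The slides leave the tolerance values unspecified, so `TOLERANCE` below is a placeholder, and `pixels_match` is a hypothetical helper name.

```python
# Minimal sketch: compare pipeline output with a team-supplied answer.
# Assumption: agreement is judged pixel by pixel with an absolute tolerance.
import math


def pixels_match(result, expected, abs_tol):
    """True if both arrays have equal length and agree to within +/- abs_tol."""
    if len(result) != len(expected):
        return False
    return all(
        math.isclose(r, e, rel_tol=0.0, abs_tol=abs_tol)
        for r, e in zip(result, expected)
    )


TOLERANCE = 1e-6  # placeholder value; the real tolerance is not given in the slides

pipeline_output = [1.0000001, 2.0, 3.0]
team_answer = [1.0, 2.0, 3.0]
ok = pixels_match(pipeline_output, team_answer, TOLERANCE)
```

The same comparison works for derived quantities such as aperture photometry: pass the measured and expected values as one-element lists with the tolerance agreed for that quantity.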

Interfacing with other languages

• If teams develop code that does a lot of fancy processing, we can try to include it by wrapping

• Python talks to C/C++ using C extensions

• An existing C function can be wrapped so that Python objects can be passed to C/C++, and C objects passed back to Python

• We can wrap relatively simple C functions
  – Arguments are arrays or primitive datatypes (integer/float/string…)
  – No objects as arguments
  – Structs are OK, as long as they are simple (flat)
  – Play nice with memory

Wishlists

• We don’t need to feel constrained by HST

• What are the biggest deficiencies in HST?
  – Best reference files and best calibration steps can be determined by querying a service
    • Don’t need to rely on HST archive to find these out
  – Reference files can be downloaded as needed
  – Even calibration code can be updated as needed (don’t need to wait 6 months for the next STSDAS release)

Wishlists

• Tell us what you want!
  – The earlier the better
  – Some aspects of the overall architecture are still flexible

• And not just pipeline calibration code
  – We are going to need tools for data analysis, evaluation, interpretation, visualization
  – Reference file generation
