

Automated Transient Identification in the Dark Energy Survey

D. A. Goldstein,1,2 C. B. D’Andrea,3 J. A. Fischer,4 R. J. Foley,5,6 R. R. Gupta,7

R. Kessler,8,9 A. G. Kim,2 R. C. Nichol,3 P. Nugent,1,2 A. Papadopoulos,3 M. Sako,4

M. Smith,10 M. Sullivan,10 R. C. Thomas,2 W. Wester,11 R.C. Wolf,4 F. B. Abdalla,12

M. Banerji,13,14 A. Benoit-Lévy,12 E. Bertin,15 D. Brooks,12 A. Carnero Rosell,16,17

F. J. Castander,18 L. N. da Costa,16,17 R. Covarrubias,19 D. L. DePoy,20 S. Desai,21

H. T. Diehl,11 P. Doel,12 T. F. Eifler,4,22 A. Fausti Neto,16 D. A. Finley,11 B. Flaugher,11

P. Fosalba,18 J. Frieman,8,11 D. Gerdes,23 D. Gruen,24,25 R. A. Gruendl,5,19 D. James,26

K. Kuehn,27 N. Kuropatkin,11 O. Lahav,12 T. S. Li,20 M. A. G. Maia,16,17 M. Makler,28

M. March,4 J. L. Marshall,20 P. Martini,29,30 K. W. Merritt,11 R. Miquel,31,32 B. Nord,11

R. Ogando,16,17 A. A. Plazas,22,33 A. K. Romer,34 A. Roodman,35,36 E. Sanchez,37

V. Scarpine,11 M. Schubnell,23 I. Sevilla-Noarbe,5,37 R. C. Smith,26 M. Soares-Santos,11

F. Sobreira,11,16 E. Suchyta,29,38 M. E. C. Swanson,19 G. Tarle,23 J. Thaler,6 A. R. Walker26

ABSTRACT

We describe an algorithm for identifying point-source transients and moving objects on reference-subtracted optical images containing artifacts of processing and instrumentation. The algorithm makes use of the supervised machine learning technique known as Random Forest. We present results from its use in the Dark Energy Survey Supernova program (DES-SN), where it was trained using a sample of 898,963 signal and background events generated by the transient detection pipeline. After reprocessing the data collected during the first DES-SN observing season (Sep. 2013 through Feb. 2014) using the algorithm, the number of transient candidates eligible for human scanning decreased by a factor of 13.4, while only 1.0 percent of the artificial Type Ia supernovae (SNe) injected into search images to monitor survey efficiency were lost, most of which were very faint events. Here we characterize the algorithm's performance in detail, and we discuss how it can inform pipeline design decisions for future time-domain imaging surveys, such as the Large Synoptic Survey Telescope and the Zwicky Transient Facility. An implementation of the algorithm and the training data used in this paper are available at http://portal.nersc.gov/project/dessn/autoscan.

Subject headings: transients – discovery, algorithms – statistical, random forest, machine learning.

1 Department of Astronomy, University of California, Berkeley, 501 Campbell Hall #3411, Berkeley, CA 94720
2 Lawrence Berkeley National Laboratory, 1 Cyclotron Road, Berkeley, CA 94720, USA
3 Institute of Cosmology and Gravitation, University of Portsmouth, Dennis Sciama Building, Burnaby Road, Portsmouth, PO1 3FX, UK
4 Department of Physics and Astronomy, University of Pennsylvania, Philadelphia, PA 19104, USA
5 Astronomy Department, University of Illinois at Urbana-Champaign, 1002 W. Green Street, Urbana, IL 61801, USA
6 Department of Physics, University of Illinois at Urbana-Champaign, 1110 W. Green Street, Urbana, IL 61801, USA
7 Argonne National Laboratory, 9700 South Cass Avenue, Lemont, IL 60439, USA
8 Kavli Institute for Cosmological Physics, University of Chicago, Chicago, IL 60637, USA
9 Department of Astronomy and Astrophysics, University of Chicago, 5640 South Ellis Avenue, Chicago, IL 60637, USA
10 School of Physics and Astronomy, University of Southampton, Highfield, Southampton, SO17 1BJ, UK
11 Fermi National Accelerator Laboratory, P. O. Box 500, Batavia, IL 60510, USA


12 Department of Physics & Astronomy, University College London, Gower Street, London, WC1E 6BT, UK
13 Kavli Institute for Cosmology, University of Cambridge, Madingley Road, Cambridge CB3 0HA, UK
14 Institute of Astronomy, University of Cambridge, Madingley Road, Cambridge CB3 0HA, UK
15 Institut d'Astrophysique de Paris, Univ. Pierre et Marie Curie & CNRS UMR7095, F-75014 Paris, France
16 Laboratorio Interinstitucional de e-Astronomia - LIneA, Rua Gal. Jose Cristino 77, Rio de Janeiro, RJ - 20921-400, Brazil
17 Observatorio Nacional, Rua Gal. Jose Cristino 77, Rio de Janeiro, RJ - 20921-400, Brazil
18 Institut de Ciencies de l'Espai, IEEC-CSIC, Campus UAB, Facultat de Ciencies, Torre C5 par-2, 08193 Bellaterra, Barcelona, Spain
19 National Center for Supercomputing Applications, 1205 West Clark St., Urbana, IL 61801, USA
20 George P. and Cynthia Woods Mitchell Institute for Fundamental Physics and Astronomy, and Department of Physics and Astronomy, Texas A&M University, College Station, TX 77843, USA
21 Department of Physics, Ludwig-Maximilians-Universitaet, Scheinerstr. 1, 81679 Muenchen, Germany
22 Jet Propulsion Laboratory, California Institute of Technology, 4800 Oak Grove Dr., Pasadena, CA 91109, USA
23 Department of Physics, University of Michigan, Ann Arbor, MI 48109, USA
24 Max Planck Institute for Extraterrestrial Physics, Giessenbachstrasse, 85748 Garching, Germany
25 University Observatory Munich, Scheinerstrasse 1, 81679 Munich, Germany
26 Cerro Tololo Inter-American Observatory, National Optical Astronomy Observatory, Casilla 603, Colina El Pino S/N, La Serena, Chile
27 Australian Astronomical Observatory, North Ryde, NSW 2113, Australia
28 ICRA, Centro Brasileiro de Pesquisas Fisicas, Rua Dr. Xavier Sigaud 150, CEP 22290-180, Rio de Janeiro, RJ, Brazil
29 Center for Cosmology and Astro-Particle Physics, The Ohio State University, Columbus, OH 43210, USA
30 Department of Astronomy, The Ohio State University, Columbus, OH 43210, USA
31 Institut de Fisica d'Altes Energies, Universitat Autonoma de Barcelona, E-08193 Bellaterra, Barcelona, Spain
32 Institucio Catalana de Recerca i Estudis Avancats, E-08010 Barcelona, Spain
33 Brookhaven National Laboratory, Bldg 510, Upton, NY 11973, USA
34 Astronomy Centre, University of Sussex, Falmer, Brighton, BN1 9QH, UK
35 Kavli Institute for Particle Astrophysics & Cosmology, P. O. Box 2450, Stanford University, Stanford, CA 94305, USA
36 SLAC National Accelerator Laboratory, Menlo Park, CA 94025, USA
37 Centro de Investigaciones Energeticas, Medioambientales y Tecnologicas (CIEMAT), Madrid, Spain
38 Department of Physics, The Ohio State University, Columbus, OH 43210, USA

1. Introduction

To identify scientifically valuable transients or moving objects on the sky, imaging surveys have historically adopted a manual approach, employing humans to visually inspect images for signatures of the events (e.g., Zwicky 1964; Hamuy et al. 1993; Perlmutter et al. 1997; Schmidt et al. 1998; Filippenko et al. 2001; Strolger et al. 2004; Blanc et al. 2004; Astier et al. 2006; Sako et al. 2008; Mainzer et al. 2011; Waszczak et al. 2013; Rest et al. 2014). But recent advances in the capabilities of telescopes, detectors, and supercomputers have fueled a dramatic rise in the data production rates of such surveys, straining the ability of their teams to quickly and comprehensively look at images to perform discovery.

For surveys that search for objects on difference images—CCD images that reveal changes in the appearance of a region of the sky between two points in time—this problem of data volume is compounded by the problem of data purity. Difference images are produced by subtracting reference images from single-epoch images in a process that involves point-spread function (PSF) matching and image distortion (see, e.g., Alard & Lupton 1998). In addition to legitimate detections of astrophysical variability, they can contain artifacts of the differencing process, such as poorly subtracted galaxies, and artifacts of the single-epoch images, such as cosmic rays, optical ghosts, star halos, defective pixels, near-field objects, and CCD edge effects. Some examples are presented in Figure 1. These artifacts can vastly outnumber the signatures of scientifically valuable sources on the images, forcing object detection thresholds to be considerably higher than what would be expected from Gaussian fluctuations.

For time-domain imaging surveys with a spectroscopic follow-up program, these issues of data volume and purity are compounded by time pressure to produce lists of the most promising targets for follow-up observations before they become too faint to observe or fall outside a window of scientific utility.


Fig. 1.— Cutouts of DES difference images, roughly 14 arcsec on a side, centered on legitimate (green boxes; left four columns of figure) and spurious (red boxes; right four columns of figure) objects, at a variety of signal-to-noise ratios: (a) S/N ≤ 10, (b) 10 < S/N ≤ 30, (c) 30 < S/N ≤ 100. The cutouts are subclassed to illustrate both the visual diversity of spurious objects and the homogeneity of authentic ones. Objects in the "Transient" columns are real astrophysical transients that subtracted cleanly. Objects in the "Fake SN" columns are fake SNe Ia injected into transient search images to monitor survey efficiency. The column labeled "CR/Bad Column" shows detections of cosmic rays (rows b and c) and a bad column on the CCD detector (row a). The columns labeled "Bad Sub" show non-varying astrophysical sources that did not subtract cleanly; this can result from poor astrometric solutions, shallow templates, or bad observing conditions. The numbers at the bottom of each cutout indicate the score that each detection received from the machine learning algorithm introduced in §3; a score of 1.0 indicates the algorithm is perfectly confident that the detection is not an artifact, while a score of 0.0 indicates the opposite.


Ongoing searches for Type Ia supernovae (SNe Ia) out to z ∼ 1, such as the Panoramic Survey Telescope and Rapid Response System Medium Deep Survey's (Rest et al. 2014) and the Dark Energy Survey's (DES; Flaugher 2005), face all three of these challenges. The DES supernova program (DES-SN; Bernstein et al. 2012), for example, produces up to 170 gigabytes of raw imaging data on a nightly basis. Visual examination of sources extracted from the resulting difference images using SExtractor (Bertin & Arnouts 1996) revealed that ∼93 percent are artifacts, even after selection cuts (Kessler et al. 2015, in preparation). Additionally, the survey has a science-critical spectroscopic follow-up program for which it must routinely select the ∼10 most promising transient candidates from hundreds of possibilities, most of which are artifacts. This program is crucial to survey science as it allows DES to confirm transient candidates as SNe, train and optimize its photometric SN typing algorithms (e.g., PSNID, Sako et al. 2011; NNN, Karpenka, Feroz, & Hobson 2013), and investigate interesting non-SN transients. To prepare a list of objects eligible for consideration for spectroscopic follow-up observations, members of DES-SN scanned nearly 1 million objects extracted from difference images during the survey's first observing season, the numerical equivalent of nearly a week of uninterrupted scanning time, assuming scanning one object takes half a second.

For DES to meet its discovery goals, more efficient techniques for artifact rejection on difference images are needed. Efforts to "crowdsource" similar large-scale classification problems have been successful at scaling with growing data rates; websites such as Zooniverse.org have accumulated over one million users to tackle a variety of astrophysical classification problems, including the classification of transient candidates from the Palomar Transient Factory (PTF; Smith et al. 2011). However, for DES to optimize classification accuracy and generate reproducible classification decisions, automated techniques are required.

To reduce the number of spurious candidates considered for spectroscopic follow-up, many surveys impose selection requirements on quantities that can be directly and automatically computed from the raw imaging data. Making hard selection cuts of this kind has been shown to be a suboptimal technique for artifact rejection in difference imaging. Although such cuts are automatic and easy to interpret, they do not naturally handle correlations between features, and they are an inefficient way to select a subset of the high-dimensional feature space as the number of dimensions grows large (Bailey et al. 2007).

In contrast to selection cuts, machine learning (ML) classification techniques provide a flexible solution to the problem of artifact rejection in difference imaging. In general, these techniques attempt to infer a precise mapping between numeric features that describe characteristics of observed data, and the classes or labels assigned to those data, using a training set of feature-class pairs. ML classification algorithms that generate decision rules using labeled data—data whose class membership has already been definitively established—are called "supervised" algorithms. After generating a decision rule, supervised ML classifiers can be used to predict the classes of unlabeled data instances. For a review of supervised ML classification in astronomy, see, e.g., Ivezic et al. (2013). For an introduction to the statistical underpinnings of supervised ML classification techniques, see Willsky, Wornell, & Shapiro (2003).

Such classifiers address many of the shortcomings of scanning and selection cuts. ML algorithms' decisions are automatic, reproducible, and fast enough to process streaming data in real time. Their biases can be systematically and quantitatively studied, and, most importantly, given adequate computing resources, they remain fast and consistent in the face of increasing data production rates. As more data are collected, ML methods can continue to refine their knowledge about a data set (see §5.1), thereby improving their predictive performance on future data. Supervised ML classification techniques are currently used in a variety of astronomical contexts, including time-series analysis, such as the classification of variable stars (Richards et al. 2011) and SNe (Karpenka, Feroz, & Hobson 2013) from light curves, and image analysis, such as the typing of galaxies (Banerji et al. 2010) and discovery of trans-Neptunian objects (Gerdes et al. 2015, in preparation) on images.


Although their input data types differ, light curve shape and image-based ML classification frameworks are quite similar: both operate on tabular numeric classification features computed from raw input data (see §3.2.2).

The use of supervised machine learning classification techniques for artifact rejection in difference imaging was pioneered by Bailey et al. (2007) for the Nearby Supernova Factory (Aldering et al. 2002) using imaging data from the Near-Earth Asteroid Tracking program (http://neat.jpl.nasa.gov) and the Palomar-QUEST Consortium, using the 112-CCD QUEST-II camera (Baltay et al. 2007). They compared the performance of three supervised classification techniques—a Support Vector Machine, a Random Forest, and an ensemble of boosted decision trees—in separating a combination of real and fake detections of SNe from background events. They found that boosted decision trees constructed from a library of astrophysical domain features (magnitude, FWHM, distance to the nearest object in the reference co-add, measures of roundness, etc.) provided the best overall performance.

Bloom et al. (2012) built on the methodology of Bailey et al. (2007) by developing a highly accurate Random Forest framework for classifying detections of variability extracted from PTF difference images. Brink et al. (2013) made improvements to the classifier of Bloom et al. (2012), setting an unbroken benchmark for best overall performance on the PTF data set, using the technique of recursive feature elimination to optimize their classifier. Recently, du Buisson et al. (2014) published a systematic comparison of several classification algorithms using features based on Principal Component Analysis (PCA) extracted from Sloan Digital Sky Survey-II SN survey difference images. Finally, Wright et al. (2015) used a pixel-based approach to engineer a Random Forest classifier for the Pan-STARRS Medium Deep Survey.

In this article, we describe autoScan, a computer program developed for this purpose in DES-SN. Our main objective is to report the methodology that DES-SN adopted to construct an effective supervised classifier, with an eye toward informing the design of similar frameworks for future time domain surveys such as the Large Synoptic Survey Telescope (LSST; LSST Science Collaboration 2009) and the Zwicky Transient Facility (ZTF; Smith et al. 2014). We extend the work of previous authors to a newer, larger data set, showing how greater selection efficiency can be achieved by increasing training set size, using generative models for training data, and implementing new classification features.

The structure of the paper is as follows. In §2, we provide an overview of DES and the DES-SN transient detection pipeline. In §3, we describe the development of autoScan. In §4, we present metrics for evaluating the code's performance and review its performance on a realistic classification task. In §5, we discuss lessons learned and areas of future development that can inform the design of similar frameworks for future surveys.

2. The Dark Energy Survey and Transient Detection Pipeline

In this section, we introduce DES and the DES-SN transient detection pipeline ("DiffImg"; Kessler et al. 2015, in preparation), which produced the data used to train and validate autoScan. DES is a Stage III ground-based dark energy experiment designed to provide the tightest constraints to date on the dark energy equation of state parameter using observations of the four most powerful probes of dark energy suggested by the Dark Energy Task Force (DETF; Albrecht et al. 2006): SNe Ia, galaxy clusters, baryon acoustic oscillations, and weak gravitational lensing. DES consists of two interleaved imaging surveys: a wide-area survey that covers 5,000 deg2 of the south Galactic cap in 5 filters (grizY), and DES-SN, a time-domain transient survey that covers 10 (8 "shallow" and 2 "deep") 3 deg2 fields in the XMM-LSS, ELAIS-S, CDFS, and Stripe-82 regions of the sky, in four filters (griz). The survey's main instrument, the Dark Energy Camera (DECam; Diehl et al. 2012; Flaugher et al. 2012; Flaugher et al. 2015, submitted), is a 570-megapixel 3 deg2 imager with 62 fully depleted, red-sensitive CCDs. It is mounted at the prime focus of the Victor M. Blanco 4m telescope at the Cerro Tololo Inter-American Observatory (CTIO). DES conducted "science verification" (SV) commissioning observations from November 2012 until February 2013, and it began science operations in August 2013; observations will continue until at least 2018 (Diehl et al. 2014).


The data used in this article are from the first season of DES science operations ("Y1"; Aug. 2013 through Feb. 2014).

A schematic of the pipeline that DES-SN employs to discover transients is presented in Figure 2. Transient survey "science images" are single-epoch CCD images from the DES-SN fields. After the image subtraction step, sources are extracted using SExtractor. Sources that pass the cuts described in the Object section of Table 1 are referred to as "detections." A "raw candidate" is defined when two or more detections match to within 1". A raw candidate is promoted to a "science candidate" when it passes the NUMEPOCHS requirement in Table 1. This selection requirement was imposed to reject Solar System objects, such as main belt asteroids and Kuiper belt objects, which move substantially on images from night to night. Science candidates are eligible for visual examination and spectroscopic follow-up observations. During the observing season, science candidates are routinely photometered, fit with multi-band SN light curve models, visually inspected, and slated for spectroscopic follow-up.

3. Classifier Development

In this section, we describe the development of autoScan. We present the classifier's training data set (§3.1), its classification feature set (§3.2), and the selection (§3.3), properties (§3.4), and optimization (§3.5) of its core classification algorithm.

3.1. Training Data

To make probabilistic statements about the class membership of new data, supervised ML classifiers must be trained or fit to existing data whose true class labels are already known. Each data instance is described by numeric classification "features" (see §3.2.2); an effective training data set must approximate the joint feature distributions of all classes considered. Objects extracted from difference images can belong to one of two classes: "Artifacts" or "Non-Artifacts." Examples of each class must be present in the training set. Failing to include data from certain regions of feature space can corrode the predictive performance of the classifier in those regions, introducing bias into the search that can systematically degrade survey efficiency (Richards et al. 2012).

Because the training set compilation described here took place during the beginning of Y1, it was complicated by a lack of available visually scanned "non-artifact" sources.

Fortunately, labeling data does not necessarily require humans to visually inspect images. Bloom et al. (2012) discuss a variety of methods for labeling detections of variability produced by difference imaging pipelines, including scanning alternatives such as artificial source construction and spectroscopic follow-up. Scanning, spectroscopy, and using fake data each have their respective merits and drawbacks. Scanning is laborious and potentially inaccurate, especially if each data instance is only examined by one scanner, or if scanners are not well trained. However, a large group of scanners can quickly label a number of detections sufficient to create a training set for a machine classifier, and Brink et al. (2013) have shown that the supervised classification algorithm Random Forest, which was ultimately selected for autoScan, is insensitive to mislabeled training data up to a contamination level of 10 percent.

Photometric typing (e.g., Sako et al. 2011) can also be useful for labeling detections of transients. However, robust photometric typing requires well-sampled light curves, which in turn require high-cadence photometry of difference image objects over timescales of weeks or months. This requirement is prohibitive for imaging surveys in their early stages. Further, because photometric typing is an integral part of the spectroscopic target selection process, by extension new imaging surveys also have too few detections of spectroscopically confirmed SNe, AGN, or variable stars. Native spectroscopic training samples are therefore impractical sources of training data for new surveys.

Artificial source construction is the fastest method for generating native detections of non-artifact sources in the early stages of a survey. Large numbers of artificial transients ("fakes") can be injected into survey science images, and by construction their associated detections are true positives. Difficulties can arise when the joint feature distributions of fakes selected for the training set do not approximate the joint feature distributions of observed transients in production. In DES-SN, SN Ia fluxes from fake SN Ia light curves are overlaid on images near real galaxies.



[Figure 2 flowchart. Pipeline steps: generate single-epoch transient search images; mask cosmic rays and bad pixels, detrend; perform astrometry for each CCD (scamp); inject fake SNe Ia to monitor efficiency; transform templates to align with new images (swarp); subtract templates from new images (hotpants); extract objects from difference images (SExtractor); make selection cuts on extracted objects (see Table 1); identify artifacts with machine learning algorithm (autoScan); spatially associate objects into transient candidates; visually scan candidates with at least two nights of non-artifact detections; select targets for spectroscopy.]

Fig. 2.— Schematic of the DES-SN transient detection pipeline. The magnitudes of fake SNe Ia used to monitor survey efficiency are calibrated using the zero point of the images into which they are injected and generated according to the procedure described in §3.1. The autoScan step (red box) occurs after selection cuts are applied to objects extracted from difference images and before objects are spatially associated into raw transient candidates. Codes used at specific steps are indicated in parentheses.


Table 1
DES-SN object and candidate selection requirements.

Set       | Feature      | Lower Limit         | Upper Limit | Description
Object    | MAG          | ···                 | 30.0        | Magnitude from SExtractor.
Object    | A_IMAGE      | ···                 | 1.5 pix.    | Length of semi-major axis from SExtractor.
Object    | SPREAD_MODEL | ···                 | 3σ_S + 1.0  | Star-galaxy separation output parameter from SExtractor. σ_S is the estimated SPREAD_MODEL uncertainty.
Object    | CHISQ        | ···                 | 10^4        | χ² from PSF fit to 35 × 35 pixel cutout around object in difference image.
Object    | SNR          | 3.5                 | ···         | Flux from a PSF-model fit to a 35 × 35 pixel cutout around the object divided by the uncertainty from the fit.
Object    | VETOMAG^a    | 21.0                | ···         | Magnitude from SExtractor for use in veto catalog check.
Object    | VETOTOL^a    | Magnitude-dependent | ···         | Separation from nearest object in veto catalog of bright stars.
Object    | DIPOLE6      | ···                 | 2           | N_pix in 35 × 35 pixel object-centered cutout at least 6σ below 0.
Object    | DIPOLE4      | ···                 | 20          | N_pix in 35 × 35 pixel object-centered cutout at least 4σ below 0.
Object    | DIPOLE2      | ···                 | 200         | N_pix in 35 × 35 pixel object-centered cutout at least 2σ below 0.
Candidate | NUMEPOCHS    | 2                   | ···         | Number of distinct nights that the candidate is detected.

a The difference imaging pipeline is expected to produce false positives near bright or variable stars, thus all difference image objects are checked against a "veto" catalog of known bright and variable stars and are rejected if they are brighter than 21st magnitude and within a magnitude-dependent radius of a veto catalog source. Thus only one of VETOMAG and VETOTOL must be satisfied for an object to be selected.
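The object-level cuts in Table 1 are simple per-detection thresholds, so they can be applied as a vectorized filter over the extracted-object catalog. The sketch below is illustrative only: it assumes a pandas DataFrame with hypothetical column names matching the Table 1 features (including SPREADERR_MODEL standing in for σ_S) and omits the VETOMAG/VETOTOL check, which requires a positional cross-match against a bright-star veto catalog.

    import pandas as pd

    def apply_object_cuts(objects: pd.DataFrame) -> pd.DataFrame:
        # Illustrative application of the Table 1 object-level cuts.
        # Column names are hypothetical; the DES-SN pipeline's actual
        # names may differ. The veto-catalog check is omitted here.
        keep = (
            (objects["MAG"] < 30.0)
            & (objects["A_IMAGE"] < 1.5)   # semi-major axis, pixels
            & (objects["SPREAD_MODEL"] < 3.0 * objects["SPREADERR_MODEL"] + 1.0)
            & (objects["CHISQ"] < 1.0e4)
            & (objects["SNR"] > 3.5)
            & (objects["DIPOLE6"] < 2)
            & (objects["DIPOLE4"] < 20)
            & (objects["DIPOLE2"] < 200)
        )
        return objects[keep]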


The fake SN Ia light curves are generated by the SNANA simulation (Kessler et al. 2009), and they include true parent populations of stretch and color, a realistic model of intrinsic scatter, a redshift range from 0.1 to 1.4, and a galaxy location proportional to surface brightness. On difference images, detections of overlaid fakes are visually indistinguishable from real point-source transients and Solar System objects moving slowly enough not to streak. All fake SN Ia light curves are generated and stored prior to the start of the survey. The overlay procedure is part of the difference imaging pipeline, where the SN Ia flux added to the image is scaled by the zero point, spread over nearby pixels using a model of the PSF, and fluctuated by random Poisson noise. These fakes are used to monitor the single-epoch transient detection efficiency, as well as the candidate efficiency in which detections on two distinct nights are required. On average, six detections of fake SNe are overlaid on each single-epoch CCD image.
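A minimal sketch of that overlay step, assuming a hypothetical psf_kernel array normalized to unit sum and a per-image zero_point, and ignoring edge handling and masking, is given below; it illustrates only the zero-point scaling, PSF spreading, and Poisson fluctuation described above, not the pipeline's actual implementation.

    import numpy as np

    def overlay_fake(image, x, y, mag, zero_point, psf_kernel, rng=None):
        # Add a fake point source of magnitude `mag` at pixel (x, y).
        # The source counts are set by the image zero point, spread over
        # nearby pixels with a normalized PSF kernel, and Poisson-fluctuated.
        rng = rng or np.random.default_rng()
        counts = 10.0 ** (0.4 * (zero_point - mag))   # total counts for the source
        stamp = rng.poisson(counts * psf_kernel)      # PSF spread + Poisson noise
        half = psf_kernel.shape[0] // 2
        image[y - half:y + half + 1, x - half:x + half + 1] += stamp
        return image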

The final autoScan training set contained detections of visually scanned artifacts and artificial sources only. We did not include detections of photometrically typed transients, to minimize the contamination of the "Non-Artifact" class with false positives. Bailey et al. (2007) also used a training set in which the "Non-Artifact" class consisted largely of artificial sources.

With 898,963 training instances in total, the autoScan training set is the largest used for difference image artifact rejection in production. It was split roughly evenly between "real" and "artifact" labeled instances—454,092 were simulated SNe Ia injected onto host galaxies, while the remaining 444,871 detections were human-scanned artifacts. Compiling a set of artifacts to train autoScan was accomplished by taking a random sample of the objects that had been scanned as artifacts by humans during an early processing of DES Y1 data with a pared-down version of the difference imaging pipeline presented in Figure 2.

3.2. Features and Processing

The supervised learning algorithms we consider in this analysis are nonlinear functions that map points representing individual detections in feature space to points in a space of object classes or class probabilities. The second design choice in developing autoScan is therefore to define a suitable feature space in which to represent the data instances we wish to use for training, validation, and prediction.

In this section, we describe the classification features that we computed from the raw output of the difference imaging pipeline, as well as the steps used to pre- and post-process these features.

3.2.1. Data Preprocessing

The primary data sources for autoScan features are 51 × 51 pixel object-centered search, template, and difference image cutouts. The template and difference image cutouts are sky-subtracted. The search image cutout is sky-subtracted if and only if it does not originate from a coadded exposure, though this is irrelevant for what follows, as no features are directly computed from search image pixel values. Photometric measurements, SExtractor output parameters, and other data sources are also used. Each cutout associated with a detection is compressed to 25 × 25 pixels. The seeing for each search image is usually no less than 1 arcsec, while the DECam pixel scale lies between 0.262 and 0.264 arcsec depending on the location on the focal plane, so little information is lost during compression. Although some artifacts are sharper than the seeing, we found that using compressed cutouts to compute some features resulted in better performance.

Consider a search, template, or difference image cutout associated with a single detection. Let the matrix element I_{x,y} of the 51 × 51 matrix I represent the flux value of the pixel at location x, y on the cutout. We adopt the convention of zero-based indexing and the convention that element (0, 0) corresponds to the pixel at the top left-hand corner of the cutout. Let the matrix element C_{x,y} of the 25 × 25 matrix C represent the flux value of the pixel at location x, y on the compressed cutout. Then C is defined element-wise from I via

C_{x,y} = (1/N_u) \sum_{i=0}^{1} \sum_{j=0}^{1} I_{2x+i,\, 2y+j},    (1)

where N_u is the number of unmasked pixels in the sum. Masked pixels are excluded from the sum. Only when all four terms in the sum represent masked pixels is the corresponding pixel masked in C. Note that matrix elements from the right-hand column and last row of I never appear in Equation 1.
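A direct NumPy translation of Equation 1, using a boolean mask array as a stand-in for however the pipeline actually represents masked pixels, is sketched below.

    import numpy as np

    def compress(I, mask):
        # 2x2 block-average a 51x51 cutout into a 25x25 cutout (Equation 1).
        # `mask` is True where a pixel is masked; masked pixels are dropped
        # from each block average, and a compressed pixel is masked only if
        # all four contributing pixels are masked. The last row and column
        # of I never enter the sums.
        C = np.zeros((25, 25))
        C_mask = np.zeros((25, 25), dtype=bool)
        for x in range(25):
            for y in range(25):
                block = I[2 * x:2 * x + 2, 2 * y:2 * y + 2]
                good = ~mask[2 * x:2 * x + 2, 2 * y:2 * y + 2]
                if good.sum() == 0:
                    C_mask[x, y] = True
                else:
                    C[x, y] = block[good].sum() / good.sum()
        return C, C_mask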

To ensure that the pixel flux values across cutouts are comparable, we rescale the pixel values of each compressed cutout via

R_{x,y} = (C_{x,y} - med(C)) / \sigma,    (2)

where the matrix element R_{x,y} of the 25 × 25 matrix R represents the flux value of the pixel at location x, y on the compressed, rescaled cutout, and σ is a consistent estimator of the standard deviation of C. We take the median absolute deviation as a consistent estimator of the standard deviation (Rousseeuw & Croux 1993), according to

\sigma = med(|C - med(C)|) / \Phi^{-1}(3/4),    (3)

where 1/\Phi^{-1}(3/4) ≈ 1.4826 is the reciprocal of the inverse cumulative distribution function for the standard normal distribution evaluated at 3/4. This is done to ensure that the effects of defective pixels and cosmic rays nearly perpendicular to the focal plane are suppressed. We therefore have the following closed-form expression for the matrix element R_{x,y},

R_{x,y} ≈ (1/1.4826) [ (C_{x,y} - med(C)) / med(|C - med(C)|) ].    (4)

The rescaling expresses the value of each pixel on the compressed cutout as the number of standard deviations above the median. Masked pixels are excluded from the computation of the median in Equation 4.

Finally, an additional rescaling from Brink et al. (2013) is defined according to

B_{x,y} = (I_{x,y} - med(I)) / max(|I|).    (5)

The size of B is 51 × 51. We found that using B instead of R or I to compute certain features resulted in better classifier performance. Masked pixels are excluded from the computation of the median in Equation 5.
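Both rescalings are element-wise and straightforward to express with NumPy masked arrays (an assumption about how masked pixels are carried through; the pipeline's internal representation is not specified here).

    import numpy as np

    def rescale_mad(C):
        # Equations 2-4: subtract the median and divide by 1.4826 times the
        # median absolute deviation, a consistent estimator of sigma.
        med = np.ma.median(C)
        mad = np.ma.median(np.abs(C - med))
        return (C - med) / (1.4826 * mad)

    def rescale_brink(I):
        # Equation 5: subtract the median, divide by the maximum absolute value.
        return (I - np.ma.median(I)) / np.ma.max(np.abs(I))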

3.2.2. Feature Library

Two feature libraries were investigated. The first was primarily "pixel-based." For a given object, each matrix element of the rescaled, compressed search, template, and difference cutouts was used as a feature. The CCD ID number of each detection was also used, as DECam has 62 CCDs with specific artifacts (such as bad columns and hot pixels) as well as effects that are reproducible on the same CCD depending on which field is observed (such as bright stars). The signal-to-noise ratio of each detection was also used as a feature. The merits of this feature space include relatively straightforward implementation and computational efficiency. A production version of this pixel-based classifier was implemented in the DES-SN transient detection pipeline at the beginning of Y1. In production, it became apparent that the 1,877-dimensional feature space (625 pixels on a 25 × 25 pixel cutout × 3 cutouts per detection + 2 non-pixel features, snr and ccdid, = 1,877) was dominated by uninformative features, and that better false positive control could be achieved with a more compact feature set.

We pursued an alternative feature space going forward, instead using 38 high-level metrics to characterize detections of variability. A subset of the features are based on analogs from Bloom et al. (2012) and Brink et al. (2013). In this section, we describe the features that are new. We present an at-a-glance view of the entire autoScan feature library in Table 2. Histograms and contours for the three most important features in the final autoScan model (see §3.4) appear in Figure 4.


Table 2
autoScan's feature library.

Feature Name | Importance | Source | Description
r_aper_psf | 0.148 | New | The average flux in a 5-pixel circular aperture centered on the object on the I^t cutout plus the flux from a 35 × 35-pixel PSF model fit to the object on the I^d cutout, all divided by the PSF model-fit flux.
magdiff | 0.094 | B12 | If a source is found within 5" of the location of the object in the galaxy coadd catalog, the difference between mag and the magnitude of the nearby source. Else, the difference between mag and the limiting magnitude of the parent image from which the I^d cutout was generated.
spread_model | 0.066 | New | SPREAD_MODEL output parameter from SExtractor on I^d.
n2sig5 | 0.055 | B12 | Number of matrix elements in a 7 × 7 element block centered on the detection on R^d with values less than -2.
n3sig5 | 0.053 | B12 | Number of matrix elements in a 7 × 7 element block centered on the detection on R^d with values less than -3.
n2sig3 | 0.047 | B12 | Number of matrix elements in a 5 × 5 element block centered on the detection on R^d with values less than -2.
flux_ratio | 0.037 | B12 | Ratio of the flux in a 5-pixel circular aperture centered on the location of the detection on I^d to the absolute value of the flux in a 5-pixel circular aperture at the same location on I^t.
n3sig3 | 0.034 | B12 | Number of matrix elements in a 5 × 5 element block centered on the detection on R^d with values less than -3.
mag_ref_err | 0.030 | B12 | Uncertainty on mag_ref, if it exists. Else imputed.
snr | 0.029 | B12 | The flux from a 35 × 35-pixel PSF model fit to the object on I^d divided by the uncertainty from the fit.
colmeds | 0.028 | New | The maximum of the median pixel values of each column on B^d.
nn_dist_renorm | 0.027 | B12 | The distance from the detection to the nearest source in the galaxy coadd catalog, if one exists within 5". Else imputed.
ellipticity | 0.027 | B12 | The ellipticity of the detection on I^d using a_image and b_image from SExtractor.
amp | 0.027 | B13 | Amplitude of the fit that produced gauss.
scale | 0.024 | B13 | Scale parameter of the fit that produced gauss.
b_image | 0.024 | B12 | Semi-minor axis of the object from SExtractor on I^d.
mag_ref | 0.022 | B12 | The magnitude of the nearest source in the galaxy coadd catalog, if one exists within 5" of the detection on I^d. Else imputed.
diffsum | 0.021 | New | The sum of the matrix elements in a 5 × 5 element box centered on the detection location on R^d.
mag | 0.020 | B12 | The magnitude of the object from SExtractor on I^d.
a_ref | 0.019 | B12 | Semi-major axis of the nearest source in the galaxy coadd catalog, if one exists within 5". Else imputed.


3.2.3. New Features

In this section we present new features developed for autoScan. Let the superscripts s, t, and d on matrices defined in the previous section denote search, template, and difference images, respectively. The feature r_aper_psf is designed to identify badly subtracted stars and galaxies on difference images caused by poor astrometric alignment between search and template images. These objects typically appear as overlapping circular regions of positive and negative flux colloquially known as "dipoles." Examples are presented in Figure 3. In these cases the typical search-template astrometric misalignment scale is comparable to the FWHM of the PSF, causing the contributions of the negative and positive regions to the total object flux from a PSF-model fit to be approximately equal in magnitude but opposite in sign, usually with a slight positive excess, as the PSF fit is centered on the detection location, where the flux is always positive. The total flux from a PSF-model fit to a dipole is usually greater than but comparable to the average flux per pixel in a five-pixel circular aperture centered on the detection location on the template image. To this end, let F_{aper,I^t} be the flux from a five-pixel circular aperture centered on the location of a detection on the uncompressed template image. Let F_{PSF,I^d} be the flux computed by fitting a PSF model to a 35 × 35 pixel cutout centered on the location of the detection on the uncompressed difference image. Then r_aper_psf is given by

r_aper_psf = (F_{aper,I^t} + F_{PSF,I^d}) / F_{PSF,I^d}.    (6)

We find that objects with r_aper_psf > 1.25 are almost entirely "dipoles."

Let a ∈ {2, 3}, b ∈ {3, 5}. The four features n{a}sig{b}shift represent the difference between the number of pixels with flux values greater than or equal to a in (b + 2) × (b + 2) element blocks centered on the detection position in R^d and R^t. These features coarsely describe changes in the morphology of the source between the template and search images.

The feature diffsum is the sum of the matrix elements in a 5 × 5 element (2.8 × 2.8 arcsec²) box centered on the detection location in R^d. It is given by

diffsum = \sum_{i=-2}^{2} \sum_{j=-2}^{2} R^d_{x_c+i,\, y_c+j},    (7)

where x_c, y_c is the location of the central element on R^d. It gives a coarse measurement of the significance of the detection.
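These block statistics reduce to simple slicing on the rescaled cutouts. The sketch below assumes 25 × 25 NumPy arrays R_d and R_t with the detection at the central element; names are illustrative, not the pipeline's own.

    import numpy as np

    def center_block(R, half):
        # Return the (2*half+1) x (2*half+1) block centered on the cutout.
        c = R.shape[0] // 2
        return R[c - half:c + half + 1, c - half:c + half + 1]

    def diffsum(R_d):
        # Equation 7: sum of the central 5x5 block of the rescaled difference cutout.
        return center_block(R_d, 2).sum()

    def nsigshift(R_d, R_t, a, b):
        # n{a}sig{b}shift: count of elements >= a in the central (b+2)x(b+2)
        # block of R^d minus the same count on R^t, for a in {2, 3}, b in {3, 5}.
        half = (b + 2) // 2
        return int((center_block(R_d, half) >= a).sum()
                   - (center_block(R_t, half) >= a).sum())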

bandnum is a numeric representation of the filter in which the object was detected on the search image. This feature enables autoScan to identify band-specific patterns.

numneg is intended to assess object smoothness by returning the number of negative elements in a 7 × 7 pixel box centered on the object in R^d, exposing objects riddled with negative pixels or objects that have a significant number of pixels below med(R^d). Used in concert with the S/N, numneg can help identify high-S/N objects with spatial pixel intensity distributions that do not vary smoothly, useful in rejecting hot pixels and cosmic rays.

lacosmic was designed to identify cosmic rays and other objects with spatial pixel intensity distributions that do not vary smoothly, and is based loosely on the methodology that van Dokkum (2001) uses to identify cosmic rays on arbitrary sky survey images. Derive the "fine structure" image F from B^d according to

F = (M_3 * B^d) − ([M_3 * B^d] * M_7),    (8)

where M_n is an n × n median filter. Then

lacosmic = max(B^d) / max(F).    (9)

Relatively speaking, this statistic should be large for objects that do not vary smoothly, and small for objects that approximate a PSF. The reader is referred to Figure 3 of van Dokkum (2001) for visual examples.
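The lacosmic statistic can be reproduced with standard median filters; the sketch below uses scipy.ndimage.median_filter for the M_3 and M_7 operators of Equation 8, and includes numneg for comparison. It assumes plain, unmasked NumPy arrays.

    import numpy as np
    from scipy.ndimage import median_filter

    def lacosmic_stat(B_d):
        # Equations 8-9: ratio of the cutout maximum to the maximum of the
        # "fine structure" image F = M3(B^d) - M7(M3(B^d)).
        m3 = median_filter(B_d, size=3)
        fine = m3 - median_filter(m3, size=7)
        return B_d.max() / fine.max()

    def numneg(R_d):
        # Number of negative elements in the central 7x7 block of R^d.
        c = R_d.shape[0] // 2
        return int((R_d[c - 3:c + 4, c - 3:c + 4] < 0).sum())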

Fig. 3.— Difference image cutouts (left four columns; r_aper_psf values indicated) and corresponding template image cutouts (right four columns) for objects with r_aper_psf > 1.25.


Table 2—Continued

Feature Name | Importance | Source | Description
n3sig3shift | 0.019 | New | The number of matrix elements with values greater than or equal to 3 in the central 5 × 5 element block of R^d minus the number of matrix elements with values greater than or equal to 3 in the central 5 × 5 element block of R^t.
n3sig5shift | 0.018 | New | The number of matrix elements with values greater than or equal to 3 in the central 7 × 7 element block of R^d minus the number of matrix elements with values greater than or equal to 3 in the central 7 × 7 element block of R^t.
n2sig3shift | 0.014 | New | The number of matrix elements with values greater than or equal to 2 in the central 5 × 5 element block of R^d minus the number of matrix elements with values greater than or equal to 2 in the central 5 × 5 element block of R^t.
b_ref | 0.012 | B12 | Semi-minor axis of the nearest source in the galaxy coadd catalog, if one exists within 5". Else imputed.
gauss | 0.012 | B13 | χ² from fitting a spherical, 2D Gaussian to a 15 × 15 pixel cutout around the detection on B^d.
n2sig5shift | 0.012 | New | The number of matrix elements with values greater than or equal to 2 in the central 7 × 7 element block of R^d minus the number of matrix elements with values greater than or equal to 2 in the central 7 × 7 element block of R^t.
mag_from_limit | 0.010 | B12 | Limiting magnitude of the parent image from which the I^d cutout was generated, minus mag.
a_image | 0.009 | B12 | Semi-major axis of object on I^d from SExtractor.
min_dist_to_edge | 0.009 | B12 | Distance in pixels to the nearest edge of the detector array on the parent image from which the I^d cutout was generated.
ccdid | 0.008 | B13 | The numerical ID of the CCD on which the detection was registered.
flags | 0.008 | B12 | Numerical representation of SExtractor extraction flags on I^d.
numneg | 0.007 | New | The number of negative matrix elements in a 7 × 7 element box centered on the detection in R^d.
l1 | 0.006 | B13 | sign(\sum B^d) × \sum |B^d| / |\sum B^d|.
lacosmic | 0.006 | New | max(B^d)/max(F), where F is the LACosmic (van Dokkum 2001) "fine structure" image computed on B^d.
spreaderr_model | 0.006 | New | Uncertainty on spread_model.
maglim | 0.005 | B12 | True if there is no nearby galaxy coadd source, false otherwise.
bandnum | 0.004 | New | Numerical representation of image filter.
maskfrac | 0.003 | New | The fraction of I^d that is masked.

Note.—The Source column indicates the reference in which the feature was first published. B13 indicates the feature first appeared in Brink et al. (2013); B12 indicates the feature first appeared in Bloom et al. (2012); and New indicates the feature is new in this work. See §3.4 for an explanation of how feature importances are computed. Imputation refers to the procedure described in §3.2.4.



Bad columns and CCD edge effects that appear as fuzzy vertical streaks near highly masked regions of difference images are common types of artifacts. Because they share a number of visual similarities, we designed a single feature, colmeds, to identify them:

colmeds = max({med(transpose(B^d)_i);  i ∈ {0, ..., N_col − 1}}),    (10)

where N_col is the number of columns in B^d. This feature operates on the principle that the median of a column in B^d should be comparable to the background if the cutout is centered on a PSF, because, in general, even the column in which the PSF is at its greatest spatial extent in B^d should still contain more background pixels than source pixels. However, for vertically oriented artifacts that occupy entire columns on B^d, this does not necessarily hold. Since these artifacts frequently appear near masked regions of images, we define maskfrac as the percentage of I^d that is masked.
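colmeds and maskfrac reduce to a column-wise median and a masked-pixel fraction; a minimal sketch, again assuming a boolean mask array standing in for the I^d mask plane:

    import numpy as np

    def colmeds(B_d):
        # Equation 10: maximum over columns of the per-column median of B^d.
        return np.median(B_d, axis=0).max()

    def maskfrac(mask_d):
        # Fraction of the difference-image cutout that is masked.
        return mask_d.mean()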

The feature spread_model (Desai et al. 2012; Bouy et al. 2013) is a SExtractor star/galaxy separation output parameter computed on the I^d cutout. It is a normalized, simplified linear discriminant between the best-fitting local PSF model and a slightly more extended model made from the same PSF convolved with a circular exponential disk model.

3.2.4. Data Postprocessing

When there is not a source in the galaxy coadd catalog within 5 arcsec of an object detected on a difference image, certain classification features cannot be computed for the object (see Table 2). If a feature of an object cannot be computed, it is assigned the mean value of that feature from the training set.
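This training-set-mean imputation can be reproduced with scikit-learn's SimpleImputer (a modern stand-in; the class postdates the scikit-learn version cited in this paper), fit on the training features so that unseen detections receive the training-set means.

    import numpy as np
    from sklearn.impute import SimpleImputer

    # Toy feature matrix; NaN marks features that could not be computed
    # (e.g., no galaxy coadd source within 5 arcsec of the detection).
    X_train = np.array([[1.0, 0.5], [2.0, np.nan], [3.0, 1.5]])
    X_new = np.array([[1.5, np.nan]])

    imputer = SimpleImputer(strategy="mean")     # training-set mean imputation
    imputer.fit(X_train)
    print(imputer.transform(X_new))              # NaN -> mean of (0.5, 1.5) = 1.0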

3.3. Classification Algorithm Selection

After we settled on an initial library of classification features, we compared three well-known ML classification algorithms: a Random Forest (Breiman 2001), a Support Vector Machine (SVM; Vapnik 1995), and an AdaBoost decision tree classifier (Zhu et al. 2009). We used scikit-learn (Pedregosa et al. 2012), an open source Python package for machine learning, to instantiate examples of each model with standard settings. We performed a three-fold cross-validated comparison using a randomly selected 100,000-detection subset of the training set described in §3.1. The subset was used to avoid long training times for the SVM. For a description of cross validation and the metrics used to evaluate each model, see §4 and §4.2. The results appear in Figure 5. We found that the performance of all three models was comparable, but that the Random Forest outperformed the other models by a small margin. We incorporated the Random Forest model into autoScan.
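A comparison of this kind can be reproduced with scikit-learn's standard estimators and cross-validation utilities. The sketch below uses current class names, default settings, a synthetic stand-in for the 100,000-detection subset, and ROC AUC as a simple summary statistic; the paper itself compares missed-detection and false-positive rates (Figure 5).

    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    # Synthetic stand-in for a labeled artifact/non-artifact feature matrix.
    X, y = make_classification(n_samples=5000, n_features=38, random_state=0)

    models = {
        "Random Forest": RandomForestClassifier(random_state=0),
        "AdaBoost": AdaBoostClassifier(random_state=0),
        "SVM (RBF kernel)": SVC(kernel="rbf", random_state=0),
    }

    for name, model in models.items():
        scores = cross_val_score(model, X, y, cv=3, scoring="roc_auc")
        print(f"{name}: mean 3-fold AUC = {scores.mean():.3f}")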

Random Forests are collections of decision trees, or cascading sequences of feature-space unit tests, that are constructed from labeled training data. For an introduction to decision trees, see Breiman et al. (1984). Random Forests can be used for predictive classification or regression. During the construction of a supervised Random Forest classifier, trees in the forest are trained individually. To construct a single tree, the training algorithm first chooses a bootstrapped sample of the training data. The algorithm then attempts to recursively define a series of binary splits on the features of the training data that optimally separate the training data into their constituent classes. During the construction of each node, a random subsample of features with a user-specified size is selected with replacement. A fine grid of splits on each feature is then defined, and the split that maximizes the increase in the purity of the incident training data is chosen for the node.

Two popular metrics for sample purity are the Gini coefficient (Gini 1921) and the Shannon entropy (Shannon 1948). Define the purity of a sample of difference image objects to be

P = N_{NA} / (N_A + N_{NA}),    (11)

where N_{NA} is the number of non-artifact objects in the sample, and N_A is the number of artifacts in the sample.

(Some authors define P = \sum_{NA} w_i / (\sum_{NA} w_i + \sum_{A} w_i), where w_i is the weight of instance i, \sum_A is a sum over artifact events, and \sum_{NA} is a sum over non-artifact events. This renders the definition of the Gini coefficient in Equation 12 as Gini = P(1 − P) \sum_i w_i.)


[Figure 4: corner plot of MAGDIFF, SPREAD_MODEL, R_APER_PSF, and SNR for fake SNe and artifacts; see caption below.]

Fig. 4.— Contours of r_aper_psf, magdiff, and spread_model—the three most important features in the autoScan Random Forest model, computed using the feature importance evaluation scheme described in §3.4—and the signal-to-noise ratio, snr. The importances of r_aper_psf, magdiff, and spread_model were 0.148, 0.094, and 0.066, respectively. The contours show that the relationships between the features are highly nonlinear and better suited to machine learning techniques than hard selection cuts.

Note that P = 1 for a sample composed entirely of non-artifacts, P = 0 for a sample composed entirely of artifacts, and P(1 − P) = 0 for a sample composed entirely of either artifacts or non-artifacts. Then the Gini coefficient is

Gini = P(1 − P)(N_A + N_{NA}).    (12)

A tree with a Gini objective function seeks at each node to minimize the quantity

Gini_{lc} + Gini_{rc},    (13)

where Gini_{lc} is the Gini coefficient of the data incident on the node's left child, and Gini_{rc} is the Gini coefficient of the data incident on the node's right child. If Gini_{lc} + Gini_{rc} > Gini, then no split is performed and the node is declared a terminal node. The process proceeds identically if another metric is used, such as the Shannon entropy, the most common alternative. The Shannon entropy S of a sample of difference image objects is given by

S = −p_{NA} log_2(p_{NA}) − p_A log_2(p_A),    (14)

where p_{NA} is the proportion of non-artifact objects in the sample, and p_A is the proportion of artifacts in the sample.
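A minimal implementation of the purity, Gini, and entropy expressions (Equations 11, 12, and 14) for the labels reaching a node:

    import numpy as np

    def node_metrics(labels):
        # `labels` holds 0 (artifact) / 1 (non-artifact) class labels for the
        # training data incident on a node.
        n = len(labels)
        n_na = int(np.sum(labels))                 # number of non-artifacts
        p = n_na / n                               # purity, Equation 11
        gini = p * (1.0 - p) * n                   # Gini coefficient, Equation 12
        probs = np.array([1.0 - p, p])
        probs = probs[probs > 0]                   # avoid log2(0)
        entropy = -np.sum(probs * np.log2(probs))  # Shannon entropy, Equation 14
        return p, gini, entropy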

Nodes are generated in this fashion until a maximum depth or a user-specified measure of node purity is achieved. The number of trees to grow in the forest is left as a free parameter to be set by the user. Training a single Random Forest using the entire ∼900,000 object training sample with the hyperparameters selected from the grid search described in Table 3 took ∼4.5 minutes when the construction of the trees was distributed across 60 1.6 GHz AMD Opteron 6262 HE processors.

Random Forests treat the classes of unseen objects as unknown parameters that are described probabilistically. An object to be classified descends each tree in the forest, beginning at the root nodes. Once a data point arrives at a terminal node, the tree returns the fraction of the training instances that reached that node that were labeled "non-artifact." The output of the trained autoScan Random Forest model on a single input data instance is the average of the outputs of each tree, representing the probability that the object is not an artifact, henceforth the "autoScan score" or "ML score."


[Figure 5: false positive rate versus missed detection rate for the Random Forest, AdaBoost, and SVM-RBF classifiers; see caption below.]

Fig. 5.— Initial comparison of the performance of a Random Forest, a Support Vector Machine with a radial basis function kernel, and an AdaBoost decision tree classifier on the DES-SN artifact/non-artifact classification task. Each classifier was trained on a randomly selected 67% of the detections from a 100,000-detection subset of the training set, then tested on the remaining 33%. This process was repeated three times until every detection in the subset was used in the testing set once. The curves above represent the mean of each iteration. The closer a curve is to the origin, the better the classifier. The unoptimized Random Forest outperformed the other two methods, and was selected.

is not an artifact, henceforth the "autoScan score" or "ML score." Ultimately, a score of 0.5 was adopted as the cut τ to separate real detections of astrophysical variability from artifacts in the DES-SN data; see §4.4 for details. Class prediction for 200,000 unseen data instances took 9.5 s on a single 1.6 GHz AMD Opteron 6262 HE processor.
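A minimal sketch of this scoring step, assuming a scikit-learn-style Random Forest (the reference list cites Pedregosa et al. 2012); the synthetic data, feature dimensions, and variable names are placeholders rather than autoScan internals, though the hyperparameters mirror the best-performing Table 3 configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# stand-in training data: 1000 detections x 30 features, label 1 = non-artifact
X_train = rng.normal(size=(1000, 30))
y_train = rng.integers(0, 2, size=1000)

# hyperparameters of the best-performing classifier from the Table 3 grid search
rf = RandomForestClassifier(n_estimators=100, criterion="entropy",
                            max_features=6, min_samples_split=3,
                            max_depth=None, n_jobs=-1)
rf.fit(X_train, y_train)

# the ML score is the forest-averaged probability of the non-artifact class,
# i.e. the mean of the per-tree terminal-node fractions
X_new = rng.normal(size=(5, 30))
scores = rf.predict_proba(X_new)[:, 1]

tau = 0.5                         # class discrimination boundary adopted in DES-SN
passes_ml_cut = scores >= tau
print(scores, passes_ml_cut)
```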

3.4. Feature Importances

Numeric importances can be assigned to the features in a trained forest based on the amount of information they provided during training (Breiman et al. 1984). For each tree T in the forest, a tree-specific importance for feature i is computed according to

ζ_{i,T} = Σ_{n ∈ T} N(n) B_i(n) [m(n) − m_ch(n)],    (15)

where n is an index over nodes in T, N(n) is the number of training data points incident on node n, B_i(n) is 1 if node n splits on feature i and 0 otherwise, m(n) is the value of the objective function (usually the Gini coefficient or the Shannon entropy, see §3.3) applied to the training data incident on node n, and m_ch(n) is the sum of the values of the objective function applied to the node's left and right children. The global importance of feature i is the average of the tree-specific importances:

I_i = (1/N_T) Σ_T ζ_{i,T},    (16)

where N_T is the number of trees in the forest. In this article, importances are normalized to sum to unity.
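The following sketch computes importances in the spirit of Equations 15–16 from a fitted scikit-learn forest; it is not the autoScan implementation, and because scikit-learn's built-in feature_importances_ applies an additional per-tree normalization, the numbers can differ slightly from that attribute.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

def tree_importances(estimator, n_features):
    """zeta_{i,T}: sum over internal nodes of N(n) * [m(n) - m_ch(n)]
    for the feature each node splits on (Eq. 15, up to normalization)."""
    t = estimator.tree_
    zeta = np.zeros(n_features)
    for n in range(t.node_count):
        left, right = t.children_left[n], t.children_right[n]
        if left == -1:                      # leaf node: no split, no contribution
            continue
        # sample-weighted average impurity of the two children
        m_ch = (t.n_node_samples[left] * t.impurity[left]
                + t.n_node_samples[right] * t.impurity[right]) / t.n_node_samples[n]
        zeta[t.feature[n]] += t.n_node_samples[n] * (t.impurity[n] - m_ch)
    return zeta

# Eq. 16: average over trees, then normalize the importances to sum to unity
imps = np.mean([tree_importances(est, X.shape[1]) for est in rf.estimators_], axis=0)
imps /= imps.sum()
print(np.round(imps, 3))
```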

3.5. Optimization

The construction of a Random Forest is governed by a number of free parameters called hyperparameters. The hyperparameters of the Random Forest implementation used in this work are n_estimators, the number of decision trees in the forest; criterion, the function that measures the quality of a proposed split at a given tree node; max_features, the number of features to randomly select when looking for the best split at a given tree node; max_depth, the maximum depth of a tree; and min_samples_split, the minimum


Table 3

Grid search results for autoScan hyperparameters.

Hyperparameter      Values
n_estimators        10, 50, 100*, 300
criterion           gini, entropy*
max_features        5, 6*
min_samples_split   2, 3*, 4, 10, 20, 50
max_depth           Unlimited*, 100, 30, 15, 5

Note.—A 3-fold cross-validated search over the grid of Random Forest hyperparameters tabulated above was performed to characterize the performance of the machine classifier. The hyperparameters of the best-performing classifier are marked with an asterisk.

number of samples required to split an internal node.

We performed a 3-fold cross-validated (see §4.2) grid search over the space of Random Forest hyperparameters described in Table 3. A total of 1,884 trainings were performed. The best classifier had 100 trees, used the Shannon entropy objective function, chose 6 features for each split, required at least 3 samples to split a node, and had unlimited depth, and it was incorporated into the code. Recursive feature elimination (Brink et al. 2013) was explored to improve the performance of the classifier, but we found that it provided no statistically significant performance improvement.
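A sketch of how such a cross-validated grid search could be expressed with scikit-learn's GridSearchCV; the synthetic data and the scoring choice are placeholders, and the parameter grid simply mirrors Table 3 rather than reproducing the survey's exact setup.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)

param_grid = {
    "n_estimators": [10, 50, 100, 300],
    "criterion": ["gini", "entropy"],
    "max_features": [5, 6],
    "min_samples_split": [2, 3, 4, 10, 20, 50],
    "max_depth": [None, 100, 30, 15, 5],     # None = unlimited depth
}

# 3-fold cross-validated search over the Table 3 grid
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=3, scoring="roc_auc", n_jobs=-1)
search.fit(X, y)
print(search.best_params_)
```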

4. Performance

In this section, we describe the performance of autoScan on a realistic classification task and the effect of the code on the DES-SN transient candidate scanning load. Performance statistics for the classification task were measured using production Y1 data, whereas candidate-level effects were measured using a complete reprocessing of Y1 data using an updated difference imaging pipeline. The reprocessed detection pool differed significantly from its production counterpart, providing an out-of-sample data set for benchmarking the effects of

the code on the scanning load.4

4.1. Performance Metrics

The performance of a classifier on an n-class task is completely summarized by the corresponding n × n confusion matrix E, also known as a contingency table or error matrix. The matrix element E_ij represents the number of instances from the task's validation set with ground truth class label j that were predicted to be members of class i. A schematic 2 × 2 confusion matrix for the autoScan classification task is shown in Figure 6.

From the confusion matrix, several classifier performance metrics can be computed. Two that frequently appear in the literature are the False Positive Rate (FPR) and the Missed Detection Rate (MDR; also known as the False Negative Rate or False Omission Rate). Using the notation from Figure 6, the FPR is defined by:

FPR = F_p / (F_p + T_n),    (17)

4 Although the re-processing of data through the difference imaging pipeline from the raw images is not useful for getting spectra of live transients, it is quite useful for acquiring host-galaxy targets for previously missed transients and is therefore performed regularly as pipeline improvements are made.


and the missed detection rate by

MDR = F_n / (T_p + F_n).    (18)

For autoScan, the FPR represents the fraction of artifacts in the validation set that are predicted to be legitimate detections of astrophysical variability. The MDR represents the fraction of non-artifacts in the task's validation set that are predicted to be artifacts. Another useful metric is the efficiency or True Positive Rate (TPR),

ε = T_p / (T_p + F_n),    (19)

which represents the fraction of non-artifacts in the sample that are classified correctly. For the remainder of this study, we often refer to the candidate-level efficiency measured on fake SNe Ia, ε_F (see §4.4).
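For concreteness, a small Python sketch computing Equations 17–19 from a 2 × 2 confusion matrix laid out as in Figure 6; the counts are made up for illustration.

```python
import numpy as np

# rows = predicted class, columns = true class, following Figure 6:
# [[Tp, Fp],
#  [Fn, Tn]] with "non-artifact" as the positive class
conf = np.array([[9200,  250],
                 [ 380, 9800]])
tp, fp = conf[0]
fn, tn = conf[1]

fpr = fp / (fp + tn)        # Eq. 17: artifacts passed as real
mdr = fn / (tp + fn)        # Eq. 18: real detections rejected
eff = tp / (tp + fn)        # Eq. 19: efficiency (TPR) = 1 - MDR
print(f"FPR={fpr:.3f}  MDR={mdr:.3f}  efficiency={eff:.3f}")
```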

Finally, the receiver operating characteristic (ROC) is a graphical tool for visualizing the performance of a classifier. It displays FPR as a function of MDR, both of which are parametric functions of τ, the autoScan score that one chooses to delineate the boundary between "non-artifacts" and "artifacts." One can use the ROC to determine the location at which the trade-off between the FPR and MDR is optimal for the survey at hand, a function of both the scanning load and the potential bias introduced by the classifier, then solve for the corresponding τ. By benchmarking the performance of the classifier using the ROC, one can paint a complete picture of its performance that can also serve as a statistical guarantee on performance in production, assuming a validation set and a production data set that are identically distributed in feature space, and that detections are scanned individually in production (see §4.4).
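An MDR-versus-FPR curve of this kind can be traced out by sweeping τ over held-out scores and labels, as in the sketch below; the score and label arrays are synthetic stand-ins, and the loss budget is an arbitrary example rather than the DES-SN criterion.

```python
import numpy as np

rng = np.random.default_rng(1)
# synthetic validation set: real detections (label 1) tend to score higher
labels = rng.integers(0, 2, size=20000)
scores = np.clip(rng.normal(0.7 * labels + 0.2, 0.2), 0, 1)

taus = np.linspace(0, 1, 101)
fpr = [(scores[labels == 0] >= t).mean() for t in taus]   # artifacts kept
mdr = [(scores[labels == 1] < t).mean() for t in taus]    # real detections lost

# pick the largest tau whose missed detection rate stays below some budget
budget = 0.05
ok = [t for t, m in zip(taus, mdr) if m <= budget]
print("tau =", max(ok) if ok else None)
```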

4.2. Classification Task

We used stratified 5-fold cross-validation to test the performance of autoScan. Cross validation is a technique for assessing how the results of a statistical analysis will generalize to an independent data set. In a k-fold cross-validated analysis, a data set is partitioned into k disjoint subsets. k iterations of training and testing are performed. During the ith iteration, subset i is held out as

a "validation" set of labeled data instances that are not included in the training sample, and the union of the remaining k − 1 subsets is passed to the classifier as a training set. The classifier is trained and its predictive performance on the validation set is recorded. In standard k-fold cross-validation, the partitioning of the original data set into disjoint subsets is done by drawing samples at random without replacement from the original data set. But in a stratified analysis, the drawing is performed subject to the constraint that the distribution of classes in each subset be the same as the distribution of classes in the original data set. Cross-validation is useful because it enables one to characterize how a classifier's performance varies with respect to changes in the composition of training and testing data sets, helping quantify and control "generalization error."
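A minimal sketch of stratified 5-fold cross-validation with scikit-learn; the data are synthetic and the AUC metric is a placeholder rather than the exact benchmark used for autoScan.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import roc_auc_score

X, y = make_classification(n_samples=10000, n_features=10, random_state=0)

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_idx, val_idx) in enumerate(skf.split(X, y)):
    # class proportions in each fold match those of the full data set
    rf = RandomForestClassifier(n_estimators=100, random_state=0)
    rf.fit(X[train_idx], y[train_idx])
    scores = rf.predict_proba(X[val_idx])[:, 1]
    print(f"fold {fold}: AUC = {roc_auc_score(y[val_idx], scores):.3f}")
```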

4.3. Results

Figure 7 shows the ROCs that resulted from each round of cross-validation. We report that autoScan achieved an average detection-level MDR of 4.0 ± 0.1 percent at a fixed FPR of 2.5 percent with τ = 0.5, which was ultimately adopted in the survey; see §4.4. We found that autoScan scores were correlated with detection signal-to-noise ratio (S/N). Figure 8 displays the fake efficiency and false positive rate of autoScan using all out-of-sample detections of fake SNe from each round of cross-validation. At S/N ≲ 10, the out-of-sample fake efficiency is markedly lower than it is at higher S/N. The efficiency asymptotically approaches unity for S/N ≳ 100. The effect becomes more pronounced when the class discrimination boundary is raised. This occurs because legitimate detections of astrophysical variability at low S/N are similar to artifacts. The false positive rate remains relatively constant in the S/N ≲ 10 regime, where the vast majority of artifacts reside.

4.4. Effect of autoScan on Transient Candidate Scanning Load

As discussed in §2, DES-SN performs target selection and scanning using aggregates of spatially coincident detections from multiple nights and filters ("candidates"). After the implementation of autoScan, the NUMEPOCHS requirement described in Table 1 was revised to require that a candidate be detected on at least two distinct nights having


Fig. 8.— Object-level fake efficiency and false positive rate as a function of S/N, at several autoScan score cuts. The S/N is computed by dividing the flux from a PSF-model fit to a 35 × 35 pixel cutout around the object in the difference image by the uncertainty from the fit. The artifact rejection efficiency and missed detection rate are 1 minus the false positive rate and fake efficiency, respectively. The fake efficiency of autoScan degrades at low S/N, whereas the false positive rate is relatively constant in the S/N regime not dominated by small number statistics. τ = 0.5 (bold) was adopted in DES-SN.


Fig. 6.— Schematic confusion matrix for the autoScan classification task. Each matrix element E_ij represents the number of instances from the task's validation set with ground truth class label j that were predicted to be members of class i.

Fig. 7.— 5-fold cross-validated receiver operating characteristics of the best-performing classifier from §3.5. Six visually indistinguishable curves are plotted: one translucent curve for each round of cross-validation, and one opaque curve representing the mean. Points on the mean ROC corresponding to different class discrimination boundaries τ are labeled. τ = 0.5 was adopted in DES-SN.


at least one detection with an ML score greater than τ to become eligible for visual scanning. In this section we describe the effect of this revision on the scanning load for an entire observing season using a full reprocessing of the Y1 data.
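A sketch of the revised candidate-level trigger, as we read it (and consistent with the Figure 9 caption): at least two distinct nights, each with at least one detection scoring above τ. The detection record format below is invented for illustration.

```python
from collections import namedtuple

Detection = namedtuple("Detection", ["night", "ml_score"])

def eligible_for_scanning(detections, tau=0.5, min_nights=2):
    """Candidate passes if at least min_nights distinct nights each have
    at least one detection with an ML score above tau."""
    passing_nights = {d.night for d in detections if d.ml_score > tau}
    return len(passing_nights) >= min_nights

candidate = [Detection(20130905, 0.62), Detection(20130910, 0.81),
             Detection(20130912, 0.12)]
print(eligible_for_scanning(candidate))   # True: two nights pass the tau = 0.5 cut
```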

We sought to minimize the size of our transient candidate scanning load with no more than a 1 percent loss in ε_F. By performing a grid search on τ, we found that we were able to reduce the number of candidates during the first observing season of DES-SN by a factor of 13.4, while maintaining ε_F > 99.0 percent, by adopting τ = 0.5. After implementing autoScan using this τ, we measured the quantity 〈N_A/N_NA〉, the average ratio of artifact objects to non-artifact detections that a human scanner encountered during a scanning session, using random samples of 3,000 objects drawn from the pool of objects passing the modified and unmodified cuts in Table 1. We found that the ratio decreased by a factor of roughly 40 after the production implementation of autoScan. Table 4 summarizes these results.
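The τ grid search described above can be thought of as a simple constrained search: choose the τ that minimizes the candidate count while keeping the fake-SN efficiency above the target. The sketch below expresses that idea; it reuses the hypothetical eligible_for_scanning helper from the previous sketch and is not the DES-SN implementation.

```python
import numpy as np

def choose_tau(candidates, fakes, taus=np.linspace(0.1, 0.9, 81), min_eff=0.99):
    """candidates, fakes: lists of per-candidate detection lists, as consumed by
    eligible_for_scanning() above. Returns (tau, n_candidates, efficiency) for the
    smallest scanning load that satisfies the fake-efficiency constraint."""
    best = None
    for tau in taus:
        n_cand = sum(eligible_for_scanning(c, tau) for c in candidates)
        eff = np.mean([eligible_for_scanning(f, tau) for f in fakes])
        if eff >= min_eff and (best is None or n_cand < best[1]):
            best = (tau, n_cand, eff)
    return best
```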

5. Discussion

With the development of autoScan and the use of fake overlays to robustly measure efficiencies, the goal of automating artifact rejection on difference images using supervised ML classification has reached a certain level of maturity. With several historical and ongoing time-domain surveys using ML techniques for candidate selection, it is clear that the approach has been successful in improving astrophysical source selection efficiency on images. However, there are still several ways the process could be improved for large-scale transient searches of the future, especially for ZTF and LSST, whose demands for reliability, consistency, and transparency will eclipse those of contemporary surveys.

5.1. Automating Artifact Rejection in Future Surveys

For surveys like LSST and ZTF, small decreases in MDR are equivalent to the recovery of vast numbers of new and interesting transients. Decreasing the size of the feature set and increasing the importance of each feature is one of the most direct routes to decreasing MDR. However, designing and engineering effective classification

features is among the most time-consuming and least intuitive aspects of framework design. Improving MDR by revising feature sets is a matter of trial and error—occasionally, performance improvements can result, but sometimes adding features can degrade the performance of a classifier. Ideally, surveys that will retrain their classifiers periodically will have a rigorous, deterministic procedure to extract the optimal feature set from a given training data set. This is possible with the use of convolutional neural networks (CNNs), a subclass of Artificial Neural Networks, that can take images as input and infer an optimal set of features for a given set of training data. The downside to CNNs is that the resulting features are significantly more abstract than astrophysically motivated features and consequently can be more difficult to interpret, especially in comparison with Random Forests, which assign each feature a relative importance. However, CNNs have achieved high levels of performance for a diverse array of problems. They remain relatively unexplored in the context of astrophysical data processing, and bear examination for use in future surveys.
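To make the CNN idea concrete, here is a purely illustrative PyTorch sketch of a network that maps pixel cutouts directly to an artifact score, learning its own features from the images; the channel count (e.g., search/template/difference stamps), the 51 × 51 cutout size, and the architecture are all assumptions, not anything used in autoScan or DES-SN.

```python
import torch
import torch.nn as nn

class CutoutCNN(nn.Module):
    """Toy CNN mapping a 3-channel 51x51 cutout stack to a score in [0, 1]."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 51 -> 25
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                      # 25 -> 12
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 12 * 12, 64), nn.ReLU(),
            nn.Linear(64, 1),
        )

    def forward(self, x):
        return torch.sigmoid(self.classifier(self.features(x)))

# one forward pass on a random batch of 8 cutout stacks
model = CutoutCNN()
scores = model(torch.randn(8, 3, 51, 51))
print(scores.shape)   # torch.Size([8, 1])
```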

Next, unless great care is taken to produce a training data set that is drawn from the same multidimensional feature distribution as the testing data, dense regions of testing space might be completely devoid of training data, leading to an unacceptable degradation of classification accuracy in production. Developing a rigorous method for avoiding such sample selection bias is crucial for future surveys, for which small biases in the training set can result in meaningful losses in efficiency. The idea of incorporating active learning techniques into astronomical ML classification frameworks has been advanced as a technique for reducing sample selection bias (Richards et al. 2012).

Given a testing set and a training set which are free to be drawn from different distributions in feature space, in the pool-based active learning for classification framework, an algorithm iteratively selects, out of the entire set of unlabeled data, the object (or set of objects) that would give the maximum performance gains for the classification model, if its true label were known. The algorithm then solicits a user to manually input the class of the object under consideration, and then the object is automatically incorporated into


Table 4

Effect of autoScan on Reprocessed DES Y1 Transient Candidate Scanning Load.

                  No ML     ML (τ = 0.5)   ML / No ML
N_c^a             100,450   7,489          0.075
〈N_A/N_NA〉^b       13        0.34           0.027
ε_F^c             1.0       0.990          0.990

a Total number of science candidates discovered.

b Average ratio of artifact to non-artifact detections in the human scanning pool, determined from scanning 3,000 randomly selected detections from all science candidate detections.

c autoScan candidate-level efficiency for fake SNe Ia.

Fig. 9.— 24 consecutively observed difference image cutouts of a poorly subtracted galaxy that was wrongly identified as a transient. The autoScan score of each detection appears at the bottom of each cutout. The mis-identification occurred because on two nights the candidate had a detection that received a score above an autoScan class discrimination boundary τ = 0.4 used during early code tests (green boxes). Night-to-night variations in observing conditions, data reduction, and image subtraction can cause detections of artifacts to appear real. If a two-night trigger is used, spurious "transients" like this one can easily accumulate as a season goes on. Consequently, care must be taken when using an artifact rejection framework that scores individual detections to make statements about aggregates of detections. Each image is labeled with the observation date and filter for the image, in the format YYYYMMDD-filter.


future training sets to improve upon the original classifier. Under this paradigm, human scanners would play the valuable role of helping the classifier learn from its mistakes, and each human hour spent vetting data would immediately carry scientific return. Active learning could produce extremely powerful classifiers over short timescales when used in concert with generative models for training data. Instead of relying on historical data to train artifact rejection algorithms during commissioning phases, experiments like LSST could use generative models for survey observations to simulate new data sets. After training a classifier using simulated data, in production active learning could be used to automatically fill in gaps in classifier knowledge and augment predictive accuracy.
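One simple flavor of the pool-based strategy described above is uncertainty sampling, sketched below with scikit-learn; the query criterion, batch size, and data are placeholders, and revealing the true labels stands in for the human scanner in the loop. This is not how the DES-SN training set was built.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=5000, n_features=10, random_state=0)
labeled = np.zeros(len(y), dtype=bool)
labeled[:200] = True                       # small initial labeled set

rf = RandomForestClassifier(n_estimators=100, random_state=0)
for _ in range(5):                         # five rounds of querying
    rf.fit(X[labeled], y[labeled])
    pool = np.flatnonzero(~labeled)
    proba = rf.predict_proba(X[pool])[:, 1]
    # query the 20 pool objects the classifier is least certain about,
    # then "ask a scanner" for their labels (here we simply reveal y)
    query = pool[np.argsort(np.abs(proba - 0.5))[:20]]
    labeled[query] = True
print("labeled objects:", labeled.sum())
```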

In this work, we used a generative model of SN Ia observations—overlaying fake SNe Ia onto real host galaxies—to produce the "Non-Artifact" component of our training data set. However, the nearly 500,000 artifacts in our training set were human-scanned, implying that future surveys will still need to do a great deal of scanning before being able to get an ML classifier off the ground. A new survey should not intentionally alter the pipeline to produce artifacts during commissioning, as it is crucial that the unseen data be drawn from the same feature distributions as the training data. For surveys with 〈N_A/N_NA〉 ≳ 100, Brink et al. (2013) showed that a robust artifact library can be prepared by randomly sampling from all detections of variability produced by the difference imaging pipeline. For surveys or pipelines that do not produce as many artifacts, some initial scanning to produce a library of a few 10^4 artifacts from commissioning data should be sufficient to produce an initial training set (Brink et al. 2013; du Buisson et al. 2014).

5.2. Eliminating Spurious Candidates

Using a two-night trigger, some spurious science candidates can be created due to nightly variations in astrometry, observing conditions, and repeatedly imaged source brightnesses that cause night-to-night fluctuations in the appearance of candidates on difference images. These variations lead to a spread of ML scores for a given candidate. As an observing season progresses, artifacts can accumulate large numbers of detections via repeated

visits. Although for a typical artifact the vast majority of detections fail the ML requirement, the fluctuations in ML scores can cause a small fraction of the detections to satisfy the autoScan requirement. Figure 9 shows an example of this effect.

Mitigating the buildup of spurious multi-night candidates could be achieved by implementing a second ML classification framework that takes as input multi-night information, including the detection-level output of autoScan, to predict whether a given science candidate represents a bona-fide astrophysical source. Training data compilation could be performed by randomly selecting time-contiguous strings of detections from known candidates. The lengths of the strings could be drawn from a distribution specified during framework development. Candidate-level features could characterize the temporal variation of detection-level features, such as the highest and lowest night-to-night shifts in autoScan score, magnitude, and astrometric uncertainty.
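One way candidate-level features of this kind could be computed from a time-ordered series of detection records is sketched below; the record fields and feature names are hypothetical, intended only to illustrate the idea of summarizing night-to-night variation.

```python
import numpy as np

def candidate_features(nights, ml_scores, mags):
    """Summarize night-to-night variation for one candidate.
    nights, ml_scores, mags are equal-length sequences, one entry per detection."""
    order = np.argsort(nights)
    d_score = np.diff(np.asarray(ml_scores)[order])
    d_mag = np.diff(np.asarray(mags)[order])
    return {
        "max_score_shift": d_score.max() if d_score.size else 0.0,
        "min_score_shift": d_score.min() if d_score.size else 0.0,
        "max_mag_shift": d_mag.max() if d_mag.size else 0.0,
        "n_nights": len(set(nights)),
    }

print(candidate_features([20130905, 20130910, 20130912],
                         [0.12, 0.81, 0.30],
                         [22.4, 21.9, 22.6]))
```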

DAG thanks an anonymous referee for comments that improved the paper. We are grateful for the extraordinary contributions of our CTIO colleagues and the DES Camera, Commissioning and Science Verification teams for achieving excellent instrument and telescope conditions that have made this work possible. The success of this project also relies critically on the expertise and dedication of the DES Data Management organization. Funding for DES projects has been provided by the U.S. Department of Energy, the U.S. National Science Foundation, the Ministry of Science and Education of Spain, the Science and Technology Facilities Council of the United Kingdom, the Higher Education Funding Council for England, the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign, the Kavli Institute of Cosmological Physics at the University of Chicago, Financiadora de Estudos e Projetos, Fundacao Carlos Chagas Filho de Amparo a Pesquisa do Estado do Rio de Janeiro, Conselho Nacional de Desenvolvimento Cientifico e Tecnologico and the Ministerio da Ciencia e Tecnologia, the Deutsche Forschungsgemeinschaft and the collaborating institutions in the Dark Energy Survey.

The collaborating institutions are Argonne National Laboratory, the University of California, Santa Cruz, the University of Cambridge, Centro de Investigaciones Energeticas, Medioambientales y Tecnologicas-Madrid, the University of Chicago, University College London, the DES-Brazil Consortium, the Eidgenossische Technische Hochschule (ETH) Zurich, Fermi National Accelerator Laboratory, the University of Edinburgh, the University of Illinois at Urbana-Champaign, the Institut de Ciencies de l'Espai (IEEC/CSIC), the Institut de Fisica d'Altes Energies, Lawrence Berkeley National Laboratory, the Ludwig-Maximilians Universitat and the associated Excellence Cluster Universe, the University of Michigan, the National Optical Astronomy Observatory, the University of Nottingham, the Ohio State University, the University of Pennsylvania, the University of Portsmouth, SLAC National Accelerator Laboratory, Stanford University, the University of Sussex, and Texas A&M University.

This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Figure 4 was generated with a modified version of triangle.py (Foreman-Mackey et al. 2014). ACR acknowledges financial support provided by the PAPDRJ CAPES/FAPERJ Fellowship. FS acknowledges financial support provided by CAPES under contract No. 3171-13-2. The DES participants from Spanish institutions are partially supported by MINECO under grants AYA2012-39559, ESP2013-48274, FPA2013-47986, and Centro de Excelencia Severo Ochoa SEV-2012-0234, some of which include ERDF funds from the European Union.

REFERENCES

Alard, C., & Lupton, R. H. 1998, ApJ, 503, 325

Albrecht, A., et al. 2006, arXiv:astro-ph/0609591

Aldering, G., et al. 2002, SPIE, 4836, 61

Aragon, C. R., et al. 2008, JPhCS, 125, 012091

Astier, P., et al. 2006, A&A, 447, 31

Bailey, S., et al. 2007, ApJ, 665, 1246

Baltay, C., et al. 2007, PASP, 119, 1278

Banerji, M., et al. 2010, MNRAS, 406, 342

Bernstein, J. P., et al. 2012, ApJ, 753, 152

Bertin, E. 2006, ASPC, 351, 112

Bertin, E., & Arnouts, S. 1996, A&AS, 117, 393

Blanc, G., et al. 2004, A&A, 423, 881

Bloom, J. S., et al. 2012, PASP, 124, 1175

Bloom, J. S., & Richards, J. W. 2012, in Advances in Machine Learning and Data Mining for Astronomy, eds. M. J. Way et al. (London: CRC Press), 89

Bouy, H., et al. 2013, A&A, 554, AA101

Breiman, L. 2001, Machine Learning, 45, 5

Breiman, L., et al. 1984, Classification and Regression Trees (London: CRC Press)

Brink, H., et al. 2013, MNRAS, 435, 1047

du Buisson, L., Sivanandam, N., Bassett, B. A., & Smith, M. 2014, arXiv:1407.4118

Desai, S., et al. 2012, ApJ, 757, 83

Diehl, H. T. (for the Dark Energy Survey Collaboration) 2012, PhPro, 37, 1332

Diehl, H. T., et al. 2014, SPIE, 9149, 91490V

Filippenko, A. V., et al. 2001, ASPC, 246, 121

Flaugher, B. 2005, IJMPA, 20, 3121

Flaugher, B., et al. 2012, SPIE, 8446, 844611

Flaugher, B., et al. 2015, preprint FERMILAB-PUB-15-121-PPD.

Foreman-Mackey, D., Price-Whelan, A., Ryan, G., et al. 2014, http://dx.doi.org/10.5281/zenodo.11020

Frieman, J. A., et al. 2008, AJ, 135, 338

Gini, C. 1921, Econ. J., 31, 124

Hamuy, M., et al. 1993, AJ, 106, 2392

Holtzman, J. A., et al. 2008, AJ, 136, 2306

Ivezic, Z., et al. 2013, Statistics, Data Mining, and Machine Learning in Astronomy (Princeton, NJ: Princeton UP)


Karpenka, N. V., Feroz, F., & Hobson, M. P. 2013, MNRAS, 429, 1278

Kessler, R., et al. 2009, PASP, 121, 1028

LSST Science Collaboration 2009, arXiv:0912.0201

Mainzer, A., et al. 2011, ApJ, 731, 53

Pedregosa, F., et al. 2012, arXiv:1201.0490

Perlmutter, S., et al. 1997, AAS, 29, 1351

Rest, A., et al. 2014, ApJ, 795, 44

Richards, J. W., et al. 2011, ApJ, 733, 10

Richards, J. W., et al. 2012, ApJ, 744, 192

Rousseeuw, P. J., & Croux, C. 1993, Journal of the American Statistical Association, 88, 1273

Sako, M., et al. 2008, AJ, 135, 348

Sako, M., et al. 2011, ApJ, 738, 162

Sako, M., et al. 2014, arXiv:1401.3317

Schmidt, B. P., et al. 1998, ApJ, 507, 46

Shannon, C. E. 1948, BSTJ, 27, 379

Smith, A. M., et al. 2011, MNRAS, 412, 1309

Smith, R. M., et al. 2014, SPIE, 9147

Strolger, L.-G., et al. 2004, ApJ, 613, 200

Tyson, J. A. 2002, SPIE, 4836, 10

Vapnik, V. 1995, The Nature of Statistical Learning Theory (New York: Springer)

van Dokkum, P. G. 2001, PASP, 113, 1420

Waszczak, A., et al. 2013, MNRAS, 433, 3115

Willsky, A. S., Wornell, G. W., & Shapiro, J. H. 2003, Stochastic Processes, Detection and Estimation (Course notes for MIT 6.432)

Wright, D. E., et al. 2015, MNRAS, 449, 451

Zhu, J., et al. 2009, SII, 2.3, 349

Zwicky, F. 1964, AnAp, 27, 300

This 2-column preprint was prepared with the AAS LATEX macros v5.2.
