
Emille E. O. Ishida

CNRS/Laboratoire de Physique de Clermont - Université Clermont-Auvergne, Clermont-Ferrand, France

Optimizing machine learning for astronomy: the case for SN classification

Statistical Challenges in 21st Century Cosmology

22-25 March 2018 – Valencia, Spain

1. detection

2. photometry

3. spectroscopy

4. standardization + cosmological fit

Figure: brightness versus redshift

Supernova Cosmology

Distance(redshift) + classification

Figure: light curve (brightness versus days) and spectrum (flux versus wavelength)

Year    Number of supernovae

1998    42

2014    740

2025    > 10 000

Big Data (in astronomy) → Large Scale Sky Surveys

https://www.lsst.org/


2 million alerts/day

15 TB/day

40 nights of LSST

entire Google database

PS: Machine Learning methods do not extrapolate!

Spectroscopy x Photometry

Photometry only

Photometry + Spectra

From COIN Residence Program #4, Ishida et al., 2018 – arXiv:astro-ph/1804.03765

The Data: post-SNPCC simulations – Kessler et al., 2010

Representativeness

How to construct training samples which optimize photometric classification results?

Given known observational constraints...

Points to take into account:

A spectroscopic sample was not constructed to be a training sample

Training data dictates the limits of photometric classification

Spectroscopic follow-up should be used with parsimony

Active Learning or Optimal Experimental Design

“Can machines learn with fewer labeled training instances if they are allowed to ask questions?”

Step 1: Standard Classification

Step 2: Identify problematic as well as representative elements

Step 3: Make a query

Step 4: Classify

Figure (each step): objects of Class 1 and Class 2 plotted in the Feature 1 versus Feature 2 plane
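The four steps can be sketched as a pool-based loop. This is only an illustration under stated assumptions: a k-nearest-neighbour vote stands in for the classifier to keep the sketch dependency-free, "problematic" is read as "class probability closest to 0.5", and the names `knn_prob`, `most_uncertain`, `active_learning` and `true_class` are hypothetical, not taken from the talk.

```python
import math
import random


def knn_prob(x, labeled, k=5):
    """Estimate P(class 1 | x) as the class-1 fraction among the k nearest labeled points."""
    nearest = sorted(labeled, key=lambda item: math.dist(x, item[0]))[:k]
    return sum(label for _, label in nearest) / len(nearest)


def most_uncertain(pool, labeled, k=5):
    """Steps 1-2: classify the pool and return the index of the least certain object."""
    return min(range(len(pool)),
               key=lambda i: abs(knn_prob(pool[i], labeled, k) - 0.5))


def active_learning(labeled, pool, oracle, n_queries, k=5):
    """Run n_queries rounds of query-and-retrain, then classify what is left."""
    labeled, pool = list(labeled), list(pool)
    for _ in range(min(n_queries, len(pool))):
        i = most_uncertain(pool, labeled, k)
        x = pool.pop(i)                    # step 3: make a query
        labeled.append((x, oracle(x)))     # the oracle supplies the true label
    # Step 4: classify the remaining pool with the enlarged training set.
    predictions = [int(knn_prob(x, labeled, k) >= 0.5) for x in pool]
    return labeled, pool, predictions


def true_class(x):
    """Toy oracle (assumption): the class is set by a straight line in feature space."""
    return int(x[0] + x[1] > 0.0)


rng = random.Random(0)
points = [(rng.uniform(-1, 1), rng.uniform(-1, 1)) for _ in range(60)]
seed = [(p, true_class(p)) for p in points[:6]]
labeled, remaining, predictions = active_learning(
    seed, points[6:], true_class, n_queries=10)
```

In a survey setting the oracle would be a spectroscopic follow-up, which is why each query is treated as expensive.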

AL for Supernova classification

Our strategy

“Raw” data

Parametric Fit (5 parameters)

Random Forest

Photo-class uncertainty

Query the SN with highest uncertainty
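The slide names a five-parameter fit but not its functional form. As a rough sketch, the block below assumes a Bazin-style light-curve function (an assumption, not something stated on the slide) and takes the photo-class probability to be the fraction of forest trees voting for one class. The names `light_curve_model`, `vote_fraction` and `uncertainty` are hypothetical.

```python
import math


def light_curve_model(t, amplitude, baseline, t0, t_fall, t_rise):
    """Five-parameter flux model; the Bazin-style form is assumed for illustration."""
    decline = math.exp(-(t - t0) / t_fall)
    rise = 1.0 + math.exp(-(t - t0) / t_rise)
    return amplitude * decline / rise + baseline


def vote_fraction(tree_votes):
    """Class probability taken as the fraction of trees voting 1 (assumption)."""
    return sum(tree_votes) / len(tree_votes)


def uncertainty(probability):
    """1.0 when the forest is evenly split, 0.0 when it is unanimous."""
    return 1.0 - 2.0 * abs(probability - 0.5)


# Hypothetical votes from a 10-tree forest for three objects.
votes = {
    "sn_a": [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
    "sn_b": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0],
    "sn_c": [0, 0, 0, 0, 0, 0, 0, 0, 1, 1],
}
scores = {name: uncertainty(vote_fraction(v)) for name, v in votes.items()}
query = max(scores, key=scores.get)  # the object sent for spectroscopic follow-up
```

In this reading, the five best-fit parameters would serve as the features passed to the Random Forest, and the object with the most evenly split vote becomes the query.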

AL for SN classification

Static results

Ishida et al., 2018 - arXiv:astro-ph/1804.03765 - from CRP #4

Active Learning

Passive Learning

Canonical strategy

Points to take into account:

A spectroscopic sample was not constructed to be a training sample

Training data dictates the limits of photometric classification

Spectroscopic follow-up should be used with parsimony

A supernova is a transient

Time Domain

Survey evolution

1. Feature extraction is done daily, using only the epochs observed up to that day.

2. Query sample is also re-defined daily: objects with r-mag < 24

3. No need for an initial training sample
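The daily bookkeeping in the list above might look like the following. This is a minimal sketch under assumptions: each light curve is a list of `(day, band, magnitude)` tuples, and "r-mag" is read as the most recent r-band magnitude available on that day. The helper names are hypothetical.

```python
def epochs_up_to(light_curve, day):
    """Keep only the (day, band, mag) points observed on or before `day`."""
    return [point for point in light_curve if point[0] <= day]


def latest_r_mag(light_curve, day):
    """Most recent r-band magnitude available on `day`, or None if there is none."""
    r_points = [(obs_day, mag)
                for obs_day, band, mag in epochs_up_to(light_curve, day)
                if band == "r"]
    return max(r_points)[1] if r_points else None


def query_sample(survey, day, mag_limit=24.0):
    """Objects bright enough for follow-up on `day` (smaller magnitude is brighter)."""
    eligible = []
    for name, light_curve in survey.items():
        r_mag = latest_r_mag(light_curve, day)
        if r_mag is not None and r_mag < mag_limit:
            eligible.append(name)
    return eligible


# Hypothetical three-object survey.
survey = {
    "sn1": [(1, "r", 24.5), (3, "r", 23.1)],
    "sn2": [(2, "g", 22.0)],
    "sn3": [(1, "r", 23.5), (4, "r", 24.8)],
}
day_1 = query_sample(survey, day=1)
day_3 = query_sample(survey, day=3)
day_4 = query_sample(survey, day=4)
```

Because both the features and the query sample are recomputed every day, an object can enter the query sample as it brightens and leave it again as it fades.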


Time domain

Partial LC, no training

The arrow shows traditional full light-curve results with full SNPCC spec


Batch Mode

Partial LC, no initial training, time domain

The arrow shows traditional full light-curve results with full SNPCC spec
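The slide does not say how a batch is assembled. One simple reading, sketched here as an assumption, is to send the `batch_size` least certain objects for follow-up each night instead of a single one. The function name `batch_query` and the probabilities are hypothetical.

```python
def batch_query(class_prob, batch_size):
    """Return the names of the batch_size objects whose probability is closest to 0.5."""
    ranked = sorted(class_prob, key=lambda name: abs(class_prob[name] - 0.5))
    return ranked[:batch_size]


# Hypothetical class probabilities for one night.
tonight = {"a": 0.9, "b": 0.6, "c": 0.2, "d": 0.45, "e": 0.5}
batch = batch_query(tonight, batch_size=2)
```

A known weakness of this choice is that the selected objects may be very similar to each other, so a batch of N queries can carry less information than N sequential queries.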

The queried sample

Partial LC, no training, time domain, batch

SNPCC spec: 1103 objects

Queried sample: 800 objects

Telescope time: Queried/spec = 0.999


Summary

“How do we optimize machine learning results with a minimum number of labeled training instances?”

Adaptive Learning designed for astronomical data

This is a group effort!

The Cosmostatistics Initiative (COIN) was born in Cosmo21 - Lisbon, 2014!

Build an interdisciplinary, people-centric community!

https://www.cosmostatistics-initiative.org


Thank you!

http://cointoolbox.github.io/


Active Learning or Optimum Experimental Design

AL algorithm


, deployment: Emille Ishida
