for astronomy: the case for sn...
TRANSCRIPT
Emille E. O. Ishida
CNRS/Laboratoire de Physique de Clermont - Université Clermont-AuvergneClermont Ferrand, France
Optimizing machine learning for astronomy:
the case for SN classificationStatistical Challenges in 21st Century Cosmology
22-25 March 2018 – Valencia, Spain
1. detection
3. spectroscopy
2. photometry
4. standardization + cosmological fit
Brig
htne
ss
redshift
Supernova Cosmology
Distance(redshift) + classification
b
right
ness
days
Flu
x
wavelength
year Number of supernova
1998 42
2014 740
2025 > 10 000
1. detection
3. spectroscopy
2. photometry
4. standardization + cosmological fit
Brig
htne
ss
redshift
Supernova Cosmology
Distance(redshift) + classification
b
right
ness
days
Flu
x
wavelength
year Number of supernova
1998 42
2014 740
2025 > 10 000
Big Data (in astronomy) → Large Scale Sky Surveys
https://www.lsst.org/
year Number of supernova
1998 42
2014 740
2025 > 10 000
2 million alerts/day15 TB/day
40 nights of LSST
entire Google database
PS: Machine Learningmethods do not
extrapolate!
Spectroscopy x Photometry
Photometryonly
Photometry+
Spectra
From COIN Residence Program #4, Ishida et al., 2018 – arXiv:astro-ph/1804.03765
The Data: post-SNPCC simulations – Kessler et al., 2010
Representativeness
How to construct
training samples which optimize
photometric classification
results?Given known observational constraints...
Points to take into account:
A spectroscopic sample was not constructed to
be a training sample
Training data dictates the limits of photometric
classification
Spectroscopic follow-up should be used with
parsimony
or Optimal Experimental DesignActive Learning“Can machines learn with fewer labeled training instances
if they are allowed to ask questions?”
Feature 1
Feat
ure
2
Step 1
Standard Classification
Feature 1
Feat
ure
2
Step 2
Identify problematic as well as
representative elements
Step 3
Make a query
Feature 1
Feat
ure
2
Step 4
Classify
Class 1
Class 2
Our strategyAL for Supernova classification
“Raw” data
Parametric Fit5 parameters
Random Forest
Photo-classuncertainty
Query the SN with highest uncertainty
AL for SN classificationStatic results
Ishida et al., 2018 - arXiv:astro-ph/1804.03765 - from CRP #4
Active Learning
Passive Learning
Canonical strategy
Points to take into account:
A spectroscopic sample was not constructed to
be a training sample
Training data dictates the limits of photometric
classification
Spectroscopic follow-up should be used with
parsimony
A supernova is a transient
Time DomainSurvey evolution
1. Feature extraction done daily with available observed epochs until then.
2. Query sample is also re-defined daily: objects with r-mag < 24
3. No need for an initial training sample
Ishida et al., 2018 - arXiv:astro-ph/1804.03765 - from CRP #4
Partial LC, no trainingTime domain
The arrow shows
traditionalFull light-curveresults with full
SNPCC spec
Ishida et al., 2018 - arXiv:astro-ph/1804.03765 - from CRP #4
Batch ModePartial LC, no initial training, time domain
Ishida et al., 2018 - arXiv:astro-ph/1804.03765 - from CRP #4
The arrow shows
traditionalFull light-curveresults with full
SNPCC spec
The queried samplePartial LC, no training, time domain, batch
SNPCC spec: Queried sample: Telescope time: 1103 objects 800 objects Queried/spec = 0.999
Ishida et al., 2018 - arXiv:astro-ph/1804.03765 - from CRP #4
Summary“How do we optimize
machine learning results
with a minimum number of labeled
training instances?”
AdaptiveLearning
designed for astronomical
data
This is a group effort!
The Cosmostatistics Initiatve (COIN) was born in
Cosmo21 - Lisbon, 2014!
Build an interdisciplinary, people-centric community!
https://www.cosmostatistics-initiative.org
Active Learningor Optimum experimental Design
Ishida et al., 2018 - arXiv:astro-ph/1804.03765 - from CRP #4
Ishida et al., 2018 - arXiv:astro-ph/1804.03765 - from CRP #4
Ishida et al., 2018 - arXiv:astro-ph/1804.03765 - from CRP #4
AL algorithm
Ishida et al., 2018 - arXiv:astro-ph/1804.03765 - from CRP #4
, deployment: Emille Ishida