![Page 1: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/1.jpg)
Big Data Era Challenges and Opportunities in Astronomy: How SOM/LVQ and Related Learning Methods Can Contribute?
Prof. Pablo Estévez, Ph.D.
Department of Electrical Engineering, Universidad de Chile
& Millennium Institute of Astrophysics, Chile
WSOM 2016, Houston, TX, January 8, 2016
![Page 2: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/2.jpg)
Contents
• Astronomy Context
• Large Synoptic Survey Telescope
• Millennium Institute of Astrophysics (MAS)
• Big Data
• SOM/LVQ
• Two Examples
• Conclusions
![Page 3: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/3.jpg)
Mirrors of the largest telescopes
SALT
By 2024 Chile will concentrate 70% of the global observing capacity
We have the responsibility to capitalize on this opportunity
![Page 4: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/4.jpg)
Large Synoptic Survey
Telescope (LSST)
Cerro Pachón, Chile, 2022
![Page 5: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/5.jpg)
3 × 3 degree field of view
Covers the entire southern hemisphere sky every 3 days
For 10 years
LSST will produce a 3D video of the Universe
Cosmic Cinematography: Exploration of time domain
In one year it will collect more data than all previous telescopes combined (15 PB/year)
Real time data management
100,000 transients per night
![Page 6: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/6.jpg)
Challenges for Chile
Access to 100% of the data
But LSST will not do the analysis
Avalanche of data
Huge challenges in computational
intelligence and data analysis
New way of doing science:
Data-driven Science
![Page 7: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/7.jpg)
LSST: Big Data Challenges
• Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years
• Classifying more than 50 billion objects and following up many of these events in real time
• Spectrograph to separate light into a frequency spectrum
• Extracting knowledge in real time for ~2 million events per night
• Discovering the unknown unknowns (serendipity): the things that we do not even know that we don't know!
![Page 8: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/8.jpg)
Big Data: The Four V's
![Page 9: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/9.jpg)
Astronomy Data Production Comparison
Credits: ALMA, Maccarena Gonzalez:
![Page 10: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/10.jpg)
Big Data Analytics
How did the Milky Way form?
Challenge: Can Data Mining discover new patterns and correlations?
What vs. Why
Challenge: Providing good visualization tools for doing science
High Performance Computing, GPGPUs
![Page 11: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/11.jpg)
Pragmatic Approach
• Knowing what, not why, is good enough
• This is done by finding valuable correlations (including non-linear relationships)
• Correlation allows us to analyze a phenomenon by identifying a good proxy for it
• The grand challenge is the problem of inference: turning data into knowledge through models
• A data-driven approach is used instead of a hypothesis-driven one
![Page 12: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/12.jpg)
SOM/LVQ Methods
• What is the Best Classifier in the World?
• Source: Fernandez-Delgado et al., JMLR, 2014
• “Including all the relevant classifiers available today”. Comparison of 179 classifiers on 121 data sets

| Ranking | Classifier |
|---|---|
| 1 | Parallel Random Forest |
| 2 | Random Forest |
| 3 | SVM-C |
| 77 | LVQ |
| 119 | Supervised SOM |

Why are the GLVQ* classifiers not in the list?
• Implementation in R or Python
• Easy interface
• Automatic parameter tuning
![Page 13: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/13.jpg)
SOM Journal Papers
[Figure: 385 SOM papers published in ISI journals (2010-2015); vertical axis from 0 to 120]
![Page 14: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/14.jpg)
Semi-supervised variable star clustering
A new paradigm in the field of ML/CI is semi-supervised learning:
• (unsupervised) Find structures, patterns or clusters by measuring similarities between samples
• (supervised) Incorporate labels if available; these guide the unsupervised half (label propagation)
Possibility of detecting something novel (patterns not in the training set) while still discriminating the known classes.
Example: Clustering 10,000 periodic variable stars from EROS-2 (Sammon visualization). Only 10% of the data is labeled.
Purple: EB, blue: CEPH, yellow: RRL, green: LPV, red: unknown
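The label-propagation idea can be illustrated with a minimal sketch. It is not the method used for the EROS-2 example: the function name `propagate_labels`, the k-nearest-neighbour voting rule, the `max_dist` cutoff and the toy 2D points are all assumptions made for illustration. A few expert labels spread to nearby unlabeled samples, while an isolated sample stays unlabeled ("unknown"), which is where something novel could hide.

```python
import math
from collections import Counter

def propagate_labels(points, seed_labels, k=3, max_dist=1.0, max_iters=50):
    """Spread known labels to unlabeled points through nearest neighbours.

    seed_labels[i] is a class name or None. Seed labels are never overwritten.
    Points with no labeled neighbour within max_dist stay None (unknown).
    """
    labels = list(seed_labels)
    for _ in range(max_iters):
        changed = False
        for i, p in enumerate(points):
            if seed_labels[i] is not None:
                continue
            # k nearest neighbours of point i (excluding itself)
            near = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)[:k]
            votes = Counter(labels[j] for d, j in near
                            if d <= max_dist and labels[j] is not None)
            if votes:
                new_label = votes.most_common(1)[0][0]
                if new_label != labels[i]:
                    labels[i] = new_label
                    changed = True
        if not changed:
            break
    return labels

# Toy feature space: two compact groups plus one isolated sample (hypothetical data)
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.1, 0.4),
          (5.0, 5.0), (5.1, 5.2), (5.2, 4.9), (4.8, 5.1), (5.3, 5.3),
          (20.0, 20.0)]
seeds = ["RRL", None, None, None, None,
         "CEPH", None, None, None, None,
         None]
result = propagate_labels(points, seeds)
print(result)
```

Only two of the eleven samples carry a label at the start; the isolated sample is left as `None` rather than being forced into a known class.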
![Page 15: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/15.jpg)
Active learning with human in the loop
Active learning: The machine can query the expert for labels. In practice, the number of labels needed is much smaller than in the supervised case.
Query strategy: (1) ask for labels of the most uncertain samples (boundaries), (2) minimize expected error, (3) minimize output variance, etc.
Example: AL query interface for variable star classification. It shows a pair of samples and the expert chooses whether they belong to the same class.
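Query strategy (1) can be sketched as uncertainty sampling. The snippet below is an illustrative assumption, not the interface shown on the slide: it ranks samples by the entropy of a classifier's predicted class probabilities (the probability values and the name `most_uncertain` are hypothetical) and returns the ones to send to the expert.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of a discrete class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def most_uncertain(class_probs, n_queries=2):
    """Indices of the n_queries samples with the highest predictive entropy."""
    ranked = sorted(range(len(class_probs)),
                    key=lambda i: entropy(class_probs[i]),
                    reverse=True)
    return ranked[:n_queries]

# Hypothetical classifier outputs for four unlabeled light curves (3 classes)
probs = [
    [0.98, 0.01, 0.01],  # confident prediction
    [0.34, 0.33, 0.33],  # almost uniform: very uncertain
    [0.90, 0.05, 0.05],  # fairly confident
    [0.50, 0.49, 0.01],  # close to a boundary between two classes
]
queries = most_uncertain(probs)
print(queries)
```

The expert is asked about the near-uniform sample and the two-class boundary sample first, so labeling effort goes where the model is least sure.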
![Page 16: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/16.jpg)
Millennium Institute of Astrophysics (MAS)
Started in January 2014
Passion for the exploration of the natural world
![Page 17: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/17.jpg)
Millennium Institute of Astrophysics (MAS)
Started in January 2014
Milky Way
Astroinformatics, Astrostatistics
Exoplanets
Transients, Supernovae
![Page 18: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/18.jpg)
Astronomical Time Series: Light Curves. “LOS PABLOS” Work
• Light Curve: Stellar brightness (magnitude or flux) versus time.
• Variable stars: stars whose luminosity varies over time (3% of the stars in the universe are variables, and 1% are periodic variable stars)
• Light Curve Analysis: Useful for period detection, event detection, stellar classification, extrasolar planet discovery, measuring distance to Earth, etc.
![Page 19: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/19.jpg)
An Example of a Light Curve
![Page 20: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/20.jpg)
Variable stars
[Figure panels: Eclipsing binary stars; Pulsating star]
![Page 21: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/21.jpg)
Folded Light Curves
• The transformation “t modulus T” plots successive cycles atop one another, where T is the period
• Usually all periods within a range are tried to find the one that maximizes a criterion (sweep).
[Figure panels: Folding a light curve; Estimating the period]
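Folding and the period sweep can be sketched in a few lines. The criterion used here is an assumption chosen for brevity: a simple "string length" (the summed magnitude jumps between phase-adjacent points, to be minimized), not the correntropy-based criterion of the talk. The synthetic sinusoid, the sampling pattern and the candidate grid are also made up for illustration.

```python
import math

def fold(times, period):
    """Phase in [0, 1) of each observation: (t mod T) / T."""
    return [(t % period) / period for t in times]

def string_length(times, mags, period):
    """Sum of magnitude jumps between phase-adjacent points.

    A correct trial period stacks the cycles into one smooth curve,
    so the total jump length is small.
    """
    ordered = sorted(zip(fold(times, period), mags))
    return sum(abs(ordered[i + 1][1] - ordered[i][1])
               for i in range(len(ordered) - 1))

def sweep(times, mags, candidate_periods):
    """Try every candidate period and keep the one with the smoothest fold."""
    return min(candidate_periods, key=lambda p: string_length(times, mags, p))

# Synthetic, irregularly sampled light curve with a true period of 2.5
true_period = 2.5
times = [0.37 * i + 0.11 * math.sin(7.0 * i) for i in range(200)]
mags = [math.sin(2.0 * math.pi * t / true_period) for t in times]

candidates = [1.0 + 0.01 * k for k in range(301)]  # trial periods 1.00 to 4.00
best = sweep(times, mags, candidates)
print(best)
```

Minimizing the string length is equivalent to maximizing its negative, so it fits the "maximize a criterion" description; real surveys add noise, gaps and outliers, which is what motivates more robust criteria.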
![Page 22: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/22.jpg)
Automated period detection
• Correntropy (generalized correlation) is used to compute similarities between samples
• Goes beyond second-order statistics, taking into account higher-order moments
• Robustness to outliers and noise
• Spectral decomposition of correntropy using advanced signal processing techniques
• Gaussian basis functions are used instead of sinusoids
• Goes beyond the Fourier representation to get super-resolution, more localized and sparser spectra
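A minimal sketch of the correntropy idea follows, assuming an evenly sampled series and a Gaussian kernel; the kernel width `sigma`, the test signal and the function names are illustrative choices, and the spectral decomposition step of the talk is not reproduced. The autocorrentropy at lag τ is the average of G_σ(x[n] − x[n−τ]), which peaks when the series repeats itself after τ samples.

```python
import math

def gaussian_kernel(d, sigma):
    """Gaussian kernel evaluated at difference d."""
    return math.exp(-d * d / (2.0 * sigma * sigma)) / (math.sqrt(2.0 * math.pi) * sigma)

def autocorrentropy(x, lag, sigma=0.5):
    """Mean kernel similarity between samples that are `lag` steps apart."""
    pairs = len(x) - lag
    return sum(gaussian_kernel(x[n] - x[n - lag], sigma)
               for n in range(lag, len(x))) / pairs

# Sinusoid with a period of 20 samples, corrupted by one large outlier
series = [math.sin(2.0 * math.pi * n / 20.0) for n in range(200)]
series[50] = 50.0

lags = range(1, 31)
best_lag = max(lags, key=lambda lag: autocorrentropy(series, lag))
print(best_lag)
```

Because the kernel saturates for large differences, the outlier only removes two pairs from the average instead of dominating it, which is the robustness property listed above.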
![Page 23: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/23.jpg)
Example
![Page 24: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/24.jpg)
EROS-2 Survey
• Survey of the Magellanic Clouds and the Galactic bulge
• Data taken at the ESO observatory in La Silla, Chile
• 38.2 million light curves with two channels each (blue and red).
• EROS dataset processed automatically in 18 hours using a GPGPU cluster. We found 120,000 periodic variables.
• Near future (within MAS):
  • Be able to process a billion light curves per day
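A quick back-of-the-envelope calculation puts the two throughput figures side by side. It uses only the numbers quoted on the slide and assumes the work is spread evenly over the stated time; the variable names are illustrative.

```python
# Current throughput: 38.2 million light curves in 18 hours
light_curves = 38.2e6
hours = 18
current_rate = light_curves / (hours * 3600)   # light curves per second

# Target throughput: one billion light curves per day
target_rate = 1e9 / (24 * 3600)                # light curves per second

speedup = target_rate / current_rate
print(f"current: {current_rate:.0f}/s, target: {target_rate:.0f}/s, "
      f"speed-up needed: {speedup:.1f}x")
```

Under these assumptions the target corresponds to roughly a twenty-fold increase in sustained processing rate.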
![Page 25: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/25.jpg)
• DEMO: Period_finding_demo_Python
![Page 26: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/26.jpg)
Real-time Transient Detection Pipeline (PANCHO'S Work)
• Quest to complete the luminosity-time diagram for low luminosities and short cadences
• Discovery of new transient phenomena
• New instruments like DECam and LSST will allow us to detect, for example, the explosion of a supernova in real time.
• A custom real-time transient pipeline has been developed.
![Page 27: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/27.jpg)
![Page 28: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/28.jpg)
High Cadence Transient Survey (F. Forster et al.)
HiTS scientific objective: Find evidence of shock breakouts (SBO).
SBO: Event that occurs moments after the explosion of a supernova.
Supernova: Explosion at the end of the life cycle of a massive star.
Dark Energy Camera (DECam)
1. Formation of neutron star (~secs)
2. Shock emergence (~hrs)
3. Glowing ejecta (~days/weeks)
4. Remnant diffusion (~kyrs)
Figures: nasa.gov
![Page 29: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/29.jpg)
HiTS Image Reduction Pipeline
● At this point, candidates are dominated by artifacts (1:10K)
● ML to find the needles in the haystack
Data Capture → Preprocessing → Alignment → PSF Matching → Subtraction → Candidate Selection → Candidate Filtering → Visual Inspection
![Page 30: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/30.jpg)
Non-negative Matrix Factorization
![Page 31: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/31.jpg)
Feature Extraction using NMF
In NMF, we aim to decompose V into factors W and H (V ≈ WH) by solving a minimization problem in which the non-negativity constraints are element-wise.
Non-negativity: Only additive combinations. Sparse and parts-based decompositions.
The NMF problem is not jointly convex in W and H (ill-posed). Regularization can alleviate this.
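To make the decomposition concrete, here is a minimal sketch that assumes the squared Frobenius objective $\min_{W \ge 0,\, H \ge 0} \|V - WH\|_F^2$ and the classic multiplicative update rules for it. The objective, the absence of a regularization term, the toy matrix and all function names are assumptions for illustration; the formulation actually used in the talk may differ.

```python
import random

def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(r) for r in zip(*a)]

def frob_error(v, w, h):
    """Squared Frobenius norm of V - WH."""
    wh = matmul(w, h)
    return sum((v[i][j] - wh[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0])))

def nmf(v, k, iters=200, eps=1e-9, seed=0):
    """Multiplicative updates; W and H stay non-negative by construction."""
    rng = random.Random(seed)
    n, m = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    errors = [frob_error(v, w, h)]
    for _ in range(iters):
        wt = transpose(w)
        num, den = matmul(wt, v), matmul(matmul(wt, w), h)
        h = [[h[a][j] * num[a][j] / (den[a][j] + eps) for j in range(m)]
             for a in range(k)]
        ht = transpose(h)
        num, den = matmul(v, ht), matmul(w, matmul(h, ht))
        w = [[w[i][a] * num[i][a] / (den[i][a] + eps) for a in range(k)]
             for i in range(n)]
        errors.append(frob_error(v, w, h))
    return w, h, errors

# Toy non-negative matrix built from two additive "parts"
V = [[1, 1, 1, 0, 0, 0],
     [0, 0, 0, 2, 2, 2],
     [1, 1, 1, 1, 1, 1],
     [3, 3, 3, 0, 0, 0]]
W, H, errors = nmf(V, k=2)
print(errors[0], errors[-1])
```

Each row of H plays the role of a non-negative basis "part" and each row of W holds the additive coefficients for one sample; a different random seed can lead to a different factorization, which reflects the non-convexity noted above.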
![Page 32: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/32.jpg)
Principal Component Analysis
Lee and Seung, Nature 1999
![Page 33: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/33.jpg)
Non-negative Matrix Factorization
Lee and Seung, Nature 1999
![Page 34: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/34.jpg)
Cosmic rays and noise
[Image panels: Current, Reference, Difference]
![Page 35: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/35.jpg)
Stars!
[Image panels: Current, Reference, Difference]
![Page 36: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/36.jpg)
2D Visualization of the astronomical images using Self Organizing Maps (SOM)
● The SOM is an unsupervised neural network used for visualization.
● The database contains ~1000 images of 21x21 pixels of objects labeled as variable by the CMM pipeline.
● We use non-negative matrix factorization (NMF) to capture the different behaviors in the stamps and reduce dimensionality (441 → 16)
● We train the SOM with the NMF coefficients and obtain a U-matrix visualization
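The SOM training and U-matrix steps can be sketched as follows. This is a minimal online SOM under assumed settings (4x4 grid, linearly decaying learning rate and neighbourhood width, random 3-dimensional stand-in data instead of the 16 NMF coefficients); none of these choices come from the talk.

```python
import math
import random

def best_matching_unit(weights, x):
    """Grid position of the node whose weight vector is closest to x."""
    return min(weights, key=lambda node: math.dist(weights[node], x))

def train_som(data, rows=4, cols=4, epochs=30, lr0=0.5, sigma0=2.0, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    weights = {(r, c): [rng.random() for _ in range(dim)]
               for r in range(rows) for c in range(cols)}
    total, step = epochs * len(data), 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):      # shuffled pass over the data
            frac = step / total
            lr = lr0 * (1.0 - frac)                 # learning rate decays to 0
            sigma = sigma0 * (1.0 - frac) + 0.5 * frac  # neighbourhood shrinks
            bmu = best_matching_unit(weights, x)
            for node, w in weights.items():
                grid_d2 = (node[0] - bmu[0]) ** 2 + (node[1] - bmu[1]) ** 2
                g = lr * math.exp(-grid_d2 / (2.0 * sigma * sigma))
                for d in range(dim):
                    w[d] += g * (x[d] - w[d])
            step += 1
    return weights

def u_matrix(weights, rows, cols):
    """Mean distance from each node's weights to those of its grid neighbours."""
    u = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))
                    if (r + dr, c + dc) in weights]
            u[r][c] = sum(math.dist(weights[(r, c)], weights[n]) for n in nbrs) / len(nbrs)
    return u

# Stand-in feature vectors: two well-separated groups (hypothetical data)
rng = random.Random(1)
cluster_a = [[rng.gauss(0.0, 0.3) for _ in range(3)] for _ in range(30)]
cluster_b = [[rng.gauss(5.0, 0.3) for _ in range(3)] for _ in range(30)]
weights = train_som(cluster_a + cluster_b)
umat = u_matrix(weights, 4, 4)
bmu_a = best_matching_unit(weights, [0.0, 0.0, 0.0])
bmu_b = best_matching_unit(weights, [5.0, 5.0, 5.0])
print(bmu_a, bmu_b)
```

High U-matrix values mark borders between groups of nodes, so clusters in the input space show up as low-valued regions separated by ridges.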
![Page 37: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/37.jpg)
The whole picture
![Page 38: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/38.jpg)
Proposed method: Offline training phase
[Diagram: Training data (N×3×441) → Slicing & Scaling → Im1, Im2, Ims (N×441 each) → NMF → (W1, H1), (W2, H2), (Ws, Hs), with W of size N×K and H of size K×441 → Train RF model]
● Training set: 500,000 labeled HiTS candidates
● NMF parameters: K and Lambda
![Page 39: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/39.jpg)
Results: Classification Performance
NMF outperforms PCA and the raw model
(a) SNR > 6: FPR: 0.1%, FNR: 4% (NMF), 10% (PCA), 15% (raw)
(b) 5 < SNR < 6: FPR: 0.1%, FNR: 15% (NMF), 30% (PCA), 40% (raw)
NMF is less affected when SNR decreases
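For reference, the two rates quoted above are computed as FPR = FP / (FP + TN) and FNR = FN / (FN + TP). The sketch below uses made-up labels (1 = real transient, 0 = artifact) purely to show the calculation; it does not reproduce the HiTS results.

```python
def fpr_fnr(y_true, y_pred):
    """False positive rate and false negative rate for binary labels (1 = positive)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    negatives = sum(1 for t in y_true if t == 0)
    positives = sum(1 for t in y_true if t == 1)
    return fp / negatives, fn / positives

# Hypothetical ground truth and predictions for ten candidates
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
fpr, fnr = fpr_fnr(y_true, y_pred)
print(fpr, fnr)
```

Fixing the FPR at a low value and comparing FNRs, as done on the slide, asks how many real transients each feature set misses at the same artifact contamination level.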
![Page 40: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/40.jpg)
Traditional Pattern Recognition
Model
![Page 41: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/41.jpg)
Inspired by the movie Inception
PANCHO, WE NEED TO GO
DEEPER
![Page 42: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/42.jpg)
Deep Learning
• For many years shallow neural network architectures were used
• Classical multilayer perceptron trained by error backpropagation (gradient descent)
• Problems with deeper architectures: vanishing gradients, number of examples, computational time
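The vanishing-gradient problem can be illustrated with a toy chain of sigmoid units. The setup is an assumption made for simplicity (one unit per layer, all weights equal to 1, input 0); it only shows how the backpropagated derivative shrinks with depth, since each sigmoid contributes a factor of at most 0.25.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def input_gradient(depth, x=0.0, weight=1.0):
    """d(output)/d(input) for a chain of `depth` sigmoid units (chain rule)."""
    a, grad = x, 1.0
    for _ in range(depth):
        a = sigmoid(weight * a)
        grad *= weight * a * (1.0 - a)   # sigmoid'(z) = s(z) * (1 - s(z))
    return grad

gradients = {depth: input_gradient(depth) for depth in (1, 5, 10, 20)}
for depth, g in gradients.items():
    print(f"depth {depth:2d}: gradient = {g:.3e}")
```

With twenty layers the gradient reaching the first layer is many orders of magnitude smaller than with one, so the early layers barely learn under plain gradient descent.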
![Page 43: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/43.jpg)
Convolutional Neural Nets Applied to HiTS
[Figure: network architecture diagram with convolutional and pooling layers]
Guillermo Cabrera, Ignacio Reyes, Pablo Estevez, Francisco Förster, Juan-Carlos Maureira
![Page 44: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/44.jpg)
ConvNet Applied to HiTS
Detection error tradeoff curve
Data set of 455,393 samples:
- 350,393 training set
- 5,000 validation set
- 100,000 test set
It takes ~12 mins using Theano on a Tesla K20 GPU (Graphics Processing Unit)
![Page 45: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/45.jpg)
Movies!
http://www.das.uchile.cl/~fforster/ATEL/summary_das.html
![Page 46: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/46.jpg)
![Page 47: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/47.jpg)
![Page 48: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/48.jpg)
![Page 49: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/49.jpg)
Movies!
20 young supernovae were detected in real time
Plus another 70 supernovae
In 6 nights of observation with DECam
![Page 50: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/50.jpg)
Conclusions
Big Data is here to stay
Deep learning is a computational intelligence/machine learning technique that allows us to extract features automatically
The importance of multidisciplinary teams
To-do list: larger and more heterogeneous data sets, combining features (NMF, PCA, DL, engineered features), active learning
![Page 51: Big Data Era Challenges and Opportunities in …...LSST: Big Data Challenges Mining in real time a massive data stream of ~2 Terabytes per hour for 10 years Classify more than 50 billion](https://reader030.vdocuments.us/reader030/viewer/2022040620/5f30abb7f47def18d9438ed8/html5/thumbnails/51.jpg)
References
• P. Huijse, P. Estevez, P. Protopapas, J. C. Principe, P. Zegers, “Computational Intelligence Challenges and Applications on Large-Scale Astronomical Time Series Databases”, IEEE Computational Intelligence Magazine, August 2014.
• P. Protopapas, P. Huijse, P. Estevez, P. Zegers, J. C. Principe, J. B. Marquette, “A novel, fully automated pipeline for period estimation in the EROS 2 data set”, Astrophysical Journal Supplement Series, 216:25, February 2015.
• P. Huijse, P. Estevez, P. Protopapas, P. Zegers, J. Principe, “An Information Theoretic Algorithm for Finding Periodicities in Stellar Light Curves”, IEEE Transactions on Signal Processing, Vol. 60, No. 10, pp. 5135-5145, 2012.
• P. Huijse, P. Estevez, P. Zegers, P. Protopapas, J. Principe, “Period Estimation in Astronomical Time Series using Slotted Correntropy”, IEEE Signal Processing Letters, Vol. 18, No. 6, pp. 371-374, 2011.