What's New for Machine Learning with Oracle Database and Hadoop
Eric Grancher
Manuel Martín Márquez
Outline
• CERN Data environment
• Introduction to Machine Learning
• Our Machine Learning path
  • Introduction to a real use case
  • R – prototyping and validating ideas
  • First scalable attempt – Oracle R Enterprise
  • Hadoop and the analytic transformation
  • Oracle R Advanced Analytics for Hadoop
  • Spark
  • TensorFlow + Spark
CERN Database Environment

|  | October 2012 | December 2015 |
| --- | --- | --- |
| Max size | ACCLOG 136 TB | ACCLOG 352 TB |
| Max redo | ACCMEAS 27 TB / month | QPSR 115 TB / month |
CERN Control Systems
• IoT and Control System
  • Cryogenics
  • Vacuum
  • Machine Protection
  • Power Converters
  • QPS
• Accelerator Logging Service
  • ~ 275 GB/day
  • Storing more than 50 TB / year
• Data acquisition
  • CERN accelerator complex
  • Related subsystems
  • Experiments
  • Around 1 million signals
Machine Learning (ML)
• ML is a branch of artificial intelligence:
  • Uses computing-based systems to make sense of data
  • Extracting patterns, fitting data to functions, classifying data, etc.
• ML systems can learn and improve
  • With historical data, time and experience
• Bridges theoretical computer science and real, noisy data.
ML in real-life
Supervised and Unsupervised Learning
• Unsupervised Learning
  • There is no predefined, known set of outcomes
  • Looks for hidden patterns and relations in the data
  • A typical example: Clustering
[Figure: scatter plot of Petal.Width against Petal.Length, points colored by irisCluster$cluster (clusters 1, 2, 3)]
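To make the clustering idea concrete, here is a minimal sketch of k-means (Lloyd's algorithm) in plain Python. It is not the code behind the slide, which was produced in R; the sample points, the starting centroids and the function name `kmeans` are assumptions chosen for illustration.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means (Lloyd's algorithm) on 2-D points with given starting centroids."""
    centroids = list(centroids)
    assignment = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        assignment = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for k in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == k]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[k] = tuple(sum(c) / len(members) for c in zip(*members))
    return assignment, centroids

# Hypothetical (petal length, petal width) pairs, not taken from the slides.
points = [(1.4, 0.2), (1.5, 0.2), (1.3, 0.3),
          (4.5, 1.5), (4.7, 1.4), (4.4, 1.3),
          (6.0, 2.5), (5.9, 2.1), (6.1, 2.3)]
labels, centers = kmeans(points, centroids=[points[0], points[3], points[6]])
```

No outcome labels are passed in: the groups emerge only from distances between points, which is what makes the method unsupervised.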
Supervised and Unsupervised Learning
• Supervised Learning
  • For every example in the data there is always a predefined outcome
  • Models the relations between a set of descriptive features and a target (fits data to a function)
  • 2 groups of problems:
    • Classification
    • Regression
Supervised Learning
• Classification
  • Predicts which class a given sample of data (sample of descriptive features) is part of (discrete value).
• Regression
  • Predicts continuous values.
[Figure: confusion matrix of Predicted against Actual species (setosa, versicolor, virginica), cell values in percent on a 0 to 100 scale]
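A classifier of this kind is commonly summarized with a confusion matrix expressed in percent of each actual class. The sketch below shows one way to compute it in plain Python; the label lists are hypothetical and the helper name `confusion_percent` is an assumption, not something from the slides.

```python
from collections import Counter

def confusion_percent(actual, predicted, classes):
    """Return {actual: {predicted: percent}}; each actual class sums to 100.

    Assumes every class in `classes` appears at least once in `actual`.
    """
    counts = Counter(zip(actual, predicted))
    totals = Counter(actual)
    return {
        a: {p: 100.0 * counts[(a, p)] / totals[a] for p in classes}
        for a in classes
    }

classes = ["setosa", "versicolor", "virginica"]
# Hypothetical labels for illustration only.
actual = ["setosa"] * 4 + ["versicolor"] * 4 + ["virginica"] * 4
predicted = (["setosa"] * 4
             + ["versicolor"] * 3 + ["virginica"]
             + ["virginica"] * 4)
matrix = confusion_percent(actual, predicted, classes)
```

The diagonal entries are the per-class hit rates; off-diagonal entries show which classes get confused with each other.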
Machine Learning as a Process
Define Objectives → Data Preparation → Model Building → Model Evaluation → Model Deployment
- Define measurable and quantifiable goals
- Use this stage to learn about the problem
- Normalization
- Transformation
- Missing Values
- Outliers
- Data Splitting
- Feature Engineering
- Estimating Performance
- Evaluation and Model Selection
- Study model accuracy
- Does it work better than the naïve approach or the previous system?
- Do the results make sense in the context of the problem?
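Two of the preparation steps listed above, normalization and data splitting, can be sketched in a few lines of plain Python. This assumes min-max scaling and a single random hold-out split; the slides do not say which variants were used, and the function names are hypothetical.

```python
import random

def min_max_normalize(values):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle a copy of rows and cut it into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

raw = [2.0, 4.0, 6.0, 10.0]
scaled = min_max_normalize(raw)
train, test = train_test_split(list(range(20)))
```

In practice the minimum and maximum are usually computed on the training split only and then applied to the test split, so that no information from the held-out data leaks into the model.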
Largest Cryogenics Installation
• 50k I/O, 11k actuators, ~5k control loops
• Control:
  • ~100 PLCs (Siemens, Schneider)
  • ~40 FECs (industrial PCs)
• Supervision: 26 SCADA servers

| Instrument/Actuators | Total |
| --- | --- |
| Temperature [1.6 – 300 K] | 10361 |
| Pressure [0 – 20 bar] | 2300 |
| Level | 923 |
| Flow | 2633 |
| Control valves | 3692 |
| On/Off valves | 1835 |
| Manual valves | 1916 |
| Virtual flow meters | 325 |
| Controllers (PID) | 4833 |
Use Case: Faulty Cryogenics Valve Detection
• What is the objective?
  • Predict faulty valves before they actually fail
• How?
  • Each valve receives an aperture order value (aperture order)
  • The valve realizes an effective aperture (aperture measured)
  • Analyze the difference between the two (S = aperture order - aperture measured)
Faulty Cryogenics Valve Detection with R
• Signals used:
  • S = aperture order - aperture measured
• Feature extraction based on S
  • Variance
  • Percentile 99.9
  • Rope distance – R(S)
  • Noise Band – B(S) (let Pxx be the power spectrum of the signal S, from 0 to 0.5 Hz, where S has been previously mean-centred)
• Automatic Faulty Valves Detection System
  • SVM - Support Vector Machine
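As a rough illustration of the first two features, the sketch below computes the residual signal S and then its variance and 99.9th percentile. It is written in Python rather than the R used in the talk, and it assumes population variance and linear interpolation for the percentile, neither of which the slides specify. Rope distance and noise band are left out because their definitions are not given in full here, and the SVM step is not shown. The sample values are hypothetical.

```python
import statistics

def residual(order, measured):
    """S = aperture order - aperture measured, sample by sample."""
    return [o - m for o, m in zip(order, measured)]

def percentile(values, q):
    """q-th percentile (0-100) with linear interpolation between sorted samples."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    return ordered[lower] + (ordered[upper] - ordered[lower]) * (pos - lower)

def extract_features(order, measured):
    """Build a small feature dictionary for one valve from its two signals."""
    s = residual(order, measured)
    return {
        "variance": statistics.pvariance(s),  # assumption: population variance
        "p99_9": percentile(s, 99.9),
    }

# Hypothetical aperture samples in percent, for illustration only.
order = [50.0, 50.0, 52.0, 52.0, 55.0]
measured = [49.0, 50.0, 51.0, 52.0, 52.0]
features = extract_features(order, measured)
```

A healthy valve tracks its order closely, so S stays near zero and both features remain small; a sticking valve produces larger and more erratic residuals.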
Our experience
• Excellent for prototyping a potential solution or validating an idea
  • Fast development using standard CRAN packages such as CARET
  • Large number of models and statistical functions (7,500+ packages) covering a wide range of fields
  • Data exploration
• Uses existing skills
  • R is widely used in the domain
• Moving the data is very expensive
  • The data needs to be extracted from the DB into CSV files
  • SQL, Java API, custom extraction applications (Timber)
• Hard to deploy models in production and to scale the solutions as the data grows
  • Data limited by memory size
  • A few packages help, but with very limited scalability - the models themselves do not scale
    • Foreach, Snow, Rmpi, BatchExperiments package (BatchJobs)
Why ORE? - ORE benefits
• A database-centric environment for analytical processes in R
  • Allows the database server to run R scripts (scalability & performance)
  • Eliminates the memory constraint of the client R engine
• Transparency Layer
  • Transparently analyze and use data in Oracle Database through R
  • Tables as R native data frames
• Enables users to take advantage of data-parallel and task-parallel execution through Oracle Database
USA - 27/01/2016 BIWA 16
Cryo Valves – Parallel Features Extraction in ORE

| Instrument/Actuators | Total |
| --- | --- |
| Temperature [1.6 – 300 K] | 10361 |
| Pressure [0 – 20 bar] | 2300 |
| Level | 923 |
| Flow | 2633 |
| Control valves | 3692 |
| On/Off valves | 1835 |
| Manual valves | 1916 |
| Virtual flow meters | 325 |
| Controllers (PID) | 4833 |

93600 points per cycle (about 24 hours)
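The data-parallel pattern behind per-valve feature extraction is: group the rows by valve, then apply the same function to each group independently. The sketch below is a plain Python stand-in for that pattern and does not use ORE or the database at all; the valve identifiers, the sample values and all function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import statistics

def group_by_valve(rows):
    """Group (valve_id, s_value) rows into {valve_id: [s_value, ...]}."""
    groups = defaultdict(list)
    for valve_id, s_value in rows:
        groups[valve_id].append(s_value)
    return groups

def valve_variance(item):
    """Per-group function: variance of the residual signal for one valve."""
    valve_id, values = item
    return valve_id, statistics.pvariance(values)

def parallel_features(rows, workers=4):
    """Run the per-valve function on every group, several groups at a time."""
    groups = group_by_valve(rows)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(valve_variance, groups.items()))

# Hypothetical valve identifiers and S samples.
rows = [("valve_a", 1.0), ("valve_a", 3.0), ("valve_b", 2.0), ("valve_b", 2.0)]
result = parallel_features(rows)
```

Threads are used here only to keep the example short and self-contained; in CPython they will not speed up pure-Python arithmetic. The point is the shape of the computation: since no group depends on another, the work can be spread over as many workers as the chosen degree of parallelism allows.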
Our experience
• No need to move data
  • It is faster to process it in-DB with ORE using the appropriate degree of parallelism
  • DB nodes already prepared for the workload
• Simplifies the infrastructure
• Writing/adapting R code is straightforward
  • Thanks to the transparency layer and embedded R execution
  • Tables and views as R data frames
• Still problems scaling
  • Scalability determined by RAC installations
• Need to differentiate between production and analytics environments
  • Risk of affecting production environment performance by running in-database analytics
• Analytics development in-database
  • Risk to data security and competition for resources
CERN Accelerator Logging Service 2.0
• New landscape brings new challenges
• Better performance on bigger datasets
• Big Data queries: Impala, Spark SQL
• Leverage analytics capabilities
• Spark analytics: Python, ML, R
• More heterogeneous data access models
[Figure: Storage Evolution - Size in GB / day; x-axis: years 2008 to 2015; y-axis: 0 to 700; annotation: QPS. Credit: BE-CO-DS]
CERN Accelerator Logging Service
[Figure: architecture diagram with the components CCDB, Log. Proc. (two instances), Kafka, Gobblin, HDFS Storage, HBase, Compactor and Schema PartitionProvider; Speed and Batch paths; latency labels 100 ms, 1 min and 7 min. Credit: BE-CO-DS]
Cryo Valves – ORAAH
Our experience
• No need to move data – the analysis is done where the data is
  • Access to database or Hive tables transparently
• Memory is no longer a problem
  • The analysis is no longer limited by the dataset size
• Simplifies the infrastructure – transparent use of Hadoop technologies
• Writing/adapting R code is straightforward
  • Same principles as ORE – no need to acquire a new set of skills
  • Background use of Spark machine learning capabilities
• Limited functionality
  • When using ORAAH for machine learning in a scalable way, the functionality is limited to the Spark machine learning libraries
  • Not as fast-paced as Spark itself – why do we need to wait?
  • Commercial vs. open source
Machine Learning with Spark
• Why Apache Spark for Machine Learning
  • No memory limitations
  • Compatibility – Scala, R, Python
• General purpose
  • Not only machine learning but also advanced data preparation, feature engineering, parameter tuning, model selection, etc.
• New skills required
  • RDD-based API (spark.mllib)
  • DataFrame-based API (spark.ml)
• Pipelines concept (as the CARET package does in R)
  • Cross validation
  • Parameter tuning using a parameter grid
• Fast-paced evolution
Machine Learning with Spark
• Cross Validation
  • Repeat the construction of the model on different subsets of the available training data and then evaluate the model only on data not seen during training
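The procedure described above can be sketched as k-fold cross validation in plain Python. This is a generic illustration, not Spark's own implementation; the toy model (predict the training mean), the scoring function and all names are assumptions.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs; every sample is tested exactly once."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size

def cross_validate(xs, ys, k, fit, score):
    """Average the score of models fitted on k-1 folds and scored on the held-out fold."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in test_idx], [ys[i] for i in test_idx]))
    return sum(scores) / len(scores)

# Toy model: always predict the mean of the training targets.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)

def mean_abs_error(model, xs, ys):
    return sum(abs(y - model) for y in ys) / len(ys)

xs = list(range(6))
ys = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
cv_error = cross_validate(xs, ys, k=3, fit=fit_mean, score=mean_abs_error)
```

The folds here are contiguous blocks, so real data would normally be shuffled first. Each model is scored only on samples it never saw during fitting, which is the property the slide emphasizes.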
Machine Learning with Spark
• Model Tuning
  • ML models have several parameters
    • There is no analytical formula to calculate appropriate values
  • These parameters control the complexity of the model; poorly chosen values lead to
    • bad performance
    • over-fitting
    • etc.
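Since there is no formula for good parameter values, a common approach is to try every combination from a small grid and keep the best-scoring one. The sketch below shows that idea in plain Python. The parameter names `C` and `gamma` and the scoring function are hypothetical stand-ins; in a real setting `evaluate` would return a cross-validated score for a model trained with those parameters.

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Try every combination in param_grid; return the best (params, score)."""
    names = sorted(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Hypothetical stand-in for a cross-validated score; it peaks at C=10, gamma=0.1.
def toy_score(params):
    return -abs(params["C"] - 10) - abs(params["gamma"] - 0.1)

grid = {"C": [1, 10, 100], "gamma": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(grid, toy_score)
```

The cost grows with the product of the grid sizes (9 evaluations here), which is why the search is usually combined with parallel execution.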
Cryo Valves – Spark
Machine Learning with TensorFlow
• Why TensorFlow for Machine Learning
  • Spark machine learning capabilities are quite limited
    • Number of models
    • Customization capabilities
  • Outperforms all of the previous technologies
    • Spark is slow at training models
  • State-of-the-art algorithms available
    • Deep learning
• New skills need to be learned
  • Tensor concept
• Model freedom comes at a price
  • Coding
Machine Learning with TensorFlow+Spark
• No memory limitations
  • Bigger-than-memory datasets treated transparently
• Parallelization
  • TensorFlow profits from Spark partitioning concepts to improve user control over parallelization