1
© 2015 The MathWorks, Inc.
Data Analytics with MATLAB
Dr. Jan Eggers
MathWorks
June 9, 2015
2
[Figure: scatter plot matrix of MPG, Acceleration, Displacement, Weight, and Horsepower]
From Data to Decisions & Design
Observation → Organization → Understanding → Decisions & Design
Physical Sensors → Data → Information → Knowledge → Action
[Figure: active power (per-unit) versus time (secs); legend: NN, measured]
3
Key Takeaways
Access and preprocess large amounts of heterogeneous data
Develop advanced analytics with machine learning
Integrate analytics with your enterprise systems
4
Agenda
Data: Volume, Variety, Velocity
Techniques: Advanced Statistics, Machine Learning, Predictive Modelling
Goal: Decision Making
Workflow: Access, Explore, Prototype, Scale, Share/Deploy
5
Big Data Capabilities in MATLAB
Memory and Data Access
– 64-bit processors
– Memory Mapped Variables
– Disk Variables
– Databases
– Datastores
Platforms
– Desktop (Multicore, GPU)
– Clusters
– Cloud Computing (MDCS on EC2)
– Hadoop
Programming Constructs
– Streaming
– Block Processing
– Parallel-for loops
– GPU Arrays
– SPMD and Distributed Arrays
– MapReduce
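As a minimal sketch of the "Disk Variables" and "Block Processing" items above, the following computes column means of a variable stored in a MAT-file one block at a time through matfile, so that only one block is held in memory at once. The file name, the variable name X, and the block size are illustrative assumptions, not taken from the slides.

```matlab
% Hypothetical data: a 1000-by-3 variable X stored in a temporary MAT-file
fname = [tempname '.mat'];
m = matfile(fname, 'Writable', true);   % creates the file on first assignment
m.X = rand(1000, 3);

% Query the size on disk without loading the variable
[nRows, nCols] = size(m, 'X');

blockSize = 250;                        % assumed block size
colSum = zeros(1, nCols);
for first = 1:blockSize:nRows
    last  = min(first + blockSize - 1, nRows);
    block = m.X(first:last, :);         % loads only this block of rows
    colSum = colSum + sum(block, 1);
end
colMean = colSum / nRows;
disp(colMean)
```

The same loop structure applies to any per-block statistic that can be accumulated, such as sums, counts, minima, and maxima.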
6
DataStore
datastore
Import text files and collections of text files that don’t fit into memory
% Create a datastore from one of the following sources
ds = datastore('file1.mat');
ds = datastore('*.csv');
ds = datastore('/shared/data_repository/');
ds = datastore('hdfs://myserver:7867/data/file1.txt');
ds = datastore({'/shared01/','/shared02/'});
% Read the data one chunk at a time
while hasdata(ds)
    T = read(ds);   % T holds the next chunk
end
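The read loop above can be tried end to end on a small file. This is a minimal sketch under stated assumptions: the CSV file, the column names Carrier and Distance, and the ReadSize value are hypothetical and chosen only for illustration.

```matlab
% Hypothetical flight records written to a temporary CSV file
Carrier  = {'UA'; 'PS'; 'DL'; 'DL'; 'US'; 'PS'; 'PS'; 'UA'};
Distance = [2356; 186; 1876; 568; 359; 176; 237; 1867];
csvFile  = [tempname '.csv'];
writetable(table(Carrier, Distance), csvFile);

ds = datastore(csvFile);                  % datastore for tabular text
ds.SelectedVariableNames = {'Distance'};  % import only the column needed
ds.ReadSize = 3;                          % rows requested per call to read

maxDistance = -Inf;
rowsSeen = 0;
while hasdata(ds)
    T = read(ds);                         % table holding the next chunk
    maxDistance = max(maxDistance, max(T.Distance));
    rowsSeen = rowsSeen + height(T);
end
fprintf('Rows read: %d, largest distance: %d\n', rowsSeen, maxDistance);
```

Because the running maximum is updated per chunk, the full file never has to fit into memory.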
7
MapReduce
[Figure: flight records pass from the data store through the map and reduce stages]
Data Store:
1503 UA LAX -5 -10 2356
540 PS BUR 13 5 186
1920 DL BOS 10 32 1876
1840 DL SFO 0 13 568
272 US BWI 4 -2 359
784 PS SEA 7 3 176
796 PS LAX -2 2 237
1525 UA SFO 3 -5 1867
632 US SJC 2 -4 245
1610 UA MIA 60 34 1365
2032 DL EWR 10 16 789
2134 DL DFW -2 6 914
Map (carrier and last column of each record):
UA 2356
PS 186
DL 1876
DL 568
US 359
PS 176
PS 237
UA 1867
US 245
UA 1365
DL 789
DL 914
Reduce:
UA 2356
PS 186
PS 237
UA 1867
UA 1365
DL 1876
DL 914
US 359
US 245
maxDelay = mapreduce(ds, @maxDistMapper, @maxDistReducer);
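The slide names maxDistMapper and maxDistReducer but does not show their bodies. The sketch below is one possible implementation, assuming the goal is the largest distance per carrier; the column names, the sample file, and both function bodies are assumptions. It is written as a script with local functions, which requires a MATLAB release that supports functions at the end of a script.

```matlab
% Hypothetical sample data written to a temporary CSV file
Carrier  = {'UA';'PS';'DL';'DL';'US';'PS';'PS';'UA';'US';'UA';'DL';'DL'};
Distance = [2356;186;1876;568;359;176;237;1867;245;1365;789;914];
csvFile  = [tempname '.csv'];
writetable(table(Carrier, Distance), csvFile);

ds = datastore(csvFile);
ds.SelectedVariableNames = {'Carrier', 'Distance'};

% Run the map and reduce phases; the result is itself a datastore
result  = mapreduce(ds, @maxDistMapper, @maxDistReducer);
maxDist = readall(result);      % table with variables Key and Value
disp(maxDist)

function maxDistMapper(data, ~, intermKVStore)
    % Assumed mapper: emit the largest distance per carrier in this block
    [carriers, ~, idx] = unique(data.Carrier);
    blockMax = accumarray(idx, data.Distance, [], @max);
    addmulti(intermKVStore, carriers, num2cell(blockMax));
end

function maxDistReducer(key, valueIter, outKVStore)
    % Assumed reducer: keep the largest of the per-block maxima for one carrier
    best = -Inf;
    while hasnext(valueIter)
        best = max(best, getnext(valueIter));
    end
    add(outKVStore, key, best);
end
```

Emitting one value per carrier per block keeps the intermediate data small, which matters more as the number of records grows.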
8
A Big Data Platform
[Figure: a Datastore on top of HDFS; three nodes, each holding its own data and running Map and Reduce tasks]
9
Agenda
Data: Volume, Variety, Velocity
Techniques: Advanced Statistics, Machine Learning, Predictive Modelling
Goal: Decision Making
Workflow: Access, Explore, Prototype, Scale, Share/Deploy
10
Machine Learning
Machine learning uses data and produces a program to perform a task
Task: Human Activity Detection
Standard Approach: Computer Program
– Hand Written Program
If X_acc > 0.5 then “SITTING”
If Y_acc < 4 and Z_acc > 5 then “STANDING”
…
– Formula or Equation
$Y_{activity} = \beta_1 X_{acc} + \beta_2 Y_{acc} + \beta_3 Z_{acc} + \dots$
Machine Learning Approach: Machine Learning
– model = <Machine Learning Algorithm>(sensor_data, activity)
– model: Inputs → Outputs
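To make the machine learning approach concrete, the following sketch fits a classification tree to synthetic accelerometer readings. It assumes Statistics and Machine Learning Toolbox is available; the data, the choice of fitctree as the algorithm, and the two activity labels are illustrative assumptions rather than the presenter's actual setup.

```matlab
rng(0);   % reproducible synthetic data
n = 50;

% Hypothetical readings [X_acc Y_acc Z_acc] for two activities
sitting  = repmat([1 0 0], n, 1) + 0.1 * randn(n, 3);
standing = repmat([0 0 1], n, 1) + 0.1 * randn(n, 3);
sensor_data = [sitting; standing];
activity    = [repmat({'SITTING'}, n, 1); repmat({'STANDING'}, n, 1)];

% model = <Machine Learning Algorithm>(sensor_data, activity)
% Here the algorithm is a classification tree (an assumed choice).
model = fitctree(sensor_data, activity, ...
    'PredictorNames', {'X_acc', 'Y_acc', 'Z_acc'});

% model: Inputs -> Outputs
newReading = [0.9 0.05 0.1];
predicted  = predict(model, newReading);   % cell array holding one label
disp(predicted{1})

% Fraction of training readings that the model labels correctly
trainingAccuracy = mean(strcmp(predict(model, sensor_data), activity));
```

Unlike the hand-written rules, the thresholds here are learned from the labeled data, so retraining on new data updates them without editing code.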
11
Machine Learning Techniques
Type of Learning and Categories of Algorithms
Supervised Learning: Develop predictive model based on both input and output data
– Classification
– Regression
Unsupervised Learning: Group and interpret data based only on input data
– Clustering
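For the unsupervised case, a minimal clustering sketch is shown below. It assumes Statistics and Machine Learning Toolbox and uses kmeans on synthetic two-dimensional points; the data and the choice of two clusters are assumptions made for illustration.

```matlab
rng(1);   % reproducible synthetic data
n = 50;

% Two hypothetical groups of points; no labels are given to the algorithm
groupA = repmat([0 0], n, 1) + 0.2 * randn(n, 2);
groupB = repmat([3 3], n, 1) + 0.2 * randn(n, 2);
X = [groupA; groupB];

% Group the points based only on the input data
[idx, centers] = kmeans(X, 2, 'Replicates', 5);

% Cluster numbers are arbitrary, so compare memberships rather than numbers
sameWithinA = all(idx(1:n) == idx(1));
sameWithinB = all(idx(n+1:end) == idx(n+1));
separated   = idx(1) ~= idx(n+1);
disp(centers)
```

A supervised method would instead be given a label for each row, as in classification and regression.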
12
Apply Machine Learning techniques easily
Data:
– 3-axial Accelerometer data
– 3-axial Gyroscope data
Machine Learning
13
Data Analytics Workflow
Start locally …
Access, Explore, Share/Deploy
Work on your desktop
Start “simple”
Basic statistics
Explore data
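A first exploration pass might look like the sketch below. It assumes the carsmall sample data set that ships with Statistics and Machine Learning Toolbox, chosen here because it contains variables named MPG, Acceleration, Displacement, Weight, and Horsepower; whether the slides used this data set is an assumption.

```matlab
load carsmall   % sample data; defines MPG, Acceleration, Displacement, ...

names = {'MPG', 'Acceleration', 'Displacement', 'Weight', 'Horsepower'};
X = [MPG Acceleration Displacement Weight Horsepower];

% Basic statistics, ignoring missing values
colMean = mean(X, 1, 'omitnan');
colStd  = std(X, 0, 1, 'omitnan');
disp(array2table([colMean; colStd], ...
    'VariableNames', names, 'RowNames', {'Mean', 'Std'}));

% Pairwise correlations using only rows without missing values
R = corr(X, 'Rows', 'complete');
disp(array2table(R, 'VariableNames', names, 'RowNames', names));

% Scatter plot matrix for a visual overview
plotmatrix(X);
```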
14
Data Analytics Workflow
… prototype algorithms and then …
Access, Explore, Prototype, Share/Deploy
Work on your desktop
Interactive development
Advanced algorithms
15
Data Analytics Workflow
… scale up as needed
Access, Explore, Prototype, Scale, Share/Deploy
Scale to a cluster
Parallel Computing Tools
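One way to scale a prototype is to turn an ordinary for-loop with independent iterations into a parfor-loop. The sketch below is illustrative only: the per-iteration computation is a hypothetical stand-in for an expensive analysis. With Parallel Computing Toolbox the iterations can run on the workers of a parallel pool; without it, MATLAB runs the loop serially.

```matlab
nTrials = 8;
sumOfSquares = zeros(1, nTrials);   % preallocated, sliced output variable

% Each iteration is independent of the others, which parfor requires
parfor k = 1:nTrials
    sumOfSquares(k) = sum((1:k).^2);   % stand-in for an expensive analysis
end

disp(sumOfSquares)
```

The loop body is unchanged from the serial version, so the same code can be used on a desktop and on a cluster.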
16
Agenda
Data: Volume, Variety, Velocity
Techniques: Advanced Statistics, Machine Learning, Predictive Modelling
Goal: Decision Making
Workflow: Access, Explore, Prototype, Scale, Share/Deploy
17
A Primer on Deploying MATLAB Programs
Excel® add-ins
Desktop
MATLAB Production Server(s)
Web Server(s)
Web & Enterprise
• Royalty-free
• Encryption to protect intellectual property
18
Benefits of Deploying MATLAB Code
Domain experts maintain ownership of ideas, algorithms, and applications
Flexibility to integrate with different programming languages
Implement a common algorithm on different platforms
Avoid time-consuming and error-prone re-coding
Easily adopt algorithm improvements throughout lifecycle
19
Predictive Data Analytics – Load Demand Forecasting
20
Big Data and Predictive Analytics at Shell
21
STIWA Increases Total Production Output of Automation Machinery
Challenge
Apply sophisticated mathematical methods to optimize
automation machinery and increase total production output
Solution
Use AMS ZPoint-CI to collect large production data sets in
near real time and use MATLAB to analyze the data and
identify optimal trajectories
Results
Total cycle time reduced by 30%
Large data sets analyzed in seconds
Deployment to multiple machines streamlined
“Our shopfloor management system AMS
ZPoint-CI collects a huge amount of
machine, process, and product data 24 hours
a day. By analyzing this data immediately in
MATLAB and AMS Analysis-CI we have
achieved a tenfold increase in precision, a
30% reduction in total cycle time, and a
significant increase in production output.”
Alexander Meisinger
STIWA
STIWA’s shopfloor management
system, based on MATLAB, AMS
ZPoint-CI, and AMS Analysis-CI.
22
Key Takeaways
Access and preprocess large amounts of heterogeneous data
– Capabilities to deal with big data are available and continue to evolve
– Tools to organize data and automate the process
Develop advanced analytics with machine learning
– Advanced statistical and machine learning methods to gain insights
– Apps to rapidly iterate through and assess different models
Integrate analytics with your enterprise systems
– Parallel Computing and MapReduce to scale up as needed
– Application deployment on every scale to make models available to others