![Page 1: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/1.jpg)
© 2013 Datameer, Inc. All rights reserved.
Best Practices for Big Data Analytics with Machine Learning
![Page 2: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/2.jpg)
Dr. Alex Guazzelli has co-authored the first book on PMML, the Predictive Model Markup Language. At Zementis, Dr. Guazzelli is responsible for developing core technology and analytical solutions for Big Data and real-time scoring. Most recently, Dr. Guazzelli started teaching a class on standards for predictive analytics at UC San Diego Extension.
About our Speakers
Dr. Alex Guazzelli
Zementis Vice President, Analytics (@DrAlexGuazzelli)
![Page 3: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/3.jpg)
• Came from Informatica
• Worked with start-ups
• Informatica purchased to bring data solutions to market
• Data quality
• Master data management
• B2B
• Data security solutions
About our Speakers
• Over 15 years of enterprise software experience
• Co-authored 4 patents
• Worked in a variety of engineering, marketing and sales roles
• Bachelor of Science degree in Management Science and Engineering from Stanford University
Karen Hsu
Datameer Senior Director, Product Marketing (@Karenhsumar)
![Page 4: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/4.jpg)
Agenda
• Considerations
• Best Practices
• Demonstration
• Q&A
![Page 5: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/5.jpg)
Considerations
![Page 6: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/6.jpg)
Considerations
Target Users: Business, IT, Data Scientist
Questions: Descriptive, Predictive, Prescriptive
![Page 7: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/7.jpg)
■ Visual
Business Professional
Clustering
Decision Trees
Dependencies
+ More!
Target Users
![Page 8: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/8.jpg)
IT
▪Flexible, powerful
Target Users
![Page 9: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/9.jpg)
▪ Algorithms
▪ SAS, SPSS, R
Data Scientist
Target Users
![Page 10: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/10.jpg)
■ Descriptive machine learning…
– Tells you what has happened
Descriptive Predictive Prescriptive
Questions
![Page 11: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/11.jpg)
■ Predictive machine learning…
– Answers the question what will happen
Descriptive Predictive Prescriptive
Questions
![Page 12: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/12.jpg)
■ Prescriptive machine learning…
– What will happen, when it will happen, why it will happen
– Predict what will happen and prescribe how to take advantage of this future
Descriptive Predictive Prescriptive
Questions
![Page 13: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/13.jpg)
Best Practices
![Page 14: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/14.jpg)
Lean Analytics
![Page 15: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/15.jpg)
Data Preparation
![Page 16: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/16.jpg)
Descriptive Analytics
![Page 17: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/17.jpg)
Predictive Analytics
Descriptive vs. Predictive Analytics
Descriptive Analytics answers “What happened?” Predictive Analytics answers “What will happen next?”
Predictive Analytics helps you discover patterns in the past, which can signal what is ahead.
Predictive analytics is able to discover hidden patterns in historical data that the human expert may not see. It is in fact the result of mathematics applied to data. As such, it benefits from clever mathematical techniques as well as good data.
??
![Page 18: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/18.jpg)
Example: Predicting Churn
Matt - Churned 2 days ago
Scott - “Liked” our company last week
John - ??
![Page 19: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/19.jpg)
Churn-related features
Matt
• 3 complaints in last 6 months
• Opened 2 support tickets in last 4 weeks
• Spent a total of $1,234 buying merchandise
• Spent a total of $123 in services
• Purchased 2 items in last 4 weeks
• Is 34 years old
• Is a male
• Lives in Los Angeles
• ...
Scott
• No complaints in last 6 months
• Opened 1 support ticket in last 4 weeks
• Spent a total of $9,876 buying merchandise
• Spent a total of $987 in services
• Purchased 12 items in last 4 weeks
• Is 54 years old
• Is a male
• Lives in Chicago
• ...
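Before a model can use them, features like the ones listed for Matt and Scott are usually turned into a numeric vector. The following is a minimal Python sketch of that step. The `ChurnFeatures` record and its field names are hypothetical and invented here for illustration; only the values come from the slide.

```python
from dataclasses import dataclass


@dataclass
class ChurnFeatures:
    """Hypothetical record for the churn-related features on the slide."""
    complaints_6m: int        # complaints in last 6 months
    tickets_4w: int           # support tickets opened in last 4 weeks
    merchandise_spend: float  # total spent on merchandise, in dollars
    services_spend: float     # total spent on services, in dollars
    items_4w: int             # items purchased in last 4 weeks
    age: int
    is_male: bool
    city: str

    def to_vector(self):
        """Return the numeric features as a list of floats.

        The city is left out: a categorical field would need an encoding
        (for example one-hot), which this sketch does not attempt.
        """
        return [
            float(self.complaints_6m),
            float(self.tickets_4w),
            float(self.merchandise_spend),
            float(self.services_spend),
            float(self.items_4w),
            float(self.age),
            1.0 if self.is_male else 0.0,
        ]


# Values taken from the slide; the field layout is an assumption.
matt = ChurnFeatures(3, 2, 1234.0, 123.0, 2, 34, True, "Los Angeles")
scott = ChurnFeatures(0, 1, 9876.0, 987.0, 12, 54, True, "Chicago")

print(matt.to_vector())
print(scott.to_vector())
```

In practice the spend columns would likely be rescaled so that they do not dominate the smaller counts, but that choice depends on the modeling technique.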
![Page 20: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/20.jpg)
Big Data
An ever expanding ocean of data containing people and sensor data (lots and lots of it):
90% of the data today created in last 2 years
Breadth and Depth
• Transaction records
• Social media
• Climate information
• Mobile GPS signals
• Healthcare
• Smart Grid
Digital Breadcrumbs
![Page 21: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/21.jpg)
Churn-related “Big Data” features
Matt
• 12 friends listed as customers
• 2 complaints from friends in last 6 months
• Average age of friends is 41 years old
• 2 friends churned in last 30 days
• No purchases for same items as friends
• 1 website visit in last 7 days
• 2 website pages opened during last visit
• Opened 3 newsletters in last 6 months
• ...
Scott
• 34 friends listed as customers
• 1 complaint from friends in last 6 months
• Average age of friends is 62 years old
• No friends churned in last 30 days
• Purchased same 2 items as friends in last 2 months
• 3 website visits in last 7 days
• 5 website pages opened during last visit
• Opened 12 newsletters in last 6 months
• ...
![Page 22: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/22.jpg)
Building a predictive model ...
[Diagram labels: Model Training; Churn-related features; Churned / Not-churned; Predictive Model; Data Prediction; Input Layer, Hidden Layer, Output Layer]
• Neural Networks
• Linear/Logistic Regression
• Support Vector Machines
• Scorecards
• Decision Trees
• Clustering
• Association Rules
• K-Nearest Neighbors
• Naive Bayes Classifiers
• ...
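Model training can be sketched with one of the techniques listed above, logistic regression, written here in plain Python. Only the rows for Matt and Scott come from the slides, and treating Scott as "not churned" is an interpretation the slides imply but do not state. The other four rows, the two-feature layout, the learning rate, and the epoch count are assumptions made up for illustration; a real project would use an established library and far more data.

```python
import math


def predict_proba(weights, bias, x):
    """Probability of churn under a logistic model."""
    z = bias + sum(w * xi for w, xi in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, lr=0.05, epochs=2000):
    """Fit weights with batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict_proba(weights, bias, x) - y
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        # Step against the average gradient.
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias


# Features: (complaints in last 6 months, items purchased in last 4 weeks)
# Label: 1 = churned, 0 = not churned
rows = [
    (3, 2),   # Matt (from the slides)
    (0, 12),  # Scott (from the slides)
    (4, 1),   # hypothetical customer
    (2, 0),   # hypothetical customer
    (0, 8),   # hypothetical customer
    (1, 10),  # hypothetical customer
]
labels = [1, 0, 1, 1, 0, 0]

weights, bias = train_logistic(rows, labels)
matt_risk = predict_proba(weights, bias, (3, 2))
scott_risk = predict_proba(weights, bias, (0, 12))
print(f"Matt churn risk:  {matt_risk:.2f}")
print(f"Scott churn risk: {scott_risk:.2f}")
```

Once trained, the same `predict_proba` call could score a customer whose outcome is still unknown, which is the situation John is in on the earlier slide.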
![Page 23: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/23.jpg)
Why not several models?
Model Ensemble
Data Pre-Processing
Raw Inputs
Prediction
Scores from all models are computed
Majority Voting, Weighted Voting, Weighted Average, etc.
Model 1
Model 2
Model n
Voting
...
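The three combination rules named on this slide can be sketched in a few lines of Python. The function names, labels, scores, and weights below are all hypothetical and chosen only to illustrate the idea; they are not taken from any particular product.

```python
from collections import Counter, defaultdict


def majority_vote(predictions):
    """Return the label predicted by the most models."""
    return Counter(predictions).most_common(1)[0][0]


def weighted_vote(predictions, weights):
    """Return the label whose supporting models carry the most total weight."""
    totals = defaultdict(float)
    for label, weight in zip(predictions, weights):
        totals[label] += weight
    return max(totals, key=totals.get)


def weighted_average(scores, weights):
    """Combine numeric scores into a single weighted mean."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Hypothetical outputs of three models for one customer.
labels = ["churn", "no_churn", "churn"]
print(majority_vote(labels))

# One trusted model can outvote two weaker ones.
print(weighted_vote(["churn", "no_churn", "no_churn"], [0.6, 0.2, 0.1]))

# Numeric churn scores combined with hypothetical model weights.
print(weighted_average([0.9, 0.4, 0.2], [0.5, 0.3, 0.2]))
```

Voting suits models that output class labels, while a weighted average suits models that output numeric scores or probabilities.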
![Page 24: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/24.jpg)
End Goal: Predicting churn ...
Model Deployment and Execution in
Churn Risk Score
Churn-related Features
Big Data
Predictive Churn Model
![Page 25: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/25.jpg)
From Model Building to Model Deployment (Traditionally ...)
Scientist’s Desktop: SAS, R, IBM SPSS, Perl, Python
Production Environment: Java, .NET, C, SQL
Lost in Translation
SAS, R, IBM SPSS …
Great for model building but not for scoring, even more so when it comes to Hadoop
![Page 26: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/26.jpg)
From Model Building to Model Deployment (with PMML)
Model Building
Model Deployment and Execution
Angoss
BigML
FICO Model Builder
IBM SPSS
KNIME
KXEN
MicroStrategy
Open Data
Pervasive DataRush
RapidMiner
R / Rattle
SAS
SAP Business Objects
Salford Systems
StatSoft STATISTICA
SQL Server
TIBCO Spotfire
Custom Code, etc.
Universal PMML Plug-in (UPPI)
PMML (models)
Datameer Server
Deploy in minutes ...
![Page 27: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/27.jpg)
PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.
It is a mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.
PMML eliminates the need for custom model deployment and ensures reliability.
PMML defines a standard not only to represent data-mining models, but also data handling and data transformations (pre- and post-processing).
Predictive Model Markup Language
Models
DataTransformations
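Because PMML is XML, a model file can be read with any XML parser. The sketch below embeds a small regression model using element names from the PMML 4.1 schema (`DataDictionary`, `RegressionModel`, `RegressionTable`, `NumericPredictor`) and scores two records with it. The field names, intercept, and coefficients are hypothetical, and the hand-written `load_regression` and `score` helpers are only an illustration: they are not a compliant scoring engine and do not represent how any PMML product executes models.

```python
import xml.etree.ElementTree as ET

# Hypothetical model: all names and numbers here are made up for illustration.
PMML_TEXT = """
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Hypothetical churn score"/>
  <DataDictionary numberOfFields="3">
    <DataField name="complaints" optype="continuous" dataType="double"/>
    <DataField name="purchases" optype="continuous" dataType="double"/>
    <DataField name="churn_score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="churn_sketch" functionName="regression">
    <MiningSchema>
      <MiningField name="complaints"/>
      <MiningField name="purchases"/>
      <MiningField name="churn_score" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="0.5">
      <NumericPredictor name="complaints" coefficient="0.25"/>
      <NumericPredictor name="purchases" coefficient="-0.125"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def load_regression(pmml_text):
    """Extract the intercept and coefficients of the first regression table."""
    root = ET.fromstring(pmml_text.strip())
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    intercept = float(table.get("intercept"))
    coefficients = {
        p.get("name"): float(p.get("coefficient"))
        for p in table.findall("pmml:NumericPredictor", NS)
    }
    return intercept, coefficients


def score(intercept, coefficients, record):
    """Apply the linear formula: intercept + sum(coefficient * value)."""
    return intercept + sum(c * record[name] for name, c in coefficients.items())


intercept, coefficients = load_regression(PMML_TEXT)
print(score(intercept, coefficients, {"complaints": 3, "purchases": 2}))
print(score(intercept, coefficients, {"complaints": 0, "purchases": 12}))
```

The point of the sketch is that the model lives entirely in the XML document, so the tool that builds it and the system that scores with it only need to agree on the PMML format.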
![Page 28: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/28.jpg)
Neural Networks (neural gas, radial-basis and backpropagation)
Support Vector Machines (for classification and regression)
Naive Bayes Classifier (for continuous and categorical inputs)
Rule Set Models
Clustering Models (2-step clustering, distribution and center-based)
Decision Trees (for classification and regression)
General Regression Models (Cox, General and Generalized Linear Models)
Regression Models (Linear, Logistic and Polynomial Regression Models)
Scorecards (with support for Reason Codes)
Restricted Boltzmann Machines
Association Rules
Multiple Models (with the possibility of having models spread over multiple PMML files)
Model Ensemble (including Random Forest Models and Boosted Trees)
Model Segmentation
Model Chaining
Model Composition
Model Cascade
UPPI: Supported Techniques
© Zementis, Inc. - Confidential
![Page 29: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/29.jpg)
Demonstration Flow
Descriptive, Predictive Modeling, Prescriptive, Predictive Production
Karen, Alex, Karen, Karen
![Page 30: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/30.jpg)
Descriptive Analytics
![Page 31: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/31.jpg)
Descriptive Analytics
▪ Answers: What caused people to churn?
▪ Clustering
▪ Column Dependencies
▪ Decision Tree
![Page 33: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/33.jpg)
Predictive Analytics
![Page 34: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/34.jpg)
Predictive Analytics
▪ Who will churn?
![Page 36: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/36.jpg)
Prescriptive Analytics
![Page 37: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/37.jpg)
Prescriptive Analytics
▪ Who will churn? Why will they churn?
▪ What can we do to support that outcome?
![Page 39: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/39.jpg)
Q&A
![Page 40: Best practices machine learning final](https://reader037.vdocuments.us/reader037/viewer/2022110118/5555c507d8b42afe5d8b547e/html5/thumbnails/40.jpg)
Next Steps:
More about Datameer and Big Data: www.datameer.com
More about Zementis: www.zementis.com
Contact us:
Alex Guazzelli [email protected]
Karen Hsu [email protected]