@ODSC
OPEN DATA SCIENCE CONFERENCE
Boston | May 1-4, 2018

A Tour of Machine Learning Algorithms: The Usual Suspects in Some Unusual Applications

Kirk Borne (@KirkDBorne)
Principal Data Scientist, Booz Allen Hamilton
http://www.boozallen.com/datascience

Booz | Allen | Hamilton
ODSC Conference – Boston – May 2018
OUTLINE
• Goals of Machine Learning Applications
• Some Machine Learning Algorithms
• Some Atypical Applications
• Final Thoughts
@KirkDBorne
The Goal of Machine Learning
“…is to use algorithms to learn from data,
in order to build generalizable models that
give accurate classifications or predictions,
or to find patterns, particularly with new
and previously unseen data.”
(the key is GENERALIZATION!)
https://www.innoarchitech.com/machine-learning-an-in-depth-non-technical-guide/
How does a Data Scientist build a model of a complex dynamic system?

We might start by modeling a complex system like this…

We can add more features to model the system with higher fidelity…
Overfitting = bad machine learning!
http://www.holehouse.org/mlclass/10_Advice_for_applying_machine_learning.html

Generalization is key! (The Goldilocks model)
• g(x) is a poor fit (one straight line through all the points) = Underfitting (bias)
• h(x) is a good fit (it takes into account the natural variance in the data)
• d(x) is a very poor fit (it passes through every point) = Overfitting (it fits the natural variance)
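A minimal sketch of the Goldilocks idea (the deck contains no code, so the synthetic data and the degree choices here are assumptions for illustration): polynomial fits of increasing degree play the roles of g(x), h(x), and d(x).

```python
import numpy as np

# Synthetic noisy samples from a smooth underlying trend (an assumption)
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 15)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)

# g(x): degree-1 fit  -> underfits (high bias)
# h(x): degree-4 fit  -> a reasonable, generalizable fit
# d(x): degree-14 fit -> interpolates every point, overfits the noise
for name, degree in [("g(x)", 1), ("h(x)", 4), ("d(x)", 14)]:
    coeffs = np.polyfit(x, y, degree)
    train_mse = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"{name}: degree={degree:2d}, training MSE={train_mse:.4f}")

# The training error shrinks toward zero as the degree grows, but that
# is exactly the overfitting trap: low training error != generalization.
```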
Bias–Variance Tradeoff in Model Complexity
http://scott.fortmann-roe.com/docs/BiasVariance.html
Generalization Error
precisely right vs. generally right
precisely wrong vs. generally wrong
Schematic Approach to Avoiding Overfitting
[Schematic plot: Error vs. Training Epoch, showing the Training Set error curve falling while the Validation Set error curve turns upward]
To avoid overfitting, you need to know when to stop training the model. Although the Training Set error may continue to decrease, you may simply be overfitting the Training Data. Test this by applying the model to a Validation Data Set (not part of the Training Set). If the Validation Data Set error starts to increase, then you know that you are overfitting the Training Set, and it is time to stop!
STOP Training HERE!
Generalization is key!
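That stopping rule can be written down in a few lines. This is a hedged sketch, not from the deck: plain gradient descent on a deliberately overparameterized synthetic regression problem, with an assumed "patience" parameter deciding when the Validation Set error has stopped improving.

```python
import numpy as np

# Overparameterized synthetic task, so the training error keeps falling
# while the validation error eventually rises (assumptions for illustration)
rng = np.random.default_rng(1)
X = rng.normal(size=(80, 50))
true_w = np.zeros(50)
true_w[:5] = 1.0
y = X @ true_w + rng.normal(scale=1.0, size=80)
X_train, y_train = X[:40], y[:40]
X_val, y_val = X[40:], y[40:]

w = np.zeros(50)                 # model weights
lr, patience = 0.01, 10          # learning rate, early-stopping patience
best_val, since_best = np.inf, 0

for epoch in range(5000):
    # One gradient-descent step on the Training Set
    grad = X_train.T @ (X_train @ w - y_train) / len(y_train)
    w -= lr * grad
    # Monitor the error on the held-out Validation Set
    val_err = np.mean((X_val @ w - y_val) ** 2)
    if val_err < best_val:
        best_val, since_best = val_err, 0
    else:
        since_best += 1
    if since_best >= patience:   # Validation error stopped improving: STOP!
        print(f"STOP training at epoch {epoch}; best val MSE = {best_val:.4f}")
        break
else:
    print(f"No early stop within budget; best val MSE = {best_val:.4f}")
```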
Some Quick Definitions
• Machine Learning (ML) = mathematical algorithms that learn from experience (= pattern recognition from previous evidence).
• Data Mining = the application of ML algorithms to data.
• Artificial Intelligence (AI) = the application of ML algorithms to robotics and machines = taking actions based on data (#bots).
• Data Science = the application of the scientific method to discovery from data (including statistics, ML, and more: visual analytics, machine vision, computational modeling, semantics, graphs, network analysis, NLU, data indexing schemes [Google!], …).
• Analytics = the products of machine learning & data science.
“To back-propagate is A.I., but to know is Intelligence!”
It’s all about taking “Data to Action”. Example: a Decision Tree…
https://eight2late.wordpress.com/2016/02/16/a-gentle-introduction-to-decision-trees-using-r/
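The linked tutorial builds its decision tree in R; as a rough Python equivalent (the dataset choice is an assumption for illustration), a tree that turns data into human-readable decision rules can be fit in a few lines:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a small decision tree: data in, decision rules out
iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# The learned rules are readable "data to action" logic
print(export_text(tree, feature_names=list(iris.feature_names)))
```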
An “Easy Button” for Extracting Value from Data through Machine Learning
Exploiting the Data Value Chain: transform Digital Data to Information to Knowledge to Insights (and Action)
✓ From Sensors (Measurement & Data Collection) …
… Big Data (Deep, Fast, Wide)
✓ to Sentinels (Monitoring & Alerts = Information) …
… Machine Learning
✓ to Sense-making (Knowledge & Insight Discovery) …
… Data Science
✓ to Cents-making (Your Applications of Data = Action!)
… Analytics
… Productizing / Monetizing your Big Data
Typical Machine Learning Applications in our lives:
• PREDICT – Your Purchase Preferences, Recommender Systems, Credit Scoring, Smart Phone auto-complete…
• OPTIMIZE – Your Thermostat, Your Commute Time and Routing, Personalized Learning…
• DISCOVER – Your Health Issues (wearables), Your Best Deal (Bed & Breakfast or Restaurant)…
• DETECT – Your Social Sentiment, Identity Theft, Credit Card Fraud…
Typical Machine Learning Applications in the Enterprise:
• PREDICT – outcomes, events, prices, costs, risks, product demand…
• OPTIMIZE – processes, products, and people (delivery of services, supplies, personnel)…
• DISCOVER – insights in social media, documents, quarterly business reports, customer call records…
• DETECT – fraud, anomalies in safety events, behaviors, outbreaks, data usage (GDPR), systems (cybersecurity breaches)…
4 Types of Machine Learning Discovery from Data:
(Graphic by S. G. Djorgovski, Caltech)
1) Class Discovery: Find the categories of objects (population segments), events, and behaviors in your data. + Learn the rules that constrain the class boundaries (that uniquely distinguish them).
2) Correlation (Predictive and Prescriptive Power) Discovery: (insights discovery) – Find trends, patterns, dependencies in data that reveal the governing principles or behavioral patterns (the object’s “DNA”).
3) Outlier / Anomaly / Novelty / Surprise Discovery: Find the new, surprising, unexpected one-in-a-[million / billion / trillion] object, event, or behavior.
4) Association (or Link) Discovery: (Graph and Network Analytics) – Find both the typical (usual) and the atypical (unusual, interesting) data associations / links / connections in your domain.
5 Levels of Analytics Maturity in Data-Driven Applications
1) Descriptive Analytics – Hindsight (What happened?)
2) Diagnostic Analytics – Oversight (real-time: What is happening? Why did it happen?)
3) Predictive Analytics – Foresight (What will happen?)
4) Prescriptive Analytics – Insight (How can we optimize what happens?) (Follow the dots / connections in the graph!)
5) Cognitive Analytics – Right Sight (the 360 view: what is the right question to ask for this set of data in this context? = the Game of Jeopardy)
   – Finds the right insight, the right action, the right decision… right now!
   – Moves beyond simply providing answers, to generating new questions and hypotheses.
Predictive vs Prescriptive: What’s the Difference?

PREDICTIVE Analytics: Find a function (i.e., the model) f(d,t) that predicts the value of some predictive variable y = f(d,t) at a future time t, given the set of conditions found in the training data {d}.
=> Given {d}, find y.

PRESCRIPTIVE Analytics: Find the conditions {d’} that will produce a prescribed (desired, optimum) value y at a future time t, using the previously learned conditional dependencies among the variables in the predictive function f(d,t).
=> Given y, find {d’}.

Confucius says… “Study your past to know your future.”
Baseball philosopher Yogi Berra says… “The future ain’t what it used to be.”
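The two directions can be demonstrated side by side. A hedged sketch (no code appears in the deck; the linear model and the synthetic conditions are assumptions): fit f(d) from training data, then invert the learned model by optimization to find conditions {d'} that yield a desired y.

```python
import numpy as np
from scipy.optimize import minimize

# Learn a predictive model y = f(d) from training conditions {d}
# (a simple least-squares linear model on synthetic data)
rng = np.random.default_rng(2)
D = rng.uniform(0, 10, size=(100, 2))            # training conditions {d}
y = 3.0 * D[:, 0] - 1.5 * D[:, 1] + rng.normal(scale=0.1, size=100)
coef, *_ = np.linalg.lstsq(np.c_[D, np.ones(100)], y, rcond=None)

def f(d):
    """The learned predictive function f(d)."""
    return coef[0] * d[0] + coef[1] * d[1] + coef[2]

# PREDICTIVE: given conditions {d}, find y
print("Given d=[4, 2], predicted y =", round(f([4, 2]), 2))

# PRESCRIPTIVE: given a desired y, search for conditions {d'} that the
# learned model maps to that value (inverting f by optimization)
y_target = 10.0
result = minimize(lambda d: (f(d) - y_target) ** 2, x0=[1.0, 1.0])
print("To obtain y =", y_target, "use conditions d' =", result.x.round(2))
```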
Predictive Analytics is currently the most significant application of Machine Learning (*)
(*) The set of mathematical algorithms that learn (patterns) from experience (data)
Source for graphic: https://data-flair.training/blogs/machine-learning-applications/

Predictive Analytics is everywhere in Business Data and Machine Learning (AI) Strategy Discussions
Source for graphic: https://www.altexsoft.com/blog/datascience/machine-learning-strategy-7-steps/
Traditional Time Series Forecasting: prediction based on historical patterns.
Source: https://medium.com/99xtechnology/time-series-forecasting-in-machine-learning-3972f7a7a467

Traditional Time Series Forecasting: Autoregressive (the uncertainty in the prediction can be large. Uncertainty!)
Source: https://peltiertech.com/excel-fan-chart-showing-uncertainty-in-projections/

Traditional Time Series Forecasting: Autoregressive (assumes that future time series values depend on the past values from the same series).
Source: http://ucanalytics.com/blogs/step-by-step-graphic-guide-to-forecasting-through-arima-modeling-in-r-manufacturing-case-study-example/
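As a hedged illustration of the autoregressive idea (not from the deck; the AR(2) coefficients and data are assumptions), statsmodels can fit a model in which each value depends on past values of the same series:

```python
import numpy as np
from statsmodels.tsa.ar_model import AutoReg

# Synthetic series where each value depends on the two previous values
rng = np.random.default_rng(3)
y = np.zeros(200)
for t in range(2, 200):
    y[t] = 0.6 * y[t - 1] + 0.3 * y[t - 2] + rng.normal(scale=0.5)

# Fit an autoregressive model on the first 180 points...
model = AutoReg(y[:180], lags=2).fit()
# ...and forecast the next 20 purely from the series' own past
forecast = model.predict(start=180, end=199)
print("Forecast for the next 5 steps:", forecast[:5].round(3))
```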
Traditional Time Series Forecasting: Even with very high-fidelity physics-based models, uncertainty in prediction can be large!
Source: https://www.reddit.com/r/weather/comments/6xecax/tracking_hurricane_irma/

Advances in Predictive, Prescriptive, and Cognitive Analytics provide us with More Ways to See Around Corners.
Source for image: https://www.hausmanmarketingletter.com/translating-analytics-to-action/
“You can see a lot by just looking” (and you can see around corners!)
Cognitive, Contextual, Insightful, Forecastful
https://www.speedcafe.com/2017/07/12/f1-demo-take-place-london-streets/

The best forecasting methodology is not autoregressive and univariate y = f(t)…
Rather, it is context-based (using rich contextual data). The Internet of Things (IoT) will provide rich contextual metadata (“other data about your data”) for seeing around corners and for better forecasting!
IoT will power The Internet of Context!
Source for graphic: https://www.iotcentral.io/blog/direct-integration-between-the-physical-world-computer-system
OUTLINE
• Goals of Machine Learning Applications
• Some Machine Learning Algorithms
• Some Atypical Applications
• Final Thoughts
@KirkDBorne
Machine Learning Algorithms
1) Principal Component Analysis (PCA)
2) Regression Analysis
3) Graph Mining (Network Analysis)
4) Clustering and Validation
5) Association Mining (Link Analysis)
6) Statistical Clustering
7) Bayesian Belief Networks
8) My Data Science Career ‘Aha!’ moment
Machine Learning Algorithms
0) Counting
1) Principal Component Analysis (PCA)
2) Regression Analysis
3) Graph Mining (Network Analysis)
4) Clustering and Validation
5) Association Mining (Link Analysis)
6) Statistical Clustering
7) Bayesian Belief Networks
8) My Data Science Career ‘Aha!’ moment

0) Counting!
Remember… “All models should be as simple as possible, but no simpler!” – A. Einstein
1) Principal Component Analysis (PCA)
PCA vs ICA
The initial impression is that the data are extended in only one direction (one principal component).
But, there are 2 independent correlations here… hence there are 2 signal sources!
Cocktail Party Problem: an example of Class Discovery using ICA (Independent Component Analysis: Blind Source Separation).
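A minimal sketch of the Cocktail Party example (not from the deck; the synthetic source signals and the mixing matrix are assumptions): PCA finds directions of maximum variance, while ICA unmixes the statistically independent sources.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

# Two independent source signals (the "two voices" at the cocktail party)
t = np.linspace(0, 8, 2000)
sources = np.c_[np.sin(2 * t), np.sign(np.sin(3 * t))]  # sine + square wave

# Mix them linearly, as two microphones would record overlapping voices
mixing = np.array([[1.0, 0.5], [0.5, 1.0]])
observed = sources @ mixing.T

# PCA: directions of maximum variance; ICA: independent components
pca_est = PCA(n_components=2).fit_transform(observed)
ica_est = FastICA(n_components=2, random_state=0).fit_transform(observed)

# Correlate recovered components with the true sources (up to sign/scale)
for name, est in [("PCA", pca_est), ("ICA", ica_est)]:
    corr = np.abs(np.corrcoef(est.T, sources.T)[:2, 2:])
    print(name, "max |correlation| with each true source:", corr.max(axis=0).round(2))
```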
2) Regression Analysis
Trend Lines in big data sets: Descriptive Analytics! It is tempting to over-fit every wiggle in the data.
[Scatter plot: Boiling Point vs. Melting Point for the 92 naturally occurring elements, with an over-fitted trend line]
This is a better fit to the trend line… (generalization!) for use in Predictive Analytics!
[Same scatter plot, now with a smooth straight-line fit]
Trend Line and Outliers: where is the real discovery?
Sometimes we are tempted to think that outliers are just noise or natural variance.
[Scatter plot: Boiling Points and Melting Points of the 92 Chemical Elements]
Trend Line and Outliers: add some context to the data!
That diagonal line in the plot (where melting point = boiling point) provides some context (related to your prior knowledge)!
[Same scatter plot, with the diagonal line melting point = boiling point overlaid]
Trend Line and Outliers: what is that point below the line?
[Same scatter plot, highlighting one point below the diagonal line]
Trend Line and Outliers: there’s the real discovery!
Arsenic: melts at 1089 K, boils at 889 K (its boiling point lies below its melting point).
[Same scatter plot, with Arsenic labeled below the diagonal line]
Novelty Discovery!
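A hedged sketch of this trend-line-plus-outlier workflow (the element-like data below are synthetic assumptions; the real melting/boiling values are not reproduced): fit a simple regression, then flag the points whose residuals are far off the trend.

```python
import numpy as np

# Synthetic stand-in for the melting/boiling point data, with one
# planted "Arsenic-like" outlier below the trend (assumptions only)
rng = np.random.default_rng(5)
melt = rng.uniform(50, 4000, size=92)
boil = 1.25 * melt + 400 + rng.normal(scale=150, size=92)
boil[40] = melt[40] - 200        # the planted outlier

# Fit the trend line (simple least-squares regression)
slope, intercept = np.polyfit(melt, boil, 1)
residuals = boil - (slope * melt + intercept)

# Novelty discovery: flag points more than 3 robust sigmas off the trend
sigma = 1.4826 * np.median(np.abs(residuals - np.median(residuals)))
outliers = np.where(np.abs(residuals) > 3 * sigma)[0]
print("Outlier indices:", outliers)   # should include the planted point, 40
```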
3) Graph Mining (Network Analysis)
(Graphic by Cray, for the Cray Graph Engine CGE)
http://www.cray.com/products/analytics/cray-graph-engine
“All the World is a Graph” – Shakespeare?
The natural data structure of the world is not rows and columns, but a Graph!
The Human Connectome Project: mapping and linking the major pathways in the brain. http://www.humanconnectomeproject.org
Producing A Smarter Data Narrative
• It is best when we understand our data’s context and meaning…
• … the Semantics! This is based on Ontologies.
• My students memorized the definition of an Ontology…
  – “is_a formal, explicit specification of a shared conceptualization.” (from Tom Gruber, Stanford)
• Semantic “facts” can be expressed in a database as RDF triples:
  {subject, predicate, object} = {noun, verb, noun}
Simple Example of the Power of Graph: Semi-Metric Space
• Entity {1} is linked to Entity {2} (small distance A)
• Entity {2} is linked to Entity {3} (small distance B)
• Entity {1} is *not* linked directly to Entity {3} (Similarity Distance C = infinite)
• The Similarity Distances A, B, and C violate the triangle inequality!
• The connection between black-hat entities {1} and {3} never appears explicitly within a transactional database.
• Examples: (a) Medical Research Discoveries across disconnected journals, through linked semantic assertions; (b) Customer Journey modeling; (c) Safety Incident Causal Factor Analysis; (d) Marketing Attribution Analysis; (e) Fraud networks, Illegal goods trafficking networks, Money-Laundering networks.
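A small sketch of the semi-metric idea using networkx (the deck shows no code; edge weights are assumptions): the direct distance between {1} and {3} is "infinite", yet the path through {2} is short, and that gap is the signal worth investigating.

```python
import networkx as nx

# Build the three-entity similarity graph from the slide
G = nx.Graph()
G.add_edge(1, 2, distance=0.1)   # small distance A
G.add_edge(2, 3, distance=0.2)   # small distance B
# No direct edge {1}-{3}: the direct similarity distance C is "infinite"

# The indirect path exposes the hidden association {1} ~ {3}
path = nx.shortest_path(G, 1, 3, weight="distance")
length = nx.shortest_path_length(G, 1, 3, weight="distance")
print("Hidden link:", path, "with path distance", length)
# 0.1 + 0.2 = 0.3 << infinity: the triangle inequality is violated,
# which is exactly the "semi-metric" signal that merits a closer look.
```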
analytics.gmu.edu (CDDA Spring 2014 Workshop)
Research Example: Literature-Based Discovery (LBD)
References:
• https://www.sciencedirect.com/science/article/pii/S0950705116303860
• https://summerofhpc.prace-ri.eu/introducing-lbdream-and-literature-based-discovery/

Research Example: Discovery in the NIH-NLM Semantic MEDLINE Database
Project Description: Conduct semantic graph mining of the NIH-NLM metadata repository from ~26 million medical research articles.
Graph Database: ~90 million RDF triples (predications; semantic assertions).
Research Project: (PhD dissertation at GMU) Novel subgraph discovery; Context-based discovery; New concept emergence in medical research; Story discovery in linked graph networks; and Hidden knowledge discovery through semi-metrics.
https://skr3.nlm.nih.gov/SemMedDB/
4) Clustering and Validation
Clustering = the process of partitioning a set of data into subsets
(segments or clusters) such that a data element belonging to any
chosen cluster is more similar to data elements belonging to
that cluster than to data elements belonging to other clusters.
= Group together similar items + separate the dissimilar items
= Identify similar characteristics, patterns, or behaviors
among subsets of the data elements.
Challenges:
#1) No prior knowledge of the number of clusters.
#2) No prior knowledge of the semantic meaning of the clusters.
#3) Different clusters are possible from the same data set!
#4) Different clusters are possible using different similarity metrics.
Types of Clustering
In general terms, there are two approaches to clustering:
• Partitional – One set of clusters is created (e.g., K-Means clustering: choose K, the number of clusters).
• Hierarchical – Nested sets of clusters are created sequentially.
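A minimal partitional-clustering sketch (not from the deck; the synthetic blobs and K=3 are assumptions): with K-Means, K is chosen up front and one flat set of clusters comes out.

```python
import numpy as np
from sklearn.cluster import KMeans

# Three well-separated synthetic blobs (an assumption for illustration)
rng = np.random.default_rng(6)
X = np.vstack([rng.normal(loc=c, scale=0.4, size=(50, 2))
               for c in ([0, 0], [4, 4], [0, 5])])

# Partitional clustering: choose K up front, get one set of clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("Cluster sizes:", np.bincount(kmeans.labels_))
print("Cluster centers:\n", kmeans.cluster_centers_.round(2))
```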
Example of Hierarchical Clustering
[Dendrogram figure: starting with (a) and going to (e) is Bottom-up, Agglomerative Clustering; starting with (e) and going to (a) is Top-down, Divisive Clustering]
The “Google Maps” view for your Data Space.

Hierarchical Clustering Methods
Clusters are created at multiple levels, creating a new set of clusters at each level. There are 2 types of hierarchical clustering:
• Agglomerative Clustering (Bottom-Up): Initially, each item is in its own cluster. Then, clusters are merged together iteratively, based upon the similarity of the data items.
• Divisive Clustering (Top-Down): Initially, all items are in one cluster. Then, large clusters are successively divided, based upon the distance between data items.
Segmentation of One = ‘SegOne’ Marketing. Marketing Campaign Segments = Customer Personas.
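A hedged sketch of the agglomerative (bottom-up) variant with SciPy (the two-blob data are an assumption): build the full merge hierarchy once, then cut it at any level to get a flat segmentation.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Small synthetic customer-like dataset (assumption for illustration)
rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])

# Agglomerative (bottom-up): each point starts in its own cluster and
# the two most similar clusters are merged at every step.
Z = linkage(X, method="ward")

# Cut the resulting hierarchy at any level to get flat segments
for k in (2, 4):
    labels = fcluster(Z, t=k, criterion="maxclust")
    print(f"k={k}: cluster sizes =", np.bincount(labels)[1:])
```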
Cluster Evaluation
Remember that Clustering has lots of issues!
• The number of clusters is not known.
• There might not exist a “correct” number of clusters.
• Results depend on which attributes are selected.
• Results depend on the choice of distance/similarity metric.
Therefore, there is no “correct” set of clusters. So, how do you know what is a good set of clusters?
How to know if your clusters are good enough
Reference: http://www.biomedcentral.com/content/supplementary/1471-2105-9-90-S2.pdf
You know the clusters are good…
• … if the clusters are compact relative to their separation
• … if the clusters are well separated from one another
• … if the “within cluster” errors are small (low variance within)
• … if the number of clusters is small relative to the number of data points
Various measures of cluster compactness exist, including the Dunn index, the C-index, and the DBI (Davies-Bouldin Index).
Application of the Davies-Bouldin Index
• Assume K (the number of clusters) and assume other things (choice of clustering algorithm; choice of clustering feature attributes; etc.). Measure the DBI.
• Test another set of values for the cluster input parameters (K, feature attributes, etc.). Measure the DBI.
• … continue iterating like this until you find the set of cluster input parameters that yields the best (minimum) value of the DBI.
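This iterate-and-measure loop is easy to express in code. A hedged sketch (the synthetic blobs with a "true" K of 4 are an assumption), using scikit-learn's built-in DBI:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

# Synthetic data with a "true" K of 4 (assumption for illustration)
X, _ = make_blobs(n_samples=400, centers=4, cluster_std=0.8, random_state=0)

# Sweep the input parameter K and keep the K that minimizes the DBI
scores = {}
for k in range(2, 9):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    scores[k] = davies_bouldin_score(X, labels)
    print(f"K={k}: DBI={scores[k]:.3f}")
print("Best K (minimum DBI):", min(scores, key=scores.get))
```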
Scientific Discovery from Cluster Analysis of data parameters from events on the Sun and around the Earth
Cluster Analysis: find the clusters, then evaluate them.
[Plot: Davies-Bouldin Index vs. Time Shift, i.e., the delay (hr) of Dst from Vsw and Bz, with curves for 2, 3, and 4 clusters and their average]
Figure 10. Davies-Bouldin index for various time delays of Dst from Vsw and Bz, for cases of 2 (blue), 3 (red), and 4 (yellow) clusters, and the overall average (purple), indicating an optimal delay of ~2-3 hours for Dst.
Good Clusters = small size relative to the cluster separation.
DISCOVERY! Solar wind events have the strongest association (i.e., the tightest clusters) with the space plasma events within the Earth’s magnetosphere about 2-4 hours after a major plasma outburst occurs on the Sun.
5) Association Mining (Link Analysis)
4 Examples of Big Data Association Mining:
The goal of Rec Sys algorithms is Diversity!
Example #1: The classic textbook example of data mining (legend?): data mining of grocery store logs indicated that men who buy diapers also tend to buy beer at the same time.

Example #2: Amazon.com mines its customers’ purchase logs to recommend books to you: “People who bought this book also bought this other one.”

Example #3: Netflix mines its video rental history database to recommend rentals to you, based upon other customers who rented movies similar to yours.

Example #4: Wal-Mart studied product sales in their Florida stores in 2004, when several hurricanes passed through Florida. Wal-Mart found that, before the hurricanes arrived, people purchased 7 times as many of {one particular product} compared to everything else. That product was… strawberry pop tarts!
Strawberry pop tarts???
http://www.nytimes.com/2004/11/14/business/yourmoney/14wal.html
http://www.hurricaneville.com/pop_tarts.html
http://bit.ly/1gHZddA
Association Rule Mining for Hurricane Intensification Prediction
• Research by GMU geoscientists.
• Predict the final strength of a hurricane at landfall.
• Find the co-occurrence of final hurricane strength with specific values of the measured physical properties of the hurricane while it is still over the ocean.
• Result: the association mining model prediction is better than the National Hurricane Center prediction!
• Research Paper by GMU scientists: https://ams.confex.com/ams/pdfpapers/84949.pdf
A Statistical Data Puzzle

The Island of Games Puzzle: can you find a pattern in the player ratings data?
http://www.datasciencecentral.com/profiles/blogs/island-of-games-puzzle-problem-statement

Solution to the Island of Games Puzzle
Island of Games: color-coding the player ratings data distribution.
Green and Red = high Rubik’s Cube ranking (> 0.5); Blue and Yellow = low Rubik’s Cube ranking (< 0.5).
The intrinsic patterns in the player ratings data are NOT revealed in 2-D scatter plots or by using traditional statistical methods. Finally, exploration in the 3-D input parameter space (of player ratings for 3 games) reveals the actual player groupings…

3-D view of the Player Ratings Data: the true 3-D data distribution = 4 separable groups!
http://www.datasciencecentral.com/profiles/blogs/island-of-games-puzzle-problem-statement
Data Visualization Revelations

Solving the Island of Games Puzzle with a sequence of cluster models
Reference: Dr. Joseph Marr, GMU
1-Dimensional Exploration – statistical dependence tests: all outcomes are equally likely (50%)

Outcome   Count   Probability
x=H        495    P(x=H) = 0.495
y=H        524    P(y=H) = 0.524
z=H        521    P(z=H) = 0.521
x=L        505    P(x=L) = 0.505
y=L        476    P(y=L) = 0.476
z=L        479    P(z=L) = 0.479
2-Dimensional Exploration – statistical dependence tests: pair-wise associations are also random

PAIRS:                 Joint   Product   LIFT
 1  P(x=H,y=H)         0.270    0.259    1.041
 2  P(x=H,y=L)         0.225    0.236    0.955
 3  P(x=L,y=H)         0.254    0.265    0.960
 4  P(x=L,y=L)         0.251    0.240    1.044
 5  P(x=H,z=H)         0.270    0.258    1.047
 6  P(x=H,z=L)         0.225    0.237    0.949
 7  P(x=L,z=H)         0.251    0.263    0.954
 8  P(x=L,z=L)         0.254    0.242    1.050
 9  P(y=H,z=H)         0.270    0.273    0.989
10  P(y=H,z=L)         0.254    0.251    1.012
11  P(y=L,z=H)         0.251    0.248    1.012
12  P(y=L,z=L)         0.225    0.228    0.987
(Joint = observed joint probability; Product = product of the marginal probabilities; LIFT = Joint / Product.)
3-Dimensional Exploration – statistical dependence tests: 3-way combinations are gold!!!

TRIPLES:                  Joint   Product   LIFT
1  P(x=H,y=H,z=H)         0.270    0.135    1.998
2  P(x=H,y=H,z=L)         0.000    0.124    0.000
3  P(x=H,y=L,z=H)         0.000    0.123    0.000
4  P(x=H,y=L,z=L)         0.225    0.113    1.994
5  P(x=L,y=H,z=H)         0.000    0.138    0.000
6  P(x=L,y=H,z=L)         0.254    0.127    2.004
7  P(x=L,y=L,z=H)         0.251    0.125    2.004
8  P(x=L,y=L,z=L)         0.000    0.115    0.000

Association Mining also discovers voids (anti-Associations = XOR relationships).
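A hedged sketch of the lift computation behind these tables (the actual puzzle data are not reproduced; here a 3-way XOR-style dependence is planted in synthetic H/L ratings so that 1-way and 2-way statistics look random while the triples stand out):

```python
import numpy as np
from itertools import product

# Synthetic H/L ratings with a hidden 3-way dependence: z = XOR(x, y)
rng = np.random.default_rng(8)
x = rng.integers(0, 2, 1000)
y = rng.integers(0, 2, 1000)
z = x ^ y

def lift(*events):
    """Lift = joint probability / product of the marginal probabilities."""
    joint = np.mean(np.logical_and.reduce(events))
    indep = np.prod([np.mean(e) for e in events])
    return joint / indep if indep > 0 else float("nan")

# Pairs look independent (lift ~ 1)...
print("lift(x=H, y=H) =", round(lift(x == 1, y == 1), 2))
# ...but the triples reveal the structure (lift ~ 2, or exactly 0: a void)
for a, b, c in product([0, 1], repeat=3):
    print(f"lift(x={a}, y={b}, z={c}) =", round(lift(x == a, y == b, z == c), 2))
```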
6) Statistical Clustering
All of the features in the data histogram convey valuable (actionable) information (the long tail, outliers, multi-modal peaks, …).
[Histogram figure: a multi-modal data distribution]
Mixture Models = Statistical Clustering
Each of these data histograms can be represented by the mixture (i.e., the sum) of several Gaussian normal distributions, such as the 3 Gaussian distributions shown in the lower right. Each Gaussian statistically represents (characterizes) one “cluster” of data values within the full set of data values.
Comprehensive web resource for Mixture Models for clustering and unsupervised learning in Data Mining:
http://www.csse.monash.edu.au/~dld/mixture.modelling.page.html
Statistical Clustering tags (characterizes) the data, enabling discovery: making the data “smart”!
Each Gaussian in the mixture can be characterized by various parameters, such as the mean, the variance (standard deviation), and the amplitude (i.e., the strength of that particular Gaussian component within the mixture). These parameters can be plotted as a function of some independent (treatment) variable, to discover trends and correlations in the effects across the different segments of the population.
https://www.researchgate.net/publication/6200224_Conformational_entropy_in_molecular_recognition_by_proteins
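A minimal sketch of fitting such a mixture (not from the deck; the three component means, sigmas, and weights below are assumptions): scikit-learn's GaussianMixture recovers the per-component parameters and can then "tag" every data value.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# A multi-modal sample: the sum of 3 Gaussian populations (assumptions)
rng = np.random.default_rng(9)
data = np.concatenate([rng.normal(-4, 0.8, 300),
                       rng.normal(0, 1.2, 500),
                       rng.normal(5, 0.6, 200)]).reshape(-1, 1)

# Fit a 3-component Gaussian mixture: each component statistically
# characterizes one "cluster" of data values.
gmm = GaussianMixture(n_components=3, random_state=0).fit(data)
for w, mu, var in zip(gmm.weights_, gmm.means_.ravel(), gmm.covariances_.ravel()):
    print(f"weight={w:.2f}, mean={mu:+.2f}, sigma={np.sqrt(var):.2f}")

# "Tagging the data": assign each value to its most likely component
labels = gmm.predict(data)
print("Component sizes:", np.bincount(labels))
```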
7) Bayesian Belief Networks
Bayes Theorem
• Bayes Theorem: P(A|B) = P(B|A) P(A) / P(B)
• The Naïve Bayes assumption: the features are treated as conditionally independent, given the class.
http://www.datasciencecentral.com/profiles/blogs/6-easy-steps-to-learn-naive-bayes-algorithm-with-code-in-python
• Bayes Theorem… now with Legos:
https://www.countbayesie.com/blog/2015/2/18/bayes-theorem-with-lego
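The first linked post works its example in Python; as a compact hedged equivalent (the dataset choice is an assumption), a Naïve Bayes classifier applies Bayes theorem under the independence assumption and returns posterior class probabilities:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Naive Bayes: apply Bayes theorem with the "naive" assumption that
# the features are conditionally independent, given the class.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

clf = GaussianNB().fit(X_tr, y_tr)
print("Held-out accuracy:", round(clf.score(X_te, y_te), 3))
# Posterior class probabilities P(class | features) for one sample:
print("P(class | x):", clf.predict_proba(X_te[:1]).round(3))
```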
Bayes Theorem… for missing value imputation
• Bad idea: inserting estimated values of missing data elements = Data Creation!
• Better idea: predicting a value that is not knowable in advance = Predictive Analytics!

Bayes Belief Networks… for missing value imputation. Example:
• Use all of the conditional probabilities across all database attributes to predict the missing value.

Bayes Belief Networks for Cosmology… for missing value imputation: Galaxy Redshifts
• The problem: less than 0.1% of catalogued galaxies have a measured redshift = Distance!
• Bigger sky surveys are coming in the next 10 years…
• Less than 0.001% of galaxies will have a distance estimate!!
• Traditional method: use the colors of galaxies (red-shift estimation from colors).
• BBN method: use all properties of galaxies (shape, size, color, texture, concentration, …).
• Result: generate a probability distribution curve for the redshift (distance estimate) of each and every galaxy.
• Consequence: build a mass distribution map of the Universe!
8) My Data Science Career ‘Aha!’ moment
My Data Science Career “Aha!” moment
Data Mining… what?
http://dssresources.com/cases/DSScatalog.html#ADSCOUT
http://www.moneyballmarketer.com/blog/2017/12/28/data-analytics-lessons-from-the-nbas-first-data-mining-product-from-ibm-1
OUTLINE
• Goals of Machine Learning Applications
• Some Machine Learning Algorithms
• Some Atypical Applications
• Final Thoughts
@KirkDBorne
In the Big Data era, Everything is Quantified and Monitored:
– Populations & Persons
– Smart Cities, Energy, Grids, Farms, Highways
– Environmental Sensors
– IoE = Internet of Everything!
Discovery through Machine Learning and Data Science:
– Class Discovery, Correlation Discovery, Novelty Discovery, and
– Association Discovery: find interesting cases where condition X is associated with event Y with time shift Z.
The 17 SDGs are KPIs for the World! (Currently, the SDGs have 229 Key Performance Indicators.) (SDG: Sustainable Development Goal)
Big Data + the IoT + Citizen Data Scientists = Partners in Sustainability
The Internet of Things (IoT): knowing the knowable via deep, wide, and fast data from ubiquitous sensors!
Big Data: Sustainability Development Goals
http://www.unglobalpulse.org
Booz | Allen | Hamilton
@KirkDBorne
@BoozDataScience

READ, BUILD, and EXPLORE: www.boozallen.com/datascience
• Tips for Building a Data Science Capability
• The Mathematical Corporation
• 10 Signs of Data Science Maturity
• The Field Guide to Data Science
• The Data and Analytics Catalyst
Explore: sailfish.boozallen.com
PARTICIPATE: datasciencebowl.com
LISTEN: …Learn how AI and Machine Intelligence empower The Mathematical Corporation

THANK YOU! Check out some of these resources…
These slides are here: http://www.kirkborne.net/ODSC2018/

Thank you! Contact information, for further questions or inquiries:
Dr. Kirk Borne
Principal Data Scientist, Booz Allen Hamilton
Twitter: @KirkDBorne or Email: kirk.borne@gmail.com
These slides are here: http://www.kirkborne.net/ODSC2018/
ODSC Conference – Boston – May 2018