Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization
TRANSCRIPT
![Page 1: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/1.jpg)
Survival of the Fittest - Using Genetic
Algorithm for Data Mining Optimization
July 25, 2013
Or Levi
![Page 2: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/2.jpg)
Introduction
•Better Results
•Higher Accuracy
•Knowledge
• Insights
Big Data
Machine Learning on eBay
Data Mining Optimization
Genetic Algorithm
![Page 3: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/3.jpg)
Agenda
1. What is a Genetic Algorithm?
2. How can GA help improve Cluster Analysis?
3. Where might it be useful? An eBay Use Case
4. Questions and Answers
![Page 4: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/4.jpg)
Genetic Algorithm
A Search Heuristic Inspired by Natural Evolution
![Page 5: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/5.jpg)
Genetic Algorithm
Chromosome (Solution Representation): 0 0 1 0 0 1 0 (7 Genes), Adi
Neck Length (Fitness Value): 5’1
Environment (Natural Selection Mechanism): Tall Trees, Competition
![Page 6: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/6.jpg)
Genetic Algorithm
Initial Population
| Name | Chromosome |
| --- | --- |
| Joe | 1 0 0 1 1 0 0 |
| Zoe | 0 1 0 1 0 1 1 |
| Ron | 1 1 0 1 1 1 0 |
| Tom | 1 0 1 0 1 0 1 |
| Adi | 0 0 1 0 0 1 0 |
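A minimal sketch of how an initial population like the one on this slide could be generated: five chromosomes of seven random bits each. The function names and the fixed seed are illustrative assumptions, not part of the talk.

```python
import random


def random_chromosome(n_genes, rng):
    """Return a list of n_genes random bits (0 or 1)."""
    return [rng.randint(0, 1) for _ in range(n_genes)]


def initial_population(size, n_genes, seed=None):
    """Build `size` random chromosomes; `seed` only makes the run repeatable."""
    rng = random.Random(seed)
    return [random_chromosome(n_genes, rng) for _ in range(size)]


# Five individuals with seven genes each, as on the slide.
population = initial_population(size=5, n_genes=7, seed=0)
for chromosome in population:
    print(" ".join(map(str, chromosome)))
```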
![Page 7: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/7.jpg)
Genetic Algorithm
Fitness Function
| Name | Chromosome | Neck Length |
| --- | --- | --- |
| Joe | 1 0 0 1 1 0 0 | 5’6 |
| Zoe | 0 1 0 1 0 1 1 | 4’2 |
| Ron | 1 1 0 1 1 1 0 | 5’8 |
| Tom | 1 0 1 0 1 0 1 | 4’9 |
| Adi | 0 0 1 0 0 1 0 | 5’1 |
![Page 8: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/8.jpg)
Genetic Algorithm
Selection
| Name | Chromosome | Neck Length |
| --- | --- | --- |
| Joe | 1 0 0 1 1 0 0 | 5’6 |
| Zoe | 0 1 0 1 0 1 1 | 4’2 |
| Ron | 1 1 0 1 1 1 0 | 5’8 |
| Tom | 1 0 1 0 1 0 1 | 4’9 |
| Adi | 0 0 1 0 0 1 0 | 5’1 |

Elitism
![Page 9: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/9.jpg)
Genetic Algorithm
Selection
Fitness proportionate selection
Ron
Joe
Adi
Tom
Zoe
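A small sketch of fitness proportionate selection: each individual is picked with probability equal to its share of the total fitness. The neck lengths come from the Fitness Function slide; reading them as feet and inches and converting to inches is an assumption made only to get numbers to work with.

```python
import random

# Neck lengths from the Fitness Function slide, converted to inches
# (assumed reading of values such as 5'6 as 5 feet 6 inches).
fitness = {"Joe": 66, "Zoe": 50, "Ron": 68, "Tom": 57, "Adi": 61}


def selection_probabilities(fitness):
    """Share of the total fitness held by each individual."""
    total = sum(fitness.values())
    return {name: value / total for name, value in fitness.items()}


def select_parent(fitness, rng):
    """Draw one individual: walk the cumulative fitness until the draw is covered."""
    spin = rng.uniform(0, sum(fitness.values()))
    cumulative = 0.0
    for name, value in fitness.items():
        cumulative += value
        if spin <= cumulative:
            return name
    return name  # guard against floating-point rounding at the upper edge


probabilities = selection_probabilities(fitness)
rng = random.Random(1)
parents = [select_parent(fitness, rng) for _ in range(2)]
print(probabilities)
print(parents)
```

Fitter individuals get a larger share, but every individual keeps a nonzero chance of being selected.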
![Page 10: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/10.jpg)
Genetic Algorithm
Crossover
| Name | Chromosome | Neck Length |
| --- | --- | --- |
| Ron | 1 1 0 1 1 1 0 | 5’8 |
| Zoe | 0 1 0 1 0 1 1 | 4’2 |
| Ron Junior | 1 1 0 1 0 1 1 | 6’0 |
| Zoe Junior | 0 1 0 1 1 1 0 | 5’3 |

Crossover Probability
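The offspring on this slide are consistent with a single-point crossover of Ron and Zoe. The sketch below assumes a cut after the fourth gene; the slide does not state the cut point, and because genes two to four are identical in both parents, any cut from one to four gives the same result.

```python
def single_point_crossover(parent_a, parent_b, point):
    """Swap the tails of two chromosomes after position `point`."""
    child_a = parent_a[:point] + parent_b[point:]
    child_b = parent_b[:point] + parent_a[point:]
    return child_a, child_b


ron = [1, 1, 0, 1, 1, 1, 0]
zoe = [0, 1, 0, 1, 0, 1, 1]

# Assumed cut point: after the fourth gene.
ron_junior, zoe_junior = single_point_crossover(ron, zoe, point=4)
print(ron_junior)
print(zoe_junior)
```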
![Page 11: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/11.jpg)
Genetic Algorithm
Mutation
| Name | Chromosome | Fitness |
| --- | --- | --- |
| Ron Junior | 1 1 0 1 0 1 1 | 6’0 |
| Zoe Junior | 0 1 0 1 1 1 0 | 5’3 |

No Mutation
Mutation Probability: 0.1
![Page 12: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/12.jpg)
Genetic Algorithm
Crossover
| Name | Chromosome | Neck Length |
| --- | --- | --- |
| Joe | 1 0 0 1 1 0 0 | 5’6 |
| Adi | 0 0 1 0 0 1 0 | 5’1 |

No Crossover
Crossover Probability
![Page 13: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/13.jpg)
Genetic Algorithm
Mutation
| Name | Chromosome | Neck Length |
| --- | --- | --- |
| Joe | 1 0 0 1 1 0 0 | 5’6 |
| Adi | 0 0 1 0 0 1 0 | 5’1 |
| Adi Junior | 0 0 1 0 1 1 0 | 5’5 |
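Adi Junior differs from Adi in exactly one gene (the fifth), which is what a bit-flip mutation produces. The sketch below reproduces that flip and also shows a random variant; applying the mutation probability independently to each gene is an assumption, since the slides give only the value 0.1.

```python
import random


def flip_gene(chromosome, index):
    """Return a copy of the chromosome with one gene flipped."""
    mutated = list(chromosome)
    mutated[index] = 1 - mutated[index]
    return mutated


def mutate(chromosome, mutation_probability, rng):
    """Flip each gene independently with the given probability (assumed scheme)."""
    return [1 - gene if rng.random() < mutation_probability else gene
            for gene in chromosome]


adi = [0, 0, 1, 0, 0, 1, 0]

# The slide's example: the fifth gene (index 4) flips from 0 to 1.
adi_junior = flip_gene(adi, index=4)
print(adi_junior)

# A random mutation with the probability shown earlier in the deck.
random_mutant = mutate(adi, mutation_probability=0.1, rng=random.Random(0))
print(random_mutant)
```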
![Page 14: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/14.jpg)
Genetic Algorithm
New Generation
| Name | Chromosome | Neck Length |
| --- | --- | --- |
| Ron | 1 1 0 1 1 1 0 | 5’8 |
| Ron Junior | 1 1 0 1 0 1 1 | 6’0 |
| Joe | 1 0 0 1 1 0 0 | 5’6 |
| Adi Junior | 0 0 1 0 1 1 0 | 5’5 |
| Zoe Junior | 0 1 0 1 1 1 0 | 5’3 |

Neck Length: Previous 5’1, New 5’5
![Page 15: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/15.jpg)
Genetic Algorithm
Results
1 0 1 1 0 1 0
Adi Junior VIII 7’0
![Page 16: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/16.jpg)
Overview – Genetic Algorithm
![Page 17: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/17.jpg)
eBay Structured Data
What inventory is on our shelves?
![Page 18: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/18.jpg)
Structured Data
Taxonomy
Products
Item Finders
Attributes
![Page 19: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/19.jpg)
Structured Data
![Page 20: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/20.jpg)
Structured Data
![Page 21: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/21.jpg)
Structured Data
Data Vendors
eBay Sellers
eBay Items
Products
Everywhere
Products
ISBN
UPI
![Page 22: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/22.jpg)
Creating Products from Items
Steps: Choose Aggregation Set, Extract Relevant Attributes, Aggregate Similar Items
Aggregation Set: Brand, Model, Color
Product Type: Smartphones
Product: Apple iPhone 5 – Black
eBay View Items:
- Apple iPhone 5 Black
- Apple iPhone 5 Black
- Black Apple iPhone 5
Item Listings: $525.00 (17 Bids), $649.99 (Buy It Now), $579.99 (or Best Offer)
Item Labels: Used, iOS, 16GB, New, Unlocked, Storage, Carrier
Product Features: Network: 4G, Camera: 8.0MP, Screen Size: 4 in.
Other Features: Bluetooth: Yes, GPS: Yes
Dimensions: Height: 4.87 in., Depth: 0.30 in., Width: 2.31 in.
![Page 23: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/23.jpg)
Creating Products from Items
Structured Aggregation
Top-Down
Unstructured Clustering
Bottom-Up
Items
Products
Aggregation Set
Aggregation Set
![Page 24: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/24.jpg)
Overview – eBay Use Case
Use Case Example
![Page 25: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/25.jpg)
Cluster Analysis
Discovering groups and structures that are in some way similar
![Page 26: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/26.jpg)
K-Means Cluster Analysis
Model
Objective: minimize the Total Within-Cluster Variance

$$\min \sum_{i=1}^{K} SSE_i, \qquad SSE_i = \sum_{x_j \in C_i} \lVert x_j - \mu_i \rVert^2$$

where $x_j$ is an Observation, $\mu_i$ is the Center of cluster $i$, and $C_i$ is Cluster $i$.
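To make the objective concrete, here is a short sketch that computes the total within-cluster variance for a given grouping of points. The two toy clusters are made-up data chosen so the result can be checked by hand; the helper names are illustrative.

```python
def squared_distance(x, mu):
    """Squared Euclidean distance between a point and a center."""
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def total_within_cluster_variance(clusters):
    """Sum over clusters of the squared distances to each cluster's own mean."""
    total = 0.0
    for points in clusters:
        mu = mean(points)
        total += sum(squared_distance(x, mu) for x in points)
    return total


# Toy data (assumed): SSE is 2 for the first cluster and 8 for the second.
clusters = [
    [(0.0, 0.0), (2.0, 0.0)],
    [(10.0, 10.0), (10.0, 12.0), (10.0, 14.0)],
]
score = total_within_cluster_variance(clusters)
print(score)
```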
![Page 27: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/27.jpg)
Standard K-Means Algorithm
Choose K Random Points
Initial Center
![Page 28: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/28.jpg)
Standard K-Means Algorithm
Assign Points to Clusters
Cluster
Center
![Page 29: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/29.jpg)
Standard K-Means Algorithm
Recalculate the Cluster Means
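The three steps shown on these slides (choose K random points, assign points to clusters, recalculate the cluster means) can be sketched as a plain loop. This is a minimal illustration on made-up 2-D points, not the implementation used in the talk; the stopping rule (stop when the centers no longer change) and the handling of empty clusters are assumptions.

```python
import random


def squared_distance(x, mu):
    return sum((a - b) ** 2 for a, b in zip(x, mu))


def assign(points, centers):
    """Index of the nearest center for every point."""
    return [min(range(len(centers)), key=lambda i: squared_distance(p, centers[i]))
            for p in points]


def recalculate(points, labels, centers):
    """Mean of each cluster; an empty cluster keeps its previous center."""
    new_centers = []
    for i, old in enumerate(centers):
        members = [p for p, label in zip(points, labels) if label == i]
        if members:
            new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
        else:
            new_centers.append(old)
    return new_centers


def k_means(points, k, seed=None, max_iterations=100):
    rng = random.Random(seed)
    centers = rng.sample(points, k)              # choose K random points
    for _ in range(max_iterations):
        labels = assign(points, centers)         # assign points to clusters
        new_centers = recalculate(points, labels, centers)  # recalculate means
        if new_centers == centers:               # nothing moved: converged
            break
        centers = new_centers
    labels = assign(points, centers)
    score = sum(squared_distance(p, centers[l]) for p, l in zip(points, labels))
    return centers, labels, score


# Toy data (assumed): two well-separated groups of three points.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
centers, labels, score = k_means(points, k=2, seed=0)
print(centers, labels, score)
```

On data this clean every starting choice leads to the same split; on harder data the result depends on the initial centers, which is the local-optimum problem the following slides illustrate.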
![Page 30: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/30.jpg)
Standard K-Means Algorithm
Recalculate the Cluster Means
Solution in Iteration 1
Solution Score (Total Within-Cluster Variance): 33.7
![Page 31: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/31.jpg)
Standard K-Means Algorithm
Solution in Iteration 2
Solution Score (Total Within-Cluster Variance): 26.8
![Page 32: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/32.jpg)
Standard K-Means Algorithm
Solution in Iteration 3
Solution Score (Total Within-Cluster Variance): 23.6
![Page 33: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/33.jpg)
Standard K-Means Algorithm
Solution in Iteration 4
Solution Score (Total Within-Cluster Variance): 21.6
![Page 34: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/34.jpg)
Standard K-Means Algorithm
Solution in Iteration 5
Solution Score (Total Within-Cluster Variance): 19.5
![Page 35: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/35.jpg)
Standard K-Means Algorithm
Solution in Iteration 6
Solution Score (Total Within-Cluster Variance): 18.8
![Page 36: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/36.jpg)
Standard K-Means Algorithm
Solution in Iteration 7
Solution Score (Total Within-Cluster Variance): 18.7
Local Optimum
![Page 37: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/37.jpg)
Standard K-Means Algorithm
Initial Cluster Centers
Initial Center
Local Optimum
![Page 38: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/38.jpg)
Overview – Standard K-Means
Use Case
Standard K-Means
Local Optimum
![Page 39: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/39.jpg)
Genetic K-Means Algorithm
Applying genetic algorithm to the standard K-Means heuristic
![Page 40: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/40.jpg)
Genetic K-Means Algorithm
Solution Representation
Chromosome: 0 0 1 0 0 1 0 (7 Genes), Adi
![Page 41: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/41.jpg)
Genetic K-Means Algorithm
Solution Representation
![Page 42: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/42.jpg)
Genetic K-Means Algorithm
Solution Representation
![Page 43: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/43.jpg)
Genetic K-Means Algorithm
Solution Fitness
Neck Length
![Page 44: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/44.jpg)
Genetic K-Means Algorithm
Solution Fitness: Total Within-Cluster Variance

$$\sum_{i=1}^{K} \sum_{x_j \in C_i} \lVert x_j - \mu_i \rVert^2$$
![Page 45: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/45.jpg)
Genetic K-Means Algorithm
K-Means Iteration
Solution Fitness
![Page 46: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/46.jpg)
Genetic K-Means Algorithm
Crossover: Arithmetic Crossover
Solution 1: Ron
Solution 2: Zoe
![Page 47: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/47.jpg)
Genetic K-Means Algorithm
Crossover: Arithmetic Crossover
Offspring 1: Ron Junior
Offspring 2: Zoe Junior
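Arithmetic crossover blends two real-valued parents into two offspring as weighted averages, gene by gene. The sketch below assumes each solution is a flat list of cluster-center coordinates and uses a made-up blending weight `alpha`; the slides do not give the encoding or the weight.

```python
def arithmetic_crossover(parent_a, parent_b, alpha):
    """Return two offspring as complementary weighted averages of the parents."""
    child_a = [alpha * a + (1 - alpha) * b for a, b in zip(parent_a, parent_b)]
    child_b = [(1 - alpha) * a + alpha * b for a, b in zip(parent_a, parent_b)]
    return child_a, child_b


# Assumed encoding: K = 2 centers in 2-D, flattened to four genes.
solution_1 = [0.0, 0.0, 8.0, 8.0]
solution_2 = [2.0, 4.0, 10.0, 6.0]

offspring_1, offspring_2 = arithmetic_crossover(solution_1, solution_2, alpha=0.25)
print(offspring_1)
print(offspring_2)
```

Every offspring gene lies between the corresponding parent genes, so the offspring centers stay inside the region spanned by the parents.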
![Page 48: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/48.jpg)
Genetic K-Means Algorithm
Mutation
Adi
![Page 49: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/49.jpg)
Genetic K-Means Algorithm
Mutation
Adi Junior
![Page 50: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/50.jpg)
Overview – Genetic K-Means Algorithm
Use Case
Apply GA to K-Means
Standard K-Means
Local Optimum
![Page 51: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/51.jpg)
Demo
Genetic Algorithm VS Standard K-Means
![Page 52: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/52.jpg)
Demo
Best Solution in Generation 1
Solution Fitness (Total Within-Cluster Variance): 32.5
Cluster
Center
![Page 53: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/53.jpg)
Demo
Best Solution in Generation 2
Solution Fitness (Total Within-Cluster Variance): 21.9
![Page 54: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/54.jpg)
Demo
Best Solution in Generation 3
Solution Fitness (Total Within-Cluster Variance): 14.7
![Page 55: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/55.jpg)
Demo
Best Solution in Generation 4
Solution Fitness (Total Within-Cluster Variance): 12.3
![Page 56: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/56.jpg)
Demo
Best Solution in Generation 5
Solution Fitness (Total Within-Cluster Variance): 9.7
![Page 57: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/57.jpg)
Demo
Best Solution in Generation 6
Solution Fitness (Total Within-Cluster Variance): 9.1
![Page 58: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/58.jpg)
Demo
Best Solution in Generation 7
Solution Fitness (Total Within-Cluster Variance): 8.9
![Page 59: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/59.jpg)
Genetic Algorithm VS Standard K-Means
[Chart: Total Within Cluster Variance Per Generation. X-axis: Generations (0 to 12); y-axis: Total Within-Cluster Variance (0 to 40); series: Genetic Algorithm, K-Means]
![Page 60: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/60.jpg)
Genetic Algorithm VS Standard K-Means
[Chart: Total Within-Cluster Variance on Different Runs. X-axis: 0 to 20; y-axis: Total Within-Cluster Variance (0 to 30); series: K-Means, Multiple K-Means, Genetic Algorithm; annotations: Local Optimum, High Volatility, Global Optimum]
Average Improvement Across 20 Different Runs: 51% vs. Standard K-Means, 32% vs. Multiple K-Means
![Page 61: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/61.jpg)
Overview – GA VS Standard K-Means
Use Case
Apply GA to K-Means
Global Optimum
Standard K-Means
Local Optimum
![Page 62: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/62.jpg)
eBay Use Case
Extract Structured Data from groups of similar items
![Page 63: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/63.jpg)
eBay Use Case
Lumia 920 Red 32GB Lumia 520 Yellow 8GB Lumia 620 Green 8GB
Lumia 800 Blue 16GB
![Page 64: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/64.jpg)
eBay Use Case
Original Title: Nokia Lumia 800 Blue 8GB phone
Clean Up: NOKIA LUMIA 800 BLUE 8GB PHONE
{Stop Words}: brand new
Terms: NOKIA | LUMIA | 800 | BLUE | 8GB | PHONE
TF-IDF Weights (Importance of a term to a title): 0.05 0.12 0 0.31 0 0.20 0.12 0 0.14
Text Dictionary: All Titles
50 Random Items
520 620 800 920
Number of Unique Terms in All Titles
97
9
25
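A rough sketch of the title pipeline described on this slide: clean up a title, drop stop words, and weight each remaining term by TF-IDF. The four titles are made-up examples in the style of the slide, and the weighting variant (term frequency divided by title length, times the natural log of titles over titles containing the term) is one common choice, assumed here because the slide does not say which variant was used.

```python
import math

# Stop words taken from the slide's example ("brand new").
STOP_WORDS = {"brand", "new"}


def clean(title):
    """Upper-case the title, split on whitespace, and drop stop words."""
    return [t for t in title.upper().split() if t.lower() not in STOP_WORDS]


def tf_idf_vectors(titles):
    """Return the sorted vocabulary and one TF-IDF weight vector per title."""
    docs = [clean(t) for t in titles]
    vocabulary = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    document_frequency = {term: sum(term in doc for doc in docs)
                          for term in vocabulary}
    vectors = []
    for doc in docs:
        vector = []
        for term in vocabulary:
            tf = doc.count(term) / len(doc)
            idf = math.log(n_docs / document_frequency[term])
            vector.append(tf * idf)
        vectors.append(vector)
    return vocabulary, vectors


# Hypothetical item titles, for illustration only.
titles = [
    "Brand new Nokia Lumia 800 Blue 8GB phone",
    "Nokia Lumia 920 Red 32GB",
    "Nokia Lumia 520 Yellow 8GB",
    "Nokia Lumia 620 Green 8GB phone",
]
vocabulary, vectors = tf_idf_vectors(titles)
for term, weight in zip(vocabulary, vectors[0]):
    print(term, round(weight, 3))
```

Terms that occur in every title, such as the brand, get weight zero, while terms that distinguish items, such as the model number or color, get the largest weights. That makes the vectors useful inputs for clustering.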
![Page 65: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/65.jpg)
eBay Use Case
Aggregation Set: Model, Color, Storage
1 Item: NOKIA | LUMIA | 800 | BLUE | 8GB | PHONE
Cluster Center (Average Weight): 0.08 0.11 0 0.13 0 0.06 0.05 0 0.03
8GB 620 GREEN 5MP CAMERA PHONE
Accurate Item Classifications
Average Improvement Across 20 Different Runs: 46% vs. Standard K-Means, 23% vs. Multiple K-Means
![Page 66: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/66.jpg)
Overview – Example
Use Case Example
Apply GA to K-Means
Global Optimum
Standard K-Means
Local Optimum
![Page 67: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/67.jpg)
Questions & Answers
Open Discussion
?
![Page 68: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/68.jpg)
Conclusion
Summing it all up
![Page 69: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/69.jpg)
Conclusion
Use Case Example
Apply GA to K-Means
Global Optimum
Standard K-Means
Local Optimum
+50% Accuracy
![Page 70: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/70.jpg)
Thank You!
Or Levi
Data Analyst
Catalog & Classification
eBay Structured Data
Linked
![Page 71: Survival of the Fittest: Using Genetic Algorithm for Data Mining Optimization](https://reader031.vdocuments.us/reader031/viewer/2022031907/55a930061a28ab3f578b48ce/html5/thumbnails/71.jpg)
Appendix – Genetic Algorithm Parameters
Population Size: 10
Number of Generations: 10
Crossover Probability: 75%
Mutation Probability: 9%
[Chart: Normalized Score (0 to 100) of Total Within-Cluster Variance, Average of 5 Runs, for Crossover Probability from 65% to 90% and Mutation Probability from 5% to 10%]
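To show how these parameters fit together, here is a compact sketch of a genetic K-Means loop using the values listed above as defaults. Several details are assumptions made to keep the example short and are not confirmed by the slides: a chromosome is a list of K centers, selection weights are the inverse of the variance, mutation replaces one center with a random data point, each offspring gets one K-Means step, and the best solution is carried over unchanged.

```python
import random


def sq_dist(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))


def nearest(p, centers):
    return min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))


def fitness(centers, points):
    """Total within-cluster variance of the assignment (lower is better)."""
    return sum(sq_dist(p, centers[nearest(p, centers)]) for p in points)


def k_means_step(centers, points):
    """One assign-and-recalculate step; empty clusters keep their center."""
    groups = [[] for _ in centers]
    for p in points:
        groups[nearest(p, centers)].append(p)
    return [tuple(sum(c) / len(g) for c in zip(*g)) if g else old
            for g, old in zip(groups, centers)]


def blend(u, v, w):
    return tuple(w * s + (1 - w) * t for s, t in zip(u, v))


def crossover(a, b, alpha):
    """Arithmetic crossover applied center by center."""
    return ([blend(u, v, alpha) for u, v in zip(a, b)],
            [blend(u, v, 1 - alpha) for u, v in zip(a, b)])


def genetic_k_means(points, k, population_size=10, generations=10,
                    crossover_probability=0.75, mutation_probability=0.09,
                    seed=None):
    rng = random.Random(seed)
    population = [rng.sample(points, k) for _ in range(population_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(c, points) for c in population]
        best = population[scores.index(min(scores))]
        history.append(min(scores))
        # Assumed weighting: lower variance gives a larger selection share.
        weights = [1.0 / (s + 1e-9) for s in scores]
        next_population = [best]  # elitism: the best solution survives as is
        while len(next_population) < population_size:
            a, b = rng.choices(population, weights=weights, k=2)
            if rng.random() < crossover_probability:
                a, b = crossover(a, b, rng.random())
            for child in (a, b):
                child = list(child)
                if rng.random() < mutation_probability:
                    child[rng.randrange(k)] = rng.choice(points)
                next_population.append(k_means_step(child, points))
        population = next_population[:population_size]
    scores = [fitness(c, points) for c in population]
    return population[scores.index(min(scores))], min(scores), history


# Toy data (assumed): three noisy groups of 20 points each.
data_rng = random.Random(42)
points = [(cx + data_rng.gauss(0, 0.5), cy + data_rng.gauss(0, 0.5))
          for cx, cy in [(0, 0), (5, 5), (0, 5)] for _ in range(20)]
centers, score, history = genetic_k_means(points, k=3, seed=7)
print(score)
print(history)
```

Because the best solution is kept unchanged in every generation, the best score in `history` can only stay level or improve from one generation to the next.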