Fall 2004 1
Optimization Methods in Data Mining

Overview

- Optimization
  - Mathematical Programming
    - Support Vector Machines: classification, clustering, etc.
    - Steepest Descent Search: neural nets, Bayesian networks (optimize parameters)
  - Combinatorial Optimization
    - Genetic Algorithm: feature selection, classification, clustering

What is Optimization?

- Formulation
  - Decision variables
  - Objective function
  - Constraints
- Solution
  - Iterative algorithm
  - Improving search

Problem → (Formulation) → Model → (Algorithm) → Solution

Combinatorial Optimization

- Finitely many solutions to choose from
  - Select the best rule from a finite set of rules
  - Select the best subset of attributes
- Too many solutions to consider them all
- Solution methods
  - Branch-and-bound (better than Weka's exhaustive search)
  - Random search

Random Search

- Select an initial solution x(0) and let k = 0
- Loop:
  - Consider the neighbors N(x(k)) of x(k)
  - Select a candidate x' from N(x(k))
  - Check the acceptance criterion
  - If accepted, let x(k+1) = x'; otherwise let x(k+1) = x(k)
  - Let k = k + 1
- Until the stopping criterion is satisfied
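
The loop above can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the slides: solutions are bit tuples, the neighborhood is all single-bit flips, the objective is minimized, and the acceptance rule (`accept_if_not_worse`) and all function names are hypothetical.

```python
import random

def random_search(initial, neighbors, objective, accept, max_iters=1000, seed=0):
    """Generic random search that minimizes `objective` (illustrative sketch)."""
    rng = random.Random(seed)
    x = initial
    best = x
    for k in range(max_iters):                      # stopping criterion: iteration budget
        candidate = rng.choice(neighbors(x))        # pick x' from N(x(k))
        if accept(objective(candidate), objective(x), k, rng):
            x = candidate                           # x(k+1) = x'
        if objective(x) < objective(best):
            best = x                                # remember the best solution seen
    return best

def flip_neighbors(bits):
    """Assumed neighborhood: all bit tuples at Hamming distance 1."""
    return [bits[:i] + (1 - bits[i],) + bits[i + 1:] for i in range(len(bits))]

def accept_if_not_worse(new_value, old_value, k, rng):
    """Assumed acceptance criterion: never accept a worse candidate."""
    return new_value <= old_value

# Toy objective (hypothetical): minimize the number of ones in the string.
best = random_search((1, 0, 1, 1, 0, 1), flip_neighbors, sum, accept_if_not_worse)
```

Swapping in a different `accept` function (for example one that sometimes accepts worse candidates) changes the search strategy without touching the loop.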

Common Algorithms

- Simulated Annealing (SA)
  - Idea: accept inferior solutions with a given probability that decreases as time goes on
- Tabu Search (TS)
  - Idea: restrict the neighborhood with a list of solutions that are tabu (that is, cannot be visited) because they were visited recently
- Genetic Algorithm (GA)
  - Idea: neighborhoods based on 'genetic similarity'
  - Most used in data mining applications

Genetic Algorithms

- Maintain a population of solutions rather than a single solution
- Members of the population have a certain fitness (usually just the objective)
- Survival of the fittest through
  - selection
  - crossover
  - mutation

GA Formulation

- Use binary strings (or bits) to encode solutions: 0 1 1 0 1 0 0 1 0
- Terminology
  - Chromosome = solution
  - Parent chromosome
  - Children or offspring

Problems Solved

Data mining problems that have been addressed using genetic algorithms:

- Classification
- Attribute selection
- Clustering

Classification Example

| Attribute | Value | Encoding |
| --- | --- | --- |
| Outlook | Sunny | 100 |
| Outlook | Overcast | 010 |
| Outlook | Rainy | 001 |
| Windy | Yes | 10 |
| Windy | No | 01 |

Representing a Rule

If windy=yes then play=yes (outlook = 111 matches any outlook value)

| outlook | windy | play |
| --- | --- | --- |
| 111 | 10 | 10 |

If outlook=overcast and windy=yes then play=no

| outlook | windy | play |
| --- | --- | --- |
| 010 | 10 | 01 |

Single-Point Crossover

| | outlook | windy | play |
| --- | --- | --- | --- |
| Parent 1 | 111 | 10 | 10 |
| Parent 2 | 010 | 01 | 01 |
| Offspring 1 | 111 | 01 | 01 |
| Offspring 2 | 010 | 10 | 10 |

Crossover point: between outlook and windy

Two-Point Crossover

| | outlook | windy | play |
| --- | --- | --- | --- |
| Parent 1 | 111 | 10 | 10 |
| Parent 2 | 010 | 01 | 01 |
| Offspring 1 | 111 | 01 | 10 |
| Offspring 2 | 010 | 10 | 01 |

Crossover points: on either side of windy
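
The single-point and two-point operators can be sketched on bit strings as follows. The sketch assumes the seven bits are ordered outlook (3 bits), windy (2 bits), play (2 bits); the function names and cut positions are illustrative choices, not taken from the slides.

```python
def single_point_crossover(p1, p2, point):
    """Swap everything after `point` between the two parents."""
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point_crossover(p1, p2, a, b):
    """Swap only the segment between positions `a` and `b`."""
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

# Assumed bit order: outlook (3 bits), windy (2 bits), play (2 bits).
parent1 = "1111010"   # outlook=111, windy=10, play=10
parent2 = "0100101"   # outlook=010, windy=01, play=01

single = single_point_crossover(parent1, parent2, 3)   # cut after outlook
double = two_point_crossover(parent1, parent2, 3, 5)   # swap the windy bits only
```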

Uniform Crossover

| | outlook | windy | play |
| --- | --- | --- | --- |
| Parent 1 | 111 | 10 | 01 |
| Parent 2 | 100 | 01 | 10 |
| Offspring 1 | 110 | 10 | 00 |
| Offspring 2 | 011 | 01 | 11 |

Problem?
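
A minimal sketch of uniform crossover, with a check that hints at the problem raised on the slide: because bits are exchanged independently, an offspring can end up with an invalid attribute code such as play = 00 or play = 11. The 50% swap probability, the function names, and the assumption that the last two bits encode play are illustrative.

```python
import random

def uniform_crossover(p1, p2, rng):
    """Exchange each bit position between the parents with probability 0.5."""
    child1, child2 = [], []
    for a, b in zip(p1, p2):
        if rng.random() < 0.5:
            a, b = b, a
        child1.append(a)
        child2.append(b)
    return "".join(child1), "".join(child2)

def play_is_valid(chromosome):
    """Assumes the last two bits encode play: 10 = yes, 01 = no."""
    return chromosome[-2:] in ("10", "01")

rng = random.Random(0)
child1, child2 = uniform_crossover("1111001", "1000110", rng)
```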

Mutation

| | outlook | windy | play |
| --- | --- | --- | --- |
| Parent | 010 | 01 | 01 |
| Offspring | 010 | 11 | 01 |

Mutated bit: the first windy bit
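
As a small illustration, mutation can be written as a single bit flip. The sketch assumes the bit order outlook, windy, play, so index 3 is the first windy bit; `flip_bit` and `mutate` are hypothetical helper names.

```python
import random

def flip_bit(chromosome, index):
    """Return a copy of the bit string with the bit at `index` inverted."""
    flipped = "1" if chromosome[index] == "0" else "0"
    return chromosome[:index] + flipped + chromosome[index + 1:]

def mutate(chromosome, rng):
    """Invert one randomly selected bit."""
    return flip_bit(chromosome, rng.randrange(len(chromosome)))

offspring = flip_bit("0100101", 3)               # the mutation shown on the slide
random_offspring = mutate("0100101", random.Random(0))
```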

Selection

- Which strings in the population should be operated on?
  - Rank and select the n fittest ones
  - Assign probabilities according to fitness and select probabilistically, say

$$P[\text{select } x_i] = \frac{\text{Fitness}(x_i)}{\sum_j \text{Fitness}(x_j)}$$
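
Fitness-proportional selection can be sketched with the standard library. The helper names are illustrative, and the sketch assumes fitness values are non-negative with a positive sum.

```python
import random

def selection_probabilities(fitnesses):
    """P[select x_i] = Fitness(x_i) / sum_j Fitness(x_j)."""
    total = sum(fitnesses)
    return [f / total for f in fitnesses]

def select(population, fitnesses, k, rng):
    """Draw k individuals with replacement, weighted by fitness."""
    return rng.choices(population, weights=fitnesses, k=k)

probabilities = selection_probabilities([1, 3])
picked = select(["a", "b"], [0, 1], 5, random.Random(0))   # "a" has zero fitness
```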

Creating a New Population

- Create a population Pnew with p individuals
- Survival
  - Allow individuals from the old population to survive intact
  - Rate: a fraction 1 - r of the population
  - How to select the individuals that survive: deterministically or randomly
- Crossover
  - Select fit individuals and create new ones
  - Rate: a fraction r of the population. How to select?
- Mutation
  - Slightly modify any of the above individuals
  - Mutation rate: m
  - Fixed number of mutations versus probabilistic mutations

GA Algorithm

- Randomly generate an initial population P
- Evaluate the fitness f(x_i) of each individual in P
- Repeat:
  - Survival: probabilistically select (1 - r)p individuals from P and add them to Pnew, according to $P[\text{select } x_i] = f(x_i) / \sum_j f(x_j)$
  - Crossover: probabilistically select rp/2 pairs from P and apply the crossover operator. Add the offspring to Pnew
  - Mutation: uniformly choose m percent of the members of Pnew and invert one randomly selected bit in each
  - Update: P ← Pnew
  - Evaluate: compute the fitness f(x_i) of each individual in P
- Return the fittest individual from P
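
The algorithm can be sketched compactly in Python. This is an illustration only: the defaults for p, r, and m, the fixed number of generations, single-point crossover, and the toy fitness (count of ones) are assumptions, and m is treated here as a fraction rather than a percentage.

```python
import random

def run_ga(fitness, n_bits, p=20, r=0.6, m=0.1, generations=50, seed=0):
    """Minimal GA sketch: survival, crossover, mutation, update."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(p)]
    for _ in range(generations):
        # Small offset keeps the weights valid if every fitness is zero.
        weights = [fitness(x) + 1e-9 for x in population]
        n_pairs = int(r * p / 2)
        n_survivors = p - 2 * n_pairs
        # Survival: fitness-proportional selection of (1 - r)p individuals.
        new_pop = [x[:] for x in rng.choices(population, weights=weights, k=n_survivors)]
        # Crossover: rp/2 pairs, each producing two offspring.
        for _ in range(n_pairs):
            a, b = rng.choices(population, weights=weights, k=2)
            point = rng.randrange(1, n_bits)
            new_pop.append(a[:point] + b[point:])
            new_pop.append(b[:point] + a[point:])
        # Mutation: flip one random bit in a fraction m of the new population.
        for x in rng.sample(new_pop, int(m * p)):
            i = rng.randrange(n_bits)
            x[i] = 1 - x[i]
        population = new_pop            # Update: P <- Pnew
    return max(population, key=fitness)

best = run_ga(sum, n_bits=10)           # toy fitness: number of ones
```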

Analysis of GA: Schemas

- Does GA converge?
- Does GA move towards a good solution?
- Local optima?
- Holland (1975): analysis based on schemas
- Schema: a string over the symbols 0, 1, and *
  - Example: 0*10 represents {0010, 0110}
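
To make the schema notation concrete, the following sketch enumerates the strings a schema represents and tests membership. The helper names `instances` and `matches` are illustrative.

```python
from itertools import product

def instances(schema):
    """All bit strings represented by a schema; '*' matches either bit."""
    options = [("0", "1") if c == "*" else (c,) for c in schema]
    return ["".join(bits) for bits in product(*options)]

def matches(schema, string):
    """True if `string` is an instance of `schema`."""
    return len(schema) == len(string) and all(
        c in ("*", b) for c, b in zip(schema, string)
    )

example = instances("0*10")
```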

The Schema Theorem (all the theory on one slide)

$$E[m(s,t+1)] \ge \frac{\hat{u}(s,t)}{\bar{f}(t)}\, m(s,t) \left(1 - p_c \frac{d(s)}{l-1}\right) (1 - p_m)^{o(s)}$$

- $m(s,t)$: number of instances of schema s at time t
- $\hat{u}(s,t)$: average fitness of individuals in schema s at time t
- $\bar{f}(t)$: average fitness of the population at time t
- $p_c$: probability of crossover
- $p_m$: probability of mutation
- $o(s)$: number of defined bits in schema s
- $d(s)$: distance between defined bits in s
- $l$: length of the bit strings

Interpretation

- Fit schemas grow in influence
- What is missing
  - Crossover?
  - Mutation?
  - How about time t+1?
- Other approaches:
  - Markov chains
  - Statistical mechanics

GA for Feature Selection

- Feature selection:
  - Select a subset of attributes (features)
  - Reason: too many, redundant, or irrelevant attributes
- The set of all subsets of attributes is very large
- Little structure to search
- Random search methods

Encoding

- Need a bit code representation
- Have some n attributes
- Each attribute is either in (1) or out (0) of the selected set

| outlook | temperature | humidity | windy |
| --- | --- | --- | --- |
| 1 | 0 | 1 | 0 |
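
A short sketch of the encoding, assuming the attribute order outlook, temperature, humidity, windy. The names `ATTRIBUTES`, `decode`, and `encode` are illustrative.

```python
ATTRIBUTES = ("outlook", "temperature", "humidity", "windy")   # assumed order

def decode(bits, attributes=ATTRIBUTES):
    """Return the attributes whose bit is 1."""
    return [name for name, bit in zip(attributes, bits) if bit == 1]

def encode(selected, attributes=ATTRIBUTES):
    """Return the bit tuple for a set of selected attribute names."""
    return tuple(1 if name in selected else 0 for name in attributes)

subset = decode((1, 0, 1, 0))
mask = encode({"outlook", "humidity"})
```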

Fitness

- Wrapper approach
  - Apply a learning algorithm, say a decision tree, to the individual x = {outlook, humidity}
  - Let fitness equal the error rate (minimize)
- Filter approach
  - Let fitness equal the entropy (minimize)
  - Other diversity measures can also be used
  - Simplicity measure?

Crossover

| | outlook | temperature | humidity | windy |
| --- | --- | --- | --- | --- |
| Parent 1 | 1 | 0 | 1 | 0 |
| Parent 2 | 0 | 0 | 1 | 1 |
| Offspring 1 | 1 | 0 | 1 | 1 |
| Offspring 2 | 0 | 0 | 1 | 0 |

Crossover point: between temperature and humidity
In Weka

Clustering Example

Create two clusters for:

| ID | Outlook | Temperature | Humidity | Windy | Play |
| --- | --- | --- | --- | --- | --- |
| 10 | Sunny | Hot | High | True | No |
| 20 | Overcast | Hot | High | False | Yes |
| 30 | Rainy | Mild | High | False | Yes |
| 40 | Rainy | Cool | Normal | False | Yes |

Crossover:

| | Bit string | Clusters |
| --- | --- | --- |
| Parent | 0011 | {10,20} {30,40} |
| Parent | 1010 | {20,40} {10,30} |
| Offspring | 0010 | {10,20,40} {30} |
| Offspring | 1011 | {20} {10,30,40} |
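
One way to read this encoding, sketched under the assumption that bit i gives the cluster (0 or 1) of the i-th instance and that the crossover point lies after the second bit. `decode_clusters` is a hypothetical helper.

```python
def decode_clusters(bits, ids):
    """Split instance IDs into cluster 0 and cluster 1 according to the bit string."""
    cluster0 = [i for i, b in zip(ids, bits) if b == "0"]
    cluster1 = [i for i, b in zip(ids, bits) if b == "1"]
    return [cluster0, cluster1]

ids = [10, 20, 30, 40]
parent_a, parent_b = "0011", "1010"

# Single-point crossover after the second bit (assumed position).
child_a = parent_a[:2] + parent_b[2:]
child_b = parent_b[:2] + parent_a[2:]

clusterings = [decode_clusters(s, ids) for s in (parent_a, parent_b, child_a, child_b)]
```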

Discussion

- GA is a flexible and powerful random search methodology
- Efficiency depends on how well you can encode the solutions in a way that will work with the crossover operator
- In data mining, attribute selection is the most natural application

Attribute Selection in Unsupervised Learning

- Attribute selection typically uses a measure, such as accuracy, that is directly related to the class attribute
- How do we apply attribute selection to unsupervised learning such as clustering?
- Need a measure of
  - compactness of clusters
  - separation among clusters
- Multiple measures

Quality Measures

- Compactness

$$F_{within} = 1 - \frac{1}{Z_{within}} \frac{1}{d} \sum_{k=1}^{K} \sum_{i=1}^{n} \alpha_{ik} \sum_{j \in J} (x_{ij} - \gamma_{kj})^2$$

$$\alpha_{ik} = \begin{cases} 1 & \text{if } x_i \text{ belongs to cluster } k \\ 0 & \text{otherwise} \end{cases} \qquad \gamma_{kj} = \frac{\sum_{i=1}^{n} \alpha_{ik} x_{ij}}{\sum_{i=1}^{n} \alpha_{ik}}$$

- $\gamma_{kj}$: centroid of cluster k in attribute j
- n: number of instances; K: number of clusters
- d: number of attributes in the selected subset J
- $Z_{within}$: normalization constant to make $F_{within} \in [0,1]$

More Quality Measures

- Cluster separation

$$F_{between} = \frac{1}{Z_{bet}} \frac{1}{d} \frac{1}{K-1} \sum_{k=1}^{K} \sum_{i=1}^{n} (1 - \alpha_{ik}) \sum_{j \in J} (x_{ij} - \gamma_{kj})^2$$

Final Quality Measures

- Adjustment for bias

$$F_{clusters} = 1 - \frac{K - K_{min}}{K_{max} - K_{min}}$$

- Complexity

$$F_{complexity} = 1 - \frac{d - 1}{D - 1}$$

Here d is the number of selected attributes and D is the total number of attributes.
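
These two measures are simple enough to check by hand. A minimal sketch follows, using values consistent with the later example (2 of 4 attributes selected, K between 2 and 3); the function names are illustrative.

```python
def f_clusters(k, k_min, k_max):
    """F_clusters = 1 - (K - K_min) / (K_max - K_min)."""
    return 1 - (k - k_min) / (k_max - k_min)

def f_complexity(d, total):
    """F_complexity = 1 - (d - 1) / (D - 1), with d selected out of D attributes."""
    return 1 - (d - 1) / (total - 1)

three_clusters = f_clusters(3, 2, 3)              # K at its maximum
two_clusters = f_clusters(2, 2, 3)                # K at its minimum
two_of_four = round(f_complexity(2, 4), 2)
```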
Wrapper Framework
Loop:
Obtain an attribute subset
Apply k-means algorithm
Evaluate cluster quality
Until stopping criterion satisfied

Problem

- What is the optimal attribute subset?
- What is the optimal number of clusters?
- Try to find both simultaneously

Example

Find an attribute subset and optimal number of clusters (Kmin = 2, Kmax = 3) for

| ID | Sepal Length | Sepal Width | Petal Length | Petal Width |
| --- | --- | --- | --- | --- |
| 10 | 5.0 | 3.5 | 1.6 | 0.6 |
| 20 | 5.1 | 3.8 | 1.9 | 0.4 |
| 30 | 4.8 | 3.0 | 1.4 | 0.3 |
| 40 | 5.1 | 3.8 | 1.6 | 0.2 |
| 50 | 4.6 | 3.2 | 1.4 | 0.2 |
| 60 | 6.5 | 2.8 | 4.6 | 1.5 |
| 70 | 5.7 | 2.8 | 4.5 | 1.3 |
| 80 | 6.3 | 3.3 | 4.7 | 1.6 |
| 90 | 4.9 | 2.4 | 3.3 | 1.0 |
| 100 | 6.6 | 2.9 | 4.6 | 1.3 |

Formulation

- Define an individual: four bits for the selected attributes followed by one bit for the number of clusters

| Selected attributes | Number of clusters |
| --- | --- |
| * * * * | * |

- Initial population
  - 0 1 0 1 1
  - 1 0 0 1 0

Evaluate Fitness

- Start with 0 1 0 1 1
  - Three clusters and {Sepal Width, Petal Width}
- Apply k-means with k=3

| ID | Sepal Width | Petal Width |
| --- | --- | --- |
| 10 | 3.5 | 0.6 |
| 20 | 3.8 | 0.4 |
| 30 | 3.0 | 0.3 |
| 40 | 3.8 | 0.2 |
| 50 | 3.2 | 0.2 |
| 60 | 2.8 | 1.5 |
| 70 | 2.8 | 1.3 |
| 80 | 3.3 | 1.6 |
| 90 | 2.4 | 1.0 |
| 100 | 2.9 | 1.3 |

K-Means

Start with random centroids: 10, 70, 80

[Figure: scatter plot of Petal Width (0 to 2) against Sepal Width (2 to 4) for instances 10 to 100]

New Centroids

[Figure: the same scatter plot of Petal Width against Sepal Width, with the recomputed centroids C1 and C3 marked]

No change in assignment, so terminate the k-means algorithm
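
The k-means run on this individual can be reproduced with a small pure-Python sketch. It assumes squared Euclidean distance and stops when the assignment no longer changes; the function name `kmeans` and the dictionary layout are illustrative.

```python
def kmeans(points, initial_ids, max_iters=100):
    """Plain k-means; `points` maps instance ID -> coordinate tuple."""
    centroids = [points[i] for i in initial_ids]
    assignment = None
    for _ in range(max_iters):
        # Assign every instance to its nearest centroid.
        new_assignment = {
            pid: min(
                range(len(centroids)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
            )
            for pid, x in points.items()
        }
        if new_assignment == assignment:
            break                       # no change in assignment: terminate
        assignment = new_assignment
        # Recompute each centroid as the mean of its members.
        for c in range(len(centroids)):
            members = [points[pid] for pid, k in assignment.items() if k == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    clusters = [
        sorted(pid for pid, k in assignment.items() if k == c)
        for c in range(len(centroids))
    ]
    return centroids, clusters

# (Sepal Width, Petal Width) for the ten instances in the example.
data = {
    10: (3.5, 0.6), 20: (3.8, 0.4), 30: (3.0, 0.3), 40: (3.8, 0.2), 50: (3.2, 0.2),
    60: (2.8, 1.5), 70: (2.8, 1.3), 80: (3.3, 1.6), 90: (2.4, 1.0), 100: (2.9, 1.3),
}
centroids, clusters = kmeans(data, initial_ids=[10, 70, 80])
```

With these starting centroids the run stops after the first recomputation, since the second assignment pass changes nothing.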

Quality of Clusters

- Centers
  - Center 1 at (3.46, 0.34): {10,20,30,40,50}
  - Center 2 at (3.30, 1.60): {80}
  - Center 3 at (2.73, 1.28): {60,70,90,100}
- Evaluation

$$F_{within} = 0.55, \quad F_{between} = 6.60, \quad F_{clusters} = 0.00, \quad F_{complexity} = 0.67$$

Next Individual

- Now look at 1 0 0 1 0
  - Two clusters and {Sepal Length, Petal Width}
- Apply k-means with k=2

| ID | Sepal Length | Petal Width |
| --- | --- | --- |
| 10 | 5.0 | 0.6 |
| 20 | 5.1 | 0.4 |
| 30 | 4.8 | 0.3 |
| 40 | 5.1 | 0.2 |
| 50 | 4.6 | 0.2 |
| 60 | 6.5 | 1.5 |
| 70 | 5.7 | 1.3 |
| 80 | 6.3 | 1.6 |
| 90 | 4.9 | 1.0 |
| 100 | 6.6 | 1.3 |

K-Means

Say we select 20 and 90 as initial centroids:

[Figure: scatter plot of Petal Width (0 to 2) against Sepal Length (4 to 7) for instances 10 to 100]

Recalculate Centroids

[Figure: the same scatter plot of Petal Width against Sepal Length, with the recomputed centroids C1 and C2 marked]

Recalculate Again

[Figure: the same scatter plot of Petal Width against Sepal Length, with the updated centroids C1 and C2 marked]

No change in assignment, so terminate the k-means algorithm

Quality of Clusters

- Centers
  - Center 1 at (4.92, 0.45): {10,20,30,40,50,90}
  - Center 2 at (6.28, 1.43): {60,70,80,100}
- Evaluation

$$F_{within} = 0.39, \quad F_{between} = 14.59, \quad F_{clusters} = 1.00, \quad F_{complexity} = 0.67$$

Compare Individuals

| Individual | F_within | F_between | F_clusters | F_complexity |
| --- | --- | --- | --- | --- |
| 1 0 0 1 0 | 0.39 | 14.59 | 1.00 | 0.67 |
| 0 1 0 1 1 | 0.55 | 6.60 | 0.00 | 0.67 |

Which is fitter?

Evaluating Fitness

- Can scale (if necessary)
- Then weight them together, e.g.,

| Individual | Scaled F_within | Scaled F_between | F_clusters | F_complexity | Fitness |
| --- | --- | --- | --- | --- | --- |
| 1 0 0 1 0 | 0.39 / 0.55 | 14.59 / 14.59 | 0 | 0.67 | 0.84 |
| 0 1 0 1 1 | 0.55 / 0.55 | 6.60 / 14.59 | 0 | 0.67 | 0.53 |

- Alternatively, we can use Pareto optimization

Mathematical Programming

- Continuous decision variables
- Constrained versus unconstrained
- Form of the objective function
  - Linear Programming (LP)
  - Quadratic Programming (QP)
  - General Mathematical Programming (MP)
![Page 50: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/50.jpg)
Fall 2004 50
Linear Program
xxf 5.02)(
10
15x
xx0
150s.t
5.02max
x
x
Optimal solution
![Page 51: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/51.jpg)
Fall 2004 51
Two Dimensional Problem
FeasibleRegion
2000
1500
1000
500
500 1000 1500 2000
15002 x
15001 x
175021 xx
480024 21 xx
01 x
02 x
2x
1x
0,
480024
1750
1500
1000s.t.912max
21
21
21
2
1
21
xx
xx
xx
x
xxx
6000912 21 xx
Optimal Solution
12000912 21 xx
Optimum isalways at anextreme point
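
Because the optimum sits at an extreme point, a problem this small can be checked by enumerating the intersections of constraint boundaries and keeping the feasible ones. The sketch below assumes all constraints are of the "less than or equal" form shown in the formulation; it is a brute-force illustration, not the simplex method.

```python
from itertools import combinations

# Each constraint is (a1, a2, b), meaning a1*x1 + a2*x2 <= b.
constraints = [
    (1, 0, 1000),    # x1 <= 1000
    (0, 1, 1500),    # x2 <= 1500
    (1, 1, 1750),    # x1 + x2 <= 1750
    (4, 2, 4800),    # 4*x1 + 2*x2 <= 4800
    (-1, 0, 0),      # x1 >= 0
    (0, -1, 0),      # x2 >= 0
]

def feasible(x1, x2, tol=1e-9):
    return all(a1 * x1 + a2 * x2 <= b + tol for a1, a2, b in constraints)

def extreme_points():
    """Feasible intersections of pairs of constraint boundaries (Cramer's rule)."""
    points = []
    for (a1, a2, b), (c1, c2, d) in combinations(constraints, 2):
        det = a1 * c2 - a2 * c1
        if det == 0:
            continue                      # parallel boundaries never intersect
        x1 = (b * c2 - a2 * d) / det
        x2 = (a1 * d - b * c1) / det
        if feasible(x1, x2):
            points.append((x1, x2))
    return points

def objective(point):
    return 12 * point[0] + 9 * point[1]

best = max(extreme_points(), key=objective)
```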
![Page 52: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/52.jpg)
Fall 2004 52
Simplex Method2000
1500
1000
500
500 1000 1500 2000
15002 x
15001 x
175021 xx
480024 21 xx
01 x
02 x
2x
1x
![Page 53: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/53.jpg)
Fall 2004 53
Quadratic Programming
0
0.2
0.4
0.6
0.8
1
1.2
1.4
00.2
0.4
0.6
0.8 1
1.2
1.4
1.6
1.8 2
f (x )=0.2+(x -1)2
1
0)1(2
)1(2)('
x
x
xxf
![Page 54: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/54.jpg)
Fall 2004 54
General MP
0)(' xf
Derivative beingzero is a necessarybut not sufficientcondition
0)(' xf
0)(' xf
0)(' xf
0)(' xf
![Page 55: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/55.jpg)
Fall 2004 55
Constrained Problem?
0)(' xf
10x
![Page 56: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/56.jpg)
Fall 2004 56
General MP
)g,...,g,(g
)h,...,h,(h
ariablesdecision v ofVector
)(s.t.)(min
m21
m21
(x)(x)(x)g(x)
(x)(x)(x)h(x)
x
0g(x)
0xhx
f
We write a general mathematical program in matrix notation as:
![Page 57: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/57.jpg)
Fall 2004 57
Karush-Kuhn-Tucker (KKT) Conditions
0g(x)
0xhx
)(s.t.)(min
for minimum relative a is If *
f
x
0*
***
xgμ
0xgμxhλxT
TTf
such that ,,exist There 0μλ
![Page 58: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/58.jpg)
Fall 2004 58
Convex Sets
Convex Not Convex
A set C is convex if any line connecting two points in theset lies completely within the set, that is,
CC 2121 )1(:)1,0(,, xxxx
![Page 59: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/59.jpg)
Fall 2004 59
Convex Hull The convex hull co(S) of a set S is
the intersection of all convex sets containing S
A set V Rn is a linear variety ifRxxxx ,)1(:, 2121 VV
![Page 60: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/60.jpg)
Fall 2004 60
HyperplaneA hyperplane in Rn is a (n-1)-dimensional
variety
Hyperplane in R3Hyperplane in R2
![Page 61: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/61.jpg)
Fall 2004 61
Convex Hull Example
Play
No Play
Temperature
Humidity
Separating hyperplanebisects closest points
Closets points inconvex hulls
c
d
![Page 62: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/62.jpg)
Fall 2004 62
Finding the Closest Points
0
1
1
s.t.
2
1min
YesPlay
YesPlay
NoPlay
YesPlay
2
i
i:i
i:i
i:ii
i:ii
xd
xc
dc
Formulate as QP:
![Page 63: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/63.jpg)
Fall 2004 63
Support Vector MachinesPlay
No Play
Temperature
Humidity
SeparatingHyperplane
Support Vectors
![Page 64: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/64.jpg)
Fall 2004 64
ExampleID Sepal Width Petal Width10 3.5 0.620 3.8 0.430 3.0 0.340 3.8 0.250 3.2 0.260 2.8 1.570 2.8 1.380 3.3 1.690 2.4 1.0
100 2.9 1.3
![Page 65: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/65.jpg)
Fall 2004 65
Separating Hyperplane
0
0.2
0.4
0.6
0.8
1
1.2
1.4
1.6
1.8
2 2.5 3 3.5 4
![Page 66: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/66.jpg)
Fall 2004 66
Assume Separating Planes
.1:,1
1:,1
iii
iii
yib
yib
wx
wxConstraints:
Distance to each plane:
w
1
![Page 67: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/67.jpg)
Fall 2004 67
Optimization Problem
.1:,1
1:,1subject to
max2
b,
iii
iii
yib
yib
wx
wx
ww
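
For a candidate (w, b), the objective and the constraints are easy to evaluate. The sketch below uses the Sepal Width and Petal Width example data with hypothetical class labels (instances 10 to 50 as -1, instances 60 to 100 as +1) and a hand-picked hyperplane; neither the labels nor this choice of w and b comes from the slides, and the hyperplane is feasible but not claimed to be optimal.

```python
import math

def margin_and_feasibility(w, b, points, labels, tol=1e-9):
    """Return (2 / ||w||, whether y_i * (x_i . w + b) >= 1 holds for every instance)."""
    feasible = all(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) >= 1 - tol
        for x, y in zip(points, labels)
    )
    return 2.0 / math.sqrt(sum(wi * wi for wi in w)), feasible

# (Sepal Width, Petal Width); labels are an assumption made for illustration.
points = [(3.5, 0.6), (3.8, 0.4), (3.0, 0.3), (3.8, 0.2), (3.2, 0.2),
          (2.8, 1.5), (2.8, 1.3), (3.3, 1.6), (2.4, 1.0), (2.9, 1.3)]
labels = [-1, -1, -1, -1, -1, +1, +1, +1, +1, +1]

# Hand-picked candidate: separate on petal width only, halfway between 0.6 and 1.0.
margin, ok = margin_and_feasibility((0.0, 5.0), -4.0, points, labels)
```

A QP solver would search over w and b for the feasible pair with the largest margin.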
![Page 68: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/68.jpg)
Fall 2004 68
How Do We Solve MPs?
-5 -4 -3 -2 -1 0 1 2 3 4 5-5
-4
-3
-2
-1
0
1
2
3
4
5
![Page 69: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/69.jpg)
Fall 2004 69
Improving Search
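Direction-step approach

$$\mathbf{x}^{k+1} = \mathbf{x}^k + \lambda\,\Delta\mathbf{x}^k$$

where $\mathbf{x}^k$ is the current solution, $\mathbf{x}^{k+1}$ is the new solution, $\lambda$ is the step size, and $\Delta\mathbf{x}^k$ is the search direction.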
![Page 70: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/70.jpg)
Fall 2004 70
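Steepest Descent

Search direction equal to negative gradient:

$$\Delta\mathbf{x}^k = -\nabla f(\mathbf{x}^k)$$

Finding the step size $\lambda$ is a one-dimensional optimization problem of minimizing

$$f(\mathbf{x}^k + \lambda\,\Delta\mathbf{x}^k)$$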
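A minimal sketch of steepest descent with a one-dimensional line search follows. The golden-section search, the search interval `[0, lam_max]`, and the quadratic test function `f` are assumptions chosen for illustration; the slides do not prescribe a particular line-search method.

```python
import math


def golden_section(phi, lo, hi, tol=1e-10):
    """Minimize a unimodal one-dimensional function phi on [lo, hi]."""
    ratio = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - ratio * (b - a)
        d = a + ratio * (b - a)
        if phi(c) < phi(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def steepest_descent(f, grad, x0, max_iter=200, tol=1e-8, lam_max=1.0):
    """Repeat x <- x + lam * dx with dx = -grad f(x)."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        if math.sqrt(sum(gi * gi for gi in g)) < tol:
            break
        dx = [-gi for gi in g]  # search direction: negative gradient
        # Step size: one-dimensional minimization along dx.
        lam = golden_section(
            lambda t: f([xi + t * di for xi, di in zip(x, dx)]), 0.0, lam_max)
        x = [xi + lam * di for xi, di in zip(x, dx)]
    return x


# Hypothetical test function: a convex quadratic with minimum at (0.2, 0.4).
def f(x):
    return 0.5 * (3 * x[0] ** 2 + 2 * x[0] * x[1] + 2 * x[1] ** 2) - x[0] - x[1]


def grad_f(x):
    return [3 * x[0] + x[1] - 1, x[0] + 2 * x[1] - 1]


print(steepest_descent(f, grad_f, [0.0, 0.0]))
```

The interval bound `lam_max` has to be large enough to contain the best step; for other functions it would need to be chosen or bracketed differently.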
![Page 71: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/71.jpg)
Fall 2004 71
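Newton's Method

Taylor series expansion, where $F(\mathbf{x}^k)$ is the Hessian of $f$ at $\mathbf{x}^k$:

$$f(\mathbf{x}) \approx f(\mathbf{x}^k) + \nabla f(\mathbf{x}^k)^T(\mathbf{x}-\mathbf{x}^k) + \frac{1}{2}(\mathbf{x}-\mathbf{x}^k)^T F(\mathbf{x}^k)(\mathbf{x}-\mathbf{x}^k)$$

The right hand side is minimized at

$$\mathbf{x}^{k+1} = \mathbf{x}^k - F(\mathbf{x}^k)^{-1}\,\nabla f(\mathbf{x}^k)$$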
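The Newton update can be sketched for two variables as below. The helper names and the quadratic example are assumptions for illustration. The code solves the linear system $F\,\mathbf{s} = \nabla f$ instead of forming the inverse Hessian explicitly, which gives the same step.

```python
def solve_2x2(H, g):
    """Solve H s = g for a 2x2 matrix H using Cramer's rule."""
    (a, b), (c, d) = H
    det = a * d - b * c
    if det == 0:
        raise ValueError("singular Hessian")
    return [(d * g[0] - b * g[1]) / det, (a * g[1] - c * g[0]) / det]


def newton(grad, hess, x0, iters=20):
    """Repeat x <- x - F(x)^{-1} grad f(x) for a fixed number of steps."""
    x = list(x0)
    for _ in range(iters):
        step = solve_2x2(hess(x), grad(x))
        x = [xi - si for xi, si in zip(x, step)]
    return x


# Hypothetical quadratic: the Taylor expansion is exact, so one step suffices.
def grad_q(x):
    return [3 * x[0] + x[1] - 1, x[0] + 2 * x[1] - 1]


def hess_q(x):
    return [[3.0, 1.0], [1.0, 2.0]]


print(newton(grad_q, hess_q, [5.0, -3.0], iters=1))
```

For a non-quadratic function the same loop is run for several iterations, since the quadratic model is only a local approximation.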
![Page 72: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/72.jpg)
Fall 2004 72
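Discussion

Computing the inverse Hessian is difficult. Alternatives:
Quasi-Newton methods, which replace the inverse Hessian with an approximation $\mathbf{S}^k$:

$$\mathbf{x}^{k+1} = \mathbf{x}^k - \lambda^k\,\mathbf{S}^k\,\nabla f(\mathbf{x}^k)$$

Conjugate gradient methods

Newton's method does not account for constraints. Remedies:
Penalty methods
Lagrangian methods, etc.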
![Page 73: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/73.jpg)
Fall 2004 73
Non-separable
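Add an error term to the constraints:

$$\mathbf{x}_i \cdot \mathbf{w} + b \ge +1 - \xi_i, \quad \forall i : y_i = +1$$

$$\mathbf{x}_i \cdot \mathbf{w} + b \le -1 + \xi_i, \quad \forall i : y_i = -1$$

$$\xi_i \ge 0, \quad \forall i$$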
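As a small illustration, the smallest error term that satisfies the relaxed constraint for one point is $\xi_i = \max(0,\, 1 - y_i(\mathbf{x}_i \cdot \mathbf{w} + b))$, which follows from rearranging either constraint. The sketch below computes it; the hyperplane `w = (0, 5)`, `b = -4` and the sample points are hypothetical.

```python
def slack(x, y, w, b):
    """Smallest xi >= 0 that satisfies the relaxed constraint for (x, y)."""
    value = sum(xi * wi for xi, wi in zip(x, w)) + b
    return max(0.0, 1.0 - y * value)


# Hypothetical hyperplane and points, all labelled -1.
w, b = (0.0, 5.0), -4.0
print(slack((3.0, 0.2), -1, w, b))  # outside the margin: no error needed
print(slack((3.0, 0.7), -1, w, b))  # inside the margin: small error term
print(slack((3.0, 0.9), -1, w, b))  # on the wrong side: error term above 1
```

An error term between 0 and 1 means the point lies inside the margin but on the correct side; a value above 1 means the point is misclassified.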
![Page 74: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/74.jpg)
Fall 2004 74
Wolfe Dual
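$$\max_{\boldsymbol{\alpha}} \; \sum_i \alpha_i - \frac{1}{2}\sum_{i,j} \alpha_i \alpha_j y_i y_j\, \mathbf{x}_i \cdot \mathbf{x}_j$$

subject to

$$0 \le \alpha_i \le C, \qquad \sum_i \alpha_i y_i = 0$$

Simple constraints.
The dot product $\mathbf{x}_i \cdot \mathbf{x}_j$ is the only place the data appears.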
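A minimal sketch of evaluating the dual objective and its constraints for a given multiplier vector follows. The two-point data set and the values of `alpha` and `C` are hypothetical. The helper `weights_from_alpha` uses the standard relation $\mathbf{w} = \sum_i \alpha_i y_i \mathbf{x}_i$, which is not shown on the slide and is included here as an assumption about how the primal solution is recovered.

```python
def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def dual_objective(alpha, X, y):
    """sum_i alpha_i - 1/2 sum_{i,j} alpha_i alpha_j y_i y_j x_i . x_j"""
    n = len(alpha)
    quad = sum(alpha[i] * alpha[j] * y[i] * y[j] * dot(X[i], X[j])
               for i in range(n) for j in range(n))
    return sum(alpha) - 0.5 * quad


def is_dual_feasible(alpha, y, C, tol=1e-9):
    """Check 0 <= alpha_i <= C and sum_i alpha_i y_i = 0."""
    box = all(-tol <= a <= C + tol for a in alpha)
    balance = abs(sum(a * yi for a, yi in zip(alpha, y))) <= tol
    return box and balance


def weights_from_alpha(alpha, X, y):
    """Standard recovery of the primal weights: w = sum_i alpha_i y_i x_i."""
    dim = len(X[0])
    return [sum(a * yi * xi[k] for a, yi, xi in zip(alpha, y, X))
            for k in range(dim)]


# Hypothetical two-point problem.
X = [(1.0, 0.0), (-1.0, 0.0)]
y = [+1, -1]
alpha = [0.5, 0.5]
print(dual_objective(alpha, X, y), is_dual_feasible(alpha, y, C=1.0),
      weights_from_alpha(alpha, X, y))
```

Note that the data enter `dual_objective` only through `dot`, which is what makes the kernel substitution possible.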
![Page 75: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/75.jpg)
Fall 2004 75
Extension to Non-Linear
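Kernel functions

$$K(\mathbf{x}, \mathbf{y}) = \Phi(\mathbf{x}) \cdot \Phi(\mathbf{y})$$

Takes the place of the dot product in the Wolfe dual.

Mapping

$$\Phi : \mathbb{R}^n \to H$$

where $H$ is a high dimensional Hilbert space.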
![Page 76: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/76.jpg)
Fall 2004 76
Some Possible Kernels
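Polynomial:

$$K(\mathbf{x}, \mathbf{y}) = (\mathbf{x} \cdot \mathbf{y} + 1)^p$$

Gaussian radial basis function:

$$K(\mathbf{x}, \mathbf{y}) = e^{-\|\mathbf{x}-\mathbf{y}\|^2 / 2\sigma^2}$$

Sigmoid:

$$K(\mathbf{x}, \mathbf{y}) = \tanh(\kappa\, \mathbf{x} \cdot \mathbf{y} - \delta)$$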
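The three kernel forms (polynomial, Gaussian radial basis function, sigmoid) can be sketched as plain functions. The parameter names `p`, `sigma`, `kappa`, and `delta` and their default values are assumptions for illustration, since some symbols on the slide were lost in extraction.

```python
import math


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def polynomial_kernel(x, y, p=2):
    """K(x, y) = (x . y + 1)^p"""
    return (dot(x, y) + 1) ** p


def gaussian_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y||^2 / (2 sigma^2))"""
    sq_dist = sum((xi - yi) ** 2 for xi, yi in zip(x, y))
    return math.exp(-sq_dist / (2 * sigma ** 2))


def sigmoid_kernel(x, y, kappa=1.0, delta=0.0):
    """K(x, y) = tanh(kappa * x . y - delta)"""
    return math.tanh(kappa * dot(x, y) - delta)


x, y = (1.0, 2.0), (3.0, 4.0)
print(polynomial_kernel(x, y), gaussian_kernel(x, y), sigmoid_kernel(x, y))
```

Any of these can stand in for the dot product $\mathbf{x}_i \cdot \mathbf{x}_j$ in the Wolfe dual without changing the rest of the problem.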
![Page 77: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/77.jpg)
Fall 2004 77
In Weka
Weka.classifiers.smo
Support vector machine for nominal data only
Does both linear and non-linear models
![Page 78: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/78.jpg)
Fall 2004 78
Optimization in DM

Optimization
Mathematical Programming: Support Vector Machines (Classification, Clustering, etc.); Steepest Descent Search (Neural Nets, Bayesian Networks: optimize parameters)
Combinatorial Optimization: Genetic Algorithm (Feature selection, Classification, Clustering)
![Page 79: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/79.jpg)
Fall 2004 79
Bayesian Classification

Naïve Bayes assumes independence between attributes
  Simple computations
  Best classifier if assumption is true
Bayesian Belief Networks
  Joint probability distributions
  Directed acyclic graphs: nodes are random variables (attributes), arcs represent the dependencies
![Page 80: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/80.jpg)
Fall 2004 80
Example: Bayesian Network
Family History Smoker
Lung Cancer Emphysema
Positive X-Ray Dyspnea
Lung Cancer is conditionally independent of Emphysema, given Family History and Smoker.

| | FH, S | FH, ~S | ~FH, S | ~FH, ~S |
|---|---|---|---|---|
| LC | 0.8 | 0.5 | 0.7 | 0.1 |
| ~LC | 0.2 | 0.5 | 0.3 | 0.9 |
Lung Cancer depends on Family History and Smoker
![Page 81: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/81.jpg)
Fall 2004 81
Conditional Probabilities
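$$\Pr(z_1, \ldots, z_n) = \prod_{i=1}^{n} \Pr(z_i \mid \mathrm{Parents}(Z_i))$$

where $Z_i$ is a random variable and $z_i$ is the outcome of the random variable. For example:

$$\Pr(\text{LungCancer} = \text{``yes''} \mid \text{FamilyHistory} = \text{``yes''}, \text{Smoker} = \text{``yes''}) = 0.8$$

$$\Pr(\text{LungCancer} = \text{``no''} \mid \text{FamilyHistory} = \text{``no''}, \text{Smoker} = \text{``no''}) = 0.9$$

The node representing the class attribute is called the output node.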
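The factorization can be sketched for a three-node fragment of the example network (Family History, Smoker, Lung Cancer). The Lung Cancer table is taken from the example; the prior probabilities for Family History and Smoker are hypothetical values introduced only so the product can be evaluated.

```python
from itertools import product

# Hypothetical priors (not given on the slides).
p_fh = {True: 0.3, False: 0.7}
p_s = {True: 0.4, False: 0.6}
# Pr(LungCancer = yes | FamilyHistory, Smoker) from the example table.
p_lc_yes = {(True, True): 0.8, (True, False): 0.5,
            (False, True): 0.7, (False, False): 0.1}


def joint(fh, s, lc):
    """Pr(fh, s, lc) as the product of each node given its parents."""
    p_lc = p_lc_yes[(fh, s)] if lc else 1.0 - p_lc_yes[(fh, s)]
    return p_fh[fh] * p_s[s] * p_lc


print(joint(True, True, True))
# The joint distribution must sum to one over all outcomes.
print(sum(joint(fh, s, lc) for fh, s, lc in product([True, False], repeat=3)))
```

Family History and Smoker have no parents in this fragment, so their factors are plain priors.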
![Page 82: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/82.jpg)
Fall 2004 82
How Do We Learn?

Network structure
  Given/known
  Inferred or learned from the data
Variables
  Observable
  Hidden (missing values / incomplete data)
![Page 83: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/83.jpg)
Fall 2004 83
Case 1: Known Structure and Observable Variables

Straightforward
Similar to Naïve Bayes
Compute the entries of the conditional probability table (CPT) of each variable
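With known structure and fully observed data, one simple way to compute the CPT entries is by relative frequency. The sketch below assumes records stored as dictionaries and uses a small hypothetical data set; it applies no smoothing, so parent configurations that never occur get no entry.

```python
from collections import Counter


def estimate_cpt(records, child, parents):
    """Estimate Pr(child = v | parents = u) by relative frequency."""
    joint_counts = Counter()
    parent_counts = Counter()
    for r in records:
        u = tuple(r[p] for p in parents)
        joint_counts[(r[child], u)] += 1
        parent_counts[u] += 1
    return {(v, u): n / parent_counts[u]
            for (v, u), n in joint_counts.items()}


# Hypothetical fully observed training instances.
records = [
    {"FH": "yes", "S": "yes", "LC": "yes"},
    {"FH": "yes", "S": "yes", "LC": "yes"},
    {"FH": "yes", "S": "yes", "LC": "yes"},
    {"FH": "yes", "S": "yes", "LC": "no"},
    {"FH": "no", "S": "no", "LC": "no"},
]
cpt = estimate_cpt(records, child="LC", parents=["FH", "S"])
print(cpt[("yes", ("yes", "yes"))])
```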
![Page 84: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/84.jpg)
Fall 2004 84
Case 2: Known Structure and Some Hidden Variables

Still need to learn the CPT entries
Let $S$ be a set of $s$ training instances $X_1, X_2, \ldots, X_s$.
Let $w_{ijk}$ be the CPT entry for variable $Y_i = y_{ij}$ having parents $U_i = u_{ik}$.
![Page 85: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/85.jpg)
Fall 2004 85
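CPT Example

$$Y_i = \text{LungCancer}, \qquad y_{ij} = \text{``yes''}$$

$$U_i = \{\text{FamilyHistory}, \text{Smoker}\}, \qquad u_{ik} = \{\text{``yes''}, \text{``yes''}\}$$

| | FH, S | FH, ~S | ~FH, S | ~FH, ~S |
|---|---|---|---|---|
| LC | 0.8 | 0.5 | 0.7 | 0.1 |
| ~LC | 0.2 | 0.5 | 0.3 | 0.9 |

Here $w_{ijk} = 0.8$, the entry in row LC and column FH, S. All CPT entries together form

$$\mathbf{w} = \{w_{ijk}\}_{i,j,k}$$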
![Page 86: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/86.jpg)
Fall 2004 86
Objective

Must find the value of

$$\mathbf{w} = \{w_{ijk}\}_{i,j,k}$$

The objective is to maximize the likelihood of the data, that is,

$$\Pr\nolimits_{\mathbf{w}}(S) = \prod_{d=1}^{s} \Pr\nolimits_{\mathbf{w}}(X_d)$$

How do we do this?
![Page 87: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/87.jpg)
Fall 2004 87
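Non-Linear MP

Compute gradients of the log-likelihood (the logarithm is monotone, so maximizing $\ln \Pr_{\mathbf{w}}(S)$ also maximizes $\Pr_{\mathbf{w}}(S)$):

$$\frac{\partial \ln \Pr_{\mathbf{w}}(S)}{\partial w_{ijk}} = \sum_{d=1}^{s} \frac{\Pr(Y_i = y_{ij},\, U_i = u_{ik} \mid X_d)}{w_{ijk}}$$

The probabilities $\Pr(Y_i = y_{ij},\, U_i = u_{ik} \mid X_d)$ are computed from the training data.

Move in the direction of the gradient:

$$w_{ijk} \leftarrow w_{ijk} + l\, \frac{\partial \ln \Pr_{\mathbf{w}}(S)}{\partial w_{ijk}}$$

where $l$ is the learning rate.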
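One gradient step for the CPT of a single variable can be sketched as follows. The sketch assumes the posteriors $\Pr(Y = y_j, U = u_k \mid X_d)$ have already been computed by some inference routine (not shown), and it adds a renormalization step so that the entries for each parent configuration remain a probability distribution; that step is an assumption beyond what the slide states. The uses the log-likelihood form of the gradient, a sum of posteriors divided by the current entry.

```python
def gradient_step(w, posteriors, rate):
    """One update of w[j][k] for a single variable.

    w[j][k]             current CPT entry for value j, parent configuration k
    posteriors[d][j][k] Pr(Y = y_j, U = u_k | X_d), assumed precomputed
    rate                learning rate l
    """
    n_j, n_k = len(w), len(w[0])
    grad = [[sum(p[j][k] for p in posteriors) / w[j][k] for k in range(n_k)]
            for j in range(n_j)]
    updated = [[w[j][k] + rate * grad[j][k] for k in range(n_k)]
               for j in range(n_j)]
    # Assumed extra step: renormalize so that sum_j w[j][k] = 1 for each k.
    for k in range(n_k):
        total = sum(updated[j][k] for j in range(n_j))
        for j in range(n_j):
            updated[j][k] /= total
    return updated


# Hypothetical case: two values, one parent configuration, four instances
# in which the variable is observed (posterior is 1 for the observed value).
w0 = [[0.5], [0.5]]
posteriors = [[[1.0], [0.0]]] * 3 + [[[0.0], [1.0]]]
print(gradient_step(w0, posteriors, rate=0.1))
```

In this example the entry for the value seen three times grows relative to the other, as expected when moving toward higher likelihood.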
![Page 88: Fall 20041 Optimization Methods in Data Mining. Fall 20042 Overview Optimization Mathematical Programming Combinatorial Optimization Support Vector Machines](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649d9d5503460f94a869f6/html5/thumbnails/88.jpg)
Fall 2004 88
Case 3: Unknown Network Structure

Need to find/learn the optimal network structure for the data
What type of optimization problem is this?
Combinatorial optimization (GA etc.)