Unsupervised Learning
Pantelis P. Analytis
March 19, 2018
1 Introduction
2 Finding structure in graphs
3 Clustering analysis
4 Dimensionality reduction
What’s unsupervised learning?
Most of the data available on the internet do not have labels. How can we make sense of them?
Finding structure in graphs
UnsupervisedLearning
Pantelis P.Analytis
Introduction
Findingstructure ingraphs
Clusteringanalysis
Dimensionalityreduction
Organizing the web
First attempts to organize the web were based on human-curated directories (Yahoo, LookSmart). People also used methods from information retrieval to uncover relevant documents. Yet the web has a deluge of untrusted documents, spam, random webpages, advertisements, etc.
Elements of the PageRank algorithm
Solution: use social feedback to rank the quality of documents. You can see links as votes: a page is more important when it has more incoming links. For instance, www.nytimes.com has numerous incoming links, as opposed to www.inkefalonia.gr. Links from important pages count more, which makes the definition recursive.
The iterative PageRank algorithm
At t = 0, assume a uniform initial probability distribution:

PR(p_i; 0) = 1/N.

At each time step, the computation yields:

PR(p_i; t+1) = (1 - d)/N + d * Σ_{p_j ∈ M(p_i)} PR(p_j; t) / L(p_j),

where N is the number of pages, M(p_i) is the set of pages that link to p_i, L(p_j) is the number of outbound links of p_j, and d is the damping factor.
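The update above can be sketched in a few lines of Python. This is an illustrative implementation, not the lecture's code; the damping factor d = 0.85 and the toy three-page graph are assumptions.

```python
# Illustrative sketch of the iterative PageRank computation. `links` maps
# each page to the set of pages it links to; d is the damping factor.

def pagerank(links, d=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}              # PR(p_i; 0) = 1/N
    for _ in range(iterations):
        new_pr = {}
        for p in pages:
            # sum over the pages p_j in M(p_i): PR(p_j; t) / L(p_j)
            incoming = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            new_pr[p] = (1 - d) / n + d * incoming
        pr = new_pr
    return pr

# Toy three-page web: A and B link to each other and to C; C links back to A.
ranks = pagerank({"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A"}})
```

Because every page here has at least one outbound link, the scores remain a probability distribution across iterations.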
PageRank equilibrium
PageRank: The spider trap
The Scaled PageRank algorithm
Scaled PageRank Update Rule
Apply the basic PageRank update rule.
Scale all values down by a factor s.
Divide the leftover 1 - s units of PageRank evenly over all nodes.
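One pass of this rule can be sketched as follows (an illustrative helper; the scaling factor s = 0.8 and the two-page graph are assumptions):

```python
# One Scaled PageRank pass on a tiny graph (illustrative sketch).
# `pr` maps pages to current PageRank values; `links` maps pages to the
# set of pages they link to; s is the scaling factor.

def scaled_update(pr, links, s=0.8):
    n = len(pr)
    # Step 1: apply the basic PageRank rule (follow links).
    basic = {p: sum(pr[q] / len(links[q]) for q in pr if p in links[q])
             for p in pr}
    # Steps 2-3: scale everything down by s, then share the leftover
    # 1 - s units of PageRank evenly over all nodes.
    return {p: s * basic[p] + (1 - s) / n for p in pr}

# Two pages linking to each other: the uniform distribution is a fixed point.
pr1 = scaled_update({"A": 0.5, "B": 0.5}, {"A": {"B"}, "B": {"A"}})
```

The evenly shared leftover is what lets the walk escape spider traps: some rank always leaks back to every node.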
Clustering: the k-means algorithm
Input: K, and a set of points x_1, ..., x_n.
Place centroids c_1, ..., c_K randomly.
Then repeat until convergence:
For each point x_i, find the nearest centroid c_j and assign the point to that cluster. In math notation: argmin_j D(x_i, c_j).
For each cluster j = 1, ..., K, find the new centroid of all points x_i assigned to cluster j in the previous step. In math notation: c_j(a) = (1/n_j) Σ_{x_i → c_j} x_i(a), for a = 1, ..., d.
Stop when the algorithm has converged, i.e. none of the items changes cluster.
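The loop above can be written compactly in plain Python. A sketch using squared Euclidean distance, with fixed initial centroids instead of random ones so the example is reproducible:

```python
# Compact k-means sketch: alternate assignment and update steps
# until no centroid moves.

def kmeans(points, centroids, max_iter=100):
    for _ in range(max_iter):
        # Assignment step: argmin_j D(x_i, c_j)
        clusters = [[] for _ in centroids]
        for x in points:
            best = min(range(len(centroids)),
                       key=lambda j: sum((a - b) ** 2
                                         for a, b in zip(x, centroids[j])))
            clusters[best].append(x)
        # Update step: c_j(a) = (1/n_j) * sum of x_i(a) over assigned points
        new_centroids = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
                         for cl, c in zip(clusters, centroids)]
        if new_centroids == centroids:            # converged: nothing moved
            return centroids, clusters
        centroids = new_centroids
    return centroids, clusters

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, groups = kmeans(pts, [(0, 0), (10, 10)])
```

With two well-separated pairs of points, the centroids settle on the two pair means after a couple of iterations.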
Converging to clusters
How do we select k?
There are diminishing returns in increasing the number of clusters.
An intuitive approach suggests picking the k after which the reduction in within-cluster distance flattens out.
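The heuristic can be illustrated with hypothetical numbers (the inertia values below are made up for the example, not computed from data):

```python
# Hypothetical total within-cluster distances for k = 1..5.
inertia = [100.0, 40.0, 15.0, 13.0, 12.0]
drops = [inertia[i] - inertia[i + 1] for i in range(len(inertia) - 1)]
# Pick the first k whose next reduction is small relative to the first drop:
# after k = 3 the curve flattens out, so k = 3 is the "elbow".
elbow = next(k + 1 for k, drop in enumerate(drops) if drop < 0.2 * drops[0])
```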
Hierarchical Clustering
Agglomerative vs. divisive
Agglomerative clustering starts from the bottom and successively merges points into larger clusters.
Divisive clustering starts with a single cluster, which is gradually split into smaller ones.
Agglomerative vs. divisive
How do we determine the nearness of clusters?
Complete linkage: D(X, Y) = max_{x ∈ X, y ∈ Y} d(x, y)
Single linkage: D(X, Y) = min_{x ∈ X, y ∈ Y} d(x, y)
Average linkage: D(X, Y) = (1/(|X| |Y|)) Σ_{x ∈ X} Σ_{y ∈ Y} d(x, y)
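The three criteria can be written out directly. A sketch for two clusters of one-dimensional points, with d(x, y) taken as the absolute difference (an illustrative choice of distance):

```python
# The three linkage criteria for clusters X and Y of 1-D points,
# with d(x, y) = |x - y|.

def complete_linkage(X, Y):
    return max(abs(x - y) for x in X for y in Y)

def single_linkage(X, Y):
    return min(abs(x - y) for x in X for y in Y)

def average_linkage(X, Y):
    return sum(abs(x - y) for x in X for y in Y) / (len(X) * len(Y))

X, Y = [0.0, 1.0], [4.0, 6.0]
```

For these clusters, complete linkage gives 6, single linkage 3, and average linkage 4.5, which shows how the criteria order cluster pairs differently.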
Agglomerative Clustering
Pick k upfront and stop when we have k clusters.
Alternatively, stop when a cluster with low cohesion would be created (diameter-, radius-, or density-based approaches).
Kohonen’s self-organizing maps
Step 0: Randomly position the grid's neurons in the data space.
Step 1: Select one data point, either randomly or by systematically cycling through the dataset.
Step 2: Find the neuron that is closest to the chosen data point. This neuron is called the Best Matching Unit (BMU).
Step 3: Move the BMU closer to that data point. The distance moved by the BMU is determined by a learning rate, which decreases after each iteration.
Step 4: Move the BMU's neighbors closer to that data point as well, with farther-away neighbors moving less. Neighbors are identified using a radius around the BMU, and the value of this radius decreases after each iteration.
Finally, update the learning rate and the BMU radius before repeating Steps 1 to 4. Iterate these steps until the positions of the neurons have stabilized.
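Steps 0 to 4 can be sketched for a one-dimensional grid of neurons on two-dimensional data. This is an illustrative toy version: the grid size, learning rate, radius, and decay schedule are all assumptions.

```python
import random

# Toy self-organizing map: a 1-D grid of neurons trained on 2-D data.
def train_som(data, n_neurons=4, epochs=20, lr=0.5, radius=1, seed=0):
    rng = random.Random(seed)
    # Step 0: randomly position the grid's neurons in the data space.
    neurons = [[rng.random(), rng.random()] for _ in range(n_neurons)]
    for _ in range(epochs):
        for x in data:                            # Step 1: select a data point
            # Step 2: the Best Matching Unit is the closest neuron.
            bmu = min(range(n_neurons),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(neurons[i], x)))
            # Steps 3-4: move the BMU and its grid neighbours toward x.
            for i in range(n_neurons):
                if abs(i - bmu) <= radius:
                    neurons[i] = [a + lr * (b - a)
                                  for a, b in zip(neurons[i], x)]
        lr *= 0.9                                 # decay the learning rate
    return neurons

som = train_som([(0.0, 0.0), (1.0, 1.0)])
```

With two data points, some neurons settle near each point while neurons caught between both BMUs' neighbourhoods end up in between, which is the "map" behaviour the slides illustrate.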
Principal component analysis
PCA is often used to accelerate supervised learning, by reducing the number of features before a model is trained.
It is also used for visualization, projecting high-dimensional data onto two or three dimensions.
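For two-dimensional data, the first principal component can be computed by hand: centre the data, build the 2x2 covariance matrix, and take the eigenvector of its largest eigenvalue. A minimal sketch in plain Python, not a general implementation:

```python
import math

# First principal component of 2-D data: the direction of maximum variance.
def first_component(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 covariance matrix [[cxx, cxy], [cxy, cyy]].
    cxx = sum(x * x for x, _ in centred) / n
    cyy = sum(y * y for _, y in centred) / n
    cxy = sum(x * y for x, y in centred) / n
    if cxy == 0:                       # already diagonal: pick the larger axis
        return (1.0, 0.0) if cxx >= cyy else (0.0, 1.0)
    # Largest eigenvalue, from the characteristic polynomial of a 2x2 matrix.
    tr, det = cxx + cyy, cxx * cyy - cxy * cxy
    lam = tr / 2 + math.sqrt(tr * tr / 4 - det)
    # A corresponding (normalised) eigenvector.
    v = (cxy, lam - cxx)
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)

# Points on the line y = x: the first component is the diagonal direction.
pc = first_component([(0, 0), (1, 1), (2, 2), (3, 3)])
```

Projecting each point onto this direction reduces the data to one dimension while preserving all of its variance in this example.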
Dimensionality reduction in recommender systems
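A common approach in recommender systems is to factor the sparse user-item ratings matrix into low-rank user and item factors, as in SVD-style matrix factorization. Below is a minimal sketch using stochastic gradient descent; the ratings matrix, rank k, learning rate, and iteration count are all illustrative assumptions.

```python
import random

# Factor a small ratings matrix R (None = unobserved) into rank-k user
# factors U and item factors V, so that R[u][i] is approximately U[u] . V[i].
def factorize(R, k=2, steps=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    n_users, n_items = len(R), len(R[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u in range(n_users):
            for i in range(n_items):
                if R[u][i] is None:               # skip unobserved ratings
                    continue
                err = R[u][i] - sum(U[u][f] * V[i][f] for f in range(k))
                for f in range(k):                # gradient step on both factors
                    uf, vf = U[u][f], V[i][f]
                    U[u][f] += lr * err * vf
                    V[i][f] += lr * err * uf
    return U, V

R = [[5, 4, None],
     [4, 5, 1],
     [1, None, 5]]
U, V = factorize(R)
# The learned low-rank factors can now fill in the missing entries of R.
pred = sum(U[0][f] * V[2][f] for f in range(2))
```

The k latent dimensions play the same role as principal components: each user and item is compressed to a short vector, and unobserved ratings are predicted from those vectors.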