Unsupervised Learning
Pantelis P. Analytis
March 19, 2018
1 Introduction
2 Finding structure in graphs
3 Clustering analysis
4 Dimensionality reduction
What’s unsupervised learning?
Most of the data available on the internet do not have labels. How can we make sense of them?
Finding structure in graphs
UnsupervisedLearning
Pantelis P.Analytis
Introduction
Findingstructure ingraphs
Clusteringanalysis
Dimensionalityreduction
Organizing the web
First attempts to organize the web were based on human-curated directories (Yahoo, LookSmart). People also used methods from information retrieval to uncover relevant documents. Yet the web has a deluge of untrusted documents, spam, random webpages, advertisements, etc.
Elements of the PageRank algorithm
Solution: use social feedback to rank the quality of documents. You can see links as votes: a page is more important when it has more incoming links. For instance, www.nytimes.com has numerous incoming links, as opposed to www.inkefalonia.gr. Links from important pages count more, which makes the definition recursive.
The iterative PageRank algorithm
At t = 0, assume a uniform initial probability distribution:

PR(p_i; 0) = 1/N.

At each time step, the computation yields:

PR(p_i; t+1) = (1 - d)/N + d * Σ_{p_j ∈ M(p_i)} PR(p_j; t) / L(p_j),

where N is the number of pages, M(p_i) is the set of pages that link to p_i, L(p_j) is the number of outbound links of p_j, and d is the damping factor.
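The update above can be sketched in a few lines of Python. This is an illustrative implementation, not the lecture's code; the damping factor d = 0.85 and the toy three-page graph are assumptions.

```python
# Illustrative sketch of the iterative PageRank computation. `links` maps
# each page to the set of pages it links to; d is the damping factor.

def pagerank(links, d=0.85, iterations=50):
    pages = list(links)
    n = len(pages)
    pr = {p: 1.0 / n for p in pages}              # PR(p_i; 0) = 1/N
    for _ in range(iterations):
        new_pr = {}
        for p in pages:
            # sum over the pages p_j in M(p_i): PR(p_j; t) / L(p_j)
            incoming = sum(pr[q] / len(links[q]) for q in pages if p in links[q])
            new_pr[p] = (1 - d) / n + d * incoming
        pr = new_pr
    return pr

# Toy three-page web: A and B link to each other and to C; C links back to A.
ranks = pagerank({"A": {"B", "C"}, "B": {"A", "C"}, "C": {"A"}})
```

Because every page here has at least one outbound link, the scores remain a probability distribution across iterations.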
PageRank equilibrium
PageRank: The spider trap
The Scaled PageRank algorithm
Scaled PageRank Update Rule
Apply the basic PageRank update rule.
Scale all values down by a factor s.
Divide the leftover 1 - s units of PageRank evenly over all nodes.
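One pass of this rule can be sketched as follows (an illustrative helper; the scaling factor s = 0.8 and the two-page graph are assumptions):

```python
# One Scaled PageRank pass on a tiny graph (illustrative sketch).
# `pr` maps pages to current PageRank values; `links` maps pages to the
# set of pages they link to; s is the scaling factor.

def scaled_update(pr, links, s=0.8):
    n = len(pr)
    # Step 1: apply the basic PageRank rule (follow links).
    basic = {p: sum(pr[q] / len(links[q]) for q in pr if p in links[q])
             for p in pr}
    # Steps 2-3: scale everything down by s, then share the leftover
    # 1 - s units of PageRank evenly over all nodes.
    return {p: s * basic[p] + (1 - s) / n for p in pr}

# Two pages linking to each other: the uniform distribution is a fixed point.
pr1 = scaled_update({"A": 0.5, "B": 0.5}, {"A": {"B"}, "B": {"A"}})
```

The evenly shared leftover is what lets the walk escape spider traps: some rank always leaks back to every node.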
Clustering: the k-means algorithm
Input: K, and a set of points x_1, ..., x_n.
Place centroids c_1, ..., c_K randomly.
Then repeat until convergence:
For each point x_i, find the nearest centroid c_j and assign the point to that cluster. In math notation: argmin_j D(x_i, c_j).
For each cluster j = 1, ..., K, find the new centroid of all points x_i assigned to cluster j in the previous step. In math notation: c_j(a) = (1/n_j) Σ_{x_i → c_j} x_i(a), for a = 1, ..., d.
Stop when the algorithm has converged, i.e. none of the items changes cluster.
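The loop above can be written compactly in plain Python. A sketch using squared Euclidean distance, with fixed initial centroids instead of random ones so the example is reproducible:

```python
# Compact k-means sketch: alternate assignment and update steps
# until no centroid moves.

def kmeans(points, centroids, max_iter=100):
    for _ in range(max_iter):
        # Assignment step: argmin_j D(x_i, c_j)
        clusters = [[] for _ in centroids]
        for x in points:
            best = min(range(len(centroids)),
                       key=lambda j: sum((a - b) ** 2
                                         for a, b in zip(x, centroids[j])))
            clusters[best].append(x)
        # Update step: c_j(a) = (1/n_j) * sum of x_i(a) over assigned points
        new_centroids = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else c
                         for cl, c in zip(clusters, centroids)]
        if new_centroids == centroids:            # converged: nothing moved
            return centroids, clusters
        centroids = new_centroids
    return centroids, clusters

pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
centers, groups = kmeans(pts, [(0, 0), (10, 10)])
```

With two well-separated pairs of points, the centroids settle on the two pair means after a couple of iterations.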
Converging to clusters
How do we select k?
There are diminishing returns in increasing the number of clusters.
An intuitive approach suggests picking the k after which the reduction in within-cluster distance flattens out.
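The heuristic can be illustrated with hypothetical numbers (the inertia values below are made up for the example, not computed from data):

```python
# Hypothetical total within-cluster distances for k = 1..5.
inertia = [100.0, 40.0, 15.0, 13.0, 12.0]
drops = [inertia[i] - inertia[i + 1] for i in range(len(inertia) - 1)]
# Pick the first k whose next reduction is small relative to the first drop:
# after k = 3 the curve flattens out, so k = 3 is the "elbow".
elbow = next(k + 1 for k, drop in enumerate(drops) if drop < 0.2 * drops[0])
```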
Hierarchical Clustering
Agglomerative vs. divisive
Agglomerative clustering starts from the bottom and successively merges points into larger clusters.
Divisive clustering starts with a single cluster, which is gradually split into smaller ones.
Agglomerative vs. divisive
How do we determine the nearness of clusters?
Complete linkage: D(X, Y) = max_{x ∈ X, y ∈ Y} d(x, y)
Single linkage: D(X, Y) = min_{x ∈ X, y ∈ Y} d(x, y)
Average linkage: D(X, Y) = (1/(|X| |Y|)) Σ_{x ∈ X} Σ_{y ∈ Y} d(x, y)
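The three criteria can be written out directly. A sketch for two clusters of one-dimensional points, with d(x, y) taken as the absolute difference (an illustrative choice of distance):

```python
# The three linkage criteria for clusters X and Y of 1-D points,
# with d(x, y) = |x - y|.

def complete_linkage(X, Y):
    return max(abs(x - y) for x in X for y in Y)

def single_linkage(X, Y):
    return min(abs(x - y) for x in X for y in Y)

def average_linkage(X, Y):
    return sum(abs(x - y) for x in X for y in Y) / (len(X) * len(Y))

X, Y = [0.0, 1.0], [4.0, 6.0]
```

For these clusters, complete linkage gives 6, single linkage 3, and average linkage 4.5, which shows how the criteria order cluster pairs differently.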
Agglomerative Clustering
Pick k upfront and stop when we have k clusters.
Alternatively, stop when a cluster with low cohesion would be created (diameter-, radius-, or density-based approaches).
Kohonen’s self-organizing maps
Step 0: Randomly position the grid's neurons in the data space.
Step 1: Select one data point, either randomly or by systematically cycling through the dataset.
Step 2: Find the neuron that is closest to the chosen data point. This neuron is called the Best Matching Unit (BMU).
Step 3: Move the BMU closer to that data point. The distance moved by the BMU is determined by a learning rate, which decreases after each iteration.
Step 4: Move the BMU's neighbors closer to that data point as well, with farther-away neighbors moving less. Neighbors are identified using a radius around the BMU, and the value of this radius decreases after each iteration.
Finally, update the learning rate and the BMU radius before repeating Steps 1 to 4. Iterate these steps until the positions of the neurons have stabilized.
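Steps 0 to 4 can be sketched for a one-dimensional grid of neurons on two-dimensional data. This is an illustrative toy version: the grid size, learning rate, radius, and decay schedule are all assumptions.

```python
import random

# Toy self-organizing map: a 1-D grid of neurons trained on 2-D data.
def train_som(data, n_neurons=4, epochs=20, lr=0.5, radius=1, seed=0):
    rng = random.Random(seed)
    # Step 0: randomly position the grid's neurons in the data space.
    neurons = [[rng.random(), rng.random()] for _ in range(n_neurons)]
    for _ in range(epochs):
        for x in data:                            # Step 1: select a data point
            # Step 2: the Best Matching Unit is the closest neuron.
            bmu = min(range(n_neurons),
                      key=lambda i: sum((a - b) ** 2
                                        for a, b in zip(neurons[i], x)))
            # Steps 3-4: move the BMU and its grid neighbours toward x.
            for i in range(n_neurons):
                if abs(i - bmu) <= radius:
                    neurons[i] = [a + lr * (b - a)
                                  for a, b in zip(neurons[i], x)]
        lr *= 0.9                                 # decay the learning rate
    return neurons

som = train_som([(0.0, 0.0), (1.0, 1.0)])
```

With two data points, some neurons settle near each point while neurons caught between both BMUs' neighbourhoods end up in between, which is the "map" behaviour the slides illustrate.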
Principal component analysis
PCA is often used to accelerate supervised learning, by reducing the number of features before a model is trained.
It is also used for visualization, projecting high-dimensional data onto two or three dimensions.
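For two-dimensional data, the first principal component can be computed by hand: centre the data, build the 2x2 covariance matrix, and take the eigenvector of its largest eigenvalue. A minimal sketch in plain Python, not a general implementation:

```python
import math

# First principal component of 2-D data: the direction of maximum variance.
def first_component(points):
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    centred = [(x - mx, y - my) for x, y in points]
    # Entries of the 2x2 covariance matrix [[cxx, cxy], [cxy, cyy]].
    cxx = sum(x * x for x, _ in centred) / n
    cyy = sum(y * y for _, y in centred) / n
    cxy = sum(x * y for x, y in centred) / n
    if cxy == 0:                       # already diagonal: pick the larger axis
        return (1.0, 0.0) if cxx >= cyy else (0.0, 1.0)
    # Largest eigenvalue, from the characteristic polynomial of a 2x2 matrix.
    tr, det = cxx + cyy, cxx * cyy - cxy * cxy
    lam = tr / 2 + math.sqrt(tr * tr / 4 - det)
    # A corresponding (normalised) eigenvector.
    v = (cxy, lam - cxx)
    norm = math.hypot(v[0], v[1])
    return (v[0] / norm, v[1] / norm)

# Points on the line y = x: the first component is the diagonal direction.
pc = first_component([(0, 0), (1, 1), (2, 2), (3, 3)])
```

Projecting each point onto this direction reduces the data to one dimension while preserving all of its variance in this example.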
Dimensionality reduction in recommender systems
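A common approach in recommender systems is to factor the sparse user-item ratings matrix into low-rank user and item factors, as in SVD-style matrix factorization. Below is a minimal sketch using stochastic gradient descent; the ratings matrix, rank k, learning rate, and iteration count are all illustrative assumptions.

```python
import random

# Factor a small ratings matrix R (None = unobserved) into rank-k user
# factors U and item factors V, so that R[u][i] is approximately U[u] . V[i].
def factorize(R, k=2, steps=2000, lr=0.01, seed=0):
    rng = random.Random(seed)
    n_users, n_items = len(R), len(R[0])
    U = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    V = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(steps):
        for u in range(n_users):
            for i in range(n_items):
                if R[u][i] is None:               # skip unobserved ratings
                    continue
                err = R[u][i] - sum(U[u][f] * V[i][f] for f in range(k))
                for f in range(k):                # gradient step on both factors
                    uf, vf = U[u][f], V[i][f]
                    U[u][f] += lr * err * vf
                    V[i][f] += lr * err * uf
    return U, V

R = [[5, 4, None],
     [4, 5, 1],
     [1, None, 5]]
U, V = factorize(R)
# The learned low-rank factors can now fill in the missing entries of R.
pred = sum(U[0][f] * V[2][f] for f in range(2))
```

The k latent dimensions play the same role as principal components: each user and item is compressed to a short vector, and unobserved ratings are predicted from those vectors.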