
5 - Unsupervised Learning

• Introduction
• Statistical Clustering
• Conceptual Clustering
  • UNIMEM
  • COBWEB

Introduction

• Unsupervised Learning
  • Learner receives no explicit information about the classification of input examples.
  • Information is implicit.
• Aim of learning process - to discover regularities in the input data.
• Typically consists of partitioning instances into classes (based on some similarity metric).
  • i.e. finding clusters of instances in the instance space.

• Not surprising that unsupervised learning systems sometimes closely resemble statistical clustering systems.

What is Clustering?

• Common problem - construction of meaningful classifications of observed objects or situations.

• Often known as numerical taxonomy - since it involves production of a class hierarchy (classification scheme) using a mathematical measure of similarity over the instances.

Simple Clustering Algorithm

• Initialize
  • Set D to be the set of singleton sets such that each set contains a single instance.
• Until D contains only 1 element, do the following:
  • Form a matrix of similarity values for all elements of D (using some given similarity function).
  • Merge those elements of D which have the maximum similarity value.
• Often known as agglomerative clustering.
  • Works bottom-up - trying to build larger clusters (see the sketch below).
• Alternative - divisive clustering.
  • Works top-down (cf. ID3).
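A minimal sketch of this agglomerative procedure in Python. The choice of numeric instances and of negative centroid distance as the similarity function are illustrative assumptions; the slides leave the similarity metric open.

# Minimal agglomerative (bottom-up) clustering sketch.
# Assumption: instances are numeric vectors and the similarity of two
# clusters is the negative Euclidean distance between their centroids.

def centroid(cluster):
    """Mean vector of the instances in a cluster."""
    n = len(cluster)
    return [sum(xs) / n for xs in zip(*cluster)]

def similarity(c1, c2):
    """Similarity of two clusters = negative distance between centroids."""
    a, b = centroid(c1), centroid(c2)
    return -sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def agglomerative_cluster(instances):
    # Initialize: D is a set of singleton clusters, one per instance.
    D = [[x] for x in instances]
    levels = [list(D)]
    # Until D contains only one element, merge the most similar pair.
    while len(D) > 1:
        best = None
        for i in range(len(D)):
            for j in range(i + 1, len(D)):
                s = similarity(D[i], D[j])
                if best is None or s > best[0]:
                    best = (s, i, j)
        _, i, j = best
        merged = D[i] + D[j]
        D = [c for k, c in enumerate(D) if k not in (i, j)] + [merged]
        levels.append(list(D))
    return levels  # each entry is one level of the emerging hierarchy

if __name__ == "__main__":
    points = [(0.0, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9)]
    for level in agglomerative_cluster(points):
        print(level)

Each pass through the loop corresponds to one level of the hierarchy, so the returned list records the bottom-up construction of the clustering.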


Clustering

• Traditional techniques
  • Often inadequate - as they arrange objects into classes solely on the basis of a numerical measure of object similarity.
  • Only information used is that contained in the instances themselves.
  • Algorithms unable to take account of semantic relationships among instance attributes or global concepts that might be of relevance in forming a classification scheme.
• Conceptual Clustering
  • Idea first introduced by R. S. Michalski - 1980.
  • Defined as the process of constructing a concept network characterizing a collection of objects, with nodes marked by concepts describing object classes & links marked by the relationships between the classes.

Clustering

• Consider this example: two diamond-shaped clusters of points, with A and B lying close together but belonging to different diamonds (figure omitted).
• We would not cluster A and B together - but would cluster them into the 2 diamonds.
• Partitioning uses concept membership rather than distance.
• Points are placed in the same cluster if collectively they represent the same concept.
• This is the basis of conceptual clustering.

Conceptual Clustering

• Can be regarded as:
• Given:
  • A set of objects.
  • A set of attributes to be used to characterise the objects.
  • A body of background knowledge - includes problem constraints, properties of attributes, criteria for evaluating the quality of constructed classifications.
• Find:
  • A hierarchy of object classes.
  • Each node should form a coherent concept:
    • Compact.
    • Easily represented in terms of a definition or rule that has a natural interpretation for humans.

Conceptual Clustering

• Given animal descriptors:

  name        body-cover       heart-chamber    body-temp     fertilisation
  mammal      hair             four             regulated     internal
  bird        feathers         four             regulated     internal
  reptile     cornified-skin   imperfect-four   unregulated   internal
  amphibian   moist-skin       three            unregulated   external
  fish        scales           two              unregulated   external

• Classification hierarchy produced:

  animals
  ├── mammal/bird
  │   ├── mammal
  │   └── bird
  ├── amphibian/fish
  │   ├── amphibian
  │   └── fish
  └── reptile


Conceptual Clustering

• Michalski - 1980
  • Conjunctive conceptual clustering.
  • Concept class consists of conjunctive statements involving relations on selected object attributes.
  • Method arranges objects into a hierarchy of classes.
• CLUSTER/2
  • Used to construct a classification hierarchy of a large collection of Spanish folk songs.

UNIMEM

• Lebowitz - 1987
• Essentially a divisive clustering algorithm.
  • Uses a decision tree structure as its basic representation.
• If asked to classify an instance - searches down through the tree, testing attributes, & returns a classification based on the relevant leaf nodes.
• If asked to update the tree so as to represent a new instance - searches down through the tree looking for a suitable place to add in new structure.

UNIMEM

• Basic clustering principle:
  • Add new nodes into the tree as & when they appear to be warranted by the presented instances.
• UNIMEM actually stores each presented instance at all nodes which cover it.
• If two instances stored at a node are particularly similar - then create an extra child node whose definition covers the two instances in question.
  • The two instances are then relocated to this node.
• As new instances are processed - new nodes are created & the hierarchy grows downwards.

UNIMEM

• An instance matches a node if it is covered by that node (concept).
  • Matching is determined by testing to see what proportion of the instance's attributes are associated with the node.
• The search process returns all the most specific nodes that explain (cover) the new instance.
• UNIMEM then generalizes each node in this set as necessary in order to account for the new instance.
• The new instance is then classified with all other instances stored at the node.

Page 4: Unsupervised Slides

UNIMEM Algorithm

• Initialize the decision tree to be an empty root node.
• Apply the following steps to each instance (a rough sketch follows below):
  • Search the tree depth-first for the most specific concept nodes that the instance matches.
  • Add the new instance to the tree at or below these nodes.
    • Involves comparing the new instance to ones already stored there & creating new subnodes if appropriate.
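A rough Python sketch of this update loop, under stated assumptions: instances are attribute-value dictionaries, a node covers an instance when the instance agrees with every feature in the node's concept, and two instances count as "particularly similar" when they share at least two features beyond the node's concept. Lebowitz's actual UNIMEM uses richer node structures and evaluation criteria, so this is only an outline of the idea.

SIM_THRESHOLD = 2  # illustrative: shared features needed to create a subnode

class Node:
    def __init__(self, concept=None):
        self.concept = dict(concept or {})  # attribute -> value defining the class
        self.instances = []                 # instances stored at this node
        self.children = []

def covers(node, instance):
    """A node covers an instance if the instance agrees with its concept."""
    return all(instance.get(a) == v for a, v in node.concept.items())

def most_specific_matches(node, instance):
    """Depth-first search for the most specific covering nodes."""
    deeper = [m for child in node.children if covers(child, instance)
                for m in most_specific_matches(child, instance)]
    return deeper if deeper else [node]

def shared_features(a, b, exclude):
    """Attribute-value pairs common to a and b, ignoring those in exclude."""
    return {k: v for k, v in a.items() if k not in exclude and b.get(k) == v}

def unimem_update(root, instance):
    for node in most_specific_matches(root, instance):
        # If a stored instance is particularly similar, create a subnode
        # covering both instances and relocate them to it.
        for old in node.instances:
            shared = shared_features(instance, old, node.concept)
            if len(shared) >= SIM_THRESHOLD:
                child = Node({**node.concept, **shared})
                child.instances = [old, instance]
                node.children.append(child)
                node.instances.remove(old)
                break
        else:
            # Otherwise simply store the instance at this node.
            node.instances.append(instance)
    return root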

UNIMEM as Memory

• UNIMEM actually stores new instances inside the tree.
  • Can thus be viewed as a type of memory.
• GBM - Generalisation-Based Memory
  • The structure of the hierarchy enables classes of instances to be accessed much more efficiently than would be the case if all instances were stored in a linear memory structure.

COBWEB

• Fisher - 1987
• Based on the principle that a good clustering should minimize the distance between points within a cluster & maximize the distance between points in different clusters.
• A good clustering is defined as:
  • One which maximizes intra-cluster similarity & minimizes inter-cluster similarity.
• Goal of COBWEB - to find the optimum tradeoff between these two!

COBWEB

• Incremental system for hierarchical conceptual clustering.
• Carries out a hill-climbing search through a space of hierarchical classification schemes, using operators which enable bidirectional travel through this space.
• Features of COBWEB:
  • Heuristic evaluation function to guide the search.
  • State representation - structure of hierarchies & representation of concepts.
  • Operators used to build classification schemes.
  • Control strategy.


Category Utility

• Can be viewed as a function which rewards similarity of objects within the same class & dissimilarity of objects in different classes.
• Gluck & Corter - 1985
• Category utility function:

CU = \frac{1}{n} \sum_{k=1}^{n} P(C_k) \left[ \sum_i \sum_j P(A_i = V_{ij} \mid C_k)^2 - \sum_i \sum_j P(A_i = V_{ij})^2 \right]

where the C_k are the n classes of the partition, A_i ranges over attributes and V_ij over the values of attribute A_i.
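Read directly from this formula, a small Python function for nominal attributes; representing instances as attribute-value dictionaries and a partition as a list of clusters is an assumption made for the example.

# Category utility of a partition of instances with nominal attributes.
from collections import Counter

def sum_squared_probs(instances, attributes):
    """Sum over attributes i and values j of P(A_i = V_ij)^2."""
    n = len(instances)
    total = 0.0
    for a in attributes:
        counts = Counter(inst[a] for inst in instances)
        total += sum((c / n) ** 2 for c in counts.values())
    return total

def category_utility(partition):
    """Category utility of a partition (a list of clusters of instances)."""
    all_instances = [inst for cluster in partition for inst in cluster]
    attributes = sorted(all_instances[0].keys())
    baseline = sum_squared_probs(all_instances, attributes)
    total = 0.0
    for cluster in partition:
        p_k = len(cluster) / len(all_instances)
        total += p_k * (sum_squared_probs(cluster, attributes) - baseline)
    return total / len(partition)

if __name__ == "__main__":
    fish      = {"body-cover": "scales",     "body-temp": "unregulated"}
    amphibian = {"body-cover": "moist-skin", "body-temp": "unregulated"}
    mammal    = {"body-cover": "hair",       "body-temp": "regulated"}
    # Grouping fish with amphibian scores higher than grouping fish with mammal.
    print(category_utility([[fish, amphibian], [mammal]]))
    print(category_utility([[fish, mammal], [amphibian]]))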

Representation

• The choice of category utility as the heuristic measure dictates a concept representation different from the logical, typically conjunctive, representations used in AI.

• Probabilistic representation of {fish, amphibian, mammal}

• Each node in the classification tree is a probabilistic concept which represents an object class & summarises the objects classified under the node.

Attribute        Values & Probabilities
body-cover       scales (0.33), moist-skin (0.33), hair (0.33)
heart-chamber    two (0.33), three (0.33), four (0.33)
body-temp        unregulated (0.67), regulated (0.33)
fertilisation    external (0.67), internal (0.33)

Operators

• Incorporation of a new object into the tree is a process of classifying an object by descending the tree along an appropriate path & performing one of several operations at each level.

• Operators include:
  • Classifying the object with respect to an existing class.
  • Creating a new class.
  • Combining two classes into a single class.
  • Dividing a class into several classes.

Operators contd ...

• Classifying an object in an existing class
  • To determine which category best "hosts" a new object, COBWEB tentatively places the object in each category.
  • The partition which results from adding the object to a given node is evaluated using the category utility function.
  • The node which results in the best partition (highest CU) is identified as the best existing host for the new object.
• Creating a new class
  • The quality of the partition resulting from placing the object in the best existing host is compared to the partition resulting from creating a new singleton class containing the object.
  • Depending on which partition is better - the object is placed in the best existing class or a new class is created (a sketch of this comparison follows below).
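A small sketch of this comparison, reusing the category_utility function sketched above. Representing the classes at one level of the tree as plain lists of instances, and the helper name best_host_or_new_class, are assumptions made to keep the example short.

def best_host_or_new_class(siblings, obj):
    """Place obj in its best existing class, or in a new singleton class,
    whichever partition has the higher category utility.
    'siblings' is the partition at one level: a list of clusters."""
    # Tentatively place the object in each existing class in turn.
    def with_obj_in(i):
        trial = [list(c) for c in siblings]
        trial[i].append(obj)
        return trial
    best_i = max(range(len(siblings)),
                 key=lambda i: category_utility(with_obj_in(i)))
    best_existing = with_obj_in(best_i)
    # Compare against the partition formed by a new singleton class.
    new_class = [list(c) for c in siblings] + [[obj]]
    if category_utility(new_class) > category_utility(best_existing):
        return new_class, "created new class"
    return best_existing, "placed in existing class"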


Example

• Existing classification structure (fish & amphibian):
  Root: P(C0) = 1.0, P(scales | C0) = 0.5, ...
    C1 (fish): P(C1) = 0.5, P(scales | C1) = 1.0, ...
    C2 (amphibian): P(C2) = 0.5, P(moist | C2) = 1.0, ...

• Add "mammal" - a new singleton class is created:
  Root: P(C0) = 1.0, P(scales | C0) = 0.33, ...
    C1 (fish): P(C1) = 0.33, P(scales | C1) = 1.0, ...
    C2 (amphibian): P(C2) = 0.33, P(moist | C2) = 1.0, ...
    C3 (mammal): P(C3) = 0.33, P(hair | C3) = 1.0, ...

• Add "bird" - bird is placed with mammal, which now forms an intermediate class:
  Root: P(C0) = 1.0, P(scales | C0) = 0.25, ...
    C1 (fish): P(C1) = 0.25, P(scales | C1) = 1.0, ...
    C2 (amphibian): P(C2) = 0.25, P(moist | C2) = 1.0, ...
    C3 (mammal/bird): P(C3) = 0.5, P(hair | C3) = 0.5, ...
      C4 (mammal): P(C4) = 0.5, P(hair | C4) = 1.0, ...
      C5 (bird): P(C5) = 0.5, P(feathers | C5) = 1.0, ...

Operators contd ...

• While the first two operators are effective in many ways - by themselves they are very sensitive to the ordering of the input data.
• Merging & splitting operators are implemented to guard against these effects.
• Merging
  • Two nodes of a level are combined in the hope that the resultant partition is of better quality.
  • Involves creating a new node.
  • The two original nodes are made children of the newly created node.
• Splitting
  • A node may be deleted and its children promoted.

Merging & Splitting Operators

• Node Merging: two sibling nodes A and B under parent P are combined - a new node is created as a child of P, and the original nodes A and B become children of the new node.
• Node Splitting: a node under parent P is deleted, and its children A and B are promoted to become children of P.

COBWEB Control Structure

COBWEB (Object, Root of classification tree)
1. Update the counts of the Root.
2. IF Root is a leaf
   THEN return the expanded leaf to accommodate Object
   ELSE find the child of Root which best hosts Object & perform one of the following:
      a. Consider creating a new class & do so if appropriate.
      b. Consider node merging & do so if appropriate; call COBWEB (Object, Merged node).
      c. Consider node splitting & do so if appropriate; call COBWEB (Object, Root).
      d. IF none of the above were performed THEN call COBWEB (Object, Best child of Root).
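A runnable Python outline of this control structure, again relying on the category_utility sketch from earlier. Node merging and splitting are reduced to stubs that never fire, and each node's probabilistic summary is kept implicitly as the list of objects classified beneath it, so this illustrates operators (a) and (d) only and is not a full COBWEB implementation.

class CNode:
    """A node summarising the objects classified beneath it."""
    def __init__(self, instances=None):
        self.instances = list(instances or [])
        self.children = []

    def is_leaf(self):
        return not self.children

def consider_merge(node, obj):
    return None    # stub: real COBWEB evaluates the CU of merging two children

def consider_split(node, obj):
    return False   # stub: real COBWEB evaluates the CU of splitting a child

def cobweb(obj, node):
    # 1. Update counts of the node (here: record the object itself).
    node.instances.append(obj)
    # 2. A leaf is expanded to accommodate the new object.
    if node.is_leaf():
        if len(node.instances) > 1:
            node.children = [CNode([o]) for o in node.instances]
        return node
    # Find the child which best hosts the object (highest-CU partition).
    def cu_with_obj_in(i):
        trial = [list(c.instances) for c in node.children]
        trial[i].append(obj)
        return category_utility(trial)
    best_i = max(range(len(node.children)), key=cu_with_obj_in)
    # a. Consider creating a new class.
    singleton = [list(c.instances) for c in node.children] + [[obj]]
    if category_utility(singleton) > cu_with_obj_in(best_i):
        node.children.append(CNode([obj]))
    # b. Consider node merging (stub); c. consider node splitting (stub).
    elif consider_merge(node, obj) is not None:
        cobweb(obj, consider_merge(node, obj))
    elif consider_split(node, obj):
        cobweb(obj, node)
    # d. Otherwise descend into the best child.
    else:
        cobweb(obj, node.children[best_i])
    return node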


AutoClass

• Cheeseman et al - 1988
• Bayesian statistical technique.
  • Bayes' theorem - a formula for combining probabilities.
• The technique determines:
  • The most probable number of classes.
  • Their probabilistic descriptions.
  • The probability that each object is a member of each class.
• AutoClass does not do an absolute partitioning of the data into classes.
  • It calculates the probability of each object's membership in each class (a small sketch follows below).
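A small sketch of the kind of soft assignment this describes: Bayes' theorem applied to per-class priors and attribute-value probabilities yields a membership probability for every class rather than a hard label. The class descriptions below and the independence of attributes given the class are illustrative assumptions, not AutoClass's actual model.

def membership_probabilities(obj, classes):
    """P(class | obj) for each class, via Bayes' theorem.
    'classes' maps a class name to (prior, {attribute: {value: probability}});
    attributes are treated as independent given the class."""
    scores = {}
    for name, (prior, value_probs) in classes.items():
        likelihood = 1.0
        for attr, val in obj.items():
            # Unseen values get a small probability to avoid zeroing out.
            likelihood *= value_probs.get(attr, {}).get(val, 1e-6)
        scores[name] = prior * likelihood
    total = sum(scores.values())
    return {name: s / total for name, s in scores.items()}

if __name__ == "__main__":
    classes = {
        "warm-blooded": (0.4, {"body-temp": {"regulated": 0.95, "unregulated": 0.05}}),
        "cold-blooded": (0.6, {"body-temp": {"regulated": 0.05, "unregulated": 0.95}}),
    }
    print(membership_probabilities({"body-temp": "regulated"}, classes))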