5 - Unsupervised Learning
• Introduction
• Statistical Clustering
• Conceptual Clustering
• UNIMEM
• COBWEB
Introduction
• Unsupervised Learning
  • Learner receives no explicit information about classification of input examples.
  • Information is implicit.
• Aim of learning process - to discover regularities in the input data.
• Typically, consists of partitioning instances into classes (based on some similarity metric).
  • i.e. finding clusters of instances in the instance space.
• Not surprising that unsupervised learning systems sometimes closely resemble statistical clustering systems.
What is Clustering ?
• Common problem - construction of meaningful classifications of observed objects or situations.
• Often known as numerical taxonomy - since it involves production of a class hierarchy (classification scheme) using a mathematical measure of similarity over the instances.
Simple Clustering Algorithm
• Initialize:
  • Set D to be the set of singleton sets such that each set contains a unique instance.
• Until D contains only 1 element, do the following:
  • Form a matrix of similarity values for all elements of D, using some given similarity function.
  • Merge those elements of D which have a maximum similarity value.
• Often known as agglomerative clustering.
  • Works bottom-up - trying to build larger clusters.
• Alternative - divisive clustering.
  • Works top-down (cf ID3).
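The agglomerative procedure above can be sketched in Python. The slides leave the similarity function unspecified, so the single-link similarity over 1-D points below is an illustrative assumption:

```python
from itertools import combinations

def agglomerative(instances, similarity):
    """Bottom-up clustering: start from singleton sets and repeatedly merge
    the pair of clusters with maximum similarity until one cluster remains.
    Returns the merge history (a simple dendrogram trace)."""
    D = [frozenset([i]) for i in range(len(instances))]  # D: set of singleton sets
    history = []
    while len(D) > 1:
        # form similarity values for all pairs of elements of D ...
        a, b = max(combinations(D, 2),
                   key=lambda pair: similarity(instances, pair[0], pair[1]))
        # ... and merge the pair with maximum similarity
        D.remove(a)
        D.remove(b)
        D.append(a | b)
        history.append((set(a), set(b)))
    return history

# Illustrative single-link similarity on 1-D points (an assumption, not from the slides)
def single_link(points, ca, cb):
    return max(-abs(points[i] - points[j]) for i in ca for j in cb)

print(agglomerative([0.0, 0.1, 5.0, 5.2], single_link))
```

The two nearby point pairs are merged first, then the two resulting clusters; divisive clustering would apply the same idea top-down.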
Clustering
• Traditional techniques
  • Often inadequate - as they arrange objects into classes solely on the basis of a numerical measure of object similarity.
  • Only information used is that contained in the instances themselves.
  • Algorithms unable to take account of semantic relationships among instance attributes or global concepts that might be of relevance in forming a classification scheme.
• Conceptual Clustering
  • Idea first introduced by R S Michalski - 1980.
  • Defined as process of constructing a concept network characterizing a collection of objects, with nodes marked by concepts describing object classes & links marked by the relationships between the classes.
Clustering
• Consider this example: points A & B lie close together, but in two different diamond-shaped clusters.
• We would not cluster A and B together - but would cluster them into the 2 diamonds.
• Partitioning using concept membership rather than distance.
• Points are placed in the same cluster if collectively they represent the same concept.
• This is basis of conceptual clustering
Conceptual Clustering
• Can be regarded as:
• Given:
  • A set of objects.
  • A set of attributes to be used to characterise objects.
  • A body of background knowledge - includes problem constraints, properties of attributes, criteria for evaluating quality of constructed classifications.
• Find:
  • A hierarchy of object classes.
  • Each node should form a coherent concept:
    • Compact.
    • Easily represented in terms of a definition or rule that has a natural interpretation for humans.
Conceptual Clustering
• Given animal descriptors:
name       body-cover      heart-chamber   body-temp    fertilisation
mammal     hair            four            regulated    internal
bird       feathers        four            regulated    internal
reptile    cornified-skin  imperfect-four  unregulated  internal
amphibian  moist-skin      three           unregulated  external
fish       scales          two             unregulated  external

• Classification hierarchy produced:

animals
├── mammals/bird
│   ├── mammal
│   └── bird
├── amphibian/fish
│   ├── fish
│   └── amphibian
└── reptile
Conceptual Clustering
• Michalski - 1980
• Conjunctive conceptual clustering
  • Concept class consists of conjunctive statements involving relations on selected object attributes.
  • Method arranges objects into a hierarchy of classes.
• CLUSTER/2
  • Used to construct classification hierarchy of a large collection of Spanish folk songs.
UNIMEM
• Lebowitz - 1987
• Essentially a divisive clustering algorithm.
• Uses a decision tree structure as its basic representation.
• If asked to classify an instance - searches down through the tree, testing attributes & returns a classification based on the relevant leaf nodes.
• If asked to update the tree so as to represent a new instance - searches down through the tree looking for a suitable place to add in new structure.
UNIMEM
• Basic clustering principle:
  • Add new nodes into tree as & when they appear to be warranted by the presented instances.
  • UNIMEM actually stores each presented instance at all nodes which cover it.
• If two instances stored at a node are particularly similar - then create an extra child node whose definition covers the two instances in question.
  • The two instances are then relocated to this node.
  • As new instances are processed - new nodes are created & hierarchy grows downwards.
UNIMEM
• Instance matches a node if it is covered by that node (concept).
  • Matching determined by testing to see what proportion of the instance's attributes are associated with the node.
• Search process returns all the most specific nodes that explain (cover) the new instance.
• UNIMEM then generalizes each node in this set as necessary in order to account for the new instance.
• The new instance is then classified with all other instances stored at the node.
UNIMEM Algorithm
• Initialize decision tree to be an empty root node.
• Apply following steps to each instance:
  • Search the tree depth-first for most specific concept nodes that the instance matches.
  • Add new instance to the tree at or below these nodes.
    • Involves comparing new instance to ones already stored there & creating new subnodes if appropriate.
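A minimal sketch of this update loop, under simplifying assumptions: instances are frozensets of attribute-value pairs, and a fixed shared-attribute threshold stands in for UNIMEM's fuller similarity test:

```python
# A minimal UNIMEM-style tree update (the threshold-based similarity test is
# a simplifying assumption; Lebowitz's full algorithm is more elaborate).
class Node:
    def __init__(self, definition=frozenset()):
        self.definition = definition   # attribute-value pairs this node covers
        self.instances = []            # instances stored at this node
        self.children = []

def matches(node, instance):
    # an instance matches a node if the node's definition covers it
    return node.definition <= instance

def most_specific(node, instance):
    """Depth-first search for the most specific matching nodes."""
    hits = [c for c in node.children if matches(c, instance)]
    if not hits:
        return [node]
    found = []
    for child in hits:
        found.extend(most_specific(child, instance))
    return found

def add_instance(root, instance, threshold=2):
    for node in most_specific(root, instance):
        # compare the new instance with those already stored here; if two are
        # particularly similar, create a child node covering both & relocate them
        for other in node.instances:
            shared = (instance & other) - node.definition
            if len(shared) >= threshold:
                child = Node(node.definition | shared)
                child.instances = [other, instance]
                node.instances.remove(other)
                node.children.append(child)
                return
        node.instances.append(instance)

root = Node()
add_instance(root, frozenset({("cover", "hair"), ("temp", "regulated")}))
add_instance(root, frozenset({("cover", "hair"), ("temp", "regulated")}))
print(len(root.children))  # the two similar instances now share a child node
```

Note how the hierarchy grows downwards: the two similar instances are relocated into a freshly created child whose definition generalises them.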
UNIMEM as Memory
• UNIMEM actually stores new instances inside the tree.
• Can thus be viewed as a type of memory.
• GBM - Generalisation-Based Memory
  • Structure of hierarchy enables classes of instances to be accessed much more efficiently than would be the case if all instances were stored in a linear memory structure.
COBWEB
• Fisher - 1987
• Based on principle that a good clustering should minimize distance between points within a cluster & maximize distance between points in different clusters.
• Good clustering defined as:
  • One which maximizes intra-cluster similarity & minimizes inter-cluster similarity.
• Goal of COBWEB - to find optimum tradeoff between these two!
COBWEB
• Incremental system for hierarchical conceptual clustering
• Carries out hill-climbing search through a space of hierarchical classification schemes using operators which enable bidirectional travel through this space.
• Features of COBWEB:
  • Heuristic evaluation function to guide search.
  • State representation - structure of hierarchies & representation of concepts.
  • Operators used to build classification schemes.
  • Control strategy.
Category Utility
• Can be viewed as a function which rewards similarity of objects within same class & dissimilarity of objects in different classes.
• Gluck & Corter - 1985
• Category utility function:
CU = (1/n) Σ_{k=1..n} P(C_k) [ Σ_i Σ_j P(A_i = V_ij | C_k)^2 - Σ_i Σ_j P(A_i = V_ij)^2 ]

where n is the number of classes in the partition {C_1, ..., C_n}, A_i ranges over attributes & V_ij over the values of attribute A_i.
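The category utility function can be computed directly from attribute-value frequencies. A minimal sketch, assuming each instance is a dict mapping attribute names to values:

```python
from collections import Counter

def category_utility(partition):
    """Gluck & Corter's category utility for a partition of instances,
    where each instance is a dict mapping attribute names to values."""
    everything = [x for cluster in partition for x in cluster]
    N = len(everything)

    def sum_sq(instances):
        # sum over attributes i & values j of P(A_i = V_ij)^2
        counts = Counter((a, v) for x in instances for a, v in x.items())
        return sum((c / len(instances)) ** 2 for c in counts.values())

    base = sum_sq(everything)   # expected score with no class information
    n = len(partition)
    return sum(len(c) / N * (sum_sq(c) - base) for c in partition) / n

# Two pure single-attribute clusters (hypothetical data):
print(category_utility([[{"cover": "hair"}] * 2, [{"cover": "scales"}] * 2]))  # -> 0.25
```

Pure clusters score higher than mixed ones: splitting the same four instances so each cluster contains one of each value yields a category utility of 0.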
Representation
• Choice of category utility as heuristic measure dictates a concept representation different to logical, typically conjunctive representations used in AI.
• Probabilistic representation of {fish, amphibian, mammal}
• Each node in the classification tree is a probabilistic concept which represents an object class & summarises the objects classified under the node.
Attribute      Values & Probabilities
body-cover     scales (0.33), moist-skin (0.33), hair (0.33)
heart-chamber  two (0.33), three (0.33), four (0.33)
body-temp      unregulated (0.67), regulated (0.33)
fertilisation  external (0.67), internal (0.33)
Operators
• Incorporation of a new object into the tree is a process of classifying an object by descending the tree along an appropriate path & performing one of several operations at each level.
• Operators include:
  • Classifying object with respect to an existing class.
  • Creating a new class.
  • Combining two classes into a single class.
  • Dividing a class into several classes.
Operators contd ...
• Classifying object in existing class
  • To determine which category best "hosts" a new object, COBWEB tentatively places the object in each category.
  • Partition which results from adding object to a given node is evaluated using category utility function.
  • Node which results in the best partition (highest CU) is identified as the best existing host for the new object.
• Creating a new class
  • Quality of the partition resulting from placing the object in the best existing host is compared to partition resulting from creation of a new singleton class containing the object.
  • Depending on which partition is best - object is placed in the best existing class or a new class is created.
Example
• Existing classification structure (two instances: fish & amphibian):

  C0: P(C0) = 1.0,  P(scales | C0) = 0.5,  ...
    C1: P(C1) = 0.5,  P(scales | C1) = 1.0,  ...
    C2: P(C2) = 0.5,  P(moist | C2) = 1.0,  ...

• Add "mammal" - a new singleton class C3 is created:

  C0: P(C0) = 1.0,  P(scales | C0) = 0.33,  ...
    C1: P(C1) = 0.33,  P(scales | C1) = 1.0,  ...
    C2: P(C2) = 0.33,  P(moist | C2) = 1.0,  ...
    C3: P(C3) = 0.33,  P(hair | C3) = 1.0,  ...

• Add "bird" - bird is hosted by C3, which gains children C4 & C5:

  C0: P(C0) = 1.0,  P(scales | C0) = 0.25,  ...
    C1: P(C1) = 0.25,  P(scales | C1) = 1.0,  ...
    C2: P(C2) = 0.25,  P(moist | C2) = 1.0,  ...
    C3: P(C3) = 0.5,   P(hair | C3) = 0.5,   ...
      C4: P(C4) = 0.5,  P(hair | C4) = 1.0,  ...
      C5: P(C5) = 0.5,  P(feath | C5) = 1.0,  ...
Operators contd ...
• While the first two operators are effective in many ways, by themselves they are very sensitive to the ordering of the input data.
• Merging & splitting operators implemented to guard against these effects.
• Merging
  • Two nodes of a level are combined in hope that the resultant partition is of better quality.
  • Involves creating a new node.
  • Two original nodes are made children of newly created node.
• Splitting
  • Node may be deleted and its children promoted.
Merging & Splitting Operators
• Node Merging: parent P with children A & B  ->  P with a single new node whose children are A & B.
• Node Splitting: P with a node whose children are A & B  ->  P with children A & B.
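On a simple dict-based tree, the two operators can be sketched as follows (the node representation is an assumption for illustration):

```python
def merge_nodes(parent, a, b):
    """Node merging: children a & b of parent are replaced by a newly
    created node which has a & b as its children."""
    merged = {"children": [a, b]}
    parent["children"].remove(a)
    parent["children"].remove(b)
    parent["children"].append(merged)
    return merged

def split_node(parent, node):
    """Node splitting: node is deleted & its children are promoted."""
    parent["children"].remove(node)
    parent["children"].extend(node["children"])

p = {"children": [{"children": []}, {"children": []}]}
a, b = p["children"]
m = merge_nodes(p, a, b)        # p now has one child: the merged node
split_node(p, m)                # back to the original two children
print(p["children"] == [a, b])  # -> True
```

The demo also shows why the operators enable bidirectional travel through the space of hierarchies: splitting undoes a merge exactly.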
COBWEB Control Structure
COBWEB(Object, Root of classification tree)
  1. Update counts of the Root.
  2. IF Root is a leaf THEN
        Return the expanded leaf to accommodate Object
     ELSE
        Find the child of Root which best hosts Object & perform one of the following:
        a. Consider creating a new class & do so if appropriate.
        b. Consider node merging & do so if appropriate; call COBWEB(Object, Merged node).
        c. Consider node splitting & do so if appropriate; call COBWEB(Object, Root).
        d. IF none of the above were performed THEN call COBWEB(Object, Best child of Root).
AutoClass
• Cheeseman et al - 1988
• Bayesian statistical technique
  • Bayes' theorem - formula for combining probabilities.
• Technique determines:
  • Most probable number of classes.
  • Their probabilistic descriptions.
  • Probability that each object is a member of each class.
• AutoClass does not do absolute partitioning of data into classes.
  • Calculates the probability of each object's membership in each class.
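The soft assignment AutoClass produces follows directly from Bayes' theorem. A minimal sketch with hypothetical priors & per-class likelihoods:

```python
def membership_probabilities(priors, likelihoods):
    """Bayes' theorem: P(class_k | x) is proportional to P(class_k) * P(x | class_k).
    AutoClass-style soft assignment: every object receives a probability of
    membership in every class rather than a single hard label."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)   # P(x), the normalising constant
    return [j / evidence for j in joint]

# Hypothetical two-class example: equal priors, per-class likelihoods of one object
print(membership_probabilities([0.5, 0.5], [0.9, 0.1]))
```

The returned probabilities sum to 1, so no object is ever assigned absolutely to one class; it simply has higher membership probability in some classes than others.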