Prof NB Venkateswarlu
Head, IT, GVPCOE, Visakhapatnam
www.ritchcenter.com/nbv
First, let me say a hearty welcome to you all.
Also, let me congratulate the
Chairman,
Secretary/Correspondent,
Principal,
Prof. Ravindra Babu, Vice-Principal,
and the other organizers for planning such a nice workshop with excellent themes.
Feature Extraction/Selection
My Talk
A Typical Image Processing System contains:
Image Acquisition
Image Pre-Processing
Image Enhancement
Image Segmentation
Image Feature Extraction
Image Classification
Image Understanding
Two Aspects of Feature Extraction
Extracting useful features from images or any other measurements.
Identifying transformed variables which are functions of the original variables and which have some desirable characteristics.
Feature Selection
Selecting Important Variables is Feature Selection
Some Features Used in I.P Applications
• Shape based
• Contour based
• Area based
• Transform based
• Projections
• Signature
• Problem specific
Perimeter, length, etc. First, the convex hull is extracted.
Skeletons
Averaged Radial density
Radial Basis functions
Rose Plots
Chain Codes
Crack code - 32330300
Signature
Bending Energy
Chord Distribution
Fourier Descriptors
Structure
Splines
Horizontal and vertical projections
Elongatedness
Convex Hull
Compactness (a small sketch follows this list)
RGB, R, G and B bands
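To make one of these descriptors concrete, here is a minimal sketch of compactness (perimeter² / 4π·area, close to 1 for a disc) computed from a binary mask; the crude boundary-pixel perimeter estimate and the `compactness` helper name are illustrative assumptions, not a standard API.

```python
import numpy as np

def compactness(mask):
    """Compactness = perimeter^2 / (4*pi*area); about 1 for a disc.

    `mask` is a binary image (1 = object).  The perimeter is crudely
    estimated as the number of object pixels with a background
    4-neighbour, which is enough for a rough illustration.
    """
    m = mask.astype(bool)
    area = m.sum()
    p = np.pad(m, 1)                       # zero-pad so border objects are handled
    interior = p[:-2, 1:-1] & p[2:, 1:-1] & p[1:-1, :-2] & p[1:-1, 2:]
    perimeter = (m & ~interior).sum()      # object pixels touching background
    return perimeter ** 2 / (4 * np.pi * area)

square = np.zeros((20, 20), dtype=int)
square[5:15, 5:15] = 1                     # a filled 10x10 square
print(compactness(square))                 # slightly above 1; a disc is the minimum
```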
Classification/Pattern Recognition
• Statistical
• Syntactical/Linguistic
• Discriminant function
• Fuzzy
• Neural
• Hybrid
Dimensionality Reduction
• Feature selection (i.e., attribute subset selection):
– Select a minimum set of features such that the probability distribution of different classes given the values for those features is as close as possible to the original distribution given the values of all features
– Reduces the number of attributes in the patterns, making them easier to understand
• Heuristic methods (due to the exponential number of choices):
– step-wise forward selection
– step-wise backward elimination
– combining forward selection and backward elimination
– decision-tree induction
Example of Decision Tree Induction
Initial attribute set: {A1, A2, A3, A4, A5, A6}

A4?
├─ A1? → Class 1 / Class 2
└─ A6? → Class 1 / Class 2

=> Reduced attribute set: {A1, A4, A6}
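A hedged sketch of decision-tree induction used for attribute selection, in the spirit of the example above: attributes that never appear in the fitted tree are dropped. The synthetic data, and the rule that only A1, A4 and A6 matter, are assumptions made purely for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))            # attributes A1..A6
# Only A1 (col 0), A4 (col 3) and A6 (col 5) influence the class label.
y = (((X[:, 3] > 0) & (X[:, 0] > -0.5)) |
     ((X[:, 3] <= 0) & (X[:, 5] > 0.5))).astype(int)

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
# Keep only the attributes the induced tree actually split on.
kept = [f"A{i+1}" for i, imp in enumerate(tree.feature_importances_) if imp > 0]
print("Reduced attribute set:", kept)    # typically ['A1', 'A4', 'A6']
```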
Heuristic Feature Selection Methods
• There are 2^d possible sub-features of d features
• Several heuristic feature selection methods:
– Best single features under the feature independence assumption: choose by significance tests.
– Best step-wise feature selection (a forward-selection sketch follows below):
• The best single feature is picked first
• Then the next best feature conditioned on the first, ...
– Step-wise feature elimination:
• Repeatedly eliminate the worst feature
– Best combined feature selection and elimination
– Optimal branch and bound:
• Use feature elimination and backtracking
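The step-wise forward procedure from the list above can be sketched in a few lines. Cross-validated k-NN accuracy stands in for the significance test, and the iris data is just a convenient stand-in; neither is prescribed by the slides.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
remaining = list(range(X.shape[1]))
selected = []
best_score = 0.0

# Greedy forward selection: repeatedly add the single feature that most
# improves cross-validated accuracy; stop when no feature helps.
while remaining:
    scores = [(cross_val_score(KNeighborsClassifier(),
                               X[:, selected + [f]], y, cv=5).mean(), f)
              for f in remaining]
    score, f = max(scores)
    if score <= best_score:
        break
    best_score = score
    selected.append(f)
    remaining.remove(f)

print("Selected feature indices:", selected, "accuracy:", round(best_score, 3))
```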
Why Do We Need Feature Selection?
• A classifier's performance depends on:
• the number of features
• feature distinguishability
• the number of groups
• group characteristics in multidimensional space
• the required response time
• memory requirements
Feature Extraction Methods
We will find transformed variables which are functions of original variables.
A good example: though we may conduct more than one test (K-D), final grading is done based on total marks (1-D).
Principal Component Analysis
• Given N data vectors in k dimensions, find c <= k orthogonal vectors that can best be used to represent the data
– The original data set is reduced to one consisting of N data vectors on c principal components (reduced dimensions)
• Each data vector is a linear combination of the c principal component vectors
• Works for numeric data only
• Used when the number of dimensions is large
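A minimal numpy sketch of the idea, using a plain eigendecomposition of the covariance matrix; the helper name `pca_reduce` is illustrative:

```python
import numpy as np

def pca_reduce(X, c):
    """Project N k-dimensional vectors onto the top-c principal components."""
    Xc = X - X.mean(axis=0)                 # centre the data
    cov = np.cov(Xc, rowvar=False)          # k x k covariance matrix
    vals, vecs = np.linalg.eigh(cov)        # eigh: for symmetric matrices
    order = np.argsort(vals)[::-1][:c]      # indices of the top-c eigenvalues
    W = vecs[:, order]                      # k x c projection matrix
    return Xc @ W                           # each row: linear combo of c PCs

X = np.random.default_rng(1).normal(size=(100, 5))
print(pca_reduce(X, 2).shape)               # (100, 2)
```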
Principal Component Analysis
[Figure: data in the original coordinates X1, X2 with the principal-component axes Y1, Y2 overlaid]
Principal Component Analysis
Aimed at finding a new coordinate system which has some desirable characteristics.
Mean M = [4.5, 4.25]
Covariance matrix:
[ 2.57 1.86 ]
[ 1.86 6.21 ]
Eigenvalues = 6.99, 1.79
Eigenvectors = [ 0.387, 0.922 ] and [ -0.922, 0.387 ]
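The numbers above can be checked with a few lines of numpy (eigenvector signs may differ, which is harmless):

```python
import numpy as np

cov = np.array([[2.57, 1.86],
                [1.86, 6.21]])
vals, vecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
print(vals[::-1])                  # approx [6.99, 1.79]
print(vecs[:, ::-1])               # columns approx [0.387, 0.922] and
                                   # [-0.922, 0.387], up to sign
```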
However, in some cases PCA does not work well: the directions of maximum variance need not be the directions that separate the classes.
Canonical Analysis
Unlike PCA, which uses the global mean and covariance, canonical analysis uses the between-group and within-group covariance matrices and then calculates the canonical axes.
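A sketch of that computation under one common formulation (eigenvectors of Sw⁻¹Sb, i.e., Fisher's discriminant axes); the helper name and the numpy-based approach are assumptions for illustration:

```python
import numpy as np

def canonical_axes(X, y):
    """Directions maximising between-group scatter relative to
    within-group scatter: eigenvectors of inv(Sw) @ Sb."""
    mean = X.mean(axis=0)
    k = X.shape[1]
    Sw, Sb = np.zeros((k, k)), np.zeros((k, k))
    for g in np.unique(y):
        Xg = X[y == g]
        mg = Xg.mean(axis=0)
        Sw += (Xg - mg).T @ (Xg - mg)       # within-group scatter
        d = (mg - mean).reshape(-1, 1)
        Sb += len(Xg) * (d @ d.T)           # between-group scatter
    vals, vecs = np.linalg.eig(np.linalg.solve(Sw, Sb))
    order = np.argsort(vals.real)[::-1]     # strongest axis first
    return vecs.real[:, order]
```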
Standard Deviation – A Simple Indicator
Correlation Coefficient
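Both indicators are cheap to compute, which is their appeal: a feature with near-zero standard deviation carries little information, and a low correlation with the class label suggests low relevance. A small sketch, with purely synthetic data:

```python
import numpy as np

X = np.random.default_rng(2).normal(size=(200, 4))
y = (X[:, 0] + 0.1 * X[:, 1] > 0).astype(float)   # label driven mainly by feature 0

# Standard deviation: (near-)zero spread means the feature is uninformative.
print("std per feature:", X.std(axis=0).round(2))

# |correlation with the class label|: a crude per-feature relevance score.
corr = [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
print("|corr(feature, label)|:", np.round(corr, 2))  # feature 0 dominates
```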
Feature Selection – Group Separability Indices
Feature Selection Through Clustering
Selecting From 4 variables
Multi-Layer Perceptron
[Figure: a multi-layer perceptron with input nodes, hidden nodes and output nodes; the input vector x_i feeds forward through weights w_ij to the output vector]

Net input to unit j: $I_j = \sum_i w_{ij} O_i + \theta_j$
Sigmoid output: $O_j = \dfrac{1}{1 + e^{-I_j}}$
Error at an output unit: $Err_j = O_j (1 - O_j)(T_j - O_j)$
Error at a hidden unit: $Err_j = O_j (1 - O_j) \sum_k Err_k\, w_{jk}$
Weight update: $w_{ij} = w_{ij} + (l)\, Err_j\, O_i$
Bias update: $\theta_j = \theta_j + (l)\, Err_j$
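A compact sketch that implements exactly these update equations on a toy XOR problem; the network size, learning rate and epoch count are arbitrary illustrative choices.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)    # input -> hidden (w_ij, theta_j)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)    # hidden -> output
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)
l = 0.5                                           # learning rate

for _ in range(20000):
    for x, t in zip(X, T):
        O_h = sigmoid(x @ W1 + b1)                # I_j = sum_i w_ij O_i + theta_j
        O_o = sigmoid(O_h @ W2 + b2)
        err_o = O_o * (1 - O_o) * (t - O_o)       # Err_j = O_j(1-O_j)(T_j-O_j)
        err_h = O_h * (1 - O_h) * (W2 @ err_o)    # Err_j = O_j(1-O_j) sum_k Err_k w_jk
        W2 += l * np.outer(O_h, err_o)            # w_ij = w_ij + (l) Err_j O_i
        b2 += l * err_o                           # theta_j = theta_j + (l) Err_j
        W1 += l * np.outer(x, err_h)
        b1 += l * err_h

print(sigmoid(sigmoid(X @ W1 + b1) @ W2 + b2).round(2))  # should approach [0, 1, 1, 0]
```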
Network Pruning and Rule Extraction
• Network pruning
– A fully connected network will be hard to articulate
– N input nodes, h hidden nodes and m output nodes lead to h(m+N) weights
– Pruning: remove some of the links without affecting the classification accuracy of the network
• Extracting rules from a trained network
– Discretize activation values; replace each individual activation value by its cluster average, maintaining the network accuracy
– Enumerate the outputs from the discretized activation values to find rules between activation values and outputs
– Find the relationship between the inputs and activation values
– Combine the above two to obtain rules relating the outputs to the inputs
Neural Networks for Feature Extraction
Self-organizing feature maps (SOMs)
• Clustering is also performed by having several units competing for the current object
• The unit whose weight vector is closest to the current object wins
• The winner and its neighbors learn by having their weights adjusted
• SOMs are believed to resemble processing that can occur in the brain
• Useful for visualizing high-dimensional data in 2- or 3-D space
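A minimal SOM sketch along these lines: units on an 8×8 grid compete for each input, the winner and its neighbours move toward it, and the grid gives a 2-D view of higher-dimensional data. The grid size and the learning-rate and neighbourhood schedules are illustrative choices, not prescribed values.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(500, 3))           # high-dimensional inputs (3-D here)
grid = 8                                   # 8x8 map for 2-D visualisation
W = rng.normal(size=(grid, grid, 3))       # one weight vector per map unit
coords = np.stack(np.meshgrid(np.arange(grid), np.arange(grid),
                              indexing="ij"), -1)

for t, x in enumerate(data):
    lr = 0.5 * (1 - t / len(data))                    # decaying learning rate
    sigma = max(grid / 2 * (1 - t / len(data)), 0.5)  # shrinking neighbourhood
    # Winner: the unit whose weight vector is closest to the current object.
    d = ((W - x) ** 2).sum(axis=2)
    bmu = np.unravel_index(d.argmin(), d.shape)
    # Winner and its neighbours learn, weighted by distance on the grid.
    g = np.exp(-((coords - np.array(bmu)) ** 2).sum(axis=2) / (2 * sigma ** 2))
    W += lr * g[..., None] * (x - W)

print(W.shape)   # (8, 8, 3): each 3-D input now maps to a cell on a 2-D grid
```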
Other Model-Based Clustering Methods
• Neural network approaches
– Represent each cluster as an exemplar, acting as a "prototype" of the cluster
– New objects are assigned to the cluster whose exemplar is the most similar, according to some distance measure
• Competitive learning
– Involves a hierarchical architecture of several units (neurons)
– Neurons compete in a "winner-takes-all" fashion for the object currently being presented
Model-Based Clustering Methods
SVM
SVM constructs nonlinear decision functions by training a classifier to perform a linear separation in some high-dimensional space which is nonlinearly related to the input space. A Mercer kernel is used for the mapping.
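A short sketch of that idea with scikit-learn's SVC: the concentric-circles data is not linearly separable in the input space, but the RBF kernel (a Mercer kernel) makes a linear separation possible in the induced feature space. The dataset and gamma value are illustrative assumptions.

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Two concentric circles: not linearly separable in the input space.
X, y = make_circles(n_samples=400, noise=0.1, factor=0.4, random_state=0)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# The RBF kernel implicitly maps the data to a high-dimensional space
# where a linear separation is possible.
clf = SVC(kernel="rbf", gamma=2.0).fit(Xtr, ytr)
print("test accuracy:", clf.score(Xte, yte))   # close to 1.0 on this toy data
```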