INTRODUCTION HYPERSPECTRAL BAND SELECTION CLASSIFICATION OF DATA ON GRASSMANNIANS FUTURE DIRECTIONS
Data Analysis Methods and Applications: Hyperspectral Band Selection and Data Classification on Embedded Grassmannians
Sofya Chepushtanova
Department of Mathematics, Colorado State University
February 10, 2014
SOFYA CHEPUSHTANOVA COLORADO STATE UNIVERSITY 1 OF 48
Outline
1. Introduction: Motivation; Sparse SVMs
2. Hyperspectral Band Selection: Hyperspectral Imagery (HSI); Algorithm; Computational Results; Future Work
3. Classification of Data on Grassmannians: Grassmannian Framework; Algorithm; Application to HSI; Future Work
4. Future Directions
Motivation
Application-driven research
Algorithms for Threat Detection (ATD) program (launched in 2009): developing novel mathematical and statistical methods to extract meaningful information from large data streams
Big data: massive, high-dimensional, complex
Growing demand for geometric data analysis, classification, and dimension reduction models
Dimension reduction - how?
Feature extraction: transforms the data to a lower-dimensional space, using manifold learning techniques
Feature selection: identifies the relevant set of features while maintaining or improving the performance of a prediction model
Support Vector Machines
Training data xi ∈ Rn with class labels di ∈ {−1, +1}, i = 1, . . . , m; D = diag(di) and X is the m × n data matrix.
Separating hyperplane P = {x : wTx + b = 0}; w ∈ Rn is normal to P.
Points on wTx + b = ±1 are support vectors.
The optimal P has the largest margin 2/‖w‖2.

SVM:
  min_{w,b,ξ}  ‖w‖2^2 / 2 + C eTξ
  s.t.  D(Xw + be) + ξ ≥ e,  ξ ≥ 0.

Decision function: f(x) = sgn(wTx + b)
[Figure: optimal separating hyperplane wTx + b = 0 with margin planes wTx + b = ±1, the normal w, support vectors, and misclassified points for classes +1 and −1.]
Nonlinear SVM: Kernel Trick
Φ : x ∈ RN ↦ Φ(x) ∈ RN′, N′ > N.
Kernel function: Kij = K(xi, xj) = Φ(xi)TΦ(xj).
[Figure: the map Φ from input space to feature space.]
The decision function is f(x) = sgn(∑_{i=1}^m αi di K(xi, x) + b).
RBF kernel: K(xi, x) = exp(−γ‖xi − x‖^2); polynomial kernel: K(xi, x) = (xiTx + 1)^n.
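A quick NumPy sketch of these two kernel functions (the data and the γ value are illustrative choices, not values from the talk):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2) for all pairs of rows."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def poly_kernel(X, Y, degree=2):
    """K(x, y) = (x^T y + 1)^degree for all pairs of rows."""
    return (X @ Y.T + 1.0) ** degree

X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = rbf_kernel(X, X)  # symmetric, with ones on the diagonal since ||x - x|| = 0
```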
Arbitrary-Norm Separating Hyperplane
Dual norm
For a norm ‖·‖ on Rn, the dual norm is ‖x‖′ := max_{‖y‖=1} xTy.
Example: for p, q ∈ [1, ∞] with 1/p + 1/q = 1, the p-norm and q-norm are dual.
Theorem (Mangasarian, 1998)
Let q ∈ Rn be any point not on the plane P := {x | wTx + b = 0}, 0 ≠ w ∈ Rn, b ∈ R. Then the distance between q and its projection p(q) onto P is given by:
  ‖q − p(q)‖ = |wTq + b| / ‖w‖′.
Sparse SVMs
Corollary
‖q − p(q)‖∞ = |wTq + b| / ‖w‖1
(where ‖x‖1 = ∑_{i=1}^n |xi| and ‖x‖∞ = maxi |xi|)

If the ℓ∞-norm is used to measure the distance between the planes, then the margin is given by 2/‖w‖1, which yields the following sparse SVM (SSVM):

  min_{w,b,ξ}  ‖w‖1 + C eTξ
  s.t.  D(Xw + be) + ξ ≥ e,  ξ ≥ 0.
Sparse SVMs
SSVM ⇒ LP (with w = w+ − w−, w+, w− ≥ 0, so that ‖w‖1 = eT(w+ + w−)):

  min_{w+,w−,b,ξ}  eT(w+ + w−) + C eTξ
  s.t.  D(X(w+ − w−) + be) + ξ ≥ e,  w+, w−, ξ ≥ 0.
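This LP can be handed directly to an off-the-shelf solver. A minimal sketch with SciPy's `linprog` on synthetic two-class data (the data and the value of C are illustrative assumptions, not from the talk):

```python
import numpy as np
from scipy.optimize import linprog

# Synthetic two-class data in R^2 (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (20, 2)), rng.normal(2.0, 1.0, (20, 2))])
d = np.hstack([-np.ones(20), np.ones(20)])
m, n = X.shape
C = 1.0

# Variables stacked as [w+ (n), w- (n), b (1), xi (m)].
c = np.hstack([np.ones(2 * n), [0.0], C * np.ones(m)])
# D(X(w+ - w-) + b e) + xi >= e, rewritten as A_ub z <= b_ub:
DX = d[:, None] * X
A_ub = np.hstack([-DX, DX, -d[:, None], -np.eye(m)])
b_ub = -np.ones(m)
bounds = [(0, None)] * (2 * n) + [(None, None)] + [(0, None)] * m
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")

w = res.x[:n] - res.x[n:2 * n]
b = res.x[2 * n]
acc = np.mean(np.sign(X @ w + b) == d)  # training accuracy of the 1-norm SVM
```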
Sparsity of the ℓ1-norm:
[Figure: left, 1-norm and 2-norm separating hyperplanes for two classes in the (x1, x2) plane; right, the feasible set with the 1-norm and 2-norm loci and the corresponding SVM solutions in the (w1, w2) plane.]
Hyperspectral Imagery (HSI)
Hyperspectral sensors generate imagery in the electromagnetic spectrum, capturing aspects that are imperceptible to the human eye.
The radiance of materials is measured within each pixel area at a very large number of contiguous spectral wavelength bands.
Spatial and spectral information is contained in data cubes.
Each pixel is a vector x ∈ Rn.
[Figure: HSI data cube with axes X (columns of pixels), Y (rows of pixels), and Z (bands), and spectral radiance vs. band index for the 16 classes: Alfalfa, Corn-notill, Corn-min, Corn, Grass-Pasture, Grass-Trees, Grass-PastureMowed, Hay-windrowed, Oats, Soybeans-notill, Soybeans-min, Soybeans-clean, Wheat, Woods, Bldg-Grass-Trees-Drives, Stone-steel Towers.]
Hyperspectral Imagery (HSI)
Advantage: rich, detailed radiance information.
Disadvantage: huge amount of data (more is not always better).
Band selection: identify a subset of bands that contains the most discriminatory information → use them for further analysis.

Methods
1. Filters: all bands → filter → band subset → predictor
2. Wrappers: all bands → space of band subsets → predictor (wrapper) → band subset
3. Embedded algorithms: all bands → predictor → band subset
Band Selection via SSVMs (Collaborators: M. Kirby and C. Gittins)
A linear SSVM is the basic model for band selection. We solve it by the primal-dual interior point method, which allows one to monitor the variation of the primal and dual variables simultaneously.
A weight ratio criterion for embedded band selection makes it easy to distinguish the non-zero weights from the zero weights.
The bagging (Bootstrap AGGregatING) approach is employed to enhance the robustness of SSVMs.
We extend binary band selection to the multiclass case.
The SSVM algorithm is an effective technique for embedded band selection ⇒ high accuracies in numerical experiments.
Recall: Sparse Linear SVMs
Training data xi ∈ Rn with class labels di ∈ {−1, +1}, i = 1, . . . , m; D = diag(di) and X is the m × n data matrix.
Separating hyperplane P = {x : wTx + b = 0}; w ∈ Rn is normal to P.
Points on wTx + b = ±1 are support vectors.
The optimal P has the largest margin 2/‖w‖1.

SSVM:
  min_{w,b,ξ}  ‖w‖1 + C eTξ
  s.t.  D(Xw + be) + ξ ≥ e,  ξ ≥ 0.

Decision function: f(x) = sgn(wTx + b)
[Figure: optimal separating hyperplane wTx + b = 0 with margin planes wTx + b = ±1, the normal w, support vectors, and misclassified points for classes +1 and −1.]
Sparsity in w
Comparison of weights for sparse SVM and standard SVM models using two classes of a hyperspectral data set.
[Figure: sparse SVM weights vs. standard SVM weights as a function of wavelength (µm).]
Weight ratio criterion
The resulting weights of the model, w1, w2, . . . , wl, are ordered so that
  |wi1| ≥ |wi2| ≥ · · · ≥ |wil|.
The key feature of this sparse approach is that
  |wik| / |wik+1| = O(1)
save for where the weights transition to zero:
  |wik*| / |wik*+1| = O(10^M).
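The criterion can be sketched as a cutoff rule (the ratio threshold of 10 matches the algorithm described later; the helper name is ours, and the example magnitudes are modeled on the binary band-selection weights shown later in the talk):

```python
import numpy as np

def ratio_cutoff(w, ratio=10.0):
    """Order weights by magnitude and keep indices up to the first place
    where consecutive magnitudes drop by more than `ratio` (roughly an
    order of magnitude), i.e. where the weights transition to zero."""
    order = np.argsort(-np.abs(w))
    mags = np.abs(w)[order]
    for k in range(len(mags) - 1):
        if mags[k + 1] == 0 or mags[k] / mags[k + 1] > ratio:
            return order[:k + 1]
    return order

w = np.array([1.4249e-3, 1.3191e-3, 3.5594e-8, 1.6342e-9])
print(ratio_cutoff(w))  # the O(1e-3) weights are kept; the rest are dropped
```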
Bootstrap Aggregating (Breiman, 1996)
To enforce stability: sample with replacement from the n-dimensional training data to compute N SSVM models:

  band 1: [w1^1 w1^2 · · · w1^N]
  band 2: [w2^1 w2^2 · · · w2^N]
  ...
  band j: [wj^1 wj^2 · · · wj^N]
  ...
  band n: [wn^1 wn^2 · · · wn^N]

To reduce the number of bands, we eliminate those with at least 95% of "zeros" in the samples:
  #{ |wk^j| < tolerance, j = 1, . . . , N } ≥ 0.95 N for the k-th band.
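A sketch of this elimination rule, assuming the N bootstrap weight vectors are stacked as rows of an N × n matrix (the function name and tolerance are illustrative):

```python
import numpy as np

def select_bands(W, tol=1e-6, zero_frac=0.95):
    """W holds the N bootstrap SSVM weight vectors as rows (shape N x n).
    Band k is eliminated when at least `zero_frac` of its N weights are
    numerically zero; the surviving band indices are returned."""
    n_zero = np.sum(np.abs(W) < tol, axis=0)      # zero count per band
    return np.flatnonzero(n_zero < zero_frac * W.shape[0])

# Hypothetical weights from N = 4 bootstrap models over n = 3 bands:
W = np.array([[0.0,  0.5, 1e-9],
              [0.0,  0.4, 0.0 ],
              [1e-8, 0.6, 0.0 ],
              [0.0,  0.3, 0.0 ]])
print(select_bands(W))  # only band 1 survives
```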
Algorithm
Input: set of bands S = {1, 2, . . . , n}
1. Sample with replacement from the training data X ⇒ X1, X2, . . . , XN.
2. Train N SSVM models fj(x) ⇒ N weight vectors w^j.
3. Remove the k-th band if #{ |wk^i| < tol, i = 1, . . . , N } ≥ 0.95 N ⇒ S = S \ {k}, Xnew = X(:, S).
4. Train an SSVM on Xnew ⇒ w; rank the w values ⇒ w^r.
5. In w^r compare magnitude orders: if |w^r_ik| / |w^r_ik+1| > 10 for some ik = ik*, remove the bands starting from ik* + 1 and update S.
6. Train a final SSVM model f on Xnew = Xnew(:, S).
Return: band set S, model f
Multiclass Band Selection
One-against-one (OAO) SSVMs
k classes → (k choose 2) = k(k − 1)/2 binary classifiers → majority voting to assign a class to a testing point.
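The voting step can be sketched as follows, with hand-made 1-D decision functions standing in for trained binary SSVMs (the data and classifiers here are purely illustrative):

```python
import numpy as np

def oao_predict(x, classifiers, classes):
    """classifiers maps a class pair (a, b) to a trained decision function
    returning +1 (a vote for class a) or -1 (a vote for class b); the class
    collecting the most votes over all k(k-1)/2 pairs wins."""
    votes = {c: 0 for c in classes}
    for (a, b), f in classifiers.items():
        votes[a if f(x) > 0 else b] += 1
    return max(votes, key=votes.get)

# Hand-made 1-D "classifiers" standing in for trained binary SSVMs:
classes = [0, 1, 2]
classifiers = {
    (0, 1): lambda x: 1 if x[0] < 0.5 else -1,
    (0, 2): lambda x: 1 if x[0] < 1.5 else -1,
    (1, 2): lambda x: 1 if x[0] < 1.5 else -1,
}
print(oao_predict(np.array([0.2]), classifiers, classes))  # class 0 wins 2 votes
```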
Methods
Method I: rank selected bands by the frequency of their occurrence.
Method II: rank bands in each two-class subset by magnitude and take the superset of the M top bands.
Method III: the Ward's Linkage strategy Using Mutual Information (WaLuMI) method (Martinez-Uso et al., 2007) is a filter method that we employ as a pre-selection step.
Spatial Smoothing
Adopted from Zare & Gader, 2008: after a testing pixel X has been assigned a class vote via OAO SSVMs, spatial smoothing can be done by summing class votes over the eight-connected neighborhood of the pixel X.
[Figure: example 8-connected neighborhoods of a pixel X with different class votes.]
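One plausible reading of this smoothing step, assuming per-pixel class-vote counts are stored in an (H, W, K) array (the exact handling of votes in the talk may differ):

```python
import numpy as np

def smooth_votes(votes):
    """votes: (H, W, K) array of per-pixel class-vote counts from the OAO
    classifiers. Votes are summed over each pixel's 8-connected
    neighborhood (plus the pixel itself) before taking the winning class."""
    H, W, K = votes.shape
    padded = np.pad(votes, ((1, 1), (1, 1), (0, 0)))
    acc = np.zeros((H, W, K))
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += padded[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
    return acc.argmax(axis=2)

votes = np.zeros((3, 3, 2))
votes[:, :, 0] = 1            # every pixel votes for class 0 ...
votes[1, 1] = [0, 1]          # ... except the center, which votes for class 1
labels = smooth_votes(votes)  # the isolated center vote is smoothed away
```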
AVIRIS Indian Pines Data Set
[Figure: 145 × 145 AVIRIS Indian Pines scene and ground-truth map with background plus 16 classes: Alfalfa, Corn-notill, Corn-min, Corn, Grass-Pasture, Grass-Trees, Grass-PastureMowed, Hay-windrowed, Oats, Soybeans-notill, Soybeans-min, Soybeans-clean, Wheat, Woods, Bldg-Grass-Trees-Drives, Stone-steel Towers.]
Airborne Visible/Infrared Imaging Spectrometer (AVIRIS): collected in an agricultural area of northern Indiana in 1992.
145× 145 images, 220 spectral bands (ranging from 0.4 to 2.5µm).
Ground truth is known for 49% of the pixels.
16 classes ranging from 20 to 2468 pixels.
Comparison with Other Methods
1. WaLuMI: hierarchical clustering approach that exploits band correlation using a mutual information (MI) criterion (Martinez-Uso et al., 2007).
2. B-SPICE: simultaneous band selection and endmember detection (Zare & Gader, 2008).
3. Lasso logistic regression:
  min_{β0,β}  −(1/m) ∑_{i=1}^m [ yi(β0 + xiTβ) − log(1 + e^(β0 + xiTβ)) ] + λ‖β‖1.
Binary Band Selection
Weight magnitudes:
Corn-min and Woods          Corn-notill and Grass/Trees
Band   Weight               Band   Weight
29     1.4249e-03           1      1.0202e-03
41     1.3191e-03           9      9.6991e-04
28     3.5594e-08           5      6.5283e-04
42     1.6342e-09           29     8.3022e-09
27     1.3258e-09           32     4.2466e-09
Accuracy rates (%) for binary band selection:
Classes                            All-bands   SSVM algorithm       WaLuMI + SSVM        Lasso logistic regression
                                   accuracy    # kept / accuracy    # kept / accuracy    # kept / accuracy
Corn-min and Woods                 100.00      2 / 100.00           2 / 99.9             12 / 100.00
Corn-notill and Grass/Trees        99.73       12 / 99.73           12 / 100             19 / 98.9
Soybeans-notill and Soybeans-min   89.58       179 / 89.23          - / -                127 / 89.52
Binary Band Selection
Spectral signatures and weights of selected bands for:
[Figure: spectral radiance vs. band index and weights of the selected bands for Corn-min & Woods, Corn-notill & Grass/Trees, and Soybeans-notill & Soybeans-min.]
Multiclass Band Selection
Number of bands selected for each of the (16 choose 2) subsets (pairs of classes) and number of occurrences of each band.
[Figure: left, heat map over class-number pairs of the number of bands selected per subset; right, occurrence number vs. band index.]
Multiclass Band Selection
Accuracy plots for OAO SSVM before and after spatial smoothing, obtained by Methods I and III.
[Figure: classification accuracy (%) vs. number of bands for Method I and Method III, each with and without spatial smoothing.]
Multiclass Band Selection
Accuracy results for multiclass band selection (%) and comparison with other methods:

# Bands kept   Method I (frequency)   Method II (top bands)   Method III (WaLuMI + SSVM)   B-SPICE + RVM   WaLuMI + NN
220            98.36                  -                       98.36                        93.9            -
80             97.14                  -                       96.89                        -               -
57             95.66                  97.3                    96.22                        -               -
34             93.15                  -                       93.03                        86.4            80
19             91.20                  -                       92.57                        82.5            81
10             84.37                  -                       93.07                        -               81
Future Work
Apply the algorithm to other data sets (not necessarily HSI).
Consider using kernel SSVMs instead of linear: the resultingdimension reduction is not in the number of input space featuresbut in the number of kernel functions, so it is interesting toinvestigate how a feature selection tool can be build in thenonlinear predictor.
References
V. N. Vapnik, The Nature of Statistical Learning Theory, New York: Springer, 1995.
L. Breiman, Bagging predictors, Machine Learning, 24, pp. 123-140, 1996.
O. L. Mangasarian, Arbitrary-norm separating plane, Operations Research Letters, 24, pp. 15-23, 1997.
J. Bi, K. P. Bennett, M. Embrechts, C. M. Breneman, and M. Song, Dimensionality reduction via sparse support vector machines, Journal of Machine Learning Research, 3, pp. 1229-1243, 2003.
O. L. Mangasarian, Exact 1-norm support vector machines via unconstrained convex differentiable minimization, Journal of Machine Learning Research, 7, pp. 1517-1530, 2006.
A. Zare and P. Gader, Hyperspectral band selection and endmember detection using sparsity promoting priors, IEEE Geoscience and Remote Sensing Letters, vol. 5, no. 2, pp. 256-260, 2008.
A. Martinez-Uso, F. Pla, J. M. Sotoca, and P. Garcia-Sevilla, Clustering-based hyperspectral band selection using information measures, IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 12, pp. 4158-4171, 2007.
S. Chepushtanova, C. Gittins, and M. Kirby, Band selection in hyperspectral imagery using sparse support vector machines, submitted.
Classification of Data on Grassmannians (Collaborator: M. Kirby)
Set-to-set pattern recognition: a set of points from a class characterizes the variability of the class information.
Grassmann manifolds G(k, n) (collections of k-dimensional subspaces of Rn) provide a geometric framework for characterizing sets of points.
Subspaces can be realized as points in Euclidean space via multidimensional scaling.
A sparse support vector machine identifies optimal dimensions of the embedded subspaces.
Grassmann Manifold
Definition
The Grassmann manifold G(k, n) is the collection of all k-dimensional linear subspaces of Rn, 1 ≤ k ≤ n.
Example: G(1, n) is the set of all lines going through the origin ofRn (projective space RPn−1).
Note 1: An element of G(k, n) can be represented by an n× korthogonal matrix U (UTU = Ik).
Note 2: The matrix representation on G(k, n) is not unique: we sayU1 = U2 if span(U1) = span(U2).
Constructing Points on G(k, n):
Data points in Rn, subspace dimension k:
1. Form n × k "tall-skinny" matrices Y1, Y2, . . . , YN from sample data.
2. Compute the SVD: Yi = UiΣiViT.
3. U1, U2, . . . , UN represent points on G(k, n) (or take UjVjT = arg min_{PTP=I} ‖Yj − P‖F).
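The SVD construction takes a few lines of NumPy (the random 220 × 5 matrix here is a stand-in for 5 pixel spectra over 220 bands):

```python
import numpy as np

def grassmann_point(Y):
    """Orthonormal basis for the column space of an n-by-k data matrix Y,
    via the thin SVD Y = U Sigma V^T; U represents span(Y) on G(k, n)."""
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U

# A random 220 x 5 matrix standing in for 5 pixel spectra with 220 bands:
rng = np.random.default_rng(1)
U = grassmann_point(rng.standard_normal((220, 5)))
# U has orthonormal columns (U^T U = I_5), so it is a valid representative
# of a point on G(5, 220).
```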
Geodesic distance
[Figure: principal angle θ between up ∈ span(Ui) and vp ∈ span(Uj).]

Principal angles
0 ≤ θ1 ≤ θ2 ≤ . . . ≤ θk ≤ π/2, given by
  cos θp = max_{up∈span(Ui)} max_{vp∈span(Uj)} upTvp,
where upTup = 1, vpTvp = 1, upTuq = 0, vpTvq = 0, q = 1, . . . , p − 1.

Geodesic distance (or arc length)
  dG(Ui, Uj) = ‖θ‖2 = sqrt(∑_{p=1}^k θp^2)
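In practice the principal angles between span(U1) and span(U2) are the arccosines of the singular values of U1TU2, which gives a short sketch of the geodesic distance:

```python
import numpy as np

def geodesic_distance(U1, U2):
    """Arc-length distance on G(k, n): the principal angles are the
    arccosines of the singular values of U1^T U2, and d_G = ||theta||_2."""
    sigma = np.linalg.svd(U1.T @ U2, compute_uv=False)
    theta = np.arccos(np.clip(sigma, -1.0, 1.0))
    return np.linalg.norm(theta)

# Two 2-planes in R^3 sharing one direction and orthogonal in the other,
# so the principal angles are (0, pi/2) and the distance is pi/2:
U1 = np.eye(3)[:, [0, 1]]
U2 = np.eye(3)[:, [0, 2]]
d = geodesic_distance(U1, U2)
```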
Embedding G(k, n) in Rd via Multidimensional Scaling (MDS)
Classical MDS (Mardia):
Input: distance matrix D ∈ RN×N with Dij = dG(Ui, Uj).
1. Compute B = HAH, where H = I − (1/N)eeT and Aij = −(1/2)Dij^2 (e is a vector of N ones).
2. Compute the spectral decomposition of B: B = ΓΛΓT.
3. Set X := ΓΛ^(1/2).
Output: X, a configuration of points in Rd, where d = rank(B) = rank(X) ≤ N − 1. (Note: Be = 0.)
Note: if B is positive semidefinite, the configuration preserves the geodesic distances; otherwise we adopt the resulting scaling as the best approximation we can obtain.
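A minimal implementation of these three steps (here eigenvalues of B that are numerically non-positive are simply dropped, in the spirit of the approximation just described):

```python
import numpy as np

def classical_mds(D):
    """Double-center -D^2/2 to get B, eigendecompose, and scale the
    eigenvectors by the square roots of the positive eigenvalues."""
    N = D.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    B = H @ (-0.5 * D ** 2) @ H
    lam, Gamma = np.linalg.eigh(B)
    idx = np.argsort(lam)[::-1]          # eigenvalues in decreasing order
    lam, Gamma = lam[idx], Gamma[:, idx]
    keep = lam > 1e-10                   # drop zero/negative eigenvalues
    return Gamma[:, keep] * np.sqrt(lam[keep])

# Pairwise distances of the points 0, 1, 3 on a line are reproduced exactly:
D = np.array([[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]])
X_emb = classical_mds(D)
```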
Algorithm
1. Compute points and the geodesic distance matrix on G(k, n).
2. Embed the subspaces in Euclidean space via MDS, preserving distances.
3. Feature (dimension) selection and classification via SSVMs.
Application to HSI
[Figure: 145 × 145 AVIRIS Indian Pines scene and ground-truth map with background plus 16 classes: Alfalfa, Corn-notill, Corn-min, Corn, Grass-Pasture, Grass-Trees, Grass-PastureMowed, Hay-windrowed, Oats, Soybeans-notill, Soybeans-min, Soybeans-clean, Wheat, Woods, Bldg-Grass-Trees-Drives, Stone-steel Towers.]
Airborne Visible/Infrared Imaging Spectrometer (AVIRIS): collected in an agricultural area of northern Indiana in 1992.
145× 145 images, 220 spectral bands (ranging from 0.4 to 2.5µm).
Ground truth is known for 49% of the pixels.
16 classes ranging from 20 to 2468 pixels.
Configurations in Euclidean Space
2 classes: Corn-notill (blue) and Grass/Pasture (red). Dimensions correspond to the two top eigenvalues of B (MDS). Solid dots: training set; hollow dots: testing set.
[Figure: 2-D MDS configurations for subspace dimensions k = 1, 2, 3, 5, 10, 15.]
Configurations in Euclidean Space
3 classes: Corn-notill (blue), Grass/Pasture (red), and Grass/Trees (green). Dimensions correspond to the two top eigenvalues of B (MDS). (Solid dots: training set; hollow dots: testing set.)
[Figure: 2-D MDS configurations for subspace dimensions k = 1, 2, 3, 5, 10, 15.]
Classification
SSVM applied to the configuration of points on G(15, 220) embedded in Euclidean space: Corn-notill (blue) and Grass/Pasture (red). Dimensions correspond to the two largest absolute values of the sparse weight vector w.
[Figure: embedded points plotted in the two dimensions selected by the SSVM.]
Classification
Accuracy as a function of k:
[Figure: training and testing accuracy vs. subspace dimension k for Corn-notill and Grass/Pasture (left) and Corn-notill, Grass/Pasture, and Grass/Trees (right).]
Feature Selection in Embedded Spaces
Corn-notill versus Grass/Pasture: N = 200 constructed points on G(k, 220)
k (subspace   d (feature-space dim.   # negative         # zero             Features selected                                     # features
dimension)    of embedded points)     eigenvalues of B   eigenvalues of B                                                         selected
1             131                     68                 1                  1-3, 5-7, 10                                          7
2             156                     43                 1                  1-6, 8, 11                                            8
3             126                     73                 1                  1-6, 10-13, 16-18, 20, 23, 43, 39, 47, 62, 74         20
5             147                     52                 1                  1, 3, 6, 9, 14, 15, 18, 19, 34, 37, 39, 42, 52, 63    14
10            195                     4                  1                  1, 4, 5, 8, 15, 28, 38, 65, 71                        9
20            199                     0                  1                  1, 3, 24, 31, 63                                      5
25            199                     0                  1                  1, 2, 8, 14                                           4
Future Work
Use other distances provided by the principal angles, for instance the projection F-norm, ‖sin θ‖2.
Compare results of the multiclass case with the literature.
Determine computationally the optimal number of constructed points on G(k, n) for training and testing.
Apply the method to other HSI and medical data sets.
References
J.-M. Chang, et al., Recognition of digital images of the human face at ultra low resolution via illumination spaces, Proceedings of the 8th Asian Conference on Computer Vision - Volume Part II, pp. 733-743, 2007.
A. Edelman, T. A. Arias, and S. T. Smith, The geometry of algorithms withorthogonality constraints, SIAM J. MATRIX ANAL. APPL, 20(2), pp.303-353, 1998.
K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis, AcademicPress, 1979.
V. N. Vapnik, The Nature of Statistical Learning Theory, New York: Springer, 1995.
O. L. Mangasarian, Exact 1-norm support vector machines via unconstrained convex differentiable minimization, Journal of Machine Learning Research, 7, pp. 1517-1530, 2006.
Ellipsoidal Separation: Motivation
Important application: medical diagnosis. In particular, we are interested in the diagnosis of neonatal sepsis, for a data set collected in Yale-New Haven Hospital's Neonatal Intensive Care Unit (NICU).
We expect points from the same class to be close to each other, i.e. to be enclosed in a hull or ball. Ellipsoids, being affine deformations of balls, make the separation procedure scaling invariant.
Ellipsoids are simple convex sets.
Ellipsoidal separation can be modelled as a semidefinite program (SDP), which can be solved efficiently.
Ellipsoids: Facts
An ellipsoid is the image of a unit ball {xTx ≤ 1} under an affinetransformation.
Given a center c and an n × n symmetric positive semidefinite matrix E (E ⪰ 0), we can define an ellipsoid as {x ∈ Rn | (x − c)TE(x − c) ≤ 1}.
The condition E ⪰ 0 is crucial for ellipsoids: if it is not satisfied, the equation above may describe any quadratic set.
Feasibility Problem (Boyd and Vandenberghe 2004)
find  P, q, r
s.t.  xiT P xi + qT xi + r ≥ 1, i = 1, . . . , N,
      yiT P yi + qT yi + r ≤ −1, i = 1, . . . , M,
      P ≺ 0.

Note: the constraint P ≺ 0 can be expressed as P ⪯ −I (due to the homogeneity of f in P, q, r).
Non-separable Data
minimize_{P,q,r,ξ,τ}  eTξ + eTτ
subject to  xiT P xi + qT xi + r ≥ 1 − ξi, i = 1, . . . , N,
            yiT P yi + qT yi + r ≤ −1 + τi, i = 1, . . . , M,
            P ⪯ −I,  ξ, τ ≥ 0.
References
S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, New York, NY, USA, 2004.
M. Grant and S. Boyd, Graph implementations for nonsmooth convexprograms, In V. Blondel, S. Boyd, and H. Kimura, editors, Recent Advancesin Learning and Control, Lecture Notes in Control and InformationSciences, pp. 95-110. Springer-Verlag Limited, 2008,http://stanford.edu/~boyd/graph_dcp.html.
M. Grant and S. Boyd, CVX: Matlab software for disciplined convexprogramming, version 2.0 beta. http://cvxr.com/cvx, September2013.
Topological Data Analysis (TDA)
Basic idea: describe the "shape of the data" by finding clusters, holes, tunnels, etc.
Persistent homology (PH): a rapidly growing branch of TDA.
PH can be applied to a data set to capture the persistence of topological structure across scales.
Application of PH to hyperspectral remote sensing data analysis. See, e.g., A. Zomorodian and G. Carlsson, Computing persistent homology, Discrete Comput. Geom., 33 (2005), no. 2, pp. 249-274.
Persistent Homology
Encoded in the form of a parameterized version of a Betti number,called a barcode: a set of line segments each representing the range ofparameter values over which a topological feature persists.
THANK YOU FOR YOUR ATTENTION!