INTRODUCTION HYPERSPECTRAL BAND SELECTION CLASSIFICATION OF DATA ON GRASSMANNIANS FUTURE DIRECTIONS
Data Analysis Methods and Applications: Hyperspectral Band Selection and Data Classification on Embedded Grassmannians
Sofya Chepushtanova
Department of Mathematics, Colorado State University
February 10, 2014
SOFYA CHEPUSHTANOVA COLORADO STATE UNIVERSITY 1 OF 48
Outline
1. Introduction: Motivation; Sparse SVMs
2. Hyperspectral Band Selection: Hyperspectral Imagery (HSI); Algorithm; Computational Results; Future Work
3. Classification of Data on Grassmannians: Grassmannian Framework; Algorithm; Application to HSI; Future Work
4. Future Directions
Motivation
Application-driven research
Algorithms for Threat Detection (ATD) program (launched in 2009): developing novel mathematical and statistical methods to extract meaningful information from large data streams
Big data: massive, high-dimensional, complex
Growing demand for geometric data analysis, classification, and dimension reduction models
Dimension reduction - how?
Feature extraction: transforms the data to a lower-dimensional space, using manifold learning techniques
Feature selection: identifies the relevant set of features while maintaining or improving the performance of a prediction model
Support Vector Machines
Training data xi ∈ Rn with class labels di ∈ {−1, +1}, i = 1, . . . , m; D = diag(di) and X is the m × n data matrix.
Separating hyperplane P = {x : wTx + b = 0}; w ∈ Rn is normal to P.
Points on wTx + b = ±1 are support vectors.
The optimal P has the largest margin 2/‖w‖2.

SVM:
  min_{w,b,ξ}  ‖w‖2^2 / 2 + C eTξ
  s.t.  D(Xw + be) + ξ ≥ e,  ξ ≥ 0.

Decision function: f(x) = sgn(wTx + b)
[Figure: optimal separating hyperplane wTx + b = 0 with margin planes wTx + b = ±1, the normal w, support vectors, and misclassified points for classes +1 and −1.]
Nonlinear SVM: Kernel Trick
Φ : x ∈ RN ↦ Φ(x) ∈ RN′, N′ > N.
Kernel function: Kij = K(xi, xj) = Φ(xi)TΦ(xj).
[Figure: the map Φ from input space to feature space.]
The decision function is f(x) = sgn(∑_{i=1}^m αi di K(xi, x) + b).
RBF kernel: K(xi, x) = exp(−γ‖xi − x‖^2); polynomial kernel: K(xi, x) = (xiTx + 1)^n.
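A quick NumPy sketch of these two kernel functions (the data and the γ value are illustrative choices, not values from the talk):

```python
import numpy as np

def rbf_kernel(X, Y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2) for all pairs of rows."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def poly_kernel(X, Y, degree=2):
    """K(x, y) = (x^T y + 1)^degree for all pairs of rows."""
    return (X @ Y.T + 1.0) ** degree

X = np.array([[0.0, 0.0], [1.0, 0.0]])
K = rbf_kernel(X, X)  # symmetric, with ones on the diagonal since ||x - x|| = 0
```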
Arbitrary-Norm Separating Hyperplane
Dual norm
For a norm ‖·‖ on Rn, the dual norm is ‖x‖′ := max_{‖y‖=1} xTy.
Example: for p, q ∈ [1, ∞] with 1/p + 1/q = 1, the p-norm and q-norm are dual.
Theorem (Mangasarian, 1998)
Let q ∈ Rn be any point not on the plane P := {x | wTx + b = 0}, 0 ≠ w ∈ Rn, b ∈ R. Then the distance between q and its projection p(q) onto P is given by:
  ‖q − p(q)‖ = |wTq + b| / ‖w‖′.
Sparse SVMs
Corollary
‖q − p(q)‖∞ = |wTq + b| / ‖w‖1
(where ‖x‖1 = ∑_{i=1}^n |xi| and ‖x‖∞ = maxi |xi|)

If the ℓ∞-norm is used to measure the distance between the planes, then the margin is given by 2/‖w‖1, which yields the following sparse SVM (SSVM):

  min_{w,b,ξ}  ‖w‖1 + C eTξ
  s.t.  D(Xw + be) + ξ ≥ e,  ξ ≥ 0.
Sparse SVMs
SSVM ⇒ LP (with w = w+ − w−, w+, w− ≥ 0, so that ‖w‖1 = eT(w+ + w−)):

  min_{w+,w−,b,ξ}  eT(w+ + w−) + C eTξ
  s.t.  D(X(w+ − w−) + be) + ξ ≥ e,  w+, w−, ξ ≥ 0.
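This LP can be handed directly to an off-the-shelf solver. A minimal sketch with SciPy's `linprog` on synthetic two-class data (the data and the value of C are illustrative assumptions, not from the talk):

```python
import numpy as np
from scipy.optimize import linprog

# Synthetic two-class data in R^2 (illustrative only).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-2.0, 1.0, (20, 2)), rng.normal(2.0, 1.0, (20, 2))])
d = np.hstack([-np.ones(20), np.ones(20)])
m, n = X.shape
C = 1.0

# Variables stacked as [w+ (n), w- (n), b (1), xi (m)].
c = np.hstack([np.ones(2 * n), [0.0], C * np.ones(m)])
# D(X(w+ - w-) + b e) + xi >= e, rewritten as A_ub z <= b_ub:
DX = d[:, None] * X
A_ub = np.hstack([-DX, DX, -d[:, None], -np.eye(m)])
b_ub = -np.ones(m)
bounds = [(0, None)] * (2 * n) + [(None, None)] + [(0, None)] * m
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")

w = res.x[:n] - res.x[n:2 * n]
b = res.x[2 * n]
acc = np.mean(np.sign(X @ w + b) == d)  # training accuracy of the 1-norm SVM
```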
Sparsity of the ℓ1-norm:
[Figure: left, 1-norm and 2-norm separating hyperplanes for two classes in the (x1, x2) plane; right, the feasible set with the 1-norm and 2-norm loci and the corresponding SVM solutions in the (w1, w2) plane.]
Hyperspectral Imagery (HSI)
Hyperspectral sensors generate imagery in the electromagnetic spectrum, capturing aspects that are imperceptible to the human eye.
The radiance of materials is measured within each pixel area at a very large number of contiguous spectral wavelength bands.
Spatial and spectral information is contained in data cubes.
Each pixel is a vector x ∈ Rn.
[Figure: HSI data cube with axes X (columns of pixels), Y (rows of pixels), and Z (bands), and spectral radiance vs. band index for the 16 classes: Alfalfa, Corn-notill, Corn-min, Corn, Grass-Pasture, Grass-Trees, Grass-PastureMowed, Hay-windrowed, Oats, Soybeans-notill, Soybeans-min, Soybeans-clean, Wheat, Woods, Bldg-Grass-Trees-Drives, Stone-steel Towers.]
Hyperspectral Imagery (HSI)
Advantage: rich, detailed radiance information.
Disadvantage: huge amount of data (more is not always better).
Band selection: identify a subset of bands that contains the most discriminatory information → use them for further analysis.

Methods
1. Filters: all bands → filter → band subset → predictor
2. Wrappers: all bands → space of band subsets → predictor (wrapper) → band subset
3. Embedded algorithms: all bands → predictor → band subset
Band Selection via SSVMs (Collaborators: M. Kirby and C. Gittins)
A linear SSVM is the basic model for band selection. We solve it by the primal-dual interior point method, which allows one to monitor the variation of the primal and dual variables simultaneously.
A weight ratio criterion for embedded band selection makes it easy to distinguish the non-zero weights from the zero weights.
The bagging (Bootstrap AGGregatING) approach is employed to enhance the robustness of SSVMs.
We extend binary band selection to the multiclass case.
The SSVM algorithm is an effective technique for embedded band selection ⇒ high accuracies in numerical experiments.
Recall: Sparse Linear SVMs
Training data xi ∈ Rn with class labels di ∈ {−1, +1}, i = 1, . . . , m; D = diag(di) and X is the m × n data matrix.
Separating hyperplane P = {x : wTx + b = 0}; w ∈ Rn is normal to P.
Points on wTx + b = ±1 are support vectors.
The optimal P has the largest margin 2/‖w‖1.

SSVM:
  min_{w,b,ξ}  ‖w‖1 + C eTξ
  s.t.  D(Xw + be) + ξ ≥ e,  ξ ≥ 0.

Decision function: f(x) = sgn(wTx + b)
[Figure: optimal separating hyperplane wTx + b = 0 with margin planes wTx + b = ±1, the normal w, support vectors, and misclassified points for classes +1 and −1.]
Sparsity in w
Comparison of weights for sparse SVM and standard SVM models using two classes of a hyperspectral data set.
[Figure: sparse SVM weights vs. standard SVM weights as a function of wavelength (µm).]
Weight ratio criterion
The resulting weights of the model, w1, w2, . . . , wl, are ordered so that
  |wi1| ≥ |wi2| ≥ · · · ≥ |wil|.
The key feature of this sparse approach is that
  |wik| / |wik+1| = O(1)
save for where the weights transition to zero:
  |wik*| / |wik*+1| = O(10^M).
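The criterion can be sketched as a cutoff rule (the ratio threshold of 10 matches the algorithm described later; the helper name is ours, and the example magnitudes are modeled on the binary band-selection weights shown later in the talk):

```python
import numpy as np

def ratio_cutoff(w, ratio=10.0):
    """Order weights by magnitude and keep indices up to the first place
    where consecutive magnitudes drop by more than `ratio` (roughly an
    order of magnitude), i.e. where the weights transition to zero."""
    order = np.argsort(-np.abs(w))
    mags = np.abs(w)[order]
    for k in range(len(mags) - 1):
        if mags[k + 1] == 0 or mags[k] / mags[k + 1] > ratio:
            return order[:k + 1]
    return order

w = np.array([1.4249e-3, 1.3191e-3, 3.5594e-8, 1.6342e-9])
print(ratio_cutoff(w))  # the O(1e-3) weights are kept; the rest are dropped
```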
Bootstrap Aggregating (Breiman, 1996)
To enforce stability: sample with replacement from the n-dimensional training data to compute N SSVM models:

  band 1: [w1^1 w1^2 · · · w1^N]
  band 2: [w2^1 w2^2 · · · w2^N]
  ...
  band j: [wj^1 wj^2 · · · wj^N]
  ...
  band n: [wn^1 wn^2 · · · wn^N]

To reduce the number of bands, we eliminate those with at least 95% of "zeros" in the samples:
  #{ |wk^j| < tolerance, j = 1, . . . , N } ≥ 0.95 N for the k-th band.
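A sketch of this elimination rule, assuming the N bootstrap weight vectors are stacked as rows of an N × n matrix (the function name and tolerance are illustrative):

```python
import numpy as np

def select_bands(W, tol=1e-6, zero_frac=0.95):
    """W holds the N bootstrap SSVM weight vectors as rows (shape N x n).
    Band k is eliminated when at least `zero_frac` of its N weights are
    numerically zero; the surviving band indices are returned."""
    n_zero = np.sum(np.abs(W) < tol, axis=0)      # zero count per band
    return np.flatnonzero(n_zero < zero_frac * W.shape[0])

# Hypothetical weights from N = 4 bootstrap models over n = 3 bands:
W = np.array([[0.0,  0.5, 1e-9],
              [0.0,  0.4, 0.0 ],
              [1e-8, 0.6, 0.0 ],
              [0.0,  0.3, 0.0 ]])
print(select_bands(W))  # only band 1 survives
```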
Algorithm
Input: set of bands S = {1, 2, . . . , n}
1. Sample with replacement from the training data X ⇒ X1, X2, . . . , XN.
2. Train N SSVM models fj(x) ⇒ N weight vectors w^j.
3. Remove the k-th band if #{ |wk^i| < tol, i = 1, . . . , N } ≥ 0.95 N ⇒ S = S \ {k}, Xnew = X(:, S).
4. Train an SSVM on Xnew ⇒ w; rank the w values ⇒ w^r.
5. In w^r compare magnitude orders: if |w^r_ik| / |w^r_ik+1| > 10 for some ik = ik*, remove the bands starting from ik* + 1 and update S.
6. Train a final SSVM model f on Xnew = Xnew(:, S).
Return: band set S, model f
Multiclass Band Selection
One-against-one (OAO) SSVMs
k classes → (k choose 2) = k(k − 1)/2 binary classifiers → majority voting to assign a class to a testing point.
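The voting step can be sketched as follows, with hand-made 1-D decision functions standing in for trained binary SSVMs (the data and classifiers here are purely illustrative):

```python
import numpy as np

def oao_predict(x, classifiers, classes):
    """classifiers maps a class pair (a, b) to a trained decision function
    returning +1 (a vote for class a) or -1 (a vote for class b); the class
    collecting the most votes over all k(k-1)/2 pairs wins."""
    votes = {c: 0 for c in classes}
    for (a, b), f in classifiers.items():
        votes[a if f(x) > 0 else b] += 1
    return max(votes, key=votes.get)

# Hand-made 1-D "classifiers" standing in for trained binary SSVMs:
classes = [0, 1, 2]
classifiers = {
    (0, 1): lambda x: 1 if x[0] < 0.5 else -1,
    (0, 2): lambda x: 1 if x[0] < 1.5 else -1,
    (1, 2): lambda x: 1 if x[0] < 1.5 else -1,
}
print(oao_predict(np.array([0.2]), classifiers, classes))  # class 0 wins 2 votes
```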
Methods
Method I: rank selected bands by the frequency of their occurrence.
Method II: rank bands in each two-class subset by magnitude and take the superset of the M top bands.
Method III: the Ward's Linkage strategy Using Mutual Information (WaLuMI) method (Martinez-Uso et al., 2007) is a filter method that we employ as a pre-selection step.
Spatial Smoothing
Adopted from Zare & Gader, 2008: after a testing pixel X has been assigned a class vote via OAO SSVMs, spatial smoothing can be done by summing class votes over the eight-connected neighborhood of the pixel X.
[Figure: example 8-connected neighborhoods of a pixel X with different class votes.]
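One plausible reading of this smoothing step, assuming per-pixel class-vote counts are stored in an (H, W, K) array (the exact handling of votes in the talk may differ):

```python
import numpy as np

def smooth_votes(votes):
    """votes: (H, W, K) array of per-pixel class-vote counts from the OAO
    classifiers. Votes are summed over each pixel's 8-connected
    neighborhood (plus the pixel itself) before taking the winning class."""
    H, W, K = votes.shape
    padded = np.pad(votes, ((1, 1), (1, 1), (0, 0)))
    acc = np.zeros((H, W, K))
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += padded[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
    return acc.argmax(axis=2)

votes = np.zeros((3, 3, 2))
votes[:, :, 0] = 1            # every pixel votes for class 0 ...
votes[1, 1] = [0, 1]          # ... except the center, which votes for class 1
labels = smooth_votes(votes)  # the isolated center vote is smoothed away
```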
AVIRIS Indian Pines Data Set
[Figure: 145 × 145 AVIRIS Indian Pines scene and ground-truth map with background plus 16 classes: Alfalfa, Corn-notill, Corn-min, Corn, Grass-Pasture, Grass-Trees, Grass-PastureMowed, Hay-windrowed, Oats, Soybeans-notill, Soybeans-min, Soybeans-clean, Wheat, Woods, Bldg-Grass-Trees-Drives, Stone-steel Towers.]
Airborne Visible/Infrared Imaging Spectrometer (AVIRIS): collected in an agricultural area of northern Indiana in 1992.
145× 145 images, 220 spectral bands (ranging from 0.4 to 2.5µm).
Ground truth is known for 49% of the pixels.
16 classes ranging from 20 to 2468 pixels.
Comparison with Other Methods
1. WaLuMI: hierarchical clustering approach that exploits band correlation using a mutual information (MI) criterion (Martinez-Uso et al., 2007).
2. B-SPICE: simultaneous band selection and endmember detection (Zare & Gader, 2008).
3. Lasso logistic regression:
  min_{β0,β}  −(1/m) ∑_{i=1}^m [ yi(β0 + xiTβ) − log(1 + e^(β0 + xiTβ)) ] + λ‖β‖1.
Binary Band Selection
Weight magnitudes:
Corn-min and Woods          Corn-notill and Grass/Trees
Band   Weight               Band   Weight
29     1.4249e-03           1      1.0202e-03
41     1.3191e-03           9      9.6991e-04
28     3.5594e-08           5      6.5283e-04
42     1.6342e-09           29     8.3022e-09
27     1.3258e-09           32     4.2466e-09
Accuracy rates (%) for binary band selection:
Classes                            All-bands   SSVM algorithm       WaLuMI + SSVM        Lasso logistic regression
                                   accuracy    # kept / accuracy    # kept / accuracy    # kept / accuracy
Corn-min and Woods                 100.00      2 / 100.00           2 / 99.9             12 / 100.00
Corn-notill and Grass/Trees        99.73       12 / 99.73           12 / 100             19 / 98.9
Soybeans-notill and Soybeans-min   89.58       179 / 89.23          - / -                127 / 89.52
Binary Band Selection
Spectral signatures and weights of selected bands for:
[Figure: spectral radiance vs. band index and weights of the selected bands for Corn-min & Woods, Corn-notill & Grass/Trees, and Soybeans-notill & Soybeans-min.]
Multiclass Band Selection
Number of bands selected for each of the (16 choose 2) subsets (pairs of classes) and number of occurrences of each band.
[Figure: left, heat map over class-number pairs of the number of bands selected per subset; right, occurrence number vs. band index.]
Multiclass Band Selection
Accuracy plots for OAO SSVM before and after spatial smoothing, obtained by Methods I and III.
[Figure: classification accuracy (%) vs. number of bands for Method I and Method III, each with and without spatial smoothing.]
Multiclass Band Selection
Accuracy results for multiclass band selection (%) and comparison with other methods:

# Bands kept   Method I (frequency)   Method II (top bands)   Method III (WaLuMI + SSVM)   B-SPICE + RVM   WaLuMI + NN
220            98.36                  -                       98.36                        93.9            -
80             97.14                  -                       96.89                        -               -
57             95.66                  97.3                    96.22                        -               -
34             93.15                  -                       93.03                        86.4            80
19             91.20                  -                       92.57                        82.5            81
10             84.37                  -                       93.07                        -               81
Future Work
Apply the algorithm to other data sets (not necessarily HSI).
Consider using kernel SSVMs instead of linear: the resultingdimension reduction is not in the number of input space featuresbut in the number of kernel functions, so it is interesting toinvestigate how a feature selection tool can be build in thenonlinear predictor.
References
V. N. Vapnik, The Nature of Statistical Learning Theory, New York: Springer, 1995.
L. Breiman, Bagging predictors, Machine Learning, 24, pp. 123-140, 1996.
O. L. Mangasarian, Arbitrary-norm separating plane, Operations Research Letters, 24, pp. 15-23, 1997.
J. Bi, K. P. Bennett, M. Embrechts, C. M. Breneman, and M. Song, Dimensionality reduction via sparse support vector machines, Journal of Machine Learning Research, 3, pp. 1229-1243, 2003.
O. L. Mangasarian, Exact 1-norm support vector machines via unconstrained convex differentiable minimization, Journal of Machine Learning Research, 7, pp. 1517-1530, 2006.
A. Zare and P. Gader, Hyperspectral band selection and endmember detection using sparsity promoting priors, IEEE Geoscience and Remote Sensing Letters, vol. 5, no. 2, pp. 256-260, 2008.
A. Martinez-Uso, F. Pla, J. M. Sotoca, and P. Garcia-Sevilla, Clustering-based hyperspectral band selection using information measures, IEEE Transactions on Geoscience and Remote Sensing, vol. 45, no. 12, pp. 4158-4171, 2007.
S. Chepushtanova, C. Gittins, and M. Kirby, Band selection in hyperspectral imagery using sparse support vector machines, submitted.
Classification of Data on Grassmannians (Collaborator: M. Kirby)
Set-to-set pattern recognition: a set of points from a class characterizes the variability of the class information.
Grassmann manifolds G(k, n) (collections of k-dimensional subspaces of Rn) provide a geometric framework for characterizing sets of points.
Subspaces can be realized as points in Euclidean space via multidimensional scaling.
A sparse support vector machine identifies optimal dimensions of the embedded subspaces.
Grassmann Manifold
Definition
The Grassmann manifold G(k, n) is the collection of all k-dimensional linear subspaces of Rn, 1 ≤ k ≤ n.
Example: G(1, n) is the set of all lines going through the origin ofRn (projective space RPn−1).
Note 1: An element of G(k, n) can be represented by an n× korthogonal matrix U (UTU = Ik).
Note 2: The matrix representation on G(k, n) is not unique: we sayU1 = U2 if span(U1) = span(U2).
Constructing Points on G(k, n):
Data points in Rn, subspace dimension k:
1. Form n × k "tall-skinny" matrices Y1, Y2, . . . , YN from sample data.
2. Compute the SVD: Yi = UiΣiViT.
3. U1, U2, . . . , UN represent points on G(k, n) (or take UjVjT = arg min_{PTP=I} ‖Yj − P‖F).
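The SVD construction takes a few lines of NumPy (the random 220 × 5 matrix here is a stand-in for 5 pixel spectra over 220 bands):

```python
import numpy as np

def grassmann_point(Y):
    """Orthonormal basis for the column space of an n-by-k data matrix Y,
    via the thin SVD Y = U Sigma V^T; U represents span(Y) on G(k, n)."""
    U, _, _ = np.linalg.svd(Y, full_matrices=False)
    return U

# A random 220 x 5 matrix standing in for 5 pixel spectra with 220 bands:
rng = np.random.default_rng(1)
U = grassmann_point(rng.standard_normal((220, 5)))
# U has orthonormal columns (U^T U = I_5), so it is a valid representative
# of a point on G(5, 220).
```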
Geodesic distance
[Figure: principal angle θ between up ∈ span(Ui) and vp ∈ span(Uj).]

Principal angles
0 ≤ θ1 ≤ θ2 ≤ . . . ≤ θk ≤ π/2, given by
  cos θp = max_{up∈span(Ui)} max_{vp∈span(Uj)} upTvp,
where upTup = 1, vpTvp = 1, upTuq = 0, vpTvq = 0, q = 1, . . . , p − 1.

Geodesic distance (or arc length)
  dG(Ui, Uj) = ‖θ‖2 = sqrt(∑_{p=1}^k θp^2)
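In practice the principal angles between span(U1) and span(U2) are the arccosines of the singular values of U1TU2, which gives a short sketch of the geodesic distance:

```python
import numpy as np

def geodesic_distance(U1, U2):
    """Arc-length distance on G(k, n): the principal angles are the
    arccosines of the singular values of U1^T U2, and d_G = ||theta||_2."""
    sigma = np.linalg.svd(U1.T @ U2, compute_uv=False)
    theta = np.arccos(np.clip(sigma, -1.0, 1.0))
    return np.linalg.norm(theta)

# Two 2-planes in R^3 sharing one direction and orthogonal in the other,
# so the principal angles are (0, pi/2) and the distance is pi/2:
U1 = np.eye(3)[:, [0, 1]]
U2 = np.eye(3)[:, [0, 2]]
d = geodesic_distance(U1, U2)
```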
Embedding G(k, n) in Rd via Multidimensional Scaling (MDS)
Classical MDS (Mardia):
Input: distance matrix D ∈ RN×N with Dij = dG(Ui, Uj).
1. Compute B = HAH, where H = I − (1/N)eeT and Aij = −(1/2)Dij^2 (e is a vector of N ones).
2. Compute the spectral decomposition of B: B = ΓΛΓT.
3. Set X := ΓΛ^(1/2).
Output: X, a configuration of points in Rd, where d = rank(B) = rank(X) ≤ N − 1. (Note: Be = 0.)
Note: if B is positive semidefinite, the configuration preserves the geodesic distances; otherwise we adopt the resulting scaling as the best approximation we can obtain.
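A minimal implementation of these three steps (here eigenvalues of B that are numerically non-positive are simply dropped, in the spirit of the approximation just described):

```python
import numpy as np

def classical_mds(D):
    """Double-center -D^2/2 to get B, eigendecompose, and scale the
    eigenvectors by the square roots of the positive eigenvalues."""
    N = D.shape[0]
    H = np.eye(N) - np.ones((N, N)) / N
    B = H @ (-0.5 * D ** 2) @ H
    lam, Gamma = np.linalg.eigh(B)
    idx = np.argsort(lam)[::-1]          # eigenvalues in decreasing order
    lam, Gamma = lam[idx], Gamma[:, idx]
    keep = lam > 1e-10                   # drop zero/negative eigenvalues
    return Gamma[:, keep] * np.sqrt(lam[keep])

# Pairwise distances of the points 0, 1, 3 on a line are reproduced exactly:
D = np.array([[0.0, 1.0, 3.0], [1.0, 0.0, 2.0], [3.0, 2.0, 0.0]])
X_emb = classical_mds(D)
```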
Algorithm
1. Compute points and the geodesic distance matrix on G(k, n).
2. Embed the subspaces in Euclidean space via MDS, preserving distances.
3. Feature (dimension) selection and classification via SSVMs.
Application to HSI
[Figure: 145 × 145 AVIRIS Indian Pines scene and ground-truth map with background plus 16 classes: Alfalfa, Corn-notill, Corn-min, Corn, Grass-Pasture, Grass-Trees, Grass-PastureMowed, Hay-windrowed, Oats, Soybeans-notill, Soybeans-min, Soybeans-clean, Wheat, Woods, Bldg-Grass-Trees-Drives, Stone-steel Towers.]
Airborne Visible/Infrared Imaging Spectrometer (AVIRIS): collected in an agricultural area of northern Indiana in 1992.
145× 145 images, 220 spectral bands (ranging from 0.4 to 2.5µm).
Ground truth is known for 49% of the pixels.
16 classes ranging from 20 to 2468 pixels.
Configurations in Euclidean Space
2 classes: Corn-notill (blue) and Grass/Pasture (red). Dimensions correspond to the two top eigenvalues of B (MDS). Solid dots: training set; hollow dots: testing set.
[Figure: 2-D MDS configurations for subspace dimensions k = 1, 2, 3, 5, 10, 15.]
Configurations in Euclidean Space
3 classes: Corn-notill (blue), Grass/Pasture (red), and Grass/Trees (green). Dimensions correspond to the two top eigenvalues of B (MDS). (Solid dots: training set; hollow dots: testing set.)
[Figure: 2-D MDS configurations for subspace dimensions k = 1, 2, 3, 5, 10, 15.]
Classification
SSVM applied to the configuration of points on G(15, 220) embedded in Euclidean space: Corn-notill (blue) and Grass/Pasture (red). Dimensions correspond to the two largest absolute values of the sparse weight vector w.
[Figure: embedded points plotted in the two dimensions selected by the SSVM.]
Classification
Accuracy as a function of k:
[Figure: training and testing accuracy vs. subspace dimension k for Corn-notill and Grass/Pasture (left) and Corn-notill, Grass/Pasture, and Grass/Trees (right).]
Feature Selection in Embedded Spaces
Corn-notill versus Grass/Pasture: N = 200 constructed points on G(k, 220)
k (subspace   d (feature-space dim.   # negative         # zero             Features selected                                     # features
dimension)    of embedded points)     eigenvalues of B   eigenvalues of B                                                         selected
1             131                     68                 1                  1-3, 5-7, 10                                          7
2             156                     43                 1                  1-6, 8, 11                                            8
3             126                     73                 1                  1-6, 10-13, 16-18, 20, 23, 43, 39, 47, 62, 74         20
5             147                     52                 1                  1, 3, 6, 9, 14, 15, 18, 19, 34, 37, 39, 42, 52, 63    14
10            195                     4                  1                  1, 4, 5, 8, 15, 28, 38, 65, 71                        9
20            199                     0                  1                  1, 3, 24, 31, 63                                      5
25            199                     0                  1                  1, 2, 8, 14                                           4
Future Work
Use other distances provided by the principal angles, for instance the projection F-norm, ‖sin θ‖2.
Compare results of the multiclass case with the literature.
Determine computationally the optimal number of constructed points on G(k, n) for training and testing.
Apply the method to other HSI and medical data sets.
References
J.-M. Chang, et al., Recognition of digital images of the human face at ultra low resolution via illumination spaces, Proceedings of the 8th Asian Conference on Computer Vision - Volume Part II, pp. 733-743, 2007.
A. Edelman, T. A. Arias, and S. T. Smith, The geometry of algorithms withorthogonality constraints, SIAM J. MATRIX ANAL. APPL, 20(2), pp.303-353, 1998.
K. V. Mardia, J. T. Kent, and J. M. Bibby, Multivariate Analysis, AcademicPress, 1979.
V. N. Vapnik, The Nature of Statistical Learning Theory, New York: Springer, 1995.
O. L. Mangasarian, Exact 1-norm support vector machines via unconstrained convex differentiable minimization, Journal of Machine Learning Research, 7, pp. 1517-1530, 2006.
Ellipsoidal Separation: Motivation
Important application: medical diagnosis. In particular, we are interested in the diagnosis of neonatal sepsis, for a data set collected in Yale-New Haven Hospital's Neonatal Intensive Care Unit (NICU).
We expect points from the same class to be close to each other, i.e. to be enclosed in a hull or ball. Ellipsoids, being affine deformations of balls, make the separation procedure scaling invariant.
Ellipsoids are simple convex sets.
Ellipsoidal separation can be modelled as a semidefinite program (SDP), which can be solved efficiently.
Ellipsoids: Facts
An ellipsoid is the image of a unit ball {xTx ≤ 1} under an affinetransformation.
Given a center c and an n × n symmetric positive semidefinite matrix E (E ⪰ 0), we can define an ellipsoid as {x ∈ Rn | (x − c)TE(x − c) ≤ 1}.
The condition E ⪰ 0 is crucial for ellipsoids: if it is not satisfied, the equation above may describe any quadratic set.
Feasibility Problem (Boyd and Vandenberghe 2004)
find  P, q, r
s.t.  xiT P xi + qT xi + r ≥ 1, i = 1, . . . , N,
      yiT P yi + qT yi + r ≤ −1, i = 1, . . . , M,
      P ≺ 0.

Note: the constraint P ≺ 0 can be expressed as P ⪯ −I (due to the homogeneity of f in P, q, r).
Non-separable Data
minimize_{P,q,r,ξ,τ}  eTξ + eTτ
subject to  xiT P xi + qT xi + r ≥ 1 − ξi, i = 1, . . . , N,
            yiT P yi + qT yi + r ≤ −1 + τi, i = 1, . . . , M,
            P ⪯ −I,  ξ, τ ≥ 0.
References
S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, New York, NY, USA, 2004.
M. Grant and S. Boyd, Graph implementations for nonsmooth convexprograms, In V. Blondel, S. Boyd, and H. Kimura, editors, Recent Advancesin Learning and Control, Lecture Notes in Control and InformationSciences, pp. 95-110. Springer-Verlag Limited, 2008,http://stanford.edu/~boyd/graph_dcp.html.
M. Grant and S. Boyd, CVX: Matlab software for disciplined convexprogramming, version 2.0 beta. http://cvxr.com/cvx, September2013.
Topological Data Analysis (TDA)
Basic idea: describe the "shape of the data" by finding clusters, holes, tunnels, etc.
Persistent homology (PH): a rapidly growing branch of TDA.
PH can be applied to a data set to capture the persistence of topological structure across scales.
Application of PH to hyperspectral remote sensing data analysis. See, e.g., A. Zomorodian and G. Carlsson, Computing persistent homology, Discrete Comput. Geom., 33 (2005), no. 2, pp. 249-274.
Persistent Homology
Encoded in the form of a parameterized version of a Betti number,called a barcode: a set of line segments each representing the range ofparameter values over which a topological feature persists.
THANK YOU FOR YOUR ATTENTION!