DIMENSION REDUCTION FOR HYPERSPECTRAL DATA USING RANDOMIZED PCA AND LAPLACIAN EIGENMAPS
YIRAN LI
APPLIED MATHEMATICS, STATISTICS AND SCIENTIFIC COMPUTING
ADVISORS: DR. WOJTEK CZAJA, DR. JOHN BENEDETTO
DEPARTMENT OF MATHEMATICS
UNIVERSITY OF MARYLAND, COLLEGE PARK
BACKGROUND: HYPERSPECTRAL IMAGING
• Light is described in terms of its wavelength
• A reflectance spectrum shows the reflectance of a material measured across a range of wavelengths; it helps identify certain materials uniquely
• We measure reflectance at many narrow, closely spaced wavelength bands
• When a spectrometer is used in an imaging sensor, the resulting images record a reflectance spectrum for each pixel
(Shippert, 2003)
SPECTRUM AND HYPERSPECTRAL IMAGERY
• Left: Reflectance spectra measured by laboratory spectrometers for three materials: a green bay laurel leaf, the mineral talc, and a silty loam soil.
• Right: The concept of hyperspectral imagery. (Shippert, 2003)
MULTISPECTRAL VS HYPERSPECTRAL
• Multispectral imaging measures reflectance at discrete and somewhat narrow bands; multispectral images do not produce the "spectrum" of an object
• Hyperspectral imaging measures narrow spectral bands over a continuous spectral range and produces the spectra of all pixels in the scene
• So a sensor with only 20 bands can also be hyperspectral when it covers the range from 500 to 700 nm with 20 bands, each 10 nm wide
(Wikipedia: hyperspectral imaging)
AN EXAMPLE: SALINAS VALLEY, CALIFORNIA
• Left: a sample band collected by a 224-band sensor over Salinas Valley; the scene includes vegetables, bare soils, and vineyard fields. Right: ground truth of the Salinas dataset (16 classes)
(GIC: Hyperspectral Remote Sensing Scenes)
PROBLEM
• Hyperspectral images are three-dimensional (x-coordinate, y-coordinate, and spectral band)
• Each pixel has a different spectrum that represents different materials
• Sometimes there are over 100 bands and a large number of pixels
• Dimension reduction reduces the number of bands of a hyperspectral image
• It maps high-dimensional data into a lower-dimensional space while preserving the main features of the original data; the sketch below shows how an image cube is flattened before such methods are applied
(Wikipedia: hyperspectral imaging)
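A minimal Matlab illustration of this flattening (the variable names and sizes are my own, chosen for the example): each pixel becomes one row of a pixels-by-bands matrix, which is the input format both algorithms below expect.

```matlab
% Flatten a hyperspectral cube (rows x cols x bands) into a data matrix
% whose rows are pixels and whose columns are spectral bands.
cube = rand(100, 100, 220);          % stand-in for a real 220-band scene
[r, c, b] = size(cube);
X = reshape(cube, r * c, b);         % (r*c)-by-b matrix, one pixel per row

% After reducing to m bands, reshape back for display:
m = 10;
Y = X(:, 1:m);                       % stand-in for the reduced data
reducedCube = reshape(Y, r, c, m);
```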
PROJECT GOAL
• Reduce the dimensionality of hyperspectral imagery
• Compare the two algorithms to be implemented
METHODS
Existing methods (a partial list):
• Principal Component Analysis (PCA)
• Local Linear Embedding
• Neighborhood Preserving Embedding
• Classical Multidimensional Scaling
• Isomap
• Stochastic Proximity Embedding
My methods:
• Randomized PCA
• Laplacian Eigenmaps
(Delft University)
COMPARISON BETWEEN TWO ALGORITHMS
Compare the two algorithms, Randomized PCA and Laplacian Eigenmaps, in terms of:
• Implementation
• Running time
• Results
• Difficulties during implementation
ALGORITHM 1: LAPLACIAN EIGENMAPS
• Consider the problem of mapping the weighted graph G to a line so that connected points stay as close together as possible. Let $y = (y_1, y_2, \dots, y_n)^T$ be such a map. Our goal is to minimize
$$\sum_{i,j} (y_i - y_j)^2 W_{ij}$$
Since $\sum_{i,j} (y_i - y_j)^2 W_{ij} = 2 y^T L y$, the problem of finding $\operatorname{argmin}_y \, y^T L y$ subject to $y^T D y = 1$ and $y^T D \mathbf{1} = 0$ becomes the minimum eigenvalue problem
$$L y = \lambda D y$$
(Belkin, Niyogi, 2002)
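For reference, the identity used above is a two-line computation (standard, using the symmetry of $W$, $D_{ii} = \sum_j W_{ij}$, and $L = D - W$):

```latex
\sum_{i,j} (y_i - y_j)^2 W_{ij}
  = \sum_{i,j} \bigl( y_i^2 + y_j^2 - 2 y_i y_j \bigr) W_{ij}
  = 2 \sum_i y_i^2 D_{ii} - 2 \sum_{i,j} y_i y_j W_{ij}
  = 2\, y^T D y - 2\, y^T W y
  = 2\, y^T L y
```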
ALGORITHM 1: THE ALGORITHM
• Step 1: Constructing the adjacency graph
  • Construct a weighted graph with n nodes (n = number of data points) and a set of edges connecting neighboring points.
  • A) $\epsilon$-neighborhoods: nodes $i$ and $j$ are connected if $\|x_i - x_j\|^2 < \epsilon$
  • B) n nearest neighbors
• Step 2: Choosing the weights
  • A) Heat kernel: $W_{ij} = e^{-\|x_i - x_j\|^2 / t}$
  • B) Simple-minded: $W_{ij} = 1$ if connected and $W_{ij} = 0$ otherwise
• Step 3: Compute eigenvalues and eigenvectors for the generalized eigenvector problem
$$L f = \lambda D f \qquad (1)$$
where $W$ is the weight matrix defined earlier, $D$ is the diagonal weight matrix with $D_{ii} = \sum_j W_{ji}$, and $L = D - W$
• Let $f_0, f_1, \dots, f_{n-1}$ be the solutions of equation (1), ordered such that
$$0 = \lambda_0 \le \lambda_1 \le \dots \le \lambda_{n-1}$$
• Then the first m eigenvectors (excluding $f_0$),
$$\{f_1, f_2, \dots, f_m\}$$
are the desired vectors for embedding in m-dimensional Euclidean space
(Belkin, Niyogi, 2002)
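A minimal Matlab sketch of the three steps, assuming the simple-minded 0/1 weights with an $\epsilon$-neighborhood graph (the function name and interface are my own; eigs with the 'smallestabs' option requires a recent Matlab release):

```matlab
function Y = laplacian_eigenmaps(X, m, epsilon)
% X: n-by-d data matrix (one point per row), m: embedding dimension,
% epsilon: neighborhood radius. Returns Y: n-by-m embedded coordinates.
n = size(X, 1);

% Step 1: adjacency graph from epsilon-neighborhoods
sq = sum(X.^2, 2);
D2 = sq + sq' - 2 * (X * X');        % squared pairwise distances
A = (D2 < epsilon) & ~eye(n);        % connect close, distinct points

% Step 2: simple-minded weights (W_ij = 1 if connected, 0 otherwise)
W = double(A);

% Step 3: generalized eigenproblem L*f = lambda*D*f
% (assumes every point has a neighbor, so D is positive definite)
D = diag(sum(W, 2));
L = D - W;
[F, lambda] = eigs(L, D, m + 1, 'smallestabs');
[~, order] = sort(diag(lambda));
Y = F(:, order(2:m + 1));            % drop f_0, keep f_1, ..., f_m
end
```

For a hyperspectral scene, X would be the pixels-by-bands matrix from the flattening sketch earlier; heat-kernel weights would replace the double(A) line with exp(-D2/t) .* A.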
ALGORITHM 2: RANDOMIZED PCA INTRODUCTION
• The canonical construction of the best possible rank-k approximation to a real $m \times n$ matrix $A$ uses the singular value decomposition (SVD) of $A$,
$$A = U \Sigma V^T,$$
where $U$ is a real unitary $m \times m$ matrix, $V$ is a real unitary $n \times n$ matrix, and $\Sigma$ is a real $m \times n$ diagonal matrix with nonnegative, nonincreasing diagonal entries
• Best approximation of $A$:
$$A \approx \tilde{U} \tilde{\Sigma} \tilde{V}^T,$$
where $\tilde{U}$ is the leftmost $m \times k$ block of $U$, $\tilde{\Sigma}$ is the $k \times k$ upper-left block of $\Sigma$, and $\tilde{V}$ is the leftmost $n \times k$ block of $V$
(Rokhlin, Szlam, Tygert, 2009)
• Best because no rank-k matrix $B$ achieves a smaller spectral norm $\|A - B\|$ than $B = \tilde{U} \tilde{\Sigma} \tilde{V}^T$ does. In fact,
$$\|A - \tilde{U} \tilde{\Sigma} \tilde{V}^T\| = \sigma_{k+1},$$
where $\sigma_{k+1}$ is the $(k+1)$-th greatest singular value of $A$
• Randomized PCA generates $B$ such that
$$\|A - B\| \le C m^{1/(4i+2)} \sigma_{k+1}$$
with high probability ($1 - 10^{-15}$), where the number of power iterations $i$ is specified by the user, and $C$ depends on the parameters of the algorithm
(Rokhlin, Szlam, Tygert, 2009)
ALGORITHM 2: THE ALGORITHM
• Choose an integer $l > k$ such that $l \le m - k$
• Step 1: Generate a real $l \times m$ matrix $G$ whose entries are i.i.d. standard Gaussian random variables, and compute
$$R = G (A A^T)^i A$$
• Step 2: Using an SVD, form a real $n \times k$ matrix $Q$ whose columns are orthonormal, such that
$$\|Q S - R^T\| \le \rho_{k+1}$$
for some $k \times l$ matrix $S$, where $\rho_{k+1}$ is the $(k+1)$-th greatest singular value of $R$
• Step 3: Compute
$$T = A Q$$
• Step 4: Form an SVD of $T$:
$$T = U \Sigma W^T,$$
where $U$ is a real $m \times k$ matrix whose columns are orthonormal, $W$ is a real $k \times k$ matrix whose columns are orthonormal, and $\Sigma$ is a real diagonal $k \times k$ matrix with nonnegative diagonal entries
• Step 5: Compute
$$V = Q W$$
• In this way, we get $U$, $\Sigma$, $V$ as desired, and $B = U \Sigma V^T$
(Rokhlin, Szlam, Tygert, 2009)
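A direct Matlab transcription of the five steps (a sketch, not a tuned implementation; the function name and the oversampling choice l = k + 2 are my own, and any l with k < l <= m - k would do):

```matlab
function [U, S, V] = randomized_pca(A, k, i)
% A: m-by-n matrix, k: target rank, i: number of power iterations.
% Returns U, S, V such that B = U*S*V' is a rank-k approximation of A.
m = size(A, 1);
l = k + 2;                        % small oversampling, k < l <= m - k

% Step 1: Gaussian test matrix and power iterations, R = G*(A*A')^i*A
G = randn(l, m);
R = G * A;
for j = 1:i
    R = (R * A') * A;
end

% Step 2: n-by-k matrix Q with orthonormal columns spanning the top
% right singular directions of R
[Q, ~, ~] = svd(R', 'econ');
Q = Q(:, 1:k);

% Step 3: project A onto the basis
T = A * Q;                        % m-by-k

% Step 4: small SVD of T
[U, S, W] = svd(T, 'econ');       % U: m-by-k, S: k-by-k, W: k-by-k

% Step 5: rotate the basis back
V = Q * W;                        % n-by-k
end
```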
IMPLEMENTATION
• Hardware: personal laptop / computers in the math computer lab
• Software: Matlab
• Database: 12-band moderate-dimension image: June 1966 aircraft scanner, Flightline C1 (portion of Southern Tippecanoe County, Indiana)
• 220-band hyperspectral image: June 12, 1992 AVIRIS image, Indian Pine Test Site 3 (2 x 2 mile portion of Northwest Tippecanoe County, Indiana)
• 220-band hyperspectral image: June 12, 1992 AVIRIS image, North-South flight line (25 x 6 mile portion of Northwest Tippecanoe County, Indiana)
• Hyperspectral data from the Norbert Wiener Center
• Data can be large (e.g., 10,000 x 10,000 pixels with 200 bands)
VALIDATION METHODS
• Delft University has developed a Matlab toolbox for dimensionality reduction, which includes many methods and is publicly available
• Use algorithms from the DR Matlab toolbox on the same data and compare results
• For randomized PCA, check the error bound (see the sketch after this list):
$$\|A - B\| \le C m^{1/(4i+2)} \sigma_{k+1}$$ (Rokhlin, Szlam, Tygert, 2009)
• Compare with ground truth images for the test cases
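One way to check the bound empirically (a sketch; A is the data matrix, k the target rank, and [U, S, V] the output of the randomized_pca sketch above, so those names are assumptions carried over from that example):

```matlab
% Empirical check of the randomized PCA error bound.
B = U * S * V';                    % rank-k approximation
err = norm(A - B);                 % spectral norm of the residual
s = svds(A, k + 1);                % top k+1 singular values of A
ratio = err / s(end);              % should stay modest if the bound holds
fprintf('||A - B|| = %.3e, sigma_{k+1} = %.3e, ratio = %.2f\n', ...
        err, s(end), ratio);
```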
TEST PROBLEMS FOR VERIFICATION
• Test on the known data sets (as listed earlier) and compare results with ground truth classifications and images
• Test on smaller scales first, then move to the large data sets
EXPECTED RESULTS/CONCLUDING REMARKS
• Laplacian Eigenmaps should be easier to implement, but may take longer to run because it requires solving an eigenvalue problem for large matrices
• Randomized PCA will be more difficult to implement, but should give the desired results with reasonable speed even under unfavorable conditions, and it should perform better than Laplacian Eigenmaps when dealing with very large matrices
TIMELINE/MILESTONES
• October 17th: Project proposal
• Now to November 2014: Implement and test Laplacian Eigenmaps; prepare for the implementation of randomized PCA
• December 2014: Midyear report and presentation
• January to March: Implement and test randomized PCA; compare the two methods in various situations
• April to May: Final presentation and final report
DELIVERABLES
• Presentation of the data sets with reduced dimensions for both algorithms
• Comparison charts of running time and accuracy for the two methods
• Comparison charts against other methods available in the DR Matlab toolbox
• Data sets, Matlab code, presentations, proposal, mid-year report, final report
BIBLIOGRAPHY
• Shippert, Peg. "Introduction to Hyperspectral Image Analysis." Online Journal of Space Communication, Issue No. 3: Remote Sensing of Earth via Satellite, Winter 2003. http://spacejournal.ohio.edu/pdf/shippert.pdf
• "Hyperspectral Imaging." Wikipedia. Accessed Oct. 6, 2014. http://en.wikipedia.org/wiki/Hyperspectral_imaging
• Belkin, Mikhail; Niyogi, Partha. "Laplacian Eigenmaps for Dimensionality Reduction and Data Representation." Neural Computation, vol. 15, Dec. 8, 2002. http://web.cse.ohio-state.edu/~mbelkin/papers/LEM_NC_03.pdf
• Rokhlin, Vladimir; Szlam, Arthur; Tygert, Mark. "A Randomized Algorithm for Principal Component Analysis." SIAM Journal on Matrix Analysis and Applications, vol. 31, issue 3, August 2009. ftp://ftp.math.ucla.edu/pub/camreport/cam08-60.pdf
• "Matlab Toolbox for Dimensionality Reduction." Delft University. Accessed Oct. 6, 2014. http://homepage.tudelft.nl/19j49/Matlab_Toolbox_for_Dimensionality_Reduction.html
• "GIC: Hyperspectral Remote Sensing Scenes." Accessed Oct. 6, 2014. http://www.ehu.es/ccwintco/index.php?title=Hyperspectral_Remote_Sensing_Scenes
• "Hyperspectral Images." Purdue University. Accessed Oct. 6, 2014. https://engineering.purdue.edu/~biehl/MultiSpec/hyperspectral.html