Nonnegative Matrix Factorization with Sparseness Constraints
S. Race MA591R
Introduction to NMF
Factor A = WH
- A: m x n matrix of data; each of the n columns is a measurement of m non-negative scalar variables
- W: m x r matrix of “basis vectors”
- H: r x n coefficient matrix, describing how strongly each building block is present in the measurement vectors (a toy sketch of the shapes follows)
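To make the shapes concrete, here is a toy numpy sketch; the sizes are arbitrary illustrations, not from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, r = 4, 6, 2           # m variables, n measurements, rank r
W = rng.random((m, r))      # m x r nonnegative "basis vectors"
H = rng.random((r, n))      # r x n nonnegative coefficients
A = W @ H                   # each column of A mixes the r basis vectors
print(A.shape)              # (4, 6): one column per measurement
```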
Introduction to NMF (cont’d)
- Purpose: a “parts-based” representation of the data; data compression; noise reduction
- Examples: term-document matrices, image processing, any data composed of hidden parts
Introduction to NMF (cont’d)
- Optimize the accuracy of the solution: min || A − WH ||F subject to W, H ≥ 0
- We can drop the nonnegativity constraints by squaring elementwise: min || A − (W.*W)(H.*H) ||F (see the sketch below)
- There are many options for the objective function and many options for the algorithm
- W and H will depend on the initial choices, and convergence is not always guaranteed
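A minimal sketch of the two formulations in numpy; the function names are mine:

```python
import numpy as np

def frob_objective(A, W, H):
    # Constrained form: assumes W, H >= 0 elementwise.
    return np.linalg.norm(A - W @ H, 'fro')

def frob_objective_unconstrained(A, U, V):
    # Reparameterize W = U.*U and H = V.*V so U and V may range over all
    # reals while the effective factors stay nonnegative automatically.
    return np.linalg.norm(A - (U * U) @ (V * V), 'fro')
```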
Common Algorithms
- Alternating Least Squares (Paatero 1994)
- Multiplicative Update Rules (Lee and Seung, Nature 1999; used by Hoyer), sketched below
- Gradient Descent (Hoyer 2004; Berry and Plemmons 2004)
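A minimal sketch of the Lee-Seung multiplicative updates, assuming numpy; the function name, iteration count, and the small eps guard are my choices:

```python
import numpy as np

def nmf_multiplicative(A, r, n_iter=200, eps=1e-9, seed=0):
    """Lee-Seung multiplicative update rules (a sketch, not Hoyer's exact code)."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, r))      # random positive initialization
    H = rng.random((r, n))
    for _ in range(n_iter):
        # Elementwise updates; every factor is nonnegative, so W, H stay >= 0.
        H *= (W.T @ A) / (W.T @ W @ H + eps)
        W *= (A @ H.T) / (W @ H @ H.T + eps)
    return W, H
```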
Why is sparsity important?
- The nature of some data: text mining, disease patterns
- Better interpretation of results
- Storage concerns
Non-negative Sparse Coding I
- Proposed by Patrik Hoyer in 2002
- Add a penalty function to the objective to encourage sparseness:
  min || A − WH ||F² + λ Σij f(Hij), subject to W, H ≥ 0
- The parameter λ controls the trade-off between accuracy and sparseness
- f must be strictly increasing; f(Hij) = Hij, i.e. a penalty of Σ Hij, works
Sparse Objective Function
- The objective can always be decreased by scaling W up and H down: set W = cW and H = (1/c)H with c > 1; WH is unchanged, but the penalty on H shrinks
- Thus, on its own, the penalized objective will simply yield the plain NMF solution
- A constraint on the scale of W or H is needed: fix the norm of the columns of W or of the rows of H (see the sketch below)
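A sketch of the penalized objective with f = Σ Hij, together with the column normalization that removes the scaling ambiguity; the function names are mine:

```python
import numpy as np

def sparse_coding_objective(A, W, H, lam):
    # Accuracy term plus an L1 penalty on H (Hoyer 2002, with f = sum of Hij).
    return np.linalg.norm(A - W @ H, 'fro')**2 + lam * H.sum()

def normalize_columns(W, H):
    # Fix the scale: give the columns of W unit L2 norm and compensate in the
    # rows of H, so W <- cW, H <- H/c can no longer drive the penalty down.
    norms = np.linalg.norm(W, axis=0)
    return W / norms, H * norms[:, None]
```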
Non-negative Sparse Coding I
Pros:
- Simple and efficient
- Guaranteed to reach the global minimum using the multiplicative update rule
Cons:
- Sparseness is controlled only implicitly: the optimal λ is found by trial and error
- Sparseness is constrained only for H
NMF with sparseness constraints II
First need some way to define the sparseness of a vector
A vector with one nonzero entry is maximally sparse
A multiple of the vector of all ones, e, is minimally sparse
The Cauchy-Bunyakovsky-Schwarz (CBS) inequality gives ||x||₂ ≤ ||x||₁ ≤ √n ||x||₂, with equality on the left exactly for a single nonzero entry and on the right exactly for a constant vector. How can we combine these ideas?
Hoyer’s Sparseness Parameter
sparseness(x) = (√n − ||x||₁/||x||₂) / (√n − 1)
where n is the dimensionality of x
The measure is 1 for a vector with a single nonzero entry and 0 for a constant vector, so we can control a vector’s sparseness by manipulating its L1 and L2 norms (a sketch follows the figure)
[Figure: the sparseness function for vectors with n = 2]
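A direct transcription of the measure into numpy; the helper name is mine, and n > 1 and x ≠ 0 are assumed:

```python
import numpy as np

def sparseness(x):
    """Hoyer's measure: 1 for a single nonzero entry, 0 for a constant vector."""
    n = x.size                  # assumes n > 1 and x is not all zero
    l1 = np.abs(x).sum()
    l2 = np.linalg.norm(x)
    return (np.sqrt(n) - l1 / l2) / (np.sqrt(n) - 1)

print(sparseness(np.array([0.0, 0.0, 1.0])))  # 1.0, maximally sparse
print(sparseness(np.ones(3)))                 # 0.0, minimally sparse
```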
Implementing Sparseness Constraints
Now that we have an explicit measure of sparseness, how can we incorporate it into the algorithm?
Hoyer: at each step, project each column of a matrix onto the nearest vector of desired sparseness.
Hoyer’s Projection Algorithm
Problem: Given any vector, x, find the closest (in the Euclidean sense) non-negative vector s with a given L1 norm and a given L2 norm
We can easily solve this problem in the 3 dimensional case and extend the result.
Hoyer’s Projection Algorithm
1. Set si = xi + (L1 − Σ xi)/n for all i
2. Set Z = {}
3. Iterate:
   a. Set mi = L1/(n − size(Z)) if i is not in Z, 0 otherwise
   b. Set s = m + β(s − m), where β ≥ 0 solves the quadratic ||m + β(s − m)||₂ = L2
   c. If s is non-negative, we are finished
   d. Set Z = Z ∪ {i : si < 0}
   e. Set si = 0 for all i in Z
   f. Calculate c = (Σ si − L1)/(n − size(Z))
   g. Set si = si − c for all i not in Z
The Algorithm in Words
- Project x onto the hyperplane Σ si = L1
- Within this space, move radially outward from the center of the joint-constraint hypersphere toward the point
- If the result is non-negative, the destination is reached
- Else, set the negative values to zero and project to a new point in similar fashion (see the sketch below)
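A sketch of the projection following the steps above; the function and argument names, the iteration cap, and the numerical guard inside the square root are my choices:

```python
import numpy as np

def project_sparse(x, L1, L2, max_iter=100):
    """Closest (Euclidean) nonnegative s to x with ||s||_1 = L1, ||s||_2 = L2."""
    n = x.size
    s = x + (L1 - x.sum()) / n          # project onto the hyperplane sum(s) = L1
    zeroed = np.zeros(n, dtype=bool)    # the set Z of entries pinned to zero
    for _ in range(max_iter):
        mid = np.where(zeroed, 0.0, L1 / (n - zeroed.sum()))
        w = s - mid
        # Solve ||mid + beta*w||_2 = L2 for beta >= 0: a quadratic in beta.
        aa = w @ w
        bb = 2.0 * (mid @ w)
        cc = mid @ mid - L2 ** 2
        beta = (-bb + np.sqrt(max(bb * bb - 4.0 * aa * cc, 0.0))) / (2.0 * aa)
        s = mid + beta * w              # move radially out to the L2 sphere
        if (s >= 0).all():
            return s                    # inside the nonnegative orthant: done
        zeroed |= s < 0                 # pin the negative entries to zero
        s[zeroed] = 0.0
        c = (s.sum() - L1) / (n - zeroed.sum())
        s = np.where(zeroed, s, s - c)  # re-project onto sum(s) = L1
    return s
```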
NMF with sparseness constraints
Step 1: Initialize W, H to random positive matrices
Step 2: If constraints apply to W, H, or both, project each column of W or row of H, respectively, so that its L2 norm is unchanged and its L1 norm is the desired one
NMF w/ Sparseness Algorithm
Step 3: Iterate (see the sketch below):
- If sparseness constraints on W apply:
  Set W = W − μW (WH − A)Hᵀ
  Project the columns of W as in step 2
  Else, take a standard multiplicative step
- If sparseness constraints on H apply:
  Set H = H − μH Wᵀ(WH − A)
  Project the rows of H as in step 2
  Else, take a standard multiplicative step
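A sketch of the full procedure, reusing project_sparse from above; l1_for inverts the sparseness measure to get the target L1 norm, and the names, fixed step size, and defaults are my assumptions (Hoyer adapts the step size):

```python
import numpy as np

def l1_for(sp, l2, n):
    # Invert Hoyer's measure: the L1 norm giving sparseness sp at dimension n, norm l2.
    return l2 * (np.sqrt(n) - sp * (np.sqrt(n) - 1))

def nmf_sparse(A, r, sW=None, sH=None, n_iter=100, mu=1e-3, eps=1e-9, seed=0):
    """Hoyer-style NMF; sW/sH are desired sparseness levels in [0, 1] or None."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, r))              # step 1: random positive matrices
    H = rng.random((r, n))

    def project_W(W):                   # step 2 for W: per-column projection
        for j in range(r):
            l2 = np.linalg.norm(W[:, j])
            W[:, j] = project_sparse(W[:, j], l1_for(sW, l2, m), l2)
        return W

    def project_H(H):                   # step 2 for H: per-row projection
        for i in range(r):
            l2 = np.linalg.norm(H[i])
            H[i] = project_sparse(H[i], l1_for(sH, l2, n), l2)
        return H

    if sW is not None:
        W = project_W(W)
    if sH is not None:
        H = project_H(H)
    for _ in range(n_iter):             # step 3
        if sW is not None:              # gradient step, then re-project
            W = project_W(W - mu * (W @ H - A) @ H.T)
        else:                           # standard multiplicative step
            W *= (A @ H.T) / (W @ H @ H.T + eps)
        if sH is not None:
            H = project_H(H - mu * W.T @ (W @ H - A))
        else:
            H *= (W.T @ A) / (W.T @ W @ H + eps)
    return W, H
```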
Advantages of New Method
Sparseness is controlled explicitly, with a parameter that is easily interpreted
Sparseness of W, H or both can be constrained
Number of iterations required grows very slowly with the dimensionality of the problem
[Figure: iterations required vs. problem dimensionality. Dotted lines show the minimum and maximum number of iterations; the solid line shows the average.]
An Example from Hoyer’s Work
Text Mining Results
Text to Matrix Generator (TMG): Dimitrios Zeimpekis and E. Gallopoulos, University of Patras, http://scgroup.hpclab.ceid.upatras.gr/scgroup/Projects/TMG/
NMF with sparseness constraints code from Hoyer’s web page