iccsa2014 - slides
Posted on 20-Aug-2015
Part-based data analysis with Masked Non-negative Matrix Factorization
Gabriella Casalino
Ph.D. Student
Department of Informatics, University of Bari, Italy
gabriella.casalino@uniba.it
Supervisors: Corrado Mencar
Assistant Professor of Informatics, University of Bari
corrado.mencar@uniba.it
Nicoletta Del Buono
Associate Professor of Mathematics, University of Bari
nicoletta.delbuono@uniba.it
• Exponential growth of information
• Need for techniques and tools to manage data
Intelligent Data Analysis (IDA)
Non-negative data
Low-rank approximation
• process and conceptualize huge amounts of data matrices
• discover latent structures by projecting data onto a low-dimensional space
• capture the essential structure of input data
Some examples:
• Singular value decomposition (SVD)
• Factor analysis (FA)
• Principal component analysis (PCA)
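The low-rank approximation idea can be illustrated with a truncated SVD. This is a minimal sketch in Python/NumPy on a small random non-negative matrix (illustrative data, not taken from the slides); `truncated_svd` is a hypothetical helper name. It checks the Eckart-Young identity for the rank-k error and shows that the singular vectors of a non-negative matrix contain negative entries, which is the non-negativity drawback noted in these slides.

```python
import numpy as np

def truncated_svd(X, k):
    """Best rank-k approximation of X in the Frobenius norm (Eckart-Young)."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U[:, :k], s[:k], Vt[:k, :]

rng = np.random.default_rng(0)
X = rng.random((6, 5))            # small non-negative matrix, illustrative only
k = 2
U_k, s_k, Vt_k = truncated_svd(X, k)
X_k = U_k @ np.diag(s_k) @ Vt_k   # rank-k reconstruction

# Eckart-Young: the Frobenius error equals the energy of the discarded singular values
s_all = np.linalg.svd(X, compute_uv=False)
err = np.linalg.norm(X - X_k, "fro")
print(err, np.sqrt(np.sum(s_all[k:] ** 2)))

# X >= 0, yet the second singular vector must mix signs (it is orthogonal
# to the strictly positive or strictly negative first one)
print("negative entries in U_k:", np.any(U_k < 0))
```

The sign mixing is unavoidable here: the factors are mathematically optimal but cannot be read as additive, non-negative parts.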
Drawbacks
• They do not preserve the non-negativity of the data
• The resulting mathematical factors are difficult to interpret
• Allows a low-rank representation of non-negative data using additive components only
• Preserves the non-negativity of the data
• Learns a part-based representation: parts are generally combined additively to form a whole
The sketch of a swimming figure can be represented by its limbs in different positions
A face can be represented by its parts, such as nose, eyes and mouth
Non-negative Matrix Factorization (NMF)
Lee, D., & Seung, H. (1999). Learning the parts of objects by non-negative matrix factorization. Nature.
Original data X
• high-dimensional
• not very informative
Basis matrix W: each column represents
• a latent factor hidden in the data
• conceptual properties of the data
• a basis vector of the subspace that best explains the data
Encoding matrix H: each column represents
• the weights associated with each basis vector
• the coefficients of the data in the subspace
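The roles of the three matrices can be written compactly. This is a sketch assuming, as in Lee and Seung (1999), that each column $x_j$ of X is one data item; the dimension names n, m, k and the column symbols $w_a$, $h_j$ are our notation, not taken from the slides. Each data item is rebuilt as a non-negative (additive) combination of the basis columns, weighted by the corresponding column of H.

```latex
X \approx W H, \qquad
X \in \mathbb{R}_{+}^{n \times m}, \quad
W \in \mathbb{R}_{+}^{n \times k}, \quad
H \in \mathbb{R}_{+}^{k \times m}, \quad
k \ll \min(n, m)

x_j \approx W h_j = \sum_{a=1}^{k} h_{aj}\, w_a, \qquad h_{aj} \ge 0
```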
Image Mining
Lee, D., & Seung, H. (1999). Learning the parts of objects by non-negative matrix factorization. Nature.
[Figure: panels W, H, X and the reconstructed matrix]
W: parts that describe a face
H: coefficients that indicate the weight of each basis image in representing the original face
Text Mining
Lee, D., & Seung, H. (1999). Learning the parts of objects by non-negative matrix factorization. Nature.
[Figure: panels W, H, X]
W: NMF discovers semantic features of text documents
H: weights of each feature in reconstructing the documents
NMF is an optimization problem.
Mean squared error objective function: $\min_{W \ge 0,\, H \ge 0} \|X - WH\|_F^2$
Convex in W or in H separately, but not in both jointly ⇒ the global minimum is hard to find
Other objective functions:
• Divergence objective function
• Weighted mean squared error objective function
• Weighted divergence objective function
• Bregman divergence class of objective functions
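The two basic objectives can be compared side by side. This is a minimal NumPy sketch assuming the standard forms used by Lee and Seung: the squared Frobenius error and the generalized Kullback-Leibler divergence $D(X \| WH) = \sum_{ij} \big(X_{ij} \log \frac{X_{ij}}{(WH)_{ij}} - X_{ij} + (WH)_{ij}\big)$; the slides do not spell the divergence out, so this form is an assumption. The function names and the small `eps` guard are hypothetical additions.

```python
import numpy as np

def frobenius_objective(X, W, H):
    """Mean squared error objective ||X - WH||_F^2."""
    return float(np.sum((X - W @ H) ** 2))

def divergence_objective(X, W, H, eps=1e-12):
    """Generalized KL divergence D(X || WH), using the convention 0 log 0 = 0."""
    Y = W @ H
    safe_X = np.where(X > 0, X, 1.0)  # placeholder where X == 0; those terms are zeroed
    log_term = np.where(X > 0, X * np.log(safe_X / (Y + eps)), 0.0)
    return float(np.sum(log_term - X + Y))

rng = np.random.default_rng(0)
W, H = rng.random((5, 2)), rng.random((2, 4))
X_exact = W @ H                     # exactly factorizable data
X_noisy = rng.random((5, 4))        # arbitrary non-negative data
print(frobenius_objective(X_exact, W, H), divergence_objective(X_exact, W, H))
print(frobenius_objective(X_noisy, W, H), divergence_objective(X_noisy, W, H))
```

Both objectives are zero exactly when X = WH and positive otherwise; they differ in how they weight errors on small versus large entries.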
Different algorithms to compute NMF:
• Multiplicative update rules
• Alternating least squares
• Gradient descent
• ...
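The multiplicative update rules listed above can be sketched for the mean squared error objective. This is a minimal sketch assuming the classic Lee-Seung updates with random initialization; `nmf_multiplicative`, the iteration count and the `eps` guard against division by zero are hypothetical choices, not part of the slides.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, eps=1e-10, seed=0):
    """Lee-Seung multiplicative updates for min ||X - WH||_F^2, W, H >= 0."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W = rng.random((n, k))
    H = rng.random((k, m))
    errors = []
    for _ in range(n_iter):
        # H <- H * (W^T X) / (W^T W H): the ratio is non-negative, so H stays >= 0
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # W <- W * (X H^T) / (W H H^T)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
        errors.append(np.linalg.norm(X - W @ H, "fro") ** 2)
    return W, H, errors

X = np.random.default_rng(1).random((20, 10))   # illustrative non-negative data
W, H, errors = nmf_multiplicative(X, k=3)
print(errors[0], errors[-1])
```

Because each update only rescales entries by a non-negative factor, non-negativity never has to be enforced explicitly, and the objective does not increase from one iteration to the next.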
NMF could be a good tool for Intelligent Data Analysis:
• It represents data as an additive combination of parts
• Dimensionality reduction helps to understand data
• Factors can be interpreted in the problem domain
• The decomposition is not unique
• W and H are very dense ⇒ it is difficult to bring out useful knowledge
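The non-uniqueness of the decomposition is easy to see in its simplest form: rescaling the columns of W and compensating in the rows of H leaves the product unchanged. This NumPy sketch uses random illustrative factors and an arbitrary positive diagonal matrix `D` (our choice); scaling and permutation are only the most basic ambiguities, not the only ones.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.random((5, 3))
H = rng.random((3, 4))

# Any positive diagonal scaling D yields another non-negative factorization
D = np.diag([2.0, 0.5, 4.0])
W2 = W @ D
H2 = np.linalg.inv(D) @ H

print(np.allclose(W @ H, W2 @ H2))  # True: same reconstruction
print(np.allclose(W, W2))           # False: different factors
```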
What is a part?
We define a part as a small selection of features that exhibits a local linear relationship in a subset of the data
Data in this subset can be represented by the part [x 0 z]
Data in this subset can be represented by the part [x y 0]
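Whether a subset of data is well described by a given part can be checked by fitting each point as a non-negative multiple of the part vector. This is a hypothetical sketch, assuming that "represented by the part" means each point is approximately c times the part with c ≥ 0; `part_residual` and the example vectors are illustrative, not from the slides.

```python
import numpy as np

def part_residual(X_subset, part):
    """Relative residual of representing each row of X_subset as c * part, c >= 0."""
    p = np.asarray(part, dtype=float)
    c = np.clip(X_subset @ p / (p @ p), 0.0, None)  # best non-negative coefficient per row
    R = X_subset - np.outer(c, p)
    return np.linalg.norm(R) / np.linalg.norm(X_subset)

part_xz = np.array([1.0, 0.0, 2.0])           # hypothetical part [x 0 z] with z = 2x
subset = np.outer([0.5, 1.0, 3.0], part_xz)   # points lying exactly on that part
print(part_residual(subset, part_xz))          # 0.0
print(part_residual(subset, [1.0, 1.0, 0.0]))  # large: a part [x y 0] does not fit
```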
Masked Non-negative Matrix Factorization (MNMF) (NEW)
Mask:
• the analyst can select the parts she is interested in discovering in the data
• the basis matrix W is defined by a user-provided mask matrix
• data in the subspace are described by the parts
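A mask fixes which features each column of W may use. This minimal NumPy sketch assumes a binary mask `M` with one row per feature and one column per part, applied entrywise to W; the specific mask (parts [x 0 z] and [x y 0], as in the part example) is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_parts = 3, 2
# Hypothetical mask: part 1 may use features x and z, part 2 may use x and y
M = np.array([[1, 1],
              [0, 1],
              [1, 0]])
W = rng.random((n_features, n_parts)) * M   # entries outside the mask are forced to 0
print(W)
```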
New objective function: constrains the columns of W to contain only a few non-zero elements (NEW)
New iterative update rules: the objective function is non-increasing under these update rules (NEW)
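The effect of a mask on an iterative factorization can be approximated with plain multiplicative updates. This is a hypothetical sketch, NOT the MNMF objective or update rules of the slides (which are not reproduced here): it assumes W is initialized as mask times random values and relies on the fact that multiplicative updates never turn a zero entry into a non-zero one. `masked_nmf_sketch`, the synthetic data and all parameters are illustrative.

```python
import numpy as np

def masked_nmf_sketch(X, M, n_iter=300, eps=1e-10, seed=0):
    """Lee-Seung updates with W restricted to the support of mask M (features x parts).
    Illustrative only: not the MNMF update rules presented in the slides."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    k = M.shape[1]
    W = rng.random((n, k)) * M     # zero wherever the mask is zero
    H = rng.random((k, m))
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        # zero entries of W stay zero, so W keeps the pattern prescribed by M
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Synthetic demo data (not from the slides): 4 features, 30 samples, 2 masked parts
rng = np.random.default_rng(1)
M = np.array([[1, 0], [0, 1], [1, 0], [0, 1]])
X = (rng.random((4, 2)) * M) @ rng.random((2, 30))
W, H = masked_nmf_sketch(X, M)
rel_err = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
print("masked entries still zero:", np.all(W[M == 0] == 0), "relative error:", rel_err)
```

Enforcing the mask through the initial support is the simplest design; a dedicated objective, like the one described above, can additionally steer how the unmasked entries are used.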
IRIS Dataset
[Figure: Query Mask over the features sepal length, sepal width, petal length and petal width, defining Part 1 and Part 2, given as input to MNMF]
The analyst can specify the parts she is interested in discovering in the data:
• Part One: Lengths (sepal length, petal length)
• Part Two: Widths (sepal width, petal width)
The class of data “Setosa” presents a linear relationship between sepal and petal lengths, and between sepal and petal widths
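The query mask of the IRIS example can be written down directly. This sketch assumes the mask layout used elsewhere in these notes (rows = features, columns = parts, 1 = feature allowed in that part); the variable names are hypothetical.

```python
import numpy as np

features = ["sepal length", "sepal width", "petal length", "petal width"]

# Query mask: rows = features, columns = parts (1 = feature allowed in that part)
M = np.array([
    [1, 0],   # sepal length -> Part One (lengths)
    [0, 1],   # sepal width  -> Part Two (widths)
    [1, 0],   # petal length -> Part One (lengths)
    [0, 1],   # petal width  -> Part Two (widths)
])

for j, name in enumerate(["Part One: Lengths", "Part Two: Widths"]):
    selected = [f for f, on in zip(features, M[:, j]) if on]
    print(name, selected)
```

With samples stored as columns, the data matrix would be 4 × 150; scikit-learn's `load_iris().data` is 150 × 4, so it would need transposing before use with such a mask.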
MNMF improves NMF for IDA:
• It represents data as an additive combination of parts
• Dimensionality reduction helps to understand data
• Factors can be interpreted in the problem domain
• The decomposition is not unique
• W and H are very dense ⇒ it is difficult to bring out useful knowledge
• W and H are very sparse ⇒ it is easy to bring out useful knowledge
• Knowledge injection in the factorization process
Future work
• Automatic detection of “wrong” parts and automatic selection of subsets of data
• Automatic selection of parts through metaheuristics
• Extensive experimentation on real datasets
Thank you