Vector Algebra Basics Vector Space Model Applications of the Vector Space Models References and Further Reading
Vector Space Models: Theory and Applications
Alexander Panchenko
Centre de traitement automatique du langage (CENTAL), Université catholique de Louvain
FLTR 2620: Introduction au traitement automatique du langage
8 December 2010
FLTR2620 - Vector-Space Models 1/55
Plan
1 Vector Algebra Basics
2 Vector Space Model
3 Applications of the Vector Space Models
4 References and Further Reading
Plan
1 Vector Algebra Basics: Vector Space, Euclidean Space, Vector Space Basis, Matrices
2 Vector Space Model: Definition, Basis Elements, Weighting Function, Similarity Function, Transformation
3 Applications of the Vector Space Models
4 References and Further Reading
Vector Space
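A set of elements $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \ldots$ is called a vector space $L$ if it is closed under vector addition and scalar multiplication. The elements of this set are called vectors.
The following conditions must hold for all $\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3 \in L$ and all scalars $\alpha, \beta$:
1 Commutativity of vector addition: $\mathbf{x}_1 + \mathbf{x}_2 = \mathbf{x}_2 + \mathbf{x}_1$.
2 Associativity of vector addition: $(\mathbf{x}_1 + \mathbf{x}_2) + \mathbf{x}_3 = \mathbf{x}_1 + (\mathbf{x}_2 + \mathbf{x}_3)$.
3 Additive identity: for all $\mathbf{x}$, $\mathbf{0} + \mathbf{x} = \mathbf{x} + \mathbf{0} = \mathbf{x}$.
4 Existence of additive inverse: for any $\mathbf{x}$ there exists a $-\mathbf{x}$ such that $\mathbf{x} + (-\mathbf{x}) = \mathbf{0}$.
5 Associativity of scalar multiplication: $\alpha(\beta\mathbf{x}) = (\alpha\beta)\mathbf{x}$.
6 Distributivity of scalar sums: $(\alpha + \beta)\mathbf{x} = \alpha\mathbf{x} + \beta\mathbf{x}$.
7 Distributivity of vector sums: $\alpha(\mathbf{x}_1 + \mathbf{x}_2) = \alpha\mathbf{x}_1 + \alpha\mathbf{x}_2$.
8 Scalar multiplication identity: $1\mathbf{x} = \mathbf{x}$.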
Euclidean Space
Euclidean n-dimensional space $\mathbb{R}^n$ is a vector space in which (1) scalars are real numbers, (2) every element is represented by a tuple of n real numbers, (3) addition is componentwise, and (4) scalar multiplication multiplies each component separately.
A scalar $\alpha$ is an element of the field of real numbers $\mathbb{R}$: $\alpha \in \mathbb{R}$, for example $\alpha = 3.14$, $\beta = 5.25$, $\gamma = 1.45$.
Euclidean Space: Vectors
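A vector $\mathbf{x}$ is an n-tuple of real numbers, an element of the n-dimensional Euclidean space $\mathbb{R}^n$:
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix} \in \mathbb{R}^n = \overbrace{\mathbb{R} \times \mathbb{R} \times \ldots \times \mathbb{R}}^{n},$$
for example
$$\mathbf{x}_1 = \begin{pmatrix} 3.14 \\ 5.25 \\ 1.45 \end{pmatrix} \in \mathbb{R}^3, \quad \mathbf{x}_2 = \begin{pmatrix} 3.14 \\ 5.25 \\ 1.45 \\ 5.33 \\ 6.44 \end{pmatrix} \in \mathbb{R}^5.$$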
Euclidean Space: Column and Row Vectors
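By default, vectors are column vectors:
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}.$$
The transpose of a column vector is a row vector:
$$\mathbf{x}^T = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}^T = (x_1, x_2, x_3).$$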
Euclidean Space: Vector Addition, Scalar Multiplication
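Vector addition is componentwise:
$$\mathbf{x}_1 + \mathbf{x}_2 = (x_{11} + x_{21}, x_{12} + x_{22}, \ldots, x_{1n} + x_{2n})^T,$$
for example
$$\mathbf{x}_1 = (3.14, 5.25, 1.45)^T, \quad \mathbf{x}_2 = (1.45, 5.25, 3.14)^T,$$
$$\mathbf{x}_1 + \mathbf{x}_2 = (4.59, 10.50, 4.59)^T.$$
Multiplication of a vector $\mathbf{x}$ by a scalar $\alpha$:
$$\alpha\mathbf{x} = (\alpha x_1, \alpha x_2, \ldots, \alpha x_n)^T,$$
for example $\alpha = 2$, $\mathbf{x} = (3.14, 5.25, 1.45)^T$,
$$\alpha\mathbf{x} = (6.28, 10.50, 2.90)^T.$$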
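The two operations above can be sketched in a few lines of Python. This is a minimal illustration that represents vectors as plain lists; the function names `add_vectors` and `scale_vector` are chosen here for illustration and are not part of the lecture.

```python
def add_vectors(x, y):
    """Componentwise sum of two vectors of equal dimension."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return [a + b for a, b in zip(x, y)]


def scale_vector(alpha, x):
    """Multiply every component of x by the scalar alpha."""
    return [alpha * a for a in x]


x1 = [3.14, 5.25, 1.45]
x2 = [1.45, 5.25, 3.14]

# Results are rounded to two decimals only for display.
print([round(v, 2) for v in add_vectors(x1, x2)])   # [4.59, 10.5, 4.59]
print([round(v, 2) for v in scale_vector(2, x1)])   # [6.28, 10.5, 2.9]
```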
Euclidean Space: Dot Product, Vector Norm
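Dot (inner) product of two vectors:
$$\mathbf{x}_1 \cdot \mathbf{x}_2 = x_{11}x_{21} + x_{12}x_{22} + \ldots + x_{1n}x_{2n} = \sum_{i=1}^{n} x_{1i}x_{2i},$$
for example
$$\mathbf{x}_1 = (3.14, 5.25, 1.45)^T, \quad \mathbf{x}_2 = (1.45, 5.25, 3.14)^T,$$
$$\mathbf{x}_1 \cdot \mathbf{x}_2 = 3.14 \cdot 1.45 + 5.25 \cdot 5.25 + 1.45 \cdot 3.14 \approx 36.67.$$
Euclidean norm of a vector:
$$\|\mathbf{x}\| = \sqrt{\mathbf{x} \cdot \mathbf{x}} = \sqrt{\sum_{i=1}^{n} x_i^2},$$
for example
$$\|\mathbf{x}_1\| = \sqrt{3.14^2 + 5.25^2 + 1.45^2} \approx \sqrt{9.86 + 27.56 + 2.10} \approx 6.29.$$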
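As a quick check of the two definitions, the following Python sketch computes the dot product and the Euclidean norm for the example vectors. It uses unrounded intermediate terms, so the last digit can differ from a hand calculation with rounded terms; the helper names `dot` and `norm` are illustrative.

```python
import math


def dot(x, y):
    """Inner product: sum of componentwise products."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(a * b for a, b in zip(x, y))


def norm(x):
    """Euclidean norm: square root of the dot product of x with itself."""
    return math.sqrt(dot(x, x))


x1 = [3.14, 5.25, 1.45]
x2 = [1.45, 5.25, 3.14]

print(round(dot(x1, x2), 2))  # 36.67
print(round(norm(x1), 2))     # 6.29
```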
Euclidean Space: Cosine
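Cosine between two vectors:
$$\cos(\mathbf{x}_1, \mathbf{x}_2) = \frac{\mathbf{x}_1 \cdot \mathbf{x}_2}{\|\mathbf{x}_1\| \, \|\mathbf{x}_2\|},$$
for example
$$\mathbf{x}_1 = (3.14, 5.25, 1.45)^T, \quad \mathbf{x}_2 = (0, 0, 1)^T,$$
$$\cos(\mathbf{x}_1, \mathbf{x}_2) = \frac{0 + 0 + 1.45}{6.29 \cdot 1} \approx 0.23 \quad (\approx 77^\circ).$$
The cosine is defined in terms of the vector norm and the inner product. Therefore, the cosine between vectors can be calculated in every linear space with an inner product.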
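The cosine formula translates directly into code. The sketch below is a minimal version that assumes both vectors are non-zero; the function name `cosine` is illustrative.

```python
import math


def cosine(x, y):
    """Cosine of the angle between two non-zero vectors of equal dimension."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)


x1 = [3.14, 5.25, 1.45]
x2 = [0.0, 0.0, 1.0]

c = cosine(x1, x2)
angle = math.degrees(math.acos(c))
print(round(c, 2), round(angle))  # 0.23 77
```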
Geometrical Interpretation
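The Euclidean norm $\|\mathbf{x}\|$ of a vector is its length. The length of the projection $\mathbf{a}_x$ of a vector $\mathbf{a}$ onto a vector $\mathbf{i}$ equals
$$\|\mathbf{a}_x\| = \|\mathbf{a}\| \cos(\mathbf{a}, \mathbf{i}) = \frac{\mathbf{a} \cdot \mathbf{i}}{\|\mathbf{i}\|}.$$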
Linear Independence
Linear Combination
A linear combination of k vectors is an expression of the form
$$\alpha_1\mathbf{x}_1 + \alpha_2\mathbf{x}_2 + \ldots + \alpha_k\mathbf{x}_k,$$
where $\alpha_1, \alpha_2, \ldots, \alpha_k \in \mathbb{R}$ are scalars.
Linearly Dependent and Independent Vectors
Vectors $\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_k$ are linearly dependent iff there exist scalars $\alpha_1, \alpha_2, \ldots, \alpha_k$, not all zero, such that
$$\alpha_1\mathbf{x}_1 + \alpha_2\mathbf{x}_2 + \ldots + \alpha_k\mathbf{x}_k = \mathbf{0}.$$
If no such scalars exist, the vectors are said to be linearly independent.
Basis
A basis of a vector space $L$ is a set of vectors $\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_n$ in $L$ such that the basis vectors are linearly independent and every vector $\mathbf{x} \in L$ can be represented as a linear combination of them: for every $\mathbf{x} \in L$ there exist $\alpha_1, \alpha_2, \ldots, \alpha_n \in \mathbb{R}$ such that
$$\mathbf{x} = \alpha_1\mathbf{b}_1 + \alpha_2\mathbf{b}_2 + \ldots + \alpha_n\mathbf{b}_n.$$
Uniqueness of representation
Every vector $\mathbf{x} \in L$ has exactly one representation in terms of a given basis of the vector space.
Standard Basis
The standard basis for a Euclidean space consists of one unit vector pointing in the direction of each axis of the Cartesian coordinate system.
The standard basis for the three-dimensional Euclidean space $\mathbb{R}^3$ consists of the following three orthogonal vectors of unit length: $\mathbf{i} = (1, 0, 0)$, $\mathbf{j} = (0, 1, 0)$, $\mathbf{k} = (0, 0, 1)$.
The standard basis for the n-dimensional Euclidean space $\mathbb{R}^n$ is the set of the following vectors:
$$\mathbf{b}_1 = (1, 0, 0, 0, \ldots, 0), \quad \mathbf{b}_2 = (0, 1, 0, 0, \ldots, 0), \quad \ldots, \quad \mathbf{b}_n = (0, 0, 0, 0, \ldots, 1).$$
Matrix
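An m × n matrix $\mathbf{X}$ is a rectangular array of scalars $x_{ij} \in \mathbb{R}$:
$$\mathbf{X} = \begin{pmatrix} x_{11} & x_{12} & \ldots & x_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \ldots & x_{mn} \end{pmatrix} \in \mathbb{R}^{m \times n},$$
for example
$$\mathbf{X} = \begin{pmatrix} 1.12 & 0.55 & 0.58 & 0.23 \\ 5.52 & 0.03 & 1.96 & 0.03 \\ 0.37 & 0.78 & 2.02 & 0.03 \end{pmatrix} \in \mathbb{R}^{3 \times 4}.$$
A matrix $\mathbf{X}$ with m rows and n columns can be represented as a set of m row vectors or as a set of n column vectors:
$$\mathbf{X} = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_m)^T, \quad \mathbf{X} = (\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n).$$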
Matrix Operations
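Matrix addition $\mathbf{C} = \mathbf{A} + \mathbf{B}$ is elementwise:
$$c_{ij} = a_{ij} + b_{ij}.$$
Multiplication of a matrix by a scalar, $\mathbf{C} = \alpha\mathbf{A}$, multiplies each element separately:
$$c_{ij} = \alpha a_{ij}.$$
The Euclidean norm of an m × n matrix equals
$$\|\mathbf{A}\| = \sqrt{\sum_{i=1}^{m} \sum_{j=1}^{n} a_{ij}^2}.$$
The transpose $\mathbf{A}^T$ of a matrix $\mathbf{A}$ is the matrix obtained by exchanging the rows and columns of $\mathbf{A}$: $(\mathbf{A}^T)_{ij} = a_{ji}$.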
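The elementwise operations, the matrix norm, and the transpose can be sketched in Python with matrices stored as lists of rows. This is a minimal illustration; the function names and the example matrix are assumptions made here, not taken from the lecture.

```python
import math


def mat_add(A, B):
    """Elementwise sum of two matrices with identical shape."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(A, B)]


def mat_scale(alpha, A):
    """Multiply every element of A by the scalar alpha."""
    return [[alpha * a for a in row] for row in A]


def mat_norm(A):
    """Euclidean norm: square root of the sum of squared elements."""
    return math.sqrt(sum(a * a for row in A for a in row))


def transpose(A):
    """Exchange rows and columns, so result[j][i] == A[i][j]."""
    return [list(col) for col in zip(*A)]


A = [[1, 2, 3], [4, 5, 6]]  # hypothetical 2 x 3 example

print(mat_add(A, A))          # [[2, 4, 6], [8, 10, 12]]
print(mat_scale(2, A))        # [[2, 4, 6], [8, 10, 12]]
print(transpose(A))           # [[1, 4], [2, 5], [3, 6]]
print(round(mat_norm(A), 3))  # 9.539
```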
Matrix Product: Coordinate Form
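$$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix}, \quad \mathbf{B} = \begin{pmatrix} b_{11} & b_{12} & \ldots & b_{1k} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \ldots & b_{nk} \end{pmatrix}.$$
The product $\mathbf{C} = \mathbf{A}\mathbf{B}$ of two matrices $\mathbf{A}$ and $\mathbf{B}$ is defined as follows:
$$c_{ij} = \sum_{l=1}^{n} a_{il}b_{lj} = \mathbf{a}_i \cdot \mathbf{b}_j.$$
Matrix multiplication is defined only if the dimensions of the matrices $\mathbf{A}$ and $\mathbf{B}$ are compatible:
$$\overbrace{[m \times k]}^{\mathbf{C}} = \overbrace{[m \times n]}^{\mathbf{A}} \times \overbrace{[n \times k]}^{\mathbf{B}}.$$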
Matrix Product: Vector Form
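The “Row by Column” Method
Represent $\mathbf{A}$ as a set of m row vectors and $\mathbf{B}$ as a set of k column vectors. Then, if $\mathbf{C} = \mathbf{A}\mathbf{B}$, the element $c_{ij}$ of $\mathbf{C}$ is the inner product of the i-th row of $\mathbf{A}$ and the j-th column of $\mathbf{B}$:
$$c_{ij} = \mathbf{a}_i \cdot \mathbf{b}_j, \quad i = 1, \ldots, m, \quad j = 1, \ldots, k.$$
$$\mathbf{A} = \begin{pmatrix} a_{11} & a_{12} & \ldots & a_{1n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \ldots & a_{mn} \end{pmatrix} = \begin{pmatrix} \mathbf{a}_1 \\ \vdots \\ \mathbf{a}_m \end{pmatrix},$$
$$\mathbf{B} = \begin{pmatrix} b_{11} & b_{12} & \ldots & b_{1k} \\ \vdots & \vdots & \ddots & \vdots \\ b_{n1} & b_{n2} & \ldots & b_{nk} \end{pmatrix} = (\mathbf{b}_1, \mathbf{b}_2, \ldots, \mathbf{b}_k).$$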
Matrix Product: Example
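For example, let
$$\mathbf{A} = \begin{pmatrix} 2 & 4 & 6 \\ 5 & 7 & 1 \\ 2 & 3 & 5 \end{pmatrix} \quad \text{and} \quad \mathbf{B} = \begin{pmatrix} 4 & 1 \\ 0 & 2 \\ 5 & 1 \end{pmatrix}.$$
The dimensions of the matrices agree, so matrix multiplication is defined:
$$\overbrace{[3 \times 2]}^{\mathbf{C}} = \overbrace{[3 \times 3]}^{\mathbf{A}} \times \overbrace{[3 \times 2]}^{\mathbf{B}}.$$
The matrix product equals
$$\mathbf{C} = \mathbf{A}\mathbf{B} = \begin{pmatrix} (2 \cdot 4 + 4 \cdot 0 + 6 \cdot 5) & (2 \cdot 1 + 4 \cdot 2 + 6 \cdot 1) \\ (5 \cdot 4 + 7 \cdot 0 + 1 \cdot 5) & (5 \cdot 1 + 7 \cdot 2 + 1 \cdot 1) \\ (2 \cdot 4 + 3 \cdot 0 + 5 \cdot 5) & (2 \cdot 1 + 3 \cdot 2 + 5 \cdot 1) \end{pmatrix} = \begin{pmatrix} 38 & 16 \\ 25 & 20 \\ 33 & 13 \end{pmatrix}.$$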
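The row-by-column rule is easy to check mechanically. The following Python sketch multiplies the two example matrices, with matrices stored as lists of rows; the function name `matmul` is illustrative.

```python
def matmul(A, B):
    """Matrix product: C[i][j] is the inner product of row i of A and column j of B."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    columns_of_b = list(zip(*B))
    return [
        [sum(a * b for a, b in zip(row, col)) for col in columns_of_b]
        for row in A
    ]


A = [[2, 4, 6], [5, 7, 1], [2, 3, 5]]
B = [[4, 1], [0, 2], [5, 1]]

C = matmul(A, B)
print(C)  # [[38, 16], [25, 20], [33, 13]]
```

The result has 3 rows and 2 columns, as the dimension rule predicts for a 3 × 3 matrix times a 3 × 2 matrix.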
Properties of Matrix Product
Matrix multiplication is associative:
A(BC) = (AB)C.
Matrix multiplication is distributive over matrix addition:
A(B + C) = AB + AC.
Matrix product is compatible with scalar multiplication:
α(AB) = (αA)B = A(αB).
Matrix multiplication is NOT commutative:
AB ≠ BA (in general).
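Non-commutativity is easiest to see on a small counterexample. The sketch below uses two hypothetical 2 × 2 matrices chosen for illustration and an illustrative helper `matmul`.

```python
def matmul(A, B):
    """Row-by-column matrix product for matrices stored as lists of rows."""
    columns_of_b = list(zip(*B))
    return [
        [sum(a * b for a, b in zip(row, col)) for col in columns_of_b]
        for row in A
    ]


A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]  # swaps columns when applied on the right, rows on the left

print(matmul(A, B))  # [[2, 1], [4, 3]]
print(matmul(B, A))  # [[3, 4], [1, 2]]
```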
Matrix Factorization
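Singular Value Decomposition (SVD) is a factorization of a rectangular m × n matrix $\mathbf{A}$ such that
$$\mathbf{A} = \mathbf{U}\mathbf{D}\mathbf{V}^T,$$
where $\mathbf{U}$ is an m × m matrix and $\mathbf{V}$ is an n × n matrix. The columns of each of these matrices are orthonormal vectors:
$$\mathbf{U}^T\mathbf{U} = \mathbf{I}, \quad \mathbf{V}^T\mathbf{V} = \mathbf{I}.$$
The m × n matrix $\mathbf{D}$ has non-negative real numbers along the diagonal, called singular values, and zeros elsewhere.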
Main Characteristics of the Vector Space Model
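The Vector Space Model (VSM) calculates similarity between m homogeneous objects $O = \{o_1, o_2, \ldots, o_m\}$.
The model represents an object $o$ as a vector (point) $\mathbf{x}$ in an n-dimensional Euclidean space $\mathbb{R}^n$.
Every dimension of the vector space corresponds to a feature of an object.
The set of all objects is represented by a feature matrix $\mathbf{X}$:
$$\mathbf{X} = \begin{pmatrix} \mathbf{x}_1 \\ \mathbf{x}_2 \\ \vdots \\ \mathbf{x}_m \end{pmatrix} = \begin{pmatrix} x_{11} & x_{12} & \ldots & x_{1n} \\ x_{21} & x_{22} & \ldots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \ldots & x_{mn} \end{pmatrix}.$$
The similarity between objects is modeled in terms of spatial distance between vectors (points).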
Vector Space Model
Formally, a Vector Space Model can be represented as a quadruple $\langle A, B, S, M \rangle$, where
B is a set $b_1, \ldots, b_n$ of basis elements that determine the dimensionality of the space and the interpretation of each dimension.
A specifies the weighting function $A : \mathbb{R}^n \to \mathbb{R}^n$. It takes as input a vector $\mathbf{x}$ representing an object $o$ and returns its normalized version.
S is a similarity function $S : \mathbb{R}^n \times \mathbb{R}^n \to [0; 1]$ that maps a pair of vectors onto a scalar representing a measure of their similarity.
M is a transformation that takes one vector space $L$ and maps it onto another vector space $\tilde{L}$ in order to reduce dimensionality.
The vector space model is sometimes called a semantic space model in the context of distributional analysis [Lowe].
Interpretation: Basis Elements and Objects
Basis elements $b_1, \ldots, b_n$ define the interpretation of each dimension; they correspond to the standard basis vectors $\mathbf{b}_1, \ldots, \mathbf{b}_n$.
The type of objects defines the interpretation of each vector represented by a VSM.
The bag-of-words (BOW) model is a vector space model in which objects are text documents and basis elements are the words of these text documents.
For example, $b_1$ = “car”, $b_2$ = “auto”, $b_3$ = “insurance”, $b_4$ = “best”, and $o_1$ = “Doc1”, $o_2$ = “Doc2”, $o_3$ = “Doc3”.
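A bag-of-words vector can be built by counting how often each basis word occurs in a document. The Python sketch below is a minimal illustration: the document text is invented for this example, tokenization is a plain whitespace split, and the function name `bow_vector` is an assumption.

```python
from collections import Counter


def bow_vector(text, basis):
    """Count occurrences of each basis word in a whitespace-tokenized text."""
    counts = Counter(text.lower().split())
    # Counter returns 0 for words that do not occur in the text.
    return [counts[word] for word in basis]


basis = ["car", "auto", "insurance", "best"]

# Hypothetical document content, made up for illustration only.
doc1 = "best car insurance for your car"

print(bow_vector(doc1, basis))  # [2, 0, 1, 1]
```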
Interpretation: Feature Matrix
Basis elements (features) can also be lemmas, multi-word expressions, named entities, documents, syntactic dependencies, morphemes, etc.
Term-Document matrix: objects are documents, features are words of the documents. Problems: information retrieval, text categorization and clustering.
Term-Term matrix: objects are terms, features are context words or words from a dictionary definition. Problems: computational lexical semantics, distributional analysis.
Term Senses-Terms matrix: objects are word senses, features are words. Problem: word sense disambiguation.
Term-Syntactic Dependencies matrix: objects are terms, features are syntactic dependencies of a term. Problem: computational lexical semantics.
...
Weighting Function
A weighting function $A : \mathbb{R}^n \to \mathbb{R}^n$ takes as input a vector $\mathbf{x}$ representing an object $o$ and returns its normalized version. Weighting is used to adapt each feature value according to its actual importance.
Identity function (trivial): $A(\mathbf{x}) = \mathbf{x}$.
Logarithmic weighting function: $A(x_{ij}) = 1 + \log(x_{ij})$, $x_{ij} > 0$.
Length normalization with the Euclidean norm:
$$A(\mathbf{x}) = \frac{\mathbf{x}}{\|\mathbf{x}\|}.$$
Conversion to a probability distribution (for non-negative feature values):
$$A(x_{ij}) = p(i, j) = \frac{x_{ij}}{\sum_{k=1}^{n} x_{ik}} = \frac{x_{ij}}{\|\mathbf{x}_i\|_1}.$$
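The simpler weighting functions can be sketched in Python as follows. This is an illustration under stated assumptions: the natural logarithm is used, zero counts are mapped to 0 in the logarithmic weighting (the formula is only defined for positive values), and the function names and example vectors are invented here.

```python
import math


def log_weight(x):
    """Logarithmic weighting: 1 + log(v) for v > 0; zeros stay 0 (assumed convention)."""
    return [1 + math.log(v) if v > 0 else 0.0 for v in x]


def l2_normalize(x):
    """Length normalization: divide by the Euclidean norm (x must be non-zero)."""
    length = math.sqrt(sum(v * v for v in x))
    return [v / length for v in x]


def to_probabilities(x):
    """Divide by the sum of components (non-negative x with a positive sum)."""
    total = sum(x)
    return [v / total for v in x]


print(log_weight([1, 0]))              # [1.0, 0.0]
print(l2_normalize([3, 4, 0]))         # [0.6, 0.8, 0.0]
print(to_probabilities([1, 3, 0, 4]))  # [0.125, 0.375, 0.0, 0.5]
```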
Entropy weighting:
$$A(x_{ij}) = x_{ij} + \left(1 + \sum_{k=1}^{n} \frac{p_{ik}\log(p_{ik})}{\log(n)}\right), \quad p_{ik} = \frac{x_{ik}}{\sum_{l=1}^{n} x_{il}}.$$
Pointwise Mutual Information:
$$A(x_{ij}) = \log\frac{p(i, j)}{p(i)\,p(j)}.$$
TF-IDF (Term Frequency - Inverse Document Frequency):
$$A(x_{ij}) = \overbrace{\frac{x_{ij}}{\sum_{k=1}^{n} x_{ik}}}^{\text{TF}} \cdot \overbrace{\log\frac{m}{|\{l : x_{lj} > 0,\ l = 1, \ldots, m\}|}}^{\text{IDF}}$$
...
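The TF-IDF formula can be sketched in Python for a small feature matrix whose rows are objects (documents) and whose columns are features (terms). This is a minimal illustration, not a reference implementation: the natural logarithm is assumed, the count matrix is invented, and the function name `tf_idf` is hypothetical.

```python
import math


def tf_idf(X):
    """Apply TF-IDF weighting to a count matrix X with m rows and n columns."""
    m = len(X)
    n = len(X[0])
    # Document frequency: number of rows in which feature j is non-zero.
    df = [sum(1 for row in X if row[j] > 0) for j in range(n)]
    weighted = []
    for row in X:
        total = sum(row)
        weighted.append([
            (row[j] / total) * math.log(m / df[j]) if row[j] > 0 else 0.0
            for j in range(n)
        ])
    return weighted


# Hypothetical counts: 2 documents, 3 terms.
X = [[2, 1, 0],
     [1, 0, 1]]

W = tf_idf(X)
print([[round(w, 3) for w in row] for row in W])
# [[0.0, 0.231, 0.0], [0.0, 0.0, 0.347]]
```

The first term occurs in every document, so its IDF is log(1) = 0 and it receives zero weight regardless of its frequency.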
Weighting Function: Example
Consider the following term-document matrix X, where xij is termfrequency:
Let us normalize it with the Euclidean norm:
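$$\mathbf{x}_{\text{Doc1}} = \frac{\mathbf{x}_{\text{Doc1}}}{\|\mathbf{x}_{\text{Doc1}}\|} = \frac{(27, 3, 0, 14)^T}{\sqrt{27^2 + 3^2 + 0^2 + 14^2}} = \frac{(27, 3, 0, 14)^T}{30.56} \approx (0.88, 0.10, 0, 0.46)^T.$$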
Finally, we obtain the normalized term-document matrix:
Similarity Function
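A similarity function $S(\mathbf{x}, \mathbf{y})$ defines a measure of similarity of two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$. It should satisfy the following properties for any vectors $\mathbf{x}, \mathbf{y}$:
Non-negativity: $S(\mathbf{x}, \mathbf{y}) \geq 0$.
Maximality: $S(\mathbf{x}, \mathbf{x}) \geq S(\mathbf{x}, \mathbf{y})$.
Symmetry: $S(\mathbf{x}, \mathbf{y}) = S(\mathbf{y}, \mathbf{x})$.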
Distance Function
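A distance (dissimilarity) function $D(\mathbf{x}, \mathbf{y})$ defines the distance between two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$. It should satisfy the following properties for any vectors $\mathbf{x}, \mathbf{y}, \mathbf{z}$:
Non-negativity: $D(\mathbf{x}, \mathbf{y}) \geq 0$.
Identity of indiscernibles: $D(\mathbf{x}, \mathbf{y}) = 0$ iff $\mathbf{x} = \mathbf{y}$.
Symmetry: $D(\mathbf{x}, \mathbf{y}) = D(\mathbf{y}, \mathbf{x})$.
Triangle inequality: $D(\mathbf{x}, \mathbf{z}) \leq D(\mathbf{x}, \mathbf{y}) + D(\mathbf{y}, \mathbf{z})$.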
Converting Distance to Similarity
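A distance measure $D(\mathbf{x}, \mathbf{y})$ between two vectors $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ that is normalized to the range [0; 1] can be converted to a similarity measure between them as follows:
$S(\mathbf{x}, \mathbf{y}) = 1 - D(\mathbf{x}, \mathbf{y})$, if the similarity should lie in [0; 1];
$S(\mathbf{x}, \mathbf{y}) = 1 - 2D(\mathbf{x}, \mathbf{y})$, if the similarity should lie in [−1; +1].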
Some Similarity and Distance Functions
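Minkowski distance ($L_q$ distance):
$$D(\mathbf{x}, \mathbf{y}) = \sqrt[q]{\sum_{i=1}^{n} |x_i - y_i|^q}.$$
Euclidean distance ($L_2$ distance):
$$D(\mathbf{x}, \mathbf{y}) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2} = \|\mathbf{x} - \mathbf{y}\|.$$
Manhattan or city block distance ($L_1$ distance):
$$D(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{n} |x_i - y_i|.$$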
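Since the Euclidean and Manhattan distances are the special cases q = 2 and q = 1 of the Minkowski distance, one function covers all three. The Python sketch below is a minimal illustration; the function name `minkowski` and the example points are assumptions.

```python
def minkowski(x, y, q):
    """Minkowski (L_q) distance between two vectors of equal dimension, q >= 1."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    return sum(abs(a - b) ** q for a, b in zip(x, y)) ** (1 / q)


x = [0, 0]
y = [3, 4]

print(minkowski(x, y, 1))  # Manhattan distance: 7.0
print(minkowski(x, y, 2))  # Euclidean distance: 5.0
```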
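Jaccard similarity:
$$S(\mathbf{x}, \mathbf{y}) = \frac{\sum_{i=1}^{n} \min(x_i, y_i)}{\sum_{i=1}^{n} \max(x_i, y_i)}.$$
Dice similarity:
$$S(\mathbf{x}, \mathbf{y}) = \frac{2 \cdot \sum_{i=1}^{n} \min(x_i, y_i)}{\sum_{i=1}^{n} (x_i + y_i)}.$$
Cosine similarity:
$$S(\mathbf{x}, \mathbf{y}) = \frac{\mathbf{x} \cdot \mathbf{y}}{\|\mathbf{x}\| \, \|\mathbf{y}\|}.$$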
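The three similarity functions can be compared on the same pair of vectors. The Python sketch below assumes non-negative, non-zero feature vectors (as with term counts); the function names and example vectors are illustrative.

```python
import math


def jaccard(x, y):
    """Sum of componentwise minima divided by sum of componentwise maxima."""
    return sum(min(a, b) for a, b in zip(x, y)) / sum(max(a, b) for a, b in zip(x, y))


def dice(x, y):
    """Twice the sum of componentwise minima divided by the sum of all components."""
    return 2 * sum(min(a, b) for a, b in zip(x, y)) / sum(a + b for a, b in zip(x, y))


def cosine(x, y):
    """Inner product divided by the product of Euclidean norms."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))


x = [1, 2, 0]
y = [2, 1, 1]

print(round(jaccard(x, y), 3))  # 0.4
print(round(dice(x, y), 3))     # 0.571
print(round(cosine(x, y), 3))   # 0.73
```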
Transformation: Dimensionality Reduction
M is a transformation that takes a vector space $L$ and maps it onto another vector space $\tilde{L}$ in order to reduce dimensionality, so that $\dim(L) \geq \dim(\tilde{L})$.
The goal of dimensionality reduction is to find a smaller number of uncorrelated or weakly correlated dimensions.
Reasons for dimensionality reduction:
The VSM assumes independence of dimensions. In practice, some dimensions are linear combinations of other dimensions: synonyms, various spellings, etc.
Computational complexity is high in a high-dimensional space.
Dimensionality reduction can help discover latent structure in the data.
Simple dimensionality reduction can be done at the preprocessing stage: removing stop words, rare dimensions, etc.
In addition, feature matrix factorization methods can be used for dimensionality reduction:
Truncated Singular Value Decomposition (SVD)
Non-Negative Matrix Factorization (NMF)
...
Various applications of the Vector Space Models
1 Information Retrieval
2 Computational Lexical Semantics
3 Word Sense Disambiguation
4 Other Applications
Information Retrieval
Problem Formulation
Given a user query q, find the k most relevant documents $\{d_1, \ldots, d_k\}$ from a collection of m documents $\{d_1, \ldots, d_m\}$.
A – TF-IDF
B – Terms from all documents
O – Documents
S – Cosine similarity
M – Truncated SVD (Latent Semantic Indexing)
Documents are represented as vectors in the bag-of-words space. The user's text query is represented as a vector in the same space as the documents.
Let the search query be q = “car”; then it will be represented as the following vector:
$$\mathbf{q} = (1, 0, 0, 0).$$
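Retrieval then amounts to ranking documents by their similarity to the query vector. The Python sketch below ranks by cosine similarity over raw counts, skipping the TF-IDF weighting and SVD steps for brevity. The term counts are illustrative values in the basis (car, auto, insurance, best), and the function names are assumptions.

```python
import math


def cosine(x, y):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0
    return dot / (norm_x * norm_y)


def rank(query, docs, k):
    """Return the names of the k documents most similar to the query."""
    ordered = sorted(docs.items(), key=lambda item: cosine(query, item[1]), reverse=True)
    return [name for name, _ in ordered[:k]]


# Illustrative term counts for three documents.
docs = {
    "Doc1": [27, 3, 0, 14],
    "Doc2": [0, 5, 5, 0],
    "Doc3": [10, 0, 10, 0],
}
q = [1, 0, 0, 0]  # the query "car"

print(rank(q, docs, 2))  # ['Doc1', 'Doc3']
```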
Computational Lexical Semantics
Problem Formulation
Given a term t, find the k most semantically similar terms $\{t_1, \ldots, t_k\}$ from the vocabulary of n terms $\{t_1, \ldots, t_n\}$.
A – Pointwise Mutual Information
B – Words / Terms / Syntactic Contexts
O – Terms
S – Cosine similarity / Kullback-Leibler divergence
M – Truncated SVD (Latent Semantic Analysis) / Non-Negative Matrix Factorization
Distributional hypothesis of Harris: “terms are semantically similar if they appear within similar context windows”.
Word Sense Disambiguation
Problem Formulation
Given a word occurrence w, find its sense among the k possible senses $\{s_1, \ldots, s_k\}$.
A – Identity function / Length-normalization
B – Words / Terms
O – Term Senses
S – Inner Product (simplified Lesk)
M – No
Term senses are represented as vectors in the BOW space of the dictionary definitions. The term is represented as a vector in the same space as the term senses.
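The configuration above can be sketched as a simplified Lesk procedure: each sense is a bag of words built from its dictionary definition, the word occurrence is a bag of words built from its context, and the sense with the largest inner product is chosen. The definitions, the context sentence, and the function names below are invented for illustration; a realistic version would also remove stop words.

```python
from collections import Counter


def bow(text):
    """Bag-of-words representation of a whitespace-tokenized text."""
    return Counter(text.lower().split())


def inner_product(a, b):
    """Inner product of two sparse count vectors."""
    return sum(count * b[word] for word, count in a.items())


def disambiguate(context, senses):
    """Return the sense whose definition overlaps most with the context."""
    context_vector = bow(context)
    return max(senses, key=lambda s: inner_product(context_vector, bow(senses[s])))


# Hypothetical dictionary definitions for two senses of "bank".
senses = {
    "bank/finance": "financial institution that accepts deposits",
    "bank/river": "sloping land beside a river or lake",
}
context = "she sat on the bank of the river and watched the water"

print(disambiguate(context, senses))  # bank/river
```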
Some Other Applications
Named Entity Disambiguation
Text Documents Clustering
Text Documents Categorization
Collaborative Recommendations
...
References I
Berry, M. W. and Browne, M. (2005). Understanding Search Engines: Mathematical Modeling and Text Retrieval (Software, Environments, Tools), Second Edition. SIAM, Society for Industrial and Applied Mathematics.
Berry, M. W., Drmac, Z., and Jessup, E. R. (1999). Matrices, vector spaces, and information retrieval. SIAM Rev., 41:335–362.
Lowe, W. Towards a theory of semantic space.
Manning, C. D., Raghavan, P., and Schütze, H. (2008). Introduction to Information Retrieval. Cambridge University Press, 1 edition.
Van de Cruys, T. (2010). Mining for Meaning. The Extraction of Lexicosemantic Knowledge from Text.
Acknowledgments
Some illustrations for this presentation were borrowed from [Manning et al., 2008], [Van de Cruys, 2010], and Wikipedia. I would like to thank the authors of these figures.