Latent Semantic Analysis (LSA)
Berlin Chen, Department of Computer Science & Information Engineering
National Taiwan Normal University
References:
1. G. W. Furnas, S. Deerwester, S. T. Dumais, T. K. Landauer, R. Harshman, L. A. Streeter, and K. E. Lochbaum, "Information Retrieval using a Singular Value Decomposition Model of Latent Semantic Structure," SIGIR 1988
2. J. R. Bellegarda, "Latent semantic mapping," IEEE Signal Processing Magazine, September 2005
3. T. K. Landauer, D. S. McNamara, S. Dennis, and W. Kintsch (eds.), Handbook of Latent Semantic Analysis, Lawrence Erlbaum, 2007
Taxonomy of Classic IR Models
• User task
– Retrieval: Ad hoc, Filtering
– Browsing
• Classic Models: Boolean, Vector, Probabilistic
– Set Theoretic: Fuzzy, Extended Boolean
– Algebraic: Generalized Vector, Latent Semantic Analysis (LSA), Neural Networks
– Probabilistic: Inference Network, Belief Network, Language Models
• Structured Models: Non-Overlapping Lists, Proximal Nodes
• Browsing: Flat, Structure Guided, Hypertext
Classification of IR Models Along Two Axes
• Matching Strategy
– Literal term matching (matching word patterns between the query and documents)
• E.g., Vector Space Model (VSM), Hidden Markov Model (HMM), Language Model (LM)
– Concept matching (matching word meanings between the query and documents)
• E.g., Latent Semantic Analysis (LSA), Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), Word Topic Model (WTM)
• Learning Capability
– Term weighting, query expansion, document expansion, etc.
• E.g., Vector Space Model, Latent Semantic Indexing; most of these models are based on linear algebra operations
• E.g., Hidden Markov Model (HMM), Probabilistic Latent Semantic Analysis (PLSA), Latent Dirichlet Allocation (LDA), Word Topic Model (WTM); most of these models belong to the language modeling approach
– Solid theoretical foundations (optimization algorithms)
Two Perspectives for IR Models (cont.)
• Literal Term Matching vs. Concept Matching
– Example query concepts (Chinese broadcast-news retrieval): 「中國解放軍蘇愷戰機」 (Sukhoi fighter jets of the Chinese PLA) and 「中共新一代空軍戰力」 (the new-generation air combat strength of the Chinese Communist air force)
– Example relevant document (an automatically transcribed Chinese broadcast-news story; the transcript contains recognition errors): a Hong Kong Sing Tao Daily report quotes military observers as saying that by 2005 Taiwan will have completely lost its air superiority, because mainland Chinese fighters will surpass Taiwan's in both quantity and performance. The report notes that while importing advanced Russian weapons in large quantities, China is also speeding up the development of indigenous weapon systems, including an improved "Flying Leopard" fighter from the Xi'an aircraft plant that is about to be deployed. Given that a war in the Taiwan Strait could break out at any time, Beijing's basic policy is to fight a high-tech local war, and the PLA therefore plans to field some two hundred Sukhoi fighters, including Su-30s, by 2004.
– There are usually many ways to express a given concept, so the literal terms in a user's query may not match those of a relevant document
Latent Semantic Analysis (LSA)
• Also called Latent Semantic Indexing (LSI), Latent Semantic Mapping (LSM), or Two-Mode Factor Analysis
– Three important claims made for LSA
• The semantic information can be derived from a word-document co-occurrence matrix
• Dimension reduction is an essential part of its derivation
• Words and documents can be represented as points in Euclidean space
– LSA exploits the meaning of words by removing the "noise" that is present due to variability in word choice
• Namely, the synonymy and polysemy found in documents
Reference: T. K. Landauer, D. S. McNamara, S. Dennis, and W. Kintsch (Eds.), Handbook of Latent Semantic Analysis. Hillsdale, NJ: Erlbaum.
Latent Semantic Analysis: Schematic
• Dimension Reduction and Feature Extraction
– PCA feature space: project $\mathbf{x}$ onto an orthonormal basis $\{\boldsymbol{\varphi}_i\}$,
$$y_i = \boldsymbol{\varphi}_i^{T}\mathbf{x}, \qquad \hat{\mathbf{x}} = \sum_{i=1}^{k} y_i\,\boldsymbol{\varphi}_i,$$
choosing the basis so as to minimize $\|\mathbf{x}-\hat{\mathbf{x}}\|^{2}$ for a given $k$
– SVD (in LSA): factor the word-document matrix as
$$A_{m\times n} = U_{m\times r}\,\Sigma_{r\times r}\,V_{r\times n}^{T}, \qquad r \le \min(m,n),$$
and keep only the top $k$ singular values to obtain the rank-$k$ approximation $A' = U_k\Sigma_k V_k^{T}$ (the latent semantic space), which minimizes $\|A-A'\|_F^{2}$ for a given $k$
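To make the schematic concrete, here is a minimal NumPy sketch (the toy word-by-document matrix and variable names are illustrative assumptions, not from the slides) of computing the rank-k approximation A' = U_k Σ_k V_k^T:

```python
import numpy as np

# Toy word-by-document count matrix A (m terms x n documents).
A = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 3., 1.],
              [0., 1., 2.]])

k = 2  # number of latent dimensions to keep

# Full SVD: A = U * diag(s) * Vt, singular values in descending order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# Rank-k (least-squares optimal) approximation A' = U_k * Sigma_k * V_k^T.
U_k, S_k, Vt_k = U[:, :k], np.diag(s[:k]), Vt[:k, :]
A_k = U_k @ S_k @ Vt_k

# Frobenius-norm reconstruction error ||A - A'||_F.
print(np.linalg.norm(A - A_k, 'fro'))
```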
LSA: An Example
– Singular Value Decomposition (SVD) used for the word-document matrix
• A least-squares method for dimension reduction
– Projection of a vector $\mathbf{x}$ onto a unit basis vector $\boldsymbol{\varphi}_1$:
$$y_1 = \mathbf{x}^{T}\boldsymbol{\varphi}_1 = \|\mathbf{x}\|\cos\theta_1, \qquad \text{where } \|\boldsymbol{\varphi}_1\| = 1$$
LSA: Latent Structure Space
• Two alternative frameworks to circumvent vocabulary mismatch
– Expansion: map the document terms (doc expansion) or the query terms (query expansion) through the latent term-structure model, then perform literal term matching
– Structure (latent semantic) retrieval: compare the query and documents directly in the latent structure space
[Diagram: query terms and document terms are each linked to a structure model; retrieval proceeds either by expansion followed by literal term matching, or by matching in the latent structure space.]
LSA: Another Example (1/2)
[Figure: a small collection of twelve numbered example documents from which the word-document matrix is built.]
LSA: Another Example (2/2)
• Query: "human computer interaction"
– Words similar in meaning are "near" each other in the LSA space even if they never co-occur in a document
– Documents similar in concept are "near" each other in the LSA space even if they share no words in common
[Figure: the query and documents plotted in a two-dimensional LSA space; one query word is marked as OOV.]
• Three sorts of basic comparisons
– Compare two words
– Compare two documents
– Compare a word to a document
LSA: Theoretical Foundation (1/10)
• Singular Value Decomposition (SVD)
– The word-document matrix $A_{m\times n}$ (rows = words $w_1,\dots,w_m$; columns = documents $d_1,\dots,d_n$; Row $A \subseteq R^{n}$, Col $A \subseteq R^{m}$) is factored as
$$A_{m\times n} = U_{m\times r}\,\Sigma_{r\times r}\,V_{r\times n}^{T}, \qquad r \le \min(m,n)$$
• Both U and V have orthonormal column vectors: $U^{T}U = I_{r\times r}$, $V^{T}V = I_{r\times r}$
– Keeping only the largest $k \le r$ singular values gives
$$A'_{m\times n} = U'_{m\times k}\,\Sigma_{k\times k}\,V'^{T}_{k\times n}, \qquad \|A\|_F^{2} \ge \|A'\|_F^{2}, \qquad \|A\|_F^{2} = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}^{2}$$
• Docs and queries are represented in a k-dimensional space; the axes can be properly weighted according to the associated diagonal values of $\Sigma_k$
LSA: Theoretical Foundation (2/10)
• The "term-document" matrix A captures the co-occurrences between terms (or units) and documents (or compositions)
– Contextual information for words in documents is discarded ("bag-of-words" modeling)
• Feature extraction for the entries $a_{i,j}$ of matrix A
1. Conventional tf-idf statistics
2. Or, occurrence frequency weighted by negative entropy:
$$a_{i,j} = \frac{f_{i,j}}{d_j}\bigl(1-\varepsilon_i\bigr), \qquad d_j = \sum_{i=1}^{m} f_{i,j}$$
where $f_{i,j}$ is the occurrence count of term $i$ in document $j$, $d_j$ is the document length, and $1-\varepsilon_i$ is the negative normalized entropy of term $i$, with
$$\varepsilon_i = -\frac{1}{\log n}\sum_{j=1}^{n}\frac{f_{i,j}}{\tau_i}\log\frac{f_{i,j}}{\tau_i}, \qquad \tau_i = \sum_{j=1}^{n} f_{i,j}, \qquad 0 \le \varepsilon_i \le 1$$
where $\tau_i$ is the occurrence count of term $i$ in the collection
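As a quick illustration, here is a minimal NumPy sketch (the count matrix and function name are illustrative assumptions, not from the slides) of the entropy-based weighting a_{i,j} = (f_{i,j}/d_j)(1 - ε_i):

```python
import numpy as np

def entropy_weighted(F):
    """F: m x n matrix of raw counts f_{i,j} (term i in document j)."""
    m, n = F.shape
    d = F.sum(axis=0)                    # document lengths d_j
    tau = F.sum(axis=1, keepdims=True)   # collection counts tau_i per term
    P = np.where(tau > 0, F / np.maximum(tau, 1e-12), 0.0)
    # Normalized entropy eps_i of each term over the n documents (0 <= eps_i <= 1).
    with np.errstate(divide='ignore', invalid='ignore'):
        H = -np.where(P > 0, P * np.log(P), 0.0).sum(axis=1) / np.log(n)
    # a_{i,j} = (f_{i,j} / d_j) * (1 - eps_i)
    return (F / np.maximum(d, 1e-12)) * (1.0 - H)[:, None]

F = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 3., 1.]])
print(entropy_weighted(F))
```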
LSA: Theoretical Foundation (3/10)
• Singular Value Decomposition (SVD)
– $A^{T}A$ is a symmetric $n\times n$ matrix
• All eigenvalues $\lambda_j$ are nonnegative real numbers: $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$, and $\Sigma^{2} = \mathrm{diag}(\lambda_1,\lambda_2,\dots,\lambda_n)$
• All eigenvectors $v_j$ are orthonormal ($\in R^{n}$): $V = [v_1\ v_2\ \dots\ v_n]$, $v_j^{T}v_j = 1$, $V^{T}V = I_{n\times n}$
– Define the singular values $\sigma_j = \sqrt{\lambda_j}$, $j = 1,\dots,n$
• They are the lengths of the vectors $Av_1, Av_2, \dots, Av_n$: $\sigma_1 = \|Av_1\|$, $\sigma_2 = \|Av_2\|$, ...
• For $\lambda_i \ne 0$, $i = 1,\dots,r$: $\|Av_i\|^{2} = (Av_i)^{T}Av_i = v_i^{T}A^{T}Av_i = \lambda_i v_i^{T}v_i = \lambda_i$, so $\|Av_i\| = \sigma_i$
• $\{Av_1, Av_2, \dots, Av_r\}$ is an orthogonal basis of Col A
LSA: Theoretical Foundation (4/10)
• $\{Av_1, Av_2, \dots, Av_r\}$ is an orthogonal basis of Col A:
$$\bigl(Av_i\bigr)\cdot\bigl(Av_j\bigr) = (Av_i)^{T}Av_j = v_i^{T}A^{T}Av_j = \lambda_j\,v_i^{T}v_j = 0 \quad (i \ne j)$$
– Suppose that A (or $A^{T}A$) has rank $r \le n$:
$$\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_r > 0, \qquad \lambda_{r+1} = \lambda_{r+2} = \dots = \lambda_n = 0$$
– Define an orthonormal basis $\{u_1, u_2, \dots, u_r\}$ for Col A:
$$u_i = \frac{1}{\|Av_i\|}Av_i = \frac{1}{\sigma_i}Av_i \;\Rightarrow\; \sigma_i u_i = Av_i \;\Rightarrow\; [u_1\ u_2\ \dots\ u_r]\,\Sigma = A\,[v_1\ v_2\ \dots\ v_r]$$
• V is an orthonormal matrix (known in advance); U ($m\times r$) is also an orthonormal matrix
– Extend to an orthonormal basis $\{u_1, u_2, \dots, u_m\}$ of $R^{m}$:
$$[u_1\ \dots\ u_r\ \dots\ u_m]\,\Sigma = A\,[v_1\ \dots\ v_n] \;\Rightarrow\; U\Sigma = AV \;\Rightarrow\; U\Sigma V^{T} = AVV^{T} \;\Rightarrow\; A = U\Sigma V^{T}$$
where $VV^{T} = I_{n\times n}$, $\|A\|_F^{2} = \sum_{i=1}^{m}\sum_{j=1}^{n} a_{ij}^{2} = \sigma_1^{2} + \sigma_2^{2} + \dots + \sigma_r^{2}$, and
$$\Sigma = \begin{pmatrix} \Sigma_{r\times r} & 0_{r\times(n-r)} \\ 0_{(m-r)\times r} & 0_{(m-r)\times(n-r)} \end{pmatrix}$$
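The derivation above can be checked numerically. Below is a minimal sketch (NumPy, toy matrix; names are illustrative) that builds Σ and U from the eigenvalues and eigenvectors of A^T A and confirms A = UΣV^T:

```python
import numpy as np

A = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 3., 1.],
              [0., 1., 2.]])

# Eigen-decomposition of the symmetric matrix A^T A.
lam, V = np.linalg.eigh(A.T @ A)           # ascending eigenvalues
order = np.argsort(lam)[::-1]              # sort descending
lam, V = lam[order], V[:, order]

sigma = np.sqrt(np.maximum(lam, 0.0))      # singular values sigma_j = sqrt(lambda_j)
r = int(np.sum(sigma > 1e-10))             # rank of A

# u_i = A v_i / sigma_i for the nonzero singular values.
U = (A @ V[:, :r]) / sigma[:r]

# Reconstruction A = U * diag(sigma_r) * V_r^T, and ||A||_F^2 = sum of sigma_i^2.
print(np.allclose(A, U @ np.diag(sigma[:r]) @ V[:, :r].T))
print(np.allclose(np.sum(A**2), np.sum(sigma[:r]**2)))
```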
LSA: Theoretical Foundation (5/10)
• Writing $U = [U_1\ U_2]$ and $V = [V_1\ V_2]$, where $U_1$ and $V_1$ hold the first r columns:
$$A = U\Sigma V^{T} = [U_1\ U_2]\begin{pmatrix}\Sigma_1 & 0\\ 0 & 0\end{pmatrix}\begin{pmatrix}V_1^{T}\\ V_2^{T}\end{pmatrix} = U_1\Sigma_1 V_1^{T}, \qquad AV_1 = U_1\Sigma_1$$
– The columns of $V_1$ span the row space of A, and the columns of $U_1$ (the span of the vectors $Av_i$) span the row space of $A^{T}$, i.e., the column space of A
– The remaining columns $V_2$ span the null space of A ($AX = 0$)
[Diagram: the mapping from $R^{n}$ to $R^{m}$ defined by A, showing the subspaces associated with $U_1$, $U_2$, $V_1$, $V_2$.]
LSA: Theoretical Foundation (6/10)
• Additional explanations
– Each row of U is related to the projection of the corresponding row of A onto the basis formed by the columns of V:
$$A = U\Sigma V^{T} \;\Rightarrow\; AV = U\Sigma V^{T}V = U\Sigma \;\Rightarrow\; U\Sigma = AV$$
• The i-th entry of a row of U is related to the projection of the corresponding row of A onto the i-th column of V
– Each row of V is related to the projection of the corresponding row of $A^{T}$ (i.e., column of A) onto the basis formed by the columns of U:
$$A^{T} = V\Sigma^{T}U^{T} = V\Sigma U^{T} \;\Rightarrow\; A^{T}U = V\Sigma U^{T}U = V\Sigma \;\Rightarrow\; V\Sigma = A^{T}U$$
• The i-th entry of a row of V is related to the projection of the corresponding row of $A^{T}$ onto the i-th column of U
LSA: Theoretical Foundation (7/10)
• Fundamental comparisons based on SVD
– In the original word-document matrix A ($m\times n$; rows = words, columns = docs)
• Compare two terms → dot product of two rows of A (i.e., an entry in $AA^{T}$)
• Compare two docs → dot product of two columns of A (i.e., an entry in $A^{T}A$)
• Compare a term and a doc → an individual entry of A
– In the new word-document matrix $A' = U'\Sigma'V'^{T}$, where $U' = U_{m\times k}$, $\Sigma' = \Sigma_k$, $V' = V_{n\times k}$ (multiplying by $\Sigma'$ stretches or shrinks each axis)
• Compare two terms → dot product of two rows of $U'\Sigma'$:
$$A'A'^{T} = (U'\Sigma'V'^{T})(U'\Sigma'V'^{T})^{T} = U'\Sigma'V'^{T}V'\Sigma'^{T}U'^{T} = (U'\Sigma')(U'\Sigma')^{T}$$
• Compare two docs → dot product of two rows of $V'\Sigma'$:
$$A'^{T}A' = (U'\Sigma'V'^{T})^{T}(U'\Sigma'V'^{T}) = V'\Sigma'^{T}U'^{T}U'\Sigma'V'^{T} = (V'\Sigma')(V'\Sigma')^{T}$$
• Compare a query word and a doc → an individual entry of A', i.e., the dot product of the corresponding rows of U' and V', each scaled by the square roots of the singular values
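A minimal sketch of these comparisons (NumPy, using the toy matrix from earlier; variable names are illustrative):

```python
import numpy as np

A = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 3., 1.],
              [0., 1., 2.]])
k = 2

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T   # V_k: n x k

term_coords = U_k @ S_k    # rows of U'Sigma': one k-dim vector per term
doc_coords = V_k @ S_k     # rows of V'Sigma': one k-dim vector per document

# Compare two terms / two docs via dot products in the latent space.
term_term = term_coords @ term_coords.T   # equals A' A'^T
doc_doc = doc_coords @ doc_coords.T       # equals A'^T A'

# Compare a term and a doc: an individual entry of A' = U_k S_k V_k^T.
A_prime = U_k @ S_k @ V_k.T
print(term_term[0, 1], doc_doc[0, 2], A_prime[0, 2])
```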
LSA: Theoretical Foundation (8/10)
• Fold-in: find the representation for a pseudo-document q
– For objects (new queries or docs) that did not appear in the original analysis
• Fold in a new $m\times 1$ query (or doc) vector (see Figure A on the next page):
$$\hat{q}_{1\times k} = q^{T}_{1\times m}\,U_{m\times k}\,\Sigma^{-1}_{k\times k}$$
– The query is represented as the weighted sum of its constituent term vectors, scaled by the inverse of the singular values; the separate dimensions are differentially weighted, just like a row of V
• Cosine measure between the query and doc vectors in the latent semantic space (docs are sorted in descending order of their cosine values):
$$\mathrm{sim}(q,d) = \cos\bigl(\hat{q}\Sigma,\ \hat{d}\Sigma\bigr) = \frac{\hat{q}\,\Sigma^{2}\,\hat{d}^{T}}{\|\hat{q}\Sigma\|\ \|\hat{d}\Sigma\|}$$
where $\hat{q}$ and $\hat{d}$ are row vectors
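Here is a minimal fold-in and ranking sketch under the same toy setup (the query vector and names are illustrative assumptions):

```python
import numpy as np

A = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 3., 1.],
              [0., 1., 2.]])
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

# New m x 1 query vector q (term counts), folded into the latent space:
# q_hat = q^T U_k Sigma_k^{-1}
q = np.array([1., 0., 1., 0.])
q_hat = q @ U_k @ np.linalg.inv(S_k)

# Rank documents by cosine between q_hat*Sigma and each doc row of V_k*Sigma.
def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

scores = [cosine(q_hat @ S_k, V_k[j] @ S_k) for j in range(V_k.shape[0])]
ranking = np.argsort(scores)[::-1]   # documents in descending cosine order
print(scores, ranking)
```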
LSA: Theoretical Foundation (9/10)
• Fold-in a new $1\times n$ term vector (see Figure B below):
$$\hat{t}_{1\times k} = t_{1\times n}\,V_{n\times k}\,\Sigma^{-1}_{k\times k}$$
[Figure A: folding a new document (column) vector into the reduced space. Figure B: folding a new term (row) vector into the reduced space.]
LSA: Theoretical Foundation (10/10)
• Note that the first k columns of U and V are orthogonal, but the rows of U and V (i.e., the word and document vectors), each consisting of k elements, are not orthogonal
• Alternatively, A can be approximated by the sum of k rank-1 matrices:
$$A \approx A_k = \sum_{i=1}^{k}\sigma_i\,u_i\,v_i^{T}$$
– where $u_i$ and $v_i$ are the i-th columns of U and V, respectively
• LSA with relevance feedback (query expansion):
$$\hat{q}_{1\times k} = q^{T}_{1\times m}\,U_{m\times k}\,\Sigma^{-1}_{k\times k} + d^{T}_{1\times n}\,V_{n\times k}$$
– where d is a binary vector whose elements specify which documents to add to the query
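A short sketch (NumPy, toy data; the feedback vector is an illustrative assumption) showing the rank-1 sum form of A_k and the relevance-feedback fold-in:

```python
import numpy as np

A = np.array([[2., 0., 1.],
              [1., 1., 0.],
              [0., 3., 1.],
              [0., 1., 2.]])
k = 2
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k, V_k = U[:, :k], np.diag(s[:k]), Vt[:k, :].T

# A_k as a sum of k rank-1 matrices sigma_i * u_i * v_i^T.
A_k = sum(s[i] * np.outer(U[:, i], Vt[i, :]) for i in range(k))
print(np.allclose(A_k, U_k @ S_k @ V_k.T))

# Relevance feedback: add documents 0 and 2 to the folded-in query.
q = np.array([1., 0., 1., 0.])   # m x 1 query term vector
d = np.array([1., 0., 1.])       # binary doc-selection vector (n x 1)
q_hat = q @ U_k @ np.linalg.inv(S_k) + d @ V_k
print(q_hat)
```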
LSA: A Simple Evaluation
• Experimental results
– HMM is consistently better than VSM at all recall levels
– LSA is better than VSM at higher recall levels
[Figure: Recall-precision curves at 11 standard recall levels, evaluated on the TDT-3 SD collection (using word-level indexing terms).]
LSA: Pro and Con (1/2)
• Pro (Advantages)
– A clean formal framework and a clearly defined optimization criterion (least-squares)
• Conceptual simplicity and clarity
– Handles synonymy problems ("heterogeneous vocabulary")
• Replaces individual terms as the descriptors of documents by independent "artificial concepts," each of which can be specified by any one of several terms (or documents) or combinations thereof
– Good results for high-recall search
• Takes term co-occurrence into account
LSA: Pro and Con (2/2)
• Con (Disadvantages)
– Contextual or positional information for words in documents is discarded (the so-called bag-of-words assumption)
– High computational complexity (e.g., the SVD decomposition)
– Exhaustive comparison of a query against all stored documents is needed (cannot make use of inverted files?)
– LSA offers only a partial solution to polysemy (e.g., bank, bass, ...)
• Every term is represented as just one point in the latent space (a weighted average of its different meanings)
– To date, aside from folding-in, there is no optimal way to add information (new words or documents) to an existing word-document space
• Re-computing the SVD (or the reduced space) with the added information is a more direct and accurate solution
LSA: Junk E-mail Filtering
• One vector represents the centroid of all e-mails that are of interest to the user, while the other represents the centroid of all e-mails that are not of interest
• A new e-mail is classified according to which of the two centroids it lies closer to in the latent semantic space
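A minimal nearest-centroid sketch of this idea (assuming e-mails have already been mapped to LSA vectors; all names and data are illustrative):

```python
import numpy as np

def classify(doc_vec, centroid_interest, centroid_junk):
    """Label a new e-mail by cosine similarity to the two centroids in LSA space."""
    def cos(a, b):
        return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return "interest" if cos(doc_vec, centroid_interest) >= cos(doc_vec, centroid_junk) else "junk"

# Toy LSA vectors (k = 3) for previously seen e-mails of each class.
interest_vecs = np.array([[0.9, 0.1, 0.2], [0.8, 0.2, 0.1]])
junk_vecs = np.array([[0.1, 0.9, 0.7], [0.2, 0.8, 0.9]])

c_interest = interest_vecs.mean(axis=0)   # centroid of e-mails of interest
c_junk = junk_vecs.mean(axis=0)           # centroid of e-mails not of interest

print(classify(np.array([0.7, 0.3, 0.2]), c_interest, c_junk))
```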
LSA: Dynamic Language Model Adaptation (1/4)
• Let $w_q$ denote the word about to be predicted, and $H_{q-1}$ the admissible LSA history (context) for this particular word
– The vector representation of $H_{q-1}$ is the pseudo-document vector $\tilde{d}_{q-1}$ (VSM representation), which can be projected into the latent semantic space (change of notation: $S \equiv \Sigma$):
$$\tilde{v}_{q-1}S = \tilde{d}_{q-1}^{T}U, \qquad \text{i.e.,}\quad \tilde{v}_{q-1} = \tilde{d}_{q-1}^{T}US^{-1} \quad \text{(LSA representation)}$$
• Iteratively update $\tilde{d}_{q-1}$ and $\tilde{v}_{q-1}$ as the decoding evolves
– VSM representation (the current word $w_q$ is word $i$ of the vocabulary, $n_q$ is the number of words in the history, and $\varepsilon_i$ is the normalized entropy of word $i$ defined earlier):
$$\tilde{d}_{q} = \frac{n_q - 1}{n_q}\,\tilde{d}_{q-1} + \frac{1-\varepsilon_i}{n_q}\,[0\ \dots\ 1\ \dots\ 0]^{T}$$
– LSA representation:
$$\tilde{v}_{q}S = \tilde{d}_{q}^{T}U = \frac{1}{n_q}\Bigl[(n_q-1)\,\tilde{v}_{q-1}S + (1-\varepsilon_i)\,u_i\Bigr]$$
or, with an exponential decay $0 < \lambda \le 1$ applied to the history,
$$\tilde{v}_{q}S = \frac{1}{n_q}\Bigl[\lambda\,(n_q-1)\,\tilde{v}_{q-1}S + (1-\varepsilon_i)\,u_i\Bigr]$$
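A minimal sketch of this incremental history update (NumPy; U and the entropy weights come from a previously trained LSA space, and all names and toy values are illustrative assumptions):

```python
import numpy as np

def update_history(d_prev, vS_prev, word_idx, n_q, U, eps, lam=1.0):
    """One decoding step: fold word w_q (vocabulary index word_idx) into the history.

    d_prev  : previous pseudo-document vector d~_{q-1} (VSM representation, length m)
    vS_prev : previous LSA representation v~_{q-1} S (length k)
    n_q     : number of words in the history after adding w_q
    lam     : exponential decay on the history (0 < lam <= 1)
    """
    w = 1.0 - eps[word_idx]          # entropy weight (1 - eps_i)
    e_i = np.zeros_like(d_prev)
    e_i[word_idx] = 1.0

    d_new = ((n_q - 1) / n_q) * d_prev + (w / n_q) * e_i
    vS_new = (lam * (n_q - 1) * vS_prev + w * U[word_idx]) / n_q
    return d_new, vS_new

# Toy setup: m = 4 vocabulary words, k = 2 latent dimensions.
U = np.random.rand(4, 2)
S_diag = np.array([3.0, 1.5])
eps = np.array([0.2, 0.9, 0.4, 0.95])
d, vS = np.zeros(4), np.zeros(2)
for n_q, w_idx in enumerate([0, 2, 0], start=1):
    d, vS = update_history(d, vS, w_idx, n_q, U, eps)
print(d, vS / S_diag)   # v~_q = (v~_q S) S^{-1}
```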
LSA: Dynamic Language Model Adaptation (2/4)
• Integration of LSA with N-grams
$$\Pr\bigl(w_q \mid H_{q-1}^{(n+l)}\bigr) = \Pr\bigl(w_q \mid H_{q-1}^{(n)},\,H_{q-1}^{(l)}\bigr)$$
– where $H_{q-1}$ denotes some suitable history for word $w_q$, and the superscripts (n) and (l) refer to the n-gram component ($w_{q-1}w_{q-2}\dots w_{q-n+1}$, with $n > 1$) and the LSA component ($\tilde{d}_{q-1}$), respectively
• This expression can be rewritten as:
$$\Pr\bigl(w_q \mid H_{q-1}^{(n+l)}\bigr) = \frac{\Pr\bigl(w_q,\,H_{q-1}^{(l)} \mid H_{q-1}^{(n)}\bigr)}{\sum_{w_i \in V}\Pr\bigl(w_i,\,H_{q-1}^{(l)} \mid H_{q-1}^{(n)}\bigr)}$$
LSA: Dynamic Language Model Adaptation (3/4)
• Integration of LSA with N-grams (cont.)
$$\Pr\bigl(w_q,\,H_{q-1}^{(l)} \mid H_{q-1}^{(n)}\bigr) = \Pr\bigl(w_q \mid H_{q-1}^{(n)}\bigr)\,\Pr\bigl(H_{q-1}^{(l)} \mid w_q,\,H_{q-1}^{(n)}\bigr) = \Pr\bigl(w_q \mid w_{q-1}w_{q-2}\dots w_{q-n+1}\bigr)\,\Pr\bigl(\tilde{d}_{q-1} \mid w_q\bigr)$$
– Here we assume that the probability of the document history given the current word is not affected by the immediate context preceding it
• Since $\Pr(\tilde{d}_{q-1} \mid w_q) = \Pr(w_q \mid \tilde{d}_{q-1})\Pr(\tilde{d}_{q-1})/\Pr(w_q)$, the integrated model becomes
$$\Pr\bigl(w_q \mid H_{q-1}^{(n+l)}\bigr) = \frac{\Pr\bigl(w_q \mid w_{q-1}\dots w_{q-n+1}\bigr)\,\dfrac{\Pr\bigl(w_q \mid \tilde{d}_{q-1}\bigr)}{\Pr(w_q)}}{\displaystyle\sum_{w_i \in V}\Pr\bigl(w_i \mid w_{q-1}\dots w_{q-n+1}\bigr)\,\dfrac{\Pr\bigl(w_i \mid \tilde{d}_{q-1}\bigr)}{\Pr(w_i)}}$$
LSA: Dynamic Language Model Adaptation (4/4)
• Intuitively, $\Pr(w_q \mid \tilde{d}_{q-1})$ reflects the "relevance" of word $w_q$ to the admissible history, as observed through $\tilde{d}_{q-1}$
– It is computed from the closeness of $u_q$ and $\tilde{v}_{q-1}$ in the latent semantic space:
$$K\bigl(w_q,\,\tilde{d}_{q-1}\bigr) = \cos\bigl(u_qS^{1/2},\ \tilde{v}_{q-1}S^{1/2}\bigr) = \frac{u_q\,S\,\tilde{v}_{q-1}^{T}}{\|u_qS^{1/2}\|\ \|\tilde{v}_{q-1}S^{1/2}\|}$$
• As such, it will be highest for words whose meaning aligns most closely with the semantic fabric of $\tilde{d}_{q-1}$ (i.e., relevant "content" words), and lowest for words which do not convey any particular information about this fabric (e.g., "function" words like "the")
Reference: J. Bellegarda, Latent Semantic Mapping: Principles & Applications (Synthesis Lectures on Speech and Audio Processing), 2008
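Putting the last three slides together, here is a minimal sketch of the adapted probability (all inputs, such as the background n-gram, unigram, U rows, and history vector, are illustrative assumptions; in particular the mapping from the cosine relevance to a probability is shown only as a simple shift-and-normalize assumption):

```python
import numpy as np

def lsa_relevance(u_w, v_hist, S_diag):
    """K(w, d~) = cos(u_w S^{1/2}, v~ S^{1/2}) = u_w S v~^T / (||u_w S^{1/2}|| ||v~ S^{1/2}||)."""
    S_half = np.sqrt(S_diag)
    num = u_w @ (S_diag * v_hist)
    den = np.linalg.norm(u_w * S_half) * np.linalg.norm(v_hist * S_half)
    return num / den if den > 0 else 0.0

def adapted_probs(ngram_p, unigram_p, U, v_hist, S_diag):
    """Pr(w | H^{(n+l)}) for every word: ngram(w) * Pr_LSA(w|d~)/Pr(w), renormalized over V."""
    # Assumption: turn the cosine relevance into a pseudo-probability by shifting
    # it into [0, 1] and normalizing over the vocabulary.
    rel = np.array([(lsa_relevance(U[i], v_hist, S_diag) + 1.0) / 2.0 for i in range(len(U))])
    pr_lsa = rel / rel.sum()
    weights = ngram_p * pr_lsa / unigram_p
    return weights / weights.sum()

# Toy numbers: 4-word vocabulary, k = 2 latent dimensions.
U = np.random.rand(4, 2)
S_diag = np.array([3.0, 1.5])
v_hist = np.array([0.4, 0.1])                 # LSA history vector v~_{q-1}
ngram_p = np.array([0.5, 0.2, 0.2, 0.1])      # background n-gram Pr(w | w_{q-1}, ...)
unigram_p = np.array([0.3, 0.3, 0.2, 0.2])    # standalone Pr(w)
print(adapted_probs(ngram_p, unigram_p, U, v_hist, S_diag))
```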
LSA: Cross-lingual Language Model Adaptation (1/2)
• Assume that a document-aligned (instead of sentence-aligned) Chinese-English bilingual corpus is provided
Reference: "Lexical triggers and latent semantic analysis for cross-lingual language model adaptation," ACM TALIP, 2004, 3(2)
LSA: Cross-lingual Language Model Adaptation (2/2)
• CL-LSA adapted language model
– $d_i^{E}$ is a relevant English document for the Mandarin document $d_i^{C}$ being transcribed, obtained by cross-lingual IR (CL-IR)
$$P_{\text{Adapt}}\bigl(c_k \mid c_{k-1}, c_{k-2}, d_i^{E}\bigr) = \lambda\,P_{\text{CL-LSA-Unigram}}\bigl(c_k \mid d_i^{E}\bigr) + (1-\lambda)\,P_{\text{BG-Trigram}}\bigl(c_k \mid c_{k-1}, c_{k-2}\bigr)$$
– The cross-lingual LSA unigram sums the translation probabilities over the English words of $d_i^{E}$:
$$P_{\text{CL-LSA-Unigram}}\bigl(c \mid d_i^{E}\bigr) = \sum_{e \in d_i^{E}} P(c \mid e)\,P\bigl(e \mid d_i^{E}\bigr), \qquad P(c \mid e) \approx \frac{\mathrm{sim}(\vec{c},\vec{e}\,)^{\gamma}}{\sum_{c'}\mathrm{sim}(\vec{c}\,',\vec{e}\,)^{\gamma}}, \quad \gamma \gg 1$$
where $\mathrm{sim}(\vec{c},\vec{e}\,)$ is the similarity between the LSA vectors of Chinese word $c$ and English word $e$
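A minimal sketch of the CL-LSA unigram (toy vectors; the cosine similarity, the value of gamma, and all names are illustrative assumptions):

```python
import numpy as np

def p_c_given_e(c_vecs, e_vec, gamma=7.0):
    """P(c|e) ~ sim(c, e)^gamma / sum_c' sim(c', e)^gamma, using cosine as the similarity."""
    sims = np.array([max(0.0, np.dot(c, e_vec) /
                         (np.linalg.norm(c) * np.linalg.norm(e_vec))) for c in c_vecs])
    weights = sims ** gamma
    return weights / weights.sum()

def cl_lsa_unigram(c_vecs, e_vecs, p_e_given_doc, gamma=7.0):
    """P(c | d^E) = sum_e P(c|e) P(e|d^E): mix the translation distributions over the doc."""
    p = np.zeros(len(c_vecs))
    for e_vec, p_e in zip(e_vecs, p_e_given_doc):
        p += p_c_given_e(c_vecs, e_vec, gamma) * p_e
    return p

# Toy LSA vectors: 3 Chinese words, 2 English words from the relevant English document.
c_vecs = np.array([[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]])
e_vecs = np.array([[0.8, 0.2], [0.3, 0.7]])
p_e_given_doc = np.array([0.6, 0.4])   # relative frequencies of e in d^E
print(cl_lsa_unigram(c_vecs, e_vecs, p_e_given_doc))
```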
LSA: SVDLIBC
• Doug Rohde's SVD C Library, version 1.3, is based on the SVDPACKC library
• Download it at http://tedlab.mit.edu/~dr/
LSA: Exercise (1/4)
• Given a sparse term-document matrix in SVDLIBC's sparse-text format
– The first line gives the number of rows (terms), the number of columns (docs), and the total number of nonzero entries; then, for each column, a line with its number of nonzero entries followed by one "row value" pair per entry
– E.g., 4 terms and 3 docs. Dense matrix:
2.3 0.0 4.2
0.0 1.3 2.2
3.8 0.0 0.5
0.0 0.0 0.0
– Sparse-text representation:
4 3 6
2        (2 nonzero entries at Col 0)
0 2.3    (Col 0, Row 0)
2 3.8    (Col 0, Row 2)
1        (1 nonzero entry at Col 1)
1 1.3    (Col 1, Row 1)
3        (3 nonzero entries at Col 2)
0 4.2    (Col 2, Row 0)
1 2.2    (Col 2, Row 1)
2 0.5    (Col 2, Row 2)
– Each entry can be weighted by a TFxIDF score
• Perform SVD to obtain term and document vectors represented in the latent semantic space
• Evaluate the information retrieval capability of the LSA approach by using varying sizes (e.g., 100, 200, ..., 600) of LSA dimensionality
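A minimal helper (an assumption, not part of the exercise materials) for writing a dense NumPy matrix in this sparse-text format:

```python
import numpy as np

def write_sparse_text(A, path):
    """Write matrix A (terms x docs) in SVDLIBC's sparse-text ('st') format."""
    m, n = A.shape
    cols = [[(i, A[i, j]) for i in range(m) if A[i, j] != 0.0] for j in range(n)]
    nnz = sum(len(entries) for entries in cols)
    with open(path, "w") as f:
        f.write(f"{m} {n} {nnz}\n")           # header: rows, columns, nonzeros
        for entries in cols:
            f.write(f"{len(entries)}\n")       # nonzero count for this column
            for i, val in entries:
                f.write(f"{i} {val}\n")        # row index and value

A = np.array([[2.3, 0.0, 4.2],
              [0.0, 1.3, 2.2],
              [3.8, 0.0, 0.5],
              [0.0, 0.0, 0.0]])
write_sparse_text(A, "Term-Doc-Matrix")
```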
LSA: Exercise (2/4)
• Example: term-document matrix
– The collection has 51253 indexing terms and 2265 documents; the file header gives these two numbers plus the total number of nonzero entries, followed by the per-column nonzero counts and "row value" pairs (e.g., 508 7.725771, 596 16.213399, 612 13.080868, ...)
• SVD command (IR_svd.bat):
svd -r st -o LSA100 -d 100 Term-Doc-Matrix
– -r st: the input is a sparse-text matrix
– -o LSA100: prefix of the output files (LSA100-Ut, LSA100-S, LSA100-Vt)
– -d 100: number of reserved eigenvectors (LSA dimensions)
– Term-Doc-Matrix: name of the sparse matrix input file
LSA: Exercise (3/4)
• LSA100-Ut
– First line: "100 51253"; each of the 51253 words has a 100-dimensional word vector u^T (1x100), e.g., 0.003 0.001 ... / 0.002 0.002 ...
• LSA100-S
– First line: "100"; then the 100 singular values in descending order, e.g., 2686.18, 829.94, 559.59, ...
• LSA100-Vt
– First line: "100 2265"; each of the 2265 docs has a 100-dimensional doc vector v^T (1x100), e.g., 0.021 0.035 ... / 0.012 0.022 ...
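A minimal sketch (an assumption based on the layout described above: a header line followed by whitespace-separated values) of loading these output files and picking out word and document vectors:

```python
import numpy as np

def load_dense_text(path):
    """Read 'LSA100-Ut' / 'LSA100-Vt': header 'rows cols', then rows x cols values."""
    with open(path) as f:
        rows, cols = map(int, f.readline().split())
        data = np.array(f.read().split(), dtype=float)
    return data.reshape(rows, cols)

def load_singular_values(path):
    """Read 'LSA100-S': header with the count, then one singular value per line."""
    with open(path) as f:
        k = int(f.readline())
        return np.array([float(f.readline()) for _ in range(k)])

Ut = load_dense_text("LSA100-Ut")      # 100 x 51253: column i is word i's vector
Vt = load_dense_text("LSA100-Vt")      # 100 x 2265: column j is doc j's vector
S = load_singular_values("LSA100-S")   # 100 singular values

word_vec = Ut[:, 0]   # 100-dim vector u^T for the first word
doc_vec = Vt[:, 0]    # 100-dim vector v^T for the first document
```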
LSA: Exercise (4/4)
• Fold-in a new $m\times 1$ query vector:
$$\hat{q}_{1\times k} = q^{T}_{1\times m}\,U_{m\times k}\,\Sigma^{-1}_{k\times k}$$
– The query is represented by the weighted sum of its constituent term vectors; the separate dimensions are differentially weighted, just like a row of V
• Cosine measure between the query and doc vectors in the latent semantic space:
$$\mathrm{sim}(q,d) = \cos\bigl(\hat{q}\Sigma,\ \hat{d}\Sigma\bigr) = \frac{\hat{q}\,\Sigma^{2}\,\hat{d}^{T}}{\|\hat{q}\Sigma\|\ \|\hat{d}\Sigma\|}$$