approximate tree kernels
BACKGROUND PARSE TREE KERNELS APPROXIMATE TREE KERNELS RESULTS Conclusion
Approximate Tree Kernels
Konrad Rieck, Tammo Krueger, Ulf Brefeld, Klaus-Robert Müller
Presented by Niharjyoti Sarangi
Indian Institute of Technology Madras
April 21, 2012
OUTLINE OF THE PRESENTATION
1 BACKGROUND
  Learning from tree-structured data
  Application Domains
2 PARSE TREE KERNELS
  Computing PTK
  Computational constraints
3 APPROXIMATE TREE KERNELS
  Computing ATK
  Validity of ATK
  Types of learning
4 RESULTS
  Performance
  Time
  Memory
5 Conclusion
TREE-STRUCTURED DATA
Trees: carry hierarchical information
Flat feature vectors: fail to capture the underlying dependency structure
Parse Tree
An ordered, rooted tree that represents the syntactic structure of a string according to some formal grammar.
A tree X is called a parse tree of a grammar G = (S, P, s), with symbols S, production rules P and start symbol s, if X is derived by assembling productions p ∈ P such that every node x ∈ X is labeled with a symbol l(x) ∈ S.
EXAMPLES
Figure: Parse trees for natural language text and the HTTP network protocol.
LEARNING FROM TREES
Kernel functions for structured data
Convolution of local kernels
Parse tree kernel proposed by Collins and Duffy (2002)
Kernel Functions
k : χ × χ → R is a symmetric and positive semi-definite function, which implicitly computes an inner product in a reproducing kernel Hilbert space. Here χ is the input domain, in this talk the set of all possible trees.
APPLICATION DOMAINS
Natural Language Processing
Web Spam Detection
Network Intrusion Detection
Information Retrieval from structured documents
...
COMPUTING PTK
A generic technique for defining kernel functions over structured data is the convolution of local kernels defined over sub-structures.
Parse Tree Kernel
$$k(X, Z) = \sum_{x \in X} \sum_{z \in Z} c(x, z)$$
where X and Z are two parse trees.
Notations
x_i: i-th child of a node x
|X|: number of nodes in X
χ: set of all possible trees
ILLUSTRATION
Figure: Shared subtrees in two parse trees.
COUNTING FUNCTION
c(x, z) is known as the counting function, which recursively determines the number of shared subtrees rooted in the tree nodes x and z.
Defining c(x, z)
$$c(x, z) = \begin{cases}
0 & \text{if } x, z \text{ are not derived from the same production in } P \\
\lambda & \text{if } x, z \text{ are leaf nodes} \\
\lambda \prod_{i=1}^{|x|} c(x_i, z_i) & \text{otherwise}
\end{cases}$$
Here |x| is the number of children of x. The parameter 0 ≤ λ ≤ 1 balances the contribution of subtrees: small values of λ decay the contribution of lower nodes in large subtrees.
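The definitions of k(X, Z) and c(x, z) above can be sketched in a few lines of Python. This is a minimal illustration that follows the recursion exactly as written on the slide, not the authors' implementation: the `Node` class, the function names and the two toy trees are assumptions made for the example, and the recursion is left unmemoized for clarity.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Node:
    """Hypothetical parse-tree node: a grammar symbol plus ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def production(self) -> Tuple[str, Tuple[str, ...]]:
        # The production applied here: own symbol and the children's symbols.
        return (self.label, tuple(child.label for child in self.children))

    def nodes(self) -> Iterator["Node"]:
        # Every node of the tree rooted here, in pre-order.
        yield self
        for child in self.children:
            yield from child.nodes()


def count(x: Node, z: Node, lam: float) -> float:
    """Counting function c(x, z) with decay parameter lam (lambda)."""
    if x.production() != z.production():
        return 0.0                      # not derived from the same production
    if not x.children:
        return lam                      # same production, no children: leaves
    result = lam
    for xi, zi in zip(x.children, z.children):
        result *= count(xi, zi, lam)    # lambda * prod_i c(x_i, z_i)
    return result


def parse_tree_kernel(X: Node, Z: Node, lam: float = 0.5) -> float:
    """k(X, Z): sum of c(x, z) over all node pairs."""
    return sum(count(x, z, lam) for x in X.nodes() for z in Z.nodes())


# Two toy trees that differ only in the leaf below B.
X = Node("S", [Node("A", [Node("a")]), Node("B", [Node("b")])])
Z = Node("S", [Node("A", [Node("a")]), Node("B", [Node("c")])])
print(parse_tree_kernel(X, Z, lam=0.5))  # 0.75
```

In this example only the pairs (A, A) and (a, a) contribute, giving λ² + λ = 0.75 for λ = 0.5. The double loop over node pairs is also where the quadratic cost comes from.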
COMPUTATIONAL COMPLEXITY
The complexity is $O(n^2)$, where n is the number of nodes in each parse tree.
Experimental data
The computation of a parse tree kernel for two HTML documents comprising 10,000 nodes each requires about 1 gigabyte of memory and takes over 100 seconds on a recent computer system.
Learning requires comparing a large number of parse trees. At this cost per comparison, the computing resources required make exact PTKs impractical for large trees.
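The memory figure is plausible from a back-of-the-envelope count. The sketch below assumes (this is not stated on the slide) that an implementation keeps one 8-byte floating-point value of c(x, z) for every node pair.

```python
n = 10_000                  # nodes per tree, as quoted on the slide
pairs = n * n               # node pairs considered by the exact kernel
bytes_per_entry = 8         # assumption: one 64-bit float per stored c(x, z)
table_bytes = pairs * bytes_per_entry

print(pairs)                # 100000000
print(table_bytes / 1e9)    # 0.8
```

Under that assumption the table alone takes roughly 0.8 gigabytes, which is the same order as the reported requirement.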
ATTEMPTED IMPROVEMENTS
A feature selection procedure based on statistical tests (Suzuki et al.)
Limiting computation to node pairs with matching grammar symbols (Moschitti)
COMPUTING ATK
Approximation of tree kernels is based on the observation that trees often contain redundant parts that are not only irrelevant for the learning task but also slow down the kernel computation unnecessarily.
Approximate Tree Kernel
$$\hat{k}_w(X, Z) = \sum_{s \in S} w(s) \sum_{\substack{x \in X \\ l(x) = s}} \sum_{\substack{z \in Z \\ l(z) = s}} \tilde{c}(x, z)$$
where X and Z are two parse trees and $\tilde{c}$ denotes the approximate counting function, which ignores nodes that are not selected.
Selection function: w : S → {0, 1}
Controls whether subtrees rooted in nodes with the symbol s ∈ S contribute to the convolution (w(s) = 0 or w(s) = 1).
APPROXIMATE COUNTING FUNCTION
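$\tilde{c}(x, z)$ is the approximate counting function.
Defining $\tilde{c}(x, z)$
$$\tilde{c}(x, z) = \begin{cases}
0 & \text{if } x, z \text{ are not derived from the same production in } P \\
0 & \text{if } x \text{ or } z \text{ is not selected} \\
\lambda & \text{if } x, z \text{ are leaf nodes} \\
\lambda \prod_{i=1}^{|x|} \tilde{c}(x_i, z_i) & \text{otherwise}
\end{cases}$$
A node x counts as selected when w(l(x)) = 1.
The selection function w is chosen based on the domain and the data. The exact parse tree kernel is obtained as a special case of the ATK if w(s) = 1 for all symbols s ∈ S.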
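A minimal Python sketch of the approximate kernel, following the slide's definitions. The `Node` class, the function names, the toy trees and the representation of w as a set of selected symbols (w(s) = 1 exactly when s is in `selected`) are assumptions made for illustration, not the authors' code.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Set, Tuple


@dataclass
class Node:
    """Hypothetical parse-tree node: a grammar symbol plus ordered children."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def production(self) -> Tuple[str, Tuple[str, ...]]:
        return (self.label, tuple(child.label for child in self.children))

    def nodes(self) -> Iterator["Node"]:
        yield self
        for child in self.children:
            yield from child.nodes()


def approx_count(x: Node, z: Node, lam: float, selected: Set[str]) -> float:
    """Approximate counting function: zero as soon as a node is unselected."""
    if x.production() != z.production():
        return 0.0
    if x.label not in selected or z.label not in selected:
        return 0.0
    if not x.children:
        return lam
    result = lam
    for xi, zi in zip(x.children, z.children):
        result *= approx_count(xi, zi, lam, selected)
    return result


def approximate_tree_kernel(X: Node, Z: Node, selected: Set[str],
                            lam: float = 0.5) -> float:
    total = 0.0
    for x in X.nodes():
        if x.label not in selected:     # w(l(x)) = 0: skip the inner loop
            continue
        for z in Z.nodes():
            if z.label == x.label:      # only pairs with l(x) = l(z) = s
                total += approx_count(x, z, lam, selected)
    return total


X = Node("S", [Node("A", [Node("a")]), Node("B", [Node("b")])])
Z = Node("S", [Node("A", [Node("a")]), Node("B", [Node("c")])])
ALL = {"S", "A", "B", "a", "b", "c"}

print(approximate_tree_kernel(X, Z, selected=ALL))    # 0.75
print(approximate_tree_kernel(X, Z, selected={"a"}))  # 0.5
```

With every symbol selected the result equals the exact kernel value for these trees; restricting the selection to the symbol `a` keeps only the leaf match and skips all other node pairs.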
ATK IS A VALID KERNEL
Proof
Let Φ(X) be the vector of frequencies of all subtrees occurring in X. Then, by definition, K_w can always be written as
$$K_w(X, Z) = \langle P_w \Phi(X), P_w \Phi(Z) \rangle$$
For any w, the projection P_w is independent of the actual X and Z, so K_w is an inner product under a fixed feature map and hence a valid kernel.
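The step from the inner-product form to positive semi-definiteness can be spelled out. For any trees X_1, ..., X_n and real coefficients a_1, ..., a_n (symbols introduced here only for this check), bilinearity of the inner product gives:

```latex
\sum_{i,j=1}^{n} a_i a_j K_w(X_i, X_j)
  = \Big\langle \sum_{i=1}^{n} a_i P_w \Phi(X_i),\;
                \sum_{j=1}^{n} a_j P_w \Phi(X_j) \Big\rangle
  = \Big\| \sum_{i=1}^{n} a_i P_w \Phi(X_i) \Big\|^2 \;\ge\; 0 .
```

Symmetry follows directly from the symmetry of the inner product, so every Gram matrix built from K_w is symmetric and positive semi-definite.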
ATK IS FASTER THAN PTK
Speed-up factor q_w
$$q_w = \frac{\sum_{s \in S} \#_s(X)\, \#_s(Z)}{\sum_{s \in S} w(s)\, \#_s(X)\, \#_s(Z)}$$
where #_s(X) denotes the number of occurrences of nodes x ∈ X labeled with the symbol s.
Rejecting symbols can only shrink the denominator, so q_w ≥ 1 for every selection function, and q_w > 1 as soon as a single rejected symbol occurs in both X and Z.
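The speed-up factor depends only on how often each symbol occurs in the two trees, so it can be computed from label counts alone. The sketch below is illustrative: the function name, the label lists and the use of a set for the selected symbols are assumptions.

```python
import math
from collections import Counter
from typing import Iterable, Set


def speedup(labels_x: Iterable[str], labels_z: Iterable[str],
            selected: Set[str]) -> float:
    """q_w: node comparisons of the exact kernel over those kept by w."""
    cx, cz = Counter(labels_x), Counter(labels_z)
    total = sum(cx[s] * cz[s] for s in cx)                    # all symbols
    kept = sum(cx[s] * cz[s] for s in cx if s in selected)    # w(s) = 1 only
    if kept == 0:
        return math.inf     # nothing left to compare
    return total / kept


# Hypothetical node labels of two small trees.
labels_x = ["S", "A", "A", "a", "a", "a"]
labels_z = ["S", "A", "a", "a"]

print(speedup(labels_x, labels_z, selected={"S", "A", "a"}))  # 1.0
print(speedup(labels_x, labels_z, selected={"S", "A"}))       # 3.0
```

Dropping the frequent symbol `a` removes 6 of the 9 node comparisons in this example, which is why rejecting a few very common symbols can already pay off.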
SUPERVISED SETTING
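Given n labeled parse trees (X_1, y_1), ..., (X_n, y_n), where the y_i are the class labels.
An ideal kernel Gram matrix Y is given as follows:
$$Y_{ij} = [\![ y_i = y_j ]\!] - [\![ y_i \neq y_j ]\!]$$
where the bracket is 1 if its condition holds and 0 otherwise.
Kernel Target Alignment
$$\langle Y, K_w \rangle_F = \sum_{y_i = y_j} K_{ij} - \sum_{y_i \neq y_j} K_{ij}$$
where K_{ij} is the (i, j) entry of the Gram matrix K_w.
Our target now is to maximize the above term with respect to w.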
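A small numeric illustration of the ideal Gram matrix and the alignment term. The labels and kernel values below are made up for the example, and the function names are assumptions.

```python
from typing import List, Sequence


def ideal_matrix(y: Sequence[int]) -> List[List[int]]:
    """Y_ij = +1 if the labels agree, -1 otherwise."""
    return [[1 if yi == yj else -1 for yj in y] for yi in y]


def alignment(Y: List[List[int]], K: List[List[float]]) -> float:
    """Frobenius inner product <Y, K>_F."""
    n = len(Y)
    return sum(Y[i][j] * K[i][j] for i in range(n) for j in range(n))


y = [1, 1, -1]                       # hypothetical class labels
K = [[1.0, 0.75, 0.25],              # hypothetical kernel Gram matrix
     [0.75, 1.0, 0.5],
     [0.25, 0.5, 1.0]]

Y = ideal_matrix(y)
print(Y)                 # [[1, 1, -1], [1, 1, -1], [-1, -1, 1]]
print(alignment(Y, K))   # 3.0
```

Entries between trees of the same class add to the score and entries between trees of different classes subtract from it, so a selection function that raises within-class similarity and lowers between-class similarity increases the alignment.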
SUPERVISED SETTING (CONTD.)
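Optimization Problem
$$w^* = \operatorname*{argmax}_{w \in [0,1]^{|S|}} \sum_{\substack{i,j=1 \\ i \neq j}}^{n} Y_{ij} \sum_{s \in S} w(s) \sum_{\substack{x \in X_i \\ l(x) = s}} \sum_{\substack{z \in X_j \\ l(z) = s}} c(x, z)$$
subject to
$$\sum_{s \in S} w(s) \le N, \quad N \in \mathbb{N}$$
The factor Y_{ij} carries the sign from the kernel target alignment, and N bounds the number of selected symbols.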
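The objective is linear in w: each symbol s enters through a single coefficient that multiplies w(s). The slides do not say how the program is solved. For a linear objective with only the box constraint 0 ≤ w(s) ≤ 1 and the cardinality bound, an optimum is reached by setting w(s) = 1 for the N symbols with the largest positive coefficients. The sketch below uses that observation; the name `gain` for the precomputed per-symbol coefficients and the example values are assumptions.

```python
from typing import Dict


def select_symbols(gain: Dict[str, float], N: int) -> Dict[str, int]:
    """Pick at most N symbols with the largest positive coefficients."""
    positive = [s for s in gain if gain[s] > 0]
    ranked = sorted(positive, key=lambda s: gain[s], reverse=True)
    chosen = set(ranked[:N])
    return {s: 1 if s in chosen else 0 for s in gain}


# Hypothetical per-symbol coefficients of w(s) in the objective.
gain = {"S": 4.0, "NP": 2.5, "VP": -1.0, "DT": 0.5}

print(select_symbols(gain, N=2))  # {'S': 1, 'NP': 1, 'VP': 0, 'DT': 0}
```

A symbol with a negative coefficient is never selected, even when the budget N would allow it, because it would lower the objective.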
UNSUPERVISED SETTING
Average Frequency of Node Comparison
$$f(s) = \frac{1}{n^2} \sum_{i,j=1}^{n} \#_s(X_i)\, \#_s(X_j)$$
Comparison ratio:
$$\rho = \frac{\text{expected number of node comparisons}}{\text{actual number of comparisons in the PTK}}$$
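The average comparison frequency f(s) can be computed directly from the node labels of the training trees. The sketch below is illustrative only; the function name and the toy label lists are assumptions.

```python
from collections import Counter
from typing import Dict, List


def average_comparisons(trees_labels: List[List[str]]) -> Dict[str, float]:
    """f(s): mean of #_s(X_i) * #_s(X_j) over all ordered pairs (i, j)."""
    n = len(trees_labels)
    counts = [Counter(labels) for labels in trees_labels]
    symbols = set().union(*counts)
    return {
        s: sum(ci[s] * cj[s] for ci in counts for cj in counts) / n ** 2
        for s in symbols
    }


# Hypothetical node labels of two training trees.
trees_labels = [["S", "A", "a"], ["S", "A", "A", "a", "a"]]

f = average_comparisons(trees_labels)
print(sorted(f.items()))  # [('A', 2.25), ('S', 1.0), ('a', 2.25)]
```

Symbols that occur many times per tree dominate f(s), so they are the ones whose rejection saves the most node comparisons.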
UNSUPERVISED SETTING (CONTD.)
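Optimization Problem
$$w^* = \operatorname*{argmax}_{w \in [0,1]^{|S|}} \sum_{\substack{i,j=1 \\ i \neq j}}^{n} \sum_{s \in S} w(s) \sum_{\substack{x \in X_i \\ l(x) = s}} \sum_{\substack{z \in X_j \\ l(z) = s}} c(x, z)$$
subject to
$$\frac{\sum_{s \in S} w(s) f(s)}{\sum_{s \in S} f(s)} \le \rho$$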
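With w relaxed to [0, 1], this is a linear objective under a single weighted budget constraint, which has the shape of a fractional knapsack problem. The slides do not state a solution method. The sketch below uses the standard greedy rule for fractional knapsack (take symbols in order of coefficient per unit of f(s) until the budget is used up) as one way to obtain an optimum of the relaxed problem. The names `gain` and `freq` and all numbers are assumptions, and the sketch presumes non-negative coefficients.

```python
from typing import Dict


def select_by_budget(gain: Dict[str, float], freq: Dict[str, float],
                     rho: float) -> Dict[str, float]:
    """Greedy solution of max sum g(s) w(s) s.t. sum w(s) f(s) <= rho * sum f(s)."""
    budget = rho * sum(freq.values())
    w = {s: 0.0 for s in gain}
    # Symbols that never occur (f(s) = 0) contribute nothing and are skipped.
    candidates = [s for s in gain if gain[s] > 0 and freq[s] > 0]
    ranked = sorted(candidates, key=lambda s: gain[s] / freq[s], reverse=True)
    for s in ranked:
        if budget <= 0:
            break
        take = min(1.0, budget / freq[s])   # the last symbol may be fractional
        w[s] = take
        budget -= take * freq[s]
    return w


# Hypothetical per-symbol coefficients g(s) and comparison frequencies f(s).
gain = {"S": 3.0, "A": 4.0, "a": 1.0}
freq = {"S": 1.0, "A": 2.0, "a": 2.0}

print(select_by_budget(gain, freq, rho=0.4))  # {'S': 1.0, 'A': 0.5, 'a': 0.0}
```

A fractional value such as w(A) = 0.5 is a solution of the relaxed problem only; it would still have to be mapped to 0 or 1 to serve as a selection function w : S → {0, 1}.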
SYNTHETIC DATA
Figure: Classification performance for the supervised synthetic data.
Figure: Detection performance for the unsupervised synthetic data.
REAL DATA
Figure: Classification performance for question classification task.
Figure: Detection performance for the intrusion detection task (FTP).
TIME
Figure: Training and testing time of SVMs using the exact and the approximate tree kernel.
TIME (CONTD.)
Figure: Run-times for web spam (WS) and intrusion detection (ID).
MEMORY
Figure: Memory requirements for web spam (WS) and intrusion detection (ID).
CONCLUSION
Approximate parse tree kernels give us a fast and efficient way to work with parse trees.
Improvements in terms of run-time and memory requirements: for large trees, the approximation reduces the memory needed for a single kernel computation from 1 gigabyte to less than 800 kilobytes, accompanied by run-time improvements of up to three orders of magnitude.
Best results were obtained for network intrusion detection.
QUESTIONS
Any questions?