Administrivia

What: LTI Seminar
When: Friday Oct 30, 2009, 2:00pm - 3:00pm
Where: 1305 NSH
Faculty Host: Noah Smith
Title: SeerSuite: Enterprise Search and Cyberinfrastructure for Science and Academia
Speaker: Dr. C. Lee Giles, Pennsylvania State University

Cyberinfrastructure or e-science has become crucial in many areas of science, as data access often defines scientific progress. Open source systems have greatly facilitated the design, implementation, and support of cyberinfrastructure. However, there exists no open source integrated system for building an integrated search engine and digital library that focuses on all phases of information and knowledge extraction, such as citation extraction, automated indexing and ranking, chemical formulae search, table indexing, etc. ...
Counts for two writeups if you attend!
Two-Page Status Report on Project - due Wed 11/2 at 9am

This is a chance to tell me how your project is progressing - what you've accomplished, and what you plan to do in the future. There's no fixed format, but here are some things you might discuss.
What dataset will you be using? What does it look like (e.g., how many entities are there, how many tokens, etc.)? Looking over the data is always a good first step before you start working with it - what did you do to get acquainted with the data?
Do you plan on looking at the same problem, or have you changed your plans?
If you plan on writing code, what have you written so far, in what languages, and what do you still need to do?
If you plan on using off-the-shelf code, what have you installed, and what experiences have you had with it?
If you've run a baseline system on the data and gotten some results, what are they? Are they consistent with what you expected?
Brin’s 1998 paper
Poon and Domingos – continued!
plus Bellare & McCallum
10-28-2009
Mostly pilfered from Pedro's slides
Idea: exploit “pattern/relation duality”:
1. Start with some seed instances of (author,title) pairs (e.g., “Isaac Asimov”, “The Robots of Dawn”)
2. Look for occurrences of these pairs on the web.
3. Generate patterns that match heuristically chosen subsets of the occurrences
- order, URLprefix, prefix, middle, suffix
4. Extract new (author, title) pairs that match the patterns.
5. Go to 2.
[some workshop, 1998]
Result: 24M web pages + 5 books → 199 occurrences → 3 patterns → 4047 occurrences + 5M pages → 3947 occurrences → 105 patterns → ... → 15,257 books
But:
- mostly learned "science fiction books", at least in early rounds
- some manual intervention
- special regexes for author/title used
[Figure: the pattern/relation duality cycle - Relation → Patterns → Relation → ...]
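The bootstrapping loop above can be sketched in a few lines of Python. This is a toy reconstruction, not Brin's implementation: the corpus, the seed pair, the fixed-length prefix/suffix contexts, and the author/title regexes are all illustrative assumptions.

```python
import re

# Toy corpus standing in for web pages; all pages and pairs are illustrative.
PAGES = [
    "books: Isaac Asimov, The Robots of Dawn. more text",
    "books: Arthur C. Clarke, Rendezvous with Rama. more text",
    "books: Frank Herbert, Dune. unrelated",
]
seeds = {("Isaac Asimov", "The Robots of Dawn")}

def find_occurrences(pairs, pages):
    """Steps 2-3: locate (prefix, middle, suffix) contexts around known pairs."""
    occs = []
    for page in pages:
        for author, title in pairs:
            for m in re.finditer(re.escape(author) + "(.*?)" + re.escape(title), page):
                pre = page[max(0, m.start() - 7):m.start()]  # short context window
                suf = page[m.end():m.end() + 1]
                occs.append((pre, m.group(1), suf))
    return occs

def extract(patterns, pages):
    """Step 4: apply each induced pattern to harvest new (author, title) pairs."""
    found = set()
    for pre, mid, suf in patterns:
        rx = (re.escape(pre) + r"([A-Z][\w. ]+?)" + re.escape(mid)
              + r"([A-Z][\w ]+?)" + re.escape(suf))
        for page in pages:
            for m in re.finditer(rx, page):
                found.add((m.group(1), m.group(2)))
    return found

pairs = set(seeds)
for _ in range(2):                                    # step 5: iterate
    patterns = set(find_occurrences(pairs, PAGES))
    pairs |= extract(patterns, PAGES)
print(sorted(pairs))
```

Each round turns the known pairs into (prefix, middle, suffix) patterns and then harvests whatever new pairs those patterns match, mirroring steps 1-5 above.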
Markov Networks: [Review] Undirected graphical models
[Figure: undirected graph over variables Smoking, Cough, Asthma, Cancer]
Potential functions defined over cliques
Smoking Cancer Ф(S,C)
False False 4.5
False True 4.5
True False 2.7
True True 4.5
P(x) = (1/Z) ∏_c Φ_c(x_c),   Z = ∑_x ∏_c Φ_c(x_c)

Log-linear form:   P(x) = (1/Z) exp( ∑_i w_i f_i(x) )
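To make the definition concrete, here is a brute-force sketch using the Smoking/Cancer potential table above (a single clique, so the product over cliques has one factor):

```python
from itertools import product

# Clique potential Φ(Smoking, Cancer) from the table above
phi = {(False, False): 4.5, (False, True): 4.5,
       (True, False): 2.7, (True, True): 4.5}

# P(x) = (1/Z) * Π_c Φ_c(x_c); here there is a single clique {Smoking, Cancer}
Z = sum(phi[x] for x in product([False, True], repeat=2))
P = {x: phi[x] / Z for x in phi}

print(round(Z, 2))                 # 16.2
print(round(P[(True, False)], 4))  # 2.7 / 16.2 ≈ 0.1667
```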
First-Order Logic
Constants, variables, functions, predicates
E.g.: Anna, x, MotherOf(x), Friends(x, y)
Literal: Predicate or its negation
Clause: Disjunction of literals
Grounding: Replace all variables by constants
E.g.: Friends(Anna, Bob)
World (model, interpretation): Assignment of truth values to all ground predicates
Markov Logic: Intuition
A logical KB is a set of hard constraints on the set of possible worlds
Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
Give each formula a weight (higher weight ⇒ stronger constraint)

P(world) ∝ exp( ∑ weights of formulas it satisfies )
Example: Friends & Smokers

1.5   ∀x  Smokes(x) ⇒ Cancer(x)
1.1   ∀x,y  Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Two constants: Anna (A) and Bob (B)

[Figure: ground network over Smokes(A), Smokes(B), Cancer(A), Cancer(B), Friends(A,A), Friends(A,B), Friends(B,A), Friends(B,B)]
Example: Friends & Smokers

1.5   ∀x  Smokes(x) ⇒ Cancer(x)
1.1   ∀x,y  Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Log-potential for the grounding Smokes(Anna) ⇒ Cancer(Anna):

Smokes(Anna)  Cancer(Anna)  W(edge: s(a)->c(a))
F             F             1.5
F             T             1.5
T             F             0
T             T             1.5
Example: Friends & Smokers

1.5   ∀x  Smokes(x) ⇒ Cancer(x)
1.1   ∀x,y  Friends(x,y) ⇒ (Smokes(x) ⇔ Smokes(y))

Two constants: Anna (A) and Bob (B)

Log-potential for the grounding Friends(A,B) ⇒ (Smokes(A) ⇔ Smokes(B)):

Friends(A,B)  Smokes(A)  Smokes(B)  W(f(a,b),s(a),s(b))
F             F          F          1.1
F             F          T          1.1
F             T          F          1.1
F             T          T          1.1
T             F          F          1.1
T             F          T          0
T             T          F          0
T             T          T          1.1
Markov Logic Networks

An MLN is a template for ground Markov nets. Probability of a world x:

P(x) = (1/Z) exp( ∑_i w_i n_i(x) )

where w_i is the weight of formula i and n_i(x) is the number of true groundings of formula i in x.
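A brute-force sketch of this definition for the two-constant Friends & Smokers example (formula weights 1.5 and 1.1). Enumerating all 2^8 = 256 worlds is only feasible because the domain is tiny; real MLN inference exists precisely to avoid this.

```python
from itertools import product
from math import exp

# World-probability sketch for the two-constant Friends & Smokers MLN.
consts = ["A", "B"]

def n_true_groundings(world):
    """world maps ground atoms like ('Smokes','A') to bools.
    Returns (n1, n2): counts of true groundings of the two formulas."""
    n1 = sum((not world[("Smokes", x)]) or world[("Cancer", x)] for x in consts)
    n2 = sum((not world[("Friends", x, y)]) or
             (world[("Smokes", x)] == world[("Smokes", y)])
             for x in consts for y in consts)
    return n1, n2

atoms = ([("Smokes", x) for x in consts] + [("Cancer", x) for x in consts] +
         [("Friends", x, y) for x in consts for y in consts])

def weight(world):
    n1, n2 = n_true_groundings(world)
    return exp(1.5 * n1 + 1.1 * n2)   # exp(Σ_i w_i n_i(x)), unnormalized

Z = 0.0
for vals in product([False, True], repeat=len(atoms)):
    Z += weight(dict(zip(atoms, vals)))

# Probability of one concrete world: nobody smokes, no cancer, no friendships
w0 = {a: False for a in atoms}
print(weight(w0) / Z)   # P(x) = (1/Z) exp(Σ_i w_i n_i(x))
```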
Parameter tying: Groundings of same clause
Generative learning: Pseudo-likelihood
Discriminative learning: Conditional likelihood [like CRFs - but we need to do inference. They use a Collins-like method that computes expectations near a MAP solution. -WC]
Weight Learning

∂/∂w_i log P_w(x) = n_i(x) − E_w[n_i(x)]

n_i(x): no. of times clause i is true in the data
E_w[n_i(x)]: expected no. of times clause i is true according to the MLN
MAP/MPE Inference

Problem: find the most likely state of the world given evidence

argmax_y ∑_i w_i n_i(x,y)

This is just the weighted MaxSAT problem: use a weighted SAT solver (e.g., MaxWalkSAT [Kautz et al., 1997]).
The MaxWalkSAT Algorithm

for i ← 1 to max-tries do
    solution = random truth assignment
    for j ← 1 to max-flips do
        if ∑ weights(sat. clauses) > threshold then
            return solution
        c ← random unsatisfied clause
        with probability p
            flip a random variable in c
        else
            flip variable in c that maximizes ∑ weights(sat. clauses)
return failure, best solution found
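A runnable sketch of the loop above. The clause encoding ((weight, literal-list) pairs, a clause being satisfied if any literal matches) and the toy weighted instance are my own illustration, not from the slides.

```python
import random

# Toy weighted SAT instance (illustrative):
CLAUSES = [
    (1.5, [("S", False), ("C", True)]),   # Smokes => Cancer
    (2.0, [("S", True)]),                 # soft evidence: Smokes
    (1.0, [("C", False)]),                # soft prior: not Cancer
]
VARS = ["S", "C"]

def sat_weight(assign):
    """Total weight of satisfied clauses under this truth assignment."""
    return sum(w for w, lits in CLAUSES
               if any(assign[v] == val for v, val in lits))

def maxwalksat(max_tries=10, max_flips=100, p=0.5, threshold=3.4, seed=0):
    rng = random.Random(seed)
    best = None
    for _ in range(max_tries):
        assign = {v: rng.random() < 0.5 for v in VARS}   # random restart
        for _ in range(max_flips):
            if sat_weight(assign) > threshold:
                return assign
            unsat = [lits for w, lits in CLAUSES
                     if not any(assign[v] == val for v, val in lits)]
            if not unsat:
                break
            lits = rng.choice(unsat)                     # random unsat clause
            if rng.random() < p:
                var = rng.choice(lits)[0]                # random-walk move
            else:                                        # greedy move
                var = max((v for v, _ in lits),
                          key=lambda v: sat_weight({**assign, v: not assign[v]}))
            assign[var] = not assign[var]
        if best is None or sat_weight(assign) > sat_weight(best):
            best = dict(assign)
    return best

print(maxwalksat())
```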
MAP=WalkSat, Expectations=????
MCMC? Deterministic dependencies break MCMC; near-deterministic ones make it very slow
Solution: combine MCMC and WalkSAT
→ MC-SAT algorithm [Poon & Domingos, 2006]
Slice Sampling [Damien et al. 1999]

[Figure: slice sampling - given x(k), draw u(k) uniformly from [0, P(x(k))], then draw x(k+1) uniformly from the slice {x : P(x) ≥ u(k)}]
The MC-SAT Algorithm

X(0) ← a random solution satisfying all hard clauses
for k ← 1 to num_samples
    M ← Ø
    forall Ci satisfied by X(k-1)
        with prob. 1 − exp(−wi) add Ci to M
    endfor
    X(k) ← a uniformly random solution satisfying M
endfor
The "uniformly random solution" step is approximated with "SampleSat": MaxWalkSat + Simulated Annealing
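A runnable sketch of MC-SAT on a toy two-variable domain. The clause encoding ((weight, literal-list), satisfied if any literal matches) and both clauses are my own illustration. Because the domain is tiny, the "uniformly random solution satisfying M" step is done by exact enumeration rather than SampleSat. The printed value estimates the marginal P(C = True).

```python
import random
from math import exp
from itertools import product

CLAUSES = [
    (1.5, [("S", False), ("C", True)]),  # Smokes => Cancer
    (1.1, [("S", True), ("C", False)]),  # another soft clause (illustrative)
]
VARS = ["S", "C"]

def satisfied(lits, assign):
    return any(assign[v] == val for v, val in lits)

def mc_sat(num_samples=5000, seed=0):
    rng = random.Random(seed)
    x = {v: rng.random() < 0.5 for v in VARS}   # no hard clauses here
    count_c = 0
    for _ in range(num_samples):
        # Keep each currently-satisfied clause with prob 1 - exp(-w)
        M = [lits for w, lits in CLAUSES
             if satisfied(lits, x) and rng.random() < 1 - exp(-w)]
        # Uniform over assignments satisfying M (exact, since only 4 worlds)
        sols = [dict(zip(VARS, vals))
                for vals in product([False, True], repeat=len(VARS))
                if all(satisfied(lits, dict(zip(VARS, vals))) for lits in M)]
        x = rng.choice(sols)
        count_c += x["C"]
    return count_c / num_samples     # estimated marginal P(C = True)

print(round(mc_sat(), 3))
```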
What can you do with MLNs?
Problem: Given database, find duplicate records
HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r') => SameField(f,r,r')
SameField(f,r,r') => SameRecord(r,r')
SameRecord(r,r') ^ SameRecord(r',r") => SameRecord(r,r")
Entity Resolution
Can also resolve fields:
HasToken(token,field,record)
SameField(field,record,record)
SameRecord(record,record)

HasToken(+t,+f,r) ^ HasToken(+t,+f,r') => SameField(f,r,r')
SameField(f,r,r') <=> SameRecord(r,r')
SameRecord(r,r') ^ SameRecord(r',r") => SameRecord(r,r")
SameField(f,r,r') ^ SameField(f,r',r") => SameField(f,r,r")
P. Singla & P. Domingos, "Entity Resolution with Markov Logic", in Proc. ICDM-2006.
Entity Resolution
Hidden Markov Models
obs = { Obs1, ..., ObsN }
state = { St1, ..., StM }
time = { 0, ..., T }

State(state!,time)
Obs(obs!,time)

State(+s,0)
State(+s,t) => State(+s',t+1)
State(+s,t) => State(+s,t+1) [variant we'll use -WC]
Obs(+o,t) => State(+s,t)
What did P&D do with MLNs?
Information Extraction (simplified)
Problem: Extract a database from text or semi-structured sources

Example: extract a database of publications from citation list(s) (the "CiteSeer problem")

Two steps:
Segmentation: use an HMM to assign tokens to fields
Entity resolution: use logistic regression and transitivity
Motivation for joint extraction and matching
Token(token, position, citation)
InField(position, field, citation)
SameField(field, citation, citation)
SameCit(citation, citation)

Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) <=> InField(i+1,+f,c)
f != f' => (!InField(i,+f,c) v !InField(i,+f',c))

Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i',c') ^ InField(i',+f,c') => SameField(+f,c,c')
SameField(+f,c,c') <=> SameCit(c,c')
SameField(f,c,c') ^ SameField(f,c',c") => SameField(f,c,c")
SameCit(c,c') ^ SameCit(c',c") => SameCit(c,c")
Information Extraction (simplified)
Token(token, position, citation)
InField(position, field, citation)
SameField(field, citation, citation)
SameCit(citation, citation)

Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ !Token(".",i,c) <=> InField(i+1,+f,c)
f != f' => (!InField(i,+f,c) v !InField(i,+f',c))

Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i',c') ^ InField(i',+f,c') => SameField(+f,c,c')
SameField(+f,c,c') <=> SameCit(c,c')
SameField(f,c,c') ^ SameField(f,c',c") => SameField(f,c,c")
SameCit(c,c') ^ SameCit(c',c") => SameCit(c,c")
More: H. Poon & P. Domingos, "Joint Inference in Information Extraction", in Proc. AAAI-2007.
Information Extraction (less simplified)
Token(+t,i,c) => InField(i,+f,c)
!Token("aardvark",i,c) v InField(i,"author",c)
...
!Token("zymurgy",i,c) v InField(i,"author",c)
...
!Token("zymurgy",i,c) v InField(i,"venue",c)
Information Extraction (less simplified)
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) => InField(i+1,+f,c)
InField(i,+f,c) ^ !HasPunct(c,i) => InField(i+1,+f,c)
InField(i,+f,c) ^ !HasComma(c,i) => InField(i+1,+f,c)
=> InField(1,"author",c)
=> InField(2,"author",c)
=> InField(midpointOfC, "title", c) [computed off-line -WC]
Information Extraction (less simplified)
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) => InField(i+1,+f,c)
InField(i,+f,c) ^ !HasPunct(c,i) => InField(i+1,+f,c)
InField(i,+f,c) ^ !HasComma(c,i) => InField(i+1,+f,c)
=> InField(1,"author",c)
=> InField(2,"author",c)
Center(c,i) => InField(i, "title", c)
Information Extraction (less simplified)
Token(w,i,c) ^ IsAlphaChar(w) ^ FollowBy(c,i,".") => InField(c,"author",i) v InField(c,"venue",i)
LastInitial(c,i) ^ LessThan(j,i) => !InField(c,"title",j) ^ !InField(c,"venue",j)
FirstInitial(c,i) ^ LessThan(i,j) => InField(c,"author",j)
FirstVenueKeyword(c,i) ^ LessThan(i,j) => !InField(c,"author",j) ^ !InField(c,"title",j)

Initials tend to appear in the author or venue field.
Positions before the last non-venue initial are usually not title or venue.
Positions after the first "venue keyword" are usually not author or title.
Information Extraction (less simplified)

SimilarTitle(c,i,j,c',i',j'): true if
- c[i..j] and c'[i'..j'] are both "title-like", i.e., no punctuation and doesn't violate the rules above
- c[i..j] and c'[i'..j'] are "similar", i.e., start with the same trigram and end with the same token

SimilarVenue(c,c'): true if c and c' don't contain conflicting venue keywords (e.g., journal vs. proceedings)
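These predicates can be sketched in code. The token-level details below (trigram = first three tokens of the span, the punctuation test, and the keyword list) are my reading of the slide, not the paper's exact definitions.

```python
import string

# Hedged sketch of SimilarTitle / SimilarVenue; details are assumptions.
VENUE_KEYWORDS = {"journal", "proceedings", "conference", "workshop"}

def title_like(tokens):
    """No punctuation tokens inside the candidate span."""
    return all(t not in string.punctuation for t in tokens)

def similar_title(c, i, j, c2, i2, j2):
    a, b = c[i:j + 1], c2[i2:j2 + 1]
    return (title_like(a) and title_like(b)
            and a[:3] == b[:3]          # start with the same trigram
            and a[-1] == b[-1])         # end with the same token

def similar_venue(c, c2):
    """True unless the two citations contain conflicting venue keywords."""
    k1 = {t.lower() for t in c} & VENUE_KEYWORDS
    k2 = {t.lower() for t in c2} & VENUE_KEYWORDS
    return not (k1 and k2 and k1 != k2)

c1 = "H Poon Joint inference in information extraction AAAI".split()
c2 = "Poon H Joint inference in information extraction Proc AAAI".split()
print(similar_title(c1, 2, 6, c2, 2, 6))  # True
```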
Information Extraction (less simplified)

SimilarTitle(c,i,j,c',i',j'): ...
SimilarVenue(c,c'): ...
JointInferenceCandidate(c,i,c'):
- trigram starting at i in c also appears in c'
- and the trigram is a possible title
- and there is punctuation before the trigram in c' but not in c
Information Extraction (less simplified)

SimilarTitle(c,i,j,c',i',j'): ...
SimilarVenue(c,c'): ...
JointInferenceCandidate(c,i,c'): ...

[was: InField(i,+f,c) ^ !HasPunct(c,i) => InField(i+1,+f,c)]
InField(i,+f,c) ^ !HasPunct(c,i) ^ (!exists c': JointInferenceCandidate(c,i,c')) => InField(i+1,+f,c)
Jnt-Seg

Why is this joint? Recall we also have:
Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i',c') ^ InField(i',+f,c') => SameField(+f,c,c')
Information Extraction (less simplified)

SimilarTitle(c,i,j,c',i',j'): ...
SimilarVenue(c,c'): ...
JointInferenceCandidate(c,i,c'): ...

InField(i,+f,c) ^ !HasPunct(c,i) ^ (!exists c': JointInferenceCandidate(c,i,c') ^ SameCitation(c,c')) => InField(i+1,+f,c)

Jnt-Seg-ER
Results: segmentation
Percent error reduction for best joint model
Results: matching
Cora F-S: 0.87 F1
Cora TFIDF: 0.84 max F1
Fraction of clusters correctly constructed using transitive closure of pairwise decisions
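Building clusters as the transitive closure of pairwise same-citation decisions is a union-find computation; a minimal sketch (the pair decisions below are invented for illustration):

```python
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, a):
        while self.parent[a] != a:
            self.parent[a] = self.parent[self.parent[a]]  # path halving
            a = self.parent[a]
        return a

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)

# Pairwise "same citation" decisions among 5 citation records (made up)
pairs = [(0, 1), (1, 2), (3, 4)]
uf = UnionFind(5)
for a, b in pairs:
    uf.union(a, b)

clusters = {}
for i in range(5):
    clusters.setdefault(uf.find(i), []).append(i)
print(sorted(clusters.values()))  # [[0, 1, 2], [3, 4]]
```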
William's summary

MLNs are a compact, elegant way of describing a Markov network
- Standard learning methods work
- The network may be very, very large
- Inference may be expensive
- Doesn't eliminate feature engineering, e.g., complicated "feature" predicates

Experimental results for joint matching/NER are not that strong overall
- Cascading segmentation and then matching improves segmentation, maybe not matching
- But it needs to be carefully restricted (efficiency?)
Bellare & McCallum
Outline
Goal: Given (DBLP record, citation-text) pairs that do match, learn to segment citations.
Methods:
- Learn a CRF to align the record and text (sort of like learning an edit distance)
- Generate alignments, and use them as training data for a linear-chain CRF that does segmentation (aka extraction). This CRF does not need records to work.
Alignment...

Notation:
- x1: list of tokens from DB fields
- y1: list of DB field names
- x2: list of tokens from the text
- a: a pair (i,j) where x1[i] ~ x2[j]
- f(a, x1, y1, x2): a feature, e.g.
  editDist(x1[i], x2[j]) ^ y1[i] = "booktitle" ^ x2[j] = "EMNLP"
Alignment feature: depends on a and x’s
Extraction feature: depends on a, y1 and x2
Learning for alignment... Generalized expectation criterion: rather than minimizing Edata[f] − Emodel[f] plus a penalty term on the weights, minimize a weighted squared difference between Emodel[f] and p, where p is the user's prior on the value of the feature.

(Emodel[f]: sum of marginal probabilities divided by the size of the variable set)
“We simulate user-specified expectation criteria [i.e. p’s] with statistics on manually labeled citation texts.” … top 10 features by MI, p in 11 bins, w=10
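A numeric toy illustrating the generalized-expectation-style objective: penalize the squared gap between a model expectation of a feature and a user-supplied target p. The one-variable logistic "model" and all numbers here are illustrative, not from the paper.

```python
from math import exp

w_penalty = 10.0
p_target = 0.8           # user's prior on E[f]

def model_expectation(theta):
    """E_model[f] for one binary variable with f = 1[x = 1],
    under a logistic model P(x=1) = sigmoid(theta)."""
    return 1.0 / (1.0 + exp(-theta))

def ge_objective(theta):
    gap = model_expectation(theta) - p_target
    return w_penalty * gap * gap

# The objective is minimized when E_model[f] matches the prior p:
print(round(ge_objective(0.0), 3))    # E=0.5, gap=-0.3 → 10*0.09 = 0.9
print(round(ge_objective(1.386), 3))  # E≈0.8, gap≈0 → ≈0.0
```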
Results

On 260 records, 522 record-text pairs
Results
Systems compared:
- "Gold standard": hand-labeled extraction data
- Trained on DBLP records
- Trained on records partially aligned with high-precision rules
- ...and also using partial match to DB records at test time
- CRF trained with extraction criteria derived from labeled data
Alignments and expectations
Simplified version of the idea: from Learning String Edit Distance, Ristad and Yianilos, PAMI 1998
HMM Example
Notation:
- λ: parameters of the HMM
- δ(l → l'): transition probability, parameter Pr(s_{t+1} = l' | s_t = l)
- ε(l, x): emission probability, parameter Pr(x_t = x | s_t = l)
- x^T = x_1 x_2 ... x_T: input string, aka x
- x^t = x_1 ... x_t: a prefix of x
- x_u ... x_t: a substring of x
- s_t: state the HMM is in after emitting x_1 ... x_t, s_t ∈ {l_1, ..., l_K}
[Figure: two states 1 and 2 with transition probabilities Pr(1->1), Pr(1->2), Pr(2->1), Pr(2->2)]

Pr(1->x):  d 0.3,  h 0.5,  b 0.2
Pr(2->x):  a 0.3,  e 0.5,  o 0.2

Sample output: x^T = heehahaha, s^T = 122121212
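Sampling from the two-state HMM above can be sketched directly. The emission tables are from the slide; the transition probabilities and start state are assumptions, since the figure only names them.

```python
import random

TRANS = {1: {1: 0.5, 2: 0.5}, 2: {1: 0.5, 2: 0.5}}   # Pr(l -> l'), assumed
EMIT = {1: {"d": 0.3, "h": 0.5, "b": 0.2},           # Pr(1 -> x), from slide
        2: {"a": 0.3, "e": 0.5, "o": 0.2}}           # Pr(2 -> x), from slide

def sample(T, start=1, seed=4):
    """Generate (x^T, s^T) by alternating emissions and transitions."""
    rng = random.Random(seed)
    s, x, state = [], [], start
    for _ in range(T):
        x.append(rng.choices(list(EMIT[state]),
                             weights=list(EMIT[state].values()))[0])
        s.append(state)
        state = rng.choices(list(TRANS[state]),
                            weights=list(TRANS[state].values()))[0]
    return "".join(x), "".join(map(str, s))

xs, ss = sample(9)
print(xs, ss)
```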
HMM Inference
[Figure: K×T trellis - states l = 1..K (rows) against positions t = 1..T (columns), with observations x_1 x_2 x_3 ... x_T along the bottom]

Key point: Pr(s_i = l) depends only on Pr(l' -> l) and s_{i-1}, so you can propagate probabilities forward...
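The forward propagation can be sketched as a few lines over the trellis. Transitions and the start state are again assumed (uniform transitions, start in state 1); emissions are from the slide.

```python
TRANS = {1: {1: 0.5, 2: 0.5}, 2: {1: 0.5, 2: 0.5}}
EMIT = {1: {"d": 0.3, "h": 0.5, "b": 0.2},
        2: {"a": 0.3, "e": 0.5, "o": 0.2}}
START = {1: 1.0, 2: 0.0}   # assume we always start in state 1

def forward(x):
    """Return Pr(x) by propagating Pr(s_t = l, x^t) left to right
    and summing the last trellis column."""
    alpha = {l: START[l] * EMIT[l].get(x[0], 0.0) for l in EMIT}
    for ch in x[1:]:
        alpha = {l: EMIT[l].get(ch, 0.0) *
                    sum(alpha[lp] * TRANS[lp][l] for lp in alpha)
                 for l in EMIT}
    return sum(alpha.values())

print(forward("he"))  # 0.5 * (0.5 * 0.5) = 0.125: h from state 1, e from state 2
```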
Pair HMM Notation
- λ: parameters of a "pair HMM"
- E = ({a} × {b}) ∪ ({a} × {−}) ∪ ({−} × {b}) ∪ {END}: emissions/edits
- ε(e): emission probability, also written ε(a,b) for e = <a,b>
- x, y: input strings, aka x^T, y^V
- z = z_1 ... z_n: string of edits, z_k ∈ E
- r: hidden property associated with {1..T} × {1..V}
Andrew used “null”
Pair HMM Example
A single state 1, with emission table:

e      Pr(e)
<a,a>  0.10
<e,e>  0.10
<h,h>  0.10
<a,->  0.05
<h,t>  0.05
<-,h>  0.01
...    ...
Pair HMM Example

e      Pr(e)
<a,a>  0.10
<e,e>  0.10
<h,h>  0.10
<e,->  0.05
<h,t>  0.05
<-,h>  0.01
...    ...
Sample run: z^T = <h,t>,<e,e>,<e,e>,<h,h>,<e,->,<e,e>
Strings x,y produced by z^T: x = heehee, y = teehe
Notice that x,y is also produced by z^4 + <e,e>,<e,-> and many other edit strings
Distances based on pair HMMs
Pr(x^T, y^V | λ) = ∑_n ∑_{z ∈ EDIT^n : z → (x^T, y^V)} Pr(z | λ)

d_stochastic(x^T, y^V) = −log Pr(x^T, y^V | λ)

d_viterbi(x^T, y^V) = −log max_{n, z ∈ EDIT^n : z → (x^T, y^V)} Pr(z | λ)
Pair HMM Inference
α(t,v) = Pr(x^t, y^v)
       = ε(x_t, y_v) · α(t−1, v−1) + ε(x_t, −) · α(t−1, v) + ε(−, y_v) · α(t, v−1)

[Figure: cell (t,v) computed from cells (t−1, v−1), (t−1, v), and (t, v−1)]

Dynamic programming is possible: fill out the matrix left-to-right, top-down
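The α(t,v) recurrence can be sketched as a dynamic program. The edit-probability table below is a small illustrative subset, with None playing the role of the "−" gap symbol.

```python
EPS = {("h", "t"): 0.05, ("e", "e"): 0.10, ("h", "h"): 0.10,
       ("e", None): 0.05, (None, "h"): 0.01}   # illustrative subset

def pr_edit(a, b):
    return EPS.get((a, b), 0.0)

def alpha(x, y):
    """a[t][v] = Pr(x[:t], y[:v]); fill the matrix left-to-right, top-down."""
    T, V = len(x), len(y)
    a = [[0.0] * (V + 1) for _ in range(T + 1)]
    a[0][0] = 1.0
    for t in range(T + 1):
        for v in range(V + 1):
            if t and v:
                a[t][v] += pr_edit(x[t-1], y[v-1]) * a[t-1][v-1]  # substitution
            if t:
                a[t][v] += pr_edit(x[t-1], None) * a[t-1][v]      # emit in x only
            if v:
                a[t][v] += pr_edit(None, y[v-1]) * a[t][v-1]      # emit in y only
    return a[T][V]

print(round(alpha("he", "te"), 6))  # <h,t> then <e,e>: 0.05 * 0.10 = 0.005
```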
Pair HMM Inference

α(t,v) = Pr(x^t, y^v) = ε(x_t, y_v) · α(t−1, v−1) + ε(x_t, −) · α(t−1, v) + ε(−, y_v) · α(t, v−1)

[Figure: the recurrence drawn on a trellis with rows v = 1..K and columns t = 1..T]
Pair HMM Inference
[Figure: the same trellis, with cells labeled by the number of emissions i = 1, 2, 3, ... - after i emissions the reachable cells are scattered across columns]

One difference: after i emissions of the pair HMM, we do not know the column position
Multiple states

SUB state:
e      Pr(e)
<a,a>  0.10
<e,e>  0.10
<h,h>  0.10
<a,->  0.05
<h,t>  0.01
<-,h>  0.01
...    ...

IX state:
e      Pr(e)
<a,->  0.11
<e,->  0.21
<h,->  0.11
...    ...

IY state: ...

[Figure: the K×T trellis, one layer per state (l = 2 shown)]
An extension: multiple states

[Figure: another trellis layer, for state l = 1, with states SUB and IX marked]

Conceptually, add a "state" dimension to the model.
EM methods generalize easily to this setting.
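Adding the state dimension to the pair-HMM forward computation gives a three-dimensional table α[s][t][v]. The transition numbers, start distribution, and the tiny emission tables below are illustrative, not from the slides.

```python
STATES = ["SUB", "IX", "IY"]
# Self-loop 0.8, switch 0.1 each: rows sum to 1 (assumed numbers)
TRANS = {s: {s2: (0.8 if s == s2 else 0.1) for s2 in STATES} for s in STATES}
EMIT = {"SUB": {("h", "t"): 0.05, ("e", "e"): 0.10},
        "IX":  {("e", None): 0.21},     # emits in x only
        "IY":  {(None, "h"): 0.01}}     # emits in y only
START = {"SUB": 1.0, "IX": 0.0, "IY": 0.0}

def forward(x, y):
    """a[s][t][v] = total probability of producing x[:t], y[:v], ending in s."""
    T, V = len(x), len(y)
    a = {s: [[0.0] * (V + 1) for _ in range(T + 1)] for s in STATES}
    for s in STATES:
        a[s][0][0] = START[s]
    for t in range(T + 1):
        for v in range(V + 1):
            if (t, v) == (0, 0):
                continue
            for s in STATES:
                tot = 0.0
                for (cx, cy), p in EMIT[s].items():
                    dt, dv = (cx is not None), (cy is not None)
                    if t < dt or v < dv:
                        continue
                    if (dt and x[t-1] != cx) or (dv and y[v-1] != cy):
                        continue
                    tot += p * sum(TRANS[s2][s] * a[s2][t-dt][v-dv]
                                   for s2 in STATES)
                a[s][t][v] = tot
    return sum(a[s][T][V] for s in STATES)

# Only surviving path: SUB<h,t> then SUB<e,e>: 0.8*0.05 * 0.8*0.10 = 0.0032
print(round(forward("he", "te"), 6))
```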