Machine Learning for the Web: A Unified View
Pedro Domingos
Dept. of Computer Science & Eng., University of Washington

Includes joint work with Stanley Kok, Daniel Lowd, Hoifung Poon, Matt Richardson, Parag Singla, Marc Sumner, and Jue Wang
Overview
Motivation
Background
Markov logic
Inference
Learning
Software
Applications
Discussion
Web Learning Problems
Hypertext classification, search ranking, personalization, recommender systems, wrapper induction, information extraction, information integration, the deep Web, the Semantic Web, ad placement, content selection, auctions, social networks, mass collaboration, spam filtering, reputation systems, performance optimization, etc.
Machine Learning Solutions
Naïve Bayes, logistic regression, max. entropy models, Bayesian networks, Markov random fields, log-linear models, exponential models, Gibbs distributions, Boltzmann machines, ERGMs, hidden Markov models, conditional random fields, SVMs, neural networks, decision trees, k-nearest neighbor, k-means clustering, mixture models, LSI, etc.
How Do We Make Sense of This?
Does a practitioner have to learn all the algorithms?
And figure out which one to use each time? And which variations to try? And how to frame the problem as ML? And how to incorporate his or her knowledge? And how to glue the pieces together? And start from scratch each time? There must be a better way.
Characteristics of Web Problems
Samples are not i.i.d. (objects depend on each other)
Objects have lots of structure (or none at all)
Multiple problems are tied together
Massive amounts of data (but unlabeled)
Rapid change
Too many opportunities . . . and not enough experts
We Need a Language
That allows us to easily define standard models
That provides a common framework
That is automatically compiled into learning and inference code that executes efficiently
That makes it easy to encode a practitioner's knowledge
That allows models to be composed and reused
Markov Logic
Syntax: weighted first-order formulas
Semantics: templates for Markov nets
Inference: lifted belief propagation, etc.
Learning: voted perceptron, pseudo-likelihood, inductive logic programming
Software: Alchemy
Applications: information extraction, text mining, social networks, etc.
Markov Networks

Undirected graphical models. Example: binary variables Smoking, Cancer, Asthma, and Cough (graph shown on the slide).

Potential functions defined over cliques:

| Smoking | Cancer | Φ(S,C) |
|---------|--------|--------|
| False   | False  | 4.5    |
| False   | True   | 4.5    |
| True    | False  | 2.7    |
| True    | True   | 4.5    |

$P(x) = \frac{1}{Z} \prod_c \Phi_c(x_c)$, where $Z = \sum_x \prod_c \Phi_c(x_c)$.
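As a toy sketch, the joint distribution over the single clique above can be computed directly from the potential table (Asthma and Cough omitted for brevity; not tied to any particular library):

```python
from itertools import product

# Toy Markov net with the single clique potential Phi(Smoking, Cancer)
# from the table above.
PHI = {  # (smoking, cancer) -> potential value
    (False, False): 4.5,
    (False, True):  4.5,
    (True,  False): 2.7,
    (True,  True):  4.5,
}

# Partition function Z = sum over all states of the product of potentials.
Z = sum(PHI[s, c] for s, c in product([False, True], repeat=2))

def p(s, c):
    """P(x) = (1/Z) * product of clique potentials (one clique here)."""
    return PHI[s, c] / Z
```

Here Z = 4.5 + 4.5 + 2.7 + 4.5 = 16.2, so for instance p(True, False) = 2.7/16.2.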
Markov Networks

Log-linear model:

$P(x) = \frac{1}{Z} \exp\left( \sum_i w_i f_i(x) \right)$

where $w_i$ is the weight of feature $i$. Example feature:

$f_1(\text{Smoking}, \text{Cancer}) = 1$ if $\neg \text{Smoking} \lor \text{Cancer}$, and $0$ otherwise, with $w_1 = 1.5$.
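The same kind of toy distribution in log-linear form, using the slide's illustrative weight $w_1 = 1.5$ (the probabilities here come from the feature and weight, not from the earlier potential table):

```python
import math

# Log-linear Markov net: P(x) = (1/Z) exp(sum_i w_i f_i(x)), with a
# single feature f1 and the slide's illustrative weight w1 = 1.5.
w1 = 1.5

def f1(smoking, cancer):
    # 1 if Smoking => Cancer (i.e., not smoking, or cancer), else 0
    return 1.0 if (not smoking) or cancer else 0.0

states = [(s, c) for s in (False, True) for c in (False, True)]
Z = sum(math.exp(w1 * f1(s, c)) for s, c in states)

def p(s, c):
    return math.exp(w1 * f1(s, c)) / Z
```

The only state that fails to fire the feature, (True, False), gets the lowest probability, as the soft-constraint reading suggests.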
First-Order Logic
Symbols: constants, variables, functions, predicates. E.g.: Anna, x, MotherOf(x), Friends(x, y)
Logical connectives: conjunction, disjunction, negation, implication, quantification, etc.
Grounding: replace all variables by constants. E.g.: Friends(Anna, Bob)
World: assignment of truth values to all ground atoms
Example: Friends & Smokers
Smoking causes cancer.
Friends have similar smoking habits.
$\forall x\; \mathit{Smokes}(x) \Rightarrow \mathit{Cancer}(x)$
$\forall x, y\; \mathit{Friends}(x, y) \Rightarrow (\mathit{Smokes}(x) \Leftrightarrow \mathit{Smokes}(y))$
Markov Logic
A logical KB is a set of hard constraints on the set of possible worlds
Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible
Give each formula a weight (higher weight = stronger constraint)

$P(\text{world}) \propto \exp\left( \sum \text{weights of formulas it satisfies} \right)$
Definition
A Markov Logic Network (MLN) is a set of pairs (F, w), where
  F is a formula in first-order logic
  w is a real number
Together with a set of constants, it defines a Markov network with
  One node for each grounding of each predicate in the MLN
  One feature for each grounding of each formula F in the MLN, with the corresponding weight w
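The grounding step can be sketched as plain enumeration (an illustration, not Alchemy's implementation): one ground atom per predicate grounding, one ground feature per formula grounding.

```python
from itertools import product

# Illustrative MLN grounding: with a set of constants, enumerate all
# ground atoms and all groundings of a formula.
constants = ["Anna", "Bob"]
predicates = {"Smokes": 1, "Cancer": 1, "Friends": 2}  # name -> arity

ground_atoms = [
    (name, args)
    for name, arity in predicates.items()
    for args in product(constants, repeat=arity)
]

# Groundings of: forall x,y  Friends(x,y) => (Smokes(x) <=> Smokes(y))
friends_formula_groundings = list(product(constants, repeat=2))
```

With two constants this yields 2 + 2 + 4 = 8 ground atoms and 4 groundings of the Friends formula.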
Example: Friends & Smokers

$1.5\;\; \forall x\; \mathit{Smokes}(x) \Rightarrow \mathit{Cancer}(x)$
$1.1\;\; \forall x, y\; \mathit{Friends}(x, y) \Rightarrow (\mathit{Smokes}(x) \Leftrightarrow \mathit{Smokes}(y))$
Two constants: Anna (A) and Bob (B). Grounding produces the nodes Smokes(A), Smokes(B), Cancer(A), Cancer(B), Friends(A,A), Friends(A,B), Friends(B,A), and Friends(B,B), connected as the ground formulas dictate.
Markov Logic Networks

An MLN is a template for ground Markov nets. Probability of a world $x$:

$P(x) = \frac{1}{Z} \exp\left( \sum_i w_i n_i(x) \right)$

where $w_i$ is the weight of formula $i$ and $n_i(x)$ is the number of true groundings of formula $i$ in $x$.

Typed variables and constants greatly reduce the size of the ground Markov net
Can handle functions, existential quantifiers, etc., as well as infinite and continuous domains
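For the two-constant Friends & Smokers MLN above (weights 1.5 and 1.1), the distribution is small enough to compute by brute force; a sketch:

```python
import math
from itertools import product

# Brute-force P(x) = (1/Z) exp(sum_i w_i n_i(x)): 8 ground atoms,
# so 256 worlds in total.
C = ["A", "B"]
atoms = ([("Smokes", x) for x in C] + [("Cancer", x) for x in C] +
         [("Friends", x, y) for x, y in product(C, repeat=2)])

def n_cancer(w):   # true groundings of Smokes(x) => Cancer(x)
    return sum(1 for x in C if (not w["Smokes", x]) or w["Cancer", x])

def n_friends(w):  # true groundings of Friends(x,y) => (Smokes(x) <=> Smokes(y))
    return sum(1 for x, y in product(C, repeat=2)
               if (not w["Friends", x, y]) or (w["Smokes", x] == w["Smokes", y]))

def score(w):
    return math.exp(1.5 * n_cancer(w) + 1.1 * n_friends(w))

worlds = [dict(zip(atoms, vals))
          for vals in product([False, True], repeat=len(atoms))]
Z = sum(score(w) for w in worlds)

def p(w):
    return score(w) / Z
```

Worlds that violate more groundings get exponentially less probability, never zero: the soft-constraint semantics in action.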
Relation to Statistical Models
Special cases: Markov networks, Markov random fields, Bayesian networks, log-linear models, exponential models, max. entropy models, Gibbs distributions, Boltzmann machines, logistic regression, hidden Markov models, conditional random fields
Markov logic allows objects to be interdependent (non-i.i.d.)
Markov logic makes it easy to combine and reuse these models
Relation to First-Order Logic
Infinite weights ⇒ first-order logic
Satisfiable KB, positive weights ⇒ satisfying assignments = modes of distribution
Markov logic allows contradictions between formulas
Inference

MAP/MPE state: MaxWalkSAT, LazySAT
Marginal and conditional probabilities: MCMC (Gibbs, MC-SAT, etc.), knowledge-based model construction, lifted belief propagation
Lifted Inference
We can do inference in first-order logic without grounding the KB (e.g., resolution)
Let's do the same for inference in MLNs:
  Group atoms and clauses into "indistinguishable" sets
  Do inference over those
First approach: lifted variable elimination (not practical)
Here: lifted belief propagation
Belief Propagation

Messages pass between nodes ($x$) and features ($f$):

$\mu_{x \to f}(x) = \prod_{h \in nb(x) \setminus \{f\}} \mu_{h \to x}(x)$

$\mu_{f \to x}(x) = \sum_{\sim \{x\}} \left( e^{w f(\mathbf{x})} \prod_{y \in nb(f) \setminus \{x\}} \mu_{y \to f}(y) \right)$
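A minimal sum-product sketch of these messages on a two-variable, two-feature tree (made-up weights; BP is exact here because the graph is a tree, so the result can be checked against brute-force enumeration):

```python
import math
from itertools import product

# Tiny factor graph: unary feature g(x) with weight wg, pairwise
# feature f(x, y) with weight wf.
wg, wf = 0.7, 1.2

def g(x):       # fires when x is true
    return 1.0 if x else 0.0

def f(x, y):    # fires when x and y agree
    return 1.0 if x == y else 0.0

VALS = (False, True)

# Feature-to-node message from g (g has no neighbors besides x).
mu_g_x = {x: math.exp(wg * g(x)) for x in VALS}
# Node-to-feature: product of x's incoming messages, excluding f's.
mu_x_f = dict(mu_g_x)
# Feature-to-node: sum out all variables but y.
mu_f_y = {y: sum(math.exp(wf * f(x, y)) * mu_x_f[x] for x in VALS)
          for y in VALS}

marg_y = {y: v / sum(mu_f_y.values()) for y, v in mu_f_y.items()}

# Brute-force marginal of y for comparison.
Z = sum(math.exp(wg * g(x) + wf * f(x, y)) for x, y in product(VALS, repeat=2))
exact_y = {y: sum(math.exp(wg * g(x) + wf * f(x, y)) for x in VALS) / Z
           for y in VALS}
```

The unary feature pushes $x$ toward true and the pairwise feature pushes $y$ to agree, so the marginal favors $y$ = true.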
Lifted Belief Propagation

The same message equations run over the lifted network of nodes ($x$) and features ($f$); the products and sums additionally become functions of the edge counts between supernodes and superfeatures.
Lifted Belief Propagation
Form a lifted network composed of supernodes and superfeatures
  Supernode: set of ground atoms that all send and receive the same messages throughout BP
  Superfeature: set of ground clauses that all send and receive the same messages throughout BP
Run belief propagation on the lifted network
  Guaranteed to produce the same results as ground BP
  Time and memory savings can be huge
Forming the Lifted Network
1. Form initial supernodes: one per predicate and truth value (true, false, unknown)
2. Form superfeatures by doing joins of their supernodes
3. Form supernodes by projecting superfeatures down to their predicates. Supernode = groundings of a predicate with the same number of projections from each superfeature
4. Repeat until convergence
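The loop above can be sketched as iterated regrouping by message signatures; this toy version (hypothetical predicate and evidence, not Alchemy's exact algorithm) groups Smokes atoms that would send and receive identical messages:

```python
# Sketch of supernode refinement: atoms start grouped by predicate and
# evidence value, then are regrouped by the signature of the formula
# groundings they participate in, until the partition stops refining.
constants = ["A", "B", "C"]
evidence_smokes = {"A"}  # Smokes(A) observed true; B, C unknown

# Initial supernodes: one per (predicate, truth value).
group = {x: ("Smokes", x in evidence_smokes) for x in constants}

for _ in range(10):
    # Signature of Smokes(x): the multiset of group labels over the
    # Friends(x, y) formula groundings it appears in.
    sig = {x: tuple(sorted((group[x], group[y]) for y in constants))
           for x in constants}
    new_group = {x: (group[x], sig[x]) for x in constants}
    if len(set(new_group.values())) == len(set(group.values())):
        break  # partition no longer refines: converged
    group = new_group

supernodes = {}
for x, gid in group.items():
    supernodes.setdefault(gid, []).append(x)
```

B and C are indistinguishable given this evidence, so they share a supernode, while A gets its own.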
Theorem
There exists a unique minimal lifted network
The lifted network construction algorithm finds it
BP on the lifted network gives the same result as on the ground network
Representing Supernodes and Superfeatures

List of tuples: simple but inefficient
Resolution-like: use equality and inequality
Form clusters (in progress)
Learning
Data is a relational database
Closed-world assumption (if not: EM)
Learning parameters (weights): generatively or discriminatively
Learning structure (formulas)
Generative Weight Learning
Maximize likelihood
Use gradient ascent or L-BFGS
No local maxima
Requires inference at each step (slow!)

$\frac{\partial}{\partial w_i} \log P_w(x) = n_i(x) - E_w[n_i(x)]$

where $n_i(x)$ is the number of true groundings of clause $i$ in the data, and $E_w[n_i(x)]$ is the expected number of true groundings according to the model.
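A brute-force sketch of this gradient for one toy clause over two constants (the expectation is computed by enumerating all 16 worlds; in a real MLN this is exactly the slow inference step):

```python
import math
from itertools import product

# Gradient ascent on the generative log-likelihood for the clause
# Smokes(x) => Cancer(x) over constants A and B.
C = ["A", "B"]
atoms = [("Smokes", x) for x in C] + [("Cancer", x) for x in C]

def n(world):  # true groundings of Smokes(x) => Cancer(x)
    return sum(1 for x in C if (not world["Smokes", x]) or world["Cancer", x])

worlds = [dict(zip(atoms, v)) for v in product([False, True], repeat=len(atoms))]
observed = {("Smokes", "A"): True,  ("Cancer", "A"): False,
            ("Smokes", "B"): False, ("Cancer", "B"): True}   # n(observed) = 1

w, lr = 0.0, 0.5
for _ in range(200):
    Z = sum(math.exp(w * n(x)) for x in worlds)
    expected = sum(n(x) * math.exp(w * n(x)) for x in worlds) / Z
    w += lr * (n(observed) - expected)  # gradient = data count - expected count
```

The weight converges to the point where the model's expected count matches the observed count; here that is $w = \ln(1/3) \approx -1.10$, since only one of the two groundings holds in the data.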
Pseudo-Likelihood
Likelihood of each variable given its neighbors in the data [Besag, 1975]:

$PL(x) = \prod_i P(x_i \mid \mathit{neighbors}(x_i))$

Does not require inference at each step
Consistent estimator
Widely used in vision, spatial statistics, etc.
But PL parameters may not work well for long inference chains
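Why no inference is needed: each conditional only compares the two states of a single ground atom, so the global partition function cancels. A toy sketch with the clause and weight used earlier:

```python
import math

# Pseudo-likelihood PL(x) = prod_i P(x_i | rest of x). Each conditional
# needs only the two flips of one atom, not a sum over all worlds.
C = ["A", "B"]
w = 1.5

def n(world):  # true groundings of Smokes(x) => Cancer(x)
    return sum(1 for x in C if (not world["Smokes", x]) or world["Cancer", x])

world = {("Smokes", "A"): True, ("Cancer", "A"): True,
         ("Smokes", "B"): True, ("Cancer", "B"): False}

def cond(world, atom):
    """P(atom = its current value | all other atoms), via the two flips."""
    flipped = dict(world)
    flipped[atom] = not world[atom]
    s_cur = math.exp(w * n(world))
    s_flip = math.exp(w * n(flipped))
    return s_cur / (s_cur + s_flip)

pl = 1.0
for atom in world:
    pl *= cond(world, atom)
```

For instance, Cancer(B) = False is the less likely of its two states here, since flipping it to True would satisfy one more grounding.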
Discriminative Weight Learning
Maximize conditional likelihood of query ($y$) given evidence ($x$):

$\frac{\partial}{\partial w_i} \log P_w(y \mid x) = n_i(x, y) - E_w[n_i(x, y)]$

where $n_i(x, y)$ is the number of true groundings of clause $i$ in the data, and $E_w[n_i(x, y)]$ is the expected number according to the model. Approximate the expected counts by the counts in the MAP state of $y$ given $x$.
Voted Perceptron

Originally proposed for training HMMs discriminatively [Collins, 2002]
Assumes the network is a linear chain

```
wi ← 0
for t ← 1 to T do
    yMAP ← Viterbi(x)
    wi ← wi + η [counti(yData) − counti(yMAP)]
return ∑t wi / T
```
Voted Perceptron for MLNs

HMMs are a special case of MLNs
Replace Viterbi by MaxWalkSAT
The network can now be an arbitrary graph

```
wi ← 0
for t ← 1 to T do
    yMAP ← MaxWalkSAT(x)
    wi ← wi + η [counti(yData) − counti(yMAP)]
return ∑t wi / T
```
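A runnable toy version of the algorithm above, with brute-force MAP over two query atoms standing in for MaxWalkSAT (the two clauses and their counts are hypothetical):

```python
import math
from itertools import product

# Toy voted perceptron: average the weights over T rounds of
# (data counts - MAP counts) updates.
C = ["A", "B"]
evidence = {("Smokes", "A"): True, ("Smokes", "B"): True}
y_data = {("Cancer", "A"): True, ("Cancer", "B"): True}
query = list(y_data)

def counts(y):
    full = {**evidence, **y}
    n1 = sum(1 for p in C if (not full["Smokes", p]) or full["Cancer", p])
    n2 = sum(1 for p in C if not full["Cancer", p])  # clause: !Cancer(x)
    return [n1, n2]

def map_state(w):
    best, best_s = None, -math.inf
    for vals in product([False, True], repeat=len(query)):
        y = dict(zip(query, vals))
        s = sum(wi * ni for wi, ni in zip(w, counts(y)))
        if s > best_s:
            best, best_s = y, s
    return best

eta, T = 1.0, 10
w, total = [0.0, 0.0], [0.0, 0.0]
for _ in range(T):
    y_map = map_state(w)
    w = [wi + eta * (d - m)
         for wi, d, m in zip(w, counts(y_data), counts(y_map))]
    total = [t + wi for t, wi in zip(total, w)]
avg_w = [t / T for t in total]
```

After training, MAP inference with the averaged weights recovers the data labels.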
Structure Learning
Generalizes feature induction in Markov nets
Any inductive logic programming approach can be used, but . . .
  Goal is to induce any clauses, not just Horn
  Evaluation function should be likelihood
  Requires learning weights for each candidate
  Turns out not to be the bottleneck: the bottleneck is counting clause groundings
  Solution: subsampling
Structure Learning
Initial state: unit clauses or a hand-coded KB
Operators: add/remove literal, flip sign
Evaluation function: pseudo-likelihood + structure prior
Search:
  Beam [Kok & Domingos, 2005]
  Shortest-first [Kok & Domingos, 2005]
  Bottom-up [Mihalkova & Mooney, 2007]
Alchemy
Open-source software including:
  Full first-order logic syntax
  MAP and marginal/conditional inference
  Generative & discriminative weight learning
  Structure learning
  Programming language features

alchemy.cs.washington.edu
Applications
Information extraction, entity resolution, link prediction, collective classification, Web mining, natural language processing, social network analysis, ontology refinement, activity recognition, intelligent assistants, etc.
Information Extraction
Parag Singla and Pedro Domingos, "Memory-Efficient Inference in Relational Domains" (AAAI-06).

Singla, P., & Domingos, P. (2006). Memory-efficent inference in relatonal domains. In Proceedings of the Twenty-First National Conference on Artificial Intelligence (pp. 500-505). Boston, MA: AAAI Press.

H. Poon & P. Domingos, "Sound and Efficient Inference with Probabilistic and Deterministic Dependencies", in Proc. AAAI-06, Boston, MA, 2006.

P. Hoifung (2006). Efficent inference. In Proceedings of the Twenty-First National Conference on Artificial Intelligence.
Segmentation
(The same four citations, with each token segmented into Author, Title, and Venue fields.)
Entity Resolution

(The same four citations again: which ones refer to the same paper, and which fields are the same?)
State of the Art
Segmentation: HMM (or CRF) to assign each token to a field
Entity resolution: logistic regression to predict same field/citation, followed by transitive closure
Alchemy implementation: seven formulas
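The transitive-closure step can be implemented with a union-find over matched citation pairs (a standard sketch; the citation IDs are illustrative):

```python
# Union-find: pairwise "same citation" decisions merged into clusters.
parent = {}

def find(x):
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x

def union(a, b):
    parent[find(a)] = find(b)

# Suppose the pairwise classifier matched C1~C2 and C2~C3:
for a, b in [("C1", "C2"), ("C2", "C3")]:
    union(a, b)
```

Closure then implies C1~C3 even though that pair was never matched directly.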
Types and Predicates

```
token    = {Parag, Singla, and, Pedro, ...}
field    = {Author, Title, Venue, ...}  // fields beyond these three are optional
citation = {C1, C2, ...}
position = {0, 1, 2, ...}

Token(token, position, citation)      // evidence
InField(position, field, citation)    // query
SameField(field, citation, citation)  // query
SameCit(citation, citation)           // query
```
Formulas

```
Token(+t,i,c) => InField(i,+f,c)
InField(i,+f,c) ^ !Token(".",i,c) <=> InField(i+1,+f,c)
f != f' => (!InField(i,+f,c) v !InField(i,+f',c))

Token(+t,i,c) ^ InField(i,+f,c) ^ Token(+t,i',c') ^ InField(i',+f,c') => SameField(+f,c,c')
SameField(+f,c,c') <=> SameCit(c,c')
SameField(f,c,c') ^ SameField(f,c',c") => SameField(f,c,c")
SameCit(c,c') ^ SameCit(c',c") => SameCit(c,c")
```

(The !Token(".",i,c) condition in the second formula is the refinement added in the final version: a field does not continue across a period.)
Results: Segmentation on Cora

Precision vs. recall curves (0 to 1 on both axes) comparing four models: Tokens; Tokens + Sequence; Tok. + Seq. + Period; Tok. + Seq. + P. + Comma. (Plot not reproduced.)
Results: Matching Venues on Cora

Precision vs. recall curves (0 to 1 on both axes) comparing four models: Similarity; Sim. + Relations; Sim. + Transitivity; Sim. + Rel. + Trans. (Plot not reproduced.)
Conclusion

The Web provides a plethora of learning problems
Machine learning provides a plethora of solutions
We need a unifying language
Markov logic: use weighted first-order logic to define statistical models
Efficient inference and learning algorithms (but Web scale still requires manual coding)
Many successful applications (e.g., information extraction)
Open-source software / Web site: Alchemy, alchemy.cs.washington.edu