![Page 1: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/1.jpg)
Fine-Grained Soft
Semantic Constraints
Yuval Marton, University of Maryland
http://umiacs.umd.edu/~ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt
![Page 2: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/2.jpg)
Yuval Marton, IBM talk 2
Why Care?
Tell’em apart:
In spite of similar contexts
These, too:
In spite of same form
![Page 3: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/3.jpg)
Road map
• Brief overview of doctoral work
• Hybrid knowledge / corpus-based semantic similarity methods
– Pure and hybrid methods
– Hard and soft constraints
– Fine-grained
– Named-entities
![Page 4: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/4.jpg)
Dissertation Theme
• Hybrid Knowledge/Corpus-Based Statistical NLP Models Using Fine-Grained Soft Syntactic and Semantic Constraints
– Soft Constraints
– Fine-Grained
– Syntactic (parsing)
– Semantic (“concepts”, paraphrases)
• Evaluated in
– Word-pair semantic similarity ranking, and
– Statistical Machine Translation (SMT)
![Page 5: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/5.jpg)
Soft Constraints
• Hard constraints
– {0, 1}; in/out
– Decrease search space
– “Structural zeroes”
– Theory-driven
– Faster, slimmer
• Soft constraints
– [0..1]; fuzzy
– Only bias the model
– Data-driven: let patterns emerge
[Figure: two diagrams labeled “Universe”, one captioned “Hard” and one captioned “Soft”]
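The hard/soft contrast above can be sketched in a few lines. This is only an illustration under stated assumptions: the function names, the toy candidates, and the scores are hypothetical and not taken from the talk.

```python
def hard_constrained_best(candidates, score, allowed):
    """Hard constraint: disallowed candidates are removed from the search space."""
    survivors = [c for c in candidates if allowed(c)]
    return max(survivors, key=score) if survivors else None


def soft_constrained_best(candidates, score, preference, weight):
    """Soft constraint: every candidate stays in; a preference in [0, 1] only biases the score."""
    return max(candidates, key=lambda c: score(c) + weight * preference(c))


# Hypothetical toy data: base model scores and a constraint that prefers "b" and "c".
base = {"a": 2.0, "b": 1.5, "c": 0.2}
pref = {"a": 0.0, "b": 1.0, "c": 1.0}

hard_pick = hard_constrained_best(base, base.get, lambda c: pref[c] > 0)
soft_weak = soft_constrained_best(base, base.get, pref.get, weight=0.3)
soft_strong = soft_constrained_best(base, base.get, pref.get, weight=1.0)
```

With a small weight the unconstrained favorite can still win; with a larger weight the soft constraint picks the same candidate as the hard one.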
![Page 6: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/6.jpg)
Fine-grained
• Granularity is a big deal
– Soft syntactic constraints in SMT
• Chiang 2005 vs. Marton and Resnik 2008
• Negative results → positive results
– Soft semantic constraints in word-pair similarity ranking
• Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009
• Positive results → better results
![Page 7: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/7.jpg)
Soft Syntactic Constraints
• X → X1 speech ||| X1 espiche
– What should be the span of X1?
• Chiang’s 2005 constituency feature
– Reward the rule’s score if the rule’s source side matches a constituent span
– Constituency-incompatible emergent patterns can still ‘win’ (in spite of no reward)
– Good idea, but a negative result
• But what if…
![Page 8: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/8.jpg)
Rule granularity
• Chiang: Single weight for all constituents (parse tags)
• … But what if we can assign a separate feature and weight for each constituent?
• E.g., NP-only: (NP= )
• Or VP-only: (VP= )
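A minimal sketch of the contrast above, assuming the parse is available as a hypothetical mapping from source spans to constituent labels. The function name, the `constituent` feature name, and the toy spans are illustrative; only the `NP=` / `VP=` naming follows the slide.

```python
def constituency_features(span, constituents, fine_grained=True):
    """Return the soft-constraint features fired by a rule whose source side covers `span`.

    `constituents` maps (start, end) spans to parse labels, e.g. {(0, 2): "NP"}.
    """
    label = constituents.get(span)
    if label is None:
        return {}                    # no match: no reward, but the rule is still allowed
    if fine_grained:
        return {f"{label}=": 1.0}    # a separate feature (and weight) per label
    return {"constituent": 1.0}      # one shared feature for all labels


# Hypothetical parse of a five-token source sentence.
parse = {(0, 2): "NP", (2, 5): "VP"}
coarse = constituency_features((0, 2), parse, fine_grained=False)
fine = constituency_features((0, 2), parse)
no_match = constituency_features((1, 3), parse)
```

Because each label gets its own feature, a tuner can learn that matching an NP helps while matching some other label does not.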
![Page 9: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/9.jpg)
Fine-grained
• Granularity is a big deal
– Soft syntactic constraints in SMT
• Chiang 2005 vs. Marton and Resnik 2008
• Negative results → positive results
– Soft semantic constraints in word-pair similarity ranking
• Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009
• Positive results → better results
![Page 10: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/10.jpg)
Word-pair similarity ranking
• Give each word pair a similarity score
– Rooster – voyage
– Coast – shore
• Noun-noun (Rubenstein & Goodenough, 1965)
• Verb-verb (Resnik & Diab, 2000)
• Result: list of pairs ordered by similarity
• Spearman rank correlation
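For reference, Spearman rank correlation is the Pearson correlation of the two rank lists. The sketch below is a plain-Python version with average ranks for ties; the scores are hypothetical and not drawn from either dataset named above.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical human judgments and model scores for three word pairs.
human = [0.1, 3.7, 2.0]
model = [0.05, 0.90, 0.40]
rho = spearman(human, model)
```

Only the ordering matters here: the model scores are on a different scale from the human judgments, yet the correlation is perfect because both rank the pairs identically.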
![Page 11: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/11.jpg)
Similarity measures
• Distributional profiles (DP)
– Which words did I occur next to?
• Context vectors
• Similar vectors → similar meaning
![Page 12: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/12.jpg)
Bank (pure word-based)
Bank
![Page 13: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/13.jpg)
Bank (pure concept-based)
[Figure: two concept profiles, “Financial Institution” (Bank, Teller, Money, …) and “Water” (River, Bank, Water, …)]
– Compare closest senses
– Bank_river = water ??
![Page 14: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/14.jpg)
Bank (Hybrid Model)
[Figure: sense-specific profiles Bank_River and Bank_Fin.Inst]
![Page 15: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/15.jpg)
Fine-grained
• Granularity is a big deal
– Soft syntactic constraints in SMT
• Chiang 2005 vs. Marton and Resnik 2008
• Negative results → positive results
– Soft semantic constraints in word-pair similarity ranking
• Mohammad and Hirst 2006 vs. Marton, Mohammad and Resnik 2009
• Positive results → better results
![Page 16: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/16.jpg)
Unified Model
• Soft constraints in a log-linear model
– Syntactic
– Semantic
– …
• $\sum_i \lambda_i h_i(x)$
• Constraints = add more terms to the sum
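To make "add more terms to the sum" concrete, here is a small sketch of a log-linear score as a weighted sum of feature values. The feature names, values, and weights are hypothetical; the talk does not give them.

```python
def log_linear_score(features, weights):
    """Weighted sum over feature functions: sum_i lambda_i * h_i(x)."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


# Hypothetical feature values h_i(x) for one candidate x, and weights lambda_i.
baseline = {"translation_model": -2.0, "language_model": -4.0}
weights = {"translation_model": 1.0, "language_model": 0.5, "NP=": 0.75}

# A soft constraint is one more feature, so one more term in the sum.
with_constraint = {**baseline, "NP=": 1.0}

score_before = log_linear_score(baseline, weights)
score_after = log_linear_score(with_constraint, weights)
```

The model structure is unchanged by the extra feature; only the weight learned for it decides how strongly the constraint biases the score.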
![Page 17: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/17.jpg)
Road map
• Brief overview of doctoral work
• Hybrid knowledge / corpus-based semantic similarity methods
– Pure and hybrid methods
– Hard and soft constraints
– Fine-grained
– Named-entities
![Page 18: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/18.jpg)
Distributional profiles (DPs)
• Distributional Hypothesis (Harris 1940; Firth 1957)
• First order vs. second order (vector representation)
• Strength of association
– Counts, PMI, TF/IDF-based, log-likelihood ratios, …
• Vector similarity (cosine, L1, L2, …)

| word × word | Bush | Obama |
| --- | --- | --- |
| President | .93 | .96 |
| Democrat | .13 | .89 |
| Republican | .88 | .15 |
| White-house | .76 | .91 |
| … | .45 | .74 |
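As one example of a strength-of-association measure from the list above, pointwise mutual information can be computed directly from counts. The sketch assumes raw co-occurrence and marginal counts are already available; the numbers are hypothetical.

```python
import math


def pmi(count_xy, count_x, count_y, total):
    """Pointwise mutual information, log2( P(x, y) / (P(x) * P(y)) ), from raw counts."""
    return math.log2(count_xy * total / (count_x * count_y))


# Hypothetical counts: each word occurs 100 times in 1000 windows.
independent = pmi(10, 100, 100, 1000)   # co-occur exactly as often as chance predicts
associated = pmi(20, 100, 100, 1000)    # co-occur twice as often as chance predicts
```

A value of zero means the two words co-occur at chance level; positive values indicate association.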
![Page 19: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/19.jpg)
Taxonomies and Groupings
• WordNet
– Synsets
– Relations (“is-a”)
– Arc distance
– The tennis problem
• UMLS
• Thesaurus
– Flat
– Coarse
– Implicit relations, potentially non-classical
[Figure: is-a taxonomy: Postdoc is-a Academic job is-a job; CEO is-a Industry job is-a job]
![Page 20: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/20.jpg)
Hybrid measures
• WordNet
– Resnik’s method (info content)
– Lin and others
• Thesaurus concept-based
– Mohammad and Hirst (coarse-grained)
– Distance b/w most similar senses
– Pro: semantic relatedness (non-classical relations); resource-poor languages and domains
– Con: small thesaurus → low applicability
– Bank_river = water ??
![Page 21: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/21.jpg)
Concept-Word DPs
• Concept-word collocation matrix
• Aggregate collocation info of words under concept
• Potentially iterative process
• Clean-up

| Concept × word | Fin.Inst | Water |
| --- | --- | --- |
| bank | .97 | .85 |
| teller | .88 | .07 |
| money | .94 | .15 |
| water | .32 | .91 |
| … | .45 | .74 |
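The aggregation step can be sketched as summing the collocate counts of every word listed under a concept. This is a simplified single pass under stated assumptions: the thesaurus entries and counts are hypothetical, and the iterative refinement and clean-up mentioned above are not modeled.

```python
from collections import Counter


def concept_dps(word_cooccurrences, thesaurus):
    """Aggregate each word's collocate counts into every concept that lists the word."""
    dps = {}
    for concept, members in thesaurus.items():
        profile = Counter()
        for word in members:
            profile.update(word_cooccurrences.get(word, {}))
        dps[concept] = profile
    return dps


# Hypothetical thesaurus categories and co-occurrence counts.
thesaurus = {"Fin.Inst": ["bank", "teller"], "Water": ["bank", "river"]}
cooc = {
    "bank": {"money": 5, "water": 3},
    "teller": {"money": 4},
    "river": {"water": 7},
}
dps = concept_dps(cooc, thesaurus)
```

Note that the ambiguous word "bank" contributes all of its counts to both concepts in this naive pass, which is the kind of noise a later clean-up or iteration would aim to reduce.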
![Page 22: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/22.jpg)
Use concept-based DPs to bias word-based DPs
[Figure: the word DP for “Bank” combined (+) with the “Financial Institution” (Bank, Teller, Money, …) and “Water” (River, Bank, Water, …) concept DPs to yield (=) the sense-specific profiles Bank_River and Bank_Fin.Inst]
– Compare closest senses
– Bank_river = water ??
![Page 23: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/23.jpg)
Fine-grained soft constraints
• DPWS: distributional profile of word senses
• Use concept-based DPs to bias word-based DPs
– Hybrid-filtered
– Hybrid-proportional
![Page 24: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/24.jpg)
Hybrid-filtered

| collocate | Fin.Inst concept DP | Water concept DP | bank DP | bank_river DPWS |
| --- | --- | --- | --- | --- |
| bank | .97 | .85 | .76 | .76 |
| teller | .88 | .07 | .54 | .54 |
| money | .94 | .15 | .68 | .68 |
| water | .00 | .91 | .62 | .00 |
| … | .45 | .74 | .25 | .25 |

Filter out collocates in the word DP if they do not appear in the concept DP.
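A minimal sketch of the filtering rule, assuming DPs are stored as dictionaries from collocate to strength of association and that "not appearing" means a zero or missing entry. The function name is hypothetical; the values are copied from the table's bank DP and Fin.Inst concept DP columns.

```python
def hybrid_filtered(word_dp, concept_dp):
    """Keep a collocate's word-DP value only if the collocate appears in the concept DP."""
    return {
        collocate: (value if concept_dp.get(collocate, 0.0) > 0.0 else 0.0)
        for collocate, value in word_dp.items()
    }


bank_dp = {"bank": 0.76, "teller": 0.54, "money": 0.68, "water": 0.62}
fin_inst_dp = {"bank": 0.97, "teller": 0.88, "money": 0.94, "water": 0.00}

dpws = hybrid_filtered(bank_dp, fin_inst_dp)
```

Filtering the bank DP against the Fin.Inst column leaves every value untouched except "water", which drops to zero, the same pattern as the table's final column.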
![Page 25: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/25.jpg)
Hybrid-proportional

| collocate | Fin.Inst concept DP | Water concept DP | bank DP | bank_river DPWS |
| --- | --- | --- | --- | --- |
| bank | .97 | .85 | .76 | .33 |
| teller | .88 | .07 | .54 | .05 |
| money | .94 | .15 | .68 | .08 |
| water | .00 | .91 | .62 | .00 |
| … | .45 | .74 | .25 | .15 |

Only discount a collocate’s value in the word DP, in proportion to the ratio of its count in the current concept DP relative to all concept DPs of the target word.
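One possible reading of the proportional rule, sketched under assumptions: each collocate's word-DP value is multiplied by the share of its count that falls in the chosen sense's concept DP. The function name and the counts are hypothetical, and the sketch is not meant to reproduce the figures in the table.

```python
def hybrid_proportional(word_dp, concept_counts, sense):
    """Scale each collocate's value by its count share in `sense` among all concept DPs."""
    result = {}
    for collocate, value in word_dp.items():
        total = sum(counts.get(collocate, 0) for counts in concept_counts.values())
        share = concept_counts[sense].get(collocate, 0) / total if total else 0.0
        result[collocate] = value * share
    return result


# Hypothetical word DP and per-concept co-occurrence counts for the target word "bank".
bank_dp = {"teller": 0.54, "water": 0.62}
counts = {
    "Fin.Inst": {"teller": 9, "water": 1},
    "Water": {"teller": 1, "water": 9},
}
river_sense = hybrid_proportional(bank_dp, counts, "Water")
```

Unlike the filtered variant, a collocate that is rare under the chosen concept is discounted rather than removed outright.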
![Page 26: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/26.jpg)
WSD with DPWS
• Each sense of each word has a unique profile
– Bank_fin.inst ≠ Bank_river ≠ water !
• Pro:
– Not aggregated: unlike concept DPs
– No/less smearing: unlike word DPs, which smear all senses into a single profile
![Page 27: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/27.jpg)
Results
![Page 28: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/28.jpg)
Evaluation
• Word-pair similarity ranking
– Spearman rank correlation
• Paraphrasing in SMT
– BLEU, TER, METEOR, …
![Page 29: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/29.jpg)
Comparison
• WordNet results
• LSA results
![Page 30: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/30.jpg)
Challenges
• Antonyms (black – white)
• “Hyperonyms” (vehicle – car)
• Co-hypernyms / co-taxonyms
![Page 31: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/31.jpg)
Conclusion
• Hybrid Knowledge/Corpus-Based Statistical NLP Models Using Fine-Grained Soft Constraints
– Soft Constraints
– Fine-Grained
– Semantic (“concepts”)
– Semantic relatedness, resource-poor settings, special domains
[Figure: “Universe” with a soft constraint; sense-specific profiles Bank_River and Bank_Fin.Inst]
![Page 32: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/32.jpg)
Thank you!
Questions?
Advisors: Philip Resnik & Amy Weinberg
Department of Linguistics and CLIP Lab
![Page 33: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/33.jpg)
Paraphrase generation
For some target OOV phrase Phr:
• Build distributional profile DP_Phr
• Gather contexts of Phr
• Gather paraphrase candidates
• Score / Rank candidates
• Output K-best candidates
![Page 34: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/34.jpg)
Distributional Profiles
• Example of collocational distributional profile (DP) for word “cord”:
• Sliding window (+/- 6 tokens)
• SoA: conditional probability (CP), pointwise mutual information (PMI), log-likelihood ratios (LLR), …
• Using LLR

| Collocate | Co-occurrence count | Strength of association (SoA) |
| --- | --- | --- |
| Hanging | 8 | 12.20 |
| Ventral | 6 | 18.44 |
| Trousers | 14 | 62.19 |
| … | … | … |
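The two ingredients of this slide, window-based co-occurrence counting and an LLR strength of association, can be sketched as follows. The LLR here is Dunning's G² over a 2x2 contingency table, which is an assumption about the exact variant used; function names and the toy sentence split are illustrative.

```python
import math
from collections import Counter


def cooccurrence_counts(tokens, target, window=6):
    """Count tokens appearing within +/- `window` positions of each `target` occurrence."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == target:
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(tokens[j] for j in range(lo, hi) if j != i)
    return counts


def llr(k11, k12, k21, k22):
    """Dunning's log-likelihood ratio (G^2) for a 2x2 contingency table of counts."""
    def h(*ks):
        total = sum(ks)
        return sum(k * math.log(k / total) for k in ks if k > 0)

    cells = h(k11, k12, k21, k22)
    rows = h(k11 + k12, k21 + k22)
    cols = h(k11 + k21, k12 + k22)
    return 2.0 * (cells - rows - cols)


tokens = "a soft tufted cord used in embroidery".split()
neighbors = cooccurrence_counts(tokens, "cord", window=1)
```

In the table for `llr`, `k11` is the number of windows containing both words, `k12` and `k21` the windows containing only one of them, and `k22` the windows containing neither.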
![Page 35: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/35.jpg)
DP Similarity
• Each DP is represented as a vector
• Use any vector similarity
• Using cosine: cos(DP_cord, DP_rope)
• Example: estimating similarity between “cord” and “rope”:

| SoA with “cord” | SoA with “rope” |
| --- | --- |
| 12.20 | 10.43 |
| 18.44 | 4.97 |
| 62.19 | 31.82 |
| … | … |
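A small sketch of cosine similarity over sparse DPs stored as dictionaries. It uses only the three SoA values visible on the slide, keyed by the collocates from the preceding “cord” example on the assumption that the rows line up; a full DP would have many more dimensions, so the resulting number is illustrative only.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Truncated DPs: only the three dimensions shown on the slide.
dp_cord = {"hanging": 12.20, "ventral": 18.44, "trousers": 62.19}
dp_rope = {"hanging": 10.43, "ventral": 4.97, "trousers": 31.82}

similarity = cosine(dp_cord, dp_rope)
```

Because only shared collocates contribute to the dot product, two phrases with disjoint contexts score zero.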
![Page 36: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/36.jpg)
Gather contexts
• Gather all contexts L _ R for “cord”:
• Length of context: start small, increase if too frequent

| Left context (L) | _ | Right context (R) |
| --- | --- | --- |
| A full | cord | is a large amount of wood. |
| History of the | Cord | 810 and 812 |
| a soft tufted | cord | used in embroidery |
| a knotted | cord | that runs out from a reel |
| the | cord | of his electric razor. |
| living well after spinal | cord | injury or disease |
| … | cord | … |
![Page 37: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/37.jpg)
Gather paraphrase candidates
• What else appears b/w L _ R?

| Left context (L) | _ | Right context (R) |
| --- | --- | --- |
| A full | wave analysis is required since it | is a large amount of electromagnetic |
| History of the | world since his death in | 810 |
| a soft tufted | soft tufted cord of silk, cotton, or worsted | used in embroidery |
| a knotted | rope | that runs out |
| the | cable | of his electric razor. |
| spinal | accessory nerve | injury |
| … | … | … |
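Candidate gathering can be sketched as a scan for token sequences that sit between a known left and right context. The function name and the `max_span` parameter are hypothetical; the default mirrors the 10-token span limit mentioned later in the talk, and a real system would index a large corpus rather than scan a list.

```python
def gather_candidates(tokens, left, right, max_span=10):
    """Return token sequences (1..max_span tokens) found between `left` and `right`."""
    n_l, n_r = len(left), len(right)
    found = []
    for i in range(len(tokens) - n_l + 1):
        if tokens[i:i + n_l] != left:
            continue
        start = i + n_l
        last_end = min(start + max_span, len(tokens) - n_r)
        for end in range(start + 1, last_end + 1):
            if tokens[end:end + n_r] == right:
                found.append(" ".join(tokens[start:end]))
    return found


corpus = "a knotted rope that runs out".split()
candidates = gather_candidates(corpus, ["a", "knotted"], ["that", "runs"])
```

Here the context pair gathered for “cord” (a knotted _ that runs) retrieves “rope” as a paraphrase candidate.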
![Page 38: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/38.jpg)
Score / Rank candidates
• Measure distributional similarity of target (“cord”) with each candidate:

| candidate | score |
| --- | --- |
| rope | cos(DP_cord, DP_rope) = .83 |
| cable | cos(DP_cord, DP_cable) = .79 |
| accessory nerve | cos(DP_cord, DP_accessory nerve) = .46 |
| world since his death in | cos(DP_cord, DP_world since his death in) = .03 |
| … | … |
![Page 39: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/39.jpg)
Output k-best candidates
• K = 20
• Limit span between L _ R to 10 tokens
• Use best candidates to augment phrase table
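The final selection step amounts to sorting candidates by score and keeping the top K. In this sketch the function name is hypothetical and the scores are the example values from the scoring slide, not real system output.

```python
def k_best(scores, k=20):
    """Return the k highest-scoring (candidate, score) pairs, best first."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Example scores from the "cord" illustration.
scores = {
    "world since his death in": 0.03,
    "cable": 0.79,
    "rope": 0.83,
    "accessory nerve": 0.46,
}
top_two = k_best(scores, k=2)
```

The surviving candidates are the ones that would be added to the phrase table as paraphrases of the target phrase.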
![Page 40: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/40.jpg)
Some real examples (unigrams)
•
![Page 41: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/41.jpg)
Some real examples (ngrams)
•
![Page 42: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/42.jpg)
English to Chinese
• 29k line subset created to emulate low density language setting
![Page 43: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/43.jpg)
Spanish to English
•
![Page 44: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/44.jpg)
Comparison with Pivoting
•
![Page 45: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/45.jpg)
Comparison with Pivoting
• Pivoting is subject to translational “shift”
– Due to double translation step
• Pivoting suffers from having function words as top candidates
– Perhaps a by-product of their alignment “promiscuity”
• Monolingual paraphrases suffer from having antonyms as top candidates
![Page 46: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/46.jpg)
Monolingually-Derived Paraphrases: Advantages
• Significant gains in SMT results for small sets
• Good for resource-poor languages
• Not relying on bitexts (a limited resource)
• Larger monolingual paraphrase training set yields better paraphrases
• General: Can plug in any similarity measure
![Page 47: Fine-Grained Soft Semantic Constraints Yuval Marton University of Maryland ymarton/pub/ibm/Hybrid Knowledge-CorpusBasedSem-IBM_090728.ppt](https://reader034.vdocuments.us/reader034/viewer/2022051821/56649d0e5503460f949e3252/html5/thumbnails/47.jpg)
Challenges
• Quality: distributional paraphrases suffer from high-ranking antonyms and co-hypernyms
• Smaller gains than the pivoting technique of Callison-Burch et al. (2006), but can scale up
• How to benefit from POS and syntactic info, e.g., Callison-Burch (2008)
• How to benefit from semantic info / WSD, e.g., Marton, Mohammad & Resnik 2009; Erk & Padó 2008
• Scaling: need to explore whether we can get gains on bigger SMT sets before exhausting the capacity to handle a huge monolingual set