Posted on 19-Dec-2015
Logistics
• Course reviews
• Project report deadline: March 16
• Poster session guidelines:
  – 2.5 minutes per poster (3 hrs / 55 posters, minus overhead)
  – presentations will be videotaped
  – food will be provided
Task: Named-Entity Recognition in new corpus
Named-Entity Recognition
• Fragment of an example sentence:
Julian Assange accused the United
PER PER Other Other LOC
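To make the labeling concrete, here is a minimal Python sketch (an illustration, not from the slides) that stores a sentence as parallel token and label lists and groups runs of identical non-Other labels into entities. It assumes the plain per-token labels shown above with no BIO prefixes, so two adjacent entities of the same type would be merged; the function name `entity_spans` is hypothetical.

```python
from typing import List, Tuple

def entity_spans(labels: List[str]) -> List[Tuple[str, int, int]]:
    """Group runs of identical non-'Other' labels into (type, start, end) spans.

    `end` is exclusive. Assumes plain per-token labels (no BIO prefixes),
    so adjacent entities of the same type are merged into one span.
    """
    spans = []
    start = None
    for i, label in enumerate(labels + ["Other"]):  # sentinel closes a trailing span
        if start is not None and label != labels[start]:
            spans.append((labels[start], start, i))
            start = None
        if start is None and label != "Other":
            start = i
    return spans

tokens = ["Julian", "Assange", "accused", "the", "United"]
labels = ["PER", "PER", "Other", "Other", "LOC"]
for kind, start, end in entity_spans(labels):
    print(kind, " ".join(tokens[start:end]))
# PER Julian Assange
# LOC United
```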
NER as Machine Learning
• Fragment of an example sentence:
Julian Assange accused the United
PER PER Other Other LOC
• Yi: word label, with domain {Other, LOC, PER, ORG}
• Xi: some feature representation of the word
Feature Vector: Three Choices
• Words:
  – current word
• Context:
  – current word, previous word, next word
• Features:
  – current word, previous word, next word
  – is the word capitalized?
  – "word shape" (compact summary of orthographic information, like internal digits and punctuation)
  – prefixes up to length 5, suffixes up to length 5
  – any word in a +/- six word window (*not* differentiated by position the way previous word and next word are)
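A minimal Python sketch of the three feature choices, assuming each feature is a string key in a sparse binary feature set (a common convention, not something the slides specify). The helper names, the sentence-boundary tokens `<S>`/`</S>`, the exact word-shape mapping (uppercase to X, lowercase to x, digit to d, repeats collapsed), and the choice to exclude the current word from the window are all illustrative assumptions.

```python
import re
from typing import List, Set

def word_shape(word: str) -> str:
    """Compact orthographic summary, e.g. 'Assange' -> 'Xx', 'G8' -> 'Xd'."""
    shape = re.sub(r"[A-Z]", "X", word)
    shape = re.sub(r"[a-z]", "x", shape)
    shape = re.sub(r"[0-9]", "d", shape)
    return re.sub(r"(.)\1+", r"\1", shape)  # collapse runs of the same character

def extract_features(tokens: List[str], i: int, choice: str) -> Set[str]:
    """Features for token i under choice 'words', 'context' or 'features'."""
    word = tokens[i]
    prev_word = tokens[i - 1] if i > 0 else "<S>"
    next_word = tokens[i + 1] if i + 1 < len(tokens) else "</S>"
    feats = {f"word={word}"}
    if choice in ("context", "features"):
        feats |= {f"prev={prev_word}", f"next={next_word}"}
    if choice == "features":
        feats.add(f"capitalized={int(word[:1].isupper())}")
        feats.add(f"shape={word_shape(word)}")
        for k in range(1, min(5, len(word)) + 1):  # prefixes/suffixes up to length 5
            feats.add(f"prefix={word[:k]}")
            feats.add(f"suffix={word[-k:]}")
        # Bag of words in a +/- 6 window; position is deliberately NOT encoded.
        # Excluding the current word is an assumption (it already has word=...).
        for j in range(max(0, i - 6), min(len(tokens), i + 7)):
            if j != i:
                feats.add(f"window={tokens[j]}")
    return feats

tokens = ["Julian", "Assange", "accused", "the", "United"]
print(sorted(extract_features(tokens, 1, "context")))
# ['next=accused', 'prev=Julian', 'word=Assange']
```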
Discriminative vs Generative I
[Figure: two panels contrasting a discriminative and a generative model, each with a label node Y and a feature node X for the word "Assange" (Capitalized=1, Previous=Julian, POS=noun)]
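The two panels differ in which distribution is modeled. A minimal Python sketch, assuming binary features and hand-picked (hypothetical, not learned) parameters: naive Bayes is generative, modeling P(Y) and P(Xj | Y) and recovering P(Y | X) by Bayes' rule, while logistic regression is discriminative and parameterizes P(Y | X) directly.

```python
import math
from typing import Dict

LABELS = ["PER", "Other"]

def _normalize(log_scores: Dict[str, float]) -> Dict[str, float]:
    """Turn unnormalized log-scores into probabilities (numerically stable softmax)."""
    m = max(log_scores.values())
    z = sum(math.exp(s - m) for s in log_scores.values())
    return {y: math.exp(s - m) / z for y, s in log_scores.items()}

def naive_bayes_posterior(x: Dict[str, int], prior: Dict[str, float],
                          p_on: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Generative: P(y | x) from P(y) * prod_j P(x_j | y) via Bayes' rule."""
    scores = {}
    for y in LABELS:
        s = math.log(prior[y])
        for f, value in x.items():
            p = p_on[y][f]  # P(feature f is on | y)
            s += math.log(p if value else 1.0 - p)
        scores[y] = s
    return _normalize(scores)

def logistic_posterior(x: Dict[str, int], weights: Dict[str, Dict[str, float]],
                       bias: Dict[str, float]) -> Dict[str, float]:
    """Discriminative: P(y | x) proportional to exp(bias_y + sum_j w_yj * x_j)."""
    scores = {y: bias[y] + sum(weights[y][f] * v for f, v in x.items()) for y in LABELS}
    return _normalize(scores)

# Hypothetical binary features for "Assange" and hand-picked (not learned) parameters.
x = {"capitalized": 1, "prev=Julian": 1, "pos=noun": 1}
prior = {"PER": 0.1, "Other": 0.9}
p_on = {"PER": {"capitalized": 0.9, "prev=Julian": 0.05, "pos=noun": 0.95},
        "Other": {"capitalized": 0.1, "prev=Julian": 0.001, "pos=noun": 0.3}}
weights = {"PER": {"capitalized": 2.0, "prev=Julian": 1.5, "pos=noun": 0.5},
           "Other": {"capitalized": 0.0, "prev=Julian": 0.0, "pos=noun": 0.0}}
bias = {"PER": -1.0, "Other": 0.0}

nb = naive_bayes_posterior(x, prior, p_on)
lr = logistic_posterior(x, weights, bias)
print("NB:", nb)
print("LR:", lr)
```

Naive Bayes treats the features as independent given Y, so correlated features (for example capitalization and word shape) are counted as separate evidence; logistic regression fits its weights jointly and does not rely on that assumption. This is one common explanation for why the discriminative model benefits more from richer feature sets.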
Generative vs Discriminative I
• 10K training words from CoNLL (British newswire), looking only for PERSON
• Metric: F1

F1 by feature set:

| Model | Words | Context | Features |
|---|---|---|---|
| NB | 51.3 | 59.1 | 70.8 |
| LR | 52.8 | 65.5 | 81.5 |
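The slide reports F1 without spelling out the computation. A minimal sketch, assuming entity-level scoring in which a predicted PERSON span counts only if its boundaries exactly match a gold span (the usual CoNLL-style convention); spans are hypothetical (start, end) pairs with exclusive end, and the slide's numbers are this value times 100.

```python
from typing import Set, Tuple

Span = Tuple[int, int]  # (start, end), end exclusive

def f1_score(gold: Set[Span], predicted: Set[Span]) -> float:
    """Harmonic mean of precision and recall over exactly matching spans."""
    true_pos = len(gold & predicted)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(predicted)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)

# One correct span, one spurious span, one span with a wrong boundary:
# precision = 1/3, recall = 1/2, so F1 = 0.4.
print(f1_score({(0, 2), (10, 11)}, {(0, 2), (5, 6), (10, 12)}))
```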
Do More Features Always Help?
• How do we evaluate multiple feature sets?
  – On a validation set, not the test set!
• Detecting underfitting
  – Train and test performance similar and low
• Detecting overfitting
  – Train performance high, test performance low
• The same holds whenever we consider models of varying complexity!
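A minimal sketch of the procedure above, assuming you already have train and validation F1 for each candidate feature set. The thresholds `low` and `gap` are hypothetical knobs, not values from the slides; sensible values depend on the task.

```python
from typing import Dict, Tuple

def diagnose(train_f1: float, val_f1: float,
             low: float = 0.6, gap: float = 0.1) -> str:
    """Rough fit diagnosis from train vs. validation performance."""
    if train_f1 - val_f1 > gap:
        return "overfitting"   # train high, validation much lower
    if train_f1 < low and val_f1 < low:
        return "underfitting"  # both similar and low
    return "ok"

def select_feature_set(scores: Dict[str, Tuple[float, float]]) -> str:
    """Pick the feature set with the best validation F1; the test set stays untouched."""
    return max(scores, key=lambda name: scores[name][1])

# Hypothetical (train F1, validation F1) pairs, not results from the slides.
scores = {"words": (0.55, 0.52), "context": (0.70, 0.66), "features": (0.99, 0.64)}
for name, (train_f1, val_f1) in scores.items():
    print(name, diagnose(train_f1, val_f1))
print("selected:", select_feature_set(scores))
# words underfitting
# context ok
# features overfitting
# selected: context
```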
Sequential Modeling
• Fragment of an example sentence:
Julian Assange accused the United
PER PER Other Other LOC
• Yi: random variable with domain {Other, LOC, PER, ORG}
• Xi: random variable for the vector of features about the word
Hidden Markov Model (HMM)
[Figure: HMM over "Julian Assange accused the United": a chain of labels Y1 → Y2 → Y3 → Y4 → Y5, with each label Yi generating its observation Xi]
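In this HMM each label depends only on the previous label and each observation only on its own label, so P(X, Y) = P(Y1) P(X1 | Y1) ∏ P(Yi | Yi-1) P(Xi | Yi). A minimal Python sketch of MAP decoding with the Viterbi algorithm, assuming the observation is just the word itself and using made-up toy probabilities (hypothetical, not estimated from any corpus).

```python
import math
from typing import Dict, List, Tuple

def viterbi(words: List[str], labels: List[str],
            start: Dict[str, float],
            trans: Dict[str, Dict[str, float]],
            emit: Dict[str, Dict[str, float]],
            unk: float = 1e-6) -> Tuple[List[str], float]:
    """Most likely label sequence argmax_y P(x, y) for an HMM, computed in log space."""
    def log_emit(y: str, w: str) -> float:
        return math.log(emit[y].get(w, unk))  # crude floor for unseen words (assumption)

    # best[y] = log-prob of the best path ending in label y at the current position
    best = {y: math.log(start[y]) + log_emit(y, words[0]) for y in labels}
    back: List[Dict[str, str]] = []
    for w in words[1:]:
        prev_best = best
        pointers, best = {}, {}
        for y in labels:
            p = max(labels, key=lambda yp: prev_best[yp] + math.log(trans[yp][y]))
            pointers[y] = p
            best[y] = prev_best[p] + math.log(trans[p][y]) + log_emit(y, w)
        back.append(pointers)
    last = max(labels, key=lambda y: best[y])
    path = [last]
    for pointers in reversed(back):  # follow back-pointers from the end
        path.append(pointers[path[-1]])
    return path[::-1], best[last]

# Hypothetical toy parameters.
labels = ["PER", "Other"]
start = {"PER": 0.3, "Other": 0.7}
trans = {"PER": {"PER": 0.6, "Other": 0.4}, "Other": {"PER": 0.2, "Other": 0.8}}
emit = {"PER": {"Julian": 0.4, "Assange": 0.4}, "Other": {"accused": 0.3, "the": 0.5}}
words = ["Julian", "Assange", "accused", "the"]
path, logp = viterbi(words, labels, start, trans, emit)
print(path)  # ['PER', 'PER', 'Other', 'Other']
```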
Hidden Markov Model (HMM)
[Figure: the HMM chain Y1 → … → Y5 over "Julian Assange accused the United", with the observation X2 for "Assange" expanded into the features Capitalized=1, Previous=Julian, POS=noun]
Advantage of Sequential Modeling

F1 by feature set:

| Model | Words | Context | Features |
|---|---|---|---|
| NB | 51.3 | 59.1 | 70.8 |
| HMM | 57.4 | 61.8 | 70.8 |

Reminder: Plain logistic regression gives us 81.5!
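Max Entropy Markov Model (MEMM)
• Markov chain over the Yi's
• Each Yi has a logistic regression CPD given Xi (and its predecessor Yi-1)

[Figure: MEMM over "Julian Assange accused the United": label chain Y1 → Y2 → Y3 → Y4 → Y5, with each Xi a parent of Yi; X2 ("Assange") is expanded into the features Capitalized=1, Previous=Julian, POS=noun]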
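With the chain over the labels, an MEMM scores a label sequence as a product of locally normalized logistic regressions, P(Y | X) = ∏ P(Yi | Yi-1, Xi). A minimal sketch, assuming sparse string features and a hypothetical weight table keyed by (previous label, feature, label), with `<S>` as an assumed start state; each factor is a softmax over the labels, so every local CPD sums to one on its own.

```python
import math
from typing import Dict, List, Set, Tuple

LABELS = ["PER", "Other"]
Weights = Dict[Tuple[str, str, str], float]  # (prev_label, feature, label) -> weight

def local_cpd(prev: str, feats: Set[str], w: Weights) -> Dict[str, float]:
    """P(Y_i | Y_{i-1} = prev, X_i = feats) as a softmax over LABELS."""
    scores = {y: sum(w.get((prev, f, y), 0.0) for f in feats) for y in LABELS}
    m = max(scores.values())
    z = sum(math.exp(s - m) for s in scores.values())
    return {y: math.exp(s - m) / z for y, s in scores.items()}

def memm_sequence_prob(labels: List[str], feats: List[Set[str]], w: Weights) -> float:
    """P(y | x) = prod_i P(y_i | y_{i-1}, x_i), starting from '<S>'."""
    prob, prev = 1.0, "<S>"
    for y, f in zip(labels, feats):
        prob *= local_cpd(prev, f, w)[y]
        prev = y
    return prob

# Hypothetical weights and features.
w = {("<S>", "capitalized", "PER"): 2.0,
     ("PER", "capitalized", "PER"): 1.0,
     ("PER", "lowercase", "Other"): 1.5}
feats = [{"capitalized"}, {"capitalized"}, {"lowercase"}]
print(memm_sequence_prob(["PER", "PER", "Other"], feats, w))
```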
Max Entropy Markov Model (MEMM)
• Pro: uses features in a powerful way
• Con: downstream evidence doesn't help because of v-structures

[Figure: the MEMM label chain Y1 → … → Y5 with each Xi a parent of Yi; X2 ("Assange") expanded into Capitalized=1, Previous=Julian, POS=noun]
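The v-structure argument can be checked numerically. In a two-word MEMM the graph contains Y1 → Y2 ← X2, so summing out the unobserved Y2 gives P(Y1 | X1, X2) = Σ over y2 of P(Y1 | X1) P(y2 | Y1, X2) = P(Y1 | X1): the second word's features cannot change the belief about the first label. A minimal sketch with hypothetical local CPD tables (the feature values `cap` and `lower` are made up):

```python
from typing import Dict

LABELS = ["PER", "Other"]

# Hypothetical local CPDs of a two-word MEMM.
p_y1 = {"cap": {"PER": 0.6, "Other": 0.4}}  # P(Y1 | X1)
p_y2 = {                                     # P(Y2 | Y1, X2), keyed by (y1, x2)
    ("PER", "cap"): {"PER": 0.9, "Other": 0.1},
    ("PER", "lower"): {"PER": 0.2, "Other": 0.8},
    ("Other", "cap"): {"PER": 0.5, "Other": 0.5},
    ("Other", "lower"): {"PER": 0.05, "Other": 0.95},
}

def posterior_y1(x1: str, x2: str) -> Dict[str, float]:
    """P(Y1 | X1, X2) by summing the unobserved Y2 out of P(Y1) P(Y2 | Y1, X2)."""
    joint = {y1: sum(p_y1[x1][y1] * p_y2[(y1, x2)][y2] for y2 in LABELS)
             for y1 in LABELS}
    z = sum(joint.values())
    return {y1: v / z for y1, v in joint.items()}

for x2 in ("cap", "lower"):
    print(x2, posterior_y1("cap", x2))  # identical up to rounding: equals P(Y1 | X1)
```

In an HMM, by contrast, X2 is a child of Y2, so observing it does propagate information back to Y1.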
MEMM vs HMM vs NB

F1 by feature set:

| Model | Words | Context | Features |
|---|---|---|---|
| NB | 51.3 | 59.1 | 70.8 |
| HMM | 57.4 | 61.8 | 70.8 |
| MEMM | 59.1 | 68.3 | 84.6 |

Finally beat logistic regression (84.6 vs. 81.5)!
Conditional Random Field (CRF)
[Figure: CRF over "Julian Assange accused the United": undirected chain Y1 – Y2 – Y3 – Y4 – Y5, with each Yi also linked to its Xi]
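Unlike the MEMM, a linear-chain CRF normalizes once over whole label sequences: P(y | x) = exp(score(x, y)) / Z(x), where the score sums per-position and transition log-potentials and Z(x) sums over every label sequence. A minimal Python sketch, assuming the log-potentials are already given as tables (hypothetical `unary[i][y]` and `pair[y_prev][y]`; in a real CRF they would be weighted sums of features of x), that computes log Z(x) with the forward algorithm and checks it against brute-force enumeration.

```python
import itertools
import math
from typing import Dict, List, Sequence

def logsumexp(values: List[float]) -> float:
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def score(y: Sequence[str], unary: List[Dict[str, float]],
          pair: Dict[str, Dict[str, float]]) -> float:
    """Unnormalized log-score of one label sequence."""
    s = sum(unary[i][label] for i, label in enumerate(y))
    s += sum(pair[a][b] for a, b in zip(y, y[1:]))
    return s

def log_partition(labels: List[str], unary: List[Dict[str, float]],
                  pair: Dict[str, Dict[str, float]]) -> float:
    """log Z(x) via the forward recursion: O(n * |labels|^2) instead of O(|labels|^n)."""
    alpha = {y: unary[0][y] for y in labels}
    for i in range(1, len(unary)):
        alpha = {y: unary[i][y] + logsumexp([alpha[p] + pair[p][y] for p in labels])
                 for y in labels}
    return logsumexp(list(alpha.values()))

# Hypothetical log-potentials for a three-word fragment.
labels = ["PER", "Other"]
unary = [{"PER": 2.0, "Other": 0.0}, {"PER": 1.5, "Other": 0.5}, {"PER": -1.0, "Other": 1.0}]
pair = {"PER": {"PER": 1.0, "Other": 0.0}, "Other": {"PER": -0.5, "Other": 0.5}}

log_z = log_partition(labels, unary, pair)
brute = logsumexp([score(y, unary, pair) for y in itertools.product(labels, repeat=len(unary))])
prob = math.exp(score(["PER", "PER", "Other"], unary, pair) - log_z)
print(abs(log_z - brute) < 1e-9)  # True
```

This global normalization is why CRF training needs inference over the whole chain at every gradient step.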
Comparison: Sequence Models

F1 by feature set:

| Model | Words | Context | Features |
|---|---|---|---|
| HMM | 57.4 | 61.8 | 70.8 |
| MEMM | 59.1 | 68.3 | 84.6 |
| CRF | 59.6 | 70.2 | 85.8 |
Tradeoffs in Learning I
• HMM
  – Simple closed-form solution
• MEMM
  – Gradient ascent for parameters of logistic P(Yi | Xi)
  – But no inference required for learning
• CRF
  – Gradient ascent for all parameters
  – Inference over entire graph required at each iteration
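The HMM's closed-form solution is maximum likelihood by counting: with fully labeled data, each CPD is a normalized count. A minimal sketch without smoothing (a real system would add pseudo-counts for unseen words and transitions); the input format, a list of non-empty (word, label) sentences, is an assumption.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Sentence = List[Tuple[str, str]]  # (word, label) pairs; assumed non-empty

def normalize(counts: Counter) -> Dict[str, float]:
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}

def fit_hmm(corpus: List[Sentence]):
    """MLE for P(Y_1), P(Y_i | Y_{i-1}) and P(X_i | Y_i) from labeled sentences."""
    start: Counter = Counter()
    trans: Dict[str, Counter] = defaultdict(Counter)
    emit: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        labels = [label for _, label in sentence]
        start[labels[0]] += 1
        for prev, cur in zip(labels, labels[1:]):
            trans[prev][cur] += 1
        for word, label in sentence:
            emit[label][word] += 1
    return (normalize(start),
            {y: normalize(c) for y, c in trans.items()},
            {y: normalize(c) for y, c in emit.items()})

corpus = [[("Julian", "PER"), ("Assange", "PER"), ("accused", "Other"), ("the", "Other")],
          [("Assange", "PER"), ("said", "Other")]]
start, trans, emit = fit_hmm(corpus)
print(trans["PER"])  # PER -> PER once, PER -> Other twice
```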
Tradeoffs in Learning II
• Can we learn from unsupervised data?
  – HMM: yes, using EM
  – MEMM/CRF: no
• Discriminative objective: maximize log P(Y | X)
  – But if Y is not observed, we can't maximize its probability
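Why the HMM can use unlabeled text: it defines P(X), obtained by summing out Y, so EM can increase the marginal log-likelihood log P(x) without ever seeing labels. A minimal sketch of that objective via the forward algorithm (the E-step of EM also needs the backward pass, omitted here), with hypothetical toy parameters and a brute-force check. A discriminative model defines only P(Y | X) and has no corresponding quantity for an unlabeled x.

```python
import itertools
import math
from typing import Dict, List

def hmm_log_marginal(words: List[str], labels: List[str],
                     start: Dict[str, float],
                     trans: Dict[str, Dict[str, float]],
                     emit: Dict[str, Dict[str, float]]) -> float:
    """log P(x) = log sum_y P(x, y), computed with the forward recursion."""
    # alpha[y] = P(x_1..x_i, Y_i = y); plain probabilities are fine for short inputs
    alpha = {y: start[y] * emit[y].get(words[0], 0.0) for y in labels}
    for w in words[1:]:
        alpha = {y: emit[y].get(w, 0.0) * sum(alpha[p] * trans[p][y] for p in labels)
                 for y in labels}
    return math.log(sum(alpha.values()))

# Hypothetical toy parameters.
labels = ["PER", "Other"]
start = {"PER": 0.3, "Other": 0.7}
trans = {"PER": {"PER": 0.6, "Other": 0.4}, "Other": {"PER": 0.2, "Other": 0.8}}
emit = {"PER": {"Julian": 0.5, "Assange": 0.4, "the": 0.1},
        "Other": {"Julian": 0.1, "Assange": 0.1, "the": 0.8}}
words = ["Julian", "Assange", "the"]
log_px = hmm_log_marginal(words, labels, start, trans, emit)

def joint(y) -> float:
    """P(x, y) for one label sequence, used only for the brute-force check."""
    p = start[y[0]] * emit[y[0]][words[0]]
    for i in range(1, len(words)):
        p *= trans[y[i - 1]][y[i]] * emit[y[i]][words[i]]
    return p

brute = math.log(sum(joint(y) for y in itertools.product(labels, repeat=len(words))))
print(abs(log_px - brute) < 1e-9)  # True
```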
PGMs and ML
• PGMs deal well with predictions of structured objects (sequences, graphs, trees)
  – Exploit correlations between multiple parts of the prediction task
• Can easily incorporate prior knowledge into the model
• Learned model can often be used for multiple prediction tasks
• Useful framework for knowledge discovery
Inference
• Exact marginals?
  – Clique tree calibration gives all marginals
  – Final labeling might not be jointly consistent
• Approximate marginals?
  – Doesn't make sense in this context
• MAP?
  – Gives a single coherent solution
  – Hard to get ROC curves (trading off precision and recall)
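A tiny example of why per-position marginals can yield a jointly inconsistent labeling: a hypothetical joint distribution over two labels, enumerated directly instead of via clique-tree calibration. Taking the most likely label at each position produces an assignment with probability zero, while MAP returns a single coherent one.

```python
from collections import defaultdict
from typing import Dict, Tuple

# Hypothetical joint distribution P(Y1, Y2 | x); unlisted assignments have probability 0.
joint: Dict[Tuple[str, str], float] = {
    ("PER", "PER"): 0.4,
    ("LOC", "LOC"): 0.3,
    ("LOC", "ORG"): 0.3,
}

def marginal(joint: Dict[Tuple[str, str], float], position: int) -> Dict[str, float]:
    """P(Y_position) by summing the joint over the other variable."""
    m: Dict[str, float] = defaultdict(float)
    for assignment, p in joint.items():
        m[assignment[position]] += p
    return dict(m)

per_position = tuple(max(marginal(joint, i).items(), key=lambda kv: kv[1])[0]
                     for i in range(2))
map_assignment = max(joint, key=joint.get)
print(per_position, joint.get(per_position, 0.0))  # ('LOC', 'PER') 0.0
print(map_assignment, joint[map_assignment])       # ('PER', 'PER') 0.4
```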
Mismatch of Objectives
• MAP inference optimizes LL = log P(Y | X)
• Actual performance metric is usually different (e.g., F1)
• Performance is best if we can get these two metrics to be relatively well aligned
  – If the MAP assignment gets significantly lower F1 than the ground truth, the model needs to be adjusted
• Very useful for debugging approximate MAP
  – If LL(y*) >> LL(yMAP): the algorithm found a local optimum
  – If LL(y*) << LL(yMAP): LL is a bad surrogate for the objective
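The debugging rule compares the model's log-likelihood of the ground truth y* with that of the assignment returned by approximate MAP, for a case where the MAP labeling scores much worse on F1. A minimal sketch; `margin` is a hypothetical tolerance standing in for the slide's ">>" and "<<".

```python
def diagnose_map(ll_gold: float, ll_map: float, margin: float = 1.0) -> str:
    """Diagnose a MAP result whose F1 is much worse than the ground truth's.

    ll_gold = log P(y* | x) and ll_map = log P(y_MAP | x) under the current model.
    """
    if ll_gold > ll_map + margin:
        # The model prefers the truth, but the search did not find it.
        return "search error: approximate MAP found a local optimum"
    if ll_gold < ll_map - margin:
        # The model prefers a worse labeling, so LL is a poor surrogate for F1.
        return "model error: LL is a bad surrogate for the objective"
    return "inconclusive"

print(diagnose_map(ll_gold=-10.0, ll_map=-20.0))
print(diagnose_map(ll_gold=-20.0, ll_map=-10.0))
```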
Richer Models
[Figure: two label chains, Y1–Y5 over X1–X5 for "Julian Assange accused the United" and Y101–Y105 over X101–X105 for "said Stephen, Assange's lawyer to"]
Summary
• Foundation I: Probabilistic model
  – Coherent treatment of uncertainty
  – Declarative representation:
    • separates model and inference
    • separates inference and learning
• Foundation II: Graphical model
  – Encode and exploit structure for compact representation and efficient inference
  – Allows modularity in updating the model