Semantic Role Labeling with Support Vector Machines
Yongjia Wang
An Intuitive Example
What the data looks like
General Ideas of SVM SRL
- Model-free classification: off-line machine learning for information retrieval. Uses linguistic information readily available from many standard tools (parsers, chunkers, ...), but still needs additional semantics-related linguistic knowledge to generate the final prediction.
- This doesn't come for free, of course: it needs manually or semi-automatically labeled training/testing data, plus other resources (WordNet, VerbNet, ...) to provide the predefined frames used to compile that data.
- Goes one step beyond syntactic structure, but is still shallow semantics; also called semantic parsing.
- Types:
  - Constituent-by-constituent (syntactic constituent): the approach taken in this project.
  - Relation-by-relation (dependency relation).
  - Word-by-word (finer grained).
  - Hybrid: combinations of multiple variants within the same type or across types; the final result is selected from alternatives with different 'confidence' scores, which requires global optimization. There are examples of this, but no standard approach.
SRL General Procedure
- Training data pruning: get rid of parsing errors and obtain unbiased training data, i.e. positive/negative examples for the binary classifiers.
- Argument identification: a binary classifier; can be tuned independently of classification.
- Argument classification: for n classes, train n binary classifiers (one per class) instead of a single n-class classifier. Each class can then be trained and tuned independently, which reduces the amount of data required and yields finer-grained information for post-processing (see the sketch after this list).
- Post-processing: resolve conflicts using knowledge, since the previous classifications are purely local. This global optimization step can be formalized more mathematically.
- Evaluation: precision & recall.
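A minimal sketch of the one-binary-classifier-per-class setup described above. The project used libSVM directly; here scikit-learn's SVC (a libSVM wrapper) stands in for it, and the feature matrix X, gold labels y, and label inventory are hypothetical placeholders for whatever the earlier pipeline stages produce:

```python
# One-vs-rest argument classification: one binary SVM per argument label.
import numpy as np
from sklearn.svm import SVC

def train_one_vs_rest(X, y, labels):
    """Train one binary classifier per argument label (e.g. ARG0, ARG1, ...)."""
    classifiers = {}
    for label in labels:
        binary_y = (y == label).astype(int)  # positives = this label, negatives = the rest
        clf = SVC(kernel='linear', probability=True)  # probabilities feed post-processing
        clf.fit(X, binary_y)
        classifiers[label] = clf
    return classifiers

def classify(classifiers, x):
    """Pick the label whose binary classifier is most confident about x."""
    scores = {label: clf.predict_proba(x.reshape(1, -1))[0, 1]
              for label, clf in classifiers.items()}
    return max(scores, key=scores.get)
```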
The Project I Did
- Pre-processing: parsing and other file processing; naive data pruning that picks enough positive and negative data for each label's classifier.
- Argument identification with libSVM (ignored here): simple binary classification.
- Argument classification with libSVM (the main SVM part): local, feature-based classification using libSVM; compared the tradeoff between performance and information gain.
- Post-processing (simplified): just take the classifier with the highest probability, adjusted by the label's background probability (sketched below); no conflict resolution or global optimization afterwards.
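A minimal sketch of this simplified post-processing step. The slides don't specify the exact form of the adjustment; dividing each classifier's probability by the label's background (prior) frequency, a likelihood-ratio-style score, is one plausible reading, and the numbers below are made up:

```python
# Pick the label whose classifier probability, adjusted by the label's
# background probability, is highest. The division by the prior is an
# assumed form of the adjustment, not taken from the original project.
def pick_label(probs, priors):
    """probs: {label: P(label | constituent)} from the binary classifiers.
    priors: {label: background frequency of the label in training data}."""
    return max(probs, key=lambda lab: probs[lab] / priors[lab])

# Illustrative example: ARG1 wins once background frequency is factored in.
probs  = {'ARG0': 0.60, 'ARG1': 0.55}
priors = {'ARG0': 0.30, 'ARG1': 0.15}
print(pick_label(probs, priors))  # -> 'ARG1'
```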
Issues: Huge Feature Space
Feature representation for prediction
- Option 1: encode color {red, green, blue} as {0, 1, 2}.
- Option 2: encode it as {(1,0,0), (0,1,0), (0,0,1)}.
Categorical features have no inherent ordering or distance (is red really closer to green than to blue?); any such relationship is just an artifact of the encoding. If a single numerical value is used, the classifier will misuse that spurious numerical information, and information is lost as the real signal gets overwhelmed by the arbitrariness of the encoding.
Bit-vector encoding makes all values orthogonal, but on the other hand greatly enlarges the feature space (see the sketch below).
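The two encodings side by side, as a small Python sketch (the 'color' feature is the slide's own toy example):

```python
# Option 1 imposes a spurious ordering (red < green < blue); Option 2
# (bit vector / one-hot) keeps the values orthogonal but triples the
# number of features for a 3-valued category.
colors = ['red', 'green', 'blue']

# Option 1: single numerical value per category
int_encoding = {c: i for i, c in enumerate(colors)}   # {'red': 0, 'green': 1, 'blue': 2}

# Option 2: bit vector (one-hot)
onehot = {c: [1 if i == j else 0 for j in range(len(colors))]
          for i, c in enumerate(colors)}              # {'red': [1, 0, 0], ...}
```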
Feature selection
- Cannot be done gradually: e.g. with 3127 distinct verbs, one has to decide whether to take on 3127 more features or none at all.
Issues: Data Pruning
- Data errors: parser errors and labeling errors.
- Performance: previous studies showed that good data pruning improves performance.
- Computational issues: cannot afford to train each classifier on all the data; instead, pick a subset containing enough positive and negative examples (see the sketch below).
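A minimal sketch of such a pruning step, assuming all positives are kept and negatives are capped at a fixed multiple of the positive count; the ratio and the data layout are illustrative, not values from the project:

```python
# Build a tractable, roughly balanced training set for one label's
# binary classifier by keeping all positives and subsampling negatives.
import random

def sample_training_data(examples, label, neg_ratio=2, seed=0):
    """examples: list of (features, gold_label) pairs."""
    positives = [e for e in examples if e[1] == label]
    negatives = [e for e in examples if e[1] != label]
    random.Random(seed).shuffle(negatives)
    return positives + negatives[:neg_ratio * len(positives)]
```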
Room for Further Improvement: Feature Reduction
- Grouping feature values: group verbs with similar semantics. Verb clustering is a separate issue that has been studied in its own right.
- Factorizing features: for a feature like 'Path', it is possible to factorize it into its component steps, rather than treating every path instance as an orthogonal value (see the sketch below).
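A minimal sketch of factorizing a parse-tree Path feature. The path syntax here ('^' for moving up the tree, '!' for moving down) and the helper name are illustrative assumptions; the point is that similar paths then share component features instead of being entirely orthogonal:

```python
# Decompose a full path string into its individual up/down steps,
# so 'VB^VP^S!NP' yields one feature per step instead of one atomic value.
import re

def factorize_path(path):
    """'VB^VP^S!NP' -> ['VB^VP', 'VP^S', 'S!NP'] (one feature per step)."""
    tokens = re.split(r'([\^!])', path)   # ['VB', '^', 'VP', '^', 'S', '!', 'NP']
    nodes, moves = tokens[0::2], tokens[1::2]
    return [nodes[i] + moves[i] + nodes[i + 1] for i in range(len(moves))]

print(factorize_path('VB^VP^S!NP'))  # ['VB^VP', 'VP^S', 'S!NP']
```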