Semantic Role Labeling with Support Vector Machines
Yongjia Wang
An Intuitive Example
What the data looks like
General Ideas of SVM SRL
- Model-free classification: off-line machine learning for information retrieval. Uses linguistic information readily available from many standard tools (parsers, chunkers, ...), but still needs additional semantics-related linguistic knowledge to generate the final prediction.
- This doesn't come for free, of course: it needs manually or semi-automatically labeled training/testing data, plus other resources (WordNet, VerbNet, ...) to provide the predefined frames used to compile that data.
- Goes one step beyond syntactic structure, but is still shallow semantics; also called semantic parsing.
- Types:
  - Constituent-by-constituent (syntactic constituent): the approach taken in this project.
  - Relation-by-relation (dependency relation).
  - Word-by-word (finer grained).
  - Hybrid: combinations of multiple variants within the same type or across types; the final result is selected from alternatives with different 'confidence' scores, which requires global optimization. There are examples of this, but no standard approach.
SRL General Procedure
- Training data pruning: get rid of parsing errors and obtain unbiased training data, i.e. positive/negative examples for the binary classifiers.
- Argument identification: a binary classifier; can be tuned independently of classification.
- Argument classification: for n classes, train n binary classifiers (one per class) instead of a single n-class classifier. Each class can then be trained and tuned independently, which reduces the amount of data required and yields finer-grained information for post-processing (see the sketch after this list).
- Post-processing: resolve conflicts using knowledge, since the previous classifications are purely local. This global optimization step can be formalized more mathematically.
- Evaluation: precision & recall.
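A minimal sketch of the one-binary-classifier-per-class setup described above. The project used libSVM directly; here scikit-learn's SVC (a libSVM wrapper) stands in for it, and the feature matrix X, gold labels y, and label inventory are hypothetical placeholders for whatever the earlier pipeline stages produce:

```python
# One-vs-rest argument classification: one binary SVM per argument label.
import numpy as np
from sklearn.svm import SVC

def train_one_vs_rest(X, y, labels):
    """Train one binary classifier per argument label (e.g. ARG0, ARG1, ...)."""
    classifiers = {}
    for label in labels:
        binary_y = (y == label).astype(int)  # positives = this label, negatives = the rest
        clf = SVC(kernel='linear', probability=True)  # probabilities feed post-processing
        clf.fit(X, binary_y)
        classifiers[label] = clf
    return classifiers

def classify(classifiers, x):
    """Pick the label whose binary classifier is most confident about x."""
    scores = {label: clf.predict_proba(x.reshape(1, -1))[0, 1]
              for label, clf in classifiers.items()}
    return max(scores, key=scores.get)
```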
The Project I Did
- Pre-processing: parsing and other file processing; naive data pruning that picks enough positive and negative data for each label's classifier.
- Argument identification with libSVM (ignored here): simple binary classification.
- Argument classification with libSVM (the main SVM part): local, feature-based classification using libSVM; compared the tradeoff between performance and information gain.
- Post-processing (simplified): just take the classifier with the highest probability, adjusted by the label's background probability (sketched below); no conflict resolution or global optimization afterwards.
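A minimal sketch of this simplified post-processing step. The slides don't specify the exact form of the adjustment; dividing each classifier's probability by the label's background (prior) frequency, a likelihood-ratio-style score, is one plausible reading, and the numbers below are made up:

```python
# Pick the label whose classifier probability, adjusted by the label's
# background probability, is highest. The division by the prior is an
# assumed form of the adjustment, not taken from the original project.
def pick_label(probs, priors):
    """probs: {label: P(label | constituent)} from the binary classifiers.
    priors: {label: background frequency of the label in training data}."""
    return max(probs, key=lambda lab: probs[lab] / priors[lab])

# Illustrative example: ARG1 wins once background frequency is factored in.
probs  = {'ARG0': 0.60, 'ARG1': 0.55}
priors = {'ARG0': 0.30, 'ARG1': 0.15}
print(pick_label(probs, priors))  # -> 'ARG1'
```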
Issues: Huge Feature Space
Feature representation for prediction
- Option 1: encode color {red, green, blue} as {0, 1, 2}.
- Option 2: encode it as {(1,0,0), (0,1,0), (0,0,1)}.
Categorical features have no inherent ordering or distance (is red really closer to green than to blue?); any such relationship is just an artifact of the encoding. If a single numerical value is used, the classifier will misuse that spurious numerical information, and information is lost as the real signal gets overwhelmed by the arbitrariness of the encoding.
Bit-vector encoding makes all values orthogonal, but on the other hand greatly enlarges the feature space (see the sketch below).
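The two encodings side by side, as a small Python sketch (the 'color' feature is the slide's own toy example):

```python
# Option 1 imposes a spurious ordering (red < green < blue); Option 2
# (bit vector / one-hot) keeps the values orthogonal but triples the
# number of features for a 3-valued category.
colors = ['red', 'green', 'blue']

# Option 1: single numerical value per category
int_encoding = {c: i for i, c in enumerate(colors)}   # {'red': 0, 'green': 1, 'blue': 2}

# Option 2: bit vector (one-hot)
onehot = {c: [1 if i == j else 0 for j in range(len(colors))]
          for i, c in enumerate(colors)}              # {'red': [1, 0, 0], ...}
```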
Feature selection
- Cannot be done gradually: e.g. with 3127 distinct verbs, one has to decide whether to take on 3127 more features or none at all.
Issues: Data Pruning
- Data errors: parser errors and labeling errors.
- Performance: previous studies showed that good data pruning improves performance.
- Computational issues: cannot afford to train each classifier on all the data; instead, pick a subset containing enough positive and negative examples (see the sketch below).
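A minimal sketch of such a pruning step, assuming all positives are kept and negatives are capped at a fixed multiple of the positive count; the ratio and the data layout are illustrative, not values from the project:

```python
# Build a tractable, roughly balanced training set for one label's
# binary classifier by keeping all positives and subsampling negatives.
import random

def sample_training_data(examples, label, neg_ratio=2, seed=0):
    """examples: list of (features, gold_label) pairs."""
    positives = [e for e in examples if e[1] == label]
    negatives = [e for e in examples if e[1] != label]
    random.Random(seed).shuffle(negatives)
    return positives + negatives[:neg_ratio * len(positives)]
```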
Room for Further Improvement: Feature Reduction
- Grouping feature values: group verbs with similar semantics. Verb clustering is a separate issue that has been studied in its own right.
- Factorizing features: for a feature like 'Path', it is possible to factorize it into its component steps, rather than treating every path instance as an orthogonal value (see the sketch below).
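A minimal sketch of factorizing a parse-tree Path feature. The path syntax here ('^' for moving up the tree, '!' for moving down) and the helper name are illustrative assumptions; the point is that similar paths then share component features instead of being entirely orthogonal:

```python
# Decompose a full path string into its individual up/down steps,
# so 'VB^VP^S!NP' yields one feature per step instead of one atomic value.
import re

def factorize_path(path):
    """'VB^VP^S!NP' -> ['VB^VP', 'VP^S', 'S!NP'] (one feature per step)."""
    tokens = re.split(r'([\^!])', path)   # ['VB', '^', 'VP', '^', 'S', '!', 'NP']
    nodes, moves = tokens[0::2], tokens[1::2]
    return [nodes[i] + moves[i] + nodes[i + 1] for i in range(len(moves))]

print(factorize_path('VB^VP^S!NP'))  # ['VB^VP', 'VP^S', 'S!NP']
```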