Semantic Role Labelling (SRL) for Information Extraction of Bio-medical Data

Posted on 21-Nov-2014


DESCRIPTION

Information Extraction (IE) is the automatic retrieval of specific types of information from natural-language text. IE plays an important role in the biomedical domain, where the body of knowledge is growing significantly. Relations between entities in this domain can support a variety of tasks. This thesis therefore focuses on extracting semantic information using semantic role labeling (SRL), aided by background knowledge sources such as ontologies.

TRANSCRIPT

1

Thesis Final Presentation

Semantic Role Labeling for Information Extraction of Bio-medical data

Shikha Jacob Mathew Vinith Varghese

School of Engineering, Jönköping University

03/26/2014

2

Agenda

Description of the problem

Purpose of this study

Methodology

IE System

Results

Conclusion & Future Work

3

Description of the problem

The biomedical domain is flooded with a large amount of information.

Research questions

• How can the performance of Natural Language Processing (NLP) components be improved so that they extract high-quality information from the biomedical domain?

• How can domain-specific knowledge be used to generate high-quality relations between entities in the domain, based on Semantic Role Labeling (SRL)?

[Diagram: extract relevant and useful information; structure it; manage it]

4

Purpose of this study

Develop a useful Information Extraction (IE) system for the biomedical domain that extracts relations between entities using information obtained from SRL.

This is accomplished by introducing two features:

• Named Entity Recognition (NER)

• Ontology

5

Methodology: Research Approach (1/2)

Design Science Research

Quantitative Evaluation

Adaptive Software Development

6

Methodology: Research Framework (2/2)

The general methodology of design science research

7

IE System - Framework (1/2)

8

IE System - Framework (2/2)

Relational Detector: the steps include:

• Dependency Parser (parser model)

• Semantic Role Labeler

9

Semantic Role Labeling (SRL)

• The SRL process includes: pre-processing, argument identification, argument classification and post-processing.

• Features used: word form, lemma, POS tag, head word and dependency label. Domain-specific features introduced: ontology and NER.
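The feature set above can be pictured as a function that builds one feature dictionary per candidate argument of a predicate. This is a minimal sketch, not the thesis implementation: the `Token` structure, the `ner` and `onto_type` fields, the extra `pred_lemma` and `position` features, and the example tags are all hypothetical, and it assumes the dependency parse, lemmas, POS tags, NER tags and ontology types were already produced upstream.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Token:
    form: str                 # surface word form
    lemma: str                # output of the lemmatizer
    pos: str                  # part-of-speech tag
    head: int                 # index of the syntactic head, -1 for the root
    deprel: str               # dependency label on the edge to the head
    ner: str = "O"            # hypothetical: tag from the domain-specific NER
    onto_type: str = "NONE"   # hypothetical: semantic type from the ontology


def argument_features(sent: List[Token], pred: int, arg: int) -> Dict[str, str]:
    """Feature dict for the candidate argument `arg` of the predicate at `pred`."""
    a, p = sent[arg], sent[pred]
    return {
        # general features named on the slide
        "word_form": a.form,
        "lemma": a.lemma,
        "pos": a.pos,
        "head_word": sent[a.head].form if a.head >= 0 else "ROOT",
        "deprel": a.deprel,
        # predicate context and relative position (common extras, assumed here)
        "pred_lemma": p.lemma,
        "position": "before" if arg < pred else "after",
        # domain-specific features
        "ner": a.ner,
        "onto_type": a.onto_type,
    }


# Hypothetical pre-processed sentence: "IL-2 activates T-cells"
sentence = [
    Token("IL-2", "IL-2", "NN", 1, "SBJ", ner="PROTEIN"),
    Token("activates", "activate", "VBZ", -1, "ROOT"),
    Token("T-cells", "T-cell", "NNS", 1, "OBJ", ner="CELL_TYPE"),
]
print(argument_features(sentence, pred=1, arg=0))
```

In a full system these dictionaries would be vectorised and fed to the argument identification and classification classifiers; adding the two domain-specific keys is what distinguishes the feature set from a generic one.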

10

SRL Example

A0: Agent; A1: Patient or Theme

11

Domain-Specific Named Entity Recognizer (NER)

Concepts used for NER:

• Conditional Random Field (CRF): a statistical modeling method for pattern recognition and structured prediction.

• Use: identifying argument boundaries

• Patterns from POS tags
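A linear-chain CRF tags each token (for example with B/I/O labels marking where an entity begins, continues or is absent) by choosing the label sequence with the highest total score of per-token and label-transition features. The sketch below shows only the decoding step (Viterbi) with hand-set, hypothetical weights that loosely mimic POS-tag patterns; in a real CRF the weights are learned from annotated text and the features are much richer. The `bio_spans` helper, also hypothetical, turns the labels into entity boundaries.

```python
import math
from typing import Dict, List, Optional, Tuple

LABELS = ["B", "I", "O"]  # BIO encoding: Begin, Inside, Outside an entity


def emission_score(word: str, pos: str, label: str) -> float:
    """Hypothetical hand-set weights standing in for learned feature weights."""
    if pos.startswith("NN"):  # POS pattern: nouns tend to be entity tokens
        score = {"B": 1.0, "I": 1.0, "O": -0.5}[label]
    else:
        score = {"B": -1.0, "I": -1.0, "O": 1.0}[label]
    if word[:1].isupper() and label == "B":  # capitalised words often start entities
        score += 0.5
    return score


# Hypothetical transition weights; O -> I is illegal in BIO, so it gets -inf.
TRANSITION: Dict[Tuple[str, str], float] = {(p, c): 0.0 for p in LABELS for c in LABELS}
TRANSITION[("O", "I")] = -math.inf
TRANSITION[("B", "I")] = 0.5
TRANSITION[("I", "I")] = 0.5


def viterbi(tokens: List[Tuple[str, str]]) -> List[str]:
    """Highest-scoring label sequence for (word, POS) tokens."""
    # best[i][y]: best score of any labelling of tokens[0..i] ending in label y
    best: List[Dict[str, float]] = []
    back: List[Dict[str, Optional[str]]] = []
    # a sequence may not start with I
    best.append({y: (-math.inf if y == "I" else 0.0) + emission_score(*tokens[0], y)
                 for y in LABELS})
    back.append({y: None for y in LABELS})
    for i in range(1, len(tokens)):
        scores: Dict[str, float] = {}
        pointers: Dict[str, Optional[str]] = {}
        for y in LABELS:
            prev = max(LABELS, key=lambda yp: best[-1][yp] + TRANSITION[(yp, y)])
            scores[y] = best[-1][prev] + TRANSITION[(prev, y)] + emission_score(*tokens[i], y)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # follow the back-pointers from the best final label
    y = max(LABELS, key=lambda lab: best[-1][lab])
    path = [y]
    for i in range(len(tokens) - 1, 0, -1):
        y = back[i][y]
        path.append(y)
    return path[::-1]


def bio_spans(labels: List[str]) -> List[Tuple[int, int]]:
    """Entity boundaries as [start, end) token spans."""
    spans, start = [], None
    for i, lab in enumerate(labels):
        if lab == "B":
            if start is not None:
                spans.append((start, i))
            start = i
        elif lab == "O":
            if start is not None:
                spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(labels)))
    return spans


tokens = [("The", "DT"), ("p53", "NN"), ("protein", "NN"), ("binds", "VBZ"), ("DNA", "NN")]
labels = viterbi(tokens)
print(labels)             # ['O', 'B', 'I', 'O', 'B']
print(bio_spans(labels))  # [(1, 3), (4, 5)]
```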

12

Domain-specific NER Example

13

Domain-Specific Ontology

• Conceptual knowledge organised in a computer-based representation.

• IE needs ontologies to interpret texts and extract relevant information.

• UMLS Metathesaurus and Semantic Network

• UMLS Semantic Types: broad categories for the concepts in the Metathesaurus

• Use: predicate identification via the Process/Function semantic types (ST)
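One way to use the ontology for predicate identification is to accept a word as a predicate when it is a verb, or when the ontology assigns its lemma a Process/Function semantic type, which also catches nominal predicates such as "apoptosis". This is a sketch under stated assumptions: the in-memory `ONTOLOGY` lexicon stands in for a real ontology lookup, and its entries and the `PROCESS_FUNCTION_TYPES` set are illustrative, not taken from the thesis or from an actual UMLS release.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical lexicon standing in for an ontology lookup (lemma -> semantic type).
# Entries are illustrative only.
ONTOLOGY: Dict[str, str] = {
    "phosphorylation": "Molecular Function",
    "expression": "Genetic Function",
    "apoptosis": "Cell Function",
    "p53": "Amino Acid, Peptide, or Protein",
}

# Semantic types treated as process/function predicates (assumed set).
PROCESS_FUNCTION_TYPES: Set[str] = {"Molecular Function", "Genetic Function", "Cell Function"}


def identify_predicates(tagged: List[Tuple[str, str]]) -> List[int]:
    """Indices of predicate candidates in a list of (lemma, POS tag) pairs."""
    predicates = []
    for i, (lemma, pos) in enumerate(tagged):
        is_verb = pos.startswith("VB")
        is_process = ONTOLOGY.get(lemma.lower()) in PROCESS_FUNCTION_TYPES
        if is_verb or is_process:
            predicates.append(i)
    return predicates


sentence = [("p53", "NN"), ("induce", "VBZ"), ("apoptosis", "NN"),
            ("in", "IN"), ("tumor", "NN"), ("cell", "NNS")]
print(identify_predicates(sentence))  # [1, 2]
```

Without the ontology check only "induce" would be found; the semantic-type lookup is what adds the nominal predicate, which is the kind of gain the next results slide reports.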

14

Results: Predicate Identification (1/2)

[Chart: Precision (P), Recall (R) and F1 for predicate identification, before and after adding the features; y-axis 0 to 100]

Evaluation criteria: Precision, Recall, F1-measure
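The three criteria follow the standard definitions: P = TP / (TP + FP), R = TP / (TP + FN) and F1 = 2PR / (P + R), where TP, FP and FN count true positives, false positives and false negatives. A minimal sketch that computes them from gold and predicted sets; the predicate indices in the example are made up for illustration.

```python
from collections.abc import Hashable
from typing import AbstractSet, Tuple


def precision_recall_f1(gold: AbstractSet[Hashable],
                        predicted: AbstractSet[Hashable]) -> Tuple[float, float, float]:
    tp = len(gold & predicted)  # correctly identified items
    fp = len(predicted - gold)  # spurious items
    fn = len(gold - predicted)  # missed items (false negatives)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold_predicates = {1, 2, 7}       # hypothetical gold predicate positions
predicted_predicates = {1, 2, 5}  # hypothetical system output
print(precision_recall_f1(gold_predicates, predicted_predicates))  # P = R = F1 = 2/3 here
```

Because F1 is the harmonic mean, a system that finds more predicates (higher recall) only improves F1 if precision does not drop too far.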

15

Results: Predicate Identification (2/2)

• More biomedical predicates identified

• Drawbacks: a few false negatives; missing predicates

16

Results: Argument Identification (1/2)

[Chart: Precision (P), Recall (R) and F1 for argument identification, before and after adding the features; y-axis 0 to 100]

17

Results: Argument Identification (2/2)

• The identified predicate boundary is too small, for example:

[John is playing with a bat] and a ball.

• Some predicates are not identified
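In dependency-based SRL, a boundary is often taken to be the span covered by a head word's subtree. The sketch below is an assumption about how such spans can be computed, not the thesis code: it shows that when coordinated dependents (`conj`, `cc`) are not followed, "and a ball" is cut off, much like the truncated bracket in the example above. The head indices and labels are a hand-made, hypothetical parse, and the helper assumes a projective tree so that each subtree is contiguous.

```python
from typing import Iterable, List, Tuple


def subtree_span(heads: List[int], deprels: List[str], root: int,
                 skip: Iterable[str] = ()) -> Tuple[int, int]:
    """Token span [start, end) covered by `root` and its dependents.

    Dependents attached with a label in `skip` (and everything below them)
    are left out. Assumes a projective parse, so the subtree is contiguous.
    """
    skipped = set(skip)
    children: List[List[int]] = [[] for _ in heads]
    for i, h in enumerate(heads):
        if h >= 0:
            children[h].append(i)
    nodes, stack = {root}, [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if deprels[child] not in skipped:
                nodes.add(child)
                stack.append(child)
    return min(nodes), max(nodes) + 1


# Hypothetical parse; the second conjunct attaches to the first one.
words = "John is playing with a bat and a ball .".split()
heads = [2, 2, -1, 2, 5, 3, 5, 8, 5, 2]
deprels = ["nsubj", "aux", "root", "prep", "det", "pobj", "cc", "det", "conj", "punct"]

start, end = subtree_span(heads, deprels, root=5)
print(" ".join(words[start:end]))  # a bat and a ball
start, end = subtree_span(heads, deprels, root=5, skip={"conj", "cc"})
print(" ".join(words[start:end]))  # a bat
```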

18

Conclusion (1/2)

High-quality information extraction

More predicates, more arguments, more relations

Benefits for biomedical researchers:

• Integrated information for further study and investigation

• Managed and structured information

• Easy access

[Diagram: Predicate + Argument + Argument = RELATIONS]
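The idea that a predicate and its arguments together yield a relation can be sketched as turning each labeled predicate-argument frame into a (subject, relation, object) triple, taking A0 (Agent) as the subject and A1 (Patient or Theme) as the object. The `Frame` class, the A0/A1-only mapping and the example frames are assumptions for illustration; real frames carry more roles, and frames missing either argument are simply skipped here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Frame:
    predicate: str
    args: Dict[str, str] = field(default_factory=dict)  # role label -> argument text


def frames_to_relations(frames: List[Frame]) -> List[Tuple[str, str, str]]:
    """Build (A0, predicate, A1) triples from frames that have both roles."""
    relations = []
    for frame in frames:
        if "A0" in frame.args and "A1" in frame.args:
            relations.append((frame.args["A0"], frame.predicate, frame.args["A1"]))
    return relations


frames = [
    Frame("induces", {"A0": "p53", "A1": "apoptosis"}),
    Frame("expressed", {"A1": "BRCA1"}),  # no A0, so no triple
]
print(frames_to_relations(frames))  # [('p53', 'induces', 'apoptosis')]
```

This also shows why missing predicates hurt: every predicate that is not identified removes a whole frame, and with it every relation that frame could have produced.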

19

Conclusion (2/2)

Drawbacks

Speed

Missing predicates

False Negatives

Small predicate boundary

20

Further investigation

Feature Engineering based on context of the text

Named entity classification (NER)

How to introduce features without compromising system performance

Categories, relationships, synonyms

Clause boundary / prepositional attachments

For making the system specific

Speed/Performance

Predicate boundary identification

Better results from ontology information

21

Thank You!!

Questions
