Post on 21-Dec-2015
![Page 1: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/1.jpg)
1
Classification of Semantic Relations in Noun Compounds using MeSH
Marti Hearst, Barbara Rosario
SIMS, UC Berkeley
![Page 2: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/2.jpg)
2
LINDI Project Synopsis
Goal: Extract semantics from text
Method: Statistical corpus analysis
Focus: Biomedical text
  Interesting inferences (Swanson)
  Rich lexical resources
  Difficult NLP problems
Noun Compounds
![Page 3: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/3.jpg)
3
Noun Compounds (NCs)
Any sequence of nouns that itself functions as a noun:
  asthma hospitalizations
  asthma hospitalization rates
  bone marrow aspiration needle
  health care personnel hand wash
Technical text is rich with NCs, e.g.:
  "Open-labeled long-term study of the subcutaneous sumatriptan efficacy and tolerability in acute migraine treatment."
![Page 4: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/4.jpg)
4
NCs: 3 computational tasks (Lauer & Dras ’94)
Identification
Syntactic analysis (attachments)
  Baseline headache frequency
  Tension headache patient
Semantic analysis
  Headache treatment: treatment for headache
  Corticosteroid treatment: treatment that uses corticosteroid
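Syntactic analysis means choosing among the possible binary groupings of a compound. As a minimal sketch (not the authors' method), the hypothetical function `bracketings` enumerates every binary bracketing of a noun sequence. A three-noun compound such as *baseline headache frequency* has exactly two candidates, and the count grows with the Catalan numbers (5 for four nouns).

```python
def bracketings(nouns):
    """Return every binary bracketing of a noun sequence as a nested string."""
    if len(nouns) == 1:
        return [nouns[0]]
    results = []
    # Split at every position; bracket each side recursively and combine.
    for split in range(1, len(nouns)):
        for left in bracketings(nouns[:split]):
            for right in bracketings(nouns[split:]):
                results.append(f"[{left} {right}]")
    return results


for candidate in bracketings(["baseline", "headache", "frequency"]):
    print(candidate)
# [baseline [headache frequency]]
# [[baseline headache] frequency]
```

Picking the right candidate still requires lexical or semantic evidence. The enumeration only defines the search space.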
![Page 5: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/5.jpg)
5
NC Semantic Relations
Linguistic theories regarding the nature of the relations between constituents in NCs conflict with one another:
J. Levi ’78
P. Downing ’77
B. Warren ’78
![Page 6: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/6.jpg)
6
NC Semantic relations
38 relations found by iterative refinement based on 2245 NCs
Goals:
  More specific than case roles
  General enough to aid coverage
  Allow for domain-specific relations
![Page 7: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/7.jpg)
7
Semantic relations: Examples
Frequency/time of: influenza season, headache interval
Measure of: relief rate, asthma mortality, hospital survival
Instrument: aciclovir therapy, laser irradiation, aerosol treatment
“Purpose”: headache drugs, voice therapy, influenza treatment
Defect: hormone deficiency, csf fistulas, gene mutation
Inhibitor: adrenoreceptor blockers, influenza prevention
![Page 8: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/8.jpg)
8
Multi-class Assignment
Some NCs can be described by more than one semantic relation:
  eyelid abnormalities: location and defect
  food allergy: cause and activator
  cell growth: change and activity
  tumor regression: change and ending/reduction
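Because an NC may carry more than one relation, gold labels are naturally stored as sets. The slides do not say how multiple labels are scored; this sketch *assumes* a prediction counts as correct when it matches any of the NC's gold relations. The dictionary contents come from the slide; `GOLD` and `is_correct` are hypothetical names.

```python
# Gold labels from the slide; each NC maps to a set of relations.
GOLD = {
    "eyelid abnormalities": {"location", "defect"},
    "food allergy": {"cause", "activator"},
    "cell growth": {"change", "activity"},
    "tumor regression": {"change", "ending/reduction"},
}


def is_correct(nc, predicted_relation, gold=GOLD):
    """Assumed scoring rule: correct if the prediction is any gold relation."""
    return predicted_relation in gold.get(nc, set())


print(is_correct("food allergy", "activator"))  # True
print(is_correct("cell growth", "defect"))      # False
```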
![Page 9: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/9.jpg)
9
Extraction of NCs
1. Titles and abstracts from Medline (medical bibliographic database)
2. Part-of-speech tagger
3. Extraction of sequences of units tagged as nouns
4. Collection of 2245 NCs with 2 nouns
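Steps 2 to 4 can be sketched as follows. This assumes the tagger emits Penn Treebank tags (`NN`, `NNS`, `NNP`, `NNPS`). The tagged sentence is hard-coded and made up for illustration rather than produced by a real tagger, and `noun_sequences` and `extract_noun_compounds` are hypothetical names.

```python
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}  # Penn Treebank noun tags (assumed tagset)


def noun_sequences(tagged):
    """Yield maximal runs of consecutive tokens tagged as nouns."""
    run = []
    for word, tag in tagged:
        if tag in NOUN_TAGS:
            run.append(word.lower())
        else:
            if run:
                yield run
            run = []
    if run:
        yield run


def extract_noun_compounds(tagged, length=2):
    """Keep only noun runs of exactly `length` nouns (two, as on the slide)."""
    return [" ".join(run) for run in noun_sequences(tagged) if len(run) == length]


sentence = [("Effect", "NN"), ("of", "IN"), ("aerosol", "NN"), ("treatment", "NN"),
            ("on", "IN"), ("asthma", "NN"), ("mortality", "NN"), (".", ".")]
print(extract_noun_compounds(sentence))  # ['aerosol treatment', 'asthma mortality']
```

Requiring runs to be *maximal* keeps a two-noun compound from being pulled out of the middle of a longer one.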
![Page 10: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/10.jpg)
10
Models
Lexical (words): headache pain
Class-based model using MeSH descriptors at several levels of description:
MeSH 2: C23 G11
MeSH 3: C23.888 G11.561
MeSH 4: C23.888.592 G11.561.796
MeSH 5: C23.888.592 G11.561.796
MeSH 6: C23.888.592.612 G11.561.796.444
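The class-based model replaces each noun with a prefix of its MeSH tree number. A minimal sketch, *assuming* (from the MeSH 2 to MeSH 4 rows) that "MeSH n" keeps the first n components with the tree letter counted as one. Codes shorter than the requested level are returned whole. `mesh_components` and `truncate` are hypothetical helpers.

```python
import re

MESH_CODE = re.compile(r"([A-Z])(\d+(?:\.\d+)*)")


def mesh_components(tree_number):
    """Split 'C23.888.592' into ['C', '23', '888', '592']."""
    match = MESH_CODE.fullmatch(tree_number)
    if match is None:
        raise ValueError(f"not a MeSH tree number: {tree_number!r}")
    return [match.group(1)] + match.group(2).split(".")


def truncate(tree_number, level):
    """Keep the first `level` components; shorter codes are returned whole."""
    parts = mesh_components(tree_number)[:level]
    return parts[0] + ".".join(parts[1:])


headache, pain = "C23.888.592.612.441", "G11.561.796.444"
for level in range(2, 7):
    print(f"MeSH {level}:", truncate(headache, level), truncate(pain, level))
```

Coarser levels let unseen nouns share features with seen ones; finer levels keep more distinctions.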
![Page 11: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/11.jpg)
11
MeSH Tree Structures
1. Anatomy [A]
2. Organisms [B]
3. Diseases [C]
4. Chemicals and Drugs [D]
5. Analytical, Diagnostic and Therapeutic Techniques and Equipment [E]
6. Psychiatry and Psychology [F]
7. Biological Sciences [G]
8. Physical Sciences [H]
9. Anthropology, Education, Sociology and Social Phenomena [I]
10. Technology and Food and Beverages [J]
11. Humanities [K]
12. Information Science [L]
13. Persons [M]
14. Health Care [N]
15. Geographic Locations [Z]
![Page 12: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/12.jpg)
12
MeSH Tree Structures
1. Anatomy [A]
  Body Regions [A01] +
  Musculoskeletal System [A02]
  Digestive System [A03] +
  Respiratory System [A04] +
  Urogenital System [A05] +
  Endocrine System [A06] +
  Cardiovascular System [A07] +
  Nervous System [A08] +
  Sense Organs [A09] +
  Tissues [A10] +
  Cells [A11] +
  Fluids and Secretions [A12] +
  Animal Structures [A13] +
  Stomatognathic System [A14]
  (…..)
Body Regions [A01]
  Abdomen [A01.047]
    Groin [A01.047.365]
    Inguinal Canal [A01.047.412]
    Peritoneum [A01.047.596] +
    Umbilicus [A01.047.849]
  Axilla [A01.133]
  Back [A01.176] +
  Breast [A01.236] +
  Buttocks [A01.258]
  Extremities [A01.378] +
  Head [A01.456] +
  Neck [A01.598]
  (….)
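The hierarchy is implicit in the dotted tree numbers: dropping the last component gives the parent. The sketch below uses only the Body Regions fragment listed above. `NODES`, `parent`, `children`, and `ancestors` are hypothetical names, and the lookup covers nothing beyond these entries.

```python
# Tree numbers copied from the slide (a small fragment of MeSH).
NODES = {
    "A01": "Body Regions",
    "A01.047": "Abdomen",
    "A01.047.365": "Groin",
    "A01.047.412": "Inguinal Canal",
    "A01.047.596": "Peritoneum",
    "A01.047.849": "Umbilicus",
    "A01.133": "Axilla",
    "A01.176": "Back",
}


def parent(tree_number):
    """Drop the last dotted component; undotted codes have no parent here."""
    head, sep, _ = tree_number.rpartition(".")
    return head if sep else None


def children(tree_number, nodes=NODES):
    """Names of nodes whose parent is `tree_number`, in tree-number order."""
    return [nodes[t] for t in sorted(nodes) if parent(t) == tree_number]


def ancestors(tree_number, nodes=NODES):
    """Names of all ancestors present in `nodes`, from the root down."""
    chain = []
    current = parent(tree_number)
    while current is not None:
        if current in nodes:
            chain.append(nodes[current])
        current = parent(current)
    return list(reversed(chain))


print(parent("A01.047.365"))     # A01.047
print(children("A01.047"))       # ['Groin', 'Inguinal Canal', 'Peritoneum', 'Umbilicus']
print(ancestors("A01.047.365"))  # ['Body Regions', 'Abdomen']
```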
![Page 13: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/13.jpg)
13
Mapping Nouns to MeSH Concepts
headache recurrence: C23.888.592.612.441 C23.550.291.937
headache pain: C23.888.592.612.441 G11.561.796.444
breast cancer cells: A01.236 C04 A11
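Mapping an NC to MeSH amounts to a per-noun lookup. This sketch uses a hypothetical `LEXICON` assembled only from the three examples above. Real MeSH descriptors can carry several tree numbers; for simplicity, this sketch keeps one per noun. Nouns outside the lexicon map to `None`.

```python
# Hypothetical lexicon assembled from the three examples on the slide.
LEXICON = {
    "headache": "C23.888.592.612.441",
    "recurrence": "C23.550.291.937",
    "pain": "G11.561.796.444",
    "breast": "A01.236",
    "cancer": "C04",
    "cells": "A11",
}


def map_compound(nc, lexicon=LEXICON):
    """Map each noun of an NC to its MeSH tree number, or None if not covered."""
    return tuple(lexicon.get(noun) for noun in nc.lower().split())


print(map_compound("headache pain"))        # ('C23.888.592.612.441', 'G11.561.796.444')
print(map_compound("breast cancer cells"))  # ('A01.236', 'C04', 'A11')
print(map_compound("voice therapy"))        # (None, None)
```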
![Page 14: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/14.jpg)
14
Levels of Description: headache pain (C23.888.592.612.441 G11.561.796.444)
Only tree: C G; C (Diseases), G (Biological Sciences)
Level 1: C23 G11; C23 (Diseases: Pathological Conditions), G11 (Biological Sciences: Musculoskeletal, Neural, and Ocular Physiology)
Level 2: C23.888 G11.561; C23.888 (Diseases: Pathological Conditions: Signs and Symptoms), G11.561 (Biological Sciences: Musculoskeletal, Neural, and Ocular Physiology: Nervous System Physiology)
Level 3: C23.888.592 G11.561.796; C23.888.592 (Diseases: Pathological Conditions: Signs and Symptoms: Neurologic Manifestations), G11.561.796 (Biological Sciences: Musculoskeletal, Neural, and Ocular Physiology: Nervous System Physiology: Sensation)
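The slides do not specify how the chosen level is presented to the classifier. One plausible encoding, offered purely as an assumption, truncates each noun's code to the level and then one-hot encodes the two categories and concatenates them. Levels follow this slide's numbering (level 0 is the tree letter, level 1 is `C23`). `truncate`, `build_vocab`, and `encode` are hypothetical helpers.

```python
def truncate(tree_number, level):
    """Tree letter plus `level` numeric parts: level 0 -> 'C', level 1 -> 'C23'."""
    letter, numbers = tree_number[0], tree_number[1:].split(".")
    return letter + ".".join(numbers[:level])


def build_vocab(codes, level):
    """Sorted list of distinct truncated categories seen in `codes`."""
    return sorted({truncate(code, level) for code in codes})


def encode(pair, vocab, level):
    """Concatenate one-hot vectors for the first and second noun's category."""
    vector = []
    for code in pair:
        one_hot = [0] * len(vocab)
        category = truncate(code, level)
        if category in vocab:  # categories outside the vocabulary stay all-zero
            one_hot[vocab.index(category)] = 1
        vector.extend(one_hot)
    return vector


pair = ("C23.888.592.612.441", "G11.561.796.444")  # headache pain
vocab = build_vocab(pair, level=1)                 # ['C23', 'G11']
print(encode(pair, vocab, level=1))                # [1, 0, 0, 1]
```

In practice the vocabulary would be built from all training NCs at the chosen level, so the vector length grows as the level gets finer.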
![Page 15: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/15.jpg)
15
Classification Task & Method
Multi-class (18) classification problem
Multi-layer neural networks classify across all relations simultaneously.
Evaluation distinguishes between:
  Seen: NCs in which one or both words appeared in the training set
  Unseen: NCs in which neither word appeared in the training set
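Classifying "across all relations simultaneously" can be pictured as one output score per relation, normalized and sorted. This sketch *assumes* a softmax output layer, which the slides do not state. The relation names come from the examples slide, but the scores are invented for illustration. `softmax` and `rank_relations` are hypothetical names.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of raw output scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rank_relations(scores, relations):
    """(relation, probability) pairs sorted from most to least probable."""
    probs = softmax(scores)
    order = sorted(range(len(relations)), key=lambda i: probs[i], reverse=True)
    return [(relations[i], round(probs[i], 3)) for i in order]


# Made-up scores for four of the 18 relations, for illustration only.
relations = ["Purpose", "Instrument", "Defect", "Frequency/time of"]
scores = [2.0, 1.0, 0.1, -1.0]
print(rank_relations(scores, relations))
```

Ranking relations instead of keeping only the top one is what makes the "correct answer in first two / first three" measures possible.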
![Page 16: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/16.jpg)
16
Accuracy for 18-way Classification
Training: 855 NCs (50%)
Testing: 805 NCs (75 unseen)
Correct answer ranked first: 61%-62%
Correct answer in first two: 71%-73%
Correct answer in first three: 76%-78%
[Chart: accuracy of the Lexical and MeSH models compared with a baseline that guesses the most frequent class]
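The three accuracy figures above are top-k accuracies for k = 1, 2, 3, and the baseline always predicts the training set's most frequent relation. This is a minimal sketch of both measures. `top_k_accuracy` and `majority_baseline` are hypothetical names, and the rankings are made-up data.

```python
from collections import Counter


def top_k_accuracy(ranked_predictions, gold, k):
    """Fraction of items whose gold relation is among the first k ranked guesses."""
    hits = sum(1 for ranked, answer in zip(ranked_predictions, gold) if answer in ranked[:k])
    return hits / len(gold)


def majority_baseline(train_labels, test_labels):
    """Accuracy of always guessing the most frequent training relation."""
    most_common = Counter(train_labels).most_common(1)[0][0]
    return sum(label == most_common for label in test_labels) / len(test_labels)


# Made-up rankings for four test NCs.
ranked = [
    ["Purpose", "Instrument", "Defect"],
    ["Instrument", "Purpose", "Defect"],
    ["Defect", "Frequency/time of", "Purpose"],
    ["Measure of", "Defect", "Instrument"],
]
gold = ["Purpose", "Purpose", "Purpose", "Instrument"]
for k in (1, 2, 3):
    print(k, top_k_accuracy(ranked, gold, k))  # 0.25, then 0.5, then 1.0
```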
![Page 17: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/17.jpg)
17
Accuracies for 18-way classification: generalization on unseen NCs
Training: 73 NCs (5%)
Testing: 1587 NCs (95%; 810 unseen)
[Chart: accuracy of the MeSH and Lexical models on all test NCs and on unseen test NCs only]
![Page 18: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/18.jpg)
18
Accuracies by Unseen Noun
Training: 73 NCs (5%)
Testing: 1587 NCs (95%; 810 unseen)
Case 1: first N unseen
Case 2: second N unseen
Case 3: both N seen
Case 4: neither N seen
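The four cases depend only on whether each noun occurred in any training NC. In this sketch, `unseen_case` is a hypothetical helper and the two-item training list is a toy, built from example NCs on the earlier slides. Case 4 corresponds to the "unseen" NCs and cases 1 to 3 to the "seen" ones.

```python
def unseen_case(nc, training_vocab):
    """Case number for a two-noun NC given the set of nouns seen in training."""
    first, second = nc.split()
    first_seen, second_seen = first in training_vocab, second in training_vocab
    if not first_seen and second_seen:
        return 1  # first N unseen
    if first_seen and not second_seen:
        return 2  # second N unseen
    if first_seen and second_seen:
        return 3  # both N seen
    return 4      # neither N seen


training = ["headache treatment", "influenza season"]
training_vocab = {noun for nc in training for noun in nc.split()}
for nc in ["asthma treatment", "headache interval", "influenza treatment", "voice therapy"]:
    print(nc, unseen_case(nc, training_vocab))  # cases 1, 2, 3, 4 in order
```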
![Page 19: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/19.jpg)
19
Accuracy for each relation
![Page 20: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/20.jpg)
20
Accuracy for sample relations
Produces (genetic)
Test set examples: thymidine allele, tumor dna, csf mrna, acetylase gene, virion rna (…)
![Page 21: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/21.jpg)
21
Accuracy for sample relations
Frequency/time of
Test set: disease recurrence, headache recurrence, enterovirus season, influenza season, mosquito season, pollen season, disease stage, transcription stage, drive time, injection time, ischemia time, travel time
![Page 22: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/22.jpg)
22
Accuracy for sample relations
Purpose
Test set: varicella vaccine, tb vaccines, poliovirus vaccine, influenza vaccination, influenza immunization, abscess drainage, acne therapy, asthma therapy, asthma treatment, carcinogen treatment, disease treatment, hiv treatment
![Page 23: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/23.jpg)
23
Related work
Finin (1980): detailed AI analysis, hand-coded
Rindflesch et al. (2000): hand-coded rule base to extract certain types of assertions
![Page 24: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/24.jpg)
24
Related work
Vanderwende (1994): automatically extracts semantic information from an on-line dictionary; manipulates a set of handwritten rules; 13 classes; 52% accuracy
Lapata (2000): classifies nominalizations into a subject/object binary distinction; 80% accuracy
Lauer (1995): probabilistic model; 8 classes; 47% accuracy
![Page 25: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/25.jpg)
25
Related work: Prepositional Phrase Attachment
The problem:
  Eat spaghetti with a fork / Eat spaghetti with sauce (V N1 P N2)
  Attachment/association, not semantics
Approaches:
  Word occurrences (Hindle & Rooth ’93)
  Using a lexical hierarchy:
    Conceptual association (Resnik ’93, Resnik & Hearst ’93)
    Transformation-based (Brill & Resnik ’94)
    MDL to find optimal tree cut (Li & Abe ’98)
LINDI: use ML techniques to determine the appropriate level of the lexical hierarchy and classify into semantic relations
![Page 26: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/26.jpg)
26
Conclusions
A simple method for assigning semantic relations to noun compounds
  Does not require complex hand-coded rules
  Does make use of existing lexical resources
High accuracy levels for an 18-way class assignment
  Small training set gets ~60% accuracy on mixed seen and unseen words
  Tiny training set (73 NCs) gets ~40% accuracy on entirely unseen words
Off-the-shelf, unoptimized ML algorithms
![Page 27: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/27.jpg)
27
Future work
Analysis of cases where it doesn’t work
NCs with > 2 terms
How to generalize patterns found for noun compounds to other syntactic structures?
How can we best formally represent semantics?
How can we deal with non-medical words? Should we use other ontologies (e.g., WordNet)?
![Page 28: 1 Classification of Semantic Relations in Noun Compounds using MeSH Marti Hearst, Barbara Rosario SIMS, UC Berkeley](https://reader030.vdocuments.us/reader030/viewer/2022032521/56649d585503460f94a373bf/html5/thumbnails/28.jpg)
28
Using Relations
Eventual plan: combine relations with constituents’ ontology memberships. Examples:
Instrument_2 (biopsy,needle) -> Instrument_2(Diagnostic, Tool)
Procedure(brain,biopsy) -> Procedure(Anatomical-Element, Diagnostic)
Procedure(tumor, marker) -> Procedure(Disease-element, Indicator)
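The lifting step in these examples replaces each argument with its ontology class. This is a minimal sketch of that step. `ONTOLOGY_CLASS` is a hypothetical lookup table filled only with the memberships shown above, and `generalize` is a hypothetical name. It is not the project's implementation.

```python
# Ontology memberships as listed on the slide (hypothetical lookup table).
ONTOLOGY_CLASS = {
    "biopsy": "Diagnostic",
    "needle": "Tool",
    "brain": "Anatomical-Element",
    "tumor": "Disease-element",
    "marker": "Indicator",
}


def generalize(relation, first, second, classes=ONTOLOGY_CLASS):
    """Lift Relation(first, second) to Relation(class(first), class(second))."""
    return f"{relation}({classes[first]}, {classes[second]})"


print(generalize("Instrument_2", "biopsy", "needle"))  # Instrument_2(Diagnostic, Tool)
print(generalize("Procedure", "brain", "biopsy"))      # Procedure(Anatomical-Element, Diagnostic)
```

Because "biopsy" maps to the same class in both relations, the generalized patterns can pool evidence across different compounds.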