Semantic Indexing with Typed Terms using Rapid Annotation
16th of August 2005
TKE-05 Workshop on Semantic Indexing, Copenhagen
Chris Biemann
University of Leipzig
Outline
• The benefits of typed terms and relations
• Alleviating the ontology bottleneck
• Rapid annotation
• Sources for annotation candidates
• Annotation tools
• Case study: Annotation of „Deutscher Wortschatz“
• Conclusion
Typed terms and relations
The bag of words model treats all terms equally
• Document similarity based on all terms
• No views on data possible

Typed terms and relations:
• Multiple views on documents w.r.t. types
• Document similarity restricted to types and augmented by relations
• Enables some tasks of Question Answering
Motivating example: untyped
Documents:
1. The government official A. Smith signed a contract over the purchase of 100 tanks from weapon manufacturer B. Miller.
2. „Weapon sales increased“, a government official stated, „especially tanks sell well“
3. A holiday cruise on a yacht invites to take photos of seagulls.
4. The photos show A. Smith on a cruise with B. Miller‘s yacht.
Similarity of terms (number of shared terms):
       Doc 1  Doc 2  Doc 3  Doc 4
Doc 1  -
Doc 2  3      -
Doc 3  0      0      -
Doc 4  2      0      3      -
Clustering:
[Figure: clustering of documents 1, 2, 3 and 4]
Motivating example: type PERSON
Documents:
1. The government official A. Smith signed a contract over the purchase of 100 tanks from weapon manufacturer B. Miller.
2. „Weapon sales increased“, a government official stated, „especially tanks sell well“
3. A holiday cruise on a yacht invites to take photos of seagulls.
4. The photos show A. Smith on a cruise with B. Miller‘s yacht.
Similarity of terms (number of shared terms of type PERSON):
       Doc 1  Doc 2  Doc 3  Doc 4
Doc 1  -
Doc 2  0      -
Doc 3  0      0      -
Doc 4  2      0      0      -
Clustering:
[Figure: clustering of documents 1, 2, 3 and 4]
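The two similarity tables can be retraced with a short sketch. It assumes that similarity is simply the number of shared terms, and the term lists and the PERSON typing below are hand-picked from the four example documents for illustration, not the output of a real term extractor.

```python
from itertools import combinations

# Hand-picked terms per document (an assumption made for illustration).
DOC_TERMS = {
    1: {"government official", "A. Smith", "contract", "tanks", "weapon", "B. Miller"},
    2: {"weapon", "government official", "tanks"},
    3: {"cruise", "yacht", "photos", "seagulls"},
    4: {"photos", "A. Smith", "cruise", "B. Miller", "yacht"},
}

# Hypothetical global type assignment: only the two person names are typed.
TERM_TYPES = {"A. Smith": "PERSON", "B. Miller": "PERSON"}


def shared_terms(doc_a, doc_b, term_type=None):
    """Count terms shared by two documents, optionally restricted to one type."""
    common = DOC_TERMS[doc_a] & DOC_TERMS[doc_b]
    if term_type is not None:
        common = {t for t in common if TERM_TYPES.get(t) == term_type}
    return len(common)


def similarity_table(term_type=None):
    """Return {(doc_a, doc_b): shared term count} for all document pairs."""
    return {
        (a, b): shared_terms(a, b, term_type)
        for a, b in combinations(sorted(DOC_TERMS), 2)
    }


if __name__ == "__main__":
    print("untyped:", similarity_table())
    print("PERSON: ", similarity_table("PERSON"))
```

Restricting the count to PERSON leaves only the link between documents 1 and 4, which is the "view" on the data that the untyped model cannot offer.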
The ontology bottleneck
• Semantic Web people believe that annotation with ontology relations will enable semantic search, ...
• Annotation: Choose an ontology, label all instances in the document

Problems:
• New documents have to be annotated all over again
• Merging of ontologies
• Despite tools, users are reluctant to annotate their documents
[Figure: documents Doc 1 ... Doc n, each with its own annotation Anno 1 ... Anno n, a merged ontology and an interface]
Centralized annotation
• Types and relations for terms are assigned globally and once and for all.
• No (logically grounded, consistent) ontology, but a free collection of types and relations suited to the problem
• Annotation is done for document collections
[Figure: a document collection Doc 1 ... Doc n with a single annotation and an interface]
Generating Candidates for Annotation
• Given N terms from the collection, it is not feasible to present N² pairs to an annotator. Most of the pairs will not be related.
• Needed: a method that produces terms with similar types and related pairs at a high rate

Method here:
• Co-occurrence statistics: pairs of terms that occur significantly often together in sentences/documents
• Co-occurrences of higher orders: pairs of terms that have similar co-occurrence statistics

Co-occurrences reflect syntagmatic and paradigmatic relations; the former are ruled out in higher orders.
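A minimal sketch of sentence-based candidate generation follows. It counts how often two terms share a sentence and keeps pairs above a raw frequency threshold. The threshold stands in for a proper significance test, and the whitespace tokenizer, the toy sentences and the names `cooccurrence_counts` and `candidate_pairs` are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences):
    """Count, for every unordered term pair, the sentences containing both."""
    counts = Counter()
    for sentence in sentences:
        terms = sorted(set(sentence.lower().split()))
        for pair in combinations(terms, 2):
            counts[pair] += 1
    return counts


def candidate_pairs(sentences, min_count=2):
    """Keep only pairs that co-occur in at least `min_count` sentences."""
    counts = cooccurrence_counts(sentences)
    return {pair for pair, c in counts.items() if c >= min_count}


if __name__ == "__main__":
    toy = [
        "the cat chased the mouse",
        "the dog chased the cat",
        "my cat likes food",
        "the dog likes food",
    ]
    for pair in sorted(candidate_pairs(toy)):
        print(pair)
```

Only pairs that actually share sentences are ever counted, so the annotator sees far fewer than N² pairs.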
The cats and dogs example
cat co-occurrences: dog, her, food, pet, litter, she, burglar, animal, my, mouse, feline, Garfield, like, Cat, bag
cat order 2: cats, pet, dog, animals, animal, dogs, pets, neutered, her, she, Synindex, like, tabbie, pigs, shelter
cat order 4: pet, pets, cats, dog, pigs, animals, dogs, animal, owners, zoo, wild, birds, rabbits, puppies, tiger
Graphical annotation tool: colourizing co-occurrences
Specifying types and relations
• Clicking on a node or edge opens a context menu restricted to the part of speech (POS)
Web-based annotation tool for arbitrary candidate sources
Rule-based candidate generation
• If some annotation is already present, then rules can be specified to obtain candidates at an even higher rate.
• It is possible to guess the type of candidates

Example:
Rule 1: If IS-A(A,B) and PROPERTY(B), then PROPERTY(A)
yields LIVING(dog) as candidate
Rule 2: If IS-A(A,B) and COHYPONYM(A,C) then IS-A(C,B)
yields IS-A(cat, animal) as candidate
[Figure: graph with the nodes dog, cat and animal, the type LIVING, and the relations IS-A and CO-HYPONYM]
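The two rules can be sketched as a small forward-chaining step over existing annotation. The fact representation (sets of tuples), the function name `rule_candidates` and the choice to treat co-hyponymy as symmetric are assumptions for illustration; the output is a set of candidates for an annotator to accept or reject, not confirmed facts.

```python
def rule_candidates(is_a, cohyponym, properties):
    """Propose new facts from existing annotation.

    is_a:       set of (A, B) meaning IS-A(A, B)
    cohyponym:  set of (A, C) meaning COHYPONYM(A, C), treated as symmetric
    properties: set of (PROPERTY, B) meaning PROPERTY(B)
    """
    symmetric = cohyponym | {(c, a) for a, c in cohyponym}
    candidates = set()
    for a, b in is_a:
        # Rule 1: IS-A(A,B) and PROPERTY(B) suggest PROPERTY(A).
        for prop, holder in properties:
            if holder == b and (prop, a) not in properties:
                candidates.add((prop, a))
        # Rule 2: IS-A(A,B) and COHYPONYM(A,C) suggest IS-A(C,B).
        for x, c in symmetric:
            if x == a and (c, b) not in is_a:
                candidates.add(("IS-A", c, b))
    return candidates


if __name__ == "__main__":
    found = rule_candidates(
        is_a={("dog", "animal")},
        cohyponym={("dog", "cat")},
        properties={("LIVING", "animal")},
    )
    print(sorted(found))
```

Once a candidate such as IS-A(cat, animal) is accepted, running the rules again would also propose LIVING(cat).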
Tool to accept or reject rule-based candidates
Case study: Annotating Deutscher Wortschatz
www.wortschatz.uni-leipzig.de
In terms of numbers:
• In 1,000 hours, annotators could choose between
• 46 semantic types and
• 57 relations, and produced
• 150,000 type instances and
• 150,000 relation instances for over
• 80,000 distinct terms, that is a text coverage of
• 90%, with a speed of
• 5 units per minute
Different relations from different sources
Example: Query resolution with types and relations
Query: „Find documents mentioning at least two heads of computer companies!“
1. Translate into formal query:
Qset = {B | IS-A(A, computer company), HEAD-OF(B,A)}
b1 ∈ Qset, b2 ∈ Qset, b1 ≠ b2
2. Access search engine with possible b1, b2
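The two steps can be sketched as follows. The relation instances, the documents and all names in them are placeholders invented for illustration, and plain substring matching stands in for the search engine.

```python
# Placeholder annotation (hypothetical names, not real facts).
IS_A = {
    ("Company X", "computer company"),
    ("Company Y", "computer company"),
    ("Company Z", "car maker"),
}
HEAD_OF = {
    ("Person P", "Company X"),
    ("Person Q", "Company Y"),
    ("Person R", "Company Z"),
}

# Placeholder document collection.
DOCS = {
    "doc-a": "Person P met Person Q at a trade fair.",
    "doc-b": "Person P presented a new laptop.",
    "doc-c": "Person R and Person P discussed navigation software.",
}


def query_set(type_name):
    """Qset = {B | IS-A(A, type_name), HEAD-OF(B, A)}."""
    members = {a for a, b in IS_A if b == type_name}
    return {head for head, company in HEAD_OF if company in members}


def matching_docs(qset, min_hits=2):
    """Documents mentioning at least `min_hits` distinct members of qset."""
    return sorted(
        doc_id
        for doc_id, text in DOCS.items()
        if sum(name in text for name in qset) >= min_hits
    )


if __name__ == "__main__":
    qset = query_set("computer company")
    print(sorted(qset))
    print(matching_docs(qset))
```

Counting distinct members of Qset per document enforces the condition b1 ≠ b2 without enumerating all pairs.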
What Google found:
Find documents mentioning at least two heads of computer companies!
#1 hit, 14.08.2005, www.google.com
Conclusion
• Typed terms and relations can facilitate processing of electronic documents for a wide range of applications
• Rapid annotation alleviates the acquisition bottleneck by
  - globally annotating
  - local dependencies
• Intuitive tools for annotation are highly important to achieve large amounts of annotation in a short time
QUESTIONS?!?
THANK YOU
Bonus material
• Co-occurrences
• Co-occurrences of higher orders
Statistical Co-occurrences
• Occurrence of two or more words within a well-defined unit of information (sentence, nearest neighbors)
• Significant co-occurrences reflect relations between words
• Significance measure (log-likelihood):
  - k is the number of sentences containing a and b together
  - ab is (number of sentences with a)*(number of sentences with b)
  - n is total number of sentences in corpus

$$\mathrm{sig}(A, B) = x - k \log x + \log k!$$

with n the number of sentences and

$$x = \frac{ab}{n}.$$
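A minimal sketch of the measure, assuming it reads sig(A, B) = x - k log x + log k! with x = ab/n and natural logarithms (the slide does not state the base). The function and parameter names are illustrative; `math.lgamma(k + 1)` is used for log k! so that large k does not overflow.

```python
import math


def significance(k, count_a, count_b, n):
    """sig(A, B) = x - k*log(x) + log(k!), with x = count_a * count_b / n.

    k:        number of sentences containing both words
    count_a:  number of sentences containing the first word (must be > 0)
    count_b:  number of sentences containing the second word (must be > 0)
    n:        total number of sentences in the corpus
    """
    x = count_a * count_b / n
    return x - k * math.log(x) + math.lgamma(k + 1)


if __name__ == "__main__":
    # Two words seen in 10 and 20 of 100 sentences; vary the joint count k.
    for k in (0, 1, 5, 10):
        print(k, round(significance(k, 10, 20, 100), 3))
```

With these counts x is 2, so joint counts well above 2 produce rapidly growing values.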
Iterating Co-occurrences
• (sentence-based) co-occurrences of first order: words that co-occur significantly often together in sentences
• co-occurrences of second order: words that co-occur significantly often in collocation sets of first order
• co-occurrences of n-th order: words that co-occur significantly often in collocation sets of (n-1)th order

When calculating a higher order, the significance values of the preceding order are not relevant. A co-occurrence set consists of the N highest ranked co-occurrences of a word.
Constructed Example I

Ord 1    dog  terrier  cat  mouse  barking  bite  yelp
dog           -        -    -      X        x     X
terrier  -             -    -      x        x     X
cat      -    -             x      -        x     -
mouse    -    -        X           -        x     -
barking  X    X        -    -               -     -
bite     X    X        x    x      -              -
yelp     x    x        -    -      -        -

Ord 2    dog  terrier  cat  mouse  barking  bite  yelp
dog           3        1    1      -        -     -
terrier  3             1    1      -        -     -
cat      1    1             1      -        -     -
mouse    1    1        1           -        1     -
barking  -    -        -    -               2     2
bite     -    -        -    1      2              2
yelp     -    -        -    -      2        2
Constructed Example II

Ord 3    dog  terrier  cat  mouse  barking  bite  yelp
dog           -        -    -      -        -     -
terrier  -             -    -      -        -     -
cat      -    -             -      -        -     -
mouse    -    -        -           -        -     -
barking  -    -        -    -               1     1
bite     -    -        -    -      1              1
yelp     -    -        -    -      1        1

Ord 2    dog  terrier  cat  mouse  barking  bite  yelp
dog           x        -    -      -        -     -
terrier  x             -    -      -        -     -
cat      -    -             -      -        -     -
mouse    -    -        -           -        -     -
barking  -    -        -    -               x     x
bite     -    -        -    -      x              x
yelp     -    -        -    -      x        x
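The iteration can be sketched on the Ord 1 table of Constructed Example I. The sketch assumes that the count for a pair is the number of co-occurrence sets of the previous order containing both words, and it keeps pairs with a count of at least 2 in place of a significance test. It does not implement the "N highest ranked" cut-off, and the raw counts need not match every cell of the tables. All function names are illustrative.

```python
from collections import Counter
from itertools import combinations

# First-order co-occurrence sets read off the Ord 1 table (X and x both count).
ORDER_1 = {
    "dog": {"barking", "bite", "yelp"},
    "terrier": {"barking", "bite", "yelp"},
    "cat": {"mouse", "bite"},
    "mouse": {"cat", "bite"},
    "barking": {"dog", "terrier"},
    "bite": {"dog", "terrier", "cat", "mouse"},
    "yelp": {"dog", "terrier"},
}


def next_order_counts(cooc_sets):
    """Count how many co-occurrence sets contain both words of a pair."""
    counts = Counter()
    for members in cooc_sets.values():
        for pair in combinations(sorted(members), 2):
            counts[pair] += 1
    return counts


def to_sets(counts, min_count):
    """Turn pair counts into symmetric co-occurrence sets above a threshold."""
    sets = {}
    for (a, b), c in counts.items():
        if c >= min_count:
            sets.setdefault(a, set()).add(b)
            sets.setdefault(b, set()).add(a)
    return sets


order_2 = next_order_counts(ORDER_1)
order_2_sets = to_sets(order_2, min_count=2)  # threshold is an assumption
order_3 = next_order_counts(order_2_sets)

if __name__ == "__main__":
    print("order 2:", dict(order_2))
    print("order 3:", dict(order_3))
```

Under these assumptions dog and terrier share three first-order sets, and after thresholding only barking, bite and yelp remain linked in order 3.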