evaluating concept maps as a cross-language knowledge discovery tool for ndltd ryan richardson ph.d....

30
Evaluating Concept Maps As A Evaluating Concept Maps As A Cross-Language Cross-Language Knowledge Discovery Tool for NDLTD Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA Edward A. Fox, [email protected] Professor of Computer Science, Virginia Tech John Woods Undergraduate Student, Virginia Tech

Upload: cornelia-richard

Post on 24-Dec-2015

213 views

Category:

Documents


1 download

TRANSCRIPT

Page 1: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

Evaluating Concept Maps As AEvaluating Concept Maps As ACross-Language Cross-Language

Knowledge Discovery Tool for NDLTDKnowledge Discovery Tool for NDLTD

Ryan Richardson

Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

 

Edward A. Fox, [email protected]

Professor of Computer Science, Virginia Tech

 

John Woods

Undergraduate Student, Virginia Tech

Page 2: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

Outline of TalkOutline of Talk

1. NDLTD cross-language problem2. Concept maps as a solution?3. Feasibility of concept map solution4. Automatic generation of concept maps5. Automatic translation6. Conclusion

References

Page 3: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

1. NDLTD cross-language problem1. NDLTD cross-language problem

NDLTD union catalog has over 200,000 ETDs. Contains over 23,000 non-English ETDs, which

English speakers might want to access Non-English speakers want access to English

ETDs in their own languages. Due to size of average ETD, it is difficult to

quickly determine if an ETD is relevant to one’s research.

Page 4: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

1. NDLTD cross-language problem1. NDLTD cross-language problem

Language NumberEnglish 123,696

Portuguese 11434

German 4131

French 3868

Spanish 1561

Chinese 1463

Catalan 804

Others 19962 (most unclassified)

Total 166919 (summer’05)

Page 5: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

2. Concept maps as a solution2. Concept maps as a solution

Introduced by Novak and Gowin, 1984 Defined as “graphical representations of

knowledge that are comprised of concepts and the relationships between them”

Concepts are presented in hierarchical fashion with the most general concepts at the top, and with more specific, less general concepts arranged below.

Page 6: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

2. Concept maps as a solution2. Concept maps as a solution

Concept maps have been shown to help students learn new concepts more rapidly (McNaught & Kennedy, 1997).

Concept maps have been widely studied in the education field for over 20 years.

Page 7: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

2. Example concept map2. Example concept map

Page 8: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

2. Concept maps as a solution2. Concept maps as a solution

Advantages over abstracts:– Can draw from entire document, including figures

and images– Nodes in concept maps can point to other concept

maps, making them multi-layered. Thus, they can show as much detail as a user wants.

Advantages over keywords:– Typically have more concepts than the number of

entries in a keyword list– Can show relationships between key concepts, in

addition to just a listing

Page 9: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

2. Concept maps as a solution2. Concept maps as a solution

Advantages over automatically translating entire document:

– Unlike a document, a concept map does not have to read naturally to be understandable.

– Automatically translating an entire ETD is computationally expensive and often unnecessary.

Page 10: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

2. Concept maps as a solution2. Concept maps as a solution

Our solution is to use automatically generated, automatically translated (if desired) concept maps as a quick summary format for ETDs.

This can be presented as part of a search interface in the user’s preferred language.

Page 11: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

2. Proposed solution to NDLTD cross-2. Proposed solution to NDLTD cross-language problemlanguage problem

Page 12: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

3. Feasibility of concept map solution3. Feasibility of concept map solution

We conducted an initial experiment with 28 users, using hand-drawn concept maps in English and Spanish. Spanish maps were automatically translated into English using Babelfish.

Users had to rank papers based on relevance to a particular question.

Page 13: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

3. Feasibility of concept map solution3. Feasibility of concept map solution

Results: Having concept maps helped users for all 4

questions. The difference was significant in 1 of 4 cases

(comparing concept maps and abstracts). For details see (Richardson & Fox, 2005)

Page 14: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

4. Automatic generation of concept maps4. Automatic generation of concept maps

Experimented with automatically making concept maps based on term co-occurrence.

Evaluated several techniques to find phrases:– T-scores– Association rules (Agrawal et al., 1993)– Mutual information (Hindle, 1990)– Dice co-efficient (Frakes & Baeza-Yates, 1992)

Page 15: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

4. Automatic generation of concept maps4. Automatic generation of concept maps

Assoc. rules T-score Mutual information

Dice co-efficient

NASA <> transportation

Energiya<> buran interview<>

Stephen_Garber

military_characterization<>

reluctant_support

NASA <> option interview<>

Stephen_Garber

range<> capability military_characterization<>

military

Shuttle <> U.S._space

range<> capability Johnson<> soviet_year

Joseph_P<> hopkin

Buran <> Energiya NASA<>shuttle energiya<>buran blueprint<> misleading

Soviet <> space_technology

cross<>range cross<>range myriad_american<> blueprint

Soviet <> aviation_week

Stephen_Garber<>

November

Izvestiya<> December

soviet_figures<> chronology

Page 16: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

4. Automatic generation of concept maps4. Automatic generation of concept maps

Findings:– Association rules and t-scores found the most

interesting relations in the ETDs.– Mutual information and Dice co-efficient found

uncommon fixed phrases that often did not relate to the topic of the ETD.

We selected association rules for our automatic map generation.

Page 17: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

4. Automatic generation of concept maps4. Automatic generation of concept maps

Concept map layouts– Whole document maps

Display document contents in one large map.– Chapter-level maps

Each chapter is a separate map.– Table-of-contents-based maps

Use the chapter and section headings as a framework, and then embellish it using association rules.

Page 18: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

4. Automatic generation of concept maps4. Automatic generation of concept maps

Table of contents map of thesis "How Politics and Culture Affected the Design of the U.S. Space Shuttle and the Soviet Buran", Chapter 3.

Page 19: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

4. Automatic generation of concept maps4. Automatic generation of concept maps

Automatically generated concept maps - study: We conducted an experiment testing which

layout was preferred by users. Users rated the Table-of-contents maps better

than the other two approaches for:– concept-selection,– link-selection, and– usefulness for determining relevance.

For details see (Richardson & Fox, 2005).

Page 20: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

4. Automatic generation of concept maps4. Automatic generation of concept maps

Automatically generated concept maps - details: 50 ETDs from Virginia Tech 91 theses from Universidad de Las Américas

in Puebla, Mexico. English ETDs have concepts maps in both

the chapter-level and table-of-contents formats.

The Spanish ETDs have concept maps in the chapter-level format (t.o.c. information was not available at the time).

Page 21: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

4. Automatic generation of concept maps4. Automatic generation of concept maps

ETD concept map browser interface

Page 22: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

5. Automatic translation of concept maps5. Automatic translation of concept maps

Once we automatically generate concept maps, the next step is to translate them well enough to be useful.

Simple dictionary lookup and/or Babelfish do very poorly on technical domains.

Parallel corpora generally are not available in many domains and for many language pairs.

Page 23: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

5. Automatic translation of concept maps5. Automatic translation of concept maps

Solution: Automated phrase extraction from comparable (not parallel) corpora

Prior work: Used by (López-Ostenero, Gonzalo, & Verdejo,

2004) on picture captions to assist image searching

Page 24: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

5. Automatic translation of concept maps5. Automatic translation of concept mapsMethod: Starts out with initial definitions from an

English/Spanish dictionary developed at U. of Maryland.

Creates a Markov model of valid phrases in English and Spanish.

Uses a statistical test from (Evans & Zhai, 1996) to determine which phrases are meaningful and which should be disregarded.

Corpus is English & Spanish ETDs from VT and UDLA, as well as Wikipedia articles.

Page 25: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

5. Automatic translation of concept maps5. Automatic translation of concept maps

Progress: Still in testing stages We plan to test the translator on concept

maps made with CMapTools, developed at the Institute of Human and Machine Cognition (http://www.ihmc.us).

Their collection of over 20,000 maps has student-drawn maps in English, Spanish, Portuguese, and Italian.

Then we will move to translating automatically generated concept maps.

Page 26: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

Conclusion: Recall Outline of TalkConclusion: Recall Outline of Talk

1. NDLTD cross-language problem

2. Concept maps as a solution?

3. Feasibility of concept map solution

4. Automatic generation of concept maps

5. Automatic translation

6. Conclusion

Page 27: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

6. Conclusion6. Conclusion

We believe that concept maps can be a useful tool to find relevant ETDs within the NDLTD collection.

We have preliminary results indicating that they can be useful for both mono-lingual and cross-lingual relevance determination.

Page 28: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

6. Conclusion6. Conclusion

Current work: Working on using the similarity in the

chapter/section layout of ETDs in the same discipline to improve quality of automatically-generated maps

Working to improve the quality of automatically-translated concept maps

Page 29: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

ReferencesReferences Agrawal, R., Imielinski, T., & Swami, A. (1993, May). Mining

association rules between sets of items in large databases. Paper presented at the ACM SIGMOD Conference on Management of Data, Washington, D.C.

Cañas, A. J. (2005). Institute for Human and Machine Cognition, 2005, from http://ihmc.us/

Evans, D. A., & Zhai, C. (1996, June 24-27). Noun-Phrase Analysis in Unrestricted Text for Information Retrieval. Paper presented at the Association for Computational Linguistics, Santa Cruz, California.

Frakes, W. B., & Baeza-Yates, R. (1992). Information Retrieval, Data Structure and Algorithms: Prentice Hall.

Hindle, D. (1990). Noun classification from predicate-argument structures. Paper presented at the ACL-90, Pittsburgh.

Page 30: Evaluating Concept Maps As A Cross-Language Knowledge Discovery Tool for NDLTD Ryan Richardson Ph.D. Candidate, Virginia Tech, Blacksburg, Virginia, USA

References (cont.)References (cont.)

López-Ostenero, F., Gonzalo, J., & Verdejo, F. (2004). Noun Phrases as Building Blocks for Cross-Language Search Assistance. Information Process and Management, 41, 549-568.

McNaught, C., & Kennedy, D. (1997). Use of Concept Mapping in the design of learning tools for interactive multimedia. Journal of Interactive Learning Research, 8(3-4), 389-406.

NDLTD. (2005). Networked Digital Library of Theses and Dissertations. http://www.ndltd.org.

Novak, J. D., & Gowin, D. B. (1984). Learning How To Learn. Cambridge, UK: Cambridge University Press.

Richardson, R., & Fox, E. A. (2005, June 7-11). Using concept maps in digital libraries as a cross-language resource discovery tool. Paper presented at the Joint Conference on Digital Libraries (JCDL 2005), Denver.