Visualizing textual data
CPSC 601.28
A. Butt / Feb. 26 '09
Overview
• Project implications
• Summarize "TileBars"
  – Hearst / PARC (Xerox)
• Summarize "Visualizing the Non-Visual"
  – Wise et al. / Pacific Northwest Lab (Battelle)
• Key Issues
• Summary
• References
Project Implications
• Research area is partly based on text-based environmental reports
  – textual reporting feeds into textual (quasi-judicial) regulatory framework
  – rooms of binders (e.g. >20,000 pages for Mackenzie Pipeline Project)
• Vocabulary specialized / semantically complete
  – "no significant adverse environmental impacts"
TileBars
• goals are to simultaneously view:
  – length of a document
  – relative frequency of specific words
  – distribution of words with respect to each other
• benefits include:
  – enhanced relevancy of search response
  – patterns of frequency by document / author
  – compactness of information
TileBars
• Visual representation via
  – rectangular block: size equates to document length
  – three bars within the block: each corresponds to a query
  – in each bar, tiles indicate location; saturation of a tile indicates frequency
• 5 articles, 3 search queries
• 1st, 2nd, 5th appear compact / relevant
• 1st and 2nd appear to have better concurrency
• 3rd and 4th potentially less relevant, greater time investment to read
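The encoding above can be tried out on plain text. The following is a minimal sketch, not Hearst's implementation: it assumes fixed-size word windows as segments (the paper derives segments from the text itself), and the names `tilebar`, `render`, and `segment_words` are hypothetical.

```python
import math
import re

def tilebar(text, term_sets, segment_words=50):
    """Return one row per term set; each cell is the hit count in one segment."""
    words = re.findall(r"[a-z']+", text.lower())
    # Assumption: segments are fixed-size word windows.
    segments = [words[i:i + segment_words]
                for i in range(0, len(words), segment_words)] or [[]]
    rows = []
    for terms in term_sets:
        wanted = {t.lower() for t in terms}
        rows.append([sum(w in wanted for w in seg) for seg in segments])
    return rows

def render(rows, shades=" .:*#"):
    """Draw each row as a bar; darker characters mean more hits in that segment."""
    peak = max((c for row in rows for c in row), default=0) or 1
    top = len(shades) - 1
    lines = []
    for row in rows:
        cells = "".join(shades[math.ceil(c * top / peak)] for c in row)
        lines.append("|" + cells + "|")  # bar width grows with document length
    return "\n".join(lines)

text = " ".join(["pipeline fish"] * 3 + ["filler"] * 4 + ["caribou"] * 2)
rows = tilebar(text, [["pipeline"], ["fish"], ["caribou"]], segment_words=5)
print(render(rows))
```

Each printed line plays the role of one bar in the block: its width tracks document length, and overlapping dark cells across bars show where query terms occur together.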
Visualizing the Non-Visual
• goals are to:
  – overcome time constraints in processing textual information
  – overcome attention constraints; avoid becoming overwhelmed by volume of textual information
• benefits include:
  – escape limitations of traditional text
  – increase throughput and comprehension of information processing
  – feedback on text structure to enhance visualization
Visualizing the Non-Visual
• Employ a "natural landscape" metaphor
  – leverage evolutionary psychological adaptations via natural landscapes for representation
  – galaxy or star-fields ("night sky")
  – themescapes ("cartographic" or "landscape")
  – although statistical measures are used for clustering, they are not used as directly as in TileBars
  – self-organizing maps
Galaxies
• PNL software development (DOE)
• Display is a review of cancer literature
• Branched to SPIRE / In-SPIRE for government documents
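A galaxy display places each document as a point so that similar documents land near each other. The following is a rough sketch of that idea only: it substitutes plain principal component analysis on raw term counts (computed by power iteration) for whatever clustering and projection SPIRE actually uses, and the names `term_vectors`, `leading_direction`, and `project_2d` are hypothetical.

```python
import math
import random
import re
from collections import Counter

def term_vectors(docs):
    """Raw term counts per document over a shared, sorted vocabulary."""
    counts = [Counter(re.findall(r"[a-z]+", d.lower())) for d in docs]
    vocab = sorted(set().union(*counts))
    return [[c[t] for t in vocab] for c in counts]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def leading_direction(rows, iters=100, seed=0):
    """Power iteration for the dominant direction of the (centered) rows."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in rows[0]]
    for _ in range(iters):
        scores = [dot(r, v) for r in rows]
        w = [sum(s * r[j] for s, r in zip(scores, rows)) for j in range(len(v))]
        norm = math.sqrt(dot(w, w))
        if norm == 0:  # no variance left to explain
            break
        v = [x / norm for x in w]
    return v

def project_2d(docs):
    """Return one (x, y) point per document from the top two components."""
    rows = term_vectors(docs)
    means = [sum(col) / len(rows) for col in zip(*rows)]
    rows = [[x - m for x, m in zip(r, means)] for r in rows]
    v1 = leading_direction(rows)
    xs = [dot(r, v1) for r in rows]
    # Remove the first component before searching for the second.
    deflated = [[x - s * a for x, a in zip(r, v1)] for r, s in zip(rows, xs)]
    v2 = leading_direction(deflated, seed=1)
    ys = [dot(r, v2) for r in deflated]
    return list(zip(xs, ys))

docs = [
    "pipeline river fish habitat river",
    "fish habitat river pipeline fish",
    "cancer tumour therapy cancer cell",
    "tumour cell therapy cancer tumour",
]
for doc, (x, y) in zip(docs, project_2d(docs)):
    print(f"{x:6.2f} {y:6.2f}  {doc}")
```

In this toy corpus the two pipeline documents and the two cancer documents fall on opposite sides of the first axis, which is the "cluster of stars" effect the display relies on.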
Themescapes
• PNL software development (DOE)
• Branched to SPIRE / In-SPIRE for government documents (renamed "Themeview")
• Branched into NVAC (National Visualization and Analytics Center) - part of the Homeland Security infrastructure
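One way to read a themescape is as a density surface over a 2D document layout: where many documents sit close together, the terrain rises into a peak. The following is a minimal sketch under that reading, assuming documents already have positions in the unit square; the Gaussian kernel, the sample coordinates, and the names `theme_surface` and `render` are illustrative assumptions, not the SPIRE implementation.

```python
import math

def theme_surface(points, width=20, height=10, bandwidth=0.15):
    """Sum one Gaussian bump per document over a grid covering the unit square."""
    grid = []
    for row in range(height):
        y = (row + 0.5) / height
        line = []
        for col in range(width):
            x = (col + 0.5) / width
            z = sum(
                math.exp(-((x - px) ** 2 + (y - py) ** 2) / (2 * bandwidth ** 2))
                for px, py in points
            )
            line.append(z)
        grid.append(line)
    return grid

def render(grid, shades=" .:-=+*#"):
    """Print the surface as text; denser characters stand for higher terrain."""
    peak = max(max(line) for line in grid) or 1.0
    last = len(shades) - 1
    return "\n".join(
        "".join(shades[min(last, int(z / peak * len(shades)))] for z in line)
        for line in grid
    )

# Hypothetical layout: three documents on one theme, one outlier.
points = [(0.20, 0.20), (0.25, 0.25), (0.22, 0.18), (0.80, 0.80)]
print(render(theme_surface(points)))
```

The three nearby documents produce a tall peak and the outlier a low hill, so the eye is drawn first to the dominant theme.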
Themescapes (2.0?)
• Branched progeny of themescapes
• Used in searching IP / Patents
• Subscription service
• Failed metaphors?
Key Issues
• Vocabulary / semantics - how do you interpret meaning from text statistics?
  – earlier failures of natural language processing
  – contingent semantics
• Employing metaphors (Zhang 2008)
  – rely on unusual linkages (versus analogy) to highlight
  – degree of "unusual-ness" is critical: too much or too little leads to confusion
Summary
www.wordle.net
References
Marti A. Hearst: TileBars: Visualization of Term Distribution Information in Full Text Information Access. CHI 1995: 59-66
James A. Wise, James J. Thomas, Kelly Pennock, David Lantrip, Marc Pottier, Anne Schur, Vern Crow: Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents. Proc. IEEE Symp. Information Visualization (InfoVis), IEEE Computer Soc. Press, 30-31 October 1995: 51-58 (in text pages 442-450)
Jin Zhang: The Implication of Metaphors in Information Retrieval. Visualization in Information Retrieval, Elsevier, 2008: 215-237