Document Collections
cs5984: Information Visualization
Chris North
Where are we?
• Multi-D
• 1D
• 2D
• Hierarchies/Trees
• Networks/Graphs
• Document collections
• 3D

• Design Principles
• Empirical Evaluation
• Java Development
• Visual Overviews
• Multiple Views
• Peripheral Views
Structured Document Collections
• Multi-dimensional
  • author, title, date, journal, …
• Trees
  • Dewey Decimal
• Networks
  • web, citations
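The three structures above can be seen as three views of the same records. A minimal Python sketch, assuming a hypothetical `Doc` record whose fields mirror the slide's attributes, with made-up sample data: the attributes support multi-dimensional filtering, the Dewey call number gives a tree path, and citations give a directed network.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Hypothetical record for one item in a structured collection."""
    doc_id: str
    author: str
    title: str
    year: int
    journal: str
    dewey: str                                   # call number, e.g. "511.5"
    cites: list = field(default_factory=list)    # ids of cited documents


def dewey_path(call_number):
    """Tree view: main class -> division -> section, e.g. "511.5" -> ["500", "510", "511"]."""
    d = call_number[:3]
    return [d[0] + "00", d[:2] + "0", d]


def as_tree(docs):
    """Nest document ids under their Dewey path."""
    tree = {}
    for doc in docs:
        node = tree
        for level in dewey_path(doc.dewey):
            node = node.setdefault(level, {})
        node.setdefault("docs", []).append(doc.doc_id)
    return tree


def as_network(docs):
    """Network view: directed citation edges, citing id -> set of cited ids."""
    edges = defaultdict(set)
    for doc in docs:
        edges[doc.doc_id].update(doc.cites)
    return edges


# Hypothetical sample data (not from the slides).
docs = [
    Doc("d1", "Author A", "Title 1", 1996, "Journal X", "511.5"),
    Doc("d2", "Author B", "Title 2", 1998, "Journal Y", "005.7", cites=["d1"]),
]

# Multi-dimensional view: filter on any attribute.
recent = [d.doc_id for d in docs if d.year >= 1997]
```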
Envision
• Ed Fox, et al.
• Multi-D
• similar to Spotfire
Unstructured Document Collections
• Focus on full text
• Examples:
  • Digital libraries, encyclopedias
  • Web, homepages, photo collections
• Tasks:
  • Search, keyword
  • Browse
  • Themes, subjects, topics, library coverage
  • Size, distributions
Visualization Strategies
• Cluster Maps (today)
• Keyword Query (today)
• Relationships
• Reduced representation
• User-controlled layout
Cluster Map
• Create a “map” of the document collection
• Similar documents near
• Dissimilar documents far
• “Grocery store” concept: related items are shelved near each other
Document Vectors
| Term | Doc1 | Doc2 | Doc3 | … |
|---|---|---|---|---|
| “aardvark” | 1 | 2 | 0 | |
| “banana” | 2 | 1 | 0 | |
| “chris” | 0 | 0 | 3 | |
| … | | | | |
• Similarity between a pair of docs = dot product of their term vectors (Doc_i • Doc_j)
• Lay out documents in a 2-D map by similarity
  • similar to the spring model for graph layout
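A minimal Python sketch of this pipeline, using the toy counts from the term-document table. The slide gives similarity as a dot product; the cosine normalization and the specific spring update (rest length = 1 - cosine similarity) are assumptions for illustration, and `spring_layout` is a hypothetical helper rather than a library call.

```python
import math
import random

# Columns of the term-document table: one count vector per document,
# over the terms ("aardvark", "banana", "chris").
DOCS = {
    "Doc1": [1, 2, 0],
    "Doc2": [2, 1, 0],
    "Doc3": [0, 0, 3],
}


def dot(u, v):
    """Raw similarity: dot product of two term-count vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Dot product divided by both vector lengths, so long documents don't dominate."""
    norm = math.sqrt(dot(u, u)) * math.sqrt(dot(v, v))
    return dot(u, v) / norm if norm else 0.0


def spring_layout(vectors, steps=500, rate=0.05, seed=0):
    """Place documents in 2-D so screen distance approximates 1 - cosine similarity.

    Every pair is joined by a spring whose rest length is its dissimilarity;
    each step nudges both endpoints toward that rest length.
    """
    names = list(vectors)
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in names}
    for _ in range(steps):
        for i, a in enumerate(names):
            for b in names[i + 1:]:
                rest = 1.0 - cosine(vectors[a], vectors[b])
                dx = pos[b][0] - pos[a][0]
                dy = pos[b][1] - pos[a][1]
                dist = math.hypot(dx, dy) or 1e-9
                # Positive when stretched (pull together), negative when compressed (push apart).
                f = rate * (dist - rest) / dist
                pos[a][0] += f * dx
                pos[a][1] += f * dy
                pos[b][0] -= f * dx
                pos[b][1] -= f * dy
    return pos


positions = spring_layout(DOCS)
```

Doc1 and Doc2 share vocabulary (cosine 0.8), so their spring is short and they should settle close together; Doc3 shares no terms with either, so its springs have rest length 1 and it should end up about one unit away from both.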
Cluster Algorithms
• Partition clustering: partition into k subsets
  • Pick k seeds
  • Iteratively attract nearest neighbors to each seed
• Hierarchical clustering: dendrogram
  • Merge the nearest-neighbor pair of clusters
  • Iterate until one cluster remains
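Both strategies can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: partition clustering is written as k-means (seeds move to the mean of the points they attract), hierarchical clustering uses single linkage as its notion of "nearest", and the 2-D points are hypothetical stand-ins for document vectors.

```python
import math
import random


def partition_cluster(points, k, iters=20, seed=0):
    """Partition clustering, sketched as k-means.

    Pick k seeds, then repeat: attach every point to its nearest seed and
    move each seed to the mean of the points attached to it.
    """
    rng = random.Random(seed)
    seeds = rng.sample(points, k)
    groups = []
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, seeds[i]))
            groups[nearest].append(p)
        # Empty groups keep their old seed instead of dividing by zero.
        seeds = [tuple(sum(xs) / len(g) for xs in zip(*g)) if g else seeds[i]
                 for i, g in enumerate(groups)]
    return groups


def hierarchical_cluster(points):
    """Agglomerative clustering with single linkage.

    Start with one cluster per point and repeatedly merge the nearest pair;
    the returned merge list (left, right, distance) encodes the dendrogram.
    """
    clusters = [[p] for p in points]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = min(math.dist(a, b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return merges


# Hypothetical 2-D positions standing in for document vectors.
POINTS = [(0, 0), (0, 1), (10, 10), (10, 11)]
```

Partition clustering needs k up front but scales to large collections; the hierarchical version needs no k, but this naive form rescans all cluster pairs at every merge, which is only practical for small collections.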
Kohonen Maps
• Xia Lin, “Document Space”
  » samal, ying
• http://faculty.cis.drexel.edu/sitemap/index.html
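Kohonen maps (self-organizing maps) learn a grid of cells in which neighboring cells represent similar documents. A minimal Python sketch of a generic SOM, not Lin's implementation: the grid size, Gaussian neighborhood, linear decay schedules and the helper names `train_som` and `best_cell` are all assumptions for illustration.

```python
import math
import random


def best_cell(grid, v):
    """Map cell whose weight vector is closest to document vector v."""
    return min(grid, key=lambda cell: math.dist(grid[cell], v))


def train_som(vectors, rows=4, cols=4, epochs=50, seed=0):
    """Train a small Kohonen self-organizing map on document vectors.

    Each cell of a rows x cols grid holds a weight vector. For each input,
    the best-matching cell and its grid neighbors move toward the input;
    the learning rate and neighborhood radius shrink as training proceeds,
    so neighboring cells come to represent similar documents.
    """
    rng = random.Random(seed)
    dim = len(vectors[0])
    grid = {(r, c): [rng.random() for _ in range(dim)]
            for r in range(rows) for c in range(cols)}
    total = epochs * len(vectors)
    step = 0
    for _ in range(epochs):
        for v in rng.sample(vectors, len(vectors)):      # shuffled order
            frac = step / total
            rate = 0.5 * (1 - frac)                        # decays toward 0
            radius = 0.5 + (max(rows, cols) / 2) * (1 - frac)
            bmu = best_cell(grid, v)
            for cell, w in grid.items():
                g = math.dist(cell, bmu)                   # distance on the map grid
                h = math.exp(-(g * g) / (2 * radius * radius))
                for i in range(dim):
                    w[i] += rate * h * (v[i] - w[i])
            step += 1
    return grid


# Toy vectors reusing the aardvark/banana/chris counts.
VECTORS = [[1, 2, 0], [2, 1, 0], [0, 0, 3]]
grid = train_som(VECTORS)
labels = {tuple(v): best_cell(grid, v) for v in VECTORS}
```

Labeling each cell with the documents (or dominant terms) that map to it yields the region map seen in systems like Document Space and WebSOM.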
Themescapes, Cartia
• PNL (Pacific Northwest National Laboratory)
• Mountain height = cluster size
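The slide's rule, mountain height = cluster size, can be sketched as a height field. A minimal sketch, assuming each cluster already has a 2-D map position and that every cluster contributes a Gaussian hill scaled by its document count; the Gaussian shape, the `spread` parameter and the sample clusters are illustrative assumptions, not ThemeScape's actual algorithm.

```python
import math


def terrain_height(x, y, clusters, spread=1.0):
    """Height of a ThemeScape-like landscape at map point (x, y).

    Each cluster contributes a Gaussian hill centered on its map position,
    scaled by its size (number of documents), so bigger themes form higher peaks.
    """
    return sum(
        size * math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * spread ** 2))
        for (cx, cy), size in clusters
    )


def render(clusters, width=20, height=10, spread=2.0, levels=" .:-=+*#"):
    """Coarse ASCII contour view: denser characters mark higher ground."""
    grid = [[terrain_height(c, r, clusters, spread) for c in range(width)]
            for r in range(height)]
    peak = max(max(row) for row in grid) or 1.0
    return "\n".join(
        "".join(levels[min(len(levels) - 1, int(h / peak * len(levels)))] for h in row)
        for row in grid
    )


# Hypothetical clusters: ((x, y) map position, number of documents).
CLUSTERS = [((4, 4), 30), ((14, 5), 10)]
print(render(CLUSTERS))
```

A document that mixes both themes has no obvious place on such a terrain, which is the "both mountains, between mountains" question raised below.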
WebSOM
• http://websom.hut.fi/websom/
Cluster Map
• Good:
  • Map of collection
  • Major themes and sizes
  • Relationships between themes
  • Scales up
• Bad:
  • Where to locate documents with multiple themes?
    » Both mountains, between mountains, …?
  • Relationships between documents, within documents?
  • Algorithm becomes (too) critical
Keyword Query
• Keyword query, search engine
  • Rank-ordered list
• “Information Retrieval”
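A rank-ordered result list needs a relevance score per document. A minimal Python sketch using TF-IDF weighting, one standard information-retrieval scheme; the slide does not say which scoring a given search engine uses, so the formula, the whitespace tokenizer and the sample documents are assumptions.

```python
import math
from collections import Counter


def tokenize(text):
    """Naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def rank(query, docs):
    """Return (doc_id, score) pairs sorted best-first, using TF-IDF weights.

    score(d) = sum over query terms t of tf(t, d) * log(N / df(t)),
    where tf counts t in d and df is the number of documents containing t.
    Documents scoring 0 are left out of the list.
    """
    n = len(docs)
    tfs = {doc_id: Counter(tokenize(text)) for doc_id, text in docs.items()}
    df = Counter(term for tf in tfs.values() for term in tf)
    scores = {}
    for doc_id, tf in tfs.items():
        s = sum(tf[t] * math.log(n / df[t]) for t in tokenize(query) if df[t])
        if s > 0:
            scores[doc_id] = s
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical mini-collection.
SAMPLE = {
    "d1": "aardvark banana banana",
    "d2": "banana split recipe",
    "d3": "chris north infovis",
}
```

`rank("banana", SAMPLE)` puts d1 ahead of d2 because it mentions the term twice, while a query whose terms appear nowhere returns an empty list, the zero-hit case.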
TileBars
• Hearst, “TileBars”
  » reenal, xueqi
• http://elib.cs.berkeley.edu/tilebars/
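A TileBar shows, for one retrieved document, where each query term occurs: the bar's length follows the document's length, it is divided into segments, there is one row per term, and darker tiles mean more occurrences. A minimal Python sketch, assuming fixed-size word windows as segments (Hearst's system used topical segmentation instead) and a hypothetical `tilebar` helper that renders rows as ASCII.

```python
def tilebar(text, terms, segment_words=50, shades=" .:#"):
    """Return one ASCII row per query term for a TileBars-style glyph.

    The document is cut into consecutive segments (fixed-size word windows
    here, a simplifying assumption), and each tile's character encodes how
    often the term occurs in that segment: blank = 0, '.' = 1, ':' = 2, '#' = 3+.
    """
    words = text.lower().split()
    segments = [words[i:i + segment_words]
                for i in range(0, len(words), segment_words)] or [[]]
    rows = []
    for term in terms:
        counts = [seg.count(term.lower()) for seg in segments]
        rows.append("".join(shades[min(c, len(shades) - 1)] for c in counts))
    return rows


# Hypothetical 150-word document: "banana" early and late, "aardvark" only late.
doc = "banana " * 3 + "filler " * 47 + "filler " * 50 + "aardvark banana " * 25
for term, row in zip(["banana", "aardvark"], tilebar(doc, ["banana", "aardvark"])):
    print(f"{term:>9} |{row}|")
```

Reading the rows together shows whether terms co-occur in the same passage, which a single relevance score in a ranked list hides.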
VIBE
• Korfhage, http://www.pitt.edu/~korfhage/interfaces.html
• Documents located between query keywords using a spring model
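The spring model has a closed-form resting point. If a document is tied to each keyword's point of interest p_i by a linear spring of stiffness w_i (its relevance to that keyword), the net force sum w_i (p_i - x) is zero at x = sum w_i p_i / sum w_i, a weighted average. A minimal Python sketch, assuming relevance weights are already available (e.g. term counts) and using hypothetical keyword positions.

```python
def vibe_position(weights, pois):
    """Place one document among query-keyword points of interest (POIs).

    Each POI pulls the document with a spring whose stiffness is the
    document's relevance to that keyword. Linear springs balance where the
    net force is zero, i.e. at the stiffness-weighted average of POI positions.
    """
    total = sum(weights.get(k, 0.0) for k in pois)
    if total == 0:
        return None  # matches no keyword: nothing pulls it onto the display
    x = sum(weights.get(k, 0.0) * px for k, (px, py) in pois.items()) / total
    y = sum(weights.get(k, 0.0) * py for k, (px, py) in pois.items()) / total
    return (x, y)


# Hypothetical keyword positions chosen by the user.
POIS = {"banana": (0, 0), "chris": (10, 0), "aardvark": (5, 8)}
```

Because only the weight ratios matter, every document with equal relevance to "banana" and "chris" and none to "aardvark" lands at the same midpoint, whatever its absolute counts; that ambiguity is one reason VIBE lets users drag the keyword points to separate documents.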
VR-VIBE
Keyword Query
• Good:
  • Reduces the browsing space
  • Maps the collection according to the user’s interests
• Bad:
  • What keywords do I use?
  • What about related documents that don’t use these keywords?
  • No initial overview
  • Mega-hit, zero-hit problem: a query returns too many results or none
Assignment
• Thurs: Document Collections
  • Bederson, “Image Browsing”
    » Rui, anusha
  • Card, “Web Book and Web Forager”
    » mrinmayee, ming
• Demo your HW3: Tues or Thurs
Next Week
• Tues: 3-D data
  • Kniss, “Interactive Volume Rendering with Direct Manip”
    » xueqi, mahesh
• Thurs: Workspaces
  • Robertson, “Task Gallery”
    » supriya, varun
  • Upson, “AVS”
    » christa, jun
• Thanksgiving break
• Tues 27: Debates
  • Kobsa, “Empirical comparison of comm infovis systems”
    » kunal, zhiping
Upcoming Sched
• Tues: 3-D data
• Thurs: Workspaces
• Thanksgiving break
• Tues 27: Debates
• Thurs 29: How (not) to lie with visualization
• Dec: project presentations
• Dec 7: CHI 2-pagers due, student posters due