
SkyTree Visualization Fireside Chat
Is Big Data Visualization Possible?

Tamara Munzner
Department of Computer Science
University of British Columbia

Google Hangout on Air
October 1, 2014

http://www.cs.ubc.ca/~tmm/talks.html#skytree14

About me: Geometry Center 1991-1995

http://geomview.org/

http://youtu.be/-gLNlC_hQ3M

• geometry and topology vis
  – 3D, 4D, non-Euclidean

http://youtu.be/sKqt6e7EcCs

http://youtu.be/x7d13SgqUXg

Geomview

The Shape of Space

Outside In

http://youtu.be/6j4T7l49H3Y http://www.crcpress.com/product/isbn/9781568814537

About me: Stanford 1995-2000

• infovis: network vis
  – 3D hyperbolic trees/networks
  – computational linguistics network

H3

http://youtu.be/fhbQy_NCwWI

Constellation

http://youtu.be/7sJC3QVpSkQ


About me: UBC 2002-


technique-driven work

problem-driven work

evaluation

theoretical foundations

When to use visualization

• human in the loop needs the details
  – doesn't know exactly what questions to ask in advance
  – long-term analysis
  – automation stepping stone, refining, trust building
  – presentation
• external representation: perception vs cognition
• intended task, measurable definitions of effectiveness

Computer-based visualization systems provide visual representations of datasets designed to help people carry out tasks more effectively.

more at: Visualization Analysis and Design, Chapter 1. Munzner. AK Peters, 2014, to appear.

Visualization is suitable when there is a need to augment human capabilities rather than replace people with computational decision-making methods.


Why show data to people?

• summaries lose information
  – confirm expected and find unexpected patterns
  – assess validity of statistical model

Anscombe’s Quartet

Identical statistics
x mean 9
x variance 10
y mean 8
y variance 4
x/y correlation 1
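The quartet's identical summary statistics can be checked directly. The following is a minimal sketch, assuming the standard published values of Anscombe's four datasets (the data themselves are not on the slide); the helper name `summarize` is illustrative. The slide's figures appear to be rounded to whole numbers: unrounded, the y mean is about 7.50 and the correlation about 0.82, and the x variance is 10 only as a population variance (the sample variance is 11).

```python
from math import sqrt

# Standard published values of Anscombe's quartet (assumed; not listed on the slide).
X123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
QUARTET = {
    "I": (X123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (X123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (X123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def summarize(xs, ys):
    """Return means, population variances, and Pearson correlation for one dataset."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return {
        "x_mean": mx,
        "x_var": sxx / n,
        "y_mean": my,
        "y_var": syy / n,
        "corr": sxy / sqrt(sxx * syy),
    }


if __name__ == "__main__":
    # The four rows print nearly the same numbers even though the plots differ.
    for name, (xs, ys) in QUARTET.items():
        print(name, {k: round(v, 2) for k, v in summarize(xs, ys).items()})
```

The numbers agree across all four datasets, which is the slide's point: only plotting the points reveals how different the datasets are.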

Technique-driven work: Networks

• scaling up networks
  – multilevel networks, 10K-100K nodes
    • topologically aware decomposition, layout, browsing
  – trees, millions of nodes
    • guaranteed visibility of semantically meaningful marks

Figure 7.25: GrouseFlocks uses containment to show graph hierarchy structure. (a) Original graph. (b) Several alternative hierarchies built from the same graph. The hierarchy alone is shown in the top row. The bottom row combines the graph encoded with connection with a visual representation of the hierarchy using containment. From [Archambault et al. 08], Figure 3.

TreeJuxtaposer, PRISAD

http://youtu.be/GdaPj8a9QEo

http://youtu.be/AWXAe8zvkt8

TopoLayout, Smashing Peacocks Further, Grouse, GrouseFlocks, TugGraph

http://youtu.be/fq8EIAOutvs

http://youtu.be/t1Xbt6XOWp8

Technique-driven work: Dimensionality reduction

• closest overlap between vis and ML
  – Glimmer: MDS on the GPU
  – Glint: DR for costly distances
  – QSNE: sparse documents
    • high quality for millions of items

QSNE

Glimmer

http://youtu.be/PLaBAPM6qLI

Glint

Problem-driven work: Genomics

MulteeSum

[Figure: genome comparison view. Source: Human; destination: Lizard. Chromosome labels chr1-chr22, chrX, chrY; orientation legend: match, inversion.]

MizBee http://youtu.be/86p7brwuz2g

http://youtu.be/AHDnv_qMXxQ

Variant View

http://youtu.be/76HhG1FQngI

Cerebral

Problem-driven work: Many domains

Overview: investigative journalism

RelEx: in-car overlay networks

LiveRAC: system management time-series

Vismon: fisheries management http://youtu.be/h0kHoS4VYmk

http://youtu.be/89lsQXc6Ao4

http://youtu.be/ld0c3H0VSkw

http://vimeo.com/71483614


Overview design evolution


v1

v3

v4

• how to find the needle in the haystack?

• how to convince that the haystack has no needles?


Overview origin story: WikiLeaks meets Glimmer

• WikiLeaks: hacker-journalist Jonathan Stray analyzing Iraq warlogs
  – conjecture that existing label classification falls short of showing all meaningful structure in data
    • friendly action, criminal incident, ...
  – had some NLP, needed better vis tools
• Glimmer: multilevel dimensionality reduction algorithm
  – scalability to 30K documents and terms

[Glimmer: Multilevel MDS on the GPU. Ingram, Munzner, Olano. IEEE TVCG 15(2):249-261, 2009.]
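Glimmer's multilevel GPU algorithm is not reproduced here. As a rough illustration of the kind of quantity an MDS method tries to keep small, the sketch below computes one common form of normalized stress between the pairwise distances in the original space and those in a 2D layout. The function name and the toy points are assumptions made for illustration only.

```python
from itertools import combinations
from math import dist, sqrt


def normalized_stress(high, low):
    """sqrt(sum (d_high - d_low)^2 / sum d_high^2), summed over all point pairs."""
    num = den = 0.0
    for i, j in combinations(range(len(high)), 2):
        d_high = dist(high[i], high[j])  # distance in the original space
        d_low = dist(low[i], low[j])     # distance in the layout
        num += (d_high - d_low) ** 2
        den += d_high ** 2
    return sqrt(num / den)


if __name__ == "__main__":
    # Toy data: four points in 3D that all lie in the z = 0 plane.
    high = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]
    faithful = [(x, y) for x, y, _ in high]      # keeps every pairwise distance
    collapsed = [(x, 0.0) for x, _, _ in high]   # squashes y and distorts distances
    print(normalized_stress(high, faithful))
    print(normalized_stress(high, collapsed))
```

A layout that preserves every pairwise distance has zero stress; the collapsed layout does not. This all-pairs loop is quadratic in the number of points, which hints at why scaling to tens of thousands of documents calls for a more careful algorithm.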

Visual dimensionality reduction for document datasets

Task 1
• what? in: high-dimensional data; out: 2D data
• why? produce, derive

Task 2
• what? in: 2D data; out: scatterplot, clusters & points
• why? discover, explore, identify
• how? encode, navigate, select

Task 3
• what? in: scatterplot, clusters & points; out: labels for clusters
• why? produce, annotate

• more on visual DR: hour-long talk Dimensionality Reduction from Several Angles
  http://www.cs.ubc.ca/~tmm/talks.html#linz14
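The Task 1 to Task 3 chain on this slide can be read as a small pipeline in which each task's output becomes a later task's input. The sketch below records that bookkeeping and checks that no task asks for something nobody produced. The class and field names (`Task`, `inputs`, `outputs`, `unmet_inputs`) are hypothetical and do not come from the Overview code base.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    name: str
    why: tuple        # action verbs, e.g. ("produce", "derive")
    inputs: tuple     # what goes in
    outputs: tuple    # what comes out
    how: tuple = ()   # idiom choices, filled in only where the slide lists them


CHAIN = [
    Task("Task 1", ("produce", "derive"), ("HD data",), ("2D data",)),
    Task("Task 2", ("discover", "explore", "identify"), ("2D data",),
         ("scatterplot", "clusters & points"),
         how=("encode", "navigate", "select")),
    Task("Task 3", ("produce", "annotate"),
         ("scatterplot", "clusters & points"), ("labels for clusters",)),
]


def unmet_inputs(chain, initial):
    """Return (task name, input) pairs that no earlier task or initial dataset provides."""
    available = set(initial)
    missing = []
    for task in chain:
        missing += [(task.name, item) for item in task.inputs if item not in available]
        available.update(task.outputs)
    return missing


if __name__ == "__main__":
    # An empty list means every input is produced somewhere upstream.
    print(unmet_inputs(CHAIN, initial={"HD data"}))
```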


What/Why/How interplay

• why: understand clusters
• what: derive data of full cluster hierarchy
  – explore space of possible clusterings
• how: show cluster hierarchy
  – arrange space: node-link
• how: support tagging clusters/docs
  – following or cross-cutting hierarchy!
    • simple annotation
    • progress tracking
    • user-defined semantics

[Figure: framework excerpts. Dataset Types: Tables, Networks (Node (item), Link), Trees. Targets, Network Data: Topology, Paths. Arrange Networks and Trees: Node-link Diagrams (Connections and Marks). Produce: Annotate (tag).]

How: Idiom design decisions

• facet: juxtapose linked views
  – linked color coding
    • cluster hierarchy tree
    • DR scatterplot
    • tags
  – reading text/keywords
    • cluster list
    • doc reader

[Figure: framework excerpts. Juxtapose and Coordinate Views: Share Encoding (Same/Different), Share Data (All/Subset/None), Linked Highlighting. Identity Channels for Categorical Attributes: Spatial region, Color hue, Motion, Shape.]


Overview video (version 1)


http://www.cs.ubc.ca/labs/imager/tr/2012/modiscotag/

Path to adoption

• version 1
  – fast cluster hierarchy construction for sparse data
  – research prototype by PhD student
  – positive initial assessment from AP Caracas bureau chief
    • barrier to adoption: difficult install/load process
• version 2
  – web deployment, DocumentCloud integration, usability
    • many months of engineering
  – Knight Foundation funding to the rescue!
    • published story by unaffiliated reporter: police corruption in Tulsa

2011 2012

v1 v2$

Path to adoption

• even more rounds of what/why/how interplay
  – which views needed? what should they show? how should they show it?
  – usability and utility
• version 3
  – published story: VP candidate Ryan asked for federal help even as he championed cuts
  – published story: gun control debate
• version 4
  – followup investigation: government corruption in Texas
  – published story: police misconduct in New York (Pulitzer prize finalist!)

2011 2012 2013 2014

v1 v2 v3 v4$


Overview video v4

• versions 3 and 4
  – no DR scatterplot
  – tree arrangement emphasizing nodes not links
  – combined doc/cluster viewer

http://vimeo.com/71483614


Why: Task abstractions

• what’s in this collection? (of leaked docs)
  – generate hypothesis
  – summarize clusters
  – explore clusters
• locate evidence (within FOIA dump)
  – verify hypothesis
  – identify clusters/documents
  – locate clusters/documents
• prove non-existence of evidence
  – even harder!
  – exhaustive reading vs filtering out irrelevant

Search
                    Target known    Target unknown
Location known      Lookup          Browse
Location unknown    Locate          Explore

Query: Identify, Compare, Summarise

Discover

[A Multi-Level Typology of Abstract Visualization Tasks. Brehmer and Munzner. IEEE TVCG 19(12):2376-2385, 2013 (Proc. InfoVis 2013).]
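The search part of the typology is a two-by-two lookup on whether the target and its location are known. A minimal sketch, using a hypothetical function name `classify_search`:

```python
def classify_search(target_known: bool, location_known: bool) -> str:
    """Map the two yes/no questions of the search typology to its four terms."""
    table = {
        (True, True): "lookup",
        (True, False): "locate",
        (False, True): "browse",
        (False, False): "explore",
    }
    return table[(target_known, location_known)]


if __name__ == "__main__":
    # A reporter who knows which document they want, but not where it sits in the dump.
    print(classify_search(target_known=True, location_known=False))
```

Under this reading, "locate evidence (within FOIA dump)" is the case where the target is known and the location is not, while "what's in this collection?" starts with neither known.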

Now what?

• continuing adoption
  – food stamp distribution delays in North Carolina
  – Surprise! Many credit card agreements allow repossession
  – The brilliance of Louis C.K.'s emails: He writes like a politician
  – Private memo reveals winding tale involving John McCain, the NRA, and... condors
• continuing development
  – Knight Foundation funds v5: named entity recognition, plugin API
• InfoVis14 paper

Overview: The Design, Adoption, and Analysis of a Visual Document Mining Tool For Investigative Journalists. Brehmer, Ingram, Stray, and Munzner.

https://www.overviewproject.org/

http://overview.ap.org/

http://www.cs.ubc.ca/labs/imager/tr/2014/Overview/

Algorithm: Spinoff series

• dimensionality reduction for huge text collections
  – great algorithm problem in its own right!
  – QSNE: fast and high-quality DR for millions of documents
    • key feature: handle sparseness appropriately

[Dimensionality Reduction for Documents with Nearest Neighbor Queries. Ingram and Munzner. Neurocomputing (Special Issue on Visual Analytics using Multidimensional Projections), to appear 2014.]
http://www.cs.ubc.ca/labs/imager/tr/2014/QSNE/
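QSNE itself is not reproduced here. As one illustration of what handling sparseness can mean for document vectors, the sketch below stores each document as a term-to-weight dictionary and computes cosine similarity by touching only the terms that are actually present. The function name and the dictionary representation are assumptions for illustration, not the paper's method.

```python
from math import sqrt


def sparse_cosine(a: dict, b: dict) -> float:
    """Cosine similarity of two sparse term-weight vectors stored as dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = sqrt(sum(w * w for w in a.values()))
    norm_b = sqrt(sum(w * w for w in b.values()))
    # An empty document has no direction; treat its similarity as zero.
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    doc1 = {"police": 2.0, "misconduct": 1.0}
    doc2 = {"police": 1.0, "budget": 3.0}
    print(sparse_cosine(doc1, doc2))
```

The cost depends on how many terms each document contains rather than on the size of the whole vocabulary, which is the property that matters when most entries of a document-term matrix are zero.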