09/05/14 pag. 1
Information visualization lecture 8
visual analytics
Katrien Verbert Department of Computer Science
Faculty of Science Vrije Universiteit Brussel
Motivation
http://bigdatablog.emc.com/2013/08/08/mlb-a-big-fan-of-big-data/
Motivation
• Volume: gigabyte (10^9 bytes), terabyte (10^12), petabyte (10^15), exabyte (10^18), zettabyte (10^21)
• Variety:
  – structured, semi-structured, unstructured
  – text, image, audio, video, record
• Velocity: dynamic, sometimes time-varying
Big Data refers to datasets that grow so large that they are difficult to capture, store, manage, share, analyze, and visualize with typical database software tools.
The social layer in an interconnected world
2+ billion people on the Web by end 2011
30 billion RFID tags today (1.3B in 2005)
4.6 billion camera phones worldwide
100s of millions of GPS-enabled devices sold annually
76 million smart meters in 2009… 200M by 2014
12+ TBs of tweet data every day
25+ TBs of log data every day
? TBs of data every day
Motivation
Raw data has no value in itself; only the extracted information has value.
Src: http://www.sas.com/knowledge-exchange/business-analytics/featured/big-data-drives-performance-%E2%80%93-if-you-can-exploit-the-information-overload/index.html
Information overload
• Refers to the danger of getting lost in data, which may be:
  – irrelevant to the current task at hand,
  – processed in an inappropriate way, or
  – presented in an inappropriate way.
Src: http://supermarketpeople.co.uk/archives/713
Visual Analytics - Mastering the Information Age
https://www.youtube.com/watch?v=5i3xbitEVfs
Definition
Visual analytics is the science of analytical reasoning facilitated by interactive visual interfaces.
To whom does it matter?
• Research community
• Business community: new tools, new capabilities, new infrastructure, new business models, etc.
Financial Services...
Visual analytics is a multidisciplinary field
History
History: discovery tools
• Shneiderman (IV ‘02) suggests combining computational analysis approaches such as data mining with information visualization
  – Too often viewed as competitors in the past
  – Instead, they can complement each other
  – Each has something valuable to contribute
• Alternatives
  – Issues influencing the design of discovery tools:
    • Statistical algorithms vs. visual data presentation
    • Hypothesis testing vs. exploratory data analysis
  – Each has pros and cons
Hypothesis testing & exploratory data analysis
• Hypothesis testing
  – Advocates:
    • Stating hypotheses up front limits variables, sharpens thinking, and allows more precise measurement
  – Critics:
    • Too far from reality; initial hypotheses bias toward finding evidence to support them
• Exploratory data analysis
  – Advocates:
    • Find the interesting things this way; we now have the computational capabilities to do it
  – Skeptics:
    • Not generalizable, everything is a special case; detecting statistical relationships does not imply cause and effect
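To make the contrast concrete, the sketch below applies both styles to the same toy data. It is only an illustration: the data values, the function names `permutation_test` and `explore`, and the choice of a permutation test as the confirmatory method are assumptions, not something the lecture prescribes.

```python
import random
from statistics import mean, median, stdev


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Confirmatory step: test the hypothesis 'groups a and b have the same mean'.

    Returns the fraction of random relabelings whose absolute mean difference
    is at least as large as the observed one (an approximate p-value).
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_permutations


def explore(values):
    """Exploratory step: summarize the data without a prior hypothesis."""
    return {
        "min": min(values),
        "median": median(values),
        "max": max(values),
        "stdev": stdev(values),
    }


# Hypothetical response times (seconds) for two interface variants.
variant_a = [12.1, 11.8, 12.5, 13.0, 12.2, 11.9]
variant_b = [14.2, 13.9, 14.8, 14.1, 15.0, 14.4]

print("approximate p-value:", permutation_test(variant_a, variant_b))
print("summary A:", explore(variant_a))
print("summary B:", explore(variant_b))
```

The test answers one question stated in advance, while the summaries may surface things nobody asked about. As the skeptics note, neither output establishes cause and effect.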
Recommendations
• Integrate data mining and information visualization
• Allow users to specify what they are seeking
• Recognize that users are situated in a social context
• Respect human responsibility
History
• Information visualization systems inadequately supported decision making (Amar & Stasko IV ‘04):
  – Limited affordances
  – Predetermined representations
  – Decline of determinism in decision-making
• “Representational primacy” versus “analytic primacy”
  – Telling the truth about data vs. providing analytically useful visualizations
Representational primacy
• Pursuit of faithful data replication and comprehension
• Above all else, we must show the data
• Can be a limiting notion
• May focus on low-level tasks not relevant to users
• Some data is hard to represent faithfully (think high-dimensional data)
Gaps between representation and analysis
• When explanatory or correlative models are the desired outcome, model visualization may be more important than data visualization
• One extreme is to build black boxes where data goes in and the answer comes out
• Not a good idea to trust black boxes
  – What can go wrong with this?
  – A white-box approach is better.
Application of Precepts
Worldview-based tasks (“Did we show the right thing to the user?”)
  – Determine Domain Parameters
  – Expose Multivariate Explanation
  – Facilitate Hypothesis Testing
Rationale-based tasks (“Will the user believe what she sees?”)
  – Expose Uncertainty
  – Concretize Relationships
  – Expose Cause and Effect
Amar & Stasko 2005
Task level emphases
• Don’t just help “low-level” tasks
  – Find, filter, correlate, etc.
• Facilitate analytical thinking
  – Complex decision-making, especially under uncertainty
  – Learning a domain
  – Identifying the nature of trends
  – Predicting the future
History: coining the term
• 2003-04: Jim Thomas of PNNL, together with colleagues, develops the notion of visual analytics
  – Holds workshops at PNNL and at InfoVis ‘04 to help define a research agenda
• Agenda is formalized in the book Illuminating the Path
  – Available online, http://nvac.pnl.gov/
• People use visual analytics tools and techniques to
  – Synthesize information and derive insight from massive, dynamic, ambiguous, and often conflicting data
  – Detect the expected and discover the unexpected
  – Provide timely, defensible, and understandable assessments
  – Communicate assessment effectively for action.
Src: Stasko
Visual Analytics
• Not really an “area” per se
  – More of an “umbrella” notion
  – Combines multiple areas or disciplines
  – Ultimately about using data to improve our knowledge and help make decisions
• Main Components:
Src: Stasko
Visual Analytics: alternate Definition
Visual analytics combines automated analysis techniques with interactive visualizations for an effective understanding, reasoning and decision making on the basis of very large and complex data sets.
Keim et al., chapter in Information Visualization: Human-Centered Issues and Perspectives, 2008
In-Spire Video
https://www.youtube.com/watch?v=YONTBZaxz8g
Synergy
• Combine strengths of both human and electronic data processing
  – Gives a semi-automated analytical process
  – Use strengths from each
  – Below from Keim, 2008
InfoVis Comparison
• Clearly much overlap
• Perhaps fair to say that InfoVis hasn’t always focused so much on analysis tasks and that it doesn’t always include advanced data analysis algorithms
  – Not a criticism, just not its focus
  – InfoVis has a narrower scope
Src: Stasko
Visual Analytics
• Encompassing, integrated approach to data analysis
  – Use computational algorithms where helpful
  – Use human-directed visual exploration where helpful
  – Not just “Apply A, then apply B” though
  – Integrate the two tightly
Src: Stasko, 2013
Visual analytics aims at making data and information processing transparent
Just as information visualization has changed our view on databases, the goal of visual analytics is to make our way of processing data and information transparent for an analytic discourse.
The visual analytics process
The visual analytics process is characterized through interaction between data, visualizations, models about the data, and the users in order to discover knowledge.
Information seeking mantra for visual analytics applications
Information visualization mantra (Shneiderman): Overview first, zoom/filter, details on demand.
Visual analytics mantra: Analyze first, show the important, zoom/filter, analyze further, details on demand.
Keim et al. 2006
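The visual analytics mantra can be read as a pipeline of small steps. The following is a minimal sketch under stated assumptions: the records, the field names, and the distance-from-mean score used as "importance" are all hypothetical, and a real system would put interactive views behind each step.

```python
from statistics import mean, pstdev


def analyze(records):
    """Analyze first: score each record by its distance from the mean value."""
    values = [r["value"] for r in records]
    mu, sigma = mean(values), pstdev(values)
    return [
        {**r, "score": abs(r["value"] - mu) / sigma if sigma else 0.0}
        for r in records
    ]


def show_important(scored, top_k=3):
    """Show the important: keep only the highest-scoring records."""
    return sorted(scored, key=lambda r: r["score"], reverse=True)[:top_k]


def zoom_filter(scored, region):
    """Zoom/filter: restrict to a user-selected subset."""
    return [r for r in scored if r["region"] == region]


def details_on_demand(records, record_id):
    """Details on demand: return the full record the user asked for."""
    return next(r for r in records if r["id"] == record_id)


# Hypothetical sales records.
records = [
    {"id": 1, "region": "north", "value": 10},
    {"id": 2, "region": "north", "value": 12},
    {"id": 3, "region": "south", "value": 11},
    {"id": 4, "region": "south", "value": 48},
    {"id": 5, "region": "north", "value": 9},
]

scored = analyze(records)                      # analyze first
highlights = show_important(scored, top_k=1)   # show the important
south = zoom_filter(scored, "south")           # zoom/filter
rescored = analyze(south)                      # analyze further
print(highlights[0]["id"], details_on_demand(records, 4))
```

Unlike "overview first", the analysis step runs before anything is shown, so the first view already reflects what the computation found noteworthy.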
Applications
UTOPIAN: user-driven topic modeling
https://www.youtube.com/watch?v=du6_s6hcaRA&feature=youtu.be
SketchPadN-D: WYDIWYG Sculpting and Editing in High-Dimensional Space
https://www.youtube.com/watch?v=ar8kAdAfx6w
Some features
Relationship discovery
Scale independent representations, whole and parts at same time at multiple levels of abstraction, often linked
Relationship discovery
Explore high dimensional relationships, theme groupings, outlier detection, searching by proximity at multiple scales
Combined exploratory and confirmatory analytics
• Develop and refine hypotheses
• Evidence collection, management, and matching to hypotheses
• Tailor views/displays for thematic/hypothesis focus of interest
• Often suggestive of predictions enabling proactive thinking
Multiple data types
• Supports multiple data types: structured/unstructured text
• Imagery/video, cyber
• Systems are either data-type specific or application specific
Temporal views and interactions
• Most analytics situations involve time, pace, velocity
• Group segments of thoughts by time
• Compare time segments
• Often combined with geospatial
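Grouping by time and comparing time segments can be sketched in a few lines. The timestamps, the hourly segment size, and the function names `count_by_hour` and `compare_segments` below are assumptions chosen for illustration only.

```python
from collections import Counter
from datetime import datetime


def count_by_hour(timestamps):
    """Group events into hourly segments and count the events per segment."""
    return Counter(
        ts.replace(minute=0, second=0, microsecond=0) for ts in timestamps
    )


def compare_segments(counts, segment_a, segment_b):
    """Compare two time segments: difference in event counts (b minus a)."""
    return counts[segment_b] - counts[segment_a]


# Hypothetical event timestamps.
events = [
    datetime(2014, 5, 9, 9, 5),
    datetime(2014, 5, 9, 9, 40),
    datetime(2014, 5, 9, 10, 15),
    datetime(2014, 5, 9, 10, 20),
    datetime(2014, 5, 9, 10, 55),
]

counts = count_by_hour(events)
nine = datetime(2014, 5, 9, 9)
ten = datetime(2014, 5, 9, 10)
print(compare_segments(counts, nine, ten))
```

A temporal view would typically draw these counts as a timeline and let the analyst pick the two segments interactively.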
Reasoning workspace
Stu Card, PARC
Grouping and outlier detection
• Form groups of thought/data
• Labels and annotation
• Compare groupings
• Find small groups or outliers
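As a rough illustration of forming groups and then finding small groups or outliers, the sketch below groups one-dimensional values by proximity. The gap threshold, the data, and the function names are hypothetical; visual analytics tools generally work on high-dimensional data with more capable clustering methods.

```python
def group_by_proximity(values, max_gap):
    """Form groups: a sorted value joins the current group if it lies within
    max_gap of the previous value; otherwise it starts a new group."""
    groups = []
    for v in sorted(values):
        if groups and v - groups[-1][-1] <= max_gap:
            groups[-1].append(v)
        else:
            groups.append([v])
    return groups


def small_groups(groups, max_size=1):
    """Find small groups (candidate outliers): at most max_size members."""
    return [g for g in groups if len(g) <= max_size]


# Hypothetical one-dimensional measurements.
data = [1.0, 1.2, 0.9, 5.1, 5.3, 4.8, 5.0, 12.7]
groups = group_by_proximity(data, max_gap=1.0)
print(groups)
print(small_groups(groups))
```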
Labeling
• Critically important
• Dynamic in scope, number of labels, size, color
• Positioning
• Almost everything has labels
• Labels convey semantic meaning
Multiple linked views
Temporal, geospatial, theme, cluster, list views with association linkages between views
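The linkage between views is often realized by letting all views read one shared selection (brushing and linking). A minimal sketch, assuming hypothetical documents and a hypothetical `LinkedViews` class with only a list view and a temporal view:

```python
class LinkedViews:
    """Minimal brushing and linking: all views share one set of selected ids."""

    def __init__(self, records):
        self.records = records
        self.selected_ids = set()

    def brush(self, predicate):
        """Select records in one view; every other view reads the same selection."""
        self.selected_ids = {r["id"] for r in self.records if predicate(r)}

    def list_view(self):
        """List view: titles of the selected records."""
        return [r["title"] for r in self.records if r["id"] in self.selected_ids]

    def temporal_view(self):
        """Temporal view: number of selected records per year."""
        counts = {}
        for r in self.records:
            if r["id"] in self.selected_ids:
                counts[r["year"]] = counts.get(r["year"], 0) + 1
        return counts


# Hypothetical documents with a theme, a place, and a year.
docs = [
    {"id": 1, "title": "Flood report", "theme": "weather", "place": "Ghent", "year": 2012},
    {"id": 2, "title": "Storm alert", "theme": "weather", "place": "Leuven", "year": 2013},
    {"id": 3, "title": "Budget memo", "theme": "finance", "place": "Ghent", "year": 2013},
]

views = LinkedViews(docs)
views.brush(lambda r: r["theme"] == "weather")  # brush in the theme view
print(views.list_view())
print(views.temporal_view())
```

Because the selection lives in one place, adding a geospatial or cluster view only requires another method that reads `selected_ids`.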
Challenges
Challenge 1: scalability
• Challenge with regard to both:
  – visual representations
  – automatic analysis
• Requirements/needs:
  – Solution needs to scale in size, dimensionality, data types, and levels of quality.
  – Methods are needed to deal with input data that is noisy and continuous.
  – Patterns and relationships need to be visualized on different levels of detail, and with appropriate levels of data and visual abstraction.
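One common way to obtain different levels of detail is to aggregate the raw values into bins of varying width before drawing them. The sketch below assumes integer data and a hypothetical helper `bin_counts`; it illustrates data abstraction only, not a particular scalable system.

```python
from collections import Counter


def bin_counts(values, bin_width):
    """Aggregate raw values into bins of the given width (data abstraction).

    Returns a dict mapping each bin's lower edge to the number of values in it.
    """
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical integer measurements; real inputs would be far larger.
values = [3, 7, 12, 18, 21, 25, 29, 33, 47, 51]

coarse = bin_counts(values, bin_width=50)  # overview level
fine = bin_counts(values, bin_width=10)    # detail level
print(coarse)
print(fine)
```

The number of marks to draw then depends on the number of bins rather than on the number of records.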
Challenge 2: uncertainty
• Challenge:
  – large amount of noise and missing values from heterogeneous data sources
  – bias introduced by automatic analysis methods as well as human perception
• Requirements/needs:
  – representation of the notion of data quality and confidence
  – analysts need to be aware of the uncertainty and be able to analyze quality
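A small example of carrying data quality and confidence along with a result: the sketch reports completeness and an approximate interval next to the mean. The readings, the function name, and the normal-approximation interval are assumptions made for illustration; other uncertainty measures may suit a given data set better.

```python
from math import sqrt
from statistics import mean, stdev


def summarize_with_uncertainty(values):
    """Summarize a series that contains missing values (None).

    Reports the mean of the observed values, the fraction of values present
    (a simple data-quality indicator), and an approximate 95% interval for
    the mean using the normal approximation (mean +/- 1.96 * standard error).
    Requires at least two observed values.
    """
    observed = [v for v in values if v is not None]
    completeness = len(observed) / len(values)
    m = mean(observed)
    half_width = 1.96 * stdev(observed) / sqrt(len(observed))
    return {
        "mean": m,
        "completeness": completeness,
        "interval": (m - half_width, m + half_width),
    }


# Hypothetical sensor readings; None marks a missing value.
readings = [20.1, None, 19.8, 20.4, None, 20.0, 19.9, 20.3]
summary = summarize_with_uncertainty(readings)
print(summary)
```

A visualization could then draw the interval as an error bar and encode completeness as, for example, opacity.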
Challenge 3: text data stream
• Analysis and visualization of text streams is still a relatively new field.
• Text stream data often
  – have little structure, grammar and context,
  – are provided as a high-frequency multilingual stream,
  – contain a high percentage of non-meaningful and irrelevant messages.
• The challenge is to handle the dynamic nature of the stream data and the low tolerance for monitoring delay in emergency cases.
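One simple way to cope with the dynamic nature of a stream is to keep statistics over a sliding window of recent messages only. The class below is a hypothetical sketch: the window size, the stopword list, and the whitespace tokenization are assumptions, and it ignores multilingual input entirely.

```python
from collections import Counter, deque


class SlidingTermCounter:
    """Keep term counts over the most recent `window` messages of a stream."""

    def __init__(self, window, stopwords=frozenset()):
        self.window = window
        self.stopwords = stopwords
        self.messages = deque()
        self.counts = Counter()

    def _terms(self, message):
        """Lowercase, split on whitespace, and drop stopwords."""
        return [w for w in message.lower().split() if w not in self.stopwords]

    def add(self, message):
        """Add one message and drop the oldest one if the window is full."""
        terms = self._terms(message)
        self.messages.append(terms)
        self.counts.update(terms)
        if len(self.messages) > self.window:
            self.counts.subtract(self.messages.popleft())
            self.counts += Counter()  # discard zero and negative counts

    def top(self, n):
        """Most frequent terms in the current window."""
        return self.counts.most_common(n)


# Hypothetical message stream.
counter = SlidingTermCounter(window=2, stopwords=frozenset({"the", "a", "in"}))
for msg in ["Fire in the harbour", "Harbour fire spreading", "Traffic jam downtown"]:
    counter.add(msg)
print(counter.top(3))
```

Each update touches only the newest and the oldest message, which keeps the monitoring delay low regardless of how long the stream has been running.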
Challenge 4: interaction
• Novel interaction techniques and tangible user-interfaces for seamless intuitive visual communication with the system.
• The analyst should be able to
  – fully focus on the task at hand and
  – not be distracted by overly technical or complex user interfaces and interactions.
• User feedback should be taken as intelligently as possible, requiring as little user input as possible.
Challenge 5: evaluation
• Challenge:
  – hard to assess the quality of visual analytics solutions, due to the interdisciplinary nature and complex process
• Needs:
  – evaluation framework to assess effectiveness, efficiency and acceptance of new visual analytics techniques, methods, and models
Challenge 6: infrastructure
• Challenge:
  – most solutions develop their own infrastructures
  – mismatch between the level of service provided and real needs:
    • fast and precise answers with progressive refinement,
    • incremental re-computation,
    • steering the computation towards data regions that are of interest.
• Requirements/needs:
  – infrastructure to bind together all the processes, functions, and services
  – repositories of available visual analytics solutions
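Progressive refinement and incremental re-computation can be illustrated with a running mean that reports intermediate answers while data is still arriving. This is a sketch under stated assumptions: the function name `progressive_mean`, the reporting interval, and the data are hypothetical, and real infrastructures apply the same idea to far more complex computations.

```python
def progressive_mean(stream, report_every):
    """Yield (items_seen, running_mean) every `report_every` items.

    The mean is updated incrementally, so each new item costs O(1) and
    earlier items are never re-read.
    """
    count, mean = 0, 0.0
    for x in stream:
        count += 1
        mean += (x - mean) / count  # incremental update of the running mean
        if count % report_every == 0:
            yield count, mean
    if count and count % report_every != 0:
        yield count, mean  # final answer over all items


# Hypothetical data; in practice the stream could be a large file or query result.
data = [4, 8, 6, 10, 2, 6]
for seen, estimate in progressive_mean(data, report_every=2):
    print(seen, estimate)
```

The analyst sees a first estimate early and can stop or redirect the computation before the full data set has been processed.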
http://www.visual-analytics.eu/
Questions?
Readings
• Ch. 1-2
http://www.vismaster.eu/wp-content/uploads/2010/11/VisMaster-book-lowres.pdf
References
• Amar, R. A., & Stasko, J. T. (2005). Knowledge precepts for design and evaluation of information visualizations. Visualization and Computer Graphics, IEEE Transactions on, 11(4), 432-442.
• Keim, D. A., Mansmann, F., Schneidewind, J., & Ziegler, H. (2006). Challenges in visual data analysis. In Information Visualization (IV 2006), Invited Paper, July 5-7, London, United Kingdom. IEEE Press.
• Keim, D., Andrienko, G., Fekete, J.-D., Görg, C., Kohlhammer, J., & Melancon, G. (2008). Visual analytics: Definition, process, and challenges. In A. Kerren, J. T. Stasko, J.-D. Fekete, & C. North (Eds.), Information Visualization. Lecture Notes in Computer Science, Vol. 4950, 154-175. Springer-Verlag, Berlin, Heidelberg.
• Shneiderman, B. (2002). Inventing discovery tools: combining information visualization with data mining. Information Visualization, 1(1), 5-12.