big data visualization
DESCRIPTION
Big Data Visualization
Kwan-Liu Ma, Professor of Computer Science and Chair of the Graduate Group in Computer Science (GGCS) at the University of California-Davis
January 22nd, 2014
We are entering a data-rich era. Advanced computing, imaging, and sensing technologies enable scientists to study natural and physical phenomena at unprecedented precision, resulting in an explosive growth of data. The size of the collected information about the Web and mobile device users is expected to be even greater. To make sense of and maximize the use of such vast amounts of data for knowledge discovery and decision making, we need a new set of tools beyond conventional data mining and statistical analysis. One such tool is visualization. I will present visualizations designed for gleaning insight from massive data and guiding complex data analysis tasks. I will show case studies using data from cyber/homeland security, large-scale scientific simulations, medicine, and sociological studies.
Big Data Visualization Meetup - South Bay http://www.meetup.com/Big-Data-Visualisation-South-Bay/
TRANSCRIPT
Kwan-Liu Ma Department of Computer Science University of California at Davis
Big Data Visualization
CA Technologies 1/22/2014
Big Data: Issues • Volume: size/scale • Velocity: rate • Variety: type/form • Veracity: accuracy and completeness
Visualization • To explore and discover • To validate • To communicate
• An overview, a path, an interface
Extreme-Scale Scientific Simulations
Scientific Simulations
Large Scientific Data Visualization
As we move to Exascale, it’s no longer feasible to store most of the data for post processing! We must do:
• In situ visualization • Parallel visualization that is highly scalable • In situ data reduction and triage • In situ data processing for interactive data exploration and analysis
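One way to picture in situ data reduction is a simulation loop that reduces each timestep while it is still in memory and keeps only the reduced form. The sketch below is an illustration under stated assumptions, not the pipeline used in the talk: the 1D diffusion kernel, the block-averaging reduction, and the names `simulate_step`, `reduce_in_situ`, and `run` are all hypothetical.

```python
import math

def simulate_step(field, dt=0.1):
    """Toy 1D diffusion update standing in for a real simulation kernel (hypothetical)."""
    new = field[:]
    for i in range(1, len(field) - 1):
        new[i] = field[i] + dt * (field[i - 1] - 2 * field[i] + field[i + 1])
    return new

def reduce_in_situ(field, block=4):
    """Reduce one timestep in memory: per-block means plus the global min and max."""
    means = []
    for start in range(0, len(field), block):
        chunk = field[start:start + block]
        means.append(sum(chunk) / len(chunk))
    return {"means": means, "min": min(field), "max": max(field)}

def run(n_cells=64, n_steps=100, save_every=10):
    """Advance the toy simulation and keep only reduced snapshots, never the full field."""
    field = [math.exp(-((i - n_cells / 2) ** 2) / 20.0) for i in range(n_cells)]
    saved = []
    for step in range(1, n_steps + 1):
        field = simulate_step(field)
        if step % save_every == 0:
            saved.append((step, reduce_in_situ(field)))
    return saved

if __name__ == "__main__":
    for step, summary in run():
        print(step, len(summary["means"]), round(summary["max"], 4))
```

With the default arguments, each saved snapshot holds 16 block means instead of 64 cell values, and only every tenth step is kept at all. A real in situ pipeline would pick a reduction that preserves the features scientists care about, which is the "triage" part of the slide.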
Supernova Simulation
Simulation: John Blondin, NCSU
Fusion Simulations
Simulation: Dr. S. Ethier, the Princeton Plasma Physics Lab.
Big Network Analysis & Visualization
FM3
GRIP
Treemap
Hilbert
Sunburst
Circle
222 nodes 2583 edges
Network Simplification/Characterization
Hamas
al Qaeda
TVCG 12(6) 2006
Network Simplification/Characterization
Friendster social network: links between cluster centers exhibit negative sensitivity (red)
Astrophysics co-author network: one competitive network (red) and one collaborative network (blue)
Using centrality sensitivity
Competitive
Collaborative
TVCG 18(1) 2012
The Graph Layout Problem • The cost of displaying a graph
• The hairball problem of large graph layouts – Large, dense graphs become a mess – Inefficient use of space – Details cluttered
• Solutions – Filtering – Clustering – Abstraction – Focus+context
California data: 6,107 nodes, 15,160 edges
High dimensional embedding method
A Fast Graph Layout Method • Hierarchically cluster the nodes (if no clustering given) • Traverse the hierarchy to order the nodes • Place the nodes in that order along a space filling curve
Order 1 Order 2 Order 3 Order 4 Order 5 Order 11
Hilbert curves
TVCG 14(6) 2008
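The three steps of the fast layout method can be sketched as follows. This is a minimal illustration, not the TVCG 2008 implementation: it assumes the clustering hierarchy is already given as nested Python lists, and the names `hilbert_d2xy`, `leaf_order`, and `layout` are hypothetical. The index-to-coordinate conversion is the standard iterative Hilbert curve mapping.

```python
def hilbert_d2xy(order, d):
    """Map curve index d in [0, 4**order) to (x, y) on a 2**order x 2**order grid."""
    x = y = 0
    t = d
    s = 1
    n = 1 << order
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so consecutive indices stay adjacent.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

def leaf_order(tree):
    """Depth-first traversal of a nested-list hierarchy; leaves are node ids."""
    if not isinstance(tree, (list, tuple)):
        return [tree]
    ordered = []
    for child in tree:
        ordered.extend(leaf_order(child))
    return ordered

def layout(tree):
    """Place nodes, in hierarchy order, at evenly spaced positions along a Hilbert curve."""
    nodes = leaf_order(tree)
    order = 1
    while 4 ** order < len(nodes):
        order += 1  # smallest curve with at least one cell per node
    cells = 4 ** order
    positions = {}
    for i, node in enumerate(nodes):
        d = i * cells // len(nodes)
        positions[node] = hilbert_d2xy(order, d)
    return positions

if __name__ == "__main__":
    hierarchy = [["a", "b"], ["c", ["d", "e"]]]
    for node, xy in layout(hierarchy).items():
        print(node, xy)
```

Because nodes in the same cluster are adjacent in the traversal order and the curve preserves locality, cluster members end up close together on screen. Once the clustering exists, placement is a single pass over the nodes, which is consistent with the small layout times the following slides report separately from the one-time clustering cost.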
Fast Graph Layout A Graph with 6,107 nodes 15,160 edges
Hilbert
Space filling curve: Gosper
Treemap
High dimensional embedding: 0.19s
One time clustering: 0.5 seconds Layout + rendering: 0.0005 seconds
LinLog (force directed): 10,737s
Fast Graph Layout Internet Connectivity 41,928 nodes 218,080 edges
Space filling curve: Hilbert
Space filling curve: Gosper
FM3: 40.8s
GRIP: 6.87s
One time clustering: 18.87 seconds Layout + rendering: 0.0036 seconds
Treemap
Dynamic Networks
Growing Internet Incremental clustering-based approach – Radial treemap layout
Video
Time-Varying Networks
• Almost all networks found in real-world applications are time-varying
• Both nodes and edges can change
• Visualization methods: – Animations – Small multiples visualization – Difference visualization – Storyline visualization
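Of the methods listed, difference visualization has the simplest data preparation: compare two snapshots and classify what appeared and what disappeared. A minimal sketch, assuming each snapshot is an undirected edge list (so isolated nodes are not represented); the function names are hypothetical.

```python
def normalize(edges):
    """Treat edges as undirected: (a, b) and (b, a) are the same edge."""
    return {frozenset(e) for e in edges}

def nodes_of(edge_set):
    """Collect every node that appears in at least one edge."""
    return {n for e in edge_set for n in e}

def snapshot_diff(old_edges, new_edges):
    """Classify nodes and edges as added or removed between two snapshots."""
    old, new = normalize(old_edges), normalize(new_edges)
    old_nodes, new_nodes = nodes_of(old), nodes_of(new)
    return {
        "added_edges": new - old,
        "removed_edges": old - new,
        "added_nodes": new_nodes - old_nodes,
        "removed_nodes": old_nodes - new_nodes,
    }

if __name__ == "__main__":
    t0 = [("a", "b"), ("b", "c")]
    t1 = [("b", "a"), ("c", "d")]
    for kind, items in snapshot_diff(t0, t1).items():
        print(kind, sorted(sorted(i) if isinstance(i, frozenset) else i for i in items))
```

A renderer could then draw both snapshots in one view and color the four categories differently.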
Storyline Visualization
XKCD.com
Storyline Visualization
• Consisting of a series of lines, going from left to right along the time-axis, that converge and diverge in the course of their paths.
• Each line represents a unique entity (character) in the data.
• The starting & ending points of each line represent the lifespan of the corresponding entity.
• Lines are bundled together during the time period of their interaction.
• Existing algorithms:
1. Rules and heuristics based [Ogawa & Ma 2008]
2. Genetic algorithm [Tanahashi & Ma 2012]
3. Convex quadratic optimization [Liu et al. 2013]
4. Greedy algorithms
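To make the layout problem concrete, here is a minimal greedy heuristic. It is not any of the cited algorithms. It assumes the input is a list of time steps, each a list of groups of entities that interact at that step, and it assigns each entity an integer vertical slot per step. The name `greedy_storyline` and the input format are hypothetical.

```python
def greedy_storyline(timesteps, gap=1):
    """Assign a vertical slot to every entity at every time step.

    Groups are ordered by the mean previous slot of their members, so lines
    move as little as possible; members of a group get consecutive slots,
    which bundles their lines while they interact.
    """
    prev = {}
    result = []
    for groups in timesteps:
        def group_key(group):
            known = [prev[e] for e in group if e in prev]
            # Groups made only of new entities are placed after existing ones.
            return sum(known) / len(known) if known else float("inf")

        current = {}
        y = 0
        for group in sorted(groups, key=group_key):
            members = sorted(group, key=lambda e: (prev.get(e, float("inf")), str(e)))
            for entity in members:
                current[entity] = y
                y += 1
            y += gap  # leave empty slots between bundles
        result.append(current)
        prev = current  # entities absent from this step lose their remembered slot
    return result

if __name__ == "__main__":
    steps = [
        [["A", "B"], ["C"]],
        [["A"], ["B", "C"]],
        [["A", "B", "C"]],
    ]
    for t, slots in enumerate(greedy_storyline(steps)):
        print(t, slots)
```

The heuristic only looks one step back and ignores line crossings and wiggle over the whole timeline, which are the quantities the optimization-based methods try to minimize.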
Star Wars
Matrix
Inception
Star Wars
Video
Enron Scandal Email Data 1230 days, 1264 employees, 495,408 messages, and 3478 email clusters
Video
Current Projects
• Dynamic network visualization [Biological science, Internet, social networks]
• Visual recommendations and predictive analysis [Transportation]
• Visual analytics for cyber and airborne intelligence
• Remote and collaborative visualization
• Volume data visualization [Flow simulation, biomedical imaging, NDT]
• Health record visualization
• Visual analysis of driving behaviors and energy use [Transportation]
• Visualization for scientific storytelling
• Massively parallel visualization
• In situ visualization and data reduction
• Visualizing large scale computing [Scientific computing, cloud computing]
• Video visualization [Security]
• Uncertainty visualization
• Visualization interface design
CENTER FOR VISUALIZATION
Kwan-Liu Ma [email protected] http://www.cs.ucdavis.edu/~ma