Big Data Visualization

Kwan-Liu Ma Department of Computer Science University of California at Davis Big Data Visualization CA Technologies 1/22/2014

Post on 27-Jan-2015

DESCRIPTION

Big Data Visualization
Kwan-Liu Ma, Professor of Computer Science and Chair of the Graduate Group in Computer Science (GGCS) at the University of California-Davis
January 22nd 2014

We are entering a data-rich era. Advanced computing, imaging, and sensing technologies enable scientists to study natural and physical phenomena at unprecedented precision, resulting in an explosive growth of data. The size of the collected information about the Web and mobile device users is expected to be even greater. To make sense of such vast amounts of data and maximize their use for knowledge discovery and decision making, we need a new set of tools beyond conventional data mining and statistical analysis. One such tool is visualization. I will present visualizations designed for gleaning insight from massive data and guiding complex data analysis tasks. I will show case studies using data from cyber/homeland security, large-scale scientific simulations, medicine, and sociological studies.

Big Data Visualization Meetup - South Bay: http://www.meetup.com/Big-Data-Visualisation-South-Bay/

TRANSCRIPT

Page 1: Big Data Visualization

Kwan-Liu Ma Department of Computer Science University of California at Davis

Big Data Visualization

CA Technologies 1/22/2014

Page 2: Big Data Visualization

Big Data: Issues
• Volume: size/scale
• Velocity: rate
• Variety: type/form
• Veracity: accuracy and completeness

Page 3: Big Data Visualization

Visualization
• To explore and discover
• To validate
• To communicate
• An overview, a path, an interface

Page 4: Big Data Visualization

Extreme-Scale Scientific Simulations

Page 5: Big Data Visualization

Scientific Simulations

Page 6: Big Data Visualization

Large Scientific Data Visualization

As we move to Exascale, it's no longer feasible to store most of the data for post processing! We must do:
• In situ visualization
• Parallel visualization that is highly scalable
• In situ data reduction and triage
• In situ data processing for interactive data exploration and analysis
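
As a rough illustration of in situ data reduction, the sketch below summarizes each time step while the data is still in memory and keeps only the summary, never the full field. Everything named here (`simulate_step`, `reduce_in_situ`, the sine-wave field, the bin count) is a hypothetical stand-in, not code from the simulations shown in the talk.

```python
import math


def simulate_step(step, n):
    """Hypothetical stand-in for one simulation time step: n values in [-1, 1]."""
    return [math.sin(0.01 * i + 0.1 * step) for i in range(n)]


def reduce_in_situ(field, bins=16, lo=-1.0, hi=1.0):
    """Reduce a full field to a small summary while it is still in memory."""
    counts = [0] * bins
    for v in field:
        # Map the value to a bin index, clamping to the valid range.
        idx = int((v - lo) / (hi - lo) * bins)
        counts[max(0, min(idx, bins - 1))] += 1
    return {
        "min": min(field),
        "max": max(field),
        "mean": sum(field) / len(field),
        "histogram": counts,
    }


def run(steps, n):
    """Run the loop, storing only per-step summaries (the full field is dropped)."""
    summaries = []
    for step in range(steps):
        field = simulate_step(step, n)
        summaries.append(reduce_in_situ(field))
    return summaries


summaries = run(steps=5, n=10_000)
print(len(summaries), sum(summaries[0]["histogram"]))
```

The point of the design is that storage cost per step depends on the number of bins, not on the size of the field, which is the trade the slide argues for when the raw data cannot be kept.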

Page 7: Big Data Visualization

Supernova Simulation

Simulation: John Blondin, NCSU

Page 8: Big Data Visualization

Fusion Simulations

Simulation: Dr. S. Ethier, the Princeton Plasma Physics Lab.

Page 9: Big Data Visualization

Big Network Analysis & Visualization

Page 10: Big Data Visualization

Layouts compared: FM3, GRIP, Treemap, Hilbert, Sunburst, Circle (222 nodes, 2,583 edges)

Page 11: Big Data Visualization

Network Simplification/Characterization

Hamas

al Qaeda

TVCG 12(6) 2006

Page 12: Big Data Visualization

Network Simplification/Characterization

Friendster social network: links between cluster centers exhibit negative sensitivity (red)
Astrophysics co-author network: one competitive network (red) and one collaborative network (blue)

Using centrality sensitivity

Competitive

Collaborative

TVCG 18(1) 2012

Page 13: Big Data Visualization

The Graph Layout Problem
• The cost of displaying a graph
• The hairball problem of large graph layouts
  – Large, dense graphs become a mess
  – Inefficient use of space
  – Details cluttered
• Solutions
  – Filtering
  – Clustering
  – Abstraction
  – Focus+context

California data: 6,107 nodes, 15,160 edges

High dimensional embedding method

Page 14: Big Data Visualization

A Fast Graph Layout Method
• Hierarchically cluster the nodes (if no clustering given)
• Traverse the hierarchy to order the nodes
• Place the nodes in that order along a space filling curve

Hilbert curves: Order 1, Order 2, Order 3, Order 4, Order 5, Order 11

TVCG 14(6) 2008
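
The three steps of the fast layout method can be sketched as follows, assuming the cluster hierarchy is already available as nested lists. The index-to-coordinate conversion is the standard iterative Hilbert curve mapping; `leaf_order`, `hilbert_d2xy`, and `hilbert_layout` are illustrative names, and this is a minimal sketch rather than the implementation from the cited TVCG paper.

```python
def leaf_order(tree):
    """Depth-first traversal of a nested-list hierarchy; returns leaves in order."""
    if isinstance(tree, (list, tuple)):
        ordered = []
        for child in tree:
            ordered.extend(leaf_order(child))
        return ordered
    return [tree]


def hilbert_d2xy(order, d):
    """Convert distance d along a Hilbert curve of the given order to (x, y)."""
    n = 1 << order  # grid is n x n cells
    x = y = 0
    t = d
    s = 1
    while s < n:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            # Rotate/flip the quadrant so sub-curves connect end to end.
            if rx == 1:
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def hilbert_layout(nodes):
    """Place nodes, in the given order, at evenly spaced cells along the curve."""
    count = len(nodes)
    order = 0
    while 4 ** order < count:
        order += 1
    cells = 4 ** order
    return {
        node: hilbert_d2xy(order, (i * cells) // count)
        for i, node in enumerate(nodes)
    }


# Hypothetical two-level clustering of four nodes.
hierarchy = [["a", "b"], [["c"], "d"]]
positions = hilbert_layout(leaf_order(hierarchy))
print(positions)
```

Because the curve preserves locality, nodes that are adjacent in the traversal order (and so close in the hierarchy) land in nearby cells, and each position is computed independently of the edges.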

Page 15: Big Data Visualization

Fast Graph Layout
A graph with 6,107 nodes and 15,160 edges

Space filling curve: Hilbert

Space filling curve: Gosper

Treemap

High dimensional embedding: 0.19 s

One-time clustering: 0.5 seconds
Layout + rendering: 0.0005 seconds

LinLog (force directed): 10,737 s

Page 16: Big Data Visualization

Fast Graph Layout
Internet connectivity: 41,928 nodes, 218,080 edges

Space filling curve: Hilbert

Space filling curve: Gosper

FM3: 40.8 s

GRIP: 6.87 s

One-time clustering: 18.87 seconds
Layout + rendering: 0.0036 seconds

Treemap

Page 17: Big Data Visualization

Dynamic Networks

Page 18: Big Data Visualization

Growing Internet
Incremental clustering-based approach
– Radial treemap layout

Video

Page 19: Big Data Visualization

Time-Varying Networks
• Almost all networks found in real-world applications are time-varying
• Both nodes and edges can change
• Visualization methods:
  – Animations
  – Small multiples visualization
  – Difference visualization
  – Storyline visualization
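
To make the idea of difference visualization concrete, the sketch below computes what changed between two snapshots of an undirected network, which is the information such a view would highlight. The function name `diff_snapshots` and the sample edges are hypothetical, and deriving the node set from the edge list is a simplifying assumption.

```python
def normalize(edges):
    """Treat edges as undirected: (a, b) and (b, a) become the same tuple."""
    return {tuple(sorted(edge)) for edge in edges}


def diff_snapshots(prev_edges, next_edges):
    """Return the nodes and edges added or removed between two snapshots."""
    prev_e, next_e = normalize(prev_edges), normalize(next_edges)
    # Assumption: a node exists in a snapshot only if some edge touches it.
    prev_n = {node for edge in prev_e for node in edge}
    next_n = {node for edge in next_e for node in edge}
    return {
        "added_nodes": next_n - prev_n,
        "removed_nodes": prev_n - next_n,
        "added_edges": next_e - prev_e,
        "removed_edges": prev_e - next_e,
    }


# Two hypothetical time steps of a small network.
t0 = [("a", "b"), ("b", "c")]
t1 = [("b", "a"), ("c", "d")]
changes = diff_snapshots(t0, t1)
print(changes)
```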

Page 20: Big Data Visualization

Storyline Visualization

XKCD.com

Page 21: Big Data Visualization

Storyline Visualization
• Consisting of a series of lines, going from left to right along the time-axis, that converge and diverge in the course of their paths.
• Each line represents a unique entity (character) in the data.
• The starting & ending points of each line represent the lifespan of the corresponding entity.
• Lines are bundled together during the time period of their interaction.
• Existing algorithms:
  1. Rules and heuristics based [Ogawa & Ma 2008]
  2. Genetic algorithm [Tanahashi & Ma 2012]
  3. Convex quadratic optimization [Liu et al. 2013]
  4. Greedy algorithms
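
The description above implies a simple input model, sketched here under stated assumptions: the data is a list of interaction sessions, each a `(start, end, members)` triple with a half-open time interval, and an entity's lifespan runs from its first session to its last. The names `lifespans` and `bundles_at` are illustrative. This covers only the data that a storyline layout consumes, not any of the layout algorithms listed.

```python
def lifespans(sessions):
    """Map each entity to (first start, last end) over all its sessions."""
    spans = {}
    for start, end, members in sessions:
        for member in members:
            first, last = spans.get(member, (start, end))
            spans[member] = (min(first, start), max(last, end))
    return spans


def bundles_at(sessions, t):
    """Return the groups of entities whose lines are bundled at time t."""
    return [
        frozenset(members)
        for start, end, members in sessions
        if start <= t < end  # assumption: sessions are half-open intervals
    ]


# Hypothetical sessions for three characters.
sessions = [
    (0, 3, {"A", "B"}),
    (3, 5, {"B", "C"}),
    (4, 6, {"A"}),
]
spans = lifespans(sessions)
groups = bundles_at(sessions, 4)
print(spans, groups)
```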

Page 22: Big Data Visualization

Star Wars

Page 23: Big Data Visualization

Matrix  

Page 24: Big Data Visualization

Inception

Page 25: Big Data Visualization

Star Wars

Video

Page 26: Big Data Visualization

Enron Scandal Email Data
1,230 days, 1,264 employees, 495,408 messages, and 3,478 email clusters

Video

Page 27: Big Data Visualization

Current Projects
• Dynamic network visualization [Biological science, Internet, social networks]
• Visual recommendations and predictive analysis [Transportation]
• Visual analytics for cyber and airborne intelligence
• Remote and collaborative visualization
• Volume data visualization [Flow simulation, biomedical imaging, NDT]
• Health record visualization
• Visual analysis of driving behaviors and energy use [Transportation]
• Visualization for scientific storytelling
• Massively parallel visualization
• In situ visualization and data reduction
• Visualizing large scale computing [Scientific computing, cloud computing]
• Video visualization [Security]
• Uncertainty visualization
• Visualization interface design

Page 28: Big Data Visualization

CENTER FOR VISUALIZATION

Kwan-Liu Ma [email protected] http://www.cs.ucdavis.edu/~ma