Transcript
Page 1: Big Data Visualization

Kwan-Liu Ma, Department of Computer Science, University of California at Davis

Big Data Visualization

CA Technologies 1/22/2014

Page 2: Big Data Visualization

Big Data: Issues

• Volume: size/scale
• Velocity: rate
• Variety: type/form
• Veracity: accuracy and completeness

Page 3: Big Data Visualization

Visualization

• To explore and discover
• To validate
• To communicate
• An overview, a path, an interface

Page 4: Big Data Visualization

Extreme-Scale Scientific Simulations

Page 5: Big Data Visualization

Scientific Simulations

Page 6: Big Data Visualization

Large Scientific Data Visualization

As we move to Exascale, it's no longer feasible to store most of the data for post processing! We must do:

• In situ visualization
• Parallel visualization that is highly scalable
• In situ data reduction and triage
• In situ data processing for interactive data exploration and analysis
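The idea of in situ data reduction can be illustrated with a toy loop: each time step is summarized while the full field is still in memory, and only the summary is kept. This is a minimal sketch under stated assumptions, not the speaker's system. The functions `simulate_step`, `reduce_in_situ`, and `run`, the 1D field, and the histogram summary are all hypothetical stand-ins chosen for illustration.

```python
import math


def simulate_step(state, dt=0.1):
    """Hypothetical stand-in for one step of a simulation on a 1D field."""
    return [v + dt * math.sin(v) for v in state]


def reduce_in_situ(state, bins=8, lo=0.0, hi=4.0):
    """Reduce the full field to a fixed-size histogram while it is in memory."""
    counts = [0] * bins
    width = (hi - lo) / bins
    for v in state:
        # Clamp so out-of-range values fall into the first or last bin.
        idx = min(bins - 1, max(0, int((v - lo) / width)))
        counts[idx] += 1
    return counts


def run(num_steps=5, n=1000):
    """Run the toy simulation, keeping only one small summary per step."""
    state = [4.0 * i / n for i in range(n)]
    summaries = []
    for _ in range(num_steps):
        state = simulate_step(state)
        # The full field is overwritten on the next step; only the summary survives.
        summaries.append(reduce_in_situ(state))
    return summaries


if __name__ == "__main__":
    for step, hist in enumerate(run()):
        print(step, hist)
```

Here the stored output grows with the number of bins rather than with the size of the field, which is the trade-off the slide points at.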

Page 7: Big Data Visualization

Supernova Simulation

Simulation: John Blondin, NCSU

Page 8: Big Data Visualization

Fusion Simulations

Simulation: Dr. S. Ethier, the Princeton Plasma Physics Lab.

Page 9: Big Data Visualization

Big Network Analysis & Visualization

Page 10: Big Data Visualization

Figure panels: FM3, GRIP, Treemap, Hilbert, Sunburst, Circle

222 nodes, 2583 edges

Page 11: Big Data Visualization

Network Simplification/Characterization

Hamas

al Qaeda

TVCG 12(6) 2006

Page 12: Big Data Visualization

Network Simplification/Characterization

Friendster social network: Links exhibit negative sensitivity (red) between cluster centers

Astrophysics co-author network: One competitive network (red) and one collaborative network (blue)

Using centrality sensitivity

Competitive

Collaborative

TVCG 18(1) 2012

Page 13: Big Data Visualization

The Graph Layout Problem

• The cost of displaying a graph
• The hairball problem of large graph layouts
  – Large, dense graphs become a mess
  – Inefficient use of space
  – Details cluttered
• Solutions
  – Filtering
  – Clustering
  – Abstraction
  – Focus+context

California data: 6,107 nodes, 15,160 edges

High dimensional embedding method

Page 14: Big Data Visualization

A Fast Graph Layout Method

• Hierarchically cluster the nodes (if no clustering given)
• Traverse the hierarchy to order the nodes
• Place the nodes in that order along a space filling curve

Order 1 Order 2 Order 3 Order 4 Order 5 Order 11

Hilbert curves

TVCG 14(6) 2008
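The three steps of the fast layout method can be sketched in a few lines. This is an illustrative sketch, not the published implementation: the cluster hierarchy is assumed to be given as nested Python lists, the names `hilbert_d2xy`, `leaf_order`, and `layout` are hypothetical, and nodes are simply spread evenly over the cells of the curve.

```python
def hilbert_d2xy(side, d):
    """Map index d along a Hilbert curve to (x, y) on a side x side grid.

    side must be a power of two; this is the standard iterative conversion.
    """
    x = y = 0
    t = d
    s = 1
    while s < side:
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:
            if rx == 1:
                # Flip the current sub-square before rotating it.
                x = s - 1 - x
                y = s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y


def leaf_order(tree):
    """Depth-first traversal of a nested-list hierarchy; returns leaves in order."""
    if not isinstance(tree, list):
        return [tree]
    order = []
    for child in tree:
        order.extend(leaf_order(child))
    return order


def layout(tree):
    """Place the ordered leaves along the smallest Hilbert curve that fits them."""
    nodes = leaf_order(tree)
    order = 1
    while 4 ** order < len(nodes):
        order += 1
    side = 2 ** order
    cells = side * side
    # Spread the nodes evenly over the curve so clusters stay contiguous.
    return {
        node: hilbert_d2xy(side, i * cells // len(nodes))
        for i, node in enumerate(nodes)
    }


if __name__ == "__main__":
    hierarchy = [["a", "b"], ["c", ["d", "e"]]]  # hypothetical clustering
    print(layout(hierarchy))
```

Because nodes in the same cluster are adjacent in the traversal order, and nearby positions along the curve are nearby in the plane, clusters end up in compact regions. Once the ordering exists, placement needs only one curve lookup per node.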

Page 15: Big Data Visualization

Fast Graph Layout

A graph with 6,107 nodes, 15,160 edges

Space filling curve: Hilbert

Space filling curve: Gosper

Treemap

High dimensional embedding: 0.19s

One time clustering: 0.5 seconds

Layout + rendering: 0.0005 seconds

LinLog (force directed): 10,737s

Page 16: Big Data Visualization

Fast Graph Layout

Internet Connectivity: 41,928 nodes, 218,080 edges

Space filling curve: Hilbert

Space filling curve: Gosper

FM3: 40.8s

GRIP: 6.87s

One time clustering: 18.87 seconds

Layout + rendering: 0.0036 seconds

Treemap

Page 17: Big Data Visualization

Dynamic Networks

Page 18: Big Data Visualization

Growing Internet

Incremental clustering-based approach – Radial treemap layout

Video

Page 19: Big Data Visualization

Time-Varying Networks

• Almost all networks found in real-world applications are time-varying
• Both nodes and edges can change
• Visualization methods:
  – Animations
  – Small multiples visualization
  – Difference visualization
  – Storyline visualization
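Since both nodes and edges can change, the data a difference visualization needs is the set of additions and removals between two snapshots. The following is a minimal sketch under the assumption that each snapshot is an undirected graph given as a node collection and a list of edge pairs; the function names and sample data are hypothetical.

```python
def normalize(edges):
    """Treat edges as undirected by sorting the endpoints of each pair."""
    return {tuple(sorted(edge)) for edge in edges}


def snapshot_diff(nodes_a, edges_a, nodes_b, edges_b):
    """Return what was added and removed going from snapshot A to snapshot B."""
    ea, eb = normalize(edges_a), normalize(edges_b)
    return {
        "nodes_added": set(nodes_b) - set(nodes_a),
        "nodes_removed": set(nodes_a) - set(nodes_b),
        "edges_added": eb - ea,
        "edges_removed": ea - eb,
    }


if __name__ == "__main__":
    # Hypothetical snapshots at two time steps.
    diff = snapshot_diff(
        {"a", "b", "c"}, [("a", "b"), ("b", "c")],
        {"a", "b", "d"}, [("b", "a"), ("a", "d")],
    )
    print(diff)
```

A renderer could then color added and removed elements differently on top of a shared layout.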

Page 20: Big Data Visualization

Storyline Visualization

XKCD.com

Page 21: Big Data Visualization

Storyline Visualization

• Consisting of a series of lines, going from left to right along the time-axis, that converge and diverge in the course of their paths.
• Each line represents a unique entity (character) in the data.
• The starting & ending points of each line represent the lifespan of the corresponding entity.
• Lines are bundled together during the time period of their interaction.
• Existing algorithms:
  1. Rules and heuristics based [Ogawa & Ma 2008]
  2. Genetic algorithm [Tanahashi & Ma 2012]
  3. Convex quadratic optimization [Liu et al. 2013]
  4. Greedy algorithms
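The definitions above imply a simple input model for any storyline layout: a list of interaction sessions, from which each entity's lifespan and the bundles at a given time can be derived. The sketch below covers only that data model and is not one of the cited layout algorithms. The session tuple format `(start, end, members)`, the function names, and the sample data are assumptions made for illustration.

```python
def lifespans(sessions):
    """Return {entity: (first_start, last_end)} over all sessions it appears in."""
    spans = {}
    for start, end, members in sessions:
        for member in members:
            lo, hi = spans.get(member, (start, end))
            spans[member] = (min(lo, start), max(hi, end))
    return spans


def bundles_at(sessions, t):
    """Return the groups of entities whose lines are bundled at time t."""
    return [
        sorted(members)
        for start, end, members in sessions
        if start <= t < end
    ]


if __name__ == "__main__":
    # Hypothetical sessions: (start time, end time, entities interacting).
    sessions = [
        (0, 3, {"A", "B"}),
        (1, 2, {"D"}),
        (3, 5, {"A", "C"}),
    ]
    print(lifespans(sessions))
    print(bundles_at(sessions, 1))
```

A layout algorithm would then decide the vertical order of bundles and lines at each time step, for example to reduce line crossings and wiggles.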

Page 22: Big Data Visualization

Star Wars

Page 23: Big Data Visualization

Matrix  

Page 24: Big Data Visualization

Inception

Page 25: Big Data Visualization

Star Wars

Video

Page 26: Big Data Visualization

Enron Scandal Email Data

1230 days, 1264 employees, 495,408 messages, and 3478 email clusters

Video

Page 27: Big Data Visualization

Current Projects

• Dynamic network visualization [Biological science, Internet, social networks]
• Visual recommendations and predictive analysis [Transportation]
• Visual analytics for cyber and airborne intelligence
• Remote and collaborative visualization
• Volume data visualization [Flow simulation, biomedical imaging, NDT]
• Health record visualization
• Visual analysis of driving behaviors and energy use [Transportation]
• Visualization for scientific storytelling
• Massively parallel visualization
• In situ visualization and data reduction
• Visualizing large scale computing [Scientific computing, cloud computing]
• Video visualization [Security]
• Uncertainty visualization
• Visualization interface design

Page 28: Big Data Visualization

CENTER FOR VISUALIZATION

Kwan-Liu Ma [email protected] http://www.cs.ucdavis.edu/~ma

