Transcript
Page 1: Topological Data Analysis: visual presentation of multidimensional data sets

Topological Data Analysis

Visual presentation of multidimensional data sets

Page 2: Topological Data Analysis: visual presentation of multidimensional data sets

Current vs New

SQL vs Topological Data Analysis

Page 3: Topological Data Analysis: visual presentation of multidimensional data sets

Topology

The Seven Bridges of Königsberg, a problem solved by Leonhard Euler (1736).

The study of qualitative properties of certain objects (topological spaces) that are invariant under a certain kind of transformation (continuous map), especially those properties that are invariant under a certain kind of equivalence (homeomorphism).
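Euler's argument for the Seven Bridges can be checked in a few lines: a connected multigraph admits a walk that uses every edge exactly once only if it has zero or two vertices of odd degree. The sketch below encodes the bridges as an edge list; the labels A to D for the four land masses are an assumption made for illustration.

```python
from collections import Counter

# Four land masses (labels are illustrative) and the seven bridges,
# written as the edge list of a multigraph.
BRIDGES = [
    ("A", "B"), ("A", "B"),  # two bridges: central island A to north bank B
    ("A", "C"), ("A", "C"),  # two bridges: central island A to south bank C
    ("A", "D"),              # central island A to east island D
    ("B", "D"),              # north bank to east island
    ("C", "D"),              # south bank to east island
]


def odd_degree_vertices(edges):
    """Return the sorted vertices touched by an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(v for v, d in degree.items() if d % 2 == 1)


def has_eulerian_walk(edges):
    """Euler's criterion; assumes the multigraph is connected."""
    return len(odd_degree_vertices(edges)) in (0, 2)


print(odd_degree_vertices(BRIDGES))  # all four land masses have odd degree
print(has_eulerian_walk(BRIDGES))    # so no such walk exists
```

The answer depends only on how the land masses are connected, not on distances or shapes, which is why the problem is usually cited as an early topological argument.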

Page 4: Topological Data Analysis: visual presentation of multidimensional data sets

Topological Data Analysis Pipeline

a. First approximate the unknown space X by a combinatorial structure K

b. Then compute topological invariants of K
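The two steps can be sketched on a toy example. Here the unknown space X is assumed to be a circle, sampled at 12 points; K is taken to be the neighborhood graph at scale eps (a 1-dimensional complex), and the invariants are its Betti numbers b0 (connected components) and b1 (independent cycles). The function names and the choice of eps are illustrative assumptions, not part of the slides.

```python
import math
from itertools import combinations


def circle_points(n):
    """Sample n evenly spaced points on the unit circle."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]


def neighborhood_graph(points, eps):
    """Step a: connect every pair of points at distance <= eps."""
    return [(i, j) for i, j in combinations(range(len(points)), 2)
            if math.dist(points[i], points[j]) <= eps]


def betti_numbers_of_graph(n_vertices, edges):
    """Step b: b0 via union-find, b1 = E - V + b0 (valid for a graph)."""
    parent = list(range(n_vertices))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in edges:
        parent[find(i)] = find(j)
    b0 = len({find(v) for v in range(n_vertices)})
    b1 = len(edges) - n_vertices + b0
    return b0, b1


points = circle_points(12)
edges = neighborhood_graph(points, eps=0.6)  # only adjacent samples connect
print(betti_numbers_of_graph(len(points), edges))  # one component, one loop
```

With eps = 0.6 only neighboring samples are joined (adjacent samples are about 0.52 apart, the next ones 1.0), so the graph is a single 12-cycle and recovers the one component and one loop of the circle.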

Page 5: Topological Data Analysis: visual presentation of multidimensional data sets

Combinatorial Representations

The Čech Complex
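In the Čech complex at radius r, a set of points spans a simplex when the closed balls of radius r around them share a common point, which is the same as asking that the smallest ball enclosing the points has radius at most r. A minimal sketch for three points in the plane follows; the function names are hypothetical and the case split (half the longest side for right or obtuse triangles, circumradius otherwise) is the standard formula for the smallest enclosing circle of a triangle.

```python
import math


def min_enclosing_radius(p0, p1, p2):
    """Radius of the smallest circle containing three planar points."""
    a, b, c = sorted(
        (math.dist(p1, p2), math.dist(p0, p2), math.dist(p0, p1)),
        reverse=True,
    )
    if a * a >= b * b + c * c:
        # Right, obtuse or degenerate triangle: the longest side is a diameter.
        return a / 2
    # Acute triangle: circumradius R = abc / (4 * area), area by Heron's formula.
    s = (a + b + c) / 2
    area = math.sqrt(s * (s - a) * (s - b) * (s - c))
    return a * b * c / (4 * area)


def cech_has_triangle(p0, p1, p2, radius):
    """True if the three balls of the given radius have a common point."""
    return min_enclosing_radius(p0, p1, p2) <= radius


equilateral = [(0.0, 0.0), (1.0, 0.0), (0.5, math.sqrt(3) / 2)]
print(cech_has_triangle(*equilateral, radius=0.5))  # balls meet pairwise only
print(cech_has_triangle(*equilateral, radius=0.6))  # common point exists
```

For the unit equilateral triangle the enclosing radius is 1/√3 ≈ 0.577, so at radius 0.5 the three edges are present but the triangle itself is not.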

Page 6: Topological Data Analysis: visual presentation of multidimensional data sets

Combinatorial Representations

Alpha Complex, Vietoris-Rips Complex, Cubical Complex, Witness Complex
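Of the complexes listed, the Vietoris-Rips complex is the simplest to build directly: a set of points spans a simplex whenever all of its pairwise distances are at most eps. The brute-force sketch below enumerates simplices up to a chosen dimension; the function name, the eps convention (distance threshold rather than ball radius) and the example points are assumptions for illustration, and the enumeration is only practical for small inputs.

```python
import math
from itertools import combinations


def vietoris_rips(points, eps, max_dim=2):
    """Return {dimension: list of simplices} of the Rips complex at scale eps."""
    n = len(points)
    close = {(i, j) for i, j in combinations(range(n), 2)
             if math.dist(points[i], points[j]) <= eps}
    simplices = {0: [(i,) for i in range(n)]}
    for dim in range(1, max_dim + 1):
        # A (dim)-simplex needs every pair of its dim + 1 vertices to be close.
        simplices[dim] = [s for s in combinations(range(n), dim + 1)
                          if all(pair in close for pair in combinations(s, 2))]
    return simplices


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
for eps in (1.0, 1.5):
    k = vietoris_rips(square, eps)
    print(eps, len(k[1]), "edges,", len(k[2]), "triangles")
```

On the corners of a unit square, eps = 1.0 gives the four sides and no triangles (a loop), while eps = 1.5 also admits the diagonals, so all four triangles appear and the loop is filled in.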

Page 7: Topological Data Analysis: visual presentation of multidimensional data sets

Topological Invariants

A topological invariant is a map f that assigns the same object to homeomorphic spaces, that is:

$X \cong Y \implies f(X) = f(Y)$

Homology is a machine that converts local data about a space into global algebraic structure.

Reference: Wikipedia, 2010.
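One of the simplest topological invariants is the Euler characteristic, the alternating sum of the number of simplices in each dimension. The sketch below computes it from the top-dimensional simplices (facets) of a complex and compares two different triangulations of the sphere; the helper names and the vertex numbering are illustrative assumptions.

```python
from itertools import combinations, product


def closure(facets):
    """All non-empty faces of the given facets, as sorted vertex tuples."""
    simplices = set()
    for facet in facets:
        for k in range(1, len(facet) + 1):
            simplices.update(combinations(sorted(facet), k))
    return simplices


def euler_characteristic(facets):
    """Alternating sum: (#vertices) - (#edges) + (#triangles) - ..."""
    return sum((-1) ** (len(s) - 1) for s in closure(facets))


# Two triangulations of the sphere: the boundaries of a tetrahedron and of an
# octahedron (one vertex chosen from each of three antipodal pairs per face).
tetrahedron = list(combinations(range(4), 3))
octahedron = list(product((0, 1), (2, 3), (4, 5)))
circle = [(0, 1), (1, 2), (0, 2)]

print(euler_characteristic(tetrahedron), euler_characteristic(octahedron))
print(euler_characteristic(circle))
```

Both sphere triangulations give the same value even though they have different numbers of vertices, edges and faces, while the circle gives a different one; homology refines this single number into one Betti number per dimension.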

Page 8: Topological Data Analysis: visual presentation of multidimensional data sets

Morse Theory and Reeb Graph

Theorem: Suppose $h : X \to \mathbb{R}$ is a discrete Morse function. Then X is homotopy equivalent to a CW-complex with exactly one cell of dimension p for each critical simplex of dimension p.

Reference: Teng Ma; Zhuangzhi Wu; Pei Luo; Lu Feng. Reeb graph computation through spectral clustering, 2011.

Page 9: Topological Data Analysis: visual presentation of multidimensional data sets

Case study: Demographics

Data shape: [220:45]

Page 10: Topological Data Analysis: visual presentation of multidimensional data sets

Case study: YT channel stats

Data shape: [1500:12]

Page 11: Topological Data Analysis: visual presentation of multidimensional data sets

Case study: Netflix dataset

Data shape: [17770:480189], about 8.5 billion elements
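The element count follows directly from the shape. As a rough sketch of what that means for storage, the snippet below also estimates the size of a dense array, assuming 4 bytes per element; that byte width is an assumption for illustration and says nothing about how the data was actually stored.

```python
def dense_size(rows, cols, bytes_per_element=4):
    """Return (element count, size in bytes) of a dense rows x cols array."""
    n_elements = rows * cols
    return n_elements, n_elements * bytes_per_element


n_elements, n_bytes = dense_size(17770, 480189)
print(f"{n_elements:,} elements")        # 8,532,958,530 elements
print(f"{n_bytes / 1e9:.1f} GB dense")   # 34.1 GB dense
```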

Page 12: Topological Data Analysis: visual presentation of multidimensional data sets

Case study: Netflix dataset

Labels in figure: Music, Indian, Anime, French, Hong Kong, US Cartoons, Kids Movie, German, US Retro, Horror

Page 13: Topological Data Analysis: visual presentation of multidimensional data sets

Case study: Netflix comparison

Panels: PCA, Isomap, LLE, Spectral Embedding, LTSA, Hessian LLE
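Among the methods compared, PCA is the linear baseline: it projects the data onto the directions of largest variance, whereas the other methods listed are nonlinear embeddings. A minimal pure-Python sketch of the first principal component via power iteration on the covariance matrix is given below; the function name and the toy data are assumptions, and the sketch assumes the data has nonzero variance and that the start vector is not orthogonal to the leading direction.

```python
import math


def first_principal_component(rows, iterations=100):
    """Unit vector along the direction of largest variance (power iteration)."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix, d x d.
    cov = [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(d)]
           for a in range(d)]
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Toy data lying on the line (t, 2t, 0) in three dimensions.
data = [(t, 2.0 * t, 0.0) for t in (-2.0, -1.0, 0.0, 1.0, 2.0)]
print(first_principal_component(data))  # proportional to (1, 2, 0)
```

Projecting each centered row onto this vector gives a one-dimensional embedding; repeating the procedure on the residual would give further components.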

Page 14: Topological Data Analysis: visual presentation of multidimensional data sets

Case study: Netflix (music)

Page 15: Topological Data Analysis: visual presentation of multidimensional data sets

Case study: Netflix (kids movie)

Page 16: Topological Data Analysis: visual presentation of multidimensional data sets

Case study: Netflix (horror)

Page 17: Topological Data Analysis: visual presentation of multidimensional data sets

[email protected]

Please sign up for free beta access: www.datarefiner.com

Page 18: Topological Data Analysis: visual presentation of multidimensional data sets

Questions?

