computational topology: a new way of viewing data › 2017 › 11 › ... · 2017-11-29 · 2. saba...
TRANSCRIPT
Computational Topology: A New Way Of Viewing DataTHANOS GENTIMIS LSU 3/3/2015
WHAT?
What do you think?
Mobius Band …a way to confuse ants
Sometimes I can’t tell a coffee mug from a donut
The hairy ball theorem …no use trying to get the perfect haircut …
… unless your head is shaped like a donut.
Algebraic TopologyIs fun!
The betti numbers!Betti numbers are the invariants of interest.
Betti0 counts the number of connected components
Betti1 counts the number of non-trivial loops
Betti2 counts the number of non-trivial voids
Graphs Courtesy of Dmitriy Morozov
Betti0=3Betti1=0Betti2=0
Betti0=1Betti1=1Betti2=0
Betti0=1Betti1=2Betti2=1
Why?
Data has shape! And shape matters!Raw Data Point Cloud
The “shape” of this point-cloud can help us understand our data!Can we “quantify” that shape? Can we compute it?Can we use this shape to:
Find anomalies?Maybe Classify data?
Topological Data Analysis pipeline:We assume that the data lies in a topological space.
We take measurements, sampling that space.
We try to reconstruct it by using an approximation.
We compute the invariants to understand it.
Betti0=1Betti1=1Betti2=0
Graphs Courtesy of Vin De Silva
How?
Triangulations - Mesh
Given a sampling of a space we can try to partially reconstruct it by “triangulation”, using Cech Complexes, Rips Complexes or Alpha Shapes.
Simplicial ComplexA simplicial complex is the union of simple pieces (simplices) i.e. vertices, edges, triangles etc.
Two simplices must intersect at a simplex or not at all.
Cech ComplexStart with a point cloud X in some Euclidean space.
Fix a parameter r>0.
Consider balls of radius r around each point covering X.
The Cech Complex is the nerve of this cover.
The Nerve theorem guarantees similar
topologies!
Proximity measures
Given a set of nodes with an attribute vector.
A metric on the space on nodes can be defined as the weighted sum of distances of their attributes:
One can then create a simplicial complex representing those nodes and use TDA!
Applications!
-10
1
-0.500.51
-0.5
0
0.5
1
w(t)w(t+1)
w(t
+ 2
)
Signal Processing
0 0.05 0.1 0.15 0.2-1
-0.5
0
0.5
1
time(s)
w(t
)
0 0.02 0.04 0.06 0.08 0.1 0.12 0.14
Be
tti 0
0 0.02 0.04 0.06 0.08 0.1 0.12 0.14
Be
tti 1
Signal in time domain 3D delay embeddingBarcodes Betti numbers
Choosing delays and embedding dimension Algebraic topology
1
2
( )
( )
( )
w t
w t
w t
Wheeze detection
Non-wheeze signals
Wheeze signals
0
1
1
0
0
1
1
1
Coverage Hole detection
Coverage Hole detection.
Betti0=1Betti1=3Betti2=0
Graphs Courtesy of Harish Chintakunta
The coverage area of the sensors is
approximated by a simplicial complex.
“Holes in coverage” become non-zero
Betti1
Analysis of a socio-plexClusters based on
feature “closeness”
What is the meaning of loops? Voids?
Simplifying Social Networks
Movie Courtesy of Adam Wilkerson
The socioplex is reduced by a “strong homotopy collapse
The core contains the PI’sThe periphery is mostly graduate students!
Gait Recognition
The video sequence of a human captured from the side is analyzed.
The silhouettes are joinedand a simplicial structure is created
The persistent betti numbers can beused to distinguish between walkingand running.
Graphs Courtesy of Rocio Gonzalez-Diaz
Gene Periodicity
Graphs Courtesy of Jose Perea
The gene expression is measured as a time series
Snippets of that series are compared to discover similaritiesand repetitions.
Their topological invariantsare computed and they are givena periodicity score!
ConclusionsIn TDA you don’t go in with your data and a question. You go in with just your data and look for the questions!
We really don’t know how many applications it has … we find new uses in different unconnected areas.
It is a combination of Mathematics, Computer Science, Statistics, Engineering, but ultimately it needs experts on the specific field to analyze the metadata.
Useful links
Jose Perea (Gene Expression Periodicity) https://fds.duke.edu/db/aas/math/joperea
Harish Chintakinta (Coverage Hole Detection) http://www4.ncsu.edu/~hkchinta/index.html
Dmitiry Morozov (Cyclic Coordinates, Dionysus) http://www.mrzv.org/
Vin De Silva (Theory and applications) http://pages.pomona.edu/~vds04747/public/index.html
Vidit Nanda (Theory and applications, Perseus) http://www.sas.upenn.edu/~vnanda/
Gunnar Carlesson (Theory and applications, Ayasdi,Javaplex) http://math.stanford.edu/~gunnar/
Rocio Gonzalez(Gait recognition) http://ma1.eii.us.es/miembros/rogodi/XSLT.asp?xml=rogodi.xml&xsl=profesores.xsl
Bibliography 1. Harish Chintakunta, Thanos Gentimis, Rocio Gonzalez-Diaz, Maria-Jose Jimenez, Hamid Krim,
Information Theoretic Persistent Barcodes, Pattern Recognition Journal, Elsevier, 2014
2. Saba Emrani, Thanos Gentimis and Hamid Krim, Persistent Homology Of Delay Embeddings, IEEE Signal Processing Letters,Vol.21(April 2014), No:14
3. Thanos Gentimis, Harish Chintakunta, Hamid Krim, Introduction to Computational Topology, Signal Processing Magazine, to appear
4. Graig Bell, Thanos Gentimis, Directed Filtrations, preprint submitted to Computational Geometry
5. Maria Bampasidou, Thanos Gentimis, Modeling Collaborations with Persistent Homology,preprint available on arxiv
Thanks!
History Of Computational TopologyComputational work of Frosini, Ferri, and collaborators in Bologna, Italy.
Doctoral thesis of Robins at Boulder, Colorado.
Bio-geometry project of Edelsbrunner, Harer at Duke, North Carolina.
General Framework provided by Carlesson, Zomorodian in Stanford, California.
Popularization by Ghrist, De Silva Upenn, Pennsylvania.