Post on 25-Apr-2018

Studying the Shape of Data Using Topology

Michael Lesnick, Institute for Mathematics and its Applications, USA

INFOTEC, June 17, 2014

Topological Data Analysis (TDA)

TDA is a branch of statistics.

Goal: Apply topology to develop tools for studying qualitative features of data.

Two Data Types

Data type 1: A finite set of points in $\mathbb{R}^n$.

[We call such data point cloud data (PCD).]

Data type 2: A function $f : X \to \mathbb{R}$, where $X$ is any space.

(We also study functions $f : X \to \mathbb{R}^m$, $m > 1$.)

Topological Data Analysis (TDA)

TDA is a branch of statistics.

Goal: Apply topology to develop tools for studying qualitative features of data.

Informally, qualitative features = “coarse-scale, global geometric features.”

Examples of qualitative features of PCD (in 2-D):

• Clusters

• Cycles

• Tendrils/Flares

• “Graph Structure”

Qualitative Features of Functions

Modes

“Craters”

In TDA, we seek to develop:

• Formal definitions of such features

• Computational tools for detecting and visualizing such features

• (When data is random) methodology for quantifying the statistical significance of such features.

We focus on tools suitable for high-dimensional PCD.

Why Study Qualitative Features of Data?

Key Premise: Insight into the shape of scientific data has a good chance of giving insight into the science itself.

An example:

• Statistics of natural images (persistent homology)

Statistics of Natural Images

Carlsson et al. studied a set of 5000 $3 \times 3$-pixel patches sampled from natural images.

• After normalization of intensity and contrast, each patch lies on a 7-D sphere.

• Discovery: The densest regions of the data set concentrate around a Klein bottle.
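Why a 7-D sphere: a 3×3 patch is a vector in $\mathbb{R}^9$; subtracting its mean intensity places it in an 8-dimensional subspace, and rescaling to unit length places it on the unit sphere of that subspace. The sketch below illustrates this under loud assumptions: the function name `normalize_patches` is hypothetical, the Euclidean norm is used as a stand-in for contrast (the original study may use a different contrast norm), and the patches are synthetic random arrays, not natural image data.

```python
import numpy as np

def normalize_patches(patches: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Map 3x3 patches to unit vectors in the mean-zero subspace of R^9."""
    flat = patches.reshape(len(patches), 9).astype(float)
    # Remove mean intensity: each row now sums to zero (8-D subspace of R^9).
    centered = flat - flat.mean(axis=1, keepdims=True)
    # ASSUMPTION: Euclidean norm as a stand-in for the contrast of a patch.
    norms = np.linalg.norm(centered, axis=1)
    keep = norms > eps  # constant patches have no contrast and are dropped
    return centered[keep] / norms[keep, None]

rng = np.random.default_rng(0)
patches = rng.random((5000, 3, 3))  # synthetic stand-in, NOT natural image data
points = normalize_patches(patches)
```

Each row of `points` has zero mean and unit length, so it lies on a 7-dimensional sphere inside $\mathbb{R}^9$.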

Klein Bottle in Space of $3 \times 3$ Patches

[Source: Carlsson, Perea 2014]

Application: Texture classification [Perea, Carlsson 2013].

Other Applications of TDA

• biophysics of proteins

• genomics + evolutionary biology

• astronomy

• coverage detection in wireless sensor networks

• shape segmentation

• shape comparison/shape matching

• basketball analytics

Introduction to Algebraic Topology

What is Algebraic Topology?

Informally, the branch of math concerned with properties of geometric objects that are invariant under “continuous deformations.”

Continuous deformations:

• bending

• twisting

• stretching

• (but not tearing)

Classic Example

Algebraic Topology + Holes

Primary example of a property invariant under continuous deformations: Presence of holes.

Algebraic topology is largely concerned with:

1. formalizing the notion of a “hole” in a geometric object,

2. calculating numbers of holes of different types,

3. understanding mathematical implications of the presence of holes.

Types of holes

In algebraic topology, we define $i$-dimensional holes for each $i \ge 0$.

0-D holes are connected components.

The pair of ovals has two 0-D holes.

1-D holes in 3-D objects are “holes you can see through.”

The donut has one 1-D hole.

2-D holes in 3-D objects are hollow spaces.

A balloon has one 2-D hole.

Counting Holes: Betti numbers

For a geometric object $X$, we define $B_i(X)$, the $i$th Betti number of $X$, to be the number of $i$-dimensional holes in $X$.

Examples

$B_0(X) = 2$; $B_1(X) = 0$; $B_2(X) = 0$.

Examples

$B_0(X) = 1$; $B_1(X) = 2$; $B_2(X) = 0$.

Computing Betti Numbers

For discretely represented geometric objects, $B_i(X)$ is easily computable via linear algebra.
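As a minimal sketch of that linear algebra, assume the object is given as a simplicial complex (lists of vertices, edges, triangles, ...). Then $B_k$ equals the number of $k$-simplices minus the ranks of the two adjacent boundary matrices. The function names below are hypothetical, and the ranks are computed over the real numbers with NumPy, which gives Betti numbers with rational coefficients; other coefficient fields can give different answers for some spaces.

```python
import numpy as np
from itertools import combinations

def boundary_matrix(k_simplices, km1_simplices):
    """Matrix of the boundary map from k-simplices to (k-1)-simplices."""
    index = {s: row for row, s in enumerate(km1_simplices)}
    D = np.zeros((len(km1_simplices), len(k_simplices)))
    for col, simplex in enumerate(k_simplices):
        for i in range(len(simplex)):
            face = simplex[:i] + simplex[i + 1:]  # drop the i-th vertex
            D[index[face], col] = (-1) ** i       # alternating sign
    return D

def betti_numbers(simplices_by_dim):
    """simplices_by_dim[k] lists the k-simplices as sorted vertex tuples."""
    top = len(simplices_by_dim) - 1
    ranks = [0]  # the boundary map on vertices is zero
    for k in range(1, top + 1):
        D = boundary_matrix(simplices_by_dim[k], simplices_by_dim[k - 1])
        ranks.append(int(np.linalg.matrix_rank(D)) if D.size else 0)
    ranks.append(0)  # there are no simplices above the top dimension
    return [len(simplices_by_dim[k]) - ranks[k] - ranks[k + 1]
            for k in range(top + 1)]

# Hollow triangle: a discrete circle.
circle = [list(combinations(range(3), 1)), list(combinations(range(3), 2))]
# Hollow tetrahedron: a discrete balloon (surface only, no solid interior).
balloon = [list(combinations(range(4), d + 1)) for d in range(3)]

print(betti_numbers(circle))   # [1, 1]
print(betti_numbers(balloon))  # [1, 0, 1]
```

The balloon output matches the earlier description: one connected component, no see-through holes, one hollow space.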

Persistent Homology

Topology of PCD?

How can we use the hole-detection formalism of topology to develop robust computational methods for studying qualitative features of data?

One approach: Persistent Homology.

• Introduced in 2000

• Widely studied and applied

Persistent Homology

Produces simple descriptors of qualitative features of data, called barcodes.

A barcode is a set of closed intervals in R.

Model Example

[Figure: the point cloud $X$.]

How can we detect the cycle in X?

Naive Idea

Choose $r > 0$. Let $U(X, r)$ be the union of balls of radius $r$ centered at the points of $X$.

Idea: Consider $B_1(U(X, r))$ for some choice of $r$.

Example

[Figure: $X$ and $U(X, r)$.]

$B_0(U(X, r)) = 1$; $B_1(U(X, r)) = 1$; $B_2(U(X, r)) = 0$.

When $X$ is nice enough, for a good choice of $r$, $B_1(U(X, r))$ detects the cycle in $X$.
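The simplest instance of this idea is $B_0(U(X, r))$, the number of connected components of the union of balls. Two closed balls of radius $r$ overlap exactly when their centers are at distance at most $2r$, so it suffices to merge such pairs and count what remains. The sketch below does this with a union-find structure; the function name `count_components` and the toy data are illustrative assumptions, not part of the talk.

```python
import numpy as np

def count_components(points: np.ndarray, r: float) -> int:
    """Number of connected components of the union of radius-r balls."""
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise Euclidean distances between all centers.
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    for i in range(n):
        for j in range(i + 1, n):
            if dists[i, j] <= 2 * r:  # the two balls intersect
                parent[find(i)] = find(j)
    return len({find(i) for i in range(n)})

X = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 0.0]])
for r in (0.4, 0.6, 2.5):
    print(r, count_components(X, r))
```

On this toy data the count drops from 3 to 2 to 1 as $r$ grows, which already shows how strongly the answer depends on the choice of $r$.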

Problems with this Descriptor

1. No clear way to choose $r$.

2. Invariant is unstable with respect to perturbation of data or small changes in $r$.

3. Doesn’t distinguish small holes from big ones.

4. Invariant is very sensitive to outliers.

Example: No Good Choice of r

Example: Sensitivity to Outliers

$B_1(U(X, r)) = 7$.

Let’s deal with problems 1-3 first.

A Solution

Consider not a single choice of radius $r$, but all choices of $r$ at once.

This gives us a filtration, that is, a 1-parameter family of geometric objects:

$F(X) = \{U(X, r)\}_{r \in [0, \infty)}$
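What makes this family useful is that it is nested: growing the radius only ever adds points. Writing the union of balls out explicitly (with the Euclidean norm $\|\cdot\|$, a notational assumption made here), the inclusion follows directly from the definition:

```latex
U(X, r) = \bigcup_{x \in X} \{\, y \in \mathbb{R}^n : \|y - x\| \le r \,\},
\qquad
r \le r' \;\Longrightarrow\; U(X, r) \subseteq U(X, r').
```

These inclusions are what allow features of $U(X, r)$ to be compared across different values of $r$.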

Example

Key Mathematical Observation

Not only can we count holes in each space in a filtration, we can track holes in a consistent way across the whole filtration at once.

The formalization of this idea is persistent homology.

Barcodes

For each $i \ge 0$, we can define a barcode $\mathcal{B}_i(X)$, a set of closed intervals in $\mathbb{R}$.

Each interval represents an $i$-D cycle in the filtration.

It also records the radii at which that cycle forms and closes up.
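For $i = 0$ and the union-of-balls filtration, the barcode can be sketched in a few lines. Every point starts as its own component at radius 0, and two components merge when $r$ reaches half the distance between their closest points, so processing pairs in order of distance with a union-find structure yields one interval per merge plus one infinite interval for the component that never dies. The function name `barcode_dim0` and the toy data are illustrative assumptions.

```python
import math
from itertools import combinations

def barcode_dim0(points):
    """0-dimensional barcode of the union-of-balls filtration, as (birth, death) pairs."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairs of points, sorted by distance (shortest first).
    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    bars = []
    for d, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                    # two components merge at radius d / 2
            parent[ri] = rj
            bars.append((0.0, d / 2))   # one of the two components dies
    bars.append((0.0, math.inf))        # the last component never dies
    return bars

X = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
print(barcode_dim0(X))  # [(0.0, 0.5), (0.0, 2.0), (0.0, inf)]
```

The short bar reflects the two nearby points merging early; the longer finite bar reflects the well-separated third point.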

Properties of a Barcode

• Allows us to distinguish significant features from insignificant features

• Records the size/scale of the feature

• Is stable w.r.t. perturbations of the data.

• Is computable in practice (using a variant of Gaussian elimination).
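One widely used variant of Gaussian elimination for this purpose is column reduction of the boundary matrix with coefficients mod 2; whether this is exactly the variant the talk has in mind is an assumption. Simplices are indexed in the order they enter the filtration, each column lists the faces of one simplex, and a column is reduced by adding earlier columns until its lowest nonzero entry is unique. Each surviving lowest entry pairs the simplex that creates a feature with the simplex that destroys it. The function name `reduce_boundary_matrix` is hypothetical.

```python
def reduce_boundary_matrix(columns):
    """Column-reduce a mod-2 boundary matrix; return (birth, death) index pairs.

    columns[j] is the set of row indices holding a 1 in column j, with
    simplices indexed in the order they enter the filtration.
    """
    reduced = [set(c) for c in columns]
    pivot_owner = {}  # lowest row index -> column whose lowest entry it is
    pairs = []
    for j, col in enumerate(reduced):
        # Add earlier columns (mod 2) while the lowest entry is already taken.
        while col and max(col) in pivot_owner:
            col ^= reduced[pivot_owner[max(col)]]
        if col:
            pivot_owner[max(col)] = j
            pairs.append((max(col), j))  # feature born at max(col), dies at j
    return pairs

# Filtration of a triangle: vertices 0, 1, 2; edges 3 = {0,1}, 4 = {1,2},
# 5 = {0,2}; then the filled triangle 6 with boundary {3, 4, 5}.
columns = [set(), set(), set(), {0, 1}, {1, 2}, {0, 2}, {3, 4, 5}]
print(reduce_boundary_matrix(columns))  # [(1, 3), (2, 4), (5, 6)]
```

In this example the pair (5, 6) says the cycle created by the third edge is closed up by the triangle, while vertex 0 stays unpaired because one connected component persists forever.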

Stability
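One standard way to make stability precise for the union-of-balls filtration, offered here as a hedged illustration rather than as the statement on the original slide: for finite point clouds $X, Y \subset \mathbb{R}^n$, with $\mathcal{B}_i$ denoting the $i$th barcode, $d_B$ the bottleneck distance between barcodes, and $d_H$ the Hausdorff distance between point clouds,

```latex
d_B\bigl(\mathcal{B}_i(X), \mathcal{B}_i(Y)\bigr) \;\le\; d_H(X, Y)
\qquad \text{for all } i \ge 0 .
```

In words: moving every data point by at most $\varepsilon$ moves the endpoints of the bars by at most $\varepsilon$, after matching bars appropriately and allowing bars of length at most $2\varepsilon$ to appear or disappear.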

Once we have barcodes, we can do further processing to find geometric representations of the significant holes.

This framework for building descriptors of data via barcodes is very flexible.

Example: We can build filtrations from point cloud data whose barcodes detect flares or clusters.

It can also be adapted to detect qualitative features of functions.

Advertisement

Do you have data that might have interesting shape?

Come talk to us!

Thanks!
