data comes in shapes

Data Comes in Shapes. July 16, 5th Elephant. Tim Poston, Chief Scientist. http://forushealth.com http://geometeer.com [email protected]

Upload: tim-poston

Post on 16-Aug-2015


TRANSCRIPT


What are data?

Mostly numbers.

Are numbers only numbers?

Numbers come in patterns: that is what ‘big data’ is all about.

Patterns are shapes.

Studying data shapes is geometry.

… but not the geometry of high school.

It is not replacing the 3D minds of children with flattened (though intricate) teen imagination.

If we have three variables, we have three dimensions.

If we have n variables, we have n dimensions.

To think about n dimensions, we have two choices:

Practice thinking in 3D

Turn it all into algebra

We have to do both.
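To see that the algebra does not care how many dimensions there are, here is a minimal sketch in plain Python (the helper names `dot`, `distance` and `angle_degrees` are hypothetical, not from the talk). Distance and angle are written once and work for any n. In 4D they give a result that 3D intuition alone might not predict: the diagonal (1,1,1,1) of the unit hypercube has length 2 and makes 60° with each axis.

```python
import math

def dot(u, v):
    """Sum of products of matching coordinates; any number of dimensions."""
    return sum(a * b for a, b in zip(u, v))

def distance(u, v):
    """Euclidean distance: Pythagoras, one squared difference per coordinate."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def angle_degrees(u, v):
    """Angle between two non-zero vectors, from cos(angle) = u·v / (|u| |v|)."""
    cos_angle = dot(u, v) / math.sqrt(dot(u, u) * dot(v, v))
    # Clamp against rounding error before taking acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))

print(distance((0, 0, 0), (1, 2, 2)))             # 3.0, in 3D
print(distance((0, 0, 0, 0), (1, 1, 1, 1)))       # 2.0, diagonal of a 4D unit cube
print(angle_degrees((1, 0, 0, 0), (1, 1, 1, 1)))  # about 60 degrees
```

The 3D picture (a cube and its long diagonal) suggests what to compute; the algebra then carries it to any n unchanged.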

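What does a matrix

$$\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}$$

even mean?

A matrix describes a transformation by listing how a few things change:

$$\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}\begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix} = \begin{bmatrix} a \\ d \\ g \end{bmatrix}$$

$$\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}\begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix} = \begin{bmatrix} b \\ e \\ h \end{bmatrix}$$

$$\begin{bmatrix} a & b & c \\ d & e & f \\ g & h & i \end{bmatrix}\begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} = \begin{bmatrix} c \\ f \\ i \end{bmatrix}$$

So a matrix is just a list of where (1,0,0), (0,1,0) and (0,0,1) go: its columns.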

Remember that, and you always clarify how the algebra works.

Remember that, and you always clarify how the code should work.
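A minimal plain-Python sketch of that idea (the helpers `matvec`, `column` and `from_columns` are hypothetical names, and the numeric matrix is made up for illustration). Multiplying by each basis vector picks out a column, and multiplying any other vector gives the same combination of columns as its coordinates.

```python
def matvec(M, v):
    """Ordinary matrix-vector product; M is a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def column(M, j):
    """Column j of M: where the j-th basis vector goes."""
    return [row[j] for row in M]

def from_columns(M, v):
    """Build M v as v[0]*column 0 + v[1]*column 1 + ... ('where the basis vectors go' view)."""
    out = [0] * len(M)
    for j, vj in enumerate(v):
        for i, mij in enumerate(column(M, j)):
            out[i] += vj * mij
    return out

M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

# Each basis vector (1,0,0), (0,1,0), (0,0,1) lands on the matching column.
for j in range(3):
    e = [1 if k == j else 0 for k in range(3)]
    assert matvec(M, e) == column(M, j)

# Any other vector is a combination of basis vectors, so it lands on the same combination of columns.
v = [2, -1, 3]
print(matvec(M, v), from_columns(M, v))  # [9, 21, 33] [9, 21, 33]
```

Either function can serve as the implementation; the point is that knowing the columns is knowing the whole transformation.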

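Principal component analysis (PCA)

just finds a rotation (matrix) so that the data points, centred on their mean, lie as close as possible to the coordinate axes: as close as possible to the first axis, then to the plane of the first two, and so on.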

In n dimensions.
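In two dimensions the rotation PCA finds has a closed form, which allows a library-free sketch (plain Python; `pca_rotation_2d` and `rotate_to_axes` are hypothetical names). Centre the points, compute the covariance entries S_xx, S_yy, S_xy, and the first principal axis lies at angle θ = ½·atan2(2·S_xy, S_xx − S_yy); rotating by θ puts the direction of largest spread on the first axis. In n dimensions the same rotation comes instead from the eigenvectors of the n×n covariance matrix (for example via `numpy.linalg.eigh`), not from this formula.

```python
import math

def pca_rotation_2d(points):
    """Angle (radians) of the first principal axis of 2D points, plus their mean.

    Uses theta = 0.5 * atan2(2*Sxy, Sxx - Syy), the direction that maximises
    the variance Sxx cos^2 + 2 Sxy sin cos + Syy sin^2.
    """
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return 0.5 * math.atan2(2 * sxy, sxx - syy), (mx, my)

def rotate_to_axes(points, theta, centre):
    """Centred points in the rotated frame: (along principal axis, across it)."""
    c, s = math.cos(theta), math.sin(theta)
    mx, my = centre
    return [(c * (x - mx) + s * (y - my), -s * (x - mx) + c * (y - my))
            for x, y in points]

pts = [(0, 0), (1, 1), (2, 2), (3, 3)]
theta, centre = pca_rotation_2d(pts)
print(math.degrees(theta))                # about 45
print(rotate_to_axes(pts, theta, centre))  # second coordinates all (nearly) zero
```

For points exactly on the line y = x the rotation turns that line onto the first axis, so every point lies on it; for noisy data the second coordinates are the residual distances PCA has made as small as possible.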

The simplex method (“Linear Programming”) looks at points constrained by inequalities

$$a_1x_1 + a_2x_2 + \dots + a_nx_n + c \ge 0$$

which just means ‘lying on one side of a line/plane/hyperplane, in 2D/3D/nD’.

Together, such inequalities cut out a convex polygon/polyhedron/polytope.
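Checking whether a point lies in such a region is one line of algebra per inequality. A minimal plain-Python sketch (the function `inside` and the two example regions are hypothetical, chosen for illustration), storing each inequality a·x + c ≥ 0 as a pair (a, c); the same code serves 2D and 3D.

```python
def inside(point, constraints):
    """True if point satisfies every constraint a·x + c >= 0.

    constraints is a list of (a, c) pairs; each pair keeps one side of a
    line (2D), plane (3D) or hyperplane (nD).
    """
    return all(sum(ai * xi for ai, xi in zip(a, point)) + c >= 0
               for a, c in constraints)

# Unit square: x >= 0, y >= 0, 1 - x >= 0, 1 - y >= 0.
square = [((1, 0), 0), ((0, 1), 0), ((-1, 0), 1), ((0, -1), 1)]
print(inside((0.5, 0.5), square))  # True
print(inside((1.5, 0.5), square))  # False

# One dimension up: the tetrahedron x, y, z >= 0, 1 - x - y - z >= 0.
tetra = [((1, 0, 0), 0), ((0, 1, 0), 0), ((0, 0, 1), 0), ((-1, -1, -1), 1)]
print(inside((0.2, 0.3, 0.4), tetra))  # True
```

Because each half-space is convex and the region is their intersection, the midpoint of any two points that pass this test also passes it.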

The simplex method (“Linear Programming”) looks at a convex polytope, and seeks its highest point (‘highest’ meaning the largest value of the linear function being maximised).

Find a genuine corner (any corner).

Go up the most vertical edge, till you meet another face.

Do that again. And again. And again, until you reach the top.

All the matrix ‘pivoting’, degenerate-case handling, etc., is just implementing that.
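Here is a minimal tableau sketch of that walk in plain Python, using exact `Fraction` arithmetic (the name `simplex_max` and the example problem are hypothetical). It assumes the problem has the form ‘maximise c·x subject to A x ≤ b, x ≥ 0’ with every b_i ≥ 0, so the origin is already a genuine corner; each row b_i − A_i·x ≥ 0 is an inequality of the kind above. For ‘most vertical edge’ it uses Dantzig's rule (the edge along which the objective rises fastest per unit increase of the entering variable), which is only one reading of ‘steepest’. It has no degenerate-case handling, so it can cycle on degenerate problems.

```python
from fractions import Fraction

def simplex_max(c, A, b):
    """Maximise c·x subject to A x <= b and x >= 0, assuming every b[i] >= 0.

    With b >= 0 the origin is a corner of the feasible polytope, so the walk
    starts there. No anti-cycling rule: degenerate corners are not handled.
    """
    m, n = len(A), len(c)
    # Tableau rows [A | I | b]: one slack variable per inequality.
    T = [[Fraction(v) for v in A[i]]
         + [Fraction(int(i == j)) for j in range(m)]
         + [Fraction(b[i])] for i in range(m)]
    # Objective row holds -c; a negative entry marks an edge that goes uphill.
    z = [Fraction(-v) for v in c] + [Fraction(0)] * (m + 1)
    basis = [n + i for i in range(m)]  # all slacks basic, i.e. start at x = 0

    while True:
        # Dantzig's rule: the edge whose objective rises fastest per unit of the entering variable.
        col = min(range(n + m), key=lambda j: z[j])
        if z[col] >= 0:
            break  # no uphill edge left: this corner is the top
        # Ratio test: walk along the edge until the first other face is met.
        rows = [i for i in range(m) if T[i][col] > 0]
        if not rows:
            raise ValueError("objective is unbounded above")
        row = min(rows, key=lambda i: T[i][-1] / T[i][col])
        # Pivot: rewrite the tableau around the new corner.
        p = T[row][col]
        T[row] = [v / p for v in T[row]]
        for i in range(m):
            if i != row and T[i][col] != 0:
                f = T[i][col]
                T[i] = [a - f * r for a, r in zip(T[i], T[row])]
        f = z[col]
        z = [a - f * r for a, r in zip(z, T[row])]
        basis[row] = col

    x = [Fraction(0)] * n
    for i, var in enumerate(basis):
        if var < n:
            x[var] = T[i][-1]
    return x, z[-1]  # the top corner and the objective value there

# Maximise 3x + 5y subject to x <= 4, 2y <= 12, 3x + 2y <= 18.
x, value = simplex_max([3, 5], [[1, 0], [0, 2], [3, 2]], [4, 12, 18])
print([int(v) for v in x], value)  # [2, 6] 36
```

On this example the walk visits (0, 0), then (0, 6), then (2, 6): two edges up, then no uphill edge remains.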

Support vector machine explanations (like the one on Wikipedia) tend to skimp on the geometry.

What is a support line / plane / hyperplane?

How do you find one? (Very like the simplex method.)
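For two separable classes, a hard-margin SVM picks a pair of parallel support lines, one touching each class, that are as far apart as possible; the separating line runs midway between them. The plain-Python sketch below makes that geometry literal, with hypothetical helpers: `support` finds the support hyperplane of a finite point set in a given direction w (the highest point along w, the same ‘highest point’ question the simplex method answers for a polytope), and `widest_gap_2d` sweeps directions in 2D by brute force. Real SVM solvers do not sweep angles; they solve a quadratic program.

```python
import math

def support(points, w):
    """Support line/plane/hyperplane of a finite point set in direction w.

    Returns (h, touching): every point satisfies w·x <= h, and the points in
    `touching` lie on the hyperplane w·x = h.
    """
    dots = [sum(wi * xi for wi, xi in zip(w, x)) for x in points]
    h = max(dots)
    touching = [x for x, d in zip(points, dots) if math.isclose(d, h)]
    return h, touching

def widest_gap_2d(A, B, steps=3600):
    """Brute-force search over unit directions w for the widest gap between
    the support line of A facing B and the support line of B facing A."""
    best_gap, best_w = -math.inf, None
    for k in range(steps):
        t = 2 * math.pi * k / steps
        w = (math.cos(t), math.sin(t))
        top_of_A, _ = support(A, w)                      # max over A of w·x
        neg_bottom_of_B, _ = support(B, (-w[0], -w[1]))  # -(min over B of w·x)
        gap = -neg_bottom_of_B - top_of_A
        if gap > best_gap:
            best_gap, best_w = gap, w
    return best_gap, best_w

A = [(0, 0), (0, 1)]
B = [(2, 0), (2, 1)]
print(widest_gap_2d(A, B))  # (2.0, (1.0, 0.0))
```

Here the best direction is along the x-axis, the support lines are x = 0 and x = 2, and the SVM's separating line would be x = 1; the touching points returned by `support` are the support vectors.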

Geometry organises what algebra needs to do.

Algebra (often linear) organises what code needs to do.

Planning code needs algebra, which needs geometry.

Some bugs come from coding wrong.

Some bugs come from coding the wrong algebra.

Some bugs come from algebraising the wrong geometry.

Try to think at all levels!

Thank you!

Tim Poston

http://forushealth.com http://geometeer.com [email protected]