Three New Ideas in SDP-based Manifold Learning Alexander Gray Georgia Institute of Technology College of Computing FASTlab: Fundamental Algorithmic and Statistical Tools

Upload: bernice-lyons

Post on 05-Jan-2016

Page 1: Three New Ideas in SDP-based Manifold Learning Alexander Gray Georgia Institute of Technology College of Computing FASTlab: Fundamental Algorithmic and

Three New Ideas in SDP-based Manifold Learning

Alexander Gray

Georgia Institute of Technology

College of Computing

FASTlab: Fundamental Algorithmic and Statistical Tools

The FASTlab
Fundamental Algorithmic and Statistical Tools Laboratory

1. Arkadas Ozakin: Research scientist, PhD Theoretical Physics
2. Dong Ryeol Lee: PhD student, CS + Math
3. Ryan Riegel: PhD student, CS + Math
4. Parikshit Ram: PhD student, CS + Math
5. William March: PhD student, Math + CS
6. James Waters: PhD student, Physics + CS
7. Hua Ouyang: PhD student, CS
8. Sooraj Bhat: PhD student, CS
9. Ravi Sastry: PhD student, CS
10. Long Tran: PhD student, CS
11. Michael Holmes: PhD student, CS + Physics (co-supervised)
12. Nikolaos Vasiloglou: PhD student, EE (co-supervised)
13. Wei Guan: PhD student, CS (co-supervised)
14. Nishant Mehta: PhD student, CS (co-supervised)
15. Wee Chin Wong: PhD student, ChemE (co-supervised)
16. Abhimanyu Aditya: MS student, CS
17. Yatin Kanetkar: MS student, CS
18. Praveen Krishnaiah: MS student, CS
19. Devika Karnik: MS student, CS
20. Prasad Jakka: MS student, CS

10 sample tasks

• “Find engines like this one” (querying)
• “Plot the distribution of engine sizes and emissions” (density estimation)
• “Predict the lifetime maintenance cost” (regression)
• “Predict existence of fault or not” (classification)
• “Predict the number of failures next year” (time series analysis)
• “Show all engines on a 2-d plot” (dimension reduction)
• “Show or remove the unusual engines” (outlier detection)
• “Show the different types of engines” (clustering)
• “Is this group equivalent to this group?” (two-sample testing)
• “What’s the best action to take based on this behavior?” (reinforcement learning/control)

Types of data:
• Sensor measurements
• Documents
• Database records, etc.

Rankmap

• Can do manifold learning using only ordinal data
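
To make "ordinal data" concrete, here is a minimal sketch of the kind of information such a method would consume: comparisons of the form "point j is closer to point i than point k is", with no distance values attached. The function name `ordinal_triplets` and the toy points are assumptions for illustration; this is not the Rankmap SDP itself, only the input it is described as needing.

```python
import math

def ordinal_triplets(points):
    """Return all (i, j, k) such that d(i, j) < d(i, k); no distance values are kept."""
    n = len(points)
    triplets = set()
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if len({i, j, k}) < 3:
                    continue  # need three distinct points
                if math.dist(points[i], points[j]) < math.dist(points[i], points[k]):
                    triplets.add((i, j, k))
    return triplets

# Hypothetical toy data
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 3.0), (4.0, 4.0)]

# Rescaling changes every distance but none of the comparisons,
# so the ordinal data is identical for both point sets.
scaled = [(10 * x, 10 * y) for x, y in points]
same = ordinal_triplets(points) == ordinal_triplets(scaled)
```

Because only the comparisons survive, any embedding recovered from them can be determined at best up to transformations that keep the distance ordering intact.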

Isometric Separation Maps

• Preserve class proximity

Density-Preserving Maps

• Preserve densities, not distances

The problem: big datasets

Could be large: N (#data), D (#features), M (#models)

Dual-tree All-nearest-neighbors

• O(N²) → O(N)
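
The gap between the two costs can be illustrated with a small sketch. The brute-force version below evaluates every pair of points, which is the O(N²) baseline. The tree version is a single-tree kd-tree search run once per query point; it is a simplification, not the dual-tree algorithm named on the slide, which builds trees over both the query and reference sets and prunes pairs of nodes at a time. All function names and the `leaf_size` parameter are assumptions for illustration.

```python
import math
import random

def build(points, idx, depth=0, leaf_size=4):
    """Build a kd-tree node over the point indices in idx (median split, cycling axes)."""
    if len(idx) <= leaf_size:
        return {"leaf": idx}
    axis = depth % len(points[0])
    idx = sorted(idx, key=lambda i: points[i][axis])
    mid = len(idx) // 2
    return {
        "axis": axis,
        "split": points[idx[mid]][axis],
        "left": build(points, idx[:mid], depth + 1, leaf_size),
        "right": build(points, idx[mid:], depth + 1, leaf_size),
    }

def nearest(node, points, q, best=(math.inf, -1)):
    """Return (distance, index) of the nearest neighbor of points[q], excluding q itself."""
    if "leaf" in node:
        for i in node["leaf"]:
            if i != q:
                d = math.dist(points[q], points[i])
                if d < best[0]:
                    best = (d, i)
        return best
    diff = points[q][node["axis"]] - node["split"]
    near, far = (node["left"], node["right"]) if diff < 0 else (node["right"], node["left"])
    best = nearest(near, points, q, best)
    # Every point on the far side is at least |diff| away; skip it if that cannot win.
    if abs(diff) < best[0]:
        best = nearest(far, points, q, best)
    return best

def all_nn_brute(points):
    """O(N^2) baseline: compare each point against every other point."""
    n = len(points)
    return [min((math.dist(points[q], points[i]), i) for i in range(n) if i != q)
            for q in range(n)]

def all_nn_tree(points):
    """One pruned tree search per query point."""
    root = build(points, list(range(len(points))))
    return [nearest(root, points, q) for q in range(len(points))]

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(200)]
brute = all_nn_brute(pts)
tree = all_nn_tree(pts)
```

Both functions return the same nearest-neighbor distances; the tree version reaches them while skipping most of the distance evaluations.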

Rank-approximate Nearest-neighbor Search

• Distance approximation → rank approximation
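
One way to see why a rank guarantee can be cheap is a sampling argument, sketched below under stated assumptions. If n points are drawn uniformly with replacement from N, the chance that at least one of them is among the k closest to the query is 1 - (1 - k/N)^n, so n can be chosen from k and a target probability alone. This is only the basic sampling idea, not necessarily the algorithm behind the slide; `sample_size` and `rank_approx_nn` are hypothetical names.

```python
import math
import random

def sample_size(n_points, max_rank, success_prob):
    """Smallest n with 1 - (1 - max_rank / n_points) ** n >= success_prob."""
    miss = 1.0 - max_rank / n_points  # chance one draw misses the top max_rank points
    return math.ceil(math.log(1.0 - success_prob) / math.log(miss))

def rank_approx_nn(points, query, max_rank, success_prob, rng):
    """Return the index of the closest point within a random sample.

    With probability at least success_prob, the result is one of the
    max_rank true nearest neighbors of query.
    """
    n = sample_size(len(points), max_rank, success_prob)
    candidates = [rng.randrange(len(points)) for _ in range(n)]
    return min(candidates, key=lambda i: math.dist(points[i], query))

rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(10_000)]
query = (0.5, 0.5)
n = sample_size(len(pts), max_rank=100, success_prob=0.95)
idx = rank_approx_nn(pts, query, max_rank=100, success_prob=0.95, rng=rng)
```

With 10,000 points, a rank tolerance of 100, and a success probability of 0.95, the sample holds about 300 points, so the search evaluates a few hundred distances instead of 10,000.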

Multi-scale Decompositions

e.g. kd-trees

[Bentley 1975], [Friedman, Bentley & Finkel 1977], [Moore & Lee 1995]

How can we compute these efficiently?

A kd-tree: level 1

A kd-tree: level 2

A kd-tree: level 3

A kd-tree: level 4

A kd-tree: level 5

A kd-tree: level 6
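
The level-by-level pictures can be reproduced in outline with a short sketch. It assumes the common construction in which level 1 is the whole dataset and each further level splits every cell at the median along one coordinate axis, cycling through the axes; the slides do not state the exact splitting rule, so treat that as an assumption. `kd_levels` is a hypothetical helper name.

```python
import random

def kd_levels(points, n_levels):
    """Return a list of levels; each level is the list of cells (point groups) at that depth."""
    levels = [[list(points)]]  # level 1: a single cell holding every point
    for depth in range(n_levels - 1):
        axis = depth % len(points[0])  # cycle through the coordinate axes
        next_level = []
        for cell in levels[-1]:
            if len(cell) < 2:
                next_level.append(cell)  # nothing left to split
                continue
            cell = sorted(cell, key=lambda p: p[axis])
            mid = len(cell) // 2  # median split
            next_level.extend([cell[:mid], cell[mid:]])
        levels.append(next_level)
    return levels

random.seed(1)
pts = [(random.random(), random.random()) for _ in range(64)]
levels = kd_levels(pts, 6)
sizes = [len(level) for level in levels]  # cells per level: [1, 2, 4, 8, 16, 32]
```

Each level doubles the number of cells and halves the points per cell, which is what makes the decomposition multi-scale: coarse levels summarize large regions, deep levels resolve local neighborhoods.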

Some application highlights

• Our software is being put into the pipelines of the world’s massive-scale science projects
  – Astronomy sky surveys (LSST, Pan-STARRS, DES): 1B objects/month
  – Large Hadron Collider: 1M events/sec

Some application highlights

• Others
  – McAfee spam blacklisting: 300M emails/day
  – Supermarket demand forecasting
  – Algorithmic trading
  – Audio fingerprint matching
  – Legal document browsing and search

Software

• MLPACK (C++)
  – First scalable comprehensive ML library

• MLPACK-db
  – Fast data analytics in relational databases (SQL Server)

• MLPACK Pro
  – Very-large-scale data