gl conference2014 toolkits_alice

Machine Learning Toolkits in GraphLab Create Alice Zheng GraphLab, Inc.

Upload: graphlab-inc

Post on 14-Jun-2015

DESCRIPTION

GraphLab's Alice Zheng presents on using the toolkits within GraphLab Create to build data products.

TRANSCRIPT

Page 1: Gl conference2014 toolkits_alice

Machine Learning Toolkits in GraphLab Create Alice Zheng GraphLab, Inc.

Page 2: Gl conference2014 toolkits_alice

Going Beyond Data Engineering

GraphLab Create enables Data Intelligence

•  Recommender systems for retailers
•  Fraud detection for financial institutions
•  Market segmentation and ad targeting
•  Churn prediction for telecom
•  Community detection and friend recommendation for social networks

© 2014 GraphLab, Inc.

Page 3: Gl conference2014 toolkits_alice

The Data Pipeline

Raw Data → Features → Models → Predictions

Stages: Data Engineering, Data Intelligence

Page 4: Gl conference2014 toolkits_alice

GraphLab Create Design Principles

•  Easy to use
•  Powerful
•  Fast
•  Composable

Page 5: Gl conference2014 toolkits_alice

Example: Movie Recommender

City of God

Wild Strawberries

The Celebration

Women on the Verge of a Nervous Breakdown

What do I recommend???

Page 6: Gl conference2014 toolkits_alice

Example: Movie Recommender

City of God

Wild Strawberries

The Celebration

La Dolce Vita

Women on the Verge of a Nervous Breakdown

Page 7: Gl conference2014 toolkits_alice

User-Movie Interaction Matrix

Rows (users): Bob, Anna, David, Ethan

Columns (movies): Women on the Verge …, The Celebration, City of God, Wild Strawberries, La Dolce Vita
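An interaction matrix like the one on this slide can be assembled from (user, movie, rating) records. The sketch below is a minimal illustration in plain Python. The user and movie names come from the slides, but the rating values and the helper `build_interaction_matrix` are hypothetical, since the cell values of the slide's matrix are not in the transcript.

```python
USERS = ["Bob", "Anna", "David", "Ethan"]
MOVIES = [
    "Women on the Verge of a Nervous Breakdown",
    "The Celebration",
    "City of God",
    "Wild Strawberries",
    "La Dolce Vita",
]

# Hypothetical ratings, invented for illustration only.
RATINGS = [
    ("Bob", "City of God", 5.0),
    ("Bob", "Wild Strawberries", 4.0),
    ("Anna", "The Celebration", 4.0),
    ("Anna", "La Dolce Vita", 5.0),
    ("David", "City of God", 3.0),
    ("Ethan", "Women on the Verge of a Nervous Breakdown", 4.0),
]


def build_interaction_matrix(users, movies, ratings):
    """Return a users-by-movies grid; None marks an unobserved pair."""
    row = {user: i for i, user in enumerate(users)}
    col = {movie: j for j, movie in enumerate(movies)}
    matrix = [[None] * len(movies) for _ in users]
    for user, movie, rating in ratings:
        matrix[row[user]][col[movie]] = rating
    return matrix


matrix = build_interaction_matrix(USERS, MOVIES, RATINGS)
observed = sum(cell is not None for cells in matrix for cell in cells)
print(f"{observed} of {len(USERS) * len(MOVIES)} cells observed")  # 6 of 20
```

Most cells stay empty, which is the usual situation for a recommender: the task is to predict the unobserved cells.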

Page 8: Gl conference2014 toolkits_alice

Matrix Factorization

User-item interactions: User latent factors × Item latent factors + Information about users + Information about items
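The latent-factor part of this picture can be sketched in a few lines. The example below is not GraphLab Create's implementation; it is a minimal plain-Python illustration that fits user and item latent factors by stochastic gradient descent on squared error. It omits the "information about users" and "information about items" terms, and the ratings, hyperparameters, and function names (`factorize`, `rmse`) are all assumptions made for the example.

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def factorize(ratings, n_users, n_items, n_factors=2,
              lr=0.05, reg=0.02, epochs=200, seed=0):
    """Fit user and item latent factors with SGD on squared error."""
    rng = random.Random(seed)
    U = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)]
         for _ in range(n_users)]
    V = [[rng.uniform(0.1, 0.5) for _ in range(n_factors)]
         for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - dot(U[u], V[i])
            for k in range(n_factors):
                uk, vk = U[u][k], V[i][k]
                # Gradient step with L2 regularization on both factors.
                U[u][k] += lr * (err * vk - reg * uk)
                V[i][k] += lr * (err * uk - reg * vk)
    return U, V


def rmse(ratings, U, V):
    """Root mean squared error over the observed ratings."""
    total = sum((r - dot(U[u], V[i])) ** 2 for u, i, r in ratings)
    return (total / len(ratings)) ** 0.5


# Hypothetical (user index, item index, rating) triples.
ratings = [(0, 2, 5.0), (0, 3, 4.0), (1, 1, 4.0), (1, 4, 5.0),
           (2, 2, 3.0), (2, 3, 2.0), (3, 0, 4.0), (3, 1, 3.0)]

U0, V0 = factorize(ratings, n_users=4, n_items=5, epochs=0)  # untrained
U, V = factorize(ratings, n_users=4, n_items=5)
print("RMSE before training:", rmse(ratings, U0, V0))
print("RMSE after training: ", rmse(ratings, U, V))

# Predicted score for an unobserved (user 0, item 4) pair.
print("Prediction:", dot(U[0], V[4]))
```

A predicted score for any unobserved cell is the dot product of that user's and that item's factor vectors, which is what makes the model usable for recommendation.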

Page 9: Gl conference2014 toolkits_alice

Demo

Page 10: Gl conference2014 toolkits_alice

The Moral of the Story

•  Data scientists need the right tools for the right job
•  There is always a more clever model
•  There is probably some bug in your data
•  GraphLab Create
   •  Versatile, composable, automated
   •  Play, learn, build better models

Page 11: Gl conference2014 toolkits_alice

GraphLab Create Toolkits

•  Recommenders
   •  Item similarity, factorization machine, matrix factorization, non-negative matrix factorization, matrix factorization for ranking
•  Graph analytics
   •  PageRank, triangle counting, degree distribution, graph coloring, connected components, shortest path, k-core decomposition
   •  User-defined graph computation
•  Nearest Neighbors
   •  Brute-force and ball trees
•  Topic modeling
   •  LDA
•  Regression/Classification
   •  Linear regression, logistic regression, SVM, gradient boosted trees, neural networks/deep learning
•  Clustering
   •  K-Means
•  Other popular ML libraries
   •  Vowpal Wabbit
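To make one entry in this list concrete, here is a rough sketch of item similarity, the first recommender named above. It is not the toolkit's code: it assumes Jaccard similarity over the sets of users who interacted with each item, a common choice that the slides do not specify. The function name `jaccard_item_similarity` and the interaction data are hypothetical.

```python
from itertools import combinations


def jaccard_item_similarity(interactions):
    """Map each item pair to |users of both| / |users of either|."""
    users_by_item = {}
    for user, item in interactions:
        users_by_item.setdefault(item, set()).add(user)
    scores = {}
    for a, b in combinations(sorted(users_by_item), 2):
        both = users_by_item[a] & users_by_item[b]
        either = users_by_item[a] | users_by_item[b]
        scores[(a, b)] = len(both) / len(either)
    return scores


# Hypothetical (user, item) pairs, invented for illustration only.
interactions = [
    ("Bob", "City of God"), ("Bob", "Wild Strawberries"),
    ("Anna", "City of God"), ("Anna", "La Dolce Vita"),
    ("David", "City of God"), ("David", "Wild Strawberries"),
]

scores = jaccard_item_similarity(interactions)
for pair, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(score, 3))
```

Items watched by largely the same users score close to 1, so a user who liked one of them can be recommended the other.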

Page 14: Gl conference2014 toolkits_alice

Come to Training Day!

•  GraphLab data science training day tomorrow!
•  A full day of lectures and exercises
•  Data engineering, model building, deployment, all on GraphLab Create

Page 15: Gl conference2014 toolkits_alice

Speed + Scale

•  How much do you need?
•  How much data do you really have?

Page 16: Gl conference2014 toolkits_alice

Data Funnel

Raw Data (PB) → Features (GB to TB) → Models (MB)

Page 17: Gl conference2014 toolkits_alice

Data Analytics Life Cycle

Extract, Transform, Load

Page 18: Gl conference2014 toolkits_alice

Data Analytics Life Cycle

Extract, Transform, Load

Model Learning

Page 21: Gl conference2014 toolkits_alice

Data Analytics Life Cycle

ETL

Page 22: Gl conference2014 toolkits_alice

Data Analytics Life Cycle

ETL

Model Learning

Page 25: Gl conference2014 toolkits_alice

Benchmarks

Run Time of Item Similarity on Netflix Dataset

•  GraphLab Create (1 node): 3.6 minutes
•  Mahout (5 nodes): 29 minutes
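The two run times on this slide imply the following ratios. The arithmetic below uses only the numbers shown. The node-minutes comparison additionally assumes the nodes in both setups are comparable machines, which the slide does not state.

```python
# Figures taken from the benchmark slide.
glc_minutes, glc_nodes = 3.6, 1
mahout_minutes, mahout_nodes = 29.0, 5

# Wall-clock speedup of GraphLab Create over Mahout.
speedup = mahout_minutes / glc_minutes

# Ratio of total node-minutes consumed (assumes comparable nodes).
node_minutes_ratio = (mahout_minutes * mahout_nodes) / (glc_minutes * glc_nodes)

print(f"Wall-clock speedup: {speedup:.1f}x")             # 8.1x
print(f"Node-minutes ratio: {node_minutes_ratio:.1f}x")  # 40.3x
```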

Page 26: Gl conference2014 toolkits_alice

Become a GLC User!

•  We push the frontier of the industry
•  ... and our customers guide us
•  Our features are customer driven
•  Tell us what you think!