Diving In The Deep End of the Big Data Pool François Garillot @huitseeker

Post on 11-Jul-2015

Page 1: Diving In The Deep End Of The Big Data Pool

Diving In The Deep End of the Big Data Pool

François Garillot @huitseeker

Page 2: Diving In The Deep End Of The Big Data Pool

17:45 Thursday

Understanding your Unicorns: Data Science Team Building in Action

Location: 120-121

Page 3: Diving In The Deep End Of The Big Data Pool

4 analytical PhDs

3 weeks

1 org with data

& a QUESTION

Page 4: Diving In The Deep End Of The Big Data Pool

François Garillot (me)

Stephen Gadd

Marisa Figueiredo

Federica Capranico

Page 5: Diving In The Deep End Of The Big Data Pool
Page 6: Diving In The Deep End Of The Big Data Pool
Page 7: Diving In The Deep End Of The Big Data Pool

Globetrotters

Family

Entertainment

SMB

Sport

Music Festivals

Football Fans

In Car Market Buyers

Pet Owners

Technology

Drivers

Mums Preschool

University Students

Gamblers

Mums

Shoppers

Music

Zone 1 commuters Infrequent

Zone 1 commuters Freq.

Zone 1 commuters Resident

Zone 1 commuters Regular

Zone 1 commuters

Entertainment Films

Food Coffee Shops

Gamers

Autos

B2B

Business/Finance

Careers

Education

Entertainment

Family & Youth

Gambling

Gaming

IT

Lifestyle

News

Property

Government

Retail

Search

Social

Sport

Telco

Travel

Page 8: Diving In The Deep End Of The Big Data Pool

(same labels as on Page 7)

5+ millions

50+ K

Page 9: Diving In The Deep End Of The Big Data Pool

... so: Things Not To Mess Up

Page 10: Diving In The Deep End Of The Big Data Pool

Nobody ever gets those two right

Page 11: Diving In The Deep End Of The Big Data Pool

unsupervised clustering

find new segments

based on web browsing history

Page 12: Diving In The Deep End Of The Big Data Pool

relative distances

spatial representation

unsupervised clustering based on web browsing history

have a position for each user

no implementation that works at scale!

find new segments

simrank
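
To make the SimRank idea concrete: two nodes are similar when their neighbours are similar. What follows is a minimal, naive sketch on a toy user/website graph, not the code used in the project. The user names, the `.example` sites, the decay factor and the iteration count are all illustrative assumptions. It scores every pair of nodes, which is exactly what does not work at scale.

```python
from itertools import product

def simrank(neighbors, decay=0.8, iterations=5):
    """Naive SimRank on an undirected graph given as {node: set_of_neighbors}."""
    nodes = list(neighbors)
    # Start from the identity: each node is fully similar to itself only.
    sim = {(a, b): 1.0 if a == b else 0.0 for a, b in product(nodes, repeat=2)}
    for _ in range(iterations):
        new_sim = {}
        for a, b in product(nodes, repeat=2):
            if a == b:
                new_sim[(a, b)] = 1.0
            elif not neighbors[a] or not neighbors[b]:
                new_sim[(a, b)] = 0.0
            else:
                # Average similarity over all pairs of neighbours, damped by `decay`.
                total = sum(sim[(i, j)] for i in neighbors[a] for j in neighbors[b])
                new_sim[(a, b)] = decay * total / (len(neighbors[a]) * len(neighbors[b]))
        sim = new_sim
    return sim

# Hypothetical toy data: which user visited which website.
visits = {
    "alice": {"news.example", "sport.example"},
    "bob": {"news.example", "sport.example"},
    "carol": {"pets.example"},
}

# Build the undirected bipartite graph: users on one side, websites on the other.
graph = {user: set(sites) for user, sites in visits.items()}
for user, sites in visits.items():
    for site in sites:
        graph.setdefault(site, set()).add(user)

scores = simrank(graph)
print(scores[("alice", "bob")], scores[("alice", "carol")])
```

Users who share websites end up with a positive score, while users with no common neighbourhood stay at zero.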

Page 13: Diving In The Deep End Of The Big Data Pool

Simrank & MDS

website

website

website

website

22 million nodes

123 million edges

simrank

5+ millions

25+ trillions

Clustering
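
A quick back-of-the-envelope check of these figures, reading "5+ millions" as the number of users (an assumption, since the slide does not label it): the number of user pairs then matches the "25+ trillions" shown. The 4 bytes per score below is also an assumption, used only to give a sense of scale.

```python
users = 5_000_000  # lower bound, from "5+ millions" (assumed to be users)

ordered_pairs = users * users                # full similarity matrix
unordered_pairs = users * (users - 1) // 2   # each pair counted once

bytes_per_score = 4  # assumption: one 32-bit float per similarity score
dense_matrix_terabytes = ordered_pairs * bytes_per_score / 1e12

print(f"{ordered_pairs:,} ordered pairs")
print(f"{unordered_pairs:,} unordered pairs")
print(f"{dense_matrix_terabytes:.0f} TB for a dense matrix")
```

Even when each pair is stored only once, the similarity matrix is far too large to materialise naively.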

Page 14: Diving In The Deep End Of The Big Data Pool

Simrank & MDS

MDS: scalable but too complex to do in time

website

website

website

website

22 million nodes

123 million edges

simrank

5+ millions

MDS

Clustering

(45, 36)

✓Implemented

✖ Fail
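
For reference, this is what MDS does on a small input: it turns a matrix of pairwise distances into coordinates, one position per item. The sketch below is textbook classical MDS with NumPy on three hypothetical points. The deck marks MDS as not finished in time, so this is not the team's code, and a dense eigendecomposition like this would not scale to millions of users.

```python
import numpy as np

def classical_mds(distances, dims=2):
    """Classical (Torgerson) MDS: coordinates whose distances approximate `distances`."""
    d = np.asarray(distances, dtype=float)
    n = d.shape[0]
    # Double-centre the squared distances to obtain a Gram (inner product) matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ (d ** 2) @ centering
    # Keep the `dims` largest eigenpairs; clip tiny negative eigenvalues to zero.
    eigenvalues, eigenvectors = np.linalg.eigh(gram)
    top = np.argsort(eigenvalues)[::-1][:dims]
    scale = np.sqrt(np.clip(eigenvalues[top], 0.0, None))
    return eigenvectors[:, top] * scale

# Hypothetical distances between three items (a 3-4-5 right triangle).
example_distances = [
    [0.0, 3.0, 4.0],
    [3.0, 0.0, 5.0],
    [4.0, 5.0, 0.0],
]
coords = classical_mds(example_distances)
print(coords)
```

The recovered positions are only defined up to rotation, reflection and translation. What is preserved is the set of pairwise distances.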

Page 15: Diving In The Deep End Of The Big Data Pool

Lay the bare stuff down first, THEN refine

Page 16: Diving In The Deep End Of The Big Data Pool

Cluster still a huge mess to deploy

Page 17: Diving In The Deep End Of The Big Data Pool

Results

Singles

Locality-Sensitive Hashing

Hand-made code!

typical web browsing: pof.com, tagged.com

“The year of being single”, Marketing Magazine, 2013

“The rise of the single economy”, The Guardian, 2014
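
A minimal sketch of how locality-sensitive hashing can group users with similar browsing histories without comparing every pair. The deck does not say which LSH family the hand-made code used. MinHash with banding is assumed here, and the user ids, `.example` sites and parameters are hypothetical.

```python
import hashlib
from collections import defaultdict

def minhash_signature(items, num_hashes=32):
    """One minimum per seeded hash function; assumes `items` is non-empty."""
    return tuple(
        min(
            int.from_bytes(
                hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest(), "big"
            )
            for item in items
        )
        for seed in range(num_hashes)
    )

def lsh_candidate_pairs(sets_by_key, num_hashes=32, bands=16):
    """Return pairs of keys whose signatures agree on at least one band."""
    rows = num_hashes // bands
    buckets = defaultdict(set)
    for key, items in sets_by_key.items():
        signature = minhash_signature(items, num_hashes)
        for band in range(bands):
            chunk = signature[band * rows:(band + 1) * rows]
            buckets[(band, chunk)].add(key)
    pairs = set()
    for members in buckets.values():
        ordered = sorted(members)
        pairs.update((a, b) for i, a in enumerate(ordered) for b in ordered[i + 1:])
    return pairs

# Hypothetical browsing histories, one set of visited sites per user.
histories = {
    "u1": {"pof.com", "tagged.com", "news.example"},
    "u2": {"pof.com", "tagged.com", "news.example"},
    "u3": {"sport.example", "autos.example"},
}
candidates = lsh_candidate_pairs(histories)
print(candidates)
```

Only users that land in the same bucket are compared, so the work grows with the number of collisions rather than with the square of the number of users.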

Page 18: Diving In The Deep End Of The Big Data Pool

Final results obtained on the last day

Page 19: Diving In The Deep End Of The Big Data Pool

Essential: fuel & friends

Page 20: Diving In The Deep End Of The Big Data Pool

- Power & network fail

- Bare pipeline first

- Distributed is hard, let's go think instead!

- Fuel & friends