A Year Of Data Science at Metail Matt McDonnell - Data Scientist

Upload: matt-mcdonnell

Post on 16-Apr-2017

Category: Data & Analytics



TRANSCRIPT

Page 1: A Year of Data Science at Metail

A Year Of Data Science at Metail

Matt McDonnell - Data Scientist

Page 2: A Year of Data Science at Metail

Business Context

Startup: “A group of people operating in an environment of uncertainty striving for a repeatable and scalable business model”

Page 3: A Year of Data Science at Metail

A scalable startup needs a Customer Factory

Figure adapted from ‘Scaling Lean’ by Ash Maurya https://leanstack.com/scaling-lean-book/

Page 4: A Year of Data Science at Metail

A look behind the curtain – what’s the data?

See Metail in action:

http://metail.myshopify.com?utm_source=DataInsightsNov2016

(Scary UTM code is there so I don’t have to spend the next week digging into ‘Who are these mysterious visitors?’)
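As a minimal sketch of how such a tag can be read back out of a landing URL, the snippet below pulls the `utm_source` value from the demo link using only the Python standard library. The helper name `utm_source` is hypothetical and is not part of Metail's pipeline.

```python
from urllib.parse import urlsplit, parse_qs


def utm_source(url):
    """Return the utm_source query parameter of a URL, or None if absent."""
    query = parse_qs(urlsplit(url).query)
    values = query.get("utm_source")
    return values[0] if values else None


demo_url = "http://metail.myshopify.com?utm_source=DataInsightsNov2016"
print(utm_source(demo_url))
```

Filtering visits on this value is what separates talk attendees from other traffic.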

Live Demo Starts Here!

Sheepish explanation of why it’s not working starts here

Page 5: A Year of Data Science at Metail

The road to Data Science

• Understand the data

• Learn the tools

• Build the analytics for business intelligence

• More sophisticated data analysis for deeper understanding

• Apply machine learning techniques

• Develop models for prediction and decision making

Page 6: A Year of Data Science at Metail

My experience prior to Metail

Careers

• Physics Postdoc: Oxford, Griffith

• Technical Consultant: MathWorks

• Quant Developer: Fidelity Worldwide Investment

• Quant Analyst: Fidelity Worldwide Investment

Tools used:

(plus some Java, C#, Excel and VBA when I had to)

Understanding the data and tools

Page 7: A Year of Data Science at Metail

My experience since joining Metail

Lots of event stream data

Many AWS components

Outputs:

- Business Intelligence

- Bespoke Analysis

- Productionised Science

Page 8: A Year of Data Science at Metail

Tools to learn

Tools we used a year ago

• R for analysis and science
  • dplyr, tidyr, ggplot

• Looker for some of the analysis

Tools we use now

• Python
  • pandas, SQLAlchemy, boto3, seaborn

• Still some R
  • dplyr, tidyr, ggplot

• Looker for most of day to day analysis

• Swagger

• AWS stack

Page 9: A Year of Data Science at Metail

Data Analytics

Business intelligence

• How well is the customer factory working? (KPIs)

• What about if we do this? (A/B Tests)

• How’s our retention? (Cohort analysis)

• How efficiently are we digitising garments? (Process monitoring)

• How are we growing?

To answer this we need …

LOTS AND LOTS OF SQL! (yay.)

Most of it embedded in Looker LookML (basically YAML) (yay - again.)

Page 10: A Year of Data Science at Metail

Data Analytics

Raw Events Engagement States Analytics Model

(Looker demo goes here if time allows)

Page 11: A Year of Data Science at Metail

Data Science

Exploring Digitised Garments

Page 12: A Year of Data Science at Metail

Event Data

{
  "schema": "iglu:com.snowplowanalytics.snowplow\/unstruct_event\/jsonschema\/1-0-0",
  "data": {
    "schema": "",
    "data": {
      "name": "GarmentCoverage",
      "data": {
        "page": {
          "garments": 24,
          "garmentsWithCtas": 14,
          "scrollPosY": 201,
          "load": {
            "isInitiator": false,
            "elapsedTimeMs": 1424
          }
        },
        "batch": {
          "garments": 12,
          "garmentsWithCtas": 7,
          "ctas": [
            {"sku": "32536", "x": 0.2721021611002, "y": 1.6311844077961},
            {"sku": "32544", "x": 0.51768172888016, "y": 1.6311844077961},
            {"sku": "32545", "x": 0.51768172888016, "y": 1.0134932533733},
            {"sku": "32548", "x": 0.51768172888016, "y": 0.39580209895052},
            {"sku": "53282", "x": 0.76326129666012, "y": 0.39580209895052},
            {"sku": "53337", "x": 0.026522593320236, "y": 1.0134932533733},
            {"sku": "134499", "x": 0.2721021611002, "y": 0.39580209895052}
          ]
        }
      }
    }
  }
}

GarmentCoverage event

"scrollPosY": 201,

"garmentsWithCtas": 7,

{"sku": "32544","x": 0.51768172888016,"y": 1.6311844077961},
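A minimal sketch of how an event like this might be flattened into one row per call-to-action (CTA), using only the standard library. The field names come from the GarmentCoverage event on this slide; the function name `extract_ctas` and the output column names are hypothetical, not Metail's actual code.

```python
import json

# The GarmentCoverage event from the slide, with its wrapper objects kept as shown.
RAW_EVENT = r"""
{"schema": "iglu:com.snowplowanalytics.snowplow\/unstruct_event\/jsonschema\/1-0-0",
 "data": {"schema": "", "data": {"name": "GarmentCoverage", "data": {
   "page": {"garments": 24, "garmentsWithCtas": 14, "scrollPosY": 201,
            "load": {"isInitiator": false, "elapsedTimeMs": 1424}},
   "batch": {"garments": 12, "garmentsWithCtas": 7, "ctas": [
     {"sku": "32536", "x": 0.2721021611002, "y": 1.6311844077961},
     {"sku": "32544", "x": 0.51768172888016, "y": 1.6311844077961},
     {"sku": "32545", "x": 0.51768172888016, "y": 1.0134932533733},
     {"sku": "32548", "x": 0.51768172888016, "y": 0.39580209895052},
     {"sku": "53282", "x": 0.76326129666012, "y": 0.39580209895052},
     {"sku": "53337", "x": 0.026522593320236, "y": 1.0134932533733},
     {"sku": "134499", "x": 0.2721021611002, "y": 0.39580209895052}
   ]}}}}}
"""


def extract_ctas(raw):
    """Return one dict per CTA, tagged with the event name and page scroll position."""
    event = json.loads(raw)
    payload = event["data"]["data"]  # {"name": ..., "data": {...}}
    body = payload["data"]           # {"page": {...}, "batch": {...}}
    return [
        {"event": payload["name"], "scroll_pos_y": body["page"]["scrollPosY"], **cta}
        for cta in body["batch"]["ctas"]
    ]


rows = extract_ctas(RAW_EVENT)
print(len(rows), rows[1])
```

Rows in this shape load directly into a pandas DataFrame or a database table for the position analysis on the following slides.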

Page 13: A Year of Data Science at Metail

Spread of digitised garments

• Look at positions of all digitised garments for a given category.

• page is in units of #scrolls (based on browser height on the user’s device)

• Digitised garments on /women-dress and /women-tops-tees are more spread out than garments on /women-jeans

Page 14: A Year of Data Science at Metail

Views by garment position

• Aggregate visitors who see garment ‘X’ in a given category on a given date.

• Scale these visitor counts by the maximum #visitors for a garment on that date in that category.

• In the /women-dress category:
  • Digitised garments are spread between 0 and 120 page scrolls, with median ~40

• Long “tail” of digitised garments which get far fewer visits.

• The average digitised garment typically gets 20% as many visitors as the most popular garment in that category (on a given day).

Date        url_path      sku     Users  Page  scaled_count
2016-01-01  /women-dress  101742  699    5.0   0.743617
2016-01-01  /women-dress  101743  700    4.0   0.744681
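The scaling step described on this slide can be sketched in pandas as a group-wise division by the daily maximum. The first two rows are taken from the table; the third row is a hypothetical garment with 940 users, added only because that maximum is consistent with the scaled_count values shown. The lower-case column names are also assumptions.

```python
import pandas as pd

views = pd.DataFrame({
    "date": ["2016-01-01"] * 3,
    "url_path": ["/women-dress"] * 3,
    "sku": ["101742", "101743", "HYPOTHETICAL"],  # last row is illustrative only
    "users": [699, 700, 940],
})

# Divide each garment's visitor count by the largest count seen
# for any garment in the same category on the same date.
daily_max = views.groupby(["date", "url_path"])["users"].transform("max")
views["scaled_count"] = views["users"] / daily_max

print(views.round(6))
```

The most viewed garment in each category and day gets a scaled_count of 1.0, so values are comparable across days with different traffic.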

Page 15: A Year of Data Science at Metail

Views by category

• Look at positions of all digitised garments for a given category.

• ‘page’ is in units of #scrolls (based on browser height on the user’s device)

• Digitised garments on /women-dress and /women-tops-tees are more spread out than digitised garments on /women-jeans. This could also be because there are more digitised garments in /women-tops-tees.

• There are some “hotspots” of digitised garment positions e.g. ~page 100 for /women-tops-tees. Unfortunately, they are quite far down the category page and visitor counts are typically around 10-20% of the values for the most popular garments (closest to the top of the category page)

Panels: /women-tops-tees, /women-jeans, /women-dress

Page 16: A Year of Data Science at Metail

Views as time series

• Digitised garments on /women-dress over time

• The “hotspot” moves further down the page: most discernibly in the last 2 weeks.

Page 17: A Year of Data Science at Metail

Data Science

Exploring User Body Shapes

Page 18: A Year of Data Science at Metail

BMI Quantiles

• BMI: 17.6, Height: 160 cm, Weight: 45 kg

• BMI: 19.9, Height: 157 cm, Weight: 49 kg

• BMI: 22.2, Height: 153 cm, Weight: 52 kg

• BMI: 25.8, Height: 146 cm, Weight: 55 kg

• BMI: 29.7, Height: 155 cm, Weight: 71 kg
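For reference, BMI is weight in kilograms divided by the square of height in metres. A minimal sketch applied to the first quantile on this slide follows; the function name `bmi` is just for illustration.

```python
def bmi(weight_kg, height_cm):
    """Body mass index: weight (kg) divided by height (m) squared."""
    height_m = height_cm / 100.0
    return weight_kg / height_m ** 2


print(round(bmi(45, 160), 1))  # first quantile on the slide: 17.6
```

Values recomputed from the rounded heights and weights shown here can differ from the slide in the last digit, presumably because the slide's BMI figures come from unrounded measurements.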

Page 19: A Year of Data Science at Metail

Our Shape Segmentation

Spoon, Triangle, Bottom Hourglass, Rectangle, Hourglass, Top Hourglass, Inverted Triangle

Page 20: A Year of Data Science at Metail

Adapting the shape segmentation rules of the Lee et al. (2007) paper used by FFIT

Users Segmented by Shape

Axis labels: Hips – Waist (cm); Bust – Waist (cm)

Page 21: A Year of Data Science at Metail

Shape Distribution and Popular Garments

Page 22: A Year of Data Science at Metail

Engagement by Shape

% of users trying on at least two garments on personalised MeModel

1SD

Page 23: A Year of Data Science at Metail

Data Science

Learning User Behaviour

Page 24: A Year of Data Science at Metail

Understanding Users

Event stream summary over a month

Visits by day of month

All users

Distinct types of users

Machine Learning Techniques

Page 25: A Year of Data Science at Metail

Data Driven User Segmentation

Distinct types of users

Use Machine Learning techniques to characterise which features define users in each cluster
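The slides do not say which algorithms were used, so the following is only a sketch of one common approach: cluster users with k-means, then fit a shallow decision tree on the cluster labels to see which features separate the clusters. The feature names, the synthetic data and the choice of five clusters are all assumptions made for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Hypothetical per-user features aggregated from the event stream.
FEATURES = ["visits", "try_ons", "size_advice_views", "orders"]

# Synthetic stand-in data: 500 users, one Poisson-distributed count per feature.
rng = np.random.default_rng(0)
X = rng.poisson(lam=[5, 3, 1, 0.5], size=(500, 4)).astype(float)

# Step 1: unsupervised segmentation on standardised features.
X_scaled = StandardScaler().fit_transform(X)
labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X_scaled)

# Step 2: a shallow tree trained to predict the cluster label;
# its feature importances hint at what defines each cluster.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, labels)
importances = dict(zip(FEATURES, tree.feature_importances_))

for name, weight in sorted(importances.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {weight:.2f}")
```

Fitting the tree on the unscaled features keeps its split thresholds in the original units, which makes the resulting rules easier to read.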

Page 26: A Year of Data Science at Metail

Identify clusters: engaged and converted users

Cluster Labels into Redshift / Looker

Acquisition Rate

RPV

Seen Size Advice Rate

Page 27: A Year of Data Science at Metail

A first look at the clusters

Cluster sizes: 674 users, 595 users, 541 users, 721 users, 312 users

Figure labels: Acquisition, Retention Reuse, Retention Revisit, Deep Funnel, Revenue, Revenue, Try-ons (any model)

Page 28: A Year of Data Science at Metail

Future plans: more MODELLING!

Some possibilities:

• Use engagement clustering to create labels for supervised learning

• Engagement prediction using trained machine learning

• Apply Probabilistic Graphical Modelling techniques
  • (I quite like Daphne Koller’s Coursera course and book https://www.coursera.org/learn/probabilistic-graphical-models/home/welcome)

• More Bayesian reasoning

• … (any suggestions?)

Time permitting, SAMIAM (http://reasoning.cs.ucla.edu/samiam/) demo goes here

Page 29: A Year of Data Science at Metail

Bayesian inference – what are the variables?

(Disclaimer: this is me playing around with SAMIAM for 15 minutes and not an actual model)

Page 30: A Year of Data Science at Metail

Bayesian inference – how are things related?

(Disclaimer: this is me playing around with SAMIAM for 15 minutes and not an actual model)

Page 31: A Year of Data Science at Metail

Bayesian inference – what can we infer?

(Disclaimer: this is me playing around with SAMIAM for 15 minutes and not an actual model)

Page 32: A Year of Data Science at Metail

That’s all folks!

Questions?