chang liu insight 2014

True Fit Skin Care Chang Liu Fellow at Insight Data Science 2014

Post on 19-Jul-2015

TRANSCRIPT

Page 1: Chang liu insight 2014

True Fit Skin Care

Chang Liu Fellow at Insight Data Science 2014

Page 2: Chang liu insight 2014

So many products…

What makes it so hard? Overwhelming information

Page 3: Chang liu insight 2014

So many products…

So many reviews…

What makes it so hard? Overwhelming information

Page 4: Chang liu insight 2014

So many products…

So many reviews…

What makes it so hard? Overwhelming information

Reviews can be so long…

Page 5: Chang liu insight 2014

So many products…

So many reviews…

What makes it so hard? Overwhelming information

Reviews can be so long…

So many ingredients…

Page 6: Chang liu insight 2014

So many products…

So many reviews…

Time spent

Money wasted

Happiness

What makes it so hard? Overwhelming information

Reviews can be so long…

So many ingredients…

Page 7: Chang liu insight 2014

32k Reviewers
• w/ 2+ reviews

~1200 Products
• ~80 brands
• 8 categories

184k Reviews
• Rating [1-5]
• Review text
• Quick take

Collaborative Filter using User Reviews from Sephora.com

[Figure: rating matrix with reviewers 1 … N as rows and products X, Y, … as columns]

Algorithm:
• Item-centric collaborative filter
• Pearson's correlation coefficients to measure pairwise similarity

Page 8: Chang liu insight 2014

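Similarity:

$$c_{XY} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}}$$

Recommendation score:

$$\text{score}_{ui} = \frac{\sum_{j}^{M} r_{uj}\, c_{ij}}{\sum_{j}^{M} c_{ij}}$$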
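The two formulas on this slide can be sketched in plain Python. This is a minimal illustration, not the author's code: the nested-dict layout of `ratings`, the function names, and the toy data are all assumptions. The similarity is computed only over reviewers who rated both products, and the score is normalized by absolute similarities so that negative correlations cannot cancel the denominator; both choices are assumptions made here.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length rating lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: constant ratings carry no similarity signal
    return cov / sqrt(var_x * var_y)

def item_similarity(ratings, item_a, item_b):
    """Similarity of two products over the reviewers who rated both (assumed)."""
    shared = [u for u in ratings if item_a in ratings[u] and item_b in ratings[u]]
    if len(shared) < 2:
        return 0.0
    return pearson([ratings[u][item_a] for u in shared],
                   [ratings[u][item_b] for u in shared])

def recommendation_score(ratings, user, item):
    """Similarity-weighted average of the ratings the user has already given."""
    num = den = 0.0
    for other, r in ratings[user].items():
        if other == item:
            continue
        c = item_similarity(ratings, item, other)
        num += r * c
        den += abs(c)  # assumption: normalize by absolute similarity
    return num / den if den else None

# Hypothetical toy data: reviewer -> {product: rating on a 1-5 scale}
ratings = {
    "u1": {"X": 5, "Y": 4, "Z": 1},
    "u2": {"X": 4, "Y": 5, "Z": 2},
    "u3": {"X": 1, "Y": 2, "Z": 5},
    "u4": {"X": 5, "Z": 1},
}
score = recommendation_score(ratings, "u4", "Y")
```

With real data the pairwise similarities would normally be computed once and cached rather than recomputed per prediction.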

Page 9: Chang liu insight 2014

Cross Validation
• 5-fold for reviewer
• Leave-one-out for product
• Accuracy = 86.3% ± 1%
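One possible reading of this scheme, sketched under stated assumptions: reviewers are split into five folds, and for each held-out reviewer one product rating at a time is hidden and predicted from the rest. The slides do not say how accuracy was defined, so `within_one_star` is a stand-in criterion, and `mean_of_known` is a hypothetical baseline predictor used only to keep the example self-contained. An item-centric filter would take its place and build similarities from `train`.

```python
import random

def five_fold_reviewers(reviewers, k=5, seed=0):
    """Shuffle reviewer ids and deal them into k folds."""
    shuffled = sorted(reviewers)
    random.Random(seed).shuffle(shuffled)
    return [shuffled[i::k] for i in range(k)]

def cross_validate(ratings, predict, is_hit, k=5, seed=0):
    """Return one accuracy per fold.

    predict(train, known, item) -> predicted rating, or None to skip
    is_hit(predicted, actual)   -> True when the prediction counts as correct
    """
    accuracies = []
    for held_out in five_fold_reviewers(ratings, k, seed):
        held = set(held_out)
        train = {u: r for u, r in ratings.items() if u not in held}
        hits = trials = 0
        for u in held_out:
            for item, actual in ratings[u].items():
                # Leave one product out; keep the reviewer's other ratings.
                known = {i: r for i, r in ratings[u].items() if i != item}
                predicted = predict(train, known, item)
                if predicted is None:
                    continue
                trials += 1
                hits += is_hit(predicted, actual)
        accuracies.append(hits / trials if trials else float("nan"))
    return accuracies

def mean_of_known(train, known, item):
    """Hypothetical baseline predictor: the reviewer's own mean rating."""
    return sum(known.values()) / len(known) if known else None

def within_one_star(predicted, actual):
    """Assumed hit criterion; the slides do not define accuracy."""
    return abs(predicted - actual) <= 1

# Hypothetical toy data: ten reviewers with identical ratings
toy = {f"u{n}": {"A": 4, "B": 4, "C": 5} for n in range(10)}
fold_accuracy = cross_validate(toy, mean_of_known, within_one_star)
```

The mean and spread of `fold_accuracy` would then give a figure of the "value ± spread" form quoted on the slide.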


Page 10: Chang liu insight 2014

Visualize the similarity matrix

White = high similarity
Black = low similarity

Sorted by brands alphabetically
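A minimal sketch of the reordering behind such a plot, assuming the similarity matrix is stored as a list of rows and `brand_of` is a hypothetical product-to-brand lookup. The linear gray scale (white at +1, black at -1) is also an assumption, since the slide does not state the color mapping.

```python
def sort_by_brand(similarity, products, brand_of):
    """Reorder rows and columns so products of one brand sit next to each other."""
    order = sorted(range(len(products)),
                   key=lambda i: (brand_of[products[i]], products[i]))
    sorted_products = [products[i] for i in order]
    sorted_matrix = [[similarity[i][j] for j in order] for i in order]
    return sorted_products, sorted_matrix

def to_gray(c):
    """Map a correlation in [-1, 1] to a gray level in [0, 255] (white = high)."""
    c = max(-1.0, min(1.0, c))
    return int(round(255 * (c + 1) / 2))

# Hypothetical toy data: three products from two brands
products = ["p1", "p2", "p3"]
brand_of = {"p1": "Zeta", "p2": "Alpha", "p3": "Zeta"}
similarity = [
    [1.0, 0.1, 0.9],
    [0.1, 1.0, 0.2],
    [0.9, 0.2, 1.0],
]
sorted_products, sorted_matrix = sort_by_brand(similarity, products, brand_of)
pixels = [[to_gray(c) for c in row] for row in sorted_matrix]
```

After sorting, each brand occupies a contiguous square along the diagonal, which is where block structure becomes visible.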

Page 11: Chang liu insight 2014

White in a square =

Users' reviews are similar for all products in a brand

= Strong customer loyalty

There are structures!
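One way to put a number on a bright square is to compare the mean similarity inside a brand with the mean similarity between that brand and everything else. The contrast measure below is an illustrative choice, not something taken from the slides; the data layout and the `brand_of` lookup are hypothetical.

```python
from collections import defaultdict

def brand_block_contrast(similarity, products, brand_of):
    """Mean within-brand similarity minus mean cross-brand similarity, per brand."""
    within = defaultdict(list)
    across = defaultdict(list)
    n = len(products)
    for i in range(n):
        for j in range(i + 1, n):  # each unordered pair once, diagonal skipped
            a, b = brand_of[products[i]], brand_of[products[j]]
            s = similarity[i][j]
            if a == b:
                within[a].append(s)
            else:
                across[a].append(s)
                across[b].append(s)
    contrast = {}
    for brand in within:
        if across[brand]:
            mean_in = sum(within[brand]) / len(within[brand])
            mean_out = sum(across[brand]) / len(across[brand])
            contrast[brand] = mean_in - mean_out
    return contrast

# Hypothetical toy data: "Zeta" has two products, "Alpha" has one
products = ["p1", "p2", "p3"]
brand_of = {"p1": "Zeta", "p2": "Alpha", "p3": "Zeta"}
similarity = [
    [1.0, 0.1, 0.9],
    [0.1, 1.0, 0.2],
    [0.9, 0.2, 1.0],
]
contrast = brand_block_contrast(similarity, products, brand_of)
```

Brands with a single product have no within-brand pairs and are left out, so a large positive value is only reported where a square can actually exist.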

Page 12: Chang liu insight 2014

“Organic & Natural”

Expensive!

There are structures! For example…

Cost effective

Page 13: Chang liu insight 2014

There are structures! For example…

Actionable Insights For Sephora.com: Send marketing emails to new customers of brands with stronger customer loyalty!

“Organic & Natural”

Expensive!

Cost effective

Page 14: Chang liu insight 2014

Chang Liu
Ph.D. in Civil Engineering @CMU
[email protected]
linkedin.com/in/changliucmu
github.com/R4trtry

Page 15: Chang liu insight 2014

Is the rating a good measure of reviewers' perspective?

• Trained a naive Bayes classifier for sentiment analysis
• With 250 thousand reviews from Birchbox.com
  • A website that sends out free samples from smaller brands and gathers a large volume of user reviews

Most common words        Most informative features
Word       Count         Negative        Positive
skin       91349         re-wash         Penny
product    82481         garbage         hook
use        64044         mediocre        gorgeous
love       55691         ketchup         perk
feel       47879         trash           stock
face       42615         unimpressive    glowing
like       41427         survey          splurge
great      34155         ineffective     effortless
really     31672         gag             Christmas
smell      27621         worthless       happily

             Text     Quick take
Precision    95.3%    85.4%
Recall       89.8%    93.1%

Worth every penny!

Another Validation
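The sentiment check on this slide can be illustrated with a small multinomial naive Bayes classifier and a precision/recall helper. This is a sketch under assumptions, not the author's pipeline: the whitespace tokenizer, add-one smoothing, the "pos"/"neg" labels, and the four training sentences are all made up for the example, and the slides do not say how training labels were derived.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)
        self.word_counts = {c: Counter() for c in self.classes}
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = set().union(*self.word_counts.values())
        return self

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best, best_logp = None, -math.inf
        for c in self.classes:
            logp = math.log(self.doc_counts[c] / total_docs)  # class prior
            denom = sum(self.word_counts[c].values()) + len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are ignored
                    logp += math.log((self.word_counts[c][word] + 1) / denom)
            if logp > best_logp:
                best, best_logp = c, logp
        return best

def precision_recall(predicted, actual, positive="pos"):
    """Precision and recall for the class named by `positive`."""
    pairs = list(zip(predicted, actual))
    tp = sum(p == positive and a == positive for p, a in pairs)
    fp = sum(p == positive and a != positive for p, a in pairs)
    fn = sum(p != positive and a == positive for p, a in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical training snippets, not taken from the Birchbox data
texts = ["love this gorgeous glow", "worth every penny",
         "total garbage", "mediocre and worthless"]
labels = ["pos", "pos", "neg", "neg"]
model = NaiveBayes().fit(texts, labels)
guess_a = model.predict("gorgeous penny")
guess_b = model.predict("worthless garbage")
```

Comparing the predicted sentiment of a review's text with its star rating is one way to test whether ratings reflect what reviewers actually wrote.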

Page 16: Chang liu insight 2014

Is the rating a good measure of reviewers' perspective?

Another Validation

Page 17: Chang liu insight 2014

[Figure: Product X vs. Product Y, similarity 87.4%]

[Figure: rating matrix with reviewers 1 … N as rows and products X, Y, … as columns]

M products reviewed by N reviewers

Pairwise similarities are measured by Pearson's correlation coefficients:

$$c_{XY} = \frac{\sum_{i=1}^{N} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{N} (X_i - \bar{X})^2}\,\sqrt{\sum_{i=1}^{N} (Y_i - \bar{Y})^2}}$$

Then weight the ratings based on the correlation coefficients:

$$\text{Score}_i = \frac{\sum_{j}^{M} c_{ij}\, r_{uj}}{\sum_{j}^{M} |c_{ij}|}$$

r_{uj}: User u's preference on item j

Algorithm: Item-centric collaborative filter
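A short worked example of the weighted score, using made-up numbers. Suppose user u has rated two items, with r_{u1} = 5 and r_{u2} = 2, and the target item i has similarities c_{i1} = 0.9 and c_{i2} = -0.3 to them (all four values are hypothetical):

$$\text{Score}_i = \frac{c_{i1} r_{u1} + c_{i2} r_{u2}}{|c_{i1}| + |c_{i2}|} = \frac{(0.9)(5) + (-0.3)(2)}{0.9 + 0.3} = \frac{3.9}{1.2} = 3.25$$

The negatively correlated item pulls the score down, while the absolute values in the denominator keep the normalization positive.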