Data Science Outside the Box: Developing a Generic Scoring Algorithm for Customer Acquisition
© 2010 – 2016 eoda GmbH | Erik Barzagar-Nazari, Data Scientist | eRum 2016
Data Science outside the Box
Developing a Generic Scoring Algorithm for Customer Acquisition
About eoda
Interdisciplinary team: Statisticians | Engineers | Economists | Sociologists | …
Based in Kassel - Germany
Data science consulting, training, support, software and analytic services with a focus on R
Aims of Today’s Talk
I Present a real-world case study
II Discuss unique challenges
III Take a look into our solution
IV Reflect on the benefits of using R
Our Client: databyte GmbH
Provides business information
Database of about five million companies
100 million pieces of information such as sales, size, branches and many more
Updated daily!
Use Case: Customer Acquisition
databyte’s clients are usually businesses/organizations…
…looking for new business clients (e.g. for direct marketing campaigns)
Diagram: list of current customers (start) → scoring → dataset of new potential business clients
Case Study: Our Task
Main task: Develop a new scoring algorithm that…
…learns from the current customer base and…
…identifies the most promising entries in databyte’s database.
Challenges
Image source: http://vignette3.wikia.nocookie.net/simpsons/images/4/43/Daredevil_bart.jpg/revision/latest/scale-to-width-down/1000?cb=20160619043051
Challenges | Training on Customer Data
Standard approach: Train a binary classifier to distinguish between non-customers and customers, i.e. labels {0; 1}.
Bad news: This does not work in our case, because we only know the positive data.
P: Positive data = customer data. Already known customers of the client.
U: Unlabeled data = databyte’s database. Contains companies that may fit into the client’s customer base as well as companies that do not.
N: Negative data = ? Companies that definitely do not fit into the client’s customer base.
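To make the P/U/N setting concrete, here is a minimal Python sketch (the talk's own tooling is R). All company names and the `truly_fitting` set are made up for illustration; in practice that ground truth is never observed, which is exactly the problem. The sketch shows what happens if every unlabeled company is naively treated as a negative example.

```python
# P: known customers of the client (hypothetical names)
customer_list = {"Company A", "Company B"}

# U: the unlabeled database, which also contains the known customers
database = ["Company A", "Company B", "Company C", "Company D", "Company E"]

# Hypothetical ground truth, assumed here only to expose the error.
# Real projects never have access to this set.
truly_fitting = {"Company A", "Company B", "Company C"}

# Naive labeling: 1 for known customers, 0 for everything else
naive_labels = {name: int(name in customer_list) for name in database}

# Companies that would fit the customer base but are labeled 0
mislabeled = [
    name for name in database
    if naive_labels[name] == 0 and name in truly_fitting
]
print(mislabeled)
```

A classifier trained on `naive_labels` would learn to push "Company C" away from the customers, even though it is exactly the kind of entry the scoring should surface.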
Positive-Unlabeled (PU) classification
There are strategies to deal with PU problems, but…
…there are no well-established best practices yet
…strategies usually require strong assumptions
…PU classifiers require a lot of tuning and are quite fragile
Challenges | Self-Training Algorithm
databyte has many clients
The scoring algorithm must be able to train itself based on unseen training data (= customer lists)!
Challenges | Conclusion
PU
We have to get creative!
Solution
Solution | Basic Idea
Our approach is based on similarities.
Core concept:
1. Cluster customer data and extract medoids; these are representative customers.
2. Calculate similarities between database entries and medoids.
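The two steps of the core concept can be sketched in a few lines of Python (the talk's implementation is in R and is not shown in the slides). Everything specific here is an assumption for illustration: two numeric features per company, Euclidean distance, the similarity `1 / (1 + distance)`, the maximum similarity over all medoids as the score, a fixed `k = 2`, and a simple alternating k-medoids routine instead of the full PAM swap search.

```python
from math import dist

def k_medoids(points, k, max_iter=100):
    """Simple alternating k-medoids (an assumption, not full PAM)."""
    medoids = list(range(k))  # deterministic start: indices of the first k points
    for _ in range(max_iter):
        # Assignment: attach every point to its nearest medoid
        clusters = {m: [] for m in medoids}
        for i, p in enumerate(points):
            nearest = min(medoids, key=lambda m: dist(p, points[m]))
            clusters[nearest].append(i)
        # Update: the new medoid is the member with the smallest
        # total distance to all members of its cluster
        new_medoids = [
            min(members, key=lambda c: sum(dist(points[c], points[j]) for j in members))
            for members in clusters.values() if members
        ]
        if sorted(new_medoids) == sorted(medoids):
            break
        medoids = new_medoids
    return [points[m] for m in medoids]

def score(entry, medoids):
    """Score = highest similarity to any representative customer."""
    return max(1.0 / (1.0 + dist(entry, m)) for m in medoids)

# Hypothetical customer list and database entries (two scaled features each)
customers = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
database = [(1.1, 1.0), (3.0, 3.0), (5.0, 5.3), (9.0, 0.5)]

medoids = k_medoids(customers, k=2)
ranked = sorted(
    ((entry, score(entry, medoids)) for entry in database),
    key=lambda pair: pair[1],
    reverse=True,
)
for entry, s in ranked:
    print(entry, round(s, 3))
```

Because only the customer list is clustered, no negative examples are needed: the database entries are merely compared against the medoids.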
Solution | Basic Steps
Segmentation step: Identify segments based on branches.
Core concept:
1. Cluster customer data and extract medoids; these are representative customers.
2. Calculate similarities between database entries and medoids.
Weighting step: Weight similarities based on the distribution of branches.
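The slides do not spell out how segments and weights are computed, so the following Python sketch is only one plausible reading. It assumes that customers are grouped by branch, that a branch's weight is its share of the customer list, and that branches absent from the customer list get weight 0. The branch names, customer IDs, and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical customer list: (customer id, branch)
customers = [("C1", "retail"), ("C2", "retail"), ("C3", "retail"), ("C4", "logistics")]

# Segmentation step (assumed): one segment per branch
segments = defaultdict(list)
for customer_id, branch in customers:
    segments[branch].append(customer_id)

def branch_weights(customer_rows):
    """Weight of a branch = its share among the current customers (assumption)."""
    counts = Counter(branch for _, branch in customer_rows)
    total = sum(counts.values())
    return {branch: n / total for branch, n in counts.items()}

def weighted_score(similarity, branch, weights):
    """Scale a similarity by the branch weight; unseen branches get 0 (assumption)."""
    return similarity * weights.get(branch, 0.0)

weights = branch_weights(customers)
print(weights)
print(weighted_score(0.5, "retail", weights))
print(weighted_score(0.9, "mining", weights))
```

Under these assumptions, a database entry from a branch that dominates the customer list keeps most of its similarity, while an entry from a branch the client has never served is scored down regardless of how similar it looks otherwise.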
Solution | Pros & Cons
Pro:
It works and performs nicely!
Comprehensible approach, even for laymen.
Con:
Similarity calculation is costly.
Lack of “rock-solid” theory.
Benefits of Using R
{data.table}: Fast & efficient data handling
{proxy}: Library of distance and similarity measures. Allows calculation of cross-proximities. Many measures are implemented in C!
fpc::pamk(): Partitioning around medoids with estimation of the number of clusters
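The talk does not say which criterion was used to estimate the number of clusters. One common choice, and to our understanding an option offered by `fpc::pamk()`, is the average silhouette width. The Python sketch below illustrates that criterion only; the data points and the two candidate partitions are made up, and the partitions are hard-coded rather than produced by a clustering run.

```python
from math import dist
from statistics import mean

def average_silhouette_width(points, labels):
    """Mean silhouette width of a partition (needs at least two clusters)."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    widths = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            widths.append(0.0)  # usual convention for singleton clusters
            continue
        # a: mean distance to the other members of the own cluster
        a = mean(dist(points[i], points[j]) for j in own)
        # b: mean distance to the closest other cluster
        b = min(
            mean(dist(points[i], points[j]) for j in members)
            for other, members in clusters.items() if other != label
        )
        widths.append((b - a) / max(a, b))
    return mean(widths)

# Hypothetical customers with two scaled features
points = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]

# Hand-made candidate partitions for k = 2 and k = 3
candidates = {
    2: [0, 0, 0, 1, 1, 1],
    3: [0, 0, 2, 1, 1, 1],
}
scores = {k: average_silhouette_width(points, labels) for k, labels in candidates.items()}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

Splitting one of the two tight groups into a third cluster lowers the average width, so the criterion prefers `k = 2` for this toy data.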
Thank you for your attention!
Any questions?
@eodaGmbH
blog.eoda.de
eoda GmbH
Universitätsplatz 12
34127 Kassel - Germany
www.eoda.de / [email protected]
+49 561 202724-40
The Data Science Specialists.