Lecture on Data Science in a Data-Driven Culture
Data-Driven Culture: DATA-DRIVEN and DATA SCIENCE
Johan Himberg / Reaktor 29.2.2016
Why “data-driven” (WHY)
Brynjolfsson et al. (2011) on data-driven decision making:
• Survey data on the business practices and IT investments of 179 large, publicly traded companies
• Firms that emphasise “data-driven decision making” have output and productivity that is 5-6% higher than would be expected given their other investments and IT usage
• The relationship also appears in asset utilisation, return on equity, and market value
Data Science in business (WHY)
• Business acumen: what for
• Operations research: optimal decisions and actions
• Probability theory: how to handle uncertainties
• Analytics: insights and machine learning from data
• Computer science: how to implement all that
Data Science & analytics: BASICS
BASICS
Some dimensions
1. Business case
2. Analytical task
   1. Active - Passive system
   2. Informative - Operative aim
3. Modelling (model selection and fitting)
4. Data: structure, amount, velocity, and source
REAKTOR / JOHAN HIMBERG, FEBRUARY 2016
Data Science & analytics: BUSINESS CASES
Beware of empty “data-speak”
A quote from my colleague Janne Sinkkonen, from a presentation at the Helsinki University machine learning course:
“Data-speak” hides the processes behind data. What creates the data? What is done with the results?
The goal is not “data analysis”.
Define your goal and set-up without using the word ‘data’.
REAKTOR 2016
Information business (BUSINESS CASE)
• Sell audiences: Google, Facebook, media, …
• Sell information: credit rating, car register, …
Operations (BUSINESS CASE)
• Create beneficial events: marketing (targeting, cross-sell, up-sell, conversion); finding the right product/service to sell or buy, finding a good doctor, expert, etc.
• Avoid non-beneficial events: churn, people leaving, waste, credit loss, fraud, …, system failures, …
• Optimize: customer value, workforce, schedules, prices, discounts, stocks, relevancy for the customer, production quality, speed
• Rationalise: process efficiency, lead times, handling complexity, search time, …
• Understand the customer & product base, transactions, or processes, internally (ERP, CRM, HR, sales systems, production, …) and externally (location, routes, weather, demographics, estates, …)
Strategic (BUSINESS CASE)
• Efficiency and competition: react faster, streamlined decision making, risk awareness, financial efficiency, innovations
• Well-informed strategic decisions: understanding customer groups’ needs for product and service development; understanding and predicting world events, economics, demographics, …; reacting to market fluctuations or changes in the financial environment
• Internal and external image and culture: transparency, learning as a part of company culture, customer satisfaction, personalisation, brand
Example (VIRTUES)
Netflix: “The goal of our ranking system is to find the best possible ordering of a set of items for a member, within a specific context, in real-time. ... Our business objective is to maximize member satisfaction and month-to-month subscription retention, which correlates well with maximizing consumption of video content.”
- Xavier Amatriain and Justin Basilico, Personalization Science and Engineering, 2012
Data Science & analytics: TASKS & RISKS
BASICS
Informative - Operative
Informative (for understanding)
Analysis results for understanding things, and results that help management make decisions: reports, predictions, what-if analyses, simulations, visualisations, …
Operative
An automated system that makes decisions based on some rules or models, or results that are directly operative even if not automated.
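To make the operative side concrete, here is a minimal sketch of a rule that decides, customer by customer, whether to send a retention offer. The churn probability, customer value, offer cost, and the chance that the offer actually keeps a would-be churner are all hypothetical inputs; the lecture does not prescribe this particular rule.

```python
def should_send_offer(p_churn: float, customer_value: float,
                      offer_cost: float, p_offer_saves: float) -> bool:
    """Hypothetical operative rule: send the offer when the expected value it saves exceeds its cost."""
    # Expected value kept = P(would churn) * P(offer prevents the churn) * value at stake
    expected_saved_value = p_churn * p_offer_saves * customer_value
    return expected_saved_value > offer_cost


# 30% churn risk, customer worth 500, offer costs 20, offer saves 20% of churners
print(should_send_offer(0.3, 500.0, 20.0, 0.2))  # True: about 30 expected vs. 20 cost
```

Used informatively, the same expected-value figure would go into a report for someone to act on; used operatively, the rule runs automatically for every customer, which is why its inputs and effects need monitoring.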
BASICS
Active - Passive
Active
You make an “intervention” and gather evidence in tests designed to reveal an effect.
Example: A/B testing.
Passive
Data is just collected, captured “as it happens”: customer transactions, sales, web browsing, tweets.
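A common way to read an A/B test of conversion rates is a two-proportion z-test. The sketch below is one standard formulation (pooled standard error under the null hypothesis, two-sided p-value); the conversion counts in the usage line are made up for illustration.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: variants A and B convert at the same rate."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    # Pooled conversion rate, assuming the null hypothesis of no difference
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal distribution: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical test: 200 of 10,000 convert on A, 260 of 10,000 on B
z, p = two_proportion_z_test(200, 10_000, 260, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

With these numbers z comes out at roughly 2.8 and the p-value below 0.01. The test only says the difference is unlikely to be noise; whether it is worth acting on is a business question.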
BASICS
Use cases
Two axes: Passive vs. Active, and Informative vs. Operative.
• Descriptive (Passive, Informative): What has happened?
  Customer profiles, customer segmentation, shopping cart analysis
• Diagnostic (Active, Informative): Why did it happen?
  Marketing impact analysis, price elasticity analysis, web design testing
• Predictive (Passive, Operative): What will happen?
  Up-sell/cross-sell, new customer acquisition, churn prediction, life-time value prediction, demography prediction
• Prescriptive (Active, Operative): What should I do?
  Marketing impact optimisation, recommendation system in a dynamic environment
Data Science & analytics: RISKS & PROBLEMS
RISKS / PROBLEMS
Issues by analytics use case
• Descriptive (Passive, Informative): isolated / ad hoc reports; isolated ad hoc decisions; feedback loop (report - decision - effect); ignoring statistics; analysts as SQL monkeys; UI / visualization
• Diagnostic (Active, Informative): statistical skills; testing and organisation; correlation vs. causality; requires lots of communication
• Predictive (Passive, Operative): what to predict, i.e. how to quantify the target; access to historical data; quantifying and understanding the risk(s); validating prediction accuracy for the future
• Prescriptive (Active, Operative): what to optimize?; complex software system; technical feedback loop; co-operation between “human” and “artificial intelligence”; monitoring
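Validating prediction accuracy for the future usually means evaluating on data from after the training period, not on a random sample of the same period. A minimal sketch of that idea, assuming a hypothetical (date, segment, churned) record layout, a deliberately naive per-segment churn-rate model, and the Brier score as the accuracy measure; all three are illustrative choices, not the lecture's.

```python
from collections import defaultdict
from datetime import date

# Hypothetical record layout: (observation date, customer segment, churned?)
Record = tuple[date, str, bool]


def time_based_split(records: list[Record], cutoff: date) -> tuple[list[Record], list[Record]]:
    """Train only on the past; validate on what happened on or after the cutoff."""
    past = [r for r in records if r[0] < cutoff]
    future = [r for r in records if r[0] >= cutoff]
    return past, future


def fit_segment_rates(train: list[Record]) -> dict[str, float]:
    """Naive model: the historical churn rate of each segment."""
    counts = defaultdict(lambda: [0, 0])  # segment -> [churned, total]
    for _, segment, churned in train:
        counts[segment][0] += int(churned)
        counts[segment][1] += 1
    return {seg: c / n for seg, (c, n) in counts.items()}


def brier_score(model: dict[str, float], valid: list[Record], default: float = 0.5) -> float:
    """Mean squared error of predicted churn probabilities on the held-out future period."""
    errors = [(model.get(seg, default) - float(churned)) ** 2 for _, seg, churned in valid]
    return sum(errors) / len(errors)
```

Splitting by time rather than at random keeps information from the future out of training, which is closer to how the model will be used once deployed.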
Examples… (RISKS / PROBLEMS)
• Focusing on the wrong things
  • not recognising the analytics use cases
  • “data first”: a long time from investment to benefits
  • not starting from the beef: actions and decisions
  • thinking only of IT solutions and products
  • careful examination and validation of the algorithms, but not setting targets and risks according to the business target
• Organisation
  • silos: communication through hierarchy
  • no access to data, internal politics
  • technical details decided by business people
  • business criteria set by technical people
…more examples (RISKS / PROBLEMS)
• Underestimating complexity (time & scope)
  • both software and analytics have to be built simultaneously
  • the time and effort needed for “data wrangling”
  • the time used for UIs and visualisations
  • the feedback loop
• Unrealistic expectations (quality)
  • of analytical systems in general (they are not that intelligent); rules are needed
  • that a product, a model, an algorithm, or a data scientist solves all the problems
  • risks and targets cannot always be defined properly right away
  • there is no guarantee of accuracy in a particular case before trying
Culture that helps to handle risk: WISE - DETERMINED - CURIOUS
Culture that helps to handle risk (VIRTUES)
• Wise: solve the right problems with analytics!
• Determined: aim at specific, concrete things
• Curious: be ready to divert, seek evidence
• Bayesian: understand uncertainties and risks
• Truthful: don’t bend results to fit wishes; it’s data science
• Courageous: act on evidence
• Active and Agile: test, don’t just observe; inspect - adapt - learn
• Transparent and Helpful: co-operate from end to end, don’t silo
Aim at the right things (VIRTUES)
Netflix Prize competition (2006-2009)
Who gets the best RMSE (root-mean-square error) in predicting users’ true ratings?
BUT
“The goal of our ranking system is to find the best possible ordering of a set of items for a member, within a specific context, in real-time. ... Our business objective is to maximize member satisfaction and month-to-month subscription retention, which correlates well with maximizing consumption of video content. We therefore optimize our algorithms to give the highest scores to titles that a member is most likely to play and enjoy.”
“Netflix Prize objective... is just one of the many components of an effective recommendation system... We also need to take into account factors such as context, title popularity… Supporting all the different contexts in which we want to make recommendations requires a range of algorithms that are tuned to the needs of those contexts.”
- Xavier Amatriain and Justin Basilico, Personalization Science and Engineering, 2012
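A toy illustration of why a lower RMSE does not guarantee a better ranking. The ratings below are invented for three titles and one member; this is not how Netflix evaluates its systems, only a sketch of the gap between the Prize metric and the ranking goal in the quote.

```python
import math


def rmse(predicted: list[float], actual: list[float]) -> float:
    """Root-mean-square error between predicted and actual ratings."""
    squared_errors = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def top_item(scores: list[float]) -> int:
    """Index of the title that would be shown first when ranking by score."""
    return max(range(len(scores)), key=lambda i: scores[i])


# Hypothetical ratings of three titles by one member
actual = [5.0, 4.0, 1.0]
model_a = [4.0, 4.1, 1.0]  # small errors on average, but swaps the top two titles
model_b = [5.0, 3.0, 3.0]  # larger errors, but puts the favourite first

for name, pred in [("A", model_a), ("B", model_b)]:
    print(name, round(rmse(pred, actual), 2), top_item(pred) == top_item(actual))
```

Model A wins on RMSE (about 0.58 vs. 1.29) yet shows the wrong title first; if the business goal is what gets played, model B is the better one here.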
Curiosity (VIRTUES)
Always aim at something specific … but be open-minded and curious.
Example: Röntgen and Fleming (Nobel laureates)
• their most famous findings were “accidental”, but
• they were skilled scientists doing disciplined research for some other aim
Explore “from data to insights” occasionally, but not aimlessly.
If you find something interesting, do a disciplined analysis, preferably a test.
Culture that helps to handle risk: BAYESIAN - TRUTHFUL
Understanding probabilities (VIRTUES)
The main ingredients of data science!
Making decisions based on data analysis requires the concepts of risk and probability.
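One Bayesian way to state the risk behind a decision is the posterior probability that one option really is better. A minimal sketch for two conversion rates, assuming uniform Beta(1, 1) priors and a binomial model, estimated by Monte Carlo sampling; the counts in the usage line are made up.

```python
import random


def prob_b_beats_a(conv_a: int, n_a: int, conv_b: int, n_b: int,
                   draws: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(rate_B > rate_A | data), assuming uniform Beta(1, 1) priors."""
    rng = random.Random(seed)  # seeded so the estimate is reproducible
    wins = 0
    for _ in range(draws):
        # Beta(1, 1) prior + binomial data -> Beta(1 + successes, 1 + failures) posterior
        rate_a = rng.betavariate(1 + conv_a, 1 + n_a - conv_a)
        rate_b = rng.betavariate(1 + conv_b, 1 + n_b - conv_b)
        if rate_b > rate_a:
            wins += 1
    return wins / draws


# Hypothetical data: 200 of 10,000 convert on A, 260 of 10,000 on B
print(prob_b_beats_a(200, 10_000, 260, 10_000))
```

Under the stated prior, the result reads directly as “how likely is B better”, which can be weighed against the cost of being wrong when deciding whether to act.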
Culture that helps to handle risk: COURAGE
Courage
“Data driven means that progress in an activity is compelled by data rather than by intuition or personal experience. It is often labeled as the business jargon for what scientists call evidence based decision making.”
- Wikipedia, 2016-02-24
“I take risks, sometimes patients die. But not taking risks causes more patients to die, so I guess my biggest problem is I've been cursed with the ability to do the math.”
- Fictional character Dr. House in the Fox television series “House”
Culture that helps to handle risk: HELPFUL - TRANSPARENT - AGILE
Agile - Transparent
Doing data-driven work and data science in any organisational model boils down to:
“Involve everyone along the information path”
Agile development: the team decides the details.
Start from
• concrete actions that can be optimized,
• the decisions they require, and
• how to measure the effects properly.
Remember the feedback loop!
Develop constantly.
Lecture @AaltoBIZ, Johan Himberg, 2015
Think & plan from deployment to data. Pick an aim!
Business drivers: aim 1, aim 2, aim 3, aim 4, aim 5
• Action (optimize, decide, deploy). For example:
  • automated decisions: recommendation, targeting
  • simulation
  • prescriptive, predictive modelling
• Information (report, visualize, model). For example:
  • documentation on the meaning of the data
  • KPIs, profiles, segments, factors, DW dashboards
  • descriptive, diagnostic, predictive modelling
• Data (big, small, open, local, web, meta, …). For example:
  • source integrations
  • Extract - Load - Transform
  • metadata
  • modelling for cleansing & consistency
Modelling: what are the actions, what are the insights. Wrangling: what the data means. Testing: what the impact is.
Business drivers: aim 1, aim 2 (start from here!), aim 3, aim 4, aim 5
• Action. For example:
  • Business: we need to optimise for customer retention
  • Marketing: we could start with a special offer by SMS
  • Data Scientist: we’ll set up test & control groups!
• Information. For example:
  • Solution expert: field ZPOR means revenue per unit and it is calculated based on …
  • Customer transactions are not in the Data Warehouse; they’re aggregated on a monthly level. Let’s get daily data from system Z
• Data. For example:
  • Now we have transactions for 1M users for 1 yr, fields a, b, c, d, e …
  • …
Modelling: what are the actions, what are the insights. Wrangling: what the data means. Testing: what the impact is.
Data-driven work is inherently iterative and benefits from agility. Data and processes are often not what was assumed. Be curious, keep a backlog, inspect, adapt.
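Setting up test and control groups, as the data scientist proposes in the example above, needs a reproducible assignment so that a customer stays in the same group when the campaign is rerun or analysed later. One common approach is to hash the customer id; the sketch below assumes string customer ids, a hypothetical campaign name, and a 10% control share.

```python
import hashlib


def assign_group(customer_id: str, campaign: str, control_share: float = 0.1) -> str:
    """Deterministically assign a customer to 'test' or 'control' for one campaign."""
    # Hashing campaign + customer gives a repeatable split that differs between campaigns
    digest = hashlib.sha256(f"{campaign}:{customer_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "control" if bucket < control_share else "test"


# Hypothetical campaign name and customer ids
groups = [assign_group(f"c-{i}", "sms-retention") for i in range(1000)]
print(groups.count("control"), "of", len(groups), "customers held out as control")
```

Storing the assignment next to the campaign results (rather than recomputing it ad hoc) keeps the later impact analysis honest.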
THE LOOP: results
Execute based on the model, collect data.
Business drivers: aim 1, aim 2, aim 3, aim 4, aim 5
• Action. For example: deploy the campaign, collect responses
• Information. For example: calibrate & apply the model
• Data. For example: get data for modelling; store results
Modelling: what are the actions, what are the insights. Wrangling: what the data means. Testing: what the impact is.
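Closing the loop means measuring what the deployed campaign actually did. A minimal sketch, assuming response counts are available for a test group and a held-out control group: the uplift is the difference in response rates, with a normal-approximation 95% interval that is only reasonable for fairly large groups.

```python
import math


def campaign_uplift(resp_test: int, n_test: int, resp_control: int, n_control: int,
                    z: float = 1.96) -> tuple[float, tuple[float, float]]:
    """Uplift (test rate minus control rate) with a normal-approximation confidence interval."""
    p_test = resp_test / n_test
    p_control = resp_control / n_control
    uplift = p_test - p_control
    # Unpooled standard error of a difference between two independent proportions
    se = math.sqrt(p_test * (1 - p_test) / n_test + p_control * (1 - p_control) / n_control)
    return uplift, (uplift - z * se, uplift + z * se)


uplift, (low, high) = campaign_uplift(450, 9_000, 30, 1_000)  # made-up campaign counts
print(f"uplift {uplift:.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

With these invented counts the uplift is 2 percentage points and the interval stays above zero; the stored result then becomes data for the next modelling round.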
Information path focused backlog
Business drivers: aim 1, aim 2, aim 3, aim 4, aim 5
• Action. Backlog example: test & control group handling in marketing automation; involve N.N. in the process
• Information. Backlog example: define a new information source; look for a new data source for determining income by zip code area; correct the documentation; automation of the campaign modelling
• Data. Backlog example: better system configuration & architecture; automation of the campaign process…; new data: record information on all campaigns
Modelling: what are the actions, what are the insights. Wrangling: what the data means. Testing: what the impact is.
Don’t silo
• A change of culture: information (not data) is everybody’s business, just as money is
• One data scientist can’t excel at all of this:
  • PO / Technical Account Manager
  • Business specialist
  • Solution owner / process owner
  • Data Steward
  • Developer
  • Visualization / UX expert
Data scientists’ special role
• Data scientists’ main tasks are in methods, but also in the processes and machinery of
  • making evidence-based decisions (automated if possible)
  • finding out the confidence in the outcome (by active tests if possible)
  • getting insights based on models and data
• Data scientists often act as “glue”.
Culture that helps to handle risk: TECHNOLOGY
Technology
• Different analytical tasks need different tools, so one has to integrate different systems. Remember that you need a feedback loop!
• Prefer systems
  • that give mass access to historical, transactional data on the individual level instead of just aggregates (avoid being “blinded by averages”)
  • from which you can get the data, transformations, and results out to another system (avoid being a “data hostage”)
  • where you can see what the analytics actually does, at least on a modular level (avoid being a “method hostage”). Prefer being able to see the actual implementation (open source).
• Pick a product when you know the task, your needs, and the product quality.
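A classic way to be “blinded by averages” is Simpson's paradox: an aggregate comparison can point the opposite way from every segment-level comparison. The numbers below are invented; the campaign went mostly to the “new” segment, which responds less anyway, so the overall average makes the campaign look harmful even though it helps in both segments.

```python
# Hypothetical campaign data: (segment, got_campaign) -> (responders, customers)
COUNTS: dict[tuple[str, bool], tuple[int, int]] = {
    ("loyal", True): (180, 200),
    ("loyal", False): (680, 800),
    ("new", True): (240, 800),
    ("new", False): (50, 200),
}


def response_rate(cells: list[tuple[int, int]]) -> float:
    """Pooled response rate over one or more (responders, customers) cells."""
    return sum(r for r, _ in cells) / sum(n for _, n in cells)


def segment_rate(segment: str, got_campaign: bool) -> float:
    return response_rate([COUNTS[(segment, got_campaign)]])


def overall_rate(got_campaign: bool) -> float:
    return response_rate([cell for (_, g), cell in COUNTS.items() if g == got_campaign])


for seg in ("loyal", "new"):
    print(seg, segment_rate(seg, True), segment_rate(seg, False))
print("overall", overall_rate(True), overall_rate(False))
```

Per segment the campaign wins (0.90 vs. 0.85 and 0.30 vs. 0.25), while the aggregate says 0.42 vs. 0.73. Only access to individual or segment-level data reveals this.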
References
• Brynjolfsson, Erik, Hitt, Lorin M., and Kim, Heekyung Hellen. Strength in Numbers: How Does Data-Driven Decisionmaking Affect Firm Performance? (April 22, 2011). Available at SSRN: http://ssrn.com/abstract=1819486 or http://dx.doi.org/10.2139/ssrn.1819486
• Netflix case: http://techblog.netflix.com/2012/04/netflix-recommendations-beyond-5-stars.html
• Big Data landscape: http://mattturck.com/2016/02/01/big-data-landscape/#more-917
• Data science skills:
  • http://www.oralytics.com/2012/06/data-science-is-multidisciplinary.html
  • http://www.oralytics.com/2013/03/type-i-and-type-ii-data-scientists.html
www.reaktor.com