how to build a successful data team - florian douetteau (@dataiku)
How to Build a Successful Data Team
March 2016
Hi! I’m Florian Douetteau, CEO of Dataiku.
It’s me!
It’s our software!
…and our software is the most complete Data Science platform.
Dataiku - Data Tuesday
Meet Hal Alowne
Big Guys
• $10B+ revenue
• 100M+ customers
• 100+ data scientists
Hal Alowne, BI Manager, Dim’s Private Showroom
“Hey Hal! We need a big data platform like the big guys. Let’s just do as they do!”
Average E-commerce Website
• $100M revenue
• 1 million customers
• 1 data analyst (Hal himself)
Dim Sum, CEO & Founder, Dim’s Private Showroom
Big Data Copycat Project
Technology Disconnect
Welcome to Technoslavia!
LOL PLATFORM ANTI-PATTERN
Test and Invest in Infrastructure == Skilled People
or
Go for Cloud / Packaged Infrastructure
Your brand-new Hadoop cluster is perceived as slow, underused, and unreliable
TECHNO MISMATCH ANTI-PATTERN
Embrace Being Polyglot
or
Be a Dictator
The Python Clan VS The R Tribe
The Old Elephant Fraternity VS The New Elephant Club
PREDICTIVE ANALYTICS DEPLOYMENT STRATEGY
2000s winners (websites): companies that were able to release fast
2010s winners ("Artificial Intelligence with Data for the Internet of Things"): companies able to put intelligence in production
?
Design a way to put “PREDICTIVE MODELS” IN PRODUCTION
PEOPLE DISCONNECT
Classic Business Intelligence Team Organization
Business Leader (Data Consumer)
Line-of-Business (Data Consumer)
Business Project Sponsor
BI Solution Architect
Model Designer
ETL Developer
Dashboard / Report Designer
DBA / IT Data Owner
Specs
Data Science Team Organization
Business Leader (Data Consumer)
Line-of-Business (Data Consumer)
Business Project Sponsor
Data Team Manager
Data Engineer
Data Analyst
Data System Engineer / Data Architect
Specs
Data Scientist
Built From Scratch
Business Leader (Data Consumer)
Line-of-Business (Data Consumer)
Business Project Sponsor
DBA / IT Data Owner
Specs
DATA SCIENTISTS EVERYWHERE
Built From Engineering
Business Leader (Data Consumer)
Line-of-Business (Data Consumer)
Business Project Sponsor
Specs
DATA ENGINEERS
DATA ANALYSTS
Built From Analysts
Business Leader (Data Consumer)
Line-of-Business (Data Consumer)
Business Project Sponsor
Specs
Manage Expectations
Data Plumber
Data Engineer
Data Scientist
Data Waiter
Data Cleaner
Data Analyst
REAL JOB
DREAM JOB
Perfectly Natural Hidden Thoughts
Business Project Sponsor
Data Team Manager
Data Engineer
Data Analyst
Data Scientist
Managing Extreme Personalities
DATA SCIENTIST
Highly Creative
Passionate
Hard to hire?
Hard to manage?
Ambitious: wants to take your job?
Paired for Data
Data Analyst: discover patterns
Data Engineer: make things work
Fight data entropy and tech entropy
Which do you prefer?
One analyst, one engineer, and one data scientist who work together?
Or four data scientists?
Data Disconnect
What is the main reason data projects fail?
DATA NOT AVAILABLE
BUT ONLY FOR INCREMENTAL GAIN
Contribution to the overall project performance:
• Business goal definition and data: 50%
• Feature engineering: 30%
• Algorithm: 20%
How to Get Data If You Don’t Have It
THE GRASSHOPPER, THE SPIDER, THE FOX
The Cicada: Optimistic and Opportunistic Data
THE CICADA
As a startup:
- Build a new product using open data
As a group inside a company:
- Benefit from the data-sharing initiative within your company
- Wait for data to become available in your data lake
The Spider: Power of the Network
THE SPIDER
As a startup:
- Create a network of web trackers or sensors
- Make it available for free
- Build your service on the data collected from people
As a group inside a company:
- Make a web service available to collect data
- Promote it internally so that people use it
The Fox: Hunt for the Big Money first
THE FOX
As a startup:
- Hunt for a business group within a large company that has a problem
- Build a SaaS solution using their data
- Replicate it for their competitors
As a group inside a company:
- Take charge of a critical problem at the CEO’s request
- Build your own integrated tech team to solve it
- Use those resources to reset data services internally
PRODUCT DISCONNECT
What is Big Data about?
The Age Of Distributed Intelligence
Global, Personalised, and Real-Time Data-Driven Services
Data to Visualize or Data to Automate ?
Chart, 2013 to 2018: Automated Decision vs. Visualize to Decide
Moving to a world of automated decision making
Involve the Product Team
Product Feature: Personalised Item Ranking
Product Feature: Notify User Only When Needed
Product Feature: Historical Data for Path Optimisation
Have Product Management Deeply Involved In the Data Team
Where is your added value?
Is the problem at the core of my business process?
Is it a common problem with shared data?
Go for a best-of-breed SaaS solution
Can I solve it on my own?
Really?
Built by the data team
Built by the data team?
Built by the data team
Hire consultants and learn
Yes
Yes / No
I can’t / OK, I can try
Yes!
No!
No
Beware of the Comfort Zone
Mission Critical vs. Sheer Curiosity
Small, Structured Data vs. Large, Diverse Data
• Reporting for Finance in Any Industry
• Analyze Each Tweet
• Web Navigation for E-Merchants
• Ticket Data for Discounts in Retail
• Phone Call Logs for Security
• RTB Data for Advertising
• Customer Consumption for Anti-Churn in Utilities
• Optimization
• Filings for Fraud in Insurance
Not enough data to learn from?
Not enough “hard” examples to learn from
Create an "API" Culture
Do not share:
• Random pieces of code
• Flat files
Do share:
• Reproducible, documented workflows
• Clean, documented APIs
Food for thought: www.dataiku.com/blog
Free Data Science Software
www.dataiku.com/dss
THANK YOU !
Data Science Is no longer a science