1
Data science - a commercial perspective
Gordon Blunt
Gordon Blunt Analytics Ltd
Royal Statistical Society annual conference
9th September 2015
2
Outline
1 Background
2 Data science
3 Skills needed
- ‘Softer’ skills
- Statistical skills
4 Concluding thoughts
5 References
4
My background
Work - ‘client side’
- Fast moving consumer goods (FMCG)
- Royal Mail
- Barclaycard
Work - consultancy
- CACI Ltd
- GfK NOP Ltd
- Gordon Blunt Analytics Ltd (2008→)
Sectors: FMCG, financial services, data consultancy, market research
6
Nature of data science
My starting point
- Data science is statistics, or
- Statistics is, and always has been, data science
Data are the most important part of statistics
- I’m not alone in this view . . .
- ‘Statistics starts with data’ [Breiman 2001]
- Bill Cleveland and John Tukey voiced similar thoughts [Cleveland 2001], [Tukey 1962]
But . . .
7
Some characteristics of data science
Massive data sets
- 10^n observations, where n > (or possibly ≫) 7
- 10^m variables, where m > (or possibly ≫) 3
Modern computing power
- Computers are very cheap today
- Cost per 1MB memory . . . ≈ 3 × 10^-10 of cost in 1965 [1]
Other disciplines are now analysing data too, for example . . .
- Machine learning
- Database management
- Knowledge discovery in databases
[1] http://jcmit.com/memoryprice.htm
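To give a feel for the scale described on this slide, here is a rough back-of-envelope sketch. It takes the lower bounds n = 7 and m = 3 and assumes that every value is stored as an 8-byte floating-point number. That storage cost is an illustrative assumption, not something stated in the talk, and the function name is hypothetical.

```python
# Back-of-envelope size of a data set with 10^n rows and 10^m columns.
# Assumption (not from the slides): each value is stored as an 8-byte float.

BYTES_PER_VALUE = 8  # hypothetical storage cost per cell


def dataset_size_gb(n: int, m: int, bytes_per_value: int = BYTES_PER_VALUE) -> float:
    """Return the raw in-memory size in gigabytes (1 GB = 10^9 bytes)."""
    cells = 10 ** n * 10 ** m
    return cells * bytes_per_value / 1e9


if __name__ == "__main__":
    # Lower bounds from the slide: n = 7 observations exponent, m = 3 variables exponent.
    print(f"{dataset_size_gb(7, 3):.0f} GB")
```

Under these assumptions even the smallest "massive" data set is about 80 GB of raw values, which is why cheap memory matters so much to the argument.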
8
‘Components of a successful data science team’ [2]
Skilled professionals needed
1 Data Engineer
- ‘does not need to be very academic [. . . ] technical competency on the back-end frameworks and tools used for capturing the data points’
2 Machine Learning Expert
- ‘statistical background, having a deep interest in quantitative topics [. . . ] solid understanding of data algorithms and data structures in specific, and software engineering concepts’
3 Business Analyst
- ‘an eye for details and [. . . ] exceptional analytical skills [. . . ] solid understanding of the organization’s business model’
The emphases are mine, by the way
[2] http://www.kdnuggets.com/2015/08/3-components-successful-data-science-team.html August 12 2015
10
The commercial imperative
Companies want answers that are . . .
- Timely (often have short deadlines)
- Practical (can be used in the business)
- Useful (generates enough revenue)
Companies have . . .
- Mountains of data
- Little time
- Relatively few skilled analysts
Statistics must be taught as a practical subject, or it will be overtaken by other disciplines
11
Core skills - ‘softer’
Communication
- Influencing
- Appropriate language (often non-statistical!)
- Brevity
Commercial awareness
Time management
Ability to work . . .
- independently
- and / or as part of a ‘non-technical’ team
Problem solving
Creative thinking
And, please, common sense (e.g. the ‘sniff test’)!
12
Communication
Influencing
- We (probably) need to sell our analysis
- Understand the client’s motivations
  - what does the client want?
  - what does the client need to be told?
- Engage in debate at senior levels
  - can be challenging
  - might not have much time - be brief
- Always have something positive to say
Appropriate language
- Explain in ways the client can understand
- Be careful about statistical jargon, for example . . .
  - ‘error’ likely to be interpreted as ‘mistake’
  - ‘normal’ likely to be interpreted as ‘commonplace’
  - ‘significance’ - statistical or useful?
13
Core skills - technical
Statistics - knowledge assumed . . .
- ‘Core’ statistics
  - subjects found in undergraduate / masters courses
- Experience of (messy) commercial data
  - these are the reason we need strong EDA skills
- Limitations of traditional tests with large data sets
- Advanced mathematical and computational methods
Coding and / or programming
- Python
- Hadoop
- Weka
- . . . and / or many others . . .
(of course!)
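One way to see the "limitations of traditional tests with large data sets" point is the sketch below. It is an illustration under stated assumptions rather than anything taken from the talk: a two-sample z-test with equal group sizes and a known standard deviation, applied to a hypothetical difference in means of one thousandth of a standard deviation. The function name and the effect size are made up for the example.

```python
import math


def two_sample_z_pvalue(effect_in_sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means between two equal-sized
    groups. The difference is expressed in standard deviations and the
    standard deviation is treated as known (a z-test)."""
    # Standard error of the difference is sd * sqrt(2 / n), so z = effect * sqrt(n / 2).
    z = effect_in_sd * math.sqrt(n_per_group / 2)
    # For a standard normal Z, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


if __name__ == "__main__":
    tiny_effect = 0.001  # hypothetical: commercially negligible difference
    for n in (10**3, 10**5, 10**7, 10**9):
        p = two_sample_z_pvalue(tiny_effect, n)
        print(f"n per group = {n:>13,}  p = {p:.3g}")
```

The effect is the same in every row. It is nowhere near significant with a thousand observations per group, yet it drops below the conventional 0.05 threshold at 10^7 per group, which is the scale described earlier as "massive". Statistical significance alone says nothing about whether the difference is useful to the business.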
14
Statistics, big data and the commercial sector
A good starting point
- ‘All models are wrong, but some are useful’ [Box 1979]
- Exploratory / graphical data analysis are crucial [Tukey 1977, Unwin 2015]
We need to teach . . .
- Simple is - often - better than ‘best’
  - by the time we’ve built the ‘best’ model, it’s usually out of date
- The basics are crucial
  - EDA
  - data quality / cleaning
  - visualisation
  - graphical presentation
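As an illustration of how simple the "basics" can be, here is a minimal data-quality profile in plain Python. Everything in it is hypothetical: the records, the field names and the plausible ranges are invented for the example, and in practice the ranges would come from people who know the business.

```python
from statistics import median

# Hypothetical customer records; None marks a missing value.
records = [
    {"age": 34, "monthly_spend": 120.0},
    {"age": None, "monthly_spend": 85.5},
    {"age": 41, "monthly_spend": -10.0},  # negative spend: suspicious
    {"age": 230, "monthly_spend": 60.0},  # impossible age
    {"age": 29, "monthly_spend": None},
]

# Hypothetical plausible ranges for each field.
PLAUSIBLE = {"age": (18, 110), "monthly_spend": (0.0, 10_000.0)}


def profile(rows, field, low, high):
    """Count missing and out-of-range values and summarise what is present."""
    values = [row[field] for row in rows]
    present = [v for v in values if v is not None]
    return {
        "missing": len(values) - len(present),
        "out_of_range": sum(1 for v in present if not low <= v <= high),
        "min": min(present),
        "median": median(present),
        "max": max(present),
    }


if __name__ == "__main__":
    for field, (low, high) in PLAUSIBLE.items():
        print(field, profile(records, field, low, high))
```

A check like this is a coded version of the ‘sniff test’: an age of 230 or a negative spend should be queried before any model is fitted.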
16
The skills needed are . . .
1 Communication
  - Influencing
  - Language
  - Brevity
2 Common sense
3 Time management
4 Ability to work with non-technical colleagues
5 Statistics
6 Modelling
  - Exploratory / graphical data analysis / presentation
  - Critical assessment of models / methods
  - Not just statistical assessment
  - ‘Commercial utility’ - simpler is often better
7 Coding
http://www.gordonblunt.co.uk/publications.html
18
References
Box GEP. Robustness in the strategy of scientific model building. In Launer and Wilkinson (Eds.), Robustness in Statistics, Academic Press, 1979.
Breiman L. Statistical Modeling: The Two Cultures. Statistical Science, Vol 16 No. 3: 199-231, 2001.
Cleveland WS. Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics. International Statistical Review, Vol 69: 21-26, 2001.
Tukey JW. The future of data analysis. Ann. Math. Stat., Vol 33 No. 1: 1-67, 1962.
Tukey JW. Exploratory Data Analysis. Addison-Wesley, 1977.
Unwin A. Graphical Data Analysis with R. CRC Press, 2015.