Data science - a commercial perspective

Gordon Blunt

Gordon Blunt Analytics Ltd

Royal Statistical Society annual conference, 9th September 2015

Outline

1 Background

2 Data science

3 Skills needed
- ‘Softer’ skills
- Statistical skills

4 Concluding thoughts

5 References

My background

Work - ‘client side’
- Fast moving consumer goods (FMCG)
- Royal Mail
- Barclaycard

Work - consultancy
- CACI Ltd
- GfK NOP Ltd
- Gordon Blunt Analytics Ltd (2008→)

FMCG, Financial services, Data consultancy, Market research

Nature of data science

My starting point
- Data science is statistics
- or
- Statistics is, and always has been, data science

Data are the most important part of statistics
- I’m not alone in this view . . .
- ‘Statistics starts with data’ [Breiman 2001]
- Bill Cleveland and John Tukey voiced similar thoughts [Cleveland 2001], [Tukey 1962]

But . . .

Some characteristics of data science

Massive data sets
- 10^n observations, where n > (or possibly ≫) 7
- 10^m variables, where m > (or possibly ≫) 3

Modern computing power
- Computers are very cheap today
- Cost per 1MB memory . . . ≈ 3 × 10^−10 of cost in 1965¹

Other disciplines are now analysing data too, for example . . .
- Machine learning
- Database management
- Knowledge discovery in databases

¹ http://jcmit.com/memoryprice.htm
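
To give a feel for the scale of the massive data sets described above, the arithmetic can be sketched in Python. This is only an illustration under assumptions that are not in the talk: the table is dense, every cell is an 8-byte floating-point value, and n = 7 and m = 3 are taken as the lower bounds. The function name `dense_table_bytes` is hypothetical.

```python
# Rough in-memory size of a dense table with 10^n rows and 10^m columns.
# Assumption (not from the talk): every cell is an 8-byte float.
BYTES_PER_VALUE = 8


def dense_table_bytes(n: int, m: int, bytes_per_value: int = BYTES_PER_VALUE) -> int:
    """Return the size in bytes of a 10^n by 10^m table of fixed-width values."""
    return (10 ** n) * (10 ** m) * bytes_per_value


size = dense_table_bytes(n=7, m=3)
print(f"{size:,} bytes = {size / 10**9:.0f} GB")  # 80,000,000,000 bytes = 80 GB
```

Each extra power of ten in n or m multiplies this figure by ten.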

‘Components of a successful data science team’²

Skilled professionals needed

1 Data Engineer
‘does not need to be very academic [. . . ] technical competency on the back-end frameworks and tools used for capturing the data points’

2 Machine Learning Expert
‘statistical background, having a deep interest in quantitative topics [. . . ] solid understanding of data algorithms and data structures in specific, and software engineering concepts’

3 Business Analyst
‘an eye for details and [. . . ] exceptional analytical skills [. . . ] solid understanding of the organization’s business model’

The emphases are mine, by the way

² http://www.kdnuggets.com/2015/08/3-components-successful-data-science-team.html (August 12 2015)

The commercial imperative

Companies want answers that are . . .
- Timely (often have short deadlines)
- Practical (can be used in the business)
- Useful (generates enough revenue)

Companies have . . .
- Mountains of data
- Little time
- Relatively few skilled analysts

Statistics must be taught as a practical subject, or it will be overtaken by other disciplines

Core skills - ‘softer’

Communication
- Influencing
- Appropriate language (often non-statistical!)
- Brevity

Commercial awareness

Time management

Ability to work . . .
- independently
- and / or as part of a ‘non-technical’ team

Problem solving

Creative thinking

And, please, common sense (e.g. the ‘sniff test’)!

Communication

Influencing
- We (probably) need to sell our analysis
- Understand the client’s motivations
  - what does the client want?
  - what does the client need to be told?
- Engage in debate at senior levels
  - can be challenging
  - might not have much time - be brief
- Always have something positive to say

Appropriate language
- Explain in ways the client can understand
- Be careful about statistical jargon, for example . . .
  - ‘error’ likely to be interpreted as ‘mistake’
  - ‘normal’ likely to be interpreted as ‘commonplace’
  - ‘significance’ - statistical or useful?

Core skills - technical

Statistics - knowledge assumed . . .
- ‘Core’ statistics
  - subjects found in undergraduate / masters courses
- Experience of (messy) commercial data
  - these are the reason we need strong EDA skills
- Limitations of traditional tests with large data sets
- Advanced mathematical and computational methods

Coding and / or programming
- Python
- Hadoop
- Weka
- . . . and / or many others . . .

(of course!)
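
The point about the limitations of traditional tests with large data sets can be illustrated with a short Python sketch. It is not taken from the talk: it assumes a two-sided, two-sample z-test with a known common standard deviation, and the effect size of 0.01 standard deviations is a made-up value chosen to be commercially negligible. The function name `two_sample_z_test` is hypothetical.

```python
import math


def two_sample_z_test(diff: float, sd: float, n: int) -> tuple[float, float]:
    """Two-sided z-test for a difference in means between two groups of size n.

    Assumes a known, common standard deviation `sd` in both groups.
    Returns (z statistic, two-sided p-value).
    """
    standard_error = sd * math.sqrt(2.0 / n)
    z = diff / standard_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value


# Hypothetical numbers: a difference of 0.01 standard deviations,
# which few businesses would regard as commercially meaningful.
DIFF, SD = 0.01, 1.0

for n in (100, 10_000, 10_000_000):
    z, p = two_sample_z_test(DIFF, SD, n)
    print(f"n per group = {n:>10,}  z = {z:6.2f}  p = {p:.3g}")
```

The difference stays the same size throughout, yet the p-value shrinks as n grows, so with 10^7 observations per group the test reports overwhelming ‘significance’ for an effect that may be of no practical use.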

Statistics, big data and the commercial sector

A good starting point
- ‘All models are wrong, but some are useful’ [Box 1979]
- Exploratory / graphical data analysis are crucial [Tukey 1977, Unwin 2015]

We need to teach . . .
- Simple is - often - better than ‘best’
  - by the time we’ve built the ‘best’ model, it’s usually out of date
- The basics are crucial
  - EDA
  - data quality / cleaning
  - visualisation
  - graphical presentation
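
As a rough illustration of the data quality / cleaning basics listed above, here is a minimal Python sketch of a first-pass column profile. Everything in it is an assumption for illustration: the records are invented, and the `profile` helper is hypothetical rather than any particular library's API.

```python
from collections import Counter

# Hypothetical customer records; None marks a missing value.
records = [
    {"customer_id": 1, "age": 34, "region": "North"},
    {"customer_id": 2, "age": None, "region": "north"},
    {"customer_id": 3, "age": 210, "region": "South"},
    {"customer_id": 3, "age": 45, "region": None},
]


def profile(rows: list[dict], column: str) -> dict:
    """Summarise one column: missing count, distinct values, most common value."""
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    counts = Counter(present)
    return {
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "most_common": counts.most_common(1)[0] if counts else None,
    }


for column in ("customer_id", "age", "region"):
    print(column, profile(records, column))
```

On this toy data the summary would flag a duplicated customer_id, a missing age and inconsistent capitalisation in region. The implausible age of 210 is not caught by these counts; it needs a range check or a plot, which is where graphical EDA comes in.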

The skills needed are . . .

1 Communication
- Influencing
- Language
- Brevity

2 Common sense

3 Time management

4 Ability to work with non-technical colleagues

5 Statistics

6 Modelling
- Exploratory / graphical data analysis / presentation
- Critical assessment of models / methods
- Not just statistical assessment
- ‘Commercial utility’ - simpler is often better

7 Coding

http://www.gordonblunt.co.uk/publications.html

References

Box GEP. Robustness in the strategy of scientific model building. In Launer and Wilkinson (Eds.), Robustness in Statistics, Academic Press, 1979.

Breiman L. Statistical Modeling: The Two Cultures. Statistical Science, Vol 16 No. 3: 199-231, 2001.

Cleveland WS. Data Science: An Action Plan for Expanding the Technical Areas of the Field of Statistics. International Statistical Review, Vol 69, 21-26, 2001.

Tukey JW. The future of data analysis. Ann. Math. Stat., Vol 33 No. 1: 1-67, 1962.

Tukey JW. Exploratory Data Analysis. Addison-Wesley, 1977.

Unwin A. Graphical Data Analysis with R. CRC Press, 2015.
