for business: what is a data scientist?
DESCRIPTION
Slides from the course on big data by Clement Levallois of EMLYON Business School, intended for business students. See the online video that accompanies these slides. The definition and profile of a data scientist are presented: hacker, math person and domain specialist.
TRANSCRIPT
MK99 – Big Data 1
Big data & cross-platform analytics
MOOC lectures
Pr. Clement Levallois
What is a data scientist? [or, a guide for business to spot good ones and recruit them!]
Source: http://drewconway.com/zia/2013/3/26/the-data-science-venn-diagram
+ Math and stats knowledge
• Maths and stats are excellent foundations
• But a data scientist has a different mindset
– Focuses on accuracy of prediction, not causality
– Even if this is not “elegant” in terms of formal models
– Ready to use any bit of information available in the data (text, networks, …)
– See the slide deck on “Machine Learning” for details
+ Hacking skills
• Ability to think “out of the box”
– As an econometrician, a computer scientist, a computational linguist and a network analyst!
– Concerned with scale and speed
– Not dependent on packaged software
• Aware of, and contributing to, developments in open source
• Following current developments in different academic fields
+ Substantive expertise
• Substantive expertise = grasp of the business logic
– Many big optimization gains come from good knowledge of the specifics of the domain
– These domains can be quite complex!
– Data scientists must be able to understand and translate these business specificities into their data models
A data scientist should be able to…
– Discover interesting angles in the dataset
• You see worthless metadata? I see gold!
– Choose from a wide range of techniques across the social and natural sciences
• Statistics, machine learning, network analysis, natural language processing, etc.
• From economics, physics, psychology, linguistics, computational science, genomics, neuroscience, etc.
– Implement these techniques, possibly on large datasets
• Can you implement them in your programming language of choice?
• Can you deal with large datasets (what if the data doesn’t fit in memory)?
• Can you be quick (and not ask for a couple of nights to run a script)?
• Can you be cheap (buying more hardware is not always a solution you can afford)?
How to hire and keep a data scientist in your business?
1. Find them where they hang out: Stack Overflow, GitHub, specialized communities on Twitter. Good profiles are PhD students near graduation and/or lead developers of open source projects.
2. Allow plenty of time for their personal development
– Contributing to open source projects, attending conferences, working on personal projects during working hours
3. Treat them not as mere executors, but as business co-developers
This slide presentation is part of a course offered by EMLYON Business School (www.em-lyon.com)
Contact Clement Levallois (levallois [at] em-lyon.com) for more information.