new professional careers in data
Post on 20-Feb-2017
Who am I?
• David Rostcheck
• I’m a consulting data scientist
• Follow my articles on LinkedIn
We will talk about 4 things:
• Big Data
• Data Science
• Data Engineering
• Business Intelligence
BIG DATA
What is big data?
Big data is data that is so big that it requires specialized techniques to handle, like clusters, cloud computing, or graph algorithms
Data may change rapidly, so big data may also be fast data
big data requires specialized tools to handle
MAP/REDUCE
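To make the map/reduce idea concrete, here is a minimal single-machine sketch of a word count. It only illustrates the three stages (map, shuffle, reduce); it is not how Hadoop or any real cluster framework is invoked, and the function names and sample documents are made up for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key (the word)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine each key's list of values into a single count."""
    return {key: reduce(lambda a, b: a + b, values)
            for key, values in groups.items()}


def word_count(documents):
    """Run map over every document, then shuffle and reduce the results."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return reduce_phase(shuffle(pairs))


# Hypothetical sample input, invented for this sketch.
documents = ["big data is big", "fast data is data"]
counts = word_count(documents)
print(counts)
```

On a real cluster the map calls would run in parallel on different machines, each over its own slice of the data, and the shuffle would move the grouped pairs across the network. The logic of each stage stays the same.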
big data tools are in demand, but keep your perspective
Big Data tools can be complex
It is often easier to solve problems at small scale, then scale up, if possible
remember: not all companies use big data, but all companies use data
DATA SCIENCE
What is data science?
Data science is industrial research on a company’s own data
What is its goal?
to produce advanced algorithms that deliver a competitive advantage
data scientists often work with unstructured data
… which can be large
“The qualifications for the job include the strength to tunnel through mountains of information and the vision to discern patterns where others see none”
- Bloomberg Businessweek
Is data science really science?
let’s compare…
Aspect | academic science | data science
Teams | PhDs, graduate students | PhDs, technologists
Setting | University | Company
Publication | Formal (academic publications, conferences) | Less formal (blogs, white papers, open source)
Funding | Public grants | Corporate
Goal | Advance human knowledge | Create competitive advantage
Data science is industrial science
It shares some attributes with academic science, but has other differences
What kind of work do data scientists do?
data scientists create artificially intelligent systems
these are often called “narrow AI”
examples
• Recommender systems
• Self-driving cars
• AI agents
• Smart energy management
• Medical diagnosis
• Machine vision
DATA ENGINEERING
What is data engineering?
data engineering is a specialized kind of software engineering with additional skills in handling and processing data
data science vs. data engineering
Aspect | data science | data engineering
Approach | Scientific (Exploration) | Engineering (Development)
Problems | Unbounded | Bounded
Path to Solution | Iterative, exploratory, nonlinear | Mostly linear
Education | More is better (PhDs common) | BS and/or self-trained
Presentation skills | Important | Not as important
Research experience | Important | Not as important
Programming skills | Not as important | Important
Data skills | Important | Important
What kind of special training does a data engineer need?
• Data storage and processing
– structured (SQL)
– unstructured (NoSQL)
– Big Data (Hadoop, Apache Spark/Storm/Flink, cloud)
• Data visualization
• Machine Learning algorithms and platforms (e.g. Dato)
• Predictive APIs (e.g. Watson)
Does a data engineer need more math than a regular software engineer?
It really helps.
Linear algebra & calculus are important to understand machine learning
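As a small illustration of where the math shows up, the sketch below fits a straight line to a few points using gradient descent. The calculus is in the two gradient lines (the partial derivatives of the mean squared error with respect to the slope and intercept); with more features, the same update is written with vectors and matrices, which is where linear algebra comes in. The data, learning rate, and step count are invented for this example.

```python
def fit_line(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by gradient descent on the mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Prediction error for every data point under the current w and b.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        # Partial derivatives of mean((w*x + b - y)^2) with respect to w and b.
        grad_w = 2.0 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2.0 / n * sum(errors)
        # Step downhill, against the gradient.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Hypothetical data generated from y = 2x + 1, so the fit should recover
# a slope close to 2 and an intercept close to 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
w, b = fit_line(xs, ys)
print(round(w, 3), round(b, 3))
```

Machine learning libraries hide these updates, but knowing what the gradient is and why the step size matters makes it much easier to tell when a model is failing to train.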
BUSINESS INTELLIGENCE
Wait – aren’t data science and business intelligence really the same thing?
Maybe. Let’s compare…
Aspect | business intelligence (BI) | data science
Data analysis | Yes | Yes
Statistics | Yes | Yes
Visualization | Yes | Yes
Data Sources | Usually SQL, often Data Warehouse | Less structured (logs, cloud data, SQL, NoSQL, text)
Tools | Statistics, Visualization | Statistics, Machine Learning, Graph Analysis, NLP
Focus | Present and past | Future
Approach | Analytic | Scientific
Goal | Better strategic decisions | Advanced functionality
The two fields are closely related.
In some ways data science is an evolution of business intelligence.
which industries have the most data-focused jobs?
right now:
• Technology
• Education
• Finance
• Consulting
• Health Care
(Technology employs over 50% of data workers)
but...
“Technology” companies like Uber, Amazon, Airbnb compete in other industries (transportation, retail, hotels)
“Software is eating the world”
– Andreessen Horowitz
which industries will AI change?
Ultimately, all of them.
Incorporating AI is a large business opportunity
data jobs are in demand
• “The hot job of the decade… Data scientists today are akin to Wall Street “quants” of the 1980s and 1990s”
- Harvard Business Review
• “18.7% projected growth 2010-2020”
- VentureBeat
• “McKinsey projects […] ‘50 percent to 60 percent gap between supply and requisite demand’”
- Bloomberg Businessweek
On the other hand…
Some people believe data jobs themselves will be automated:
“New Teradata Platform Reduces Demand For Data Scientists”
- Forbes
“Automating the Data Scientist”
- MIT Technology Review
What do we think?
• Yes, advanced tools will automate some data exploration
• But: research and communication are fundamental skills and are always in demand when the world is changing
• Data will continue to explode (Internet of Things)
• We will see more change and faster change
education for data jobs
options include:
academic programs, boot camps, and online classes (Coursera, Udacity)
for data engineering:
– documentation and webinars (self-education)
– focus on data manipulation tools and machine learning
for data science:
– The more academic science and research expertise, the better
– Focus on projects that solve unknown problems
– Work with more experienced data scientists
Questions?
Contact: drostcheck@leopardllc.com, twitter: @davidrostcheck
Articles: http://linkedin.com/in/davidrostcheck