DATA SCIENCE vs. DATA SCIENTIST
A READINESS ASSESSMENT
CREATE TALENT & TRANSFORM MALAYSIA TO DIGITAL ECONOMY
Contents
Objective
Data science - defined
Data science vs. enterprise data science
Data scientist competency
Data scientist competency development approach
Appendix
THE institute of enterprise analytics (TIOEA)
Our enterprise data science learning
Data Science vs. Data Scientist
Strategy Implementation
Executive
Developers
Objective
Review the key functions of data science and how data science differs from traditional business intelligence (BI)
Understand the key competency areas, skills, roles, responsibilities, and deliverables of a data scientist
What is data science?
Data science is not new. Data science is just modernizing existing reporting solutions, analytics solutions, data warehousing solutions, business intelligence solutions, and even data management solutions.
So data science is new thinking, new thoughts, new ideas, new data sources, new data formats and structures, new data architecture, new data processing mechanisms, new innovation on data, and new ways of solving problems. That’s all.
Traditional Approach to Data & Analytics
Data Source & Format
ERP, CRM, Oracle, SAP, MS SQL, etc.
Tables
Files
Data Structure
Structured Data
ER Model (Entity Relationship)
MDM Model (Multidimensional)
Data Access
SQL
Filters & Aggregate Functions
Business Rules and Formulas, etc.
Analytics
Reports
Dashboards
Data Analysis
Analytics Transformation
Data Source & Format
ERP, CRM, Oracle, SAP, MS SQL, etc.
Social (Web, LinkedIn, Twitter, FB)
Streaming
Data Structure
Structured Data
Unstructured & Semi Structured
Machine Data
Data Access
Parallel processing
Distributed Computing
In-memory Analytics
Analytics
Predictive Analytics (Linear Regression)
Data Mining
Clustering, Segmentation, etc.
Modern Approach to Data & Analytics = Data Science
New Data Source & Format
New Data Architecture
New Analytics Architecture
New Analytics Techniques
Computer Science
Social Science
Life Science
Medical Science
Material Science …
Data Science
Measurable Hidden Values
Computer Science
Social Science Data
Life Science Data
Medical Science Data
Material Science Data
…
Social Data
Medical Data
Pharmacy Data
Model
Algorithm
Like computer science, social science, life science, and other sciences, data science is also a science: it extracts hidden value from any data by applying scientific, statistical, mathematical, and computing techniques to it.
As you can see, data science draws on all of these sciences together, since data is everywhere.
What is data science? Continued …
Structured
Unstructured & Semi-Structured
Machine
Financial & Billing
Customer Behavior
Cell Phone Call Record
Apply scientific, statistical, and mathematical techniques
Linear Regression
Time Series & Neural Network
Clustering … much more
Predictive Analytics
Advanced Analytics
Data Discovery … much more
Big Data
Data science offers a powerful new approach to making data discoveries by combining aspects of statistics, computer science, applied mathematics, and visualization.
Data science can turn the vast amounts of data the digital age generates into new insights and new knowledge.
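As a small illustration of the statistical side of this, here is a minimal sketch of linear regression (one of the techniques named on this slide) in plain Python. The function name fit_line and the spend and revenue figures are hypothetical and made up for illustration; they are not taken from any project described here.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Hypothetical data: marketing spend vs. revenue (illustrative numbers only).
spend = [1.0, 2.0, 3.0, 4.0, 5.0]
revenue = [3.1, 4.9, 7.2, 8.8, 11.0]

slope, intercept = fit_line(spend, revenue)

# Use the fitted line to predict revenue at a spend level not yet observed.
forecast = slope * 6.0 + intercept
print(f"slope={slope:.2f}, intercept={intercept:.2f}, forecast={forecast:.2f}")
```

On a real project the same idea would normally be run with a statistics package in R or Python rather than hand-written code; the sketch only shows what "predictive analytics" means at its simplest.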
Typical Data Science Project Scope
Based on the projects that I have been involved in, the scope and focus of the data scientist role differ from project to project, so it is critical to understand the different focus areas and deliverables of a data science project.
Data Science Project Scope
Research & Development (Data Scientist)
Enterprise & Industry (Enterprise Data Scientist)
Data Scientist Deliverables: Research & Development
Developing new models
Developing new algorithms
New analytics techniques and innovation
New data product or platform development
New analytics product or platform development
etc.
Data Scientist Deliverables: Enterprise & Industry
Solving business problems
Target marketing and reduced marketing spend
Consistent customer experience across all channels; personalized customer experience, etc.
Modernizing existing business intelligence solutions and data solutions
Enterprise Data Science Framework
On an enterprise data science project, an enterprise data scientist is expected to know the industry and its associated business processes very well in order to lead, guide, and deliver the project. The following are the core enterprise data science building blocks.
Industry
Oil & Gas
Media
Telecommunication
Power & Utility
Retail
etc.
Business Process
Finance
Customer
Marketing
Human Resource
Supply Chain
Unstructured Data
Semi-Structured Data
Structured Data
Machine Data
Analytics / Data Science Techniques
Linear Regression
Time Series
Clustering
Neural Network
Association
etc.
Measurable Business Values
Reduced 2% Cost
Increased 5% Revenue
etc.
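To make one of these building blocks concrete, here is a minimal sketch of clustering, one of the techniques listed in the framework, applied to a customer segmentation question. It is a plain k-means written for illustration only: the kmeans function, the customers list (monthly spend, store visits), and the choice of starting centroids are all assumptions, not data or code from an actual project.

```python
def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            # Squared Euclidean distance from the point to every centroid.
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # New centroid = mean of the cluster; keep the old one if the cluster is empty.
        centroids = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster)) if cluster else c
            for cluster, c in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical customers: (monthly spend, store visits per month).
customers = [(10, 1), (12, 2), (11, 1), (50, 8), (52, 9), (49, 7)]

# Start from one low-spend and one high-spend customer as initial centroids.
centroids, clusters = kmeans(customers, [customers[0], customers[3]])

for centroid, members in zip(centroids, clusters):
    print(f"segment around {centroid}: {members}")
```

The output of a step like this only becomes a measurable business value once someone who knows the industry and the business process decides what to do with each segment, which is the point of the framework above.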
Data Scientist Key Competency Area
There are many different skills required to become a data scientist, but these are our key observations on the skills required to deliver a data science project.
Please note that we did not list specific skills under each area. For example, under Data we would have data management, data governance, data quality, data modeling, data architecture, data integration, data mapping, etc.
Data Science Project Scope: Research & Development
Less focus on industry skills
Less focus on business process skills
Deeper focus on data skills
Deeper focus on analytical skills
Less focus on communication and people skills
Deeper technology skills
Technology Thought Leader
PhDs
Academia
Data Science Project Scope: Enterprise & Industry
Deep focus on industry skills
Deep focus on business process skills
Data skills
Analytical skills
Strong communication and people skills
Technical skills
Enterprise Analytics Transformation Leader
Industry Expert
Skill depth: Basic Skill (Entry Level) to Deeper Skill (Senior Level)
Data Scientist Competency Development Approach
Who may be the best fit for a data scientist role?
I found that upskilling industry professionals who have prior experience in BI, data, and data warehousing is a faster, more stable, and more sustainable approach to delivering and supporting an enterprise data science project.
Upskill on industry and business process
Upskill on advanced analytics and data science techniques
Key Takeaways
Enterprise Analytics Transformation Initiative or Enterprise Data Science Project
Based on our industry experience, some of the key characteristics of a data scientist on an enterprise analytics transformation initiative are as follows.
Key roles, responsibilities, and deliverables of a data scientist on an enterprise data science project:
Data Scientist Key Roles & Responsibilities
Visionary
Domain Expert
Innovator
Transformation Leader
Change Agent
Data Expert
Analytical Thinker
Technology Thought Leader
Data Scientist Key Deliverables
Business Case
Strategy and Roadmap
Standards, Policies, and Guidelines
Data Management Framework
Modern Enterprise Data Architecture – Big Data Lake
Modern Enterprise Analytics Architecture – Enterprise Data Science
Plan of Action – Tactical Level
Execution Plan
User Adoption
Tools, Templates, and Accelerators
Key Takeaways
Enterprise Analytics Transformation Initiative or Enterprise Data Science Project
Other roles and responsibilities that may be involved in an enterprise analytics transformation initiative or enterprise data science project are listed here.
Please note that these roles are not mandatory. They may or may not exist and are subject to change, depending on project scope and objectives.
Other Roles & Responsibilities
Data Engineer
Data Architect
Data Modeler
ETL Developer
Information Modeler
Information Security Expert
Data Analyst
Data Visualization Engineer
etc.
Other Deliverables
Data Provisioning Functional and Technical Components
Data Modeling
Information Modeling
Data Visualization Components
etc.
THE institute of enterprise analytics (TIOEA)
Industry Use Case
Research and Innovation
Consulting and Implementation
Training & Talent Development
Thought Leadership
Tools and Templates
Create talent and jobs
Simplify data science learning and empower learners with industry use cases and pre-packaged business content
Be a thought leader and governance model for enterprise data science implementation
Accelerate enterprise data science implementation with a proven innovation lab, tools, templates, standards, policies, and guidelines
Freshers
Experienced
Executives
Role-based Learning (Draft)
We provide practical coaching and on-the-job learning experience.
Enterprise Data Science for Executives
Enterprise Big Data for Executives
Enterprise HADOOP for Executives
Enterprise Data Science for Architects
Enterprise Big Data for Architects
Enterprise HADOOP for Architects
Enterprise Data Science for Developers
Enterprise Big Data for Developers
Enterprise HADOOP for Developers
Learn to build
Learn to deliver
Learn to lead
CAP (Certified Analytics Professional)
Learning Roadmap (Draft)
Functional Learning
Industry Use Case: Problem Statement; Business Needs & Challenges; Business Impacts and Benefits
Strategy & Roadmap: Implementation Methodology; Implementation Options & Plan; Deliverables and Milestones
Data: Data Governance; Data Management; Data Sources & Data Format; Data Modeling; Data Integration; Design and Leading Practices
Analytics: Data Science Overview; Data Science vs. Enterprise Data Science; Predictive Analytics & Advanced Analytics; Traditional “BI” vs. Data Science; Analytics Techniques; Design and Leading Practices
User Experience: Data Visualization; Self Servicing and Data Analysis; Reporting and; Insights and Improved Decision Making; Deployment - Desktop, Mobile and Cloud; Design and Leading Practices
Our Enterprise Data Science Lab (HADOOP + SAP HANA + Oracle 12C + Analytics Tools + Open Source Technologies)
Learning Roadmap (Draft)
Technical Learning
Industry Use Case
“R” Programming
Python Programming
Machine Learning
Enterprise HADOOP
Problem Statement
Business Needs & Challenges
Business Impacts and Benefits
Implementation Methodology
Implementation Options & Plan
Deliverables and Milestones
HADOOP Overview
HADOOP Architecture
HADOOP Core Components
Data Management on HADOOP
Analytics & Application on HADOOP
HADOOP Ecosystem and Total Cost of Ownership
Enterprise Data Science Overview
Data Science vs. Enterprise Data Science
Predictive Analytics & Advanced Analytics
Analytics on “R” Overview
Analytics on “Python” Overview
Analytics on “Natural Language Processing”
Data Visualization
Traditional “BI” vs. Data Science
Self Servicing and Data Analysis
Insights and Improved Decision Making
Deployment on Desktop, Mobile and Cloud
Change Management and Training
Big Data Enabling Technologies
Cloud Technologies Overview
SAP Analytics Tools Overview
Oracle Analytics Tools Overview
Open Source Technologies Overview
Data Management Technologies Overview
Data Management Implementation Overview
Our Enterprise Data Science Lab (HADOOP + SAP HANA + Oracle 12C + Analytics Tools + Open Source Technologies)