The path to be a Data Scientist

Poo Kuan Hoong, Ph.D
Senior Manager Data Science, Nielsen Malaysia

Posted on 13-Apr-2017

Page 1: The path to be a data scientist

Disclaimer: The views and opinions expressed in these slides are those of the author and do not necessarily reflect the official policy or position of Nielsen Malaysia. Examples of analysis performed within these slides are only examples. They should not be utilized in real-world analytic products as they are based only on very limited and dated open source information. Assumptions made within the analysis are not reflective of the position of Nielsen Malaysia.

Agenda

• What is a data scientist?

• What kinds of companies employ data scientists?

• What are the key functions of a data scientist?

• What type of work does a data scientist do?

• General Aptitude to be a data scientist

• What skillsets are needed to be a data scientist?

• What is data science?

• Where do I begin?

• MDEC National Big App Challenge 3.0 Knowledge Sharing

Self Introduction

Poo Kuan Hoong, http://www.linkedin.com/in/kuanhoong

• Senior Manager Data Science

• Senior Lecturer

• Chairperson Data Science Institute

• Coursera Facilitator

• Consultant

• Funding mentor

• Founder

• Speaker/Trainer

https://www.facebook.com/rusergroupmalaysia/

What is a Data Scientist?

Data Scientist

The term "data scientist" has been around for years, and the various advanced analytics specialties that fall under it are even older.

However, with the recent explosion of data, the term has come to describe a convergence of disciplines, and that has driven its soaring popularity.

What are the job titles?

• Data Scientist

• Data Engineer

• Big Data Engineer

• Machine Learning Scientist

• Business Analytics Specialist

• Data Visualization Developer

• BI Solutions Architect / BI Specialist

• Operations Research Analyst

• Analytics Manager

• Machine Learning Engineer

• Statistician

• Business Intelligence (BI) Engineer

Why the Global Need?

Abundance of Data

Availability of affordable compute resources

Internet of Things (IoT) sensor data

950 Data Analyst (India)

8,411 Data Scientist (US)

808 Data Analyst (UK)

1,188 Data Manager (US)

81 Data Analyst (Australia)

80 in April 2015, 1,500 by 2020

The Star, Friday, 24 April 2015: “Malaysia needs 1,500 data scientists by 2020”

What kinds of companies employ data scientists?

MNC

Government

Banks

What are the key functions of a data scientist?

Key functions of a data scientist

• Devising business strategies from the insights

• Descriptive and predictive analytics

• Data mining and analysis design

• Understanding the business problem

Scenario 1: Customer Churn Analytics

Churn analytics

• Predicting who will switch mobile operator

Customer churn - why do customers change operators?

• The top 3 reasons why subscribers change providers:

• They want a new handset

• They believe they pay too much for calls/data

• Providers do not offer additional loyalty benefits

Machine Learning Framework

• Steps: Data Collection, Data Preprocessing, Attributes selection (Attribute 1, Attribute 2, Attribute 3), Algorithm, Training Model, Score Model, Apply Data / Test Data, Predicting Output

• Stages: Initialization Step, Learn Step, Apply Step
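
The stages named in this framework (preprocessing and attribute selection, training and scoring a model, then applying it to new data) can be sketched roughly as follows. This is only a toy illustration, not the pipeline behind the slides: the data, the attribute names (`monthly_bill`, `complaints`), the function names and the nearest-centroid classifier are all assumptions made for the example.

```python
# Toy illustration of the Initialization / Learn / Apply stages.
# All data, attribute names and function names are hypothetical.

def preprocess(rows):
    """Initialization step: drop rows with missing values, keep the selected attributes."""
    return [
        ((r["monthly_bill"], r["complaints"]), r["churned"])
        for r in rows
        if r["monthly_bill"] is not None and r["complaints"] is not None
    ]

def train(samples):
    """Learn step: compute one centroid (mean attribute vector) per class."""
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: tuple(v / counts[label] for v in acc) for label, acc in sums.items()}

def predict(model, features):
    """Apply step: pick the class whose centroid is nearest (squared Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: dist(model[label]))

def score(model, samples):
    """Score the model: fraction of samples predicted correctly."""
    hits = sum(predict(model, f) == y for f, y in samples)
    return hits / len(samples)

raw = [
    {"monthly_bill": 120, "complaints": 4, "churned": True},
    {"monthly_bill": 110, "complaints": 3, "churned": True},
    {"monthly_bill": 40, "complaints": 0, "churned": False},
    {"monthly_bill": 55, "complaints": 1, "churned": False},
    {"monthly_bill": None, "complaints": 2, "churned": True},  # dropped in preprocessing
]

samples = preprocess(raw)
model = train(samples)
# Scored on the training data only to keep the sketch short;
# a real project would score on held-out test data.
accuracy = score(model, samples)
new_customer = predict(model, (100, 2))
```

In practice the learn step would usually rely on an established library rather than a hand-written classifier; the point here is only the order of the stages.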

Correlation Matrix

Feature selection

Models comparison

• Receiver operating characteristic curve (ROC curve) illustrates the performance of a binary classifier system as its discrimination threshold is varied.
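
As a rough sketch of how such a curve is built, the snippet below sweeps the threshold over a handful of made-up classifier scores and records the false positive rate and true positive rate at each step. The labels, scores and function names are illustrative assumptions, not results from the slides.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold."""
    positives = sum(y_true)
    negatives = len(y_true) - positives
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / negatives, tp / positives))
    return points

def auc(points):
    """Area under the curve using the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Hypothetical labels (1 = churned) and predicted churn probabilities.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
area = auc(points)
```

When comparing models this way, a curve closer to the top-left corner (and a larger area under it) indicates better separation of the two classes.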

Scenario 2: Market Basket Analysis

Market Basket Analysis

Where should detergents be placed in the store to maximize sales?

Are bleach products purchased when detergents and orange juice are bought together?

Is cola typically purchased with bananas? Does the brand of cola make a difference?

How are the demographics of the neighbourhood affecting what customers are buying?
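
Questions like these are commonly approached with association-rule measures such as support, confidence and lift. The sketch below computes them for one rule over a few invented transactions; the baskets and function names are assumptions for illustration only.

```python
# Hypothetical shopping baskets, one set of items per transaction.
transactions = [
    {"detergent", "orange juice", "bleach"},
    {"detergent", "orange juice"},
    {"detergent", "bleach"},
    {"cola", "bananas"},
    {"orange juice", "cola"},
]

def support(itemset):
    """Fraction of transactions that contain every item in the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(antecedent, consequent):
    """Of the baskets containing the antecedent, the share that also contain the consequent."""
    return support(antecedent | consequent) / support(antecedent)

def lift(antecedent, consequent):
    """Confidence relative to how often the consequent is bought anyway (above 1 suggests association)."""
    return confidence(antecedent, consequent) / support(consequent)

# Rule: {detergent, orange juice} -> {bleach}
rule_confidence = confidence({"detergent", "orange juice"}, {"bleach"})
rule_lift = lift({"detergent", "orange juice"}, {"bleach"})
```

With real data the number of candidate rules grows quickly, so an algorithm such as Apriori is typically used to search only the frequent itemsets.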

What type of work does a data scientist do?

http://www.forbes.com/sites/gilpress/2016/03/23/data-preparation-most-time-consuming-least-enjoyable-data-science-task-survey-says/#f37c7f758459

General aptitude to be a Data Scientist

Data Scientist

• Common sense

• Curious mind

• Clear and simplified thought

• Love to solve puzzles

• Good listening, writing and communication skills

• Maths & Stats

• Business sense

I have 4 red, 18 black and 8 brown socks in my sock drawer. If it is completely dark and I cannot see the colour of the socks that I am picking, how many socks do I need to take from the drawer to be sure that I have at least one pair of socks that are the same colour?
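
One way to reason about this puzzle is the pigeonhole principle: in the worst case the first socks drawn are all different colours, so one more draw than the number of colours forces a match. The function below is a small sketch of that argument; its name and the dictionary layout are assumptions for illustration.

```python
def draws_to_guarantee_pair(counts):
    """Worst case: one sock of every colour comes out first, so one more draw forces a match."""
    colours_present = [n for n in counts.values() if n > 0]
    if not any(n >= 2 for n in colours_present):
        raise ValueError("no colour has two socks, so a matching pair is impossible")
    return len(colours_present) + 1

drawer = {"red": 4, "black": 18, "brown": 8}
answer = draws_to_guarantee_pair(drawer)
print(answer)  # prints 4
```

The quantities of each colour do not matter as long as at least one colour has two socks; only the number of colours drives the answer.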

What is the hidden number under the car?

What skillsets are needed to be a data scientist?

Data scientist skillsets

• Data Mining

• Machine Learning

• R/Python

• Data Analysis

• Statistics

• SQL

• Java

• Algorithms

Image Source: http://imgur.com/hoyFT4t

What is the average salary?

Average salary: Data Scientist

What is data science?

Data Science

• Data science is an evolutionary step in interdisciplinary fields like business analysis that incorporates computer science, modeling, statistics, analytics, and mathematics.

• At its core, data science involves using automated methods to analyze massive amounts of data and to extract knowledge from them.

• Drawing insight from a piece of data involves understanding how it fits into the larger picture of an organization.

Where do I begin?

Massive Open Online Course (MOOC)

• MSC Malaysia MyProCert (SRI) – Data Science Massive Open Online Courses (MOOC)

• The Center of Applied Data Science (MDEC & HRDF)

• Johns Hopkins University – Data Science Specialization

• University of Washington - Data Science at Scale Specialization

• Data Analyst Nanodegree - Udacity

• CSCI E-109 Data Science (Harvard Extension School)

• Machine Learning - Stanford University

BDA Undergraduate & Postgraduate Programme

Undergraduate

• Multimedia University – Bachelor of Computer Science (Data Science Specialization)

• Sunway University - BSc (Hons) Information Systems (Business Analytics)

• Universiti Teknologi Malaysia (UTM), International Islamic University Malaysia, Monash University, Universiti Teknologi MARA (UiTM) and Universiti Teknologi PETRONAS (UTP).

Postgraduate

• Big Data Analytics Post Graduate Programme

Kaggle

• Data sets and real problems, in unprocessed form.

• Going through past competitions is recommended.

• Read through the forums for particular competitions to find useful discussions and tips/hints for solving future problems.

• https://www.kaggle.com/

UC Irvine Machine Learning Repository

• 360 data sets maintained as a service to the machine learning community

• http://archive.ics.uci.edu/ml/

Open data

• Open data from various countries

• Malaysia - http://www.data.gov.my/

• Singapore - https://data.gov.sg/

MDEC National Big App Challenge 3.0

• June 4th – June 5th 2016, Berjaya Times Square

• The themes for AHKL2016 were as follows:

1. Big Data Analytics --- Powered by MDEC. Access to 65 million rows of real datasets sponsored by iProperty.com Malaysia

2. O2O Commerce --- Powered by MOLWallet MOLPay

3. Smart Living --- Powered by TIME Internet

National MDEC Big App Challenge 3.0

PropertySenze

• B2B business model

• Provide machine learning and AI services to customers

• Visual Search

• Personalized customer experience

BUSINESS MODEL

Big Data becomes Smart Data

1. PropertySenze contracts with property sites and property developers to generate analytics and visual search

2. PropertySenze’s machine learning algorithm enables users to search for and buy properties similar to those they see on the sites, from user-generated photos and from user-uploaded images

3. Enhanced search experience and personalized results for users

4. Improved platform that recognizes properties for retrieval purposes or instant purchases

5. Analytics at the fingertips for both buyers and sellers

6. Improved user experience that leads to more engagement and sale transactions

7. PropertySenze verifies all transactions and charges commission fees every month

PropertySenze

Hackathon: Tips

• Have a well-balanced team: no more than one server-side developer with relevant experience, one good designer and one amazing storyteller

• Understand the expected outcomes of the hackathon

• Develop something whose benefits everyone can see

• Have an impressive aim or objective

• Start promoting your product during the hackathon

• Hit the demo 100%. The pitch is for the product to shine

Thanks!

Questions?

@kuanhoong

https://www.linkedin.com/in/kuanhoong
