data science for business - leonid zhukovschool of data analysis and artificial intelligence...

21
DATA SCIENCE FOR BUSINESS Lecture 1. Introduction to Data Science School of Data Analysis and Artificial Intelligence Department of Computer Science Moscow, April 10 th , 2020. [email protected]

Upload: others

Post on 16-Oct-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

DATA SCIENCE FOR BUSINESS

Lecture 1. Introduction to Data Science

School of Data Analysis and Artificial Intelligence Department of Computer Science

Moscow, April 10th, 2020.

[email protected]

Page 2: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

COURSE TECHNICALITIES

Lectures: Friday 18.10 - 19.30 , 10 lectures ZOOM: https://zoom.us/j/7723819319Seminars: Friday 19.40 - 21.00, 10 seminars ZOOM: https://zoom.us/j/636910206

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

Class Website: http://www.leonidzhukov.net/hse/2020/datascienceSeminar Wiki: http://wiki.cs.hse.ru/Data_Science_for_Business_2020

Modeling software: RapidMiner https://rapidminer.com

Telegram Group: https://t.me/joinchat/ENzQEhr-hra2WhEjxvgayw

Page 3: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

TEACHING TEAM

Prof. Leonid [email protected]

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

Ilya [email protected]

Anvar [email protected]

Page 4: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

THE CLASS 2020

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

80-85 votes

Page 5: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

TEXTBOOKS FOR THE COURSE

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

Page 6: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

RAPIDMINER MODELING SOFTWARE

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

https://rapidminer.com

Page 7: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

COURSE SCHEDULE

1. Introduction to data science.2. Exploratory data analysis3. Predictive analytics and machine learning4. Case study 1: Retail pricing5. Case study 2: Churn modeling6. Case study 3: Customer segmentation7. Case study 4: Personalization8. Case study 5: Fraud detection9. Case study 6. Demand forecasting10. Impacting the business

Lecture topics

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

1. Data flow modeling and RapidMiner2. Working with data, ETL process, data exploration3. ML modeling pipeline4. Regression5. Classification6. Clustering7. Recommender systems8. Anomaly detection9. Time series forecasting10. Problem solving

Seminars - exercises

Exam

Page 8: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

DATA SCIENCE

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

Page 9: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

DATA SCIENTISTS

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

Page 10: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

DATA SCIENCE BUSINESS PROCESS

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

Analytical formulation Proof of concept - POC

Changes in business process

Algorithmic solution Minimal viable product - MVPBusiness problem

Page 11: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

ANALYTICAL METHODS IN DATA SCIENCE

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

Predictive modelingMachine learningData mining

Operations researchOptimization

Agent based modelingSimulations

Geo analyticsText analysis (NLP)Computer vision

Page 12: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

THREE MAIN REASONS TO USE ML IN BUSINESS

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

ExplainPredictDetect

Optimise and improve

Page 13: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

SIMPLE EXAMPLE

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

Statistical Analysis"Measure and understand”

Predictive modeling "ML predict outcomes”

Optimization"Make optimal decisions"

Data explorationDescriptive statistics

Finding patternsPredicting outcomes

Finding optimal valuesFinding optimal parameters

Page 14: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

BUSINESS USE CASES

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

Demand forecast

Marketing personalization

Pricing and promo effectiveness

Assortment optimization

Cross sell and upsell

Production optimization

Predictive maintenance

Logistics optimization

Project risk management

Robotics and automation

Manufacturing process

optimization

Predictive maintenance

Demand and supply forecast

Operations planning

Energy efficiency

Next best offer

Churn and retention modeling

Network optimization

Infrastructure capacity and utilization

Credit risk assessment

Fraud detection

Claim management

Churn and retention modeling

Next best offer

Back office automation RPA

Performance management

Workforce planning

Scenario simulations

Consumer Goods EnergyIndustrial GoodsBanking/FI EnterpriseTelecoms

Page 15: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

THREE TYPES OF MACHINE LEARNING

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

Aim to discover structure:no target variable known

Optimise actions in a way that maximises cumulative reward

Unsupervised Learning

Reinforcement Learning

Aim to predict or model a known target

SupervisedLearning

Page 16: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

Supervised Learning

Algorithms predict class of a new data point from a training

set of previously correctly identified

observations

Unsupervised Learning

Algorithms predict results without prior knowledge of the

response

Algorithm examplesTypes of algorithms

Regression

Classification

Ranking algorithms

Given examples of classes, the model assigns new input data to classes• Decision trees, k-nearest neighbors (kNN), Logistic regression

• Random Forest, Support Vector Machines (SVM),Gradient Boosted Decision trees (GBT)

• Neural networks + Deep Learning

Given several classes the model assigns input data to classes• Linear regression, Elastic nets

• Regression trees

Given ordered pairs examples the model ranks new data

Clustering

Dimensionality reduction

Density estimation

Anomaly detection

Estimates the probability distribution of values

Divide the input data into groups with similar data points assigned to the same group

• k-means, spectral

PCA

Detecting outliers in data

Mapping input data in lower dimensional space

Page 17: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

MODELING PROCESS PIPELINE

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

Data understanding Data preparationModel

development and training

Model evaluation Model deployment

Source: BCG

Understanding the data at hand, checking the quality

ETL activities to get data into right format, cleaning, aggregation

Developing and training the model on given dataset

Validating model performance including business metrics on the holdout set

Running the pilot

Page 18: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

INTEGRATION OF ANALYTICS PROJECT INTO BUSINESS

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

Develop AI solution

Embed in business process

Integrate in IT Ecosystem Deploy at Scale

Data science, machine learning algorithms and optimization

Adjustment of ways of working and decision-making processes

Roll-out of solutions in additional geographies and/or business units

Technology infrastructure enablement & data

engineering

Collect dataHistorical data collection.

Integration with data sources

Page 19: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

ANALYTICS PROJECTS SUCCESS FACTORS

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

Algorithms & Data• Data analysis • Algorithm development

Technology/IT• Algorithm industrialization• Digital platforms development

Business transformation• Business process redesign• Enablement • Change management

Page 20: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,

NEXT STEPS

School of Data Analysis and Artificial Intelligence Department of Computer Science

.

Page 21: DATA SCIENCE FOR BUSINESS - Leonid ZhukovSchool of Data Analysis and Artificial Intelligence Department of Computer Science. 1.Data flow modeling and RapidMiner 2.Working with data,