introduction of data science
Data Science Introduction Jason Geng [email protected] [email protected]

Upload: jason-geng

Post on 22-Mar-2017

TRANSCRIPT

Page 1: Introduction of Data Science

Data Science Introduction

Jason Geng

[email protected]@datascienceassociations.org

Page 2: Introduction of Data Science

Agenda

What is big data

What is data science

Data science applications

System infrastructure

Case study – recommendation system

Page 3: Introduction of Data Science
Page 4: Introduction of Data Science

Data Scientist

Analytics

Artificial Intelligence

Statistics

Natural Language Processing

Feature Engineering

Scientific Method

Simulation

Data & Text Mining

Machine Learning

Predictive Modeling

Graph Analytics

Data Management

Data Warehousing

Mashups

Databases

Business Intelligence

Big Data

Information Retrieval

Art & Design

Business Mindset

Computer Science

Visualization

Communication

Data Product Design

Domain Knowledge

Ethics

Privacy & Security

Programming

Cloud Computing

Distributed Systems

Technology & Infrastructure

Growth Hacking

Social network

Public Relations

Online Tools

Resource

Page 5: Introduction of Data Science

Data Science Applications

Recommendation System

Self-driving

Text Cognition

Spam Filtering

Page 6: Introduction of Data Science

Figure source: https://en.wikipedia.org/wiki/Data_science#/media/File:Data_visualization_process_v1.png

Page 7: Introduction of Data Science

Machine Learning Algorithm

Supervised learning

Regression

Classification

Neural network, deep learning

Unsupervised learning

Clustering

Page 8: Introduction of Data Science

Recommendation System

Recommendation systems are a subclass of information filtering systems that seek to predict the “rating” or “preference” that a user would give to an item. (Source: Wikipedia)

Page 9: Introduction of Data Science

Case Study

Page 10: Introduction of Data Science

Algorithms

Collaborative filtering

Content-based recommendation

Learning to rank

Context-aware recommendation

Social network recommendation

Page 11: Introduction of Data Science

Collaborative Filtering

Basic Assumption
• Users with similar interests have common preferences
• A sufficiently large number of user preferences is available

Main Approaches
• User-based
• Item-based

Page 12: Introduction of Data Science

User-based Filtering

Use user-item rating matrix

Make user-to-user correlations

Find highly correlated users

Recommend items to
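
The user-based steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the presenter's implementation: the toy `ratings` dictionary, the function names `pearson` and `recommend_user_based`, and the choice of Pearson correlation with a single nearest neighbour (`k=1`) are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy data (not from the slides): user -> {item: rating}.
ratings = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob":   {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}

def pearson(u, v):
    """Pearson correlation over items both users rated (0.0 if undefined)."""
    common = sorted(set(ratings[u]) & set(ratings[v]))
    if len(common) < 2:
        return 0.0
    ru = [ratings[u][i] for i in common]
    rv = [ratings[v][i] for i in common]
    mu, mv = sum(ru) / len(ru), sum(rv) / len(rv)
    num = sum((a - mu) * (b - mv) for a, b in zip(ru, rv))
    den = sqrt(sum((a - mu) ** 2 for a in ru)) * sqrt(sum((b - mv) ** 2 for b in rv))
    return num / den if den else 0.0

def recommend_user_based(user, k=1):
    """Rank items the user has not rated, using the k most correlated users."""
    neighbours = sorted(
        (v for v in ratings if v != user),
        key=lambda v: pearson(user, v),
        reverse=True,
    )[:k]
    scores = {}
    for v in neighbours:
        weight = pearson(user, v)
        if weight <= 0:
            continue  # skip uncorrelated or negatively correlated users
        for item, rating in ratings[v].items():
            if item not in ratings[user]:
                scores[item] = scores.get(item, 0.0) + weight * rating
    return sorted(scores, key=scores.get, reverse=True)

print(recommend_user_based("alice"))
```

In this toy data "bob" is the user most correlated with "alice", so the only item bob rated that alice has not, "D", is what gets recommended.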

Page 13: Introduction of Data Science

Item-based Filtering

Use user-item ratings matrix

Make item-to-item correlations

Find items that are highly correlated

Recommend items with highest correlation
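
A rough sketch of the item-to-item step follows. It uses cosine similarity over co-rating users as a stand-in for the "correlation" named on the slide; that choice, the toy `ratings` data, and the helper names `item_vector`, `cosine` and `similar_items` are assumptions for illustration only.

```python
from math import sqrt

# Hypothetical toy data (not from the slides): user -> {item: rating}.
ratings = {
    "u1": {"A": 5, "B": 4},
    "u2": {"A": 4, "B": 5, "C": 1},
    "u3": {"A": 1, "C": 5},
    "u4": {"B": 2, "C": 4},
}

def item_vector(item):
    """Column of the user-item matrix: user -> rating for this item."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(i, j):
    """Cosine similarity between two items over users who rated both."""
    vi, vj = item_vector(i), item_vector(j)
    common = set(vi) & set(vj)
    if not common:
        return 0.0
    num = sum(vi[u] * vj[u] for u in common)
    den = sqrt(sum(vi[u] ** 2 for u in common)) * sqrt(sum(vj[u] ** 2 for u in common))
    return num / den if den else 0.0

def similar_items(item):
    """All other items, ordered from most to least similar."""
    others = {j for r in ratings.values() for j in r} - {item}
    return sorted(others, key=lambda j: cosine(item, j), reverse=True)

print(similar_items("A"))
```

Here items "A" and "B" are rated alike by the users who rated both, so "B" ranks ahead of "C" as a neighbour of "A". Other similarity measures, such as Pearson correlation or adjusted cosine, could be substituted in `cosine` without changing the rest of the sketch.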

Page 14: Introduction of Data Science

Steps in item-based CF

Predicted rating for item 2 for user 1
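
The worked figure for this slide did not survive in the transcript. One common way to form such a prediction in item-based CF is a similarity-weighted average of the user's existing ratings, sketched below. The similarity values, the ratings for user 1, and the function name `predict_rating` are hypothetical and are not taken from the slide.

```python
def predict_rating(user_ratings, similarities, target):
    """Similarity-weighted average of the user's ratings on other items.

    similarities maps (target_item, other_item) -> similarity score.
    Returns None when no positively similar rated item exists.
    """
    num = den = 0.0
    for item, rating in user_ratings.items():
        sim = similarities.get((target, item), 0.0)
        if item == target or sim <= 0:
            continue
        num += sim * rating
        den += sim
    return num / den if den else None

# Hypothetical inputs: user 1 has rated items 1 and 3 but not item 2.
user1 = {"item1": 4, "item3": 2}
sims = {("item2", "item1"): 0.8, ("item2", "item3"): 0.2}

predicted = predict_rating(user1, sims, "item2")
print(predicted)
```

With these made-up numbers the prediction is (0.8 × 4 + 0.2 × 2) / (0.8 + 0.2), about 3.6, which is pulled toward the rating of the more similar item.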

Page 15: Introduction of Data Science

Problems with Collaborative Filtering

New user cold start problem

New item cold start problem

Popularity bias: tend to recommend only popular items

Sparsity problem: if there are many items to recommend, the user-item rating matrix is sparse and it is hard to find users who have rated the same items
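
To make the sparsity and cold-start problems concrete, the sketch below measures the fraction of empty cells in a small user-item matrix and lists users and items with no ratings at all. The data and the names `sparsity`, `cold_start_users` and `cold_start_items` are hypothetical, chosen only for this example.

```python
def sparsity(ratings, items):
    """Fraction of user-item cells that have no rating."""
    observed = sum(len(r) for r in ratings.values())
    total = len(ratings) * len(items)
    return 1.0 - observed / total

def cold_start_users(ratings):
    """Users with no ratings: no correlations can be computed for them."""
    return sorted(u for u, r in ratings.items() if not r)

def cold_start_items(ratings, items):
    """Items nobody has rated: they can never be recommended by CF."""
    rated = {i for r in ratings.values() for i in r}
    return sorted(set(items) - rated)

# Hypothetical toy data: 3 users, 4 items, only 3 ratings observed.
ratings = {"u1": {"A": 5}, "u2": {"B": 3, "C": 4}, "u3": {}}
items = ["A", "B", "C", "D"]

print(sparsity(ratings, items))
print(cold_start_users(ratings))
print(cold_start_items(ratings, items))
```

Even in this tiny matrix three quarters of the cells are empty, no two users share a rated item, and one user and one item have no ratings to work from.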

Page 16: Introduction of Data Science