Introduction to Data Science
Agenda
What is big data
What is data science
Data science applications
System infrastructure
Case study – recommendation system
Data Scientist
Analytics
Artificial Intelligence
Statistics
Natural Language Processing
Feature Engineering
Scientific Method
Simulation
Data & Text Mining
Machine Learning
Predictive Modeling
Graph Analytics
Data Management
Data Warehousing
Mashups
Databases
Business Intelligence
Big Data
Information Retrieval
Art & Design
Business Mindset
Computer Science
Visualization
Communication
Data Product Design
Domain Knowledge
Ethics
Privacy & Security
Programming
Cloud Computing
Distributed Systems
Technology & Infrastructure
Growth Hacking
Social network
Public Relations
Online Tools
Resource
Data Science Applications
Recommendation System
Self-driving
Text Cognition
Spam Filtering
https://en.wikipedia.org/wiki/Data_science#/media/File:Data_visualization_process_v1.png
Machine Learning Algorithm
Supervised learning
Regression
Classification
Neural network, deep learning
Unsupervised learning
Clustering
Recommendation System
Recommendation systems are a subclass of information filtering systems that seek to predict the “rating” or “preference” that a user would give to an item. (Wikipedia)
Case Study
Algorithms
Collaborative filtering
Content-based recommendation
Learning to rank
Context-aware recommendation
Social network recommendation
Collaborative Filtering
Basic Assumption
• Users with similar interests have common preferences
• A sufficiently large number of user preferences is available
Main Approaches
• User-based
• Item-based
User-based Filtering
Use user-item rating matrix
Make user-to-user correlations
Find highly correlated users
Recommend items to
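The user-based steps above can be sketched roughly as follows. This is a minimal illustration, not the presenter's implementation: the toy ratings, the choice of Pearson correlation as the user-to-user measure, and the names `RATINGS`, `pearson`, and `recommend_user_based` are all assumptions made for the example.

```python
from math import sqrt

# Hypothetical toy data (user -> {item: rating}); not taken from the slides.
RATINGS = {
    "alice": {"A": 5, "B": 3, "C": 4},
    "bob": {"A": 4, "B": 2, "C": 5, "D": 4},
    "carol": {"A": 1, "B": 5, "D": 2},
}


def pearson(u, v, ratings=RATINGS):
    """Pearson correlation of two users over the items both have rated."""
    common = set(ratings[u]) & set(ratings[v])
    if len(common) < 2:
        return 0.0  # too little overlap to correlate
    mean_u = sum(ratings[u][i] for i in common) / len(common)
    mean_v = sum(ratings[v][i] for i in common) / len(common)
    num = sum((ratings[u][i] - mean_u) * (ratings[v][i] - mean_v) for i in common)
    den = sqrt(sum((ratings[u][i] - mean_u) ** 2 for i in common)) * sqrt(
        sum((ratings[v][i] - mean_v) ** 2 for i in common)
    )
    return num / den if den else 0.0


def recommend_user_based(user, k=1, ratings=RATINGS):
    """Recommend unseen items rated by the k most correlated users."""
    others = [v for v in ratings if v != user]
    neighbors = sorted(others, key=lambda v: pearson(user, v, ratings), reverse=True)[:k]
    scores = {}
    for v in neighbors:
        for item, r in ratings[v].items():
            if item not in ratings[user]:
                # keep the best rating any neighbor gave this unseen item
                scores[item] = max(scores.get(item, 0), r)
    return sorted(scores, key=scores.get, reverse=True)


print(recommend_user_based("alice"))  # ['D']
```

With this toy data, "bob" is the user most correlated with "alice", so the one item bob rated that alice has not seen is recommended.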
Item-based Filtering
Use user-item ratings matrix
Make item-to-item correlations
Find items that are highly correlated
Recommend items with highest correlation
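A rough sketch of the item-based steps above, under stated assumptions: the toy ratings are invented, cosine similarity over co-rating users stands in for the "item-to-item correlation" (the slides do not say which measure is used), and `item_column`, `cosine`, and `recommend_item_based` are hypothetical helper names.

```python
from math import sqrt

# Hypothetical toy data (user -> {item: rating}); not taken from the slides.
RATINGS = {
    "u1": {"item1": 5, "item2": 4, "item3": 1},
    "u2": {"item1": 4, "item2": 5, "item3": 2},
    "u3": {"item1": 1, "item2": 2, "item3": 5},
    "u4": {"item1": 5, "item3": 1},
}


def item_column(item, ratings=RATINGS):
    """One column of the user-item matrix: {user: rating} for this item."""
    return {u: r[item] for u, r in ratings.items() if item in r}


def cosine(i, j, ratings=RATINGS):
    """Cosine similarity of two items over the users who rated both."""
    col_i, col_j = item_column(i, ratings), item_column(j, ratings)
    common = set(col_i) & set(col_j)
    if not common:
        return 0.0
    num = sum(col_i[u] * col_j[u] for u in common)
    den = sqrt(sum(col_i[u] ** 2 for u in common)) * sqrt(
        sum(col_j[u] ** 2 for u in common)
    )
    return num / den


def recommend_item_based(user, ratings=RATINGS):
    """Rank the user's unrated items by similarity to their top-rated item."""
    rated = ratings[user]
    favorite = max(rated, key=rated.get)
    all_items = {i for r in ratings.values() for i in r}
    candidates = all_items - set(rated)
    return sorted(candidates, key=lambda i: cosine(favorite, i, ratings), reverse=True)


print(recommend_item_based("u4"))  # ['item2']
```

Because item similarities depend only on the ratings matrix, they can be computed once and reused for every user, which is one common reason to prefer the item-based approach when there are many more users than items.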
Steps in item-based CF
Predicted rating for item 2 for user 1
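One common way to compute such a predicted rating in item-based CF is a similarity-weighted average of the user's own ratings. The slide's exact formula is not recoverable from the transcript, so the form below is an assumption; the symbols are introduced here for illustration: \hat{r}_{u,i} is the predicted rating of user u for item i, R_u is the set of items user u has rated, r_{u,j} is the user's rating of item j, and s_{i,j} is the item-to-item similarity.

```latex
\hat{r}_{u,i} \;=\; \frac{\sum_{j \in R_u} s_{i,j}\, r_{u,j}}{\sum_{j \in R_u} \lvert s_{i,j} \rvert}
```

For the case named on the slide, u = 1 and i = 2: the ratings user 1 gave to other items are averaged, each weighted by how similar that item is to item 2.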
Problem with Collaborative Filtering
New user cold start problem
New item cold start problem
Popularity bias: tend to recommend only popular items
Sparsity problem: if there are many items to be recommended, the user/rating matrix is sparse and it is hard to find users who have rated the same items
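The cold start and sparsity problems above can be made concrete with a small check on a ratings matrix. This is only an illustrative sketch: the data, the item catalogue, and the helper names `sparsity`, `cold_start_users`, `cold_start_items`, and `co_rating_pairs` are assumptions, not part of the original slides.

```python
from itertools import combinations

# Hypothetical toy data; "u4" is a new user and "E" is a new item.
RATINGS = {
    "u1": {"A": 5},
    "u2": {"B": 3, "C": 4},
    "u3": {"D": 2},
    "u4": {},
}
ITEMS = ["A", "B", "C", "D", "E"]


def sparsity(ratings=RATINGS, items=ITEMS):
    """Fraction of user-item cells that have no rating."""
    observed = sum(len(r) for r in ratings.values())
    return 1 - observed / (len(ratings) * len(items))


def cold_start_users(ratings=RATINGS):
    """Users with no ratings: no correlations can be computed for them."""
    return [u for u, r in ratings.items() if not r]


def cold_start_items(ratings=RATINGS, items=ITEMS):
    """Items nobody has rated: CF can never recommend them."""
    rated = {i for r in ratings.values() for i in r}
    return [i for i in items if i not in rated]


def co_rating_pairs(ratings=RATINGS):
    """User pairs that share at least one rated item."""
    return [
        (u, v)
        for u, v in combinations(ratings, 2)
        if set(ratings[u]) & set(ratings[v])
    ]


print(round(sparsity(), 2))   # 0.8
print(cold_start_users())     # ['u4']
print(cold_start_items())     # ['E']
print(co_rating_pairs())      # []
```

In this toy matrix no two users have rated the same item, so no user-to-user correlation can be computed at all, which is the sparsity problem in its extreme form.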