![Page 1: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/1.jpg)
1
Expedia Hotel Recommendation
Capstone Project – Spring 2016
Presented By: Gurpreet Dhillon
![Page 2: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/2.jpg)
2
Introduction
Which hotel will the user choose?
![Page 3: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/3.jpg)
3
Dataset
• Logs of customer behavior
• Source: Kaggle Competitions
• 37,670,293 observations of 24 attributes
![Page 4: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/4.jpg)
4
Dataset

| Column name | Description | Data type |
|---|---|---|
| date_time | Timestamp | string |
| site_name | ID of the Expedia point of sale (e.g. Expedia.com, Expedia.co.uk, Expedia.co.jp, ...) | int |
| posa_continent | ID of the continent associated with site_name | int |
| user_location_country | ID of the country in which the customer is located | int |
| user_location_region | ID of the region in which the customer is located | int |
| user_location_city | ID of the city in which the customer is located | int |
| orig_destination_distance | Physical distance between a hotel and a customer at the time of search. A null means the distance could not be calculated | double |
| user_id | ID of the user | int |
| is_mobile | 1 when a user connected from a mobile device, 0 otherwise | tinyint |
| is_package | 1 if the click/booking was generated as part of a package (i.e. combined with a flight), 0 otherwise | int |
| channel | ID of a marketing channel | int |
| srch_ci | Check-in date | string |
| srch_co | Check-out date | string |
| srch_adults_cnt | Number of adults specified in the hotel room | int |
| srch_children_cnt | Number of (extra occupancy) children specified in the hotel room | int |
| srch_rm_cnt | Number of hotel rooms specified in the search | int |
| srch_destination_id | ID of the destination where the hotel search was performed | int |
| srch_destination_type_id | Type of destination | int |
| hotel_continent | Hotel continent | int |
| hotel_country | Hotel country | int |
| hotel_market | Hotel market | int |
| is_booking | 1 if a booking, 0 if a click | tinyint |
| cnt | Number of similar events in the context of the same user session | bigint |
| hotel_cluster | ID of a hotel cluster | int |
![Page 5: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/5.jpg)
5
Approach
• Data exploration and cleaning
• Train and test dataset creation
• Model creation
• Model selection
![Page 6: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/6.jpg)
6
Data Exploration and Cleaning
• Use the data.table package in R
• Convert attributes to the correct data types
• Handle missing values
• Univariate analysis: find how the data is distributed
• Bivariate analysis: find correlations between variables and hotel clusters
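The slides carry out this step with R's data.table. As a rough illustration only, the cleaning steps listed above might look like the following Python sketch, which uses just the standard library. The sample rows, the `clean_row` helper, and the decision to keep a missing distance as `None` are all assumptions made for the example, not the author's code.

```python
import csv
import io
from datetime import datetime

# Tiny made-up sample in the shape of the Kaggle log (only a few columns).
RAW = """date_time,user_id,orig_destination_distance,is_booking,hotel_cluster
2014-08-11 07:46:59,12,2234.2641,0,1
2014-08-11 08:22:12,12,,1,1
2014-08-09 18:05:16,93,913.1932,0,80
"""

def clean_row(row):
    """Convert string fields to proper types; keep a missing distance as None."""
    dist = row["orig_destination_distance"]
    return {
        "date_time": datetime.strptime(row["date_time"], "%Y-%m-%d %H:%M:%S"),
        "user_id": int(row["user_id"]),
        "orig_destination_distance": float(dist) if dist else None,
        "is_booking": row["is_booking"] == "1",
        "hotel_cluster": int(row["hotel_cluster"]),
    }

rows = [clean_row(r) for r in csv.DictReader(io.StringIO(RAW))]

# Univariate view: how often each hotel cluster occurs.
cluster_counts = {}
for r in rows:
    cluster_counts[r["hotel_cluster"]] = cluster_counts.get(r["hotel_cluster"], 0) + 1

# Missing-value check for the one column documented as nullable.
missing = sum(r["orig_destination_distance"] is None for r in rows)
print(cluster_counts, "missing distances:", missing)
```

Keeping the null distance explicit, rather than substituting zero, preserves the documented meaning that the distance could not be calculated.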
![Page 7: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/7.jpg)
7
Data Exploration
![Page 8: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/8.jpg)
8
Hotel cluster distribution with hotel continent
![Page 9: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/9.jpg)
9
Hotel Cluster correlation with Distance from Origin
![Page 10: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/10.jpg)
10
Training and Testing Dataset creation
• Select the 100 users with the most observations as the sample
• Verify that this is a representative sample
• Select 80% of the sample data for training ML models
• Select 20% of the sample data for testing ML models
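The sampling and splitting steps above can be sketched as follows. This is a minimal Python illustration, not the R code behind the slides: the function names `top_users` and `train_test_split`, the fixed random seed, and the toy observations are assumptions. The slides do not say how the 80/20 split was drawn, so a shuffled row-level split is assumed here.

```python
import random
from collections import Counter

def top_users(observations, n_users):
    """Return the ids of the n_users users with the most observations."""
    counts = Counter(obs["user_id"] for obs in observations)
    return {user for user, _ in counts.most_common(n_users)}

def train_test_split(sample, train_fraction=0.8, seed=42):
    """Shuffle the sample reproducibly and cut it into train and test parts."""
    shuffled = list(sample)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Made-up log: user 1 has 6 rows, user 2 has 4, user 3 has 1.
observations = (
    [{"user_id": 1, "hotel_cluster": c} for c in range(6)]
    + [{"user_id": 2, "hotel_cluster": c} for c in range(4)]
    + [{"user_id": 3, "hotel_cluster": 0}]
)

keep = top_users(observations, n_users=2)  # the slides use 100 users
sample = [obs for obs in observations if obs["user_id"] in keep]
train, test = train_test_split(sample)
print(len(sample), len(train), len(test))
```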
![Page 11: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/11.jpg)
11
Model Creation
• Challenge: Large number of clusters
• Solution: H2O package in R
• H2O is “The Open Source In-Memory, Prediction Engine for Big Data Science”
• The R H2O package communicates with the H2O JVM over a REST API
• The data does not live in R; R only holds a pointer to it, an S4 object containing the IP address, port and key name for the data sitting in H2O
![Page 12: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/12.jpg)
12
Model Creation
Random Forest
• Resample the data over and over
• For each sample, train a new classifier
• Different classifiers overfit the data in different ways
• Average out the differences through voting
Performance Metrics:
Accuracy 0.2066556
Logloss 3.379106
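The resample, train, and vote idea listed above can be illustrated with a small bagging sketch in Python. It is a simplification under stated assumptions: each base classifier is a one-feature threshold stump rather than a full decision tree, there is no random feature selection, and the toy data and function names are hypothetical. The models in the slides were built with H2O, not with this code.

```python
import random
from collections import Counter

def train_stump(sample):
    """Pick the threshold on x that misclassifies the fewest points in the sample."""
    best = None
    for threshold in sorted({x for x, _ in sample}):
        left = [y for x, y in sample if x <= threshold]
        right = [y for x, y in sample if x > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    _, threshold, left_label, right_label = best
    return lambda x: left_label if x <= threshold else right_label

def bagged_ensemble(data, n_models, seed=0):
    """Train one stump per bootstrap resample (sampling with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        bootstrap = [rng.choice(data) for _ in data]
        models.append(train_stump(bootstrap))
    return models

def vote(models, x):
    """Majority vote across all classifiers in the ensemble."""
    return Counter(model(x) for model in models).most_common(1)[0][0]

# Toy one-feature data with two classes.
data = [(1.0, "a"), (2.0, "a"), (3.0, "a"), (7.0, "b"), (8.0, "b"), (9.0, "b")]
models = bagged_ensemble(data, n_models=25)
print(vote(models, 1.0), vote(models, 9.0))
```

Individual stumps disagree on where to place the threshold because each sees a different resample; the vote smooths those differences out.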
![Page 13: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/13.jpg)
13
Model Creation
GBM
• Boosting method that builds on weak classifiers
• Adds one classifier at a time
• Each new classifier is trained to improve the already trained ensemble
Performance Metrics:
Accuracy 0.2697774
Logloss 2.16233
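To make the "one weak learner at a time" idea concrete, here is a minimal gradient-boosting sketch in Python. It is deliberately simplified and rests on assumptions: it fits a squared-error regression on a single feature instead of the multiclass problem in the slides, the weak learner is a two-leaf stump, and the learning rate, round count, and data are made up.

```python
def fit_stump(xs, residuals):
    """Find the split on x whose two leaf means best fit the residuals."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # skip the max so the right leaf is never empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]

def boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        threshold, left_mean, right_mean = fit_stump(xs, residuals)
        stumps.append((threshold, left_mean, right_mean))
        predictions = [
            p + learning_rate * (left_mean if x <= threshold else right_mean)
            for x, p in zip(xs, predictions)
        ]
    return base, stumps, predictions

def mse(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)

xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.0, 1.2, 3.1, 2.9, 5.0, 5.2]
base, stumps, predictions = boost(xs, ys)
print(mse(ys, [base] * len(ys)), mse(ys, predictions))
```

Each round corrects what the ensemble so far still gets wrong, so the training error of the boosted predictions ends up below that of the constant baseline.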
![Page 14: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/14.jpg)
14
Model Creation
Deep Learning
![Page 15: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/15.jpg)
15
Model Creation
Deep Learning
• Input Layer: Training observations are fed in here
• Hidden Layers: Intermediate layers that help the neural network learn the complicated relationships in the data
• Output Layer: Produces the final prediction from the outputs of the preceding layers
Performance Metrics:
Accuracy 0.2837518
Logloss 1.545127
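The three layer roles above can be illustrated with a forward pass through a tiny network. This Python sketch is only an illustration: the layer sizes, the weights, the ReLU activation, and the softmax output are assumptions chosen for the example, and the slides do not state the architecture H2O actually used.

```python
import math

def dense(inputs, weights, biases):
    """One fully connected layer: weighted sum of the inputs plus a bias per unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def relu(values):
    return [max(0.0, v) for v in values]

def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical weights: 2 inputs -> 3 hidden units -> 3 output classes.
HIDDEN_W = [[0.5, -0.2], [0.1, 0.8], [-0.3, 0.4]]
HIDDEN_B = [0.0, 0.1, -0.1]
OUTPUT_W = [[1.0, -1.0, 0.5], [-0.5, 1.0, 0.2], [0.3, 0.3, -0.7]]
OUTPUT_B = [0.0, 0.0, 0.0]

def predict(features):
    hidden = relu(dense(features, HIDDEN_W, HIDDEN_B))    # hidden layer
    return softmax(dense(hidden, OUTPUT_W, OUTPUT_B))     # output layer

probabilities = predict([1.0, 2.0])                       # input layer
predicted_class = max(range(len(probabilities)), key=probabilities.__getitem__)
print(probabilities, predicted_class)
```

Because the output is a probability per class rather than a single label, it can be scored with log loss as well as accuracy.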
![Page 16: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/16.jpg)
16
Results

| Model/Metric | Random Forest | GBM | Deep Learning |
|---|---|---|---|
| Accuracy | 0.2066556 | 0.2697774 | 0.2837518 |
| Kappa | 3.595714 | 3.616561 | 3.570305 |
| R^2 | 0.9989192 | 0.9991958 | 0.9994364 |
| Logloss | 3.379106 | 2.16233 | 1.545127 |
• Accuracy: The fraction of instances that are correctly classified.
• Kappa: Compares the overall accuracy with the accuracy expected by random chance.
• R^2: The proportion of variance in the dependent variable (hotel_cluster) explained by the independent variables.
• Log loss: Measures the quality of a classifier's predicted probabilities by penalizing confident misclassifications; lower is better.
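For reference, three of these metrics can be computed from their standard definitions as in the Python sketch below. The labels and probabilities are made up, the kappa shown is Cohen's kappa (an assumption, since the slides do not name the variant), and this is not the H2O code that produced the numbers in the table.

```python
import math
from collections import Counter

def accuracy(actual, predicted):
    """Fraction of predictions that match the true label."""
    return sum(a == p for a, p in zip(actual, predicted)) / len(actual)

def cohen_kappa(actual, predicted):
    """(observed agreement - chance agreement) / (1 - chance agreement); at most 1."""
    n = len(actual)
    observed = accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    expected = sum(actual_counts[c] * predicted_counts[c] for c in actual_counts) / (n * n)
    return (observed - expected) / (1 - expected)

def log_loss(actual, probabilities, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(actual, probabilities):
        p = min(max(probs.get(label, 0.0), eps), 1 - eps)  # clip to avoid log(0)
        total -= math.log(p)
    return total / len(actual)

actual = [0, 0, 1, 1, 2, 2]
predicted = [0, 1, 1, 1, 2, 0]

acc = accuracy(actual, predicted)
kappa = cohen_kappa(actual, predicted)

# A confident wrong answer is penalized far more than an unsure one.
unsure = log_loss([0], [{0: 0.5, 1: 0.5}])
confidently_wrong = log_loss([0], [{0: 0.01, 1: 0.99}])
print(acc, kappa, unsure, confidently_wrong)
```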
![Page 17: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/17.jpg)
17
Conclusion
• Deep Learning model: the best fit for predicting hotel clusters, with the highest accuracy and the lowest log loss of the three models
• Expedia can show, on its home page, hotels that the user is likely to book
• Hotel recommendations can also be included in emails sent to customers
![Page 18: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/18.jpg)
18
Conclusion
Satisfied and Loyal customers!!
![Page 19: ExpediaHotelRecommendationPresentation](https://reader033.vdocuments.us/reader033/viewer/2022051521/587b0f2e1a28abb15c8b61fd/html5/thumbnails/19.jpg)
19
Thank you!!