SAS Presentation
![Page 1: SAS Presentation](https://reader035.vdocuments.us/reader035/viewer/2022062822/58812a1c1a28ab00438b534f/html5/thumbnails/1.jpg)
Speed Dating Data Set
Vaibhav, Tejasvi, Ritesh, Foram, Mary
![Page 2: SAS Presentation](https://reader035.vdocuments.us/reader035/viewer/2022062822/58812a1c1a28ab00438b534f/html5/thumbnails/2.jpg)
Outline
• Introduction & Business Problem
• Description of Data
• Pre-Processing Steps
• Exploratory Techniques & Interesting Observations
• BI Model
• Conclusions
![Page 3: SAS Presentation](https://reader035.vdocuments.us/reader035/viewer/2022062822/58812a1c1a28ab00438b534f/html5/thumbnails/3.jpg)
Introduction & Business Problem
Current popular dating apps geared toward young adults do not take preferences and interests into consideration.
Goal: To create a superior dating app that results in a higher percentage of dates and relationships.
How: Use data from speed dating events to predict whether users are compatible.
![Page 4: SAS Presentation](https://reader035.vdocuments.us/reader035/viewer/2022062822/58812a1c1a28ab00438b534f/html5/thumbnails/4.jpg)
Description of Data
• Source: Kaggle
• 8,378 observations from twenty-one speed dating events from 2002 to 2004
• Each observation represents a four-minute date between two people
• Includes:
  • User demographics
  • User interests/preferences
  • Scorecard for each user
  • Whether each user desired a second date with their partner
![Page 5: SAS Presentation](https://reader035.vdocuments.us/reader035/viewer/2022062822/58812a1c1a28ab00438b534f/html5/thumbnails/5.jpg)
Description of Data - Scorecard
![Page 6: SAS Presentation](https://reader035.vdocuments.us/reader035/viewer/2022062822/58812a1c1a28ab00438b534f/html5/thumbnails/6.jpg)
Pre-Processing Steps
• Four of the speed dating events used a different ranking method for their preferences
• For these observations, we used the following method to scale the data
= ×
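One plausible reading of the scaling step, offered only as an assumption: each person's preference scores are rescaled proportionally so that they sum to a fixed total. The function name `rescale_to_total` and the 100-point total are hypothetical and are not taken from the slides.

```python
def rescale_to_total(scores, total=100.0):
    """Rescale non-negative scores proportionally so they sum to `total`.

    Assumption: proportional rescaling (score / sum of scores * total).
    This is an illustrative guess, not the presenters' confirmed formula.
    """
    score_sum = sum(scores)
    if score_sum == 0:
        # Nothing to scale; return zeros rather than dividing by zero.
        return [0.0 for _ in scores]
    return [s / score_sum * total for s in scores]


# Made-up example: six preference attributes, each scored from 1 to 10.
raw = [8, 6, 7, 9, 5, 5]
scaled = rescale_to_total(raw)
print([round(x, 1) for x in scaled])
```

After rescaling, every person's preferences share the same total, so events that collected preferences differently can be compared on one scale.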
We rejected the following variables:
• Match
• Dec_o
• Num_in_3
![Page 7: SAS Presentation](https://reader035.vdocuments.us/reader035/viewer/2022062822/58812a1c1a28ab00438b534f/html5/thumbnails/7.jpg)
Pre-Processing Steps
• For certain models, the following nodes were applied:
  • Impute
    • Mean value replaced blank interval variables
    • Median value replaced blank ordinal variables
  • Replacement
    • Missing values replaced with a ‘.’
  • Variable Transformation
    • Skewed variables transformed using log
  • Variable Selection
    • Computed automatically by SAS
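The imputation and log-transformation steps above can be sketched outside SAS as follows. This is a minimal illustration in plain Python, not the SAS nodes themselves; the helper names `impute` and `log_transform`, the sample values, and the use of log(1 + x) to handle zeros are all assumptions.

```python
import math
import statistics


def impute(values, strategy="mean"):
    """Replace None entries with the mean or median of the observed values."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [fill if v is None else v for v in values]


def log_transform(values):
    """Apply log(1 + x) to reduce right skew; assumes all values are >= 0."""
    return [math.log1p(v) for v in values]


# Made-up sample data, not taken from the speed dating data set.
ages = [22, 25, None, 27, 30]      # interval variable: fill with the mean
ratings = [3, None, 7, 8, 10]      # ordinal variable: fill with the median

print(impute(ages, "mean"))
print(impute(ratings, "median"))
print([round(v, 3) for v in log_transform([0, 1, 9, 99])])
```

The median is used for ordinal variables because it is less affected by extreme ranks than the mean.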
![Page 8: SAS Presentation](https://reader035.vdocuments.us/reader035/viewer/2022062822/58812a1c1a28ab00438b534f/html5/thumbnails/8.jpg)
Exploratory Techniques & Interesting Observations
• Overall match rate: 16.5%
• Individual ‘Yes’ rate: 42%
• Age Range: 18-55
• Mean: 26.3
• St. Deviation: 3.566
• Skewness: 1.07
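For reference, summary statistics like these can be computed as in the sketch below. It uses the simple moment-based skewness (the mean of cubed z-scores); statistical packages may apply a small-sample correction, so their values can differ slightly. The `skewness` helper and the sample ages are hypothetical.

```python
import statistics


def skewness(values):
    """Moment-based skewness: the mean of cubed z-scores (population form)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


# Made-up ages with a long right tail, not the real data.
ages = [22, 23, 24, 25, 26, 27, 28, 35, 42]

print(round(statistics.fmean(ages), 2))   # mean
print(round(statistics.stdev(ages), 2))   # sample standard deviation
print(round(skewness(ages), 2))           # positive for a right-skewed sample
```

A positive skewness, as reported for age above, means a few participants are much older than the bulk of the group.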
![Page 9: SAS Presentation](https://reader035.vdocuments.us/reader035/viewer/2022062822/58812a1c1a28ab00438b534f/html5/thumbnails/9.jpg)
Exploratory Techniques & Interesting Observations
> Gender
Note: ‘0’ represents female; ‘1’ represents male
![Page 10: SAS Presentation](https://reader035.vdocuments.us/reader035/viewer/2022062822/58812a1c1a28ab00438b534f/html5/thumbnails/10.jpg)
Exploratory Techniques & Interesting Observations
> Age
![Page 11: SAS Presentation](https://reader035.vdocuments.us/reader035/viewer/2022062822/58812a1c1a28ab00438b534f/html5/thumbnails/11.jpg)
Exploratory Techniques & Interesting Observations
> Season
![Page 12: SAS Presentation](https://reader035.vdocuments.us/reader035/viewer/2022062822/58812a1c1a28ab00438b534f/html5/thumbnails/12.jpg)
BI Model
![Page 13: SAS Presentation](https://reader035.vdocuments.us/reader035/viewer/2022062822/58812a1c1a28ab00438b534f/html5/thumbnails/13.jpg)
BI Model Comparison
![Page 14: SAS Presentation](https://reader035.vdocuments.us/reader035/viewer/2022062822/58812a1c1a28ab00438b534f/html5/thumbnails/14.jpg)
BI Model Comparison

| Model | Misclassification Rate | True Positive Rate |
|---|---|---|
| Replacement + Decision Tree | 18.9% | 80.2% |
| Replacement + Gradient Boosting | 18.1% | 75.2% |

● A decision tree after replacement is the superior model
  ○ While its misclassification rate is slightly higher than that of gradient boosting (18.9% vs. 18.1%), its true positive rate is 5 percentage points higher (80.2% vs. 75.2%)
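The two comparison metrics can be derived from a confusion matrix as sketched below. The function names and the counts are hypothetical and chosen only to illustrate the definitions; they are not the counts behind the reported rates.

```python
def misclassification_rate(tp, fp, tn, fn):
    """Share of all predictions that are wrong: (FP + FN) / total."""
    return (fp + fn) / (tp + fp + tn + fn)


def true_positive_rate(tp, fn):
    """Share of actual positives that were predicted positive: TP / (TP + FN)."""
    return tp / (tp + fn)


# Made-up confusion-matrix counts for illustration only.
tp, fp, tn, fn = 80, 30, 170, 20

print(round(misclassification_rate(tp, fp, tn, fn), 3))
print(round(true_positive_rate(tp, fn), 3))
```

Because the two metrics weigh errors differently, a model can have a slightly worse misclassification rate and still find more of the actual "yes" decisions, which is the trade-off described above.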
![Page 15: SAS Presentation](https://reader035.vdocuments.us/reader035/viewer/2022062822/58812a1c1a28ab00438b534f/html5/thumbnails/15.jpg)
Our BI Model Results
All ratings are on a scale of 1 to 10.
• If the user likes a person at 8 or higher → rates them at 7.5 or higher on attractiveness → thinks the probability of getting a match is 3 or higher, then there is an 86.28 percent chance that the user will say yes.
• If the user likes the person at 5.5 or higher but less than 6.5 → and is from London, England, there is a 100 percent chance of saying yes; if the user is from Alabama, Texas, or Argentina, there is a 68.12 percent chance of saying no.
• If the user likes the person less than 5.5 → and is a lawyer, there is a 93.16 percent chance that the user will say no to the other person. Similarly, if the user is in the field of Informatics or Psychology, the user will say no 100 percent of the time, and if the user is a journalist, there is an 83 percent chance of saying yes.
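The rule paths above can be written as a small lookup function. This is a sketch covering only the paths quoted on this slide, not the full decision tree; the function name, parameter names, and string labels are hypothetical.

```python
def predict_decision(like, attractive=None, perceived_match_prob=None,
                     origin=None, field_or_career=None):
    """Return (decision, probability) for the quoted rule paths,
    or None when the inputs fall outside those paths."""
    if like >= 8:
        if (attractive is not None and attractive >= 7.5
                and perceived_match_prob is not None
                and perceived_match_prob >= 3):
            return ("yes", 0.8628)
    elif 5.5 <= like < 6.5:
        if origin == "London, England":
            return ("yes", 1.0)
        if origin in {"Alabama", "Texas", "Argentina"}:
            return ("no", 0.6812)
    elif like < 5.5:
        if field_or_career == "lawyer":
            return ("no", 0.9316)
        if field_or_career in {"informatics", "psychology"}:
            return ("no", 1.0)
        if field_or_career == "journalist":
            return ("yes", 0.83)
    # Ratings from 6.5 up to 8, and any unlisted combination, are not covered.
    return None


print(predict_decision(8.5, attractive=8, perceived_match_prob=5))
print(predict_decision(6.0, origin="Texas"))
print(predict_decision(4.0, field_or_career="journalist"))
```

Returning None for uncovered inputs makes it explicit that these three paths are excerpts rather than a complete model.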
![Page 16: SAS Presentation](https://reader035.vdocuments.us/reader035/viewer/2022062822/58812a1c1a28ab00438b534f/html5/thumbnails/16.jpg)
Conclusion
We are going to use the BI model to build an application. The overview of the dating application will be:
• User profile
• Suggesting people to users based on their preferences
• User ratings for the suggested profiles
• BI model used for suggesting potential partners using the ratings
• Chat option
• Once there is a significant user base, implement a recommendation system