by kulka053/presentation full · dark knight rocky sita aur gita star trek cliffhanger a.i. mi...
TRANSCRIPT
![Page 1: By kulka053/Presentation full · Dark knight Rocky Sita Aur Gita Star Trek Cliffhanger A.I. MI X-Men. Jim -0.65 0.65 -0.65 1.96 0 -0.65 -0.7 Sean -1 -0.14 -1 0.71 1.57 -0.14 John](https://reader033.vdocuments.us/reader033/viewer/2022060500/5f1a993fed474e133572f214/html5/thumbnails/1.jpg)
By Atul S. Kulkarni
Graduate Student, University of Minnesota Duluth
Under the guidance of Dr. Richard Maclin
Problem Statement
- Given a set of users with their previous ratings for a set of movies, can we predict the rating they will assign to a movie they have not previously rated?
- Netflix puts it as:
  - “The Netflix Prize seeks to substantially improve the accuracy of predictions about how much someone is going to love a movie based on their movie preferences. Improve it enough and you win one (or more) Prizes. Winning the Netflix Prize improves our ability to connect people to the movies they love.” – www.netflixprize.com
- So what do they want?
  - A 10% improvement over their existing system.
- They are paying $1 million for this.
Problem Statement
- Similarly, “which movie will you like?” given that you have seen X-Men, X-Men II, and X-Men: The Last Stand, and users who saw these movies also liked “X-Men Origins: Wolverine”.
- Answer: ?
Background - Dataset
- Data in the training file is per movie.
- It looks like this:
    Movie#
    Customer#,Rating,Date of Rating
    Customer#,Rating,Date of Rating
    Customer#,Rating,Date of Rating
- Example:
    4:
    1065039,3,2005-09-06
    1544320,1,2004-06-28
    410199,5,2004-10-16
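A minimal sketch of how such a per-movie block could be read, assuming the movie number appears on its own line followed by a colon (as the `4:` in the example suggests) and every other line is `Customer#,Rating,Date of Rating`. The names `Rating` and `parse_movie_block` are hypothetical and chosen only for this illustration.

```python
from datetime import date
from typing import Iterable, Iterator, NamedTuple


class Rating(NamedTuple):
    movie_id: int
    customer_id: int
    rating: int
    rated_on: date


def parse_movie_block(lines: Iterable[str]) -> Iterator[Rating]:
    """Yield one Rating per 'Customer#,Rating,Date' line.

    Assumption: a line ending in ':' (for example '4:') is the Movie#
    header for the rating lines that follow it.
    """
    movie_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):
            movie_id = int(line[:-1])
            continue
        if movie_id is None:
            raise ValueError("rating line seen before any movie header")
        customer, rating, day = line.split(",")
        yield Rating(movie_id, int(customer), int(rating), date.fromisoformat(day))


# The three rating lines shown in the example for movie 4.
sample = """4:
1065039,3,2005-09-06
1544320,1,2004-06-28
410199,5,2004-10-16"""

ratings = list(parse_movie_block(sample.splitlines()))
for r in ratings:
    print(r)
```

Yielding records one at a time keeps memory use flat, which matters for a training set of roughly 100 million ratings.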
Background – Dataset stats
- Total ratings possible = 480,189 (users) × 17,770 (movies) = 8,532,958,530 (about 8.5 billion)
- Total available = 100 million
- The User × Movies matrix has about 8.4 billion entries missing
- Sparse data
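The arithmetic behind these figures can be checked in a few lines. This is only a sketch: it uses the rounded "100 million" count from the slide, so the density it prints is approximate.

```python
users = 480_189
movies = 17_770
available = 100_000_000  # rounded figure taken from the slide

possible = users * movies        # every (user, movie) cell in the matrix
missing = possible - available   # cells with no rating
density = available / possible   # fraction of the matrix that is filled

print(f"possible: {possible:,}")
print(f"missing:  {missing:,}")
print(f"density:  {density:.2%}")
```

Only a little over one percent of the matrix is filled, which is why any similarity measure has to work from the few movies two users have both rated.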
Background of the Solution
- What if I were very conservative with my ratings and someone else was too generous?
  - I rate the movie I like the most as 3 and the least as 1.
  - Someone else rates their favorite as 5 and their least favorite as 3.
- So am I like this person?
  - Difficult to say.
- We are comparing two people with strong personal rating biases, which results in an obviously flawed similarity measure.
- Solution? Normalization of the data.
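To illustrate why normalization helps, here is a small sketch that converts each user's ratings to z-scores (subtract the user's mean, divide by the user's sample standard deviation). The two rating lists are hypothetical and mirror the conservative and generous raters described above. The helper name `z_scores` is also an assumption.

```python
from statistics import mean, stdev
from typing import List, Sequence


def z_scores(ratings: Sequence[float]) -> List[float]:
    """Center ratings on the user's mean and scale by the sample std dev."""
    mu = mean(ratings)
    sigma = stdev(ratings)  # sample standard deviation (n - 1 denominator)
    if sigma == 0:
        # A user who gives every movie the same rating has no spread to scale by.
        return [0.0 for _ in ratings]
    return [(r - mu) / sigma for r in ratings]


conservative = [3, 2, 1]  # hypothetical rater who only uses 1 to 3
generous = [5, 4, 3]      # hypothetical rater who only uses 3 to 5

print(z_scores(conservative))
print(z_scores(generous))
```

After normalization the two users have identical profiles, so a distance computed on the normalized values reflects their order of preference rather than their personal rating scale.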
Proposed Solution
- K-Nearest Neighbor approach (Overview)
  - Given a query instance q(movieId, UserId):
  - Normalize the data before processing.
  - Find the distance between this instance and all the users who rated this movie.
  - Of these users, select the K users that are nearest to the query instance as its neighborhood.
  - Average the ratings of the users from this neighborhood for this particular movie.
  - This is the predicted rating for the query instance.
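The steps above can be sketched as follows. The slides do not say which distance is used, so this sketch assumes Euclidean distance over the movies both users have rated. The function names and the toy data are hypothetical, and the ratings are assumed to be normalized already.

```python
import math
from typing import Dict, Optional

# user -> {movie -> normalized rating}
Ratings = Dict[str, Dict[str, float]]


def distance(a: Dict[str, float], b: Dict[str, float]) -> Optional[float]:
    """Euclidean distance over co-rated movies; None if there is no overlap."""
    shared = a.keys() & b.keys()
    if not shared:
        return None
    return math.sqrt(sum((a[m] - b[m]) ** 2 for m in shared))


def predict(data: Ratings, user: str, movie: str, k: int) -> Optional[float]:
    """Average the normalized ratings of the k nearest users who rated `movie`."""
    candidates = []
    for other, rated in data.items():
        if other == user or movie not in rated:
            continue  # only users who rated the query movie are candidates
        d = distance(data[user], rated)
        if d is not None:
            candidates.append((d, rated[movie]))
    if not candidates:
        return None
    nearest = sorted(candidates)[:k]
    return sum(r for _, r in nearest) / len(nearest)


# Hypothetical normalized ratings, not taken from the slides.
toy = {
    "u1": {"A": 1.0, "B": -1.0},
    "u2": {"A": 0.9, "B": -0.8, "C": 0.5},
    "u3": {"A": 1.0, "B": -1.0, "C": 1.5},
    "u4": {"A": -1.0, "B": 1.0, "C": -1.0},
}
print(predict(toy, "u1", "C", k=2))
```

The result is still in normalized form and has to be mapped back to the query user's own scale before it can be read as a rating.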
Proposed Solution - Example
- Example: (Representative data, not real)
Matrix | Star Wars | Dark Knight | Rocky | Sita Aur Gita | Star Trek | Cliffhanger | A.I. | MI | X-Men
Jim 1 3 1 5 2 1 1
Sean 2 3 2 4 5 3
John 3 4 5 3 4
Sidd 4 3 4 2
Penny 5 2 2 5 1
Pete 5 ? 4 4
Proposed Solution - Example
- Calculate the mean and standard deviation vectors.

| User | meanRating | standardDeviation |
| --- | --- | --- |
| Jim | 2 | 1.527525232 |
| Sean | 3.166666667 | 1.169045194 |
| John | 3.8 | 0.836660027 |
| Sidd | 3.25 | 0.957427108 |
| Penny | 3 | 1.870828693 |
| Pete | 4.333333333 | 0.577350269 |
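These values can be reproduced from the ratings listed for each user in the example. The sketch below assumes the sample standard deviation (n - 1 denominator), which is what matches the figures shown. Pete's unknown rating is left out.

```python
from statistics import mean, stdev

# Ratings per user as listed in the example. Which movie each rating belongs
# to is not needed for the mean or the standard deviation.
ratings = {
    "Jim": [1, 3, 1, 5, 2, 1, 1],
    "Sean": [2, 3, 2, 4, 5, 3],
    "John": [3, 4, 5, 3, 4],
    "Sidd": [4, 3, 4, 2],
    "Penny": [5, 2, 2, 5, 1],
    "Pete": [5, 4, 4],  # the "?" entry is the rating to be predicted
}

# statistics.stdev is the sample standard deviation.
stats = {user: (mean(r), stdev(r)) for user, r in ratings.items()}
for user, (mu, sigma) in stats.items():
    print(f"{user:<6} {mu:.9f} {sigma:.9f}")
```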
Proposed Solution - Example
- Normalized data
Matrix | Star Wars | Dark Knight | Rocky | Sita Aur Gita | Star Trek | Cliffhanger | A.I. | MI | X-Men
Jim -0.65 0.65 -0.65 1.96 0 -0.65 -0.7
Sean -1 -0.14 -1 0.71 1.57 -0.14
John -1 0.24 1.434 -1 0.24
Sidd 0.783 -0.26 0.78 -1.3
Penny 1.069 -0.53 -0.53 1.07 -1.1
Pete 1.15 ? -0.6 -0.58
Proposed Solution - Example
- So now we have a query instance q(Pete, Sita Aur Gita),
  - i.e. we wish to estimate how much Pete will like the movie “Sita Aur Gita” on a scale of 1 - 5.
- To do this we need to identify Pete’s two nearest neighbors among the users who rated this movie (the 2-NN case).
- Users who rated the movie Sita Aur Gita (candidate_users) are:
  - Jim
  - Sidd
  - Penny
Proposed Solution - Example
- The candidate users and their distances from Pete are:

| Users | Distance |
| --- | --- |
| Jim | 0.500046868 |
| Sidd | 1.360699721 |
| Penny | 1.646395237 |

- The 2 nearest neighbors are Jim and Sidd.
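Picking the neighborhood from these distances is a matter of taking the k smallest values. A minimal sketch using the distances listed above (the variable names are illustrative):

```python
import heapq

# Distances from Pete to each candidate user, copied from the list above.
distances = {
    "Jim": 0.500046868,
    "Sidd": 1.360699721,
    "Penny": 1.646395237,
}

k = 2
# Iterating a dict yields its keys; the key function ranks them by distance.
neighbors = heapq.nsmallest(k, distances, key=distances.get)
print(neighbors)
```

`heapq.nsmallest` avoids sorting the full candidate list, which helps when a popular movie has many thousands of raters and k is small.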
Proposed Solution - Example
- The average of the ratings by Jim and Sidd for the movie “Sita Aur Gita” is 0.7956.
- So is our prediction of 0.7956 correct? Not yet.
  - This prediction is in normalized form.
  - We need to bring it back to Pete’s rating scale.
- How?
  - Multiply by the standard deviation of Pete’s ratings.
  - Add Pete’s mean rating to this product.
- (0.7956 × 0.5773) + 4.3333 ≈ 4.7925
- So the predicted rating for Pete is 4.7925.
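The conversion back to Pete's scale is the inverse of the normalization step. A short sketch, with `denormalize` as a hypothetical helper name and the rounded inputs taken from the slide:

```python
def denormalize(z: float, user_mean: float, user_std: float) -> float:
    """Map a normalized prediction back onto the user's own rating scale."""
    return z * user_std + user_mean


# Rounded values from the slide: neighborhood average, Pete's mean and std dev.
pete_prediction = denormalize(0.7956, user_mean=4.3333, user_std=0.5773)
print(pete_prediction)
```

With these rounded inputs the result agrees with the slide's figure to about three decimal places. Nothing in this step keeps a prediction inside the 1 to 5 range, so a real system might also want to clip the value. That is a suggestion here, not something the slides describe.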
Experiments - Setup
- This is a regression problem, so we want to know how far our predictions are from the expected values.
- Hence, the test metrics used are:
  - Root Mean Square Error (RMSE)
  - Absolute Average Error (AAE)
  - Time taken
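The formulas for the two error metrics did not survive in this transcript. The sketch below assumes the standard definitions: RMSE is the square root of the mean squared difference between predicted and actual ratings, and Absolute Average Error is taken to mean the mean absolute difference. The sample numbers are hypothetical.

```python
import math
from typing import Sequence


def _check(predicted: Sequence[float], actual: Sequence[float]) -> None:
    if len(predicted) != len(actual) or len(actual) == 0:
        raise ValueError("need two non-empty sequences of equal length")


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean square error: sqrt(mean((p - a)^2))."""
    _check(predicted, actual)
    squared = sum((p - a) ** 2 for p, a in zip(predicted, actual))
    return math.sqrt(squared / len(actual))


def aae(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Absolute average error, assumed here to be mean(|p - a|)."""
    _check(predicted, actual)
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical predictions and true ratings, for illustration only.
predicted = [4.0, 3.0, 5.0]
actual = [5, 3, 3]
print(rmse(predicted, actual), aae(predicted, actual))
```

Because RMSE squares each error before averaging, it penalizes a few large misses more heavily than AAE does, so it is never smaller than AAE on the same predictions.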
Experiments - Results
- Results on the described dataset

| Method | Absolute Average Error | Root Mean Square Error | Time (Minutes) |
| --- | --- | --- | --- |
| K-NN | 0.5087 | 0.67164 | 8640 * |
| C-K-NN | 0.6894 | 0.88995 | 9 |
| Netflix (Leaderboard Topper) | NA | 0.8596 | NA |
| Netflix Current System¹ | NA | 0.9514 | NA |
Experiments - Results
- RMSE Comparisons: [Figure: “Comparison of the RMSE and Absolute Average Error”, a chart of RMSE and Absolute Average Error (axis 0 to 1) for K-NN, C-K-NN, Netflix (Current Topper), and Netflix (Current System).]
- Time taken: [Figure: “Time in Minutes” (axis 0 to 9000) for K-NN and C-K-NN.]