Recommendation 101 using Hivemall
TRANSCRIPT
![Page 2: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/2.jpg)
Agenda
1. Introduction to Hivemall
2. Recommendation 101
3. Matrix Factorization
4. Bayesian Personalized Ranking
![Page 3: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/3.jpg)
What is Hivemall
Scalable machine learning library built as a collection of Hive UDFs, licensed under the Apache License v2
https://github.com/myui/hivemall
![Page 4: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/4.jpg)
Hivemall's Vision: ML on SQL
Classification with Mahout
CREATE TABLE lr_model AS
SELECT
  feature, -- reducers perform model averaging in parallel
  avg(weight) as weight
FROM (
  SELECT logress(features, label, ..) as (feature, weight)
  FROM train
) t -- map-only task
GROUP BY feature; -- shuffled to reducers
✓ Machine Learning made easy for SQL developers (ML for the rest of us)
✓ Interactive and Stable APIs w/ SQL abstraction
This SQL query automatically runs in parallel on Hadoop
![Page 5: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/5.jpg)
How to use Hivemall
[Diagram: machine learning workflow. Training: Feature Vector and Label produce a Prediction Model. Prediction: Feature Vector and Prediction Model produce a Label. Highlighted step: Data preparation]
![Page 6: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/6.jpg)
How to use Hivemall - Data preparation
Define a Hive table for training/testing data
CREATE EXTERNAL TABLE e2006tfidf_train (
  rowid int,
  label float,
  features ARRAY<STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ","
STORED AS TEXTFILE LOCATION '/dataset/E2006-tfidf/train';
![Page 7: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/7.jpg)
How to use Hivemall
[Diagram: machine learning workflow with Training and Prediction stages. Highlighted step: Feature Engineering]
![Page 8: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/8.jpg)
How to use Hivemall - Feature Engineering
Applying a Min-Max Feature Normalization
Transforming a label value to a value between 0.0 and 1.0
create view e2006tfidf_train_scaled as
select
  rowid,
  rescale(label, ${min_label}, ${max_label}) as label,
  features
from
  e2006tfidf_train;
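The min-max normalization on this slide can be sketched outside Hive as follows. This is an illustrative Python version, not Hivemall's implementation; the function name mirrors the slide's `rescale`, while the sample labels and the 0.5 fallback for a zero-width range are assumptions of the sketch.

```python
def rescale(value: float, min_value: float, max_value: float) -> float:
    """Min-max normalization: map value from [min_value, max_value] to [0.0, 1.0]."""
    if max_value == min_value:
        # Degenerate range; returning the midpoint is an assumption of this sketch.
        return 0.5
    return (value - min_value) / (max_value - min_value)


# Hypothetical label values standing in for the training table's label column.
labels = [-3.0, -1.0, 1.0]
min_label, max_label = min(labels), max(labels)
scaled = [rescale(v, min_label, max_label) for v in labels]
print(scaled)  # [0.0, 0.5, 1.0]
```

In the slide's query, `${min_label}` and `${max_label}` play the role of `min_label` and `max_label` here and would be computed from the training data beforehand.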
![Page 9: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/9.jpg)
How to use Hivemall
[Diagram: machine learning workflow with Training and Prediction stages. Highlighted step: Training]
![Page 10: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/10.jpg)
How to use Hivemall - Training
Training by logistic regression
CREATE TABLE lr_model AS
SELECT
  feature,
  avg(weight) as weight
FROM (
  SELECT logress(features, label, ..) as (feature, weight)
  FROM train
) t
GROUP BY feature
• Inner SELECT: map-only task to learn a prediction model
• GROUP BY feature: shuffle map outputs to reducers by feature
• avg(weight): reducers perform model averaging in parallel
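The model-averaging step described on this slide can be pictured with a small Python sketch. It assumes each map task has already produced its own feature-to-weight dictionary; the `average_models` helper and the sample weights are hypothetical and only mimic what `GROUP BY feature` with `avg(weight)` computes.

```python
from collections import defaultdict


def average_models(partial_models):
    """Average each feature's weight over the partial models that contain it."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for model in partial_models:  # one dict per map task
        for feature, weight in model.items():
            sums[feature] += weight
            counts[feature] += 1
    # Equivalent of: SELECT feature, avg(weight) ... GROUP BY feature
    return {feature: sums[feature] / counts[feature] for feature in sums}


# Hypothetical outputs of two map tasks.
mapper_outputs = [
    {"f1": 0.5, "f2": -1.0},
    {"f1": 1.5, "f3": 2.0},
]
lr_model = average_models(mapper_outputs)
print(lr_model)  # {'f1': 1.0, 'f2': -1.0, 'f3': 2.0}
```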
![Page 11: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/11.jpg)
How to use Hivemall - Training
Training of Confidence Weighted Classifier
CREATE TABLE news20b_cw_model1 AS
SELECT
  feature,
  voted_avg(weight) as weight
FROM (
  SELECT
    train_cw(features, label) as (feature, weight)
  FROM
    news20b_train
) t
GROUP BY feature
• voted_avg: vote to use negative or positive weights for avg (e.g. +0.7, +0.3, +0.2, -0.1, +0.7)
• train_cw: training for the CW classifier
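One way to read the "vote" on this slide: count the positive and the negative weights, then average only the majority side. The Python sketch below follows that reading; it is not Hivemall's source, and the handling of ties and of zero weights is an assumption.

```python
def voted_avg(weights):
    """Average only the weights whose sign wins the majority vote."""
    positives = [w for w in weights if w > 0]
    negatives = [w for w in weights if w < 0]
    if not positives and not negatives:
        return 0.0  # no signed weights at all; assumption of this sketch
    if len(positives) > len(negatives):
        return sum(positives) / len(positives)
    # Ties fall to the negative side here; this tie-breaking is an assumption.
    return sum(negatives) / len(negatives)


# The weights shown on the slide: four positive votes against one negative.
result = voted_avg([0.7, 0.3, 0.2, -0.1, 0.7])
print(round(result, 3))  # 0.475
```

With the slide's weights, the single negative value is outvoted, so the result is the mean of the four positive weights.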
![Page 12: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/12.jpg)
How to use Hivemall
[Diagram: machine learning workflow with Training and Prediction stages. Highlighted step: Prediction]
![Page 13: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/13.jpg)
How to use Hivemall - Prediction
CREATE TABLE lr_predict
as
SELECT
  t.rowid,
  sigmoid(sum(m.weight)) as prob
FROM
  testing_exploded t LEFT OUTER JOIN
  lr_model m ON (t.feature = m.feature)
GROUP BY
  t.rowid
Prediction is done by LEFT OUTER JOIN between test data and prediction model
No need to load the entire model into memory
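The join-based prediction can be mimicked in a few lines of Python. This is a sketch under simplifying assumptions: `testing_exploded` is modeled as (rowid, feature) pairs, the model as a dictionary, and a feature missing from the model contributes 0.0 (in SQL the unmatched side of the join would be NULL). The sample data is hypothetical.

```python
import math
from collections import defaultdict


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def predict(testing_exploded, model):
    """Sum the weights of each row's features, then squash with a sigmoid."""
    totals = defaultdict(float)
    for rowid, feature in testing_exploded:
        # Looking up one feature at a time plays the role of the join.
        totals[rowid] += model.get(feature, 0.0)
    return {rowid: sigmoid(total) for rowid, total in totals.items()}


# Hypothetical exploded test rows and model weights.
testing_exploded = [(1, "a"), (1, "b"), (2, "c")]
lr_model = {"a": 2.0, "b": -1.0}
probs = predict(testing_exploded, lr_model)
print(probs)
```

Like the slide's query, the sketch sums weights only, so it implicitly treats every listed feature as having value 1.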
![Page 14: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/14.jpg)
List of supported Algorithms
Classification
✓ Perceptron
✓ Passive Aggressive (PA, PA1, PA2)
✓ Confidence Weighted (CW)
✓ Adaptive Regularization of Weight Vectors (AROW)
✓ Soft Confidence Weighted (SCW)
✓ AdaGrad+RDA
✓ Factorization Machines
✓ RandomForest Classification
Regression
✓ Logistic Regression (SGD)
✓ PA Regression
✓ AROW Regression
✓ AdaGrad (logistic loss)
✓ AdaDELTA (logistic loss)
✓ Factorization Machines
✓ RandomForest Regression
![Page 15: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/15.jpg)
List of supported Algorithms
Classification
✓ Perceptron
✓ Passive Aggressive (PA, PA1, PA2)
✓ Confidence Weighted (CW)
✓ Adaptive Regularization of Weight Vectors (AROW)
✓ Soft Confidence Weighted (SCW)
✓ AdaGrad+RDA
✓ Factorization Machines
✓ RandomForest Classification
Regression
✓ Logistic Regression (SGD)
✓ AdaGrad (logistic loss)
✓ AdaDELTA (logistic loss)
✓ PA Regression
✓ AROW Regression
✓ Factorization Machines
✓ RandomForest Regression
SCW is a good first choice
Try RandomForest if SCW does not work
Logistic regression is good for getting the probability of the positive class
Factorization Machines are good where features are sparse and categorical
![Page 16: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/16.jpg)
List of Algorithms for Recommendation
K-Nearest Neighbor
✓ Minhash and b-Bit Minhash (LSH variant)
✓ Similarity Search on Vector Space (Euclid/Cosine/Jaccard/Angular)
Matrix Completion
✓ Matrix Factorization
✓ Factorization Machines (regression)
The each_top_k function of Hivemall is useful for recommending top-k items
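To make "recommending top-k items" concrete, here is a Python sketch of a per-user top-k selection. It only illustrates the idea of taking the k best-scored items within each group; the helper name, the tuple layout, and the scores are hypothetical and do not reflect the signature of Hivemall's `each_top_k`.

```python
import heapq
from collections import defaultdict


def top_k_per_user(scored, k):
    """Return, for every user, the k items with the highest predicted score."""
    by_user = defaultdict(list)
    for user, item, score in scored:
        by_user[user].append((score, item))
    return {
        user: [item for _, item in heapq.nlargest(k, pairs)]
        for user, pairs in by_user.items()
    }


# Hypothetical (user, item, predicted score) rows.
scored = [
    ("u1", "a", 0.9), ("u1", "b", 0.2), ("u1", "c", 0.5),
    ("u2", "a", 0.1), ("u2", "c", 0.8),
]
recommendations = top_k_per_user(scored, k=2)
print(recommendations)  # {'u1': ['a', 'c'], 'u2': ['c', 'a']}
```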
![Page 17: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/17.jpg)
Other Supported Algorithms
Anomaly Detection
✓ Local Outlier Factor (LoF)
Feature Engineering
✓ Feature Hashing
✓ Feature Scaling (normalization, z-score)
✓ TF-IDF vectorizer
✓ Polynomial Expansion (Feature Pairing)
✓ Amplifier
NLP
✓ Basic English text Tokenizer
✓ Japanese Tokenizer (Kuromoji)
![Page 18: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/18.jpg)
Agenda
1. Introduction to Hivemall
2. Recommendation 101
3. Matrix Factorization
4. Bayesian Personalized Ranking
![Page 19: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/19.jpg)
Recommendation 101
• Explicit Feedback
  • Item Rating
  • Item Ranking
• Implicit Feedback
  • Positive-only Implicit Feedback
    • Bought (or not)
    • Click (or not)
    • Converted (or not)
![Page 20: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/20.jpg)
Recommendation 101
• Explicit Feedback
  • Item Rating
  • Item Ranking
• Implicit Feedback
  • Positive-only Implicit Feedback
    • Bought (or not)
    • Click (or not)
    • Converted (or not)
Case for Coursehero?
![Page 21: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/21.jpg)
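Explicit Feedback
| U/I | Item 1 | Item 2 | Item 3 | … | Item I |
|---|---|---|---|---|---|
| User 1 | | 5 | | | 3 |
| User 2 | 2 | | 1 | | |
| … | | 3 | | 4 | |
| User U | 1 | | 4 | | 5 |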
![Page 22: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/22.jpg)
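Explicit Feedback
| U/I | Item 1 | Item 2 | Item 3 | … | Item I |
|---|---|---|---|---|---|
| User 1 | ? | 5 | ? | ? | 3 |
| User 2 | 2 | ? | 1 | ? | ? |
| … | ? | 3 | ? | 4 | ? |
| User U | 1 | ? | 4 | ? | 5 |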
![Page 23: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/23.jpg)
Explicit Feedback
| U/I | Item 1 | Item 2 | Item 3 | … | Item I |
|---|---|---|---|---|---|
| User 1 | ? | 5 | ? | ? | 3 |
| User 2 | 2 | ? | 1 | ? | ? |
| … | ? | 3 | ? | 4 | ? |
| User U | 1 | ? | 4 | ? | 5 |
• Very Sparse Dataset
• # of feedback is small
• Unknown data >> Training data
• User preference to rated items is clear
• Has negative feedbacks
• Evaluation is easy (MAE/RMSE)
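The two metrics named in the last bullet are straightforward to compute once held-out ratings and predictions are paired up. A minimal Python sketch, using made-up ratings purely for illustration:

```python
import math


def mae(actual, predicted):
    """Mean absolute error between known and predicted ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rmse(actual, predicted):
    """Root mean squared error; penalizes large misses more than MAE does."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


# Hypothetical held-out ratings and model predictions.
actual = [5.0, 3.0, 2.0, 1.0]
predicted = [4.0, 3.0, 4.0, 1.0]
print(mae(actual, predicted))   # 0.75
print(rmse(actual, predicted))  # sqrt(1.25), about 1.118
```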
![Page 24: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/24.jpg)
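Implicit Feedback
[Table: user-item matrix (User 1 … User U by Item 1 … Item I) in which ⭕ marks observed feedback: two cells for User 1, two for User 2, two for the "…" row, and three for User U; all other cells are empty]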
![Page 25: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/25.jpg)
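Implicit Feedback
[Table: user-item matrix (User 1 … User U by Item 1 … Item I) in which ⭕ marks observed feedback: two cells for User 1, two for User 2, two for the "…" row, and three for User U; all other cells are empty]
• Sparse Dataset
• Number of feedbacks is large
• User preference is unclear
• No negative feedback
• Known feedback may be negative
• Unknown feedback may be positive
• Evaluation is not so easy (NDCG, Prec@K, Recall@K)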
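The ranking metrics in the last bullet compare a ranked recommendation list against the set of items a user actually interacted with. The Python sketch below uses the common binary-relevance definitions; the function names and the sample data are illustrative and not taken from the deck.

```python
import math


def precision_at_k(ranked, relevant, k):
    """Fraction of the top-k recommendations that are relevant."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / k


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k."""
    if not relevant:
        return 0.0
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG: discounted gain divided by the ideal gain."""
    dcg = sum(
        1.0 / math.log2(pos + 2)
        for pos, item in enumerate(ranked[:k])
        if item in relevant
    )
    ideal_hits = min(k, len(relevant))
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0


# Hypothetical ranking for one user and the items that user really touched.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c", "e"}
print(precision_at_k(ranked, relevant, 3))
print(recall_at_k(ranked, relevant, 3))
print(ndcg_at_k(ranked, relevant, 3))
```

Unlike MAE/RMSE, these metrics depend on the cut-off k and on how unobserved items are treated, which is part of why evaluation is harder for implicit feedback.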
![Page 26: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/26.jpg)
Pros and Cons
| | Explicit Feedback | Implicit Feedback |
|---|---|---|
| Data size | ☹ | ☺ |
| User preference | ☺ | ☹ |
| Dislike/Unknown | ☺ | ☹ |
| Impact of Bias | ☹ | ☺ |
![Page 27: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/27.jpg)
Agenda
1. Introduction to Hivemall
2. Recommendation 101
3. Matrix Factorization
4. Bayesian Personalized Ranking
![Page 28: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/28.jpg)
Matrix Factorization/Completion
Factorize a matrix into a product of matrices having k latent factors
![Page 29: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/29.jpg)
Matrix Completion How-to
• Mean Rating μ
• Rating Bias for each Item Bi
• Rating Bias for each User Bu
![Page 30: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/30.jpg)
Criteria of Biased Matrix Factorization
[Equation image with annotated terms: Diff in prediction, Mean Rating, Bias for each user/item, Matrix Factorization, Regularization]
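The equation on this slide survives only as its annotations. For reference, the standard biased matrix factorization criterion that those annotations match is written out below. The symbols $p_u$, $q_i$ (latent factor vectors), $b_u$, $b_i$ (user and item biases), $\lambda$, and the set $K$ of observed ratings are the usual textbook notation and are assumed here; the slide's exact notation may differ.

```latex
% Predicted rating: mean rating + user/item bias + matrix factorization term
\hat{r}_{ui} = \mu + b_u + b_i + p_u^{\top} q_i

% Training criterion: squared diff in prediction + regularization
\min_{p_*,\, q_*,\, b_*} \sum_{(u,i) \in K}
  \left( r_{ui} - \hat{r}_{ui} \right)^2
  + \lambda \left( \lVert p_u \rVert^2 + \lVert q_i \rVert^2 + b_u^2 + b_i^2 \right)
```

Here $\mu$ is the mean rating, and $b_u$ and $b_i$ are the per-user and per-item rating biases.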
![Page 31: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/31.jpg)
Training of Matrix Factorization
Supports iterative training using a local disk cache
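The training query on this slide was an image and is not recoverable from the transcript. To show what iterative training of a biased matrix factorization model involves, here is a plain-Python SGD sketch. It is not Hivemall's implementation: the function names, hyperparameters, and the tiny rating set are all assumptions chosen for illustration.

```python
import math
import random


def predict(model, u, i):
    """Predicted rating: mean + user bias + item bias + dot(P[u], Q[i])."""
    mu, bu, bi, P, Q = model
    return mu + bu[u] + bi[i] + sum(pf * qf for pf, qf in zip(P[u], Q[i]))


def train_biased_mf(ratings, num_users, num_items, k=2, eta=0.01, lam=0.05,
                    iterations=200, seed=0):
    """Fit biases and k latent factors by SGD over repeated passes."""
    rng = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)
    P = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(num_users)]
    Q = [[rng.gauss(0.0, 0.1) for _ in range(k)] for _ in range(num_items)]
    bu = [0.0] * num_users
    bi = [0.0] * num_items
    for _ in range(iterations):  # each pass re-reads the same training data
        for u, i, r in ratings:
            err = r - predict((mu, bu, bi, P, Q), u, i)
            bu[u] += eta * (err - lam * bu[u])
            bi[i] += eta * (err - lam * bi[i])
            for f in range(k):
                pf, qf = P[u][f], Q[i][f]
                P[u][f] += eta * (err * qf - lam * pf)
                Q[i][f] += eta * (err * pf - lam * qf)
    return mu, bu, bi, P, Q


def rmse(model, ratings):
    return math.sqrt(
        sum((r - predict(model, u, i)) ** 2 for u, i, r in ratings) / len(ratings)
    )


# Illustrative (user, item, rating) triples with zero-based indices.
ratings = [(0, 1, 5), (0, 4, 3), (1, 0, 2), (1, 2, 1), (2, 1, 3),
           (2, 3, 4), (3, 0, 1), (3, 2, 4), (3, 4, 5)]
model = train_biased_mf(ratings, num_users=4, num_items=5)
mean_rating = model[0]
baseline_rmse = math.sqrt(
    sum((r - mean_rating) ** 2 for _, _, r in ratings) / len(ratings)
)
print(baseline_rmse, rmse(model, ratings))
```

Because every pass revisits the same ratings, keeping the training data close to the learner is what makes many iterations affordable.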
![Page 32: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/32.jpg)
Prediction of Matrix Factorization
![Page 33: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/33.jpg)
Agenda
1. Introduction to Hivemall
2. Recommendation 101
3. Matrix Factorization
4. Bayesian Personalized Ranking
Still in Beta but will officially be supported soon
![Page 34: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/34.jpg)
Implicit Feedback
A naïve ☹ approach by filling unknown cells as negative
![Page 35: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/35.jpg)
Sampling scheme for Implicit Feedback
Sample pairs <u, i, j> of Positive Item i and Negative Item j for each User u
• Uniform user sampling
  ➢ Sample a user. Then, sample a pair.
• Uniform pair sampling
  ➢ Sample pairs directly (dist. along w/ original dataset)
• With-replacement or without-replacement sampling
[Table: user-item matrix (User 1 … User U by Item 1 … Item I) in which ⭕ marks observed feedback: two cells for User 1, two for User 2, two for the "…" row, and three for User U; all other cells are empty]
Default Hivemall sampling scheme:
- Uniform user sampling
- With replacement
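The default scheme named on this slide (uniform user sampling, with replacement) can be sketched in Python as follows. This is only an illustration of the sampling idea, not Hivemall's code; the `sample_triples` helper, its arguments, and the toy feedback data are hypothetical.

```python
import random


def sample_triples(positives, all_items, n, seed=0):
    """Draw n (u, i, j) triples: i is an observed item of u, j is not.

    positives maps each user to the set of items with observed feedback.
    """
    rng = random.Random(seed)
    # Only users with at least one observed and one unobserved item qualify.
    users = sorted(
        u for u, items in positives.items()
        if items and len(items) < len(all_items)
    )
    triples = []
    for _ in range(n):
        u = rng.choice(users)                             # uniform over users
        i = rng.choice(sorted(positives[u]))              # positive item
        j = rng.choice(sorted(all_items - positives[u]))  # unobserved item
        triples.append((u, i, j))  # with replacement: repeats are allowed
    return triples


# Hypothetical positive-only feedback.
all_items = {"item1", "item2", "item3", "item4"}
positives = {
    "user1": {"item2", "item4"},
    "user2": {"item1", "item3"},
    "user3": {"item1", "item3", "item4"},
}
triples = sample_triples(positives, all_items, n=10)
print(triples[:3])
```

Sampling a user first gives every user the same chance of contributing a training pair, whereas uniform pair sampling would follow the distribution of the original dataset.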
![Page 36: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/36.jpg)
Bayesian Personalized Ranking
• Rendle et al., “BPR: Bayesian Personalized Ranking from Implicit Feedback”, Proc. UAI, 2009.
• A most proven (?) algorithm for recommendation on implicit feedback
Key assumption: user u prefers item i over non-observed item j
![Page 37: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/37.jpg)
Bayesian Personalized Ranking
Image taken from Rendle et al., “BPR: Bayesian Personalized Ranking from Implicit Feedback”, Proc. UAI, 2009. http://www.algo.uni-konstanz.de/members/rendle/pdf/Rendle_et_al2009-Bayesian_Personalized_Ranking.pdf
BPR-MF's task can be considered as filling the item-item matrix with 0/1 and getting the probability of i >u j
![Page 38: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/38.jpg)
Train by BPR-Matrix Factorization
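The training query on this slide was an image and is not in the transcript. As a reminder of what a single BPR-MF learning step looks like, the Python sketch below applies the stochastic gradient update from Rendle et al. to one sampled triple (u, i, j). The function name, learning rate, regularization value, and toy factors are assumptions for illustration and say nothing about Hivemall's internals.

```python
import math
import random


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def bpr_update(P, Q, u, i, j, eta=0.05, lam=0.01):
    """One SGD ascent step on ln sigmoid(x_uij) with L2 regularization.

    x_uij = P[u].Q[i] - P[u].Q[j] is how much user u prefers i over j.
    Returns x_uij as computed before the update.
    """
    x_uij = sum(pf * (qi - qj) for pf, qi, qj in zip(P[u], Q[i], Q[j]))
    g = 1.0 - sigmoid(x_uij)  # gradient scale: large when the pair is mis-ranked
    for f in range(len(P[u])):
        pf, qi, qj = P[u][f], Q[i][f], Q[j][f]
        P[u][f] += eta * (g * (qi - qj) - lam * pf)
        Q[i][f] += eta * (g * pf - lam * qi)
        Q[j][f] += eta * (-g * pf - lam * qj)
    return x_uij


# Toy model: one user, two items, four latent factors.
rng = random.Random(42)
P = [[rng.gauss(0.0, 0.1) for _ in range(4)]]
Q = [[rng.gauss(0.0, 0.1) for _ in range(4)] for _ in range(2)]
first = bpr_update(P, Q, u=0, i=0, j=1)
for _ in range(500):
    last = bpr_update(P, Q, u=0, i=0, j=1)
print(first, last)
```

Repeating the update on the same triple pushes the score of the observed item above that of the unobserved one, which is exactly the key assumption of BPR.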
![Page 39: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/39.jpg)
Predict by BPR-Matrix Factorization
![Page 40: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/40.jpg)
Predict by BPR-Matrix Factorization
![Page 41: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/41.jpg)
Predict by BPR-Matrix Factorization
![Page 42: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/42.jpg)
Recommendation for Implicit Feedback Dataset
1. Efficient top-k computation is important for prediction: O(U*I)
2. Memory consumption is heavy where the item size |i| is large
   • MyMediaLite requires lots of memory
   • Maximum data size of Movielens: 33,000 movies by 240,000 users, 20 million ratings
3. Better to avoid computing predictions every time
![Page 43: Recommendation 101 using Hivemall](https://reader034.vdocuments.us/reader034/viewer/2022042908/58f1ed281a28ab411b8b45bb/html5/thumbnails/43.jpg)
We support machine learning in the Cloud
Any feature request? Or, questions?