Final presentation
Project 4: Machine Learning
Josef Koini [ #24 ]
Knowledge Discovery and Data Mining 2
TU Graz
29 June 2017
Recapitulation
29 June 2017 [KDDM2] Final presentation 2
What happened before...
- Problem: automatic tagging of songs
- Data set: Last.fm data set
- Planned approach: classification with Naive Bayes
Approach
Example
{
    "artist": "Neil Diamond",
    "timestamp": "2011-08-09 02:43:27.936416",
    "similars": [["TRWERMW128F92D19EB", 0.891814],
                 ...,
                 ["TREJCAS128F9309618", 0.00048260300000000001]],
    "tags": [["Soundtrack", "100"],
             ["soft rock", "33"],
             ["Neil Diamond", "33"],
             ["brooklyn connections", "16"],
             ["new york connections", "16"],
             ["stage-and-screen", "16"],
             ...,
             ["diamond", "16"],
             ["male vocalists", "16"],
             ["cinematic", "16"],
             ["american", "16"]],
    "track_id": "TRJFKKR128F92D1950",
    "title": "Dear Father"
}
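A minimal sketch of how one such record could be read before feature preparation. The record below is an abbreviated copy of the slide example (elided entries are simply left out), and `parse_track` is a hypothetical helper name, not something taken from the project.

```python
import json

# Abbreviated record in the shape shown on the slide; the elided
# "similars" and "tags" entries are omitted rather than guessed.
record_json = """
{
  "artist": "Neil Diamond",
  "tags": [["Soundtrack", "100"], ["soft rock", "33"], ["Neil Diamond", "33"]],
  "track_id": "TRJFKKR128F92D1950",
  "title": "Dear Father"
}
"""


def parse_track(text):
    """Return (title, artist, {tag: weight}) for one track record."""
    record = json.loads(text)
    # Tag weights are stored as strings in the data, so convert them to int.
    tags = {name: int(weight) for name, weight in record.get("tags", [])}
    return record["title"], record["artist"], tags


title, artist, tags = parse_track(record_json)
print(title, artist, tags)
```

The title and artist strings become the text features, while the tag names become the labels to predict.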
Feature preparation
- Bag of words approach
  - Title
  - Artist
- Bigrams
- Minimum feature appearance
- Tf-idf
- Feature selection
- Removing tracks without tags from training set
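The steps listed above could be sketched with scikit-learn roughly as follows. The toy titles, the labels, the threshold `min_df=2`, and the use of chi-squared scoring with `k=2` are all assumptions for illustration; the slides do not state which settings or selection method were used.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2

# Hypothetical toy data: track titles and whether each track carries some tag.
titles = [
    "dear father",
    "dear father again",
    "sweet caroline",
    "sweet child",
    "father and son",
]
has_tag = [1, 1, 0, 0, 1]

# Bag of words with unigrams and bigrams, tf-idf weighting, and a minimum
# feature appearance: a term must occur in at least 2 titles to be kept.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=2)
X = vectorizer.fit_transform(titles)

# Feature selection: keep the 2 features with the highest chi-squared score.
selector = SelectKBest(chi2, k=2)
X_selected = selector.fit_transform(X, has_tag)

print(sorted(vectorizer.vocabulary_))
print(X_selected.shape)
```

In this toy corpus only "dear", "dear father", "father", and "sweet" occur in two or more titles, so they are the only terms that survive the minimum-appearance filter.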
Classification
- Scikit-learn
- Multinomial Naive Bayes classifier
- Top n tags
- 2 classifiers per tag
  - Title
  - Artist
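A hedged sketch of the "2 classifiers per tag" idea: for each tag, one binary Multinomial Naive Bayes model is trained on title words and one on artist words. The slides do not say how the two outputs were combined, so averaging the two probabilities is an assumption, as are the toy tracks and the helper names `train_for_tag` and `predict_tags`. Plain word counts are used here instead of tf-idf to keep the example short.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical training tracks: (title, artist, set of tags).
tracks = [
    ("Dear Father", "Neil Diamond", {"soft rock"}),
    ("Sweet Caroline", "Neil Diamond", {"soft rock"}),
    ("Master of Puppets", "Metallica", {"metal"}),
    ("Enter Sandman", "Metallica", {"metal"}),
]
top_tags = ["soft rock", "metal"]  # stands in for the "top n tags"


def train_for_tag(tag):
    """Train one title model and one artist model for a single tag."""
    y = [int(tag in tags) for _, _, tags in tracks]
    models = []
    for column in (0, 1):  # 0 = title, 1 = artist
        vectorizer = CountVectorizer()
        X = vectorizer.fit_transform([track[column] for track in tracks])
        models.append((vectorizer, MultinomialNB().fit(X, y)))
    return models


classifiers = {tag: train_for_tag(tag) for tag in top_tags}


def predict_tags(title, artist):
    """Assign a tag when the averaged probability of the two models exceeds 0.5."""
    predicted = []
    for tag, models in classifiers.items():
        probabilities = [
            clf.predict_proba(vec.transform([text]))[0][1]
            for (vec, clf), text in zip(models, (title, artist))
        ]
        # Assumption: combine the two models by averaging their probabilities.
        if sum(probabilities) / len(probabilities) > 0.5:
            predicted.append(tag)
    return predicted


print(predict_tags("Nothing Else Matters", "Metallica"))
```

Because none of the title words in the usage example were seen during training, the title model falls back to the class prior and the artist model decides the outcome.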
Results
Evaluation
- Accuracy
- Precision
- Recall
- F1-measure
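For reference, the four measures can be computed from the confusion counts of a single binary tag decision as sketched below. The labels and predictions are made-up illustration data, and `binary_metrics` is a hypothetical helper name.

```python
def binary_metrics(y_true, y_pred):
    """Return accuracy, precision, recall and F1 for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return accuracy, precision, recall, f1


# Made-up example: 3 tracks carry the tag, 5 do not.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
accuracy, precision, recall, f1 = binary_metrics(y_true, y_pred)
print(accuracy, precision, recall, f1)
```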
Problems
- Unbalanced classes
- Tracks without tags
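A small illustration of why unbalanced classes are a problem for evaluation, using made-up numbers: for a tag carried by only 1% of tracks, a classifier that never assigns the tag still reaches 99% accuracy while finding none of the tagged tracks.

```python
# Hypothetical rare tag: 10 of 1000 tracks carry it.
labels = [1] * 10 + [0] * 990
# Trivial classifier that never assigns the tag.
predictions = [0] * len(labels)

correct = sum(1 for t, p in zip(labels, predictions) if t == p)
accuracy = correct / len(labels)

true_positives = sum(1 for t, p in zip(labels, predictions) if t == 1 and p == 1)
recall = true_positives / sum(labels)

print(accuracy, recall)
```

This is why precision, recall, and F1 are reported alongside accuracy.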
Tags
Tag distribution
Accuracy
[Chart. x-axis: 1 to 1000 (presumably the top n tags); y-axis: 0% to 100%. Data labels in x-axis order:]
n       Accuracy
1       86.12%
2       87.89%
5       90.78%
10      87.66%
20      92.40%
50      94.51%
100     96.38%
200     97.71%
500     98.83%
1000    99.31%
Precision
[Chart. x-axis: 1 to 1000 (presumably the top n tags); y-axis: 0% to 100%. Data labels in x-axis order:]
n       Precision
1       64.57%
2       64.38%
5       64.43%
10      42.54%
20      51.03%
50      49.47%
100     53.11%
200     55.51%
500     57.97%
1000    59.05%
Recall
[Chart. x-axis: 1 to 1000 (presumably the top n tags); y-axis: 0% to 100%. Data labels in x-axis order:]
n       Recall
1       68.15%
2       63.28%
5       62.33%
10      65.30%
20      57.71%
50      54.63%
100     46.12%
200     40.15%
500     33.68%
1000    29.29%
F1
[Chart. x-axis: 1 to 1000 (presumably the top n tags); y-axis: 0% to 100%. Data labels in x-axis order:]
n       F1
1       66.31%
2       63.82%
5       63.36%
10      51.52%
20      54.17%
50      51.92%
100     49.37%
200     46.60%
500     42.60%
1000    39.16%
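As a consistency check, the reported F1 values can be recomputed from the reported precision and recall with F1 = 2PR / (P + R). The sketch assumes the data labels of each chart are listed in x-axis order (1, 2, 5, ..., 1000); the variable names are illustrative.

```python
# Values in percent, transcribed from the Precision, Recall and F1 charts,
# assuming the data labels appear in x-axis order.
n_values = [1, 2, 5, 10, 20, 50, 100, 200, 500, 1000]
precision = [64.57, 64.38, 64.43, 42.54, 51.03, 49.47, 53.11, 55.51, 57.97, 59.05]
recall = [68.15, 63.28, 62.33, 65.30, 57.71, 54.63, 46.12, 40.15, 33.68, 29.29]
reported_f1 = [66.31, 63.82, 63.36, 51.52, 54.17, 51.92, 49.37, 46.60, 42.60, 39.16]

# Harmonic mean of precision and recall for each x-axis value.
computed_f1 = [2 * p * r / (p + r) for p, r in zip(precision, recall)]

# Largest deviation between recomputed and reported F1, in percentage points.
max_difference = max(abs(c - f) for c, f in zip(computed_f1, reported_f1))

for n, c, f in zip(n_values, computed_f1, reported_f1):
    print(f"n={n}: computed {c:.2f}%, reported {f:.2f}%")
```

The recomputed values agree with the reported ones to within rounding, which supports reading the data labels in x-axis order.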
Conclusion
To put it in a nutshell...
- Number of tags influences the strategy
- Efficient method
- Fairly good results
Questions?
Don't hesitate to ask ;)