What is Jubatus? How it works for you?
TRANSCRIPT
![Page 1: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/1.jpg)
What is Jubatus? How it works for you?
NTT SIC Hiroki Kumazaki
![Page 2: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/2.jpg)
Jubatus is…
• A Distributed Online Machine-Learning framework
• Distributed
– Fault-Tolerance
– Scale out
• Online
– Fixed time computation
• Machine-Learning
– More than “word count”!
![Page 3: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/3.jpg)
Architecture
• The ML model is combined with a feature extractor
Machine Learning Model
Feature Extractor
Jubatus Server
Jubatus RPC
![Page 4: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/4.jpg)
Architecture
• Distributed Computation
– Shared-Everything Architecture
• It’s fast and fault-tolerant!
Mix
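The diagram only labels the synchronization step "Mix". As a rough illustration, and assuming that mixing amounts to averaging each server's linear-model weights (the slides do not describe the actual protocol), the idea could be sketched like this. The function name `mix` and the dictionary layout are hypothetical.

```python
from collections import defaultdict

def mix(local_models):
    """Average sparse weight dictionaries collected from several servers.

    Assumption: every server holds a {feature: weight} dict and the mixed
    model is the plain average. This is an illustration, not Jubatus code.
    """
    totals = defaultdict(float)
    for model in local_models:
        for feature, weight in model.items():
            totals[feature] += weight
    n = len(local_models)
    # A feature that is missing on a server counts as weight 0 there.
    return {feature: total / n for feature, total in totals.items()}

server_a = {"de": 1.0, "ef": 0.5}
server_b = {"de": 0.0, "nd": -1.0}
mixed = mix([server_a, server_b])
print(mixed)
```

After such a step every server would continue training from the same mixed weights.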
![Page 5: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/5.jpg)
Architecture
• It looks as if a single server is running.
Client
Jubatus RPC
Proxy
![Page 6: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/6.jpg)
Architecture
• It looks as if a single server is running
– You can use a single local Jubatus server for development
– A cluster of multiple Jubatus servers for production
Client
Jubatus RPC
The same RPC!
![Page 7: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/7.jpg)
Architecture
• With heavy load…
Client
Jubatus RPC
Proxy
![Page 8: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/8.jpg)
Architecture
• Dynamically scale out!
Client
Jubatus RPC
Proxy
![Page 9: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/9.jpg)
Architecture
• Whenever servers break down
– The proxy conceals failures, so the service will continue.
Client
Jubatus RPC
Proxy
![Page 10: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/10.jpg)
Architecture
• Multilanguage client library
– gem, pip, cpan, maven ready!
– It essentially uses MessagePack-RPC.
• So you can use OCaml, Haskell, JavaScript, or Go at your own risk.
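Because the clients speak MessagePack-RPC, a request is just a four-element array `[0, msgid, method, params]` and a response is `[1, msgid, error, result]`. The sketch below builds and checks those arrays as plain Python lists; on the wire they would be MessagePack-encoded, which is left out here. The method name `classify` and its argument are placeholders, not the confirmed Jubatus signature.

```python
import itertools

REQUEST, RESPONSE = 0, 1            # message type codes from the MessagePack-RPC spec
_msgid = itertools.count(1)         # each request gets a fresh message id

def build_request(method, params):
    """Return a MessagePack-RPC request as a plain Python list."""
    return [REQUEST, next(_msgid), method, list(params)]

def parse_response(message, expected_msgid):
    """Validate a decoded response and return its result, or raise on error."""
    msg_type, msgid, error, result = message
    if msg_type != RESPONSE or msgid != expected_msgid:
        raise ValueError("unexpected message: %r" % (message,))
    if error is not None:
        raise RuntimeError(error)
    return result

# Hypothetical call: method name and argument layout are illustrative only.
request = build_request("classify", ["def fib(a): ..."])
fake_response = [RESPONSE, request[1], None, "python"]
print(request, parse_response(fake_response, request[1]))
```

Any language with a MessagePack library can produce the same arrays, which is why unofficial clients are possible.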
Client
Jubatus RPC
![Page 11: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/11.jpg)
Architecture
• Many ML algorithms
– Classifier
– Recommender
– Anomaly Detection
– Clustering
– Regression
– Graph Mining
Useful!
![Page 12: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/12.jpg)
Classifier
• Task: Classification of Datum
import sys

def fib(a):
    if a == 1 or a == 0:
        return 1
    else:
        return fib(a-1) + fib(a-2)

if __name__ == "__main__":
    print(fib(int(sys.argv[1])))

def fib(a)
  if a == 1 or a == 0
    1
  else
    return fib(a-1) + fib(a-2)
  end
end

if __FILE__ == $0
  puts fib(ARGV[0].to_i)
end

Sample Task: Classify which programming language is used
It’s Python! It’s Ruby!
![Page 13: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/13.jpg)
Classifier
• Set configuration in the Jubatus server
Classifier
Feature Extractor
"converter": {
  "string_types": {
    "bigram": { "method": "ngram", "char_num": "2" }
  },
  "string_rules": [
    { "key": "*", "type": "bigram", "sample_weight": "tf", "global_weight": "idf" }
  ]
}
Feature Extractor
![Page 14: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/14.jpg)
Classifier
• Configuration JSON
– It does “feature vector design”
– A very important step for machine learning
"converter": {
  "string_types": {
    "bigram": { "method": "ngram", "char_num": "2" }
  },
  "string_rules": [
    { "key": "*", "type": "bigram", "sample_weight": "tf", "global_weight": "idf" }
  ]
}
"string_types": settings for extracting features from strings
"bigram": defines a function named “bigram”
"method": "ngram": the built-in function “ngram”
"char_num": "2": passes “2” to “ngram” to create “bigram”
"key": "*", "type": "bigram": applies “bigram” to all data
"sample_weight", "global_weight": feature weights based on tf-idf (see the Wikipedia article on tf-idf)
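To make the "sample_weight": "tf" and "global_weight": "idf" settings concrete, here is a small sketch that weights character bigrams by tf × idf. It assumes the textbook form idf = log(N / df); the exact variant Jubatus applies may differ, and the helper names are made up for this example.

```python
import math
from collections import Counter

def bigrams(text):
    """Count overlapping character bigrams in a string."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

def tfidf(documents):
    """Weight each bigram by tf * idf, with idf = log(N / df).

    Assumption: textbook formula, not necessarily the one Jubatus uses.
    """
    counts = [bigrams(doc) for doc in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents each bigram occurs.
    doc_freq = Counter(key for c in counts for key in c)
    return [
        {key: tf * math.log(n_docs / doc_freq[key]) for key, tf in c.items()}
        for c in counts
    ]

weights = tfidf(["def f", "def g"])
print(weights[0])
```

Bigrams shared by every document (such as "de") end up with weight 0, so only the distinguishing bigrams carry signal.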
![Page 15: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/15.jpg)
Classifier
• The Feature Extractor becomes a “bigram extractor”
Classifier
bigram extractor
![Page 16: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/16.jpg)
Feature Extractor
• What does the bigram extractor do?
bigram extractor
import sys

def fib(a):
    if a == 1 or a == 0:
        return 1
    else:
        return fib(a-1) + fib(a-2)

if __name__ == "__main__":
    print(fib(int(sys.argv[1])))

key value
im 1
mp 1
po 1
... ...
): 1
... ...
de 1
ef 1
... ...
Feature Vector
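The key/value table above can be reproduced with a few lines of Python. This is only a sketch of what a character-bigram extractor does, not the Jubatus implementation; `bigram_features` is a name chosen for the example.

```python
from collections import Counter

def bigram_features(text):
    """Map a string to {bigram: count}, one entry per distinct character pair."""
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

features = bigram_features("import sys")
for key in ("im", "mp", "po"):
    print(key, features[key])
```

Whitespace and punctuation are ordinary characters here, which is why pairs such as "):" show up as features.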
![Page 17: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/17.jpg)
Classifier
• Training the model with feature vectors
key value
im 1
mp 1
po 1
... ...
): 1
... ...
de 1
ef 1
... ...

Classifier

key value
pu 1
ut 1
... ...
{| ...
|m 1
m| 1
{| 1
en 1
nd 1

key value
@a 1
$_ 1
... ...
my ...
su 1
ub 1
us 1
se 1
... ...
![Page 18: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/18.jpg)
Classifier
• Set configuration in the Jubatus server
Classifier
"method": "AROW",
"parameter": {
  "regularization_weight": 1.0
}
Feature Extractor
bigram extractor
Classifier Algorithms
• Perceptron
• Passive Aggressive
• Confidence Weighted
• Adaptive Regularization of Weights (AROW)
• Normal Herd
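To give a feel for what an online algorithm such as AROW does per example, here is a minimal binary sketch with a diagonal covariance. It follows the published AROW update rule; the parameter `r` below is the regularization term from that rule, and how it relates to the "regularization_weight" value in the Jubatus configuration is an assumption not confirmed by these slides.

```python
from collections import defaultdict

class DiagonalArow:
    """Binary AROW with a diagonal covariance; labels are +1 or -1."""

    def __init__(self, r=1.0):
        self.r = r                               # regularization term of the update rule
        self.mean = defaultdict(float)           # weight means, start at 0
        self.var = defaultdict(lambda: 1.0)      # per-feature variances, start at 1

    def score(self, x):
        return sum(self.mean.get(k, 0.0) * v for k, v in x.items())

    def train(self, x, y):
        margin = y * self.score(x)
        if margin >= 1.0:
            return                               # already correct with enough margin
        confidence = sum(self.var[k] * v * v for k, v in x.items())
        beta = 1.0 / (confidence + self.r)
        alpha = (1.0 - margin) * beta
        for k, v in x.items():
            self.mean[k] += alpha * y * self.var[k] * v   # move the weight
            self.var[k] -= beta * (self.var[k] * v) ** 2  # shrink its uncertainty

model = DiagonalArow(r=1.0)
model.train({"de": 1, "ef": 1}, +1)   # e.g. a Python snippet
model.train({"en": 1, "nd": 1}, -1)   # e.g. a Ruby snippet
print(model.score({"de": 1}), model.score({"nd": 1}))
```

Each example is seen once and then discarded, which is what keeps the per-datum cost fixed.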
![Page 19: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/19.jpg)
Classifier
• Use the model for a classification task
– Jubatus will find clues for classification
AROW
key value
si 1
il 1
... ...
{| 1
... ...
It’s Ruby!
![Page 20: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/20.jpg)
Classifier
• Use the model for a classification task
– Jubatus will find clues for classification
AROW
key value
re 1
): 1
... ...
s[ 1
... ...
It’s Python!
![Page 21: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/21.jpg)
Via RPC
• Call feature extraction and classification from the client via RPC
AROW
bigram extractor
lang = client.classify([sourcecode])
import sys

def fib(a):
    if a == 1 or a == 0:
        return 1
    else:
        return fib(a-1) + fib(a-2)

if __name__ == "__main__":
    print(fib(int(sys.argv[1])))

key value
im 1
mp 1
po 1
... ...
): 1
... ...
de 1
ef 1
... ...
It may be Python!
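The train-then-classify round trip can be tried without a server. The class below is a local stand-in written for this example (it is not the Jubatus client library): it extracts bigrams and runs a simple one-vs-rest perceptron instead of AROW, and its `train` / `classify` shapes are only loosely modeled on the call shown above.

```python
from collections import Counter, defaultdict

def bigrams(text):
    return Counter(text[i:i + 2] for i in range(len(text) - 1))

class LocalClassifier:
    """In-process stand-in for a classifier server (hypothetical, not the Jubatus client)."""

    def __init__(self):
        self.weights = {}                        # label -> {bigram: weight}

    def _score(self, label, features):
        w = self.weights[label]
        return sum(w.get(k, 0.0) * v for k, v in features.items())

    def train(self, labeled_sources):
        for label, source in labeled_sources:
            features = bigrams(source)
            self.weights.setdefault(label, defaultdict(float))
            # One-vs-rest perceptron: nudge every label that scores on the wrong side.
            for known, w in self.weights.items():
                target = 1.0 if known == label else -1.0
                if target * self._score(known, features) <= 0.0:
                    for k, v in features.items():
                        w[k] += target * v

    def classify(self, sources):
        results = []
        for source in sources:
            features = bigrams(source)
            scored = [(label, self._score(label, features)) for label in self.weights]
            results.append(sorted(scored, key=lambda pair: pair[1], reverse=True))
        return results

client = LocalClassifier()
client.train([("python", "def f(x):"), ("ruby", "puts x end")])
lang = client.classify(["def g(y):"])
print(lang[0][0][0])
```

With a real cluster the same two calls would travel over RPC, and the proxy would decide which server handles them.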
![Page 22: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/22.jpg)
What can the classifier do?
• You can
– estimate the topic of tweets
– trash spam mail automatically
– monitor server failures from syslog
– estimate user sentiment from blog posts
– detect malicious attacks
– find which feature is the best clue for classification
![Page 23: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/23.jpg)
What the classifier cannot do
• You cannot
– train a model from data without supervised answers (labels)
– create a class without knowledge of the class
– get a good model without correct feature design
![Page 24: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/24.jpg)
How to use?
• See the examples in http://github.com/jubatus/jubatus-example
– gender
– shogun
– malware classification
– language detection
![Page 25: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/25.jpg)
Recommender
• Task: Which datum is similar to a given datum?
Name | Star Wars | Harry Potter | Star Trek | Titanic | Frozen
John 4 3 2 2
Bob 5 3
Erika 1 3 4 5
Jack 2 5
Ann 4 5
Emily 1 4 2 5 4
Which movie should we recommend to Ann?
![Page 26: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/26.jpg)
Recommender
• Recommendations are based on Nearest Neighbor
Movie Rating (high-dimensional)
Science Fiction
Star Trek lover
John
Jack
Love Romance
Fantasy
Erika
Ann
Star Wars lover
Bob
Emily
Near
Far
![Page 27: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/27.jpg)
Recommender
• Ann and Emily are near
– we should recommend Frozen to Ann
Name | Star Wars | Harry Potter | Star Trek | Titanic | Frozen
Ann 4 5 ★
Emily 1 4 2 5 4
I bet Ann would like it!
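The nearest-neighbor reasoning can be sketched in a few lines. Emily's row is complete on the slide; the column placement of Ann's and Bob's ratings is an assumption, because the transcript lost the empty cells. Cosine similarity is used here as one possible notion of "near".

```python
import math

# Emily's ratings come straight from the slide. Which movies Ann's (4, 5)
# and Bob's (5, 3) ratings belong to is assumed for this illustration.
ratings = {
    "Ann":   {"Harry Potter": 4, "Titanic": 5},
    "Emily": {"Star Wars": 1, "Harry Potter": 4, "Star Trek": 2, "Titanic": 5, "Frozen": 4},
    "Bob":   {"Star Wars": 5, "Star Trek": 3},
}

def cosine(a, b):
    """Cosine similarity of two sparse rating vectors (missing rating = 0)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def recommend(user):
    """Return the best-rated unseen movie of the user's nearest neighbor."""
    mine = ratings[user]
    neighbor = max((u for u in ratings if u != user), key=lambda u: cosine(mine, ratings[u]))
    unseen = {m: r for m, r in ratings[neighbor].items() if m not in mine}
    return max(unseen, key=unseen.get) if unseen else None

print(recommend("Ann"))  # Frozen
```

Under these assumed placements Emily is Ann's nearest neighbor, and Frozen is the highest-rated movie Emily has seen that Ann has not.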
![Page 28: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/28.jpg)
Recommender with Feature Extractor
• The recommender server consists of a Feature Extractor and a Recommender engine.
– Jubatus calculates the distance between feature vectors
Recommender
Feature Extractor
The Recommender Engine can use
• Minhash
• Locality Sensitive Hashing
• Euclid Locality Sensitive Hashing
for defining distance.
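As an illustration of the first option, MinHash estimates how much two feature sets overlap (their Jaccard similarity) by comparing short signatures instead of the full sets. The sketch below treats features as an unweighted set and uses SHA-256 as a stand-in for random hash functions; both choices are simplifications for this example, not a description of the Jubatus internals.

```python
import hashlib

def _hash(seed, item):
    """Deterministic 64-bit hash of (seed, item); stands in for a random permutation."""
    digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def minhash_signature(items, num_hashes=64):
    """Keep the minimum hash value per seed over all items in the set."""
    return [min(_hash(seed, item) for item in items) for seed in range(num_hashes)]

def estimated_jaccard(sig_a, sig_b):
    """The fraction of matching signature slots estimates Jaccard similarity."""
    matches = sum(1 for a, b in zip(sig_a, sig_b) if a == b)
    return matches / len(sig_a)

ruby = {"pu", "ut", "en", "nd"}
ruby_like = {"pu", "ut", "en", "do"}
python = {"im", "mp", "de", "ef"}

sig = minhash_signature(ruby)
print(estimated_jaccard(sig, minhash_signature(ruby_like)))
print(estimated_jaccard(sig, minhash_signature(python)))
```

Signatures have a fixed length regardless of how many features a datum has, which keeps comparisons cheap.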
![Page 29: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/29.jpg)
Recommender with Feature Extractor
• Jubatus maps data into feature space
– There are distances between data
• How near or far are they?
key value
pu 1
ut 1
... ...
{| ...
|m 1
m| 1
{| 1
FeatureExtractor
key value
im 1
mp 1
... ...
... ...
"{ 1
fo 1
... ...
key value
Ma 1
ap 1
... ...
in 1
nt 1
te 1
er 1
Recommender
Ruby
Python
Java
![Page 30: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/30.jpg)
What can the Recommender do?
• You can
– create a recommendation engine for e-commerce
– calculate the similarity of tweets
– find NBA players with similar tendencies
– visualize the distance between “Star Wars” and “Star Trek”
![Page 31: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/31.jpg)
What can the Recommender not do?
• You cannot
– label data (use the classifier!)
– get a decision tree
– get Apriori-based recommendations
![Page 32: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/32.jpg)
Anomaly Detection
• Task: Which datum is far from the others?
![Page 33: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/33.jpg)
Anomaly Detection
• Task: Which datum is far from the others?
This One!
![Page 34: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/34.jpg)
Anomaly Detection
• Distance-based detection is not good enough
– We cannot decide on an appropriate distance threshold
Distance is equal!
![Page 35: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/35.jpg)
Anomaly Detection with Feature Extractor
• The anomaly detection server consists of a Feature Extractor and an anomaly detection engine.
– Jubatus finds outliers among feature vectors
AnomalyDetection
FeatureExtractor
The Anomaly Detection Engine can use
• Minhash
• Locality Sensitive Hashing
• Euclid Locality Sensitive Hashing
for defining distance.
![Page 36: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/36.jpg)
Anomaly Detection
• jubaanomaly can do it!
– It is based on the local outlier factor (LOF) algorithm
key value
pu 1
ut 1
... ...
{| ...
|m 1
m| 1
{| 1
FeatureExtractor
key value
im 1
mp 1
... ...
... ...
"{ 1
fo 1
... ...
key value
Ma 1
ap 1
... ...
in 1
nt 1
te 1
er 1
AnomalyDetection
Outlier!
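A compact sketch of the local outlier factor idea: each point's density is compared with the density around its nearest neighbors, so no global distance threshold is needed. This version uses exact Euclidean distances and exactly k neighbors (ties broken by index), which is a simplification; jubaanomaly's own implementation, with its hashing-based neighbor search, is not shown in the slides.

```python
import math

def local_outlier_factors(points, k=2):
    """Return one LOF score per point; scores well above 1 suggest outliers.

    Simplification: exactly k neighbors are used, and duplicate points
    (zero distances) are not handled.
    """
    n = len(points)
    neighbors, k_dist = [], []
    for i in range(n):
        by_distance = sorted((math.dist(points[i], points[j]), j) for j in range(n) if j != i)
        nearest = by_distance[:k]
        neighbors.append([j for _, j in nearest])
        k_dist.append(nearest[-1][0])            # distance to the k-th neighbor

    def reach_dist(i, j):
        return max(k_dist[j], math.dist(points[i], points[j]))

    # Local reachability density: inverse of the mean reachability distance.
    lrd = [len(neighbors[i]) / sum(reach_dist(i, j) for j in neighbors[i]) for i in range(n)]
    # LOF: average neighbor density divided by the point's own density.
    return [sum(lrd[j] for j in neighbors[i]) / (len(neighbors[i]) * lrd[i]) for i in range(n)]

cluster = [(0, 0), (0, 1), (1, 0), (1, 1)]
scores = local_outlier_factors(cluster + [(5, 5)])
print([round(s, 2) for s in scores])
```

The four clustered points score about 1, while the isolated point scores several times higher.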
![Page 37: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/37.jpg)
What can Anomaly Detection do?
• You might be able to
– find outliers
– grasp the trend and overview of the current data stream
– detect or predict server failures
– protect Web services from zero-day attacks
![Page 38: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/38.jpg)
What can Anomaly Detection not do?
• You cannot
– know the cluster distribution of the data
– find every kind of outlier with 100% accuracy
– easily understand how each outlier occurs
– know why a datum is assigned a high outlier score
![Page 39: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/39.jpg)
Conclusion
• Jubatus has a feature extractor embedded alongside its algorithms.
• Users should configure both the feature extractor and the algorithm properly.
• Clients use the configured machine learning via Jubatus RPC.
• Classifier, Recommender, and Anomaly Detection may be useful for your task.
![Page 40: What is jubatus? How it works for you?](https://reader036.vdocuments.us/reader036/viewer/2022062406/5587a06cd8b42a31368b45a1/html5/thumbnails/40.jpg)
DEMO
• I will try running jubatus-example.