Distributed Machine Learning: 1. A New Era
Distributed Machine Learning
Yi Wang
Story Outline
• Use existing frameworks (2007~2010)
• Methods: Frequent itemset mining, Collaborative filtering, Spectral clustering, Graph partitioning, Restricted Boltzmann machine, Latent topic modeling
• Frameworks: MPI, MapReduce, Pregel, GBR
• Developing frameworks (2010~2014)
• MapReduce Lite (C++) for language models
• Peacock (Go) for latent topic modeling
Lessons
• Internet services rely on machine intelligence
• Intelligence comes from learning users’ behavior
• Value lies in long tails
• It is more about big than fast
• Good system = good algorithm + good architecture
• More about engineering than math
• It is an Industrial Revolution!
Pitfalls
• De-noise data
• Parallelize models in papers and textbooks
• Use existing frameworks
• MPI
• Mix frameworks with cluster operating systems
• Less talking about production
• Use standard measures
• Java or Python
Environment
• Balance business with killing tech development
• Separated
• Combined
• Switching
• Standalone business
• ML software
• ML frameworks
• MLaaP or MLaaS
Webpages | Description | Support
www.openmoko.org, www.grandcentral.com, www.zyb.com | Cell phone software download related web sites. | 2607
www.simpy.com, www.furl.net, www.connotea.org, del.icio.us, www.masternewmedia.org/news/2006/12/01/social bookmarking services and tools.htm | Four social bookmarking services web sites, including del.ici.os. The last one is an article it discuss this. | 242
www.troovy.com, www.flagr.com, http://outside.in, www.wayfaring.com, http://flickrvision.com | Five online maps. | 240
mail.google.com/mail, www.google.com/ig, gdisk.sourceforge.net, www.netvibes.com | Web sites related to GMail service. | 204
www.trovando.it, www.kartoo.com, www.snap.com, www.clusty.com, www.aldaily.com, www.quintura.com | Six fancy search engines. | 151
wwwl.meebo.com, www.ebuddy.com, www.plugoo.com | Integrated instant message software web sites. | 112
www.easyhotel.com, www.hostelz.com, www.couchsurfing.com, www.tripadvisor.com, www.kayak.com | Traveling agency web sites. | 109
www.easyjet.com/it/prenota, www.ryanair.com/site/IT, www.edreams.it, www.expedia.it, www.volagratis.com/vg1, www.skyscanner.net | Italian traveling agency web sites. | 98
www.google.com/codesearch, www.koders.com, www.bigbold.com/snippets, www.gotapi.com, 0xcc.net/blog/archives/000043.html | Three code search web sites and two articles talking about code search. | 98
www.google.co.jp, www.livedoor.com, www.baidu.jp, www.namaan.net | Four Japanese web search engines. | 36
www.operator11.com, www.joost.com, www.keepvid.com, www.getdemocracy.com, www.masternewmedia.org | TV, media streaming related web sites. | 34
www.technorati.com, www.listible.com, www.popurls.com | From these web sites, you can get relevant resource quickly. | 17
www.trobar.org/prosody, librarianchick.pbwiki.com, www.quotationspage.com, www.visuwords.com | All web sites are about literature. | 9
Figure 5: Examples of mining tag-tags and webpage-webpages relationships.
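The "Support" column in Figure 5 is the usual frequent-itemset measure: the number of transactions that contain every item of the set. A minimal sketch of that count follows. The `support` function and the toy transactions (each one standing for the pages a single user bookmarked) are assumptions for illustration, not the data or the code behind the figure.

```go
package main

import "fmt"

// support counts how many transactions contain every item in itemset.
func support(transactions [][]string, itemset []string) int {
	count := 0
	for _, t := range transactions {
		// Index the transaction so each membership check is a map lookup.
		present := make(map[string]bool, len(t))
		for _, item := range t {
			present[item] = true
		}
		containsAll := true
		for _, item := range itemset {
			if !present[item] {
				containsAll = false
				break
			}
		}
		if containsAll {
			count++
		}
	}
	return count
}

func main() {
	// Hypothetical toy data: one transaction per user.
	transactions := [][]string{
		{"www.kayak.com", "www.tripadvisor.com", "www.hostelz.com"},
		{"www.kayak.com", "www.tripadvisor.com"},
		{"www.kayak.com", "www.koders.com"},
		{"www.koders.com", "www.gotapi.com"},
	}
	fmt.Println(support(transactions, []string{"www.kayak.com", "www.tripadvisor.com"}))
	fmt.Println(support(transactions, []string{"www.koders.com"}))
}
```

This brute-force scan is only meant to pin down the definition; at web scale the counting is what a distributed framework has to parallelize.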
A long-tail distribution is scale-free. The mean and median are not meaningful summaries of it, and pie charts make no sense either.
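To make the point about summary statistics concrete, the sketch below treats the thirteen support counts listed in Figure 5 as a toy sample. That is an assumption for illustration only: thirteen rows are not the full distribution. The helper names `mean`, `median`, and `headShare` are hypothetical.

```go
package main

import (
	"fmt"
	"sort"
)

// mean returns the arithmetic mean of xs.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// median returns the middle value of xs without modifying the input.
func median(xs []float64) float64 {
	sorted := append([]float64(nil), xs...)
	sort.Float64s(sorted)
	n := len(sorted)
	if n%2 == 1 {
		return sorted[n/2]
	}
	return (sorted[n/2-1] + sorted[n/2]) / 2
}

// headShare returns the fraction of the total held by the largest value.
func headShare(xs []float64) float64 {
	sum, largest := 0.0, 0.0
	for _, x := range xs {
		sum += x
		if x > largest {
			largest = x
		}
	}
	return largest / sum
}

func main() {
	// Support counts copied from the Figure 5 table.
	support := []float64{2607, 242, 240, 204, 151, 112, 109, 98, 98, 36, 34, 17, 9}
	fmt.Printf("mean=%.1f median=%.0f head share=%.2f\n",
		mean(support), median(support), headShare(support))
}
```

On this sample the mean is about 304 while the median is 109, and the mean is larger than twelve of the thirteen values, so neither number describes a typical row. A single head item also holds roughly two thirds of the total, which is why a pie chart would show one slice and a blur.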
application | PageRank | Indexing | pCTR | DNN
framework | Pregel | MapReduce | SETI | DistBelief
middleware | Chubby (Zookeeper, etcd), Bigtable (HBase), memcachg (memcached)
cluster OS | Borg (Mesos, YARN, Kubernetes)
filesystem | GFS (HDFS)
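// Go: take the first reply, or give up after one second.
// rpc.Call is assumed to return a channel that delivers the reply.
var resp *Response
select {
case b := <-rpc.Call("B"):
	resp = extract(b)
case c := <-rpc.Call("C"):
	resp = extract(c)
case e := <-rpc.Call("E"):
	resp = extract(e)
case <-time.After(1 * time.Second):
	resp = nil
}
// use resp here.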
// Callback style: the same logic needs a lock, a counter, and a timer.
var mutex = NewMutex();
var returns = 0;
var timer = setTimeout(timeout, 1*second);

function rpcResp(resp) {
	mutex.Lock();
	if (returns == 0) {
		clearTimeout(timer);
		use(resp);
	}
	returns++;
	mutex.Unlock();
}

function timeout() {
	mutex.Lock();
	returns++;
	mutex.Unlock();
}

rpc.Call("B", rpcResp);
rpc.Call("C", rpcResp);
rpc.Call("D", rpcResp);