Distributed Machine Learning: 1. A New Era

Distributed Machine Learning Yi Wang


Post on 19-Jul-2015


Page 1: Distributed Machine Learning:  1. A New Era

Distributed Machine Learning

Yi Wang

Page 2

Story Outline

• Use existing frameworks (2007~2010)

• Methods: Frequent itemset mining, Collaborative filtering, Spectral clustering, Graph partitioning, Restricted Boltzmann machine, Latent topic modeling

• Frameworks: MPI, MapReduce, Pregel, GBR

• Developing frameworks (2010~2014)

• MapReduce Lite (C++) for language models

• Peacock (Go) for latent topic modeling

Page 3

Lessons

• Internet services rely on machine intelligence

• Intelligence comes from learning users’ behavior

• Value lies in long tails ▹

• It is more about big than fast

• Good system = good algorithm + good architecture

• More about engineering than math

• It is an Industrial Revolution!

Page 4

Pitfalls

• De-noise data ▹

• Parallelize models in papers and textbooks

• Use existing frameworks

• MPI

• Mix frameworks with cluster operating systems ▹

• Less talking about production

• Use standard measures

• Java or Python ▹

Page 5

Environment

• Balance business with killer tech development

• Separated

• Combined

• Switching

• Standalone business

• ML software

• ML frameworks

• MLaaP or MLaaS

Page 6

Page 7

| Webpages | Description | Support |
| --- | --- | --- |
| www.openmoko.org, www.grandcentral.com, www.zyb.com | Cell phone software download related web sites. | 2607 |
| www.simpy.com, www.furl.net, www.connotea.org, del.icio.us, www.masternewmedia.org/news/2006/12/01/social bookmarking services and tools.htm | Four social bookmarking services web sites, including del.ici.os. The last one is an article it discuss this. | 242 |
| www.troovy.com, www.flagr.com, http://outside.in, www.wayfaring.com, http://flickrvision.com | Five online maps. | 240 |
| mail.google.com/mail, www.google.com/ig, gdisk.sourceforge.net, www.netvibes.com | Web sites related to GMail service. | 204 |
| www.trovando.it, www.kartoo.com, www.snap.com, www.clusty.com, www.aldaily.com, www.quintura.com | Six fancy search engines. | 151 |
| wwwl.meebo.com, www.ebuddy.com, www.plugoo.com | Integrated instant message software web sites. | 112 |
| www.easyhotel.com, www.hostelz.com, www.couchsurfing.com, www.tripadvisor.com, www.kayak.com | Traveling agency web sites. | 109 |
| www.easyjet.com/it/prenota, www.ryanair.com/site/IT, www.edreams.it, www.expedia.it, www.volagratis.com/vg1, www.skyscanner.net | Italian traveling agency web sites. | 98 |
| www.google.com/codesearch, www.koders.com, www.bigbold.com/snippets, www.gotapi.com, 0xcc.net/blog/archives/000043.html | Three code search web sites and two articles talking about code search. | 98 |
| www.google.co.jp, www.livedoor.com, www.baidu.jp, www.namaan.net | Four Japanese web search engines. | 36 |
| www.operator11.com, www.joost.com, www.keepvid.com, www.getdemocracy.com, www.masternewmedia.org | TV, media streaming related web sites. | 34 |
| www.technorati.com, www.listible.com, www.popurls.com | From these web sites, you can get relevant resource quickly. | 17 |
| www.trobar.org/prosody, librarianchick.pbwiki.com, www.quotationspage.com, www.visuwords.com | All web sites are about literature. | 9 |

Figure 5: Examples of mining tag-tags and webpage-webpages relationships.


Page 8

Long-tail is scale-free. Mean and median make no sense with long-tail distributions. Pie-charts make no sense either.

Page 9

application   PageRank   Indexing    pCTR   DNN
framework     Pregel     MapReduce   SETI   DistBelief
middleware    Chubby (Zookeeper, etcd), Bigtable (HBase), memcachg (memcached)
cluster OS    Borg (Mesos, YARN, Kubernetes)
filesystem    GFS (HDFS)

Page 10

// Take whichever replica answers first, or give up after one second.
var resp *Response
select {
case b := <-rpc.Call("B"):
	resp = extract(b)
case c := <-rpc.Call("C"):
	resp = extract(c)
case e := <-rpc.Call("E"):
	resp = extract(e)
case <-time.After(1 * time.Second):
	resp = nil
}
// use resp here.

Page 11

var mutex = NewMutex();
var returns = 0;
var timer = setTimeout(timeout, 1*second);

function rpcResp(resp) {
  mutex.Lock();
  if (returns == 0) {
    clearTimeout(timer);
    use(resp);
  }
  returns++;
  mutex.Unlock();
}

function timeout() {
  mutex.Lock();
  returns++;
  mutex.Unlock();
}

rpc.Call("B", rpcResp);
rpc.Call("C", rpcResp);
rpc.Call("D", rpcResp);