Distributed Machine Learning: 1. A New Era

Post on 19-Jul-2015


Distributed Machine Learning

Yi Wang

Story Outline

• Use existing frameworks (2007~2010)

• Methods: Frequent itemset mining, Collaborative filtering, Spectral clustering, Graph partitioning, Restricted Boltzmann machine, Latent topic modeling

• Frameworks: MPI, MapReduce, Pregel, GBR

• Developing frameworks (2010~2014)

• MapReduce Lite (C++) for language models

• Peacock (Go) for latent topic modeling

Lessons

• Internet services rely on machine intelligence

• Intelligence comes from learning users’ behavior

• Value lies in long tails ▹

• It is more about big than fast

• Good system = good algorithm + good architecture

• More about engineering than math

• It is an Industrial Revolution!

Pitfalls

• De-noise data ▹

• Parallelize models in papers and textbooks

• Use existing frameworks

• MPI

• Mix frameworks with cluster operating systems ▹

• Less talking about production

• Use standard measures

• Java or Python ▹

Environment

• Balance business with killing tech development

• Separated

• Combined

• Switching

• Standalone business

• ML software

• ML frameworks

• MLaaP or MLaaS

Webpages | Description | Support
www.openmoko.org, www.grandcentral.com, www.zyb.com | Cell phone software download related web sites. | 2607
www.simpy.com, www.furl.net, www.connotea.org, del.icio.us, www.masternewmedia.org/news/2006/12/01/social bookmarking services and tools.htm | Four social bookmarking services web sites, including del.icio.us. The last one is an article that discusses this. | 242
www.troovy.com, www.flagr.com, http://outside.in, www.wayfaring.com, http://flickrvision.com | Five online maps. | 240
mail.google.com/mail, www.google.com/ig, gdisk.sourceforge.net, www.netvibes.com | Web sites related to GMail service. | 204
www.trovando.it, www.kartoo.com, www.snap.com, www.clusty.com, www.aldaily.com, www.quintura.com | Six fancy search engines. | 151
wwwl.meebo.com, www.ebuddy.com, www.plugoo.com | Integrated instant message software web sites. | 112
www.easyhotel.com, www.hostelz.com, www.couchsurfing.com, www.tripadvisor.com, www.kayak.com | Traveling agency web sites. | 109
www.easyjet.com/it/prenota, www.ryanair.com/site/IT, www.edreams.it, www.expedia.it, www.volagratis.com/vg1, www.skyscanner.net | Italian traveling agency web sites. | 98
www.google.com/codesearch, www.koders.com, www.bigbold.com/snippets, www.gotapi.com, 0xcc.net/blog/archives/000043.html | Three code search web sites and two articles talking about code search. | 98
www.google.co.jp, www.livedoor.com, www.baidu.jp, www.namaan.net | Four Japanese web search engines. | 36
www.operator11.com, www.joost.com, www.keepvid.com, www.getdemocracy.com, www.masternewmedia.org | TV, media streaming related web sites. | 34
www.technorati.com, www.listible.com, www.popurls.com | From these web sites, you can get relevant resources quickly. | 17
www.trobar.org/prosody, librarianchick.pbwiki.com, www.quotationspage.com, www.visuwords.com | All web sites are about literature. | 9

Figure 5: Examples of mining tag-tags and webpage-webpages relationships.
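The Support column is presumably the usual frequent-itemset-mining count: the number of transactions (here, perhaps one user's bookmark set each) that contain every webpage in the group. A minimal Go sketch under that assumption follows; the `support` function and the sample bookmark sets are hypothetical and are not taken from the slides.

```go
package main

import "fmt"

// support counts how many transactions contain every item in itemset.
// This is the standard definition of support in frequent itemset mining;
// whether the figure uses exactly this count is an assumption.
func support(transactions [][]string, itemset []string) int {
	count := 0
	for _, t := range transactions {
		// Index the transaction for constant-time membership checks.
		present := make(map[string]bool, len(t))
		for _, item := range t {
			present[item] = true
		}
		all := true
		for _, item := range itemset {
			if !present[item] {
				all = false
				break
			}
		}
		if all {
			count++
		}
	}
	return count
}

func main() {
	// Hypothetical bookmark sets, one per user.
	bookmarks := [][]string{
		{"www.kayak.com", "www.tripadvisor.com", "www.hostelz.com"},
		{"www.kayak.com", "www.tripadvisor.com"},
		{"www.kayak.com", "www.clusty.com"},
	}
	// Two of the three users bookmarked both travel sites, so this prints 2.
	fmt.Println(support(bookmarks, []string{"www.kayak.com", "www.tripadvisor.com"}))
}
```

A group with low support, such as the literature sites with support 9, is exactly the kind of long-tail pattern that only shows up when the data is large enough.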


Long-tail is scale-free. Mean and median make no sense with long-tail distributions. Pie-charts make no sense either.
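To make this concrete, here is a small Go sketch on synthetic data. It assumes a hypothetical Zipf-like law in which the item at rank r occurs scale/r times; the function names and parameters are illustrative only. It compares the mean with the median and measures how much of the total mass sits outside the head.

```go
package main

import (
	"fmt"
	"sort"
)

// zipfFrequencies returns n hypothetical item frequencies following a
// Zipf-like law: the item at rank r (1-based) occurs scale/r times.
// The result is sorted by descending frequency.
func zipfFrequencies(n int, scale float64) []float64 {
	freqs := make([]float64, n)
	for r := 1; r <= n; r++ {
		freqs[r-1] = scale / float64(r)
	}
	return freqs
}

// mean returns the arithmetic mean of xs.
func mean(xs []float64) float64 {
	sum := 0.0
	for _, x := range xs {
		sum += x
	}
	return sum / float64(len(xs))
}

// median returns the median of xs without modifying the input slice.
func median(xs []float64) float64 {
	s := append([]float64(nil), xs...)
	sort.Float64s(s)
	mid := len(s) / 2
	if len(s)%2 == 1 {
		return s[mid]
	}
	return (s[mid-1] + s[mid]) / 2
}

// tailShare returns the fraction of the total frequency mass contributed by
// items ranked after headSize, assuming freqs is sorted in descending order.
func tailShare(freqs []float64, headSize int) float64 {
	total, tail := 0.0, 0.0
	for i, f := range freqs {
		total += f
		if i >= headSize {
			tail += f
		}
	}
	return tail / total
}

func main() {
	freqs := zipfFrequencies(1000000, 1e6)
	fmt.Printf("mean=%.2f median=%.2f\n", mean(freqs), median(freqs))
	fmt.Printf("share of mass outside the top 1000 items: %.2f\n", tailShare(freqs, 1000))
}
```

On data like this the mean is pulled far above the median by a handful of head items, and neither number describes a typical item, while a large share of the total mass still comes from items outside the head.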

application   PageRank   Indexing    pCTR   DNN
framework     Pregel     MapReduce   SETI   DistBelief
middleware    Chubby (Zookeeper, etcd), Bigtable (HBase), memcachg (memcached)
cluster OS    Borg (Mesos, YARN, Kubernetes)
filesystem    GFS (HDFS)

    // Go: issue three RPCs and take whichever reply arrives first.
    var resp *Response
    select {
    case b := <-rpc.Call("B"):
        resp = extract(b)
    case c := <-rpc.Call("C"):
        resp = extract(c)
    case e := <-rpc.Call("E"):
        resp = extract(e)
    case <-time.After(1 * time.Second): // give up after one second
        resp = nil
    }
    // use resp here.

    var mutex = NewMutex();
    var returns = 0;
    var timer = setTimeout(timeout, 1*second);

    function rpcResp(resp) {
      mutex.Lock();
      if (returns == 0) {
        clearTimeout(timer);
        use(resp);
      }
      returns++;
      mutex.Unlock();
    }

    function timeout() {
      mutex.Lock();
      returns++;
      mutex.Unlock();
    }

    rpc.Call("B", rpcResp);
    rpc.Call("C", rpcResp);
    rpc.Call("D", rpcResp);
