apache mahout - last.fmstatic.last.fm/.../isabel_drost-introducing_apache_mahout.pdf · agenda...

Apache Mahout Scaling Machine Learning Presented by: Isabel Drost

Post on 22-Nov-2018

Page 1: Apache Mahout - Last.fmstatic.last.fm/.../isabel_drost-introducing_apache_mahout.pdf · Agenda Motivation. Machine learning? Introducing Mahout. How can you help?

Apache Mahout

Scaling Machine Learning

Presented by: Isabel Drost

Agenda

● Motivation.

● Machine learning?

● Introducing Mahout.

● How can you help?

Some motivation.

January 3, 2006 by Matt Callow

http://www.flickr.com/photos/blackcustard/81680010

Follow news stories

Automatic topic tracker.

Search through papers.

September 10, 2008 by Alex Barth

http://www.flickr.com/photos/a-barth/2846621384

March 7, 2008 by extranoise

http://www.flickr.com/photos/extranoise/2317950586/

Movie recommendation

Aggregate reviews from IMDB, Twitter, ...

IMDB + movie reviews.

March 22, 2008 by Crystian Cruz

http://www.flickr.com/photos/crystiancruz/2353895708

● Lots and lots of data.

● Structured and unstructured.

Mission

Provide scalable data mining algorithms.

Machine Learning?

Archimedes generates model:

$$
\frac{\text{Density of object}}{\text{Density of fluid}} = \frac{\text{Weight}}{\text{Weight} - \text{Apparent immersed weight}}
$$
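
The relation on this slide can be evaluated directly from two weighings. The snippet below is a minimal sketch: the function name and the sample measurements are made up for illustration and do not come from the talk.

```python
def relative_density(weight, apparent_immersed_weight):
    """Return density of object / density of fluid from two weighings."""
    # The weight "lost" under immersion equals the weight of the displaced fluid.
    buoyant_loss = weight - apparent_immersed_weight
    if buoyant_loss <= 0:
        raise ValueError("apparent immersed weight must be smaller than weight")
    return weight / buoyant_loss


# Hypothetical measurements in any consistent unit.
ratio = relative_density(weight=10.0, apparent_immersed_weight=8.0)
print(ratio)  # 5.0
```

Here the "model" is a fixed formula; the rest of the talk is about learning such a mapping from data instead of deriving it by hand.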

June 25, 2008 by chase-me

http://www.flickr.com/photos/sasy/2609508999

March 28, 2007 by dullhunk

http://www.flickr.com/photos/dullhunk/437551254

Machine learning generates model

Machine learning pipeline

Gather data (and metadata).

Identify characteristics.

Choose the right algorithm.

Tune parameters of your algorithm.

Train on the gathered data.

Keep model in sync when nature changes.

January 8, 2008 by Pink Sherbet Photography

http://www.flickr.com/photos/pinksherbet/2177961471/

E-Bay

password

Different topic

Auction status?

Phishing Spam?

Requested password?

01

1000011

One of your mails:

Apache

London

Hadoop

Lucene

London

. . .
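
The slide pairs the words of a mail with a string of 0s and 1s. One common reading, assumed here, is a binary bag-of-words vector: one position per vocabulary term, set to 1 if the term occurs in the mail. The vocabulary below is hypothetical and only serves the illustration.

```python
def binary_bag_of_words(tokens, vocabulary):
    """Map a tokenized mail to a 0/1 vector, one entry per vocabulary term."""
    present = set(tokens)  # repeated words (e.g. "London") count only once
    return [1 if term in present else 0 for term in vocabulary]


# Hypothetical fixed vocabulary; a real one would be built from all mails.
vocabulary = ["Apache", "Berlin", "Hadoop", "London", "Lucene", "Solr"]
mail = ["Apache", "London", "Hadoop", "Lucene", "London"]

vector = binary_bag_of_words(mail, vocabulary)
print(vector)  # [1, 0, 1, 1, 1, 0]
```

Term counts or weighted counts could replace the 0/1 entries; the binary form is just the simplest choice for a sketch.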

Parameter tuning

● Penalty for mistakes.

● Kernel type for data transformation.

● Tune kernel parameters.
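
The three knobs above (mistake penalty, kernel type, kernel parameters) are often tuned by trying every combination on a grid and keeping the best-scoring one. The sketch below assumes an RBF kernel as the example kernel and uses a made-up scoring function in place of real held-out accuracy; none of the names come from Mahout.

```python
import itertools
import math


def rbf_kernel(x, y, gamma):
    """Gaussian (RBF) kernel; gamma is the kernel parameter being tuned."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_distance)


def grid_search(grid, score):
    """Try every parameter combination and return the best one with its score."""
    names = sorted(grid)
    best_params, best_score = None, float("-inf")
    for values in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, values))
        current = score(params)
        if current > best_score:
            best_params, best_score = params, current
    return best_params, best_score


def toy_score(params):
    # Hypothetical stand-in for validation accuracy of a trained model;
    # it peaks at penalty=1.0, gamma=0.1 purely for demonstration.
    return -(abs(math.log10(params["penalty"]))
             + abs(math.log10(params["gamma"]) + 1.0))


grid = {"penalty": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params)
```

In practice `toy_score` would train a model with the given parameters and evaluate it on data not used for training.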

Training

● Build model from data.

Nature changes?

● Spammers adapt to spam filters.

● Users write mails in different styles.

● Expand to new languages.

● ...

Introducing Mahout

Classification

● Categorize data.

● Examples:
    ● Identify spam mails.
    ● Classify movies as “Action”, “Comedy” ...

Classification

● Naive Bayes.

● Complementary Naive Bayes.

● Winnow/Perceptron

● Others upcoming.
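
To make the first entry concrete, here is a minimal single-machine sketch of multinomial Naive Bayes with add-one smoothing. It is not Mahout's implementation, and the tiny training set is invented for illustration.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(labelled_docs):
    """Count documents per class and words per class."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for tokens, label in labelled_docs:
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return class_counts, word_counts, vocabulary


def classify(tokens, model):
    """Return the class with the highest log posterior for the tokens."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, float("-inf")
    for label, count in class_counts.items():
        score = math.log(count / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for token in tokens:
            # Add-one (Laplace) smoothing avoids log(0) for unseen words.
            likelihood = (word_counts[label][token] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training mails.
training = [
    (["win", "money", "now"], "spam"),
    (["cheap", "money", "offer"], "spam"),
    (["meeting", "at", "noon"], "ham"),
    (["hadoop", "meeting", "notes"], "ham"),
]
model = train_naive_bayes(training)
print(classify(["money", "offer"], model))     # spam
print(classify(["hadoop", "meeting"], model))  # ham
```

The counting step is a sum over documents, which is what makes this family of algorithms a natural fit for a map/reduce style of scaling.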

Discovering groups of data

● Group data by similarity.

● Examples:
    ● News articles by topic.
    ● Developers by favorite modules.

Discovering groups of data

● Canopy.

● K-Means.

● Dirichlet based.

● PLSI.

● Others upcoming.
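
As an illustration of one listed algorithm, the following is a small in-memory sketch of K-Means. It is not Mahout's distributed version; the points and the starting centroids are chosen by hand so the run is deterministic.

```python
def k_means(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        # Update step: each centroid moves to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroid
            for cluster, centroid in zip(clusters, centroids)
        ]
    return centroids, clusters


# Hypothetical 2-D data with two visible groups.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0),
          (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

Canopy clustering, listed first on the slide, is often used to pick the starting centroids instead of choosing them by hand as done here.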

Recommendation mining

● Recommend items.

● Examples:
    ● Find books a user may like.
    ● Identify movies a user likes.
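
One simple way to recommend items, sketched below, is to count how often items appear together in users' histories and suggest the items that co-occur most with what a user already has. This is an assumed illustration, not the method used in Mahout, and the users and books are invented.

```python
from collections import Counter
from itertools import permutations


def recommend(user, preferences, top_n=2):
    """Suggest unseen items that co-occur most with the user's items."""
    # Count ordered pairs (a, b) of items liked by the same user.
    cooccurrence = Counter()
    for items in preferences.values():
        for a, b in permutations(items, 2):
            cooccurrence[(a, b)] += 1

    seen = preferences[user]
    scores = Counter()
    for (a, b), count in cooccurrence.items():
        if a in seen and b not in seen:
            scores[b] += count
    return [item for item, _ in scores.most_common(top_n)]


# Hypothetical reading histories.
preferences = {
    "alice": {"Book A", "Book B"},
    "bob": {"Book A", "Book B", "Book C"},
    "carol": {"Book B", "Book C", "Book D"},
    "dave": {"Book A", "Book C"},
}
print(recommend("alice", preferences))  # ['Book C', 'Book D']
```

With explicit ratings instead of plain like/not-like sets, the co-occurrence counts would typically be replaced by a similarity measure between users or items.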

Upcoming

● More algorithms.

● More examples.

What Mahout can do for you

“Why should I participate?”

Jumpstart your project with proven code.

January 8, 2008 by dreizehn28

http://www.flickr.com/photos/1328/2176949559

Discuss with researchers and engineers.

November 16, 2005 [phil h]

http://www.flickr.com/photos/hi-phi/64055296

Become a community member.

October 22, 2008 by e_calamar

http://www.flickr.com/photos/e_calamar/2964991182/

http://.../pub/mirrors/apache/lucene/mahout/0.1/

Thank you to all those making this possible.

[email protected]

[email protected]

● We need You:

    ● Enthusiasm.
    ● Mathematical knowledge.
    ● Proficiency in Hadoop.
    ● Interest in understanding data.

July 9, 2006 by trackrecord

http://www.flickr.com/photos/trackrecord/185514449

Some advertising

Berlin - June* at 5p.m.

newthinking store Berlin

Tucholskystr. 48

Hadoop** User/Developer Meeting Germany

* Exact date is set by speaker – that is you!

** Lucene, Tika, Solr, UIMA, Mahout, katta, ... people welcome.