Tools and Technologies for Large Scale Data Mining
Jaganadh G
Project Lead, NLP R&D
365Media Pvt. Ltd.
[email protected]
DRDO Sponsored National Level Seminar on
Challenging Issues on Data Mining Semantic Web,
Sri Krishna College of Engineering and Technology,
Coimbatore
27th Jan 2012
Jaganadh G Tools and Technologies for Large Scale Data Mining
About me !!
Software engineer specializing in text analytics research and development
When free, I teach Python, speak about FOSS, and blog at http://jaganadhg.in
Working as Project Lead (NLP) at 365Media Pvt. Ltd., Coimbatore
I am a computational linguist, linguist, Indologist, and book reviewer
Master's degree holder in Sanskrit from the University of Kerala
Machine Learning
Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn.
This talk is not intended as an introduction to machine learning.
Don't expect mathy equations here.
Machine Learning and Our Life
Do you think machine learning has any impact on our lives?
Yes
In day-to-day life we may use many machine-learning-powered tools
E-mail spam filtering, product recommendations, etc.
Fraud detection
Examples
Tools for building machine-learning-powered products and services
Apache Mahout
Apache Mahout is a scalable machine learning library that supports large data sets. The project's goal is to build scalable machine learning libraries.
Commercially friendly licence (Apache License 2.0)
Well documented
Healthy community
Targeted to developers
Algorithms in Apache Mahout
Collaborative Filtering
User- and item-based recommenders
K-Means, Fuzzy K-Means clustering
Mean Shift clustering
Dirichlet process clustering
Latent Dirichlet Allocation
Singular value decomposition
Parallel Frequent Pattern mining
Complementary Naive Bayes classifier
Random forest (decision-tree-based) classifier
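To make the first items on this list concrete, here is a minimal sketch of a user-based recommender in plain Python. It is not Mahout code and not the code from the talk: the ratings, the names `RATINGS`, `cosine_similarity` and `recommend`, and the choice of cosine similarity are all assumptions made for illustration. Mahout's own recommenders are Java classes with several similarity measures to choose from.

```python
from math import sqrt

# Hypothetical toy ratings (user -> {item: rating}); not data from the talk.
RATINGS = {
    "alice": {"book_a": 5.0, "book_b": 3.0, "book_c": 4.0},
    "bob": {"book_a": 4.0, "book_b": 3.0, "book_c": 5.0, "book_d": 4.0},
    "carol": {"book_a": 1.0, "book_b": 5.0, "book_d": 2.0},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by a similarity-weighted average
    of the ratings given by the other users."""
    scores, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item in ratings[user]:
                continue  # only score unseen items
            scores[item] = scores.get(item, 0.0) + sim * rating
            weights[item] = weights.get(item, 0.0) + sim
    ranked = sorted(((scores[i] / weights[i], i) for i in scores), reverse=True)
    return [(item, round(score, 2)) for score, item in ranked[:top_n]]


if __name__ == "__main__":
    print(recommend("alice", RATINGS))
```

An item-based recommender follows the same pattern with the roles swapped: similarities are computed between items rather than between users.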
Demo
Building recommendation engines with Mahout
Document Classification with Mahout
Some Python stuff on Machine Learning
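The demo code itself is not reproduced in these slides. As a rough idea of what document classification involves, the following is a small Python sketch of a multinomial Naive Bayes classifier with add-one smoothing. It is a simplification, not the Complementary Naive Bayes variant that Mahout provides, and the class name `NaiveBayes`, the helper `tokenize`, and the four training sentences are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.doc_counts = Counter()              # label -> number of documents
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.vocab = set()

    def train(self, docs):
        for text, label in docs:
            self.doc_counts[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Work in log space to avoid underflow on long documents.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data, made up for this sketch.
TRAINING = [
    ("win cash prize now", "spam"),
    ("cheap loans win big", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project review meeting notes", "ham"),
]

if __name__ == "__main__":
    model = NaiveBayes()
    model.train(TRAINING)
    print(model.classify("win a cash prize"))
    print(model.classify("monday project meeting"))
```

On a real corpus such as 20 Newsgroups, the same idea applies with one label per newsgroup and a proper tokenizer in place of `split()`.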
References
Mahout in Action - Book by Sean Owen and Robin Anil, published by Manning Publications.
Taming Text - By Grant Ingersoll and Tom Morton, published by Manning Publications.
Introducing Apache Mahout - Grant Ingersoll - Intro to Apache Mahout focused on clustering, classification and collaborative filtering. https://www.ibm.com/developerworks/java/library/j-mahout/index.html
Programming Collective Intelligence: Building Smart Web 2.0 Applications http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325
Useful Resources
Apache Mahout Site http://mahout.apache.org/
Apache Mahout Mailing List [email protected]
The code I used for the Mahout demo is available at http://bitbucket.org/jaganadhg/blog/src/tip/bck9/java/
20 Newsgroups data set http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz
Questions ??
Acknowledgments
Thanks to :
Manning Publications for a review copy of the book "Mahout in Action"
Apache Mahout mailing list members
Ted Dunning and Robin Anil for suggestions
Sreejith S and Biju B for Java help
@chelakkandupoda for review and criticism
Mukundhanchari, R&D Director, 365Media Pvt. Ltd., for support and encouragement
Finally