Tools and Technologies for Large Scale Data Mining
Jaganadh G, Project Lead NLP R&D
365Media Pvt. [email protected]
DRDO Sponsored National Level Seminar on
Challenging Issues on Data Mining Semantic Web, Sri Krishna College of Engineering and Technology,
Coimbatore
27th Jan 2012
Jaganadh G, Tools and Technologies for Large Scale Data Mining
About me!!
Software Engineer specializing in Text Analytics Research & Development
When free, teaches Python, speaks about FOSS and blogs at http://jaganadhg.in
Working as Project Lead (NLP) at 365Media Pvt. Ltd., Coimbatore
I am a computational linguist, linguist and Indologist, and book reviewer
Master's degree holder in Sanskrit from University of Kerala
Machine Learning
Machine learning is a subfield of artificial intelligence (AI) concerned with algorithms that allow computers to learn.
This talk is not meant as an introduction to Machine Learning
Don't expect mathy equations here
Machine Learning and Our Life
Do you think that Machine Learning has any impact on our life?
Yes
In our day-to-day life we may use many Machine Learning powered tools
E-mail spam filtering, product recommendations, etc.
Fraud detection
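To make the spam filtering example concrete, here is a minimal sketch of a multinomial Naive Bayes filter in plain Python. It is only an illustration, not how any particular mail service works; the training messages, the "spam"/"ham" labels and the function names are all invented for this sketch.

```python
import math
from collections import Counter, defaultdict


def train(messages):
    """messages: iterable of (text, label). Returns the counts used for prediction."""
    word_counts = defaultdict(Counter)  # label -> word -> count
    label_counts = Counter()            # label -> number of messages
    vocabulary = set()
    for text, label in messages:
        words = text.lower().split()
        word_counts[label].update(words)
        label_counts[label] += 1
        vocabulary.update(words)
    return word_counts, label_counts, vocabulary


def classify(text, model):
    """Return the label with the highest log-probability (add-one smoothing)."""
    word_counts, label_counts, vocabulary = model
    total_messages = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        # Log prior of the label, then one log-likelihood term per word.
        score = math.log(label_counts[label] / total_messages)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            count = word_counts[label][word]
            score += math.log((count + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data, invented for illustration only.
TRAINING = [
    ("win money now", "spam"),
    ("cheap pills win prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday", "ham"),
]
MODEL = train(TRAINING)
```

Calling `classify("win a prize now", MODEL)` picks the label whose training messages share the most words with the input. A real filter would need far more data and better tokenization.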
Examples
Tool for Building Machine Learning Powered Products and Services
Apache Mahout
Apache Mahout is a scalable machine learning library that supports large data sets. Apache Mahout's goal is to build scalable machine learning libraries.
Commercially friendly license
Well documented
Healthy community
Targeted at developers
Algorithms in Apache Mahout
Collaborative Filtering
User and Item based recommenders
K-Means, Fuzzy K-Means clustering
Mean Shift clustering
Dirichlet process clustering
Latent Dirichlet Allocation
Singular value decomposition
Parallel Frequent Pattern mining
Complementary Naive Bayes classifier
Random forest decision tree based classifier
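As a rough idea of what one of the algorithms listed above does, here is a small K-Means sketch in plain Python. It is not Mahout code and does not use Mahout's API; the function name, the sample points and the naive choice of the first k points as starting centroids are assumptions made for this illustration. It needs Python 3.8 or later for `math.dist`.

```python
import math


def kmeans(points, k, iterations=10):
    """Cluster points into k groups; returns (centroids, assignments)."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points (assumption)
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(
                    sum(axis) / len(members) for axis in zip(*members)
                )
    return centroids, assignments


# Toy 2-D data, invented for illustration: two visibly separate groups.
POINTS = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
CENTROIDS, ASSIGNMENTS = kmeans(POINTS, k=2)
```

On this toy data the first three points end up in one cluster and the last three in the other. A library like Mahout matters when the data no longer fits on one machine; the two steps of the loop stay the same.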
Demo
Building recommendation engines with Mahout
Document Classification with Mahout
Some Python stuff on Machine Learning
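For readers without the demo at hand, here is a minimal sketch of a user-based recommender in plain Python. It is not the demo code and not Mahout's API; the ratings, user names, item names and the choice of cosine similarity over co-rated items are all assumptions made for this illustration.

```python
import math

# Toy ratings, invented for illustration: user -> {item: rating}.
RATINGS = {
    "alice": {"book_a": 5.0, "book_b": 3.0, "book_c": 4.0},
    "bob": {"book_a": 5.0, "book_b": 3.0, "book_c": 4.0, "book_d": 5.0},
    "carol": {"book_a": 1.0, "book_b": 5.0, "book_d": 2.0, "book_e": 3.0},
}


def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(user, ratings, how_many=2):
    """Rank items the user has not rated by similarity-weighted average rating."""
    weighted, weights = {}, {}
    for other, other_ratings in ratings.items():
        if other == user:
            continue
        sim = cosine_similarity(ratings[user], other_ratings)
        if sim <= 0:
            continue  # ignore users with nothing in common
        for item, rating in other_ratings.items():
            if item not in ratings[user]:
                weighted[item] = weighted.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    scores = {item: weighted[item] / weights[item] for item in weighted}
    return sorted(scores, key=scores.get, reverse=True)[:how_many]
```

Here `recommend("alice", RATINGS)` ranks `book_d` ahead of `book_e`, because bob, whose ratings match alice's exactly, rated `book_d` highly.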
Reference
Mahout in Action - Book by Sean Owen and Robin Anil, published by Manning Publications.
Taming Text - By Grant Ingersoll and Tom Morton, published by Manning Publications.
Introducing Apache Mahout - Grant Ingersoll - Intro to Apache Mahout focused on clustering, classification and collaborative filtering. https://www.ibm.com/developerworks/java/library/j-mahout/index.html
Programming Collective Intelligence: Building Smart Web 2.0 Applications http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325
Useful Resources
Apache Mahout Site http://mahout.apache.org/
Apache Mahout Mailing List [email protected]
The code which I used for the Mahout demo is available at http://bitbucket.org/jaganadhg/blog/src/tip/bck9/java/
Twenty News Group data set http://people.csail.mit.edu/jrennie/20Newsgroups/20news-bydate.tar.gz
Questions ??
Acknowledgments
Thanks to :
Manning Publications for a review copy of the book "Mahout in Action"
Apache Mahout mailing list members
Ted Dunning and Robin Anil for suggestions
Sreejith S and Biju B for Java help
@chelakkandupoda for review and criticism
Mukundhanchari, R&D Director, 365Media Pvt. Ltd., for support and encouragement
Finally