Machine Learning Software + Intro WEKA
2006-02-14
Oliver Brdiczka
PRIMA team, INRIA Rhône-Alpes
Outline
• Machine Learning Software
  - MATLAB
  - Orange
  - Torch3
  - R language
  - WEKA
  - YALE
• Short introduction to WEKA
Machine Learning Software
• MATLAB toolboxes:
  - Many toolboxes for different machine learning areas
  - E.g., SPIDER, PRTools (pattern recognition), BNT (Bayesian networks), …
  - Require a MATLAB license (or use Scilab)
• Orange (University of Ljubljana)
  - Focus on data mining and visualization
  - C++ components with Python scripting
  - GUI
  - Linux, MS Windows, Macintosh
  - GNU General Public License
• TORCH3 (IDIAP)
  - (Statistical) machine learning library
  - C++ library
  - Linux, MS Windows
  - BSD license
• R language
  - Language and environment for statistical computing and graphics (free implementation of the S language)
  - Core written in C and Fortran
  - Linux, MS Windows, Macintosh
  - GNU General Public License
• WEKA (University of Waikato, New Zealand)
  - Machine learning/data mining software
  - Java-based
  - GUI
  - GNU General Public License
• YALE (University of Dortmund)
  - Environment for machine learning experiments (experiment editor)
  - Java-based
  - GUI
  - Integrates WEKA learners
  - GNU General Public License
• Others…
  - Libraries for specific learning algorithms:
    - HMM: ghmm (C), jahmm (Java)
    - Graphical Bayesian models: gmtk (C++)
    - …
Short introduction to WEKA
WEKA: main features
• Comprehensive set of data pre-processing tools, learning algorithms and evaluation methods
• Graphical user interfaces (incl. data visualization)
• Environment for comparing learning algorithms
WEKA: data format ARFF
@relation weather

@attribute outlook {sunny, overcast, rainy}
@attribute temperature numeric
@attribute humidity numeric
@attribute windy {TRUE, FALSE}
@attribute play {yes, no}

@data
sunny,85,85,FALSE,no
sunny,80,90,TRUE,no
overcast,83,86,FALSE,yes
rainy,70,96,FALSE,yes
rainy,68,80,FALSE,yes
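The ARFF layout is simple enough to read by hand. The following plain-Java sketch is not WEKA's own loader; the class and method names are hypothetical. It handles only the subset shown in the weather example: `@relation`, `@attribute`, `@data` and unquoted comma-separated rows, skipping `%` comment lines. Attribute types, quoted names, missing values (`?`) and sparse rows are not interpreted.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal ARFF reader sketch (hypothetical, not WEKA's API): keeps attribute
// names and raw string values; no quoting, typing or sparse-row support.
class ArffSketch {
    static final class Dataset {
        String relation;
        final List<String> attributes = new ArrayList<>();
        final List<String[]> instances = new ArrayList<>();
    }

    static final String WEATHER = String.join("\n",
        "@relation weather",
        "@attribute outlook {sunny, overcast, rainy}",
        "@attribute temperature numeric",
        "@attribute humidity numeric",
        "@attribute windy {TRUE, FALSE}",
        "@attribute play {yes, no}",
        "@data",
        "sunny,85,85,FALSE,no",
        "sunny,80,90,TRUE,no",
        "overcast,83,86,FALSE,yes",
        "rainy,70,96,FALSE,yes",
        "rainy,68,80,FALSE,yes");

    static Dataset parse(String text) {
        Dataset ds = new Dataset();
        boolean inData = false;
        for (String raw : text.split("\n")) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("%")) continue; // '%' starts an ARFF comment
            String lower = line.toLowerCase(); // ARFF keywords are case-insensitive
            if (lower.startsWith("@relation")) {
                ds.relation = line.substring("@relation".length()).trim();
            } else if (lower.startsWith("@attribute")) {
                // The attribute name is the first token after the keyword.
                String rest = line.substring("@attribute".length()).trim();
                ds.attributes.add(rest.split("\\s+")[0]);
            } else if (lower.startsWith("@data")) {
                inData = true;
            } else if (inData) {
                String[] values = line.split(",");
                if (values.length != ds.attributes.size()) {
                    throw new IllegalArgumentException("Wrong number of values: " + line);
                }
                for (int i = 0; i < values.length; i++) values[i] = values[i].trim();
                ds.instances.add(values);
            }
        }
        return ds;
    }

    public static void main(String[] args) {
        Dataset ds = parse(WEATHER);
        System.out.println(ds.relation + ": " + ds.attributes.size() + " attributes, "
            + ds.instances.size() + " instances");
    }
}
```

Running `main` should print `weather: 5 attributes, 5 instances`.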
WEKA: data import
• Data can be imported from a file in various formats: ARFF, CSV, C4.5, binary
• Data can also be read from a URL or from an SQL database (using JDBC)
• Pre-processing tools in WEKA are called “filters”
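A CSV file carries no type information, so importing it means deciding a type for each column before the data can be written as ARFF. A minimal plain-Java sketch of that step, assuming a header row, no quoted fields, and a simple rule chosen here for illustration (a column is `numeric` if every value parses as a number, otherwise nominal with its distinct values). This is not WEKA's CSV loader, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical CSV -> ARFF converter sketch: header row required, no quoting.
class CsvToArffSketch {
    static String convert(String relation, String csv) {
        String[] lines = csv.trim().split("\n");
        String[] header = lines[0].split(",");
        List<String[]> rows = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) rows.add(lines[i].trim().split(","));

        StringBuilder out = new StringBuilder("@relation " + relation + "\n\n");
        for (int c = 0; c < header.length; c++) {
            boolean numeric = true;
            Set<String> distinct = new LinkedHashSet<>(); // keeps first-seen order
            for (String[] row : rows) {
                String v = row[c].trim();
                distinct.add(v);
                try {
                    Double.parseDouble(v);
                } catch (NumberFormatException e) {
                    numeric = false; // one non-number makes the column nominal
                }
            }
            out.append("@attribute ").append(header[c].trim()).append(' ')
               .append(numeric ? "numeric" : "{" + String.join(", ", distinct) + "}")
               .append('\n');
        }
        out.append("\n@data\n");
        for (String[] row : rows) {
            // Re-join each row with whitespace around the values trimmed.
            List<String> trimmed = new ArrayList<>();
            for (String v : row) trimmed.add(v.trim());
            out.append(String.join(",", trimmed)).append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String csv = "outlook,temperature,play\nsunny,85,no\novercast,83,yes\nrainy,70,yes\n";
        System.out.print(convert("weather", csv));
    }
}
```

For the sample in `main`, `temperature` becomes `numeric` while `outlook` and `play` become nominal attributes listing their values in first-seen order.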
WEKA: filters
• WEKA contains filters for:
  - Discretization, normalization, resampling, attribute selection, transforming and combining attributes, …
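Two of the listed filter types, normalization and discretization, are easy to sketch for a single numeric column. The code below is a plain-Java illustration, not WEKA's filter classes. Min-max scaling to [0, 1] and equal-width binning are assumed as the concrete variants, and the class name is hypothetical.

```java
import java.util.Arrays;

// Hypothetical sketch of two unsupervised column filters.
class FilterSketch {
    // Min-max normalization: the minimum maps to 0, the maximum to 1.
    static double[] normalize(double[] x) {
        double min = Arrays.stream(x).min().orElse(0);
        double max = Arrays.stream(x).max().orElse(0);
        double range = max - min;
        double[] out = new double[x.length];
        for (int i = 0; i < x.length; i++) out[i] = range == 0 ? 0 : (x[i] - min) / range;
        return out;
    }

    // Equal-width discretization: split [min, max] into k intervals of equal width
    // and return the interval index (0..k-1) of each value.
    static int[] discretize(double[] x, int k) {
        double min = Arrays.stream(x).min().orElse(0);
        double max = Arrays.stream(x).max().orElse(0);
        double width = (max - min) / k;
        int[] bins = new int[x.length];
        for (int i = 0; i < x.length; i++) {
            int b = width == 0 ? 0 : (int) ((x[i] - min) / width);
            bins[i] = Math.min(b, k - 1); // the maximum value belongs to the last bin
        }
        return bins;
    }

    public static void main(String[] args) {
        double[] temperature = {85, 80, 83, 70, 68}; // temperature column of the weather example
        System.out.println(Arrays.toString(normalize(temperature)));
        System.out.println(Arrays.toString(discretize(temperature, 3)));
    }
}
```

Equal-width binning ignores the class label; supervised discretization schemes instead choose cut points using the class, which this sketch does not attempt.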
WEKA: classifiers
• Classifiers in WEKA are models for predicting nominal or numeric quantities
• Implemented learning schemes include:
  - Decision trees and lists, instance-based classifiers, support vector machines, multi-layer perceptrons, logistic regression, Bayesian networks, …
• “Meta”-classifiers include:
  - Bagging, boosting, stacking, error-correcting output codes, locally weighted learning, …
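To make "instance-based classifier" concrete, here is a plain-Java k-nearest-neighbour sketch on the two numeric attributes (temperature, humidity) of the weather example. It is an illustration only, not WEKA's implementation: the class name, the unscaled Euclidean distance and the tie rule are assumptions.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Hypothetical k-NN sketch: predict the majority label among the k nearest
// training instances, using plain (unscaled) Euclidean distance.
class KnnSketch {
    private final double[][] x;
    private final String[] y;
    private final int k;

    KnnSketch(double[][] x, String[] y, int k) {
        this.x = x;
        this.y = y;
        this.k = k;
    }

    static double distance(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    String predict(double[] query) {
        // Sort training indices by distance to the query, nearest first.
        Integer[] idx = new Integer[x.length];
        for (int i = 0; i < idx.length; i++) idx[i] = i;
        Arrays.sort(idx, (p, q) -> Double.compare(distance(x[p], query), distance(x[q], query)));
        // Vote among the k nearest; on a tie, the label that reached the top count first wins.
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        for (int i = 0; i < Math.min(k, idx.length); i++) {
            String label = y[idx[i]];
            int n = votes.merge(label, 1, Integer::sum);
            if (best == null || n > votes.get(best)) best = label;
        }
        return best;
    }

    public static void main(String[] args) {
        // (temperature, humidity) -> play, from the weather ARFF example
        double[][] x = {{85, 85}, {80, 90}, {83, 86}, {70, 96}, {68, 80}};
        String[] y = {"no", "no", "yes", "yes", "yes"};
        double[] query = {84, 88};
        System.out.println("1-NN: " + new KnnSketch(x, y, 1).predict(query));
        System.out.println("3-NN: " + new KnnSketch(x, y, 3).predict(query));
    }
}
```

For the query (84, 88), the single nearest instance is labelled `yes`, but two of the three nearest are labelled `no`, so the prediction flips with k. Attributes on different scales would normally be rescaled first.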
WEKA: clusterers
• WEKA contains “clusterers” for finding groups of similar instances in a dataset
• Implemented schemes include:
  - k-Means, EM, Cobweb, FarthestFirst, …
• Clusters can be visualized and compared to “true” clusters (if given)
• Evaluation is based on log-likelihood if the clustering scheme produces a probability distribution
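The k-Means scheme can be sketched in a few lines of plain Java (Lloyd's algorithm). This is not WEKA's clusterer. The class name, the toy points and the initialization (centroids start at the first k points, whereas implementations usually choose them randomly) are assumptions made for a deterministic illustration.

```java
import java.util.Arrays;

// Hypothetical k-means sketch: alternate between assigning points to the
// nearest centroid and moving each centroid to the mean of its points.
class KMeansSketch {
    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return s;
    }

    // Returns the cluster index of each point; requires k <= points.length.
    static int[] cluster(double[][] points, int k, int maxIter) {
        int dim = points[0].length;
        double[][] centroids = new double[k][];
        for (int c = 0; c < k; c++) centroids[c] = points[c].clone(); // first-k initialization
        int[] assign = new int[points.length];
        Arrays.fill(assign, -1);
        for (int iter = 0; iter < maxIter; iter++) {
            boolean changed = false;
            // Assignment step: nearest centroid by squared Euclidean distance.
            for (int i = 0; i < points.length; i++) {
                int best = 0;
                for (int c = 1; c < k; c++)
                    if (sqDist(points[i], centroids[c]) < sqDist(points[i], centroids[best])) best = c;
                if (assign[i] != best) {
                    assign[i] = best;
                    changed = true;
                }
            }
            if (!changed) break; // converged: no point switched cluster
            // Update step: move each centroid to the mean of its members.
            for (int c = 0; c < k; c++) {
                double[] sum = new double[dim];
                int n = 0;
                for (int i = 0; i < points.length; i++) {
                    if (assign[i] != c) continue;
                    for (int d = 0; d < dim; d++) sum[d] += points[i][d];
                    n++;
                }
                if (n > 0) { // an empty cluster keeps its previous centroid
                    for (int d = 0; d < dim; d++) centroids[c][d] = sum[d] / n;
                }
            }
        }
        return assign;
    }

    public static void main(String[] args) {
        double[][] pts = {{0, 0}, {0, 1}, {1, 0}, {10, 10}, {10, 11}, {11, 10}};
        System.out.println(Arrays.toString(cluster(pts, 2, 100))); // prints [0, 0, 0, 1, 1, 1]
    }
}
```

k-means yields hard assignments rather than a probability distribution, so the log-likelihood evaluation mentioned above applies to schemes such as EM, not to this sketch.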
WEKA: API documentation
• Javadoc
WEKA: User Interfaces
• Simple Command Line Interface
• Explorer
  - Filters, classifiers, clusterers, visualization
• Experimenter
  - Comparing different learning algorithms
• Knowledge Flow
  - Graphical programming tool
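The core idea behind the Experimenter, running several learners through the same evaluation protocol and comparing their scores, can be sketched in miniature. The plain-Java code below does not use WEKA's Experimenter or Classifier API. The `Learner` interface, both learners, the use of leave-one-out cross-validation and the toy data are hypothetical. WEKA's Experimenter also supports repeated runs and significance tests, which this sketch omits.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical comparison sketch: same leave-one-out protocol for every learner.
class CompareSketch {
    // Minimal learner interface for one numeric attribute (not WEKA's API).
    interface Learner {
        void train(double[] x, String[] y);
        String predict(double q);
    }

    // Baseline: always predict the most frequent training label.
    static class Majority implements Learner {
        private String label;
        public void train(double[] x, String[] y) {
            Map<String, Integer> counts = new HashMap<>();
            label = null;
            for (String c : y) {
                int n = counts.merge(c, 1, Integer::sum);
                if (label == null || n > counts.get(label)) label = c;
            }
        }
        public String predict(double q) { return label; }
    }

    // 1-nearest-neighbour on a single numeric attribute.
    static class NearestNeighbour implements Learner {
        private double[] xs;
        private String[] ys;
        public void train(double[] x, String[] y) { xs = x; ys = y; }
        public String predict(double q) {
            int best = 0;
            for (int i = 1; i < xs.length; i++)
                if (Math.abs(xs[i] - q) < Math.abs(xs[best] - q)) best = i;
            return ys[best];
        }
    }

    // Train on all instances but one, test on the held-out one, repeat for every instance.
    static double leaveOneOutAccuracy(Learner learner, double[] x, String[] y) {
        int correct = 0;
        for (int held = 0; held < x.length; held++) {
            double[] tx = new double[x.length - 1];
            String[] ty = new String[x.length - 1];
            for (int i = 0, j = 0; i < x.length; i++) {
                if (i == held) continue;
                tx[j] = x[i];
                ty[j] = y[i];
                j++;
            }
            learner.train(tx, ty);
            if (learner.predict(x[held]).equals(y[held])) correct++;
        }
        return (double) correct / x.length;
    }

    public static void main(String[] args) {
        // Hypothetical toy data: two well-separated groups on one attribute.
        double[] x = {1, 2, 3, 4, 5, 11, 12, 13, 14, 15};
        String[] y = {"a", "a", "a", "a", "a", "b", "b", "b", "b", "b"};
        System.out.println("majority: " + leaveOneOutAccuracy(new Majority(), x, y));
        System.out.println("1-NN:     " + leaveOneOutAccuracy(new NearestNeighbour(), x, y));
    }
}
```

On this data the 1-NN learner should score 1.0 and the majority baseline 0.0: holding out any point leaves the opposite class in the majority, while its nearest neighbour always shares its label.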
Conclusion
• An important tool for machine learning problems
• Used by many research groups
• Many extensions are available for WEKA:
  - Spectral clustering, time series mining, grid computing, document classification and clustering, vector quantization, rule discovery, parallel processing, …
References (web)
• MATLAB toolboxes:
  - SPIDER: http://www.kyb.tuebingen.mpg.de/bs/people/spider/main.html
  - PRTools: http://www.prtools.org/
  - BNT: http://bnt.sourceforge.net/
• Orange: http://www.ailab.si/orange
• TORCH3: http://www.torch.ch/
• R language: http://www.r-project.org/
• WEKA: http://www.cs.waikato.ac.nz/ml/weka/
• YALE: http://www-ai.cs.uni-dortmund.de/SOFTWARE/YALE/index.html
• Other:
  - Jahmm: http://www.run.montefiore.ulg.ac.be/~francois/software/jahmm/
  - Ghmm: http://www.ghmm.org/
  - Gmtk: http://ssli.ee.washington.edu/~bilmes/gmtk/