things to know
- you need data labeled with the correct answers to “train” these algorithms before they work
- feature = dimension = column = attribute of the data
- class = category = label = Harry Potter house
BIG CAVEAT
Oftentimes choosing/creating good features or gathering more data will help more than changing algorithms...
training data

import numpy as np
X = np.array([[1, 0.1], [3, 0.2], [5, 0.1]])  # … more rows: one row per example, one column per feature
y = np.array([1, 2, 1])                       # one class label per example

# fit a first model (linear discriminant analysis) to the labeled training data
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
model = LinearDiscriminantAnalysis()
model.fit(X, y)
logistic regression “divide it with a logistic function”

from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X, y)
predicted = model.predict(z)  # z: new, unlabeled points to classify (same columns as X)
SVMs (support vector machines) “*advanced* draw a line through it”
- a better definition of a “terrible” line (SVMs pick the line with the widest margin between classes)
- lines can turn into non-linear shapes if you transform your data
figure credit: scikit-learn documentation

from sklearn.svm import SVC
model = SVC(kernel='poly', degree=2)  # polynomial kernel of degree 2: curved boundary
model.fit(X, y)
predicted = model.predict(z)

model = SVC(kernel='rbf')  # radial basis function kernel
model.fit(X, y)
predicted = model.predict(z)
KNN (k-nearest neighbors) “what do similar cases look like?”
figure credit: scikit-learn documentation

from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=5)  # classify by majority vote of the 5 nearest training points
model.fit(X, y)
predicted = model.predict(z)
decision tree learners “make a flow chart of it”

from sklearn.tree import DecisionTreeClassifier
model = DecisionTreeClassifier()
model.fit(X, y)
predicted = model.predict(z)
bagging: split training set, train one model each, models “vote”

example: a new data point is scored by three models trained on different pieces of the training set; they vote “not spam”, “spam”, “not spam”
final answer (majority vote): not spam
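A minimal sketch of bagging in scikit-learn (not from the original slides); by default BaggingClassifier trains decision trees on random bootstrap samples of the training set and lets them vote. X, y, and z are assumed to be the arrays defined earlier.

from sklearn.ensemble import BaggingClassifier

# 10 decision trees (the default base learner), each trained on a random
# sample of the training set; predict() takes the majority vote
model = BaggingClassifier(n_estimators=10)
model.fit(X, y)
predicted = model.predict(z)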
other spins on this
- Random Forest: like bagging, but at each split randomly constrain the features to choose from
- Extra Trees: make each split randomly, non-optimally; compensate by training a ton of trees
- Voting: combine a bunch of different models of your design and have them “vote” on the correct answer
- Boosting: train models in order, make the later ones focus on the points the earlier ones missed
from sklearn.ensemble import (BaggingClassifier, RandomForestClassifier,
    ExtraTreesClassifier, VotingClassifier, AdaBoostClassifier,
    GradientBoostingClassifier)
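A sketch, not from the slides, of how a few of these ensembles are typically used; the parameter values are illustrative only, and X, y, z are assumed from earlier.

from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# random forest: bagging plus random feature subsets at each split
forest = RandomForestClassifier(n_estimators=100)

# boosting: each new model focuses on points the previous ones got wrong
boosted = AdaBoostClassifier(n_estimators=100)

# voting: models of your own design "vote" on the answer
voters = VotingClassifier(estimators=[
    ('logreg', LogisticRegression()),
    ('tree', DecisionTreeClassifier()),
])

for model in (forest, boosted, voters):
    model.fit(X, y)
    predicted = model.predict(z)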
model               | nonlinear decision boundary?   | provides probability estimates?  | tells how important a feature is to the model?
Logistic Regression | no                             | yes                              | yes, if you scale
SVMs                | yes, with kernel               | no                               | no
KNN                 | yes                            | kinda (percent of nearby points) | no
Naïve Bayes         | yes                            | yes                              | no
Decision Tree       | yes                            | no                               | yes (number of times that feature is used)
Ensemble models     | yes                            | kinda (% of models that agree)   | yes, depending on component parts
Boosted models      | yes                            | kinda (% of models that agree)   | yes, depending on component parts
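As a rough sketch of the “probability estimates” and “feature importance” columns in scikit-learn code (again assuming the X, y, z arrays from earlier):

from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# logistic regression: probability estimates via predict_proba
lr = LogisticRegression().fit(X, y)
print(lr.predict_proba(z))          # one row per new point, one column per class

# tree ensembles: per-feature importance scores
forest = RandomForestClassifier().fit(X, y)
print(forest.feature_importances_)  # one score per feature/column, summing to 1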
model               | can be updated with new training data?      | easy to parallelize?
Logistic Regression | kinda                                       | kinda
SVMs                | kinda, depending on kernel                  | yes for some kernels, no for others
KNN                 | yes                                         | yes
Naïve Bayes         | yes                                         | yes
Decision Tree       | no                                          | no (but it’s very fast)
Ensemble models     | kinda, by adding new models to the ensemble | yes
Boosted models      | kinda, by adding new models to the ensemble | no
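A sketch of what the “update with new data” and “parallelize” columns look like in code; X_new and y_new are hypothetical later batches, not something from the slides.

import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier

# Naive Bayes can be updated incrementally with partial_fit
nb = GaussianNB()
nb.partial_fit(X, y, classes=np.unique(y))  # first call must list every class
# nb.partial_fit(X_new, y_new)              # hypothetical later batch updates the same model

# ensembles parallelize across trees: n_jobs=-1 uses every available core
forest = RandomForestClassifier(n_estimators=100, n_jobs=-1).fit(X, y)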
Other quirks
- SVMs: have to pick a kernel
- KNN: you need to define what “similarity” is in a good way; fast to train, slow to classify (compared to other methods)
- Naïve Bayes: have to choose the distribution; can deal with missing data
- Decision Tree: can provide literal flow charts; very sensitive to outliers
- Ensemble models: less prone to overfitting than their component parts
- Boosted models: many parameters to tweak; more prone to overfit than normal ensembles; most popular Kaggle winners use these
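For the decision-tree “literal flow chart” quirk, a small sketch (the feature names here are hypothetical placeholders for the two columns of X):

from sklearn.tree import DecisionTreeClassifier, export_text

tree = DecisionTreeClassifier().fit(X, y)
print(export_text(tree, feature_names=['feature_0', 'feature_1']))  # prints the fitted tree as an if/else flow chart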