Machine Learning in Practice: Lecture 9
DESCRIPTION
Machine Learning in Practice, Lecture 9. Carolyn Penstein Rosé, Language Technologies Institute / Human-Computer Interaction Institute. Plan for the Day: Announcements; Questions? Assignment 4; Quiz; Today's Data Set: Speaker Identification; Weka helpful hints.
TRANSCRIPT
![Page 1: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/1.jpg)
Machine Learning in Practice, Lecture 9
Carolyn Penstein Rosé
Language Technologies Institute / Human-Computer Interaction Institute
![Page 2: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/2.jpg)
Plan for the Day
- Announcements
- Questions?
- Assignment 4
- Quiz
- Today's Data Set: Speaker Identification
- Weka helpful hints
- Visualizing errors for regression problems
- Alternative forms of cross-validation
- Creating train/test pairs
- Intro to evaluation
![Page 3: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/3.jpg)
Speaker Identification
![Page 4: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/4.jpg)
Today’s Data Set – Speaker Identification
![Page 5: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/5.jpg)
Preprocessing Speech
- Record speech to WAV files.
- Extract a variety of acoustic and prosodic features.
![Page 6: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/6.jpg)
Predictions: which algorithm will perform better?
What previous data set does this remind you of?
- J48: .53 Kappa
- SMO: .37 Kappa
- Naïve Bayes: .16 Kappa
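Kappa corrects raw accuracy for agreement expected by chance. As an illustrative sketch (the confusion matrix below is hypothetical, not from the speaker-identification data):

```python
def kappa(confusion):
    """Cohen's kappa from a square confusion matrix (rows = actual, cols = predicted)."""
    total = sum(sum(row) for row in confusion)
    classes = range(len(confusion))
    observed = sum(confusion[i][i] for i in classes) / total
    # Chance agreement: product of row and column marginal proportions, summed over classes
    expected = sum(
        (sum(confusion[i]) / total) * (sum(row[i] for row in confusion) / total)
        for i in classes
    )
    return (observed - expected) / (1 - expected)

# Hypothetical two-class confusion matrix
cm = [[40, 10],
      [10, 40]]
print(round(kappa(cm), 2))  # observed = .8, chance = .5 -> kappa = .6
```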
![Page 7: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/7.jpg)
Notice Ranges and Contingencies
![Page 8: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/8.jpg)
Most Predictive Feature
![Page 9: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/9.jpg)
Least Predictive Feature
![Page 10: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/10.jpg)
What would 1R do?
![Page 11: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/11.jpg)
What would 1R do?
.16 Kappa
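As a reminder of what 1R does: for each attribute it builds a one-level rule predicting the majority class for each attribute value, then keeps the attribute whose rule makes the fewest errors on the training data. A minimal sketch on made-up nominal data (the pitch/loudness attributes are hypothetical):

```python
from collections import Counter

def one_r(rows, labels):
    """rows: list of attribute-value tuples. Returns (best attribute index, rule, errors)."""
    best = None
    for a in range(len(rows[0])):
        # Count class labels for each value of attribute a
        by_value = {}
        for row, y in zip(rows, labels):
            by_value.setdefault(row[a], Counter())[y] += 1
        # One-level rule: predict the majority class for each attribute value
        rule = {v: c.most_common(1)[0][0] for v, c in by_value.items()}
        errors = sum(y != rule[row[a]] for row, y in zip(rows, labels))
        if best is None or errors < best[2]:
            best = (a, rule, errors)
    return best

# Hypothetical data: (pitch, loudness) -> speaker
rows = [("high", "soft"), ("high", "loud"), ("low", "soft"), ("low", "loud")]
labels = ["A", "A", "B", "B"]
attr, rule, errors = one_r(rows, labels)
print(attr, errors)  # attribute 0 (pitch) classifies this toy set perfectly
```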
![Page 12: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/12.jpg)
Weka Helpful Hints
![Page 13: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/13.jpg)
Evaluating Numeric Prediction: CPU data
![Page 14: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/14.jpg)
Visualizing Classifier Errors for Numeric Prediction
![Page 15: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/15.jpg)
Creating Train/Test Pairs
First click here
![Page 16: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/16.jpg)
Creating Train/Test Pairs
If you pick unsupervised, you'll get non-stratified folds; otherwise you'll get stratified folds.
![Page 17: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/17.jpg)
Stratified versus Non-Stratified
Weka's standard cross-validation is stratified:
- Data is randomized before dividing it into folds
- Preserves the distribution of class values across folds
- Reduces variance in performance
Unstratified cross-validation means there is no randomization:
- Order is preserved
- This is an advantage for matching predictions with instances in Weka
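Stratification can be sketched as: shuffle the instances of each class, then deal them round-robin across folds so every fold roughly preserves the overall class distribution. This is an illustration of the idea, not Weka's exact implementation:

```python
import random
from collections import defaultdict

def stratified_folds(labels, k, seed=0):
    """Return k folds, each a list of instance indices, with class proportions preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    n = 0
    for idxs in by_class.values():
        rng.shuffle(idxs)          # randomize within each class
        for i in idxs:             # deal round-robin across folds
            folds[n % k].append(i)
            n += 1
    return folds

labels = ["A"] * 6 + ["B"] * 4
for fold in stratified_folds(labels, 2):
    counts = {y: sum(labels[i] == y for i in fold) for y in "AB"}
    print(counts)  # each fold keeps 3 As and 2 Bs
```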
![Page 18: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/18.jpg)
Stratified versus Non-Stratified
Leave-one-out cross-validation:
- Train on all but one instance; iterate over all instances
- An extreme version of unstratified cross-validation: if the test set has only one instance, the distribution of class values cannot be preserved
- Maximizes the amount of data used for training on each fold
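Leave-one-out just enumerates n train/test splits, each holding out a single instance:

```python
def leave_one_out(n):
    """Yield (train_indices, test_index) pairs for leave-one-out cross-validation."""
    for i in range(n):
        train = [j for j in range(n) if j != i]
        yield train, i

splits = list(leave_one_out(4))
print(len(splits))   # 4 folds, one per instance
print(splits[0])     # ([1, 2, 3], 0)
```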
![Page 19: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/19.jpg)
Stratified versus Non-Stratified
Leave-one-subpopulation-out:
- If you have several data points from the same subpopulation (e.g., speech data from the same speaker), you may end up with data from the same subpopulation in both train and test
- That overlap between train and test leads to over-estimating performance
- When is this not a problem? When you can manually make sure it won't happen, but you have to do that by hand
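Doing it by hand amounts to splitting on the subpopulation id rather than on individual instances, so no speaker ever appears in both train and test. A minimal sketch (the speaker ids are made up):

```python
def leave_one_group_out(groups):
    """groups: per-instance subpopulation ids (e.g., speaker ids).
    Yield (train_indices, test_indices) with no group shared between the two."""
    for g in sorted(set(groups)):
        test = [i for i, gi in enumerate(groups) if gi == g]
        train = [i for i, gi in enumerate(groups) if gi != g]
        yield train, test

speakers = ["s1", "s1", "s2", "s2", "s3"]
for train, test in leave_one_group_out(speakers):
    # No speaker appears on both sides of the split
    assert not {speakers[i] for i in train} & {speakers[i] for i in test}
    print(test)
```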
![Page 20: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/20.jpg)
Creating Train/Test Pairs
If you pick unsupervised, you'll get non-stratified folds; otherwise you'll get stratified folds.
![Page 21: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/21.jpg)
Creating Train/Test Pairs
Now click here
![Page 22: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/22.jpg)
Creating Train/Test Pairs
![Page 23: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/23.jpg)
Creating Train/Test Pairs
You're going to run this filter 20 times altogether: twice for every fold.
![Page 24: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/24.jpg)
Creating Train/Test Pairs
True for Train, false for Test
![Page 25: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/25.jpg)
Creating Train/Test Pairs
If you're doing stratified, make sure you have the class attribute selected here.
![Page 26: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/26.jpg)
Creating Train/Test Pairs
1. Click Apply
![Page 27: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/27.jpg)
Creating Train/Test Pairs
2. Save the file
![Page 28: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/28.jpg)
Creating Train/Test Pairs
3. Undo before you create the next file
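The click sequence above (invert the selection for train, apply, save, undo, repeat per fold) can also be expressed as a loop. A rough out-of-Weka equivalent that produces the 10 train/test pairs (20 files, if each were saved):

```python
def train_test_pairs(instances, k):
    """Split instances into k (train, test) pairs, loosely mirroring Weka's
    RemoveFolds filter: fold i is the test set; the other k-1 folds form train."""
    folds = [instances[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

data = list(range(10))
pairs = list(train_test_pairs(data, 10))
print(len(pairs))   # 10 folds -> 10 train/test pairs
```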
![Page 29: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/29.jpg)
Doing Manual Train/Test
* First load the training data on the Preprocess tab
![Page 30: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/30.jpg)
Doing Manual Train/Test
* Now select Supplied Test Set as the Test Option
![Page 31: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/31.jpg)
Doing Manual Train/Test
Then click Set
![Page 32: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/32.jpg)
Doing Manual Train/Test
* Next load the Test set
![Page 33: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/33.jpg)
Doing Manual Train/Test
* Then you're all set, so click on Start
![Page 34: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/34.jpg)
Evaluation Methodology
![Page 35: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/35.jpg)
Intro to Chapter 5
Many techniques illustrated in Chapter 5 (ROC curves, recall-precision curves) don't show up in applied papers. They are useful for showing trade-offs between properties of different algorithms, so you see them in theoretical machine learning papers.
![Page 36: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/36.jpg)
Intro to Chapter 5
It is still important to understand what they represent:
- The thinking behind the techniques will show up in your papers
- You need to know what your numbers do and don't demonstrate
- They give you a unified framework for thinking about machine learning techniques
- There is no cookie cutter for a good evaluation
![Page 37: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/37.jpg)
Confidence Intervals
Mainly important if there is some question about whether your data set is big enough. You average your performance over 10 folds, but how certain can you be that the number you got is correct? We saw before that performance varies from fold to fold.
[Figure: a confidence interval marked on a number line from 0 to 40]
![Page 38: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/38.jpg)
Confidence Intervals
We know that the distribution of categories found in the training set and in the testing set affects the performance, so performance on two different sets will not be the same. Confidence intervals allow us to say, for example, that the probability of the real performance value being within a certain range of the observed value is 90%.
[Figure: a confidence interval marked on a number line from 0 to 40]
![Page 39: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/39.jpg)
Confidence Intervals
Confidence limits come from the normal distribution and are computed in terms of the number of standard deviations from the mean. If the data is normally distributed, there is about a 15% chance of the real value being more than 1 standard deviation above the mean.
![Page 40: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/40.jpg)
What is a significance test?
How likely is it that the difference you see occurred by chance? And how could the difference occur by chance?
[Figure: two overlapping confidence intervals on a number line from 0 to 40]
If the mean of one distribution is within the confidence interval of the other, the difference you observe could be by chance. If you want p < .05, you need the 90% confidence intervals; find the corresponding z-scores in a standard normal distribution table.
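That rule of thumb can be sketched numerically: build a z-based 90% interval around each approach's mean and check whether either mean lands inside the other's interval (all the numbers below are made up):

```python
import math

def interval(mean, sd, n, z=1.65):
    """z-based confidence interval for a mean estimated from n observations."""
    half = z * sd / math.sqrt(n)
    return mean - half, mean + half

a = interval(20.0, 4.0, 10)   # made-up mean/sd for approach A over 10 folds
b = interval(27.0, 4.0, 10)   # made-up mean/sd for approach B over 10 folds
# The difference could be chance if either mean lies inside the other's interval
could_be_chance = (a[0] <= 27.0 <= a[1]) or (b[0] <= 20.0 <= b[1])
print(a, b, could_be_chance)
```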
![Page 41: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/41.jpg)
Computing Confidence Intervals
A 90% confidence interval corresponds to z = 1.65: there is a 5% chance that a data point falls to the right of the rightmost edge of the interval.
- f = proportion of successes
- N = number of trials
- p = (f + z²/2N ± z·sqrt(f/N − f²/N + z²/4N²)) / (1 + z²/N)
Example: f = 75%, N = 1000, c = 90% → [0.727, 0.773]
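The formula above (the Wilson score interval) can be checked directly; the values here reproduce the slide's example approximately:

```python
import math

def wilson_interval(f, n, z=1.65):
    """Confidence interval for a success proportion f observed over n trials,
    using the score-interval formula from the slide."""
    center = f + z * z / (2 * n)
    spread = z * math.sqrt(f / n - f * f / n + z * z / (4 * n * n))
    denom = 1 + z * z / n
    return (center - spread) / denom, (center + spread) / denom

lo, hi = wilson_interval(0.75, 1000)
print(round(lo, 3), round(hi, 3))  # close to the slide's [0.727, 0.773]
```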
![Page 42: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/42.jpg)
Significance Tests
If you want to know whether the difference in performance between Approach A and Approach B is significant:
- Get performance numbers for A and B on each fold of a 10-fold cross-validation
- You can use the Experimenter, or you can do the computation in Excel or Minitab
- If you use exactly the same "folds" across approaches, you can use a paired t-test rather than an unpaired t-test
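A paired t-test on per-fold scores can be computed with nothing but the standard library; the per-fold kappa values below are hypothetical:

```python
import math

def paired_t(a, b):
    """Paired t statistic for per-fold scores of two approaches on the same folds."""
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    mean = sum(diffs) / n
    var = sum((d - mean) ** 2 for d in diffs) / (n - 1)  # sample variance of the differences
    return mean / math.sqrt(var / n)

# Hypothetical per-fold kappa for approaches A and B on identical folds
a = [0.52, 0.55, 0.50, 0.58, 0.54, 0.51, 0.56, 0.53, 0.57, 0.52]
b = [0.48, 0.50, 0.47, 0.52, 0.49, 0.46, 0.51, 0.50, 0.53, 0.47]
t = paired_t(a, b)
print(round(t, 2))  # compare |t| against the critical value t(.025, df=9) ≈ 2.26 for p < .05
```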
![Page 43: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/43.jpg)
Significance Tests
Don't forget that you can get a significant result by chance! The Experimenter corrects for multiple comparisons. Significance tests are less important if you have a large amount of data and the difference in performance between approaches is large.
![Page 44: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/44.jpg)
Using the Experimenter
* First click New
![Page 45: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/45.jpg)
Using the Experimenter
Make sure Simple is selected
![Page 46: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/46.jpg)
Using the Experimenter
Select .csv as the output file format and click on Browse
Enter file name
Click on Add New
![Page 47: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/47.jpg)
Using the Experimenter
Load data set
![Page 48: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/48.jpg)
Using the Experimenter
10 repetitions is better than 1, but 1 is faster.
![Page 49: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/49.jpg)
Using the Experimenter
Click on Add New to add algorithms
![Page 50: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/50.jpg)
Using the Experimenter
Click Choose to select an algorithm
![Page 51: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/51.jpg)
Using the Experimenter
You should add Naïve Bayes, SMO, and J48
![Page 52: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/52.jpg)
Using the Experimenter
Then click on the Run tab
![Page 53: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/53.jpg)
Using the Experimenter
Click on Start
![Page 54: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/54.jpg)
Using the Experimenter
When it's done, click on Analyze
![Page 55: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/55.jpg)
Using the Experimenter
Click File to load the results file you saved
![Page 56: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/56.jpg)
Using the Experimenter
![Page 57: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/57.jpg)
Do Analysis
* Explicitly select default settings here
* Then select Kappa here
* Then select Perform Test
![Page 58: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/58.jpg)
Do Analysis
* The base case is what you are comparing with
![Page 59: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/59.jpg)
![Page 60: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/60.jpg)
CSV Output
![Page 61: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/61.jpg)
Analyze with Minitab
![Page 62: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/62.jpg)
More Complex Statistical Analyses
I put a Minitab manual in the Readings folder on Blackboard.
![Page 63: Machine Learning in Practice Lecture 9](https://reader036.vdocuments.us/reader036/viewer/2022062411/56816829550346895dddbc02/html5/thumbnails/63.jpg)
Take Home Message
- We focused on practical, methodological aspects of the topic of evaluation
- We talked about the concept of a confidence interval and about significance tests
- We learned how to create train/test pairs for manual cross-validation, which is useful when preparing for an error analysis
- We also learned how to use the Experimenter to run experiments and significance tests