gaps in our knowledge? a central problem in data modelling and how to get round it rob harrison...

GAPS IN OUR KNOWLEDGE?

a central problem in data modelling and how to get round

itRob Harrison

Agenda sampling

density & distribution representation

accuracy vs generality regularization

trading off accuracy and generality cross validation

what happens when we can’t see

The Data Modelling Problem

y= f(x) z=y+e multivariate & non-linear

measurement errors {xi, zi} i = 1:N zi = f(xi)+ei infer behaviour everywhere from a

few examples little or no prior information on f(x)

A Simple Example

one cycle of a sine wave “well-spaced” samples enough data (N = 6) noise-free

0 0.2 0.4 0.6 0.8 1-0.5

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1-0.5

“observational” data

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9-0.4

0 0.2 0.4 0.6 0.8 1-0.5

Sparsely Sampled

two well-spaced samples

0 0.2 0.4 0.6 0.8 1-0.5

0 0.2 0.4 0.6 0.8 1-0.2

0 0.2 0.4 0.6 0.8 1-0.4

Densely Sampled

200 well-spaced samples

0 0.2 0.4 0.6 0.8 1-0.5

Large N

200 poorly-spaced samples

0 0.2 0.4 0.6 0.8 1-0.5

What’s So Hard?

the gaps get more (well-spaced) data

lack of prior knowledge can’t see

dimension!!

Two Dimensions

same sampling density (N = 62) well-spaced?

Dimensionality

lose ability to see the shape of f(x) try it in 13-D

number of samples exponential in d if N OK in 1-D, Nd needed in d-D

how do we know if “well-spaced”? how can we sample where the action is? observational vs experimental data!

Generic Structure

use e.g. a power series Stone-Weierstrass other bases e.g. Fourier series …

y = a5x5 + a4x4 + a3x3 + …+ a1x + a0

= 5 & six samples – no error!

0 0.2 0.4 0.6 0.8 1-1

PROBLEM SOLVED!

Generic Structure

y = a5x5 + a4x4 + a3x3 + …+ a1x + a0

= 5 & six samples – no error! > 5 still no error but …

0 0.2 0.4 0.6 0.8 1-1

poor inter-sample behaviour

how can we know without looking?

Generic Structure

y = a5x5 + a4x4 + a3x3 + …+ a1x + a0

= 5 & six samples – no error! > 5 still no error but … measurement error

0 0.2 0.4 0.6 0.8 1-1

Generic Structure

y = a5x5 + a4x4 + a3x3 + …+ a1x + a0

= 5 & six samples – no error! > 5 still no error but … measurement error model is as complex as data

Curse Of Dimension we can still use the idea but … in 2-D we get 21 terms

direct and cross products in d-D we get (d+)!/d!! e.g. transform a 16x16 bitmap by =3

polynomial and get ~ 3 million terms sample size / distribution practical for “small” problems

Other Basis Functions

Gaussian radial basis functions additional design choices

how many?, where?, how wide? adaptive sigmoidal basis functions

how many?

0 0.2 0.4 0.6 0.8 1-0.5

0 0.2 0.4 0.6 0.8 1-1

zero error?

Overfitting (sample data)

rough components

0 0.2 0.4 0.6 0.8 1-0.5

0 0.2 0.4 0.6 0.8 1-400

Underfitting (sample data)

over-smooth components

0 0.2 0.4 0.6 0.8 1-0.5

0 0.2 0.4 0.6 0.8 1-1

v. small error

“just right” components

Goldilocks

Restricting “Flexibility” can we use data to tell the

estimator how to behave? regularization/penalization

penalize roughness e.g. SSE + Q

use potentially complex structure data constrains where it can Q constrains elsewhere

0 0.2 0.4 0.6 0.8 1-2

four narrow grbfs

RMSE = 0.80

0 0.2 0.4 0.6 0.8 1-2

four narrow grbfs + penalty

RMSE = 0.24

How Well Are We Doing?

so far we know answer for d > 2 we are “blind” what’s happening between samples? test on new samples in between

VALIDATION compute a performance index

GENERALIZATION

0 0.2 0.4 0.6 0.8 1-0.5

Hold-out Method

keep back P% for testing

wasteful

sample dependent

Training RMSE 0.23Testing RMSE 0.38

Cross Validation leave-one-out CV

train on all but one test that one repeat N times compute performance

m-fold CV divide sample into m non-overlapping sets proceed as above

all data used for training and testing more work but realistic performance estimates

used to choose “hyper-parameters” e.g. , number, width …

Conclusion gaps in our data cause most of our

problems noise

more data only part of the answer distribution / parsimony

restricting flexibility helps if no evidence to contrary

cross validation is a window into d-D estimates inter-sample behaviour

gaps in our knowledge? a central problem in data modelling and how to get round it rob harrison...

Documents

2013 lsa d ivisional a ffairs p romotions o verview kathe...

young carers and family group conferencing: learning about...

2007 fall nutrition progress: rob harrison hood river...

languages for software-defined networks nate foster, michael...

rob fatal as rob fatal

pse 450 teamwork and leadership in an environmental science...

advanced/ intermediate rob merritt - jazz aspen snowmass...

summary of results from the regional forest nutrition...

enabling innovation inside the network joint with nate...

harrison/purchase* 3q 2012 single family housing sales...

an agreement between the harrison central...

young carers and family group conferencing– a short term...

aerobatic performer rob harrison – the tumbling … ·...

march 21 - 22, exponor - qsp summit€¦ · march 21 - 22,...

compost amendment to control runoff from turf rob harrison...

harrison tower rehabilitation project 1621 harrison...

george harrison dark horse/best of george harrison

a network programming language -...

happy street / harrison home inspections ltd. / kevin...

esrm 410 forest soils & site productivity instructors daniel...