Optimization in R: algorithms, sequencing, and automatic differentiation



James Thorson
Aug. 26, 2011

Themes

Basic:
• Algorithms
• Settings
• Starting location

Intermediate:
• Sequenced optimization
• Phasing
• Parameterization
• Standard errors

Advanced:
• Derivatives

Outline

1. One-dimensional
2. Two-dimensional
3. Using derivatives

ONE-DIMENSIONAL

Basic: Algorithm

• Characteristics
– Very fast
– Somewhat unstable

• Process
– Starts with 2 points
– Moves in the direction of the higher point
– Then searches between the two highest points

optimize(f = , interval = , ...)
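A minimal sketch of optimize() on a one-dimensional objective (the function and interval below are made up for illustration):

f <- function(x) (x - 2)^2 + 1              # hypothetical objective with its minimum at x = 2
fit <- optimize(f = f, interval = c(-10, 10))
fit$minimum     # location of the minimum (about 2)
fit$objective   # objective value at the minimum (about 1)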


Intermediate: Sequenced optimization

Sequencing:
1. Use a stable but slow method
2. Then use a fast method for fine-tuning

One-dimensional sequencing (sketched below):
1. Grid search
2. Then use optimize()
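A minimal sketch of this sequencing, using a made-up objective:

f <- function(x) sin(x) + 0.1 * x^2                 # hypothetical objective
# 1. Coarse grid search: stable but slow
grid <- seq(-10, 10, by = 0.5)
x_best <- grid[which.min(sapply(grid, f))]
# 2. Fast fine-tuning with optimize() near the best grid point
fit <- optimize(f = f, interval = c(x_best - 0.5, x_best + 0.5))
fit$minimum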


Basic: Algorithms

Other one-dimensional functions (sketched below):
• uniroot() – finds where f(∙) = 0
• polyroot() – finds all roots of a polynomial
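Minimal sketches of both functions (the example equations are made up):

g <- function(x) x^2 - 4                 # hypothetical function with a root at x = 2
uniroot(g, interval = c(0, 5))$root      # finds the single root inside the interval
polyroot(c(-4, 0, 1))                    # all roots of -4 + 0*x + 1*x^2, i.e. -2 and 2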

TWO-DIMENSIONAL

Basic: Settings

• trace = 1
– Means different things for different optimization routines
– In general, gives output during optimization
– Useful for diagnostics

optimx(par = , fn = , lower = , upper = , control = list(trace = 1, follow.on = TRUE), method = c("nlminb", "L-BFGS-B"))


Basic: Settings

• follow.on = TRUE
– Starts each subsequent method at the previous method's stopping point

• method = c("nlminb", "L-BFGS-B")
– Lists the set and order of methods to use (see the sketch below)

See also calcMin() in the "PBSmodelling" package.
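A minimal sketch of a sequenced optimx() call using these settings; the two-parameter objective is made up for illustration:

library(optimx)
fn <- function(p) (p[1] - 1)^2 + 10 * (p[2] - p[1]^2)^2   # hypothetical banana-like surface
fit <- optimx(par = c(0, 0), fn = fn,
              lower = c(-5, -5), upper = c(5, 5),
              method = c("nlminb", "L-BFGS-B"),
              control = list(trace = 1, follow.on = TRUE))
fit   # one row of results per method; the second method starts where the first stopped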

Basic: Settings

Constraints:
• Unbounded
• Bounded
– I recommend using bounds
– Box constraints are common
• Non-box constraints
– Usually implemented in the objective function (sketched below)
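A minimal sketch of a non-box constraint handled inside the objective function via a penalty; the constraint p[1] + p[2] <= 1 and the objective are made up for illustration:

fn_con <- function(p) {
  penalty <- 1e6 * max(0, p[1] + p[2] - 1)^2   # large penalty when the constraint is violated
  (p[1] - 1)^2 + (p[2] - 2)^2 + penalty        # hypothetical objective plus penalty
}
optim(par = c(0, 0), fn = fn_con, method = "L-BFGS-B", lower = c(-5, -5), upper = c(5, 5))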

Basic: Algorithms

Differences among algorithms:
• Speed vs. accuracy
• Unbounded vs. bounded
• Ability to use derivatives

Basic: Algorithms

Nelder-Mead (a.k.a. "Simplex")
• Characteristics
– Bounded (nlminb)
– Unbounded (optimx)
– Cannot use derivatives
– Slow, but good at following valleys
– Easily stuck at local minima

Basic: Algorithms

Nelder-Mead (a.k.a. "Simplex")
• Process
– Uses a polygon (simplex) with n+1 vertices
– Reflects the worst point across the centroid of the others
– If worse: shrink
– If better: accept, and expand along that axis

Basic: Algorithms

[Figure: successive Nelder-Mead simplex steps traced on a two-dimensional surface, X and Y axes from -1 to 3]

Basic: Algorithms

Rosenbrock "Banana" function (see the sketch below)
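A minimal sketch of the Rosenbrock function in R, minimized with Nelder-Mead (the starting value is chosen arbitrarily):

banana <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2
optim(par = c(-1.2, 1), fn = banana, method = "Nelder-Mead")$par   # converges near c(1, 1)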

Basic: Algorithms

Quasi-Newton ("BFGS")
• Characteristics
– Unbounded (optim, method = "BFGS")
– Bounded (optim, method = "L-BFGS-B")
– Can use derivatives
– Fast, but less accurate

Basic: Algorithms

Quasi-Newton ("BFGS")
• Process (sketched below)
– Approximates the gradient and Hessian
– Uses Newton's method to update the location
– Uses various other methods to update the gradient and Hessian
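A minimal sketch of BFGS with an analytic gradient supplied through gr = ; the objective is the Rosenbrock function from the earlier sketch:

banana    <- function(p) 100 * (p[2] - p[1]^2)^2 + (1 - p[1])^2
banana_gr <- function(p) c(-400 * p[1] * (p[2] - p[1]^2) - 2 * (1 - p[1]),   # d/dp[1]
                            200 * (p[2] - p[1]^2))                           # d/dp[2]
optim(par = c(-1.2, 1), fn = banana, gr = banana_gr, method = "BFGS")$par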


Basic: Algorithms

Quasi-Newton ("ucminf")
• A different variation on quasi-Newton


Basic: Algorithms

Conjugate gradient
• Characteristics
– Unbounded (optim, method = "CG")
– Very fast for near-quadratic problems
– Low memory
– Generally rather unstable
– I don't recommend it for general usage

Basic: Algorithms

Conjugate gradient
• Process (sketched below)
– Derivatives are calculated numerically
– Successive search directions are "conjugate" (i.e., they form an optimal linear basis for a quadratic problem)
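A minimal sketch of the conjugate-gradient method on a made-up near-quadratic objective:

quad <- function(p) sum((p - c(1, 2, 3))^2)        # hypothetical quadratic objective
optim(par = rep(0, 3), fn = quad, method = "CG")$par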


Basic: Algorithms

Many others! As one example:

Spectral projected gradient
• Characteristics – ???
• Process – ???


Basic: Algorithms

Accuracy trials:

Problem  Npar  bobyqa  newuoa  Rvmmin  nlminb  Rcgmin  ucminf  L-BFGS-B  nlm  spg  Nelder-Mead  BFGS  CG
1          50       0       0       1       0       1       0         1    1    1            0     1   1
2          50       0       0       0       1       1       0         1    1    0            0     0   1
3          50       0       0       0       1       1       0         1    1    0            0     0   1
4           2       0       0       0       1       1       1         1    0    0            1     0   0
5           3       0      NA       1       1       0      NA         1   NA    1           NA    NA  NA
6          50       0       0       1       0       1       0         1    1    1            0     1   1
7          50       0       0       1       0       1       0         1    1    1            0     1   1
8          50       0       0       0       1       1       1         1    1    1            0     1   1
9         303       0       0       1       1       1       0         1    1    1            0     1   1
10          5       0      NA       1       1       1      NA         1   NA    1           NA    NA  NA

Basic: Starting location

It's important to provide a good starting location!
– Some methods (like nlminb) find the nearest local minimum
– A good start speeds convergence

Intermediate: Parameterization

Suggestions:
1. Put all parameters on a similar scale
– Derivatives are approximately equal
– One method: transform inputs with exp() and an inverse-logit such as plogis() (sketched below)
2. Minimize covariance
3. Minimize changes in scale or covariance
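A minimal sketch of putting parameters on a similar, unconstrained scale; the objective is a placeholder, with exp() keeping a variance positive and plogis() keeping a probability in (0, 1):

nll <- function(par) {
  sigma <- exp(par[1])      # optimizer works with log(sigma)
  p     <- plogis(par[2])   # optimizer works with logit(p)
  (sigma - 2)^2 + (p - 0.3)^2   # placeholder for a negative log-likelihood in sigma and p
}
optim(par = c(0, 0), fn = nll, method = "BFGS")$par   # estimates on the transformed scale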

Intermediate: Phasing

Phasing (sketched below):
1. Estimate some parameters (with others fixed) in a first phase
2. Estimate more parameters in each phase
3. Eventually estimate all parameters

Uses:
• Multi-species models – estimate with linkages in later phases
• Statistical catch-at-age – estimate scale early
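A minimal sketch of phasing with a made-up two-parameter objective:

nll <- function(p) (p[1] - 1)^2 + 5 * (p[1] * p[2] - 2)^2   # hypothetical objective

# Phase 1: estimate p[1] with p[2] fixed at an initial guess
p2_fixed <- 1
phase1 <- optimize(f = function(p1) nll(c(p1, p2_fixed)), interval = c(-10, 10))

# Phase 2: estimate both parameters, starting from the phase-1 result
phase2 <- optim(par = c(phase1$minimum, p2_fixed), fn = nll, method = "BFGS")
phase2$par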

Intermediate: Standard errors

Maximum likelihood allows asymptotic estimates of standard errors (sketched below):
1. Calculate the Hessian matrix at the maximum likelihood estimate
– Second derivatives of the negative log-likelihood function
2. Invert the Hessian
3. Diagonal entries are variances
4. Square roots are standard errors
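A minimal sketch of these steps with optim(); the negative log-likelihood is a made-up quadratic, and hessian = TRUE requests a numerical Hessian at the optimum:

nll <- function(p) 0.5 * sum((p - c(1, 2))^2 / c(0.5, 2))   # hypothetical negative log-likelihood
fit <- optim(par = c(0, 0), fn = nll, method = "BFGS", hessian = TRUE)
vcov_hat <- solve(fit$hessian)    # step 2: invert the Hessian
se_hat <- sqrt(diag(vcov_hat))    # steps 3-4: variances on the diagonal, square roots are SEs
se_hat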

Intermediate: Standard errors

Calculation of the Hessian depends on parameter transformations:
• When using exp() or logit() transformations, use the delta method to transform standard errors back to the untransformed (natural) scale (sketched below)
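A minimal sketch of the delta method for a log-transformed parameter; the estimate and its standard error below are made-up numbers:

log_sigma <- 0.7          # hypothetical estimate on the log scale
se_log    <- 0.1          # hypothetical standard error on the log scale
sigma    <- exp(log_sigma)
se_sigma <- se_log * exp(log_sigma)   # delta method: SE of exp(x) is approximately SE(x) * exp(x)
c(estimate = sigma, se = se_sigma)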


Intermediate: Standard errors

Gill and King (2004), "What to do when your Hessian is not invertible":
• gchol() – generalized Cholesky ("kinship" package)
• ginv() – Moore-Penrose inverse ("MASS" package; see the sketch below)
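A minimal sketch of ginv() as a fallback when solve() fails on a singular Hessian; the matrix is made up:

library(MASS)
H <- matrix(c(4, 2, 2, 1), nrow = 2)   # hypothetical singular Hessian (determinant 0)
# solve(H) would fail here; the Moore-Penrose pseudo-inverse still returns a matrix
vcov_pinv <- ginv(H)
vcov_pinv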

Intermediate: Standard errors

[Switch over to R screen to show mle() and solve(hess())]

Advanced: Differentiation

Gradient:
• Quasi-Newton
• Conjugate gradient

Hessian:
• Quasi-Newton

optimx(par = , fn = , gr = , hess = , lower = , upper = , control = list(trace = 1, follow.on = TRUE), method = c("nlminb", "L-BFGS-B"))

Advanced: Differentiation

Automatic differentiation
• AD Model Builder
• "radx" package (still in development)

Semi-automatic differentiation
• "Rsympy" package

Symbolic differentiation
• deriv()

BUT: none of these handle loops or sum()/prod(), so they're not really helpful for statistics yet (see the sketch below for a partial workaround with deriv()).
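A minimal sketch that uses deriv() to differentiate a per-observation term symbolically and then sums the value and gradient outside deriv(), since deriv() itself cannot differentiate through sum(); the model and data are made up:

g <- deriv(~ (y - a * exp(b * x))^2, namevec = c("a", "b"),
           function.arg = c("a", "b", "x", "y"))
x <- c(1, 2, 3); y <- c(2.7, 7.4, 20.1)                             # hypothetical data
fn <- function(p) sum(g(p[1], p[2], x, y))                          # total objective
gr <- function(p) colSums(attr(g(p[1], p[2], x, y), "gradient"))    # summed symbolic gradient
optim(par = c(1, 1), fn = fn, gr = gr, method = "BFGS")$par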

Advanced: Differentiation

Timing comparisons:

Mixture distribution model (~15 params)
• 10 seconds in R
• 2 seconds in ADMB

Multispecies catchability model (~150 params)
• 4 hours in R (using trapezoid method)
• 5 minutes in ADMB (using MCMC)

Surplus production meta-analysis (~750 coefs)
• 7 days in R (using trapezoid method)
• 2 hours in ADMB (using trapezoid method)
