ch2 pdf slides - amazon s3 · r functions > df$a df$b

32
R Functions Why should you write a function?

Upload: others

Post on 16-Aug-2020

10 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

Why should you write a function?

Page 2: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

> df$a <- (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))

> df$b <- (df$b - min(df$b, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$b, na.rm = TRUE))

> df$c <- (df$c - min(df$c, na.rm = TRUE)) / (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))

> df$d <- (df$d - min(df$d, na.rm = TRUE)) / (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))

What does this code do?

Page 3: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

> df$a <- (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))

> df$b <- (df$b - min(df$b, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$b, na.rm = TRUE))

> df$c <- (df$c - min(df$c, na.rm = TRUE)) / (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))

> df$d <- (df$d - min(df$d, na.rm = TRUE)) / (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))

What does this code do?

Page 4: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

What does this code do?> df$a <- (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))

> df$b <- (df$b - min(df$b, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$b, na.rm = TRUE))

> df$c <- (df$c - min(df$c, na.rm = TRUE)) / (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))

> df$d <- (df$d - min(df$d, na.rm = TRUE)) / (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))

Duplication hides the intent

Page 5: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

> df$a <- (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))

> df$b <- (df$b - min(df$b, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$b, na.rm = TRUE))

> df$c <- (df$c - min(df$c, na.rm = TRUE)) / (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))

> df$d <- (df$d - min(df$d, na.rm = TRUE)) / (max(df$d, na.rm = TRUE) - min(df$d, na.rm = TRUE))

What does this code do?

Did you spot the mistake?

Page 6: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

How was this code wri!en?> df$a <- (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))

Do one column

Page 7: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

> df$a <- (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))

> df$a <- (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))

How was this code wri!en?

Copy-and-paste

Page 8: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

> df$a <- (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))

> df$b <- (df$b - min(df$b, na.rm = TRUE)) / (max(df$b, na.rm = TRUE) - min(df$b, na.rm = TRUE))

How was this code wri!en?

Edit for next column

Page 9: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

> df$a <- (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))

> df$b <- (df$b - min(df$b, na.rm = TRUE)) / (max(df$b, na.rm = TRUE) - min(df$b, na.rm = TRUE))

> df$c <- (df$c - min(df$c, na.rm = TRUE)) / (max(df$c, na.rm = TRUE) - min(df$c, na.rm = TRUE))

How was this code wri!en?

Repeat

Page 10: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

Course Title

When should you write a function?

If you have copied-and-pasted twice,

it’s time to write a function

Page 11: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

Writing a function makes the intent clearer> df$a <- rescale01(df$a)

> df$b <- rescale01(df$b)

> df$c <- rescale01(df$c)

> df$d <- rescale01(df$d)

● Reduces mistakes from copy and pasting

● Makes updating code easier

Page 12: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

Functional programming further reduces duplication

> library(purrr) > df[] <- map(df, rescale01)

Chapter 3: Functional Programming

Page 13: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

Let’s practice!

Page 14: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

How should you write a function?

Page 15: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

Start with a simple problem> df <- data.frame( a = rnorm(10), b = rnorm(10), c = rnorm(10), d = rnorm(10) )

# Rescale the a column in df to a 0-1 range

Page 16: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

Start with a simple problem> df <- data.frame( a = 1:11, b = rnorm(11), c = rnorm(11), d = rnorm(11) )

# Rescale the a column in df to a 0-1 range

# Output should be: [1] 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0

Page 17: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

Get a working snippet of code> (df$a - min(df$a, na.rm = TRUE)) / (max(df$a, na.rm = TRUE) - min(df$a, na.rm = TRUE))

Page 18: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

Rewrite to use temporary variables

> ( x - min( x , na.rm = TRUE)) / (max( x , na.rm = TRUE) - min( x , na.rm = TRUE))

Page 19: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

Rewrite to use temporary variables> x <- df$a

> ( x - min( x , na.rm = TRUE)) / (max( x , na.rm = TRUE) - min( x , na.rm = TRUE))

Page 20: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

Rewrite for clarity> x <- df$a

> rng <- range(x, na.rm = TRUE) (x - rng[1]) / (rng[2] - rng[1])

Page 21: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

Finally, turn it into a function> x <- df$a

> rescale01 <- function(x){ rng <- range(x, na.rm = TRUE) (x - rng[1]) / (rng[2] - rng[1]) }

> rescale01(x)

Page 22: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

Course Title

How should you write a function?● Start with a simple problem

● Get a working snippet of code

● Rewrite to use temporary variables

● Rewrite for clarity

● Finally, turn into a function

Page 23: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

Let’s practice!

Page 24: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

How can you write a good function?

Page 25: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

What makes a good function?● Correct

● Understandable

● Functions are for humans and computers

● Correct + Understandable = Obviously correct

Page 26: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

What does this code do?> baz <- foo(bar, qux)

> df2 <- arrange(df, qux)

Who knows?

Good names make code understandable with minimal context

Page 27: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

Naming principles

# Good > col_mins() > row_maxes()

# Bad > newData <- c(old.data, todays_log)

● Pick a consistent style for long names

# Bad > T <- FALSE > c <- 10 > mean <- function(x) sum(x)

● Do not override existing variables or functions

✴ Same whether objects, functions, or arguments

Page 28: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

Function names

# Good > impute_missing() # Bad > imputed()

● Should generally be verbs

# Good > collapse_years() # Bad > f() > my_awesome_function()

● Should be descriptive

Page 29: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

Argument names● Should generally be nouns

● Use the very common short names when appropriate:

● x, y, z: vectors

● df: a data frame

● i, j: numeric indices (typically rows and columns)

● n: length, or number of rows

● p: number of columns

Page 30: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

t.test(x, y = NULL, alternative = c("two.sided", "less", "greater"), mu = 0, paired = FALSE, var.equal = FALSE, conf.level = 0.95, ...)

Argument order

● Data arguments come first

● Detail arguments should have sensible defaults

mean(x, trim = 0, na.rm = FALSE, ...)

Detail Arguments: supply arguments that control the details of the computation

Data arguments: supply data to compute on

Page 31: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

What makes a good function?● Use good names for functions and arguments

● Use an intuitive argument order and reasonable default values

● Make it clear what the function returns

● Use good style inside the body of the function

Page 32: ch2 pdf slides - Amazon S3 · R Functions > df$a  df$b

R Functions

Let’s practice!