Welcome back!

What will you be doing on May 10th, 2014?

• http://www.youtube.com/watch?v=Jey_fUDh3vI

Official IB Schedule

• IA SUBMISSION: MARCH 28, 2014

• IB BIOLOGY FINAL EXAM: MAY 12 (MONDAY)

33 weeks from beginning of term

TOPICS FOR FALL TERM

STANDARD AND HIGHER LEVEL

• Statistics (2h)

• Genetics (15h)

• Respiration (2h)

• Photosynthesis (3h)

• Further Ecology (6h)

• (SL only – Topic A: Human Nutrition and Health) (2h)

HIGHER LEVEL ONLY

• Further Genetics (6h)

• Further Respiration (7h)

• Further Photosynthesis (5h)

• Further Ecology (5h)

TOPICS FOR SPRING TERM

STANDARD AND HIGHER LEVEL

• (SL only – Topic A: Human Nutrition and Health) (2h)

• Human physiology (9h)

HIGHER LEVEL ONLY

• Plant science (11h)

• HL Human physiology (17h)

• Topic H: Further human physiology (15h)

Remaining IA assignments

• Genetics IA (DCP and CE)

• Photosynthesis IA (Design)

• Plant Science IA (Design, DCP and CE)

• Human Health and Nutrition (Design, DCP and IA)

IA SUBMISSION: MARCH 28, 2014

Statistics

• How can we know that scientific information is reliable and valid?

• Why does Biology need statistical methods?

Big questions in Science…

What do I need to know about statistics to succeed in IB Biology?

Statisticians…

‘..people who like figures, but don’t have the personality skills to become accountants…’

• Do uncertainty, randomness and chance have a place in science?

• How should we react to them?...

What do we need to know about statistics?

• ‘Average’: mean, median, mode

• ‘Error bars’: variance, standard deviation, standard error of the mean, (interquartile range)

• Significance and probability

• T-tests (1- and 2-tailed, paired and independent)

• Chi-squared test (genetics IA)

• The relationship of causation and correlation

• Classic graphs

Can statistics help us?

• Chocolate gives you spots

• Late nights sap young people’s brain power

• Coffee can make you see dead people

• Mobile phones cause cancer!

How do we make sense of data?

Look for patterns and outliers in different groups: Descriptive statistics

• Graphs, tables, means and variance

• On their own, descriptive results can’t be used to generalise about the wider population

Apply tests to see if the differences we see are of predictive value (reliable): Inferential statistics

• T-tests

• Chi-squared tests

• ANOVA

• Regression analysis

These tests allow us to make inferences (generalisations) about the population beyond our data (based on probability)

What do we do with Biological data?

• Measure ‘central value’: mean, median, mode

• Measure ‘spread’ (variance): range, standard deviation, interquartile range

• Compare data sets

• Look for relationships (often called correlations) between data sets
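As a rough illustration of the ‘central value’ and ‘spread’ measures listed above, here is a minimal Python sketch using only the standard `statistics` module. The heart-rate numbers are invented for the example, not real class data.

```python
import statistics

# Hypothetical resting heart rates (beats per minute); values are invented.
heart_rates = [62, 65, 68, 70, 70, 72, 74, 75, 78, 96]

# Central value
mean = statistics.mean(heart_rates)
median = statistics.median(heart_rates)
mode = statistics.mode(heart_rates)

# Spread
data_range = max(heart_rates) - min(heart_rates)
sd = statistics.stdev(heart_rates)  # sample standard deviation (divides by n - 1)
q1, _, q3 = statistics.quantiles(heart_rates, n=4)  # quartile cut points
iqr = q3 - q1  # interquartile range

print("mean:", mean, "median:", median, "mode:", mode)
print("range:", data_range, "SD:", round(sd, 2), "IQR:", iqr)
```

Note how the single high value (96) stretches the range, while the median and interquartile range are much less affected by it.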

Inferential statistics use probability (p) values

• The p value helps us judge whether the difference we observed is likely to be real and repeatable

• Specifically, the p value is the probability of seeing a difference at least as large as the one observed if chance alone (random variation) were at work

• If p = 0.10, chance alone would produce such a difference 10% of the time

• If p = 0.05, 5% of the time

• If p = 0.01, 1% of the time

By convention, scientists accept p < 0.05 as ‘significantly different’
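One way to see what a p value counts is an exact permutation test, sketched below. This is a swapped-in technique chosen because it needs no formula or table; it is not the t-test used later in these notes. The cholesterol readings are invented, and the group names are assumptions for illustration.

```python
from itertools import combinations
from statistics import mean

# Hypothetical cholesterol readings (mmol/L); values are invented.
drug = [4.1, 4.4, 4.0, 4.6, 4.2]
placebo = [5.0, 5.3, 4.9, 5.6, 5.1]

observed = abs(mean(drug) - mean(placebo))
pooled = drug + placebo
n = len(drug)

# If the drug did nothing, the group labels would be arbitrary. So try every
# possible way of splitting the 10 readings into two groups of 5 and count
# how often chance alone gives a difference at least as big as the observed one.
extreme = 0
total = 0
for idx in combinations(range(len(pooled)), n):
    group_a = [pooled[i] for i in idx]
    group_b = [pooled[i] for i in range(len(pooled)) if i not in idx]
    diff = abs(mean(group_a) - mean(group_b))
    if diff >= observed - 1e-12:  # small tolerance for floating-point rounding
        extreme += 1
    total += 1

p_value = extreme / total
print("splits tried:", total, "as extreme as observed:", extreme)
print("p =", round(p_value, 4))
```

With these made-up numbers only 2 of the 252 possible splits are as extreme as the real one, so p is well below 0.05.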

Sample size matters

• Bigger samples make it easier to detect differences

• A good guideline is to aim for 20 – 30 data points in each test group
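A short sketch of why bigger samples help: the standard error of the mean is the standard deviation divided by the square root of the sample size, so it shrinks as n grows. The standard deviation of 10 used here is an assumed value for illustration only.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

assumed_sd = 10.0  # assumed to be the same for every sample size
for n in (5, 10, 20, 30, 100):
    print("n =", n, "standard error =", round(standard_error(assumed_sd, n), 2))
```

A smaller standard error means the sample mean is pinned down more tightly, which makes a genuine difference between two groups easier to detect.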

Looking at data

Biological data are often normally distributed

• Height• Blood pressure• Heart rate• Marks on an exam• Errors in machine-made products

If NOT normally distributed, data can be skewed (or just jumbled!)

An example

• Researchers have developed a new drug (tetesterol) to lower serum cholesterol levels

• They treat 2 groups for a month with either tetesterol or placebo

• After that month, the researchers measure cholesterol in both groups

Cholesterol concentration after 1 month…

First, ‘eyeball’ the data: ‘Descriptive statistics’

Measure the central tendency (mean, median, mode)

Is this difference reliable?

(i.e., does the drug really make a difference?)

Cholesterol concentration after 1 month

Why not just look at the means?

The means may show you a difference, but we can’t be sure that it’s a reliable difference

Which of these data sets shows the greatest variation?

In order to compare test samples, we also need to look at the spread of results

Measurement of ‘spread’ (variance):

• Range

• Variance

• Standard deviation

• (standard error)

• (interquartile range)

Range – and its limitations

Standard deviation (σ)

• A measure of spread

• It is, simply, the square root of the variance

• It gives us an idea of the spread of most of the data and is much more reliable than range (less affected by anomalous data)

• You just need to press a button

• You don’t need to know the formula

• (There are links on the Blog if you WANT to know the formula…)

Variance

Officially:

• Variance: the average of the squared differences from the mean in a sample

• You calculate it using a calculator or EXCEL
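For anyone who does want to see what the button does, the definition above can be followed step by step. This is a sketch with invented numbers. One detail to be aware of: dividing by n gives the ‘population’ variance, while calculators and Excel also offer a ‘sample’ version that divides by n - 1 (in Excel, VAR.P and VAR.S respectively).

```python
import math
import statistics

values = [2, 4, 4, 4, 5, 5, 7, 9]  # invented data

# Step 1: the mean
m = sum(values) / len(values)

# Step 2: squared differences from the mean
squared_diffs = [(x - m) ** 2 for x in values]

# Step 3: average them (two conventions)
population_variance = sum(squared_diffs) / len(values)    # divide by n
sample_variance = sum(squared_diffs) / (len(values) - 1)  # divide by n - 1

# Step 4: standard deviation is the square root of the variance
population_sd = math.sqrt(population_variance)
sample_sd = math.sqrt(sample_variance)

print("mean:", m)
print("variance (n):", population_variance, " SD:", population_sd)
print("variance (n - 1):", round(sample_variance, 3), " SD:", round(sample_sd, 3))

# The standard library gives the same answers
print(statistics.pvariance(values), round(statistics.variance(values), 3))
```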

Standard deviation

• The percentages below apply only to normal distributions

• About 68% of values are within 1 standard deviation of the mean

• About 95% of values are within 2 SDs of the mean
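The 68% and 95% figures can be checked against an ideal normal curve. The sketch below uses `NormalDist` from Python's standard `statistics` module; the helper name `fraction_within` is made up for this example.

```python
from statistics import NormalDist

# An ideal normal distribution with mean 0 and standard deviation 1
standard_normal = NormalDist(mu=0, sigma=1)

def fraction_within(k):
    """Fraction of values lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

print("within 1 SD:", round(fraction_within(1) * 100, 1), "%")
print("within 2 SD:", round(fraction_within(2) * 100, 1), "%")
```

Real data are never perfectly normal, so treat these percentages as a guide rather than an exact count.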

Error bars on graphs

They are graphical representations of the spread of the data

May represent:

• Range

• Standard deviation

• Standard error

• Confidence intervals

• Interquartile range

There are various types of error bar

Question check:

• Which data set has the highest mean?

• Which data set has the highest variance?

• What do the error bars represent?

Question check:

Comparing data

Drug trial data

Large overlap: lots of shared data… Results are not likely to be significantly different (more likely due to chance)

Small or no overlap: very little shared data… Results are likely to be significantly different (‘real’)

Question check:

Inferential Statistics

Comparing two data sets: The T-test…

• Used to compare two normally distributed data sets (ideally with similar variances)

• A t-test is a statistical test that checks if the means of 2 groups are reliably different

• Just looking at the means may show you that they are different, but doesn’t show if the difference is reliable

• We always test the NULL Hypothesis (H0)

• T-test…the movie…

Two main types of T-test

Independent (unpaired) samples (most common)

E.g. testing the quality of two types of fruit smoothie…

Dependent (paired) samples

• One group measured at 2 different times

• E.g. heart rate before and after exercise
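The two types of t-test differ in what goes into the calculation. The sketch below works out the t value by hand for each type, using the pooled-variance (Student's) form for independent samples, which assumes similar variances. The function names and all the data are invented for illustration; in practice a calculator or spreadsheet does this for you.

```python
import math
from statistics import mean, stdev, variance

def independent_t(a, b):
    """t value and degrees of freedom for two independent samples (pooled variance)."""
    n1, n2 = len(a), len(b)
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(a) - mean(b)) / se, n1 + n2 - 2

def paired_t(before, after):
    """t value and degrees of freedom for paired samples (same subjects measured twice)."""
    diffs = [y - x for x, y in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

# Independent: taste scores for two smoothies, rated by different people (invented)
smoothie_a = [7, 8, 6, 9, 7, 8]
smoothie_b = [5, 6, 6, 4, 7, 5]
t_ind, df_ind = independent_t(smoothie_a, smoothie_b)

# Paired: heart rate of the same five students before and after exercise (invented)
before = [68, 72, 65, 70, 75]
after = [80, 85, 76, 84, 88]
t_pair, df_pair = paired_t(before, after)

print("independent: t =", round(t_ind, 2), "df =", df_ind)
print("paired:      t =", round(t_pair, 2), "df =", df_pair)
```

The paired version works on the differences within each subject, which removes person-to-person variation from the comparison.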

So what is the T-value?

It’s just a number!

Reading, writing and understanding T-tests

• (99) = degrees of freedom

• How many samples were there in this case?

• p = probability of results happening by chance

• Are these results significant?

• M = mean values

So what are degrees of freedom?

Degrees of freedom reflect sample size.

For only one group, df = n - 1, where n = number of samples

Usually we are looking at 2 groups, so df = (n1 + n2) - 2

Question check:

Let’s try some…examples from the worksheet

6. In a t-test comparing Group A and Group B, the P value was calculated as 0.004. What does this P value tell us about these two sets of data?

Explain your answer.

8. (b.) A student measures 15 snail shells on the north side of an island and 16 on the south.

Confidence =

DF =

Critical value =

t is calculated as 2.02. So we reject/accept H0.

Conclusion:
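The decision steps for a question like 8(b) can be sketched as below. The worksheet does not state the confidence level here, so this example assumes a two-tailed test at 95% confidence (p = 0.05); the critical values are taken from a standard t table, and the function name is made up.

```python
# Critical t values for a two-tailed test at p = 0.05 (95% confidence),
# copied from a standard t table for a few degrees of freedom.
# ASSUMPTION: the 95% level is chosen for this example, not given in the question.
CRITICAL_T_95 = {28: 2.048, 29: 2.045, 30: 2.042}

def decide(t, n1, n2):
    """Return df, the critical value, and whether H0 is rejected."""
    df = n1 + n2 - 2
    critical = CRITICAL_T_95[df]
    reject = abs(t) > critical
    return df, critical, reject

df, critical, reject = decide(2.02, 15, 16)
print("df =", df)
print("critical value =", critical)
print("reject H0?", reject)
```

Under that assumption the calculated t (2.02) falls just below the critical value for 29 degrees of freedom, so the null hypothesis would not be rejected. A different confidence level or a one-tailed test could give a different conclusion.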

Correlations and coincidences

Can statistics help us?

• Chocolate gives you spots

• Late nights sap young people’s brain power

• Coffee can make you see dead people

• Mobile phones cause cancer!

Correlation doesn’t mean causation

• Biologists frequently look for correlations (associations) between two variables (e.g. body weight and sugar consumption; drug consumption and death; hours of sleep and exam performance)

• Data are typically plotted as a scatter plot

• Mathematically derived correlations do NOT provide evidence of a cause; rather, we must develop experiments to identify the mechanism which is the cause of the observed correlation.

• Observations lacking a controlled experiment can only suggest a correlation

Calculation of correlation…

• Correlation is measured by r, which can range from -1 (completely negative correlation) to +1 (completely positive correlation)

• Having identified correlation, the cause must be determined

• ‘Correlation’ and r values simply give us clues where to look
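For the curious, r can be calculated by hand. This is a minimal sketch of the Pearson correlation coefficient; the function name and both data sets are invented for illustration.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Invented data: daily temperature (°C) and ice creams sold
temperature = [15, 18, 21, 24, 27, 30]
ice_creams = [20, 26, 35, 41, 52, 60]

# Invented data: weeks in the charts and records sold (thousands)
weeks = [1, 2, 3, 4, 5, 6]
records = [90, 75, 62, 48, 40, 31]

print("temperature vs ice creams: r =", round(pearson_r(temperature, ice_creams), 3))
print("weeks vs records sold:     r =", round(pearson_r(weeks, records), 3))
```

An r close to +1 or -1 only says that the two variables move together; it says nothing about which one, if either, causes the other.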

Positive correlation

• The two variables measured change in the same direction

• E.g. as temperature increases, the number of ice creams sold in Sara-Li’s increases

Lines of best fit

• Aims to go through the middle of all of the points on a scatter plot; the better the fit, the stronger the correlation

• Typically use software tools (EXCEL and Logger Pro) to draw lines and calculate correlation
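What such tools do when they draw a straight line of best fit can be sketched as a least-squares calculation. This assumes ordinary least squares, the usual default for a linear trendline; the function name and data are invented for illustration.

```python
def best_fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Invented data: daily temperature (°C) and ice creams sold
temperature = [15, 18, 21, 24, 27, 30]
ice_creams = [20, 26, 35, 41, 52, 60]

slope, intercept = best_fit_line(temperature, ice_creams)
print("ice creams sold is roughly", round(slope, 2), "x temperature +", round(intercept, 2))
```

The line always passes through the point (mean of x, mean of y), which is one way to check a line drawn by eye.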

Negative correlation

As the number of weeks in the charts increases, the number of records sold falls

No correlation

Question check:
