
Page 1:

Introduction to Research

Dr Tim Cooper, tfcooper@uh.edu, SR2-353

http://xkcd.com/242/

Page 2:

Syllabus

• This course aims to give you an overview of what scientific research is and how to do it – at its heart, this involves learning how to think critically and scientifically

• Context – many professors profess their goal to be to ‘teach critical thinking skills’, yet:

“…more than 2,300 undergraduates at twenty-four institutions, 45 percent of these students demonstrate no significant improvement in a range of skills—including critical thinking, complex reasoning, and writing—during their first two years of college…”

Academically Adrift (Arum and Roksa, 2011)

After graduating from college “…[students] … with high C.L.A. [critical thinking test] scores were significantly less likely to be unemployed … Low-C.L.A. graduates were twice as likely … to lose their jobs between 2010 and 2011…. Low-C.L.A. graduates were also 50 percent more likely to end up in an unskilled occupation, and were less likely to be satisfied with their jobs.

Remarkably, the students had almost no awareness of this dynamic. When asked during their senior year in 2009, three-quarters reported gaining high levels of critical thinking skills in college, despite strong C.L.A. evidence to the contrary. When asked again two years later, nearly half reported even higher levels of learning in college. This was true across the spectrum of students, including those who had struggled to find and keep good jobs.”

The Economic Price of College, NYT (http://www.nytimes.com/2014/09/03/upshot/the-economic-price-of-colleges-failures.html?_r=0&abt=0002&abg=0)

Page 3:

Syllabus

• This course has 5 parts:

1. Lectures (scientific method, responsible conduct of research, scientific writing)

2. Reading and analysis of primary literature (4 papers)

3. Assignment 1: Short writing exercise (modeled on a UH student scholarship) – 10%

4. Assignment 2: Short experiment and write up using an established computational platform (AVIDA) – 20%

5. Assignment 3: Open research project – 40%

• Critical/scientific thinking probably can’t effectively be taught on its own – instead, I’ll present a basic framework (the ‘scientific method’) and then we’ll use that framework in the context of specific subject matter to evaluate claims about the natural world. We’ll employ this process again when you do some experiments.

• Office hours: by appointment – email tfcooper@uh.edu (SR2-353)

• No exams — grade based on assignments AND participation (30%)

• Lectures and materials can be found at cooperlab.squarespace.com/biol4397/

• I hope the course will be fun, but it won’t be easy and you will have to work hard: it’s a course about doing something, not remembering something. If you do work hard, you can be confident of getting a good grade. Do consider whether it is right for you.

Page 4:

FYI – there are many research programs in addition to the one offered at UH (SURF: http://www.uh.edu/honors/undergraduate-research/uh-research/surf/):

Long (but not complete) list of research opportunities at medical and academic institutions:

http://www.uh.edu/honors/undergraduate-research/scholarships/

Page 5:

Short answer pre-test (not graded)

The purpose of this test is:

1. For me to understand your strengths and weaknesses in basic scientific understanding – and focus course content appropriately

2. To allow a comparison of understanding at the beginning and end of the course – enabling a test of its value

Page 6:

What is science?

• “The first principle [of doing science] is that you must not fool yourself—and you are the easiest person to fool.”

Richard Feynman, Caltech commencement address (1974)

• We’re interested in science as an active and dynamic activity – how to ‘pursue and apply knowledge and understanding of the natural world’. We should learn to do this ourselves and be able to evaluate claims by others. To do this, a key component is application of a ‘systematic methodology’ to help you avoid fooling yourself into believing something that is not well-supported

Science Council (United Kingdom): http://www.sciencecouncil.org/definition

Page 7:

What is Science?

• We want to do science to better understand the world – and perhaps predict how things will change (by themselves or if we intervene), or apply our understanding of the world to develop new technology (genetic engineering, computers…)

• An important reason for wanting to define science is to help distinguish science – the stuff that can be useful – from pseudoscience, which is often presented as being scientific, but is not

• E.g., one of each pair is scientific, the other is not (which is which and why?):

◦ Astrology / Astronomy
◦ Physics / Psychics

“Science is the pursuit and application of knowledge and understanding of the natural and social world following a systematic methodology based on evidence”

…if this is a good definition of science, what aspect(s) do the two pseudosciences above fail at?

Page 8:

Results of pre-test

[Bar chart: score (%) by skill; mean = 69%, range over skills = 50–92%]

Page 9:

Rose and Petals game

[Images of dice rolls and their scores: = 2, = 8, = 0, = 14; two further rolls left unscored]

Next number in sequence: 2, 3, 3, 5, 10, 13, 39, 43, 172, 177, ?

(Allegedly, it took Bill Gates more than two hours to figure it out: http://www.borrett.id.au/computing/petals-bg.htm)

Page 10:

What is the “Scientific Method”?

Scientific Method (sometimes called the Hypothetico-Deductive method):

1 & 2. Induction: integrate observation/theory to come up with a research question that aims to explain how some aspect of the world probably works — ideally something that: (i) isn’t already understood and (ii) is worth the time it will take to understand.

3. Deduction: translate your question into a testable (i.e., falsifiable) hypothesis. “If …[I do experiment X], then …[I will see result Y]”. The hypothesis should be falsifiable and specific (e.g., if I change X by 5%, then Y will change by 10%).

4. Test the hypothesis through experiment. It is not trivial to devise a good test of a hypothesis—i.e., one that can clearly separate predicted and alternative outcomes, and that avoids complicating factors.

5. Determine the fit of your experiment’s outcome to the predictions made by your hypothesis. This will almost certainly involve some kind of statistical comparison (e.g., between treatment and control groups).

6. If the outcome of your experiment does not match your prediction, then: (i) your hypothesis is falsified and needs to be revised or (ii) your experiment was unsuitable and needs to be revised.

o Bonus point if you can determine what’s wrong with this application of the scientific method: https://www.youtube.com/watch?v=k2MhMsLn9B0

Page 11:

7 minute exercise

•How can you test if an astrologer actually does have some (not necessarily perfect) ability to predict the future?

Page 12:

1. Developing a research question — Induction

• Coming up with a good research question is the first, and a very important step in applying the scientific method

• A good question integrates existing observations and theory to generate an overarching explanation

• Often this process proceeds via induction — defined as: (i) proceeding from the specific to the general or (ii) a claim that a conclusion probably follows from a premise(s) – a statement from which a conclusion is drawn – if the premise is true

• For example:

Premises: Your house window is broken. Your TV is missing.
Inductive conclusion: You’ve probably been burglarized.

Premises: All animals have DNA. All animals have genetic material.
Inductive conclusion: DNA is probably the genetic material.

Page 13:

1. Developing a research question — Induction

• You’ll often see an inductive argument presented as if the conclusion is true. Beware! An inductive conclusion is only the first step in the scientific method. By definition inductive conclusions are, at best, only probably true.

• Some problems of induction (and deduction) that you need to keep in mind (more at: http://cooperlab.org/biol4397 – An Illustrated Book of Bad Arguments) – avoiding these kinds of problems helps you avoid wasting time testing bad conclusions …

• Inductive leap: A conclusion that initially seems to be certain may not be supported as additional data is collected

Every swan I’ve ever seen is white → All swans are white

https://www.rightpet.com/livestock-poultrydetail/black-swan
http://xkcd.com/605/

Page 14:

1. Developing a research question — Induction

• Affirming the consequent: Be aware that there will usually (always?) be more than one explanation for a particular conclusion/observation. Just because ‘A’ *always* leads to ‘B’, it doesn’t follow that observing ‘B’ means that ‘A’ was the cause (perhaps ‘C’ also leads to ‘B’).

If I have the flu, I will have a sore throat.
I have a sore throat.
Therefore, I have the flu. [at best: I *might* have the flu]

No! Having the flu is not the only way to have a sore throat.

Page 15:

1. Developing a research question — Induction

• Correlation is not causation: Just because two variables are associated with one another, it doesn’t mean that the two variables directly interact with each other (so that changing one would necessarily change the other)

http://xkcd.com/552/

[Scatter plots: crime rate vs. ice cream consumption; bars in a city vs. churches in a city]

Page 16:

1. Developing a research question — Induction

• Correlation is not causation: Just because two variables are associated with one another, it doesn’t mean that changing one variable (the independent) will cause a change in the other (the dependent)

New England J Med (2012)

Eric Cornell, who won the Nobel Prize in Physics in 2001, told Reuters: "I attribute essentially all my success to the very large amount of chocolate that I consume. Personally I feel that milk chocolate makes you stupid… dark chocolate is the way to go. It's one thing if you want a medicine or chemistry Nobel Prize but if you want a physics Nobel Prize it pretty much has got to be dark chocolate."

But when … contacted to elaborate on this comment, he changed his tune. "I deeply regret the rash remarks I made to the media. We scientists should strive to maintain objective neutrality and refrain from declaring our affiliation either with milk chocolate or with dark chocolate," he said.

"Now I ask that the media kindly respect my family's privacy in this difficult time."

http://www.bbc.com/news/magazine-20356613

Page 17:

1. Developing a research question — Induction

• Ascertainment bias (sampling bias): The observations we choose to consider in developing a conclusion may be biased in a way that supports our favorite conclusion (observations that can’t be explained by the conclusion may be ignored, or the sampling may be carried out in a way that such observations are never even collected)

http://philosophy.hku.hk/think/stat/samples.php

[Image: nytimes.com]

Page 18:

Induction – stereotypes

• Do national stereotypes (an inductive conclusion derived from many specific observations) have a ‘kernel of truth’?

• Rate an average American for (1–10 scale; world average = 5): Neuroticism, Extroversion, Agreeableness, Openness, Conscientiousness

Page 19:

Testing an inductive conclusion…

Terracciano et al. Science 2005

[Plots for each of the five traits (neuroticism, extraversion, openness, agreeableness, conscientiousness), comparing:
– Standard personality inventory
– Character survey – judge an average person of your nationality/culture (very significant similarity in this measure among individuals of a culture)]

• Stereotypes – asking people to evaluate traits of people they interact with – bore no relation to results of a standard personality inventory test (NEO-PI-R)

Page 20:

3. Deduction — developing a hypothesis

• No matter how good a scientist is, his/her initial inductive conclusion cannot be accepted as true—even if the conclusion is properly constructed (i.e., it doesn’t obviously have any of the problems we’ve discussed), a good scientist will recognize that it is just their best guess of several plausible possibilities

• This is why it is necessary to devise a hypothesis that is based on our inductive conclusion and test a prediction that necessarily derives from this hypothesis — this is called deduction. If the prediction fails, the hypothesis is invalid. If the hypothesis is invalid, the inductive conclusion was wrong. (With caveats based on the quality of the test.)

• Feynman’s quote (“The first principle [of doing science] is that you must not fool yourself—and you are the easiest person to fool.”) speaks to this point—after you’ve come up with a great inductive conclusion, it can be tempting to think you’ve finished. In fact, you’re just getting started, now you need to convince yourself that you haven’t been fooled!

Page 21:

3. Deduction — developing a hypothesis

• We can translate an inductive conclusion into a hypothesis — this is an ‘if, then’ statement about how you propose that the world works

• A hypothesis should make a prediction that can be falsified — i.e., shown to be incorrect by an experimental test. The more specific the hypothesis, the more likely it can be falsified (and the more impressive it is when it isn’t)

Observation: Green plants can photosynthesize (convert light to chemical energy)

Inductive conclusion (probably true): The green pigment is involved in photosynthesis

Deductive conclusion (necessarily true or your hypothesis is wrong) – aka, a hypothesis: If… the green pigment is required for photosynthesis, then… preventing production of the green pigment will prevent photosynthesis

A falsifiable prediction: A mutation that prevents production of green pigment will prevent photosynthesis

Page 22:

3. Deduction — developing a hypothesis

• Most scientists apply the principle of falsification to test their hypotheses – an idea put forward by Sir Karl Popper in 1934

• Popper was concerned that some of the popular scientific theories of the early 20th century were pseudoscience, not real science. In particular, he was worried that people who believed these theories found supporting evidence everywhere they looked – almost any observation could be interpreted in a way that confirmed the theory; no observation would prove it wrong.

• Popper describes meeting a famous psychologist, Alfred Adler (who had developed a theory of ‘individual psychology’):

…I reported to [Adler] a case which to me did not seem particularly Adlerian, but which he found no difficulty in analyzing in terms of his theory of inferiority feelings.… Slightly shocked, I asked him how he could be so sure. "Because of my thousand-fold experience," he replied; whereupon I could not help saying: "And with this new case, I suppose, your experience has become thousand-and-one-fold.”

What I had in mind was that his previous observations may not have been much sounder than this new one; that each in its turn had been interpreted in the light of "previous experience," and at the same time counted as additional confirmation.

I could not think of any human behavior which could not be interpreted in terms of [Adler’s] theory. It was precisely this fact—that they [observations] always fitted, that they were always confirmed—which in the eyes of their admirers constituted the strongest argument in favor of these theories. It began to dawn on me that this apparent strength was in fact their weakness.

Karl Popper, Conjectures and refutations (1963)

why is this a weakness?

Page 23:

3. Deduction — developing a hypothesis

• Popper’s idea was that for a hypothesis to be useful, it has to make a prediction that can be shown to be wrong — that is, falsified (there needs to be a consequence to being wrong, you can’t weasel out of it by saying: “Well, actually, my hypothesis can explain that result, even though I wasn’t expecting it. Yippee. What a great and flexible hypothesis.”)

• The prediction may be difficult to test — perhaps not being possible at the current time. Nevertheless, that a hypothesis makes a prediction that can, at least in principle, be tested and falsified is generally considered to be essential for it to be considered scientific

Phys. Lett. B 716: 1-29 (2012)

[Image: the Higgs boson discovery – 48 yrs and ~$13.25 bn after the prediction]

Prediction: existence of an as yet unobserved sub-atomic particle with a mass of ~125 GeV (note that the prediction is specific, a good thing)

Page 24:

Lecture 3

Rose and Petals game

[Images of dice rolls and their scores: = 2, = 8, = 0, = 14; two further rolls left unscored]

Page 25:

3. Deduction — developing a hypothesis

• Popper’s idea was that for a hypothesis to be useful, it has to make a prediction that can be tested and shown to be wrong — that is, falsified (there needs to be a consequence to being wrong, you can’t weasel out of it by saying: “Well, actually, my hypothesis can explain that result, even though I wasn’t expecting it. Yippee. What a great and flexible hypothesis.”)

• The prediction may be difficult to test — perhaps not being possible at the current time. Nevertheless, that a hypothesis makes a prediction that can, at least in principle, be tested and falsified is generally considered to be essential for it to be considered scientific

Phys. Lett. B 716: 1-29 (2012)

[Image: the Higgs boson discovery – 48 yrs and ~$13.25 bn after the prediction]

Prediction: existence of an as yet unobserved sub-atomic particle with a mass of ~125 GeV (note that the prediction is specific, a good thing)

Page 26:

3. Deduction — astrology: unfalsifiable hypotheses (when well written)

• “A lack of information could hold you back – educate yourself as to what is happening around you. You may find yourself feeling blocked and unable to express yourself, particularly in a group situation. Getting your point across to others and understanding what others REALLY mean when they communicate with you may be a weak link just now. Take a step back and view yourself as you would a movie – then you will know the steps you need to take to clear away communication problems.”

o To see how it’s done: www.wikihow.com/Write-a-Horoscope

• Astrology: can these predictions be falsified (even in principle)? No — then they are not science!

• Note that these predictions aren’t unfalsifiable just because they are probabilistic (it would be fine to predict that a particular event occurs 70% of the time) – the problem is that it’s not clear: (i) what probability is predicted (20/50/90% of the time a lack of information will hold you back?); (ii) what are the predicted events? How will you recognize when you have/have not been ‘held back’? How will you know if it was possible for you to have educated yourself in a way that would have prevented being held back?

Page 27:

3. Deduction — economics

• Economic hypotheses – quantitative predictions are generally poor
• Wrong predictions can generally be explained away
• This morning a new durable goods report was released – orders were down by 3.4% against a consensus forecast of a slight gain. On open the stock market was down by 300 points, with the bad durable goods data seen as a primary driver of the decrease. At least one economist doesn’t see any need to revise the models used to generate the forecast:

“Jim O'Sullivan, chief US economist at High Frequency Economics, noted that stripping away the highly volatile aircraft and defense components, durable goods orders edged higher in December.”
economictimes.indiatimes.com/news/international/business/us-durable-goods-orders-plunge-in-december/articleshow/46032549.cms

http://fivethirtyeight.blogs.nytimes.com/2012/08/18/aug-17-does-a-bullish-stock-market-predict-a-faster-recovery/?_r=0

Prediction (IMF) and reality of Greece austerity:
http://krugman.blogs.nytimes.com/2015/01/25/the-greek-stand-by-arrangement/?module=BlogPost-Title&version=Blog%20Main&contentCollection=Opinion&action=Click&pgtype=Blogs&region=Body

Page 28:

Experimental design

Page 29:

4. Testing a hypothesis — developing a prediction

• It’s obvious that a good hypothesis should make specific testable predictions, but why is it that we try to falsify a deductive (i.e., necessary) conclusion (i.e., prediction) of our hypothesis? Why not try to find evidence verifying a prediction?

• Remember the problem of induction (aka the Black Swan argument): whereas hundreds of observations of white swans (in England, all swans are white) might have seemed to verify the hypothesis that all swans are white, a single observation of a black swan (in Australia) is sufficient to falsify it. No matter how many observations support your hypothesis, it still may not be true; one contrary observation is sufficient to show it is false. (Captured by the phrase: Absence of evidence is not evidence of absence.)

• Trying to falsify a hypothesis forces you to formulate a hypothesis that can be wrong and be explicit about how it could be wrong (if you can’t think of a circumstance your hypothesis can’t explain, it’s not a useful hypothesis). It asks you to put yourself in the shoes of someone who wants to see your pet hypothesis falsified. That’s a good way to avoid fooling yourself.

Page 30:

4. Testing a hypothesis — developing a prediction

• We don’t test hypotheses directly — rather, we test a prediction of our hypothesis. Our prediction should be formulated as a deductive conclusion—if our hypothesis is true then our prediction must be true. If our prediction is falsified, then either:

1. Our hypothesis is false (we need a new one…)

2. We made some mistake in the experimental test of our prediction (e.g., if we predict that the aerodynamic/engine combination of a car will give a top speed of 200 mph and we find that it goes faster than that according to our trusty vintage stopwatch, it could be that the hypothesis is wrong and the car can go faster than 200 mph, but it could also be that our stopwatch is running slow)

• One of the hardest aspects of science is to design experiments in a way that isolates what we want to test from all the other factors that are involved in the test (how fast the car is going, and not whether our stopwatch keeps good time; how heavy something is, not whether our balance is accurate; is DNA the genetic material or do we have some contamination with protein…)

Page 31:

4. Testing a hypothesis — developing a prediction

Prediction (speed of light): 299,792,458 m s⁻¹
Observation (neutrino speed): 299,798,454 m s⁻¹
≈ 0.002% faster than predicted

16,000 events observed; < 3.4 in one million probability that the results can be explained by chance

“This outcome is totally unexpected," stresses Antonio Ereditato, spokesperson for the experiment. "Months of research and verifications have not been sufficient to identify an instrumental effect that could explain the result of our measurements."

Nature (2011)

• Turns out the ‘stopwatch’ was running slow – so the neutrinos weren’t travelling as fast as originally thought. When the clock was fixed they calculated neutrinos to be travelling at a speed not different from the speed of light

http://www.nature.com/news/timing-glitches-dog-neutrino-claim-1.10123
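A quick sanity check of the quoted excess, as a minimal Python sketch (both speeds are the values given on the slide):

```python
# Relative excess of the reported neutrino speed over the predicted
# speed of light (both values as quoted on the slide).
c_predicted = 299_792_458  # speed of light, m/s
v_observed = 299_798_454   # reported neutrino speed, m/s

excess = (v_observed - c_predicted) / c_predicted
print(f"{excess * 100:.4f}% faster than predicted")  # -> 0.0020% faster than predicted
```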

Page 32:

Exercise

• It’s more fuel efficient to drive with the AC on and the windows up than with the AC off and the windows down

• Can you reword into an ‘if …, then …’ hypothesis?

• Can you generate a specific testable and falsifiable prediction?

• Imagine a particular result – how would you interpret it?

[Video: www.youtube.com]

Page 33:

4. Testing a hypothesis — developing a prediction

• In practice, most experiments test for an effect of a variable of interest by comparing an observation to a null hypothesis

• A NULL hypothesis (H0): a hypothesis that predicts what will happen if the variable we are interested in does not have any influence on the experiment — if this variable is important, the null hypothesis will be falsified

• For example: if we think that antibiotic X does inhibit cell growth, we might generate the null hypothesis:

H0: Antibiotic X does not affect bacterial growth

• If we falsify this hypothesis we can provisionally conclude that bacteria do grow more slowly in the presence of antibiotic X

Note the sleight of hand: by falsifying the null hypothesis we end up (provisionally) verifying that the antibiotic slows bacterial growth

[Figure: growth rate of control (no antibiotic) vs. test (with antibiotic) cultures, in two scenarios:
– No significant difference: the null hypothesis is not falsified and we conclude that the antibiotic did not affect growth
– Significant difference: the null hypothesis is falsified and we conclude that the antibiotic might affect growth]

Page 34:

4. Testing a hypothesis — developing a prediction

• But still, why use a null hypothesis – isn’t it simpler to test directly if the antibiotic slows growth?

• …think about the results you’ll get. Imagine if 10 of 13 trials give you a slower growth rate among bacteria exposed to the antibiotic vs. those in the control group – does this falsify your prediction of the antibiotic slowing growth? How can you tell?

• In practice, to address this question you would perform a statistical test to determine: what is the probability of finding the distribution 10/13 by chance alone? That is, the statistical test you use to evaluate your experiment is implicitly based on the null hypothesis that the effect of the antibiotic is due only to chance.

• The probability of observing 10 trials with faster growth in the absence of antibiotic and 3 trials with faster growth in the presence of antibiotic is p = 0.04 (sign test). This is below the usual cutoff of 0.05 at which we reject the null hypothesis – we conclude that the result cannot be explained by chance alone and is consistent with the antibiotic acting to slow growth. We don’t conclude that the results prove the antibiotic slowed growth.

[Figure: control vs. test growth rates across trials; 10 trials had faster growth without antibiotic, 3 trials had faster growth with antibiotic]
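To make the sign-test arithmetic concrete, here is a minimal sketch in Python using scipy (the 10-vs-3 split is from the slide; a one-sided test gives p ≈ 0.046, consistent with the p = 0.04 quoted above):

```python
from scipy.stats import binomtest

# Sign test: under the null hypothesis the antibiotic has no effect, so each
# trial is equally likely (p = 0.5) to show faster growth with or without it.
n_trials = 13
n_faster_without_antibiotic = 10

# One-sided: probability that 10 or more of the 13 trials favor the
# no-antibiotic group by chance alone.
result = binomtest(n_faster_without_antibiotic, n_trials, p=0.5,
                   alternative="greater")
print(f"p = {result.pvalue:.3f}")  # p = 0.046 -- below the usual 0.05 cutoff
```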

Page 35:

Lecture 4 – experimental design, interpretation and analysis

Page 36:

Things to consider in designing an experiment

• The goal is to be as confident as possible that you are testing what you think you are testing. If your prediction is not supported you want it to be because your hypothesis was false (not because a piece of test equipment was faulty or because your control-test comparison was unsuitable); if your prediction is supported you want it to be because your hypothesis was true (in this particular case)

• To consider:
  • Replication/statistics of comparing control and experimental groups
  • What is the best control treatment? You want to try to isolate only the factor that you are interested in. E.g., often we dissolve antibiotics in ethanol, so our ‘antibiotic’ treatment was really antibiotic + ethanol. If we really want to isolate the effect of the antibiotic, we should have added ethanol to the control.
  • How are subjects selected and treated (randomly or in a way that creates bias)?
  • When were experiments performed – were control and test groups measured on the same day?
  • Many others…

Page 37:

Cited 6393 times (as of Jan 2015)! That’s a lot

Page 38:

A statistical comparison of two groups — t-test

• Our null hypothesis will often be that test and control groups have the same mean value (i.e., that the tested variable does not change the value we’re observing)

• Once we do the experiment, we need to compare the two groups – a common test is the t-test, but there are many others…

• A t-test is based on the t-statistic:

t = (mean(control group) – mean(test group)) / combined standard error

• We convert the t-statistic to a P value and confidence intervals using a statistical table (or a computer program). The P value gives us the probability of observing a t-statistic as large as we observed by chance alone – given the uncertainty in estimates of each group’s mean, what is the chance of observing a difference as big as the one we see by chance alone?

• The larger the t-statistic, the less likely that the two groups have the same mean. What makes it larger: (i) a bigger difference between the control and test groups or (ii) lower combined standard error
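As a concrete sketch of this calculation in Python (the measurement values below are made up for illustration; scipy’s ttest_ind performs the same computation plus the P-value lookup):

```python
import numpy as np
from scipy import stats

# Hypothetical measurements for the two groups (made-up data).
control = np.array([1.02, 0.98, 1.05, 1.01, 0.97])
test = np.array([0.85, 0.91, 0.88, 0.93, 0.86])

# Combined (pooled) standard error of the difference in means.
n1, n2 = len(control), len(test)
pooled_var = ((n1 - 1) * control.var(ddof=1) +
              (n2 - 1) * test.var(ddof=1)) / (n1 + n2 - 2)
se_combined = np.sqrt(pooled_var * (1 / n1 + 1 / n2))

# t = (mean(control group) - mean(test group)) / combined standard error
t = (control.mean() - test.mean()) / se_combined
p = 2 * stats.t.sf(abs(t), df=n1 + n2 - 2)  # two-sided P value from t
print(t, p)

print(stats.ttest_ind(control, test))  # same t and P in one call
```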

Page 39:

Error bars (confidence intervals) and sample size

A larger t-statistic increases the chance of finding a difference between our control and test groups – something we often want to do. How do we get a larger t-statistic?

• higher n (samples) lowers the SE: SE (standard error) = standard deviation / √(number of observations, n)
• lower SE increases t: t-statistic = (mean(control) – mean(test)) / combined SE
• larger t decreases the P value, making it more likely that the null hypothesis is rejected, so that you conclude the difference between control and test groups is greater than can be explained by chance alone – this is usually the interesting result!

[Figure: low replication – no significant difference between groups; high replication – significant difference between groups. Note: even with little change in the estimated means of each group, the smaller error bars make it easier to detect a difference between the groups]

So… how big a sample/how many replicates should you use?
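A small simulation sketch (made-up normal distributions) of how sample size feeds through the SE into the t-statistic; the same √n scaling is behind the caffeine question on the next page, since quadrupling n from 25 to 100 halves the SE:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two groups whose true means differ by 0.5 (hypothetical values).
for n in (5, 50):
    control = rng.normal(10.0, 1.0, size=n)
    test = rng.normal(10.5, 1.0, size=n)

    # Combined SE of the difference in means; each group's SE is sd / sqrt(n).
    se = np.sqrt(control.var(ddof=1) / n + test.var(ddof=1) / n)
    t = (test.mean() - control.mean()) / se
    print(f"n = {n:2d}: combined SE = {se:.2f}, t = {t:.1f}")

# Larger n -> smaller SE -> larger t -> smaller P value.
```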

Page 40:

Pseudoreplication gives an illusion of precision and results in misleading comparisons

[Figure (from BMC Neuroscience 11:5): phenotype of interest across experimental samples; mean and 90% CI]

Second lowest scored question from the pre-test:

19. Two studies estimate the mean caffeine content of an energy drink. Each study uses the same test on a random sample of the energy drink. Study 1 uses 25 bottles, and study 2 uses 100 bottles. Which statement is true?
a. The estimate of the actual mean caffeine content from each study will be equally uncertain.
b. The uncertainty in the estimate of the actual mean caffeine content will be smaller in study 1 than in study 2.
c. The uncertainty in the estimate of the actual mean caffeine content will be larger in study 1 than in study 2.
d. None of the above

Page 41:

Measurements come from distributions – you won’t get the same measurement twice

• A fundamental component of manipulative (in contrast to descriptive) science is the need to compare the outcome of a specific experimental manipulation to a null expectation (what we expect in the absence of some treatment effect). E.g.:
o Does deletion of a gene affect bacterial growth rate?
o Does exposure to an environmental toxin affect plant photosynthesis?

[Figure: density vs. phenotypic response for CONTROL and TREATMENT – the actual distribution of the response for each treatment (what you would see if you did each experiment many, many times)]

Page 42:

• Replication is essential for a meaningful comparison between treatment and control groups and reduces the chance of wrongly inferring that a treatment has an effect – why?

To estimate the actual distribution it is essential that you make replicate measurements

[Figure: density vs. phenotypic response for CONTROL and TREATMENT – the sample distribution (what you observe in experiments) overlaid on the actual distribution of the response for each treatment (what you would see if you did each experiment many, many times)]

Page 43:

• Much worse than low replication (which is accounted for in statistical tests – remember the large confidence intervals when we had low replication) is having ‘false’ replication – pseudoreplication (non-independent samples), which distorts the relationship of the sample distribution to the actual distribution

To estimate the actual distribution it is essential that you make replicate measurements

[Figure: density vs. phenotypic response for CONTROL and TREATMENT – a pseudoreplicated sample distribution (what you observe in experiments) compared with the actual distribution of the response for each treatment (what you would see if you did each experiment many, many times)]

Page 44:

Pseudoreplication gives you false confidence in your measurements

• You’re in a rush to perform a difficult and expensive experiment that aims to measure the effect of a drug on a mouse liver enzyme. You can only afford enough of the drug to include two treatment mice, which you pair with two control mice (no drug). You take three samples from each mouse’s liver, and measure enzyme activity in each sample.
• How should you treat the six measurements you’ll obtain from each pair of mice?

[Worked example:
– Two independent samples per group (biological replicates), drawn from identical actual distributions (mean = 0, SD = 1): treatment = 0.75, 0.91; control = -0.57, -1.68
– Each sample assayed in triplicate (technical replicates) with low measurement error (mean = sample, SD = 0.1): treatment = 0.79, 0.80, 0.76 and 0.86, 0.92, 0.97; control = -0.44, -0.59, -0.72 and -1.55, -1.68, -1.69
– Treating the technical replicates as independent samples: P < 0.001
– Triplicate assays averaged to estimate each original sample (treatment = 0.78, 0.92; control = -0.58, -1.64), then comparing the biological replicates: P = 0.17]
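The slide’s scenario, re-run as a simulation sketch: both groups are drawn from the same distribution (no real drug effect), so any ‘significant’ difference is a false positive, and treating technical replicates as independent samples will often manufacture exactly that:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two biological replicates (mice) per group, all drawn from the SAME actual
# distribution (mean 0, SD 1) -- so there is no real treatment effect.
treatment_mice = rng.normal(0, 1, size=2)
control_mice = rng.normal(0, 1, size=2)

# Three technical replicates per mouse (low measurement error, SD 0.1).
treatment_assays = np.concatenate([rng.normal(m, 0.1, size=3) for m in treatment_mice])
control_assays = np.concatenate([rng.normal(m, 0.1, size=3) for m in control_mice])

# WRONG: treat the 6 technical replicates per group as independent samples --
# the tiny within-mouse scatter tends to make the P value spuriously small.
print(stats.ttest_ind(treatment_assays, control_assays).pvalue)

# RIGHT: average each mouse's assays, then compare the 2 biological
# replicates per group -- the honest (and usually non-significant) test.
print(stats.ttest_ind(treatment_assays.reshape(2, 3).mean(axis=1),
                      control_assays.reshape(2, 3).mean(axis=1)).pvalue)
```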

Page 45:

• When you replicate an experiment you need to think hard about the source of variation within an experimental group and be sure that your replication allows you to estimate it – the goal is to get an accurate estimate of your variable of interest (e.g., its mean), not just a precise estimate (replicate estimates are similar to one another)

• Pseudoreplication often leads to high precision at the cost of accuracy – there is usually less variation between technical replicate measurements of the same sample than between independent samples

[Image: archery targets illustrating precise vs. accurate vs. precise & accurate]

Pseudoreplication is bad for statistics, and is just plain misleading

http://www.mybigfatbrightonweekend.co.uk/stag-activities/archery-target-shooting-and-quad-bike-trek

Page 46:

Pseudoreplication can be hard to identify

• It’s obvious that you are pseudoreplicating when you take a single sample, measure it in triplicate, and act as if you have three independent samples for the purposes of statistical comparison

• Sometimes it is less obvious …

• E.g., (after Hurlbert (1984)):

We wish to determine how quickly maple leaves decompose at different depths in a lake. We make 8 small bags of nylon netting, fill each with maple leaves, and place 4 in a group on the 1-m isobath and 4 in a group at the 10-m isobath. After 1 month we retrieve the bags and measure the amount of organic matter that has decomposed. We use a t-test to compare the amount of decomposition at the two depths.

Page 47:

Is degradation different at 1 m and 10 m depths?

[Diagram: a group of bags at 1 m depth and a group of bags at 10 m depth]

• Is there anything wrong with this experiment? What are you really measuring? How would you change the experiment?

Page 48:

Better?

[Diagram: bags at 1 m depth and bags at 10 m depth, arranged differently from the previous slide]

• Is there anything wrong with this experiment? What are you really measuring? How would you change the experiment?

Page 49:

You can carry out an experiment in many ways that can be deceptively different

Hurlbert (1984) on class web site

*Two treatments (e.g., control and test) indicated by different colored squares. Different designs indicate different possible arrangements of measurements of 4 samples of each treatment (e.g., differences in time or space along each schema)

• What are the pros and cons of each design? Which allow you to meaningfully compare the two groups to test for a difference (i.e., which designs minimize the chance of finding a difference between treatments that is NOT caused by the test factor (black squares) that we care about)?
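A randomized arrangement – one of the kinds of design Hurlbert contrasts – can be generated by shuffling treatment assignments across positions; a minimal sketch (the position labels are hypothetical):

```python
import random

# Randomly assign two treatments (4 replicates each) to 8 positions.
# 'Positions' could be times, shelf slots, or sites along a lake isobath.
treatments = ["control"] * 4 + ["test"] * 4
random.shuffle(treatments)

for position, treatment in enumerate(treatments, start=1):
    print(f"position {position}: {treatment}")

# Randomization guards against position effects lining up with treatment --
# though by chance it can still produce a segregated (clumped) layout.
```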

Page 50:

If one comparison is good, isn’t more better?

• Do you agree?

“There is now a better way. Petabytes [i.e., ‘big data’] allow us to say: "Correlation is enough." We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.”

http://www.wired.com/science/discoveries/magazine/16-07/pb_theory

Page 51:

The probability of falsely rejecting your null hypothesis increases with each comparison

• Using a cutoff of p < 0.05 to infer a significant correlation between two factors means that, by chance, we’ll wrongly infer a significant correlation about 1 in 20 times

• To account for this we:
o Need to account for multiple comparisons (there are ways to do this statistically)
o Should determine comparisons we want to test before we do an experiment rather than post hoc (post hoc comparisons will probably be biased to those that look interesting – in effect you are doing multiple comparisons by eye, but only a few formally)

Page 52:

The probability of falsely rejecting your null hypothesis increases with each comparison

• Using a cutoff of p < 0.05 to infer a significant correlation between two factors means that, by chance, we’ll wrongly infer a significant correlation about 1 in 20 times

• To account for this we:
o Need to account for multiple comparisons (there are ways to do this statistically)
o Should determine comparisons we want to test before we do an experiment rather than post hoc (post hoc comparisons will probably be biased to those that look interesting – in effect you are doing multiple comparisons by eye, but only a few formally)

How does this help explain the observation that when an AFC (NFC) team wins the Super Bowl the stock market will go up (down) that year? This prediction has been correct in 31/40 years (sign test, p < 0.0007).
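Two quick checks of these numbers, sketched in Python: the family-wise false-positive rate across independent comparisons, and the Super Bowl sign test itself:

```python
from scipy.stats import binomtest

# Chance of at least one false positive across k independent comparisons,
# each tested at alpha = 0.05: 1 - (1 - alpha)^k.
alpha = 0.05
for k in (1, 10, 20, 100):
    print(f"{k:3d} comparisons: P(>=1 false positive) = {1 - (1 - alpha) ** k:.2f}")
# At 20 comparisons this is already ~0.64 -- more likely than not.

# The Super Bowl 'predictor': correct in 31 of 40 years.
print(binomtest(31, 40, p=0.5).pvalue)  # ~0.0007, as quoted on the slide
# The catch: a huge number of arbitrary indicator/market pairings could have
# been screened to find one that fits this well by chance.
```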

Page 53:

Treat control and test groups in a way that doesn’t introduce bias

Social Psychology, Catherine A. Sanderson

• Hypothesis: Rats with experience running through a maze will perform better in that maze than naïve rats

• Psychology students were given rats to run in a maze – they were told if they had a ‘maze smart’ (had experienced the maze previously) or ‘maze dumb’ rat
• Maze smart rats consistently performed better than maze dumb rats
• Just one problem… all the rats were the same
• Turns out that students who were told they had smart rats treated them better than did students who were told they had dumb rats – better, more gentle handling caused less stress to the rats and better outcomes in the maze

Study: Rosenthal and Fode (1963) Behavioral Science

Page 54:

Design experiments to minimize the influence of bias

• Bias: Researchers will often have some a priori notion of what they expect (or hope to see) from some experimental treatment

• It is very important to design experiments, and to record and analyze data, in a way that does not (inadvertently or deliberately) reinforce our expectations (remember the example of national stereotypes)

http://chronicle.com/article/Document-Sheds-Light-on/123988/

“The experiment tested the ability of rhesus monkeys to recognize sound patterns. Researchers played a series of three tones (in a pattern like A-B-A) over a sound system. After establishing the pattern, they would vary it (for instance, A-B-B) and see whether the monkeys were aware of the change. If a monkey looked at the speaker, this was taken as an indication that a difference was noticed.

… Researchers watched videotapes of the experiments and "coded" the results, meaning that they wrote down how the monkeys reacted. As was common practice, two researchers independently coded the results so that their findings could later be compared to eliminate errors or bias.

… When [a] second research assistant analyzed the first research assistant's codes, he found that the monkeys didn't seem to notice the change in pattern. In fact, they looked at the speaker more often when the pattern was the same. In other words, the experiment was a bust.

But Mr. Hauser's [the lab head] coding showed something else entirely: He found that the monkeys did notice the change in pattern—and, according to his numbers, the results were statistically significant. If his coding was right, the experiment was a big success.”

Page 55:

It’s not easy to be random (to sample without bias)

• You are interested in the frequency of a genetic marker in a population of bacteria – how can you estimate this?

1. Sample well-isolated colonies? (the genetic marker shouldn’t affect where a colony grows, right?)

2. Overlay a grid and pick the top-left (or some other random location) colony of a randomly chosen square?

3. Number the colonies and choose random numbers to sample?
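Of the three options, numbering the colonies and sampling by random number (option 3) gives every colony the same chance of being picked; a minimal sketch (the colony count and sample size are made up):

```python
import random

# Option 3: number every colony on the plate, then pick by random number.
n_colonies = 250  # hypothetical total number of colonies on the plate
sample_size = 20  # hypothetical number of colonies to genotype

picked = random.sample(range(1, n_colonies + 1), k=sample_size)
print(sorted(picked))

# Options 1 and 2 feel random but let colony position -- and anything
# correlated with it, such as crowding or colony size -- bias the sample.
```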

Page 56:

Assignment 1: UH SURF Application

Partial summary of application guidelines:

• 500 words only!
• Be written in first person
• [Very briefly] Discuss the relative importance of the proposed research within its discipline
• State the specific tasks to be accomplished during the program
• Define the scope and goals of the proposed research
• Be checked for any spelling or syntax errors before being submitted
• Competitive applications generally include:
o Research proposals that are clear, thorough, within the 500-word limit, free of errors in spelling, grammar, and syntax, and in the student's own words
o Research proposals with clear, realistic goals that help the student focus and indicate that the project will lead to a substantive research experience for the student

Page 57:

Assignment 1: UH SURF Application

• Following your own research interests, you should devise a project suitable for a SURF application and write it up as a proposal

• You won’t perform the proposed experiment, but what you propose should, in principle, be doable in a 10 week research period

• Any experimental details that you are not sure about, you should do your best to get right using online, primary-literature, and other sources

• Each proposal will be read and commented on by myself and by two other class members (so each class member will have to read two proposals)

• Proposals are due to me (electronic copy in an editable format – e.g., Word) by February 10. I will distribute each proposal to three class members who will have until February 19 to send me their peer-review that contains their comments/corrections/suggestions (we’ll talk more about this before then). I’ll distribute these corrected copies to the original author who will have until February 26 to submit a final copy to me. **due by midnight on due dates.