R. G. Bias | School of Information | SZB 562BB | Phone: 512 471 7046 | [email protected]

INF 397C

Introduction to Research in Library and Information Science

Spring, 2005

Day 2

Standard Deviation

σ = SQRT(Σ(X - µ)²/N)

(Does that give you a headache?)

• USA Today has come out with a new survey - apparently, three out of every four people make up 75% of the population. – David Letterman

• Statistics: The only science that enables different experts using the same figures to draw different conclusions. – Evan Esar (1899 - 1995), US humorist

How to talk about a set of #s?

Name         M/F  B'day   Fing. lgth  MLB gms  Q
Alex J.      M    9-Nov   5           2        4
Ben B.       M    19-Dec  7           0        3
Brazos P.    M    5-Sep   8           6        4
Derek N.     M    5-Aug   8           12       4
Hans H.      M    24-Jan  7.4         0        4
Jay Y.       M    2-Jul   7.5         3        4
Mike Z.      M    10-Feb  7.3         0        5
Randolph B.  M    16-Jan  7.1         43       5
Terry V.     M    10-Oct  7           4        5
Will M.      M    31-Oct  7.7         50       4

Name         M/F  B'day   Fing. lgth  MLB gms  Q
Hans H.      M    24-Jan  7.4         0        4
Mike Z.      M    10-Feb  7.3         0        5
Ben B.       M    19-Dec  7           0        3
Alex J.      M    9-Nov   5           2        4
Jay Y.       M    2-Jul   7.5         3        4
Terry V.     M    10-Oct  7           4        5
Brazos P.    M    5-Sep   8           6        4
Derek N.     M    5-Aug   8           12       4
Randolph B.  M    16-Jan  7.1         43       5
Will M.      M    31-Oct  7.7         50       4

Histograms
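The picture from this slide did not survive in the transcript. As a rough stand-in, the sketch below builds a frequency distribution of the ten finger lengths from the class table and prints it as a text histogram. The helper names `frequency_distribution` and `text_histogram` are made up for this illustration, and the choice of one row per distinct score (rather than binned ranges) is an assumption.

```python
from collections import Counter

# Finger lengths for the ten men in the class table (cm, per the percentile slide).
finger_lengths = [5, 7, 8, 8, 7.4, 7.5, 7.3, 7.1, 7, 7.7]


def frequency_distribution(scores):
    """Return (score, count) pairs, sorted from lowest to highest score."""
    return sorted(Counter(scores).items())


def text_histogram(scores):
    """Return one text line per distinct score, with one '#' per occurrence."""
    return [f"{score:>4}  {'#' * count}"
            for score, count in frequency_distribution(scores)]


for line in text_histogram(finger_lengths):
    print(line)
```

With real class data you would more likely group scores into equal-width bins before counting; the one-row-per-score version is only workable because N is so small here.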

Percentiles/Deciles

• The cumulative percentage for any given score is the “percentile” for that score.

• The decile is one-tenth of the percentile (usually rounded to the nearest whole number).

• So, in our finger example, 7.7 cm was the 80th percentile, or the 8th decile.
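The definitions above can be checked against the ten finger lengths in the class table. This is a minimal sketch: `cumulative_percentage` and `decile` are hypothetical helper names, and "cumulative percentage" is taken here to mean the share of scores at or below the given score. The slide's finger example may have used a larger class data set, but the ten rows in the table give the same figure.

```python
# Finger lengths (cm) for the ten men in the class table.
finger_lengths = [5, 7, 8, 8, 7.4, 7.5, 7.3, 7.1, 7, 7.7]


def cumulative_percentage(scores, value):
    """Percentage of scores that are at or below `value`."""
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)


def decile(percentile):
    """One-tenth of the percentile, rounded to a whole number.

    Note: Python's round() sends exact halves to the nearest even number.
    """
    return round(percentile / 10)


p = cumulative_percentage(finger_lengths, 7.7)
print(f"7.7 cm: percentile {p:.0f}, decile {decile(p)}")
```

Eight of the ten scores are 7.7 cm or shorter, which is where the 80th percentile and the 8th decile come from.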

Scales

• The data we collect can be represented on one of FOUR types of scales:

– Nominal

– Ordinal

– Interval

– Ratio

• “Scale” in the sense that an individual score is placed at some point along a continuum.

Nominal Scale

• Describe something by giving it a name. (Name – Nominal. Get it?)

• Mutually exclusive categories.

• For example:

– Gender: 1 = Female, 2 = Male

– Marital status: 1 = single, 2 = married, 3 = divorced, 4 = widowed

– Make of car: 1 = Ford, 2 = Chevy . . .

• The numbers are just names.

Ordinal Scale

• An ordered set of objects.

• But no implication about the relative SIZE of the steps.

• Example:

– The 50 states in order of population:

  • 1 = California

  • 2 = Texas

  • 3 = New York

  • . . . 50 = Wyoming

Interval Scale

• Ordered, like an ordinal scale.

• Plus there are equal intervals between each pair of scores.

• With Interval data, we can calculate means (averages).

• However, the zero point is arbitrary.

• Examples:

– Temperature in Fahrenheit or Centigrade.

– IQ scores
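One way to see what "the zero point is arbitrary" costs you is to express the same temperatures on both scales. The three temperatures below are illustrative values chosen for this sketch, not data from the lecture.

```python
def fahrenheit_to_celsius(deg_f):
    """Convert a Fahrenheit temperature to Celsius (Centigrade)."""
    return (deg_f - 32) * 5 / 9


# Three illustrative temperatures, equally spaced in Fahrenheit.
temps_f = [40, 60, 80]
temps_c = [fahrenheit_to_celsius(t) for t in temps_f]

# Equal intervals stay equal after the change of scale ...
steps_f = [temps_f[1] - temps_f[0], temps_f[2] - temps_f[1]]
steps_c = [temps_c[1] - temps_c[0], temps_c[2] - temps_c[1]]

# ... but ratios do not, because each scale puts its zero somewhere different.
ratio_f = temps_f[2] / temps_f[0]
ratio_c = temps_c[2] / temps_c[0]

print("Fahrenheit steps:", steps_f, "ratio 80/40:", ratio_f)
print("Celsius steps:", [round(s, 2) for s in steps_c], "ratio:", round(ratio_c, 2))
```

80°F looks like "twice" 40°F, yet the same two temperatures in Celsius differ by a factor of about six. Statements about ratios need a scale with an absolute zero.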

Ratio Scale

• Interval scale, plus an absolute zero.

• Sample:

– Distance, weight, height, time (but not years – e.g., the year 2002 isn’t “twice” 1001).

Scales (cont’d.)

It’s possible to measure the same attribute on different scales. Say, for instance, your midterm test. I could:

• Give you a “1” if you don’t finish, and a “2” if you finish.

• “1” for highest grade in class, “2” for second highest grade, . . . .

• “1” for first quarter of the class, “2” for second quarter of the class, . . .

• Raw test score (100, 99, . . . .).

– (NOTE: A score of 100 doesn’t mean the person “knows” twice as much as a person who scores 50, he/she just gets twice the score.)

Scales (cont’d.)

Nominal             Ordinal              Interval            Ratio
Name                =                    =                   =
Mutually-exclusive  =                    =                   =
                    Ordered              =                   =
                                         Equal interval      =
                                                             + abs. 0
Gender, Yes/No      Class rank, ratings  Days of wk., temp.  Inches, dollars

Critical Skepticism

• Remember the Rabbit Pie example from last week?

• The “critical consumer” of statistics asked “what do you mean by ‘50/50’?”

• Let’s look at some other situations and claims.

Company is hurting.

• We’d like to ask you to take a 50% cut in pay.

• But if you do, we’ll give you a 60% raise next month. OK?

• Problem: Base rate.
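The base-rate problem is easy to see with numbers. The sketch below assumes a starting salary of 1000 (an arbitrary figure picked for the example) and a hypothetical helper `apply_percent_change`.

```python
def apply_percent_change(amount, percent):
    """Return `amount` after changing it by `percent` percent."""
    return amount * (1 + percent / 100)


salary = 1000.0                                     # hypothetical starting pay
after_cut = apply_percent_change(salary, -50)       # 50% of 1000 is taken away
after_raise = apply_percent_change(after_cut, 60)   # 60% of the REDUCED pay is added

print(f"start {salary:.0f}, after cut {after_cut:.0f}, after raise {after_raise:.0f}")

# Raise needed, as a percentage of the reduced pay, to get back to the start.
needed_raise = 100 * (salary - after_cut) / after_cut
print(f"raise needed to break even: {needed_raise:.0f}%")
```

The 60% raise is computed on the smaller base, so you end up at 80% of what you started with. Getting back to even would take a 100% raise.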

Sale!

• “Save 100%”

• I doubt it.

Probabilities

• “It’s safer to drive in the fog than in the sunshine.” (Kinda like “Most accidents occur within 25 miles of home.” Doesn’t mean it gets safer once you get to San Marcos.)

• Navy literature around WWI:

– “The death rate in the Navy during the Spanish-American war was 9/1000. For civilians in NYC during the same period it was 16/1000. So . . . Join the Navy. It’s safer.”

Are all results reported?

• “In an independent study [ooh, magic words], people who used Doakes toothpaste had 23% fewer cavities.”

• How many studies showed MORE cavities for Doakes users?

Sampling problems

• “Average salary of 1999 UT grads – $41,000.”

• How did they find this? I’ll bet it was average salary of THOSE WHO RESPONDED to a survey.

• Who’s inclined to respond?

Correlation ≠ Causation

• Around the turn of the century, there were relatively MANY deaths from tuberculosis in Arizona.

• What’s up with that?

Remember . . .

• I do NOT want you to become cynical.

• Not all “media bias” is intentional.

• Just be sensible, critical, skeptical.

• As you “consume” statistics, ask some questions . . .

Ask yourself . . .

• Who says so? (A Zest commercial is unlikely to tell you that Irish Spring is best.)

• How does he/she know? (That Zest is “the best soap for you.”)

• What’s missing? (One year, 33% of female grad students at Johns Hopkins married faculty.)

• Did somebody change the subject? (“Camrys are bigger than Accords.” “Accords are bigger than Camrys.”)

• Does it make sense? (“Study in NYC: Working woman with family needed $40.13/week for adequate support.”)

Quote on front of Huff book:

• “It ain’t so much the things we don’t know that get us in trouble. It’s the things we know that ain’t so.” Artemus Ward, US author

• Being a critical consumer of statistics will keep you from knowing things that ain’t so.

Claims

• “Better chance of being struck by lightning than being bitten by a shark.”

• Tom Brokaw – Tranquilizers.

• What are some claims you all heard/read?

Break

Before the break . . .

• We learned about frequency distributions.

• I asserted that a frequency distribution, and/or a histogram (a graphical representation of a frequency distribution), was a good way to summarize a collection of data.

• There’s another, even shorter-hand way.

Measures of Central Tendency

• Mode

– Most frequent score (or scores – a distribution can have multiple modes)

• Median

– “Middle score”

– 50th percentile

• Mean - µ (“mu”)

– “Arithmetic average”

– ΣX/N
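As a quick illustration of the three measures, here they are for the "MLB gms" column of the class table, using Python's standard `statistics` module. The variable names are made up for this sketch.

```python
import statistics

# "MLB gms" column for the ten men in the class table.
mlb_games = [2, 0, 6, 12, 0, 3, 0, 43, 4, 50]

modes = statistics.multimode(mlb_games)   # most frequent score(s); a list, since there can be several
median = statistics.median(mlb_games)     # middle score (average of the two middle scores when N is even)
mean = statistics.mean(mlb_games)         # arithmetic average, sum(X) / N

print("mode(s):", modes)
print("median:", median)
print("mean:", mean)
```

For these scores the mode is 0, the median is 3.5 and the mean is 12, so the three "averages" tell quite different stories about the same ten people.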

Let’s calculate some “averages”

• From old data.

A quiz about averages

1 – If one score in a distribution changes, will the mode change?
__Yes __No __Maybe

2 – How about the median?
__Yes __No __Maybe

3 – How about the mean?
__Yes __No __Maybe

4 – True or false: In a normal distribution (bell curve), the mode, median, and mean are all the same.
__True __False

More quiz

5 – (This one is tricky.) If the mode=mean=median, then the distribution is necessarily a bell curve.
__True __False

6 – I have a distribution of 10 scores. There was an error, and really the highest score is 5 points HIGHER than previously thought.

a) What does this do to the mode?
__ Increases it __Decreases it __Nothing __Can’t tell

b) What does this do to the median?
__ Increases it __Decreases it __Nothing __Can’t tell

c) What does this do to the mean?
__ Increases it __Decreases it __Nothing __Can’t tell

7 – Which of the following must be an actual score from the distribution?
a) Mean
b) Median
c) Mode
d) None of the above
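If you want to experiment with question 6 before answering, one option is to try it on a real set of 10 scores. The sketch below uses the MLB games column from the class table and a hypothetical `summarize` helper. It shows what happens for this one data set only; a single example cannot settle what must happen for every distribution (think about a case where the highest score is itself a mode).

```python
import statistics


def summarize(scores):
    """Return the mode(s), median and mean of a list of scores."""
    return {
        "mode": statistics.multimode(scores),
        "median": statistics.median(scores),
        "mean": statistics.mean(scores),
    }


mlb_games = [2, 0, 6, 12, 0, 3, 0, 43, 4, 50]

# Copy the scores in sorted order and raise the highest one by 5 points.
corrected = sorted(mlb_games)
corrected[-1] += 5

before = summarize(mlb_games)
after = summarize(corrected)

for name in ("mode", "median", "mean"):
    print(f"{name}: {before[name]} -> {after[name]}")
```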

OK, so which do we use?

• Means allow further arithmetic/statistical manipulation. But . . .

• It depends on:

– The type of scale of your data

  • Can’t use means with nominal or ordinal scale data

  • With nominal data, must use mode

– The distribution of your data

  • Tend to use medians with distributions bounded at one end but not the other (e.g., salary). (Look at our “Number of MLB games” distribution.)

– The question you want to answer

  • “Most popular score” vs. “middle score” vs. “middle of the see-saw”

  • “Statistics can tell us which measures are technically correct. It cannot tell us which are ‘meaningful’” (Tal, 2001, p. 52).

Have sidled up to SHAPES of distributions

• Symmetrical

• Skewed – positive and negative

• Flat

Why . . .

• . . . isn’t a “measure of central tendency” all we need to characterize a distribution of scores/numbers/data/stuff?

• “The price for using measures of central tendency is loss of information” (Tal, 2001, p. 49).

Note . . .

• We started with a bunch of specific scores.

• We put them in order.

• We drew their distribution.

• Now we can report their central tendency.

• So, we’ve moved AWAY from specifics, to a summary. But with Central Tendency, alone, we’ve ignored the specifics altogether.

– Note MANY distributions could have a particular central tendency!

• If we went back to ALL the specifics, we’d be back at square one.
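To make the "MANY distributions" point concrete, the sketch below compares the MLB games scores from the class table with an invented class in which everybody attended exactly 12 games. The second list and the `score_range` helper are assumptions made up for this example.

```python
import statistics

mlb_games = [2, 0, 6, 12, 0, 3, 0, 43, 4, 50]   # from the class table
everyone_twelve = [12] * 10                      # hypothetical: ten people, 12 games each


def score_range(scores):
    """Highest score minus lowest score."""
    return max(scores) - min(scores)


for label, scores in (("class data", mlb_games), ("all twelves", everyone_twelve)):
    print(f"{label}: mean = {statistics.mean(scores)}, range = {score_range(scores)}")
```

Both lists have a mean of 12, yet one is spread from 0 to 50 and the other has no spread at all. The mean alone cannot tell them apart.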

Measures of Dispersion

• Range

• Semi-interquartile range

• Standard deviation

– σ (sigma)

Range

• Like the mode . . .

– Easy to calculate

– Potentially misleading

– Doesn’t take EVERY score into account.

• What we need to do is calculate one number that will capture HOW spread out our numbers are from that Central Tendency.

– “Standard Deviation”

Back to our data – MLB games

• Let’s take just the men in this class, since N = 10, and it’ll be easy to do the math.

• xls spreadsheet.

• Measures of central tendency.

• Go with mean.

• So, how much do the actual scores deviate from the mean?

So . . .

• Add up all the deviations and we should have a feel for how dispersed, how spread, how deviant, our distribution is.

• Let’s calculate the Standard Deviation.

• σ = SQRT(Σ(X - µ)²/N)

• Σ(X - µ)

Damn!

• Σ(X - µ) always comes out to 0: the positive and negative deviations cancel.

• OK, so mathematicians at this point do one of two things.

• Take the absolute value or square ‘em.

• We square ‘em. Σ(X - µ)²

• Then take the average of the squared deviations. Σ(X - µ)²/N

• But this number is so BIG!
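The steps so far can be traced on the MLB games scores from the class table. This is a sketch of the arithmetic only (the original lecture used an xls spreadsheet that is not part of this transcript); the variable names are made up.

```python
mlb_games = [2, 0, 6, 12, 0, 3, 0, 43, 4, 50]
n = len(mlb_games)

mu = sum(mlb_games) / n                       # mean: 120 / 10
deviations = [x - mu for x in mlb_games]      # X - mu for every score

sum_of_deviations = sum(deviations)           # positives and negatives cancel
sum_of_squares = sum(d ** 2 for d in deviations)
mean_square = sum_of_squares / n              # average squared deviation

print("mean:", mu)
print("sum of deviations:", sum_of_deviations)
print("sum of squared deviations:", sum_of_squares)
print("average squared deviation:", mean_square)
```

The raw deviations add up to 0, which is why they are squared first. The average squared deviation, 311.8, is in squared units (games squared), which is why it looks so big next to scores that only run from 0 to 50.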

OK . . .

• . . . take the square root (to make up for squaring the deviations earlier).

• σ = SQRT(Σ(X - µ)²/N)

• Now this doesn’t give you a headache, right?

• I said “right”?
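Written as code, the formula is only a few lines. The sketch below applies it to the MLB games scores from the class table and cross-checks the result against `statistics.pstdev` from the Python standard library; `population_sd` is a hypothetical helper name. It divides by N, as the slide's formula does. Many textbooks and tools divide by N - 1 when the scores are a sample, which gives a somewhat larger number.

```python
import math
import statistics


def population_sd(scores):
    """Standard deviation per the slide: SQRT( sum((X - mu)^2) / N )."""
    n = len(scores)
    mu = sum(scores) / n
    return math.sqrt(sum((x - mu) ** 2 for x in scores) / n)


mlb_games = [2, 0, 6, 12, 0, 3, 0, 43, 4, 50]

sigma = population_sd(mlb_games)
print(f"sigma = {sigma:.2f}")
print(f"statistics.pstdev = {statistics.pstdev(mlb_games):.2f}")
```

The result is about 17.66 games, back in the same units as the scores themselves.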

Hmmm . . .

Mode Range

Median ?????

Mean Standard Deviation

We need . . .

• A measure of spread that is NOT sensitive to every little score, just as median is not.

• SIQR: Semi-interquartile range.

• (Q3 – Q1)/2
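A sketch of the SIQR for the MLB games scores follows. One caveat: there are several conventions for locating Q1 and Q3 in a small data set, and they give different answers. The code uses `statistics.quantiles` with `method="inclusive"` (Python 3.8 or later); that choice is an assumption made for this example, not something the slides specify.

```python
import statistics

mlb_games = [2, 0, 6, 12, 0, 3, 0, 43, 4, 50]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (the median), Q3.
q1, q2, q3 = statistics.quantiles(mlb_games, n=4, method="inclusive")

siqr = (q3 - q1) / 2

print("Q1:", q1, "median:", q2, "Q3:", q3)
print("SIQR:", siqr)
```

Under this convention Q1 is 0.5 and Q3 is 10.5, so the SIQR is 5.0. The two extreme scores (43 and 50) have no effect on it.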

To summarize

Mode      Range   - Easy to calculate.
                  - May be misleading.
Median    SIQR    - Capture the center.
                  - Not influenced by extreme scores.
Mean (µ)  SD (σ)  - Take every score into account.
                  - Allow later manipulations.

Graphs

• Graphs/tables/charts do a good job (done well) of depicting all the data.

• But they cannot be manipulated mathematically.

• Plus it can be ROUGH when you have LOTS of data.

• Let’s look at your examples of claims.

Some rules . . .

• . . . For building graphs/tables/charts:

– Label axes.

– Divide up the axes evenly.

– Indicate when there’s a break in the rhythm!

– Keep the “aspect ratio” reasonable.

– Histogram, bar chart, line graph, pie chart, stacked bar chart, which when?

– Keep the user in mind.

Who wants to guess . . .

• . . . What I think is the most important sentence in S, Z, & Z (2003), Chapter 2?

p. 19

• Penultimate paragraph, first sentence:

• “If differences in the dependent variable are to be interpreted unambiguously as a result of the different independent variable conditions, proper control techniques must be used.”

• http://highered.mcgraw-hill.com/sites/0072494468/student_view0/statistics_primer.html

• Click on Statistics Primer.

Homework

• LOTS of reading. See syllabus.

• Send a table/graph/chart that you’ve read this past week. Send email by noon, Friday, 2/4/2005.

See you next week.