Sociology 601: Midterm review, October 15, 2009
• Basic information for the midterm
– Date: Tuesday, October 20, 2009
– Start time: 2 pm
– Place: usual classroom, Art/Sociology 3221
– Bring a sheet of notes, a calculator, and two pens or pencils
– Notify me if you anticipate any timing problems
• Review for midterm
– terms
– symbols
– steps in a significance test
– testing differences in groups
– contingency tables and measures of association
– equations
Important terms from chapter 1
Terms for statistical inference:
• population
• sample
• parameter
• statistic
Key idea: You use a sample to make inferences about a population
Important terms from chapter 2
2.1) Measurement:
• variable
• interval scale
• ordinal scale
• nominal scale
• discrete variable
• continuous variable
2.2-2.4) Sampling:
• simple random sample
• probability sampling
• stratified sampling
• cluster sampling
• multistage sampling
• sampling error
Key idea: Statistical inferences depend on measurement and sampling.
Important terms from chapter 3
3.1) Tabular and graphic description
• frequency distribution
• relative frequency distribution
• histogram
• bar graph
3.2-3.4) Measures of central tendency and variation
• mean
• median
• mode
• proportion
• standard deviation
• variance
• interquartile range
• quartile, quintile, percentile
Important terms from chapter 3
Key ideas:
1.) Statistical inferences are often made about a measure of central tendency.
2.) Measures of variation help us estimate certainty about an inference.
Important terms from Chapter 4
• probability distribution
• sampling distribution
• sample distribution
• normal distribution
• standard error
• central limit theorem
• z-score
Key ideas:
1.) If we know what the population is like, we can predict what a sample might be like.
2.) A sample statistic gives us a best guess of the population parameter.
3.) If we work carefully, a sample can tell us how confident to be about our sample statistic.
Important terms from chapter 5
• point estimator
• estimate
• unbiased
• efficient
• confidence interval
Key ideas:
1.) We have a standard set of equations we use to make estimates.
2.) These equations are used because they have specific desirable properties.
3.) A confidence interval provides your best guess of a parameter.
4.) A confidence interval provides your best guess of how close your best guess (in part 3.)) will typically be to the parameter.
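To make key ideas 3 and 4 concrete, here is a minimal Python sketch of a large-sample 95% confidence interval for a mean. The numbers (n = 100, Ybar = 508, s = 100) are taken from the A&F problem 6.8 example in the Stata slides of this review; the function name `mean_ci` is hypothetical, and the interval assumes the normal (z) approximation rather than the t distribution that Stata reports.

```python
from math import sqrt
from statistics import NormalDist

def mean_ci(ybar, s, n, level=0.95):
    """Large-sample confidence interval for a mean: ybar +/- z * s / sqrt(n)."""
    se = s / sqrt(n)                           # estimated standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return ybar - z * se, ybar + z * se

# Assumed inputs: summary statistics from A&F problem 6.8.
lower, upper = mean_ci(ybar=508, s=100, n=100)
print(f"95% CI: {lower:.1f} to {upper:.1f}")
```

The center of the interval is the point estimate; the half-width (z times the standard error) is the "how close" part. The z-based interval is slightly narrower than the t-based one in the Stata output because the t distribution with 99 degrees of freedom has heavier tails.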
Important terms from chapter 6
6.1 – 6.3) Statistical inference: Significance tests
• assumptions
• hypothesis
• test statistic
• p-value
• conclusion
• null hypothesis
• one-sided test
• two-sided test
• z-statistic
Key Idea from chapter 6
A significance test is a ritualized way to ask about a population parameter.
1.) Clearly state assumptions
2.) Hypothesize a value for a population parameter
3.) Calculate a sample statistic.
4.) Estimate how unlikely it is for the hypothesized population to produce such a sample statistic.
5.) Decide whether the hypothesis can be thrown out.
More important terms from chapter 6
6.4, 6.7) Decisions and types of errors in hypothesis tests
• type I error
• type II error
• power
6.5-6.6) Small sample tests
• t-statistic
• binomial distribution
• binomial test
Key ideas:
1.) Modeling decisions and population characteristics can affect the probability of a mistaken inference.
2.) Small sample tests have the same principles as large sample tests, but require different assumptions and techniques.
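As one illustration of key idea 1, the sketch below computes the type I error rate, power, and type II error rate for a one-sided z test. All inputs are hypothetical (mu0 = 500, a true mean of 520, sigma = 100, n = 100, alpha = 0.05), the function name `power_one_sided_z` is made up for this example, and the population standard deviation is assumed known.

```python
from math import sqrt
from statistics import NormalDist

def power_one_sided_z(mu0, mu_true, sigma, n, alpha=0.05):
    """Power of a one-sided z test of H0: mu = mu0 against Ha: mu > mu0."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha)   # reject H0 when z > z_crit
    shift = (mu_true - mu0) / se               # distance of the truth from H0, in SEs
    return 1 - NormalDist().cdf(z_crit - shift)

# Hypothetical values chosen only for illustration.
type_i = power_one_sided_z(500, 500, 100, 100)   # H0 true: rejection rate equals alpha
power = power_one_sided_z(500, 520, 100, 100)    # H0 false: chance of detecting it
type_ii = 1 - power                              # chance of missing a real difference
print(round(type_i, 3), round(power, 3), round(type_ii, 3))
```

Changing the true mean, the sample size, or alpha changes the power, which is the sense in which population characteristics and modeling decisions affect the probability of a mistaken inference.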
symbols
[symbol list lost in extraction; surviving fragments include Y, i, H, df, n, z, t, s, P, the subscripts a and 0, and hats (ˆ) marking estimates]
Significance tests, Step 1: assumptions
• An assumption that the sample was drawn at random.
– this is pretty much a universal assumption for all significance tests.
• An assumption about whether the variable has two outcome categories (proportion) or many intervals (mean).
• An assumption that enables us to assume a normal sampling distribution. This assumption varies from test to test.
– Some tests assume a normal population distribution.
– Other tests assume different minimum sample sizes.
– Some tests do not make this assumption.
• Declare α level at the start, if you use one.
Significance Tests, Step 2: Hypothesis
• State the hypothesis as a null hypothesis.
– Remember that the null hypothesis is about the population from which you draw your sample.
• Write the equation for the null hypothesis.
• The null hypothesis can imply a one- or two-sided test.
– Be sure the statement and equation are consistent.
Significance Tests, Step 3: Test statistic
For the test statistic, write:
• the equation,
• your work, and
• the answer.
– Full disclosure maximizes partial credit.
– I recommend four significant digits at each computational step, but present three as the answer.
Significance tests, Step 4: p-value
Calculate an appropriate p-value for the test statistic.
– Use the correct table for the type of test;
– Use the correct degrees of freedom if applicable;
– Use a correct p-value for a one- or two-sided test, as you declared in the hypothesis step.
Significance Tests, Step 5: Conclusion
Write a conclusion:
– state the p-value and your decision to reject H0 or not;
– state what your decision means;
– discuss the substantive importance of your sample statistic.
Useful STATA outputs
• immediate test for sample mean using TTESTI:
. * for example, in A&F problem 6.8, n=100 Ybar=508 sd=100 and mu0=500
. ttesti 100 508 100 500, level(95)
One-sample t test
------------------------------------------------------------------------------
| Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
x | 100 508 10 100 488.1578 527.8422
------------------------------------------------------------------------------
Degrees of freedom: 99
Ho: mean(x) = 500
    Ha: mean < 500         Ha: mean != 500        Ha: mean > 500
      t =   0.8000           t =   0.8000           t =   0.8000
  P < t =   0.7872     P > |t| =   0.4256     P > t =   0.2128
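The standard error and test statistic in the ttesti output above can be checked by hand. The Python sketch below is only a rough cross-check: it uses the standard library, so the p-value comes from the normal approximation rather than the t distribution with 99 degrees of freedom that Stata uses.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the ttesti example (A&F problem 6.8).
n, ybar, s, mu0 = 100, 508, 100, 500

se = s / sqrt(n)            # the "Std. Err." column
t = (ybar - mu0) / se       # the test statistic

# Normal approximation to the two-sided p-value (an assumption of this sketch).
p_two_sided = 2 * (1 - NormalDist().cdf(abs(t)))
print(se, t, round(p_two_sided, 3))
```

The approximate two-sided p-value comes out near 0.424, close to Stata's 0.4256; the small gap is the difference between the normal and t distributions at this sample size.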
Useful STATA outputs
• immediate test for sample proportion using PRTESTI:
. * for proportion: in A&F problem 6.12, n=832 p=.53 and p0=.5
. prtesti 832 .53 .50, level(95)
One-sample test of proportion                      x: Number of obs =      832
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.                     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |        .53   .0173032                      .4960864    .5639136
------------------------------------------------------------------------------
Ho: proportion(x) = .5
     Ha: x < .5            Ha: x != .5            Ha: x > .5
     z =  1.731            z =  1.731             z =  1.731
 P < z = 0.9582       P > |z| = 0.0835        P > z = 0.0418
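The prtesti numbers can be reproduced with a few lines of Python. This is a sketch using only the standard library; the variable names are arbitrary. Note that the z statistic uses the standard error under the null hypothesis (p0), while the confidence interval in the table uses the standard error based on the sample proportion.

```python
from math import sqrt
from statistics import NormalDist

# Inputs from the prtesti example (A&F problem 6.12).
n, p_hat, p0 = 832, 0.53, 0.50

se_null = sqrt(p0 * (1 - p0) / n)        # standard error assuming H0 is true
se_ci = sqrt(p_hat * (1 - p_hat) / n)    # standard error shown in the table
z = (p_hat - p0) / se_null

std_normal = NormalDist()
p_upper = 1 - std_normal.cdf(z)                   # Ha: proportion > .5
p_two_sided = 2 * (1 - std_normal.cdf(abs(z)))    # Ha: proportion != .5
print(round(z, 3), round(p_two_sided, 4), round(p_upper, 4))
```

The one-sided p-value is half the two-sided one here because z falls on the side named by the alternative hypothesis.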
Useful STATA outputs
• Comparison of two means using ttesti
. ttesti 4252 18.1 12.9 6764 32.6 18.2, unequal
Two-sample t test with unequal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |    4252        18.1    .1978304        12.9    17.71215    18.48785
       y |    6764        32.6     .221294        18.2    32.16619    33.03381
---------+--------------------------------------------------------------------
combined |   11016    27.00323    .1697512     17.8166    26.67049    27.33597
---------+--------------------------------------------------------------------
    diff |               -14.5    .2968297               -15.08184   -13.91816
------------------------------------------------------------------------------
Satterthwaite's degrees of freedom:  10858.6
Ho: mean(x) - mean(y) = diff = 0
     Ha: diff < 0           Ha: diff != 0          Ha: diff > 0
     t = -48.8496           t = -48.8496           t = -48.8496
 P < t =   0.0000     P > |t| =   0.0000       P > t =   1.0000
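The key quantities in the two-sample output can be recomputed from the six numbers passed to ttesti. The Python sketch below assumes the usual unequal-variance formulas: the standard error is the square root of the summed squared standard errors, and the degrees of freedom follow Satterthwaite's approximation.

```python
from math import sqrt

# Inputs from the ttesti command: n, mean, and standard deviation per group.
n1, ybar1, s1 = 4252, 18.1, 12.9
n2, ybar2, s2 = 6764, 32.6, 18.2

v1, v2 = s1**2 / n1, s2**2 / n2          # squared standard error of each mean
se_diff = sqrt(v1 + v2)                  # standard error of the difference
t = (ybar1 - ybar2) / se_diff            # Stata reports mean(x) - mean(y)

# Satterthwaite's approximate degrees of freedom for unequal variances.
df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
print(round(se_diff, 4), round(t, 2), round(df, 1))
```

With samples this large the t statistic is enormous in absolute value, so the p-value rounds to 0.0000 whichever reference distribution is used.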
Chapter 6: Significance Tests for Single Sample
μ or π       sample size   best test
mean         large         z-test for Ȳ − μ0
proportion   large         z-test for π̂ − π0
mean         small         t-test for Ȳ − μ0
proportion   small         Fisher’s exact test
Equations for tests of statistical significance
$$ z = \frac{\bar{Y} - \mu_0}{\hat{\sigma}_{\bar{Y}}} $$
$$ z = \frac{\hat{\pi} - \pi_0}{\sigma_{\hat{\pi}}} $$
$$ t = \frac{\bar{Y} - \mu_0}{\hat{\sigma}_{\bar{Y}}} $$
Chapter 7: Comparing scores for two groups
μ or π       sample size   sample scheme   best test
mean         large         independent     z-test for μ2 − μ1
proportion   large         independent     z-test for π2 − π1
mean         small         independent     t-test for μ2 − μ1
proportion   small         independent     Fisher’s exact test
mean         large         dependent       z-test for D̄
proportion   large         dependent       McNemar test
mean         small         dependent       t-test for D̄
proportion   small         dependent       Binomial test
Two Independent Groups: Large Samples, Means
7.1. Difference of two large sample means:
$$ z = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} $$
• It is important to be able to recognize the parts of the equation, what they mean, and why they are used.
• Equal variance assumption? NO
Two Independent Groups: Large Samples, Proportions
7.2. Difference of two large sample proportions:
$$ z = \frac{(\hat{\pi}_2 - \hat{\pi}_1) - 0}{\sqrt{\dfrac{\hat{\pi}(1-\hat{\pi})}{n_1} + \dfrac{\hat{\pi}(1-\hat{\pi})}{n_2}}} $$
• Equal variance assumption? YES (if proportions are equal then so are variances).
• df = N1 + N2 - 2
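The sketch below works through the large-sample test for two proportions in Python. The counts are hypothetical (40 of 100 in group 1, 55 of 100 in group 2), and it assumes that the unsubscripted proportion in the standard error is the pooled proportion from both groups combined, which is what the equal-variance assumption under H0 implies.

```python
from math import sqrt

# Hypothetical counts, for illustration only.
successes1, n1 = 40, 100
successes2, n2 = 55, 100

p1, p2 = successes1 / n1, successes2 / n2
p_pooled = (successes1 + successes2) / (n1 + n2)   # single estimate used under H0

se = sqrt(p_pooled * (1 - p_pooled) / n1 + p_pooled * (1 - p_pooled) / n2)
z = ((p2 - p1) - 0) / se
print(round(p_pooled, 3), round(se, 4), round(z, 3))
```

Pooling is what makes the two variance terms share the same numerator: if the null hypothesis of equal proportions holds, both groups have the same variance.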
Two Independent Groups: Small Samples, Means
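7.3 Difference of two small sample means:
$$ t\ (\text{or } z) = \frac{(\bar{Y}_2 - \bar{Y}_1) - 0}{\hat{\sigma}_{\bar{Y}_2 - \bar{Y}_1}} = \frac{\bar{Y}_2 - \bar{Y}_1}{\sqrt{\dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}\ \sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} $$
Equal variance assumption: SOMETIMES (for ease)
NO (in computer programs)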
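A short Python sketch of the equal-variance (pooled) version of this test follows. The summary statistics are hypothetical, and the degrees of freedom n1 + n2 - 2 are assumed from the pooled-variance denominator.

```python
from math import sqrt

# Hypothetical small-sample summary statistics.
n1, ybar1, s1 = 10, 20.0, 4.0
n2, ybar2, s2 = 12, 25.0, 5.0

# Pooled variance: a weighted average of the two sample variances.
pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
se_diff = sqrt(pooled_var) * sqrt(1 / n1 + 1 / n2)

t = ((ybar2 - ybar1) - 0) / se_diff
df = n1 + n2 - 2
print(round(pooled_var, 2), round(t, 3), df)
```

The pooled form is convenient for hand calculation; software typically defaults to, or offers, the unequal-variance version with Satterthwaite degrees of freedom instead.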
Two Independent Groups: Small Samples, Proportions
Fisher’s exact test
• via Stata, SAS, or SPSS
• calculates exact probability of all possible occurrences
Dependent Samples:
• Means:
$$ t\ (\text{or } z) = \frac{\bar{D}}{\hat{\sigma}_{\bar{D}}} = \frac{\bar{D}}{s_D / \sqrt{n}} $$
• Proportions:
$$ z = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}} $$
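Both dependent-sample statistics are simple to compute. The Python sketch below uses made-up data: ten paired differences for the means test, and hypothetical discordant counts n12 = 25 and n21 = 10 for the proportions (McNemar) test.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical paired differences (for example, after minus before).
differences = [2, -1, 3, 0, 4, 1, 2, -2, 3, 1]
n = len(differences)

d_bar = mean(differences)            # mean difference
s_d = stdev(differences)             # sample standard deviation of the differences
t = d_bar / (s_d / sqrt(n))          # compare to a t distribution with n - 1 df
df = n - 1

# Hypothetical discordant-pair counts from a 2x2 table of paired responses.
n12, n21 = 25, 10
z = (n12 - n21) / sqrt(n12 + n21)    # McNemar z statistic
print(round(t, 3), df, round(z, 3))
```

Only the pairs that changed category (n12 and n21) enter the McNemar statistic; pairs that gave the same response both times carry no information about a difference.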
Chapter 8: Analyzing associations
• Contingency tables and their terminology:
– marginal distributions and joint distributions
– conditional distribution of R, given a value of E (as counts or percentages in A & F)
– marginal, joint, and conditional probabilities (as proportions in A & F)
• “Are two variables statistically independent?”
Descriptive statistics you need to know
• How to draw and interpret contingency tables (crosstabs)
• Frequency and probability/percentage terms
– marginal
– conditional
– joint
• Measures of relationships:
– odds, odds ratios
– gamma and tau-b
Observed and expected cell counts
• fo, the observed cell count, is the number of cases in a given cell.
• fe, the expected cell count, is the number of cases we would predict in a cell if the variables were independent of each other.
• fe = row total * column total / N
– the equation for fe is a correction for rows or columns with small totals.
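The expected counts can be computed directly from the marginal totals. The Python sketch below uses a hypothetical 2x2 table of observed counts; the variable names are arbitrary.

```python
# Hypothetical observed counts (rows = categories of one variable,
# columns = categories of the other).
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# fe = row total * column total / N for every cell.
expected = [[r * c / n for c in col_totals] for r in row_totals]
print(expected)
```

Each expected count keeps the observed row and column totals but spreads the cases as if row membership told you nothing about column membership.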
Chi-squared test of independence
• Assumptions: 2 categorical variables, random sampling, fe >= 5
• Ho: variables are statistically independent (crudely, the score for one variable is independent of the score for the other.)
• Test statistic: χ² = Σ((fo-fe)²/fe)
• p-value from χ² table, df = (r-1)(c-1)
• Conclusion: reject or do not reject based on p-value and prior α-level, if necessary. Then, describe your conclusion.
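The chi-squared steps above can be sketched in Python as follows. The 2x2 table is hypothetical, and the p-value line is a shortcut that is valid only when df = 1 (a chi-squared variable with one degree of freedom is a squared standard normal); for larger tables you would use a chi-squared table or statistical software.

```python
from math import erfc, sqrt

# Hypothetical observed counts.
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, fo in enumerate(row):
        fe = row_totals[i] * col_totals[j] / n   # expected count under independence
        chi2 += (fo - fe) ** 2 / fe

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper-tail probability; this closed form is an assumption that holds only for df == 1.
p_value = erfc(sqrt(chi2 / 2)) if df == 1 else None
print(round(chi2, 3), df, p_value)
```

Here every expected count is at least 5, so the sample-size assumption is met, and the very small p-value would lead to rejecting independence at any conventional α-level.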
Probabilities, odds, and odds ratios.
• Given a probability, you can calculate an odds and a log odds.
– odds = p / (1-p)
   • 50/50 = 1.0
   • range: 0 to ∞
– log odds = log (p / (1-p)) = log(p) – log(1-p)
   • 50/50 = 0.0
   • range: -∞ to +∞
– odds ratio = [ p1 / (1-p1) ] / [ p2 / (1-p2) ]
• Given an odds, you can calculate a probability.
– p = odds / (1 + odds)
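These conversions are one-liners in code. The Python sketch below defines small helper functions (the names are hypothetical) and checks them on p = 0.5 and p = 0.8; the natural logarithm is assumed for the log odds.

```python
from math import log

def odds(p):
    """Odds corresponding to probability p (0 <= p < 1)."""
    return p / (1 - p)

def log_odds(p):
    """Natural log of the odds; 0.0 when p = 0.5."""
    return log(odds(p))

def prob_from_odds(o):
    """Probability corresponding to odds o."""
    return o / (1 + o)

def odds_ratio(p1, p2):
    """Odds for group 1 divided by odds for group 2."""
    return odds(p1) / odds(p2)

print(odds(0.5), log_odds(0.5))          # even odds and zero log odds at 50/50
print(round(odds(0.8), 3))               # about 4, that is, 4 to 1
print(round(odds_ratio(0.8, 0.5), 3))    # about 4: odds are four times higher
print(prob_from_odds(4))                 # back to a probability of 0.8
```

An odds ratio of 1 means the two groups have the same odds, which is the value implied by statistical independence in a 2x2 table.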
Measures of association with ordinal data
• concordant observations C:
– in a pair, one is higher on both x and y
• discordant observations D:
– in a pair, one is higher on x and lower on y
• ties
– in a pair, same on x or same on y
• gamma (ignores ties)
$$ \gamma = \frac{C - D}{C + D} $$
• tau-b is a gamma that adjusts for “ties”
– gamma often increases with more collapsed tables
– τb and γ both have standard errors in computer output
– τb can be interpreted as a correlation coefficient
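Counting concordant and discordant pairs by hand is tedious, so here is a minimal Python sketch that does it for a contingency table. The 2x3 table is hypothetical, rows and columns are assumed to be listed from lowest to highest category, and the function name `concordant_discordant` is made up for this example.

```python
def concordant_discordant(table):
    """Count concordant (C) and discordant (D) pairs in an ordered contingency table."""
    c = d = 0
    n_rows, n_cols = len(table), len(table[0])
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # only cells in a higher row
                for l in range(n_cols):
                    if l > j:                   # higher on both variables
                        c += table[i][j] * table[k][l]
                    elif l < j:                 # higher on one, lower on the other
                        d += table[i][j] * table[k][l]
    return c, d

# Hypothetical counts; categories are ordered low to high in both directions.
table = [[20, 10, 5],
         [5, 10, 20]]

c, d = concordant_discordant(table)
gamma = (c - d) / (c + d)
print(c, d, round(gamma, 3))
```

Pairs that share a row or a column are ties and are skipped, which is why gamma "ignores ties"; tau-b brings those tied pairs back into the denominator.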