The New SAT Facts, November 2006, Wayne Camara & Amy Schmidt


The New SAT Facts

November 2006

Wayne Camara & Amy Schmidt


Executive Summary

• Purpose of Briefing: To provide an overview of recent research conducted on the new SAT.

• Research and Analysis designed to meet three demands:

1. Provide baseline data concerning score use and other characteristics of the new SAT compared to old SAT

2. Respond to questions from stakeholders concerning the new SAT

3. Develop a base of new knowledge on the Writing test and impact of changes to the SAT

Research to Date on the New SAT

• Score Change for PSAT/NMSQT Test Takers

• Construct Comparability and Continuity in the SAT

• Essay Reliability

• Effect of New SAT Length on Performance (Fatigue)

• Consequences of adding writing to K-12 instruction

• Impact of Taking Advanced Math Courses on New SAT Math Items

• Standardized Differences on Ethnic Subgroups

• Relationship Between Essay Features and Essay Scores

• Effects of Short-Term Coaching on Writing Test performance

• Discrepant Scores between CR and Writing

Score Change for PSAT/NMSQT (P/N) Test Takers (Oh, Wright, & Zanna, 2005)

• Analyzed score changes and repeater patterns for P/N; results used to develop table of expected SAT score ranges for P/N Score Report Plus.

• Based on test-takers who took both 2003 and 2004 P/N, and those who took both 2004 P/N and spring 2005 (March, May or June) SAT

P/N Score Change Study Highlights

• On average, 2003 sophomores repeating the P/N as juniors improved their reading score by 3.3 points, their math score by about 4.4 points, and their writing score by about 4.1 points.

• On average, 2004 juniors taking the P/N received junior-year SAT scores that were 2.5 points higher in reading, 1.9 points higher in math, and 1.4 and 1.3 points higher in writing (MC and composite, respectively).

• The correlations between the old (2003) and new (2004) P/N scores ranged from .82 to .86 for the three subtests.

• The correlations between the new P/N and the new SAT ranged between .81 and .87.

P/N Score Change

[Bar chart: mean score change in CR, Math, and Writing. P/N sophomore to junior: 3.3, 4.4, 4.1. P/N to SAT: 2.5, 1.9, 1.3.]

Construct Comparability and Continuity in the SAT (Oh & Sathy, 2006)

• Study assessed whether the changes to the SAT had an impact on the constructs measured by the test.

• Results are based on factor analysis of data from a sample of students taking both the previous version and new version of the SAT during the 2003 field trial.

Highlights of Results from Construct Comparability Study

Critical Reading

• Exploratory factor analysis revealed at least 2 distinct factors, one comprising sentence completion and analogy items and one comprising critical reading and passage-based reading items.

• This finding suggests that construct continuity for the sentence completion and passage-based reading/critical reading item types is maintained in the new SAT.

• A 2-factor model without analogy items provided the best fit to the data.

Highlights of Results from Construct Comparability Study, Continued

Math

• Results suggested that the new math test is essentially unidimensional, as was the previous version.

• Tests of dimensionality revealed a small yet statistically reliable secondary factor related to geometry items in both the old and new SAT.

Score Equity Assessment of Transition from SAT I to the new SAT (Dorans, Cahn, Jiang, & Liu, 2006)

• Study assessed whether the changes in the College-Bound Senior 2006 means were due to population shifts or to changes to the SAT.

• Using operational data from the first year of new SAT administration, Score Equity Assessment was used to estimate what the subgroup means would have been had the SAT not changed.

Highlights of Results from Score Equity Assessment Study, Continued

• Linkages between the new SAT and the old were examined for population invariance across gender groups.

• Results suggested that the equating functions were invariant across gender groups, providing support for the comparability of scores from the old SAT to the new SAT.

Essay Reliability within the SAT Reasoning Test (Allspach & Walker, 2005)

• Study designed to estimate various forms of reliability associated with the SAT essay.

• 3,776 juniors from 35 high schools participated in the study.

• Four different essay prompts used. Students wrote on two different essay prompts at two different times, about 2 weeks apart.

• Essays were read by raters trained in the same way as raters for operational SAT essay readings.


Essay Reliability Study

Types of reliability estimates:

• Single-rater (inter-rater) reliability – Correlation between observed scores from 2 raters scoring the same essay. Represents consistency of any given rater in scoring an essay.

• Double-rater reliability – Correlation between total essay scores from two pairs of raters scoring the same essay. Represents consistency in the scoring method itself when 2 raters are used.

• Observed essay reliability – Correlation between examinees’ total scores on 2 different essays. Represents the proportion of true-score (writing ability) variance in the essay score.

Highlights of Results from Essay Reliability Study

• The average single-rater reliability coefficient across the 4 prompts was approximately .79.

• The average double-rater reliability was about .88.

• The average observed essay reliability was about .67.

• 70% of scores were between 6 and 8; 80% were between 6 and 9.

• Reader agreement:

  • 56% exact

  • 96.5% within +/- 1 pt

  • 3.5% differed by 2 pts or more (go to third reader)
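As a rough consistency check on the single-rater and double-rater figures above, the Spearman-Brown formula predicts the reliability of a score combined over k raters from the single-rater reliability. The sketch below assumes the two raters behave like parallel measurements; the slides do not say the study derived its double-rater estimate this way, and the function name is illustrative only.

```python
def spearman_brown(r_single: float, k: float = 2.0) -> float:
    """Predicted reliability when k parallel ratings are combined."""
    return k * r_single / (1.0 + (k - 1.0) * r_single)


# Single-rater reliability reported on the slide (about .79).
predicted_double = spearman_brown(0.79, k=2)
print(f"Predicted double-rater reliability: {predicted_double:.2f}")
```

Under that assumption the prediction rounds to .88, in line with the reported double-rater value.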

Investigating the Effect of New SAT Test Length on the Performance of Regular SAT Examinees (Wang, 2006)

• Using the data from the March 2005 SAT administration, a recent study examined test-taker performance on eight SAT sections which were presented to examinees in different orders and in different positions.

• The study looked at the average percent of items answered correctly and the average number of items omitted for different sections of the test.


If the increased length of the SAT caused test-taker fatigue, we would expect:

• The percent of items answered correctly to decrease for the later sections of the test, when the students would be feeling fatigue.  

• The students’ omit rates to increase for later sections of the test.

The average percent of items correct was consistent throughout the entire test:

Spiral     Frequency  Section order   1     2     3     4     5     6     7     8
Main       146,409    Section         M2    W1    R2    M1    R1    M3    R3    W2
                      Prop. correct   0.56  0.60  0.60  0.64  0.57  0.59  0.58  0.70
Scrambled  142,496    Section         R1    M2    W1    R2    M1    R3    M3    W2
                      Prop. correct   0.57  0.56  0.60  0.61  0.64  0.59  0.59  0.69

The results were similar for gender, racial/ethnic, and language groups, and for different levels of ability as measured by total SAT score.
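One way to read the table is to compare each section with itself across the two spirals, since the same section appears in a different position in each. The sketch below does that with the slide's figures; the variable and function names are illustrative, and this is not presented as the study's actual analysis.

```python
# Proportion correct by section, listed in administration order (from the slide).
main = dict(zip(["M2", "W1", "R2", "M1", "R1", "M3", "R3", "W2"],
                [0.56, 0.60, 0.60, 0.64, 0.57, 0.59, 0.58, 0.70]))
scrambled = dict(zip(["R1", "M2", "W1", "R2", "M1", "R3", "M3", "W2"],
                     [0.57, 0.56, 0.60, 0.61, 0.64, 0.59, 0.59, 0.69]))


def position(order: dict, section: str) -> int:
    """1-based position of a section within a spiral (dicts keep insertion order)."""
    return list(order).index(section) + 1


rows = []
for section in main:
    diff = round(scrambled[section] - main[section], 2)
    rows.append((section, position(main, section), position(scrambled, section), diff))

for section, pos_main, pos_scr, diff in rows:
    print(f"{section}: position {pos_main} (main) vs {pos_scr} (scrambled), "
          f"difference {diff:+.2f}")

max_abs_diff = max(abs(diff) for _, _, _, diff in rows)
print("Largest absolute difference:", max_abs_diff)
```

No section differs by more than .01 between spirals, whichever position it occupies.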


The average omit rate was NOT higher at the end of the test:

Spiral     Frequency  Section order   1     2     3     4     5     6     7     8
Main       146,409    Section         M2    W1    R2    M1    R1    M3    R3    W2
                      Omit rate       0.16  0.05  0.09  0.12  0.08  0.14  0.09  0.03
Scrambled  142,496    Section         R1    M2    W1    R2    M1    R3    M3    W2
                      Omit rate       0.08  0.16  0.05  0.09  0.11  0.09  0.14  0.04

The average omit rate for the last 6 items was also NOT higher at the end of the test:

Spiral     Frequency  Section order   1     2     3     4     5     6     7     8
Main       146,409    Section         M2    W1    R2    M1    R1    M3    R3    W2
                      Omit rate       0.10  0.02  0.05  0.08  0.03  0.10  0.05  0.03
Scrambled  142,496    Section         R1    M2    W1    R2    M1    R3    M3    W2
                      Omit rate       0.03  0.10  0.02  0.04  0.08  0.04  0.10  0.03

Summary of Fatigue Study Findings:

• Study conducted on the March 05 SAT and replicated on the Oct 06 administration. Results were also compared to SAT I, and no changes were detected.

• On average, students got the same percent of items correct on later sections of the test as on earlier sections.

• On average, students did not omit a larger number of items on later sections of the test.

• These findings provide evidence that any fatigue that students may have felt did not impair their performance in any way.


The Impact of Taking Advanced Math Courses on Performance on the New SAT Math Items (Deng & Kobrin, 2006)

• Evaluated whether taking more advanced math courses in high school gives students an advantage on the new SAT items testing Algebra II content.

• Study analyzed new SAT field trial data.

• Computed standardized mean differences in average item performance on the old and new content across groups of students with various course-taking patterns.

• Ran DIF (differential item functioning) analyses to explore whether items functioned similarly for students of equal ability with different course-taking patterns.

Math Course-taking Study: Summary of Results

• Students who took one or more advanced courses scored higher than those who did not take any advanced course or just planned to do so.

• Students who planned to take one or more advanced courses scored higher than those who did not plan to take any advanced course.

• Items measuring the new content were more sensitive to the effects of taking advanced math courses than items that measure the old content.

• Several sub-content areas within Algebra II and Geometry had a large percentage of items showing DIF.


The Relationship Between Essay Features and Essay Scores (Kobrin, Deng, & Shaw)

This study investigated:

• the relationship between several features of SAT essay responses and essay scores.

• whether essay scores are predictable from features of the prompt.

• subgroup differences (racial/ethnic, gender, and language) in the frequency of essay response features and their correlation with essay scores.


Essay Research Study—Phase I

• Phase I focused on essay length and scores

• 2,820 essays were sampled from 6 different SAT forms (both east & west coast prompts) that were administered in March, May, & June of ’05.

• Examined the relationship between essay score and:

  • number of words

  • number of paragraphs

  • whether students reached the 2nd page

  • whether students wrote in first-person (used the pronoun “I”).


Phase I Results: Correlation of Length with Essay and SAT-W Scores (Kobrin, Deng & Shaw, Under Review)

Variable               Mean (Min, Max)  SD     Correlation with Essay Score  Correlation with SAT-W Score
N Words                290 (1, 619)     81.5   .62                           .42
N Paragraphs           3.7 (1, 9)       1.2    .34                           .23
Essay Score            7.1              1.7    ---                           .73
SAT-W Composite Score  487.5            111.7  .73                           ---

The range of correlations with essay scores across the six prompts was .57 to .68 for number of words and .27 to .38 for number of paragraphs.
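To put the correlations with essay score on a more intuitive scale, a squared correlation can be read as the share of variance two measures have in common under a linear relationship. The sketch below applies that standard interpretation to the figures on the slide; it is a reading aid, not a result reported by the authors, and the variable names are illustrative.

```python
# Correlations with essay score, as reported on the slide.
correlations_with_essay_score = {
    "N Words": 0.62,
    "N Paragraphs": 0.34,
    "SAT-W Composite Score": 0.73,
}

# Squared correlation = proportion of variance shared (linear relationship assumed).
shared_variance = {
    name: round(r ** 2, 2) for name, r in correlations_with_essay_score.items()
}

for name, share in shared_variance.items():
    print(f"{name}: about {share:.0%} of essay-score variance shared")
```

On this reading, essay length in words shares a little over a third of its variance with essay score, and number of paragraphs considerably less.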


More Phase I Results

Reaching the Second Page

• Students who reached the second page scored about 2 pts higher than those who did not. After controlling for number of words, this was reduced to less than one pt (.7).

Using First-Person

• About 50% of students used first-person. The mean score for students using first-person was 6.9 compared to 7.3 for students not using first-person.

• There was substantial variation across prompts in the use of first-person responses. Some prompts appeared more conducive to a first-person response than others, but the voice used appeared to have very little impact on essay score.


Effects of Short-Term Coaching on Standardized Writing Tests (Hardison & Sackett, 2006)

• Can coaching increase scores on the SAT essay?

• Does that coaching increase scores only on the specific essay, or does it also increase the test-taker’s actual writing ability that the test is intended to measure?

Methods for Short-Term Coaching Study

• Six Ph.D. students were hired to develop coaching strategies for a training program, similar to those offered by test-prep companies.

• 50 first-year college students participated in a 9-hour training program (training group); 49 students did not receive training (control group).

• Both groups completed pretest and posttest essays from CLEP.

• Participants also completed two additional essays developed to mimic writing tasks that a student might encounter in a college setting.

Results of Short-Term Coaching Study

• After controlling for ability (using ACT scores), students receiving training did indeed score significantly higher on the essay.

• Coaching was particularly effective for those with lower writing performance, but actually led to a decrease in scores for high-performers.

• Coaching also produced significant improvement in performance on the generalizability tests when compared to the control group.

• Results suggest that SAT essays may be susceptible to coaching, but score inflation may reflect at least some improvement in overall writing ability.


Students with discrepant CR and W scores

• Correlation between SAT CR and W is about .84.

• 100,000 students had a significant discrepancy between scores.

• Of these, 50% had a CR score 1 SD greater than W (63% male); 50% had a W score 1 SD greater than CR (63% female).

• No significant difference among students in HSGPA.

• Results by ethnicity and best language not significant:

  • Whites > CR; Asians > Writing

  • English Speakers > CR; ELL > Writing


Update on New SAT Scores: College Bound Seniors 2006


What comes next? Research planned or in progress…

• Impact of SES on SAT & College Success

• Validity Study of the SAT Reasoning Test

• Consequential Validity of SAT Writing

• New SAT/ACT Concordance

• Evaluating Formula Scoring vs Right Scoring

• Placement validity of Math and Writing tests

Comparison of 2006 College-Bound Seniors with Previous Cohorts

Sub-group (%)                        2004  2005  2006
Gender
  Male                               47    47    46
  Female                             53    53    54
Race/Ethnicity
  No Response                        19    10    9
  American Indian or Alaskan Native  1     1     1
  Asian or Pacific Islander          8     9     9
  Black                              10    10    10
  Mexican or Mexican American        4     5     4
  Puerto Rican                       1     1     1
  Other Hispanic or Latino           3     4     5
  White                              51    56    56
  Other                              3     4     4
Best Language
  No Response                        16    7     6
  English                            74    82    83
  English and Another                7     8     8
  Another Language                   2     3     3

2004, 2005, & 2006 College-Bound Seniors

                                  2004    2005    2006    ’04 to ’06 Changes  ’05 to ’06 Changes
Highest Verbal                    515.2   516.2   511.8   -3.4                -4.4
Highest Math                      525.7   527.2   525.8   0.1                 -1.4
Highest Composite                 1040.9  1043.4  1037.6  -3.3                -5.8
Latest Verbal                     507.9   508.4   503.5   -4.4                -4.9
Latest Math                       518.0   519.8   517.8   -0.2                -2.0
Latest Composite                  1025.9  1028.2  1021.3  -4.6                -6.9
Highest Composite (single admin)  1035.4  1037.8  1031.9  -3.5                -5.9

Major Changes in College-Bound Seniors Cohort Scores

• 1973 -10 pts (7V, 3M)

• 1975 -16 pts (9V, 7M)

• 1985 +8 pts (5V, 3M)

• 1990 -5 pts (4V, 1M)

• 1995 +7 pts (5V, 2M)

• 2003 +6 pts (3V, 3M)

• Math has not dropped 2 pts in 1 year since 1978

• The last time Verbal dropped more than 1 pt was:

  • 2002 -2 pts

  • 1990 -4 pts

Retesting Patterns and Scores

                      2004     2005     2006     04-06 Diff  05-06 Diff
1-time test takers
  N                   636,655  645,629  682,005  45,350      36,376
  %                   44.9     43.8     46.5     1.6         2.7
  Mean Verbal         485.3    485.4    480.3    -5.0        -5.1
  Mean Math           490.3    490.8    487.6    -2.7        -3.2
2-time test-takers
  N                   542,589  563,028  545,173  2,584       -17,855
  %                   38.2     38.2     37.2     -1.0        -1.0
  Mean Verbal         526.4    527.1    521.0    -5.4        -6.1
  Mean Math           535.8    537.9    535.6    -0.2        -2.3
3-time test-takers
  N                   195,215  216,883  187,194  -8,021      -29,689
  %                   13.8     14.7     12.8     -1.0        -1.9
  Mean Verbal         528.0    527.4    531.1    3.1         3.7
  Mean Math           551.4    551.9    561.5    10.1        9.6

Overall Retesting Changes on SAT 2004-06

                04         05         06         05-04            06-05            06-04
Total Students  1,429,007  1,475,623  1,464,744  46,616 (3.3%)    -10,879 (-.7%)   35,737 (2.5%)
Total Tests*    2,492,683  2,630,388  2,547,367  +137,705 (5.5%)  -83,021 (-3.2%)  +54,684 (2.2%)
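The difference and percentage columns in this table can be reproduced from the yearly totals. The sketch below assumes each percentage is the change divided by the earlier year's total, which is consistent with every figure shown; the helper name `change` is illustrative.

```python
# Yearly totals as shown on the slide.
totals = {
    "Total Students": {2004: 1_429_007, 2005: 1_475_623, 2006: 1_464_744},
    "Total Tests": {2004: 2_492_683, 2005: 2_630_388, 2006: 2_547_367},
}


def change(series: dict, start: int, end: int) -> tuple:
    """Absolute change and percent change relative to the starting year."""
    diff = series[end] - series[start]
    pct = round(100.0 * diff / series[start], 1)
    return diff, pct


for label, series in totals.items():
    for start, end in [(2004, 2005), (2005, 2006), (2004, 2006)]:
        diff, pct = change(series, start, end)
        print(f"{label} {start} to {end}: {diff:+,} ({pct:+.1f}%)")
```

Tests fell by a larger percentage than students from 2005 to 2006, which fits the drop in repeat testing reported for the 2006 cohort.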

CB Srs Score Changes and Subgroups

• CR -5 (males -8, females -3) – largest drop since 1994

• Math -2 (males -2, females -2)

• Underrepresented minorities show overall gains:

• Income < $20k CR +2, M +1

• Non English Speaking CR +5, M +2

• Private school CR -11, M -4

• Non-Response rate (no change in %) but large decrease in scores

• Score gaps decrease

• In CR among all ethnic minorities except Other Hispanic (no change)

• In Math for Asian, Black, and Puerto Rican subgroups

• Score decline evident in first SAT taken

• First SAT in 05 (CR 498, M 507.9); in 06 (CR 494.8, M 508.5); 06-05 (CR -3.2, M +.6)

• No difference in age at testing among first-time test takers (mean 17.2 yrs)

• HSGPA increases in 06

• .03 increase in GPA from 05 to 06; mean is 3.33 (with 43% of students having a HSGPA > A-)

Ethnic Differences Slightly Reduced in CR (Effect Sizes)

              Critical Reading      Writing
Group         2004   2005   2006    2006
Asian         -.01   .03    .06     .14
Black         -.70   -.66   -.61    -.63
Mexican Am.   -.51   -.49   -.43    -.41
Puerto Rican  -.46   -.42   -.39    -.45
Latin Am.     -.42   -.40   -.40    -.43
White         .18    .21    .21     .20
Females       -.07   -.07   -.03    .10
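The slides do not state how the effect sizes were computed or which reference group was used. Assuming they are standardized mean differences, so that a value closer to zero means a smaller gap, the 2004 to 2006 change in Critical Reading for the groups with negative effect sizes can be tallied as in this sketch (names are illustrative).

```python
# Critical Reading effect sizes (2004, 2006) for groups with negative values.
critical_reading = {
    "Black": (-0.70, -0.61),
    "Mexican Am.": (-0.51, -0.43),
    "Puerto Rican": (-0.46, -0.39),
    "Latin Am.": (-0.42, -0.40),
}

# Positive values mean the gap moved toward zero between 2004 and 2006.
gap_reduction = {
    group: round(abs(es_2004) - abs(es_2006), 2)
    for group, (es_2004, es_2006) in critical_reading.items()
}

for group, reduction in gap_reduction.items():
    print(f"{group}: gap reduced by {reduction:.2f} SD units")
```

Each reduction is under a tenth of a standard deviation, which matches the slide's description of the differences as slightly reduced.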

Ethnic Differences Slightly Reduced in Math (Effect Sizes)

Group         2004   2005   2006
Asian         .52    .52    .52
Black         -.80   -.77   -.77
Mexican Am.   -.53   -.50   -.46
Puerto Rican  -.58   -.55   -.54
Latin Am.     -.46   -.44   -.48
White         .11    .14    .16
Females       -.32   -.30   -.30


Students who take a Core Curriculum or More Significantly outperform those taking less than a Core Curriculum

[Bar chart, Number of Students: Core Curriculum (or more) vs. Less than Core. Data labels: 909,049 and 267,970 (2005); 903,452 and 270,728 (2006).]

[Bar chart, SAT Scores: Core Curriculum (or more) vs. Less than Core. Data labels: SAT CR05 522 vs. 476; SAT CR06 519 vs. 470; SAT M05 530 vs. 488; SAT M06 531 vs. 483.]


Core vs Non-Core

• Core = 4 yrs of English, 3 yrs of Math (with Algebra), 3 yrs of Science, 3 yrs of Social Studies.

         2005                      2006                      06-05
         N (%)           CR   M    N (%)           CR   M    N (%)          CR   M
Core +   909,049 (77.3)  522  530  903,452 (77.0)  519  531  -5,597 (-0.3)  -3   +1
Core -   267,278 (22.7)  476  488  270,728 (23.0)  470  483  +3,450 (+0.3)  -6   -5
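The Core + and Core - means above imply a gap in each subject and year. The sketch below computes it directly from the slide's means; the dictionary layout and names are illustrative, and the gaps themselves are derived here rather than reported in the slides.

```python
# Mean scores from the slide: (Core +, Core -) for each subject and year.
mean_scores = {
    ("CR", 2005): (522, 476),
    ("CR", 2006): (519, 470),
    ("M", 2005): (530, 488),
    ("M", 2006): (531, 483),
}

# Gap = Core + mean minus Core - mean, in scale-score points.
gaps = {key: core - less for key, (core, less) in mean_scores.items()}

for (subject, year), gap in gaps.items():
    print(f"{subject} {year}: Core + exceeds Core - by {gap} points")
```

By this arithmetic the gap widened slightly in both subjects from 2005 to 2006, because Core - means fell more than Core + means.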