
1

Instructional Tools in Educational Measurement and Statistics (ITEMS) for School Personnel:

Rebecca Zwick, U.C. Santa Barbara

Measured Progress, August 2007

Development and Evaluation of Three Web-Based Training Modules

2

Overview of Presentation

1. What was the impetus for the project?

2. How is the project structured?

3. What’s in the modules, and how are statistical concepts presented?

4. How effective are the modules?

5. What have been the challenges and successes?

6. Clip from Module 3: “What’s the Difference?”

3

1. What was the impetus for the project?

4

In today’s NCLB era…

Teachers and administrators are expected to use test results to make decisions about instruction and resource allocation and to explain results to students, parents, the school board, and the press.

Many educators have not received the measurement and statistics training needed to use test scores productively.

5

Stiggins, Education Week, 2002:

“only a few states explicitly require competence in assessment as a condition for being licensed to teach. No licensing examination now in place … verifies competence in assessment … almost no states require competence in assessment for licensure as a principal or school administrator at any level.”

6

Evidence from Preliminary Assessment Literacy Survey (Brown & Daw, 2004)

Of 24 UCSB M.Ed./credential students, only:
- 10 could choose the correct definition of a z-score
- 10 could choose the definition of measurement error

Of 10 experienced teachers/administrators, only:
- 5 could choose the correct combined average when told “20 students averaged 90 on an exam and 30 students averaged 40” (worked out below)
- 1 could choose the definition of measurement error
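For readers who want the arithmetic, a minimal sketch of the combined-average item: the correct value is the enrollment-weighted average of the two group means, not the midpoint of 90 and 40.

```python
# Combined average from the survey item: weight each group mean by its size.
n1, mean1 = 20, 90
n2, mean2 = 30, 40
combined = (n1 * mean1 + n2 * mean2) / (n1 + n2)
print(combined)  # (1800 + 1200) / 50 = 60.0, not (90 + 40) / 2 = 65
```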

7

Goal of ITEMS

Create three 25-minute Web-based modules to increase the “assessment literacy” of K-12 educators by teaching basic concepts in educational measurement and statistics, as applied to test score interpretation.

Assess the effectiveness of the modules

Funded by the National Science Foundation, 2004-2008

8

2. How is the project structured?

9

Who works on the project?

Staff:
- Rebecca Zwick, Project Director
- Jeff Sklar (Statistics Dept., Cal Poly, San Luis Obispo), Senior Researcher
- Alex Norman (Media Arts & Technology, UCSB), Technical Specialist
- Cris Hamilton, Independent Animator/Designer
- Pamela Yeagley (Education, UCSB), Project Evaluator
- Liz Alix (Education, UCSB), Project Administrator

10

Advisory Committee
- Kevin Almeroth, Computer Science, UCSB
- Beth Chance, Statistics Department, Cal Poly
- Willis Copeland, Education, UCSB
- Raya Feldman, Statistics, UCSB
- Mary Hegarty, Psychology, UCSB
- Richard Mayer, Psychology, UCSB
- Tine Sloan, Acting Director, Teacher Ed, UCSB
- 4 administrators & 2 teachers (local districts)

11

Work cycle: Develop and evaluate one module per year:

Fall: Develop module

Winter/spring: Collect data on module effectiveness

Summer: Analyze data; post module on our Website with supplementary materials; distribute CDs/DVDs.

Modules 1 & 2 are posted; Module 3 will be posted soon.

12

Module Administration and Evaluation

On the Website, participants view a module and take an assessment literacy quiz tailored to its content.

Participants are randomly assigned to take quiz either before or after viewing module.

Hypothesis: mean score for Module-first (treatment) group will be higher than mean for Quiz-first (control) group.

Participants get $15 Borders (electronic) gift “card” and can print out a personalized completion certificate.
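As a side note, the assignment mechanism on this slide amounts to a coin flip per participant; a minimal sketch (my own illustration, not the project's actual system):

```python
# Randomly assign each participant to take the quiz before or after the module.
import random

conditions = ["Module-first (treatment)", "Quiz-first (control)"]
for participant in ["P001", "P002", "P003", "P004"]:  # hypothetical IDs
    print(participant, "->", random.choice(conditions))
```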

13

SAMPLE QUIZ ITEM

Ms. Tran read in the paper that at Clearview School, the number of proficient students increased by only one student since last year, while at Hollyhock School, the number increased by 83. Yet, according to the article, the percentage of proficient students increased more at Clearview than at Hollyhock. What is the most likely explanation for this situation?

A. Hollyhock is a larger school than Clearview.
B. Clearview is a larger school than Hollyhock.
C. Clearview is using a stricter definition of proficiency than Hollyhock.
D. Hollyhock is using a stricter definition of proficiency than Clearview.
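A quick numeric illustration, with invented enrollments (the item supplies none), of the reasoning the item tests: the same count change is a much larger percentage-point change at a small school, which is why the relative sizes of the schools are the key.

```python
# Hypothetical enrollments -- not given in the item itself.
clearview_n, hollyhock_n = 20, 2000

clearview_gain_pp = 1 / clearview_n * 100    # +1 student  -> +5.00 points
hollyhock_gain_pp = 83 / hollyhock_n * 100   # +83 students -> +4.15 points

print(f"Clearview: +{clearview_gain_pp:.2f} percentage points proficient")
print(f"Hollyhock: +{hollyhock_gain_pp:.2f} percentage points proficient")
```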

14

Later phases of data collection:

One-month follow-up: Participants take quiz again to check retention (another Borders card)

Participants respond to Web-based project evaluation survey asking their opinions on the module (no gift card!)

15

3. What’s in the modules?

How are statistical concepts presented?

16

Module Content

Module 1 (2005): “What’s the Score?”
- Test score distributions and their properties, types of test scores, score interpretations

Module 2 (2006): “What Test Scores Do and Don’t Tell Us”
- Measurement error and sampling error; imprecision in individual and average test scores

Module 3 (2007): “What’s the Difference?”
- Interpretation of test score trends and group differences; data aggregation issues

17

Modules use cognitive psychology principles to enhance learning

Multimedia: Present concepts using both words and pictures (see Mayer, Multimedia Learning, 2001)

Prior knowledge: Use words and pictures that invoke participants’ prior knowledge (Narayanan & Hegarty, 2002); use analogies, metaphors (English, 1997)

Use conversational (informal) style

18

“Embedded questions” (Modules 2 and 3)

Each module segment includes a question designed to allow participants to check their understanding of the material.

If their answer is incorrect, they’re encouraged to go back and view the segment again.

Found helpful by nearly all participants (Year 3)

Example is in upcoming clip.

19

Goals for Presentation of Technical Concepts

Clear and accurate, but without formulas or jargon

Based on realistic examples; no abstractions.

Engaging; not just “talking heads”

Decision: Use animated characters

20

EXAMPLES

21

Module 1: How to explain “distribution” of test scores?

Show test papers being tossed into bins, gradually forming a distribution.

Then discuss mean, median, SD, skewness of distribution.
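A minimal sketch of the statistics this segment introduces, using made-up scores (not the module's data): the “bins” correspond to histogram bins, and the summary measures are the mean, median, SD, and skewness discussed on the slide.

```python
# Summarize a set of test scores and form the "bins" of a distribution.
import numpy as np
from scipy import stats

scores = np.array([52, 61, 64, 68, 70, 71, 73, 75, 75, 78,
                   80, 81, 83, 84, 85, 87, 88, 90, 93, 97])

print("mean:    ", scores.mean())
print("median:  ", np.median(scores))
print("SD:      ", scores.std(ddof=1))
print("skewness:", stats.skew(scores))

# Tossing papers into bins = building a histogram.
counts, bin_edges = np.histogram(scores, bins=5)
print("bin edges:     ", bin_edges)
print("counts per bin:", counts)
```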

22

Module 1: Test Score Distribution

23

Module 1: Test Score Distribution

24

Module 2: How to convey the idea of measurement error?

“Multiple Edgars”: A child takes a test repeatedly. His brain is magically purged of his memory of the test in between administrations.

For various reasons, he gets different scores each time.
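A minimal simulation of the “Multiple Edgars” idea, assuming the classical observed = true + error model (my framing, not necessarily the module's wording):

```python
# Same child, same true ability, different observed scores each time.
import random

random.seed(1)
true_score = 75                 # Edgar's (unobservable) true score

for i in range(1, 6):
    error = random.gauss(0, 4)  # luck, mood, item sampling, etc.
    observed = true_score + error
    print(f"administration {i}: observed score = {observed:.1f}")
```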

25

Module 2: Measurement Error

26

Module 2: Measurement Error

27

Module 3: How to explain data aggregation complexities and paradoxes?

No abstractions! Use realistic and specific examples:

Performance could increase for every student group while overall school performance decreases (Simpson’s paradox / amalgamation paradox) …
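A minimal numeric sketch of this paradox, with invented numbers (not the module's example): proficiency rises in both groups, yet the school-wide rate falls because enrollment shifts toward the lower-scoring group.

```python
# Each group: (number of students, proportion proficient).
year1 = {"A": (80, 0.80), "B": (20, 0.30)}
year2 = {"A": (40, 0.85), "B": (60, 0.35)}  # both groups improve by 5 points

def overall(groups):
    total = sum(n for n, _ in groups.values())
    proficient = sum(n * p for n, p in groups.values())
    return proficient / total

print("Year 1 overall:", overall(year1))  # 0.70
print("Year 2 overall:", overall(year2))  # 0.55 -- lower, despite gains in A and B
```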

28

Simpson’s Paradox Example

29

Module 3: How to explain sampling error (of a change in test score averages)?

Especially complex in the case of NCLB-type testing.

Models based on random sampling are not only hard to explain, but don’t apply!

Solution: Show that the change in test score averages is more “sensitive” to extreme values when N is small.
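A minimal sketch of this “sensitivity” point, with invented scores: one extreme student moves a small school's average far more than a large school's.

```python
# One outlier shifts a small-N average much more than a large-N average.
def mean(xs):
    return sum(xs) / len(xs)

small = [70] * 9     # 9 students scoring 70
large = [70] * 199   # 199 students scoring 70

# Add one student with an extreme score of 20 to each school.
print("small school: mean shifts from", mean(small), "to", round(mean(small + [20]), 2))
print("large school: mean shifts from", mean(large), "to", round(mean(large + [20]), 2))
# small: 70 -> 65.0 ; large: 70 -> 69.75
```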

30

Later…

A clip from Module 3

Module 3 includes upgrades: professional animator, professional actors, sound studio.

31

4. How effective are the modules?

- Quiz Results
- Program Evaluation Results
- Informal Emails

32

Quiz Results for Module 1 Evaluation (N = 113):
Average Number of Correct Responses (out of 20 items)

Sub-Group         Module-1st: Mean (SD), N     Quiz-1st: Mean (SD), N     p-value (1-sided)
Teacher Ed        13.1 (4.0), N = 33           11.7 (3.5), N = 35         .059
School District   13.4 (3.2), N = 19           12.5 (3.2), N = 26         .198

33

Quiz Results for Module 2 Evaluation (N = 104):
Average Number of Correct Responses (out of 16 items)

Sub-Group         Module-1st: Mean (SD), N     Quiz-1st: Mean (SD), N     t-test p-value (1-sided)
Teacher Ed        12.6 (3.2), N = 40           9.5 (3.7), N = 41          .000
School District   12.7 (1.9), N = 11           12.5 (1.4), N = 12         .375
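For readers who want to check the tables, a minimal sketch of how one-sided p-values like these can be computed from the reported summary statistics. It assumes a pooled (equal-variance) two-sample t-test, which may differ from the analysis actually used, so values should only approximately match the slides.

```python
# Reproduce (approximately) the Module 2 Teacher Ed comparison from summary
# statistics alone. Requires a recent SciPy (the "alternative" keyword).
from scipy import stats

res = stats.ttest_ind_from_stats(
    mean1=12.6, std1=3.2, nobs1=40,  # Module-first group
    mean2=9.5,  std2=3.7, nobs2=41,  # Quiz-first group
    equal_var=True,
    alternative="greater",           # H1: Module-first mean is higher
)
print(f"t = {res.statistic:.2f}, one-sided p = {res.pvalue:.4f}")
# p is about .0001, i.e. .000 to three decimals, as in the table.
```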

34

Module 3 quiz results

Major recruitment problems: N = 23.

Module-first and quiz-first groups both scored an average of 10.4 on a 14-item quiz.

Possible reason: Only 4 of the 23 participants were teacher ed students.

Supplementary data analysis may occur (CSU Fresno teacher ed students).

35

One-month follow-up

Quiz results tended to be the same or better at one-month follow-up

However, follow-up samples are small (N= 11, 38, and 10 for the three years) and are not a random subgroup of initial participants

36

Conclusion on quiz outcomes:

Modules are probably most effective for those who are new to the classroom.

We hope to encourage their use in teacher education programs and in in-service training programs for new teachers.

37

Formal “independent” program evaluation

Year 1: phone interviews and paper surveys on presentation, content, impact

Years 2 and 3: Web-based surveys

Responses to the surveys above were positive, but participation rates were only 10-12%.

38

Formal program evaluation (continued)

Comments entered in boxes during participation were mixed:
- Some negative comments on navigational features (later improved) and on animation
- Comments on content and utility were favorable

39

Sample of Email Comments Received

“Very helpful and right to the point. If I were a building principal or a department chair today all of the staff would go through this until everyone really understood it.”

“I am inclined to recommend [this] as required viewing for all new hires in our K-12 district, and it certainly will be recommended … for inclusion in professional development on assessment literacy.”

“I will be sharing [this] with my Assistant Superintendent with the hope of promoting it as a part of our new teacher induction process.”

40

5. Project Challenges and Successes

41

The big challenge: publicity and recruitment

Despite:
- Ads in two educational magazines
- Personal contacts with school districts
- District participation on advisory committee
- Contacts with professional organizations
- Contacts with California State Dept. of Education and other state organizations
- Dean’s letter to 100+ superintendents
- Website and blog postings

42

Successes

The automated system has facilitated module administration and evaluation; module quality has improved.

Quiz results show Modules 1 and 2 were effective, mainly for teacher education students.

Participant comments indicated that modules were found useful by many.

43

The future… “Repackaging project?”

Redo modules with superior production values, as in Module 3: professional animation, professional actors, sound studio

Unify “look and feel” across the modules

Work on mechanisms for disseminating as a package

44

MORE INFORMATION?

See http://items.education.ucsb.edu

See Zwick, Sklar, Wakefield, & Folsom, Educational Measurement: Issues and Practice, in press.

Email us at: [email protected] OR [email protected]

45

Disclaimer

Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

46

Clip from Module 3: “What’s the Difference?”

Topic: How the number of students affects the interpretation of score trends

Context: Press conference 2 reporters ask questions about a recent test

score release. Superintendent Florence and 2 teachers–Stan,

and Norma–respond.