
How to Develop Performance Assessments

by Anthony Petrosky and Vivian Mihalakis

The benefit of using formative performance assessments in the classroom instead of multiple-choice, fill-in-the-blank, or short-answer questions is their ability to capture authentic samples of students' work that make thinking and reasoning visible. Multiple-choice and short-answer questions simply cannot give us clear windows into students' thinking and reasoning, no matter how their results are reported.

Our discussion focuses on the use of performance assessments for formative, diagnostic purposes, not for summative high-stakes decision making. Performance assessments can certainly be used for high-stakes decisions, but the procedures for developing such summative assessments are much more standardized than those for developing and using assessments formatively.

Our discussion focuses on two types of performance assessments. One is the use of portfolios to capture students' work. The other is the use of what we'll refer to as “drop-in” assessments. Drop-in assessments mirror the conceptual learning and the tasks that students have been engaged with in a curriculum, but they are administered with materials that students haven't yet used, read, or seen.

Portfolios generally fall into two types. There are curated collections of work samples, collected over time, that invite students to put their best work forward, and there are all-inclusive collections, also gathered over time, that invite students to include such things as drafts, revisions, false starts, notes, and finished work samples.

The Two Types of Portfolios

Curated portfolios allow students to assess their own work, so that within predetermined criteria, they can build a case for what they know and can do through work samples that they think best represent their work. Curated portfolios can be built individually by students working alone against a set of criteria, or they can be built collaboratively with two or three students assessing each other's work and making collaborative decisions about what best represents each person's work over time.

All-inclusive portfolios allow teachers to see students' work samples with greater depth and range. The window is bigger and deeper because it includes views into work from various sorts of samples that include finished pieces, drafts, notes, sketches, annotations, revisions, and so on. Like curated portfolios, all-inclusive collections are assessed against a set of criteria; and like curated portfolios, students can collaborate with others to self-assess their collections.


Students can present their self-assessments orally as podcasts or digitally recorded discussions, or they can compose written self-reports. Whatever form they take, self-assessments need to reference the work in the portfolio and make evidence-based cases for the attributes of the work. Self-assessments are valuable components of both curated and all-inclusive portfolios. They provide students an opportunity to be involved in their own formative assessment, to reflect on their growth, and to assess what they know and can do.

Drop-in Assessments

Drop-in performance assessments are generally used to assess individual students' work with the concepts and types of tasks they've been engaged with in a curriculum. If students, for instance, have been studying ratio and proportion word problems over a six-week period and writing out their reasoning in explanations of their work, then a drop-in performance assessment would give them a similar task on the same concepts, with new but familiar types of word problems, and ask for similar written explanations of their reasoning.
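To make that concrete, here is a hypothetical drop-in task of our own; the context and numbers are illustrative, not drawn from any particular curriculum.

Task: A fruit punch recipe uses 3 cups of apple juice for every 2 cups of cranberry juice. How many cups of cranberry juice are needed to go with 12 cups of apple juice? Explain your reasoning.

Expected reasoning: 12 cups of apple juice is 4 times the 3 cups in the ratio, so the cranberry juice must also be scaled by 4, giving 2 × 4 = 8 cups. A student might instead reason from the unit rate: 2 ÷ 3 = 2/3 cup of cranberry juice per cup of apple juice, and 12 × 2/3 = 8 cups. Either explanation makes the proportional reasoning visible in a way a multiple-choice item would not.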

Drop-in performance assessments can also be collected into portfolios, but they usually are not. Unlike work collected over time in portfolios, drop-in assessments are usually completed individually. It is certainly possible to invite students to collaborate on reviews of their drop-in assessments and to suggest comments that lead to revisions, but teachers generally develop and use drop-in assessments to gauge students' independent work on the concepts and skills in the taught curriculum.

Performance Assessments Require Good Tasks

The key to good performance assessments is in the tasks that are presented to students. If the tasks ask for low-level identification of correct answers, for example, or regurgitation of memorized or received responses, then we might as well stick to multiple-choice or short-answer tests. Performance assessments that present students with cognitively challenging tasks focused on important concepts almost always involve the students in evidence-based explanations of their thinking and reasoning.

Process and Tools to Develop Good Tasks

Whether using portfolios or drop-in assessments, teachers benefit from a clearly articulated process for creating cognitively challenging tasks. This process needs to be written and public so that the criteria for creating cognitively challenging tasks can be met and accounted for by all involved. Teachers also benefit from using a small set of tools to assess the cognitive challenge of the tasks that prompt students' work for portfolios, drop-in assessments, and instruction.


Teachers have to grow their own expertise in developing cognitively challenging performance tasks for their taught curriculum. When they do this, they are also developing their expertise in creating tasks for instruction. Teachers often benefit from working with knowledgeable consultants who understand both the curriculum content and the development of formative assessment tasks. Having a good set of tasks and scoring models that others have developed in the same discipline helps too. And it's important that teachers actually complete and score the performance tasks that they create; there is no better test of a task before field-testing it with students. Field-testing, in turn, helps teachers see how students understand the tasks and whether they get what they expect to get, even if only with a small group of students.

This type of continuous, engaged development relies on teachers' regular study of student work samples produced for the tasks. Taking a continuous improvement stance toward this development fuels cycles of revision of the tasks and the scoring rubrics based on the work samples. As the work samples change, the tasks and rubrics will need to change, and those changes will lead to deeper thinking on the part of the teachers. They will grow smarter about the assessments and their students' work. Continuous improvement benefits everyone these assessments touch, and the work of continuous improvement is good professional development that also educates teachers in the creation of cognitively demanding instructional tasks. Good formative assessment tasks are also good instructional tasks.

Developing Acceptable Tasks

But not all formative assessments are created equal, so teachers benefit from a simple set of procedures for judging tasks as acceptable. The two most important dimensions for making these judgments are content and depth, where depth means the cognitive challenge of the tasks. If a district, for example, works from a common set of standards such as the CCSS, then teachers developing formative assessment tasks can judge the content of the tasks against both the standards and the materials used for instruction. Do the tasks address the same standards? Do they do so with similar or common materials? If students have been studying ratio and proportion with word problems, for example, are all the tasks versions of ratio and proportion word problems?
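Where standards are tracked by code, even a trivial comparison can make the content check systematic. Here is a minimal sketch in Python; the standard codes are illustrative CCSS identifiers from the ratio and proportion cluster, and the variable names are our own, not part of any published procedure.

# Compare the standards a draft task claims to assess against
# the standards actually covered in the taught unit.
taught_standards = {"6.RP.A.1", "6.RP.A.2", "6.RP.A.3"}  # covered in the unit
task_standards = {"6.RP.A.3", "7.G.A.1"}                 # claimed by the draft task

outside_unit = task_standards - taught_standards   # content students haven't studied
untested = taught_standards - task_standards       # taught content the task skips

print("Claimed but not taught:", sorted(outside_unit) or "none")
print("Taught but not assessed:", sorted(untested) or "none")

A flag on either line doesn't make the task unacceptable, but it tells the developers exactly where the content judgment needs a closer look.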

Judging tasks as acceptable for their cognitive demand is trickier, because many teachers are unfamiliar with such higher-level tasks and haven't themselves been exposed to them in their own schooling. We have generations of students and teachers who have been weaned on a steady diet of No Child Left Behind (NCLB) low-level tasks and who have been tested with equally low-level NCLB multiple-choice and fill-in-the-blank short-response items.


To help support teachers in their understanding of low- and high-cognitive-demand tasks, we developed simple task analysis guides in mathematics and English/literacy. We've included versions of those below, along with charts with specific examples of low- and high-demand tasks. We've also included a condensed version of a criteria chart developed by our colleagues at SCALE at Stanford University to guide the development of performance assessment tasks. It makes use of nine criteria. The complete guide is available free under a Creative Commons license from SCALE.

Our abridged version of the SCALE guide comes first. After that, we present the Mathematical Task Analysis Guide with examples of low- and high-level tasks. Finally, we present the IFL English/Literacy Text-Based Task Analysis Guide along with examples of low- and high-level questions for the "Gettysburg Address."

Here's the abridged SCALE performance assessment development guide in list form; after the list, we sketch how its criteria can double as a simple review checklist.

1. What are the performance outcomes being assessed? (What should students know, understand, or demonstrate that you want to measure?)

2. What standards are aligned with these performance outcomes?

3. How will you set the context for the task and engage students in authentic and relevant ways? (What's the real-world or disciplinary context, audience, and purpose? How will you consider students' lived experiences, interests, and/or prior knowledge?)

4. What materials/resources will students encounter and use in this performance assessment? (What are the texts, media, data, sources of information?)

5. What specific question(s) and directions will be in your prompt? (What will your prompt say?)

6. What will students produce that will give you evidence of their performance? (What specific sources of evidence—student products—will you use to evaluate student performances? A clear product should be indicated.)

7. What is your scoring system? (What are the criteria for quality, e.g., a checklist or rubric, used to capture student achievement of the performance outcomes?)

8. What scaffolding strategies or mini-tasks will help students access and complete the performance assessment? (What mini-assignments will you use to help students do the thinking work and production that leads up to completing the task and helps them acquire key skills—e.g., graphic organizers, modeling, free-writes, annotated bibliographies, drafts, self- or peer-edits?)

9. How will you meet the needs of your diverse students? (What are the accommodations, language supports, reading supports?)
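For teachers who keep their task documentation electronically, these nine criteria can double as a simple review checklist. Here is a minimal sketch in Python; the criterion strings paraphrase the list above, and everything else, including the function name and the sample call, is our own illustration rather than part of the SCALE guide.

# Review a draft performance task against the nine SCALE criteria
# summarized above; print any criteria not yet addressed.
SCALE_CRITERIA = [
    "Performance outcomes to be assessed are identified",
    "Aligned standards are listed",
    "Context is authentic and engages students",
    "Materials and resources students will use are specified",
    "Prompt questions and directions are written out",
    "Student products that will serve as evidence are named",
    "Scoring system (checklist or rubric) is defined",
    "Scaffolding strategies or mini-tasks are planned",
    "Supports for diverse learners are included",
]

def review_task(answers):
    """answers: nine booleans, one per criterion, in order."""
    for criterion, met in zip(SCALE_CRITERIA, answers):
        if not met:
            print("Still to address:", criterion)

# Example: a draft that lacks a scoring system and scaffolding plan.
review_task([True, True, True, True, True, True, False, False, True])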

Here's the Mathematical Task Analysis Guide, followed by a chart with examples of low- and high-cognitive-demand tasks.

[The Mathematical Task Analysis Guide appears here as a chart in the original document.]

Here is a chart with examples of low- and high-cognitive-demand mathematics tasks, with the responses expected from students shown in italics to illustrate the reasoning.

[The chart of example mathematics tasks appears here in the original document.]
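To give a rough sense of the distinction such a chart draws, here is an illustration of our own rather than one of the IFL examples. A low-demand version of a proportion task might simply ask students to solve 3/4 = x/20 by cross-multiplying, a procedure that can be executed without understanding (x = 15). A high-demand version of the same content might ask: two paint mixtures use red and white paint in ratios of 3:4 and 5:7; which mixture is redder, and how do you know? Show your reasoning with a diagram, table, or unit rate. The second task requires students to choose a representation and justify a comparison rather than execute a memorized procedure.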

English Language Arts Task Analysis Guide and an Example

In English language arts performance assessment development, we use two tools: a chart to assess the complexity of the texts being used with the tasks and a chart to assess the cognitive challenge of the tasks. Both tools need to be used because complexity is determined by the cognitive demand of both the text and the task. It's possible, for instance, to have a text that isn't very demanding paired with a task that is, or to have a demanding text paired with a task that isn't. There are many examples of rubrics for assessing text complexity available on the internet (see, for example, http://www.nciea.org/publications/Updated%20toolkit-text%20complexity_KH12.pdf, http://www.reading.org/Libraries/Books/bk478-samplechapter.pdf, or http://www.corestandards.org/assets/Appendix_A.pdf), so we're only including our English Language Arts Text-Based Task Analysis Guide here, with an example of its use for questions written for the "Gettysburg Address."

[The English Language Arts Text-Based Task Analysis Guide appears here as a chart in the original document.]

Here is a chart with examples of low- and high-cognitive-demand questions for the "Gettysburg Address." Notice how the demands differ across the different types of questions: comprehension, interpretation, and analysis. A comprehension question, for instance, might ask what Lincoln says the living must now dedicate themselves to, while an analysis question might ask how the repetition of "dedicated" throughout the speech builds its argument.

[The chart of example questions appears here in the original document.]

The Take-Away Big Ideas

Formative assessments live inside curriculum. They are integral to teachers' and students' learning. Unlike the benchmark assessments we've grown accustomed to through NCLB, they are not developed outside of the taught curriculum to gauge students' probable performance on summative assessments. The teachers who will use formative assessments should be involved in their development. They can certainly contract for support in this development, and they can benefit tremendously from doing the work in professional learning communities (PLCs), but if they are to learn how to make formative assessment tasks integral to their instruction, they need to do the work themselves. It's not easy. It takes a lot of time and trial and error. It entails long hours studying student work samples and multiple revisions of the tasks and scoring guides. And it involves either face-to-face or online scoring by communities of like-minded teachers in the same schools and districts.

But all of this work is educative, and you can't buy educative from vendors hawking test banks. You can create your own formative assessment test banks for your curriculum, but these banks stagnate quickly, so they need to be part and parcel of a continuous improvement process that begins with the assumption that as students change, as teachers change, and as the curriculum changes, so too will the assessments have to change. The educative benefits of creating performance tasks reach deep into instruction and bear additional fruit as teachers rethink and revise their curriculum and their instruction to support students to succeed on the performance assessments.

We've given you a lot to think about in considering the use of two types of portfolio-based formative performance assessments and the use of drop-in assessments. The key to capturing the complexity of students' thinking and reasoning, as we've been arguing, is in the nature of the tasks that we ask them to take on. Tasks have to be judged acceptable on two dimensions: their content and their cognitive challenge. To help you think about the differences in cognitively challenging tasks in mathematics and English/literacy, we included a couple of task analysis guides, but we should be clear that it takes more than just reading these to be able to use them. At the IFL, we conduct two to three days of professional development around each of these tools, as does SCALE for its tools to guide the development of performance assessments. The professional development necessarily includes multiple examples of tasks and deep discussions among participants, promoted both by participants actually doing the tasks themselves and by their use of the guides. Participants also study student work samples and rate them against different types of rubrics. Just as focused writing and talk are critical components of students' learning, so too are they central to adults' learning, especially when the materials and concepts that we ask them to work with are relatively unfamiliar.