Benchmark Sample Committees
What are they? Why do we need them? How do they work?
Created by Mark Feder

Post on 22-May-2015

TRANSCRIPT

Page 1: Tm07

Benchmark Sample Committees

What are they?
Why do we need them?
How do they work?

Created by Mark Feder

Page 2

The goal of this presentation is to explain the purpose and operation of Benchmark Sample Committees (BSCs). But to understand the need for BSCs, we need to step back a bit and look at the bigger picture of assessment.

If INTERLINK were a content-based program, we would have the relatively simple task of testing how well our students learn the content we teach them. In other words, we would teach xyz and then test students on xyz. Assessment would be a neat, easy, quantifiable and comfortably objective process.

But because the premise of our instructional program is that we do not (cannot) teach students language but can only facilitate their language acquisition, the problem of assessment becomes more complex and messy. We cannot simply test what we have taught but need to attempt to assess how students have progressed in their journey toward language proficiency. While this is undeniably a far more difficult undertaking, it is infinitely more valuable than measuring how well students can prove their knowledge of something that may not even be important to their language acquisition.

Page 3

This presentation is not the place to discuss the rationale and principles of the INTERLINK program, but we must all be clear about our starting point with respect to the issue of assessment. The only way we can make assessment neat and clean is to also make it meaningless.

OK, that needs some explanation. What assessment is really about is accountability - checking on whether the students and the teacher did their jobs well or poorly. (The ambiguity over whether it is the student or the teacher who is at fault when there are shortcomings is an interesting problem in the topic of assessment.)

Of course, we want to know if students are learning as a result of their instruction and whether they are qualified to take on the challenges that the instruction is meant to prepare them for, so assessment is of great consequence to all concerned. Naturally, then, the great emphasis in assessment is on accuracy. We want our assessment to be accurate and exact – and objective; that is to say, the results should be the same regardless of who administers or scores the assessment tool.

Page 4

In other words, what we strive for in assessment tends to be something mechanical, something that can be applied without bias, something that yields clear, preferably numeric results that can be conveniently graphed and charted.

The entire history of testing procedures, from the development of multiple-choice questions to the emergence of standardized tests prepared not by the teacher but by some outside authority who determines what students should be learning, shows a preoccupation with making assessment more mechanical, rigid and absolute.

So what is wrong with that? Well, nothing in itself. The problem is that such assessment instruments almost necessarily have to focus on what is taught rather than on what is learned. It is not too hard to test whether a student can form the present perfect of a particular verb, but it is not quite so easy to determine whether a student knows how to form it for all verbs, whether s/he uses that form in actual speech, or whether s/he distinguishes that form from the simple present and simple past in reading and listening. Nor can testing determine how significant the student’s present perfect capabilities are to his or her overall communicative ability.

Page 5

In other words, testing is good at determining whether students can demonstrate specific knowledge and skills but not at determining how important those skills are in the student’s overall development. Since, as teachers, we want our students to succeed in their tests, we take pains to teach them what they will be tested on, regardless of its relevance to their overall linguistic development.

And that focus on what we teach students as opposed to what they are learning is the crux of why certain assessment practices can be detrimental rather than beneficial. The fact is that assessment has implications for how we teach, and if our modes of teaching and assessing are in conflict, the results will not be good.

When I stated earlier that “the only way we can make assessment neat and clean is by making it meaningless” I meant simply that the mechanical assessment we long for only works when we use it to compare what has been learned with what has been taught. However, we should be much less concerned with what we teach than with what students are learning. We can have neat mechanical assessment only at the cost of discarding what we believe is most important for their development of linguistic proficiency.

Page 6

So our real concern regarding common assessment concepts and practices is the impact they have on how and what we teach our students. Mechanical assessment has implications and consequences for the very fabric of the education we provide.

If we needed corroboration that the mania for neat, “objective” assessment has dire consequences for the learning process, we need simply look at the shambles being made of education by the current craze for standardized testing. There is little doubt that a focus on testing distorts and corrupts the learning process by subordinating learning to what is being taught.

Don’t imagine that we’re off on a political tangent. What we are talking about here is whether language acquisition, and perhaps education in general, is about what is taught or what is learned. Is the focus on the content and material being delivered, or is it on the learner and the various obstacles that impede learning? Much of what we have grown accustomed to in the realm of assessment disregards the learner and the process of learning and focuses squarely on what information is dispensed by the teacher.

Page 7

What does all this mean? Is assessment bad? Should it just be discarded entirely so we can focus on our students?

Absolutely not. We need to assess in order to gauge the effectiveness of our activities and to help us improve our teaching. We also need it to make sure that our students are grouped in a way that makes it easier for them to work together and for us to work with them. Assessment is vital.

The issue then is not whether to assess but how to assess. The common, mechanical assessment procedures are not suitable to our program or philosophy of language acquisition. The alternative to counter-productive mechanical assessment that we have chosen is holistic assessment. Holistic assessment attempts to look at overall communicative competence and proficiency rather than discrete elements of knowledge. It helps us focus on what has been learned and what proficiency has been developed, as opposed to which taught elements have been retained.

Page 8

The problem with holistic assessment is that it’s not the neat, clean kind of assessment we’re used to and long for. As suggested earlier, it’s messy and complex. It’s more like eating an orange than a candy bar. It’s kind of inconvenient in some ways, but ultimately, it’s the far better choice.

The mechanism we have created to implement holistic assessment is the Benchmark. Consult the Curriculum Guide for an in-depth discussion of Benchmarks, what they are and how they work. For now, let us be content to think of Benchmarks as yardsticks against which we attempt to measure student proficiency, or as color samples against which to match the hues of linguistic ability exhibited by students.

Benchmarks by themselves may seem a bit vague, too open to different interpretations by different people. What is needed is something to make them less abstract and more concrete, something to pin them down a bit more. What we have selected as the best way of concretizing the Benchmarks is to provide representative samples that show what each Benchmark means and distinguish one from another.

Page 9

But before we talk more about samples, we should discuss another mechanism for concretizing the Benchmarks, one that has been discussed, considered and ultimately rejected, although, it should be noted, there are many advocates of this approach.

The mechanism I am alluding to involves specification of all the discrete elements that go into distinguishing one Benchmark or level from another, a mechanism that often goes by the name rubric. Rubrics have been proposed as a way of fleshing out what a Benchmark really means. They are suggested as a means of restoring to the complex task of holistic assessment the same neat, objective character that the testing procedures discussed earlier were intended to provide.

ETS and other testing moguls are quite proud of how they have used rubrics to attain machine-like accuracy in the scoring of writing samples, which they implemented as part of the progressive trend toward holistic assessment. In other words, recognizing the limitations of conventional multiple-choice tests, test-makers have decided to analyze the actual productions of students and to apply to them the same rigorous standards that characterize standardized tests. The perfect solution!

Page 10

The only problem with rubrics is that they function in much the same way as standardized tests in that they establish predefined criteria to assess the quality of a piece of writing. Scorers are trained to use these rubrics to produce assessments that are very consistent.

Consistency is seen as proof of the accuracy of the system, and rubrics are seen as key mechanisms for evaluating the intrinsic quality of writing. But all the consistency shows is that human beings can be trained to function like machines. It is unlikely that Faulkner or Hemingway would fare well or be considered to have acceptable writing skills in a rubric-based assessment. It is equally implausible that a rubric-perfect piece of writing would ever be considered for a Nobel Prize. Rubrics reward bland, formulaic, homogeneous writing – and that is not usually good writing.

To be clear, the problem lies not in finding or developing a good rubric but in the very concept of rubrics. Rubrics prevent the scorer from reacting spontaneously to a piece of writing by insisting that the writing can only be judged by pre-established criteria. The scorer’s critical faculties are shut down and replaced by imposed standards of acceptability.

Page 11

One may object that a teacher, especially an ESL teacher, should be content to produce competent writers and not expect to produce Nobel Prize winners or creative writers, and that therefore rubrics are just the thing for us.

That attitude, however, misses the whole point of what writing is and of what it takes to help a student become a good writer. The rubric is a teacher-centered mechanism that assumes good or acceptable writing is the result of following certain imposed rules or formulas: an essay should consist of a certain number of paragraphs, each consisting of a topic sentence followed by a certain number of supporting sentences, etc. Where does the student learn how to produce such writing? From the teacher, of course.

By contrast, we attempt to help students improve their writing by starting out with the student’s own vision or idea and helping them discover ways to communicate that vision or idea in the most effective and compelling way. Our challenge is to get students invested in and excited about their writing so that they will work at it on their own to produce something unique. That motivation is not easily combined with following rules for writing a 5-paragraph essay. A factory approach to punching out a class of competent writers does not work because competency is attained through investment in the process, not by following a fixed system, technique or recipe.

Page 12

So once again, the problem with rubrics is the same as that of the mechanical assessment discussed earlier – they have implications for our teaching that go against the very grain of our program. Alfie Kohn’s essay The Trouble with Rubrics and Maja Wilson’s delightful little book Rethinking Rubrics for Writing Assessment make the case against rubrics much more compellingly and eloquently than I can here. The point is, samples were quite intentionally chosen instead of rubrics to help define Benchmarks.

Of course, samples have their own problems. Right from the start, the question is: where do the samples come from? Who picks them and decides what should stand as a sample?

That is a weakness in our system. The current samples were established by one person, and even though that person is impeccably qualified for the task (I am, of course, biased here, being that person), a better system would allow teachers themselves to determine, based on ongoing experience, what samples are most appropriate. So finally, we have come to the point of addressing the questions of what BSCs are and why we need them.

Page 13

Teachers have long voiced the idea that they should have more input into and control over aspects of the curriculum. Benchmark Sample Committees are a way of providing such control.

This is how it works: each term, one specific skill area (reading, writing, speaking, listening) will be reviewed. There will be one committee for each level, composed of one teacher from each center, preferably one currently teaching that level. This term, writing will be reviewed, so there will be committees for writing 1, 2, 3, 4 and 5. Next term, a different skill area will be addressed, and in the course of the year all the Benchmark samples will be reviewed. What will happen next year? Well, perhaps a repeat of the entire process. That depends on how satisfied everyone is with the samples.

The committee will absolutely establish the samples. The committee decision will not be vetoed or overruled by Center Directors, the Curriculum Director or the President of INTERLINK. But all teachers will have to abide by the samples until another review is done.

Page 14

So, what will the committee members do? Essentially, communicate with each other. They will begin by carefully examining the existing sample and expressing their views about its adequacy and appropriateness. If the four committee members agree that the existing sample is satisfactory, the work is done. That’s it.

However, there may be agreement that the current sample doesn’t work, in which case members would express their views on what they would like to see replace it. One or more members may present an alternative sample to be reviewed. Committee members continue to talk until a decision is reached about an alternative sample. Once the decision is made, the work of the committee is finished.

Another possibility is that there will be disagreement among committee members. In that case, they have to hash it out until agreement is reached. Such disagreement, incidentally, is a good thing, because it presents opportunities for discussing different viewpoints and expressing different perspectives, all of which strengthens the process and the program.

Page 15

How will committee members communicate with one another, and how often? That is up to the members themselves and what it takes to complete the task. Members can talk by phone or via email or Skype or messaging, or all of the above – whatever works and whatever is easiest. To facilitate communication, a special forum for committees has also been created. To access the forum, open the Curriculum Guide and click Forums.

Page 16

When you enter the forum, register and copy down your username and password someplace safe in case you forget them later on.

Page 17

Once you log in, enter the Faculty Forum and then check into whichever committee discussion you are interested in.

Please note that the Faculty Forum is always available for INTERLINK teachers to communicate with one another and that new topics can be started at any time by anyone.

Page 18

The work of the committees will improve the curriculum and its applicability to the real situations faced by teachers in the classroom. It will also allow a degree of teacher control of the curriculum previously absent.

There are other benefits as well. The cross-center contact will not only help build greater consistency into what is done throughout INTERLINK but will also open the door for sharing of ideas between the centers.

On a personal level, the project offers an opportunity for professional development and, through acquaintance with teachers at other centers, provides the prospect of greater cooperation and interaction.

Page 19

A few logistical notes . . .

Most centers have more than five faculty members, which means that some teachers would be “unemployed” in the committee work. Since it is good for everyone to participate in the process, the “unemployed” teachers would serve as adjuncts to the actual committee members. To avoid any undue influence of one particular center in the committee work, there will be only one official member per level per center.

At the end of the process, the committee will submit a short report detailing the decisions made. In the event that a new sample is proposed, the sample will, of course, also be submitted.

Page 20

Readings:

It is crucial that INTERLINK teachers understand the pedagogical assumptions and implications of using rubrics for assessment. The following link contains some materials that should be read by all faculty.

Readings on rubrics

Page 21

Before beginning your committee work, please thoroughly review the materials referenced on the previous slide. If you have questions or want to discuss something related to the Benchmarks or samples, feel free to email me at [email protected] or Skype me (marfed961). I wish you all good luck with the project and hope that it is useful, enjoyable and productive.

There are some things that count that can't be counted. And some things that can be counted that don't count.

THE END