
Assessing Students' Opportunity to Learn the Intended Curriculum Using an Online Teacher Log: Initial Validity Evidence

Alexander Kurz (Arizona State University), Stephen N. Elliott (Arizona State University), Ryan J. Kettler (Rutgers University), and Nedim Yel (Arizona State University). Published online: 14 Aug 2014.

Educational Assessment. Publication details, including instructions for authors and subscription information: http://www.tandfonline.com/loi/heda20

To cite this article: Alexander Kurz, Stephen N. Elliott, Ryan J. Kettler & Nedim Yel (2014) Assessing Students' Opportunity to Learn the Intended Curriculum Using an Online Teacher Log: Initial Validity Evidence, Educational Assessment, 19:3, 159-184, DOI: 10.1080/10627197.2014.934606

To link to this article: http://dx.doi.org/10.1080/10627197.2014.934606



Educational Assessment, 19:159–184, 2014

Copyright © Taylor & Francis Group, LLC

ISSN: 1062-7197 print/1532-6977 online

DOI: 10.1080/10627197.2014.934606

Assessing Students' Opportunity to Learn the Intended Curriculum Using an Online Teacher Log: Initial Validity Evidence

Alexander Kurz and Stephen N. Elliott
Arizona State University

Ryan J. Kettler
Rutgers University

Nedim Yel
Arizona State University

This study provides initial evidence supporting intended score interpretations for the purpose of assessing opportunity to learn (OTL) via an online teacher log. MyiLOGS yields 5 scores related to instructional time, content, and quality. Based on data from 46 middle school classes, the evidence indicated that (a) MyiLOGS has high usability, (b) its quarterly summary scores are relatively consistent over time, (c) summary scores based on 20 randomly sampled log days provide reliable estimates of teachers' respective yearly summary scores, and (d) most teachers report positive consequences from using the instrument. Agreements between log data from teachers and independent observers were comparable to agreements reported in similar studies. Moreover, several OTL scores exhibited moderate correlations with achievement and virtually nonexistent correlations with a curricular alignment index. Limitations and directions for future research to strengthen and extend this initial evidence are discussed.

Current test-based accountability contingencies targeted at schools are intended to compel teachers and administrators to improve relevant instructional inputs and processes in ways that can lead to student achievement of intended outcomes. Although annual state assessments are designed to yield test scores that permit valid interpretations of what students know and are able to do, the evidence is rarely sufficient to make valid test score inferences about teachers' instructional provisions (Polikoff, 2010). However, if the psychometric property of instructional sensitivity remains unknown, then test users cannot be confident that these assessments register differences in instruction (D'Agostino, Welsh, & Corson, 2007). Teachers' efforts to provide students with the opportunity to learn the intended curriculum thus are likely to remain unmeasured and unaccounted for in most test-based accountability systems.

Correspondence should be sent to Alexander Kurz, T. Denny Sanford School of Social and Family Dynamics, Arizona State University, 951 S Cady Mall, Tempe, AZ 85287. E-mail: [email protected]

Color versions of one or more of the figures in the article can be found online at www.tandfonline.com/heda.

Researchers interested in the concept of opportunity to learn (OTL) have established a range of OTL indices that can lead to more direct, and potentially more valid, score interpretations about time, content, and quality differences in teachers' enacted curricula than inferences based on test scores from summative, large-scale assessments alone (Kurz, 2011). Following a process of validation outlined in the Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999), we present a summary of initial evidence supporting intended score interpretations for the purpose of assessing OTL via an online teacher log called the Instructional Learning Opportunities Guidance System (MyiLOGS; Kurz, Elliott, & Shrago, 2009). The summary includes multiple sources of evidence (usability, reliability, and validity evidence based on content, response processes, internal structure, relations to other variables, and consequences of using the measure) and a critical appraisal of this evidence in light of the proposed score interpretations and uses. To establish context for the study and evidence based on content, we begin by providing a brief summary of research related to OTL and the extent to which the instrument's OTL indices represent the content domains identified in the literature.

DEFINING OTL

For decades, researchers have examined instructional indicators of the enacted curriculum under the larger concept of OTL (Rowan & Correnti, 2009). Kurz (2011) reviewed the respective research literature and identified major lines of OTL research related to the time, content, and quality of classroom instruction. His conceptual synthesis of OTL acknowledged the co-occurrence of all three enacted curriculum dimensions during instruction (see Figure 1). That is, teachers allocate instructional time and content coverage to the standards that define the intended curriculum using a variety of pedagogical approaches. The conceptual model depicts OTL as a matter of degree along three orthogonal axes with distinct zero points.

OTL Indices

Carroll (1963) provided one of the first operational definitions of OTL according to the time allocated to instruction in a school's schedule (i.e., allocated time). Subsequently, researchers developed more instructionally sensitive indices for descriptive purposes and to examine their contributions to student achievement (see Borg, 1980). Such indices were based on the proportion of allocated time dedicated to instruction (i.e., instructional time), the proportion of instructional time during which students were engaged (i.e., engaged time), or the proportion of engaged time during which students experienced a high success rate (i.e., academic learning time).
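To make the nesting of these time-based indices concrete, the following minimal sketch computes each index from hypothetical minute counts (the variable names and numbers are ours, for illustration only):

```python
# Hypothetical minute counts illustrating the nested time-based OTL indices.
allocated_time = 60.0      # minutes scheduled for the class period
instructional_time = 48.0  # allocated minutes actually used for instruction
engaged_time = 36.0        # instructional minutes with students engaged
high_success_time = 27.0   # engaged minutes with a high success rate

# Each index is a proportion of the next-broader time pool.
instructional_share = instructional_time / allocated_time  # 0.80
engaged_share = engaged_time / instructional_time          # 0.75
academic_learning_time = high_success_time / engaged_time  # 0.75

print(instructional_share, engaged_share, academic_learning_time)
```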

FIGURE 1 Conceptual model of opportunity to learn. Source: Kurz, 2011. Copyright © Springer Science+Business Media, LLC 2011. Reprinted with kind permission from Springer Science and Business Media.

Researchers also defined OTL in relation to the content covered during instruction. The main focus was the extent to which the content of instruction overlapped with the content of assessments (i.e., content overlap). The work of Husén (1967) for the International Association for the Evaluation of Educational Achievement exemplifies this line of research, which typically requires teachers to rate their coverage of the constructs assessed by test items. Following standards-based reform, policymakers shifted the normatively desirable target of instruction from tested content to the broader intended curriculum, whose content is merely sampled by large-scale achievement tests (Rowan, Camburn, & Correnti, 2004). Under the No Child Left Behind Act (2001), states have been required to define their subject- and grade-specific intended curricula through a set of rigorous academic standards. Subsequently, stakeholders became more interested in taxonomies that allowed experts to judge the alignment between the content of various curricula such as a teacher's enacted curriculum and a state's intended curriculum. Porter (2002), for example, developed the Surveys of Enacted Curriculum (SEC), which have been used to quantify alignment between standards and assessments (as well as other curricula) via structured ratings along a comprehensive list of content topics and cognitive demands (see Roach, Niebling, & Kurz, 2008).

Researchers further considered aspects of instructional quality to operationalize OTL. Teachers' uses of empirically supported instructional practices and instructional resources, for example, have become common considerations, especially following the findings from the process-product literature (see Brophy & Good, 1986). More recently, meta-analytic findings have been used by researchers and practitioners to identify specific instructional practices that contribute to student achievement (Slavin, 2002) including the achievement of specific subgroups such as students with disabilities (e.g., Gersten et al., 2009). Examples include explicit instruction (i.e., modeling and engaging students in a step-by-step approach to solving a problem), visual representations, and guided feedback. Instructional grouping formats other than whole class also have received support in meta-analytic reviews (see Elbaum, Vaughn, Hughes, Moody, & Schumm, 2000).


TABLE 1
Enacted Curriculum Dimensions, Relevant OTL Indices, and Proposed Definitions

Enacted Curriculum Dimension | OTL Index | Index Definition
Time | Instructional time | Instructional time dedicated to teaching the general curriculum standards and, if applicable, any custom objectives.
Content | Content coverage | Content coverage of the general curriculum standards and, if applicable, any custom objectives.
Quality | Cognitive processes | Emphasis of cognitive process expectations along a range from lower order to higher order thinking skills.
Quality | Instructional practices | Emphasis of instructional practices along a range from generic to empirically supported practices.
Quality | Grouping formats | Emphasis of grouping formats along a range from individual to whole class instruction.

Note. OTL = opportunity to learn.

OTL Frameworks

Stevens (1993) provided the first conceptual framework of OTL, bringing together four elements: content coverage, content exposure (i.e., time on task), content emphasis (i.e., emphasis of cognitive processes), and quality of instructional delivery (i.e., emphasis of instructional practices). Despite its lack of operationalization, her framework has guided numerous researchers interested in OTL (e.g., Abedi, Courtney, Leon, Kao, & Azzam, 2006; Herman & Abedi, 2004; Wang, 1998). Most important, Stevens clarified OTL as a teacher effect related to the allocation of adequate instructional time covering a core curriculum via different cognitive demands and instructional practices that can produce student achievement.

According to the model by Kurz (2011), which was based on the aforementioned literature, OTL is a matter of degree related to the temporal, curricular, and qualitative aspects of a teacher's instruction. To provide OTL, a teacher must dedicate instructional time to covering the content prescribed by the intended curriculum using pedagogical approaches that address a range of cognitive processes, instructional practices, and grouping formats. Table 1 provides a breakdown of the three enacted curriculum dimensions as well as the selection of OTL indices and respective definitions that informed the scores of MyiLOGS. The indices related to the quality of instruction require further clarification. We understand that the mere implementation of certain cognitive processes, instructional practices, and grouping formats is neither an exhaustive description of instructional quality nor a straightforward guarantee that a specific demand, practice, or format itself was implemented with high quality. Instead we adopt a more general stance used in prior OTL research (e.g., Brophy & Good, 1986; Carroll, 1989; Stevens, 1993), which posits that certain instructional variables can impact the quality of instruction as evidenced by their relation to student achievement.

OTL Measures

Many measures of classroom instruction assess aspects of a teacher's enacted curriculum (e.g., Connor et al., 2009; Pianta & Hamre, 2009). Similarly, most OTL measures are designed to assess instructional inputs and processes that contribute to student achievement of intended outcomes (Herman, Klein, & Abedi, 2000; McDonnell, 1995). Given current accountability systems, these intended outcomes are clearly specified via a state's academic standards; hence Kurz's particular definition of OTL. Very few measures address this content focus in ways that provide actual time estimates (e.g., minutes) and information on instructional practices (see Kurz, 2011). In addition, some measures provide scores for the entire class (e.g., Porter, 2002), individual students (e.g., Rowan, Camburn, & Correnti, 2004), or both (e.g., Kurz, Talapatra, & Roach, 2012). Last, measurement methods can rely primarily on direct observation (e.g., Pianta & Hamre, 2009), teacher self-report (e.g., Porter, 2002), or a combination of both (e.g., Kurz et al., 2014).

Given the cost of frequent observations and related challenges of generalizing from a limited sample of teaching observations to a universe of teaching events across the school year, Rowan and colleagues have argued for teacher logs (e.g., Rowan et al., 2004; Rowan & Correnti, 2009). Historically, the complexity and variability of classroom instruction have led most OTL researchers to adopt a teacher logging approach (Burstein & Winters, 1994). Teacher logs, however, can be administered as end-of-year surveys, which require teachers to provide a summative classwide account of their instruction across the entire school year, or intermittently based on a set of discrete days for individual students. Empirical evidence does not support the use of summative teacher logs for high- and low-frequency teaching behaviors and more complex accounts of instruction (see Rowan et al., 2004). In addition, some evidence suggests that classwide OTL indices differ from student-specific indices for students with disabilities (Kurz et al., 2014). These findings call into question the extent to which classwide OTL indices can be generalized to individual students nested within the same class.

OTL Studies

Researchers and policymakers have used OTL studies in a number of contexts to (a) describe the instructional opportunities offered to different groups of students, (b) monitor the effects of school reform efforts, and (c) understand and improve students' academic achievement (Herman et al., 2000; Porter, 1991; Roach et al., 2009). The development and initial use of MyiLOGS occurred in the context of special education to assess OTL for students with disabilities nested in either general or special education classes. The importance of OTL studies for students with disabilities is grounded in a policy rationale, which requires compliance with federal legislation such as the Individuals with Disabilities Education Act (1997) mandating students' access to the general education curriculum including its academic standards (Karger, 2005). In addition, the participation of students with disabilities in tests that assess grade-level standards further necessitates their exposure to the content of these standards to ensure the validity of certain test score inferences (Wang, 1998). Finally, recent findings have raised concerns about OTL for students with disabilities: limited use of allocated time for instruction (Vannest & Hagan-Burke, 2010), low exposure to standards-aligned content (Kurz, Elliott, Wehby, & Smithson, 2010), and inconsistent use of evidence-based practices (Burns & Ysseldyke, 2009), as well as other issues related to instructional quality (Vaughn, Levy, Coleman, & Bos, 2002). Operationalizing and measuring OTL thus can quantify students' access to the general education curriculum, provide evidence concerning valid test score inferences, and identify areas of classroom instruction in need of intervention.


ASSESSING OTL WITH MyiLOGS

The MyiLOGS OTL measure was designed to assess students' opportunity to learn the intended curriculum based on five OTL indices along three enacted curriculum dimensions: time, content, and quality. This online teacher log is completed concurrently with a teacher's instructional planning and implementation efforts. For every school day (so-called calendar days), teachers are asked to report on their instructional time dedicated to the state-specific academic standards and any custom objectives such as other valued academic skills not included in the standards or Individualized Education Program objectives for students with disabilities. Based on a random sample of 2 weekdays (so-called detail days), teachers are further asked to report on additional details related to instructional quality for their overall class and individual students. Specifically, teachers report on the cognitive processes expected for each enacted content standard and their various instructional practices implemented in a particular grouping format.

Scores

Based on five OTL indices, five major OTL scores are calculated (see Table 2). One score is related to time (i.e., Time on Standards), one to content (i.e., Content Coverage), and three to quality (i.e., Cognitive Processes, Instructional Practices, Grouping Formats). The score for time is a percentage based on a teacher's allocated class time. The score for content is a percentage based on the total number of standards for a particular subject and grade. The three scores for quality are percentages based on the time dedicated to one of two categories. The weighting for each category is either 1.00 or 2.00. Scores of 1.00 indicate an exclusive focus on lower order thinking skills (i.e., attend, remember), or generic instructional practices (i.e., independent practice, other instructional practices), or whole-class instruction. Scores of 2.00 indicate an exclusive focus on higher order thinking skills (i.e., understand/apply, analyze/evaluate, create), or evidence-based instructional practices (i.e., direct instruction, visual representations, questions, think aloud, guided feedback, reinforcement, assessment), or individual/small group instruction. Given an allocated class time of 60 min, for example, a teacher who spends 15 min asking students to recall definitions of triangles (0.25 × 1 = 0.25) and 45 min having students create examples of these triangle types (0.75 × 2 = 1.50) would receive a Cognitive Processes score of 1.75. As such, the score is a linear transformation of the percentage of time spent in any of the higher order categories.

TABLE 2
OTL Scores, Definitions, and Quarter Calculations

OTL Score | Score Definition | Quarter Calculation
Time on standards (a) | Percentage of allocated class time used for instruction on the state-specific academic standards. | Based on 40 logged calendar days
Content coverage (a) | Percentage of state-specific academic standards addressed for one minute or more. | Based on 40 logged calendar days
Cognitive processes (b) | Sum of differentially weighted percentages of instructional time dedicated to each cognitive process expectation (Attend and Remember ×1; Understand/Apply, Analyze/Evaluate, and Create ×2). | Based on 8 logged detail days
Instructional practices (b) | Sum of differentially weighted percentages of instructional time dedicated to each instructional practice (Used Independent Practice and Other Instructional Practices ×1; Provided Direct Instruction, Provided Visual Representation, Asked Question, Elicited Think Aloud, Provided Guided Feedback, and Assessed Student Knowledge ×2). | Based on 8 logged detail days
Grouping formats (b) | Sum of differentially weighted percentages of instructional time dedicated to each grouping format (Whole Class ×1; Individual and Small Group ×2). | Based on 8 logged detail days

Note. OTL = opportunity to learn. (a) Score can be calculated based on 1 or more calendar days; a typical week features 5 calendar days. (b) Score can be calculated based on 1 or more detail days; a typical week features 2 random detail days.
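To illustrate the weighting scheme, here is a minimal sketch that reproduces the triangle example from the text; the function name and data layout are our own, not part of MyiLOGS:

```python
def quality_score(minutes_by_category, weights):
    """Sum of differentially weighted shares of instructional time.

    minutes_by_category: minutes logged in each category
    weights: 1 for lower-order/generic categories, 2 for higher-order/
             evidence-based categories (per Table 2)
    """
    total = sum(minutes_by_category.values())
    return sum((minutes / total) * weights[cat]
               for cat, minutes in minutes_by_category.items())

# Worked example from the text: 15 min of recall (weight 1) and 45 min of
# creating examples (weight 2) out of 60 allocated minutes.
minutes = {"remember": 15, "create": 45}
weights = {"remember": 1, "create": 2}
print(quality_score(minutes, weights))  # 0.25*1 + 0.75*2 = 1.75
```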


The weighting for the two categories is partly intended to prevent potentially negative user associations with a score of 0. More important, the weighting and use of two categories for all three quality-related scores is grounded in two operating assumptions: (a) teachers address a range of cognitive processes, instructional practices, and grouping formats during the course of their instruction; and (b) teachers who emphasize higher order thinking skills, evidence-based instructional practices, and alternative grouping formats can improve the quality of students' opportunity to learn valued knowledge and skills. Although the empirical basis for these assumptions is insufficient to single out specific processes, practices, or formats, we decided on a dichotomous grouping for two reasons. First, teachers must move expected cognitive processes beyond recall to promote a transfer of knowledge (Anderson et al., 2001; Mayer, 2008). As such, teachers emphasizing higher order thinking skills should receive scores closer to 2.00. Second, given empirical support for evidence-based instructional practices and grouping formats other than whole class, teachers emphasizing the latter should also receive scores closer to 2.00.

Although it is possible to calculate all scores based on a single day of logging, all scores are intended to be used and interpreted as quarterly and yearly summary scores. Two months of school should yield 40 or more calendar days and 8 or more detail days. The quarter score calculations are thus based on sets of 40 consecutively logged calendar days for the Time on Standards and Content Coverage scores and on sets of 8 consecutively logged detail days for the Cognitive Processes, Instructional Practices, and Grouping Formats scores. With the exception of the Content Coverage score, quarterly summary scores represent the average percentage across sets of consecutively logged days. The yearly summary score represents the average percentage across all available log days. Given that the Content Coverage score is calculated cumulatively, its four quarterly summary scores thus are based on Day 40, 80, 120, and 160, respectively. For example, a Content Coverage score of 0.10 calculated on Day 40 represents the first quarter summary score and indicates a teacher covered 10% of the academic standards (for at least a minute or more) during the first 40 days of logging.
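A rough reconstruction of these aggregation rules under the definitions above (hypothetical data structures, not the MyiLOGS implementation):

```python
# daily_shares: per-day percentage of allocated class time spent on standards.
# daily_logs: per-day dicts mapping standard IDs to minutes taught.

def quarterly_time_on_standards(daily_shares, quarter):
    """Mean daily share over one set of 40 consecutively logged days."""
    days = daily_shares[(quarter - 1) * 40 : quarter * 40]
    return sum(days) / len(days)

def cumulative_content_coverage(daily_logs, n_standards, day):
    """Share of standards addressed for >= 1 minute through the given day.

    Content Coverage is cumulative, so quarterly scores use day = 40,
    80, 120, and 160.
    """
    covered = set()
    for log in daily_logs[:day]:
        covered.update(s for s, minutes in log.items() if minutes >= 1)
    return len(covered) / n_standards
```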

Score Interpretations

For the purpose of assessing OTL (the extent to which a teacher dedicates instructional time to cover the content prescribed by intended standards using a range of cognitive processes, instructional practices, and grouping formats), MyiLOGS scores are designed to allow interpretations about a teacher's (a) time spent on academic standards; (b) content coverage of academic standards; (c) emphases along a range of cognitive processes, especially lower order versus higher order thinking skills; (d) emphases along a range of instructional practices, especially generic versus evidence-based instructional practices; and (e) emphases along a range of instructional grouping formats, especially whole-class versus more differentiated instructional groupings based on small groups or individual students.

EVIDENCE BASED ON CONTENT

The five main OTL scores calculated via MyiLOGS were developed on the basis of theory and the empirical OTL literature and can be used to address all three enacted curriculum dimensions relevant to the concept of OTL. The initial measure was pilot tested for several months with a small group of general and special education teachers across three states. Subsequently, we used feedback from teachers and a panel of experts to refine the selection of cognitive processes and instructional practices. The panel included instructional leaders from three state departments of education, as well as a team of university researchers and consultants with expertise in curriculum, measurement, and special education. Given the limitations of end-of-year summative surveys, the finalized measure allowed teachers to gather OTL data on a daily basis to maximize generalizability to a universe of teaching events across the school year. In addition, teachers were able to provide OTL data for their overall class and individual students. As such, the measure can be used to collect a large number of data points for both the overall class and nested target students, thereby addressing issues of generalizability and instructional differentiation.

Although the available evidence based on content suggests that MyiLOGS addresses all previously outlined content domains of OTL via the five OTL scores, the selection of scores related to instructional quality underrepresents instructional resources (Herman et al., 2000), a consideration in some OTL research. The MyiLOGS teacher profile gathers key information on instructional resources (e.g., teacher preparation, teaching experience, participation in relevant in-service education); however, these data do not influence score calculations for instructional quality. Moreover, no information is gathered on material resources (e.g., availability of instructional materials). Next, we describe the methods used to collect and analyze the data under the remaining evidence categories.

METHOD

Participants

The teacher participant sample featured 38 general and special education teachers from seven middle schools in Arizona (n = 15 teachers), five middle schools in Pennsylvania (n = 12 teachers), and five middle schools in South Carolina (n = 11 teachers). To be included in the study, each general and special education teacher had to provide mathematics (MA) and/or reading (RE) instruction to two eighth-grade students with disabilities. The subject-specific samples across states were composed as follows: (a) 19 teachers provided OTL data on 20 MA classes featuring 39 target students, and (b) 23 teachers provided OTL data on 26 RE classes featuring 50 target students. Several teachers logged multiple classrooms (e.g., two different MA classes), and some of the same target students were logged by multiple teachers (e.g., a MA teacher and a RE teacher). Out of the 46 classrooms logged by teachers, 29 were (full-inclusion) general education classes and 17 were (self-contained) special education classes.

The target student sample (N = 56) largely comprised boys and students with learning disabilities. The Arizona subsample was predominately Hispanic, and the subsamples in Pennsylvania and South Carolina were predominately Caucasian and African American. The Arizona subsample further featured a very large proportion of students on free/reduced lunch. To further describe the target sample, teachers were asked to rate students' performance levels in the areas of MA, RE, motivation, and prosocial behavior via the Performance Screening Guide (Elliott & Gresham, 2008) and students' academic skills and enablers via the Academic Competence Evaluation Scales (DiPerna & Elliott, 2000).

The mean level ratings via the Performance Screening Guide across all three states indicated that the target student sample generally performed at Level 2 (in need of intervention) in both academic areas and at Level 3 (at risk for problems) in the Motivation to Learn and Prosocial Behavior areas. The mean total scores via the Academic Competence Evaluation Scales further placed students' academic skills across all three states in the Developing range (first decile nationally) and students' academic enabling behaviors in the Competent range (fourth decile nationally). The teachers' low academic ratings of the target student sample were consistent with students' below-proficient performance on the previous year's state test. About 91% of all participating students performed below proficiency in MA and RE.

Measures and Procedures

MyiLOGS

This online teacher log (www.myilogs.com) features the state-specific academic standards for various subjects (including Common Core State Standards) and additional customizable skills that allow teachers to add student-specific objectives (e.g., Individualized Education Program objectives). The measure therefore allows teachers to document the extent to which their classroom instruction covers individualized intended curricula. To this end, MyiLOGS provides teachers with a monthly instructional calendar that includes an expandable sidebar, which lists all intended objectives for a class. Teachers drag and drop planned standards that are to be the focus of the lesson onto the respective calendar days and indicate the approximate number of minutes dedicated to each standard. After the lesson, teachers are required to confirm enacted standards, instructional time dedicated to each standard, and any time not available for instruction (due to transitions, class announcements, etc.) at the class level. In addition, two randomly selected days per week require further documentation. On these detail days, teachers report on additional time emphases related to the standards listed on the calendar including cognitive process expectations, instructional practices, grouping formats, and time not available for instruction. This detailed reporting occurs for the overall class and individual students along two 2-dimensional matrices.

FIGURE 2 Screenshot of the Objective × Cognitive Process matrix.

For the first matrix (see Figure 2), teachers report on the instructional minutes allocated per standard along five cognitive process expectations for student learning adapted from the revised version of Bloom's taxonomy (see Anderson et al., 2001): Attend, Remember, Understand/Apply, Analyze/Evaluate, and Create. MyiLOGS also includes an Attend category, which is not part of the revised Bloom's taxonomy. The cognitive expectation of Attend allows teachers to differentiate between the expectation of students (passively) listening to instructional tasks and related instructions and (actively) recalling information such as a fact, definition, term, or simple procedure. Similar categories have been used in the context of special education, especially for students with significant cognitive disabilities (e.g., Karvonen, Wakeman, Flower, & Browder, 2007).

For the second matrix (see Figure 3), teachers report on the instructional minutes allocated per instructional practice along three grouping formats. In Table 3, seven instructional practices are marked by a table note to indicate empirical support on the basis of research syntheses and meta-analyses (e.g., Gersten et al., 2009; Marzano, 2000; Vaughn, Gersten, & Chard, 2000). In addition, grouping formats other than whole class also have received empirical support for improving learning outcomes (see Elbaum et al., 2000).

FIGURE 3 Screenshot of the Instructional Practice × Grouping Format matrix.


TABLE 3
Instructional Practices and Definitions

Instructional Practice | Definition
Provided direct instruction (a) | Teacher presents issue, discusses or models a solution approach, and engages students with approach in similar context.
Provided visual representations (a) | Teacher uses visual representations to organize information, communicate attributes, and explain relationships.
Asked questions (a) | Teacher asks questions to engage students and focus attention on important information.
Elicited think aloud (a) | Teacher prompts students to think aloud about their approach to solving a problem.
Used independent practice | Teacher allows students to work independently to develop and refine knowledge and skills.
Provided guided feedback (a) | Teacher provides feedback to students on work quality, missing elements, and observed strengths.
Provided reinforcement (a) | Teacher provides reinforcement contingent on previously established expectations for effort and/or work performance.
Assessed student knowledge (a) | Teacher uses quizzes, tests, student products, or other forms of assessment to determine student knowledge.
Other instructional practices | Any other instructional practices not captured by the aforementioned key instructional practices.

(a) This instructional practice has received empirical support across multiple studies.

"Other instructional practices" represents a generic category that allows teachers to account for their entire allocated class time using the available selection of instructional practices and/or "time not available for instruction." Teachers use the latter category to indicate any noninstructional minutes (e.g., transitions, announcements, fire drills), which together with instructional minutes must add up to the total allocated class time. The grouping formats were defined as follows: (a) Individual: Instructional action is focused on individuals working on different tasks; (b) Small Group: Instructional action is focused on small groups working on different tasks; (c) Whole Class: Instructional action is focused on the whole class working on the same task. If students are working on the same 20 math problems on their own, then the grouping format remains Whole Class (i.e., no task differentiation). If one group is working on defining isosceles triangles and another one on defining equilateral triangles, then the grouping format is Small Group (i.e., the task was different by group).
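Read this way, the grouping formats reduce to a simple decision rule on task differentiation; a minimal sketch using the two examples from the text (hypothetical function, not part of MyiLOGS):

```python
def grouping_format(groups):
    """Classify a class arrangement from (group_size, task) pairs.

    Same task for everyone -> Whole Class; different tasks across
    individuals -> Individual; different tasks across groups -> Small Group
    (per the definitions above).
    """
    tasks = {task for _, task in groups}
    if len(tasks) == 1:
        return "Whole Class"              # no task differentiation
    if all(size == 1 for size, _ in groups):
        return "Individual"               # individuals on different tasks
    return "Small Group"                  # groups on different tasks

# The two examples from the text:
print(grouping_format([(1, "same 20 problems")] * 25))          # Whole Class
print(grouping_format([(5, "isosceles"), (5, "equilateral")]))  # Small Group
```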

Training Surveys

At the conclusion of the 3-hr MyiLOGS training session, all participants completed a 9-item survey to provide information on their satisfaction with the training and the software. In addition, eight months after completion of the data collection phase, all participants were asked to complete a follow-up survey regarding the utility of MyiLOGS and the associated MyiLOGS report that summarized their instructional provisions during the previous year. Participants also were asked to complete a final instructional scenario comparable to the scenarios completed during the performance assessment to determine maintenance of skills to use MyiLOGS.


SEC

This online survey (www.seconline.org), typically conducted at the end of the year, provides information on the alignment between intended, enacted, and assessed curricula. The SEC alignment method relies on content translations by teachers (for purposes of the enacted curriculum) and curriculum experts (for purposes of the intended and assessed curriculum) who code a particular curriculum into a content framework that features a comprehensive K–12 list of subject-specific topics. The SEC content frameworks in MA and RE include 183 and 163 topics, respectively. All content translations occur along a 2-dimensional matrix of topics (e.g., multiply fractions) and cognitive demands (e.g., memorize). Teachers report on their enacted curriculum at the end of the school year by describing different instructional emphases for each topic and any applicable cognitive expectations using a 4-point scale. As such, instructional time is not directly assessed via the SEC. To calculate alignment between two content matrices, the data in each matrix are reduced to cell-by-cell proportions with their sum across all rows and columns equaling 1.00. Porter's (2002) alignment index (AI) takes both dimensions (i.e., topics and cognitive demands) into consideration when calculating the content overlap between two matrices according to this formula: AI = 1 − [(Σ|xi − yi|)/2], where xi indicates the cell proportion in cell i for matrix x and yi indicates the cell proportion in cell i for matrix y. The index thus ranges from 0 to 1, the latter indicating perfect alignment.

Assuming accurate recall, the AI can provide information about the extent to which a teacher's enacted curriculum matches the content topics and cognitive expectations expressed in the academic content standards of the general curriculum. However, the SEC employs several levels of inference to determine this index. Unlike MyiLOGS, which allows teachers to directly report on instructional time and content coverage allocated to state-specific standards, the SEC relies on (a) expert judgment to translate the state-specific standards into a content matrix and (b) teacher judgment to translate their enacted curricula into a second set of content matrices. Only the subsequent comparison of both matrices ultimately determines the AI. An additional limitation of the AI as an OTL proxy stems from the overlap calculation at the intersection of topic and cognitive demand, which does not offer separate scores for content coverage and cognitive process expectations. A teacher who emphasized the same topics indicated by the standards thus can receive an AI of 0 if the topics were emphasized at a different category of cognitive demand from what the standards prescribed. Given that MyiLOGS scores provide separate information on key OTL indices that are otherwise combined in the AI, we expected small correlations (r < .3) among scores. We note that the SEC can yield data that allow for alignment calculations at the marginal for topic or cognitive demand only (Polikoff, Porter, & Smithson, 2011). For purposes of this study, we relied on the commonly used AI, which was calculated based on the aforementioned formula using enacted curriculum matrices (established by participants) and the respective intended curriculum matrices for their state, subject, and grade. The latter matrices were provided directly through the Measures of the Enacted Curriculum Project at the Wisconsin Center for Education Research. As such, they were established using multiple raters and standard SEC methods (Porter, Polikoff, Zeidner, & Smithson, 2008).

State Tests

In three states, paper-and-pencil assessments designed to measure student achievement of state standards were used to provide summative data on the extent to which students had achieved the academic standards of the general curriculum for 8th-grade MA and RE: the Arizona Instrument to Measure Standards, the Pennsylvania System of School Assessment, and South Carolina's Palmetto Assessment of State Standards. Given previous associations between the five OTL indices and student achievement, we expected medium correlations (r > .3) between classwide OTL scores and class achievement.

Training Procedures

Each teacher received the standard professional development on the use of MyiLOGS, which focused sequentially on four elements: worked example (15 min), guided practice (1 hr), performance assessment (45 min to 1 hr), and independent practice (1 hr). For purposes of the performance assessment, teachers had to pass a sequence of tests. These tests featured written instructional scenarios that summarized typical lessons. Teachers had to correctly log the instructional scenario via MyiLOGS. Teachers had to pass two scenarios with 100% accuracy to be able to continue in the study.

To ensure accurate use of the SEC, the lead author worked with the director of the Measures of the Enacted Curriculum Project at the Wisconsin Center for Education Research to develop a training video that reviewed the online completion procedures and logging conventions of the SEC. The 30-min video also reviewed the similarities and differences of the cognitive process expectations between the SEC and MyiLOGS. Prior to using the SEC, all participants had to review the training video.

Study Procedures

Personnel in each state began the recruitment process at the beginning of the 2010–2011 school year. The trainings were implemented in Arizona during the months of September and October, followed by Pennsylvania and South Carolina. Of 41 recruited teachers, 38 could be trained to criterion during the allotted training time. All participants were compensated for their time spent on study-related tasks. Each teacher received a $150 honorarium for participation in the MyiLOGS training and $100 per month for using MyiLOGS to report on daily classroom instruction. The monthly compensation was contingent on timely completion of MyiLOGS, which was monitored through biweekly procedural fidelity checks. The required logging period for all participants was 4 full months after the teacher training, with the option to continue through the month of April 2011. At the end of the school year, participants further completed the SEC.

Observation Procedures

Each teacher participant was observed at least once during his or her logging period. An additional 20% of teachers were randomly selected to receive a total of three observations, resulting in 51 observations across all 38 participants. Trained observers used an observation form that mirrored the two 2-dimensional matrices used in the MyiLOGS software to code the dominant cognitive process per standard and the dominant instructional practice per grouping format observed during a 1-min interval. For training purposes, observers had to obtain an overall agreement percentage of 80% or higher on two consecutive 30-min sessions. A vibrating timer on a fixed interval was used to indicate the 1-min recording mark. Interobserver agreement was collected on about 30% of all observation sessions across states. All observation sessions lasted for the entire class period.

For agreement purposes, cell-by-cell agreement was calculated for each matrix based on cell estimates within a 3-min range or less. For each matrix, interrater agreement was calculated as the total number of agreements divided by the sum of agreements and disagreements. In addition, an overall interrater agreement percentage was calculated as the total number of agreements across both matrices divided by the sum of agreements and disagreements across both matrices. The latter index was used in establishing the training criterion (at or above 80%) and retraining criterion (below 80%) for observers. Agreement percentages between observers as well as between teachers and observers are reported in the Results section.
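The agreement rule can be sketched as follows, assuming each coded matrix is represented as a mapping from cells to minute estimates (our illustration, not the study's scoring code):

```python
def interrater_agreement(matrix_a, matrix_b, tolerance=3):
    """Cell-by-cell agreement: agreements / (agreements + disagreements).

    matrix_a, matrix_b: dicts mapping cell keys (e.g., a (standard,
    cognitive_process) pair) to coded minutes for the two raters. Two
    cells agree when the estimates fall within the 3-min tolerance.
    """
    cells = set(matrix_a) | set(matrix_b)
    agreements = sum(
        abs(matrix_a.get(c, 0) - matrix_b.get(c, 0)) <= tolerance
        for c in cells
    )
    return agreements / len(cells)
```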

Design and Data Analysis

Data for the usability and consequences related to MyiLOGS were collected using a teacher survey and reported using descriptive statistics. The reliability of MyiLOGS scores was estimated via Pearson correlations between summary scores based on randomly selected sets of days and yearly summary scores as indicators of precision (i.e., how precisely yearly summary scores can be estimated using smaller sets of randomly sampled days) and via Pearson correlations between quarterly summary scores as indicators of stability (i.e., how stable quarterly summary scores are from one set of consecutively logged days to the other). The validity of inferences drawn from MyiLOGS scores was characterized using multiple forms of evidence, as suggested by the Standards for Educational and Psychological Testing (AERA, APA, & NCME, 1999). Evidence based on response processes was indicated using descriptive statistics about the degree to which teachers appropriately logged information in the measure. Evidence based on internal structure was indicated by a matrix of Pearson correlations among all five major OTL scores. Evidence based on relations to other variables was indicated by correlations between MyiLOGS indices and data from direct observations, the SEC, and class achievement test scores.
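As a sketch of the precision analysis described above (our reconstruction, with hypothetical inputs): repeatedly draw a random set of log days per teacher, compute summary scores from the sampled days and yearly scores from the remaining days, and average the resulting Pearson correlations:

```python
import random
from statistics import correlation, mean  # correlation requires Python 3.10+

def precision_estimate(daily_scores_by_teacher, n_days, reps=10):
    """Mean Pearson r between summary scores from n randomly sampled log
    days and yearly summary scores computed from the remaining days
    (mirroring the note to Table 6)."""
    rs = []
    for _ in range(reps):
        sampled, yearly = [], []
        for days in daily_scores_by_teacher:
            picked = set(random.sample(range(len(days)), n_days))
            sampled.append(mean(d for i, d in enumerate(days) if i in picked))
            yearly.append(mean(d for i, d in enumerate(days) if i not in picked))
        rs.append(correlation(sampled, yearly))
    return mean(rs)
```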

RESULTS

Usability and Evidence Based on Response Processes

Nielsen (1994) defined usability along five quality components that were applied to the MyiLOGS software: (a) learnability (i.e., ease of logging), (b) efficiency (i.e., logging time once trained), (c) memorability (i.e., ease of reestablishing proficiency after a period of nonuse), (d) errors (i.e., frequency and severity of errors), and (e) satisfaction. Evidence that supports learnability and low error rates is based on the fact that 93% of users could be trained to an error-free criterion in about 3 hr of training. The software's learnability is further supported by posttraining and follow-up survey results (see Table 4) related to understanding the system and being able to use it reliably (Posttraining Questions 3, 4, 7, and 9).

Evidence related to efficiency and response processes was based on website user statistics, which indicated that participants completed their logs concurrently with their instructional efforts as intended, two to three times per week, with a relatively small time investment.


TABLE 4
Posttraining and Follow-Up Survey Results

Question No. | Question Stem | Posttraining (a) M | SD | Follow-Up (b) M | SD
1 | Professional development related to the content standards is important for promoting effective instruction. | 5.8 | 0.4 | 5.6 | 0.6
2 | Comprehensive, high-quality coverage of the content standards is an important part of effective instruction. | 5.8 | 0.4 | 5.6 | 0.6
3 | The MyiLOGS training was helpful for understanding how to use the system. | 5.9 | 0.3 | 5.4 | 0.7
4 | Based on the MyiLOGS training, I was prepared to use the system reliably. | 5.5 | 0.5 | 5.3 | 0.8
5 | An online version of this training (e.g., webinar) could have been equally effective. | 3.2 | 1.5 | 3.9 | 1.4
6 | I think MyiLOGS can support my comprehensive, high-quality coverage of the content standards. | 5.6 | 0.6 | 5.2 | 0.7
7 (c) | The MyiLOGS training scenarios were helpful for understanding how to use the system. | 5.9 | 0.4 | |
8 (c) | Overall, I think the trainers were well prepared. | 5.9 | 0.4 | |
9 (c) | Overall, I think the training time was sufficient for understanding how to use the system. | 5.7 | 0.5 | |
10 (d) | The charts and tables of the MyiLOGS Report provided meaningful information about my instruction. | | | 5.3 | 0.7
11 (d) | I would use the MyiLOGS Report feedback during the school year to improve my instruction. | | | 5.2 | 0.8
12 (d) | I think the MyiLOGS Instructional Growth Plan could be helpful as a professional development tool. | | | 5.2 | 0.8
13 (d) | Using MyiLOGS substantially increased my self-reflection and awareness of how and what I was teaching. | | | 5.3 | 0.8

Note. Response scale: 1 (strongly disagree), 2 (disagree), 3 (somewhat disagree), 4 (somewhat agree), 5 (agree), 6 (strongly agree). (a) n = 41. (b) n = 26. (c) Posttraining-only question. (d) Follow-up-only question.

Specifically, the website tracked teachers' average number of log-ins per week (excluding holidays and other school breaks) as well as their active logging time per week. On average, participants logged into MyiLOGS 2.4 times per week (SD = 0.6) and clocked about 5.9 min per week (SD = 1.4) of active logging time. In addition, their log completion was monitored on a biweekly basis. Completion checks were based on completed calendar days as well as detail days for both the overall class and target students. A total of 15 checks were completed during 30 weeks of instructional logging. On average, 92% of classrooms per check were logged without any missing calendar or detail day information. Following e-mail prompts, all teachers completed their missing data prior to the next check. The final instructional data set was 100% complete.

Posttraining and follow-up survey results further supported user satisfaction (see Table 4, Questions 6, 10, 11, and 13), with the majority of participants in agreement that the software supported their instruction. Evidence for memorability was established via a retest scenario completed by 26 participants 8 months after completion of the study (more than 14 months after the initial training). Across states, 100% of respondents completed the calendar level correctly (i.e., entries for the classwide time and content scores), 92% completed the class details correctly, and 91% and 82% completed Target Student 1 and 2 correctly. Based on the initial performance assessment standard, 10 out of 26 respondents (38%) maintained criterion-level performance of 100% accuracy across all categories (i.e., entries for the classwide and student-specific time, content, and quality scores).

MyiLOGS Scores and Estimates of Their Reliability

The mean scores and standard deviations for the quarterly and yearly summary scores aredocumented in Table 5. Quarterly summary scores only were calculated if all log days per

quarter were completed (see Table 2). Yearly summary scores were calculated based on all

completed log days. The means for the three quality scores were very consistent across quarters.

Time on Standards decreased across quarters especially from the second to the third quarters.

Content Coverage increased most notably during the first two quarters with only minor increases

in subsequent quarters.The precision and representativeness with which summary scores based on randomly sam-

pled sets of log days can estimate teachers’ yearly summary scores generally increased with

larger sets of randomly sampled days (see Table 6). All five summary scores based on 10

randomly sampled days correlated with their respective yearly summary scores above 0.80.

For Time on Standards and Content Coverage, summary scores based on 30 randomly sampledlog days yielded fairly precise estimates of teachers’ yearly summary scores with correlations

close to or above 0.90 and diminishing returns for larger samples of log days thereafter. For the

three quality scores, summary scores based on 10 randomly sampled days appeared to yield

reasonably precise estimates of teachers' yearly summary scores, with diminishing returns for

larger samples of log days thereafter.
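To make the resampling procedure summarized in Table 6 concrete, the following Python fragment sketches it under stated assumptions. It is not the authors' code: the data frame logs (one row per teacher, one column per logged day) is a hypothetical stand-in, and simple day means replace the actual MyiLOGS scoring rules.

import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def mean_sampled_correlation(logs, n_days, n_reps=10):
    """Mean Pearson r (over n_reps draws) between summary scores computed
    from n_days randomly sampled log days and yearly summary scores
    computed from the remaining, unsampled days."""
    corrs = []
    for _ in range(n_reps):
        days = rng.choice(logs.columns.to_numpy(), size=n_days, replace=False)
        sampled = logs[days].mean(axis=1)                     # score from the sampled days
        yearly = logs.drop(columns=list(days)).mean(axis=1)   # yearly score excludes them
        corrs.append(sampled.corr(yearly))                    # correlation across teachers
    return float(np.mean(corrs))

# Hypothetical data: 46 teachers x 160 logged days of Time on Standards values.
logs = pd.DataFrame(rng.uniform(0, 1, size=(46, 160)),
                    columns=[f"day_{d}" for d in range(1, 161)])
for n in (10, 20, 30, 40, 50):
    print(n, round(mean_sampled_correlation(logs, n), 2))

With real log data in place of the uniform noise, each printed value would correspond to one cell of Table 6.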

The correlations among quarterly summary scores provide information about the stability of these scores. As documented in Tables 7 and 8, the correlations among the same MyiLOGS

TABLE 5

Means and Standard Deviations for Quarterly and Yearly Summary Scores

                           Quarter 1      Quarter 2      Quarter 3      Quarter 4      Yearly Summary
Score                      M     SD       M     SD       M     SD       M     SD       M     SD
Time on standards          0.75  0.23     0.74  0.19     0.60a 0.21a    0.60b 0.19b    0.68  0.18
Content coverage           0.33  0.22     0.50  0.22     0.62a 0.21a    0.64b 0.14b    0.68  0.22
Cognitive processes        1.74  0.16     1.73  0.16     1.73a 0.17a    1.75c 0.19c    1.74  0.14
Instructional practices    1.66  0.18     1.60  0.21     1.61a 0.21a    1.63c 0.18c    1.62  0.19
Grouping formats           1.27  0.22     1.24  0.24     1.23a 0.21a    1.19c 0.22c    1.25  0.22

Note. N = 46 unless otherwise noted. a N = 44. b N = 12. c N = 38.


TABLE 6

Correlations Between Summary Scores Based on Sets of Randomly Sampled Days and Yearly Summary Score

                           No. of Randomly Sampled Days
Score                      Ten     Twenty    Thirty    Forty     Fifty
Time on standards          .83     .89       .92       .93       .94
Content coverage           .83     .85       .87       .86       .87

                           Five    Ten       Fifteen   Twentya   Twenty-Fivea
Cognitive processes        .73     .81       .85       .87       .84
Instructional practices    .82     .88       .89       .81       .79
Grouping formats           .89     .92       .93       .91       .89

Note. All correlations represent mean correlations based on 10 repeated random samples. All yearly summary scores were calculated without including the respective sets of randomly selected days. N = 46 unless otherwise noted. a N = 42. All correlations p < .05.

quarterly score over time were generally moderate to high for contiguous quarters and low

to moderate for noncontiguous quarters. Overall, these correlations decreased in magnitude

from the first to fourth quarter, suggesting a change in these instructional indicators as the school year progresses. Specifically, the stability of quarterly summary scores for Time on

Standards and Content Coverage decreased from each quarter to the next. Correlations related

to the fourth quarter must be interpreted with caution because only 12 teachers provided data

sets with 160 log days. For the three quality scores, the patterns between Quarter 1 and 2, 2

and 3, and 3 and 4 are fairly consistent. The correlation between the first and second quarters is moderate, whereas the remaining contiguous quarters show greater stability, with correlations

above 0.50.

TABLE 7

Correlations Among Quarterly Summary Scores for Time and Content

              Quarter 1      Quarter 2      Quarter 3a      Quarter 4b
              (Days 1–40)    (Days 41–80)   (Days 81–120)   (Days 121–160)
              TS     CC      TS     CC      TS     CC       TS     CC
Quarter 1     —      —
Quarter 2     .88    .94     —      —
Quarter 3a    .43    .76     .66    .83     —      —
Quarter 4b    .41c   .36     .69    .35     .89    .88      —      —

Note. Quarter 1, 2, 3, and 4 summary scores are based on the first, second, third, and fourth sets of 40 consecutively logged calendar days, respectively. N = 46 unless otherwise noted. TS = Time on Standards; CC = Content Coverage. a N = 44. b N = 12. All correlations except c are significant at p < .05.


TABLE 8

Correlations Among Quarterly Summary Scores for Quality

              Quarter 1      Quarter 2      Quarter 3a      Quarter 4b
              (Days 1–8)     (Days 9–16)    (Days 17–24)    (Days 25–32)
              CP   IP   GF   CP   IP   GF   CP   IP   GF    CP   IP   GF
Quarter 1     —    —    —
Quarter 2     .56  .65  .70  —    —    —
Quarter 3a    .48  .71  .40  .73  .78  .78  —    —    —
Quarter 4b    .45  .64  .25c .69  .51  .56  .83  .57  .68   —    —    —

Note. Quarter 1, 2, 3, and 4 summary scores are based on the first, second, third, and fourth sets of 8 consecutively logged detail days, respectively. N = 46 unless otherwise noted. CP = Cognitive Processes; IP = Instructional Practices; GF = Grouping Formats. a N = 44. b N = 38. c All correlations except this one are significant at p < .05.

Internal Structure

Initial evidence for the internal structure of MyiLOGS was provided by the inter-correlations

among the five OTL scores. As indicated in Table 9, the correlations between 4 of 10 score

pairs were low, falling at or below 0.30. None of the correlations exceeded 0.43. Thus, in all cases, the shared variance between any pair of scores was less than 18% (.43² ≈ .18), suggesting that each

of the five scores provides relatively unique information regarding instruction.

Relations to Other Variables

To describe relations to other variables, we examined the extent to which the OTL scores were

related to the SEC AI. Given that the AI is based on a teacher’s report for their overall class, we

used the calendar-based MyiLOGS OTL scores for Time on Standards and Content Coverage,

which also refer to the overall class. The three quality scores were based on classwide detail

days. Second, we examined the relations between the scores and average class achievement

on the Arizona state test for the 15 participating Arizona teachers, as Arizona was the only state that provided class-specific achievement data for all students in participating classrooms. Last, we calculated the

TABLE 9

Correlations Among Yearly Summary Scores

                           Time on     Content     Cognitive    Instructional   Grouping
Score                      Standards   Coverage    Processes    Practices       Formats
Time on standards          —
Content coverage           .36         —
Cognitive processes        −.16a       .14a        —
Instructional practices    .41         .32         −.36         —
Grouping formats           −.33        −.30        −.15a        −.43            —

Note. N = 46. a Correlation not significant; all other correlations are significant at p < .05.


extent to which teacher log data were in agreement with the log data of independent observers

recording the same lesson.

The correlational data did not support meaningful relations between the SEC AI and any

of the five OTL scores (see Table 10). Controlling for state (i.e., dummy codes for AZ, PA,

and SC) and subject (i.e., dummy codes for MA and RE), a regression model that included

all yearly OTL scores to predict AI resulted in partial correlations that failed to account for

more than 1% of shared variance, with p exceeding .05 in all cases. The predictive analyses indicated that one time-based and two quality-based MyiLOGS scores were related to average

class achievement. Specifically, the correlation between the yearly summary score for Time

on Standards and class achievement was r = .56, p < .05, accounting for about 31% of the

variance in average class achievement. The correlation between the yearly summary score for

Cognitive Processes and class achievement was r = .64, p < .05, accounting for about 41%

of the variance in average class achievement. Last, the correlation between the yearly summary score for Grouping Formats and class achievement was r = −.71, p < .05, accounting for

about 50% of the variance in average class achievement.
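As a sketch of the analytic steps just described, the fragment below fits the state- and subject-controlled regression and recovers a partial correlation as the correlation between two sets of residuals. It is an illustration under assumed column names (ai, state, subject, and the five OTL scores), not the authors' code, and the synthetic data stand in for the real teacher-level file.

import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n = 46  # hypothetical teacher-level file mirroring the sample size
df = pd.DataFrame({
    "ai": rng.uniform(0, 1, n),                    # SEC alignment index
    "time_on_standards": rng.uniform(0, 1, n),
    "content_coverage": rng.uniform(0, 1, n),
    "cognitive_processes": rng.uniform(1, 2, n),
    "instructional_practices": rng.uniform(1, 2, n),
    "grouping_formats": rng.uniform(1, 2, n),
    "state": rng.choice(["AZ", "PA", "SC"], n),    # dummy-coded via C() below
    "subject": rng.choice(["MA", "RE"], n),
})
otl = ["time_on_standards", "content_coverage", "cognitive_processes",
       "instructional_practices", "grouping_formats"]

# Regression of the alignment index on all five OTL scores plus controls.
model = smf.ols("ai ~ " + " + ".join(otl) + " + C(state) + C(subject)", data=df).fit()

def partial_corr(data, x, y, covars):
    """Partial correlation of x and y controlling for covars: the Pearson
    correlation between the residuals of x and y after regressing each
    on the covariates."""
    rx = smf.ols(f"{x} ~ " + " + ".join(covars), data=data).fit().resid
    ry = smf.ols(f"{y} ~ " + " + ".join(covars), data=data).fit().resid
    return rx.corr(ry)

covars = ["C(state)", "C(subject)"] + [s for s in otl if s != "time_on_standards"]
print(partial_corr(df, "time_on_standards", "ai", covars))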

To estimate the extent to which teachers’ log data represented a valid account of their

classroom instruction, we calculated agreement percentages between teachers and independent

observers on the basis of detail days at the class level related to five cognitive process expectations per standard and nine instructional practices per grouping format. Across sessions,

agreement between teachers and observers for cognitive processes per standard ranged between

27% and 100% with an average of 63%. Across sessions, agreement for instructional practices

per grouping format ranged between 64% and 100% with an average of 82%. Overall agreement

between teachers and observers across sessions ranged between 55% and 100% with an averageof 77%. In the context of prior validity research using teacher logs, Camburn and Barnes (2004)

reported agreement percentages between teachers and observers that ranged between 37% and

75% with an average agreement of 52%. The current findings are consistent with prior research,

which also featured only one subject-specific observation per teacher.
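One way to compute such percentages can be sketched as follows. The code assumes, hypothetically, that each log reduces to minutes allocated per category (e.g., a cognitive process for a given standard) and applies the 3-min confidence band described later in the Discussion; the actual MyiLOGS observation protocol may differ in detail.

def percent_agreement(teacher, observer, categories, band=3):
    """Percentage of categories on which a teacher log and an observer log
    of the same lesson agree. Entries agree when their allocated minutes
    fall within `band` of each other; a category reported by neither party
    defaults to 0 min on both sides and counts as an agreement, mirroring
    the convention noted under Limitations."""
    hits = sum(abs(teacher.get(c, 0) - observer.get(c, 0)) <= band
               for c in categories)
    return 100 * hits / len(categories)

# Hypothetical lesson with four possible category cells.
cats = ["remember/std1", "understand/std1", "apply/std1", "analyze/std1"]
teacher = {"understand/std1": 20, "apply/std1": 15}
observer = {"understand/std1": 18, "apply/std1": 25}
print(percent_agreement(teacher, observer, cats))  # 75.0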

Interobserver agreement was collected on more than 30% of all observation sessions between

two trained observers. Across sessions, agreement between two independent observers for cognitive processes per standard ranged between 67% and 100% with an average of 93%.

Across sessions, agreement for instructional practices per grouping format ranged between

TABLE 10

Partial Correlations Between Opportunity to Learn Scores and SEC Alignment Index

Score                      SEC Alignment Index
Time on standards           .06
Content coverage           −.07
Cognitive processes        −.04
Instructional practices     .05
Grouping formats           −.11

Note. N = 46. All correlations p > .05. SEC = Surveys of Enacted Curriculum.


89% and 100% with an average of 98%. Overall agreement between two observers across

sessions ranged between 85% and 100% with an average of 97%.

Consequences of Using the Measure

Results from the posttraining (Question 6) and follow-up survey (Questions 10, 11, and 13)

indicated that the use of MyiLOGS was associated with several intended consequences (see

Table 4). On average, teachers agreed that MyiLOGS was useful for supporting their com-

prehensive, high-quality coverage of the content standards and increasing their self-reflection. Their agreement for Question 6 decreased only slightly upon completion of the study and a

substantial period of nonuse. At the end of the study, teachers were also allowed to review

graphical representations of their instructional data via the MyiLOGS Report, which features

more than a dozen charts and tables. The responses suggested that teachers found these

graphical reports meaningful for improving their instruction.

DISCUSSION

Empirical information about students’ opportunity to learn the intended curriculum is critical

to instructional equity, access to the general curriculum, testing fairness, and the validity of test score inferences about teacher instruction. Despite federal directives that mandate instruction

based on the content that students are expected to know, few, if any, OTL measurement

options have been available that can be deployed at scale daily or weekly. In addition, the

potential for programmatic research leading to interventions that target malleable factors of

instruction rests upon sound conceptualization, operationalization, and measurement of OTL.

This study summarized initial evidence supporting intended score interpretations for the purpose of assessing OTL via MyiLOGS, an online teacher log. As discussed next, this initial evidence

is limited but promising for the assessment of OTL at scale.

Major Findings

Educational technologies for use by teachers must be able to demonstrate usability in authentic

educational settings (i.e., classrooms) with actual users (i.e., teachers) before evidence of reliability and validity becomes relevant. That is, teacher self-report measures that yield reliable OTL

scores and permit valid inference about teacher instruction are of little practical value if teachers

are not able to meaningfully and efficiently integrate them into their daily instructional practices.

To this end, we established usability evidence with teachers via performance assessments, user

surveys, and an 8-month posttest. These data indicated that MyiLOGS users can be trained to criterion within a relatively short time and that their proficiency can be maintained across a

substantial period of nonuse. Users who responded to our follow-up survey further agreed that

MyiLOGS provided valuable personalized feedback that could be used to improve instruction.

Available evidence in support of usability, however, remains limited to survey and user integrity

data based on a volunteer sample compensated for participation. To strengthen evidence for

usability, we recommend additional survey questions that address the issue of usability more


directly (e.g., ease of use, burden of daily use, importance of collecting OTL data) using paid

as well as unpaid users.

The quarterly summary score means remained very consistent for the Cognitive Processes,

Instructional Practices, and Grouping Formats scores. The quarterly summary score means for

Time on Standards were similar for the first and second quarters as well as for the

third and fourth quarters. Given that the Content Coverage score is cumulative, the increases

in means during the first three quarters were not surprising.

Initial evidence for the reliability of MyiLOGS scores indicated that summary scores based

on randomly sampled sets of 20 log days can function as reliable estimates of teachers'

respective yearly summary scores. This finding thus suggests that, depending on measurement

purpose (i.e., descriptive, formative), logging across the majority of the school year may not

be necessary. Estimates of score stability using quarterly summary scores for each MyiLOGS

scale ranged from high to moderate depending on the time span covered, with contiguous score quarters being more stable than noncontiguous score quarters.

Several sources of evidence were collected and used to test the validity of intended score

interpretations. First, we found that the five OTL scores along the three enacted curriculum

dimensions of time, content, and quality provided relatively independent information, account-

ing for little shared variance among the respective scores. This finding is consistent with the proposed theoretical model of OTL. Second, we found that the OTL scores were not related to the SEC's

index of curricular alignment. Given that MyiLOGS scores provide separate information on key

OTL indices that are otherwise combined in the AI, the virtually nonexistent correlations were

expected. By design, the AI establishes content overlap between the intended curriculum

(i.e., standards) and the enacted curriculum (i.e., instruction) on the basis of emphasizing the same topics using the same categories of cognitive demand. In other words, the SEC

considers emphases of content coverage and cognitive processes conjunctively for purposes

of calculating alignment between what is taught and what is expected. MyiLOGS, on the

other hand, provides separate scores for Time on Standards, Content Coverage, and Cognitive

Processes. To establish (convergent) evidence related to other variables in future studies, we

suggest the use of measures that separately assess the constructs in question: instructional time, content coverage, and instructional quality. Vannest and Parker (2010), for example, have

collected data on teachers’ instructional time via the Teacher Time Use observation instrument

that should yield convergent evidence supporting interpretations based on the Time on Standards

score. In addition, SEC data based on the Survey of Instructional Practices and the Survey of

Instructional Content could be combined to calculate indices that match the five OTL scores of MyiLOGS, thus offering the possibility of collecting convergent validity evidence.
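For context, the SEC alignment index is conventionally computed as AI = 1 − Σ|x − y| / 2, where x and y are the cell proportions of a topic-by-cognitive-demand matrix for the intended and enacted curricula (Porter, 2002). The sketch below uses hypothetical proportions; it is meant only to show why an index that combines content and cognitive demand conjunctively need not track the separate MyiLOGS scores.

import numpy as np

def alignment_index(intended, enacted):
    """Porter's alignment index: 1 - sum(|x - y|) / 2, where the two
    matrices hold topic-by-cognitive-demand proportions summing to 1."""
    assert np.isclose(intended.sum(), 1.0) and np.isclose(enacted.sum(), 1.0)
    return 1 - np.abs(intended - enacted).sum() / 2

# Hypothetical 2-topic x 2-demand matrices (rows: topics; columns: demand levels).
intended = np.array([[0.40, 0.10],
                     [0.30, 0.20]])
enacted = np.array([[0.25, 0.25],
                    [0.30, 0.20]])
print(alignment_index(intended, enacted))  # 0.85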

From a predictive validity perspective, the moderate to large correlations that the yearly

summary scores for Time on Standards and Cognitive Processes shared with class achievement

were expected, given that students are more likely to perform well on standards that they are taught

and that higher order cognitive processes can be expected more readily of students who are higher achieving. The very large negative correlation between Grouping Formats and class

achievement, however, was unexpected. One hypothesis is that small groups and individualized

instruction are often used in response to students struggling to learn academic content—

especially in the context of special education, which was a focus in this sample. Given the

small sample size and no control for students’ prior achievement, these initial correlations must

be considered preliminary and interpreted with caution.


Evidence based on classroom observations by independent observers is critical when ex-

amining the validity of score interpretations based on self-report. We found that teacher log data and observer log data provided similar accounts for the same lesson. The area of greatest

disagreement between teachers and observers involved the enacted cognitive processes per

standard. It should be noted that the observation protocol required observers to allocate instruc-

tional and noninstructional minutes for the entire allocated class time by recording cognitive

processes per standard and instructional practices per grouping format. As such, teachers and observers were able to disagree on the basis of instructional time, enacted standards,

cognitive processes, instructional practices, and grouping formats. In addition, agreements and

disagreements were not categorical but rather based on time with a 3-min confidence band.

Compared to the agreement percentages reported by Camburn and Barnes (2004), the overall

agreement percentages reported in this study were higher. To strengthen evidence related to

log data from external observers, we recommend multiple observations per teacher, ideally in the context of a generalizability study that can account for multiple sources of variance in

observational scores (Hill, Charalambous, & Kraft, 2012).

Consequential validity evidence was based on survey data, which supported the notion that

the self-recording and self-monitoring required to use MyiLOGS, especially in conjunction with

the MyiLOGS Report, had some formative instructional benefits. That is, most teachers reported that using MyiLOGS increased self-reflection and awareness regarding their own instruction

and that the MyiLOGS Report provided meaningful instructional information. However, the

findings are limited by the fact that 32% of teachers did not respond to the follow-up survey.

It is unclear whether the nonresponding teachers' responses would have been more or less supportive

of intended benefits than those of the responding teachers. To strengthen evidence based on consequences, we recommend survey questions that directly assess the intended consequence

of MyiLOGS's formative instructional benefits, such as the extent to which teachers actually

changed their instruction following a review of their MyiLOGS data. Stronger evidence would

consist of efficacy data based on an experimental design with teachers randomly assigned to a

treatment group using MyiLOGS and a well-documented control group.

Evidence for Intended Score Interpretations

Confirmation bias is a particular concern for developers of measurement tools (e.g., Haertel,

1999; Ryan, 2002). In the present study, we tried to address this concern by making our outcome

expectations clear prior to data collection, including teachers' input and feedback, and using an additional measure (i.e., the SEC) that was expected to function differently from MyiLOGS.

That being said, the existing evidence is preliminary and limited in support of the intended

score interpretations related to a teacher’s time use, content coverage, and emphases along

a range of cognitive processes, instructional practices, and grouping formats, thus restricting

current uses of MyiLOGS to low-stakes purposes for teachers and researchers.

First, available evidence supports the contention that MyiLOGS scores provide separate

information on time, content, and aspects of instructional quality related to cognitive processes,

instructional practices, and grouping formats. This finding is consistent with the underlying

theoretical model of OTL, which MyiLOGS is designed to address. However, it should be

noted that the Content Coverage score is based on an adjustable time threshold. For this

study, the time threshold was set to 1 min. Increasing the threshold to 60 min is likely to


increase the relation between time and content. Second, the scores related to instructional

quality are intended to yield interpretations about the extent to which teachers emphasized a range of cognitive processes, instructional practices, and grouping formats. However, current

scoring conventions make dichotomous distinctions (e.g., lower order vs. higher order thinking

skills), which do not permit score interpretations about range. In fact, a Cognitive Processes

score of 2.00 can be the result of a teacher exclusively emphasizing Understand/Apply or a

teacher emphasizing all three higher order cognitive processes (see the sketch after this paragraph). This limitation, however, could be resolved by considering different methods for calculating the respective scores. Third, the

most important avenues for strengthening evidence to support intended score interpretations are

(a) establishing strong relations with separate criterion measures of instructional time, content

coverage, and instructional quality, and (b) supplying improved interobserver data in terms of

breadth (i.e., more observations per teacher), depth (i.e., a generalizability study), and

precision (i.e., discrete agreement decisions for aspects of time, content, and quality).
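The scoring ambiguity flagged above can be made concrete with a toy calculation. Assuming, purely for illustration, that the Cognitive Processes score is a time-weighted mean with lower order processes weighted 1 and the three higher order categories weighted 2 (a sketch of one possible rule, not the documented MyiLOGS scoring convention), two very different emphasis profiles yield the identical score of 2.00:

# Hypothetical weights: 1 for the lower order category, 2 for each of the
# three higher order categories (category names are illustrative).
WEIGHTS = {"remember": 1, "understand_apply": 2, "analyze": 2, "evaluate_create": 2}

def cognitive_processes_score(minutes):
    """Time-weighted mean of process weights (an assumed scoring rule,
    not the actual MyiLOGS convention)."""
    total = sum(minutes.values())
    return sum(WEIGHTS[p] * m for p, m in minutes.items()) / total

narrow = {"understand_apply": 60}                                       # one higher order process only
broad = {"understand_apply": 20, "analyze": 20, "evaluate_create": 20}  # all three higher order processes
print(cognitive_processes_score(narrow), cognitive_processes_score(broad))  # 2.0 2.0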

Limitations

The initial evidence for the use of MyiLOGS and the validity of its score inferences was based

on a relatively small sample of general and special education middle school teachers. These teachers were volunteers from three states where different content standards and summative

statewide achievement tests were in use. No attempt was made to secure a representative sample

of teachers to participate in the study. This sample issue is a clear limitation to generalization

and needs to be addressed in future studies.

Another limitation stems from several methodological challenges related to the observation system. Given the possibility that a teacher can address all cognitive processes and instructional

practices in one lesson, the observation protocol allowed any categories that were neither

reported by the teacher nor observed by the observer to be counted as an agreement. This

convention may have contributed to inflated agreement percentages in certain cases. A second

methodological challenge of the observation system was the varying cell sizes by which

agreement percentages were calculated. Depending on the number of standards/objectives per lesson, the possible number of agreements/disagreements varied from teacher to teacher.

This prevented the application of alternative agreement statistics such as kappa, which could

have accounted for chance agreement. Last, the majority of teachers were observed only

once; coupled with possible reactivity to being observed, the session may not have been

representative of a teacher's typical logging practices. All observations were used to establish agreement percentages for the class only. The extent to which similar agreement percentages

can be established for student-level information remains unknown but needs to be investigated,

particularly if one's purpose is to understand access to the general curriculum for all students.

Implications for Future Research and Practice

A continued program of research on the uses and psychometric properties of MyiLOGS in a

wide range of grades and types of classrooms would enhance the current findings. Particular areas

for future research on technical characteristics include interrater reliability, validity evidence

based on content, and validity evidence based on relations to other variables, in particular

alignment indices and interim measures of student achievement. Using samples of classrooms


that are co-taught, the interrater reliability of the measure could be calculated based on Pearson

correlations between scores generated by the two teachers. To evaluate content validity, a group of experts on OTL (including teachers, assessment leaders, and researchers, among

others) could independently evaluate MyiLOGS to determine which of its indices are critical

for measuring its constituent constructs. Also, direct observation is often considered the “gold

standard” against which all other measures of teaching are judged. Although the current study

includes relations between MyiLOGS and direct observation, future research could include more observations or video coding to increase the reliability and validity of inferences drawn

from the criterion measure. Observations may also be used in the context of a multitrait–

multimethod study, which allows for the examination of convergent and discriminant validity

coefficients. Last, another study of relations to other variables could examine whether students

who have experienced low OTL (across various indices and content areas) respond differently

to increased opportunities, through intervention or instructional programming changes, than do similarly performing students who have experienced high OTL. Evidence that supports

the claim that increases in OTL lead to improved performances among those who previously

experienced low OTL would strongly support MyiLOGS as a measure of the constructs.
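As a minimal illustration of the co-teaching proposal, the interrater correlation could be computed as follows; the scores are fabricated, and scipy's pearsonr stands in for whatever statistical package is actually used.

import numpy as np
from scipy.stats import pearsonr

# Hypothetical yearly Time on Standards scores logged independently by two
# co-teachers across the same five classrooms.
teacher_a = np.array([0.72, 0.58, 0.81, 0.64, 0.70])
teacher_b = np.array([0.69, 0.61, 0.78, 0.60, 0.74])

r, p = pearsonr(teacher_a, teacher_b)  # Pearson r as an interrater reliability estimate
print(f"r = {r:.2f}, p = {p:.3f}")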

Conclusion

The importance of OTL has been apparent to stakeholders in the policy and research realm

for decades (e.g., Anderson, 1986; McDonnell, 1995; O’Day, 2004) and led to the inclusion of

voluntary OTL standards in the Goals 2000: Educate America Act (PL 103-227) and subsequent federal policies such as the access to the general curriculum mandates under the Individuals with

Disabilities Education Act (1997). Difficulties defining the concept of OTL and operationalizing

its indicators, however, have contributed, at least in part, to the failure of OTL to gain a

foothold in our current test-based accountability system. This study provided initial usability,

reliability, and validity evidence for MyiLOGS—an online teacher log measure that holds

potential for large-scale assessment of OTL and formative feedback for targeted instructional

changes. Effective measures of OTL create the opportunity to advance a number of research agendas on effective teaching,

instructional equity, and teacher professional development, as well as to help teachers learn

more about their ongoing instructional provisions.

FUNDING

This research was supported by an Enhanced Assessment Grant from the U.S. Department of Education (S368A090006). The opinions expressed are those of the authors and do not represent

the views of the U.S. Department of Education or endorsement by the federal government.

REFERENCES

Abedi, J., Courtney, M., Leon, S., Kao, J., & Azzam, T. (2006). English language learners and math achievement:

A study of opportunity to learn and language accommodation (Tech. Rep. No. 702). Los Angeles: University of

California, National Center for Research on Evaluation, Standards, and Student Testing.

American Educational Research Association, American Psychological Association, & National Council on Measure-

ment in Education. (1999). Standards for educational and psychological testing. Washington, DC: Author.


Anderson, L. W. (1986). Opportunity to learn. In T. Husén & T. Postlethwaite (Eds.), International encyclopedia of

education: Research and studies (pp. 3682–3686). Oxford, UK: Pergamon.

Anderson, L. W., Krathwohl, D. R., Airasian, P. W., Cruikshank, K. A., Mayer, R. E., Pintrich, P. R., … Wittrock,

M. C. (2001). A taxonomy for learning, teaching, and assessing: A revision of Bloom’s taxonomy of educational

objectives. New York, NY: Longman.

Borg, W. R. (1980). Time and school learning. In C. Denham & A. Lieberman (Eds.), Time to learn (pp. 33–72).

Washington, DC: National Institute of Education.

Brophy, J., & Good, T. L. (1986). Teacher behavior and student achievement. In M. C. Wittrock (Ed.), Handbook of

research on teaching (3rd ed., pp. 328–375). New York, NY: Macmillan.

Burns, M. K., & Ysseldyke, J. E. (2009). Reported prevalence of evidence-based instructional practices in special

education. Journal of Special Education, 43, 3–11.

Burstein, L., & Winters, L. (1994, June). Models for collecting and using data on opportunity to learn at the state

level: OTL options for the CCSSO SCASS science assessment. Presented at the CCSSO National Conference on

Large-scale Assessment, Albuquerque, NM.

Camburn, E., & Barnes, C. A. (2004). Assessing the validity of a language arts instruction log through triangulation.

Elementary School Journal, 105, 49–73.

Carroll, J. B. (1963). A model of school learning. Teachers College Record, 64, 723–733.

Carroll, J. B. (1989). The Carroll model: A 25-year retrospective and prospective view. Educational Researcher, 18(1), 26–31.

Connor, C. M., Morrison, F. J., Fishman, B. J., Ponitz, C. C., Glasney, S., Underwood, P. S., … Schatschneider, C.

(2009). The ISI classroom observation system: Examining the literacy instruction provided to individual students.

Educational Researcher, 38, 85–99.

D’Agostino, J. V., Welsh, M. E., & Corson, N. M. (2007). Instructional sensitivity of a state’s standards-based

assessment. Educational Assessment, 12, 1–22.

DiPerna, J. C., & Elliott, S. N. (2000). Academic competence evaluation scales: Manual K–12. San Antonio, TX:

Psychological Corporation.

Elbaum, B., Vaughn, S., Hughes, M. T., Moody, S. W., & Schumm, J. S. (2000). How reading outcomes for students

with learning disabilities are related to instructional grouping formats: A meta-analytic review. In R. Gersten, E. P.

Schiller, & S. Vaughn (Eds.), Contemporary special education research: Syntheses of the knowledge base on critical

instructional issues (pp. 105–135). Mahwah, NJ: Erlbaum.

Elliott, S. N., & Gresham, F. M. (2008). Social skills improvement system. San Antonio, TX: Pearson.

Gersten, R., Chard, D. J., Jayanthi, M., Baker, S. K., Morphy, P., & Flojo, J. (2009). Mathematics instruction for

students with learning disabilities: A meta-analysis of instructional components. Review of Educational Research,

79, 1202–1242.

Haertel, E. H. (1999). Validity arguments for high-stakes testing: In search of evidence. Educational Measurement:

Issues and Practice, 18(4), 5–9.

Herman, J. L., & Abedi, J. (2004). Issues in assessing English language learners’ opportunity to learn mathematics

(CSE Report No. 633). Los Angeles, CA: University of California, Center for the Study of Evaluation.

Herman, J. L., Klein, D. C., & Abedi, J. (2000). Assessing students’ opportunity to learn: Teacher and student

perspectives. Educational Measurement: Issues and Practice, 19, 16–24.

Hill, H. C., Charalambous, C. Y., & Kraft, M. A. (2012). When rater reliability is not enough: Teacher observation

systems and a case for the generalizability study. Educational Researcher, 41, 56–64.

Husén, T. (1967). International study of achievement in mathematics: A comparison of twelve countries. New York,

NY: Wiley & Sons.

Individuals with Disabilities Education Act Amendments of 1997, 20 U.S.C. §§ 1400 et seq.

Karger, J. (2005). Access to the general education curriculum for students with disabilities: A discussion of the

interrelationship between IDEA and NCLB. Wakefield, MA: National Center on Accessing the General Curriculum.

Karvonen, M., Wakeman, S. Y., Flowers, C., & Browder, D. M. (2007). Measuring the enacted curriculum for students

with significant cognitive disabilities: A preliminary investigation. Assessment for Effective Intervention, 33, 29–38.

Kurz, A. (2011). Access to what should be taught and will be tested: Students’ opportunity to learn the intended

curriculum. In S. N. Elliott, R. J. Kettler, P. A. Beddow, & A. Kurz (Eds.), Handbook of accessible achievement

tests for all students: Bridging the gaps between research, practice, and policy (pp. 99–129). New York, NY: Springer.

Kurz, A., Elliott, S. N., Lemons, C. J., Zigmond, N., Kloo, A., & Kettler, R. J. (2014). Assessing opportunity-to-

learn for students with disabilities in general and special education classes. Assessment for Effective Intervention.

Advance online publication. doi:10.1177/1534508414522685

Kurz, A., Elliott, S. N., & Shrago, J. S. (2009). MyiLOGS: My instructional learning opportunities guidance system.

Nashville, TN: Vanderbilt University.


Kurz, A., Elliott, S. N., Wehby, J. H., & Smithson, J. L. (2010). Alignment of the intended, planned, and enacted

curriculum in general and special education and its relation to student achievement. Journal of Special Education,

44, 131–145.

Kurz, A., Talapatra, D., & Roach, A. T. (2012). Meeting the curricular challenges of inclusive assessment: The role

of alignment, opportunity to learn, and student engagement. International Journal of Disability, Development and

Education, 59, 37–52.

Marzano, R. J. (2000). A new era of school reform: Going where the research takes us (REL No. RJ96006101).

Aurora, CO: Mid-continent Research for Education and Learning.

Mayer, R. E. (2008). Learning and instruction (2nd ed.). Upper Saddle River, NJ: Pearson.

McDonnell, L. M. (1995). Opportunity to learn as a research concept and a policy instrument. Educational Evaluation

and Policy Analysis, 17, 305–322.

Nielsen, J. (1994). Heuristic evaluation. In J. Nielsen & R. L. Mack (Eds.), Usability inspection methods (pp. 25–60).

New York, NY: Wiley & Sons.

No Child Left Behind Act of 2001, 20 U.S.C. §§ 6301 et seq.

O’Day, J. A. (2004). Complexity, accountability, and school improvement. In S. H. Fuhrman & R. F. Elmore (Eds.),

Redesigning accountability systems for education (pp. 15–43). New York, NY: Teachers College Press.

Pianta, R. C., & Hamre, B. K. (2009). Conceptualization, measurement, and improvement of classroom processes:

Standardized observation can leverage capacity. Educational Researcher, 38, 109–119.

Polikoff, M. S. (2010). Instructional sensitivity as a psychometric property of assessments. Educational Measurement:

Issues and Practice, 29, 3–14.

Polikoff, M. S., Porter, A. C., & Smithson, J. (2011). How well aligned are state assessments of student achievement

with state content standards? American Educational Research Journal, 48, 965–995.

Porter, A. C. (1991). Creating a system of school process indicators. Educational Evaluation and Policy Analysis, 13,

13–29.

Porter, A. C. (2002). Measuring the content of instruction: Uses in research and practice. Educational Researcher, 31,

3–14.

Porter, A. C., Polikoff, M. S., Zeidner, T., & Smithson, J. (2008). The quality of content analyses of state student

achievement tests and content standards. Educational Measurement: Issues and Practice, 27, 2–14.

Roach, A. T., Chilungu, E. N., LaSalle, T. P., Talapatra, D., Vignieri, M. J., & Kurz, A. (2009). Opportunities and

options for facilitating and evaluating access to the general curriculum for students with disabilities. Peabody Journal

of Education, 84, 511–528.

Roach, A. T., Niebling, B. C., & Kurz, A. (2008). Evaluating the alignment among curriculum, instruction, and

assessments: Implications and applications for research and practice. Psychology in the Schools, 45, 158–176.

Rowan, B., Camburn, E., & Correnti, R. (2004). Using teacher logs to measure the enacted curriculum: A study of

literacy teaching in third-grade classrooms. Elementary School Journal, 105, 75–101.

Rowan, B., & Correnti, R. (2009). Studying reading instruction with teacher logs: Lessons from the Study of

Instructional Improvement. Educational Researcher, 38, 120–131.

Ryan, K. (2002). Assessment validation in the context of high-stakes assessment. Educational Measurement: Issues

and Practice, 21, 7–15.

Slavin, R. E. (2002). Evidence-based education policies: Transforming educational practice and research. Educational

Researcher, 31, 15–21.

Stevens, F. I. (1993). Applying an opportunity-to-learn conceptual framework to the investigation of the effects of

teaching practices via secondary analyses of multiple-case-study summary data. Journal of Negro Education, 62,

232–248.

Vannest, K. J., & Hagan-Burke, S. (2010). Teacher time use in special education. Remedial and Special Education,

31, 126–142.

Vannest, K. J., & Parker, R. I. (2010). Measuring time: The stability of special education teacher time use. Journal of

Special Education, 44, 94–106.

Vaughn, S., Gersten, R., & Chard, D. J. (2000). The underlying message in LD intervention research: Findings from

research syntheses. Exceptional Children, 67, 99–114.

Vaughn, S., Levy, S., Coleman, M., & Bos, C. S. (2002). Reading instruction for students with LD and EBD: A

synthesis of observation studies. Journal of Special Education, 36, 2–13.

Wang, J. (1998). Opportunity to learn: The impacts and policy implications. Educational Evaluation and Policy

Analysis, 20, 137–156.
