TRANSCRIPT
STUDENT LEARNING OUTCOMES ASSESSMENT AT NYIT’S GLOBAL SITES
As anticipated, the activities of committees on global campuses focused on local priorities.
However, due to the phase-out of two campuses and, in some cases, political unrest, no
programmatic assessment activities took place at Amman (Jordan) or at Manama (Bahrain).
The list below gives highlights for each of the other campuses.
Abu Dhabi, UAE
Due to significant personnel changes for this campus, the Assessment Committee was
simplified to the campus dean, the assistant deans, and the Director for Institutional
Research and Assessment.
Moreover, for the entire fall semester, there was no local leadership for the School of
Management, which represents 86% of student enrollment at this site. As a result, there
was no discussion of assessment at the beginning of the year.
The Assessment Committee held its opening meeting in November 2011, but through the fall
and the start of spring, its work focused on harmonizing degree maps and other high-priority
tasks (scheduling and advising).
By the end of March, faculty in the School of Management requested that the Assessment
Committee postpone Assessment Day until fall 2012. An October date is now
envisioned.
The Arts & Sciences faculty participated actively in the fall 2011 collection and analysis
of data for the institutional assessment of the core Foundation of Scientific Process
course.
Vancouver, Canada
The Vancouver campus MBA programmatic assessment process was initiated in 2008.
The current round of learning outcomes assessment includes a new level of assessment
activity. Localized empowerment in Vancouver, as at all other campus locations, comes
through the oversight of the local Associate Dean, who works collaboratively with the
DAD (Department Assessment Director) to effect change at the Vancouver campus.
Specifically, the Vancouver Associate Dean and DAD regularly convene local faculty
to discuss specific program outcomes (both direct and indirect) at the
Vancouver campus and recommend changes to the curriculum through the
contextualized element in the Master Syllabi. That is, Vancouver faculty and personnel
have autonomy in adopting and revising contextualized elements that not only reflect
stakeholder interests, but also help to strengthen program outcomes in Vancouver in ways
that address the limitations identified in the program outcomes at that location.
All faculty members, as a requirement of their individual educational responsibilities,
maintain course portfolios that include samples of student learning outcomes. In this way,
faculty demonstrate the appropriateness of their scoring and validate the implementation of the
assessment process through the coursework they administer. The collection of scores in the
Goal Validation Scores system for the 2011-12 academic year has not yet been
completed.
As indirect measures, an employer survey and an alumni survey were implemented in
spring and summer 2012 in conjunction with course learning outcome assessments.
In addition, with the introduction of the revised MBA in fall 2011, administration of the
ETS Major Field Test in business at the master’s level was made mandatory for all
entering and graduating students. This provides a systematic method of comparing
students’ subject-matter competencies at the beginning and end of their educational
experience. Results were analyzed, and recommendations made to the School of
Management leadership.
Nanjing, the People’s Republic of China
The newly established Assessment Committee in Nanjing held regular meetings during
2011-2012, and launched a programmatic assessment exercise during fall 2011.
The Committee agreed that the specific outcome to be assessed in this study be chosen
from the BS in Computer Science goals (“Upon graduation students are expected to
demonstrate” … “An ability to use a variety of computer programming languages and be
competent at least in one high level language.”).
The specific methodology for the Nanjing project integrated the assessment of the
program outcome into the usual course examinations for the entire sequence of three
courses. The Committee designed an assessment instrument using SQL (Structured
Query Language) techniques. From the student perspective, the only immediately visible
change was that a new “examination” was given in each course.
Preliminary analysis was done, reported, and shared with all faculty during an Assessment
Day held on April 27, 2012. The complete analysis, including all the June 2012 results,
is underway and will be available over the summer.
In addition, during the Assessment Day, faculty also shared their experiences with
"cheating" and discussed cheating and plagiarism policies.
Finally, the Arts & Sciences faculty participated actively in the fall 2011 collection and
analysis of data for the institutional assessment of the core Foundation of Scientific
Process course.
Report on assessment in Vancouver AY 2011-12
MBA program
Based on input from our stakeholders (stakeholders’ conferences as well as business advisory
board member meetings), we added contextualized learning goals for 13 of the non-waivable
courses as well as for some elective courses. Those
contextualized learning goals and their learning validations and scores are unique to our
location.
Example:
Added contextualized learning goal in FINC 620:
(B) Contextualized (Globalized) Learning Goal(s): Upon the successful completion of this course, the student will be able to:
1. Summarize current events regarding ethical financial decision making and reporting in Canada including the differences of perspectives between non-Canadian and Canadian companies.
Added assurance of learning validation for additional learning goal (quiz) with relevant scores:
Assurance of Learning Validation (In support of the Contextualized (Globalized)
Learning Goal(s)):
B1. Graded Quiz: An additional quiz will be assigned on a Canadian current event.
For the purpose of assurance of learning, the Graded Quiz will be assigned 2
scores based on the:
a. Score 1: Demonstrated ability to put a global perspective on class materials (MBA-INTERNATIONAL); and
b. Score 2: Incorporation of current ethical consideration in matters of financial choices and financial reporting (MBA-ETHICS).
Summary of Assessment Activities at the Nanjing Campus
(Relevant to the Preparation of the President's Report)
1. Informal Introduction
2. The Longitudinal Study at the Nanjing Campus
3. The Targeted Program Outcome
4. Unnatural Language Acquisition
5. Design of the Assessment Instrument
6. Natural Language Considerations
7. Assessment Day Discussion
Informal Introduction
Generally, assessment is concerned with student progress. That's a deliberate
oversimplification, but experience with introducing these kinds of studies to faculty,
administration, and outside funding resources suggests that much simplification is necessary.
Much of the material is deeply abstract and, at times, open to misinterpretation. We want to
avoid that misinterpretation. Even the more accepted tenets require a gentle introduction and
frequent restatement.
Another necessary introductory component is putting the material into a proper context.
Ultimately, any university is simply a complex entity. Smaller things within larger things.
Different perspectives from different participants. To make all of this material digestible
for a general audience, we quickly acknowledge a few communication difficulties. Discussions
of expectations with alumni, faculty or students can be rather far reaching. Faculty and
students are extremely sensitive to any details that may affect them directly. It takes time to
interpret the pragmatic day-to-day classroom experience in terms of any expressed goals or
ideals, and even more time to properly connect to a shared, reasonable framework to improve
the teaching/learning proposition.
Briefly, there are a lot of concurrent goals and objectives.
Here, the larger direction is meeting an “assurance of excellence” criterion, and we're specifically
concerned with NYIT's 2030 vision.1 Quoting from that material:
The vision is that, by 2030, NYIT will be:
Known for its career-oriented undergraduate and unique and distinctive
graduate and professional programs;
Known for its thriving graduate centers featuring interdisciplinary research,
degree programs, and "best-in-class" work in a small number of highly targeted
niches;
Known as a global and partially virtual university with NYIT in New York as its
quality hub;
Known as a model student-centered university;
Known as a leader in teaching quality improvement; and
Known as a well-funded institution, with dependable revenue from a variety of
sources.
The approach taken to realize these kinds of ideals is pragmatic. “NYIT has systematic,
coordinated processes overseen by a steering committee and vice president to set institutional
targets, monitor results, and use those results to inform decision-making and resource
allocation.”2 There are also three additional views on assessment that concern us here:
1. NYIT's Institutional Assessment Plan3 requires, “School and program level outcomes
assessment (annual) designed by deans and program faculty; results and an
improvement action plan developed; plans and results reviewed by the Assessment
Committee of the Academic Senate.”
2. The Assessment Committee of NYIT's Academic Senate defines "Student Learning
Outcome Assessment" as, "The intentional identification, collection, analysis, and use of
data to improve teaching and learning."
3. The relevant standard set by NYIT's regional accrediting agency, the Middle States
Commission on Higher Education, states, "Assessment of student learning demonstrates
that, at graduation, or other appropriate points, the institution's students have
knowledge, skills, and competencies consistent with institutional and appropriate higher
education goals."
1 NYIT 2030: Setting Directions, Meeting Challenges, New York Institute of Technology Strategic Plan, May 2006
The “assurance of excellence” criterion is forward-looking. Assessment activities reference many
shared ideals, as articulated through various NYIT publications, including the three course
catalogs4 and numerous departmental directives outlining the expectations for each course and
program. Yet, in simple terms, the institution is committed to a self-examination process which
continually seeks methods to improve our offerings, standings, etc. The style of the approach is
significant. This expression was part of the presentation for faculty during the assessment day
meeting at the Nanjing campus:
We need to emphasize the relative value of systemic, data-driven approaches to
institutional self-assessment as opposed to more immediate, anecdotal approaches to
institutional self-assessment. In simple terms, “one of my students did this” should have
less impact on resource planning than “30% of our students did this.”
Anecdotal material isn't irrelevant; it simply should have far less weight on long-term
planning. Of course, there are other dimensions of concern. Sufficiently egregious
material of one type or the other may require a more immediate response.
The university has adopted a data-driven approach to planning based upon continuing research
into the current strengths and weaknesses within the overall curriculum. Ultimately, all
universities have limited resources; effective planning requires priorities; and priorities, at the
scale of complexity involved, are best derived from properly-qualified, quantifiable studies of
what is needed.
While the target of such research addresses the gestalt of teaching and learning activities at
NYIT, best management practices for this effort require a balanced, comprehensive treatment
without unjustified weight on particular areas. To that end, there is a shared framework for
NYIT's self examination. Crystallizing vague ideals into specific tangible expectations allows
systemic improvement. This is critically important to managing quality. The explicitly stated
learning outcomes for courses and degrees provided through the faculty senate5 offer the
formal model of expectations at NYIT.
Before we move to a finer-grained view of assessment activity at the Nanjing campus, another
objective is worth mentioning, “Do this as well as possible. Create an example for others to
follow.”6 The interpretation of this particular goal for this report includes a mandate that the
goals, procedures, and reasoning are all intelligible to an audience of NYIT faculty and
administration.
The last goal, making the material accessible, implies something more than merely not
requiring members of the audience to switch disciplines. The methodology has to be open and
inclusive. Discussions need to be able to proceed without shared expertise in a particular
discipline. We also need to reach out to those who are new to the usual tools of assessment as
well as those who are new to the policies and procedures of NYIT. Ultimately, we're reaching
for the common ground.
To that end, we must acknowledge certain important perspectives and distinctions. A research
professor in chemistry has a very different reading list than a research professor in French
literature of the seventeenth century. There isn't much overlap. Briefly, the distinction extends
to teaching and learning. The way one measures the acquisition of scientific concepts related
to chemistry may be very different than how one goes about grading a student's critical
appreciation of aesthetics. From the point of view of a proponent of any one discipline, the
details of their particular field of study may come first in importance. Yet, as a self-evident
claim, scheduling for courses and examinations falls into a category of common concerns for all
faculty. So does assessment.
As a concession to the obvious, the last goal could have been interpreted with an entirely
different emphasis. Education, especially higher education, is an entire discipline. Assessment
itself is the subject of numerous peer-reviewed articles.7 The technical and statistical terms are
sometimes overloaded and many are highly specific to distinct disciplines and/or models of
learning. Not every assessment activity warrants a book chapter or journal article; however,
there are many standards involved in the design process as well as many potential critics for
any study. Further, pedagogical studies have become almost a separate sub-discipline within
many fields. For example, there are separate learning models for STEM (Science, Technology,
Engineering, Mathematics) programs, and much detailed material concerning specific courses
and programs.
This kind of experiment, a longitudinal study, can be repeated at any university. How we
proceed is general. What we're measuring is more specific to NYIT. The assessment directives
are derived from NYIT's framework of learning outcomes.
The Longitudinal Study at the Nanjing Campus
The method chosen for the current project at Nanjing is a longitudinal study. Simply, students
are assessed at different points in the learning process. Early in the design process, discussion
began with a simple pre- and post-analysis in which students are assessed both before and after a
particular learning activity. Either type of assessment activity offers a baseline. We might vary
the classroom environment, try the same assessment, and see how making particular changes
affects subject material acquisition. Multiple probes, as opposed to a simple before-and-after
portrait, offer insight into the rate of acquisition.
The specific outcome targeted by this study is, “Upon graduation students are expected to
demonstrate” … “An ability to use a variety of computer programming languages and be
competent at least in one high level language.”8 While not currently available to the public on
the primary website for the department, there are thirteen specific learning outcomes involved
in the B.S. in Computer Science offered by the School of Engineering and Computing
Sciences. These may be found in the reports at 9. That same source offers course goals and
a matrix of relationships.
The particular learning outcome selected here is program driven. Together with the twelve
other specific learning outcomes, it begins to describe the institution's expectations for its
graduating seniors. Carefully, these are not the only public expressions of the goals and ideals.
“The primary educational objective of the computer science program at NYIT is to produce
well-rounded graduates that have a wide range of skills, aptitudes, and interests, and who are
prepared for successful careers in industry and government and/or graduate studies. The
coursework in computer science--comprehensive, rigorous, flexible and prepares students to
solve real-world problems as well as to do fundamental research [sic] – includes:
cover the software and hardware aspects of computer science and foster creativity in
problem-solving skills
liberal arts courses [sic]
courses which provide a concentration of computer classes in an area of specialization
a minor concentration which provides a complement to the option area [sic]”10
Mildly, and without comment, there is a much larger picture as well. Computer Science
curricula have been the subject of a joint ACM-IEEE study.11 At NYIT, connections between the
targeted learning outcome and specific course learning outcomes are available at 12.
Connections to core competencies are available at 13. The overall implied structure is broad,
reasonably mature and well-reasoned. The program outcomes are broadly stated not only to
allow ease of navigation through the framework for a nonspecialist, but also to allow the
deeper expansion and interpretation by any concerned member of the discipline.
Ultimately, the stated learning outcomes at NYIT have the virtues of being short, simple, and
well-organized. Working within larger ideals is understood.
The framework keeps us on track.
To begin to make sense of the relationships for this study, we need to emphasize a necessary
distinction between learning outcomes stated for a particular course and the learning outcomes
stated for an entire degree program. In this case, the assessment study has a single targeted
program outcome. This was selected from the thirteen distinct program outcomes. In the
Nanjing study, there is a sequence of three specific courses connected to this outcome where
the details are sufficiently stable to allow repeated evaluation with similar instruments.14 The
sequence of courses is CSCI 300 Database Management, CSCI 401 Database Interfaces and
Programming, and CSCI 405 Distributed Database Systems. As may be expected, there are
deep connections between the learning outcomes for each course and those for the overall
program. In this particular instance, each course supplies a portion of the practice necessary to
reach competency/mastery. The critical detail that makes this study possible is that all three
courses rely upon SQL, the Structured Query Language.
As a significant side issue for Computer Scientists, SQL is not a Turing-complete
programming language in the usual sense. It lacks a looping construct. It does,
however, reflect a depth of atomic, pedagogical concerns that arise with more
traditional languages. It appears within a programming context, and it could be
argued that SQL currently has a more durable and sustained basis than another often-
recommended language, C++.15
As a useful feature, SQL does lend itself to concise, measurably correct-or-incorrect
responses to problem/task questions offered in a natural language. For example,
given an appropriate shared database design, we can ask a question like, “Find the
ratio of male sophomores to female sophomores among those taking business
courses.” The expected answer can be given in a single SELECT statement. More
complex and deeper probes, all with a brief, precise, and unambiguously correct or
incorrect solution, are possible.
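For illustration only, a minimal sketch of such a single-statement answer follows; the schema (Student, Enrollment, Course), its column names, and the 'M'/'F' coding are invented here and are not part of any course instrument.
-- Hypothetical schema: Student(id, gender, class_standing),
-- Enrollment(student_id, course_id), Course(id, department).
-- Assumes at least one female sophomore is enrolled, to avoid division by zero.
SELECT COUNT(DISTINCT CASE WHEN s.gender = 'M' THEN s.id END) * 1.0
     / COUNT(DISTINCT CASE WHEN s.gender = 'F' THEN s.id END) AS ratio
FROM Student s
JOIN Enrollment e ON e.student_id = s.id
JOIN Course c ON c.id = e.course_id
WHERE s.class_standing = 'sophomore'
  AND c.department = 'Business';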
Directly, the first course at NYIT requires that students use the SQL language to design
databases. The second course requires them to build user interfaces. In order to
accomplish any task related to building an interface, students must use SQL
programmatically, not only embedding SQL statements within a language such as C,
C++, Java, or php, but also using the outer language to construct these statements. (In
the catalog description, C++ is stated explicitly.) The third course requires them to
understand and implement techniques appropriate to distributed databases.
Particular implementations, such as MySQL, do allow synchronization support within
the design; however, for a robust understanding of Brewer's Theorem, the students
should have further practice building and comparing synchronization algorithms in a
more general programming context.
The specific methodology for the Nanjing project integrates the assessment of the program
outcome into the usual course examinations for the entire sequence of three courses. From
the student perspective, the only immediate visible change is that a new “examination” is given
on the first day in each course. We expect a full statistical portrait of student acquisition
through the sequence of twelve probes. When producing the final analysis, we might rely upon
the vast amount of collected literature for many things --- including myriad details specific to
this discipline. Auxiliary information, perhaps determined by other course activities, can also be
integrated into the results to explain differing rates of acquisition.
For members of the audience concerned with developing new assessment efforts in other
disciplines, it does seem reasonable to expect similar relationships between course and
program goals. Commenting on the vast amount of peer-reviewed papers on subject-matter
acquisition in higher education (or making sweeping generalizations about all the details of
learning outcomes for other courses and degrees) is far beyond the scope or intent here.
Briefly, there is nothing about the general methodology which is specific to a particular
program or course. That is, the overview for a longitudinal assessment activity really doesn't
change in application to a particular area. Specific features do --- without being dismissive of
those aspects as mere details.
Yet, loosely, it can be expected that many of the individual course events simply build toward
the targeted program outcome.
A longitudinal study simply monitors the rate of acquisition. There is much in this kind of
study that is experimental. For example, in advance of the results, we cannot say whether or
not students acquire proficiency suddenly or gradually. We might expect that hands-on
practice facilitates faster learning, but that kind of claim would be best supported as
a result of the study. It should not be assumed as an axiom beforehand.
For the first course, the planned sequence includes a pre-evaluation, a post-evaluation and two
interim probes (for a total of four assessments). The first is given under examination
conditions; however, the students are not required to study and are requested NOT to study.16
The last three assessment exercises are presented as examinations, with grades at risk. The
atomic questions are similar. By design, each probe revisits the same set of issues with similar
questions, close in complexity, form, etc., but altered sufficiently to avoid direct memorization.
For the purpose of generating meaningful gain statistics, there are direct parallels, a bijection
between the set of questions given during one probe and those on the next.
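As an illustration only (the study has not committed to any particular statistic), matched items make simple gain measures possible. If $p_k(q)$ is the proportion of students answering the matched version of question $q$ correctly on probe $k$, one common normalized gain between consecutive probes is
$g_k(q) = \frac{p_{k+1}(q) - p_k(q)}{1 - p_k(q)}$,
that is, the improvement achieved expressed as a fraction of the improvement still available after probe $k$.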
During the assessment day activities at Nanjing, two significant, specific questions were raised
at this point in the presentation. First, what mechanisms were in place to prevent “teaching to
the test” or some other inappropriate focus on the program goals?
The “teaching to the test” concern is rather general and has a valid basis. There are four
assessments per course, and three courses. The students will have seen eleven previous
versions by the time they take the final assessment.
If the tests were identical, we would expect perfect results from almost every student ---
as a result of memorization rather than understanding. Here, more-or-less the same
methodology used to prevent cheating/copying during an exam is employed, and it
should be noted that cultural conditions in Nanjing require heroic efforts on the part of
the instructor to prevent inappropriate communication during examinations. Significant
monitoring is required. The students are placed in arranged seating, widely separated,
and all mechanisms for communication --- particularly electronic devices --- are
forbidden. All exam papers are produced using software, and each paper is unique. The
name of each student is preprinted, the questions are scrambled, and many small
details, not affecting the depth of thought required, are changed from student to
student.
The correspondence between instruments offered on the same day for different
students is necessarily precise. The dominating issue there is fairness. The
correspondence between instruments offered on different days for the same students is
less precise. In both cases, similarity between questions is required to make the results
comparable. Ultimately, each probe is not the same.
In fact, some questions evolve in form or difficulty.
It is unreasonable to expect a student on the first day of the first course to explain
connections between elements that have not yet been presented. We expect very
little during the first probe. However, it is reasonable to ask for a writing sample --- to
determine something about how each student might organize their thoughts for
presentation. The ability to express themselves on the first day can be compared to
their ability to express an answer on the last day.
Students will, of course, note the variations and prepare themselves according to what
they understand. There's nothing unfortunate or undesirable about this. We expect
them to learn how to handle general questions.
Second, what motivation existed for the students to take the initial --- pre-instruction ---
assessment seriously?
The answer to the second question is: nothing. No motivation is really appropriate. As a
side note, students were excited about being given the first probe. Some of this may be
individual or cultural; however, the explanation given to the instructor was that they
were pleased to be given an early insight into what was ultimately expected.
For the very first assessment, not much is really expected. Students focus on writing
samples and study the format of the questions. Curiosity, if nothing else, is invoked.
Outside of these very simple things, receiving a mostly blank examination paper from each
student is not unexpected. In subsequent courses, we expect that the examination on the
first day will affirm that students start about where they left off in the previous course.
As always, students tend to surprise us occasionally. There are many precedents for
students studying over the holidays. There may be surprises the other way --- as the
practice of “cramming” for an exam often yields little long term gain.
The observed results, for the very first assessment, mirrored exactly what was
anticipated --- although a few students tried interesting ways of “faking” answers to
questions beyond their abilities.
The Targeted Program Outcome
Again, the specific outcome targeted by this study is, “Upon graduation students are expected
to demonstrate” … “An ability to use a variety of computer programming languages and be
competent at least in one high level language.”17 This outcome is attached to many courses.
This makes it appropriate for an institutional study. The perspective here may be roughly
stated as follows:
The learning objectives for individual courses are important. Yet we typically assume
these to be well-served by individual courses. A more appropriate target for
institutional concerns is the set of goals and objectives which span multiple courses. In the
broader scope, we're merely acknowledging the existence of skills which are acquired
through a long series of courses (or through an entire degree program).
In the formal model offered by NYIT, these are clearly distinct. Without detail, there are
dependencies. Prerequisite course requirements have a significant purpose.
Within the Computer Science program, there are many specific courses connected to specific
programming languages.
Again, the sequence of courses selected for the study is CSCI 300 Database Management,
CSCI 401 Database Interfaces and Programming, and CSCI 405 Distributed Database Systems.
The selection of this particular sequence of courses simply reflects the
longest course sequence in which the same formal language is used repeatedly and without
controversy. The language under consideration is SQL (Structured Query Language). Students
are expected to learn related theoretical material. They're also expected to use the language
pragmatically.
As a necessary aside for those unfamiliar with programming, we offer a brief overview of the
targeted skill.
Unnatural Language Acquisition
Ultimately, linguistic analogies between acquiring a programming language and acquiring a
natural language break down rather quickly.
Typically, programming languages have an extremely small number of well-defined
keywords. Within SQL, the first four are SELECT, INSERT, UPDATE, and DELETE. CREATE and
ALTER statements follow quickly. The first aspects of grammar are extremely well-defined,
with detailed atomic interpretations. There are WHERE clauses, conjunctions, etc. which
allow the recursive construction of larger statements.
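As a small, self-contained illustration (the table and its columns are invented for this sketch and do not come from any course instrument), these keywords and clauses compose as follows:
-- Define a table, populate it, revise it, query it, and remove a row.
CREATE TABLE Student (id INT, name VARCHAR(50), class_standing VARCHAR(20));
INSERT INTO Student (id, name, class_standing) VALUES (1, 'Li Wei', 'sophomore');
UPDATE Student SET class_standing = 'junior' WHERE id = 1;
SELECT name FROM Student WHERE class_standing = 'junior' AND id = 1;
DELETE FROM Student WHERE id = 1;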
Students typically view the machine interpretation and related constraints as “unforgiving.” In
contrast to natural languages, a misplaced semi-colon renders a long passage meaningless or,
at times, dangerous.
The range of correct responses to any question in this area may be interesting. In brief,
computer programming languages have an extraordinary linguistic flexibility as well as a
capacity for abstraction that defies simple expression. We can be a bit more technical for
that part of the audience in disciplines with overlapping skills. Up to a level of identifiers,
many programming languages are covered by the usual Chomsky hierarchy. In fact, many are
context-free. This makes them appear simple. The apparent simplicity is convenient
and forms the basis for lexical analysis and parsing. The human reality is something else. As we
begin to include identifiers, every programming language with significant power takes on a
generative aspect which dominates practical use. In brief, the language has the expressive
power to define, effectively, new nouns. (This is an analogy, but it works well enough.)
Reading becomes an exercise in decoding the references, impossible to do quickly unless one
can quickly absorb and discard entire vocabularies. With some programming languages, the
generative capacity is effectively unlimited. Not only do we have the equivalent of new
nouns, but also verbs, clauses, and, to a very large extent, entirely new grammars.
Effectively, in a single step, we move far beyond Chomsky and functional grammars. As an
example, template programming arose out of an unsuspected capacity of generic
programming to express an algorithmic language within itself, causing the compiler to
execute arbitrary tasks. Handling and understanding the expressive power requires a range of
skills. We can even redefine the usual infix notation of mathematics, approaching abstract
graduate-level algebra from another direction.
Certain principles and best practices do force a kind of convergence to particular expressions.
As a result, some issues of originality aren't easily evaluated. Most reasonable responses by
professionals to the task, “Design a brief database for handling a music collection,” would tend
to be very similar in appearance. There are only so many meaningful synonyms for artist. It is
generally true that variable names are not visible to the end user, and that one could substitute
nonsense words for well-selected variable names and a working answer would continue
to work. However, it is a fallacy that issues in communication can be ignored entirely --- even
if the only objective is getting an application to function as specified.
Larger issues are at play. One of the most difficult requirements for beginning
programmers to master is the absolute, critical necessity of explaining what they've
created.
Almost all programming languages offer a facility for commenting. That is, arbitrary
material expressed in a natural language can be placed within the code to explain
what it is and what it does. Stylistics intrinsic to the programming language also begin
to appear early on with formatting and name selection. Much of this is simply not
optional. Literate programming, as championed by Donald Knuth, discourages calling
variables, functions, and subroutines A, B, C, etc. Meaningful names are required.18
When they do not follow these kinds of guidelines, students produce naturally
obfuscated material.
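A tiny SQL illustration of the contrast (both tables are invented for this sketch):
-- Obfuscated: the names reveal nothing about what is stored.
CREATE TABLE T1 (a INT, b VARCHAR(100), c INT);
-- Literate: meaningful names and comments carry the intent.
CREATE TABLE Recording (
    id INT PRIMARY KEY,      -- surrogate key
    title VARCHAR(100),      -- track title
    duration_seconds INT     -- running time, in seconds
);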
Even with an early emphasis on teaching commenting, novice programmers routinely
face a critical stage where they have difficulty reading and understanding their own
programs.
There is a glass ceiling in place. They cannot improve their own results beyond a
certain level until they can read their own results.
The necessity of interaction with natural language is immediately apparent on another
level. As instructors, we have to specify tasks. That is, we begin an assignment with “Write a
program which . . .”19 There is some typical stylization and a certain rigor20 as to how this is
done; however, the assignments are ultimately given in a natural language.
For novices, there just is no other place to start.
The machine may be unforgiving, but the human audience is much harder to reach. As
beginning programmers reach toward handling challenges which are not designed to teach,
this particular issue is replete with many difficulties. Communicating algorithms, negotiating
specifications, etc. are overarching skills which connect many courses. It is fair to say that the
ability to read and write computer programs has a very strong relationship with the ability to
read and write in some natural language. Arguably, the ability to read and write
mathematical proofs may be a more precise analogical description of the overall skill. In fact,
there are numerous connections with problem solving.
Neither is entirely accurate.
In order to begin programming, the students do have to be able to read a task specification,
often much shorter than a paragraph. This may be deliberately vague. For example, “Design
a database to store the contents of your music collection.” They have to read and analyze
that request with appropriate tools of language in order to be effective. (An example of noun
decomposition will follow in another section.) Task specifications often appear to be natural
language. They aren't --- not in the usual sense. Normal vocabulary is used differently and
understood differently.
For example, the concept of age is interesting, and is often used as an example in
database design and normalization.
Storing the age of an individual is bad practice. In brief, the number changes with
time. The data becomes obsolete and questionable --- even over a short period of
time. Storing the birth date is more appropriate. With that information, age can be
calculated. Updating and refreshing the information is not needed. The data remains
durable.
A human being has exactly one birth date; however, a requirement for this
information in the wrong place may introduce insertion anomalies. That is, we may
know much about an artist, but not the birth date. A requirement in the wrong place
would prevent the user from storing what data is/was available.
What all of the above boils down to is that, when a question requires that age be
stored in a database, there are professional expectations concerning exactly how this
is done. The concept of age, the word age, should trigger those associations.
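A minimal sketch of the practice (the Artist table is invented, and the date functions shown are MySQL-specific):
-- Store the durable birth date, not the age; leave it nullable so an
-- unknown birth date cannot block an insertion.
CREATE TABLE Artist (
    id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    birth_date DATE NULL
);
-- Derive age at query time instead of storing it.
SELECT name, TIMESTAMPDIFF(YEAR, birth_date, CURDATE()) AS age
FROM Artist;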
Restating, the mental processes in programming often connect certain principles to concepts
acquired in childhood. The students need to use natural language at an extremely high level of
conceptual understanding in order to acquire the skill because we, as instructors, have to rely
upon the students having the basic concepts, the vocabulary to discuss those concepts at a
high level of abstraction, and the general flexibility to expand on those concepts.
The relative abilities of students to absorb new material are perhaps less important
than their speed in applying that material. Many students learn best from hands-on practice.
Trial and error. If a student doesn't write a syntactically correct command, the compiler or
interpreter will generate an error. In brief, conclusive feedback can be incredibly swift in a
hands-on environment.
Another part of the desired literacy skill is connecting to programming examples. Given
example material, students can often “see” what a program or statement does in terms of
testing it out on their home computers. Understanding the internals sufficiently well in order
to create variations takes time and effort. Much of the assimilation and accommodation
process involved can be elided --- if and only if they can simply read and understand the
comments surrounding the example. In brief, we expect students with strong natural language
skills to show a faster rate of acquisition.
However, there's another significant disconnection from natural language acquisition. The
meta-cognitive skills in learning any subject involve the specialized vocabulary necessary to
describe the process of thinking and learning. For general subjects, we describe certain tasks
as involving assimilation or accommodation. These don't arise in the first year of
undergraduate courses. At least not too often. In contrast, for computer science, as students
proceed through even the simplest of material, they find confusing and contradictory
labels related to almost every skill they're acquiring. Some of this can occur because the
writers may be striving for a certain kind of accuracy at the expense of clarity. Yet, much of it
occurs because the author or authors are simply foreshadowing the much deeper meta-
cognitive material to come in subsequent courses. In particular, Software Engineering is
deeply concerned with making both the individual programmer and teams of programmers
more productive. There's a certain connection between thinking and expression which can
make the construction of new material go much faster.21 As a result, phrases such as object-
oriented programming, design patterns, and the Turing computational model may appear on
the first page of any given introductory text.
Some of the terminology which appears very early merely offers an extremely detailed
labeling, often distracting to the simple task of getting started. For example, I find limited
use on the first day for the phrase lexical scope resolution operator.22 Within the first
chapter, words such as class, pattern, and object may be redefined and used very differently
than they are in the standard language. Theoretical Computer Science has an equally
important impact on the common framework of abstractions. Theoretical and conceptual
abstractions, such as asymptotic analysis, also tend to appear very early.
There is ongoing debate within the field as to what should appear first. The complexity of initial
acquisition is also compounded by the need to use different verbs and labels to describe an
inhuman audience and avoid a misleading anthropomorphization. (Programs are not merely
produced, they're consumed. In human terms, they're read. In machine terms, they're
parsed.)
Design of the Assessment Instrument
The exact instrument given to the students is roughly ten to eleven pages in length.
The targeted program outcome, competency in at least one high-level programming language, is
subdivided into component pieces, testing smaller elements, some repeatedly, to ensure that
little guesswork is involved in reaching an answer. The series of questions within each
instrument is relatively comprehensive with respect to SQL. More than one question probes
the ability to form precise SELECT statements. Other questions address the formation of
DELETE, INSERT, and UPDATE statements. In the first three instruments, these commands are
heavily emphasized. Later probes in the sequence emphasize other elements, for example,
CREATE and ALTER statements. By the third course, there will be questions addressing the
formation of SQL queries within one or more higher level languages.23
Most questions were designed to directly test the ability to express commands (using SQL) to
accomplish specific tasks related to storage, retrieval and adaptation of a database.
Discipline-specific arguments are necessary to support which items are included and which are
omitted. Largely, what separates a database design from a simple spreadsheet is the use of
more than one table. Mastering queries where the required responses involve more than one
table is fundamental for competency in SQL. This is a sample question to test for that skill:
Suppose you have the following data schema:
Product(maker, model, type)
PC(model, cores, speed, ram, hd, rd, price)
Laptop(model, cores, speed, ram, hd, rd, screen, price)
Printer(model, color, type, price)
Write an SQL query which retrieves all the makers of printers with prices strictly less
than 1000.
Certain technical and pedagogical criticisms are possible. With respect to the stylized format
(the data schema approach)24, primary keys are not marked. Just as we might usually assume
a certain vocabulary, we can also usually expect that students can infer what the primary
keys are. The expected solution process requires this information --- along with other
necessary details --- for example, that the model codes for printers are distinct from model
codes for laptops. The natural language aspects can be discussed at length.
For now, at least one variation of this particular question is planned to appear on every
instrument in the sequence. The exact wording may change within even a single offering.
Minor variations, designed to discourage cheating and memorization, alter the word printers
to laptops or ask for color printers, etc. We do expect that all of the students will learn other
things and there are other course issues at stake. Later versions --- especially those in
subsequent courses --- might have a different set-up, perhaps employing a different kind of
database specification. In the second course, students may be asked to write functions in
another language to produce the needed response. By the end of the final course, the question
may be offered electronically within a controlled environment --- also testing for speed of
response. We expect a high standard at the end. Over time, we expect to see the
percentage of correct answers rise. We also expect that individual students will get faster. A
student who is competent in SQL should achieve mastery of this particular item along the
way.
No matter what vehicle is used, the answer to absolutely every variation of this particular
question requires the formation of a particular, syntactically correct, functional SQL query –- a
well-formed SELECT statement joining two tables.25
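For illustration only, and not as an official answer key, one such well-formed statement (assuming model is the shared key between Product and Printer) is:
SELECT DISTINCT Product.maker
FROM Product
JOIN Printer ON Product.model = Printer.model
WHERE Printer.price < 1000;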
In a similar way, every question in the sequence of instruments has, by design, direct, strong
relationships with questions to appear later. Not all of these relationships are as precise as
what is planned for the sample question; however, they all support analytic comparison of
atomic results. These links between the questions offer us much more than an insight into
what the students are learning; the common threads allow us to monitor the progress of
students over time. We want to know how fast they're learning, and how much they're
learning. This is the point of any longitudinal study.
As may be expected, the format of each question is intended to help separate and isolate
each precise skill we want to test.
At times, the ideal of isolating skills is better accomplished with coordinated question
sequences within an instrument. In brief, a well-designed instrument for this type of study
anticipates the need to explain student failure with less ambiguity than a normal examination.
Briefly, a student who can answer the questions has mastered the related skill. What we would
like to claim, with as much authority or evidential weight as possible, is that any student who
cannot answer the questions correctly has not mastered the skill. Concretely, we can usually
expect that a good computer science student would know what the terms laptop, PC, or
printer mean in this context. However, an otherwise outstanding student might occasionally
stumble over vocabulary --- particularly if he or she was learning English as a second
language. This is the simplest way to explain why each instrument also includes one or two
additional questions to lightly assess natural language skills.
These specific question threads serve multiple purposes, including the need to measure a
different aspect of competency with programming languages. Ultimately, the planned
progression of writing sample questions is tailored to target two specific meta-cognitive
skills involved in competency:
The first is the ability of individual students to interpret programming requirements;
the second is the ability of individuals to document their results, participating
appropriately in the kind of group work where one might never even meet one's
collaborators.
Ultimately, the assessment requires some validation in the sense that we must attempt to assert
that most students can understand the questions.
In the very first instrument, there were two questions merely asking for writing samples.
The first asked for a response in English. The second asked for a response in Chinese. (Later
instruments drop the request for a writing sample in Chinese, and direct the English writing
sample away from elementary issues and toward specific topics of concern in databases.)
The results of that sequence of questions may enhance any attempt at explanation for a given
student who has a significant delay or complete failure in achieving mastery of any other skill.
In practice, there are numerous interconnected skills involved in even the simplest task. Within
just the one sample question given above, there are significant asides and peripheral
considerations. As a claim, the natural language concerns are the most important. Nearly
every method for probing competency in computer programming is affected by natural
language skills.
Additional material is given below to justify the writing sample questions. In
simple terms, studies at the Nanjing campus are affected by English and Chinese language
considerations in many ways.
Well-considered rubrics for scoring the questions are also important to the design of testing
instruments.
Some additional insight into SQL, databases, and general programming is necessary before
we proceed. Outside of a teaching/learning context, it isn't quite natural to have the pertinent
details predigested in any form, let alone the specific, brief form given above. Directly, the data
schema given above abbreviates what would otherwise be an extremely lengthy specification.
Much is stated. Much is implied. A few relationships might need to be inferred. Here, as
with standard story problems for engineering or mathematics, a stylized format offers the
necessary simplicity to put the question within reach of beginning students. Some of the
simplicity does come at cost in precision. Thus, we specifically need to communicate a few
details about test scoring in this context.
The students should, at times, write down the inferences they make about the
relationships as part of the answer --- in much the same manner as a student might
“show work” in answering a mathematics problem. The additional material is not the
answer, but it supports the answer and makes the reasoning transparent.
Some of this expectation reflects an instructor/grader prerogative: Partial credit for a
wrong answer is possible if and only if the nature of the mistake is both obvious and
not integral to the direct skill being tested.26 Other policies may be important. An
explanation serves as insurance. Full credit for an unexplained answer, even if it is the
only correct answer anticipated by the instructor, is not always assured.
Generally, for any subject, the process of designing good test questions is a bit of an art. There
are many metrics of quality. Clarity is one. Accuracy in reflecting real-life concerns is another.
There are others. Part of the process in designing questions for these efforts anticipates
criticism from general sources. Best practices make this a long-term, public study --- at least
within the NYIT community. The entire sequence, including all student responses, will be
archived and placed into the shared pool of material at NYIT upon completion. (Interested
faculty may request source material for all delivered instruments during the study.) Versions
in which identifying student information has not been redacted should not be released outside
the university.
Natural Language Considerations
Ultimately, the targeted program learning outcome simply involves a very high degree of
natural language competency. With any programming problem or exercise, the task
specification usually omits much more detail than the given sample question. A larger task
would be deliberately vague and success would require practiced problem interpretation.
We can offer very specific supporting material here.
A key example is the technique of noun decomposition as described in 27. This simple
approach (or a variation) is usually introduced in first course where object-oriented
design appears. (At NYIT, this is the very first course in the Computer Science
sequence.)
As a pointed example of application, we may require that students design a database
to store information about their music collections. This technique would direct the
students to focus on whatever noun phrases appear in a lengthy natural language
description of the task.
The likely outcome would include the identification of things such as title, artist, year,
etc. All of these would likely appear as fields in any good student solution.
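A minimal sketch of what such a solution might look like (the table and column names are illustrative, not drawn from any graded submission):
CREATE TABLE Album (
    id INT PRIMARY KEY,
    title VARCHAR(100) NOT NULL,    -- noun: title
    artist VARCHAR(100) NOT NULL,   -- noun: artist
    release_year INT                -- noun: year
);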
Many of the most useful tools for turning a vague problem statement into an appropriate,
effective design are directly connected to natural language tools. Deeper analytic techniques
appear in Software Engineering courses. The very naive noun decomposition technique is a
starting place, and the actual skill is closer to an art. Common knowledge is assumed. So is
common sense. A concept such as artist can be further decomposed into name, age, etc. We
don't expect a student to continue the analytic process indefinitely; however, the depth of
analysis can vary --- and there are meta-cognitive skills at play in communicating the limits we
expect.
To connect this with learning outcomes, a student unfamiliar with the concept of a noun
would be at a disadvantage in understanding the noun decomposition technique. Yet we
don't consider English language skills as prerequisites. Like mathematics, Computer Science
offers a certain independence from specific natural languages. Programmers routinely share
source code without sharing a natural language.
As noted earlier, there are additional questions to probe natural language skills. We make no
assumptions about the results beforehand, and comprehensive assessment is not possible.
One intent is simply to correlate relative progress with component skills derived from one
perspective and an embedded, overarching skill derived from another. A second intent
would be to expose elements that could better explain encountered failure or delays in
acquiring competency with SQL. We make no assumptions or predictions about the results
beyond the acknowledgement that language issues can impact the progress of even the best
of students. The impact can be either negative or positive.
As a practical matter, the students at the Nanjing campus are learning English as a second
(sometimes third) language and proceeding without a true immersion environment. With
instruction delivered in English, lectures in a non-English subject serve to support practice in
two separate skills simultaneously. The connections are clear, nearly self-evident, and not
always discouraging. Students learning a second language may be more familiar with parts of
speech, grammar issues, and meta-cognitive skills than monolingual students who've been
unchallenged in these areas. There is no evidence to support whether our students are
ultimately better or worse than those proceeding from a different background or those who
emphasize different study methods.28 We have problem students. We also have students
with flawless, enviable language skills. Our students can illustrate many suspected issues,
although most are limited in scope. A few of our students do lack vocabulary, avoid listening
and reading practice, and rely upon electronic dictionaries and the whispers of friends during
lecture. There is a reasonable concern whether or not these particular students have the
essential language skills necessary to succeed in coursework --- or the capacity to develop
those skills concurrently while attending courses where English is not taught directly. Much
of this is simply beyond the scope of the current assessment study. The challenges of
teaching in a global program are often unique and ephemeral, and the requirements for clear
instructor-student communication are fairly significant in any course.
The lingering concern that has to be addressed with respect to an assessment study is
whether or not the local language issues may skew the results. Further, we need to at least
consider different ways to design instruments to minimize language concerns.
Some very precise, unambiguous material could be delivered through other mechanisms.
With respect to the sample question given earlier, the set-up could have been expanded to
include the four specific CREATE statements needed to build an example of the implied
database. This is something we expect the students to produce and use; however, providing
just the four CREATE statements as part of the problem statement would increase the length
of the question to more than two pages, with much of the presented detail both irrelevant
and distracting. Doing this --- even just this much --- also introduces a much deeper reading
skill. More to the point, the skill we're driving toward is the linkage between natural
language and precise, mathematically unambiguous material. If we could assume student
mastery of this at the beginning, assessment would be irrelevant. Ultimately, removing the
concerns of stylization from this subject may be an inappropriate goal. In the simple world of
learning exercises and test questions, pool tables are frictionless, players in a market economy
behave rationally, and the simple relationships are, well, simple. Some stylization of test
questions is unavoidable in almost any subject. Here, it may be a positive aspect. While we
don't want to encourage distracting test skills or “plug-n-chug” responses, an anomaly of this
subject is that problem stylization is eerily reflective of the real-life difficulties in working
with databases. That is, some of the student burden conveyed in the sample problem offers
precisely the right level of challenge for practical work.
Real databases don't look like the toy models we offer for exercises and test questions. Yet, real
databases are always toy models of something tangible in the outside world. There are
abstractions of abstractions at play here. Consider the sample question again. The necessary
set-up for the problem requires a shared perspective on a small portion of an entire database
--- a very difficult thing to convey quickly and unambiguously. Someone proficient in SQL
might pause before answering this kind of question because that person would be used to
obtaining needed details by working with the database directly. The table and field names in
an existing database --- metadata --- are typically the only reliable resource available reflecting
the designer's intentions. (Documentation material, when it does exist, is often poor.)
Metadata only offers short labels. The names selected by the designer might imply many
unstated, but critical relationships. Not all names are well-chosen. Poorly chosen names
encourage discrepancies, abuses, etc. An experienced professional would expect to spend
significant time studying a real database, probing and testing to verify the actual
relationships, before working with it. Thus, there is a need to understand how other people
might interpret the same terms. In most cases, professionals would consider a strong
vocabulary, contextual interpretation of terms, and inference of intended relationships to be
inseparable aspects of the greater skill of working with databases.
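As a sketch of what that probing might look like in practice --- assuming an engine that exposes the standard information_schema views, and using a hypothetical schema name --- a professional could begin by listing the tables and columns before trusting any interpretation of them:

    SELECT table_name, column_name, data_type
    FROM information_schema.columns
    WHERE table_schema = 'campus_records'    -- hypothetical schema name
    ORDER BY table_name, ordinal_position;

The output is exactly the set of short labels described above; everything else about the designer's intentions must be inferred, and then verified against the data itself.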
The same argument holds when we consider that the database specification could have been
given with lengthy natural language explanations of all the terms and relationships as used in
the problem. Beyond the argument given above, there is a specific local concern. Lengthy
problems in English just don't work in this environment.29
As an alternative, dictionaries could be provided. That is, we disregard the problems and
offer standardized help. Providing complete dictionaries for every student simply goes
beyond our resources. In fact, the instructor burden for providing partial dictionaries is
unreasonably high. Yet, the most appropriate response here is that this proposal would alter
the testing proposition. An instructor-provided, partial dictionary becomes an expected
crutch, and, when omitted, becomes an excuse for not acquiring course skills. When
routinely included, it forms an excuse for not acquiring vocabulary --- in any language.
The particular issue at the Nanjing campus is acquiring a second language concurrently. The
English language vocabulary for our students is not as strong as it would be among native
speakers of English and not as strong as their Chinese language vocabulary. Thus, a further
variation on this approach considers translating dictionaries, offering definitions of English
terms in Chinese. Note that there is a major distinction between providing a dictionary and
allowing the students to provide their own. Here, for this study, we do allow students to
bring in non-electronic translating dictionaries --- carefully checked for crib sheets and other
inappropriate material. As a teaching opportunity, the time required to look up terms is
more than significant in a test-taking environment, and the students themselves have begun
to discover that this isn't the most practical approach.
Simple pragmatics mandate that we ask brief questions where brief answers are satisfactory.
We simply need the students to understand the questions --- quickly and in isolation from
most other skills. Returning to the concerns of assessment, we could consider asking
assessment questions bilingually or entirely in Chinese --- or whatever language the students at
hand feel most proficient in.
It's easy to make the case that this may not be appropriate for instruction. Ultimately, there
are circumstances where this approach may serve the needs of assessment much better. Bilingual
test preparation increases the instructor burden and defeats one of the major purposes
of instruction in English. In this learning environment, this could be much worse than simply
confusing. The mere existence of well-designed test instruments is important with respect to
English language acquisition. There are cultural issues with respect to how the students here
approach learning English --- anecdotally, they spend enormous effort preparing for
standardized tests, and less effort in conversational or writing practice. In brief, testing in
English is a major motivation.
For this study, the intent of designing the major assessment instruments in Chinese (or
bilingually) would be to isolate their acquisition of SQL from their ongoing acquisition of English.
That is a valuable goal, and this requires serious consideration. The immediate problem (and
the depth of that problem) might be surprising. Students at the Nanjing campus encounter
far greater difficulties in using Chinese in connection to computer applications than they do
with English. In fact, the central focus of any database course in Chinese would necessarily be
very different. In brief, by using Chinese as the sole language of instruction for acquisition of
SQL, we'd be adding numerous linguistic requirements which would be difficult to support
robustly.
While Oracle and MySQL implementations of SQL do offer support for Chinese (and, more
directly, Unicode), there are still repeated difficulties with the variety of encodings, fonts, and
expressions. Both natural Chinese character sets, traditional and simplified, are much, much
larger than the usual Latin character set used for English (and other European languages). In brief,
there are far more than 256 distinct characters, so while written Chinese can be handled in
byte-oriented storage, storing one character per byte is not possible; multi-byte encodings are
required. At a fine-grained level of analysis, UTF-8 is a precise, unambiguous, industry-standard
solution for handling Chinese strings where a reversible mapping into byte storage is needed. Yet
retrieval and display in a general context have been incompletely standardized. UTF-8 storage
solutions are straightforward; however, general-purpose tools often expose numerical codes rather
than Chinese characters, and these numerical representations lack human readability. Thus,
maintenance of databases without specialized tools is problematic. The better solutions are both
proprietary and expensive.
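For illustration only, and assuming a reasonably recent MySQL installation with utf8mb4 support, a table intended to hold Chinese text could be declared as in the sketch below (the table and column names are hypothetical); the length functions then make the multi-byte storage visible:

    CREATE TABLE glossary (
        term_en VARCHAR(100),
        term_zh VARCHAR(100)
    ) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

    -- three Chinese characters are stored as three characters but nine bytes
    SELECT CHAR_LENGTH('数据库') AS characters_stored,
           LENGTH('数据库')      AS bytes_stored;

None of this is difficult in isolation, but each such detail is one more requirement that would have to be taught and supported if Chinese, rather than English, carried the instruction.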
One can translate Chinese written with either the traditional or simplified character set into
human-readable material using the Latin character set as a basis. There is a standard for this.
However, it isn't the sort of system that one would expect native speakers to be completely
comfortable using. Directly, written Chinese can be rendered in Hànyǔ Pīnyīn; a version of
Pīnyīn without diacritical marks we may refer to here simply as Pinyin.30 Hànyǔ Pīnyīn is phonetic ---
in fact, the word Pīnyīn has a denotation consistent with the English word phonetic --- and its
construction is relatively recent, although the typographic decisions about where to place
the diacritical marks have the kind of arbitrary complexity one expects of any language
artifact with a significant history. The general process, called Romanization, is useful for applying
older, legacy string-handling material (based on the old ASCII standard31) to
languages with non-Latin character sets. It's also helpful in the language acquisition process
for non-Chinese speakers.
For native speakers of Chinese, the issue is more complex. Pinyin, as opposed to Hànyǔ
Pīnyīn, is not phonetic. Pinyin uses the Latin character set, but an individual word rendered in
Pinyin could be the result of several different words in Chinese. The four significant diacritical
marks in Hànyǔ Pīnyīn indicate tone. Mandarin has four, sometimes five, tonal forms, and the
tone is necessary to complete the meaning. Chinese is typed using Pinyin, adding to some
confusion over what a native speaker may and may not know. The initial letters struck
typically yield several options for the precise Chinese character intended --- either simplified or
traditional --- and the user must select from the choices available using a mouse and/or arrow
keys. Like the automatic spelling correction offered for English SMS typing on smartphones,
the software is forgiving of mistakes and offers choices which are not restricted to
what was requested. Specifically, the software would typically include those options which are
more likely to be what was intended as opposed to what was requested, with algorithms for
the process using automated inferences based upon preceding characters. That is, one could
mistype the Pinyin for a particular character and still get the correct character as an option. All
of this should help to explain the following repeatedly-observed phenomenon:
Some of the freshmen at the Nanjing campus of NYIT, functional typists on US
keyboards and intelligent students, have occasionally demonstrated difficulties with
spelling their own names in Pinyin.
The issue is minor, but underscores a general claim that Pinyin is not a natural way of
typesetting Chinese. Written Chinese material, using either simplified or traditional characters,
can be transliterated automatically into Pinyin. The reverse is more difficult. For non-Chinese
speakers, a good analogy for the human effort might be rendering English phonetically using
the Greek alphabet and then removing the vowels. Even this might not capture the right level
of difficulty. The diacritical marks are significant. Like unpointed Hebrew (points indicate
vowels), sufficiently long passages in Pinyin (without diacritical marks) can be understood by a
native speaker. At least eventually. Unlike unpointed Hebrew, this isn't at all natural.
Ultimately, short passages in Pinyin can be decrypted by an intelligent native speaker with only
modest effort, but regular reading or production would be exhausting without practice. With
automated transliteration, the reverse holds: short passages would likely be ambiguous; long
passages would be simpler.
Not knowing where this report might be disseminated, the following conclusion/assertion
may be badly misinterpreted if taken out of context: English is the simpler choice.
Ultimately, technical courses in SQL taught by native speakers for native speakers are very
different than the NYIT sequence. Those courses need to address a wide range of topics
directly connected to the encoding and display problems at the outset. Some instructors use
localized solutions, and many students are handcuffed to specific technologies during
instruction. In other words, they aren't learning SQL in a general context. Instead, they're
learning Oracle, Microsoft Access, etc. specifically and directly. Many of the issues with
respect to SQL are similar, but competency in SQL is less a concern than general functionality
with the tools.
The end result is that using the Chinese language for assessing SQL acquisition is not
appropriate. We cannot present problems for assessment which require skills far beyond
what we're teaching.
Assessment Day at the Nanjing Campus
The agenda for Assessment Day at the Nanjing campus included an informal introduction
similar to the above, along with a handout provided by New York.32 All available faculty were
present.
An early part of the presentation described what is NOT covered by the notion of assessment.
Generally, the point was offered repeatedly that assessment activities at NYIT aren't
threatening to a particular professor or student. NYIT, like many other institutions, makes a
careful effort to safeguard its integrity in recognized ways. For example, the individual
responses to course evaluations, sensitive material, are carefully separated from individual
responses to examinations, graded material. More precisely, students are aware that
safeguards exist to ensure that their comments can be made freely and that their grades are
not at risk.
For an audience of faculty, grades aren't the issue.
As may be expected, some concerns were raised, and it should be noted that there were many
sensitivities in this area. Specific sensitivities centered around a general concern that
assessment studies might be used as yet another mechanism to evaluate instructor
effectiveness. There is some validity to this concern. Assessment activities span multiple
courses. They measure the student acquisition of material. They're often designed to provide
comparable data. So, the potential exists. Yet one may, and should, expect that the human
resource procedures for faculty --- specifically those involving retention, raises, advancement,
tenure, and other sensitive aspects of employment --- are simply independent mechanisms. At
NYIT, these are not connected to assessment studies.
It was generally accepted that useful assessment studies can be conducted in a manner in which
instructors are not at risk. There was no general resolution to these concerns other than
an acknowledgement that assessment is a maturing process emerging in higher education.
Part of the thesis presented was that both initiation of and participation in assessment activities
at NYIT lie well within the capacity (and capabilities) of any individual instructor at any time,
and, ultimately, assessment at NYIT is not a mysterious activity. It should be noted that many
faculty were less familiar with the program learning outcomes and that effort should be
expended in disseminating this material routinely through other departmental and campus
initiatives. The faculty generally acknowledged that student assessment within a single course
is already the responsibility of the instructor. Typically, grading has a legacy of meaning and
certain unique features. It may be considered almost a separate process. One distinction, as far
as assessment of the institution is concerned, is that all currently administered examinations for grading
serve only course-related goals. Thus, most assessment initiatives should avoid redundancy
and focus on the issues which affect multiple courses. The distinction between course and
program learning outcomes was discussed at length, and details of the NYIT framework were
presented.
To some extent, competency, even expertise, in the design of assessment instruments
appropriate to each course is expected from the assigned instructor. There were few questions
in this area.
After a break, audience participation devices were distributed and a demonstration of rapid
assessment was given. The demonstration addressed a significant local issue with students
being unwilling to raise their hands during lecture. The demonstration also provided a basis for
discussing how quickly and seamlessly assessment activities could be integrated into a course.
Following an instructor question, the limitations of what's possible were discussed.
Sometimes not everything “fits” within a narrow framework of measurable outcomes, and good
instructors may motivate the subject in diverse ways. As a result, students do expect that
presented course material reflects an understanding of how the course objectives may connect
to a larger picture. Yet, what's not expected, outside of assessment efforts, is that each
individual instructor repeatedly attempts to measure all the program outcomes directly, putting
grades at risk33 for broad overarching skills, perhaps incompletely mastered and tangential to
the specific subject at hand. Many examples are possible. While mastery of certain technical
jargon is expected in almost every course, simple logistics prevent testing all language skills
robustly, repeatedly, and comprehensively in every course. Even if it were possible, the
proposition goes far beyond the implicit institution-instructor-student contract. Paraphrasing a
discussion point, many instructors would be uneasy with the practice of putting grades directly
at risk34 for English comprehension skills in a Chemistry35 course; yet, they would be relatively
comfortable with putting those same grades indirectly at risk. That is, a certain fluency is
expected in order for students to follow exam instructions and demonstrate what they know.
The day concluded with a long discussion of local issues.
Significant topics included upcoming examinations and concerns with how to handle incidents
of cheating and plagiarism. A general policy for handling cheating was discussed, leading to a
decision to collect material on specific incidents so that repeat offenders could be tracked
through the dean's office. A policy for plagiarism was also discussed. No significant resolution
was made.
After the meeting, materials relevant to that discussion were found on the New York website
and disseminated to faculty.
Preliminary Report on Assessment Activity Results
Briefly, the number of students in the sample (44) is too small to make conclusive statements.
The material here is suggestive and individual strong performances tend to dominate the
results. Further, it is too early in the sequence of assessment instruments to provide detailed
statistics such as regression tests.
Three out of the four instruments for the first course have been administered.
Some early analyzed results are available.
1. Students in the first stages of the first course in databases are acquiring significant
practice. Two students are already close to having the entire range of skills involved
in the targeted program outcome.
(That is, at the time of the third assessment in the very first course, these two students
are already nearly where we would expect them at the end of the third course.)
2. With a small number of exceptions, all students have demonstrated significant gains
between the initial assessment and the second assessment. Less significant gains were
made between second and third assessments.
As an issue, many students are converging on the needed atomic skills, but have
problems with small details. Small variants in how partial credit is applied do not
change the overall picture greatly, but affect the statistical metrics. Roughly 83% of the
students are making progress beyond the initial prediction. A small number of students
are far behind. Individual attention has been directed to these students.36
Close analysis of individual test results from the third administration reveals that many
students attempted to memorize responses to earlier versions of the same instrument ---
rather than respond to the questions at hand. This has been discussed with the
students directly, and the fourth instrument in the sequence is being adjusted
accordingly to make this less likely.
3. The ability of the students to express themselves with the English language is modestly
correlated with their ability to express themselves in Chinese. Neither is significantly
correlated with their ability to acquire a programming language --- at least in the initial
phases.
Difficulties in properly assessing Chinese writing samples, the inconclusive nature of
those questions, and the limited space and time for questions of this nature during
examinations had led to a decision to drop this series of questions.
The English writing sample question is being modified as planned, and now targets
course material rather than general topics.
4. The students are acquiring significant English language practice concurrently with the
course.
A peripheral instrument, a standard word-analogy exercise in English, with
questions similar in form to those offered by the GRE --- albeit simpler --- was
administered through an audience response system.
Over a period of nine weeks, the body of students went from an audience average of
24% correctness to 56% correctness.
5. Student feedback through direct discussion has been extraordinarily positive.
Note that all courses present the course learning outcomes on the syllabus. Here, the
first day of class required an explanation of the assessment schedule, the targeted
program outcome, and a description of how this material was to be embedded into
regular examinations. Much of what was discussed covered exactly what did and did
not affect their grades. Student feedback through Likert probes (administered through
the audience response system) that same day was mildly favorable; however, individual
students came forward after the first instrument to express gratitude over being shown
“exactly what we need to learn.”
Informally, they indicated that neither the discussion nor participating in the actual
assessment would have been sufficient. Both made the way forward through studies
clear. This may help explain the significant gains after the first assessment --- although a
conclusion is not yet possible.
In brief, presenting them with the framework of what they're expected to do, the
connections between all learning outcomes in their raw form, is useful in more than one
way.
6. The final instrument for this course is scheduled for early June.
Analysis and conclusion should follow by July.