TRANSCRIPT
STUDENT LEARNING OUTCOMES ASSESSMENT AT NYIT’S GLOBAL SITES
As anticipated, the activities of committees on global campuses focused on local priorities.
However, due to the phase-out of two campuses and, in some cases, political unrest, no
programmatic assessment activities took place at Amman (Jordan) or at Manama (Bahrain).
The list below gives highlights for each of the other campuses.
Abu Dhabi, UAE
Due to significant personnel changes for this campus, the Assessment Committee was
simplified to the campus dean, the assistant deans, and the Director for Institutional
Research and Assessment.
Moreover, for the entire fall semester, there was no local leadership for the School of
Management, which represents 86% of student enrollment at this site. As a result, there
was no discussion of assessment at the beginning of the year.
The Assessment Committee held its opening meeting in November 2011, but through the fall
and the start of spring, its work focused on harmonizing degree maps and other high-priority
tasks (scheduling and advising).
By the end of March, faculty in the School of Management requested that the Assessment
Committee postpone Assessment Day until fall 2012. An October date is now
envisioned.
The Arts & Sciences faculty participated actively in the fall 2011 collection and analysis
of data for the institutional assessment of the core Foundation of Scientific Process
course.
Vancouver, Canada
The Vancouver campus MBA programmatic assessment process was initiated in 2008.
The current round of learning outcomes assessment includes a new level of assessment
activity. Localized empowerment in Vancouver, as at all other campus locations, comes
through the oversight of the local Associate Dean, who works collaboratively with the
DAD (Department Assessment Director) to effect change at the Vancouver campus.
Specifically, the Vancouver Associate Dean and DAD regularly convene local faculty
to discuss specific program outcomes (both direct and indirect) at the
Vancouver campus and recommend changes to the curriculum through the
contextualized element in the Master Syllabi. That is, Vancouver faculty and personnel
have autonomy in adopting and revising contextualized elements that not only reflect
stakeholder interests, but also help to strengthen program outcomes in Vancouver in ways
that address the limitations identified in the program outcomes at that location.
All faculty members, as a requirement of their individual educational responsibilities,
maintain course portfolios that include samples of student learning outcomes. In this way,
faculty demonstrate the appropriateness of their scoring and validate the implementation of the
assessment process through the coursework they administer. The collection of scores in the
Goal Validation Scores system for the 2011-12 academic year has not yet been
completed.
As indirect measures, an employer survey and an alumni survey were implemented in
spring and summer 2012 in conjunction with course learning outcome assessments.
In addition, with the introduction of the revised MBA in fall 2011, administration of the
ETS Major Field Test in business at the master’s level was made mandatory for all
entering and graduating students. This provides a systematic method of comparing
students’ subject-matter competencies at the beginning and end of their educational
experience. Results were analyzed, and recommendations made to the School of
Management leadership.
Nanjing, the People’s Republic of China
The newly established Assessment Committee in Nanjing held regular meetings during
2011-2012, and launched a programmatic assessment exercise during fall 2011.
The Committee agreed that the specific outcome to be assessed in this study be chosen
from the BS in Computer Science goals (“Upon graduation students are expected to
demonstrate” … “An ability to use a variety of computer programming languages and be
competent at least in one high level language.”).
The specific methodology for the Nanjing project integrated the assessment of the
program outcome into the usual course examinations for the entire sequence of three
courses. The Committee designed an assessment instrument using SQL (Structured
Query Language) techniques. From the student perspective, the only immediately visible
change was that a new “examination” was given in each course.
Preliminary analysis was done, reported, and shared with all faculty during an Assessment
Day held on April 27, 2012. The complete analysis, including all the June 2012 results,
is underway and will be available over the summer.
In addition, during the Assessment Day, faculty also shared their experiences with
"cheating" and discussed cheating and plagiarism policies.
Finally, the Arts & Sciences faculty participated actively in the fall 2011 collection and
analysis of data for the institutional assessment of the core Foundation of Scientific
Process course.
Report on assessment in Vancouver AY 2011-12
MBA program
Based on input from our stakeholders (stakeholders’ conferences as well as business advisory
board member meetings), we added contextualized learning goals for 13 of the non-waivable
courses as well as for some elective courses. Those
contextualized learning goals and their learning validations and scores are unique to our
location.
Example:
Added contextualized learning goal in FINC 620:
(B) Contextualized (Globalized) Learning Goal(s): Upon the successful completion of this course, the student will be able to:
1. Summarize current events regarding ethical financial decision making and reporting in Canada including the differences of perspectives between non-Canadian and Canadian companies.
Added assurance of learning validation for additional learning goal (quiz) with relevant scores:
Assurance of Learning Validation (In support of the Contextualized (Globalized)
Learning Goal(s)):
B1. Graded Quiz: An additional quiz will be assigned on a Canadian current event.
For the purpose of assurance of learning, the Graded Quiz will be assigned 2
scores based on the:
a. Score 1: Demonstrated ability to put a global perspective on class materials (MBA-INTERNATIONAL); and
b. Score 2: Incorporation of current ethical consideration in matters of financial choices and financial reporting (MBA-ETHICS).
Summary of Assessment Activities at the Nanjing Campus
(Relevant to the Preparation of the President's Report)
1. Informal Introduction
2. The Longitudinal Study at the Nanjing Campus
3. The Targeted Program Outcome
4. Unnatural Language Acquisition
5. Design of the Assessment Instrument
6. Natural Language Considerations
7. Assessment Day Discussion
Informal Introduction
Generally, assessment is concerned with student progress. That's a deliberate
oversimplification, but experience with introducing these kinds of studies to faculty,
administration, and outside funding resources suggests that much simplification is necessary.
Much of the material is deeply abstract and, at times, open to misinterpretation. We want to
avoid that misinterpretation. Even the more accepted tenets require a gentle introduction and
frequent restatement.
Another necessary introductory component is putting the material into a proper context.
Ultimately, any university is simply a complex entity. Smaller things within larger things.
Different perspectives from different participants. To make all of this material digestible
for a general audience, we quickly acknowledge a few communication difficulties. Discussions
of expectations with alumni, faculty or students can be rather far reaching. Faculty and
students are extremely sensitive to any details that may affect them directly. It takes time to
interpret the pragmatic day-to-day classroom experience in terms of any expressed goals or
ideals, and even more time to properly connect to a shared, reasonable framework to improve
the teaching/learning proposition.
Briefly, there are a lot of concurrent goals and objectives.
Here, the larger direction is meeting an “assurance of excellence” criterion, and we're specifically
concerned with NYIT's 2030 vision.1 Quoting from that material:
The vision is that, by 2030, NYIT will be:
Known for its career-oriented undergraduate and unique and distinctive
graduate and professional programs;
Known for its thriving graduate centers featuring interdisciplinary research,
degree programs, and "best-in-class" work in a small number of highly targeted
niches;
Known as a global and partially virtual university with NYIT in New York as its
quality hub;
Known as a model student-centered university;
Known as a leader in teaching quality improvement; and
Known as a well-funded institution, with dependable revenue from a variety of
sources.
The approach taken to realize these kinds of ideals is pragmatic. “NYIT has systematic,
coordinated processes overseen by a steering committee and vice president to set institutional
targets, monitor results, and use those results to inform decision-making and resource
allocation.”2 There are also three additional views on assessment that concern us here:
1. NYIT's Institutional Assessment Plan3 requires, “School and program level outcomes
assessment (annual) designed by deans and program faculty; results and an
improvement action plan developed; plans and results reviewed by the Assessment
Committee of the Academic Senate.”
2. The Assessment Committee of NYIT's Academic Senate defines "Student Learning
Outcome Assessment" as, "The intentional identification, collection, analysis, and use of
data to improve teaching and learning."
3. The relevant standard set by NYIT's regional accrediting agency, the Middle States
Commission on Higher Education, states, "Assessment of student learning demonstrates
that, at graduation, or other appropriate points, the institution's students have
knowledge, skills, and competencies consistent with institutional and appropriate higher
education goals."
1 NYIT 2030: Setting Directions, Meeting Challenges, New York Institute of Technology Strategic Plan, May 2006
The “assurance of excellence” criterion is forward-looking. Assessment activities reference many
shared ideals, as articulated through various NYIT publications, including the three course
catalogs4 and numerous departmental directives outlining the expectations for each course and
program. Yet, in simple terms, the institution is committed to a self-examination process which
continually seeks methods to improve our offerings, standings, etc. The style of the approach is
significant. This expression was part of the presentation for faculty during the assessment day
meeting at the Nanjing campus:
We need to emphasize the relative value of systemic, data-driven approaches to
institutional self-assessment as opposed to more immediate, anecdotal approaches to
institutional self-assessment. In simple terms, “one of my students did this” should have
less impact on resource planning than “30% of our students did this.”
Anecdotal material isn't irrelevant; it simply should have far less weight on long-term
planning. Of course, there are other dimensions of concern. Sufficiently egregious
material of one type or the other may require a more immediate response.
The university has adopted a data-driven approach to planning based upon continuing research
into the current strengths and weaknesses within the overall curriculum. Ultimately, all
universities have limited resources; effective planning requires priorities; and priorities, at the
scale of complexity involved, are best derived from properly-qualified, quantifiable studies of
what is needed.
While the target of such research addresses the gestalt of teaching and learning activities at
NYIT, best management practices for this effort require a balanced, comprehensive treatment
without unjustified weight on particular areas. To that end, there is a shared framework for
NYIT's self examination. Crystallizing vague ideals into specific tangible expectations allows
systemic improvement. This is critically important to managing quality. The explicitly stated
learning outcomes for courses and degrees provided through the faculty senate5 offer the
formal model of expectations at NYIT.
Before we move to a finer-grained view of assessment activity at the Nanjing campus, another
objective is worth mentioning, “Do this as well as possible. Create an example for others to
follow.”6 The interpretation of this particular goal for this report includes a mandate that the
goals, procedures, and reasoning are all intelligible to an audience of NYIT faculty and
administration.
The last goal, making the material accessible, implies something more than merely not
requiring members of the audience to switch disciplines. The methodology has to be open and
inclusive. Discussions need to be able to proceed without shared expertise in a particular
discipline. We also need to reach out to those who are new to the usual tools of assessment as
well as those who are new to the policies and procedures of NYIT. Ultimately, we're reaching
for the common ground.
To that end, we must acknowledge certain important perspectives and distinctions. A research
professor in chemistry has a very different reading list than a research professor in French
literature of the seventeenth century. There isn't much overlap. Briefly, the distinction extends
to teaching and learning. The way one measures the acquisition of scientific concepts related
to chemistry may be very different than how one goes about grading a student's critical
appreciation of aesthetics. From the point of view of a proponent of any one discipline, the
details of their particular field of study may come first in importance. Yet, as a self-evident
claim, scheduling for courses and examinations falls into a category of common concerns for all
faculty. So does assessment.
As a concession to the obvious, the last goal could have been interpreted with an entirely
different emphasis. Education, especially higher education, is an entire discipline. Assessment
itself is the subject of numerous peer-reviewed articles.7 The technical and statistical terms are
sometimes overloaded and many are highly specific to distinct disciplines and/or models of
learning. Not every assessment activity warrants a book chapter or journal article; however,
there are many standards involved in the design process as well as many potential critics for
any study. Further, pedagogical studies have become almost a separate sub-discipline within
many fields. For example, there are separate learning models for STEM (Science, Technology,
Engineering, Mathematics) programs, and much detailed material concerning specific courses
and programs.
This kind of experiment, a longitudinal study, can be repeated at any university. How we
proceed is general. What we're measuring is more specific to NYIT. The assessment directives
are derived from NYIT's framework of learning outcomes.
The Longitudinal Study at the Nanjing Campus
The method chosen for the current project at Nanjing is a longitudinal study. Simply, students
are assessed at different points in the learning process. Early in the design process, discussion
began with a simple pre- and post-analysis in which students are assessed both before and after a
particular learning activity. Either type of assessment activity offers a baseline. We might vary
the classroom environment, try the same assessment, and see how making particular changes
affects subject material acquisition. Multiple probes, as opposed to a simple before-and-after
portrait, offer insight into the rate of acquisition.
The specific outcome targeted by this study is, “Upon graduation students are expected to
demonstrate” … “An ability to use a variety of computer programming languages and be
competent at least in one high level language.”8 While not currently available to the public on
the primary website for the department, there are thirteen specific learning outcomes involved
in the B.S. in Computer Science offered by the School of Engineering and Computing
Sciences. These may be found in the reports at 9. That same source offers course goals and
a matrix of relationships.
The particular learning outcome selected here is program driven. Together with the twelve
other specific learning outcomes, it begins to describe the institution's expectations for its
graduating seniors. Carefully, these are not the only public expressions of the goals and ideals.
“The primary educational objective of the computer science program at NYIT is to produce
well-rounded graduates that have a wide range of skills, aptitudes, and interests, and who are
prepared for successful careers in industry and government and/or graduate studies. The
coursework in computer science--comprehensive, rigorous, flexible and prepares students to
solve real-world problems as well as to do fundamental research [sic] – includes:
cover the software and hardware aspects of computer science and foster creativity in
problem-solving skills
liberal arts courses [sic]
courses which provide a concentration of computer classes in an area of specialization
a minor concentration which provides a complement to the option area [sic]”10
Mildly, and without comment, there is a much larger picture as well. Computer Science
curricula have been the subject of a joint ACM-IEEE study.11 At NYIT, connections between the
targeted learning outcome and specific course learning outcomes are available at 12.
Connections to core competencies are available at 13. The overall implied structure is broad,
reasonably mature and well-reasoned. The program outcomes are broadly stated not only to
allow ease of navigation through the framework for a nonspecialist, but also to allow the
deeper expansion and interpretation by any concerned member of the discipline.
Ultimately, the stated learning outcomes at NYIT have the virtues of being short, simple, and
well-organized. Working within larger ideals is understood.
The framework keeps us on track.
To begin to make sense of the relationships for this study, we need to emphasize a necessary
distinction between learning outcomes stated for a particular course and the learning outcomes
stated for an entire degree program. In this case, the assessment study has a single targeted
program outcome. This was selected from the thirteen distinct program outcomes. In the
Nanjing study, there is a sequence of three specific courses connected to this outcome where
the details are sufficiently stable to allow repeated evaluation with similar instruments.14 The
sequence of courses is CSCI 300 Database Management, CSCI 401 Database Interfaces and
Programming, and CSCI 405 Distributed Database Systems. As may be expected, there are
deep connections between the learning outcomes for each course and those for the overall
program. In this particular instance, each course supplies a portion of the practice necessary to
reach competency/mastery. The critical detail that makes this study possible is that all three
courses rely upon SQL, the Structured Query Language.
As a significant side issue for Computer Scientists, SQL is not a Turing-complete
programming language in the usual sense. It lacks a looping construct. It does,
however, reflect a depth of atomic, pedagogical concerns that arise with more
traditional languages. It appears within a programming context, and it could be
argued that SQL currently has a more durable and sustained basis than another often-
recommended language, C++.15
As a useful feature, SQL does lend itself to concise, measurably correct-or-incorrect
responses to problem/task questions offered in a natural language. For example,
given an appropriate shared database design, we can ask a question like, “Find the
ratio of male sophomores to female sophomores among those taking business
courses.” The expected answer can be given in a single SELECT statement. More
complex and deeper probes, all with a brief, precise, and unambiguously correct or
incorrect solution, are possible.
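For illustration only, a minimal sketch of such a single-statement answer follows; the schema (Student, Enrollment, Course), its column names, and the 'M'/'F' coding are invented here and are not part of any course instrument.
-- Hypothetical schema: Student(id, gender, class_standing),
-- Enrollment(student_id, course_id), Course(id, department).
-- Assumes at least one female sophomore is enrolled, to avoid division by zero.
SELECT COUNT(DISTINCT CASE WHEN s.gender = 'M' THEN s.id END) * 1.0
     / COUNT(DISTINCT CASE WHEN s.gender = 'F' THEN s.id END) AS ratio
FROM Student s
JOIN Enrollment e ON e.student_id = s.id
JOIN Course c ON c.id = e.course_id
WHERE s.class_standing = 'sophomore'
  AND c.department = 'Business';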
Directly, the first course at NYIT requires that students use the SQL language to design
databases. The second course requires them to build user interfaces. In order to
accomplish any task related to building an interface, students must use SQL
programmatically, not only embedding SQL statements within a language such as C,
C++, Java, or php, but also using the outer language to construct these statements. (In
the catalog description, C++ is stated explicitly.) The third course requires them to
understand and implement techniques appropriate to distributed databases.
Particular implementations, such as MySQL, do allow synchronization support within
the design; however, for a robust understanding of Brewer's Theorem, the students
should have further practice building and comparing synchronization algorithms in a
more general programming context.
The specific methodology for the Nanjing project integrates the assessment of the program
outcome into the usual course examinations for the entire sequence of three courses. From
the student perspective, the only immediate visible change is that a new “examination” is given
on the first day in each course. We expect a full statistical portrait of student acquisition
through the sequence of twelve probes. When producing the final analysis, we might rely upon
the vast amount of collected literature for many things --- including myriad details specific to
this discipline. Auxiliary information, perhaps determined by other course activities, can also be
integrated into the results to explain differing rates of acquisition.
For members of the audience concerned with developing new assessment efforts in other
disciplines, it does seem reasonable to expect similar relationships between course and
program goals. Commenting on the vast amount of peer-reviewed papers on subject-matter
acquisition in higher education (or making sweeping generalizations about all the details of
learning outcomes for other courses and degrees) is far beyond the scope or intent here.
Briefly, there is nothing about the general methodology which is specific to a particular
program or course. That is, the overview for a longitudinal assessment activity really doesn't
change in application to a particular area. Specific features do --- without being dismissive of
those aspects as mere details.
Yet, loosely, it can be expected that many of the individual course events simply build toward
the targeted program outcome.
A longitudinal study simply monitors the rate of acquisition. There is much in this kind of
study that is experimental. For example, in advance of the results, we cannot say whether or
not students acquire proficiency suddenly or gradually. We might expect that hands-on
practice facilitates faster learning, but that kind of claim would be best supported as
a result of the study. It should not be assumed as an axiom beforehand.
For the first course, the planned sequence includes a pre-evaluation, a post-evaluation and two
interim probes (for a total of four assessments). The first is given under examination
conditions; however, the students are not required to study and are requested NOT to study.16
The last three assessment exercises are presented as examinations, with grades at risk. The
atomic questions are similar. By design, each probe revisits the same set of issues with similar
questions, close in complexity, form, etc., but altered sufficiently to avoid direct memorization.
For the purpose of generating meaningful gain statistics, there are direct parallels, a bijection
between the set of questions given during one probe and those on the next.
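As an illustration only (the study has not committed to any particular statistic), matched items make simple gain measures possible. If $p_k(q)$ is the proportion of students answering the matched version of question $q$ correctly on probe $k$, one common normalized gain between consecutive probes is
$g_k(q) = \frac{p_{k+1}(q) - p_k(q)}{1 - p_k(q)}$,
that is, the improvement achieved expressed as a fraction of the improvement still available after probe $k$.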
During the assessment day activities at Nanjing, two significant, specific questions were raised
at this point in the presentation. First, what mechanisms were in place to prevent “teaching to
the test” or some other inappropriate focus on the program goals?
The “teaching to the test” concern is rather general and has a valid basis. There are four
assessments per course, and three courses. The students will have seen eleven previous
versions by the time they take the final assessment.
If the tests were identical, we would expect perfect results from almost every student ---
as a result of memorization rather than understanding. Here, more-or-less the same
methodology used to prevent cheating/copying during an exam is employed, and it
should be noted that cultural conditions in Nanjing require heroic efforts on the part of
the instructor to prevent inappropriate communication during examinations. Significant
monitoring is required. The students are placed in arranged seating, widely separated,
and all mechanisms for communication --- particularly electronic devices --- are
forbidden. All exam papers are produced using software, and each paper is unique. The
name of each student is preprinted, the questions are scrambled, and many small
details, not affecting the depth of thought required, are changed from student to
student.
The correspondence between instruments offered on the same day for different
students is necessarily precise. The dominating issue there is fairness. The
correspondence between instruments offered on different days for the same students is
less precise. In both cases, similarity between questions is required to make the results
comparable. Ultimately, each probe is not the same.
In fact, some questions evolve in form or difficulty.
It is unreasonable to expect a student on the first day of the first course to explain
connections between elements that have not yet been presented. We expect very
little during the first probe. However, it is reasonable to ask for a writing sample --- to
determine something about how each student might organize their thoughts for
presentation. The ability to express themselves on the first day can be compared to
their ability to express an answer on the last day.
Students will, of course, note the variations and prepare themselves according to what
they understand. There's nothing unfortunate or undesirable about this. We expect
them to learn how to handle general questions.
Second, what motivation existed for the students to take the initial --- pre-instruction ---
assessment seriously?
The answer to the second question is: nothing. No motivation is really appropriate. As a
side note, students were excited about being given the first probe. Some of this may be
individual or cultural; however, the explanation given to the instructor was that they
were pleased to be given an early insight into what was ultimately expected.
For the very first assessment, not much is really expected. Students focus on writing
samples and study the format of the questions. Curiosity, if nothing else, is invoked.
Outside of these very simple things, receiving a mostly blank examination paper from each
student is not unexpected. In subsequent courses, we expect that the examination on the
first day will affirm that students start about where they left off in the previous course.
As always, students tend to surprise us occasionally. There are many precedents for
students studying over the holidays. There may be surprises the other way --- as the
practice of “cramming” for an exam often yields little long term gain.
The observed results, for the very first assessment, mirrored exactly what was
anticipated --- although a few students tried interesting ways of “faking” answers to
questions beyond their abilities.
The Targeted Program Outcome
Again, the specific outcome targeted by this study is, “Upon graduation students are expected
to demonstrate” … “An ability to use a variety of computer programming languages and be
competent at least in one high level language.”17 This outcome is attached to many courses.
This makes it appropriate for an institutional study. The perspective here may be roughly
stated as follows:
The learning objectives for individual courses are important. Yet we typically assume
these to be well-served by individual courses. A more appropriate target for
institutional concerns is the set of goals and objectives which span multiple courses. In the
broader scope, we're merely acknowledging the existence of skills which are acquired
through a long series of courses (or through an entire degree program).
In the formal model offered by NYIT, these are clearly distinct. Without detail, there are
dependencies. Prerequisite course requirements have a significant purpose.
Within the Computer Science program, there are many specific courses connected to specific
programming languages.
Again, the sequence of courses selected for the study is CSCI 300 Database Management,
CSCI 401 Database Interfaces and Programming, and CSCI 405 Distributed Database Systems.
The selection of this particular sequence of courses simply reflects the
longest course sequence in which the same formal language is used repeatedly and without
controversy. The language under consideration is SQL (Structured Query Language). Students
are expected to learn related theoretical material. They're also expected to use the language
pragmatically.
As a necessary aside for those unfamiliar with programming, we offer a brief overview of the
targeted skill.
Unnatural Language Acquisition
Ultimately, linguistic analogies between acquiring a programming language and acquiring a
natural language break down rather quickly.
Typically, programming languages have an extremely small number of well-defined
keywords. Within SQL, the first four are SELECT, INSERT, UPDATE, and DELETE. CREATE and
ALTER statements follow quickly. The first aspects of grammar are extremely well-defined,
with detailed atomic interpretations. There are WHERE clauses, conjunctions, etc. which
allow the recursive construction of larger statements.
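As a small, self-contained illustration (the table and its columns are invented for this sketch and do not come from any course instrument), these keywords and clauses compose as follows:
-- Define a table, populate it, revise it, query it, and remove a row.
CREATE TABLE Student (id INT, name VARCHAR(50), class_standing VARCHAR(20));
INSERT INTO Student (id, name, class_standing) VALUES (1, 'Li Wei', 'sophomore');
UPDATE Student SET class_standing = 'junior' WHERE id = 1;
SELECT name FROM Student WHERE class_standing = 'junior' AND id = 1;
DELETE FROM Student WHERE id = 1;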
Students typically view the machine interpretation and related constraints as “unforgiving.” In
contrast to natural languages, a misplaced semi-colon renders a long passage meaningless or,
at times, dangerous.
The range of correct responses to any question in this area may be interesting. In brief,
computer programming languages have an extraordinary linguistic flexibility as well as a
capacity for abstraction that defies simple expression. We can be a bit more technical for
that part of the audience in disciplines with overlapping skills. Up to a level of identifiers,
many programming languages are covered by the usual Chomsky hierarchy. In fact, many are
context-free. This makes them appear simple. The apparent simplicity is convenient
and forms the basis for lexical analysis and parsing. The human reality is something else. As we
begin to include identifiers, every programming language with significant power takes on a
generative aspect which dominates practical use. In brief, the language has the expressive
power to define, effectively, new nouns. (This is an analogy, but it works well enough.)
Reading becomes an exercise in decoding the references, impossible to do quickly unless one
can quickly absorb and discard entire vocabularies. With some programming languages, the
generative capacity is effectively unlimited. Not only do we have the equivalent of new
nouns, but also verbs, clauses, and, to a very large extent, entirely new grammars.
Effectively, in a single step, we move far beyond Chomsky and functional grammars. As an
example, template programming arose out of an unsuspected capacity of generic
programming to express an algorithmic language within itself, causing the compiler to
execute arbitrary tasks. Handling and understanding the expressive power requires a range of
skills. We can even redefine the usual infix notation of mathematics, approaching abstract
graduate-level algebra from another direction.
Certain principles and best practices do force a kind of convergence to particular expressions.
As a result, some issues of originality aren't easily evaluated. Most reasonable responses by
professionals to the task, “Design a brief database for handling a music collection,” would tend
to be very similar in appearance. There are only so many meaningful synonyms for artist. It is
generally true that variable names are not visible to the end user, and that one could substitute
nonsense words for well-selected variable names and a working answer would continue
to work. However, it is a fallacy that issues in communication can be ignored entirely --- even
if the only objective is getting an application to function as specified.
Larger issues are at play. One of the most difficult requirements for beginning
programmers to master is the absolute, critical necessity of explaining what they've
created.
Almost all programming languages offer a facility for commenting. That is, arbitrary
material expressed in a natural language can be placed within the code to explain
what it is and what it does. Stylistics intrinsic to the programming language also begin
to appear early on with formatting and name selection. Much of this is simply not
optional. Literate programming, as championed by Donald Knuth, discourages calling
variables, functions, and subroutines A, B, C, etc. Meaningful names are required.18
When they do not follow these kinds of guidelines, students produce naturally
obfuscated material.
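A tiny SQL illustration of the contrast (both tables are invented for this sketch):
-- Obfuscated: the names reveal nothing about what is stored.
CREATE TABLE T1 (a INT, b VARCHAR(100), c INT);
-- Literate: meaningful names and comments carry the intent.
CREATE TABLE Recording (
    id INT PRIMARY KEY,      -- surrogate key
    title VARCHAR(100),      -- track title
    duration_seconds INT     -- running time, in seconds
);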
Even with an early emphasis on teaching commenting, novice programmers routinely
face a critical stage where they have difficulty reading and understanding their own
programs.
There is a glass ceiling in place. They cannot improve their own results beyond a
certain level until they can read their own results.
The necessity of interaction with natural language is immediately apparent on another
level. As instructors, we have to specify tasks. That is, we begin an assignment with “Write a
program which . . .”19 There is some typical stylization and a certain rigor20 as to how this is
done; however, the assignments are ultimately given in a natural language.
For novices, there just is no other place to start.
The machine may be unforgiving, but the human audience is much harder to reach. As
beginning programmers reach toward handling challenges which are not designed to teach,
this particular issue is replete with many difficulties. Communicating algorithms, negotiating
specifications, etc. are overarching skills which connect many courses. It is fair to say that the
ability to read and write computer programs has a very strong relationship with the ability to
read and write in some natural language. Arguably, the ability to read and write
mathematical proofs may be a more precise analogical description of the overall skill. In fact,
there are numerous connections with problem solving.
Neither is entirely accurate.
In order to begin programming, the students do have to be able to read a task specification,
often much shorter than a paragraph. This may be deliberately vague. For example, “Design
a database to store the contents of your music collection.” They have to read and analyze
that request with appropriate tools of language in order to be effective. (An example of noun
decomposition will follow in another section.) Task specifications often appear to be natural
language. They aren't --- not in the usual sense. Normal vocabulary is used differently and
understood differently.
For example, the concept of age is interesting, and is often used as an example in
database design and normalization.
Storing the age of an individual is bad practice. In brief, the number changes with
time. The data becomes obsolete and questionable --- even over a short period of
time. Storing the birth date is more appropriate. With that information, age can be
calculated. Updating and refreshing the information is not needed. The data remains
durable.
A human being has exactly one birth date; however, a requirement for this
information in the wrong place may introduce insertion anomalies. That is, we may
know much about an artist, but not the birth date. A requirement in the wrong place
would prevent the user from storing what data is/was available.
What all of the above boils down to is that, when a question requires that age be
stored in a database, there are professional expectations concerning exactly how this
is done. The concept of age, the word age, should trigger those associations.
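A minimal sketch of the practice (the Artist table is invented, and the date functions shown are MySQL-specific):
-- Store the durable birth date, not the age; leave it nullable so an
-- unknown birth date cannot block an insertion.
CREATE TABLE Artist (
    id INT PRIMARY KEY,
    name VARCHAR(100) NOT NULL,
    birth_date DATE NULL
);
-- Derive age at query time instead of storing it.
SELECT name, TIMESTAMPDIFF(YEAR, birth_date, CURDATE()) AS age
FROM Artist;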
Restating, the mental processes in programming often connect certain principles to concepts
acquired in childhood. The students need to use natural language at an extremely high level of
conceptual understanding in order to acquire the skill because we, as instructors, have to rely
upon the students having the basic concepts, the vocabulary to discuss those concepts at a
high level of abstraction, and the general flexibility to expand on those concepts.
The relative abilities of students to absorb new material are perhaps less important
than their speed in applying that material. Many students learn best from hands-on practice.
Trial and error. If a student doesn't write a syntactically correct command, the compiler or
interpreter will generate an error. In brief, conclusive feedback can be incredibly swift in a
hands-on environment.
Another part of the desired literacy skill is connecting to programming examples. Given
example material, students can often “see” what a program or statement does in terms of
testing it out on their home computers. Understanding the internals sufficiently well in order
to create variations takes time and effort. Much of the assimilation and accommodation
process involved can be elided --- if and only if they can simply read and understand the
comments surrounding the example. In brief, we expect students with strong natural language
skills to show a faster rate of acquisition.
However, there's another significant disconnection from natural language acquisition. The
meta-cognitive skills in learning any subject involve the specialized vocabulary necessary to
describe the process of thinking and learning. For general subjects, we describe certain tasks
as involving assimilation or accommodation. These don't arise in the first year of
undergraduate courses. At least not too often. In contrast, for computer science, as students
proceed through even the simplest of material, they find confusing and contradictory
labels related to almost every skill they're acquiring. Some of this can occur because the
writers may be striving for a certain kind of accuracy at the expense of clarity. Yet, much of it
occurs because the author or authors are simply foreshadowing the much deeper meta-
cognitive material to come in subsequent courses. In particular, Software Engineering is
deeply concerned with making both the individual programmer and teams of programmers
more productive. There's a certain connection between thinking and expression which can
make the construction of new material go much faster.21 As a result, phrases such as object-
oriented programming, design patterns, and the Turing computational model may appear on
the first page of any given introductory text.
Some of the terminology which appears very early merely offers an extremely detailed
labeling, often distracting to the simple task of getting started. For example, I find limited
use on the first day for the phrase lexical scope resolution operator.22 Within the first
chapter, words such as class, pattern, and object may be redefined and used very differently
than they are in the standard language. Theoretical Computer Science has an equally
important impact on the common framework of abstractions. Theoretical and conceptual
abstractions, such as asymptotic analysis, also tend to appear very early.
There is ongoing debate within the field as to what should appear first. The complexity of initial
acquisition is also compounded by the need to use different verbs and labels to describe an
inhuman audience and avoid a misleading anthropomorphization. (Programs are not merely
produced, they're consumed. In human terms, they're read. In machine terms, they're
parsed.)
Design of the Assessment Instrument
The exact instrument given to the students is roughly ten to eleven pages in length.
The targeted program outcome, competency in at least one high-level programming language, is
subdivided into component pieces, testing smaller elements, some repeatedly, to ensure that
little guesswork is involved in reaching an answer. The series of questions within each
instrument is relatively comprehensive with respect to SQL. More than one question probes
the ability to form precise SELECT statements. Other questions address the formation of
DELETE, INSERT, and UPDATE statements. In the first three instruments, these commands are
heavily emphasized. Later probes in the sequence emphasize other elements, for example,
CREATE and ALTER statements. By the third course, there will be questions addressing the
formation of SQL queries within one or more higher level languages.23
Most questions were designed to directly test the ability to express commands (using SQL) to
accomplish specific tasks related to storage, retrieval and adaptation of a database.
Discipline-specific arguments are necessary to support which items are included and which are
omitted. Largely, what separates a database design from a simple spreadsheet is the use of
more than one table. Mastering queries where the required responses involve more than one
table is fundamental for competency in SQL. This is a sample question to test for that skill:
Suppose you have the following data schema:
Product(maker, model, type)
PC(model, cores, speed, ram, hd, rd, price)
Laptop(model, cores, speed, ram, hd, rd, screen, price)
Printer(model, color, type, price)
Write an SQL query which retrieves all the makers of printers with prices strictly less
than 1000.
Certain technical and pedagogical criticisms are possible. With respect to the stylized format
(the data schema approach)24, primary keys are not marked. Just as we might usually assume
a certain vocabulary, we can also usually expect that students can infer what the primary
keys are. The expected solution process requires this information --- along with other
necessary details --- for example, that the model codes for printers are distinct from model
codes for laptops. The natural language aspects can be discussed at length.
For now, at least one variation of this particular question is planned to appear on every
instrument in the sequence. The exact wording may change within even a single offering.
Minor variations, designed to discourage cheating and memorization, alter the word printers
to laptops or ask for color printers, etc. We do expect that all of the students will learn other
things and there are other course issues at stake. Later versions --- especially those in
subsequent courses --- might have a different set-up, perhaps employing a different kind of
database specification. In the second course, students may be asked to write functions in
another language to produce the needed response. By the end of the final course, the question
may be offered electronically within a controlled environment --- also testing for speed of
response. We expect a high standard at the end. Over time, we expect to see the
percentage of correct answers rise. We also expect that individual students will get faster. A
student who is competent in SQL should achieve mastery of this particular item along the
way.
No matter what vehicle is used, the answer to absolutely every variation of this particular
question requires the formation of a particular, syntactically correct, functional SQL query –- a
well-formed SELECT statement joining two tables.25
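For illustration only, and not as an official answer key, one such well-formed statement (assuming model is the shared key between Product and Printer) is:
SELECT DISTINCT Product.maker
FROM Product
JOIN Printer ON Product.model = Printer.model
WHERE Printer.price < 1000;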
In a similar way, every question in the sequence of instruments has, by design, direct, strong
relationships with questions to appear later. Not all of these relationships are as precise as
what is planned for the sample question; however, they all support analytic comparison of
atomic results. These links between the questions offer us much more than an insight into
what the students are learning; the common threads allow us to monitor the progress of
students over time. We want to know how fast they're learning, and how much they're
learning. This is the point of any longitudinal study.
As may be expected, the format of each question is intended to help separate and isolate
each precise skill we want to test.
At times, the ideal of isolating skills is better accomplished with coordinated question
sequences within an instrument. In brief, a well-designed instrument for this type of study
anticipates the need to explain student failure with less ambiguity than a normal examination.
Briefly, a student who can answer the questions has mastered the related skill. What we would
like to claim, with as much authority or evidential weight as possible, is that any student who
cannot answer the questions correctly has not mastered the skill. Concretely, we can usually
expect that a good computer science student would know what the terms laptop, PC, or
printer mean in this context. However, an otherwise outstanding student might occasionally
stumble over vocabulary --- particularly if he or she was learning English as a second
language. This is the simplest way to explain why each instrument also includes one or two
additional questions to lightly assess natural language skills.
These specific question threads serve multiple purposes, including the need to measure a
different aspect of competency with programming languages. Ultimately, the planned
progression of writing sample questions is tailored to target two specific meta-cognitive
skills involved in competency:
The first is the ability of individual students to interpret programming requirements;
the second is the ability of individuals to document their results, participating
appropriately in the kind of group work where one might never even meet one's
collaborators.
Ultimately, the assessment requires some validation in the sense that we must attempt to assert
that most students can understand the questions.
In the very first instrument, there were two questions merely asking for writing samples.
The first asked for a response in English. The second asked for a response in Chinese. (Later
instruments drop the request for a writing sample in Chinese, and direct the English writing
sample away from elementary issues and toward specific topics of concern in databases.)
The results of that sequence of questions may enhance any attempt at explanation for a given
student who has a significant delay or complete failure in achieving mastery of any other skill.
In practice, there are numerous interconnected skills involved in even the simplest task. Within
just the one sample question given above, there are significant asides and peripheral
considerations. As a claim, the natural language concerns are the most important. Nearly
every method for probing competency in computer programming is affected by natural
language skills.
Additional material is given below to justify the writing sample questions. In
simple terms, studies at the Nanjing campus are affected by English and Chinese language
considerations in many ways.
Well-considered rubrics for scoring the questions are also important to the design of testing
instruments.
Some additional insight into SQL, databases, and general programming is necessary before
we proceed. Outside of a teaching/learning context, it isn't quite natural to have the pertinent
details predigested in any form, let alone the specific, brief form given above. Directly, the data
schema given above abbreviates what would otherwise be an extremely lengthy specification.
Much is stated. Much is implied. A few relationships might need to be inferred. Here, as
with standard story problems for engineering or mathematics, a stylized format offers the
necessary simplicity to put the question within reach of beginning students. Some of the
simplicity does come at cost in precision. Thus, we specifically need to communicate a few
details about test scoring in this context.
The students should, at times, write down the inferences they make about the
relationships as part of the answer --- in much the same manner as a student might
“show work” in answering a mathematics problem. The additional material is not the
answer, but it supports the answer and makes the reasoning transparent.
Some of this expectation reflects an instructor/grader prerogative: Partial credit for a
wrong answer is possible if and only if the nature of the mistake is both obvious and
not integral to the direct skill being tested.26 Other policies may be important. An
explanation serves as insurance. Full credit for an unexplained answer, even if it is the
only correct answer anticipated by the instructor, is not always assured.
Generally, for any subject, the process of designing good test questions is a bit of an art. There
are many metrics of quality. Clarity is one. Accuracy in reflecting real-life concerns is another.
There are others. Part of the process in designing questions for these efforts anticipates
criticism from general sources. Best practices make this a long-term, public study --- at least
within the NYIT community. The entire sequence, including all student responses, will be
archived and placed into the shared pool of material at NYIT upon completion. (Interested
faculty may request source material for all delivered instruments during the study.) Versions
in which identifying student information has not been redacted should not be released outside
the university.
Natural Language Considerations
Ultimately, the targeted program learning outcome simply involves a very high degree of
natural language competency. With any programming problem or exercise, the task
specification usually omits much more detail than the given sample question. A larger task
would be deliberately vague and success would require practiced problem interpretation.
We can offer very specific supporting material here.
A key example is the technique of noun decomposition as described in 27. This simple
approach (or a variation) is usually introduced in first course where object-oriented
design appears. (At NYIT, this is the very first course in the Computer Science
sequence.)
As a pointed example of application, we may require that students design a database
to store information about their music collections. This technique would direct the
students to focus on whatever noun phrases appear in a lengthy natural language
description of the task.
The likely outcome would include the identification of things such as title, artist, year,
etc. All of these would likely appear as fields in any good student solution.
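A minimal sketch of what such a solution might look like (the table and column names are illustrative, not drawn from any graded submission):
CREATE TABLE Album (
    id INT PRIMARY KEY,
    title VARCHAR(100) NOT NULL,    -- noun: title
    artist VARCHAR(100) NOT NULL,   -- noun: artist
    release_year INT                -- noun: year
);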
Many of the most useful tools for turning a vague problem statement into an appropriate,
effective design are directly connected to natural language tools. Deeper analytic techniques
appear in Software Engineering courses. The very naive noun decomposition technique is a
starting place, and the actual skill is closer to an art. Common knowledge is assumed. So is
common sense. A concept such as artist can be further decomposed into name, age, etc. We
don't expect a student to continue the analytic process indefinitely; however, the depth of
analysis can vary --- and there are meta-cognitive skills at play in communicating the limits we
expect.
To connect this with learning outcomes, a student unfamiliar with the concept of a noun
would be at a disadvantage in understanding the noun decomposition technique. Yet we
don't consider English language skills as prerequisites. Like mathematics, Computer Science
offers a certain independence from specific natural languages. Programmers routinely share
source code without sharing a natural language.
As noted earlier, there are additional questions to probe natural language skills. We make no
assumptions about the results beforehand, and comprehensive assessment is not possible.
One intent is simply to correlate relative progress with component skills derived from one
perspective and an embedded, overarching skill derived from another. A second intent
would be to expose elements that could better explain encountered failure or delays in
acquiring competency with SQL. We make no assumptions or predictions about the results
beyond the acknowledgement that language issues can impact the progress of even the best
of students. The impact can be either negative or positive.
As a practical matter, the students at the Nanjing campus are learning English as a second
(sometimes third) language and proceeding without a true immersion environment. With
instruction delivered in English, lectures in a non-English subject serve to support practice in
two separate skills simultaneously. The connections are clear, nearly self-evident, and not
always discouraging. Students learning a second language may be more familiar with parts of
speech, grammar issues, and meta-cognitive skills than monolingual students who've been
unchallenged in these areas. There is no evidence to support whether our students are
ultimately better or worse than those proceeding from a different background or those who
emphasize different study methods.28 We have problem students. We also have students
with flawless, enviable language skills. Our students can illustrate many suspected issues,
although most are limited in scope. A few of our students do lack vocabulary, avoid listening
and reading practice, and rely upon electronic dictionaries and the whispers of friends during
lecture. There is a reasonable concern whether or not these particular students have the
essential language skills necessary to succeed in coursework --- or the capacity to develop
those skills concurrently while attending courses where English is not taught directly. Much
of this is simply beyond the scope of the current assessment study. The challenges of
teaching in a global program are often unique and ephemeral, and the requirements for clear
instructor-student communication are fairly significant in any course.
The lingering concern that has to be addressed with respect to an assessment study is
whether or not the local language issues may skew the results. Further, we need to at least
consider different ways to design instruments to minimize language concerns.
Some very precise, unambiguous material could be delivered through other mechanisms.
With respect to the sample question given earlier, the set-up could have been expanded to
include the four specific CREATE statements needed to build an example of the implied
database. This is something we expect the students to produce and use; however, providing
just the four CREATE statements as part of the problem statement would increase the length
of the question to more than two pages, with much of the presented detail both irrelevant
and distracting. Doing this --- even just this much --- also introduces a much deeper reading
skill. More to the point, the skill we're driving toward is the linkage between natural
language and precise, mathematically unambiguous material. If we could assume student
mastery of this at the beginning, assessment would be irrelevant. Ultimately, removing the
concerns of stylization from this subject may be an inappropriate goal. In the simple world of
learning exercises and test questions, pool tables are frictionless, players in a market economy
behave rationally, and the simple relationships are, well, simple. Some stylization of test
questions is unavoidable in almost any subject. Here, it may be a positive aspect. While we
don't want to encourage distracting test skills or “plug-n-chug” responses, an anomaly of this
subject is that problem stylization is eerily reflective of the real-life difficulties in working
with databases. That is, some of the student burden conveyed in the sample problem offers
precisely the right level of challenge for practical work.
Real databases don't look like the toy models we offer for exercises and test questions. Yet, real
databases are always toy models of something tangible in the outside world. There are
abstractions of abstractions at play here. Consider the sample question again. The necessary
set-up for the problem requires a shared perspective on a small portion of an entire database
--- a very difficult thing to convey quickly and unambiguously. Someone proficient in SQL
might pause before answering this kind of question because that person would be used to
obtaining needed details by working with the database directly. The table and field names in
an existing database --- metadata --- are typically the only reliable resource available reflecting
the designer's intentions. (Documentation material, when it does exist, is often poor.)
Metadata only offers short labels. The names selected by the designer might imply many
unstated, but critical relationships. Not all names are well-chosen. Poorly chosen names
encourage discrepancies, abuses, etc. An experienced professional would expect to spend
significant time studying a real database, probing and testing to verify the actual
relationships, before working with it. Thus, there is a need to understand how other people
might interpret the same terms. In most cases, professionals would consider a strong
vocabulary, contextual interpretation of terms, and inference of intended relationships to be
inseparable aspects of the greater skill of working with databases.
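As a sketch of what that probing might look like in practice --- assuming an engine that exposes the standard information_schema views, and using a hypothetical schema name --- a professional could begin by listing the tables and columns before trusting any interpretation of them:

    SELECT table_name, column_name, data_type
    FROM information_schema.columns
    WHERE table_schema = 'campus_records'    -- hypothetical schema name
    ORDER BY table_name, ordinal_position;

The output is exactly the set of short labels described above; everything else about the designer's intentions must be inferred, and then verified against the data itself.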
The same argument holds when we consider that the database specification could have been
given with lengthy natural language explanations of all the terms and relationships as used in
the problem. Beyond the argument given above, there is a specific local concern. Lengthy
problems in English just don't work in this environment.29
As an alternative, dictionaries could be provided. That is, we disregard the problems and
offer standardized help. Providing complete dictionaries for every student simply goes
beyond our resources. In fact, the instructor burden for providing partial dictionaries is
unreasonably high. Yet, the most appropriate response here is that this proposal would alter
the testing proposition. An instructor-provided, partial dictionary becomes an expected
crutch, and, when omitted, becomes an excuse for not acquiring course skills. When
routinely included, it forms an excuse for not acquiring vocabulary --- in any language.
The particular issue at the Nanjing campus is acquiring a second language concurrently. The
English language vocabulary for our students is not as strong as it would be among native
speakers of English and not as strong as their Chinese language vocabulary. Thus, a further
variation on this approach considers translating dictionaries, offering definitions of English
terms in Chinese. Note that there is a major distinction between providing a dictionary and
allowing the students to provide their own. Here, for this study, we do allow students to
bring in non-electronic translating dictionaries --- carefully checked for crib sheets and other
inappropriate material. As a teaching opportunity, the time required to look up terms is
more than significant in a test-taking environment, and the students themselves have begun
to discover that this isn't the most practical approach.
Simple pragmatics mandate that we ask brief questions where brief answers are satisfactory.
We simply need the students to understand the questions --- quickly and in isolation from
most other skills. Returning to the concerns of assessment, we could consider asking
assessment questions bilingually or entirely in Chinese --- or whatever language the students at
hand feel most proficient in.
It's easy to make the case that this may not be appropriate for instruction. Ultimately, there
are circumstances where this approach may serve the needs of assessment much better. Bilingual
test preparation increases the instructor burden and defeats one of the major purposes
of instruction in English. In this learning environment, this could be much worse than simply
confusing. The mere existence of well-designed test instruments is important with respect to
English language acquisition. There are cultural issues with respect to how the students here
approach learning English --- anecdotally, they spend enormous effort preparing for
standardized tests, and less effort in conversational or writing practice. In brief, testing in
English is a major motivation.
For this study, the intent of designing the major assessment instruments in Chinese (or
bilingually) would be to isolate their acquisition of SQL from their ongoing acquisition of English.
That is a valuable goal, and this requires serious consideration. The immediate problem (and
the depth of that problem) might be surprising. Students at the Nanjing campus encounter
far greater difficulties in using Chinese in connection to computer applications than they do
with English. In fact, the central focus of any database course in Chinese would necessarily be
very different. In brief, by using Chinese as the sole language of instruction for acquisition of
SQL, we'd be adding numerous linguistic requirements which would be difficult to support
robustly.
While Oracle and MySQL implementations of SQL do offer support for Chinese (and, more
directly, Unicode), there are still repeated difficulties with the variety of encodings, fonts, and
expressions. Both natural Chinese character sets, traditional and simplified, are much, much
larger than the usual Latin character set used for English (and other European languages). In brief,
there are far more than 256 distinct characters, so while written Chinese can be handled in
byte-oriented storage, storing one character per byte is not possible; multi-byte encodings are
required. At a fine-grained level of analysis, UTF-8 is a precise, unambiguous, industry-standard
solution for handling Chinese strings where a reversible mapping into byte storage is needed. Yet
retrieval and display in a general context have been incompletely standardized. UTF-8 storage
solutions are straightforward; however, general-purpose tools often expose numerical codes rather
than Chinese characters, and these numerical representations lack human readability. Thus,
maintenance of databases without specialized tools is problematic. The better solutions are both
proprietary and expensive.
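For illustration only, and assuming a reasonably recent MySQL installation with utf8mb4 support, a table intended to hold Chinese text could be declared as in the sketch below (the table and column names are hypothetical); the length functions then make the multi-byte storage visible:

    CREATE TABLE glossary (
        term_en VARCHAR(100),
        term_zh VARCHAR(100)
    ) DEFAULT CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

    -- three Chinese characters are stored as three characters but nine bytes
    SELECT CHAR_LENGTH('数据库') AS characters_stored,
           LENGTH('数据库')      AS bytes_stored;

None of this is difficult in isolation, but each such detail is one more requirement that would have to be taught and supported if Chinese, rather than English, carried the instruction.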
One can translate Chinese written with either the traditional or simplified character set into
human-readable material using the Latin character set as a basis. There is a standard for this.
However, it isn't the sort of system that one would expect native speakers to be completely
comfortable using. Directly, written Chinese can be rendered in Hànyǔ Pīnyīn; a version of
Pīnyīn without diacritical marks we may refer to here simply as Pinyin.30 Hànyǔ Pīnyīn is phonetic ---
in fact, the word Pīnyīn has a denotation consistent with the English word phonetic --- and its
construction is relatively recent, although the typographic decisions about where to place
the diacritical marks have the kind of arbitrary complexity one expects of any language
artifact with a significant history. The general process, called Romanization, is useful for applying
older, legacy string-handling material (based on the old ASCII standard31) to
languages with non-Latin character sets. It's also helpful in the language acquisition process
for non-Chinese speakers.
For native speakers of Chinese, the issue is more complex. Pinyin, as opposed to Hànyǔ
Pīnyīn, is not phonetic. Pinyin uses the Latin character set, but an individual word rendered in
Pinyin could be the result of several different words in Chinese. The four significant diacritical
marks in Hànyǔ Pīnyīn indicate tone. Mandarin has four, sometimes five, tonal forms, and the
tone is necessary to complete the meaning. Chinese is typed using Pinyin, adding to some
confusion over what a native speaker may and may not know. The initial letters struck
typically yield several options for the precise Chinese character intended --- either simplified or
traditional --- and the user must select from the choices available using a mouse and/or arrow
keys. Like the automatic spelling correction offered for English SMS typing on smartphones,
the software is forgiving of mistakes and offers choices which are not restricted to
what was requested. Specifically, the software would typically include those options which are
more likely to be what was intended as opposed to what was requested, with algorithms for
the process using automated inferences based upon preceding characters. That is, one could
mistype the Pinyin for a particular character and still get the correct character as an option. All
of this should help to explain the following repeatedly-observed phenomenon:
Some of the freshmen at the Nanjing campus of NYIT, functional typists on US
keyboards and intelligent students, have occasionally demonstrated difficulties with
spelling their own names in Pinyin.
The issue is minor, but underscores a general claim that Pinyin is not a natural way of
typesetting Chinese. Written Chinese material, using either simplified or traditional characters,
can be transliterated automatically into Pinyin. The reverse is more difficult. For non-Chinese
speakers, a good analogy for the human effort might be rendering English phonetically using
the Greek alphabet and then removing the vowels. Even this might not capture the right level
of difficulty. The diacritical marks are significant. Like unpointed Hebrew (points indicate
vowels), sufficiently long passages in Pinyin (without diacritical marks) can be understood by a
native speaker. At least eventually. Unlike unpointed Hebrew, this isn't at all natural.
Ultimately, short passages in Pinyin can be decrypted by an intelligent native speaker with only
modest effort, but regular reading or production would be exhausting without practice. With
automated transliteration, the reverse holds: short passages would likely be ambiguous; long
passages would be simpler.
Not knowing where this report might be disseminated, the following conclusion/assertion
may be badly misinterpreted if taken out of context: English is the simpler choice.
Ultimately, technical courses in SQL taught by native speakers for native speakers are very
different than the NYIT sequence. Those courses need to address a wide range of topics
directly connected to the encoding and display problems at the outset. Some instructors use
localized solutions, and many students are handcuffed to specific technologies during
instruction. In other words, they aren't learning SQL in a general context. Instead, they're
learning Oracle, Microsoft Access, etc. specifically and directly. Many of the issues with
respect to SQL are similar, but competency in SQL is less a concern than general functionality
with the tools.
The end result is that using the Chinese language for assessing SQL acquisition is not
appropriate. We cannot present problems for assessment which require skills far beyond
what we're teaching.
Assessment Day at the Nanjing Campus
The agenda for Assessment Day at the Nanjing campus included an informal introduction
similar to the above, along with a handout provided by New York.32 All available faculty were
present.
An early part of the presentation described what is NOT covered by the notion of assessment.
Generally, the point was offered repeatedly that assessment activities at NYIT aren't
threatening to a particular professor or student. NYIT, like many other institutions, makes a
careful effort to safeguard its integrity in recognized ways. For example, the individual
responses to course evaluations, sensitive material, are carefully separated from individual
responses to examinations, graded material. More precisely, students are aware that
safeguards exist to ensure that their comments can be made freely and that their grades are
not at risk.
For an audience of faculty, grades aren't the issue.
As may be expected, some concerns were raised, and it should be noted that there were many
sensitivities in this area. Specific sensitivities centered around a general concern that
assessment studies might be used as yet another mechanism to evaluate instructor
effectiveness. There is some validity to this concern. Assessment activities span multiple
courses. They measure the student acquisition of material. They're often designed to provide
comparable data. So, the potential exists. Yet one may, and should, expect that the human
resource procedures for faculty --- specifically those involving retention, raises, advancement,
tenure, and other sensitive aspects of employment --- are simply independent mechanisms. At
NYIT, these are not connected to assessment studies.
It was generally accepted that useful assessment studies can be conducted in a manner in which
instructors are not at risk. There was no general resolution to these concerns other than
an acknowledgement that assessment is a maturing process emerging in higher education.
Part of the thesis presented was that both initiation of and participation in assessment activities
at NYIT lie well within the capacity (and capabilities) of any individual instructor at any time,
and, ultimately, assessment at NYIT is not a mysterious activity. It should be noted that many
faculty were less familiar with the program learning outcomes and that effort should be
expended in disseminating this material routinely through other departmental and campus
initiatives. The faculty generally acknowledged that student assessment within a single course
is already the responsibility of the instructor. Typically, grading has a legacy of meaning and
certain unique features. It may be considered almost a separate process. One distinction, as far
as assessment of the institution is concerned, is that all currently administered examinations for grading
serve only course-related goals. Thus, most assessment initiatives should avoid redundancy
and focus on the issues which affect multiple courses. The distinction between course and
program learning outcomes was discussed at length, and details of the NYIT framework were
presented.
To some extent, competency, even expertise, in the design of assessment instruments
appropriate to each course is expected from the assigned instructor. There were few questions
in this area.
After a break, audience participation devices were distributed and a demonstration of rapid
assessment was given. The demonstration addressed a significant local issue with students
being unwilling to raise their hands during lecture. The demonstration also provided a basis for
discussing how quickly and seamlessly assessment activities could be integrated into a course.
Following an instructor question, the limitations of what's possible were discussed.
Sometimes not everything “fits” within a narrow framework of measurable outcomes, and good
instructors may motivate the subject in diverse ways. As a result, students do expect that
presented course material reflects an understanding of how the course objectives may connect
to a larger picture. Yet, what's not expected, outside of assessment efforts, is that each
individual instructor repeatedly attempts to measure all the program outcomes directly, putting
grades at risk33 for broad overarching skills, perhaps incompletely mastered and tangential to
the specific subject at hand. Many examples are possible. While mastery of certain technical
jargon is expected in almost every course, simple logistics prevent testing all language skills
robustly, repeatedly, and comprehensively in every course. Even if it were possible, the
proposition goes far beyond the implicit institution-instructor-student contract. Paraphrasing a
discussion point, many instructors would be uneasy with the practice of putting grades directly
at risk34 for English comprehension skills in a Chemistry35 course; yet, they would be relatively
comfortable with putting those same grades indirectly at risk. That is, a certain fluency is
expected in order for students to follow exam instructions and demonstrate what they know.
The day concluded with a long discussion of local issues.
Significant topics included upcoming examinations and concerns with how to handle incidents
of cheating and plagiarism. A general policy for handling cheating was discussed, leading to a
decision to collect material on specific incidents so that repeat offenders could be tracked
through the dean's office. A policy for plagiarism was also discussed. No significant resolution
was made.
After the meeting, materials relevant to that discussion were found on the New York website
and disseminated to faculty.
Preliminary Report on Assessment Activity Results
Briefly, the number of students in the sample (44) is too small to make conclusive statements.
The material here is suggestive and individual strong performances tend to dominate the
results. Further, it is too early in the sequence of assessment instruments to provide detailed
statistics such as regression tests.
Three out of the four instruments for the first course have been administered.
Some early analyzed results are available.
1. Students in the first stages of the first course in databases are acquiring significant
practice. Two students are already close to having the entire range of skills involved
in the targeted program outcome.
(That is, at the time of the third assessment in the very first course, these two students
are already nearly where we would expect them at the end of the third course.)
2. With a small number of exceptions, all students have demonstrated significant gains
between the initial assessment and the second assessment. Less significant gains were
made between second and third assessments.
As an issue, many students are converging on the needed atomic skills, but have
problems with small details. Small variants in how partial credit is applied do not
change the overall picture greatly, but affect the statistical metrics. Roughly 83% of the
students are making progress beyond the initial prediction. A small number of students
are far behind. Individual attention has been directed to these students.36
Close analysis of individual test results from the third administration reveals that many
students attempted to memorize responses to earlier versions of the same instrument ---
rather than respond to the questions at hand. This has been discussed with the
students directly, and the fourth instrument in the sequence is being adjusted
accordingly to make this less likely.
3. The ability of the students to express themselves with the English language is modestly
correlated with their ability to express themselves in Chinese. Neither is significantly
correlated with their ability to acquire a programming language --- at least in the initial
phases.
Difficulties in properly assessing Chinese writing samples, the inconclusive nature of
those questions, and the limited space and time for questions of this nature during
examinations had led to a decision to drop this series of questions.
The English writing sample question is being modified as planned, and now targets
course material rather than general topics.
4. The students are acquiring significant English language practice concurrently with the
course.
A peripheral instrument, a standard word-analogy exercise in English, with
questions similar in form to those offered by the GRE --- albeit simpler --- was
administered through an audience response system.
Over a period of nine weeks, the body of students went from an audience average of
24% correctness to 56% correctness.
5. Student feedback through direct discussion has been extraordinarily positive.
Note that all courses present the course learning outcomes on the syllabus. Here, the
first day of class required an explanation of the assessment schedule, the targeted
program outcome, and a description of how this material was to be embedded into
regular examinations. Much of what was discussed covered exactly what did and did
not affect their grades. Student feedback through Likert probes (administered through
the audience response system) that same day was mildly favorable; however, individual
students came forward after the first instrument to express gratitude over being shown
“exactly what we need to learn.”
Informally, they indicated that neither the discussion nor participating in the actual
assessment would have been sufficient. Both made the way forward through studies
clear. This may help explain the significant gains after the first assessment --- although a
conclusion is not yet possible.
In brief, presenting them with the framework of what they're expected to do, the
connections between all learning outcomes in their raw form, is useful in more than one
way.
6. The final instrument for this course is scheduled for early June.
Analysis and conclusion should follow by July.