Automatically Grading Programming Assignments with Web-CAT
Stephen H. Edwards
Virginia Tech, Dept. of Computer Science
[email protected]
http://web-cat.sourceforge.net/
Stephen Edwards Virginia Tech
My goals today are to …
Explain how requiring students to formulate and test hypotheses about their own code can improve their understanding and performance
Describe our experiences with an alternate grading approach supported by a new tool: Web-CAT
Describe some of the flexibility in Web-CAT for supporting other approaches
Convince you software testing can be an important—and practical—addition to classroom practices
Automatically Grading Programming Assignments with Web-CAT
Students hold onto ineffective techniques
Too often, intro students believe that if their code:
compiles, the errors are mostly gone
runs correctly when I try it once, it is correct
runs on the instructor-provided sample input, it is correct
has a problem, it can be fixed by trial and error
What is reflection-in-action?
For an expert, when the current technique is failing …
Step back and reflect: “I must be missing something”
Re-examine the situation, your solution, and your implicit assumptions about the problem
Leads to guesses (hypotheses) about why the solution isn’t working or why something else will be better
“[Carry] out an experiment which serves to generate both a new understanding of the phenomenon and a change in the situation”
Practicing software testing will help students frame and carry out experiments
The problem: too much focus on synthesis and analysis too early in teaching CS
Need to be able to read and comprehend source code
Envision how a change in the code will result in a change in the behavior
Need explicit, continually reinforced practice in hypothesizing about program behavior and then experimentally verifying their hypotheses
Student comments suggest their current testing practices are often weak
I run them through some simple tests to ensure that it is operating as expected. But for the most part I have always relied on supplied test data
I don’t think about test cases until I am confident my program is 100% working. Of course, it almost never is …
I usually write the whole thing up and then start doing rapid-fire tests of everything I can think of.
A comprehensive strategy is necessary for a culture shift in what students do
Students cannot test their own code
Want a culture shift in student behavior
A single upper-division course would have little impact on practices in other classes
So: Systematically incorporate testing practices across many courses
[Diagram: Testing Practices spanning CS1, CS2, OO Design, and DataStruct]
Expect students to apply their testing skills all the time in programming assignments
Expect students to test their own work
Empower students by engaging them in the process of assessing their own programs
Require students to demonstrate the correctness of their own work through testing
Do this consistently across many courses
What tools and techniques should I teach?
We want to start with skills that are directly applicable to authentic student-oriented tasks
Don’t want to add bureaucratic busywork to assignments
Without tool support, this is a lost cause!
It is imperative to give students skills they value
… But most textbooks only give a “conceptual” intro to idealized industrial practices, not techniques students can use in their own assignments
Test-driven development is very accessible for students
Also called “test-first coding”
Focuses on thorough unit testing at the level of individual methods/functions
“Write a little test, write a little code”
Tests come first and describe what is expected; the code follows and is revised until all tests pass
Encourages lots of small (even tiny) iterations
See http://web-cat.sf.net/ for on-line references
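A minimal sketch of the "write a little test, write a little code" rhythm in Java. The `Counter` class and its methods are hypothetical, invented only for illustration. In a course these checks would normally be JUnit test methods; plain Java is used here so the sketch runs without any library.

```java
// Hypothetical example: each test below was written first and describes
// what is expected; Counter was then grown until all of them passed.
class Counter {
    private int count = 0;

    void increment() { count++; }

    void reset() { count = 0; }

    int value() { return count; }
}

class CounterTest {
    // Iteration 1: a new counter starts at zero.
    static boolean testStartsAtZero() {
        return new Counter().value() == 0;
    }

    // Iteration 2: increment() raises the value by one each time.
    static boolean testIncrement() {
        Counter c = new Counter();
        c.increment();
        c.increment();
        return c.value() == 2;
    }

    // Iteration 3: reset() returns the counter to zero.
    static boolean testReset() {
        Counter c = new Counter();
        c.increment();
        c.reset();
        return c.value() == 0;
    }

    public static void main(String[] args) {
        System.out.println("startsAtZero passes: " + testStartsAtZero());
        System.out.println("increment passes: " + testIncrement());
        System.out.println("reset passes: " + testReset());
    }
}
```

Each iteration adds one small test and just enough code to satisfy it, which keeps every step tiny.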
Students can apply TDD in assignments and get immediate, useful benefits
Conceptually, easy for students to understand and relate to
Increases confidence in code
Increases understanding of requirements
Preempts “big bang” integration
The problem is devising an effective assessment strategy
Need to assess student performance at testing
Need to give productive feedback
Need to provide rapid turnaround
Cannot afford huge increase in resources required
Conventional automated assessment does not encourage good testing habits
Student uploads program
Program is compiled
Executed against test data
Scored based on output
The conventional approach provides useful benefits that do lead to a cultural change
Fast, precise feedback to students
Chance(s) to improve based on feedback
Good assessment of behavior
Systematic use resulted in culture change
But the conventional approach may discourage desired behavior and skills
Focus is on output correctness, first and foremost
“Get it working first, work on commenting, structure, etc. later”
Students not encouraged or rewarded for testing on their own
Students often do less testing
Proper grading and feedback can provide positive incentive for desirable behavior
Decide what behavior to foster
Choose a corresponding scoring/reward system
Design feedback approach
Use students’ adaptive nature to drive cultural change
Proper grading and feedback is critical to reinforcing desired behavior
Assess test validity: correctness of student’s tests
Assess test completeness: the “thoroughness” of student’s tests
Assess program correctness: behavior of student’s solution
Multiply scores as percentages
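The multiplication of the three scores can be sketched as follows. The class and method names are hypothetical, and the sample percentages are made up for illustration; only the rule "multiply scores as percentages" comes from the slide.

```java
class CompositeScore {
    // Each argument is a fraction between 0.0 and 1.0 (a percentage / 100).
    // The result is the product, also a fraction between 0.0 and 1.0.
    static double combine(double testValidity,
                          double testCompleteness,
                          double programCorrectness) {
        return testValidity * testCompleteness * programCorrectness;
    }

    public static void main(String[] args) {
        // Illustrative numbers only: all student tests valid, tests cover 80%,
        // solution passes 90% of the correctness checks: 1.0 * 0.8 * 0.9 = 0.72.
        double score = combine(1.0, 0.80, 0.90);
        System.out.printf("Overall score: %.1f%%%n", score * 100);
    }
}
```

Because the scores are multiplied rather than averaged, a weak result on any one dimension pulls the whole grade down; a submission with no testing at all (completeness 0%) scores zero no matter how correct the solution is.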
Students improve their code quality when using Web-CAT
[Figure: Bugs/KSLOC (0 to 125) for students ranked by defect rate (0% to 100%), annotated with the levels for newly written “untested” code and commercial-quality code]
Students start earlier and finish earlier when they use Web-CAT
[Figure: Submissions Relative to Due Date. Number of submissions (0 to 20) by days before due date (More, +9 through +1, Due, -1, -2), with testing vs. without testing]
An evaluation of submitted code indicates students program more effectively
Measure                            Without TDD   With TDD
Recorded grades                    90.2%         96.1%
TA assessment                      98.1%         98.2%
Automated grader assessment        76.8%         94.0%
Faults on master test suite        36.7%         24.9%
Projected Defects/KSLOC            70            38 (45% less!)
How early was first submission?    2.2 days      4.2 days
Bold: p = .05 significance
After using TDD and Web-CAT, students clearly perceive practical benefits
Statement                                        Rating (Agree/Disagree scale)
More helpful at detecting errors than Curator    4.3
Provides excellent support for TDD               4.1
Increases my confidence in correctness           3.9
Increases my confidence when making changes      3.8
Makes me test my solution more thoroughly        3.8
Makes me more systematic in devising tests       3.8
Would like to use, even if not required          3.8
Student reactions are very positive toward TDD
I am very excited about using TDD.
I agree that TDD can be beneficial and I’m glad we are being required to experiment with it in this course.
If it increases the effectiveness of my programming and decreases the time I spend debugging, then I am all for it.
[Previously,] I had to quit my detailed testing and stick to making the program appear to work with the sample data given every time a deadline drew near. With [TDD], the tests are such an integral part of the project that no time-conserving measure will save me.
We use Web-CAT to automatically process student submissions and check their work
Web application written in 100% pure Java
Deployed as a servlet
Built on Apple’s WebObjects
Uses a large-grained plug-in architecture internally, providing for easily extensible data model, UI, and processing features
Web-CAT’s strengths are targeted at broader use
Security: mini-plug-ins for different authentication schemes, global user permissions, and per-course role-based permissions
Portability: 100% pure Java servlet for Web-CAT engine
Extensibility: Completely language-neutral, process-agnostic approach to grading, via site-wide or instructor-specific grading plug-ins
Manual grading: HTML “web printouts” of student submissions can be directly marked up by course staff to provide feedback
Grading plug-ins are the key to process flexibility and extensibility in Web-CAT
Processing for an assignment consists of a “tool chain” or pipeline of one or more grading plug-ins
The instructor has complete control over which plug-ins appear in the pipeline, in what order, and with what parameters
Plug-ins have a simple, flexible, yet powerful way to communicate with Web-CAT and with each other
We have a number of existing plug-ins for Java, C++, Scheme, Prolog, Pascal, Standard ML, …
Instructors can write and upload their own plug-ins
Plug-ins can be written in any language executable on the server (we usually use Perl)
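To make the "tool chain" idea concrete, here is a small sketch of a pipeline of grading steps run in instructor-chosen order. This is not Web-CAT's actual plug-in API: the `GradingStep` interface, the shared key/value map, and the property names are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical interface, NOT Web-CAT's real plug-in API.
// Each step reads what earlier steps recorded and adds its own results.
interface GradingStep {
    void run(Map<String, String> shared);
}

class GradingPipeline {
    private final List<GradingStep> steps = new ArrayList<>();

    // The instructor decides which steps appear and in what order.
    GradingPipeline add(GradingStep step) {
        steps.add(step);
        return this;
    }

    // Runs every step in order, passing along one shared set of properties.
    Map<String, String> grade() {
        Map<String, String> shared = new LinkedHashMap<>();
        for (GradingStep step : steps) {
            step.run(shared);
        }
        return shared;
    }

    public static void main(String[] args) {
        GradingPipeline pipeline = new GradingPipeline()
            // Step 1 (stand-in for a compile step): records whether the build succeeded.
            .add(shared -> shared.put("compile.ok", "true"))
            // Step 2 (stand-in for a scoring step): uses the earlier step's result.
            .add(shared -> shared.put("score",
                    "true".equals(shared.get("compile.ok")) ? "100" : "0"));
        System.out.println(pipeline.grade());
    }
}
```

The point of the design is that steps only depend on the shared properties, so they can be reordered, reconfigured, or replaced independently.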
The most well-known plug-in is for grading Java assignments that include student tests
ANT-based build of arbitrary Java projects
PMD and Checkstyle static analysis
ANT-based execution of student-written JUnit tests
Carefully designed Java security policy
Clover test coverage instrumentation
ANT-based execution of optional instructor reference tests
Unified HTML web printout
Highly configurable (PMD rules, Checkstyle rules, supplemental jar files, supplemental data files, Java security policy, point deductions, and lots more)
Web-CAT supports a variety of languages, and its Java plug-in is aimed at software testing
Web-CAT provides timely, constructive feedback on how to improve performance
Indicates where code can be improved
Indicates which parts were not tested well enough
Provides as many “revise/resubmit” cycles as possible
The most important step in writing testable assignments is …
Learning to write tests yourself
Writing an instructor’s solution with tests that thoroughly cover all the expected behavior
Practice what you are teaching/preaching
Students get frustrated without feedback, so reference tests must provide some
If students only get a score, but no other feedback for how to improve, they get easily frustrated
We augment our reference tests to provide “hints” for failed tests, cross-referenced to the program assignment
Requirements in assignment spec
mul: this command takes two arguments from the evaluation stack and multiplies them (11).
Feedback to student on failed test
Your testing does not fully cover (11)
More detailed alternate feedback
(11) mul command failed, expected 4 but received 8
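A sketch of how a reference check might attach a hint keyed to a numbered requirement. The class, the method, the choice of inputs (2 and 2), and the buggy student implementation are all hypothetical; only the two message formats are taken from the slide.

```java
import java.util.function.IntBinaryOperator;

class ReferenceCheck {
    // Checks a student's "mul" against requirement (11) of the assignment spec.
    // Returns null when the check passes; otherwise returns a hint that is
    // cross-referenced to the requirement number.
    static String checkMul(IntBinaryOperator studentMul, boolean detailed) {
        int expected = 4;                          // 2 * 2
        int actual = studentMul.applyAsInt(2, 2);  // run the student's code
        if (actual == expected) {
            return null;
        }
        return detailed
            ? "(11) mul command failed, expected " + expected + " but received " + actual
            : "Your testing does not fully cover (11)";
    }

    public static void main(String[] args) {
        // Hypothetical buggy student solution: multiplies, then doubles.
        IntBinaryOperator buggy = (a, b) -> a * b * 2;
        System.out.println(checkMul(buggy, false));
        System.out.println(checkMul(buggy, true));
    }
}
```

The terse form points the student back to the spec without revealing the failing case; the detailed form trades some of that discipline for faster debugging.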
Students will try to get Web-CAT to do their work for them
Students appreciate the feedback, but will avoid thinking at (nearly) all costs
Too much feedback encourages students to use Web-CAT for testing instead of writing their own tests—they use it as a development tool instead of simply to check their work
This limits the learning benefits, which come in large part from students writing their own tests
Lesson: balance providing suggestive feedback without “giving away” the answers: lead the student to think about the problem
We have also tried to influence student work habits to improve their success
Encourage early submission by providing extra incentives or using late penalties
Score bonuses and/or penalties are easy
Another useful approach:
Generous limit on the total number of submissions (60)
Hints disappear one day before the due date
Project closes for one day to encourage students to step away and reflect on “the last bug”
Project opens again for one day with hints re-enabled, but with a cap on how much the score can improve
Lessons for writing program assignments intended for automatic grading
Requires greater clarity and specificity
Requires you to explicitly decide what you wish to test, and what you wish to leave open to student interpretation
Requires you to unambiguously specify the behaviors you intend to test
Requires preparing a reference solution before the project is due, which means more up-front work for professors or TAs
Grading is much easier because many things are taken care of by Web-CAT; course staff can focus on assessing design
Areas to look out for in writing “testable” assignments
How do you write tests for the following:
Main programs
Code that reads from or writes to stdin/stdout or files
Code with graphical output
Code with a graphical user interface
Testing main programs
The key: think in object-oriented terms
There should be a principal class that does all the work, and a really short main program
The problem is then simply how to test the principal class (i.e., test all of its methods)
Make sure you specify your assignments so that such principal classes provide enough accessors to inspect or extract what you need to test
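A minimal sketch of the "principal class plus really short main" structure. `WordCounter` and its accessor are hypothetical names chosen for illustration; any assignment-specific class would play the same role.

```java
// Principal class: does all the work and exposes an accessor for tests.
class WordCounter {
    private int words = 0;

    // Counts the whitespace-separated words in one line of text.
    void process(String line) {
        String trimmed = line.trim();
        if (!trimmed.isEmpty()) {
            words += trimmed.split("\\s+").length;
        }
    }

    // Accessor that lets a test inspect the result directly.
    int getWordCount() {
        return words;
    }
}

// The main program only wires things together, so there is little left to test.
class WordCountMain {
    public static void main(String[] args) {
        WordCounter counter = new WordCounter();
        for (String arg : args) {
            counter.process(arg);
        }
        System.out.println(counter.getWordCount());
    }
}
```

A test then creates a `WordCounter`, calls `process`, and checks `getWordCount()`, without ever launching the program.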
Testing input and output behavior
The key: specify assignments so that input and output use streams given as parameters, and are not hard-coded to specific sources or destinations
Then use string-based streams to write test cases; show students how
In Java, we use BufferedReaders and PrintWriters for all I/O
In C++, we use istreams and ostreams for all I/O
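A sketch of the Java version of this idea, using a `StringReader` and `StringWriter` to stand in for real input and output. The `Echo` assignment (copy each line to the output in upper case) is a made-up example; the stream classes are standard `java.io`.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;

// Hypothetical assignment: all I/O goes through streams passed as parameters.
class Echo {
    static void run(BufferedReader in, PrintWriter out) throws IOException {
        String line;
        while ((line = in.readLine()) != null) {
            out.println(line.toUpperCase());
        }
        out.flush();
    }
}

class EchoTest {
    // Feeds the program a string as input and returns everything it printed.
    static String runOn(String input) {
        StringWriter captured = new StringWriter();
        try {
            Echo.run(new BufferedReader(new StringReader(input)),
                     new PrintWriter(captured));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return captured.toString();
    }

    public static void main(String[] args) {
        System.out.print(runOn("hello\nworld\n"));
    }
}
```

In production the same `run` method would be handed readers and writers wrapped around `System.in` and `System.out`, so the code under test is identical in both settings.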
Testing programs with graphical output
The key: if graphics are only for output, you can ignore them in testing
Ensure there are enough methods to extract the key data in test cases
We use this approach for testing Karel the Robot programs, which use graphic animation so students can observe behavior
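A sketch of testing through accessors while ignoring the drawing. This `Robot` class is a hypothetical stand-in, not the Karel the Robot library; the heading encoding and method names are assumptions made for illustration.

```java
// Hypothetical robot: the animation is output only, so tests ignore it
// and check the key data through accessors instead.
class Robot {
    private int x = 0;
    private int y = 0;
    private int heading = 0; // 0 = north, 1 = west, 2 = south, 3 = east

    void turnLeft() {
        heading = (heading + 1) % 4;
    }

    void move() {
        switch (heading) {
            case 0: y++; break;
            case 1: x--; break;
            case 2: y--; break;
            default: x++; break;
        }
        redraw();
    }

    // Stand-in for the graphic animation; deliberately empty in this sketch.
    private void redraw() { }

    int getX() { return x; }

    int getY() { return y; }

    int getHeading() { return heading; }
}

class RobotDemo {
    public static void main(String[] args) {
        Robot robot = new Robot();
        robot.move();      // one step north
        robot.turnLeft();  // now facing west
        robot.move();      // one step west
        System.out.println(robot.getX() + "," + robot.getY());
    }
}
```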
Testing programs with graphical UIs
This is a harder problem—maybe too distracting for many students, depending on their level
The key question: what is the goal in writing the tests? Is it the GUI you want to test, some internal behavior, or both?
Three basic approaches:
Specify a well-defined boundary between the GUI and the core, and only test the core code
Switch in an alternative implementation of the UI classes during testing
Test by simulating GUI events
Conclusion: including software testing helps promote learning and performance
If you require students to write their own tests …
Our experience indicates students are more likely to complete assignments on time, produce one-third fewer bugs, and achieve higher grades on assignments
It is definitely more work for the instructor
But it definitely improves the quality of programming assignment writeups and student submissions
Visit our SourceForge project!
http://web-cat.sourceforge.net/
Info about using our automated grader, getting trial accounts, etc.
Movies of making submissions, setting up assignments, and more
Custom Eclipse plug-ins for C++-style TDD
Links to our own Eclipse feature site