λ - an autograder for snap! - eecs.berkeley.edu · fig 17 lab 12: submission times by day. ......
Post on 23-May-2018
214 Views
Preview:
TRANSCRIPT
-
Lambda: An Autograder for Snap!
Dan Garcia
Electrical Engineering and Computer SciencesUniversity of California at Berkeley
Technical Report No. UCB/EECS-2018-2http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-2.html
January 16, 2018
-
Copyright 2018, by the author(s).All rights reserved.
Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission.
Acknowledgement
The work presented here exists due to the contributions of many people andgrants from the National Science Foundation (grant 1138596), Google, andedX. Building an autograder would not be possible without BJC,Snap! and CS10, which have collectively been supported byhundreds of dedicated students and teachers over the past seven years. In particular, a huge **Thank You** to the following people who helpedmake this possible. Thanks especially to Lauren Mock for always being there,and for all the advice and help! Everyone has been incredibly supportive overthe past year.
-
Copyright 2015, by the author(s).All rights reserved.
Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and thatcopies bear this notice and the full citation on the first page. To copyotherwise, to republish, to post on servers or to redistribute to lists,requires prior specific permission.
-
An Autograder for Snap!
by Michael Ball
Research Project
Submitted to the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, in partial satisfaction of the requirements for the degree of Master of Science, Plan II. Approval for the Report and Comprehensive Examination:
Committee:
Professor Dan Garcia Research Advisor
May 13, 2016
(Date)
* * * * * * *
Professor Armando Fox Second Reader
May 13, 2016
(Date)
-
1 2
2 3
3 4
4 6
5 7
6 11
7 14
8 19
8.1 19
8.2 24
9 25
10 27
10.1 27
10.2 31
11 35
12 37
13 38
14 40
TableofContentsAbstract
Acknowledgements
ListofFigures
Introduction
RelatedWork
DesignPrinciples
InterfaceWalkthrough
Implementation
TheWebApplication
TheAutograderInterface
ExperimentalSetup
Results
UsageAnalysis
SurveyAnalysis
FutureWork
Conclusion
References
AppendixA:SourceCode
1
-
1AbstractWhilevisualprogramminglanguagesarehardlyanewdevelopment,recentpushesforincreasedequityandaccesstocomputersciencehaveledtoarenewedinterestanduseofvisualprogramminglanguages.Autogradersareacriticalcomponentofbothin-personandonlinecomputersciencecourses.However,untilthispoint,therehaven'tbeenanyautogradersdocumentedforgeneral-purpose,visualprogrammingdevelopmentenvironments.Thisisaparticularshortcomingasintroductorycourseshavescaledtolargernumbersofstudentsandonlineenvironments.Whiletherearechallengestousingautograders,webelievethattheinstantfeedbackcapabilities,aswellaspotentialtimesavingsforcoursestaffwillhelpusteachagreaternumberofstudents.
Overthepastyear,wehavebuiltanautograder,named(Lambda),forSnap!,avisualblocks-basedprogramminglanguageinspiredbyMIT'sScratch.TheprimarymotivationfordevelopingtheautograderwastorunaseriesofMassiveOpenOnlineCourses(MOOCs)onedx.orgthroughoutthe2015-2016academicyear.However,wealsowishtousetheautogradertobettersupportin-personcomputersciencecourses.InSpring2016,theSnap!autograderwasusedasapartofUCBerkeley'sCS10,TheBeautyandJoyofComputing,a"CS0"non-majorscourse.
Thisreportdescribes,whichconsistsofa"backend"RubyonRailswebserverthatallowsustousetheautograderinaclassroomsetting,throughaprotocolcalledLTI.Thebackendwebapplicationcontainsadatabaseofquestionsandtestfiles,whiletheSnap!interfacecontainsnewfeaturesandaviewtopresenttheresultsoftheautograder.OurinitialresultsshowtheautogradersuccessfullybeingusedinCS10,wheretheautograderwasusedtosupplementorallabcheckoffs,andonedx.orgwheretheautograderwastheprimarymethodforstudentstoreceivecreditforcode,gradedbothforeffortandcorrectness.
2
-
2AcknowledgementsTheworkpresentedhereexistsduetothecontributionsofmanypeopleandgrantsfromtheNationalScienceFoundation(grant1138596),Google,andedX.BuildinganautograderwouldnotbepossiblewithoutBJC,Snap!andCS10,whichhavecollectivelybeensupportedbyhundredsofdedicatedstudentsandteachersoverthepastsevenyears.
Inparticular,ahugeThankYoutothefollowingpeoplewhohelpedmakethispossible.ThanksespeciallytoLaurenMockforalwaysbeingthere,andforalltheadviceandhelp!Everyonehasbeenincrediblysupportiveoverthepastyear.
DanGarciaforbeingamentorforthepastfiveyearsArmandoFoxforreviewingthisworkBrianHarveyandJensMnigforSnap!MaxDougherty,TinaHuang,YifatAmirandPatrickO'HalloranfortheirincredibleworkontheautograderTheCS10stafffortestingunproventechnologyandfordealingwithadditionalstudentquestions!
TAs:RachelHuang,AdamKuphaldt,AlexMcKinney,AmrutaYelamanchili,AranyUthayakumar,ErikDahlquist,JannaGolden,JosephCawthorne,LaraMcConnaughey,WilliamTang,YifatAmir,AndySchmitt,StevenTraversi,andVictoriaShiInstructors:JustinHsiaandGeraldFriedland
3
-
3ListofFigures
Fig1AnexampleSnap!program.
Fig2AtypicalexampleofBJCcurriculumwhichincludesgraphicaloutput.
Fig3AtypicalexampleofaCodeStudioproblemthatgivesstudentsonlyafewblockstoworkwith,andhasafairlyconstrainedsolutionspace.
Fig4Theinitialpageisalistofquestionstotry.
Fig5Administratorshaveadditionalfunctionality.
Fig6Creatinganewcourseisasimpleactionwhichrequireslittleinformation.
Fig7Adashboardshowingthefirsttwolabsthesubmissiontimesforautograderrequestsfor.
Fig8Theinitial(edX)versionwhichhadaheavilyintegratedfeedbackbutton.
Fig9Updatedcontrolsfortheautogradershowingadropdownmenu.(Thecontrolsforrevertingsubmissionsaregreyed-out.)
Fig10Anexampleofthefeedbackpresentedwheneverythingiscorrect.
Fig11Anexampleoffeedbackshowingsomefailingcases.
Fig12Snap!canbeembeddedinedXthroughJSInput.
Fig13Whenastudentclicksonthelink,anewtabwillopenwiththeproperquestiontheyareassigned.Clickingthe"GetFeedback"buttontriggersasubmissionwhichsendsthegradebacktotheLMS.
Fig14AverybasicLTIlaunchsequence.ImagefromtheIMS{{"ims-img"|cite}}.
Fig15Numberofstudentsbynumberofquestionsattempted
Fig16Lab11:submissiontimesbyday.
Fig17Lab12:submissiontimesbyday.
Fig18Lab14:submissiontimesbyday.
Fig19Numberofautogradersubmissionsbyhouroftheday.
Fig20Numberofautogradersubmissionsbydayoftheweek.
Fig21Moststudentsappeartoonlysubmitonceattheendoftheirwork.
Fig22Numberofstudentsbynumberoftimessubmittedforeachlab.
Fig23Studentsarefairlyevenlysplitbetweenpreferringonlinevsorallabcheckoffs.
Fig24Moststudentscompletedcheckoffsalone,andfoundthefeedbackeasytointerpret.
Fig25Moststudentsreportedcompletingworkbeforeusingtheautograder
4
-
5
-
4IntroductionInrecentyears,thefieldofcomputerscience(CS),andthebroadertechnologyindustry,haveundergonesignificantchanges,thoughnotsimplytechnicalones.Aroundthesametimethatenrollmentsstartedbooming[1],theNationalScienceFoundation(NSF)postedtheoriginalsolicitationforBroadeningParticipationinComputing[2].AsaresultoftheeffortstoimprovediversityinCS,curriculasuchasExploringComputerScience[3](ECS),andAPComputerSciencePrinciples[4](APCSP)haveemerged.Manycurriculahavestartedtousevisualprogramminglanguages(VPLs),alsoknownasblocks-basedlanguages,astheirprimarytool.Asgreatastheseenvironmentscanbe,theyoftenlackresourcesandtoolsfoundinmoretraditionalenvironments(suchasPythonorJava).Thislackofinfrastructurecanmakecoursesmoredifficulttoteachandscale,particularlytoentirelyonlineenvironments.
Weaimtohelpstudentslearningprogrammingby:
Deliveringthebestresourcestopossibletocompletelyremotestudents,andImprovingthecapabilityandefficiencyofin-personteachingassistants(TAs).
Oneexampleofmissingresourcesisthecapabilitytoautomaticallyevaluatecode,andsendtheresultstoanotherprocess.Automaticevaluationanddistributingtheresultsarecentralcomponentsinanautograder.Withoutanautograder,itcanbedifficult(ifnotprohibitivelyexpensive)toteachlargeonlinecourses.Duringthe2015-2016schoolyear,wetaughtTheBeautyandJoyofComputing[5](BJC)asaseriesoffourmassiveopenonlinecourses(MOOCs)onedX[6].BJCisanAPCSPrinciplescoursethatusesSnap!asitsprimarylanguage.Withoutanautograder,wedidnotthinkthatwecouldfairlygivecredittostudentsonline.AtBerkeley,BJCisofferedasCS10,whichcoulduseanautograderforSnap!.
WedevelopedasasystemforautogradingSnap![7].iscomposedoftwomainparts,theRubyonRailsapplicationserver,andaJavascriptapplicationthataugmentsSnap!withtesting,analysisandloggingcapabilities.WhilethegradinginterfacewasdevelopedforedX,wereportonitsuseintheclassroomandsomerecommendationsforfuturedevelopment.WealsohopethatwillbeusefuloutsideCS10,throughtheimplementationoftheLearningToolsInteroperability(LTI)protocolwhichshouldallowoursystemtobecompatiblewithmostLearningManagementSystems.Wehavebuiltadditionalfeatureswhichshouldallowflexibilitytointegratetheautograderintoavarietyofclassroomenvironments.Thesefeaturesaredescribedinthechapteronimplementation.
4.1Snap!Snap!isablocks-based,drag-and-dropprogramminglanguageinspiredbyMIT'sScratch[8],butadaptedforuniversity-levelcoursesbyincludingfeaturessuchasfirst-classlistsandcustomfunctions(blocks).Snap!isimplementedasawebapplicationinJavaScript,whichmakesitcompatiblewithmobiledevices.Figure1showsabasicdrawingscriptbeingexecuted.
6
-
Figure1:AnexampleSnap!program.
4.2TheBeautyandJoyofComputing
TheBeautyandJoyofComputing(BJC)isanintroductiontocomputingdesignedtobroadenparticipationamongunderrepresentedgroups.TheprimarylanguageusedisSnap!,andmanyoftheexercisesinthecoursehavevisualoutputs.Thecoursecoversfunctionsandabstraction,recursion,higherorderfunctions(lambdas)andmanyothertopics.ManyoftheexamplesandexercisesinBJChavemultiplecorrectpathstoimplementation,whichcanbeachallengeforautograderstohandle.
Figure2:AtypicalexampleofBJCcurriculumwhichincludesgraphicaloutput.
7
-
4.3CS10
CS10[9]isUCBerkeley'sofferingofBJC.Currently,CS10isofferedeverysemesterto200-300students.Likemanyotherintroductorycourses,CS10reliesheavilyonlaboratorysectionsastheprimarymethodforstudentstolearntoprogram.willbeusedinCS10duringlabsectionstogivestudentsbetterfeedbackastheyareworkingandtogivecreditfortheassignedlabwork.Thisyear,wewereabletotrialthesystemforthreeofthe14labsthatstudentscompletedinSnap!.
4.4CorrectnessandEffort
Whiledesigningtheautograder,weconsideredtwoprimarytypesofgrading:correctnessandeffort.Correctnessissimple:Doesisgivenblock(orscript)dowhattheinstructionssay?Effortismoreopen-ended,butthegoalistobeablerewardstudentsfortryingtocompleteaproblem,aswellastoencourageexperimentationwithSnap!'smanyfeatures.Inboththesecases,wewouldliketheautogradertoberobusttodifferentcorrectsolutions,orthemanydifferentwaysstudentscandemonstrateeffort.
8
-
5RelatedWorkInbuildinganddesigninganautograder,thereissurprisinglylittleworkpublishedaboutautograders,andevenlesssoaboutthoseforvisualprogramminglanguages.WhiletherearemanyexamplesofautogradersforlanguageslikePythonandJava,thereareonlytwothatweareawareofforblocks-basedlanguages,whicharebothveryrecentdevelopments.
5.1Code.orgCodeStudio
Code.org'sCodeStudio[10]isanonlineinteractiveenvironmentbasedaroundtheopen-sourceBlocklyenvironment[11].CodeStudioincludescustomfeedback,butthevastmajorityoftheproblemsareinahighlyrestrictivespacewherestudentsaregivenonlyalimitedsetofblocks.WhileCodeStudioisopensource[12],theautograderandfeedbackmethodologiesarenotdocumented.
Figure3:AtypicalexampleofaCodeStudioproblemthatgivesstudentsonlyafewblockstoworkwith,andhasafairlyconstrainedsolutionspace.
5.2ITCH:IndividualTestingofComputerHomeworkforScratchAssignments
ITCHisanautograderforScratchrecentlypresentedatSIGCSE2016[13].However,ITCH'smethodofautogradingisthroughaninstructor-initiatedscriptwhichshouldberunafterallassignmentsubmissionsarecollected.Thisisafairlycommonscenariofortypicalcodeautograders,butnotamodelwewouldusebecauseourgoalisinstantfeedback.ITCHtakesasimilarmethodtoourSnap!autograderby'spoofing'userinputswhenrequested,butitsharesmanyofthesamelimitations,likealackoffeedbackforgraphicaloutput.
9
-
10
-
6DesignPrinciplesEducationaldesignprincipleshelpedinfluencethedesignoftheautograderandthedirectionforcreationofexercises.Whilethesedesignprincipleshelpedinfluencefeaturesandtheuserinterfaceoftheautograder,theyarealsoextremelyimportantasaguidelineforhowinstructorsandcontentauthorsshouldthinkaboutwritingexercisesandimplementingthemintheclassroom.
6.1DualModalities
Acentralchallengefacedwhenconsideringdevelopmentoftheautograderwaswhattoprioritize:FeedbackorGrading?Whilebothfeaturesarenecessaryforeachother,thereisanacutetensionbetweenatoolwhichisprimarilymotivatedbyprovidingopen-endedfeedback,andonethatisdesignedtoprovidegrades.Thegoalofthissectionistoconsiderthebestpossibleinterfacewecouldgivetostudentstohelpthemimprovetheirprogrammingskillsandcompletelabexercises.
Akeyaspectofthistensionishowtohandletheideaofcorrectness.Inanintroductorycomputersciencecourse,weareoftenlenientwithsmalldifferencesinoutputfromaprogram.(Forexample,Snap!allowsstudentstousebothtraditionalarraysandlinkedlists,buttheirvisualoutputisthesame.Iftestauthorsarenotcareful,itiseasytomistakeonetypeofdataasincorrectwhenitshouldbeaccepted.)Whileitisoftenpossibletoaccountforthesedifferenceswhenwritingtestcases,itcanbesignificantlydifficult.Weneedtomakesurethatwhenthesetoolsareusedforgrading,theydonotcausestudentsunnecessarystressorfrustration.
6.2Learner-CenteredDesign
Learner-centereddesign(LCD)isadesignprincipleadaptedfromuser-centered-design(UCD)[14][Citationnotfound][15].BothLCDandUCDdesignprinciplesstartbyestablishingattributesabouttheuser'sgoals.ForLCDthereare4mainattributes:
Learnersdonotpossessthesamedomainexpertiseasusers.
Learnersareheterogeneous.
Learnersmaynotbeintrinsicallymotivatedinthesamemannerasexperts.
Learnersunderstandinggrowsastheyengageinanewworkdomain.
WhilelanguageslikeSnap!aredesignedtobeeasiertolearn,theydonotnecessarilyemployLCDprinciplesbecausetheyarestillintendedasageneral-purposeprogrammingenvironment.Wecanmakesureourautogradertakesintoeachoftheseprinciples:
Domainexpertise:Programminglanguageshavetoshowerrormessagesthatcouldmakesenseinanysituation.Unfortunately,thismeanstheyusuallyfalltothelowestcommondenominatortypecases,anddonotprovideanycontextualinformationtotheuse.InSnap!,itisnotuncommontoseeamessagethatissimilartoTypeError:expectingalistbutgotnumber.Wecanimproveuponthesemessagesbyshowingstudentshintswhicharespecifictotheproblemathand.
Heterogeneity:Thisisoneoftheharderaspectsfortheautogradertohandle.Noteveryoneapproachesproblemsinthesameway.Ourgeneralapproachistotrytobeaslenientaspossible(whilestillensuringcorrectness)whenwritingtestcases.Wehavespentlotsoftimeconsideringhowauthorsshouldhandledifferentformatsofoutputsothatwetrytoavoidnit-pickyerrors.
Motivation:Wetrytomotivateusersbycarefullychoosinghowwepresentthetoolandtheresults.Inclass,andinthetextwhichappearsonscreenwetrytodownplaytheideaofgradesorerrorsandinsteadfocusonhelpingstudentsimprove.
11
-
ChangingUnderstanding:Dynamicallycapturingauser'sunderstandingisincrediblydifficulttodo.Atthispoint,wearenotabletodynamicallyadjustexercisesorfeedbackpresented,butwehaveplannedoutpossiblemethodsfordoingso.Currently,thebestwayforustoachievethisistohaveTeachingAssistants(TAs)andinstructors,whoareconsciousofstudentsneeds,recommenddifferentproblemsforstudentstopracticewith.
6.3KnowledgeIntegration
Knowledgeintegration(KI)isaframeworkforapproachinghowstudentsshouldsynthesizeinformation[Citationnotfound].TheKIframeworkhasfourcomponentstoorangizeideas:
AddingKnowledgeinvolvesbringinginnewideasthatstudentshavenotseenbefore.ElicitingIdeasistheprocessofcriticallyexaminingideasstudentsalreadyknow.DistinguishingKnowledgeasksstudentstotakemultipleideasandfigureouthowtheyfittogether;whethertheyarecompatibleornot.Reflectingistheprocessofdrawingconclusionsfromwhatstudentshavelearned.
WeusedKIasabasisforwritingthefeedbackmessagesthatstudentsareshownthroughtheautograder.Thegoalistofocusprimarilyontheelicitingknowledgeanddistinguishingknowledgecomponents.Wewantedtofocusonthesetwopiecesbecausetherearemanycommoncomputerscienceproblemswhichcanbeviewedthroughthislens.Systematicallydebuggingcodefollowsaprocessofelicitingknowledgewhentryingtofigureoutwhysomethingisbroken.Distinguishingknowledge(suchasthedifferencesbetweentwokindsofloops,orrecursionanditeration)isanaturalprocessforprogramming.
Wechosenottousetheaddingknowledgecomponentbecausewedonotcurrentlyhavetheautogradersetuptogivegoodfeedbackwhenstudentsaredoingexploratorywork(wheretheywouldbemostlikelytouncovernewconcepts).However,thesetypesofmessageswilllikelyappearinfutureversions.Similarly,whilereflectingisavaluablestep,wedonothavethecapabilitytocollectnorgivefeedbacktoopen-endedreflectionquestions.
However,writingproperKImessagesprovedchallenginginthecurrentsetup.Theinitialversionoftheautograderwasdesignedmorearoundpresentingtheresultsoftestcases,thanitwasforlongerformsoffeedback.(Thisisoneareaforimprovement.)Furthermore,tryingtofollowKIoccasionallyledtomessagesthatdidnotnecessarilyfitwithintherestoftheBJCcurriculumasitwasnotdesignedaroundtheKIframework.
6.4"TA-CenteredDesign"Thoughthisiscertainlylowerontheprioritylistthanlearner-centereddesign,wemakeapointtodescribeTA-centereddesign,andwhythismattersforthetoolswebuild.TeachingAssistants(TAs)andinstructors,arecriticalusersoftheinfrastructureincourses.Theyneedtobeabletoeasilyupdateandcreatecontent,handlegradesandsoon.Thelongerormoredifficultthesetasksare,thelesstimeTAshavetospendhelpingstudentslearn.
WhenconsideringTA-CenteredDesign,aTAismuchmorelikeatypicaluserinUCDthanalearnerinLCD,buttherearemanyideasthatshouldbespecificallyrecognizedforTAs:
TAsareoftenlackingpedagogicalcontentknowledge(PCK).PCKismakingthedistinctionbetweenknowinghowtoprogram,andhowtoteachprogramming.TAscoulduseguidanceinapplyinggoodpedagogy.
TheadmindashboardsbuiltintogiveTAsmoreinsightsthantheycurrentlyhaveabouthowstudentsareperformingandhowoftentheyarecompletingthelabwork.Whilethedashboardshaveawaystogoinfunctionality,thisisanimprovementandgivesTAsareasontokeepusingthesystem.Thetestcaseauthoringinterfaceshouldbeadaptedtomakeiteffortlesstowriteconsistentanddetailedfeedback.
12
-
WhileTAsaremotivatedtoteachtheyarenotalwaysmotivatedtocompletetheextraworkrequiredofthem,suchasgradingorwritingassignments.
Testcasesneedtobeeasytowriteandupload.TheImplementationchapterdescribestheproblemswithourinitialapproachusingedX'stools.TheFutureWorkchapterdescribeshowwecanimprovetheexperienceofwritingautogradertestfilestolowerthebarrier.
OftenTAsarenotexpertsinthetoolstheyarerequiredtousetoaccomplishtheirteachingduties,suchasLMSsandgradingsystemssuchas.TAs,likemostusers,havealimitedamountoftimetocompletetheirwork.
Limit(orautomate)repetitivetasks,especiallyonesthatinvolveconfiguration.TheabilitytoautomaticallyuploadgradesforstudentsisahugetimesaverwhichallowsTAstofocusonmoreimportanttasks,andallowsstudentstostopworryingaboutthestatusoftheirgrades.
Whilethesethreeideasmayseemobvious,theyareimportanttorecognizeifourworkistobeusedbeyondourinitialtestimplementation.Then,weneedtoconsiderhowTAswilluse.Thesuccessofanewautograder,evenifitisbeneficialforstudentsrequiresTAstobecomfortableconfiguringandwritingnewautogradertests.
Ultimately,we'rearetryingtorecognizethatTA's(orindividualinstructors)arealreadylimitedbytime.Makingtestcasesaseasytowriteaspossibleisanecessarypartoftheprocess.
13
-
7InterfaceWalkthroughiscomposedoftwoparts:awebserverandaSnap!interfacewithautogradingcapabilities.Inthecurrentimplementation,onlyinstructorsneedtousethewebserver,thoughinthefuturetheremaybefunctionalityaddedforstudents.(Seethefutureworksection.)
7.1WebApplication
Thebasicwebinterface(figure4)presentsalistofproblemstotry.Thislistispublic,soanyonecanattemptanyproblem,butinstructorsareexpectedtoembedspecificquestionswiththeirdirections.Userscanclickonalinktoworkonaparticularquestion.
Figure4:Theinitialpageisalistofquestionstotry.
Therestoftheinterfaceoptionscomeaftertheuserhasloggedinwithanadminaccount(figure5).
Figure5:Administratorshaveadditionalfunctionality.
Thesefeaturesare:
Creating/EditingQuestions-Aquestioncontainsatestfile,apointsvalue,andastarterfile.Creating/EditingCourses-CoursesdescribeconnectionstotheLMS.Eachcoursehasakeyandsomepolicysettings(figure
14
-
6).Creating/EditingAdminDashboards-Admindashboardsprovidethestatusofstudentperformance.ThisinitialversionisbasedentirelyoncustomSQLqueries,buttheycanbepowerful.ThesedashboardswereasignificantbenefitindoinganalysisforthisreportandweresharedwithTAsduringthecourse(figure7).
Figure6:Creatinganewcourseisasimpleactionwhichrequireslittleinformation.
Figure7:Adashboardshowingthefirsttwolabsthesubmissiontimesforautograderrequestsfor.
7.2TheSnap!Interface
TheautograderaugmentstheSnap!interfacewithafewbasictools.Theinitialversionoftheautograderincludedabuttonand"hamburger"menu,thatweredesignedtoappearintegratedintoSnap!.However,theapparentseamlessnesswasactuallymoreconfusingbecausethecontrolsneverfitwithintherestofSnap!.
15
-
Figure8:Theinitial(edX)versionwhichhadaheavilyintegratedfeedbackbutton.
TheupdatedversionmoreclearlyseparatestheautogradercontrolsfromSnap!,andgivesusmoreroomtoexpandfunctionalityinthefuture.AproblemtitlewillalwaysbedisplayedintheSnap!interface(comparedtooutsidethewindowintheedXversion),whichwasasmallbutimportantchangebecausethenewsetupallowsfordisplayingSnap!inaseparatetabfromthequestioninstructions.
Figure9:Updatedcontrolsfortheautogradershowingadropdownmenu.(Thecontrolsforrevertingsubmissionsaregreyed-out.)
Bothinterfaceshavethefollowingfeaturesforstudents:
The"GetFeedback"buttonrunsasetofautogradertestsonthecode,whicharepresentedwhentestsarecomplete.Thecolorofthestatusbar(thebuttonintheedXversion)changescolorbasedonthequestion'sstate:
Green:AlltestsarepassingOrange:Thestudenthasmodifiedtheircodeandtestsshouldbere-run.Red:Atleastonetestisfailing.
RestoreBestSubmissionRestoreLastSubmission
Thesetwofeaturesencouragestudentstoexperimentandre-writecodewithoutanyfearthattheymighthurttheirscoresorlosework.Everytimethe"GetFeedback"buttonisclicked,asubmissionisrecorded.
ResetRestoresthecurrentSnap!projecttothestateofthestarterfile.
HelpDisplaysasetoftooltipsoverautograder-specificelements.
ShowPreviousResultsThisisanewoptionwhichwillpresentthefeedbackoftheprevioustestswithoutre-runningthem.Thisallowsstudentstoreviewmistakes,andshoulddiscourage"autograder-drivendevelopment".
16
-
Figure10:Anexampleofthefeedbackpresentedwheneverythingiscorrect.
Figure11:Anexampleoffeedbackshowingsomefailingcases.
ThetwoscreenshotsinFigures10and11showtwodifferentversionsoftheautograderpresentingfeedback.Studentsseepassingtestsontopingreenandfailingtestsonthebottominred.Eachsectionhasdetailsabouttheparticulartestcases,andtheboldheadingscancontaintipsorguidanceforstudents.
17
-
18
-
8ImplementationThewebapplicationisbuiltusingtheRubyonRailsframework[16],andmakesuseofmanycommonwebtechnologies.TheSnap!interfaceisimplementedpurelyinJavaScript.TheinitialpurposeofthesystemisnottobeagradestoragebuttoconnectwithanexistinggradebookorLearningManagementSystem(LMS)sothatanygraderesultswillbeintegratedwiththerestofcoursedata.Wechosethisroutebecause,inourexperiencemultiplesourcesofgradesarepronetoerrorsanddelays.Bydesigningasystemwhichdoesn'tneedtostoregradedata,wecanmakealotofsimplificationsandfocusonmoreimportantfeatures.
8.1RubyonRailsBackend
existsasmodern"Softwareasaservice"application.It'sdesignedtobehostedbyacloudservicesprovider,usingawebserverandaseparatedatabaseserver.
8.1.1TheNeedforaWebApplication
TheJavaScriptthatpowerstheautogradingworksentirelyclient-side,meaningaslongasyouhavethetestfiles,there'snoneedforaninternetconnection.Thispathwasinitiallychosenforthreereasons:
Snap!isclient-side,andevaluatingSnap!projectsonaserverwouldrequireasignificantamountofwork.edXprovidesacustomproblemtypecalledJSInput[17]whichgaveusaclearpathforintegrationwithedX.Developinganautograderandawebapplicationrequiredmoreresourcesthanwereavailable.
Whiletheentirelyclient-sidepathwasagooddecision,weranintoanumberofissuesbyrelyingonJSInputandtryingtokeepallfeaturesclient-side.
8.1.1.1ChallengeswithJSInput
edX'sJSInputproblemtypeprovidesaJavaScriptAPIforsendingscorestotheedXplatform.ItallowsustobuildinacustomversionofSnap!alongsidetherestofthecontentinedX.
19
-
Figure12:Snap!canbeembeddedinedXthroughJSInput.
However,weencounteredseveralproblemswhiledevelopingtheJSInputbasedintegration:
Developingcodewasveryslow!ChangestocoderequiredmanuallyuploadinganewfiletoedX,whichisamulti-stepprocess.ThelibrariesusedforJSInputswallowednativeJavaScripterrors,makingdebuggingnearlyimpossible.TheedXinterfacehasit'sownmechanismfora"Check"buttonandshowingfeedback.Communicatingthedetailedoutputfromtheautograderdidn'tworkverywell,andweendedupdevelopingmanyworkaroundstogetaseamlessUI.There'snoroomforstoringorretrievingusermetadata.Werelyonfeaturesthatallowstudentstorecallprevioussubmissions.ThroughJSInputtheonlyoptionforthesefeaturesweretousethebrowser'sLocalStorageAPI.ThisAPIhaslimits,likeamaxof5MBofstorage,thatcausedproblemsforsomestudents.Furthermore,withoutadedicateddatabase,everysingletestfilewrittenhadlotsofhard-codedmetadatathatwasrepetitiveandpronetoerror.WhileedXprovidesuserlogsfortheentirecourse,wefounddealingwiththeselogstobeneedlesslycomplex.Theyareslowtoget,andtheautograderresultsaredifficulttoseparatefromtherestofthecoursedata.Assuch,wehaven'tanalyzedtheedXdatatotheextentwe'dliketo.Asimplerloggingsystemdescribedbelowhasbeenimmenselyhelpfulforouranalysis.
Theonebenefitoftheseproblemswasthatitforcedthedevelopmentoftheautograderintotwocomponents:AJSinterfacetoedX,anda"dumb"client-sidecomponentthatsitsontopofSnap!.Thisdistinctionwashelpfulwhenadaptingtheautogradertoworkwiththenewwebapplication.
Perhapsmostimportantly:thegradingsystemcouldonlyworkwithedX.CS10usesCanvas[Citationnotfound]asit'sLMS,andmanyhighschoolsusedifferentsystems.Theneedtobuildacustomsolutionforeveryplatformwouldbeprohibitivelyexpensive.Fortunately,theLTIprotocolprovidesadecentsolutionformostofthetaskswe'dliketoaccomplish.
Attheendoftheday,thedecisiontobuildaninitialversiontiedtoJSInputwasagoodone,asitwasstillprobablyfasterthanbuildingafullwebapplicationatthesametime.
8.1.2BasicArchitecture
20
-
isaRubyonRails(commonlyabbreviatedas"RoR")[16]webapplicationbackedbyaPostgreSQLdatabase.Thedatabaseprimarilycontainsasetofquestions,asubmissionslog,andauserstable,aswellassomeadditionalmetadata.ThecurrentversionisdeployedtoHerokuatlambda.cs10.org,butitcouldbedeployedtoanycloudprovider.
8.1.2.1QuestionsandSubmissions
Thecorefunctionalityisprimarilysupportedbytwo,fairlysimple,datamodels:QuestionsandSubmissions.
AQuestionneedsonlythreeattributes:
title:Ahuman-readableIDforthequestionpoints:Pointsareusedtonormalizescores.(SeetheLTIsectionbelow.)testfile:ThetestfileisaJavaScriptfile(describedbelow)whichincludesthetestcasesaswellasfeedbackpresentedtothestudent.
Thoughtheyaren'tcurrentlyused,futureupdatesforwillmakeuseofthefollowingproperties:
content:Currently,itisuptocoursestafftoprovidecontextforthequestionswhicharebeinggraded.Inthefuture,thewilldisplaythiscontentalongsidetheSnap!interface.tags:Questionscancontaintagswhichcanaidinsearching,ortryingtocorrelatestudentperformanceacrossproblems.Thereisalsopotentialforusingtagstorecommendproblemstostudentsasastudytool.
ASubmissionhasafewkeyproperties:
testresultsisaJSON-formattedresultfromtheautograder.Itcontainsthepointsgiventoeachtestcaseaswellasthespecificresultsandfeedback.codesubmissionisafullexportoftheSnap!thatstudentswrote.userinfoisasetofdataaboutthesubmittinguserwhichincludeswhatsourcetheycamefrom(seetheaccountssection),andifthey'reapartofacourse.
Notethatloggingsubmissionsispurelyforpurposesofanalysisandbackup.ByimplementingtheLTIprotocol,theLMSwillcontainallnecessarydataforstudentstoreceivegrades.However,ifwechoosetoadapttoincluderesourcesforstudyingorquestionrecommendation,theinternalsubmissionsdatabasewillbecomemoreimportant.
ACourseisanobjectwhichmanagestheLTIconnection,andneedsonlytwovalues:
consumer_keyisauniquekeyforeachcourse.ThishelpstheapplicationseparatebetweendifferentLTIconsumers,primarilyforpurposesofanalysis.consumer_secretisahashoftheconsumer_key.It'sautomaticallygeneratedbytheapplicationwhenaCourseobjectiscreated.
NotethattheCoursestableisn'tentirelynecessary.TheLTIconnection'skeyandsecretvaluescouldbeentirelystatic(i.e.inanenvironmentconfiguration),butsuchanapproachispronetoerrorsandhassecurityconcernsasanapplicationisconnectedtomultiplesystems.
8.1.2.2UserIdentities
Whenbuildingatoolwithgradingdata,itwascriticallyimportantthatwehadaneasyandwaytoidentifystudents,andtominimizetheneedforanadditionallogin.
8.1.2.2.1LTI
21
https://lambda.cs10.org
-
TheIMSGlobalLearningConsortium[18]isastandardsbodycomposedofeducationalinstituions,intrestbodiesandedtechcompanies.IMSpublishesaspecificationcalledLTI[19]whichcanbrieflybedescribedas"OAuthforeducationalapplications".TheLTIauthenticationprocessisactuallybasedontheOAuth[Citationnotfound]protocol,butit'sdesignedtobecompletelyseamlessforstudents.(UnlikeaGoogleorFacebookauthorization,astudentwhoisalreadyauthenticatedinsideaLMSdoesnotneedtospecifically'authorize'anapplicationwhenLTIisused.)
TheLTIprotocoldefinestwo"categories"ofapplications:aToolConsumer(TC)andaToolProvider(TP).isaprovider,whiletheLMSisatypicalconsumer,inthiscasebCourses(Berkeley'sinstanceofInstructureCanvas).Atypicaluserflowinvolvesastudentvisitinganassignmentpage(insideaLMS)whichcontainseitheraniframeelementoraspeciallink.Currently,implementsversion1.1[Citationnotfound]oftheLTIspecification,thoughsupportforversion2.0[Citationnotfound]isplanned.LTI1.1onlyallowsatoolprovidertoreadandsendgradesforacurrentlyloggedinstudent,somustworkwithinthislimitation.LTI2.0willpotentiallyallowformoredataaboutcoursestobesharedwithstudents(suchashandlingparterassignments),butitsofarnotwellsupportedamongLMSvendors.
Figure13:Whenastudentclicksonthelink,anewtabwillopenwiththeproperquestiontheyareassigned.Clickingthe"GetFeedback"buttontriggersasubmissionwhichsendsthegradebacktotheLMS.
22
-
Figure14:AverybasicLTIlaunchsequence.ImagefromtheIMS[Citationnotfound].
Whenauserclicksthelink(oranembeddediframeisdisplayed),aHTTPPOSTrequestismaketotheproviderwhichincludesapplication-levelconfigurationdata(includingaconsumer_keyandconsumer_secret).ThechallengeisthatthecurrentversionoftheLTIprotocolrequiresthatthislaunch_urlbethesameforeveryassignment.(InthiscasetheURLishttps://lambda.cs10.org/lti/sessions.)AftercompletingtheOAuthhandshake,theTP(ourapplication)checksforthepresenceofadditionalconfigurationinfopassedbytheTC:
Ifaquestion_idisprovided,thenwillloadthespecificquestion.CurrentlyLTIdoesn'tprovideastandardinterfaceforloadingaspecificresource,soifaninstructorpassesinthisvalue,studentswillautomaticallyberedirectedtotheproperquestion.AgradepassbackURLisoptional.IftheapplicationseesthisURL,itwillpostascorebacktotheTC.IftheURLismissing,thenskipspostingascorebutstillsavesthesubmissiontothelocaldatabase.UserInfo:Iftheinstructionalstaffchooseto,theycanconfiguretheLMStosend"Public"informationto.(Publicheremeans,whatFERPAdefinesaredirectoryinformation.InthecaseofUCBerkeley,thisincludesthestudent'sname,email,anduserID,specificallydifferentfromtheirstudentID.REF?)IftheTCdoesn'tsendpublicinformation,thenitwillsendanobfuscatedhashtoidentifyeachstudent.Again,thisprimarilyonlyaffectstheanalysiscapabilitiesprovidedtoTAs.
8.1.2.2.2NeedFor(Regular)OAuth
Unfortunately,despitetheadvantagesofLTI,wefoundthattraditionalOAuthwasstillanecessarycomponent.Theprimarymotivationwastheneedforsiteadmins,andTAswhocanloginwithouthavingtogothroughaLMS.We'reusingGoogleasanOAuthproviderinadditiontoLTI.Inthefuture,wewouldalsoliketoconnecttheSnap!cloudaccountsystemthroughOAuth,butthisiswaitingonenhancementstotheSnap!cloud.
Thisfeaturecanbeusefulforstudentsaswell,oncewebuildoutstudentdashboards.However,inordertobeeffective,we'llneedawaytoassociateLTIaccountswithOAuthaccounts.Thoughnotfullyimplemented,wewillbeabletoautomaticallyassociateaccountsforstudentsiftheysharethesameemailaddress.ForcampusessetuplikeBerkeley,thiswillthedefaultcaseformoststudentssincetheirGoogleaccountandtheLMSaccountoriginatefromthesamecampussystems.Iftheemailsdon'tautomaticallymatch,weshouldbeabletoprovidethisfunctionalitybyemailingactivationcodes.
8.1.3SecurityConcerns
23
-
Finally,weneedtodiscussasignificantconcernaboutsecurity.Currently,theautogradertestfilesareimplementedinplainJavaScript,andareservedalongsidetherestofthepagecontent.Thismeansthere'safairlygapingholeallowingforCross-Site-Scripting(XSS)vulnerabilities.Thecurrentmitigatingfactoristhatonlytrustedaccountswithanadminflagcanuploadtestcases.Thisisanacceptablelimitationfornow,butintheFutureWorkchapterwedescribeapotentialwayaroundthisbyallowingtestcasestobewritteninSnap!,thensecurelycompliedtoJSontheserverside.
8.2TheAutograderInterface
TheautogradercomponentsarebuiltinplainJavaScriptandHTML.GreatcarehasbeentakentoavoidmodifyingtheSnap!environmentasmuchaspossible.TheprimaryreasonforthisistoallowtheautogradertobeeasilyupdatedalongwithnewversionsofSnap!.
24
-
9ExperimentalSetupIntheSpring2016semester,CS10[9]usedtheautograderaspartoftheroutineforlabcheckoffs.Intotal,theautograderwasusedforthreedifferentlabs,butstudentswereonlyrequiredtotrytheautograderforonelab.
9.1OralLabCheckoffs
CS10usesorallabcheckoffsasameansofgrantingcreditforalab.ThewaythistypicallyworksisthatstudentsaregivenoneweekfromtheassignedlabdatetocheckinwithaTAorlabassistant.Studentsarecheckedoffwhentheysuccessfullyanswertwoorthreequestionsaboutthelab,andusuallyareaskedtopresentanexampleofworkingcode.
It'sworthmentioningthatthesecheckoffsaredesignedmorearoundcompletenessandeffortplacedinthelab,ratherthanabsolutecorrectness.(Though,whetherornottograntcreditisuptotheindividualTAs.)OrallabcheckoffsprovideagoodreasonforTAsandstudentstointeractmoreoften.Overthepastfewsemesters,themodeloflabcheckoffshasbeenrefinedbasedonstudentandTAfeedback,andisdesignedtorequireonlyminimaladditionaleffortforthecoursestafftomanage,andforstudentscomplete.We'recomparingtheautograderagainstafairlystrictstandardinthisscenario,whichwehopeillustratesthepotential.
9.1.1ChallengesofOralLabCheckoffs
Manyofthechallengesfacedwithorallabcheckoffsmotivatethedevelopmentofanautograder.Labcheckoffscantakeanywherefrom2-10minutesperstudent,orperpairofstudents.Duringabusylabsection,particularlyclosetodeadlines,thistimecancreatealargebackloginordertogetstudentscheckedoff.Whilethetimediscussingwithindividualstudentsisvaluable,thecreationofaqueuecanwastestudent'stimeiftheydon'tneedadditionalhelp.Furthermore,ifmostofthelabworkcanbedoneathome,thenoralcheckoffscanbeinflexibleforstudentswhohaveahardertimemakingittolab,suchasthosewithdisabilitiesorfamilies.
Finally,traditional"clipboardstyle"methodsfordealingwithinputtingscoresfororalcheckoffsisafairlycomplexandslowtask,wheregradescangetlost.Togetaroundthis,weputsignificanteffortintoasemi-automatedsystemwhichallowsTAsandlabassistantstoinputgradesquicklyandsecurely.Withoutsuchaninvestment,orallabcheckoffswouldbemuchlesspractical.
9.1AutograderCheckoffsInSpring2016,therewereatotalof18labs,14ofwhichusedSnap![20].Wehadouronlineautogradersetupfor3ofthelaterlabs:
Lab# LabTopic Autograder
Lab11 RecursionPart2 Required
Question:Completethedefinitionofmergesort
Lab12 Tic-Tac-Toe Optional
Question:Completethetttsolverblock.
Lab14 HigherOrderFunctions Optional
Question:Completetheis_pandigital?block
Question:Completetheminvalueof_overallnumbersin_block.
25
-
Note:optionalmeansthatthestudentscouldreceivecreditforthelabcheckoffthrougheithertheautograderoranorallabcheckoff.Requiredmeansthattheonlytheautograderwasacceptedforcredit.
9.2ReasoningWechosethissetupforavarietyofreasons,tryingtobalanceanewtechnologywiththeneedtotestitout.
Onelabwasrequiredsothateverystudentwouldtrytheautograderatleastonceandcouldprovidefeedbackabouttheirexperiences.Labs12and14wereoptionalbecausewedidn'twanttopenalizestudentsiftheyfelttheautograderwasmoreconfusingorstressful,butwewantedtogiveasmanypeopleachancetouseitaspossible.Finally,thishybridmodelforcheckoffswilllikelybethebasisforfuturesemestersofCS10,asmorequestionsandfeedbackarewritten.
9.3ChallengesofAutograderCheckoffs
Naturally,eventhemostadvancedautograderwillbenomatchforaTA...yet.Wefullyrecognize(asdothestudents)thatanautogradercannotreplacethedetailed,specializedandmorearticulateoffeedbackofaTA.Furthermore,comparedtothecurrentwebpageforlabcheckoffquestions[20],eventheeasiest-to-useautograderwillbesignificantlymoreworkthanpostingafewsentencestoawebpage.
However,whendeployedintheclassroomwitha"hybrid"modelofbothlabandoralcheckoffswethinkwecanalleviatethein-classpressuresofcheckoffsbutstillgivestudentscreditfortheworktheyaredoing.Bydesigningamodelthatgivesstudentsmorechoice,TAswillhopefullybeabletogivemoretimetothestudentsthatneeditmost.Thisstillleavestheproblemthatanautograderisstillmoreworkthanthestatusquo.Wethinkthatthiswillbecomelessofaproblemovertimeasadatabaseofquestionsisbuiltupandiftheanalyticalcapabilitiesprovetobeusefultoteachingstaff.
26
-
10ResultsOverthecourseofthreelabassignments,wefoundthatstudentsusedtheautogradertocompletelabsoutsideofcoursetime.Whilewehavenobasisforwhen(orhowoften)studentscompletedlabsathomewithouttheautograder,thisshowsthatstudentsareatleastreceivingthebenefitsoffeedbackwhenaTAisnotpresent.
Secondly,whenwesurveyedstudents,theywerealmostevenlysplitbetweenpreferringorallabcheckoffsortheautograder.Giventhelabsetup,andthedifferingadvantagesforeachenvironment,thisiswhatwewouldhopefor.
Note,unlikeothereducationalstudies,wearenottryingtoclaimanylearninggainsfromthissystem.
10.1Usage&StatisticsThefirstpartofouranalysiswillbetolookathowoftenandwhenstudentsweresubmittingtheirlabs.ThisdatawasobtainedfromaggregateinformationintheSubmissionsdatabaseaswellasfrombCoursestocomparetonon-autogradedlabs.
10.1.1BasicStatistics
TotalNumberofstudentsinCS10:169TotalNumberofstudentusingtheautograder:145(86%)
Lab# OralCheckoffs AutograderCheckoffs
10 145 N/A
11 34 133
12 57 80
13 132 N/A
14 97 64
15 142 N/A
Notethisdatahassomeanomaliesduetobugintheinitialsetup,andconfusionamongcoursestaff:
Lab11hadabugpostingscoresthroughLTIforthefirstday.SomeTA'smistakenlysubmittedscorestwice.Thereissomeoverlapofthose34students.Lab14hadabuginoneexercisecausingmanystudentstogetcheckedoffbybothmethods.
Wecanlookatthenumberoftimesthatstudentstriedtheautograder:
27
-
QuestionsAttempted
Figure15:Numberofstudentsbynumberofquestionsattempted
Whatthisshowsusisthat,thoughstudentswereonlyrequiredtocompleteonelabusingtheautograder,nearly2/3(93/149)foundtheautogradercompellingenoughtotryasecondtime.Fromtalkingtostudentsandstaff,aportionofthedropoffinstudentsusingtheautogradermaybeduetothefactthattheautograderdoesn'thandlepairsofstudents.Giventhatthelaterlabsaresomeofthemoredifficult,manystudentsmaychoosetoworkinpairs.
10.1.2SubmissionTimes
Thesecondthingtolookatiswhenstudentsaresubmittingtheirwork.Whilewecertainlystillwantstudentstoattendlab,improvingthe"athome"experienceforstudentswouldbeasignificantbenefit.Here,weseethatstudentsarechoosingtousetheautograderathome,withusagepatternsthatyouwouldexpectfromundergraduates.
Thesechartsshowtheoverallsubmissiontimesforeachofthethreelabs.Theredbarindicatesthedatethatthelabwasdueforfullcredit.
Date
Figure16:Lab11:submissiontimesbyday.
Date
28
-
Figure17:Lab12:submissiontimesbyday.
Date
Figure18:Lab14:submissiontimesbyday.
Thesegraphsdon'trevealanythingtoosurprising.Forthemostpart,studentsareusingtheautogradertocompletelabsatthesamepacetheynormallywould.
Wecanlookatthedaysoftheweekaswellasthetimeofdaytoseewhenstudentsareworkingonlabs.Normallabtimesareusuallybetween9:00-19:00,thoughthisvariesbythedayoftheweek.
HourofDay(24-hours)
Figure19:Numberofautogradersubmissionsbyhouroftheday.
Dayofweek
Figure20:Numberofautogradersubmissionsbydayoftheweek.
Again,thesegraphsarefairlyclosetowhatcoursestaffwouldexpect.Moststudentsarecontinuingtocompletelabsduringtheirscheduledtime,butasignificantnumberareworkingathome.
29
-
10.1.3StaffTimeSavings
Whiletimesavingsarenotaprimarymotivationforthiswork,weshouldconsiderthepotentialbenefitsthatcouldbesavedbeallowingstudentstogetcheckedoffusingtheautograder.Anecdotally,orallabcheckoffstakebetween2to10minutestocomplete,withtheaveragetimesomewherearound3to4minutes.Thecurrentusagepatterns(aswellastheresultsinthenextsection),suggestthatwecanexpect33%-50%ofstudentstousetheautograder.
Ifwehave15labswhichuseSnap!,and150studentscompletinglabs,andassumethatalabcheckofftakes3minutes:33%ofstudentstheautograderwouldfreeupapproximately38hoursofTAtime.(Thisisslightlyunder30%ofthetotalworkloadforasingle8hourperweekTAappointmentatBerkeley.)
However,ifwehave15labswhichuseSnap!,and300studentscompletinglabs,andassumethatalabcheckofftakes4minutes:50%ofstudentstheautograderwouldfreeupapproximately150hoursofTAtime.(Thisis110%ofthetotalworkloadforasingle8hourperweekTAappointmentatBerkeley.)
Naturally,thismightleadonetoquestionthevalueoforallabcheckoffs.That'snotthegoalhereatall.Orallabcheckoffshavebeenanincrediblepositivepedagogicaltool.Whilewritingtestscantakesometime,thehopeisthatthecostwouldaromatizeitselfwellovermanysemesters.
10.1.4SubmissionPatterns
Wecanalsolookathowoftenstudentsattempteachquestion.Thisshowswhetherstudentsareusingtheautogradermoreasafeedbacktool,asacrutchorassimplyacreditmechanism.Whilethere'snoclearexactnumberoftimesastudentshouldusesubmittheirwork,it'sclearthatwewantanoverall'happy-medium'.
TimesSubmitted
Figure21:Moststudentsappeartoonlysubmitonceattheendoftheirwork.
Thedatashowthatmoststudentsappeartobesubmittingonlyonce,meaningthey'renotcurrentlygettingmuchbenefitbythefeedbackpresented.Ifthereweremorefeedbackpresented,orpotentiallymorechallengingquestionsthismightchange.Thoughnotyetimplemented,non-gradedfeedbacksuchascodequalitysuggestionsmightchangethewaystudentsworktousetheautograder.
30
-
TimesSubmitted
Figure22:Numberofstudentsbynumberoftimessubmittedforeachlab.
Theseresultsalsohowareallylongtailforthenumberofsubmissionsbysomestudents.Thisistobeexpectedwithanyautograder.However,fromlookingatthedata,andfromTAreportsmanyofthesehighsubmissionnumbersmaybeduetothepreviouslymentionedbugs.(Studentstriedsubmittingmanytimessimplyhopingthattheerrorswoulddisappear.)
However,thedifferenceinattemptsforlab14isprettyclear.Whilemoststudentsstillonlysubmittedafewnumberoftimes,thereisamuchwiderdiversityinthenumberofquestions.Thenumberofblocksgradedforlab14was4comparedtothe1or2forlabs11and12.
10.2SurveyFeedback&Analysis
Alongwiththedatawecollected,wesurveyedstudentsattwopointstogetfeedbackabouttheiruseoftheautograder.ThefirstsurveyoccurredduringtheCS10midterm,shortlyafterstudentsshouldhavecompletedlabs11and12.Thesecondsurveywasgivenalongwiththefinalexam,attheendofthesemester.
10.2.1MidtermFeedback
Thefirstquestionweaskedstudentswaswhethertheypreferredonlinelabcheckoffsororallabcheckoffs.
31
-
Figure23:Studentsarefairlyevenlysplitbetweenpreferringonlinevsorallabcheckoffs.
WhendiscussingwithTAsthisisalmosttheexactoppositeoftheresultstheyinitiallyexpected.TAs(andinstructors,includingfromothercourses)expectedalmostallstudentstopreferusingtheautogradertogettingcheckedofforally.(Ratiossomestaffsuggestedwere75/25,80/20,90/10infavorofanautograder.Whilenoonehadexactlythesameprediction,everyonewespoketoassumed>=70%studentswouldpreferanautograder.)Thisvalidatesthatwehavedoneaprettygoodjobindesigningandtweakingtheorallabcheckoffsystem(whichhasevolvedoverthepastfewyear),butalsothatstudentsliketalkingtoTAsandthatstudentsarenotsimplytryingtobeatthesystem.Instructionalstaffhadconcernsaboutstudentspreferences,becausemanyassumedthattheautograderwouldbethepathofleastresistanceforstudents.It'sworthnoting,howeverthatpartofthebiastowardsin-personoveronlinecheckoffscouldverylikelybeattributedtobugsinthesystemearlyon,butthisisstillapromisingresult.(Manyopen-endedfeedbackcommentsmentionedbugsorglitchesasareasonforthefeelingsabouttheautograder.)
Hereisasamplingofcommentsstudentsgave.Eachofthesecommentsisrepresentativeoffeelingsofotherstudents.
"Itwouldsavealotoftimeduringcheckoff."
"Yougettotalktoanactualperson!Ifanyquestionsremain,theycaneasilybeanswered!"
"IfIdon'tfinishinclass,Iammoreincentivizedtofinishitathomeinsteadofatnextlab."
Itworkedforme,butIguessdoingmanuallabcheck-offwasbetter,inaway,becauseyougetfeedbackfromtheTA.
Aheads-upwouldhavebeennice,aswellassomethingexplaininghowtouseit,butIlikethisfeaturealot.Whenitworksproperly,itisveryusefulandhelpfulformylearning.
Idon'ttrustit.
Theautogradesometimeshasbugs.Forexample,ifmycodeiswrongbutsometimesstillreportedthecorrectvalue,itwouldmarkmeasapass.
Manystudentspreferredthehigherfidelityofinpersonquestionaskingandanswering.Whatsinterestingisthatautogradedcheckoffswouldntpreventstudentsfromgettingtheirquestionsanswered.Manystudentsremarkedaboutdifferentlevelsofstresswhichcomefromautomatedsystemsorhumans.Whilestudentsdidntelaborateonwhytheywouldbemore(orless)stressedwithanautogradersystem,themostprobableansweristhatsuchaviewisreallyamatterofpersonalpreference.Manystudentsdoenjoythat(human)TAsaremuchmorerelaxedandcomfortabletodiscussquestionsandgenerallyforgivingoferrors.Incontrast,theautograder(currently)canonlyprovidestaticresponses,andisverybrittlewhendeterminingcorrectness.However,somestudentsalsonotedthattheydprefernottotalktoaTA"unlessabsolutelynecessary"andthattheyfoundtheautomatedformateasiertodealwith.Wethinkthisisyetanotherdemonstrationthataone-size-fits-allmodelisn'tusuallythebestapproach.
32
-
Whenweaskedforgeneralfeedbackaboutthetool,mostofthecommentswerethatitwasconfusingtouse.Thisisunderstandable,becauseinretrospecttherewasnotenoughdocumentationnorTAtrailingbeforetheautograderwaslaunchedinclass.Fortunately,thisisafairlyeasyproblemtoaddressforfuturesemesters.
10.2.2FinalSurveyFeedback
Onepromisingresult,isthatweaskedstudentsforfeedbackontheoveralluseoflabcheckoffs.Whilethereis(notsurprisingly)alargecontingentofstudentswhodislikethelabcheckoffs,mostofthestudentsinsupportofcheckoffsspecificallyaskedformoreautograderquestions.
Whenaskedforfeedbackspecifictotheautograder,studentsseemedmorepositivethantheydidonthemidtermsurvey.Ofthestudentswhohadcomplaints,"bugs"and"glitches"werethebiggestreasons.However,aswiththemidtermsurvey,thereisalargenumberofstudentswhoareverystronglyinfavoroftheopportunitytotalktoTAs.
Thefinalquestionsthatweaskedmostlybackupthedatathatwascollected.
Figure24:Moststudentscompletedcheckoffsalone,andfoundthefeedbackeasytointerpret.
Figure25:Moststudentsreportedcompletingworkbeforeusingtheautograder
Sofartheseresultsconfirmthedatathat'sbeencollected.Inthefuture,thissuggestsTAscanintroducetheautogradertostudentsaswellaspresentthemotivationsforhowstudentsshouldthinkaboutusingitasastudytool.
33
-
34
-
11FutureWorkThusfar,wehaveshownthecurrentsystemhasenoughcapabilitytobeusedintheclassroom.However,weseeanumberofareaswherecouldbeexpanded.Additionally,someoftheanalysishasrevealedtheneedtomorecloselyexploreatsomespecificresearchquestions.
11.1Server-Side
Theserver-sideimprovementstoarenumerous,butwewantedtohighlightafewthatwethinkcouldprovidethebiggestbenefittostudentsandinstructors.
QuestionTagsandrecommendations.'sdatamodelalreadyincludesquestion-leveltags,buttheyaren'tcurrentlyusedforanything.Wethinkitwouldbeimmenselyhelpfulasastudytoolifstudentscouldberecommendedproblemsbasedonrelatedtagsforquestionstheyhavestruggledwith.
Better(User-Friendly)DashboardsWritingSQLqueriestooperatrethecurrentdashboardsisnotaviablesolutionforthefuture.Whenaloggedinasanadministratorthereshouldbelinksforeachitem(likeacourseoraquestion)forinstructorstoviewbasicstatistics.Supportdrillingdowntoindividualtestcaseswhenanalyzingquestions.Awellwrittenautogradertestcoversmanytypesofmistakes.TAsshouldgetfeedbackaboutwhichparticulartestsstudentsarestrugglingwiththemostsothattheycanaddressthoseconceptsinclass.
Integrateddisplayofquestioninstructions.Thecurrentversionassumesthatinstructorswillgivestudentsdirectionsinsomeplaceotherthantheautograder.WhilethisworksOK,itwouldbemuchnicerifstudentscouldchosetohideorshowinstructionsondemand.ThisisactuallyafeatureregressionfromedX,wheretheautograderwasalwaysembeddedalongsidethecoursecontent.
11.2Autograding
Asidefromcontinualbugfixes,therearefourkeyareasforimprovingtheautogradingcapabilities.
1. Clearer,moreexplicitfeedbackThecurrentoutputprovidedbytheautograderisfocusedmostlyontheresultsofindividualtestcases,bycomparingtheexpectedvsreturnedoutputsofblocks.Whiletestauthorsareabletoworkindividuallywrittenfeedbackintothestestcases,theoverallpresentationisn'tveryclear.Furthermore,becausethere'snotmuchroomonthescreenforgoodfeedback,authorsdon'thaveanincentivetowriteasmuch.
2. Snap!TestAPIimprovementsCurrently,itcanbefairlycumbersometowritetestsinJavaScript.Thereisalotofboilerplatecode,andcleanlypresentingimagesofblocks(insteadofjustatextversion)requiresalotofeffortfortestauthors.Oneinitialsolutiontothisproblemisthemigratethecurrentobject-orientedAPItoaversionwhichisbasedmorearoundconfigurationthancode.Whileitcouldn'tbe100%configurationbecauseauthorswillneedtowritecustomtestfunctionsinsomecases,aformatwhichoperatesmorelikeaJSONfilewouldbeeasiertomanage.Snap!testsshouldbewritableinSnap!.ByusingaSnap!featurecalled"codification"weareabletocompileSnap!codetoJavaScript(oranyotherlanguage).HereisanexampleofaprototypeSnap!libraryforwritingtests.BycompilingSnap!to
35
-
JavaScriptwecanalsoimprovethesecurityofteststhatarewritten,becausethesetoffunctionsavailableinSnap!iseasiertorestrict(andcatchviolations)thanJavaScript.
3. StaticanalysiscapabilitiesWe'veincludedsomeverybasicstaticanalysiscapabilitiesinthecurrentversion.However,they'retrickytouseandrequirealotofcustomcode.Throughstaticanalysisweshouldbeabletogivestudentstargetedfeedbackabouthowtoimprovetheircode.Sometoolsthatwouldbeusefularepartternmatchingfunctionsforcodestructure(likefindingallif(condition){returncondition}typestructures).
4. Image-basedautogradingWe'veexploredimagebasedautogradingforproblemslikefractalsandotherturtlegraphicsexercises.ImageswouldbeanOKwaytojudgecorrectness,buttheyprovideverylimitedfeedbacktostudentsabouthowtoimprove.Somepathsforimprovingthefeedbackfromimage-basedautogradingwouldbetotrytoaccountforspecificerrors.Forexample,youcouldrotatetwoimagesuntilyoufindtheminimumdifference.Ifarotationmakestwoimagesmatch,thenyoucouldprovidefeedbackaboutwheresuchanerrormightoccur.OthermoreadvancedalgorithmssuchasRANSACcouldbeexplored.
11.3ResearchDirections
Finally,nowthatwe'veshownstudentsareinterestedinusinganautograder,weshouldcarryoutmoretargetedresearch.
Whattypesofinterventionshelpstudentslearnorcompletelabs?Insteadofsimplypresentingfeedbackwhenstudentsfailatest,weshouldhavetargetedinterventionsthataskquestionsortryadifferentmethodforsolvingtheproblem.
Canweautomaticallygeneratebetterhintstogivetostudents?Thisisahighlyactiveareaofresearch,andnowthatiscollectionstudentdatawecouldbeabletousethistoimprovethefeedbackforfuturecourses.
HowdoestheautograderaffectmotivationinearlyCScourses?Somestudentsareunderstandablyconcernedabouttheaffectsautograderhaveontheirmotivationtocompleteatask.Whilethereissomeexistingworkonthistopic,webelievethere'sroomtoexplorethisinthecontextofvisualprogramminglanguages.
36
-
12ConclusionWepresent,asystemforautogradingSnap!programs.Thetoolcontainsbothanovelfront-endstudent-centricautograderthatpresentsimmediatefeedbacktostudents,aswellasabackenddatabasethatcanserveasquestionrepository.AftersuccessfullybeingusedonedX,we'veshownthatthetoolscanbeadoptedinatraditionalclassroom.ToaccomplishweimplementedtheLTIprotocol,sothatmaybeusedwithasmanyLMSsarepossiblewiththegoalofreachingmorestudents.Additionally,we'veseenthebenefitsofhavingcompletecontrolofyourinfrastructure.
Ourinitialsurveyshaveshownthatwhilemanystudentsenjoyusinganautograder,wemustbecarefulnottoreplaceinstructionalstaffwithtechnology.Thestressthatmanystudentsreportedfromtheautogradersuggestthatthereisalotmorehumanfactorsresearchthatcanbedoneaboutapplyingnewtechnologiestotheclassroom.Still,weareconfidentthatsmartclassroompolicieswhichgivestudentschoicewillcontinuetoallowforthebenefitsofautograderswithoutpenalizingstudentswhoareuncomfortablewiththem.Thisdoesleaveconcernforonline-onlylearningenvironments,butweseeevenaprimitiveautograderasabigstepabovenothing.Futureinterfaceenhancementsandnewfeaturesforstudyingwillhopefullyonlyaddtothebenefitsstudentsseewhileusingtheautograder.
Despitesomesuccesses,weknowthatthereisalongwaytogo.Wearestillonlyabletogradeasubsetofallpossibleproblemswecouldassigntostudents.Therearemanyactiveresearchareasintechniqueslikeautomatichintgenerationwhichwethinkwillproveusefulinthefuture.However,wealsorecognizethatdespitethepromise,weneedtospendtolowerthebarriertoentryforwritingnewcontent,bothintermsofsavingtimeandfollowingpedagogicalbestpractices.Fortunately,wethinkthereisaclearpathforwardinthisarea.
Aboveall,wehaveshownthatitispossibletobuildanautograderforavisualprogrammingenvironment.Thoughthesetoolsdevelopedoutofaneedtoscaleteachingcomputerscience,webelievethathavethepotentialtogreatlyenhancetraditionalclassrooms.Asanareaofactiveresearchweareexcitedtoseethedirectionsthatautogradingvisualprogramminglanguagesmaytake.
37
-
13References
1. UndergradComputerScienceEnrollmentsRiseforFifthStraightYearCRATaulbeeReport-GovAffairs, http://cra.org/govaffairs/blog/2013/03/taulbeereport/
2. Cuny,JaniceandNSF,nsf09534BroadeningParticipationinComputing(BPC)|NSF-NationalScienceFoundation, 2009.https://www.nsf.gov/publications/pub_summ.jsp?WT.z_pims_id=13510&ods_key=nsf09534
3. ExploringComputerScience,http://www.exploringcs.org/
4.Board,TheCollegeandDiez,Lien,APComputerSciencePrinciples-ANewAPCourse-AdvancesinAP-TheCollegeBoard|AdvancesinAP,https://advancesinap.collegeboard.org/stem/computer-science-principles
5. Harvey,BrianandGarcia,DanielandDevelopment,CorporationEducational,BJC-BeautyandJoyofComputing, http://bjc.berkeley.edu/
6. AboutUs|edX,https://www.edx.org/about-us
7. Harvey,BrianandMnig,Jens,Snap!(BuildYourOwnBlocks)4.0,2016.http://snap.berkeley.edu/
8. Scratch-Imagine,Program,Share,https://scratch.mit.edu/
9. UCBerkeley},CS10|Spring2016,http://cs10.org/sp16/
10. {Code.orgCodeStudio,https://studio.code.org/
11. Fraser,Niel,Blockly|GoogleDevelopers,https://developers.google.com/blockly/
12. Code.org,code-dot-orgGithub,2016.https://github.com/code-dot-org/code-dot-org/tree/staging/apps
13. Johnson,DavidE.,ITCH,Proceedingsofthe47thACMTechnicalSymposiumonComputingScienceEducation-SIGCSE'16,ACMPress,2016.http://dl.acm.org/citation.cfm?id=2839509.2844600
14. Soloway,ElliotandGuzdial,MarkandHay,KennethE.,Learner-centereddesign:thechallengeforHCIinthe21st century,ACM,1994.http://dl.acm.org/citation.cfm?id=174809.174813
15. ExploringaStructuredDefinitionforLearner-CenteredDesign, http://www.umich.edu/~icls/proceedings/pdf/Quintana2.pdf
16. GettingStartedwithRailsRubyonRailsGuides,http://guides.rubyonrails.org/getting_started.html#what-is-rails-questionmark
17. 4.3.CustomJavaScriptApplicationsOpenedXDeveloper'sGuidedocumentation,http://edx.readthedocs.io/projects/edx-developer-guide/en/latest/extending_platform/javascript.html
18. IMSGlobalLearningConsortium,https://www.imsglobal.org/
19. LearningToolsInteroperabilityv1.1ImplementationGuide|IMSGlobalLearningConsortium, https://www.imsglobal.org/specs/ltiv1p1/implementation-guide
20. Huang,RachelandKuphaldt,Adam,LabCheck-OffQuestions,2016.http://cs10.org/sp16/labquestions/
38
http://cra.org/govaffairs/blog/2013/03/taulbeereport/https://www.nsf.gov/publications/pub{\_}summ.jsp?WT.z{\_}pims{\_}id=13510{\&}ods{\_}key=nsf09534http://www.exploringcs.org/https://advancesinap.collegeboard.org/stem/computer-science-principleshttp://bjc.berkeley.edu/https://www.edx.org/about-ushttp://snap.berkeley.edu/https://scratch.mit.edu/http://cs10.org/sp16/https://studio.code.org/https://developers.google.com/blockly/https://github.com/code-dot-org/code-dot-org/tree/staging/appshttp://dl.acm.org/citation.cfm?id=2839509.2844600http://dl.acm.org/citation.cfm?id=174809.174813http://www.umich.edu/{~}icls/proceedings/pdf/Quintana2.pdfhttp://guides.rubyonrails.org/getting{\_}started.html{\#}what-is-rails-questionmarkhttp://edx.readthedocs.io/projects/edx-developer-guide/en/latest/extending{\_}platform/javascript.htmlhttps://www.imsglobal.org/https://www.imsglobal.org/specs/ltiv1p1/implementation-guidehttp://cs10.org/sp16/labquestions/
-
39
-
Appendix A:ObtainingtheSourceCode
ThecodeforallprojectsinthisreportisavailableinvariousrepositoriesonGithub.
cycomachead/lambdacontainsthemainRailswebapplication.cycomachead/lambda-evaluatorcontainstheJavaScriptsourcethattalkstoSnap!.jmoenig/Snap--Build-Your-Own-BlockscontainsthesourceforSnap!.cycomachead/thesiscontainsthesourceforthiswork.TheCS10organizationcontainsinfoabouttheCS10course.Thebeautjoyandbjc-edcorganizationscontainsthecurriculumusedinedXandCS10.
ThegraphsinthisreportweregeneratedprimarilyusingtheblazergemforRails,andthisreportwasproducedusingtheGitBookebooktool.Youcanreaditonline.
40
https://github.com/cycomachead/lambdahttps://github.com/cycomachead/lambda-evaluatorhttps://github.com/jmoenig/Snap--Build-Your-Own-Blockshttps://github.com/cycomachead/thesishttps://github.com/cs10https://github.com/beautjoyhttps://github.com/bjc-edchttps://github.com/ankane/blazerhttps://gitbook.comhttps://cycomachead.gitbook.com/thesis
AbstractAcknowledgementsList of Figures
top related