λ - an autograder for snap! - eecs.berkeley.edu · fig 17 lab 12: submission times by day. ......

Click here to load reader

Upload: phungtram

Post on 23-May-2018

214 views

Category:

Documents


2 download

TRANSCRIPT

  • Lambda: An Autograder for Snap!

    Dan Garcia

    Electrical Engineering and Computer SciencesUniversity of California at Berkeley

    Technical Report No. UCB/EECS-2018-2http://www2.eecs.berkeley.edu/Pubs/TechRpts/2018/EECS-2018-2.html

    January 16, 2018

  • Copyright 2018, by the author(s).All rights reserved.

    Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and that copiesbear this notice and the full citation on the first page. To copy otherwise, torepublish, to post on servers or to redistribute to lists, requires prior specificpermission.

    Acknowledgement

    The work presented here exists due to the contributions of many people andgrants from the National Science Foundation (grant 1138596), Google, andedX. Building an autograder would not be possible without BJC,Snap! and CS10, which have collectively been supported byhundreds of dedicated students and teachers over the past seven years. In particular, a huge **Thank You** to the following people who helpedmake this possible. Thanks especially to Lauren Mock for always being there,and for all the advice and help! Everyone has been incredibly supportive overthe past year.

  • Copyright 2015, by the author(s).All rights reserved.

    Permission to make digital or hard copies of all or part of this work forpersonal or classroom use is granted without fee provided that copies arenot made or distributed for profit or commercial advantage and thatcopies bear this notice and the full citation on the first page. To copyotherwise, to republish, to post on servers or to redistribute to lists,requires prior specific permission.

  • An Autograder for Snap!

    by Michael Ball

    Research Project

    Submitted to the Department of Electrical Engineering and Computer Sciences, University of California at Berkeley, in partial satisfaction of the requirements for the degree of Master of Science, Plan II. Approval for the Report and Comprehensive Examination:

    Committee:

    Professor Dan Garcia Research Advisor

    May 13, 2016

    (Date)

    * * * * * * *

    Professor Armando Fox Second Reader

    May 13, 2016

    (Date)

  • 1 2

    2 3

    3 4

    4 6

    5 7

    6 11

    7 14

    8 19

    8.1 19

    8.2 24

    9 25

    10 27

    10.1 27

    10.2 31

    11 35

    12 37

    13 38

    14 40

    TableofContentsAbstract

    Acknowledgements

    ListofFigures

    Introduction

    RelatedWork

    DesignPrinciples

    InterfaceWalkthrough

    Implementation

    TheWebApplication

    TheAutograderInterface

    ExperimentalSetup

    Results

    UsageAnalysis

    SurveyAnalysis

    FutureWork

    Conclusion

    References

    AppendixA:SourceCode

    1

  • 1AbstractWhilevisualprogramminglanguagesarehardlyanewdevelopment,recentpushesforincreasedequityandaccesstocomputersciencehaveledtoarenewedinterestanduseofvisualprogramminglanguages.Autogradersareacriticalcomponentofbothin-personandonlinecomputersciencecourses.However,untilthispoint,therehaven'tbeenanyautogradersdocumentedforgeneral-purpose,visualprogrammingdevelopmentenvironments.Thisisaparticularshortcomingasintroductorycourseshavescaledtolargernumbersofstudentsandonlineenvironments.Whiletherearechallengestousingautograders,webelievethattheinstantfeedbackcapabilities,aswellaspotentialtimesavingsforcoursestaffwillhelpusteachagreaternumberofstudents.

    Overthepastyear,wehavebuiltanautograder,named(Lambda),forSnap!,avisualblocks-basedprogramminglanguageinspiredbyMIT'sScratch.TheprimarymotivationfordevelopingtheautograderwastorunaseriesofMassiveOpenOnlineCourses(MOOCs)onedx.orgthroughoutthe2015-2016academicyear.However,wealsowishtousetheautogradertobettersupportin-personcomputersciencecourses.InSpring2016,theSnap!autograderwasusedasapartofUCBerkeley'sCS10,TheBeautyandJoyofComputing,a"CS0"non-majorscourse.

    Thisreportdescribes,whichconsistsofa"backend"RubyonRailswebserverthatallowsustousetheautograderinaclassroomsetting,throughaprotocolcalledLTI.Thebackendwebapplicationcontainsadatabaseofquestionsandtestfiles,whiletheSnap!interfacecontainsnewfeaturesandaviewtopresenttheresultsoftheautograder.OurinitialresultsshowtheautogradersuccessfullybeingusedinCS10,wheretheautograderwasusedtosupplementorallabcheckoffs,andonedx.orgwheretheautograderwastheprimarymethodforstudentstoreceivecreditforcode,gradedbothforeffortandcorrectness.

    2

  • 2AcknowledgementsTheworkpresentedhereexistsduetothecontributionsofmanypeopleandgrantsfromtheNationalScienceFoundation(grant1138596),Google,andedX.BuildinganautograderwouldnotbepossiblewithoutBJC,Snap!andCS10,whichhavecollectivelybeensupportedbyhundredsofdedicatedstudentsandteachersoverthepastsevenyears.

    Inparticular,ahugeThankYoutothefollowingpeoplewhohelpedmakethispossible.ThanksespeciallytoLaurenMockforalwaysbeingthere,andforalltheadviceandhelp!Everyonehasbeenincrediblysupportiveoverthepastyear.

    DanGarciaforbeingamentorforthepastfiveyearsArmandoFoxforreviewingthisworkBrianHarveyandJensMnigforSnap!MaxDougherty,TinaHuang,YifatAmirandPatrickO'HalloranfortheirincredibleworkontheautograderTheCS10stafffortestingunproventechnologyandfordealingwithadditionalstudentquestions!

    TAs:RachelHuang,AdamKuphaldt,AlexMcKinney,AmrutaYelamanchili,AranyUthayakumar,ErikDahlquist,JannaGolden,JosephCawthorne,LaraMcConnaughey,WilliamTang,YifatAmir,AndySchmitt,StevenTraversi,andVictoriaShiInstructors:JustinHsiaandGeraldFriedland

    3

  • 3ListofFigures

    Fig1AnexampleSnap!program.

    Fig2AtypicalexampleofBJCcurriculumwhichincludesgraphicaloutput.

    Fig3AtypicalexampleofaCodeStudioproblemthatgivesstudentsonlyafewblockstoworkwith,andhasafairlyconstrainedsolutionspace.

    Fig4Theinitialpageisalistofquestionstotry.

    Fig5Administratorshaveadditionalfunctionality.

    Fig6Creatinganewcourseisasimpleactionwhichrequireslittleinformation.

    Fig7Adashboardshowingthefirsttwolabsthesubmissiontimesforautograderrequestsfor.

    Fig8Theinitial(edX)versionwhichhadaheavilyintegratedfeedbackbutton.

    Fig9Updatedcontrolsfortheautogradershowingadropdownmenu.(Thecontrolsforrevertingsubmissionsaregreyed-out.)

    Fig10Anexampleofthefeedbackpresentedwheneverythingiscorrect.

    Fig11Anexampleoffeedbackshowingsomefailingcases.

    Fig12Snap!canbeembeddedinedXthroughJSInput.

    Fig13Whenastudentclicksonthelink,anewtabwillopenwiththeproperquestiontheyareassigned.Clickingthe"GetFeedback"buttontriggersasubmissionwhichsendsthegradebacktotheLMS.

    Fig14AverybasicLTIlaunchsequence.ImagefromtheIMS{{"ims-img"|cite}}.

    Fig15Numberofstudentsbynumberofquestionsattempted

    Fig16Lab11:submissiontimesbyday.

    Fig17Lab12:submissiontimesbyday.

    Fig18Lab14:submissiontimesbyday.

    Fig19Numberofautogradersubmissionsbyhouroftheday.

    Fig20Numberofautogradersubmissionsbydayoftheweek.

    Fig21Moststudentsappeartoonlysubmitonceattheendoftheirwork.

    Fig22Numberofstudentsbynumberoftimessubmittedforeachlab.

    Fig23Studentsarefairlyevenlysplitbetweenpreferringonlinevsorallabcheckoffs.

    Fig24Moststudentscompletedcheckoffsalone,andfoundthefeedbackeasytointerpret.

    Fig25Moststudentsreportedcompletingworkbeforeusingtheautograder

    4

  • 5

  • 4IntroductionInrecentyears,thefieldofcomputerscience(CS),andthebroadertechnologyindustry,haveundergonesignificantchanges,thoughnotsimplytechnicalones.Aroundthesametimethatenrollmentsstartedbooming[1],theNationalScienceFoundation(NSF)postedtheoriginalsolicitationforBroadeningParticipationinComputing[2].AsaresultoftheeffortstoimprovediversityinCS,curriculasuchasExploringComputerScience[3](ECS),andAPComputerSciencePrinciples[4](APCSP)haveemerged.Manycurriculahavestartedtousevisualprogramminglanguages(VPLs),alsoknownasblocks-basedlanguages,astheirprimarytool.Asgreatastheseenvironmentscanbe,theyoftenlackresourcesandtoolsfoundinmoretraditionalenvironments(suchasPythonorJava).Thislackofinfrastructurecanmakecoursesmoredifficulttoteachandscale,particularlytoentirelyonlineenvironments.

    Weaimtohelpstudentslearningprogrammingby:

    Deliveringthebestresourcestopossibletocompletelyremotestudents,andImprovingthecapabilityandefficiencyofin-personteachingassistants(TAs).

    Oneexampleofmissingresourcesisthecapabilitytoautomaticallyevaluatecode,andsendtheresultstoanotherprocess.Automaticevaluationanddistributingtheresultsarecentralcomponentsinanautograder.Withoutanautograder,itcanbedifficult(ifnotprohibitivelyexpensive)toteachlargeonlinecourses.Duringthe2015-2016schoolyear,wetaughtTheBeautyandJoyofComputing[5](BJC)asaseriesoffourmassiveopenonlinecourses(MOOCs)onedX[6].BJCisanAPCSPrinciplescoursethatusesSnap!asitsprimarylanguage.Withoutanautograder,wedidnotthinkthatwecouldfairlygivecredittostudentsonline.AtBerkeley,BJCisofferedasCS10,whichcoulduseanautograderforSnap!.

    WedevelopedasasystemforautogradingSnap![7].iscomposedoftwomainparts,theRubyonRailsapplicationserver,andaJavascriptapplicationthataugmentsSnap!withtesting,analysisandloggingcapabilities.WhilethegradinginterfacewasdevelopedforedX,wereportonitsuseintheclassroomandsomerecommendationsforfuturedevelopment.WealsohopethatwillbeusefuloutsideCS10,throughtheimplementationoftheLearningToolsInteroperability(LTI)protocolwhichshouldallowoursystemtobecompatiblewithmostLearningManagementSystems.Wehavebuiltadditionalfeatureswhichshouldallowflexibilitytointegratetheautograderintoavarietyofclassroomenvironments.Thesefeaturesaredescribedinthechapteronimplementation.

    4.1Snap!Snap!isablocks-based,drag-and-dropprogramminglanguageinspiredbyMIT'sScratch[8],butadaptedforuniversity-levelcoursesbyincludingfeaturessuchasfirst-classlistsandcustomfunctions(blocks).Snap!isimplementedasawebapplicationinJavaScript,whichmakesitcompatiblewithmobiledevices.Figure1showsabasicdrawingscriptbeingexecuted.

    6

  • Figure1:AnexampleSnap!program.

    4.2TheBeautyandJoyofComputing

    TheBeautyandJoyofComputing(BJC)isanintroductiontocomputingdesignedtobroadenparticipationamongunderrepresentedgroups.TheprimarylanguageusedisSnap!,andmanyoftheexercisesinthecoursehavevisualoutputs.Thecoursecoversfunctionsandabstraction,recursion,higherorderfunctions(lambdas)andmanyothertopics.ManyoftheexamplesandexercisesinBJChavemultiplecorrectpathstoimplementation,whichcanbeachallengeforautograderstohandle.

    Figure2:AtypicalexampleofBJCcurriculumwhichincludesgraphicaloutput.

    7

  • 4.3CS10

    CS10[9]isUCBerkeley'sofferingofBJC.Currently,CS10isofferedeverysemesterto200-300students.Likemanyotherintroductorycourses,CS10reliesheavilyonlaboratorysectionsastheprimarymethodforstudentstolearntoprogram.willbeusedinCS10duringlabsectionstogivestudentsbetterfeedbackastheyareworkingandtogivecreditfortheassignedlabwork.Thisyear,wewereabletotrialthesystemforthreeofthe14labsthatstudentscompletedinSnap!.

    4.4CorrectnessandEffort

    Whiledesigningtheautograder,weconsideredtwoprimarytypesofgrading:correctnessandeffort.Correctnessissimple:Doesisgivenblock(orscript)dowhattheinstructionssay?Effortismoreopen-ended,butthegoalistobeablerewardstudentsfortryingtocompleteaproblem,aswellastoencourageexperimentationwithSnap!'smanyfeatures.Inboththesecases,wewouldliketheautogradertoberobusttodifferentcorrectsolutions,orthemanydifferentwaysstudentscandemonstrateeffort.

    8

  • 5RelatedWorkInbuildinganddesigninganautograder,thereissurprisinglylittleworkpublishedaboutautograders,andevenlesssoaboutthoseforvisualprogramminglanguages.WhiletherearemanyexamplesofautogradersforlanguageslikePythonandJava,thereareonlytwothatweareawareofforblocks-basedlanguages,whicharebothveryrecentdevelopments.

    5.1Code.orgCodeStudio

    Code.org'sCodeStudio[10]isanonlineinteractiveenvironmentbasedaroundtheopen-sourceBlocklyenvironment[11].CodeStudioincludescustomfeedback,butthevastmajorityoftheproblemsareinahighlyrestrictivespacewherestudentsaregivenonlyalimitedsetofblocks.WhileCodeStudioisopensource[12],theautograderandfeedbackmethodologiesarenotdocumented.

    Figure3:AtypicalexampleofaCodeStudioproblemthatgivesstudentsonlyafewblockstoworkwith,andhasafairlyconstrainedsolutionspace.

    5.2ITCH:IndividualTestingofComputerHomeworkforScratchAssignments

    ITCHisanautograderforScratchrecentlypresentedatSIGCSE2016[13].However,ITCH'smethodofautogradingisthroughaninstructor-initiatedscriptwhichshouldberunafterallassignmentsubmissionsarecollected.Thisisafairlycommonscenariofortypicalcodeautograders,butnotamodelwewouldusebecauseourgoalisinstantfeedback.ITCHtakesasimilarmethodtoourSnap!autograderby'spoofing'userinputswhenrequested,butitsharesmanyofthesamelimitations,likealackoffeedbackforgraphicaloutput.

    9

  • 10

  • 6DesignPrinciplesEducationaldesignprincipleshelpedinfluencethedesignoftheautograderandthedirectionforcreationofexercises.Whilethesedesignprincipleshelpedinfluencefeaturesandtheuserinterfaceoftheautograder,theyarealsoextremelyimportantasaguidelineforhowinstructorsandcontentauthorsshouldthinkaboutwritingexercisesandimplementingthemintheclassroom.

    6.1DualModalities

    Acentralchallengefacedwhenconsideringdevelopmentoftheautograderwaswhattoprioritize:FeedbackorGrading?Whilebothfeaturesarenecessaryforeachother,thereisanacutetensionbetweenatoolwhichisprimarilymotivatedbyprovidingopen-endedfeedback,andonethatisdesignedtoprovidegrades.Thegoalofthissectionistoconsiderthebestpossibleinterfacewecouldgivetostudentstohelpthemimprovetheirprogrammingskillsandcompletelabexercises.

    Akeyaspectofthistensionishowtohandletheideaofcorrectness.Inanintroductorycomputersciencecourse,weareoftenlenientwithsmalldifferencesinoutputfromaprogram.(Forexample,Snap!allowsstudentstousebothtraditionalarraysandlinkedlists,buttheirvisualoutputisthesame.Iftestauthorsarenotcareful,itiseasytomistakeonetypeofdataasincorrectwhenitshouldbeaccepted.)Whileitisoftenpossibletoaccountforthesedifferenceswhenwritingtestcases,itcanbesignificantlydifficult.Weneedtomakesurethatwhenthesetoolsareusedforgrading,theydonotcausestudentsunnecessarystressorfrustration.

    6.2Learner-CenteredDesign

    Learner-centereddesign(LCD)isadesignprincipleadaptedfromuser-centered-design(UCD)[14][Citationnotfound][15].BothLCDandUCDdesignprinciplesstartbyestablishingattributesabouttheuser'sgoals.ForLCDthereare4mainattributes:

    Learnersdonotpossessthesamedomainexpertiseasusers.

    Learnersareheterogeneous.

    Learnersmaynotbeintrinsicallymotivatedinthesamemannerasexperts.

    Learnersunderstandinggrowsastheyengageinanewworkdomain.

    WhilelanguageslikeSnap!aredesignedtobeeasiertolearn,theydonotnecessarilyemployLCDprinciplesbecausetheyarestillintendedasageneral-purposeprogrammingenvironment.Wecanmakesureourautogradertakesintoeachoftheseprinciples:

    Domainexpertise:Programminglanguageshavetoshowerrormessagesthatcouldmakesenseinanysituation.Unfortunately,thismeanstheyusuallyfalltothelowestcommondenominatortypecases,anddonotprovideanycontextualinformationtotheuse.InSnap!,itisnotuncommontoseeamessagethatissimilartoTypeError:expectingalistbutgotnumber.Wecanimproveuponthesemessagesbyshowingstudentshintswhicharespecifictotheproblemathand.

    Heterogeneity:Thisisoneoftheharderaspectsfortheautogradertohandle.Noteveryoneapproachesproblemsinthesameway.Ourgeneralapproachistotrytobeaslenientaspossible(whilestillensuringcorrectness)whenwritingtestcases.Wehavespentlotsoftimeconsideringhowauthorsshouldhandledifferentformatsofoutputsothatwetrytoavoidnit-pickyerrors.

    Motivation:Wetrytomotivateusersbycarefullychoosinghowwepresentthetoolandtheresults.Inclass,andinthetextwhichappearsonscreenwetrytodownplaytheideaofgradesorerrorsandinsteadfocusonhelpingstudentsimprove.

    11

  • ChangingUnderstanding:Dynamicallycapturingauser'sunderstandingisincrediblydifficulttodo.Atthispoint,wearenotabletodynamicallyadjustexercisesorfeedbackpresented,butwehaveplannedoutpossiblemethodsfordoingso.Currently,thebestwayforustoachievethisistohaveTeachingAssistants(TAs)andinstructors,whoareconsciousofstudentsneeds,recommenddifferentproblemsforstudentstopracticewith.

    6.3KnowledgeIntegration

    Knowledgeintegration(KI)isaframeworkforapproachinghowstudentsshouldsynthesizeinformation[Citationnotfound].TheKIframeworkhasfourcomponentstoorangizeideas:

    AddingKnowledgeinvolvesbringinginnewideasthatstudentshavenotseenbefore.ElicitingIdeasistheprocessofcriticallyexaminingideasstudentsalreadyknow.DistinguishingKnowledgeasksstudentstotakemultipleideasandfigureouthowtheyfittogether;whethertheyarecompatibleornot.Reflectingistheprocessofdrawingconclusionsfromwhatstudentshavelearned.

    WeusedKIasabasisforwritingthefeedbackmessagesthatstudentsareshownthroughtheautograder.Thegoalistofocusprimarilyontheelicitingknowledgeanddistinguishingknowledgecomponents.Wewantedtofocusonthesetwopiecesbecausetherearemanycommoncomputerscienceproblemswhichcanbeviewedthroughthislens.Systematicallydebuggingcodefollowsaprocessofelicitingknowledgewhentryingtofigureoutwhysomethingisbroken.Distinguishingknowledge(suchasthedifferencesbetweentwokindsofloops,orrecursionanditeration)isanaturalprocessforprogramming.

    Wechosenottousetheaddingknowledgecomponentbecausewedonotcurrentlyhavetheautogradersetuptogivegoodfeedbackwhenstudentsaredoingexploratorywork(wheretheywouldbemostlikelytouncovernewconcepts).However,thesetypesofmessageswilllikelyappearinfutureversions.Similarly,whilereflectingisavaluablestep,wedonothavethecapabilitytocollectnorgivefeedbacktoopen-endedreflectionquestions.

    However,writingproperKImessagesprovedchallenginginthecurrentsetup.Theinitialversionoftheautograderwasdesignedmorearoundpresentingtheresultsoftestcases,thanitwasforlongerformsoffeedback.(Thisisoneareaforimprovement.)Furthermore,tryingtofollowKIoccasionallyledtomessagesthatdidnotnecessarilyfitwithintherestoftheBJCcurriculumasitwasnotdesignedaroundtheKIframework.

    6.4"TA-CenteredDesign"Thoughthisiscertainlylowerontheprioritylistthanlearner-centereddesign,wemakeapointtodescribeTA-centereddesign,andwhythismattersforthetoolswebuild.TeachingAssistants(TAs)andinstructors,arecriticalusersoftheinfrastructureincourses.Theyneedtobeabletoeasilyupdateandcreatecontent,handlegradesandsoon.Thelongerormoredifficultthesetasksare,thelesstimeTAshavetospendhelpingstudentslearn.

    WhenconsideringTA-CenteredDesign,aTAismuchmorelikeatypicaluserinUCDthanalearnerinLCD,buttherearemanyideasthatshouldbespecificallyrecognizedforTAs:

    TAsareoftenlackingpedagogicalcontentknowledge(PCK).PCKismakingthedistinctionbetweenknowinghowtoprogram,andhowtoteachprogramming.TAscoulduseguidanceinapplyinggoodpedagogy.

    TheadmindashboardsbuiltintogiveTAsmoreinsightsthantheycurrentlyhaveabouthowstudentsareperformingandhowoftentheyarecompletingthelabwork.Whilethedashboardshaveawaystogoinfunctionality,thisisanimprovementandgivesTAsareasontokeepusingthesystem.Thetestcaseauthoringinterfaceshouldbeadaptedtomakeiteffortlesstowriteconsistentanddetailedfeedback.

    12

  • WhileTAsaremotivatedtoteachtheyarenotalwaysmotivatedtocompletetheextraworkrequiredofthem,suchasgradingorwritingassignments.

    Testcasesneedtobeeasytowriteandupload.TheImplementationchapterdescribestheproblemswithourinitialapproachusingedX'stools.TheFutureWorkchapterdescribeshowwecanimprovetheexperienceofwritingautogradertestfilestolowerthebarrier.

    OftenTAsarenotexpertsinthetoolstheyarerequiredtousetoaccomplishtheirteachingduties,suchasLMSsandgradingsystemssuchas.TAs,likemostusers,havealimitedamountoftimetocompletetheirwork.

    Limit(orautomate)repetitivetasks,especiallyonesthatinvolveconfiguration.TheabilitytoautomaticallyuploadgradesforstudentsisahugetimesaverwhichallowsTAstofocusonmoreimportanttasks,andallowsstudentstostopworryingaboutthestatusoftheirgrades.

    Whilethesethreeideasmayseemobvious,theyareimportanttorecognizeifourworkistobeusedbeyondourinitialtestimplementation.Then,weneedtoconsiderhowTAswilluse.Thesuccessofanewautograder,evenifitisbeneficialforstudentsrequiresTAstobecomfortableconfiguringandwritingnewautogradertests.

    Ultimately,we'rearetryingtorecognizethatTA's(orindividualinstructors)arealreadylimitedbytime.Makingtestcasesaseasytowriteaspossibleisanecessarypartoftheprocess.

    13

  • 7InterfaceWalkthroughiscomposedoftwoparts:awebserverandaSnap!interfacewithautogradingcapabilities.Inthecurrentimplementation,onlyinstructorsneedtousethewebserver,thoughinthefuturetheremaybefunctionalityaddedforstudents.(Seethefutureworksection.)

    7.1WebApplication

    Thebasicwebinterface(figure4)presentsalistofproblemstotry.Thislistispublic,soanyonecanattemptanyproblem,butinstructorsareexpectedtoembedspecificquestionswiththeirdirections.Userscanclickonalinktoworkonaparticularquestion.

    Figure4:Theinitialpageisalistofquestionstotry.

    Therestoftheinterfaceoptionscomeaftertheuserhasloggedinwithanadminaccount(figure5).

    Figure5:Administratorshaveadditionalfunctionality.

    Thesefeaturesare:

    Creating/EditingQuestions-Aquestioncontainsatestfile,apointsvalue,andastarterfile.Creating/EditingCourses-CoursesdescribeconnectionstotheLMS.Eachcoursehasakeyandsomepolicysettings(figure

    14

  • 6).Creating/EditingAdminDashboards-Admindashboardsprovidethestatusofstudentperformance.ThisinitialversionisbasedentirelyoncustomSQLqueries,buttheycanbepowerful.ThesedashboardswereasignificantbenefitindoinganalysisforthisreportandweresharedwithTAsduringthecourse(figure7).

    Figure6:Creatinganewcourseisasimpleactionwhichrequireslittleinformation.

    Figure7:Adashboardshowingthefirsttwolabsthesubmissiontimesforautograderrequestsfor.

    7.2TheSnap!Interface

    TheautograderaugmentstheSnap!interfacewithafewbasictools.Theinitialversionoftheautograderincludedabuttonand"hamburger"menu,thatweredesignedtoappearintegratedintoSnap!.However,theapparentseamlessnesswasactuallymoreconfusingbecausethecontrolsneverfitwithintherestofSnap!.

    15

  • Figure8:Theinitial(edX)versionwhichhadaheavilyintegratedfeedbackbutton.

    TheupdatedversionmoreclearlyseparatestheautogradercontrolsfromSnap!,andgivesusmoreroomtoexpandfunctionalityinthefuture.AproblemtitlewillalwaysbedisplayedintheSnap!interface(comparedtooutsidethewindowintheedXversion),whichwasasmallbutimportantchangebecausethenewsetupallowsfordisplayingSnap!inaseparatetabfromthequestioninstructions.

    Figure9:Updatedcontrolsfortheautogradershowingadropdownmenu.(Thecontrolsforrevertingsubmissionsaregreyed-out.)

    Bothinterfaceshavethefollowingfeaturesforstudents:

    The"GetFeedback"buttonrunsasetofautogradertestsonthecode,whicharepresentedwhentestsarecomplete.Thecolorofthestatusbar(thebuttonintheedXversion)changescolorbasedonthequestion'sstate:

    Green:AlltestsarepassingOrange:Thestudenthasmodifiedtheircodeandtestsshouldbere-run.Red:Atleastonetestisfailing.

    RestoreBestSubmissionRestoreLastSubmission

    Thesetwofeaturesencouragestudentstoexperimentandre-writecodewithoutanyfearthattheymighthurttheirscoresorlosework.Everytimethe"GetFeedback"buttonisclicked,asubmissionisrecorded.

    ResetRestoresthecurrentSnap!projecttothestateofthestarterfile.

    HelpDisplaysasetoftooltipsoverautograder-specificelements.

    ShowPreviousResultsThisisanewoptionwhichwillpresentthefeedbackoftheprevioustestswithoutre-runningthem.Thisallowsstudentstoreviewmistakes,andshoulddiscourage"autograder-drivendevelopment".

    16

  • Figure10:Anexampleofthefeedbackpresentedwheneverythingiscorrect.

    Figure11:Anexampleoffeedbackshowingsomefailingcases.

    ThetwoscreenshotsinFigures10and11showtwodifferentversionsoftheautograderpresentingfeedback.Studentsseepassingtestsontopingreenandfailingtestsonthebottominred.Eachsectionhasdetailsabouttheparticulartestcases,andtheboldheadingscancontaintipsorguidanceforstudents.

    17

  • 18

  • 8ImplementationThewebapplicationisbuiltusingtheRubyonRailsframework[16],andmakesuseofmanycommonwebtechnologies.TheSnap!interfaceisimplementedpurelyinJavaScript.TheinitialpurposeofthesystemisnottobeagradestoragebuttoconnectwithanexistinggradebookorLearningManagementSystem(LMS)sothatanygraderesultswillbeintegratedwiththerestofcoursedata.Wechosethisroutebecause,inourexperiencemultiplesourcesofgradesarepronetoerrorsanddelays.Bydesigningasystemwhichdoesn'tneedtostoregradedata,wecanmakealotofsimplificationsandfocusonmoreimportantfeatures.

    8.1RubyonRailsBackend

    existsasmodern"Softwareasaservice"application.It'sdesignedtobehostedbyacloudservicesprovider,usingawebserverandaseparatedatabaseserver.

    8.1.1TheNeedforaWebApplication

    TheJavaScriptthatpowerstheautogradingworksentirelyclient-side,meaningaslongasyouhavethetestfiles,there'snoneedforaninternetconnection.Thispathwasinitiallychosenforthreereasons:

    Snap!isclient-side,andevaluatingSnap!projectsonaserverwouldrequireasignificantamountofwork.edXprovidesacustomproblemtypecalledJSInput[17]whichgaveusaclearpathforintegrationwithedX.Developinganautograderandawebapplicationrequiredmoreresourcesthanwereavailable.

    Whiletheentirelyclient-sidepathwasagooddecision,weranintoanumberofissuesbyrelyingonJSInputandtryingtokeepallfeaturesclient-side.

    8.1.1.1ChallengeswithJSInput

    edX'sJSInputproblemtypeprovidesaJavaScriptAPIforsendingscorestotheedXplatform.ItallowsustobuildinacustomversionofSnap!alongsidetherestofthecontentinedX.

    19

  • Figure12:Snap!canbeembeddedinedXthroughJSInput.

    However,weencounteredseveralproblemswhiledevelopingtheJSInputbasedintegration:

    Developingcodewasveryslow!ChangestocoderequiredmanuallyuploadinganewfiletoedX,whichisamulti-stepprocess.ThelibrariesusedforJSInputswallowednativeJavaScripterrors,makingdebuggingnearlyimpossible.TheedXinterfacehasit'sownmechanismfora"Check"buttonandshowingfeedback.Communicatingthedetailedoutputfromtheautograderdidn'tworkverywell,andweendedupdevelopingmanyworkaroundstogetaseamlessUI.There'snoroomforstoringorretrievingusermetadata.Werelyonfeaturesthatallowstudentstorecallprevioussubmissions.ThroughJSInputtheonlyoptionforthesefeaturesweretousethebrowser'sLocalStorageAPI.ThisAPIhaslimits,likeamaxof5MBofstorage,thatcausedproblemsforsomestudents.Furthermore,withoutadedicateddatabase,everysingletestfilewrittenhadlotsofhard-codedmetadatathatwasrepetitiveandpronetoerror.WhileedXprovidesuserlogsfortheentirecourse,wefounddealingwiththeselogstobeneedlesslycomplex.Theyareslowtoget,andtheautograderresultsaredifficulttoseparatefromtherestofthecoursedata.Assuch,wehaven'tanalyzedtheedXdatatotheextentwe'dliketo.Asimplerloggingsystemdescribedbelowhasbeenimmenselyhelpfulforouranalysis.

    Theonebenefitoftheseproblemswasthatitforcedthedevelopmentoftheautograderintotwocomponents:AJSinterfacetoedX,anda"dumb"client-sidecomponentthatsitsontopofSnap!.Thisdistinctionwashelpfulwhenadaptingtheautogradertoworkwiththenewwebapplication.

    Perhapsmostimportantly:thegradingsystemcouldonlyworkwithedX.CS10usesCanvas[Citationnotfound]asit'sLMS,andmanyhighschoolsusedifferentsystems.Theneedtobuildacustomsolutionforeveryplatformwouldbeprohibitivelyexpensive.Fortunately,theLTIprotocolprovidesadecentsolutionformostofthetaskswe'dliketoaccomplish.

    Attheendoftheday,thedecisiontobuildaninitialversiontiedtoJSInputwasagoodone,asitwasstillprobablyfasterthanbuildingafullwebapplicationatthesametime.

    8.1.2BasicArchitecture

    20

  • isaRubyonRails(commonlyabbreviatedas"RoR")[16]webapplicationbackedbyaPostgreSQLdatabase.Thedatabaseprimarilycontainsasetofquestions,asubmissionslog,andauserstable,aswellassomeadditionalmetadata.ThecurrentversionisdeployedtoHerokuatlambda.cs10.org,butitcouldbedeployedtoanycloudprovider.

    8.1.2.1QuestionsandSubmissions

    Thecorefunctionalityisprimarilysupportedbytwo,fairlysimple,datamodels:QuestionsandSubmissions.

    AQuestionneedsonlythreeattributes:

    title:Ahuman-readableIDforthequestionpoints:Pointsareusedtonormalizescores.(SeetheLTIsectionbelow.)testfile:ThetestfileisaJavaScriptfile(describedbelow)whichincludesthetestcasesaswellasfeedbackpresentedtothestudent.

    Thoughtheyaren'tcurrentlyused,futureupdatesforwillmakeuseofthefollowingproperties:

    content:Currently,itisuptocoursestafftoprovidecontextforthequestionswhicharebeinggraded.Inthefuture,thewilldisplaythiscontentalongsidetheSnap!interface.tags:Questionscancontaintagswhichcanaidinsearching,ortryingtocorrelatestudentperformanceacrossproblems.Thereisalsopotentialforusingtagstorecommendproblemstostudentsasastudytool.

    ASubmissionhasafewkeyproperties:

    testresultsisaJSON-formattedresultfromtheautograder.Itcontainsthepointsgiventoeachtestcaseaswellasthespecificresultsandfeedback.codesubmissionisafullexportoftheSnap!thatstudentswrote.userinfoisasetofdataaboutthesubmittinguserwhichincludeswhatsourcetheycamefrom(seetheaccountssection),andifthey'reapartofacourse.

    Notethatloggingsubmissionsispurelyforpurposesofanalysisandbackup.ByimplementingtheLTIprotocol,theLMSwillcontainallnecessarydataforstudentstoreceivegrades.However,ifwechoosetoadapttoincluderesourcesforstudyingorquestionrecommendation,theinternalsubmissionsdatabasewillbecomemoreimportant.

    ACourseisanobjectwhichmanagestheLTIconnection,andneedsonlytwovalues:

    consumer_keyisauniquekeyforeachcourse.ThishelpstheapplicationseparatebetweendifferentLTIconsumers,primarilyforpurposesofanalysis.consumer_secretisahashoftheconsumer_key.It'sautomaticallygeneratedbytheapplicationwhenaCourseobjectiscreated.

    NotethattheCoursestableisn'tentirelynecessary.TheLTIconnection'skeyandsecretvaluescouldbeentirelystatic(i.e.inanenvironmentconfiguration),butsuchanapproachispronetoerrorsandhassecurityconcernsasanapplicationisconnectedtomultiplesystems.

    8.1.2.2UserIdentities

    Whenbuildingatoolwithgradingdata,itwascriticallyimportantthatwehadaneasyandwaytoidentifystudents,andtominimizetheneedforanadditionallogin.

    8.1.2.2.1LTI

    21

    https://lambda.cs10.org

  • TheIMSGlobalLearningConsortium[18]isastandardsbodycomposedofeducationalinstituions,intrestbodiesandedtechcompanies.IMSpublishesaspecificationcalledLTI[19]whichcanbrieflybedescribedas"OAuthforeducationalapplications".TheLTIauthenticationprocessisactuallybasedontheOAuth[Citationnotfound]protocol,butit'sdesignedtobecompletelyseamlessforstudents.(UnlikeaGoogleorFacebookauthorization,astudentwhoisalreadyauthenticatedinsideaLMSdoesnotneedtospecifically'authorize'anapplicationwhenLTIisused.)

    TheLTIprotocoldefinestwo"categories"ofapplications:aToolConsumer(TC)andaToolProvider(TP).isaprovider,whiletheLMSisatypicalconsumer,inthiscasebCourses(Berkeley'sinstanceofInstructureCanvas).Atypicaluserflowinvolvesastudentvisitinganassignmentpage(insideaLMS)whichcontainseitheraniframeelementoraspeciallink.Currently,implementsversion1.1[Citationnotfound]oftheLTIspecification,thoughsupportforversion2.0[Citationnotfound]isplanned.LTI1.1onlyallowsatoolprovidertoreadandsendgradesforacurrentlyloggedinstudent,somustworkwithinthislimitation.LTI2.0willpotentiallyallowformoredataaboutcoursestobesharedwithstudents(suchashandlingparterassignments),butitsofarnotwellsupportedamongLMSvendors.

    Figure13:Whenastudentclicksonthelink,anewtabwillopenwiththeproperquestiontheyareassigned.Clickingthe"GetFeedback"buttontriggersasubmissionwhichsendsthegradebacktotheLMS.

    22

  • Figure14:AverybasicLTIlaunchsequence.ImagefromtheIMS[Citationnotfound].

    Whenauserclicksthelink(oranembeddediframeisdisplayed),aHTTPPOSTrequestismaketotheproviderwhichincludesapplication-levelconfigurationdata(includingaconsumer_keyandconsumer_secret).ThechallengeisthatthecurrentversionoftheLTIprotocolrequiresthatthislaunch_urlbethesameforeveryassignment.(InthiscasetheURLishttps://lambda.cs10.org/lti/sessions.)AftercompletingtheOAuthhandshake,theTP(ourapplication)checksforthepresenceofadditionalconfigurationinfopassedbytheTC:

    Ifaquestion_idisprovided,thenwillloadthespecificquestion.CurrentlyLTIdoesn'tprovideastandardinterfaceforloadingaspecificresource,soifaninstructorpassesinthisvalue,studentswillautomaticallyberedirectedtotheproperquestion.AgradepassbackURLisoptional.IftheapplicationseesthisURL,itwillpostascorebacktotheTC.IftheURLismissing,thenskipspostingascorebutstillsavesthesubmissiontothelocaldatabase.UserInfo:Iftheinstructionalstaffchooseto,theycanconfiguretheLMStosend"Public"informationto.(Publicheremeans,whatFERPAdefinesaredirectoryinformation.InthecaseofUCBerkeley,thisincludesthestudent'sname,email,anduserID,specificallydifferentfromtheirstudentID.REF?)IftheTCdoesn'tsendpublicinformation,thenitwillsendanobfuscatedhashtoidentifyeachstudent.Again,thisprimarilyonlyaffectstheanalysiscapabilitiesprovidedtoTAs.

    8.1.2.2.2NeedFor(Regular)OAuth

    Unfortunately,despitetheadvantagesofLTI,wefoundthattraditionalOAuthwasstillanecessarycomponent.Theprimarymotivationwastheneedforsiteadmins,andTAswhocanloginwithouthavingtogothroughaLMS.We'reusingGoogleasanOAuthproviderinadditiontoLTI.Inthefuture,wewouldalsoliketoconnecttheSnap!cloudaccountsystemthroughOAuth,butthisiswaitingonenhancementstotheSnap!cloud.

    Thisfeaturecanbeusefulforstudentsaswell,oncewebuildoutstudentdashboards.However,inordertobeeffective,we'llneedawaytoassociateLTIaccountswithOAuthaccounts.Thoughnotfullyimplemented,wewillbeabletoautomaticallyassociateaccountsforstudentsiftheysharethesameemailaddress.ForcampusessetuplikeBerkeley,thiswillthedefaultcaseformoststudentssincetheirGoogleaccountandtheLMSaccountoriginatefromthesamecampussystems.Iftheemailsdon'tautomaticallymatch,weshouldbeabletoprovidethisfunctionalitybyemailingactivationcodes.

    8.1.3SecurityConcerns

    23

  • Finally,weneedtodiscussasignificantconcernaboutsecurity.Currently,theautogradertestfilesareimplementedinplainJavaScript,andareservedalongsidetherestofthepagecontent.Thismeansthere'safairlygapingholeallowingforCross-Site-Scripting(XSS)vulnerabilities.Thecurrentmitigatingfactoristhatonlytrustedaccountswithanadminflagcanuploadtestcases.Thisisanacceptablelimitationfornow,butintheFutureWorkchapterwedescribeapotentialwayaroundthisbyallowingtestcasestobewritteninSnap!,thensecurelycompliedtoJSontheserverside.

    8.2TheAutograderInterface

    TheautogradercomponentsarebuiltinplainJavaScriptandHTML.GreatcarehasbeentakentoavoidmodifyingtheSnap!environmentasmuchaspossible.TheprimaryreasonforthisistoallowtheautogradertobeeasilyupdatedalongwithnewversionsofSnap!.

    24

  • 9ExperimentalSetupIntheSpring2016semester,CS10[9]usedtheautograderaspartoftheroutineforlabcheckoffs.Intotal,theautograderwasusedforthreedifferentlabs,butstudentswereonlyrequiredtotrytheautograderforonelab.

    9.1OralLabCheckoffs

    CS10usesorallabcheckoffsasameansofgrantingcreditforalab.ThewaythistypicallyworksisthatstudentsaregivenoneweekfromtheassignedlabdatetocheckinwithaTAorlabassistant.Studentsarecheckedoffwhentheysuccessfullyanswertwoorthreequestionsaboutthelab,andusuallyareaskedtopresentanexampleofworkingcode.

    It'sworthmentioningthatthesecheckoffsaredesignedmorearoundcompletenessandeffortplacedinthelab,ratherthanabsolutecorrectness.(Though,whetherornottograntcreditisuptotheindividualTAs.)OrallabcheckoffsprovideagoodreasonforTAsandstudentstointeractmoreoften.Overthepastfewsemesters,themodeloflabcheckoffshasbeenrefinedbasedonstudentandTAfeedback,andisdesignedtorequireonlyminimaladditionaleffortforthecoursestafftomanage,andforstudentscomplete.We'recomparingtheautograderagainstafairlystrictstandardinthisscenario,whichwehopeillustratesthepotential.

    9.1.1ChallengesofOralLabCheckoffs

    Manyofthechallengesfacedwithorallabcheckoffsmotivatethedevelopmentofanautograder.Labcheckoffscantakeanywherefrom2-10minutesperstudent,orperpairofstudents.Duringabusylabsection,particularlyclosetodeadlines,thistimecancreatealargebackloginordertogetstudentscheckedoff.Whilethetimediscussingwithindividualstudentsisvaluable,thecreationofaqueuecanwastestudent'stimeiftheydon'tneedadditionalhelp.Furthermore,ifmostofthelabworkcanbedoneathome,thenoralcheckoffscanbeinflexibleforstudentswhohaveahardertimemakingittolab,suchasthosewithdisabilitiesorfamilies.

    Finally,traditional"clipboardstyle"methodsfordealingwithinputtingscoresfororalcheckoffsisafairlycomplexandslowtask,wheregradescangetlost.Togetaroundthis,weputsignificanteffortintoasemi-automatedsystemwhichallowsTAsandlabassistantstoinputgradesquicklyandsecurely.Withoutsuchaninvestment,orallabcheckoffswouldbemuchlesspractical.

    9.1AutograderCheckoffsInSpring2016,therewereatotalof18labs,14ofwhichusedSnap![20].Wehadouronlineautogradersetupfor3ofthelaterlabs:

    Lab# LabTopic Autograder

    Lab11 RecursionPart2 Required

    Question:Completethedefinitionofmergesort

    Lab12 Tic-Tac-Toe Optional

    Question:Completethetttsolverblock.

    Lab14 HigherOrderFunctions Optional

    Question:Completetheis_pandigital?block

    Question:Completetheminvalueof_overallnumbersin_block.

    25

  • Note:optionalmeansthatthestudentscouldreceivecreditforthelabcheckoffthrougheithertheautograderoranorallabcheckoff.Requiredmeansthattheonlytheautograderwasacceptedforcredit.

    9.2ReasoningWechosethissetupforavarietyofreasons,tryingtobalanceanewtechnologywiththeneedtotestitout.

    Onelabwasrequiredsothateverystudentwouldtrytheautograderatleastonceandcouldprovidefeedbackabouttheirexperiences.Labs12and14wereoptionalbecausewedidn'twanttopenalizestudentsiftheyfelttheautograderwasmoreconfusingorstressful,butwewantedtogiveasmanypeopleachancetouseitaspossible.Finally,thishybridmodelforcheckoffswilllikelybethebasisforfuturesemestersofCS10,asmorequestionsandfeedbackarewritten.

    9.3ChallengesofAutograderCheckoffs

    Naturally,eventhemostadvancedautograderwillbenomatchforaTA...yet.Wefullyrecognize(asdothestudents)thatanautogradercannotreplacethedetailed,specializedandmorearticulateoffeedbackofaTA.Furthermore,comparedtothecurrentwebpageforlabcheckoffquestions[20],eventheeasiest-to-useautograderwillbesignificantlymoreworkthanpostingafewsentencestoawebpage.

    However,whendeployedintheclassroomwitha"hybrid"modelofbothlabandoralcheckoffswethinkwecanalleviatethein-classpressuresofcheckoffsbutstillgivestudentscreditfortheworktheyaredoing.Bydesigningamodelthatgivesstudentsmorechoice,TAswillhopefullybeabletogivemoretimetothestudentsthatneeditmost.Thisstillleavestheproblemthatanautograderisstillmoreworkthanthestatusquo.Wethinkthatthiswillbecomelessofaproblemovertimeasadatabaseofquestionsisbuiltupandiftheanalyticalcapabilitiesprovetobeusefultoteachingstaff.

    26

  • 10ResultsOverthecourseofthreelabassignments,wefoundthatstudentsusedtheautogradertocompletelabsoutsideofcoursetime.Whilewehavenobasisforwhen(orhowoften)studentscompletedlabsathomewithouttheautograder,thisshowsthatstudentsareatleastreceivingthebenefitsoffeedbackwhenaTAisnotpresent.

    Secondly,whenwesurveyedstudents,theywerealmostevenlysplitbetweenpreferringorallabcheckoffsortheautograder.Giventhelabsetup,andthedifferingadvantagesforeachenvironment,thisiswhatwewouldhopefor.

    Note,unlikeothereducationalstudies,wearenottryingtoclaimanylearninggainsfromthissystem.

    10.1Usage&StatisticsThefirstpartofouranalysiswillbetolookathowoftenandwhenstudentsweresubmittingtheirlabs.ThisdatawasobtainedfromaggregateinformationintheSubmissionsdatabaseaswellasfrombCoursestocomparetonon-autogradedlabs.

    10.1.1BasicStatistics

    TotalNumberofstudentsinCS10:169TotalNumberofstudentusingtheautograder:145(86%)

    Lab# OralCheckoffs AutograderCheckoffs

    10 145 N/A

    11 34 133

    12 57 80

    13 132 N/A

    14 97 64

    15 142 N/A

    Notethisdatahassomeanomaliesduetobugintheinitialsetup,andconfusionamongcoursestaff:

    Lab11hadabugpostingscoresthroughLTIforthefirstday.SomeTA'smistakenlysubmittedscorestwice.Thereissomeoverlapofthose34students.Lab14hadabuginoneexercisecausingmanystudentstogetcheckedoffbybothmethods.

    Wecanlookatthenumberoftimesthatstudentstriedtheautograder:

    27

  • QuestionsAttempted

    Figure15:Numberofstudentsbynumberofquestionsattempted

    Whatthisshowsusisthat,thoughstudentswereonlyrequiredtocompleteonelabusingtheautograder,nearly2/3(93/149)foundtheautogradercompellingenoughtotryasecondtime.Fromtalkingtostudentsandstaff,aportionofthedropoffinstudentsusingtheautogradermaybeduetothefactthattheautograderdoesn'thandlepairsofstudents.Giventhatthelaterlabsaresomeofthemoredifficult,manystudentsmaychoosetoworkinpairs.

    10.1.2SubmissionTimes

    Thesecondthingtolookatiswhenstudentsaresubmittingtheirwork.Whilewecertainlystillwantstudentstoattendlab,improvingthe"athome"experienceforstudentswouldbeasignificantbenefit.Here,weseethatstudentsarechoosingtousetheautograderathome,withusagepatternsthatyouwouldexpectfromundergraduates.

    Thesechartsshowtheoverallsubmissiontimesforeachofthethreelabs.Theredbarindicatesthedatethatthelabwasdueforfullcredit.

    Date

    Figure16:Lab11:submissiontimesbyday.

    Date

    28

  • Figure17:Lab12:submissiontimesbyday.

    Date

    Figure18:Lab14:submissiontimesbyday.

    Thesegraphsdon'trevealanythingtoosurprising.Forthemostpart,studentsareusingtheautogradertocompletelabsatthesamepacetheynormallywould.

    Wecanlookatthedaysoftheweekaswellasthetimeofdaytoseewhenstudentsareworkingonlabs.Normallabtimesareusuallybetween9:00-19:00,thoughthisvariesbythedayoftheweek.

    HourofDay(24-hours)

    Figure19:Numberofautogradersubmissionsbyhouroftheday.

    Dayofweek

    Figure20:Numberofautogradersubmissionsbydayoftheweek.

    Again,thesegraphsarefairlyclosetowhatcoursestaffwouldexpect.Moststudentsarecontinuingtocompletelabsduringtheirscheduledtime,butasignificantnumberareworkingathome.

    29

  • 10.1.3StaffTimeSavings

    Whiletimesavingsarenotaprimarymotivationforthiswork,weshouldconsiderthepotentialbenefitsthatcouldbesavedbeallowingstudentstogetcheckedoffusingtheautograder.Anecdotally,orallabcheckoffstakebetween2to10minutestocomplete,withtheaveragetimesomewherearound3to4minutes.Thecurrentusagepatterns(aswellastheresultsinthenextsection),suggestthatwecanexpect33%-50%ofstudentstousetheautograder.

    Ifwehave15labswhichuseSnap!,and150studentscompletinglabs,andassumethatalabcheckofftakes3minutes:33%ofstudentstheautograderwouldfreeupapproximately38hoursofTAtime.(Thisisslightlyunder30%ofthetotalworkloadforasingle8hourperweekTAappointmentatBerkeley.)

    However,ifwehave15labswhichuseSnap!,and300studentscompletinglabs,andassumethatalabcheckofftakes4minutes:50%ofstudentstheautograderwouldfreeupapproximately150hoursofTAtime.(Thisis110%ofthetotalworkloadforasingle8hourperweekTAappointmentatBerkeley.)

    Naturally,thismightleadonetoquestionthevalueoforallabcheckoffs.That'snotthegoalhereatall.Orallabcheckoffshavebeenanincrediblepositivepedagogicaltool.Whilewritingtestscantakesometime,thehopeisthatthecostwouldaromatizeitselfwellovermanysemesters.

    10.1.4SubmissionPatterns

    Wecanalsolookathowoftenstudentsattempteachquestion.Thisshowswhetherstudentsareusingtheautogradermoreasafeedbacktool,asacrutchorassimplyacreditmechanism.Whilethere'snoclearexactnumberoftimesastudentshouldusesubmittheirwork,it'sclearthatwewantanoverall'happy-medium'.

    TimesSubmitted

    Figure21:Moststudentsappeartoonlysubmitonceattheendoftheirwork.

    Thedatashowthatmoststudentsappeartobesubmittingonlyonce,meaningthey'renotcurrentlygettingmuchbenefitbythefeedbackpresented.Ifthereweremorefeedbackpresented,orpotentiallymorechallengingquestionsthismightchange.Thoughnotyetimplemented,non-gradedfeedbacksuchascodequalitysuggestionsmightchangethewaystudentsworktousetheautograder.

    30

  • TimesSubmitted

    Figure22:Numberofstudentsbynumberoftimessubmittedforeachlab.

    Theseresultsalsohowareallylongtailforthenumberofsubmissionsbysomestudents.Thisistobeexpectedwithanyautograder.However,fromlookingatthedata,andfromTAreportsmanyofthesehighsubmissionnumbersmaybeduetothepreviouslymentionedbugs.(Studentstriedsubmittingmanytimessimplyhopingthattheerrorswoulddisappear.)

    However,thedifferenceinattemptsforlab14isprettyclear.Whilemoststudentsstillonlysubmittedafewnumberoftimes,thereisamuchwiderdiversityinthenumberofquestions.Thenumberofblocksgradedforlab14was4comparedtothe1or2forlabs11and12.

    10.2SurveyFeedback&Analysis

    Alongwiththedatawecollected,wesurveyedstudentsattwopointstogetfeedbackabouttheiruseoftheautograder.ThefirstsurveyoccurredduringtheCS10midterm,shortlyafterstudentsshouldhavecompletedlabs11and12.Thesecondsurveywasgivenalongwiththefinalexam,attheendofthesemester.

    10.2.1MidtermFeedback

    Thefirstquestionweaskedstudentswaswhethertheypreferredonlinelabcheckoffsororallabcheckoffs.

    31

  • Figure23:Studentsarefairlyevenlysplitbetweenpreferringonlinevsorallabcheckoffs.

    WhendiscussingwithTAsthisisalmosttheexactoppositeoftheresultstheyinitiallyexpected.TAs(andinstructors,includingfromothercourses)expectedalmostallstudentstopreferusingtheautogradertogettingcheckedofforally.(Ratiossomestaffsuggestedwere75/25,80/20,90/10infavorofanautograder.Whilenoonehadexactlythesameprediction,everyonewespoketoassumed>=70%studentswouldpreferanautograder.)Thisvalidatesthatwehavedoneaprettygoodjobindesigningandtweakingtheorallabcheckoffsystem(whichhasevolvedoverthepastfewyear),butalsothatstudentsliketalkingtoTAsandthatstudentsarenotsimplytryingtobeatthesystem.Instructionalstaffhadconcernsaboutstudentspreferences,becausemanyassumedthattheautograderwouldbethepathofleastresistanceforstudents.It'sworthnoting,howeverthatpartofthebiastowardsin-personoveronlinecheckoffscouldverylikelybeattributedtobugsinthesystemearlyon,butthisisstillapromisingresult.(Manyopen-endedfeedbackcommentsmentionedbugsorglitchesasareasonforthefeelingsabouttheautograder.)

    Hereisasamplingofcommentsstudentsgave.Eachofthesecommentsisrepresentativeoffeelingsofotherstudents.

    "Itwouldsavealotoftimeduringcheckoff."

    "Yougettotalktoanactualperson!Ifanyquestionsremain,theycaneasilybeanswered!"

    "IfIdon'tfinishinclass,Iammoreincentivizedtofinishitathomeinsteadofatnextlab."

    Itworkedforme,butIguessdoingmanuallabcheck-offwasbetter,inaway,becauseyougetfeedbackfromtheTA.

    Aheads-upwouldhavebeennice,aswellassomethingexplaininghowtouseit,butIlikethisfeaturealot.Whenitworksproperly,itisveryusefulandhelpfulformylearning.

    Idon'ttrustit.

    Theautogradesometimeshasbugs.Forexample,ifmycodeiswrongbutsometimesstillreportedthecorrectvalue,itwouldmarkmeasapass.

    Manystudentspreferredthehigherfidelityofinpersonquestionaskingandanswering.Whatsinterestingisthatautogradedcheckoffswouldntpreventstudentsfromgettingtheirquestionsanswered.Manystudentsremarkedaboutdifferentlevelsofstresswhichcomefromautomatedsystemsorhumans.Whilestudentsdidntelaborateonwhytheywouldbemore(orless)stressedwithanautogradersystem,themostprobableansweristhatsuchaviewisreallyamatterofpersonalpreference.Manystudentsdoenjoythat(human)TAsaremuchmorerelaxedandcomfortabletodiscussquestionsandgenerallyforgivingoferrors.Incontrast,theautograder(currently)canonlyprovidestaticresponses,andisverybrittlewhendeterminingcorrectness.However,somestudentsalsonotedthattheydprefernottotalktoaTA"unlessabsolutelynecessary"andthattheyfoundtheautomatedformateasiertodealwith.Wethinkthisisyetanotherdemonstrationthataone-size-fits-allmodelisn'tusuallythebestapproach.

    32

  • Whenweaskedforgeneralfeedbackaboutthetool,mostofthecommentswerethatitwasconfusingtouse.Thisisunderstandable,becauseinretrospecttherewasnotenoughdocumentationnorTAtrailingbeforetheautograderwaslaunchedinclass.Fortunately,thisisafairlyeasyproblemtoaddressforfuturesemesters.

    10.2.2FinalSurveyFeedback

    Onepromisingresult,isthatweaskedstudentsforfeedbackontheoveralluseoflabcheckoffs.Whilethereis(notsurprisingly)alargecontingentofstudentswhodislikethelabcheckoffs,mostofthestudentsinsupportofcheckoffsspecificallyaskedformoreautograderquestions.

    Whenaskedforfeedbackspecifictotheautograder,studentsseemedmorepositivethantheydidonthemidtermsurvey.Ofthestudentswhohadcomplaints,"bugs"and"glitches"werethebiggestreasons.However,aswiththemidtermsurvey,thereisalargenumberofstudentswhoareverystronglyinfavoroftheopportunitytotalktoTAs.

    Thefinalquestionsthatweaskedmostlybackupthedatathatwascollected.

    Figure24:Moststudentscompletedcheckoffsalone,andfoundthefeedbackeasytointerpret.

    Figure25:Moststudentsreportedcompletingworkbeforeusingtheautograder

    Sofartheseresultsconfirmthedatathat'sbeencollected.Inthefuture,thissuggestsTAscanintroducetheautogradertostudentsaswellaspresentthemotivationsforhowstudentsshouldthinkaboutusingitasastudytool.

    33

  • 34

  • 11FutureWorkThusfar,wehaveshownthecurrentsystemhasenoughcapabilitytobeusedintheclassroom.However,weseeanumberofareaswherecouldbeexpanded.Additionally,someoftheanalysishasrevealedtheneedtomorecloselyexploreatsomespecificresearchquestions.

    11.1Server-Side

    Theserver-sideimprovementstoarenumerous,butwewantedtohighlightafewthatwethinkcouldprovidethebiggestbenefittostudentsandinstructors.

    QuestionTagsandrecommendations.'sdatamodelalreadyincludesquestion-leveltags,buttheyaren'tcurrentlyusedforanything.Wethinkitwouldbeimmenselyhelpfulasastudytoolifstudentscouldberecommendedproblemsbasedonrelatedtagsforquestionstheyhavestruggledwith.

    Better(User-Friendly)DashboardsWritingSQLqueriestooperatrethecurrentdashboardsisnotaviablesolutionforthefuture.Whenaloggedinasanadministratorthereshouldbelinksforeachitem(likeacourseoraquestion)forinstructorstoviewbasicstatistics.Supportdrillingdowntoindividualtestcaseswhenanalyzingquestions.Awellwrittenautogradertestcoversmanytypesofmistakes.TAsshouldgetfeedbackaboutwhichparticulartestsstudentsarestrugglingwiththemostsothattheycanaddressthoseconceptsinclass.

    Integrateddisplayofquestioninstructions.Thecurrentversionassumesthatinstructorswillgivestudentsdirectionsinsomeplaceotherthantheautograder.WhilethisworksOK,itwouldbemuchnicerifstudentscouldchosetohideorshowinstructionsondemand.ThisisactuallyafeatureregressionfromedX,wheretheautograderwasalwaysembeddedalongsidethecoursecontent.

    11.2Autograding

    Asidefromcontinualbugfixes,therearefourkeyareasforimprovingtheautogradingcapabilities.

    1. Clearer,moreexplicitfeedbackThecurrentoutputprovidedbytheautograderisfocusedmostlyontheresultsofindividualtestcases,bycomparingtheexpectedvsreturnedoutputsofblocks.Whiletestauthorsareabletoworkindividuallywrittenfeedbackintothestestcases,theoverallpresentationisn'tveryclear.Furthermore,becausethere'snotmuchroomonthescreenforgoodfeedback,authorsdon'thaveanincentivetowriteasmuch.

    2. Snap!TestAPIimprovementsCurrently,itcanbefairlycumbersometowritetestsinJavaScript.Thereisalotofboilerplatecode,andcleanlypresentingimagesofblocks(insteadofjustatextversion)requiresalotofeffortfortestauthors.Oneinitialsolutiontothisproblemisthemigratethecurrentobject-orientedAPItoaversionwhichisbasedmorearoundconfigurationthancode.Whileitcouldn'tbe100%configurationbecauseauthorswillneedtowritecustomtestfunctionsinsomecases,aformatwhichoperatesmorelikeaJSONfilewouldbeeasiertomanage.Snap!testsshouldbewritableinSnap!.ByusingaSnap!featurecalled"codification"weareabletocompileSnap!codetoJavaScript(oranyotherlanguage).HereisanexampleofaprototypeSnap!libraryforwritingtests.BycompilingSnap!to

    35

  • JavaScriptwecanalsoimprovethesecurityofteststhatarewritten,becausethesetoffunctionsavailableinSnap!iseasiertorestrict(andcatchviolations)thanJavaScript.

    3. StaticanalysiscapabilitiesWe'veincludedsomeverybasicstaticanalysiscapabilitiesinthecurrentversion.However,they'retrickytouseandrequirealotofcustomcode.Throughstaticanalysisweshouldbeabletogivestudentstargetedfeedbackabouthowtoimprovetheircode.Sometoolsthatwouldbeusefularepartternmatchingfunctionsforcodestructure(likefindingallif(condition){returncondition}typestructures).

    4. Image-basedautogradingWe'veexploredimagebasedautogradingforproblemslikefractalsandotherturtlegraphicsexercises.ImageswouldbeanOKwaytojudgecorrectness,buttheyprovideverylimitedfeedbacktostudentsabouthowtoimprove.Somepathsforimprovingthefeedbackfromimage-basedautogradingwouldbetotrytoaccountforspecificerrors.Forexample,youcouldrotatetwoimagesuntilyoufindtheminimumdifference.Ifarotationmakestwoimagesmatch,thenyoucouldprovidefeedbackaboutwheresuchanerrormightoccur.OthermoreadvancedalgorithmssuchasRANSACcouldbeexplored.

    11.3ResearchDirections

    Finally,nowthatwe'veshownstudentsareinterestedinusinganautograder,weshouldcarryoutmoretargetedresearch.

    Whattypesofinterventionshelpstudentslearnorcompletelabs?Insteadofsimplypresentingfeedbackwhenstudentsfailatest,weshouldhavetargetedinterventionsthataskquestionsortryadifferentmethodforsolvingtheproblem.

    Canweautomaticallygeneratebetterhintstogivetostudents?Thisisahighlyactiveareaofresearch,andnowthatiscollectionstudentdatawecouldbeabletousethistoimprovethefeedbackforfuturecourses.

    HowdoestheautograderaffectmotivationinearlyCScourses?Somestudentsareunderstandablyconcernedabouttheaffectsautograderhaveontheirmotivationtocompleteatask.Whilethereissomeexistingworkonthistopic,webelievethere'sroomtoexplorethisinthecontextofvisualprogramminglanguages.

    36

  • 12ConclusionWepresent,asystemforautogradingSnap!programs.Thetoolcontainsbothanovelfront-endstudent-centricautograderthatpresentsimmediatefeedbacktostudents,aswellasabackenddatabasethatcanserveasquestionrepository.AftersuccessfullybeingusedonedX,we'veshownthatthetoolscanbeadoptedinatraditionalclassroom.ToaccomplishweimplementedtheLTIprotocol,sothatmaybeusedwithasmanyLMSsarepossiblewiththegoalofreachingmorestudents.Additionally,we'veseenthebenefitsofhavingcompletecontrolofyourinfrastructure.

    Ourinitialsurveyshaveshownthatwhilemanystudentsenjoyusinganautograder,wemustbecarefulnottoreplaceinstructionalstaffwithtechnology.Thestressthatmanystudentsreportedfromtheautogradersuggestthatthereisalotmorehumanfactorsresearchthatcanbedoneaboutapplyingnewtechnologiestotheclassroom.Still,weareconfidentthatsmartclassroompolicieswhichgivestudentschoicewillcontinuetoallowforthebenefitsofautograderswithoutpenalizingstudentswhoareuncomfortablewiththem.Thisdoesleaveconcernforonline-onlylearningenvironments,butweseeevenaprimitiveautograderasabigstepabovenothing.Futureinterfaceenhancementsandnewfeaturesforstudyingwillhopefullyonlyaddtothebenefitsstudentsseewhileusingtheautograder.

    Despitesomesuccesses,weknowthatthereisalongwaytogo.Wearestillonlyabletogradeasubsetofallpossibleproblemswecouldassigntostudents.Therearemanyactiveresearchareasintechniqueslikeautomatichintgenerationwhichwethinkwillproveusefulinthefuture.However,wealsorecognizethatdespitethepromise,weneedtospendtolowerthebarriertoentryforwritingnewcontent,bothintermsofsavingtimeandfollowingpedagogicalbestpractices.Fortunately,wethinkthereisaclearpathforwardinthisarea.

    Aboveall,wehaveshownthatitispossibletobuildanautograderforavisualprogrammingenvironment.Thoughthesetoolsdevelopedoutofaneedtoscaleteachingcomputerscience,webelievethathavethepotentialtogreatlyenhancetraditionalclassrooms.Asanareaofactiveresearchweareexcitedtoseethedirectionsthatautogradingvisualprogramminglanguagesmaytake.

    37

  • 13References

    1. UndergradComputerScienceEnrollmentsRiseforFifthStraightYearCRATaulbeeReport-GovAffairs, http://cra.org/govaffairs/blog/2013/03/taulbeereport/

    2. Cuny,JaniceandNSF,nsf09534BroadeningParticipationinComputing(BPC)|NSF-NationalScienceFoundation, 2009.https://www.nsf.gov/publications/pub_summ.jsp?WT.z_pims_id=13510&ods_key=nsf09534

    3. ExploringComputerScience,http://www.exploringcs.org/

    4.Board,TheCollegeandDiez,Lien,APComputerSciencePrinciples-ANewAPCourse-AdvancesinAP-TheCollegeBoard|AdvancesinAP,https://advancesinap.collegeboard.org/stem/computer-science-principles

    5. Harvey,BrianandGarcia,DanielandDevelopment,CorporationEducational,BJC-BeautyandJoyofComputing, http://bjc.berkeley.edu/

    6. AboutUs|edX,https://www.edx.org/about-us

    7. Harvey,BrianandMnig,Jens,Snap!(BuildYourOwnBlocks)4.0,2016.http://snap.berkeley.edu/

    8. Scratch-Imagine,Program,Share,https://scratch.mit.edu/

    9. UCBerkeley},CS10|Spring2016,http://cs10.org/sp16/

    10. {Code.orgCodeStudio,https://studio.code.org/

    11. Fraser,Niel,Blockly|GoogleDevelopers,https://developers.google.com/blockly/

    12. Code.org,code-dot-orgGithub,2016.https://github.com/code-dot-org/code-dot-org/tree/staging/apps

    13. Johnson,DavidE.,ITCH,Proceedingsofthe47thACMTechnicalSymposiumonComputingScienceEducation-SIGCSE'16,ACMPress,2016.http://dl.acm.org/citation.cfm?id=2839509.2844600

    14. Soloway,ElliotandGuzdial,MarkandHay,KennethE.,Learner-centereddesign:thechallengeforHCIinthe21st century,ACM,1994.http://dl.acm.org/citation.cfm?id=174809.174813

    15. ExploringaStructuredDefinitionforLearner-CenteredDesign, http://www.umich.edu/~icls/proceedings/pdf/Quintana2.pdf

    16. GettingStartedwithRailsRubyonRailsGuides,http://guides.rubyonrails.org/getting_started.html#what-is-rails-questionmark

    17. 4.3.CustomJavaScriptApplicationsOpenedXDeveloper'sGuidedocumentation,http://edx.readthedocs.io/projects/edx-developer-guide/en/latest/extending_platform/javascript.html

    18. IMSGlobalLearningConsortium,https://www.imsglobal.org/

    19. LearningToolsInteroperabilityv1.1ImplementationGuide|IMSGlobalLearningConsortium, https://www.imsglobal.org/specs/ltiv1p1/implementation-guide

    20. Huang,RachelandKuphaldt,Adam,LabCheck-OffQuestions,2016.http://cs10.org/sp16/labquestions/

    38

    http://cra.org/govaffairs/blog/2013/03/taulbeereport/https://www.nsf.gov/publications/pub{\_}summ.jsp?WT.z{\_}pims{\_}id=13510{\&}ods{\_}key=nsf09534http://www.exploringcs.org/https://advancesinap.collegeboard.org/stem/computer-science-principleshttp://bjc.berkeley.edu/https://www.edx.org/about-ushttp://snap.berkeley.edu/https://scratch.mit.edu/http://cs10.org/sp16/https://studio.code.org/https://developers.google.com/blockly/https://github.com/code-dot-org/code-dot-org/tree/staging/appshttp://dl.acm.org/citation.cfm?id=2839509.2844600http://dl.acm.org/citation.cfm?id=174809.174813http://www.umich.edu/{~}icls/proceedings/pdf/Quintana2.pdfhttp://guides.rubyonrails.org/getting{\_}started.html{\#}what-is-rails-questionmarkhttp://edx.readthedocs.io/projects/edx-developer-guide/en/latest/extending{\_}platform/javascript.htmlhttps://www.imsglobal.org/https://www.imsglobal.org/specs/ltiv1p1/implementation-guidehttp://cs10.org/sp16/labquestions/

  • 39

  • Appendix A:ObtainingtheSourceCode

    ThecodeforallprojectsinthisreportisavailableinvariousrepositoriesonGithub.

    cycomachead/lambdacontainsthemainRailswebapplication.cycomachead/lambda-evaluatorcontainstheJavaScriptsourcethattalkstoSnap!.jmoenig/Snap--Build-Your-Own-BlockscontainsthesourceforSnap!.cycomachead/thesiscontainsthesourceforthiswork.TheCS10organizationcontainsinfoabouttheCS10course.Thebeautjoyandbjc-edcorganizationscontainsthecurriculumusedinedXandCS10.

    ThegraphsinthisreportweregeneratedprimarilyusingtheblazergemforRails,andthisreportwasproducedusingtheGitBookebooktool.Youcanreaditonline.

    40

    https://github.com/cycomachead/lambdahttps://github.com/cycomachead/lambda-evaluatorhttps://github.com/jmoenig/Snap--Build-Your-Own-Blockshttps://github.com/cycomachead/thesishttps://github.com/cs10https://github.com/beautjoyhttps://github.com/bjc-edchttps://github.com/ankane/blazerhttps://gitbook.comhttps://cycomachead.gitbook.com/thesis

    AbstractAcknowledgementsList of Figures