Digibury: Neil Brown - Observing programming novices on a large scale
DESCRIPTION
Neil Brown, Research Associate at the University of Kent, presented this talk at Digibury on November 13, 2013. In it he explored how people learn to program, what they find difficult and what problems slow them down.
TRANSCRIPT
Observing programming novices
Neil Brown, University of Kent, @twistedsq
Digibury, 13 Nov 2013
How do people learn to program? (and how can we help them?)
How can we find this out at a large scale?
What We Make
BlueJ: 2.5 million users annually
Greenfoot: 0.4 million users annually
What Are They All Doing?
Some Small-Scale Studies
An Exploration of Novice Compilation Behaviour in BlueJ, Matt Jadud, 2007
“Many students write significant amounts of code (10+ lines) at a time, and then attempt to eliminate all the syntactic errors that exist in the code”
Study Size: 62 students
BIG DATA (amended on the slide to "MEDIUM")
Add recording to all BlueJ instances
(With explicit opt-in)
How Much Data?
20,000 users per day ✓
≈ 25% opt-in? ✗ 40%
≈ 100KB data per user per day ✓
≈ 0.5GB per day ✗ ≈ 1 GB
≈ 200GB per year ✗ 300-400 GB
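The arithmetic behind these figures can be checked with a short sketch. It assumes decimal units (1 GB = 1,000,000 KB) and a 365-day year, neither of which the slide states, and the class and method names are illustrative only.

```java
public class DataVolume {
    // Assumption: decimal units, 1 GB = 1,000,000 KB.
    static final double KB_PER_GB = 1_000_000.0;

    /** Daily volume in GB for a given user count, opt-in rate and KB per opted-in user. */
    static double dailyGb(int usersPerDay, double optInRate, double kbPerUser) {
        return usersPerDay * optInRate * kbPerUser / KB_PER_GB;
    }

    /** Yearly volume in GB, assuming the daily volume holds for 365 days. */
    static double yearlyGb(double dailyGb) {
        return dailyGb * 365;
    }

    public static void main(String[] args) {
        double estimated = dailyGb(20_000, 0.25, 100); // the 25% opt-in guess
        double observed = dailyGb(20_000, 0.40, 100);  // the 40% opt-in marked on the slide
        System.out.printf("estimated: %.2f GB/day, %.1f GB/year%n", estimated, yearlyGb(estimated));
        System.out.printf("observed:  %.2f GB/day, %.1f GB/year%n", observed, yearlyGb(observed));
    }
}
```

At a 25% opt-in this works out to 0.5 GB per day and about 180 GB per year, matching the rounded estimates. At 40% it gives 0.8 GB per day and about 290 GB per year, close to the ≈ 1 GB and 300-400 GB marked as the actual values.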
Headline statistics so far (5 months in)
140,000 opted-in users
600,000 projects
5,100,000 successful compilations
4,700,000 unsuccessful compilations
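As a quick worked check of what the two compilation counts imply, the sketch below computes the share of recorded compilations that failed. The class and method names are illustrative, not part of the project.

```java
public class CompileStats {
    /** Fraction of all recorded compilations that were unsuccessful. */
    static double failureRate(long successful, long unsuccessful) {
        return (double) unsuccessful / (successful + unsuccessful);
    }

    public static void main(String[] args) {
        // Counts taken from the headline statistics above.
        double rate = failureRate(5_100_000L, 4_700_000L);
        System.out.printf("failed compilations: %.1f%%%n", rate * 100);
    }
}
```

With these counts, a little under half of all compilations (about 48%) ended in an error.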
Hardware Specs
2 machines (1 for recording, 1 for analysis)
24 core 2.5GHz Xeon, 32GB RAM, 5TB RAID
Most common compile errors
Unknown variable 17%
Semi-colon expected 10%
Unknown method 7%
Bracket expected 7%
Unknown class 5%
Illegal start of expression 4%
Do they change during the term?
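Percentages like these could be derived by tallying one category label per failed compilation. The sketch below assumes the errors have already been reduced to category strings. That input format, the class name and the sample data are all hypothetical, not the project's actual pipeline.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ErrorTally {
    /** Counts each error category and converts the counts to percentage shares. */
    static Map<String, Double> shares(List<String> categories) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String category : categories) {
            counts.merge(category, 1, Integer::sum);
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Integer> entry : counts.entrySet()) {
            result.put(entry.getKey(), 100.0 * entry.getValue() / categories.size());
        }
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample, not data from the study.
        List<String> sample = List.of(
            "Unknown variable", "Semi-colon expected", "Unknown variable", "Unknown method");
        shares(sample).forEach((category, share) ->
            System.out.printf("%s %.0f%%%n", category, share));
    }
}
```

Running the same tally separately for each week of term would be one way to approach the question of whether the mix of errors changes over time.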
Compile errors over time
Rarer compile errors
65th most common compilation error:
非法的表达式开始 (Chinese for "illegal start of expression")
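Compiler messages are recorded in whatever language the user's Java installation produces, so one error can appear under several texts. One possible way to fold them together is a lookup from localized text to a canonical category, sketched below. The table contents and the names `CANONICAL` and `canonicalise` are hypothetical, and nothing in the talk says the project handles it this way.

```java
import java.util.Map;

public class LocalisedErrors {
    // Hypothetical lookup: localized compiler message -> canonical English category.
    // The Unicode escapes spell the Chinese message quoted on the slide.
    static final Map<String, String> CANONICAL = Map.of(
        "\u975e\u6cd5\u7684\u8868\u8fbe\u5f0f\u5f00\u59cb", "illegal start of expression",
        "illegal start of expression", "illegal start of expression");

    /** Returns the canonical category, or the trimmed message itself if it is not in the table. */
    static String canonicalise(String message) {
        String trimmed = message.trim();
        return CANONICAL.getOrDefault(trimmed, trimmed);
    }

    public static void main(String[] args) {
        System.out.println(canonicalise("\u975e\u6cd5\u7684\u8868\u8fbe\u5f0f\u5f00\u59cb"));
    }
}
```

Messages missing from the table pass through unchanged, so unmapped texts still show up in a tally rather than being silently dropped.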
Problematic if statements
What does this code do?
if (x >= 6 && x <= 9)
{
    x = 0;
}
Problematic if statements
What does this code do?
if (x*x >= 36 && x*x <= 81);
{
    x = 0;
}
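The second version differs from the first in more than the squared terms. The semicolon after the condition is an empty statement that forms the entire body of the if, so the braces that follow are an ordinary block that always runs. A minimal demonstration, with illustrative class and method names:

```java
public class StraySemicolon {
    /** The intended logic: x is reset only when x*x lies between 36 and 81. */
    static int withoutSemicolon(int x) {
        if (x * x >= 36 && x * x <= 81) {
            x = 0;
        }
        return x;
    }

    /** The mistaken version from the slide: the if guards only an empty statement. */
    static int withSemicolon(int x) {
        if (x * x >= 36 && x * x <= 81);
        {
            x = 0; // plain block, runs regardless of the condition
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(withoutSemicolon(2)); // prints 2
        System.out.println(withSemicolon(2));    // prints 0
    }
}
```

The code compiles cleanly, so the mistake produces no compile error and only shows up as wrong behaviour at run time.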
Problematic if statements
How prevalent is this mistake?
How long does it take before people fix it?
Appeared in 0.15% of source files
Later fixed in half of them...
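Searching source files for this mistake could be approximated with a regular expression. The pattern below is a deliberately naive assumption, not the analysis used in the study: it ignores comments, string literals and conditions that themselves contain braces or semicolons.

```java
import java.util.regex.Pattern;

public class IfSemicolonFinder {
    // Naive pattern: "if (...)" followed by ";" and then an opening brace.
    private static final Pattern STRAY =
        Pattern.compile("\\bif\\s*\\([^;{}]*\\)\\s*;\\s*\\{");

    /** Returns true if the source text appears to contain "if (...);" directly before a block. */
    static boolean hasStraySemicolon(String source) {
        return STRAY.matcher(source).find();
    }

    public static void main(String[] args) {
        System.out.println(hasStraySemicolon("if (x*x >= 36 && x*x <= 81);{ x = 0;}")); // prints true
        System.out.println(hasStraySemicolon("if (x >= 6 && x <= 9){ x = 0;}"));        // prints false
    }
}
```

Applying such a check to successive snapshots of the same file is one way the "later fixed" question could be approached, since a fix shows up as the first snapshot where the pattern no longer matches.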
Challenges
A lot of data -- and a lot of method questions, e.g.
- How do you measure error difficulty?
- What is a frequent error? (what is worth caring about?)
- How much can you get from this kind of data-set?
Scaling the analysis (already maxing out 24 cores)
Questions?