Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Observing programming novices Neil Brown University of Kent @twistedsq Digibury, 13 Nov 2013

Upload: lizzie-hodgson

Post on 14-Jan-2015

DESCRIPTION

Neil Brown, Research Associate at the University of Kent, presented this talk at Digibury on November 13, 2013. In it he explored how people learn to program, what they find difficult, and what problems slow them down.

TRANSCRIPT

Page 1: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Observing programming novices

Neil Brown, University of Kent, @twistedsq

Digibury, 13 Nov 2013

Page 2: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Neil Brown, University of Kent, @twistedsq

How do people learn to program? (And how can we help them?)

How can we find this out at a large scale?

Page 3: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

What We Make

BlueJ Greenfoot

Page 6: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

What We Make

BlueJ: 2.5 million users annually

Greenfoot: 0.4 million users annually

Page 7: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

What Are They All Doing?

Page 8: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Some Small-Scale Studies

An Exploration of Novice Compilation Behaviour in BlueJ, Matt Jadud, 2007

“Many students write significant amounts of code (10+ lines) at a time, and then attempt to eliminate all the syntactic errors that exist in the code”

Page 9: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Study Size: 62 students

Page 10: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

BIG DATA

Add recording to all BlueJ instances

(With explicit opt-in)

Page 11: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

BIG DATA

Add recording to all BlueJ instances

(With explicit opt-in)

MEDIUM

Page 12: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

How Much Data?

20,000 users per day
≈ 25% opt-in?
≈ 100KB data per user per day
≈ 0.5GB per day
≈ 200GB per year

Page 13: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

How Much Data?

20,000 users per day ✓
≈ 25% opt-in? ✗ 40%
≈ 100KB data per user per day ✓
≈ 0.5GB per day ✗ ≈ 1 GB
≈ 200GB per year ✗ 300-400 GB
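The arithmetic behind these figures can be checked with a short back-of-envelope calculation. The sketch below is only an illustration: the class and method names are made up for this example, and it assumes decimal units (1 GB = 1,000,000 KB) and a 365-day year, neither of which is stated on the slide.

```java
// Back-of-envelope check of the data-volume figures on the slide.
// Assumptions (not from the talk): 1 GB = 1,000,000 KB and 365 recording days per year.
final class DataVolumeEstimate {

    /** Daily volume in GB for a given user count, opt-in rate, and per-user payload in KB. */
    static double gbPerDay(int usersPerDay, double optInRate, double kbPerUser) {
        return usersPerDay * optInRate * kbPerUser / 1_000_000.0;
    }

    /** Yearly volume in GB, assuming the same volume every day of the year. */
    static double gbPerYear(double gbPerDay) {
        return gbPerDay * 365;
    }

    public static void main(String[] args) {
        double estimated = gbPerDay(20_000, 0.25, 100); // original guess: 25% opt-in
        double observed = gbPerDay(20_000, 0.40, 100);  // corrected figure: 40% opt-in
        System.out.printf("estimated: %.2f GB/day, %.0f GB/year%n", estimated, gbPerYear(estimated));
        System.out.printf("observed:  %.2f GB/day, %.0f GB/year%n", observed, gbPerYear(observed));
    }
}
```

With a 25% opt-in rate this gives 0.5 GB per day and roughly 180 GB per year, in line with the rounded estimate of 200 GB. With 40% it gives 0.8 GB per day and roughly 290 GB per year, which sits at the lower end of the corrected 300-400 GB range.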

Page 14: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Headline statistics so far (5 months in)

140,000 opted-in users

600,000 projects

5,100,000 successful compilations

4,700,000 unsuccessful compilations
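A few ratios follow directly from these counts. The sketch below only divides the numbers on the slide; the class and method names are hypothetical and nothing here comes from the project's actual analysis code.

```java
// Simple ratios derived from the headline counts (five months of data).
final class HeadlineRatios {

    /** Fraction of all compilations that failed. */
    static double failureShare(long successful, long unsuccessful) {
        return (double) unsuccessful / (successful + unsuccessful);
    }

    /** Average of some total count per unit (e.g. compilations per project). */
    static double average(long total, long units) {
        return (double) total / units;
    }

    public static void main(String[] args) {
        long users = 140_000, projects = 600_000;
        long ok = 5_100_000, failed = 4_700_000;
        System.out.printf("failed compilations: %.0f%%%n", 100 * failureShare(ok, failed));
        System.out.printf("projects per user: %.1f%n", average(projects, users));
        System.out.printf("compilations per project: %.1f%n", average(ok + failed, projects));
    }
}
```

In other words, about 48% of recorded compilations failed, which is close to one in two.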

Page 15: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Hardware Specs

2 machines (1 for recording, 1 for analysis)
24-core 2.5 GHz Xeon, 32 GB RAM, 5 TB RAID

Page 16: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Most common compile errors

Unknown variable 17%

Semi-colon expected 10%

Unknown method 7%

Bracket expected 7%

Unknown class 5%

Illegal start of expression 4%
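A ranking like the one above could be produced by tallying error messages and converting the counts to percentages. The following is a minimal sketch under the assumption that each failed compilation contributes one message string; the class name, method name, and sample messages are invented for illustration and are not the talk's actual pipeline or data.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Tallies error messages and reports each one's share of the total, most frequent first.
final class ErrorTally {

    /** Maps each distinct message to its percentage of all messages, in descending order. */
    static LinkedHashMap<String, Double> shares(List<String> messages) {
        Map<String, Long> counts = messages.stream()
                .collect(Collectors.groupingBy(m -> m, Collectors.counting()));

        LinkedHashMap<String, Double> result = new LinkedHashMap<>();
        counts.entrySet().stream()
                // Highest count first; ties broken alphabetically for a stable order.
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                        .thenComparing(Map.Entry.<String, Long>comparingByKey()))
                .forEach(e -> result.put(e.getKey(), 100.0 * e.getValue() / messages.size()));
        return result;
    }

    public static void main(String[] args) {
        // Made-up sample input, not data from the study.
        List<String> sample = List.of(
                "unknown variable", "';' expected", "unknown variable", "unknown method");
        shares(sample).forEach((msg, pct) -> System.out.printf("%-20s %.0f%%%n", msg, pct));
    }
}
```

Grouping by the raw message text is the simplest choice, but it treats differently worded or localized versions of the same error as separate categories.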

Page 17: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Do they change during the term?

Page 18: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Compile errors over time

Page 19: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Rarer compile errors

65th most common compilation error:

非法的表达式开始 (Chinese for "illegal start of expression")

Page 21: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Problematic if statements

What does this code do?

if (x >= 6 && x <= 9)
{
    x = 0;
}

Page 22: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Problematic if statements

What does this code do?

if (x*x >= 36 && x*x <= 81);
{
    x = 0;
}
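The catch in the snippet above is the semicolon after the condition. In Java, that semicolon is an empty statement that becomes the whole body of the `if`, so the braces that follow are an ordinary block that runs unconditionally. The sketch below puts the two variants side by side; the class and method names are made up for this illustration.

```java
// Demonstrates the effect of a stray semicolon after an if condition.
final class StraySemicolonDemo {

    // The if statement ends at the semicolon (its body is the empty statement),
    // so the block below it always executes, whatever the condition says.
    static int withStraySemicolon(int x) {
        if (x * x >= 36 && x * x <= 81);
        {
            x = 0;
        }
        return x;
    }

    // Intended behaviour: x is reset only when the condition holds.
    static int withoutSemicolon(int x) {
        if (x * x >= 36 && x * x <= 81) {
            x = 0;
        }
        return x;
    }

    public static void main(String[] args) {
        System.out.println(withStraySemicolon(3)); // prints 0: reset although 9 < 36
        System.out.println(withoutSemicolon(3));   // prints 3: condition is false
        System.out.println(withoutSemicolon(7));   // prints 0: 49 is within [36, 81]
    }
}
```

Both versions compile without error, which is why the mistake can survive unnoticed in a novice's code.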

Page 23: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Problematic if statements

How prevalent is this mistake?

How long does it take before people fix it?

Appeared in 0.15% of source files

Later fixed in half of them...
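The talk does not say how the mistake was detected. One simple way to approximate such a prevalence figure is a textual heuristic: look for `if (...)` followed by a semicolon and then an opening brace. The sketch below is an assumption-laden illustration, not the study's method: the class name, method names, and regular expression are invented here, the pattern handles only one level of nested parentheses, and it does not skip comments or string literals.

```java
import java.util.List;
import java.util.regex.Pattern;

// Heuristic detector for "if (condition); { ... }" in Java source text.
final class IfSemicolonDetector {

    // "if", a parenthesised condition (at most one level of nested parentheses),
    // then a semicolon, then an opening brace, with optional whitespace in between.
    private static final Pattern IF_THEN_SEMICOLON = Pattern.compile(
            "\\bif\\s*\\((?:[^()]|\\([^()]*\\))*\\)\\s*;\\s*\\{");

    /** True if the source text contains at least one suspicious if statement. */
    static boolean containsMistake(String source) {
        return IF_THEN_SEMICOLON.matcher(source).find();
    }

    /** Percentage of the given source files that contain the mistake. */
    static double prevalencePercent(List<String> sources) {
        if (sources.isEmpty()) {
            return 0.0;
        }
        long hits = sources.stream().filter(IfSemicolonDetector::containsMistake).count();
        return 100.0 * hits / sources.size();
    }

    public static void main(String[] args) {
        System.out.println(containsMistake("if (x*x >= 36 && x*x <= 81);{ x = 0;}")); // true
        System.out.println(containsMistake("if (x >= 6 && x <= 9){ x = 0;}"));        // false
    }
}
```

Running the same check over successive versions of a file would also show whether, and how quickly, the mistake was later fixed.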

Page 24: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Problematic if statements

Page 25: Digibury: Neil Brown - Observing Programming Novices on a Large Scale

Challenges

A lot of data, and a lot of method questions, e.g.

- How do you measure error difficulty?
- What is a frequent error? (What is worth caring about?)
- How much can you get from this kind of data-set?

Scaling the analysis (already maxing out 24 cores)

Questions?