a data-driven method for the detection of close submitters in online learning environments

14
A Data-driven Method for the Detection of Close Submitters in Online Learning Environments José A. Ruipérez Valiente a,b @JoseARuiperez Srećko Joksimović c @s_joksimovic Vitomir Kovanović c @vkovanovic Dragan Gašević c @dgasevic Pedro J. Muñoz Merino a @pedmume Carlos Delgado Kloos a @cdkloos a Universidad Carlos III de Madrid b IMDEA Networks Institute c The University of Edinburgh WWW’17, Perth

Upload: jose-antonio-ruiperez-valiente

Post on 16-Mar-2018

205 views

Category:

Education


1 download

TRANSCRIPT

LOGO

A Data-driven Method for the Detection of Close Submitters in Online Learning Environments

José A. Ruipérez Valiente a,b – @JoseARuiperez

Srećko Joksimović c – @s_joksimovic

Vitomir Kovanović c – @vkovanovic

Dragan Gašević c – @dgasevic

Pedro J. Muñoz Merino a – @pedmume

Carlos Delgado Kloos a – @cdkloos

a Universidad Carlos III de Madridb IMDEA Networks Institutec The University of Edinburgh

WWW’17, Perth

Overview

Detect pairs or groups of accounts that always submit their assignments very close in time

Main goals:

Design and develop a general algorithm to detect these accounts

Apply it to our specific case study with Massive Open Online Course (MOOC) data

Analyze and discuss the results in different directions

Related to:

Emerging groups and collaboration in MOOCs (surveys and social activity)

Enrolling in a MOOC with friends improves completion rate [Brooks et al., 2015] and they enjoy

watching videos in groups [Li et al., 2014]

Copying Answers using Multiple Existence Online (CAMEO) [Ruipérez-Valiente et al., 2016;

Northcutt et al., 2016; Alexandron et al., 2016]

Academic dishonesty (breaking honor code) and gaming the system (exploit system properties)

2

WWW’17, Perth

@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments

3

WWW’17, Perth

@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments

Basic problem description

N number of accounts | M number of assignments, then:

𝑠𝑝𝑖 = 𝑠𝑝𝑖,1 𝑠𝑝𝑖,1 ⋯ 𝑠𝑝𝑖,𝑀 , 𝑖 ∈ 1⋯𝑁

where 𝑠𝑝𝑖,𝑗 is the submission timestamp

of student i for assignment j. Then we

define SP as:𝑆𝑃 =

𝑠𝑝1𝑠𝑝2⋮

𝑠𝑝𝑁

=

[𝑠𝑝1,1 𝑠𝑝1,2 𝑠𝑝1,3[𝑠𝑝2,1 𝑠𝑝2,2 𝑠𝑝2,3⋮

[𝑠𝑝𝑁,1

⋮𝑠𝑝𝑁,2

⋮𝑠𝑝𝑁,3

⋯ 𝑠𝑝1,𝑀]

⋯ 𝑠𝑝2,𝑀]

⋱⋯

⋮𝑠𝑝𝑁,𝑀]

𝐷𝑆 =

𝑑𝑠1,1 𝑑𝑠1,2 𝑑𝑠1,3𝑑𝑠2,1 𝑑𝑠2,2 𝑑𝑠2,3⋮

𝑑𝑠𝑁,1

⋮𝑑𝑠𝑁,2

⋮𝑑𝑠𝑁,3

⋯ 𝑑𝑠1,𝑁⋯ 𝑑𝑠2,𝑁⋱⋯

⋮𝑑𝑠𝑁,𝑁

then we can define a distance matrix

DS where 𝑑𝑠𝑖,𝑗 = 𝑑𝑖𝑠𝑠(𝑠𝑝𝑖, 𝑠𝑝𝑗). Note:

• Matrix is symmetric and hollow

• High complexity 𝑂(𝑁2 ∗ 𝑑)• Keep set D of unique distances

4

WWW’17, Perth

@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments

Problem operationalization

Assignments

Keep only graded quizzes and last submission to each quiz

Course accounts

Keep those accounts that submitted all graded quizzes

Dissimilarity measure

Mean Absolute Deviation (MAD)

Mean Squared Deviation (MSD)

𝑑𝑖𝑠𝑠𝑀𝐴𝐷 𝑠𝑝𝑖, 𝑠𝑝𝑗 =1

𝑀𝑘=1

𝑀

𝑠𝑝𝑖,𝑘 − 𝑠𝑝𝑗,𝑘

𝑑𝑖𝑠𝑠𝑀𝑆𝐷 𝑠𝑝𝑖, 𝑠𝑝𝑗 =1

𝑀𝑘=1

𝑀

𝑠𝑝𝑖,𝑘 − 𝑠𝑝𝑗,𝑘2

5

WWW’17, Perth

@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments

Two MOOCs on Coursera by the University of Edinburgh

Introduction to Philosophy (PHIL)

• One graded quiz per week, 6-12 questions per quiz

• 7 weeks

• 2359 accounts submitted all assignments

Music Theory (MUSIC)

• One graded quiz per week, 10-14 questions per week

• 5 weeks

• 5159 accounts submitted all assignments

Example of notation 𝐷𝑆𝑚𝑢𝑠𝑀𝐴𝐷

Case study

6

WWW’17, Perth

@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments

Compute set D for both courses and

dissimilarity measures

𝐷𝑚𝑢𝑠𝑀𝐴𝐷and 𝐷𝑚𝑢𝑠

𝑀𝑆𝐷 13.305.061

(i.e., (5.159*5.158)/2)

𝐷𝑝ℎ𝑖𝑙𝑀𝐴𝐷and 𝐷𝑝ℎ𝑖𝑙

𝑀𝑆𝐷 2.781.261

Distances overview and distribution

7

WWW’17, Perth

@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments

We follow the next steps:

Select an initial threshold by ‘common-sense’ MAD = 30 minutes

Compute quantile that value represents 4.81e-6 for MUSIC and

5.76e-6 for PHIL

Based on that initial threshold, we test different quantiles and

select one of them

Identifying close submitters

8

WWW’17, Perth

@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments

Close submitter pairs by quantile

QuantileCourse

MUSIC PHIL

6e-6

Account pairs 78 17

MAD threshold 0.61h 0.57h

MSD threshold 0.51h2 0.51h2

1e-5

Account pairs 132 28

MAD threshold 0.9h 1.25h

MSD threshold 1.15h2 1.98h2

5e-5

Account pairs 664 140

MAD threshold 2.9h 4.98h

MSD threshold 10.94h2 38.13h2

9

WWW’17, Perth

@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments

Based on the identified pairs of ‘close submitters’

Identifying couples and communities

Graph nodes connected

with a undirected edge

between each one of the

pairs

MUSIC: 99 different

accounts, 30 couples

PHIL: 26 different

accounts, 11 couples

10

WWW’17, Perth

@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments

Selected variables:

FinalGrade: The final numeric course grade (between 0 and 100)

GotCertificate: Boolean variable representing certificate

SubmissionCount: Number of submissions

ActiveDaysCount: Number of active days

DistinctVideoCount: Number of videos accessed or downloaded

DistinctThreadCount: Number of discussion topics accessed

Examining differences: ‘close submitters’ vs. others

11

WWW’17, Perth

@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments

Examining differences: ‘close submitters’ vs. others

MANOVA is significant for both courses and for both certificate and non-certificate earners

All independent t-tests are significant too

Discussion and conclusions

12

WWW’17, Perth

@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments

‘Close submitters’ are a population statistically different than the rest of accounts

What are they actually doing?

Is it good or bad for learning achievement?

Implications for learning, research and certificate value

Future work

Clustering based on their indicators Assess different associations

Couple and community analysis roles, good or bad for learning, etc

Algorithm improvements more robust, different criteria

Bigger longitudinal study with more MOOCs to increase generalizability

Other settings e.g., online on-campus courses for credit

13

WWW’17, Perth

@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments

LOGOWWW’17, Perth

A Data-driven Method for the Detection of Close Submitters in Online Learning Environments

José A. Ruipérez Valiente a,b – @JoseARuiperez

Srećko Joksimović c – @s_joksimovic

Vitomir Kovanović c – @vkovanovic

Dragan Gašević c – @dgasevic

Pedro J. Muñoz-Merino a – @pedmume

Carlos Delgado Kloos a – @cdkloos

a Universidad Carlos III de Madridb IMDEA Networks Institutec The University of Edinburgh