a data-driven method for the detection of close submitters in online learning environments
TRANSCRIPT
LOGO
A Data-driven Method for the Detection of Close Submitters in Online Learning Environments
José A. Ruipérez Valiente a,b – @JoseARuiperez
Srećko Joksimović c – @s_joksimovic
Vitomir Kovanović c – @vkovanovic
Dragan Gašević c – @dgasevic
Pedro J. Muñoz Merino a – @pedmume
Carlos Delgado Kloos a – @cdkloos
a Universidad Carlos III de Madridb IMDEA Networks Institutec The University of Edinburgh
WWW’17, Perth
Overview
Detect pairs or groups of accounts that always submit their assignments very close in time
Main goals:
Design and develop a general algorithm to detect these accounts
Apply it to our specific case study with Massive Open Online Course (MOOC) data
Analyze and discuss the results in different directions
Related to:
Emerging groups and collaboration in MOOCs (surveys and social activity)
Enrolling in a MOOC with friends improves completion rate [Brooks et al., 2015] and they enjoy
watching videos in groups [Li et al., 2014]
Copying Answers using Multiple Existence Online (CAMEO) [Ruipérez-Valiente et al., 2016;
Northcutt et al., 2016; Alexandron et al., 2016]
Academic dishonesty (breaking honor code) and gaming the system (exploit system properties)
2
WWW’17, Perth
@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments
3
WWW’17, Perth
@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments
Basic problem description
N number of accounts | M number of assignments, then:
𝑠𝑝𝑖 = 𝑠𝑝𝑖,1 𝑠𝑝𝑖,1 ⋯ 𝑠𝑝𝑖,𝑀 , 𝑖 ∈ 1⋯𝑁
where 𝑠𝑝𝑖,𝑗 is the submission timestamp
of student i for assignment j. Then we
define SP as:𝑆𝑃 =
𝑠𝑝1𝑠𝑝2⋮
𝑠𝑝𝑁
=
[𝑠𝑝1,1 𝑠𝑝1,2 𝑠𝑝1,3[𝑠𝑝2,1 𝑠𝑝2,2 𝑠𝑝2,3⋮
[𝑠𝑝𝑁,1
⋮𝑠𝑝𝑁,2
⋮𝑠𝑝𝑁,3
⋯ 𝑠𝑝1,𝑀]
⋯ 𝑠𝑝2,𝑀]
⋱⋯
⋮𝑠𝑝𝑁,𝑀]
𝐷𝑆 =
𝑑𝑠1,1 𝑑𝑠1,2 𝑑𝑠1,3𝑑𝑠2,1 𝑑𝑠2,2 𝑑𝑠2,3⋮
𝑑𝑠𝑁,1
⋮𝑑𝑠𝑁,2
⋮𝑑𝑠𝑁,3
⋯ 𝑑𝑠1,𝑁⋯ 𝑑𝑠2,𝑁⋱⋯
⋮𝑑𝑠𝑁,𝑁
then we can define a distance matrix
DS where 𝑑𝑠𝑖,𝑗 = 𝑑𝑖𝑠𝑠(𝑠𝑝𝑖, 𝑠𝑝𝑗). Note:
• Matrix is symmetric and hollow
• High complexity 𝑂(𝑁2 ∗ 𝑑)• Keep set D of unique distances
4
WWW’17, Perth
@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments
Problem operationalization
Assignments
Keep only graded quizzes and last submission to each quiz
Course accounts
Keep those accounts that submitted all graded quizzes
Dissimilarity measure
Mean Absolute Deviation (MAD)
Mean Squared Deviation (MSD)
𝑑𝑖𝑠𝑠𝑀𝐴𝐷 𝑠𝑝𝑖, 𝑠𝑝𝑗 =1
𝑀𝑘=1
𝑀
𝑠𝑝𝑖,𝑘 − 𝑠𝑝𝑗,𝑘
𝑑𝑖𝑠𝑠𝑀𝑆𝐷 𝑠𝑝𝑖, 𝑠𝑝𝑗 =1
𝑀𝑘=1
𝑀
𝑠𝑝𝑖,𝑘 − 𝑠𝑝𝑗,𝑘2
5
WWW’17, Perth
@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments
Two MOOCs on Coursera by the University of Edinburgh
Introduction to Philosophy (PHIL)
• One graded quiz per week, 6-12 questions per quiz
• 7 weeks
• 2359 accounts submitted all assignments
Music Theory (MUSIC)
• One graded quiz per week, 10-14 questions per week
• 5 weeks
• 5159 accounts submitted all assignments
Example of notation 𝐷𝑆𝑚𝑢𝑠𝑀𝐴𝐷
Case study
6
WWW’17, Perth
@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments
Compute set D for both courses and
dissimilarity measures
𝐷𝑚𝑢𝑠𝑀𝐴𝐷and 𝐷𝑚𝑢𝑠
𝑀𝑆𝐷 13.305.061
(i.e., (5.159*5.158)/2)
𝐷𝑝ℎ𝑖𝑙𝑀𝐴𝐷and 𝐷𝑝ℎ𝑖𝑙
𝑀𝑆𝐷 2.781.261
Distances overview and distribution
7
WWW’17, Perth
@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments
We follow the next steps:
Select an initial threshold by ‘common-sense’ MAD = 30 minutes
Compute quantile that value represents 4.81e-6 for MUSIC and
5.76e-6 for PHIL
Based on that initial threshold, we test different quantiles and
select one of them
Identifying close submitters
8
WWW’17, Perth
@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments
Close submitter pairs by quantile
QuantileCourse
MUSIC PHIL
6e-6
Account pairs 78 17
MAD threshold 0.61h 0.57h
MSD threshold 0.51h2 0.51h2
1e-5
Account pairs 132 28
MAD threshold 0.9h 1.25h
MSD threshold 1.15h2 1.98h2
5e-5
Account pairs 664 140
MAD threshold 2.9h 4.98h
MSD threshold 10.94h2 38.13h2
9
WWW’17, Perth
@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments
Based on the identified pairs of ‘close submitters’
Identifying couples and communities
Graph nodes connected
with a undirected edge
between each one of the
pairs
MUSIC: 99 different
accounts, 30 couples
PHIL: 26 different
accounts, 11 couples
10
WWW’17, Perth
@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments
Selected variables:
FinalGrade: The final numeric course grade (between 0 and 100)
GotCertificate: Boolean variable representing certificate
SubmissionCount: Number of submissions
ActiveDaysCount: Number of active days
DistinctVideoCount: Number of videos accessed or downloaded
DistinctThreadCount: Number of discussion topics accessed
Examining differences: ‘close submitters’ vs. others
11
WWW’17, Perth
@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments
Examining differences: ‘close submitters’ vs. others
MANOVA is significant for both courses and for both certificate and non-certificate earners
All independent t-tests are significant too
Discussion and conclusions
12
WWW’17, Perth
@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments
‘Close submitters’ are a population statistically different than the rest of accounts
What are they actually doing?
Is it good or bad for learning achievement?
Implications for learning, research and certificate value
Future work
Clustering based on their indicators Assess different associations
Couple and community analysis roles, good or bad for learning, etc
Algorithm improvements more robust, different criteria
Bigger longitudinal study with more MOOCs to increase generalizability
Other settings e.g., online on-campus courses for credit
13
WWW’17, Perth
@JoseARuiperezA Data-driven Method for the Detection of Close Submitters in Online Learning Environments
LOGOWWW’17, Perth
A Data-driven Method for the Detection of Close Submitters in Online Learning Environments
José A. Ruipérez Valiente a,b – @JoseARuiperez
Srećko Joksimović c – @s_joksimovic
Vitomir Kovanović c – @vkovanovic
Dragan Gašević c – @dgasevic
Pedro J. Muñoz-Merino a – @pedmume
Carlos Delgado Kloos a – @cdkloos
a Universidad Carlos III de Madridb IMDEA Networks Institutec The University of Edinburgh