TSINGHUA SCIENCE AND TECHNOLOGY
ISSN 1007-0214 05/11 pp412–422
DOI: 10.26599/TST.2018.9010110
Volume 24, Number 4, August 2019

Course Drop-out Prediction on MOOC Platform via Clustering and Tensor Completion

Jinzhi Liao, Jiuyang Tang, and Xiang Zhao*

Jinzhi Liao, Jiuyang Tang, and Xiang Zhao are with National University of Defense Technology, Changsha 410072, China. E-mail: [email protected]; [email protected]; [email protected].
* To whom correspondence should be addressed.
Manuscript received: 2018-05-15; accepted: 2018-06-26

Abstract: As a supplement to traditional education, online courses offer people, regardless of their age, gender, or profession, the chance to access state-of-the-art knowledge. Nonetheless, despite the large number of students who choose to begin online courses, it is easy to observe that quite a few of them drop out in the middle, and information on this is vital for course organizers to improve their curriculum outlines. In this work, in order to make a precise prediction of the drop-out rate, we propose a combined method MOOP, which consists of a global tensor and local tensor to express all available feature aspects. Specifically, the global tensor structure is proposed to model the data of the online courses, while a local tensor is clustered to capture the inner connection of courses. Consequently, drop-out prediction is achieved by adopting a high-accuracy low-rank tensor completion method, equipped with a pigeon-inspired algorithm to optimize the parameters. The proposed method is empirically evaluated on real-world Massive Open Online Courses (MOOC) data, and is demonstrated to offer remarkable superiority over alternatives in terms of efficiency and accuracy.

Key words: MOOC platform; drop-out prediction; tensor completion; clustering

1 Introduction

With the proliferation of the Internet, Massive Open Online Courses (MOOC), which integrate picture, voice, and flash animations into online courses, significantly influence traditional education while relating to it as a competitor and supplement. Current MOOC platforms, such as Coursera and EdX, offer the opportunity for every person, regardless of their age, gender, or previous educational background, to access state-of-the-art courses and communicate with top lecturers. While hundreds of courses are registered on these platforms on a daily basis, an ever-larger number of students[1] choose to participate.

Nevertheless, what ought not to be neglected is the large number of participants who drop out of online educational platforms, since it is common for students to select courses on impulse, only to give up midway through. Instead of focusing on student behavior at course selection, it is advisable for educators to pay more attention to the drop-out behavior, which provides more information about the quality of courses and the preferences of students. In contrast to traditional modes of education, online courses such as MOOC are unable to supervise students' learning process, which, to a certain degree, influences the outcome of these courses. Once the system is able to capture students' behavior after selecting a course, educators will formulate better schemes to model the long-term interest of participants and, in turn, the overall quality of courses. Therefore, the ability to predict drop-outs is of urgent significance.

An ideal drop-out prediction model can learn from the history of participant behavior and make more precise predictions. To illustrate, assume a student has already selected “python”, “java”, and “world economic forum”, but has finished only the first two. Then, when the student is planning to select “macro-economics” and “TensorFlow”, the model is supposed to provide a drop-out prediction for these two courses. What should be accounted for is the inner connection of the courses. In other words, “python”, “java”, and “TensorFlow” all belong to the programming and machine learning category, while “macro-economics” and “world economic forum” can be categorized as economics. Consequently, the prediction model ought to yield a high drop-out probability for the newly selected course “macro-economics”, and a lower likelihood of dropping out of “TensorFlow”. Figure 1 depicts the aforementioned scenario. Therefore, with the aim of constructing an efficacious drop-out prediction model, the focus is to harness the similarities between students and courses.

In this research, targeted at predicting the drop-out rate of students enrolling in online courses, we propose a combinational model of a global tensor and local tensor to describe the inner correlations between students and courses. In contrast to conventional methods, we estimate missing values in a combinational manner, taking into account (1) the global connections between students and all courses, and (2) local connections between students in similar course groups. To enhance the accuracy of clustering, a new similarity calculation method is devised in such a way that it works closely with High-accuracy Low-Rank Tensor Completion (HaLRTC) to provide an accurate drop-out prediction. With the goal of predicting MOOC drop-out, we put forward a method named Mooc drOp-Out Prediction (MOOP).

To summarize, this article makes four key contributions.
• We propose to process drop-out prediction by merging a global tensor and local tensor to express all available feature aspects. To the best of our knowledge, this represents one of the first attempts at this approach.
• A global tensor structure is proposed to model the MOOC data, while a local tensor is clustered to represent course connections. A new similarity estimation method is introduced to enhance the explanatory power of the cluster.
• Drop-out prediction is achieved by adopting a high-accuracy low-rank tensor completion method, equipped with a pigeon-inspired algorithm to optimize the parameters.
• We experimentally evaluate the proposed method on real-world MOOC data, and thereby demonstrate remarkable superiority over alternatives in terms of efficiency and accuracy.

The rest of this paper is organized as follows. Section 2 presents an overview of previous work relating to various prediction methods in the domain of MOOC drop-out. After outlining the necessary background knowledge, the proposed MOOP method is introduced in Section 3, including the specific model and algorithms. Experimental studies are reported in Section 4, followed by a concluding Section 5.

2 Related Work

MOOCs, as a new approach to education, revolutionize traditional education methods and attract attention in both academic and industrial settings. Although the courses are of huge benefit and convenience, one critical problem that should not be neglected is their high drop-out rates[2–4]. When course organizers arrange their course schedules, the drop-out rate should be given equal, if not greater, attention when compared with the selection rate, since the act of selecting does not guarantee long-term persistence with study. Consequently, a large number of methods have been devised to deal with the task of predicting MOOC drop-outs.

Fig. 1 Sketch of course drop-out problem.

Traditionally, statistical techniques have been widely utilized to explore the correlations between the drop-out rate and other factors, with the aim of working out the determinant factors of success in schooling. Some previous studies discovered that university entry scores were a good index to forecast student performance[5], since it is obvious that the higher the entry score, the better equipped the student is for study. However, Thomas et al.[6] found that entry score actually had a poor correlation to performance in some specific domains, such as mechanics courses, and that these scores were not easily accessible, since many academies did not arrange other examinations to test the students' professional skills. The common weakness of these factors lies in the uncertainties inherent in different situations. Specifically, differences in age might determine the degree of education and the institution[7], and gender plays an important role in engineering and physical sciences, where the proportion of female students is very low. Motivation has been generally shown to correlate positively with student success and retention[8]. The drop-out rate is also considered an aspect of the analytic framework; it is used, for example, to predict the performance of participants in online classes[9] and to identify predictors of academic persistence in distance education[10]. Although the addition of drop-out rates expands the statistical methods, the intrinsic disadvantages of tackling high-dimensional data restrict the development of successful methods.

Based on its remarkable performance in prediction and classification of context analysis, Natural Language Processing (NLP) is also promoted as a method for the analysis of MOOC data. There are many methods based on NLP aimed at investigating elements of MOOC that are unrelated to student success. Elouazizi[11] used the linguistic aspects of point-of-view as an indicator of cognitive presence to construct an exploratory framework for examining students' learning-based inquiry in the MOOC context. Focusing on finding the leaders in MOOC discussion forums, Moon et al.[12] proposed a method to measure language accommodation, which represents the students' choices of words given a specific theme. Several NLP methods are adopted by Crossley et al.[13] to compare the performance of different languages in forum posts to evaluate which one is predictive of MOOC completion. Wen et al.[14] tested three courses using a sentiment analysis approach to verify the significant correlation between the sentiments expressed in the course forum posts and the number of students who drop out of the course. Moreover, in Ref. [15], two approaches are devised to quantify the engagement, which is validated on three courses with different topics. In particular, the motivation and engagement attitudes contained in course forums are identified to achieve this task.

Based on its success in other domains, researchers have gradually begun to apply neural networks to predict MOOC drop-outs. An artificial neural network was initially put forward to predict the drop-out rate, and achieved a high level of accuracy and efficiency[16]. In order to further improve the accuracy, a multi-layer network was proposed, upon which a MOOC drop-out predictor was constructed by utilizing the collected data[17]. Additionally, drop-out prediction was viewed as a sequence classification problem and a temporal model was utilized; among these methods, long short-term memory networks yield the best results. Sinha et al.[18] paid attention to students' behavior when they interacted with MOOC video lectures, to capture behavioral patterns in student activity. They then utilized these patterns to evaluate students' information processing index. In a following work[19], they constructed a graph to capture the sequence of active and passive learning activities, and used graph metrics as features for predicting attrition.

With the increasing popularity of machine learning techniques, many other attempts have been made to handle the problem. Kloft et al.[20] proposed a machine learning method based on support vector machines for predicting drop-outs, where the MOOC data of the current and previous weeks were learned to estimate the missing data in the following week. Nagrecha et al.[21] combined feature engineering, data preprocessing, and metrics as parameters in the context of interpretability to construct a decision tree to accomplish drop-out prediction. Hidden Markov models were employed by Balakrishnan and Coetzee[22] to help predict student retention as well as to infer general patterns of behavior between those students that complete the course, and those that drop out at different points in time. Logistic regression, one of the most conventional machine learning methods, is also widely utilized[23–26].

It is undoubtedly difficult for any given solution to overmatch the others in all scenarios, since different drop-out prediction tasks require different models to capture the core of the problem, and the fittest models ought to deal with diverse realistic demands. In this paper, in order to fully exploit the latent patterns of data, we propose the MOOP model, which is validated to achieve promising outcomes.

3 Methodology

In this section, we first elaborate the tensor structure to model the MOOC data, then introduce a new similarity method to cluster course data for constructing the local tensor, and follow this with details of a pigeon-inspired parameter optimization procedure. The overall process is depicted as Algorithm 1.

3.1 Tensor structure for prediction

We first throw light on the basics of tensors, and then present the tensor structure equipped with a fast low-rank tensor completion for drop-out prediction.

Algorithm 1 MOOP
Input: global tensor T, completion tensor X, balance parameter α, weight parameters β = (β1, β2, ..., βk) and γi = (γi1, γi2, ..., γik), i ∈ [1, k], selecting feature F1, finishing feature F2, global rating feature F3, and criterion precision P
Output: prediction parameters β, γi
1: initialize α = 0.5, β, γi
2: local tensors Lk = J-Sim(X, F1, F2, F3)
3: for n = 1 : k do
4:   train data X = random(T)
5:   test data X' = T − X
6:   while Function(X, X') reaches P do
7:     βn, γn = PIO(X, Lk, α, βn, γn)
8:     X = HaLRTC(X, Lk, α, βn, γn)
9:     Lk = refresh(Lk, X)
10:    return βn, γn
11:  end while
12: end for
13: return β, γi

Tensor basics. A tensor is a high-dimensional data representation, whose 1-mode and 2-mode expressions are the vector and the matrix, respectively. An n-mode tensor can be defined as X ∈ ℝ^(I1 × I2 × ... × In), where In denotes the size of mode n, and its elements are denoted as x_(I1, ..., Ik), where 1 ≤ k ≤ n. The matricization operator, which unfolds a tensor into a matrix, is defined as unfold(X, n) = X_(n), in which the tensor element (I1, I2, ..., In) is mapped to the matrix element (In, J), where
$$J = \prod_{m=1,\, m \neq n}^{k-1} I_m \qquad (1)$$
The reverse of the matricization is defined as fold(X_(n), n) = X in a similar way.

The inner product of two same-sized tensors A, B ∈ ℝ^(I1 × I2 × ... × IN) is defined as the sum of the products of their entries:
$$\langle A, B \rangle = \sum_{i_1}\sum_{i_2}\cdots\sum_{i_N} a_{i_1 i_2 \cdots i_N}\, b_{i_1 i_2 \cdots i_N} \qquad (2)$$

For any 1 ≤ n ≤ N, the product of a matrix M ∈ ℝ^(J × In) with a tensor A ∈ ℝ^(I1 × I2 × ... × IN) is expressed as A ×n M, and can be transformed into the product of two matrices:
$$Y = A \times_n M \ \Leftrightarrow\ Y_{(n)} = M A_{(n)} \qquad (3)$$
Denote ‖X‖_F = √⟨X, X⟩ as the Frobenius norm of a tensor. It is clear that ‖X‖_F = ‖X_(k)‖_F.
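To make these operations concrete, the short NumPy sketch below illustrates mode-n unfolding, folding, and the mode-n product of Eq. (3) on a small random tensor. It is an illustrative snippet written for this text under an assumed (C-order) unfolding convention, not code taken from the MOOP implementation.

import numpy as np

def unfold(X, n):
    # Mode-n unfolding: move axis n to the front and flatten the remaining modes.
    return np.moveaxis(X, n, 0).reshape(X.shape[n], -1)

def fold(X_n, n, shape):
    # Inverse of unfold for a tensor with the given full shape.
    rest = [s for i, s in enumerate(shape) if i != n]
    return np.moveaxis(X_n.reshape([shape[n]] + rest), 0, n)

def mode_n_product(X, M, n):
    # Eq. (3): Y = X x_n M  <=>  Y_(n) = M X_(n).
    new_shape = list(X.shape)
    new_shape[n] = M.shape[0]
    return fold(M @ unfold(X, n), n, new_shape)

X = np.random.rand(4, 5, 6)        # a small 3-mode tensor
M = np.random.rand(3, 5)           # a matrix acting on mode 1
Y = mode_n_product(X, M, 1)        # result has shape (4, 3, 6)
print(Y.shape, np.linalg.norm(X))  # Frobenius norm = norm of the flattened tensor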

Global tensor. Suppose that there are m courses and n students in the MOOC data. In order to employ the excellent explanatory power of the tensor structure, we construct the data into a 3-dimensional tensor T, as Fig. 2 shows.

Fig. 2 Sketch of global tensor.

The abscissa axis and the vertical axis represent the students, which means the scale of each matrix in the tensor is n × n, and the other dimension represents the m courses. Thus the size of T is n × n × m, and we randomly select a slice of size k. There are different value interpretations for X_{iik}: (1) the value X_{iik} = 0 represents “student i does not select course k”; (2) the value X_{iik} = 1 represents “student i selects course k, and fails to finish it”; (3) the value X_{iik} = 2 represents “student i selects course k, and finishes it”; (4) the value X_{ijk} = 0 represents “student i and student j do not select the same course k”; (5) the value X_{ijk} = 1 represents “student i and student j select the same course k, and they both fail to finish it”; and (6) the value X_{ijk} = 2 represents “student i and student j select the same course k, and they both finish it”. As we can see, the tensor can not only express the drop-out performance of all students in all courses, but also reveal the relationship between students in a specific course. Therefore, we put all data into one tensor to construct a global tensor as a step in generating a MOOP for drop-out prediction.
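As a rough illustration of this encoding (a toy sketch written for this text, not the authors' released code), the snippet below fills a small dense student-by-student-by-course array from hypothetical enrollment records of the form (student, course, finished); for the real 79 186-student data a sparse representation would be required.

import numpy as np

# Hypothetical toy records: (student_id, course_id, finished_flag).
records = [(0, 0, True), (1, 0, False), (0, 1, True), (2, 1, True)]
n_students, n_courses = 3, 2

T = np.zeros((n_students, n_students, n_courses))
for s, c, finished in records:
    T[s, s, c] = 2 if finished else 1              # diagonal: the student's own outcome
for s1, c1, f1 in records:
    for s2, c2, f2 in records:
        if s1 != s2 and c1 == c2 and f1 == f2:     # shared course with the same outcome
            T[s1, s2, c1] = 2 if f1 else 1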

3.2 Similarity model for local tensor

We now throw light on the local tensor, and propose our new similarity calculation method, J-Sim, to cluster the course dimension and construct the local tensor.

Local tensor. The idea of constructing a local tensor comes from the fact that there are evident differences among courses in different categories, such as humanities and social sciences, which renders the simple combination of students' behaviour across all courses inappropriate. For example, a student majoring in humanities might pay more attention to a course such as literature, and naturally shares little similarity with another student taking a course in the physical sciences. These uncorrelated student pairs increase computation cost and decrease accuracy. To enhance the performance of the model, we construct a local tensor as a supplement to the global tensor so as to predict the drop-out rate.

In order to model the local tensor, firstly, we introduce course vectors, which comprise three types of features:
• Selecting feature (f1). For each course, this feature is assigned the value 0 or 1 for each student, representing whether the student has chosen the course or not;
• Finishing feature (f2). For each course, this feature is assigned the value 0 or 1 for each student, representing whether the student has finished the course or not;
• Global rating feature (f3). For each course, this feature records all the finished courses among the selected courses.
Secondly, we expect to cluster the course dimension based on the features above to acquire several local tensors. In a local tensor, there are only correlative courses, and students' behaviours will be evaluated in one subject framework, as Fig. 3 shows. There is no doubt that with the combination of local tensors the pertinence of MOOP sharply increases.
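The exact layout of these course vectors is not fixed by the paper; one hypothetical encoding, reusing the toy records from the earlier global-tensor sketch, concatenates the per-student selecting and finishing indicators with a scalar completion ratio standing in for the global rating feature:

import numpy as np

def course_vector(course_id, records, n_students):
    f1 = np.zeros(n_students)                  # selecting feature per student
    f2 = np.zeros(n_students)                  # finishing feature per student
    for s, c, finished in records:
        if c == course_id:
            f1[s] = 1.0
            f2[s] = 1.0 if finished else 0.0
    f3 = f2.sum() / max(f1.sum(), 1.0)         # assumed: finished / selected ratio
    return np.concatenate([f1, f2, [f3]])

# e.g., course_vector(1, records, n_students) with the toy records defined earlier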

Similarity estimation method. The cluster performance is largely influenced by the results of similarity estimation, which is elaborated as follows.

Suppose that there are two students X and Y, and X_{i,j} = (x1, x2) denotes the relation between X and courses i, j. Y_{i,j} = (y1, y2) can be interpreted similarly. In the traditional method, the similarity between two vectors is usually calculated via the cosine function, which can be expressed as below:
$$\cos\theta = \frac{x_1 x_2 + y_1 y_2}{\sqrt{x_1^2 + y_1^2}\,\sqrt{x_2^2 + y_2^2}} \qquad (4)$$

The advantage of the cosine function lies in the fact that it has a high sensitivity to common behaviours. In other words, the more mutual courses X and Y select in common, the more similar they might be.

Nevertheless, as can be observed from Eq. (4), we may easily draw the conclusion that when X expands to aX, the similarity does not change:

$$\cos\theta = \frac{a x_1 x_2 + a y_1 y_2}{\sqrt{(a x_1)^2 + (a y_1)^2}\,\sqrt{x_2^2 + y_2^2}} \qquad (5)$$
where a denotes an arbitrary constant.

Fig. 3 Sketch of local tensor.

When it comes to MOOC, the problem can be interpreted by saying that if student X selects the same courses as student Y, where X gives up on all of the courses while Y finishes them, the similarity of these two students still reaches 100%. In fact, although the interests of the two students might be similar, the outcome of the learning process is disparate. Besides, the cosine function cannot distinguish different vectors having the same angle, which might result in considering two students with totally different tastes as similar. Thus, we have to construct another unique similarity estimation method to modify the cosine function.

Inspired by the promising properties of hyperbolic functions, we propose the following similarity estimation method to cluster dimensions:

$$\mathrm{sim}(X, Y) = \frac{2\,\exp\!\left(-\alpha\left(\frac{1}{\cos\theta} - 1\right)\right)}{1 + \exp\!\left(-\alpha\left(\frac{1}{\cos\theta} - 1\right)\right)} \qquad (6)$$
where cos θ denotes the cosine similarity between X and Y, and α represents the balance factor. The effectiveness of our similarity equation, namely J-Sim, is evaluated in Section 4.
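A direct transcription of Eq. (6) might look like the following Python sketch; the random course-feature matrix and the SciPy hierarchical-clustering call are illustrative placeholders rather than the exact clustering procedure used in the paper.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def j_sim(u, v, alpha=0.5):
    # Eq. (6): a squashed cosine similarity between two course feature vectors.
    cos = np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12)
    cos = max(cos, 1e-12)                      # guard: Eq. (6) assumes cos(theta) > 0
    z = np.exp(-alpha * (1.0 / cos - 1.0))
    return 2.0 * z / (1.0 + z)

courses = np.random.randint(0, 2, size=(39, 200)).astype(float)   # placeholder features
S = np.array([[j_sim(a, b) for b in courses] for a in courses])   # pairwise J-Sim
D = 1.0 - S                                                        # turn into a distance
np.fill_diagonal(D, 0.0)
labels = fcluster(linkage(squareform(D, checks=False), method="average"),
                  t=5, criterion="maxclust")                       # 5 course groups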

3.3 Combination method

The merge of the global tensor and local tensors, equipped with a tensor completion method, as well as the parameter optimization algorithm applied to each combination block, is illustrated in this subsection.

Merge of global and local tensors. Considering just a global tensor or local tensor in isolation will distort the inner connections between students and courses. Combining a global tensor and local tensor is advantageous in three ways: (1) A global tensor introduces the overall features into the model, which can reveal how the whole course framework influences students; (2) the local tensor specifies the problem of different course groups possibly having different effects on students; (3) the merge of two tensors leads to a comprehensive analysis of the intrinsic relations between courses and students.

Suppose that we get k local tensors after the former clustering. Therefore, we utilize β = (β1, β2, ..., βk) to describe the connections between the global tensor and each local tensor, and γ = (γ1, γ2, ..., γk) represents the corresponding parameters of each local tensor. To further explore the influence of students' behaviours from other local tensors, we expand γ into γi = (γi1, γi2, ..., γik), i ∈ [1, k]. In other words, when we concentrate on estimating the drop-out rate of courses in a certain local tensor, not the global tensor's but the other local tensors' influence should be taken into account, and it is obvious that the parameter γi must be different in different local tensors. This is illustrated in Fig. 4.

Tensor completion method. In order to accurately predict and accomplish the above merge, we adopt a tensor completion method to estimate MOOC drop-outs.

Conventional tensor completion methods, such as Canonical Polyadic (CP) decomposition[27] and Tucker decomposition[28], adopt a heuristic algorithm, which lowers the dimension of existing data to simplify the calculation. Particularly, in CP decomposition, a tensor A ∈ ℝ^(n1 × n2 × ... × nd) is represented, with a suitably large r, as a linear combination of r rank-1 tensors (vectors); that is,
$$\mathcal{A} = \sum_{i=1}^{r} \lambda_i\, \alpha_i^{1} \otimes \alpha_i^{2} \otimes \cdots \otimes \alpha_i^{d} \qquad (7)$$
$$\min_{\mathcal{X},\, \alpha^{1}, \ldots, \alpha^{n}} \ \lambda_i \left\| \mathcal{X} - \alpha_i^{1} \otimes \cdots \otimes \alpha_i^{n} \right\|_F^{2} \qquad (8)$$
$$\text{s.t.}\ \ \mathcal{X}_{\Omega} = \mathcal{D}_{\Omega} \qquad (9)$$

where Ω denotes the index set of the observed entries of the tensor. In Tucker decomposition, a tensor A ∈ ℝ^(n1 × n2 × ... × nd) is decomposed into a set of matrices U^(m) ∈ ℝ^(Im × Jm) (1 ≤ m ≤ d) and one small core tensor G ∈ ℝ^(J1 × J2 × ... × Jd); that is,
$$\mathcal{A} = \mathcal{G} \times_1 U^{(1)} \times_2 U^{(2)} \cdots \times_d U^{(d)} \qquad (10)$$
$$\min_{\mathcal{X},\, \mathcal{G},\, U^{(1)}, \ldots, U^{(n)}} \ \frac{1}{2}\left\| \mathcal{X} - \mathcal{G} \times_1 U^{(1)} \cdots \times_n U^{(n)} \right\|_F^{2} \qquad (11)$$
$$\text{s.t.}\ \ \mathcal{X}_{\Omega} = \mathcal{D}_{\Omega} \qquad (12)$$

Nonetheless, these methods require the transformation of data structures, and each decomposition introduces some distortion, hence giving rise to gradual error accumulation.

In contrast to conventional methods, this paper focuses on global and local tensor completion, which produces high accuracy and demands low computational effort from the algorithm. Thus, we incorporate the HaLRTC algorithm[29] for this purpose, which has been proven to outperform other tensor completion methods.

Parameter optimization procedure. We mainly utilize the Pigeon-Inspired Optimization (PIO) procedure to optimize the parameters of MOOP. As introduced in Ref. [30], PIO is a population-based swarm intelligence algorithm that imitates the navigation and homing behavior of pigeons.

Fig. 4 Sketch of connecting global and local tensors.

In the algorithm, Duan and Qiao[30] adopted two operators to describe the two stages of the homing phenomenon.

Map and compass operator. In the first stage, pigeons can briefly picture the topographic map in their heads by means of magnetic sense. They take the height of the sun as a compass to modify their flight path. When approaching the destination, the dependence on the sun decreases.

Landmark operator. In the second stage, when approaching the destination, the pigeons pay more attention to the landmark. When spotting a familiar building, they will fly straight to their goal. Otherwise, they will follow leaders that are familiar with the landmark.

PIO sets an initial location Xi = [xi1, xi2, ..., xin] and velocity Vi = [vi1, vi2, ..., vin] for the pigeons. Then, the new location and velocity of each pigeon are updated:

$$V_i^{t} = V_i^{t-1} e^{-Rt} + \mathrm{rand}\cdot\left(X_{\mathrm{gbest}} - X_i^{t-1}\right) \qquad (13)$$
$$X_i^{t} = X_i^{t-1} + V_i^{t} \qquad (14)$$
where R ∈ [0, 1] denotes the map and compass operator, rand represents a random number in [0, 1], t denotes the current iteration, and X_gbest is the global optimum over the previous t − 1 iterations. The first-stage operation is repeated for T1 iterations.

In the second stage, with the landmark operator utilized, pigeons compare the landmark with the destination. If it matches well, the pigeons fly straight to the goal. After each iteration, the half of the pigeons that are furthest from the destination are weeded out. X_center, the central location of the remainder, is set as the new landmark. The second-stage operation is repeated for T2 iterations. The combined system is defined as follows:

$$X_{\mathrm{center}}^{t-1} = \frac{\sum X_i^{t-1} \cdot F\!\left(X_i^{t-1}\right)}{N_p^{t-1} \sum F\!\left(X_i^{t-1}\right)} \qquad (15)$$
$$N_p^{t} = \frac{N_p^{t-1}}{2} \qquad (16)$$
$$X_i^{t} = X_i^{t-1} + \mathrm{rand}\cdot\left(X_{\mathrm{center}}^{t-1} - X_i^{t-1}\right) \qquad (17)$$


where F(·) is the quality of the individual pigeon, defined as
$$F\!\left(X_i^{t-1}\right) = \begin{cases} \dfrac{1}{\mathrm{fitness}\!\left(X_i^{t-1}\right) + \varepsilon}, & \text{for maximization}; \\ \mathrm{fitness}\!\left(X_i^{t-1}\right), & \text{for minimization} \end{cases} \qquad (18)$$
where ε denotes the random noise. With PIO applied to optimize the parameters of MOOP, each global tensor and local tensor pair has more explanatory and typical power for prediction.
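To make the two PIO stages concrete, here is a compact sketch of the update rules in Eqs. (13)-(17) for minimizing a generic cost function; the population size, iteration counts, search range, and the inverse-cost weighting used for the flock centre are illustrative assumptions, not settings reported for MOOP.

import numpy as np

def pio_minimize(cost, dim, n_pigeons=30, T1=100, T2=20, R=0.3, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n_pigeons, dim))
    V = np.zeros_like(X)
    # Stage 1: map-and-compass operator, Eqs. (13) and (14).
    for t in range(1, T1 + 1):
        best = X[np.argmin([cost(x) for x in X])]
        V = V * np.exp(-R * t) + rng.random(X.shape) * (best - X)
        X = X + V
    # Stage 2: landmark operator, Eqs. (15)-(17): halve the flock, follow the centre.
    for _ in range(T2):
        f = np.array([cost(x) for x in X])
        keep = np.argsort(f)[: max(len(X) // 2, 1)]              # discard the worse half
        X, f = X[keep], f[keep]
        w = 1.0 / (f + 1e-12)                                    # inverse cost as pigeon quality
        center = (X * w[:, None]).sum(axis=0) / w.sum()
        X = X + rng.random(X.shape) * (center - X)
    return X[np.argmin([cost(x) for x in X])]

# Example: tune a 2-D parameter vector (e.g., a (beta, gamma) pair) on a toy objective.
best = pio_minimize(lambda p: float(np.sum((p - 0.4) ** 2)), dim=2)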

4 Experiments and Results

In this section, the experimental results are reported, followed by an in-depth analysis.

4.1 Experiment settings

We utilized the most widely used public MOOC students-courses data (https://kddcup2015.com) for empirical evaluation.

Data sources. Enrolment data are leveraged for evaluation, containing information on students, courses, and the completion states of specific courses. Specifically, the number of registered students is 79 186, and the number of courses is 39. We count how many students select each given course, and how many courses are registered by each student; detailed statistics of the data are depicted in Fig. 5. As we can see, a great number of registered students (74 821) have selected fewer than 3 courses, which might reflect the contingency of their selecting behavior and thus weaken the explanatory power of the MOOC data. Therefore, we filter out students who selected fewer than 3 courses in the following experiment, and Fig. 5c expresses the outcome.
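The filtering step can be reproduced along the following lines; the file and column names are assumptions about the KDD Cup 2015 enrollment log rather than a schema stated in the paper.

import pandas as pd

# Assumed schema: one row per enrollment, with a student and a course identifier.
enroll = pd.read_csv("enrollment_train.csv")                # hypothetical file name
per_student = enroll.groupby("username")["course_id"].nunique()
keep = per_student[per_student >= 3].index                  # students with at least 3 courses
filtered = enroll[enroll["username"].isin(keep)]
print(len(per_student), "students before filtering,", keep.size, "after")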

Evaluation indices. Since our target is to predict the drop-out of students, we adopt the F1-score as the evaluation metric:

$$\mathrm{F1} = \frac{2 \cdot \mathrm{TP}}{2 \cdot \mathrm{TP} + \mathrm{FN} + \mathrm{FP}} \qquad (19)$$
where TP denotes that the prediction is positive and the true value is positive; FP denotes that the prediction is positive and the true value is negative; and FN denotes that the prediction is negative and the true value is positive. We randomly set several existing values as null to construct training data and testing data, and the selected data were transformed into a selecting list (original value is 1) and a finishing list (original value is 2). Then, we utilized MOOP to estimate the missing data, and evaluated the prediction with the two lists.
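For completeness, a tiny helper equivalent to Eq. (19), shown on a hypothetical pair of binary prediction and ground-truth lists:

def f1_score(pred, truth):
    # pred, truth: 0/1 labels, with 1 marking the event of interest (e.g., drop-out).
    tp = sum(p == 1 and t == 1 for p, t in zip(pred, truth))
    fp = sum(p == 1 and t == 0 for p, t in zip(pred, truth))
    fn = sum(p == 0 and t == 1 for p, t in zip(pred, truth))
    return 2 * tp / (2 * tp + fn + fp) if tp else 0.0

print(f1_score([1, 0, 1, 1], [1, 0, 0, 1]))   # 0.8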

Fig. 5 Data statistics. (a) The number of registered students for each course (course perspective); (b) the number of courses each student registered for (student perspective); and (c) detailed statistics of the data after filtering students.

4.2 Experiment results

We first show the variation trend of J-Sim for α ∈ [0.1, 8], which verifies the superiority of the similarity calculation method, in Figs. 6 and 7. The outcome reveals that J-Sim has a more flexible variation trend than the cosine function and is better at distinguishing the course similarities for clustering than the conventional cosine similarity method. Note that the lighter the colour is, the less discriminative the MOOC data are, and we choose α = 0.5.

We further compare MOOP with other state-of-the-art methods.

Fig. 6 Variation trend of J-Sim.

Fig. 7 Visualization of clustering result: (a) via cosine similarity; (b) via J-Sim.

Matrix Factorization (MF) is a traditional method widely used in collaborative filtering and can solve the problem to an extent. The Tucker decomposition method, as a conventional tensor completion method, is also utilized for comparison. A Support Vector Machine (SVM), which can efficiently perform non-linear classification, is also considered as a competitor. Furthermore, in order to validate the superiority of the merged tensors, the prediction method based only on a global tensor, g-MOOP, is implemented for comparison as well. In other words, five different methods are put into the assessment.

As Table 1 shows, MOOP outperforms other methods. The reason that all results in the selecting list are higher than the drop-out list is that we suppose the students who finish courses also select the same courses, and the drop-out list results are considered as an index.

Generally speaking, MF has the worst performance. Since a matrix can only simultaneously process two dimensions of data, namely students and courses, some features of the data, such as the interactions within students or courses, can be lost. Furthermore, the method cannot deeply explore the connection between two dimensions because of limitations of the algorithm. Consequently, it merely achieves 69.5% on selecting and 65.6% on drop-out. Other methods, such as Tucker and g-MOOP, only contain one more dimension, and gain an improvement of 11.8% and 15.3% on selecting, and 12.5% and 16.2% on drop-out. On one hand, this reveals the superiority of the tensor structure; on the other hand, it shows that the improvement is limited since it fails to deeply explore the interactions.

The diversity between Tucker and g-MOOP might lie in the core of the tensor completion algorithm. As mentioned above, Tucker transforms the original tensor into a core tensor with a corresponding matrix in each dimension, while HaLRTC, used in g-MOOP, focuses on retaining the original tensor as far as the algorithm can reach. Thus, the latter sacrifices efficiency to gain accuracy, and gains an improvement of 3% over Tucker.

Although SVM has an advantage in feature analysis, it is overmatched by MOOP by 6.1% on selecting and 10.7% on drop-out, which can be attributed to the limitations in data size.

4.3 Discussion

Based on the superiority of MOOP, educators can accurately identify students who might give up on a course halfway through. The results will contribute to the following improvements.

Table 1 Evaluation of different methods for prediction (F1-score).

F1-score    MF     Tucker  g-MOOP  SVM    MOOP
Selecting   0.695  0.777   0.801   0.853  0.905
Drop-out    0.656  0.738   0.762   0.794  0.879


• Educators can provide special educational patterns for students with a high drop-out rate. Once these students log in to the MOOC platform, the system would recommend those special courses to them, which reflects the idea of teaching students in accordance with their aptitude.
• When some analysis is taking place about a given course, the behavior of students with a high drop-out rate might decrease its precision. Since some of these students select the course at random, especially those selecting fewer than 3 courses, this could introduce errors into the evaluation of a course.
• With more information acquired, educators could analyze the reasons why these students have such a high drop-out rate. The outcome will drive reforms of the education system to some extent.

These measures will eventually enhance the ability of students, and also promote the quality of education.

5 Conclusion

In this paper, a MOOC prediction method based on the combination of a global tensor and local tensors, solved with HaLRTC and optimized with PIO, is proposed to tackle the drop-out prediction task. Concretely, local tensors are clustered by J-Sim, which divides all courses into several course groups. When estimating the missing drop-out data for a specific course, the global tensor and the other local tensors contribute optimization parameters to complete the local tensor that the selected course belongs to. Experiment results not only validate the superiority of our proposed method, but also imply the prospect of application to real-time MOOC drop-out prediction scenarios.

With regard to future work, there are several latent problems which are worth exploring. Firstly, the superior experimental results of MOOP can mostly be ascribed to the limited scale of the MOOC data; when more features of the data are provided, the problem of how to maintain the superiority arises. Secondly, the tensor completion method, HaLRTC, used in this work sacrifices efficiency to enhance accuracy, which might lead to high time consumption. We plan to propose a new algorithm to handle the tensor completion task, which will balance efficiency and accuracy. Thirdly, additional evaluation metrics and datasets could be adopted to verify the performance of MOOP.

Acknowledgment

This work was partially supported by the National Project of Educational Science Planning (No. ECA160409), the Hunan Provincial Project of Educational Science Planning (No. XJK016QXX001), and the National Natural Science Foundation of China (Nos. 71690233 and 71331008).

References

[1] C. Alario-Hoyos, M. Pérez-Sanagustín, C. D. Kloos, and P. J. Muñoz-Merino, Recommendations for the design and deployment of MOOCs: Insights about the MOOC digital education of the future deployed in MiríadaX, in Proc. 2nd Int. Conf. Technological Ecosystems for Enhancing Multiculturality, Salamanca, Spain, 2014, pp. 403–408.

[2] H. Khalil and M. Ebner, MOOCs completion rates and possible methods to improve retention—A literature review, in Proc. World Conf. Educational Multimedia, Hypermedia and Telecommunications, Chesapeake, VA, USA, 2014, pp. 1236–1244.

[3] D. F. O. Onah, J. Sinclair, and R. Boyatt, Dropout rates of massive open online courses: Behavioural patterns, in Proc. 6th Int. Conf. Education and New Learning Technologies, Barcelona, Spain, 2014, pp. 5825–5834.

[4] J. Z. Qiu, J. Tang, T. X. Liu, J. Gong, C. H. Zhang, Q. Zhang, and Y. F. Xue, Modeling and predicting learning behavior in MOOCs, in Proc. 9th ACM Int. Conf. Web Search and Data Mining, San Francisco, CA, USA, 2016, pp. 93–102.

[5] G. L. Zhang, T. J. Anderson, M. W. Ohland, and B. R. Thorndyke, Identifying factors influencing engineering student graduation: A longitudinal and cross-institutional study, J. Eng. Educ., vol. 93, no. 4, pp. 313–320, 2004.

[6] G. Thomas, A. Henderson, and T. L. Goldfinch, The influence of university entry scores on student performance in engineering mechanics, in Proc. Australasian Association for Engineering Education Conf., Adelaide, Australia, 2009, pp. 980–985.

[7] M. Sheard, Hardiness commitment, gender, and age differentiate university academic performance, Br. J. Educ. Psychol., vol. 79, no. 1, pp. 189–204, 2009.

[8] B. F. French, J. C. Immekus, and W. C. Oakes, An examination of indicators of engineering students' success and persistence, J. Eng. Educ., vol. 94, no. 4, pp. 419–425, 2005.

[9] A. Y. Wang and M. H. Newlin, Predictors of web-student performance: The role of self-efficacy and reasons for taking an on-line class, Comput. Hum. Behav., vol. 18, no. 2, pp. 151–163, 2002.

[10] A. Parker, A study of variables that predict dropout from distance education, Int. J. Educ. Technol., vol. 1, no. 2, pp. 1–10, 1999.

[11] N. Elouazizi, Point-of-view mining and cognitive presence in MOOCs: A (computational) linguistics perspective, in Proc. 2014 Conf. Empirical Methods in Natural Language Processing, Doha, Qatar, 2014, pp. 32–37.


[12] S. Moon, S. Potdar, and L. Martin, Identifying student leaders from MOOC discussion forums through language influence, in Proc. 2014 Conf. Empirical Methods in Natural Language Processing, Doha, Qatar, 2014, pp. 15–20.

[13] S. Crossley, D. S. McNamara, R. Baker, Y. Wang, L. Paquette, T. Barnes, and Y. Bergner, Language to completion: Success in an educational data mining massive open online class, in Proc. Int. Conf. Educational Data Mining, Madrid, Spain, 2015.

[14] M. M. Wen, D. Y. Yang, and C. P. Rose, Sentiment analysis in MOOC discussion forums: What does it tell us? in Proc. of Educational Data Mining, London, UK, 2014.

[15] M. M. Wen, D. Y. Yang, and C. P. Rose, Linguistic reflections of student engagement in massive open online courses, in Proc. 8th Int. AAAI Conf. Web and Social Media, Ann Arbor, MI, USA, 2014.

[16] D. Chaplot, E. Rhim, and J. Kim, Predicting student attrition in MOOCs using sentiment analysis and neural networks, in Proc. 17th Int. Conf. Artificial Intelligence in Education, Madrid, Spain, 2015.

[17] J. Whitehill, K. Mohan, D. Seaton, Y. Rosen, and D. Tingley, Delving deeper into MOOC student dropout prediction, arXiv preprint arXiv:1702.06404, 2017.

[18] T. Sinha, P. Jermann, N. Li, and P. Dillenbourg, Your click decides your fate: Inferring information processing and attrition behavior from MOOC video clickstream interactions, in Proc. 2014 Conf. Empirical Methods in Natural Language Processing, Doha, Qatar, 2014.

[19] T. Sinha, N. Li, P. Jermann, and P. Dillenbourg, Capturing “attrition intensifying” structural traits from didactic interaction sequences of MOOC learners, in Proc. 2014 Conf. Empirical Methods in Natural Language Processing, Doha, Qatar, 2014, pp. 42–49.

[20] M. Kloft, F. Stiehler, Z. L. Zheng, and N. Pinkwart, Predicting MOOC dropout over weeks using machine learning methods, in Proc. 2014 Conf. Empirical Methods in Natural Language Processing, Doha, Qatar, 2014, pp. 60–65.

[21] S. Nagrecha, J. Z. Dillon, and N. V. Chawla, MOOC dropout prediction: Lessons learned from making pipelines interpretable, in Proc. 26th Int. Conf. World Wide Web Companion, Perth, Australia, 2017, pp. 351–359.

[22] G. K. Balakrishnan and D. Coetzee, Predicting Student Retention in Massive Open Online Courses Using Hidden Markov Models, University of California, Berkeley, CA, USA, Tech. Rep. No. UCB/EECS-2013-109, 2013.

[23] R. F. Kizilcec and S. Halawa, Attrition and achievement gaps in online learning, in Proc. 2nd ACM Conf. Learning @ Scale, Vancouver, Canada, 2015, pp. 57–66.

[24] C. Robinson, M. Yeomans, J. Reich, C. Hulleman, and H. Gehlbach, Forecasting student achievement in MOOCs with natural language processing, in Proc. 6th Int. Conf. Learning Analytics & Knowledge, Edinburgh, UK, 2016, pp. 383–387.

[25] C. Taylor, K. Veeramachaneni, and U. M. O'Reilly, Likely to stop? Predicting stopout in massive open online courses, arXiv preprint arXiv:1408.3382, 2014.

[26] C. Ye and G. Biswas, Early prediction of student dropout and performance in MOOCs using higher granularity temporal information, J. Learn. Anal., vol. 1, no. 3, pp. 169–172, 2014.

[27] E. Acar, D. M. Dunlavy, T. G. Kolda, and M. Mørup, Scalable tensor factorizations for incomplete data, Chemom. Intell. Lab. Syst., vol. 106, no. 1, pp. 41–56, 2011.

[28] T. G. Kolda and B. W. Bader, Tensor decompositions and applications, SIAM Rev., vol. 51, no. 3, pp. 455–500, 2009.

[29] J. Liu, P. Musialski, P. Wonka, and J. P. Ye, Tensor completion for estimating missing values in visual data, IEEE Trans. Pattern Anal. Mach. Intell., vol. 35, no. 1, pp. 208–220, 2013.

[30] H. B. Duan and P. X. Qiao, Pigeon-inspired optimization: A new swarm intelligence optimizer for air robot path planning, Int. J. Intell. Comput. Cybern., vol. 7, no. 1, pp. 24–37, 2014.

Jinzhi Liao received the BS degree from National University of Defense Technology (NUDT), China, in 2016. He is currently working toward the MS degree at NUDT. His research interests include knowledge graphs and tensor completion.

Jiuyang Tang received the PhD degree from National University of Defense Technology (NUDT), China, in 2006. He is currently a professor with NUDT. His research interests include knowledge graphs and text analytics.

Xiang Zhao received the PhD degree from the University of New South Wales, Australia, in 2014. He is currently an assistant professor at National University of Defense Technology, China. He has published several refereed papers in international conferences and journals, such as SIGMOD, SIGIR, VLDB Journal, and IEEE TKDE. His research interests include graph data management and knowledge graph construction.