

IEEE Proceedings of the ITI 2009 31st International Conference on Information Technology Interfaces (ITI), Cavtat/Dubrovnik, Croatia, June 22-25, 2009

Prediction of Academic Performance Using Discriminant Analysis

Blaženka Divjak, Dijana Oreški University of Zagreb, Faculty of Organization and Informatics, Pavlinska 2, Varaždin


Abstract. In this paper, discriminant analysis is used to analyse the effect of 30 variables upon the dependent variable Student success at the Faculty of Organization and Informatics, University of Zagreb. The data were collected by a questionnaire administered on two occasions: in the academic year 2006/07, among second-year students of the undergraduate study programme Information and Business Systems, and in the academic year 2008/09, among third- and fourth-year students of the undergraduate study programme Information Systems. The research aims to determine predictor variables for student success at the Faculty of Organization and Informatics. The first and second questionnaire administrations included 110 and 113 students, respectively.

Keywords. Academic performance, discriminant analysis, motivation styles

1. Introduction

The prediction of student success is of great practical importance, since students’ achievement may influence not only the admission process and criteria but also individual students’ attitudes toward studying. Many studies have addressed the prediction of academic performance in educational institutions, particularly in higher education. Most of them use only hard data such as the overall grade average from previous schooling, admission test scores, work experience, age, gender, etc. [5]. Some also include goal statements, references, personal interviews and other sources [6]. The extent to which personality traits predict academic performance, and which traits do so, was investigated in two longitudinal studies of two British university samples [2]. The results of those studies support the hypothesis that personality is significantly related to academic performance. In this research, we likewise do not focus only on ‘hard’ data taken from students’ files but also take into account various ‘soft’ descriptors.

Traditionally, researchers have used statistical models (e.g., discriminant analysis, multiple regression, stepwise regression) or neural networks to predict student success in a university study programme. The contemporary literature (see [5] for example) has shown that when the results obtained by discriminant analysis were compared with those of neural networks in predicting academic performance on a categorical scale, no statistically significant differences were found. This was the main reason we used traditional statistical methods in our research.

2. Data

The data were collected by a questionnaire administered on two occasions: in the academic year 2006/07, among second-year students of the undergraduate study programme Information and Business Systems, and in the academic year 2008/09, among third- and fourth-year students of the undergraduate study programme Information Systems.

The first questionnaire was conducted at the course level and comprised 34 variables, 31 of which refer to the mode of instruction in the course. The questionnaire consisted of statements, and the respondents were asked to indicate the degree to which they agree or disagree with each statement. For 25 variables a Likert scale ranging from 1 to 5 (1 – I totally disagree, 2 – I disagree, 3 – I neither agree nor disagree, 4 – I agree, 5 – I fully agree) was used. For the three sociological variables (gender, year of enrolment, population of the place of residence) a Likert scale was not used. The variables were divided into 7 categories: Course difficulty, Course content, Communication within the course, IT tools and literature, Mode of instruction, Learning support, Others.

The second research was conducted in the academic year 2008/09 at the level of the study programme. In the survey 113 questionnaires were collected. The questionnaire comprised 36 variables. The included variables referred to: student effort (22), secondary school (3), personality (4), achievement motivation (1), self-reported learning style (1), studying success (4) and gender (1). The questionnaire again consisted of statements with which the respondents were asked to indicate their degree of agreement or disagreement. For 18 variables a Likert scale was used, for 17 variables concrete answers to the questions were offered, and one variable was an open question to which the respondents had to provide their own answer.

3. Research and hypotheses

3.1. First - pilot research

Some results of the statistical analysis of the data gathered in the first research are described in [4]; here we present new results of a discriminant analysis on the same data. The research in [4] into factors determining student success in Mathematics courses at the Faculty of Organization and Informatics indicates that the key factors are: Communication between students and teachers/Teaching materials (literature), Teaching methodology, Student involvement, and Relevance of the course for the overall studies.

We collected 110 questionnaires from a total of 132 students. A preliminary discriminant analysis was conducted using these data. The students were divided into two groups: those who had passed the course Selected Chapters in Mathematics through mid-term tests (56 students) and those who had not (44 students). The discriminant analysis established the variables that contribute to discrimination between these two groups. The seven variables are as follows:

- I find mid-term tests useful;

- I find compulsory homework assignments useful;

- Number of weekly hours spent by the student preparing for the course;

- Lectures and seminars were related;

- Involvement in a project within the course was interesting and useful;

- Seminars were interesting and useful;

- I find the compulsory attendance in lectures and seminars useful.

We used this pilot research together with literature review as a basis for constructing the second questionnaire.

3.2. Second research

The analysis of the data gathered in the second questionnaire administration shows that for 54% of the students the motivation for their effort arises from the ambition to graduate, find employment and start earning an income as soon as possible (goal-oriented). Next, 19% of the students responded that they are motivated by learning and investing in their own development (learning-oriented), whereas 21% are motivated by socializing during their studies at the Faculty (relationship-oriented). These results are in line with previous research. The authors of [8], who also conducted research at the Faculty of Organization and Informatics, conclude that the dominant motivational style is the goal-oriented one (51.106%). The second- and third-ranking motivational styles are the relationship- and learning-oriented ones, with similar proportions (24.761% and 24.133%, respectively). The motivation for enrolling at the Faculty and the motivational styles are analysed with a view to adapting instruction as much as possible to individual needs [8].

There were 219 students at this level and the sample size was N=113, which is close to the recommended sample size, ideally four times the number of independent variables (in this case, 30). This research does not include unsuccessful students, that is, those who dropped out or did not manage to enrol in at least the third year of their studies. Finally, there was no overlap between the groups in the first and the second research, since the students belong to different study programmes.

A two-group discriminant analysis was applied, with the first group consisting of more successful students at the Faculty of Organization and Informatics and the second group consisting of less successful students. The students were divided into these two groups on the basis of the variables Year of enrolment at the Faculty, Current grade average and Number of exams to be passed. The index of success was calculated for each student by means of the formula

\[
\text{index of success} = \frac{1}{\left(\text{Number\_of\_years\_of\_studying} + \text{corrective}\right) / \text{average\_length\_of\_studying}} + \frac{\text{student's\_grade\_average}}{\text{grade\_average\_of\_the\_studies}}
\]

In the formula, the corrective value, obtained on the basis of the variable Number of exams to be passed, is added to the value Number of years of studying. It expresses the assumed additional time the student will continue studying:

- no exams to pass: 0.2 (another 0.2 year);

- 1-4 exams to pass: 0.5 (another 0.5 year);

- 5-8 exams to pass: 0.9 (another 0.9 year);

- 9-12 exams to pass: 1.3 (another 1.3 year);

- 13 or more exams to pass: 1.8.

It is to be noted that these corrective values are estimates that are not based on research or existing data on this matter. Furthermore, the average length of studying of full-time students of the undergraduate study programme Information Systems at the Faculty of Organization and Informatics is 5.7 years, whereas the grade average of students of this study programme is 3.43. The values obtained by means of the formula range between 1.38 and 2.74. On the basis of these values students were ranked in accordance with their study success.
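As an illustration, the index of success described above can be sketched in a few lines of Python. The function and variable names are hypothetical and chosen here for readability; the corrective values, the average length of studying (5.7 years) and the programme grade average (3.43) are taken from the text, while the sample student is invented.

```python
# Constants reported in the text for the Information Systems programme.
AVERAGE_LENGTH_OF_STUDYING = 5.7   # years
PROGRAMME_GRADE_AVERAGE = 3.43


def corrective_value(exams_to_pass: int) -> float:
    """Estimated extra years of study, by number of exams still to be passed."""
    if exams_to_pass < 0:
        raise ValueError("exams_to_pass must be non-negative")
    if exams_to_pass == 0:
        return 0.2
    if exams_to_pass <= 4:
        return 0.5
    if exams_to_pass <= 8:
        return 0.9
    if exams_to_pass <= 12:
        return 1.3
    return 1.8


def success_index(years_of_studying: float, exams_to_pass: int,
                  grade_average: float) -> float:
    """Index of success: 1 / ((years + corrective) / average length) + grade ratio."""
    expected_years = years_of_studying + corrective_value(exams_to_pass)
    duration_term = 1 / (expected_years / AVERAGE_LENGTH_OF_STUDYING)
    grade_term = grade_average / PROGRAMME_GRADE_AVERAGE
    return duration_term + grade_term


# Hypothetical student: 4 years of studying, no exams left, grade average 4.0.
example_index = success_index(4, 0, 4.0)
print(round(example_index, 2))
```

For this invented student the index is about 2.52, which lies inside the reported range of 1.38 to 2.74.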
The limit was set at 2.02, that is, students whose index exceeded or was equal to 2.02 were included in the group of more successful students, whereas students whose index was lower than 2.02 were included in the group of less successful students. Upon testing several possibilities with a higher and lower limit value, it was established that after 2.02 indexes tend to diminish in accordance with a slight linear trend.

Therefore the limit was set at this particular value, as it makes ‘most sense’. Out of the total of 113 items in the sample, 55 (48.67%) fall into the group of more successful students, whereas 58 (51.33%) fall into the group of less successful students. Although the first group is smaller than the second, the difference is slight, which makes the application of discriminant analysis possible. Such a ratio is also realistic. The dependent variable represents a true dichotomy, that is, the groups are mutually exclusive, so one item can belong to only one group. Discriminant analysis is aimed at determining how independent variables discriminate among the members of the two groups, the more successful students and the less successful ones.

Two hypotheses, H1 and H2, were set up:

H1: Previous education, regular active class attendance, lectures and motivation for the studies will show discriminant validity in the prediction of academic performance (studying success).

H2: Student’s gender will not show discriminant validity in the prediction of academic performance.

4. Results and discussion

One of the assumptions for the application of discriminant analysis is the absence of multicollinearity among independent variables. Prior to conducting the discriminant analysis, correlations between independent variables were computed. The matrix of average within-group correlations was calculated, and its values do not indicate multicollinearity (i.e., all the correlation coefficients are less than 0.6).

In the first stage of the discriminant analysis the extent to which independent variables are capable of discriminating between groups was established. For that purpose a forward stepwise method was used, in which variables are gradually added to the model until satisfactory criteria have been met.
At each step the variable with the highest F value (higher than the specified F to enter value) is selected for inclusion in the model [7]. The procedure ends when no remaining variable has an F value higher than the specified F to enter value. In the analysis the following F values were used: F to enter value = 2 and F-remove value = 1.

Conclusions about the significance of the model were made on the basis of the values of Wilks' lambda, which is commonly used in discriminant analysis [7]. The discriminant function was significant, with a Wilks' lambda of 0.71 (p-value < 0.0001). This value indicates the existence of differences between groups, that is, that the two groups have different means. The model consists of the eight variables that have a significant effect upon discrimination between groups [3]. On the basis of the values of Wilks' lambda for particular variables, the degree to which each individual variable contributes to discrimination between groups was determined. This discrimination is thus accounted for by the following variables: first grade obtained at the Faculty, students’ time management and goal-centredness, admission exam score, degree of students’ personal responsibility and their interest in this Faculty, preparation for classes and in-class activity, and learning style. The remaining variables (22) were not included in the model.

The conducted discriminant analysis is a two-group analysis, so one discriminant function was obtained. Owing to this function, it is possible to discriminate between the more successful students at the Faculty of Organization and Informatics and the less successful ones. Table 1 shows the evaluation of the function according to the means of canonical variables. From Table 1 it is evident that the less successful students contribute to the canonical function to a higher degree.

Table 1. Means of canonical variables

                            Canonical variables means
Less successful students     0.61
More successful students    -0.64

Table 2 shows the significance of the discriminant function. Wilks' lambda (0.71) indicates the existence of a difference between groups, that is, the effect of the variables of the model upon the discrimination between groups. The following values are shown in Table 2:

- eigenvalue of the discriminant function, which equals 0.40 and shows the significance of dimensions in independent variable classification;

- canonical correlation, which equals 0.53, on the basis of which it can be concluded that there is a correlation between the discriminant function and the two groups;

- chi-square test for the canonical function, which equals 35.93.

Table 2. Chi-square test

Eigenvalue       0.40
Canonical R      0.53
Wilks' Lambda    0.71
Chi-square       35.93
Df               8
p-level          0.000018
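The values in Table 2 are mutually consistent, which can be checked with a short Python sketch. It assumes the standard relations for a single discriminant function (Wilks' lambda = 1/(1 + eigenvalue), canonical R = sqrt(eigenvalue/(1 + eigenvalue))) and Bartlett's chi-square approximation. The paper does not state which approximation the software used, so that part is an assumption, as are the function names.

```python
import math


def wilks_lambda(eigenvalue: float) -> float:
    """Wilks' lambda for a single discriminant function."""
    return 1.0 / (1.0 + eigenvalue)


def canonical_correlation(eigenvalue: float) -> float:
    """Canonical correlation for a single discriminant function."""
    return math.sqrt(eigenvalue / (1.0 + eigenvalue))


def bartlett_chi_square(lam: float, n_cases: int, n_vars: int, n_groups: int) -> float:
    """Bartlett's approximation (assumed here, not stated in the paper)."""
    return -(n_cases - 1 - (n_vars + n_groups) / 2.0) * math.log(lam)


def chi_square_p_value(chi2: float, df: int) -> float:
    """Upper-tail chi-square probability; closed form valid for even df only."""
    if df % 2 != 0:
        raise ValueError("this closed form requires an even number of degrees of freedom")
    half = chi2 / 2.0
    terms = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * terms


lam = wilks_lambda(0.40)                     # about 0.71
canon_r = canonical_correlation(0.40)        # about 0.53
chi2 = bartlett_chi_square(lam, 113, 8, 2)   # close to the reported 35.93
p_level = chi_square_p_value(35.93, 8)       # about 0.000018
print(round(lam, 2), round(canon_r, 2), round(chi2, 1), f"{p_level:.6f}")
```

The small gap between the recomputed chi-square and the tabulated 35.93 is what one would expect from starting with an eigenvalue rounded to two decimals.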

Structure coefficients show the relative strength of the discriminant variables; their values are shown in the third column of Table 3.

Table 3. Standardized and structure coefficients

Variable                                                          Standardized coefficients   Structure coefficients
First grade obtained at the Faculty                               -0.61                       -0.56
Good time management and ability to allocate time for studying    -0.40                       -0.43
Admission exam score                                              -0.34                       -0.51
Goal-centredness                                                   0.52                        0.20
Personal responsibility                                           -0.34                       -0.14
Interest in this Faculty                                          -0.34                       -0.23
Learning style                                                    -0.32                       -0.13
Preparation for classes and in-class activity                     -0.28                       -0.16


Table 3 also shows the values of standardized coefficients. Standardized coefficients are used for evaluating the unique contribution of each independent variable to the discriminant function [3]. Based on the structure coefficients, it can be concluded that student success at the Faculty of Organization and Informatics is mainly determined by intense goal-centredness. The lack of student success at the Faculty is mainly determined by the variables Admission exam score, Personal responsibility, First grade obtained at the Faculty, Good time management and ability to allocate time for studying, Learning style, and Preparation for classes and in-class activity. In other words, the less successful students at the Faculty of Organization and Informatics, who take longer to graduate, are the students who did not score well in the admission exam; whose first grade obtained at the Faculty was not high; who are less responsible; and who are poor time managers and therefore unable to allocate time for studying. In addition, the less successful students do not prepare for classes and are not actively involved in classes.

It is interesting that the variable Grades during secondary school has not been included in the model. Research in some other school systems has reported differently. For example, in [1] the variable Grades during secondary school was found to be an accurate predictor of university performance, but in that research the probability of success was defined as the likelihood that a student would obtain a professional degree.

It has been established that the eight above-mentioned variables included in the model make it possible to discriminate between the two groups of students. The next issue to be discussed is the classification of students.
It has already been mentioned that the two groups were not of equal size, the larger group being the one comprising less successful students. It can be assumed that this ratio corresponds to the actual situation in the sample. Therefore the a priori probabilities in the classification functions are proportional to the sizes of the groups (51:49). The classification matrix is shown in Table 5. It shows how the students constituting the sample are distributed across groups. Here 76.36% of the students from the group of more successful students are correctly classified, whereas 74.14% of those who are less successful are classified in line with expectations. In total, 75.22% of the students were classified in line with the expected classification.
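The percentages quoted above follow directly from the counts in the classification matrix (Table 5). A minimal Python sketch, using hypothetical variable names and assuming that the rows of Table 5 hold the actual groups and the columns the assigned groups:

```python
# Counts from Table 5; keys are (actual group, assigned group).
counts = {
    ("less", "less"): 43, ("less", "more"): 15,
    ("more", "less"): 13, ("more", "more"): 42,
}


def percent_correct(group: str) -> float:
    """Share of one actual group that was assigned to its own group."""
    row_total = sum(n for (actual, _), n in counts.items() if actual == group)
    return 100.0 * counts[(group, group)] / row_total


total = sum(counts.values())
correct = counts[("less", "less")] + counts[("more", "more")]
overall = 100.0 * correct / total

# A priori probabilities proportional to group sizes (58 and 55 of 113).
prior_less = (43 + 15) / total
prior_more = (13 + 42) / total

print(round(percent_correct("less"), 2),
      round(percent_correct("more"), 2),
      round(overall, 2))
```

The two priors come out at roughly 0.51 and 0.49, matching the 51:49 ratio used in the classification functions.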

Possible limitations of the study stem from the fact that we studied students of informatics, and other groups may show different characteristics. Therefore, it would be interesting to perform the research on different study programmes. Further, the proportion of female students was lower than that of male students (28% female, 72% male), and therefore the conclusion regarding hypothesis H2 is not fully reliable.

5. Conclusion

At the beginning of the research two hypotheses were set up. On the basis of the results obtained in the analyses it can be concluded that previous education has a significant effect upon academic performance (studying success), which means that the first part of hypothesis H1 is confirmed. The second part of hypothesis H1 is also confirmed because the variable Preparation for classes and in-class activity was included in the model, which means that regular attendance and active class participation influence studying success. Hypothesis H1 was confirmed in the first (pilot) research as well.

In hypothesis H2 it was assumed that gender does not have a significant effect upon student success. In the first case the variable Gender was not included in the model, whereas in the second case it was, which indicates that gender does influence studying success, although not significantly, and that there is room for further research. Hypothesis H2 should therefore be considered undecided, since gender contributes to discrimination between more and less successful students to a certain extent.

On the basis of the results of the discriminant analysis, statistically significant predictors of student academic performance at the Faculty of Organization and Informatics have been established. The analysis of the initial set of 30 variables showed that 8 of them significantly contribute to discrimination between the more successful students and the less successful ones. This refers to the following variables: Admission exam score, Personal responsibility, First grade obtained at the Faculty, Good time management and ability to allocate time for studying, Learning style, and Preparation for classes and in-class activity. To a lesser degree, this also refers to the variables Gender and Persistence in learning (If I don’t understand something after I have read it for the first time, I keep trying until I understand it completely).

6. References

[1] Ardila A. Predictors of university academic performance in Colombia. International Journal of Educational Research. 2001; 35:411–417.

[2] Chamorro-Premuzic T, Furnham A. Personality predicts academic performance: Evidence from two longitudinal university samples. Journal of Research in Personality. 2003; 37:319–338.

[3] Garson G.D. Discriminant Function Analysis, Statnotes: Topics in Multivariate Analysis. 2008. http://www2.chass.ncsu.edu/garson/pa765/statnote.htm [28.09.2008.]

[4] Oreški D, Peharda P. Application of factor analysis in course evaluation. Proceedings of the ITI 2008 30th International Conference on Information Technology Interfaces. 551-556.

[5] Paliwal M, Kumar U.A. A study of academic performance of business school graduates using neural network and statistical techniques. Expert Systems with Applications. 2008. Article in Press. doi:10.1016/j.eswa.2008.11.003.

[6] Pritchard M.E, Wilson G.S. Using Emotional and Social Factors to Predict Student Success. Journal of College Student Development. 2003; 44(1):18-28.

[7] Rencher A.C. Methods of Multivariate Analysis. USA: J. Wiley & Sons; 2002.

[8] Vidaček-Hainš V, Divjak B, Ostroški M. Motivation for Studying and Gender Issue in Motivation. Article in the review process.

Appendix

Table 4. Variables in the model

Variable                                                          Wilks' Lambda   Partial Lambda   F-remove   p-level   Tolerance   1-Tolerance (R2)
First grade obtained at the Faculty                               0.79            0.91             10.67      0.01      0.88        0.12
Good time management and ability to allocate time for studying    0.75            0.96             4.80       0.03      0.94        0.06
Admission exam score                                              0.74            0.97             3.09       0.08      0.89        0.11
Goal-centredness                                                  0.76            0.94             6.91       0.01      0.80        0.20
Personal responsibility                                           0.73            0.97             2.79       0.10      0.81        0.19
Interest in this Faculty                                          0.74            0.97             3.03       0.08      0.87        0.13
Learning style                                                    0.73            0.97             2.81       0.10      0.92        0.08
Preparation for classes and in-class activity                     0.73            0.98             2.17       0.14      0.90        0.10

Table 5. Classification matrix

                            Percent correct   Less successful students   More successful students
Less successful students    74.14             43                         15
More successful students    76.36             13                         42
Total                       75.22             56                         57
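The F-remove column of Table 4 can be approximately reproduced from its Partial Lambda column. The sketch below assumes the usual partial F statistic for one variable adjusted for the other variables in the model, F = ((N - g - q)/(g - 1)) * (1 - partial lambda)/partial lambda, with N = 113 students, g = 2 groups and q = 7 other variables. The paper does not give this formula, so the relation and the function name are assumptions.

```python
def partial_f(partial_lambda: float, n_cases: int = 113, n_groups: int = 2,
              n_other_vars: int = 7) -> float:
    """Partial F for one variable, adjusted for the other variables in the model."""
    df_between = n_groups - 1
    df_within = n_cases - n_groups - n_other_vars
    return (df_within / df_between) * (1.0 - partial_lambda) / partial_lambda


# Partial lambdas as printed in Table 4 (rounded to two decimals).
partial_lambdas = {
    "First grade obtained at the Faculty": 0.91,
    "Goal-centredness": 0.94,
    "Admission exam score": 0.97,
}
for name, value in partial_lambdas.items():
    print(f"{name}: F = {partial_f(value):.2f}")
```

Because the printed partial lambdas are rounded to two decimals, the results (about 10.3, 6.6 and 3.2) only approximate the tabulated values of 10.67, 6.91 and 3.09.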
