classroom fraud case study

Statistics Assignment 4

Classroom Fraud

Group 8

Pia Bakshi

Shruti Shukla

Srilakshmi Anumolu

Vikas Vimal

Introduction

The question asks us to examine both the behavioral and the statistical evidence for fraud on the part of a teacher. We have been given two data sets, one for each of two classes, and asked to determine whether either of them appears to have been collected or recorded fraudulently.

Procedure

There are 22 students in classrooms A and B, each of whom was expected to answer 44 multiple-choice questions. A correct answer is recorded as the letter of the correct option. A wrong answer is recorded as 1, 2, 3 or 4, corresponding to the wrong option chosen (a, b, c or d, respectively). 0 denotes an unanswered question.
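The encoding described above can be turned into per-student counts with a few lines of Python. This is a minimal sketch, not the procedure the group actually used: the function names and the 8-question example string are hypothetical, and the only assumption taken from the text is that letters mark correct answers, the digits 1 to 4 mark wrong options and 0 marks an unanswered question.

```python
def count_correct(responses: str) -> int:
    """Count correct answers in one student's response string.

    Assumed encoding (from the text): a letter marks a correct answer,
    the digits 1-4 mark a wrong option, and 0 marks an unanswered question.
    """
    return sum(ch.isalpha() for ch in responses)


def summarize(responses: str) -> dict:
    """Split a response string into correct / wrong / unanswered counts."""
    correct = count_correct(responses)
    wrong = sum(ch in "1234" for ch in responses)
    unanswered = responses.count("0")
    return {"correct": correct, "wrong": wrong, "unanswered": unanswered}


# Hypothetical 8-question response string, not taken from the data set.
example = "ab2d0c41"
print(summarize(example))
```

The number of correct answers per student obtained this way is the quantity that the first statistical approach groups into ranges.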

Before proceeding with the explanation, we should state the assumption underlying our argument. We have assumed that the number of right answers per student follows a normal distribution, and that any fraudulent activity would make the data depart from it.

We have adopted two complementary methods to arrive at a conclusion:

1. Two Statistical approaches

- On the basis of the number of answers the students answered correctly

- On the basis of the number of students who got the correct answers

2. Behavioral approach

From the mean and the standard deviation of the entire data, we can ascertain the expected probability of an answer being correct in the first case, and of a student being correct in the second.

On the basis of the number of answers the students got correctly:

We have divided the total number of correct answers by each student into ranges of 3 (3-6, 6-9, 9-12 and so on, where 3 is the lower limit and 6 the upper limit of the first range). Next, we found the probability p of a student's number of correct answers falling in each range if the distribution is normal.

To test the goodness of fit of the observed distribution against the normal distribution (chi-square test), we found the expected count (np) for each range by multiplying the number of students (22) by the above-mentioned p. Sn is the number of students actually falling in a range.

$$\chi^2 = \sum \frac{(S_n - np)^2}{np}$$

DOF = 7

χ² (upper limit at 95% confidence) = 16.01

χ² (lower limit at 95% confidence) = 1.69
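The procedure above can be sketched in Python as follows. This is a minimal sketch rather than the group's actual worksheet: it assumes the mean, standard deviation, range limits and Sn values printed in the Classroom A table of this report, and it uses the standard-library `NormalDist` for the normal probabilities. Because those inputs are rounded, the result should land close to, but not exactly on, the value reported for Classroom A.

```python
from statistics import NormalDist

# Values copied from the Classroom A table in this report (rounded there).
mean, stddev, n = 20.23, 5.681, 22
edges = [6, 9, 12, 15, 18, 21, 24, 27, 30]   # limits of the 8 ranges
observed = [1, 1, 3, 2, 1, 6, 6, 2]          # Sn: students per range

dist = NormalDist(mu=mean, sigma=stddev)

# p = P(lower < score <= upper) under the assumed normal distribution
p = [dist.cdf(hi) - dist.cdf(lo) for lo, hi in zip(edges, edges[1:])]

# np: expected number of students in each range
expected = [n * pi for pi in p]

# Chi-square statistic: sum of (Sn - np)^2 / np over the ranges
chi2 = sum((s - e) ** 2 / e for s, e in zip(observed, expected))

# Limits quoted in the text for DOF = 7 at 95% confidence
lower, upper = 1.69, 16.01

print(f"chi-square = {chi2:.2f}")
print("inside quoted range:", lower <= chi2 <= upper)
```

Swapping in the Classroom B mean, standard deviation, n and Sn values from its table repeats the same check for the second class.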

Observations:

Classroom A:

DOF= 7



The χ² obtained from the data for Classroom A is 8.49. The expected range of χ² is 1.69 to 16.01. A value falling in this range lies within the 95% confidence zone, so on this test we can reject the hypothesis that the data is fraudulent.

Upper limit  z (upper)  p(z)   Lower limit  z (lower)  p(z)   p of box  np     Sn  (Sn-np)^2/np
9            -1.976     0.024  6            -2.504     0.006  0.018     0.395  1   0.929
12           -1.448     0.074  9            -1.976     0.024  0.050     1.094  1   0.008
15           -0.920     0.179  12           -1.448     0.074  0.105     2.309  3   0.206
18           -0.392     0.348  15           -0.920     0.179  0.169     3.712  2   0.790
21            0.136     0.554  18           -0.392     0.348  0.207     4.545  1   2.765
24            0.664     0.747  21            0.136     0.554  0.193     4.237  6   0.734
27            1.192     0.883  24            0.664     0.747  0.137     3.008  6   2.977
30            1.720     0.957  27            1.192     0.883  0.074     1.626  2   0.086
Total                                                                          22  χ² = 8.49504

mean 20.23   stddev 5.681   n 22

As is visible from the graph, the expected values follow a normal distribution pattern. The values obtained from the data set, on the other hand, show a positive disparity (blue bar higher than red), indicating a higher count in a given range than the expected value.

This leads us to believe that there is a small possibility of this data being fraudulent.

[Figure: Classroom A, expected count (np) and observed count (Sn) for ranges 1 to 8]

Classroom B:

DOF= 7



The χ² obtained from the data for Classroom B is 2.75. The expected range of χ² is 1.69 to 16.01. A value falling in this range lies within the 95% confidence zone, so on this test we can reject the hypothesis that the data is fraudulent.

Upper limit  z (upper)  p(z)   Lower limit  z (lower)  p(z)   p of box  np     Sn  (Sn-np)^2/np
9            -1.566     0.059  6            -2.171     0.015  0.044     0.786  1   0.059
12           -0.962     0.168  9            -1.566     0.059  0.109     1.968  2   0.001
15           -0.358     0.360  12           -0.962     0.168  0.192     3.459  3   0.061
18            0.246     0.597  15           -0.358     0.360  0.237     4.267  5   0.126
21            0.850     0.802  18            0.246     0.597  0.205     3.694  4   0.025
24            1.455     0.927  21            0.850     0.802  0.125     2.244  1   0.690
27            2.059     0.980  24            1.455     0.927  0.053     0.956  1   0.002
30            2.663     0.996  27            2.059     0.980  0.016     0.286  1   1.783
Total                                                                          18  χ² = 2.74602

mean 16.778   stddev 4.965   n 18

[Figure: Classroom B, expected count (np) and observed count (Sn) for ranges 1 to 8]


As is visible from the graph, the expected values follow a normal distribution pattern. The values obtained from the data set, on the other hand, closely match the expected normal distribution (red bars): there is very little disparity between the blue bars and the red bars, which suggests that the data has not been fraudulently manipulated.

On the basis of the number of students who got the correct answers

We have divided the number of students answering a question correctly into ranges of 3 (0-3, 3-6, 6-9, 9-12 and so on, where 0 is the lower limit and 3 the upper limit of the first range).

Next, we found the probability of a question being answered correctly by a given number of students. We broke the data down into boxes of three students: questions solved by 1, 2 or 3 students fall in the range 0-3, those solved by 4, 5 or 6 students fall in the next box, and so on.

To compare the given data with the expected data, we performed the chi-square test. We arrived at the expected data from the mean and standard deviation of the given data, assuming that it should follow a normal distribution. We found the expected count (np) by multiplying the number of questions (n = 44 in the tables below) by the above-mentioned p. Sn is the number of questions actually falling in a range.

$$\chi^2 = \sum \frac{(S_n - np)^2}{np}$$
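The counting step of this second approach can be sketched as follows. It is a sketch under stated assumptions, not the group's worksheet: the function names and the 4-student, 5-question class are hypothetical, letters are taken to mark correct answers as described in the Procedure, and a question that nobody solved is assumed to fall in the 0-3 box, which the text does not state.

```python
from collections import Counter
from typing import List


def students_correct_per_question(class_responses: List[str]) -> List[int]:
    """For each question, count the students whose answer is a letter (correct)."""
    n_questions = len(class_responses[0])
    return [
        sum(student[q].isalpha() for student in class_responses)
        for q in range(n_questions)
    ]


def bin_label(count: int, width: int = 3) -> str:
    """Map a student count to its range: 1-3 -> '0-3', 4-6 -> '3-6', ...

    Assumption: a question nobody solved (count 0) also goes into '0-3'.
    """
    lower = (max(count, 1) - 1) // width * width
    return f"{lower}-{lower + width}"


# Hypothetical class of 4 students and 5 questions, not the real data.
responses = ["ab1d0", "a2cd3", "a01d4", "1b3da"]

counts = students_correct_per_question(responses)
sn = Counter(bin_label(c) for c in counts)   # Sn: questions per range

print(counts)
print(dict(sn))
```

The resulting Sn values per range are what the chi-square formula compares with the expected counts np.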

Observations:

Classroom A:

DOF= 6



The χ² obtained from the data is 3776.12.

The expected range of χ² is 1.69 to 16.01 (the limits quoted above for 7 DOF; for 6 DOF the standard table gives about 1.24 to 14.45). A value falling outside this range lies outside the 95% confidence zone, so we conclude that the data for Classroom A is fraudulent.

Classroom A

Questions solved by   Sn (actual)  Upper limit  Lower limit  p     np (expected)  (Sn-np)^2/np
0 - 3 students        4            3            0            0.11  5.06           0.22
...
12 - 15 students      11           15           12           0.02  1.06           93.53
...
18 - 21 students      2            21           18           0.00  0.00           2,904.00

n 44   χ² 3,776   mean 6.29   Std Dev 2.93
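As a worked check of the formula, the contributions of two rows of the Classroom A table can be recomputed from the rounded Sn and np values shown there. The small difference from the tabulated 93.53 is presumably because the table was computed from an unrounded np; that is an assumption, not something the report states.

$$\frac{(4 - 5.06)^2}{5.06} = \frac{1.1236}{5.06} \approx 0.22, \qquad \frac{(11 - 1.06)^2}{1.06} = \frac{98.8036}{1.06} \approx 93.21$$

The 18 - 21 row shows why the total is so large: its expected count np rounds to 0.00, so the 2 questions observed in that range alone contribute 2,904 to χ².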

[Figure: Classroom A, Expected (np) vs. Actual (Sn) number of questions for the ranges 0-3, 3-6, 6-9, 9-12, 12-15, 15-18 and 18-21 students]


As is visible from the graph, the expected values follow a normal distribution pattern with a positive skew.

The values obtained from the data set, on the other hand, show a negatively skewed disparity (blue bar higher than green), indicating an increased ability to answer correctly in a given range compared with the expected value. This leads us to believe that there is a possibility of this data being fraudulent.

Classroom B:

DOF = 6

The χ² obtained from the data for Classroom B is 5.49. The expected range of χ² is 1.69 to 16.01, as stated above. A value falling in this range lies within the 95% confidence zone, so we can reject the hypothesis that the data is fraudulent.

[Figure: Classroom B, Expected (np) vs. Actual (Sn) number of questions for the ranges 0-3, 3-6, 6-9, 9-12, 12-15, 15-18 and 18-21 students]


As is visible from the graph, the expected values follow a normal distribution pattern with a negative skew.

The values obtained from the data set, on the other hand, show a similar negative skew (green bar higher than blue). This leads us to believe that there is no statistically significant possibility of this data being fraudulent.

Classroom B

Questions solved by   Sn (actual)  Upper limit  Lower limit  p     np (expected)  (Sn-np)^2/np
...
3 - 6 students        14           6            3            0.23  10.00          1.60
6 - 9 students        7            9            6            0.24  10.36          1.09
...





n 44

χ² 5.49

Behavioral Approach

We base our deduction on a simple premise: the examination is intended to reflect the class’s academic prowess in the subject and, by corollary, the teacher’s ability and efficiency.

If we set the statistical analysis aside and consider the psychological or behavioral side of the issue, we arrive at a simple logical argument: poor performance by the class in an examination also reflects poorly on the teacher’s efficiency and effectiveness.

The question clearly states that there is definite fraudulent activity in the examination conducted in either Classroom A or Classroom B. If the teacher were to manipulate the results at all, she/he would move them in the direction most favorable to her/him, that is, towards an improved class performance.

In light of that, she/he would raise the overall results of her/his class. As the graphs comparing actual performance with expected performance indicate, the results of Classroom A seem to have been tampered with.

It is also important to note that increased class performance (in terms of marks or student output) suggests efficient performance on the part of the teacher. From our overall study of the data (based on the supposition that it follows a normal distribution), it is possible to assume that there is a chance of fraudulent activity, and the results obtained lead us to believe that this manipulation occurred in Classroom A.
