Classroom Fraud Case Study
A classroom fraud case study using the chi-squared test.
Statistics Assignment 4
Classroom Fraud
Group 8
Pia Bakshi
Shruti Shukla
Srilakshmi Anumolu
Vikas Vimal
Introduction
The question asks us to examine both the behavioral and the statistical origins
and implications of possible fraud by a teacher. We have been asked to assess
the data sets of two classes and determine whether either of them appears to
have been fraudulently manipulated.
Procedure
There are 22 students in classrooms A and B who were expected to answer 44
multiple choice questions. A correct answer is recorded as the letter of the
correct option. A wrong answer is recorded as 1, 2, 3 or 4, corresponding to
the wrong option a, b, c or d, respectively. A 0 marks an unanswered question.
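The encoding described above can be tallied mechanically. The following is a minimal Python sketch, assuming each student's answers are stored as a single string with one character per question. The function name `tally` and the sample string are hypothetical and are not taken from the assignment data.

```python
def tally(answers):
    """Return (correct, wrong, unanswered) counts for one student's answers."""
    correct = sum(ch.isalpha() for ch in answers)    # letters mark correct answers
    wrong = sum(ch in "1234" for ch in answers)      # 1-4 mark wrong options a-d
    unanswered = answers.count("0")                  # 0 marks a skipped question
    return correct, wrong, unanswered

# Hypothetical 8-question example (a real row would have 44 characters).
sample = "ab1c0d2a"
print(tally(sample))
```

The first element of each tuple is the per-student number of right answers that the first statistical approach works with.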
Before proceeding, we state the assumption underlying our argument: the number
of right answers per student follows a normal distribution, and any fraudulent
activity would make the data depart from that pattern.
We adopted two complementary methods to arrive at a conclusion:
1. Two Statistical approaches
- On the basis of the number of answers the students answered correctly
- On the basis of the number of students who got the correct answers
2. Behavioral approach
From the mean and the standard deviation of the data, we can work out the
expected probability of an answer being correct in the first case, and of a
student being correct in the second.
On the basis of the number of answers the students got correctly:
We divided each student's total number of correct answers into ranges of
width 3 (3-6, 6-9, 9-12 and so on, where 3 is the lower limit and 6 the upper
limit of the first range). Next, we found the probability p of a student's
score falling in each range if the scores follow a normal distribution.
To test the goodness of fit of the observed scores against the normal
distribution (chi-square test), we found the expected count (np) for each range
by multiplying the number of students (22) by the above-mentioned p. Sn is the
observed number of students falling in a range.
\[ \chi^2 = \sum \frac{(S_n - np)^2}{np} \]
DOF= 7
X2 (Upper limit @95% confidence) = 16.01
X2 (Lower limit @95% confidence) = 1.69
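The two limits quoted above can be sanity-checked by evaluating the chi-square cumulative distribution at each of them. The sketch below is an illustration only, not the method used in the report: the helper `chi2_cdf` is a hypothetical name, and it uses the standard power series for the regularized lower incomplete gamma function, which is adequate for moderate values such as these limits but not for very large statistics.

```python
import math

def chi2_cdf(x, dof):
    """P(X <= x) for a chi-square variable with `dof` degrees of freedom."""
    if x <= 0:
        return 0.0
    a, half_x = dof / 2.0, x / 2.0
    # Series: sum over k of half_x**k / (a * (a + 1) * ... * (a + k))
    term = 1.0 / a
    total = term
    k = 1
    while term > 1e-15 * total:
        term *= half_x / (a + k)
        total += term
        k += 1
    # Multiply by half_x**a * exp(-half_x) / Gamma(a), computed in log space.
    return total * math.exp(a * math.log(half_x) - half_x - math.lgamma(a))

lower_limit, upper_limit = 1.69, 16.01   # limits quoted in the text for DOF = 7
tail_below = chi2_cdf(lower_limit, 7)
tail_above = 1.0 - chi2_cdf(upper_limit, 7)
print(round(tail_below, 3), round(tail_above, 3))
```

Each tail should come out close to 0.025, so the interval between the two limits covers roughly 95% of the distribution.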
Observations:
Classroom A:
DOF= 7
X2 (Upper limit @95% confidence) = 16.01
X2 (Lower limit @95% confidence) = 1.69
X2 obtained from the data for Classroom A is 8.49. The expected range of X2 is
1.69 to 16.01. A value in this range lies within the 95% confidence zone, so on
this test we reject the hypothesis that the data is fraudulent.
Upper limit  z (upper)  p(z) upper  Lower limit  z (lower)  p(z) lower  p of box  np     Sn  (Sn-np)^2/np
9            -1.976     0.024       6            -2.504     0.006       0.018     0.395  1   0.929
12           -1.448     0.074       9            -1.976     0.024       0.050     1.094  1   0.008
15           -0.920     0.179       12           -1.448     0.074       0.105     2.309  3   0.206
18           -0.392     0.348       15           -0.920     0.179       0.169     3.712  2   0.790
21           0.136      0.554       18           -0.392     0.348       0.207     4.545  1   2.765
24           0.664      0.747       21           0.136      0.554       0.193     4.237  6   0.734
27           1.192      0.883       24           0.664      0.747       0.137     3.008  6   2.977
30           1.720      0.957       27           1.192      0.883       0.074     1.626  2   0.086
Total                                                                                    22  X2 = 8.49504
mean 20.23   stddev 5.681   n 22
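The Classroom A figures above can be approximately reproduced in a few lines. This is a sketch under stated assumptions: it uses the rounded mean, standard deviation, range limits and Sn values printed in the table, and Python's standard-library `statistics.NormalDist` in place of whatever spreadsheet functions the report used, so the result may differ from the reported 8.49504 in the last digits.

```python
from statistics import NormalDist

# Figures as printed in the Classroom A table.
mean, std_dev, n_students = 20.23, 5.681, 22
edges = [6, 9, 12, 15, 18, 21, 24, 27, 30]   # range limits
observed = [1, 1, 3, 2, 1, 6, 6, 2]          # Sn for each range

normal = NormalDist(mean, std_dev)
chi_square = 0.0
for lower, upper, s_n in zip(edges, edges[1:], observed):
    p = normal.cdf(upper) - normal.cdf(lower)      # probability of the range
    expected = n_students * p                      # expected count np
    chi_square += (s_n - expected) ** 2 / expected

# Compare against the limits quoted for DOF = 7.
in_range = 1.69 <= chi_square <= 16.01
print(round(chi_square, 2), in_range)
```

Swapping in the Classroom B mean, standard deviation, n and Sn values gives the corresponding check for the second class.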
As the graph shows, the expected values follow a normal distribution pattern.
The values obtained from the data set, on the other hand, show a positive
disparity (higher blue bar than red), indicating a higher number of correct
answers in a given range than the expected value.
This leads us to believe that there is a small possibility of this data being
fraudulent.
[Figure: Classroom A, expected (np) and observed (Sn) number of students in each of the 8 score ranges]
Classroom B:
DOF= 7
X2 (Upper limit @95% confidence) = 16.01
X2 (Lower limit @95% confidence) = 1.69
X2 obtained from the data for Classroom B is 2.75. The expected range of X2 is
1.69 to 16.01. A value in this range lies within the 95% confidence zone, so on
this test we reject the hypothesis that the data is fraudulent.
Upper limit  z (upper)  p(z) upper  Lower limit  z (lower)  p(z) lower  p of box  np     Sn  (Sn-np)^2/np
9            -1.566     0.059       6            -2.171     0.015       0.044     0.786  1   0.059
12           -0.962     0.168       9            -1.566     0.059       0.109     1.968  2   0.001
15           -0.358     0.360       12           -0.962     0.168       0.192     3.459  3   0.061
18           0.246      0.597       15           -0.358     0.360       0.237     4.267  5   0.126
21           0.850      0.802       18           0.246      0.597       0.205     3.694  4   0.025
24           1.455      0.927       21           0.850      0.802       0.125     2.244  1   0.690
27           2.059      0.980       24           1.455      0.927       0.053     0.956  1   0.002
30           2.663      0.996       27           2.059      0.980       0.016     0.286  1   1.783
Total                                                                                    18  X2 = 2.74602
mean 16.778   stddev 4.965   n 18
[Figure: Classroom B, expected (np) and observed (Sn) number of students in each of the 8 score ranges]
As the graph shows, the expected values follow a normal distribution pattern.
The values obtained from the data set closely resemble the expected normal
distribution (red bars), and there is very little disparity (difference between
the blue bars and the red bars), suggesting that fraudulent manipulation of the
data is unlikely.
On the basis of the number of students who got the correct answers
We divided the number of students into ranges of 3 (0-3, 3-6, 6-9, 9-12 and so
on, where 0 is the lower limit and 3 the upper limit of the first range),
according to how many students answered each question correctly.
Next, we found the probability of a question being answered correctly by a
given number of students. We grouped the data into boxes of three students:
the questions solved by 1, 2 or 3 students fall in the range 0-3, those solved
by 4, 5 or 6 students fall in the next box, and so on.
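The boxing rule described above can be sketched as follows. The function name `box_label` and the sample counts are hypothetical. Placing a question that no student answered correctly in the 0-3 box is an assumption, since the text only mentions counts of 1 and above.

```python
import math
from collections import Counter

def box_label(num_students, width=3):
    """Map a count of correct students to its range label, e.g. 4 -> '3-6'."""
    # Counts 1..3 fall in '0-3', 4..6 in '3-6', and so on.
    # Assumption: a count of 0 is also placed in '0-3'.
    index = max(math.ceil(num_students / width) - 1, 0)
    return f"{index * width}-{(index + 1) * width}"

# Hypothetical data: students answering each of six questions correctly.
correct_per_question = [2, 5, 6, 7, 12, 0]
boxes = Counter(box_label(c) for c in correct_per_question)
print(dict(boxes))
```

The resulting counts per box play the role of Sn in the tables for this approach.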
To test the association between the number of students and the probability of
a question being answered correctly, we performed the chi-square test on the
given data and the expected data. We arrived at the expected data from the mean
and standard deviation of the given data, assuming that it should follow a
normal distribution.
We obtained the expected count (np) for each range by multiplying the number of
questions (n = 44 in the tables below) by the above-mentioned p. Sn is the
observed number of questions falling in a range.
\[ \chi^2 = \sum \frac{(S_n - np)^2}{np} \]
Observations:
Classroom A:
DOF= 6
X2 (Upper limit @95% confidence) = 14.449
X2 (Lower limit @95% confidence) = 1.237
The X2 obtained from the data is 3776.12.
The expected range of X2 is 1.237 to 14.449. A value outside this range lies
outside the 95% confidence zone, so we accept the hypothesis that the data for
Classroom A is fraudulent.
Classroom A
No. of questions solved by  Sn (actual)  Upper limit  Lower limit  p     np (expected)  (Sn-np)^2/np
0 - 3 students              4            3            0            0.11  5.06           0.22
3 - 6 students              5            6            3            0.33  14.53          6.25
6 - 9 students              7            9            6            0.36  15.93          5.00
9 - 12 students             8            12           9            0.15  6.66           0.27
12 - 15 students            11           15           12           0.02  1.06           93.53
15 - 18 students            7            18           15           0.00  0.06           766.85
18 - 21 students            2            21           18           0.00  0.00           2,904.00
n 44
X2 3,776
mean 6.29
Std Dev 2.93
[Figure: Classroom A, Expected (np) and Actual (Sn) number of questions for each range of students, 0-3 to 18-21]
As the graph shows, the expected values follow a normal distribution pattern
with a positive skew.
The values obtained from the data set, on the other hand, show a negatively
skewed disparity (higher blue bar than green), indicating that more students
answered correctly in the upper ranges than expected. This leads us to believe
that there is a possibility of this data being fraudulent.
Classroom B:
X2 obtained from the data for Classroom B is 5.49.
DOF= 6
X2 (Upper limit @95% confidence) = 14.449
X2 (Lower limit @95% confidence) = 1.237
A value in this range lies within the 95% confidence zone, so on this test we
reject the hypothesis that the data is fraudulent.
[Figure: Classroom B, Expected (np) and Actual (Sn) number of questions for each range of students, 0-3 to 18-21]
As the graph shows, the expected values follow a normal distribution pattern
with a negative skew.
The values obtained from the data set show a similar negative skew (higher
green bar than blue). This leads us to believe that there is no statistically
significant possibility of this data being fraudulent.
Classroom B
No. of questions solved by  Sn (actual)  Upper limit  Lower limit  p     np (expected)  (Sn-np)^2/np
0 - 3 students              6            3            0            0.15  6.67           0.07
3 - 6 students              14           6            3            0.23  10.00          1.60
6 - 9 students              7            9            6            0.24  10.36          1.09
9 - 12 students             10           12           9            0.17  7.41           0.90
12 - 15 students            6            15           12           0.08  3.66           1.49
15 - 18 students            1            18           15           0.03  1.25           0.05
18 - 21 students            0            21           18           0.01  0.29           0.29
n 44
X2 5.49
mean 6.29
Std Dev 2.93
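As a check on the Classroom B statistic, the per-range terms can be summed directly from the Sn and np columns. This sketch uses the rounded values printed in the table, so it matches the reported 5.49 only up to rounding. The variable names are ours.

```python
# Observed (Sn) and expected (np) counts as printed in the Classroom B table.
observed = [6, 14, 7, 10, 6, 1, 0]
expected = [6.67, 10.00, 10.36, 7.41, 3.66, 1.25, 0.29]

# Chi-square statistic: sum of (Sn - np)^2 / np over all ranges.
chi_square = sum((s - e) ** 2 / e for s, e in zip(observed, expected))

# Limits quoted in the text for DOF = 6.
lower_limit, upper_limit = 1.237, 14.449
within_limits = lower_limit <= chi_square <= upper_limit
print(round(chi_square, 2), within_limits)
```

The same sum does not work for the Classroom A table, because its last expected count is printed as 0.00 after rounding.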
Behavioral Approach
We base our deduction on a simple premise: the examination is intended to
reflect the class's academic ability in the subject and, as a corollary, the
teacher's ability and efficiency.
Setting the statistics aside and looking at the psychological or behavioral
side of the question, we arrive at a simple logical argument: poor performance
by the class in an examination also reflects poorly on the teacher's efficiency
and effectiveness.
The question clearly states that there is definite fraudulent activity in the
examination, in either Classroom A or Classroom B. If the teacher were to
manipulate the results, she/he would move them in the direction most favorable
to her/him, that is, toward improved class performance.
She/he would therefore raise the cumulative results of her/his class, and the
graphs comparing actual with expected performance indicate that the results of
A seem tampered with.
It is also important to note that increased class performance (in terms of
marks or student output) suggests efficient performance on the part of the
teacher. Taking the data as a whole, and on the supposition that it should
follow a normal distribution, it is possible to assume that there is a chance
of fraudulent activity, and our study of the data leads us to believe that this
manipulation occurred in Classroom A.