A SOCIAL MEDIA BASED STUDENT LEARNING EXPERIENCE ANALYZER
USING A TEXT MINING TECHNIQUE
BY
OFULUE AMAKA MARY
12CH014362
A PROJECT SUBMITTED TO THE DEPARTMENT OF COMPUTER AND
INFORMATION SCIENCES IN THE COLLEGE OF SCIENCE AND
TECHNOLOGY, COVENANT UNIVERSITY, OTA.
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE AWARD
OF THE BACHELOR OF SCIENCE (B.Sc.) HONOURS DEGREE IN
MANAGEMENT INFORMATION SYSTEM.
MAY, 2016
CERTIFICATION
This is to certify that this project was carried out by OFULUE AMAKA (12CH014362)
in the Department of Computer and Information Sciences, College of Science and
Technology, Covenant University, Ota.
DR. OLAWANDE DARAMOLA
.................................................................
Main Supervisor Signature & Date
MR. AZUBUIKE EZENWOKE .................................................................
Co-Supervisor Signature & Date
DR. ARIYO ADEBIYI .................................................................
HOD, CIS Signature & Date
DEDICATION
I dedicate this project to God Almighty for His grace and love over my life, divine speed,
favour and health.
I also dedicate this project to my parents, siblings and relatives for their financial support,
wise counsel and words of encouragement.
ACKNOWLEDGEMENT
I genuinely appreciate God for his word concerning my life and his grace during the
course of this project. I am most grateful to my parents and relatives who showed their
sincere interest in the success of my project through their support, prayer and words of
encouragement.
I also appreciate Mr. Ezenwoke Azubuike, my project supervisor for his time, creative
ideas, and study materials which he readily made available for the success of this project.
Finally, I am thankful to Bishop David Oyedepo, the Chancellor of Covenant University,
Dr. A. A. Adebiyi, the Head of the Department of Computer and Information Sciences,
and every lecturer in the department.
TABLE OF CONTENTS
Certification..........................................................................................................................i
Dedication............................................................................................................................ii
Acknowledgement..............................................................................................................iii
List of Tables.....................................................................................................................vii
Table of Figures................................................................................................................viii
Abstract...............................................................................................................................ix
Chapter One: Introduction..................................................................................................1
1.1 Background of the Study....................................................................................1
1.2 Statement of the Problem......................................................................................2
1.3 Aim and Objectives of the Study..........................................................................2
1.4 Research Methodology.........................................................................................3
1.5 Significance of the Study....................................................................................3
1.6 Limitation of the Study.......................................................................................4
1.7 Arrangement of Thesis..........................................................................................4
Chapter Two: Literature Review........................................................................................5
2.1 Introduction...........................................................................................................5
2.2 Overview of Social Media...................................................................................5
2.3 Evolution of Social Media....................................................................................6
2.4 Overview of Popular Social Media Platforms......................................................6
2.5 Overview of Text Mining.....................................................................................8
2.5.1 Text Mining and Data Mining.......................................................................8
2.5.2 Areas of Text Mining.....................................................................................9
2.6 Text Mining Techniques.....................................................................................13
2.6.1 Naïve Bayes.................................................................................................13
2.6.2 Support Vector Machine..............................................................................16
2.6.3 K-Nearest Neighbour...................................................................................18
2.6.4 Decision Tree...............................................................................................19
2.6.5 Neural Network...........................................................................................23
2.7 Review of Existing System.................................................................................25
2.7.1 Mining student data to analyse learning behaviour.....................................26
2.7.2 Mining social media data to understand student learning experience.........26
2.7.3 Mining educational data to analyse students’ performance..........................27
Chapter Three: System Modeling and Design..................................................................29
3.1 Introduction.........................................................................................................29
3.2 System Requirement...........................................................................................29
3.3 System Design.....................................................................................................29
3.3.1 Unified modelling language........................................................................30
3.4 Overview of Naïve Bayes Classifier..................................................................32
3.4.1 Posterior Probability....................................................................................32
3.4.2 Class-Conditional Probabilities...................................................................33
3.4.3 Prior Probabilities........................................................................................33
3.4.4 Multi-Variate Bernoulli Naïve Bayes..........................................................34
3.4.5 Multinomial Naïve Bayes............................................................................34
3.4.6 Performance of Multi-Variate Bernoulli and Multinomial Model..............35
3.4.7 Continuous Variables...................................................................................35
3.4.8 Eager and Lazy Learning Algorithms..........................................................36
3.4.9 The Bag of Words Model..............................................................................36
3.5 Text Pre-Processing............................................................................................37
3.5.1 Tokenization................................................................................................37
3.5.2 Stop Words..................................................................................................37
3.5.3 Stemming and Lemmatizing........................................................................37
3.5.4 N-grams.......................................................................................................37
3.6 Workflow of the Project......................................................................................38
Chapter Four: System Implementation and Evaluation....................................................40
4.1 Introduction.........................................................................................................40
4.2 System Requirements..........................................................................................40
4.2.1 Hardware Requirements..............................................................................40
4.2.2 Software Requirements................................................................................41
4.3 Implementation Tools.........................................................................................41
4.4 System Modules and Interfaces..........................................................................42
4.4.1 Home Page...................................................................................................42
4.4.2 Result Page..................................................................................................43
4.4.3 Result and Interpretation..............................................................................44
4.4.4 Recommendation.........................................................................................44
4.5 Data Gathering Process.......................................................................................45
4.5.1 Train Dataset................................................................................................46
Chapter Five.......................................................................................................................51
Summary, Recommendation and Conclusion...................................................................51
5.1 Summary.............................................................................................................51
5.2 Recommendation..................................................................................................51
5.3 Conclusion..........................................................................................................52
LIST OF TABLES
Table 2.1 Popular social media platforms............................................................................6
Table 2.2 Train Dataset from Document...........................................................................14
Table 2.3 Frequency table for positive category...............................................................14
Table 2.4 Frequency table for negative category...............................................................15
Table 2.5 Weather dataset..................................................................................................20
Table 2.6 Comparison of various classification methods based on artificial neural
networks (adapted from Sasithra & Saravanan, 2014)......................................................24
Table 4.1 Server-side hardware requirements....................................................................40
Table 4.2 Client-side hardware requirements....................................................................40
Table 4.3 Development Software Requirements...............................................................41
Table 4.4 Web Client Software Requirements..................................................................41
TABLE OF FIGURES
Figure 2-1 Seven practice areas of text mining...............................................................9
Figure 2-2 Artificial neural network (adapted from Sasithra & Saravanan, 2014)........23
Figure 2-3 Association rules graph for students with grade “fail” using Arviewer.......26
Figure 2-4 Number of tweets for each issue detected from the Purdue tweet
collection............................................................................................................................27
Figure 3-1 Use case diagram showing the actions performed by management or
educators............................................................................................................................30
Figure 3-2 Activity diagram showing the flow of activities involved in analysing
data.....................................................................................................................................31
Figure 3-3 Workflow of social media data integrated with qualitative analysis and data
mining algorithm...............................................................................................................37
Figure 4-1 Homepage of the application.........................................................................42
Figure 4-2 A bar chart showing the category ratio of the learning experience of students
offering GST121................................................................................................................42
Figure 4-3 Cross-section of comments used for classification and their corresponding
category..............................................................................................................................43
Figure 4-4 Screenshot of the Covenant University e-learning Moodle homepage.........44
Figure 4-5 Screenshots of some comments posted by students on the discussion
forum..................................................................................................................................45
ABSTRACT
The quality of teaching and learning in any institution can be traced to the learning
experiences of its students.
Traditional methods of evaluating student learning experiences have limitations such as
lack of flexibility, a degree of subjectivity, and no means of verifying that respondents
are truthful. In contrast, students freely share their worries, struggles and concerns about
their learning experiences on informal channels such as Facebook, Twitter and discussion
forums.
The data available in such environments is massive and requires automated means, such
as text mining techniques, to provide important information on students’ experiences
during their learning process.
The aim of this research is to design and implement a forum-based student learning
experience analyzer using a text mining technique.
The system will help the management and educators of Covenant University make
informed decisions concerning the performance of students.
CHAPTER ONE: INTRODUCTION
1.1 BACKGROUND OF THE STUDY
The Academic Ranking of World Universities (2015) shows that Harvard University has
held the top position in the annual worldwide ranking of universities since the list began.
Other institutions such as Stanford University, Princeton University, the University of
Cambridge, the Massachusetts Institute of Technology (MIT) and the University of
California, to mention a few, have been ranked among the best because of an important
attribute they share: the quality of teaching and learning.
“It was found that there is a significant correlation between the performance of students
and satisfaction with academic process and facilities provided by the institution”
(Karemera, 2003). The learning experiences of students during their course of study are
among the most pervasive sources of information about the quality of teaching and learning in
an institution. Feedback on the learning experiences of students is usually obtained
through formal methods such as questionnaires. However, driven by the need to express
themselves, students also communicate their opinions on informal channels.
On various social media, students would usually share their worries, concerns,
excitement, happiness and struggle about their learning experiences (Pagare, 2014). In
particular, students express themselves on discussion forums. The volume of data
available in such environments is massive and requires automated means like text mining
techniques to provide valuable information on students’ experiences during their learning
process (Pagare, 2014).
“Text Mining is the process of discovering hidden and useful patterns from unstructured
text documents” (Patel, 2015). Text mining is also known as Knowledge Discovery in
Text, and some specific techniques for achieving it include K-nearest neighbour,
Maximum Entropy, Neural Networks, Decision Trees, Support Vector Machines,
Rocchio’s algorithm and Naïve Bayes.
Employing text mining techniques to derive useful information from students’ informal
conversations on social media platforms would lead to a comparison graph that shows the
factors affecting the learning experiences of students offering a particular course, as well
as recommended solutions to the management of the institution on how to enhance the
quality of teaching and learning.
1.2 STATEMENT OF THE PROBLEM
Evaluation of student learning experiences is of interest to those who teach and are
accountable for the development and accreditation of courses.
Traditionally, methods such as surveys, focus groups and student evaluation of teaching
(SET) questionnaires have been used as instruments to evaluate the learning experiences
of students in order to understand the factors affecting their performance. However, these
methods have the following limitations:
Flexibility: questionnaires are structured instruments and so allow very little
flexibility.
Level of subjectivity: the opinions and feelings of respondents are often not
captured because the options in the questionnaire are pre-defined.
Truthfulness: there is no way to tell if a respondent is being truthful.
Given these concerns, there is a need to consider alternative methods for
evaluating student learning experiences based on massive user generated content
available on social media.
1.3 AIM AND OBJECTIVES OF THE STUDY
The aim of this study is to design and implement a forum-based student learning
experience analyzer using a Naïve Bayes classifier algorithm.
In order to attain the aim of designing a social media based student learning experience
analyzer, the following are objectives of this study:
To extract information that pertains to the educational life of students from an
informal electronic platform.
To preprocess the extracted data in order to obtain the relevant information needed for
the implementation process.
To model the system using UML diagrams.
To implement a student learning experience analyzer using a Naïve Bayes
classifier algorithm.
1.4 RESEARCH METHODOLOGY
Literature Review: Various articles, books, journals and research papers would
be studied. Reviews of existing projects relevant to this work would also be
considered.
Data Collection: A discussion forum will be created on the Moodle platform (the
Covenant University e-learning management system) for students to post
comments about their learning experiences for a particular course.
Modeling: A simplified representation of the social media based student learning
experience analyzer would be done using Unified Modeling Language (UML)
diagrams.
Implementation: Since the system is web-based, implementation would be done
using HTML, Scikit-Learn and Python.
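The classification step of the methodology can be sketched in Scikit-Learn as follows. The comments, labels and category names below are invented placeholders for illustration, not the dataset actually gathered for this project.

```python
# Illustrative sketch only: the comments and labels are invented
# placeholders, not the project's actual forum data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_comments = [
    "the lectures are clear and helpful",
    "I enjoy the tutorials for this course",
    "the workload is too heavy and stressful",
    "I cannot keep up with the pace of lectures",
]
train_labels = ["positive", "positive", "negative", "negative"]

# Convert raw text into token-count vectors (the bag-of-words model).
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_comments)

# Fit a multinomial Naive Bayes classifier on the labelled comments.
classifier = MultinomialNB()
classifier.fit(X_train, train_labels)

# Classify a new, unseen comment.
X_new = vectorizer.transform(["the tutorials are very helpful"])
predicted = classifier.predict(X_new)[0]
```

In the actual system, the training comments would come from the Moodle discussion forum described above.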
1.5 SIGNIFICANCE OF THE STUDY
This study is significant in solving the problem of improvement in the quality of teaching
and learning delivered to students and the performance of students in Covenant
University. The benefits of this study are:
To identify factors that affect the performance of students offering GST121.
To help management of Covenant University make informed decisions towards
improving quality of teaching and students’ performance.
To recommend solutions to the identified factors that affect student learning
experiences.
1.6 LIMITATION OF THE STUDY
This study is limited in scope to courses offered by freshmen of Covenant University.
1.7 ARRANGEMENT OF THESIS
This project report consists of five chapters.
Chapter 1: This chapter gives a detailed background study of the system, the aim and
objectives, the research methodology, the significance of study and limitations of the
study.
Chapter 2: This chapter contains extensive information on the project from existing
projects and reviews from journals and books.
Chapter 3: This chapter focuses on the system analysis and design. It contains all the
diagrammatic models that would help give a structure of the system, extensive
information about the classifier algorithm implemented and steps involved in data pre-
processing.
Chapter 4: This chapter contains detailed information of the implementation of the
system, such as the programming language used, screen shots and the dataset used for
system deployment.
Chapter 5: This chapter contains recommendations and concluding remarks.
CHAPTER TWO: LITERATURE REVIEW
2.1 INTRODUCTION
The quality of teaching and learning is a top priority for most institutions. Attaining such
a goal requires relevant information about the learning experiences of students.
Social media sites are platforms where people with similar interests share, upload and
post content. Formal evaluation tools like questionnaires have several limitations, one of
which is a lack of flexibility compared to informal channels where students freely
express themselves.
Data on such channels is valuable for making decisions on how to improve students’
achievement. To this end, there is a need for a processing method or technique to
evaluate the learning experiences of students from these channels.
This chapter gives extensive information about key terms that are relevant to the research
study and existing systems in this field.
2.2 OVERVIEW OF SOCIAL MEDIA
The term “social media” refers to internet-based platforms that allow users to share and
create content or join online communities. Classifications of social media include the
following:
Social Network Sites: these sites allow users to view a list of other users with
whom they share a connection. Examples of social network sites are Facebook,
LinkedIn and Friendster (Boyd & Ellison, 2007).
Bookmarking Sites: are social software tools that allow users to submit, classify,
localise and share their bookmarked webpages to a hub site where they can be
tagged by other users. Bookmarking is the process that is used by people to
organise, arrange, maintain and preserve links to web pages. Examples of
bookmarking sites are Delicious, Pinterest and Hacker News.
Social News Sites: are sites that provide users with quick access to a variety of
news articles. Articles and news from other websites are aggregated on the
social news site, which enables users to share content and interact with one
another. Examples are Digg, Slashdot, Newsvine and Mixx.
Media Sharing Sites: Media sharing sites provide services that allow users to
upload and share pictures and videos. Some of these services have social features
such as commenting, profiles, etc. Examples of media sharing sites are
SlideShare, YouTube and Flickr.
Micro blogging: This involves publishing digital content such as text, pictures,
links, and videos in small pieces on the internet. “Micro blogging has become
common among groups of friends and professional co-workers who often update
content and follow each other’s posts thereby creating a sense of online
community. Popular examples include Twitter and Tumblr” (Educause, 2009).
Forums: A forum is a section of a website that enables users to connect and
interact with each other by commenting in response to a published post.
2.3 EVOLUTION OF SOCIAL MEDIA
Websites that enabled users to share, create and upload content began to emerge in the
late 1990s. This was a result of the popularity of broadband internet. In 1997,
SixDegrees.com, the first social network site, was launched. From 2002 onward, a large
number of social network sites were created, including Myspace and Friendster.
In recent years, social media has gained widespread acceptance. In July 2012, Twitter
had an estimated 517 million users worldwide (Dewing, 2012).
2.4 OVERVIEW OF POPULAR SOCIAL MEDIA PLATFORMS
An overview of popular social media platforms is presented in Table 2.1.
Table 2.1 Popular social media platforms
SOCIAL MEDIA DESCRIPTION
Facebook Facebook is a social networking channel that enables users to send messages
to friends, upload videos and pictures, and create profiles and groups.
Google+ Google+ is a networking site with features such as personal profiles for
uploading photos and videos, status updates, and “communities” for sharing
information with several people. It also has special features like “hangouts”
for video chatting with one person or many people.
Twitter Twitter is a microblogging service that enables users to read and send
messages known as “tweets” to a number of followers.
LinkedIn LinkedIn is used for professional networking. Network contacts are called
“connections”.
Blogs Blogs typically focus on a specific subject and provide users with a comment
area to discuss each posting.
Pinterest Pinterest is a social media site that allows users to share photos and manage
photo collections. Users can browse other pin boards for images or “like”
photos.
YouTube YouTube is a social media site that allows users to share videos. Users can
create their own “channels” on YouTube to organize their videos.
Flickr Flickr is a social media site that allows users to share and embed
photographs. It is also used by bloggers for hosting images and videos.
Instagram Instagram is a social networking site that enables users to upload photos and
videos as well as apply digital filters to them.
2.5 OVERVIEW OF TEXT MINING
Text Mining is an evolving field in computer science that is used to extract relevant
information from unstructured textual data through the identification and study of
patterns.
“The phrase ‘text mining’ refers to any system that analyses a huge amount of text and
detects linguistic usage patterns in order to extract useful information” (Sebastiani,
2002).
According to Chen (2001), text mining performs various search functions, categorization
and linguistic analysis. Text Mining can simply be defined as the process involved in
analysing text to obtain information that is useful for a specific goal.
2.5.1 Text Mining and Data Mining
Data mining involves the identification of patterns in data, while text mining involves
the identification of patterns in text. Data mining is the extraction of useful information
from data (Witten & Frank, 2000).
“Text Mining as exploratory data analysis is a method of (building and) using software
systems to support researchers in deriving new and relevant information from large text
collection. It is a partially automated process in which the researcher is still involved,
interacting with the system. The interaction is a cycle based on the system assumptions,
and the user either utilizes or ignores those suggestions and decides on the next move”
(Hearst, 1999).
Data Mining is a phase in Knowledge Discovery from Data (KDD). Knowledge
Discovery from Data is concerned with the acquisition of useful knowledge from data.
“Data mining requires interaction between the data mining tools and the researcher and
so, may be considered as a computerized process because data mining tools automatically
search the data for anomalies thereby identifying problems that have not yet been clearly
stated by the end user, while data analysis ‘relies on the end users to select the data,
define the problem and instigate the appropriate data analysis to produce the information
that which helps to solve problems that they uncovered’” (Rob and Coronel, 2002).
2.5.2 Areas of Text Mining
Text mining incorporates seven practice areas: information retrieval, web mining,
document clustering, document classification, information extraction, concept extraction
and natural language processing.
Figure 2.1 Seven practice areas of text mining
2.5.2.1 Information Retrieval
Information retrieval (IR) is the process of finding material (usually documents) of an
unstructured nature (usually text) that satisfies an information need from within large
collections (usually stored on computers).
Information retrieval is quickly becoming the dominant form of information access,
surpassing conventional database-style searching. Information retrieval can be
considered an augmentation of document retrieval, in which documents are processed to
consolidate or extract the specific information requested by the user. An IR system
allows us to reduce the set of documents relevant to a particular problem. The most
recognised information retrieval systems are web search tools such as Google. IR can
accelerate analysis meaningfully by decreasing the number of documents to be analysed.
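The reduction of a collection to the documents relevant to a query can be sketched with TF-IDF vectors and cosine similarity. The documents and the query below are invented examples; Scikit-Learn is used purely for illustration.

```python
# Toy retrieval sketch: rank invented documents against a query by
# TF-IDF cosine similarity and keep the best match.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

documents = [
    "lecture notes on probability and statistics",
    "football club training schedule",
    "past exam questions on probability",
]

# Represent each document as a TF-IDF weighted term vector.
vectorizer = TfidfVectorizer()
doc_vectors = vectorizer.fit_transform(documents)

# Score every document against the query; the highest score wins.
query_vector = vectorizer.transform(["probability exam questions"])
scores = cosine_similarity(query_vector, doc_vectors)[0]
best_index = scores.argmax()
```

Here the third document scores highest because it shares the most query terms, which is the sense in which IR narrows a large collection down to a relevant subset.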
2.5.2.2 Document Clustering
Clustering is the breakdown of data into clusters (groups of similar objects). Each cluster
consists of objects with similar attributes, which make them different from the objects of
other groups. The aim of a good document clustering scheme is to reduce intra-cluster
distances between documents while increasing inter-cluster distances, using an
appropriate distance measure between documents.
Clustering is a form of unsupervised learning and this is the main difference between
clustering and classification (supervised learning). “Unsupervised” means that documents
have not been assigned to classes by a human expert. In classification, the classifier
learns the association between objects and classes from a training set (a set of data
labelled manually) and then applies the learnt behaviour to test data (a set of unlabelled
data); in clustering, by contrast, the nature of the data determines cluster membership
(Jajoo, 2008).
a) Applications of Clustering
Clustering has several applications in the fields of business and science.
1) Finding Similar Documents: Clustering enables the discovery of documents that
are conceptually alike, in contrast to a search-based approach (where results are
based on whether documents share many of the same words). This feature is
useful when exploring large document collections.
2) Organizing Large Document Collections: Document retrieval emphasizes
acquiring documents relevant to a specific query, but it fails to make sense of a
large number of unclassified documents. The solution is to organise these
documents into categories similar to the manual arrangement humans would
produce given ample time.
3) Duplicate Content Detection: Clustering can be applied to find duplicates
within a collection of documents. It is used for grouping related articles,
detecting plagiarism, and re-ranking search results to ensure diversity among
the top documents.
4) Recommendation System: Clustering enables the recommendation of articles to
users based on articles they have previously read.
5) Search Optimization: Clustering helps refine the quality of search engines
by comparing user queries to clusters rather than to individual documents.
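The unsupervised grouping described in this section can be sketched as follows, assuming Scikit-Learn; the four toy documents are invented, and note that no labels are supplied to the algorithm.

```python
# Minimal document clustering sketch: TF-IDF vectors plus k-means.
# The documents are invented placeholders.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

documents = [
    "the exam timetable has been released",
    "exam dates and timetable for the semester",
    "the cafeteria menu changes every week",
    "new meals added to the cafeteria menu",
]

# Represent each document as a TF-IDF weighted term vector.
vectors = TfidfVectorizer().fit_transform(documents)

# Group the documents into two clusters. No labels are given, which
# is what makes clustering unsupervised.
km = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = km.fit_predict(vectors)
```

With these documents, the two exam-related texts end up in one cluster and the two cafeteria-related texts in the other, purely from word overlap.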
2.5.2.3 Document Classification
“Document classification is the allocation of natural language documents to defined
categories based on their attributes” (Sebastiani, 2002). It is a form of supervised
learning where the categories for each training document are known in advance.
Automatic text classification has various applications, such as automatic extraction of
metadata, indexing for document retrieval, and maintaining large collections of web
resources. In the 1990s, document classification was dominated by “knowledge
engineering” techniques that sought to extract categorization rules from human experts
and then, code the rules into a system which would enable an automatic classification of
new documents. Since then, the major approach has been to use machine learning
techniques to infer categories automatically from a training set of documents.
Machine learning techniques such as decision tree and association rules have been used
for text or document classification.
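As a minimal sketch of the decision tree approach mentioned above (using Scikit-Learn; the short documents and category names are invented placeholders), a tree can learn simple word-presence rules from labelled examples:

```python
# Hedged sketch of decision-tree document classification; the
# documents and labels are invented examples.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.tree import DecisionTreeClassifier

docs = [
    "invoice payment due amount",
    "payment receipt amount total",
    "match score goal league",
    "league fixtures goal highlights",
]
labels = ["finance", "finance", "sports", "sports"]

# Bag-of-words features for each document.
vec = CountVectorizer()
X = vec.fit_transform(docs)

# The tree learns word-presence splits, e.g. "contains 'goal'".
tree = DecisionTreeClassifier(random_state=0)
tree.fit(X, labels)

prediction = tree.predict(vec.transform(["goal in the league match"]))[0]
```

Each internal node of the learnt tree tests the count of a single word, which is why decision trees on bag-of-words features amount to rule-like classifiers.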
2.5.2.4 Web Mining
Web mining is an area of text mining that deals with the large volume of data on the
web. Most documents on the web have a structured text format as well as hyperlinks
between texts. With the growth of social media channels and the internet, the value of
web mining will continue to increase.
Although web mining is still an emerging area in computer science, it makes use of
advanced technology in natural language processing and document classification.
2.5.2.5 Information Extraction
The goal of information extraction is to identify occurrences of a specific, defined class
of entities, relationships and events in natural language text, and to extract the relevant
attributes of those entities, relationships or events. The information to be extracted is
specified in user-defined formats called templates, which are supplied to the information
extraction system for text processing. In short, information extraction constructs
structured data from unstructured text.
2.5.2.6 Natural Language Processing (NLP)
NLP is a field in computer science that concerns the development of systems that enable
communication between humans and computers using natural language. NLP is also
referred to as computational linguistics. “Effective communication” is the goal of
processing natural language. Some NLP applications include machine translation,
spelling and grammar checking, and optical character recognition (OCR).
2.5.2.7 Concept extraction
Concept extraction is an aspect of text mining that involves the extraction of concepts
from textual artifacts.
2.6 TEXT MINING TECHNIQUES
There are several techniques used in processing or mining text. Some of these text mining
techniques include Naïve Bayes, Decision tree, K-nearest neighbour and Neural network.
2.6.1 Naïve Bayes
Bayesian classification is a type of supervised learning with a statistical approach to
classification that presumes an underlying probabilistic model. Naïve Bayes is a text
categorization method with several applications, including language identification,
sentiment detection, document categorization and email spam detection.
“The naïve Bayes approach to text classification is based on calculating the posterior
probability of the documents present in the different classes” (Patel, 2015).
There are two phases involved in classifying text using naïve Bayes: the first is the
training phase and the second is the classification phase.
Classification using the naïve Bayes classifier is achieved by combining the likelihood
and the prior probability to form the posterior probability.
Prior probability: prior probability is based on previous experience. It is the probability
that an observation will fall into a group before you collect the data.
Posterior probability: It is the probability of assigning observations to groups given the
data.
The Bayesian classifier is based on Bayes' theorem:
P(Cj | d) = P(d | Cj) P(Cj) / P(d)
Where
P(Cj | d) is the probability of instance d being in class Cj,
P(d | Cj) is the probability of generating instance d given class Cj,
P(Cj) is the probability of occurrence of class Cj, and
P(d) is the probability of instance d occurring.
2.6.1.1 Illustration
An example of the process involved in classifying a text document is given in the
steps below.
Table 2.2 Train Dataset from Document
Doc No. Text Category
1. I Loved the movie +
2. I hated the movie -
3. A great movie. A good movie +
4. Poor acting -
5. Great acting. A good movie +
STEP ONE: Create a frequency table for documents in the positive category
Table 2.3 Frequency table for positive category

Doc | I | Loved | the | movie | hated | a | great | poor | acting | good
1   | 1 |   1   |  1  |   1   |       |   |       |      |        |
3   |   |       |     |   1   |       | 1 |   1   |      |        |  1
5   |   |       |     |   1   |       | 1 |   1   |      |   1    |  1
STEP TWO: Create a frequency table for documents in the negative category
Table 2.4 Frequency table for negative category

Doc | I | Loved | the | movie | hated | a | great | poor | acting | good
2   | 1 |       |  1  |   1   |   1   |   |       |      |        |
4   |   |       |     |       |       |   |       |  1   |   1    |
STEP THREE: Compute the posterior probability of a positive outcome and a negative
outcome for the new document
Vj = “I hate the poor acting”
If Vj is positive: P(I|+) * P(hate|+) * P(the|+) * P(poor|+) * P(acting|+)
Vj = 6.03 x 10^-7
If Vj is negative: P(I|-) * P(hate|-) * P(the|-) * P(poor|-) * P(acting|-)
Vj = 1.22 x 10^-5
CONCLUSION: “I hate the poor acting” falls under the negative category because the
computed posterior probability of the negative outcome is greater than the posterior
probability of the positive outcome.
2.6.1.2 Strengths of Naïve Bayes
It is simple to implement.
It is easy to train.
2.6.1.3 Weaknesses of Naïve Bayes
It has a strong feature independence assumption
2.6.2 Support Vector Machine
SVM was first introduced in 1992 by Vapnik, Boser and Guyon. SVM is related to
statistical learning theory. As the figures below show, a training set can be either
linearly separable or non-linearly separable.
Figure 2.2 Linearly separable
Figure 2.3 Non-linearly separable
NB: The challenge of training sets that are not linearly separable is solved by
transforming the original data into a new space using a kernel function.
For the function f(x) = w^t x + b,
w is the normal to the separating line and is known as the weight vector, and
b is the bias.
(Figure: the separating plane f(x) = 0 divides the space into the regions f(x) < 0 and f(x) > 0.)
Since w^t x + b = 0 and c(w^t x + b) = 0 define the same plane, the normalization for w
can be freely chosen. The normalization is chosen such that w^t x + b = +1 and
w^t x + b = -1 for the positive and negative support vectors respectively.
Then, w^t (x+ - x-) = 2, and the margin is given by
(w / ||w||) . (x+ - x-) = 2 / ||w||
“Support vectors are the data points that lie close to the decision margin. They are the
essential elements of every training set.” (Berwick, 2003)
SVM maximizes the margin around the separating decision boundary. Finding the optimal
hyperplane is an optimization problem that can be solved using optimization techniques
such as Lagrange multipliers (Zisserman, 2015).
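As an illustration of the ideas above, the sketch below evaluates a hypothetical decision function f(x) = w^t x + b and computes the margin 2/||w||; the weight vector, bias and test points are made up for illustration:

```python
import math

# A hypothetical separating plane f(x) = w.x + b in two dimensions.
w = [2.0, 1.0]
b = -5.0

def f(x):
    # Signed decision value: positive on one side of the plane, negative on the other.
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def classify(x):
    return "+" if f(x) > 0 else "-"

# With the normalization w.x + b = +/-1 at the support vectors,
# the distance between the two supporting planes is 2 / ||w||.
margin = 2 / math.sqrt(sum(wi * wi for wi in w))

print(classify([3.0, 1.0]), classify([1.0, 1.0]), round(margin, 3))
```

Points on opposite sides of the plane receive opposite labels, and a smaller ||w|| yields a wider margin, which is what the SVM optimization maximizes.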
2.6.2.1 Strengths of SVM
It is easy to train
It scales relatively well to high dimensional data
Tradeoff between error and classifier complexity can be controlled explicitly.
2.6.2.2 Weaknesses of SVM
Difficulty in choosing a “good” kernel function.
2.6.3 K-Nearest Neighbour
Using the KNN classifier, an object is assigned to the class most common amongst its
k nearest neighbours, where k is a positive integer. If k = 1, the object is simply
assigned to the class of its nearest neighbour.
In binary classification problems, k is usually chosen to be an odd number in order
to avoid tie votes.
Figure 2.4 Graphical representation of an unknown class and its neighbours
If k = 5, then from the figure above, x will be classified as a circle because three of its
nearest neighbours are classified as circle.
Distance Metrics: To make predictions using KNN, a metric is needed for measuring the
distance between the stored cases and the query point. Examples of distance functions
are the Manhattan distance and the Euclidean distance.
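The voting scheme and the Euclidean distance metric described above can be sketched as follows; the example points are hypothetical, arranged so that the query mirrors the situation in Figure 2.4:

```python
import math
from collections import Counter

def euclidean(a, b):
    # Euclidean distance between two feature vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_classify(query, examples, k=5):
    # Sort labelled examples by distance and take a majority vote among the k nearest.
    nearest = sorted(examples, key=lambda e: euclidean(query, e[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical points: "circle" examples cluster near the query,
# "square" examples lie further away.
examples = [
    ((1.0, 1.0), "circle"), ((1.5, 1.2), "circle"), ((0.8, 1.4), "circle"),
    ((3.0, 3.0), "square"), ((3.5, 2.8), "square"), ((4.0, 4.0), "square"),
]
print(knn_classify((1.2, 1.1), examples, k=5))  # prints: circle
```

With k = 5, three of the five nearest neighbours are circles, so the query is classified as a circle, just as in the figure.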
2.6.3.1 Strengths of KNN
The learning process is less expensive
There are no assumptions about the characteristics of the concepts to learn
Complex concepts can be learned by local approximation
2.6.3.2 Weaknesses of KNN
The model cannot be interpreted
It is computationally expensive to find the k nearest neighbors when the dataset is
large
Performance depends on the number of dimensions
2.6.4 Decision Tree
A decision tree is a classification technique that generates a tree and a set of rules,
representing the model of the different classes, from a given dataset. According to Han
and Kamber (2001), a decision tree is a flowchart-like tree structure in which each
internal node represents a test on an attribute, each branch represents an outcome of the
test, and the leaf nodes represent classes. The topmost node in a decision tree is referred
to as the root node. The rules corresponding to a tree are derived by traversing the path
from the root to each leaf of the tree.
There are two phases involved in developing a decision tree.
Tree building phase: This phase involves the partitioning of training data
repeatedly until all the objects in each partition belong to a class.
Tree pruning phase: This phase involves the removal of variation or statistical
noise particular to a training set.
ID3 Algorithm
Iterative Dichotomiser 3 (ID3) was introduced by Quinlan for creating decision trees
from data. In ID3, each node represents a splitting attribute while the branches
represent the possible values of that attribute. At each node, the splitting attribute is
chosen as the most useful among the attributes not yet considered in the path from the
root. The ID3 algorithm uses information gain to determine the effectiveness of a split:
the attribute with the highest information gain is selected as the splitting attribute, and
the dataset is then split for all discrete values of that attribute.
Illustration of classification decision tree
The dataset for text classification using decision tree is presented in Table 2-5.
Table 2.5 Weather dataset

ID | OUTLOOK  | TEMPERATURE | HUMIDITY | WIND   | PLAY
1  | Sunny    | Hot         | High     | Weak   | No
2  | Sunny    | Hot         | High     | Strong | No
3  | Overcast | Hot         | High     | Weak   | Yes
4  | Rain     | Mild        | High     | Weak   | Yes
5  | Rain     | Cool        | Normal   | Weak   | Yes
6  | Rain     | Cool        | Normal   | Strong | No
7  | Overcast | Cool        | Normal   | Strong | Yes
8  | Sunny    | Mild        | High     | Weak   | No
9  | Sunny    | Cool        | Normal   | Weak   | Yes
10 | Rain     | Mild        | Normal   | Weak   | Yes
11 | Sunny    | Mild        | Normal   | Strong | Yes
12 | Overcast | Mild        | High     | Strong | Yes
13 | Overcast | Hot         | Normal   | Weak   | Yes
14 | Rain     | Mild        | High     | Strong | No
The condition attributes in the dataset are outlook, temperature, humidity and wind. The
decision attribute is whether to play or not. {sunny, overcast, rain}, {hot, mild, cool},
{high, normal} and {weak, strong} are the values of the attributes outlook, temperature,
humidity and wind respectively.
Entropy provides an information-theoretic approach to measure the effectiveness of a
split. It measures the amount of information in an attribute.
Entropy(S) = -p(P) log2 p(P) - p(N) log2 p(N)
For the illustration above:
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
Entropy(S_weak) = -(6/8) log2(6/8) - (2/8) log2(2/8) = 0.811
Entropy(S_strong) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1.0
Gain(S, wind) = Entropy(S) - (8/14) Entropy(S_weak) - (6/14) Entropy(S_strong)
= 0.940 - (8/14)(0.811) - (6/14)(1.0) = 0.048
Similarly, Gain(S, outlook) = 0.246
Gain(S, temperature) = 0.029
Gain(S, humidity) = 0.151
Since the “outlook” attribute has three values, the root node has three branches
(sunny, overcast, rain). The next question is: what attribute should be tested at the sunny
branch node? Since outlook has been used at the root node, the choice of the next
decision node lies among the remaining three attributes: humidity, temperature and wind.
Gain (S sunny, humidity) = 0.970
Gain (S sunny, temperature) = 0.570
Gain (S sunny, wind) = 0.019
Humidity has the highest gain; therefore it is used as the next decision node. This process
continues until all data in the dataset is classified.
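The entropy and information-gain figures above can be checked with a few lines of Python; the class counts are taken from Table 2.5:

```python
import math

def entropy(p, n):
    # Entropy of a set containing p positive and n negative examples.
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:
            q = count / total
            result -= q * math.log2(q)
    return result

# S has 9 "yes" and 5 "no" examples; wind = weak covers 6 yes / 2 no,
# and wind = strong covers 3 yes / 3 no.
e_s = entropy(9, 5)
gain_wind = e_s - (8 / 14) * entropy(6, 2) - (6 / 14) * entropy(3, 3)
print(round(e_s, 3), round(gain_wind, 3))  # prints: 0.94 0.048
```

The same two-line computation with the other attributes' counts reproduces the gains for outlook, temperature and humidity quoted above.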
Figure 2.5 representation of a constructed decision tree
The corresponding rules are:
If outlook = sunny and humidity = high then play = no
If outlook = sunny and humidity = normal then play = yes
If outlook = overcast then play = yes
If outlook = rain and wind = weak then play = yes
If outlook = rain and wind = strong then play = no
2.6.4.1 Strengths of decision tree
It generates logical rules
Less computation is required for classification
It processes continuous and categorical variables
2.6.4.2 Weaknesses of decision tree
It is not suitable for prediction of continuous attribute
It performs inadequately given many classes and small data
It is computationally expensive to generate a decision tree
2.6.5 Neural Network
Neural networks are a branch of artificial intelligence and are also referred to as
artificial neural networks (ANNs). Rather than programming systems to execute certain
tasks, ANNs enable systems to learn to perform these tasks, producing an artificial
intelligence system (AIS), a logical model that can precisely find hidden patterns in data.
An artificial neural network is made up of several artificial neurons that are linked
together to form a network architecture. The aim of a neural network is to convert the
input into significant output (Sasithra & Saravanan, 2014). The learning process can be
unsupervised or supervised. Application areas of ANNs include bankruptcy prediction,
speech recognition and fault detection.
Figure 2.6 artificial neural network (adapted from Sasithra & saravanan, 2014)
A neural network is trained using a back-propagation algorithm. The gradient descent
method (GDM) is used to minimize the mean squared error between the network output
and the actual output. The parameters considered in measuring the efficiency of the
network are: the number of epochs taken for the network to converge, the calculated
mean squared error, and the rate of convergence.
Table 2.6 comparison of various classification methods based on artificial neural network (adapted from Sasithra & saravanan, 2014)
2.6.5.1 Strengths of Neural Network
Appropriate results are displayed for complex domains
It is suitable for both continuous and discrete data
2.6.5.2 Weaknesses of neural network
It is usually difficult for users to interpret learned result
Training is relatively slow
2.7 REVIEW OF EXISTING SYSTEM
A few existing and related projects were reviewed during the course of this research
project.
2.7.1 Mining student data to analyse learning behaviour
The research project was conducted by Alaa El-Halees of the department of Computer
Science, Islamic University of Gaza. The study involved four phases from which
knowledge was extracted to describe students’ behaviour.
A data mining technique is used to determine association rules which are sorted
using lift metrics and then, represented graphically.
Classification rules were discovered using decision tree.
The students were clustered into groups using EM-clustering
An outlier analysis was used to detect all outliers in the data.
Figure 2.7 Association rules graph for students with grade “fail” using Arviewer
2.7.2 Mining social media data to understand student learning experience
The study explores a social media site (Twitter) with the aim of understanding the
learning experiences of engineering students at Purdue University, United States of
America. Using an inductive content analysis, it was discovered that engineering
students at Purdue University struggle with a heavy study load, which leads to several
outcomes such as sleep problems, lack of social engagement, and other physical and
psychological health problems.
2,785 tweets with the hashtag #engineeringproblems were found to relate to the
educational life of engineering students at Purdue University, and five categories for
classification were identified. 70 percent of the 2,785 tweets were used for training
(1,950 tweets) and 30 percent for testing (835 tweets).
Figure 2.8 number of tweets for each issue detected from the Purdue tweet collection
2.7.3 Mining educational data to analyse students' performance
The main objective of the research project was to use a decision tree (a data mining
technique) to understand students’ performance. A decision tree is a tree in which each
branch node represents a choice between several alternatives, and each leaf node
represents a decision.
Information such as class test, class attendance and seminar or assignment marks were
collected from the student management system, to predict students’ performance at the
end of the semester.
CHAPTER THREE: SYSTEM MODELING AND DESIGN
3.1 INTRODUCTION
This chapter focuses on an overview of naïve Bayes classifier algorithm, steps involved
in pre-processing data as well as the diagrammatic models used to design the system
architecture.
3.2 SYSTEM REQUIREMENT
The system requirement defines the system’s operational constraints and functions in
detail.
System requirements are divided into functional requirements and non-functional
requirements.
The functional requirements include the following:
The system allows users to view an analysis of the factors affecting the learning
experience of students for a particular course in graphical form.
The system allows users to view recommendations for a particular course.
The Non-functional requirements include the following:
The graphical user interface should be simple and user-friendly
The system should be efficient, that is, it performs tasks in limited time and with
limited computer resources
The system should be reliable.
The system should analyse or classify data accurately.
3.3 SYSTEM DESIGN
System modelling is the act of representing the features of a system using graphical
notation. System modelling helps an analyst to understand fully the functionality of the
system, and models are also used to communicate effectively with customers
(Sommerville, 2007).
3.3.1 Unified modelling language
Unified modelling language is an industry standard graphical notation for defining
software analysis and designs.
Types of UML Diagrams
1. Activity diagrams: used to demonstrate the activities that make up a process.
2. Use-case diagrams: used to demonstrate the interactions that exist between a
system and its environment.
3. Sequence diagrams: interaction diagrams that show how processes operate
with one another and in what order.
4. Class diagrams: the building blocks of object-oriented modelling, used for
general conceptual modelling of the structure of the application.
5. State diagrams: describe the behaviour of a single object in response to a series
of events in a system.
For this project, the system will be modelled using use case diagram and activity
diagram.
3.3.1.1 Use Case Diagram
A use case diagram is a graphical representation of the relationships that exist between
use cases and actors. Use cases are developed at requirements elicitation stage of
software engineering and are further developed as they are reviewed by stakeholders
during analysis. “A use case is a typical representation of a major piece of complete
functionality” (Bernd Bruegge, 2000).
A use case is represented as an ellipse. It has a unique name (usually a present-tense
verb phrase) expressed in an active voice.
An actor represents a human or computer that interacts with the system. In UML, an
actor is represented by a stick figure.
Relationships between use cases and actors are represented by arrows and lines. The
default relationship that exists between a use case and an actor is the <<communicates>>
relationship, represented by a line.
Figure 3.9 Use case diagram showing the actions performed by management or educators.
3.3.1.2 Activity Diagram
Activity diagrams show a breakdown of the complex flow of a use case. UML activity
diagrams are an enhanced form of flowcharts. “An activity is a step that needs to be
executed, whether by a computer or by a human” (Fowler, 2000).
Furthermore, activity diagrams allow for parallelism (that is, activities can run
alternately, simultaneously or consecutively) when the order of the activities is not
essential.
Figure 3.10 Activity diagram showing the flow of activities involved in analysing data
3.4 OVERVIEW OF NAÏVE BAYES CLASSIFIER
Naïve Bayes classifiers are known to be efficient. The probabilistic model of Bayes’
theorem originates from the postulation that the attributes in the dataset are mutually
independent.
Naïve Bayes has been applied in various fields because of its strengths, some of which
are that it is easy to implement and relatively robust. Some applications of naïve Bayes
include disease diagnosis and spam filtering, to mention a few. Moreover, the nature of
the problem to be solved is a determinant of the classification model that will be used.
3.4.1 Posterior Probability
Posterior probability answers the question: “what is the probability that a particular
object or entity belongs to class i, given its observed feature values?” A concrete
example would be: “what is the probability that a boy has diabetes given a certain value
for a pre-breakfast blood glucose measurement and a certain value for a post-breakfast
blood glucose measurement?”
P(diabetes | xi), xi = [90 mg/dl, 145 mg/dl]
Let
xi be the feature vector of sample i, i ∈ {1, 2, …, n},
ωj be the notation of class j, j ∈ {1, 2, …, m},
P(xi | ωj) be the probability of observing sample xi given that it belongs to class ωj.
The objective function in the naïve Bayes probability is to maximize the posterior
probability, given the training data in order to formulate the decision rule.
3.4.2 Class-Conditional Probabilities
The class-conditional probabilities (likelihoods) can be estimated directly from the
training data. Under the naïve independence assumption, given a d-dimensional feature
vector x, the class-conditional probability can be calculated as the product of the
individual feature likelihoods:
P(x | ωj) = P(x1 | ωj) · P(x2 | ωj) · … · P(xd | ωj)
Here, P(xi | ωj) means “how likely is it to observe this particular value xi given that the
sample belongs to class ωj?” For every feature, the individual likelihood can be derived
from the maximum-likelihood estimate, that is, a frequency in the case of categorical
data.
3.4.3 Prior Probabilities
The prior probabilities describe the probability of encountering a particular class before
the data is observed.
If there is a uniform distribution among the priors, the posterior probabilities will be
determined by the evidence term and the class-conditional probabilities. But if the
evidence term is a constant, the decision rule will depend on the class-conditional
probabilities and the priors.
In the frequency-based estimates, N(xi, ωj) denotes the number of times feature xi
appears in samples from class ωj, and N(ωj) denotes the total count of all features in
class ωj, so that the maximum-likelihood estimate is P(xi | ωj) = N(xi, ωj) / N(ωj).
3.4.4 Multi-Variate Bernoulli Naïve Bayes
The multivariate Bernoulli model is based on data in binary form. Each token in the
feature vector of a document is categorized as either a value of 1 or 0. The feature vector
is known to have m dimensions, where m is the number of words in a vocabulary. The
value “1” implies the occurrence of a word in the document, while the value “0” implies
the non-occurrence of that word in the text document d.
3.4.5 Multinomial Naïve Bayes
3.4.5.1 Term Frequency
Term frequency, sometimes referred to as raw frequency, is another method that can be
used to represent a document, rather than using binary values as in the multivariate
Bernoulli naïve Bayes model.
Basically, the term frequency is the number of times a given token t occurs in a
document d. The maximum-likelihood estimates of the class-conditional probabilities in
the multinomial model can be derived from the term frequencies in the training data.
For the multivariate Bernoulli model, the class-conditional probability of a document
feature vector x is
P(x | ωj) = Π(i=1 to m) P(xi | ωj)^b · (1 − P(xi | ωj))^(1−b),  b ∈ {0, 1}
Let P(xi | ωj) be the maximum-likelihood estimate that a particular word or token xi
occurs in class ωj:
P(xi | ωj) = (df(xi, ωj) + 1) / (df(ωj) + 2)
Where
df(xi, ωj) represents the number of documents in the training dataset that contain the
feature xi and belong to the class ωj, and
df(ωj) represents the number of documents in the training dataset that belong to the
class ωj.
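The smoothed estimate above can be sketched in a few lines of Python; the document-frequency counts used here are hypothetical:

```python
def smoothed_estimate(df_word_class, df_class):
    # P(xi | wj) = (df(xi, wj) + 1) / (df(wj) + 2): the +1 and +2 terms keep the
    # estimate away from exactly 0 and 1, even when a word never (or always)
    # appears in a class.
    return (df_word_class + 1) / (df_class + 2)

# A word appearing in 2 of 2 documents of a class ...
print(smoothed_estimate(2, 2))  # prints: 0.75
# ... and a word appearing in 0 of 3 documents of a class.
print(smoothed_estimate(0, 3))  # prints: 0.2
```

Without the smoothing terms, the second estimate would be zero and a single unseen word would force the whole product of likelihoods to zero.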
3.4.5.2 Term Frequency – Inverse Document Frequency (TF-IDF)
Another method for representing documents is TF-IDF. It can be viewed as a weighted
term frequency and is useful for down-weighting stop words in a text collection. This
approach assumes that the significance of a word in a document is proportional to the
number of times the word occurs in that document and inversely proportional to the
number of documents in which it occurs.
Aside from being used to rank documents by relevance, TF-IDF can be applied to text
classification using any text classification technique, such as naïve Bayes.
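A minimal sketch of the TF-IDF weighting, using a natural-logarithm IDF (one common convention among several), is shown below; the documents are illustrative:

```python
import math

docs = [
    ["the", "movie", "was", "great"],
    ["the", "acting", "was", "poor"],
    ["the", "movie", "was", "good"],
]

def tf_idf(term, doc, docs):
    # Term frequency: occurrences of the term in this document.
    tf = doc.count(term)
    # Inverse document frequency: terms that are rare across the collection
    # score higher; terms in every document score zero.
    df = sum(1 for d in docs if term in d)
    idf = math.log(len(docs) / df)
    return tf * idf

# "the" occurs in every document, so its weight collapses to zero, like a stop word.
print(tf_idf("the", docs[0], docs))    # prints: 0.0
# "great" occurs in only one document, so it keeps a positive weight.
print(tf_idf("great", docs[0], docs))
```

This shows why TF-IDF naturally suppresses stop words: their document frequency equals the collection size, driving the IDF factor to zero.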
3.4.6 Performance of Multi-Variate Bernoulli and Multinomial Model
Based on empirical comparisons, there is evidence that the multinomial model tends to
perform better than multivariate Bernoulli when the vocabulary size is relatively large.
However, the performance of machine learning algorithms is highly dependent on the
appropriate choice of features.
In practice, it is advised that the choice between the multivariate Bernoulli and
multinomial models for text classification should follow comparative studies that
include different combinations of feature selection and feature extraction processes.
3.4.7 Continuous Variables
Naïve Bayes can also be used on continuous data, although it is conventionally
presented for categorical data. A common strategy for applying naïve Bayes
classification to continuous data is to use a Gaussian kernel to calculate the
class-conditional probabilities. The Gaussian naïve Bayes model can be written as:
P(xik | ω) = (1 / √(2πσω²)) exp(−(xik − μω)² / (2σω²))
Where
μω (the sample mean) and σω (the standard deviation) are the parameters to be estimated
from the training data. Under the naïve Bayes assumption of conditional independence,
the class-conditional probability can then be computed as the product of the individual
probabilities:
P(xi | ω) = Π(k=1 to d) P(xik | ω)
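The Gaussian model above can be sketched as follows; the parameter values are illustrative, not estimates from real data:

```python
import math

def gaussian_likelihood(x, mu, sigma):
    # Class-conditional density P(x_k | w) under a Gaussian model.
    coeff = 1.0 / math.sqrt(2 * math.pi * sigma ** 2)
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def class_conditional(features, means, sigmas):
    # Naive independence assumption: multiply the per-feature likelihoods.
    p = 1.0
    for x, mu, sigma in zip(features, means, sigmas):
        p *= gaussian_likelihood(x, mu, sigma)
    return p

# Density at the class mean for a standard Gaussian (mu = 0, sigma = 1).
print(round(gaussian_likelihood(0.0, 0.0, 1.0), 4))  # prints: 0.3989
```

In a full classifier, each class keeps its own means and standard deviations, and the class with the largest prior-weighted product wins.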
3.4.8 Eager and Lazy Learning Algorithms
Eager learners are machine learning algorithms that learn a model from the training
dataset as soon as the data is available. Once the model is learned, the training data does
not need to be re-evaluated in order to make a new prediction. The naïve Bayes classifier
is an example of an eager learning algorithm, which is why it is relatively fast at
classifying new instances.
Lazy learners, on the other hand, predict the class label of new instances by memorizing
and re-evaluating the training dataset. The advantage of lazy learning algorithms is that
the training phase is relatively fast. Nonetheless, the actual prediction is slower compared
to eager learners as a result of the re-evaluation of the training data. An example of a
lazy learner is the k-nearest neighbour algorithm.
3.4.9 The Bag of Word Model
Good features are measured by two properties:
Salient: features must be relevant with respect to the problem domain.
Discriminatory: features should contain adequate information to distinguish accurately
between patterns when used to train the classifier.
Prior to training a machine learning algorithm, a text document needs to be represented
as a feature vector. The Bag of Words (BOW) model is used for this purpose.
“The BOW representation of a document D enables the transformed dataset to be viewed
as a matrix, where vectors are represented as rows and terms are represented as columns.
This view enables the application of various matrix decomposition techniques to
clustering and dimensionality reduction. Moreover, documents can be compared using
classical distance/similarity measures since they are treated as vectors.” (Milos &
Mirjana, 2008).
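The matrix view described in the quotation can be sketched as follows; the documents are illustrative:

```python
# A minimal bag-of-words sketch: documents become rows of a term matrix
# and vocabulary terms become columns.
docs = ["the movie was great", "the acting was poor", "great movie great acting"]

# Build the vocabulary (the column order) from all documents.
vocab = sorted({word for doc in docs for word in doc.split()})

# Each row counts how often each vocabulary term occurs in one document.
matrix = [[doc.split().count(term) for term in vocab] for doc in docs]

print(vocab)
print(matrix)
```

Because each document is now a numeric vector, rows can be compared with classical distance or similarity measures, exactly as the quotation describes.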
3.5 TEXT PRE-PROCESSING
3.5.1 Tokenization
Tokenization involves breaking a document down into individual tokens that are used
as input for several natural language processing (NLP) algorithms. Tokenization is
usually followed by further processing steps, such as stemming, lemmatizing, stop-word
removal and the construction of n-grams.
3.5.2 Stop Words
These are words that are common in text documents and hence are not relevant for
training a classifier. Examples of such words are: or, so, to, and, the. Stop-word removal
can be done by using a stop-word dictionary to look up stop words.
Another method of removing stop words from a text document is to create a stop list by
sorting every word by frequency; the stop list is then used to remove the words that rank
among the top n words in the list.
3.5.3 Stemming and Lemmatizing
Stemming is the process of deriving the root form of a word. The Porter Stemmer, a
stemming algorithm, was developed by Martin F. Porter in 1979. For example, the
words machinery, mechanism and mechanizing can be stemmed to machine.
Lemmatization is computationally more expensive and more difficult than stemming.
However, stemming and lemmatization have little influence on the performance of text
classification.
3.5.4 N-grams
Using the n-gram model, a token is defined as a sequence of n items. The most
commonly used n-gram model is the unigram (1-gram), in which each token consists of
a single word or symbol. The choice of the number n depends on two factors: the
language and where the model will be applied.
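The pre-processing steps described in this section (tokenization, stop-word removal and n-gram construction) can be combined into a small sketch; the stop list and sample sentence are illustrative:

```python
# An illustrative stop list; a real system would use a larger dictionary
# or a frequency-derived list as described above.
STOP_WORDS = {"or", "so", "to", "and", "the", "a", "is"}

def tokenize(text):
    # Lowercase and split on whitespace, stripping simple punctuation.
    return [w.strip(".,!?").lower() for w in text.split() if w.strip(".,!?")]

def remove_stop_words(tokens):
    return [t for t in tokens if t not in STOP_WORDS]

def ngrams(tokens, n=2):
    # Consecutive sequences of n tokens.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = remove_stop_words(tokenize("The lecturer is fast and the course load is heavy."))
print(tokens)          # prints: ['lecturer', 'fast', 'course', 'load', 'heavy']
print(ngrams(tokens))  # bigrams: ('lecturer', 'fast'), ('fast', 'course'), ...
```

The output tokens (or n-grams) would then feed the bag-of-words representation used to train the classifier.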
3.6 WORKFLOW OF THE PROJECT
Figure 3.11 workflow of social media data integrated with qualitative analysis and data mining algorithm
In Figure 3.11, the width of the blue arrows represents data volume (a wider arrow
indicates a greater data volume), while the black arrows represent the flow of data
analysis, computation and results. The figure models the steps involved in this project.
Step one – Data Gathering
With the consent of the lecturer for Communications in English II (GST121), a
discussion forum was created on the e-learning management system (URL –
moodle2.covenantuniversity@edu.ng) by the CSIS department, Covenant University.
According to Martin (2009), discussion is a significant dimension of a learning process.
On the 10th of March 2016, the discussion forum was opened for students offering
GST121 to post comments about their learning experiences. A total of 118 messages
were gathered from the discussion forum for the study.
Step two – Data Sampling
After the data was gathered and stored, specific comments related to the learning
experiences of students offering GST121 were selected for qualitative analysis.
Step three – Qualitative Data Analysis
A qualitative content analysis, also known as inductive content analysis, was conducted
on the samples. It is a qualitative research method used to manually analyse text content
and derive categories from the data.
According to Rost et al., in large-scale social media data analysis, faulty assumptions are
likely to occur if machine learning algorithms are used without first carrying out a
qualitative survey of the data. “We concur with this argument as it is found that no
appropriate unsupervised algorithm could reveal in-depth meanings to the data” (Chen,
Vorvoreanu, & Madhavan, 2014).
Step four – Qualitative Result
The major problems affecting the learning experiences of students offering GST121 were
classified under several categories.
Step five – Model Adaption
Based on the categories from step four, a multi-label naïve Bayes classification algorithm
was implemented. The classification algorithm is used to train the analyzer that assists in
classifying the learning experiences of students offering GST121.
Step six – Data Analysis Result
The result would help educators make informed decisions by identifying the factors that
affect students’ learning experiences.
CHAPTER FOUR: SYSTEM IMPLEMENTATION AND EVALUATION
4.1 INTRODUCTION
This chapter provides an overview of the choice of programming languages, the software
and hardware requirements, and the different interfaces that were implemented.
4.2 SYSTEM REQUIREMENTS
System requirements are descriptions of a system's functionalities and operational
constraints. Requirements may range from high-level abstract statements of the services
provided by a system and its operational constraints to a detailed mathematical
functional specification. There are two types of system requirements:
Hardware Requirements
Software Requirements
4.2.1 Hardware Requirements
This part of the system requirements is concerned with the physical components of a
computer that are needed for the effective and efficient operation of the application. The
hardware requirements are sub-divided into:
Server Requirement
Client Requirement
Table 4.7 Server side hardware requirements

REQUIREMENT | HARDWARE
Processor | Intel Core i5 2.0GHz or higher
Primary Memory (RAM) | 6GB of RAM or higher
Secondary Memory | 10GB hard disk space or higher
Architecture | x64 (64-bit)

Table 4.8 Client side hardware requirements

REQUIREMENT | HARDWARE
Processor | Intel Pentium III 1.2GHz or higher
Primary Memory (RAM) | 1GB of RAM or higher
Secondary Memory | 3GB hard disk space or higher
4.2.2 Software Requirements
Software requirements are concerned with defining the software resources and
prerequisite applications that need to be installed on a computer for the application to
operate optimally. These can be classified into:
Development Software Requirements: These are tools and software that are
needed for the successful development and deployment of the application.
Client Software Requirements: These are the software needed to run the
application.
Table 4.9 Development Software Requirements
REQUIREMENTS               SOFTWARE
Operating System           Microsoft Windows 8.0, 8.1 and 10
Programming Languages      Python, HTML, CSS, JavaScript
Core Python Packages       Flask, Scikit-Learn, Pandas
Development Tool           PyCharm IDE
Web Server                 Flask Development Server
Table 4.10 Web Client Software Requirements
REQUIREMENTS               SOFTWARE
Operating System           Microsoft Windows 7, 8.0, 8.1, 10
Internet Browser           Google Chrome, Mozilla Firefox, Opera
4.3 IMPLEMENTATION TOOLS
The tools used to implement the application include:
1. Python: Python is a high-level, interactive, interpreted language. Other notable
features of Python include ease of learning, an interactive mode, scalability and ease
of maintenance. Python was the programming language of choice because of its rich
standard library and the wide array of packages available for machine learning.
2. HyperText Markup Language (HTML): HTML is a markup language that is
used to create user interfaces for webpages as well as mobile applications. HTML
is used together with CSS and JavaScript.
3. Python Machine Learning Packages: Packages are namespaces that may themselves
contain further packages and modules. The machine learning, numerical and
scientific packages used in this project include:
Scikit-Learn: An open-source machine learning library for the Python
programming language which features several classification algorithms.
Pandas: An open-source, easy-to-use data analysis library for the Python
programming language.
SciPy: An open-source Python library used by engineers, analysts and
scientists for scientific and technical computing.
4. Flask: Flask is a micro web framework written in Python, based on the
Werkzeug toolkit and the Jinja2 template engine.
5. PyCharm: A Python Integrated Development Environment (IDE).
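To illustrate how these tools fit together, the following is a minimal sketch of a Scikit-Learn text classification pipeline of the kind described above. The comments and category labels are invented placeholders, not the actual GST121 dataset, and the exact feature extraction used by the project may differ.

```python
# Minimal sketch: bag-of-words features + Gaussian Naive Bayes classification.
# The example comments and labels below are invented placeholders.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import GaussianNB

comments = [
    "The hall was too hot during the lecture",
    "The course helps my spoken English",
    "The lecturer only reads from the slides",
    "No air conditioning in the lecture theatre",
]
labels = [
    "unfavourable environment",
    "relevance of the course",
    "lecture quality",
    "unfavourable environment",
]

# Convert the free-text comments to a bag-of-words matrix.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(comments).toarray()  # GaussianNB requires a dense array

# Train the Gaussian Naive Bayes classifier on the labelled comments.
clf = GaussianNB()
clf.fit(X, labels)

# Classify a new, unseen comment into one of the three categories.
new = vectorizer.transform(["The theatre was very hot today"]).toarray()
print(clf.predict(new)[0])
```

Note that GaussianNB, unlike MultinomialNB, expects a dense feature matrix, hence the `.toarray()` calls.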
4.4 SYSTEM MODULES AND INTERFACES
This section describes the various modules and interfaces of the application.
4.4.1 Home Page
This is the page displayed when the program is run. It has a "view analysis" button which
the user clicks to view a graphical representation of the learning experiences of students
offering GST121.
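A page like this can be served by a pair of Flask routes. The sketch below shows one plausible wiring of the home and result pages; the route paths, markup and template choices are assumptions for illustration, not the project's actual code.

```python
# Hypothetical sketch of the Flask routes behind the home and result pages.
# Route paths and page content are illustrative assumptions.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    # Home page with the "view analysis" button linking to the result page.
    return '<a href="/analysis"><button>view analysis</button></a>'

@app.route("/analysis")
def analysis():
    # In the real application this would render the bar chart and the
    # table of classified test comments.
    return "analysis results"

# The Flask development server would be started with app.run() or `flask run`.
```

During development, the application would be launched with the built-in Flask development server listed in Table 4.9.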
Figure 4.12 Homepage of the application
4.4.2 Result Page
The result page shows a bar chart representation of the analysis and a table showing the
test dataset (responses and categories) used for classification.
Figure 4.13 A bar chart showing the category ratio of the learning experiences of students offering GST121
4.4.3 Result and Interpretation
From the inductive content analysis stage, a total of 114 posted comments on the learning
experiences of students offering GST121 were gathered and grouped into three categories:
relevance of the course, unfavourable environment and lecture quality. 70 percent of the 114
comments were used for training (89 comments) and 30 percent for testing.
4.4.4 Recommendation
Based on the classification result, the following are recommendations for the identified
factors affecting the learning experiences of students offering GST121:
1) The air conditioner should be switched on a few minutes before the commencement of lectures.
2) The lecturers should engage students by asking questions often during the lecture.
3) Attendance for GST121 could be automated because of the population of students
offering the course; if it remains manual, it should be coordinated properly by the lecturers.
4) Lecturers should also introduce videos as a way of improving lecture quality.
5) Lecturers should also ensure mastery of their slides, as most students commented that
the lecturers consistently read from the presentation slides.
6) The duration of the GST121 examination should also be reconsidered.
Figure 4.14 Cross-section of comments used for classification and their corresponding categories
4.5 DATA GATHERING PROCESS
For this study, a discussion forum (a social media category) was created on the Covenant
University e-learning platform for freshmen of Covenant University to express
their worries and concerns about GST121. Thereafter, data was collected from the forum
for pre-processing and classification.
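Once exported from the forum, the collected comments can be loaded into Pandas for pre-processing. The sketch below assumes a hypothetical CSV export with `comment` and `category` columns; the file layout and column names are illustrative, not the project's actual export format.

```python
# Hypothetical sketch: loading exported forum comments into Pandas
# for pre-processing. The CSV layout and column names are assumptions.
import pandas as pd
from io import StringIO

# Stand-in for a CSV file exported from the discussion forum.
csv_data = StringIO(
    "comment,category\n"
    "The hall is always hot,unfavourable environment\n"
    "GST121 improves my English,relevance of the course\n"
)
df = pd.read_csv(csv_data)

# Basic text pre-processing: lower-case and strip surrounding whitespace.
df["comment"] = df["comment"].str.lower().str.strip()
print(df.shape)
```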
Figure 4.15 Screenshot of the Covenant University e-learning Moodle homepage
Figure 4.16 Screenshots of some comments posted by students on the discussion forum
4.5.1 Train Dataset
CHAPTER FIVE
SUMMARY, RECOMMENDATION AND CONCLUSION
This chapter gives a summary of this work, makes a number of recommendations for future
work in this area, and then presents the conclusion.
5.1 SUMMARY
This project presents a forum-based student learning experience analyzer that uses a text
mining technique. It has a graphical user interface with a "view analysis" button which
enables users to view a graphical representation of students' learning experiences. The
system demonstrates the use of a machine learning algorithm for the classification
of unstructured text and was designed using UML diagrams. The forum-based student
learning experience analyzer provides users with recommendations to aid decision
making.
5.2 RECOMMENDATION
Recommendations for further improvement of this project include:
Evaluation of the learning experiences of students across different academic
levels
The scope of this project was limited to freshmen of Covenant University offering
Communications in English (GST121) because of time constraints. I would
propose that further studies consider courses offered by students across different
academic levels.
Introduction of other text mining classification algorithms
For this project, Gaussian naïve Bayes was used as the classification algorithm. I
would suggest that further work use other text mining classification algorithms,
such as support vector machines, k-nearest neighbours, Rocchio's algorithm and
decision trees, to classify the learning experiences of students. This would serve
as a way to compare accuracy among the classifier algorithms.
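Such a comparison can be set up in Scikit-Learn by training each candidate classifier on the same split and scoring it on held-out data. The sketch below uses randomly generated placeholder features and labels; in a real follow-up study, the vectorised GST121 comments would take their place.

```python
# Sketch: comparing several Scikit-Learn classifiers on one dataset.
# Features and labels here are random placeholders for illustration.
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.rand(60, 5)             # placeholder feature vectors
y = rng.randint(0, 3, size=60)  # placeholder category labels (3 classes)

classifiers = {
    "Gaussian Naive Bayes": GaussianNB(),
    "Support Vector Machine": SVC(),
    "K-Nearest Neighbours": KNeighborsClassifier(n_neighbors=3),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
}

# Simple hold-out evaluation: train on the first 40 samples, test on the rest.
scores = {}
for name, clf in classifiers.items():
    clf.fit(X[:40], y[:40])
    scores[name] = clf.score(X[40:], y[40:])
    print(f"{name}: {scores[name]:.2f}")
```

For a more robust comparison on a small dataset such as this one, `sklearn.model_selection.cross_val_score` could replace the single hold-out split.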
5.3 CONCLUSION
The growth of digital technology has influenced the rate at which students share, update
and post comments about their learning experiences on social media. Evaluating the
learning experiences of students from data available on an informal platform is useful
for understanding the worries, concerns and struggles students face in the learning
process. The volume of data in such an environment can only be valuable if it is
processed using an effective automated process such as a text mining technique.
REFERENCES
ABBYY FineReader Engine 11. Retrieved January 21, 2016, from
http://www.abbyy.com/ocr-sdk/
Bejar, J. (2013). K-nearest neighbours.
Bruegge, B., & Dutoit, A. H. (2000). Object-Oriented Software Engineering. Germany.
Berwick, R. (2003). An Idiot's Guide to Support Vector Machines (SVMs).
Chen, X., Vorvoreanu, M., & Madhavan, K. (2014). Mining Social Media Data for
Understanding Students' Learning Experiences. IEEE Transactions on Learning
Technologies, 246-259.
Classification using Decision Trees. (n.d.).
Danah, B. M., & Ellison, N. B. (2007). Social Network Sites: Definition, history and
scholarship.
Dewing, M. (2012). Social Media: an Introduction. Canada: Library of Parliament.
Educase. (2009). 7 things you should know about microblogging.
Fowler, M. (2000). UML Distilled.
Jajoo, P. (2008). Document Clustering. India.
Milos, & Mirjana. (2008). Text Mining: Approaches and Application. 227-234.
Ms. Priyanka Patel, M. K. (2015). A Review: Text Classification on Social Media Data.
IOSR Journal of Computer Engineering, 80-84.
Pagare, P. K. (2014). Analyzing Social Media Data for Understanding Student's Problem.
International Journal of Computer Applications, 17-22.
Sasithra, K., & Saravanan. (2014). Review on Classification Based on Artificial Neural
Networks. International Journal of Ambient System and Applications, 11-18.
Sommerville, I. (2007). Software Engineering. United Kingdom.
Witten, I. H. (n.d.). Text Mining. Hamilton, New Zealand.
Yiu, T. (2001, April 5). Decision Tree Classification.
Zisserman, A. (2015). The SVM Classifier.