
A SOCIAL MEDIA BASED STUDENT LEARNING EXPERIENCE ANALYZER

USING A TEXT MINING TECHNIQUE

BY

OFULUE AMAKA MARY

12CH014362

A PROJECT SUBMITTED TO THE DEPARTMENT OF COMPUTER AND

INFORMATION SCIENCES IN THE COLLEGE OF SCIENCE AND

TECHNOLOGY, COVENANT UNIVERSITY, OTA.

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE AWARD

OF THE BACHELOR OF SCIENCE (B.Sc.) HONOURS DEGREE IN

MANAGEMENT INFORMATION SYSTEM.

MAY, 2016


CERTIFICATION

This is to certify that this project was carried out by OFULUE AMAKA (12CH014362)

in the Department of Computer and Information Sciences, College of Science and

Technology, Covenant University, Ota.

DR. OLAWANDE DARAMOLA

.................................................................

Main Supervisor Signature & Date

MR. AZUBUIKE EZENWOKE .................................................................

Co-Supervisor Signature & Date

DR. ARIYO ADEBIYI .................................................................

HOD, CIS Signature & Date


DEDICATION

I dedicate this project to God Almighty for His grace and love over my life, divine speed, favour and health.

I also dedicate this project to my parents, siblings and relatives for their financial support, wise counsel and words of encouragement.


ACKNOWLEDGEMENT

I genuinely appreciate God for his word concerning my life and his grace during the

course of this project. I am most grateful to my parents and relatives who showed their

sincere interest in the success of my project through their support, prayers and words of

encouragement.

I also appreciate Mr. Ezenwoke Azubuike, my project supervisor, for his time, creative ideas, and study materials, which he readily made available for the success of this project.

Finally, I am thankful to Bishop David Oyedepo, the Chancellor of Covenant University, Dr. A. A. Adebiyi, the Head of the Department of Computer and Information Sciences, and every lecturer in the department.


TABLE OF CONTENTS

Certification
Dedication
Acknowledgement
List of Tables
Table of Figures
Abstract
Chapter One: Introduction
1.1 Background of the Study
1.2 Statement of the Problem
1.3 Aim and Objectives of the Study
1.4 Research Methodology
1.5 Significance of the Study
1.6 Limitation of the Study
1.7 Arrangement of Thesis
Chapter Two: Literature Review
2.1 Introduction
2.2 Overview of Social Media
2.3 Evolution of Social Media
2.4 Overview of Popular Social Media Platforms
2.5 Overview of Text Mining
2.5.1 Text Mining and Data Mining
2.5.2 Areas of Text Mining
2.6 Text Mining Techniques
2.6.1 Naïve Bayes
2.6.2 Support Vector Machine
2.6.3 K-Nearest Neighbour
2.6.4 Decision Tree
2.6.5 Neural Network
2.7 Review of Existing Systems
2.7.1 Mining student data to analyse learning behaviour
2.7.2 Mining social media data to understand student learning experience
2.7.3 Mining educational data to analyse students' performance
Chapter Three: System Modeling and Design
3.1 Introduction
3.2 System Requirement
3.3 System Design
3.3.1 Unified Modelling Language
3.4 Overview of Naïve Bayes Classifier
3.4.1 Posterior Probability
3.4.2 Class-Conditional Probabilities
3.4.3 Prior Probabilities
3.4.4 Multi-Variate Bernoulli Naïve Bayes
3.4.5 Multinomial Naïve Bayes
3.4.6 Performance of Multi-Variate Bernoulli and Multinomial Model
3.4.7 Continuous Variables
3.4.8 Eager and Lazy Learning Algorithms
3.4.9 The Bag of Words Model
3.5 Text Pre-Processing
3.5.1 Tokenization
3.5.2 Stop Words
3.5.3 Stemming and Lemmatizing
3.5.4 N-grams
3.6 Workflow of the Project
Chapter Four: System Implementation and Evaluation
4.1 Introduction
4.2 System Requirements
4.2.1 Hardware Requirements
4.2.2 Software Requirements
4.3 Implementation Tools
4.4 System Modules and Interfaces
4.4.1 Home Page
4.4.2 Result Page
4.4.3 Result and Interpretation
4.4.4 Recommendation
4.5 Data Gathering Process
4.5.1 Train Dataset
Chapter Five: Summary, Recommendation and Conclusion
5.1 Summary
5.2 Recommendation
5.3 Conclusion


LIST OF TABLES

Table 2.1 Popular social media platforms
Table 2.2 Train dataset from document
Table 2.3 Frequency table for positive category
Table 2.4 Frequency table for negative category
Table 2.5 Weather dataset
Table 2.6 Comparison of various classification methods based on artificial neural networks (adapted from Sasithra & Saravanan, 2014)
Table 4.1 Server-side hardware requirements
Table 4.2 Client-side hardware requirements
Table 4.3 Development software requirements
Table 4.4 Web client software requirements


TABLE OF FIGURES

Figure 2-1 Seven practice areas of text mining
Figure 2-2 Artificial neural network (adapted from Sasithra & Saravanan, 2014)
Figure 2-3 Association rules graph for students with grade “fail” using Arviewer
Figure 2-4 Number of tweets for each issue detected from the Purdue tweet collection
Figure 3-1 Use case diagram showing the actions performed by management or educators
Figure 3-2 Activity diagram showing the flow of activities involved in analysing data
Figure 3-3 Workflow of social media data integrated with qualitative analysis and data mining algorithm
Figure 4-1 Homepage of application
Figure 4-2 A bar chart showing the category ratio of the learning experience of students offering GST121
Figure 4-3 Cross-section of comments used for classification and their corresponding category
Figure 4-4 Screenshot of Covenant University e-learning Moodle homepage
Figure 4-5 Screenshots of some comments posted by students on the discussion forum


ABSTRACT

The quality of teaching and learning in any institution can be traced to the learning experiences of its students.

Traditional methods of evaluating student learning experiences have limitations such as lack of flexibility, a degree of subjectivity, and no means of telling whether respondents are being truthful. Students, however, readily share their worries, struggles and concerns about their learning experiences on informal channels such as Facebook, Twitter and discussion forums.

The data available in such environments is massive and requires automated means, such as text mining techniques, to extract important information on students' experiences during their learning process.

The aim of this research is to design and implement a forum-based student learning experience analyzer using a text mining technique.

The system will help the management and educators of Covenant University make decisions that concern the performance of students.


CHAPTER ONE: INTRODUCTION

1.1 BACKGROUND OF THE STUDY

The Academic Ranking of World Universities (2015) shows that Harvard has held the top position in the annual worldwide ranking of top universities since the list started. Other institutions such as Stanford University, Princeton University, the University of Cambridge, the Massachusetts Institute of Technology (MIT) and the University of California, to mention a few, have been ranked among the best because of an important attribute they share: the quality of teaching and learning.

“It was found that there is a significant correlation between the performance of students

and satisfaction with academic process and facilities provided by the institution”

(Karemera, 2003). The learning experiences of students during their course of study are among the most pervasive sources of information about the quality of teaching and learning in an institution. Feedback on these experiences is usually obtained through formal methods such as questionnaires. However, owing to the need to express themselves freely, students also communicate their opinions on informal channels.

On various social media, students would usually share their worries, concerns,

excitement, happiness and struggle about their learning experiences (Pagare, 2014). In

particular, students express themselves on discussion forums. The volume of data

available in such environments is massive and requires automated means like text mining

techniques to provide valuable information on students’ experiences during their learning

process (Pagare, 2014).

“Text Mining is the process of discovering hidden and useful pattern from unstructured

text documents” (Patel, 2015). Text mining is also known as Knowledge Discovery in Text, and some specific techniques for achieving this include K-nearest neighbour, maximum entropy, neural networks, decision trees, support vector machines, Rocchio's algorithm and the Naïve Bayes multi-label algorithm.

Employing text mining techniques to derive useful information from students' informal conversations on social media platforms would yield a comparison graph showing the factors that affect the learning experiences of students offering a particular course, as well as recommended solutions to the management of the institution on how to enhance the quality of teaching and learning.

1.2 STATEMENT OF THE PROBLEM

Evaluation of student learning experiences is of interest to those who teach and are

accountable for the development and accreditation of courses.

Traditionally, methods such as surveys, focus groups and student evaluation of teaching (SET) questionnaires have been used as instruments to evaluate the learning experiences of students in order to understand the factors affecting their performance. However, these methods have the following limitations:

Flexibility: questionnaires are structured instruments and so allow very little flexibility.

Level of subjectivity: the opinions and feelings of respondents are often not captured because the options in the questionnaire are pre-defined.

Truthfulness: there is no way to tell whether a respondent is being truthful.

Given these concerns, there is a need to consider alternative methods of evaluating student learning experiences based on the massive user-generated content available on social media.

1.3 AIM AND OBJECTIVES OF THE STUDY

The aim of this study is to design and implement a forum-based student learning

experience analyzer using a Naïve Bayes classifier algorithm.

In order to attain this aim, the objectives of the study are:

To extract information that pertains to the educational life of students from an

informal electronic platform.


To preprocess the extracted data in order to obtain the relevant information needed for the implementation process.

To model the system using UML diagrams.

To implement a student learning experience analyzer using a Naïve Bayes

classifier algorithm.

1.4 RESEARCH METHODOLOGY

Literature Reviews: Various articles, books, journals and research papers would be studied. Existing projects relevant to this work would also be reviewed.

Data Collection: A discussion forum will be created on the Moodle platform (the e-learning management system of Covenant University) for students to post comments about their learning experiences for a particular course.

Modeling: A simplified representation of the social media based student learning

experience analyzer would be done using Unified Modeling Language (UML)

diagrams.

Implementation: Considering the fact that the system is web-based, implementation would be done using HTML, Scikit-Learn and Python.
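The classification step described above can be sketched as follows. This minimal Naïve Bayes text classifier is written with the Python standard library only, as an illustration of the underlying idea; the actual system would use Scikit-Learn's equivalent. The sample comments and labels are hypothetical, not from the project's dataset:

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label) pairs. Returns class counts, per-class
    word frequencies, and the vocabulary (a bag-of-words model)."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)   # label -> Counter of word frequencies
    vocab = set()
    for text, label in docs:
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def classify(text, label_counts, word_counts, vocab):
    """Pick the label maximizing log prior + sum of log likelihoods,
    with Laplace (add-one) smoothing for words unseen in a class."""
    total_docs = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical forum comments (illustrative only):
comments = [
    ("the lectures are clear", "positive"),
    ("great course very helpful", "positive"),
    ("too fast and confusing", "negative"),
    ("the workload is too heavy", "negative"),
]
model = train(comments)
print(classify("clear and helpful", *model))   # → positive
```

The same two functions would apply unchanged to comments gathered from the Moodle discussion forum, with the hand-labelled training set taking the place of the hypothetical one above.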

1.5 SIGNIFICANCE OF THE STUDY

This study is significant in improving the quality of teaching and learning delivered to students, and the performance of students, in Covenant University. The benefits of this study are:

To identify factors that affect the performance of students offering GST121.

To help the management of Covenant University make informed decisions towards improving the quality of teaching and students' performance.

To recommend solutions to the identified factors that affect student learning experiences.


1.6 LIMITATION OF THE STUDY

This study is limited in scope to courses offered by freshmen of Covenant University.

1.7 ARRANGEMENT OF THESIS

This project report consists of five chapters.

Chapter 1: This chapter gives a detailed background study of the system, the aim and

objectives, the research methodology, the significance of study and limitations of the

study.

Chapter 2: This chapter contains extensive information on the project from existing

projects and reviews from journals and books.

Chapter 3: This chapter focuses on the system analysis and design. It contains all the

diagrammatic models that would help give a structure of the system, extensive

information about the classifier algorithm implemented and steps involved in data pre-

processing.

Chapter 4: This chapter contains detailed information of the implementation of the

system, such as the programming language used, screen shots and the dataset used for

system deployment.

Chapter 5: This chapter contains recommendations and concluding remarks.


CHAPTER TWO: LITERATURE REVIEW

2.1 INTRODUCTION

The quality of teaching and learning is a top priority for most institutions. Attaining such a goal is possible with relevant information pertaining to the learning experiences of students.

Social media sites are platforms where people with similar interests share, upload and post content. Formal evaluation tools like questionnaires have several limitations, one of which is a lack of flexibility compared to informal channels where students freely express themselves.

Data on such channels is valuable for making decisions on how to improve students' achievement. To this end, there is a need for a processing method or technique to evaluate the learning experiences of students from these channels.

This chapter gives extensive information about key terms that are relevant to the research

study and existing systems in this field.

2.2 OVERVIEW OF SOCIAL MEDIA

The term “social media” refers to internet-based platforms that allow users to share and create content or join online communities. Classifications of social media include the following:

Social Network Sites: these sites allow users to view a list of other users with whom they share a connection. Examples of social network sites are Facebook, LinkedIn and Friendster (Boyd & Ellison, 2007).

Bookmarking Sites: social software tools that allow users to submit, classify, localise and share their bookmarked webpages on a hub site, where they can be tagged by other users. Bookmarking is the process people use to organise, arrange, maintain and preserve links to website pages. Examples of bookmarking sites are Delicious, Pinterest and Hacker News.


Social News Sites: sites that provide users with quick access to a variety of news articles. Articles and news from other websites are aggregated on the social news site, which enables users to share content with one another as well as interact with each other. Examples are Digg, Slashdot, Newsvine and Mixx.

Media Sharing Sites: these provide services that allow users to upload and share pictures and videos. Some of these services have social features such as commenting and profiles. Examples of media sharing sites are SlideShare, YouTube and Flickr.

Microblogging: this involves publishing digital content such as text, pictures, links and videos in small pieces on the internet. “Microblogging has become common among groups of friends and professional co-workers who often update content and follow each other’s posts, thereby creating a sense of online community. Popular examples include Twitter and Tumblr” (Educause, 2009).

Forums: A forum is a section of a website that enables users to connect and

interact with each other by commenting in response to a published post.

2.3 EVOLUTION OF SOCIAL MEDIA

Websites that enabled users to share, create and upload content began to emerge in the late 1990s, partly as a result of the growing popularity of broadband internet. In 1997, SixDegrees.com, the first social network site, was launched.

From 2002 onward, a large number of social network sites were created, including Myspace and Friendster.

More recently, social media has gained extensive acceptance. In July 2012, Twitter had an estimated 517 million users worldwide (Dewing, 2012).

2.4 OVERVIEW OF POPULAR SOCIAL MEDIA PLATFORMS

An overview of popular social media platforms is presented in Table 2.1.

Table 2.1 Popular social media platforms

Facebook: a social networking channel that enables users to send messages to friends, upload videos and pictures, and create profiles and groups.

Google+: a networking site with features such as personal profiles for uploading photos and videos, status updates, and “communities” for sharing information with several people. It also has special features like “hangouts” for video chatting with one or many people.

Twitter: a microblogging service that enables users to read and send messages known as “tweets” to a number of followers.

LinkedIn: a site used for professional networking. The network members are called “connections”.

Blogs: these typically focus on a specific subject and provide users with a comment area to discuss each posting.

Pinterest: a social media site that allows users to share photos and manage photo collections. Users can browse other pin boards for images or “like” photos.

YouTube: a social media site that allows users to share videos. Users can create their own “channels” on YouTube to organize their videos.

Flickr: a social media site that allows users to share and embed photographs. It is also used by bloggers for hosting images and videos.

Instagram: a social networking site that enables users to upload photos and videos as well as apply digital filters to them.

2.5 OVERVIEW OF TEXT MINING

Text Mining is an evolving field in computer science that is used to extract relevant

information from unstructured textual data through the identification and study of

patterns.

“The phrase ‘text mining’ refers to any system that analyses a huge amount of text and detects linguistic usage patterns in order to extract useful information” (Sebastiani, 2002).

According to Chen (2001), text mining performs various search functions, categorization

and linguistic analysis. Text Mining can simply be defined as the process involved in

analysing text to obtain information that is useful for a specific goal.

2.5.1 Text Mining and Data Mining

Data mining involves the identification of patterns in data, while text mining involves the identification of patterns in text. Data mining is the extraction of useful information from data (Witten and Frank, 2000).

“Text Mining as exploratory data analysis is a method of (building and) using software

systems to support researchers in deriving new and relevant information from large text

collection. It is a partially automated process in which the researcher is still involved,

interacting with the system. The interaction is a cycle based on the system assumptions,


and the user either utilizes or ignores those suggestions and decides on the next move”

(Hearst, 1999).

Data mining is a phase in Knowledge Discovery from Data (KDD). Knowledge Discovery from Data is concerned with the acquisition of useful knowledge from data.

“Data mining requires interaction between the data mining tools and the researcher and

so, may be considered as a computerized process because data mining tools automatically

search the data for anomalies thereby identifying problems that have not yet been clearly

stated by the end user, while data analysis ‘relies on the end users to select the data,

define the problem and instigate the appropriate data analysis to produce the information

that which helps to solve problems that they uncovered’” (Rob and Coronel, 2002).

2.5.2 Areas of Text Mining

Text mining incorporates seven practice areas: information retrieval, web mining, document clustering, document classification, information extraction, concept extraction and natural language processing.

Figure 2.1 Seven practice areas of text mining


2.5.2.1 Information Retrieval

Information retrieval (IR) is the process of finding materials (usually documents) of an unstructured nature (usually text) that satisfy an information need from within large collections (usually stored on computers).

Information retrieval is quickly becoming the dominant form of information access, surpassing conventional database-style searching. It can be considered an augmentation of document retrieval, in which documents are processed to consolidate or extract the specific information requested by the user. An IR system allows us to reduce the set of documents relevant to a particular problem; the best-known information retrieval systems are web search tools such as Google. IR can meaningfully accelerate analysis by decreasing the number of documents to be considered.

2.5.2.2 Document Clustering

Clustering is the breakdown of data into clusters (groups of similar objects). Each cluster consists of objects that share similar attributes, which distinguishes them from the objects of other groups. The aim of a good document clustering scheme is to reduce the distances between documents within a cluster while increasing the distances between clusters, using an appropriate distance measure between documents.

Clustering is a form of unsupervised learning, and this is the main difference between clustering and classification (supervised learning). “Unsupervised” means that documents have not been assigned to classes by a human expert. In classification, the classifier learns the association between objects and classes from a training set (data labelled manually) and then reproduces the learnt behaviour on test data (unlabelled data); in clustering, the nature of the data determines cluster membership (Jajoo, 2008).

a) Applications of Clustering

Clustering has several applications in the fields of business and science.


1) Finding Similar Documents: clustering enables the discovery of documents that are conceptually alike, in contrast to a search-based approach, where results are based on whether documents share many of the same words.

2) Organizing Large Document Collections: document retrieval emphasizes acquiring documents relevant to a specific query, but it fails to make sense of a large number of unclassified documents. The solution is to organise these documents in a category-based form, similar to the manual arrangement humans would produce given ample time.

3) Duplicate Content Detection: clustering can be applied to find duplicates within a collection of documents. It is used for grouping related articles, for plagiarism detection, and for re-ranking search results to ensure diversity among the top documents.

4) Recommendation System: Clustering enables the recommendation of articles for

users based on previous articles read.

5) Search Optimization: Clustering helps refine the quality of search engines by comparing the user's query to the clusters rather than to individual documents.

2.5.2.3 Document Classification

“Document classification is the allocation of natural language documents to predefined categories based on their attributes” (Sebastiani, 2002). It is a form of supervised learning where the categories for each training document are known in advance.

Automatic text classification has various applications, such as automatic extraction of metadata, indexing for document retrieval, and maintaining large collections of web

resources. In the 1990s, document classification was dominated by “knowledge

engineering” techniques that sought to extract categorization rules from human experts

and then, code the rules into a system which would enable an automatic classification of

new documents. Since then, the major approach has been to use machine learning

techniques to infer categories automatically from a training set of documents.


Machine learning techniques such as decision tree and association rules have been used

for text or document classification.

2.5.2.4 Web Mining

Web mining is an area of text mining that deals with the large volume of data on the web. Most documents on the web have a structured text format as well as hyperlinks between texts. With the growth of social media channels and the internet, the value of web mining will continue to increase.

Although web mining is still an emerging area in computer science, it makes use of

advanced technology in natural language processing and document classification.

2.5.2.5 Information Extraction

The goal of information extraction is to identify occurrences of specific, predefined classes of entities, relationships and events in natural language text, and to extract the relevant attributes of those entities, relationships or events. The information to be extracted is specified in user-defined formats called templates, which are supplied to the information extraction system for text processing. In essence, information extraction constructs structured data from unstructured text.

2.5.2.6 Natural Language Processing (NLP)

NLP is a field in computer science concerned with the development of systems that enable communication between humans and computers using natural language; it is also referred to as computational linguistics. Effective communication is the goal of processing natural language. Some NLP applications include machine translation, spelling and grammar checking, and optical character recognition (OCR).


2.5.2.7 Concept extraction

Concept extraction is an aspect of text mining that involves the extraction of concepts

from artifacts.

2.6 TEXT MINING TECHNIQUES

There are several techniques used in processing or mining text. Some of these text mining techniques include naïve Bayes, support vector machine, decision tree, k-nearest neighbour and neural network.

2.6.1 Naïve Bayes

Bayesian classification is a type of supervised learning with a statistical approach to classification that presumes an underlying probabilistic model. Naïve Bayes is a text categorization method with applications in language identification, sentiment detection, document categorization and email spam detection.

“The naïve Bayes approach to text classification is based on calculating the posterior probability of the documents present in the different classes” (Patel, 2015).

There are two phases involved in classifying text using naïve Bayes: the first phase trains on a set of data and the second is the classification phase.

Classification using the naïve Bayes classifier is achieved by computing the likelihood and the prior probability to form the posterior probability.

Prior probability: prior probability is based on previous experience. It is the probability

that an observation will fall into a group before you collect the data.

Posterior probability: It is the probability of assigning observations to groups given the

data.

The Bayesian classifier is based on Bayes' theorem: P(Cj | d) = P(d | Cj) P(Cj) / P(d)

Where,

P(Cj | d) is the probability of instance d being in class Cj,


p(d|Cj) is the probability of generating instance d given class Cj,

P(Cj) is the probability of occurrence of class Cj,

P(d) is the probability of instance d occurring.

2.6.1.1 Illustration

An example of the process involved in classifying a text document is given, following the

steps below.

Table 2.2 Train Dataset from Document

Doc No. Text Category

1. I Loved the movie +

2. I hated the movie -

3. A great movie. A good movie +

4. Poor acting -

5. Great acting. A good movie +

STEP ONE: Create a frequency table for documents in the positive category

Table 2.3 Frequency table for positive category

Doc  I  Loved  the  movie  hated  a  great  poor  acting  good
1    1  1      1    1
3                   1             1  1                    1
5                   1             1  1            1       1


STEP TWO: Create a frequency table for documents in the negative category

Table 2.4 Frequency table for negative category

Doc  I  Loved  the  movie  hated  a  great  poor  acting  good
2    1         1    1      1
4                                            1    1

STEP THREE: Compute the posterior probability of a positive outcome and of a negative outcome for the new document Vj

Vj = “I hated the poor acting”

If Vj is positive: P(+) * P(I|+) * P(hated|+) * P(the|+) * P(poor|+) * P(acting|+) ≈ 6.03 × 10^-7

If Vj is negative: P(-) * P(I|-) * P(hated|-) * P(the|-) * P(poor|-) * P(acting|-) ≈ 1.22 × 10^-5

CONCLUSION: “I hated the poor acting” falls under the negative category because the computed posterior probability of the negative outcome is greater than the posterior probability of the positive outcome.
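The worked example can be reproduced with a short script. This is a minimal sketch using Laplace smoothing; because the exact intermediate values depend on how word occurrences are counted, the numbers may differ slightly from those quoted above, but the classification outcome is the same:

```python
from collections import Counter

# Training documents from Table 2.2
train = [
    ("I loved the movie", "+"),
    ("I hated the movie", "-"),
    ("A great movie. A good movie", "+"),
    ("Poor acting", "-"),
    ("Great acting. A good movie", "+"),
]

def tokens(text):
    return text.lower().replace(".", "").split()

vocab = set()
counts = {"+": Counter(), "-": Counter()}
docs_per_class = Counter()
for text, label in train:
    words = tokens(text)
    vocab.update(words)
    counts[label].update(words)
    docs_per_class[label] += 1

def posterior(text, label):
    """Unnormalised posterior: P(class) * product of P(word | class), Laplace-smoothed."""
    total = sum(counts[label].values())
    p = docs_per_class[label] / len(train)
    for w in tokens(text):
        p *= (counts[label][w] + 1) / (total + len(vocab))
    return p

query = "I hated the poor acting"
scores = {c: posterior(query, c) for c in ("+", "-")}
print(max(scores, key=scores.get))  # the class with the larger posterior
```

The negative posterior dominates, so the query is classified as negative, matching the conclusion above.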

2.6.1.2 Strengths of Naïve Bayes

It is simple to implement.

It is easy to train.

2.6.1.3 Weaknesses of Naïve Bayes

It has a strong feature independence assumption


2.6.2 Support Vector Machine

SVM was first introduced in 1992 by Boser, Guyon and Vapnik. SVM is grounded in statistical learning theory. As the figures below show, a training set can be either linearly separable or non-linearly separable.

Figure 2.2 Linearly separable

Figure 2.3 Non-linearly separable

NB: The challenge posed by training sets that are not linearly separable is addressed by transforming the original data into a new feature space using a kernel function.

For the function f(x) = w^T x + b:

w is the normal to the separating plane and is known as the weight vector

b is the bias


Since w^T x + b = 0 and c(w^T x + b) = 0 define the same plane, the normalization for w can be freely chosen. Normalization is chosen such that w^T x + b = +1 for the positive support vectors and w^T x + b = -1 for the negative support vectors.

Then, the margin is given by

margin = (w / ||w||) . (x+ - x-) = (w^T x+ - w^T x-) / ||w|| = 2 / ||w||

“Support vectors are the data points that lie close to the decision margin. They are the essential elements of every training set.” (Berwick, 2003)


SVM maximises the margin around the separating decision boundary. Finding the optimal hyperplane is an optimization problem that can be solved using optimization techniques such as Lagrange multipliers (Zisserman, 2015).
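A minimal sketch of the linear decision function and the resulting margin; the weight vector and bias here are arbitrary hypothetical values, not the result of training:

```python
from math import sqrt

def decision(w, b, x):
    """f(x) = w . x + b; the sign gives the predicted side of the hyperplane."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def margin(w):
    """Geometric margin between the support hyperplanes w.x + b = +1 and -1."""
    return 2 / sqrt(sum(wi * wi for wi in w))

w, b = [3.0, 4.0], -1.0
print(decision(w, b, [1.0, 1.0]))  # positive: the point lies on the f(x) > 0 side
print(margin(w))                   # 2 / ||w||
```

Training an SVM amounts to choosing w and b so that this margin is as large as possible while the training points stay on the correct sides.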

2.6.2.1 Strengths of SVM

It is easy to train

It scales relatively well to high dimensional data

Tradeoff between error and classifier complexity can be controlled explicitly.

2.6.2.2 Weaknesses of SVM

Difficulty in choosing a “good” kernel function.

2.6.3 K-Nearest Neighbour

Using a KNN classifier, an object is assigned to the class most common amongst its k nearest neighbours, where k is a positive integer. If k = 1, the object is assigned to the class of its single nearest neighbour.

In binary classification problems, it is advisable to choose k to be an odd number in order to avoid tie votes.

Figure 2.4 Graphical representation of an unknown class and its neighbours


If k = 5, then in the figure above, x will be classified as a circle because three of its five nearest neighbours are circles.

Distance Metrics: To make predictions using KNN, a metric is needed for measuring the distance between the stored cases and the query point. Examples of distance functions are the Manhattan distance and the Euclidean distance.
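A minimal KNN sketch in pure Python; the toy coordinates are hypothetical, chosen to mirror the k = 5 situation described above:

```python
from collections import Counter
from math import sqrt

def euclidean(p, q):
    return sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))

def knn_predict(train, query, k=5, distance=euclidean):
    """Assign the query point to the majority class among its k nearest neighbours."""
    neighbours = sorted(train, key=lambda item: distance(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]

# Toy data: circles cluster near the query, squares lie farther away
train = [((1, 1), "circle"), ((1, 2), "circle"), ((2, 1), "circle"),
         ((5, 5), "square"), ((6, 5), "square"), ((6, 6), "square")]
print(knn_predict(train, (1.5, 1.5), k=5))
```

With k = 5 the vote is three circles against two squares, so the query is labelled a circle, just as in the figure.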

2.6.3.1 Strengths of KNN

The learning process is less expensive

There are no assumptions about the characteristics of the concepts to learn

Complex concepts can be learned by local approximation

2.6.3.2 Weaknesses of KNN

The model cannot be interpreted

It is computationally expensive to find the k nearest neighbors when the dataset is

large

Performance depends on the number of dimensions

2.6.4 Decision Tree

Decision tree is a classification technique that generates a tree and a set of rules, representing the model of the different classes, from a given dataset. According to Han and Kamber (2001), a decision tree is a flowchart-like tree structure, where each internal node represents a test on an attribute, each branch represents an outcome of the test, and each leaf node represents a class. The topmost node in a decision tree is referred to as the root node. The rules corresponding to a tree are derived by traversing the tree from the root node to each leaf.

There are two phases involved in developing a decision tree.

Tree building phase: This phase involves partitioning the training data repeatedly until all the objects in each partition belong to the same class.


Tree pruning phase: This phase involves the removal of variation or statistical

noise particular to a training set.

ID3 Algorithm

Iterative Dichotomiser 3 (ID3) was introduced by Quinlan for creating decision trees from data. In ID3, the nodes represent splitting attributes while the branches carry the possible values of each attribute. At each node, the splitting attribute chosen is the most useful among the attributes not yet considered in the path from the root. The ID3 algorithm uses information gain to determine the effectiveness of a split. The attribute with the highest information gain is selected as the splitting attribute, and the dataset is then split for all discrete values of that attribute.

Illustration of classification decision tree

The dataset used to illustrate classification with a decision tree is presented in Table 2.5.

Table 2.5 Weather dataset

ID  OUTLOOK   TEMPERATURE  HUMIDITY  WIND    PLAY
1   Sunny     Hot          High      Weak    No
2   Sunny     Hot          High      Strong  No
3   Overcast  Hot          High      Weak    Yes
4   Rain      Mild         High      Weak    Yes
5   Rain      Cool         Normal    Weak    Yes
6   Rain      Cool         Normal    Strong  No
7   Overcast  Cool         Normal    Strong  Yes
8   Sunny     Mild         High      Weak    No
9   Sunny     Cool         Normal    Weak    Yes
10  Rain      Mild         Normal    Weak    Yes
11  Sunny     Mild         Normal    Strong  Yes
12  Overcast  Mild         High      Strong  Yes
13  Overcast  Hot          Normal    Weak    Yes
14  Rain      Mild         High      Strong  No

The condition attributes in the dataset are outlook, temperature, humidity and wind. The decision attribute is whether or not to play. {Sunny, overcast, rain}, {hot, mild, cool},


{high, normal} and {weak, strong} are the values of the attributes outlook, temperature,

humidity and wind respectively.

Entropy provides an information-theoretic approach to measuring the effectiveness of a split. It measures the amount of information in an attribute.

Entropy(S) = -p(P) log2 p(P) - p(N) log2 p(N)

For the dataset above, Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

Entropy(S_weak) = -(6/8) log2(6/8) - (2/8) log2(2/8) = 0.811

Entropy(S_strong) = -(3/6) log2(3/6) - (3/6) log2(3/6) = 1.0

Gain(S, wind) = Entropy(S) - (8/14) * Entropy(S_weak) - (6/14) * Entropy(S_strong)

= 0.940 - (8/14)(0.811) - (6/14)(1.0) = 0.048

Similarly, Gain(S, outlook) = 0.246

Gain(S, temperature) = 0.029

Gain(S, humidity) = 0.151

Since the “outlook” attribute has three values, the root node will have three branches

(sunny, overcast, rain). The next question is “what attribute should be tested at the sunny

branch node?” Since outlook has already been used as the root node, the choice of the next decision node lies among the remaining three attributes: humidity, temperature or wind.

Gain (S sunny, humidity) = 0.970

Gain (S sunny, temperature) = 0.570

Gain (S sunny, wind) = 0.019

Humidity has the highest gain; therefore it is used as the next decision node. This process

continues until all data in the dataset is classified.
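The entropy and information-gain computations above can be checked with a short script (pure Python, using the weather dataset from Table 2.5):

```python
from math import log2
from collections import Counter

# Weather dataset from Table 2.5: (outlook, temperature, humidity, wind, play)
data = [
    ("sunny","hot","high","weak","no"), ("sunny","hot","high","strong","no"),
    ("overcast","hot","high","weak","yes"), ("rain","mild","high","weak","yes"),
    ("rain","cool","normal","weak","yes"), ("rain","cool","normal","strong","no"),
    ("overcast","cool","normal","strong","yes"), ("sunny","mild","high","weak","no"),
    ("sunny","cool","normal","weak","yes"), ("rain","mild","normal","weak","yes"),
    ("sunny","mild","normal","strong","yes"), ("overcast","mild","high","strong","yes"),
    ("overcast","hot","normal","weak","yes"), ("rain","mild","high","strong","no"),
]
ATTRS = {"outlook": 0, "temperature": 1, "humidity": 2, "wind": 3}

def entropy(rows):
    """Entropy of the class label (last column) over the given rows."""
    total = len(rows)
    return -sum((n / total) * log2(n / total)
                for n in Counter(r[-1] for r in rows).values())

def gain(rows, attr):
    """Information gain of splitting `rows` on `attr` (ID3's splitting criterion)."""
    idx = ATTRS[attr]
    total = len(rows)
    remainder = 0.0
    for value in set(r[idx] for r in rows):
        subset = [r for r in rows if r[idx] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(rows) - remainder

print(round(entropy(data), 3))        # entropy of the full dataset, about 0.940
for a in ATTRS:
    print(a, round(gain(data, a), 3))  # outlook has the highest gain
```

Running this reproduces the values quoted above: outlook has the highest gain and is therefore chosen as the root node.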


Figure 2.5 Representation of a constructed decision tree

The corresponding rules are:

If outlook = sunny and humidity = high then play = no

If outlook = sunny and humidity = normal then play = yes

If outlook = overcast then play = yes

If outlook = rain and wind = weak then play = yes

If outlook = rain and wind = strong then play = no

2.6.4.1 Strengths of decision tree

It generates logical rules

Less computation is required for classification

It processes continuous and categorical variables

2.6.4.2 Weaknesses of decision tree

It is not suitable for prediction of continuous attribute

It performs inadequately given many classes and small data

It is computationally expensive to generate a decision tree


2.6.5 Neural Network

Neural network is a branch of artificial intelligence, usually referred to as artificial neural networks (ANNs). Rather than programming systems to execute a certain task, with ANNs the systems learn to perform the task, yielding an artificial intelligence system (AIS): a logical model that can find hidden patterns in data.

An artificial neural network is made up of several artificial neurons linked together to form a network architecture. The aim of a neural network is to convert the input into meaningful output (Sasithra & Saravanan, 2014). The learning mode can be unsupervised or supervised. Application areas of ANNs include bankruptcy prediction, speech recognition and fault detection.

Figure 2.6 Artificial neural network (adapted from Sasithra & Saravanan, 2014)

A neural network is trained using a back-propagation algorithm. The gradient descent method (GDM) is used to minimise the mean squared error between the network output and the actual output. The parameters considered when measuring the efficiency of the network are:


the number of epochs taken for the network to converge, the calculated mean squared error, and the rate of convergence.
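As a minimal illustration of gradient descent minimising a mean squared error, the sketch below trains a single linear neuron, the simplest special case of a network; the training data (noise-free samples of y = 2x + 1) are hypothetical:

```python
def train_linear_neuron(samples, epochs=500, lr=0.05):
    """Batch gradient descent minimising the mean squared error of one linear neuron."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, target in samples:
            error = (w * x + b) - target
            grad_w += 2 * error * x / len(samples)  # d(MSE)/dw contribution
            grad_b += 2 * error / len(samples)      # d(MSE)/db contribution
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Noise-free samples of y = 2x + 1; the neuron should recover the weights
samples = [(x, 2 * x + 1) for x in range(-3, 4)]
w, b = train_linear_neuron(samples)
print(round(w, 2), round(b, 2))
```

Each epoch moves the weights a small step down the error gradient; over many epochs the mean squared error shrinks towards zero. Back-propagation extends this same idea to multi-layer networks by propagating the error gradient backwards through the layers.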

Table 2.6 Comparison of various classification methods based on artificial neural networks (adapted from Sasithra & Saravanan, 2014)


2.6.5.1 Strengths of Neural Network

It produces good results for complex domains

It is suitable for both continuous and discrete data

2.6.5.2 Weaknesses of neural network

It is usually difficult for users to interpret learned result

Training is relatively slow

2.7 REVIEW OF EXISTING SYSTEM

A few existing and related projects were reviewed during the course of this research project.


2.7.1 Mining student data to analyse learning behaviour

The research project was conducted by Alaa El-Halees of the department of Computer

Science, Islamic University of Gaza. The study involved four phases from which

knowledge was extracted to describe students’ behaviour.

A data mining technique was used to determine association rules, which were sorted using the lift metric and then represented graphically.

Classification rules were discovered using a decision tree.

The students were clustered into groups using EM clustering.

An outlier analysis was used to detect all outliers in the data.

Figure 2.7 Association rules graph for students with grade “fail” using Arviewer

2.7.2 Mining social media data to understand student learning experience

The study explored social media (Twitter) with the aim of understanding the learning experiences of engineering students at Purdue University, United States of America.


Using an inductive content analysis, it was discovered that engineering students at Purdue University were struggling with a heavy study load, which leads to several outcomes such as sleep problems, lack of social engagement, and other physical and psychological health problems.

2,785 tweets with the hashtag #engineeringproblems were related to the educational life of engineering students at Purdue University, and five categories for classification were identified. 70 percent of the 2,785 tweets were used for training (1,950 tweets) and 30 percent for testing (835 tweets).

Figure 2.8 Number of tweets for each issue detected from the Purdue tweet collection

2.7.3 Mining educational data to analyse students’ performance

The main objective of the research project was to use a decision tree (a data mining technique) to understand students’ performance. A decision tree is a tree in which each branch node represents a choice between several alternatives, and each leaf node represents a decision.


Information such as class test scores, class attendance and seminar or assignment marks was collected from the student management system to predict students’ performance at the end of the semester.


CHAPTER THREE: SYSTEM MODELING AND DESIGN

3.1 INTRODUCTION

This chapter focuses on an overview of naïve Bayes classifier algorithm, steps involved

in pre-processing data as well as the diagrammatic models used to design the system

architecture.

3.2 SYSTEM REQUIREMENT

The system requirements define the system’s functions and operational constraints in detail.

System requirements are divided into functional requirements and non-functional requirements.

The functional requirements include the following:

The system allows users to view an analysis of the factors affecting the learning experience of students for a particular course in graphical form.

The system allows users to view recommendations for a particular course.

The Non-functional requirements include the following:

The graphical user interface should be simple and user-friendly.

The system should be efficient, that is, it should perform its tasks in limited time and with limited computer resources.

The system should be reliable.

The system should analyse and classify data accurately.

3.3 SYSTEM DESIGN

System modelling is the act of representing the features of a system using graphical notation. System modelling helps an analyst to fully understand the functionality of the


system; models are also used to communicate effectively with customers (Sommerville, 2007).

3.3.1 Unified modelling language

Unified modelling language (UML) is an industry-standard graphical notation for describing software analysis and designs.

Types of UML Diagrams

1. Activity diagrams: demonstrate the activities that make up a process.

2. Use-case diagrams: demonstrate the interactions that exist between a system and its environment.

3. Sequence diagrams: interaction diagrams that show how processes operate with one another and in what order.

4. Class diagrams: describe the structure of a system; they are a building block of object-oriented modelling and are used for general conceptual modelling of the application.

5. State diagrams: describe the behaviour of a single object in response to a series of events in a system.

For this project, the system will be modelled using a use case diagram and an activity diagram.

3.3.1.1 Use Case Diagram

A use case diagram is a graphical representation of the relationships that exist between use cases and actors. Use cases are developed at the requirements elicitation stage of software engineering and are refined as they are reviewed by stakeholders during analysis. “A use case is a typical representation of a major piece of complete functionality” (Bernd Bruegge, 2000).

A use case is represented as an ellipse. It has a unique name (usually a present-tense verb phrase) expressed in an active voice.


An actor represents a human or computer that interacts with the system. In UML, an actor is represented by a stick figure.

Relationships between use cases are represented by arrows and lines. The default relationship that exists between a use case and an actor is the <<communicates>> relationship, represented by a line.

Figure 3.9 Use case diagram showing the actions performed by management or educators

3.3.1.2 Activity Diagram

Activity diagrams show a breakdown of the complex flow of a use case. UML activity diagrams are an enhanced form of flowchart. “An activity is a step that needs to be executed, whether by a computer or by a human” (Fowler, 2000).

Furthermore, activity diagrams allow for parallelism (activities can run alternately, simultaneously or consecutively) when the order of activities is not important.


Figure 3.10 Activity diagram showing the flow of activities involved in analysing data

3.4 OVERVIEW OF NAÏVE BAYES CLASSIFIER

Naïve Bayes classifiers are known to be efficient. The probabilistic model rests on the assumption, following Bayes’ theorem, that the attributes in the dataset are mutually independent.

Naïve Bayes has been applied in various fields because of its strengths, some of which are that it is easy to implement and relatively robust. Applications of naïve Bayes include diagnosis of diseases and spam filtering, to mention a few. Moreover, the nature of the problem to be solved determines the classification model to be used.

3.4.1 Posterior Probability

Posterior probability can be understood as: “what is the probability that a particular object or entity belongs to class ωj given its observed feature values?” A concrete example would be: “what is the probability that a boy has diabetes given a certain value for a pre-breakfast blood glucose measurement and a certain value for a post-breakfast blood glucose measurement?”

P(diabetes | xi), xi = [90 mg/dl, 145 mg/dl]

Let

xi be the feature vector of sample i, i ∈ {1, 2, …, n}

ωj be the notation of class j, j ∈ {1, 2, …, m}

P(xi | ωj) be the probability of observing sample xi given that it belongs to class ωj.


The objective in naïve Bayes classification is to maximise the posterior probability given the training data in order to formulate the decision rule.

3.4.2 Class-Conditional Probabilities

The class-conditional probabilities (likelihoods) can be estimated directly from the training data. Given a d-dimensional feature vector x, the class-conditional probability P(xi | ωj) expresses “how likely is it to observe this particular pattern xi given that it belongs to class ωj?” For every feature vector, the individual likelihood can be derived from the maximum-likelihood estimate, which for categorical data is simply a frequency.

3.4.3 Prior Probabilities

The prior probabilities describe the possibility of encountering a particular class.

If there is a uniform distribution among the priors, the posterior probabilities will be

determined by the evidence term and the class-conditional probabilities. But if the


The maximum-likelihood estimate of an individual feature likelihood is

P(xi | ωj) = N(xi, ωj) / N(ωj)

Where,

N(xi, ωj) is the number of times feature xi appears in samples from class ωj,

N(ωj) is the total count of all features in class ωj.


evidence term is a constant, the decision rule will depend only on the class-conditional probabilities.

3.4.4 Multi-Variate Bernoulli Naïve Bayes

The multivariate Bernoulli model is based on data in binary form. Each token in the feature vector of a document is given a value of either 1 or 0. The feature vector has m dimensions, where m is the number of words in the vocabulary. The value “1” indicates the occurrence of a word in the document, while the value “0” indicates the non-occurrence of that word in the text document d.

3.4.5 Multinomial Naïve Bayes

3.4.5.1 Term Frequency

It is sometimes referred to as raw frequency. Term frequency is another method that can

be used to categorize document rather than using binary values – in the case of

multivariate Bernoulli naïve Bayes.

Basically, term frequency is the amount of times a given token t occurs in a document d.

Maximum-likelihood estimate can be derived from term frequency by training data to

evaluate the class-conditional probabilities in the multinomial model.


Under the multivariate Bernoulli model, the class-conditional probability of a document x = (x1, …, xm) is

P(x | ωj) = ∏ (i = 1 to m) P(xi | ωj)^b · (1 − P(xi | ωj))^(1−b), where b ∈ {0, 1} is the binary value of feature xi.

Let P(xi | ωj) be the maximum-likelihood estimate that a particular word or token xi occurs in class ωj:

P(xi | ωj) = (df(xi, y) + 1) / (df(y) + 2)

Where,

df(xi, y) represents the number of documents in the training dataset that contain the feature xi and belong to the class ωj,

df(y) represents the number of documents in the training dataset that belong to the class ωj.


3.4.5.2 Term Frequency – Inverse Document Frequency (TF-IDF)

Another method for representing documents is TF-IDF, which can be thought of as a weighted term frequency; it is also useful for down-weighting stop words in text documents. The approach presumes that the significance of a word increases with the number of times it occurs in a document, but decreases with the number of documents in which it occurs across the collection.

Aside from being used to rank documents by relevance, TF-IDF can be applied to text classification using any text classification technique, such as naïve Bayes.
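A minimal TF-IDF sketch in pure Python, using the plain idf = log(N/df) form (other weighting variants exist); the three toy documents are hypothetical:

```python
from math import log

def tf_idf(documents):
    """Weight each term by term frequency x inverse document frequency."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    df = {}  # document frequency: in how many documents each term appears
    for doc in tokenized:
        for term in set(doc):
            df[term] = df.get(term, 0) + 1
    weights = []
    for doc in tokenized:
        w = {}
        for term in set(doc):
            tf = doc.count(term) / len(doc)
            idf = log(n_docs / df[term])
            w[term] = tf * idf
        weights.append(w)
    return weights

docs = ["the movie was great", "the movie was poor", "the plot was dull"]
weights = tf_idf(docs)
print(weights[0]["the"])    # appears in every document, so its weight is 0
print(weights[0]["great"])  # appears in only one document, so it weighs most
```

A word like "the" that occurs in every document gets idf = log(1) = 0, which is exactly how TF-IDF suppresses stop words.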

3.4.6 Performance of Multi-Variate Bernoulli and Multinomial Model

Based on empirical comparisons, there is evidence that the multinomial model tends to outperform the multivariate Bernoulli model when the vocabulary size is relatively large. However, the performance of machine learning algorithms is highly dependent on an appropriate choice of features.

In practice, the choice between the multivariate Bernoulli and multinomial models for text classification should be informed by comparative studies that include different combinations of feature selection and feature extraction processes.

3.4.7 Continuous Variables

Although naïve Bayes is conventionally applied to categorical data, it can also be used on continuous data. A common strategy for doing so is to use a Gaussian distribution to calculate the class-conditional probabilities. The Gaussian naïve Bayes model can be written as:


P(xik | ω) = (1 / √(2πσω²)) · exp( −(xik − μω)² / (2σω²) )

Where,

μω (the sample mean) and σω (the standard deviation) are parameters estimated from the training data. Under the naïve Bayes assumption of conditional independence, the class-conditional probability can then be computed as the product of the individual probabilities.
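A minimal sketch of this Gaussian class-conditional density; the per-class means and standard deviations for the glucose feature are hypothetical values chosen purely for illustration:

```python
from math import exp, pi, sqrt

def gaussian_likelihood(x, mu, sigma):
    """Class-conditional density P(x | class) under a Gaussian assumption."""
    return (1.0 / (sqrt(2 * pi) * sigma)) * exp(-((x - mu) ** 2) / (2 * sigma ** 2))

# Hypothetical per-class estimates of a pre-breakfast glucose feature (mg/dl)
mu_diabetes, sigma_diabetes = 130.0, 20.0
mu_healthy, sigma_healthy = 90.0, 10.0

x = 125.0
p_d = gaussian_likelihood(x, mu_diabetes, sigma_diabetes)
p_h = gaussian_likelihood(x, mu_healthy, sigma_healthy)
print(p_d > p_h)  # the observation is far more likely under the diabetes class
```

Multiplying such densities across all continuous features (and by the class prior) gives the posterior used for classification.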


3.4.8 Eager and Lazy Learning Algorithms

Eager learners are machine learning algorithms that learn a model from the training dataset as soon as the data is available. Once the model is learned, the training data no longer needs to be re-evaluated in order to make a new prediction. The naïve Bayes classifier is an example of an eager learning algorithm, which is why it is relatively fast at classifying new instances.

Lazy learners, on the other hand, predict the class label of new instances by memorizing and re-evaluating the training dataset. The advantage of a lazy learning algorithm is that the training phase is relatively fast. Nonetheless, the actual prediction is slower compared to eager learners as a result of the re-evaluation of the training data. An example of a lazy learner is the k-nearest neighbour algorithm.

3.4.9 The Bag of Words Model

Good features are measured by two properties:

Salient: features must be relevant with respect to the problem domain.

Discriminatory: features should contain adequate information to distinguish accurately between patterns when used to train the classifier.

Prior to training a machine learning algorithm, a text document needs to be represented as a feature vector. The Bag of Words (BOW) model is used for this processing.

“The BOW representation of a document D enables the transformed dataset to be viewed

as a matrix, where vectors are represented as rows and terms are represented as columns.

This view enables the application of various matrix decomposition techniques to

clustering and dimensionality reduction. Moreover, documents can be compared using

classical distance/similarity measures since they are treated as vectors.” (Milos &

Mirjana, 2008).
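The matrix view described in the quote can be sketched in a few lines; the three toy documents are hypothetical:

```python
def bag_of_words(documents):
    """Build a document-term matrix: one row per document, one column per vocabulary term."""
    vocabulary = sorted({word for doc in documents for word in doc.lower().split()})
    matrix = [[doc.lower().split().count(term) for term in vocabulary]
              for doc in documents]
    return vocabulary, matrix

docs = ["the movie was great", "the acting was poor", "great movie great acting"]
vocab, matrix = bag_of_words(docs)
print(vocab)   # the terms, one per column
print(matrix)  # rows are documents, entries are term counts
```

Each document becomes a row vector of term counts, so the classical distance and similarity measures mentioned above can be applied directly.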


3.5 TEXT PRE-PROCESSING

3.5.1 Tokenization

Tokenization involves breaking a document down into individual tokens, which are used as input to many natural language processing (NLP) algorithms. Tokenization is usually followed by further processing steps, such as stemming, lemmatizing, stop-word removal and the construction of n-grams.

3.5.2 Stop Words

These are characters or words that are common in text documents and hence are not relevant for training a dataset. Examples of such words are: or, so, to, and, the. Stop-word removal can be done by using a stop-word dictionary to search for stop words.

Another method is to create a stop list by sorting every word in the collection by frequency; the stop list is then used to remove the words ranked among the top n words in the list.

3.5.3 Stemming and Lemmatizing

Stemming is the process of deriving the root form of a word. The Porter stemmer, a well-known stemming algorithm, was developed by Martin F. Porter in 1979. For example, the words machinery, mechanism and mechanizing can be stemmed to machine.

Lemmatization is computationally more expensive and more difficult than stemming. However, stemming and lemmatization have little influence on the performance of text classification.

3.5.4 N-grams

Using the n-gram model, a token is defined as a sequence of n items. The most common n-gram model is the unigram (1-gram), where each token consists of a single word. The choice of n depends on two factors: the language and the application.
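The pre-processing steps described in this section can be sketched as a small pipeline; the stop-word list here is a hypothetical, abbreviated one:

```python
STOP_WORDS = {"the", "a", "an", "and", "or", "so", "to", "is", "of"}

def preprocess(text, n=2):
    """Tokenize, remove stop words, then build n-grams from the remaining tokens."""
    raw_tokens = text.lower().replace(".", "").replace(",", "").split()
    filtered = [t for t in raw_tokens if t not in STOP_WORDS]
    ngrams = [tuple(filtered[i:i + n]) for i in range(len(filtered) - n + 1)]
    return filtered, ngrams

tokens, bigrams = preprocess("The lecture was difficult to follow and the pace was fast.")
print(tokens)   # tokens with stop words removed
print(bigrams)  # 2-grams over the remaining tokens
```

Stemming or lemmatization would slot in between stop-word removal and n-gram construction, normalising each surviving token to its root form.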


3.6 WORKFLOW OF THE PROJECT

Figure 3.11 Workflow of social media data integrated with qualitative analysis and data mining algorithms

In Figure 3.11, the width of the blue arrows represents data volume: wider arrows indicate more data. Black arrows represent the flow of data analysis, computation and results. The figure models the steps involved in this project.

Step one – Data Gathering

With the consent of the lecturer for Communications in English II (GST121), a

discussion forum was created on the e-learning management system (URL –

[email protected]) by the CSIS department, Covenant University.

According to Martin (2009), discussion is a significant dimension of a learning process.

On 10th March 2016, the discussion forum was opened for students offering GST121 to post comments about their learning experiences. A total of 118 messages were gathered from the discussion forum for the study.


Step two – Data Sampling

After data was gathered and stored, specific comments related to the learning experiences

of students offering GST121 were used for qualitative analysis.

Step three – Qualitative Data Analysis

A qualitative content analysis, also known as inductive content analysis, was conducted on the samples. It is a qualitative research method used to manually analyse text content and derive categories from the data.

According to Rost et al., faulty assumptions are likely to occur in large-scale social media data analysis if machine learning algorithms are used without first carrying out a qualitative survey of the data. “We concur with this argument as it is found that no appropriate unsupervised algorithm could reveal in-depth meanings to the data” (Chen, Vorvoreanu, & Madhavan, 2014).

Step four – Qualitative Result

The major problems affecting the learning experiences of students offering GST121 were classified under several categories.

Step five – Model Adaptation

Based on the categories from step four, a multi-label naïve Bayes classification algorithm was implemented. The classification algorithm is used to train the analyzer that classifies the learning experiences of students offering GST121.
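A minimal sketch of this step with scikit-learn is shown below. The comments, labels and category names are hypothetical stand-ins for the real forum data, and a single-label multinomial naïve Bayes over word counts is used as a simplification of the multi-label setup described above:

```python
# Illustrative sketch of the step-five classifier: a naive Bayes text
# classifier over word counts. The comments and labels below are
# hypothetical stand-ins for the real forum data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

comments = [
    "the hall is too hot and the air conditioner is off",
    "the lecturer reads directly from the slides",
    "I do not see how this course is relevant to my degree",
    "no ventilation in the lecture hall",
    "lectures are not engaging, just slides",
    "the course content feels irrelevant",
]
labels = [
    "unfavourable environment", "lecture quality", "relevance of the course",
    "unfavourable environment", "lecture quality", "relevance of the course",
]

# Bag-of-words features feeding a multinomial naive Bayes classifier.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(comments, labels)

print(model.predict(["the air conditioner was not switched on"])[0])
```

Once trained, the same pipeline can classify any new comment, which is what allows the analyzer to process the full test set automatically.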

Step six – Data Analysis Result

The results would help educators make informed decisions about the factors that affect students’ learning experiences.


CHAPTER FOUR: SYSTEM IMPLEMENTATION AND EVALUATION

4.1 INTRODUCTION

This chapter provides an overview of the chosen programming languages, the software and hardware requirements, and the different interfaces that were implemented.

4.2 SYSTEM REQUIREMENTS

System requirements are descriptions of a system’s functionality and operational constraints. Requirements may range from high-level abstract statements of the services provided by a system and its operational constraints to detailed mathematical functional specifications. There are two types of system requirements:

Hardware Requirements

Software Requirements

4.2.1 Hardware Requirements

This part of the system requirements is concerned with the physical components of a computer that are needed for effective and efficient operation of the application. The hardware requirements are sub-divided into:

Server Requirement

Client Requirement

Table 4.7 Server-side hardware requirements

REQUIREMENT HARDWARE
Processor Intel Core i5 2.0 GHz or higher
Primary Memory (RAM) 6 GB of RAM or higher
Secondary Memory 10 GB hard disk space or higher
Architecture x64 (64-bit)

Table 4.8 Client-side hardware requirements

REQUIREMENTS HARDWARE
Processor Intel Pentium III 1.2 GHz or higher
Primary Memory (RAM) 1 GB of RAM or higher
Secondary Memory 3 GB hard disk space or higher

4.2.2 Software Requirements

Software requirements are concerned with defining the software resources and prerequisite applications that need to be installed on a computer for the application to operate optimally. These can be classified into:

Development Software Requirements: These are tools and software that are

needed for the successful development and deployment of the application.

Client Software Requirements: These are the software packages needed to run the application.

Table 4.9 Development software requirements

REQUIREMENTS SOFTWARE
Operating system Microsoft Windows 8.0, 8.1 and 10
Programming Languages Python, HTML, CSS, JavaScript
Core Python Packages Flask, Scikit-Learn, Pandas
Development tool PyCharm IDE
Web Server Flask Development Server

Table 4.10 Web client software requirements

REQUIREMENTS SOFTWARE
Operating System Microsoft Windows 7, 8.0, 8.1, 10
Internet Browser Google Chrome, Mozilla Firefox, Opera

4.3 IMPLEMENTATION TOOLS

The tools used to implement the application include:

1. Python: Python is a high-level, interactive and interpreted language. Other features of Python include being easy to learn, an interactive mode, scalability and ease of maintenance. Python was chosen as the programming language because of its rich standard library and the wide array of packages available for machine learning.


2. Hypertext Markup Language (HTML): HTML is a markup language used to create user interfaces for webpages as well as mobile applications. HTML is used together with CSS and JavaScript.

3. Python Machine Learning Packages: Packages are namespaces that can themselves contain other packages and modules. The machine learning, numerical and scientific packages used in this project include:

SciKit-Learn: an open-source machine learning library for the Python programming language which features several classification algorithms.

Pandas: an open-source, easy-to-use data analysis library for the Python programming language.

SciPy: an open-source Python library used by engineers, analysts and scientists for scientific and technical computing.

4. Flask: Flask is a micro web framework written in Python, based on the Werkzeug toolkit and the Jinja2 template engine.

5. PyCharm: It is an Integrated Development Environment (IDE) for the Python language.

4.4 SYSTEM MODULES AND INTERFACES

This section describes the various modules and interfaces of the application.

4.4.1 Home Page

This is the page displayed after the program is run. It has a “view analysis” button which

the user clicks to view a graphical representation of the learning experience of students

offering GST121.
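The wiring between the home page and the result page could look roughly like the following Flask sketch. The route paths and the abbreviated page markup are assumptions for illustration, not taken from the project source:

```python
# Hypothetical sketch of the application's two pages in Flask.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def home():
    # Home page: the "view analysis" button links to the result page.
    return '<a href="/analysis"><button>view analysis</button></a>'

@app.route("/analysis")
def analysis():
    # Result page: would render the bar chart and the test-dataset table.
    return "category analysis results"

if __name__ == "__main__":
    app.run(debug=True)
```

In the real application the result route would render a template containing the chart and table rather than a plain string.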


Figure 4.12 homepage of application

4.4.2 Result Page

The result is a bar chart representation of the analysis and a table showing the test dataset

(responses and categories) used for classification.

Figure 4.13 a bar chart showing the category ratio of the learning experience of students offering GST121


4.4.3 Result and Interpretation

From the inductive content analysis stage, a total of 114 posted comments on the learning experiences of students offering GST121 were gathered and grouped into three categories – relevance of the course, unfavourable environment and lecture quality. 70 percent of the 114 comments (89 comments) were used for training and 30 percent for testing.
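The 70/30 split can be sketched as follows. The data here is a hypothetical placeholder for the labelled comments, and the exact counts depend on integer truncation of the training fraction:

```python
# Sketch of a 70/30 train/test split over labelled comments.
import random

def split(data, train_fraction=0.7, seed=42):
    # Shuffle a copy so the caller's list is untouched, then cut.
    items = data[:]
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_fraction)
    return items[:cut], items[cut:]

# 114 hypothetical labelled comments, as in section 4.4.3.
data = [("comment %d" % i, "category") for i in range(114)]
train, test = split(data)
print(len(train), len(test))
```

Shuffling before cutting avoids any ordering bias from the forum (for example, early posts all coming from the same lecture week).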

4.4.4 Recommendation

Based on the classification result, the following recommendations address the identified factors affecting the learning experiences of students offering GST121:

1) The air conditioner should be switched on a few minutes before the commencement of lectures.
2) Lecturers should engage students by asking questions often during lectures.
3) Attendance for GST121 could be automated because of the population of students offering the course; if it remains manual, it should be coordinated properly by the lecturers.
4) Lecturers should introduce videos as a way of improving lecture quality.
5) Lecturers should ensure mastery of the slides, as most students commented that the lecturers consistently read from the presentation slides.
6) The duration of the GST121 examination should be reconsidered.

Figure 4.14 cross-section of comments used for classification and their corresponding category


4.5 DATA GATHERING PROCESS

For the study, a discussion forum (a social media category) was created on the Covenant University e-learning platform for freshmen of Covenant University to express their worries and concerns about GST121. Thereafter, data was collected from the forum for pre-processing and classification.

Figure 4.15 screenshot of Covenant University e-learning Moodle Homepage


Figure 4.16 screenshots of some comments posted by students on the discussion forum

4.5.1 Train Dataset


CHAPTER FIVE: SUMMARY, RECOMMENDATION AND CONCLUSION

This chapter gives a summary of the work, makes a number of recommendations for future work in this area, and draws a conclusion.

5.1 SUMMARY

This project presents a forum-based student learning experience analyzer that uses a text mining technique. It has a graphical user interface with a “view analysis” button which enables users to view a graphical representation of students’ learning experiences. The system models the use of machine learning algorithms for the classification of unstructured text and was designed using UML diagrams. The forum-based student learning experience analyzer provides users with recommendations to aid decision making.

5.2 RECOMMENDATION

Recommendations for further improvement of this project include:

Evaluation of the learning experiences of students across different academic levels

The scope of this project was limited to freshmen of Covenant University offering Communications in English (GST121) because of time constraints. For further study, I would propose that courses offered by students across different academic levels be considered.

Introduction of other text mining classification algorithms

For this project, Gaussian naïve Bayes was used as the classification algorithm. For further work, I would suggest that other text mining classification algorithms, such as support vector machines, k-nearest neighbours, Rocchio’s algorithm and decision trees, be used to classify the learning experiences of students. This would allow the accuracy of the different classifiers to be compared.
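Such a comparison could be sketched as below: two classifiers (naïve Bayes and k-nearest neighbours) are trained on the same comments and scored on a held-out set. All data here is illustrative, not the study’s dataset:

```python
# Hedged sketch of comparing classifiers on the same (hypothetical) data.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

train_x = ["hall too hot", "no air conditioning", "boring slides",
           "lecturer just reads slides", "course not relevant",
           "irrelevant content"]
train_y = ["environment", "environment", "quality",
           "quality", "relevance", "relevance"]
test_x = ["the hall was hot", "slides were boring"]
test_y = ["environment", "quality"]

accuracies = {}
for name, clf in [("naive bayes", MultinomialNB()),
                  ("knn", KNeighborsClassifier(n_neighbors=1))]:
    model = make_pipeline(CountVectorizer(), clf)
    preds = model.fit(train_x, train_y).predict(test_x)
    accuracies[name] = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)

print(accuracies)
```

On a realistic dataset the accuracies would differ, and the same loop extends naturally to support vector machines and decision trees.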


5.3 CONCLUSION

The growth of digital technology has influenced the rate at which students share, update and post comments about their learning experiences on social media. Evaluating the learning experiences of students from data available on an informal platform is useful for understanding the worries, concerns and struggles of students during the learning process. The volume of data in such an environment can only be valuable if it is processed using an effective automated method such as a text mining technique.


REFERENCES

ABBYY. (2016). ABBYY FineReader Engine 11. Retrieved January 21, 2016, from http://www.abbyy.com/ocr-sdk/

Bejar, J. (2013). K-nearest Neighbours.

Berwick, R. (2003). An Idiot's Guide to Support Vector Machines (SVMs).

Boyd, D. M., & Ellison, N. B. (2007). Social Network Sites: Definition, History and Scholarship.

Bruegge, B., & Dutoit, A. H. (2000). Object-Oriented Software Engineering. Germany.

Chen, X., Vorvoreanu, M., & Madhavan, K. (2014). Mining Social Media Data for Understanding Students' Learning Experiences. IEEE Transactions on Learning Technologies, 246-259.

Classification Using Decision Trees. (n.d.).

Dewing, M. (2012). Social Media: An Introduction. Canada: Library of Parliament.

Educause. (2009). 7 Things You Should Know About Microblogging.

Fowler, M. (2000). UML Distilled.

Jajoo, P. (2008). Document Clustering. India.

Pagare, P. K. (2014). Analyzing Social Media Data for Understanding Student's Problem. International Journal of Computer Applications, 17-22.

Patel, P. (2015). A Review: Text Classification on Social Media Data. IOSR Journal of Computer Engineering, 80-84.

Radovanović, M., & Ivanović, M. (2008). Text Mining: Approaches and Applications. 227-234.

Sasithra, K., & Saravanan. (2014). Review on Classification Based on Artificial Neural Networks. International Journal of Ambient Systems and Applications, 11-18.

Sommerville, I. (2007). Software Engineering. United Kingdom.

Witten, I. H. (n.d.). Text Mining. Hamilton, New Zealand.

Yiu, T. (2001, April 5). Decision Tree Classification.

Zisserman, A. (2015). The SVM Classifier.
