
TEXT CLUSTERING OF TAFSEER TRANSLATIONS BY USING K-MEANS ALGORITHM:

AN AL-BAQARAH CHAPTER VIEW

Paper authors: Mohammed A. Ahmed, Dr. Hanif Baharin, Dr. Puteri Nor Ellyza Nohuddin

Paper Conference ID: 152


ABSTRACT

Al-Quran is the primary book of faith and practice among Muslims.

Al-Quran has been translated into many languages, including English, by many translators, and each translator brings their own ideas, comments, and phrasing when translating the verses based on Tafseer (exegesis).

Text clustering is a text mining process that groups similar documents into the same cluster.

This paper applies text clustering to the differences between Tafseer translations.

This study adapts the k-means clustering algorithm (unsupervised learning) to discover and illustrate the relationships between keywords, called features or concepts, for five different translators across the 286 verses of the al-Baqarah chapter.

Data preprocessing and feature extraction using TF-IDF (Term Frequency-Inverse Document Frequency) were applied to the datasets.

The results show two- and three-dimensional cluster plots of the two or three most frequent features, assigned to seven clusters (k = 7), for each of the five translated Tafseer. The features 'allah/god', 'believ', and 'said' are the three most frequent features common to all five Tafseer.

INTRODUCTION

Text clustering is a text mining process that groups similar documents into the same cluster.

Text clustering techniques resemble those of data mining: each document is represented as a vector of term weights, and these vectors are the objects being clustered. Techniques include:
- Partitioning (k-means)
- Hierarchical (network analysis map)
- Density-based (DBSCAN)
- Grid-based (STING)

Al-Quran is written in Arabic and has been translated into many languages, including English, by many translators. The longest chapter (Sura) of al-Quran is al-Baqarah.

Each translator adds their own comments and phrasing to explain the verses based on Tafseer. This study aims to cluster keywords, called features or concepts, and to discover the relationships between them for five different translators using the k-means clustering algorithm.

LITERATURE REVIEW

C. Slamet et al. [8] applied the k-means algorithm to cluster all 6,236 verses of the Holy Quran, producing three clusters for both unstemmed and stemmed attributes.

A. F. Huda et al. [7] compared three clustering algorithms (k-means, k-medoids, and bisecting k-means) combined with three similarity measures on the verses of chapter Al-Baqarah. The best result was obtained by k-medoids with cosine similarity.
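Since [7] reports its best result with cosine similarity, a minimal sketch of that measure may help. It assumes each verse is stored as a sparse term-weight vector in a Python dict (term to weight); the dict representation and the function name `cosine_similarity` are illustrative assumptions, not taken from [7].

```python
import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    # Dot product only needs the terms the two vectors share.
    dot = sum(weight * b[term] for term, weight in a.items() if term in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an empty vector as dissimilar to everything
    return dot / (norm_a * norm_b)
```

Cosine similarity ignores vector length, so a long verse and a short verse that use the same words in the same proportions score as highly similar.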

S. Chua and P. N. E. Nohuddin [5] used TF-IDF and a hierarchical, network-analysis-map approach to extract keywords and identify relationships between keywords and chapters of a Malay-translated Tafseer. The proposed method is called the KCRA framework.

S. J. Putra et al. [10] developed a semantic-based question answering system (QAS) for the Indonesian translation of the Quran, using TF-IDF to cluster 222 concepts from an al-Quran ontology.

S. J. Putra et al. [12] implemented the same QAS as [10], but reported the TF-IDF results in more detail and followed a different work procedure.

RESEARCH METHODOLOGY

The Preprocessing: Tokenizing, POS Tagging, Case Folding, Stemming, Stop-word removal
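The preprocessing steps can be sketched as follows. This is a simplified illustration, not the authors' code: the tiny `STOP_WORDS` set and the `crude_stem` suffix stripper are hypothetical stand-ins (a real run would use a full English stop-word list and a proper stemmer; stems in the results table such as 'believ', 'verili', and 'rememb' look like Porter-style output), and POS tagging is omitted because it needs a trained tagger. The sketch also removes stop words before stemming, because the stop-word list holds unstemmed words.

```python
import re

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "and", "he", "in", "is", "of", "that", "the", "they", "those", "to", "who"}

# Crude suffix stripping standing in for a real stemmer (e.g. Porter).
SUFFIXES = ("ing", "ed", "es", "s", "e")


def tokenize(text: str) -> list[str]:
    """Case folding plus tokenizing: lower-case, then keep runs of letters."""
    return re.findall(r"[a-z]+", text.lower())


def crude_stem(token: str) -> str:
    """Strip the first matching suffix, keeping a stem of at least 3 letters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(verse: str) -> list[str]:
    """Tokenize, drop stop words, then stem. POS tagging is omitted here."""
    tokens = [t for t in tokenize(verse) if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens]
```

For example, `preprocess("They believed and he said")` gives `['believ', 'said']`.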

Feature Extraction: TF-IDF
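The slides do not state which TF-IDF variant was used. A minimal sketch of the textbook form, tfidf(t, d) = tf(t, d) × log(N / df(t)), where tf is the raw count of term t in verse d, N is the number of verses, and df(t) is the number of verses containing t, could look like this (libraries often use smoothed or normalised variants instead):

```python
import math
from collections import Counter


def tf_idf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Return one sparse TF-IDF vector (term -> weight) per tokenized document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)  # raw term counts within this document
        vectors.append({term: count * math.log(n_docs / df[term]) for term, count in tf.items()})
    return vectors
```

One consequence of this form: a term that occurs in every verse gets weight 0, so it cannot help separate verses, however frequent it is.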

The Clustering Algorithm: k-means
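A minimal sketch of standard k-means (Lloyd's algorithm) with squared Euclidean distance and random initial centroids is shown below. The distance, initialisation, and the names `k_means` and `squared_distance` are assumptions for illustration; the slides only name the algorithm. In this setting, `points` would be the verses' TF-IDF vectors laid out over a shared vocabulary, and `k` would be 7.

```python
import random


def squared_distance(p: list[float], q: list[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def k_means(points: list[list[float]], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's k-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise the k centroids with k input points chosen at random.
    centroids = [list(p) for p in rng.sample(points, k)]
    labels: list[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its member points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

Because the result depends on the initial centroids, practical implementations usually use smarter seeding (such as k-means++) and keep the best of several restarts.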

THE EXPERIMENT

Flowchart of Experimental Scenario: START → Collect Data → Preprocessing → Feature Extraction → Clustering Process → END

CONCLUSION

This paper adapted one text clustering algorithm, k-means, for the experiments.

The verses of the al-Baqarah chapter are the input data to this algorithm.

Data preprocessing and feature extraction using TF-IDF (Term Frequency-Inverse Document Frequency) were applied to the datasets.

Five documents, one from each of five different translators, were used. Each translator translated the verses from Arabic into English based on Tafseer.

The two- and three-dimensional cluster plots show the two or three most frequent features, assigned to seven clusters (k = 7), for each of the five Tafseer.
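The slides do not describe how the plot axes were built. One plausible reading, sketched here as an assumption, is that each verse is placed at its TF-IDF weights on the two (or three) most frequent features and coloured by its k-means label; the helper names are hypothetical, and the drawing call itself (for example a scatter plot) is left out.

```python
from collections import Counter


def top_features(docs: list[list[str]], n: int) -> list[str]:
    """The n most frequent terms across all preprocessed documents."""
    counts = Counter(term for doc in docs for term in doc)
    return [term for term, _ in counts.most_common(n)]


def plot_coordinates(vectors: list[dict[str, float]], axes: list[str]) -> list[tuple[float, ...]]:
    """Each document's weights on the chosen feature axes (0.0 if absent)."""
    return [tuple(vec.get(term, 0.0) for term in axes) for vec in vectors]
```

Passing `n=3` instead of `n=2` gives the three-dimensional case.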

The features 'allah/god', 'believ', and 'said' are the three most frequent features common to all five Tafseer.

Tafseer number | Feature name (frequency)
1 | god (296), believ (69), lord (57), said (55), say (48), rememb (47), know (45), good (50), …
2 | allah (264), said (69), know (66), believ (54), people (54), prophet (49), lord (48), say (48), …
3 | god (288), shall (111), believ (73), said (70), say (60), lord (49), know (45), thou (44), …
4 | allah (276), unto (225), ye (214), shall (151), verili (102), said (73), thou (68), believ (66), …
5 | allah (349), believ (84), said (72), say (67), shall (62), know (56), lord (54), muhammed (45), …
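The shared-feature claim can be checked directly against the table. The sketch below intersects the features listed for each Tafseer, merging 'god' and 'allah' into the single feature 'allah/god' as the slides do. Since the table lists are truncated (…), this only covers the listed features; the names `TOP_FEATURES`, `SYNONYMS`, and `shared_features` are illustrative.

```python
# Features listed in the results table (truncated lists, as on the slide).
TOP_FEATURES = {
    1: ["god", "believ", "lord", "said", "say", "rememb", "know", "good"],
    2: ["allah", "said", "know", "believ", "people", "prophet", "lord", "say"],
    3: ["god", "shall", "believ", "said", "say", "lord", "know", "thou"],
    4: ["allah", "unto", "ye", "shall", "verili", "said", "thou", "believ"],
    5: ["allah", "believ", "said", "say", "shall", "know", "lord", "muhammed"],
}

# The slides treat the two renderings of the divine name as one feature.
SYNONYMS = {"god": "allah/god", "allah": "allah/god"}


def shared_features(top: dict[int, list[str]]) -> set[str]:
    """Features present in every Tafseer's list, after merging synonyms."""
    sets = [{SYNONYMS.get(f, f) for f in feats} for feats in top.values()]
    return set.intersection(*sets)
```

`shared_features(TOP_FEATURES)` returns the set {'allah/god', 'believ', 'said'}. Tafseer 4 is the list that eliminates 'lord', 'say', and 'know'.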

REFERENCES

[5] S. Chua and P. N. E. Nohuddin. Relationship Analysis of Keyword and Chapter in Malay-Translated Tafseer of Al-Quran, Journal of Telecommunication, Electronic and Computer Engineering (JTEC), vol. 9, no. 2–10, pp. 185–189, 2017.

[7] A. F. Huda, M. R. Deyana, Q. U. Safitri, W. Darmalaksana, U. Rahmani, and others. Analysis Partition Clustering and Similarity Measure on Al-Quran Verses, in 2019 IEEE 5th International Conference on Wireless and Telematics (ICWT), 2019, pp. 1–5.

[8] C. Slamet, A. Rahman, M. A. Ramdhani, and W. Darmalaksana. Clustering the verses of the Holy Qur'an using K-means algorithm, Asian Journal of Information Technology, vol. 15, no. 24, pp. 5159–5162, 2016.

[10] S. J. Putra, R. H. Gusmita, K. Hulliyah, and H. T. Sukmana. A semantic-based question answering system for Indonesian translation of Quran, in Proceedings of the 18th International Conference on Information Integration and Web-based Applications and Services, 2016, pp. 504–507.

[12] S. J. Putra, K. Hulliyah, N. Hakiem, R. P. Iswara, and A. F. Firmansyah. Generating weighted vector for concepts in Indonesian translation of Quran, in Proceedings of the 18th International Conference on Information Integration and Web-based Applications and Services, 2016, pp. 293–297.

THANK YOU
