spelling error detector rule for jawi
7/28/2019 Spelling Error Detector Rule for Jawi
Spelling Error Detector Rule for Jawi Stemmer
Suliana Sulaiman1, Khairuddin Omar2, Nazlia Omar3, Mohd Zamri Murah4, Hamdan Abdul Rahman5
#Faculty of Art, Computing and Creative Industry
Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, [email protected]
*Pattern Recognition Research Lab, CAIT
Faculty of Information Science and Technology
Universiti Kebangsaan Malaysia, Bangi, Malaysia
{ko,no,zamri,[email protected]}
Abstract: A stemmer is important, especially for information and document retrieval. It can also help to reduce the size of the dictionary. Malay stemmers normally need a root word dictionary to increase the stemmer's accuracy. In the Jawi stemmer, we use a Jawi spelling error rule to detect whether the program produces the correct stemmed word after all possible affixes have been removed. The Jawi spelling error rule was tested on 3018 Jawi words with two-syllable root words, and the results were compared manually. The results show that 97.8% of the two-syllable Jawi words were checked correctly by the spelling error detector rule.
Keywords: Spelling error detector rule, Jawi.
I. INTRODUCTION
A stemmer is a computer process that reduces affixed words to their root words [1]. A stemmer can increase retrieval effectiveness, such as precision, especially for languages with complex morphology or for short queries [2]. A stemmer can also reduce morphological variants to their root word and, at the same time, helps to improve recall [3].

Most languages, such as English, Arabic, Malay, French and Dutch, have their own stemmer. The earliest English stemmer was created in 1968 by Julie Beth Lovins to reduce English words to stemmed words [4]. Malay differs from English in that it has a more complex morphology [5]. Malay can be written either in Rumi or in Jawi, and there are many differences between the two. Rumi is written from left to right and its characters resemble English characters, whereas Jawi is written from right to left and its characters resemble Arabic characters. The spelling of a Malay word in Jawi differs from its spelling in Rumi. In the Jawi stemmer, we use the Jawi spelling error detector rule to make sure that the stemmed word produced is spelled correctly after each affix rule has been applied.

This paper proposes the spelling error detector rule to detect whether the stemmer produces the correct stemmed word after all possible affixes have been removed. The paper is structured as follows. Section 2 gives an overview of related work on spell checking rules for other languages. Section 3 explains the spelling technique used to spell two-syllable Jawi words. Section 4 shows the experimental results. Section 5 discusses the results of our experiment, and Section 6 concludes.
II. RELATED WORK
Errors in a Malay stemmer can be categorized as understemming, overstemming, unchanged and spelling exception [6]. The first Malay stemmer was developed by Asim Othman in 1993 [7]. He used a set of 121 rules to stem all possible affixes in Rumi and compared the stemmed word with a dictionary. He started with prefix-suffix rules, followed by prefix rules, suffix rules and infix rules. If the stemmed word did not match the dictionary, he applied the prefix rules and repeated the same process until all of the affix rules had been applied. In that case, the stemmed word had to be matched with the dictionary. In 1995, Fatimah [8] came out with her own Malay stemmer. She developed 561 rules for prefixes, suffixes, prefix-suffix pairs and infixes. All affixes were tested using these rules, and she compared the stemmed word produced with a root word dictionary.

In 2001, N. Idris [9] tried to reduce the number of rules in Fatimah's stemmer. She used only the significant rules for prefixes and suffixes. To handle heterophyllous words, she used an extra dictionary, called the local dictionary. This dictionary holds root words for an explicit context and is highly focused on the application. In 2005, Taufik [10] tried to enhance Fatimah's algorithm to reduce stemming errors. He came out with a new method called Rule Frequency Order. All of the researchers mentioned above used a dictionary to make sure that they stemmed the correct word [7][8][9][10]. Even though a Malay stemmer can produce the best result for a stemmed word, it is still not suitable for Jawi characters. One of the main reasons is that the spelling technique for Jawi differs from that for Rumi.
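The dictionary-checked approach shared by the stemmers above can be sketched as follows. This is a minimal illustration, not a reproduction of any of the cited rule sets: the affix lists, the function names `candidates` and `stem`, and the sample dictionary are hypothetical, infix rules are omitted, and Malay sound changes at affix boundaries are ignored.

```python
from typing import Iterator, Set

# Hypothetical affix lists for illustration only; the cited stemmers use
# between 121 and 561 rules and are not reproduced here.
PREFIXES = ("mem", "ber", "di")
SUFFIXES = ("kan", "an", "i")


def candidates(word: str) -> Iterator[str]:
    """Yield candidate stems: prefix-suffix rules, then prefix, then suffix."""
    for p in PREFIXES:
        for s in SUFFIXES:
            if word.startswith(p) and word.endswith(s) and len(word) > len(p) + len(s):
                yield word[len(p):len(word) - len(s)]
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p):
            yield word[len(p):]
    for s in SUFFIXES:
        if word.endswith(s) and len(word) > len(s):
            yield word[:len(word) - len(s)]


def stem(word: str, dictionary: Set[str]) -> str:
    """Return the first candidate found in the root word dictionary."""
    if word in dictionary:
        return word
    for candidate in candidates(word):
        if candidate in dictionary:
            return candidate
    return word  # no rule produced a dictionary word, so the word is unchanged


roots = {"baca", "main", "makan"}
print(stem("membaca", roots), stem("dimakan", roots), stem("mainan", roots))
```

Every accepted stem depends on the dictionary lookup, which is the component that the spelling error detector rule is meant to stand in for when stemming Jawi.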
III. SPELLING ERROR RULE
Spelling in Jawi is more complicated than in Rumi because of the placement of the vowels. Jawi has three vowels: alif (ا), wau (و) and ya (ي). To spell a two-syllable Jawi word correctly, we need to follow the five steps below [11]:
Step 1: Using vowels at both syllables
Step 2: Using hamzah (ء) between first and second syllables
Step 3: Using vowels only at first syllable
Step 4: Using vowels only at second syllable
Step 5: No vowels at first and second syllables

2011 International Conference on Pattern Analysis and Intelligent Robotics
28-29 June 2011, Putrajaya, Malaysia
978-1-61284-46-/11/$26. 211
DA-4
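One way to read the five steps is as a mapping from the consonant-vowel skeleton of a Rumi word to the skeleton expected in Jawi. The sketch below is an assumption-laden illustration, not the procedure of [11]: the function name `expected_skeleton`, the letters `C`, `V` and `H` (hamzah), and the idea that an unwritten vowel simply disappears from the skeleton are all hypothetical simplifications. In particular, it does not model how a syllable consisting of a vowel alone is written when its vowel is dropped.

```python
def expected_skeleton(first: str, second: str, step: int) -> str:
    """Build a Jawi skeleton from two Rumi syllable skeletons such as 'CV' or 'CVC'.

    'C' is a consonant, 'V' a written vowel letter, 'H' a hamzah.
    """
    def without_vowels(skeleton: str) -> str:
        return skeleton.replace("V", "")

    if step == 1:   # vowels written in both syllables
        return first + second
    if step == 2:   # hamzah between the syllables (vowels assumed to be kept)
        return first + "H" + second
    if step == 3:   # vowels written only in the first syllable
        return first + without_vowels(second)
    if step == 4:   # vowels written only in the second syllable
        return without_vowels(first) + second
    if step == 5:   # no vowels written in either syllable
        return without_vowels(first) + without_vowels(second)
    raise ValueError("step must be between 1 and 5")


for step in range(1, 6):
    print(step, expected_skeleton("CV", "CV", step))
```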
A. Spelling Jawi with Two Syllables
A two-syllable word is a combination of Open and Close Syllables. An Open Syllable (example: ma) has a vowel as its last character, whereas a Close Syllable has a consonant as its last character (example: kan). A two-syllable Jawi word can be spelt using the combinations Open Syllable + Open Syllable, Close Syllable + Open Syllable, Open Syllable + Close Syllable and Close Syllable + Close Syllable. Details of the two-syllable patterns are given in Table 1, Table 2, Table 3 and Table 4.
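The open and close distinction can be illustrated on Rumi transliterations. This is a minimal sketch that assumes the word has already been split into syllables and that the Rumi vowels are a, e, i, o and u; the names `is_open` and `combination` are hypothetical.

```python
VOWELS = set("aeiou")  # Rumi vowel letters, assumed for this illustration


def is_open(syllable: str) -> bool:
    """An Open Syllable ends in a vowel; a Close Syllable ends in a consonant."""
    return syllable[-1].lower() in VOWELS


def combination(first: str, second: str) -> str:
    """Name the two-syllable combination, for example 'Close + Open'."""
    def kind(syllable: str) -> str:
        return "Open" if is_open(syllable) else "Close"

    return f"{kind(first)} + {kind(second)}"


print(combination("ko", "ta"))    # Open + Open
print(combination("ban", "tu"))   # Close + Open
```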
TABLE 1
COMBINATION OF OPEN SYLLABLES + OPEN SYLLABLES

Pattern                                                   Example
Vowel + Vowel                                             [i + a]
Vowel + Consonant Vowel                                   [i + tu]
Consonant Vowel + Vowel                                   [du + a]
Consonant Vowel + Consonant Vowel                         [ko + ta]
Vowel + Consonant Diphthong                               [a + bai]
Consonant Vowel + Consonant Diphthong                     [pa + loi]

TABLE 2
COMBINATION OF CLOSE SYLLABLES + OPEN SYLLABLES

Pattern                                                   Example
Vowel Consonant + Consonant Vowel                         [an + da]
Consonant Vowel Consonant + Consonant Vowel               [ban + tu]
Vowel Consonant + Consonant Diphthong                     [an + dai]
Consonant Vowel Consonant + Consonant Diphthong           [san + tau]

TABLE 3
COMBINATION OF OPEN SYLLABLES + CLOSE SYLLABLES

Pattern                                                   Example
Vowel + Vowel Consonant                                   [a + ur]
Vowel + Consonant Vowel Consonant                         [i + kan]
Consonant Vowel + Vowel Consonant                         [ma + in]
Consonant Vowel + Consonant Vowel Consonant               [se + pit]

TABLE 4
COMBINATION OF CLOSE SYLLABLES + CLOSE SYLLABLES

Pattern                                                   Example
Vowel Consonant + Consonant Vowel Consonant               [in + tan]
Consonant Vowel Consonant + Consonant Vowel Consonant     [sun + tik]
Consonant Diphthong + Consonant Vowel Consonant           [tau + lan]
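The seventeen patterns in Tables 1 to 4 can be held in a lookup keyed by syllable skeletons. The sketch below works on the Rumi examples from the tables and assumes the caller supplies the syllable split. The names `PATTERN_TABLE`, `skeleton` and `table_for`, and the abbreviations `V`, `C` and `D` (diphthong), are hypothetical, and Rumi digraph consonants such as ng are not handled.

```python
from typing import Optional

VOWELS = set("aeiou")
DIPHTHONGS = ("ai", "au", "oi")

# (first syllable skeleton, second syllable skeleton) -> table number
PATTERN_TABLE = {
    ("V", "V"): 1, ("V", "CV"): 1, ("CV", "V"): 1,
    ("CV", "CV"): 1, ("V", "CD"): 1, ("CV", "CD"): 1,
    ("VC", "CV"): 2, ("CVC", "CV"): 2, ("VC", "CD"): 2, ("CVC", "CD"): 2,
    ("V", "VC"): 3, ("V", "CVC"): 3, ("CV", "VC"): 3, ("CV", "CVC"): 3,
    ("VC", "CVC"): 4, ("CVC", "CVC"): 4, ("CD", "CVC"): 4,
}


def skeleton(syllable: str) -> str:
    """Reduce a Rumi syllable to V/C letters, with a final ai/au/oi as 'D'."""
    s = syllable.lower()
    tail = ""
    if s.endswith(DIPHTHONGS):
        s, tail = s[:-2], "D"
    return "".join("V" if ch in VOWELS else "C" for ch in s) + tail


def table_for(first: str, second: str) -> Optional[int]:
    """Return the table listing this pattern, or None if it is not listed."""
    return PATTERN_TABLE.get((skeleton(first), skeleton(second)))


print(table_for("ko", "ta"), table_for("san", "tau"), table_for("tau", "lan"))
```

The lookup follows the tables as printed, so Consonant Diphthong + Consonant Vowel Consonant maps to Table 4 even though its first syllable ends in a vowel character.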
The vowel characters in Jawi are alif (ا), wau (و) and ya (ي); all other Jawi characters are consonants. A syllable that contains the character combination ai, au or oi is identified as a Diphthong [11].

Most researchers, for example A. Othman [7], F. Ahmad [8], N. Idris [9] and Taufik [10], used a dictionary, a root word dictionary or a local dictionary as a component to make sure that the correct stemmed word is produced after the affix rules are applied. In this paper we propose checking the produced stemmed word against the Jawi spelling rule to make sure the stemmed word is correct.
B. Spelling Error Detector Rule
To get the stemmed word from an affixed word, we need to apply the Jawi stemming rules. After the prefix rule is applied, it produces a candidate stemmed word. This word is passed to the spelling error detector to detect whether the stemmed word produced is spelled correctly. The rules are generated from Table 1, Table 2, Table 3 and Table 4 combined with Step 1 to Step 5 listed in Section III. Figure 1 shows an example of the spelling error detector applied to a word.

Fig. 1 Example of the spelling error detector for a word
After eliminating possible affixes, the stemmed word mustuse the spelling error detector to make sure it produces the
[Fig. 1 flowchart. Labels: 1st Syllable and 2nd Syllable (CV, CV); Open Syllable, V = (Yes/No), for each syllable; Implement spelling error rule; Compare (left to right: V C V C; right to left: C V C V); Result: Correct or Wrong.]
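The comparison step of the detector can be sketched as follows. This is only an illustration under stated assumptions, not the authors' rule set: it treats every alif, wau and ya as a vowel, as the paper's definition of Jawi vowels does, it marks hamzah separately, and it takes the set of acceptable skeletons from the caller. The names `jawi_skeleton` and `detect` are hypothetical, and only the basic Arabic code points for the three vowel letters are recognised.

```python
from typing import Set

ALIF, WAU, YA, HAMZAH = "\u0627", "\u0648", "\u064A", "\u0621"
VOWEL_LETTERS = {ALIF, WAU, YA}


def jawi_skeleton(word: str) -> str:
    """Map a Jawi word to C/V/H letters.

    Python stores the string in logical order, so the first character is
    the one displayed rightmost in right-to-left text.
    """
    out = []
    for ch in word:
        if ch in VOWEL_LETTERS:
            out.append("V")
        elif ch == HAMZAH:
            out.append("H")
        else:
            out.append("C")
    return "".join(out)


def detect(word: str, acceptable: Set[str]) -> str:
    """Return 'Correct' if the word's skeleton is acceptable, else 'Wrong'."""
    return "Correct" if jawi_skeleton(word) in acceptable else "Wrong"


# kota written as kaf, wau, ta, alif
kota = "\u0643\u0648\u062A\u0627"
print(jawi_skeleton(kota), detect(kota, {"CVCV"}))
```

Because the check looks only at the skeleton, two words with the same arrangement of vowel letters and consonants are indistinguishable to it.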
From this table we can conclude that the Spelling Error Detector Rule correctly checks 97.8% of two-syllable Jawi words. Figure 2 shows the graph of the words correctly checked for each pattern and the corresponding errors.

Fig. 2 Graph showing the words correctly checked and the errors.
The highest error can be seen in pattern 1 for [e+a]. Most of these errors occur because the rule cannot identify whether the vowel belongs to e pepet or e taling. The rule reports such a word as correct because it cannot differentiate between e pepet and e taling. In Rumi there is no problem in spelling bena with either e pepet or e taling, because the vowels e and a are written in the first and second syllables. In Jawi, however, only one vowel character can be used to set e pepet apart from e taling.
V. CONCLUSIONS
The results show that 97.8% of two-syllable words were checked correctly using the Spelling Error Detector Rule. In Jawi, the same character can be interpreted as either e pepet or e taling. The difference between e pepet and e taling appears in certain words. For this type of problem, the Spelling Error Detector Rule cannot detect a word that is spelled with e taling when it should be spelled with e pepet. Our ongoing work will develop Jawi spelling rules for three-syllable words.
REFERENCES
[1] T. A. Eiman and L. Jessica, "Towards an Error-Free Arabic Stemming," Communication of ACM, vol. 23, pp. 9-14, 2008.
[2] R. Krovetz, "Viewing Morphology as an Inference Process," Univ. of Massachusetts, Amherst, MA, Tech. Rep. 1-12, 1993.
[3] A. K. Pandey and T. J. Siddiqui, "An Unsupervised Hindi Stemmer with Heuristic Improvements," in ACM, 2008, p. 99.
[4] J. B. Lovins, "Development of Stemming Algorithm," Mechanical Translation and Computational Linguistic, vol. 11, pp. 22-31, 1968.
[5] N. S. Karim, F. M. Onn, H. Musa and A. H. Mahmood, Tatabahasa Dewan Edisi Ketiga. Kuala Lumpur, Malaysia: Dewan Bahasa dan Pustaka, 2008.
[6] T. M. T. Sembok, "Word Stemming Algorithms and Retrieval Effectiveness in Malay and Arabic Documents Retrieval Systems," WASET, vol. 1, pp. 95-97, Nov. 2005.
[7] A. Othman, "Pengakar perkataan melayu untuk sistem capaian dokumen," MSc. thesis, National University of Malaysia, 1993.
[8] F. Ahmad, "Experiments with A Malay Stemming Algorithm," PhD thesis, National University of Malaysia, 1996.
[9] N. Idris and S. M. F. D. S. Mustafa, "Stemming for Term Conflation in Malay Texts," ACM, vol. 3, pp. 12-17.
[10] M. Taufik, F. Ahmad, R. Mahmod and T. M. T. Sembok, "Rules Frequency Order Stemmer for Malay Language," International Journal of Computer Science and Network Security IJCSNS, vol. 9, pp. 433-438, 2009.
[11] H. A. Rahman, Panduan Menulis dan Mengeja Jawi. Kuala Lumpur, Malaysia: Dewan Bahasa dan Pustaka, 1999.