spelling error detector rule for jawi


  • 7/28/2019 Spelling Error Detector Rule for Jawi


    Spelling Error Detector Rule for Jawi Stemmer

    Suliana Sulaiman1, Khairuddin Omar2, Nazlia Omar3, Mohd Zamri Murah4, Hamdan Abdul Rahman5

    #Faculty of Art, Computing and Creative Industry

    Universiti Pendidikan Sultan Idris, Tanjong Malim, Perak, [email protected]

    *Pattern Recognition Research Lab, CAIT

    Faculty of Information Science and Technology, Universiti Kebangsaan Malaysia, Bangi, Malaysia

    {ko,no,zamri,[email protected]}

    Abstract: A stemmer is important, especially for information and document retrieval. It can also help to reduce the size of the dictionary. Malay stemmers normally need a root-word dictionary to increase their accuracy. In our Jawi stemmer, we use a Jawi spelling error rule to detect whether the program produces the correct stemmed word after all possible affixes have been removed. The Jawi spelling error rule was tested on 3018 Jawi words with two-syllable root words, and the results were compared manually. The results show that 97.8% of the two-syllable Jawi words were checked correctly using the spelling error detector rule.

    Keywords: spelling error detector rule, Jawi.

    I. INTRODUCTION

    A stemmer is a computer process that reduces an affixed word to its root word [1]. Retrieval effectiveness, such as precision, can be increased by using a stemmer, especially for languages with complex morphology or for short queries [2]. A stemmer is also capable of reducing morphological variants to their root word, and at the same time it helps to improve recall [3]. Most languages, such as English, Arabic, Malay, French and Dutch, have their own stemmers. The earliest English stemmer was created in 1968 by Julie Beth Lovins to reduce English words to their stems [4]. Malay differs from English in that it has a more complex morphology [5]. Malay can be written either in Rumi or in Jawi, and there are many differences between the two. Rumi is written from left to right in characters like those of English, whereas Jawi is written from right to left in characters similar to Arabic. The spelling of a Malay word in Jawi also differs from its spelling in Rumi. In our Jawi stemmer, we use a Jawi spelling error detector rule to make sure that the stemmed word is spelled correctly after each affix rule is applied.

    This paper proposes a spelling error detector rule to detect whether the stemmer produces the correct stemmed word after all possible affixes have been removed. The paper is structured as follows. Section 2 gives an overview of related work on spell-checking rules for other languages. Section 3 explains the technique used to spell two-syllable words in Jawi. Section 4 shows the experimental results. Section 5 discusses the results of our experiment, and Section 6 concludes the paper.

    II. RELATED WORK

    Errors in Malay stemmers can be categorized as understemming, overstemming, unchanged words and spelling exceptions [6]. The first Malay stemmer was developed by Asim Othman in 1993 [7]. He used a set of 121 rules to stem all possible affixes in Rumi and compared each stemmed word with a dictionary. He started with prefix-suffix rules, followed by prefix rules, suffix rules and infix rules. If the stemmed word did not match an entry in the dictionary, he applied the prefix rules and repeated the same process until all of the affix rules had been applied. At every stage, the stemmed word had to be matched against the dictionary. In 1995, Fatimah [8] developed her own Malay stemmer with 561 rules for prefixes, suffixes, prefix-suffix combinations and infixes. All affixes were tested using these rules, and she compared each stemmed word produced with a root-word dictionary.
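The control flow described for Othman's stemmer can be sketched in Python. The stemmer tries the rule groups in a fixed order and accepts the first candidate found in the dictionary. The affix lists, the toy dictionary and the function names below are hypothetical illustrations in Rumi spelling, not the 121 rules of [7]. Real Malay morphology, such as the nasal alternations of the prefix me-, is ignored.

```python
# Sketch of a dictionary-checked Malay (Rumi) stemmer.
# HYPOTHETICAL: the affix lists and ROOT_DICTIONARY are toy examples,
# not the actual rule set of Othman [7].

ROOT_DICTIONARY = {"ajar", "main", "kata", "tulis", "tunjuk"}

# Rule groups, tried in the order described in the text.
PREFIX_SUFFIX = [("per", "an"), ("ber", "an"), ("di", "kan")]
PREFIXES = ["ber", "di", "ter", "me"]
SUFFIXES = ["kan", "an", "i"]
INFIXES = ["el", "em", "er"]


def candidates(word):
    """Yield candidate stems: prefix-suffix, then prefix, suffix, infix rules."""
    for pre, suf in PREFIX_SUFFIX:
        if word.startswith(pre) and word.endswith(suf) and len(word) > len(pre) + len(suf):
            yield word[len(pre):-len(suf)]
    for pre in PREFIXES:
        if word.startswith(pre) and len(word) > len(pre):
            yield word[len(pre):]
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) > len(suf):
            yield word[:-len(suf)]
    for inf in INFIXES:
        # A Malay infix follows the first consonant, e.g. t-el-unjuk.
        if len(word) > len(inf) + 1 and word[1:1 + len(inf)] == inf:
            yield word[0] + word[1 + len(inf):]


def stem(word):
    """Return the first candidate that is a dictionary root, else the word unchanged."""
    if word in ROOT_DICTIONARY:
        return word
    for cand in candidates(word):
        if cand in ROOT_DICTIONARY:
            return cand
    return word


if __name__ == "__main__":
    for w in ["permainan", "ditulis", "mainkan", "telunjuk"]:
        print(w, "->", stem(w))
```

A candidate is accepted only if it appears in ROOT_DICTIONARY. The stemmer's accuracy therefore depends on the dictionary's coverage, and this paper proposes to address that dependency with a spelling rule.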

    In 2001, N. Idris [9] tried to reduce the number of rules in Fatimah's stemmer by using only the significant prefix and suffix rules. To handle heterophyllous words, she used an extra dictionary, called a local dictionary. It holds root words for an explicit context and is highly focused on the application. In 2005, Taufik [10] enhanced Fatimah's algorithm to reduce stemming errors, introducing a new method called Rule Frequency Order. All of the researchers mentioned above used a dictionary to make sure they stemmed the correct word [7][8][9][10]. Even though Malay stemmers can produce the best results for stemmed words, they are still not suitable for Jawi characters. One of the main reasons is that the technique used to spell Jawi differs from the one used for Rumi.

    2011 International Conference on Pattern Analysis and Intelligent Robotics

    28-29 June 2011, Putrajaya, Malaysia

    978-1-61284-406-0/11/$26.00 ©2011

    DA-4

    III. SPELLING ERROR RULE

    Spelling in Jawi is more complicated than in Rumi because of the placement of the vowels. Jawi has three vowels: alif (ا), wau (و) and ya (ي). To spell a two-syllable Jawi word correctly, we need to follow the five steps below [11]:

    Step 1: Using vowels in both syllables
    Step 2: Using hamzah (ء) between the first and second syllables
    Step 3: Using vowels only in the first syllable
    Step 4: Using vowels only in the second syllable
    Step 5: No vowels in either the first or the second syllable
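The five steps can be read as a classification of where vowel letters, and hamzah, appear in a two-syllable Jawi spelling. The Python sketch below assumes the caller already knows whether each syllable's vowel is written with a vowel letter. The function name spelling_step is hypothetical. Reporting any hamzah between the syllables as Step 2 is also an assumption.

```python
def spelling_step(first_has_vowel, second_has_vowel, hamzah_between=False):
    """Return the step (1-5) that a two-syllable Jawi spelling follows.

    ASSUMPTION: a hamzah between the syllables is reported as Step 2
    regardless of the vowel letters; otherwise the step depends only on
    which syllables are written with a vowel letter (alif, wau or ya).
    """
    if hamzah_between:
        return 2  # Step 2: hamzah between the first and second syllables
    if first_has_vowel and second_has_vowel:
        return 1  # Step 1: vowels in both syllables
    if first_has_vowel:
        return 3  # Step 3: vowel only in the first syllable
    if second_has_vowel:
        return 4  # Step 4: vowel only in the second syllable
    return 5      # Step 5: no vowels in either syllable


if __name__ == "__main__":
    # kota, written with wau in the first syllable and alif in the second.
    print(spelling_step(True, True))
```

For example, كوتا (kota) has wau in the first syllable and alif in the second, so it follows Step 1.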

    A. Spelling Jawi with Two Syllables

    A two-syllable word is a combination of open and closed syllables. An open syllable (example: ma) ends in a vowel, whereas a closed syllable ends in a consonant (example: kan). A two-syllable Jawi word can be spelled using one of four combinations: open syllable + open syllable, closed syllable + open syllable, open syllable + closed syllable, and closed syllable + closed syllable. Details of the two-syllable patterns are given in Table 1, Table 2, Table 3 and Table 4.
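The open/closed distinction and the four combinations can be checked mechanically on romanized syllables. This Python sketch assumes the word has already been split into two Rumi syllables and treats a, e, i, o and u as vowels. The function names are hypothetical.

```python
# Sketch: classify a two-syllable Rumi word by syllable type.
# ASSUMPTION: the word is already split into syllables (e.g. "ko", "ta")
# and a, e, i, o, u are the Rumi vowel letters.
RUMI_VOWELS = set("aeiou")

# Table in this paper that lists each combination.
TABLE_FOR = {
    ("open", "open"): 1,
    ("closed", "open"): 2,
    ("open", "closed"): 3,
    ("closed", "closed"): 4,
}


def syllable_type(syllable):
    """Open if the syllable ends in a vowel, closed if it ends in a consonant."""
    return "open" if syllable[-1].lower() in RUMI_VOWELS else "closed"


def combination(first, second):
    """Return the (first, second) syllable types of a two-syllable word."""
    return (syllable_type(first), syllable_type(second))


if __name__ == "__main__":
    for first, second in [("ko", "ta"), ("ban", "tu"), ("i", "kan"), ("in", "tan")]:
        combo = combination(first, second)
        print(f"{first}+{second}: {combo[0]} + {combo[1]} (Table {TABLE_FOR[combo]})")
```

The classification looks only at the final letter, following the definition in the text. A diphthong-final syllable such as bai therefore counts as open, as in Table 1.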

    TABLE 1
    COMBINATION OF OPEN SYLLABLES + OPEN SYLLABLES

    Pattern                                  Example
    Vowel + Vowel                            [i + a]
    Vowel + Consonant Vowel                  [i + tu]
    Consonant Vowel + Vowel                  [du + a]
    Consonant Vowel + Consonant Vowel        [ko + ta]
    Vowel + Consonant Diphthong              [a + bai]
    Consonant Vowel + Consonant Diphthong    [pa + loi]

    TABLE 2
    COMBINATION OF CLOSED SYLLABLES + OPEN SYLLABLES

    Pattern                                            Example
    Vowel Consonant + Consonant Vowel                  [an + da]
    Consonant Vowel Consonant + Consonant Vowel        [ban + tu]
    Vowel Consonant + Consonant Diphthong              [an + dai]
    Consonant Vowel Consonant + Consonant Diphthong    [san + tau]

    TABLE 3
    COMBINATION OF OPEN SYLLABLES + CLOSED SYLLABLES

    Pattern                                        Example
    Vowel + Vowel Consonant                        [a + ur]
    Vowel + Consonant Vowel Consonant              [i + kan]
    Consonant Vowel + Vowel Consonant              [ma + in]
    Consonant Vowel + Consonant Vowel Consonant    [se + pit]

    TABLE 4
    COMBINATION OF CLOSED SYLLABLES + CLOSED SYLLABLES

    Pattern                                                  Example
    Vowel Consonant + Consonant Vowel Consonant              [in + tan]
    Consonant Vowel Consonant + Consonant Vowel Consonant    [sun + tik]
    Consonant Diphthong + Consonant Vowel Consonant          [tau + lan]

    The vowels in Jawi are alif (ا), wau (و) and ya (ي), while the consonants are all other Jawi characters. A syllable that contains the character combination ai, au or oi is identified as a diphthong [11].
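The labels used in Tables 1 to 4 can be produced by scanning a romanized syllable from left to right and treating ai, au and oi as a single diphthong. This Python sketch assumes every other Rumi letter is labelled on its own. Consonant digraphs such as ng or ny therefore come out as two consonants. The function names are hypothetical.

```python
# Sketch: label a Rumi syllable with the Vowel / Consonant / Diphthong
# terms used in Tables 1-4.
# ASSUMPTION: only ai, au and oi are diphthongs; every other letter is
# labelled on its own, so digraphs such as "ng" give two consonants.
DIPHTHONGS = ("ai", "au", "oi")
RUMI_VOWELS = set("aeiou")


def syllable_pattern(syllable):
    s = syllable.lower()
    labels = []
    i = 0
    while i < len(s):
        if s[i:i + 2] in DIPHTHONGS:
            labels.append("Diphthong")
            i += 2
        elif s[i] in RUMI_VOWELS:
            labels.append("Vowel")
            i += 1
        else:
            labels.append("Consonant")
            i += 1
    return " ".join(labels)


def word_pattern(first, second):
    return f"{syllable_pattern(first)} + {syllable_pattern(second)}"


if __name__ == "__main__":
    for first, second in [("pa", "loi"), ("san", "tau"), ("se", "pit")]:
        print(f"[{first} + {second}]", word_pattern(first, second))
```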

    Most researchers, for example A. Othman [7], F. Ahmad [8], N. Idris [9] and Taufik [10], used a dictionary, a root-word dictionary or a local dictionary as a component to make sure the correct stemmed word is produced after the affix rules are applied. In this paper, we propose checking the produced stemmed word against the Jawi spelling rule to make sure the stemmed word is correct.

    B. Spelling Error Detector Rule

    To get the stemmed word from an affixed word, we need to apply the Jawi stemming rules. Applying the prefix rule produces a stemmed word. This word is then passed to the spelling error detector, which detects whether the stemmed word produced is spelled correctly. The rules are generated using Table 1, Table 2, Table 3 and Table 4 combined with Steps 1 to 5 as described in Section III. Figure 1 shows an example of the spelling error detector applied to a Jawi word.
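One way to read the rule generation described above is as follows. Each Rumi syllable may be spelled in Jawi with or without its vowel letter. The two syllables are combined according to Steps 1 to 5, and the stemmed word is accepted if its letter pattern matches one of the results. The Python sketch below implements that reading. It is an illustrative assumption, not the authors' implementation. It handles only consonant-initial syllables without diphthongs and maps each Rumi consonant to one Jawi letter, so digraphs such as ng are not handled. It also models Step 2 as a hamzah inserted between the syllables. All function names are hypothetical.

```python
# Sketch of a pattern-based spelling check for two-syllable Jawi words.
# ASSUMPTIONS (illustrative only): consonant-initial Rumi syllables with
# a single vowel and no diphthong; one Jawi letter per Rumi consonant;
# Step 2 modelled as a hamzah between the syllables.

JAWI_VOWELS = {"\u0627", "\u0648", "\u064A"}  # alif, wau, ya
HAMZAH = "\u0621"
RUMI_VOWELS = set("aeiou")


def jawi_skeleton(word):
    """Label each Jawi letter, in writing order, as V, H (hamzah) or C."""
    labels = []
    for ch in word:
        if ch in JAWI_VOWELS:
            labels.append("V")
        elif ch == HAMZAH:
            labels.append("H")
        else:
            labels.append("C")
    return "".join(labels)


def syllable_forms(rumi_syllable):
    """Return the syllable's skeleton with and without its vowel letter."""
    with_vowel = "".join("V" if ch in RUMI_VOWELS else "C" for ch in rumi_syllable.lower())
    return with_vowel, with_vowel.replace("V", "")


def allowed_skeletons(first, second):
    """Skeletons permitted by Steps 1-5 for the two Rumi syllables."""
    w1, n1 = syllable_forms(first)
    w2, n2 = syllable_forms(second)
    return {
        w1 + w2,        # Step 1: vowels in both syllables
        w1 + "H" + w2,  # Step 2: hamzah between the syllables (assumed form)
        w1 + n2,        # Step 3: vowel only in the first syllable
        n1 + w2,        # Step 4: vowel only in the second syllable
        n1 + n2,        # Step 5: no vowels
    }


def is_spelled_correctly(jawi_word, first, second):
    """Accept the stemmed Jawi word if its skeleton matches an allowed one."""
    return jawi_skeleton(jawi_word) in allowed_skeletons(first, second)


if __name__ == "__main__":
    kota = "\u0643\u0648\u062A\u0627"   # kaf, wau, ta, alif
    bantu = "\u0628\u0646\u062A\u0648"  # ba, nun, ta, wau
    print(is_spelled_correctly(kota, "ko", "ta"))    # CVCV matches Step 1
    print(is_spelled_correctly(bantu, "ban", "tu"))  # CCCV matches Step 4
```

This check accepts several spellings for the same syllables. For ko + ta it accepts CVC and CC as well as CVCV. It therefore rejects impossible letter patterns rather than confirming a single correct spelling. Because it sees only whether a vowel letter is present, it cannot tell e pepet from e taling either.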

    Fig. 1 Example of spelling error detector for a Jawi word

    [Flowchart: "Open Syllable" test (V = vowel) for each syllable with Yes/No branches; "Implement spelling error rule"; Result: Correct or Wrong. The 1st and 2nd syllables (CV + CV) are read left to right (V C V C) and compared with the Jawi letters read right to left (C V C V).]

    After eliminating possible affixes, the stemmed word must be checked with the spelling error detector to make sure it produces the …


    From this table, we can conclude that the Spelling Error Detector Rule correctly checks 97.8% of the two-syllable Jawi words. Figure 2 shows a graph of the words correctly checked by each pattern and their errors.

    Fig. 2 Graph of words correctly checked and their errors.
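The 97.8% figure is rounded, so on its own it does not fix the number of words checked correctly. The Python check below assumes the figure was computed over all 3018 test words mentioned in the abstract. It lists the counts consistent with that assumption.

```python
# Which counts of correctly checked words are consistent with 97.8% of 3018?
# ASSUMPTION: the percentage is computed over all 3018 test words and
# rounded to one decimal place.
TOTAL = 3018
REPORTED_TENTHS = 978  # 97.8% expressed in tenths of a percent


def consistent_counts(total=TOTAL, reported=REPORTED_TENTHS):
    """Return every count whose percentage rounds to the reported value."""
    return [c for c in range(total + 1) if round(1000 * c / total) == reported]


if __name__ == "__main__":
    counts = consistent_counts()
    print(counts)                       # [2951, 2952, 2953]
    print([TOTAL - c for c in counts])  # [67, 66, 65]
```

Under that assumption, between 2951 and 2953 words were checked correctly, and 65 to 67 were checked incorrectly.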

    The highest error rate can be seen in pattern 1, [e + a]. Most of these errors occur because the rule cannot identify whether the vowel represents e pepet or e taling. The rule returns Correct for such words because it cannot differentiate between e pepet and e taling. In Rumi there is no problem spelling bena with either e pepet or e taling, because in both cases the word is spelled with the vowels e and a in the first and second syllables. In Jawi, however, only one vowel letter is available to set e pepet apart from e taling.

    V. CONCLUSIONS

    The results show that 97.8% of the two-syllable words were checked correctly using the Spelling Error Detector Rule. In Jawi, a single character can be interpreted either as e pepet or as e taling, and the difference between the two can be seen in pairs of words spelled with that character. For this type of problem, the Spelling Error Detector Rule cannot detect a word that is spelled with e taling when it should be spelled with e pepet. Our current work involves further development of Jawi spelling rules for three-syllable words.

    REFERENCES

    [1] T. A. Eiman and L. Jessica, "Towards an Error-Free Arabic Stemming," Communication of ACM, vol. 23, pp. 9-14, 2008.
    [2] R. Krovetz, "Viewing Morphology as an Inference Process," Univ. of Massachusetts, Amherst, MA, Tech. Rep. 1-12, 1993.
    [3] A. K. Pandey and T. J. Siddiqui, "An Unsupervised Hindi Stemmer with Heuristic Improvements," in ACM 2008, p. 99.
    [4] J. B. Lovins, "Development of a Stemming Algorithm," Mechanical Translation and Computational Linguistics, vol. 11, pp. 22-31, 1968.
    [5] N. S. Karim, F. M. Onn, H. Musa and A. H. Mahmood, Tatabahasa Dewan Edisi Ketiga, Kuala Lumpur, Malaysia: Dewan Bahasa dan Pustaka, 2008.
    [6] T. M. T. Sembok, "Word Stemming Algorithms and Retrieval Effectiveness in Malay and Arabic Documents Retrieval Systems," WASET, vol. 1, pp. 95-97, Nov. 2005.
    [7] A. Othman, "Pengakar perkataan Melayu untuk sistem capaian dokumen," MSc. thesis, National University of Malaysia, 1993.
    [8] F. Ahmad, "Experiments with a Malay Stemming Algorithm," PhD thesis, National University of Malaysia, 1996.
    [9] N. Idris and S. M. F. D. S. Mustafa, "Stemming for Term Conflation in Malay Texts," ACM, vol. 3, pp. 12-17.
    [10] M. Taufik, F. Ahmad, R. Mahmod and T. M. T. Sembok, "Rules Frequency Order Stemmer for Malay Language," International Journal of Computer Science and Network Security (IJCSNS), vol. 9, pp. 433-438, 2009.
    [11] H. A. Rahman, Panduan Menulis dan Mengeja Jawi, Kuala Lumpur, Malaysia: Dewan Bahasa dan Pustaka, 1999.