PRESENTED BY ELVIRA NURFADHILAH
* Based on work by Mohammad Teduh Uliniansyah, Shun Ishizaki, and Kiyoko Uchiyama

2ND WORDNET BAHASA WORKSHOP, 15-16 JANUARY 2015

Name: Elvira Nurfadhilah
Working at: Agency for the Assessment & Application of Technology (BPPT)
Laboratory: Intelligent Computing Laboratory (ICL)
Specialisation field: Image Processing and Natural Language Processing
Email: elvira.nurfadhilah@bppt.go.id / elvira.nurfadhilah@gmail.com

Educational background
Undergraduate: Bogor Agricultural University, Computer Science (2011)
Graduate: Bogor Agricultural University, Computer Science (2015)

Joined BPPT: 2014

Intelligent Computing Laboratory (ICL) at the Center for Information and Communication Technology (PTIK), BPPT.

ICL deals with image processing, computer vision, language technology, and signal processing.

Natural Language Processing
o Portal Bahasa (stemmer and concordance)
o Statistical machine translation
o Text-to-speech
o Etc.

o Fingerprint and latent fingerprint
o Iris
o Face and face sketch
o Blood vessel
o Etc.

Developing a malaria diagnosis tool based on images of thin and thick smears.

o Background
o Experimental Data
o Method
o Results and Discussion
o Utilizing the proposed technique for WordNet
o Demo Program

BACKGROUND

Ambiguities arise when a single lexical word can be formed by more than one possible combination of root and affixes.

Example: beruang

o beruang (Noun Animal): "bear"
o ber ( uang ( Noun Concrete )) : Verb Intransitive, "to have money"
o be ( ruang ( Noun Abstract Concept )) : Verb Intransitive, "to have space"
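
The three readings of "beruang" can be enumerated mechanically. The following is a minimal sketch, assuming a toy root dictionary and a toy prefix table; the names `ROOTS`, `PREFIX_RULES`, and `analyses` are hypothetical and only cover this one example, not the 800+ affix combinations of the real system.

```python
# Toy root-word dictionary: root -> possible POS tags (illustrative entries only).
ROOTS = {
    "beruang": ["Noun Animal"],
    "uang": ["Noun Concrete"],
    "ruang": ["Noun Abstract Concept"],
}

# Toy affix table: (prefix, POS of root) -> POS of the derived word.
PREFIX_RULES = {
    ("ber", "Noun Concrete"): "Verb Intransitive",
    ("be", "Noun Abstract Concept"): "Verb Intransitive",
}


def analyses(word):
    """Return every (segmentation, POS) candidate for a word."""
    # Reading 1: the whole word is itself a root.
    results = [(word, pos) for pos in ROOTS.get(word, [])]
    # Readings 2..n: strip each known prefix and look up the remainder.
    for prefix in sorted({p for p, _ in PREFIX_RULES}):
        if not word.startswith(prefix):
            continue
        root = word[len(prefix):]
        for root_pos in ROOTS.get(root, []):
            derived = PREFIX_RULES.get((prefix, root_pos))
            if derived is not None:
                results.append((f"{prefix}+{root}", derived))
    return results


for segmentation, pos in analyses("beruang"):
    print(segmentation, "->", pos)
```

Each candidate returned here would become one node for that word in the later linking step.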

EXPERIMENTAL DATA

The corpus consists of articles (politics, economics, sports, etc.) downloaded from the website of the "Kompas" daily newspaper (http://www.kompas.com). It contains 20,579,771 words in 1,105,156 sentences.

There are more than 800 combinations of affixes (prefixes, suffixes, and infixes).

METHOD

Process 1: Retrieve all possible POS tag candidates from the root word dictionary and affix table.

Process 2: Link all possible nodes.

Process 3: Assign linking costs between nodes, search for the minimum cost, and decide the proper POS tag for each word.

Beruang

(ber + uang)

(be + ruang)

[Figure: Example]

Let p1 be one POS and p2 another one, where p2 directly follows p1. The cost of the pair (p1, p2) is:

$$\mathrm{Cost}(p_1, p_2) = \log_2 \frac{N}{n(p_1, p_2)}$$

where n(p1, p2) is the number of (p1, p2) pairs that appear in the data, and N is the total number of all pairs of POS tags in the data. Pairs that occur often therefore receive a low cost.
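
The cost definition above can be computed directly from tagged data. The following is a minimal sketch, assuming that the slide's "2log" denotes the base-2 logarithm; the function name `pair_costs` and the tiny tag sequences are made up for illustration and are not the Kompas corpus.

```python
import math
from collections import Counter


def pair_costs(tagged_sentences):
    """Map each adjacent POS pair (p1, p2) to log2(N / n(p1, p2)).

    tagged_sentences: list of POS tag sequences, one per sentence.
    Assumption: "2log" on the slide is read as the base-2 logarithm.
    """
    counts = Counter()
    for tags in tagged_sentences:
        # Count every pair where the second tag directly follows the first.
        counts.update(zip(tags, tags[1:]))
    total = sum(counts.values())  # N: total number of POS pairs
    return {pair: math.log2(total / n) for pair, n in counts.items()}


# Made-up tag sequences, for illustration only.
data = [
    ["pronoun", "adverb", "verb"],
    ["pronoun", "adverb", "verb"],
    ["pronoun", "adjective"],
    ["noun", "verb"],
]
costs = pair_costs(data)
for pair, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(pair, round(cost, 3))
```

Pairs that never occur in the data have no entry in the result, so a real system would need some fallback cost for them; the slides do not say how this case is handled.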

Example:

Cost (pronoun, adverb) = 4.9

Cost (pronoun, adjective) = 8.3

Cost (pronoun, conjunction) = 8.9
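
With costs like these, choosing the tags amounts to finding the cheapest path through the candidate nodes. The following is a minimal sketch using the three costs listed above; the function name `best_tagging`, the lattice layout, and the `unseen` fallback cost are assumptions for illustration and are not taken from the original work.

```python
def best_tagging(lattice, cost, unseen=20.0):
    """Return (total_cost, tags) for the cheapest path through the lattice.

    lattice: one list of candidate POS tags per word, in sentence order.
    cost:    dict mapping (p1, p2) to the connection cost of that pair.
    unseen:  hypothetical fallback cost for pairs missing from `cost`.
    """
    # best[tag] = (cost of the cheapest path ending in tag, that path)
    best = {tag: (0.0, [tag]) for tag in lattice[0]}
    for candidates in lattice[1:]:
        new_best = {}
        for tag in candidates:
            # Extend every surviving path by this tag and keep the cheapest.
            new_best[tag] = min(
                (prev_cost + cost.get((prev, tag), unseen), path + [tag])
                for prev, (prev_cost, path) in best.items()
            )
        best = new_best
    return min(best.values())


# Connection costs taken from the example above.
COSTS = {
    ("pronoun", "adverb"): 4.9,
    ("pronoun", "adjective"): 8.3,
    ("pronoun", "conjunction"): 8.9,
}

# A pronoun followed by a word with three candidate tags.
lattice = [["pronoun"], ["adverb", "adjective", "conjunction"]]
total, tags = best_tagging(lattice, COSTS)
print(total, tags)
```

Because only the cheapest path into each candidate tag is kept at every word, the search stays linear in sentence length instead of enumerating every combination of tags.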

An Example of Possible Analysis for a Simple Input Sentence

RESULTS & DISCUSSION

References

• Uliniansyah, M. T., Ishizaki, S., & Uchiyama, K. (2004). Solving Ambiguities in Indonesian Words by Morphological Analysis Using Minimum Connectivity Cost. Journal of Natural Language Processing, Vol. 11, No. 1.

• Kridalaksana, H. (1996). Pembentukan Kata dalam Bahasa Indonesia. PT Gramedia Pustaka Utama.

Utilizing the proposed technique for WordNet

We can use WordNet together with this technique to choose the proper sense of a word in a sentence.

Example:

Dia\pronoun sedang\adverb berada\verb intransitive di\particle location dalam\noun abstract location kamar\noun building

("He/She is currently in the room.")

Demo Program
