String Matching with Regular Expressions

Source: informatika.stei.itb.ac.id/~rinaldi.munir/stmik/...string...

Masayu Leylia Khodra

References:

• Chapter 2 of An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, by Daniel Jurafsky and James H. Martin

• 15-211 Fundamental Data Structures and Algorithms, by Ananda Gunawardena

Post on 17-May-2019

String Matching: Definition

• Given:

1. T: the text, a (long) string of n characters

2. P: the pattern, a string of m characters (assume m <<< n) to be searched for in the text.

Find (locate) where in the text the pattern occurs.
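The definition above can be illustrated with a brute-force search. This is only a minimal sketch, not the method used in the slides: the class name `ExactMatchSketch` and the sample strings are made up for illustration. It tries every start position in T and compares P character by character.

```java
import java.util.ArrayList;
import java.util.List;

class ExactMatchSketch {
    // Returns every index i such that the pattern occurs in the text starting at i.
    static List<Integer> findAll(String text, String pattern) {
        List<Integer> hits = new ArrayList<>();
        int n = text.length();    // n: length of the text T
        int m = pattern.length(); // m: length of the pattern P
        for (int i = 0; i + m <= n; i++) {
            int j = 0;
            // Advance while the characters agree.
            while (j < m && text.charAt(i + j) == pattern.charAt(j)) {
                j++;
            }
            if (j == m) {
                hits.add(i); // all m characters matched
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(findAll("abracadabra", "abra")); // [0, 7]
    }
}
```

In the worst case this performs on the order of n·m character comparisons, which is one reason automaton-based matching is of interest.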

Example 1: Exact Matching

Example 2: Regex Matching

General Regex Notation

Regex for Words Beginning with a Capital Letter

[A-Z][a-z]* : an uppercase letter followed by zero or more lowercase letters
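As a small illustration, the pattern can be tried with `java.util.regex`. The class name `CapitalizedWords` and the sample sentence are assumptions made for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CapitalizedWords {
    // One uppercase letter followed by zero or more lowercase letters.
    private static final Pattern CAPITALIZED = Pattern.compile("[A-Z][a-z]*");

    // Collects every substring of the text that matches the pattern.
    static List<String> find(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = CAPITALIZED.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(find("Hello world from Bandung"));
        // [Hello, Bandung]
    }
}
```

Because `[a-z]*` may match zero letters, an all-caps token such as "ITB" yields three separate matches (I, T, B) rather than one word.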

Regex Notation: Examples

The dot metacharacter “.” matches any character (left). Use a backslash ‘\’ to match a metacharacter literally.
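A short sketch of the difference between an unescaped and an escaped dot. The class name `DotEscapeSketch` and the test strings are illustrative assumptions. Note that inside a Java string literal the backslash itself must be doubled.

```java
import java.util.regex.Pattern;

class DotEscapeSketch {
    // True if the regex matches somewhere inside the text.
    static boolean contains(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Unescaped dot: any single character (by default, not a line terminator).
        System.out.println(contains("b.t", "bat"));             // true
        System.out.println(contains("b.t", "bit"));             // true
        System.out.println(contains("3.14", "3x14"));           // true

        // Escaped dot: only a literal '.' is accepted.
        System.out.println(contains("3\\.14", "3x14"));         // false
        System.out.println(contains("3\\.14", "pi is 3.14"));   // true
    }
}
```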

Regex Notation: Examples

Example 2: Regex

Example 3: Regex for Email

Example 4: Regex for Phone Number

Regular Expressions and Automata

Basic Regular Expression Patterns

• The use of the brackets [] to specify a disjunction of characters.

• The use of the brackets [] plus the dash - to specify a range.
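The two bracket forms can be tried as follows. This is a sketch only: the class name `BracketSketch`, the helper `findAll`, and the sample inputs are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BracketSketch {
    // Returns every non-overlapping match of the regex in the text.
    static List<String> findAll(String regex, String text) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        // Disjunction: either 'w' or 'W' in the first position.
        System.out.println(findAll("[wW]oodchuck", "Woodchucks and a woodchuck"));
        // [Woodchuck, woodchuck]

        // Range: any single digit from 0 to 9.
        System.out.println(findAll("[0-9]", "Room 42B"));
        // [4, 2]
    }
}
```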

Basic Regular Expression Patterns

• Uses of the caret ^ for negation or just to mean ^

• The question-mark ? marks optionality of the previous expression.

• The use of period . to specify any character
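A brief sketch of these three operators in Java. The class name `CaretAndOptionalSketch` and the sample strings are illustrative assumptions. One Java-specific caveat: outside brackets a caret acts as an anchor, so the literal caret is shown here inside brackets, where it is literal whenever it is not the first character.

```java
import java.util.regex.Pattern;

class CaretAndOptionalSketch {
    // True if the regex matches somewhere inside the text.
    static boolean found(String regex, String text) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // A caret placed first inside brackets negates the class.
        System.out.println(found("[^A-Z]", "ABC"));     // false: only uppercase letters
        System.out.println(found("[^A-Z]", "ABc"));     // true: 'c' is not uppercase

        // A caret elsewhere inside brackets is just the character '^'.
        System.out.println(found("[e^]", "2^3"));       // true

        // '?' makes the preceding expression optional.
        System.out.println(found("colou?r", "color"));  // true
        System.out.println(found("colou?r", "colour")); // true

        // '.' stands for any single character.
        System.out.println(found("beg.n", "begun"));    // true
    }
}
```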

Finite State Machines (FSM)

• An FSM is a computing machine that

– takes a string as input

– outputs a YES/NO answer

• That is, the machine “accepts” or “rejects” the string

[Figure: Input String -> FSM -> Yes / No]

Reference: Gunawardena, 2006

FSM Model

• Input to an FSM

– Strings built from a fixed alphabet {a,b,c}

– Possible inputs: aa, aabbcc, a, etc.

• The Machine

– A directed graph

  • Nodes = states of the machine

  • Edges = transitions from one state to another

• Special States

– Start (q0) and Final (or Accepting) (q2)

• Assume the alphabet is {a,b}

– Which strings are accepted by this FSM?

Reference: Gunawardena, 2006
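A minimal sketch of the model: states as row indices, edges as entries of a transition table. The machine below is hypothetical and is not necessarily the one pictured on the slide. It assumes three states q0 (start), q1, q2 (final) over the alphabet {a,b}, and the transitions are chosen so that it accepts exactly the strings ending in "ab".

```java
class TinyFsm {
    // Hypothetical transition table (an assumption, not taken from the slide).
    // Rows are states q0..q2; columns are the inputs 'a' (0) and 'b' (1).
    private static final int[][] NEXT = {
        {1, 0}, // q0: a -> q1, b -> q0
        {1, 2}, // q1: a -> q1, b -> q2
        {1, 0}, // q2: a -> q1, b -> q0
    };
    private static final int START = 0; // q0
    private static final int FINAL = 2; // q2

    // Runs the machine over the input and reports accept (true) or reject (false).
    static boolean accepts(String input) {
        int state = START;
        for (char c : input.toCharArray()) {
            if (c != 'a' && c != 'b') {
                return false; // symbol outside the alphabet: reject
            }
            state = NEXT[state][c - 'a'];
        }
        return state == FINAL;
    }

    public static void main(String[] args) {
        System.out.println(accepts("aab")); // true
        System.out.println(accepts("aba")); // false
    }
}
```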

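FSM for String Matching

• Alphabet {a,b,c}

• Pattern “aabc”

• String: aaaaaaaaaaaabcddddddddddddddd

[Figure: state diagram with states 0 (start) to 4 (accepting). The forward transitions 0 to 1 to 2 to 3 to 4 are labeled a, a, b, c; the remaining edges, labeled b|c, b|c, c, a, b, and a, handle mismatches.]

Reference: Gunawardena, 2006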
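The matcher for “aabc” can be sketched as a table-driven automaton. The table below is reconstructed from the pattern itself: on a mismatch, each state falls back to the longest prefix of “aabc” that is still a suffix of the input read so far. It may therefore differ in detail from the original diagram. The class name `AabcMatcher` is hypothetical, and resetting to state 0 on symbols outside {a,b,c} (such as the d's in the sample string) is an assumption.

```java
class AabcMatcher {
    // Rows are states 0..3; columns are the inputs 'a', 'b', 'c'.
    private static final int[][] NEXT = {
        {1, 0, 0}, // state 0: nothing matched yet
        {2, 0, 0}, // state 1: "a" matched
        {2, 3, 0}, // state 2: "aa" matched (another 'a' keeps "aa")
        {1, 0, 4}, // state 3: "aab" matched
    };
    private static final int ACCEPT = 4;         // "aabc" fully matched
    private static final int PATTERN_LENGTH = 4;

    // Returns the start index of the first occurrence of "aabc", or -1 if absent.
    static int find(String text) {
        int state = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            // Assumption: symbols outside {a,b,c} reset the machine to state 0.
            state = (c >= 'a' && c <= 'c') ? NEXT[state][c - 'a'] : 0;
            if (state == ACCEPT) {
                return i - PATTERN_LENGTH + 1;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        String text = "aaaaaaaaaaaabcddddddddddddddd";
        System.out.println(find(text));
    }
}
```

Each character of the text is read exactly once, so the scan takes time linear in the length of the text.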

Regex in Java