Book Reviews

Finite-State Language Processing

Emmanuel Roche and Yves Schabes (editors)
(Teragram Corporation)

Cambridge, MA: The MIT Press (The MIT Press series in Language, Speech and Communication), 1997, xvii+464 pp; hardbound, ISBN 0-262-18182-7, $45.00

Reviewed by Mario José Cáccamo and Tomasz Kowaltowski
University of Campinas, Brazil

1. Introduction

Finite-state automata have been widely used in computer science since its beginning. However, for a long time the NLP community preferred other tools based on more powerful formalisms. Chomsky's argument that finite-state devices are not able to represent natural language structures, especially those involving central embedding (recursion), was one of the reasons for this preference.

Finite-state automata were first introduced to NLP as tools for the efficient computational implementation of large vocabularies and lexicons. The excellent results achieved in that area brought interest in applying finite-state formalisms to other fields of computational linguistics.

Finite-State Language Processing is a collection of 15 papers written by 21 authors, focused on state-of-the-art finite-state models. It contains papers reporting research on morphology, lexicon construction, (surface) parsing, part-of-speech tagging, phonetic conversion, information retrieval, and speech recognition. Although many chapters are written by researchers who are currently developing their work at American research centers, most contributions to this book came from the European NLP school.

The preface, the 15 chapters, and the index comprise altogether a book of 464 pages. Each chapter is an independent paper about a specific topic. Most of the papers require some reasonable background in computational linguistics and in computer science.

Chapter 1 presents a general introduction to the theory of finite-state automata and transducers. The order of the remaining chapters does not seem to follow any specific criterion; however, some related chapters are consecutive. The book includes a list of contributors with their affiliations and addresses, and a small term index.

2. Contents

Chapter 1, written by the editors of the book, introduces most of the concepts necessary to read the book, though it does require some previous knowledge of the subject. The most valuable point of this chapter is the presentation of some finite-state concepts through NLP examples.

Computational Linguistics Volume 24, Number 4

Chapters 2 and 6 are relevant to morphological processing. In Chapter 2, David Clemenceau addresses the well-known problems of derivational morphology and the opposition between lexical methods (electronic dictionaries) and heuristic methods (derivation rules). He advocates mixing the two approaches. However, one of his arguments about "unrealistically" large dictionaries seems to be losing some ground due to recent advances in technology. This chapter also presents MORPHO, a morphological analyzer based on Koskenniemi's two-level model. In Chapter 6, Max Silberztein adopts a similar approach to natural language lexical analysis, especially in the presence of compound words.

Chapter 3, by Kimmo Koskenniemi, discusses the relevance of the two-level morphology model to finite-state calculus and how the ideas behind two-level morphology can be extended to design Finite-State Intersection Grammars (FSIGs).

In Chapter 4, Lauri Karttunen extends the calculus of regular expressions with the replace operator as another tool to capture a common operation in NLP: string replacement. The theoretical aspects behind the operator are clarified and explained by examples. The relation between the replace operator and rewrite and two-level rules is also discussed.

In Chapter 5, Fernando C. N. Pereira and Rebecca N. Wright propose an algorithm to compute finite-state approximations of context-free grammars. The underlying interest of the authors is the efficient computational implementation of phrase structure grammars. The approximation algorithm accepts any context-free grammar as input and returns a finite-state automaton that recognizes all the sentences generated by the grammar and perhaps some more. It is proved that the approximation is exact if the grammar is left-linear or right-linear. This is not surprising, as linear grammars (right or left) describe the same languages as finite-state automata.
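The equivalence mentioned in the last sentence can be illustrated with a minimal sketch. It is not taken from the book and is not the Pereira and Wright approximation algorithm; it only reads a right-linear grammar directly as a nondeterministic finite-state automaton. The rule encoding, the names `grammar_to_nfa` and `accepts`, the extra state `FINAL`, and the toy grammar are all assumptions made for illustration.

```python
from collections import defaultdict

FINAL = "#final"  # hypothetical extra accepting state, not a grammar symbol


def grammar_to_nfa(rules):
    """Read right-linear rules as NFA transitions.

    rules: list of (lhs, terminal, rhs); rhs is a nonterminal for a rule
    A -> a B, or None for a rule A -> a.
    """
    delta = defaultdict(set)
    for lhs, terminal, rhs in rules:
        # A -> a B becomes an arc A --a--> B; A -> a becomes A --a--> FINAL.
        delta[(lhs, terminal)].add(rhs if rhs is not None else FINAL)
    return delta


def accepts(delta, start, word):
    """Simulate the NFA by tracking the set of reachable states."""
    states = {start}
    for symbol in word:
        states = {t for s in states for t in delta.get((s, symbol), ())}
    return FINAL in states


# Toy grammar (an assumption): S -> a S | b A,  A -> b A | b
# It generates zero or more a's followed by at least two b's.
RULES = [("S", "a", "S"), ("S", "b", "A"), ("A", "b", "A"), ("A", "b", None)]
NFA = grammar_to_nfa(RULES)

print(accepts(NFA, "S", "aabb"))  # a string generated by the grammar
print(accepts(NFA, "S", "ab"))    # a string the grammar does not generate
```

Each nonterminal becomes a state and each rule becomes one arc, so the automaton has exactly the derivations of the grammar; this is why an approximation has nothing to add in the linear case.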

A finite-state implementation of Eric Brill's part-of-speech tagger is presented in Chapter 7 by the editors of the book. Their construction is rule-based, as opposed to the usual stochastic approach, and produces an efficient deterministic transducer without sacrificing the quality of results. This tagger is faster than the original one devised by Brill.

The application of transducers is also the central theme of Chapters 8, 12, and 14. In Chapter 8, Emmanuel Roche shows how to build parsers for context-free grammars or even more complex linguistic situations using transducers. Chapter 12 is devoted to the computational manipulation of sequential transducers (deterministic over the input). In this chapter, Mehryar Mohri presents the theory related to sequential transducers and briefly discusses their applications to phonology, morphology, lexicon construction, syntax, and speech processing. In Chapter 14, Eric Laporte proposes a solution to phonetic conversion based on transducers and bimachines (transducers where the input is read both from left to right and from right to left). Implementation details of BiPho, a phonetic conversion system, are also discussed.
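To make "deterministic over the input" concrete, here is a minimal sketch of running a sequential transducer. It is an illustration under stated assumptions, not code or notation from Mohri's chapter: the dictionary encoding of transitions, the function name `run_sequential`, and the toy rewriting rule are hypothetical.

```python
def run_sequential(transitions, finals, start, word):
    """Apply a sequential transducer to word.

    transitions maps (state, input_symbol) to (next_state, output_string).
    Because there is at most one entry per (state, input_symbol) pair, the
    machine never has to guess or backtrack. Returns the output string, or
    None if the input is rejected.
    """
    state, out = start, []
    for symbol in word:
        if (state, symbol) not in transitions:
            return None  # no move defined for this symbol: reject
        state, emitted = transitions[(state, symbol)]
        out.append(emitted)
    return "".join(out) if state in finals else None


# Toy transducer (an assumption): copy the input over {a, b}, but write
# an 'a' that immediately follows a 'b' as 'A'.
# State 0: previous symbol was not 'b'; state 1: previous symbol was 'b'.
T = {
    (0, "a"): (0, "a"),
    (0, "b"): (1, "b"),
    (1, "a"): (0, "A"),
    (1, "b"): (1, "b"),
}

print(run_sequential(T, {0, 1}, 0, "abab"))
```

The run takes one table lookup per input symbol, which is the practical appeal of sequential transducers. A bimachine, by contrast, would combine a left-to-right pass with a right-to-left pass and is not covered by this sketch.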

The book returns to the FSIG formalism in Chapters 9 and 10. FSIGs deserve to be mentioned as one of the first finite-state applications in NLP. Atro Voutilainen examines in Chapter 9 the preparation of the information to build a database for a finite-state parser. Many remarks made in this chapter are applicable to parsers in general. A drawback pointed out in the first works on FSIGs was the time and space requirements of an actual implementation of a parser based on that formalism. Some algorithms that overcome these problems were suggested in the literature. In Chapter 10, Pasi Tapanainen sheds light on those solutions and gives a comparative complexity analysis of the algorithms.


In Chapter 11, Maurice Gross addresses the lack of a systematic categorization of the objects in linguistics. The author concludes that constraints encoded in finite-state automata can be locally described, and therefore a cumulative approach to the construction of grammars is possible.

The implementation of the system FASTUS is discussed in Chapter 13 by Jerry R. Hobbs and his colleagues from SRI. FASTUS is a system for extracting information from running texts. The architecture of the system consists of a cascade of finite-state transducers splitting the processing into several different stages. Unlike other papers in the collection, this paper focuses on the implementation of a real system.

The final chapter, by Fernando C. N. Pereira and Michael D. Riley, presents a general framework for implementing speech recognizers. The interesting point of this chapter is the application of weighted finite-state automata and transducers to represent data structures common in speech recognition.
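As a rough illustration of what a weighted finite-state automaton adds to an unweighted one, the sketch below scores an input by the cheapest accepting path, with weights added along a path and the minimum taken across paths. This is a generic toy, not the framework of the chapter; the arc encoding, the function name `best_cost`, and the example weights (which one may think of as negative log probabilities) are assumptions.

```python
import math


def best_cost(arcs, finals, start, word):
    """Return the minimum total weight over accepting paths for word.

    arcs maps (state, symbol) to a list of (next_state, weight);
    finals maps each accepting state to its final weight.
    Returns math.inf if no accepting path exists.
    """
    costs = {start: 0.0}  # cheapest known cost to reach each state so far
    for symbol in word:
        nxt = {}
        for state, cost in costs.items():
            for target, weight in arcs.get((state, symbol), ()):
                candidate = cost + weight
                if candidate < nxt.get(target, math.inf):
                    nxt[target] = candidate
        costs = nxt
    return min(
        (cost + finals[state] for state, cost in costs.items() if state in finals),
        default=math.inf,
    )


# Toy automaton (an assumption): two competing paths accept "ab".
ARCS = {
    (0, "a"): [(1, 1.0), (2, 0.5)],
    (1, "b"): [(3, 1.0)],
    (2, "b"): [(3, 2.0)],
}
FINALS = {3: 0.0}

print(best_cost(ARCS, FINALS, 0, "ab"))
```

With these weights the path through state 1 costs 2.0 and the path through state 2 costs 2.5, so the cheaper one determines the score of "ab".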

3. Evaluation

Finite-State Language Processing is probably the first book covering the current work in the area in a comprehensive way. It will be valuable to many researchers in linguistics, especially those who are interested in nonclassical approaches to NLP. It should also appeal to those who come from computer science and are motivated to work in computational linguistics. As a textbook it is appropriate for a postgraduate seminar.

The chapters are in general well written in a direct and easy-to-read style with many examples. Each chapter includes its own list of references; there is no unified list, which might have been useful. There are some minor mistakes and inconsistencies, quite common in a collection of loosely related papers.

The text is not directed to those who look for immediate implementation solutions. Much of the material is treated in a fairly theoretical way in spite of the discussion of many practical aspects. Finally, there are some subjects that are not covered or are just mentioned, such as applications to corpus processing and machine translation.

Mario José Cáccamo is a doctoral student in computer science. In his recent Master's thesis, he described the implementation of an FSA-based environment for syntactic pattern processing that can be used for applications that require surface parsing such as agreement advisers. Tomasz Kowaltowski is a Professor of Computing at the University of Campinas whose interests include applications of FSAs in representing large linguistic databases. The reviewers' address is: Institute of Computing, University of Campinas, Caixa Postal 6176, 13083-970 Campinas, SP, Brazil; e-mail: {mcaccamo,tomasz}@dcc.unicamp.br
