Book Reviews
AI Magazine, Volume 22, Number 2 (Summer 2001). Copyright © 2001, American Association for Artificial Intelligence. All rights reserved.

An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods: A Review

Tong Zhang

An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Nello Cristianini and John Shawe-Taylor, Cambridge University Press, Cambridge, U.K., 2000, 189 pp., ISBN 0-521-78019-5.
This book is an introduction to support vector machines and related kernel methods in supervised learning, whose task is to estimate an input-output functional relationship from a training set of examples. A learning problem is referred to as classification if its output takes discrete values in a set of possible categories and regression if it has a continuous real-valued output.
A simple and useful model of an input-output functional relationship is to assume that the output variable can be expressed approximately as a linear combination of its input vector components. These linear models include the linear least squares method for regression and the logistic regression method for classification. Because a linear model has limited prediction power by itself, there has been extensive research in nonlinear models such as neural networks. However, there are two major problems with the use of nonlinear models: First, they are theoretically difficult to analyze, and second, they are computationally difficult to solve. Linear methods have recently regained their popularity because of their simplicity, both theoretically and computationally. It has also been realized that, with appropriate features, the prediction power of linear models can be as good as that of nonlinear models. For example, one can linearize a neural network model by using weighted averaging over all possible neural networks in the model.
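To make the linear-model setup concrete, here is a minimal sketch (my own illustration, not an example from the book) of the two methods just named: linear least squares for regression and logistic regression for classification. All data and parameter choices below are arbitrary.

```python
import numpy as np

# Toy data: inputs X (n samples, d features), real-valued targets y,
# and binary labels c in {0, 1}. Purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)
c = (X @ w_true > 0).astype(float)

# Linear least squares for regression: minimize ||X w - y||^2.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Logistic regression for classification: maximize the log-likelihood
# of P(c = 1 | x) = sigmoid(w . x) by simple gradient ascent.
w_lr = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w_lr)))   # predicted probabilities
    w_lr += 0.1 * X.T @ (c - p) / len(c)    # gradient of the log-likelihood

print(w_ls, w_lr)
```

In both cases, the fitted function is linear in the input vector; only the loss being optimized differs.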
To use a linear model to represent a nonlinear functional relationship, we need to include nonlinear feature components. A useful technique for constructing nonlinear features is kernel methods, where each feature is a function of the current input and one of the example inputs (such as their distance). This idea has recently received much attention because of the introduction of support vector machines (SVMs) and the renewed interest in Gaussian processes. SVMs, introduced by Vapnik and his collaborators, were originally formulated for binary-classification problems. The resulting method is similar to penalized maximum-likelihood estimators using logistic models (also known as conditional maximum entropy classification models). Later, SVMs were extended to regression problems, and the resulting formulations are similar to ridge regression.
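To illustrate the kernel feature construction just described (my sketch, not the book's), one can map an input x to one feature per training example, for instance a Gaussian function of their distance; the width parameter sigma below is an arbitrary choice.

```python
import numpy as np

def gaussian_kernel_features(x, examples, sigma=1.0):
    """Map input x to one nonlinear feature per training example:
    phi_i(x) = exp(-||x - x_i||^2 / (2 sigma^2))."""
    dists = np.linalg.norm(examples - x, axis=1)
    return np.exp(-dists**2 / (2 * sigma**2))

# A linear model w . phi(x) in this feature space is a nonlinear
# function of the original input x.
examples = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
print(gaussian_kernel_features(np.array([1.0, 0.0]), examples))
```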
The general framework of support vector learning includes the following components: (1) regularized linear learning models (such as classification and regression), (2) theoretical bounds, (3) convex duality and the associated dual-kernel representation, and (4) sparseness of the dual-kernel representation. Although most of these concepts are not new, the combination is unique, which makes SVMs and related kernel-based learning methods special and interesting. As a result, hundreds of research papers have been published in recent years on different aspects of this new learning methodology. SVMs have also successfully been applied in practice, especially for classification problems. Although many problems that have been successfully solved by SVMs could also have been solved successfully by standard statistical methods such as penalized logistic regression, in practice, SVMs can still have significant computational advantages.
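A compact way to see components (1) and (3) together (a sketch in my own notation, not the book's development) is kernel ridge regression: the regularized linear problem in feature space has a dual solution alpha = (K + lambda I)^{-1} y over the kernel (Gram) matrix K, and predictions depend on inputs only through kernel values.

```python
import numpy as np

def gram_matrix(X, k):
    """Kernel (Gram) matrix K[i, j] = k(x_i, x_j)."""
    n = len(X)
    return np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])

def kernel_ridge_fit(K, y, lam=0.1):
    """Dual solution of regularized least squares in feature space:
    alpha = (K + lam * I)^{-1} y."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

# Toy usage with a Gaussian kernel; all choices are illustrative.
k = lambda u, v: np.exp(-np.sum((u - v) ** 2))
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 0.0])
alpha = kernel_ridge_fit(gram_matrix(X, k), y)

# Prediction at a new point depends on inputs only through the kernel:
x_new = np.array([1.5])
print(sum(a * k(xi, x_new) for a, xi in zip(alpha, X)))
```

Component (4), sparseness, is absent here: ridge regression yields dense dual coefficients, whereas the SVM dual discussed later in this review is typically sparse.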
Certain aspects of SVMs have already been described in Vapnik's monograph entitled Statistical Learning Theory (Wiley, 1998). However, this book is highly mathematical, which makes it intimidating to an average reader. Thus, it is not an ideal introductory textbook. However, because of the popularity of SVMs in recent years, there is a need for a gentle introduction to the subject that is suitable for the average reader.
The book by Cristianini and Shawe-Taylor successfully fulfills the demand for such a gentle introduction. It is structured as a self-contained textbook for machine learning students and practitioners. However, the book is also a valuable reference for active researchers working in this field. A useful aspect of the book is that it contains a literature survey and discussion section at the end of every chapter. Pointers to many of the references can also be found on a dedicated web site.1 The discussion broadens the reader's view of relevant topics, and the references point the reader to some important recent research works. Because the authors of the book are active researchers in the field who have made tremendous contributions themselves, the information chosen in the book reflects their unique, often insightful perspectives.
The book contains eight chapters, which cover topics including the mathematical foundation of kernels, Vapnik-Chervonenkis-style learning theory, duality in mathematical programming, SVM formulations, implementation issues, and successful SVM applications. Although these topics are not comprehensive, they are carefully chosen. Some more advanced materials are left out or barely mentioned in the book. Often, these advanced topics are still under active research and, thus, are not mature enough to be included in an introductory text. Among the missing topics I consider important are (1) approximation properties of kernel representation, (2) non-Vapnik-Chervonenkis-style and non-margin-style theoretical analysis, (3) the relationship between SVMs and regularization methods in statistics and numerical mathematics, (4) the impact of different regularization conditions and the impact of different loss terms, (5) direct-kernel formulations in the primal form (these formulations have some advantages over the dual-kernel formulations considered in the book), and (6) sparseness properties of the dual-kernel representation in an SVM (this aspect of SVMs is quite important and relates to many ideas that existed before its introduction).

The book starts with a brief introduction to different learning models and then focuses on linear learning machines using perceptrons and least squares regression. Many other linear learning machines are not mentioned in the book, probably because they are less relevant to dual-kernel methods.
The concept of kernel-induced feature space is developed in a subsequent chapter. The authors include some novel and very interesting kernel examples, such as a string kernel that induces a feature space consisting of all possible subsequences in a string. The main advantage of kernel-induced feature spaces is the ability to represent nonlinear functions linearly in a space spanned by the kernel functions. Although a kernel representation can naturally be obtained from the dual form of SVMs as presented in the book, it can also be formulated directly in a primal-form learning machine.
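As a small, self-contained example of a kernel-induced feature space (mine, not one from the book), the degree-2 polynomial kernel k(x, z) = (x . z)^2 computes exactly the inner product of explicit monomial features, so a function that is linear in this feature space is quadratic in the original input.

```python
import numpy as np
from itertools import product

def poly2_kernel(x, z):
    """k(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2

def poly2_features(x):
    """Explicit feature map: all ordered degree-2 monomials x_i * x_j,
    so that poly2_features(x) . poly2_features(z) == poly2_kernel(x, z)."""
    d = len(x)
    return np.array([x[i] * x[j] for i, j in product(range(d), repeat=2)])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
assert np.isclose(poly2_kernel(x, z), poly2_features(x) @ poly2_features(z))
```

The kernel evaluates this inner product without ever constructing the feature vectors, which is what makes very large (or infinite) feature spaces, such as the all-subsequences space of the string kernel, computationally feasible.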
The book has a nicely written chapter on the learning theory aspect of SVMs, mainly from the Vapnik-Chervonenkis analysis and the maximal margin point of view. The authors have made substantial contributions in the theoretical development of the field, and the material selection in this chapter reflects their perspective on the problem. These theoretical results are later used to motivate some specific SVM formulations considered in the book. As acknowledged by the authors, this approach only leads to learning methods that reflect what is given in the theoretical bounds, which might not be optimal. In addition, for computational reasons, in the later proposed formulations, there are some heuristics not directly reflected in the theoretical results. For these reasons, the theory itself does not justify SVMs as a method superior to other standard statistical methods, such as logistic regression for classification or least squares for regression. In fact, one might start with non-SVM formulations and develop similar theoretical results.
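For orientation, the maximal margin view can be summarized by the standard hard-margin formulation (textbook form, in my notation rather than the book's):

```latex
\min_{w,\,b} \ \frac{1}{2}\|w\|^2
\quad \text{subject to} \quad
y_i\,(w \cdot x_i + b) \ \ge\ 1, \qquad i = 1, \dots, n.
```

Under this constraint normalization, the geometric margin of the separating hyperplane is 2/||w||, so minimizing ||w||^2 maximizes the margin; this is how margin-based bounds motivate the formulation.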
Following the theoretical analysis chapter, the book gives an introduction to optimization and Lagrangian duality theory, which is an essential component of support vector learning. Motivated by the theoretical bounds given earlier, a number of specific SVM formulations are presented. By using the Lagrangian duality theory, these SVMs are transformed into dual forms using kernel representation. Some implementation issues that are useful for practitioners are carefully discussed. Although this book mainly considers the standard approach to obtaining dual-kernel representation using Lagrangian duality, we can also use kernel representation directly in primal-form linear learning machines. The latter approach does not require the Lagrangian duality theory and can be made either equivalent or non-equivalent to the dual-kernel formulation.
Although this book does not include all aspects of learning methods related to SVMs, it achieves the right balance of topic selection, clarity, and mathematical rigor as an introductory text. The topics covered in the book are the most fundamental and important ones in support vector learning and related kernel methods. This book is essential both for practitioners who want to quickly learn some implementation tricks to apply SVMs to real problems and for theoretically oriented readers who want to understand the mathematical foundations of the underlying theory. It is also a handy reference book for researchers working in the field.
Note
1. www.support-vector.net.
Tong Zhang received a B.A. in mathematics and computer science from Cornell University in 1994 and a Ph.D. in computer science from Stanford University in 1998. Since 1998, he has been with IBM Research, T. J. Watson Research Center, Yorktown Heights, New York, where he is now a research staff member in the Mathematical Sciences Department. His research interests include machine learning, numerical algorithms, and their applications. His e-mail address is tzhan [email protected] .com.