Book Reviews
AI Magazine, Volume 22, Number 2 (Summer 2001). Copyright © 2001, American Association for Artificial Intelligence. All rights reserved.

An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods: A Review

Tong Zhang

An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Nello Cristianini and John Shawe-Taylor, Cambridge University Press, Cambridge, U.K., 2000, 189 pp., ISBN 0-521-78019-5.
This book is an introduction to support vector machines and related kernel methods in supervised learning, whose task is to estimate an input-output functional relationship from a training set of examples. A learning problem is referred to as classification if its output takes discrete values in a set of possible categories and regression if it has a continuous real-valued output.
A simple and useful model of an input-output functional relationship is to assume that the output variable can be expressed approximately as a linear combination of its input vector components. These linear models include the linear least squares method for regression and the logistic regression method for classification. Because a linear model has limited prediction power by itself, there has been extensive research in nonlinear models such as neural networks. However, there are two major problems with the use of nonlinear models: First, they are theoretically difficult to analyze, and second, they are computationally difficult to solve. Linear methods have recently regained their popularity because of their simplicity, both theoretically and computationally. It has also been realized that, with appropriate features, the prediction power of linear models can be as good as that of nonlinear models. For example, one can linearize a neural network model by using weighted averaging over all possible neural networks in the model.
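To make the linear-model setup concrete, here is a minimal sketch (my own illustration, not an example from the book) of the two methods just named: linear least squares for regression and logistic regression for classification. All data and parameter choices below are arbitrary.

```python
import numpy as np

# Toy data: inputs X (n samples, d features), real-valued targets y,
# and binary labels c in {0, 1}. Purely illustrative.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=100)
c = (X @ w_true > 0).astype(float)

# Linear least squares for regression: minimize ||X w - y||^2.
w_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# Logistic regression for classification: maximize the log-likelihood
# of P(c = 1 | x) = sigmoid(w . x) by simple gradient ascent.
w_lr = np.zeros(3)
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w_lr)))   # predicted probabilities
    w_lr += 0.1 * X.T @ (c - p) / len(c)    # gradient of the log-likelihood

print(w_ls, w_lr)
```

In both cases, the fitted function is linear in the input vector; only the loss being optimized differs.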
To use a linear model to represent a nonlinear functional relationship, we need to include nonlinear feature components. A useful technique for constructing nonlinear features is kernel methods, where each feature is a function of the current input and one of the example inputs (such as their distance). This idea has recently received much attention because of the introduction of support vector machines (SVMs) and the renewed interest in Gaussian processes. SVMs, introduced by Vapnik and his collaborators, were originally formulated for binary-classification problems. The resulting method is similar to penalized maximum-likelihood estimators using logistic models (also known as conditional maximum entropy classification models). Later, SVMs were extended to regression problems, and the resulting formulations are similar to ridge regression.
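To illustrate the kernel feature construction just described (my sketch, not the book's), one can map an input x to one feature per training example, for instance a Gaussian function of their distance; the width parameter sigma below is an arbitrary choice.

```python
import numpy as np

def gaussian_kernel_features(x, examples, sigma=1.0):
    """Map input x to one nonlinear feature per training example:
    phi_i(x) = exp(-||x - x_i||^2 / (2 sigma^2))."""
    dists = np.linalg.norm(examples - x, axis=1)
    return np.exp(-dists**2 / (2 * sigma**2))

# A linear model w . phi(x) in this feature space is a nonlinear
# function of the original input x.
examples = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
print(gaussian_kernel_features(np.array([1.0, 0.0]), examples))
```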
The general framework of support vector learning includes the following components: (1) regularized linear learning models (such as classification and regression), (2) theoretical bounds, (3) convex duality and the associated dual-kernel representation, and (4) sparseness of the dual-kernel representation. Although most of these concepts are not new, the combination is unique, which makes SVMs and related kernel-based learning methods special and interesting. As a result, hundreds of research papers have been published in recent years on different aspects of this new learning methodology. SVMs have also successfully been applied in practice, especially for classification problems. Although many problems that have been successfully solved by SVMs could also have been solved successfully by standard statistical methods such as penalized logistic regression, in practice, SVMs can still have significant computational advantages.
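A compact way to see components (1) and (3) together (a sketch in my own notation, not the book's development) is kernel ridge regression: the regularized linear problem in feature space has a dual solution alpha = (K + lambda I)^{-1} y over the kernel (Gram) matrix K, and predictions depend on inputs only through kernel values.

```python
import numpy as np

def gram_matrix(X, k):
    """Kernel (Gram) matrix K[i, j] = k(x_i, x_j)."""
    n = len(X)
    return np.array([[k(X[i], X[j]) for j in range(n)] for i in range(n)])

def kernel_ridge_fit(K, y, lam=0.1):
    """Dual solution of regularized least squares in feature space:
    alpha = (K + lam * I)^{-1} y."""
    return np.linalg.solve(K + lam * np.eye(len(y)), y)

# Toy usage with a Gaussian kernel; all choices are illustrative.
k = lambda u, v: np.exp(-np.sum((u - v) ** 2))
X = np.array([[0.0], [1.0], [2.0]])
y = np.array([0.0, 1.0, 0.0])
alpha = kernel_ridge_fit(gram_matrix(X, k), y)

# Prediction at a new point depends on inputs only through the kernel:
x_new = np.array([1.5])
print(sum(a * k(xi, x_new) for a, xi in zip(alpha, X)))
```

Component (4), sparseness, is absent here: ridge regression yields dense dual coefficients, whereas the SVM dual discussed later in this review is typically sparse.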
Certain aspects of SVMs have already been described in Vapnik's monograph entitled Statistical Learning Theory (Wiley, 1998). However, this book is highly mathematical, which makes it intimidating to an average reader. Thus, it is not an ideal introductory textbook. However, because of the popularity of SVMs in recent years, there is a need for a gentle introduction to the subject that is suitable for the average reader.
The book by Cristianini and Shawe-Taylor successfully fulfills the demand for such a gentle introduction. It is structured as a self-contained textbook for machine learning students and practitioners. However, the book is also a valuable reference for active researchers working in this field. A useful aspect of the book is that it contains a literature survey and discussion section at the end of every chapter. Pointers to many of the references can also be found on a dedicated web site.1 The discussion broadens the reader's view of relevant topics, and the references point the reader to some important recent research works. Because the authors of the book are active researchers in the field who have made tremendous contributions themselves, the information chosen in the book reflects their unique, often insightful perspectives.
The book contains eight chapters, which cover topics including the mathematical foundation of kernels, Vapnik-Chervonenkis-style learning theory, duality in mathematical programming, SVM formulations, implementation issues, and successful SVM applications. Although these topics are not comprehensive, they are carefully chosen. Some more advanced materials are left out or barely mentioned in the book. Often, these advanced topics are still under active research and, thus, are not mature enough to be included in an introductory text. Among the missing topics I consider important are (1) approximation properties of kernel representation, (2) non-Vapnik-Chervonenkis-style and non-margin-style theoretical analysis, (3) the relationship between SVMs and regularization methods in statistics and numerical mathematics, (4) the impact of different regularization conditions and the impact of different loss terms, (5) direct-kernel formulations in the primal form (these formulations have some advantages over the dual-kernel formulations considered in the book), and (6) sparseness properties of the dual-kernel representation in an SVM (this aspect of SVMs is quite important and relates to many ideas that existed before its introduction).

The book starts with a brief introduction to different learning models and then focuses on linear learning machines using perceptrons and least squares regression. Many other linear learning machines are not mentioned in the book, probably because they are less relevant to dual-kernel methods.
The concept of kernel-induced feature space is developed in a subsequent chapter. The authors include some novel and very interesting kernel examples, such as a string kernel that induces a feature space consisting of all possible subsequences in a string. The main advantage of kernel-induced feature spaces is the ability to represent nonlinear functions linearly in a space spanned by the kernel functions. Although a kernel representation can naturally be obtained from the dual form of SVMs as presented in the book, it can also be formulated directly in a primal-form learning machine.
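As a small, self-contained example of a kernel-induced feature space (mine, not one from the book), the degree-2 polynomial kernel k(x, z) = (x . z)^2 computes exactly the inner product of explicit monomial features, so a function that is linear in this feature space is quadratic in the original input.

```python
import numpy as np
from itertools import product

def poly2_kernel(x, z):
    """k(x, z) = (x . z)^2."""
    return float(np.dot(x, z)) ** 2

def poly2_features(x):
    """Explicit feature map: all ordered degree-2 monomials x_i * x_j,
    so that poly2_features(x) . poly2_features(z) == poly2_kernel(x, z)."""
    d = len(x)
    return np.array([x[i] * x[j] for i, j in product(range(d), repeat=2)])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])
assert np.isclose(poly2_kernel(x, z), poly2_features(x) @ poly2_features(z))
```

The kernel evaluates this inner product without ever constructing the feature vectors, which is what makes very large (or infinite) feature spaces, such as the all-subsequences space of the string kernel, computationally feasible.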
The book has a nicely written chapter on the learning theory aspect of SVMs, mainly from the Vapnik-Chervonenkis analysis and the maximal margin point of view. The authors have made substantial contributions in the theoretical development of the field, and the material selection in this chapter reflects their perspective on the problem. These theoretical results are later used to motivate some specific SVM formulations considered in the book. As acknowledged by the authors, this approach only leads to learning methods that reflect what is given in the theoretical bounds, which might not be optimal. In addition, for computational reasons, in the later proposed formulations, there are some heuristics not directly reflected in the theoretical results. For these reasons, the theory itself does not justify SVMs as a method superior to other standard statistical methods, such as logistic regression for classification or least squares for regression. In fact, one might start with non-SVM formulations and develop similar theoretical results.
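For orientation, the maximal margin view can be summarized by the standard hard-margin formulation (textbook form, in my notation rather than the book's):

```latex
\min_{w,\,b} \ \frac{1}{2}\|w\|^2
\quad \text{subject to} \quad
y_i\,(w \cdot x_i + b) \ \ge\ 1, \qquad i = 1, \dots, n.
```

Under this constraint normalization, the geometric margin of the separating hyperplane is 2/||w||, so minimizing ||w||^2 maximizes the margin; this is how margin-based bounds motivate the formulation.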
Following the theoretical analysis chapter, the book gives an introduction to optimization and Lagrangian duality theory, which is an essential component of support vector learning. Motivated by the theoretical bounds given earlier, a number of specific SVM formulations are presented. By using the Lagrangian duality theory, these SVMs are transformed into dual forms using kernel representation. Some implementation issues that are useful for practitioners are carefully discussed. Although this book mainly considers the standard approach to obtaining dual-kernel representation using Lagrangian duality, we can also use kernel representation directly in primal-form linear learning machines. The latter approach does not require the Lagrangian duality theory and can be made either equivalent or non-equivalent to the dual-kernel formulation.
Although this book does not include all aspects of learning methods related to SVMs, it achieves the right balance of topic selection, clarity, and mathematical rigor as an introductory text. The topics covered in the book are the most fundamental and important ones in support vector learning and related kernel methods. This book is essential both for practitioners who want to quickly learn some implementation tricks to apply SVMs to real problems and for theoretically oriented readers who want to understand the mathematical foundations of the underlying theory. It is also a handy reference book for researchers working in the field.
Note
1. www.support-vector.net.
Tong Zhang received a B.A. in mathematics and computer science from Cornell University in 1994 and a Ph.D. in computer science from Stanford University in 1998. Since 1998, he has been with IBM Research, T. J. Watson Research Center, Yorktown Heights, New York, where he is now a research staff member in the Mathematical Sciences Department. His research interests include machine learning, numerical algorithms, and their applications. His e-mail address is tzhan [email protected] .com.