An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods: A Review

Tong Zhang

An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, Nello Cristianini and John Shawe-Taylor, Cambridge University Press, Cambridge, U.K., 2000, 189 pp., ISBN 0-521-78019-5.

AI Magazine, Volume 22, Number 2 (2001). Copyright © 2001, American Association for Artificial Intelligence. All rights reserved.

This book is an introduction to support vector machines and related kernel methods in supervised learning, whose task is to estimate an input-output functional relationship from a training set of examples. A learning problem is referred to as classification if its output takes discrete values in a set of possible categories and regression if it has continuous real-valued output.

A simple and useful model of an input-output functional relationship is to assume that the output variable can be expressed approximately as a linear combination of its input vector components. These linear models include the linear least squares method for regression and the logistic regression method for classification. Because a linear model has limited prediction power by itself, there has been extensive research in nonlinear models such as neural networks. However, there are two major problems with the use of nonlinear models: First, they are theoretically difficult to analyze, and second, they are computationally difficult to solve. Linear methods have recently regained their popularity because of their simplicity both theoretically and computationally. It has also been realized that with appropriate features, the prediction power of linear models can be as good as that of nonlinear models. For example, one can linearize a neural network model by using weighted averaging over all possible neural networks in the model.
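In symbols (a minimal sketch in my notation, not the book's): the linear model predicts through a score that is linear in the input vector, and the two classical methods mentioned above differ only in the loss they minimize.

```latex
f(x) = w^\top x + b
% Linear least squares (regression):
%   \min_{w,b} \sum_i (y_i - f(x_i))^2
% Logistic regression (classification, labels y_i \in \{-1,+1\}):
%   \min_{w,b} \sum_i \ln\!\left(1 + e^{-y_i f(x_i)}\right)
```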

To use a linear model to represent a nonlinear functional relationship, we need to include nonlinear feature components. A useful technique for constructing nonlinear features is kernel methods, where each feature is a function of the current input and one of the example inputs (such as their distance). This idea has recently received much attention because of the introduction of support vector machines (SVMs) and the renewed interest in Gaussian processes. SVMs, introduced by Vapnik and his collaborators, were originally formulated for binary-classification problems. The resulting method is similar to penalized maximum-likelihood estimators using logistic models (also known as conditional maximum entropy classification models). Later, SVMs were extended to regression problems, and the resulting formulations are similar to ridge regression.
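Schematically (again in my notation): a kernel turns each training example into a feature, and the kinship between SVMs and their statistical cousins comes from the loss functions being regularized.

```latex
% One feature per training example, \phi_i(x) = k(x_i, x), gives the kernel expansion
f(x) = \sum_{i=1}^{m} \alpha_i \, k(x_i, x) + b
% Classification (y_i \in \{-1,+1\}): the SVM minimizes a regularized hinge loss,
%   \sum_i \max(0,\, 1 - y_i f(x_i)) + \lambda \|f\|^2,
% which closely parallels penalized logistic regression,
%   \sum_i \ln(1 + e^{-y_i f(x_i)}) + \lambda \|f\|^2.
% Regression: the squared loss \sum_i (y_i - f(x_i))^2 with the same regularizer
% is ridge regression; the epsilon-insensitive loss gives SVM regression.
```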

The general framework of support vector learning includes the following components: (1) regularized linear learning models (such as classification and regression), (2) theoretical bounds, (3) convex duality and the associated dual-kernel representation, and (4) sparseness of the dual-kernel representation. Although most of these concepts are not new, the combination is unique, which makes SVMs and related kernel-based learning methods special and interesting. As a result, hundreds of research papers have been published in recent years on different aspects of this new learning methodology. SVMs have also been applied successfully in practice, especially for classification problems. Although many problems that have been solved successfully by SVMs could also have been solved by standard statistical methods such as penalized logistic regression, in practice SVMs can still have significant computational advantages.
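To make components (1), (3), and (4) concrete, here is the soft-margin SVM in its standard textbook form (a schematic in my notation, not a quotation from the book; component (2) is the subject of the learning theory chapter discussed later in this review).

```latex
% (1) Regularized linear learning: the soft-margin SVM primal
\min_{w,\,b,\,\xi}\ \tfrac{1}{2}\|w\|^2 + C \sum_{i=1}^{m} \xi_i
\quad \text{s.t.}\quad y_i\,(w^\top \phi(x_i) + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0
% (3) Convex duality: the Lagrangian dual, in which the data appear only
%     through the kernel k(x_i, x_j) = \phi(x_i)^\top \phi(x_j)
\max_{\alpha}\ \sum_{i} \alpha_i - \tfrac{1}{2} \sum_{i,j} \alpha_i \alpha_j\, y_i y_j\, k(x_i, x_j)
\quad \text{s.t.}\quad 0 \le \alpha_i \le C,\ \ \sum_i \alpha_i y_i = 0
% (4) Sparseness: at the optimum, \alpha_i = 0 for every example outside the margin,
%     so f(x) = \sum_i \alpha_i y_i k(x_i, x) + b involves only the support vectors.
```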

Certain aspects of SVMs have already been described in Vapnik's monograph entitled Statistical Learning Theory (Wiley, 1998). However, that book is highly mathematical, which makes it intimidating to an average reader; thus, it is not an ideal introductory textbook. Because of the popularity of SVMs in recent years, there is a need for a gentle introduction to the subject that is suitable for the average reader.

The book by Cristianini and Shawe-Taylor successfully fulfills the demand for such a gentle introduction. It is structured as a self-contained textbook for machine learning students and practitioners. However, the book is also a valuable reference for active researchers working in this field. A useful aspect of the book is that it contains a literature survey and discussion section at the end of every chapter. Pointers to many of the references can also be found on a dedicated web site.1 The discussion broadens the reader's view of relevant topics, and the references point the reader to some important recent research works. Because the authors of the book are active researchers in the field who have made tremendous contributions themselves, the information chosen in the book reflects their unique, often insightful perspectives.


The book contains eight chapters, which cover topics including the mathematical foundation of kernels, Vapnik-Chervonenkis-style learning theory, duality in mathematical programming, SVM formulations, implementation issues, and successful SVM applications. Although these topics are not comprehensive, they are carefully chosen. Some more advanced materials are left out or barely mentioned in the book. Often, these advanced topics are still under active research and, thus, are not mature enough to be included in an introductory text. Among the missing topics I consider important are (1) approximation properties of kernel representation, (2) non-Vapnik-Chervonenkis-style and non-margin-style theoretical analysis, (3) the relationship between SVM and regularization methods in statistics and numeric mathematics, (4) the impact of different regularization conditions and the impact of different loss terms, (5) direct-kernel formulations in the primal form (these formulations have some advantages over the dual-kernel formulations considered in the book), and (6) sparseness properties of the dual-kernel representation in an SVM (this aspect of SVM is quite important and relates to many ideas that existed before its introduction).

The book starts with a brief introduction to different learning models and then focuses on linear learning machines using perceptrons and least squares regression. Many other linear learning machines are not mentioned in the book, probably because they are less relevant to dual-kernel methods. The concept of kernel-induced feature space is developed in a subsequent chapter. The authors include some novel and very interesting kernel examples, such as a string kernel that induces a feature space consisting of all possible subsequences in a string. The main advantage of kernel-induced feature spaces is the ability to represent nonlinear functions linearly in a space spanned by the kernel functions. Although a kernel representation can naturally be obtained from the dual form of SVMs as presented in the book, it can also be formulated directly in a primal-form learning machine.
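To illustrate the string-kernel idea, here is a toy sketch of my own (not code from the book; the function names are mine, and practical string kernels evaluate the same quantity by gap-weighted dynamic programming rather than explicit enumeration):

```python
from itertools import combinations
from collections import Counter

def subsequence_features(s: str, p: int) -> Counter:
    """Explicit feature map: counts of all length-p (possibly
    non-contiguous) subsequences of s -- the feature space that
    a subsequence string kernel induces."""
    return Counter("".join(chars) for chars in combinations(s, p))

def string_kernel(s: str, t: str, p: int = 2) -> int:
    """Inner product of the two feature vectors, computed naively.
    Enumeration is exponential in p; real implementations compute
    the same kernel with dynamic programming."""
    phi_s, phi_t = subsequence_features(s, p), subsequence_features(t, p)
    return sum(n * phi_t[sub] for sub, n in phi_s.items())

if __name__ == "__main__":
    # "cat" and "car" share exactly one length-2 subsequence: "ca".
    print(string_kernel("cat", "car", p=2))  # -> 1
```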

The book has a nicely written chapter on the learning theory aspect of SVMs, mainly from the Vapnik-Chervonenkis analysis and the maximal margin point of view. The authors have made substantial contributions to the theoretical development of the field, and the material selection in this chapter reflects their perspective on the problem. These theoretical results are later used to motivate some specific SVM formulations considered in the book. As acknowledged by the authors, this approach only leads to learning methods that reflect what is given in the theoretical bounds, which might not be optimal. In addition, for computational reasons, the later proposed formulations contain some heuristics not directly reflected in the theoretical results. For these reasons, the theory itself does not justify SVMs as a method superior to other standard statistical methods, such as logistic regression for classification or least squares for regression. In fact, one might start with non-SVM formulations and develop similar theoretical results.
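For readers unfamiliar with the flavor of these results: margin bounds of the kind this chapter develops say, very loosely (my paraphrase, with constants and logarithmic factors omitted; not a theorem quoted from the book), that if a classifier separates m training points lying in a ball of radius R with margin gamma, then with high probability its generalization error is at most on the order of

```latex
\mathrm{err}(f) \;\lesssim\; \frac{R^2 / \gamma^2}{m}
```

The crucial point, which motivates the maximal-margin formulation, is that the bound depends on the margin rather than on the dimension of the feature space.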

Following the theoretical analysis chapter, the book gives an introduction to optimization and Lagrangian duality theory, which is an essential component of support vector learning. Motivated by the theoretical bounds given earlier, a number of specific SVM formulations are presented. By using the Lagrangian duality theory, these SVMs are transformed into dual forms using kernel representation. Some implementation issues that are useful for practitioners are carefully discussed. Although this book mainly considers the standard approach of obtaining the dual-kernel representation using Lagrangian duality, we can also use kernel representation directly in primal-form linear learning machines. The latter approach does not require the Lagrangian duality theory and can be made either equivalent or non-equivalent to the dual-kernel formulation.
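A sketch of this primal alternative (my notation, illustrating the reviewer's point rather than a formulation from the book): instead of deriving the kernel expansion through the dual, one can posit it directly and optimize the coefficients in the primal,

```latex
f(x) = \sum_{i=1}^{m} \alpha_i\, k(x_i, x),
\qquad
\min_{\alpha \in \mathbb{R}^m}\ \sum_{i=1}^{m} L\big(y_i, f(x_i)\big) + \lambda\, \alpha^\top K \alpha
```

where K is the kernel Gram matrix with entries K_{ij} = k(x_i, x_j) and L is any convex loss. With the hinge loss this recovers an SVM-like machine without invoking duality; with other losses it yields primal kernel machines that are not equivalent to the dual formulation.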

Although this book does not include all aspects of learning methods related to SVMs, it achieves the right balance of topic selection, clarity, and mathematical rigor as an introductory text. The topics covered in the book are the most fundamental and important ones in support vector learning and related kernel methods. This book is essential both for practitioners who want to quickly learn some implementation tricks to apply SVMs to real problems and for theoretically oriented readers who want to understand the mathematical foundations of the underlying theory. It is also a handy reference book for researchers working in the field.

    Note

    1. www.support-vector.net.

Tong Zhang received a B.A. in mathematics and computer science from Cornell University in 1994 and a Ph.D. in computer science from Stanford University in 1998. Since 1998, he has been with IBM Research, T. J. Watson Research Center, Yorktown Heights, New York, where he is now a research staff member in the Mathematical Sciences Department. His research interests include machine learning, numeric algorithms, and their applications. His e-mail address is [email protected].
