edited by j.g. carbonell and j. siekmann - home - …978-3-540-74976-9/1.pdf · edited by j.g....

20
Lecture Notes in Artif icial Intelligence 4702 Edited by J. G. Carbonell and J. Siekmann Subseries of Lecture Notes in Computer Science

Upload: ngotu

Post on 23-Sep-2018

217 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

Lecture Notes in Artificial Intelligence 4702

Edited by J. G. Carbonell and J. Siekmann

Subseries of Lecture Notes in Computer Science

Page 2: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

Joost N. Kok Jacek Koronacki

Ramon Lopez de Mantaras Stan Matwin

Dunja Mladenic Andrzej Skowron (Eds.)

Knowledge Discoveryin Databases:PKDD 2007

11th European Conference on Principles and Practiceof Knowledge Discovery in DatabasesWarsaw, Poland, September 17-21, 2007Proceedings

13

Page 3: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

Series Editors

Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USAJörg Siekmann, University of Saarland, Saarbrücken, Germany

Volume Editors

Joost N. KokLeiden University, The NetherlandsE-mail: [email protected]

Jacek KoronackiPolish Academy of Sciences, Warsaw, PolandE-mail: [email protected]

Ramon Lopez de MantarasSpanish National Research Council (CSIC), Bellaterra, SpainE-mail: [email protected]

Stan MatwinUniversity of Ottawa, Ontario, CanadaE-mail: [email protected]

Dunja MladenicJožef Stefan Institute, Ljubljana, SloveniaE-mail: [email protected]

Andrzej SkowronWarsaw University, PolandE-mail: [email protected]

Library of Congress Control Number: 2007934762

CR Subject Classification (1998): I.2, H.2, J.1, H.3, G.3, I.7, F.4.1

LNCS Sublibrary: SL 7 – Artificial Intelligence

ISSN 0302-9743

ISBN-10 3-540-74975-6 Springer Berlin Heidelberg New York

ISBN-13 978-3-540-74975-2 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material isconcerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publicationor parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,in its current version, and permission for use must always be obtained from Springer. Violations are liableto prosecution under the German Copyright Law.

Springer is a part of Springer Science+Business Media

springer.com

© Springer-Verlag Berlin Heidelberg 2007Printed in Germany

Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, IndiaPrinted on acid-free paper SPIN: 12124534 06/3180 5 4 3 2 1 0

Page 4: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

Preface

The two premier annual European conferences in the areas of machine learningand data mining have been collocated ever since the first joint conference inFreiburg, 2001. The European Conference on Machine Learning (ECML) tracesits origins to 1986, when the first European Working Session on Learning washeld in Orsay, France. The European Conference on Principles and Practice ofKnowledge Discovery in Databases (PKDD) was first held in 1997 in Trondheim,Norway. Over the years, the ECML/PKDD series has evolved into one of thelargest and most selective international conferences in machine learning anddata mining. In 2007, the seventh collocated ECML/PKDD took place duringSeptember 17–21 on the central campus of Warsaw University and in the nearbyStaszic Palace of the Polish Academy of Sciences.

The conference for the third time used a hierarchical reviewing process. Wenominated 30 Area Chairs, each of them responsible for one sub-field or severalclosely related research topics. Suitable areas were selected on the basis of thesubmission statistics for ECML/PKDD 2006 and for last year’s InternationalConference on Machine Learning (ICML 2006) to ensure a proper load balanceamong the Area Chairs. A joint Program Committee (PC) was nominated for thetwo conferences, consisting of some 300 renowned researchers, mostly proposedby the Area Chairs. This joint PC, the largest of the series to date, allowed usto exploit synergies and deal competently with topic overlaps between ECMLand PKDD.

ECML/PKDD 2007 received 592 abstract submissions. As in previous years,to assist the reviewers and the Area Chairs in their final recommendation authorshad the opportunity to communicate their feedback after the reviewing phase.For a small number of conditionally accepted papers, authors were asked tocarry out minor revisions subject to the final acceptance by the Area Chairresponsible for their submission. With very few exceptions, every full submissionwas reviewed by three PC members. Based on these reviews, on feedback fromthe authors, and on discussions among the reviewers, the Area Chairs provideda recommendation for each paper. The four Program Chairs made the finalprogram decisions following a 2-day meeting in Warsaw in June 2007. Continuingthe tradition of previous events in the series, we accepted full papers with anoral presentation and short papers with a poster presentation. We selected 41 fullpapers and 37 short papers for ECML, and 28 full papers and 35 short papers forPKDD. The acceptance rate for full papers is 11.6% and the overall acceptancerate is 23.8%, in accordance with the high-quality standards of the conferenceseries. Besides the paper and poster sessions, ECML/PKDD 2007 also featured12 workshops, seven tutorials, the ECML/PKDD Discovery Challenge, and theIndustrial Track.

Page 5: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

VI Preface

An excellent slate of Invited Speakers is another strong point of the conferenceprogram. We are grateful to Ricardo Bazea-Yates (Yahoo! Research Barcelona),Peter Flach (University of Bristol), Tom Mitchell (Carnegie Mellon Universi-ty), and Barry Smyth (University College Dublin) for their participation inECML/PKDD 2007. The abstracts of their presentations are included in thisvolume.

We distinguished four outstanding contributions; the awards were generouslysponsored by the Machine Learning Journal and the KD-Ubiq network.

ECML Best Paper: Angela Kimming, Luc De Raedt and Hannu Toivonen:“Probabilistic Explanation-Based Learning”

PKDD Best Paper: Toon Calders and Szymon Jaroszewicz: “Efficient AUC-Optimization for Classification”

ECML Best Student Paper: Daria Sorokina, Rich Caruana, and Mirek Rie-dewald: “Additive Groves of Regression Trees”

PKDD Best Student Paper: Dikan Xing, Wenyuan Dai, Gui-Rong Xue, andYong Yu: “Bridged Refinement for Transfer Learning”

This year we introduced the Industrial Track chaired by Florence d’Alche-Buc(Universite d’Evry-Val d’Essonne) and Marko Grobelnik (Jozef Stefan Institute,Slovenia) consisting of selected talks with a strong industrial component presen-ting research from the area covered by the ECML/PKDD conference.

For the first time in the history of ECML/PKDD, the conference procee-dings were available on-line to conference participants during the conference.We are grateful to Springer for accommodating this new access channel for theproceedings. Inspired by some related conferences (ICML, KDD, ISWC) we in-troduced videorecording, as we would like to save at least the invited talks andpresentations of award papers for the community and make them accessible athttp://videolectures.net/.

This year’s Discovery Challenge was devoted to three problems: user beha-vior prediction from Web traffic logs, HTTP traffic classification, and Sumerianliterature understanding. The Challenge was co-organized by Piotr Ejdys (Gemi-us SA), Hung Son Nguyen (Warsaw University), Pascal Poncelet (EMA-LGI2P)and Jerzy Tyszkiewicz (Warsaw University); 122 teams participated. For thefirst task, the three finalists were:

Malik Tahir Hassan, Khurum Nazir Junejo and Asim Karim from Lahore Uni-versity, Pakistan

Krzysztof Dembczynski and Wojciech Kot�lowski from Poznan University ofTechnology, Poland and Marcin Sydow from Polish-Japanese Institute ofInformation Technology, Poland

Tung-Ying Lee from National Tsing Hua University, Taiwan

Results for the other Discovery Challenge tasks were not available at the timethe proceedings were finalized, but were announced at the conference.

We are all indebted to the Area Chairs, Program Committee members andexternal reviewers for their commitment and hard work that resulted in a rich

Page 6: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

Preface VII

but selective scientific program for ECML/PKDD. We are particularly gratefulto those reviewers who helped with additional reviews at very short notice toassist us in a small number of difficult decisions. We further thank our Workshopand Tutorial Chairs Marzena Kryszkiewicz (Warsaw Technical University) andJan Rauch (University of Economics, Prague) for selecting and coordinating the12 workshops and seven tutorial events that accompanied the conference; theworkshop organizers, tutorial presenters, and the organizers of the DiscoveryChallenge and the Industrial track; Richard van de Stadt and CyberChairPROfor competent and flexible support; Warsaw University and the Polish Academyof Sciences (Institute of Computer Science) for their local and organizationalsupport. Special thanks are due to the Local Chair, Marcin Szczuka, WarsawUniversity (assisted by Michal Ciesio�lka from the Polish Academy of Sciences)for the many hours spent making sure that all the details came together to ensurethe success of the conference. Finally, we are grateful to the Steering Committeeand the ECML/PKDD community that entrusted us with the organization ofthe ECML/PKDD 2007.

Most of all, however, we would like to thank all the authors who trusted uswith their submissions, thereby contributing to the one of the main yearly eventsin the life of our vibrant research community.

September 2007 Joost Kok (PKDD Program co-Chair)Jacek Koronacki (General Chair)

Ramon Lopez de Mantaras (General Chair)Stan Matwin (ECML Program co-Chair)

Dunja Mladenic (ECML Program co-Chair)Andrzej Skowron (PKDD Program co-Chair)

Page 7: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

Organization

General Chairs

Ramon Lopez de Mantaras (Spanish Council for Scientific Research)Jacek Koronacki (Polish Academy of Sciences)

Program Chairs

Joost N. Kok (Leiden University)Stan Matwin (University of Ottawa and Polish Academy of Sciences)Dunja Mladenic (Jozef Stefan Institute)Andrzej Skowron (Warsaw University)

Local Chairs

Micha�l Ciesio�lka (Polish Academy of Sciences)Marcin Szczuka (Warsaw University)

Tutorial Chair

Jan Rauch (University of Economics, Prague)

Workshop Chair

Marzena Kryszkiewicz (Warsaw University of Technology)

Discovery Challenge Chair

Hung Son Nguyen (Warsaw University)

Industrial Track Chairs

Florence d’Alche-Buc (Universite d’Evry-Val d’Essonne)Marko Grobelnik (Jozef Stefan Institute)

Page 8: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

X Organization

Steering Committee

Jean-Francois Boulicaut Pavel BrazdilRui Camacho Floriana EspositoJohannes Furnkranz Joao GamaFosca Gianotti Alıpio JorgeDino Pedreschi Tobias SchefferMyra Spiliopoulou Luıs Torgo

Area Chairs

Michael R. Berthold Hendrik BlockeelOlivier Chapelle James CussensKurt Driessens Peter FlachEibe Frank Johannes FurnkranzThomas Gartner Joao GamaRayid Ghani Jerzy Grzymala-BusseEamonn Keogh Kristian KerstingMieczys�law A. K�lopotek Stefan KramerPedro Larranaga Claire NedellecAndreas Nurnberger George PaliourasBernhard Pfahringer Enric PlazaLuc De Raedt Tobias SchefferGiovanni Semeraro W�ladys�law SkarbekMyra Spiliopoulou Hannu ToivonenLuıs Torgo Paul Utgoff

Program Committee

Charu C. AggarwalJesus Aguilar-RuizDavid W. AhaNahla Ben AmorSarabjot Singh AnandAnnalisa AppiceJosep-Lluis ArcosWalid G. ArefEva ArmengolAnthony J. BagnallAntonio BahamondeSugato BasuBettina BerendtFrancesco BergadanoRalph BergmannSteffen Bickel

Concha BielzaMikhail BilenkoFrancesco BonchiGianluca BontempiChristian BorgeltKarsten M. BorgwardtDaniel BorrajoAntal van den BoschHenrik BostromMarco BottaJean-Francois BoulicautJanez BrankThorsten BrantsUlf BrefeldCarla E. BrodleyPaul Buitelaar

Page 9: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

Organization XI

Toon CaldersLuis M. de CamposNicola CanceddaClaudio CarpinetoJesus CerquidesKaushik ChakrabartiChien-Chung ChanAmanda ClareIra CohenFabrizio CostaSusan CrawBruno CremilleuxTom CroonenborghsJuan Carlos CuberoPadraig CunninghamAndrzej CzyzewskiWalter DaelemansIan DavidsonMarco DegemmisOlivier DelalleauJitender S. DeogunMarcin DetynieckiBelen Diaz-AgudoChris H.Q. DingCarlotta DomeniconiMarek J. DruzdzelSaso DzeroskiTina Eliassi-RadTapio ElomaaAbolfazl Fazel FamiliWei FanAd FeeldersAlan FernGeorge FormanLinda C. van der GaagPatrick GallinariJose A. GamezAlex GammermanMinos N. GarofalakisGemma C. GarrigaEric GaussierPierre GeurtsFosca GianottiAttilio GiordanaRobert L. Givan

Bart GoethalsElisabet GolobardesPedro A. Gonzalez-CaleroMarko GrobelnikDimitrios GunopulosMaria HalkidiMark HallMatthias HeinJose Hernandez-OralloColin de la HigueraMelanie HilarioShoji HiranoTu-Bao HoJaakko HollmenGeoffrey HolmesFrank HoppnerTamas HorvathAndreas HothoJiayuan HuangEyke HullemeierMasahiro InuiguchiInaki InzaManfred JaegerSzymon JaroszewiczRosie JonesEdwin D. de JongAlıpio Mario JorgeTamer KahveciAlexandros KalousisHillol KarguptaAndreas KarwathGeorge KarypisSamuel KaskiDimitar KazakovRoss D. KingFrank KlawonnRalf KlinkenbergGeorge KolliosIgor KononenkoBozena KostekWalter A. KostersMiroslav KubatHalina KwasnickaJames T. KwokNicolas Lachiche

Page 10: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

XII Organization

Michail G. LagoudakisNiels LandwehrPedro LarranagaPavel LaskovMark LastDominique LaurentNada LavracQuoc V. LeGuy LebanonUlf LeserJure LeskovecJessica LinFrancesca A. LisiPasquale LopsJose A. LozanoPeter LucasRichard MaclinDonato MalerbaNikos MamoulisSuresh ManandharStephane Marchand-MailletElena MarchioriLluis MarquezYuji MatsumotoMichael MayMike MayoThorsten MeinlPrem MelvilleRosa MeoTaneli MielikainenBamshad MobasherSerafın MoralKatharina MorikHiroshi MotodaToshinori MunakataIon MusleaOlfa NasraouiJennifer NevilleSiegfried NijssenJoakim NivreAnn NoweArlindo L. OliveiraSanti OntanonMiles OsborneMartijn van Otterlo

David PageSpiros PapadimitriouSrinivasan ParthasarathyAndrea PasseriniJose M. PenaLourdes Pena CastilloJose M. Pena SanchezJames F. PetersJohann PetrakLech PolkowskiHan La PoutrePhilippe PreuxKatharina ProbstTapani RaikoAshwin RamSheela RamannaJan RamonZbigniew W. RasChotirat Ann RatanamahatanaFrancesco RicciJohn RiedlChristophe RigottiCeline RobardetVictor RoblesMarko Robnik-SikonjaJuho RousuCeline RouveirolUlrich Ruckert (TU Munchen)Ulrich Ruckert (Univ. Paderborn)Stefan RupingHenryk RybinskiLorenza SaittaHiroshi SakaiRoberto SantanaMartin ScholzMatthias SchubertMichele SebagSandip SenJouni K. SeppanenGalit ShmueliArno SiebesAlejandro SierraVikas SindhwaniArul SiromoneyDominik Slezak

Page 11: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

Organization XIII

Carlos SoaresMaarten van SomerenAlvaro SotoAlessandro SperdutiJaideep SrivastavaJerzy StefanowskiDavid J. StracuzziJan StruyfGerd StummeZbigniew SurajEinoshin SuzukiRoman SwiniarskiMarcin SydowPiotr SynakMarcin SzczukaLuis TalaveraMatthew E. TaylorYannis TheodoridisKai Ming TingLjupco TodorovskiVolker TrespShusaku TsumotoKarl TuylsMichalis VazirgiannisKatja VerbeeckJean-Philippe Vert

Michail VlachosHaixun WangJason Tsong-Li WangTakashi WashioGary M. WeissSholom M. WeissShimon WhitesonMarco WieringSlawomir T. WierzchonGraham J. WilliamsStefan WrobelYing YangJingTao YaoYiyu YaoFrancois YvonBianca ZadroznyMohammed J. ZakiGerson ZaveruchaFilip ZeleznyChengXiang ZhaiYi ZhangZhi-Hua ZhouJerry ZhuWojciech ZiarkoAlbrecht Zimmermann

Additional Reviewers

Rezwan AhmedFabio AiolliDima AlbergVassilis AthitsosMaurizio AtzoriAnne AugerPaulo AzevedoPierpaolo BasileMargherita BerardiAndre BergholzMichele BerlingerioKanishka BhaduriKonstantin BiatovJerzy B�laszczynskiGianluca BontempiYann-ael Le Borgne

Zoran BosnicRemco BouckaertAgnes BraudBjoern BringmannEmma ByrneOlivier CaelenRossella CancelliereGiovanna CastellanoMichelangelo CeciHyuk ChoKamalika DasSouptik DattaUwe DickLaura DietzMarcos DominguesHaimonti Dutta

Page 12: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

XIV Organization

Marc DymetmanStefan EickelerTimm EulerTanja FalkowskiFernando FernandezFrancisco J. Ferrer-TroyanoCesar FerriDaan FierensBlaz FortunaAlexandre FranciscoMingyan GaoFabian GuizaAnna Lisa GentileAmol N. GhotingArnaud GiacomettiValentin GjorgjioskiRobby GoetschalckxDerek GreenePerry GrootPhilip GrothDaniele GunettiBernd GutmannSattar HashemiYann-Michael De HauwereVera HollinkYi HuangLeo IaquintaAlexander IlinTasadduq ImamTao-Yuan JenFelix JungermannAndrzej KaczmarekBenjamin Haibe KainsJuha KarkkainenRohit KateChris KauffmanArto KlamiJiri KlemaDragi KocevChristine KoernerKevin KontosPetra KraljAnita KrishnakumarMatjaz KukarBrian Kulis

Arnd Christian KonigChristine KornerFei Tony LiuAntonio LaTorreAnne LaurentBaoli LiZi LinBin LiuYan LiuCorrado LoglisciRachel LomaskyCarina LopesChuan LuPierre MaheMarkus MaierGiuseppe MancoIrina MatveevaNicola Di MauroDimitrios MavroeidisStijn MeganckIngo MierswaMirjam MinorAbhilash Alexander MirandaJoao MoreiraSourav MukherjeeCanh Hao NguyenDuc Dung NguyenTuan Trung NguyenJanne NikkilaXia NingBlaz NovakIrene NtoutsiRiccardo OrtaleStanis�law OsinskiKivanc OzonatAline PaesPance PanovThomas Brochmann PedersenMaarten PeetersRuggero PensaXuan-Hieu PhanBenjarath PhoophakdeeAloisio Carlos de PinaChristian PlagemannJose M. Puerta

Page 13: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

Organization XV

Aritz PerezChedy RaissiM. Jose Ramirez-QuintanaUmaa RebbapragadaStefan ReckowChiara RensoMatthias RenzFrancois RioultDomingo Rodriguez-BaenaSten SagaertLuka SajnEsin SakaSaeed SalemAntonio SalmeronEerika SaviaAnton SchaeferLeander SchietgatGaetano SciosciaHoward ScordioSven Van SegbroeckIvica SlavkovLarisa SoldatovaArnaud SouletEduardo SpynosaVolkmar SterzingChristof StoermannJiang SuPiotr Szczuko

Alexander TartakovskiOlivier TeytaudMarisa ThomaEufemia TinelliIvan TitovRoberto TrasartiGeorge TsatsaronisKatharina TschumitschewDuygu UcarAntonio VarlaroShankar VembuCeline VensMarcos VieiraPeter VrancxNikil WaleChao WangDongrong WenArkadiusz WojnaYuk Wah WongAdam WoznicaMichael WurstWei XuXintian YangMonika ZakovaLuke ZettlemoyerXueyuan ZhouAlbrecht Zimmermann

Page 14: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

Sponsors

We wish to express our gratitude to the sponsors of ECML/PKDD 2007 fortheir essential contribution to the conference. We wish to thank Warsaw Uni-versity, Faculty of Mathematics, Informatics and Mechanics, and Institute ofComputer Science, Polish Academy of Sciences for providing financial and orga-nizational means for the conference; the European Office of Aerospace Researchand Developement, Air Force Office of Scientific Research, United States AirForce Research Laboratory, for their generous financial support.1 KDUbiq Eu-ropean Coordination Action for supporting Poster Reception, Student TravelAwards, and the Best Paper Awards; Pascal European Network of Excellencefor sponsoring the Invited Speaker Program, the Industrial Track and the video-recording of the invited talks and presentations of the four Award Papers; JozefStefan Institute, Slovenia, SEKT European Integrated project and Unilever R& D for their financial support; the Machine Learning Journal for supportingthe Student Best Paper Awards; Gemius S.A. for sponsoring and supportingthe Discovery Challenge. We also wish to express our gratitude to the followingcompanies and institutions that provided us with data and expertise which wereessential components of the Discovery Challenge: Bee Ware, l’Ecole des Minesd’Ales, LIRMM - The Montpellier Laboratory of Computer Science, Robotics,and Microelectronics, and Warsaw University, Faculty of Mathematics, Informa-tics and Mechanics. We also acknowledge the support of LOT Polish Airlines.

1 AFOSR/EOARD support is not intended to express or imply endorsement by theU.S. Federal Government.

Page 15: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

Table of Contents

Invited Talks

Learning, Information Extraction and the Web . . . . . . . . . . . . . . . . . . . . . . 1Tom M. Mitchell

Putting Things in Order: On the Fundamental Role of Ranking inClassification and Probability Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 2

Peter A. Flach

Mining Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4Ricardo Baeza-Yates

Adventures in Personalized Information Access . . . . . . . . . . . . . . . . . . . . . . 5Barry Smyth

Long Papers

Experiment Databases: Towards an Improved ExperimentalMethodology in Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6

Hendrik Blockeel and Joaquin Vanschoren

Using the Web to Reduce Data Sparseness in Pattern-BasedInformation Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

Sebastian Blohm and Philipp Cimiano

A Graphical Model for Content Based Image Suggestion and FeatureSelection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

Sabri Boutemedjet, Djemel Ziou, and Nizar Bouguila

Efficient AUC Optimization for Classification . . . . . . . . . . . . . . . . . . . . . . . . 42Toon Calders and Szymon Jaroszewicz

Finding Transport Proteins in a General Protein Database . . . . . . . . . . . . 54Sanmay Das, Milton H. Saier Jr., and Charles Elkan

Classification of Web Documents Using a Graph-Based Model andStructural Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

Andrzej Dominik, Zbigniew Walczak, and Jacek Wojciechowski

Context-Specific Independence Mixture Modelling for ProteinFamilies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

Benjamin Georgi, Jorg Schultz, and Alexander Schliep

Page 16: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

XX Table of Contents

An Algorithm to Find Overlapping Community Structure inNetworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91

Steve Gregory

Privacy Preserving Market Basket Data Analysis . . . . . . . . . . . . . . . . . . . . . 103Ling Guo, Songtao Guo, and Xintao Wu

Feature Extraction from Sensor Data Streams for Real-Time HumanBehaviour Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115

Julia Hunter and Martin Colley

Generating Social Network Features for Link-Based Classification . . . . . . 127Jun Karamon, Yutaka Matsuo, Hikaru Yamamoto, andMitsuru Ishizuka

An Empirical Comparison of Exact Nearest Neighbour Algorithms . . . . . 140Ashraf M. Kibriya and Eibe Frank

Site-Independent Template-Block Detection . . . . . . . . . . . . . . . . . . . . . . . . . 152Aleksander Ko�lcz and Wen-tau Yih

Statistical Model for Rough Set Approach to MulticriteriaClassification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

Krzysztof Dembczynski, Salvatore Greco, Wojciech Kot�lowski, andRoman S�lowinski

Classification of Anti-learnable Biological and Synthetic Data . . . . . . . . . . 176Adam Kowalczyk

Improved Algorithms for Univariate Discretization of ContinuousFeatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188

Jussi Kujala and Tapio Elomaa

Efficient Weight Learning for Markov Logic Networks . . . . . . . . . . . . . . . . . 200Daniel Lowd and Pedro Domingos

Classification in Very High Dimensional Problems with Handfuls ofExamples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212

Mark Palatucci and Tom M. Mitchell

Domain Adaptation of Conditional Probability Models Via FeatureSubsetting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224

Sandeepkumar Satpal and Sunita Sarawagi

Learning to Detect Adverse Traffic Events from Noisily Labeled Data . . . 236Tomas Singliar and Milos Hauskrecht

IKNN: Informative K-Nearest Neighbor Pattern Classification . . . . . . . . . 248Yang Song, Jian Huang, Ding Zhou, Hongyuan Zha, and C. Lee Giles

Page 17: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

Table of Contents XXI

Finding Outlying Items inSets of Partial Rankings . . . . . . . . . . . . . . . . . . . 265Antti Ukkonen and Heikki Mannila

Speeding Up Feature Subset Selection Through Mutual InformationRelevance Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277

Gert Van Dijck and Marc M. Van Hulle

A Comparison of Two Approaches to Classify with GuaranteedPerformance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288

Stijn Vanderlooy and Ida G. Sprinkhuizen-Kuyper

Towards Data Mining Without Information on Knowledge Structure . . . 300Alexandre Vautier, Marie-Odile Cordier, and Rene Quiniou

Relaxation Labeling for Selecting and Exploiting Efficiently Non-localDependencies in Sequence Labeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312

Guillaume Wisniewski and Patrick Gallinari

Bridged Refinement for Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 324Dikan Xing, Wenyuan Dai, Gui-Rong Xue, and Yong Yu

A Prediction-Based Visual Approach for Cluster Exploration andCluster Validation by HOV3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336

Ke-Bing Zhang, Mehmet A. Orgun, and Kang Zhang

Short Papers

Flexible Grid-Based Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350Marc-Ismael Akodjenou-Jeannin, Kave Salamatian, andPatrick Gallinari

Polyp Detection in Endoscopic Video Using SVMs . . . . . . . . . . . . . . . . . . . 358Luıs A. Alexandre, Joao Casteleiro, and Nuno Nobre

A Density-Biased Sampling Technique to Improve ClusterRepresentativeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366

Ana Paula Appel, Adriano Arantes Paterlini, Elaine P.M. de Sousa,Agma J.M. Traina, and Caetano Traina Jr.

Expectation Propagation for Rating Players in Sports Competitions . . . . 374Adriana Birlutiu and Tom Heskes

Efficient Closed Pattern Mining in Strongly Accessible Set Systems(Extended Abstract) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382

Mario Boley, Tamas Horvath, Axel Poigne, and Stefan Wrobel

Discovering Emerging Patterns in Spatial Databases: A Multi-relationalApproach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390

Michelangelo Ceci, Annalisa Appice, and Donato Malerba

Page 18: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

XXII Table of Contents

Realistic Synthetic Data for Testing Association Rule MiningAlgorithms for Market Basket Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398

Colin Cooper and Michele Zito

Learning Multi-dimensional Functions: Gas Turbine Engine Modeling . . . 406Chris Drummond

Constructing High Dimensional Feature Space for Time SeriesClassification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414

Victor Eruhimov, Vladimir Martyanov, and Eugene Tuv

A Dynamic Clustering Algorithm for Mobile Objects . . . . . . . . . . . . . . . . . 422Dominique Fournier, Gaele Simon, and Bruno Mermet

A Method for Multi-relational Classification Using Single andMulti-feature Aggregation Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430

Richard Frank, Flavia Moser, and Martin Ester

MINI: Mining Informative Non-redundant Itemsets . . . . . . . . . . . . . . . . . . . 438Arianna Gallo, Tijl De Bie, and Nello Cristianini

Stream-Based Electricity Load Forecast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446Joao Gama and Pedro Pereira Rodrigues

Automatic Hidden Web Database Classification . . . . . . . . . . . . . . . . . . . . . . 454Zhiguo Gong, Jingbai Zhang, and Qian Liu

Pruning Relations for Substructure Discovery of Multi-relationalDatabases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462

Hongyu Guo, Herna L. Viktor, and Eric Paquet

The Most Reliable Subgraph Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471Petteri Hintsanen

Matching Partitions over Time to Reliably Capture Local Clusters inNoisy Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479

Frank Hoppner and Mirko Bottcher

Searching for Better Randomized Response Schemes forPrivacy-Preserving Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487

Zhengli Huang, Wenliang Du, and Zhouxuan Teng

Pre-processing Large Spatial Data Sets with Bayesian Methods . . . . . . . . 498Saara Hyvonen, Esa Junttila, and Marko Salmenkivi

Tag Recommendations in Folksonomies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506Robert Jaschke, Leandro Marinho, Andreas Hotho,Lars Schmidt-Thieme, and Gerd Stumme

Page 19: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

Table of Contents XXIII

Providing Naıve Bayesian Classifier-Based Private Recommendationson Partitioned Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515

Cihan Kaleli and Huseyin Polat

Multi-party, Privacy-Preserving Distributed Data Mining Using aGame Theoretic Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523

Hillol Kargupta, Kamalika Das, and Kun Liu

Multilevel Conditional Fuzzy C-Means Clustering of XML Documents . . 532Michal Kozielski

Uncovering Fraud in Direct Marketing Data with a Fraud AuditingCase Builder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540

Fletcher Lu

Real Time GPU-Based Fuzzy ART Skin Recognition . . . . . . . . . . . . . . . . . 548Mario Martınez-Zarzuela, Francisco Javier Dıaz Pernas,David Gonzalez Ortega, Jose Fernando Dıez Higuera, andMıriam Anton Rodrıguez

A Cooperative Game Theoretic Approach to Prototype Selection . . . . . . . 556Narayanam Rama Suri, V. Santosh Srinivas, andM. Narasimha Murty

Dynamic Bayesian Networks for Real-Time Classification of SeismicSignals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565

Carsten Riggelsen, Matthias Ohrnberger, and Frank Scherbaum

Robust Visual Mining of Data with Error Information . . . . . . . . . . . . . . . . 573Jianyong Sun, Ata Kaban, and Somak Raychaudhury

An Effective Approach to Enhance Centroid Classifier for TextCategorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581

Songbo Tan and Xueqi Cheng

Automatic Categorization of Human-Coded and Evolved CoreWarWarriors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589

Nenad Tomasev, Doni Pracner, Milos Radovanovic, andMirjana Ivanovic

Utility-Based Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597Luis Torgo and Rita Ribeiro

Multi-label Lazy Associative Classification . . . . . . . . . . . . . . . . . . . . . . . . . . 605Adriano Veloso, Wagner Meira Jr., Marcos Goncalves, andMohammed Zaki

Visual Exploration of Genomic Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613Michail Vlachos, Bahar Taneri, Eamonn Keogh, and Philip S. Yu

Page 20: Edited by J.G. Carbonell and J. Siekmann - Home - …978-3-540-74976-9/1.pdf · Edited by J.G. Carbonell and J. Siekmann ... We are grateful to Ricardo Bazea-Yates (Yahoo! ... Jes´us

XXIV Table of Contents

Association Mining in Large Databases: A Re-examination of ItsMeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621

Tianyi Wu, Yuguo Chen, and Jiawei Han

Semantic Text Classification of Emergent Disease Reports . . . . . . . . . . . . . 629Yi Zhang and Bing Liu

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639