edited by j.g. carbonell and j. siekmann - home - …978-3-540-74976-9/1.pdf · edited by j.g....
TRANSCRIPT
Lecture Notes in Artificial Intelligence 4702
Edited by J. G. Carbonell and J. Siekmann
Subseries of Lecture Notes in Computer Science
Joost N. Kok Jacek Koronacki
Ramon Lopez de Mantaras Stan Matwin
Dunja Mladenic Andrzej Skowron (Eds.)
Knowledge Discoveryin Databases:PKDD 2007
11th European Conference on Principles and Practiceof Knowledge Discovery in DatabasesWarsaw, Poland, September 17-21, 2007Proceedings
13
Series Editors
Jaime G. Carbonell, Carnegie Mellon University, Pittsburgh, PA, USAJörg Siekmann, University of Saarland, Saarbrücken, Germany
Volume Editors
Joost N. KokLeiden University, The NetherlandsE-mail: [email protected]
Jacek KoronackiPolish Academy of Sciences, Warsaw, PolandE-mail: [email protected]
Ramon Lopez de MantarasSpanish National Research Council (CSIC), Bellaterra, SpainE-mail: [email protected]
Stan MatwinUniversity of Ottawa, Ontario, CanadaE-mail: [email protected]
Dunja MladenicJožef Stefan Institute, Ljubljana, SloveniaE-mail: [email protected]
Andrzej SkowronWarsaw University, PolandE-mail: [email protected]
Library of Congress Control Number: 2007934762
CR Subject Classification (1998): I.2, H.2, J.1, H.3, G.3, I.7, F.4.1
LNCS Sublibrary: SL 7 – Artificial Intelligence
ISSN 0302-9743
ISBN-10 3-540-74975-6 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-74975-2 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material isconcerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publicationor parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,in its current version, and permission for use must always be obtained from Springer. Violations are liableto prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2007Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, IndiaPrinted on acid-free paper SPIN: 12124534 06/3180 5 4 3 2 1 0
Preface
The two premier annual European conferences in the areas of machine learningand data mining have been collocated ever since the first joint conference inFreiburg, 2001. The European Conference on Machine Learning (ECML) tracesits origins to 1986, when the first European Working Session on Learning washeld in Orsay, France. The European Conference on Principles and Practice ofKnowledge Discovery in Databases (PKDD) was first held in 1997 in Trondheim,Norway. Over the years, the ECML/PKDD series has evolved into one of thelargest and most selective international conferences in machine learning anddata mining. In 2007, the seventh collocated ECML/PKDD took place duringSeptember 17–21 on the central campus of Warsaw University and in the nearbyStaszic Palace of the Polish Academy of Sciences.
The conference for the third time used a hierarchical reviewing process. Wenominated 30 Area Chairs, each of them responsible for one sub-field or severalclosely related research topics. Suitable areas were selected on the basis of thesubmission statistics for ECML/PKDD 2006 and for last year’s InternationalConference on Machine Learning (ICML 2006) to ensure a proper load balanceamong the Area Chairs. A joint Program Committee (PC) was nominated for thetwo conferences, consisting of some 300 renowned researchers, mostly proposedby the Area Chairs. This joint PC, the largest of the series to date, allowed usto exploit synergies and deal competently with topic overlaps between ECMLand PKDD.
ECML/PKDD 2007 received 592 abstract submissions. As in previous years,to assist the reviewers and the Area Chairs in their final recommendation authorshad the opportunity to communicate their feedback after the reviewing phase.For a small number of conditionally accepted papers, authors were asked tocarry out minor revisions subject to the final acceptance by the Area Chairresponsible for their submission. With very few exceptions, every full submissionwas reviewed by three PC members. Based on these reviews, on feedback fromthe authors, and on discussions among the reviewers, the Area Chairs provideda recommendation for each paper. The four Program Chairs made the finalprogram decisions following a 2-day meeting in Warsaw in June 2007. Continuingthe tradition of previous events in the series, we accepted full papers with anoral presentation and short papers with a poster presentation. We selected 41 fullpapers and 37 short papers for ECML, and 28 full papers and 35 short papers forPKDD. The acceptance rate for full papers is 11.6% and the overall acceptancerate is 23.8%, in accordance with the high-quality standards of the conferenceseries. Besides the paper and poster sessions, ECML/PKDD 2007 also featured12 workshops, seven tutorials, the ECML/PKDD Discovery Challenge, and theIndustrial Track.
VI Preface
An excellent slate of Invited Speakers is another strong point of the conferenceprogram. We are grateful to Ricardo Bazea-Yates (Yahoo! Research Barcelona),Peter Flach (University of Bristol), Tom Mitchell (Carnegie Mellon Universi-ty), and Barry Smyth (University College Dublin) for their participation inECML/PKDD 2007. The abstracts of their presentations are included in thisvolume.
We distinguished four outstanding contributions; the awards were generouslysponsored by the Machine Learning Journal and the KD-Ubiq network.
ECML Best Paper: Angela Kimming, Luc De Raedt and Hannu Toivonen:“Probabilistic Explanation-Based Learning”
PKDD Best Paper: Toon Calders and Szymon Jaroszewicz: “Efficient AUC-Optimization for Classification”
ECML Best Student Paper: Daria Sorokina, Rich Caruana, and Mirek Rie-dewald: “Additive Groves of Regression Trees”
PKDD Best Student Paper: Dikan Xing, Wenyuan Dai, Gui-Rong Xue, andYong Yu: “Bridged Refinement for Transfer Learning”
This year we introduced the Industrial Track chaired by Florence d’Alche-Buc(Universite d’Evry-Val d’Essonne) and Marko Grobelnik (Jozef Stefan Institute,Slovenia) consisting of selected talks with a strong industrial component presen-ting research from the area covered by the ECML/PKDD conference.
For the first time in the history of ECML/PKDD, the conference procee-dings were available on-line to conference participants during the conference.We are grateful to Springer for accommodating this new access channel for theproceedings. Inspired by some related conferences (ICML, KDD, ISWC) we in-troduced videorecording, as we would like to save at least the invited talks andpresentations of award papers for the community and make them accessible athttp://videolectures.net/.
This year’s Discovery Challenge was devoted to three problems: user beha-vior prediction from Web traffic logs, HTTP traffic classification, and Sumerianliterature understanding. The Challenge was co-organized by Piotr Ejdys (Gemi-us SA), Hung Son Nguyen (Warsaw University), Pascal Poncelet (EMA-LGI2P)and Jerzy Tyszkiewicz (Warsaw University); 122 teams participated. For thefirst task, the three finalists were:
Malik Tahir Hassan, Khurum Nazir Junejo and Asim Karim from Lahore Uni-versity, Pakistan
Krzysztof Dembczynski and Wojciech Kot�lowski from Poznan University ofTechnology, Poland and Marcin Sydow from Polish-Japanese Institute ofInformation Technology, Poland
Tung-Ying Lee from National Tsing Hua University, Taiwan
Results for the other Discovery Challenge tasks were not available at the timethe proceedings were finalized, but were announced at the conference.
We are all indebted to the Area Chairs, Program Committee members andexternal reviewers for their commitment and hard work that resulted in a rich
Preface VII
but selective scientific program for ECML/PKDD. We are particularly gratefulto those reviewers who helped with additional reviews at very short notice toassist us in a small number of difficult decisions. We further thank our Workshopand Tutorial Chairs Marzena Kryszkiewicz (Warsaw Technical University) andJan Rauch (University of Economics, Prague) for selecting and coordinating the12 workshops and seven tutorial events that accompanied the conference; theworkshop organizers, tutorial presenters, and the organizers of the DiscoveryChallenge and the Industrial track; Richard van de Stadt and CyberChairPROfor competent and flexible support; Warsaw University and the Polish Academyof Sciences (Institute of Computer Science) for their local and organizationalsupport. Special thanks are due to the Local Chair, Marcin Szczuka, WarsawUniversity (assisted by Michal Ciesio�lka from the Polish Academy of Sciences)for the many hours spent making sure that all the details came together to ensurethe success of the conference. Finally, we are grateful to the Steering Committeeand the ECML/PKDD community that entrusted us with the organization ofthe ECML/PKDD 2007.
Most of all, however, we would like to thank all the authors who trusted uswith their submissions, thereby contributing to the one of the main yearly eventsin the life of our vibrant research community.
September 2007 Joost Kok (PKDD Program co-Chair)Jacek Koronacki (General Chair)
Ramon Lopez de Mantaras (General Chair)Stan Matwin (ECML Program co-Chair)
Dunja Mladenic (ECML Program co-Chair)Andrzej Skowron (PKDD Program co-Chair)
Organization
General Chairs
Ramon Lopez de Mantaras (Spanish Council for Scientific Research)Jacek Koronacki (Polish Academy of Sciences)
Program Chairs
Joost N. Kok (Leiden University)Stan Matwin (University of Ottawa and Polish Academy of Sciences)Dunja Mladenic (Jozef Stefan Institute)Andrzej Skowron (Warsaw University)
Local Chairs
Micha�l Ciesio�lka (Polish Academy of Sciences)Marcin Szczuka (Warsaw University)
Tutorial Chair
Jan Rauch (University of Economics, Prague)
Workshop Chair
Marzena Kryszkiewicz (Warsaw University of Technology)
Discovery Challenge Chair
Hung Son Nguyen (Warsaw University)
Industrial Track Chairs
Florence d’Alche-Buc (Universite d’Evry-Val d’Essonne)Marko Grobelnik (Jozef Stefan Institute)
X Organization
Steering Committee
Jean-Francois Boulicaut Pavel BrazdilRui Camacho Floriana EspositoJohannes Furnkranz Joao GamaFosca Gianotti Alıpio JorgeDino Pedreschi Tobias SchefferMyra Spiliopoulou Luıs Torgo
Area Chairs
Michael R. Berthold Hendrik BlockeelOlivier Chapelle James CussensKurt Driessens Peter FlachEibe Frank Johannes FurnkranzThomas Gartner Joao GamaRayid Ghani Jerzy Grzymala-BusseEamonn Keogh Kristian KerstingMieczys�law A. K�lopotek Stefan KramerPedro Larranaga Claire NedellecAndreas Nurnberger George PaliourasBernhard Pfahringer Enric PlazaLuc De Raedt Tobias SchefferGiovanni Semeraro W�ladys�law SkarbekMyra Spiliopoulou Hannu ToivonenLuıs Torgo Paul Utgoff
Program Committee
Charu C. AggarwalJesus Aguilar-RuizDavid W. AhaNahla Ben AmorSarabjot Singh AnandAnnalisa AppiceJosep-Lluis ArcosWalid G. ArefEva ArmengolAnthony J. BagnallAntonio BahamondeSugato BasuBettina BerendtFrancesco BergadanoRalph BergmannSteffen Bickel
Concha BielzaMikhail BilenkoFrancesco BonchiGianluca BontempiChristian BorgeltKarsten M. BorgwardtDaniel BorrajoAntal van den BoschHenrik BostromMarco BottaJean-Francois BoulicautJanez BrankThorsten BrantsUlf BrefeldCarla E. BrodleyPaul Buitelaar
Organization XI
Toon CaldersLuis M. de CamposNicola CanceddaClaudio CarpinetoJesus CerquidesKaushik ChakrabartiChien-Chung ChanAmanda ClareIra CohenFabrizio CostaSusan CrawBruno CremilleuxTom CroonenborghsJuan Carlos CuberoPadraig CunninghamAndrzej CzyzewskiWalter DaelemansIan DavidsonMarco DegemmisOlivier DelalleauJitender S. DeogunMarcin DetynieckiBelen Diaz-AgudoChris H.Q. DingCarlotta DomeniconiMarek J. DruzdzelSaso DzeroskiTina Eliassi-RadTapio ElomaaAbolfazl Fazel FamiliWei FanAd FeeldersAlan FernGeorge FormanLinda C. van der GaagPatrick GallinariJose A. GamezAlex GammermanMinos N. GarofalakisGemma C. GarrigaEric GaussierPierre GeurtsFosca GianottiAttilio GiordanaRobert L. Givan
Bart GoethalsElisabet GolobardesPedro A. Gonzalez-CaleroMarko GrobelnikDimitrios GunopulosMaria HalkidiMark HallMatthias HeinJose Hernandez-OralloColin de la HigueraMelanie HilarioShoji HiranoTu-Bao HoJaakko HollmenGeoffrey HolmesFrank HoppnerTamas HorvathAndreas HothoJiayuan HuangEyke HullemeierMasahiro InuiguchiInaki InzaManfred JaegerSzymon JaroszewiczRosie JonesEdwin D. de JongAlıpio Mario JorgeTamer KahveciAlexandros KalousisHillol KarguptaAndreas KarwathGeorge KarypisSamuel KaskiDimitar KazakovRoss D. KingFrank KlawonnRalf KlinkenbergGeorge KolliosIgor KononenkoBozena KostekWalter A. KostersMiroslav KubatHalina KwasnickaJames T. KwokNicolas Lachiche
XII Organization
Michail G. LagoudakisNiels LandwehrPedro LarranagaPavel LaskovMark LastDominique LaurentNada LavracQuoc V. LeGuy LebanonUlf LeserJure LeskovecJessica LinFrancesca A. LisiPasquale LopsJose A. LozanoPeter LucasRichard MaclinDonato MalerbaNikos MamoulisSuresh ManandharStephane Marchand-MailletElena MarchioriLluis MarquezYuji MatsumotoMichael MayMike MayoThorsten MeinlPrem MelvilleRosa MeoTaneli MielikainenBamshad MobasherSerafın MoralKatharina MorikHiroshi MotodaToshinori MunakataIon MusleaOlfa NasraouiJennifer NevilleSiegfried NijssenJoakim NivreAnn NoweArlindo L. OliveiraSanti OntanonMiles OsborneMartijn van Otterlo
David PageSpiros PapadimitriouSrinivasan ParthasarathyAndrea PasseriniJose M. PenaLourdes Pena CastilloJose M. Pena SanchezJames F. PetersJohann PetrakLech PolkowskiHan La PoutrePhilippe PreuxKatharina ProbstTapani RaikoAshwin RamSheela RamannaJan RamonZbigniew W. RasChotirat Ann RatanamahatanaFrancesco RicciJohn RiedlChristophe RigottiCeline RobardetVictor RoblesMarko Robnik-SikonjaJuho RousuCeline RouveirolUlrich Ruckert (TU Munchen)Ulrich Ruckert (Univ. Paderborn)Stefan RupingHenryk RybinskiLorenza SaittaHiroshi SakaiRoberto SantanaMartin ScholzMatthias SchubertMichele SebagSandip SenJouni K. SeppanenGalit ShmueliArno SiebesAlejandro SierraVikas SindhwaniArul SiromoneyDominik Slezak
Organization XIII
Carlos SoaresMaarten van SomerenAlvaro SotoAlessandro SperdutiJaideep SrivastavaJerzy StefanowskiDavid J. StracuzziJan StruyfGerd StummeZbigniew SurajEinoshin SuzukiRoman SwiniarskiMarcin SydowPiotr SynakMarcin SzczukaLuis TalaveraMatthew E. TaylorYannis TheodoridisKai Ming TingLjupco TodorovskiVolker TrespShusaku TsumotoKarl TuylsMichalis VazirgiannisKatja VerbeeckJean-Philippe Vert
Michail VlachosHaixun WangJason Tsong-Li WangTakashi WashioGary M. WeissSholom M. WeissShimon WhitesonMarco WieringSlawomir T. WierzchonGraham J. WilliamsStefan WrobelYing YangJingTao YaoYiyu YaoFrancois YvonBianca ZadroznyMohammed J. ZakiGerson ZaveruchaFilip ZeleznyChengXiang ZhaiYi ZhangZhi-Hua ZhouJerry ZhuWojciech ZiarkoAlbrecht Zimmermann
Additional Reviewers
Rezwan AhmedFabio AiolliDima AlbergVassilis AthitsosMaurizio AtzoriAnne AugerPaulo AzevedoPierpaolo BasileMargherita BerardiAndre BergholzMichele BerlingerioKanishka BhaduriKonstantin BiatovJerzy B�laszczynskiGianluca BontempiYann-ael Le Borgne
Zoran BosnicRemco BouckaertAgnes BraudBjoern BringmannEmma ByrneOlivier CaelenRossella CancelliereGiovanna CastellanoMichelangelo CeciHyuk ChoKamalika DasSouptik DattaUwe DickLaura DietzMarcos DominguesHaimonti Dutta
XIV Organization
Marc DymetmanStefan EickelerTimm EulerTanja FalkowskiFernando FernandezFrancisco J. Ferrer-TroyanoCesar FerriDaan FierensBlaz FortunaAlexandre FranciscoMingyan GaoFabian GuizaAnna Lisa GentileAmol N. GhotingArnaud GiacomettiValentin GjorgjioskiRobby GoetschalckxDerek GreenePerry GrootPhilip GrothDaniele GunettiBernd GutmannSattar HashemiYann-Michael De HauwereVera HollinkYi HuangLeo IaquintaAlexander IlinTasadduq ImamTao-Yuan JenFelix JungermannAndrzej KaczmarekBenjamin Haibe KainsJuha KarkkainenRohit KateChris KauffmanArto KlamiJiri KlemaDragi KocevChristine KoernerKevin KontosPetra KraljAnita KrishnakumarMatjaz KukarBrian Kulis
Arnd Christian KonigChristine KornerFei Tony LiuAntonio LaTorreAnne LaurentBaoli LiZi LinBin LiuYan LiuCorrado LoglisciRachel LomaskyCarina LopesChuan LuPierre MaheMarkus MaierGiuseppe MancoIrina MatveevaNicola Di MauroDimitrios MavroeidisStijn MeganckIngo MierswaMirjam MinorAbhilash Alexander MirandaJoao MoreiraSourav MukherjeeCanh Hao NguyenDuc Dung NguyenTuan Trung NguyenJanne NikkilaXia NingBlaz NovakIrene NtoutsiRiccardo OrtaleStanis�law OsinskiKivanc OzonatAline PaesPance PanovThomas Brochmann PedersenMaarten PeetersRuggero PensaXuan-Hieu PhanBenjarath PhoophakdeeAloisio Carlos de PinaChristian PlagemannJose M. Puerta
Organization XV
Aritz PerezChedy RaissiM. Jose Ramirez-QuintanaUmaa RebbapragadaStefan ReckowChiara RensoMatthias RenzFrancois RioultDomingo Rodriguez-BaenaSten SagaertLuka SajnEsin SakaSaeed SalemAntonio SalmeronEerika SaviaAnton SchaeferLeander SchietgatGaetano SciosciaHoward ScordioSven Van SegbroeckIvica SlavkovLarisa SoldatovaArnaud SouletEduardo SpynosaVolkmar SterzingChristof StoermannJiang SuPiotr Szczuko
Alexander TartakovskiOlivier TeytaudMarisa ThomaEufemia TinelliIvan TitovRoberto TrasartiGeorge TsatsaronisKatharina TschumitschewDuygu UcarAntonio VarlaroShankar VembuCeline VensMarcos VieiraPeter VrancxNikil WaleChao WangDongrong WenArkadiusz WojnaYuk Wah WongAdam WoznicaMichael WurstWei XuXintian YangMonika ZakovaLuke ZettlemoyerXueyuan ZhouAlbrecht Zimmermann
Sponsors
We wish to express our gratitude to the sponsors of ECML/PKDD 2007 fortheir essential contribution to the conference. We wish to thank Warsaw Uni-versity, Faculty of Mathematics, Informatics and Mechanics, and Institute ofComputer Science, Polish Academy of Sciences for providing financial and orga-nizational means for the conference; the European Office of Aerospace Researchand Developement, Air Force Office of Scientific Research, United States AirForce Research Laboratory, for their generous financial support.1 KDUbiq Eu-ropean Coordination Action for supporting Poster Reception, Student TravelAwards, and the Best Paper Awards; Pascal European Network of Excellencefor sponsoring the Invited Speaker Program, the Industrial Track and the video-recording of the invited talks and presentations of the four Award Papers; JozefStefan Institute, Slovenia, SEKT European Integrated project and Unilever R& D for their financial support; the Machine Learning Journal for supportingthe Student Best Paper Awards; Gemius S.A. for sponsoring and supportingthe Discovery Challenge. We also wish to express our gratitude to the followingcompanies and institutions that provided us with data and expertise which wereessential components of the Discovery Challenge: Bee Ware, l’Ecole des Minesd’Ales, LIRMM - The Montpellier Laboratory of Computer Science, Robotics,and Microelectronics, and Warsaw University, Faculty of Mathematics, Informa-tics and Mechanics. We also acknowledge the support of LOT Polish Airlines.
1 AFOSR/EOARD support is not intended to express or imply endorsement by theU.S. Federal Government.
Table of Contents
Invited Talks
Learning, Information Extraction and the Web . . . . . . . . . . . . . . . . . . . . . . 1Tom M. Mitchell
Putting Things in Order: On the Fundamental Role of Ranking inClassification and Probability Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
Peter A. Flach
Mining Queries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4Ricardo Baeza-Yates
Adventures in Personalized Information Access . . . . . . . . . . . . . . . . . . . . . . 5Barry Smyth
Long Papers
Experiment Databases: Towards an Improved ExperimentalMethodology in Machine Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
Hendrik Blockeel and Joaquin Vanschoren
Using the Web to Reduce Data Sparseness in Pattern-BasedInformation Extraction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
Sebastian Blohm and Philipp Cimiano
A Graphical Model for Content Based Image Suggestion and FeatureSelection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
Sabri Boutemedjet, Djemel Ziou, and Nizar Bouguila
Efficient AUC Optimization for Classification . . . . . . . . . . . . . . . . . . . . . . . . 42Toon Calders and Szymon Jaroszewicz
Finding Transport Proteins in a General Protein Database . . . . . . . . . . . . 54Sanmay Das, Milton H. Saier Jr., and Charles Elkan
Classification of Web Documents Using a Graph-Based Model andStructural Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
Andrzej Dominik, Zbigniew Walczak, and Jacek Wojciechowski
Context-Specific Independence Mixture Modelling for ProteinFamilies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
Benjamin Georgi, Jorg Schultz, and Alexander Schliep
XX Table of Contents
An Algorithm to Find Overlapping Community Structure inNetworks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
Steve Gregory
Privacy Preserving Market Basket Data Analysis . . . . . . . . . . . . . . . . . . . . . 103Ling Guo, Songtao Guo, and Xintao Wu
Feature Extraction from Sensor Data Streams for Real-Time HumanBehaviour Recognition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
Julia Hunter and Martin Colley
Generating Social Network Features for Link-Based Classification . . . . . . 127Jun Karamon, Yutaka Matsuo, Hikaru Yamamoto, andMitsuru Ishizuka
An Empirical Comparison of Exact Nearest Neighbour Algorithms . . . . . 140Ashraf M. Kibriya and Eibe Frank
Site-Independent Template-Block Detection . . . . . . . . . . . . . . . . . . . . . . . . . 152Aleksander Ko�lcz and Wen-tau Yih
Statistical Model for Rough Set Approach to MulticriteriaClassification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Krzysztof Dembczynski, Salvatore Greco, Wojciech Kot�lowski, andRoman S�lowinski
Classification of Anti-learnable Biological and Synthetic Data . . . . . . . . . . 176Adam Kowalczyk
Improved Algorithms for Univariate Discretization of ContinuousFeatures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 188
Jussi Kujala and Tapio Elomaa
Efficient Weight Learning for Markov Logic Networks . . . . . . . . . . . . . . . . . 200Daniel Lowd and Pedro Domingos
Classification in Very High Dimensional Problems with Handfuls ofExamples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
Mark Palatucci and Tom M. Mitchell
Domain Adaptation of Conditional Probability Models Via FeatureSubsetting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
Sandeepkumar Satpal and Sunita Sarawagi
Learning to Detect Adverse Traffic Events from Noisily Labeled Data . . . 236Tomas Singliar and Milos Hauskrecht
IKNN: Informative K-Nearest Neighbor Pattern Classification . . . . . . . . . 248Yang Song, Jian Huang, Ding Zhou, Hongyuan Zha, and C. Lee Giles
Table of Contents XXI
Finding Outlying Items inSets of Partial Rankings . . . . . . . . . . . . . . . . . . . 265Antti Ukkonen and Heikki Mannila
Speeding Up Feature Subset Selection Through Mutual InformationRelevance Filtering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 277
Gert Van Dijck and Marc M. Van Hulle
A Comparison of Two Approaches to Classify with GuaranteedPerformance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 288
Stijn Vanderlooy and Ida G. Sprinkhuizen-Kuyper
Towards Data Mining Without Information on Knowledge Structure . . . 300Alexandre Vautier, Marie-Odile Cordier, and Rene Quiniou
Relaxation Labeling for Selecting and Exploiting Efficiently Non-localDependencies in Sequence Labeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 312
Guillaume Wisniewski and Patrick Gallinari
Bridged Refinement for Transfer Learning . . . . . . . . . . . . . . . . . . . . . . . . . . . 324Dikan Xing, Wenyuan Dai, Gui-Rong Xue, and Yong Yu
A Prediction-Based Visual Approach for Cluster Exploration andCluster Validation by HOV3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 336
Ke-Bing Zhang, Mehmet A. Orgun, and Kang Zhang
Short Papers
Flexible Grid-Based Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350Marc-Ismael Akodjenou-Jeannin, Kave Salamatian, andPatrick Gallinari
Polyp Detection in Endoscopic Video Using SVMs . . . . . . . . . . . . . . . . . . . 358Luıs A. Alexandre, Joao Casteleiro, and Nuno Nobre
A Density-Biased Sampling Technique to Improve ClusterRepresentativeness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
Ana Paula Appel, Adriano Arantes Paterlini, Elaine P.M. de Sousa,Agma J.M. Traina, and Caetano Traina Jr.
Expectation Propagation for Rating Players in Sports Competitions . . . . 374Adriana Birlutiu and Tom Heskes
Efficient Closed Pattern Mining in Strongly Accessible Set Systems(Extended Abstract) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 382
Mario Boley, Tamas Horvath, Axel Poigne, and Stefan Wrobel
Discovering Emerging Patterns in Spatial Databases: A Multi-relationalApproach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 390
Michelangelo Ceci, Annalisa Appice, and Donato Malerba
XXII Table of Contents
Realistic Synthetic Data for Testing Association Rule MiningAlgorithms for Market Basket Databases . . . . . . . . . . . . . . . . . . . . . . . . . . . . 398
Colin Cooper and Michele Zito
Learning Multi-dimensional Functions: Gas Turbine Engine Modeling . . . 406Chris Drummond
Constructing High Dimensional Feature Space for Time SeriesClassification . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 414
Victor Eruhimov, Vladimir Martyanov, and Eugene Tuv
A Dynamic Clustering Algorithm for Mobile Objects . . . . . . . . . . . . . . . . . 422Dominique Fournier, Gaele Simon, and Bruno Mermet
A Method for Multi-relational Classification Using Single andMulti-feature Aggregation Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 430
Richard Frank, Flavia Moser, and Martin Ester
MINI: Mining Informative Non-redundant Itemsets . . . . . . . . . . . . . . . . . . . 438Arianna Gallo, Tijl De Bie, and Nello Cristianini
Stream-Based Electricity Load Forecast . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 446Joao Gama and Pedro Pereira Rodrigues
Automatic Hidden Web Database Classification . . . . . . . . . . . . . . . . . . . . . . 454Zhiguo Gong, Jingbai Zhang, and Qian Liu
Pruning Relations for Substructure Discovery of Multi-relationalDatabases . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 462
Hongyu Guo, Herna L. Viktor, and Eric Paquet
The Most Reliable Subgraph Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 471Petteri Hintsanen
Matching Partitions over Time to Reliably Capture Local Clusters inNoisy Domains . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 479
Frank Hoppner and Mirko Bottcher
Searching for Better Randomized Response Schemes forPrivacy-Preserving Data Mining . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 487
Zhengli Huang, Wenliang Du, and Zhouxuan Teng
Pre-processing Large Spatial Data Sets with Bayesian Methods . . . . . . . . 498Saara Hyvonen, Esa Junttila, and Marko Salmenkivi
Tag Recommendations in Folksonomies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 506Robert Jaschke, Leandro Marinho, Andreas Hotho,Lars Schmidt-Thieme, and Gerd Stumme
Table of Contents XXIII
Providing Naıve Bayesian Classifier-Based Private Recommendationson Partitioned Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 515
Cihan Kaleli and Huseyin Polat
Multi-party, Privacy-Preserving Distributed Data Mining Using aGame Theoretic Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 523
Hillol Kargupta, Kamalika Das, and Kun Liu
Multilevel Conditional Fuzzy C-Means Clustering of XML Documents . . 532Michal Kozielski
Uncovering Fraud in Direct Marketing Data with a Fraud AuditingCase Builder . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 540
Fletcher Lu
Real Time GPU-Based Fuzzy ART Skin Recognition . . . . . . . . . . . . . . . . . 548Mario Martınez-Zarzuela, Francisco Javier Dıaz Pernas,David Gonzalez Ortega, Jose Fernando Dıez Higuera, andMıriam Anton Rodrıguez
A Cooperative Game Theoretic Approach to Prototype Selection . . . . . . . 556Narayanam Rama Suri, V. Santosh Srinivas, andM. Narasimha Murty
Dynamic Bayesian Networks for Real-Time Classification of SeismicSignals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 565
Carsten Riggelsen, Matthias Ohrnberger, and Frank Scherbaum
Robust Visual Mining of Data with Error Information . . . . . . . . . . . . . . . . 573Jianyong Sun, Ata Kaban, and Somak Raychaudhury
An Effective Approach to Enhance Centroid Classifier for TextCategorization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 581
Songbo Tan and Xueqi Cheng
Automatic Categorization of Human-Coded and Evolved CoreWarWarriors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 589
Nenad Tomasev, Doni Pracner, Milos Radovanovic, andMirjana Ivanovic
Utility-Based Regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 597Luis Torgo and Rita Ribeiro
Multi-label Lazy Associative Classification . . . . . . . . . . . . . . . . . . . . . . . . . . 605Adriano Veloso, Wagner Meira Jr., Marcos Goncalves, andMohammed Zaki
Visual Exploration of Genomic Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 613Michail Vlachos, Bahar Taneri, Eamonn Keogh, and Philip S. Yu
XXIV Table of Contents
Association Mining in Large Databases: A Re-examination of ItsMeasures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
Tianyi Wu, Yuguo Chen, and Jiawei Han
Semantic Text Classification of Emergent Disease Reports . . . . . . . . . . . . . 629Yi Zhang and Bing Liu
Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 639