

QUALITY AND RELIABILITY ENGINEERING INTERNATIONAL

Qual. Reliab. Engng. Int. 2005; 21:427–428

Published online in Wiley InterScience (www.interscience.wiley.com).

BOOK REVIEWS

Systems Reliability and Failure Prevention, Herbert Hecht, Artech House, 2004, 230 pages, £55.00. (Originally reviewed for The Aeronautical Journal published by the Royal Aeronautical Society. Published here with permission.)

The author is vice-chairman of a company that is involved in consulting work on ‘high dependability systems’, and he has worked with government and academic bodies in the U.S.A. on the reliability and safety of such systems. This book should, therefore, be expected to present the best and most modern ideas and methods.

Unfortunately, it disappoints. It provides competent and practical descriptions of some reliability improvement methods, such as failure modes and effects analysis and sneak circuit analysis, redundancy design techniques, software reliability issues, and reliability programme management, but there are too many omissions and even misleading advice for me to recommend it.

The most important omissions are the absence of any mention or discussion of accelerated test techniques, and of the role of manufacturing quality in ensuring product reliability. (In Chapter 8, the product ‘life cycle’ is stated to consist of ‘concept, development, and operation and maintenance’. Forgetting about the essential phase of manufacture and its contributions is a common failing of writers on engineering reliability and management.) The author makes reference to only one other book on reliability engineering, and that was published in 1977.

The topic with the greatest potential to mislead is the treatment of the economics of reliability, to which a whole chapter is devoted. Much of this flies in the face of Deming’s fundamentally correct teaching that improvements in quality always result in lower total costs, and of the reality that forecasts of future reliability values almost always entail levels of uncertainty that undermine the validity of the kinds of analyses presented.

The book includes interesting stories of some well-known system failures, and some examples of applications of the methods described. However, it falls well short of being a definitive source for such an important topic.

PATRICK O’CONNOR

(DOI: 10.1002/qre.649)

Data Mining: Concepts, Models, Methods and Algorithms, Mehmed Kantarzic, Paperback, IEEE Press/Wiley, 2001, xii + 345 pages, $74.95.

Machine learning, neural networks, genetic algorithms and fuzzy logic are terms which only a few years ago evoked awe and respect for the person uttering them. More recently these data mining techniques (and others), which were largely developed within the artificial intelligence community, have entered the mainstream as techniques for analyzing data that offer advantages over classical statistical methods. The advantages are particularly relevant to the analysis of large complex databases. Data mining, as distinguished from traditional analytical methods, uses automated, computationally intensive and usually non-parametric procedures to find patterns in data.

When I first became interested in data mining about 10 years ago, it was hard to find comprehensive literature which could serve as an introduction to the topic for the novice. In fact, much of the material was in journals where the presentation was very technical (i.e. a challenge to understand). In more recent years this situation has been remedied with the release of a number of books aimed at introducing people involved in data analysis to the topic. Data Mining: Concepts, Models, Methods and Algorithms offers a very readable and up-to-date introduction to data mining.

A point in its favor is that the book includes a chapter devoted to data preparation. One cannot overstate the importance of this step to the data mining process. The chapter discusses missing data, outlier detection and the use of transformations such as normalization and smoothing. A rule of thumb is that the data management, cleaning and transformation processes will consume 90% or more of the data mining effort. Another unique feature of the book is a chapter devoted to data reduction. Given the massive databases analyzed in data mining projects, methods are often needed to reduce the number of records (through sampling) and the number of variables (through variable selection or combining variables to produce a smaller number of variables in total).

The book reviews the major data mining methods. The survey includes a brief overview of classical approaches such as regression and discriminant analysis, though the potential reader should note that knowledge of conventional statistical methods is a prerequisite for understanding much of the material in the book. Data mining methods introduced include clustering, association rules, neural networks, decision trees, genetic algorithms and fuzzy logic. This is a pretty comprehensive and up-to-date collection of methods. A final chapter on visualization presents some useful and, in some cases, relatively recently developed methods used to analyze and present data graphically.

In general, the level of discussion in the book is introductory and can be followed by those new to the discipline. Relatively simple examples are used to motivate an understanding of the methods. However, there were a few places where I needed to read a passage a few times in order to understand the material, and I am already well acquainted with the subject. A set of review problems and exercises contained at the end of each chapter can be useful to the instructor using the book as a text for an introductory course on data mining. An additional helpful feature is the inclusion of appendices with extensive lists of Web sites containing data, software and information useful to data miners.

At the end of each chapter is an annotated list of references that provides sources for further study for those who want to pursue a topic in more detail. A few of my favorite texts did not appear, but I approach data mining with a statistician’s perspective and view every data mining method as an augmentation of a statistical procedure I am already familiar with, while the author of the book is from the computer science discipline.

In addition to serving as a comprehensive introduction to students and practitioners unfamiliar with data mining, the book has something to offer to those already applying data mining methods because it is thorough and covers some methods that other data mining survey books do not.

Kantarzic’s Data Mining will not supplant Hastie, Tibshirani and Friedman’s The Elements of Statistical Learning as the reference on data mining (at least for statisticians), but it provides a less-challenging introduction to the topic for those who do not already have extensive exposure to statistical methods for analyzing data.

LOUISE FRANCIS

(DOI: 10.1002/qre.704)

Copyright © 2005 John Wiley & Sons, Ltd. Qual. Reliab. Engng. Int. 2005; 21:427–428