Data Mining: Making Data Meaningful

Post on 22-Sep-2016


Computer: Industry Trends

The Palomar Observatory in San Diego, California, records thousands of images of the sky, including millions of faint points of light from the distant reaches of space. It would take hundreds of people thousands of hours poring over images to identify which light source is a star, which is a galaxy, and so on. And humans wouldn’t be able to identify some of the most distant light sources.

However, NASA’s Jet Propulsion Laboratory (JPL) has a data-mining tool, called Skicat (Sky Image Cataloging and Analysis Tool), which can analyze digitized data from the observatory, recognize patterns, and make the proper identifications much more quickly and much less expensively.

This is just one of the ways that research groups, large businesses, government agencies, and other organizations are using improved mining technologies and techniques to discover meaningful patterns in huge databases. The technology is also being used to look for geological patterns in earthquake-prone areas, predict bad credit risks, and anticipate inventory demands.

And now, data mining has been refined to the point where even people who aren’t highly trained statisticians can use this complex data-analysis tool.

INCREASED POPULARITY

Data mining’s increased popularity is due partly to technological improvements that permit faster, more effective analyses of databases. Data-mining techniques, like the use of neural networks, have become more effective. A neural network is a processing device, either an algorithm or hardware, whose design was motivated by the design of the human brain. Neural networks exhibit interconnectivity and parallelism, “learn” from examples, and are able to generalize.
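To make the idea of learning from examples concrete, here is a minimal Python sketch of a single artificial neuron trained with the classic perceptron rule. It is an illustration only, not the method used by Skicat or any product named in this article. The training examples, the two input features, and the names `EXAMPLES`, `predict`, and `train` are all assumptions made up for this sketch.

```python
# Hypothetical training examples: (feature_1, feature_2) -> class label (0 or 1).
EXAMPLES = [
    ((0.0, 0.0), 0),
    ((0.2, 0.3), 0),
    ((0.4, 0.1), 0),
    ((1.0, 1.0), 1),
    ((0.8, 0.9), 1),
    ((0.9, 0.6), 1),
]


def predict(weights, bias, inputs):
    """Return 1 when the weighted sum of the inputs is positive, else 0."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1 if total > 0 else 0


def train(examples, learning_rate=0.1, max_epochs=100):
    """Nudge the weights after every mistake until no example is misclassified."""
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for inputs, label in examples:
            error = label - predict(weights, bias, inputs)
            if error != 0:
                mistakes += 1
                weights = [w + learning_rate * error * x
                           for w, x in zip(weights, inputs)]
                bias += learning_rate * error
        if mistakes == 0:
            break  # every training example is now classified correctly
    return weights, bias


weights, bias = train(EXAMPLES)

# Inputs that never appeared in the training examples.
print(predict(weights, bias, (0.95, 0.85)))  # prints 1
print(predict(weights, bias, (0.1, 0.2)))    # prints 0
```

The neuron is never told the rule that separates the two classes. It adjusts its weights only when it misclassifies an example, and the resulting weights also classify the two unseen inputs sensibly, which is a small-scale version of the generalization described above. Practical neural networks connect many such units.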

Vendors have improved their products’ precision by combining a variety of data-mining techniques. In the past, they generally used only one technique in their products. Angoss International (http://www.angoss.com) recently released a data-mining product for businesses, KnowledgeSeeker, that uses both tree-based models and neural networks. Tree-based models organize data into branched systems that show how various pieces of data relate to each other.
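As a rough illustration of how a tree-based model branches on data, the Python sketch below searches for the single yes/no question that best separates a small set of records, which is the step a tree learner repeats at every branch. This is not KnowledgeSeeker's algorithm. The records, the field names in `FEATURES`, and the function `best_split` are hypothetical and chosen only for this example.

```python
# Hypothetical customer records: (late_payments, years_as_customer, defaulted).
RECORDS = [
    (0, 5, False),
    (1, 2, False),
    (0, 1, False),
    (3, 4, True),
    (4, 1, True),
    (2, 6, False),
    (5, 3, True),
    (3, 2, True),
]
FEATURES = ["late_payments", "years_as_customer"]


def best_split(records):
    """Find the rule 'feature >= threshold means default' with the fewest errors.

    Returns (feature_index, threshold, error_count). Only the '>=' direction
    is tried, which keeps the sketch short.
    """
    best = None
    for f in range(len(FEATURES)):
        for threshold in sorted({record[f] for record in records}):
            errors = sum((record[f] >= threshold) != record[2]
                         for record in records)
            if best is None or errors < best[2]:
                best = (f, threshold, errors)
    return best


feature_index, threshold, errors = best_split(RECORDS)
print(f"Branch on {FEATURES[feature_index]} >= {threshold} ({errors} errors)")
# prints: Branch on late_payments >= 3 (0 errors)
```

A full tree would apply the same search again to the records on each side of the branch, producing the branched structure that shows how the fields relate to the outcome.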

Powerful processors now let computers (including many desktop units) run complex mining algorithms and search large databases quickly. In addition, better graphics technology lets users see the results of data mining on easy-to-read graphs, charts, and so on. These two factors have made data mining a valuable tool for users who aren’t sophisticated statisticians.

Data-mining products are also just now becoming Web-enabled. As a result, companies that access databases through corporate intranets, for example, can now use data-mining tools.

Because of these factors, the technology is beginning to be used in so many settings that data-mining vendors are beginning to splinter into niches based on industry and function, said Erick Brethenoux, the Paris-based research director for the Gartner Group, a market research firm. For instance, SAS Institute (http://www.sas.com) is emerging as a favorite of statisticians, while HNC Software (http://www.hnc.com) is achieving dominance among risk analysts.

A CAUTIONARY NOTE

Despite data mining’s value, users should realize that the technology provides only a guide, not a gospel. Users must be wary of finding meaningless statistical patterns that don’t indicate a cause and an effect or that don’t accurately predict the future.

For example, David J. Leinweber, managing director of First Quadrant Corp., an investment management firm, said he used data mining and found that, statistically speaking, the best predictor of the performance of Standard & Poor’s 500 Index of stocks is the price of butter in Bangladesh. Of course, he said, the two have no causal relationship, so the correlation is meaningless. This, he said, is an example of the type of “stupid data-mining tricks” users must be careful about.
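The following Python sketch suggests why such meaningless matches turn up: when enough unrelated series are compared against a target, one of them is likely to correlate strongly by chance alone. All of the data here is random noise, not actual stock or butter prices, and the function names, the series length of 12, and the count of 1,000 candidates are assumptions chosen for illustration.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def strongest_chance_correlation(n_points=12, n_candidates=1000, seed=0):
    """Compare one random target series with many unrelated random series.

    Returns the largest absolute correlation found. Every series is
    independent noise, so any strong correlation is pure coincidence.
    """
    rng = random.Random(seed)
    target = [rng.gauss(0, 1) for _ in range(n_points)]
    best = 0.0
    for _ in range(n_candidates):
        candidate = [rng.gauss(0, 1) for _ in range(n_points)]
        best = max(best, abs(pearson(target, candidate)))
    return best


print(round(strongest_chance_correlation(), 2))
```

With short series and many candidates, the strongest correlation found is typically high even though every series is unrelated to the target by construction. Searching a large database for the best predictor of anything carries the same risk.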

Analysts expect vendors to begin releasing data-mining tools for specific applications. One emerging area is text mining, which can analyze, for example, customer comments on survey sheets. Brethenoux predicts that multimedia mining, which could, for example, analyze patterns in photos, will emerge within five years.

Meanwhile, researchers are working on ways to accelerate data mining by trading accuracy for speed.

And in the not-too-distant future, data mining will become a common data-analysis tool on many desktops, said Herb Edelstein, president of Two Crows Corp., a data-mining consulting firm. This process has already started, as even people who don’t conduct complex statistical analysis are now beginning to use data mining.

Joe Mullich is a freelance technology writer based in Glendale, California. Contact him at joemullich@aol.com.

Data Mining: Making Data Meaningful

Joe Mullich

Editor: Lee Garber, Computer, 10662 Los Vaqueros Circle, PO Box 3014, Los Alamitos, CA 90720-1314; l.garber@computer.org

JPL’s Skicat can identify distant light sources more quickly and inexpensively.
