![Page 1: Distributed Tera-Mining R. L. Grossman Laboratory for Advanced Computing University of Illinois & Magnify, Inc](https://reader036.vdocuments.us/reader036/viewer/2022070305/551480bd550346f06e8b4966/html5/thumbnails/1.jpg)
Distributed Tera-Mining
R. L. Grossman
Laboratory for Advanced Computing
University of Illinois &
Magnify, Inc.
Trend 1. Explosion of Data …
… All in the Wrong Format
With no one to analyze it.
The Data Gap
[Chart: total new disk capacity (TB) since 1995 and new Ph.D.s, 1995 to 1999; vertical axis 0 to 4,000,000.]
Most data comes a GB and a TB at a time.
Trend 2. SONET is dead. Lambda rules.
Gigabytes can be moved in seconds.
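Here is a back-of-the-envelope check of that claim. Idealized transfer time is the data size in bits divided by the line rate. The sketch below assumes a single 10 Gb/s wavelength per lambda. That rate is an assumption, because the slide does not give one. The sketch also ignores protocol overhead and latency, so real transfers take longer.

```python
def transfer_seconds(num_bytes: float, line_rate_bps: float) -> float:
    """Idealized time to move num_bytes over a link running at line_rate_bps.

    Ignores protocol overhead, latency and congestion, so this is a lower bound.
    """
    return num_bytes * 8 / line_rate_bps


GB = 10**9               # decimal gigabyte
LAMBDA_10G = 10 * 10**9  # assumed 10 Gb/s wavelength (not stated on the slide)

for size_gb in (1, 10, 100):
    secs = transfer_seconds(size_gb * GB, LAMBDA_10G)
    print(f"{size_gb:>4} GB over a 10 Gb/s lambda: {secs:.1f} s")
```

Under these assumptions, a gigabyte takes under a second and a hundred gigabytes take a little over a minute.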
Trend 3. Most Data is Distributed
Bush’s Law: The usefulness of a column of data varies as the square of the number of columns it is compared to.
Example 1: ENSO & Cholera
El Niño data at NCAR
Cholera data at WHO
Example 2: Voting
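Table 1: Votes for Buchanan by county

| County | Buchanan votes |
|---|---|
| ALACHUA | 263 |
| BAKER | 73 |
| BAY | 248 |
| BRADFORD | 65 |
| BREVARD | 570 |
| BROWARD | 788 |

Table 2: Registered Reform voters by county

| County | Registered Reform voters |
|---|---|
| Alachua | 91 |
| Baker | 4 |
| Bay | 55 |
| Bradford | 3 |
| Brevard | 148 |
| Broward | 332 |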
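Before the two tables can be correlated, they have to be joined on county. Even this small example shows why that is not automatic for distributed data: the two sources format county names differently (upper case in Table 1, title case in Table 2). Below is a minimal sketch of an inner join on a normalized county key, using the figures from the tables. The normalization rule (strip whitespace, then upper-case) is an assumption that happens to suffice here.

```python
# Table 1 (Buchanan votes) and Table 2 (registered Reform voters), from the slide.
# The two sources format county names differently: upper case vs. title case.
buchanan_votes = {
    "ALACHUA": 263, "BAKER": 73, "BAY": 248,
    "BRADFORD": 65, "BREVARD": 570, "BROWARD": 788,
}
reform_voters = {
    "Alachua": 91, "Baker": 4, "Bay": 55,
    "Bradford": 3, "Brevard": 148, "Broward": 332,
}


def normalize(county: str) -> str:
    """Map differently formatted county names onto one shared key (assumed rule)."""
    return county.strip().upper()


def join_on_county(left: dict[str, int], right: dict[str, int]) -> dict[str, tuple[int, int]]:
    """Inner join on normalized county name: {county: (left_value, right_value)}."""
    right_by_key = {normalize(name): value for name, value in right.items()}
    joined = {}
    for name, left_value in left.items():
        key = normalize(name)
        if key in right_by_key:  # counties missing from either table are dropped
            joined[key] = (left_value, right_by_key[key])
    return joined


rows = join_on_county(reform_voters, buchanan_votes)
for county, (reform, buchanan) in sorted(rows.items()):
    print(f"{county:<9} reform={reform:>4} buchanan={buchanan:>4}")
```

All six counties appear in both tables, so the join keeps every row. A county present in only one source would be silently dropped, and a production join would probably want to report it.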
Correlation: Reform Voters vs. Votes for Buchanan
[Scatter plot, one point per county: registered Reform voters (horizontal axis, 0 to 450) against votes for Buchanan (vertical axis, 0 to 4,000); one point is labeled Palm Beach.]
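The scatter plot makes the case visually. The sketch below makes it numerically, computing the Pearson correlation and an ordinary least-squares line over the six counties from Tables 1 and 2. Palm Beach is left out because its figures are not given in this text.

```python
import math

# (registered Reform voters, Buchanan votes) per county, from Tables 1 and 2:
# Alachua, Baker, Bay, Bradford, Brevard, Broward.
reform = [91, 4, 55, 3, 148, 332]
buchanan = [263, 73, 248, 65, 570, 788]


def pearson(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def least_squares(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Ordinary least-squares intercept a and slope b for y = a + b * x."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b


r = pearson(reform, buchanan)
a, b = least_squares(reform, buchanan)
print(f"r = {r:.3f}; Buchanan votes ~ {a:.1f} + {b:.2f} * Reform voters")
```

Under this fit, a county's expected Buchanan vote is a + b × (its Reform registrations). A county far above that line, as the point labeled Palm Beach appears to be in the plot, is the kind of anomaly that only shows up once the two databases are compared.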
DataSpace – One Approach to Making Data Useful
| Today’s Multimedia Web | Tomorrow’s Data Web |
|---|---|
| 16 terabytes of documents; 4 billion documents | petabytes of data; tens of billions to trillions of records |
| HTML, HTTP | PMML & DTML, DSTP |
| search by keyword | correlate & mine |
| workstations, servers | data & compute clusters |
Complementary to the grid, which we view as a distributed computer.
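PMML (Predictive Model Markup Language) is an XML format for exchanging mined models. It is maintained by the Data Mining Group at the www.dmg.org address listed under DataSpace Standards. As an illustration only, the sketch below uses Python's standard xml.etree.ElementTree to serialize a one-variable linear model y = a + b·x as a minimal PMML RegressionModel. The PMML version (4.1), the field names, and the coefficients are all assumptions. The coefficients roughly match a least-squares fit of Buchanan votes on Reform registrations for the six counties in Tables 1 and 2.

```python
import xml.etree.ElementTree as ET

# Assumed PMML version; use the namespace that the consuming tool expects.
PMML_VERSION = "4.1"
PMML_NS = "http://www.dmg.org/PMML-4_1"


def _q(tag: str) -> str:
    """Qualify a tag name with the PMML namespace (ElementTree's {ns}tag form)."""
    return f"{{{PMML_NS}}}{tag}"


def linear_model_to_pmml(x_name: str, y_name: str, intercept: float, slope: float) -> str:
    """Serialize y = intercept + slope * x as a minimal PMML RegressionModel."""
    ET.register_namespace("", PMML_NS)  # emit PMML as the default namespace
    root = ET.Element(_q("PMML"), version=PMML_VERSION)
    ET.SubElement(root, _q("Header"), description="one-variable linear model (sketch)")

    # Declare both fields as continuous doubles.
    data_dict = ET.SubElement(root, _q("DataDictionary"), numberOfFields="2")
    for name in (x_name, y_name):
        ET.SubElement(data_dict, _q("DataField"), name=name,
                      optype="continuous", dataType="double")

    # The model: x is an input, y is the predicted field.
    model = ET.SubElement(root, _q("RegressionModel"), functionName="regression")
    schema = ET.SubElement(model, _q("MiningSchema"))
    ET.SubElement(schema, _q("MiningField"), name=x_name)
    ET.SubElement(schema, _q("MiningField"), name=y_name, usageType="predicted")

    table = ET.SubElement(model, _q("RegressionTable"), intercept=repr(intercept))
    ET.SubElement(table, _q("NumericPredictor"), name=x_name,
                  coefficient=repr(slope), exponent="1")
    return ET.tostring(root, encoding="unicode")


print(linear_model_to_pmml("reform_voters", "buchanan_votes", intercept=97.4, slope=2.25))
```

The point of a shared format like this is that a model mined at one site can be scored at another without shipping the underlying data.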
[Diagram: DSTP Server 1 holds pairs (k[i], x[i]) and DSTP Server 2 holds pairs (k[i], y[j]). Other labels: attributes [aid], UCK (universal correlation key) [uckid], "Click to obtain graph".]
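The diagram's idea is that a client can line up columns held on different servers by matching keys, instead of copying whole databases. For very large columns, one way to do this is a sort-merge join. This assumes that each server can stream its (key, value) rows in key order and that keys are unique per server. The client then aligns the streams in one pass while holding only the current row from each. The function below is a hypothetical client-side sketch, not the DSTP API. The two in-memory lists stand in for the servers' responses.

```python
from collections.abc import Iterable, Iterator


def merge_on_key(
    left: Iterable[tuple[str, float]],
    right: Iterable[tuple[str, float]],
) -> Iterator[tuple[str, float, float]]:
    """Inner-join two key-sorted (key, value) streams in a single pass.

    Assumes each stream is sorted by key and has at most one row per key.
    Yields (key, left_value, right_value) for keys present in both streams.
    """
    left_it, right_it = iter(left), iter(right)
    lrow = next(left_it, None)
    rrow = next(right_it, None)
    while lrow is not None and rrow is not None:
        if lrow[0] == rrow[0]:
            yield lrow[0], lrow[1], rrow[1]
            lrow = next(left_it, None)
            rrow = next(right_it, None)
        elif lrow[0] < rrow[0]:
            lrow = next(left_it, None)  # key only on the left: skip it
        else:
            rrow = next(right_it, None)  # key only on the right: skip it


# Hypothetical responses from two servers: (key, x) and (key, y), sorted by key.
server1_rows = [("k01", 1.0), ("k02", 2.0), ("k04", 4.0)]
server2_rows = [("k02", 20.0), ("k03", 30.0), ("k04", 40.0)]

for key, x, y in merge_on_key(server1_rows, server2_rows):
    print(key, x, y)  # the (x, y) pairs are what the client would plot
```

A hash join (build a dictionary from one side) is simpler when one column fits in memory. The merge form is shown because it needs no such assumption.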
Terra Mining Testbed
Optical testbed for distributed tera-mining of scientific data.
The goal is also for it to serve as a testbed for broadband-based business services.
Lessons Learned
1. It’s the data, stupid. Cycles, cylinders & lambdas are all commodities.
2. The fundamental challenge: lower the cost to make data useful.
3. The emergence of internet infrastructure for data is inevitable.
This opens up possibilities for new types of scientific discovery.
For More Information
DataSpace – http://www.dataspaceweb.net, http://www.ncdm.uic.edu
DataSpace Standards – http://www.dmg.org
Selected articles – http://www.twocultures.net
Magnify – http://www.magnify.com
End of Slides
FTP Still Lives
Trend 2. Bandwidth is a Commodity
[Chart with labels OC-3, OC-12, OC-48.]
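For scale, SONET OC-n circuits run at n times the OC-1 rate of 51.84 Mb/s. That makes OC-3, OC-12 and OC-48 155.52, 622.08 and 2,488.32 Mb/s. The sketch below turns those line rates into idealized hours per terabyte. It ignores SONET framing and protocol overhead, so usable throughput is lower.

```python
OC1_MBPS = 51.84  # SONET base rate (STS-1 / OC-1), in Mb/s


def oc_rate_mbps(n: int) -> float:
    """Line rate of an OC-n circuit: n times the OC-1 rate."""
    return n * OC1_MBPS


def hours_per_terabyte(rate_mbps: float) -> float:
    """Idealized hours to move 10**12 bytes at rate_mbps, ignoring overhead."""
    return 10**12 * 8 / (rate_mbps * 10**6) / 3600


for n in (3, 12, 48):
    rate = oc_rate_mbps(n)
    print(f"OC-{n}: {rate:.2f} Mb/s, about {hours_per_terabyte(rate):.1f} h per TB")
```

Each step up the hierarchy quadruples the rate, which cuts the time per terabyte by a factor of four.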
El Niño Anomalies
Indonesia Cholera Cases
Cholera Cases
Distributed Exabytes (New Disks)
[Chart: new disk capacity in petabytes (vertical axis, 0 to 14,000), 1995 to 2003, annotated "1 Exabyte".]
Source: IDC (1999), "1999 Winchester Disk Drive Market Forecast and Review"
Trend 3. Most Data is Distributed
W’s Law: The usefulness of a column of data varies as the square of the number of columns it is compared to.
Example 2: Voting
Database 1: Total Votes for Buchanan by County
| County | Buchanan votes |
|---|---|
| ALACHUA | 263 |
| BAKER | 73 |
| BAY | 248 |
| BRADFORD | 65 |
| BREVARD | 570 |
| BROWARD | 788 |
Database 2: Total Registered Reform Voters by County
| County | Registered Reform voters |
|---|---|
| Alachua | 91 |
| Baker | 4 |
| Bay | 55 |
| Bradford | 3 |
| Brevard | 148 |
| Broward | 332 |