![Page 1: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/1.jpg)
Visualizing and Exploring Data
Based on Chapter 3 of Hand, Mannila, & Smyth
David Madigan
![Page 2: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/2.jpg)
Introduction
• Exploratory Data Analysis legitimized by Tukey (1977)
• ~ Data-based hypothesis generation
• Always need to be skeptical about findings since the search space can be very large
• Useful tools: S-Plus, GGobi, DataDesk, JMP
• http://otal.umd.edu/Olive/ (On-line Library of Information Visualization Environments)
![Page 3: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/3.jpg)
Displaying Single Variables
credit card example
![Page 4: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/4.jpg)
Pima Indian example
![Page 5: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/5.jpg)
Smoothing Estimates
• Kernel estimates smooth out the contribution of each data point over a local neighborhood of that point.
$$\hat{f}(x) = \frac{1}{nh}\sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right)$$
h is the kernel width
• Gaussian kernel is common:
$$K\!\left(\frac{x - x_i}{h}\right) = C\,e^{-\frac{1}{2}\left(\frac{x - x_i}{h}\right)^{2}}$$
• Formal procedures for optimal bandwidth choice
• Gray & Moore’s work on speeding this up…
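As a minimal sketch of the kernel estimate on this slide, the R code below evaluates the estimator directly with a standard normal kernel (so C = 1/sqrt(2π)). The function name `kde`, the simulated sample `obs`, and the evaluation grid are illustrative assumptions, not part of the slides. `bw.nrd0` and `bw.SJ` are bandwidth selectors from R's `stats` package, shown only as two examples of data-driven bandwidth choice.

```r
# Kernel density estimate: f_hat(x) = 1/(n*h) * sum_i K((x - x_i) / h),
# with K the standard normal density (Gaussian kernel).
kde <- function(x, data, h) {
  sapply(x, function(x0) sum(dnorm((x0 - data) / h)) / (length(data) * h))
}

set.seed(42)
# Simulated bimodal sample (hypothetical data, for illustration only)
obs  <- c(rnorm(100, mean = 0, sd = 1), rnorm(100, mean = 4, sd = 0.5))
grid <- seq(-10, 15, length.out = 1001)

f_narrow <- kde(grid, obs, h = 0.1)  # small h: wiggly estimate
f_wide   <- kde(grid, obs, h = 1.0)  # large h: smooth estimate, modes start to blur

# Two data-driven bandwidth choices from the stats package
h_rot <- bw.nrd0(obs)  # Silverman's rule of thumb
h_sj  <- bw.SJ(obs)    # Sheather-Jones plug-in
f_rot <- kde(grid, obs, h = h_rot)
```

Plotting `f_narrow`, `f_rot`, and `f_wide` against `grid` (for example with `plot(grid, f_rot, type = "l")`) shows how the kernel width h trades detail for smoothness.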
![Page 6: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/6.jpg)
ATE3hour
![Page 7: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/7.jpg)
Displaying Two Variables
![Page 8: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/8.jpg)
![Page 9: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/9.jpg)
![Page 10: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/10.jpg)
![Page 11: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/11.jpg)
Correlation = 0.5
![Page 12: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/12.jpg)
![Page 13: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/13.jpg)
![Page 14: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/14.jpg)
![Page 15: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/15.jpg)
![Page 16: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/16.jpg)
Tinting
• Experiment to model the effects of car window tinting on visual performance
• csoa: critical stimulus onset asynchrony (time to recognize an alphanumeric target)
• it: inspection time (time required for a simple discrimination task)
• age, tint (no,lo,hi), target (locon,hicon), sex
![Page 17: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/17.jpg)
xyplot(csoa~it | sex*agegp, data=tinting, groups=target, auto.key=list(columns=2))
![Page 18: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/18.jpg)
xyplot(csoa~it | sex*agegp, data=tinting, groups=tint, auto.key=list(columns=3))
![Page 19: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/19.jpg)
xyplot(csoa~it | sex*agegp, data=tinting, groups=tint, auto.key=list(columns=3), type=c("p","smooth"),span=0.8)
![Page 20: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/20.jpg)
Source: Michael Friendly
![Page 21: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/21.jpg)
Half-space location depth of z in R² relative to z1,…,zn is the smallest number of zi contained in any closed half-plane with boundary line through z
Bagplot
Rousseeuw, Ruts, and Tukey
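The half-space depth definition can be sketched in a few lines of R. This is an illustrative brute-force version, not the algorithm used by Rousseeuw, Ruts, and Tukey: the function name `halfspace_depth` and the toy points are assumptions, and the rounding step used to merge duplicate angles is a simplification aimed at small, well-separated point sets. It relies on the fact that the count only changes when the boundary line passes through a data point, so it is enough to check one direction between each pair of neighbouring critical angles.

```r
# Half-space (Tukey) depth of a point z relative to the rows of pts (n x 2 matrix):
# the smallest number of points in any closed half-plane whose boundary passes through z.
halfspace_depth <- function(z, pts) {
  dx <- pts[, 1] - z[1]
  dy <- pts[, 2] - z[2]
  nonzero <- dx != 0 | dy != 0
  if (!any(nonzero)) return(nrow(pts))  # every point coincides with z

  # Critical normal directions: perpendicular to (z_i - z), where the boundary hits z_i.
  ang  <- atan2(dy[nonzero], dx[nonzero])
  crit <- sort(unique(round(c(ang + pi / 2, ang - pi / 2) %% (2 * pi), 10)))

  # One normal direction strictly between each pair of neighbouring critical angles.
  mids <- (crit + c(crit[-1], crit[1] + 2 * pi)) / 2

  # Count points in the closed half-plane {p : (p - z) . u >= 0} for each direction u.
  counts <- sapply(mids, function(theta) sum(dx * cos(theta) + dy * sin(theta) >= 0))
  min(counts)
}

# Toy data: four corners of a square plus its centre (hypothetical example)
pts <- rbind(c(1, 1), c(-1, 1), c(-1, -1), c(1, -1), c(0, 0))
depth_centre  <- halfspace_depth(c(0, 0), pts)  # centre of the cloud: deepest point
depth_corner  <- halfspace_depth(c(1, 1), pts)  # a point on the convex hull
depth_outside <- halfspace_depth(c(5, 5), pts)  # a point outside the cloud
```

Depth is highest near the middle of the cloud and drops to zero outside the convex hull, which is what lets the bagplot play the role of a bivariate boxplot built around the deepest point.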
![Page 22: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/22.jpg)
![Page 23: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/23.jpg)
Four-fold display for categorical data
![Page 24: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/24.jpg)
Four-fold display for categorical data
![Page 25: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/25.jpg)
Mosaic plots for categorical data
![Page 26: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/26.jpg)
![Page 27: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/27.jpg)
Visual Scalability
Eick and Karr
•Human perception: 6.5 million pixels?
•Monitor resolution: 640X480=307,200; 1600X1200=1,920,000
•Visual metaphors:
•Bar charts: can display 500; realistic limit about 50; color
•Matrix views: 1280X1024 can display 13,000 10X10 entities
•Landscapes: 3-D matrix view; color, height, and shape; occlusion?
•Network views: scalability depends on connectivity
•Scatterplots: 100,000 points?
•Histograms: smoothing calculations become expensive
•Interactivity
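The pixel budgets in this list are simple products, and a short R sketch makes them easy to re-derive for other screen sizes. The helper names `pixel_count` and `matrix_capacity` are illustrative assumptions; the 6.5 million figure is the slide's tentative estimate of human perception.

```r
# Total number of pixels on a display
pixel_count <- function(width, height) width * height

# Number of cell x cell entities that fit in a matrix view (whole cells only)
matrix_capacity <- function(width, height, cell = 10) {
  (width %/% cell) * (height %/% cell)
}

vga  <- pixel_count(640, 480)     # 307200
uxga <- pixel_count(1600, 1200)   # 1920000

entities <- matrix_capacity(1280, 1024)  # 128 * 102 = 13056, roughly 13,000

# Share of the (tentative) 6.5 million pixel perceptual limit used by a 1600x1200 monitor
share <- uxga / 6.5e6
```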
![Page 28: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/28.jpg)
Dimensionality Reduction
•Scatterplot = 2-D projection defined by 2 variables (e.g. x1 vs. x4)
•Other projections? e.g. 2x1+3x2 vs. 6x1 + 2x4
•Projection pursuit…issues with scalability
•PCA: scales quite well; used in text retrieval
•MDS: metric versus non-metric
•Random projections
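The projections listed above can all be written as a data matrix times a weight matrix. The R sketch below uses simulated data and base R only. The variable names, the simulated matrix `X`, and the Gaussian random projection matrix `G` are illustrative assumptions. Classical (metric) MDS is shown via `cmdscale`; non-metric MDS is available elsewhere (for example `isoMDS` in the MASS package) and is not shown here.

```r
set.seed(1)
# Simulated data: 200 observations on 4 variables x1..x4 (hypothetical)
X <- matrix(rnorm(200 * 4), ncol = 4, dimnames = list(NULL, paste0("x", 1:4)))

# Ordinary scatterplot: projection onto two of the original variables
P_scatter <- X[, c("x1", "x4")]

# General linear projection: 2*x1 + 3*x2 versus 6*x1 + 2*x4
A <- cbind(u = c(2, 3, 0, 0), v = c(6, 0, 0, 2))
P_linear <- X %*% A

# PCA: project onto the two directions of largest variance
pca   <- prcomp(X, center = TRUE, scale. = FALSE)
P_pca <- pca$x[, 1:2]

# Classical (metric) MDS on Euclidean distances; matches PCA scores up to sign
P_mds <- cmdscale(dist(X), k = 2)

# Random projection: Gaussian weights, scaled by 1/sqrt(target dimension)
G <- matrix(rnorm(4 * 2), nrow = 4) / sqrt(2)
P_random <- X %*% G
```

Each `P_*` object is a 200 x 2 matrix that can be passed straight to `plot()`. The PCA and classical MDS coordinates agree up to sign because both come from the same eigen-decomposition when the distances are Euclidean.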
![Page 29: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/29.jpg)
Tufte:
Graphical excellence is the well-designed presentation of interesting data - a matter of substance, of statistics, and of design.
Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency.
Graphical excellence is that which gives the viewer the greatest number of ideas in the shortest time with the least ink in the smallest space.
Graphical excellence is nearly always multivariate.
And graphical excellence requires telling the truth about the data.
![Page 30: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/30.jpg)
Tufte also insists that graphical displays should:
induce the viewer to think about the substance rather than about methodology, graphic design, the technology of graphic production or something else
reveal the data at several levels of detail, from a broad overview to the fine structure
![Page 31: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/31.jpg)
![Page 32: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/32.jpg)
![Page 33: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/33.jpg)
The following example, from The Times of Saturday 1/2/3, is a superb example of this form of abuse. The two shells supposedly represent two quantities in the ratio 500 to 364, so the first should be 500/364 or 1.374 times bigger than the second, representing a 37.4% increase. But their lengths are in the ratio 102mm to 65mm, making the first 1.569 times longer than the second, and giving it a volume greater than that of the second by a factor of 1.569 cubed, or 3.864. This gives a shocking lie factor of 3.864/1.374, or about 2.8!
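The arithmetic in this example can be reproduced in a few lines of R. The variable names are illustrative; the calculation follows the paragraph above, which compares the ratio of the drawn volumes with the ratio of the underlying quantities.

```r
# Ratio of the two quantities the shells are meant to represent
data_ratio <- 500 / 364

# Ratio of the drawn lengths, in mm
length_ratio <- 102 / 65

# A solid object scaled in every dimension grows in volume with the cube of its length
volume_ratio <- length_ratio^3

# Lie factor as computed above: apparent (volume) ratio over true (data) ratio
lie_factor <- volume_ratio / data_ratio

round(c(data = data_ratio, length = length_ratio,
        volume = volume_ratio, lie = lie_factor), 3)
```

Had the graphic scaled only the area, or only the length, the same calculation with `length_ratio^2` or `length_ratio` would give a much smaller distortion.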
![Page 34: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/34.jpg)
![Page 35: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/35.jpg)
Tufte’s worst graphic ever!
![Page 36: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/36.jpg)
![Page 37: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/37.jpg)
![Page 38: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/38.jpg)
![Page 39: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/39.jpg)
http://www.ted.com/talks/view/id/92
![Page 40: Visualizing and Exploring Data - Columbia Universitymadigan/DM08/vis.pdf · Visualizing and Exploring Data Based on Chapter 3 of Hand, Manilla, & Smyth David Madigan. Introduction](https://reader035.vdocuments.us/reader035/viewer/2022062605/5fd71def49fd5565400ed191/html5/thumbnails/40.jpg)