Data Visualization
STAT 890, STAT 442, CM 462
Ali Ghodsi
Department of Statistics
School of Computer Science
University of Waterloo
aghodsib@uwaterloo.ca
September 2006
Two Problems
Classical Statistics
• Infer information from small data sets (Not enough data)
Machine Learning
• Infer information from large data sets (Too much data)
Other Names for ML
• Data mining
• Applied statistics
• Adaptive (stochastic) signal processing
• Probabilistic planning or reasoning
are all closely related to the second problem.
Applications
Machine Learning is most useful when the structure of the task is not well understood but can be characterized by a dataset with strong statistical regularity.
• Search and recommendation (e.g. Google, Amazon)
• Automatic speech recognition and speaker verification
• Text parsing
• Face identification
• Tracking objects in video
• Financial prediction, fraud detection (e.g. credit cards)
• Medical diagnosis
Tasks
• Supervised Learning: given examples of inputs and corresponding desired outputs, predict outputs on future inputs. E.g.: classification, regression
• Unsupervised Learning: given only inputs, automatically discover representations, features, structure, etc. E.g.: clustering, dimensionality reduction, feature extraction
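The difference between the two tasks can be sketched on toy data. The block below is only an illustration: the data values, the nearest-neighbor rule, and the two-cluster k-means routine are assumptions chosen for brevity, not methods named in the slides.

```python
# Toy 1-D data: illustrative values only, not taken from the slides.
train_x = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
train_y = ["low", "low", "low", "high", "high", "high"]


def nearest_neighbor_predict(x, xs, ys):
    """Supervised: label a new input with the label of the closest training input."""
    best = min(range(len(xs)), key=lambda i: abs(xs[i] - x))
    return ys[best]


def two_means(xs, iterations=10):
    """Unsupervised: split unlabeled points into two clusters (1-D k-means, k=2).

    Assumes xs contains at least two distinct values.
    """
    c0, c1 = min(xs), max(xs)  # initialize the two centers at the extremes
    for _ in range(iterations):
        g0 = [x for x in xs if abs(x - c0) <= abs(x - c1)]
        g1 = [x for x in xs if abs(x - c0) > abs(x - c1)]
        c0 = sum(g0) / len(g0)
        c1 = sum(g1) / len(g1)
    return sorted([c0, c1])


# Supervised: uses both inputs and desired outputs.
prediction = nearest_neighbor_predict(4.5, train_x, train_y)

# Unsupervised: uses the inputs only and discovers the two groups.
centers = two_means(train_x)
```

The supervised routine needs `train_y`; the unsupervised one never sees it and still recovers one center per group.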
Dimensionality Reduction
• Dimensionality: The number of measurements available for each item in a data set.
• The dimensionality of real-world items is very high.
• For example, the dimensionality of a 600 by 600 image is 360,000.
• The key to analyzing data is comparing these measurements to find relationships among this plethora of data points.
• Usually these measurements are highly redundant, and relationships among data points are predictable.
Dimensionality Reduction
• Knowing the value of a pixel in an image, it is easy to predict the value of nearby pixels since they tend to be similar.
• Knowing that the word “corporation” occurs often in articles about economics but rarely in articles about art and poetry, it is easy to predict that it will also occur rarely in articles about love.
• Although there are many measurements per item, far fewer are likely to vary. A data set that includes only the measurements likely to vary lets humans quickly and easily recognize changes in high-dimensional data.
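As a crude illustration of keeping only the measurements that vary, the sketch below drops every measurement whose value is the same for all items. The data values and the function name `varying_columns` are hypothetical. Real dimensionality reduction methods typically look for combinations of measurements, not just individual columns, so this is a simplification.

```python
from statistics import pvariance

# Hypothetical data set: 4 items, 5 measurements each (values are made up).
items = [
    [0.2, 1.0, 7.0, 3.0, 0.0],
    [0.9, 1.0, 7.0, 3.0, 0.0],
    [0.4, 1.0, 7.0, 2.0, 0.0],
    [0.7, 1.0, 7.0, 5.0, 0.0],
]


def varying_columns(rows, threshold=1e-12):
    """Return indices of measurements whose variance across items exceeds threshold."""
    n_cols = len(rows[0])
    return [
        j for j in range(n_cols)
        if pvariance([row[j] for row in rows]) > threshold
    ]


keep = varying_columns(items)                         # measurements that vary
reduced = [[row[j] for j in keep] for row in items]   # 4 items, fewer measurements
```

Here three of the five measurements are constant across items, so each item is described by two numbers instead of five without losing any information about how the items differ.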
Data Representation
1 1 1 1 1
1 0 1 0 1
1 1 1 1 1
1 0.5 0.5 0.5 1
1 1 1 1 1
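Reading the grid above as a 5 by 5 image with pixel intensities 0, 0.5 and 1 (an assumption about the figure), the image can be treated as a single data point by concatenating its rows into one vector. The helper name `flatten` is illustrative.

```python
# The 5 by 5 grid of intensities, one inner list per image row.
image = [
    [1, 1, 1, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 1, 1, 1],
    [1, 0.5, 0.5, 0.5, 1],
    [1, 1, 1, 1, 1],
]


def flatten(img):
    """Concatenate the rows of a 2-D image into one measurement vector."""
    return [pixel for row in img for pixel in row]


vector = flatten(image)
dimensionality = len(vector)  # number of measurements for this item: 5 * 5
```

The same bookkeeping gives 600 * 600 = 360,000 measurements for a 600 by 600 image.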
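Data Representation
[Figure: a 644 by 103 data matrix, in which each column is one 23 by 28 image (23 × 28 = 644 pixels), written as the product of a 644 by 2 matrix and a 2 by 103 matrix. Each 23 by 28 image is thereby represented by a 2 by 1 vector; the coordinate values shown in the figure are -2.19, -0.02, -3.19 and 1.02.]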
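The matrix sizes on this slide can be reproduced with a rank-2 factorization. The sketch below uses PCA computed through a truncated SVD, which is one common way to obtain such a factorization; the slides do not say which method produced the figure. The data here are synthetic, and all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the data: 103 "images" of 23 * 28 = 644 pixels each,
# generated from 2 hidden coordinates plus a little noise (an assumption made
# so that a 2-dimensional representation is meaningful).
basis = rng.normal(size=(644, 2))
coords = rng.normal(size=(2, 103))
X = basis @ coords + 0.01 * rng.normal(size=(644, 103))

# Center each pixel across images, then take a thin SVD.
mean = X.mean(axis=1, keepdims=True)
U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)

W = U[:, :2]                 # 644 by 2: two basis "images"
Y = np.diag(s[:2]) @ Vt[:2]  # 2 by 103: one 2 by 1 code per image

X_hat = W @ Y + mean         # rank-2 reconstruction, 644 by 103
relative_error = np.linalg.norm(X - X_hat) / np.linalg.norm(X)
```

Each column of `Y` plays the role of the 2 by 1 vector attached to an image, and `W @ Y` has the same 644 by 103 shape as the original data.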
Arranging words: Each word was initially represented by a high-dimensional vector that counted the number of times it appeared in different encyclopedia articles. Words with similar contexts are collocated.
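A minimal sketch of the word representation described here: each word is a vector of per-article counts, and words used in similar contexts have vectors pointing in similar directions. The counts, the article topics, and the choice of cosine similarity are assumptions for illustration; the slides do not specify the similarity measure or the embedding method.

```python
import math

# Hypothetical counts of three words in four articles (values are made up).
# Article order: economics, finance, art, poetry.
counts = {
    "corporation": [12, 9, 0, 1],
    "market":      [10, 11, 1, 0],
    "sonnet":      [0, 0, 3, 14],
}


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (1 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


sim_close = cosine_similarity(counts["corporation"], counts["market"])
sim_far = cosine_similarity(counts["corporation"], counts["sonnet"])
```

With these counts, "corporation" is far more similar to "market" than to "sonnet", so a low-dimensional arrangement that preserves similarity would place the first two words near each other.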
Different Features
Glasses vs. No Glasses
Beard vs. No Beard
Beard Distinction
Glasses Distinction
Multiple-Attribute Metric
Embedding of sparse music similarity graph
Platt, 2004
Reinforcement learning
Mahadevan and Maggioni, 2005
Semi-supervised learning
Use graph-based discretization of manifold to infer missing labels.
Build classifiers from bottom eigenvectors of graph Laplacian.
Belkin & Niyogi, 2004; Zien et al, Eds., 2005
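The two steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not the cited authors' exact procedure: the toy data, the Gaussian edge weights with width 0.5, the unnormalized Laplacian, and the least-squares classifier are all choices made here for brevity.

```python
import numpy as np

# Toy inputs: two well-separated groups on a line (made-up values).
x = np.array([0.0, 0.1, 0.2, 0.3, 0.4, 3.0, 3.1, 3.2, 3.3, 3.4])
# Labels: +1 / -1 where known, 0 where missing. Only two points are labeled.
y = np.array([1, 0, 0, 0, 0, 0, 0, 0, 0, -1])

# Weighted graph on the data: Gaussian similarity between every pair of points.
diff = x[:, None] - x[None, :]
W = np.exp(-diff**2 / (2 * 0.5**2))
np.fill_diagonal(W, 0.0)

# Unnormalized graph Laplacian L = D - W, with D the diagonal degree matrix.
L = np.diag(W.sum(axis=1)) - W

# eigh returns eigenvalues in ascending order, so the first columns are the
# "bottom" eigenvectors (the smoothest functions on the graph).
eigenvalues, eigenvectors = np.linalg.eigh(L)
E = eigenvectors[:, :2]

# Least-squares fit of the known labels in the span of the bottom eigenvectors,
# then read off the sign at every point to fill in the missing labels.
labeled = y != 0
coef, *_ = np.linalg.lstsq(E[labeled], y[labeled], rcond=None)
predicted = np.sign(E @ coef).astype(int)
```

Because the bottom eigenvectors are nearly constant within each tightly connected group, one labeled point per group is enough to label the remaining eight points in this toy example.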
Learning correspondences
How can we learn manifold structure that is shared across multiple data sets?
c et al, 2003, 2005
Mapping and robot localization
Bowling, Ghodsi, Wilkinson 2005
Ham, Lin, D.D. 2005
The Big Picture
Manifold and Hidden Variables
Reading
• Journals: Neural Computation, JMLR, ML, IEEE PAMI
• Conferences: NIPS, UAI, ICML, AI-STATS, IJCAI, IJCNN
• Vision: CVPR, ECCV, SIGGRAPH
• Speech: EuroSpeech, ICSLP, ICASSP
• Online: CiteSeer, Google
• Books:
– Elements of Statistical Learning, Hastie, Tibshirani, Friedman
– Learning from Data, Cherkassky, Mulier
– Machine Learning, Mitchell
– Neural Networks for Pattern Recognition, Bishop
– Introduction to Graphical Models, Jordan et al.