unsupervised learning
TRANSCRIPT
![Page 1: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/1.jpg)
Unsupervised Learning: Factor & Cluster Analysis
D3M
![Page 2: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/2.jpg)
Learning Resources: Video Series from Stanford
![Page 3: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/3.jpg)
Factor & Cluster Analysis
Learning Objectives
• Unsupervised learning methods: principal component analysis, factor analysis, and clustering
• Objective is dimension reduction
o Reduce the number of collinear variables (PCA/factor analysis)
o Group your rows (e.g. customers, markets, counties): cluster analysis
Learning Resources
• MIT Open Courses, Lectures 11 & 14
• Data Mining class at U of Chicago (lecture notes 7 & 8)
• Class notes
![Page 4: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/4.jpg)
Basic Idea
• Data exploration
• A-theoretical but not mindless
• Essentially looking for ‘similarities’
o Between variables (columns): principal component/factor analysis
o Between subjects (rows): clustering algorithms
![Page 5: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/5.jpg)
Examples
Time series of Stock Prices
Items sold in supermarket
Attributes of Fortune 500 companies
Attributes of Brands (Perceptual or Real)
Customer Base of Amazon
Cluster webpages
Biological Attributes of Different Species
Attributes of State/County/Zip Codes
Google searches of keywords
Demographics/Shares of our Brand across stores
![Page 6: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/6.jpg)
Example: Marketing Research
• PRIZM (“Potential Ratings Index for Zip Markets”) by Claritas Inc.
– “Birds of a feather flock together”
– 62 neighborhood (zip code) based groups that are similar on demographic and behavioral characteristics
– Used for store location decisions, direct marketing, media selection, etc.
• http://www.claritas.com/MyBestSegments/Default.jsp
![Page 7: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/7.jpg)
Key Methods
• Two key research tools
o Cluster Analysis: tool for actually constructing segments
o Factor Analysis: tool for “data reduction”
![Page 8: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/8.jpg)
Difference between cluster and factor analysis
[Diagram: a data table with columns V1, V2, V3, V4, V5, …, V20. Cluster analysis groups subjects (rows); factor analysis groups variables (columns).]
![Page 9: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/9.jpg)
Factor Analysis
![Page 11: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/11.jpg)
Factor Analysis: Main Use
• Factor analysis can be used for data reduction (i.e., to reduce the number of variables needed for analysis).
• It summarizes the information contained in a larger number of variables into a smaller number of ‘factors’ without significant loss of information.
![Page 12: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/12.jpg)
Example: Basis of Moral Foundations
5 underlying factors behind these questions:
• Harm/care
• Authority/respect
• Fairness/reciprocity
• Ingroup/loyalty
• Purity/sanctity
![Page 13: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/13.jpg)
Factor Analysis
• Data reduction is important when you need to measure “fuzzy” concepts such as ‘love’, ‘trust’ or ‘satisfaction’.
• Ask a series of questions that tap into the different components of the concept.
• Too many variables! Factor analysis can help reduce this dimensionality problem.
![Page 14: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/14.jpg)
Intuition
• Factor analysis assumes that the correlations among a large number of variables arise because they all depend on the same small number of “factors”. Analyze the patterns of correlations to tap into the underlying construct.
• Example: car ratings
o Attributes: perception of seats, perception of noise, perception of smoothness of ride, perception of AC system
o Factor: perception of “quality”
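To make the intuition concrete, here is a minimal sketch of what a one-factor model implies for the car-rating attributes. It is written in Python rather than the R used elsewhere in the deck, and the loadings are invented for illustration. It assumes standardized ratings, so each attribute is modeled as x_i = l_i * f + e_i with Var(f) = 1, and the implied correlation between two attributes is the product of their loadings.

```python
# Hypothetical loadings of each car attribute on a single "quality" factor.
# These numbers are invented for illustration, not taken from the slides.
loadings = {
    "seats": 0.8,
    "noise": 0.7,
    "ride_smoothness": 0.9,
    "ac_system": 0.6,
}


def implied_correlations(loadings):
    """Correlation matrix implied by a one-factor model with standardized variables.

    Each attribute is modeled as x_i = l_i * f + e_i with Var(f) = 1 and
    Var(e_i) = 1 - l_i**2, so corr(x_i, x_j) = l_i * l_j for i != j.
    """
    names = list(loadings)
    return {
        a: {b: 1.0 if a == b else loadings[a] * loadings[b] for b in names}
        for a in names
    }


corr = implied_correlations(loadings)
for name, row in corr.items():
    print(f"{name:16s}", "  ".join(f"{v:.2f}" for v in row.values()))
```

Attributes that load strongly on the same factor end up strongly correlated with each other, which is the pattern factor analysis works backwards from.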
![Page 15: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/15.jpg)
Psychology: The “Big Five”
Trait: example characteristics
• Openness: imaginative, insightful
• Conscientiousness: organized, thorough
• Extraversion: energetic, assertive
• Agreeableness: sympathetic, kind, affectionate
• Neuroticism: tense, moody, anxious
MKTG450
![Page 16: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/16.jpg)
Cluster Analysis
![Page 18: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/18.jpg)
Cluster Analysis
• Cluster analysis is a technique used to identify groups of ‘similar’ customers in a market (i.e., market segmentation).
• Cluster analysis encompasses a number of different algorithms and methods for grouping objects of a similar kind into categories.
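The slides do not commit to a particular clustering algorithm. As one common choice, the sketch below implements plain k-means (Lloyd's algorithm) in Python. The customer data, the function name `kmeans`, and the farthest-first starting rule are all assumptions made for this illustration; they are not part of the course material.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) with a deterministic farthest-first start."""
    # Start from the first point, then repeatedly add the point that is
    # farthest from the centers chosen so far.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(
            max(points, key=lambda p: min(squared_distance(p, c) for c in centers))
        )
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centers.append(centers[c])  # leave an empty cluster's center alone
        if new_centers == centers:
            break  # converged: no center moved
        centers = new_centers
    return labels, centers


# Hypothetical customers: (store visits per month, average basket size in $10s).
customers = [(1, 2), (2, 1), (1, 1), (9, 8), (8, 9), (9, 9)]
labels, centers = kmeans(customers, k=2)
print(labels)
print(centers)
```

In practice the variables would usually be standardized first, since k-means relies on Euclidean distance and a variable on a larger scale would otherwise dominate the grouping.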
![Page 19: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/19.jpg)
Application Example: Market Segmentation
o Process of dividing a total market into groups of consumers who have similar needs and who respond similarly to marketing mix variables.
![Page 20: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/20.jpg)
• General question: how to organize observed data into meaningful structures
• Examples:
o In food stores, items of a similar nature, such as different types of meat or vegetables, are displayed in the same or nearby locations.
o Biologists have to organize the different species of animals: man belongs to the primates, the mammals, the amniotes, the vertebrates, and the animals.
o In medicine, clustering diseases, cures for diseases, or symptoms of diseases can lead to very useful taxonomies.
o In the field of psychiatry, the correct diagnosis of clusters of symptoms such as paranoia, schizophrenia, etc. is essential for successful therapy.
o Collaborative filtering & recommendation systems
![Page 22: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/22.jpg)
Example 1: Segmenting Stores in Soup Case Study
D3M
![Page 23: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/23.jpg)
Demographics Are Highly Correlated
![Page 24: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/24.jpg)
Cluster of Variables (ClustOfVar package in R)
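The ClustOfVar package uses its own homogeneity criterion, which is not reproduced here. As a rough stand-in for the idea of grouping variables, the Python sketch below runs average-linkage agglomerative clustering with 1 - |r| as the distance between two variables. The variable names and the correlation matrix are invented for illustration.

```python
# Hypothetical correlations among four store demographics (invented numbers).
names = ["income", "education", "age_60_plus", "retired"]
corr = [
    [1.00, 0.85, -0.10, -0.15],
    [0.85, 1.00, -0.05, -0.20],
    [-0.10, -0.05, 1.00, 0.90],
    [-0.15, -0.20, 0.90, 1.00],
]


def cluster_variables(names, corr, n_clusters):
    """Average-linkage agglomerative clustering with distance 1 - |r|."""
    clusters = [[i] for i in range(len(names))]

    def distance(a, b):
        # Mean of 1 - |r| over all cross-cluster variable pairs.
        pairs = [(i, j) for i in a for j in b]
        return sum(1 - abs(corr[i][j]) for i, j in pairs) / len(pairs)

    while len(clusters) > n_clusters:
        # Find the two closest clusters and merge them.
        a, b = min(
            ((x, y) for x in range(len(clusters)) for y in range(x + 1, len(clusters))),
            key=lambda xy: distance(clusters[xy[0]], clusters[xy[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [[names[i] for i in c] for c in clusters]


groups = cluster_variables(names, corr, n_clusters=2)
print(groups)
```

Using the absolute value of the correlation means strongly negatively correlated variables are also grouped together, since they carry much of the same information.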
![Page 25: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/25.jpg)
Interpret the Factors
These are called factor “loadings”. Each loading measures the correlation between a demographic and the underlying “factor”. It is our job to interpret the factors and put a label on each.
![Page 26: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/26.jpg)
Information Captured
| | Factor1 | Factor2 | Factor3 |
|---|---|---|---|
| SS loadings | 3.143 | 2.961 | 1.671 |
| Proportion Var | 0.314 | 0.296 | 0.167 |
| Cumulative Var | 0.314 | 0.610 | 0.777 |

Using 3 “factors” instead of 10 demographics, we capture approx. 78% of the information in the data.
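The arithmetic behind these figures can be checked with a short Python sketch. It assumes the 10 demographics were standardized, so each contributes one unit of variance and a factor's share is its sum of squared loadings divided by 10.

```python
from itertools import accumulate

ss_loadings = [3.143, 2.961, 1.671]  # SS loadings reported for the three factors
n_variables = 10                     # the 10 demographics, assumed standardized

# Share of total variance per factor, then the running total.
proportion = [s / n_variables for s in ss_loadings]
cumulative = list(accumulate(proportion))

for i, (p, c) in enumerate(zip(proportion, cumulative), start=1):
    print(f"Factor{i}: proportion = {p:.3f}, cumulative = {c:.3f}")
```

The last cumulative value is the roughly 78% quoted on the slide.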
![Page 27: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/27.jpg)
![Page 28: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/28.jpg)
![Page 29: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/29.jpg)
Example 2: Segmenting US Counties
D3M
![Page 30: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/30.jpg)
Files Used: US_Counties.csv, Segment_US_County.R
• Suppose we are analyzing data based on US counties
o Demographic variables
o Health outcomes
o Crime rates
o Voting behavior
o Religion
o Market shares of brands
o Google searches
![Page 31: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/31.jpg)
Hard to even see, let alone understand. Basically, a bunch of variables are highly correlated.
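When a correlation plot has too many variables to read, one option is to list the strongly correlated pairs instead. The Python sketch below does this for a few made-up county-level columns. The column names, the values, and the 0.8 threshold are assumptions for illustration and are not taken from US_Counties.csv.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def strong_pairs(columns, threshold=0.8):
    """Return (name_a, name_b, r) for pairs with |r| >= threshold, strongest first."""
    pairs = [
        (a, b, pearson(columns[a], columns[b]))
        for a, b in combinations(columns, 2)
    ]
    return sorted(
        (p for p in pairs if abs(p[2]) >= threshold),
        key=lambda p: -abs(p[2]),
    )


# Hypothetical county-level columns (invented values).
counties = {
    "median_income": [40, 55, 70, 85, 100],
    "pct_college": [15, 22, 30, 36, 45],
    "crime_rate": [9, 3, 7, 4, 8],
}
result = strong_pairs(counties)
for a, b, r in result:
    print(f"{a} ~ {b}: r = {r:.2f}")
```

Pairs that show up in this list are candidates for being summarized by a single factor or grouped into one variable cluster.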
![Page 32: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/32.jpg)
Cluster of Variables (ClustOfVar package in R)
![Page 33: Unsupervised learning](https://reader035.vdocuments.us/reader035/viewer/2022062407/55cfbf92bb61eb1b3d8b4673/html5/thumbnails/33.jpg)