data analytics analytics a short tour.pdf · •you can cook up your schema •eases data...
TRANSCRIPT
![Page 1: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/1.jpg)
Data AnalyticsA (Short) Tour
Venkatesh-Prasad Ranganath
http://about.me/rvprasad
Click to edit Master title style
![Page 2: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/2.jpg)
Is it Analytics or Analysis?
Analytics uses analysis to recommend actions or make decisions.
![Page 3: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/3.jpg)
Why Data Analysis?
Confirm a hypothesisConfirmatory
Explore the dataExploratory (EDA)
![Page 4: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/4.jpg)
Word of Caution – Case of Killer Potatoes?
This is figure 1.5 in the book “Exploring Data” by Ronald K Pearson.
![Page 5: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/5.jpg)
Word of Caution – Case of Killer Potatoes?
This is figure 1.6 in the book “Exploring Data” by Ronald K Pearson.
![Page 6: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/6.jpg)
Typical Data Analytics Work Flow
1. Identify Issue
2. Data Collection, Storage, Representation, and Access
3. Data Cleansing
4. Data Transformation
5. Data Analysis (Processing)
6. Result Validation
7. Result Presentation (Visual Validation)
8. Recommend Action / Make Decision
![Page 7: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/7.jpg)
Data Collection – Approaches
Observation Monitoring
Interviews Surveys
![Page 8: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/8.jpg)
Data Collection – Comparing Approaches
Observation Interviews Surveys Monitoring
Technique Shadowing Conversation Questionnaire Logging
Interactive No Yes No No
Simple No No Yes Yes
Automatable No No Yes Yes
Scalable No No Yes Yes
Data Size Small Small Medium Huge
Data Format Flexible Flexible Rigid Rigid
Data Type Qualitative Qualitative Qualitative Quantitative
Real Time Analysis No No No Yes
Expensive Yes Yes No No
![Page 9: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/9.jpg)
Data Collection – Comparing Approaches
Observation Interviews Surveys Monitoring
What to capture? Flexible Flexible Fixed Fixed
How to capture? Flexible Flexible Fixed Fixed
Human Subjects Yes Yes Yes No
Transcription Yes Yes Yes/No No
SnR High High High Low
Involves NLP Unlikely Unlikely Likely Likely
Kind of Analysis Confirmatory Confirmatory Confirmatory Exploratory
Kind of Techniques Statistical Testing Statistical Testing Statistical Testing Machine Learning
![Page 10: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/10.jpg)
Data Storage – Choices
• Flat Files
• Databases
• Streaming Data (but there is no storage)
![Page 11: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/11.jpg)
Data Storage – Flat Files
• Simple
• Common / Universal
• Inexpensive
• Independent of specific technology
• Compression friendly
• Very few choices• Plain text, CSV, XML, and JSON
• Well established
• Low level data access APIs
• No support for automatic scale out / parallel access
• Unoptimized data access• Indices
• Columnar storage
![Page 12: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/12.jpg)
Data Storage – Databases
• High level data access API
• Support for automatic scale out / parallel access
• Optimized data access• Indices
• Columnar storage
• Well established
• Complex
• Niche / Requires experts• Optimization
• Distribution
• Expensive
• Dependent on specific technology
• DB controlled compression
• Lots of choices• SQL, MySQL, PostgreSQL, Maria, Raven,
Couch, Redis, Neo4j, ….
![Page 13: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/13.jpg)
Data Storage – Streaming
• Well, there is not storage
• Novel
• Many streaming data sources
• Breaks traditional data analysis algorithms• No access to the entire data set
• Too many unknowns• Expertise
• Cost
• Best practices
• Accuracy
• Benefits
• Deficiencies
• Ease of use
![Page 14: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/14.jpg)
Data Storage – Algorithms and Necessity
• Offline
• Online
• Streaming
• Real-time
• Flat Files
• Databases
• Streaming Data
• Do we need fast?• How fast is quick enough?• How often do we need fast?• Is it worth the cost?• Is it worth the loss of accuracy?
![Page 15: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/15.jpg)
Data Representation – Structured
• Easy to process
• One time schema setup cost
• Common schema types• CSV, XML, JSON, …• You can cook up your schema
• Eases data exploration & analysis
• Off-the-shelf techniques to handle data
• Requires very little expertise
• Ideal with automatic data collection
• Ideal for storing quantitative data
• Rigid• Changing schema can be hard
• Upfront cost to define the schema
![Page 16: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/16.jpg)
Data Representation – Unstructured
• Flexible
• Off-the-shelf techniques to preprocess data but requires expertise
• Ideal for manual data collection
• Requires lots of preprocessing
• Complicates data exploration and analysis
• Requires domain expertise
• Extracting data semantics is hard
• Requires schema recovery *
![Page 17: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/17.jpg)
Data Access – Security
• Who has access to what parts of the data?
• What is the access control policy?
• How do we enforce these policies?• What techniques do we employ to enforce these policies?
• How do we ensure the policies have been enforced?
![Page 18: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/18.jpg)
Data Access – Privacy
• Who has access to what parts of the data?
• Who has access to what aspects of the data?
• How do you ensure the privacy of the source?
• What are the access control and anonymization policies?
• How do we enforce these policies?• What techniques do we employ to enforce these policies?
• How do we ensure the policies have been enforced?
• How strong is the anonymization policy?• Is it possible to recover the anonymized information? If so, how hard?
![Page 19: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/19.jpg)
Data Scale
• Nominal• Male, Female• Equality operation
• Ordinal• Very satisfied, satisfied, dissatisfied, and very dissatisfied• Inequalities operations
• Interval• Temperature, dates• Addition and subtraction operations
• Ratio• Mass, length, duration• Multiplication and division operations
![Page 20: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/20.jpg)
Typical Data Analytics Work Flow
1. Identify Issue
2. Data Collection, Storage, Representation, and Access
3. Data Cleansing
4. Data Transformation
5. Data Analysis (Processing)
6. Result Validation
7. Result Presentation (Visual Validation)
8. Recommend Action / Make Decision
![Page 21: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/21.jpg)
Data Cleansing
Let’s get our hands dirty!!
The data set is from UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/).
![Page 22: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/22.jpg)
Data Cleansing – Common Issues
• Missing values
• Extra values
• Incorrect format
• Encoding
• File corruption
• Incorrect units
• Too much data
• Outliers
• Inliers
![Page 23: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/23.jpg)
Typical Data Analytics Work Flow
1. Identify Issue
2. Data Collection, Storage, Representation, and Access
3. Data Cleansing
4. Data Transformation
5. Data Analysis (Processing)
6. Result Validation
7. Result Presentation (Visual Validation)
8. Recommend Action / Make Decision
![Page 24: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/24.jpg)
Data Transformation (Feature Engineering)
• Analyze specific aspects of the data
• Coarsening data • Discretization
• Changing Scale
• Normalization
![Page 25: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/25.jpg)
Data Transformation (Feature Engineering)
• Analyze specific aspects of the data
• Coarsening data • Discretization
• Changing Scale
• NormalizationBMI BMI Categories
< 18.5 Underweight
18.5 – 24.9 Normal Weight
25 – 29.9 Overweight
> 30 Obesity
![Page 26: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/26.jpg)
Data Transformation (Feature Engineering)
• Analyze specific aspects of the data
• Coarsening data • Discretization
• Changing Scale
• NormalizationActual Weight Normalized
78 0.285
88 0.322
62 0.227
45 0.164
![Page 27: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/27.jpg)
Data Transformation (Feature Engineering)
• Analyze relations between features of the data
• Synthesize new features• Relating existing features
• Combining existing features
![Page 28: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/28.jpg)
Data Transformation
Let’s get out hands dirty!!
![Page 29: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/29.jpg)
Data Transformation (Feature Engineering)
Keep in mind the following:
• Scales• What the permitted operations?
• Data Collection• What is the trade-offs in data collection?
• Parsimony• Can we get away with simple scales?
![Page 30: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/30.jpg)
Typical Data Analytics Work Flow
1. Identify Issue
2. Data Collection, Storage, Representation, and Access
3. Data Cleansing
4. Data Transformation
5. Data Analysis (Processing)
6. Result Validation
7. Result Presentation (Visual Validation)
8. Recommend Action / Make Decision
![Page 31: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/31.jpg)
Data Analysis
• Features• Attributes of each datum
• Labels• Expert’s input about datum
• Data sets• Training• Validation• Test
• Work flow• Model building (training)• Model tuning and selection (validation)• Error reporting (test)
![Page 32: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/32.jpg)
Data Analysis – Models
The figure is from the book “Modern Multivariate Statistical Techniques” by Alan Julian Izenman.
![Page 33: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/33.jpg)
Typical Data Analytics Work Flow
1. Identify Issue
2. Data Collection, Storage, Representation, and Access
3. Data Cleansing
4. Data Transformation
5. Data Analysis (Processing)
6. Result Validation
7. Result Presentation (Visual Validation)
8. Recommend Action / Make Decision
![Page 34: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/34.jpg)
Result Validation – Approaches
• Expert Inputs
• Cross Validation• K-fold cross validation
• 5x2 cross validation
• Bootstrapping
![Page 35: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/35.jpg)
Result Validation – Basic Terms
Classification
Actuals
X Y
X True X (tx) False Y (fy) p = tx + fy
Y False X (fx) True Y (ty) n = fx + ty
p’ = tx + fx N = p + n
Consider a 2-class classification problem.
![Page 36: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/36.jpg)
Result Validation – Basic Terms
Now, consider X as positive evidence and Y as negative evidence.
Classification
Actuals
X Y
X True Positive (tp) False Negative (fn) p = tp + fn
Y False Positive (fp) True Negative (tn) n = fp + tn
p’ = tp + fp N = p + n
![Page 37: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/37.jpg)
Result Validation – Measures
error = (fp + fn) / N
accuracy = (tp + tn) / N
tp-rate = tp / p
fp-rate = fp / n
sensitivity = tp / p = tp-rate
specificity = tn / n = 1 – fp-rate
precision = tp / p’
recall = tp / p = tp-rate
Classification
Actuals
X Y
X True Positive (tp) False Negative (fn) p = tp + fn
Y False Positive (fp) True Negative (tn) n = fp + tn
p’ = tp + fp N = p + n
![Page 38: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/38.jpg)
Result Validation – ROC (Receiver Operating Characteristics)
The figure is from the Wikipedia page about “Receiver Operating Characteristics”.
![Page 39: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/39.jpg)
Result Validation – Class Confusion Matrix
A B
A True A (ta) False A (fa)
B False A (fa) True B (tb)
A B C D
A True A (ta) False B (fb) False C (fc) False D (fd)
B False A (fa) True B (tb) False C (fc) False D (fd)
C False A (fa) False B (fb) True C (tc) False D (fd)
D False A (fa) False B (fb) False C (fc) True D (td)
2 class problem 4 class problem
![Page 40: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/40.jpg)
Result Validation – Bias and Variance
The figure is from http://scott.fortmann-roe.com/docs/BiasVariance.html.
![Page 41: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/41.jpg)
Result Validation – Underfitting & Overfitting
The figure is from the book “Modern Multivariate Statistical Techniques” by Alan Julian Izenman.
![Page 42: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/42.jpg)
Result Validation
Let’s get out hands dirty!!
![Page 43: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/43.jpg)
Typical Data Analytics Work Flow
1. Identify Issue
2. Data Collection, Storage, Representation, and Access
3. Data Cleansing
4. Data Transformation
5. Data Analysis (Processing)
6. Result Validation
7. Result Presentation (Visual Validation)
8. Recommend Action / Make Decision
![Page 44: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/44.jpg)
Result Presentation (Visual Validation)
• Numbers• Central tendencies – mode, median, and mean
• Dispersion – range, standard deviation
• Five number summary• min, 1st quartile, median (mean), 3rd quartile, max
• Margin of error
• (Confidence) Interval
• Tables
• Charts
![Page 45: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/45.jpg)
Result Presentation – Box-and-Whisker Plots
The figure is from the Wikipedia page about “Box Plot”.
![Page 46: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/46.jpg)
Result Presentation – Histograms
The figure is from the book “Data Visualization: A Successful Design Process” by Andy Kirk.
![Page 47: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/47.jpg)
Result Presentation – Line Graphs
The figure is from the book “Data Visualization: A Successful Design Process” by Andy Kirk.
![Page 48: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/48.jpg)
Result Presentation – Scatter Plots
The figure is from the book “Data Visualization: A Successful Design Process” by Andy Kirk.
![Page 49: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/49.jpg)
Result Presentation – Heatmaps
The figure is from the book “Data Visualization: A Successful Design Process” by Andy Kirk.
![Page 50: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/50.jpg)
Result Presentation – Bubble Chart
The figure is from the book “Data Visualization: A Successful Design Process” by Andy Kirk.
![Page 51: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/51.jpg)
Result Presentation – Word Cloud
The figure is from the book “Data Visualization: A Successful Design Process” by Andy Kirk.
![Page 52: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/52.jpg)
Typical Data Analytics Work Flow
1. Identify Issue
2. Data Collection, Storage, Representation, and Access
3. Data Cleansing
4. Data Transformation
5. Data Analysis (Processing)
6. Result Validation
7. Result Presentation (Visual Validation)
8. Recommend Action / Make Decision
![Page 53: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/53.jpg)
Now, are you thinking..
• What about identifying the issue/question?• Know where you are going before you start
• What about recommending action / making the decision?• Information and knowledge aren’t the same
• Are data X tasks really that important and hard?• Garbage in, Garbage out
• Aren’t data analysis techniques the most important?• Smart data (structures) and dumb code works better than the other way
around
![Page 54: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/54.jpg)
Some Personal Observations
• Domain Knowledge is crucial• Optimizing analysis
• Improving relevance of results
• Always prefer • Simple solutions over complex solutions
• Fast solutions over slow solutions
• Most correct solutions over fully correct solutions (even no solution!!)
• If you hear “I want it now”, then say “Really? Please Explain”
• Visualization helps only if the results are relevant
• Limited data sets can only get you so far
• Security and privacy are more important than you think
![Page 55: Data Analytics Analytics A Short Tour.pdf · •You can cook up your schema •Eases data exploration & analysis •Off-the-shelf techniques to handle data •Requires very little](https://reader036.vdocuments.us/reader036/viewer/2022071001/5fbd515e968a6c41c929452d/html5/thumbnails/55.jpg)
Got any stories from the trenches?