part ii. excelanalytics€¦ · 02.10.2019 · filters 5. drill down demo: custom calculations....
TRANSCRIPT
Part II. Excel AnalyticsAnalysingData using Pivot Tables & Pivot Charts
Dr. Rodolfo Raga Jr.Angeles University FoundationJune 28 - 30, 2018
DataScienceJundyRaga
Topic Outline
1. PivotTable Overview1. Grouping Data2. Summarize Values By, Show Values As3. Calculated Fields4. Filters5. Drill down
2. PivotCharts Overview1. Chart Formatting: Gap Width, Format Data Labels,2. Formatting Line Chart Markers3. Chart Sizing and Chart Elements
DataScienceJundyRaga
First Question…
What do you do when you have lots of data in excel?
Where do you start Analyzing?
DataScienceJundyRaga
Sample Dataset with 58 Attributes and
49,659 observations
DataScienceJundyRaga
One way to do it is to use PivotTables
Pivot tables can summarize and analyze large amounts of data into summary reports.
DataScienceJundyRaga
Questions that Pivot Tables can answer
◦Who sold the most, and where? ◦Which quarters were the most profitable?◦Which product sold best?
DataScienceJundyRaga
Parts of a PivotTable
DataScienceJundyRaga
Data suitable for Pivot Table analysis:
•columns are considered fields, •column headings are field names, and •rows are records.
Important: data should have no blank columns or rows
DataScienceJundyRaga
Pivot Tables simulate the OLAP Data
Cube Structure
OLAP stands for online
analytical processing and
cube is another word for a
multi-dimensional set of
data, so an OLAP cube is a
staging space for analysis
of information.
DataScienceJundyRaga
Pivot Tables simulate the OLAP Data
Cube Structure
Basically, using Pivot Tables will enable
students to more easily understand how a cube
mechanism is used to query and analyze data in
organized, dimensional structures..
DataScienceJundyRaga
Inserting Pivot Tables
DataScienceJundyRaga
The Create PivotTable Dialog box.
DataScienceJundyRaga
Initial Empty Pivot Table
DataScienceJundyRaga
A Defined Pivot Table
DataScienceJundyRaga
The ribbon for PivotTables
DataScienceJundyRaga
Parts of a PivotTable
DataScienceJundyRaga
• When you create a PivotTable, a copy of the data is
stored in a pivot cache. Any changes to the data won’t
show up in the report until you refresh the cache.
• To refresh the data, right-click the pivot table and click
Refresh Data.
• The data needs to be clean. Any blank rows, blank
columns, or text in a number field will give unpredictable
results.
• For large datasets, use Conditional formatting
on number fields to search for invalid data.
• The data must have Column Headings in the
first row and must have no repeating
columns of data
Important Notes on PivotTable
DataScienceJundyRaga
Summarize Values By
You can summarize the variables in your PivotTablethrough different aggregate functions, depending onthe needs of your analysis.
These aggregate functions could be Sum, Average,Count, Min, Max, etc.
Excel, by default, summarizes via Sum whenever thevariable is or looks numeric, and via Countwhenever the variable is or looks like text.
PROTIP: Be careful when summarizing numeric-looking IDs or phone numbers. They should becounted, not summed (a sum of customer IDs ismeaningless).
DataScienceJundyRaga
Show Values As
You can also display the variables in yourPivotTable not as the raw results of your aggregatefunctions but as percentages, running totals,differences, ranks, or indeces.
PROTIP: Summarize Values By Average and ShowValues As % of Row or Column Totals make yourvariables comparable across groups/bins/bucketsdespite these groups having different sample sizes.
DataScienceJundyRaga
Show Values As
Source: Microsoft Office Support, https://support.office.com/en-us/article/Show-different-calculations-in-PivotTable-value-fields-014d2777-baaf-480b-a32b-98431f48bfec
No Calculation Displays the value that is entered in the field.
% of Grand Total Displays values as a percentage of the grand total of all the values or data points in the report.
% of Column Total Displays all the values in each column or series as a percentage of the total for the column or series.
% of Row Total Displays the value in each row or category as a percentage of the total for the row or category.
% Of Displays values as a percentage of the value of the Base item in the Base field.
% of Parent Row Total Calculates values as follows:(value for the item) / (value for the parent item on rows)
% of Parent Column Total Calculates values as follows:(value for the item) / (value for the parent item on columns)
% of Parent Total Calculates values as follows:(value for the item) / (value for the parent item of the selected Base field)
Dif ference From Displays values as the difference from the value of the Base item in the Base field.
% Dif ference From Displays values as the percentage difference from the value of the Base item in the Base field.
Running Total in Displays the value for successive items in the Base field as a running total.
% Running Total in Calculates the value as a percentage for successive items in the Base field that are displayed as a running total.
Rank Smalles t to Larges t Displays the rank of selected values in a specific field, listing the smallest item in the field as 1, and each largervalue with a higher rank value.
Rank Larges t to Smalles t Displays the rank of selected values in a specific field, listing the largest item in the field as 1, and each smallervalue with a higher rank value.
Index Calculates values as follows:((value in cell) x (Grand Total of Grand Totals)) / ((Grand Row Total) x (Grand Column Total))
DataScienceJundyRaga
1. Grouping Data
2. Summarize Values By, Show Values As
3. Calculated Fields
4. Filters
5. Drill down
Demo: Custom Calculations
DataScienceJundyRaga
PivotCharts
• A chart that dynamically changes depending on the contents of itsaccompanying PivotTable
• There is no PivotChart without a PivotTable.
• To create a PivotChart, go to any cell in your PivotTable and insert achart.
• You cannot create a histogram from a PivotTable.
PRO TIP: PivotCharts are a bit temperamental. Often, for a finalpresentation-ready output, it is a good idea to copy the contents ofyour PivotTable and paste them as values into another sheet andcreate a chart outside the PivotTable.
DataScienceJundyRaga
PivotCharts
To create a Pivot chart, click anywhere inside the PivotTable, then click on PivotTable Tools >> Analyze >> PivotChart then follow the wizard guided steps
DataScienceJundyRaga
1. Chart Formatting: Gap Width, Format Data Label2. Formatting Line Chart Markers3. Chart Sizing and Chart Elements
Demo: Custom Calculations
DataScienceJundyRaga
Exercise
Download the IT Help DeskDataset
Use Pivot Tables and Pivot Charts to analyze the dataset and find answers to
the following questions:
o Are most helpdesk tickets resolved immediately, if so, how many?
o Which helpdesk tickets are more common? Requests or Issues?
o Are older tickets mostly Requests or Issues?
o As helpdesk tickets age, does user satisfaction decreases with it? (by how much?)
o Could the Help Desk respond to more Issues immediately by deferring
Requests (which are less time-sensitive)?
o Challenge: Present all stories in the same chart.
Data Source: IBM Watson Analytics,
https://community.watsonanalytics.com/wp-content/uploads/2015/03/WA_Fn-UseC_-IT-Help-Desk.csv
DataScienceJundyRaga
DataScienceJundyRaga
What is the end-goal of this
course?Solution:
Classify these variables.
DaysOpen
TicketType
Satisfaction
discrete
categorical
categorical
DataScienceJundyRaga
What is the end-goal of this
course?Solution:
What do you want to show?
Use as an element of
time
Show composition of
tickets
Show slight decrease
over time. Thus, need to
transform intonumeric.
DaysOpen
TicketType
Satisfaction
discrete
categorical
categorical
DataScienceJundyRaga
What is the end-goal of this
course?Solution:
How can you show it?
time
composition
transform
into numeric
DaysOpen
TicketType
Satisfaction
discrete
categorical
categorical
On the x-axis
bars, stacked
% of tickets where
Satisfaction is “Satisfied”
or “Highly Satisfied”
DataScienceJundyRaga
What is the end-goal of this
course?Solution:
How can you show it?
time
composition
decrease
DaysOpen
TicketType
Satisfaction
Rate
discrete
categorical
continuous
x-axis
bars, stacked
line (or timeseries)