data science, data & dashboards design
Post on 13-Dec-2014
Data Science, Data & Dashboards
Koo Ping Shung
pskoo@smu.edu.sg/koolanalytics@gmail.com
Acknowledgement
Materials for these slides are adapted from:
Week 3 materials of Coursera’s Data Scientist Toolbox module (https://www.coursera.org/course/datascitoolbox)
Information Dashboard Design (2006) by Stephen Few
Data Science Questions
Descriptive
Exploratory
Inferential
Predictive
Causal
Mechanistic
Data Science Questions (I)
Descriptive
Descriptive statistics on a data set. Sets the ground for further analysis.
Exploratory
Getting familiar with the data. Finding some initial/preliminary patterns in the data.
Inferential
Using a small sample to say something about the population.
Central Limit Theorem: how large a sample is enough?
Choosing an unbiased sample.
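The descriptive and inferential questions above can be illustrated with a minimal sketch. The population here is made-up data (hypothetical purchase amounts), and the helper names `describe` and `estimate_mean` are assumptions for illustration, not part of the original slides. The 1.96 multiplier is the usual normal approximation that the Central Limit Theorem justifies for reasonably large samples.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: made-up purchase amounts (skewed, not normal).
population = [random.expovariate(1 / 50) for _ in range(100_000)]

def describe(values):
    """Descriptive step: summarise the data we actually have."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),
    }

def estimate_mean(sample):
    """Inferential step: estimate the population mean with a rough 95% interval."""
    mean = statistics.mean(sample)
    std_error = statistics.stdev(sample) / len(sample) ** 0.5
    return mean, mean - 1.96 * std_error, mean + 1.96 * std_error

for n in (30, 300, 3000):
    # A simple random sample gives every member the same chance of selection.
    sample = random.sample(population, n)
    mean, low, high = estimate_mean(sample)
    print(f"n={n:5d}  mean={mean:7.2f}  95% interval=({low:7.2f}, {high:7.2f})")
```

The interval generally narrows as n grows, which is one practical way to approach the "how much is enough?" question: keep sampling until the interval is tight enough for the business decision.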
Data Science Questions (II)
Predictive
Using Xs to predict the value of Y.
Accuracy depends on getting the ‘right’ data.
Parsimony: using the fewest Xs to predict Y accurately.
Causal (Stats & OR)
What is the change in A when B changes? (+ve/-ve)
Supported by some hypothesis.
Mechanistic (Stats & OR)
Exact changes in variables leading to exact changes in other variables.
Can keep throwing new variables in, but how much is enough?
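As a rough illustration of the predictive question and of parsimony, the sketch below fits a one-variable least-squares line and compares two candidate Xs by R-squared. The data and the variable names (`ad_spend`, `shelf_number`, `sales`) are invented for this example; the slides do not prescribe a particular model.

```python
def fit_line(xs, ys):
    """Ordinary least squares for one X: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys):
    """Share of the variation in Y explained by a straight line in X."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

# Hypothetical data: Y is sales, with two candidate Xs.
sales = [12, 19, 31, 42, 48, 61]
ad_spend = [1, 2, 3, 4, 5, 6]        # plausibly related to sales
shelf_number = [4, 1, 6, 2, 5, 3]    # plausibly unrelated to sales

for name, xs in (("ad_spend", ad_spend), ("shelf_number", shelf_number)):
    print(f"{name:<13} R^2 = {r_squared(xs, sales):.3f}")
```

In this made-up data one X already explains almost all of the variation in Y, so a parsimonious model would keep that X and leave the other out.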
The Process
Business Objectives & Questions
Collecting & Preparing Data
Exploratory Data Analysis
Build Mathematical Models (Train & Validate)
Select the Mathematical Model to be used
Deployment of Mathematical Model in IT Systems (if needed)
Continuous Validation of Model to ensure acceptable Predictive Power
Phases: Preparation, Model Building, Implementation
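The train, validate, select and continuous-validation steps of the process can be sketched as follows. This is only one possible reading of those steps: the holdout split, the two candidate models, the use of mean absolute error and the function names are all assumptions for illustration, and the data is synthetic.

```python
import random

def mean_abs_error(model, rows):
    """Average absolute gap between predictions and actual Y values."""
    return sum(abs(model(x) - y) for x, y in rows) / len(rows)

def train_baseline(rows):
    """Candidate 1: always predict the average Y seen in training."""
    average_y = sum(y for _, y in rows) / len(rows)
    return lambda x: average_y

def train_line(rows):
    """Candidate 2: least-squares straight line through the training rows."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in rows)
             / sum((x - mean_x) ** 2 for x, _ in rows))
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x

# Synthetic data: Y depends linearly on X plus random noise.
random.seed(0)
data = [(x, 3.0 * x + 5.0 + random.gauss(0, 2.0)) for x in range(40)]
random.shuffle(data)
train, validate = data[:30], data[30:]  # hold out rows the models never see

# Build the candidates on the training rows, then select on the held-out rows.
candidates = {"baseline": train_baseline(train), "line": train_line(train)}
scores = {name: mean_abs_error(model, validate) for name, model in candidates.items()}
best_name = min(scores, key=scores.get)
best_model = candidates[best_name]
print("validation error:", scores, "-> selected:", best_name)

def still_acceptable(model, new_rows, max_error):
    """Continuous validation: re-check the deployed model on fresh data."""
    return mean_abs_error(model, new_rows) <= max_error
```

After deployment, `still_acceptable` would be called periodically on newly collected rows; a `False` result signals that predictive power has dropped below the agreed level and the model may need rebuilding.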
Types of Data
Data are generally classified into two types: structured or unstructured data.
Structured data
Data that are generally captured by source systems in tabular format.
Each row is an observation and each column represents a variable/characteristic.
Structured data is easily understood by computers and humans for processing.
Unstructured data
Each row may represent a document/file/listing.
There are no variable types.
More processing is needed to understand and analyze each observation.
Note that the analysis of each type of data, structured and unstructured, is very different.
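The difference in effort can be seen in a small sketch. The table and the review texts below are invented, and the `tokenize` helper is a deliberately crude assumption; real text analysis usually needs far more preparation.

```python
import csv
import io
import re
from collections import Counter

# Structured: rows are observations, columns are variables.
table = io.StringIO("customer,age,spend\nA,34,120.5\nB,41,80.0\nC,29,99.5\n")
rows = list(csv.DictReader(table))
average_spend = sum(float(row["spend"]) for row in rows) / len(rows)
print("average spend:", average_spend)

# Unstructured: each item is a whole document with no predefined variables.
reviews = [
    "Great service, will come again!",
    "Service was slow but the food was great.",
]

def tokenize(text):
    """Crude preprocessing: lowercase the text and keep runs of letters."""
    return re.findall(r"[a-z']+", text.lower())

# Variables (here, word counts) have to be constructed before any analysis.
word_counts = Counter(word for review in reviews for word in tokenize(review))
print("most common words:", word_counts.most_common(3))
```

The structured table can be averaged directly by column name, while the reviews need an extra step to turn free text into something countable.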
Types of Data
Data Tables
Relational Databases
JSON & XML
JSON: JavaScript Object Notation
XML: Extensible Markup Language
Textual: Tweets, Blogs, Emails, Reviews
Visual: Videos & Pictures
Audio: Music, Sound, Speech
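To make the JSON and XML formats concrete, here is a small sketch that reads the same made-up customer record in both formats using Python's standard library. The record layout (a customer with a list of orders) is an assumption for illustration only.

```python
import json
import xml.etree.ElementTree as ET

# The same hypothetical record, once as JSON and once as XML.
json_text = """
{"customer": "A",
 "orders": [{"item": "book", "qty": 2}, {"item": "pen", "qty": 5}]}
"""

xml_text = """
<customer name="A">
  <order item="book" qty="2"/>
  <order item="pen" qty="5"/>
</customer>
"""

# JSON maps directly onto dictionaries, lists, strings and numbers.
record = json.loads(json_text)
json_total = sum(order["qty"] for order in record["orders"])

# XML is a tree of elements; attribute values arrive as strings.
root = ET.fromstring(xml_text.strip())
xml_total = sum(int(order.get("qty")) for order in root.findall("order"))

print("total quantity from JSON:", json_total)
print("total quantity from XML:", xml_total)
```

Both formats carry nested records that do not fit neatly into a single flat table, which is why they usually need to be flattened or parsed before tabular analysis.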
Big Data
Surrounded by data.
3 Vs of Big Data: Velocity, Volume, Variety.
Data capturing is much easier with the growth of technology.
Relevant data is more important.
Role of the Data Scientist: to propose data to capture.
Weighing Value vs Costs (Capture & Maintenance).
Building Dashboards
Role of dashboard: Strategic, Analytical or Operational
Type of data
Domain
Type of Measurement: ruler, listing
Update frequency
Access rights
Interactivity
Mechanism of display: Text, Graphics or a mixture
Portability: Mobile? PC?
Dashboards Guidelines
Try to stay within a single screen.
Use as much of the ‘real estate’ as possible, but put in only relevant information.
Provide context: How good or bad? Where are we?
Avoid too much detail and precision.
Choose the right display.
Go back to the business question: Pie Chart or Bar Chart?
Highlight important info and make sure it stands out.
Do not clutter with unnecessary ‘ornaments’.
Watch the colors.
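A few of these guidelines (provide context, avoid excess precision, highlight what matters) can be sketched in a text-only form. The `kpi_line` function, its layout and the sample figures are hypothetical; an actual dashboard would normally be built with a charting or BI tool.

```python
def kpi_line(name, actual, target, width=20):
    """Render one measure against its target as a single plain-text line."""
    ratio = actual / target
    # Clamp the bar so values below zero or above target still fit the width.
    filled = max(0, min(width, round(ratio * width)))
    bar = "#" * filled + "." * (width - filled)
    # Context: say whether the number is good or bad instead of showing it alone.
    status = "GOOD" if ratio >= 1.0 else "BELOW TARGET"
    # Precision: a whole-number percentage is enough for a quick read.
    return f"{name:<12} [{bar}] {ratio:>4.0%} of target  {status}"

# Made-up figures for illustration.
for name, actual, target in (
    ("Sales", 87_450.37, 100_000),
    ("New users", 1_260, 1_000),
    ("Retention", 0.71, 0.80),
):
    print(kpi_line(name, actual, target))
```

Each line answers "where are we?" at a glance, and the status label makes the measures that need attention stand out without extra decoration.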
Thank you!