-
7/26/2019 Methods for Describing Sets of Data
1/47
Chapter 2
Methods for Describing Sets of Data
-
Descriptive Statistics
Descriptive statistics utilizes numerical and graphical methods to look for patterns in a data set, to summarize the information revealed in a data set and to present that information in a convenient form.
Describe data by using Graphs (Graphical)
Boxplots, Histograms, Pie Charts, Bar Charts, Scatterplots, etc.
Describe data by using numerical measures (Numerical)
Central Tendencies, Variability, Range, Outliers, etc.
-
Describing Qualitative Data
Qualitative data are nonnumerical
Field of study, Political Party, Gender, Eye color, etc.
Summarized in two ways:
Class Frequency: the number of observations in the data set that fall into a particular class.
Class Relative Frequency: class frequency divided by the total number of observations in the data set.
class relative frequency = class frequency / n
-
Describing Qualitative Data
Example: Inventory
Species         Frequency   Relative Frequency   Proportion
A. chilensis    3           0.20                 20%
N. alpina       8           0.53                 53%
N. dombeyi      4           0.27                 27%
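The class frequency and class relative frequency defined earlier can be computed directly from raw labels. The following is a minimal Python sketch, not part of the original slides. The list `observations` and the helper `frequency_table` are hypothetical names, and the N. dombeyi count of 4 is an assumption inferred from the 26.67% slice reported in the pie chart.

```python
from collections import Counter

# Hypothetical raw data: one species label per observation, built so that
# the class frequencies are 3, 8 and 4 (the last count is inferred, see above).
observations = (
    ["A. chilensis"] * 3 + ["N. alpina"] * 8 + ["N. dombeyi"] * 4
)


def frequency_table(values):
    """Return {class: (frequency, relative frequency)} for qualitative data."""
    counts = Counter(values)   # class frequency
    n = len(values)            # total number of observations
    return {cls: (freq, freq / n) for cls, freq in counts.items()}


table = frequency_table(observations)
for cls, (freq, rel) in table.items():
    # Columns: class, frequency, relative frequency, percentage.
    print(f"{cls:<14}{freq:>3}{rel:>7.2f}{rel:>7.0%}")
```

The relative frequencies always sum to 1, which is a quick check that no observation was dropped.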
-
Describing Qualitative Data
Bar Graph: The categories (classes) of the qualitative variable are represented by bars, where the height of each bar is either the class frequency, class relative frequency or class percentage.
Example: Inventory
[Figure: bar graphs of class frequency (0 to 8) for A. chilensis, N. alpina and N. dombeyi]
-
Describing Qualitative Data
Pie Chart: The categories (classes) of the qualitative variable are represented by slices of a pie. The size of each slice is proportional to the class relative frequency.
Example: Inventory
A. chilensis (20.00%)
N. alpina (53.33%)
N. dombeyi (26.67%)
-
Describing Qualitative Data
Misleading plots
SOURCE: Los Angeles Times, August 5, 1979. SOURCE: GraphJam.com
-
Describing Qualitative Data
Contingency Table: Cross tabulation of units based on measurements of two qualitative variables simultaneously.
Stacked Bar Graph: Bar chart with one variable represented on the horizontal axis, second variable as subcategories within bars.
Cluster Bar Graph: Bar chart with one variable forming major groupings on horizontal axis, second variable used to make side-by-side comparisons.
-
Describing Qualitative Data
Contingency Table
          AMS   No AMS   Total
Placebo    40       79     119
Acet       14      104     118
Ginkgo     43       81     124
Acc+Gi     18      108     126
Total     115      372     487

Row percentages:
          AMS   No AMS   Total
Placebo  33.61   66.39     100
Acet     11.86   88.14     100
Ginkgo   34.68   65.32     100
Acc+Gi   14.29   85.71     100
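The percentage table is obtained by dividing each cell of the contingency table by its row total. The sketch below, in Python, illustrates that step under stated assumptions: `counts` and `row_percentages` are hypothetical names, and the Placebo counts (40 and 79) are inferred from the column totals (115 - 14 - 43 - 18 and 372 - 104 - 81 - 108) rather than read from the slide.

```python
# Counts of (AMS, No AMS) per treatment row of the contingency table.
# The Placebo row is an assumption inferred from the column totals.
counts = {
    "Placebo": (40, 79),
    "Acet": (14, 104),
    "Ginkgo": (43, 81),
    "Acc+Gi": (18, 108),
}


def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells)
        result[row] = tuple(round(100 * c / total, 2) for c in cells)
    return result


percentages = row_percentages(counts)
for row, (ams, no_ams) in percentages.items():
    print(f"{row:<8}{ams:>7.2f}{no_ams:>7.2f}")
```

Row percentages make the treatment groups comparable even though the groups differ slightly in size.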
-
Describing Qualitative Data
Cluster Bar Graph    Stacked Bar Graph
[Figure: bar graphs of H2 (0.22 to 0.34) for CR, RCB, IB 4, IB 8, IB 16, IB 32 and R-C, with series GRAD and PATCH]
-
Describing Qualitative Data
3D Bar Graph
-
Graphical Methods for Describing Quantitative Data (Single Variable)
Dot plots display a dot for each observation along a horizontal number line
Duplicate values are piled on top of each other
The dots reflect the shape of the distribution
Good for small data sets
-
Graphical Methods for Describing Quantitative Data (Single Variable)
A Stem and Leaf Display shows the number of observations that share a common value (the stem) and the precise value of each observation (the leaf)
Stem-and-leaf display for Age
Number of observations: 15. Minimum: 11.0. Maximum: 45.0.
Stem units: 10, leaf digits: 1 (the value 11.00 is represented by 1|1)
8 1| 12225678
3 2| 188
2 3| 12
2 4| 25
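A stem-and-leaf display of this kind can be built by splitting each value into a stem (tens) and a leaf (units). The following Python sketch is an illustration only. The function name `stem_and_leaf` is hypothetical, the 15 ages are read back from the display, and the sketch assumes non-negative integer values and skips empty stems.

```python
from collections import defaultdict

# Ages read back from the stem-and-leaf display (15 observations).
ages = [11, 12, 12, 12, 15, 16, 17, 18, 21, 28, 28, 31, 32, 42, 45]


def stem_and_leaf(values, stem_unit=10):
    """Return display lines of the form 'count stem| leaves'."""
    leaves = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, stem_unit)   # e.g. 28 -> stem 2, leaf 8
        leaves[stem].append(leaf)
    lines = []
    for stem in sorted(leaves):
        digits = "".join(str(d) for d in leaves[stem])
        lines.append(f"{len(leaves[stem])} {stem}| {digits}")
    return lines


display = stem_and_leaf(ages)
print("\n".join(display))
```

The leading number on each line is the count of observations sharing that stem, matching the first column of the display.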
-
Graphical Methods for Describing Quantitative Data (Single Variable)
The box plot is a graph representing information about certain percentiles for a data set and can be used to identify outliers.
More later!
-
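Graphical Methods for Describing Quantitative Data (Single Variable)
Histograms are graphs of the frequency or relative frequency of a variable.
Class intervals make up the horizontal axis
Frequencies or relative frequencies make up the vertical axis.
[Figure: histogram of Age (years); horizontal axis from 10 to 50, vertical axis Frequency from 0 to 8]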
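The counts behind a histogram come from assigning each observation to a class interval. The Python sketch below shows one way to do this for the 15 ages from the stem-and-leaf display. The function `class_frequencies` is a hypothetical helper, and the class width of 10 and the convention that each interval includes its lower bound but not its upper bound are assumptions, since the slide does not state them.

```python
# Ages from the stem-and-leaf display (15 observations).
ages = [11, 12, 12, 12, 15, 16, 17, 18, 21, 28, 28, 31, 32, 42, 45]


def class_frequencies(values, start, stop, width):
    """Count integer observations in each class interval [lower, lower + width)."""
    lowers = list(range(start, stop, width))
    counts = [0] * len(lowers)
    for v in values:
        if start <= v < stop:
            counts[(v - start) // width] += 1
    return [(lo, lo + width, c) for lo, c in zip(lowers, counts)]


classes = class_frequencies(ages, start=10, stop=50, width=10)
n = len(ages)
for lo, hi, freq in classes:
    # Frequency, relative frequency, and one '*' per observation.
    print(f"[{lo:>2}, {hi:>2})  {freq}  {freq / n:.2f}  {'*' * freq}")
```

Changing `width` changes the apparent shape of the distribution, which is why the choice of class intervals deserves a moment of thought.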
-
Interpreting Histograms
Probability: Heights of bars over the class intervals (of equal width) are proportional to the chances an individual chosen at random would fall in the interval.
Unimodal: A histogram with a single major peak.
Bimodal: Histogram with two distinct peaks (often evidence of two distinct groups of units).
Uniform: Interval heights are approximately equal.
Symmetric: Right and left portions are the same shape.
Right Skewed: Right-hand side extends further.
Left Skewed: Left-hand side extends further.
-
Interpreting Histograms
[Figure: example histograms labeled Unimodal, Bimodal, Uniform, Symmetric, Right Skewed and Left Skewed]
-
Graphical Methods for Describing Quantitative Data
A scatterplot shows the relationship between two quantitative variables.
Response variable (y) placed on the vertical (up/down) axis and the explanatory variable (x) placed on the horizontal (left/right) axis.
[Figure: scatterplot of Weight (15 to 50) against Length (250 to 400)]
-
Graphical Methods for Describing Quantitative Data
Time series plots: (another form of scatterplot) many data sets consist of measurements taken at successive time points. When measurements are made at equally spaced time points, goal is often to describe temporal variation.
-
Graphical Methods for Describing Quantitative Data
Complex plots (matrix scatterplot)
-
Graphical Methods for Describing Quantitative Data
Recommendations
Complete, clear, brief.
-
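Notation
Individual observations in a data set are denoted $x_1, x_2, x_3, x_4, \ldots, x_n$.
Summation notation: we use a summation symbol often:
\[ \sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n \]
The sum runs from the first observation ($x_1$) to the last ($x_n$).
Example: $x_1 = 1$, $x_2 = 2$, $x_3 = 3$ and $x_4 = 4$,
\[ \sum_{i=1}^{4} x_i = 1 + 2 + 3 + 4 = 10 \]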
-
Numerical Descriptive Measures
Numeric summaries of a set of measurements.
Measures of Central Tendency describe the center of a set of measurements. Value or values around which the data tend to cluster.
Measures of Variability describe the spread or dispersion of a set of measurements. How strongly the data cluster.
Parameters: Numeric descriptive measures based on Populations of measurements.
Statistics: Numeric descriptive measures based on Samples of measurements.
-
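Numerical Measures of Central Tendency
The mean of a set of quantitative data is the sum of the observed values divided by the number of values
Sample: \[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Population: \[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]
Example: $x_1 = 1$, $x_2 = 2$, $x_3 = 3$ and $x_4 = 4$ (i.e. $n = 4$)
\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{1 + 2 + 3 + 4}{4} = \frac{10}{4} = 2.5 \]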
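The mean formula translates almost word for word into code: sum the values, then divide by how many there are. A minimal Python sketch follows, using the four values from the example. The function name `mean` and the variable `x_bar` are chosen here for illustration.

```python
def mean(values):
    """Sum of the observed values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


x = [1, 2, 3, 4]      # x1..x4 from the example, so n = 4
x_bar = mean(x)       # (1 + 2 + 3 + 4) / 4
print(x_bar)
```

The same function serves for a sample or a population. Only the interpretation of the result (statistic or parameter) differs.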
-
Numerical Measures of Central Tendency
The median (M) of a set of quantitative data is the value which is located in the middle of the data, arranged from lowest to highest values (or vice versa), with 50% of the observations above and 50% below.
Lowest Value | 50% | Median | 50% | Highest Value
Finding the Median, M:
Arrange the n measurements from smallest to largest:
If n is odd, M is the middle number.
If n is even, M is the average of the middle two numbers.
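The two rules for finding the median (odd n and even n) can be sketched in Python as follows. The function name `median` is chosen for illustration, and the second data set is the 15 ages from the stem-and-leaf display.

```python
def median(values):
    """Middle value of the sorted data, or the average of the two middle values."""
    ordered = sorted(values)        # arrange from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:                  # n is odd: the middle number
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # n is even


even_case = median([4, 1, 3, 2])    # middle two numbers are 2 and 3
ages = [11, 12, 12, 12, 15, 16, 17, 18, 21, 28, 28, 31, 32, 42, 45]
odd_case = median(ages)             # 8th of 15 sorted values
print(even_case, odd_case)
```

Sorting first is essential: applying the position rules to unsorted data gives a meaningless answer.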
-
Numerical Measures of Central Tendency
The mode is the most frequently observed value (continuous)
The modal class is the midpoint of the class with the highest relative frequency (discrete).
Mode (Age): 12
Modal class (Species): N. alpina
8 1| 12225678
3 2| 188
2 3| 12
2 4| 25
Median: 18
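Finding the most frequently observed value is a counting exercise. The Python sketch below returns every value that shares the highest frequency, since a data set can have more than one mode. The function name `modes` is hypothetical, and the species list assumes class frequencies of 3, 8 and 4, the last of which is inferred from the pie chart percentages.

```python
from collections import Counter


def modes(values):
    """Return all values that occur with the highest frequency."""
    counts = Counter(values)
    top = max(counts.values())
    return [value for value, count in counts.items() if count == top]


ages = [11, 12, 12, 12, 15, 16, 17, 18, 21, 28, 28, 31, 32, 42, 45]
# Hypothetical raw labels matching the assumed class frequencies 3, 8 and 4.
species = ["A. chilensis"] * 3 + ["N. alpina"] * 8 + ["N. dombeyi"] * 4

age_modes = modes(ages)
species_modes = modes(species)
print(age_modes, species_modes)
```

Returning a list rather than a single value avoids silently picking one winner when two values are tied.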
-
Numerical Measures of Central Tendency
Trimmed Mean (TM): Mean that is based on center measurements.
Skewness: Shape of the distribution:
Mound Shaped Distributions: Mode ≈ Median ≈ Mean
Right Skewed Distributions: Mode < Median < Mean
3 is considered an outlier.
-
Example
The lengths of a total of 20 fruits from two different bags were measured.

Sorted by ID           Sorted by Length
ID  Length  Bag        ID  Length  Bag
 1    12     A         10    27     A
 2    15     A         17    26     B
 3    22     A          9    25     A
 4    18     A         13    25     B
 5    22     A         15    23     B
 6    14     A          3    22     A
 7    13     A          5    22     A
 8    18     A         11    21     B
 9    25     A         19    20     B
10    27     A         20    20     B
11    21     B         16    19     B
12    17     B          4    18     A
13    25     B          8    18     A
14    18     B         14    18     B
15    23     B         12    17     B
16    19     B          2    15     A
17    26     B          6    14     A
18     6     B          7    13     A
19    20     B          1    12     A
20    20     B         18     6     B

/* Read the ID and Length of each of the 20 fruits */
data Example;
  input ID Length;
  datalines;
1 12
2 15
3 22
4 18
5 22
6 14
7 13
8 18
9 25
10 27
11 21
12 17
13 25
14 18
15 23
16 19
17 26
18 6
19 20
20 20
;
/* List the data set */
proc print data=Example; run;
/* Descriptive statistics and plots for Length */
proc univariate data=Example plots;
  var Length;
run;
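For readers without SAS, a few of the summaries that PROC UNIVARIATE reports can be approximated with Python's standard library. The sketch below is not a reproduction of the SAS output. It only computes n, mean, median, minimum and maximum, overall and per bag. The names `fruits`, `summarize`, `overall` and `by_bag` are hypothetical.

```python
from statistics import mean, median

# (ID, Length, Bag) rows from the example table.
fruits = [
    (1, 12, "A"), (2, 15, "A"), (3, 22, "A"), (4, 18, "A"), (5, 22, "A"),
    (6, 14, "A"), (7, 13, "A"), (8, 18, "A"), (9, 25, "A"), (10, 27, "A"),
    (11, 21, "B"), (12, 17, "B"), (13, 25, "B"), (14, 18, "B"), (15, 23, "B"),
    (16, 19, "B"), (17, 26, "B"), (18, 6, "B"), (19, 20, "B"), (20, 20, "B"),
]


def summarize(lengths):
    """Basic numerical descriptive measures for a list of lengths."""
    return {
        "n": len(lengths),
        "mean": mean(lengths),
        "median": median(lengths),
        "min": min(lengths),
        "max": max(lengths),
    }


overall = summarize([length for _, length, _ in fruits])
by_bag = {
    bag: summarize([length for _, length, b in fruits if b == bag])
    for bag in ("A", "B")
}
print(overall)
print(by_bag)
```

Summarizing each bag separately shows how a single unusually short fruit (ID 18, length 6) pulls the mean of bag B below its median.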
-
Example
ID Length Bag
10 27 A
17 26 B
13 25 B
15 23 B
3 22 A
5 22 A
19 20 B
20 20 B
16 19 B
4 18 A
8 18 A
14 18 B
12 17 B
2 15 A
6 14 A
7 13 A
1 12 A
18 6 B
-
Example
ID Length Bag
10 27 A
17 26 B
9 25 A
13 25 B
15 23 B
3 22 A
5 22 A
11 21 B
19 20 B
20 20 B
16 19 B
4 18 A
8 18 A
14 18 B
12 17 B
2 15 A
6 14 A
7 13 A
1 12 A
18 6 B
[Figure: box plot of Length, axis from 6 to 28, with one point labeled 18]