methods for describing sets of data

Upload: mages87

Post on 13-Apr-2018


  • 7/26/2019 Methods for Describing Sets of Data

    Chapter 2

    Methods for Describing Sets of Data

    Descriptive Statistics

    Descriptive statistics utilizes numerical and graphical methods to look for patterns in a data set, to summarize the information revealed in a data set and to present that information in a convenient form.

    Describe data by using graphs (graphical): box plots, histograms, pie charts, bar charts, scatterplots, etc.

    Describe data by using numerical measures (numerical): central tendencies, variability, range, outliers, etc.

    Describing Qualitative Data

    Qualitative data are nonnumerical: field of study, political party, gender, eye color, etc.

    Summarized in two ways:

    Class Frequency: the number of observations in the data set that fall into a particular class.

    Class Relative Frequency: class frequency divided by the total number of observations in the data set.

    $\text{class relative frequency} = \dfrac{\text{class frequency}}{n}$

    Describing Qualitative Data

    Example: Inventory

    Species        Frequency   Relative Frequency   Percentage
    A. chilensis       3              0.20              20%
    N. alpina          8              0.53              53%
    N. dombeyi         4              0.27              27%
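The frequency table can be reproduced in a few lines. What follows is a minimal Python sketch, not part of the original slides. The raw list of observations is hypothetical, built only to match the class frequencies of the inventory example; the N. dombeyi count of 4 is inferred from its 26.67% share of 15 observations.

```python
from collections import Counter

# Hypothetical raw observations: one species label per observation, chosen to
# match the class frequencies in the inventory example (3, 8 and 4).
observations = (
    ["A. chilensis"] * 3 + ["N. alpina"] * 8 + ["N. dombeyi"] * 4
)


def frequency_table(values):
    """Return {class: (frequency, relative frequency)} for qualitative data."""
    counts = Counter(values)
    n = len(values)
    return {label: (count, count / n) for label, count in counts.items()}


table = frequency_table(observations)
for label, (freq, rel) in table.items():
    # class, frequency, relative frequency, percentage
    print(f"{label:<14}{freq:>3}{rel:>8.2f}{rel:>8.0%}")
```

The relative frequencies always sum to 1, which is a quick check that no observation was dropped.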

    Describing Qualitative Data

    Bar Graph: The categories (classes) of the qualitative variable are represented by bars, where the height of each bar is either the class frequency, class relative frequency or class percentage.

    Example: Inventory

    [Figure: bar graphs of class frequency (axis from 0 to 8) for A. chilensis, N. alpina and N. dombeyi.]

    Describing Qualitative Data

    Pie Chart: The categories (classes) of the qualitative variable are represented by slices of a pie. The size of each slice is proportional to the class relative frequency.

    Example: Inventory

    [Figure: pie chart with slices A. chilensis (20.00%), N. alpina (53.33%) and N. dombeyi (26.67%).]

    Describing Qualitative Data

    Misleading plots

    [Figure: two example plots. SOURCE: Los Angeles Times, August 5, 1979. SOURCE: GraphJam.com.]

    Describing Qualitative Data

    Contingency Table: Cross tabulation of units based on measurements of two qualitative variables simultaneously.

    Stacked Bar Graph: Bar chart with one variable represented on the horizontal axis, second variable as subcategories within bars.

    Cluster Bar Graph: Bar chart with one variable forming major groupings on horizontal axis, second variable used to make side-by-side bars within each grouping.

    Describing Qualitative Data

    Contingency Table

    Counts:

               AMS   No AMS   Total
    Placebo     40       79     119
    Acet        14      104     118
    Ginkgo      43       81     124
    Acc+Gi      18      108     126
    Total      115      372     487

    Row percentages:

               AMS   No AMS   Total
    Placebo  33.61    66.39     100
    Acet     11.86    88.14     100
    Ginkgo   34.68    65.32     100
    Acc+Gi   14.29    85.71     100
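The row percentages follow directly from the counts: each cell is divided by its row total. A short Python sketch of that step is given here as an illustration only. The Placebo counts (40 and 79) are an assumption inferred from the column totals, and the function name is hypothetical.

```python
# Counts copied from the contingency table; the Placebo row is inferred from
# the column totals and should be treated as an assumption.
counts = {
    "Placebo": {"AMS": 40, "No AMS": 79},
    "Acet": {"AMS": 14, "No AMS": 104},
    "Ginkgo": {"AMS": 43, "No AMS": 81},
    "Acc+Gi": {"AMS": 18, "No AMS": 108},
}


def row_percentages(table):
    """Convert each row of counts to percentages of that row's total."""
    result = {}
    for row, cells in table.items():
        total = sum(cells.values())
        result[row] = {col: round(100 * n / total, 2) for col, n in cells.items()}
    return result


percentages = row_percentages(counts)
for row, cells in percentages.items():
    print(row, cells)
```

Row percentages make the four treatment groups comparable even though their sizes differ slightly.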

    Describing Qualitative Data

    Cluster Bar Graph and Stacked Bar Graph

    [Figure: cluster bar graph and stacked bar graph. Vertical axis: H2 (0.22 to 0.34). Horizontal axis: CR, RCB, IB 4, IB 8, IB 16, IB 32, R-C. Legend: GRAD, PATCH.]

    Describing Qualitative Data

    3D Bar Graph

    Graphical Methods for Describing Quantitative Data: Single Variable

    Dot plots display a dot for each observation along a horizontal number line.

    Duplicate values are piled on top of each other.

    The dots reflect the shape of the distribution.

    Good for small data sets.

    Graphical Methods for Describing Quantitative Data: Single Variable

    A Stem-and-Leaf Display shows the number of observations that share a common value (the stem) and the precise value of each observation (the leaf).

    Stem-and-leaf display for Age

    Number of observations: 15. Minimum: 11.0. Maximum: 45.0.
    Stem units: 10, leaf digits: 1 (the value 11.00 is represented by 1|1)

    8 1| 12225678
    3 2| 188
    2 3| 12
    2 4| 25
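A stem-and-leaf display is simple to build by hand or in code: split each value into a stem (here the tens digit) and a leaf (the units digit), then list the leaves in order beside each stem. The Python sketch below illustrates this under the assumption that the 15 ages are the values read back from the Age display; the function name is hypothetical.

```python
from collections import defaultdict

# Ages read back from the stem-and-leaf display for Age (15 observations).
ages = [11, 12, 12, 12, 15, 16, 17, 18, 21, 28, 28, 31, 32, 42, 45]


def stem_and_leaf(values, stem_unit=10):
    """Group each value's leaf (remainder) under its stem (quotient)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, stem_unit)
        stems[stem].append(leaf)
    return dict(stems)


display = stem_and_leaf(ages)
for stem, leaves in display.items():
    # number of leaves, then stem | leaves
    print(f"{len(leaves):>2} {stem}| {''.join(str(leaf) for leaf in leaves)}")
```

This simple version skips stems that have no observations; a fuller version would print empty stems so that gaps in the data stay visible.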

    Graphical Methods for Describing Quantitative Data: Single Variable

    The box plot is a graph representing information about certain percentiles for a data set and can be used to identify outliers.

    More later!

    Graphical Methods for Describing Quantitative Data: Single Variable

    Histograms are graphs of the frequency or relative frequency of a variable.

    Class intervals make up the horizontal axis; the frequencies or relative frequencies are displayed on the vertical axis.

    [Figure: histogram of Age (years). Horizontal axis: 10 to 50 in steps of 5. Vertical axis: Frequency, 0 to 8.]
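The bar heights of a histogram are just class frequencies for a quantitative variable. The following Python sketch counts observations per class interval for the Age data. The class width of 10 and the starting point of 10 are assumptions made for illustration and are not stated on the slide.

```python
# Ages from the stem-and-leaf display for Age; the class width of 10 is an
# assumption made for this sketch, not a value stated on the slide.
ages = [11, 12, 12, 12, 15, 16, 17, 18, 21, 28, 28, 31, 32, 42, 45]


def class_frequencies(values, start, width, n_classes):
    """Count observations in [start, start + width), [start + width, ...), etc."""
    counts = [0] * n_classes
    for value in values:
        index = int((value - start) // width)
        if 0 <= index < n_classes:
            counts[index] += 1
    return counts


frequencies = class_frequencies(ages, start=10, width=10, n_classes=4)
relative = [count / len(ages) for count in frequencies]

for i, count in enumerate(frequencies):
    low = 10 + 10 * i
    # a crude text histogram: one '#' per observation in the class
    print(f"[{low}, {low + 10}): {'#' * count} ({count})")
```

With this class width the frequencies match the leaf counts of the stem-and-leaf display, which is expected because each stem covers the same interval of 10 years.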

    Interpreting Histograms

    Probability: Heights of bars over the class intervals are proportional to the chances an individual chosen at random would fall in the interval.

    Unimodal: A histogram with a single major peak.

    Bimodal: Histogram with two distinct peaks (often evidence of two distinct groups of units).

    Uniform: Interval heights are approximately equal.

    Symmetric: Right and left portions are the same shape.

    Right Skewed: Right-hand side extends further.

    Left Skewed: Left-hand side extends further.

    Interpreting Histograms

    [Figure: example histograms with panels Unimodal, Bimodal, Uniform, Symmetric, Right Skewed and Left Skewed.]

    Graphical Methods for Describing Quantitative Data

    A scatterplot shows the relationship between two quantitative variables.

    The response variable (y) is placed on the vertical (up/down) axis and the explanatory variable (x) is placed on the horizontal (left/right) axis.

    [Figure: scatterplot of Weight (vertical axis, 15 to 50) against Length (horizontal axis, 250 to 400).]

    Graphical Methods for Describing Quantitative Data

    Time series plots (another form of scatterplot): many data sets consist of measurements made at a sequence of time points. When measurements are made at equally spaced time points, the goal is often to describe temporal variation.

    Graphical Methods for Describing Quantitative Data

    Complex plots (matrix scatterplot)

    Graphical Methods for Describing Quantitative Data

    Recommendations

    Complete, clear, brief.

    Notation

    Individual observations in a data set are denoted $x_1, x_2, x_3, \ldots, x_n$.

    Summation notation: we use a summation symbol often:

    $\sum_{i=1}^{n} x_i = x_1 + x_2 + x_3 + \cdots + x_n$

    That is, add the observations from the first ($x_1$) to the last ($x_n$).

    Example: $x_1 = 1$, $x_2 = 2$, $x_3 = 3$ and $x_4 = 4$:

    $\sum_{i=1}^{4} x_i = 1 + 2 + 3 + 4 = 10$

    Numerical Descriptive Measures

    Numeric summaries of a set of measurements.

    Measures of Central Tendency describe the center of a set of measurements. Value or values around which the data tend to cluster.

    Measures of Variability describe the spread or dispersion of a set of measurements. How strongly the data cluster.

    Parameters: Numeric descriptive measures based on populations of measurements.

    Statistics: Numeric descriptive measures based on samples of measurements.

    Numerical Measures of Central Tendency

    The mean of a set of quantitative data is the sum of the observed values divided by the number of values.

    Sample mean: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$

    Population mean: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$

    Example: $x_1 = 1$, $x_2 = 2$, $x_3 = 3$ and $x_4 = 4$ (i.e. $n = 4$)

    $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n} = (1 + 2 + 3 + 4)/4 = 10/4 = 2.5$
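The worked example translates directly into code. This small Python sketch, added for illustration, computes the sum and the mean of the four example values; the function name is hypothetical.

```python
def mean(values):
    """Sum of the observed values divided by the number of values."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)


x = [1, 2, 3, 4]
total = sum(x)   # 1 + 2 + 3 + 4
x_bar = mean(x)  # total divided by n = 4
print(total, x_bar)
```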

    Numerical Measures of Central Tendency

    The median (M) of a set of quantitative data is the value which is located in the middle of the data, arranged from lowest to highest values (or vice versa), with 50% of the observations above and 50% below.

    [Figure: number line from Lowest Value to Highest Value with the Median in the middle and 50% of the observations on each side.]

    Finding the Median, M:

    Arrange the n measurements from smallest to largest:

    If n is odd, M is the middle number.

    If n is even, M is the average of the middle two numbers.
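The two-case rule for the median can be sketched as follows. This is an illustrative Python version, not part of the original slides; the ages are assumed to be the 15 values from the Age stem-and-leaf display.

```python
def median(values):
    """Middle value of the sorted data; average of the middle two if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # odd n: the single middle observation
        return ordered[middle]
    # even n: average of the two middle observations
    return (ordered[middle - 1] + ordered[middle]) / 2


ages = [11, 12, 12, 12, 15, 16, 17, 18, 21, 28, 28, 31, 32, 42, 45]
print(median(ages))          # odd n
print(median([1, 2, 3, 4]))  # even n
```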

    Numerical Measures of Central Tendency

    The mode is the most frequently observed value (continuous).

    The modal class is the midpoint of the class with the highest relative frequency (discrete).

    Mode (Age): 12

    Modal class (Species): N. alpina

    8 1| 12225678
    3 2| 188
    2 3| 12
    2 4| 25

    Median: 18
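Finding the mode amounts to counting how often each value occurs and keeping the most frequent one. The Python sketch below is an illustration under two assumptions: the ages are those of the stem-and-leaf display, and the species list is a hypothetical reconstruction matching the inventory frequencies. It returns a list because a data set can have more than one mode.

```python
from collections import Counter


def modes(values):
    """Return every value that attains the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


ages = [11, 12, 12, 12, 15, 16, 17, 18, 21, 28, 28, 31, 32, 42, 45]
# Hypothetical species labels matching the inventory frequencies (3, 8, 4).
species = ["A. chilensis"] * 3 + ["N. alpina"] * 8 + ["N. dombeyi"] * 4

print(modes(ages))
print(modes(species))
```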

    Numerical Measures of Central Tendency

    Trimmed Mean (TM): Mean that is based on the center measurements.

    Skewness: Shape of the distribution:

    Mound-Shaped Distributions: Mode ≈ Median ≈ Mean

    Right Skewed Distributions: Mode < Median < Mean
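Comparing the mode, median and mean gives a quick read on skewness, and a trimmed mean shows how much the extreme values pull on the ordinary mean. The Python sketch below illustrates both on the Age data. The trimming rule used here (drop the k smallest and k largest observations) is an assumption, because the slide does not say how many observations are trimmed.

```python
from collections import Counter
from statistics import mean, median

# Ages from the stem-and-leaf display for Age.
ages = [11, 12, 12, 12, 15, 16, 17, 18, 21, 28, 28, 31, 32, 42, 45]


def trimmed_mean(values, k):
    """Mean after dropping the k smallest and k largest observations.

    The choice of k is an assumption of this sketch.
    """
    if 2 * k >= len(values):
        raise ValueError("k removes every observation")
    ordered = sorted(values)
    return mean(ordered[k:len(ordered) - k])


mode = Counter(ages).most_common(1)[0][0]
print(mode, median(ages), mean(ages), trimmed_mean(ages, 1))

# Mode < Median < Mean suggests a right-skewed distribution.
right_skewed = mode < median(ages) < mean(ages)
print(right_skewed)
```

For these ages the ordering mode, median, mean is increasing, which agrees with the long right tail visible in the stem-and-leaf display.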

    [...] 3 is considered an outlier.

    Example

    The lengths of a total of 20 fruits were measured from two different bags.

    ID   Length   Bag
     1     12      A
     2     15      A
     3     22      A
     4     18      A
     5     22      A
     6     14      A
     7     13      A
     8     18      A
     9     25      A
    10     27      A
    11     21      B
    12     17      B
    13     25      B
    14     18      B
    15     23      B
    16     19      B
    17     26      B
    18      6      B
    19     20      B
    20     20      B

    /* Read the ID and length of each of the 20 fruits */
    data Example;
      input ID Length;
      datalines;
    1 12
    2 15
    3 22
    4 18
    5 22
    6 14
    7 13
    8 18
    9 25
    10 27
    11 21
    12 17
    13 25
    14 18
    15 23
    16 19
    17 26
    18 6
    19 20
    20 20
    ;

    /* List the data, then summarize Length with descriptive statistics and plots */
    proc print; run;

    proc univariate data=Example plots;
      var Length;
    run;
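For readers without SAS, a rough Python counterpart of the numerical part of this example is sketched below. It is an illustration only and does not reproduce the PROC UNIVARIATE output; the per-bag means are an extra step added here because the table records the bag of each fruit.

```python
from statistics import mean, median, stdev

# (ID, Length, Bag) rows from the fruit example.
fruits = [
    (1, 12, "A"), (2, 15, "A"), (3, 22, "A"), (4, 18, "A"), (5, 22, "A"),
    (6, 14, "A"), (7, 13, "A"), (8, 18, "A"), (9, 25, "A"), (10, 27, "A"),
    (11, 21, "B"), (12, 17, "B"), (13, 25, "B"), (14, 18, "B"), (15, 23, "B"),
    (16, 19, "B"), (17, 26, "B"), (18, 6, "B"), (19, 20, "B"), (20, 20, "B"),
]
lengths = [length for _, length, _ in fruits]

# Basic numerical descriptive measures for Length.
summary = {
    "n": len(lengths),
    "mean": mean(lengths),
    "median": median(lengths),
    "std": stdev(lengths),  # sample standard deviation
    "min": min(lengths),
    "max": max(lengths),
}

# Mean length within each bag.
by_bag = {}
for _, length, bag in fruits:
    by_bag.setdefault(bag, []).append(length)
bag_means = {bag: mean(values) for bag, values in by_bag.items()}

print(summary)
print(bag_means)
```

The smallest length, 6, sits well below the rest of the data, which is the kind of observation the plots requested from PROC UNIVARIATE help to spot.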

    Example

    Data sorted by Length, from largest to smallest:

    ID   Length   Bag
    10     27      A
    17     26      B
     9     25      A
    13     25      B
    15     23      B
     3     22      A
     5     22      A
    11     21      B
    19     20      B
    20     20      B
    16     19      B
     4     18      A
     8     18      A
    14     18      B
    12     17      B
     2     15      A
     6     14      A
     7     13      A
     1     12      A
    18      6      B

    Example

    [Figure: the sorted data table (ID, Length, Bag) shown next to a plot of Length, with axis ticks from 6 to 28.]