Describing the variety – Descriptive statistics

Upload: kenyon-anderson

Post on 31-Dec-2015


TRANSCRIPT

Page 1: Describing the variety – Descriptive statistics

Describing the variety – Descriptive statistics

Page 2: Describing the variety – Descriptive statistics

Reminder I:

• Recall that statistics has two main fields (see Lecture 1):
– Descriptive statistics
– Inductive (inferential) statistics

Page 3: Describing the variety – Descriptive statistics

Reminder II:

• Types of variables (see Lecture 1):

Variable
– qualitative (nominal and ordinal scale), called factor in R
– quantitative (interval and ratio scale), called numeric in R
  – discrete
  – continuous

Page 4: Describing the variety – Descriptive statistics

Descriptive statistics

• The aim of descriptive statistics is basically twofold:
– exploring the statistical nature of our data
– summarizing and displaying our data in a concise, compact way

• To achieve this aim we
– compute statistics of position and dispersion from the data
– make graphs to display the data.

Page 5: Describing the variety – Descriptive statistics

Descriptive statistics computed from the data

• Statistics of position
– Arithmetic mean (average):

$$\bar{x} = \frac{x_1 + x_2 + \dots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

(The mean of the statistical population is usually denoted by μ (Greek mu).)
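The definition of the arithmetic mean can be illustrated with a minimal Python sketch. The function name `arithmetic_mean` is only illustrative, and the data are borrowed from the median example of this lecture because the slide gives no numbers of its own.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Data reused from the median example of this lecture (an assumption made
# purely for illustration).
data = [4, 6, 2, 5, 3, 2, 3]
print(arithmetic_mean(data))  # the sum 25 divided by n = 7
```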

Page 6: Describing the variety – Descriptive statistics

– Median:
• This is the middle value of a ranked data set.
• If the sample size is an odd number, the median is simply the value in the middle of the ranked data. E.g.:

sample size (n): 7
data: 4, 6, 2, 5, 3, 2, 3
ranked data: 2, 2, 3, 3, 4, 5, 6
median: 3

• If the sample size is an even number, the median is the mean of the two middle values of the ranked data set. E.g.:

sample size (n): 6
data: 4, 6, 5, 3, 2, 3
ranked data: 2, 3, 3, 4, 5, 6
median: (3 + 4) / 2 = 3.5
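The odd and even rules for the median can be sketched in Python as follows. The function name `median_of` is illustrative; the two calls reproduce the two examples given for the median.

```python
def median_of(values):
    """Return the middle value of the ranked data (odd n) or the mean of
    the two middle values (even n)."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd sample size: exactly one value sits in the middle.
        return ranked[middle]
    # Even sample size: average the two middle-positioned values.
    return (ranked[middle - 1] + ranked[middle]) / 2


print(median_of([4, 6, 2, 5, 3, 2, 3]))  # 3
print(median_of([4, 6, 5, 3, 2, 3]))     # 3.5
```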

Page 7: Describing the variety – Descriptive statistics

– Mode:
• This is the most frequent data value in the data set. E.g.:

data: 3.5, 1.1, 2.3, 1.9, 2.3, 2.5, 2.3

mode: 2.3
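Finding the mode amounts to counting how often each value occurs. A minimal Python sketch, using the data of the example; the function name `modes` is illustrative, and returning a list (so that ties yield several modes) is a design choice not discussed in the slides.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


print(modes([3.5, 1.1, 2.3, 1.9, 2.3, 2.5, 2.3]))  # [2.3]
```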

Page 8: Describing the variety – Descriptive statistics

• Statistics of dispersion
– Range:
• This is simply the difference between the largest and the smallest data value of the data set. E.g.:

data: 5, 2, 6, 9, 12, 4

largest value (maximum): 12
smallest value (minimum): 2
range: 12 – 2 = 10
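The range computation is short enough to check directly. A minimal Python sketch with the data of the example; the function name `data_range` is illustrative.

```python
def data_range(values):
    """Return the difference between the largest and the smallest value."""
    return max(values) - min(values)


print(data_range([5, 2, 6, 9, 12, 4]))  # 10
```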

Page 9: Describing the variety – Descriptive statistics

– Interquartile range (IQR):
• This measure of dispersion works on ranked data.
• Quartiles (Q): the values that divide the ranked data into four equal parts.
– First quartile: the value below which 25% of the ranked data lie.
– Third quartile: the value below which 75% of the ranked data lie.
• The interquartile range is the difference between the third and the first quartile. E.g.:

sample size (n): 12
ranked data: 2, 3, 3, 4, 4, 5, 6, 6, 6, 7, 8, 9
1st Q: 3.5
3rd Q: 6.5
IQR: 6.5 – 3.5 = 3
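Several conventions exist for computing quartiles, and they can give slightly different values. The Python sketch below assumes the "median of each half" convention, chosen here because it reproduces the quartiles of the example; the slides do not state which rule was used. The function name `quartiles_by_halves` is illustrative.

```python
from statistics import median


def quartiles_by_halves(values):
    """Return (1st quartile, 3rd quartile) as the medians of the lower and
    upper halves of the ranked data. For an odd sample size the middle
    value is left out of both halves (one convention among several)."""
    ranked = sorted(values)
    if len(ranked) < 2:
        raise ValueError("at least two values are needed")
    half = len(ranked) // 2
    return median(ranked[:half]), median(ranked[-half:])


data = [2, 3, 3, 4, 4, 5, 6, 6, 6, 7, 8, 9]
q1, q3 = quartiles_by_halves(data)
print(q1, q3, q3 - q1)  # 3.5 6.5 3.0
```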

Page 10: Describing the variety – Descriptive statistics

– Variance:
• It is the mean of the squared differences of the data values from their mean. For a sample, the sum of squared differences is divided by n – 1:

$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$$

(The variance of the statistical population is usually denoted by σ² (Greek sigma squared).)

– Standard deviation (SD):
• This is the square root of the variance:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$

(The standard deviation of the statistical population is usually denoted by σ (Greek sigma).)
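The sample variance and standard deviation, with the n – 1 divisor, can be sketched in Python as follows. The function names are illustrative, and the data are reused from the range example because the slide gives no numbers for the variance.

```python
import math


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are needed")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """Square root of the sample variance."""
    return math.sqrt(sample_variance(values))


# Data reused from the range example (an assumption for illustration).
data = [5, 2, 6, 9, 12, 4]
print(sample_variance(data), sample_sd(data))
```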

Page 11: Describing the variety – Descriptive statistics

– Standard error of the mean (SE):
• This is the standard deviation of the mean. It describes the stochastic fluctuation of the sample mean.

$$SE_{\bar{x}} = \frac{s}{\sqrt{n}}$$

In other words, if we took a large number of repeated samples from the statistical population and computed the means of these samples, the standard deviation of the means would be equal to the value of the standard error.

E.g.:
Standard deviation of the data (s): 2.5
sample size (n): 16
Standard error of the mean: 2.5 / 4 = 0.625
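The standard error example can be checked with a few lines of Python. The function name `standard_error` is illustrative; the inputs are the values of the example.

```python
import math


def standard_error(sd, n):
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    if n <= 0:
        raise ValueError("the sample size must be positive")
    return sd / math.sqrt(n)


print(standard_error(2.5, 16))  # 0.625
```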

Page 12: Describing the variety – Descriptive statistics

– Confidence interval (CI) of the mean:
• Provided we have a large sample from a statistical population, the mean of the population (μ) is 95% likely to lie between the values $\bar{x} - 1.96 \times SE$ and $\bar{x} + 1.96 \times SE$.

• This region is called the 95% confidence interval of the population mean, because we can be 95% certain that it contains the population mean.

• The endpoints of the confidence interval are called the lower and upper limits of the confidence interval.

(If we have a small sample, the multiplier is not 1.96 but a different value taken from a t distribution.)
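The large-sample 95% confidence interval can be sketched in Python as follows. The function name `confidence_interval_95` is illustrative. The standard error 0.625 comes from the standard error example of this lecture, while the sample mean of 10.0 is a hypothetical value, since the slides give none.

```python
def confidence_interval_95(sample_mean, se):
    """Return (lower limit, upper limit) of the large-sample 95% CI,
    computed as the sample mean minus and plus 1.96 times the SE."""
    margin = 1.96 * se
    return sample_mean - margin, sample_mean + margin


# 0.625 is the SE from the lecture example; 10.0 is a hypothetical mean.
lower, upper = confidence_interval_95(10.0, 0.625)
print(lower, upper)
```

For a small sample the constant 1.96 would have to be replaced by the appropriate value of the t distribution, which this sketch does not cover.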

Page 13: Describing the variety – Descriptive statistics

Basic statistical graphs

• Graph for a qualitative variable:
– Pie chart:
• Shows the relative amounts of the categories of a nominal variable (factor).

Page 14: Describing the variety – Descriptive statistics

• Graph for a discrete quantitative variable:
– Bar chart:
• Displays the frequency of the data values of the data set.

Page 15: Describing the variety – Descriptive statistics

• Graphs for a continuous quantitative variable:
– Histogram:
• Displays the frequency of data values falling within a certain interval of the variable.
• Each category on the x-axis represents a range of values.

(Figure annotation: the number of observations (wheat grains) whose values lie between 23 and 24.)

N.B.: Do not confuse it with the bar chart, which displays frequencies of discrete data.
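The counting behind a histogram can be sketched in Python without any plotting. The function name `histogram_counts`, the data values, and the choice of intervals that include their lower edge and exclude their upper edge are all assumptions made for illustration; the slides do not specify them.

```python
def histogram_counts(values, lower, width, n_bins):
    """Count how many values fall into each of n_bins consecutive
    intervals of equal width, starting at `lower`. Each interval includes
    its lower edge and excludes its upper edge."""
    counts = [0] * n_bins
    for v in values:
        index = int((v - lower) // width)
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts


# Hypothetical measurements, not taken from the lecture.
measurements = [22.4, 23.1, 23.7, 23.9, 24.2, 25.0]
# Intervals: [22, 23), [23, 24), [24, 25), [25, 26)
print(histogram_counts(measurements, 22, 1, 4))  # [1, 3, 1, 1]
```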

Page 16: Describing the variety – Descriptive statistics

– Boxplot:
• This is a very useful graph that displays many features of the distribution of the data.

(Figure labels: minimum, 1st Q, median, 3rd Q, maximum, IQR.)
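The quantities a boxplot displays can be computed before drawing anything. The Python sketch below reuses the ranked data of the interquartile range example and assumes the "median of each half" rule for the quartiles, which is one convention among several; plotting software may use a different rule. The function name `five_number_summary` is illustrative.

```python
from statistics import median


def five_number_summary(values):
    """Return the minimum, 1st quartile, median, 3rd quartile and maximum.
    Quartiles are the medians of the lower and upper halves of the ranked
    data (an assumed convention)."""
    ranked = sorted(values)
    if len(ranked) < 2:
        raise ValueError("at least two values are needed")
    half = len(ranked) // 2
    return {
        "minimum": ranked[0],
        "1st Q": median(ranked[:half]),
        "median": median(ranked),
        "3rd Q": median(ranked[-half:]),
        "maximum": ranked[-1],
    }


data = [2, 3, 3, 4, 4, 5, 6, 6, 6, 7, 8, 9]
summary = five_number_summary(data)
print(summary)
print("IQR:", summary["3rd Q"] - summary["1st Q"])  # IQR: 3.0
```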

Page 17: Describing the variety – Descriptive statistics

• Graph for displaying two quantitative variables together:
– Scatterplot:
• Displays the statistical relationship between two quantitative variables.