average arithmetic and average quadratic deviation

Average Arithmetic and Average Quadratic

Deviation

Average Arithmetic and Average Quadratic Deviation

The average values, which give the generalized quantitative description of certain characteristic in statistical totality at the certain terms of place and time, are the most widespread form of statistical indices. They represent the typical lines of variation characteristic of the explored phenomena.

Average Arithmetic and Average Quadratic Deviation

Because of that quantitative description of characteristic is related to its high-quality side, it follows to examine average values only in light of terms of high-quality analysis. Except of summarizing estimation of certain characteristic the necessity of determination of changeable quantitative average values for the totality arises up also, when two groups which high-quality differ one from other are compared.

The use of averages in health protection

for description of work organization of health protection establishments (middle employment of bed, term of stay in permanent establishment, amount of visits on one habitant and other);


for description of indices of physical development (length, mass of body, circumference of head of new-born and other);

Introduction

The purpose of HYPOTHESIS TESTING is to aid the clinician, researcher, or administrator in reaching a conclusion concerning a POPULATION by examining a SAMPLE from that population.

sample

sample

population

inference

samplestatisticsinference parameter

Hypothesis is….A statement about one or more populations.

e.g. The average length of stay of patients

admitted to the hospital is 5 days; A certain drug will be effective in 90 percent

of the cases for which it is used.

Estimation: approximate a characteristic of the population with a statistic computed from the sample

x

of estimateour is x

mean sample

mean population

Types of hypotheses

Research hypothesis

The research hypothesis is the conjecture or supposition that motivates the research.

Statistical hypothesis

Statistical hypotheses are hypotheses that are stated in such a way that they be evaluated by appropriate statistical techniques.

Example

Twenty patients with a certain disease were randomly selected. The mean of enythrocyte sedimentation (ES) is 9.15 with standard deviation of 2.13. However, references reported mean of ES of this type of patients is 10.50.

Question: Does the mean of ES of this sample differ from 10.50?

Types of errors

P>0.05

Fail to

reject H0

Correct action Type II error β

P<0.05 Reject H0

Type I error

α

Correct action

True False

Condition of null hypothesis

Possible action

Type I and II error (one tailed)

Uα

(critical value)

1-α

1-β

α

β

H0:μ=0

H1:μ=μ0>0

μ

μ

μ+δ

Decision making

α/2=0.025α/2=0.025α/2=0.025

-1.96 1.960Rejection region

Rejection regionNon rejection region


for description of indices of physical development (length, mass of body, circumference of head of new-born and other);


for determination of medical-physiology indices of organism (frequency of pulse, breathing, level of arterial pressure and other);


for estimation of these medical-social and sanitary-hygienic researches (middle number of laboratory researches, middle norms of food ration, level of radiation contamination and others).

Averages

Averages are widely used for comparison in time, that allows to characterize the major conformities to the law of development of the phenomenon. So, for example, conformity to the law of growth increase of certain age children finds the expression in the generalized indices of physical development. Conformities to the law of dynamics (increase or diminishment) of pulse rate, breathing, clinical parameters at the certain diseases find the display in statistical indices which represent the physiology parameters of organism and other.

Chi-square distribution

3.84

Distribution rule

To reject the H0 if the value of the test statistic that we compute from our sample is one of the values in the rejection region ;

To not reject the H0 if the computed value of the test statistic is one of the values in the non-rejection region.

Significance levelThe decision as to which values go into

the rejection region and which ones go into the non-rejection region is made on the basis of the desired level of

significance, designated by α;The test statistic that falls in the rejection

region is said to be significant.

Hypothesis testing steps1. Data ----the nature of data ---test methods

2. Assumption ---e.g. normality of population distribution, equality of variance, and independence of samples-makes it possible to use certain mathematic models to reach an estimation of the sample.

3. Statistical Hypothesis and αlevel

4. Statistic computation

5. Decision making

HypothesisThe NULL HYPOTHESIS (H0) is the

hypothesis to be tested----hypothesis of no difference between mean of sample and mean of population

The ALTERNATIVE HYPOTHESIS (H1) is a statement of what we will believe is true if our sample data cause us to reject the null hypothesis.

Test statistic Decision maker (reject or not to reject

the H0) Computed from the data of the sample. Compare the computed statistics from

our sample to the corresponding CRITICAL VALUE, make decision

General Formula for Test Statistic

Test statistic =

Distribution of test statistic

Sample distribution is the key to statistical inference. t distribution or standard normal distribution

For example; t=

Follows the standard normal distribution if the hypothesis is true and the assumptions are met.

n

X

/0

S

t distribution and u distribution

Decision making

α/2=0.025α/2=0.025α/2=0.025

-1.96 1.960Rejection region

Rejection regionNon rejection region

Chi-square distribution

3.84

Type I and II error (one tailed)

Uα

(critical value)

1-α

1-β

α

β

H0:μ=0

H1:μ=μ0>0

μ

μ

μ+δ

Average ValuesAverage Values

Mean: the average of the data sensitive to outlying data

Median: the middle of the data not sensitive to outlying data

Mode: most commonly occurring value Range: the difference between the largest observation and

the smallest Interquartile range: the spread of the data

commonly used for skewed data Standard deviation: a single number which measures how much

the observations vary around the mean Symmetrical data: data that follows normal distribution

(mean=median=mode) report mean & standard deviation & n

Skewed data: not normally distributed (meanmedianmode) report median & IQ Range


Limit is it is the meaning of edge variant in a variation row

lim = Vmin Vmax


Amplitude is the difference of edge variant of variation row

Am = Vmax - Vmin


Average quadratic deviation characterizes dispersion of the variants around an ordinary value (inside structure of totalities).

Average quadratic deviation

σ = 1

2

n

d

simple arithmetical method


d = V - M

genuine declination of variants from the true middle arithmetic


σ = i

method of moments

22

n

dp

n

pd


is needed for:1. Estimations of typicalness of the middle arithmetic (М is typical for this row, if σ is less than 1/3 of average) value.2. Getting the error of average value.3. Determination of average norm of the phenomenon, which is studied (М±1σ), sub norm (М±2σ) and edge deviations (М±3σ).4. For construction of sigmal net at the estimation of physical development of an individual.


This dispersion a variant around of average characterizes an average

quadratic deviation ( )

n

2nd

Coefficient of variation is the relative measure of variety; it is a percent correlation of standard deviation and arithmetic average.

Terms Used To Describe The Quality Of Measurements

Reliability is variability between subjects divided by inter-subject variability plus measurement error.

Validity refers to the extent to which a test or surrogate is measuring what we think it is measuring.

Measures Of Diagnostic Test Accuracy

Sensitivity is defined as the ability of the test to identify correctly those who have the disease.

Specificity is defined as the ability of the test to identify correctly those who do not have the disease.

Predictive values are important for assessing how useful a test will be in the clinical setting at the individual patient level. The positive predictive value is the probability of disease in a patient with a positive test. Conversely, the negative predictive value is the probability that the patient does not have disease if he has a negative test result.

Likelihood ratio indicates how much a given diagnostic test result will raise or lower the odds of having a disease relative to the prior probability of disease.

Measures Of Diagnostic Test Accuracy

Expressions Used When Making Inferences About Data

Confidence Intervals- The results of any study sample are an estimate of the true value

in the entire population. The true value may actually be greater or less than what is observed.

Type I error (alpha) is the probability of incorrectly concluding there is a statistically significant difference in the population when none exists.

Type II error (beta) is the probability of incorrectly concluding that there is no statistically significant difference in a population when one exists.

Power is a measure of the ability of a study to detect a true difference.

Multivariable Regression Methods

Multiple linear regression is used when the outcome data is a continuous variable such as weight. For example, one could estimate the effect of a diet on weight after adjusting for the effect of confounders such as smoking status.

Logistic regression is used when the outcome data is binary such as cure or no cure. Logistic regression can be used to estimate the effect of an exposure on a binary outcome after adjusting for confounders.

Survival Analysis

Kaplan-Meier analysis measures the ratio of surviving subjects (or those without an event) divided by the total number of subjects at risk for the event. Every time a subject has an event, the ratio is recalculated. These ratios are then used to generate a curve to graphically depict the probability of survival.

Cox proportional hazards analysis is similar to the logistic regression method described above with the added advantage that it accounts for time to a binary event in the outcome variable. Thus, one can account for variation in follow-up time among subjects.

Kaplan-Meier Survival Curves

Why Use Statistics?

Cardiovascular Mortality in Males

0

0.2

0.4

0.6

0.8

1

1.2

'35-'44 '45-'54 '55-'64 '65-'74 '75-'84

SMR Bangor

Roseto

Descriptive Statistics

Identifies patterns in the data Identifies outliers Guides choice of statistical test

Percentage of Specimens Testing Positive for RSV (respiratory syncytial virus)

Jul Aug Sep Oct Nov Dec Jan Feb Mar Apr May Jun

South 2 2 5 7 20 30 15 20 15 8 4 3

North-east

2 3 5 3 12 28 22 28 22 20 10 9

West 2 2 3 3 5 8 25 27 25 22 15 12

Mid-west

2 2 3 2 4 12 12 12 10 19 15 8

Descriptive Statistics

Percentage of Specimens Testing Postive for RSV 1998-99

0

5

10

15

20

25

30

35

Jul Sep Nov Jan Mar May Jul

SouthNortheastWestMidwest

Distribution of Course Grades

0

2

4

6

8

10

12

14

Number of Students

A A- B+ B B- C+ C C- D+ D D- F

Grade

Describing the Data with Numbers

Measures of Dispersion• RANGE • STANDARD DEVIATION• SKEWNESS

Measures of Dispersion

• RANGE • highest to lowest values

• STANDARD DEVIATION• how closely do values cluster around the

mean value• SKEWNESS

• refers to symmetry of curve

The Normal Distribution

Mean = median = mode

Skew is zero 68% of values fall

between 1 SD 95% of values fall

between 2 SDs

.

Me

an

, Med

ian

, Mo

de

1

2

average arithmetic and average quadratic deviation

Documents

average length

otherthe use of averages

statistical totality

certain disease

mean of es

certain terms of place

type of patients

physical development