intro statistics


Upload: marcelamoreno

Post on 13-Sep-2015



1) A population is any specific collection of objects of interest.
2) A sample is any subset or sub-collection of the population, including the case that the sample consists of the whole population, in which case it is termed a census.
3) A measurement is a number or attribute computed for each member of a population or of a sample. The measurements of sample elements are collectively called the sample data.
4) A parameter is a number that summarizes some aspect of the population as a whole. A statistic is a number computed from the sample data.
5) Statistics is a collection of methods for collecting, displaying, analyzing, and drawing conclusions from data.
6) Descriptive statistics is the branch of statistics that involves organizing, displaying, and describing data.
7) Inferential statistics is the branch of statistics that involves drawing conclusions about a population based on information contained in a sample taken from that population.

QUALITATIVE DATA are measurements for which there is no natural numerical scale, but which consist of attributes, labels, or other non-numerical characteristics. QUANTITATIVE DATA are numerical measurements that arise from a natural numerical scale.

A data set can also be presented by means of a data frequency table, a table in which each distinct value x is listed in the first row and its frequency f, which is the number of times the value x appears in the data set, is listed below it in the second row.

x   17  18  19  20  21  22  23
f    5   2   1   4   2   3   8
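As a rough illustration of how such a table is produced, the Python sketch below counts how often each value occurs. It assumes the table above reads x = 17 through 23 with frequencies 5, 2, 1, 4, 2, 3, 8; the raw data list is rebuilt from those counts, and the helper name frequency_table is hypothetical.

```python
from collections import Counter

# Raw data rebuilt from the table in the text: each value x repeated f times.
table = {17: 5, 18: 2, 19: 1, 20: 4, 21: 2, 22: 3, 23: 8}
data = [x for x, f in table.items() for _ in range(f)]

def frequency_table(values):
    """Return (x, f) pairs sorted by x, where f counts occurrences of x."""
    counts = Counter(values)
    return sorted(counts.items())

rows = frequency_table(data)
n = len(data)  # total number of measurements

# Print the two rows in the same layout as the table in the text.
print("x", *(f"{x:>3}" for x, _ in rows))
print("f", *(f"{f:>3}" for _, f in rows))
```

The frequencies always add up to the number of measurements in the data set, which gives a quick check on a hand-built table.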

One way to reorganize and rewrite the data to make relevant information more visible is to construct a stem and leaf diagram. For the test scores that follow, the numbers in the tens place, from 2 through 9, and additionally the number 10, are the "stems," and are arranged in numerical order from top to bottom to the left of a vertical line. The ones digit of each score is its "leaf," written to the right of the line beside its stem.
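The construction can be sketched in a few lines of Python. This is only an illustration using the 30 exam scores given in this section; the function name stem_and_leaf is hypothetical, and stems with no leaves are kept so that gaps in the data remain visible.

```python
from collections import defaultdict

# The 30 exam scores listed in the text.
scores = [86, 80, 25, 77, 73, 76, 100, 90, 69, 93,
          90, 83, 70, 73, 73, 70, 90, 83, 71, 95,
          40, 58, 68, 69, 100, 78, 87, 97, 92, 74]

def stem_and_leaf(values):
    """Map each stem (value // 10) to its sorted list of leaves (value % 10)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    # Include empty stems so that gaps in the data stay visible.
    return {s: stems.get(s, []) for s in range(min(stems), max(stems) + 1)}

diagram = stem_and_leaf(scores)
for stem, leaves in diagram.items():
    print(f"{stem:>2} | {' '.join(map(str, leaves))}")
```

Reading across the row for stem 8 gives the leaves 0 3 3 6 7, that is, the scores 80, 83, 83, 86, and 87.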

Suppose 30 students in a statistics class took a test and made the following scores:

86 80 25 77 73 76 100 90 69 93
90 83 70 73 73 70 90 83 71 95
40 58 68 69 100 78 87 97 92 74

The stem and leaf diagram is not practical for large data sets, so we need a different, purely graphical way to represent data. A frequency histogram is such a device. We will illustrate it using the same data set from the previous subsection.

In our example of the exam scores in a statistics class, five students scored in the 80s. The number 5 is the frequency of the group labeled 80s. Since there are 30 students in the entire statistics class, the proportion who scored in the 80s is 5/30. The number 5/30, which could also be expressed as approximately 0.1667, or as 16.67%, is the relative frequency of the group labeled 80s.

The sample size, denoted by n, refers to the number of measurements in a sample. The sample mean of a set of n sample data is the number $\bar{x}$ defined by the formula

$\bar{x} = \frac{\sum x}{n}$

The population mean of a set of N population data is the number $\mu$ defined by the formula

$\mu = \frac{\sum x}{N}$

The sample median $\tilde{x}$ of a set of sample data for which there are an odd number of measurements is the middle measurement when the data are arranged in numerical order. The sample median $\tilde{x}$ of a set of sample data for which there are an even number of measurements is the mean of the two middle measurements when the data are arranged in numerical order.

An outlier is a number that is far removed from most or all of the remaining measurements.
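The quantities just defined can be checked on the exam scores with a short Python sketch. It is an illustration only: grouping the scores by decade is an assumption that matches the "80s" group used in the text, and the variable names are hypothetical.

```python
from collections import Counter
from fractions import Fraction
from statistics import mean, median

# The 30 exam scores listed in the text.
scores = [86, 80, 25, 77, 73, 76, 100, 90, 69, 93,
          90, 83, 70, 73, 73, 70, 90, 83, 71, 95,
          40, 58, 68, 69, 100, 78, 87, 97, 92, 74]
n = len(scores)  # sample size

# Frequency of each decade group: key 80 stands for "the 80s".
freq = Counter(s // 10 * 10 for s in scores)

# Relative frequency = frequency divided by the sample size.
rel_freq = {group: Fraction(f, n) for group, f in freq.items()}

x_bar = mean(scores)      # sum of the data divided by n
x_tilde = median(scores)  # n is even, so the mean of the two middle values

print(freq[80], rel_freq[80], float(rel_freq[80]))
print(x_bar, x_tilde)
```

Because n = 30 is even, the median here is the mean of the 15th and 16th values in sorted order, 77 and 78.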

The relationship between the mean and the median can be seen in several common shapes of distributions. In each distribution, a vertical line that divides the area under the curve in half is located at the median. The following facts are true in general:
a. When the distribution is symmetric, as in panels (a) and (b) of the figure, the mean and the median are equal.
b. When the distribution is as shown in panel (c) of the figure, it is said to be skewed right. The mean has been pulled to the right of the median by the long right tail of the distribution, the few relatively large data values.
c. When the distribution is as shown in panel (d) of the figure, it is said to be skewed left. The mean has been pulled to the left of the median by the long left tail of the distribution, the few relatively small data values.

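The sample mode of a set of sample data is the most frequently occurring value. The population mode is defined in a similar way. On a relative frequency histogram, the highest point of the histogram corresponds to the mode of the data set.

The range of a data set is the number R defined by the formula

$R = x_{\max} - x_{\min}$

where $x_{\max}$ is the largest measurement in the data set and $x_{\min}$ is the smallest.

The sample variance of a set of n sample data is the number $s^2$ defined by the formula

$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$

which is algebraically equivalent to

$s^2 = \frac{\sum x^2 - \frac{1}{n}\left(\sum x\right)^2}{n - 1}$

The sample standard deviation of a set of n sample data is the square root of the sample variance:

$s = \sqrt{s^2} = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$

The population variance of a set of N population data is

$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$

The population standard deviation of a set of N population data is

$\sigma = \sqrt{\sigma^2} = \sqrt{\frac{\sum (x - \mu)^2}{N}}$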
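The following Python sketch applies these definitions to the 30 exam scores from earlier in the notes. It is illustrative only: the variable names are hypothetical, and treating the same 30 scores as a whole population in the last step is an assumption made purely to contrast the two divisors.

```python
from math import sqrt
from statistics import multimode, pvariance, variance

# The 30 exam scores listed in the text.
scores = [86, 80, 25, 77, 73, 76, 100, 90, 69, 93,
          90, 83, 70, 73, 73, 70, 90, 83, 71, 95,
          40, 58, 68, 69, 100, 78, 87, 97, 92, 74]
n = len(scores)
x_bar = sum(scores) / n

modes = multimode(scores)      # every value tied for the highest frequency
R = max(scores) - min(scores)  # range: largest minus smallest measurement

# Definitional and shortcut forms of the sample variance (divisor n - 1).
s2_def = sum((x - x_bar) ** 2 for x in scores) / (n - 1)
s2_short = (sum(x * x for x in scores) - sum(scores) ** 2 / n) / (n - 1)
s = sqrt(s2_def)

# Assumption: if the 30 scores were the whole population, divide by N instead.
sigma2 = sum((x - x_bar) ** 2 for x in scores) / n
sigma = sqrt(sigma2)

print(modes, R, round(s, 2), round(sigma, 2))
```

A data set can have more than one mode; in these scores, 73 and 90 each occur three times. The two forms of the sample variance agree up to floating-point rounding.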

Given an observed value x in a data set, x is the Pth percentile of the data if the percentage of the data that are less than or equal to x is P. The number P is the percentile rank of x. The Pth percentile cuts the data set in two so that approximately P% of the data lie below it and (100 - P)% of the data lie above it.

The quartiles are the three percentiles (Q1, Q2, and Q3) that cut the data into fourths (four sections, each containing 25% of the data). The second quartile Q2 of any data set is its median. The lower set L consists of all observations that are strictly less than Q2. The upper set U consists of all observations that are strictly greater than Q2. The first quartile Q1 of the data set is the median of the lower set. The third quartile Q3 of the data set is the median of the upper set.

The five-number summary of the data set consists of the five numbers that help describe the entire data set: {x_min, Q1, Q2, Q3, x_max}. The five-number summary is used to construct a box plot. Each of the five numbers is represented by a vertical line segment, a box is formed using the line segments at Q1 and Q3 as its two vertical sides, and two horizontal line segments are extended from the vertical segments marking Q1 and Q3 to the adjacent extreme values.

[Figure: box plot, with vertical line segments marking x_min, Q1, Q2, Q3, and x_max from left to right]

The interquartile range (IQR) is the quantity IQR = Q3 - Q1, the distance from Q1 to Q3, which is the length of the interval over which the middle half of the data range.
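The quartile rule described above (the median of the lower set and the median of the upper set) can be sketched in Python as follows, using the 30 exam scores from earlier in the notes. The function name five_number_summary is hypothetical.

```python
from statistics import median

# The 30 exam scores listed in the text.
scores = [86, 80, 25, 77, 73, 76, 100, 90, 69, 93,
          90, 83, 70, 73, 73, 70, 90, 83, 71, 95,
          40, 58, 68, 69, 100, 78, 87, 97, 92, 74]

def five_number_summary(values):
    """Return (x_min, Q1, Q2, Q3, x_max) using the lower-set/upper-set rule."""
    q2 = median(values)
    lower = [v for v in values if v < q2]  # strictly less than the median
    upper = [v for v in values if v > q2]  # strictly greater than the median
    return min(values), median(lower), q2, median(upper), max(values)

x_min, q1, q2, q3, x_max = five_number_summary(scores)
iqr = q3 - q1  # length of the interval holding the middle half of the data

print(x_min, q1, q2, q3, x_max, iqr)
```

Library routines such as statistics.quantiles use interpolation conventions that can give slightly different quartiles, so the sketch follows the definition in these notes rather than calling one of them.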

The z-score of an observation x is the number z given by the computational formula

$z = \frac{x - \bar{x}}{s}$ or $z = \frac{x - \mu}{\sigma}$

according to whether the data set is a sample or a population. The z-score indicates how many standard deviations an individual observation x is from the center of the data set, its mean. If z is negative then x is below average. If z is 0 then x is equal to the average. If z is positive then x is above average.
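As a small illustration, the sketch below computes sample z-scores for the lowest and highest of the 30 exam scores from earlier in the notes. The helper name z_score and its parameter names are hypothetical.

```python
from statistics import mean, stdev

# The 30 exam scores listed in the text.
scores = [86, 80, 25, 77, 73, 76, 100, 90, 69, 93,
          90, 83, 70, 73, 73, 70, 90, 83, 71, 95,
          40, 58, 68, 69, 100, 78, 87, 97, 92, 74]

x_bar = mean(scores)  # sample mean
s = stdev(scores)     # sample standard deviation (divisor n - 1)

def z_score(x, center, spread):
    """Number of standard deviations that x lies from the mean."""
    return (x - center) / spread

z_low = z_score(25, x_bar, s)    # lowest score: negative, below average
z_high = z_score(100, x_bar, s)  # highest score: positive, above average

print(round(z_low, 2), round(z_high, 2))
```

For population data the same function applies with the population mean and population standard deviation passed in as center and spread.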

The Empirical Rule

If a data set has an approximately bell-shaped relative frequency histogram, then:
1. Approximately 68% of the data lie within one standard deviation of the mean, that is, in the interval with endpoints $\bar{x} \pm s$ for samples and with endpoints $\mu \pm \sigma$ for populations;
2. Approximately 95% of the data lie within two standard deviations of the mean, that is, in the interval with endpoints $\bar{x} \pm 2s$ for samples and with endpoints $\mu \pm 2\sigma$ for populations; and
3. Approximately 99.7% of the data lie within three standard deviations of the mean, that is, in the interval with endpoints $\bar{x} \pm 3s$ for samples and with endpoints $\mu \pm 3\sigma$ for populations.

Chebyshev's Theorem

For any numerical data set,
1. At least 3/4 of the data lie within two standard deviations of the mean, that is, in the interval with endpoints $\bar{x} \pm 2s$ for samples and with endpoints $\mu \pm 2\sigma$ for populations;
2. At least 8/9 of the data lie within three standard deviations of the mean, that is, in the interval with endpoints $\bar{x} \pm 3s$ for samples and with endpoints $\mu \pm 3\sigma$ for populations;
3. At least $1 - 1/k^2$ of the data lie within k standard deviations of the mean, that is, in the interval with endpoints $\bar{x} \pm ks$ for samples and with endpoints $\mu \pm k\sigma$ for populations, where k is any positive whole number that is greater than 1.
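Chebyshev's bound can be checked numerically. The Python sketch below, which is only an illustration on the 30 exam scores from earlier in the notes, compares the observed fraction of data within k standard deviations of the mean against the guaranteed minimum of 1 - 1/k^2. The function name fraction_within is hypothetical.

```python
from statistics import mean, stdev

# The 30 exam scores listed in the text.
scores = [86, 80, 25, 77, 73, 76, 100, 90, 69, 93,
          90, 83, 70, 73, 73, 70, 90, 83, 71, 95,
          40, 58, 68, 69, 100, 78, 87, 97, 92, 74]

x_bar = mean(scores)
s = stdev(scores)

def fraction_within(values, center, spread, k):
    """Fraction of values in [center - k*spread, center + k*spread]."""
    inside = [v for v in values if abs(v - center) <= k * spread]
    return len(inside) / len(values)

for k in (2, 3):
    observed = fraction_within(scores, x_bar, s, k)
    bound = 1 - 1 / k**2  # Chebyshev's guaranteed minimum fraction
    print(f"k={k}: observed {observed:.3f}, Chebyshev bound {bound:.3f}")
```

The same function can be used to compare a roughly bell-shaped data set against the 68%, 95%, and 99.7% figures of the Empirical Rule, which are approximations rather than guaranteed minimums.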

A random experiment is a mechanism that produces a definite outcome that cannot be predicted with certainty. The sample space associated with a random experiment is the set of all possible outcomes. An event is a subset of the sample space.

A random experiment is an action for which all possible outcomes can be listed, but for which the actual outcome on any given trial of the experiment cannot be predicted with certainty. In such a situation we wish to assign to each outcome, such as "rolling a two," a number called the probability of the outcome that indicates how likely it is that the outcome will occur. Similarly, we would like to assign a probability to any event, or collection of outcomes, such as "rolling an even number," which indicates how likely it is that the event will occur if the experiment is performed.

A tree diagram is a device that can be helpful in identifying all possible outcomes of a random experiment, particularly one that can be viewed as proceeding in stages.

For example, consider the sexes of the three children in a three-child family, recorded in birth order. There are two possibilities for the first child, boy or girl, so we draw two line segments coming out of a starting point, one ending in a b for "boy" and the other ending in a g for "girl." For each of these two possibilities for the first child there are two possibilities for the second child, boy or girl, so from each of the b and g we draw two line segments, one segment ending in a b and one in a g. For each of the four ending points now in the diagram there are two possibilities for the third child, so we repeat the process once more. The line segments are called branches of the tree. The right ending point of each branch is called a node. The nodes on the extreme right are the final nodes; to each one there corresponds an outcome, as shown in the figure. The resulting sample space is

S = {bbb, bbg, bgb, bgg, gbb, gbg, ggb, ggg}
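The staged construction of the tree corresponds to a Cartesian product of the choices at each stage. As a minimal sketch, Python's itertools.product can list the final nodes for three stages with two branches each; the variable name sample_space is hypothetical.

```python
from itertools import product

# Each stage of the tree has two branches: "b" (boy) or "g" (girl).
# Three stages give 2 * 2 * 2 = 8 final nodes, one per outcome.
sample_space = ["".join(path) for path in product("bg", repeat=3)]

print(sample_space)
print(len(sample_space))
```

Each element of the list is one path through the tree from the starting point to a final node, read left to right.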

The probability of an outcome e in a sample space S is a number p between 0 and 1 that measures the likelihood that e will occur on a single trial of the corresponding random experiment. The value p = 0 corresponds to the outcome e being impossible and the value p = 1 corresponds to the outcome e being certain.

The probability of an event A is the sum of the probabilities of the individual outcomes of which it is composed. It is denoted P(A). The following formula expresses the content of the definition of the probability of an event: if an event E is $E = \{e_1, e_2, \ldots, e_k\}$, then

$P(E) = P(e_1) + P(e_2) + \cdots + P(e_k)$
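A short Python sketch of this definition follows. It assumes, purely for illustration, a fair six-sided die in which each outcome has probability 1/6; the names outcome_probs and event_probability are hypothetical.

```python
from fractions import Fraction

# Assumption for illustration: a fair six-sided die, so each of the six
# outcomes has probability 1/6.
outcome_probs = {face: Fraction(1, 6) for face in range(1, 7)}

def event_probability(event, probs):
    """P(E): the sum of the probabilities of the outcomes that make up E."""
    return sum(probs[e] for e in event)

even = {2, 4, 6}  # the event "rolling an even number"
p_even = event_probability(even, outcome_probs)

print(p_even)
```

Exact fractions are used instead of floating-point numbers so that the probabilities of all outcomes in the sample space add up to exactly 1.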

8  Multiplicative rules, Bayes' rule
10, 11  Topic 3: Random variables and probability distributions. Concept of a random variable, discrete probability distributions
12  Continuous probability distributions
15  Topic 4: Mathematical expectation. Which investment should be chosen, considering several probable economic scenarios and using the criterion of maximum expected monetary value? Mean of a random variable
16  Variance of random variables
17  Means and variances of linear combinations
18, 19  Topic 5: Discrete probability distributions. Binomial
20  Hypergeometric

A random variable is a numerical quantity that is generated by a random experiment.