Midterm Self Tests

Upload: walter-golden

Post on 24-Nov-2014

CHAPTER 1

Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data. A statistician is an expert with at least a master's degree in mathematics or statistics, while a data analyst is anyone who works with data. Descriptive statistics is the collection, organization, presentation, and summary of data with charts or numerical summaries. Inferential statistics refers to generalizing from a sample to a population, estimating unknown parameters, drawing conclusions, and making decisions. Statistics is used in all branches of business. Statistical challenges include imperfect data, practical constraints, and ethical dilemmas. Effective technical report writing requires attention to style, grammar, organization, and proper use of tables and graphs. Business data analysts must learn to write a good executive summary and learn the 3 Ps for oral presentations: pace, planning, and practice. Statistical tools are used to test theories against empirical data. Pitfalls include nonrandom samples, incorrect sample size, and lack of causal links. The field of statistics is relatively new and continues to grow as mathematical frontiers expand.

1. Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data. TRUE
2. Inferential statistics refers to generalizing from a sample to a population, estimating unknown parameters, drawing conclusions, and making decisions. TRUE
3. Descriptive statistics refers to the collection, organization, presentation, and summary of data. TRUE
4. Using graphs and data to give authority to poor data is an example of statistical generalization. FALSE
5. A strong correlation between A and B would suggest that B must be caused by A. FALSE
6. A statistical test may be significant yet have no practical importance. TRUE
7. To protect professional integrity, any data analyst must know and follow accepted procedures, maintain data integrity, carry out accurate calculations, report procedures faithfully, protect confidential information, cite sources, and acknowledge sources of financial support. TRUE
8. In preparing an oral statistical presentation the 3 P's refer to Pace, Planning and Performance. FALSE

CHAPTER 2

A data set is an array with n rows and m columns. Data sets may be univariate (one variable), bivariate (two variables), or multivariate (three or more variables). There are two basic data types: attribute data (categories that are described by labels) and numerical data (meaningful numbers). Numerical data are discrete if the values are integers or can be counted, or continuous if any interval can contain more data values. Nominal measurements are names, ordinal measurements are ranks, interval measurements have meaningful distances between data values, and ratio measurements have meaningful ratios and a zero reference point. Time series data are observations measured at n different points in time or over sequential time intervals, while cross-sectional data are observations among n entities such as individuals, firms, or geographic regions. Among probability samples, simple random samples pick items from a list using random numbers, systematic samples take every kth item, cluster samples select geographic regions, and stratified samples take into account known population proportions. Nonprobability samples include convenience or judgment samples, gaining time but sacrificing randomness. Survey design requires attention to question wording and scale definitions. Survey techniques (mail, telephone, interview, Web, direct observation) depend on time, budget, and the nature of the questions and are subject to various sources of error.

1. Attribute data have values that are described by words rather than numbers. TRUE
2. Numerical data can be either discrete or continuous. TRUE
3. The number of checks processed at a bank in a day is an example of attribute data. FALSE
4. The weight of a bag of dog food is an example of discrete data. FALSE
5. Nominal data refer to data that can be categorized and ordered. FALSE
6. Temperature measured in degrees Fahrenheit is an example of interval data. TRUE
7. Ordinal data are data that can be ranked. TRUE
8. Generally researchers would prefer sample data rather than census data in describing some population of interest. FALSE
9. Sample bias is a result of non-randomness in a sample. TRUE
10. A sampling frame is used to help identify the target population in a statistical study. TRUE
11. Internet surveys posted on popular websites such as MSN.com rely on convenience sampling. TRUE
12. Analysis of stock market prices during the depression would require the use of time series data. TRUE
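
The probability sampling methods named in the Chapter 2 summary can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the function names, the population of 100 numbered items, and the two strata are hypothetical and do not come from the source text, and the stratified version assumes simple proportional allocation.

```python
import random

def simple_random_sample(items, n, seed=None):
    """Pick n items from a list using random numbers (no repeats)."""
    rng = random.Random(seed)
    return rng.sample(items, n)

def systematic_sample(items, k, start=0):
    """Take every kth item, beginning at position `start`."""
    return items[start::k]

def stratified_sample(strata, n, seed=None):
    """Sample each stratum in proportion to its share of the population.

    `strata` maps a label to the list of items in that stratum.
    Rounding the quotas means the total may differ slightly from n
    when the proportions do not divide evenly.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for label, members in strata.items():
        quota = round(n * len(members) / total)
        sample[label] = rng.sample(members, quota)
    return sample

# Hypothetical population: items numbered 1 through 100.
population = list(range(1, 101))

# Hypothetical strata holding 60% and 40% of the population.
strata = {"A": population[:60], "B": population[60:]}

srs = simple_random_sample(population, 10, seed=1)
every_tenth = systematic_sample(population, 10)
by_stratum = stratified_sample(strata, 10, seed=1)
```

With these inputs the systematic sample is items 1, 11, 21, and so on up to 91, and the stratified sample draws 6 items from stratum A and 4 from stratum B, mirroring the known population proportions.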

CHAPTER 3

For a set of observations on a single numerical variable, a dot plot displays the individual data values, while a frequency distribution classifies the data into classes called bins for a histogram of frequencies for each bin. The number of bins and their limits are matters left to your judgment, though Sturges' Rule offers advice on the number of bins. The line chart shows values of one or more time series variables plotted against time. A log scale is sometimes used in time series charts when data vary by orders of magnitude. The bar chart shows a numerical data value for each category of an attribute. However, a bar chart can also be used for a time series. A scatter plot can reveal the association (or lack of association) between two variables X and Y. The pie chart (showing a numerical data value for each category of an attribute if the data values are parts of a whole) is common but should be used with caution. Sometimes a simple table is the best visual display. Creating effective visual displays is an acquired skill. Excel offers a wide range of charts from which to choose. Deceptive graphs are found frequently in both media and business presentations, and the consumer should be aware of common errors.

1. The Pareto chart is used to display the "vital few" causes of problems. TRUE
2. Dot plots are similar to histograms with many bins (classes). TRUE
3. Sturges' Rule is not an ironclad requirement, but merely a suggestion. TRUE
4. Pie charts can be useful in describing attribute data. TRUE
5. Scatter plots are widely used in business, education, and science. TRUE
6. A data set with two values that are tied for the highest number of occurrences is called bimodal. TRUE
7. Frequency histograms must have equal bin widths in order to avoid visual distortion of the data. TRUE
8. A scatter plot is useful in visualizing trends in time series data. FALSE

CHAPTER 4

The mean and median describe a sample's central tendency and also indicate skewness.
The mode is useful for discrete data with a small range. The trimmed mean eliminates extreme values. The geometric mean mitigates high extremes but fails when zeros or negative values are present. The midrange is easy to calculate but is sensitive to extremes. Dispersion is typically measured by the standard deviation, while relative dispersion is given by the coefficient of variation for nonnegative data. Standardized data reveal outliers or unusual data values, and the Empirical Rule offers a comparison with a normal distribution. In measuring dispersion, the mean absolute deviation or MAD is easy to understand, but lacks nice mathematical properties. Quartiles are meaningful even for fairly small data sets, while percentiles are used only for large data sets. Box plots show the quartiles and data range. We can estimate many common descriptive statistics from grouped data. Sample coefficients of skewness and kurtosis allow more precise inferences about the shape of the population being sampled instead of relying on histograms.

1. The midrange is very sensitive to outliers. TRUE
2. A trimmed mean may be preferable to a mean when a data set has some extreme values. TRUE
3. Given the data set 10, 5, 2, 6, 3, 4, 20, the median value is 5. TRUE
4. When data are right-skewed, we expect the median to be greater than the mean. FALSE
5. If there are 19 data values, the median will have 10 values above it and 9 below it because n is odd. FALSE
6. If the standard deviations of two samples are the same, so will be their coefficients of variation. FALSE
7. A certain Health Maintenance Organization (HMO) examined the number of office visits by its members in the last year. This data would probably be skewed to the left due to low outliers. FALSE
8. Skewness and kurtosis are both measures of a distribution's dispersion. FALSE
9. The coefficient of variation is useful to compare data sets with dissimilar units of measurement. TRUE
10. Typically, outliers are any data values which fall beyond 2 standard deviations of the mean. FALSE
11. When applying the Empirical Rule to a distribution of grades, if a student scored one standard deviation below the mean she would be at the 25th percentile of the distribution. FALSE
12. A leptokurtic distribution is more sharply peaked (i.e. thinner tails) than a normal distribution. TRUE

CHAPTER 5

The sample space for a random experiment describes all possible outcomes. Simple events in a discrete sample space can be enumerated, while outcomes of a continuous sample space can only be described by a rule. An empirical probability is based on relative frequencies, a classical probability can be deduced from the nature of the experiment, and a subjective probability is based on judgment. An event's complement is every outcome except the event. The odds are the ratio of an event's probability to the probability of its complement. The union of two events is all outcomes in either or both, while the intersection is only those outcomes in both. Mutually exclusive events cannot both occur, and collectively exhaustive events cover all possibilities. Dichotomous or polytomous events are mutually exclusive and collectively exhaustive. The conditional probability of an event is its probability given that another event has occurred. Two events are independent if the conditional probability of one is the same as its unconditional probability. The joint probability of independent events is the product of their probabilities. A contingency table is a cross-tabulation of frequencies for two variables with categorical outcomes and can be used to calculate probabilities. A tree visualizes events in a sequential diagram. Bayes' Theorem shows how to revise a prior probability to obtain a conditional or posterior probability when another event's occurrence is known. The number of arrangements of sampled items drawn from a population is found with the formula for permutations (if order is important) or combinations (if order does not matter).

1. The sum of all the probabilities of simple events in a sample space equals one. TRUE
2. The probability of an event will always be a value greater than zero, but less than one. FALSE
3. The union of two events A and B is the event consisting of all outcomes in the sample space that are contained in both event A and event B. FALSE
4. The general law of addition for probabilities says P(A ∪ B) = P(A) + P(B) − P(A ∩ B). TRUE
5. Two events A and B are independent only if P(A | B) is the same as P(A). TRUE
6. For any event A, the probability of A is 0 ≤ P(A) ≤ 1. TRUE
7. If events A and B are mutually exclusive, the joint probability of the events is zero. TRUE
8. The probability of A and its complement (A') will always sum to one. TRUE
9. If P(A) = 0.50 and P(B) = 0.30 and P(A ∩ B) = 0.15, then A and B are independent events. TRUE
10. When two or more events can occur at the same time, they are said to be mutually exclusive. FALSE
11. The probability of events A or B occurring can be found by summing the probabilities of the individual events. FALSE
12. A contingency table is a cross-tabulation of frequencies for two variables with categorical outcomes, and can be used to calculate probabilities. TRUE

CHAPTER 6

A random variable assigns a numerical value to each outcome in the sample space of a stochastic process. A discrete random variable has a countable number of distinct values. Probabilities in a discrete probability distribution must be between zero and one, and must sum to one. The expected value is the mean of the distribution, measuring central tendency, and its variance is a measure of dispersion. A known distribution is described by its parameters, which imply its probability distribution function (PDF) and its cumulative distribution function (CDF). As summarized in Table 6.14, the uniform distribution has two parameters (a, b) that define its range a ≤ X ≤ b. The Bernoulli distribution has one parameter (π, the probability of success) and two outcomes (0 or 1). The binomial distribution has two parameters (n, π). It describes the sum of n independent Bernoulli random experiments with constant probability of success. It may be skewed left (π > .50) or right (π < .50) or symmetric (π = .50) but becomes less skewed as n increases. The Poisson distribution has one parameter (λ, the mean arrival rate). It describes arrivals of independent events per unit of time or space. It is always right-skewed, becoming less so as λ increases. The hypergeometric distribution has three parameters (N, n, s).
It is like a binomial, except that sampling of n items is without replacement from a finite population of N items containing s successes. The geometric distribution is a one-parameter model (π, the probability of success) that describes the number of trials until the first success. Figure 6.33 shows the relationships among these five discrete models.

1. A discrete random variable has a countable number of distinct values. TRUE
2. The expected value of a discrete random variable, E(X), is the sum of all X values weighted by their respective probabilities. TRUE
3. The outcomes from the roll of one die can be described as a discrete uniform distribution with μ = 3.5 and σ = 2.5. FALSE
4. When π = 0.7 the discrete binomial distribution is negatively skewed. TRUE
5. The hypergeometric distribution assumes that the probability of a success remains the same from one trial to the next. FALSE
6. Although the shape of the Poisson distribution is positively skewed, it becomes more nearly symmetrical if its mean becomes larger. TRUE
7. The Poisson distribution describes the number of occurrences within a randomly-chosen unit of time or space. TRUE
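
Several of the Chapter 6 answers can be checked numerically. The sketch below is an illustration, not part of the original self test: the function names are hypothetical, and it relies on the standard textbook skewness formulas for the binomial, (1 − 2π) / sqrt(nπ(1 − π)), and the Poisson, 1 / sqrt(λ). The choices n = 10, λ = 4, and λ = 16 are arbitrary values picked for the example.

```python
import math

def discrete_mean_sd(values, probs):
    """Mean and standard deviation of a discrete distribution."""
    mean = sum(x * p for x, p in zip(values, probs))
    variance = sum((x - mean) ** 2 * p for x, p in zip(values, probs))
    return mean, math.sqrt(variance)

def binomial_skewness(n, pi):
    """Skewness coefficient of a binomial(n, pi) distribution."""
    return (1 - 2 * pi) / math.sqrt(n * pi * (1 - pi))

def poisson_skewness(lam):
    """Skewness coefficient of a Poisson(lam) distribution."""
    return 1 / math.sqrt(lam)

# One fair die: six equally likely faces (discrete uniform).
faces = [1, 2, 3, 4, 5, 6]
die_mean, die_sd = discrete_mean_sd(faces, [1 / 6] * 6)

# Binomial with probability of success 0.7 (n = 10 is an assumed value).
skew_binomial = binomial_skewness(10, 0.7)

# Poisson skewness for a smaller and a larger mean (assumed values).
skew_poisson_small = poisson_skewness(4)
skew_poisson_large = poisson_skewness(16)

print(die_mean, die_sd)
print(skew_binomial)
print(skew_poisson_small, skew_poisson_large)
```

The die has a mean of 3.5 but a standard deviation of roughly 1.71 rather than 2.5, which is why the die-roll statement is marked FALSE. The binomial skewness is negative when π = 0.7, and the Poisson skewness stays positive but shrinks as the mean grows, consistent with the TRUE answers.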