Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

Upload: rosalind-oneal

Post on 25-Dec-2015



Getting Started ML 1.1

• self-introductions (Moodle mini-biographies)

• course format, syllabus, projects

• grading, communication

• goals: short run vs long run

Chapter 0

You can watch the instructor’s introductory welcome video for MBA students (posted on Moodle).

Getting Started

Textbook: David P. Doane and Lori E. Seward, Applied Statistics in Business and Economics, 4th edition (McGraw-Hill, 2013), ISBN 0077931505. This is an omnibus ISBN that includes several components (textbook, Connect access, MegaStat download). All four components are essential because this is an online course. The Oakland University campus book center (248-370-2404) has this package ISBN in stock (and can ship to you if necessary).

Getting Started

Online Resources: Homework, testing, and grading will use McGraw-Hill's Connect Plus. The Online Learning Center (OLC) has downloadable data sets for exercises and examples, as well as Big Data Sets, PowerPoint slides, self-graded practice quizzes, and step-by-step guided examples. The instructor will post mini-lectures on Moodle.

Getting Started

Course Organization: Unless otherwise indicated, online quizzes, exercises, and written projects are due by midnight on Monday of the week shown in the syllabus. Use e-mail ([email protected]) or call me (cell 248-766-7605). Note: The instructor is in the Pacific time zone (please use judgment when calling). Post questions on the Moodle forum.

Getting Started

Grading: Students will complete several written projects (50% weight, graded by instructor) and several Connect assignments with online feedback (50% weight). Basically, you will submit one assignment (Connect or Project) per week except for weeks 9 and 13. Grades will be posted on Moodle.

Getting Started

Homework using Connect

C-1 Chapters 2-3 (Sep 8)
C-2 Chapter 4 (Sep 15)
C-3 Chapters 5-6 (Sep 29)
C-4 Chapter 7 (Oct 6)
C-5 Chapter 8 (Oct 20)
C-6 Chapters 9-10 (Nov 3)
C-7 Chapter 15 (Nov 10)
C-8 Chapter 12 (Nov 17)

Note: Connect assignments allow three attempts. Online feedback increases with each attempt. Assignments will be auto-submitted on the due date. Your score will be the average of all three attempts, so it pays to try hard on each attempt. You may complete them in advance (they are accessible anytime up to the due date). Be sure to save your work when you exit Connect.

Getting Started

Projects

P-1 Describing a sample (Sep 22)
P-2 Making forecasts (Oct 13)
P-3 Regression modeling (Dec 3)

Note: For each project, submit a concise (5-10 page) report (not a spreadsheet or PowerPoint) using Microsoft Word or equivalent that answers the questions posed along with your own comments and interpretations. Strive for effective writing (see textbook Appendix I). Creativity and initiative will be rewarded. In projects done with partners or teams, submit only one report.

Goals: Short Run / Long Run

Short Run

• Complete weekly assignments successfully

• Improve Excel and report-writing skills

• Balance this course against other responsibilities

• Enjoy learning and want to learn more

Long Run

• Succeed in other MBA classes that use statistics

• Develop confidence and lose fear of quant methods

• Use resources to learn on your own (web, textbook)

Resources Available ML 1.2

• textbook, e-book

• OLC (http://www.mhhe.com/doane4e)

• Connect (http://connect.mcgraw-hill.com/class/d_doane_qmm_510_-_fall_2014)

• Moodle (https://moodle.oakland.edu/)

• MegaStat (http://www.mhhe.com/megastat)

• LearningStats (http://www.mhhe.com/doane4e)

Resources Available

Textbook, e-book

• Basically, we will cover the first 14 chapters

• Within chapters some topics get less weight

• Focus on what you need for assignments

Not covered in this class

Resources Available

Connect Plus (http://connect.mcgraw-hill.com/class/d_doane_qmm_510_-_fall_2014)

E-book: In addition to the textbook, you have an e-book

Premium content: ScreenCam videos on Excel and MegaStat

A pre-paid registration code is required to use Connect Plus

Resources Available

Connect Plus (http://connect.mcgraw-hill.com/class/d_doane_qmm_510_-_fall_2014)

Premium content: 5-minute tutorials on Excel and MegaStat

A pre-paid registration code is required to use Connect Plus and premium content

OLC (http://www.mhhe.com/doane4e)

The OLC is available to anyone (without premium content)

Resources Available

Connect Plus (http://connect.mcgraw-hill.com/class/d_doane_qmm_510_-_fall_2014)

ScreenCam tutorials on Excel statistics by Professor Doane (4 videos, 5 minutes each), if you need them

A pre-paid registration code is required to use Connect Plus and premium content

Resources Available

OLC (http://www.mhhe.com/doane4e)

Course: Big Data Sets, LearningStats, etc.

Click on a chapter: Quizzes, PowerPoints for that chapter

No registration code is required to use the OLC

Resources Available

MegaStat (http://www.mhhe.com/megastat)

Click to download: Pre-paid with code (with ISBN 0077931505)

Resources Available

MegaStat (http://www.mhhe.com/megastat)

Add-Ins tab: Click on this tab to see the MegaStat drop-down menu

Drop-down menu: Adds statistical capability to Excel

Resources Available

OLC (http://www.mhhe.com/doane4e)

Files are zipped: Download one chapter at a time

Appendix A F Tables (346.0K)
Appendix I Business Reports (1011.0K)
Unit 01 Overview of Statistics (5925.0K)
Unit 02 Data Collection (815.0K)
Unit 03 Data Presentation (9572.0K)
Unit 04 Describing Data (3337.0K)
Unit 05 Probability (478.0K)
Unit 06 Discrete Distributions (550.0K)
Unit 07 Continuous Distributions (1409.0K)
Unit 08 Estimation (2103.0K)
Unit 09 Hypothesis Tests I (1135.0K)
Unit 10 Hypothesis Tests II (420.0K)
Unit 11 ANOVA (192.0K)
Unit 12 Simple Regression (2245.0K)
Unit 13 Multiple Regression (2756.0K)
Unit 14 Time Series I (1519.0K)
Unit 15 Chi Square Tests (627.0K)
Unit 16 Nonparametric Tests (1385.0K)
Unit 17 Quality Management (1329.0K)
Unit 18 Simulation (1460.0K)

LearningStats is a supplement: nice, but not part of the textbook (demos, spreadsheets, slides)

Challenges for MBAs ML 1.3

Chapter 1

1.1 What is Statistics?

1.2 Why Study Statistics?

1.3 Uses of Statistics

1.4 Statistical Challenges

1.5 Critical Thinking

What is Statistics?

Statistics is the science of collecting, organizing, analyzing, interpreting, and presenting data.

A statistic is a single measure (number) used to summarize a sample data set; for example, the average height of students in a university.

Big Data, Big Tools

• Data mining, neural tools, simulation, spreadsheet modeling, etc.

• Costly software

• Specialized expertise required

• Huge databases (millions of records, complex file structure, sparse or missing data, proprietary concerns, privacy issues)

Uses of Statistics

Descriptive statistics: the collection, organization, presentation, and summary of data.

Inferential statistics: generalizing from a sample to a population, estimating unknown parameters, drawing conclusions, making decisions.

Why Study Statistics

• Statistical knowledge gives a company a competitive advantage against organizations that cannot understand their internal or external market data.

• Mastery of basic statistics gives an individual manager a competitive advantage as one works one’s way through the promotion process, or when one moves to a new employer.

The Ideal Data Analyst

• Is technically current (e.g., software-wise).

• Communicates well.

• Is proactive.

• Has a broad outlook.

• Is flexible.

• Focuses on the main problem.

• Meets deadlines.

• Knows his/her limitations and is willing to ask for help.

• Can deal with imperfect information.

• Has professional integrity.

Business Ethics

• Treat customers in a fair and honest manner.

• Comply with laws that prohibit discrimination.

• Ensure that products and services meet safety regulations.

• Stand behind warranties.

• Advertise in a factual and informative manner.

• Encourage employees to ask questions and voice concerns.

• Accurately report information to management.

Upholding Ethical Standards

• Know and follow accepted procedures.

• Maintain data integrity.

• Carry out accurate calculations.

• Report procedures faithfully.

• Protect confidential information.

• Cite sources.

• Acknowledge sources of financial support.

Critical Thinking

Pitfall 1: Big Conclusions from a Small Sample

Pitfall 2: Conclusions from Nonrandom Samples

Pitfall 3: Conclusions from Rare Events

Pitfall 4: Poor Survey Methods

Pitfall 5: Assuming a Causal Link

Pitfall 6: Generalization from Groups

Pitfall 7: Unconscious Bias

Pitfall 8: Significance versus Importance

Using Consultants

Hire consultants at the beginning of the project, when your team lacks certain skills or when an unbiased or informed view is needed.

Collecting Data ML 1.4

Chapter 2

Chapter Contents

2.1 Definitions

2.2 Level of Measurement

2.3 Sampling Concepts

2.4 Sampling Methods

2.5 Data Sources

2.6 Surveys

Definitions

• Observation: a single member of a collection of items that we want to study, such as a person, firm, or region.

• Variable: a characteristic of the subject or individual, such as an employee’s income or an invoice amount.

• Data Set: consists of all the values of all of the variables for all of the observations we have chosen to observe.

Time Series vs Cross-Sectional Data

Time Series Data

• Each observation in the sample represents a different equally spaced point in time (e.g., years, months, days).

• Periodicity may be annual, quarterly, monthly, weekly, daily, hourly, etc.

• We are interested in trends and patterns over time (e.g., personal bankruptcies from 1980 to 2008).

Time Series vs Cross-Sectional Data

Cross-Sectional Data

• Each observation represents a different individual unit (e.g., person) at the same point in time (e.g., monthly VISA balances).

• We are interested in variation among observations or in relationships.

• We can combine the two data types to get pooled cross-sectional and time series data.

Data Types

(Figure 2.1)

Caution: Ambiguity is introduced when continuous data are rounded to whole numbers so they seem discrete (e.g., round your weight from 166.4 to 166). When the range is large, it is usually best to treat integers as continuous data.


Level of Measurement

Level of Measurement   Characteristics                                   Example
Nominal                Categories only                                   Eye color (blue, brown, green, etc.)
Ordinal                Rank has meaning. No clear meaning to distance    Exercise frequency (often, rarely, never)
Interval               Distance has meaning                              Temperature (57° Celsius)
Ratio                  Meaningful zero exists                            Accounts payable ($21.7 million)

Level of Measurement

Nominal Measurement

• Nominal data merely identify a category.

• Nominal data can be coded numerically (e.g., 1 = Apple, 2 = Toshiba, 3 = Dell, 4 = HP, 5 = Other).

• The only mathematical operation allowed is counting (e.g., frequencies) or calculating the percent in each category.

Ordinal Measurement

• Ordinal data codes can be ranked (e.g., 1 = Frequently, 2 = Sometimes, 3 = Rarely, 4 = Never).

Level of Measurement

Ordinal Measurement

• Distance between codes is not meaningful (e.g., distance between 1 and 2, or between 2 and 3, or between 3 and 4 lacks meaning).

• Many useful statistical tests exist for ordinal data, especially in social science, marketing, and human resource research.

Interval Measurement

• Data can not only be ranked, but also have meaningful intervals between scale points (e.g., the difference between 60°F and 70°F is the same as the difference between 20°F and 30°F).

Level of Measurement

Interval Measurement

• Intervals between numbers represent distances, so math operations can be performed (e.g., take the average).

• The zero point of interval scales is arbitrary, so ratios are not meaningful (e.g., 60°F is not twice as warm as 30°F).

Ratio Measurement

• Ratio data have all properties of nominal, ordinal, and interval data types and also a meaningful zero.

• Because of this zero point, ratios of data values are meaningful (e.g., $20 million profit is twice as much as $10 million).

• Zero does not have to be observable; it is a reference point.

Likert Scales

• A special case of interval data frequently used in survey research.

• The coarseness of a Likert scale refers to the number of scale points (typically 5 or 7). Responses are often coded as numbers (e.g., 1, 2, 3, 4, 5) but technically are ordinal measurements.

• Researchers generally treat Likert scales as interval data (no true zero) so they can calculate the mean and standard deviation.

Level of Measurement

Use the following procedure to recognize data types:

Question                                               If “Yes”
Q1. Is there a meaningful zero point?                  Ratio data (statistical operations are allowed)
Q2. Are intervals between scale points meaningful?     Interval data (common statistics allowed, e.g., means and standard deviations)
Q3. Do scale points represent rankings?                Ordinal data (restricted to certain types of nonparametric statistical tests)
Q4. Are there discrete categories?                     Nominal data (only counting allowed, e.g., finding the mode)
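The four questions form a simple decision cascade: ask them in order and stop at the first "yes". A minimal Python sketch of that cascade follows; the function name and its boolean arguments are illustrative assumptions, not something from the textbook.

```python
def level_of_measurement(meaningful_zero, meaningful_intervals, ranked, categories):
    """Ask Q1 to Q4 in order; the first "yes" decides the level.

    All four arguments are booleans supplied by the analyst (hypothetical names).
    """
    if meaningful_zero:          # Q1
        return "ratio"
    if meaningful_intervals:     # Q2
        return "interval"
    if ranked:                   # Q3
        return "ordinal"
    if categories:               # Q4
        return "nominal"
    return "unknown"


# Temperature in Fahrenheit: no meaningful zero, but intervals are meaningful.
temperature_level = level_of_measurement(False, True, True, True)
# Eye color: categories only.
eye_color_level = level_of_measurement(False, False, False, True)
```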

Changing Data By Recoding

• In order to simplify data or when exact data magnitude is of little interest, ratio data can be recoded downward into ordinal or nominal measurements (but not conversely).

• For example, recode systolic blood pressure as “normal” (under 130), “elevated” (130 to 140), or “high” (over 140).

• Or recode your income (a ratio measurement) as ordinal (low, medium, high) by specifying cutoff points.

• The above recoded data are ordinal (ranking is preserved), but intervals are unequal and some information is lost.
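The blood pressure example can be sketched in a few lines of Python. The cutoffs are the ones given on the slide; the function name is a hypothetical choice, and treating exactly 130 and exactly 140 as "elevated" is an assumption about how the boundaries are read.

```python
def recode_systolic(bp):
    """Recode a ratio measurement (systolic pressure) into an ordinal category."""
    if bp < 130:
        return "normal"      # under 130
    if bp <= 140:
        return "elevated"    # 130 to 140 (boundaries assumed inclusive)
    return "high"            # over 140


readings = [118, 135, 152]   # made-up readings, for illustration only
recoded = [recode_systolic(bp) for bp in readings]
```

Once recoded, the exact readings cannot be recovered from the categories, which is why the recoding only works downward.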

Sample or Census?

• A sample involves looking only at some items selected from the population.

• A census is an examination of all items in a defined population.

• Why sample instead of census?

• Cost, time, budget constraints.

• Accuracy may be better in a sample (training, etc.).

• For example, the United States Census cannot survey every person in the population (mobility, undocumented workers, budget constraints, incomplete responses, etc.).

Sampling Concepts

Situations Where a Sample or Census May Be Preferred

Sample                   Census
Infinite population      Small population
Destructive testing      Large sample size
Timely results           Database exists
Accuracy                 Legal requirements
Cost
Sensitive information

Parameters and Statistics

• Statistics are computed from a sample of n items, chosen from a population of N items.

• Statistics can be used as estimates of parameters found in the population.

• Specific symbols are used to represent population parameters and sample statistics.

Example: If you use the symbol s, the statistician assumes that you are referring to a sample standard deviation, whereas σ would denote a population standard deviation.

Parameters and Statistics

Rule of Thumb: A population may be treated as infinite when N is at least 20 times n (i.e., when N/n ≥ 20 or, equivalently, when n/N ≤ .05).
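The rule of thumb is a one-line check. The sketch below is only an illustration; the function name is hypothetical, and the population and sample sizes are made-up numbers.

```python
def treat_as_infinite(N, n):
    """Rule of thumb: the population is effectively infinite when N/n >= 20."""
    return N / n >= 20


large_population = treat_as_infinite(N=1000, n=50)   # N/n is exactly 20
small_population = treat_as_infinite(N=78, n=20)     # N/n is about 3.9
```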

Sampling Methods

Random Sampling

Simple random sample: Use random numbers to select items from a list (e.g., VISA cardholders).

Systematic sample: Select every kth item from a list or sequence (e.g., restaurant customers).

Stratified sample: Select randomly within defined strata (e.g., by age, occupation, gender).

Cluster sample: Like stratified sampling except strata are geographical areas (e.g., zip codes).

Sampling Methods

Non-random Sampling

Judgment sample: Use expert knowledge to choose “typical” items (e.g., which employees to interview).

Convenience sample: Use a sample that happens to be available (e.g., ask co-worker opinions at lunch).

Focus groups: In-depth dialog with a representative panel of individuals (e.g., iPod users).

Sampling Methods

With or Without Replacement

• If we allow duplicates when sampling, then we are sampling with replacement.

• Duplicates are unlikely when n is much smaller than N.

• If we do not allow duplicates when sampling, then we are sampling without replacement.
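For readers who work outside Excel, here is a small Python sketch of the two approaches using the standard library. The population of 875 numbered items and the seed are assumptions chosen only for illustration.

```python
import random

rng = random.Random(510)            # fixed seed so the sketch is repeatable
population = list(range(1, 876))    # items numbered 1..875 (hypothetical list)

# With replacement: the same item may be drawn more than once.
with_replacement = rng.choices(population, k=10)

# Without replacement: every drawn item is distinct.
without_replacement = rng.sample(population, k=10)
```

Because n = 10 is tiny relative to N = 875, duplicates are unlikely even when sampling with replacement.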

Sampling Methods

Computer Methods

These are pseudo-random generators because even the best algorithms eventually repeat themselves.

Excel - Option A: Enter the Excel function =RANDBETWEEN(1,875) into 10 spreadsheet cells. Press F9 to get a new sample.

Excel - Option B: Enter the function =INT(1+875*RAND()) into 10 spreadsheet cells. Press F9 to get a new sample.

Internet: The website www.random.org will give you many kinds of excellent random numbers (integers, decimals, etc.).

Minitab: Use Minitab’s Random Data menu with the Integer option.
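The two Excel options have close analogues in Python's standard `random` module. This is a sketch for comparison, not part of the course materials; the seed is an arbitrary assumption.

```python
import random

rng = random.Random(2014)   # seed chosen only to make the sketch repeatable

# Option A analogue: randint includes both endpoints, like RANDBETWEEN(1,875).
option_a = [rng.randint(1, 875) for _ in range(10)]

# Option B analogue: scale a uniform draw from [0, 1), then truncate,
# mirroring =INT(1+875*RAND()).
option_b = [int(1 + 875 * rng.random()) for _ in range(10)]
```

Creating a new `Random` object with a different seed plays the role of pressing F9.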

Sampling Methods

Row – Column Data Arrays

• When the data are arranged in a rectangular array, an item can be chosen at random by selecting a row and column.

• For example, in the 4 x 3 array, select a random column between 1 and 3 and a random row between 1 and 4.

• This way, each item has an equal chance of being selected.
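A short Python sketch of the row-and-column idea, using a made-up 4 x 3 array of labels (the array contents and the function name are assumptions for illustration):

```python
import random


def pick_cell(array, rng):
    """Choose one item by drawing a random row and a random column."""
    row = rng.randrange(len(array))        # 0-based row index
    col = rng.randrange(len(array[0]))     # 0-based column index
    return array[row][col]


# Hypothetical 4 x 3 array of item labels.
grid = [["A1", "A2", "A3"],
        ["B1", "B2", "B3"],
        ["C1", "C2", "C3"],
        ["D1", "D2", "D3"]]

chosen = pick_cell(grid, random.Random(7))
```

Each of the 12 cells has probability (1/4)(1/3) = 1/12 of being chosen.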

Sampling Methods

Randomizing a List

• In Excel, use function =RAND() beside each row to create a column of random numbers between 0 and 1.

• Copy and paste these numbers into the same column using Paste Special > Values in order to paste only values and not the formulas.

• Sort the spreadsheet on the random number column.

Demonstration: CEO compensation (362 CEOs).
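The same three steps (attach a random number, freeze it, sort on it) can be sketched in Python. The function name and the stand-in list of 362 rank numbers are assumptions; the actual CEO data set is not reproduced here.

```python
import random


def randomize_list(items, k, seed=None):
    """Attach a random key to each item, sort on the key, keep the first k."""
    rng = random.Random(seed)
    keyed = [(rng.random(), item) for item in items]   # like a =RAND() column
    keyed.sort(key=lambda pair: pair[0])               # sort on the random column
    return [item for _, item in keyed[:k]]             # first k rows = the sample


ceo_ranks = list(range(1, 363))    # stand-in for the 362 CEOs, by rank
sample_of_ranks = randomize_list(ceo_ranks, k=10, seed=1)
```

In Python the keys are fixed as soon as they are generated, so there is no equivalent of the Paste Special > Values step.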

Sampling Methods

Randomizing a List of 362 CEOs

Before: CEOs are arranged in descending order of compensation.

Rank   Name                    Company                  Total Comp ($thou)
1      Terry S Semel           Yahoo                    230,554
2      Barry Diller            IAC/InterActiveCorp      156,168
3      William W McGuire       UnitedHealth Group       124,774
4      Howard Solomon          Forest Labs              92,116
5      George David            United Technologies      88,712
6      Lew Frankfort           Coach                    86,481
7      Edwin M Crawford        Caremark Rx              77,864
8      Ray R Irani             Occidental Petroleum     64,136
9      Angelo R Mozilo         Countrywide Financial    56,956
10     Richard D Fairbank      Capital One Financial    56,660
11     Richard M Kovacevich    Wells Fargo              53,083

After: Sorted on RAND() column. The first k CEOs are a random sample.

Rand()      Rank   Name                    Company              Total Comp ($thou)
0.0015203   254    Gary L Bloom            Veritas Software     3,492
0.0060530   173    Edmond J English        TJX Cos              6,938
0.0074301   350    William V Hickey        Sealed Air           1,049
0.0087558   202    William Clay Ford Jr    Ford Motor           5,603
0.0093715   169    David N Farr            Emerson Electric     7,154
0.0140494   305    Carl E Jones Jr         Regions Financial    2,471
0.0153532   309    James S Tisch           Loews                2,380
0.0161077   81     James E Rogers          Cinergy              14,574
0.0210922   184    Luke R Corbett          Kerr-McGee           6,435
0.0222110   242    John B Hess             Amerada Hess         3,912

Sampling Methods

Systematic Sampling

• Sample by choosing every kth item from a list, starting from a randomly chosen entry on the list.

• For example, starting at item 2, we sample every 4th item to obtain a sample of n = 20 items from a list of N = 78 items. Note that N/n = 78/20 ≈ 4 (periodicity).
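The N = 78, n = 20 example can be reproduced with list slicing. This is a minimal sketch; the function name is hypothetical, and the start position is passed in explicitly (in practice it would be chosen at random).

```python
def systematic_sample(items, start, k):
    """Take every kth item, beginning at the 1-based position `start`."""
    return items[start - 1::k]


items = list(range(1, 79))                  # items numbered 1..78 (N = 78)
sample = systematic_sample(items, start=2, k=4)
```

Starting at item 2 with k = 4 selects items 2, 6, 10, and so on up to item 78, which gives exactly n = 20 items.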

Sampling Methods

Stratified Sampling

• Requires prior information about the population.

• Applicable when the population can be divided into relatively homogeneous subgroups of known size (strata).

• A simple random sample of the desired size is taken within each stratum.
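A minimal Python sketch of stratified sampling, assuming the strata are already known and supplied as a dictionary. The stratum names, sizes, and function name are made up for illustration.

```python
import random


def stratified_sample(strata, sizes, seed=None):
    """Draw a simple random sample of the requested size within each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(members, sizes[name])
            for name, members in strata.items()}


# Hypothetical population split into two strata of known size.
strata = {"under_40": list(range(0, 100)),
          "40_and_over": list(range(100, 150))}
sizes = {"under_40": 4, "40_and_over": 2}     # sample sizes chosen by the analyst

result = stratified_sample(strata, sizes, seed=3)
```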

Sampling Methods

Cluster Sample

• Strata consist of geographical regions.

• One-stage cluster sampling: the sample consists of all elements in each of k randomly chosen subregions (clusters).

• Two-stage cluster sampling: first choose k subregions (clusters), then choose a random sample of elements within each cluster.

Sampling Methods

Cluster Sample

• Here is an example of 4 elements sampled from each of 3 randomly chosen clusters (two-stage cluster sampling).
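Two-stage cluster sampling with 3 clusters and 4 elements per cluster might be sketched as follows. The six zip-code-style cluster names and their members are invented for illustration, as is the function name.

```python
import random


def two_stage_cluster_sample(clusters, k, m, seed=None):
    """Stage 1: choose k clusters at random. Stage 2: sample m elements in each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), k)
    return {name: rng.sample(clusters[name], m) for name in chosen}


# Hypothetical clusters: six areas with 10 households each.
clusters = {f"area_{i}": [f"area_{i}_house_{j}" for j in range(10)]
            for i in range(6)}

picked = two_stage_cluster_sample(clusters, k=3, m=4, seed=11)
```

For one-stage cluster sampling, stage 2 would simply keep every element of each chosen cluster.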

Sampling Methods

Judgment Sample

• A non-probability sampling method that relies on the expertise of the sampler to choose items that are representative of the population.

• Can be affected by subconscious bias (i.e., non-randomness in the choice).

Sampling Methods

Convenience Sample

• Take advantage of whatever sample is available at that moment. A quick way to sample.

Focus Groups

• A panel of individuals chosen to be representative of a wider population, formed for open-ended discussion and idea gathering.

Describing Data Visually ML 1.5

Chapter 3

3.1 Stem-and-Leaf Displays and Dot Plots
3.2 Frequency Distributions and Histograms
3.3 Excel Charts
3.4 Line Charts
3.5 Bar Charts
3.6 Pie Charts
3.7 Scatter Plots
3.8 Tables
3.9 Deceptive Graphs

So many topics, so little time …

Describing Data

For univariate data (a set of n observations on one variable) the statistician would consider the following:

Visualizing Data

• Look and Think

Look at the data and visualize how they were collected and measured. Maybe the data values were rounded off?

• Sorting (Example: Price/Earnings Ratios)

Sort the data. Without fancy calculations, you can see the range, and get an idea of typical values. Note that these surely are rounded (price/earnings would not be exactly an integer).

Stem-and-Leaf

To visualize small integer data sets we can use a stem-and-leaf plot. It is basically a frequency tally, except that we write digits instead of tally marks. For two-digit integer data, the stem is the tens digit of the data, and the leaf is the ones digit. For the 44 P/E ratios, the stem-and-leaf plot is:

Caution: Teachers like it, but you rarely see this display in business because it only works for simple integer data (at least, without heroic modifications).

Use equally spaced stems (even if some stems are empty). The stem-and-leaf can reveal center (24 P/E ratios were in the 10–19 stem) as well as variability (the range is from 7 to 59) and shape (right-skewed, mode in the 2nd stem). In this illustration, the leaf digits have been sorted, although this is not necessary. An advantage of the stem-and-leaf is that we can retrieve the raw data. For example, the data values in the fourth stem are 31, 37, 37, 38.
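A stem-and-leaf display for two-digit integers takes only a few lines of Python. The sketch below uses a small made-up data set rather than the 44 P/E ratios (which are not listed here); only the fourth-stem values 31, 37, 37, 38 come from the text above. The function name is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return one text line per stem (tens digit), with sorted leaves (ones digits)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # Use equally spaced stems, including any that are empty.
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {digits}")
    return lines


data = [7, 12, 15, 15, 31, 37, 37, 38, 59]   # illustrative values only
display = stem_and_leaf(data)
print("\n".join(display))
```

Reading the fourth line back (stem 3, leaves 1, 7, 7, 8) recovers the raw values 31, 37, 37, 38.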

Dot Plot

Dot plots

• are easy to understand.

• reveal center, variability, and shape of the distribution.

Steps in Making a Dot Plot

1. Make a scale that covers the data range.

2. Mark the axes and label them.

3. Plot each data value as a dot above the scale at its approximate location.

Note: If more than one data value lies at about the same axis location, the dots are stacked vertically.

Dot Plot: Example

• The range is from 7 to 59.

• All but a few data values lie between 10 and 25.

• A typical “middle” data value would be around 17 or 18.

• The data are not symmetric due to a few large P/E ratios.

Caution: Dot plots work best for integers and small samples. Avoid dot plots if n is large or if you have decimal data.

Frequency Distributions

Bins and Bin Limits

• A frequency distribution is a table formed by classifying n data values into k classes (bins).

• Bin limits define the values to be included in each bin. Widths must all be the same except when we have open-ended bins.

• Frequencies are the number of observations within each bin.

• Often expressed as relative frequencies (frequency divided by the total) or percentages (relative frequency times 100).
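As a rough sketch of how such a table is built, the Python function below classifies values into k equal-width bins and reports frequencies and relative frequencies. The data, bin limits, and function name are assumptions for illustration; each bin here includes its lower limit and excludes its upper limit, which is one common convention.

```python
def frequency_distribution(values, lower, width, k):
    """Classify values into k equal-width bins starting at `lower`.

    Returns a list of (bin_lower, bin_upper, frequency, relative_frequency).
    Values outside the k bins are ignored in the counts.
    """
    counts = [0] * k
    for v in values:
        i = int((v - lower) // width)
        if 0 <= i < k:
            counts[i] += 1
    n = len(values)
    return [(lower + i * width, lower + (i + 1) * width, c, c / n)
            for i, c in enumerate(counts)]


data = [7, 12, 15, 18, 22, 31, 59]          # illustrative values only
table = frequency_distribution(data, lower=0, width=10, k=6)
frequencies = [row[2] for row in table]
```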

How Many Bins?

What is the ideal number of bins (k) to classify n data values?

Herbert Sturges proposed adding bins at a declining rate as n increases: k = 1 + log2(n) or k = 1 + 3.3 log10(n)

The Excel formula for k is =1+log(n)/log(2). Add one bin when n doubles. This is only a guideline. Use more or fewer bins to make “nice” bin limits.

Example: n = 44 P/E ratios. Sturges suggests:

k = 1 + 3.3 log10(n)

k = 1 + 3.3 log10(44)

k = 1 + 3.3(1.64345)

k = 6.42

so 6 or 7 bins seem reasonable
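The calculation is easy to check in Python. The sketch below evaluates both forms of Sturges' rule for n = 44; the function name is hypothetical. The base-2 form gives a slightly larger value than the 3.3 log10 form because 3.3 is a rounded version of 1/log10(2) ≈ 3.32.

```python
import math


def sturges_bins(n):
    """Sturges' rule in its base-2 form: k = 1 + log2(n)."""
    return 1 + math.log2(n)


k_exact = sturges_bins(44)                 # base-2 form
k_approx = 1 + 3.3 * math.log10(44)        # the rounded form used on the slide

# Doubling n adds exactly one bin under the base-2 form.
added_bins = sturges_bins(88) - sturges_bins(44)
```

Either way the guideline points to 6 or 7 bins for the P/E data.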

Histograms

A histogram is a bar chart whose Y-axis shows the frequency within each bin, and whose X-axis ticks show end points of each bin.

Consider 3 histograms for the P/E ratio data with different bin widths. In what ways do they differ? In what ways are they similar?

Shape

Prototype distribution shapes

Frequency Polygons and Ogives

• A frequency polygon connects midpoints of the histogram intervals, with extra intervals at the beginning and end so that the line will touch the X-axis. Attractive when you need to compare data sets (since more than one polygon can be plotted on the same scale).

• An ogive is a line graph of the cumulative frequencies. It is useful for finding percentiles or in comparing the shape of the sample with a known benchmark such as the normal distribution.

Examples for P/E Data Using 6 Bins

Frequency Polygons and Ogives

Examples Using 11 Bins

Scatter Plots

Scatter plots can convey patterns in (x, y) data pairs that would not be apparent from a table.

Scatter Plots

Example: Miles per gallon vs weight for 93 cars.

Effective Tables

Tips for effective tables:

1. Keep the table simple, consistent with its purpose.
2. Put summary tables in the main body of the written report.
3. Put detailed tables in an appendix (or insert a hyperlink).
4. Display the data to be compared in columns rather than rows.
5. For presentation, round off to three or four significant digits.
6. Physical table layout should guide the eye toward the comparison you wish to emphasize.
7. Row and column headings should be simple yet descriptive.
8. Within a column, use a consistent number of decimal digits.

Line Charts

Log Scales

• Arithmetic scale: distances on the Y-axis are proportional to the magnitude of the variable being displayed.

• Logarithmic scale (ratio scale): equal distances represent equal ratios.

• Use a log scale for the vertical axis when data vary over a wide range, say, by more than an order of magnitude. This will reveal more detail for smaller data values.

Page 76: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

3-76

Log Scales

A log scale is useful for time series data that might be expected to grow at a compound annual percentage rate (e.g., GDP, the national debt, or your future income). It reveals whether the quantity is growing at an increasing percent (concave upward), a constant percent (straight line), or a declining percent (concave downward).
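A quick numerical check shows why constant percent growth plots as a straight line on a log scale: each year adds the same amount to the logarithm. The two series below are hypothetical, chosen only to contrast constant percent growth with constant dollar growth.

```python
import math


def log_steps(series):
    """Year-to-year changes in log10(value), i.e. vertical steps on a log scale."""
    logs = [math.log10(v) for v in series]
    return [b - a for a, b in zip(logs, logs[1:])]


# Hypothetical series: 100 growing at a constant 7% per year.
constant_percent = [100 * 1.07 ** t for t in range(6)]
steps = log_steps(constant_percent)
# Every step equals log10(1.07), so the log-scale plot is a straight line.

# Hypothetical series: 100 growing by a constant amount of 10 per year.
constant_amount = [100 + 10 * t for t in range(6)]
shrinking = log_steps(constant_amount)
# The steps get smaller each year: a declining percent, concave downward.

print(steps)
print(shrinking)
```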


Line Charts

both growing at a constant percent?

Page 77: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

3-77

Error 1: Nonzero Origin
Error 2: Elastic Graph Proportions
Error 3: Dramatic Title and Distracting Pictures
Error 4: 3-D and Novelty Graphs
Error 5: Rotated Graphs
Error 6: Unclear Definitions or Scales
Error 7: Vague Sources
Error 8: Complex Graphs
Error 9: Gratuitous Effects
Error 10: Estimated Data
Error 11: Area Trick


Deceptive Graphs

Page 78: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

3-78

A nonzero origin will exaggerate the trend.

Deceptive Objective

Error 1: Nonzero Origin
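The exaggeration caused by a nonzero origin is simple arithmetic: the drawn height of a bar is its value minus the axis origin. The sketch below uses two invented values (96 and 100) and a hypothetical helper name.

```python
def drawn_height_ratio(a, b, origin=0.0):
    """Ratio of the drawn bar heights for values a and b
    when the vertical axis starts at `origin`."""
    return (b - origin) / (a - origin)


# Hypothetical values 96 and 100: a true increase of about 4%.
honest = drawn_height_ratio(96, 100)                # about 1.04
deceptive = drawn_height_ratio(96, 100, origin=95)  # 5.0

print(honest, deceptive)
```

With the axis starting at 95, the second bar is drawn five times as tall as the first even though the value rose by only about 4 percent.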


Deceptive Graphs

Page 79: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

3-79

• 3D is acceptable (e.g., 3D column) but harder to read data values.

• Avoid novelty charts (e.g., pyramid). They distort the data.

Error 4: 3-D and Novelty Graphs


Deceptive Graphs

Page 80: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

3-80

Trends may appear to dwindle into the distance or loom towards you. Harder to read data values. Label each data value if there is room.

Error 5: 3-D and Rotated Graphs


Deceptive Graphs

Page 81: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

3-81

• Keep your main objective in mind.
• Break graph into smaller parts if necessary.
• Use clear labels and descriptive titles.

Error 8: Complex Graphs


Deceptive Graphs

Page 82: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

0-82

Assignments ML 1.6

• Connect C-1 (covers chapters 2-3)
  • You get three tries
  • Connect gives you feedback
  • Printable if you wish
  • Deadline is midnight each Monday

• Project P-1 (data, tasks, questions)
  • Review instructions
  • Look at the data
  • Your task is to write a nice, readable report (not a spreadsheet)
  • Paste Excel graphs and tables into your Word document
  • Length is up to you

Page 83: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

0-83

Projects: General Instructions

General Instructions

For each team project, submit a short (5-10 page) report (using Microsoft Word or equivalent) that answers the questions posed. Strive for effective writing (see textbook Appendix I). Creativity and initiative will be rewarded. Avoid careless spelling and grammar. Paste graphs and computer tables or output into your written report. It may be easier to format tables in Excel and then use Paste Special > Picture to avoid weird formatting and permit sizing within Word. Allocate tasks among team members as you see fit, but all should review and proofread the report (submit only one report).

Page 84: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

0-84

Project P-1

Random teams are assigned on Moodle (submit only one report).

Data: Download from Moodle or from the instructor’s web page. Your team is assigned one crime category (but you can change it if you wish). Copy the city names and the chosen crime data column to a new spreadsheet. Delete lines (if any) with missing data.

Analysis:
(a) Sort the observations (with city names).
(b) List the top 10 and bottom 10 data values (with city names).
(c) For the entire data set, calculate the mean and median. What do they tell you about center? Would the mode be helpful for this type of data? Explain.
(d) Calculate the standard deviation.
(e) Calculate the standardized z-value for each observation.
(f) Are there outliers or unusual data values (see p. 137)? Discuss.
(g) Use MegaStat (or Minitab or Excel) to make a histogram. Describe its shape.
(h) Calculate the quartiles. Make a boxplot and describe it.
(i) Make a scatter plot of your kind of crime versus a different type of crime. What does it show?
(j) Ambitious students: Sort the database in random order (see bottom of page 36) using Excel’s function =RAND(). Copy and paste the first few sorted lines into your report to illustrate your sorting method.

Comment on anything unusual (or interesting things that you might find on the web).
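For readers who want to check their spreadsheet results, steps (a) through (d) and (h) can be sketched in Python. The data are the murder rates for the first ten metropolitan areas in the 2010 table shown in these slides; everything else is an assumption. In particular, `statistics.quantiles` uses its own default interpolation rule, so the quartiles may differ slightly from those reported by MegaStat or Excel.

```python
import statistics

# Murder rates per 100,000 for the first ten areas in the 2010 slide table.
murder = {
    "Abilene, TX": 3.1, "Akron, OH": 3.7, "Albany, GA": 8.7,
    "Albany-Schenectady-Troy, NY": 1.5, "Albuquerque, NM": 5.8,
    "Alexandria, LA": 5.8, "Allentown-Bethlehem-Easton, PA-NJ": 3.5,
    "Altoona, PA": 0.8, "Amarillo, TX": 5.7, "Ames, IA": 1.1,
}

# (a) Sort the observations, keeping the city names attached.
ranked = sorted(murder.items(), key=lambda item: item[1])

# (b) Lowest and highest values (three each here, since the subset is small).
bottom3, top3 = ranked[:3], ranked[-3:]

# (c), (d) Center and spread.
values = [rate for _, rate in ranked]
mean = statistics.mean(values)
median = statistics.median(values)
stdev = statistics.stdev(values)  # sample standard deviation (n - 1 divisor)

# (h) Quartiles; the interpolation method is Python's default, not MegaStat's.
q1, q2, q3 = statistics.quantiles(values, n=4)

print(bottom3, top3)
print(mean, median, stdev)
print(q1, q2, q3)
```

A mean above the median, as in this subset, is one hint of right skew, which the histogram in step (g) can confirm.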

Watch the video walkthrough using Voting, North Carolina Births, and CEO compensation as examples (posted on Moodle)

Page 85: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

0-85

Project P-1

your 2010 data will look like this (2005 and 2000 are also available)

Crime Rates in U.S. Metropolitan Areas, 2010 (n = 365)

All rates are per 100,000. Violent crimes: All Violent, Murder, Rape, Robbery, Assault. Property crimes: All Property, Burglary, Larceny, Car Theft.

Metropolitan Statistical Area | All Violent | Murder | Rape | Robbery | Assault | All Property | Burglary | Larceny | Car Theft
Abilene, TX M.S.A. | 423.0 | 3.1 | 48.9 | 72.7 | 298.3 | 3617.3 | 1009.0 | 2459.8 | 148.5
Akron, OH M.S.A. | 304.7 | 3.7 | 40.9 | 105.1 | 155.0 | 3185.6 | 947.7 | 2074.5 | 163.3
Albany, GA M.S.A. | 566.0 | 8.7 | 24.9 | 150.4 | 382.1 | 4512.6 | 1417.8 | 2803.4 | 291.4
Albany-Schenectady-Troy, NY M.S.A. | 310.4 | 1.5 | 21.0 | 98.5 | 189.4 | 2693.6 | 512.1 | 2076.2 | 105.4
Albuquerque, NM M.S.A. | 670.4 | 5.8 | 44.8 | 124.3 | 495.6 | 3896.1 | 920.6 | 2586.2 | 389.4
Alexandria, LA M.S.A. | 638.0 | 5.8 | 23.1 | 132.3 | 476.7 | 4592.9 | 1203.3 | 3176.3 | 213.3
Allentown-Bethlehem-Easton, PA-NJ M.S.A. | 228.2 | 3.5 | 20.3 | 93.6 | 110.9 | 2298.0 | 432.2 | 1758.1 | 107.7
Altoona, PA M.S.A. | 243.6 | 0.8 | 38.0 | 49.8 | 155.0 | 1811.7 | 425.4 | 1318.2 | 68.0
Amarillo, TX M.S.A. | 513.1 | 5.7 | 40.8 | 98.9 | 367.8 | 4812.7 | 1137.2 | 3390.5 | 285.0
Ames, IA M.S.A. | 299.5 | 1.1 | 41.7 | 12.4 | 244.4 | 2528.1 | 478.6 | 1966.1 | 83.3
Anchorage, AK M.S.A. | 812.9 | 4.2 | 85.9 | 148.5 | 574.4 | 3506.3 | 416.1 | 2813.4 | 276.8
Anderson, IN M.S.A. | 205.8 | 2.3 | 33.4 | 70.6 | 99.5 | 3353.8 | 848.1 | 2294.6 | 211.1
Anderson, SC M.S.A. | 586.0 | 5.3 | 36.4 | 75.9 | 468.4 | 4707.8 | 1297.6 | 3041.7 | 368.4
Ann Arbor, MI M.S.A. | 338.5 | 1.4 | 43.2 | 69.8 | 224.0 | 2713.7 | 659.7 | 1879.5 | 174.4
Appleton, WI M.S.A. | 155.8 | 0.0 | 21.4 | 13.8 | 120.5 | 2136.7 | 378.5 | 1708.2 | 50.0
Asheville, NC M.S.A. | 229.7 | 1.9 | 21.8 | 59.9 | 146.1 | 2454.9 | 749.6 | 1534.9 | 170.3
Athens-Clarke County, GA M.S.A. | 374.9 | 4.2 | 19.6 | 70.5 | 280.5 | 3843.7 | 1018.0 | 2588.1 | 237.5
Atlanta-Sandy Springs-Marietta, GA M.S.A. | 413.8 | 6.1 | 20.9 | 149.7 | 237.1 | 3462.6 | 957.0 | 2135.7 | 370.0
Atlantic City-Hammonton, NJ M.S.A. | 529.8 | 8.0 | 18.9 | 245.5 | 257.5 | 3550.3 | 741.5 | 2685.7 | 123.1
Augusta-Richmond County, GA-SC M.S.A. | 412.9 | 10.2 | 37.4 | 156.6 | 208.7 | 4815.3 | 1355.1 | 3037.7 | 422.5
Austin-Round Rock-San Marcos, TX M.S.A. | 327.9 | 3.4 | 24.7 | 84.0 | 215.8 | 3792.0 | 754.3 | 2866.9 | 170.8
Bakersfield-Delano, CA M.S.A. | 593.0 | 9.0 | 19.9 | 148.4 | 415.7 | 3713.1 | 1148.0 | 1931.6 | 633.6
Baltimore-Towson, MD M.S.A. | 685.3 | 10.3 | 23.6 | 214.4 | 437.0 | 3090.7 | 649.5 | 2135.5 | 305.7
Bangor, ME M.S.A. | 68.4 | 2.0 | 12.6 | 27.2 | 26.6 | 3098.2 | 573.3 | 2429.3 | 95.7
Barnstable Town, MA M.S.A. | 434.6 | 0.5 | 36.1 | 57.6 | 340.3 | 2972.8 | 1116.6 | 1764.7 | 91.5
Battle Creek, MI M.S.A. | 697.6 | 4.5 | 75.3 | 109.6 | 508.3 | 3703.5 | 1145.6 | 2411.1 | 146.8
Bay City, MI M.S.A. | 335.2 | 0.9 | 78.1 | 50.8 | 205.2 | 2472.4 | 610.1 | 1776.6 | 85.7
Beaumont-Port Arthur, TX M.S.A. | 498.3 | 5.6 | 37.7 | 157.9 | 297.0 | 3865.3 | 1156.9 | 2488.4 | 220.1
Bellingham, WA M.S.A. | 267.0 | 2.5 | 44.7 | 50.6 | 169.1 | 3197.8 | 694.2 | 2372.7 | 130.8
Bend, OR M.S.A.2 | 304.9 | 4.3 | 29.0 | 30.9 | 240.7 | 2973.7 | 497.5 | 2360.2 | 116.0

Definitions
Violent crime: Murder and nonnegligent manslaughter, Forcible rape, Robbery, Aggravated assault
Property crime: Burglary, Larceny-theft, Motor vehicle theft

Page 86: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

0-86

Example: CEO Compensation

sorting is a good first step

Page 87: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

0-87

Example: CEO Compensation

Highlight all data (including the headings) and use Custom Sort

Page 88: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

0-88

Example: CEO Compensation

now you can clearly see the high and low data values (and comment on any weird data values)

Page 89: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

0-89

Example: CEO Compensation

use MegaStat’s Descriptive Statistics to get your basic stats along with a nice boxplot

Page 90: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

0-90

Example: CEO Compensation

use MegaStat’s Frequency Distributions to get a frequency table, histogram, etc

severely skewed

annotated by user

normal if logs used?

Page 91: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

0-91

Example: CEO Compensation

standardize the sorted list by subtracting the mean from each x value and then dividing by the standard deviation (or use =STANDARDIZE function)
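The standardization step can be sketched outside Excel as well. The compensation figures below are invented for illustration (they are not the CEO data from the slides), and the cutoff of 2 standard deviations for flagging a value as unusual is an assumed rule of thumb; check the textbook page cited in the project instructions for the definition to use in your report.

```python
import statistics


def standardize(values):
    """Return z = (x - mean) / s for each x, using the sample standard deviation."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    return [(x - mean) / s for x in values]


# Hypothetical compensation figures (in $ millions), invented for illustration.
pay = sorted([1.2, 1.5, 1.9, 2.3, 2.8, 3.1, 3.6, 4.0, 4.4, 21.0])
z = standardize(pay)

# Flag values more than 2 standard deviations from the mean (assumed cutoff).
unusual = [(x, round(zi, 2)) for x, zi in zip(pay, z) if abs(zi) > 2]
print(unusual)
```

Standardized values always have mean 0 and standard deviation 1, which makes a handy check that the formula was applied correctly.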

Page 92: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

0-92

Example: CEO Compensation

after standardizing the sorted list, unusual z values can be seen

Page 93: Week 1 September 1-5 Six Mini-Lectures QMM 510 Fall 2014

0-93

Example: CEO Compensation

to randomize the list, paste values of =RAND() beside data and custom sort on =RAND()
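The same idea (attach a random number to each row, then sort on it) can be sketched in Python. The function name and the `seed` parameter are hypothetical; the seed is included only so the example is repeatable, whereas Excel's =RAND() changes on every recalculation unless you paste its values.

```python
import random


def random_order(rows, seed=None):
    """Shuffle rows by sorting on a random key, mimicking the =RAND() column."""
    rng = random.Random(seed)
    # Pair each row with a random number between 0 and 1 (the pasted =RAND() values).
    keyed = [(rng.random(), row) for row in rows]
    # Custom sort on the random column only.
    keyed.sort(key=lambda pair: pair[0])
    return [row for _, row in keyed]


cities = ["Abilene, TX", "Akron, OH", "Albany, GA", "Altoona, PA", "Ames, IA"]
print(random_order(cities, seed=510))
```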