m140: sampling, relationships and computer book plotting data · 2020. 11. 23. · random numbers...

Post on 17-May-2021


M140: Sampling, Relationships and Plotting Data

Dr Jason Verrall

j.verrall@open.ac.uk

07311 188800

This tutorial will begin at 10am and will last for approximately an hour.

This tutorial will be recorded. Please let me know if you have any questions or concerns about this.

Things you might need for this tutorial:
• M140 Computer Book & Book 2
• Pen, paper & calculator
• Drink of your choice

Don’t forget to set up your audio using the Audio Wizard (in the ‘Meeting Menu’). Some headsets have independent volume controls so you may need to adjust these too.

You will also need to set up your mic if you plan on using it. Clicking the Mic symbol at the top of your Adobe Connect window will toggle it on/off.

Mic symbol states: not connected; connected and live; connected and muted

iCMA41 due on 2 December!

Good morning!


Mics will be muted until towards the end of the tutorial, when I will also stop recording.

Do use the Chat Box if you have a question during the tutorial!

I will email slides out after the tutorial.

Tutorials are enhanced by your interaction

Please vote in the polls, ask questions and work through the exercises

Feel free to ask any questions or provide feedback by email afterwards, or use the Private Chat function if you prefer during the tutorial

Sampling, Relationships and Plotting Data

• Minitab – generating lists of random numbers
  • Uniform, Normal distributions
• Sampling Methods
  • Simple random sampling - Minitab
  • Systematic random sampling
  • Stratified sampling – Minitab
  • Cluster sampling
• Exploring Relationships
  • Visual – scatterplots in Minitab
  • Least Squares Regression in Minitab

Computer Book has detailed instructions!

Generating Random Numbers

True or false? Computers cannot generate true random numbers

• Computers generate pseudo-random numbers

• Given sufficient time, patterns will emerge (the sequence eventually repeats) and the distribution will differ from that of a truly random source

• This is bad for strong encryption

• You can force Minitab to generate the same ‘random’ numbers each time, by specifying a base or a seed value

• This is only useful if you want someone else to get the same random values as you


• True random number sources
  • Random number tables
  • Dice or well-shuffled cards
  • Physical phenomena such as radioactive decay
  • Lava lamps!


• For our purposes and most scientific uses, computer-generated numbers are fine
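The effect of a seed (Minitab's 'base') can be sketched outside Minitab. The example below uses Python's standard `random` module as a stand-in for Minitab's generator; Python is not part of M140, and the seed values and the helper name `five_numbers` are arbitrary assumptions chosen for illustration.

```python
import random


def five_numbers(seed):
    """Return five pseudo-random integers from 1 to 100 for a given seed."""
    rng = random.Random(seed)  # independent generator started from a fixed seed
    return [rng.randint(1, 100) for _ in range(5)]


first_run = five_numbers(140)
second_run = five_numbers(140)   # same seed: identical 'random' numbers
other_seed = five_numbers(141)   # different seed: usually a different list

print(first_run)
print(second_run)
print(first_run == second_run)
```

Sharing the seed is what lets someone else reproduce exactly the same 'random' values.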


Random Numbers With Minitab 1

Which is the Uniform distribution and which is the Normal distribution?

• Uniform: every number has an equal chance of occurring, perfect for selecting samples
• Normal or Gaussian: numbers are drawn from a Normal distribution, so they are more likely to fall close to the mean (0)
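As a rough illustration of the difference between the two distributions, the sketch below draws values from each using Python's standard library rather than Minitab. The seed, the number of values and the helper `share_within` are assumptions made for this example only.

```python
import random
import statistics

rng = random.Random(2020)  # arbitrary seed so the run is repeatable

# 10,000 values from a Uniform distribution on [0, 1)
uniform_values = [rng.random() for _ in range(10_000)]

# 10,000 values from a Normal distribution with mean 0 and standard deviation 1
normal_values = [rng.gauss(0, 1) for _ in range(10_000)]


def share_within(values, low, high):
    """Proportion of values lying in the interval [low, high]."""
    return sum(low <= v <= high for v in values) / len(values)


# Uniform: equal-width intervals hold roughly equal shares of the values
print(share_within(uniform_values, 0.0, 0.1), share_within(uniform_values, 0.9, 1.0))

# Normal: values cluster near the mean, so far fewer fall in the tail
print(share_within(normal_values, -0.5, 0.5), share_within(normal_values, 2.0, 3.0))
print(statistics.mean(normal_values))
```

This is why the Uniform distribution is the one to use when every member of a list should be equally likely to be picked.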

Random Numbers With Minitab 2

1. Create a column(s) to receive the random numbers
2. Select Calc -> Random Data and select your distribution
   • Uniform for regular random numbers
3. Specify the receiving column, number of rows and parameters
4. Format the receiving column if necessary, e.g. specify dp

Sampling Theory & Minitab Practice


How Much Is This Forest Worth?

• A farmer wants to know how much the trees in his forest are worth
• We can’t measure every tree to determine its value… so how can we answer?

Sampling!

Exploratory Data Analysis (EDA): Tally

• Minitab function to count different values in a column
• Numeric or nominal data
• It’s tempting here to make a back-of-envelope estimation
  • This will be very rough and not suitable for our purposes
  • But it does provide an indication
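A tally simply counts how often each distinct value occurs in a column. The sketch below does the same in Python (not Minitab) with `collections.Counter`. The species counts are the ones given in the stratified sampling table later in this tutorial; rebuilding a column from them, and its row order, are assumptions made for illustration.

```python
from collections import Counter

# Rebuild a species column from the tallies used later in the tutorial
# (the row order within the column is an assumption)
species_column = (
    ["Beech"] * 52 + ["Birch"] * 66 + ["Elm"] * 41 + ["Oak"] * 36 + ["Yew"] * 5
)

tally = Counter(species_column)   # counts each distinct value
total = sum(tally.values())

for species, count in sorted(tally.items()):
    print(f"{species:<6} {count:>4} {100 * count / total:>6.1f}%")
```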

Exploratory Data Analysis (EDA): Graphical Summary

• Look at age initially to understand the spread
• Older trees should be larger and more valuable

What might account for the skew in ages?

• Look at age by tree species
  • Are trees planted in rotation or in groups?

The list of available columns changes because nominal data can be used to categorise a numeric variable

Species   Median (yr)   Range (yr)
Beech     55            22-80
Birch     20            15-23
Elm       80            0.9-98
Oak       102           80-150
Yew       124           110-150
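A summary like the table above comes from grouping a numeric variable (age) by a nominal one (species). The sketch below shows the idea in Python rather than Minitab. The tree records are hypothetical and are not the tutorial's worksheet, so the output only illustrates the method.

```python
import statistics
from collections import defaultdict

# Hypothetical (species, age in years) records; not the tutorial's data
trees = [
    ("Birch", 15), ("Birch", 20), ("Birch", 23),
    ("Oak", 80), ("Oak", 102), ("Oak", 150),
    ("Yew", 110), ("Yew", 124), ("Yew", 150),
]

# Group the numeric variable (age) by the nominal variable (species)
ages_by_species = defaultdict(list)
for species, age in trees:
    ages_by_species[species].append(age)

summary = {
    species: (statistics.median(ages), min(ages), max(ages))
    for species, ages in ages_by_species.items()
}

for species, (median_age, youngest, oldest) in sorted(summary.items()):
    print(f"{species:<6} median {median_age} yr, range {youngest}-{oldest} yr")
```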

Sampling Methods 1

• Simple Random Sampling
  • Select n at random from a list
  • With or without replacement

Sampling 1 - Minitab: Simple Random Sampling

• Each member is equally likely to be sampled
• Selecting one member does not change the chance of any other member being selected
• Replacement
  • Without: no member can be selected more than once
  • With: may select the same datum multiple times (each draw is independent), but may be better for small datasets

1. Create a column to accept the sample list
2. Open the Sample From Columns dialogue box
3. Complete required fields
   • From Column will be the serial number or index of the tree to be measured
4. Click OK
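The same idea can be sketched in Python (used here purely for illustration, not as a replacement for the Minitab steps). The population of 200 serial numbers matches the species tallies given later in the tutorial; the sample size of 30 and the seed are assumptions.

```python
import random

rng = random.Random(41)             # arbitrary seed, for a repeatable sketch
tree_ids = list(range(1, 201))      # serial numbers 1..200 (assumed population)

# Without replacement: no tree can appear twice
sample_without = rng.sample(tree_ids, k=30)

# With replacement: the same tree may be drawn more than once
sample_with = rng.choices(tree_ids, k=30)

print(sorted(sample_without))
print(sorted(sample_with))
print(len(set(sample_without)), len(set(sample_with)))
```

The last line compares the number of distinct trees in each sample: always 30 without replacement, and possibly fewer with replacement.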

Sampling Methods 2

• Systematic Random Sampling
  • Select a random start
  • Then select every nth
• Often used in industrial processes
  • Can be more representative than simple random sampling
  • Can be less representative if the sampling list is structured or ordered

Sampling 2 - Minitab: Systematic Sampling

• Sadly we can’t do this in Minitab!
  • Paper, Excel or another spreadsheet is easy

1. Calculate the sampling interval:
   • Interval = Population size / Sample size
2. Select a random number as the first sample datum
   • Use a table or generate a Uniform Distribution random number list
3. Iteratively add the interval to the prior sample index until n is reached
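The three steps above can be sketched in a few lines of Python, as an alternative to paper or a spreadsheet. Two details are assumptions rather than something the slides specify: the interval is rounded down to a whole number, and the random start is drawn from within the first interval. The function name `systematic_sample` is hypothetical.

```python
import random


def systematic_sample(population_size, sample_size, rng):
    """Return 1-based indices chosen by systematic random sampling."""
    interval = population_size // sample_size   # whole-number sampling interval
    start = rng.randint(1, interval)            # random start within the first interval
    # Add the interval repeatedly until sample_size indices have been chosen
    return [start + i * interval for i in range(sample_size)]


rng = random.Random(7)  # arbitrary seed
indices = systematic_sample(population_size=200, sample_size=30, rng=rng)
print(indices)
```

With 200 trees and a sample of 30, the interval is 6, so every sixth tree after the random start is selected.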

Sampling Methods 3: Stratified Sampling

• There are different methods for selecting stratum size
  • Distribution-matched – reflects the composition of the population (A)
  • Equal size – approximately same number of members in each stratum (B)
• Select stratum members randomly
• Can be more representative than random sampling
• Useful method if differences between strata are important

Species   Tally   Percent   Stratum A    Stratum B
Beech     52      26%       7.8 ≈ 8      6
Birch     66      33%       9.9 ≈ 10     6
Elm       41      20.5%     6.15 ≈ 6     6
Oak       36      18%       5.4 ≈ 5      6
Yew       5       2.5%      0.75 ≈ 1     6

Minitab will create a stratified sample but it is fiddly. See the end of the slide pack for a Minitab Blog article and some screenshots.
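The Stratum A column can be reproduced by multiplying each species' share of the population by the total sample size and rounding. The sketch below does this in Python, assuming a total sample size of 30 (the sum of the Stratum A column). Rounding each stratum separately does not always add back up to the intended total, although it does here.

```python
tallies = {"Beech": 52, "Birch": 66, "Elm": 41, "Oak": 36, "Yew": 5}
sample_size = 30                      # total of the Stratum A column
population = sum(tallies.values())    # 200 trees

# Stratum size = share of the population x total sample size, rounded
allocation = {
    species: round(count / population * sample_size)
    for species, count in tallies.items()
}

for species, n in allocation.items():
    exact = tallies[species] / population * sample_size
    print(f"{species:<6} {exact:5.2f} -> {n}")

print(sum(allocation.values()))
```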

Sampling Methods 4: Cluster Sampling

• Geographic method, best suited to sampling from multiple locations
• Use a random method to select a small number of locations
  • Divide locations into clusters if needed
• Choose a subsample from each of these sample locations
  • Randomly!
• Combine
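The steps above can be sketched as follows. The forest layout (10 plots of 20 trees), the number of plots chosen and the subsample size are all hypothetical, picked only to make the two stages of random selection visible.

```python
import random

rng = random.Random(3)  # arbitrary seed

# Hypothetical forest: 10 plots (clusters), each holding 20 numbered trees
plots = {
    f"plot_{p}": [f"plot_{p}_tree_{t}" for t in range(1, 21)]
    for p in range(1, 11)
}

# Step 1: randomly select a small number of locations
chosen_plots = rng.sample(sorted(plots), k=3)

# Step 2: randomly choose a subsample within each selected location
# Step 3: combine the subsamples into one sample
cluster_sample = []
for plot in chosen_plots:
    cluster_sample.extend(rng.sample(plots[plot], k=5))

print(chosen_plots)
print(cluster_sample)
```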

Sampling Methods: Golden Rules

1. Avoid the use of judgement or convenience to select samples
2. Use a good source of random values
   • Tables
   • Computer
   • Calculator
   • Dice, well-shuffled deck of cards
3. Trade off between accuracy and sample size
   • Sample size may be constrained e.g. by cost, practicality, access etc.

More to come on sample sizes

Relationships Between Variables 1

• Sometimes we have multiple variables in a system
  • Lab experiment, data analysis, machine learning, traffic survey... Endless!
• Scientists are often interested in whether there are relationships between variables
  • Why do we look for relationships between variables?
• Here are a couple of tools to help explore multiple variables
  • Is there a relationship between variable A and variable B?
  • What kind of relationship?
  • How strong?
  • Can I use this to predict the behaviour of variable B?

Relationships Between Variables 4

• What relationship would you expect the following to have?
  • Positive or negative?

• Petrol price and miles driven
• Salt intake and blood pressure
• Number of completed Unit exercises and TMA scores
• Price of an item and number of that item sold
• Temperature and ice cream sales

Relationships Between Variables - Minitab

• Tool 1: Visual exploration – scatter plot
• Tool 2: Describing & predicting – least squares regression

Scatterplots with Minitab 1


How confident are you with using scatter plots?

[Figure: four Minitab scatterplots: C2 vs C1, C3 vs C1, C5 vs C4 and C6 vs C4. In each panel the explanatory (predictor) variable is on the horizontal axis and the response variable is on the vertical axis.]

Response and Explanatory Variables 4

• Are TMA01 scores related to the total amount of time spent studying the course in weeks 1-6?

Which is the response variable?

• Explanatory variable: Time spent studying
• Response variable: TMA01 scores

Scatterplots with Minitab 1

1. Select Graph -> Scatterplot…
2. Select Simple
3. Select your X and Y variables
   • Explanatory or Predictor on X
   • Response on Y

Line of Best Fit 1

• Sometimes a line can be fitted to a scatterplot, to help explain data more easily

• This line can also be used as a prediction tool
  • Machine learning!

• But which line has the best fit?

[Figure: scatterplot of points (x) with three candidate lines labelled a, b and c]

Line of Best Fit 2

• Graph of achievement in maths against reading; the units are the average scores for 15-year-olds, by country (pisa.mtw)

• Where would you draw the regression line?


Regression 1

• A regression model is systematically fitted to every data point
  • Different methods are used to calculate the distance from many theoretical lines to each point
  • These distances are the residuals

• The line with the smallest total residuals is selected

• Here we use a linear regression model and the least squares fitting method, which selects the line with the smallest sum of squared residuals
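To make the idea of "smallest total residuals" concrete, the sketch below scores three candidate lines by their sum of squared residuals. The data points and the candidate lines are hypothetical. Note that true least squares fitting considers every possible line, not just a shortlist; this only shows how one line is judged against another.

```python
# Hypothetical data points (x, y); not taken from the tutorial
points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8), (5, 10.1)]

# Candidate lines y = m*x + c, labelled a, b and c for illustration
candidates = {"a": (1.0, 3.0), "b": (2.0, 0.0), "c": (3.0, -3.0)}


def sum_squared_residuals(m, c):
    """Sum of squared vertical distances from the points to y = m*x + c."""
    return sum((y - (m * x + c)) ** 2 for x, y in points)


scores = {name: sum_squared_residuals(m, c) for name, (m, c) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"line {name}: {score:.2f}")
print("smallest total:", best)
```

Squaring the residuals stops positive and negative distances from cancelling each other out.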


Regression 2

• Any straight line can be expressed as:
  • y = mx + C
  • m is the gradient or slope of the line
  • C is the intercept on the vertical axis
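For a single predictor, the least squares values of m and C can be calculated directly with the standard closed-form formulas: the slope is the sum of products of deviations divided by the sum of squared x deviations, and the line passes through the point of means. The sketch below applies them in Python. The data are hypothetical and lie exactly on a known line, so the result is easy to check; `least_squares_line` is an illustrative name, not a Minitab function.

```python
def least_squares_line(xs, ys):
    """Return (m, c) for the least squares line y = m*x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: sum of products of deviations / sum of squared x deviations
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    m = s_xy / s_xx
    c = mean_y - m * mean_x   # the line passes through (mean_x, mean_y)
    return m, c


# Hypothetical data lying exactly on y = 2x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]
m, c = least_squares_line(xs, ys)
print(f"y = {m:.2f}x + {c:.2f}")
```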

Regression With Minitab 1

1. Select Fit Regression Model
2. Select the variables
   • Predictor = X axis
   • Response = Y axis
3. Click OK
4. Here is our y = mx + C

Regression With Minitab 5

Why might this be a poor prediction tool in some cases?

• The negative intercept suggests that anything shorter than 3 m has a negative value.
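The problem can be illustrated with a made-up line. The transcript does not reproduce the fitted equation from the Minitab output, so the slope and intercept below are invented so that the prediction crosses zero at 3 m, and treating the predictor as height in metres is also an assumption.

```python
# Hypothetical fitted line: value = m * height + c, with a negative intercept
m = 40.0     # assumed slope (value per metre)
c = -120.0   # assumed intercept


def predicted_value(height_m):
    """Predicted value from the fitted line for a given height in metres."""
    return m * height_m + c


# The prediction crosses zero where m * height + c = 0, i.e. height = -c / m
zero_crossing = -c / m

for height in (1.0, 3.0, 10.0):
    print(height, predicted_value(height))
```

Predictions outside the range of the data used to fit the line, here very short trees, should be treated with caution.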

Regression With Minitab 6

Adding a regression line
1. Select Graph -> Scatterplot
2. Select With Regression
3. Choose the X and Y variables

Regression With Minitab 8

Residuals
1. Select Scatterplot
2. Select X and Y variables
3. Select Graphs…
4. Select parameters as shown

OU Resources
• M140 materials online
  • Course Books & Screencasts
  • https://learn2.open.ac.uk/course/view.php?id=208584&area=resources
• M140 student forums
• OU Library e-books
  • https://pmt-eu.hosted.exlibrisgroup.com/permalink/f/gvehrt/TN_cdi_askewsholts_vlebooks_9781846281686
  • https://pmt-eu.hosted.exlibrisgroup.com/permalink/f/h21g24/44OPN_ALMA_DS51131243990002316
• Contact me:
  • j.verrall@open.ac.uk
  • 07311 188 800

Online Resources
• Wikipedia
• CrossValidated
  • https://stats.stackexchange.com/
• Minitab channel on YouTube:
  • https://www.youtube.com/user/MinitabInc
• Minitab help
  • https://support.minitab.com/en-us/minitab/19/

Thank you! Any questions?

Recording will be available from M140-20J Online Tutorial Room

https://learn2.open.ac.uk/mod/connecthosted/view.php?id=1644077&group=274133

Sampling With Minitab: Stratified Sampling

1. Split Worksheet by tree species
2. Create a random sample on each new worksheet for the stratum size, using same destination column
3. Stack all the sub-sheets
4. Copy the stratified sample column to a new worksheet using Subset the Data
5. Set this condition
6. Sample will appear in new sheet

https://blog.minitab.com/blog/statistics-and-quality-improvement/taking-a-stratified-sample-in-minitab-statistical-software
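The split, sample and stack logic of the Minitab steps can be sketched in Python for comparison. This is not the Minitab procedure itself: the worksheet rows are rebuilt from the species tallies, the tree ids are hypothetical, and the stratum sizes are the Stratum A values from the stratified sampling table.

```python
import random

rng = random.Random(12)  # arbitrary seed

# Hypothetical worksheet rows: (tree id, species), using the tutorial's tallies
tallies = {"Beech": 52, "Birch": 66, "Elm": 41, "Oak": 36, "Yew": 5}
stratum_sizes = {"Beech": 8, "Birch": 10, "Elm": 6, "Oak": 5, "Yew": 1}  # Stratum A

rows = []
for species, count in tallies.items():
    for _ in range(count):
        rows.append((len(rows) + 1, species))

# Step 1: split the worksheet by species
by_species = {}
for tree_id, species in rows:
    by_species.setdefault(species, []).append(tree_id)

# Steps 2-3: sample each stratum at random, then stack the results
stratified_sample = []
for species, ids in by_species.items():
    for tree_id in rng.sample(ids, k=stratum_sizes[species]):
        stratified_sample.append((tree_id, species))

print(len(stratified_sample))
print(stratified_sample)
```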
