
UNIT 6

DATA MANAGEMENT

MATH 421A

15 HOURS

Revised June 1, 00


UNIT 6: Data Management

Previous Knowledge

With the implementation of APEF Mathematics at the Intermediate level, students should be able to:

Grade 7:

- distinguish between biased and unbiased sampling

- select appropriate data collection methods

- construct a histogram

- read and make inferences from data displays

- determine measures of central tendency

- create and solve problems using the numerical definition of probability

- identify all possible outcomes of two independent events

Grade 8:

- develop and apply the concept of randomness

- construct and interpret box-and-whisker plots

- determine the effect of variations in data on the mean, median and mode

Grade 9:

- determine probabilities involving dependent and independent events

- determine theoretical probabilities of compound events

Overview:

- Sampling Techniques and Bias

- Measures of Central Tendency and 50% Box Plots

- 90% Box Plots and Applications

- Probability and Applications (Expected Values)


SCO: By the end of grade 10, students will be expected to:

F1 design and conduct experiments using statistical methods and scientific inquiry

F2 demonstrate an understanding of concerns and issues that pertain to the collection of data

F12 draw inferences about a population/sample and any bias that can be identified

F14 demonstrate an understanding of how the size of a sample affects the variation in sample results

G5 develop an understanding of sampling variability

Elaborations - Instructional Strategies/Suggestions

Sampling Techniques (8.1)

Invite student groups to explore the following questions: “If you want to know what percent of high school students on PEI know the capitals of the Canadian provinces, how would you do this and who would you ask? Would the results represent the views of the entire grade 10 population?”

Class discussion might touch on these topics:

- What does the term “population” mean?

- Is it reasonable to survey the entire population?

- If not, how do we select a representative sample to be surveyed?

The concept of bias should be introduced at this point. Bias is any influence that prevents the sample from being representative of the entire population.

Challenge student groups to determine possible ways to select a biased sample (e.g., the sample is selected only from grade 12 Canadian Studies classes). Invite students to explore ways of selecting an unbiased sample. Students should read pp. 365-367 in Mathpower 10.

Probability sampling

- Simple random: every member of the population has an equal chance of being selected.
Ex: All students’ names are put in a hat and 30 are selected.

- Systematic: every nth member of a population is selected.
Ex: If the school population is 630 and you want to select a sample of 30 students, 630 ÷ 30 = 21. Therefore, in an alphabetical student list, select every 21st student.

- Stratified: the population is divided into groups, or strata, from which random samples are taken.
Ex: The school is divided into grades and you want 30 people. Randomly pick 10 people from each grade.

- Cluster: the population is divided into groups (clusters), one group is chosen at random, and every member of that group is selected.
Ex: The school is subdivided by classes. A class is chosen randomly and all of its members are selected.

Non-probability sampling (not random)

- Convenience: no thought or effort has been put into selecting the sample; it is designed to be convenient for the sampler.
Ex: Samplers survey their friends at the cafeteria table.
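A short Python sketch can be used to compare the four probability sampling techniques above. It assumes a hypothetical school of 630 students listed alphabetically. For stratified sampling the school is split into three grades of 210, and for cluster sampling into 21 classes of 30. These group sizes, the student names and the helper function names are illustrative assumptions, not data from this unit. For systematic sampling, the sketch also assumes the common practice of choosing the starting student at random within the first interval of 21.

```python
import random

def simple_random(population, n, rng):
    """Names in a hat: every member has an equal chance and no one is picked twice."""
    return rng.sample(population, n)

def systematic(population, n, rng):
    """Every k-th member, where k = N // n (630 // 30 = 21), from a random start."""
    k = len(population) // n
    start = rng.randrange(k)  # assumed: random starting point within the first interval
    return population[start::k][:n]

def stratified(strata, per_stratum, rng):
    """A simple random sample of the same size from each stratum (e.g. each grade)."""
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, per_stratum))
    return sample

def cluster(clusters, rng):
    """Choose one whole group (e.g. one class) at random and select all its members."""
    return list(rng.choice(list(clusters.values())))

# Hypothetical school: 630 students in alphabetical order.
school = [f"student{i:03d}" for i in range(630)]
grades = {10: school[:210], 11: school[210:420], 12: school[420:]}
classes = {c: school[30 * c:30 * (c + 1)] for c in range(21)}

rng = random.Random(2018)  # fixed seed so the demonstration is repeatable
print(len(simple_random(school, 30, rng)))  # 30
print(len(systematic(school, 30, rng)))     # 30, each 21 places apart
print(len(stratified(grades, 10, rng)))     # 30, 10 from each grade
print(len(cluster(classes, rng)))           # 30, one complete class
```

All four methods produce 30 students. Only the way the students are chosen differs.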


Worthwhile Tasks for Instruction and/or Assessment Suggested Resources

Sampling Techniques (8.1)

Journal/Pencil/Paper
A survey result indicates that “...most Canadians feel that the Senate is a waste of taxpayers’ money.” What are some of the questions you should ask about this survey?
(Who was surveyed? Was the sample random across Canada? What age groups were surveyed? What socio-economic groups were surveyed?)

Pencil/Paper
Identify the population you would sample for an opinion on each topic:
a) minimum driving age
b) student parking spaces
c) fees for athletic teams
d) cafeteria food

Pencil/Paper
You intend to survey the school population to determine whether students would attend another dance this month. Describe a sampling method for each sampling technique:
a) systematic
b) convenience
c) simple random
d) stratified

Presentation
Bring an example of a recent survey in a newspaper or magazine to class and discuss the validity of the survey. Was there bias in the survey question(s)? What sampling method do you think was used?

Project
Try to find out which company conducts the surveys during the election campaign, and ask questions relating to bias and sampling methods.

Sampling Techniques

Mathpower 10 p.368 #1, 6, 11, 14, 17, 21, 24


SCO: By the end of grade 10, students will be expected to:

F12 draw inferences about a population/sample and any bias that can be identified

G2 design “yes/no” type questions

F4 construct various displays of data

Elaborations - Instructional Strategies/Suggestions

Sampling Techniques (cont’d) (8.1)

- Volunteers: members of a population choose to participate in a survey.
Ex: Interested students volunteer to participate (mail-in and phone-in surveys fall under this category).

Various Types of Bias (8.2)

- Selection (sampling) bias
This bias is created by faulty sample selection. It is much less likely to occur when probability sampling procedures are used.

- Response bias
This bias is created by faulty question or survey construction; in other words, the wording of the question influences the response. It can occur with any sampling technique.
Ex: In the question “Is it really fair that young people are not allowed to drive until they are 16?” the phrase “really fair” shows bias in the question.

- Non-response bias
This bias is created when a large number of people do not complete a survey.
Ex: Mailed-out questionnaires commonly have a poor response rate. Many people do not mail them back, so the results come only from those who chose to respond and may not represent the population. Inferences are then based on incomplete results.

Measures of Central Tendency

Generate discussion to see what students’ current knowledge is of mean, median and mode.

- Mean (x̄): the arithmetic average, i.e., the sum of the values divided by the number of values.

- Median: the middle number. Once the list is in ascending order, the median is the middle value. If there is an even number of values, the median is the average of the middle two. Half of the data is below the median and half of the data is above the median.
Ex: 1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10 (Median = 7)
Ex: 1, 2, 2, 3, 5, 5, 6, 6, 6, 7, 8, 9 (Median = 5.5)

- Mode: the most frequently occurring number.
Ex: In the first list above the mode is 8, and in the second list the mode is 6.
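The three measures for the two example lists can be checked with Python's standard `statistics` module. In this minimal sketch, `statistics.median` averages the middle two values when the count is even, as described above.

```python
import statistics

first = [1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10]     # 11 values (odd count)
second = [1, 2, 2, 3, 5, 5, 6, 6, 6, 7, 8, 9]  # 12 values (even count)

for data in (first, second):
    print("mean:", round(statistics.mean(data), 2),
          "median:", statistics.median(data),  # sorts a copy of the data internally
          "mode:", statistics.mode(data))
```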

Worthwhile Tasks for Instruction and/or Assessment Suggested Resources


Various Types of Bias (8.2)

Journal
In a short paragraph, describe in your own words the types of bias that can occur and give an example of each.

Group Activity
Study newspapers, magazines, TV commercials, etc. Find as many statements as possible that you feel are biased. Identify each one as a response, non-response or selection bias.

Project
Contact a polling company and ask for copies of the questions used to survey political party popularity during the last election. Study the questions for any bias and determine the method of sampling.

Measures of Central Tendency

Pencil/Paper
(See “Construction of box plots” at the end of this unit for an explanation of constructing box plots.) Each student in the class picks a number from 1 to 10. Write the data from the entire class on the board and find the mean, median and mode. Draw a 50% box plot.

Pencil/Paper/Estimation
A random number generator (TI-83) is used to generate 20 numbers from 1 to 100. Estimate the mean, median and mode from the data below. Then calculate the mean, median and mode and relate these to your estimates. Draw a 50% box plot.

55 100 91 95 46 75 94 17 19 53
72 71 24 75 80 24 98 6 77 19
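The calculated values can be checked quickly in Python, using only the standard `statistics` module. This data set has no single mode: 19, 24 and 75 each occur twice. For that reason the sketch uses `statistics.multimode`, which returns every most-frequent value.

```python
import statistics

data = [55, 100, 91, 95, 46, 75, 94, 17, 19, 53,
        72, 71, 24, 75, 80, 24, 98, 6, 77, 19]

print("mean:", statistics.mean(data))      # sum 1191 / 20
print("median:", statistics.median(data))  # average of the 10th and 11th sorted values
print("modes:", sorted(statistics.multimode(data)))
```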

Pencil/Paper
Listed below are the heights, in centimetres, of 35 competitors in an Olympic event. Examine the data to determine the spread (range) of the data, where the data are centred, and whether any extreme heights exist. Construct a 50% box plot of the data below.

190 192 175 180 189 184 184 187 178
175 195 185 183 187 185 182 184 195
180 187 183 185 198 181 185 180 189
185 167 184 188 183 185 189 175

Various Types of Bias

Mathpower 10 p.372 #1-13

Measures of Central Tendency

Note to teachers: To use the TI-83 as a random number generator, press MATH, arrow right to PRB, and choose 5:randInt(.

Entering randInt(1,100,20) generates 20 numbers from 1 to 100.
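Where calculators are not available, a similar list can be produced in Python. This sketch uses `random.randint(a, b)` as a stand-in for randInt; like the TI-83 command, it includes both endpoints. The helper name `rand_ints` and the seed are illustrative.

```python
import random

def rand_ints(lower, upper, count, seed=None):
    """Return `count` random integers from lower to upper, both ends included."""
    rng = random.Random(seed)  # a seed makes the list repeatable for the whole class
    return [rng.randint(lower, upper) for _ in range(count)]

numbers = rand_ints(1, 100, 20, seed=421)
print(numbers)
```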


SCO: By the end of grade 10, students will be expected to:

F5 calculate various statistics using appropriate technology, analyze and interpret displays and describe the relationships

G4 interpret and report on the results obtained from surveys and polls, and from experiments

Elaborations - Instructional Strategies/Suggestions

Measures of Central Tendency (cont’d)

For box plots we must look at the data in quarters, or quartiles.

Q1 (first quartile): the middle value of the first half of the data (i.e., the values up to, but not including, the median).
Ex: 1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10 (Q1 = 4)

Q3 (third quartile): the middle value of the second half of the data (i.e., the values after the median).
Ex: 1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10 (Q3 = 8)

Once we have determined the median and the quartiles, we can plot the data in a box plot. A box plot has the middle 50% of the values inside the box (from Q1 to Q3). The left whisker represents the first quarter of the data (from the minimum to Q1), and the right whisker represents the fourth quarter (from Q3 to the maximum).

Ex: 1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10
Now we have Q1 = 4, Median = 7 and Q3 = 8. For this example we will use a number line from 1 to 10 with a scale of 1.
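The quartile rule above can be written as a short Python function. Under this rule, Q1 and Q3 are the middle values of the halves on either side of the median, and the median itself is left out when the count is odd. This sketch follows the method described in this unit. Other software may use a slightly different quartile convention and give different Q1 and Q3 values for some data sets.

```python
def median(values):
    """Middle value of a sorted list; the average of the middle two if the count is even."""
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]
    return (values[mid - 1] + values[mid]) / 2

def five_number_summary(data):
    """Minimum, Q1, median, Q3 and maximum, using the halves method."""
    s = sorted(data)
    n = len(s)
    lower = s[: n // 2]        # values up to, but not including, the median
    upper = s[(n + 1) // 2 :]  # values after the median
    return s[0], median(lower), median(s), median(upper), s[-1]

print(five_number_summary([1, 3, 4, 4, 6, 7, 8, 8, 8, 9, 10]))  # (1, 4, 7, 8, 10)
```

The five numbers returned are the points a box plot is drawn through: the ends of the whiskers, the ends of the box, and the line inside the box.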

In general, more valid inferences can be made when the measures of central tendency are all close together; the more spread out they are, the less valid the inferences.


Worthwhile Tasks for Instruction and/or Assessment Suggested Resources

Measures of Central Tendency

Pencil/Paper
The results of an experiment to determine the effect of temperature on the speed of sound in air consisted of taking nine measurements at 10°C and nine at 22°C. The data is displayed below.

a) Draw a 50% box plot for each set of data.
b) What is the median speed at the lower temperature? At the higher temperature?
c) Between what two speeds do 50% of the data lie for each plot?
d) From your results, what do you think is the effect of an increase in temperature on the speed of sound in air?

Pencil/Paper
A survey of the weekly television viewing time of 25 female and 26 male teenagers produced the following data.

a) Find the measures of central tendency (mean, median and mode).
b) What type of sampling technique would you assume was used?
c) What conclusions can you make about the survey?

Measures of Central Tendency


SCO: By the end of grade 10, students will be expected to:

F4 construct various displays of data

F26 construct, interpret and apply 90% box plots

F30 organize and display information in many different ways with and without technology

Elaborations - Instructional Strategies/Suggestions

Measures of Central Tendency (cont’d)

Redoing the previous example using the TI-83:

Press STAT, 1:Edit, clear all lists, then enter the data in L1.

If the data must be arranged in ascending order, press STAT, 2:SortA( and sort L1, i.e., SortA(L1), where A stands for ascending.

To graph a 50% box plot:

Press 2nd STAT PLOT, 1:Plot1, and use the following settings.

The 4th Type choice (the modified box plot) does not connect outliers to the box; they are plotted as separate points. The 5th choice does connect them, because its whiskers run all the way to the minimum and maximum. Typically we will use this 5th choice, a box-and-whisker plot whose whiskers include any outliers.

To graph, set appropriate window dimensions or press ZOOM 9:ZoomStat.

Press TRACE and move the cursor across the box plot to see the minimum, Q1, the median, Q3 and the maximum.


Worthwhile Tasks for Instruction and/or Assessment Suggested Resources

Measures of Central Tendency

Pencil/Paper/Technology
A teacher has the following results, in percent, on a class test:
76, 43, 56, 74, 96, 89, 55, 66, 49, 80, 85, 93, 95, 77, 96, 70, 98, 46, 78, 55, 76, 95, 95, 96, 52, 98, 73, 95, 81, 96, 59, 94, 44, 92, 96.
Sort the data in ascending order and draw a 50% box-and-whisker plot.

Solution
Enter the data in the TI-83. Sort the data with STAT 2:SortA(L1). Graph the data on the TI-83; to see the graph, set the window dimensions by pressing ZOOM 9:ZoomStat.

To see the mean, minimum, Q1, median, Q3 and maximum, press STAT, arrow right to CALC, choose 1:1-Var Stats, press ENTER and scroll down. Looking at the sorted data, determine the mode. What inferences can be made from the graph?

Half the class has a mark above the median of 80, and the top quarter of the class has marks of 95 (Q3) or higher. Because the median is 80 and the upper whisker is short, many students in the class have very high marks. The lower whisker is long, which means that a few students with very low marks are dragging the mean down.

Note to teacher: If the box is really short, then the middle 50% of students have marks very close together. If the box is long, then there is a large range of marks in the middle 50% of students.

Communication/Journal
Make inferences about the following box plot.

The median is high, at about 85%, and the upper whisker is short, so many marks are concentrated near the top. The range of the upper half is very small, so the upper half of the class have marks very close together. The lower half has a greater range and thus a greater dispersion of marks. Marks in the upper half are high because the median is 85%.
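The same summary can be checked in Python. This sketch assumes Python 3.8 or later for `statistics.quantiles`. For these 35 marks, its default ('exclusive') method gives the same Q1, median and Q3 (59, 80 and 95) as the halves method used in this unit. For other data sets, quartile conventions can differ between programs. The mean of about 77.7 is below the median of 80, which agrees with the observation that a few low marks pull the mean down.

```python
import statistics

marks = [76, 43, 56, 74, 96, 89, 55, 66, 49, 80, 85, 93, 95, 77, 96, 70,
         98, 46, 78, 55, 76, 95, 95, 96, 52, 98, 73, 95, 81, 96, 59, 94,
         44, 92, 96]

q1, med, q3 = statistics.quantiles(marks, n=4)  # default method='exclusive'
print("n =", len(marks))
print("min:", min(marks), "Q1:", q1, "median:", med, "Q3:", q3, "max:", max(marks))
print("mean:", round(statistics.mean(marks), 1), "mode:", statistics.mode(marks))
```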

Measures of Central Tendency


# marked    8   9  10  11  12  13  14  15  16  17  18  19
Frequency   1   2   6   2  14  11  21  17  15   8   2   1

SCO: By the end of grade 10, students will be expected to:

F26 construct, interpret and apply 90% box plots

Elaborations - Instructional Strategies/Suggestions

90% Box Plots

Binomial population: a population that has two possible outcomes. In other words, the answer to a question is either YES or NO.
Ex: Toss of a coin
Ex: Did you pass your test?
Ex: Are you a band student?

90% box plots combine the results of many small samples of the population. These box plots then allow us to make inferences from a sample to the population as a whole, or backwards from the population to a sample. The box plots given are for sample sizes 20, 40 and 100.

Ex: In a school of 1000 students, a sample of 20 students is surveyed. This procedure is repeated 100 times, and each time the 20 students are randomly chosen (not necessarily the same students). This gives us the data to create a 90% box plot for sample size 20.

In the above example, assume the population is known to be 70% enrolled in the English Program and 30% in French Immersion. When conducting a survey (as explained in the above paragraph), the data shown in the frequency table above is obtained.

In a 90% box plot, 10% of the values are contained in the two whiskers together. Out of 100 trials, 10% would be 10. In the table above, we need to count frequencies from both ends until we are as close to 10 as possible. Working our way in from both sides, the closest we get to 10 is 12. This is obtained by using the first three columns on the left (8, 9 and 10 marked: 1 + 2 + 6 = 9) and the last two columns on the right (18 and 19 marked: 2 + 1 = 3). The rest of the values are contained in the box.

Now would be a good time to show the students the entire 90% box plot table for sample size 20, so they realize that all this work has generated only one of the box plots in that table. From now on, instead of doing all this work, use the tables provided.
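The repeated survey described above can be simulated in Python to produce a frequency table like the one for sample size 20. This minimal sketch assumes each student in a sample is independently in the English Program with probability 0.70. That is a reasonable approximation when 20 students are drawn from a school of 1000, and it is the same model used by the TI-83 command randBin(20, .7, 100). The counts change from run to run and will not reproduce the table in this unit exactly.

```python
import random
from collections import Counter

def simulate_counts(sample_size, p, trials, rng):
    """Repeat a survey `trials` times and record how many of the `sample_size`
    students in each sample are in the program (each with probability p)."""
    counts = []
    for _ in range(trials):
        in_program = sum(rng.random() < p for _ in range(sample_size))
        counts.append(in_program)
    return counts

rng = random.Random(20)  # fixed seed so the run is repeatable
counts = simulate_counts(20, 0.70, 100, rng)
table = Counter(counts)
for number in sorted(table):
    print(number, table[number])  # "# marked" and its frequency
```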

In order to do the worthwhile tasks, you will need to be able to read the box plot tables. Instructions are given in the Addison-Wesley 10 text, pp. 548 and 556.


Worthwhile Tasks for Instruction and/or Assessment Suggested Resources

90% Box Plots (population to sample, sample to population)

Group Activity/Paper/Pencil
Divide the class into groups and have each group create a 90% box plot based on a different percent of marked items.

Note to teachers: To generate the data using the TI-83 for a situation where 80% of the school population is enrolled in the English Program, press MATH, arrow right to PRB, and choose 7:randBin( (random binomial). The arguments are (sample size, probability, number of samples). In this case the sample size is 20, the probability is 0.8, and this is repeated 100 times: randBin(20, .8, 100).

Since 80% of 20 = 16, we would expect 16 out of every 20 people surveyed to be in the English Program. The command generates 100 numbers with this expectation, but it takes into account the uncertainty in the sampling process. In the first 20 people you survey, most (or very few) of them might be in the English Program, so you may not have exactly 16 out of 20. If enough groups of 20 students are surveyed, the average should move closer to 16.

For the following problems, and those in the Suggested Resources, use the box plot tables at the end of this unit.

Pencil/Paper
20% of the school population take Canadian Studies. In a random sample of 20 students, what range of students might be taking Canadian Studies?

Pencil/Paper
If 34% of the student population regularly attends school dances, is it likely that a random sample of 40 students would contain 20 students who attend dances?

Pencil/Paper
In a random sample of 20 grade 10 students, 7 said they have a driver’s licence. Make an inference about the percent of grade 10 students who have a driver’s licence. (e.g., Math 10 p.556)

90% Box Plots

See worksheet at end of unit.

Activity
Estimating the size of a wildlife population, Math 10 p.560. Instructions for this activity are at the back of the unit.

Math 10 p.561 #1, 3-5

Problem Solving Strategies
Mathpower 10 p.397 #1, 3, 6


SCO: By the end of grade 10, students will be expected to:

G10 find probability given various conditions

Elaborations - Instructional Strategies/Suggestions

Probability (p.374)

A simple way of introducing students to the study of probability is to do an activity like the following. Each card has a letter written on it.

If the cards were placed in a hat, what is the chance (or probability) that you will draw each of the following (assume that after each draw the card is replaced):
a) a vowel
b) a consonant
c) an E
d) an X

Now challenge the students to come up with a definition of probability.

Probability: the ratio of the number of favourable outcomes to the total number of possible outcomes.
P(outcome) is the probability of getting that outcome. For example, when rolling a die, P(3) is the probability of rolling a 3, which equals 1/6.

Using a deck of 52 cards, a person draws a jack.
a) What are the chances of drawing a second jack if the first jack has been replaced? (4/52 = 1/13) This is an example of independent events: two events are independent when the outcome of the first does not affect the probability of the second.
b) What are the chances of drawing a second jack if the first jack was not replaced? (3/51 = 1/17) This is an example of dependent events: removing the first jack changes the probability of the second draw.
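The card questions can be checked exactly with Python's `fractions` module, and a short simulation shows the same values emerging from repeated draws. This is an illustrative sketch; the helper names `probability` and `simulate_second_jack` are hypothetical.

```python
import random
from fractions import Fraction

def probability(favourable, total):
    """Ratio of favourable outcomes to the total number of possible outcomes."""
    return Fraction(favourable, total)

print("P(3) on a die:", probability(1, 6))
# a) First jack replaced: the deck is back to 4 jacks in 52 cards (independent).
print("second jack, replaced:", probability(4, 52))
# b) First jack kept out: 3 jacks remain in 51 cards (dependent).
print("second jack, not replaced:", probability(3, 51))

def simulate_second_jack(replaced, trials, rng):
    """Estimate the chance the second card is a jack, given the first card was a jack."""
    hits = 0
    for _ in range(trials):
        deck = ["jack"] * 4 + ["other"] * 48
        deck.remove("jack")      # the first card drawn was a jack
        if replaced:
            deck.append("jack")  # put it back before drawing again
        hits += rng.choice(deck) == "jack"
    return hits / trials

rng = random.Random(52)
print(simulate_second_jack(True, 20000, rng), simulate_second_jack(False, 20000, rng))
```

The simulated values should land close to 1/13 (about 0.077) and 1/17 (about 0.059), but not exactly on them.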

Expected Values (8.3)

Have students play the game described in example 2, p. 381. Students need to keep track of the number of rolls needed to win. The recording table (# of rolls, Sum, Points, Total) at the end of this unit is to help students record their results for this activity.

When students have completed the activity, record the number of rolls ittook each student to win and then find the class mean (experimentalsolution).

Now go through the solution to the example to calculate the expectedvalue of each roll. Use the expected value to find the number of rollsexpected to win (theoretical solution).


Worthwhile Tasks for Instruction and/or Assessment Suggested Resources

Probability (p.374)

Pencil/Paper
A jack is drawn from a deck of 52 cards.
a) What is the probability of drawing a second jack from the deck if the first jack is replaced?
b) What is the probability of drawing a second jack from the deck if the first jack is not replaced?

Journal
How do independent and dependent events differ?

Expected Values (8.3)

Pencil/Paper
In a contest at a local coffee/donut store the prizes are as shown. What is the expected value for this contest?

Expected value = $0.94
If you spend more than $0.94 at the store, then on average you will spend more than you win.

Pencil/Paper
At the Old Home Week Exhibition there is a game of chance where you toss 2 coins. If both come up heads, you win $4. If only one comes up heads, you win $1. If neither comes up heads, you win nothing. It costs $2 to play this game. Complete the table below to determine the expected value for this game. Should you play this game?
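For the coin game, the expected value can be sketched in Python. The code lists the four equally likely results of tossing two coins and weights each payout by its probability. The payouts and the $2 cost come from the problem; fair coins are assumed.

```python
from fractions import Fraction
from itertools import product

def expected_value(outcomes):
    """Sum of (payout x probability) over every possible outcome."""
    return sum(payout * prob for payout, prob in outcomes)

def payout(toss):
    """Prize for one toss of two coins: $4 for two heads, $1 for one head, $0 for none."""
    return {2: 4, 1: 1, 0: 0}[toss.count("H")]

tosses = list(product("HT", repeat=2))  # HH, HT, TH, TT: equally likely
outcomes = [(payout(t), Fraction(1, len(tosses))) for t in tosses]

winnings = expected_value(outcomes)
cost = 2
print(f"expected winnings per game: ${float(winnings):.2f}")
print(f"expected result after paying to play: {float(winnings - cost):.2f}")
```

A negative result after paying to play means that, on average, a player loses money on each game.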

Probability
Mathpower 10

Scrabble p.362 #1, 2d, g, i-k
Rock, Scissors, Paper all
p.374 #1 do any three
p.375 #3 a-c use chart p.381

Expected Values
Mathpower 10 p.382 #2-6, 9, 10, 13

Math 10 p.575 #1-6

Journal
Design a game that will raise money for the school council during the winter carnival. (Make sure you don’t lose money for the school, but still give participants a reasonable chance of winning.)


# of rolls Sum Points Total

1

2

3

4

5

6

7

8

9

10

11

12

13

14

15

16

17

18

19

20


Estimating the Size of a Wildlife Population (binomial population: tagged or not tagged)

To estimate the number of animals in a species, wildlife biologists use a capture-recapture sampling technique. Popcorn can be used to simulate this process. Put approximately 100 kernels in each zip-lock bag, 10 of which have been spray-painted black (see Math 10 p.560 for detailed instructions). Students do not know how many kernels are in their bag. Don’t allow them to count the kernels yet; that is done in step 8.

1. Place the unmarked (natural colour) popcorn in a styrofoam cup → this represents the population at large.

2. Count the number of marked (black) popcorn → this is the number captured and released.

3. Place the marked popcorn in the cup and mix the popcorn up → this represents releasing the captured animals into the wild, where they mix with the rest of the population.

4. Pick 40 kernels from the cup (don’t look; this is the random sample) → this is the recapture.

5. Count the number of marked kernels → this represents the number of marked items in the sample.

6. Use the chart (sample size 40) to determine the percentage range of marked items in the population. For example, if there were 6 marked kernels, then by using the table we would get a percentage range of 8% to 26%.

7. Use the steps below to estimate the size of the entire population (the total number of kernels in the bag). Using the 8% to 26% range, and knowing that 10 kernels are marked, the total population could range from 38 to 125:
0.08n = 10, so n = 10 ÷ 0.08 = 125
0.26n = 10, so n = 10 ÷ 0.26 ≈ 38

Therefore there is a 90% probability that there are between 38 and 125 kernels (marked and unmarked) in your bag.

8. Count the total number of kernels in your bag. Does your prediction fall in an acceptable range?
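The calculation in step 7 can be written as a small Python function. This sketch assumes, as the worked example does, that each end of the range is rounded to the nearest whole kernel. `population_range` is a hypothetical helper name.

```python
def population_range(marked, low_percent, high_percent):
    """Solve (percent / 100) * n = marked for n at both ends of the percentage range."""
    largest = marked * 100 / low_percent    # 0.08n = 10  ->  n = 125
    smallest = marked * 100 / high_percent  # 0.26n = 10  ->  n is about 38.46
    # Round each end to the nearest whole kernel, as in the worked example.
    return round(smallest), round(largest)

low_n, high_n = population_range(10, 8, 26)
print(f"Between {low_n} and {high_n} kernels in the bag")  # Between 38 and 125 kernels in the bag
```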


Construction of box plots

If we look at 50% box plots, then 50% of the data (values) are contained in the box and the remaining 50% are contained in the two whiskers combined. For our example, 50% of 100 trials is 50. In the frequency table below (the same data used in the 90% Box Plots elaboration), we must try to get the two whiskers to add to as close to 50 as possible (the total can't be less than 50). If we work inward from the outside columns in the table, we see this development:

Column      1   2   3   4   5   6   7   8   9  10  11  12
# marked    8   9  10  11  12  13  14  15  16  17  18  19
Frequency   1   2   6   2  14  11  21  17  15   8   2   1

Combining the values of columns 1 and 12, we get 2.
Adding columns 2 and 11 to this total, we get 6.
Adding columns 3 and 10 to this total, we get 20.
Adding columns 4 and 9 to this total, we get 37.
Now, as we approach 50 (the total we want), we will probably only be able to add one extra column at a time.
Adding column 5 to this total, we get 51.
If we had chosen to add column 8 instead, we would have gotten 54.

So the best result comes from adding column 5 last, for a total of 51. The whiskers are columns 1 to 5 and 9 to 12, and the box is columns 6 to 8 (13 to 15 marked), which hold the remaining 49 trials.

The same procedure of working from outside to inside is used for 90% box plots.
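The outside-in procedure can be written as a Python function so students can check their column counting. This is a sketch of one reading of the procedure above, with three parts:

- Add a pair of outer columns as long as that does not pass the target.
- Then add single columns, choosing whichever lands closer to the target (the left column on a tie).
- Stop once the whiskers reach the target.

The function name and the tie-breaking rule are assumptions.

```python
def whisker_columns(freqs, target):
    """Return (left_columns, right_columns, whisker_total) for the outside-in method."""
    left, right = 0, len(freqs) - 1  # next unused column at each end
    total = 0
    while total < target and left <= right:
        pair = freqs[left] + freqs[right]
        if left < right and total + pair <= target:
            # Both outer columns still fit within the target: take the pair.
            total += pair
            left += 1
            right -= 1
        else:
            # Add one column only: whichever lands closer to the target.
            with_left = total + freqs[left]
            with_right = total + freqs[right]
            if abs(with_left - target) <= abs(with_right - target):
                total, left = with_left, left + 1
            else:
                total, right = with_right, right - 1
    return left, len(freqs) - 1 - right, total

marked = list(range(8, 20))
freqs = [1, 2, 6, 2, 14, 11, 21, 17, 15, 8, 2, 1]  # 100 trials, sample size 20

for target, box_percent in ((50, 50), (10, 90)):
    n_left, n_right, total = whisker_columns(freqs, target)
    box = marked[n_left:len(marked) - n_right]
    print(f"{box_percent}% box: {box[0]} to {box[-1]} marked (whiskers hold {total} trials)")
```

For this table the sketch reports a 50% box from 13 to 15 marked (whiskers 51) and a 90% box from 11 to 17 marked (whiskers 12), matching the hand calculations in this unit.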


90% Box Plot Problems

Population to Sample (given the % of the population, find the number possible in a sample)

1) 30% of students at Three Oaks take Physics. In a random sample of 20 students, estimate how many students could be taking Physics.

2) At a certain school, 80% of the students take History. In a random sample of 40 students, estimate how many students might be taking History.

3) In the town of Montague, 18% of people speak two languages. In a random sample of 100 residents, estimate how many people might speak two languages.

4) If 28% of 16-year-olds smoke, is it possible that a random sample of 40 people would contain 19 smokers?

5) The probability of correctly answering a true/false question by guessing is 50%. If you guess the answers to 40 questions, is getting 24 correct within the range you would expect 90% of the time?

6) The probability of correctly guessing the answer to a multiple choice question (each question has 5 possible answers) is 20%. If you guess the answers to 40 questions, is getting 15 correct within the range you would expect 90% of the time?

Sample to Population (given the # possible in a sample, find the % of the population)

1) 8 out of 40 randomly selected grade 10 students say that they have a part-time job. Make an inference about the percent of grade 10 students that have a part-time job.

2) In Westisle High School a survey showed that 12 out of 20 randomly selected students come from a farm home. Use the box-plots to estimate the percent of students in Westisle who come from a farm background.

3) Bluefield has 900 students. A survey showed that 26 out of 40 students were bussed to school.
a) Make an inference about the percent of students who go to school by bus.
b) Use the answer from (a) to estimate how many students are bussed.

Project/Presentation

4) Design a one-question yes/no survey about a topic of your choice. Conduct your survey with a random sample of 40 people. Use the results to make an inference about the percent of people who would answer yes on the survey question. Explain how you chose your random sample. Which method of sampling did you use? How were you able to eliminate bias in your question?
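The box plot tables give these ranges through chart reading. As a cross-check, a similar 90% range can be computed directly from the binomial distribution. This is a different technique from the simulated tables, and it may differ from them by one count at either end. This Python sketch assumes each person in a random sample independently has the trait with probability p, which is reasonable when the population is much larger than the sample. It trims at most 5% from each tail. All function names are hypothetical.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k): chance that at most k of n randomly chosen people have the trait."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def range_90(n, p):
    """Smallest and largest counts left after trimming at most 5% from each tail."""
    low = next(k for k in range(n + 1) if binomial_cdf(k, n, p) > 0.05)
    high = next(k for k in range(n + 1) if binomial_cdf(k, n, p) >= 0.95)
    return low, high

def percent_range(observed, n):
    """Whole-number population percents whose 90% range contains the observed count."""
    fits = []
    for pct in range(101):
        low, high = range_90(n, pct / 100)
        if low <= observed <= high:
            fits.append(pct)
    return min(fits), max(fits)

# Population to sample, problem 1: 30% take Physics, samples of 20.
print(range_90(20, 0.30))  # (3, 9)
# Sample to population, problem 3: 26 of 40 Bluefield students bussed, 900 students.
low, high = percent_range(26, 40)
print(f"{low}% to {high}% bussed, about {9 * low} to {9 * high} of 900 students")
```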
