gcse statistics revision notes


Upload: aihjaaz-a

Post on 18-Nov-2014


STATISTICS 4040

GCSE Statistics Revision notes

Collecting data

Sample – This is when data is collected from part of the population. There are different methods of sampling: random sampling, stratified sampling, systematic sampling, cluster sampling, quota sampling and convenience sampling.

Random sample – Where each piece of data has an equal chance of being picked.

Methods

Random number table – Tables of random numbers can be used. Here is an extract from a table of random numbers:

36015 37672 90153 67480 26237 10635 34269 01638

Split the numbers into two-digit numbers:

36 01 53 76 72 90 15 36 74 80 26 23 71 06 35 34 26 90 16 38

Then start from 36 and select numbers between 0 and 50, leaving out any numbers above 50:

36 01 15 36 26

Calculator – Use the RAN button on your calculator. For numbers from 0 to 100, type 100 Shift RAN and repeat until you have enough numbers.

Numbers in a bag – List numbers from 1 to 100, put them in a bag and select as many as are needed at random.
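The random number table method can be sketched in Python. This is only an illustration of the steps described above; the function names are made up for the example and are not part of the notes.

```python
def two_digit_numbers(blocks):
    """Join the blocks of digits and split them into two-digit numbers."""
    digits = "".join(blocks)
    return [int(digits[i:i + 2]) for i in range(0, len(digits) - 1, 2)]


def select_in_range(numbers, low, high):
    """Keep only the numbers from low to high inclusive, in table order."""
    return [n for n in numbers if low <= n <= high]


# Extract from the random number table in the notes
table = ["36015", "37672", "90153", "67480", "26237", "10635", "34269", "01638"]

pairs = two_digit_numbers(table)
selected = select_in_range(pairs, 0, 50)
print(selected[:5])  # [36, 1, 15, 36, 26]
```

Note that 36 turns up twice. If each item may only be picked once, a repeated number would normally be skipped and the next usable number taken instead.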

Stratified sample – Where the data sampled is in proportion to the population.

Example – The table shows the number of students in a school.

Year     Students
7        120
8        100
9        115
10       125
Total    460

A stratified sample of size 30 is to be taken. How many Year 7s will be picked?

Solution
The fraction of Year 7 students in the school is 120/460.

120/460 x 30 = 7.82…

Approximately 8 Year 7 students will be picked.
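The same calculation can be repeated for every year group. The sketch below assumes the table of 460 students given in the example; the function name is hypothetical.

```python
def stratified_allocation(strata, sample_size):
    """Return the exact (unrounded) share of the sample for each stratum."""
    total = sum(strata.values())
    return {name: count / total * sample_size for name, count in strata.items()}


# Numbers of students from the example table
students = {"Year 7": 120, "Year 8": 100, "Year 9": 115, "Year 10": 125}

shares = stratified_allocation(students, 30)
for year, share in shares.items():
    print(year, round(share, 2))
```

Each share is then rounded to a whole number of students. The rounded values need not add up to exactly 30, so a small adjustment may be needed at the end.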

Other sample techniques

Convenience sample – The first so many pieces of data in the list are sampled. (Quick but unlikely to be representative.)

Quota sample – The amount or quota of each group is given, e.g. 100 women were sampled.

Systematic sampling – Data is chosen at regular intervals, e.g. every 10th person.

Cluster sampling – The population is divided into groups (clusters) and then a group is chosen at random.

Khodabocus Aihjaaz Ahmad


Census – This is when all of the data in the population is taken. For example a census of the entire population of the UK is taken every 10 years.

         Advantages                           Disadvantages
Sample   Cheaper                              Not completely representative
         Less time consuming                  Possibly biased
         Less data to be analysed
Census   Unbiased                             Time consuming
         Accurate                             Expensive
         Takes account of whole population    Difficult to ensure whole population is surveyed

Types of Data

- Secondary data – This is data that has been collected by someone else.
  Advantage – No need to collect; ready to analyse.
  Disadvantage – Could be unreliable.

- Primary data – This is data collected by the person doing the analysis.
  Advantage – Should be reliable.
  Disadvantage – Collecting is time consuming.

Continuous data – This is data that is on a continuous scale (lengths, heights, weights, measurements).
Discrete data – This is data that consists of separate numbers (shoe sizes, number of people, money).

Quantitative data – This is data that has numerical values (time, heights, weights, number of people).
Qualitative data – This is data that is not numerical (colour, type).

Questionnaires

Open questions – Have no suggested answers and give people the chance to reply as they wish.
Advantage – Allows for a range of answers.
Disadvantage – Range of responses too broad; hard to analyse.

Closed questions – Give a set of answers for the person to choose from.
Advantage – Restricts responses, making them easy to analyse.
Disadvantage – Will not necessarily cover all responses.

Pilot survey (pre-test) – Small scale replica of the survey to be carried out. Used to ensure that the method of data collection/questionnaire and the data required are suitable for the bigger survey.

Leading question – Avoid questions that suggest an opinion, such as “Smoking is bad for you. Do you agree?”

Other sampling methods See page 16


Calculations

Means from frequency distributions
Example

Means from grouped data
Find the mid-point of each group and then multiply by the frequency. Sum these and then divide by the total frequency.
Example
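The grouped-data method can be sketched in Python. The heights below are hypothetical values chosen only to illustrate the steps; they are not the data from the original worked example.

```python
def grouped_mean(groups):
    """Estimate the mean from (lower, upper, frequency) groups using mid-points."""
    total_fx = sum((lower + upper) / 2 * freq for lower, upper, freq in groups)
    total_f = sum(freq for _, _, freq in groups)
    return total_fx / total_f


# Hypothetical grouped data: (lower bound, upper bound, frequency)
heights = [(140, 150, 3), (150, 160, 8), (160, 170, 7), (170, 180, 2)]

# Mid-points 145, 155, 165, 175 give 435 + 1240 + 1155 + 350 = 3180, and 3180 / 20 = 159
print(grouped_mean(heights))  # 159.0
```

For an ungrouped frequency distribution the calculation is the same, with the actual values taking the place of the mid-points.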

Standard Deviation
Variance is a measure of spread about the mean of a distribution of data.
The square root of the variance is the standard deviation.
Example 1

Example 2 – If the data is grouped (the mean for this example was found at the top of this page).
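A minimal sketch of the calculation follows. It assumes the variance is found by dividing by the total frequency (rather than by one less than it), and the data are hypothetical. For grouped data the group mid-points would be used as the values.

```python
import math


def frequency_std_dev(values, freqs):
    """Standard deviation of a frequency table, dividing by the total frequency."""
    n = sum(freqs)
    mean = sum(x * f for x, f in zip(values, freqs)) / n
    variance = sum(f * (x - mean) ** 2 for x, f in zip(values, freqs)) / n
    return math.sqrt(variance)


# Hypothetical data: values (or group mid-points) and their frequencies
values = [1, 2, 3, 4, 5]
freqs = [2, 3, 5, 3, 2]

# Mean is 45 / 15 = 3 and variance is 22 / 15
print(round(frequency_std_dev(values, freqs), 2))  # 1.21
```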


Standardised Scores
This is used to compare values from different sets of data. For example, how do you compare your score in a maths mock exam to your score in an English exam? Here's how:

Standardised score = (score – mean) / standard deviation

Example
Sam takes an exam in maths and another in English. His marks, along with the mean marks for the year and the standard deviations, are shown below.
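The formula can be tried out in Python. Sam's actual marks are not reproduced here, so the numbers below are hypothetical and serve only to show how two subjects are compared.

```python
def standardised_score(score, mean, std_dev):
    """Number of standard deviations the score lies above (or below) the mean."""
    return (score - mean) / std_dev


# Hypothetical marks, year means and standard deviations
maths = standardised_score(score=70, mean=60, std_dev=8)
english = standardised_score(score=65, mean=58, std_dev=4)

print(maths)    # 1.25
print(english)  # 1.75
```

With these made-up figures the English mark is the better performance relative to the year group, even though the raw maths mark is higher.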

Normal Distribution
Standard deviation is used to describe the normal distribution.
The normal distribution appears when large amounts of data are collected, such as heights of people.
When put into a histogram the data will form a bell shape, as below.

Scatter Diagrams


To find the equation of a line of best fit, use y = ax + b, where a is the gradient of the line and b is the intercept on the y-axis.
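One way to get a and b is to read two points off the line of best fit once it has been drawn. The sketch below assumes that approach; the two points are hypothetical.

```python
def line_through(p1, p2):
    """Return gradient a and y-intercept b of the line through two points."""
    (x1, y1), (x2, y2) = p1, p2
    a = (y2 - y1) / (x2 - x1)  # gradient = change in y / change in x
    b = y1 - a * x1            # rearranged from y1 = a * x1 + b
    return a, b


# Two hypothetical points read from a drawn line of best fit
a, b = line_through((2, 7), (10, 23))
print(a, b)  # 2.0 3.0, so the line is y = 2x + 3
```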

Causal Relationship
When a change in one variable causes a change in another variable there is said to be a causal relationship between the two.
For example:
The size of a car engine and the amount of petrol the car uses.
Sales of computers and sales of software.
Not a causal relationship: sales of chocolates and sales of clothes.

Spearman's Rank
Spearman's rank correlation coefficient is a numerical measure of the correlation between two sets of data.
–1 is a perfect negative correlation
+1 is a perfect positive correlation
0 means no correlation
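The notes do not state the formula, so the sketch below uses the standard one for ranks with no ties: coefficient = 1 – 6 x (sum of d squared) / (n(n squared – 1)), where d is the difference between each pair of ranks. The rankings are hypothetical.

```python
def spearman_rank(ranks_x, ranks_y):
    """Spearman's rank correlation coefficient for two lists of untied ranks."""
    n = len(ranks_x)
    sum_d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks_x, ranks_y))
    return 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))


# Hypothetical rankings of five items by two judges
judge_a = [1, 2, 3, 4, 5]
judge_b = [2, 1, 4, 3, 5]

# Sum of d squared is 4, so the coefficient is 1 - 24 / 120
print(round(spearman_rank(judge_a, judge_b), 2))  # 0.8
```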

Geometric mean
To work out the geometric mean of n numbers, multiply the numbers together and then take the nth root of the product.

Geometric mean of 3, 7, 4, 8:
Geometric mean = 4th root of (3 x 7 x 4 x 8) = 4th root of 672 = 5.09

In percentage change problems the geometric mean tells us the average percentage change over a period of time.

Index numbers
An index number shows the rate of change in quantity, value or price of an item over a period of time.

Index number = (quantity / quantity in base year) x 100

Example
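Both calculations can be checked with a short Python sketch. The geometric mean uses the four numbers from the notes; the prices used for the index number are hypothetical.

```python
import math


def geometric_mean(numbers):
    """Multiply the numbers together and take the nth root of the product."""
    return math.prod(numbers) ** (1 / len(numbers))


def index_number(value, base_value):
    """Value expressed as a percentage of the value in the base year."""
    return value * 100 / base_value


print(round(geometric_mean([3, 7, 4, 8]), 2))  # 5.09

# Hypothetical prices: 200 in the base year, 250 now
print(index_number(250, 200))  # 125.0
```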

Chain base index numbers


A chain base index number tells you the annual percentage change. It is found by using the previous year as the base year and then working out the relative value of an item.
Example (using the data above for the antique)
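A minimal sketch of the chain base calculation is given below. The antique's actual values are not reproduced here, so the yearly prices are hypothetical.

```python
def chain_base_indices(values):
    """Index of each year's value using the previous year's value as the base."""
    return [current * 100 / previous
            for previous, current in zip(values, values[1:])]


# Hypothetical value of an item in three consecutive years
prices = [200, 250, 300]

print(chain_base_indices(prices))  # [125.0, 120.0]
```

An index of 125 means a 25% rise on the previous year, and 120 means a 20% rise.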

Weighted Means
In a GCSE course 40% of the mark is for paper 1, 40% is for paper 2, 10% is for coursework task 1 and 10% is for coursework task 2. If a student scores the following marks we can work out the weighted mean.

Paper 1        62%
Paper 2        38%
Coursework 1   58%
Coursework 2   29%

Weighted mean = (40 x paper 1 + 40 x paper 2 + 10 x coursework 1 + 10 x coursework 2) / (40 + 40 + 10 + 10)
              = (40 x 62 + 40 x 38 + 10 x 58 + 10 x 29) / 100
              = 4870 / 100
              = 48.7%
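The weighted mean can be checked in Python using the marks and weights from the example. The function name is just for illustration.

```python
def weighted_mean(values, weights):
    """Sum of value x weight divided by the sum of the weights."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


marks = [62, 38, 58, 29]      # paper 1, paper 2, coursework 1, coursework 2
weights = [40, 40, 10, 10]    # percentage of the course for each component

# 2480 + 1520 + 580 + 290 = 4870, and 4870 / 100 = 48.7
print(weighted_mean(marks, weights))  # 48.7
```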

Time series and moving averages
A time series graph shows how values change over a period of time (days, weeks, months, quarters of years).
The moving average gives an idea of how the values are changing.
To find the 3-point moving average of 23, 22, 24, 25, 26, 29, 28: average 23, 22, 24, then average 22, 24, 25, then average 24, 25, 26, and so on.
Once you have calculated the moving averages you will need to plot these. Then draw a line of best fit through the moving averages to get a trend line.
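The moving averages for the seven values above can be worked out with a short sketch like this one (the function name is illustrative):

```python
def moving_averages(values, points):
    """Average each run of `points` consecutive values, moving along one at a time."""
    return [sum(values[i:i + points]) / points
            for i in range(len(values) - points + 1)]


data = [23, 22, 24, 25, 26, 29, 28]

print([round(m, 2) for m in moving_averages(data, 3)])
# [23.0, 23.67, 25.0, 26.67, 27.67]
```

Seven values give five 3-point moving averages, and these are the points plotted to find the trend line.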

Quality assurance
Quality control charts are used in commercial production. For example, a packet of crisps should have a weight of 50g. Samples of packets are taken at regular intervals and the mean weight calculated. Upper and lower warning and action limits are set. If the sample mean is outside the warning limits, another sample should be taken immediately. If the sample mean is outside the action limits, production should be stopped and the machines reset.
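The decision rule can be written as a small function. The notes give only the 50g target, so the warning and action limits below are hypothetical values chosen for illustration.

```python
def check_sample_mean(mean, warning, action):
    """Decide what to do with a sample mean, given (lower, upper) limit pairs."""
    lower_warning, upper_warning = warning
    lower_action, upper_action = action
    if mean < lower_action or mean > upper_action:
        return "stop production and reset machines"
    if mean < lower_warning or mean > upper_warning:
        return "take another sample immediately"
    return "no action needed"


# Hypothetical limits in grams around the 50g target
WARNING_LIMITS = (49.0, 51.0)
ACTION_LIMITS = (48.5, 51.5)

for sample_mean in (50.2, 51.2, 48.1):
    print(sample_mean, check_sample_mean(sample_mean, WARNING_LIMITS, ACTION_LIMITS))
```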


Quality control chart for ranges
Samples are taken and the range found. If the range is too large then production should be stopped.

Charts and Graphs

Box plots
A box plot shows, in order: lowest value, lower quartile, median, upper quartile, highest value.

Outliers
Any value more than 1.5 x IQR above the UQ or below the LQ is considered to be an outlier.

Cumulative frequency
The frequency of a distribution is accumulated. For example:

Mark     Frequency   Cumulative frequency
0 - 1    4           4
1 - 2    5           4 + 5 = 9
2 - 3    2           4 + 5 + 2 = 11
3 - 4    6           4 + 5 + 2 + 6 = 17
4 - 5    2           4 + 5 + 2 + 6 + 2 = 19
5 - 6    3           4 + 5 + 2 + 6 + 2 + 3 = 22
6 - 7    1           4 + 5 + 2 + 6 + 2 + 3 + 1 = 23

The values of the cumulative frequency are then plotted at the top value of each group and connected either by straight lines or a curve.
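Both ideas in this section can be checked with a short sketch. The frequencies are those in the table; the quartiles used for the outlier limits are hypothetical.

```python
from itertools import accumulate

# Frequencies from the table of marks
frequencies = [4, 5, 2, 6, 2, 3, 1]
cumulative = list(accumulate(frequencies))
print(cumulative)  # [4, 9, 11, 17, 19, 22, 23]


def outlier_limits(lq, uq):
    """Values below the first limit or above the second count as outliers."""
    iqr = uq - lq
    return lq - 1.5 * iqr, uq + 1.5 * iqr


# Hypothetical quartiles: LQ = 20 and UQ = 30, so IQR = 10
print(outlier_limits(lq=20, uq=30))  # (5.0, 45.0)
```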


Histograms
The area of the bar represents the frequency and the height of the bar is the frequency density.

Frequency density = frequency / class width
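A quick sketch of the calculation for classes of unequal width is shown below. The classes and frequencies are hypothetical.

```python
def frequency_density(frequency, class_width):
    """Height of a histogram bar whose area represents the frequency."""
    return frequency / class_width


# Hypothetical classes: (lower bound, upper bound, frequency)
classes = [(0, 10, 5), (10, 15, 10), (15, 30, 6)]

densities = [frequency_density(freq, upper - lower)
             for lower, upper, freq in classes]
print(densities)  # [0.5, 2.0, 0.4]
```

The narrow middle class has the tallest bar even though its frequency is only twice that of the first class, because its width is half as large.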

Stem and Leaf diagrams
This is a chart to help order data.
For example, 68, 72, 56, 52, 78, 53, 64, 73 can be represented in a stem and leaf diagram:

5 | 2 3 6
6 | 4 8
7 | 2 3 8

Key: 5 | 2 = 52
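The ordering can be reproduced with a short sketch. It assumes two-digit whole numbers, with the tens digit as the stem and the units digit as the leaf.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Group sorted two-digit values by tens digit (stem), listing the units (leaves)."""
    stems = defaultdict(list)
    for value in sorted(data):
        stems[value // 10].append(value % 10)
    return dict(stems)


data = [68, 72, 56, 52, 78, 53, 64, 73]
diagram = stem_and_leaf(data)

for stem, leaves in diagram.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```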

Comparative Pie charts
When comparing two sets of data using pie charts we need to take the total frequencies into account.
The areas of the two circles should be in the same ratio as the two frequencies.
The larger pie chart has the bigger frequency.
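Because the area of a circle depends on the square of its radius, the radii must be in the ratio of the square roots of the frequencies. The sketch below works out the second radius on that basis; the frequencies and the first radius are hypothetical.

```python
import math


def second_radius(radius_1, frequency_1, frequency_2):
    """Radius of the second pie chart so that areas are in the ratio of the frequencies."""
    return radius_1 * math.sqrt(frequency_2 / frequency_1)


# Hypothetical: the first chart has total frequency 100 and radius 3 cm,
# and the second chart has total frequency 400
print(second_radius(3, 100, 400))  # 6.0
```

Four times the frequency needs four times the area, which is only twice the radius.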


Compound Bar Charts See page 53

Population pyramids
These allow you to compare percentages of populations by age and gender.

Choropleth maps – Used to show population distributions


Probability

Odds
The ratio failures : successes is the odds against an event happening.
The ratio successes : failures is the odds on an event happening.
If the odds are 7:2 against, what is the probability of success?
Answer: There are 7 chances of failure to every 2 chances of success, so in (7 + 2) = 9 attempts there will be 2 successes. The probability of a success is 2/9.

Mutually exclusive events – Events that cannot happen at the same time.
Independent events – The probability of one event is not affected by whether the other event happens.
Exhaustive events – A set of events is exhaustive if the set contains all possible outcomes.

Rules of probability
P(a or b) = P(a) + P(b), when a and b are mutually exclusive
P(a and b) = P(a) x P(b), when a and b are independent

Tree diagrams
When completing a tree diagram remember each pair of branches must add to make 1.
As you travel along the branches to find possible outcomes you multiply the probabilities.
If there is more than one route to the outcome you want, add the probabilities of the routes.
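The odds example and the tree diagram rules can be sketched with exact fractions. The two-attempt tree is a hypothetical extension of the 7:2 example and assumes the attempts are independent.

```python
from fractions import Fraction


def probability_from_odds_against(failures, successes):
    """Convert odds against of failures : successes into a probability of success."""
    return Fraction(successes, failures + successes)


p = probability_from_odds_against(7, 2)
print(p)  # 2/9

# Hypothetical tree diagram: two independent attempts, each with P(success) = p
q = 1 - p                       # each pair of branches adds up to 1
exactly_one = p * q + q * p     # multiply along each route, then add the routes
print(exactly_one)  # 28/81
```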


Venn Diagrams

Discrete uniform distribution

A discrete uniform distribution has n distinct outcomes. Each outcome is equally likely, with probability equal to 1/n.
For example, a fair six-sided dice is rolled. The possible outcomes would be written as a probability distribution:

x:      1     2     3     4     5     6
p(x):   1/6   1/6   1/6   1/6   1/6   1/6

Binomial distribution

If each trial has two outcomes, a success with probability p and a failure with probability q, and n independent trials are carried out, then the probabilities are found by expanding (p + q)^n.

Example
p (success) = 0.2
q (failure) = 0.8
5 trials are carried out.
The probability distribution is (p + q)^5 = p^5 + 5p^4q + 10p^3q^2 + 10p^2q^3 + 5pq^4 + q^5

Probability of two successes: use 10p^2q^3 = 10 x (0.2)^2 x (0.8)^3 = 0.2048
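Each term of the expansion can be calculated directly. The sketch below uses Python's `math.comb` for the coefficients (10 for two successes in five trials); the function name is illustrative.

```python
from math import comb


def binomial_probability(successes, trials, p):
    """Probability of exactly `successes` successes in `trials` independent trials."""
    q = 1 - p
    return comb(trials, successes) * p ** successes * q ** (trials - successes)


# p = 0.2, q = 0.8, 5 trials, two successes: 10 x 0.2^2 x 0.8^3
print(round(binomial_probability(2, 5, 0.2), 4))  # 0.2048

# The six terms of (p + q)^5 cover every possible number of successes
distribution = [binomial_probability(k, 5, 0.2) for k in range(6)]
print(round(sum(distribution), 10))  # 1.0
```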
