SADC Course in Statistics: Statistical concepts, Module B2, Session 3

Upload: ethan-shepherd

Post on 28-Mar-2015

Page 1: SADC Course in Statistics Statistical concepts Module B2, Session3

SADC Course in Statistics

Statistical concepts

Module B2, Session 3

Objectives

At the end of this session students will be able to:

• Define statistics

• Enter simple datasets once the data entry form is set up

• Recognise the type of each variable in a dataset

• Know some ways to summarise data of each main type

• Explain how statistical investigations deal with variability

• Differentiate between descriptive and inferential statistics

Activities

1. This introduction

2. Entry of the data from the CAST survey

3. Discussion/presentation on statistical concepts
   1. Using the data entered
   2. And other case studies

4. The statistical glossary
   1. For when you need to remind yourself about terminology

What is statistics - 1?

From RSS webpage:

1. Statistics changes numbers into information.

2. Statistics is the art and science of deciding:
  • what are the appropriate data to collect,
  • deciding how to collect them efficiently
  • and then using them to give information,
  • answer questions,
  • draw inferences
  • and make decisions.

What is statistics - 2?

3. Statistics is making decisions when there is uncertainty.

• We have to make decisions all the time,
  • in everyday life,
  • and as part of our jobs.

• Statistics helps us make better decisions.

4. Statistics is NOT just collecting a lot of numbers
  • It is collecting numbers for a purpose

What is statistics - 3?

From Wikipedia:

5. Statistics is a mathematical science pertaining to the
  • collection,
  • analysis,
  • interpretation or explanation
  • and presentation
  of data.

6. Statistics are used for making informed decisions
  • and misused for other reasons
  in all areas of business and government

What is statistics - 4?

From the book “Statistics: A guide to the unknown”:

7. Statistics is the science of learning from data.

Question 1 in the practical sheet

• From these 7 definitions – in the practical sheet
  • either choose the one you think is most appropriate
  • or make your own

a) A one-line definition

b) A longer definition

Data checking and entry – Question 2

• What can we learn from the data you collected?

• Work in pairs or small groups

• First check the data from the CAST survey
  • Check each other's, not your own
  • Is it legible?
  • Can it be entered into the computer?
  • Is the response to the open-ended question clear?
  • Can the text be simplified?
  • If there are many points, ask the respondent to state which are the most important 2 or 3.

• Brief notes (as a report) to be made in the exercise sheet

• to establish the data are ready for entry
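
Part of this checking can be repeated once the answers are typed in. The sketch below is only an illustration: it assumes the closed questions are coded 1 to 5 (as described for questions 1 to 6 later in this session), and the field names `q1`, `q2`, `q3` and the two records are invented.

```python
# Hypothetical records: field names and values are invented for illustration.
VALID_CODES = {1, 2, 3, 4, 5}  # 1: Strongly agree ... 5: Strongly disagree

def find_problems(record):
    """Return the names of the question fields whose value is not a valid code."""
    return [name for name, value in record.items()
            if name.startswith("q") and value not in VALID_CODES]

records = [
    {"id": 1, "q1": 2, "q2": 1, "q3": 5},
    {"id": 2, "q1": 7, "q2": 3, "q3": None},  # 7 and a missing value need checking
]

for rec in records:
    problems = find_problems(rec)
    if problems:
        print(f"Record {rec['id']}: check {', '.join(problems)}")
```

A check like this only finds impossible values. It cannot tell whether a valid code was the one the respondent meant, so reading the sheets is still needed.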

Data entry into Excel

Just type the number. The label is automatic.

Data entry and checking – Question 3

• The data are now entered

• This can be a class exercise
  • on a single computer

• Data are entered by someone else
  • for each respondent (never by themselves)

• Then it must be checked
  • read it out
  • check by reading back

• Put the record number from the Excel form
  • on your original sheet
  • or add your names as another field in the Excel sheet

• Why might it be better to just have a number?
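
The read-back check compares what is on the paper sheet with what was typed. As a rough sketch of the same idea in code, assuming each version of a record is held as a Python dict (the field names and values below are invented):

```python
def mismatches(original, entered):
    """Return {field: (original value, entered value)} for fields that differ."""
    return {field: (original[field], entered.get(field))
            for field in original
            if original[field] != entered.get(field)}

sheet = {"q1": 2, "q2": 4, "q3": 1}   # what the respondent wrote (invented)
typed = {"q1": 2, "q2": 1, "q3": 1}   # what was typed into the form (invented)

print(mismatches(sheet, typed))  # {'q2': (4, 1)}
```

An empty result means the two versions agree on every field of the original sheet.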

Data entry and checking

• You should now have completed question 3
  • On the practical sheet

• How long do you estimate
  • For 1000 records to be entered?
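
One way to approach this estimate is to time a few records and scale up. The timing used below (90 seconds to enter and read back one record) is an invented figure, not a result from the class; substitute your own.

```python
def estimate_hours(n_records, seconds_per_record):
    """Scale a measured time per record up to n_records, in hours."""
    return n_records * seconds_per_record / 3600

# Assumed timing: 90 seconds to enter and read back one record (made up).
print(round(estimate_hours(1000, 90), 1))  # 25.0
```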

Once the data are entered

• Remember: “Statistics is the science of learning from data.”

• To learn as much as possible
  • we must have confidence in the data
  • so they must be entered and checked well

• This is what we have done in the groups

• Now the data are ready for the analysis

• Before that, look at some other data sets
  • Look for the common points
  • That apply to all the sets
  • and look for differences

Types of data - 1

• The analysis depends on the type of data

• What are the types here?

• For questions 1 to 6
  • Your answer was one of 5 categories
  • e.g. 1: Strongly agree, 2: Agree, … 5: Strongly disagree
  • These categories have an ordering
  • from strongly agree to strongly disagree

• Data of this type are called
  • categorical
  • or factor
  • or qualitative

• With the ordering, they are sometimes called
  • ordered categorical data
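
The ordering matters when the categories are tabulated: they should appear in their natural order, including any that nobody chose. A small sketch, with invented responses. Only the labels for codes 1, 2 and 5 are given above, so codes 3 and 4 are shown by number rather than guessed.

```python
from collections import Counter

# Labels given in the session; the labels for codes 3 and 4 are not stated,
# so those categories are shown by their code only.
LABELS = {1: "Strongly agree", 2: "Agree", 5: "Strongly disagree"}

def label(code):
    return LABELS.get(code, f"Code {code}")

responses = [2, 1, 2, 5, 3, 2, 1]  # invented answers to one question

counts = Counter(responses)
# Because the categories are ordered, report them in code order 1..5,
# including categories that nobody chose.
table = [(label(code), counts.get(code, 0)) for code in range(1, 6)]
for name, n in table:
    print(f"{name:18} {n}")
```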

Types of data - 2

• The last question in the survey was a sentence or two that was written

• This is also an example of qualitative data

• It is an open-ended response

• These data can be reported – and reporting the sentences can be very useful

• So it is good if they are entered as they stand

• To summarise, perhaps the responses can be coded?

Coding open-ended questions – Question 4

• This is question 4 in the practical sheet

• Looking at the responses in your groups
  • Could you code them?
  • What different codes would you have?
  • How would you enter the codes?

• Might you lose anything by coding?

• For a quick analysis
  • Could you enter the complete texts
  • And analyse the other columns
  • And then code later?

• What might you lose by coding?

Coding and entering open-ended data

• Discuss the suggestions for the codes.

• If some points are made by many students then prepare a summary,
  • how many as a frequency
  • and as a percentage

• With the small number of responses
  • there is no need to enter them into the computer

• But discuss how it could be done

• It is an example of a multiple response question
  • because respondents may give no points
  • or more than one point

• If you ask for the most important observation
  • then it becomes a single qualitative response
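
One possible way to hold coded multiple-response data is a list of codes per respondent, which may be empty. The sketch below counts how many respondents made each point and expresses it as a percentage of respondents. The codes and answers are invented for illustration.

```python
from collections import Counter

# Each respondent's open-ended answer, coded by hand into zero or more codes.
# Codes and answers are invented for illustration.
coded = [
    ["easy_to_use", "good_examples"],
    [],                                # this respondent made no points
    ["easy_to_use"],
    ["slow_computer", "easy_to_use"],
]

n_respondents = len(coded)
# set(answer) guards against counting a code twice for the same respondent.
freq = Counter(code for answer in coded for code in set(answer))
summary = {code: (n, round(100 * n / n_respondents)) for code, n in freq.items()}

for code, (n, pct) in sorted(summary.items()):
    print(f"{code:15} {n}  {pct}% of respondents")
```

Because a respondent can give several points, the percentages need not add up to 100.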

Other data sets

• Zambia rainfall data

• Tanzania agriculture survey

• Look for the layout of the data
  • is it the same as for the simple CAST survey?

• Look for the types of data

• Which are the qualitative variables?
  • are they ordered?

• Which are the quantitative variables?
  • which of them are discrete?
  • and which are continuous?
  • have any been coded to become qualitative?

Annual climatic data from Zambia

Survey data from Tanzania - 1

Survey data from Tanzania - 2

Discussion – Question 5

• The layout of the data
  • Was always the same!
  • In a rectangle

• Each row is a record
  • There are as many records (rows of data)
  • as there were respondents, or students, or units

• Each column is a variable
  • Variables can be qualitative
  • or they can be quantitative

• Discuss which type they are
  • For each data set
  • complete the tables in the practical sheet, question 5
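
The rectangular layout can be sketched in code as a list of records, each with the same variables. The tiny data set and the variable names below are invented. The type guess is deliberately crude and is an assumption of this sketch: a qualitative variable that has been entered as number codes would be wrongly reported as quantitative, which is why the types still need discussing.

```python
# A tiny invented data set in the rectangular layout:
# each dict is one record (row), each key is one variable (column).
rows = [
    {"record": 1, "sex": "F", "hhsize": 4},
    {"record": 2, "sex": "M", "hhsize": 6},
    {"record": 3, "sex": "M", "hhsize": 3},
]

def column(rows, name):
    """Return one variable (column) as a list, one value per record (row)."""
    return [row[name] for row in rows]

def variable_type(values):
    """Crude guess: numbers are quantitative, anything else is qualitative."""
    if all(isinstance(v, (int, float)) for v in values):
        return "quantitative"
    return "qualitative"

print(len(rows), "records")
for name in ("sex", "hhsize"):
    print(name, variable_type(column(rows, name)))
```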

Qualitative variables

• They are categorical

• They may be nominal (which implies there is no ordering)

• Give some examples from the Tanzania survey

• They may be ordered – as in the CAST survey

• Give an ordered example from the Tanzania survey

Examples of analysis – Tanzania survey: Question 6

• There are 3223 records,
  • but just take the 18 you can see in the figure

• Count the values for Q0123 – head of household
  • There were 6 Females and 12 Males
  • So 2/3 of the 18 households had a male head
  • That’s about 70%
  • but percentages are a bit misleading with so few numbers

• Now you give a similar summary for Q021
  • type of agricultural household

• And also Q3464
  • how often did the household have food problems
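
The count for Q0123 can be written out as a short calculation. The list below simply holds 6 "F" and 12 "M" values to match the counts on the slide; the order of the values is arbitrary.

```python
from collections import Counter

# 6 Females and 12 Males, as counted on the slide; the order is arbitrary.
head = ["F"] * 6 + ["M"] * 12

counts = Counter(head)
n = len(head)
percent = {k: round(100 * v / n, 1) for k, v in counts.items()}

print(counts["M"], "of", n, "=", percent["M"], "%")
```

The exact figure is 66.7%, which is the "about 70%" quoted above. With only 18 records, quoting it to a decimal place suggests more precision than the data support.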

Add a simple chart

• A simple chart can also be sketched

• Here is one by Excel

• But a sketch can be “by hand”
  • Excel will be used for these tasks from Session 4
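
A hand sketch needs nothing more than a bar per category with the number written beside it. The same idea as a rough text chart, using the 6 Females and 12 Males counted earlier (the function name is an invention of this sketch):

```python
def text_bar_chart(counts):
    """Draw one row of '#' marks per category, with the count at the end."""
    lines = []
    for label, n in counts.items():
        lines.append(f"{label:<8} {'#' * n} {n}")
    return "\n".join(lines)

print(text_bar_chart({"Female": 6, "Male": 12}))
```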

Examples of analysis – CAST survey: Question 7

• Do a similar analysis of the CAST survey

• To make it quick
  • each group could initially process just one question
  • then report the results to the class

• Include a hand drawn chart
  • Sketch a simple bar chart
  • and include the numbers on the chart
  • as shown earlier

Quantitative variables – Question 8

• They may be discrete (whole numbers)
  • Give examples from the climatic data
  • And the Tanzania survey

• They may be (conceptually) continuous
  • Give examples from the data sets

• Also they may be coded into (ordered) categories
  • Give an example from the Tanzania survey

Examples of analysis – Tanzania survey

• An analysis of the 18 values in Q3462
  – The number of times meat was eaten last week

• minimum = 0
• maximum = 5
• adding the values: total = 31,
• so the mean = 31/18, about 1.7 times per week

• Note: the mean does not have to be an integer
  • just because the individual values are whole numbers

• Repeat this analysis
  • for Q3463 – times fish eaten last week
  • and HHsize
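
The same summary as a short calculation. The 18 actual values of Q3462 are not reproduced in this transcript, so the list below is invented; it was chosen only so that it has 18 values with the minimum (0), maximum (5) and total (31) reported above.

```python
# Invented values: NOT the real Q3462 data, only a list of 18 whole numbers
# with the same minimum, maximum and total as reported on the slide.
meat = [0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 5]

total = sum(meat)
mean = total / len(meat)

print("n =", len(meat))
print("minimum =", min(meat), "maximum =", max(meat))
print("total =", total, "mean =", round(mean, 1))
```

The mean comes out as a fraction (about 1.7) even though every individual value is a whole number.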

Data analysis

• As the layout of the data is always the same
  • Once you know how to analyse one data set
  • You will have the principles to analyse them all
  • And we have just done one analysis!

• You have seen that
  • The appropriate analysis depends on the type of data

• So what are the principles
  • of analysing (summarising) data
  • of the different types?

The methods of analysis

• How many?
  • are questions for qualitative variables
  • for example the CAST survey, the Tanzania survey

• You used summaries
  • Like counts, or proportions or percentages

• How large?

• How variable?
  • are questions for quantitative variables
  • for example the climatic data or the Tanzania survey

• We used summaries
  • Like averages, extremes and measures of spread
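
The link between the type of variable and the type of summary can be sketched as one function that takes the type as an argument. This is an illustration only: the function name, the choice of the sample standard deviation as the measure of spread, and the example values are all assumptions of the sketch.

```python
from collections import Counter
from statistics import mean, stdev

def summarise(values, kind):
    """'How many?' for qualitative data; 'how large / how variable?' for quantitative."""
    if kind == "qualitative":
        n = len(values)
        # count and percentage for each category
        return {cat: (c, round(100 * c / n)) for cat, c in Counter(values).items()}
    if kind == "quantitative":
        return {"min": min(values), "max": max(values),
                "mean": round(mean(values), 2), "sd": round(stdev(values), 2)}
    raise ValueError(f"unknown kind: {kind}")

print(summarise(["yes", "no", "yes", "yes"], "qualitative"))
print(summarise([2, 4, 4, 6], "quantitative"))
```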

A toolkit for analysis

• Different types of graph are also used

• Qualitative data
  • “how many”

• Quantitative data
  • how large
  • how variable

Statistics and variation

• In the CAST survey - why not just ask one student?

• In the climatic data - why not just use one year?

• In the agriculture survey - why not just use one household?

• Because there is variation between the responses

• Remember this definition?
  • “Statistics is making decisions when there is uncertainty.”

Variation is everywhere!

• In the book “Statistics: A guide to the unknown”:

• “Variation is everywhere.
  • Individuals vary
  • Repeated measurements on the same individual vary

• The science of statistics
  • provides tools for dealing with variation”

• So statistics is concerned with making sense from data, when there is variation

Fighting the curse of variation

• To do good statistics you must
  • tame variation
  • fight the curse of variation

• You have 2 main strategies for overcoming variation

• 1. Take enough observations
  • In the Tanzania survey there were 3223 households just from this one region

• 2. Measure characteristics that explain variation
  • Variation itself is not necessarily the problem
  • Variation you do not understand is the problem

An example: explaining variation

• Take the CAST survey

• Add a new record for an imaginary student
  • Make it VERY DIFFERENT to the existing records
  • So if most students were positive about CAST
  • Then make this record very negative, etc

• You have added variation

• Now what could you (should you) have measured
  • to explain this variation?

What you could have measured

• This little survey only asked about CAST

• It did not ask about you, e.g.
  • male/female
  • experience
  • age
  • computer access
  • etc

• These measurements could help
  • to understand the difference with this new student

• The Tanzania survey also asked about
  • Education
  • Possessions, etc

• Why – to be able to understand/explain variation
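
To see how an extra measurement can help, the sketch below splits some ratings by one added characteristic and compares the groups. Everything in it is invented: the survey did not record computer access, and the four records are made up. Averaging the 1 to 5 codes treats them as scores, which is a simplification.

```python
from collections import defaultdict
from statistics import mean

# Invented records: a CAST rating (1 = strongly agree ... 5 = strongly disagree)
# plus one extra characteristic that the survey did not actually ask about.
records = [
    {"rating": 1, "computer_access": "yes"},
    {"rating": 2, "computer_access": "yes"},
    {"rating": 2, "computer_access": "yes"},
    {"rating": 5, "computer_access": "no"},   # the "very different" student
]

groups = defaultdict(list)
for rec in records:
    groups[rec["computer_access"]].append(rec["rating"])

group_means = {key: round(mean(vals), 2) for key, vals in groups.items()}
print(group_means)
```

In this made-up case the unusual record sits in a group of its own, so the extra column offers a possible explanation for it. With real data, one record would not be enough to conclude anything.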

Analysis and variation together

• For statistical analysis you have:
  • summarised columns of data
  • i.e. summarised individual variables

• You did this for qualitative and quantitative variables

• To fight the curse of variation
  • You take measurements
  • So you add to the rows of data

• That helps you to explain the variation

• That’s statistics for you!
  • You analyse the columns, i.e. the variables
  • And you understand variability by looking at the rows

Types of statistics

• Wikipedia says roughly:

• Statistical methods can be used to summarize or describe a collection of data;
  • this is called descriptive statistics.

• In addition, patterns in the data may be modelled and then used to draw inferences about the process or population being studied;
  • this is called inferential statistics.

• Both descriptive and inferential statistics comprise applied statistics.

Descriptive and inferential statistics

• We have just done descriptive statistics

• We will only do descriptive statistics in this module

• The sample in the Tanzania agricultural survey
  • was 3223 households

• That’s just under 1% of the households in the region
  • See the column called WT – with values like 137
  • So each observation “represents” 137 households

• But with such a large sample
  • The inferences for the whole region
  • Will be quite precise

• So most of what we need now is descriptive tools
  • In the higher-level modules
  • we add ideas of inferential statistics
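
The link between the weight and the "just under 1%" can be checked with a little arithmetic. The sketch assumes, for simplicity, that every record has the example weight of 137; the slide only says the values are "like 137", so the population figure is a rough order of magnitude, not a survey result.

```python
n_sample = 3223        # households in the sample (from the slide)
typical_weight = 137   # one example value of WT; real weights vary by record

# If each sampled household "represents" about 137 households:
approx_population = n_sample * typical_weight
sampling_fraction_pct = round(100 / typical_weight, 2)

print(approx_population)       # 441551
print(sampling_fraction_pct)   # 0.73
```

A weight of 137 corresponds to sampling about 1 household in 137, or roughly 0.7%, which agrees with "just under 1%".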

Glossary of statistical terms

• Each subject becomes easier
  • when you understand the terms

• A glossary is supplied
  • Called the SSC Statistical Glossary

• It explains most of the terms
  • For the 3 levels of this course

• So some terms may be new to you now

• An example is on the next slide
  • You can print the glossary if you wish
  • But it is good to look on-line
  • Then all the terms in blue are links
  • So you can easily move about in the document

Example from the glossary

• Descriptive statistics

• If you have a large set of data, then descriptive statistics provides graphical (e.g. boxplots) and numerical (e.g. summary tables, means, quartiles) ways to make sense of the data.

• The branch of statistics devoted to the exploration, summary and presentation of data is called descriptive statistics.

• If you need to do more than descriptive summaries and presentations, the aim is to use the data to make inferences about some larger population.

• Inferential statistics is the branch of statistics devoted to making generalizations.

Learning objectives

• Define statistics

• Enter simple datasets once the data entry form is set up

• Recognise the type of each variable in a dataset

• Know some ways to summarise data of each main type

• Explain how statistical investigations deal with variability

• Differentiate between descriptive and inferential statistics

The end

• Next we move to the use of Excel

• To produce the tables and graphs

• So you can analyse all 3223 records – not just 18