Data Collection
What is Data? Data is facts and statistics that are collected together
We use data to be able to gather information for reference or analysis
We translate the information we find into a form that is more convenient to understand using charts and graphs
Types of Data
Numerical Data
Categorical Data
Numerical Data Numerical data is quantitative
This means it can be measured using numbers and these numbers can be placed in ascending or descending order
We use scatter plots and line graphs to represent numerical data
There are two types of numerical data – discrete and continuous
Discrete Numerical Data Discrete means the numbers used to measure the data have to be whole numbers
The numbers must be distinct and separate
Examples of discrete numerical data include age in whole years, number of kittens, number of people, etc
Continuous Numerical Data Continuous means the numbers used to measure the data can be any number including decimals
Examples of continuous data would be temperature, time, and height
Categorical Data Categorical data is data that can be sorted into groups or categories
Categorical data is qualitative meaning it describes something
We use bar graphs and pie charts to represent categorical data
There are two different types of categorical data – nominal data and ordinal data
Nominal Categorical Data Nominal data can be counted but not put in ascending or descending order (sorted)
Nominal data makes sense regardless of the order in which it is presented
Examples of nominal data include gender, eye colour, hair colour, etc
Ordinal Categorical Data Values or observations that are ordinal can be ranked or have a scale attached
You can count and order ordinal data, but it cannot be measured like numerical data
Examples of ordinal data include house numbers, dates, swimming level, etc
Data Collection Data collection is separated into two types: primary data and secondary data
Primary data is collected first hand
Secondary data is data that was collected by somebody else
Primary Data - Examples Surveys
Focus groups
Questionnaires
Personal interviews
Experiments and observational study
Primary Data - Limitations Do you have the time and money for:
◦ Designing your collection instrument?
◦ Selecting your population or sample?
◦ Pretesting/piloting the instrument to work out sources of bias?
◦ Administration of the instrument?
◦ Entry/collation of data?
• Uniqueness
• May not be able to compare to other populations
• Researcher error
• Sample bias
• Other confounding factors
Secondary Data – Examples of Sources
County health departments
Vital Statistics – birth, death certificates
Hospital, clinic, school nurse records
City and county governments
Surveillance data from state government programs
Federal agency statistics - Census, NIH, etc.
Secondary Data – Limitations When was it collected? For how long?
◦ May be out of date for what you want to analyze.
◦ May not have been collected for a long enough time.
• Is the data set complete?
• There may be missing information on some observations
• Unless such missing information is caught and corrected for, analysis will be biased.
• Is the data consistent/reliable?
• Did variables drop out over time?
• Did variables change in definition over time?
• E.g. number of years of education versus highest degree obtained.
Secondary Data – Advantages No need to reinvent the wheel.
◦ If someone has already found the data, take advantage of it.
It will save you money. Even if you have to pay for access, it is often cheaper than collecting your own data. (More on this later.)
It will save you time. Primary data collection is very time consuming. (More on this later, too!)
It may be very accurate. Especially when a government agency has collected the data, incredible amounts of time and money went into it. It’s probably highly accurate.
Data Collection When collecting data from a group, we can do it two ways
Observational data or Experimental data
Observational Data Observational data is collected by grouping people into different categories and observing how something affects them
An example of observational data collection would be to separate a group into adults vs children and compare the effects of sunlight on them
Experimental Data Experimental data is collected by creating our own groups and imposing our own treatment on the groups to see the effects
An example of experimental data collection would be administering a drug to one group and a placebo to another group, then comparing the results
Data Collection We use data collection to be able to obtain information on a smaller group and extend it to a larger population
The most important thing to remember is that the group we select must represent the population as a whole
It is very difficult to ensure this happens
Population vs Sample Population – the entire group being studied. Example: How many families in Canada have internet?
Sample – the part of the population that is being studied. Example: We would not be able to ask every family in Canada if they have internet. Instead, we would select smaller groups from each province and territory and extend the results to the entire country
We select a sample from an entire population so that it is easier to get the information we need
We use various sampling techniques to select our sample. Example: Our survey would not be very valid if we selected only families in southern parts of Canada where internet is more easily accessible.
Characteristics of a Good Sample
Each person must have an equal chance of being selected into the sample.
The sample must be large enough to represent the population
We use various sampling techniques to ensure this happens
Simple Random Sample Every member of the population has an equal chance of being picked
Example: Putting names in a hat and drawing at random
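The "names in a hat" idea can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: the function name, the list of names, and the seed are all made up for the example, and the seed is only there so the demo can be repeated.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct members; every member is equally likely to be picked."""
    rng = random.Random(seed)  # seed is optional, used only for repeatable demos
    return rng.sample(population, n)  # sampling without replacement

# Hypothetical "hat" of names
names = ["Ava", "Ben", "Chloe", "Dev", "Emma", "Finn", "Grace", "Hugo"]
picked = simple_random_sample(names, 3, seed=1)
print(picked)
```

Because the draw is made without replacement, nobody can be picked twice, just as a slip of paper leaves the hat once it is drawn.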
Systematic Random Sample To go through a population sequentially and select at even intervals
Example: Going through a phone book and selecting every 50th person
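The phone book example might look like this in Python. It is a sketch under assumptions: the phone book is a made-up list of 500 entries, and the random starting point within the first interval is a common convention that the slide itself does not mention.

```python
import random

def systematic_sample(population, interval, seed=None):
    """Walk through the population in order and keep every interval-th member."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # assumed: random start within the first interval
    return population[start::interval]

# Hypothetical phone book with 500 entries
phone_book = [f"Person {i}" for i in range(1, 501)]
sample = systematic_sample(phone_book, 50, seed=0)
print(len(sample))  # 500 entries at an interval of 50 gives 10 people
```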
Stratified Sample A stratum (plural: strata) is a group of subjects that share a common characteristic
A stratified sample keeps each stratum in the same proportion in the sample as in the population
Example: If the population has both men and women, you ensure men and women are in the sample
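One way to keep the strata in proportion is to take the same fraction from each stratum. The sketch below assumes a made-up population of 60 men and 40 women and a 10% sampling fraction; the function and variable names are illustrative only.

```python
import random
from collections import defaultdict

def stratified_sample(population, stratum_of, fraction, seed=None):
    """Take the same fraction from every stratum so proportions are preserved."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)  # group members by stratum
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)  # size of this stratum's share
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 60 men and 40 women
people = [("man", i) for i in range(60)] + [("woman", i) for i in range(40)]
sample = stratified_sample(people, lambda person: person[0], 0.1, seed=0)
men = sum(1 for person in sample if person[0] == "man")
women = sum(1 for person in sample if person[0] == "woman")
print(men, women)  # 6 men and 4 women, matching the 60:40 split
```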
Cluster Sample One representative group of the population chosen at random
Example: Picking one floor of an office building and surveying them
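The office building example can be sketched as picking one key from a dictionary at random and surveying everyone stored under it. The floors and staff names below are invented for illustration.

```python
import random

def cluster_sample(clusters, seed=None):
    """Choose one whole cluster at random and return its name and all its members."""
    rng = random.Random(seed)
    chosen = rng.choice(sorted(clusters))  # pick one cluster name at random
    return chosen, clusters[chosen]

# Hypothetical office building: each floor is one cluster
office = {
    "Floor 1": ["Ana", "Bo", "Cy"],
    "Floor 2": ["Di", "Eli"],
    "Floor 3": ["Fay", "Gus", "Hal", "Ivy"],
}
floor, staff = cluster_sample(office, seed=2)
print(floor, staff)
```

Unlike the other techniques, everyone in the chosen cluster is surveyed and nobody outside it is.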
Multi-Stage Sampling Selecting the sample in successive stages, for example choosing clusters at random and then sampling within each chosen cluster
Convenience Sample A type of sampling technique that is based on how easy responses are to obtain
Example: Surveying people stranded at an airport during a snowstorm about air travel
Voluntary Response Sampling Inviting subjects to voluntarily be a part of the sample
Example: Receiving a survey in the mail and being asked to complete it, random phone surveys from businesses
Problems with Data Collection Questions must be simple, clear, specific, ethical, free from bias, allow for honest response, and not infringe on anyone’s privacy
Questions must not contain slang, abbreviations, negatives, leading wording, or insensitive language
Good surveys are often anonymous and ask the subject to select from a list of possible responses
Survey bias can be unintentional, but can cause the data collected to be invalid. There are many different types of bias
Sampling Bias The chosen sample does not accurately reflect the population
Example: Asking basketball players about issues with the math curriculum
Non-Response Bias Particular groups are under-represented in the sample because they choose not to participate
When people selected for the sample do not respond, the surveyor is forced to draw their own conclusions about the sample
Measurement Bias When the data collection method consistently under- or overestimates a characteristic of the population
Leading questions can also cause measurement bias
Example: Police radar gun measuring for average speed on a particular road
Response Bias When participants in a survey give false or misleading answers
Question quality or topic might lead to response bias
Example: Teacher asks the class to raise hands if they completed their homework
Tally Charts A tally chart is a table used to record values by hand as the data is collected.
One tally mark is used for each occurrence of a value
Tally marks are usually grouped into sets of five to allow for easier counting
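A tally chart can be imitated in Python as a rough illustration. The observations below are made up, and the text form `||||/` (four strokes plus a slash) is just one assumed way to show a group of five.

```python
from collections import Counter

def tally_marks(count):
    """Render a count as tally marks, grouped into sets of five."""
    fives, rest = divmod(count, 5)
    groups = ["||||/"] * fives  # each full group stands for five marks
    if rest:
        groups.append("|" * rest)  # leftover single marks
    return " ".join(groups)

# Hypothetical observations recorded one at a time
observations = ["cat", "dog", "cat", "fish", "cat", "dog", "cat", "cat", "cat", "dog"]
counts = Counter(observations)  # one count per distinct value
for value, count in counts.items():
    print(f"{value:<6}{tally_marks(count)}")
```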
Frequency Tables Tally charts are helpful during the collection of data
Once the data is collected, it is more useful to summarize the data into what we call a frequency table.
A frequency table shows the data numerically
Number of days with rain    Number of weeks
0                           2
1                           5
2                           5
3                           5
4                           19
5                           6
6                           6
7                           4
Total                       52
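The rain table can be stored and checked in Python as a small sketch. It assumes the weekly counts read 2, 5, 5, 5, 19, 6, 6, 4 for 0 to 7 days of rain, which is the split of the digits that adds up to the stated total of 52; the variable names are illustrative.

```python
# Frequency table: number of days with rain -> number of weeks
# (assumed reading of the counts; they sum to the stated total of 52)
weeks_by_rainy_days = {0: 2, 1: 5, 2: 5, 3: 5, 4: 19, 5: 6, 6: 6, 7: 4}

total_weeks = sum(weeks_by_rainy_days.values())
# The value with the highest frequency
most_common_days = max(weeks_by_rainy_days, key=weeks_by_rainy_days.get)

print(f"{'Days with rain':<16}{'Weeks':>6}")
for days, weeks in weeks_by_rainy_days.items():
    print(f"{days:<16}{weeks:>6}")
print(f"{'Total':<16}{total_weeks:>6}")
```

Checking that the frequencies add up to the number of weeks observed is a quick way to catch a missed or double-counted tally.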