measurement, data collection, validity & reliability data is your friend
Posted on 13-Dec-2015
Measurement, Data Collection,
Validity & Reliability
Data is your friend
Agenda
• Measurement
• Measures (aka, ways to collect data)
• Validity/reliability, up close and personal
Educational Measurement
• Measurement: assignment of numbers to differentiate values of a variable
• GOOD RESEARCH MUST HAVE SOUND MEASUREMENT!!
Thought Question
• Consider the following scores on a test
Marco 90, Adriane 85, Linda 75, Christy 99, Chantelle 88, Jay 45, Remi 68, Marcus 97, Chi Bo 92, Donnie 85
• Which measure of central tendency would Adriane use when telling her parents about her performance?
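To see which statistic flatters Adriane, here is a minimal Python sketch that computes all three measures of central tendency for the ten scores above with the standard-library `statistics` module. The variable names are illustrative only.

```python
from statistics import mean, median, mode

# Test scores from the slide
scores = {
    "Marco": 90, "Adriane": 85, "Linda": 75, "Christy": 99, "Chantelle": 88,
    "Jay": 45, "Remi": 68, "Marcus": 97, "Chi Bo": 92, "Donnie": 85,
}
values = list(scores.values())

mean_score = mean(values)      # sum of scores / number of scores
median_score = median(values)  # middle value (average of the two middle values here)
mode_score = mode(values)      # most frequent value

print(f"mean = {mean_score}, median = {median_score}, mode = {mode_score}")
```

The mean is 82.4, the median 86.5, and the mode 85. Adriane's 85 is above the mean, below the median, and equal to the mode, so the mean makes her look best.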
Descriptive Statistics
• Statistics: procedures that summarize and analyze quantitative data
• Descriptive statistics: statistical procedures that summarize a set of numbers in terms of central tendency or variation
• Important for understanding what the data tells the researcher
Descriptive Statistics: A Caution
• Statistics can provide us with useful information, but they can be interpreted in different ways to say different things
Thought Question
If Jay scored an 85 instead of a 45, what changes?
Highly deviant scores (called "outliers") have no more effect on the median than those scores very close to the middle. However, outliers can greatly affect the mean.
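The claim about outliers can be checked directly. A minimal Python sketch, reusing the ten test scores from the earlier slide and changing only Jay's score from 45 to 85:

```python
from statistics import mean, median

original = [90, 85, 75, 99, 88, 45, 68, 97, 92, 85]
# Same scores, but Jay's 45 replaced by 85
revised = [90, 85, 75, 99, 88, 85, 68, 97, 92, 85]

for label, data in (("Jay = 45", original), ("Jay = 85", revised)):
    print(f"{label}: mean = {mean(data):.1f}, median = {median(data)}")
```

The mean rises from 82.4 to 86.4, while the median stays at 86.5: the outlier moved the mean by four points but did not affect the median at all.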
Descriptive Statistics
• Frequency distributions (see Figure 6.2)
• Normal - scores evenly distributed around the middle
• Positively skewed - a large number of low scores and a small number of high scores; the mean is pulled toward the high (positive) end
• Negatively skewed - a large number of high scores and a small number of low scores; the mean is pulled toward the low (negative) end
Normal Distribution
An Extreme Example
• Consider the salaries of 10 people
• Group A – All are teachers.
Salaries: $45,000, $45,000, $45,000, $50,000, $50,000, $50,000, $50,000, $55,000, $55,000, $55,000
An Extreme Example
• Consider the salaries of 10 people
• Group B – Nine are teachers; 1 is Donovan McNabb.
Salaries: $45,000, $45,000, $45,000, $50,000, $50,000, $50,000, $50,000, $55,000, $55,000, $6,300,000
An Extreme Example
• What happens to the mean and median in these 2 examples? Do they change?
• What happens to the shape of the distribution? Is it still normal?
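A minimal Python sketch of the two salary groups makes the contrast concrete. The lists reproduce the salaries on the slides; nothing else is assumed.

```python
from statistics import mean, median

group_a = [45_000] * 3 + [50_000] * 4 + [55_000] * 3                # ten teachers
group_b = [45_000] * 3 + [50_000] * 4 + [55_000] * 2 + [6_300_000]  # nine teachers + McNabb

for name, salaries in (("Group A", group_a), ("Group B", group_b)):
    print(f"{name}: mean = ${mean(salaries):,.0f}, median = ${median(salaries):,.0f}")
```

Group A has a mean and median of $50,000. In Group B the single extreme salary drags the mean up to $674,500 while the median stays at $50,000, which is the signature of a positively skewed distribution.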
Positive Skew
Negative Skew
Case in Point: Teacher Salary
• Compare Radnor to Philadelphia
• Is the salary distribution for Philadelphia going to be positively or negatively skewed? (Hint: Look at the number of years of experience.)
Descriptive Statistics
• Variability
• How different are the scores?
• Types
• Range: the difference between the highest and lowest scores
• Standard deviation
• Roughly, the typical distance of the scores from the mean (the square root of the average squared distance)
• The relationship to the normal distribution
• In a normal distribution, about 68% of scores fall within ±1 SD of the mean
• In a normal distribution, about 95% of scores fall within ±2 SD of the mean
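A minimal Python sketch of both measures of variability, applied to the ten test scores from the earlier slide, plus a check of the 68% / 95% figures using the standard normal curve (`statistics.NormalDist`, Python 3.8+). Note that the 68/95 rule describes normal distributions; ten classroom scores need not follow it.

```python
from statistics import NormalDist, pstdev, stdev

scores = [90, 85, 75, 99, 88, 45, 68, 97, 92, 85]

score_range = max(scores) - min(scores)  # highest minus lowest
population_sd = pstdev(scores)           # divides the sum of squared deviations by N
sample_sd = stdev(scores)                # divides by N - 1

print(f"range = {score_range}")
print(f"SD (population) = {population_sd:.2f}, SD (sample) = {sample_sd:.2f}")

# Share of a normal distribution lying within ±1 and ±2 SD of the mean
z = NormalDist()  # standard normal: mean 0, SD 1
within_1sd = z.cdf(1) - z.cdf(-1)
within_2sd = z.cdf(2) - z.cdf(-2)
print(f"within ±1 SD: {within_1sd:.3f}, within ±2 SD: {within_2sd:.3f}")
```

The range is 54 points and the population SD is about 15.3. The two normal-curve shares come out near 0.683 and 0.954, which is where the rounded 68% and 95% come from.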
Variability
Variability
• Why does variability matter?
Descriptive Statistics
• Relationship
• How two sets of scores relate to one another
• Correlation (positive)
• Low: .10 - .39
• Moderate: .40 - .69
• High: .70 and above
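A minimal Python sketch of the Pearson correlation coefficient, written out from its definition, with a helper that applies the slide's cut-offs. The study-hours data are hypothetical. Applying the labels to the absolute value of r (so negative correlations are graded the same way) and calling anything below .10 "negligible" are assumptions, not part of the slide.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def strength(r):
    """Label |r| with the slide's cut-offs; 'negligible' below .10 is an assumption."""
    size = abs(r)
    if size >= 0.70:
        return "high"
    if size >= 0.40:
        return "moderate"
    if size >= 0.10:
        return "low"
    return "negligible"

# Hypothetical data: hours studied and test score for six students
hours = [1, 2, 3, 4, 5, 6]
test_scores = [62, 70, 68, 80, 85, 88]
r = pearson_r(hours, test_scores)
print(f"r = {r:.2f} ({strength(r)})")
```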
Example of Correlation
Measures of Data Collection
• Tests
• Questionnaires
• Observations
• Interviews
Measures (Means of Data Collection)
You must match the instrument to the research question!
Questionnaires
http://www.authentichappiness.sas.upenn.edu/
• Thoughts on the questionnaires you responded to
• Approaches to Happiness
• Optimism
• Grit
Examples to critique
• Measures
• Questionnaire – Psychological School Membership Survey used with middle school students
• Interview protocol – for teachers & counselors regarding professional development issues
• Observation instrument – PDE 430 for student teachers
• What are 2 benefits and 2 limitations of this measure?
Questionnaires
• Used to obtain a subject’s perceptions, attitudes, beliefs, values, opinions, or other non-cognitive traits
• Scales - a continuum that describes a subject’s responses to a statement
• Likert
• Checklists
• Ranked items
Questionnaires
• Likert scales
• Response options require the subject to determine the extent to which they agree with a statement
• Debate over an odd vs. even number of response options (i.e., whether to offer a neutral midpoint)
• Statements must reflect extreme positive or extreme negative positions
• Example – CATS evaluations
Questionnaires
• Checklists
• Choose options
• Ranked items
• Sequential order
• Avoids marking everything high or low
Questionnaires
• Problems with measuring non-cognitive traits
• Difficulty clearly defining what is being measured
• Ex – self-concept or self-esteem
• Response set
• Responding the same way to every item (Ex – all 4’s on CATS)
• Social desirability
• “PC filter”: giving the answer that seems socially acceptable
• Faking
• Agreeing with statements because of the negative consequences associated with disagreeing
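One simple screen for a response set is to flag respondents who gave the identical answer to every item. A minimal Python sketch with hypothetical responses on a 1-5 scale; a flag is only a reason to look more closely, since a respondent can honestly hold the same view on every statement.

```python
# Hypothetical responses on a 1-5 Likert scale; each list is one respondent
responses = {
    "R1": [4, 4, 4, 4, 4, 4],
    "R2": [5, 2, 4, 1, 4, 3],
    "R3": [3, 3, 3, 3, 3, 3],
}

def is_straight_lined(answers):
    """True when every item received the same answer (a possible response set)."""
    return len(set(answers)) == 1

flagged = [rid for rid, answers in responses.items() if is_straight_lined(answers)]
print("possible response sets:", flagged)
```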
Questionnaires
• Controlling problems
• Equal numbers of positively and negatively worded statements
• Alternating positive and negative statements
• Providing confidentiality or anonymity to respondents
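Mixing positively and negatively worded statements only helps if the negative items are reverse-scored before totaling. A minimal Python sketch, assuming a hypothetical six-item questionnaire on a 1-5 scale in which every second item is negatively worded:

```python
SCALE_MIN, SCALE_MAX = 1, 5  # assumed 5-point Likert scale

# Hypothetical six-item questionnaire: True marks a negatively worded item
negatively_worded = [False, True, False, True, False, True]

def reverse(score):
    """Map 1->5, 2->4, 3->3, ... so a high score always means more of the trait."""
    return SCALE_MIN + SCALE_MAX - score

def total_score(answers):
    """Sum the answers, reverse-scoring the negatively worded items."""
    return sum(reverse(a) if neg else a for a, neg in zip(answers, negatively_worded))

# Marking 4 on everything means agreeing with both the positive and the negative
# statements, so the total lands mid-scale (18) instead of near the top.
print(total_score([4, 4, 4, 4, 4, 4]))
# A respondent consistently high on the trait scores the maximum (30).
print(total_score([5, 1, 5, 1, 5, 1]))
```

This is why balanced wording exposes a response set: a respondent who marks the same point everywhere cannot reach either extreme of the total.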
Designing Questionnaires
• Online resources
• http://pareonline.net/getvn.asp?v=5&n=3
• http://www.peecworks.org/PEEC/PEEC_Inst/I0004E536
• http://www.statpac.com/surveys/
Observations
• Observations - direct observations of behaviors
• Provide a firsthand account (reduces the problems of self-report in questionnaires)
• Natural or controlled settings
• Ex – classroom vs. lab (child attachment studies)
• Structured or unstructured observations
• Ex – frequency counts vs. narrative record
• Detached or involved observers
Observations
• Inference
• Low inference - involves little if any inference on the observer’s part
• On-task/Off-task behavior instrument
• High inference - involves high levels of inference on the observer’s part
• Teacher effectiveness – PDE form 430
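A low-inference instrument reduces each observation to a simple code that can be tallied. A minimal Python sketch, assuming a hypothetical record with one "on" or "off" code per fixed interval for one student; the codes and interval scheme are illustrative and not taken from any particular instrument.

```python
from collections import Counter

# Hypothetical record: one code per observation interval for one student
# ("on" = on-task, "off" = off-task)
intervals = ["on", "on", "off", "on", "on", "on", "off", "on", "on", "on"]

counts = Counter(intervals)
percent_on_task = 100 * counts["on"] / len(intervals)
print(f"on-task in {counts['on']} of {len(intervals)} intervals ({percent_on_task:.0f}%)")
```

Because the observer only decides "on" or "off" in each interval, two observers are likely to agree; a high-inference judgment such as overall teacher effectiveness cannot be reduced to a tally this easily.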
Observations
• Controlling observer effects
• Observer bias
• Training
• Inter-rater reliability (Cronbach’s alpha)
• Multiple observers
• Contamination - knowledge of the study influences the observation
• Training
• Targeting specific behaviors
• Observers do not know the expected outcomes
• Observers are “blind” to which group is which
Observations
• Observer effects
• Halo effect - initial ratings influence subsequent ratings
• Hawthorne effect - increased performance results from awareness of being part of a study
• Leniency - wanting everyone to do well
• Central tendency - rating everyone in the middle of the scale
• Observer drift - failing to record pertinent information
Interviews
• What are some challenges to doing this kind of interviewing?
http://www.youtube.com/watch?v=d6bXH2k9MKE
Interviews
• Advantages
• Establish rapport & enhance motivation
• Clarify responses through additional questioning
• Capture the depth and richness of responses
• Allow for flexibility
• Reduce “no response” and/or “neutral” responses
Interviews
• Disadvantages
• Time consuming
• Expensive
• Small samples
• Subjective – interviewer characteristics, contamination, bias
Validity and Reliability
What’s all the fuss about?
Validity/Reliability and Trustworthiness
• Why do we need validity and reliability in quantitative studies and “trustworthiness” in qualitative studies?
We can’t trust the results if we can’t trust the methods!
Reader’s Digest version…
• Reliability
• The extent to which scores are free from error
• Error is measured by consistency
• Validity
• The extent to which inferences are appropriate, meaningful, and useful
• “Does the instrument measure what it is supposed to measure?”
Thought Question
• On the ACT and SAT assessments, there is a definitive script that test administrators are required to follow exactly. What measurement issue are the test makers addressing?
Reliability of Measurement
• Reliability - The extent to which measures are free from error
• Error is measured by consistency
Reliability of Measurement
• Measuring reliability (reliability coefficients)
• 0.00 indicates no reliability or consistency
• 1.00 indicates total reliability or consistency
• < .60 = weak reliability
• > .80 = sufficient reliability
Reliability of Measurement
• Types of reliability evidence
• Stability (i.e., test-retest)
• Testing the same subject using the same test on two occasions
• Limitation - carryover effects from the first to the second administration of the test
• Equivalence (i.e., parallel forms)
• Testing the same subject with two parallel (i.e., equal) forms of the same test taken at the same time
• Limitation - difficulty in creating parallel forms
Reliability of Measurement
• Equivalence and stability
• Testing the same subject with two forms of the same test taken at different times
• Limitation - difficulty in creating parallel forms
Reliability of Measurement
• Internal consistency
• Testing the same subject with one test and “artificially” splitting the test into two halves
• Limitation - must have a minimum of ten (10) questions
• Often see “Cronbach’s alpha” for the reliability coefficient (ex – Learning styles)
Reliability of Measurement
• Agreement/Inter-rater reliability
• Observational measures
• Multiple observers coding similarly
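Two common ways to quantify agreement between observers are simple percent agreement and Cohen's kappa, which discounts the agreement expected by chance alone. A minimal Python sketch with hypothetical on-task/off-task codes from two observers watching the same ten intervals:

```python
from collections import Counter

# Hypothetical codes from two observers watching the same ten intervals
observer_a = ["on", "on", "off", "on", "on", "off", "on", "on", "off", "on"]
observer_b = ["on", "on", "off", "on", "off", "off", "on", "on", "on", "on"]

def percent_agreement(a, b):
    """Share of intervals where both observers recorded the same code."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def cohens_kappa(a, b):
    """Agreement corrected for chance; undefined if chance agreement is exactly 1."""
    n = len(a)
    observed = percent_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    expected = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / n ** 2
    return (observed - expected) / (1 - expected)

print(f"agreement = {percent_agreement(observer_a, observer_b):.0%}")
print(f"Cohen's kappa = {cohens_kappa(observer_a, observer_b):.2f}")
```

Here the observers agree on 80% of intervals, but because both code "on" most of the time, much of that agreement could occur by chance, and kappa drops to about .52.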
Reliability of Measurement
• Enhancing reliability
• Standardized administration procedures (e.g., directions, conditions, etc.)
• Appropriate reading level
• Reasonable length of the testing period
• Counterbalancing the order of testing if several tests are being given
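One simple way to counterbalance is to rotate the order across groups so that each test appears once in every position. A minimal Python sketch of this cyclic rotation (a basic Latin square), using a hypothetical three-test battery:

```python
def rotated_orders(tests):
    """Cyclic Latin square: each test appears once in every position across the groups."""
    n = len(tests)
    return [[tests[(start + i) % n] for i in range(n)] for start in range(n)]

# Hypothetical battery of three tests
battery = ["reading", "math", "attitude survey"]
orders = rotated_orders(battery)
for group, order in enumerate(orders, start=1):
    print(f"group {group}: {' -> '.join(order)}")
```

A simple rotation balances position (first, second, last) but not which test immediately follows which. If carryover from one specific test to the next is a concern, a balanced Latin square or full randomization of orders is a common alternative.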
Validity of Measurement
• Validity: the extent to which inferences are appropriate, meaningful, and useful
• Current example – content tests and teacher licensure
Validity of Measurement
• For research results to have any value, validity of the measurement of a variable must exist
• Use of established and “new” instruments and the implications for establishing validity
• Importance of establishing validity prior to data collection (e.g., pilot tests)
Validity
• Content
• Predictive (criterion-related)
• Concurrent
• Construct
Thought Question
• Criticisms of standardized tests like the SAT claim that they discriminate against particular groups of students (especially minorities) and do not represent a broad enough domain of knowledge to adequately assess a student’s academic potential. What issue of validity is operating in these arguments?
Thought Question
• Other arguments against the SAT state that the tests do not adequately estimate an individual’s ability to succeed in college. What issue of validity is operating here?
Reliability & Validity of Measurement
• What is the relationship of reliability to validity?
• If a watch consistently gives the time as 1:10 when it is actually 1:00, it is ____ but not ____.
• ______ is a necessary but not sufficient condition for _______.
• To be _____, an instrument must be ______, but a ____ instrument is not necessarily _____.
Reliability & Validity of Measurement
• What is the relationship of reliability to validity?
• If a watch consistently gives the time as 1:10 when it is actually 1:00, it is reliable but not valid.
• Reliability is a necessary but not sufficient condition for validity.
• To be valid, an instrument must be reliable, but a reliable instrument is not necessarily valid.
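The watch example can be expressed numerically: spread across repeated readings reflects (un)reliability, while a systematic offset from the true time reflects invalidity. A minimal Python sketch, assuming times are converted to minutes past 12:00 and using hypothetical readings:

```python
from statistics import mean, pstdev

TRUE_TIME = 60  # 1:00, expressed as minutes past 12:00

# A watch that always shows 1:10: perfectly consistent, consistently wrong
stuck_fast = [70, 70, 70, 70, 70]
spread = pstdev(stuck_fast)         # 0.0 -> consistent (reliable)
bias = mean(stuck_fast) - TRUE_TIME  # 10 -> systematically off (not valid)
print(f"fast watch: spread = {spread} min, bias = {bias} min")

# Hypothetical erratic watch: right on average, but any single reading may be
# 10 minutes off, so no single reading can be trusted (unreliable, hence not valid)
erratic = [50, 65, 55, 70, 60]
print(f"erratic watch: spread = {pstdev(erratic):.1f} min, "
      f"bias = {mean(erratic) - TRUE_TIME} min")
```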
Midterm
• Multiple Choice: 50 pts
• Short Answer: 25 pts
• Article Critique: 25 pts
Bring article with you to class. It’s ok to have notes on it.