Content & Statistical Validity
TRANSCRIPT
Seied Beniamin Hosseini, BIMS, University of Mysore
November 2016
Determining Validity
- Predictive Validity
- Convergent & Divergent Validity
- Construct & Content Validity
- Discriminant Validity
- Face Validity
Validity refers to measuring what we intend to measure. If math and vocabulary truly represent intelligence, then a math and vocabulary test might be said to have high validity when used as a measure of intelligence.
For example: in developing a nursing licensure exam, experts in the field of nursing would identify the information and skills required to be an effective nurse and then choose (or rate) items that represent those areas of knowledge and skill.
Basic Procedure for Assessing Content Validity
1. Describe the content domain
2. Compare the structure of the test with the structure of the content domain
3. Determine the areas of the content domain that are measured by each test item
For example: with respect to educational achievement tests, a test is considered content valid when the proportion of material covered in the test approximates the proportion of material covered in the course.
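This proportional-coverage idea can be sketched as a small comparison; the topic names and weights below are invented for illustration, not taken from any real course.

```python
# Sketch: comparing the share of course material per topic with the
# share of test items per topic (all numbers are hypothetical).

course_hours = {"algebra": 20, "geometry": 10, "statistics": 10}  # hours taught
test_items   = {"algebra": 25, "geometry": 15, "statistics": 10}  # items on the test

total_hours = sum(course_hours.values())
total_items = sum(test_items.values())

# Gap between course share and test share for each topic; a content-valid
# test keeps these gaps small.
gaps = {t: abs(course_hours[t] / total_hours - test_items[t] / total_items)
        for t in course_hours}

for topic, gap in gaps.items():
    print(f"{topic}: coverage gap {gap:.0%}")
```

A large gap for any topic suggests the test over- or under-represents that part of the content domain.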
Lawshe (1975) proposed that each rater should respond to the following question for each item in content validation:
"Is the skill or knowledge measured by this item
- Essential
- Useful, but not essential
- Not necessary
to the performance of the construct?"
Content validity requires recognized subject matter experts to evaluate whether test items assess the defined content, and it involves more rigorous statistical tests than the assessment of face validity does.
Content validity is most often addressed in academic and vocational testing, where test items need to reflect the knowledge actually required for a given topic area (e.g., history) or job skill (e.g., accounting).
One widely used method of measuring content validity was developed by C. H. Lawshe, built around the question: "Is the skill or knowledge measured by this item 'essential,' 'useful, but not essential,' or 'not necessary' to the performance of the construct?"
Content validity is different from face validity. Face validity assesses whether the test "looks valid" to the examinees who take it, the administrative personnel who decide on its use, and other technically untrained observers.
The Content Validity Ratio (CVR)

CVR = (n_e − N/2) / (N/2)

where n_e is the number of SME panelists indicating "essential" and N is the total number of SME panelists.

Values range from +1 to −1; positive values indicate that at least half the SMEs rated the item as essential.
The mean CVR across items may be used as an indicator of overall test content validity.
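The CVR formula above can be computed directly; the panel ratings in this sketch are made up for illustration.

```python
# Sketch: Lawshe's content validity ratio (CVR) for one test item,
# rated by a panel of subject matter experts (hypothetical ratings).

def cvr(n_essential: int, n_panelists: int) -> float:
    """CVR = (n_e - N/2) / (N/2); ranges from -1 to +1."""
    half = n_panelists / 2
    return (n_essential - half) / half

ratings = ["essential", "essential", "useful", "essential",
           "not necessary", "essential", "essential"]      # 7 SMEs
n_e = ratings.count("essential")
print(f"CVR = {cvr(n_e, len(ratings)):+.2f}")
```

Averaging `cvr` across all items would give the overall indicator mentioned above.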
Here SME stands for subject matter expert: a person with recognized expertise in the content domain the test is meant to cover (not the business sense of "small-to-medium enterprise").
Reasonable conclusions use quantitative, statistical, and qualitative data.

Type I error: finding a difference or correlation when none exists.
Type II error: finding no difference when one exists.
Statistical validity
It is the degree to which conclusions about the relationship among variables based on the data are correct or ‘reasonable’.
Statistical conclusion validity involves ensuring the use of adequate sampling procedures, appropriate statistical tests, and reliable measurement procedures
Low statistical power
Power is the probability of correctly rejecting the null hypothesis when it is false.
Low power occurs when the sample size of the study is too small given other factors (small effect size, large group variability, unreliable measures, etc.).
Experiments with low power have a higher probability of incorrectly accepting the null hypothesis; that is, committing a type II error and concluding that there is no effect when there actually is one (i.e., there is real covariation between the cause and the effect).
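As a sketch of the power/sample-size relationship described above, the following approximates the power of a two-sided one-sample z-test; the effect size and alpha are illustrative, and a real study would use whatever test is actually planned.

```python
# Sketch: power of a two-sided one-sample z-test as sample size grows.
import math
from statistics import NormalDist

def power(effect_size: float, n: int, alpha: float = 0.05) -> float:
    """P(reject H0) when the true standardized effect is effect_size."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)   # e.g. about 1.96 for alpha = 0.05
    shift = effect_size * math.sqrt(n)   # how far the test statistic is shifted
    return nd.cdf(shift - z_crit) + nd.cdf(-shift - z_crit)

for n in (10, 50, 200):
    p = power(0.3, n)                    # a smallish effect size
    print(f"n={n:3d}  power={p:.2f}  type II error rate={1 - p:.2f}")
```

With a small effect and a small sample, power stays well below the conventional 0.80 target, so the type II error rate is high.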
Violated assumptions of the test statistics
Most statistical tests involve assumptions about the data that make the analysis suitable for testing a hypothesis. Violating those assumptions can lead to incorrect inferences about the cause-effect relationship, and can make tests more or less likely to commit type I or type II errors.
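As one concrete (and deliberately crude) example of checking an assumption before testing, sample skewness can flag data too asymmetric for a normality-assuming test; the data and the threshold here are invented for illustration.

```python
# Sketch: flag strongly skewed data before applying a test that
# assumes approximate normality (threshold and data are illustrative).
import math

def skewness(xs):
    """Sample skewness: mean of cubed standardized deviations."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return sum(((x - mean) / sd) ** 3 for x in xs) / n

reaction_times = [0.31, 0.35, 0.33, 0.38, 0.34, 0.90, 1.20]  # heavy right tail
if abs(skewness(reaction_times)) > 1.0:
    print("strongly skewed: the normality assumption is doubtful;")
    print("consider a transformation or a nonparametric test")
```

In practice, formal diagnostics (normality tests, residual plots, variance checks) would replace this rough cutoff.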
Fishing and the error rate problem
Each hypothesis test carries a fixed risk of a type I error. The more often the researcher tests the same data, the higher the chance of observing a type I error and making an incorrect inference about the existence of a relationship. If a researcher searches or "fishes" through the data, testing many different hypotheses to find a significant effect, the type I error rate is inflated.
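The inflation described above can be made concrete for the idealized case of independent tests; the per-test alpha of 0.05 is the conventional choice.

```python
# Sketch: chance of at least one type I error ("false positive")
# across m independent tests, each run at per-test alpha = 0.05.

def familywise_error(m: int, alpha: float = 0.05) -> float:
    """P(at least one false positive in m independent tests)."""
    return 1 - (1 - alpha) ** m

for m in (1, 5, 20):
    print(f"{m:2d} tests -> P(>=1 type I error) = {familywise_error(m):.2f}")
```

This is why multiple-comparison corrections (e.g., Bonferroni's alpha/m) lower the per-test threshold when many hypotheses are tested.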