research

Research Methods Dr. Abeer Yasin

Upload: sherinaju

Post on 05-Jan-2016


Page 1: research

Research Methods

Dr. Abeer Yasin

Page 2: research

Outline:
• Sampling
• Why Sample?
• Sample Size
• Probability vs. Nonprobability Sampling
• Descriptive Statistics
• Measures of central tendency
• Measures of variability
• Measures of relative standing
• Measures of association
• Displaying data: bar charts, pie charts, frequency histogram, line chart, frequency curve, stem and leaf plot, box and whisker plot
• Use of a graph
• Skewness

Page 3: research

Outline:
• Measures of central tendency: mean, median, mode, weighted mean, geometric mean
• Measures of dispersion: range, mean absolute deviation, variance, standard deviation, interquartile range
• Frequency distribution
• When to use each measure of central tendency
• Analysis strategies for quantitative data
• Correlation and regression
• Scatterplots
• Correlation coefficient
• Correlation coefficient and scatterplot
• Regression analysis
• Linear and nonlinear relationships

Page 4: research

Sampling

• When measuring every item in a population is impossible, inconvenient, or too expensive, we take a sample.

• The process of sampling involves using a portion of a population to make conclusions about the whole population. A sample is a subset, or some part, of a larger population. The purpose of sampling is to estimate an unknown characteristic of a population.

Page 5: research

Why Sample?

• Sampling cuts costs, reduces labor requirements and gathers vital information quickly.

• Most properly selected samples give results that are reasonably accurate.

• In a sample, increased accuracy may sometimes be possible because the fieldwork and tabulation of data can be more closely supervised.

Page 6: research

Why Sample?

• In most cases, studying the entire population would be a massive undertaking. It can be avoided by selecting a sample from a population of interest. With proper sampling, we can use information obtained from the participants who were sampled to estimate characteristics of the population as a whole. Statistical theory allows us to infer what the population is like, based on data obtained from a sample.

Page 7: research

Why Sample?

• At the outset of the sampling process, the target population must be carefully defined so that the proper sources from which the data are to be collected can be identified. The usual technique for defining the target population is to answer questions about the characteristics of the population.

Page 8: research

Sampling

• Define the target population
• Select the sampling frame
• Determine if a probability or nonprobability sampling method will be chosen
• Plan the procedure for selecting sampling units (sampling unit: a single element or group of elements subject to selection in the sample)
• Determine sample size
• Select actual sampling units
• Conduct fieldwork

Page 9: research

Confidence Intervals

• When researchers make inferences about populations, they do so with a certain degree of confidence.

• Example: Results from the survey are accurate within 3 percentage points using a 95% level of confidence.

• This is called a confidence interval. You can have 95% confidence that the true population value lies within this interval around the obtained sample result.

Page 10: research

Sample Size

• A larger sample size will reduce the size of the confidence interval. Although the size of the confidence interval is determined by several factors, the most important is sample size. Larger samples are more likely to yield data that accurately reflect the true population value.
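The link between sample size and the width of the confidence interval can be sketched numerically. The example below is only an illustration: it assumes a survey proportion, the usual normal approximation, and z = 1.96 for a 95% level of confidence, none of which is specified in the slides. The function name `margin_of_error` and the sample sizes are hypothetical.

```python
import math

def margin_of_error(p_hat, n, z=1.96):
    """Half-width of an approximate confidence interval for a proportion.

    z = 1.96 corresponds to a 95% level of confidence under the
    normal approximation (an assumption of this sketch).
    """
    return z * math.sqrt(p_hat * (1 - p_hat) / n)

# Hypothetical survey in which 50% of respondents answer "yes".
for n in (100, 400, 1067, 4000):
    moe = margin_of_error(0.5, n)
    print(f"n = {n:5d}  margin of error = +/- {100 * moe:.1f} percentage points")
```

Under these assumptions a sample of roughly 1,067 respondents gives the "within 3 percentage points" figure quoted in the confidence interval example, and quadrupling the sample size halves the margin.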

Page 11: research

Probability vs. Nonprobability Sampling

Several alternative ways to take a sample are available: probability techniques and nonprobability techniques.

• Probability sampling: every element in the population has a known, nonzero probability of selection. The simple random sample, in which each member of the population has an equal probability of being selected, is the best known probability sample.

• Nonprobability sampling: The probability of any particular member of the population being chosen is unknown. The selection is arbitrary; the researcher relies heavily on personal judgment.

Page 12: research

Nonprobability Sampling

• Convenience sampling: sampling by obtaining people or units that are conveniently available.

• Researchers use convenience sampling to obtain a large number of completed questionnaires quickly and economically, or when obtaining a sample through other means is impractical.

Page 13: research

Nonprobability Sampling

• Judgment sampling: a technique in which an experienced individual selects the sample based on his or her judgment about some appropriate characteristics required of the sample members.

• Researchers select samples that satisfy their specific purposes even if they are not fully representative.

Page 14: research

Nonprobability Sampling

• Quota sampling:  in quota sampling the interviewer has a quota to achieve.

• The interviewer is responsible for finding enough people to meet the quota.

Page 15: research

Nonprobability Sampling

• Snowball sampling: involves using probability methods for an initial selection of respondents and then obtaining additional respondents through information provided by initial respondents.

• This method is used to locate members of rare populations by referrals.

Page 16: research

Probability Sampling

• All probability sampling techniques are based on chance selection procedures.

• The term random refers to the procedure and not the data in the sample.

Page 17: research

Probability Sampling

• Simple random sampling: a sampling procedure that ensures each element in the population will have an equal chance of being included in the sample.

Page 18: research

Probability Sampling

• Systematic sampling: A starting point is selected by a random process, then every nth item on the list is selected.

• While systematic sampling is not actually a random selection process, it does yield random results if the arrangement of items in the list is random in character.

• The problem of periodicity occurs if the list has a systematic pattern.
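A minimal sketch of simple random and systematic selection, assuming a hypothetical sampling frame of 100 numbered units. The function names and the fixed seed are illustrative choices, not part of the slides.

```python
import random

def simple_random_sample(population, k, rng):
    """Each element has an equal chance of selection (without replacement)."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Pick a random start within the first `step` items, then every step-th item."""
    start = rng.randrange(step)
    return population[start::step]

rng = random.Random(42)        # fixed seed so the sketch is repeatable
frame = list(range(1, 101))    # hypothetical sampling frame of 100 units

srs = simple_random_sample(frame, 10, rng)
sys_sample = systematic_sample(frame, 10, rng)
print("simple random:", sorted(srs))
print("systematic:   ", sys_sample)
```

If the frame were ordered with a cycle of length 10, every systematic sample here would land on the same phase of the cycle, which is the periodicity problem mentioned above.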

Page 19: research

Probability Sampling

• Stratified sampling: Strata are chosen on the basis of existing information, and a subsample is drawn using simple random sampling within each stratum.

• The reason for taking a stratified sample is to obtain a more efficient sample than would be possible with simple random sampling.
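Stratified selection can be sketched as follows. The strata, their sizes, and the choice of proportionate allocation (the same sampling fraction in every stratum) are assumptions made for illustration; the slides do not prescribe an allocation rule.

```python
import random

def stratified_sample(strata, fraction, rng):
    """Draw a simple random sample of the same fraction from every stratum."""
    sample = {}
    for name, members in strata.items():
        k = max(1, round(len(members) * fraction))  # at least one unit per stratum
        sample[name] = rng.sample(members, k)
    return sample

rng = random.Random(0)  # fixed seed so the sketch is repeatable
strata = {              # hypothetical strata built from existing information
    "undergraduate": [f"U{i}" for i in range(60)],
    "graduate": [f"G{i}" for i in range(30)],
    "faculty": [f"F{i}" for i in range(10)],
}

chosen = stratified_sample(strata, 0.10, rng)
sizes = {name: len(units) for name, units in chosen.items()}
print(sizes)
```

Every stratum is guaranteed to appear in the sample in proportion to its size, which is one reason a stratified sample can be more efficient than a simple random sample of the same total size.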

Page 20: research

Probability Sampling

Cluster sampling:
• The purpose of cluster sampling is to sample economically while retaining the characteristics of a probability sample.
• In cluster sampling the primary sampling unit is no longer the individual element in the population but a larger cluster of elements located in proximity to one another. The area sample is the most popular type of cluster sampling.

Page 21: research

Frequency, Tables and Graphs

• Definition: summary displays of variables and their frequency of occurrence, which may involve one variable or more than one variable at a time.

• Examples: Tables, graphs, contingency tables containing different variables on their rows and columns.

Page 22: research

Descriptive Statistics

Page 23: research

Measures of Central Tendency

Definition: A single-score summary of a group of observations/scores.

Examples:
• Mode, which is the most frequent score in a group
• Mean, which is the average of the scores (sum divided by number of scores)
• Median, which is the score at or below which 50% of the scores fall

Page 24: research

Measures of Variability

Definition:
• Measures that indicate the dispersion of scores within a data set. They are built on deviation scores, which indicate each score's distance from the mean.
• Examples: measures based on deviation scores, such as the average deviation and the variance.
• The standard deviation is the most widely used indicator of the average difference between the mean and individual scores.

Page 25: research

Measures of Relative Standing

Definition:
• Single indicators of the relative position of a score in relation to others.
• Examples: percentile rank, which is the percent of scores that fall at or below a specific score; z score; standard score.
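As a small illustration of relative standing, the sketch below computes a percentile rank (using the "at or below" definition given above) and a z score. The scores are the eight test scores from the deck's mean absolute deviation example; using the population standard deviation for the z score is an assumption of this sketch.

```python
import statistics

def percentile_rank(scores, value):
    """Percent of scores that fall at or below `value`."""
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)

def z_score(scores, value):
    """Distance of `value` from the mean, in population standard deviations."""
    return (value - statistics.mean(scores)) / statistics.pstdev(scores)

scores = [73, 82, 64, 61, 63, 68, 52, 73]
print("percentile rank of 68:", percentile_rank(scores, 68))
print("z score of 82:", round(z_score(scores, 82), 2))
```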

Page 26: research

Measures of Association

Definition:
• Single indicators of the degree of relationship between two or more variables.
• Examples: correlation coefficients, which indicate the strength of relationships between variables (for example, Pearson's correlation coefficient).

Page 27: research

Displaying Data

• Methods for displaying data for further analysis are charts and graphs.

• Graphs and charts provide a convenient way to communicate information.

• These include vertical bar charts, line charts, pie charts, histograms, stem and leaf displays and box and whisker plots.

Page 28: research

Bar Charts

• Bar charts:
• Consist of rectangles.
• On the horizontal axis place the classes or labels used.
• On the vertical axis place the frequencies.

Page 29: research

Bar Chart

[Figure: bar chart of Value VAR00002 (vertical axis, 0 to 100) against VAR00001 (horizontal axis)]

Page 30: research

Pie Charts

Pie charts:
• A circular drawing in which each piece of the pie represents a class, with its frequency represented by the area of the slice.

Page 31: research

Pie Chart

Page 32: research

Frequency Histogram

• Frequency histogram:
• Consists of rectangles next to each other with no spaces in between.
• Place the classes on the horizontal axis and the frequencies on the vertical axis.
• Note: the widths of the rectangles are equal, since each width represents the class interval (C.I.).

Page 34: research

Line Chart

• Line chart:
• Consists of straight lines connecting the class midpoints.
• To construct a line chart:
• Plot the midpoints of each class on the horizontal axis.
• Plot the frequencies on the vertical axis.
• Connect the points with straight lines.

Page 35: research

Line Chart

Page 36: research

Frequency Curve

• Frequency curve:
• Is a curve running through the midpoints of each class.
• To construct a frequency curve:
• Plot the midpoints on the horizontal axis.
• Plot the frequencies on the vertical axis.
• Connect the points with a smooth curve.

Page 37: research

Frequency Curve

Page 38: research

Stem and Leaf Display

• Stem and Leaf display:
• In this display data are grouped according to their leading digits (called stems: the tens or hundreds place), while the final digits (leaves) are listed separately for each member of a class.
• The leaves are displayed in ascending order for each stem.

Page 39: research

Stem and Leaf Display

• Consider the following data and construct the stem and leaf display.

16 21 28 22 31
40 14 18 22 29
30 32 18 11 34
42 12 24 19 29
17 33 26 19 22

Page 40: research

Stem and Leaf Display

Stem | Leaves
1 | 1 2 4 6 7 8 8 9 9
2 | 1 2 2 2 4 6 8 9 9
3 | 0 1 2 3 4
4 | 0 2
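The construction can be reproduced with a short sketch. It assumes two-digit data, so that the tens digit is the stem and the units digit is the leaf; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

def stem_and_leaf(data):
    """Group values by tens digit (stem); list units digits (leaves) in ascending order."""
    groups = defaultdict(list)
    for value in sorted(data):          # sorting first keeps the leaves ascending
        groups[value // 10].append(value % 10)
    return dict(groups)

# The 25 observations from the stem and leaf example.
data = [16, 21, 28, 22, 31, 40, 14, 18, 22, 29, 30, 32, 18, 11, 34,
        42, 12, 24, 19, 29, 17, 33, 26, 19, 22]

display = stem_and_leaf(data)
for stem in sorted(display):
    print(stem, "|", " ".join(str(leaf) for leaf in display[stem]))
```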

Page 41: research

Box and Whisker Plot.

The box and whisker plot uses the five point summary. It consists of:
• An inner box that spans the range from Q1 to Q3.
• A line that is drawn through the box at the median position.
• Whiskers, which are the lines from Q1 to the minimum value and from Q3 to the maximum value.

Page 42: research

Box and Whisker Plot

Page 43: research

Box and Whisker Plot
• One of the commonly used graphs in scientific research to display data is the Box and Whisker Plot.
• In order to understand the Box and Whisker Plot, we need to understand the components used in sketching the plot and the inter-quartile range.
• The inter-quartile range is a measure of dispersion. It measures the spread in the middle 50% of the data and equals the difference between the observations at the 75th percentile (third quartile, Q3) and at the 25th percentile (first quartile, Q1):

IQR = Q3 - Q1.

Page 44: research

The Box and Whisker Plot

The first quartile: Q1
• Q1 = 25th percentile.
• Q1 divides the data such that 25% of the data are at or below this value.
• Position of Q1 = (n + 1)/4
• That is, Q1 is located at position 0.25(n + 1) in the ordered data.

The third quartile: Q3
• Q3 = 75th percentile.
• Q3 divides the data such that 75% of the data are at or below this value.
• Position of Q3 = 3(n + 1)/4
• That is, Q3 is located at position 0.75(n + 1) in the ordered data.

The second quartile: Q2
• Q2 = 50th percentile.
• Q2 = median

Page 45: research

The Box and Whisker Plot
• The five point summary used in sketching the Box and Whisker Plot consists of the minimum, Q1, median = Q2, Q3, and maximum of the data set, such that
  Min ≤ Q1 ≤ Q2 ≤ Q3 ≤ Max
• The box and whisker plot uses the five point summary as follows:
• An inner box that spans the range from Q1 to Q3.
• A line that is drawn through the box at the median position.
• Whiskers, which are the lines from Q1 to the minimum value and from Q3 to the maximum value.
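The five point summary can be computed directly from the quartile positions (n + 1)/4 and 3(n + 1)/4. The sketch below applies them to the 25 observations from the stem and leaf example. Handling a fractional position by linear interpolation between the two neighbouring observations is an assumption of this sketch; the slides do not say how fractional positions are resolved, and other conventions give slightly different quartiles.

```python
def value_at_position(ordered, position):
    """Value at a 1-based, possibly fractional position, using linear interpolation."""
    position = min(max(position, 1), len(ordered))   # clamp to the data range
    lower = int(position)
    fraction = position - lower
    if fraction == 0:
        return ordered[lower - 1]
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

def five_point_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(data)
    n = len(ordered)
    q1 = value_at_position(ordered, (n + 1) / 4)
    q2 = value_at_position(ordered, (n + 1) / 2)
    q3 = value_at_position(ordered, 3 * (n + 1) / 4)
    return ordered[0], q1, q2, q3, ordered[-1]

data = [16, 21, 28, 22, 31, 40, 14, 18, 22, 29, 30, 32, 18, 11, 34,
        42, 12, 24, 19, 29, 17, 33, 26, 19, 22]

minimum, q1, median, q3, maximum = five_point_summary(data)
iqr = q3 - q1
print("min, Q1, median, Q3, max:", minimum, q1, median, q3, maximum)
print("IQR:", iqr)
```

With n = 25 the positions are 6.5, 13 and 19.5, so Q1 and Q3 each fall halfway between two ordered observations.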

Page 46: research

Use of the Graph
• The choice of a particular graph depends on the information you want to display, in other words on the property you are most interested in showing.
• Frequency distributions compare categories in terms of the frequency of each. They also show the distribution of the data and its skewness, that is, the direction in which the data tend to concentrate, along with the different possible categories.
• Pie charts display the different categories and the percentage of each as a piece of the pie. They therefore show the portion of the data that falls in each category as a percentage, for comparison against the whole.

Page 47: research

Frequency Distributions

The term frequency distribution has many appearances and applications in statistics. Bar charts are often referred to as frequency distributions. In general, frequency distributions are seen as graphs or plots that reveal a certain trend in a variable under study. Bar charts are a good example: they can be skewed, and the direction of skewness indicates the direction of the trend or change in the variable. Examining the shape of a distribution also shows how the distribution is centered about its mean, so the shape can be viewed as a graphical measure of central tendency.

Page 48: research

• In order to understand the concept of skewness we need to understand another concept called symmetry. The shape of a distribution is symmetric if the observations (data) are balanced, or evenly distributed, about the mean. In a symmetric distribution the mean is equal to the median. A distribution is defined as skewed if it is not symmetric. A skewed distribution can be positively skewed, where the tail of the distribution extends to the right, or negatively skewed, where the tail extends to the left. The attached graphs show a symmetric distribution (figure 1), a positively skewed distribution (figure 2) and a negatively skewed distribution (figure 3). The extended tail shows where the data cluster and how they relate to the mean.
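The direction of skewness can also be checked numerically. The sketch below uses the moment coefficient of skewness (third central moment divided by the second central moment raised to the power 1.5). This particular measure is a choice made for illustration, since the slides describe skewness only graphically, and the three data sets are hypothetical.

```python
import statistics

def skewness(data):
    """Moment coefficient of skewness: m3 / m2**1.5 (population moments)."""
    mean = statistics.mean(data)
    n = len(data)
    m2 = sum((x - mean) ** 2 for x in data) / n
    m3 = sum((x - mean) ** 3 for x in data) / n
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_tail = [1, 2, 2, 3, 3, 3, 4, 15]    # tail extends to the right
left_tail = [-x for x in right_tail]      # mirror image, tail to the left

for name, values in [("symmetric", symmetric),
                     ("positively skewed", right_tail),
                     ("negatively skewed", left_tail)]:
    print(f"{name:18s} mean={statistics.mean(values):7.3f} "
          f"median={statistics.median(values):6.2f} "
          f"skewness={skewness(values):6.2f}")
```

In the symmetric set the mean equals the median and the coefficient is zero; in the right-tailed set the long tail pulls the mean above the median.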

Page 49: research

A Symmetric Distribution

Page 50: research

A Positively Skewed Distribution

Page 51: research

A Negatively Skewed Distribution

Page 52: research

Descriptive Statistics

Measures of Central Tendency and Dispersion of Ungrouped Data:

Measures of central tendency locate the center value for a set of data. The measures of central tendency are listed below:
• Mean
• Median
• Mode
• The weighted mean
• The geometric mean

Page 53: research

The Mean

Mean: the average of the data, equal to the sum of the observations divided by the number of observations.

For a population the mean is given the symbol $\mu$:

$\mu = \frac{\sum x}{N}$

where $x$ denotes the individual observations and $N$ is the number of observations (the number of items in the population).

For a sample the mean is given the symbol $\bar{X}$:

$\bar{X} = \frac{\sum x}{n}$

where $x$ denotes the individual observations and $n$ is the number of observations (the number of items in the sample).

Page 54: research

The Mean

Ex.1 Consider the following example representing the scores of a student on five different tests during a school year: 63 59 71 41 32

$\bar{X} = \frac{63 + 59 + 71 + 41 + 32}{5} = \frac{266}{5} = 53.2$

Page 55: research

The Median

Median: the middle observation after the data have been put into an ordered array, with half of the data above and half below the median.

For an odd number of observations the median is found at the position

Median position = (N + 1)/2.

For an even number of observations the median is found by taking the average of the middle two observations.

Ex. The previous set of data in Ex.1 put in descending order: 71 63 59 41 32
Since the number of observations is odd, the median is the middle value = 59.

Page 56: research

The Median

Ex2. Find the median of the set of data: 71 63 60 59 41 32
The number of observations is even, so the median is the average of the two middle values: (60 + 59)/2 = 59.5

Page 57: research

The Mode

Mode: the observation which occurs with the greatest frequency.
Ex3. Scores on a test for a student were 63 61 59 59 59 20 59
Mode = 59.

Page 58: research

The Weighted Mean

The weighted mean: calculated when certain observations carry more weight than others.

$\bar{X}_w = \frac{\sum XW}{\sum W}$

where $W$ = weight of each observation, $X$ = observation, and $\bar{X}_w$ = weighted mean.

Page 59: research

The Weighted Mean

Ex4. Grades on exams for a student are:

Exam          Score   Weight
First test    60      1
Second test   69      1
Final exam    75      2

$\bar{X}_w = \frac{\sum XW}{\sum W} = \frac{60(1) + 69(1) + 75(2)}{1 + 1 + 2} = \frac{279}{4} = 69.75$

Page 60: research

The Geometric Mean

Geometric mean: computed by taking the nth root of the product of the n observations making up the sample:

$GM = \sqrt[n]{x_1 \cdot x_2 \cdots x_n}$

Ex5. The geometric mean of 5, 6, 8 and 12 is the fourth root of their product:

$GM = \sqrt[4]{5 \cdot 6 \cdot 8 \cdot 12} = \sqrt[4]{2880} \approx 7.33$
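The five measures of central tendency can be checked against the worked examples with a short sketch. It relies on Python's standard `statistics` module for the mean, median and mode; `weighted_mean` and `geometric_mean` are small helper functions written for this illustration.

```python
import math
import statistics

def weighted_mean(values, weights):
    """Sum of value * weight divided by the sum of the weights."""
    return sum(x * w for x, w in zip(values, weights)) / sum(weights)

def geometric_mean(values):
    """nth root of the product of the n observations."""
    return math.prod(values) ** (1 / len(values))

mean_ex1 = statistics.mean([63, 59, 71, 41, 32])            # Ex.1
median_odd = statistics.median([71, 63, 59, 41, 32])        # odd number of scores
median_ex2 = statistics.median([71, 63, 60, 59, 41, 32])    # Ex2
mode_ex3 = statistics.mode([63, 61, 59, 59, 59, 20, 59])    # Ex3
wmean_ex4 = weighted_mean([60, 69, 75], [1, 1, 2])          # Ex4
gmean_ex5 = geometric_mean([5, 6, 8, 12])                   # Ex5

print(mean_ex1, median_odd, median_ex2, mode_ex3, wmean_ex4, round(gmean_ex5, 2))
```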

Page 61: research

Measures of Dispersion

Definition: A measure of dispersion indicates to what degree the individual observations are spread about their mean.

Measures of dispersion are listed below:
• Range
• Mean absolute deviation
• Variance
• Standard deviation
• The interquartile range

Page 62: research

The Range

Range: the difference between the highest observation and the lowest observation.
Ex6. Consider the following data: 120 49 25 90 20
The range is 120 - 20 = 100.

Page 63: research

Mean Absolute Deviation

A measure given by the following formula:

$MAD = \frac{\sum |X_i - \bar{X}|}{n}$

where $n$ = number of observations in the sample, $X_i$ = ith observation, and $\bar{X}$ = mean of the data.

Ex. Scores of eight students on a test are
73 82 64 61 63 68 52 73
Mean = 67
MAD = 56/8 = 7

Note: The greater the value of MAD, the greater the dispersion of the data around the mean, and the less we can depend on the mean to represent the data.
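The range and the mean absolute deviation are simple enough to verify by hand, but a short sketch makes the steps explicit. The function names are illustrative; the data are the values from the two worked examples.

```python
def value_range(data):
    """Highest observation minus lowest observation."""
    return max(data) - min(data)

def mean_absolute_deviation(data):
    """Average absolute distance of each observation from the mean."""
    mean = sum(data) / len(data)
    return sum(abs(x - mean) for x in data) / len(data)

range_ex6 = value_range([120, 49, 25, 90, 20])
mad_scores = mean_absolute_deviation([73, 82, 64, 61, 63, 68, 52, 73])
print("range:", range_ex6)
print("MAD:", mad_scores)
```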

Page 64: research

Variance And Standard Deviation for A Population

For a population:

Variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$

Standard deviation: $\sigma = \sqrt{\sigma^2}$

Ex8. For the following set of data calculate the variance and standard deviation: 110 145 125 95 150

$\mu = \frac{110 + 145 + 125 + 95 + 150}{5} = 125$

$\sigma^2 = \frac{\sum (x_i - \mu)^2}{N} = \frac{(110 - 125)^2 + (145 - 125)^2 + (125 - 125)^2 + (95 - 125)^2 + (150 - 125)^2}{5} = \frac{2150}{5} = 430$

Standard deviation: $\sigma = \sqrt{430} \approx 20.74$

Page 65: research

Variance and Standard Deviation for A Sample

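For a sample:

Variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

Standard deviation: $s = \sqrt{s^2}$

Ex. Treating the data 110 145 125 95 150 as a sample:

$s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(110 - 125)^2 + (145 - 125)^2 + (125 - 125)^2 + (95 - 125)^2 + (150 - 125)^2}{4} = \frac{2150}{4} = 537.5$

Standard deviation: $s = \sqrt{537.5} \approx 23.18$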

Page 66: research

Standard Deviation

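Ex9. A sample of 7 items has been selected from a population: 87 120 54 92 73 80 63

Mean: $\bar{x} = \frac{569}{7} \approx 81.29$

We calculate the sum of the squared observations to be $\sum x^2 = 49{,}047$.

Variance: $s^2 = \frac{\sum x^2 - n\bar{x}^2}{n - 1} = \frac{49047 - 7(81.29)^2}{6} \approx 465.9$

Standard deviation: $s \approx 21.58$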
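The population and sample formulas differ only in the divisor (N versus n - 1). The sketch below checks the worked examples using Python's standard `statistics` module, and also implements the shortcut form built on the sum of squared observations; `shortcut_variance` is a helper written for this illustration.

```python
import statistics

def shortcut_variance(xs):
    """Sample variance via (sum of x^2 - n * mean^2) / (n - 1)."""
    n = len(xs)
    mean = sum(xs) / n
    return (sum(x * x for x in xs) - n * mean ** 2) / (n - 1)

data = [110, 145, 125, 95, 150]
pop_var = statistics.pvariance(data)     # divides by N
pop_sd = statistics.pstdev(data)
sample_var = statistics.variance(data)   # divides by n - 1
sample_sd = statistics.stdev(data)

sample_ex9 = [87, 120, 54, 92, 73, 80, 63]
var_ex9 = shortcut_variance(sample_ex9)
sd_ex9 = var_ex9 ** 0.5

print(pop_var, round(pop_sd, 2), sample_var, round(sample_sd, 2))
print(round(var_ex9, 1), round(sd_ex9, 2))
```

The shortcut form is sensitive to rounding of the mean: carrying the mean at full precision rather than a rounded value is what reproduces the variance of about 465.9.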

Page 67: research

The Interquartile Range

Observations (data) are sometimes compared by use of their relative ranking. Quartiles separate large data sets into four quarters, and the spread between them gives another measure of dispersion.
• The first quartile = Q1 = 25th percentile, located at position (n + 1)/4 in the ordered data.
• The third quartile = Q3 = 75th percentile, located at position 3(n + 1)/4 in the ordered data.
• The second quartile = Q2 = 50th percentile = median.

Page 68: research

The Interquartile Range

• The inter-quartile range measures the spread in the middle 50% of the data. It equals the difference between the observations at the 75th percentile and at the 25th percentile:

IQR = Q3 - Q1

• The five point summary consists of the minimum, Q1, median = Q2, Q3, and maximum of the data set, such that

min ≤ Q1 ≤ Q2 ≤ Q3 ≤ max

Page 69: research

Frequency Distributions

Researchers often work with large sets of data, where computing measures of central tendency or dispersion by hand becomes tedious. In this case researchers begin by summarizing the data into grouped sets (a frequency distribution) and use software such as Excel to calculate measures of central tendency or dispersion and to produce graphical representations of the data.

Page 70: research

Frequency Distribution

• In a frequency distribution table, data are classified into categories or classes. Each class has an upper and lower boundary, a midpoint, an interval and a frequency.

• A frequency distribution provides order to the data by dividing them into classes and recording the number of observations in each class.

Page 71: research

Frequency Distribution

Ex10. Place the following set of data into a frequency table:
25 38 58 71 83 22 44 88 62 65
Solution: The classes can be defined as follows: 20-40, 41-60, 61-80, 81-100.

Page 72: research

Frequency Distribution

Class     Tally   Frequency
20-40     |||     3
41-60     ||      2
61-80     |||     3
81-100    ||      2
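The tallying step in Ex10 can be sketched as a count of observations per class. The function name `frequency_table` is illustrative, and treating both class limits as inclusive is an assumption that matches the integer classes used in the example.

```python
def frequency_table(data, classes):
    """Count observations in each (lower, upper) class, with both limits inclusive."""
    table = {}
    for lower, upper in classes:
        table[(lower, upper)] = sum(1 for x in data if lower <= x <= upper)
    return table

data = [25, 38, 58, 71, 83, 22, 44, 88, 62, 65]
classes = [(20, 40), (41, 60), (61, 80), (81, 100)]

table = frequency_table(data, classes)
for (lower, upper), count in table.items():
    print(f"{lower}-{upper}".ljust(8), ("|" * count).ljust(6), count)
```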

Page 73: research

When to Use Each of the Measures of Central Tendency?

 

Before we can discuss the different cases where each measure of central tendency is used it is important that we explain three types of measurements of variables, the interval-ratio level, the ordinal level and the nominal level.

Page 74: research

When to Use Each of the Measures of Central Tendency?

• The interval-ratio level: Nominal-level categories have no numerical quality, and ordinal-level categories range from low to high without a defined distance between them. Variables measured at the interval-ratio level go further: they allow for classification and ranking, and they are measured in units that have equal intervals.
• For example, age categories 20-30, 30-40, 40-50 have equal intervals.
• Other examples include income, number of children, and number of bonds or stocks.

Page 75: research

When to Use Each of the Measures of Central Tendency?

• The nominal level: Variables that are measured at the nominal level have scores or categories that are not numeric in nature.

• Examples include sex, religion, race, zip code and place of birth.

• At this level, the only mathematical operation available is counting the cases in each category, which can be used to compare the relative sizes of the categories.

Page 76: research

When to Use Each of the Measures of Central Tendency?

• The ordinal level: Variables measured at this level are more sophisticated than those measured at the nominal level, as they have scores or categories that can be ranked from low to high.

• An example is socioeconomic status, which can be classified into upper class, middle class, working class and lower class.

• Numbers can be used to represent each rank, and thus number manipulation can be used.

Page 77: research

When to Use Each of the Measures of Central Tendency?

The Mean:
• The mean is the most commonly used measure of central tendency. Its computation involves addition and division, so it should be used with variables measured at the interval-ratio level. Researchers nevertheless also calculate the mean for variables measured at the ordinal level, since it is more flexible than the median and is a central feature of many statistically advanced techniques.


Page 79: research

When to Use Each of the Measures of Central Tendency?

Percentiles, Deciles and Quartiles:
• Measures that are used when one needs to locate the scores that split the distribution into thirds or fourths, or the point below which a given percentage of the cases fall.
• These measures can be found for any ordinal or interval-ratio level variable.

Page 80: research

When to Use Each of the Measures of Central Tendency?

The Median:
• The median represents the exact center of a distribution of scores.
• It is the score that falls in the exact middle position of a distribution: half the scores fall above it and half fall below it.
• It is calculated by ranking the scores from high to low and then choosing the middle value.
• The median cannot be calculated for variables measured at the nominal level.
• The median can be found for either ordinal or interval-ratio data.

Page 81: research

When to Use Each of the Measures of Central Tendency?

The Mode:
• The mode for any set of data is the value that occurs with the greatest frequency.
• The mode is the appropriate measure of central tendency when the variable under consideration is nominal.
• It can also be used for variables measured at the interval-ratio and ordinal levels.

Page 82: research

Analysis Strategies for Quantitative Data

Quantitative data analysis is the analysis of numeric data using a variety of statistical techniques.

Data analysis techniques:
• Descriptive vs. inferential statistics
• Univariate vs. multivariate statistics
• Parametric vs. nonparametric statistics

Page 83: research

Analysis Strategies for Quantitative Data

• Descriptive methods: are procedures for summarizing data, with the intention of discovering trends and patterns, summarizing results for ease of understanding and communication.

• The outcome of these strategies is usually called descriptive statistics and includes results such as frequency tables, means and correlations.

Page 84: research

Analysis Strategies for Quantitative Data

• Inferential techniques: are generated after descriptive results have been examined.

• They are normally used for testing hypotheses or for confirming or disconfirming the results obtained from the descriptive statistics.

• An example is the use of t-tests.

Page 85: research

Analysis Strategies for Quantitative Data

• Univariate statistics: involve linking one variable that is the focal point of the analysis (e.g., a predicted event, or the single dependent variable in an experiment) with one or more other variables (e.g., predictors, independent variables).

Page 86: research

Analysis Strategies for Quantitative Data

• Multivariate statistics: Link two or more sets of variables to each other such as the simultaneous relationship between multiple dependent predicted and independent predictor variables.

• These multivariate analyses are followed by simpler univariate ones to determine the more important a) relationships between variables or b) differences between groups.

Page 87: research

Analysis Strategies for Quantitative Data

• Parametric statistics: are very powerful techniques, but they require that the data meet certain assumptions (independence, normality, homogeneity of variance).
• Examples:
• t-test
• ANOVA
• Pearson correlation

Page 88: research

Analysis Strategies for Quantitative Data

• Nonparametric statistics: require few if any assumptions about the population under study. They can be used with ordinal or nominal data.
• Examples:
• the Mann-Whitney test
• the Wilcoxon signed-rank test
• the Kruskal-Wallis test
• Spearman correlation

Page 89: research

Parametric vs. non-parametric tests

• There are two types of test data and consequently different types of analysis.

• As the table shows, parametric data has an underlying normal distribution which allows for more conclusions to be drawn as the shape can be mathematically described.

• Anything else is non-parametric.

Page 90: research

Parametric vs. non-parametric tests

                         Parametric                  Nonparametric
Assumed distribution     Normal                      Any
Assumed variance         Homogeneous                 Any
Typical data             Ratio or interval           Ordinal or nominal
Data set relationships   Independent                 Any
Usual central measure    Mean                        Median
Benefits                 Can draw more conclusions   Simplicity; less affected by outliers

Page 91: research

Correlation and Regression

• Statisticians are often interested in the relationship and interaction between two variables and the strength or weakness of this relationship.

• For example, the relationship between the salary and the spending for a person over a period of time. Salary is a variable that takes on different values over a period of time and so is the spending.

Page 92: research

Scatter Plots

• Scatter diagrams are plots of the paired observations of X and Y on a graph with the independent variable X placed on the horizontal axis and the dependent variable Y placed on the vertical axis.

• Scatter plots provide a picture of the data, including the range of each variable, patterns of values over the range, a suggestion of the relationship between the variables (a linear equation or a curve) and the number and location of outliers (data points that stand apart from the rest of the data, possibly reflecting an error in measurement or calculation).

Page 93: research

Scatter Plots

• Scatterplots are important and effective in the visualization of data scatter around the line of best fit or sometimes the curve of best fit.

• The scatterplot shows how widely the data are spread around the line of best fit, and so gives a visual indication of dispersion.

Page 94: research

Scatter Plots

• The more scattered the data about the line of best fit the weaker the relationship (correlation) between the two variables under study.

• The closer the points to the line of best fit the stronger the correlation between the two variables.

• The regression line is the line that best fits the data once plotted on the scatter diagram.

Page 95: research

Scatter Plot

Page 96: research

Correlation Coefficient

• Correlation is an index that measures the strength of the relationship between two variables under study.

• When two variables correlate, they tend to change together. This association can be in the positive sense or the negative sense. Correlation by itself does not show that one variable causes the other to change.

• If the variables correlate in the positive sense, then an increase in the value of one is accompanied by an increase in the value of the other.

• A negative correlation means that an increase in the value of one variable is accompanied by a decrease in the value of the other.

Page 97: research

Correlation Coefficient
• The correlation coefficient is an index with values ranging from -1 to +1.
• A value of 1 (or -1) indicates a perfect correlation between the two variables. In the scatterplot, all of the points then fall exactly on the line of best fit; values close to 1 in magnitude mean the points lie very close to the line.
• A value of 0 indicates no linear correlation between the two variables, that is, a change in one is not accompanied by any consistent linear change in the other.
• The sign indicates the direction of the correlation: positive means positive correlation and negative means negative (inverse) correlation.
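Pearson's correlation coefficient can be computed from the deviations of each variable about its mean. The sketch below is a plain implementation of that formula; the function name `pearson_r` is illustrative, and the data are the five (X, Y) pairs from the linear regression example at the end of the deck, plus two variations used to show the sign and a perfect correlation.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: Sxy / sqrt(Sxx * Syy), using deviations from the means."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2.5, 3, 4.5, 5.25]
y = [1, 2, 3, 4, 5]

r_positive = pearson_r(x, y)                        # strong positive correlation
r_negative = pearson_r(x, [-v for v in y])          # same strength, opposite sign
r_perfect = pearson_r(x, [2 * v + 1 for v in x])    # points exactly on a line

print(round(r_positive, 4), round(r_negative, 4), round(r_perfect, 4))
```

Squaring `r_positive` gives the r² value of about 0.98 reported for this data set in the regression output.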

Page 98: research

Correlation Coefficient and the ScatterPlot

• All of this is evident in the scatterplot.
• The closer the points to the line of best fit, the stronger the correlation. The farther away the points, with more scatter around the line of best fit, the weaker the correlation.
• The sign of the correlation is also evident in the scatterplot.
• A positive correlation is indicated by a line that rises from left to right: as x increases, y increases.
• A negative correlation is indicated by a line that falls from left to right: as x increases, y decreases.

Page 99: research

Degree of Correlation

Page 100: research

Regression Analysis
• Regression analysis is a tool for building statistical models that characterize relationships between a dependent variable and one or more independent variables, all of which are numerical.
• A regression model that involves a single independent variable is called simple regression.
• Simple regression fits the data to the best linear equation that represents the dependent variable Y as a linear function of the independent variable X: Y = aX + b, where a and b are real coefficients.

Page 101: research

Regression Line

Page 102: research

Linear and Nonlinear Relationships

• What is the difference between a linear relationship and a nonlinear relationship between two variables?
• A linear relationship means the effect of X on Y is the same at all values of X.
• A nonlinear relationship means the effect of X on Y changes over values of X.
• The scatterplot makes the difference between these two patterns visible.

Page 103: research

Linear and Nonlinear Relationships

• In a linear relationship the slope of the line is constant throughout the line.

• To obtain the slope of the straight line we can use any two points on the line and calculate the change in y divided by the change in x.

• In all cases regardless of what points are taken on the line, the slope of the line will be a constant value and hence the change in Y over the change in X will always be the same.

• The scatter plot of a linear relationship reveals the presence of the straight line modeling the behavior between both variables. 

• The scatter of the points may take the shape of a straight line with a negative slope or one with a positive slope. 

Page 104: research

Linear and Nonlinear Relationships

• The scatterplot of nonlinear relationships takes the shape of a curve that can be modeled using a number of nonlinear equations such as the exponential function or the polynomial functions.

• The slope of a curve is not a constant value as is the case of a straight line.

• In fact the slope is different at each point on the curve and we usually represent the slope by a function called the derivative.

• If the scatter is condensed about the line of best fit (regression line), it indicates a strong correlation between the two variables. If the points are spread widely about the line, with more distance between them and the line of best fit, the correlation is weak.

Page 106: research

Nonlinear Correlation

Page 108: research

Example of Linear Regression

• As an example of creating a linear regression model, consider the following set of data:

Y    X
1    1
2    2.5
3    3
4    4.5
5    5.25

Page 109: research

SPSS and the Regression Equation

• The equation of the line of best fit is Y = mX + b, where m is the slope and b is the y-intercept. Using the results below, we have Y = 0.93X - 0.03 with a coefficient of determination of r² = 0.98.

Best-fit values
Slope                      0.9333 ± 0.07698
Y-intercept when X = 0.0   -0.03333 ± 0.2755
X-intercept when Y = 0.0   0.03571
1/slope                    1.071

Goodness of fit
r²                         0.9800
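The best-fit values can be reproduced with the ordinary least-squares formulas. The sketch below is an independent calculation written for illustration, not the software procedure used to generate the output in the slides; the function name `fit_line` is hypothetical, and the standard errors reported after the ± signs are not computed here.

```python
def fit_line(xs, ys):
    """Least-squares line Y = slope * X + intercept, plus r squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared

x = [1, 2.5, 3, 4.5, 5.25]
y = [1, 2, 3, 4, 5]

slope, intercept, r_squared = fit_line(x, y)
x_intercept = -intercept / slope   # value of X at which the fitted Y equals 0

print(f"slope       = {slope:.4f}")
print(f"y-intercept = {intercept:.5f}")
print(f"x-intercept = {x_intercept:.5f}")
print(f"1/slope     = {1 / slope:.3f}")
print(f"r squared   = {r_squared:.4f}")
```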