8/4/2019 Course 11 - Data Analysis


Chap.6. Data analysis

6.1. Information systems used for data analysis

6.2. Descriptive statistics

6.3. Inferential statistics



6.1. Information systems used for data analysis

SPSS (Statistical Package for the Social Sciences) is widely used in marketing research for data analysis.

It is used mainly for data gathered with questionnaires, but also for various quantitative data from statistics, company records, etc.

The information obtained is presented as tables and charts.

It offers multiple ways of analysing data, such as summarizing data, transforming variables, statistical tests, etc.




The flow of using the SPSS system for information processing

1. Data gathering
2. Creating the SPSS database
3. Selecting the data analysis procedure
4. Selecting the variables for analysis
5. Data processing in order to obtain the information



Data gathering

Depends on the research method:
Surveys: questionnaire
Secondary data: official statistics, statistical databases, company records, etc.

Avoiding data gathering errors is very important for the success of the research. The researcher should pay special attention to:

Proper training of the operators who collect data.

Verification in the fieldwork to ensure that the interviewers are following the sampling procedures.

Checking the data records to determine whether interviewers are cheating.




Creating the SPSS database

To create a database in SPSS, the following steps are followed:

Opening a new file
Defining the research variables
Recording data in the database
Verifying the recorded data




Start/Programs/SPSS for Windows




A new, empty database




The window for defining variables




Setting the type of data




Defining the codes for response categories




Defining the codes for missing responses




Coding data

The process of identifying and assigning numerical scores or other character symbols to data expressed in words.

Codes make it easier to enter data in databases.

Codes allow data to be processed by computers.

Coding depends on the type of scale used in the questionnaire.




Ex: Nominal scale

What brand of cigarettes do you smoke most often?
Winston (1)
L&M (2)
Kent (3)
Marlboro (4)
Winchester (5)
Viceroy (6)
Other. Please specify ________ (7)

Attention: The assigned codes do not represent an order or a specific quantity. They are allotted only to identify a response category (like the numbers of football players).

Binary (dichotomous) scale: a particular case

Do you smoke?
Yes (1)
No (0)
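As an illustration of nominal and binary coding, the following Python sketch replaces verbal answers with numeric codes. It assumes the codes are assigned in the order the brands are listed in the example; the names `BRAND_CODES`, `SMOKER_CODES` and `encode`, and the sample answers, are hypothetical and not part of SPSS.

```python
# Hypothetical codebooks; the codes only identify categories, they carry no order.
BRAND_CODES = {
    "Winston": 1, "L&M": 2, "Kent": 3, "Marlboro": 4,
    "Winchester": 5, "Viceroy": 6, "Other": 7,
}
SMOKER_CODES = {"Yes": 1, "No": 0}


def encode(answers, codebook):
    """Replace each verbal answer with its code; unknown answers become None."""
    return [codebook.get(answer) for answer in answers]


# Made-up answers from four respondents.
brand_answers = ["Kent", "Marlboro", "Kent", "Other"]
smoker_answers = ["Yes", "No", "Yes", "Yes"]

brand_coded = encode(brand_answers, BRAND_CODES)
smoker_coded = encode(smoker_answers, SMOKER_CODES)
print(brand_coded)
print(smoker_coded)
```

Returning `None` for an unrecognized answer is one possible design choice; in practice a dedicated missing-value code could be used instead.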




Ex: Ordinal scale

1. The rank order scale according to a characteristic:

Please rank the following 5 brands of laundry detergent according to your preference (give rank 1 to the most preferred brand, rank 2 to the second most preferred brand, and so on down to rank 5 for the least preferred brand).

OMO ARIEL DERO PERSIL TIDE

Coding: in this case a variable is defined for every response category. The rank assigned by each respondent (from 1 to 5) is entered in the database.

Attention: for ordinal scales, the assigned codes express an order.
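The rank-order coding can be sketched in Python as one variable per brand, with a simple check that a respondent used each rank from 1 to 5 exactly once. The function name `is_valid_ranking` and the two sample records are hypothetical, chosen only for illustration.

```python
BRANDS = ["OMO", "ARIEL", "DERO", "PERSIL", "TIDE"]


def is_valid_ranking(record):
    """True when every brand has a rank and the ranks are exactly 1..5."""
    ranks = [record[brand] for brand in BRANDS if brand in record]
    return sorted(ranks) == list(range(1, len(BRANDS) + 1))


# One database row per respondent, one variable (column) per brand.
respondent_ok = {"OMO": 3, "ARIEL": 1, "DERO": 5, "PERSIL": 2, "TIDE": 4}
respondent_bad = {"OMO": 1, "ARIEL": 1, "DERO": 5, "PERSIL": 2, "TIDE": 4}

print(is_valid_ranking(respondent_ok))
print(is_valid_ranking(respondent_bad))
```

A check of this kind corresponds to the "verification of recorded data" step: a record with a repeated or skipped rank signals a recording error.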




2. Semantic differential

How important is the quality-price ratio when you choose a brand of laundry detergent?

(5) very important
(4) important
(3) neither important nor unimportant
(2) not important
(1) not at all important

3. Numerical scale

How satisfied are you with the whitening power of Ariel laundry detergent?

Very satisfied 5 4 3 2 1 Very dissatisfied

Usually, in this case only the extreme values are coded (1 = very dissatisfied, 5 = very satisfied).

4. Likert scale

Please indicate your opinion related to the following statement:

When somebody chooses a laundry detergent, the price is the most important, all brands having about the same whitening power.

(5) strongly agree
(4) agree
(3) neither agree nor disagree
(2) disagree
(1) strongly disagree




Interval scale

The midpoint of every interval is recorded in the database. It is used both as the value of the variable and as the code of the response category.

How many cigarettes do you generally smoke during a day?
5-9 (7)
10-14 (12)
15-19 (17)
20-24 (22)
25-29 (27)
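The midpoint codes in this example can be reproduced with a short Python sketch. The names `INTERVALS`, `midpoint` and `MIDPOINT_CODES` are hypothetical; the interval bounds are those of the cigarette question.

```python
# Interval bounds taken from the example question.
INTERVALS = [(5, 9), (10, 14), (15, 19), (20, 24), (25, 29)]


def midpoint(low, high):
    """Midpoint of a closed interval, used as both value and code."""
    return (low + high) / 2


# Map each interval label (e.g. "5-9") to the value recorded in the database.
MIDPOINT_CODES = {f"{low}-{high}": midpoint(low, high) for low, high in INTERVALS}

for label, code in MIDPOINT_CODES.items():
    print(label, code)
```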

Ratio scale

For this type of scale, coding is not used. The exact value indicated by the respondent is recorded in the database.

Ex: How many hours do you study for an exam during the examination session? ____5 h____




Ex: Divide 100 points among the following brands according to your preference for each brand:

ARIEL __40___

DERO __20___

PERSIL __30__

TIDE __10___

Coding: in this case a variable is defined for every response category (as in the case of the rank order scale). The value assigned by each respondent is entered in the database.
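When verifying recorded data for a question of this kind, one plausible check is that each respondent's points are non-negative and add up to 100. The Python sketch below illustrates this; `is_valid_allocation` and the second, deliberately wrong record are hypothetical.

```python
def is_valid_allocation(points, total=100):
    """True when no brand has negative points and the points sum to `total`."""
    values = list(points.values())
    return all(v >= 0 for v in values) and sum(values) == total


# The answer shown in the example: one variable per brand.
answer = {"ARIEL": 40, "DERO": 20, "PERSIL": 30, "TIDE": 10}
# A made-up record with a recording error (sums to 110).
wrong_answer = {"ARIEL": 50, "DERO": 20, "PERSIL": 30, "TIDE": 10}

print(is_valid_allocation(answer))
print(is_valid_allocation(wrong_answer))
```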



6.2. Descriptive statistics

Descriptive analysis

Refers to the transformation of raw data into a form that makes them easy to understand and interpret (summarizing data).

The most common ways to summarize data are: frequency distribution, percentage distribution, and calculation of central tendency and variation indicators.

Charts can accompany frequency tables to make the information easier to understand.

Attention: Descriptive statistics are computed exclusively at the sample level, using the data collected from the sample members.




Selecting the procedures of descriptive analysis in SPSS




Frequency table

An arrangement of statistical data in a row-and-column format that exhibits the count of responses and percentages for each category assigned to a variable.

General Happiness

                           Frequency   Percent   Valid Percent   Cumulative Percent
Valid     Very Happy             467      30.8            31.1                 31.1
          Pretty Happy           872      57.5            58.0                 89.0
          Not Too Happy          165      10.9            11.0                100.0
          Total                 1504      99.1           100.0
Missing   NA                      13       0.9
Total                           1517     100.0
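The columns of the General Happiness table can be recomputed from the raw counts, which shows how Percent, Valid Percent and Cumulative Percent differ. This is a minimal Python sketch, not SPSS's own procedure; the function name `frequency_table` is hypothetical, and the counts are those of the table (467, 872, 165 valid answers and 13 missing).

```python
counts = {"Very Happy": 467, "Pretty Happy": 872, "Not Too Happy": 165}
missing = 13


def frequency_table(counts, missing=0):
    """Return one row per category plus the valid and total sample sizes."""
    valid_n = sum(counts.values())   # respondents who answered
    total_n = valid_n + missing      # all respondents
    rows, running = [], 0
    for label, freq in counts.items():
        running += freq
        rows.append({
            "label": label,
            "frequency": freq,
            # Percent uses all cases, including missing ones.
            "percent": round(100 * freq / total_n, 1),
            # Valid percent uses only the cases with an answer.
            "valid_percent": round(100 * freq / valid_n, 1),
            # Cumulative percent accumulates unrounded valid shares.
            "cumulative_percent": round(100 * running / valid_n, 1),
        })
    return rows, valid_n, total_n


rows, valid_n, total_n = frequency_table(counts, missing)
for row in rows:
    print(row)
```

Accumulating the counts before rounding explains why the cumulative percent for "Pretty Happy" is 89.0 rather than 31.1 + 58.0 = 89.1.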




Measures of central tendency: mode, mean, median

Mode is the response category with the highest frequency.

Median is the middle value when the data are arranged in ascending or descending order. It divides the sample into two equal groups (50% of the sample members are to the left of the median and the other 50% to the right).

Mean is the most commonly used measure of central tendency when data are measured on a ratio or interval scale.

Mean score is a summary rank used with ordinal scales to establish the final order of the analyzed categories. It is calculated like the mean but does not have the same properties.

$$\bar{x} = \frac{\sum_{i=1}^{n} x_i f_i}{n}$$

For binary scale:

$$\bar{x} = \frac{1 \cdot f_{Yes} + 0 \cdot f_{No}}{n} = \frac{f_{Yes}}{n} = p$$
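The three measures can be illustrated on a small frequency distribution. The Python sketch below assumes that x_i are the coded values, f_i their frequencies, and that the frequencies sum to the sample size n; the data (codes 1 to 5 with frequencies 2, 3, 5, 6, 4, and 12 "Yes" versus 8 "No") are made up for illustration.

```python
from statistics import median


def expand(values, freqs):
    """Rebuild the individual observations from values and frequencies."""
    return [x for x, f in zip(values, freqs) for _ in range(f)]


def mode_of(values, freqs):
    """Value with the highest frequency (the first one if there is a tie)."""
    return values[freqs.index(max(freqs))]


def mean_of(values, freqs):
    """Frequency-weighted mean: sum of x_i * f_i divided by n."""
    n = sum(freqs)
    return sum(x * f for x, f in zip(values, freqs)) / n


# Hypothetical distribution of answers coded 1..5 (n = 20).
values = [1, 2, 3, 4, 5]
freqs = [2, 3, 5, 6, 4]

sample_mode = mode_of(values, freqs)
sample_median = median(expand(values, freqs))
sample_mean = mean_of(values, freqs)

# Binary scale: Yes = 1, No = 0; the mean equals the proportion of "Yes".
p = mean_of([1, 0], [12, 8])

print(sample_mode, sample_median, sample_mean, p)
```

With an even number of observations, `statistics.median` averages the two middle values, which is why the median here falls between two codes.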



Variation indicators: range, variance, standard deviation, standard error of the mean.

Range measures the spread of the data: $\text{Range} = x_{largest} - x_{smallest}$

Variance is the mean of the squared deviations from the mean. It is an indicator of sample homogeneity.

Standard deviation is the square root of the variance. It is expressed in the same units as the data.

Standard error of the mean: a measure of how much the value of the mean may vary from sample to sample taken from the same distribution.

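$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 f_i}{n}$$

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2 f_i}{n}}$$

$$s_{\bar{x}} = \frac{s}{\sqrt{n}}$$

For binary scale:

$$s^2 = p(1 - p) \quad \text{or} \quad s^2 = p(100 - p)$$

For binary scale:

$$s = \sqrt{p(1 - p)} \quad \text{or} \quad s = \sqrt{p(100 - p)}$$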
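The variation indicators can be computed in the same way from a frequency distribution. This Python sketch follows the definition given here, with the sum of squared deviations divided by n; note as an assumption that statistical software, SPSS included, commonly divides by n - 1 for the sample variance, so its output may differ slightly. The function names and the data are hypothetical.

```python
from math import sqrt


def mean_of(values, freqs):
    """Frequency-weighted mean."""
    return sum(x * f for x, f in zip(values, freqs)) / sum(freqs)


def variance_of(values, freqs):
    """Mean of squared deviations from the mean (divisor n, as defined in the text)."""
    n = sum(freqs)
    m = mean_of(values, freqs)
    return sum(f * (x - m) ** 2 for x, f in zip(values, freqs)) / n


# Hypothetical distribution of answers coded 1..5 (n = 20).
values = [1, 2, 3, 4, 5]
freqs = [2, 3, 5, 6, 4]
n = sum(freqs)

# Range over the values that were actually observed.
observed = [x for x, f in zip(values, freqs) if f > 0]
data_range = max(observed) - min(observed)

s2 = variance_of(values, freqs)   # variance
s = sqrt(s2)                      # standard deviation
se_mean = s / sqrt(n)             # standard error of the mean

# Binary scale: the general formula reduces to p * (1 - p).
p = mean_of([1, 0], [12, 8])
s2_binary = variance_of([1, 0], [12, 8])

print(data_range, s2, s, se_mean, s2_binary)
```

The binary result agrees with p(1 - p) when p is a proportion; with p expressed as a percentage, the equivalent form is p(100 - p).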




Selecting the procedures of descriptive analysis in SPSS

