BASIC DATA ANALYSIS (WITH EXCEL) AND DATA INTERPRETATION
Basic tools: frequency, cross-tabs, mean, median, share and rate
Frequencies
What? The number of times a certain value or class of values occurs.
What for? A way to summarise data.
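In Excel a frequency count is typically produced with COUNTIF or a PivotTable. As a minimal sketch of the same idea in Python, assuming a small hypothetical list of recorded ISCED levels (not Eurostat data), `collections.Counter` tallies how often each value occurs:

```python
from collections import Counter

# Hypothetical raw data: the ISCED level recorded for each of 10 students
levels = [
    "ISCED 1", "ISCED 2", "ISCED 1", "ISCED 3", "ISCED 1",
    "ISCED 0", "ISCED 3", "ISCED 2", "ISCED 1", "ISCED 5 and 6",
]

# Counter maps each distinct value to the number of times it occurs
frequencies = Counter(levels)

for level, count in sorted(frequencies.items()):
    print(f"{level:<14} {count}")
print(f"{'Total':<14} {sum(frequencies.values())}")
```

Values that never occur (here ISCED 4) simply get a count of 0 when looked up.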
Frequencies
Example: Number of students by ISCED level in Italy, 2008
ISCED level      Frequency
ISCED 0          1,655,386
ISCED 1          2,865,613
ISCED 2          1,756,003
ISCED 3          2,847,785
ISCED 4             27,094
ISCED 5 and 6    2,013,856
Total           11,165,737
Source: Eurostat
Relative Frequencies (proportions)
Example: Number of students by ISCED level in Italy, 2008
ISCED level      Frequency   Relative frequency
ISCED 0          1,655,386   0.15
ISCED 1          2,865,613   0.26
ISCED 2          1,756,003   0.16
ISCED 3          2,847,785   0.26
ISCED 4             27,094   0.00
ISCED 5 and 6    2,013,856   0.18
Total           11,165,737   1.00
Source: Eurostat
Relative frequency of ISCED 5 and 6: 2,013,856 / 11,165,737 = 0.18
18% of students in Italy attended higher education (ISCED 5 and 6) in 2008.
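A minimal Python sketch of the relative-frequency calculation, using the Eurostat counts from the table (the dictionary name `students` is only an illustrative choice). Each relative frequency is the category count divided by the total:

```python
# Number of students by ISCED level in Italy, 2008 (Eurostat counts from the table)
students = {
    "ISCED 0": 1_655_386,
    "ISCED 1": 2_865_613,
    "ISCED 2": 1_756_003,
    "ISCED 3": 2_847_785,
    "ISCED 4": 27_094,
    "ISCED 5 and 6": 2_013_856,
}

total = sum(students.values())  # 11,165,737

# Relative frequency = frequency of a category / total frequency
relative = {level: count / total for level, count in students.items()}

for level, proportion in relative.items():
    print(f"{level:<14} {proportion:.2f}")
```

Printed to two decimals, this reproduces the relative-frequency column; in floating point the proportions sum to 1 only up to a tiny rounding error.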
Cumulative Frequencies
Example: Number of students by ISCED level in Italy, 2008
ISCED level      Frequency   Relative frequency   Cumulative frequency
ISCED 0          1,655,386   0.15                 0.15
ISCED 1          2,865,613   0.26                 0.40
ISCED 2          1,756,003   0.16                 0.56
ISCED 3          2,847,785   0.26                 0.82
ISCED 4             27,094   0.00                 0.82
ISCED 5 and 6    2,013,856   0.18                 1.00
Total           11,165,737   1.00
Source: Eurostat
Cumulative frequency of ISCED 2: 0.40 + 0.16 = 0.56
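Cumulative frequencies only make sense when the categories have a natural order, as ISCED levels do. A minimal sketch, again with the Eurostat counts, accumulates the unrounded relative frequencies with `itertools.accumulate`:

```python
from itertools import accumulate

# Eurostat counts for Italy, 2008, ordered from the lowest to the highest ISCED level
levels = ["ISCED 0", "ISCED 1", "ISCED 2", "ISCED 3", "ISCED 4", "ISCED 5 and 6"]
counts = [1_655_386, 2_865_613, 1_756_003, 2_847_785, 27_094, 2_013_856]

total = sum(counts)
relative = [c / total for c in counts]

# Each cumulative frequency is the running sum of the relative frequencies so far
cumulative = list(accumulate(relative))

for level, rel, cum in zip(levels, relative, cumulative):
    print(f"{level:<14} {rel:.2f}  {cum:.2f}")
```

Rounding only at the end is why the table shows 0.40 for ISCED 1, whereas adding the already rounded values 0.15 + 0.26 would give 0.41.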
Cross-tabulations
• Used to analyse categorical data (gender, level of education, etc.)
• A two- (or more) dimensional table that records the number (frequency) of respondents who have the specific characteristics described in its cells.
                  School 1   School 2
Female students   55%        20%
Male students     45%        80%
Total             100%       100%
The example shows column percentages: within each school, female and male students add up to 100%.
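The percentages in the table are shares within each school. As a sketch, assuming hypothetical counts chosen only to reproduce them (11 female and 9 male students in School 1, 4 female and 16 male in School 2), the cross-tab cells and column percentages can be computed as:

```python
from collections import Counter

# Hypothetical respondents as (gender, school) pairs; counts chosen to match the table
respondents = (
    [("Female", "School 1")] * 11 + [("Male", "School 1")] * 9
    + [("Female", "School 2")] * 4 + [("Male", "School 2")] * 16
)

# Cell counts: how many respondents share each (gender, school) combination
cells = Counter(respondents)
# Column totals: how many respondents attend each school
school_totals = Counter(school for _, school in respondents)

schools = ("School 1", "School 2")
print(f"{'':<7}" + "".join(f"{s:>9}" for s in schools))
# Column percentages: each cell as a share of its school's total
for gender in ("Female", "Male"):
    row = [100 * cells[(gender, s)] / school_totals[s] for s in schools]
    print(f"{gender:<7}" + "".join(f"{p:>8.0f}%" for p in row))
```

In Excel, a PivotTable with the values shown as "% of Column Total" gives the same percentages.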
Mean
What? The sum of all the values divided by the number of values (arithmetic average); it applies to quantitative variables.
What for? To summarise data and compare them.
Disadvantage: The mean is affected by extreme values (e.g. income variables).
Median
What? The middle value of the data once they are ranked (or the average of the two middle values when the number of values is even). Thus, there are as many values below the median as above it.
What for? To summarise data and compare them.
Advantage: The median is not affected by how extreme the largest or smallest values are.
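The definition translates directly into code. This hypothetical `median` helper is a sketch for illustration only; Python's standard `statistics.median` and Excel's MEDIAN function apply the same rule:

```python
def median(values):
    """Middle value of the ranked data, or the average of the two
    middle values when the number of values is even."""
    ranked = sorted(values)
    n = len(ranked)
    if n == 0:
        raise ValueError("median of empty data")
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]                       # odd count: the single middle value
    return (ranked[mid - 1] + ranked[mid]) / 2   # even count: mean of the two middle values

print(median([30, 8, 20, 15, 200]))   # odd count: 20
print(median([30, 8, 20, 15]))        # even count: (15 + 20) / 2 = 17.5
```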
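Mean and Median
Example: Individual salary per year
Person     Total salary per year
Person A     8,000 EUR
Person B    15,000 EUR
Person C    20,000 EUR
Person D    30,000 EUR
Person E   200,000 EUR
Mean: (8,000 + 15,000 + 20,000 + 30,000 + 200,000) / 5 = 54,600 EUR
Median: 20,000 EUR (the third of the five ranked salaries)
Person E's very high salary pulls the mean above the salaries of four of the five people, while the median stays at the middle salary.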
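A sketch reproducing the salary example with Python's standard `statistics` module. The second part uses a hypothetical variant in which person E earns 2,000,000 EUR, to show that a more extreme value moves the mean but not the median:

```python
import statistics

# Yearly salaries (EUR) of persons A to E from the example
salaries = [8_000, 15_000, 20_000, 30_000, 200_000]
print(statistics.mean(salaries), statistics.median(salaries))   # 54600 20000

# Hypothetical variant: person E earns 2,000,000 EUR instead of 200,000 EUR
extreme = salaries[:-1] + [2_000_000]
print(statistics.mean(extreme), statistics.median(extreme))     # 414600 20000
```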
Rate vs. Share
What? A ratio between two measurements, which can be expressed, for example, as a percentage.
Where is the difference? See the examples below.
Example: Employment rates by educational attainment (15-64, %) in Italy, 2009
Educational attainment   Employment rate (%)
ISCED 0-2                44.5
ISCED 3-4                66.5
ISCED 5-6                77.0
Total                    57.5
Definition: Employment rates represent persons in employment with a given level of education as a percentage of the working-age population (15-64 years) with the same level of education.
Example: The shares (distribution) of the employed population by educational attainment (%) in Italy, 2009
Educational attainment   Share of employed (%)
ISCED 0-2                36.7
ISCED 3-4                46.1
ISCED 5-6                17.2
Total                    100.0
Definition: The distribution (shares) of the employed population by educational level represents persons in employment with a certain level of education as a percentage of the total employed population.
Cross-country comparison
Example:
Country A
Total unemployed: 1,000
Labour force (female): 10,000
- Unemployed (female): 800
- Employed (female): 9,200
Country B
Total unemployed: 1,000
Labour force (female): 1,600
- Unemployed (female): 800
- Employed (female): 800
Share of unemployed women in total unemployment:
Country A: 800/1,000 = 80%
Country B: 800/1,000 = 80%
Female unemployment rate (the ratio of unemployed women to the female labour force):
Country A: 800/10,000 = 8%
Country B: 800/1,600 = 50%
The shares are identical, yet the female unemployment rate in Country B is more than six times that in Country A.
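A sketch of the two calculations with the figures from the example; `share_and_rate` is a hypothetical helper, not a standard function. The numerator is the same in both cases; only the denominator changes:

```python
def share_and_rate(female_unemployed, total_unemployed, female_labour_force):
    """Return (share, rate) as fractions.

    share: unemployed women / all unemployed people
    rate:  unemployed women / female labour force
    """
    share = female_unemployed / total_unemployed
    rate = female_unemployed / female_labour_force
    return share, rate

# Figures from the cross-country example:
# (unemployed women, total unemployed, female labour force)
countries = {"Country A": (800, 1_000, 10_000), "Country B": (800, 1_000, 1_600)}

for name, figures in countries.items():
    share, rate = share_and_rate(*figures)
    print(f"{name}: share {share:.0%}, rate {rate:.0%}")
```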
BASIC DATA INTERPRETATION: EXAMPLES
Example 1
Interpretation: In the first example, we look at the question on which language respondents chose for the questionnaire. In this hypothetical example there are 8 respondents, but only 7 answered this question and 1 did not. The graph shows the distribution of answers among the 7 respondents who replied. We can say that 28.57% of them (2 people) chose Kyrgyz as the language of the questionnaire and 71.43% (5 people) chose Russian.
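The key step is to compute percentages over the respondents who answered, not over all respondents. A sketch assuming a hypothetical coding of Example 1, with `None` marking the non-response (the order of the answers is invented; only the counts match the example):

```python
from collections import Counter

# Hypothetical coding: 8 respondents, None marks the one who did not answer
answers = ["Russian", "Kyrgyz", "Russian", None, "Russian", "Kyrgyz", "Russian", "Russian"]

valid = [a for a in answers if a is not None]   # the 7 respondents who answered
counts = Counter(valid)

for language, n in counts.items():
    print(f"{language}: {n} of {len(valid)} = {100 * n / len(valid):.2f}%")
```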
Example 2
Interpretation: In the second example, we look at the question “Do you work at this moment?”. In this hypothetical example there are 8 respondents, but only 6 answered this question and 2 did not. The graph shows the distribution of answers among the 6 respondents who replied. We can say that 16.67% of them (1 person) were working at the time of the interview, 33.33% (2 people) were not working but were looking for work, and 50% (3 people) were neither working nor looking for a job.
Example 3
Interpretation: Here we analyse the question on the choice of profession. It was a multiple-answer question, so respondents could tick more than one answer. Out of 8 respondents, 6 answered this question. Of these six, 83.33% (5 people) chose the profession because of personal interest, and 16.67% (1 person) declared that it was (also) their parents' choice, and so on for the other answers. Because respondents could give several answers, these percentages can add up to more than 100%.
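For a multiple-answer question, each option is counted across respondents and divided by the number who answered. A sketch assuming hypothetical response sets: the counts for personal interest (5) and parents' choice (1) follow the example, while the sixth respondent's "other reason" is invented to show that the percentages can exceed 100% in total:

```python
from collections import Counter

# Hypothetical coding: each respondent's set of ticked reasons;
# an empty set marks the 2 respondents (out of 8) who did not answer
responses = [
    {"personal interest"},
    {"personal interest", "parents' choice"},
    {"personal interest"},
    set(),
    {"personal interest"},
    {"other reason"},  # hypothetical option, not in the original example
    set(),
    {"personal interest"},
]

answered = [r for r in responses if r]   # the 6 respondents who answered
mentions = Counter(reason for r in answered for reason in r)

for reason, n in mentions.most_common():
    print(f"{reason}: {100 * n / len(answered):.2f}% of respondents")

# Each respondent may tick several reasons, so the total can exceed 100%
print(f"Sum: {sum(100 * n / len(answered) for n in mentions.values()):.2f}%")
```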
Example 4
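Employment status by gender (row percentages, number of respondents in brackets)
                   Yes          No, but I am looking for work   No, and I am not looking for work   Total
Q2: Male           33.33% (1)   33.33% (1)                      33.33% (1)                          3
Q2: Female         0% (0)       50% (1)                         50% (1)                             2
Total respondents  1            2                               2                                   5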
Interpretation: This example deals with a cross-tabulation, i.e. we consider two variables at the same time: gender and current employment situation. In the table and graph we can see that in this hypothetical example there are 2 women and 3 men. 50% of the women (1 person) were not working at the time of the interview but were looking for work, and the other 50% (1 person) were neither working nor looking for work. Among the men, 33.33% (1 of the 3 male respondents) were working, 33.33% were not working but were looking for a job, and another 33.33% were neither working nor looking for a job.
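Example 4 reads the table along its rows: each cell is divided by the number of respondents of that gender. A sketch rebuilding the five answers from the table's counts (the status labels are shortened English translations):

```python
from collections import Counter

# (gender, employment status) pairs for the 5 respondents, rebuilt from the table's counts
pairs = (
    [("Male", "Yes")]
    + [("Male", "No, but looking for work")]
    + [("Male", "No, and not looking for work")]
    + [("Female", "No, but looking for work")]
    + [("Female", "No, and not looking for work")]
)

cells = Counter(pairs)
row_totals = Counter(gender for gender, _ in pairs)   # 3 men, 2 women
statuses = ["Yes", "No, but looking for work", "No, and not looking for work"]

# Row percentages: each cell as a share of that gender's total
for gender in ("Male", "Female"):
    pct = [100 * cells[(gender, s)] / row_totals[gender] for s in statuses]
    print(gender, [f"{p:.2f}%" for p in pct])
```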