Using HISCO and HISCAM to code and analyze occupations


From HISCO to HISCAM

Richard L. Zijdeman

6 July 2015


Getting started

Before we start, let's first set up our working environment.

rm(list = ls()) # ReMove objects in memory
setwd("~/Dropbox/Historical Demography - Reconstructing Life Cource Dynamics/Day 5 Occupational coding systems and R Studio descriptives and logistic regression/hisco2hiscam/") # set the working directory

With the rm() (remove) function we remove objects from memory, such as datasets that still remain in memory from a previous session.


Data work flow

It is good practice to organize all of your files for a project (e.g. a paper) in a specific folder. Here we set our working directory with setwd() to a particular folder. One of the advantages of working like this is efficient collaboration. After a colleague has set their working directory to the folder you shared with them, all relative paths inside that folder will be the same as yours and thus not require any tweaking of directories or file names.
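As a minimal sketch of this idea, the lines below create a small project folder and then refer to a file through a relative path. The data/source layout mirrors the path used later in these slides; tempdir() is only a stand-in for your own shared project folder, and the tiny data frame is made up for illustration.

```r
# Stand-in for the shared project folder (assumption: yours will differ)
project_dir <- tempdir()

# setwd() returns the previous working directory, so we can restore it later
old_dir <- setwd(project_dir)

# Hypothetical project layout: <project>/data/source/
dir.create(file.path("data", "source"), recursive = TRUE, showWarnings = FALSE)

# A relative path is resolved against the working directory
rel_path <- file.path("data", "source", "sample_data.csv")
write.csv(data.frame(id = 1:2), rel_path, row.names = FALSE)

found <- file.exists(rel_path)  # checked while inside the project folder

setwd(old_dir)  # restore the original working directory
```

Because rel_path never mentions the location of the project folder itself, the same line should work for a colleague once their working directory points to their copy of the folder.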


Reading in the data

OK, now let's read in the data that we supposedly coded. Actually these data are from the Historical Sample of the Netherlands (HSN) and can only be used for this summer school. You're free to use the HSN data (and I would recommend it), but you'd need to sign a license agreement stating that you'll manage the data in a proper way.

There are many functions (commands) to read in data. A common one is read.csv(), for reading in .csv files. Each function comes with multiple arguments that you can set, e.g. whether your file has column names (referred to as a 'header'). Here are some of the obvious arguments for read.csv():


The read.csv() function

read.csv()

file: your file, including directory
header: variable names or not?
sep: separator
    read.csv default: ","
    read.csv2 default: ";"
skip: number of rows to skip
nrows: maximum number of rows to read
stringsAsFactors: convert text columns to factors or not?
encoding (e.g. "latin1" or "UTF-8")
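To see these arguments at work without needing a file, here is a small sketch that reads from a text string instead (via the text argument, which read.csv() passes on to read.table()). The records, the junk first line and the name demo are all invented for illustration.

```r
# Invented example: semicolon-separated, with one junk line above the header
raw <- "exported from a hypothetical database
id;occupation;hisco
1;boer;61110
2;arbeider;99910
3;smid;83110"

demo <- read.csv(text = raw,
                 header = TRUE,            # first line read holds variable names
                 sep = ";",                # override the "," default of read.csv
                 skip = 1,                 # skip the junk line
                 nrows = 2,                # read at most two data rows
                 stringsAsFactors = FALSE) # keep text as character

str(demo)  # inspect the column names and types that were read
```

One detail on encoding: it only marks the strings that are read as being in that encoding. If a file has to be converted while reading, fileEncoding is the argument to look at.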


Ok, so now let’s read in the data for our training purposes:

df <- read.csv("./data/source/sample_data.csv",
               stringsAsFactors = FALSE,
               encoding = "latin1",
               nrows = 1000)


HISCAM Universal scale - male only

hcamU2 <- read.table("http://www.camsis.stir.ac.uk/hiscam/v1_3_1/hiscam_u2.dat",
                     sep = "\t",
                     header = TRUE,
                     stringsAsFactors = FALSE)

NOTE: in your own script the URL has to stay on a single line; you cannot 'break' a file path in the middle of the string. On the original slide I needed to break it so you could see the whole URL.

So now you should have two dataframes: df, which is our occupational data with HISCO, and hcamU2, which is the universal HISCAM scale for men.
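If a long URL does not fit on one line of your script, one possible workaround is to glue it together from shorter pieces with paste0(). This is only a sketch; the name hiscam_url is made up for illustration, and the download itself is left commented out so the lines run without an internet connection.

```r
# Build the URL from two pieces; paste0() joins them without a separator
hiscam_url <- paste0("http://www.camsis.stir.ac.uk/hiscam/",
                     "v1_3_1/hiscam_u2.dat")

hiscam_url  # print the assembled URL to check it

# The assembled string can then be used in place of the literal URL:
# hcamU2 <- read.table(hiscam_url, sep = "\t", header = TRUE,
#                      stringsAsFactors = FALSE)
```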


Merging the data

We now need to merge these two dataframes. There should be at least one variable that both dataframes have in common. That doesn't mean it needs to have the same name in both datasets. But even if it does (like in our case), I like specifying the name, so I'm sure which variables are being matched.

df.h <- merge(df, hcamU2,
              by.x = "hisco", by.y = "hisco",
              all.x = TRUE)


So with this command, we're saying: take two dataframes, df and hcamU2, and merge them by a variable, which is called "hisco" in the first (x) dataframe and "hisco" in the second (y) dataframe.

Now, you can imagine that you have occupations without a HISCO code, or that perhaps there's only a small number of occupations in your file and not every HISCAM value from the hcamU2 file finds a match with your occupational data. To make sure you preserve all your data, even where there was no match for it, you specify all = TRUE. Here, I specify all.x = TRUE, which preserves only the non-matches from my 'df' dataframe; rows of hcamU2 that find no match are dropped.
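The difference between these options is easiest to see on a toy example. The sketch below uses two invented dataframes (occ and scale, with made-up HISCAM scores and a hypothetical column name hiscam), not the course data.

```r
# Invented occupational data: one person has no HISCO code
occ <- data.frame(person = 1:4,
                  hisco  = c(61110, 99910, NA, 61110))

# Invented scale: one code (83110) does not occur in occ
scale <- data.frame(hisco  = c(61110, 83110),
                    hiscam = c(50.5, 60.2))

# Default: keep only rows that find a match in both dataframes
inner <- merge(occ, scale, by.x = "hisco", by.y = "hisco")

# all.x = TRUE: keep every row of occ, matched or not
left <- merge(occ, scale, by.x = "hisco", by.y = "hisco", all.x = TRUE)

# all = TRUE: keep every row of both dataframes
full <- merge(occ, scale, by.x = "hisco", by.y = "hisco", all = TRUE)

c(inner = nrow(inner), left = nrow(left), full = nrow(full))
```

With all.x = TRUE the number of rows equals that of occ, which is usually what you want when the occupational data are the starting point of the analysis.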


Now if we look at the df.h dataframe (the one that is the result of our merge) with summary(), we see that the new variable HISCAM was added:

summary(df.h)
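For numeric variables summary() also reports the number of NA's, which here would be the occupations that found no HISCAM score. A minimal sketch of counting them directly, using an invented stand-in for the merged data (the column name hiscam and all values are assumptions):

```r
# Invented stand-in for a merged dataframe like df.h
merged_demo <- data.frame(hisco  = c(61110, 99910, 83110, 99910),
                          hiscam = c(50.5, NA, 60.2, NA))

n_missing     <- sum(is.na(merged_demo$hiscam))   # number of rows without a score
share_missing <- mean(is.na(merged_demo$hiscam))  # same, as a proportion

# Which HISCO codes went unmatched?
unmatched <- unique(merged_demo$hisco[is.na(merged_demo$hiscam)])
```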


Final comments

I'm sure Ben has by now given you more info on R(Studio) and you'll feel a bit more comfortable. Plunging in at the deep end like this (learning how to merge in R, without getting to know R properly) is definitely not ideal, but actually you came a long way during class.

If you'd like to practice, you could try and download more of the HISCAM files and see how they relate. E.g. you could plot the early vs. the late period, or just look at correlations between the HISCAM values for different scales.

Good luck with the remainder of the course and your research projects afterwards.
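As a starting point for that exercise, here is a rough sketch of how two scales could be compared once they are read in. The two small dataframes stand in for two HISCAM files; their values and the column name hiscam are invented, so check the real files for the actual variable names.

```r
# Invented stand-ins for two HISCAM scale files
scale_a <- data.frame(hisco = c(61110, 83110, 99910), hiscam = c(50, 60, 40))
scale_b <- data.frame(hisco = c(61110, 83110, 99910), hiscam = c(52, 58, 41))

# Both files carry a column called hiscam, so add suffixes to tell them apart
both <- merge(scale_a, scale_b, by = "hisco", suffixes = c("_a", "_b"))

# Correlation between the two scales, ignoring codes missing in either
r <- cor(both$hiscam_a, both$hiscam_b, use = "complete.obs")

# A scatter plot shows the same relation visually:
# plot(both$hiscam_a, both$hiscam_b, xlab = "Scale A", ylab = "Scale B")
```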

Best wishes,

Richard
