using hisco and hiscam to code and analyze occupations

From HISCO to HISCAM

Richard L. Zijdeman

6 July 2015


Getting started

Before we start, let’s first set up our working environment:

rm(list = ls()) # ReMove objects in memory
setwd("~/Dropbox/Historical Demography - Reconstructing Life Cource Dynamics/Day 5 Occupational coding systems and R Studio descriptives and logistic regression/hisco2hiscam/")

With the rm() (remove) function we remove objects from memory: objects such as datasets that still remain in memory from a previous session.


Data work flow

It is good practice to organize all of your files for a project (e.g. a paper) in a specific folder. Here we set our working directory with setwd() to a particular folder. One of the advantages of working like this is efficient collaboration. After a colleague has set their working directory to the folder you shared with them, all relative paths inside that folder will be the same as yours and thus not require any tweaking of directories or file names.
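To make the point about relative paths concrete, here is a minimal sketch. The folder name hisco2hiscam_demo and the file toy.csv are made up for illustration, and a temporary directory stands in for the shared project folder.

```r
# Hypothetical project folder inside R's temporary directory
project_dir <- file.path(tempdir(), "hisco2hiscam_demo")
dir.create(file.path(project_dir, "data", "source"),
           recursive = TRUE, showWarnings = FALSE)

# setwd() returns the previous working directory, so we can restore it later
old_wd <- setwd(project_dir)

# From here on, every path is relative to the project folder
write.csv(data.frame(id = 1:2), "./data/source/toy.csv", row.names = FALSE)
toy <- read.csv("./data/source/toy.csv")

# Go back to where we started
setwd(old_wd)
```

Only the setwd() call differs between collaborators; the "./data/source/..." paths stay the same on every machine.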


Reading in the data

OK, now let’s read in the data that we supposedly coded. Actually, these data are from the Historical Sample of the Netherlands (HSN) and can only be used for this summer school. You’re free to use the HSN data (and I would recommend it), but you’d need to sign a license agreement stating that you’ll manage the data in a proper way.

There are many functions (commands) to read in data. A common one is read.csv(), for reading in .csv files. Each function comes with multiple arguments that you can set, e.g. whether your file has column names (referred to as a ‘header’). Here are some of the obvious arguments for read.csv():


The read.csv() function

read.csv()

file: your file, including directory
header: variable names or not?
sep: separator
    read.csv default: ","
    read.csv2 default: ";"
skip: number of rows to skip
nrows: total number of rows to read
stringsAsFactors
encoding (e.g. "latin1" or "UTF-8")
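A small sketch of how these arguments behave, using an invented semicolon-separated snippet passed through the text argument instead of a real file. The rows and codes are toy values for illustration, not HSN data.

```r
# Toy input: one junk line on top, then a header row, then three data rows
csv_text <- paste(
  "exported for the demo",
  "id;occupation;hisco",
  "1;farmer;61110",
  "2;baker;77610",
  "3;weaver;75400",
  sep = "\n"
)

toy <- read.csv2(text = csv_text,          # read.csv2: separator defaults to ";"
                 skip = 1,                 # skip the junk line
                 header = TRUE,            # the first remaining line holds variable names
                 nrows = 2,                # read only the first two data rows
                 stringsAsFactors = FALSE) # keep text as character, not factor

str(toy)
```

With nrows = 2 the third data row is never read, which is handy for a quick look at a large file.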


Ok, so now let’s read in the data for our training purposes:

df <- read.csv("./data/source/sample_data.csv",
               stringsAsFactors = FALSE,
               encoding = "latin1",
               nrows = 1000)


HISCAM Universal scale - male only

hcamU2 <- read.table("http://www.camsis.stir.ac.uk/hiscam/v1_3_1/hiscam_u2.dat",
                     sep = "\t",
                     header = TRUE,
                     stringsAsFactors = FALSE)

NOTE: on the slide the filepath was broken across lines so you could see the whole URL. You cannot ‘break’ it like that in your own code: keep the URL string on one line.

So now you should have two dataframes: df, which is our occupational data with HISCO, and hcamU2, which is the universal HISCAM scale for men.


Merging the data

We now need to merge these two dataframes. There should be at least one variable that both dataframes have in common. That doesn’t mean it needs to have the same name in both datasets. But even if it does (as in our case), I like specifying the name, so I’m sure what is being merged.

df.h <- merge(df, hcamU2,
              by.x = "hisco", by.y = "hisco",
              all.x = TRUE)


So with this command, we’re saying: take two dataframes, df and hcamU2, and merge them by a variable, which is called “hisco” in the first (x) dataframe and “hisco” in the second (y) dataframe.

Now, you can imagine that you have occupations without a HISCO code, or that there is only a small number of occupations in your file, so that not every HISCAM value in the hcamU2 file finds a match in your occupational data. To make sure you preserve all your data, even where there was no match, you specify all = TRUE. Here, I specify all.x = TRUE, which preserves only the non-matches from my ‘df’ dataframe: every row of df is kept, and rows without a match get NA for the HISCAM variables.
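The difference between the all arguments is easiest to see on a toy example. The two small dataframes below are invented for illustration (occ stands in for df, scale for hcamU2, and the column name hiscam is an assumption).

```r
# Toy occupational data: one code that matches, one that does not, one missing
occ <- data.frame(id = 1:3, hisco = c(61110, 77610, NA))

# Toy scale: one code that occurs in occ, one that does not
scale <- data.frame(hisco = c(61110, 99999), hiscam = c(50, 60))

# Default: keep only rows that match in both dataframes
inner <- merge(occ, scale, by.x = "hisco", by.y = "hisco")

# all.x = TRUE: keep every row of occ, with NA where no scale value was found
left <- merge(occ, scale, by.x = "hisco", by.y = "hisco", all.x = TRUE)

# all = TRUE: keep the non-matches from both dataframes
full <- merge(occ, scale, by.x = "hisco", by.y = "hisco", all = TRUE)

nrow(inner)
nrow(left)
nrow(full)
```

Comparing the row counts before and after a merge is a quick check that no occupational records were dropped or duplicated.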


Now if we look at the df.h dataframe (the one that is the result of our merge) with summary(), we see that the new variable HISCAM was added:

summary(df.h)
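Besides summary(), it can help to count how many rows did not receive a scale value after the merge. A minimal sketch on an invented dataframe, where the column name hiscam is an assumption and may differ from the name in the real file:

```r
# Toy merge result: the second and third rows found no scale value
merged <- data.frame(hisco  = c(61110, 77610, NA),
                     hiscam = c(50.1, NA, NA))

# Number of rows without a scale value
n_unmatched <- sum(is.na(merged$hiscam))

# Share of rows that did get a scale value
share_matched <- mean(!is.na(merged$hiscam))

# Which codes failed to match (NA here means the occupation had no code at all)
unmatched_codes <- unique(merged$hisco[is.na(merged$hiscam)])

n_unmatched
share_matched
unmatched_codes
```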


Final comments

I’m sure Ben has by now given you more info on R(Studio) and you’ll feel a bit more comfortable. Plunging in at the deep end like this (learning how to merge in R without getting to know R properly) is definitely not ideal, but actually you came a long way during class.

If you’d like to practice, you could try to download more of the HISCAM files and see how they relate. E.g. you could plot the early vs. the late period, or just look at correlations between the HISCAM values for different scales.
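One possible way to set up such a comparison is sketched below with two invented scales. The values, the codes and the column name hiscam are placeholders, not real HISCAM data.

```r
# Two toy scales keyed by occupational code (placeholder values)
early <- data.frame(hisco = c(1, 2, 3, 4), hiscam = c(40, 50, 60, 70))
late  <- data.frame(hisco = c(1, 2, 3, 5), hiscam = c(42, 49, 65, 80))

# Keep only codes present in both scales; suffixes tell the two columns apart
both <- merge(early, late, by = "hisco", suffixes = c(".early", ".late"))

# Correlation between the two scales, ignoring incomplete rows
r <- cor(both$hiscam.early, both$hiscam.late, use = "complete.obs")
r
```

Calling plot(both$hiscam.early, both$hiscam.late) on the merged dataframe would show the same relationship as a scatterplot.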

Good luck with the remainder of the course and your research projects afterwards.

Best wishes,

Richard

