TRANSCRIPT
DataCamp Analyzing US Census Data in R
US Census data: an overview
ANALYZING US CENSUS DATA IN R
Kyle Walker, Instructor
Course overview
What you'll learn:
How to acquire US Census data with the tidycensus R package
How to wrangle US Census data with tidyverse tools
How to use the R tigris package to acquire US Census Bureau boundary data
How to visualize and map US Census Bureau data in R with ggplot2
About your instructor
Fields: spatial demography & spatial data science
R developer: tidycensus, tigris, & idbr packages
The US Census Bureau API
To get started using US Census data in R, sign up for a Census API key.
Example key: "rw6pozt48ur2ugc8kg69x5phdrtnuhb2cb1subd6"
library(tidycensus)
census_api_key("YOUR KEY GOES HERE", install = TRUE)
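A minimal sketch of checking that a key is available before requesting data. It assumes the key was stored with install = TRUE, which saves it in .Renviron under the name CENSUS_API_KEY. The helper has_census_key() is a hypothetical name used for illustration, not part of tidycensus.

```r
# Hypothetical helper: TRUE if a non-empty Census API key is set
# in the environment variable CENSUS_API_KEY.
has_census_key <- function() {
  nzchar(Sys.getenv("CENSUS_API_KEY"))
}

if (has_census_key()) {
  message("Census API key found.")
} else {
  message("No key found; call census_api_key() first.")
}
```

A key installed this way only becomes visible after R is restarted or .Renviron is reloaded.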
Using decennial Census data with tidycensus
state_pop <- get_decennial(geography = "state",
                           variables = "P001001",
                           year = 2010)  # the counts below are from the 2010 Census
head(state_pop)
# A tibble: 6 x 4
  GEOID NAME       variable    value
  <chr> <chr>      <chr>       <dbl>
1 01    Alabama    P001001   4779736
2 02    Alaska     P001001    710231
3 04    Arizona    P001001   6392017
4 05    Arkansas   P001001   2915918
5 06    California P001001  37253956
6 08    Colorado   P001001   5029196
Using ACS data with tidycensus
state_income <- get_acs(geography = "state",
                        variables = "B19013_001")
head(state_income)
# A tibble: 6 x 5
  GEOID NAME       variable   estimate   moe
  <chr> <chr>      <chr>         <dbl> <dbl>
1 01    Alabama    B19013_001    44758   314
2 02    Alaska     B19013_001    74444   809
3 04    Arizona    B19013_001    51340   231
4 05    Arkansas   B19013_001    42336   234
5 06    California B19013_001    63783   188
6 08    Colorado   B19013_001    62520   287
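Because ACS values are survey estimates, each one comes with a margin of error in the moe column. A minimal sketch of turning estimate and moe into a range, using the first three rows of the output above typed in by hand so it runs without an API key. The data frame name state_income_sample is made up for this example, and base R is used to avoid extra packages.

```r
# First three rows of the state_income output, entered manually.
state_income_sample <- data.frame(
  NAME     = c("Alabama", "Alaska", "Arizona"),
  estimate = c(44758, 74444, 51340),
  moe      = c(314, 809, 231)
)

# Lower and upper bounds implied by the margin of error.
state_income_sample$lower <- state_income_sample$estimate - state_income_sample$moe
state_income_sample$upper <- state_income_sample$estimate + state_income_sample$moe

state_income_sample
```

With tidycensus defaults the margin of error is reported at the 90 percent confidence level, so the range is a 90 percent interval around the estimate.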
Basic tidycensus functionality
Geography in tidycensus
Legal entities: geography = "county"
Statistical entities: geography = "tract"
Available geographies
Geography and variables in tidycensus
county_income <- get_acs(geography = "county",
                         variables = "B19013_001")
county_income
# A tibble: 3,220 x 5
   GEOID NAME                     variable   estimate   moe
   <chr> <chr>                    <chr>         <dbl> <dbl>
 1 01001 Autauga County, Alabama  B19013_001    53099  2631
 2 01003 Baldwin County, Alabama  B19013_001    51365   991
 3 01005 Barbour County, Alabama  B19013_001    33956  2655
 4 01007 Bibb County, Alabama     B19013_001    39776  3306
 5 01009 Blount County, Alabama   B19013_001    46212  2443
 6 01011 Bullock County, Alabama  B19013_001    29335  5435
 7 01013 Butler County, Alabama   B19013_001    34315  2904
 8 01015 Calhoun County, Alabama  B19013_001    41954  1381
 9 01017 Chambers County, Alabama B19013_001    36027  1870
10 01019 Cherokee County, Alabama B19013_001    38925  2598
# ... with 3,210 more rows
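County GEOIDs like those above combine a two-digit state FIPS code with a three-digit county FIPS code. A small sketch of splitting them apart with base R, using the first three GEOIDs from the output. The variable names are chosen for this example only.

```r
# GEOIDs are stored as character so leading zeros are preserved.
geoid <- c("01001", "01003", "01005")

# Characters 1-2 are the state FIPS code, 3-5 the county FIPS code.
state_fips  <- substr(geoid, 1, 2)
county_fips <- substr(geoid, 3, 5)

data.frame(geoid, state_fips, county_fips)
```

Converting a GEOID to a number would drop the leading zero of "01001", which is why the column is typed <chr>.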
Geographic subsets in tidycensus
texas_income <- get_acs(geography = "county",
                        variables = c(hhincome = "B19013_001"),
                        state = "TX")
texas_income
# A tibble: 254 x 5
   GEOID NAME                    variable estimate   moe
   <chr> <chr>                   <chr>       <dbl> <dbl>
 1 48001 Anderson County, Texas  hhincome    42146  2539
 2 48003 Andrews County, Texas   hhincome    70121  7053
 3 48005 Angelina County, Texas  hhincome    44185  2107
 4 48007 Aransas County, Texas   hhincome    44851  4261
 5 48009 Archer County, Texas    hhincome    62407  5368
 6 48011 Armstrong County, Texas hhincome    65000  9415
 7 48013 Atascosa County, Texas  hhincome    53181  4114
 8 48015 Austin County, Texas    hhincome    56681  4903
 9 48017 Bailey County, Texas    hhincome    40589  8438
10 48019 Bandera County, Texas   hhincome    55434  4503
# ... with 244 more rows
Wide data with tidycensus
get_acs(geography = "county",
        variables = c(hhincome = "B19013_001",
                      medage = "B01002_001"),
        state = "TX",
        output = "wide")
# A tibble: 254 x 6
   GEOID NAME                    hhincomeE hhincomeM medageE medageM
   <chr> <chr>                       <dbl>     <dbl>   <dbl>   <dbl>
 1 48001 Anderson County, Texas      42146      2539    38.9     0.5
 2 48003 Andrews County, Texas       70121      7053    31.2     0.8
 3 48005 Angelina County, Texas      44185      2107    36.7     0.3
 4 48007 Aransas County, Texas       44851      4261    50.7     1.1
 5 48009 Archer County, Texas        62407      5368    44.1     0.7
 6 48011 Armstrong County, Texas     65000      9415    45.9     2.8
 7 48013 Atascosa County, Texas      53181      4114    35.4     0.2
 8 48015 Austin County, Texas        56681      4903    40.8     0.4
 9 48017 Bailey County, Texas        40589      8438    34.4     1.1
10 48019 Bandera County, Texas       55434      4503    51.3     0.9
# ... with 244 more rows
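In the wide output above, each requested variable becomes two columns: the name plus E for the estimate and the name plus M for the margin of error. A minimal sketch of separating those columns by suffix, using only the column names from that output. The object names are made up for this example.

```r
# Column names from the wide output shown above.
wide_cols <- c("GEOID", "NAME", "hhincomeE", "hhincomeM", "medageE", "medageM")

# Drop the identifier columns first: "NAME" also ends in "E".
value_cols <- setdiff(wide_cols, c("GEOID", "NAME"))

estimate_cols <- value_cols[endsWith(value_cols, "E")]
moe_cols      <- value_cols[endsWith(value_cols, "M")]

# Recover the variable names by stripping the trailing E or M.
variables <- unique(sub("[EM]$", "", value_cols))
variables
```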
Searching for data with tidycensus
Searching for Census variables
To find Census variable IDs, use:
Online resources like Census Reporter
Built-in variable searching in tidycensus
Choosing a dataset to search
v16 <- load_variables(year = 2016,
                      dataset = "acs5",
                      cache = TRUE)
v16
# A tibble: 22,815 x 3
   name       label                                  concept
   <chr>      <chr>                                  <chr>
 1 B00001_001 Estimate!!Total                        UNWEIGHTED...
 2 B00002_001 Estimate!!Total                        UNWEIGHTED...
 3 B01001_001 Estimate!!Total                        SEX BY AGE
 4 B01001_002 Estimate!!Total!!Male                  SEX BY AGE
 5 B01001_003 Estimate!!Total!!Male!!Under 5 years   SEX BY AGE
 6 B01001_004 Estimate!!Total!!Male!!5 to 9 years    SEX BY AGE
 7 B01001_005 Estimate!!Total!!Male!!10 to 14 years  SEX BY AGE
 8 B01001_006 Estimate!!Total!!Male!!15 to 17 years  SEX BY AGE
 9 B01001_007 Estimate!!Total!!Male!!18 and 19 years SEX BY AGE
10 B01001_008 Estimate!!Total!!Male!!20 years        SEX BY AGE
# ... with 22,805 more rows
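The labels in the table above use "!!" to separate levels of a hierarchy (for example Total, then Male, then an age group). A small sketch of splitting labels into their parts with base R, using three labels copied from that output. The object names are chosen for this example.

```r
# Three labels copied from the load_variables() output.
labels <- c("Estimate!!Total",
            "Estimate!!Total!!Male",
            "Estimate!!Total!!Male!!Under 5 years")

# Split on the literal separator "!!" (fixed = TRUE avoids regex handling).
parts <- strsplit(labels, "!!", fixed = TRUE)

# Number of levels in each label, and the most specific (last) level.
depth <- lengths(parts)
leaf  <- vapply(parts, function(p) p[length(p)], character(1))

data.frame(depth, leaf)
```

The last level is usually the most readable description of a variable, which can make search results easier to scan.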
Filtering a variables dataset
library(tidyverse)
B19001 <- filter(v16, str_detect(name, "B19001"))
B19001
# A tibble: 170 x 3
   name        label                 concept
   <chr>       <chr>                 <chr>
 1 B19001_001E Estimate!!Total       HOUSEHOLD INCOME…
 2 B19001_002E ...Less than $10,000  HOUSEHOLD INCOME…
 3 B19001_003E ...$10,000 to $14,999 HOUSEHOLD INCOME…
 4 B19001_004E ...$15,000 to $19,999 HOUSEHOLD INCOME…
 5 B19001_005E ...$20,000 to $24,999 HOUSEHOLD INCOME…
 6 B19001_006E ...$25,000 to $29,999 HOUSEHOLD INCOME…
 7 B19001_007E ...$30,000 to $34,999 HOUSEHOLD INCOME…
 8 B19001_008E ...$35,000 to $39,999 HOUSEHOLD INCOME…
 9 B19001_009E ...$40,000 to $44,999 HOUSEHOLD INCOME…
10 B19001_010E ...$45,000 to $49,999 HOUSEHOLD INCOME…
# ... with 160 more rows
ACS variable structure
Anatomy of an ACS variable B19001_002E:
B: refers to base table. Other prefixes: C, DP, S.
19001: the table ID
002: the variable code within the table
E: refers to the estimate; M refers to the margin of error.
The E or M suffix is optional in tidycensus functions, which return both the estimate and the margin of error for each variable.
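The anatomy above can be sketched as a small parser. This is an illustration only: parse_acs_variable() is a hypothetical helper, not a tidycensus function, and the pattern covers only IDs shaped like B19001_002E, not every ID format the Census Bureau uses.

```r
# Hypothetical helper: split an ACS variable ID into its components.
parse_acs_variable <- function(id) {
  # prefix letters, table digits, underscore, variable digits, optional E/M
  pattern <- "^([A-Z]+)([0-9]+)_([0-9]+)([EM]?)$"
  m <- regmatches(id, regexec(pattern, id))[[1]]
  if (length(m) == 0) {
    stop("ID does not match the simple pattern: ", id)
  }
  list(prefix = m[2], table = m[3], code = m[4], suffix = m[5])
}

parse_acs_variable("B19001_002E")
```

Calling it on "B19013_001" returns an empty suffix, matching the form passed to get_acs() in the earlier examples.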
Visualizing Census data with ggplot2
Example: plotting income by state
library(tidycensus)
library(tidyverse)
ne_income <- get_acs(geography = "state",
                     variables = "B19013_001",
                     survey = "acs1",
                     state = c("ME", "NH", "VT", "MA",
                               "RI", "CT", "NY"))
ggplot(ne_income, aes(x = estimate, y = NAME)) +
  geom_point()
Customizing ggplot2 graphics of ACS data
ggplot(ne_income,
       aes(x = estimate,
           y = reorder(NAME, estimate))) +
  geom_point(color = "navy", size = 4) +
  scale_x_continuous(labels = scales::dollar) +
  theme_minimal(base_size = 14) +
  labs(x = "2016 ACS estimate",
       y = "",
       title = "Median household income by state")
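The reorder(NAME, estimate) call in the plot sorts the state names by their estimate instead of alphabetically. A minimal sketch of what it does on its own, using the six state income rows shown earlier in the deck, typed in by hand so it runs without an API key. The name state_income_sample is made up for this example.

```r
# Six rows from the earlier state_income output, entered manually.
state_income_sample <- data.frame(
  NAME     = c("Alabama", "Alaska", "Arizona",
               "Arkansas", "California", "Colorado"),
  estimate = c(44758, 74444, 51340, 42336, 63783, 62520)
)

# reorder() returns a factor whose levels are sorted by the second argument.
ordered_names <- reorder(state_income_sample$NAME,
                         state_income_sample$estimate)

# Levels run from the lowest to the highest estimate.
levels(ordered_names)
```

ggplot2 draws discrete axes in factor-level order, so the lowest estimate sits at the bottom of the y axis and the highest at the top.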