# Introduction

A student in one of my classes had a bunch of latitude/longitude data from the US and wanted to map them to a US Census tract and its corresponding code (FIPS code).

We can first download the Census tracts for each state from the US Census website under the “Census Tracts” dropdown menu. Then, after unzipping the file, we can read the shape file using st_read from the sf package.

library(sf)
library(dplyr)
census_tracts <- st_read("cb_2018_42_tract_500k.shp", quiet = TRUE)

Then, we can generate some fake lat/long data for Pennsylvania.

n <- 1e5
set.seed(7477263)
county_bbox <- st_bbox(census_tracts)
# randomly sample longitude and latitude from PA's bounding box
latlong_dat <- data.frame(id = seq_len(n),
x = rnorm(n),
Longitude = runif(n, min = county_bbox["xmin"],
max = county_bbox["xmax"]),
Latitude = runif(n, min = county_bbox["ymin"],
max = county_bbox["ymax"]))

# Original Solution

The original solution was to use call_geolocator_latlong to get the block FIPS using an online API, and then convert that into Census tract by taking the first 11 digits. However, the API takes a while to respond, so this solution isn’t scalable for thousands of data points.

library(tigris)
system.time({
census_block <- tigris::call_geolocator_latlon(lat = latlong_dat$Latitude, lon = latlong_dat$Longitude)
})
##    user  system elapsed
##   0.059   0.004   0.602
(census_tract <- substr(census_block, start = 1, stop = 11))
##  "42061950500"

# Improved Solution

We will use the st_intersects function from the sf library and the county-level sf shapefiles to find in which county a given lat/long pair is contained. To do that, we convert the latitude and longitude data into an sf object. Note that if you have NA values for the latitude or longitude, you’ll need to filter them out.

latlong_sf <- latlong_dat %>%
filter(!is.na(Latitude), !is.na(Longitude)) %>%
st_as_sf(coords = c("Longitude", "Latitude"), crs = st_crs(census_tracts))

Then, we use the st_intersects function to find the counties which intersect with the points. This returns a list, where the ith element is the row number of census_tracts which contains the ith lat/long pair.

We might get a message like:

although coordinates are longitude/latitude, st_intersects assumes that they are planar

I believe this can be ignored unless you’re working with data near the polar regions.

system.time({
intersected <- st_intersects(latlong_sf, census_tracts)
})
##    user  system elapsed
##   1.062   0.008   1.070

We can see that this runs super quickly—it took the same time to process all 10K lat/long pairs with st_intersects as it did to process 1 pair with call_geolocator_latlon.

We can then add the FIPS as a column to our latitude/longitude data. We can use the intersection to subset the GEOID column of census_tracts, which is the FIPS code. If the intersection was NA, then the FIPS code will be an empty string (note FIPS codes are stored as strings because sometimes they have a leading zero).

latlong_final <- latlong_sf %>%
mutate(intersection = as.integer(intersected),
fips = if_else(is.na(intersection), "",
census_tracts\$GEOID[intersection]))
head(latlong_final)
## Simple feature collection with 6 features and 4 fields
## geometry type:  POINT
## dimension:      XY
## bbox:           xmin: -80.48101 ymin: 39.73635 xmax: -75.50156 ymax: 41.99468
##   id          x                   geometry intersection        fips
## 1  1 -2.1932479 POINT (-77.89334 40.52967)         2495 42061950500
## 2  2 -2.0290433  POINT (-78.81608 41.8673)         2895 42083421000
## 3  3  0.3157029 POINT (-75.50156 39.73635)           NA
## 4  4 -1.0549416 POINT (-80.48101 40.59035)         3154 42007602900
## 5  5  0.9947392 POINT (-78.55342 41.54438)         1774 42047950100
## 6  6  1.2926765 POINT (-79.69076 41.99468)         2609 42049011802

To check our solution, we can plot all the Census tracts in Philadelphia (county code 42101) and display the points which showed up in a particular Census tract.

census_tracts %>%
filter(startsWith(GEOID, "42101")) %>%
select(geometry) %>%
plot()
latlong_final %>%
filter(fips == "42101980300") %>%
select(geometry) %>%
slice_sample(n = 10) %>%
plot(add = TRUE, reset = FALSE, pch = 16, col = "red") 