Processing FIA Data

Nan C. Pond · 2019/05/01 · 6 minute read

Why we love FIA data

The USFS Forest Inventory and Analysis program provides a wealth of information for forestry and ecological research. Lots has been written about the history, status, and structure of this program; for those looking for more background we’d recommend starting on the USFS FIA website.

The key features of this data that make it especially useful for SilviaTerra’s biometrics team include: 1) FIA plots represent a design-unbiased sample of the forest conditions at county, state, regional, and national scales. 2) The dataset is made public in a consistent and standardized process, at no cost to users. 3) The data include multiple measurements at the same location over time. 4) Many variables are recorded for each plot, making possible a wide range of analyses.

#The challenges for a new user While the FIA dataset is extremely robust and useful, it’s also a formidable challenge for a new user to dig into. A familiarity with the data collection procedure as well as the metadata of each available dataset goes a long way- for the purposes of this walkthrough, we’ll define the fields we are using, but highly encourage new users to read through the FIA database documentation and sampling protocol before beginning new analyses.

#What we cover in this blog post This blog post covers some of the basics of interacting with the FIA data that are available through the FIA datamart [https://apps.fs.usda.gov/fia/datamart/datamart.html]

We will: 1) define a function to download FIA data for a given state 2) filter that data to a reduced dataset 3) join together multiple FIA tables to create a single larger dataframe 4) perform some basic summaries and visualizations

Firstly, we define a function to download FIA data for a specified state. Within this function, we will download three different FIA tables: the PLOT, TREE, and SURVEY tables.

## [1] "downloading PLOT table"
## [1] "downloading TREE table"
## [1] "downloading SURVEY table"

Having downloaded and unzipped the PLOT, TREE, and SURVEY files, we now want to read in the data and do some additional filtering.

We’re first going to use the SURVEY table to identify plots that were measured under the annual inventory system.

fiaSurvey <- read.csv(files$surveyFile) %>%
  filter(ANN_INVENTORY == 'Y') %>%
  mutate(CN = as.character(CN))

Next we’ll read in the PLOT file, and filter it. We build in some redundancy in our filtration among multiple tables, to ensure we’re getting only the data we want. Filtering the PLOT table will focus on again identifying plots that were measured or re-measured in the annual survey system. We also want to identify plots that were forested.

fiaPlots <- read.csv(files$plotFile) %>%
  filter(KINDCD == 1 | KINDCD == 2) %>% #first or re-measurement in annual survey system
  filter(PLOT_STATUS_CD == 1) %>% #accessible forest measured
  mutate(CN = as.character(CN),
         SRV_CN = as.character(SRV_CN)) %>%
  filter(SRV_CN %in% fiaSurvey$CN) %>% # confirms that these are coded as annual inventory
  filter(INVYR != 9999) %>%
  filter(MEASYEAR > 1998)

Finally we’ll pull in the TREE file. This file is quite large and this process can take 20-40 seconds depending on which state you’re working with. You can speed it up by using the read_csv function in place of read.csv. Just be prepared that read_csv is rather verbose!

fiaTrees <- read.csv(files$treeFile) %>%
  mutate(PLT_CN = as.character(PLT_CN)) %>%
  filter(PLT_CN %in% fiaPlots$CN) %>%
  group_by(PLT_CN) %>%
  mutate(numSubplots = length(unique(SUBP))) %>%
  ungroup() %>%
  filter(numSubplots == 4) #this ensures we're working with only plots where
                           #measurements were made on all 4 subplots

plotFinalFilt <- fiaPlots %>%
  filter(CN %in% fiaTrees$PLT_CN)

Now we’ll do some data manipulation to summarize re-measured data.

Having pulled in and done preliminary filtering on these data, what we want to do is get the start and end conditions for each revisited plot.

treeCNStats <- fiaTrees %>%
    filter(DIA >= 5) %>% # keep only trees 5"+
    filter(STATUSCD != 0) %>% #drop trees that are not part of the sample
    filter(!is.na(TPA_UNADJ)) %>%
    left_join(., plotFinalFilt %>%
                mutate(PREV_PLT_CN = as.character(PREV_PLT_CN)) %>%
                select(CN, PREV_PLT_CN, MEASYEAR, LAT, LON, ELEV, DESIGNCD,
                       MACRO_BREAKPOINT_DIA),
            by = c('PLT_CN' = 'CN'))

The PLOTS table provides us with both a CN value - the current number for a plot - and its PREV_PLT_CN. This lets us match ‘backwards’ in time to identify the measurements made previously on the same plot. We will do this by defining a vector we’ll call remeasCNs. remeasCNs is defined as the set of unique PLT_CN values for which the tree data from the PREV_PLT_CN - the previous measurements made on this plot - are also available.

startPlotCNs <- unique(treeCNStats$PREV_PLT_CN[!is.na(treeCNStats$PREV_PLT_CN)])
startPlotCNs <- startPlotCNs[startPlotCNs %in% treeCNStats$PLT_CN]

remeasCNs <- unique(treeCNStats %>%
                      filter(PREV_PLT_CN %in% startPlotCNs) %>%
                      filter(!is.na(PREV_PLT_CN)) %>%
                      .$PLT_CN)


startPlotInfo <- treeCNStats %>%
  filter(PLT_CN %in% startPlotCNs) %>%
  select(PLT_CN, CN, STATUSCD, SPCD, SPGRPCD, DIA, MEASYEAR, TPA_UNADJ,
         MACRO_BREAKPOINT_DIA) %>%
  rename(TREE_CN = CN)

remeasPlotInfo <- treeCNStats %>%
  filter(PLT_CN %in% remeasCNs) %>%
  select(PLT_CN, PREV_PLT_CN, CN, PREV_TRE_CN, STATUSCD, SPCD, SPGRPCD, DIA, MEASYEAR,
         TPA_UNADJ, MACRO_BREAKPOINT_DIA) %>%
  rename(secondTreeStatus = STATUSCD,
         secondDIA = DIA,
         secondTPA = TPA_UNADJ)

Next we’re going to summarize basal area and trees per acre for each plot CN.

allCNs <- unique(fiaTrees$PLT_CN)

calculate_inches_to_ba_ft2 <- function(diam_inches) {
   pi * (diam_inches ^ 2) / (4 * 144)
}

 #for each CN:
cnStats <- fiaTrees %>%
   mutate(PLT_CN = as.character(PLT_CN)) %>%
   mutate(BA = calculate_inches_to_ba_ft2(DIA) * TPA_UNADJ) %>%
     filter(STATUSCD == 1) %>% # keep only live trees
     filter(DIA >= 5) %>% # keep only trees 5"+
     filter(!is.na(TPA_UNADJ)) %>%
     group_by(PLT_CN) %>%
     summarize(BAPA = sum(BA), TPA = sum(TPA_UNADJ)) %>%
  left_join(., plotFinalFilt %>% mutate(PREV_PLT_CN = as.character(PREV_PLT_CN)) %>% select(CN, PREV_PLT_CN, INVYR, MEASYEAR, LAT, LON, ELEV),
            by = c('PLT_CN' = 'CN'))

 secondCondition <- cnStats %>%
   filter(PREV_PLT_CN %in% allCNs) %>%
   rename(remeasBAPA = BAPA,
          remeasTPA = TPA,
          remeasYR = MEASYEAR) %>%
   mutate(CNsecondmeas = as.character(PLT_CN)) %>%
   select(-PLT_CN, -INVYR, -LAT, -LON, -ELEV) %>%
   mutate(PREV_PLT_CN = as.character(PREV_PLT_CN))

 remeas <- cnStats %>%
   left_join(., secondCondition, by = c('PLT_CN' = 'PREV_PLT_CN')) %>%
   mutate(TimeInterval = remeasYR - MEASYEAR) %>%
   filter(TimeInterval > 0) %>%
   mutate(BAPA = BAPA,
          remeasBAPA = remeasBAPA,
          TPA = TPA,
          remeasTPA = remeasTPA) %>%
   mutate(deltaBA = (remeasBAPA - BAPA) / TimeInterval) %>% #annualized change in BAPA
   filter(!is.na(TimeInterval)) %>%
   filter(!is.na(BAPA)) %>%
   filter(!is.na(remeasBAPA))

finalSummary <- remeas %>%
  select(PLT_CN, BAPA, TPA, INVYR, MEASYEAR, LAT, LON, ELEV, CNsecondmeas, TimeInterval,
         deltaBA,
         remeasBAPA, remeasTPA) %>%
  mutate(deltaTPA = (remeasTPA - TPA) / TimeInterval)

For this initial summary, we are not distinguishing ingrowth and mortality - both of which can be pulled out to more explicitly track changes in trees per acre and basal area.

We now have the starting point for further analysis of change on these plots. Annualized change, differences in remeasurement period, and overall change between remeasurement periods can be easily reviewed graphically.

###
#Figures
###

ggplot(aes(x = BAPA, y = deltaBA), data = finalSummary) +
    theme_bw() + geom_point(alpha = 0.1, color = "#30363E")

ggplot(aes(x = BAPA, y = remeasBAPA), data = finalSummary) +
  theme_bw() + geom_point(alpha = 0.1, color = "#30363E") +
  stat_smooth() + geom_abline() +
  facet_wrap(~TimeInterval)
## `geom_smooth()` using method = 'gam' and formula 'y ~ s(x, bs = "cs")'
## Warning: Computation failed in `stat_smooth()`:
## x has insufficient unique values to support 10 knots: reduce k.

## Warning: Computation failed in `stat_smooth()`:
## x has insufficient unique values to support 10 knots: reduce k.

This summary frame is a great jumping-off point for subsequent analyses - including graphical presentations, modeling, and statistical tests of difference. The same approaches to summary and filtration applied above can also be modified to explore differences by species and diameter classes, or to aggregate plots by forest type, age, or other conditions of interest. Happy hunting!