Run the following code to load the packages needed for this project.
library(sf) # for working with spatial data
library(tigris) # for getting US census data
library(ggplot2) # for plotting maps and charts
library(mapsf) # for plotting maps
library(dplyr) # for doing join
Wildfires occur frequently across the United States. According to data from the National Interagency Fire Center (NIFC), there were 40,000 wildfire events across the country in the last year alone (2022), as shown in the following map.
Each red point on the map represents a single wildfire event in 2022. As we can see, the West Coast of the United States experiences a higher frequency of wildfires. This is due to several factors, including climate, vegetation type, and topography. Apart from the extensive damage wildfires inflict on the ecosystem, the particles and gases they emit also raise concerns about public health.
To find out how wildfires affect public health, I analyzed the correlation between wildfire frequency and deaths caused by respiratory diseases in California. The method and results are as follows.
The wildfire data comes from NIFC and contains all wildfire events in the United States over the past 20 years. After downloading the data from the website, read it into R.
wildfire_all = read.csv('../data/lab1/wildfire.csv') # read the downloaded wildfire data
The key attributes of the data are as follows.
Attribute Name | Meaning | Class | Example |
---|---|---|---|
X | Longitude | numeric | -118.18071 |
Y | Latitude | numeric | 22.80898 |
FireDiscoveryDateTime | Time When the Fire is Discovered | character | 2020/02/28 20:45:40+00 |
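Because `FireDiscoveryDateTime` is stored as text, the year has to be extracted before the events can be filtered by date. The sketch below is one way to do that with the full four-digit year, which avoids any ambiguity between centuries. The timestamps are made-up values in the same format as the example in the table, and `stamps`, `fire_year`, and `in_window` are illustrative names, not part of the original analysis.

```r
# Made-up timestamps in the same format as the FireDiscoveryDateTime example
stamps <- c('2016/07/04 18:30:00+00',
            '2018/11/08 14:33:00+00',
            '2020/02/28 20:45:40+00')

# The first four characters hold the year; convert them to integers
fire_year <- as.integer(substr(stamps, 1, 4))

# Logical mask for a 2016-2018 study window
in_window <- fire_year >= 2016 & fire_year <= 2018

print(data.frame(stamps, fire_year, in_window))
```

A mask like `in_window` can be passed straight to `subset()` alongside the spatial conditions.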
The respiratory death rate data comes from the Institute for Health Metrics and Evaluation (IHME). Its data gateway provides the data for a specific year. To exclude the impact of COVID-19, I chose the 2018 data. As before, after downloading the data from the website, read it into R.
resp18_all = read.csv('../data/lab1/2018_RESP.csv') # read the downloaded respiratory death rate data
The key attributes of the data are as follows.
Attribute | Meaning | Class | Example |
---|---|---|---|
fips | FIPS code of the place (state + county) | integer | 6115 |
race_id | race group (1 for all races) | integer | 1 |
age_group_id | age group (22 for all ages) | integer | 22 |
val | respiratory death rate | numeric | 4.899945e-06 |
The California county boundary data can be downloaded using the function tigris::counties(). Transform it to coordinate reference system EPSG:4326 so that it can later be intersected with the wildfire event points.
counties_CA = counties(state='CA') # get counties data of CA (California)
counties_CA = st_transform(counties_CA, crs=4326) # reproject to EPSG:4326 to match the wildfire points
The key attributes of the data are as follows.
Attribute | Meaning | Class | Example |
---|---|---|---|
STATEFP | FIPS code of the state | character | 06 |
COUNTYFP | FIPS code of the county | character | 091 |
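Two layers can only be intersected reliably when they share a coordinate reference system, which is why the county boundaries are reprojected. The sketch below shows the idea on a single made-up point; it assumes the source data is in NAD83 (EPSG:4269), which is what tigris typically returns, and `pt_nad83` and `pt_wgs84` are illustrative names.

```r
library(sf)

# One illustrative point (roughly downtown Los Angeles) declared as NAD83
pt_nad83 <- st_sfc(st_point(c(-118.24, 34.05)), crs = 4269)

# st_transform() reprojects the coordinates into the target CRS
pt_wgs84 <- st_transform(pt_nad83, crs = 4326)

# Inspect the EPSG code before and after the transformation
print(st_crs(pt_nad83)$epsg)
print(st_crs(pt_wgs84)$epsg)
```

Note that `st_transform()` converts coordinates, whereas passing `crs=` to `st_as_sf()` only declares which system the existing coordinates are already in.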
Next, we need to combine these datasets into a single county polygon layer that carries both the wildfire and the death rate data. For the wildfire data, we first convert it into a simple feature object and then intersect it with the county polygons to count the wildfire events in each polygon. I chose wildfire events from 2016 to 2018 rather than from 2018 alone, because I do not think the impact of wildfires on public health is only short-term.
# get wildfire in the California bounding box and during year 2016-2018
wildfire_selected = subset(wildfire_all, Y>32.5 & Y<42.1 & X>(-124.5) & X<(-114.1) & substring(FireDiscoveryDateTime,3,4) %in% c('16','17','18'))
# convert the data into simple feature and set the coordinate reference system
wildfire_selected= st_as_sf(wildfire_selected, coords=c('X','Y'),crs=4326)
# do intersect and get wildfire counts in each polygon
counties_CA$wildfire_count=lengths(st_intersects(counties_CA,wildfire_selected))
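The counting step relies on `st_intersects()` returning, for each polygon, the indices of the points that fall inside it, so `lengths()` gives one count per polygon. A minimal sketch on toy data makes this easier to verify by eye. The two square "counties", the four fire locations, and the helper `square()` are all hypothetical.

```r
library(sf)

# Helper (hypothetical) that builds a closed square polygon
square <- function(xmin, ymin, size = 1) {
  st_polygon(list(rbind(
    c(xmin, ymin), c(xmin + size, ymin),
    c(xmin + size, ymin + size), c(xmin, ymin + size),
    c(xmin, ymin)
  )))
}

# Two toy counties side by side
toy_counties <- st_sf(
  NAME = c('A', 'B'),
  geometry = st_sfc(square(0, 0), square(1, 0), crs = 4326)
)

# Four toy fires: two placed in A, one in B, one outside both squares
toy_fires <- st_as_sf(
  data.frame(X = c(0.2, 0.7, 1.5, 5), Y = c(0.5, 0.3, 0.5, 5)),
  coords = c('X', 'Y'), crs = 4326
)

# One list element per polygon, holding the indices of intersecting points
hits <- st_intersects(toy_counties, toy_fires)
toy_counties$wildfire_count <- lengths(hits)

print(st_drop_geometry(toy_counties))
```

Points that fall outside every polygon are simply not counted, which is why a rough bounding-box filter beforehand does not distort the per-county totals.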
For the respiratory death data, we first select the California rows and then join them to the county data by FIPS code. As the attribute tables above show, the FIPS codes in the county data are formatted differently from those in the death rate data (separate zero-padded character fields versus a single integer), so some processing is needed first.
# trim the data: choose data from CA and get only useful attributes
resp18_CA = subset(resp18_all, substring(fips,1,1)=='6' & race_name=='Total' & age_group_id==22)
# build an integer FIPS code that matches the format used in the death rate data
counties_CA$FIP = as.integer(paste0(substring(counties_CA$STATEFP,2),counties_CA$COUNTYFP))
# do join
counties_result = left_join(counties_CA,resp18_CA,by=c('FIP'='fips'))
# scale the death rate to deaths per 10,000 people (1e4) so the values are readable on the map
counties_result$vale4 = counties_result$val*1e4
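The key-matching and scaling steps are easy to get subtly wrong, so it can help to check them on a small example. The sketch below uses base R's `merge()` as a stand-in for `left_join()`, with made-up rates and hypothetical names (`county_keys`, `rates`, `val_per_10k`). It also makes explicit that `1e4` is 10,000 while `10e4` is 100,000.

```r
# Made-up county identifiers in the two formats described above
county_keys <- data.frame(STATEFP = c('06', '06'),
                          COUNTYFP = c('001', '091'),
                          stringsAsFactors = FALSE)
rates <- data.frame(fips = c(6001L, 6091L),
                    val = c(4.0e-4, 5.5e-4))  # made-up death rates

# as.integer() drops the leading zero, so '06' + '091' becomes 6091
county_keys$FIP <- as.integer(paste0(county_keys$STATEFP, county_keys$COUNTYFP))

# Base-R equivalent of a left join on the integer key
joined <- merge(county_keys, rates, by.x = 'FIP', by.y = 'fips', all.x = TRUE)

# An NA rate would mean a county found no match in the rate table
stopifnot(!anyNA(joined$val))

# Express the rate per 10,000 people; note that 10e4 would be 100,000
joined$val_per_10k <- joined$val * 1e4

print(joined)
```

Checking for unmatched rows right after the join is a cheap way to catch key-format mismatches before they show up as blank counties on the map.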
Now we have the data we want, and we can plot it using the mapsf package.
# set projection to 3309 to get the correct scale
cal2plot=st_transform(counties_result,crs=3309)
# set the position of the title
mf_theme('default',pos='center')
# plot death rate
mf_map(x=cal2plot,type='choro',var='vale4', # set map type to choropleth map
breaks='jenks',nbreaks=5, # set classification strategy
leg_title='Respiratory Death per 10,000 people', leg_title_cex=.7, leg_val_cex=.6, leg_val_rnd=0, leg_pos='topright') # legend
# plot wildfire counts
mf_map(x=cal2plot,type='prop',var='wildfire_count', # set map type
inches=0.22, # set icon size
leg_title='Wildfire Count', leg_title_cex=.7, leg_val_cex=.6, leg_pos='right') # legend
# set layout
mf_layout(title='Wildfires and Respiratory Deaths in California', # title
credits=paste0('Sources: NIFC and IHME \nCartographer: Xiuyu Cao\nDate: Oct 21, 2023\n','mapsf ',packageVersion('mapsf')), # credits
frame=FALSE) # no frame
As the map shows, wildfire occurrences and respiratory deaths are generally correlated: counties with more frequent wildfires tend to have more deaths from respiratory diseases. Therefore, urgent action is required, whether in the form of wildfire mitigation efforts on the West Coast or an enhancement of public healthcare measures, to effectively combat respiratory diseases and deaths.
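A visual impression of correlation can be backed up with a rank correlation coefficient, which is less sensitive to a single extreme county than Pearson's r. The sketch below computes Spearman's rho on made-up numbers for five hypothetical counties. These are not the study's results, and `toy` and `rho` are illustrative names.

```r
# Made-up values for five hypothetical counties (NOT the study's results)
toy <- data.frame(
  wildfire_count = c(5, 20, 60, 150, 400),
  death_rate     = c(3.1, 4.0, 3.4, 4.2, 5.5)
)

# Spearman's rho compares the rank order of the two variables
rho <- cor(toy$wildfire_count, toy$death_rate, method = 'spearman')

print(rho)
```

On the real data, the same call could be applied to the `wildfire_count` and `val` columns of `counties_result`, and `cor.test()` with `method='spearman'` would additionally report a p-value.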
Although wildfire frequency and respiratory death rate are positively correlated in most California counties, Los Angeles combines an abnormally high wildfire frequency with a low respiratory death rate. As the following chart shows, the wildfire count in Los Angeles is far higher than anywhere else.
# plot wildfire frequency bar chart in CA
counties_result %>%
ggplot(aes(x=NAME,y=wildfire_count))+ # set X and Y values
geom_bar(stat='identity',fill='red')+ # set stat to show original count and fill color
labs(title='Wildfire Frequencies in Different Counties in CA, 2016-2018', # set title
x='County',y='Wildfire Frequency')+ # set X Y axes labels
theme(axis.text.x = element_text(angle=90)) # set the rotation of X text to 90 degrees
The combination of an abnormally high wildfire frequency and a low respiratory death rate in Los Angeles may have several explanations.
For example, our method of assessing wildfire severity relies on counts. Nonetheless, other factors should also be taken into account, such as duration, size, etc. As we can see from the wildfire perimeter data from NIFC, the sizes of wildfires in LA are much smaller than those in other regions.
Also, Los Angeles may simply be more advanced in wildfire detection and recording, so that it logs more wildfire events than other places.
For future studies, we can improve our results by using more rigorous variables to assess wildfire severity in a region. Possibilities include the duration and size of each fire, as noted above.
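As one illustration of a severity measure other than a raw count, the sketch below sums burned area per county next to the event count. The records are made up, and the column names `county` and `acres` are hypothetical; the actual field names in the NIFC perimeter data may differ.

```r
# Made-up fire records with hypothetical columns: county name and burned acres
toy_fires <- data.frame(
  county = c('A', 'A', 'A', 'B'),
  acres  = c(2, 5, 3, 1200)
)

# Number of events per county
fire_count <- table(toy_fires$county)

# Total burned area per county
area_burned <- tapply(toy_fires$acres, toy_fires$county, sum)

# Put both measures side by side
summary_tbl <- data.frame(
  county       = names(area_burned),
  fire_count   = as.integer(fire_count[names(area_burned)]),
  acres_burned = as.numeric(area_burned)
)

print(summary_tbl)
```

In this toy case county A has more events but far less burned area than county B, which is the kind of discrepancy a count-only measure cannot reveal.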
The data used in this project is available here, in the folder lab1/.