What the futureheatwaves does

Often, it is of interest in climate impact research to explore impact estimates for climate projections from many different climate models, some of which have multiple ensemble members. The scope of this data means that researchers often must spend a lot of time writing code scripts to process the data from different climate models or ensemble members. This is the case, for example, in studying the impacts of heat waves under climate change, since multi-day heat wave events must be identified and characterized for each climate model and ensemble member before impacts can be estimated. We have created this package to automate this process in a way that still allows extensive customization by the user in choices like how to define a heat wave.

The futureheatwaves package can process climate files, which are stored locally on your computer, to generate a list of the heat waves in each projection as well as characteristics of each heat wave (e.g., length, intensity, timing in the year). You can identify heat waves based either on a default definition (two or more days with temperatures at or above the community’s \(98^{th}\) percentile of year-round temperature), or you can write your own R function to use any heat wave definition of your choice, or to explore several heat wave definitions. The package identifies community-specific heat waves, based on an input file listing each community and its latitude and longitude. Data frames with the identified and characterized heat waves for each ensemble member of each climate model are output as files in a directory specified by the user.

Another function in the package allows you to apply custom functions across all the heat wave data frames that are generated. This functionality can be used to generate summary statistics (e.g., determine average heat wave length or total heat wave days) or can be used to apply more complex functions (e.g., apply epidemiological effect estimates across the heat waves to generate health impact estimates).

This package does require some preprocessing, in terms of getting climate projection data into a specific format and storing files in a specific directory structure. However, once the user has completed this preprocessing and directory set-up, the package allows the user to quickly process through data from many projections, generate heat wave data sets for each, and apply custom functions across each heat wave definition.

This package requires you to have data on community locations and climate projections set up in a specific way on your local computer. You then use arguments in the main package function, gen_hw_set, to direct the function to these files so it can process them and output heat wave projections. The final projection files will be written out to your local computer, in a directory that you select and specify through gen_hw_set. You can then apply user-created R functions to all heat waves in the output using the apply_all_models function. Other functions in the package either serve as helper functions for these two main functions or provide further features (e.g., plotting).

A basic example of using the package

Example data

We have included data files in the package to serve as example files, so users can try this package before applying it to their own directory of climate projection files. Some example files are included as comma-separated (.csv) files, rather than as saved R objects, because the gen_hw_set function requires a directory of comma-separated files as input and csv is a general file format familiar to most users. This example data therefore also allows users to see how to create the appropriate directory set-up for this package.

This example data comes from two climate models that are a part of CMIP5 (Taylor, Stouffer, and Meehl 2012): (1) the model of the Beijing Climate Center, China Meteorological Administration (BCC) (Xin, Wu, and Zhang 2013) and (2) the National Center for Atmospheric Research’s (NCAR’s) Community Climate System Model, version 4 (CCSM4) (Gent et al. 2011). We include one ensemble member from BCC (r1i1p1) and two from CCSM (r1i1p1 and r2i1p1). To ensure that the size of this example data is reasonably small, we have only included projection data for grid points from these climate models that are near five U.S. east coast cities: New York, NY; Philadelphia, PA; Newark, NJ; Baltimore, MD, and Providence, RI. Further, to keep the file sizes reasonably small, the historical projections range over the years 1990 to 1999, while the future projections are limited to 2060 to 2079. Users’ applications of this package will likely use directories with many more climate model ensemble members and more locations; however, the operation of the package is the same for this smaller example application as it would be for a much larger application.

Once thefutureheatwaves package is installed and loaded, the user can find the local location of these files using R’s system.file function:

system.file("extdata/cities.csv", package = "futureheatwaves")
## [1] "/Users/brookeanderson/Documents/CSU2016/cmip5heatwaves/futureheatwaves/inst/extdata/cities.csv"

In the later sections of this vignette, we show how to use these example files as inputs in the package functions.

Processing climate projections with the example data

To process the example climate projections directory and save the output files to a directory called example_results in the current working directory, the user can run:

# Identify location of example files
projection_dir_location <- system.file("extdata/cmip5",
                                       package = "futureheatwaves")
city_file_location <- system.file("extdata/cities.csv",
                                  package = "futureheatwaves")

# Process example files
gen_hw_set(out = "example_results",
           dataFolder = projection_dir_location ,
           dataDirectories = list("historical" = c(1990, 1999),
                                        "rcp85" = c(2060, 2079)),
           citycsv = city_file_location,
           coordinateFilenames = "latitude_longitude_NorthAmerica_12mo.csv",
           tasFilenames = "tas_NorthAmerica_12mo.csv",
           timeFilenames = "time_NorthAmerica_12mo.csv")

This code first identifies and saves as objects the path names on the user’s computer of the example climate projections directory (projection_dir_location) and the file of study locations (city_file_location). The required set-up for these inputs are described fully later in the vignette.

The gen_hw_set function processes this example input and creates a new directory, example_results, with files of identified and characterized heat waves in the user’s current working directory. In this example code, this processing is done using default values for the heat wave definition, years for which to generate the heat wave data sets, etc. Ways to customize these choices are fully explained later in the vignette.

In this call, the user must specify the directory where the results should be written (out), the location of the directory of climate projections (dataFolder), the names of the two main subdirectories of that climate projection directory, as well as their year boundaries (dataDirectories; this set-up is explained below), the file path of the city location file (citycsv) on the local computer, and the names used for the grid coordinate, climate projection, and projection date files (coordinateFilenames, tasFilenames, and timeFilenames; the set-up for these files is also explained fully later in the vignette). When gen_hw_set is run, the user is advised that the function will write files to his computer and must agree to proceed:

 Warning: This function will write new files to your computer in the 
 ~/tmp/results/ directory of your computer. If that directory already exists,
running this function will write over it. 
 Do you want to continue? (y / n): 

This warning reminds you that this function will create subdirectories and write out files to the directory you specified when defining out. If you agree that it is okay to write files and subdirectories to this directory, enter “y” at the prompt, and the function will continue running. (If you do not want to get this warning when running the function– for example, when looping and calling this function repeatedly– choose the option printWarning = FALSE when calling gen_hw_set.)

You should get output that looks something like this:

Processing thresholds for bcc1 
Reading ---> r1i1p1 
Read operation complete 
Processing projections for  bcc1 
Reading ---> r1i1p1 
Read operation complete 
Creating heatwave dataframe ~~ City:  balt  ~~ City Number:  1  ~~ Cutoff:  84.25436 
Creating heatwave dataframe ~~ City:  ny  ~~ City Number:  2  ~~ Cutoff:  78.11672 
Creating heatwave dataframe ~~ City:  nwk  ~~ City Number:  3  ~~ Cutoff:  78.11672 
Creating heatwave dataframe ~~ City:  phil  ~~ City Number:  4  ~~ Cutoff:  84.25436 
Creating heatwave dataframe ~~ City:  prov  ~~ City Number:  5  ~~ Cutoff:  74.498 
Writing bcc1: r1i1p1
Processing thresholds for ccsm 
Reading ---> r1i1p1 
Read operation complete 
Processing projections for  ccsm 
Reading ---> r1i1p1 
Read operation complete 
Creating heatwave dataframe ~~ City:  balt  ~~ City Number:  1  ~~ Cutoff:  82.92344 
Creating heatwave dataframe ~~ City:  ny  ~~ City Number:  2  ~~ Cutoff:  81.212 
Creating heatwave dataframe ~~ City:  nwk  ~~ City Number:  3  ~~ Cutoff:  81.212 
Creating heatwave dataframe ~~ City:  phil  ~~ City Number:  4  ~~ Cutoff:  81.44672 
Creating heatwave dataframe ~~ City:  prov  ~~ City Number:  5  ~~ Cutoff:  80.88872 
Writing ccsm: r1i1p1
Processing projections for  ccsm 
Reading ---> r2i1p1 
Read operation complete 
Creating heatwave dataframe ~~ City:  balt  ~~ City Number:  1  ~~ Cutoff:  82.92344 
Creating heatwave dataframe ~~ City:  ny  ~~ City Number:  2  ~~ Cutoff:  81.212 
Creating heatwave dataframe ~~ City:  nwk  ~~ City Number:  3  ~~ Cutoff:  81.212 
Creating heatwave dataframe ~~ City:  phil  ~~ City Number:  4  ~~ Cutoff:  81.44672 
Creating heatwave dataframe ~~ City:  prov  ~~ City Number:  5  ~~ Cutoff:  80.88872 
Writing ccsm: r2i2p2
Writing accumulators 
All operations completed. Exiting. 

The gen_hw_set function provides status reports on its progress in generating the heat wave projections. This helps you see that the function call is progressing, as this process can take a while if you include many cities and / or many climate model projections.

Once the function has completed running, you will have one new subdirectory in the out directory called “Heatwaves”. This directory will contain data frames with all identified and characterized heat waves for each ensemble member of each climate model in the input directory (dataFolder). The out directory will also have a file called hwModelInfo.csv that contains some basic information about the climate models and the number of ensemble members included for each as well as locationList.csv, a file that gives the closest climate model grid point to each community for each climate model.

The full output directory structure will look something like this:

Each of the heat wave files will have one heat wave per row, covering all the heat waves identified for that particular ensemble member in each study location. The data frame has the following columns with characteristics of each heat wave:

An example of one of these heat wave data frames is given in the hw_datafr data set included with the package. You can load and explore this example data frame:

data(hw_datafr)
hw_datafr[1:3, c("hw.number", "mean.temp", "length", "start.date",
                 "mean.temp.quantile", "city")]
##   hw.number mean.temp length start.date mean.temp.quantile city
## 1         1  97.21100      6 2069-07-17          0.9995019 balt
## 2         2  98.97543      7 2070-07-24          1.0000000 balt
## 3         3  98.95700      6 2071-07-14          1.0000000 balt

Required files and directory set-up

To apply gen_hw_set to your own data, rather than the example data, you will need to set up your files and directory using a certain structure, to allow the function to process it correctly.

File with community locations

First, you must input to gen_hw_set a comma-separated (.csv) file with unique identifiers for each community or location for which you wish to make projections, as well as the latitude and longitude of each. The gen_hw_set function will identify the closest grid point in each climate model to each of the locations included in this file using Euclidean distance and will generate location-specific heat waves using this grid point for the location.

The community location file should have three columns, with a row of column headers with the following column names:

  • city: A character vector with a unique identifier for each community you will be analyzing;
  • lat: A numeric vector giving each communities’ latitude; and
  • lon: A numeric vector giving each communities’ longitude.

The latitude and longitude should be in decimal degrees and should be expressed in the same way in the climate projection files and this community location file (e.g., if longitude for New York City is expressed as 286 in the climate projection files, it should not be expressed as -74 in the community location file). Further, if you are doing a US-based study and would like to later use this package’s map_grid function to map climate model grid points for each location, you should express longitude in only positive numbers (e.g., 286 longitude for New York City).

Here is an example of the correct format for the community location comma-separated file for five US communities (Baltimore, MD; New York, NY; Newark, NJ; Philadelphia, PA; and Providence RI):

"city","lat","lon"
"balt",39.3008,283.3894
"ny",40.6698,286.0562
"nwk",40.7241,285.8268
"phil",40.006817,284.8653
"prov",41.82195,288.5803

You must specify the file path of this community location file using the argument citycsv when running the main function (gen_hw_set). For example, if you had this file saved as “my_cities.csv” in your current working directory, you would specify citycsv = "my_cities.csv" in the gen_hw_set function call. You can use either absolute or relative file paths in this argument.

If your community location file has different column names for the latitude and longitude columns, you can specify this using the lat_lon_colnames argument in gen_hw_set. For example, if these columns were named “latitude” and “longitude”, you could specify lat_lon_colnames = c("latitude", "longitude") when calling gen_hw_set.

Directory structure for climate projection files

For these functions to work correctly, you must have climate projection files saved in a specific structure locally on your computer. An example of a directory of climate projection files with the required set-up is given as the cmip5 directory that comes installed with this package. To identify the location on your own computer of this directory, you can run:

system.file("extdata/cmip5", package = "futureheatwaves")
## [1] "/Users/brookeanderson/Documents/CSU2016/cmip5heatwaves/futureheatwaves/inst/extdata/cmip5"

This package was set up to work with the CMIP5 climate model output format; these projection files are split, with projections up to 2004 in a “historical” file and projections from 2006 in a different file (e.g., future forcing scenario “rcp85”), depending on the anthropogenic forcing scenario being considered. This package is therefore set up to input a directory of projection files where all projected time series for one time range is in one subdirectory and all later time series are in a second subdirectory. For the gen_hw_set function to work correctly, therefore, you must have a directory that has within it two separate subdirectories, where one subdirectory includes time series that cover one range of dates (e.g., 1980–2004) and the other subdirectory includes climate projections for the same climate model ensemble members, but for a different range of dates (e.g., 2006–2099).

You must specify in the gen_hw_set function the names of the two subdirectories as well as the date range that each subdirectory covers. For example, for a set-up with climate projection files for 1980–2004 in a subdirectory called “historical” and files for 2006–2099 in a subdirectory called “rcp85”, you would include in gen_hw_set the argument: dataDirectories = list("historical" = c(1980, 2004), "rcp85" = c(2006, 2099)). The function uses this information to figure out which directory to search in for climate projection data, based on the user’s choices of custom date ranges for certain arguments (e.g., dates for projections).

Within each ensemble member directory, there should be three comma-separated files: one with the climate model temperature output, one with time values corresponding to each row in the climate model output file, and one with latitude and longitude for grid locations corresponding to the columns of the climate model temperature output file. These three files must have the same file names within each ensemble member’s directory (e.g., if the grid location file for one ensemble member is called grid_locs.csv, then this should be the name of the grid location file for every ensemble member of every model). The package assumes that temperatures for the projection file are in degrees Kelvin unless otherwise specified by the user during processing. Other units can be specified with the input_metric argument in gen_hw_set.

The following list shows the file structure for the files in the example data included with the package. This example directory includes separate historical (with years 1990–1999) and RCP8.5 (with years 2060–2079) directories, each of which has the required climate projection, grid point locations, and projection time files for two separate models (bcc1 and ccsm) with one and two separate ensemble members, respectively, for the “rcp85” subdirectory (r1i1p1 for bcc1; r1i1p1 and r2i1p1 for ccsm).

  • cmip5
    • historical
      • bcc1
        • r1i1p1
          • tas_NorthAmerica_12mo.csv
          • latitude_longitude_NorthAmerica_12mo.csv
          • time_NorthAmerica_12mo.csv
      • ccsm
        • r1i1p1
          • tas_NorthAmerica_12mo.csv
          • latitude_longitude_NorthAmerica_12mo.csv
          • time_NorthAmerica_12mo.csv
    • rcp85
      • bcc1
        • r1i1p1
          • tas_NorthAmerica_12mo.csv
          • latitude_longitude_NorthAmerica_12mo.csv
          • time_NorthAmerica_12mo.csv
      • ccsm
        • r1i1p1
          • tas_NorthAmerica_12mo.csv
          • latitude_longitude_NorthAmerica_12mo.csv
          • time_NorthAmerica_12mo.csv
        • r2i1p1
          • tas_NorthAmerica_12mo.csv
          • latitude_longitude_NorthAmerica_12mo.csv
          • time_NorthAmerica_12mo.csv

For each climate model, the threshold temperatures to be used in heat wave definitions are always calculated from a single ensemble member. This ensemble member can be specified by the user through the threshold_ensemble argument of the gen_hw_set function; the default is “r1i1p1”, which is a reasonable choice for processing CMIP5 files. All climate models you wish to process should include this ensemble member.

The user specifies the file path of this directory of climate projections through the dataFolder option in gen_hw_set. Notice that each subdirectory in this example has all required levels of the directory structure– for example, even though the bcc1 model only has one ensemble member (in this example), we’ve still included a directory level for ensemble members in the directory structure. Also notice that the final subdirectory always includes three files (with climate projections, grid point locations, and times; these files are explained in the next section), even if multiple ensemble members of the same climate model share the same grid point locations or times. If you only want to process a subset of the climate models in the directory, you can specify this subset using the models_to_run argument in gen_hw_set.

While the package can also be used if the complete projected time series is in a single file for each ensemble member, this requires a bit of extra set-up, as the function does require you to set up your climate projection directory to have two subdirectories, whether you need two or not.

To work around this requirement if you have all climate projection files in a single subdirectory, you could copy the single subdirectory to have two copies within the dataFolder directory and use the second copy of the subdirectory as a “dummy” subdirectory. You will need to give the “real” and “dummy” subdirectories different names, and use a date range outside of the real date range for the “dummy” directory in the dataDirectories argument in gen_hw_set.

For example, if you had all climate projections in a subdirectory called “current”, which covered the years 1980–2010, you could copy this subdirectory to create a second “dummy” subdirectory in the dataFolder directory and then include the argument dataDirectories = list("current" = c(1980, 2010), "dummy" = c(2011, 2012)) when calling gen_hw_set. Since only data from the period 1980–2010 will be used in this example, the function will never try to access the “dummy” directory, but it is still required to be part of the dataFolder directory for some of the helper functions that parse directory structure and set-up processing code for gen_hw_set. You are welcome to email the maintainer of this package for more advice on getting the gen_hw_set function to work in this special case if you have problems. If we hear that many users have applications of this package like this, we may increase package functionality in the future to make this special case more straightforward to process.

Files with climate projections

For each ensemble member of each climate model, you must have three files: one giving the gridded climate model output by date, one giving the location of each of the climate model grids, and one giving the date of each of the climate model projections.

These files must be in a certain structure to run correctly through the gen_hw_set function. Therefore, the function currently requires a certain amount of preprocessing of climate projection files from a format like .netCDF to prepare them to be processed by this function. You must preprocess the file for the projection from each ensemble member to conform to the formats listed below.

Climate projection file: The climate projection file should be a comma-separated file of temperatures that looks something like this:

267.6,281.17,285.69
269.33,280.48,285.28
269.29,280.12,285.05
271.35,280.19,284.97
272.97,281.57,285.22

This is a projection file covering three grid points of a climate model, giving projections for five dates. Each column corresponds to one grid point in the climate model. Each row corresponds to one date. The file does not have a header row; rather, the observations begin immediately on the first row of the file. Similarly, the file does not have a column of row numbers.

Most projection files will include many more columns and rows than this small example, since they will usually cover all the grid points in a climate model and a long time series of observations. These projection files must not have gaps in dates (e.g., they should be year-round and not limited to the warm season) and cannot have any missing observations. Otherwise, the processing done by gen_hw_set might use non-consecutive days to test for a heat wave. There should be one and only one observation per day.

The gen_hw_set function can only process a single climate projection file for each ensemble member. Therefore, if you wish to use a composite metric, like the heat index, which combines measures of air temperature and air moisture, you must calculate this metric when setting up your climate projection files, as gen_hw_set could not input separate climate projection files for air temperature and air moisture for each ensemble member.

In our example data, the climate projection files are saved for all ensemble members with the file name “tas_NorthAmerica_12mo.csv”. You will need to specify this file name in the tasFilenames argument when running gen_hw_set, and the file name must be the same in all ensemble member subdirectories in the dataFolder directory.

Grid point location file: The grid point location file should be a comma-separated file that looks something like this:

40.464,284.06
40.464,286.88
40.464,289.69

For this file, each row gives the location for the grid point for the corresponding column in the climate projection files. For example, the first row gives the latitude (first column) and longitude (second column), in decimal degrees, for the projections given in the first column of the climate projection file, the second row of this file gives the latitude and longitude for the projection given in the second column of the climate projection file, and so on. The file should not have a header row or a separate column of row names. If you are doing a US-based study and plan to use the map_grid function to map climate grid points associated with each study location, you must use non-negative values for longitudes (i.e., with a range of 0 to 360 rather than -180 to 180) in this file.

In our example data, these grid-point location files are saved for all climate model ensemble members with the file name “latitude_longitude_NorthAmerica_12mo.csv”. You will need to specify this filename in the coordinateFilenames argument when running gen_hw_set.

Projection times data file: The projection times data file should be a comma-separated file that looks something like this:

1,1990,1,1
2,1990,1,2
3,1990,1,3
4,1990,1,4
5,1990,1,5

This file gives a date that corresponds to each row of the climate projection file. The file should have four columns: one with row numbers, and then one each with year, month, and day. Year should be given with four digits. The file should not have a header row. In our example data, these grid-point location files are always saved with the file name “time_NorthAmerica_12mo.csv”. You will need to specify this file name in the timeFilenames argument when running gen_hw_set.

Applying functions across all heat wave datasets

Once you have created a directory of files with characterized heat waves for each ensemble member, the results can be explored using the apply_all_models function. This function allows the user to apply custom R functions across all heat wave data frames created by the gen_hw_sets call. The user can apply any R function that follows certain standards in accepting input and returning output.

As an example, if the user wanted to to get the average temperature of the heat waves identified within each ensemble member, he or she could write a simple function:

average_mean_temp <- function(hw_datafr){
        out <- mean(hw_datafr$mean.temp)
        return(out)
        }

The apply_all_models function can then apply this average_mean_temp function across the heat wave data frames for all ensemble members in all climate models:

out <- system.file("extdata/example_results", package = "futureheatwaves")
apply_all_models(out = out, FUN = average_mean_temp)
##   model ensemble    value
## 1  bcc1        1 84.60418
## 2  ccsm        1 84.73236
## 3  ccsm        2 84.54713

Note that the location of the directory with the heat wave data frames must be specified using the out argument when calling apply_all_models. Typically, this will be the directory path for the directory specified with the out argument in gen_hw_set.

Location-specific results can be generated using the city_specific argument in apply_all_models:

apply_all_models(out = out, FUN = average_mean_temp, city_specific = TRUE)
##    model ensemble city    value
## 1   bcc1        1 balt 89.65682
## 2   bcc1        1  nwk 80.93607
## 3   bcc1        1   ny 80.93607
## 4   bcc1        1 phil 89.65682
## 5   bcc1        1 prov 76.80232
## 6   ccsm        1 balt 85.95983
## 7   ccsm        1  nwk 84.65961
## 8   ccsm        1   ny 84.65961
## 9   ccsm        1 phil 84.54654
## 10  ccsm        1 prov 83.73112
## 11  ccsm        2 balt 85.61720
## 12  ccsm        2  nwk 84.42461
## 13  ccsm        2   ny 84.42461
## 14  ccsm        2 phil 84.44436
## 15  ccsm        2 prov 83.84360

This output is structured as “tidy” data (Wickham 2014), allowing it to be used easily with the graphing package ggplot2 (Wickham 2009).

The apply_all_models function can also be used to project the health impacts of heat waves. As a very simplistic example, (Anderson and Bell 2009) estimated that heat waves, defined as two or more days at or above a community’s \(98^{th}\) percentile temperature, were associated with an added relative risk of 1.032 for cardiorespiratory mortality risk in 107 U.S. communities. A simple estimate of excess deaths associated with this added heat wave risk in a community can be calculated as (Peng et al. 2011):

\begin{equation} E_c = N_c * (RR - 1) * L_c \end{equation}

where:

This impact assessment equation can be translated into a function that merges each projection’s heat wave data frame with a data frame of community-specific baseline mortality rates (\(B_c\)), calculates the equation for each heat wave, and then sums up the total excess deaths across all heat waves:

excess_deaths <- function(hw_datafr, base_mortality, RR = 1.032){
        hw_datafr <- dplyr::left_join(hw_datafr, base_mortality,
                                      by = "city") %>%
                dplyr::mutate(excess_deaths = base_mort * length * RR)
        out <- sum(hw_datafr$excess_deaths)
        return(out)
}

Once defined in R, this function can be applied across all heat waves from all climate models’ ensemble members, provided that you have a data frame called base_mortality with columns with each community’s identifier (city) and baseline mortality rate (base_mort), using the call:

apply_all_models(out = out, FUN = excess_deaths, base_mortality = base_mort)

To work, the functions you apply using apply_all_models must follow a certain structure. They must input a data frame of heat waves, in the format of those output by gen_hw_set, and they must output a single-value vector (i.e., a vector of length one). They must include hw_datafr as an input. They can also include other arguments, which are passed through apply_all_models using ..., as long as none conflict with the existing argument names for apply_all_models (out, FUN, and city_specific).

We have included several functions as simple examples of the type of functions that can be used with apply_all_models:

You can see the code of any of these functions, to use as examples when developing your own, by typing just the function name (with no parentheses) in your R console.

average_mean_temp
## function(hw_datafr){
##         out <- mean(hw_datafr$mean.temp)
##         return(out)
##         }

To help users create their own functions, we have included an example data frame representative of the heat wave data frames that gen_hw_set outputs. This data can be loaded using the call:

data(hw_datafr)

The process for creating a new function to use to explore heat waves should be to:

  1. Load the example data frame with load(hw_datafr);
  2. Build a function that works with hw_datafr as an input; and
  3. Apply the function with apply_all_models to process all of the heat wave files.

Using custom options

Using a custom heat wave definition

The default heat wave definition for this package is:

A is two or more days at or above a city-specific threshold temperature, with the threshold determined as the \(98^{th}\) percentile of year-round temperature in the city during some reference period (by default, 1990–1999).

Many different definitions of heat waves exist (e.g., (Smith, Zaitchik, and Gohlke 2013)), so researchers will often want to use alternative ways to define heat waves. Researchers might want to use a specific definition, for example, because it matches the definition used by local health officials to declare heat wave warnings or, in the case of health impact assessments, to match with a definition used in an epidemiological study. Three components of the heat wave definition can be easily customized in the gen_hw_set function call, without creating a new R function to use to identify heat waves. The customization of the heat wave definition is even more extensive as one has the option of writing a custom R function.

First, the percentile used to identify a heat wave can be changed using the probThreshold option in gen_hw_set. This option can take values between 0 and 1. The default value is 0.98, or a definition with a threshold of the \(98^{th}\) percentile of the location’s temperature. For example, to identify heat waves as two or more days at or above the city’s 99th percentile of year-round temperature, the user could run:

gen_hw_set(out = "example_results",
           dataFolder = projection_dir_location ,
           dataDirectories = list("historical" = c(1990, 1999),
                                        "rcp85" = c(2060, 2079)),
           citycsv = city_file_location,
           coordinateFilenames = "latitude_longitude_NorthAmerica_12mo.csv",
           tasFilenames = "tas_NorthAmerica_12mo.csv",
           timeFilenames = "time_NorthAmerica_12mo.csv",
           probThreshold = 0.99)

Second, the user can change the number of days used in the heat wave definition using the numDays argument in the gen_hw_set function.

Third, it is possible to specify the range of years that should be used when determining this threshold. For example, if you wanted to base the threshold for each location on its current climate, you could leave this option as its default, while if you wanted to use a different set of years (i.e. a threshold relative to future projected temperatures), you could set the start and end year bounds for this reference period using the thresholdBoundaries argument in the function gen_hw_set. For example, to use temperatures in each city from 2070 to 2079 to determine threshold temperatures, you would run:

gen_hw_set(out = "example_results",
           dataFolder = projection_dir_location ,
           dataDirectories = list("historical" = c(1990, 1999),
                                        "rcp85" = c(2060, 2079)),
           citycsv = city_file_location,
           coordinateFilenames = "latitude_longitude_NorthAmerica_12mo.csv",
           tasFilenames = "tas_NorthAmerica_12mo.csv",
           timeFilenames = "time_NorthAmerica_12mo.csv",
           thresholdBoundaries = c(2070, 2079))

It is also possible to use a completely customized heat wave definition. To do this, you will need to write and load a function that implements your chosen heat wave definition. You can then reference the custom function using the IDheatwavesFunction option in gen_hw_set. To work correctly, this custom function must allow only specific inputs and generate only specific outputs.

The function must allow the following inputs (even if it does not use them within the function code):

The function must return only the input data frame (datafr), with the following columns added:

To help in developing your own heat wave identification functions, the package includes an example of the required input data frame, datafr. You can load this data frame by calling:

data(datafr)

If your heat wave function is properly set up to use in this package, it can take as input this example data frame and a threshold value and will return the original data frame with the columns hw and hw.number, as defined above, returned:

head(datafr, 3)
##             date   tmpd
## 20076 2061-01-01 30.452
## 20077 2061-01-02 30.668
## 20078 2061-01-03 35.294
id_of_hws <- IDHeatwavesR(datafr = datafr, threshold = 95, numDays = 2)
head(id_of_hws, 3)
##             date   tmpd hw hw.number
## 20076 2061-01-01 30.452  0         0
## 20077 2061-01-02 30.668  0         0
## 20078 2061-01-03 35.294  0         0

The custom function can then be passed to gen_hw_set using the IDheatwavesFunction argument. You should specify the function name in quotation marks for this argument. For example, we have included an alternative ID definition function in this package, called IDHeatwavesAlternative, which identifies heat waves as a certain number of days above the higher of either a community’s \(98^{th}\) percentile temperature or \(80^oF\). To use this as the heat wave definition, you would run:

gen_hw_set(out = "example_results",
           dataFolder = projection_dir_location ,
           dataDirectories = list("historical" = c(1990, 1999),
                                       "rcp85" = c(2060, 2079)),
           citycsv = city_file_location,
           coordinateFilenames = "latitude_longitude_NorthAmerica_12mo.csv",
           tasFilenames = "tas_NorthAmerica_12mo.csv",
           timeFilenames = "time_NorthAmerica_12mo.csv",
           IDheatwavesFunction = "IDHeatwavesAlternative")

Customizing projection dates and dates for reference temperatures

By default, the function will generate heat wave projections for the years 2070 to 2079, to align with the example data. This projection date range can be changed by specifying alternative starting and ending years in the projectionBoundaries argument of the gen_hw_set function. For example, to create projections for 2060 to 2079, the user would call:

gen_hw_set(out = "example_results",
           dataFolder = projection_dir_location ,
           dataDirectories = list("historical" = c(1990, 1999),
                                        "rcp85" = c(2060, 2079)),
           citycsv = city_file_location,
           coordinateFilenames = "latitude_longitude_NorthAmerica_12mo.csv",
           tasFilenames = "tas_NorthAmerica_12mo.csv",
           timeFilenames = "time_NorthAmerica_12mo.csv",
           projectionBoundaries = c(2060, 2079))

The heat wave data sets characterize heat wave in several ways that are based on relative temperature. These characteristics are measured by taking the absolute temperature measures of the heat wave (e.g., average temperature during the heat wave is \(90^oF\)) and comparing them to the location’s typical temperature distributions. This process generates relative measures of how intense the heat wave is compared to what is normal in that location (e.g., \(90^oF\) is in the \(99^{th}\) percentile of year-round temperatures in the location).

The default is to use temperatures for the period 2070 to 2079 for these reference temperatures. However, the user can change this specification using the referenceBoundaries option of gen_hw_set. This functionality can be useful in exploring the role of adaptation in future heat waves. For example, to use temperature projections for 1990 to 1999 when calculating relative characteristics of heat waves, to explore the assumption that cities remain adapted to their present-day climate, rather than changing in adaptation as climate change increases temperatures, a user could run:

gen_hw_set(out = "example_results",
           dataFolder = projection_dir_location ,
           dataDirectories = list("historical" = c(1990, 1999),
                                        "rcp85" = c(2060, 2079)),
           citycsv = city_file_location,
           coordinateFilenames = "latitude_longitude_NorthAmerica_12mo.csv",
           tasFilenames = "tas_NorthAmerica_12mo.csv",
           timeFilenames = "time_NorthAmerica_12mo.csv",
           referenceBoundaries = c(1990, 1999))

If the date range used for these reference temperatures does not correspond to the dates used for the projection period, the reference temperatures will be pulled only from the ensemble member for each climate model specified using the threshold_ensemble argument in gen_hw_set.

For these date range specifications, there are some restrictions on which year ranges can be selected. The starting year cannot be earlier than the first year in the first subdirectory, the ending year cannot be later than the last year of the second subdirectory, and the custom date boundaries cannot span the two subdirectories.

Mapping grid points

The package also has a function that allows you to plot the locations of grid points for each climate model that correspond with study locations, as well as show the links between climate grid points and associated locations, for US-based studies. The map_grid function allows you to specify a climate model (plot_model), as well as the location of your output directory (usually, this will be the directory specified using the out argument in gen_hw_set):

out <- system.file("extdata/example_results", package = "futureheatwaves")
map_grid(plot_model = "bcc1", out = out)

In this map, the points show the locations of grid points for that climate model that matched with study locations, and therefore were used in generating heat wave data sets. The lines on the map connect each climate model grid point to the study location(s) for which that grid point was used by gen_hw_set. This function requires that you use non-negative decimal degrees to express longitude values in both the climate projection files with grid point locations and in the community location file that are input to gen_hw_set.

Because the resulting plot is a “ggplot” object, you can use ggplot2 functions to change the formatting. For example, you could change the title and theme with the following code (results not shown):

a <- map_grid(plot_model = "bcc1", out = out)
a + ggtitle("BCC1 CMIP5 model") + theme_dark()

If you want to generate several of these maps for different climate models and then plot them together, you can use the grid.arrange function from the gridExtra package (results not shown):

library(gridExtra)
a <- map_grid(plot_model = "bcc1", out = out)
b <- map_grid(plot_model = "ccsm", out = out)
grid.arrange(a, b, ncol = 1)

Extensions

The functionality of this package can be easily expanded by loops. For example, to explore the role of the heat wave definition on projections, the user could create a loop to run gen_hw_set and apply_all_models to the same directory of climate projections but with a variety of different functions used to identify the heat waves in the projections.

Also, while this package was created to be used for research on heat waves in climate change projections, with some modifications it can be used more broadly. For example, there are other episodes like wildfires and air pollution where it may be interesting to identify extended periods of high exposures in projection time series, and this package could be applied to explore these exposures. The directory of projection data would need to be set up in the same structure as for exploring heat waves, and the input_metric should be set as fahrenheit, to pass the exposure values through to the characterized data sets without performing a conversion. A user could also use this package to explore events that must be lower than some minimum threshold (e.g., cold waves), but it would take some extra coding and conversions, since the functions in this package are written to identify periods above a threshold (for example, the user could multiple all projected temperatures by -1, and then the coldest temperatures would register as being the highest).

Users doing these kinds of extensions will need to pay attention to a few points. First, some of the event characteristics (first in the calendar year, average of May–September temperatures, days above \(90^{o}F\)) might not be meaningful for studies of other types of events. Further, because event periods are usually defined as a string of multiple days exceeding some threshold, the functions in this package may miss the first and last event of the time period. For example, if the first day of the time series is the last day of an event, this function would not identify that event because it lacks data from the earlier days that allow this day to meet the event definition. This issue would lead to, at most, missing two events out of each projection, but should be considered if studying events that might occur near the start and end of projection data. If there is adequate interest from researchers, in the future we may adapt the package to make these secondary applications part of the package.

References

Anderson, G. Brooke, and Michelle L. Bell. 2009. “Weather-Related Mortality– How Heat, Cold, and Heat Waves Affect Mortality in the United States.” Epidemiology 20: 205–13.

Gent, Peter R., Gokhan Danabasoglu, Leo J. Donner, and others. 2011. “The Community Climate System Model Version 4.” Journal of Climate 24: 4973–91. doi:10.1175/2011JCLI4083.1.

Peng, Roger D., Jennifer F. Bobb, Claudia Tebaldi, and others. 2011. “Toward a Quantitative Estimate of Future Heat Wave Mortality Under Global Climate Change.” Environmental Health Perspectives 199: 701–6. doi:10.1289/ehp.1002430.

Smith, Tiffany T, Benjamin F Zaitchik, and Julia M Gohlke. 2013. “Heat Waves in the United States: Definitions, Patterns and Trends.” Climatic Change 118: 811–25. doi:10.1007/s10584-012-0659-2.

Taylor, Karl E., Ronald J. Stouffer, and Gerald A. Meehl. 2012. “An Overview of CMIP5 and the Experiment Design.” Bulletin of the American Meteorological Society 93: 485–98. doi:10.1175/BAMS-D-11-00094.1.

Wickham, Hadley. 2009. Ggplot2– Elegant Graphics for Data Analysis. Springer-Verlag New York.

———. 2014. “Tidy Data.” Journal of Statistical Software 59: 1–23.

Xin, Xiao-Ge, Tong-Wen Wu, and Jie Zhang. 2013. “Introduction of CMIP5 Experiments Carried Out with the Climate System Models of Beijing Climate Center.” Advances in Climate Change Research 4 (1): 41–49. doi:10.3724/SP.J.1248.2013.041.