This vignette is designed to:
(1) show how gradient surface metrics are calculated globally and locally using geodiv.
(2) show the potential relationships among metrics.
Example 1, “Simple workflow with Landsat NDVI,” demonstrates the first objective by applying geodiv functions globally and locally to a small region in southwestern Oregon state, USA. Example 2, “Applying all surface metrics across Oregon, USA,” addresses the second objective. The second example applies metrics across a larger region and runs an example analysis demonstrating ways in which metrics may be correlated and grouped.
geodiv is an R package that provides methods for calculating gradient surface metrics for continuous analysis of landscape features. There are a couple of ways to download and install the geodiv R package. You can install the released version of geodiv from CRAN with:
install.packages("geodiv")
And the development version from GitHub with:
install.packages("devtools")
devtools::install_github("bioXgeo/geodiv")
Note that macOS users may need to install the development tools available at the link below to get the package to install:
https://cran.r-project.org/bin/macosx/tools/
To begin, let’s load the necessary packages for the examples that follow.
library(geodiv)
library(raster)
library(rasterVis)
library(mapdata)
library(maptools)
library(rgeos)
library(ggplot2)
library(tidyverse)
library(parallel)
library(sf)
library(ggmap)
library(corrplot)
library(gridExtra)
library(cowplot)
library(factoextra)
library(cluster)
library(rfigshare)
The National Aeronautics and Space Administration (NASA) has been collecting Earth-observing images with Landsat for decades. The Normalized Difference Vegetation Index (NDVI) is a measure of vegetation greenness that can be quantified from Landsat images using the difference between near-infrared light, which vegetation strongly reflects, and red light, which vegetation absorbs. For this first example, we generated an NDVI image over a small region in southwestern Oregon using Google Earth Engine. This image is installed with geodiv as an R data object entitled ‘orforest’. Google Earth Engine provides a number of tutorials, sample code, and functions for this step on its site: earthengine.google.com. There are also several ways to access satellite data directly from R. Given that the focus of this tutorial is the geodiv R package, we provide these data already generated in Google Earth Engine.
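For reference, NDVI is calculated as (NIR - Red) / (NIR + Red). The snippet below is a minimal sketch of that calculation using raster algebra; the file names ‘nir.tif’ and ‘red.tif’ are hypothetical placeholders for single-band near-infrared and red reflectance rasters, not files shipped with geodiv.
# Hypothetical example: compute NDVI from near-infrared and red band rasters.
nir <- raster('nir.tif')
red <- raster('red.tif')
ndvi <- (nir - red) / (nir + red)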
Let’s begin by loading the example data, which is a raster layer called ‘orforest’.
# Load the orforest data into your active session.
data(orforest)
# check out the properties of the orforest raster layer.
orforest
#> class : RasterLayer
#> dimensions : 371, 371, 137641 (nrow, ncol, ncell)
#> resolution : 0.0002694946, 0.0002694946 (x, y)
#> extent : -123, -122.9, 43.00002, 43.1 (xmin, xmax, ymin, ymax)
#> crs : +proj=longlat +datum=WGS84 +no_defs
#> source : memory
#> names : summer_ndvi_p45_r30_2000_2016_30m
#> values : 0.09595944, 0.8357643 (min, max)
First, we plot the data without any trends removed.
# Plot orforest without the trend removed.
eviCols <- colorRampPalette(c('lightyellow1', 'darkgreen'))(100)
eviTheme <- rasterVis::rasterTheme(region = eviCols)
(orig_ndvi <- rasterVis::levelplot(orforest, margin = F,
                                   par.settings = eviTheme, xlab = 'Longitude',
                                   ylab = 'Latitude', main = 'orforest original'))
Use the ‘remove_plane’ function of geodiv to remove the best-fit polynomial plane from the image. This function searches polynomials of orders 0 - 3 to determine which has the lowest error relative to the surface values. To fit a plane with a user-specified polynomial order, you may use the function ‘fitplane’ (a hedged example follows the code below).
# Plot orforest with the trend removed.
orfor_rem <- remove_plane(orforest)
#> [1] "Order of polynomial that minimizes errors: 0"
(orig_rem_plot <- rasterVis::levelplot(orfor_rem, margin = F,
                                       par.settings = eviTheme,
                                       xlab = 'Longitude', ylab = 'Latitude',
                                       main = 'orforest trend removed'))
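If you would rather specify the polynomial order yourself, ‘fitplane’ can be called directly. The sketch below assumes the order argument is named ‘order’ and that the function returns the fitted plane values; check ?fitplane for the exact interface before relying on it.
# Hedged example: fit a first-order (planar) trend explicitly rather than
# searching orders 0-3 as remove_plane() does.
fitted_plane <- fitplane(orforest, order = 1)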
You can also plot the raw satellite image (as opposed to the NDVI derived from Landsat imagery). To do this, create a Google Maps API key and copy it to the indicated location in the code below.
# Plot the actual satellite image.
coords <- coordinates(orforest)
lon <- mean(coords[, 1], na.rm = TRUE)
lat <- mean(coords[, 2], na.rm = TRUE)
api_key <- NA # ADD AN API KEY HERE
Once an API key has been added, run the following code to get the satellite image next to the original NDVI image.
# Register a Google API key.
register_google(api_key)
# Get a satellite image at the specified longitude and latitude.
img <- get_map(c(lon, lat), zoom = 13, maptype = 'satellite',
               source = "google")

# Create a figure with the satellite image using ggplot.
satimg <- ggmap(img) + coord_fixed(1.7) + xlab('Longitude') +
  ylab('Latitude') +
  theme(axis.title.x = element_text(size = 12),
        axis.title.y = element_text(size = 12),
        plot.margin = unit(c(0, 0, 0, 0), "cm"))
# Plot the satellite image next to the original NDVI image.
plot_grid(satimg, orig_ndvi, ncol = 2, nrow = 1, scale = c(0.84, 1))
The simplest use of the geodiv package is to apply metrics globally over an entire image. This returns a measurement of the overall heterogeneity of the image. The ‘sa’ function calculates the average roughness of a surface as the absolute deviation of surface heights from the mean surface height. The ‘sbi’ function calculates the surface bearing index, which is the ratio of the root mean square roughness (Sq) to height at 5% of bearing area (z05). The ‘std’ function calculates texture direction metrics (i.e., the angle of dominating texture and the texture direction of the Fourier spectrum image calculated from the orforest image).
# Calculate global metrics over the entire orforest image.
(sa <- sa(orforest)) # average roughness
#> [1] 0.04466675
(sbi <- sbi(orforest)) # surface bearing index
#> [1] 0.08557302
(std <- std(orforest, create_plot = FALSE, option = 1))
#> [1] 90
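As a quick check of the ‘sa’ definition above, the same quantity can be computed directly from the raster values. This sketch assumes ‘sa’ applies no preprocessing beyond ignoring NA cells, so small discrepancies are possible.
# Average roughness by hand: mean absolute deviation from mean surface height.
vals <- values(orforest)
mean(abs(vals - mean(vals, na.rm = TRUE)), na.rm = TRUE)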
Another common use case is to look at texture images where a spatial function has been applied locally over focal windows across a landscape. This functionality is provided through both the ‘focal_metrics’ and ‘texture_image’ functions.
The ‘texture_image’ function applies metrics in either square or circular windows over an image. This function tends to be faster than ‘focal_metrics,’ but uses more memory.
The ‘focal_metrics’ function is modified from the ‘window_lsm’ function in landscapemetrics (Hesselbarth, et al. 2019). Windows must be rectangular or square, and the ‘metrics’ argument is a list of functions. The output of this function is a list of rasters. This function is slower than ‘texture_image,’ but uses less memory.
Note that this step can be somewhat time intensive. For users who would like to examine the output but don’t want to run this step, the code to import the resulting rasters is included below.
# Texture image creation using 'focal_metrics' function.
window <- matrix(1, nrow = 7, ncol = 7)
system.time(
  output_rasters <- focal_metrics(orforest, window,
                                  metrics = list('sa', 'sbi'),
                                  progress = TRUE)
)
print(output_rasters)
# Texture image creation using 'texture_image' function.
metric_list <- c('sa', 'sbi', 'std')
system.time(output_rasters2 <- lapply(metric_list, FUN = function(m) {
  texture_image(orforest, window_type = 'square', size = 7,
                in_meters = FALSE, metric = m,
                parallel = TRUE, nclumps = 100)}))
# Download rasters from figshare. The list of data returned here will be used
# throughout the vignette.
url <- fs_download(12834896, urls_only = TRUE, mine = FALSE, session = NULL,
                   version = 4)
#> No encoding supplied: defaulting to UTF-8.
# Set tempfile for rasters
get_raster <- function(url) {
  tf <- tempfile()
  download.file(url, destfile = tf, mode = 'wb')
  outrast <- raster(tf)
  return(outrast)
}

output_rasters2 <- list()
output_rasters2[[1]] <- get_raster(url[[5]])
output_rasters2[[2]] <- get_raster(url[[6]])
output_rasters2[[3]] <- get_raster(url[[7]])
Plot the texture image rasters.
# Create list of plots.
names <- c('Sa', 'Sbi', 'Std')
rast_list <- unlist(output_rasters2)
plts <- lapply(seq(1, 3), FUN = function(i) {
  rasterVis::levelplot(rast_list[[i]], margin = F, par.settings = eviTheme,
                       ylab = NULL, xlab = NULL, main = names[i])
})
# Arrange plots in the list into a grid.
grid.arrange(grobs = plts, nrow = 2)
By assessing heterogeneity using a variety of metrics, researchers can gain a more complete picture of heterogeneity than they would with a single metric (Dahlin, 2016). However, many metrics are related and not all are informative in every situation. To demonstrate how geodiv can be used to produce and evaluate multiple metrics, in Example 2 we apply all surface metric functions to images across Oregon, USA, and examine the patterns of, and relationships among, the metrics. We calculate metrics for both elevation data from the Shuttle Radar Topography Mission (SRTM) and a commonly used measure of canopy greenness, Enhanced Vegetation Index (EVI), from NASA’s Moderate Resolution Imaging Spectroradiometer (MODIS). We then examine the correlations among metrics along a transect crossing the state, and determine how the metrics cluster using two methods, hierarchical clustering and Principal Components Analysis (PCA). This analysis demonstrates the relationships among metrics, with potential for determining how metrics group and behave with various input data.
Elevation and EVI data that are available to use along with this vignette were prepared in Google Earth Engine (Gorelick et al., 2017) and analyzed in R. Void-filled SRTM data (Farr et al., 2007) and quality-filtered, maximum growing season, MODIS EVI data (Didan, 2015) were downloaded in Fall 2019. The code chunk below downloads these data into the current R session from figshare. R may be used to access satellite data, but we provide previously-prepared data for the purpose of this analysis.
# Download data from figshare.
elev <- get_raster(url[[4]])
evi <- get_raster(url[[3]]) * 0.0001
Aggregate both datasets to ~2km resolution for comparison between datasets and to reduce computational time.
elev <- aggregate(elev, fact = 8)
evi <- aggregate(evi, fact = 8)
Begin by masking any values that are outside of the boundaries for the state of Oregon.
state <- maps::map(database = 'state', regions = 'oregon',
                   fill = TRUE, plot = FALSE)
statePoly <- map2SpatialPolygons(state, IDs = state$names,
                                 proj4string = CRS(proj4string(evi)))
evi_masked <- mask(x = evi, mask = statePoly)
elev_masked <- mask(x = elev, mask = statePoly)
Generate plots to get a sense for the spatial patterns in the data.
# plot maximum growing season EVI for Oregon
rasterVis::levelplot(evi_masked, margin = F, par.settings = eviTheme,
                     ylab = NULL, xlab = NULL,
                     main = 'Maximum Growing Season EVI')

# plot elevation (in meters) for Oregon
elevCols <- colorRampPalette(c('grey7', 'grey93'))(100)
elevTheme <- rasterVis::rasterTheme(region = elevCols)
rasterVis::levelplot(elev_masked, margin = F, par.settings = elevTheme,
                     ylab = NULL, xlab = NULL, main = 'Elevation (m)')
Remove any trends in the data with the ‘remove_plane’ function and take a look at the plots of the images with the trends removed.
evi_masked <- remove_plane(evi_masked)
#> [1] "Order of polynomial that minimizes errors: 1"
elev_masked <- remove_plane(elev_masked) # there was no trend
#> [1] "Order of polynomial that minimizes errors: 0"

# plot again to see what the new rasters look like
rasterVis::levelplot(evi_masked, margin = F, par.settings = eviTheme,
                     ylab = NULL, xlab = NULL, main = 'EVI without Trend')
rasterVis::levelplot(elev_masked, margin = F, par.settings = elevTheme,
                     ylab = NULL, xlab = NULL, main = 'Elevation without Trend')
Below we generate a texture image for the state of Oregon using the ‘sa’ metric for elevation. Note that the following step may take some time. For the subsequent analyses in this vignette, we provide output dataframe files for all metrics included in geodiv, calculated for elevation and EVI over ~30km x 30km square moving windows and scaled following calculation.
# Calculate elevation sa texture image for state of Oregon
system.time(outrast <- texture_image(elev_masked, window_type = 'square', size = 7,
                                     in_meters = FALSE, metric = 'sa', parallel = TRUE,
                                     ncores = 4, nclumps = 20))
The code below illustrates how to convert the raster generated by the ‘texture_image’ function (i.e., outrast) to a dataframe for subsequent analyses.
# get raster values
vals <- outrast[]

# Convert output raster of sa metrics (outrast) to a dataframe for
# easier use in subsequent analyses.
coords <- coordinates(outrast)
sa_data_elev <- data.frame(x = coords[, 1], y = coords[, 2], v = vals)
Generating these data frames can take a while, so we have provided .csv files of all gradient surface metrics calculated for elevation and EVI in case you find them useful for working with this vignette. The code below reads in these files by downloading them from figshare.
# The list of figshare files was completed above, so grab the appropriate files
# for the csv's of all texture image outputs for Oregon.
tf <- tempfile()
download.file(url[[2]], destfile = tf, mode = 'wb')
data_evi <- read.csv(tf, stringsAsFactors = FALSE)
unlink(tf)

tf <- tempfile()
download.file(url[[1]], destfile = tf, mode = 'wb')
data_elev <- read.csv(tf, stringsAsFactors = FALSE)
unlink(tf)
Distributions of a few elevation variables:
for (i in c(9, 10, 18, 6)) {
  hist(data_elev[, i], breaks = 30, xlab = names(data_elev)[i], main = '')
}
The code below visualizes metrics over the entire state in order to capture different aspects of landscape heterogeneity. Individual metrics primarily distinguish mountainous versus flat terrain, and managed versus more natural areas; however, some metrics are difficult to interpret, or do not show very much variation over the region. The difficulty of interpreting metrics is a known complicating factor for their use. Others have addressed this issue and linked metrics with known ecosystem features or patch metrics (McGarigal et al., 2009; Kedron et al., 2018).
# New names for plots
plt_names <- data.frame(old = names(data_elev)[3:ncol(data_elev)],
                        new = c('Shw', 'Srw', 'Srwi', 'Std', 'Stdi', 'S10z',
                                'Sa', 'Sbi', 'Sci', 'Sdc 50-55%', 'Sdc 80-85%',
                                'Sdc 0-5%', 'Sdq6', 'Sdr', 'Sds', 'Sfd', 'Sk',
                                'Sku', 'Smean', 'Sph', 'Spk', 'Sq', 'Ssc',
                                'Ssk', 'Sv', 'Svi', 'Svk'))

create_maps <- function(df, r, theme) {
  maps_list <- list()
  for (i in seq(3, ncol(df))) {
    temp <- setValues(r, df[, i])
    temp[is.na(r)] <- NA
    goodname <- as.character(plt_names$new[plt_names$old == names(df)[i]])
    maps_list[[i - 2]] <- rasterVis::levelplot(temp, margin = F,
                                               par.settings = theme,
                                               ylab = NULL, xlab = NULL,
                                               main = goodname)
    maps_list[[i - 2]]$par.settings$layout.heights[
      c('bottom.padding',
        'top.padding',
        'key.sub.padding',
        'axis.xlab.padding',
        'key.axis.padding',
        'main.key.padding')] <- 1
    maps_list[[i - 2]]$aspect.fill <- TRUE
    names(maps_list)[i - 2] <- goodname
  }
  return(maps_list)
}
# Create plots of all possible surface gradient metrics that geodiv calculates
# for elevation and EVI.
elev_maps <- create_maps(data_elev, elev_masked, elevTheme)
evi_maps <- create_maps(data_evi, evi_masked, eviTheme)

# Make sure that the order of maps is the same for both EVI and elevation.
new_order <- match(plt_names$new, names(evi_maps)) # get order according to names table
evi_maps <- evi_maps[new_order]

# Create map panels (3 each for EVI and elevation).
for (l in list(elev_maps, evi_maps)) {
  grid.arrange(grobs = l[1:12], nrow = 4, ncol = 3) # 850x800
  grid.arrange(grobs = l[13:24], nrow = 4, ncol = 3) # 850x800
  grid.arrange(grobs = l[25:27], nrow = 4, ncol = 3) # 850x800
}
In the code below, we examine an example of local correlation and clustering among the surface gradient metrics by extracting values over a horizontal transect across central Oregon.
First, we convert the raw elevation and EVI data from NASA’s SRTM and MODIS missions, respectively, to dataframes and add those raw values to the dataframes of gradient surface metrics we’ve calculated for elevation and EVI across the state of Oregon.
# Convert the rasters to dataframe format and add value to dataframe with
# metric values.
sp_df <- function(r, df) {
  pixdf <- as.data.frame(as(r, "SpatialPixelsDataFrame"))
  df$value <- pixdf[, 1]
  return(df)
}

data_elev <- sp_df(elev, data_elev)
data_evi <- sp_df(evi, data_evi)
Now we extract the data along a latitudinal transect going across the state of Oregon.
# Create new dataframe of values along a latitudinal transect.
get_transect <- function(r, df) {
  # Crop raster to center transect (+/- 7 pixels North or South).
  center_row <- round(nrow(r) / 2)
  r_crop <- crop(r, extent(r, center_row - 7, center_row + 7, 1, ncol(r)))

  # Get 8th latitudinal coordinate (center latitude) from the cropped raster.
  central_y <- unique(coordinates(r_crop)[, 2])[8]

  # Get the closest latitude in the dataframe to the central raster coordinate.
  central_y <- unique(df$y[near(df$y, central_y, 0.01)])[1]

  # Extract mean EVI and elevation values by raster column.
  r_means <- colMeans(as.matrix(r_crop), na.rm = TRUE)

  # Now limit the dataframe to the central row across the transect.
  transect_vals <- df[df$y == central_y,]

  # Add column means to dataframe.
  transect_vals$value <- r_means

  return(transect_vals)
}

transect_elev <- get_transect(elev, data_elev)
transect_evi <- get_transect(evi, data_evi)
The code below standardizes all metrics by placing them on the same scale, from 0 to 1.
# Get all metrics on same scale (0-1).
scale_mets <- function(df) {
  for (i in 3:ncol(df)) {
    df[, i] <- (df[, i] - min(df[, i], na.rm = TRUE)) /
      (max(df[, i], na.rm = TRUE) - min(df[, i], na.rm = TRUE))
  }
  return(df)
}

transect_elev <- scale_mets(transect_elev)
transect_evi <- scale_mets(transect_evi)
For the transect analysis, we will perform hierarchical clustering on metric values using the function ‘eclust’ in the package factoextra (Kassambara & Mundt, 2020). First, some additional data wrangling is required to prepare the data for the clustering analysis below.
# Remove NA values from the metric columns.
rm_nas <- function(df) {
  for (i in 3:ncol(df)) {
    df <- df[!is.na(df[, i]),]
  }
  return(df)
}

transect_elev <- rm_nas(transect_elev)
transect_evi <- rm_nas(transect_evi)
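We noted above that this transect offers an example of local correlation among metrics. As a quick look before clustering, the pairwise correlation matrix of the scaled metric columns can be visualized with the corrplot package loaded earlier. This is a sketch of one possible approach, not a required step in the analysis.
# Pairwise correlations among the scaled metrics along the elevation transect
# (columns 3 through ncol - 1 hold the metric values).
met_cor <- cor(transect_elev[, 3:(ncol(transect_elev) - 1)],
               use = 'pairwise.complete.obs')
corrplot(met_cor, method = 'circle', type = 'lower', tl.cex = 0.6)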
The code below performs the clustering analysis on the surface gradient metrics. We first determine the optimal number of clusters by examining the gap statistic, and then plot the clustered variables to see the relationships among them.
### Plot optimal number of clusters
plot_gap <- function(df) {
  # enhanced k-means clustering
  res.km <- clusGap(t(df)[3:(ncol(df) - 1), ], stats::kmeans, K.max = 10,
                    B = 100, nstart = 25)
  # gap statistic plot
  factoextra::fviz_gap_stat(res.km)
}
plot_gap(transect_evi)
plot_gap(transect_elev)
### Dendrogram and scatterplot of clusters
get_clusters <- function(df, nclust) {
  # Enhanced hierarchical clustering using optimal # of clusters.
  res.hc <- factoextra::eclust(t(df)[3:(ncol(df) - 1),],
                               "hclust", k = nclust)
  return(res.hc)
}

plot_dendrogram <- function(res.hc, nclust) {
  # Plot colors
  plt_cols <- c('lightgoldenrod1', 'lightblue', 'grey', 'lightsteelblue4')

  # Dendrogram plot
  fviz_dend(res.hc, rect = FALSE, k_colors = plt_cols[1:nclust],
            lwd = 1, label_cols = 'black', cex = 0.8, main = "", ylab = "",
            type = 'rectangle', horiz = TRUE, labels_track_height = 14) +
    theme(axis.text.y = element_blank(), axis.ticks = element_blank())
}

plot_scatter <- function(res.hc) {
  # Scatterplot
  fviz_cluster(res.hc)
}

res.hc_elev <- get_clusters(transect_elev, nclust = 4)
res.hc_evi <- get_clusters(transect_evi, nclust = 3)
plot_dendrogram(res.hc_elev, nclust = 4)
plot_dendrogram(res.hc_evi, nclust = 3)
plot_scatter(res.hc_elev)
plot_scatter(res.hc_evi)
Now we generate plots that show the EVI and elevation surface gradient metrics along the Oregon state transect.
First, the data have to be gathered into a longer format for this visualization.
# Create gathered (long) version of dataframe for the clustering analysis.
gather_data <- function(df) {
  df <- df %>% tidyr::gather(key = 'var', value = 'value',
                             names(df[, seq(3, ncol(df))]))
  # Order variables.
  df <- df[order(df$var),]
  return(df)
}

gathered_elev <- gather_data(transect_elev)
gathered_evi <- gather_data(transect_evi)
Now we can plot the metrics along the transect, labeling each with its assigned cluster.
# Plot metrics along transect, with cluster labeled.
plot_transect_mets <- function(df, res.hc, varname) {
  # Map colors to cluster or variable names.
  col_map <- c("1" = "lightgoldenrod1", "2" = "lightblue", "3" = "grey",
               "4" = "lightsteelblue4", "EVI" = "white", "Elev" = "white")

  # Create a dataframe to match variable names with cluster number.
  clust_df <- data.frame(var = res.hc$labels, clust = res.hc$cluster)
  clust_df <- clust_df[order(clust_df$clust),]

  # Convert var to character.
  clust_df$var <- as.character(clust_df$var)

  # Join cluster number with main dataframe to get cluster labels for plotting.
  df <- left_join(df, clust_df, by = 'var')

  # Anything not labeled with a cluster (i.e., the actual value) gets labeled.
  df$clust[is.na(df$clust)] <- varname

  # Change 'value' label to actual variable name.
  df$var[df$var == 'value'] <- varname

  # Convert cluster names to factors and match with colors.
  df$clust <- as.factor(df$clust)
  df$var <- factor(df$var, levels = c(clust_df$var, varname))
  cols_to_use <- col_map[names(col_map) %in% df$clust]

  ggplot(df, aes(x = x, y = value)) +
    geom_rect(aes(xmin = -Inf, xmax = Inf, ymin = -Inf, ymax = Inf,
                  fill = clust)) +
    geom_line(lwd = 0.7) +
    xlab('Longitude') +
    facet_grid(var ~ ., switch = 'y') +
    scale_fill_manual(values = cols_to_use, name = 'Cluster') +
    theme_bw() +
    theme(axis.title.y = element_blank(),
          axis.text.y = element_blank(),
          strip.text.y.left = element_text(face = 'bold', size = 11, angle = 0),
          legend.position = 'none',
          axis.title.x = element_text(face = 'bold', size = 11))
}
plot_transect_mets(gathered_elev, res.hc_elev, "Elev")
plot_transect_mets(gathered_evi, res.hc_evi, "EVI")
Overall trends along the transect were similar among metrics, with more variation at smaller intervals. Using hierarchical clustering, we found four clusters of metrics for elevation, and three for EVI. The metrics fell into different combinations based on the variable considered (elevation or EVI). For example, Sdq6 and S10z grouped together for both variables, but Std and Srw were in the same group for elevation, and different groups for EVI.
Next, we will determine the statewide elevation and EVI variance explained by metrics using PCA. First, we need to get the data ready for the PCA. In the code below, we remove several variables due to their large number of NA values, caused either by windows containing too few values, or windows lacking ‘peaks’ or ‘valleys’ (pixels surrounded by lower or higher values, respectively). After cleaning the data, 21 metrics remain in the analysis.
# Get data ready for PCA by removing NA values.
clean_data <- function(df) {
  # Remove columns with very large numbers of NAs.
  NAs <- sapply(df, function(x) sum(is.na(x)))
  rm_cols <- which(NAs >= 20000)
  df <- df[, -rm_cols]
  # Remove NAs from remaining columns.
  df <- na.omit(df)
  return(df)
}

data_elev_noNA <- clean_data(data_elev)
data_evi_noNA <- clean_data(data_evi)
In the code below, the PCA is performed with the remaining metrics using the ‘prcomp’ function in the stats package.
# Calculate the principal components.
elev_prc <- prcomp(data_elev_noNA[, 3:22], center = TRUE, scale = TRUE)
evi_prc <- prcomp(data_evi_noNA[, 3:22], center = TRUE, scale = TRUE)
summary(elev_prc)
#> Importance of components:
#> PC1 PC2 PC3 PC4 PC5 PC6 PC7
#> Standard deviation 2.3592 1.7964 1.5836 1.28897 1.12823 1.02005 0.99978
#> Proportion of Variance 0.2783 0.1613 0.1254 0.08307 0.06365 0.05203 0.04998
#> Cumulative Proportion 0.2783 0.4396 0.5650 0.64810 0.71174 0.76377 0.81374
#> PC8 PC9 PC10 PC11 PC12 PC13 PC14
#> Standard deviation 0.91966 0.79134 0.72898 0.60892 0.58858 0.52619 0.50061
#> Proportion of Variance 0.04229 0.03131 0.02657 0.01854 0.01732 0.01384 0.01253
#> Cumulative Proportion 0.85603 0.88734 0.91391 0.93245 0.94978 0.96362 0.97615
#> PC15 PC16 PC17 PC18 PC19 PC20
#> Standard deviation 0.42315 0.34368 0.29564 0.27348 0.11043 0.07380
#> Proportion of Variance 0.00895 0.00591 0.00437 0.00374 0.00061 0.00027
#> Cumulative Proportion 0.98510 0.99101 0.99538 0.99912 0.99973 1.00000
summary(evi_prc)
#> Importance of components:
#> PC1 PC2 PC3 PC4 PC5 PC6 PC7
#> Standard deviation 2.5246 1.9870 1.37645 1.25486 1.09009 0.99925 0.96783
#> Proportion of Variance 0.3187 0.1974 0.09473 0.07873 0.05941 0.04992 0.04683
#> Cumulative Proportion 0.3187 0.5161 0.61081 0.68955 0.74896 0.79889 0.84572
#> PC8 PC9 PC10 PC11 PC12 PC13 PC14
#> Standard deviation 0.82448 0.79386 0.70303 0.53013 0.50317 0.47567 0.40850
#> Proportion of Variance 0.03399 0.03151 0.02471 0.01405 0.01266 0.01131 0.00834
#> Cumulative Proportion 0.87971 0.91122 0.93593 0.94998 0.96264 0.97396 0.98230
#> PC15 PC16 PC17 PC18 PC19 PC20
#> Standard deviation 0.38320 0.30381 0.2531 0.1547 0.14751 0.07128
#> Proportion of Variance 0.00734 0.00461 0.0032 0.0012 0.00109 0.00025
#> Cumulative Proportion 0.98964 0.99426 0.9975 0.9987 0.99975 1.00000
Now let’s look at some diagnostic plots for the principal components. Scree plots indicate the relative importance of the principal components. Here we mark an eigenvalue of 1 with a red line; components with eigenvalues above this line are commonly retained. An alternative criterion uses the broken stick model: the point at which the scree curve crosses the broken stick distribution indicates the maximum number of components to retain (a sketch of that comparison follows the code below).
# Take a look at the components using a screeplot.
plot_scree <- function(pc_dat) {
  screeplot(pc_dat, type = "l", npcs = 15,
            main = "Screeplot of the first 15 PCs")
  abline(h = 1, col = "red", lty = 5)
  legend("topright", legend = c("Eigenvalue = 1"),
         col = c("red"), lty = 5, cex = 0.6)
}
plot_scree(elev_prc)
plot_scree(evi_prc)
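To compare the scree curves against the broken stick criterion mentioned above, note that under the broken stick model the expected proportion of variance for component k of p is (1/p) * sum of 1/i for i = k, ..., p. The helper functions below are our own sketch, not part of geodiv or stats, for overlaying that distribution on the observed proportions of variance.
# Broken stick expected proportions of variance for p components.
broken_stick <- function(p) sapply(1:p, function(k) sum(1 / (k:p)) / p)

# Overlay the broken stick distribution (red) on the observed proportions;
# components above the red curve would be retained.
plot_broken_stick <- function(pc_dat) {
  props <- summary(pc_dat)$importance[2, ] # proportion of variance per PC
  plot(props, type = 'l', xlab = 'PC #', ylab = 'Proportion of variance',
       main = 'Broken stick comparison')
  lines(broken_stick(length(props)), col = 'red', lty = 5)
  legend('topright', legend = c('Broken stick model'),
         col = c('red'), lty = 5, cex = 0.6)
}
plot_broken_stick(elev_prc)
plot_broken_stick(evi_prc)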
We can also take a look at the components for elevation and EVI to see how much variance the surface metrics explain with cumulative variance plots.
# Look at how much variance is explained using a cumulative variance plot.
plot_cvar <- function(pc_dat) {
  # Get cumulative variance explained.
  cumpro <- summary(pc_dat)$importance[3, ][1:16]

  # Create plot of cumulative variance, marking the 5th component as the cutoff.
  plot(cumpro, xlab = "PC #", ylab = "Amount of explained variance",
       main = "Cumulative variance plot")
  abline(v = 5, col = "blue", lty = 5)
  abline(h = cumpro[5], col = "blue", lty = 5)
  legend("topleft", legend = c("Cut-off @ PC5"),
         col = c("blue"), lty = 5, cex = 0.6)
}
plot_cvar(elev_prc)
plot_cvar(evi_prc)
For both elevation and EVI, the first 5 principal components explained >70% of the variation. Now, let’s plot the first components for elevation to examine their relationships.
# Create scatterplots to look at relationships among principal components.
plot(elev_prc$x[, 1], elev_prc$x[, 2], xlab = "PC1 (27.83%)",
ylab = "PC2 (16.13%)", main = "PC1 / PC2 - plot")
plot(elev_prc$x[, 1], elev_prc$x[, 3], xlab = "PC1 (27.83%)",
ylab = "PC3 (12.54%)", main = "PC1 / PC3 - plot")
plot(elev_prc$x[, 2], elev_prc$x[, 3], xlab = "PC2 (16.13%)",
ylab = "PC3 (12.54%)", main = "PC2 / PC3 - plot")
In the code below, we map the components to see if there are any spatial patterns readily identifiable.
# Map components across state.
map_comps <- function(pc_dat, noNA_df, full_df, r, theme) {
  # Add pc values to no-NA dataframe.
  for (i in 1:5) {
    colname <- paste0('prc', i)
    noNA_df[, colname] <- pc_dat$x[, i]
  }

  # Add PCA results back to full raster dataframe.
  full_dat <- full_df %>% left_join(noNA_df)

  # Cut to only the prc columns.
  full_dat <- full_dat[, grep('prc', names(full_dat))]

  # Create rasters and maps with principal component values.
  out_maps <- list()
  for (i in 1:5) {
    new_rast <- setValues(r, full_dat[, i])
    pc_map <- rasterVis::levelplot(new_rast, margin = F,
                                   par.settings = theme,
                                   ylab = NULL, xlab = NULL,
                                   main = paste0('PC', i))
    pc_map$par.settings$layout.heights[c('bottom.padding',
                                         'top.padding',
                                         'key.sub.padding',
                                         'axis.xlab.padding',
                                         'key.axis.padding',
                                         'main.key.padding')] <- 1
    pc_map$aspect.fill <- TRUE
    out_maps[[i]] <- pc_map
  }

  # Plot in a grid.
  grid.arrange(grobs = out_maps, nrow = 2, ncol = 3)
}
map_comps(elev_prc, data_elev_noNA, data_elev, elev, elevTheme)
#> Joining, by = c("x", "y", "std", "s10z", "sa", "sbi", "sci", "sdc05055", "sdc08085", "sdc0005", "sdq6", "sdr", "sds", "sk", "sku", "sph", "spk", "sq", "ssk", "sv", "svi", "svk", "value")
map_comps(evi_prc, data_evi_noNA, data_evi, evi, eviTheme)
#> Joining, by = c("x", "y", "s10z", "sa", "sbi", "sci", "sdc05055", "sdc08085", "sdc0005", "sdq6", "sdr", "sds", "sk", "sku", "sph", "spk", "sq", "ssk", "std", "sv", "svi", "svk", "value")
What are the principal component loadings for elevation?
# Plot principal component loadings.
plot_loadings <- function(pc_dat) {
  # Get rotation for top 5 components.
  loadings <- pc_dat$rotation[, 1:5]

  # Figure out the relative loadings.
  aload <- abs(loadings)
  rel <- sweep(aload, 2, colSums(aload), "/")

  # Convert relative loadings to dataframe.
  rel <- as.data.frame(rel)

  # Get good variable names (from dataframe created earlier).
  rel$var <- plt_names$new[match(rownames(rel), plt_names$old)]

  # Create importance plots.
  imp_plts <- list()
  for (i in 1:5) {
    temp <- rel

    # Determine whether component loading is positive or negative.
    temp$sign <- factor(sapply(loadings[, i], FUN = function(x) x / abs(x)),
                        levels = c(-1, 1))

    # Order loadings by value.
    temp <- temp[order(temp[, i]),]
    temp$var <- factor(temp$var, levels = temp$var)

    temp_plt <- ggplot(temp, aes(x = temp[, i], y = var)) +
      geom_point(size = 3, aes(pch = sign)) +
      scale_shape_manual(name = element_blank(),
                         breaks = c(1, -1),
                         values = c(19, 8),
                         labels = c("Positive", "Negative")) +
      xlab(paste0('PC', i)) +
      ylab('Metric') +
      theme_bw() +
      theme(panel.grid.minor = element_blank(),
            legend.justification = c(1, 0),
            legend.position = c(1, 0),
            legend.background = element_blank(),
            legend.text = element_text(size = 12),
            axis.title = element_text(size = 12))
    imp_plts[[i]] <- temp_plt
  }

  # Return grid of first three components.
  grid.arrange(grobs = imp_plts[1:3], ncol = 3)
}
plot_loadings(elev_prc)
What are the principal component loadings for EVI?
plot_loadings(evi_prc)
Looking at the PCA components and loadings, the first component for both elevation and EVI described general surface heterogeneity, whereas the second was related to the shape of the regional value distribution. The metric groupings observed match previous findings, and the first two components represent the same groupings (roughness and distribution) found by McGarigal et al. (2009).
The contrast in metric groupings between the transect hierarchical clustering analysis and the statewide PCA demonstrates that the information metrics convey may differ with landscape extent and scale. Because the PCA results were more in line with previous findings in the literature, they likely reflect the broader grouping behavior of the metrics; the transect results show that these groupings do not always hold across smaller regions.
Dahlin, K.M. 2016. Spectral diversity area relationships for assessing biodiversity in a wildland–agriculture matrix. Ecological applications. 26(8):2758-2768.
Didan, K., Munoz, A.B., Solano, R., Huete, A. 2015. MODIS vegetation index user’s guide (MOD13 series). University of Arizona: Vegetation Index and Phenology Lab.
Farr, T.G., Rosen, P.A., Caro, E., Crippen, R., Duren, R., Hensley, S., Kobrick, M., Paller, M., Rodriguez, E., Roth, L. and Seal, D. 2007. The shuttle radar topography mission. Reviews of geophysics. 45(2).
Gorelick, N., Hancher, M., Dixon, M., Ilyushchenko, S., Thau, D. and Moore, R., 2017. Google Earth Engine: Planetary-scale geospatial analysis for everyone. Remote sensing of Environment. 202:18-27.
Hesselbarth, M.H.K., Sciaini, M., With, K.A., Wiegand, K., Nowosad, J. 2019. landscapemetrics: an open-source R tool to calculate landscape metrics. Ecography. 42:1648-1657.
Kassambara, A. and Mundt, F. (2020). factoextra: Extract and Visualize the Results of Multivariate Data Analyses. R package version 1.0.7. https://CRAN.R-project.org/package=factoextra
Kedron, P.J., Frazier, A.E., Ovando-Montejo, G.A., Wang, J. 2018. Surface metrics for landscape ecology: a comparison of landscape models across ecoregions and scales. Landscape Ecology. 33(9):1489-504.
McGarigal, K., Tagil, S., Cushman, SA. 2009. Surface metrics: an alternative to patch metrics for the quantification of landscape structure. Landscape ecology. 24(3):433-50.