A tissue microarray (TMA) is an array of samples obtained by taking a slice of a biopsied formalin-fixed paraffin-embedded (FFPE) tumor. Each individual slice is referred to as a core. Each core is placed on a TMA, which is then stained with multiple antibodies and fluorophores that fluoresce when excited by lasers of varying wavelengths. The fluorescence intensity is measured, and a random forest algorithm is then used to classify each cell as positive or negative for a particular marker, which allows us to phenotype the cells. A schematic of this process is provided in Figure 1.
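spatialTIME functions use a custom `mif` object, which can be created using `create_mif`. The `mif` object has 6 slots storing the: clinical data, sample (summary) data, list of spatial data frames, derived results (empty when the object is created), patient ID column name, and sample ID column name.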
`create_mif` function this table must contain
We include one example clinical dataset and one example sample dataset, which together cover 229 patients with one core each. Of those 229 samples, spatial data for only 5 are included in our package.
x <- spatialTIME::create_mif(clinical_data = example_clinical,
                             sample_data = example_summary,
                             spatial_list = example_spatial,
                             patient_id = "deidentified_id",
                             sample_id = "deidentified_sample")
# Print a summary of how many patients, samples, and spatial files are present
x
#> 229 patients spanning 229 samples and 5 spatial data frames were found
`plot_immunoflo` creates an individual plot for each core (each sample). Plots can be assigned to an R object, such as the initially empty `derived` slot of the `mif` object, and printed to a PDF if a file name is provided.
When plotting phenotypes alongside individual markers, it is important to list the individual markers before the phenotype markers. This ensures that phenotypes derived from multiple markers are not plotted over by an individual marker. For instance, in the first plot below there appear to be no cytotoxic T cells (CD3+ and CD8+), but when the order is changed, the cytotoxic T cells become visible. Moral of the story: put the single markers before the marker combinations.
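If the marker names follow the pattern used in this vignette, the reordering can be automated. The sketch below uses a hypothetical helper, `order_markers()`, which is not a spatialTIME function, and it assumes that single-marker columns end in "Positive" while combination phenotypes (such as "CD3..CD8.") do not. With the names used here, it returns the same "good" order that is typed out by hand further down.

```r
# Hypothetical helper, not part of spatialTIME: move single-marker names ahead
# of marker combinations so the combinations are drawn last (on top).
# Assumes single markers end in "Positive" (e.g. "CD3..Opal.570..Positive")
# while combinations do not (e.g. "CD3..CD8.").
order_markers <- function(mnames) {
  is_single <- grepl("Positive$", mnames)
  c(mnames[is_single], mnames[!is_single])
}

mnames_bad <- c("CD3..CD8.", "CD3..FOXP3.", "CD3..Opal.570..Positive",
                "CD8..Opal.520..Positive", "FOXP3..Opal.620..Positive",
                "PDL1..Opal.540..Positive", "PD1..Opal.650..Positive")
order_markers(mnames_bad)
```

The helper preserves the relative order within each group, so any deliberate ordering among the single markers is kept.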
mnames_bad <- c("CD3..CD8.","CD3..FOXP3.","CD3..Opal.570..Positive",
                "CD8..Opal.520..Positive","FOXP3..Opal.620..Positive",
                "PDL1..Opal.540..Positive", "PD1..Opal.650..Positive")
# Give the legends in both plots below the same order and coloring scheme
# so that they can share a common legend
values = viridis::turbo(length(mnames_bad))
names(values) = mnames_bad
x <- spatialTIME::plot_immunoflo(x, plot_title = "deidentified_sample", mnames = mnames_bad,
                                 cell_type = "Classifier.Label")
bad_names <- x[["derived"]][["spatial_plots"]][[4]] +
  ggplot2::theme(legend.position = 'bottom') +
  ggplot2::scale_color_manual(breaks = mnames_bad, values = values)
#> Scale for 'colour' is already present. Adding another scale for 'colour',
#> which will replace the existing scale.
mnames_good <- c("CD3..Opal.570..Positive","CD8..Opal.520..Positive",
                 "FOXP3..Opal.620..Positive","PDL1..Opal.540..Positive",
                 "PD1..Opal.650..Positive","CD3..CD8.","CD3..FOXP3.")
x <- spatialTIME::plot_immunoflo(x, plot_title = "deidentified_sample", mnames = mnames_good,
                                 cell_type = "Classifier.Label")
good_names <- x[["derived"]][["spatial_plots"]][[4]] +
  ggplot2::theme(legend.position = 'bottom') +
  ggplot2::scale_color_manual(breaks = mnames_good,
                              values = values[match(mnames_good, names(values))])
#> Scale for 'colour' is already present. Adding another scale for 'colour',
#> which will replace the existing scale.
x$sample %>% dplyr::filter(deidentified_sample == 'TMA3_[9,K].tif') %>%
  dplyr::select(c(2, 4:15)) %>%
  tidyr::pivot_longer(cols = 2:13, names_to = 'Marker', values_to = 'Count')
#> # A tibble: 12 × 3
#> deidentified_sample Marker Count
#> <chr> <chr> <dbl>
#> 1 TMA3_[9,K].tif FOXP3 (Opal 620) Positive Cells 34
#> 2 TMA3_[9,K].tif CD3 (Opal 570) Positive Cells 536
#> 3 TMA3_[9,K].tif CD8 (Opal 520) Positive Cells 83
#> 4 TMA3_[9,K].tif PD1 (Opal 650) Positive Cells 5
#> 5 TMA3_[9,K].tif PDL1 (Opal 540) Positive Cells 1
#> 6 TMA3_[9,K].tif CD3+ FOXP3+ Cells 34
#> 7 TMA3_[9,K].tif CD3+ CD8+ Cells 68
#> 8 TMA3_[9,K].tif CD3+ CD8+ FOXP3+ Cells 4
#> 9 TMA3_[9,K].tif CD3+ PD1+ Cells 5
#> 10 TMA3_[9,K].tif CD3+ PD-L1+ Cells 1
#> 11 TMA3_[9,K].tif CD8+ PD1+ Cells 0
#> 12 TMA3_[9,K].tif CD3+ CD8+ PD-L1+ Cells 0
ggpubr::ggarrange(plotlist = list(bad_names, good_names), labels = c('B', 'G'),
                  common.legend = TRUE, legend = 'bottom')
Notice that with the “bad” (B) order, one would have the impression that there are a huge number of CD3+ cells and would scratch their head over why there are no CD3+ FOXP3+ or CD3+ CD8+ cells. The “good” (G) order makes this distinction clear and also shows the cells that are positive only for CD3 and not for FOXP3 or CD8.
Ripley’s \(K\) measures the average number of neighboring cells around each cell, that is, the average (over all cells) number of other cells within a specified radius of a cell, divided by the overall cell density (cells per unit area). Ripley’s \(K\) is computed as follows:
\[\hat{K}(r) = \frac{A}{n(n-1)}\sum_{i=1}^{n}\sum_{j\ne i}w_{ij}\,\mathbf{1}\left(d(x_i,x_j)\le r\right),\]
where \(A\) is the area of the region, \(n\) is the number of cells, \(r\) is the specified radius, \(d(x_i,x_j)\) is the distance between the \(i^{th}\) and \(j^{th}\) cells, \(\mathbf{1}(\cdot)\) is the indicator function of the event in parentheses, and \(w_{ij}\) are the weights assigned for edge correction. Under complete spatial randomness (CSR), the expected value of \(\hat{K}(r)\) is \(\pi r^2\); thus \(\hat{K}\) is expected to grow as a quadratic function of \(r\).
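To make the estimator concrete, here is a minimal base R sketch of \(\hat{K}(r) = \frac{A}{n(n-1)}\sum_i\sum_{j\ne i} w_{ij}\mathbf{1}(d(x_i,x_j)\le r)\) with every weight \(w_{ij}\) set to 1, that is, with no edge correction. The function name `ripley_k_naive()` is hypothetical and is not how spatialTIME computes \(K\); it only illustrates the formula.

```r
# Naive Ripley's K: all edge-correction weights w_ij are set to 1.
# px, py: cell coordinates; r: radii; area: area of the region.
ripley_k_naive <- function(px, py, r, area) {
  n <- length(px)
  d <- as.matrix(dist(cbind(px, py)))  # n x n matrix of pairwise distances
  diag(d) <- Inf                       # exclude pairs with i == j
  vapply(r, function(rr) area / (n * (n - 1)) * sum(d <= rr), numeric(1))
}

# Three cells in a region of area 1
ripley_k_naive(px = c(0, 1, 0), py = c(0, 0, 1), r = c(0.5, 1, 1.5), area = 1)
# [1] 0.0000000 0.6666667 1.0000000

# Simulated CSR cells in a 100 x 100 region, compared with pi * r^2
set.seed(1)
r <- seq(0, 20, 5)
k_sim <- ripley_k_naive(runif(200, 0, 100), runif(200, 0, 100), r, area = 100^2)
cbind(r = r, K = k_sim, CSR = pi * r^2)
```

Without an edge correction, the simulated estimate tends to fall below \(\pi r^2\) as \(r\) grows, because circles around cells near the boundary extend into unobserved area.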
There are several edge corrections. Because our studies include a small number of cells, we recommend using the isotropic or translational edge correction rather than the ‘border’ edge correction. The main goal of an edge correction is to account for the fact that there are unobserved cells outside of the region, under the assumption that the locations of these cells follow the same distribution as those in the study region. An excellent description of these corrections is provided here
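For a rectangular region, the translation correction has a closed form: each pair gets the weight \(w_{ij} = |W| / |W \cap (W + x_j - x_i)|\), which for an \(a \times b\) rectangle is \(ab / ((a - |\Delta x|)(b - |\Delta y|))\). The sketch below assumes a rectangle \([0, a] \times [0, b]\); real TMA cores are roughly circular, the function names are hypothetical, and this is an illustration rather than spatialTIME's implementation.

```r
# Translation edge-correction weight for a rectangular a x b region:
# w_ij = (a * b) / ((a - |dx|) * (b - |dy|)), with (dx, dy) the offset between cells
translation_weight <- function(dx, dy, a, b) {
  (a * b) / ((a - abs(dx)) * (b - abs(dy)))
}

# Ripley's K with translation weights, region assumed to be [0, a] x [0, b]
ripley_k_trans <- function(px, py, r, a, b) {
  n <- length(px)
  dx <- outer(px, px, "-")
  dy <- outer(py, py, "-")
  d <- sqrt(dx^2 + dy^2)
  diag(d) <- Inf                       # drop i == j pairs
  w <- translation_weight(dx, dy, a, b)
  vapply(r, function(rr) (a * b) / (n * (n - 1)) * sum(w[d <= rr]), numeric(1))
}

# Two cells 5 units apart in a 10 x 10 region: each ordered pair gets weight 2
ripley_k_trans(px = c(0, 5), py = c(0, 0), r = 5, a = 10, b = 10)
# [1] 200
```

With all weights equal to 1, the same pair would give \(\hat{K}(5) = 100\); the correction up-weights pairs whose shifted copies would partly fall outside the region.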
The distribution of the nearest neighbor distances, \(\hat{G}(r)\), can also be studied; it is computed as
\[\hat{G}(r) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\left(\min_{j\ne i} d(x_i,x_j)\le r\right),\] which is interpreted as the proportion of cells whose distance to their nearest neighbor is at most \(r\). Notice that, unlike for \(K\), there is no weighting factor for each pair of points; instead, the edge corrections for these methods, reduced sample (rs) and Hanisch (han), simply use different cell inclusion conditions. The reduced sample correction is similar to the border correction for count-based methods: only cells in the interior of the area of interest, at least \(r\) from its boundary, are studied. The Hanisch correction leaves out cells whose nearest neighbor is farther away than the boundary of the area of interest, since their true nearest neighbor could lie outside it. For more information about these edge corrections, see the following [article](https://www.routledgehandbooks.com/pdf/doi/10.1201/b16195-4).
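A minimal base R sketch of \(\hat{G}(r)\) follows, once with no edge correction and once with the reduced sample correction for a rectangular region. The function names and the rectangular region are assumptions for illustration, not spatialTIME's implementation; the coordinates are named `px`/`py` so they do not overwrite the vignette's `x` object.

```r
# Nearest-neighbor distance of each cell
nn_dist <- function(px, py) {
  d <- as.matrix(dist(cbind(px, py)))
  diag(d) <- Inf                       # a cell is not its own neighbor
  apply(d, 1, min)
}

# G(r) with no edge correction: share of cells whose nearest neighbor is within r
g_naive <- function(px, py, r) {
  nn <- nn_dist(px, py)
  vapply(r, function(rr) mean(nn <= rr), numeric(1))
}

# Reduced-sample G(r) for a rectangle: at radius r, only cells at least r
# from the boundary are used
g_rs <- function(px, py, r, xlim, ylim) {
  nn <- nn_dist(px, py)
  bd <- pmin(px - xlim[1], xlim[2] - px, py - ylim[1], ylim[2] - py)
  vapply(r, function(rr) {
    keep <- bd >= rr
    if (!any(keep)) return(NA_real_)   # no cell is far enough from the edge
    mean(nn[keep] <= rr)
  }, numeric(1))
}

px <- c(5, 8, 1, 1)
py <- c(5, 5, 1, 2)
g_naive(px, py, r = 1:3)                                 # [1] 0.5 0.5 1.0
g_rs(px, py, r = 1:3, xlim = c(0, 10), ylim = c(0, 10))  # [1] 0.5 0.0 1.0
```

The reduced sample estimate need not increase with \(r\), because the set of eligible cells shrinks as \(r\) grows; here the two cells near the corner drop out at \(r = 2\).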
An underlying assumption of many spatial clustering metrics is that the cells are randomly distributed across the region, with no evidence of clustering or repulsion, and that the cell intensity is constant across the entire region. This assumption is the so-called complete spatial randomness (CSR). TMAs can be damaged as a result of how they are collected. This damage can lead to rips and tears in the TMA, which create regions where it appears that cells cannot be located, which is not actually the case. Because of these violations of the CSR assumption, the theoretical estimate under CSR may not be accurate. To address this, cell positivity can be permuted across all observed cell locations, and the permutation distribution of \(K\), \(L\), \(M\), or \(G\) then provides a TMA-specific measure of CSR.
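The permutation idea can be sketched in base R: keep every observed cell location fixed, shuffle which cells are labeled positive, and recompute \(K\) for the relabeled positive cells. The function names are hypothetical, no edge correction is applied, and taking the degree of clustering as observed minus permutation mean is an assumption made for illustration (it matches the sign convention that positive values mean clustering), not necessarily spatialTIME's exact formula.

```r
# Naive Ripley's K (all edge-correction weights set to 1)
k_naive <- function(px, py, r, area) {
  n <- length(px)
  d <- as.matrix(dist(cbind(px, py)))
  diag(d) <- Inf
  vapply(r, function(rr) area / (n * (n - 1)) * sum(d <= rr), numeric(1))
}

# Permutation estimate of K under CSR: shuffle the positivity labels across
# all observed cell locations, recompute K for the relabeled positive cells,
# and average over permutations.
perm_k <- function(px, py, positive, r, area, n_perm = 100) {
  observed <- k_naive(px[positive], py[positive], r, area)
  perms <- replicate(n_perm, {
    shuffled <- sample(positive)       # same number of positives, new locations
    k_naive(px[shuffled], py[shuffled], r, area)
  })
  permuted <- rowMeans(matrix(perms, nrow = length(r)))
  # Assumed definition of the degree of clustering: observed minus permutation mean
  data.frame(r = r, observed = observed, permuted = permuted,
             degree = observed - permuted)
}

set.seed(42)
px <- runif(300, 0, 100)
py <- runif(300, 0, 100)
positive <- px < 30   # hypothetical marker that is positive only on the left side
res <- perm_k(px, py, positive, r = seq(0, 20, 5), area = 100 * 100, n_perm = 20)
res
```

Because the permutations reuse the observed locations, any empty tears in the tissue are present in every permuted sample as well, which is what makes the permutation mean TMA specific.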
The `ripleys_k_v2` function reports a permuted and a theoretical estimate under CSR, the observed value of \(K\) (`method = 'K'`), \(L\) (`method = 'L'`), or Marcon's \(M\) (`method = 'M'`), and, if `keep_perm_dis = TRUE`, the full permutation distribution of \(K\), \(L\), and \(M\).
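The \(L\) function is commonly defined as Besag's transform of \(K\), \(L(r) = \sqrt{K(r)/\pi}\), which turns the quadratic CSR expectation \(\pi r^2\) into the straight line \(L(r) = r\). The sketch below assumes this standard definition; whether spatialTIME applies any further centering (for example \(L(r) - r\)) is not shown here, and `k_to_l()` is a hypothetical name.

```r
# Besag's L transform: L(r) = sqrt(K(r) / pi); under CSR, L(r) = r
k_to_l <- function(k) sqrt(k / pi)

r <- c(0, 5, 10)
k_csr <- pi * r^2        # theoretical K under CSR
k_to_l(k_csr)            # [1]  0  5 10
```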
Currently, the number of permutations is 10, but this should be increased to at least 100 for a more reliable estimate of the mean.
x <- spatialTIME::ripleys_k(mif = x, mnames = mnames_good, num_permutations = 10,
                            edge_correction = 'translation', r = seq(0,100,10),
                            keep_perm_dis = FALSE, workers = 1)
#> Registered S3 method overwritten by 'spatstat.geom':
#> method from
#> print.boxx cli
# This keeps the colors compatible in every plot for the remainder of the vignette
values = viridis::turbo(length(unique(x$derived$univariate_Count$deidentified_sample)))
names(values) = unique(x$derived$univariate_Count$deidentified_sample)
x$derived$univariate_Count %>%
  dplyr::filter(Marker != 'PDL1..Opal.540..Positive') %>%
  ggplot2::ggplot(ggplot2::aes(x = r, y = `Degree of Clustering Permutation`)) +
  ggplot2::geom_line(ggplot2::aes(color = deidentified_sample), show.legend = FALSE) +
  ggplot2::facet_wrap(Marker~., scales = 'free') + ggplot2::theme_bw() +
  ggplot2::scale_color_manual(values = values)
Positive values of the degree of clustering when using `method = 'K'` or `method = 'L'` indicate evidence of spatial clustering, while negative values correspond to spatial regularity. On the other hand, when using `method = 'M'`, values larger than 1 correspond to clustering and values less than 1 correspond to regularity. These values can also be interpreted relative to CSR; for example, \(M=0.5\) means there is 50% less spatial clustering than expected under CSR.
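Marcon's \(M\) compares, for each reference cell of the type of interest, the share of its neighbors within \(r\) that are of the same type against the share expected from the whole region, \((N_m - 1)/(N - 1)\), where \(N_m\) is the number of cells of that type and \(N\) the total number of cells. The base R sketch below follows that definition without edge correction; skipping reference cells that have no neighbors within \(r\) is our own assumption, and `marcon_m()` is a hypothetical name rather than spatialTIME's implementation.

```r
# Univariate Marcon's M (sketch, no edge correction).
# is_type: logical vector marking the cells of the type of interest.
marcon_m <- function(px, py, is_type, r) {
  d <- as.matrix(dist(cbind(px, py)))
  diag(d) <- Inf                                   # a cell is not its own neighbor
  expected <- (sum(is_type) - 1) / (length(px) - 1) # share expected region-wide
  vapply(r, function(rr) {
    near <- d[is_type, , drop = FALSE] <= rr        # neighbors of each reference cell
    n_near <- rowSums(near)
    n_near_type <- rowSums(near[, is_type, drop = FALSE])
    ok <- n_near > 0                                # assumption: skip isolated cells
    if (!any(ok)) return(NA_real_)
    sum(n_near_type[ok] / n_near[ok]) / (sum(ok) * expected)
  }, numeric(1))
}

# Two cells of the type of interest next to each other, two others far away
px <- c(0, 0, 10, 10)
py <- c(0, 1, 0, 1)
is_type <- c(TRUE, TRUE, FALSE, FALSE)
marcon_m(px, py, is_type, r = c(2, 20))   # [1] 3 1
```

At \(r = 2\) every neighbor of a reference cell is of the same type, three times the expected share of 1/3, so \(M = 3\) signals clustering; at \(r = 20\) all cells are neighbors and \(M\) returns to 1.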
In the univariate case, we consider each cell of a single cell type as a reference cell and center circles around it. In the bivariate case, we are interested in how many cells of Type 1 (counted) are clustered in proximity to cells of Type 2 (anchor). Here the circles are centered around cells of Type 2, and the cells of Type 1 within them are counted.
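A minimal base R sketch of the bivariate (cross-type) \(K\) estimate, \(\hat{K}_{12}(r) = \frac{A}{n_1 n_2}\sum_{i \in \text{anchor}}\sum_{j \in \text{counted}} \mathbf{1}(d(x_i,x_j)\le r)\), with circles centered on anchor cells and no edge correction. The function name `bi_k_naive()` is hypothetical and this is not spatialTIME's implementation.

```r
# Bivariate Ripley's K without edge correction.
# ax, ay: anchor cell coordinates; cx, cy: counted cell coordinates.
bi_k_naive <- function(ax, ay, cx, cy, r, area) {
  # rows: anchor cells, columns: counted cells
  d <- sqrt(outer(ax, cx, "-")^2 + outer(ay, cy, "-")^2)
  vapply(r, function(rr) area / (length(ax) * length(cx)) * sum(d <= rr), numeric(1))
}

# One anchor cell and two counted cells, 1 and 3 units away, region of area 10
bi_k_naive(ax = 0, ay = 0, cx = c(1, 3), cy = c(0, 0), r = c(1, 3), area = 10)
# [1]  5 10
```

Without edge correction this estimate is symmetric in the two cell types; the anchor/counted roles start to matter once edge-correction weights or nearest-neighbor distances are used.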
x <- spatialTIME::bi_ripleys_k(mif = x, mnames = c("CD3..CD8.", "CD3..FOXP3."), num_permutations = 10,
                               edge_correction = 'translation', r = seq(0,100,10),
                               keep_perm_dis = FALSE, workers = 1, exhaustive = TRUE)
x$derived$bivariate_Count %>%
  dplyr::filter(anchor == 'CD3..FOXP3.') %>%
  ggplot2::ggplot(ggplot2::aes(x = r, y = `Degree of Clustering Permutation`)) +
  ggplot2::geom_line(ggplot2::aes(color = deidentified_sample), show.legend = FALSE) +
  ggplot2::theme_bw() + ggplot2::scale_color_manual(values = values)
The interpretation of the degree of clustering is the same here. With Tregs (CD3+ FOXP3+) as the anchor and cytotoxic T cells (CD3+ CD8+) as the counted cells, the green line shows evidence that cytotoxic T cells tend to cluster around Tregs for all values of \(r\), while the red line indicates spatial repulsion of cytotoxic T cells by Tregs for \(25\le r\le50\).
The `NN_G` function reports a permuted and a theoretical estimate under CSR, the observed value of \(G\), and, if `keep_perm_dis = TRUE`, the full permutation distribution of \(G\). The degree of clustering is computed by taking the ratio of the observed \(G\) to either the permutation or the theoretical estimate under CSR.
Currently, the number of permutations is 10, but this should be increased to at least 100 for a more reliable estimate of the mean.
x <- spatialTIME::NN_G(mif = x, mnames = mnames_good, num_permutations = 10,
                       edge_correction = 'rs', r = seq(0,100,10),
                       keep_perm_dis = FALSE, workers = 1)
x$derived$univariate_NN %>%
  dplyr::filter(Marker != 'PDL1..Opal.540..Positive') %>%
  ggplot2::ggplot(ggplot2::aes(x = r, y = `Degree of Clustering Permutation`)) +
  ggplot2::geom_line(ggplot2::aes(color = deidentified_sample)) +
  ggplot2::facet_wrap(Marker~., scales = 'free') + ggplot2::theme_bw() +
  ggplot2::scale_color_manual(values = values)
For \(G\), values of the degree of clustering greater than 0 indicate spatial clustering of the cell type of interest, while values less than 0 indicate dispersion of these cells. For example, in core [3,B], FOXP3..Opal.620..Positive shows spatial dispersion for \(0\le r\le50\) (less than zero), while CD3..Opal.570..Positive shows spatial clustering for \(0\le r\le75\) (greater than zero).
The interpretation of the bivariate nearest neighbor distribution is similar to that of bivariate Ripley’s K, in that we are measuring the degree of clustering of one cell type with respect to another. The actual value of the degree of clustering is interpreted in the same manner as for the univariate nearest neighbor distribution.
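A minimal base R sketch of a bivariate nearest-neighbor \(G\) without edge correction: for each anchor cell, take the distance to its nearest counted cell, and report the share of anchor cells for which that distance is at most \(r\). Measuring from anchor cells to counted cells mirrors the anchor/counted convention used for bivariate \(K\), but it is an assumption here, as is the hypothetical name `bi_g_naive()`.

```r
# Bivariate nearest-neighbor G without edge correction.
# ax, ay: anchor cell coordinates; cx, cy: counted cell coordinates.
bi_g_naive <- function(ax, ay, cx, cy, r) {
  d <- sqrt(outer(ax, cx, "-")^2 + outer(ay, cy, "-")^2)  # anchors x counted
  nn <- apply(d, 1, min)      # distance from each anchor to its nearest counted cell
  vapply(r, function(rr) mean(nn <= rr), numeric(1))
}

# Two anchor cells whose nearest counted cells are 1 and 4 units away
bi_g_naive(ax = c(0, 10), ay = c(0, 0), cx = c(1, 14), cy = c(0, 0), r = c(1, 4))
# [1] 0.5 1.0
```

Unlike the naive bivariate \(K\), this quantity is not symmetric: swapping the anchor and counted roles generally changes the result.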
x <- spatialTIME::bi_NN_G(mif = x, mnames = c("CD3..CD8.", "CD3..FOXP3."), num_permutations = 10,
                          edge_correction = 'rs', r = seq(0,100,10),
                          keep_perm_dis = FALSE, workers = 1)
x$derived$bivariate_NN %>%
  dplyr::filter(anchor == 'CD3..FOXP3.') %>%
  ggplot2::ggplot(ggplot2::aes(x = r, y = `Degree of Clustering Permutation`)) +
  ggplot2::geom_line(ggplot2::aes(color = deidentified_sample), show.legend = FALSE) +
  ggplot2::theme_bw() + ggplot2::scale_color_manual(values = values)
#> Warning: Removed 27 row(s) containing missing values (geom_path).