The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Introduction

HR-SIP method workflow:

Remove sparsely occuring OTUs under the assumption that they are sequencing errors.
- This is assuming that DNA from and OTU is likely going to be found in multiple gradient fractions due to differences in G+C content, diffusion, etc.
- So, if an OTU is just detected in only one fraction of a gradient, we can't tell whether it is a sequencing artifact or a real, but rare OTU.
For each OTU, use DESeq2 to detect significant enrichment in 'heavy' gradient fractions.
- 'heavy' gradients are selected ad hoc
- Multiple hypotheses are adjusted for.
- Significantly enriched OTUs are considered to have incorporated isotope (incorporators)

MW-HR-SIP method workflow:

The method is similar to HR-SIP, except instead of 1 'heavy' BD window used, multiple windows are used, but the tradeoff is the higher number of multiple hypotheses that must be corrected for.
- For each OTU, the BD window with highest log2 fold change (l2fc) value is used, so the BD window can vary among OTUs.
This method has been shown to be generally more sensitive (more true positives) than HR-SIP and more specific (less false positives) than q-SIP. See the manuscript “Multiple window High Resolution Stable Isotope Probing (MW-HR-SIP)” for more information.

Dataset

First, let's load some packages including HTSSIP.

library(dplyr)
library(ggplot2)
library(HTSSIP)

See HTSSIP introduction vignette for a description on why dataset parsing (all treatment-control comparisons) is needed.

Let's see the already parsed dataset

physeq_S2D2_l

## $`(Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 46 samples ]
## sample_data() Sample Data:       [ 46 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]
## 
## $`(Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Cel' & Day == '14')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 46 samples ]
## sample_data() Sample Data:       [ 46 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]
## 
## $`(Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Glu' & Day == '14')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 47 samples ]
## sample_data() Sample Data:       [ 47 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]
## 
## $`(Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Glu' & Day == '3')`
## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 46 samples ]
## sample_data() Sample Data:       [ 46 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]

HR-SIP

Let's set some parameters used later.

# adjusted P-value cutoff 
padj_cutoff = 0.1
# number of cores for parallel processing (increase depending on your computational hardware)
ncores = 2

One treatment-control comparison

First, we'll just run HR-SIP on 1 treatment-control comparison. Let's get the individual phyloseq object.

physeq = physeq_S2D2_l[[1]]
physeq

## phyloseq-class experiment-level object
## otu_table()   OTU Table:         [ 1072 taxa and 46 samples ]
## sample_data() Sample Data:       [ 46 samples by 17 sample variables ]
## tax_table()   Taxonomy Table:    [ 1072 taxa by 8 taxonomic ranks ]
## phy_tree()    Phylogenetic Tree: [ 1072 tips and 1071 internal nodes ]

Let's check that the samples belong to either a 13C-treatment or 12C-control.

physeq %>% sample_data %>% .$Substrate %>% table

## .
## 12C-Con 13C-Cel 
##      23      23

OK, we should be ready to run HR-SIP!

Note that the design parameter for HRSIP() is the experimental design parameter for calculating log2 fold change (l2fc) values with DESeq. Here, it's used to distinguish label-treatment and unlabel-control samples.

df_l2fc = HRSIP(physeq,
                design = ~Substrate,
                padj_cutoff = padj_cutoff,
                sparsity_threshold = c(0,0.15,0.3))   # just using 3 thresholds to reduce time

## Sparsity threshold: 0 
## Density window: 1.7-1.75 
## Sparsity threshold: 0.15 
## Density window: 1.7-1.75 
## Sparsity threshold: 0.3 
## Density window: 1.7-1.75 
## Sparsity threshold with the most rejected hypotheses: 0

df_l2fc %>% head(n=3)

## # A tibble: 3 x 17
##   OTU   log2FoldChange       p   padj Rank1 Rank2 Rank3 Rank4 Rank5 Rank6
##   <chr>          <dbl>   <dbl>  <dbl> <fct> <fct> <fct> <fct> <fct> <fct>
## 1 OTU.…         -0.133 0.745   1.000  Bact… __Pr… __De… __De… __Ni… __un…
## 2 OTU.…          1.39  0.0486  0.427  Bact… __Ac… __RB… __un… <NA>  <NA> 
## 3 OTU.…          1.53  0.00326 0.0976 Bact… __Ac… __DA… __un… <NA>  <NA> 
## # … with 7 more variables: Rank7 <fct>, Rank8 <fct>, density_min <dbl>,
## #   density_max <dbl>, sparsity_threshold <dbl>, sparsity_apply <chr>,
## #   l2fc_threshold <dbl>

How many “incorporators”“ (rejected hypotheses)?

df_l2fc %>% 
  filter(padj < padj_cutoff) %>%
  group_by() %>%
  summarize(n_incorp_OTUs = OTU %>% unique %>% length)

## # A tibble: 1 x 1
##   n_incorp_OTUs
##           <int>
## 1            36

Let's plot a breakdown of incorporators for each phylum.

# summarizing
df_l2fc_s = df_l2fc %>% 
  filter(padj < padj_cutoff) %>%
  mutate(Rank2 = gsub('^__', '', Rank2)) %>%
  group_by(Rank2) %>%
  summarize(n_incorp_OTUs = OTU %>% unique %>% length) %>%
  ungroup()

# plotting
ggplot(df_l2fc_s, aes(Rank2, n_incorp_OTUs)) +
    geom_bar(stat='identity') +
    labs(x='Phylum', y='Number of incorporators') +
    theme_bw() +
    theme(
      axis.text.x = element_text(angle=45, hjust=1)
    )

plot of chunk HRSIP_res_plotting

All treatment-control comparisons

Let's now run HR-SIP on all treatment-control comparisons in the dataset:

# Number of comparisons
physeq_S2D2_l %>% length

## [1] 4

The function plyr::ldply() is useful (compared to lapply()) beccause it can be run in parallel and returns a data.frame object.

# Running in parallel; you may need to change the backend for your environment.
# Or you can just set .parallel=FALSE. 
doParallel::registerDoParallel(ncores)

df_l2fc = plyr::ldply(physeq_S2D2_l, 
                      HRSIP, 
                      design = ~Substrate, 
                      padj_cutoff = padj_cutoff,
                      sparsity_threshold = c(0,0.15,0.3),  # just using 3 thresholds to reduce run time
                      .parallel=TRUE)
df_l2fc %>% head(n=3)

##                                                                       .id
## 1 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
## 2 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
## 3 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
##       OTU log2FoldChange           p       padj    Rank1            Rank2
## 1 OTU.514     -0.1326734 0.744892882 0.99999999 Bacteria __Proteobacteria
## 2 OTU.729      1.3853310 0.048577337 0.42707575 Bacteria  __Acidobacteria
## 3 OTU.322      1.5298136 0.003258379 0.09756653 Bacteria  __Acidobacteria
##                   Rank3                                Rank4
## 1 __Deltaproteobacteria                  __Desulfobacterales
## 2                __RB25 __uncultured_Acidobacteria_bacterium
## 3               __DA023               __uncultured_bacterium
##              Rank5        Rank6                  Rank7 Rank8 density_min
## 1 __Nitrospinaceae __uncultured __uncultured_bacterium  <NA>         1.7
## 2             <NA>         <NA>                   <NA>  <NA>         1.7
## 3             <NA>         <NA>                   <NA>  <NA>         1.7
##   density_max sparsity_threshold sparsity_apply l2fc_threshold
## 1        1.75                  0            all           0.25
## 2        1.75                  0            all           0.25
## 3        1.75                  0            all           0.25

Each specific phyloseq subset (treatment-control comparison) is delimited with the ”.id" column.

df_l2fc %>% .$.id %>% unique

## [1] "(Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')"  
## [2] "(Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Cel' & Day == '14')"
## [3] "(Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Glu' & Day == '14')"
## [4] "(Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Glu' & Day == '3')"

For clarity, let's edit these long strings to make them more readable when plotted.

df_l2fc = df_l2fc %>%
  mutate(.id = gsub(' \\| ', '\n', .id))
df_l2fc %>% .$.id %>% unique

## [1] "(Substrate=='12C-Con' & Day=='3')\n(Substrate=='13C-Cel' & Day == '3')"  
## [2] "(Substrate=='12C-Con' & Day=='14')\n(Substrate=='13C-Cel' & Day == '14')"
## [3] "(Substrate=='12C-Con' & Day=='14')\n(Substrate=='13C-Glu' & Day == '14')"
## [4] "(Substrate=='12C-Con' & Day=='3')\n(Substrate=='13C-Glu' & Day == '3')"

How many incorporators (rejected hypotheses) & which sparsity cutoff was used for each comparison?

Note: you could set one sparsity cutoff for all comparisons by setting the sparsity_cutoff to a specific value.

df_l2fc %>% 
  filter(padj < padj_cutoff) %>%
  group_by(.id, sparsity_threshold) %>%
  summarize(n_incorp_OTUs = OTU %>% unique %>% length) %>%
  as.data.frame

##                                                                        .id
## 1 (Substrate=='12C-Con' & Day=='14')\n(Substrate=='13C-Cel' & Day == '14')
## 2 (Substrate=='12C-Con' & Day=='14')\n(Substrate=='13C-Glu' & Day == '14')
## 3   (Substrate=='12C-Con' & Day=='3')\n(Substrate=='13C-Cel' & Day == '3')
## 4   (Substrate=='12C-Con' & Day=='3')\n(Substrate=='13C-Glu' & Day == '3')
##   sparsity_threshold n_incorp_OTUs
## 1                0.3            97
## 2                0.0           189
## 3                0.0            36
## 4                0.3            65

How about a breakdown of incorporators for each phylum in each comparision.

# summarizing
df_l2fc_s = df_l2fc %>% 
  filter(padj < padj_cutoff) %>%
  mutate(Rank2 = gsub('^__', '', Rank2)) %>%
  group_by(.id, Rank2) %>%
  summarize(n_incorp_OTUs = OTU %>% unique %>% length) %>%
  ungroup()

# plotting
ggplot(df_l2fc_s, aes(Rank2, n_incorp_OTUs)) +
    geom_bar(stat='identity') +
    labs(x='Phylum', y='Number of incorporators') +
    facet_wrap(~ .id, scales='free') +
    theme_bw() +
    theme(
      axis.text.x = element_text(angle=55, hjust=1)
    )

MW-HR-SIP

MW-HR-SIP is run very similarly to HRSIP, but it uses multiple buoyant density (BD) windows. MW-HR-SIP is performed with the HRSIP() function, but multiple BD windows are specified.

Let's use 3 buoyant density windows (g/ml):

1.70-1.73, 1.72-1.75, 1.74-1.77

windows = data.frame(density_min=c(1.70, 1.72, 1.74), 
                     density_max=c(1.73, 1.75, 1.77))
windows

##   density_min density_max
## 1        1.70        1.73
## 2        1.72        1.75
## 3        1.74        1.77

Running HRSIP with all 3 BD windows. Let's run this in parallel to speed things up.

You can turn off parallel processing by setting the parallel option to FALSE. Also, there's 2 different levels that could be parallelized: either the ldply() or HRSIP(). Here, I'm running HRSIP in parallel, but it may make sense in other situations (eg., many comparisons but few density windows and/or sparsity cutoffs) to use ldply in parallel only.

doParallel::registerDoParallel(ncores)

df_l2fc = plyr::ldply(physeq_S2D2_l, 
                      HRSIP, 
                      density_windows = windows,
                      design = ~Substrate, 
                      padj_cutoff = padj_cutoff,
                      sparsity_threshold = c(0,0.15,0.3), # just using 3 thresholds to reduce run time
                      .parallel = TRUE)
df_l2fc %>% head(n=3)

##                                                                       .id
## 1 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
## 2 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
## 3 (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
##       OTU log2FoldChange          p      padj    Rank1            Rank2
## 1 OTU.514    -0.06192716 0.64317475 1.0000000 Bacteria __Proteobacteria
## 2 OTU.729     1.40750346 0.11190135 0.6588185 Bacteria  __Acidobacteria
## 3 OTU.386     1.43831256 0.03408535 0.3867674 Bacteria  __Acidobacteria
##                   Rank3                                Rank4
## 1 __Deltaproteobacteria                  __Desulfobacterales
## 2                __RB25 __uncultured_Acidobacteria_bacterium
## 3                __RB25               __uncultured_bacterium
##              Rank5        Rank6                  Rank7 Rank8 density_min
## 1 __Nitrospinaceae __uncultured __uncultured_bacterium  <NA>         1.7
## 2             <NA>         <NA>                   <NA>  <NA>         1.7
## 3             <NA>         <NA>                   <NA>  <NA>         1.7
##   density_max sparsity_threshold sparsity_apply l2fc_threshold
## 1        1.73                  0            all           0.25
## 2        1.73                  0            all           0.25
## 3        1.73                  0            all           0.25

Let's check that we have all treatment-control comparisons.

df_l2fc %>% .$.id %>% unique

## [1] "(Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')"  
## [2] "(Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Cel' & Day == '14')"
## [3] "(Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Glu' & Day == '14')"
## [4] "(Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Glu' & Day == '3')"

How many incorporators (rejected hypotheses) & which sparsity cutoff was used for each comparison?

Note: one sparsity cutoff could be set for all comparisons by setting the sparsity_cutoff to a specific value.

df_l2fc %>% 
  filter(padj < padj_cutoff) %>%
  group_by(.id, sparsity_threshold) %>%
  summarize(n_incorp_OTUs = OTU %>% unique %>% length) %>%
  as.data.frame

##                                                                         .id
## 1 (Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Cel' & Day == '14')
## 2 (Substrate=='12C-Con' & Day=='14') | (Substrate=='13C-Glu' & Day == '14')
## 3   (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Cel' & Day == '3')
## 4   (Substrate=='12C-Con' & Day=='3') | (Substrate=='13C-Glu' & Day == '3')
##   sparsity_threshold n_incorp_OTUs
## 1                0.0           119
## 2                0.0           245
## 3                0.0            43
## 4                0.3            85

The density windows can vary for each OTU. Let's plot which density windows were used for the OTUs in the dataset.

# summarizing
df_l2fc_s = df_l2fc %>% 
  mutate(.id = gsub(' \\| ', '\n', .id)) %>%
  filter(padj < padj_cutoff) %>%
  mutate(density_range = paste(density_min, density_max, sep='-')) %>% 
  group_by(.id, sparsity_threshold, density_range) %>%
  summarize(n_incorp_OTUs = OTU %>% unique %>% length) 

#plotting
ggplot(df_l2fc_s, aes(.id, n_incorp_OTUs, fill=density_range)) +
    geom_bar(stat='identity', position='fill') +
    labs(x='Control-treatment comparision', y='Fraction of incorporators') +
    scale_y_continuous(expand=c(0,0)) +
    theme_bw() +
    theme(
      axis.text.x = element_text(angle=55, hjust=1)
    )

plot of chunk MWHRSIP_plotting_densWin

Different BD windows were used for different treatment-control comparisons because the amount of BD shift likely varied among taxa. For example, if a taxon incorporates 100% 13C isotope, then a very 'heavy' BD window may show a larger l2fc than a less 'heavy' BD window.

Let's look at a breakdown of incorporators for each phylum in each comparision.

# summarizing
df_l2fc_s = df_l2fc %>% 
  mutate(.id = gsub(' \\| ', '\n', .id)) %>%
  filter(padj < padj_cutoff) %>%
  mutate(Rank2 = gsub('^__', '', Rank2)) %>%
  group_by(.id, Rank2) %>%
  summarize(n_incorp_OTUs = OTU %>% unique %>% length) %>%
  ungroup()

# plotting
ggplot(df_l2fc_s, aes(Rank2, n_incorp_OTUs)) +
    geom_bar(stat='identity') +
    labs(x='Phylum', y='Number of incorporators') +
    facet_wrap(~ .id, scales='free') +
    theme_bw() +
    theme(
      axis.text.x = element_text(angle=55, hjust=1)
    )

plot of chunk MWHRSIP_plotting_incorp

Note that the MW-HR-SIP method identifies more incorporators than the HR-SIP method (which uses just one BD window).

MW-HR-SIP detects more taxa for 2 main reasons. First, taxa vary in G+C content, so using only 1 BD window likely encompasses BD shifts for taxa of certain G+C contents (eg., ~50% G+C), but may miss other taxa with higher or lower G+C content. Second, taxa can vary in how much isotope was incorporated, which will affect where each taxon's DNA is in the density gradient.

Session info

sessionInfo()

## R version 3.4.3 (2017-11-30)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Sierra 10.12.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.4/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] phyloseq_1.22.3 HTSSIP_1.4.1    ggplot2_3.2.0   tidyr_0.8.3    
## [5] dplyr_0.8.0.1  
## 
## loaded via a namespace (and not attached):
##   [1] nlme_3.1-131               bitops_1.0-6              
##   [3] matrixStats_0.54.0         bit64_0.9-7               
##   [5] doParallel_1.0.14          RColorBrewer_1.1-2        
##   [7] GenomeInfoDb_1.14.0        tools_3.4.3               
##   [9] backports_1.1.4            utf8_1.1.4                
##  [11] R6_2.4.0                   coenocliner_0.2-2         
##  [13] vegan_2.5-1                rpart_4.1-15              
##  [15] Hmisc_4.2-0                DBI_1.0.0                 
##  [17] lazyeval_0.2.2             BiocGenerics_0.24.0       
##  [19] mgcv_1.8-28                colorspace_1.4-1          
##  [21] permute_0.9-5              ade4_1.7-13               
##  [23] nnet_7.3-12                withr_2.1.2               
##  [25] tidyselect_0.2.5           gridExtra_2.3             
##  [27] DESeq2_1.18.1              bit_1.1-14                
##  [29] compiler_3.4.3             cli_1.1.0                 
##  [31] Biobase_2.38.0             htmlTable_1.13.1          
##  [33] DelayedArray_0.4.1         labeling_0.3              
##  [35] scales_1.0.0               checkmate_1.9.3           
##  [37] genefilter_1.60.0          stringr_1.4.0             
##  [39] digest_0.6.19              foreign_0.8-71            
##  [41] XVector_0.18.0             base64enc_0.1-3           
##  [43] pkgconfig_2.0.2            htmltools_0.3.6           
##  [45] highr_0.8                  htmlwidgets_1.3           
##  [47] rlang_0.4.0                RSQLite_2.1.1             
##  [49] rstudioapi_0.10            jsonlite_1.6              
##  [51] BiocParallel_1.12.0        acepack_1.4.1             
##  [53] RCurl_1.95-4.12            magrittr_1.5              
##  [55] GenomeInfoDbData_1.0.0     Formula_1.2-3             
##  [57] biomformat_1.6.0           Matrix_1.2-17             
##  [59] fansi_0.4.0                Rcpp_1.0.1                
##  [61] munsell_0.5.0              S4Vectors_0.16.0          
##  [63] ape_5.3                    stringi_1.4.3             
##  [65] MASS_7.3-51.4              SummarizedExperiment_1.8.1
##  [67] zlibbioc_1.24.0            rhdf5_2.22.0              
##  [69] plyr_1.8.4                 blob_1.1.1                
##  [71] grid_3.4.3                 parallel_3.4.3            
##  [73] crayon_1.3.4               lattice_0.20-38           
##  [75] Biostrings_2.46.0          splines_3.4.3             
##  [77] multtest_2.34.0            annotate_1.56.2           
##  [79] locfit_1.5-9.1             zeallot_0.1.0             
##  [81] knitr_1.18                 pillar_1.4.1              
##  [83] igraph_1.2.4               GenomicRanges_1.30.3      
##  [85] markdown_0.9               geneplotter_1.56.0        
##  [87] reshape2_1.4.3             codetools_0.2-16          
##  [89] stats4_3.4.3               XML_3.98-1.19             
##  [91] glue_1.3.1                 evaluate_0.14             
##  [93] latticeExtra_0.6-28        data.table_1.10.4-3       
##  [95] vctrs_0.1.0                foreach_1.4.4             
##  [97] gtable_0.3.0               purrr_0.3.2               
##  [99] assertthat_0.2.1           mime_0.6                  
## [101] xtable_1.8-4               survival_2.44-1.1         
## [103] tibble_2.1.1               iterators_1.0.10          
## [105] memoise_1.1.0              AnnotationDbi_1.40.0      
## [107] IRanges_2.12.0             cluster_2.0.6

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.