The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

CascadeData

Experimental Data of Cascade Experiments in Genomics

Frédéric Bertrand and Myriam Maumy-Bertrand

Lifecycle: stable Project Status: Active – The project has reached a stable, usable state and is being actively developed. R-CMD-check Codecov test coverage CRAN status CRAN RStudio mirror downloads GitHub Repo stars DOI

The goal of CascadeData is to provide the experimental data GSE39411 in a ready to use format. Vallat L, Kemper CA, Jung N, Maumy-Bertrand M, Bertrand F, …, Bahram S, (2013), “Reverse-engineering the genetic circuitry of a cancer cell with predicted intervention in chronic lymphocytic leukemia”. Proc Natl Acad Sci USA, 110(2):459-64, https://dx.doi.org/10.1073/pnas.1211130110.

These are featured as examples by packages such as the Cascade one, a modeling tool allowing gene selection, reverse engineering, and prediction in cascade networks. (Jung, N., Bertrand, F., Bahram, S., Vallat, L., and Maumy-Bertrand, M., 2014, https://dx.doi.org/10.1093/bioinformatics/btt705).

This website and these examples were created by F. Bertrand and M. Maumy-Bertrand.

Installation

You can install the released version of CascadeData from CRAN with:

install.packages("CascadeData")

You can install the development version of CascadeData from github with:

devtools::install_github("fbertran/CascadeData")

Examples

Probesets

Two data frames with 54613 probesets with repeated measurements on 6 independent subjects at 4 time points.

data(micro_S)
str(micro_S)
#> 'data.frame':    54613 obs. of  24 variables:
#>  $ N1_S_T60 : num  136.1 32 78 201.8 16.3 ...
#>  $ N1_S_T90 : num  116.6 43.3 63.5 209.2 8 ...
#>  $ N1_S_T210: num  127.6 31.3 57.9 208.8 15.8 ...
#>  $ N1_S_T390: num  126.8 43.8 73.2 228.2 15.8 ...
#>  $ N2_S_T60 : num  142.2 25.4 53.2 165.3 17.8 ...
#>  $ N2_S_T90 : num  132.5 35.4 97.1 222.3 10.7 ...
#>  $ N2_S_T210: num  122.8 33.3 66.5 175.4 12.9 ...
#>  $ N2_S_T390: num  132.1 55.7 69.7 178.5 13.9 ...
#>  $ N3_S_T60 : num  134 60 52.1 214.6 6.6 ...
#>  $ N3_S_T90 : num  157.1 48.7 59.9 279.5 3.7 ...
#>  $ N3_S_T210: num  49.1 45.7 44.1 181.2 1.3 ...
#>  $ N3_S_T390: num  139.8 41.4 51.8 264.9 19.5 ...
#>  $ N4_S_T60 : num  157.4 66.8 74 214.1 26.1 ...
#>  $ N4_S_T90 : num  158.3 44.8 56.8 222.7 17.9 ...
#>  $ N4_S_T210: num  145.9 42.5 59.8 221.8 18.5 ...
#>  $ N4_S_T390: num  171.2 59.2 62.3 193.3 25.1 ...
#>  $ N5_S_T60 : num  130.1 60.8 63.8 208.5 18.6 ...
#>  $ N5_S_T90 : num  119.7 41.8 47.2 222.8 26.9 ...
#>  $ N5_S_T210: num  121.3 40.5 62.7 226 21.5 ...
#>  $ N5_S_T390: num  110 44 63 227.8 27.2 ...
#>  $ N6_S_T60 : num  114.5 60.7 79.5 219.4 22.1 ...
#>  $ N6_S_T90 : num  144 38.9 79.4 234.8 28.4 ...
#>  $ N6_S_T210: num  128.4 38.7 63.4 241.9 30.5 ...
#>  $ N6_S_T390: num  132.7 52.3 66.6 218.7 19.2 ...
data(micro_US)
str(micro_US)
#> 'data.frame':    54613 obs. of  24 variables:
#>  $ N1_US_T60 : num  103.2 26 70.7 213.7 13.7 ...
#>  $ N1_US_T90 : num  133.7 34.9 71.2 168.9 17.2 ...
#>  $ N1_US_T210: num  157.3 44.2 59.4 175.1 27.8 ...
#>  $ N1_US_T390: num  179.4 44.1 62.7 225.1 10.5 ...
#>  $ N2_US_T60 : num  120.7 47.4 85.8 156.7 16.7 ...
#>  $ N2_US_T90 : num  150.5 40.4 74.4 200.4 25.8 ...
#>  $ N2_US_T210: num  153.5 51.8 71.7 181.6 23 ...
#>  $ N2_US_T390: num  174.8 61.6 66.8 224.5 12.2 ...
#>  $ N3_US_T60 : num  145 64.5 50.8 184.5 17.5 ...
#>  $ N3_US_T90 : num  162.2 58.7 51.9 219.1 21.5 ...
#>  $ N3_US_T210: num  177.8 35.9 58.3 250 12.7 ...
#>  $ N3_US_T390: num  189.6 50.5 62.8 262 20.1 ...
#>  $ N4_US_T60 : num  136 60.9 67.8 227.1 29.6 ...
#>  $ N4_US_T90 : num  135.2 62.2 65.3 221.1 18.2 ...
#>  $ N4_US_T210: num  165.3 62.7 63.3 212.1 21.4 ...
#>  $ N4_US_T390: num  154.3 47.1 68.9 199.7 9.3 ...
#>  $ N5_US_T60 : num  130.9 57.5 56.5 199 20.5 ...
#>  $ N5_US_T90 : num  122.3 49.4 62 226.8 15.4 ...
#>  $ N5_US_T210: num  184.8 57.2 53.4 191.2 18.1 ...
#>  $ N5_US_T390: num  174.6 68 69.1 214 16.5 ...
#>  $ N6_US_T60 : num  142.8 67.9 73.3 220.1 24.7 ...
#>  $ N6_US_T90 : num  148.1 46.4 79.9 257.7 20.7 ...
#>  $ N6_US_T210: num  198.1 53.9 68.6 223.3 18.9 ...
#>  $ N6_US_T390: num  155.8 42.6 63.2 236.7 15.2 ...

Some preliminary between group comparison. First create grouping factor.

groupf=factor(c(rep("S",ncol(micro_S)),rep("US",ncol(micro_US))))

Then, create the 2 most discriminative components (probeset linear combinaison, i.e. scores) featuring 100 probesets each using sparse partial least squares discrimant analysis from the mixOmics package, doi:10.18129/B9.bioc.mixOmics. An optimal choice of the number of components and of the number of kept genes can be carried out using cross-validation.

First makes sure that the mixOmics Bioconductor package is installed.

if (!requireNamespace("limma", quietly = TRUE)){
  if (!requireNamespace("BiocManager", quietly = TRUE)){
    install.packages("BiocManager")
    }
  BiocManager::install("limma")
}
modsplsda=mixOmics::splsda(t(cbind(micro_S,micro_US)),groupf, ncomp = 2, 
keepX = c(100, 100))

Create a clustered image map (cim, i.e. heat map) to represent the results of the splsda analysis.

mixOmics::cim(modsplsda)

plot of chunk cim

Retrieve the names of the probesets that were selected to create the 2 splsda components

selectedprobesets<-unique(mixOmics::selectVar(modsplsda)$name,
mixOmics::selectVar(modsplsda, comp=2)$name)

First makes sure that the limma Bioconductor package is installed.

if (!requireNamespace("limma", quietly = TRUE)){
  if (!requireNamespace("BiocManager", quietly = TRUE)){
    install.packages("BiocManager")
    }
  BiocManager::install("limma")
}

Using the limma, doi:10.18129/B9.bioc.limma, plotMDS function to create the multidimensional scaling plot of distances between the probeset expression profiles that were selected using splsda.

limma::plotMDS(cbind(micro_S,micro_US)[selectedprobesets,])

plot of chunk plotMDS

Entrez GeneIDs

The jetset package enables the selection of optimal probe sets from the HG-U95Av2, HG-U133A, HG-U133 Plus 2.0, or U133 X3P microarray platforms. It requires the org.Hs.eg.db Bioconductor package, doi:10.18129/B9.bioc.org.Hs.eg.db.

First makes sure that the jetset CRAN package and the org.Hs.eg.db Bioconductor package are installed.

if (!requireNamespace("org.Hs.eg.db", quietly = TRUE)){
  if (!requireNamespace("BiocManager", quietly = TRUE)){
      install.packages("BiocManager")
    }
  BiocManager::install("org.Hs.eg.db", version = "3.8")
}
if (!requireNamespace("jetset", quietly = TRUE)){
  install.packages("jetset")
}

Retrieve the optimal probesets for a given Entrez GeneID.

library(jetset)
resjetset=jetset::jmap("hgu133plus2", eg = sort(unique(scores.hgu133plus2$EntrezID)))

Select the optimal probesets from the datasets for a given Entrez GeneID.

micro_S_jetset<-micro_S[resjetset,]
rownames(micro_S_jetset)<-names(resjetset)
micro_US_jetset<-micro_US[resjetset,]
rownames(micro_US_jetset)<-names(resjetset)

Then, create the 2 most discriminative components (probeset linear combinaison, i.e. scores) featuring 100 probesets each using sparse partial least squares discrimant analysis from the mixOmics package, doi:10.18129/B9.bioc.mixOmics. An optimal choice of the number of components and of the number of kept genes can be carried out using cross-validation.

First makes sure that the mixOmics Bioconductor package is installed.

modsplsda_jetset=mixOmics::splsda(t(cbind(micro_S_jetset,micro_US_jetset)), 
groupf, ncomp = 2, keepX = c(100, 100))

Create a clustered image map (cim, i.e. heat map) to represent the results of the splsda analysis.

mixOmics::cim(modsplsda_jetset)

plot of chunk cimjetset

Retrieve the Entrez GeneIDs that were selected to create the 2 splsda components

selectedEntrezGeneIDs<-unique(mixOmics::selectVar(modsplsda_jetset)$name,
mixOmics::selectVar(modsplsda, comp=2)$name)

First makes sure that the limma Bioconductor package is installed.

if (!requireNamespace("limma", quietly = TRUE)){
  if (!requireNamespace("BiocManager", quietly = TRUE)){
    install.packages("BiocManager")
    }
  BiocManager::install("limma")
}

Using the limma, doi:10.18129/B9.bioc.limma, plotMDS function to create the multidimensional scaling plot of distances between the Entrez GeneID expression profiles that were selected using splsda.

limma::plotMDS(cbind(micro_S,micro_US)[selectedEntrezGeneIDs,])

plot of chunk plotMDSegid

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.