
Generating realistic data with known truth using the jointseg package

M. Pierre-Jean, G. Rigaill, P. Neuvial

2019-01-11

This vignette describes how to use the jointseg package to partition bivariate DNA copy number signals from SNP array data into segments of constant parent-specific copy number. We demonstrate the use of the PSSeg function of this package for applying two different strategies. Both strategies consist of first identifying a list of candidate change points through a fast (greedy) segmentation method, and then pruning this list using dynamic programming [1]. The segmentation method presented here is Recursive Binary Segmentation (RBS, [2]). We refer to [3] for a more comprehensive performance assessment of this method and other segmentation methods.

Keywords: segmentation, change point model, binary segmentation, dynamic programming, DNA copy number, parent-specific copy number.

Please see the "Citing jointseg" section below for how to cite jointseg.

This vignette illustrates how the jointseg package may be used to generate a variety of copy-number profiles from the same biological "truth". Such profiles have been used to compare the performance of segmentation methods in [3].

Citing jointseg

citation("jointseg")
## 
## To cite package 'jointseg' in publications, please use the
## following references:
## 
##   Morgane Pierre-Jean, Guillem Rigaill and Pierre Neuvial ().
##   jointseg: Joint segmentation of multivariate (copy number)
##   signals. R package version 1.0.2.
## 
##   Morgane Pierre-Jean, Guillem Rigaill and Pierre Neuvial.
##   Performance evaluation of DNA copy number segmentation methods.
##   Briefings in Bioinformatics (2015) 16 (4): 600-615.
## 
## To see these entries in BibTeX format, use 'print(<citation>,
## bibtex=TRUE)', 'toBibtex(.)', or set
## 'options(citation.bibtex.max=999)'.

Setup

The parameters are defined as follows:

library("jointseg")                      ## getCopyNumberDataByResampling(), plotSeg()
n <- 1e4                                 ## signal length
bkp <- c(2334, 6121)                     ## breakpoint positions
regions <- c("(1,1)", "(1,2)", "(0,2)")  ## copy number regions
ylims <- cbind(c(0, 5), c(-0.1, 1.1))
colG <- rep("#88888855", n)
hetCol <- "#00000088"
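To make the roles of bkp and regions concrete, the sketch below tabulates the three segments that these parameters imply. It assumes that region labels follow a "(minor,major)" convention for parent-specific copy numbers. The segments data frame and its column names are ours, for illustration only, and are not part of jointseg.

```r
n <- 1e4
bkp <- c(2334, 6121)
regions <- c("(1,1)", "(1,2)", "(0,2)")

## Parse "(minor,major)" labels into numeric copy numbers
## (assumed labelling convention)
cn <- do.call(rbind, lapply(strsplit(gsub("[()]", "", regions), ","), as.numeric))
colnames(cn) <- c("minor", "major")

## One row per segment: two breakpoints delimit three segments
segments <- data.frame(
    region = regions,
    start = c(1, bkp + 1),
    end = c(bkp, n),
    cn
)
segments$length <- segments$end - segments$start + 1
segments$total <- segments$minor + segments$major
print(segments)
```

Under this reading, the first and third regions share the same total copy number and differ only in their allelic composition, which is why a bivariate signal is needed to tell them apart.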

For convenience we define a custom plot function for this vignette:

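plotFUN <- function(dataSet, tumorFraction) {
    ## annotated region-level data for this data set and tumor fraction
    regDat <- acnr::loadCnRegionData(dataSet=dataSet, tumorFraction=tumorFraction)
    ## resample a profile of length n with breakpoints at bkp
    sim <- getCopyNumberDataByResampling(n, bkp=bkp,
                                         regions=regions, regData=regDat)
    dat <- sim$profile
    ## draw heterozygous SNPs (genotype 1/2) in a darker color
    wHet <- which(dat$genotype==1/2)
    colGG <- colG
    colGG[wHet] <- hetCol
    plotSeg(dat, sim$bkp, col=colGG)
}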
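The idea behind generating a profile by resampling can be sketched in base R: for each segment, loci are drawn with replacement from a pool of loci annotated with the matching region label. This is only an illustration under simplifying assumptions, not the jointseg implementation. The pool below is simulated, with a single signal column and hypothetical mean levels, whereas the real pools come from acnr::loadCnRegionData(). The names pool, segLengths, pieces and profile are ours.

```r
## Sketch of segment-wise resampling in base R (illustration only,
## not the jointseg implementation)
set.seed(1)
n <- 1e4
bkp <- c(2334, 6121)
regions <- c("(1,1)", "(1,2)", "(0,2)")

## Hypothetical pool of annotated loci; the values are simulated here
pool <- data.frame(
    c = c(rnorm(500, mean=2, sd=0.3), rnorm(500, mean=3, sd=0.3),
          rnorm(500, mean=2, sd=0.3)),
    region = rep(regions, each=500),
    stringsAsFactors = FALSE
)

## Number of loci in each segment delimited by the breakpoints
segLengths <- diff(c(0, bkp, n))

## For each segment, draw loci with replacement from the matching region
pieces <- lapply(seq_along(regions), function(i) {
    candidates <- which(pool$region == regions[i])
    pool[sample(candidates, size=segLengths[i], replace=TRUE), ]
})
profile <- do.call(rbind, pieces)
rownames(profile) <- NULL

## The region label changes exactly at the requested breakpoints
which(profile$region[-1] != profile$region[-n])
```

Because the breakpoints and region labels are chosen by the user, the "truth" is known for every simulated locus, while the noise is inherited from the pool rather than from a parametric model.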

Affymetrix data

ds <- "GSE29172"
pct <- 1
plotFUN(ds, pct)
[Figure: Data set GSE29172, 100% tumor cells]

plotFUN(ds, pct)
[Figure: Data set GSE29172, 100% tumor cells (another resampling)]
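Two calls with the same arguments give different profiles because each call draws a new random sample. If a reproducible figure is needed, fixing the random seed before the call should suffice, assuming the resampling relies on R's standard random number generator (we have not checked this in the package source). The function drawIndices below is hypothetical and only stands in for the resampling step.

```r
## Stand-in for a resampling step: draw 5 indices with replacement
drawIndices <- function() sample(seq_len(100), size=5, replace=TRUE)

set.seed(42)
first <- drawIndices()
set.seed(42)
second <- drawIndices()

## Same seed, same draw
identical(first, second)
```

In the vignette's setting this would amount to calling set.seed() immediately before plotFUN(ds, pct).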

pct <- 0.7
plotFUN(ds, pct)
[Figure: Data set GSE29172, 70% tumor cells]

pct <- 0.5
plotFUN(ds, pct)
[Figure: Data set GSE29172, 50% tumor cells]

Illumina data

ds <- "GSE11976"

Session information

sessionInfo()
## R version 3.5.1 (2018-07-02)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: OS X El Capitan 10.11.6
## 
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] C/fr_FR.UTF-8/fr_FR.UTF-8/C/fr_FR.UTF-8/fr_FR.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] jointseg_1.0.2 knitr_1.20    
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.0         matrixStats_0.54.0 digest_0.6.18     
##  [4] rprojroot_1.3-2    acnr_1.0.0         backports_1.1.2   
##  [7] magrittr_1.5       evaluate_0.12      highr_0.7         
## [10] stringi_1.2.4      rmarkdown_1.10     tools_3.5.1       
## [13] stringr_1.3.1      yaml_2.2.0         compiler_3.5.1    
## [16] htmltools_0.3.6    DNAcopy_1.54.0

References

[1] Bellman, Richard. 1961. “On the Approximation of Curves by Line Segments Using Dynamic Programming.” Communications of the ACM 4 (6). ACM: 284.

[2] Gey, Servane, et al. 2008. “Using CART to Detect Multiple Change Points in the Mean for Large Sample.” https://hal.archives-ouvertes.fr/hal-00327146.

[3] Pierre-Jean, Morgane, et al. 2015. “Performance Evaluation of DNA Copy Number Segmentation Methods.” Briefings in Bioinformatics 16 (4): 600-615.
