The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Analyze Copy Number Signatures with sigminer

Shixiang Wang ( wangsx1@sysucc.org.cn )

2024-05-11

Exploring copy number signatures with recently developed approach have been described at The repertoire of copy number alteration signatures in human cancer.

A more general introduction please read Extract, Analyze and Visualize Mutational Signatures with Sigminer.

library(sigminer)
#> Registered S3 method overwritten by 'sigminer':
#>   method      from
#>   print.bytes Rcpp
#> sigminer version 2.3.1
#> - Star me at https://github.com/ShixiangWang/sigminer
#> - Run hello() to see usage and citation.

For this analysis, data with six columns are required.

Generate allele-specific copy number profile

load(system.file("extdata", "toy_segTab.RData",
  package = "sigminer", mustWork = TRUE
))

set.seed(1234)
segTabs$minor_cn <- sample(c(0, 1), size = nrow(segTabs), replace = TRUE)
cn <- read_copynumber(segTabs,
  seg_cols = c("chromosome", "start", "end", "segVal"),
  genome_measure = "wg", complement = TRUE, add_loh = TRUE
)
#> ℹ [2024-05-11 12:07:22.722145]: Started.
#> ℹ [2024-05-11 12:07:22.739754]: Genome build  : hg19.
#> ℹ [2024-05-11 12:07:22.741917]: Genome measure: wg.
#> ℹ [2024-05-11 12:07:22.743996]: When add_loh is TRUE, use_all is forced to TRUE.
#> Please drop columns you don't want to keep before reading.
#> ✔ [2024-05-11 12:07:22.76519]: Chromosome size database for build obtained.
#> ℹ [2024-05-11 12:07:22.767823]: Reading input.
#> ✔ [2024-05-11 12:07:22.770067]: A data frame as input detected.
#> ✔ [2024-05-11 12:07:22.772699]: Column names checked.
#> ✔ [2024-05-11 12:07:22.777028]: Column order set.
#> ✔ [2024-05-11 12:07:22.796121]: Chromosomes unified.
#> ✔ [2024-05-11 12:07:22.824277]: Value 2 (normal copy) filled to uncalled chromosomes.
#> ✔ [2024-05-11 12:07:22.833025]: Data imported.
#> ℹ [2024-05-11 12:07:22.835415]: Segments info:
#> ℹ [2024-05-11 12:07:22.8376]:     Keep - 477
#> ℹ [2024-05-11 12:07:22.839727]:     Drop - 0
#> ✔ [2024-05-11 12:07:22.842663]: Segments sorted.
#> ℹ [2024-05-11 12:07:22.844732]: Adding LOH labels...
#> ℹ [2024-05-11 12:07:22.848268]: Joining adjacent segments with same copy number value. Be patient...
#> ✔ [2024-05-11 12:07:23.026732]: 410 segments left after joining.
#> ✔ [2024-05-11 12:07:23.029618]: Segmental table cleaned.
#> ℹ [2024-05-11 12:07:23.031965]: Annotating.
#> ✔ [2024-05-11 12:07:23.056107]: Annotation done.
#> ℹ [2024-05-11 12:07:23.058526]: Summarizing per sample.
#> ✔ [2024-05-11 12:07:23.094252]: Summarized.
#> ℹ [2024-05-11 12:07:23.09669]: Generating CopyNumber object.
#> ✔ [2024-05-11 12:07:23.099685]: Generated.
#> ℹ [2024-05-11 12:07:23.101863]: Validating object.
#> ✔ [2024-05-11 12:07:23.104097]: Done.
#> ℹ [2024-05-11 12:07:23.106758]: 0.385 secs elapsed.
cn
#> An object of class CopyNumber 
#> =============================
#>                           sample n_of_seg n_of_cnv n_of_amp n_of_del n_of_vchr
#>                           <char>    <int>    <int>    <int>    <int>     <int>
#>  1: TCGA-DF-A2KN-01A-11D-A17U-01       34        6        5        1         4
#>  2: TCGA-19-2621-01B-01D-0911-01       34        8        5        3         5
#>  3: TCGA-B6-A0X5-01A-21D-A107-01       29        8        4        4         2
#>  4: TCGA-A8-A07S-01A-11D-A036-01       39       11        2        9         4
#>  5: TCGA-26-6174-01A-21D-1842-01       44       13        8        5         8
#>  6: TCGA-CV-7432-01A-11D-2128-01       41       16        7        9         9
#>  7: TCGA-06-0644-01A-02D-0310-01       47       19        5       14         8
#>  8: TCGA-A5-A0G2-01A-11D-A042-01       40       21        5       16        10
#>  9: TCGA-99-7458-01A-11D-2035-01       49       26       10       16        13
#> 10: TCGA-05-4417-01A-22D-1854-01       53       37       33        4        17
#>     n_loh cna_burden
#>     <int>      <num>
#>  1:    15      0.000
#>  2:    20      0.095
#>  3:    18      0.083
#>  4:    21      0.106
#>  5:    24      0.113
#>  6:    24      0.188
#>  7:    33      0.158
#>  8:    23      0.375
#>  9:    33      0.304
#> 10:    29      0.617
cn@data
#>      chromosome     start       end segVal                       sample
#>          <char>     <num>     <num>  <int>                       <char>
#>   1:       chr1   3218923 116319008      2 TCGA-05-4417-01A-22D-1854-01
#>   2:       chr1 116324707 120523902      1 TCGA-05-4417-01A-22D-1854-01
#>   3:       chr1 149879545 247812431      4 TCGA-05-4417-01A-22D-1854-01
#>   4:      chr10    423671 135224372      3 TCGA-05-4417-01A-22D-1854-01
#>   5:      chr11    458784  19461653      3 TCGA-05-4417-01A-22D-1854-01
#>  ---                                                                   
#> 406:       chr6   1016984 170898549      2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 407:       chr7    746917 158385118      2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 408:       chr8    617885 145225107      2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 409:       chr9    790234 140938075      2 TCGA-DF-A2KN-01A-11D-A17U-01
#> 410:       chrX         1 155270560      2 TCGA-DF-A2KN-01A-11D-A17U-01
#>       minor_cn    loh .loh_frac
#>          <num> <lgcl>     <num>
#>   1: 1.0000000  FALSE        NA
#>   2: 0.0000000   TRUE        NA
#>   3: 0.5000000   TRUE 0.1175943
#>   4: 1.0000000  FALSE        NA
#>   5: 1.0000000  FALSE        NA
#>  ---                           
#> 406: 0.3333333   TRUE 0.9979494
#> 407: 1.0000000  FALSE        NA
#> 408: 1.0000000  FALSE        NA
#> 409: 0.5000000   TRUE 0.8328715
#> 410:        NA  FALSE        NA

Classify the segments with Steele et al method

If you want to try other type of copy number signatures, change the method argument.

tally_s <- sig_tally(cn, method = "S")
#> ℹ [2024-05-11 12:07:23.165562]: Started.
#> ℹ [2024-05-11 12:07:23.171528]: When you use method 'S', please make sure you have set 'join_adj_seg' to FALSE and 'add_loh' to TRUE in 'read_copynumber() in the previous step!
#> ✔ [2024-05-11 12:07:23.197549]: Matrix generated.
#> ℹ [2024-05-11 12:07:23.200068]: 0.034 secs elapsed.

str(tally_s$all_matrices, max.level = 1)
#> List of 2
#>  $ CN_40: int [1:10, 1:40] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..- attr(*, "dimnames")=List of 2
#>  $ CN_48: int [1:10, 1:48] 0 0 0 0 0 0 0 0 0 0 ...
#>   ..- attr(*, "dimnames")=List of 2

Find de novo signatures

sig_denovo = sig_auto_extract(tally_s$all_matrices$CN_48)
#> Select Run 3, which K = 2 as best solution.
head(sig_denovo$Signature)
#>                         Sig1          Sig2
#> 0:homdel:0-100Kb    0.000000  0.000000e+00
#> 0:homdel:100Kb-1Mb  0.000000  0.000000e+00
#> 0:homdel:>1Mb       0.000000  0.000000e+00
#> 1:LOH:0-100Kb       3.609460 3.819129e-242
#> 1:LOH:100Kb-1Mb     6.316554 2.814800e-127
#> 1:LOH:1Mb-10Mb     13.535473 2.784288e-190

Refit (19) reference signatures

This directly calculates the contribution of 19 reference signatures.

act_refit = sig_fit(t(tally_s$all_matrices$CN_48), sig_index = "ALL", sig_db = "CNS_TCGA")
#> ℹ [2024-05-11 12:07:24.377693]: Started.
#> ✔ [2024-05-11 12:07:24.379994]: Signature index detected.
#> ℹ [2024-05-11 12:07:24.382046]: Checking signature database in package.
#> ℹ [2024-05-11 12:07:24.386141]: Checking signature index.
#> ℹ [2024-05-11 12:07:24.388193]: Valid index for db 'CNS_TCGA':
#> CN1 CN2 CN3 CN4 CN5 CN6 CN7 CN8 CN9 CN10 CN11 CN12 CN13 CN14 CN15 CN16 CN17 CN18 CN19
#> ✔ [2024-05-11 12:07:24.390339]: Database and index checked.
#> ✔ [2024-05-11 12:07:24.392602]: Signature normalized.
#> ℹ [2024-05-11 12:07:24.394607]: Checking row number for catalog matrix and signature matrix.
#> ✔ [2024-05-11 12:07:24.396599]: Checked.
#> ℹ [2024-05-11 12:07:24.398572]: Checking rownames for catalog matrix and signature matrix.
#> ✔ [2024-05-11 12:07:24.400536]: Checked.
#> ✔ [2024-05-11 12:07:24.402494]: Method 'QP' detected.
#> ✔ [2024-05-11 12:07:24.414918]: Corresponding function generated.
#> ℹ [2024-05-11 12:07:24.417316]: Calling function.
#> ℹ [2024-05-11 12:07:24.419917]: Fitting sample: TCGA-05-4417-01A-22D-1854-01
#> ℹ [2024-05-11 12:07:24.42325]: Fitting sample: TCGA-06-0644-01A-02D-0310-01
#> ℹ [2024-05-11 12:07:24.425641]: Fitting sample: TCGA-19-2621-01B-01D-0911-01
#> ℹ [2024-05-11 12:07:24.427867]: Fitting sample: TCGA-26-6174-01A-21D-1842-01
#> ℹ [2024-05-11 12:07:24.430052]: Fitting sample: TCGA-99-7458-01A-11D-2035-01
#> ℹ [2024-05-11 12:07:24.432224]: Fitting sample: TCGA-A5-A0G2-01A-11D-A042-01
#> ℹ [2024-05-11 12:07:24.434394]: Fitting sample: TCGA-A8-A07S-01A-11D-A036-01
#> ℹ [2024-05-11 12:07:24.436542]: Fitting sample: TCGA-B6-A0X5-01A-21D-A107-01
#> ℹ [2024-05-11 12:07:24.438711]: Fitting sample: TCGA-CV-7432-01A-11D-2128-01
#> ℹ [2024-05-11 12:07:24.440867]: Fitting sample: TCGA-DF-A2KN-01A-11D-A17U-01
#> ✔ [2024-05-11 12:07:24.443045]: Done.
#> ℹ [2024-05-11 12:07:24.445076]: Generating output signature exposures.
#> ✔ [2024-05-11 12:07:24.44793]: Done.
#> ℹ [2024-05-11 12:07:24.450022]: 0.072 secs elapsed.

We can use some threshold to keep really contributed signautres.

act_refit2 = act_refit[apply(act_refit, 1, function(x) sum(x) > 0.1),]

rownames(act_refit2)
#>  [1] "CN1"  "CN2"  "CN3"  "CN4"  "CN9"  "CN11" "CN12" "CN13" "CN14" "CN19"

Plot signatures

For de novo signatures:

show_sig_profile(sig_denovo, mode = "copynumber", method = "S", style = "cosmic")

Show the activity/exposure.

show_sig_exposure(sig_denovo)

For reference signatures, you can just select what you want:

show_sig_profile(
  get_sig_db("CNS_TCGA")$db[, rownames(act_refit2)],
  style = "cosmic", 
  mode = "copynumber", method = "S", check_sig_names = FALSE)

Similarly for showing activity.

show_sig_exposure(act_refit2)

NOTE that this case shows relatively large difference with different approaches, so you need to pick based on your data size/quality and double-check the results. In general, for small-size data set, the refitting approach is recommended.

Signature assignment

To assign the de-novo signatures to reference signatures, we use cosine similarity.

get_sig_similarity(sig_denovo, sig_db = "CNS_TCGA")
#> -Comparing against COSMIC signatures
#> ------------------------------------
#> --Found Sig1 most similar to CN1
#>    Aetiology: See https://cancer.sanger.ac.uk/signatures/cn/ [similarity: 0.706]
#> --Found Sig2 most similar to CN2
#>    Aetiology: See https://cancer.sanger.ac.uk/signatures/cn/ [similarity: 0.771]
#> ------------------------------------
#> Return result invisiblely.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.