
Sample modifiers in pepr: derive

Michal Stolarczyk

2023-11-21

Learn derived attributes in pepr

This vignette will show you how and why to use the derived attributes functionality of the pepr package.

Problem/Goal

The example below demonstrates how to use derived attributes to flexibly define sample attributes (here, the file_path column of the sample_table.csv file) so that they match the file names in your project. Consider the following sample table for reference:

sample_name  protocol  organism  time  file_path
pig_0h       RRBS      pig       0     data/lab/project/pig_0h.fastq
pig_1h       RRBS      pig       1     data/lab/project/pig_1h.fastq
frog_0h      RRBS      frog      0     data/lab/project/frog_0h.fastq
frog_1h      RRBS      frog      1     data/lab/project/frog_1h.fastq

Solution

As the name suggests, the values of the specified attributes (here: file_path) can be derived from other attributes. How this is done is stated explicitly in the project_config.yaml file (presented below). The name of the column to derive is set by the sample_modifiers.derive.attributes key, whereas the patterns used to construct its values are listed under sample_modifiers.derive.sources. Each pattern can reference other sample attributes in curly braces, such as {organism} and {time}. Note that the second-level keys (here: source1 and source2) must exactly match the values in the file_path column of the modified sample_table.csv (presented below).

   pep_version: 2.0.0
   sample_table: sample_table.csv
   output_dir: $HOME/hello_looper_results
   sample_modifiers:
      derive:
          attributes: file_path
          sources:
              source1: $HOME/data/lab/project/{organism}_{time}h.fastq
              source2: /path/from/collaborator/weirdNamingScheme_{external_id}.fastq
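The substitution implied by a source template can be sketched in base R. The helper expand_template below is hypothetical (it is not part of pepr's API) and assumes one sample is given as a named list of attributes. As a simplification, $HOME is the only environment variable it expands.

```r
# Hypothetical helper, not part of pepr: fill a derive-source template such as
# "$HOME/data/lab/project/{organism}_{time}h.fastq" with one sample's attributes.
expand_template <- function(template, attrs) {
  # Simplification: expand only $HOME, not arbitrary environment variables
  template <- gsub("$HOME", Sys.getenv("HOME"), template, fixed = TRUE)
  # Collect every {placeholder} occurring in the template
  keys <- regmatches(template, gregexpr("\\{[^}]+\\}", template))[[1]]
  for (key in keys) {
    name <- substr(key, 2, nchar(key) - 1)  # strip the surrounding braces
    if (is.null(attrs[[name]])) {
      stop("sample has no attribute named '", name, "'")
    }
    # fixed = TRUE makes gsub match the braces literally
    template <- gsub(key, as.character(attrs[[name]]), template, fixed = TRUE)
  }
  template
}

expand_template("data/lab/project/{organism}_{time}h.fastq",
                list(organism = "pig", time = 0))
# returns "data/lab/project/pig_0h.fastq"
```

Raising an error for a missing attribute is a choice of this sketch; it makes a typo in a placeholder name visible instead of silently producing a wrong path.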

Let’s now modify the original sample_table.csv file so that each value in the derived column (file_path) names one of the data sources defined in project_config.yaml:

sample_name  protocol  organism  time  file_path
pig_0h       RRBS      pig       0     source1
pig_1h       RRBS      pig       1     source1
frog_0h      RRBS      frog      0     source1
frog_1h      RRBS      frog      1     source1
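To see how the modified table and the sources combine, the derive step can be mimicked on an in-memory data frame. derive_column is a hypothetical helper, not pepr's implementation. Here source1 is written without its $HOME prefix so that the result does not depend on the machine, which makes it reproduce the file_path values of the original table.

```r
# Hypothetical sketch, not pepr's implementation: values of `attribute` that
# name a source are replaced by that source's template, with each {column}
# placeholder filled from the same row. Other values are kept unchanged.
derive_column <- function(samples, attribute, sources) {
  out <- as.character(samples[[attribute]])
  idx <- which(out %in% names(sources))
  # Start each matching row from the template of the source it names
  out[idx] <- vapply(out[idx], function(k) sources[[k]], character(1),
                     USE.NAMES = FALSE)
  # Substitute every other column's value for its {column} placeholder
  for (col in setdiff(names(samples), attribute)) {
    placeholder <- paste0("{", col, "}")
    values <- as.character(samples[[col]])
    for (i in idx) {
      out[i] <- gsub(placeholder, values[i], out[i], fixed = TRUE)
    }
  }
  samples[[attribute]] <- out
  samples
}

# The modified sample table, with every row pointing at source1
samples <- data.frame(
  sample_name = c("pig_0h", "pig_1h", "frog_0h", "frog_1h"),
  protocol = "RRBS",
  organism = c("pig", "pig", "frog", "frog"),
  time = c(0, 1, 0, 1),
  file_path = "source1",
  stringsAsFactors = FALSE
)
sources <- list(
  source1 = "data/lab/project/{organism}_{time}h.fastq",
  source2 = "/path/from/collaborator/weirdNamingScheme_{external_id}.fastq"
)
derived <- derive_column(samples, "file_path", sources)
derived$file_path
```

Because every row names source1, only that template is used here. A row naming source2 would also need an external_id column; this sketch leaves any placeholder without a matching column in the path as-is.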

Code

Load pepr and read in the project metadata by specifying the path to the project_config.yaml:

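library(pepr)
# The example PEPs are installed with pepr under "example_peps-<branch>";
# "master" is the branch used when this vignette was built
branch = "master"
projectConfig = system.file(
  "extdata",
  paste0("example_peps-", branch),
  "example_derive",
  "project_config.yaml",
  package = "pepr"
)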
p = Project(projectConfig)
#> Loading config file: /tmp/RtmpoymTo9/Rinstb3055bff7/pepr/extdata/example_peps-master/example_derive/project_config.yaml

And inspect it:

sampleTable(p)
#>    sample_name protocol organism time
#> 1:      pig_0h     RRBS      pig    0
#> 2:      pig_1h     RRBS      pig    1
#> 3:     frog_0h     RRBS     frog    0
#> 4:     frog_1h     RRBS     frog    1
#>                                      file_path
#> 1:  /home/nsheff/data/lab/project/pig_0h.fastq
#> 2:  /home/nsheff/data/lab/project/pig_1h.fastq
#> 3: /home/nsheff/data/lab/project/frog_0h.fastq
#> 4: /home/nsheff/data/lab/project/frog_1h.fastq

As you can see, the resulting samples are annotated the same way as if they were read from the original, unwieldy, annotations file.

Moreover, the p object also contains all the information from the project config file (project_config.yaml). Run the following line to explore it:

config(p)
#> Config object. Class: Config
#>  pep_version: 2.0.0
#>  sample_table: 
#> /tmp/RtmpoymTo9/Rinstb3055bff7/pepr/extdata/example_peps-master/example_derive/sample_table.csv
#>  output_dir: /home/nsheff/hello_looper_results
#>  sample_modifiers:
#>     derive:
#>         attributes: file_path
#>         sources:
#>             source1: /home/nsheff/data/lab/project/{organism}_{time}h.fastq
#>             source2: 
#> /path/from/collaborator/weirdNamingScheme_{external_id}.fastq
#>  name: example_derive
