install.packages('devtools')
devtools::install_github("sahilseth/flowr")
Run the setup function, which copies the ‘flowr’ helper script so that flowr can also be used directly from a shell terminal. A few examples follow.
library(flowr)
setup()
A simple example: three instances of sleep (each waiting a few seconds) run first. After they complete, three tmp jobs start, each creating a file with some random data. Once all of these are complete, a merge step combines them into one big file. Finally, we use du to calculate the size of the resulting file.
.. note:: This is quite similar in structure to a typical workflow, where a series of alignment and sorting steps are run on the raw fastq files, followed by merging of the resulting bam files into one large file per sample and further downstream processing.
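To make the data steps concrete, here is a minimal local sketch in plain R of what the create_tmp, merge and size commands do to the files. It is not how flowr runs them (flowr submits each cmd as a separate job); it skips the sleep steps, and it assumes R's random number generator is an acceptable stand-in for /dev/urandom. The temporary working directory is an illustration choice.

```r
# Local, scheduler-free sketch of the create_tmp -> merge -> size steps.
# Assumption: R's RNG replaces /dev/urandom; the sleep steps are skipped.
set.seed(42)
workdir <- tempfile("flowr_sketch_")
dir.create(workdir)

# create_tmp: three files with 100000 random bytes each
tmp_files <- file.path(workdir, sprintf("sample1_tmp_%d", 1:3))
for (f in tmp_files) {
  writeBin(as.raw(sample(0:255, 100000, replace = TRUE)), f)
}

# merge: concatenate the three files into one, like `cat ... > sample1_merged`
merged <- file.path(workdir, "sample1_merged")
con <- file(merged, "wb")
for (f in tmp_files) {
  writeBin(readBin(f, what = "raw", n = file.size(f)), con)
}
close(con)

# size: size of the merged file in bytes (du -sh reports it human-readably)
merged_size <- file.size(merged)
print(merged_size)
```

Since each tmp file holds 100000 bytes, the merged file holds 300000 bytes, which du -sh reports as roughly 300K.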
The table below is referred to as flow_mat.
samplename | jobname | cmd |
---|---|---|
sample1 | sleep | sleep 10 && sleep 2;echo hello |
sample1 | sleep | sleep 11 && sleep 8;echo hello |
sample1 | sleep | sleep 11 && sleep 17;echo hello |
sample1 | create_tmp | head -c 100000 /dev/urandom > sample1_tmp_1 |
sample1 | create_tmp | head -c 100000 /dev/urandom > sample1_tmp_2 |
sample1 | create_tmp | head -c 100000 /dev/urandom > sample1_tmp_3 |
sample1 | merge | cat sample1_tmp_1 sample1_tmp_2 sample1_tmp_3 > sample1_merged |
sample1 | size | du -sh sample1_merged; echo MY shell: $SHELL |
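If you prefer to build flow_mat in R instead of reading it from a file, a minimal sketch as a plain data.frame could look like this. It assumes that to_flow() accepts a data.frame with the samplename, jobname and cmd columns as its x argument, which is how flow_mat is passed in the to_flow() call below.

```r
# The flow_mat table above as an R data.frame: one row per shell command.
flow_mat <- data.frame(
  samplename = "sample1",  # recycled to every row
  jobname = c(rep("sleep", 3), rep("create_tmp", 3), "merge", "size"),
  cmd = c(
    "sleep 10 && sleep 2;echo hello",
    "sleep 11 && sleep 8;echo hello",
    "sleep 11 && sleep 17;echo hello",
    "head -c 100000 /dev/urandom > sample1_tmp_1",
    "head -c 100000 /dev/urandom > sample1_tmp_2",
    "head -c 100000 /dev/urandom > sample1_tmp_3",
    "cat sample1_tmp_1 sample1_tmp_2 sample1_tmp_3 > sample1_merged",
    "du -sh sample1_merged; echo MY shell: $SHELL"
  ),
  stringsAsFactors = FALSE
)

# Number of commands (jobs) per step
table(flow_mat$jobname)
```

The jobname column is what links each command to a row of flow_def: every command with the same jobname becomes one job of that step.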
We use an additional table, flow_def, which specifies the relationships between the steps as well as their resource requirements.
jobname | sub_type | prev_jobs | dep_type | queue | memory_reserved | walltime | cpu_reserved | platform | jobid |
---|---|---|---|---|---|---|---|---|---|
sleep | scatter | none | none | short | 2000 | 1:00 | 1 | torque | 1 |
create_tmp | scatter | sleep | serial | short | 2000 | 1:00 | 1 | torque | 2 |
merge | serial | create_tmp | gather | short | 2000 | 1:00 | 1 | torque | 3 |
size | serial | merge | serial | short | 2000 | 1:00 | 1 | torque | 4 |
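The dep_type column controls how the jobs of one step wait for the jobs of the previous step. As we understand flowr's conventions, serial means one-to-one (job i waits for upstream job i) and gather means many-to-one (a single job waits for all upstream jobs). The sketch below encodes flow_def as a data.frame and illustrates that mapping with a hypothetical helper, upstream_jobs(), which is not part of flowr; the per-step job counts are taken from flow_mat.

```r
# The flow_def table above as an R data.frame.
flow_def <- data.frame(
  jobname = c("sleep", "create_tmp", "merge", "size"),
  sub_type = c("scatter", "scatter", "serial", "serial"),
  prev_jobs = c("none", "sleep", "create_tmp", "merge"),
  dep_type = c("none", "serial", "gather", "serial"),
  queue = "short",
  memory_reserved = 2000,
  walltime = "1:00",
  cpu_reserved = 1,
  platform = "torque",
  jobid = 1:4,
  stringsAsFactors = FALSE
)

# Hypothetical helper (NOT part of flowr): for each job of `step`, return the
# indices of the upstream jobs it waits for.
#   none   -> no dependency
#   serial -> one-to-one: job i waits for upstream job i
#   gather -> many-to-one: each job waits for all upstream jobs
upstream_jobs <- function(step, def, n_jobs) {
  row <- def[def$jobname == step, ]
  n <- n_jobs[[step]]
  switch(row$dep_type,
    none = replicate(n, integer(0), simplify = FALSE),
    serial = as.list(seq_len(n)),
    gather = replicate(n, seq_len(n_jobs[[row$prev_jobs]]), simplify = FALSE),
    stop("dep_type not covered by this sketch: ", row$dep_type)
  )
}

# Jobs per step, as in flow_mat
n_jobs <- c(sleep = 3, create_tmp = 3, merge = 1, size = 1)

# Each create_tmp job waits for its matching sleep job
upstream_jobs("create_tmp", flow_def, n_jobs)
# The single merge job waits for all three create_tmp jobs
upstream_jobs("merge", flow_def, n_jobs)
```

This is why create_tmp can start as soon as its own sleep job finishes, while merge has to wait for every tmp file to exist.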
fobj <- to_flow(x = flow_mat, def = as.flowdef(flow_def),
                flowname = "example1", platform = "lsf")
plot_flow(fobj)
Dry run (submit)
submit_flow(fobj)
Test Successful!
You may check this folder for consistency. Also you may re-run submit with execute=TRUE
~/flowr/type1-20150520-15-18-27-5mSd32G0
Submit to the cluster
submit_flow(fobj, execute = TRUE)
Flow has been submitted. Track it from terminal using:
flowr::status(x="~/flowr/type1-20150520-15-18-46-sySOzZnE")
OR
flowr status x=~/flowr/type1-20150520-15-18-46-sySOzZnE
Running the status command from a shell terminal:
flowr status x=~/flowr/type1-20150520-15-18-46-sySOzZnE
Loading required package: shape
Flowr: streamlining workflows
Showing status of: /rsrch2/iacs/iacs_dep/sseth/flowr/type1-20150520-15-18-46-sySOzZnE
| | total| started| completed| exit_status|
|:---------|-----:|-------:|---------:|-----------:|
|001.sleep | 10| 10| 10| 0|
|002.tmp | 10| 10| 10| 0|
|003.merge | 1| 1| 1| 0|
|004.size | 1| 1| 1| 0|
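To check a status table like this one programmatically, here is a minimal sketch. The table is typed in by hand rather than obtained from flowr, and flow_succeeded() is a hypothetical helper, not part of flowr. It assumes exit_status counts the jobs of a step that exited with a non-zero code, so a clean run has every job completed and exit_status equal to 0 for every step.

```r
# The status table above, typed in by hand as a data.frame.
status <- data.frame(
  step = c("001.sleep", "002.tmp", "003.merge", "004.size"),
  total = c(10, 10, 1, 1),
  started = c(10, 10, 1, 1),
  completed = c(10, 10, 1, 1),
  exit_status = c(0, 0, 0, 0),
  stringsAsFactors = FALSE
)

# Hypothetical helper (NOT part of flowr). Assumption: exit_status is the
# number of jobs in a step that exited with a non-zero code.
flow_succeeded <- function(st) {
  all(st$completed == st$total) && all(st$exit_status == 0)
}

flow_succeeded(status)
```

Checking completed against total catches steps that are still running, while checking exit_status catches jobs that finished but failed.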
.. note:: Interested? Here are some details on building pipelines