qualitycontrol

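The goal of qualitycontrol is to provide a framework for data quality control: you describe the checks to run in a mapping, and the package returns the records that fail them.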

Installation

You can install qualitycontrol from GitHub with:

# install.packages("devtools")
devtools::install_github("luisgarcez11/qualitycontrol")

Data

The als_data dataset is used to guide you through the package's functionality. The data are not real, but they are based on data retrieved from Amyotrophic Lateral Sclerosis patients.

library(qualitycontrol)
als_data
#>    subjid p1 p2 p3 p4 p5 p6 p7 p8 p9 x1r x2r x3r age_at_baseline age_at_onset
#> 1       1  4  1  1  3  4  3  4  3  4   2   2   1              51           46
#> 2       2  4  4  4  1  1  3  3  1  4   1   2   4              82           77
#> 3       3  2  3  1  4  3  1  3  1  1   4   3   1              85           80
#> 4       4  3  2  1  1  4  1  3  2  4   4   3   3              77           72
#> 5       5  3  2  1  3  3  4  4  3  4   1   4   2              85           80
#> 6       6  2  2  1  4  1  4  4  3  1   3   5   2              73           68
#> 7       7  1  4  2  4  3  3  2  3  4   1   2   2              65           60
#> 8       8  2  2  4  4  3  2  1  2  3   3   1   1              50           62
#> 9       9  3  1  1  4  4  2  4  1  1   2   2   4              65           46
#> 10     10  3  4  1  4  3  2  3  2  1   4   3   1              81           76
#> 11     11  1  3  1  3  3  4  1 NA  3   3   2   4              51           46
#> 12     12  1  4  3  2  3  2  2 NA  1   3   2   3              50           45
#> 13     13  1  1  4  1  1  3  4 NA  2   2   3   1              82           77
#> 14     14  3  2  2  4  3  3  3  3  2   3   4   1              76           71
#> 15     15  3  4  2  2  2  3  1  3  4   4   1   4              87          376
#> 16     16  3  3  2  4  3  3  1  1  2   2   4   1              50           45
#> 17     17  3  2  3  1  4  1  3  2  1   4   4   2              85           80
#> 18     18  4  1  3  1  3  1  3  2  2   4   3   4              57           52
#> 19     19  1  3  3  2  2  2  3  2  3   2   3   2              74           69
#> 20     20  2  2  4  2  3  4  2  4  1   4   1   3              59           54
#> 21     21  2  3  3  2  3  2  4  4  1   1   3   3              79           74
#> 22     22  4  3  1  1  3  4  2  1  4   1   2   3              53           48
#> 23     23  3  3  4  3  4  1  3  4  3   2   2   2              45           40
#> 24     24  4  1  1  2  4  2  4  4  4   4   2   1              72           67
#> 25     25  4  3  1  3  3  4  3  2  3   3   4   2              77           72
#> 26     26  2  1  1  2  4  2  4  1  2   3   2   4              65           60
#> 27     27  1  1  1  1  1  1  3  3  2   2   1   1              54           49
#> 28     28  3  1  1  3  1  4  1  2  2   2   3   4              50          -23
#> 29     29  2  3  1  3  1  4  4  1  3   2   4   1              85           80
#> 30     30  3  1  2  1  3  1  2  4  1   1   2   4              85           80
#> 31     30  3  3  1  4  2  2  1  4  3   3   1   3              53           48
#>          onset baseline_date death_date
#> 1       bulbar    2003-03-26 2010-10-18
#> 2        bulba    2003-07-03 2019-06-24
#> 3       spinal    2007-01-27 9999-12-30
#> 4       bulbar    2010-11-27 2018-01-04
#> 5       bulbar    2006-10-25 2017-10-13
#> 6       spinal    2007-04-30 2010-05-08
#> 7       spinal    2002-11-15 2019-04-06
#> 8       spinal    2002-12-13 2018-05-04
#> 9       spinal    2005-06-02 2013-08-11
#> 10      bulbar    2004-06-02 2016-05-20
#> 11      bulbar    2007-03-09 2016-09-26
#> 12      bulbar    2005-01-11 2010-06-20
#> 13      bulbar    2010-12-22 2019-07-05
#> 14      bulbar    2008-10-14 2013-08-14
#> 15      spinal    2005-09-15 2010-07-20
#> 16      spinal    2007-07-05 2010-08-28
#> 17 respiratory    2002-08-19 2011-10-17
#> 18      spinal    2002-06-30 2020-12-17
#> 19 respiratory    2010-07-18 2016-05-15
#> 20      spinal    2004-08-15 2015-03-15
#> 21      bulbar    2006-04-07 2013-03-16
#> 22      bulbar    2002-06-01 2016-06-21
#> 23      bulbar    2007-08-12 2017-04-01
#> 24      bulbar    2006-08-12 2002-12-02
#> 25 respiratory    2006-08-11 2016-03-03
#> 26      spinal    2005-01-04 2011-10-05
#> 27 respiratory    2009-08-25 2015-03-11
#> 28      bulbar    2002-05-11 2017-11-09
#> 29      bulbar    2004-07-27 2014-03-27
#> 30      bulbar    2005-11-11 2015-05-30
#> 31      bulbar    2008-02-27 2014-07-05

QC mapping

als_data_qc_mapping is an R list containing three tables that together specify all the tests used for quality control.

Missing

als_data_qc_mapping$missing
#> # A tibble: 13 × 3
#>    qc_type    variable type   
#>    <chr>      <chr>    <chr>  
#>  1 duplicated subjid   text   
#>  2 missing    p1       numeric
#>  3 missing    p2       numeric
#>  4 missing    p3       numeric
#>  5 missing    p4       numeric
#>  6 missing    p5       numeric
#>  7 missing    p6       numeric
#>  8 missing    p7       numeric
#>  9 missing    p8       numeric
#> 10 missing    p9       numeric
#> 11 missing    x1r      numeric
#> 12 missing    x2r      numeric
#> 13 missing    x3r      numeric
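
For intuition, the two kinds of check listed in this table can be reproduced with base R alone. The sketch below is not the package's implementation; it uses a small hypothetical data frame (`toy`) and only illustrates what "missing" and "duplicated" amount to.

```r
# Hypothetical toy data: subject 30 appears twice and p8 is missing once.
toy <- data.frame(
  subjid = c(11, 29, 30, 30),
  p8     = c(NA, 1, 4, 4)
)

# Missing check: rows where the variable is NA.
missing_p8 <- toy[is.na(toy$p8), ]

# Duplicated check: every row whose subjid occurs more than once.
# duplicated() alone skips the first occurrence, so scan from both ends.
dup_flag <- duplicated(toy$subjid) | duplicated(toy$subjid, fromLast = TRUE)
duplicated_subjid <- toy[dup_flag, ]

missing_p8$subjid         # 11
duplicated_subjid$subjid  # 30 30
```

Flagging both copies of a duplicated identifier, rather than only the second, makes it easier to compare the conflicting records side by side.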

Inconsistencies

als_data_qc_mapping$inconsistencies
#> # A tibble: 2 × 6
#>   qc_type             variable1       type1   relation     variable2    type2  
#>   <chr>               <chr>           <chr>   <chr>        <chr>        <chr>  
#> 1 inconsistent_values age_at_baseline numeric greater_than age_at_onset numeric
#> 2 inconsistent_values baseline_date   date    lower_than   death_date   date
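
One way to read each row is "variable1 is expected to stand in this relation to variable2", so a record is inconsistent when the relation does not hold. That reading is an assumption, not something the table states. A minimal base R sketch of it, using three rows copied from als_data (subjects 1, 8 and 24), could look like this:

```r
# Three rows copied from als_data; only the columns used by the two rules.
toy <- data.frame(
  subjid          = c(1, 8, 24),
  age_at_baseline = c(51, 50, 72),
  age_at_onset    = c(46, 62, 67),
  baseline_date   = as.Date(c("2003-03-26", "2002-12-13", "2006-08-12")),
  death_date      = as.Date(c("2010-10-18", "2018-05-04", "2002-12-02"))
)

# Rule 1: age_at_baseline should be greater than age_at_onset.
# which() drops rows where the comparison is NA instead of returning NA rows.
bad_age <- toy[which(!(toy$age_at_baseline > toy$age_at_onset)), ]

# Rule 2: baseline_date should be lower than (earlier than) death_date.
bad_date <- toy[which(!(toy$baseline_date < toy$death_date)), ]

bad_age$subjid   # 8
bad_date$subjid  # 24
```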

Out of range values

als_data_qc_mapping$range
#> # A tibble: 16 × 6
#>    qc_type variable        type        lower_value upper_value categories       
#>    <chr>   <chr>           <chr>       <chr>       <chr>       <chr>            
#>  1 range   p1              numeric     1           4           <NA>             
#>  2 range   p2              numeric     1           4           <NA>             
#>  3 range   p3              numeric     1           4           <NA>             
#>  4 range   p4              numeric     1           4           <NA>             
#>  5 range   p5              numeric     1           4           <NA>             
#>  6 range   p6              numeric     1           4           <NA>             
#>  7 range   p7              numeric     1           4           <NA>             
#>  8 range   p8              numeric     1           4           <NA>             
#>  9 range   p9              numeric     1           4           <NA>             
#> 10 range   x1r             numeric     1           4           <NA>             
#> 11 range   x2r             numeric     1           4           <NA>             
#> 12 range   x3r             numeric     1           4           <NA>             
#> 13 range   age_at_baseline numeric     20          100         <NA>             
#> 14 range   age_at_onset    numeric     20          100         <NA>             
#> 15 range   death_date      date        2000-01-01  2022-01-01  <NA>             
#> 16 range   onset           categorical <NA>        <NA>        bulbar, respirat…
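
The bounds in this table are stored as text, so a range check has to convert them to the variable's type before comparing. The sketch below shows the idea in base R for one numeric and one date variable, using three rows copied from als_data (subjects 3, 15 and 28). It is not the package's code, and treating the bounds as inclusive is an assumption.

```r
# Three rows copied from als_data; only the columns being checked.
toy <- data.frame(
  subjid       = c(3, 15, 28),
  age_at_onset = c(80, 376, -23),
  death_date   = as.Date(c("9999-12-30", "2010-07-20", "2017-11-09"))
)

# Numeric bounds arrive as character strings and are converted first.
age_lower <- as.numeric("20")
age_upper <- as.numeric("100")
bad_age <- toy[which(toy$age_at_onset < age_lower |
                     toy$age_at_onset > age_upper), ]

# Date bounds are converted with as.Date() in the same way.
date_lower <- as.Date("2000-01-01")
date_upper <- as.Date("2022-01-01")
bad_date <- toy[which(toy$death_date < date_lower |
                      toy$death_date > date_upper), ]

bad_age$subjid   # 15 28
bad_date$subjid  # 3
```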

qc_data function

qc_data takes as arguments the data to be quality controlled and the QC mapping containing the tests to be applied.

qc_data(als_data, als_data_qc_mapping)
#> # A tibble: 13 × 19
#>    subjid p1    p2    p3    p4    p5    p6    p7    p8    p9    x1r   x2r  
#>    <chr>  <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
#>  1 30     3     1     2     1     3     1     2     4     1     1     2    
#>  2 30     3     3     1     4     2     2     1     4     3     3     1    
#>  3 11     1     3     1     3     3     4     1     <NA>  3     3     2    
#>  4 12     1     4     3     2     3     2     2     <NA>  1     3     2    
#>  5 13     1     1     4     1     1     3     4     <NA>  2     2     3    
#>  6 6      2     2     1     4     1     4     4     3     1     3     5    
#>  7 15     3     4     2     2     2     3     1     3     4     4     1    
#>  8 28     3     1     1     3     1     4     1     2     2     2     3    
#>  9 3      2     3     1     4     3     1     3     1     1     4     3    
#> 10 2      4     4     4     1     1     3     3     1     4     1     2    
#> 11 8      2     2     4     4     3     2     1     2     3     3     1    
#> 12 15     3     4     2     2     2     3     1     3     4     4     1    
#> 13 24     4     1     1     2     4     2     4     4     4     4     2    
#> # … with 7 more variables: x3r <chr>, age_at_baseline <chr>,
#> #   age_at_onset <chr>, onset <chr>, baseline_date <chr>, death_date <chr>,
#> #   finding <chr>

This returns a table of all the findings: the flagged records, with a finding column added to the original variables. To save the table to a file, pass the destination path in the output_file argument.
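
To run the same kind of checks on your own data, you would need a mapping shaped like als_data_qc_mapping. The sketch below builds one by hand for a hypothetical data set with columns `id`, `score`, `visit_age` and `onset_age`. The element names, column names and all-text layout are copied from the printed tables above. Whether qc_data accepts plain data frames in place of tibbles, or a mapping with so few rows, is an assumption that has not been verified here.

```r
# Hypothetical mapping; every variable name below is illustrative only.
my_mapping <- list(
  missing = data.frame(
    qc_type  = c("duplicated", "missing"),
    variable = c("id", "score"),
    type     = c("text", "numeric"),
    stringsAsFactors = FALSE
  ),
  inconsistencies = data.frame(
    qc_type   = "inconsistent_values",
    variable1 = "visit_age",
    type1     = "numeric",
    relation  = "greater_than",
    variable2 = "onset_age",
    type2     = "numeric",
    stringsAsFactors = FALSE
  ),
  range = data.frame(
    qc_type     = "range",
    variable    = "score",
    type        = "numeric",
    lower_value = "1",   # bounds are kept as text, as in the printed mapping
    upper_value = "4",
    categories  = NA_character_,
    stringsAsFactors = FALSE
  )
)

# Inspect the top-level structure: three tables in one list.
str(my_mapping, max.level = 1)

# Assumed usage with your own data frame `my_data` (not run here):
# findings <- qualitycontrol::qc_data(my_data, my_mapping)
```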
