The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

qualitycontrol

The goal of qualitycontrol is to set a data quality control framework

Installation

You can install the qualitycontrol from GitHub with:

# install.packages("devtools")
devtools::install_github("luisgarcez11/qualitycontrol")

Data

The als_data dataset will be used to guide you through the package functionality. This data is not real, but based on data retrieved from Amyotrophic Lateral Sclerosis patients.

library(qualitycontrol)
als_data
##    subjid p1 p2 p3 p4 p5 p6 p7 p8 p9 x1r x2r x3r age_at_baseline age_at_onset
## 1       1  4  1  1  3  4  3  4  3  4   2   2   1              51           46
## 2       2  4  4  4  1  1  3  3  1  4   1   2   4              82           77
## 3       3  2  3  1  4  3  1  3  1  1   4   3   1              85           80
## 4       4  3  2  1  1  4  1  3  2  4   4   3   3              77           72
## 5       5  3  2  1  3  3  4  4  3  4   1   4   2              85           80
## 6       6  2  2  1  4  1  4  4  3  1   3   5   2              73           68
## 7       7  1  4  2  4  3  3  2  3  4   1   2   2              65           60
## 8       8  2  2  4  4  3  2  1  2  3   3   1   1              50           62
## 9       9  3  1  1  4  4  2  4  1  1   2   2   4              65           46
## 10     10  3  4  1  4  3  2  3  2  1   4   3   1              81           76
## 11     11  1  3  1  3  3  4  1 NA  3   3   2   4              51           46
## 12     12  1  4  3  2  3  2  2 NA  1   3   2   3              50           45
## 13     13  1  1  4  1  1  3  4 NA  2   2   3   1              82           77
## 14     14  3  2  2  4  3  3  3  3  2   3   4   1              76           71
## 15     15  3  4  2  2  2  3  1  3  4   4   1   4              87          376
## 16     16  3  3  2  4  3  3  1  1  2   2   4   1              50           45
## 17     17  3  2  3  1  4  1  3  2  1   4   4   2              85           80
## 18     18  4  1  3  1  3  1  3  2  2   4   3   4              57           52
## 19     19  1  3  3  2  2  2  3  2  3   2   3   2              74           69
## 20     20  2  2  4  2  3  4  2  4  1   4   1   3              59           54
## 21     21  2  3  3  2  3  2  4  4  1   1   3   3              79           74
## 22     22  4  3  1  1  3  4  2  1  4   1   2   3              53           48
## 23     23  3  3  4  3  4  1  3  4  3   2   2   2              45           40
## 24     24  4  1  1  2  4  2  4  4  4   4   2   1              72           67
## 25     25  4  3  1  3  3  4  3  2  3   3   4   2              77           72
## 26     26  2  1  1  2  4  2  4  1  2   3   2   4              65           60
## 27     27  1  1  1  1  1  1  3  3  2   2   1   1              54           49
## 28     28  3  1  1  3  1  4  1  2  2   2   3   4              50          -23
## 29     29  2  3  1  3  1  4  4  1  3   2   4   1              85           80
## 30     30  3  1  2  1  3  1  2  4  1   1   2   4              85           80
## 31     30  3  3  1  4  2  2  1  4  3   3   1   3              53           48
##          onset baseline_date death_date
## 1       bulbar    2003-03-26 2010-10-18
## 2        bulba    2003-07-03 2019-06-24
## 3       spinal    2007-01-27 9999-12-30
## 4       bulbar    2010-11-27 2018-01-04
## 5       bulbar    2006-10-25 2017-10-13
## 6       spinal    2007-04-30 2010-05-08
## 7       spinal    2002-11-15 2019-04-06
## 8       spinal    2002-12-13 2018-05-04
## 9       spinal    2005-06-02 2013-08-11
## 10      bulbar    2004-06-02 2016-05-20
## 11      bulbar    2007-03-09 2016-09-26
## 12      bulbar    2005-01-11 2010-06-20
## 13      bulbar    2010-12-22 2019-07-05
## 14      bulbar    2008-10-14 2013-08-14
## 15      spinal    2005-09-15 2010-07-20
## 16      spinal    2007-07-05 2010-08-28
## 17 respiratory    2002-08-19 2011-10-17
## 18      spinal    2002-06-30 2020-12-17
## 19 respiratory    2010-07-18 2016-05-15
## 20      spinal    2004-08-15 2015-03-15
## 21      bulbar    2006-04-07 2013-03-16
## 22      bulbar    2002-06-01 2016-06-21
## 23      bulbar    2007-08-12 2017-04-01
## 24      bulbar    2006-08-12 2002-12-02
## 25 respiratory    2006-08-11 2016-03-03
## 26      spinal    2005-01-04 2011-10-05
## 27 respiratory    2009-08-25 2015-03-11
## 28      bulbar    2002-05-11 2017-11-09
## 29      bulbar    2004-07-27 2014-03-27
## 30      bulbar    2005-11-11 2015-05-30
## 31      bulbar    2008-02-27 2014-07-05

QC mapping

The als_data_qc_mapping is an R list which contains 3 tables specifying all the tests used for quality control. You can specify your own tests, by creating an excel file and then read it using the function read_qc_mapping.

Missing

als_data_qc_mapping$missing
## # A tibble: 13 × 3
##    qc_type    variable type   
##    <chr>      <chr>    <chr>  
##  1 duplicated subjid   text   
##  2 missing    p1       numeric
##  3 missing    p2       numeric
##  4 missing    p3       numeric
##  5 missing    p4       numeric
##  6 missing    p5       numeric
##  7 missing    p6       numeric
##  8 missing    p7       numeric
##  9 missing    p8       numeric
## 10 missing    p9       numeric
## 11 missing    x1r      numeric
## 12 missing    x2r      numeric
## 13 missing    x3r      numeric

Inconsistencies

als_data_qc_mapping$inconsistencies
## # A tibble: 2 × 6
##   qc_type             variable1       type1   relation     variable2    type2  
##   <chr>               <chr>           <chr>   <chr>        <chr>        <chr>  
## 1 inconsistent_values age_at_baseline numeric greater_than age_at_onset numeric
## 2 inconsistent_values baseline_date   date    lower_than   death_date   date

Out of range values

als_data_qc_mapping$range
## # A tibble: 16 × 6
##    qc_type variable        type        lower_value upper_value categories       
##    <chr>   <chr>           <chr>       <chr>       <chr>       <chr>            
##  1 range   p1              numeric     1           4           <NA>             
##  2 range   p2              numeric     1           4           <NA>             
##  3 range   p3              numeric     1           4           <NA>             
##  4 range   p4              numeric     1           4           <NA>             
##  5 range   p5              numeric     1           4           <NA>             
##  6 range   p6              numeric     1           4           <NA>             
##  7 range   p7              numeric     1           4           <NA>             
##  8 range   p8              numeric     1           4           <NA>             
##  9 range   p9              numeric     1           4           <NA>             
## 10 range   x1r             numeric     1           4           <NA>             
## 11 range   x2r             numeric     1           4           <NA>             
## 12 range   x3r             numeric     1           4           <NA>             
## 13 range   age_at_baseline numeric     20          100         <NA>             
## 14 range   age_at_onset    numeric     20          100         <NA>             
## 15 range   death_date      date        2000-01-01  2022-01-01  <NA>             
## 16 range   onset           categorical <NA>        <NA>        bulbar, respirat…

qc_data function

qc_data takes as arguments the data to be quality controlled and the QC mapping containing the tests to be applied.

qc_data(als_data, als_data_qc_mapping)[,c("subjid","age_at_onset","onset","baseline_date","death_date","finding")]
## # A tibble: 13 × 6
##    subjid age_at_onset onset  baseline_date death_date finding                  
##    <chr>  <chr>        <chr>  <chr>         <chr>      <chr>                    
##  1 30     80           bulbar 2005-11-11    2015-05-30 subjid variable is dupli…
##  2 30     48           bulbar 2008-02-27    2014-07-05 subjid variable is dupli…
##  3 11     46           bulbar 2007-03-09    2016-09-26 variable p8 is missing   
##  4 12     45           bulbar 2005-01-11    2010-06-20 variable p8 is missing   
##  5 13     77           bulbar 2010-12-22    2019-07-05 variable p8 is missing   
##  6 6      68           spinal 2007-04-30    2010-05-08 variable x2r is out of r…
##  7 15     376          spinal 2005-09-15    2010-07-20 variable age_at_onset is…
##  8 28     -23          bulbar 2002-05-11    2017-11-09 variable age_at_onset is…
##  9 3      80           spinal 2007-01-27    9999-12-30 variable death_date is o…
## 10 2      77           bulba  2003-07-03    2019-06-24 variable onset is not a …
## 11 8      62           spinal 2002-12-13    2018-05-04 variables age_at_baselin…
## 12 15     376          spinal 2005-09-15    2010-07-20 variables age_at_baselin…
## 13 24     67           bulbar 2006-08-12    2002-12-02 variables baseline_date …

This will return a table with all the findings. If you want to save it, you can specify the path to be saved in output_file.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.