
Observe and check your data in R

2017-01-29

Create observations from your data with ‘observe_if’

The observer package checks that a given dataset passes user-specified rules. The main functions are observe_if, which records one observation per rule, and inspect, which returns the rows behind a given observation.

For instance, according to the documentation of the diamonds dataset in package ggplot2, the column depth is equal to 100*2*z/(x+y). Let us make an observation of this:

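library(dplyr)     # provides %>% and mutate()
library(observer)  # provides observe_if(), obs() and inspect()

## Recompute depth from x, y and z, then state the rules to check
df <- ggplot2::diamonds %>% 
  mutate(depth2 = 100*2*z/(x+y)) %>% 
  observe_if(x > 0, 
             y > 0, 
             z > 0, 
             abs(depth-depth2) < 1)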

obs(df)
#> # A tibble: 4 × 8
#>      Id               Predicate Passed Failed Missing      Rows Status
#> * <int>                   <chr>  <int>  <int>   <int>    <list>  <chr>
#> 1     1                   x > 0  53932      8       0 <S3: bit> failed
#> 2     2                   y > 0  53933      7       0 <S3: bit> failed
#> 3     3                   z > 0  53920     20       0 <S3: bit> failed
#> 4     4 abs(depth - depth2) < 1  53840     93       7 <S3: bit> failed
#> # ... with 1 more variables: Number_of_trials <int>
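
To make the Passed, Failed and Missing columns concrete, here is a minimal base-R sketch of how a single rule can be split three ways. It is not the package's implementation: the data frame toy and the helper count_rule are hypothetical names introduced for illustration, and the values are made up. The third row shows one way a rule can end up Missing: when x, y and z are all zero, depth2 is 0/0, so the comparison yields NA.

```r
# Toy data (hypothetical values, not taken from diamonds)
toy <- data.frame(
  x     = c(4.0, 5.0, 0.0, 6.0),
  y     = c(4.0, 5.0, 0.0, 6.0),
  z     = c(2.5, 3.0, 0.0, 0.0),
  depth = c(62.5, 60.4, 60.0, 59.0)
)

# Same formula as in the text; row 3 gives 0/0, which is NaN
toy$depth2 <- 100 * 2 * toy$z / (toy$x + toy$y)

# Hypothetical helper: split a logical vector into three counts
count_rule <- function(ok) {
  c(Passed  = sum(ok, na.rm = TRUE),    # rule evaluated to TRUE
    Failed  = sum(!ok, na.rm = TRUE),   # rule evaluated to FALSE
    Missing = sum(is.na(ok)))           # rule could not be evaluated
}

ok <- abs(toy$depth - toy$depth2) < 1
counts <- count_rule(ok)
print(counts)
```

The three counts always add up to the number of rows, which is also true of each line of the table above (for example 53840 + 93 + 7 = 53940).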

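We observe that 93 rows fail the last rule, abs(depth - depth2) < 1, and that the rule cannot be evaluated on 7 further rows (column Missing). To go further we need to see what is happening; with inspect we can select the rows at stake for a given observation Id. Here it returns 100 rows, which matches the 93 failures plus the 7 missing values: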

inspect(df, 4)
#> # A tibble: 100 × 11
#>    carat       cut color clarity depth table price     x     y     z
#>    <dbl>     <ord> <ord>   <ord> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#> 1   1.00   Premium     G     SI2  59.1    59  3142  6.55  6.48  0.00
#> 2   1.22   Premium     J     SI2  62.6    59  3156  6.79  4.24  3.76
#> 3   1.01   Premium     H      I1  58.1    59  3167  6.66  6.60  0.00
#> 4   0.70     Ideal     G     VS2  62.7    54  3172  5.65  5.70  3.65
#> 5   1.00 Very Good     J     SI2  62.8    63  3293  6.26  6.19  3.19
#> 6   0.70   Premium     E      IF  62.9    59  3403  5.66  5.59  3.40
#> 7   1.01      Fair     F     SI2  64.6    59  3540  6.19  6.25  4.20
#> 8   1.00      Fair     G     SI1  43.0    59  3634  6.32  6.27  3.97
#> 9   0.81   Premium     E     VS2  61.5    58  3674  5.99  5.94  3.97
#> 10  1.10   Premium     G     SI2  63.0    59  3696  6.50  6.47  0.00
#> # ... with 90 more rows, and 1 more variables: depth2 <dbl>
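
As a quick sanity check, the flagged rows can be recomputed by hand. The sketch below copies four rows from the printed output above (rows 1, 2, 4 and 8, with values as displayed) into a data frame named shown, a hypothetical name used only here, and recomputes depth2 and its gap to depth in base R.

```r
# Rows 1, 2, 4 and 8 as printed in the inspect() output
shown <- data.frame(
  depth = c(59.1, 62.6, 62.7, 43.0),
  x     = c(6.55, 6.79, 5.65, 6.32),
  y     = c(6.48, 4.24, 5.70, 6.27),
  z     = c(0.00, 3.76, 3.65, 3.97)
)

# Formula from the diamonds documentation, as quoted in the text
shown$depth2 <- 100 * 2 * shown$z / (shown$x + shown$y)

# Distance between the recorded and the recomputed depth
shown$gap <- abs(shown$depth - shown$depth2)

print(round(shown, 2))
```

Each of these rows has a gap of at least 1, so each fails the rule. The causes differ: row 1 has z equal to 0, row 2 has a y that is much smaller than x, and row 8 has a recorded depth of 43.0 that is far from the recomputed value.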

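Another way is to write the predicates with standard evaluation, as a list of formulas passed to observe_if_. Since df already carries the four observations made above, the new ones are appended to them, which is why obs() now lists eight rows: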

## Write your predicates first
p <- c(~ x > 0, ~ y > 0, ~ z > 0, 
       ~ abs(depth-depth2) < 1)

## Make observations
df %>% 
  observe_if_(.dots = p) %>% 
  obs()
#> # A tibble: 8 × 8
#>      Id               Predicate Passed Failed Missing      Rows Status
#> * <int>                   <chr>  <int>  <int>   <int>    <list>  <chr>
#> 1     1                   x > 0  53932      8       0 <S3: bit> failed
#> 2     2                   y > 0  53933      7       0 <S3: bit> failed
#> 3     3                   z > 0  53920     20       0 <S3: bit> failed
#> 4     4 abs(depth - depth2) < 1  53840     93       7 <S3: bit> failed
#> 5     5                   x > 0  53932      8       0 <S3: bit> failed
#> 6     6                   y > 0  53933      7       0 <S3: bit> failed
#> 7     7                   z > 0  53920     20       0 <S3: bit> failed
#> 8     8 abs(depth - depth2) < 1  53840     93       7 <S3: bit> failed
#> # ... with 1 more variables: Number_of_trials <int>
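
To illustrate what writing predicates first means, here is a minimal base-R sketch that stores rules as one-sided formulas and evaluates them later against a data frame. It is an assumption-laden illustration, not how observe_if_ is implemented: toy and check are hypothetical names and the data are made up.

```r
# Toy data (hypothetical values)
toy <- data.frame(x = c(1, 0, 2), y = c(3, 4, 0), z = c(0, 5, 6))

# c() on formulas returns a plain list of formulas
p <- c(~ x > 0, ~ y > 0, ~ z > 0)

# Hypothetical helper: evaluate one stored rule against the data
check <- function(f, data) {
  expr <- f[[2]]                           # right-hand side of the formula
  ok <- eval(expr, data, environment(f))   # columns are looked up in data first
  data.frame(Predicate = deparse(expr),
             Passed    = sum(ok, na.rm = TRUE),
             Failed    = sum(!ok, na.rm = TRUE),
             Missing   = sum(is.na(ok)),
             stringsAsFactors = FALSE)
}

# One row per rule, in the order the rules were written
report <- do.call(rbind, lapply(p, check, data = toy))
print(report)
```

Keeping the rules in an ordinary list means they can be built programmatically, stored, and reused on several datasets.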
