The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Introduction to the risks package: Get Started

Installation

Once risks is released on CRAN, install.packages("risks") will be available. Currently, the development version of risks can be installed from GitHub using:

# install.packages("remotes")  # The "remotes" package needs to be installed
remotes::install_github("stopsack/risks")

An example cohort study

We use a cohort of women with breast cancer, as used by Spiegelman and Hertzmark (Am J Epidemiol 2005) and Greenland (Am J Epidemiol 2004). The the categorical exposure is stage, the binary outcome is death, and the binary confounder is receptor.

library(risks)  # provides riskratio(), riskdiff(), postestimation functions
library(dplyr)  # For data handling
library(broom)  # For tidy() model summaries

data(breastcancer)  # Load sample data

breastcancer %>%  # Display the sample data
  group_by(receptor, stage) %>% 
  summarize(deaths = sum(death), total = n(), risk = deaths/total)
#> # A tibble: 6 × 5
#> # Groups:   receptor [2]
#>   receptor stage     deaths total   risk
#>   <chr>    <fct>      <dbl> <int>  <dbl>
#> 1 High     Stage I        5    55 0.0909
#> 2 High     Stage II      17    74 0.230 
#> 3 High     Stage III      9    15 0.6   
#> 4 Low      Stage I        2    12 0.167 
#> 5 Low      Stage II       9    22 0.409 
#> 6 Low      Stage III     12    14 0.857

Adjusted risk ratios

The risk of death is higher among women with higher-stage and hormone receptor-low cancers, which also tend to be of higher stage. Using risks models to obtain (possibly multivariable-adjusted) risk ratios or risk differences is similar to the standard code for logistic models in R. As customary in R, log(RR) is returned; see below for how to obtain exponentiated values. No options for model family or link need to be supplied:

fit_rr <- riskratio(formula = death ~ stage + receptor, data = breastcancer)
summary(fit_rr)
#> 
#> Risk ratio model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"), 
#>     data = breastcancer, start = "(no starting values)")
#> 
#> Deviance Residuals: 
#>    Min      1Q  Median      3Q     Max  
#>                                         
#> 
#> Coefficients: (3 not defined because of singularities)
#>                Estimate Std. Error z value Pr(>|z|)    
#> stageStage I     0.0000     0.0000     NaN      NaN    
#> stageStage II    0.8989     0.3875   2.320   0.0203 *  
#> stageStage III   1.8087     0.3783   4.781 1.75e-06 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 228.15  on 191  degrees of freedom
#> Residual deviance: 185.88  on 188  degrees of freedom
#> AIC: 193.88
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> Confidence intervals for coefficients: (delta method)
#>                    2.5 %   97.5 %
#> stageStage I   0.0000000 0.000000
#> stageStage II  0.1395299 1.658324
#> stageStage III 1.0671711 2.550242

Adjusted risk differences

To obtain risk differences, use riskdiff, which has the same syntax:

fit_rd <- riskdiff(formula = death ~ stage + receptor, data = breastcancer)
summary(fit_rd)
#> 
#> Risk difference model, fitted via marginal standardization of a logistic model with delta method (margstd_delta).
#> Call:
#> stats::glm(formula = death ~ stage + receptor, family = binomial(link = "logit"), 
#>     data = breastcancer, start = "(no starting values)")
#> 
#> Deviance Residuals: 
#>    Min      1Q  Median      3Q     Max  
#>                                         
#> 
#> Coefficients: (3 not defined because of singularities)
#>                Estimate Std. Error z value Pr(>|z|)    
#> stageStage I    0.00000    0.00000     NaN      NaN    
#> stageStage II   0.16303    0.05964   2.734  0.00626 ** 
#> stageStage III  0.57097    0.09962   5.732 9.95e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> (Dispersion parameter for binomial family taken to be 1)
#> 
#>     Null deviance: 228.15  on 191  degrees of freedom
#> Residual deviance: 185.88  on 188  degrees of freedom
#> AIC: 193.88
#> 
#> Number of Fisher Scoring iterations: 4
#> 
#> Confidence intervals for coefficients: (delta method)
#>                     2.5 %    97.5 %
#> stageStage I   0.00000000 0.0000000
#> stageStage II  0.04614515 0.2799187
#> stageStage III 0.37571719 0.7662158

For example, the risk of death was 57 percentage points higher in women with stage III breast cancer compared to stage I (95% confidence interval, 38 to 77 percentage points), adjusting for hormone receptor status.

The model summary in riskratio() and riskdiff() includes to two additions compared to a regular glm() model:

Accessing model coefficients

The risks package provides an interface to broom::tidy(), which returns a data frame of all coefficients (risk differences in this example), their standard errors, z statistics, and confidence intervals.

tidy(fit_rd)
#> # A tibble: 3 × 8
#>   term           estimate std.error statistic   p.value conf.low conf.high model
#>   <chr>             <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr>
#> 1 stageStage I      0        0         NaN    NaN         0          0     marg…
#> 2 stageStage II     0.163    0.0596      2.73   6.26e-3   0.0461     0.280 marg…
#> 3 stageStage III    0.571    0.0996      5.73   9.95e-9   0.376      0.766 marg…


In accordance with glm() standards, coefficients for relative risks are shown on the logarithmic scale. Exponentiated coefficients for risk ratios are easily obtained via tidy(..., exponentiate = TRUE):

tidy(fit_rr, exponentiate = TRUE)
#> # A tibble: 3 × 8
#>   term           estimate std.error statistic   p.value conf.low conf.high model
#>   <chr>             <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl> <chr>
#> 1 stageStage I       1        0        NaN    NaN           1         1    marg…
#> 2 stageStage II      2.46     0.387      2.32   2.03e-2     1.15      5.25 marg…
#> 3 stageStage III     6.10     0.378      4.78   1.75e-6     2.91     12.8  marg…

For example, the risk of death was 6 times higher in women with stage III breast cancer compared to stage I (95% confidence interval, 3 to 13 times), adjusting for hormone receptor status.

More post-estimation commands

Typical R functions that build on regression models can further process fitted risks models. In addition to tidy(), examples are:

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.