Very often we are asked to summarize model results from multiple fits into a nice table. The endpoint might be of different types (e.g., survival, case/control, continuous) and there may be several independent variables that we want to examine univariately or adjusted for certain variables such as age and sex. Locally, the SAS macros %modelsum, %glmuniv, and %logisuni were written to create such summary tables. With the increasing interest in R, we have developed the function modelsum to create similar tables within the R environment.
In developing the modelsum function, the goal was to bring the best features of these macros into an R function. However, the task was not simply to duplicate all the functionality, but rather to make use of R’s strengths (modeling, method dispatch, flexibility in function definition and output format) and make a tool that fits the needs of R users. Additionally, the results needed to fit within the general reproducible research framework so the tables could be displayed within an R markdown report.
This report provides step-by-step directions for using the functions associated with modelsum. All functions presented here are available within the arsenal package. An assumption is made that users are somewhat familiar with R markdown documents. For those who are new to the topic, a good initial resource is available at rmarkdown.rstudio.com.
The first step when using the modelsum function is to load the arsenal package. All the examples in this report use a dataset called mockstudy made available by Paul Novotny which includes a variety of types of variables (character, numeric, factor, ordered factor, survival) to use as examples.
> require(arsenal)
> data(mockstudy) # load data
> dim(mockstudy) # look at how many subjects and variables are in the dataset
[1] 1499 14
> # help(mockstudy) # learn more about the dataset and variables
> str(mockstudy) # quick look at the data
'data.frame': 1499 obs. of 14 variables:
$ case : int 110754 99706 105271 105001 112263 86205 99508 90158 88989 90515 ...
$ age : atomic 67 74 50 71 69 56 50 57 51 63 ...
..- attr(*, "label")= chr "Age in Years"
$ arm : atomic F: FOLFOX A: IFL A: IFL G: IROX ...
..- attr(*, "label")= chr "Treatment Arm"
$ sex : Factor w/ 2 levels "Male","Female": 1 2 2 2 2 1 1 1 2 1 ...
$ race : atomic Caucasian Caucasian Caucasian Caucasian ...
..- attr(*, "label")= chr "Race"
$ fu.time : int 922 270 175 128 233 120 369 421 387 363 ...
$ fu.stat : int 2 2 2 2 2 2 2 2 2 2 ...
$ ps : int 0 1 1 1 0 0 0 0 1 1 ...
$ hgb : num 11.5 10.7 11.1 12.6 13 10.2 13.3 12.1 13.8 12.1 ...
$ bmi : atomic 25.1 19.5 NA 29.4 26.4 ...
..- attr(*, "label")= chr "Body Mass Index (kg/m^2)"
$ alk.phos : int 160 290 700 771 350 569 162 152 231 492 ...
$ ast : int 35 52 100 68 35 27 16 12 25 18 ...
$ mdquality.s: int NA 1 1 1 NA 1 1 1 1 1 ...
$ age.ord : Ord.factor w/ 8 levels "10-19"<"20-29"<..: 6 7 4 7 6 5 4 5 5 6 ...
To create a simple linear regression table (the default), use a formula statement to specify the variables that you want summarized. The example below predicts BMI with the variables sex and age.
> tab1 <- modelsum(bmi ~ sex + age, data=mockstudy)
If you want to take a quick look at the table, you can use summary on your modelsum object and the table will print out as text in your R console window. If you use summary without any options, you will see a number of &nbsp; statements, which translate to “space” in HTML.
If you want a nicer version in your console window, add the text=TRUE option.
> summary(tab1, text=TRUE)
----------------------------------------------------------------------------------
estimate std.error p.value adj.r.squared
------------------ --------------- --------------- --------------- ---------------
(Intercept) 27.5 0.181 <0.001 0.004
sex Female -0.731 0.29 0.012 .
(Intercept) 26.4 0.752 <0.001 0
Age in Years 0.013 0.012 0.290 .
----------------------------------------------------------------------------------
In order for the table to look nice within an R markdown (knitr) report, you just need to specify results="asis" when creating the R chunk. This changes the layout slightly (compresses it) and bolds the variable names.
> summary(tab1)
| | estimate | std.error | p.value | adj.r.squared |
---|---|---|---|---|
(Intercept) | 27.5 | 0.181 | <0.001 | 0.004 |
sex Female | -0.731 | 0.29 | 0.012 | . |
(Intercept) | 26.4 | 0.752 | <0.001 | 0 |
Age in Years | 0.013 | 0.012 | 0.290 | . |
If you want a data.frame version, simply use as.data.frame.
> as.data.frame(tab1)
term model endpoint estimate std.error p.value adj.r.squared
1 (Intercept) 1 bmi 27.500 0.181 NA 0.004
2 sex Female 1 bmi -0.731 0.290 0.012 0.004
3 (Intercept) 2 bmi 26.400 0.752 NA 0.000
4 Age in Years 2 bmi 0.013 0.012 0.290 0.000
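The adjust argument lets the user specify terms that every model in the table should be adjusted for. In the example below, each of arm, ps, and hgb is modeled separately, and each model is adjusted for age and sex.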
> tab2 <- modelsum(alk.phos ~ arm + ps + hgb, adjust= ~age + sex, data=mockstudy)
> summary(tab2)
| | estimate | std.error | p.value | adj.r.squared |
---|---|---|---|---|
(Intercept) | 176 | 20.6 | <0.001 | -0.001 |
Treatment Arm F: FOLFOX | -14 | 8.73 | 0.117 | . |
Treatment Arm G: IROX | -2.2 | 9.86 | 0.820 | . |
sex Female | 3.02 | 7.52 | 0.688 | . |
Age in Years | -0.017 | 0.319 | 0.956 | . |
(Intercept) | 148 | 19.6 | <0.001 | 0.045 |
ps | 46.7 | 5.99 | <0.001 | . |
sex Female | 1.17 | 7.34 | 0.874 | . |
Age in Years | -0.084 | 0.311 | 0.787 | . |
(Intercept) | 337 | 32.2 | <0.001 | 0.031 |
hgb | -14 | 2.14 | <0.001 | . |
sex Female | -6 | 7.52 | 0.426 | . |
Age in Years | 0.095 | 0.314 | 0.763 | . |
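As a rough illustration of what a table like this summarizes, the sketch below fits one linear model per variable of interest, each adjusted for the same terms, using only base R. The helper fit_each and the built-in mtcars data are assumptions made for illustration; this is not a description of how modelsum is implemented internally.

```r
# Hypothetical helper (not part of arsenal): fit y ~ x + adjusters once per x
fit_each <- function(y, xvars, adjust = NULL, data) {
  fits <- lapply(xvars, function(x) {
    # reformulate() builds e.g. mpg ~ wt + cyl from character vectors
    lm(reformulate(c(x, adjust), response = y), data = data)
  })
  setNames(fits, xvars)
}

# One model per variable of interest, each adjusted for cyl
fits <- fit_each("mpg", xvars = c("wt", "hp"), adjust = "cyl", data = mtcars)

# Coefficient table for each separate fit
lapply(fits, function(f) round(coef(summary(f)), 3))
```

Each element of fits is an ordinary lm object, so the usual diagnostics can be run on any one of them.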
To make sure the correct model is run, you need to specify the family argument. The options available right now are: gaussian, binomial, survival, and poisson. If there is enough interest, additional models can be added.
Look at whether there is any evidence that AlkPhos values vary by study arm after adjusting for sex and age (assuming a linear age relationship).
> fit <- lm(alk.phos ~ arm + age + sex, data=mockstudy)
> summary(fit)
Call:
lm(formula = alk.phos ~ arm + age + sex, data = mockstudy)
Residuals:
Min 1Q Median 3Q Max
-168.80 -81.45 -47.17 37.39 853.56
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 175.54808 20.58665 8.527 <2e-16 ***
armF: FOLFOX -13.70062 8.72963 -1.569 0.117
armG: IROX -2.24498 9.86004 -0.228 0.820
age -0.01741 0.31878 -0.055 0.956
sexFemale 3.01598 7.52097 0.401 0.688
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 128.5 on 1228 degrees of freedom
(266 observations deleted due to missingness)
Multiple R-squared: 0.002552, Adjusted R-squared: -0.0006969
F-statistic: 0.7855 on 4 and 1228 DF, p-value: 0.5346
> plot(fit)
The results suggest that the endpoint may need to be transformed. Calculating the Box-Cox transformation suggests a log transformation.
> require(MASS)
> boxcox(fit)
> fit2 <- lm(log(alk.phos) ~ arm + age + sex, data=mockstudy)
> summary(fit2)
Call:
lm(formula = log(alk.phos) ~ arm + age + sex, data = mockstudy)
Residuals:
Min 1Q Median 3Q Max
-3.0098 -0.4470 -0.1065 0.4205 2.0620
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 4.9692474 0.1025239 48.469 <2e-16 ***
armF: FOLFOX -0.0766798 0.0434746 -1.764 0.078 .
armG: IROX -0.0192828 0.0491041 -0.393 0.695
age -0.0004058 0.0015876 -0.256 0.798
sexFemale 0.0179253 0.0374553 0.479 0.632
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.6401 on 1228 degrees of freedom
(266 observations deleted due to missingness)
Multiple R-squared: 0.003121, Adjusted R-squared: -0.0001258
F-statistic: 0.9613 on 4 and 1228 DF, p-value: 0.4278
> plot(fit2)
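To read the Box-Cox result numerically rather than from the plot, a minimal sketch is shown below. It uses simulated data (an assumption, not the mockstudy values) and asks boxcox from MASS to return the profile log-likelihood instead of plotting it.

```r
library(MASS)  # provides boxcox()

set.seed(7)
d <- data.frame(x = runif(500))
# Simulated right-skewed response that is linear in x on the log scale
d$y <- exp(1 + 0.5 * d$x + rnorm(500, sd = 0.8))

# plotit = FALSE returns the lambda grid (x) and profile log-likelihood (y)
bc <- boxcox(y ~ x, data = d, lambda = seq(-2, 2, by = 0.05), plotit = FALSE)

# Lambda with the highest profile log-likelihood
lambda_hat <- bc$x[which.max(bc$y)]
lambda_hat  # a value near 0 points towards a log transformation
```

A lambda near 1 would suggest leaving the response untransformed, and a value near 0.5 would suggest a square root.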
Finally, look to see whether there is a non-linear relationship with age.
> require(gam)
> fit3 <- lm(log(alk.phos) ~ arm + ns(age, df=2) + sex, data=mockstudy)
>
> # test whether there is a difference between models
> anova(fit2,fit3)
Analysis of Variance Table
Model 1: log(alk.phos) ~ arm + age + sex
Model 2: log(alk.phos) ~ arm + ns(age, df = 2) + sex
Res.Df RSS Df Sum of Sq F Pr(>F)
1 1228 503.19
2 1227 502.07 1 1.1137 2.7218 0.09924 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
>
> # look at functional form of age
> termplot(fit3, term=2, se=T, rug=T)
In this instance it looks like there isn’t enough evidence to say that the relationship is non-linear.
The broom package makes it easy to extract information from the fit.
> require(broom)
> tmp <- tidy(fit3) # coefficients, p-values
> class(tmp)
[1] "data.frame"
> tmp
term estimate std.error statistic p.value
1 (Intercept) 4.76454026 0.14102237 33.785704 1.928465e-177
2 armF: FOLFOX -0.07668790 0.04344412 -1.765208 7.777754e-02
3 armG: IROX -0.01945575 0.04906984 -0.396491 6.918118e-01
4 ns(age, df = 2)1 0.33031939 0.26002425 1.270341 2.042041e-01
5 ns(age, df = 2)2 -0.10069469 0.09349337 -1.077025 2.816809e-01
6 sexFemale 0.01829092 0.03742970 0.488674 6.251598e-01
>
> glance(fit3)
r.squared adj.r.squared sigma statistic p.value df logLik
1 0.0053278 0.001274531 0.6396787 1.314445 0.2552466 6 -1195.653
AIC BIC deviance df.residual
1 2405.305 2441.126 502.0747 1227
> ms.logy <- modelsum(log(alk.phos) ~ arm + ps + hgb, data=mockstudy, adjust= ~age + sex,
+ family=gaussian,
+ gaussian.stats=c("estimate","CI.lower.estimate","CI.upper.estimate","p.value"))
> summary(ms.logy)
| | estimate | CI.lower.estimate | CI.upper.estimate | p.value |
---|---|---|---|---|
(Intercept) | 4.97 | 4.77 | 5.17 | <0.001 |
Treatment Arm F: FOLFOX | -0.077 | -0.162 | 0.009 | 0.078 |
Treatment Arm G: IROX | -0.019 | -0.116 | 0.077 | 0.695 |
sex Female | 0.018 | -0.056 | 0.091 | 0.632 |
Age in Years | 0 | -0.004 | 0.003 | 0.798 |
(Intercept) | 4.83 | 4.64 | 5.02 | <0.001 |
ps | 0.226 | 0.167 | 0.284 | <0.001 |
sex Female | 0.009 | -0.063 | 0.081 | 0.814 |
Age in Years | -0.001 | -0.004 | 0.002 | 0.636 |
(Intercept) | 5.76 | 5.45 | 6.08 | <0.001 |
hgb | -0.069 | -0.09 | -0.048 | <0.001 |
sex Female | -0.027 | -0.101 | 0.046 | 0.468 |
Age in Years | 0 | -0.003 | 0.003 | 0.925 |
> boxplot(age ~ mdquality.s, data=mockstudy, ylab=attr(mockstudy$age,'label'), xlab='mdquality.s')
>
> fit <- glm(mdquality.s ~ age + sex, data=mockstudy, family=binomial)
> summary(fit)
Call:
glm(formula = mdquality.s ~ age + sex, family = binomial, data = mockstudy)
Deviance Residuals:
Min 1Q Median 3Q Max
-2.1832 0.4500 0.4569 0.4626 0.4756
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 2.329442 0.514684 4.526 6.01e-06 ***
age -0.002353 0.008256 -0.285 0.776
sexFemale 0.039227 0.195330 0.201 0.841
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 807.68 on 1246 degrees of freedom
Residual deviance: 807.55 on 1244 degrees of freedom
(252 observations deleted due to missingness)
AIC: 813.55
Number of Fisher Scoring iterations: 4
>
> # create odds ratios w/ confidence intervals
> tmp <- data.frame(summary(fit)$coef)
> tmp
Estimate Std..Error z.value Pr...z..
(Intercept) 2.329441734 0.514683688 4.5259677 6.011977e-06
age -0.002353404 0.008255814 -0.2850602 7.755980e-01
sexFemale 0.039227292 0.195330166 0.2008256 8.408350e-01
>
> tmp$OR <- round(exp(tmp[,1]),2)
> tmp$lower.CI <- round(exp(tmp[,1] - 1.96* tmp[,2]),2)
> tmp$upper.CI <- round(exp(tmp[,1] + 1.96* tmp[,2]),2)
> names(tmp)[4] <- 'P-value'
>
> knitr::kable(tmp[,c('OR','lower.CI','upper.CI','P-value')])
| | OR | lower.CI | upper.CI | P-value |
---|---|---|---|---|
(Intercept) | 10.27 | 3.75 | 28.17 | 0.000006 |
age | 1.00 | 0.98 | 1.01 | 0.775598 |
sexFemale | 1.04 | 0.71 | 1.53 | 0.840835 |
>
> # Assess the predictive ability of the model
>
> # code using the pROC package
> require(pROC)
> pred <- predict(fit, type='response')
> tmp <- pROC::roc(mockstudy$mdquality.s[!is.na(mockstudy$mdquality.s)]~ pred, plot=TRUE, percent=TRUE)
> tmp$auc
Area under the curve: 50.69%
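The manual odds-ratio calculation above (estimate plus or minus 1.96 standard errors, then exponentiated) is a Wald interval. A minimal base R sketch of the same idea is given below; the helper or_table and the simulated data are assumptions for illustration and are not part of arsenal or broom.

```r
# Hypothetical helper: Wald odds ratios and confidence limits from a logistic fit
or_table <- function(fit, level = 0.95) {
  est <- coef(fit)
  ci  <- confint.default(fit, level = level)  # Wald limits on the log-odds scale
  data.frame(OR       = exp(est),
             lower.CI = exp(ci[, 1]),
             upper.CI = exp(ci[, 2]),
             p.value  = coef(summary(fit))[, "Pr(>|z|)"])
}

set.seed(1)
d <- data.frame(x = rnorm(200))
d$y <- rbinom(200, 1, plogis(0.5 + 0.8 * d$x))  # simulated binary endpoint
fit_logit <- glm(y ~ x, data = d, family = binomial)

res <- or_table(fit_logit)
round(res, 3)
```

Intervals computed other ways (for example, profile-likelihood limits) can differ slightly from these Wald limits, which is worth keeping in mind when comparing output from different tools.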
The broom package makes it easy to extract information from the fit.
> tidy(fit, exponentiate=TRUE, conf.int=TRUE) # coefficients, p-values, conf.intervals
term estimate std.error statistic p.value conf.low
1 (Intercept) 10.2722053 0.514683688 4.5259677 6.011977e-06 3.8305925
2 age 0.9976494 0.008255814 -0.2850602 7.755980e-01 0.9814436
3 sexFemale 1.0400068 0.195330166 0.2008256 8.408350e-01 0.7119068
conf.high
1 28.876261
2 1.013764
3 1.533763
>
> glance(fit) # model summary statistics
null.deviance df.null logLik AIC BIC deviance df.residual
1 807.6764 1246 -403.7734 813.5468 828.9323 807.5468 1244
> summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial))
| | OR | CI.lower.OR | CI.upper.OR | p.value | concordance | Nmiss |
---|---|---|---|---|---|---|
(Intercept) | NA | NA | NA | <0.001 | 0.507 | 0 |
Age in Years | 0.998 | 0.981 | 1.01 | 0.776 | . | . |
sexFemale | 1.04 | 0.712 | 1.53 | 0.841 | . | . |
(Intercept) | NA | NA | NA | 0.003 | 0.55 | 21 |
Body Mass Index (kg/m^2) | 1.02 | 0.987 | 1.06 | 0.220 | . | . |
sexFemale | 1.05 | 0.717 | 1.56 | 0.794 | . | . |
>
> fitall <- modelsum(mdquality.s ~ age, data=mockstudy, family=binomial,
+ binomial.stats=c("Nmiss2","OR","p.value"))
> summary(fitall)
| | OR | p.value | Nmiss2 |
---|---|---|---|
(Intercept) | NA | <0.001 | 0 |
Age in Years | 0.998 | 0.766 | . |
> require(survival)
Loading required package: survival
>
> # multivariable model with all 3 terms
> fit <- coxph(Surv(fu.time, fu.stat) ~ age + sex + arm, data=mockstudy)
> summary(fit)
Call:
coxph(formula = Surv(fu.time, fu.stat) ~ age + sex + arm, data = mockstudy)
n= 1499, number of events= 1356
coef exp(coef) se(coef) z Pr(>|z|)
age 0.004600 1.004611 0.002501 1.839 0.0659 .
sexFemale 0.039893 1.040699 0.056039 0.712 0.4765
armF: FOLFOX -0.454650 0.634670 0.064878 -7.008 2.42e-12 ***
armG: IROX -0.140785 0.868676 0.072760 -1.935 0.0530 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95 upper .95
age 1.0046 0.9954 0.9997 1.0095
sexFemale 1.0407 0.9609 0.9324 1.1615
armF: FOLFOX 0.6347 1.5756 0.5589 0.7207
armG: IROX 0.8687 1.1512 0.7532 1.0018
Concordance= 0.563 (se = 0.009 )
Rsquare= 0.037 (max possible= 1 )
Likelihood ratio test= 56.21 on 4 df, p=1.811e-11
Wald test = 56.26 on 4 df, p=1.77e-11
Score (logrank) test = 56.96 on 4 df, p=1.259e-11
>
> # check proportional hazards assumption
> fit.z <- cox.zph(fit)
> fit.z
rho chisq p
age -0.0311 1.46 0.226
sexFemale -0.0325 1.44 0.230
armF: FOLFOX 0.0343 1.61 0.205
armG: IROX 0.0337 1.54 0.214
GLOBAL NA 4.59 0.332
> plot(fit.z[1], resid=FALSE) # makes for a cleaner picture in this case
> abline(h=coef(fit)[1], col='red')
>
> # check functional form for age using pspline (penalized spline)
> # results are returned for the linear and non-linear components
> fit2 <- coxph(Surv(fu.time, fu.stat) ~ pspline(age) + sex + arm, data=mockstudy)
> fit2
Call:
coxph(formula = Surv(fu.time, fu.stat) ~ pspline(age) + sex +
arm, data = mockstudy)
coef se(coef) se2 Chisq DF p
pspline(age), linear 0.00443 0.00237 0.00237 3.48989 1.00 0.0617
pspline(age), nonlin 13.11270 3.08 0.0047
sexFemale 0.03993 0.05610 0.05607 0.50663 1.00 0.4766
armF: FOLFOX -0.46240 0.06494 0.06493 50.69608 1.00 1.1e-12
armG: IROX -0.15243 0.07301 0.07299 4.35876 1.00 0.0368
Iterations: 6 outer, 16 Newton-Raphson
Theta= 0.954
Degrees of freedom for terms= 4.1 1.0 2.0
Likelihood ratio test=70.1 on 7.08 df, p=1.59e-12 n= 1499
>
> # plot smoothed age to visualize why significant
> termplot(fit2, se=T, terms=1)
> abline(h=0)
>
> # The c-statistic comes out in the summary of the fit
> summary(fit2)$concordance
C se(C)
0.568432549 0.008779125
>
> # It can also be calculated using the survConcordance function
> survConcordance(Surv(fu.time, fu.stat) ~ predict(fit2), data=mockstudy)
Call:
survConcordance(formula = Surv(fu.time, fu.stat) ~ predict(fit2),
data = mockstudy)
n= 1499
Concordance= 0.5684325 se= 0.008779125
concordant discordant tied.risk tied.time std(c-d)
620221.00 470282.00 5021.00 766.00 19235.49
The broom package makes it easy to extract information from the fit.
> tidy(fit) # coefficients, p-values
term estimate std.error statistic p.value
1 age 0.004600011 0.002501114 1.8391844 6.588807e-02
2 sexFemale 0.039892735 0.056038632 0.7118792 4.765396e-01
3 armF: FOLFOX -0.454650445 0.064878289 -7.0077441 2.421952e-12
4 armG: IROX -0.140784996 0.072759529 -1.9349355 5.299821e-02
conf.low conf.high
1 -0.0003020836 0.009502105
2 -0.0699409642 0.149726435
3 -0.5818095536 -0.327491336
4 -0.2833910528 0.001821061
>
> glance(fit) # model summary statistics
n nevent statistic.log p.value.log statistic.sc p.value.sc
1 1499 1356 56.21071 1.811218e-11 56.9642 1.258749e-11
statistic.wald p.value.wald r.squared r.squared.max concordance
1 56.26 1.770173e-11 0.03680443 0.9999923 0.562838
std.error.concordance logLik AIC BIC
1 0.008779125 -8797.588 17603.18 17624.03
> ##Note: You must use quotes when specifying family="survival"
> ## family=survival will not work
> summary(modelsum(Surv(fu.time, fu.stat) ~ arm,
+ adjust=~age + sex, data=mockstudy, family="survival"))
| | HR | CI.lower.HR | CI.upper.HR | p.value | concordance |
---|---|---|---|---|---|
Treatment Arm F: FOLFOX | 0.635 | 0.559 | 0.721 | <0.001 | 0.563 |
Treatment Arm G: IROX | 0.869 | 0.753 | 1 | 0.053 | . |
sexFemale | 1.04 | 0.932 | 1.16 | 0.477 | . |
age | 1 | 1 | 1.01 | 0.066 | . |
>
> ##Note: the pspline term is not working yet
> #summary(modelsum(Surv(fu.time, fu.stat) ~ arm,
> # adjust=~pspline(age) + sex, data=mockstudy, family='survival'))
Poisson regression is useful when predicting an outcome variable representing counts. It can also be useful when looking at survival data. Cox models and Poisson models are very closely related and survival data can be summarized using Poisson regression. If you have overdispersion (see if the residual deviance is much larger than degrees of freedom), you may want to use quasipoisson() instead of poisson(). Some of these diagnostics need to be done outside of modelsum.
For the first example, use the solder dataset available in the rpart package. The endpoint skips has a definite Poisson look.
> require(rpart) ##just to get access to solder dataset
> data(solder)
> hist(solder$skips)
>
> fit <- glm(skips ~ Opening + Solder + Mask , data=solder, family=poisson)
> anova(fit, test='Chi')
Analysis of Deviance Table
Model: poisson, link: log
Response: skips
Terms added sequentially (first to last)
Df Deviance Resid. Df Resid. Dev Pr(>Chi)
NULL 719 6855.7
Opening 2 2524.56 717 4331.1 < 2.2e-16 ***
Solder 1 936.95 716 3394.2 < 2.2e-16 ***
Mask 3 1653.09 713 1741.1 < 2.2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
> summary(fit)
Call:
glm(formula = skips ~ Opening + Solder + Mask, family = poisson,
data = solder)
Deviance Residuals:
Min 1Q Median 3Q Max
-4.7252 -1.3409 -0.6276 0.6930 5.2342
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.30871 0.08068 -16.222 < 2e-16 ***
OpeningM 0.25851 0.06656 3.884 0.000103 ***
OpeningS 1.89349 0.05363 35.306 < 2e-16 ***
SolderThin 1.09973 0.03864 28.465 < 2e-16 ***
MaskA3 0.42819 0.07547 5.674 1.4e-08 ***
MaskB3 1.20225 0.06697 17.953 < 2e-16 ***
MaskB6 1.86648 0.06310 29.580 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 6855.7 on 719 degrees of freedom
Residual deviance: 1741.1 on 713 degrees of freedom
AIC: 3337.2
Number of Fisher Scoring iterations: 5
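Overdispersion is indicated when the residual deviance is much larger than the residual degrees of freedom. This can be tested approximately using the following code, which compares the residual deviance to a chi-square distribution. The goal is to have a p-value that is \(>0.05\); a very small p-value suggests overdispersion.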
> 1-pchisq(fit$deviance, fit$df.residual)
[1] 0
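The check above can be paired with a direct estimate of the dispersion. The sketch below uses simulated counts (an assumption, not the solder data) to show both the approximate deviance test and the Pearson-based dispersion estimate that the quasipoisson family reports.

```r
set.seed(42)
n <- 500
x <- rnorm(n)
# Negative binomial counts are overdispersed relative to a Poisson model
y <- rnbinom(n, mu = exp(0.5 + 0.3 * x), size = 1)

fit_pois <- glm(y ~ x, family = poisson)

# 1. Approximate goodness-of-fit test: residual deviance vs chi-square
p_dev <- pchisq(deviance(fit_pois), df.residual(fit_pois), lower.tail = FALSE)

# 2. Pearson dispersion estimate; values well above 1 suggest overdispersion
dispersion <- sum(residuals(fit_pois, type = "pearson")^2) / df.residual(fit_pois)

# The quasipoisson fit reports the same dispersion and rescales the standard errors
fit_quasi <- glm(y ~ x, family = quasipoisson)
c(p_dev = p_dev, dispersion = dispersion,
  quasi_dispersion = summary(fit_quasi)$dispersion)
```

Using lower.tail = FALSE avoids the loss of precision that 1 - pchisq(...) can have for very small p-values.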
One possible solution is to use the quasipoisson family instead of the poisson family. This adjusts for the overdispersion.
> fit2 <- glm(skips ~ Opening + Solder + Mask, data=solder, family=quasipoisson)
> summary(fit2)
Call:
glm(formula = skips ~ Opening + Solder + Mask, family = quasipoisson,
data = solder)
Deviance Residuals:
Min 1Q Median 3Q Max
-4.7252 -1.3409 -0.6276 0.6930 5.2342
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -1.30871 0.12496 -10.473 < 2e-16 ***
OpeningM 0.25851 0.10310 2.507 0.012382 *
OpeningS 1.89349 0.08307 22.794 < 2e-16 ***
SolderThin 1.09973 0.05984 18.377 < 2e-16 ***
MaskA3 0.42819 0.11689 3.663 0.000268 ***
MaskB3 1.20225 0.10372 11.591 < 2e-16 ***
MaskB6 1.86648 0.09774 19.097 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 2.399074)
Null deviance: 6855.7 on 719 degrees of freedom
Residual deviance: 1741.1 on 713 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
The broom package makes it easy to extract information from the fit.
> tidy(fit) # coefficients, p-values
term estimate std.error statistic p.value
1 (Intercept) -1.3087062 0.08067587 -16.221780 3.537930e-59
2 OpeningM 0.2585107 0.06656163 3.883780 1.028452e-04
3 OpeningS 1.8934884 0.05363137 35.305612 4.816124e-273
4 SolderThin 1.0997315 0.03863508 28.464582 3.216362e-178
5 MaskA3 0.4281934 0.07546810 5.673833 1.396375e-08
6 MaskB3 1.2022472 0.06696662 17.952933 4.552147e-72
7 MaskB6 1.8664830 0.06309987 29.579826 2.716304e-192
>
> glance(fit) # model summary statistics
null.deviance df.null logLik AIC BIC deviance df.residual
1 6855.69 719 -1661.623 3337.247 3369.302 1741.08 713
> summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="quasipoisson"))
| | RR | CI.lower.RR | CI.upper.RR | p.value |
---|---|---|---|---|
(Intercept) | NA | NA | NA | <0.001 |
Opening M | 1.29 | 0.915 | 1.84 | 0.147 |
Opening S | 6.64 | 5.06 | 8.89 | <0.001 |
(Intercept) | NA | NA | NA | <0.001 |
Solder Thin | 3 | 2.34 | 3.89 | <0.001 |
(Intercept) | NA | NA | NA | 0.007 |
Mask A3 | 1.53 | 0.99 | 2.41 | 0.059 |
Mask B3 | 3.33 | 2.27 | 5.01 | <0.001 |
Mask B6 | 6.47 | 4.53 | 9.53 | <0.001 |
> summary(modelsum(skips~Opening + Solder + Mask, data=solder, family="poisson"))
| | RR | CI.lower.RR | CI.upper.RR | p.value |
---|---|---|---|---|
(Intercept) | NA | NA | NA | <0.001 |
Opening M | 1.29 | 1.14 | 1.48 | <0.001 |
Opening S | 6.64 | 5.99 | 7.39 | <0.001 |
(Intercept) | NA | NA | NA | <0.001 |
Solder Thin | 3 | 2.79 | 3.24 | <0.001 |
(Intercept) | NA | NA | NA | <0.001 |
Mask A3 | 1.53 | 1.32 | 1.78 | <0.001 |
Mask B3 | 3.33 | 2.92 | 3.8 | <0.001 |
Mask B6 | 6.47 | 5.72 | 7.33 | <0.001 |
This second example uses the survival endpoint available in the mockstudy dataset. There is a close relationship between survival and Poisson models, and often it is easier to fit the model using Poisson regression, especially if you want to present absolute risk.
> # add .01 to the follow-up time (.01*1 day) in order to keep everyone in the analysis
> fit <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm, data=mockstudy, family=poisson)
> summary(fit)
Call:
glm(formula = fu.stat ~ offset(log(fu.time + 0.01)) + age + sex +
arm, family = poisson, data = mockstudy)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.1188 -0.4041 0.3242 0.9727 4.3588
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -5.875627 0.108984 -53.913 < 2e-16 ***
age 0.003724 0.001705 2.184 0.0290 *
sexFemale 0.027321 0.038575 0.708 0.4788
armF: FOLFOX -0.335141 0.044600 -7.514 5.72e-14 ***
armG: IROX -0.107776 0.050643 -2.128 0.0333 *
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for poisson family taken to be 1)
Null deviance: 2113.5 on 1498 degrees of freedom
Residual deviance: 2048.0 on 1494 degrees of freedom
AIC: 5888.2
Number of Fisher Scoring iterations: 5
> 1-pchisq(fit$deviance, fit$df.residual)
[1] 0
>
> coef(coxph(Surv(fu.time,fu.stat) ~ age + sex + arm, data=mockstudy))
age sexFemale armF: FOLFOX armG: IROX
0.004600011 0.039892735 -0.454650445 -0.140784996
> coef(fit)[-1]
age sexFemale armF: FOLFOX armG: IROX
0.003723763 0.027320917 -0.335141090 -0.107775577
>
> # results from the Poisson model can then be described as risk ratios (similar to the hazard ratio)
> exp(coef(fit)[-1])
age sexFemale armF: FOLFOX armG: IROX
1.0037307 1.0276976 0.7152372 0.8978291
>
> # As before, we can model the dispersion which alters the standard error
> fit2 <- glm(fu.stat ~ offset(log(fu.time+.01)) + age + sex + arm,
+ data=mockstudy, family=quasipoisson)
> summary(fit2)
Call:
glm(formula = fu.stat ~ offset(log(fu.time + 0.01)) + age + sex +
arm, family = quasipoisson, data = mockstudy)
Deviance Residuals:
Min 1Q Median 3Q Max
-3.1188 -0.4041 0.3242 0.9727 4.3588
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) -5.875627 0.566666 -10.369 <2e-16 ***
age 0.003724 0.008867 0.420 0.675
sexFemale 0.027321 0.200572 0.136 0.892
armF: FOLFOX -0.335141 0.231899 -1.445 0.149
armG: IROX -0.107776 0.263318 -0.409 0.682
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for quasipoisson family taken to be 27.03493)
Null deviance: 2113.5 on 1498 degrees of freedom
Residual deviance: 2048.0 on 1494 degrees of freedom
AIC: NA
Number of Fisher Scoring iterations: 5
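To see why an event indicator with a log follow-up time offset behaves like a survival model, the sketch below simulates event times with a constant hazard, which is the setting where the Poisson rate model coincides with an exponential survival model. All values (sample size, hazard, censoring day, variable names) are assumptions chosen for illustration.

```r
set.seed(123)
n <- 2000
trt <- rbinom(n, 1, 0.5)

# Exponential event times: baseline hazard 0.01 per day, true log hazard ratio 0.5
t_event <- rexp(n, rate = 0.01 * exp(0.5 * trt))
time   <- pmin(t_event, 200)          # administrative censoring at day 200
status <- as.integer(t_event <= 200)  # 1 = event, 0 = censored

# Events per unit of follow-up time, modeled on the log scale
fit_rate <- glm(status ~ trt + offset(log(time)), family = poisson)

# Exponentiated coefficient is a rate ratio, comparable to a hazard ratio here
rate_ratio <- exp(coef(fit_rate)["trt"])
rate_ratio
```

When the baseline hazard is not constant over time, as is likely in real data, the Poisson and Cox coefficients will not agree exactly.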
The broom package makes it easy to extract information from the fit.
> tidy(fit) ##coefficients, p-values
term estimate std.error statistic p.value
1 (Intercept) -5.875626610 0.108984423 -53.9125359 0.000000e+00
2 age 0.003723763 0.001705363 2.1835606 2.899455e-02
3 sexFemale 0.027320917 0.038575062 0.7082533 4.787879e-01
4 armF: FOLFOX -0.335141090 0.044600079 -7.5143610 5.718959e-14
5 armG: IROX -0.107775577 0.050642805 -2.1281518 3.332450e-02
>
> glance(fit) ##model summary statistics
null.deviance df.null logLik AIC BIC deviance df.residual
1 2113.504 1498 -2939.082 5888.164 5914.727 2047.979 1494
Remember that the result from modelsum is different from the fit above. The modelsum summary shows the results for age + offset(log(fu.time+.01)) then sex + offset(log(fu.time+.01)) instead of age + sex + arm + offset(log(fu.time+.01)).
> summary(modelsum(fu.stat ~ age, adjust=~offset(log(fu.time+.01))+ sex + arm,
+ data=mockstudy, family=poisson))
| | RR | CI.lower.RR | CI.upper.RR | p.value |
---|---|---|---|---|
(Intercept) | NA | NA | NA | <0.001 |
Age in Years | 1 | 1 | 1.01 | 0.029 |
armF: FOLFOX | 0.715 | 0.656 | 0.781 | <0.001 |
armG: IROX | 0.898 | 0.813 | 0.991 | 0.033 |
sexFemale | 1.03 | 0.953 | 1.11 | 0.479 |
Here are multiple examples showing how to use some of the different options.
There are standard settings for each type of model regarding what information is summarized in the table. This behavior can be modified using the modelsum.control function. In fact, you can save your preferred settings and reuse them for future tables.
> mycontrols <- modelsum.control(gaussian.stats=c("estimate","std.error","adj.r.squared","Nmiss"),
+ show.adjust=FALSE, show.intercept=FALSE)
> tab2 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy, control=mycontrols)
> summary(tab2)
| | estimate | std.error | adj.r.squared |
---|---|---|---|
Age in Years | 0.012 | 0.012 | 0.004 |
You can also change these settings directly in the modelsum call.
> tab3 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy,
+ gaussian.stats=c("estimate","std.error","adj.r.squared","Nmiss"),
+ show.intercept=FALSE, show.adjust=FALSE)
> summary(tab3)
| | estimate | std.error | adj.r.squared |
---|---|---|---|
Age in Years | 0.012 | 0.012 | 0.004 |
In the above example, age is shown with a label (Age in Years), but sex is listed “as is”. This is because the data was created in SAS and in the SAS dataset, age had a label but sex did not. The label is stored as an attribute within R.
> ## Look at one variable's label
> attr(mockstudy$age,'label')
[1] "Age in Years"
>
> ## See all the variables with a label
> unlist(lapply(mockstudy,'attr','label'))
age arm
"Age in Years" "Treatment Arm"
race bmi
"Race" "Body Mass Index (kg/m^2)"
>
> ## or
> cbind(sapply(mockstudy,attr,'label'))
[,1]
case NULL
age "Age in Years"
arm "Treatment Arm"
sex NULL
race "Race"
fu.time NULL
fu.stat NULL
ps NULL
hgb NULL
bmi "Body Mass Index (kg/m^2)"
alk.phos NULL
ast NULL
mdquality.s NULL
age.ord NULL
If you want to add labels to other variables, there are a couple of options. First, you could add labels to the variables in your dataset.
> attr(mockstudy$age,'label') <- 'Age, yrs'
>
> tab1 <- modelsum(bmi ~ age, adjust=~sex, data=mockstudy)
> summary(tab1)
| | estimate | std.error | p.value | adj.r.squared |
---|---|---|---|---|
(Intercept) | 26.8 | 0.766 | <0.001 | 0.004 |
Age, yrs | 0.012 | 0.012 | 0.348 | . |
sex Female | -0.718 | 0.291 | 0.014 | . |
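The same attribute-based approach extends to several variables at once. The sketch below uses a small made-up data frame and made-up label text (both assumptions) to set labels in a loop and then collect them, using only base R.

```r
df <- data.frame(age = c(67, 74, 50),
                 sex = factor(c("Male", "Female", "Female")))

# Hypothetical label text, keyed by column name
new_labels <- c(age = "Age in Years", sex = "Sex")
for (v in names(new_labels)) {
  attr(df[[v]], "label") <- new_labels[[v]]
}

# Collect the labels, using NA for any column without one
labs <- vapply(df, function(col) {
  lab <- attr(col, "label")
  if (is.null(lab)) NA_character_ else lab
}, character(1))
labs
```

Note that some data manipulations drop attributes, so it can be worth re-checking labels after subsetting or merging.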
Another option is to add labels after you have created the table.
> mylabels <- list(sexFemale = "Female", age ="Age, yrs")
> summary(tab1, labelTranslations = mylabels)
| | estimate | std.error | p.value | adj.r.squared |
---|---|---|---|---|
(Intercept) | 26.8 | 0.766 | <0.001 | 0.004 |
Age, yrs | 0.012 | 0.012 | 0.348 | . |
sex Female | -0.718 | 0.291 | 0.014 | . |
Alternatively, you can check the variable labels and manipulate them with a function called labels, which works on the modelsum object.
> labels(tab1)
bmi age
"Body Mass Index (kg/m^2)" "Age, yrs"
sexFemale
"sex Female"
> labels(tab1) <- c(sexFemale="Female", age="Baseline Age (yrs)")
> labels(tab1)
bmi age
"Body Mass Index (kg/m^2)" "Baseline Age (yrs)"
sexFemale
"Female"
> summary(tab1)
| | estimate | std.error | p.value | adj.r.squared |
---|---|---|---|---|
(Intercept) | 26.8 | 0.766 | <0.001 | 0.004 |
Baseline Age (yrs) | 0.012 | 0.012 | 0.348 | . |
Female | -0.718 | 0.291 | 0.014 | . |
> summary(modelsum(age~mdquality.s+sex, data=mockstudy), show.intercept=FALSE)
| | estimate | std.error | p.value | adj.r.squared | Nmiss |
---|---|---|---|---|---|
mdquality.s | -0.326 | 1.09 | 0.766 | -0.001 | 252 |
sex Female | -1.2 | 0.61 | 0.048 | 0.002 | 0 |
> summary(modelsum(mdquality.s ~ age + bmi, data=mockstudy, adjust=~sex, family=binomial),
+ show.adjust=FALSE)
| | OR | CI.lower.OR | CI.upper.OR | p.value | concordance | Nmiss |
---|---|---|---|---|---|---|
(Intercept) | NA | NA | NA | <0.001 | 0.507 | 0 |
Age, yrs | 0.998 | 0.981 | 1.01 | 0.776 | . | . |
(Intercept) | NA | NA | NA | 0.003 | 0.55 | 21 |
Body Mass Index (kg/m^2) | 1.02 | 0.987 | 1.06 | 0.220 | . | . |
Often one wants to summarize a number of variables. Instead of typing by hand each individual variable, an alternative approach is to create a formula using the paste command with the collapse="+" option.
> # create a vector specifying the variable names
> myvars <- names(mockstudy)
>
> # select the 8th through the 12th
> # paste them together, separated by the + sign
> RHS <- paste(myvars[8:12], collapse="+")
> RHS
[1] "ps+hgb+bmi+alk.phos+ast"
>
> # create a formula using the as.formula function
> as.formula(paste('mdquality.s ~ ', RHS))
mdquality.s ~ ps + hgb + bmi + alk.phos + ast
>
> # use the formula in the modelsum function
> summary(modelsum(as.formula(paste('mdquality.s ~', RHS)), family=binomial, data=mockstudy))
| | OR | CI.lower.OR | CI.upper.OR | p.value | concordance | Nmiss |
---|---|---|---|---|---|---|
(Intercept) | NA | NA | NA | <0.001 | 0.62 | 208 |
ps | 0.461 | 0.332 | 0.639 | <0.001 | . | . |
(Intercept) | NA | NA | NA | 0.783 | 0.573 | 208 |
hgb | 1.18 | 1.04 | 1.33 | 0.011 | . | . |
(Intercept) | NA | NA | NA | 0.002 | 0.549 | 21 |
Body Mass Index (kg/m^2) | 1.02 | 0.987 | 1.06 | 0.225 | . | . |
(Intercept) | NA | NA | NA | <0.001 | 0.552 | 208 |
alk.phos | 0.999 | 0.998 | 1 | 0.159 | . | . |
(Intercept) | NA | NA | NA | <0.001 | 0.545 | 208 |
ast | 0.995 | 0.988 | 1 | 0.099 | . | . |
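Base R also offers reformulate, which builds the same kind of formula from a character vector without manual pasting. A minimal sketch comparing the two approaches follows; the variable names are taken from the example above, and no data are needed to construct the formulas.

```r
vars <- c("ps", "hgb", "bmi")

# Approach 1: paste the right-hand side together, then convert to a formula
f_paste <- as.formula(paste("mdquality.s ~", paste(vars, collapse = " + ")))

# Approach 2: let reformulate() assemble the formula from the term labels
f_reform <- reformulate(vars, response = "mdquality.s")

f_paste
f_reform
all.vars(f_reform)  # endpoint first, then the right-hand-side variables
```

Either formula object can be passed wherever a formula is expected.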
These steps can also be done using the formulize function.
> ## The formulize function does the paste and as.formula steps
> tmp <- formulize('mdquality.s',myvars[8:10])
> tmp
mdquality.s ~ ps + hgb + bmi
<environment: 0x7453118>
>
> ## More complex formulas could also be written using formulize
> tmp2 <- formulize('mdquality.s',c('ps','hgb','sqrt(bmi)'))
>
> ## use the formula in the modelsum function
> summary(modelsum(tmp, data=mockstudy, family=binomial))
| | OR | CI.lower.OR | CI.upper.OR | p.value | concordance | Nmiss |
---|---|---|---|---|---|---|
(Intercept) | NA | NA | NA | <0.001 | 0.62 | 208 |
ps | 0.461 | 0.332 | 0.639 | <0.001 | . | . |
(Intercept) | NA | NA | NA | 0.783 | 0.573 | 208 |
hgb | 1.18 | 1.04 | 1.33 | 0.011 | . | . |
(Intercept) | NA | NA | NA | 0.002 | 0.549 | 21 |
Body Mass Index (kg/m^2) | 1.02 | 0.987 | 1.06 | 0.225 | . | . |
Here are two ways to restrict the analysis to the same subjects (age>50 and in the F: FOLFOX treatment group).
The first approach uses the subset function to create a smaller dataset from mockstudy. This example also selects a subset of variables. The modelsum function is then applied to this subsetted data.
> newdata <- subset(mockstudy, subset=age>50 & arm=='F: FOLFOX', select = c(age,sex, bmi:alk.phos))
> dim(mockstudy)
[1] 1499 14
> table(mockstudy$arm)
A: IFL F: FOLFOX G: IROX
428 691 380
> dim(newdata)
[1] 557 4
> names(newdata)
[1] "age" "sex" "bmi" "alk.phos"
> summary(modelsum(alk.phos ~ ., data=newdata))
| | estimate | std.error | p.value | adj.r.squared | Nmiss |
---|---|---|---|---|---|
(Intercept) | 123 | 46.9 | 0.009 | -0.001 | 0 |
age | 0.619 | 0.719 | 0.390 | . | . |
(Intercept) | 165 | 7.67 | <0.001 | -0.002 | 0 |
sex Female | -5.5 | 12.1 | 0.650 | . | . |
(Intercept) | 239 | 33.7 | <0.001 | 0.01 | 11 |
bmi | -2.8 | 1.21 | 0.022 | . | . |
The second approach uses the subset argument within modelsum to subset the data.
> summary(modelsum(log(alk.phos) ~ sex + ps + bmi, subset=age>50 & arm=="F: FOLFOX", data=mockstudy))
| | estimate | std.error | p.value | adj.r.squared | Nmiss |
---|---|---|---|---|---|
(Intercept) | 4.87 | 0.039 | <0.001 | -0.002 | 0 |
sex Female | -0.005 | 0.062 | 0.931 | . | . |
(Intercept) | 4.77 | 0.04 | <0.001 | 0.027 | 0 |
ps | 0.183 | 0.05 | <0.001 | . | . |
(Intercept) | 5.21 | 0.172 | <0.001 | 0.007 | 11 |
bmi | -0.012 | 0.006 | 0.044 | . | . |
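The equivalence of the two subsetting approaches can be checked with a minimal base R sketch. It uses lm and the built-in mtcars data as stand-ins for modelsum and mockstudy (an assumption made so the code runs without arsenal); the subsetting condition is arbitrary.

```r
# mtcars ships with base R; it stands in for mockstudy here
# Approach 1: subset the data first, then fit
small <- subset(mtcars, subset = hp > 100 & am == 1, select = c(mpg, wt, hp))
fit_presubset <- lm(mpg ~ wt, data = small)

# Approach 2: leave the data alone and pass the condition via subset=
fit_argument <- lm(mpg ~ wt, data = mtcars, subset = hp > 100 & am == 1)

# The same rows are used, so the coefficients agree
all.equal(coef(fit_presubset), coef(fit_argument))
```

The subset argument avoids keeping a second copy of the data around, while pre-subsetting makes it easy to inspect exactly which rows and columns are used.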
> summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset = age>50 & bmi<24, data=mockstudy))
| | estimate | std.error | p.value | adj.r.squared |
---|---|---|---|---|
(Intercept) | 179 | 14.6 | <0.001 | 0.007 |
ps | 20.8 | 13.4 | 0.122 | . |
sex Female | -18 | 16.7 | 0.293 | . |
(Intercept) | 373 | 104 | <0.001 | 0.009 |
bmi | -8.2 | 4.73 | 0.083 | . |
sex Female | -24 | 16.9 | 0.155 | . |
> summary(modelsum(alk.phos ~ ps + bmi, adjust=~sex, subset=1:30, data=mockstudy))
| | estimate | std.error | p.value | adj.r.squared | Nmiss |
---|---|---|---|---|---|
(Intercept) | 169 | 57 | 0.006 | 0.294 | 0 |
ps | 255 | 68.1 | <0.001 | . | . |
sex Female | 49.6 | 67.6 | 0.470 | . | . |
(Intercept) | 453 | 201 | 0.033 | -0.049 | 1 |
bmi | -6 | 7.41 | 0.426 | . | . |
sex Female | -22 | 79.8 | 0.782 | . | . |
> ## create a variable combining the levels of mdquality.s and sex
> with(mockstudy, table(interaction(mdquality.s,sex)))
0.Male 1.Male 0.Female 1.Female
77 686 47 437
> summary(modelsum(age ~ interaction(mdquality.s,sex), data=mockstudy))
| | estimate | std.error | p.value | adj.r.squared | Nmiss |
---|---|---|---|---|---|
(Intercept) | 59.7 | 1.31 | <0.001 | 0.003 | 252 |
interaction(mdquality.s, sex)1.Male | 0.73 | 1.39 | 0.598 | . | . |
interaction(mdquality.s, sex)0.Female | 0.988 | 2.13 | 0.643 | . | . |
interaction(mdquality.s, sex)1.Female | -1 | 1.42 | 0.474 | . | . |
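How interaction names and orders the combined levels can be seen on a small toy example. The vectors below are made up for illustration and only mimic the coding of mdquality.s and sex.

```r
# Toy vectors standing in for mdquality.s and sex (values are made up)
quality <- c(0, 1, 1, 0, 1)
sex <- factor(c("Male", "Male", "Female", "Female", "Male"),
              levels = c("Male", "Female"))

# interaction() builds one factor with a level for every combination;
# by default the first argument varies fastest in the level order
combined <- interaction(quality, sex)
levels(combined)
table(combined)
```

With R's default treatment contrasts, the first level of the combined factor acts as the reference group, so a model reports one coefficient for each remaining level.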
Certain transformations need to be surrounded by I() so that R treats them as arithmetic on the variable rather than as special formula syntax. If the transformation includes any of the symbols / - + ^ * then surround the new variable by I().
> summary(modelsum(arm=="F: FOLFOX" ~ I(age/10) + log(bmi) + mdquality.s,
+ data=mockstudy, family=binomial))
| | OR | CI.lower.OR | CI.upper.OR | p.value | concordance | Nmiss |
---|---|---|---|---|---|---|
(Intercept) | NA | NA | NA | 0.126 | 0.514 | 0 |
Age, yrs | 1.05 | 0.957 | 1.14 | 0.326 | . | . |
(Intercept) | NA | NA | NA | 0.611 | 0.508 | 33 |
Body Mass Index (kg/m^2) | 1.09 | 0.638 | 1.87 | 0.748 | . | . |
(Intercept) | NA | NA | NA | 0.074 | 0.502 | 252 |
mdquality.s | 1.04 | 0.719 | 1.53 | 0.819 | . | . |
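The effect of I() can be seen by asking base R which terms a formula expands to. This is a small sketch using terms() on formulas with placeholder variable names (y, x, a, b); no data are needed.

```r
# Without I(), ^ and / are formula operators, not arithmetic
plain_power <- attr(terms(y ~ x^2), "term.labels")   # collapses to the term x
plain_ratio <- attr(terms(y ~ a/b), "term.labels")   # expands to a and a:b

# With I(), the expression is kept as one transformed variable
kept_power <- attr(terms(y ~ I(x^2)), "term.labels")
kept_ratio <- attr(terms(y ~ I(a/b)), "term.labels")

plain_power
plain_ratio
kept_power
kept_ratio
```

Functions such as log() or sqrt() do not need I(), because a function call is not interpreted as a formula operator.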
> mytab <- modelsum(bmi ~ sex + alk.phos + age, data=mockstudy)
> mytab2 <- mytab[c('age','sex','alk.phos')]
> summary(mytab2)
| | estimate | std.error | p.value | adj.r.squared | Nmiss |
---|---|---|---|---|---|
(Intercept) | 26.4 | 0.752 | <0.001 | 0 | 0 |
Age, yrs | 0.013 | 0.012 | 0.290 | . | . |
(Intercept) | 27.5 | 0.181 | <0.001 | 0.004 | 0 |
sex Female | -0.731 | 0.29 | 0.012 | . | . |
(Intercept) | 27.9 | 0.253 | <0.001 | 0.011 | 261 |
alk.phos | -0.005 | 0.001 | <0.001 | . | . |
> summary(mytab[c('age','sex')])
| | estimate | std.error | p.value | adj.r.squared |
---|---|---|---|---|
(Intercept) | 26.4 | 0.752 | <0.001 | 0 |
Age, yrs | 0.013 | 0.012 | 0.290 | . |
(Intercept) | 27.5 | 0.181 | <0.001 | 0.004 |
sex Female | -0.731 | 0.29 | 0.012 | . |
> summary(mytab[c(3,1)])
| | estimate | std.error | p.value | adj.r.squared |
---|---|---|---|---|
(Intercept) | 26.4 | 0.752 | <0.001 | 0 |
Age, yrs | 0.013 | 0.012 | 0.290 | . |
(Intercept) | 27.5 | 0.181 | <0.001 | 0.004 |
sex Female | -0.731 | 0.29 | 0.012 | . |
modelsum objects together
It is possible to combine two modelsum objects so that they print out together; however, you need to pay attention to the columns that are being displayed. It is easier to combine two models of the same family (such as two sets of linear models). If you want to combine linear and logistic model results, you would want to display the beta coefficients for the logistic model.
> ## demographics
> tab1 <- modelsum(bmi ~ sex + age, data=mockstudy)
> ## lab data
> tab2 <- modelsum(mdquality.s ~ hgb + alk.phos, data=mockstudy, family=binomial)
>
> tab12 <- merge(tab1,tab2)
> class(tab12)
[1] "modelsumList"
>
> ##ERROR: The merge works, but not the summary
> #summary(tab12)
When creating a PDF, the tables are automatically numbered and the title appears below the table. In Word and HTML, the titles appear unnumbered and above the table.
> t1 <- modelsum(bmi ~ sex + age, data=mockstudy)
> summary(t1, title='Demographics')
| | estimate | std.error | p.value | adj.r.squared |
---|---|---|---|---|
(Intercept) | 27.5 | 0.181 | <0.001 | 0.004 |
sex Female | -0.731 | 0.29 | 0.012 | . |
(Intercept) | 26.4 | 0.752 | <0.001 | 0 |
Age, yrs | 0.013 | 0.012 | 0.290 | . |
Depending on the report you are writing you have the following options:
Use all values available for each variable
Use only those subjects who have measurements available for all the variables
> ## look at how many missing values there are for each variable
> apply(is.na(mockstudy),2,sum)
case age arm sex race fu.time
0 0 0 0 7 0
fu.stat ps hgb bmi alk.phos ast
0 266 266 33 266 266
mdquality.s age.ord
252 0
> ## Show how many subjects have each variable (non-missing)
> summary(modelsum(bmi ~ ast + age, data=mockstudy,
+ control=modelsum.control(gaussian.stats=c("N","estimate"))))
| | estimate | N |
---|---|---|
(Intercept) | 27.3 | 1205 |
ast | -0.005 | . |
(Intercept) | 26.4 | 1466 |
Age, yrs | 0.013 | . |
>
> ## Always list the number of missing values
> summary(modelsum(bmi ~ ast + age, data=mockstudy,
+ control=modelsum.control(gaussian.stats=c("Nmiss2","estimate"))))
| | estimate | Nmiss2 |
---|---|---|
(Intercept) | 27.3 | 261 |
ast | -0.005 | . |
(Intercept) | 26.4 | 0 |
Age, yrs | 0.013 | . |
>
> ## Only show the missing values if there are some (default)
> summary(modelsum(bmi ~ ast + age, data=mockstudy,
+ control=modelsum.control(gaussian.stats=c("Nmiss","estimate"))))
| | estimate | Nmiss |
---|---|---|
(Intercept) | 27.3 | 261 |
ast | -0.005 | . |
(Intercept) | 26.4 | 0 |
Age, yrs | 0.013 | . |
>
> ## Don't show N at all
> summary(modelsum(bmi ~ ast + age, data=mockstudy,
+ control=modelsum.control(gaussian.stats=c("estimate"))))
| | estimate |
---|---|
(Intercept) | 27.3 |
ast | -0.005 |
(Intercept) | 26.4 |
Age, yrs | 0.013 |
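The two options for handling missing values (use all available values per variable, or use only subjects complete on every variable) can be contrasted with a minimal base R sketch. The data frame below is made up, and lm stands in for modelsum so the code runs without arsenal.

```r
# Toy data with scattered missing values (made up for illustration)
toy <- data.frame(bmi = c(27, 31, NA, 24, 29, 26),
                  ast = c(20, NA, 35, 41, NA, 30),
                  age = c(55, 62, 70, 48, 66, 59))

# Option 1: each model uses every row available for its own variables
n_ast <- nobs(lm(bmi ~ ast, data = toy))
n_age <- nobs(lm(bmi ~ age, data = toy))

# Option 2: restrict to rows with no missing values in any variable first
complete <- toy[complete.cases(toy), ]
n_ast_cc <- nobs(lm(bmi ~ ast, data = complete))
n_age_cc <- nobs(lm(bmi ~ age, data = complete))

# Number of observations used by each fit
c(ast = n_ast, age = n_age, ast_cc = n_ast_cc, age_cc = n_age_cc)
```

Under the first option the models are fit on different numbers of subjects; under the second every model uses the same subjects, at the cost of discarding some data.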
Within the modelsum.control function there are four options for controlling the number of significant digits shown.
digits: controls the number of significant digits (counting both before and after the decimal point) for continuous variables
nsmall: controls the number of digits after the decimal point for the beta and standard error
nsmall.ratio: controls the number of digits for the ratio statistics (OR, HR, RR), default=2
digits.test: controls the number of digits after the decimal point for p-values (default=3)
> summary(modelsum(bmi ~ sex + age + fu.time, data=mockstudy), digits=4, digits.test=2)
| | estimate | std.error | p.value | adj.r.squared |
---|---|---|---|---|
(Intercept) | 27.49 | 0.1813 | <0.01 | 0.0036 |
sex Female | -0.7311 | 0.2903 | 0.01 | . |
(Intercept) | 26.42 | 0.7521 | <0.01 | 1e-04 |
Age, yrs | 0.013 | 0.0123 | 0.29 | . |
(Intercept) | 26.49 | 0.2447 | <0.01 | 0.0079 |
fu.time | 0.0011 | 3e-04 | <0.01 | . |
It is important to understand how R treats the digits argument. Here are some summaries for the variable pi. Note that with 4 digits, the number of digits after the decimal point changes after multiplying pi by 10 or 100, because digits counts significant digits. The nsmall option, in contrast, specifies the minimum number of digits after the decimal point. The two can be used together (see the help file for format for more details on how that works).
> format(pi, digits=1)
[1] "3"
> format(pi, digits=3)
[1] "3.14"
> format(pi, digits=4)
[1] "3.142"
> format(pi*10, digits=4)
[1] "31.42"
> format(pi*100, digits=4)
[1] "314.2"
> format(pi*100, nsmall=4)
[1] "314.1593"
> format(pi*100, nsmall=2, digits=4)
[1] "314.16"
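The contrast between significant digits and a fixed number of decimal places can be put side by side in a short base R sketch. It uses format and formatC directly; how modelsum applies these settings internally is not shown here.

```r
x <- c(pi, pi * 10, pi * 100)

# digits: total significant digits, so the decimals shrink as x grows
# (format is applied to each element separately)
sig <- vapply(x, format, character(1), digits = 4)
sig

# A fixed number of decimal places regardless of magnitude
dec <- formatC(x, digits = 2, format = "f")
dec
```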
Occasionally it is of interest to fit models using case weights. The modelsum function passes the weights on to the underlying models so that each model is fit with those weights.
> mockstudy$agegp <- cut(mockstudy$age, breaks=c(18,50,60,70,90), right=FALSE)
>
> ## create weights based on agegp and sex distribution
> tab1 <- with(mockstudy,table(agegp, sex))
> tab1
sex
agegp Male Female
[18,50) 152 110
[50,60) 258 178
[60,70) 295 173
[70,90) 211 122
> tab2 <- with(mockstudy, table(agegp, sex, arm))
> gpwts <- rep(tab1, length(unique(mockstudy$arm)))/tab2
>
> ## apply weights to subjects
> index <- with(mockstudy, cbind(as.numeric(agegp), as.numeric(sex), as.numeric(as.factor(arm))))
> mockstudy$wts <- gpwts[index]
>
> ## show weights by treatment arm group
> tapply(mockstudy$wts,mockstudy$arm, summary)
$`A: IFL`
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.923 3.225 3.548 3.502 3.844 4.045
$`F: FOLFOX`
Min. 1st Qu. Median Mean 3rd Qu. Max.
2.033 2.070 2.201 2.169 2.263 2.303
$`G: IROX`
Min. 1st Qu. Median Mean 3rd Qu. Max.
3.667 3.734 4.023 3.945 4.031 4.471
> mockstudy$newvarA <- as.numeric(mockstudy$arm=='A: IFL')
> tab1 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'),
+ family=binomial)
> summary(tab1, title='No Case Weights used')
| | OR | CI.lower.OR | CI.upper.OR | p.value | concordance | Nmiss |
---|---|---|---|---|---|---|
(Intercept) | NA | NA | NA | <0.001 | 0.55 | 210 |
ast | 1 | 0.998 | 1.01 | 0.258 | . | . |
(Intercept) | NA | NA | NA | 0.091 | 0.5 | 29 |
bmi | 1 | 0.98 | 1.03 | 0.808 | . | . |
(Intercept) | NA | NA | NA | 0.990 | 0.514 | 210 |
hgb | 0.965 | 0.894 | 1.04 | 0.372 | . | . |
>
> suppressWarnings({
+ tab2 <- modelsum(newvarA ~ ast + bmi + hgb, data=mockstudy, subset=(arm !='G: IROX'),
+ weights=wts, family=binomial)
+ summary(tab2, title='Case Weights used')
+ })
| | OR | CI.lower.OR | CI.upper.OR | p.value | concordance | Nmiss |
---|---|---|---|---|---|---|
(Intercept) | NA | NA | NA | 0.504 | 0.55 | 210 |
ast | 1 | 1 | 1.01 | 0.068 | . | . |
(Intercept) | NA | NA | NA | 0.820 | 0.5 | 29 |
bmi | 1 | 0.988 | 1.02 | 0.780 | . | . |
(Intercept) | NA | NA | NA | 0.039 | 0.514 | 210 |
hgb | 0.956 | 0.913 | 1 | 0.058 | . | . |
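As a rough illustration of what a case weight does to the point estimates, the sketch below compares a weighted fit with an unweighted fit on data in which one row has been duplicated. It uses lm and the built-in mtcars data as stand-ins (an assumption; the example above uses logistic models on mockstudy).

```r
# mtcars ships with base R; row 1 is given weight 2, all other rows weight 1
w <- c(2, rep(1, nrow(mtcars) - 1))
fit_weighted <- lm(mpg ~ wt, data = mtcars, weights = w)

# Unweighted fit on data where row 1 appears twice
dup <- rbind(mtcars, mtcars[1, ])
fit_duplicated <- lm(mpg ~ wt, data = dup)

# The coefficient estimates agree
all.equal(coef(fit_weighted), coef(fit_duplicated))
```

Only the coefficients are compared here; standard errors and p-values generally differ between the two fits because the residual degrees of freedom are not the same.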
modelsum within an Sweave document
For those users who wish to create tables within an Sweave document, the following code seems to work.
\documentclass{article}
\usepackage{longtable}
\usepackage{pdfpages}
\begin{document}
\section{Read in Data}
<<echo=TRUE>>=
require(arsenal)
require(knitr)
require(rmarkdown)
data(mockstudy)
tab1 <- modelsum(bmi~sex+age, data=mockstudy)
@
\section{Convert Summary.modelsum to LaTeX}
<<echo=TRUE, results='hide', message=FALSE>>=
capture.output(summary(tab1), file="Test.md")
## Convert R Markdown Table to LaTeX
render("Test.md", pdf_document(keep_tex=TRUE))
@
\includepdf{Test.pdf}
\end{document}
modelsum results to a .CSV file
When looking at multiple variables it is sometimes useful to export the results to a CSV file. The as.data.frame function creates a data frame object that can be exported or further manipulated within R.
> summary(tab2, text=TRUE)
-----------------------------------------------------------------------------------------------------------
OR CI.lower.OR CI.upper.OR p.value concordance Nmiss
----------------- -------------- -------------- -------------- -------------- -------------- --------------
(Intercept) NA NA NA 0.504 0.55 210
ast 1 1 1.01 0.068 . .
(Intercept) NA NA NA 0.820 0.5 29
bmi 1 0.988 1.02 0.780 . .
(Intercept) NA NA NA 0.039 0.514 210
hgb 0.956 0.913 1 0.058 . .
-----------------------------------------------------------------------------------------------------------
> tmp <- as.data.frame(tab2)
> tmp
term model endpoint OR CI.lower.OR CI.upper.OR p.value
1 (Intercept) 1 newvarA NA NA NA 0.504
2 ast 1 newvarA 1.000 1.000 1.01 0.068
3 (Intercept) 2 newvarA NA NA NA 0.820
4 bmi 2 newvarA 1.000 0.988 1.02 0.780
5 (Intercept) 3 newvarA NA NA NA 0.039
6 hgb 3 newvarA 0.956 0.913 1.00 0.058
concordance Nmiss
1 0.550 210
2 0.550 210
3 0.500 29
4 0.500 29
5 0.514 210
6 0.514 210
> # write.csv(tmp, '/my/path/here/mymodel.csv')
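The export step can be tried end to end with a small base R sketch. The data frame below is made up and only shaped like the as.data.frame output above, and a temporary file is used in place of a real path.

```r
# Made-up results table shaped like the as.data.frame output above
res <- data.frame(term = c("(Intercept)", "ast"),
                  model = c(1, 1),
                  OR = c(NA, 1.000),
                  p.value = c(0.504, 0.068))

# Write to a temporary file rather than a real path
csv_path <- tempfile(fileext = ".csv")
write.csv(res, csv_path, row.names = FALSE)

# Read it back to confirm the export
back <- read.csv(csv_path)
back
```

Setting row.names = FALSE keeps the row numbers out of the file, which avoids an extra unnamed column when the CSV is opened elsewhere.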
modelsum object to a separate Word or HTML file
> ## write to an HTML document
> # write2html(tab2, "~/ibm/trash.html")
>
> ## write to a Word document
> # write2word(tab2, "~/ibm/trash.doc", title="My table in Word")
The available summary statistics, by model family, are:
binomial, quasibinomial: Logistic regression models
Default: OR, CI.lower.OR, CI.upper.OR, p.value, concordance, Nmiss
Optional: estimate, CI.lower.estimate, CI.upper.estimate, N, Nmiss2, endpoint, std.error, statistic, logLik, AIC, BIC, null.deviance, deviance, df.residual, df.null
gaussian: Linear regression models
Default: estimate, std.error, p.value, adj.r.squared, Nmiss
Optional: CI.lower.estimate, CI.upper.estimate, N, Nmiss2, statistic, standard.estimate, endpoint, r.squared, AIC, BIC, logLik, statistic.F, p.value.F
poisson, quasipoisson: Poisson regression models
Default: RR, CI.lower.RR, CI.upper.RR, p.value, concordance, Nmiss
Optional: CI.lower.estimate, CI.upper.estimate, CI.RR, Nmiss2, se, estimate, z.stat, endpoint, AIC, BIC, logLik, dispersion, null.deviance, deviance, df.residual, df.null
survival: Cox models
Default: HR, CI.lower.HR, CI.upper.HR, p.value, concordance, Nmiss
Optional: CI.lower.estimate, CI.upper.estimate, N, Nmiss2, estimate, se, endpoint, Nevents, z.stat, r.squared, logLik, AIC, BIC, statistic.sc, p.value.sc, p.value.log, p.value.wald, std.error.concordance
The full descriptions of the statistics that can be shown for models are:
N: a count of the number of observations used in the analysis
Nmiss: only show the count of the number of missing values if there are some missing values
Nmiss2: always show a count of the number of missing values for a model
endpoint: dependent variable used in the model
std.error: print the standard error
statistic: test statistic
p.value: print the p-value
r.squared: print the model R-squared
adj.r.squared: print the model adjusted R-squared
concordance: print the model C statistic (which is the AUC for logistic models)
logLik: print the log-likelihood value
p.value.log: print the p-value for the overall model likelihood test
p.value.wald: print the p-value for the overall model Wald test
p.value.sc: print the p-value for the overall model score test
AIC: print the Akaike information criterion
BIC: print the Bayesian information criterion
null.deviance: null deviance
deviance: model deviance
df.residual: degrees of freedom for the residual
df.null: degrees of freedom for the null model
dispersion: used in Poisson models and defined as deviance/df.residual
statistic.sc: overall model score statistic
std.error.concordance: standard error for the C statistic
HR: print the hazard ratio (for survival models), i.e. exp(beta)
CI.lower.HR, CI.upper.HR: print the confidence interval for the HR
OR: print the odds ratio (for logistic models), i.e. exp(beta)
CI.lower.OR, CI.upper.OR: print the confidence interval for the OR
RR: print the risk ratio (for Poisson models), i.e. exp(beta)
CI.lower.RR, CI.upper.RR: print the confidence interval for the RR
estimate: print the beta coefficient
standard.estimate: print the standardized beta coefficient
CI.lower.estimate, CI.upper.estimate: print the confidence interval for the beta coefficient
modelsum.control settings
A quick way to see which arguments a function accepts is to use the args() command. Settings involving the number of digits can be set in modelsum.control or in summary.modelsum.
> args(modelsum.control)
function (digits = 3, nsmall = NULL, nsmall.ratio = 2, digits.test = 3,
show.adjust = TRUE, show.intercept = TRUE, conf.level = 0.95,
binomial.stats = c("OR", "CI.lower.OR", "CI.upper.OR", "p.value",
"concordance", "Nmiss"), gaussian.stats = c("estimate",
"std.error", "p.value", "adj.r.squared", "Nmiss"), poisson.stats = c("RR",
"CI.lower.RR", "CI.upper.RR", "p.value", "concordance",
"Nmiss"), survival.stats = c("HR", "CI.lower.HR", "CI.upper.HR",
"p.value", "concordance", "Nmiss"), ...)
NULL
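Alongside args(), base R's formals() returns the same argument list as an object that can be inspected in code. The sketch below uses lm as a stand-in function so that it runs without arsenal; applying it to modelsum.control is left as an assumption.

```r
# names(formals()) returns the argument names as a character vector
lm_args <- names(formals(lm))
lm_args

# Check programmatically whether a function accepts a given argument
has_weights <- "weights" %in% lm_args
has_subset <- "subset" %in% lm_args
c(weights = has_weights, subset = has_subset)
```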
Settings:
summary.modelsum settings
The summary.modelsum function has options that modify how the table appears (such as adding a title or modifying labels).
> args(arsenal:::summary.modelsum)
function (object, title = NULL, labelTranslations = NULL, digits = NA,
nsmall = NA, nsmall.ratio = NA, digits.test = NA, show.intercept = NA,
show.adjust = NA, text = FALSE, removeBlanks = text, labelSize = 1.2,
pfootnote = TRUE, ...)
NULL
Settings: