This vignette of package maxEff (Github, RPubs) documents three methods of selecting one from many numeric predictors for a regression model, to ensure that the additional predictor has the maximum effect size. The three methods are implemented in functions add_num(), add_dummy() and add_dummy_partition().
Examples in this vignette require that package maxEff is attached to the search path. Users should remove the parameter mc.cores = 1L from all examples and use the default option, which engages all CPU cores on the current host on macOS. The authors are forced to use mc.cores = 1L in this vignette in order to pass CRAN’s submission check.
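As a brief illustration of restoring the default behaviour, base R's parallel package reports the core count of the current host (this snippet is illustrative and not part of the maxEff API):

```r
library(parallel)
# number of CPU cores on the current host; on macOS/Linux,
# mc.cores = detectCores() engages all of them via forking
n_cores <- detectCores()
n_cores
```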
Term / Abbreviation | Description | Reference |
---|---|---|
Forward pipe operator | Introduced in R 4.1.0 | ?base::pipeOp |
abs | Absolute value | base::abs |
coxph | Cox model | survival::coxph |
CRAN, R | The Comprehensive R Archive Network | https://cran.r-project.org |
factor | Factor, or categorical variable | base::factor |
function | R function | base::`function` |
groupedHyperframe | Grouped hyper data frame | groupedHyperframe::as.groupedHyperframe |
head | First parts of an object | utils::head; utils:::head.default |
hypercolumns, hyperframe | (Hyper columns of) hyper data frame | spatstat.geom::hyperframe |
labels | Labels from object | base::labels; maxEff::labels.node1 |
levels | Levels of a factor | base::levels |
listof | List of objects | stats::listof |
logistic | Logistic regression model | stats::glm(., family = binomial('logit')) |
matrix | Matrix | base::matrix |
partition | Stratified partition | maxEff::statusPartition, caret::createDataPartition |
PFS | Progression/recurrence-free survival | https://en.wikipedia.org/wiki/Progression-free_survival |
predict | Model prediction | stats::predict; maxEff::predict.add_num; maxEff::predict.add_dummy |
quantile | Quantile | stats::quantile |
rpart | Recursive partitioning and regression trees | rpart::rpart |
S3, generic, methods | S3 object-oriented system | base::UseMethod; utils::methods; utils::getS3method; https://adv-r.hadley.nz/s3.html |
sort_by | Sort an object by some criterion | base::sort_by; maxEff::sort_by.add_ |
subset | Subsets of object by conditions | base::subset; maxEff::subset.add_dummy |
Surv | Survival object | survival::Surv |
update | Update and re-fit a model call | stats::update |
This work is supported by NCI R01CA222847 (I. Chervoneva, T. Zhan, and H. Rui) and R01CA253977 (H. Rui and I. Chervoneva).
node1(): Dichotomization via First Node of Recursive Partitioning

Consider the following recursive partitioning and regression example,
data(cu.summary, package = 'rpart')
(r = rpart(Price ~ Mileage, data = cu.summary, cp = .Machine$double.eps, maxdepth = 1L))
#> n=60 (57 observations deleted due to missingness)
#>
#> node), split, n, deviance, yval
#> * denotes terminal node
#>
#> 1) root 60 983551500 12615.67
#> 2) Mileage>=24.5 25 206417200 9730.16 *
#> 3) Mileage< 24.5 35 420299300 14676.74 *
Function node1() creates a dichotomizing rule, i.e., a function, based on the first node of recursive partitioning,
(foo = r |> node1())
#>
#> Dichotomizing Rule based on Recursive Partitioning:
#>
#> function (newx)
#> {
#> ret <- (newx >= 24.5)
#> if (all(ret, na.rm = TRUE) || !any(ret, na.rm = TRUE))
#> warning("Dichotomized value is all-0 or all-1")
#> return(ret)
#> }
#> <environment: 0x125350040>
The dichotomizing rule foo converts a numeric object to a logical object. Developers may obtain the numeric cutoff value of foo, or a brief text describing foo, for further analysis.
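A minimal sketch of how such a rule behaves, using a standalone reproduction foo_demo of the function printed above (foo_demo is illustrative only, not part of the maxEff API):

```r
# standalone reproduction of the dichotomizing rule printed above;
# returns TRUE where the new value is at or above the cutoff 24.5
foo_demo <- function(newx) {
  ret <- (newx >= 24.5)
  if (all(ret, na.rm = TRUE) || !any(ret, na.rm = TRUE))
    warning("Dichotomized value is all-0 or all-1")
  return(ret)
}
foo_demo(c(20, 24.5, 30, NA))
#> [1] FALSE  TRUE  TRUE    NA
```

The brief text describing the rule is available through the S3 generic labels() (method maxEff::labels.node1, listed in the terms table above).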
statusPartition(): Stratified Partition by Survival Status

Function statusPartition() performs a stratified partition based on the survival status of a Surv response, to avoid the situation that the Cox model is degenerate when all subjects in a partition are censored. This deviates from the very popular function caret::createDataPartition(), which stratifies a Surv response by the quantiles of the survival time.
y = (survival::capacitor) |>
with(expr = Surv(time, status))
set.seed(15); id = y |>
statusPartition(times = 1L, p = .5)
table(y[id[[1L]], 2L]) / table(y[,2L]) # balanced by status
#>
#> 0 1
#> 0.5 0.5
set.seed(15); id0 = y |>
caret::createDataPartition(times = 1L, p = .5)
table(y[id0[[1L]], 2L]) / table(y[,2L]) # *not* balanced by status
#>
#> 0 1
#> 0.46875 0.56250
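The idea of stratifying by status can be sketched in base R (a simplified illustration, not the actual maxEff::statusPartition implementation):

```r
# simplified base-R sketch of status-stratified sampling:
# draw p = 50% of the indices within each status stratum separately,
# so censored (0) and events (1) are equally represented in the sample
status <- c(0, 0, 0, 0, 1, 1, 1, 1)
set.seed(1)
id <- unlist(lapply(
  split(seq_along(status), status),
  function(i) sample(i, size = length(i) / 2)
))
table(status[id]) / table(status)  # balanced by status, as above
```

Sampling within each stratum guarantees the training proportion of events, which quantile-based stratification of the survival time does not.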
Data set Ki67 in package groupedHyperframe is a groupedHyperframe.
data(Ki67, package = 'groupedHyperframe')
Ki67
#> Grouped Hyperframe: ~patientID/tissueID
#>
#> 645 tissueID nested in
#> 622 patientID
#>
#> logKi67 tissueID Tstage PFS recfreesurv_mon recurrence adj_rad adj_chemo
#> 1 (numeric) TJUe_I17 2 100+ 100 0 FALSE FALSE
#> 2 (numeric) TJUe_G17 1 22 22 1 FALSE FALSE
#> 3 (numeric) TJUe_F17 1 99+ 99 0 FALSE NA
#> 4 (numeric) TJUe_D17 1 99+ 99 0 FALSE TRUE
#> 5 (numeric) TJUe_J18 1 112 112 1 TRUE TRUE
#> 6 (numeric) TJUe_N17 4 12 12 1 TRUE FALSE
#> 7 (numeric) TJUe_J17 2 64+ 64 0 FALSE FALSE
#> 8 (numeric) TJUe_F19 2 56+ 56 0 FALSE FALSE
#> 9 (numeric) TJUe_P19 2 79+ 79 0 FALSE FALSE
#> 10 (numeric) TJUe_O19 2 26 26 1 FALSE TRUE
#> histology Her2 HR node race age patientID
#> 1 3 TRUE TRUE TRUE White 66 PT00037
#> 2 3 FALSE TRUE FALSE Black 42 PT00039
#> 3 3 FALSE TRUE FALSE White 60 PT00040
#> 4 3 FALSE TRUE TRUE White 53 PT00042
#> 5 3 FALSE TRUE TRUE White 52 PT00054
#> 6 2 TRUE TRUE TRUE Black 51 PT00059
#> 7 3 FALSE TRUE TRUE Asian 50 PT00062
#> 8 2 TRUE TRUE TRUE White 37 PT00068
#> 9 3 TRUE TRUE FALSE White 68 PT00082
#> 10 2 TRUE TRUE FALSE Black 55 PT00084
s = Ki67 |>
aggregate_quantile(by = ~ patientID, probs = seq.int(from = .01, to = .99, by = .01))
#> Column(s) tissueID removed; as they are not identical per aggregation-group
Candidates for the additional predictor are stored in a numeric-hypercolumn logKi67.quantile. Users are encouraged to learn more about the groupedHyperframe class and the function aggregate_quantile() from the vignette of package groupedHyperframe.
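In spirit, aggregate_quantile() computes per-group quantiles of a numeric column; a minimal base-R analogue (illustrative only, ignoring the hyperframe machinery of the actual implementation):

```r
# per-patient quantiles of a numeric measurement, analogous in spirit
# to aggregating a numeric hypercolumn by ~ patientID
x <- data.frame(
  patientID = rep(c('PT1', 'PT2'), each = 5L),
  logKi67   = c(1:5, 6:10)
)
probs <- c(.25, .5, .75)
do.call(rbind, lapply(split(x$logKi67, x$patientID), quantile, probs = probs))
#>     25% 50% 75%
#> PT1   2   3   4
#> PT2   7   8   9
```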
Partition into a training (80%) and test (20%) set.
set.seed(234); id = sample.int(n = nrow(s), size = nrow(s)*.8) |> sort.int()
s0 = s[id, , drop = FALSE] # training set
s1 = s[-id, , drop = FALSE] # test set
Let’s consider a starting model of endpoint PFS with predictor Tstage on the training set s0.
summary(m <- coxph(PFS ~ Tstage, data = s0))
#> Warning in as.data.frame.hyperframe(data, optional = TRUE): 1 variable
#> discarded in conversion to data frame
#> Warning in as.data.frame.hyperframe(data, optional = TRUE): 1 variable
#> discarded in conversion to data frame
#> Warning in as.data.frame.hyperframe(data): 1 variable discarded in conversion
#> to data frame
#> Call:
#> coxph(formula = PFS ~ Tstage, data = s0)
#>
#> n= 497, number of events= 90
#>
#> coef exp(coef) se(coef) z Pr(>|z|)
#> Tstage 0.7162 2.0466 0.1020 7.019 2.23e-12 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> exp(coef) exp(-coef) lower .95 upper .95
#> Tstage 2.047 0.4886 1.676 2.5
#>
#> Concordance= 0.679 (se = 0.028 )
#> Likelihood ratio test= 39.61 on 1 df, p=3e-10
#> Wald test = 49.27 on 1 df, p=2e-12
#> Score (logrank) test = 54.96 on 1 df, p=1e-13
numeric Predictor with Maximum Effect Size

Function add_num() treats each additional predictor as a numeric variable, and updates the starting model with each additional predictor. Function add_num() returns an 'add_num' object, which is a listof objects with an internal class 'add_num_'.
The S3 generic sort_by() sorts the 'add_num' object by the absolute value of the regression coefficient (i.e., effect size) of the additional numeric predictor.

The S3 generic head() chooses the top n elements from the object returned by the previous step.
set.seed(14837); m1 = m |>
add_num(x = ~ logKi67.quantile, mc.cores = 1L) |>
sort_by(y = abs(effsize)) |>
head(n = 2L)
m1
#> .slice(logKi67.quantile, "0.99")
#> .slice(logKi67.quantile, "0.98")
The S3 generic predict() uses model m1 on the test set s1.
m1[1L] |> predict(newdata = s1)
#> .slice(logKi67.quantile, "0.99") :
#> Call:
#> coxph(formula = PFS ~ Tstage + x., data = newd)
#>
#> coef exp(coef) se(coef) z p
#> Tstage 0.4068 1.5020 0.2359 1.724 0.0847
#> x. 0.1114 1.1179 0.1926 0.579 0.5629
#>
#> Likelihood ratio test=3.21 on 2 df, p=0.2008
#> n= 125, number of events= 28
logical Predictor with Maximum Effect Size

add_dummy(): Naive Practice

Function add_dummy() partitions each additional numeric predictor into a logical variable using function node1(), and updates the starting model by adding in each of the dichotomized logical predictors. Function add_dummy() returns an 'add_dummy' object, which is a listof of node1 objects.
The S3 generic subset() subsets the 'add_dummy' object by the balance of the partition of the additional predictor.

The S3 generic sort_by() sorts the 'add_dummy' object by the absolute value of the regression coefficient (i.e., effect size) of the additional logical predictor.

The S3 generic head() chooses the top n elements from the object returned by the previous step.
set.seed(14837); m2 = m |>
add_dummy(x = ~ logKi67.quantile, mc.cores = 1L) |>
subset(subset = p1 > .05 & p1 < .95) |>
sort_by(y = abs(effsize)) |>
head(n = 2L)
m2
#> .slice(logKi67.quantile, "0.7")>=6.60981019916946
#> .slice(logKi67.quantile, "0.01")>=6.02789726883965
The S3 generic predict() uses model m2 on the test set s1.
m2[1L] |> predict(newdata = s1)
#> .slice(logKi67.quantile, "0.7")>=6.60981019916946 :
#> Call:
#> coxph(formula = PFS ~ Tstage + x., data = newd)
#>
#> coef exp(coef) se(coef) z p
#> Tstage 0.3644 1.4397 0.2402 1.517 0.1292
#> x.TRUE 0.6600 1.9348 0.3841 1.718 0.0857
#>
#> Likelihood ratio test=5.83 on 2 df, p=0.05415
#> n= 125, number of events= 28
add_dummy_partition(): via Repeated Partitions

Function add_dummy_partition() partitions each additional numeric predictor into a logical variable in the following steps.

1. Generate repeated stratified partitions of the data into training and test sets.
2. For each partition, create a dichotomizing rule (node1()) on the training set. Apply this dichotomizing rule on the test set and obtain the estimated regression coefficient (i.e., effect size) of the additional logical predictor.
3. Select, among the repeated partitions, the one with the median effect size to represent the additional logical predictor.

Function add_dummy_partition() also returns an 'add_dummy' object.
set.seed(14837); m3 = m |>
add_dummy_partition(~ logKi67.quantile, times = 20L, mc.cores = 1L) |>
subset(subset = p1 > .15 & p1 < .85) |>
sort_by(y = abs(effsize), decreasing = TRUE) |>
head(n = 2L)
m3
#> .slice(logKi67.quantile, "0.14")>=6.26711450931504
#> .slice(logKi67.quantile, "0.26")>=6.38090570405786
m3[1L] |> predict(newdata = s1)
#> .slice(logKi67.quantile, "0.14")>=6.26711450931504 :
#> Call:
#> coxph(formula = PFS ~ Tstage + x., data = newd)
#>
#> coef exp(coef) se(coef) z p
#> Tstage 0.3912 1.4788 0.2348 1.666 0.0956
#> x.TRUE 0.3783 1.4598 0.3918 0.966 0.3343
#>
#> Likelihood ratio test=3.79 on 2 df, p=0.1505
#> n= 125, number of events= 28