The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
This package provides tools for assessing the differences between logistic regression model-fits across sub-populations. For example, having data with sub-populations based on gender may or may not require separate models.
Consider two sub-populations of Portuguese Language course students, with 29 covariates and an output variable indicating whether the student has failed the course. These sub-populations differ by gender (M/F). These two sub-populations provide two logistic regression models, which may or may not be equivalent.
The purpose of this vignette is to exemplify to usage of equivalence tests for different stages in the model: the linear coefficients, the vector of per-example log odds ratio and the mean-square error of prediction. Each method provides a different aspect of possible model equivalence.
The data used is taken from the Student Performance Data1,2.
library(LogRegEquiv)
The regression coefficient vectors provide us an insight to how a model describes the phenomenon. In our case, how is the binary variable final_failure
affected by the covariates. When two models describe a phenomenon in a similar manner, descriptive equivalence is achieved.
"final_fail ~ ."
formula <- glm(formula = formula, family = binomial(link = "logit"),
female_model <-data = ptg_stud_f_train)
glm(formula = formula, family = binomial(link = "logit"),
male_model <-data = ptg_stud_m_train)
Denote δβ as the maximal tolerated difference per coefficient, in this case δβ=0.1.
0.1
delta_beta <-print(beta_equivalence(model_a = female_model,
model_b = male_model,
delta = delta_beta,
alpha = 0.05))
#> $equivalence
#> [,1]
#> [1,] TRUE
#>
#> $test_statistic
#> [,1]
#> [1,] 27.98613
#>
#> $critical_value
#> [1] 117.3907
#>
#> $ncp
#> [,1]
#> [1,] 114.8285
#>
#> $p_value
#> [,1]
#> [1,] 1.213615e-15
It is possible to assess descriptive equivalence even without creating the models, just by providing the descriptive_equiv
function a regression formula, two datasets and a delta
argument:
print(descriptive_equiv(data_a = ptg_stud_f_train,
data_b = ptg_stud_m_train,
formula = formula,
delta = delta_beta,
alpha = 0.05))
#> $equivalence
#> $equivalence$equivalence
#> [,1]
#> [1,] TRUE
#>
#> $equivalence$test_statistic
#> [,1]
#> [1,] 27.98613
#>
#> $equivalence$critical_value
#> [1] 117.3907
#>
#> $equivalence$ncp
#> [,1]
#> [1,] 114.8285
#>
#> $equivalence$p_value
#> [,1]
#> [1,] 1.213615e-15
#>
#>
#> $model_a
#>
#> Call: glm(formula = as.formula(formula), family = binomial(link = "logit"),
#> data = data_a)
#>
#> Coefficients:
#> (Intercept) schoolMS age addressU
#> -36.98368 24.34098 0.84479 -0.40450
#> famsizeLE3 PstatusT Medu Fedu
#> 0.91066 -0.03428 -0.01533 -0.86226
#> Mjobhealth Mjobother Mjobservices Mjobteacher
#> 1.20812 -0.42232 -0.30637 0.10319
#> Fjobhealth Fjobother Fjobservices Fjobteacher
#> 5.20915 0.96372 -1.53855 7.35800
#> reasonhome reasonother reasonreputation guardianmother
#> 1.14750 1.03742 0.36426 1.85646
#> guardianother traveltime studytime failures
#> -0.41585 -1.41104 -0.57161 2.99493
#> schoolsupyes famsupyes paidyes activitiesyes
#> 0.80975 2.13445 1.99467 -0.25541
#> nurseryyes higheryes internetyes romanticyes
#> 3.34798 -1.59043 0.37851 -1.03869
#> famrel freetime goout Dalc
#> -0.60406 0.18248 -0.86391 0.12718
#> Walc health absences
#> 0.89384 -0.26370 -0.07661
#>
#> Degrees of Freedom: 305 Total (i.e. Null); 267 Residual
#> Null Deviance: 237.3
#> Residual Deviance: 95.59 AIC: 173.6
#>
#> $model_b
#>
#> Call: glm(formula = as.formula(formula), family = binomial(link = "logit"),
#> data = data_b)
#>
#> Coefficients:
#> (Intercept) schoolMS age addressU
#> -13.03371 2.66321 -0.34577 -0.33386
#> famsizeLE3 PstatusT Medu Fedu
#> -1.00660 17.07648 0.20849 -0.56054
#> Mjobhealth Mjobother Mjobservices Mjobteacher
#> -0.64558 -0.76138 -0.59806 -1.25666
#> Fjobhealth Fjobother Fjobservices Fjobteacher
#> 1.73168 0.67765 1.20543 2.27924
#> reasonhome reasonother reasonreputation guardianmother
#> -0.09858 -0.83649 -0.48737 1.07253
#> guardianother traveltime studytime failures
#> -2.14502 -0.28351 0.16529 0.95858
#> schoolsupyes famsupyes paidyes activitiesyes
#> 1.17323 -0.31895 0.89259 -0.57924
#> nurseryyes higheryes internetyes romanticyes
#> -0.01788 -2.61458 0.43417 0.80444
#> famrel freetime goout Dalc
#> 0.17551 -0.18980 -0.13723 -0.36880
#> Walc health absences
#> 0.32404 0.13664 0.15950
#>
#> Degrees of Freedom: 212 Total (i.e. Null); 174 Residual
#> Null Deviance: 205.8
#> Residual Deviance: 116.3 AIC: 194.3
It is also possible to use different δβ value for each coefficient. In such case, the delta
argument should be provided with a vector whose length matches the number of covariates in the model. Mind the intercept and the fact that categorical variables with k≥3 values are represented in the model by multiple variables. In this example the data has 29 covariates but they are represented by 39 covariates in the model:
set.seed(1)
0.01 * runif(39)
delta_beta_vec <-print(beta_equivalence(model_a = female_model,
model_b = male_model,
delta = delta_beta_vec,
alpha = 0.05))
#> $equivalence
#> [,1]
#> [1,] FALSE
#>
#> $test_statistic
#> [,1]
#> [1,] 27.98613
#>
#> $critical_value
#> [1] 25.92615
#>
#> $ncp
#> [,1]
#> [1,] 0.3496614
#>
#> $p_value
#> [,1]
#> [1,] 0.0890721
Assume we have an existing model MA based on population A. This is our source. Given a new target population B of size k and model MB, we would like to check whether the source model fits the target population. We do so by comparing the log-odds produced by the models for the target population samples. When two models produce equivalent log-odds, individual predictive equivalence is achieved.
To set the indifference level for the individual predictive equivalence δθ, we propose looking at the distances between the estimated log odds and the classification threshold. Consider {ˆθ1,...,ˆθm} to be the m estimated log-odd values for the test population for the original model, meaning that these are predictions for a test set of population A XAtest using model MA. In a calibrated logistic regression, the classification of the ith subject by the model (ˆyi) is 1 when ˆθi>0 and 0 otherwise. The absolute log-odds |ˆθi| is minimal change to the log-odds that could change to the classification. We therefore propose setting the threshold δθ to be a small quantile r (say r=0.1) of the observed distribution of absolute log-odds δθ=|ˆθ|(⌈r⋅m⌉).
Using r=0.05, we get that both models provide equivalent log-odds for the female testing data:
0.05
r <-print(individual_predictive_equiv(model_a = female_model,
model_b = male_model,
test_data = ptg_stud_f_test,
r = r,
alpha = 0.05))
#> $equivalence
#> [1] FALSE
#>
#> $test_statistic
#> [1] 1.642021
#>
#> $critical_value
#> [1] -1.665151
#>
#> $xi_bar
#> [1] 20.50852
#>
#> $delta_theta
#> [1] 19.03885
#>
#> $p_value
#> [1] 0.9476412
On the other hand, using r=0.025 we get that both models do not provide equivalent log-odds for the male testing data:
0.025
r <-print(individual_predictive_equiv(model_a = female_model,
model_b = male_model,
test_data = ptg_stud_m_test,
r = r,
alpha = 0.05))
#> $equivalence
#> [1] FALSE
#>
#> $test_statistic
#> [1] 5.306304
#>
#> $critical_value
#> [1] -1.674689
#>
#> $xi_bar
#> [1] 20.34987
#>
#> $delta_theta
#> [1] 14.84435
#>
#> $p_value
#> [1] 0.9999988
The Brier score is reducing the assessment of the overall model performance into a single figure. Given models MA,MB and a test set Xtest of size m, we can assess the equivalence of the Brier scores obtained by the models. The acceptable score degradation δB>1 should be selected, we then see the Brier scores as equivalent if 1δ2B<BSBBSA<δ2B. The dv_index
variable indicates the column number of the dependent variable.
ptg_stud_m_test
testing_data <-print(performance_equiv(model_a = female_model,
model_b = male_model,
test_data = testing_data,
dv_index = 30,
delta_B = 1.1,
alpha = 0.05))
#> $equivalence
#> [1] FALSE
#>
#> $brier_score_ac
#> [1] 0.1886792
#>
#> $brier_score_bc
#> [1] 0.1239275
#>
#> $diff_sd_l
#> [1] 0.2183946
#>
#> $diff_sd_u
#> [1] 0.3207404
#>
#> $test_stat_l
#> [1] -1.066901
#>
#> $test_stat_u
#> [1] -2.369071
#>
#> $crit_val
#> [1] 2.006647
#>
#> $epsilon_B
#> [1] 1.21
#>
#> $p_value_l
#> [1] 0.1454725
#>
#> $p_value_u
#> [1] 0.01079071
print(performance_equiv(model_a = female_model,
model_b = male_model,
test_data = ptg_stud_f_test,
dv_index = 30,
delta_B = 1.1,
alpha = 0.05))
#> $equivalence
#> [1] FALSE
#>
#> $brier_score_ac
#> [1] 0.1176009
#>
#> $brier_score_bc
#> [1] 0.09829823
#>
#> $diff_sd_l
#> [1] 0.1817481
#>
#> $diff_sd_u
#> [1] 0.2553075
#>
#> $test_stat_l
#> [1] 0.05346545
#>
#> $test_stat_u
#> [1] -1.51225
#>
#> $crit_val
#> [1] 1.991673
#>
#> $epsilon_B
#> [1] 1.21
#>
#> $p_value_l
#> [1] 0.5212493
#>
#> $p_value_u
#> [1] 0.06730882
P. Cortez and A. Silva. Using Data Mining to Predict Secondary School Student Performance. In A. Brito and J. Teixeira Eds., Proceedings of 5th FUture BUsiness TEChnology Conference (FUBUTEC 2008) pp. 5-12, Porto, Portugal, April, 2008, EUROSIS, ISBN 978-9077381-39-7. See also http://www3.dsi.uminho.pt/pcortez/student.pdf↩︎
Data retrieved from the UC Irvine Machine Learning Repository.↩︎
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.