The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

BranchGLM Vignette

1 Fitting GLMs

BranchGLM() allows fitting of gaussian, binomial, gamma, and Poisson GLMs with a variety of links available.
Parallel computation can also be done to speed up the fitting process, but it is only useful for larger datasets.

1.1 Optimization methods

The optimization method can be specified, the default method is fisher scoring, but BFGS and L-BFGS are also available.
BFGS and L-BFGS typically perform better when there are many predictors in the model (at least 50 predictors), otherwise fisher scoring is typically faster.
The grads argument is for L-BFGS only and it is the number of gradients that are stored at a time and are used to approximate the inverse information. The default value for this is 10, but another common choice is 5.
The tol argument controls how strict the convergence criteria are, lower values of this will lead to more accurate results, but may also be slower.
The method argument is ignored for linear regression and the OLS solution is used.

1.2 Initial values

Initial values for the coefficient estimates may be specified via the init argument.
If no initial values are specified, then the initial values are estimated via linear regression with the response variable transformed by the link function.

1.3 Parallel computation

Parallel computation can be employed via OpenMP by setting the parallel argument to TRUE and setting the nthreads argument to the desired number of threads used.
For smaller datasets this can actually slow down the model fitting process, so parallel computation should only be used for larger datasets.

2 Families

2.1 Gaussian

Permissible links for the gaussian family are
- identity, which results in linear regression
- inverse
- log
- square root (sqrt)
The most commonly used link function for the gaussian family is the identity link.
The dispersion parameter for this family is estimated by using the mean square error.

# Loading in BranchGLM
library(BranchGLM)

# Fitting gaussian regression models for mtcars dataset
cars <- mtcars

## Identity link
BranchGLM(mpg ~ ., data = cars, family = "gaussian", link = "identity")
#> Results from gaussian regression with identity link function 
#> Using the formula mpg ~ .
#> 
#>             Estimate Std. Error t value Pr(>|t|)  
#> (Intercept) 12.30000   18.72000  0.6573  0.51810  
#> cyl         -0.11140    1.04500 -0.1066  0.91610  
#> disp         0.01334    0.01786  0.7468  0.46350  
#> hp          -0.02148    0.02177 -0.9868  0.33500  
#> drat         0.78710    1.63500  0.4813  0.63530  
#> wt          -3.71500    1.89400 -1.9610  0.06325 .
#> qsec         0.82100    0.73080  1.1230  0.27390  
#> vs           0.31780    2.10500  0.1510  0.88140  
#> am           2.52000    2.05700  1.2250  0.23400  
#> gear         0.65540    1.49300  0.4389  0.66520  
#> carb        -0.19940    0.82880 -0.2406  0.81220  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Dispersion parameter taken to be 4.609 for the log-likelihood
#> Dispersion parameter taken to be 7.024 for standard errors
#> 32 observations used to fit the model
#> (0 observations removed due to missingness)
#> 
#> Residual Deviance: 147.5 on 21 degrees of freedom
#> AIC: 163.7

2.2 Gamma

Permissible links for the gamma family are
- identity
- inverse, this is the canonical link for the gamma family
- log
- square root (sqrt)
The most commonly used link functions for the gamma family are inverse and log.
The dispersion parameter for this family is estimated via maximum likelihood,
similar to the MASS::gamma.dispersion() function.

# Fitting gamma regression models for mtcars dataset
## Inverse link
GammaFit <- BranchGLM(mpg ~ ., data = cars, family = "gamma", link = "inverse")
GammaFit
#> Results from gamma regression with inverse link function 
#> Using the formula mpg ~ .
#> 
#>               Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)  6.794e-02  3.843e-02  1.7680  0.09164 .
#> cyl         -1.760e-03  2.424e-03 -0.7263  0.47570  
#> disp         7.947e-06  4.379e-05  0.1815  0.85770  
#> hp           6.724e-05  5.045e-05  1.3330  0.19680  
#> drat         4.283e-04  3.398e-03  0.1261  0.90090  
#> wt           9.224e-03  4.186e-03  2.2030  0.03886 *
#> qsec        -1.739e-03  1.452e-03 -1.1980  0.24420  
#> vs           3.086e-04  4.139e-03  0.0746  0.94130  
#> am          -6.305e-04  4.287e-03 -0.1471  0.88450  
#> gear        -4.882e-03  3.210e-03 -1.5210  0.14320  
#> carb         1.025e-03  1.844e-03  0.5560  0.58410  
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Dispersion parameter taken to be 0.008682 for the log-likelihood
#> Dispersion parameter taken to be 0.01347 for standard errors
#> 32 observations used to fit the model
#> (0 observations removed due to missingness)
#> 
#> Residual Deviance: 0.2782 on 21 degrees of freedom
#> AIC: 152.3
#> Algorithm converged in 3 iterations using Fisher's scoring

## Log link
GammaFit <- BranchGLM(mpg ~ ., data = cars, family = "gamma", link = "log")
GammaFit
#> Results from gamma regression with log link function 
#> Using the formula mpg ~ .
#> 
#>               Estimate Std. Error t value Pr(>|t|)   
#> (Intercept)  2.754e+00  8.709e-01  3.1620 0.004696 **
#> cyl          7.509e-03  4.862e-02  0.1544 0.878700   
#> disp         6.521e-05  8.309e-04  0.0785 0.938200   
#> hp          -8.898e-04  1.013e-03 -0.8785 0.389600   
#> drat         2.366e-02  7.609e-02  0.3110 0.758900   
#> wt          -1.655e-01  8.815e-02 -1.8780 0.074380 . 
#> qsec         3.055e-02  3.401e-02  0.8984 0.379200   
#> vs          -1.141e-03  9.792e-02 -0.0117 0.990800   
#> am           5.421e-02  9.570e-02  0.5664 0.577100   
#> gear         6.014e-02  6.948e-02  0.8655 0.396500   
#> carb        -2.277e-02  3.856e-02 -0.5905 0.561100   
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Dispersion parameter taken to be 0.009637 for the log-likelihood
#> Dispersion parameter taken to be 0.01521 for standard errors
#> 32 observations used to fit the model
#> (0 observations removed due to missingness)
#> 
#> Residual Deviance: 0.3089 on 21 degrees of freedom
#> AIC: 155.6
#> Algorithm converged in 2 iterations using Fisher's scoring

2.3 Poisson

Permissible links for the Poisson family are
- identity
- log, this is the canonical link for the Poisson family
- square root (sqrt)
The most commonly used link function for the Poisson family is the log link.
The dispersion parameter for this family is always 1.

# Fitting poisson regression models for warpbreaks dataset
warp <- warpbreaks

## Log link
BranchGLM(breaks ~ ., data = warp, family = "poisson", link = "log")
#> Results from poisson regression with log link function 
#> Using the formula breaks ~ .
#> 
#>             Estimate Std. Error z value  Pr(>|z|)    
#> (Intercept)  3.69200    0.04541  81.300 < 2.2e-16 ***
#> woolB       -0.20600    0.05157  -3.994 6.490e-05 ***
#> tensionM    -0.32130    0.06027  -5.332 9.729e-08 ***
#> tensionH    -0.51850    0.06396  -8.107 5.209e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Dispersion parameter taken to be 1 for the log-likelihood
#> Dispersion parameter taken to be 1 for standard errors
#> 54 observations used to fit the model
#> (0 observations removed due to missingness)
#> 
#> Residual Deviance: 210.4 on 50 degrees of freedom
#> AIC: 493.1
#> Algorithm converged in 3 iterations using Fisher's scoring

2.4 Binomial

Permissible links for the binomial family are
- cloglog
- log
- logit, this is the canonical link for the binomial family
- probit
The most commonly used link functions for the binomial family are logit and probit.
The dispersion parameter for this family is always 1.

# Fitting binomial regression models for toothgrowth dataset
Data <- ToothGrowth

## Logit link
BranchGLM(supp ~ ., data = Data, family = "binomial", link = "logit")
#> Results from binomial regression with logit link function 
#> Using the formula supp ~ .
#> 
#>             Estimate Std. Error z value Pr(>|z|)   
#> (Intercept)   1.5380     0.7860   1.956 0.050440 . 
#> len          -0.2127     0.0728  -2.921 0.003484 **
#> dose          2.0890     0.8497   2.458 0.013970 * 
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Dispersion parameter taken to be 1 for the log-likelihood
#> Dispersion parameter taken to be 1 for standard errors
#> 60 observations used to fit the model
#> (0 observations removed due to missingness)
#> 
#> Residual Deviance: 72.33 on 57 degrees of freedom
#> AIC: 78.33
#> Algorithm converged in 4 iterations using Fisher's scoring

## Probit link
BranchGLM(supp ~ ., data = Data, family = "binomial", link = "probit")
#> Results from binomial regression with probit link function 
#> Using the formula supp ~ .
#> 
#>             Estimate Std. Error z value Pr(>|z|)   
#> (Intercept)  0.94020    0.46900   2.005 0.045000 * 
#> len         -0.13180    0.04195  -3.142 0.001678 **
#> dose         1.30800    0.49710   2.631 0.008502 **
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Dispersion parameter taken to be 1 for the log-likelihood
#> Dispersion parameter taken to be 1 for standard errors
#> 60 observations used to fit the model
#> (0 observations removed due to missingness)
#> 
#> Residual Deviance: 72.19 on 57 degrees of freedom
#> AIC: 78.19
#> Algorithm converged in 5 iterations using Fisher's scoring

2.4.1 Functions for binomial GLMs

BranchGLM has some utility functions for binomial GLMs
- Table() creates a confusion matrix based on the predicted classes and observed classes
- ROC() creates an ROC curve which can be plotted with plot()
- AUC() and Cindex() calculate the area under the ROC curve
- MultipleROCCurves() allows for the plotting of multiple ROC curves on the same plot

2.4.1.1 Table

# Fitting logistic regression model for toothgrowth dataset
catFit <- BranchGLM(supp ~ ., data = Data, family = "binomial", link = "logit")

Table(catFit)
#> Confusion matrix:
#> ----------------------
#>             Predicted
#>              OJ   VC
#> 
#>          OJ  17   13
#> Observed
#>          VC  7    23
#> 
#> ----------------------
#> Measures:
#> ----------------------
#> Accuracy:  0.6667 
#> Sensitivity:  0.7667 
#> Specificity:  0.5667 
#> PPV:  0.6389

2.4.1.2 ROC

# Creating ROC curve
catROC <- ROC(catFit)

plot(catROC, main = "ROC Curve", col = "indianred")

2.4.1.3 Cindex/AUC

# Getting Cindex/AUC
Cindex(catFit)
#> [1] 0.7127778

AUC(catFit)
#> [1] 0.7127778

2.4.1.4 MultipleROCPlots

# Showing ROC plots for logit, probit, and cloglog
probitFit <- BranchGLM(supp ~ . ,data = Data, family = "binomial", 
                       link = "probit")

cloglogFit <- BranchGLM(supp ~ . ,data = Data, family = "binomial", 
                       link = "cloglog")

MultipleROCCurves(catROC, ROC(probitFit), ROC(cloglogFit), 
                  names = c("Logistic ROC", "Probit ROC", "Cloglog ROC"))

2.4.1.5 Using predictions

For each of the methods used in this section predicted probabilities and observed classes can also be supplied instead of the BranchGLM object.


preds <- predict(catFit)

Table(preds, Data$supp)
#> Confusion matrix:
#> ----------------------
#>             Predicted
#>              OJ   VC
#> 
#>          OJ  17   13
#> Observed
#>          VC  7    23
#> 
#> ----------------------
#> Measures:
#> ----------------------
#> Accuracy:  0.6667 
#> Sensitivity:  0.7667 
#> Specificity:  0.5667 
#> PPV:  0.6389

AUC(preds, Data$supp)
#> [1] 0.7127778

ROC(preds, Data$supp) |> plot(main = "ROC Curve", col = "deepskyblue")

3 Useful functions

BranchGLM has many utility functions for GLMs such as
- coef() to extract the coefficients
- logLik() to extract the log likelihood
- AIC() to extract the AIC
- BIC() to extract the BIC
- predict() to obtain predictions from the fitted model
The coefficients, standard errors, Wald test statistics, and p-values are stored in the coefficients slot of the fitted model

# Predict method
predict(GammaFit)
#>           Mazda RX4       Mazda RX4 Wag          Datsun 710      Hornet 4 Drive 
#>            21.69333            21.15566            25.92103            20.27496 
#>   Hornet Sportabout             Valiant          Duster 360           Merc 240D 
#>            17.15124            19.83386            14.55730            22.26169 
#>            Merc 230            Merc 280           Merc 280C          Merc 450SE 
#>            23.89767            18.75674            19.10374            15.10160 
#>          Merc 450SL         Merc 450SLC  Cadillac Fleetwood Lincoln Continental 
#>            16.07372            16.13726            12.20297            11.70443 
#>   Chrysler Imperial            Fiat 128         Honda Civic      Toyota Corolla 
#>            11.60706            27.90334            30.14896            30.14451 
#>       Toyota Corona    Dodge Challenger         AMC Javelin          Camaro Z28 
#>            23.41000            17.02230            17.63782            13.90043 
#>    Pontiac Firebird           Fiat X1-9       Porsche 914-2        Lotus Europa 
#>            16.06923            28.65170            26.61054            28.59460 
#>      Ford Pantera L        Ferrari Dino       Maserati Bora          Volvo 142E 
#>            17.89013            19.53099            14.11845            23.30350

# Accessing coefficients matrix
GammaFit$coefficients
#>                  Estimate   Std. Error     t value    Pr(>|t|)
#> (Intercept)  2.754211e+00 0.8709425717  3.16233309 0.004696035
#> cyl          7.509180e-03 0.0486249041  0.15443073 0.878744694
#> disp         6.521183e-05 0.0008309089  0.07848253 0.938187259
#> hp          -8.897889e-04 0.0010128913 -0.87846431 0.389632603
#> drat         2.366270e-02 0.0760938573  0.31096731 0.758891171
#> wt          -1.655137e-01 0.0881470376 -1.87769980 0.074381777
#> qsec         3.055119e-02 0.0340061853  0.89840094 0.379157557
#> vs          -1.140739e-03 0.0979227190 -0.01164938 0.990815316
#> am           5.420526e-02 0.0956958853  0.56643253 0.577103946
#> gear         6.013892e-02 0.0694813389  0.86554054 0.396522931
#> carb        -2.277255e-02 0.0385618276 -0.59054645 0.561126613

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.