The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

prettyglm

CRAN_Status_Badge Downloads R build status Lifecycle: stable

One of the main advantages of using Generalised Linear Models is their interpretability. The goal of prettyglm is to provide a set of functions which easily create beautiful coefficient summaries which can readily be shared and explained.

Forword

prettyglm was created to solve some common faced when building Generalised Linear Models, such as displaying categorical base levels, and visualizing the number of records in each category on a duel axis. Since then a number of other functions which are useful when fitting glms have been added.

If you don’t find the function you are looking for here consider checking out some other great packages which help visualize the output from glms:tidycat, jtools or GGally

Installation

You can install the latest CRAN release with:

install.packages('prettyglm')

Documentation

Please see the website prettyglm for more detailed documentation and examples.

A Simple Example

To explore the functionality of prettyglm we will use a data set sourced from kaggle which contains information about a Portugal banks marketing campaigns results. The campaign was based mostly on direct phone calls, offering clients a term deposit. The target variable y indicates if the client agreed to place the deposit after the phone call.

Pre-processing

A critical step for this package to work well is to set all categorical predictors as factors.

library(prettyglm)
library(dplyr)
data("bank")

# Easiest way to convert multiple columns to a factor.
columns_to_factor <- c('job',
                       'marital',
                       'education',
                       'default',
                       'housing',
                       'loan')
bank_data  <- bank_data  %>%
  dplyr::filter(loan != 'unknown') %>% 
  dplyr::filter(default != 'yes') %>% 
  dplyr::mutate(age = as.numeric(age)) %>% 
  dplyr::mutate_at(columns_to_factor, list(~factor(.))) %>% # multiple columns to factor
  dplyr::mutate(T_DEPOSIT = as.factor(base::ifelse(y=='yes',1,0))) #convert target to 0 and 1 for performance plots

Building a glm

For this example we will build a glm using stats::glm(), however prettyglm is working to support parsnip and workflow model objects which use the glm model engine.

deposit_model <- stats::glm(T_DEPOSIT ~ marital +
                                        default:loan +
                                        loan +
                                        age,
                             data = bank_data,
                             family = binomial)

Visualising Fitted Model Coefficients

Create table of model coefficients with pretty_coefficients()

pretty_coefficients(deposit_model, type_iii = 'Wald')

Create plots of the model relativities with pretty_relativities()

pretty_relativities(feature_to_plot = 'marital',
                    model_object = deposit_model)

pretty_relativities(feature_to_plot = 'age',
                    model_object = deposit_model)

pretty_relativities(feature_to_plot = 'default:loan',
                    model_object = deposit_model,
                    iteractionplottype = 'colour',
                    facetorcolourby = 'loan')

Visualising one-way model performance with one_way_ave()

one_way_ave() creates one-way model performance plots.

education

For discrete variables the number of records in each group will be plotted on a second axis.

one_way_ave(feature_to_plot = 'education',
            model_object = deposit_model,
            target_variable = 'T_DEPOSIT',
            data_set = bank_data)

age

For continuous variables the stats::density() will be plotted on a second axis.

one_way_ave(feature_to_plot = 'age',
            model_object = deposit_model,
            target_variable = 'T_DEPOSIT',
            data_set = bank_data)

Plot actual vs expected by predicted band with actual_expected_bucketed()

actual_expected_bucketed() creates actual vs expected performance plots by predicted band.

actual_expected_bucketed(target_variable = 'T_DEPOSIT',
                         model_object = deposit_model,
                         data_set = bank_data)

Support My Work

“Buy Me A Coffee”

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.