LBBNN implements Latent Bayesian Binary Neural Networks in R using the torch package. An LBBNN is a Bayesian neural network, where each weight is associated with a Bernoulli inclusion variable, allowing weights to be turned on or off and incorporating model uncertainty in addition to parameter uncertainty.
This vignette walks through basic usage on a simple dataset: data preparation, model definition, training, validation, and visualization.
For this example we use the raisin dataset, consisting of 900 samples of two different types of raisins, with 7 morphological features. The paper that introduces the dataset reports around 86% accuracy using a standard MLP.
To start, we use the get_dataloaders function to divide the data into a training set and a test set. The function returns a train_loader and a test_loader object. These are torch dataloader objects, which handle batching automatically and support parallel data loading.
In this case we set aside 180 samples to validate performance.
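The split itself is easy to picture. The sketch below uses base R only and is not the implementation of get_dataloaders; the seed, the variable names, and the use of sample() are illustrative assumptions. It only shows how 900 rows partition into 720 training rows and 180 held-out rows.

```r
# Illustrative only: partition 900 row indices into a training and a test set.
set.seed(1)                        # arbitrary seed, chosen for reproducibility
n_total <- 900                     # number of raisin samples
n_test  <- 180                     # samples held out for validation

test_idx  <- sample(n_total, n_test)               # random rows for the test set
train_idx <- setdiff(seq_len(n_total), test_idx)   # all remaining rows

length(train_idx)                  # 720 training rows
length(test_idx) / n_total         # 0.2, i.e. a 20% hold-out
```

Holding out 180 of 900 samples corresponds to an 80/20 split.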
The model depends on several key hyperparameters. The sizes argument determines the architecture of the network. It is a vector, where the first element is the number of features and the last element is the number of outputs. The intermediate values define the number of neurons in the hidden layers. In this case, we have 7 features, 2 hidden layers consisting of 5 neurons each, and 1 output neuron.
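To make the sizes vector concrete, the following base R sketch derives the shape of each weight matrix from it. It assumes a plain feed-forward layout without biases and without the input-skip connections, so the counts are illustrative rather than the exact parameter counts of the package's model; layer_shapes is a hypothetical helper name.

```r
# Illustrative only: map a sizes vector to per-layer weight matrix shapes,
# assuming a plain feed-forward network (no biases, no input-skip).
sizes <- c(7, 5, 5, 1)

n_layers <- length(sizes) - 1            # one weight matrix per consecutive pair
layer_shapes <- data.frame(
  layer   = seq_len(n_layers),
  inputs  = sizes[-length(sizes)],       # 7, 5, 5
  outputs = sizes[-1]                    # 5, 5, 1
)
layer_shapes$weights <- layer_shapes$inputs * layer_shapes$outputs

print(layer_shapes)
```

A vector of four sizes yields three weight matrices, which is why the per-layer arguments in this vignette's example each have three entries.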
The inclusion_priors argument determines the prior inclusion probability for each weight matrix. All the weights within each layer are given the same prior. Similarly, stds refers to the prior for the standard deviation of the weights.
The inclusion_inits argument controls how the inclusion probabilities are initialized. This determines the initial density of the network and influences how the density evolves during training. Several keywords are available: ‘dense’ initializes all probabilities close to 1, ‘sparse’ initializes them close to 0, and ‘polarized’ gives probabilities that are close to either 0 or 1. In this example, we use ‘balanced’, which results in probabilities in [0.27, 0.73]. Additionally, the flow argument controls whether normalizing flows are included in the variational distribution, and the input_skip argument controls whether the input-skip architecture is used.
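The interval [0.27, 0.73] quoted for ‘balanced’ coincides with the logistic function evaluated at -1 and 1. The sketch below draws logits uniformly from [-1, 1] and maps them through plogis() to show that this reproduces the stated range. The text above only states the range, so treating this as the package's actual initialization mechanism is an assumption; the names logits and probs are hypothetical.

```r
# Illustrative only: one way to obtain inclusion probabilities in [0.27, 0.73].
# Assumption: logits are drawn uniformly from [-1, 1]; the package may differ.
set.seed(1)                                 # arbitrary seed

logits <- runif(35, min = -1, max = 1)      # e.g. one logit per weight of a 7 -> 5 layer
probs  <- plogis(logits)                    # logistic (sigmoid) transform

plogis(c(-1, 1))                            # endpoints, roughly 0.269 and 0.731
range(probs)                                # falls inside those endpoints
```

Under this assumption, every weight starts with an inclusion probability near one half, so the network begins neither dense nor sparse.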
problem <- "binary classification"
sizes <- c(7, 5, 5, 1)
inclusion_priors <- c(0.5, 0.5, 0.5)
stds <- c(1, 1, 1)
inclusion_inits <- 'balanced'
device <- "cpu"
model <- lbbnn_net(problem_type = problem, sizes = sizes,
                   prior = inclusion_priors,
                   inclusion_inits = inclusion_inits,
                   input_skip = TRUE, std = stds,
                   flow = FALSE, device = device)

One epoch refers to one pass through the training dataset. The other training arguments are the model object, the learning rate for the optimizer, the dataloader, and the device to train on. Optionally, performance metrics such as loss, accuracy and density can be printed to the console during training.
After training, we can use the validate_lbbnn function to validate the results on the data that was set aside. The num_samples argument sets how many samples to use for model averaging. The function returns the accuracy for the full model and for the sparse model, which is selected using the median probability model, i.e. by including only the weights that have a posterior inclusion probability > 0.5. In addition, it returns the density and the density within active paths.
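The median probability model and the density can be illustrated on a single weight matrix. The sketch below uses base R with randomly generated values; alpha, weights, mask and sparse_weights are hypothetical names, and the numbers are not results from the trained model. It is not the implementation of validate_lbbnn, and it does not cover the density within active paths.

```r
# Illustrative only: median probability model on one hypothetical 7 -> 5 layer.
set.seed(1)                                       # arbitrary seed

alpha   <- matrix(runif(35), nrow = 5, ncol = 7)  # made-up posterior inclusion probabilities
weights <- matrix(rnorm(35), nrow = 5, ncol = 7)  # made-up posterior mean weights

mask           <- alpha > 0.5                     # keep weights with inclusion probability > 0.5
sparse_weights <- weights * mask                  # excluded weights become exactly 0
density        <- mean(mask)                      # fraction of weights that remain

density
```

Here density is simply the share of weights kept by the mask, so a value of 1 would mean a fully dense layer and a value of 0 a layer with every weight switched off.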
If we are interested in looking at which variables affect predictions in general, we can obtain global explanations through the plot function:
We see that only 4 of the 7 features are used.
If we instead want explanations for a specific sample, we can use the keyword ‘local’ within the plot function. We must also provide the specific datapoint we want to explain.
x_data <- train_loader$dataset$tensors[[1]]
data <- x_data[42, ]
plot(model, type = "local", data = data, num_samples = 10)

We can also get the same information using coef:

print(coef(model, data, num_samples = 10))
#> lower mean upper
#> x0 -0.4679386 -0.4673928 -0.4669682
#> x1 -0.4550755 -0.4547976 -0.4545650
#> x2 0.0000000 0.0000000 0.0000000
#> x3 0.0000000 0.0000000 0.0000000
#> x4 -0.4490815 -0.4486520 -0.4483100
#> x5 0.0000000 0.0000000 0.0000000
#> x6 0.0000000 0.0000000 0.0000000