
Basic Walkthrough

1. Introduction

Welcome to the world of LightGBM, a highly efficient gradient boosting implementation (Ke et al. 2017).

library(lightgbm)

This vignette guides you through the basic usage of the package. It shows how to build a simple binary classification model on a subset of the bank dataset (Moro, Cortez, and Rita 2014), using the two input features “age” and “balance” to predict whether a client has subscribed to a term deposit.

2. The dataset

The first rows of the relevant columns look as follows.

data(bank, package = "lightgbm")

bank[1L:5L, c("y", "age", "balance")]
#>     y age balance
#> 1: no  30    1787
#> 2: no  33    4789
#> 3: no  35    1350
#> 4: no  30    1476
#> 5: no  59       0

# Distribution of the response
table(bank$y)
#> 
#>   no  yes 
#> 4000  521
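
About 11.5% of the clients subscribed, so the classes are imbalanced. This share follows directly from the table above and is worth remembering when interpreting the predicted probabilities later:

# Share of positive responses
mean(bank$y == "yes")
#> [1] 0.11524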

3. Training the model

The R package of LightGBM offers two functions to train a model:

3.1 Using the lightgbm() function

As a first step, you need to convert the data to numeric. Afterwards, you are ready to fit the model with the lightgbm() function.

# Numeric response and feature matrix
y <- as.numeric(bank$y == "yes")
X <- data.matrix(bank[, c("age", "balance")])

# Train
fit <- lightgbm(
  data = X
  , label = y
  , params = list(
    num_leaves = 4L
    , learning_rate = 1.0
    , objective = "binary"
  )
  , nrounds = 10L
  , verbose = -1L
)

# Result
summary(predict(fit, X))
#>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
#> 0.01192 0.07370 0.09871 0.11593 0.14135 0.65796

It worked! The predictions are indeed probabilities between 0 and 1. Note that their mean (about 0.116) is close to the observed share of positive responses (about 0.115).
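
If you want hard class labels instead of probabilities, you can threshold the predictions. The following in-sample sketch uses a 0.5 cutoff purely for illustration; in practice, the threshold should be chosen for the application at hand:

# Turn probabilities into class labels with an illustrative 0.5 cutoff
pred_class <- as.numeric(predict(fit, X) > 0.5)

# In-sample confusion table and accuracy
table(observed = y, predicted = pred_class)
mean(pred_class == y)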

3.2 Using the lgb.train() function

Alternatively, you can use the more flexible interface lgb.train(). Here, as an additional step, you need to wrap y and X with LightGBM’s data API lgb.Dataset(). Parameters are passed to lgb.train() as a named list.

# Data interface
dtrain <- lgb.Dataset(X, label = y)

# Parameters
params <- list(
  objective = "binary"
  , num_leaves = 4L
  , learning_rate = 1.0
)

# Train
fit <- lgb.train(
  params
  , data = dtrain
  , nrounds = 10L
  , verbose = -1L
)
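
Since lgb.train() also returns a Booster object, prediction works exactly as in the previous section, using the raw feature matrix rather than the lgb.Dataset. A minimal check (output omitted here) could look like this:

# Predictions from the lgb.train() model are again probabilities
summary(predict(fit, X))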

Try it out! If you get stuck, visit LightGBM’s documentation for more details.

4. References

Ke, Guolin, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. “LightGBM: A Highly Efficient Gradient Boosting Decision Tree.” In Advances in Neural Information Processing Systems 30 (NIPS 2017).

Moro, Sérgio, Paulo Cortez, and Paulo Rita. 2014. “A Data-Driven Approach to Predict the Success of Bank Telemarketing.” Decision Support Systems 62: 22–31.
