nntrf hyper-parameter tuning

Ricardo Aler

2020-07-27

library(nntrf)
library(mlr)
#> Loading required package: ParamHelpers
#> 'mlr' is in maintenance mode since July 2019. Future development
#> efforts will go into its successor 'mlr3'
#> (<https://mlr3.mlr-org.com>).
library(mlrCPO)
library(FNN)

nntrf Hyper-parameter Tuning

nntrf has several hyper-parameters which are important in order to obtain good results. Those are:

Machine learning pipelines usually contain two kinds of steps: pre-processing and classifier/regressor. Both kinds of steps contain hyper-parameters and they are optimized together. nntrf is a preprocessing step. The classifier method that will be used after preprocessing is KNN, whose main hyper-parameter is the number of neighbors (k). Hyper-parameter tuning could be programmed from scratch, but it is more efficient to use the procedures already available in machine learning packages such as mlr or Caret. In this case, mlr will be used. Code to do that is described below.

The next piece of code has nothing to do with nntrf. It just establishes that the doughnutRandRotated dataset is going to be used (with target variable “V11”), that grid search is going to be used for hyper-parameter tuning, that an external 3-fold crossvalidation is going to be used to evaluate models, while an inner 3-fold crossvalidation is going to be used for hyper-parameter tuning.

data("doughnutRandRotated")

doughnut_task <- makeClassifTask(data = doughnutRandRotated, target = "V11")
control_grid <- makeTuneControlGrid()
inner_desc <- makeResampleDesc("CV", iter=3)
outer_desc <-  makeResampleDesc("CV", iter=3)
set.seed(0)
outer_inst <- makeResampleInstance(outer_desc, doughnut_task)

A mlr subpakage, called mlrCPO, is going to be used to combine pre-processing and learning into a single pipeline. In order to do that, nntrf must be defined as a pipeline step, as follows. Basically, it defines train and retrafo methods. The former, trains the neural networks and stores the hidden layer weights, the latter applies the transformation on a dataset. pSS is used to define the main nntrf hyper-parameters.

cpo_nntrf = makeCPO("nntrfCPO",  
                       # Here, the hyper-parameters of nntrf are defined
                       pSS(repetitions = 1 : integer[1, ],
                           xavier_ini = FALSE : logical,
                           orthog_ini = FALSE : logical,
                           size: integer[1, ],
                           maxit = 100 : integer[1, ],
                           use_sigmoid = FALSE: logical),
                       dataformat = "numeric",
                       cpo.train = function(data, target, 
                                            repetitions, xavier_ini, orthog_ini, 
                                            size, maxit, use_sigmoid) {
                         data_and_class <- cbind(as.data.frame(data), class=target[[1]])
                         nnpo <- nntrf(repetitions=repetitions,
                                       xavier_ini=xavier_ini,
                                       orthog_ini=orthog_ini,
                                       formula=class~.,
                                       data=data_and_class,
                                       size=size, maxit=maxit, trace=FALSE)
                       },
                       cpo.retrafo = function(data, control, 
                                              repetitions, xavier_ini, orthog_ini, 
                                              size, maxit, use_sigmoid) {
                       
                         trf_x <- control$trf(x=data,use_sigmoid=use_sigmoid)
                         trf_x
                       })

Next, the pipeline of pre-processing + classifier method (KNN in this case) is defined.

# knn is the machine learning method. The knn available in the FNN package is used
knn_lrn <- makeLearner("classif.fnn")
# Then, knn is combined with nntrf's preprocessing into a pipeline
knn_nntrf <- cpo_nntrf() %>>% knn_lrn
# Just in case, we fix the values of the hyper-parameters that we do not require to optimize
# (not necessary, because they already have default values. Just to make their values explicit)
knn_nntrf <- setHyperPars(knn_nntrf, nntrfCPO.repetitions=1, nntrfCPO.maxit=100, 
                          nntrfCPO.xavier_ini=FALSE, nntrfCPO.orthog_ini=FALSE,
                          nntrfCPO.use_sigmoid=FALSE)

# However, we are going to use 2 repetitions here, instead of 1 (the default):

knn_nntrf <- setHyperPars(knn_nntrf, nntrfCPO.repetitions=2)

Next, the hyper-parameter space for the pipeline is defined. Only two hyper-parameters will be optimized: the number of KNN neighbors (k), from 1 to 7, and the number of hidden neurons (size), from 1 to 10. The remaining hyper-parameters are left to some default values.

ps <- makeParamSet(makeDiscreteParam("k", values = 1:7),
                   makeDiscreteParam("nntrfCPO.size", values = 1:10)
)

Next, a mlr wrapper is used to give the knn_nntrf pipeline the ability to do hyper-parameter tuning.

knn_nntrf_tune <- makeTuneWrapper(knn_nntrf, resampling = inner_desc, par.set = ps, 
                                     control = control_grid, measures = list(acc), show.info = FALSE)

Finally, the complete process (3-fold hyper-parameter tuning) and 3-fold outer model evaluation is run. It takes some time.

set.seed(0)
# Please, note that in order to save time, results have been precomputed
cached <- system.file("extdata", "error_knn_nntrf_tune.rda", package = "nntrf")
if(file.exists(cached)){load(cached)} else {
error_knn_nntrf_tune <- resample(knn_nntrf_tune, doughnut_task, outer_inst, 
                                 measures = list(acc), 
                                 extract = getTuneResult, show.info =  FALSE)
save(error_knn_nntrf_tune, file="../inst/extdata/error_knn_nntrf_tune.rda")
}

Errors and optimal hyper-parameters are as follows (the 3-fold inner hyper-parameter tuning crossvalidation accuracy is also shown in acc.test.mean ).

print(error_knn_nntrf_tune$extract)
#> [[1]]
#> Tune result:
#> Op. pars: k=6; nntrfCPO.size=9
#> acc.test.mean=0.9596539
#> 
#> [[2]]
#> Tune result:
#> Op. pars: k=7; nntrfCPO.size=9
#> acc.test.mean=0.9588959
#> 
#> [[3]]
#> Tune result:
#> Op. pars: k=7; nntrfCPO.size=6
#> acc.test.mean=0.9611513

And the final outer 3-fold crossvalition accuracy is displayed below. Please, note that this acc.test.mean corresponds to the outer 3-fold crossvalidation, while the acc.test.mean above, corresponds to the inner 3-fold crossvalidation accuracy (computed during hyper-parameter tuning).

print(error_knn_nntrf_tune$aggr)
#> acc.test.mean 
#>     0.9365014

Although not required, mlr allows to display the results of the different hyper-parameter values, sorted by the inner 3-fold crossvalidation accuracy, from best to worse.

library(dplyr)
results_hyper <- generateHyperParsEffectData(error_knn_nntrf_tune)
head(arrange(results_hyper$data, -acc.test.mean))
#>   k nntrfCPO.size acc.test.mean iteration exec.time nested_cv_run
#> 1 7             6     0.9611513        42     3.464             3
#> 2 6             9     0.9596539        62     4.511             1
#> 3 6             5     0.9595025        34     3.368             1
#> 4 7             9     0.9588959        63     4.742             2
#> 5 5             7     0.9576958        47     3.398             2
#> 6 5             7     0.9563521        47     4.273             3

We can also check directly what would happen with only 4 neurons (and 5 neighbors).

knn_nntrf <- cpo_nntrf() %>>% makeLearner("classif.fnn")

knn_nntrf <- setHyperPars(knn_nntrf, nntrfCPO.repetitions=2, nntrfCPO.maxit=100,
                          nntrfCPO.xavier_ini=FALSE, nntrfCPO.orthog_ini=FALSE,
                          nntrfCPO.use_sigmoid=FALSE, k=5, nntrfCPO.size=4)

set.seed(0)
# Please, note that in order to save time, results have been precomputed
cached <- system.file("extdata", "error_knn_nntrf.rda", package = "nntrf")
if(file.exists(cached)){load(cached)} else {
  error_knn_nntrf <- resample(knn_nntrf, doughnut_task, outer_inst, measures = list(acc), 
                            show.info =  FALSE)
save(error_knn_nntrf, file="../inst/extdata/error_knn_nntrf.rda")
}
# First, the three evaluations of the outer 3-fold crossvalidation, one per fold:
print(error_knn_nntrf$measures.test)
#>   iter       acc
#> 1    1 0.9564956
#> 2    2 0.9741974
#> 3    3 0.9271146
# Second, their average
print(error_knn_nntrf$aggr)
#> acc.test.mean 
#>     0.9526025

Hyper-parameter tuning with PCA

In order to compare a supervised transformation method (nntrf) with an unsupervised one (PCA), it is very easy to do exactly the same pre-processing with PCA. In this case, the main hyper-parameters are k (number of KNN neighbors) and Pca.rank (the number of PCA components to be used, which would be the counterpart of size, the number of hidden neurons used by nntrf).

knn_pca <- cpoPca(center=TRUE, scale=TRUE, export=c("rank")) %>>% knn_lrn

ps_pca <- makeParamSet(makeDiscreteParam("k", values = 1:7),
                       makeDiscreteParam("pca.rank", values = 1:10)
)

knn_pca_tune <- makeTuneWrapper(knn_pca, resampling = inner_desc, par.set = ps_pca, 
                                     control = control_grid, measures = list(acc), show.info = FALSE)
set.seed(0)
# Please, note that in order to save time, results have been precomputed

cached <- system.file("extdata", "error_knn_pca_tune.rda", package = "nntrf")
if(file.exists(cached)){load(cached)} else {
error_knn_pca_tune <- resample(knn_pca_tune, doughnut_task, outer_inst, 
                               measures = list(acc), 
                               extract = getTuneResult, show.info =  FALSE)
save(error_knn_pca_tune, file="../inst/extdata/error_knn_pca_tune.rda")
}

It can be seen below that while nntrf was able to get a high accuracy, PCA only gets to nearly 0.65. Also the number of components required by PCA is the maximum allowed (pca.rank=10)

print(error_knn_pca_tune$extract)
#> [[1]]
#> Tune result:
#> Op. pars: k=4; pca.rank=10
#> acc.test.mean=0.6410671
#> 
#> [[2]]
#> Tune result:
#> Op. pars: k=4; pca.rank=10
#> acc.test.mean=0.6402640
#> 
#> [[3]]
#> Tune result:
#> Op. pars: k=2; pca.rank=10
#> acc.test.mean=0.6367183
print(error_knn_pca_tune$aggr)
#> acc.test.mean 
#>     0.6422997
results_hyper <- generateHyperParsEffectData(error_knn_pca_tune)
head(arrange(results_hyper$data, -acc.test.mean))
#>   k pca.rank acc.test.mean iteration exec.time nested_cv_run
#> 1 4       10     0.6410671        67     1.638             1
#> 2 4       10     0.6402640        67     1.532             2
#> 3 6       10     0.6396640        69     1.655             2
#> 4 2       10     0.6389139        65     1.528             2
#> 5 6       10     0.6374673        69     1.590             1
#> 6 2       10     0.6367183        65     1.397             3

Hyper-parameter tuning with just KNN

For completeness sake, below are the results with, no pre-processing, just KNN (results are very similar to the ones with PCA):


ps_knn <- makeParamSet(makeDiscreteParam("k", values = 1:7))


knn_tune <- makeTuneWrapper(knn_lrn, resampling = inner_desc, par.set = ps_knn, 
                                     control = control_grid, measures = list(acc), show.info = FALSE)

set.seed(0)
# Please, note that in order to save time, results have been precomputed
cached <- system.file("extdata", "error_knn_tune.rda", package = "nntrf")
if(file.exists(cached)){load(cached)} else {
error_knn_tune <- resample(knn_tune, doughnut_task, outer_inst, measures = list(acc), 
                           extract = getTuneResult, show.info =  FALSE)
save(error_knn_tune, file="../inst/extdata/error_knn_tune.rda")
}
print(error_knn_tune$extract)
#> [[1]]
#> Tune result:
#> Op. pars: k=4
#> acc.test.mean=0.6386676
#> 
#> [[2]]
#> Tune result:
#> Op. pars: k=2
#> acc.test.mean=0.6377138
#> 
#> [[3]]
#> Tune result:
#> Op. pars: k=4
#> acc.test.mean=0.6365698
print(error_knn_tune$aggr)
#> acc.test.mean 
#>     0.6389003
results_hyper <- generateHyperParsEffectData(error_knn_tune)
head(arrange(results_hyper$data, -acc.test.mean))
#>   k acc.test.mean iteration exec.time nested_cv_run
#> 1 4     0.6386676         4     1.406             1
#> 2 2     0.6377138         2     1.132             2
#> 3 4     0.6371137         4     1.273             2
#> 4 4     0.6365698         4     1.288             3
#> 5 2     0.6358180         2     1.125             3
#> 6 6     0.6356136         6     1.451             2

nntrf Hyper-parameter Tuning with Xavier initialization

Just for the record, let’s see results with Xavier initialization, instead of the standard nnet weight initialization.

# knn is the machine learning method. The knn available in the FNN package is used
knn_lrn <- makeLearner("classif.fnn")
# Then, knn is combined with nntrf's preprocessing into a pipeline
knn_nntrf <- cpo_nntrf() %>>% knn_lrn
# Just in case, we fix the values of the hyper-parameters that we do not require to optimize
# (not necessary, because they already have default values. Just to make their values explicit)
knn_nntrf <- setHyperPars(knn_nntrf, nntrfCPO.repetitions=1, nntrfCPO.maxit=100, 
                          nntrfCPO.xavier_ini=FALSE, nntrfCPO.orthog_ini=FALSE,
                          nntrfCPO.use_sigmoid=FALSE)

# However, we are going to use 2 repetitions here, instead of 1 (the default):

knn_nntrf <- setHyperPars(knn_nntrf, nntrfCPO.repetitions=2, nntrfCPO.xavier_ini=TRUE,
                          nntrfCPO.orthog_ini=FALSE)
ps <- makeParamSet(makeDiscreteParam("k", values = 1:7),
                   makeDiscreteParam("nntrfCPO.size", values = 1:10)
)
knn_nntrf_tune <- makeTuneWrapper(knn_nntrf, resampling = inner_desc, par.set = ps, 
                                     control = control_grid, measures = list(acc), show.info = FALSE)
set.seed(0)
# Please, note that in order to save time, results have been precomputed
cached <- system.file("extdata", "error_knn_nntrf_xavier_tune.rda", package = "nntrf")
if(file.exists(cached)) {load(cached)} else {
error_knn_nntrf_xavier_tune <- resample(knn_nntrf_tune, doughnut_task, outer_inst, 
                                 measures = list(acc), 
                                 extract = getTuneResult, show.info =  FALSE)
save(error_knn_nntrf_xavier_tune, file="../inst/extdata/error_knn_nntrf_xavier_tune.rda")
}
print(error_knn_nntrf_xavier_tune$extract)
#> [[1]]
#> Tune result:
#> Op. pars: k=6; nntrfCPO.size=4
#> acc.test.mean=0.9689512
#> 
#> [[2]]
#> Tune result:
#> Op. pars: k=5; nntrfCPO.size=4
#> acc.test.mean=0.9641515
#> 
#> [[3]]
#> Tune result:
#> Op. pars: k=5; nntrfCPO.size=5
#> acc.test.mean=0.9677468
print(error_knn_nntrf_xavier_tune$aggr)
#> acc.test.mean 
#>     0.8354942
library(dplyr)
results_hyper <- generateHyperParsEffectData(error_knn_nntrf_xavier_tune)
head(arrange(results_hyper$data, -acc.test.mean))
#>   k nntrfCPO.size acc.test.mean iteration exec.time nested_cv_run
#> 1 6             4     0.9689512        27     2.462             1
#> 2 5             5     0.9677468        33     2.792             3
#> 3 4             4     0.9671467        25     2.569             3
#> 4 6             3     0.9667004        20     2.005             1
#> 5 7             5     0.9652004        35     2.997             1
#> 6 5             5     0.9643040        33     2.491             1

And now Xavier + Orthogonal initialization.

knn_nntrf <- setHyperPars(knn_nntrf, nntrfCPO.repetitions=2, 
                          nntrfCPO.xavier_ini=TRUE,
                          nntrfCPO.orthog_ini=TRUE)
knn_nntrf_tune <- makeTuneWrapper(knn_nntrf, resampling = inner_desc, par.set = ps, 
                                     control = control_grid, measures = list(acc), show.info = FALSE)
set.seed(0)
# Please, note that in order to save time, results have been precomputed
cached <- system.file("extdata", "error_knn_nntrf_xavier_orthog_tune.rda", package = "nntrf")
if(file.exists(cached)) {load(cached)} else {
error_knn_nntrf_xavier_tune <- resample(knn_nntrf_tune, doughnut_task, outer_inst, 
                                 measures = list(acc), 
                                 extract = getTuneResult, show.info =  FALSE)
save(error_knn_nntrf_xavier_tune, file="../inst/extdata/error_knn_nntrf_xavier_orthog_tune.rda")
}
print(error_knn_nntrf_xavier_tune$extract)
#> [[1]]
#> Tune result:
#> Op. pars: k=7; nntrfCPO.size=9
#> acc.test.mean=0.9616023
#> 
#> [[2]]
#> Tune result:
#> Op. pars: k=7; nntrfCPO.size=4
#> acc.test.mean=0.9599553
#> 
#> [[3]]
#> Tune result:
#> Op. pars: k=6; nntrfCPO.size=6
#> acc.test.mean=0.9594959
print(error_knn_nntrf_xavier_tune$aggr)
#> acc.test.mean 
#>     0.9542012
library(dplyr)
results_hyper <- generateHyperParsEffectData(error_knn_nntrf_xavier_tune)
head(arrange(results_hyper$data, -acc.test.mean))
#>   k nntrfCPO.size acc.test.mean iteration exec.time nested_cv_run
#> 1 7             9     0.9616023        63     4.492             1
#> 2 3             7     0.9610027        45     3.587             1
#> 3 7             4     0.9599553        28     2.767             2
#> 4 6             6     0.9594959        41     3.468             3
#> 5 5             7     0.9590487        47     3.654             1
#> 6 5             8     0.9587459        54     4.104             3