To better understand the MAKL library, we build a simple example in this document. We first create a synthetic dataset that consists of 1000 rows and 6 features, drawn from the standard Gaussian distribution.
library(MAKL)
set.seed(64327) #midas
df <- matrix(rnorm(6000, 0, 1), nrow = 1000)
colnames(df) <- c("F1", "F2", "F3", "F4", "F5", "F6")
For the membership argument of makl_train(), we prepare a list consisting of two groups: the first contains the features F1, F5 and F6; the second contains the rest. Note that the column names of the input dataset should be a superset of the union of all feature names in the groups list.
# check colnames(df) to make sure they match the group members
groups <- list()
groups[[1]] <- c("F1", "F5", "F6")
groups[[2]] <- c("F2", "F3", "F4")
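As a quick sanity check (not part of the original example), we can verify that every feature named in groups actually appears among the column names of df:

# optional check: all group members should be present in colnames(df)
all(unlist(groups) %in% colnames(df))  # should be TRUE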
We then create the response vector y such that it depends on the second, third and fourth features, namely F2, F3 and F4: if, for a data instance, the sum of the entries in the second, third and fourth columns is positive, the corresponding response is assigned +1; otherwise, it is assigned -1.
y <- c()
for(i in 1:nrow(df)) {
  if((df[i, 2] + df[i, 3] + df[i, 4]) > 0) {
    y[i] <- +1
  } else {
    y[i] <- -1
  }
}
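The loop above can also be written in vectorized form; the following sketch (base R only, not part of the original example) produces the same labels:

# vectorized equivalent of the loop above
y_alt <- ifelse(rowSums(df[, c("F2", "F3", "F4")]) > 0, +1, -1)
# identical(y, y_alt) should be TRUE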
We use the synthetic dataset df and the response vector y as our training dataset and training response vector in makl_train(). We choose the number of random features D equal to 2, which is reasonable given that our training dataset is 6-dimensional. We set sigma_N, the number of rows used for the distance matrix calculation, equal to 1000, and choose lambda_set to consist of 0.9, 0.8, 0.7 and 0.6 for sparse solutions. As the membership list, we use the groups list that we created above.
makl_model <- makl_train(X = df, y = y, D = 2, sigma_N = 1000, CV = 1, membership = groups, lambda_set = c(0.9, 0.8, 0.7, 0.6))
#> Lambda: 155.0901 nr.var: 5
#> Lambda: 137.8579 nr.var: 5
#> Lambda: 120.6257 nr.var: 5
#> Lambda: 103.3934 nr.var: 5
When we check the coefficients of our model, we see that the kernel chosen for prediction by makl_train() was the kernel of the second group. This was an expected result, since we created the response vector y to depend on the members of the second group of the groups list.
makl_model$model$coefficients
#>      155.090126229481 137.857889981761 120.625653734041 103.39341748632
#> [1,] 0.00000000 0.0000000 0.0000000 0.0000000
#> [2,] 0.00000000 0.0000000 0.0000000 0.0000000
#> [3,] 0.00000000 0.0000000 0.0000000 0.0000000
#> [4,] 0.00000000 0.0000000 0.0000000 0.0000000
#> [5,] -0.29314353 -0.5938544 -0.9106226 -1.2539243
#> [6,] 0.06703617 0.1352210 0.2057486 0.2799665
#> [7,] 0.24539658 0.4973664 0.7630398 1.0509792
#> [8,] -0.36108294 -0.7320709 -1.1246002 -1.5535840
#> [9,] 0.12450233 0.1542956 0.1858601 0.2195980
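To see which coefficients were selected without reading the full matrix, one can list the rows with non-zero entries. The snippet below is only illustrative; how the coefficient rows map to the groups is determined by MAKL's internal random-feature layout, so consult the package documentation for the exact correspondence.

# rows of the coefficient matrix that are non-zero for the largest lambda
which(makl_model$model$coefficients[, 1] != 0)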
Now, let us create a synthetic dataset df_test
and a synthetic test response vector y_test
to use in makl_test()
to check the results.
df_test <- matrix(rnorm(600, 0, 1), nrow = 100)
colnames(df_test) <- c("F1", "F2", "F3", "F4", "F5", "F6")
y_test <- c()
for(i in 1:nrow(df_test)) {
  if((df_test[i, 2] + df_test[i, 3] + df_test[i, 4]) > 0) {
    y_test[i] <- +1
  } else {
    y_test[i] <- -1
  }
}
result <- makl_test(X = df_test, y = y_test, makl_model = makl_model)
The list result contains two elements: 1) the predictions for the test response vector y_test, and 2) the area under the ROC curve (AUROC) versus the number of selected kernels for each element of lambda_set if CV is not applied, or the AUROC versus the number of selected kernels for the best lambda in lambda_set if CV is applied.
result$auroc_kernel_number
#>     auroc_array n_selected_kernels
#> 0.9 0.9494179 1
#> 0.8 0.9494179 1
#> 0.7 0.9498193 1
#> 0.6 0.9498193 1
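Since the name of the predictions element is not shown above, a convenient way to see both elements of result (their names and dimensions) is to inspect the returned list directly, for example with str():

# inspect the structure of the returned list
str(result, max.level = 1)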