Multi-class classification with feature and parameter selection using sparse group lasso. Suitable for high dimensional problems.
This is the release of R package msgl version 2.3.6.
This package implements procedures for working with multinomial logistic regression models using sparse group lasso. This includes procedures for fitting and cross validating sparse models in a high dimensional setup. See the Quick Start (Predict primary cancer site based on microRNA measurements) for an example of a traditional workflow consisting of 1) model selection and assessment using cross validation, 2) estimation of a final model and 3) using the selected model for carrying out predictions on new data.
Classification of cancer site. Error estimated by 10-fold cross validation on a data set consisting of miRNA expression measurements of laser dissected primary cancers.
Package highlights:
The penalized maximum likelihood estimator for multinomial logistic regression is computed using a coordinate gradient descent algorithm via the sglOptim optimizer. Use of parallel computing for cross validation and subsampling is supported through the foreach and doParallel packages.
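To make the penalized objective more concrete, the following base R sketch evaluates a sparse group lasso penalty for a coefficient matrix with one group per feature. It is an illustration only, not the code used by msgl or sglOptim: the function name `sgl_penalty`, the default group weights (square root of the group size) and the unit parameter weights are assumptions made for this example, and an intercept would normally be left unpenalized.

```r
# Sparse group lasso penalty for a coefficient matrix `beta`
# (rows = classes, columns = features; each column is one group).
# Hypothetical helper for illustration, not part of msgl.
sgl_penalty <- function(beta, lambda, alpha,
                        group_weights = rep(sqrt(nrow(beta)), ncol(beta)),
                        parameter_weights = matrix(1, nrow(beta), ncol(beta))) {
  stopifnot(alpha >= 0, alpha <= 1, lambda >= 0)
  group_norms <- sqrt(colSums(beta^2))              # Euclidean norm per feature
  group_part  <- sum(group_weights * group_norms)   # group lasso term
  lasso_part  <- sum(parameter_weights * abs(beta)) # lasso term
  lambda * ((1 - alpha) * group_part + alpha * lasso_part)
}

# 3 classes x 3 features; only the second feature has non-zero coefficients
beta <- matrix(c(0, 0, 0,
                 3, 4, 0,
                 0, 0, 0), nrow = 3)

sgl_penalty(beta, lambda = 1, alpha = 1)     # pure lasso: sum of |beta| = 7
sgl_penalty(beta, lambda = 1, alpha = 0)     # pure group lasso
sgl_penalty(beta, lambda = 0.5, alpha = 0.5) # mix of the two terms
```

With `alpha = 1` only individual parameters are penalized, with `alpha = 0` whole features are penalized as groups, and values in between trade off feature selection against parameter selection within the selected features.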
The package is under active development with releases to CRAN about once or twice each year.
Install the released version from CRAN:
install.packages("msgl")
Install the release candidate from GitHub:
# install.packages("devtools")
devtools::install_github("vincent-dk/sglOptim")
devtools::install_github("vincent-dk/msgl")
Install the development version from GitHub:
# install.packages("devtools")
devtools::install_github("vincent-dk/sglOptim", ref = "develop")
devtools::install_github("vincent-dk/msgl", ref = "develop")
library(msgl)
## Loading required package: Matrix
## Loading required package: sglOptim
## Loading required package: foreach
## Loading required package: doParallel
## Loading required package: iterators
## Loading required package: parallel
# Load some data
data(PrimaryCancers)
# Set up 2 parallel units
cl <- makeCluster(2)
registerDoParallel(cl)
# Do 10-fold cross validation on 100 models with increasing complexity, using the 2 parallel units
fit.cv <- msgl::cv(
  x = x,
  classes = classes,
  alpha = 0.5,
  lambda = 0.5,
  use_parallel = TRUE
)
## Running msgl 10 fold cross validation (dense design matrix)
##
##  Samples:  Features:  Classes:  Groups:  Parameters:
##       165        372         9      372       3.348k
stopCluster(cl)
# Print information about models
# and cross validation errors
fit.cv
##
## Call:
## msgl::cv(x = x, classes = classes, alpha = 0.5, lambda = 0.5,
## use_parallel = TRUE)
##
## Models:
##
##  Index:  Lambda:  Features:  Parameters:  Error:
##       1     1.00        1.3         10.6    0.96
##      20     0.88        5.2         32.7    0.82
##      40     0.76        7.7         46.8    0.72
##      60     0.66       11.9         68.4    0.59
##      80     0.58       15.9         90.7    0.47
##     100     0.50       20.8        115.5    0.41
##
## Best model:
##
##  Index:  Lambda:  Features:  Parameters:  Error:
##     100      0.5       20.8        115.5    0.41
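The cross validation above covers step 1 of the workflow. Steps 2 and 3 (estimating a final model and predicting) could look roughly like the sketch below. It assumes that `msgl::fit`, `best_model`, `features`, `parameters` and `predict` behave as in the 2.3.x releases of msgl and sglOptim, so check the package documentation for your installed version. The training matrix `x` is reused only as a stand-in for new data.

```r
library(msgl)
data(PrimaryCancers)  # provides x and classes

# Re-run the cross validation (without parallel units) only if
# fit.cv from the previous step is not in the workspace
if (!exists("fit.cv")) {
  fit.cv <- msgl::cv(x, classes, alpha = 0.5, lambda = 0.5)
}

# Step 2: fit the models on all data, using the same alpha and lambda
fit <- msgl::fit(x, classes, alpha = 0.5, lambda = 0.5)

# Index of the model with the lowest cross validation error
best <- best_model(fit.cv)

# Features and non-zero parameters of the selected model
features(fit)[[best]]
parameters(fit)[[best]]

# Step 3: predict classes (x stands in for new measurements here)
res <- predict(fit, x)
head(res$classes[, best])
```

Predicting on the training data, as done here for brevity, gives an optimistic picture of the error. The cross validation error reported above is the more honest estimate.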
GPL (>=2)