README

Martin Vincent

2017-04-01

High Dimensional Linear Multiple-Response Regression

Linear multiple-response regression with feature and parameter selection using sparse group lasso. Suitable for high dimensional problems.

This is the release of the R package lsgl, version 1.3.6.

R-package Overview

This package implements procedures for working with linear multiple-response regression models using sparse group lasso, including procedures for fitting and cross-validating sparse models in a high-dimensional setting. See the Quick Start (Predict airline ticket prices for multiple airlines) for an example of a typical workflow consisting of 1) model selection and assessment using cross validation, 2) estimation of a final model, and 3) using the selected model to carry out predictions on new data.

The multiple lasso estimator and the least squares estimate

Comparison of the multiple lasso estimator and least squares estimate on simulated data with 50 samples, 50 features and 25 groups. See the lsgl example in the package, i.e. run example(lsgl).

Package highlights:

The penalized maximum likelihood estimator for the linear multiple-response regression model is computed using a coordinate gradient descent algorithm via the sglOptim optimizer. Use of parallel computing for cross validation and subsampling is supported through the foreach and doParallel packages.
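To make the role of the penalty concrete, here is a minimal base R sketch of a sparse group lasso penalty. It is an illustration only, not the package's internal code: the function name sgl_penalty, the layout of the coefficient matrix (responses in rows, features in columns, one group per feature) and the default weights are assumptions made for this example. The mixing parameter alpha interpolates between group lasso (alpha = 0) and lasso (alpha = 1).

```r
# Hypothetical helper, not part of lsgl: evaluates a sparse group lasso penalty
# for a K x p coefficient matrix `beta` (assumed layout: rows = responses,
# columns = features), treating each column as one group.
sgl_penalty <- function(beta, alpha, lambda,
                        group_weights = rep(sqrt(nrow(beta)), ncol(beta)),
                        parameter_weights = matrix(1, nrow(beta), ncol(beta))) {
  stopifnot(alpha >= 0, alpha <= 1, lambda >= 0)
  group_norms <- sqrt(colSums(beta^2))                 # Euclidean norm of each group
  group_term <- sum(group_weights * group_norms)       # group lasso part
  lasso_term <- sum(parameter_weights * abs(beta))     # lasso part
  lambda * ((1 - alpha) * group_term + alpha * lasso_term)
}

# Two responses, two features; the second feature is inactive (all zeros)
beta <- matrix(c(3, 4, 0, 0), nrow = 2)

sgl_penalty(beta, alpha = 1, lambda = 1)    # pure lasso: |3| + |4|
sgl_penalty(beta, alpha = 0, lambda = 1)    # pure group lasso: sqrt(2) * 5
sgl_penalty(beta, alpha = 0.5, lambda = 1)  # equal mix of the two
```

An inactive feature contributes nothing to either term, which is why the group term encourages whole features to drop out while the lasso term encourages sparsity within the remaining groups.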

Status

The package is under active development with releases to CRAN about once or twice each year.

Installation

Get the released version from CRAN:

install.packages("lsgl")

Install the release candidate from GitHub:

# install.packages("devtools")
devtools::install_github("vincent-dk/sglOptim")
devtools::install_github("vincent-dk/lsgl")

Install the development version from GitHub:

# install.packages("devtools")
devtools::install_github("vincent-dk/sglOptim", ref = "develop")
devtools::install_github("vincent-dk/lsgl", ref = "develop")

Minimal Example

library(lsgl)
library(doParallel)  # loaded explicitly for makeCluster() and registerDoParallel()

# Load some data
data(AirlineTicketPrices)

# Set up 2 parallel units
cl <- makeCluster(2)
registerDoParallel(cl)

# Do 10-fold cross validation on 100 models with increasing complexity, using the 2 parallel units
fit.cv <- lsgl::cv(
  x = X,
  y = Y,
  alpha = 0.5,
  lambda = 0.01,
  use_parallel = TRUE
)
## 
## Running lsgl 10 fold cross validation 
## 
##  Samples:  Features:  Models:  Groups:  Parameters: 
##        337        412        6      412       2.472k
stopCluster(cl)

# Print information about models
# and cross validation errors
fit.cv
## 
## Call:
## lsgl::cv(x = X, y = Y, alpha = 0.5, lambda = 0.01, use_parallel = TRUE)
## 
## Models:
## 
##  Index:  Lambda:  Features:  Parameters:  Error: 
##        1    1.000          3           18     132
##       20    0.413        4.3         25.8     104
##       40    0.163       10.8         62.5      79
##       60    0.064       15.1         85.5      66
##       80    0.025       32.8        162.6      58
##      100    0.010         47        205.1      52
## 
## Best model:
## 
##  Index:  Lambda:  Features:  Parameters:  Error: 
##      100     0.01         47        205.1      52
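The example above covers step 1 of the workflow described in the overview. The following is a hedged sketch of steps 2 and 3 (estimating a final model and predicting with the selected model), run on small simulated data so that it is self-contained. The simulated data and variable names are made up for illustration. The calls lsgl::fit, best_model, features and predict are used here as understood from the package's Quick Start, and the layout of the prediction object (a list with a Yhat element holding one entry per model) is an assumption that should be checked against the package help pages.

```r
library(lsgl)

# Simulated data (illustrative only): 50 samples, 20 features, 3 responses,
# where only the first 3 features carry signal
set.seed(1)
N <- 50; p <- 20; K <- 3
X <- matrix(rnorm(N * p), nrow = N, ncol = p)
B <- matrix(0, nrow = p, ncol = K)
B[1:3, ] <- 1
Y <- X %*% B + matrix(rnorm(N * K, sd = 0.1), nrow = N, ncol = K)

# Step 1: cross validation over the lambda sequence (no parallel units here)
fit.cv <- lsgl::cv(x = X, y = Y, alpha = 0.5, lambda = 0.01)

# Step 2: fit the models on all data, using the same alpha and lambda settings
fit <- lsgl::fit(x = X, y = Y, alpha = 0.5, lambda = 0.01)

# Index of the model with the lowest cross validation error
best <- best_model(fit.cv)

# Features selected by that model
features(fit)[[best]]

# Step 3: predictions from the selected model
# (X is reused here; in practice this would be new data)
res <- predict(fit, X)
Yhat_best <- res$Yhat[[best]]  # assumed layout: one entry per model
dim(Yhat_best)
```

Reusing the same alpha and lambda arguments in both calls is what lets the index from the cross validation be applied to the final fit.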

Documentation

Author

Martin Vincent

License

GPL (>=2)