The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
This R Markdown document demonstrates the usage of the
rSDR package for Robust Sufficient Dimension Reduction
(rSDR) using alpha-distance covariance and Stiefel manifold learning for
estimation (Hsin-Hsiung Huang, Feng Yu & Teng Zhang, 2024). We apply
the method to ionosphere dataset, including
cross-validation, bootstrap methods for optimal alpha selection, and
visualization of results for both 2 and 3 dimensions.
# 
# # Load rSDR package
# if (!requireNamespace("rSDR", quietly = TRUE)) {
#   if (!requireNamespace("devtools", quietly = TRUE)){   install.packages("devtools")
# }
#   library(devtools)
#   devtools::install_github("scchen1/rSDR")
#   # or
#   
#   #a local zip file (`rSDR-main.zip`) downloaded from GitHub
#   #devtools::install_local(path='yourpath/rSDR-main.zip')
# }
library(rSDR)We use the ionosphere dataset from the
fdm2id package. The dataset has 33 predictor variables
(V1-V33) and one response variable (V35, ‘g’ for good or ‘b’ for
bad).
The data include a response variable Y with nx1 matrix
and predictors X with nxp matrix, where n is the number of
observation and p is the number of variable.
#Load ionosphere data in fdm2id package
utils::data("ionosphere", package = "fdm2id")
X<-as.matrix(ionosphere[,c(1:33)])
Y<-ifelse(ionosphere[,34]=='b',0,1)
# Y will be used for rSDR
Y<-matrix(Y,length(Y),1)
#Y.factor will be used in plot_rSDR
Y.factor<-factor(ionosphere$V35,levels=c('b','g'),labels=c('Bad','Good'))set.seed(2435)
sdr_result<-rSDR(X=X, Y=Y, d=3, alpha=0.3,maxiter=1000,tol=1e-7)
# An optimal of C is obtained by maximizing the target function using 
# ManifoldOptim method
head(sdr_result$C_value)
#>              [,1]        [,2]        [,3]
#> [1,]  0.060376161  0.12984351  0.32228204
#> [2,]  0.141480457 -0.09261385  0.08642944
#> [3,] -0.037722305  0.06518416  0.14365305
#> [4,]  0.205080004 -0.30551967 -0.00615399
#> [5,] -0.005404643  0.10617035  0.17880486
#> [6,]  0.225708005  0.06038528 -0.06383628
#The value of cost function f is equal to the negative of the target function.
sdr_result$f_value
#> [1] -0.05426311
#beta=sigma_x^(-0.5)%*%C 
head(sdr_result$beta)
#>              [,1]       [,2]       [,3]
#> [1,]  0.005052794  0.7568295  0.5552178
#> [2,]  0.104607845 -0.4298763  0.1027418
#> [3,] -0.286795008  0.3727621  0.8018906
#> [4,]  0.306400216 -0.7760211 -0.3561266
#> [5,] -0.161184953  0.2073087  0.5865177
#> [6,]  0.297500484  0.5492788 -0.5587245
#projected_data: This projects X onto the estimated SDR directions (X %*% beta),
#the projected matrix (data) n x d
head(sdr_result$projected_data)
#>            V1          V2         V3
#> 1  0.49558138  0.09203527 0.15575243
#> 2  0.22281560  1.53635693 1.15002405
#> 3  0.52522648 -0.25455816 0.07992585
#> 4 -0.12508370  3.09342626 1.02262469
#> 5  0.25594361 -0.02213188 0.17249223
#> 6 -0.09054116  1.13584722 0.86440192#Plot for 3-dimensional reduction using rSDR
plot_rSDR(projected_data=sdr_result$projected_data,Y=Y.factor,Y.name='group',colors=c("#374E55FF", "#DF8F44FF"))# plan(multisession) will launch parallel workers running in the background
# to save running time. To shut down background workers launched this way, call
# plan(sequential)
# use all local cores except one
# future::plan(future::multisession, workers = future::availableCores() - 1)
# use 2 cores for parallel
future::plan("multisession", workers = 2)
set.seed(2435)
opt_results<-optimal_alpha_cv(alpha.v=c(0.3, 0.5, 0.7),X=X,Y=Y,d=3,kfolds=10)
#> This will take a few seconds or minutes to get the results.
#opt_results<-optimal_alpha_cv(alpha.v=c(0.3, 0.4, 0.5, 0.6, 0.7),X=X,Y=Y,d=3,kfolds=10)
# Optimal alpha using cross validation
opt_results$opt.alpha
#> [1] 0.7
# The cost function values by alpha (column) for each cross validation (row) 
head(opt_results$f_test)
#>              0.3         0.5         0.7
#> [1,] -0.04919316 -0.05200294 -0.04633166
#> [2,] -0.06920450 -0.08929782 -0.09433253
#> [3,] -0.04472843 -0.05253492 -0.05533650
#> [4,] -0.05847677 -0.04592416 -0.05414926
#> [5,] -0.08752592 -0.09585069 -0.09452867
#> [6,] -0.02952418 -0.02512230 -0.02467935
# The mean of cost function value by alpha 
opt_results$f_test.mean
#>         0.3         0.5         0.7 
#> -0.06205520 -0.06120896 -0.06648143
# The standard deviation of cost function value by alpha 
opt_results$f_test.sd
#>        0.3        0.5        0.7 
#> 0.01926043 0.02119046 0.02535447
# The reduced dimension
opt_results$d
#> [1] 3
# Number of folds
opt_results$kfolds
#> [1] 10# plan(multisession) will launch parallel workers running in the background
# to save running time. To shut down background workers launched this way, call
# plan(sequential)
# use all local cores except one
# future::plan(future::multisession, workers = future::availableCores() - 1)
# use 2 cores for parallel
future::plan("multisession", workers = 2)
set.seed(2435)
opt_results<-optimal_alpha_boot(alpha.v=c(0.3,0.5, 0.7),X=X,Y=Y,d=3,R=10)
#> This will take a few seconds or minutes to get the results.
#opt_results<-optimal_alpha_boot(alpha.v=c(0.3,0.4,0.5,0.6, 0.7),X=X,Y=Y,d=3,R=50)
# Optimal alpha using bootstrap method
opt_results$opt.alpha
#> [1] 0.7
# The cost function values by alpha (column) for R bootstrap replicates (row) 
head(opt_results$f_test)
#>              0.3         0.5         0.7
#> [1,] -0.04123228 -0.06510945 -0.06349116
#> [2,] -0.05346989 -0.05615554 -0.05670479
#> [3,] -0.06034808 -0.06623436 -0.07687898
#> [4,] -0.06069561 -0.06189878 -0.05231286
#> [5,] -0.06213373 -0.04805119 -0.05082178
#> [6,] -0.05106833 -0.04766037 -0.05188302
# The mean of cost function value by alpha 
opt_results$f_test.mean
#>         0.3         0.5         0.7 
#> -0.05348157 -0.05805082 -0.05836145
# The standard deviation of cost function value by alpha 
opt_results$f_test.sd
#>         0.3         0.5         0.7 
#> 0.007649731 0.007248535 0.008392497
# The reduced dimension
opt_results$d
#> [1] 3
# Number of bootstrap replicates
opt_results$R
#> [1] 10set.seed(2435)
sdr_result<-rSDR(X=X, Y=Y, d=2, alpha=0.3,maxiter=1000,tol=1e-7)
# An optimal of C is obtained by maximizing the target function using ManifoldOptim method
head(sdr_result$C_value)
#>              [,1]        [,2]
#> [1,] -0.195840175  0.01920542
#> [2,] -0.008489214  0.36357606
#> [3,] -0.089659622 -0.06152668
#> [4,]  0.025908724  0.35114685
#> [5,]  0.206643972  0.12064514
#> [6,] -0.317245433  0.13294555
#The value of cost function f is equal to the negative of the target function.
sdr_result$f_value
#> [1] -0.04539063
#beta=sigma_x^(-0.5)%*%C 
head(sdr_result$beta)
#>            [,1]        [,2]
#> [1,] -0.4503527 -0.46366907
#> [2,]  0.4871003  0.93057571
#> [3,] -0.1964792  0.06231069
#> [4,] -0.1416008  0.54261663
#> [5,]  0.8963119  0.19068888
#> [6,] -1.0070804 -0.46193622
#projected_data: This projects X onto the estimated SDR directions (X %*% beta), the projected matrix (data) n x d
head(sdr_result$projected_data)
#>           V1         V2
#> 1 -0.3147479  0.3907114
#> 2  0.5904977  0.8874504
#> 3 -0.2551743  0.3706773
#> 4 -0.1722560 -1.2104044
#> 5 -0.4147929  0.3816966
#> 6 -0.2916534 -0.4002997#Plot for 2-dimensional reduction using rSDR
plot_rSDR(projected_data=sdr_result$projected_data,Y=Y.factor,Y.name='group',colors=c("#374E55FF", "#DF8F44FF"))This document tested the core functionality of the rSDR
package, including:
Performing Robust Sufficient Dimension Reduction with
rSDR.
Selecting the optimal alpha using cross-validation
(optimal_alpha_cv) and bootstrap
(optimal_alpha_boot).
Visualizing the results with plot_alpha, and
plot_rSDR.
The package successfully processes the ionosphere dataset, optimizes the alpha parameter, and produces meaningful visualizations of the reduced dimensions.
Hsin-Hsiung Huang, Feng Yu & Teng Zhang (19 Feb 2024): Robust sufficient dimension reduction via α-distance covariance, Journal of Nonparametric Statistics, DOI:10.1080/10485252.2024.2313137
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.