The Regression() function performs multiple facets of a complete regression analysis. Abbreviate with reg().
To illustrate, first read the Employee data included as part of lessR.
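The call that reads the data does not appear in this section. A minimal sketch, assuming the lessR package is installed and that its Read() function loads the built-in data set by name, storing it in the data frame d that the later output refers to:

```r
# Assumption: lessR is installed and ships the Employee data set
library(lessR)

# Store the data as d, the data frame name shown in the regression output
d <- Read("Employee")
```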
##
## >>> Suggestions
## Details about your data, Enter: details() for d, or details(name)
##
## Data Types
## ------------------------------------------------------------
## character: Non-numeric data values
## integer: Numeric data values, integers only
## double: Numeric data values with decimal digits
## ------------------------------------------------------------
##
## Variable Missing Unique
## Name Type Values Values Values First and last values
## ------------------------------------------------------------------------------------------
## 1 Years integer 36 1 16 7 NA 15 ... 1 2 10
## 2 Gender character 37 0 2 M M M ... F F M
## 3 Dept character 36 1 5 ADMN SALE SALE ... MKTG SALE FINC
## 4 Salary double 37 0 37 53788.26 94494.58 ... 56508.32 57562.36
## 5 JobSat character 35 2 3 med low low ... high low high
## 6 Plan integer 37 0 3 1 1 3 ... 2 2 1
## 7 Pre integer 37 0 27 82 62 96 ... 83 59 80
## 8 Post integer 37 0 22 92 74 97 ... 90 71 87
## ------------------------------------------------------------------------------------------
The brief version provides just the basic analysis, comparable to what Excel provides, plus a scatterplot with the regression line. With multiple regression, the scatterplot becomes a scatterplot matrix.
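The output below is consistent with a call such as the following. This is a sketch that assumes reg_brief(), the function named later in this text, and passes the data frame explicitly with data=d, which is a choice made here for clarity:

```r
library(lessR)
d <- Read("Employee")

# Brief output: basic analysis only, plus the scatterplot matrix
reg_brief(Salary ~ Years + Pre, data=d)
```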
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## reg(Salary ~ Years + Pre, Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: d
##
## Response Variable: Salary
## Predictor Variable 1: Years
## Predictor Variable 2: Pre
##
## Number of cases (rows) of data: 37
## Number of cases retained for analysis: 36
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 44140.971 13666.115 3.230 0.003 16337.052 71944.891
## Years 3251.408 347.529 9.356 0.000 2544.355 3958.462
## Pre -18.265 167.652 -0.109 0.914 -359.355 322.825
##
##
## Standard deviation of residuals: 11753.478 for 33 degrees of freedom
##
## R-squared: 0.726 Adjusted R-squared: 0.710 PRESS R-squared: 0.659
##
## Null hypothesis that all population slope coefficients are 0:
## F-statistic: 43.827 df: 2 and 33 p-value: 0.000
##
##
## df Sum Sq Mean Sq F-value p-value
## Years 1 12107157290.292 12107157290.292 87.641 0.000
## Pre 1 1639658.444 1639658.444 0.012 0.914
##
## Model 2 12108796948.736 6054398474.368 43.827 0.000
## Residuals 33 4558759843.773 138144237.690
## Salary 35 16667556792.508 476215908.357
##
##
## K-FOLD CROSS-VALIDATION
##
## RELATIONS AMONG THE VARIABLES
##
## RESIDUALS AND INFLUENCE
##
## FORECASTING ERROR
The full output is extensive: a summary of the analysis, the estimated model, fit indices, ANOVA, the correlation matrix, collinearity analysis, best subset regression, residuals and influence statistics, and prediction intervals.
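A sketch of the corresponding call, assuming the same model as the brief example and the abbreviation reg() introduced above:

```r
library(lessR)
d <- Read("Employee")

# Full output: all sections, including residuals and prediction intervals
reg(Salary ~ Years + Pre, data=d)
```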
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## reg(Salary ~ Years + Pre, Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: d
##
## Response Variable: Salary
## Predictor Variable 1: Years
## Predictor Variable 2: Pre
##
## Number of cases (rows) of data: 37
## Number of cases retained for analysis: 36
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 44140.971 13666.115 3.230 0.003 16337.052 71944.891
## Years 3251.408 347.529 9.356 0.000 2544.355 3958.462
## Pre -18.265 167.652 -0.109 0.914 -359.355 322.825
##
##
## Standard deviation of residuals: 11753.478 for 33 degrees of freedom
##
## R-squared: 0.726 Adjusted R-squared: 0.710 PRESS R-squared: 0.659
##
## Null hypothesis that all population slope coefficients are 0:
## F-statistic: 43.827 df: 2 and 33 p-value: 0.000
##
##
## df Sum Sq Mean Sq F-value p-value
## Years 1 12107157290.292 12107157290.292 87.641 0.000
## Pre 1 1639658.444 1639658.444 0.012 0.914
##
## Model 2 12108796948.736 6054398474.368 43.827 0.000
## Residuals 33 4558759843.773 138144237.690
## Salary 35 16667556792.508 476215908.357
##
##
## K-FOLD CROSS-VALIDATION
##
## RELATIONS AMONG THE VARIABLES
##
## Salary Years Pre
## Salary 1.00 0.85 0.03
## Years 0.85 1.00 0.05
## Pre 0.03 0.05 1.00
##
##
## Tolerance VIF
## Years 0.998 1.002
## Pre 0.998 1.002
##
##
## Years Pre R2adj X's
## 1 0 0.718 1
## 1 1 0.710 2
## 0 1 -0.028 1
##
## [based on Thomas Lumley's leaps function from the leaps package]
##
##
##
## RESIDUALS AND INFLUENCE
##
## Data, Fitted, Residual, Studentized Residual, Dffits, Cook's Distance
## [sorted by Cook's Distance]
## [res_rows = 20, out of 36 rows of data, or do res_rows="all"]
## -----------------------------------------------------------------------------------------
## Years Pre Salary fitted resid rstdnt dffits cooks
## Correll, Trevon 21 97 134419.230 110648.843 23770.387 2.424 1.217 0.430
## James, Leslie 18 70 122563.380 101387.773 21175.607 1.998 0.714 0.156
## Capelle, Adam 24 83 108138.430 120658.778 -12520.348 -1.211 -0.634 0.132
## Hoang, Binh 15 96 111074.860 91158.659 19916.201 1.860 0.649 0.131
## Korhalkar, Jessica 2 74 72502.500 49292.181 23210.319 2.171 0.638 0.122
## Billing, Susan 4 91 72675.260 55484.493 17190.767 1.561 0.472 0.071
## Singh, Niral 2 59 61055.440 49566.155 11489.285 1.064 0.452 0.068
## Skrotzki, Sara 18 63 91352.330 101515.627 -10163.297 -0.937 -0.397 0.053
## Saechao, Suzanne 8 98 55545.250 68362.271 -12817.021 -1.157 -0.390 0.050
## Kralik, Laura 10 74 92681.190 75303.447 17377.743 1.535 0.287 0.026
## Anastasiou, Crystal 2 59 56508.320 49566.155 6942.165 0.636 0.270 0.025
## Langston, Matthew 5 94 49188.960 58681.106 -9492.146 -0.844 -0.268 0.024
## Afshari, Anbar 6 100 69441.930 61822.925 7619.005 0.689 0.264 0.024
## Cassinelli, Anastis 10 80 57562.360 75193.857 -17631.497 -1.554 -0.265 0.022
## Osterman, Pascal 5 69 49704.790 59137.730 -9432.940 -0.826 -0.216 0.016
## Bellingar, Samantha 10 67 66337.830 75431.301 -9093.471 -0.793 -0.198 0.013
## LaRoe, Maria 10 80 61961.290 75193.857 -13232.567 -1.148 -0.195 0.013
## Ritchie, Darnell 7 82 53788.260 65403.102 -11614.842 -1.006 -0.190 0.012
## Sheppard, Cory 14 66 95027.550 88455.199 6572.351 0.579 0.176 0.011
## Downs, Deborah 7 90 57139.900 65256.982 -8117.082 -0.706 -0.174 0.010
##
##
## FORECASTING ERROR
##
## Data, Predicted, Standard Error of Forecast, 95% Prediction Intervals
## [sorted by lower bound of prediction interval]
## [to see all intervals do pred_rows="all"]
## --------------------------------------------------------------------------------------------------
## Years Pre Salary pred sf pi:lwr pi:upr width
## Hamide, Bita 1 83 51036.850 45876.388 12290.483 20871.211 70881.564 50010.352
## Singh, Niral 2 59 61055.440 49566.155 12619.291 23892.014 75240.296 51348.281
## Anastasiou, Crystal 2 59 56508.320 49566.155 12619.291 23892.014 75240.296 51348.281
## ...
## Link, Thomas 10 83 66312.890 75139.062 11933.518 50860.137 99417.987 48557.849
## LaRoe, Maria 10 80 61961.290 75193.857 11918.048 50946.405 99441.308 48494.903
## Cassinelli, Anastis 10 80 57562.360 75193.857 11918.048 50946.405 99441.308 48494.903
## ...
## Correll, Trevon 21 97 134419.230 110648.843 12881.876 84440.470 136857.217 52416.747
## Capelle, Adam 24 83 108138.430 120658.778 12955.608 94300.394 147017.161 52716.767
##
##
## ----------------------------------
## Plot 1: Distribution of Residuals
## Plot 2: Residuals vs Fitted Values
## Plot 3: ScatterPlot Matrix
## ----------------------------------
Request brief output with all variables in the model standardized. Also plot the residuals, each drawn as a line connecting a data point to the corresponding point on the regression line.
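A possible form of the call, with the parameter names rescale and plot_errors taken from the suggested call printed in the output below. Other lessR versions may name the rescaling parameter differently, so treat the names as tied to the version that produced this output:

```r
library(lessR)
d <- Read("Employee")

# rescale="z" standardizes the variables; plot_errors=TRUE draws each residual
# Assumption: parameter names as shown in this document's own output
reg_brief(Salary ~ Years, rescale="z", plot_errors=TRUE, data=d)
```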
##
## Rescaled Data, First Six Rows
## Salary Years
## Hamide, Bita -1.044 -1.466
## Singh, Niral -0.584 -1.291
## Korhalkar, Jessica -0.059 -1.291
## Anastasiou, Crystal -0.793 -1.291
## Gvakharia, Kimberly -1.098 -1.116
## Stanley, Emma -1.269 -1.116
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## reg(Salary ~ Years, rescale="z", plot_errors=TRUE, Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: d
##
## Response Variable: Salary
## Predictor Variable: Years
##
## Number of cases (rows) of data: 37
## Number of cases retained for analysis: 36
##
## Data are Standardized
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) -0.026 0.089 -0.299 0.767 -0.206 0.154
## Years 0.853 0.090 9.498 0.000 0.670 1.035
##
##
## Standard deviation of residuals: 0.531 for 34 degrees of freedom
##
## R-squared: 0.726 Adjusted R-squared: 0.718 PRESS R-squared: 0.681
##
## Null hypothesis that all population slope coefficients are 0:
## F-statistic: 90.217 df: 1 and 34 p-value: 0.000
##
##
## df Sum Sq Mean Sq F-value p-value
## Model 1 25.472 25.472 90.217 0.000
## Residuals 34 9.600 0.282
## Salary 35 35.072 1.002
##
##
## K-FOLD CROSS-VALIDATION
##
## RELATIONS AMONG THE VARIABLES
##
## RESIDUALS AND INFLUENCE
##
## FORECASTING ERROR
The standard output includes \(R^2_{press}\), an estimate of \(R^2\) when the model is applied to new, previously unseen data. A cross-validation option is also offered with the kfold parameter. Here specify three folds.
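A sketch of a three-fold request. The source does not show which predictors were used, so the single-predictor formula here is an assumption. Folds are assigned at random, so the numbers will generally differ from those shown below:

```r
library(lessR)
d <- Read("Employee")

# kfold=3 splits the retained rows into three training/testing partitions
# Assumption: the formula is illustrative, not confirmed by the source
reg_brief(Salary ~ Years, kfold=3, data=d)
```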
## K-FOLD CROSS-VALIDATION
##
## Model from Training Data Applied to Testing Data
## ---------------------------------- ----------------------------------
## fold n se MSE Rsq n se MSE Rsq
## 1 | 24 12259.072 150284839.759 0.634 | 12 11654.158 135819407.187 0.821
## 2 | 24 11228.730 126084385.499 0.742 | 12 13672.420 186935076.838 0.680
## 3 | 24 10998.802 120973652.485 0.803 | 12 15171.451 230172932.321 0.267
## ---------------------------------- ----------------------------------
## Mean 11495.535 132447625.914 0.726 13499.343 184309138.782 0.589
The output of Regression() can be stored in an R object, here named r. The output object consists of various components, what R calls a list object.
Entering the name of the object displays the full output.
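A minimal sketch of storing and then displaying the output, assuming the two-predictor model from the earlier examples:

```r
library(lessR)
d <- Read("Employee")

# Store the output in r; the text output is not displayed on assignment
r <- reg(Salary ~ Years + Pre, data=d)

# Entering the object name displays the full output
r
```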
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## reg(Salary ~ Years + Pre, Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: d
##
## Response Variable: Salary
## Predictor Variable 1: Years
## Predictor Variable 2: Pre
##
## Number of cases (rows) of data: 37
## Number of cases retained for analysis: 36
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 44140.971 13666.115 3.230 0.003 16337.052 71944.891
## Years 3251.408 347.529 9.356 0.000 2544.355 3958.462
## Pre -18.265 167.652 -0.109 0.914 -359.355 322.825
##
##
## Standard deviation of residuals: 11753.478 for 33 degrees of freedom
##
## R-squared: 0.726 Adjusted R-squared: 0.710 PRESS R-squared: 0.659
##
## Null hypothesis that all population slope coefficients are 0:
## F-statistic: 43.827 df: 2 and 33 p-value: 0.000
##
##
## df Sum Sq Mean Sq F-value p-value
## Years 1 12107157290.292 12107157290.292 87.641 0.000
## Pre 1 1639658.444 1639658.444 0.012 0.914
##
## Model 2 12108796948.736 6054398474.368 43.827 0.000
## Residuals 33 4558759843.773 138144237.690
## Salary 35 16667556792.508 476215908.357
##
##
## K-FOLD CROSS-VALIDATION
##
## RELATIONS AMONG THE VARIABLES
##
## Salary Years Pre
## Salary 1.00 0.85 0.03
## Years 0.85 1.00 0.05
## Pre 0.03 0.05 1.00
##
##
## Tolerance VIF
## Years 0.998 1.002
## Pre 0.998 1.002
##
##
## Years Pre R2adj X's
## 1 0 0.718 1
## 1 1 0.710 2
## 0 1 -0.028 1
##
## [based on Thomas Lumley's leaps function from the leaps package]
##
##
##
## RESIDUALS AND INFLUENCE
##
## Data, Fitted, Residual, Studentized Residual, Dffits, Cook's Distance
## [sorted by Cook's Distance]
## [res_rows = 20, out of 36 rows of data, or do res_rows="all"]
## -----------------------------------------------------------------------------------------
## Years Pre Salary fitted resid rstdnt dffits cooks
## Correll, Trevon 21 97 134419.230 110648.843 23770.387 2.424 1.217 0.430
## James, Leslie 18 70 122563.380 101387.773 21175.607 1.998 0.714 0.156
## Capelle, Adam 24 83 108138.430 120658.778 -12520.348 -1.211 -0.634 0.132
## Hoang, Binh 15 96 111074.860 91158.659 19916.201 1.860 0.649 0.131
## Korhalkar, Jessica 2 74 72502.500 49292.181 23210.319 2.171 0.638 0.122
## Billing, Susan 4 91 72675.260 55484.493 17190.767 1.561 0.472 0.071
## Singh, Niral 2 59 61055.440 49566.155 11489.285 1.064 0.452 0.068
## Skrotzki, Sara 18 63 91352.330 101515.627 -10163.297 -0.937 -0.397 0.053
## Saechao, Suzanne 8 98 55545.250 68362.271 -12817.021 -1.157 -0.390 0.050
## Kralik, Laura 10 74 92681.190 75303.447 17377.743 1.535 0.287 0.026
## Anastasiou, Crystal 2 59 56508.320 49566.155 6942.165 0.636 0.270 0.025
## Langston, Matthew 5 94 49188.960 58681.106 -9492.146 -0.844 -0.268 0.024
## Afshari, Anbar 6 100 69441.930 61822.925 7619.005 0.689 0.264 0.024
## Cassinelli, Anastis 10 80 57562.360 75193.857 -17631.497 -1.554 -0.265 0.022
## Osterman, Pascal 5 69 49704.790 59137.730 -9432.940 -0.826 -0.216 0.016
## Bellingar, Samantha 10 67 66337.830 75431.301 -9093.471 -0.793 -0.198 0.013
## LaRoe, Maria 10 80 61961.290 75193.857 -13232.567 -1.148 -0.195 0.013
## Ritchie, Darnell 7 82 53788.260 65403.102 -11614.842 -1.006 -0.190 0.012
## Sheppard, Cory 14 66 95027.550 88455.199 6572.351 0.579 0.176 0.011
## Downs, Deborah 7 90 57139.900 65256.982 -8117.082 -0.706 -0.174 0.010
##
##
## FORECASTING ERROR
##
## Data, Predicted, Standard Error of Forecast, 95% Prediction Intervals
## [sorted by lower bound of prediction interval]
## [to see all intervals do pred_rows="all"]
## --------------------------------------------------------------------------------------------------
## Years Pre Salary pred sf pi:lwr pi:upr width
## Hamide, Bita 1 83 51036.850 45876.388 12290.483 20871.211 70881.564 50010.352
## Singh, Niral 2 59 61055.440 49566.155 12619.291 23892.014 75240.296 51348.281
## Anastasiou, Crystal 2 59 56508.320 49566.155 12619.291 23892.014 75240.296 51348.281
## ...
## Link, Thomas 10 83 66312.890 75139.062 11933.518 50860.137 99417.987 48557.849
## LaRoe, Maria 10 80 61961.290 75193.857 11918.048 50946.405 99441.308 48494.903
## Cassinelli, Anastis 10 80 57562.360 75193.857 11918.048 50946.405 99441.308 48494.903
## ...
## Correll, Trevon 21 97 134419.230 110648.843 12881.876 84440.470 136857.217 52416.747
## Capelle, Adam 24 83 108138.430 120658.778 12955.608 94300.394 147017.161 52716.767
##
##
## ----------------------------------
## Plot 1: Distribution of Residuals
## Plot 2: Residuals vs Fitted Values
## Plot 3: ScatterPlot Matrix
## ----------------------------------
Or, work with the components individually. Use the base R names() function to identify all of the components. Component names that begin with out_ are part of the standard output. Other components contain just data and statistics, designed as input to additional procedures.
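For example, assuming the stored object r from the same two-predictor model:

```r
library(lessR)
d <- Read("Employee")
r <- reg(Salary ~ Years + Pre, data=d)

# List the names of all components of the output object
names(r)
```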
## [1] "out_suggest" "call" "formula" "out_title_bck" "out_background" "out_title_basic"
## [7] "out_estimates" "out_fit" "out_anova" "out_title_kfold" "out_kfold" "out_title_rel"
## [13] "out_cor" "out_collinear" "out_subsets" "out_title_res" "out_residuals" "out_title_pred"
## [19] "out_predict" "out_ref" "out_Rmd" "out_Word" "out_pdf" "out_odt"
## [25] "out_rtf" "out_plots" "n.vars" "n.obs" "n.keep" "coefficients"
## [31] "sterrs" "tvalues" "pvalues" "cilb" "ciub" "anova_model"
## [37] "anova_residual" "anova_total" "se" "resid_range" "Rsq" "Rsqadj"
## [43] "PRESS" "RsqPRESS" "m_se" "m_MSE" "m_Rsq" "cor"
## [49] "tolerances" "vif" "resid.max" "pred_min_max" "residuals" "fitted"
## [55] "cooks.distance" "model" "terms"
Here just display the estimates as part of the standard output.
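A sketch using the component name out_estimates from the list above, again assuming the stored object r:

```r
library(lessR)
d <- Read("Employee")
r <- reg(Salary ~ Years + Pre, data=d)

# Display only the formatted table of estimates
r$out_estimates
```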
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 44140.971 13666.115 3.230 0.003 16337.052 71944.891
## Years 3251.408 347.529 9.356 0.000 2544.355 3958.462
## Pre -18.265 167.652 -0.109 0.914 -359.355 322.825
Here display the coefficients.
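The coefficients component, also listed above, holds the numeric values rather than formatted text, which makes it convenient for further computation. A sketch under the same assumptions:

```r
library(lessR)
d <- Read("Employee")
r <- reg(Salary ~ Years + Pre, data=d)

# Named numeric vector of the estimated coefficients
r$coefficients
```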
## (Intercept) Years Pre
## 44140.97140 3251.40825 -18.26496
Because reg() accomplishes its computations with the base R function lm(), lm() parameters can be passed to reg(), which then passes their values to lm(). Here first use the base R function contr.sum() to calculate an effect coding contrast matrix for a categorical variable with three levels, such as the variable Plan in the Employee data set.
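A sketch in base R only. The object name cnt is taken from the suggested call shown later in the output:

```r
# Effect (sum-to-zero) coding for a factor with three levels
cnt <- contr.sum(n=3)
cnt
```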
## [,1] [,2]
## 1 1 0
## 2 0 1
## 3 -1 -1
Now use the lm() parameter contrasts to define the effect coding for Plan, passed to reg_brief(). Contrasts only apply to factors, so first convert Plan to a factor.
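A sketch of the two steps, assuming the contrast matrix cnt from contr.sum() and the base R factor() function for the conversion:

```r
library(lessR)
d <- Read("Employee")

# Effect coding matrix for three levels
cnt <- contr.sum(n=3)

# Contrasts apply only to factors, so convert Plan first
d$Plan <- factor(d$Plan)

# The contrasts argument is passed through to lm()
reg_brief(Salary ~ Plan, contrasts=list(Plan=cnt), data=d)
```

With effect coding, the intercept is the unweighted mean of the three Plan group means, and each slope is a group's deviation from that value.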
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## reg(Salary ~ Plan, contrasts=list(Plan=cnt), Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: d
##
## Response Variable: Salary
## Predictor Variable: Plan
##
## Number of cases (rows) of data: 37
## Number of cases retained for analysis: 37
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 76737.724 3897.284 19.690 0.000 68817.491 84657.958
## Plan1 -4166.287 5113.762 -0.815 0.421 -14558.703 6226.128
## Plan2 -6866.355 4920.990 -1.395 0.172 -16867.009 3134.299
##
##
## Standard deviation of residuals: 21456.776 for 34 degrees of freedom
##
## R-squared: 0.085 Adjusted R-squared: 0.031 PRESS R-squared: -0.133
##
## Null hypothesis that all population slope coefficients are 0:
## F-statistic: 1.580 df: 2 and 34 p-value: 0.221
##
##
## df Sum Sq Mean Sq F-value p-value
## Model 2 1454537623.133 727268811.566 1.580 0.221
## Residuals 34 15653370109.356 460393238.510
## Salary 36 17107907732.489 475219659.236
##
##
## K-FOLD CROSS-VALIDATION
##
## RELATIONS AMONG THE VARIABLES
##
## RESIDUALS AND INFLUENCE
##
## FORECASTING ERROR
The \(R^2\) fit statistic compares the sum of the squared errors of the model that includes the X predictor variables to the sum of squared errors of the null model. The null model, the baseline of comparison, has no X variables, so the fitted value for every row of data is the mean of the response variable \(y\). The corresponding intercept is the mean of \(y\), and the standard deviation of the residuals is the standard deviation of \(y\).
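The comparison can be checked by hand. The sketch below uses base R lm() rather than reg(), an illustrative choice, and computes \(R^2 = 1 - SSE_{model} / SSE_{null}\) on the rows retained for the two-predictor model:

```r
library(lessR)
d <- Read("Employee")

fit <- lm(Salary ~ Years + Pre, data=d)

# Response values for the rows lm() actually used (missing data dropped)
y <- model.frame(fit)$Salary

sse_model <- sum(residuals(fit)^2)   # squared errors about the fitted values
sse_null <- sum((y - mean(y))^2)     # squared errors about the mean of y

rsq <- 1 - sse_model / sse_null
rsq
```

The result should agree with the R-squared value of 0.726 reported in the earlier output for this model.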
The following submits the null model for Salary, and plots the errors. Compare the standard deviation of the residuals to that of a regression model of Salary with one or more predictor variables.
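A sketch of the null model call, with the formula and plot_errors taken from the suggested call in the output below. The two base R lines at the end are an added check that the intercept and residual standard deviation match the mean and standard deviation of Salary:

```r
library(lessR)
d <- Read("Employee")

# Null model: intercept only, no predictor variables
reg_brief(Salary ~ 1, plot_errors=TRUE, data=d)

# Added check: these should match the intercept and residual sd in the output
mean(d$Salary)
sd(d$Salary)
```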
## >>> Suggestion
## # Create an R markdown file for interpretative output with Rmd = "file_name"
## reg(Salary ~ 1, plot_errors=TRUE, Rmd="eg")
##
##
## BACKGROUND
##
## Data Frame: d
##
## Response Variable: Salary
##
## Number of cases (rows) of data: 37
## Number of cases retained for analysis: 37
##
##
## BASIC ANALYSIS
##
## Estimate Std Err t-value p-value Lower 95% Upper 95%
## (Intercept) 73795.557 3583.821 20.591 0.000 66527.230 81063.883
##
##
## Standard deviation of residuals: 21799.533 for 36 degrees of freedom
##
##
## df Sum Sq Mean Sq F-value p-value
## Residuals 36 17107907732.489 475219659.236
##
##
## K-FOLD CROSS-VALIDATION
##
## RELATIONS AMONG THE VARIABLES
##
## RESIDUALS AND INFLUENCE
##
## FORECASTING ERROR
The parameter Rmd creates an R markdown file that is automatically knit into an html document, combining the various output components with full interpretation. The result is a new, much more complete form of computer output.
Not run here.
reg(Salary ~ Years + Pre, Rmd="eg")
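The output that follows is from a logistic regression of Gender on Salary, though the call itself does not appear in this section. A minimal sketch, assuming the lessR Logit() function referenced later in the text. Converting Gender to a factor first is a precaution added here, not something the source shows:

```r
library(lessR)
d <- Read("Employee")

# Precaution (assumption): make the binary response a two-level factor, F then M
d$Gender <- factor(d$Gender)

# Logistic regression of Gender on Salary
Logit(Gender ~ Salary, data=d)
```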
##
## Response Variable: Gender
## Predictor Variable 1: Salary
##
## Number of cases (rows) of data: 37
## Number of cases retained for analysis: 37
##
##
##
## BASIC ANALYSIS
##
## Model Coefficients
##
## Estimate Std Err z-value p-value Lower 95% Upper 95%
## (Intercept) -2.6191 1.3715 -1.910 0.056 -5.3073 0.0691
## Salary 0.0000 0.0000 1.904 0.057 -0.0000 0.0001
##
##
## Odds ratios and confidence intervals
##
## Odds Ratio Lower 95% Upper 95%
## (Intercept) 0.0729 0.0050 1.0715
## Salary 1.0000 1.0000 1.0001
##
##
## Model Fit
##
## Null deviance: 51.266 on 36 degrees of freedom
## Residual deviance: 46.918 on 35 degrees of freedom
##
## AIC: 50.91807
##
## Number of iterations to convergence: 4
##
##
##
##
## ANALYSIS OF RESIDUALS AND INFLUENCE
## Data, Fitted, Residual, Studentized Residual, Dffits, Cook's Distance
## [sorted by Cook's Distance]
## [res_rows = 20 out of 37 cases (rows) of data]
## --------------------------------------------------------------------
## Salary Gender fitted residual rstudent dffits cooks
## James, Leslie 122563 F 0.8424 -0.8424 -2.1213 -0.7143 0.46299
## Langston, Matthew 49189 M 0.2900 0.7100 1.6237 0.3646 0.08559
## Osterman, Pascal 49705 M 0.2938 0.7062 1.6139 0.3586 0.08225
## Kralik, Laura 92681 F 0.6522 -0.6522 -1.4942 -0.3313 0.06402
## Ritchie, Darnell 53788 M 0.3243 0.6757 1.5380 0.3136 0.05962
## Skrotzki, Sara 91352 F 0.6416 -0.6416 -1.4698 -0.3161 0.05736
## Cassinelli, Anastis 57562 M 0.3539 0.6461 1.4703 0.2761 0.04409
## Link, Thomas 66313 M 0.4267 0.5733 1.3223 0.2111 0.02335
## Anderson, David 69548 M 0.4547 0.5453 1.2706 0.1967 0.01962
## Stanley, Grayson 69625 M 0.4553 0.5447 1.2694 0.1965 0.01955
## Capelle, Adam 108138 M 0.7632 0.2368 0.7586 0.2236 0.01954
## Knox, Michael 99063 M 0.7011 0.2989 0.8637 0.2179 0.01935
## Hoang, Binh 111075 M 0.7813 0.2187 0.7265 0.2228 0.01919
## Sheppard, Cory 95028 M 0.6706 0.3294 0.9132 0.2119 0.01869
## Wu, James 94495 M 0.6665 0.3335 0.9199 0.2110 0.01859
## Campagna, Justin 72321 M 0.4788 0.5212 1.2275 0.1888 0.01759
## Fulton, Scott 87786 M 0.6124 0.3876 1.0066 0.1980 0.01706
## Adib, Hassan 83014 M 0.5720 0.4280 1.0715 0.1892 0.01613
## Pham, Scott 81871 M 0.5622 0.4378 1.0875 0.1875 0.01599
## Portlock, Ryan 77715 M 0.5261 0.4739 1.1469 0.1841 0.01593
##
##
## FORECASTS
##
## Probability threshold for predicting M: 0.5
##
## 0: F
## 1: M
##
## Data, Fitted Values, Standard Errors
## [sorted by fitted value]
## --------------------------------------------------------------------
## Salary Gender predict fitted std.err
## Stanley, Emma 46125 F 0 0.2684 0.1161
## Langston, Matthew 49189 M 0 0.2900 0.1126
## Osterman, Pascal 49705 M 0 0.2938 0.1119
## Gvakharia, Kimberly 49869 F 0 0.2949 0.1117
##
## ... for the rows of data where fitted is close to 0.5 ...
##
## Salary Gender predict fitted std.err
## Campagna, Justin 72321 M 0 0.4788 0.08710
## Korhalkar, Jessica 72502 F 0 0.4804 0.08713
## Billing, Susan 72675 F 0 0.4819 0.08718
## Portlock, Ryan 77715 M 1 0.5261 0.09079
## Pham, Scott 81871 M 1 0.5622 0.09670
##
## ... for the last 4 rows of sorted data ...
##
## Salary Gender predict fitted std.err
## Capelle, Adam 108138 M 1 0.7632 0.1355
## Hoang, Binh 111075 M 1 0.7813 0.1364
## James, Leslie 122563 F 1 0.8424 0.1318
## Correll, Trevon 134419 M 1 0.8901 0.1174
## --------------------------------------------------------------------
##
##
## Confusion Matrix for Gender
##
## Probability threshold for predicting M: 0.5
##
## Baseline Predicted
## ---------------------------------------------------
## Total %Tot 0 1 %Correct
## ---------------------------------------------------
## 0 19 51.4 16 3 84.2
## Gender 1 18 48.6 8 10 55.6
## ---------------------------------------------------
## Total 37 70.3
##
## Accuracy: 70.27
## Recall: 55.56
## Precision: 76.92
Specify multiple logistic regression with the usual R formula syntax. Specify additional probability thresholds beyond just the default 0.5 with the prob_cut parameter.
Logit(Gender ~ Years + Salary, prob_cut=c(.3, .5, .7))
Use the base R help() function to view the full manual for Regression(). Simply enter a question mark followed by the name of the function, or its abbreviation.
?reg