
DisImpact Tutorial

Vinh Nguyen

2022-10-10

Introduction

The DisImpact R package contains functions that help determine disproportionate impact (DI) based on the following methodologies:

  1. percentage point gap (PPG) method,
  2. proportionality index method (method #1 in reference), and
  3. 80% index method (method #2 in reference).
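For a rough orientation, the three measures can be sketched in base R from group-level counts. This sketch assumes the commonly used definitions rather than DisImpact's internals. The PPG flags a group when its success rate plus its margin of error falls below a reference rate (here, the overall rate). The proportionality index divides a group's share of all successes by its share of the population. The 80% index divides a group's success rate by that of the highest performing group. The last two both use 0.8 as the cutoff. The helper di_sketch is hypothetical and not part of the package. The counts are the overall transfer counts that di_ppg reports for the toy data further down.

```r
# Hypothetical helper (not part of DisImpact): a base-R sketch of the three
# DI measures, assuming the usual definitions and a 0.8 cutoff.
di_sketch <- function(group, n, success, min_moe = 0.03, cutoff = 0.8) {
  pct <- success / n                          # group success rate
  overall <- sum(success) / sum(n)            # PPG reference: overall rate
  moe <- pmax(min_moe, 1.96 * sqrt(0.25 / n)) # MOE using 0.50 as the proportion
  prop_index <- (success / sum(success)) / (n / sum(n)) # share of successes / share of group
  index_80 <- pct / max(pct)                  # reference: highest performing group
  data.frame(
    group = group,
    pct = pct,
    ppg_di = as.integer(pct + moe < overall),
    prop_index = prop_index,
    prop_di = as.integer(prop_index < cutoff),
    index_80 = index_80,
    index_80_di = as.integer(index_80 < cutoff)
  )
}

# Overall transfer counts for the toy data (as reported by di_ppg)
res <- di_sketch(
  group   = c("Asian", "Black", "Hispanic", "Multi-Ethnicity", "Native American", "White"),
  n       = c(6000, 2000, 4000, 1000, 200, 6800),
  success = c(4292, 607, 847, 504, 78, 4200)
)
print(res)
```

The three methods need not agree. In this sketch, Multi-Ethnicity is flagged by the 80% index but not by the PPG or the proportionality index.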

Install Package

# From CRAN (Official)
install.packages('DisImpact')

# From github (Development)
devtools::install_github('vinhdizzo/DisImpact')

Load Packages

library(DisImpact)
library(dplyr) # Data frame manipulation and the %>% pipe

Load toy student equity data

To illustrate the functionality of the package, let’s load a toy data set:

# Load fake data set
data(student_equity)

# Print first few observations
head(student_equity)
##         Ethnicity Gender Cohort Transfer Cohort_Math Math Cohort_English
## 1 Native American Female   2017        0        2017    1           2017
## 2 Native American Female   2017        0        2018    1             NA
## 3 Native American Female   2017        0        2018    1           2017
## 4 Native American   Male   2017        1        2017    1           2018
## 5 Native American   Male   2017        0        2017    1           2019
## 6 Native American   Male   2017        1        2019    1           2018
##   English      Ed_Goal     College_Status Student_ID EthnicityFlag_Asian
## 1       0 Deg/Transfer First-time College     100001                   0
## 2      NA Deg/Transfer First-time College     100002                   0
## 3       0 Deg/Transfer First-time College     100003                   0
## 4       1        Other First-time College     100004                   0
## 5       0 Deg/Transfer              Other     100005                   0
## 6       1        Other First-time College     100006                   0
##   EthnicityFlag_Black EthnicityFlag_Hispanic EthnicityFlag_NativeAmerican
## 1                   0                      0                            1
## 2                   0                      0                            1
## 3                   0                      0                            1
## 4                   0                      0                            1
## 5                   0                      0                            1
## 6                   0                      0                            1
##   EthnicityFlag_PacificIslander EthnicityFlag_White EthnicityFlag_Carribean
## 1                             0                   0                       0
## 2                             0                   0                       0
## 3                             0                   0                       0
## 4                             0                   0                       0
## 5                             0                   0                       0
## 6                             0                   0                       0
##   EthnicityFlag_EastAsian EthnicityFlag_SouthEastAsian
## 1                       0                            0
## 2                       0                            0
## 3                       0                            0
## 4                       0                            0
## 5                       0                            0
## 6                       0                            0
##   EthnicityFlag_SouthWestAsianNorthAfrican EthnicityFlag_AANAPI
## 1                                        0                    1
## 2                                        0                    1
## 3                                        0                    1
## 4                                        0                    1
## 5                                        0                    1
## 6                                        0                    1
##   EthnicityFlag_Unknown EthnicityFlag_TwoorMoreRaces
## 1                     0                            0
## 2                     0                            0
## 3                     0                            0
## 4                     0                            0
## 5                     0                            0
## 6                     0                            0
# For description of data set
## ?student_equity

For a description of the student_equity data set, type ?student_equity in the R console.

The toy data set can be summarized as follows:

# Summarize toy data
dim(student_equity)
## [1] 20000    24
dSumm <- student_equity %>%
  group_by(Cohort, Ethnicity) %>%
  summarize(n=n(), Transfer_Rate=mean(Transfer))
## `summarise()` has grouped output by 'Cohort'. You can override using the
## `.groups` argument.
dSumm ## This is a summarized version of the data set
## # A tibble: 12 x 4
## # Groups:   Cohort [2]
##    Cohort Ethnicity           n Transfer_Rate
##     <int> <chr>           <int>         <dbl>
##  1   2017 Asian            3000         0.687
##  2   2017 Black            1000         0.31 
##  3   2017 Hispanic         2000         0.205
##  4   2017 Multi-Ethnicity   500         0.524
##  5   2017 Native American   100         0.43 
##  6   2017 White            3400         0.604
##  7   2018 Asian            3000         0.743
##  8   2018 Black            1000         0.297
##  9   2018 Hispanic         2000         0.218
## 10   2018 Multi-Ethnicity   500         0.484
## 11   2018 Native American   100         0.35 
## 12   2018 White            3400         0.631

Percentage point gap (PPG) method

di_ppg is the main work function for this method. It accepts either vectors or, the tidy way, bare column names together with a data argument:

# Vector
di_ppg(success=student_equity$Transfer, group=student_equity$Ethnicity) %>% as.data.frame
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    4292 0.7153333    0.5264         overall 0.03000000
## 2           Black 2000     607 0.3035000    0.5264         overall 0.03000000
## 3        Hispanic 4000     847 0.2117500    0.5264         overall 0.03000000
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5264         overall 0.03099032
## 5 Native American  200      78 0.3900000    0.5264         overall 0.06929646
## 6           White 6800    4200 0.6176471    0.5264         overall 0.03000000
##      pct_lo    pct_hi di_indicator success_needed_not_di
## 1 0.6853333 0.7453333            0                     0
## 2 0.2735000 0.3335000            1                   386
## 3 0.1817500 0.2417500            1                  1139
## 4 0.4730097 0.5349903            0                     0
## 5 0.3207035 0.4592965            1                    14
## 6 0.5876471 0.6476471            0                     0
##   success_needed_full_parity
## 1                          0
## 2                        446
## 3                       1259
## 4                         23
## 5                         28
## 6                          0
# Tidy and column reference
di_ppg(success=Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    4292 0.7153333    0.5264         overall 0.03000000
## 2           Black 2000     607 0.3035000    0.5264         overall 0.03000000
## 3        Hispanic 4000     847 0.2117500    0.5264         overall 0.03000000
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5264         overall 0.03099032
## 5 Native American  200      78 0.3900000    0.5264         overall 0.06929646
## 6           White 6800    4200 0.6176471    0.5264         overall 0.03000000
##      pct_lo    pct_hi di_indicator success_needed_not_di
## 1 0.6853333 0.7453333            0                     0
## 2 0.2735000 0.3335000            1                   386
## 3 0.1817500 0.2417500            1                  1139
## 4 0.4730097 0.5349903            0                     0
## 5 0.3207035 0.4592965            1                    14
## 6 0.5876471 0.6476471            0                     0
##   success_needed_full_parity
## 1                          0
## 2                        446
## 3                       1259
## 4                         23
## 5                         28
## 6                          0

For a description of the di_ppg function, including both function arguments and returned results, type ?di_ppg in the R console.
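To make the columns of the di_ppg output concrete, here is a base-R sketch that reproduces the overall table above from the group counts. The rules are inferred from that output, not taken from the package source. The MOE uses 0.50 as the proportion with a floor of 0.03. di_indicator is 1 when pct_hi falls below the reference. Both "successes needed" columns hold the reference and the MOE fixed. The name ppg_sketch is hypothetical.

```r
# Hypothetical base-R sketch of the PPG columns returned by di_ppg, assuming:
#   moe = max(min_moe, 1.96 * sqrt(0.25 / n)), DI when pct + moe < reference,
#   and "successes needed" counted with the reference and MOE held fixed.
ppg_sketch <- function(group, n, success, reference = sum(success) / sum(n),
                       min_moe = 0.03) {
  pct <- success / n
  moe <- pmax(min_moe, 1.96 * sqrt(0.25 / n))
  data.frame(
    group = group, n = n, success = success, pct = pct,
    reference = reference, moe = moe,
    pct_lo = pct - moe, pct_hi = pct + moe,
    di_indicator = as.integer(pct + moe < reference),
    # fewest extra successes so that pct_hi reaches the reference
    success_needed_not_di = pmax(0, ceiling((reference - moe) * n - success)),
    # fewest extra successes so that pct itself reaches the reference
    success_needed_full_parity = pmax(0, ceiling(reference * n - success))
  )
}

ppg <- ppg_sketch(
  group   = c("Asian", "Black", "Hispanic", "Multi-Ethnicity", "Native American", "White"),
  n       = c(6000, 2000, 4000, 1000, 200, 6800),
  success = c(4292, 607, 847, 504, 78, 4200)
)
print(ppg)
```

Because ceiling() is applied to products of floating-point rates, a group that sits exactly on a threshold may come out one higher or lower than in di_ppg.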

Sometimes one might want to break out the DI calculation by cohort. To do so, pass the cohort variable to cohort:

# Cohort
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333    0.5140         overall
## 2    2017           Black 1000     310 0.3100000    0.5140         overall
## 3    2017        Hispanic 2000     410 0.2050000    0.5140         overall
## 4    2017 Multi-Ethnicity  500     262 0.5240000    0.5140         overall
## 5    2017 Native American  100      43 0.4300000    0.5140         overall
## 6    2017           White 3400    2053 0.6038235    0.5140         overall
## 7    2018           Asian 3000    2230 0.7433333    0.5388         overall
## 8    2018           Black 1000     297 0.2970000    0.5388         overall
## 9    2018        Hispanic 2000     437 0.2185000    0.5388         overall
## 10   2018 Multi-Ethnicity  500     242 0.4840000    0.5388         overall
## 11   2018 Native American  100      35 0.3500000    0.5388         overall
## 12   2018           White 3400    2147 0.6314706    0.5388         overall
##           moe    pct_lo    pct_hi di_indicator success_needed_not_di
## 1  0.03000000 0.6573333 0.7173333            0                     0
## 2  0.03099032 0.2790097 0.3409903            1                   174
## 3  0.03000000 0.1750000 0.2350000            1                   558
## 4  0.04382693 0.4801731 0.5678269            0                     0
## 5  0.09800000 0.3320000 0.5280000            0                     0
## 6  0.03000000 0.5738235 0.6338235            0                     0
## 7  0.03000000 0.7133333 0.7733333            0                     0
## 8  0.03099032 0.2660097 0.3279903            1                   211
## 9  0.03000000 0.1885000 0.2485000            1                   581
## 10 0.04382693 0.4401731 0.5278269            1                     6
## 11 0.09800000 0.2520000 0.4480000            1                    10
## 12 0.03000000 0.6014706 0.6614706            0                     0
##    success_needed_full_parity
## 1                           0
## 2                         205
## 3                         619
## 4                           0
## 5                           9
## 6                           0
## 7                           0
## 8                         242
## 9                         641
## 10                         28
## 11                         19
## 12                          0

di_ppg also works on summarized data: pass the success counts to success and the group sizes to weight. For example, using the summarized data set dSumm, the success counts are reconstructed as Transfer_Rate*n and the group size n is passed to weight:

di_ppg(success=Transfer_Rate*n, group=Ethnicity, cohort=Cohort, weight=n, data=dSumm) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333    0.5140         overall
## 2    2017           Black 1000     310 0.3100000    0.5140         overall
## 3    2017        Hispanic 2000     410 0.2050000    0.5140         overall
## 4    2017 Multi-Ethnicity  500     262 0.5240000    0.5140         overall
## 5    2017 Native American  100      43 0.4300000    0.5140         overall
## 6    2017           White 3400    2053 0.6038235    0.5140         overall
## 7    2018           Asian 3000    2230 0.7433333    0.5388         overall
## 8    2018           Black 1000     297 0.2970000    0.5388         overall
## 9    2018        Hispanic 2000     437 0.2185000    0.5388         overall
## 10   2018 Multi-Ethnicity  500     242 0.4840000    0.5388         overall
## 11   2018 Native American  100      35 0.3500000    0.5388         overall
## 12   2018           White 3400    2147 0.6314706    0.5388         overall
##           moe    pct_lo    pct_hi di_indicator success_needed_not_di
## 1  0.03000000 0.6573333 0.7173333            0                     0
## 2  0.03099032 0.2790097 0.3409903            1                   174
## 3  0.03000000 0.1750000 0.2350000            1                   558
## 4  0.04382693 0.4801731 0.5678269            0                     0
## 5  0.09800000 0.3320000 0.5280000            0                     0
## 6  0.03000000 0.5738235 0.6338235            0                     0
## 7  0.03000000 0.7133333 0.7733333            0                     0
## 8  0.03099032 0.2660097 0.3279903            1                   211
## 9  0.03000000 0.1885000 0.2485000            1                   581
## 10 0.04382693 0.4401731 0.5278269            1                     6
## 11 0.09800000 0.2520000 0.4480000            1                    10
## 12 0.03000000 0.6014706 0.6614706            0                     0
##    success_needed_full_parity
## 1                           0
## 2                         205
## 3                         619
## 4                           0
## 5                           9
## 6                           0
## 7                           0
## 8                         242
## 9                         641
## 10                         28
## 11                         19
## 12                          0
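The summarized-data call gives the same results because the overall reference rate depends only on totals. Multiplying each group's rate by its size recovers the success counts, and each cohort's reference is total successes over total students (an n-weighted mean of the group rates). The base-R sketch below checks this with the per-cohort counts shown in the output above. The data frame summ is a hypothetical stand-in for dSumm.

```r
# Per-cohort group sizes and success counts, as in the output above
summ <- data.frame(
  Cohort    = rep(c(2017, 2018), each = 6),
  Ethnicity = rep(c("Asian", "Black", "Hispanic", "Multi-Ethnicity",
                    "Native American", "White"), times = 2),
  n         = rep(c(3000, 1000, 2000, 500, 100, 3400), times = 2),
  successes = c(2062, 310, 410, 262, 43, 2053,
                2230, 297, 437, 242, 35, 2147)
)
summ$Transfer_Rate <- summ$successes / summ$n

# success = Transfer_Rate * n recovers the counts (up to floating-point rounding)
recovered <- summ$Transfer_Rate * summ$n

# Overall reference per cohort: total successes over total group size
ref_by_cohort <- tapply(recovered, summ$Cohort, sum) /
  tapply(summ$n, summ$Cohort, sum)
print(ref_by_cohort)
```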

By default, di_ppg uses the overall success rate as the reference rate for comparison (reference='overall'). The reference argument also accepts the following values:

- 'hpg': the success rate of the highest performing group.
- 'all but current': the success rate of all other groups combined, excluding the comparison group.
- A specific value of group (e.g., 'White').

# Reference: Highest performing group
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference='hpg', data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.6873333           Asian
## 2    2017           Black 1000     310 0.3100000 0.6873333           Asian
## 3    2017        Hispanic 2000     410 0.2050000 0.6873333           Asian
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6873333           Asian
## 5    2017 Native American  100      43 0.4300000 0.6873333           Asian
## 6    2017           White 3400    2053 0.6038235 0.6873333           Asian
## 7    2018           Asian 3000    2230 0.7433333 0.7433333           Asian
## 8    2018           Black 1000     297 0.2970000 0.7433333           Asian
## 9    2018        Hispanic 2000     437 0.2185000 0.7433333           Asian
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.7433333           Asian
## 11   2018 Native American  100      35 0.3500000 0.7433333           Asian
## 12   2018           White 3400    2147 0.6314706 0.7433333           Asian
##           moe    pct_lo    pct_hi di_indicator success_needed_not_di
## 1  0.03000000 0.6573333 0.7173333            0                     0
## 2  0.03099032 0.2790097 0.3409903            1                   347
## 3  0.03000000 0.1750000 0.2350000            1                   905
## 4  0.04382693 0.4801731 0.5678269            1                    60
## 5  0.09800000 0.3320000 0.5280000            1                    16
## 6  0.03000000 0.5738235 0.6338235            1                   182
## 7  0.03000000 0.7133333 0.7733333            0                     0
## 8  0.03099032 0.2660097 0.3279903            1                   416
## 9  0.03000000 0.1885000 0.2485000            1                   990
## 10 0.04382693 0.4401731 0.5278269            1                   108
## 11 0.09800000 0.2520000 0.4480000            1                    30
## 12 0.03000000 0.6014706 0.6614706            1                   279
##    success_needed_full_parity
## 1                           0
## 2                         378
## 3                         965
## 4                          82
## 5                          26
## 6                         284
## 7                           0
## 8                         447
## 9                        1050
## 10                        130
## 11                         40
## 12                        381
# Reference: All but current (PPG minus 1)
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference='all but current', data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.4397143 all but current
## 2    2017           Black 1000     310 0.3100000 0.5366667 all but current
## 3    2017        Hispanic 2000     410 0.2050000 0.5912500 all but current
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.5134737 all but current
## 5    2017 Native American  100      43 0.4300000 0.5148485 all but current
## 6    2017           White 3400    2053 0.6038235 0.4677273 all but current
## 7    2018           Asian 3000    2230 0.7433333 0.4511429 all but current
## 8    2018           Black 1000     297 0.2970000 0.5656667 all but current
## 9    2018        Hispanic 2000     437 0.2185000 0.6188750 all but current
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.5416842 all but current
## 11   2018 Native American  100      35 0.3500000 0.5407071 all but current
## 12   2018           White 3400    2147 0.6314706 0.4910606 all but current
##           moe    pct_lo    pct_hi di_indicator success_needed_not_di
## 1  0.03000000 0.6573333 0.7173333            0                     0
## 2  0.03099032 0.2790097 0.3409903            1                   196
## 3  0.03000000 0.1750000 0.2350000            1                   713
## 4  0.04382693 0.4801731 0.5678269            0                     0
## 5  0.09800000 0.3320000 0.5280000            0                     0
## 6  0.03000000 0.5738235 0.6338235            0                     0
## 7  0.03000000 0.7133333 0.7733333            0                     0
## 8  0.03099032 0.2660097 0.3279903            1                   238
## 9  0.03000000 0.1885000 0.2485000            1                   741
## 10 0.04382693 0.4401731 0.5278269            1                     7
## 11 0.09800000 0.2520000 0.4480000            1                    10
## 12 0.03000000 0.6014706 0.6614706            0                     0
##    success_needed_full_parity
## 1                           0
## 2                         227
## 3                         773
## 4                           0
## 5                           9
## 6                           0
## 7                           0
## 8                         269
## 9                         801
## 10                         29
## 11                         20
## 12                          0
# Reference: custom group
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference='White', data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.6038235           White
## 2    2017           Black 1000     310 0.3100000 0.6038235           White
## 3    2017        Hispanic 2000     410 0.2050000 0.6038235           White
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6038235           White
## 5    2017 Native American  100      43 0.4300000 0.6038235           White
## 6    2017           White 3400    2053 0.6038235 0.6038235           White
## 7    2018           Asian 3000    2230 0.7433333 0.6314706           White
## 8    2018           Black 1000     297 0.2970000 0.6314706           White
## 9    2018        Hispanic 2000     437 0.2185000 0.6314706           White
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.6314706           White
## 11   2018 Native American  100      35 0.3500000 0.6314706           White
## 12   2018           White 3400    2147 0.6314706 0.6314706           White
##           moe    pct_lo    pct_hi di_indicator success_needed_not_di
## 1  0.03000000 0.6573333 0.7173333            0                     0
## 2  0.03099032 0.2790097 0.3409903            1                   263
## 3  0.03000000 0.1750000 0.2350000            1                   738
## 4  0.04382693 0.4801731 0.5678269            1                    18
## 5  0.09800000 0.3320000 0.5280000            1                     8
## 6  0.03000000 0.5738235 0.6338235            0                     0
## 7  0.03000000 0.7133333 0.7733333            0                     0
## 8  0.03099032 0.2660097 0.3279903            1                   304
## 9  0.03000000 0.1885000 0.2485000            1                   766
## 10 0.04382693 0.4401731 0.5278269            1                    52
## 11 0.09800000 0.2520000 0.4480000            1                    19
## 12 0.03000000 0.6014706 0.6614706            0                     0
##    success_needed_full_parity
## 1                           0
## 2                         294
## 3                         798
## 4                          40
## 5                          18
## 6                           0
## 7                           0
## 8                         335
## 9                         826
## 10                         74
## 11                         29
## 12                          0
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference='Asian', data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.6873333           Asian
## 2    2017           Black 1000     310 0.3100000 0.6873333           Asian
## 3    2017        Hispanic 2000     410 0.2050000 0.6873333           Asian
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6873333           Asian
## 5    2017 Native American  100      43 0.4300000 0.6873333           Asian
## 6    2017           White 3400    2053 0.6038235 0.6873333           Asian
## 7    2018           Asian 3000    2230 0.7433333 0.7433333           Asian
## 8    2018           Black 1000     297 0.2970000 0.7433333           Asian
## 9    2018        Hispanic 2000     437 0.2185000 0.7433333           Asian
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.7433333           Asian
## 11   2018 Native American  100      35 0.3500000 0.7433333           Asian
## 12   2018           White 3400    2147 0.6314706 0.7433333           Asian
##           moe    pct_lo    pct_hi di_indicator success_needed_not_di
## 1  0.03000000 0.6573333 0.7173333            0                     0
## 2  0.03099032 0.2790097 0.3409903            1                   347
## 3  0.03000000 0.1750000 0.2350000            1                   905
## 4  0.04382693 0.4801731 0.5678269            1                    60
## 5  0.09800000 0.3320000 0.5280000            1                    16
## 6  0.03000000 0.5738235 0.6338235            1                   182
## 7  0.03000000 0.7133333 0.7733333            0                     0
## 8  0.03099032 0.2660097 0.3279903            1                   416
## 9  0.03000000 0.1885000 0.2485000            1                   990
## 10 0.04382693 0.4401731 0.5278269            1                   108
## 11 0.09800000 0.2520000 0.4480000            1                    30
## 12 0.03000000 0.6014706 0.6614706            1                   279
##    success_needed_full_parity
## 1                           0
## 2                         378
## 3                         965
## 4                          82
## 5                          26
## 6                         284
## 7                           0
## 8                         447
## 9                        1050
## 10                        130
## 11                         40
## 12                        381
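The reference options can be computed by hand for a single cohort. This base-R sketch uses the 2017 counts from the outputs above and reproduces the reference column for each option. It illustrates the definitions given earlier and is not the package's own code.

```r
# 2017 cohort of the toy data (counts as reported by di_ppg)
group   <- c("Asian", "Black", "Hispanic", "Multi-Ethnicity", "Native American", "White")
n       <- c(3000, 1000, 2000, 500, 100, 3400)
success <- c(2062, 310, 410, 262, 43, 2053)
pct     <- success / n

ref_overall <- sum(success) / sum(n)                   # reference = 'overall'
ref_hpg     <- max(pct)                                # reference = 'hpg'
ref_abc     <- (sum(success) - success) / (sum(n) - n) # 'all but current', one per group
ref_white   <- pct[group == "White"]                   # reference = 'White'

data.frame(group, pct, ref_overall, ref_hpg, ref_abc, ref_white)
```

Unlike the other options, 'all but current' produces a different reference for every group, because each group is removed from its own comparison pool.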

The user can also pass in custom reference rates for comparison (e.g., a statewide rate). di_ppg accepts either a single reference rate or a vector of reference rates, one for each cohort. For a vector, the reference rates are matched to the values of the cohort variable in sorted (alphabetical) order.

# With custom reference (single)
di_ppg(success=Transfer, group=Ethnicity, reference=0.54, data=student_equity) %>%
  as.data.frame
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    4292 0.7153333      0.54         numeric 0.03000000
## 2           Black 2000     607 0.3035000      0.54         numeric 0.03000000
## 3        Hispanic 4000     847 0.2117500      0.54         numeric 0.03000000
## 4 Multi-Ethnicity 1000     504 0.5040000      0.54         numeric 0.03099032
## 5 Native American  200      78 0.3900000      0.54         numeric 0.06929646
## 6           White 6800    4200 0.6176471      0.54         numeric 0.03000000
##      pct_lo    pct_hi di_indicator success_needed_not_di
## 1 0.6853333 0.7453333            0                     0
## 2 0.2735000 0.3335000            1                   414
## 3 0.1817500 0.2417500            1                  1193
## 4 0.4730097 0.5349903            1                     6
## 5 0.3207035 0.4592965            1                    17
## 6 0.5876471 0.6476471            0                     0
##   success_needed_full_parity
## 1                          0
## 2                        474
## 3                       1314
## 4                         37
## 5                         31
## 6                          0
# With custom reference (multiple)
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference=c(0.5, 0.55), data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333      0.50         numeric
## 2    2017           Black 1000     310 0.3100000      0.50         numeric
## 3    2017        Hispanic 2000     410 0.2050000      0.50         numeric
## 4    2017 Multi-Ethnicity  500     262 0.5240000      0.50         numeric
## 5    2017 Native American  100      43 0.4300000      0.50         numeric
## 6    2017           White 3400    2053 0.6038235      0.50         numeric
## 7    2018           Asian 3000    2230 0.7433333      0.55         numeric
## 8    2018           Black 1000     297 0.2970000      0.55         numeric
## 9    2018        Hispanic 2000     437 0.2185000      0.55         numeric
## 10   2018 Multi-Ethnicity  500     242 0.4840000      0.55         numeric
## 11   2018 Native American  100      35 0.3500000      0.55         numeric
## 12   2018           White 3400    2147 0.6314706      0.55         numeric
##           moe    pct_lo    pct_hi di_indicator success_needed_not_di
## 1  0.03000000 0.6573333 0.7173333            0                     0
## 2  0.03099032 0.2790097 0.3409903            1                   160
## 3  0.03000000 0.1750000 0.2350000            1                   530
## 4  0.04382693 0.4801731 0.5678269            0                     0
## 5  0.09800000 0.3320000 0.5280000            0                     0
## 6  0.03000000 0.5738235 0.6338235            0                     0
## 7  0.03000000 0.7133333 0.7733333            0                     0
## 8  0.03099032 0.2660097 0.3279903            1                   223
## 9  0.03000000 0.1885000 0.2485000            1                   604
## 10 0.04382693 0.4401731 0.5278269            1                    12
## 11 0.09800000 0.2520000 0.4480000            1                    11
## 12 0.03000000 0.6014706 0.6614706            0                     0
##    success_needed_full_parity
## 1                           0
## 2                         190
## 3                         591
## 4                           0
## 5                           8
## 6                           0
## 7                           0
## 8                         254
## 9                         663
## 10                         34
## 11                         21
## 12                          0
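Matching a reference vector to cohorts amounts to a lookup against the sorted unique cohort values. The sketch below assumes, following the description above, that the i-th rate applies to the i-th cohort in sorted order. It is written in base R with a made-up cohort column.

```r
# Hypothetical sketch: map custom reference rates onto cohorts, assuming
# the i-th rate goes to the i-th cohort value in sorted order.
cohort    <- rep(c(2018, 2017), times = c(2, 3)) # cohort column, in any row order
ref_rates <- c(0.5, 0.55)                       # one rate per cohort

cohort_levels <- sort(unique(cohort))           # 2017, 2018
reference <- ref_rates[match(cohort, cohort_levels)]
data.frame(cohort, reference)
```

This lookup is why the call above with reference=c(0.5, 0.55) assigns 0.50 to 2017 and 0.55 to 2018, regardless of row order.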

Disproportionate impact under the PPG method relies on calculating the margin of error (MOE) around each group's success rate. The MOE calculated in di_ppg rests on two default assumptions:

  1. the minimum MOE returned is 0.03, and
  2. 0.50 is used as the proportion \(\hat{p}\) in the margin of error formula, \(1.96 \times \sqrt{\hat{p} (1-\hat{p}) / n}\).

To override 1, specify min_moe in di_ppg. To override 2 and use the observed success rate as \(\hat{p}\), specify use_prop_in_moe=TRUE in di_ppg.

# min_moe
di_ppg(success=Transfer, group=Ethnicity, data=student_equity, min_moe=0.02) %>%
  as.data.frame
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    4292 0.7153333    0.5264         overall 0.02000000
## 2           Black 2000     607 0.3035000    0.5264         overall 0.02191347
## 3        Hispanic 4000     847 0.2117500    0.5264         overall 0.02000000
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5264         overall 0.03099032
## 5 Native American  200      78 0.3900000    0.5264         overall 0.06929646
## 6           White 6800    4200 0.6176471    0.5264         overall 0.02000000
##      pct_lo    pct_hi di_indicator success_needed_not_di
## 1 0.6953333 0.7353333            0                     0
## 2 0.2815865 0.3254135            1                   402
## 3 0.1917500 0.2317500            1                  1179
## 4 0.4730097 0.5349903            0                     0
## 5 0.3207035 0.4592965            1                    14
## 6 0.5976471 0.6376471            0                     0
##   success_needed_full_parity
## 1                          0
## 2                        446
## 3                       1259
## 4                         23
## 5                         28
## 6                          0
# use_prop_in_moe
di_ppg(success=Transfer, group=Ethnicity, data=student_equity, min_moe=0.02, use_prop_in_moe=TRUE) %>%
  as.data.frame
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    4292 0.7153333    0.5264         overall 0.02000000
## 2           Black 2000     607 0.3035000    0.5264         overall 0.02015028
## 3        Hispanic 4000     847 0.2117500    0.5264         overall 0.02000000
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5264         overall 0.03098933
## 5 Native American  200      78 0.3900000    0.5264         overall 0.06759869
## 6           White 6800    4200 0.6176471    0.5264         overall 0.02000000
##      pct_lo    pct_hi di_indicator success_needed_not_di
## 1 0.6953333 0.7353333            0                     0
## 2 0.2833497 0.3236503            1                   406
## 3 0.1917500 0.2317500            1                  1179
## 4 0.4730107 0.5349893            0                     0
## 5 0.3224013 0.4575987            1                    14
## 6 0.5976471 0.6376471            0                     0
##   success_needed_full_parity
## 1                          0
## 2                        446
## 3                       1259
## 4                         23
## 5                         28
## 6                          0
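The moe columns above can be reproduced directly from the formula. The sketch below is a hypothetical base-R stand-in (moe_sketch) for the package's ppg_moe. It covers the two defaults and their overrides, using the overall counts from the toy data.

```r
# Hypothetical re-implementation (the package's own helper is ppg_moe):
# MOE = 1.96 * sqrt(p * (1 - p) / n), floored at min_moe, where p is 0.50
# unless the observed proportion is requested.
moe_sketch <- function(n, proportion, min_moe = 0.03, use_prop_in_moe = FALSE) {
  p <- if (use_prop_in_moe) proportion else 0.5
  pmax(min_moe, 1.96 * sqrt(p * (1 - p) / n))
}

n   <- c(6000, 2000, 4000, 1000, 200, 6800)
pct <- c(4292, 607, 847, 504, 78, 4200) / n

moe_default <- moe_sketch(n, pct)                     # both defaults
moe_min02   <- moe_sketch(n, pct, min_moe = 0.02)     # lower floor
moe_prop    <- moe_sketch(n, pct, min_moe = 0.02, use_prop_in_moe = TRUE)
round(cbind(n, moe_default, moe_min02, moe_prop), 8)
```

Since \(\hat{p}(1-\hat{p})\) peaks at 0.50, using the observed proportion can only shrink the MOE. The Black row shows this: the MOE drops from 0.0219 to 0.0202.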

When the observed proportion is used in calculating the MOE, a proportion of 0 or 1 leads to a zero MOE. To handle these cases, the user can supply substitute proportions through the prop_sub_0 and prop_sub_1 arguments of di_ppg and ppg_moe. Both default to 0.5, which maximizes the MOE (making it more difficult to declare disproportionate impact).

# Set Native American to have zero transfers and see the results
di_ppg(success=Transfer, group=Ethnicity, data=student_equity %>% mutate(Transfer=ifelse(Ethnicity=='Native American', 0, Transfer)), use_prop_in_moe=TRUE, prop_sub_0=0.1, prop_sub_1=0.9) %>%
  as.data.frame
## Warning in ppg_moe(n = n, proportion = pct, min_moe = min_moe, prop_sub_0 =
## prop_sub_0, : The vector `proportion` contains 0. This will lead to a zero MOE.
## `prop_sub_0=0.1` will be used in calculating the MOE for these cases.
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    4292 0.7153333    0.5225         overall 0.03000000
## 2           Black 2000     607 0.3035000    0.5225         overall 0.03000000
## 3        Hispanic 4000     847 0.2117500    0.5225         overall 0.03000000
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5225         overall 0.03098933
## 5 Native American  200       0 0.0000000    0.5225         overall 0.04157788
## 6           White 6800    4200 0.6176471    0.5225         overall 0.03000000
##        pct_lo     pct_hi di_indicator success_needed_not_di
## 1  0.68533333 0.74533333            0                     0
## 2  0.27350000 0.33350000            1                   378
## 3  0.18175000 0.24175000            1                  1123
## 4  0.47301067 0.53498933            0                     0
## 5 -0.04157788 0.04157788            1                    97
## 6  0.58764706 0.64764706            0                     0
##   success_needed_full_parity
## 1                          0
## 2                        438
## 3                       1243
## 4                         19
## 5                        105
## 6                          0

Proportionality index method

di_prop_index is the main function for this method. It accepts either vectors or, the tidy way, bare column names together with the data argument.
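One common base-R way for a single argument to accept either a vector or a bare column name is to capture the argument with substitute() and evaluate it inside the data frame with eval(). The sketch below is a hypothetical helper, count_success, written only to illustrate the idea; DisImpact may implement its tidy interface differently. It also shows why expressions such as !Transfer work as arguments.

```r
# Hypothetical helper (not part of DisImpact): sums successes by group and
# accepts either plain vectors or bare column names plus a `data` argument.
count_success <- function(success, group, data = NULL) {
  if (!is.null(data)) {
    # Evaluate the unevaluated argument expressions inside `data`,
    # falling back to the caller's environment for other objects
    success <- eval(substitute(success), data, parent.frame())
    group   <- eval(substitute(group), data, parent.frame())
  }
  tapply(success, group, sum)
}

toy <- data.frame(Transfer = c(1, 0, 1, 1), Ethnicity = c("A", "B", "A", "B"))
count_success(toy$Transfer, toy$Ethnicity)       # vector interface
count_success(Transfer, Ethnicity, data = toy)   # tidy-style interface
count_success(!Transfer, Ethnicity, data = toy)  # expressions also work
```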

# Without cohort
## Vector
di_prop_index(success=student_equity$Transfer, group=student_equity$Ethnicity) %>% as.data.frame
##             group    n success pct_success pct_group di_prop_index di_indicator
## 1           Asian 6000    4292 0.407674772      0.30     1.3589159            0
## 2           Black 2000     607 0.057655775      0.10     0.5765578            1
## 3        Hispanic 4000     847 0.080452128      0.20     0.4022606            1
## 4 Multi-Ethnicity 1000     504 0.047872340      0.05     0.9574468            0
## 5 Native American  200      78 0.007408815      0.01     0.7408815            1
## 6           White 6800    4200 0.398936170      0.34     1.1733417            0
##   success_needed_not_di success_needed_full_parity
## 1                     0                          0
## 2                   256                        496
## 3                   998                       1574
## 4                     0                         24
## 5                     7                         28
## 6                     0                          0
## Tidy and column reference
di_prop_index(success=Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame
##             group    n success pct_success pct_group di_prop_index di_indicator
## 1           Asian 6000    4292 0.407674772      0.30     1.3589159            0
## 2           Black 2000     607 0.057655775      0.10     0.5765578            1
## 3        Hispanic 4000     847 0.080452128      0.20     0.4022606            1
## 4 Multi-Ethnicity 1000     504 0.047872340      0.05     0.9574468            0
## 5 Native American  200      78 0.007408815      0.01     0.7408815            1
## 6           White 6800    4200 0.398936170      0.34     1.1733417            0
##   success_needed_not_di success_needed_full_parity
## 1                     0                          0
## 2                   256                        496
## 3                   998                       1574
## 4                     0                         24
## 5                     7                         28
## 6                     0                          0
# With cohort
## Vector
di_prop_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort) %>% as.data.frame
##    cohort           group    n success pct_success pct_group di_prop_index
## 1    2017           Asian 3000    2062 0.401167315      0.30     1.3372244
## 2    2017           Black 1000     310 0.060311284      0.10     0.6031128
## 3    2017        Hispanic 2000     410 0.079766537      0.20     0.3988327
## 4    2017 Multi-Ethnicity  500     262 0.050972763      0.05     1.0194553
## 5    2017 Native American  100      43 0.008365759      0.01     0.8365759
## 6    2017           White 3400    2053 0.399416342      0.34     1.1747539
## 7    2018           Asian 3000    2230 0.413882702      0.30     1.3796090
## 8    2018           Black 1000     297 0.055122494      0.10     0.5512249
## 9    2018        Hispanic 2000     437 0.081106162      0.20     0.4055308
## 10   2018 Multi-Ethnicity  500     242 0.044914625      0.05     0.8982925
## 11   2018 Native American  100      35 0.006495917      0.01     0.6495917
## 12   2018           White 3400    2147 0.398478099      0.34     1.1719944
##    di_indicator success_needed_not_di success_needed_full_parity
## 1             0                     0                          0
## 2             1                   111                        227
## 3             1                   491                        773
## 4             0                     0                          0
## 5             0                     0                          9
## 6             0                     0                          0
## 7             0                     0                          0
## 8             1                   146                        269
## 9             1                   507                        801
## 10            0                     0                         29
## 11            1                     9                         20
## 12            0                     0                          0
## Tidy and column reference
di_prop_index(success=Transfer, group=Ethnicity, cohort=Cohort, data=student_equity) %>%
  as.data.frame
##    cohort           group    n success pct_success pct_group di_prop_index
## 1    2017           Asian 3000    2062 0.401167315      0.30     1.3372244
## 2    2017           Black 1000     310 0.060311284      0.10     0.6031128
## 3    2017        Hispanic 2000     410 0.079766537      0.20     0.3988327
## 4    2017 Multi-Ethnicity  500     262 0.050972763      0.05     1.0194553
## 5    2017 Native American  100      43 0.008365759      0.01     0.8365759
## 6    2017           White 3400    2053 0.399416342      0.34     1.1747539
## 7    2018           Asian 3000    2230 0.413882702      0.30     1.3796090
## 8    2018           Black 1000     297 0.055122494      0.10     0.5512249
## 9    2018        Hispanic 2000     437 0.081106162      0.20     0.4055308
## 10   2018 Multi-Ethnicity  500     242 0.044914625      0.05     0.8982925
## 11   2018 Native American  100      35 0.006495917      0.01     0.6495917
## 12   2018           White 3400    2147 0.398478099      0.34     1.1719944
##    di_indicator success_needed_not_di success_needed_full_parity
## 1             0                     0                          0
## 2             1                   111                        227
## 3             1                   491                        773
## 4             0                     0                          0
## 5             0                     0                          9
## 6             0                     0                          0
## 7             0                     0                          0
## 8             1                   146                        269
## 9             1                   507                        801
## 10            0                     0                         29
## 11            1                     9                         20
## 12            0                     0                          0

For a description of the di_prop_index function, including both function arguments and returned results, type ?di_prop_index in the R console.

Note that the referenced document describing this method does not recommend a threshold on the proportionality index for declaring disproportionate impact. The di_prop_index function uses a default threshold of 0.8 (di_prop_index_cutoff=0.8), which the user can change through the di_prop_index_cutoff argument.
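In the outputs above, the proportionality index is each group's share of all successes (pct_success) divided by its share of the headcount (pct_group), and a group is flagged when the index falls below the cutoff. The sketch below recomputes these columns, plus success_needed_full_parity, from the counts in the no-cohort table. prop_index_sketch is a hypothetical function, and the parity formula is an assumption (adding x successes to a group also raises the total number of successes) that happens to reproduce the printed values.

```r
# Hypothetical sketch (not DisImpact's code) of the proportionality index.
prop_index_sketch <- function(success, n, cutoff = 0.8) {
  total_success <- sum(success)
  pct_success <- success / total_success  # share of all successes
  pct_group <- n / sum(n)                 # share of the headcount
  index <- pct_success / pct_group
  # Smallest x with (success + x) / (total_success + x) >= pct_group;
  # x enlarges the total too, hence the division by (1 - pct_group)
  full_parity <- pmax(0, ceiling((pct_group * total_success - success) / (1 - pct_group)))
  data.frame(pct_success, pct_group, di_prop_index = index,
             di_indicator = as.integer(index < cutoff),
             success_needed_full_parity = full_parity)
}

# Transfer counts by Ethnicity, copied from the no-cohort output above
success <- c(Asian = 4292, Black = 607, Hispanic = 847,
             `Multi-Ethnicity` = 504, `Native American` = 78, White = 4200)
n <- c(6000, 2000, 4000, 1000, 200, 6800)

res <- prop_index_sketch(success, n)
res
prop_index_sketch(success, n, cutoff = 0.5)  # lower threshold
```

With cutoff = 0.5, only the Hispanic group remains flagged in this no-cohort data.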

# Changing threshold for DI
di_prop_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort, di_prop_index_cutoff=0.5) %>% as.data.frame
##    cohort           group    n success pct_success pct_group di_prop_index
## 1    2017           Asian 3000    2062 0.401167315      0.30     1.3372244
## 2    2017           Black 1000     310 0.060311284      0.10     0.6031128
## 3    2017        Hispanic 2000     410 0.079766537      0.20     0.3988327
## 4    2017 Multi-Ethnicity  500     262 0.050972763      0.05     1.0194553
## 5    2017 Native American  100      43 0.008365759      0.01     0.8365759
## 6    2017           White 3400    2053 0.399416342      0.34     1.1747539
## 7    2018           Asian 3000    2230 0.413882702      0.30     1.3796090
## 8    2018           Black 1000     297 0.055122494      0.10     0.5512249
## 9    2018        Hispanic 2000     437 0.081106162      0.20     0.4055308
## 10   2018 Multi-Ethnicity  500     242 0.044914625      0.05     0.8982925
## 11   2018 Native American  100      35 0.006495917      0.01     0.6495917
## 12   2018           White 3400    2147 0.398478099      0.34     1.1719944
##    di_indicator success_needed_not_di success_needed_full_parity
## 1             0                     0                          0
## 2             0                     0                        227
## 3             1                   116                        773
## 4             0                     0                          0
## 5             0                     0                          9
## 6             0                     0                          0
## 7             0                     0                          0
## 8             0                     0                        269
## 9             1                   114                        801
## 10            0                     0                         29
## 11            0                     0                         20
## 12            0                     0                          0

80% index method

di_80_index is the main function for this method. It accepts either vectors or, the tidy way, bare column names together with the data argument:

# Without cohort
## Vector
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity) %>% as.data.frame
##             group    n success       pct reference reference_group di_80_index
## 1           Asian 6000    4292 0.7153333 0.7153333           Asian   1.0000000
## 2           Black 2000     607 0.3035000 0.7153333           Asian   0.4242777
## 3        Hispanic 4000     847 0.2117500 0.7153333           Asian   0.2960158
## 4 Multi-Ethnicity 1000     504 0.5040000 0.7153333           Asian   0.7045666
## 5 Native American  200      78 0.3900000 0.7153333           Asian   0.5452004
## 6           White 6800    4200 0.6176471 0.7153333           Asian   0.8634395
##   di_indicator success_needed_not_di success_needed_full_parity
## 1            0                     0                          0
## 2            1                   538                        824
## 3            1                  1443                       2015
## 4            1                    69                        212
## 5            1                    37                         66
## 6            0                     0                        665
## Tidy and column reference
di_80_index(success=Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame
##             group    n success       pct reference reference_group di_80_index
## 1           Asian 6000    4292 0.7153333 0.7153333           Asian   1.0000000
## 2           Black 2000     607 0.3035000 0.7153333           Asian   0.4242777
## 3        Hispanic 4000     847 0.2117500 0.7153333           Asian   0.2960158
## 4 Multi-Ethnicity 1000     504 0.5040000 0.7153333           Asian   0.7045666
## 5 Native American  200      78 0.3900000 0.7153333           Asian   0.5452004
## 6           White 6800    4200 0.6176471 0.7153333           Asian   0.8634395
##   di_indicator success_needed_not_di success_needed_full_parity
## 1            0                     0                          0
## 2            1                   538                        824
## 3            1                  1443                       2015
## 4            1                    69                        212
## 5            1                    37                         66
## 6            0                     0                        665
# With cohort
## Vector
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort) %>% as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.6873333           Asian
## 2    2017           Black 1000     310 0.3100000 0.6873333           Asian
## 3    2017        Hispanic 2000     410 0.2050000 0.6873333           Asian
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6873333           Asian
## 5    2017 Native American  100      43 0.4300000 0.6873333           Asian
## 6    2017           White 3400    2053 0.6038235 0.6873333           Asian
## 7    2018           Asian 3000    2230 0.7433333 0.7433333           Asian
## 8    2018           Black 1000     297 0.2970000 0.7433333           Asian
## 9    2018        Hispanic 2000     437 0.2185000 0.7433333           Asian
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.7433333           Asian
## 11   2018 Native American  100      35 0.3500000 0.7433333           Asian
## 12   2018           White 3400    2147 0.6314706 0.7433333           Asian
##    di_80_index di_indicator success_needed_not_di success_needed_full_parity
## 1    1.0000000            0                     0                          0
## 2    0.4510184            1                   240                        378
## 3    0.2982541            1                   690                        965
## 4    0.7623666            1                    13                         82
## 5    0.6256062            1                    12                         26
## 6    0.8785017            0                     0                        284
## 7    1.0000000            0                     0                          0
## 8    0.3995516            1                   298                        447
## 9    0.2939462            1                   753                       1050
## 10   0.6511211            1                    56                        130
## 11   0.4708520            1                    25                         40
## 12   0.8495120            0                     0                        381
## Tidy and column reference
di_80_index(success=Transfer, group=Ethnicity, cohort=Cohort, data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.6873333           Asian
## 2    2017           Black 1000     310 0.3100000 0.6873333           Asian
## 3    2017        Hispanic 2000     410 0.2050000 0.6873333           Asian
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6873333           Asian
## 5    2017 Native American  100      43 0.4300000 0.6873333           Asian
## 6    2017           White 3400    2053 0.6038235 0.6873333           Asian
## 7    2018           Asian 3000    2230 0.7433333 0.7433333           Asian
## 8    2018           Black 1000     297 0.2970000 0.7433333           Asian
## 9    2018        Hispanic 2000     437 0.2185000 0.7433333           Asian
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.7433333           Asian
## 11   2018 Native American  100      35 0.3500000 0.7433333           Asian
## 12   2018           White 3400    2147 0.6314706 0.7433333           Asian
##    di_80_index di_indicator success_needed_not_di success_needed_full_parity
## 1    1.0000000            0                     0                          0
## 2    0.4510184            1                   240                        378
## 3    0.2982541            1                   690                        965
## 4    0.7623666            1                    13                         82
## 5    0.6256062            1                    12                         26
## 6    0.8785017            0                     0                        284
## 7    1.0000000            0                     0                          0
## 8    0.3995516            1                   298                        447
## 9    0.2939462            1                   753                       1050
## 10   0.6511211            1                    56                        130
## 11   0.4708520            1                    25                         40
## 12   0.8495120            0                     0                        381

For a description of the di_80_index function, including both function arguments and returned results, type ?di_80_index in the R console.

By default, di_80_index uses the group with the highest success rate as the reference in calculating the index. One could specify the comparison group using the reference_group argument (one of the values of group).
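The 80% index divides each group's success rate by the reference group's rate. The sketch below (index80_sketch is a hypothetical name, not part of DisImpact) recomputes di_80_index, di_indicator, and the two success_needed columns from the no-cohort counts shown earlier in this section. The ceiling-based formulas for the success_needed columns are an assumption that reproduces the printed values.

```r
# Hypothetical sketch (not DisImpact's code) of the 80% index.
index80_sketch <- function(success, n, reference_group = NULL, cutoff = 0.8) {
  pct <- success / n
  # Default reference: the highest success rate; otherwise the named group's rate
  ref <- if (is.null(reference_group)) max(pct) else pct[[reference_group]]
  index <- pct / ref
  # Small tolerance guards against floating-point residue such as (4292 / 6000) * 6000
  needed <- function(x) pmax(0, ceiling(x - 1e-9))
  data.frame(
    pct = pct, reference = ref, di_80_index = index,
    di_indicator = as.integer(index < cutoff),
    success_needed_not_di = needed(cutoff * ref * n - success),
    success_needed_full_parity = needed(ref * n - success)
  )
}

# Transfer counts by Ethnicity, copied from the no-cohort output above
success <- c(Asian = 4292, Black = 607, Hispanic = 847,
             `Multi-Ethnicity` = 504, `Native American` = 78, White = 4200)
n <- c(6000, 2000, 4000, 1000, 200, 6800)

res <- index80_sketch(success, n)                   # reference = Asian (highest rate)
res
index80_sketch(success, n, reference_group = "White")  # White's rate as reference
```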

# Changing reference group
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort, reference_group='White') %>% as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.6038235           White
## 2    2017           Black 1000     310 0.3100000 0.6038235           White
## 3    2017        Hispanic 2000     410 0.2050000 0.6038235           White
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6038235           White
## 5    2017 Native American  100      43 0.4300000 0.6038235           White
## 6    2017           White 3400    2053 0.6038235 0.6038235           White
## 7    2018           Asian 3000    2230 0.7433333 0.6314706           White
## 8    2018           Black 1000     297 0.2970000 0.6314706           White
## 9    2018        Hispanic 2000     437 0.2185000 0.6314706           White
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.6314706           White
## 11   2018 Native American  100      35 0.3500000 0.6314706           White
## 12   2018           White 3400    2147 0.6314706 0.6314706           White
##    di_80_index di_indicator success_needed_not_di success_needed_full_parity
## 1    1.1383017            0                     0                          0
## 2    0.5133950            1                   174                        294
## 3    0.3395032            1                   557                        798
## 4    0.8678032            0                     0                         40
## 5    0.7121286            1                     6                         18
## 6    1.0000000            0                     0                          0
## 7    1.1771464            0                     0                          0
## 8    0.4703307            1                   209                        335
## 9    0.3460177            1                   574                        826
## 10   0.7664648            1                    11                         74
## 11   0.5542618            1                    16                         29
## 12   1.0000000            0                     0                          0

By default, di_80_index declares disproportionate impact when the index falls below 80% (di_80_index_cutoff=0.80). A different threshold can be set through the di_80_index_cutoff argument.
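As a quick check of a lowered threshold, the arithmetic below reproduces the 2018 rows of the di_80_index_cutoff=0.50 example: a group is flagged when its rate is below half of the Asian rate (0.7433333), and success_needed_not_di is the number of additional transfers needed to reach that bar. The formula is an assumption inferred from the printed values.

```r
# 2018 cohort Transfer counts by Ethnicity, copied from the outputs in this section
success <- c(Asian = 2230, Black = 297, Hispanic = 437,
             `Multi-Ethnicity` = 242, `Native American` = 35, White = 2147)
n <- c(3000, 1000, 2000, 500, 100, 3400)

ref <- max(success / n)          # highest rate (Asian), the default reference
cutoff <- 0.5
index <- (success / n) / ref
di <- as.integer(index < cutoff)
# Extra successes needed to lift each flagged group's rate to cutoff * ref
needed <- ifelse(di == 1, ceiling(cutoff * ref * n - success), 0)
data.frame(group = names(success), di_80_index = index, di, needed)
```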

# Changing threshold for DI
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort, di_80_index_cutoff=0.50) %>% as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.6873333           Asian
## 2    2017           Black 1000     310 0.3100000 0.6873333           Asian
## 3    2017        Hispanic 2000     410 0.2050000 0.6873333           Asian
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6873333           Asian
## 5    2017 Native American  100      43 0.4300000 0.6873333           Asian
## 6    2017           White 3400    2053 0.6038235 0.6873333           Asian
## 7    2018           Asian 3000    2230 0.7433333 0.7433333           Asian
## 8    2018           Black 1000     297 0.2970000 0.7433333           Asian
## 9    2018        Hispanic 2000     437 0.2185000 0.7433333           Asian
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.7433333           Asian
## 11   2018 Native American  100      35 0.3500000 0.7433333           Asian
## 12   2018           White 3400    2147 0.6314706 0.7433333           Asian
##    di_80_index di_indicator success_needed_not_di success_needed_full_parity
## 1    1.0000000            0                     0                          0
## 2    0.4510184            1                    34                        378
## 3    0.2982541            1                   278                        965
## 4    0.7623666            0                     0                         82
## 5    0.6256062            0                     0                         26
## 6    0.8785017            0                     0                        284
## 7    1.0000000            0                     0                          0
## 8    0.3995516            1                    75                        447
## 9    0.2939462            1                   307                       1050
## 10   0.6511211            0                     0                        130
## 11   0.4708520            1                     3                         40
## 12   0.8495120            0                     0                        381

When dealing with a non-success variable like drop-out or probation

All methods and functions implemented in the DisImpact package treat outcomes as positive: 1 is desired over 0 (a higher rate is better, and a lower rate indicates disparity). The argument name success was chosen deliberately as a reminder of this.

Suppose we have a variable that flags something undesirable (e.g., students on academic probation). We can calculate DI on its complement by applying the ! (logical negation) operator.
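In base R, ! maps 1 to FALSE and 0 to TRUE (NA stays NA), so summing the negated flag counts the non-successes. That is why the output below reports, for example, 6000 - 4292 = 1708 for Asian students and an overall reference of 1 - 0.5264 = 0.4736. A made-up vector illustrates the behavior:

```r
# Toy 0/1 flag (made-up values), including a missing value
transfer <- c(1, 0, 0, 1, NA)
!transfer                      # FALSE TRUE TRUE FALSE NA
as.integer(!transfer)          # 0 1 1 0 NA
sum(!transfer, na.rm = TRUE)   # 2 non-successes
```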

# di_ppg(success=!Probation, group=Ethnicity, data=student_equity) %>%
#   as.data.frame # If there were a Probation variable
di_ppg(success=!Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame ## Illustrating the point with `!`
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    1708 0.2846667    0.4736         overall 0.03000000
## 2           Black 2000    1393 0.6965000    0.4736         overall 0.03000000
## 3        Hispanic 4000    3153 0.7882500    0.4736         overall 0.03000000
## 4 Multi-Ethnicity 1000     496 0.4960000    0.4736         overall 0.03099032
## 5 Native American  200     122 0.6100000    0.4736         overall 0.06929646
## 6           White 6800    2600 0.3823529    0.4736         overall 0.03000000
##      pct_lo    pct_hi di_indicator success_needed_not_di
## 1 0.2546667 0.3146667            1                   954
## 2 0.6665000 0.7265000            0                     0
## 3 0.7582500 0.8182500            0                     0
## 4 0.4650097 0.5269903            0                     0
## 5 0.5407035 0.6792965            0                     0
## 6 0.3523529 0.4123529            1                   417
##   success_needed_full_parity
## 1                       1134
## 2                          0
## 3                          0
## 4                          0
## 5                          0
## 6                        621

Transformations on the fly

The success, group, and cohort arguments can also be expressions computed on the fly, without first adding new columns to the data.
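The two transformations used below rely on standard base-R behavior, shown here on made-up vectors: pmax() takes the element-wise maximum, so with 0/1 flags it returns 1 when either flag is 1, and ifelse() combined with %in% relabels selected categories while keeping the rest.

```r
# Made-up 0/1 flags: pmax() acts as an element-wise "either is 1"
transfer <- c(0, 1, 0, 1)
extra    <- c(1, 0, 0, 1)
pmax(transfer, extra)   # 1 1 0 1

# Made-up group labels: collapse two categories into one
ethnicity <- c("Asian", "Black", "Hispanic", "White")
ifelse(ethnicity %in% c("Black", "Hispanic"), "Black/Hispanic", ethnicity)
```

If the group column were a factor rather than character, wrapping it in as.character() before ifelse() would avoid getting the factor's integer codes back.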

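# Transform success: combine Transfer with a random flag (about 5% ones); no seed is set, so results will differ from those shown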
a <- sample(0:1, size=nrow(student_equity), replace=TRUE, prob=c(0.95, 0.05))
mean(a)
## [1] 0.05065
di_ppg(success=pmax(Transfer, a), group=Ethnicity, data=student_equity) %>%
  as.data.frame
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    4379 0.7298333    0.5504         overall 0.03000000
## 2           Black 2000     683 0.3415000    0.5504         overall 0.03000000
## 3        Hispanic 4000    1002 0.2505000    0.5504         overall 0.03000000
## 4 Multi-Ethnicity 1000     533 0.5330000    0.5504         overall 0.03099032
## 5 Native American  200      86 0.4300000    0.5504         overall 0.06929646
## 6           White 6800    4325 0.6360294    0.5504         overall 0.03000000
##      pct_lo    pct_hi di_indicator success_needed_not_di
## 1 0.6998333 0.7598333            0                     0
## 2 0.3115000 0.3715000            1                   358
## 3 0.2205000 0.2805000            1                  1080
## 4 0.5020097 0.5639903            0                     0
## 5 0.3607035 0.4992965            1                    11
## 6 0.6060294 0.6660294            0                     0
##   success_needed_full_parity
## 1                          0
## 2                        418
## 3                       1200
## 4                         18
## 5                         25
## 6                          0
# Collapse Black and Hispanic
di_ppg(success=Transfer, group=ifelse(Ethnicity %in% c('Black', 'Hispanic'), 'Black/Hispanic', Ethnicity), data=student_equity) %>% as.data.frame
##             group    n success       pct reference reference_group        moe
## 1           Asian 6000    4292 0.7153333    0.5264         overall 0.03000000
## 2  Black/Hispanic 6000    1454 0.2423333    0.5264         overall 0.03000000
## 3 Multi-Ethnicity 1000     504 0.5040000    0.5264         overall 0.03099032
## 4 Native American  200      78 0.3900000    0.5264         overall 0.06929646
## 5           White 6800    4200 0.6176471    0.5264         overall 0.03000000
##      pct_lo    pct_hi di_indicator success_needed_not_di
## 1 0.6853333 0.7453333            0                     0
## 2 0.2123333 0.2723333            1                  1525
## 3 0.4730097 0.5349903            0                     0
## 4 0.3207035 0.4592965            1                    14
## 5 0.5876471 0.6476471            0                     0
##   success_needed_full_parity
## 1                          0
## 2                       1705
## 3                         23
## 4                         28
## 5                          0

Calculate DI for many variables and groups

Users often need to calculate disproportionate impact across many outcome variables and many disaggregation (group) variables. The di_iterate function takes a data set and the names of the variables to iterate across.
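Conceptually, di_iterate loops over every combination of success variable, disaggregation variable, and cohort, and also reports a "- None" disaggregation whose single "- All" group is the whole cohort. The rough base-R mental model below (iterate_rates and the toy data are hypothetical) computes only the success rate for each combination, not the PPG, proportionality index, or 80% index columns that di_iterate returns.

```r
# Hypothetical toy data: column names mirror student_equity, values are made up
toy <- data.frame(
  Transfer  = c(1, 0, 1, 1, 0, 0),
  Ethnicity = c("Asian", "Black", "Asian", "White", "Black", "White"),
  Gender    = c("Female", "Male", "Male", "Female", "Female", "Male"),
  Cohort    = c(2017, 2017, 2017, 2018, 2018, 2018)
)

# Hypothetical helper (not di_iterate's code): success rate for every
# success variable x disaggregation x cohort x group combination
iterate_rates <- function(data, success_vars, group_vars, cohort_var) {
  combos <- expand.grid(success_variable = success_vars,
                        disaggregation = c("- None", group_vars),
                        stringsAsFactors = FALSE)
  pieces <- lapply(seq_len(nrow(combos)), function(i) {
    s <- combos$success_variable[i]
    g <- combos$disaggregation[i]
    # "- None" pools everyone in a cohort into a single "- All" group
    grp <- if (g == "- None") rep("- All", nrow(data)) else data[[g]]
    res <- aggregate(list(pct = data[[s]]),
                     by = list(cohort = data[[cohort_var]], group = grp),
                     FUN = mean)
    data.frame(success_variable = s, disaggregation = g, res)
  })
  do.call(rbind, pieces)
}

rates <- iterate_rates(toy, success_vars = "Transfer",
                       group_vars = c("Ethnicity", "Gender"),
                       cohort_var = "Cohort")
rates
```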

# Multiple group variables
di_iterate(data=student_equity, success_vars=c('Transfer'), group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort'), ppg_reference_groups='overall') %>% as.data.frame
##    success_variable cohort_variable cohort disaggregation           group     n
## 1          Transfer          Cohort   2017         - None           - All 10000
## 2          Transfer          Cohort   2017      Ethnicity           Asian  3000
## 3          Transfer          Cohort   2017      Ethnicity           Black  1000
## 4          Transfer          Cohort   2017      Ethnicity        Hispanic  2000
## 5          Transfer          Cohort   2017      Ethnicity Multi-Ethnicity   500
## 6          Transfer          Cohort   2017      Ethnicity Native American   100
## 7          Transfer          Cohort   2017      Ethnicity           White  3400
## 8          Transfer          Cohort   2017         Gender          Female  4930
## 9          Transfer          Cohort   2017         Gender            Male  4886
## 10         Transfer          Cohort   2017         Gender           Other   184
## 11         Transfer          Cohort   2018         - None           - All 10000
## 12         Transfer          Cohort   2018      Ethnicity           Asian  3000
## 13         Transfer          Cohort   2018      Ethnicity           Black  1000
## 14         Transfer          Cohort   2018      Ethnicity        Hispanic  2000
## 15         Transfer          Cohort   2018      Ethnicity Multi-Ethnicity   500
## 16         Transfer          Cohort   2018      Ethnicity Native American   100
## 17         Transfer          Cohort   2018      Ethnicity           White  3400
## 18         Transfer          Cohort   2018         Gender          Female  4928
## 19         Transfer          Cohort   2018         Gender            Male  4880
## 20         Transfer          Cohort   2018         Gender           Other   192
##    success       pct ppg_reference ppg_reference_group        moe    pct_lo
## 1     5140 0.5140000        0.5140             overall 0.03000000 0.4840000
## 2     2062 0.6873333        0.5140             overall 0.03000000 0.6573333
## 3      310 0.3100000        0.5140             overall 0.03099032 0.2790097
## 4      410 0.2050000        0.5140             overall 0.03000000 0.1750000
## 5      262 0.5240000        0.5140             overall 0.04382693 0.4801731
## 6       43 0.4300000        0.5140             overall 0.09800000 0.3320000
## 7     2053 0.6038235        0.5140             overall 0.03000000 0.5738235
## 8     2513 0.5097363        0.5140             overall 0.03000000 0.4797363
## 9     2548 0.5214900        0.5140             overall 0.03000000 0.4914900
## 10      79 0.4293478        0.5140             overall 0.07224656 0.3571013
## 11    5388 0.5388000        0.5388             overall 0.03000000 0.5088000
## 12    2230 0.7433333        0.5388             overall 0.03000000 0.7133333
## 13     297 0.2970000        0.5388             overall 0.03099032 0.2660097
## 14     437 0.2185000        0.5388             overall 0.03000000 0.1885000
## 15     242 0.4840000        0.5388             overall 0.04382693 0.4401731
## 16      35 0.3500000        0.5388             overall 0.09800000 0.2520000
## 17    2147 0.6314706        0.5388             overall 0.03000000 0.6014706
## 18    2638 0.5353084        0.5388             overall 0.03000000 0.5053084
## 19    2642 0.5413934        0.5388             overall 0.03000000 0.5113934
## 20     108 0.5625000        0.5388             overall 0.07072541 0.4917746
##       pct_hi di_indicator_ppg success_needed_not_di_ppg
## 1  0.5440000                0                         0
## 2  0.7173333                0                         0
## 3  0.3409903                1                       174
## 4  0.2350000                1                       558
## 5  0.5678269                0                         0
## 6  0.5280000                0                         0
## 7  0.6338235                0                         0
## 8  0.5397363                0                         0
## 9  0.5514900                0                         0
## 10 0.5015944                1                         3
## 11 0.5688000                0                         0
## 12 0.7733333                0                         0
## 13 0.3279903                1                       211
## 14 0.2485000                1                       581
## 15 0.5278269                1                         6
## 16 0.4480000                1                        10
## 17 0.6614706                0                         0
## 18 0.5653084                0                         0
## 19 0.5713934                0                         0
## 20 0.6332254                0                         0
##    success_needed_full_parity_ppg di_prop_index di_indicator_prop_index
## 1                               0     1.0000000                       0
## 2                               0     1.3372244                       0
## 3                             205     0.6031128                       1
## 4                             619     0.3988327                       1
## 5                               0     1.0194553                       0
## 6                               9     0.8365759                       0
## 7                               0     1.1747539                       0
## 8                              22     0.9917049                       0
## 9                               0     1.0145719                       0
## 10                             16     0.8353071                       0
## 11                              0     1.0000000                       0
## 12                              0     1.3796090                       0
## 13                            242     0.5512249                       1
## 14                            641     0.4055308                       1
## 15                             28     0.8982925                       0
## 16                             19     0.6495917                       1
## 17                              0     1.1719944                       0
## 18                             18     0.9935198                       0
## 19                              0     1.0048134                       0
## 20                              0     1.0439866                       0
##    success_needed_not_di_prop_index success_needed_full_parity_prop_index
## 1                                 0                                     0
## 2                                 0                                     0
## 3                               111                                   227
## 4                               491                                   773
## 5                                 0                                     0
## 6                                 0                                     9
## 7                                 0                                     0
## 8                                 0                                    42
## 9                                 0                                     0
## 10                                0                                    16
## 11                                0                                     0
## 12                                0                                     0
## 13                              146                                   269
## 14                              507                                   801
## 15                                0                                    29
## 16                                9                                    20
## 17                                0                                     0
## 18                                0                                    34
## 19                                0                                     0
## 20                                0                                     0
##    di_80_index_reference_group di_80_index di_indicator_80_index
## 1                        - All   1.0000000                     0
## 2                        Asian   1.0000000                     0
## 3                        Asian   0.4510184                     1
## 4                        Asian   0.2982541                     1
## 5                        Asian   0.7623666                     1
## 6                        Asian   0.6256062                     1
## 7                        Asian   0.8785017                     0
## 8                         Male   0.9774614                     0
## 9                         Male   1.0000000                     0
## 10                        Male   0.8233098                     0
## 11                       - All   1.0000000                     0
## 12                       Asian   1.0000000                     0
## 13                       Asian   0.3995516                     1
## 14                       Asian   0.2939462                     1
## 15                       Asian   0.6511211                     1
## 16                       Asian   0.4708520                     1
## 17                       Asian   0.8495120                     0
## 18                       Other   0.9516595                     0
## 19                       Other   0.9624772                     0
## 20                       Other   1.0000000                     0
##    success_needed_not_di_80_index success_needed_full_parity_80_index
## 1                               0                                   0
## 2                               0                                   0
## 3                             240                                 378
## 4                             690                                 965
## 5                              13                                  82
## 6                              12                                  26
## 7                               0                                 284
## 8                               0                                  58
## 9                               0                                   0
## 10                              0                                  17
## 11                              0                                   0
## 12                              0                                   0
## 13                            298                                 447
## 14                            753                                1050
## 15                             56                                 130
## 16                             25                                  40
## 17                              0                                 381
## 18                              0                                 134
## 19                              0                                 103
## 20                              0                                   0
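The `moe`, `pct_lo`, `pct_hi` and `di_indicator_ppg` columns above can be reproduced by hand. The sketch below uses the 2017 counts (rows 1, 3, 4, 6 and 10, with n and success as printed for the 2017 cohort in this vignette). It assumes the margin of error is `max(0.03, 1.96 * sqrt(0.5 * 0.5 / n))` and that a group is flagged when `pct_hi` is strictly below the reference rate. These assumptions are consistent with the printed values but are not taken from the package source. `ppg_check` is a hypothetical helper, not a DisImpact function.

```r
# Hypothetical helper, not part of DisImpact: recomputes the PPG columns under
# the assumption moe = max(min_moe, 1.96 * sqrt(0.5 * 0.5 / n)).
ppg_check <- function(n, success, reference, min_moe = 0.03) {
  pct <- success / n
  # 95% margin of error using the most conservative proportion (0.5), floored at min_moe
  moe <- pmax(min_moe, 1.96 * sqrt(0.5 * (1 - 0.5) / n))
  pct_hi <- pct + moe
  data.frame(
    n = n,
    success = success,
    pct = pct,
    moe = moe,
    pct_lo = pct - moe,
    pct_hi = pct_hi,
    # Assumed rule: DI when even the upper bound is below the reference rate
    di_indicator_ppg = as.numeric(pct_hi < reference)
  )
}

# 2017 rows 1, 3, 4, 6 and 10; reference = overall 2017 transfer rate (0.514)
chk <- ppg_check(
  n = c(10000, 1000, 2000, 100, 184),
  success = c(5140, 310, 410, 43, 79),
  reference = 0.514
)
print(chk)
```

The Black row (n = 1000) ends up with a margin just above the 0.03 floor, while the much smaller Other gender group (n = 184) gets a margin of about 0.072. That wider margin is why a group with few students needs a larger gap before it is flagged.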
# Multiple group variables and different reference groups

bind_rows(
  # PPG using the overall rate as the reference
  di_iterate(data=student_equity, success_vars=c('Transfer'), group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort'), ppg_reference_groups='overall')
  # PPG using White (Ethnicity) and Male (Gender) as references; include_non_disagg_results=FALSE
  # drops the non-disaggregated ('- All') rows, which the first run already includes
  , di_iterate(data=student_equity, success_vars=c('Transfer'), group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort'), ppg_reference_groups=c('White', 'Male'), include_non_disagg_results=FALSE)
)
## # A tibble: 38 x 25
##    success_variable cohort_variable cohort disaggregation group        n success
##    <chr>            <chr>            <int> <chr>          <chr>    <dbl>   <int>
##  1 Transfer         Cohort            2017 - None         - All    10000    5140
##  2 Transfer         Cohort            2017 Ethnicity      Asian     3000    2062
##  3 Transfer         Cohort            2017 Ethnicity      Black     1000     310
##  4 Transfer         Cohort            2017 Ethnicity      Hispanic  2000     410
##  5 Transfer         Cohort            2017 Ethnicity      Multi-E~   500     262
##  6 Transfer         Cohort            2017 Ethnicity      Native ~   100      43
##  7 Transfer         Cohort            2017 Ethnicity      White     3400    2053
##  8 Transfer         Cohort            2017 Gender         Female    4930    2513
##  9 Transfer         Cohort            2017 Gender         Male      4886    2548
## 10 Transfer         Cohort            2017 Gender         Other      184      79
## # ... with 28 more rows, and 18 more variables: pct <dbl>, ppg_reference <dbl>,
## #   ppg_reference_group <chr>, moe <dbl>, pct_lo <dbl>, pct_hi <dbl>,
## #   di_indicator_ppg <dbl>, success_needed_not_di_ppg <dbl>,
## #   success_needed_full_parity_ppg <dbl>, di_prop_index <dbl>,
## #   di_indicator_prop_index <dbl>, success_needed_not_di_prop_index <dbl>,
## #   success_needed_full_parity_prop_index <dbl>,
## #   di_80_index_reference_group <chr>, di_80_index <dbl>, ...
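The two index methods can also be checked by hand using the 2017 Ethnicity counts printed in the tibble above (rows 2 to 7, in the order shown). The sketch below computes the proportionality index as a group's share of successes divided by its share of the cohort. It computes the 80% index as a group's rate divided by the highest group rate. Both flags assume a cutoff of 0.8. This reproduces the `di_prop_index` and `di_80_index` values printed earlier for these rows, but the cutoffs are assumptions and the code is not DisImpact's implementation.

```r
# 2017 Ethnicity rows, in the order printed above (rows 2 to 7)
n       <- c(3000, 1000, 2000, 500, 100, 3400)
success <- c(2062, 310, 410, 262, 43, 2053)
pct     <- success / n

# Proportionality index: share of all successes / share of the cohort
prop_index <- (success / sum(success)) / (n / sum(n))

# 80% index: rate relative to the highest-performing group's rate
index_80 <- pct / max(pct)

# Assumed cutoff of 0.8 for both methods
di <- data.frame(
  n, success, pct,
  prop_index, di_prop = as.numeric(prop_index < 0.8),
  index_80,   di_80   = as.numeric(index_80 < 0.8)
)
print(di)
```

The two methods can disagree. The 80% index flags two groups (the fourth and fifth rows, with rates 0.524 and 0.43) that the proportionality index does not, because it compares each group against the best-performing group rather than the overall rate.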

A separate vignette explains how to use di_iterate to rapidly develop and deploy dashboards with disaggregation and disproportionate impact features.

Appendix: R and R Package Versions

This vignette was generated in an R session with the following packages. Readers who replicate the code with different R or package versions may see some discrepancies.

sessionInfo()
## R version 4.0.2 (2020-06-22)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19044)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=C                          
## [2] LC_CTYPE=English_United States.1252   
## [3] LC_MONETARY=English_United States.1252
## [4] LC_NUMERIC=C                          
## [5] LC_TIME=English_United States.1252    
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
## [1] forcats_0.5.0    scales_1.1.1     ggplot2_3.3.2    stringr_1.4.0   
## [5] knitr_1.39       dplyr_1.0.8      DisImpact_0.0.21
## 
## loaded via a namespace (and not attached):
##  [1] Rcpp_1.0.8.3      highr_0.9         pillar_1.7.0      bslib_0.3.1      
##  [5] compiler_4.0.2    jquerylib_0.1.4   sets_1.0-21       prettydoc_0.4.1  
##  [9] tools_4.0.2       digest_0.6.25     gtable_0.3.0      jsonlite_1.7.0   
## [13] evaluate_0.15     lifecycle_1.0.1   tibble_3.1.6      fstcore_0.9.12   
## [17] pkgconfig_2.0.3   rlang_1.0.1       DBI_1.1.0         cli_3.2.0        
## [21] parallel_4.0.2    yaml_2.3.5        xfun_0.30         fastmap_1.1.0    
## [25] withr_2.5.0       duckdb_0.5.0      generics_0.1.2    vctrs_0.3.8      
## [29] sass_0.4.1        grid_4.0.2        tidyselect_1.1.2  data.table_1.14.3
## [33] glue_1.6.1        R6_2.3.0          fansi_1.0.2       rmarkdown_2.14   
## [37] farver_2.0.3      tidyr_1.2.0       purrr_0.3.4       blob_1.2.1       
## [41] magrittr_2.0.2    htmltools_0.5.2   ellipsis_0.3.2    fst_0.9.8        
## [45] assertthat_0.2.1  colorspace_1.4-1  collapse_1.8.8    labeling_0.3     
## [49] utf8_1.2.2        stringi_1.4.6     munsell_0.5.0     crayon_1.5.0
