Introduction

The DisImpact R package contains functions that help in determining disproportionate impact (DI) based on the following methodologies:

  1. percentage point gap (PPG) method,
  2. proportionality index method (method #1 in reference), and
  3. 80% index method (method #2 in reference).

Install Package

# From CRAN (Official)
## install.packages('DisImpact')

# From github (Development)
## devtools::install_github('vinhdizzo/DisImpact')

Load Packages

library(DisImpact)
library(dplyr) # Ease in manipulations with data frames

Load toy student equity data

To illustrate the functionality of the package, let's load a toy data set:

# Load fake data set
data(student_equity)

The toy data set can be summarized as follows:

# Summarize toy data
dim(student_equity)
## [1] 20000    11
dSumm <- student_equity %>%
  group_by(Cohort, Ethnicity) %>%
  summarize(n=n(), Transfer_Rate=mean(Transfer))
dSumm ## This is a summarized version of the data set
## # A tibble: 12 x 4
## # Groups:   Cohort [2]
##    Cohort Ethnicity           n Transfer_Rate
##     <int> <chr>           <int>         <dbl>
##  1   2017 Asian            3000         0.687
##  2   2017 Black            1000         0.31 
##  3   2017 Hispanic         2000         0.205
##  4   2017 Multi-Ethnicity   500         0.524
##  5   2017 Native American   100         0.43 
##  6   2017 White            3400         0.604
##  7   2018 Asian            3000         0.743
##  8   2018 Black            1000         0.297
##  9   2018 Hispanic         2000         0.218
## 10   2018 Multi-Ethnicity   500         0.484
## 11   2018 Native American   100         0.35 
## 12   2018 White            3400         0.631

Percentage point gap (PPG) method

di_ppg is the main work function for this method. It accepts either vectors or, when the data argument is supplied, unquoted column names (the tidy way):

# Vector
di_ppg(success=student_equity$Transfer, group=student_equity$Ethnicity) %>% as.data.frame
##             group    n success       pct reference reference_group
## 1           Asian 6000    4292 0.7153333    0.5264         overall
## 2           Black 2000     607 0.3035000    0.5264         overall
## 3        Hispanic 4000     847 0.2117500    0.5264         overall
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5264         overall
## 5 Native American  200      78 0.3900000    0.5264         overall
## 6           White 6800    4200 0.6176471    0.5264         overall
##          moe    pct_lo    pct_hi di_indicator
## 1 0.03000000 0.6853333 0.7453333            0
## 2 0.03000000 0.2735000 0.3335000            1
## 3 0.03000000 0.1817500 0.2417500            1
## 4 0.03099032 0.4730097 0.5349903            0
## 5 0.06929646 0.3207035 0.4592965            1
## 6 0.03000000 0.5876471 0.6476471            0
# Tidy and column reference
di_ppg(success=Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame
##             group    n success       pct reference reference_group
## 1           Asian 6000    4292 0.7153333    0.5264         overall
## 2           Black 2000     607 0.3035000    0.5264         overall
## 3        Hispanic 4000     847 0.2117500    0.5264         overall
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5264         overall
## 5 Native American  200      78 0.3900000    0.5264         overall
## 6           White 6800    4200 0.6176471    0.5264         overall
##          moe    pct_lo    pct_hi di_indicator
## 1 0.03000000 0.6853333 0.7453333            0
## 2 0.03000000 0.2735000 0.3335000            1
## 3 0.03000000 0.1817500 0.2417500            1
## 4 0.03099032 0.4730097 0.5349903            0
## 5 0.06929646 0.3207035 0.4592965            1
## 6 0.03000000 0.5876471 0.6476471            0

Sometimes, one might want to break out the DI calculation by cohort:

# Cohort
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333    0.5140         overall
## 2    2017           Black 1000     310 0.3100000    0.5140         overall
## 3    2017        Hispanic 2000     410 0.2050000    0.5140         overall
## 4    2017 Multi-Ethnicity  500     262 0.5240000    0.5140         overall
## 5    2017 Native American  100      43 0.4300000    0.5140         overall
## 6    2017           White 3400    2053 0.6038235    0.5140         overall
## 7    2018           Asian 3000    2230 0.7433333    0.5388         overall
## 8    2018           Black 1000     297 0.2970000    0.5388         overall
## 9    2018        Hispanic 2000     437 0.2185000    0.5388         overall
## 10   2018 Multi-Ethnicity  500     242 0.4840000    0.5388         overall
## 11   2018 Native American  100      35 0.3500000    0.5388         overall
## 12   2018           White 3400    2147 0.6314706    0.5388         overall
##           moe    pct_lo    pct_hi di_indicator
## 1  0.03000000 0.6573333 0.7173333            0
## 2  0.03099032 0.2790097 0.3409903            1
## 3  0.03000000 0.1750000 0.2350000            1
## 4  0.04382693 0.4801731 0.5678269            0
## 5  0.09800000 0.3320000 0.5280000            0
## 6  0.03000000 0.5738235 0.6338235            0
## 7  0.03000000 0.7133333 0.7733333            0
## 8  0.03099032 0.2660097 0.3279903            1
## 9  0.03000000 0.1885000 0.2485000            1
## 10 0.04382693 0.4401731 0.5278269            1
## 11 0.09800000 0.2520000 0.4480000            1
## 12 0.03000000 0.6014706 0.6614706            0

di_ppg is also applicable to summarized data: pass the success counts to success and the group sizes to weight. For example, the following uses the summarized data set, dSumm, and its sample size column, n:

di_ppg(success=Transfer_Rate*n, group=Ethnicity, cohort=Cohort, weight=n, data=dSumm) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333    0.5140         overall
## 2    2017           Black 1000     310 0.3100000    0.5140         overall
## 3    2017        Hispanic 2000     410 0.2050000    0.5140         overall
## 4    2017 Multi-Ethnicity  500     262 0.5240000    0.5140         overall
## 5    2017 Native American  100      43 0.4300000    0.5140         overall
## 6    2017           White 3400    2053 0.6038235    0.5140         overall
## 7    2018           Asian 3000    2230 0.7433333    0.5388         overall
## 8    2018           Black 1000     297 0.2970000    0.5388         overall
## 9    2018        Hispanic 2000     437 0.2185000    0.5388         overall
## 10   2018 Multi-Ethnicity  500     242 0.4840000    0.5388         overall
## 11   2018 Native American  100      35 0.3500000    0.5388         overall
## 12   2018           White 3400    2147 0.6314706    0.5388         overall
##           moe    pct_lo    pct_hi di_indicator
## 1  0.03000000 0.6573333 0.7173333            0
## 2  0.03099032 0.2790097 0.3409903            1
## 3  0.03000000 0.1750000 0.2350000            1
## 4  0.04382693 0.4801731 0.5678269            0
## 5  0.09800000 0.3320000 0.5280000            0
## 6  0.03000000 0.5738235 0.6338235            0
## 7  0.03000000 0.7133333 0.7733333            0
## 8  0.03099032 0.2660097 0.3279903            1
## 9  0.03000000 0.1885000 0.2485000            1
## 10 0.04382693 0.4401731 0.5278269            1
## 11 0.09800000 0.2520000 0.4480000            1
## 12 0.03000000 0.6014706 0.6614706            0

By default, di_ppg uses the overall success rate as the reference rate for comparison (default: reference='overall'). The reference argument also accepts 'hpg' (highest performing group success rate as the reference rate), 'all but current' (success rate of all groups combined excluding the comparison group), or a group value from group.

# Reference: Highest performing group
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference='hpg', data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.6873333             hpg
## 2    2017           Black 1000     310 0.3100000 0.6873333             hpg
## 3    2017        Hispanic 2000     410 0.2050000 0.6873333             hpg
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6873333             hpg
## 5    2017 Native American  100      43 0.4300000 0.6873333             hpg
## 6    2017           White 3400    2053 0.6038235 0.6873333             hpg
## 7    2018           Asian 3000    2230 0.7433333 0.7433333             hpg
## 8    2018           Black 1000     297 0.2970000 0.7433333             hpg
## 9    2018        Hispanic 2000     437 0.2185000 0.7433333             hpg
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.7433333             hpg
## 11   2018 Native American  100      35 0.3500000 0.7433333             hpg
## 12   2018           White 3400    2147 0.6314706 0.7433333             hpg
##           moe    pct_lo    pct_hi di_indicator
## 1  0.03000000 0.6573333 0.7173333            0
## 2  0.03099032 0.2790097 0.3409903            1
## 3  0.03000000 0.1750000 0.2350000            1
## 4  0.04382693 0.4801731 0.5678269            1
## 5  0.09800000 0.3320000 0.5280000            1
## 6  0.03000000 0.5738235 0.6338235            1
## 7  0.03000000 0.7133333 0.7733333            0
## 8  0.03099032 0.2660097 0.3279903            1
## 9  0.03000000 0.1885000 0.2485000            1
## 10 0.04382693 0.4401731 0.5278269            1
## 11 0.09800000 0.2520000 0.4480000            1
## 12 0.03000000 0.6014706 0.6614706            1
# Reference: All but current (PPG minus 1)
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference='all but current', data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.4397143 all but current
## 2    2017           Black 1000     310 0.3100000 0.5366667 all but current
## 3    2017        Hispanic 2000     410 0.2050000 0.5912500 all but current
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.5134737 all but current
## 5    2017 Native American  100      43 0.4300000 0.5148485 all but current
## 6    2017           White 3400    2053 0.6038235 0.4677273 all but current
## 7    2018           Asian 3000    2230 0.7433333 0.4511429 all but current
## 8    2018           Black 1000     297 0.2970000 0.5656667 all but current
## 9    2018        Hispanic 2000     437 0.2185000 0.6188750 all but current
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.5416842 all but current
## 11   2018 Native American  100      35 0.3500000 0.5407071 all but current
## 12   2018           White 3400    2147 0.6314706 0.4910606 all but current
##           moe    pct_lo    pct_hi di_indicator
## 1  0.03000000 0.6573333 0.7173333            0
## 2  0.03099032 0.2790097 0.3409903            1
## 3  0.03000000 0.1750000 0.2350000            1
## 4  0.04382693 0.4801731 0.5678269            0
## 5  0.09800000 0.3320000 0.5280000            0
## 6  0.03000000 0.5738235 0.6338235            0
## 7  0.03000000 0.7133333 0.7733333            0
## 8  0.03099032 0.2660097 0.3279903            1
## 9  0.03000000 0.1885000 0.2485000            1
## 10 0.04382693 0.4401731 0.5278269            1
## 11 0.09800000 0.2520000 0.4480000            1
## 12 0.03000000 0.6014706 0.6614706            0
# Reference: custom group
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference='White', data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.6038235           White
## 2    2017           Black 1000     310 0.3100000 0.6038235           White
## 3    2017        Hispanic 2000     410 0.2050000 0.6038235           White
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6038235           White
## 5    2017 Native American  100      43 0.4300000 0.6038235           White
## 6    2017           White 3400    2053 0.6038235 0.6038235           White
## 7    2018           Asian 3000    2230 0.7433333 0.6314706           White
## 8    2018           Black 1000     297 0.2970000 0.6314706           White
## 9    2018        Hispanic 2000     437 0.2185000 0.6314706           White
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.6314706           White
## 11   2018 Native American  100      35 0.3500000 0.6314706           White
## 12   2018           White 3400    2147 0.6314706 0.6314706           White
##           moe    pct_lo    pct_hi di_indicator
## 1  0.03000000 0.6573333 0.7173333            0
## 2  0.03099032 0.2790097 0.3409903            1
## 3  0.03000000 0.1750000 0.2350000            1
## 4  0.04382693 0.4801731 0.5678269            1
## 5  0.09800000 0.3320000 0.5280000            1
## 6  0.03000000 0.5738235 0.6338235            0
## 7  0.03000000 0.7133333 0.7733333            0
## 8  0.03099032 0.2660097 0.3279903            1
## 9  0.03000000 0.1885000 0.2485000            1
## 10 0.04382693 0.4401731 0.5278269            1
## 11 0.09800000 0.2520000 0.4480000            1
## 12 0.03000000 0.6014706 0.6614706            0
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference='Asian', data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333 0.6873333           Asian
## 2    2017           Black 1000     310 0.3100000 0.6873333           Asian
## 3    2017        Hispanic 2000     410 0.2050000 0.6873333           Asian
## 4    2017 Multi-Ethnicity  500     262 0.5240000 0.6873333           Asian
## 5    2017 Native American  100      43 0.4300000 0.6873333           Asian
## 6    2017           White 3400    2053 0.6038235 0.6873333           Asian
## 7    2018           Asian 3000    2230 0.7433333 0.7433333           Asian
## 8    2018           Black 1000     297 0.2970000 0.7433333           Asian
## 9    2018        Hispanic 2000     437 0.2185000 0.7433333           Asian
## 10   2018 Multi-Ethnicity  500     242 0.4840000 0.7433333           Asian
## 11   2018 Native American  100      35 0.3500000 0.7433333           Asian
## 12   2018           White 3400    2147 0.6314706 0.7433333           Asian
##           moe    pct_lo    pct_hi di_indicator
## 1  0.03000000 0.6573333 0.7173333            0
## 2  0.03099032 0.2790097 0.3409903            1
## 3  0.03000000 0.1750000 0.2350000            1
## 4  0.04382693 0.4801731 0.5678269            1
## 5  0.09800000 0.3320000 0.5280000            1
## 6  0.03000000 0.5738235 0.6338235            1
## 7  0.03000000 0.7133333 0.7733333            0
## 8  0.03099032 0.2660097 0.3279903            1
## 9  0.03000000 0.1885000 0.2485000            1
## 10 0.04382693 0.4401731 0.5278269            1
## 11 0.09800000 0.2520000 0.4480000            1
## 12 0.03000000 0.6014706 0.6614706            1
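
The reference rates shown above for the 2017 cohort can be reproduced by hand from the group counts. The following is a minimal base R sketch, not the package's internal code; the variable names (ref_overall, ref_all_but_current, ref_hpg) are hypothetical and chosen only for illustration.

```r
# Counts for the 2017 cohort, copied from the output above
n <- c(Asian = 3000, Black = 1000, Hispanic = 2000,
       "Multi-Ethnicity" = 500, "Native American" = 100, White = 3400)
success <- c(Asian = 2062, Black = 310, Hispanic = 410,
             "Multi-Ethnicity" = 262, "Native American" = 43, White = 2053)

# 'overall': pooled success rate across all groups
ref_overall <- sum(success) / sum(n)

# 'all but current': pooled rate after removing each group in turn
ref_all_but_current <- (sum(success) - success) / (sum(n) - n)

# 'hpg': the highest group-level success rate
ref_hpg <- max(success / n)

round(ref_overall, 4)
round(ref_all_but_current, 7)
round(ref_hpg, 7)
```

Because 'all but current' removes the comparison group from both the numerator and the denominator, each group gets its own reference rate, whereas 'overall' and 'hpg' give a single rate per cohort.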

The user could also pass in custom reference points for comparison (e.g., a state-wide rate). di_ppg accepts either a single reference point or a vector of reference points, one for each cohort. In the latter case, the reference points are matched to the values of the cohort variable in alphabetical order.

# With custom reference (single)
di_ppg(success=Transfer, group=Ethnicity, reference=0.54, data=student_equity) %>%
  as.data.frame
##             group    n success       pct reference reference_group
## 1           Asian 6000    4292 0.7153333      0.54         numeric
## 2           Black 2000     607 0.3035000      0.54         numeric
## 3        Hispanic 4000     847 0.2117500      0.54         numeric
## 4 Multi-Ethnicity 1000     504 0.5040000      0.54         numeric
## 5 Native American  200      78 0.3900000      0.54         numeric
## 6           White 6800    4200 0.6176471      0.54         numeric
##          moe    pct_lo    pct_hi di_indicator
## 1 0.03000000 0.6853333 0.7453333            0
## 2 0.03000000 0.2735000 0.3335000            1
## 3 0.03000000 0.1817500 0.2417500            1
## 4 0.03099032 0.4730097 0.5349903            1
## 5 0.06929646 0.3207035 0.4592965            1
## 6 0.03000000 0.5876471 0.6476471            0
# With custom reference (multiple)
di_ppg(success=Transfer, group=Ethnicity, cohort=Cohort, reference=c(0.5, 0.55), data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference reference_group
## 1    2017           Asian 3000    2062 0.6873333      0.50         numeric
## 2    2017           Black 1000     310 0.3100000      0.50         numeric
## 3    2017        Hispanic 2000     410 0.2050000      0.50         numeric
## 4    2017 Multi-Ethnicity  500     262 0.5240000      0.50         numeric
## 5    2017 Native American  100      43 0.4300000      0.50         numeric
## 6    2017           White 3400    2053 0.6038235      0.50         numeric
## 7    2018           Asian 3000    2230 0.7433333      0.55         numeric
## 8    2018           Black 1000     297 0.2970000      0.55         numeric
## 9    2018        Hispanic 2000     437 0.2185000      0.55         numeric
## 10   2018 Multi-Ethnicity  500     242 0.4840000      0.55         numeric
## 11   2018 Native American  100      35 0.3500000      0.55         numeric
## 12   2018           White 3400    2147 0.6314706      0.55         numeric
##           moe    pct_lo    pct_hi di_indicator
## 1  0.03000000 0.6573333 0.7173333            0
## 2  0.03099032 0.2790097 0.3409903            1
## 3  0.03000000 0.1750000 0.2350000            1
## 4  0.04382693 0.4801731 0.5678269            0
## 5  0.09800000 0.3320000 0.5280000            0
## 6  0.03000000 0.5738235 0.6338235            0
## 7  0.03000000 0.7133333 0.7733333            0
## 8  0.03099032 0.2660097 0.3279903            1
## 9  0.03000000 0.1885000 0.2485000            1
## 10 0.04382693 0.4401731 0.5278269            1
## 11 0.09800000 0.2520000 0.4480000            1
## 12 0.03000000 0.6014706 0.6614706            0
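
To make the ordering concrete, here is a small base R sketch of how a vector of reference points lines up with sorted cohort values. It assumes the ordering is the one produced by sort(); ref_lookup and ref_per_row are hypothetical names, and this is an illustration rather than the package's implementation.

```r
# Illustration only: pair each reference point with a sorted cohort value
cohort <- c(2018, 2017, 2018, 2017)   # hypothetical cohort column
reference <- c(0.5, 0.55)             # same vector as in the call above

# Hypothetical lookup: first value goes to the first sorted cohort, and so on
ref_lookup <- setNames(reference, sort(unique(cohort)))

# Reference rate that applies to each row
ref_per_row <- unname(ref_lookup[as.character(cohort)])
ref_per_row
```

This agrees with the output above, where the 2017 rows use 0.50 and the 2018 rows use 0.55.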

Disproportionate impact using the PPG method relies on calculating the margin of error (MOE) around the success rate. The MOE calculated in di_ppg has two underlying assumptions (defaults):

  1. the minimum MOE returned is 0.03, and
  2. 0.50 is used as the proportion \(\hat{p}\) in the margin of error formula, \(1.96 \times \sqrt{\hat{p} (1-\hat{p}) / n}\).

To override 1, the user could specify min_moe in di_ppg. To override 2, the user could specify use_prop_in_moe=TRUE in di_ppg.
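
As a rough check on the formula, the MOE values can be computed by hand in base R. The sketch below uses a hypothetical helper, moe_sketch, which is not the package's ppg_moe function. The final step, flagging a group when pct_hi is at or below the reference, is an assumption inferred from the printed outputs, not a documented rule.

```r
# Hypothetical helper (not the package's ppg_moe): MOE with a minimum floor
moe_sketch <- function(n, p = 0.5, min_moe = 0.03) {
  pmax(1.96 * sqrt(p * (1 - p) / n), min_moe)
}

# Counts for three groups, copied from the non-cohort outputs
n <- c(Black = 2000, "Multi-Ethnicity" = 1000, "Native American" = 200)
pct <- c(Black = 607, "Multi-Ethnicity" = 504, "Native American" = 78) / n

# Defaults: p = 0.50 and a floor of 0.03
moe_default <- moe_sketch(n)

# Observed proportion in the formula, with the floor lowered to 0.02
moe_prop <- moe_sketch(n, p = pct, min_moe = 0.02)

# Assumed flagging rule: upper bound at or below the overall reference rate
reference <- 0.5264
di_flag <- as.integer(pct + moe_default <= reference)

round(moe_default, 8)
round(moe_prop, 8)
di_flag
```

With the defaults, the Black group's raw MOE is about 0.0219, so the 0.03 floor applies, while the two smaller groups keep their larger computed MOE.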

# min_moe
di_ppg(success=Transfer, group=Ethnicity, data=student_equity, min_moe=0.02) %>%
  as.data.frame
##             group    n success       pct reference reference_group
## 1           Asian 6000    4292 0.7153333    0.5264         overall
## 2           Black 2000     607 0.3035000    0.5264         overall
## 3        Hispanic 4000     847 0.2117500    0.5264         overall
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5264         overall
## 5 Native American  200      78 0.3900000    0.5264         overall
## 6           White 6800    4200 0.6176471    0.5264         overall
##          moe    pct_lo    pct_hi di_indicator
## 1 0.02000000 0.6953333 0.7353333            0
## 2 0.02191347 0.2815865 0.3254135            1
## 3 0.02000000 0.1917500 0.2317500            1
## 4 0.03099032 0.4730097 0.5349903            0
## 5 0.06929646 0.3207035 0.4592965            1
## 6 0.02000000 0.5976471 0.6376471            0
# use_prop_in_moe
di_ppg(success=Transfer, group=Ethnicity, data=student_equity, min_moe=0.02, use_prop_in_moe=TRUE) %>%
  as.data.frame
##             group    n success       pct reference reference_group
## 1           Asian 6000    4292 0.7153333    0.5264         overall
## 2           Black 2000     607 0.3035000    0.5264         overall
## 3        Hispanic 4000     847 0.2117500    0.5264         overall
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5264         overall
## 5 Native American  200      78 0.3900000    0.5264         overall
## 6           White 6800    4200 0.6176471    0.5264         overall
##          moe    pct_lo    pct_hi di_indicator
## 1 0.02000000 0.6953333 0.7353333            0
## 2 0.02015028 0.2833497 0.3236503            1
## 3 0.02000000 0.1917500 0.2317500            1
## 4 0.03098933 0.4730107 0.5349893            0
## 5 0.06759869 0.3224013 0.4575987            1
## 6 0.02000000 0.5976471 0.6376471            0

In cases where the proportion is used in calculating MOE, an observed proportion of 0 or 1 would lead to a zero MOE. To account for these scenarios, the user could leverage the prop_sub_0 and prop_sub_1 parameters in di_ppg and ppg_moe as substitutes. These parameters default to 0.5, which maximizes the MOE (making it more difficult to declare disproportionate impact).

# Set Native American to have zero transfers and see what results
di_ppg(success=Transfer, group=Ethnicity, data=student_equity %>% mutate(Transfer=ifelse(Ethnicity=='Native American', 0, Transfer)), use_prop_in_moe=TRUE, prop_sub_0=0.1, prop_sub_1=0.9) %>%
  as.data.frame
## Warning in ppg_moe(n = n, proportion = pct, min_moe = min_moe, prop_sub_0 =
## prop_sub_0, : The vector `proportion` contains 0. This will lead to a zero
## MOE. `prop_sub_0=0.1` will be used in calculating the MOE for these cases.
##             group    n success       pct reference reference_group
## 1           Asian 6000    4292 0.7153333    0.5225         overall
## 2           Black 2000     607 0.3035000    0.5225         overall
## 3        Hispanic 4000     847 0.2117500    0.5225         overall
## 4 Multi-Ethnicity 1000     504 0.5040000    0.5225         overall
## 5 Native American  200       0 0.0000000    0.5225         overall
## 6           White 6800    4200 0.6176471    0.5225         overall
##          moe      pct_lo     pct_hi di_indicator
## 1 0.03000000  0.68533333 0.74533333            0
## 2 0.03000000  0.27350000 0.33350000            1
## 3 0.03000000  0.18175000 0.24175000            1
## 4 0.03098933  0.47301067 0.53498933            0
## 5 0.04157788 -0.04157788 0.04157788            1
## 6 0.03000000  0.58764706 0.64764706            0

Proportionality index

di_prop_index is the main work function for this method. It accepts either vectors or, when the data argument is supplied, unquoted column names (the tidy way):

# Without cohort
## Vector
di_prop_index(success=student_equity$Transfer, group=student_equity$Ethnicity) %>% as.data.frame
##             group    n success pct_success pct_group di_prop_index
## 1           Asian 6000    4292 0.407674772      0.30     1.3589159
## 2           Black 2000     607 0.057655775      0.10     0.5765578
## 3        Hispanic 4000     847 0.080452128      0.20     0.4022606
## 4 Multi-Ethnicity 1000     504 0.047872340      0.05     0.9574468
## 5 Native American  200      78 0.007408815      0.01     0.7408815
## 6           White 6800    4200 0.398936170      0.34     1.1733417
##   di_indicator
## 1            0
## 2            1
## 3            1
## 4            0
## 5            1
## 6            0
## Tidy and column reference
di_prop_index(success=Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame
##             group    n success pct_success pct_group di_prop_index
## 1           Asian 6000    4292 0.407674772      0.30     1.3589159
## 2           Black 2000     607 0.057655775      0.10     0.5765578
## 3        Hispanic 4000     847 0.080452128      0.20     0.4022606
## 4 Multi-Ethnicity 1000     504 0.047872340      0.05     0.9574468
## 5 Native American  200      78 0.007408815      0.01     0.7408815
## 6           White 6800    4200 0.398936170      0.34     1.1733417
##   di_indicator
## 1            0
## 2            1
## 3            1
## 4            0
## 5            1
## 6            0
# With cohort
## Vector
di_prop_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort) %>% as.data.frame
##    cohort           group    n success pct_success pct_group di_prop_index
## 1    2017           Asian 3000    2062 0.401167315      0.30     1.3372244
## 2    2017           Black 1000     310 0.060311284      0.10     0.6031128
## 3    2017        Hispanic 2000     410 0.079766537      0.20     0.3988327
## 4    2017 Multi-Ethnicity  500     262 0.050972763      0.05     1.0194553
## 5    2017 Native American  100      43 0.008365759      0.01     0.8365759
## 6    2017           White 3400    2053 0.399416342      0.34     1.1747539
## 7    2018           Asian 3000    2230 0.413882702      0.30     1.3796090
## 8    2018           Black 1000     297 0.055122494      0.10     0.5512249
## 9    2018        Hispanic 2000     437 0.081106162      0.20     0.4055308
## 10   2018 Multi-Ethnicity  500     242 0.044914625      0.05     0.8982925
## 11   2018 Native American  100      35 0.006495917      0.01     0.6495917
## 12   2018           White 3400    2147 0.398478099      0.34     1.1719944
##    di_indicator
## 1             0
## 2             1
## 3             1
## 4             0
## 5             0
## 6             0
## 7             0
## 8             1
## 9             1
## 10            0
## 11            1
## 12            0
## Tidy and column reference
di_prop_index(success=Transfer, group=Ethnicity, cohort=Cohort, data=student_equity) %>%
  as.data.frame
##    cohort           group    n success pct_success pct_group di_prop_index
## 1    2017           Asian 3000    2062 0.401167315      0.30     1.3372244
## 2    2017           Black 1000     310 0.060311284      0.10     0.6031128
## 3    2017        Hispanic 2000     410 0.079766537      0.20     0.3988327
## 4    2017 Multi-Ethnicity  500     262 0.050972763      0.05     1.0194553
## 5    2017 Native American  100      43 0.008365759      0.01     0.8365759
## 6    2017           White 3400    2053 0.399416342      0.34     1.1747539
## 7    2018           Asian 3000    2230 0.413882702      0.30     1.3796090
## 8    2018           Black 1000     297 0.055122494      0.10     0.5512249
## 9    2018        Hispanic 2000     437 0.081106162      0.20     0.4055308
## 10   2018 Multi-Ethnicity  500     242 0.044914625      0.05     0.8982925
## 11   2018 Native American  100      35 0.006495917      0.01     0.6495917
## 12   2018           White 3400    2147 0.398478099      0.34     1.1719944
##    di_indicator
## 1             0
## 2             1
## 3             1
## 4             0
## 5             0
## 6             0
## 7             0
## 8             1
## 9             1
## 10            0
## 11            1
## 12            0

Note that the referenced document describing this method does not recommend a threshold on the proportionality index for declaring disproportionate impact. The di_prop_index function uses di_prop_index_cutoff=0.8 as the default threshold, which the user could change.

# Changing threshold for DI
di_prop_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort, di_prop_index_cutoff=0.5) %>% as.data.frame
##    cohort           group    n success pct_success pct_group di_prop_index
## 1    2017           Asian 3000    2062 0.401167315      0.30     1.3372244
## 2    2017           Black 1000     310 0.060311284      0.10     0.6031128
## 3    2017        Hispanic 2000     410 0.079766537      0.20     0.3988327
## 4    2017 Multi-Ethnicity  500     262 0.050972763      0.05     1.0194553
## 5    2017 Native American  100      43 0.008365759      0.01     0.8365759
## 6    2017           White 3400    2053 0.399416342      0.34     1.1747539
## 7    2018           Asian 3000    2230 0.413882702      0.30     1.3796090
## 8    2018           Black 1000     297 0.055122494      0.10     0.5512249
## 9    2018        Hispanic 2000     437 0.081106162      0.20     0.4055308
## 10   2018 Multi-Ethnicity  500     242 0.044914625      0.05     0.8982925
## 11   2018 Native American  100      35 0.006495917      0.01     0.6495917
## 12   2018           White 3400    2147 0.398478099      0.34     1.1719944
##    di_indicator
## 1             0
## 2             0
## 3             1
## 4             0
## 5             0
## 6             0
## 7             0
## 8             0
## 9             1
## 10            0
## 11            0
## 12            0
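
The columns in these outputs suggest how the index is assembled: each group's share of all successes (pct_success) divided by its share of the population (pct_group). The base R sketch below reproduces the non-cohort results from the counts. It is an illustration under that reading of the output, not the package's code, and comparing with a strict less-than at the cutoff is an assumption.

```r
# Group sizes and success counts, copied from the non-cohort output above
n <- c(Asian = 6000, Black = 2000, Hispanic = 4000,
       "Multi-Ethnicity" = 1000, "Native American" = 200, White = 6800)
success <- c(Asian = 4292, Black = 607, Hispanic = 847,
             "Multi-Ethnicity" = 504, "Native American" = 78, White = 4200)

pct_success <- success / sum(success)  # share of all successes
pct_group <- n / sum(n)                # share of the population

# Ratio of the two shares; 1 means the group is proportionally represented
prop_index <- pct_success / pct_group

# Assumed rule: flag groups whose index falls below the cutoff
cutoff <- 0.8
di_flag <- as.integer(prop_index < cutoff)

round(prop_index, 7)
di_flag
```

An index below 1 means the group accounts for a smaller share of successes than of the population; the cutoff decides how far below 1 counts as disproportionate impact.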

80% index

di_80_index is the main work function for this method. It accepts either vectors or, when the data argument is supplied, unquoted column names (the tidy way):

# Without cohort
## Vector
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity) %>% as.data.frame
##             group    n success       pct reference_group reference
## 1           Asian 6000    4292 0.7153333           Asian 0.7153333
## 2           Black 2000     607 0.3035000           Asian 0.7153333
## 3        Hispanic 4000     847 0.2117500           Asian 0.7153333
## 4 Multi-Ethnicity 1000     504 0.5040000           Asian 0.7153333
## 5 Native American  200      78 0.3900000           Asian 0.7153333
## 6           White 6800    4200 0.6176471           Asian 0.7153333
##   di_80_index di_indicator
## 1   1.0000000            0
## 2   0.4242777            1
## 3   0.2960158            1
## 4   0.7045666            1
## 5   0.5452004            1
## 6   0.8634395            0
## Tidy and column reference
di_80_index(success=Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame
##             group    n success       pct reference_group reference
## 1           Asian 6000    4292 0.7153333           Asian 0.7153333
## 2           Black 2000     607 0.3035000           Asian 0.7153333
## 3        Hispanic 4000     847 0.2117500           Asian 0.7153333
## 4 Multi-Ethnicity 1000     504 0.5040000           Asian 0.7153333
## 5 Native American  200      78 0.3900000           Asian 0.7153333
## 6           White 6800    4200 0.6176471           Asian 0.7153333
##   di_80_index di_indicator
## 1   1.0000000            0
## 2   0.4242777            1
## 3   0.2960158            1
## 4   0.7045666            1
## 5   0.5452004            1
## 6   0.8634395            0
# With cohort
## Vector
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort) %>% as.data.frame
##    cohort           group    n success       pct reference_group reference
## 1    2017           Asian 3000    2062 0.6873333           Asian 0.6873333
## 2    2017           Black 1000     310 0.3100000           Asian 0.6873333
## 3    2017        Hispanic 2000     410 0.2050000           Asian 0.6873333
## 4    2017 Multi-Ethnicity  500     262 0.5240000           Asian 0.6873333
## 5    2017 Native American  100      43 0.4300000           Asian 0.6873333
## 6    2017           White 3400    2053 0.6038235           Asian 0.6873333
## 7    2018           Asian 3000    2230 0.7433333           Asian 0.7433333
## 8    2018           Black 1000     297 0.2970000           Asian 0.7433333
## 9    2018        Hispanic 2000     437 0.2185000           Asian 0.7433333
## 10   2018 Multi-Ethnicity  500     242 0.4840000           Asian 0.7433333
## 11   2018 Native American  100      35 0.3500000           Asian 0.7433333
## 12   2018           White 3400    2147 0.6314706           Asian 0.7433333
##    di_80_index di_indicator
## 1    1.0000000            0
## 2    0.4510184            1
## 3    0.2982541            1
## 4    0.7623666            1
## 5    0.6256062            1
## 6    0.8785017            0
## 7    1.0000000            0
## 8    0.3995516            1
## 9    0.2939462            1
## 10   0.6511211            1
## 11   0.4708520            1
## 12   0.8495120            0
## Tidy and column reference
di_80_index(success=Transfer, group=Ethnicity, cohort=Cohort, data=student_equity) %>%
  as.data.frame
##    cohort           group    n success       pct reference_group reference
## 1    2017           Asian 3000    2062 0.6873333           Asian 0.6873333
## 2    2017           Black 1000     310 0.3100000           Asian 0.6873333
## 3    2017        Hispanic 2000     410 0.2050000           Asian 0.6873333
## 4    2017 Multi-Ethnicity  500     262 0.5240000           Asian 0.6873333
## 5    2017 Native American  100      43 0.4300000           Asian 0.6873333
## 6    2017           White 3400    2053 0.6038235           Asian 0.6873333
## 7    2018           Asian 3000    2230 0.7433333           Asian 0.7433333
## 8    2018           Black 1000     297 0.2970000           Asian 0.7433333
## 9    2018        Hispanic 2000     437 0.2185000           Asian 0.7433333
## 10   2018 Multi-Ethnicity  500     242 0.4840000           Asian 0.7433333
## 11   2018 Native American  100      35 0.3500000           Asian 0.7433333
## 12   2018           White 3400    2147 0.6314706           Asian 0.7433333
##    di_80_index di_indicator
## 1    1.0000000            0
## 2    0.4510184            1
## 3    0.2982541            1
## 4    0.7623666            1
## 5    0.6256062            1
## 6    0.8785017            0
## 7    1.0000000            0
## 8    0.3995516            1
## 9    0.2939462            1
## 10   0.6511211            1
## 11   0.4708520            1
## 12   0.8495120            0

By default, di_80_index uses the group with the highest success rate as the reference when calculating the index. To compare against a different group, specify it with the reference_group argument (a value from group).

# Changing reference group
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort, reference_group='White') %>% as.data.frame
##    cohort           group    n success       pct reference_group reference
## 1    2017           Asian 3000    2062 0.6873333           White 0.6038235
## 2    2017           Black 1000     310 0.3100000           White 0.6038235
## 3    2017        Hispanic 2000     410 0.2050000           White 0.6038235
## 4    2017 Multi-Ethnicity  500     262 0.5240000           White 0.6038235
## 5    2017 Native American  100      43 0.4300000           White 0.6038235
## 6    2017           White 3400    2053 0.6038235           White 0.6038235
## 7    2018           Asian 3000    2230 0.7433333           White 0.6314706
## 8    2018           Black 1000     297 0.2970000           White 0.6314706
## 9    2018        Hispanic 2000     437 0.2185000           White 0.6314706
## 10   2018 Multi-Ethnicity  500     242 0.4840000           White 0.6314706
## 11   2018 Native American  100      35 0.3500000           White 0.6314706
## 12   2018           White 3400    2147 0.6314706           White 0.6314706
##    di_80_index di_indicator
## 1    1.1383017            0
## 2    0.5133950            1
## 3    0.3395032            1
## 4    0.8678032            0
## 5    0.7121286            1
## 6    1.0000000            0
## 7    1.1771464            0
## 8    0.4703307            1
## 9    0.3460177            1
## 10   0.7664648            1
## 11   0.5542618            1
## 12   1.0000000            0
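
To make the arithmetic behind the index concrete, the sketch below recomputes the 2017 values by hand in base R, using the counts printed in the output above. The helper index_80 is hypothetical and is not part of DisImpact; it only assumes that the index is a group's success rate divided by the reference group's success rate.

```r
# Counts for the 2017 cohort, copied from the output above
d <- data.frame(
  group   = c("Asian", "Black", "Hispanic", "Multi-Ethnicity",
              "Native American", "White"),
  n       = c(3000, 1000, 2000, 500, 100, 3400),
  success = c(2062, 310, 410, 262, 43, 2053),
  stringsAsFactors = FALSE
)
d$pct <- d$success / d$n

# Hypothetical helper (not a DisImpact function)
index_80 <- function(d, reference_group = NULL) {
  # Default: use the group with the highest success rate as reference
  if (is.null(reference_group)) {
    reference_group <- d$group[which.max(d$pct)]
  }
  reference <- d$pct[d$group == reference_group]
  d$reference_group <- reference_group
  d$reference <- reference
  # Index = group rate relative to the reference rate
  d$di_80_index <- d$pct / reference
  d
}

res_default <- index_80(d)                            # reference: Asian
res_white   <- index_80(d, reference_group = "White") # reference: White
print(res_white[, c("group", "pct", "reference", "di_80_index")])
```

With White as the reference, Asian students receive an index above 1 because their rate exceeds the reference rate, which agrees with the first row of the di_80_index output.
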

By default, di_80_index uses 80% (di_80_index_cutoff=0.80) as the threshold for declaring disproportionate impact. Pass a different value to the di_80_index_cutoff argument to override it.

# Changing threshold for DI
di_80_index(success=student_equity$Transfer, group=student_equity$Ethnicity, cohort=student_equity$Cohort, di_80_index_cutoff=0.50) %>% as.data.frame
##    cohort           group    n success       pct reference_group reference
## 1    2017           Asian 3000    2062 0.6873333           Asian 0.6873333
## 2    2017           Black 1000     310 0.3100000           Asian 0.6873333
## 3    2017        Hispanic 2000     410 0.2050000           Asian 0.6873333
## 4    2017 Multi-Ethnicity  500     262 0.5240000           Asian 0.6873333
## 5    2017 Native American  100      43 0.4300000           Asian 0.6873333
## 6    2017           White 3400    2053 0.6038235           Asian 0.6873333
## 7    2018           Asian 3000    2230 0.7433333           Asian 0.7433333
## 8    2018           Black 1000     297 0.2970000           Asian 0.7433333
## 9    2018        Hispanic 2000     437 0.2185000           Asian 0.7433333
## 10   2018 Multi-Ethnicity  500     242 0.4840000           Asian 0.7433333
## 11   2018 Native American  100      35 0.3500000           Asian 0.7433333
## 12   2018           White 3400    2147 0.6314706           Asian 0.7433333
##    di_80_index di_indicator
## 1    1.0000000            0
## 2    0.4510184            1
## 3    0.2982541            1
## 4    0.7623666            0
## 5    0.6256062            0
## 6    0.8785017            0
## 7    1.0000000            0
## 8    0.3995516            1
## 9    0.2939462            1
## 10   0.6511211            0
## 11   0.4708520            1
## 12   0.8495120            0
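
The effect of the cutoff can be checked by hand. The sketch below takes the 2017 index values printed above and flags them under both thresholds. The helper flag_di is hypothetical, and it assumes a group is flagged when its index is strictly below the cutoff; the outputs shown here contain no index exactly equal to a cutoff, so they do not reveal how the package treats ties.

```r
# 2017 index values (reference group: Asian), copied from the output above
idx <- c(Asian = 1.0000000, Black = 0.4510184, Hispanic = 0.2982541,
         `Multi-Ethnicity` = 0.7623666, `Native American` = 0.6256062,
         White = 0.8785017)

# Hypothetical helper (not a DisImpact function).
# Assumption: flag when the index is strictly below the cutoff.
flag_di <- function(index, cutoff = 0.80) as.integer(index < cutoff)

flags <- data.frame(
  group   = names(idx),
  index   = unname(idx),
  flag_80 = flag_di(idx),                # default 80% threshold
  flag_50 = flag_di(idx, cutoff = 0.50)  # stricter 50% threshold
)
print(flags)
```

Lowering the cutoff to 0.50 removes the flag from Multi-Ethnicity and Native American in 2017, matching rows 4 and 5 of the output.
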

When dealing with a non-success variable like drop-out or probation

All methods and functions implemented in the DisImpact package treat outcomes as positive: 1 is desired over 0 (higher rate is better, lower rate indicates disparity). The choice of the name success in the functions' arguments is intentional to remind the user of this.

Suppose we have a variable that indicates something negative (e.g., a flag for students on academic probation). We can calculate DI on its complement by using the ! (logical negation) operator:

## di_ppg(success=!Probation, group=Ethnicity, data=student_equity) %>%
##   as.data.frame ## If there were a Probation variable
di_ppg(success=!Transfer, group=Ethnicity, data=student_equity) %>%
  as.data.frame ## Illustrating the point with `!`
##             group    n success       pct reference reference_group
## 1           Asian 6000    1708 0.2846667    0.4736         overall
## 2           Black 2000    1393 0.6965000    0.4736         overall
## 3        Hispanic 4000    3153 0.7882500    0.4736         overall
## 4 Multi-Ethnicity 1000     496 0.4960000    0.4736         overall
## 5 Native American  200     122 0.6100000    0.4736         overall
## 6           White 6800    2600 0.3823529    0.4736         overall
##          moe    pct_lo    pct_hi di_indicator
## 1 0.03000000 0.2546667 0.3146667            1
## 2 0.03000000 0.6665000 0.7265000            0
## 3 0.03000000 0.7582500 0.8182500            0
## 4 0.03099032 0.4650097 0.5269903            0
## 5 0.06929646 0.5407035 0.6792965            0
## 6 0.03000000 0.3523529 0.4123529            1
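
As a small illustration of what ! does to a 0/1 variable, the sketch below uses a made-up probation vector (hypothetical values, not from student_equity). Negation turns each 1 into FALSE and each 0 into TRUE, so the rate of the negated variable is one minus the original rate. The same relationship shows up in the output above: the overall reference of 0.4736 for !Transfer is 1 - 0.5264, where 0.5264 is the overall transfer rate.

```r
# Hypothetical probation flags for ten students (1 = on probation)
probation <- c(1, 0, 0, 1, 0, 0, 0, 1, 0, 0)

# Logical negation: TRUE now means "not on probation", the desired outcome
not_probation <- !probation

rate_probation     <- mean(probation)      # share on probation
rate_not_probation <- mean(not_probation)  # share not on probation

# The two rates are complements of each other
print(c(probation = rate_probation, not_probation = rate_not_probation))
print(isTRUE(all.equal(rate_not_probation, 1 - rate_probation)))
```
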

Transformations on the fly

We can compute the success, group, and cohort variables on the fly:

# Transform success
a <- sample(0:1, size=nrow(student_equity), replace=TRUE, prob=c(0.95, 0.05))
mean(a)
## [1] 0.04885
di_ppg(success=pmax(Transfer, a), group=Ethnicity, data=student_equity) %>%
  as.data.frame
##             group    n success       pct reference reference_group
## 1           Asian 6000    4363 0.7271667   0.54845         overall
## 2           Black 2000     676 0.3380000   0.54845         overall
## 3        Hispanic 4000     994 0.2485000   0.54845         overall
## 4 Multi-Ethnicity 1000     525 0.5250000   0.54845         overall
## 5 Native American  200      81 0.4050000   0.54845         overall
## 6           White 6800    4330 0.6367647   0.54845         overall
##          moe    pct_lo    pct_hi di_indicator
## 1 0.03000000 0.6971667 0.7571667            0
## 2 0.03000000 0.3080000 0.3680000            1
## 3 0.03000000 0.2185000 0.2785000            1
## 4 0.03099032 0.4940097 0.5559903            0
## 5 0.06929646 0.3357035 0.4742965            1
## 6 0.03000000 0.6067647 0.6667647            0
# Collapse Black and Hispanic
di_ppg(success=Transfer, group=ifelse(Ethnicity %in% c('Black', 'Hispanic'), 'Black/Hispanic', Ethnicity), data=student_equity) %>% as.data.frame
##             group    n success       pct reference reference_group
## 1           Asian 6000    4292 0.7153333    0.5264         overall
## 2  Black/Hispanic 6000    1454 0.2423333    0.5264         overall
## 3 Multi-Ethnicity 1000     504 0.5040000    0.5264         overall
## 4 Native American  200      78 0.3900000    0.5264         overall
## 5           White 6800    4200 0.6176471    0.5264         overall
##          moe    pct_lo    pct_hi di_indicator
## 1 0.03000000 0.6853333 0.7453333            0
## 2 0.03000000 0.2123333 0.2723333            1
## 3 0.03099032 0.4730097 0.5349903            0
## 4 0.06929646 0.3207035 0.4592965            1
## 5 0.03000000 0.5876471 0.6476471            0
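
The two transformations used above can be tried in isolation on small made-up vectors (all values below are hypothetical). For 0/1 vectors, pmax acts as an element-wise "either": the result is 1 whenever at least one input is 1. The ifelse call relabels selected categories and keeps the rest. One caveat: if the grouping variable were a factor rather than character, ifelse would return the factor's integer codes for the untouched categories, so the sketch converts with as.character first.

```r
# Two hypothetical 0/1 outcomes for five students
transfer <- c(0, 1, 0, 0, 1)
other    <- c(1, 0, 0, 1, 1)

# Element-wise maximum: 1 if either outcome is 1
either <- pmax(transfer, other)
print(either)

# Collapse two categories into one; other labels pass through unchanged
ethnicity <- c("Asian", "Black", "Hispanic", "White", "Black")
collapsed <- ifelse(ethnicity %in% c("Black", "Hispanic"),
                    "Black/Hispanic", ethnicity)
print(collapsed)

# Same idea when the variable is a factor: convert to character first
ethnicity_f <- factor(ethnicity)
collapsed_f <- ifelse(ethnicity_f %in% c("Black", "Hispanic"),
                      "Black/Hispanic", as.character(ethnicity_f))
print(identical(collapsed_f, collapsed))
```
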

Calculate DI for many variables and groups

Users often need to calculate disproportionate impact across many outcome variables and many disaggregation (group) variables. The function di_iterate takes a data set and the names of the variables to iterate across:

# Multiple group variables
di_iterate(data=student_equity, success_vars=c('Transfer'), group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort'), ppg_reference_groups='overall') %>% as.data.frame
##    success_variable cohort_variable cohort disaggregation           group
## 1          Transfer          Cohort   2017      Ethnicity           Asian
## 2          Transfer          Cohort   2017      Ethnicity           Black
## 3          Transfer          Cohort   2017      Ethnicity        Hispanic
## 4          Transfer          Cohort   2017      Ethnicity Multi-Ethnicity
## 5          Transfer          Cohort   2017      Ethnicity Native American
## 6          Transfer          Cohort   2017      Ethnicity           White
## 7          Transfer          Cohort   2018      Ethnicity           Asian
## 8          Transfer          Cohort   2018      Ethnicity           Black
## 9          Transfer          Cohort   2018      Ethnicity        Hispanic
## 10         Transfer          Cohort   2018      Ethnicity Multi-Ethnicity
## 11         Transfer          Cohort   2018      Ethnicity Native American
## 12         Transfer          Cohort   2018      Ethnicity           White
## 13         Transfer          Cohort   2017         Gender          Female
## 14         Transfer          Cohort   2017         Gender            Male
## 15         Transfer          Cohort   2017         Gender           Other
## 16         Transfer          Cohort   2018         Gender          Female
## 17         Transfer          Cohort   2018         Gender            Male
## 18         Transfer          Cohort   2018         Gender           Other
## 19         Transfer          Cohort   2017         - None           - All
## 20         Transfer          Cohort   2018         - None           - All
##        n success       pct ppg_reference ppg_reference_group        moe
## 1   3000    2062 0.6873333        0.5140             overall 0.03000000
## 2   1000     310 0.3100000        0.5140             overall 0.03099032
## 3   2000     410 0.2050000        0.5140             overall 0.03000000
## 4    500     262 0.5240000        0.5140             overall 0.04382693
## 5    100      43 0.4300000        0.5140             overall 0.09800000
## 6   3400    2053 0.6038235        0.5140             overall 0.03000000
## 7   3000    2230 0.7433333        0.5388             overall 0.03000000
## 8   1000     297 0.2970000        0.5388             overall 0.03099032
## 9   2000     437 0.2185000        0.5388             overall 0.03000000
## 10   500     242 0.4840000        0.5388             overall 0.04382693
## 11   100      35 0.3500000        0.5388             overall 0.09800000
## 12  3400    2147 0.6314706        0.5388             overall 0.03000000
## 13  4930    2513 0.5097363        0.5140             overall 0.03000000
## 14  4886    2548 0.5214900        0.5140             overall 0.03000000
## 15   184      79 0.4293478        0.5140             overall 0.07224656
## 16  4928    2638 0.5353084        0.5388             overall 0.03000000
## 17  4880    2642 0.5413934        0.5388             overall 0.03000000
## 18   192     108 0.5625000        0.5388             overall 0.07072541
## 19 10000    5140 0.5140000        0.5140             overall 0.03000000
## 20 10000    5388 0.5388000        0.5388             overall 0.03000000
##       pct_lo    pct_hi di_indicator_ppg di_prop_index
## 1  0.6573333 0.7173333                0     1.3372244
## 2  0.2790097 0.3409903                1     0.6031128
## 3  0.1750000 0.2350000                1     0.3988327
## 4  0.4801731 0.5678269                0     1.0194553
## 5  0.3320000 0.5280000                0     0.8365759
## 6  0.5738235 0.6338235                0     1.1747539
## 7  0.7133333 0.7733333                0     1.3796090
## 8  0.2660097 0.3279903                1     0.5512249
## 9  0.1885000 0.2485000                1     0.4055308
## 10 0.4401731 0.5278269                1     0.8982925
## 11 0.2520000 0.4480000                1     0.6495917
## 12 0.6014706 0.6614706                0     1.1719944
## 13 0.4797363 0.5397363                0     0.9917049
## 14 0.4914900 0.5514900                0     1.0145719
## 15 0.3571013 0.5015944                1     0.8353071
## 16 0.5053084 0.5653084                0     0.9935198
## 17 0.5113934 0.5713934                0     1.0048134
## 18 0.4917746 0.6332254                0     1.0439866
## 19 0.4840000 0.5440000                0     1.0000000
## 20 0.5088000 0.5688000                0     1.0000000
##    di_indicator_prop_index di_80_index_reference_group di_80_index
## 1                        0                       Asian   1.0000000
## 2                        1                       Asian   0.4510184
## 3                        1                       Asian   0.2982541
## 4                        0                       Asian   0.7623666
## 5                        0                       Asian   0.6256062
## 6                        0                       Asian   0.8785017
## 7                        0                       Asian   1.0000000
## 8                        1                       Asian   0.3995516
## 9                        1                       Asian   0.2939462
## 10                       0                       Asian   0.6511211
## 11                       1                       Asian   0.4708520
## 12                       0                       Asian   0.8495120
## 13                       0                        Male   0.9774614
## 14                       0                        Male   1.0000000
## 15                       0                        Male   0.8233098
## 16                       0                       Other   0.9516595
## 17                       0                       Other   0.9624772
## 18                       0                       Other   1.0000000
## 19                       0                       - All   1.0000000
## 20                       0                       - All   1.0000000
##    di_indicator_80_index
## 1                      0
## 2                      1
## 3                      1
## 4                      1
## 5                      1
## 6                      0
## 7                      0
## 8                      1
## 9                      1
## 10                     1
## 11                     1
## 12                     0
## 13                     0
## 14                     0
## 15                     0
## 16                     0
## 17                     0
## 18                     0
## 19                     0
## 20                     0
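
The idea of iterating over every combination of outcome and group variable can be sketched in base R. This is only a conceptual illustration under stated assumptions and is not how di_iterate is implemented: the data frame is made up, Other_Outcome is a hypothetical second outcome column, and the sketch computes only n, success, and pct for each group rather than any DI measure.

```r
# Small hypothetical data set (values are made up)
d <- data.frame(
  Transfer      = c(1, 0, 1, 1, 0, 0, 1, 0),
  Other_Outcome = c(1, 1, 0, 1, 0, 1, 1, 0),
  Ethnicity     = c("A", "A", "B", "B", "A", "B", "A", "B"),
  Gender        = c("F", "M", "F", "M", "F", "M", "M", "F"),
  stringsAsFactors = FALSE
)

success_vars <- c("Transfer", "Other_Outcome")
group_vars   <- c("Ethnicity", "Gender")

# Every combination of outcome variable and disaggregation variable
combos <- expand.grid(success_variable = success_vars,
                      disaggregation = group_vars,
                      stringsAsFactors = FALSE)

# Summarize each combination and stack the results in long format
results <- do.call(rbind, lapply(seq_len(nrow(combos)), function(i) {
  s <- combos$success_variable[i]
  g <- combos$disaggregation[i]
  n       <- tapply(d[[s]], d[[g]], length)  # group sizes
  success <- tapply(d[[s]], d[[g]], sum)     # successes per group
  data.frame(
    success_variable = s,
    disaggregation   = g,
    group            = names(n),
    n                = as.vector(n),
    success          = as.vector(success),
    pct              = as.vector(success / n),
    stringsAsFactors = FALSE
  )
}))
rownames(results) <- NULL
print(results)
```

The long layout, with one row per outcome, disaggregation, and group, mirrors the shape of the di_iterate output above, which makes the result easy to filter or bind with other runs.
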
# Multiple group variables and different reference groups

bind_rows(
  di_iterate(data=student_equity, success_vars=c('Transfer'), group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort'), ppg_reference_groups='overall')
  , di_iterate(data=student_equity, success_vars=c('Transfer'), group_vars=c('Ethnicity', 'Gender'), cohort_vars=c('Cohort'), ppg_reference_groups=c('White', 'Male'), include_non_disagg_results=FALSE) # include_non_disagg_results = FALSE: Already have this scenario in Overall run
)
## # A tibble: 38 x 19
##    success_variable cohort_variable cohort disaggregation group     n
##    <chr>            <chr>            <int> <chr>          <chr> <dbl>
##  1 Transfer         Cohort            2017 Ethnicity      Asian  3000
##  2 Transfer         Cohort            2017 Ethnicity      Black  1000
##  3 Transfer         Cohort            2017 Ethnicity      Hisp~  2000
##  4 Transfer         Cohort            2017 Ethnicity      Mult~   500
##  5 Transfer         Cohort            2017 Ethnicity      Nati~   100
##  6 Transfer         Cohort            2017 Ethnicity      White  3400
##  7 Transfer         Cohort            2018 Ethnicity      Asian  3000
##  8 Transfer         Cohort            2018 Ethnicity      Black  1000
##  9 Transfer         Cohort            2018 Ethnicity      Hisp~  2000
## 10 Transfer         Cohort            2018 Ethnicity      Mult~   500
## # ... with 28 more rows, and 13 more variables: success <int>, pct <dbl>,
## #   ppg_reference <dbl>, ppg_reference_group <chr>, moe <dbl>,
## #   pct_lo <dbl>, pct_hi <dbl>, di_indicator_ppg <dbl>,
## #   di_prop_index <dbl>, di_indicator_prop_index <dbl>,
## #   di_80_index_reference_group <chr>, di_80_index <dbl>,
## #   di_indicator_80_index <dbl>

There is a separate vignette that explains how one might leverage di_iterate for rapid dashboard development and deployment.