flag_resp() to create and compare flagging strategies
One use-case for response quality indicators is to flag responses that are potentially of low quality. resquin provides the function flag_resp() to create a data frame of booleans (T and F) according to user-defined cut-off values on response quality indicators. If a respondent receives a T value, they are flagged as suspicious. If they receive an F value, they are deemed unsuspicious.
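As a rough illustration of the idea (not of how flag_resp() is implemented internally), a cut-off on an indicator can be turned into a logical flag column in base R. The data frame toy_indicators and its values are made up for this sketch.

```r
# Hypothetical indicator values for five respondents (made-up data)
toy_indicators <- data.frame(
  id  = 1:5,
  ARS = c(0.40, 0.85, NA, 0.80, 0.95)
)

# Apply a user-defined cut-off; the comparison yields TRUE, FALSE or NA
toy_flags <- data.frame(
  id = toy_indicators$id,
  `ARS > 0.8` = toy_indicators$ARS > 0.8,
  check.names = FALSE  # keep the expression as the column name
)

toy_flags

# Number of flagged respondents, ignoring missing indicator values
n_flagged <- sum(toy_flags$`ARS > 0.8`, na.rm = TRUE)
n_flagged
```

Respondents with a missing indicator value end up with an NA flag rather than T or F in this sketch, so they are neither flagged nor cleared.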
The strength of flag_resp() lies in its ability to quickly create and compare multiple flagging strategies, as the following example illustrates:
Suppose we use data on response styles to decide whether respondents are low-quality responders on the 15-item nep scale. We can use resp_styles() to calculate response style indices per respondent.
library(resquin)
nep_resp_styles <- resp_styles(
x = nep,
scale_min = 1, # minimum response option
scale_max = 5, # maximum response option
min_valid_responses = 1) # default, excludes respondents with any missing value
summary(nep_resp_styles)
#>
#> ── Averages of response quality indicators
#> MRS ARS DRS ERS NERS
#> 0.16 0.55 0.30 0.24 0.76
#>
#> ── Quantiles of response quality indicators
#> # A tibble: 5 × 6
#> quantiles MRS ARS DRS ERS NERS
#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 0% 0 0 0 0 0
#> 2 25% 0.07 0.47 0.2 0.07 0.6
#> 3 50% 0.13 0.53 0.33 0.2 0.8
#> 4 75% 0.2 0.6 0.4 0.4 0.93
#> 5 100% 0.93 1 0.73 1 1
In the first example, we will consider the acquiescence response style (ARS). ARS represents the tendency of respondents to agree to questions regardless of their content. Since the nep scale includes positively and negatively keyed items, we can expect that higher ARS values indeed correspond to this behavior: Respondents who are more concerned about nature should choose higher response options on the positively keyed items and lower response options on the negatively keyed items. Choosing high response options on all items is substantively inconsistent response behavior, potentially caused by acquiescence.
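To make the indicator more tangible, here is a minimal base R sketch. It assumes that ARS is the share of a respondent's answers above the scale midpoint, which is a common operationalization and may differ in detail from what resp_styles() computes. The data frame toy_items is made up.

```r
# Made-up responses of three respondents to four items on a 1-5 scale
toy_items <- data.frame(
  item1 = c(5, 4, 2),
  item2 = c(4, 2, 1),
  item3 = c(5, 4, 3),
  item4 = c(4, 1, 2)
)

scale_min <- 1
scale_max <- 5
midpoint  <- (scale_min + scale_max) / 2

# Share of responses above the scale midpoint, per respondent
# (assumed definition of ARS for this sketch)
toy_ars <- rowMeans(toy_items > midpoint)
toy_ars
```

The first respondent agrees with every item and therefore gets the maximum value, whatever the keying of the items.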
A first idea could be to flag respondents who give more than 80% of their responses in the ARS category.
first_flagging <- flag_resp(nep_resp_styles,
ARS > 0.8)
summary(first_flagging)
#>
#> ── Number of respondents flagged (Total N: 1222)
#> ARS > 0.8
#> 33
We can see that 33 respondents are flagged as suspicious, as their ARS score is above 0.8.
In a second step, we might also be interested in flagging respondents who choose the same response option repeatedly. We can use resp_patterns() to compute the longest string length indicator. This indicator shows the longest string of repeated response options. We will flag respondents who have a longest string length of 8 or more. We keep the ARS flagging strategy in place to compare it to the new one.
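To make the longest string indicator concrete, the following base R sketch computes it with rle() on made-up responses. The helper longest_string() is hypothetical and only illustrates the idea. Returning NA for incomplete rows is an assumption of this sketch, not a statement about how resp_patterns() treats missing values.

```r
# Hypothetical helper: longest run of identical consecutive responses
longest_string <- function(responses) {
  if (anyNA(responses)) return(NA_integer_)  # assumption: skip incomplete rows
  max(rle(responses)$lengths)
}

# Made-up responses, one row per respondent
toy_answers <- rbind(
  c(4, 4, 4, 4, 2, 5, 5),  # longest run: four 4s
  c(1, 2, 3, 4, 5, 4, 3),  # no repeated neighbours
  c(3, 3, NA, 3, 3, 3, 3)  # contains a missing value
)

toy_longest <- apply(toy_answers, 1, longest_string)
toy_longest
```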
nep_resp_patterns <- resp_patterns(nep)
nep_resp_patterns_resp_styles <- cbind(nep_resp_styles,nep_resp_patterns[,-1])
second_flagging <- flag_resp(nep_resp_patterns_resp_styles,
ARS > 0.8,
longest_string_length >= 8)
summary(second_flagging)
#>
#> ── Number of respondents flagged (Total N: 1222)
#> ARS > 0.8 longest_string_length >= 8
#> 33 19
#>
#> ── Agreement between flagging strategies
#>
#>
#> Flag longest_string_length >= 8 ARS > 0.8
#> --------------------------- --------------------------- ----------
#> longest_string_length >= 8 19
#> ARS > 0.8 9 33
We can see that 19 respondents have a longest string length of 8 or more. The output also contains an agreement matrix between the flagging strategies. In the second row of the first column, we can see that the two flagging strategies agree on 9 flagged respondents. Together, both strategies would flag 33 + 19 - 9 = 43 of the 1222 respondents.
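The arithmetic above is the inclusion-exclusion rule for two sets. A small base R sketch with made-up flag vectors (flag_a and flag_b are hypothetical) shows the same calculation and one possible way to obtain pairwise agreement counts:

```r
# Made-up flags from two strategies for eight respondents
flag_a <- c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE)
flag_b <- c(TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE, FALSE)

n_a    <- sum(flag_a)           # flagged by strategy A
n_b    <- sum(flag_b)           # flagged by strategy B
n_both <- sum(flag_a & flag_b)  # flagged by both (the agreement)

# Respondents flagged by at least one strategy
n_either <- n_a + n_b - n_both
n_either == sum(flag_a | flag_b)

# Pairwise agreement counts for all strategies at once:
# the diagonal holds the totals, the off-diagonal the overlaps
flags <- cbind(A = as.integer(flag_a), B = as.integer(flag_b))
agreement <- crossprod(flags)
agreement
```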
It is also possible to join multiple flagging expressions with an & or | operator.
flag_resp(nep_resp_patterns_resp_styles,
ARS > 0.8,
longest_string_length >= 8,
ARS > 0.8 | longest_string_length >= 8) |>
summary()
#>
#> ── Number of respondents flagged (Total N: 1222)
#> ARS > 0.8 longest_string_length >= 8
#> 33 19
#> ARS > 0.8 | longest_string_length >= 8
#> 43
#>
#> ── Agreement between flagging strategies
#>
#>
#> Flag ARS > 0.8 | longest_string_length >= 8 longest_string_length >= 8 ARS > 0.8
#> --------------------------------------- --------------------------------------- --------------------------- ----------
#> ARS > 0.8 | longest_string_length >= 8 43
#> longest_string_length >= 8 19 19
#> ARS > 0.8 33 9 33
We can use any vector of logical (i.e. T and F) values with the same number of rows as the nep data frame and compare it with the values provided by resquin. In the following example we create a random vector of boolean values and add it to the data frame from the last example.
# one random value per respondent; no seed is set, so counts vary between runs
random_vector <- sample(c(F,T),nrow(nep_resp_styles),replace = T)
random_vector[is.na(nep_resp_styles$ARS)] <- NA # Add missing data as in the other data frames
# combine the resquin indicators with the external indicator
external_indicator_data <- cbind(
nep_resp_patterns_resp_styles,
new_indicator = random_vector)
flag_resp(external_indicator_data,
ARS > 0.8,
longest_string_length >= 8,
new_indicator == T) |>
summary()
#>
#> ── Number of respondents flagged (Total N: 1222)
#> ARS > 0.8 longest_string_length >= 8
#> 33 19
#> new_indicator == T
#> 374
#>
#> ── Agreement between flagging strategies
#>
#>
#> Flag new_indicator == T longest_string_length >= 8 ARS > 0.8
#> --------------------------- ------------------- --------------------------- ----------
#> new_indicator == T 374
#> longest_string_length >= 8 9 19
#> ARS > 0.8 16 9 33
The new indicator new_indicator is now included in the output of the summary function and can be compared with the other indicators.
The output of flag_resp() can be used to filter out the flagged respondents, because it is just a data frame of logicals:
flag_df <- flag_resp(
nep_resp_patterns_resp_styles,
ARS > 0.8,
longest_string_length >= 8,
ARS > 0.8 | longest_string_length >= 8)
flag_df
#> # A data frame: 1,222 × 4
#> id `ARS > 0.8` `longest_string_length >= 8` ARS > 0.8 | longest_string_l…¹
#> <int> <lgl> <lgl> <lgl>
#> 1 1 FALSE FALSE FALSE
#> 2 2 FALSE FALSE FALSE
#> 3 3 FALSE FALSE FALSE
#> 4 4 FALSE FALSE FALSE
#> 5 5 FALSE FALSE FALSE
#> 6 6 NA NA NA
#> 7 7 FALSE FALSE FALSE
#> 8 8 FALSE FALSE FALSE
#> 9 9 FALSE FALSE FALSE
#> 10 10 FALSE FALSE FALSE
#> # ℹ 1,212 more rows
#> # ℹ abbreviated name: ¹`ARS > 0.8 | longest_string_length >= 8`
We can use these to filter respondents from the original nep dataset. We can exclude the flagged respondents.
# Exclude the 33 flagged respondents with ARS > 0.8
nep[!flag_df$`ARS > 0.8`,] |>
na.omit() # exclude respondents with missing values
#> # A tibble: 904 × 15
#> bczd005a bczd006a bczd007a bczd008a bczd009a bczd010a bczd011a bczd012a
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 4 5 2 4 4 5 2
#> 2 5 2 4 2 5 2 4 1
#> 3 4 2 2 2 4 2 4 2
#> 4 2 2 4 4 3 4 3 2
#> 5 2 2 5 5 4 4 5 4
#> 6 5 2 5 2 5 4 4 2
#> 7 5 2 5 3 4 4 5 1
#> 8 5 2 4 4 4 3 5 2
#> 9 1 3 5 5 4 5 5 1
#> 10 4 2 4 2 5 3 2 2
#> # ℹ 894 more rows
#> # ℹ 7 more variables: bczd013a <dbl>, bczd014a <dbl>, bczd015a <dbl>,
#> # bczd016a <dbl>, bczd017a <dbl>, bczd018a <dbl>, bczd019a <dbl>
Alternatively, we can keep only the flagged respondents.
# Extract only the 33 flagged respondents with ARS > 0.8
nep[flag_df$`ARS > 0.8`,] |>
na.omit()
#> # A tibble: 33 × 15
#> bczd005a bczd006a bczd007a bczd008a bczd009a bczd010a bczd011a bczd012a
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 4 3 4 4 4 4 4 4
#> 2 4 4 5 5 4 4 5 3
#> 3 3 4 4 4 4 4 4 4
#> 4 4 4 4 4 4 4 4 4
#> 5 4 4 4 5 4 4 5 4
#> 6 4 4 5 3 4 4 4 4
#> 7 4 4 5 4 5 4 5 4
#> 8 4 4 3 5 4 4 4 4
#> 9 4 2 5 4 5 4 5 4
#> 10 4 4 4 4 4 4 4 4
#> # ℹ 23 more rows
#> # ℹ 7 more variables: bczd013a <dbl>, bczd014a <dbl>, bczd015a <dbl>,
#> # bczd016a <dbl>, bczd017a <dbl>, bczd018a <dbl>, bczd019a <dbl>
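Both filters rely on na.omit() to clean up afterwards, because indexing with a logical vector that contains NA produces rows filled with NA. The following base R sketch, on made-up data (toy_data and toy_flag are hypothetical), shows this behavior and two possible ways to handle missing flags explicitly:

```r
# Made-up data and flags; the third respondent has a missing flag
toy_data <- data.frame(id = 1:5, score = c(10, 20, 30, 40, 50))
toy_flag <- c(FALSE, TRUE, NA, FALSE, TRUE)

# Indexing with NA yields a row filled with NA for respondent 3
with_na_row <- toy_data[!toy_flag, ]
with_na_row

# which() drops NA: keep only respondents known to be unflagged
kept <- toy_data[which(!toy_flag), ]

# %in% TRUE treats NA as "not flagged": respondent 3 is retained
kept_with_na <- toy_data[!(toy_flag %in% TRUE), ]
```

Which variant is appropriate depends on whether respondents without a computable indicator should stay in the analysis sample.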
Notice that you can also use the id column in flag_df to join flag_df to your original data.
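A minimal sketch of such a join in base R, using merge(). The data frames toy_survey and toy_flag_df are made up, and the sketch assumes that the original data carries an identifier matching the id column of the flag data.

```r
# Made-up survey data with an explicit id column (assumption: the
# original data has an identifier that matches the id in the flag data)
toy_survey <- data.frame(id = 1:4, item1 = c(5, 4, 2, 3))

# Made-up flag data in the shape shown above: id plus logical columns
toy_flag_df <- data.frame(
  id = 1:4,
  `ARS > 0.8` = c(FALSE, TRUE, NA, FALSE),
  check.names = FALSE
)

# Join on id; all.x = TRUE keeps every respondent of the survey data
toy_joined <- merge(toy_survey, toy_flag_df, by = "id", all.x = TRUE)
toy_joined
```

Joining by id rather than by row position stays correct if either data frame has been sorted or subsetted in the meantime.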