This package contains functions to compute various properties of laboratory or police lineups.
Since the early 1970s eyewitness testimony researchers have recognised the importance of estimating properties such as lineup bias (is the lineup biased against the suspect, leading to a rate of choosing higher than one would expect by chance?), and lineup size (how many reasonable choices are in fact available to the witness? A lineup is supposed to consist of a suspect and a number of additional members, or foils, whom a poor quality witness might mistake for the perpetrator). Lineup measures are descriptive, in the first instance, but since the earliest articles in the literature researchers have recognised the importance of reasoning inferentially about them.
The measures described below were originally proposed by Doob & Kirshenbaum (1973), Wells, Leippe & Ostrom (1979), Malpass (1981), Tredoux (1998, 1999), and Malpass, Tredoux & McQuiston-Surrett (2007). Most of the measures assume that a sample of mock witnesses has been given a description of the perpetrator, and a lineup, and asked to choose someone from the lineup (either as the best match to the description, or as the best match to the lineup member the police likely have in mind; see Malpass et al., 2007).
Much of the justification for what follows can be found in Tredoux (1998) and Malpass et al. (2007).
The majority of functions in r4lineups require a numeric vector representing lineup data, whilst other functions require you to pass a list of lineup data. It is advisable to read the documentation to determine the format of data needed by each specific function. Most r4lineups functions will not work with a dataframe, even if the dataframe contains only one column and thus appears to be in vector-like format. If your lineup data is already in dataframe format, you can easily convert it to a vector or a list.
For instance, if we have 10 mock witnesses, and each views a lineup with 6 members, then we could record their choices as a numeric vector, where the elements represent the position of the lineup member chosen by each witness:
lineup_vec <- c(3, 2, 5, 6, 1, 3, 3, 5, 6, 4)
However, it is also possible to represent this set of choices as a one-dimensional table:
lineup_tbl <- table(lineup_vec)
If we had our lineup choices recorded in a data frame, which is what we might end up with if we import the data from a spreadsheet program, then we could easily extract the variable representing choices in the dataframe into a vector.
For instance, if the lineup choices shown above for 10 mock witnesses were part of a larger dataset, and we had recorded the age and participant number of each participant, then the dataframe might appear as shown below (here we create the dataframe with simulated data).
participant_num <- paste("Partic_",1:10, sep = "")
age <- round(runif(10, 18, 80), 2)
line_df <- data.frame(participant_num, age, lineup_vec)
Show the dataframe:
line_df
Extract the lineup choices, per participant, to a vector: a_lineup_vec <- line_df$lineup_vec
Or make a table directly from the variable of choices: a_lineup_table <- table(line_df$lineup_vec)
One important issue is that there is no way for the functions in this package to know whether some of the members of a lineup received no choices. In other words, if there were in fact 7 lineup members in our running example, but member 7 was not chosen by any witnesses, then this would not be evident from the lineup vector, since it records the choice made by each witness, but not the number of lineup members. To deal with this problem, the functions in the package generally require nominal lineup size (the number of physical members of the lineup) to be passed as an argument.
Let’s look at a slightly larger data set:
lineup_1 | lineup_2 | lineup_3 |
---|---|---|
2 | 7 | 1 |
5 | 1 | 3 |
2 | 2 | 3 |
For functions that require data for only one lineup or one lineup pair, you can convert the data from a given lineup to a vector by calling, e.g., lineup_vec <- nortje2012$lineup_1
or alternatively lineup_vec <- nortje2012[[1]]
For functions that require several lineups or several lineup pairs, you can convert the data to a list, e.g., lineup_list <- as.list(nortje2012)
Nominal Size
The notion of ‘Nominal size’ refers to the number of physical members of a lineup. Functions relying on nominal size for accurate calculations require you to manually specify nominal size. If the specified nominal size is not reflected in the dataframe or vector passed to the function, execution of the function is halted with the following error message: “User-declared nominal size does not match observed nominal size. Please check vector of target positions.”
This package contains several sample datasets, listed below:
nortje2012: A dataframe of lineup choices for 3 independent lineups, from a study conducted by Nortje & Tredoux (2012)

line73: Lineup data from Doob & Kirshenbaum (1973)

mockdata: Lineup data from a study conducted by Nortje & Tredoux (2012)

mickwick: Confidence and accuracy data for lineup choices. Sample data to be passed to the ROC function make_roc()

face_sim: Data for the face similarity example; the face images used can be found at https://github.com/tmnaylor/malpass_foils

All sample datasets can be loaded by calling data(sample_data). Further details can be found in the documentation for each dataset.
Proportion refers to the proportion of witnesses selecting a particular member of a lineup. This package provides calculations of proportion for individual lineup members (lineup_prop_vec()), and for all members of a lineup with a subsidiary function (allprop()).
The calculation of the proportion of mock witnesses choosing the suspect is essentially a measure of lineup bias (see Malpass, 1981; Tredoux, 1998), especially when one tests whether it exceeds the proportion expected by chance guessing alone (1/nominal_size).
This function requires three arguments: a vector of lineup data, the position of the target in the lineup (target_pos), and nominal size (k). The example use of the function here returns the proportion of mock witnesses choosing the target/lineup member occupying position 3 in the lineup, which is 0.14 (i.e., about 14% of the witnesses).
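A minimal usage sketch (the argument names target_pos and k follow the description above; check the package documentation for the exact call signature):
#Proportion of mock witnesses choosing the lineup member in position 3,
#for a lineup with nominal size 8
lineup_prop_vec(lineup_vec, target_pos = 3, k = 8)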
Compute bootstrapped estimates: The original work on lineup bias suggested testing whether the computed proportion = 1/nominal_size using a z test of proportions, or a binomial probability (see Tredoux, 1998), or a confidence interval around the proportion (see Tredoux, 1998), but very few studies actually use enough mock witnesses to strongly warrant this approach, since it assumes approximation to a normal distribution, and the size of the lineup places hard limits that are often not respected by the computed confidence intervals. We therefore recommend computing bootstrapped estimates of all lineup measures, and especially of the sampling variability around the estimates. To calculate a bootstrapped lineup proportion for a single lineup member, we pass the lineup_prop_boot() function as an argument to the boot() function supplied by the package ‘boot’. We do not work directly with the lineup_prop_boot() function; all we need to know to bootstrap is the lineup vector, the position of the target (target_pos), and the number of bootstrap resamples we desire (R).
bootobject <- boot::boot(lineup_vec, lineup_prop_boot, target_pos = 3, R = 1000)
bootobject
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot::boot(data = lineup_vec, statistic = lineup_prop_boot, R = 1000,
target_pos = 3)
Bootstrap Statistics :
original bias std. error
t1* 0.1428571 0.0005714286 0.02997173
cis <- boot::boot.ci(bootobject, conf = 0.95, type = "bca")
cis
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot::boot.ci(boot.out = bootobject, conf = 0.95, type = "bca")
Intervals :
Level BCa
95% ( 0.0827, 0.2030 )
Calculations and Intervals on Original Scale
If you are not familiar with how the boot or boot.ci functions work, you could consult the documentation for the boot package. However, by using the functions as shown here you will be able to bootstrap without needing to understand them in detail. Briefly, the argument ‘conf’ specifies the confidence level (e.g., 0.95 for a 95% interval), and ‘type’ specifies the type of confidence interval to be returned. If type = “bca”, the function will return bias-corrected and accelerated (BCa) confidence intervals.
Should you want to calculate a proportion for each member of a given lineup, the allprop() function can be called. This might be useful when there is no pre-specified suspect in the lineup, as is sometimes the case in laboratory experiments. Instead of manually specifying the target position, a vector indexing all target positions is automatically generated from the argument ‘k’, within the function itself. The function then returns the proportion of mock witnesses selecting each lineup member.
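A minimal usage sketch (assuming, per the description above, that the function takes the lineup vector and nominal size k):
#Proportion of mock witnesses choosing each of the 8 lineup members
allprop(lineup_vec, k = 8)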
Compute bootstrapped estimates: We can generate bootstrap estimates for all the computed proportions, too, with lineup_boot_allprop(). This function takes a vector of lineup data, a vector indexing target positions, and nominal size.
target_pos <- 1:8  #assumed here: all lineup positions, 1 through 8
lineuprops.ci <- lineup_boot_allprop(lineup_vec, target_pos, k = 8)
lineuprops.ci
ci_low ci_high
1 0.083 0.203
2 0.075 0.188
3 0.083 0.202
4 0.015 0.083
5 0.203 0.361
6 0.015 0.083
7 0.113 0.233
8 0.008 0.060
The object lineuprops.ci is a dataframe of 2 columns and k rows, in which columns 1 and 2 contain the lower and upper confidence limits, respectively, and the number of rows = nominal size.
Note: k = nominal size, and must always be explicitly declared (for more info, see the Data Format section, above).
We defined nominal size above as the number of physical members of a lineup, but it is clear from experience that not all lineup members are good foils, and some might not end up being chosen at all. They are not good tests of witness memory, in short. Wells et al. (1979) proposed the notion of ‘functional size’ as a measure of the number of plausible lineup members, and defined it as the inverse of the proportion of mock witnesses choosing the suspect.
Functional size = \(\frac{N}{D}\), where N is the total number of mock witnesses and D is the number of mock witnesses who choose the suspect.
You can compute functional size by inverting the result of the lineup_prop_vec() function, or by calling func_size(), but we suggest you use the function func_size_report(), which provides more detail, including bootstrapped confidence intervals.
To compute functional size with confidence intervals, call func_size_report(), and pass a vector of lineup data, as well as one scalar indicating target position, and another indicating nominal size:
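A minimal usage sketch (the argument names are assumptions based on the description above; consult the function documentation for the exact names):
#Functional size, with bootstrapped confidence intervals, for the
#suspect in position 3 of an 8-member lineup
func_size_report(lineup_vec, target_pos = 3, k = 8)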
The measure of functional size proposed by Wells et al. (1979) does not take the choosing rates of foils other than the suspect into account, and Malpass (1981) thus proposed a measure of the number of plausible lineup foils that takes all the data from a lineup table into account. He called this measure ‘effective size’.
Malpass’ Effective size A = \(k_{a}-\sum_{i=1}^{k_{a}}\frac{|o_{i}-e_{a}|}{2e_{a}}\)
where \(o_{i}\) is the (observed) number of mock witnesses who choose lineup member i; \(e_{a}\) is the adjusted nominal chance expectation [N(1/\(k_{a}\))], and \(k_{a}\) is the adjusted nominal number of alternatives in the lineup (the original number of members minus the number of null foils).
Malpass’ measure of effective size poses problems for statistical inference based on distribution theory, though, as argued by Tredoux (1998). He instead proposed an alternate measure of effective size, known as E’, derived from earlier work by Agresti and Agresti (1978).
This package allows you to calculate effective size in several different ways.
Malpass’s original (1981) & adjusted effective size (see: Tredoux, 1998)
Malpass’ original formula for effective size used a denominator based on the number of foils that attracted non-zero choice frequencies. Tredoux (1998) argued against this practice, pointing out that zero frequencies could arise from random guessing with appreciable frequency. He suggested a minor amendment of Malpass’ original formula, as follows.
Malpass’ Effective size B = \(k-\sum_{i=1}^{k}\frac{|o_{i}-e|}{2e}\)
Both Malpass’ measures of effective size can be computed by calling the esize_m() function and passing a table of lineup data. If ‘both = FALSE’ is passed as an argument, only Malpass’s adjusted formula for effective size is used. If both = TRUE, both Malpass’s original and adjusted calculations of effective size are returned. You must also specify nominal size.
esize_m(table(lineup_vec), k = 8, both = TRUE)
Effective size (Malpass, 1981) = 5.962406
Effective size (Malpass, 1981,
adj Tredoux, 1998) = 5.962406
[1] 5.962406
Compute bootstrapped estimates (Malpass’s adjusted):
Tredoux (1998) argued that it is better to use the alternate measure E’, but this was at a time when bootstrapping methods for statistical inference were not widely known, and we therefore include a function that allows you to compute bootstrap estimates of the measures, with confidence intervals.
To do this, one must first generate bootstrap samples using the gen_boot_samples() function, then compute effective size for each sample with the gen_esize_m() function, and finally obtain confidence limits with the gen_esize_m_ci() function. Their usage is shown below.
#Create a dataframe of bootstrapped lineup data
bootdata <- gen_boot_samples(lineup_vec, 1000)
#Calculate effective size for each lineup in bootdata
#Pass bootstrap df to function
#Nominal size is again declared by user
lineupsizes <- gen_esize_m(bootdata, k = 8)
#Pass vector of bootstrapped effective sizes to gen_esize_m_ci
#to calculate lower and upper CIs with desired level of alpha
gen_esize_m_ci(lineupsizes, perc = .025)
2.5%
5.345865
gen_esize_m_ci(lineupsizes, perc = .975)
97.5%
6.323308
Compute bootstrap descriptive statistics (Malpass’s adjusted effective size):
This function allows you to calculate descriptive statistics for a bootstrapped vector/df of effective sizes (see the object lineupsizes, above). We pass this to gen_boot_propmean_se(). Descriptive statistics are reported in some detail.
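A minimal usage sketch (assuming the function's only required argument is the vector of bootstrapped effective sizes created above):
#Descriptive statistics for the bootstrapped effective sizes in lineupsizes
gen_boot_propmean_se(lineupsizes)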
Tredoux (1998) proposed a measure of effective size that is amenable to statistical inference using distribution theory.
E’ = \(\frac{1}{1-I}\), where I = \(1-\sum_{i=1}^{k}(\frac{O_i}{N})^2\),
and where k, N, and \(O_i\) are all as defined earlier.
To compute Tredoux’s measure of effective size, E’, use the function esize_T(). We can compute confidence intervals around E’ using normal distribution theory, and using bootstrap intervals, with the function esize_T_boot() from r4lineups, as well as boot() & boot.ci() from the boot package. Usage is shown below.
#Compute effective size from a table of lineup choices
#(lineup_table is a one-dimensional table of choices, e.g. table(lineup_vec))
esize_T(lineup_table)
[1] 5.693273
#Compute bootstrapped effective size
esize_boot <- boot::boot(lineup_table, esize_T_boot, R = 1000)
#View boot object
esize_boot
ORDINARY NONPARAMETRIC BOOTSTRAP
Call:
boot::boot(data = lineup_table, statistic = esize_T_boot, R = 1000)
Bootstrap Statistics :
original bias std. error
t1* 5.693273 0.2036739 0.7642065
#Get confidence intervals
esize_boot.ci <- boot::boot.ci(esize_boot)
Warning in boot::boot.ci(esize_boot): bootstrap variances needed for
studentized intervals
esize_boot.ci
BOOTSTRAP CONFIDENCE INTERVAL CALCULATIONS
Based on 1000 bootstrap replicates
CALL :
boot::boot.ci(boot.out = esize_boot)
Intervals :
Level Normal Basic
95% ( 3.992, 6.987 ) ( 3.971, 6.972 )
Level Percentile BCa
95% ( 4.415, 7.416 ) ( 4.131, 7.024 )
Calculations and Intervals on Original Scale
Some BCa intervals may be unstable
Effective Size Per Foils
Malpass (1981) proposed an additional method for computing effective size, based on the number of foils who are chosen at a rate nearly equal to chance expectation. We propose that ‘nearly equal’ could mean that chance expectation falls within a confidence interval constructed around the proportion of mock witnesses choosing the lineup member.
To calculate effective size by counting the number of foils who fall within the CI for chance guessing, we pass a vector of lineup data, a vector of target positions and nominal size. This function returns effective size from a set of bootstrapped data. Default alpha is 0.05, but this can be adjusted as desired.
Applying this function to our data, we see that effective size for this lineup is 4.
To test if the effective sizes of two independent lineups are significantly different from one another, we can take the difference between the effective sizes and calculate confidence intervals. This function, effsize_compare(), requires a dataframe containing lineup data for two independent lineups. Here, we create a new dataframe containing two independent lineups from nortje2012.
#Get data
linedf <- nortje2012[1:2]
#Compare effective size
effsize_compare(linedf)
The two Effective sizes are 5.693273 4.967425
If the interval includes 0, ns at p = .05
Confidence intervals of difference [95%]
Normal Theory -0.236 1.75
Bootstrap: percentile (R = 1000) -0.3 1.71
Bootstrap: bias-corrected (R = 1000) -0.217 1.752
The effective size for lineup_1 is therefore not significantly different from the effective size for lineup_2.
Wells and Lindsay (1980) proposed the use of a ‘diagnosticity ratio’ (essentially a risk ratio) computed from witness choices in a target present lineup and a target absent lineup (i.e., the ratio of the suspect choosing rates in the two lineups). Such a ratio can tell us about the trade-off between hits and false positives, and can be used to evaluate modifications to lineup procedures, for instance.
We can calculate the diagnosticity ratio for a set of identification parades using either diag_ratio_W(), for Wells & Lindsay’s (1980) diagnosticity ratio formula, or diag_ratio_T(), for Wells’ adjusted diagnosticity ratio (see: Tredoux, 1998).
Both functions require the same set of arguments: a vector of lineup choices in which the target was present (TP), a vector of lineup choices in which the target was absent (TA), as well as target positions and nominal size for TP and TA lineups, respectively. These last arguments are specified explicitly.
#Target present & target absent lineup data for 1 lineup pair
TP_lineup <- nortje2012$lineup_1
TA_lineup <- nortje2012$lineup_3
#Compute diagnosticity ratio
diag_ratio_W(TP_lineup, TA_lineup, pos_pres = 3, pos_abs = 7, k1 =8, k2 = 8)
[1] 6.333333
diag_ratio_T(TP_lineup, TA_lineup, pos_pres = 3, pos_abs = 7, k1 =8, k2 = 8)
[1] 5.571429
We can also calculate the variance of the diagnosticity ratio, which is useful for applying some inferential methods:
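A minimal sketch, assuming var_diag_ratio() takes the same arguments as diag_ratio_W() (check the function documentation for the exact signature):
#Variance of the diagnosticity ratio for the same lineup pair
var_diag_ratio(TP_lineup, TA_lineup, pos_pres = 3, pos_abs = 7, k1 = 8, k2 = 8)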
Note: var_diag_ratio() relies on Wells & Lindsay’s original formula for computing the diagnosticity ratio. See Tredoux (1998) for further details.
We can also compare diagnosticity ratios to one another. Tredoux (1998) showed a method for assessing the equivalence of k independent diagnosticity ratios, in particular, and the functions included in this package are based on his approach to calculating the homogeneity of k independent diagnosticity ratios.
The homogeneity function requires a dataframe containing the specific parameters used in the estimation of diagnosticity ratio homogeneity. For each lineup pair (TP and TA), we need to calculate the natural log of the diagnosticity ratio (referred to here as lnd), its variance (var_lnd), and a weight for each ratio that is equal to the inverse of its variance (wi). This is used to calculate a pooled estimator (i.e., the mean of the set of diagnosticity ratios). From this, we compute a chi-square deviate with k-1 degrees of freedom, and calculate its significance.
All of these calculations are built into the homog_diag() function, and so do not need to be carried out separately: the function constructs the dataframe of lnd, var_lnd and wi from the lineup data you supply, and all subsequent calculations are performed on this dataframe.
To prepare the data required by homog_diag(), we follow these steps:
For each lineup pair, load data for TP and TA lineups into separate vectors. Then, make a list containing all vectors for TP lineups, and another containing the TA lineups. The positions of each lineup pair should correspond in each list.
#Target present data:
TP_lineup1 <- round(runif(100,1,6))
TP_lineup2 <- round(runif(70,1,5))
TP_lineup3 <- round(runif(20,1,4))
lineup_pres_list <- list(TP_lineup1, TP_lineup2, TP_lineup3)
#Target absent data:
TA_lineup1 <- round(runif(100,1,6))
TA_lineup2 <- round(runif(70,1,5))
TA_lineup3 <- round(runif(20,1,4))
lineup_abs_list <- list(TA_lineup1, TA_lineup2, TA_lineup3)
Next, the function requires a list of target positions for each lineup pair. For each set of TP/TA data in the TP/TA lists, there should be a corresponding vector of target positions in the position list.
lineup1_pos <- c(1, 2, 3, 4, 5, 6)
lineup2_pos <- c(1, 2, 3, 4, 5)
lineup3_pos <- c(1, 2, 3, 4)
pos_list <- list(lineup1_pos, lineup2_pos, lineup3_pos)
To ensure the data have been coded accurately, we then specify nominal size for each lineup pair. The order in which nominal size for each lineup pair is listed must also correspond with the positions of each respective lineup in the lineup lists (i.e., if lineup 1 has k = 6, then the first element of vector ‘k’ = 6). Following the example above, we create our nominal size vector k by calling k <- c(6, 5, 4)
Finally, we pass the above arguments/data to homog_diag(), which assesses the homogeneity of k independent diagnosticity ratios.
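A minimal usage sketch (the argument order and names are assumptions based on the steps above; consult the function documentation for the exact signature):
#Nominal size for each lineup pair, as described above
k <- c(6, 5, 4)
#Assess homogeneity of the three independent diagnosticity ratios
homog_diag(lineup_pres_list, lineup_abs_list, pos_list, k)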
This function returns bootstrapped estimates of the homogeneity of k diagnosticity ratios, and therefore takes the same arguments as homog_diag() (which provides normal theory estimates).
This function does not require you to specify a list of target positions (this is generated from the data).
Thus, we follow steps 1 and 3, outlined above, before calling homog_diag_boot():
Note: R = number of bootstrap replications
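A minimal usage sketch (again, the argument order and names are assumptions; the list of target positions is omitted because, as noted above, it is generated from the data):
#Bootstrapped test of homogeneity for the three diagnosticity ratios
homog_diag_boot(lineup_pres_list, lineup_abs_list, k, R = 1000)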
The make_roc() function allows you to compute and plot an ROC curve for data from an eyewitness experiment, where accuracy is recorded for target present and target absent lineups. We follow the ideas suggested by Mickes, Wixted, and others (see references). This function is experimental, and we don’t offer much control over the output.
confidence | accuracy |
---|---|
1 | 1 |
1 | 1 |
1 | 1 |
1 | 1 |
2 | 1 |
2 | 1 |
2 | 1 |
3 | 1 |
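A minimal usage sketch, assuming the confidence/accuracy data shown above come from the mickwick sample dataset and that make_roc() takes that dataframe as its argument:
#Compute and plot an ROC curve from the confidence and accuracy data
make_roc(mickwick)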
Following some ideas suggested by Tredoux (2002), this function computes the degree to which each face in a set of faces loads onto a common factor computed from numeric representations of the faces.
The images used in this example can be found at https://github.com/tmnaylor/malpass_foils, and can be loaded by calling image_read from the magick package:
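For example (the file path below is hypothetical; point image_read at wherever you saved the downloaded images):
#Load one of the downloaded face images into R
foil1 <- magick::image_read("foils/foil1.jpg")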
print(foil1)
format width height colorspace matte filesize density
1 JPEG 138 154 sRGB FALSE 33506 72x72
Without passing any arguments, call face_sim():
A dialog box will appear, through which you may access your OS’s file explorer. Navigate to the folder in which the images are stored, select all, and open.
Note: Each image should be of one face only. Each face should be standardised to have the same pupil locations in xy space (this will mean repositioning and resizing each face image).