Our Experiment: Each eyetrackingR vignette uses the eyetrackingR package to analyze real data from a simple 2-alternative forced choice (2AFC) word recognition task administered to 19- and 24-month-olds. On each trial, infants were shown a picture of an animate object (e.g., a horse) and an inanimate object (e.g., a spoon). After the infants inspected the images, the images disappeared and the infants heard a label referring to one of them (e.g., “The horse is nearby!”). Finally, the objects re-appeared on the screen and the infants were prompted to look at the target (e.g., “Look at the horse!”).
In this vignette, we want to ascertain when a predictor had a significant effect during a trial. Analyses that aggregate over the trial window tell us whether an effect was significant, growth curve analyses tell us the trajectory of the effect over the course of the trial, and onset-contingent analyses can give us reaction times for certain experimental designs. But none of these approaches lets us ask: What is the onset of a predictor’s effect, and how long does the effect last? eyetrackingR includes two types of analyses for answering these questions, both of which we cover here.
Before performing this analysis, we’ll need to prepare and clean our dataset. Here we do this quickly and with few notes; for more information, see the vignette on preparing your data.
set.seed(42)
library("Matrix")
library("lme4")
library("ggplot2")
library("eyetrackingR")
data("word_recognition")
data <- make_eyetrackingr_data(word_recognition,
                               participant_column = "ParticipantName",
                               trial_column = "Trial",
                               time_column = "TimeFromTrialOnset",
                               trackloss_column = "TrackLoss",
                               aoi_columns = c('Animate','Inanimate'),
                               treat_non_aoi_looks_as_missing = TRUE)
# subset to response window post word-onset
response_window <- subset_by_window(data,
                                    window_start_time = 15500,
                                    window_end_time = 21000,
                                    rezero = FALSE)
## Avg. window length in new data will be 5500
# analyze amount of trackloss by subjects and trials
(trackloss <- trackloss_analysis(data = response_window))
# remove trials with more than 25% trackloss
response_window_clean <- clean_by_trackloss(data = response_window,
                                            trial_prop_thresh = .25)
## Performing Trackloss Analysis...
## Will exclude trials whose trackloss proportion is greater than : 0.25
## ...removed 33 trials.
# create Target condition column
response_window_clean$Target <- as.factor(ifelse(test = grepl('(Spoon|Bottle)', response_window_clean$Trial),
                                                 yes = 'Inanimate',
                                                 no = 'Animate'))
Our first approach is to look for runs of significant differences between our conditions after smoothing the data in a series of time bins (similar to Wendt et al., 2014). This involves resampling the data with replacement many times, fitting a smoother to each resample (smooth.spline(), loess(), or no smoother), and then looking for time windows in which the bootstrapped confidence intervals exclude zero. This is a useful technique for estimating the timepoints of divergence between two conditions, and the smoothing helps remove minor deviations that might disrupt what would otherwise be considered a single divergent period. This can be especially helpful in infant data, which can be extremely noisy. Note that this approach does not explicitly control the Type-I error rate (i.e., it’s not a replacement for something like Bonferroni correction), and it can only deal with two-level factors.
This method returns a list of divergences between your two conditions based on time windows in which the 95% confidence intervals did not include 0 (i.e., p < .05).
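Before turning to the eyetrackingR functions, it may help to see the basic idea in miniature. The following is only a conceptual sketch under simplifying assumptions (the fabricated diff_curves matrix and every variable name are hypothetical, and this is not eyetrackingR’s internal implementation): resample participants’ difference curves, smooth each resampled mean curve, and flag the time bins whose bootstrap CI excludes zero.
# Conceptual sketch only -- NOT eyetrackingR's internal implementation.
# `diff_curves` stands in for a participants x time-bins matrix of
# within-subject differences in proportion looking (hypothetical data).
set.seed(1)
n_subj      <- 20
time_bins   <- seq(15500, 21000, by = 100)
diff_curves <- matrix(rnorm(n_subj * length(time_bins), mean = .1, sd = .2),
                      nrow = n_subj)
n_samples   <- 1000
boot_curves <- matrix(NA_real_, nrow = n_samples, ncol = length(time_bins))
for (s in seq_len(n_samples)) {
  # resample participants with replacement, average, then smooth the mean curve
  resampled        <- diff_curves[sample(n_subj, replace = TRUE), , drop = FALSE]
  mean_curve       <- colMeans(resampled)
  boot_curves[s, ] <- predict(smooth.spline(time_bins, mean_curve), time_bins)$y
}
# 95% CI at each time bin; a "divergence" is a run of bins whose CI excludes 0
ci_lower <- apply(boot_curves, 2, quantile, probs = .025)
ci_upper <- apply(boot_curves, 2, quantile, probs = .975)
divergent_bins <- time_bins[ci_lower > 0 | ci_upper < 0]
In practice, eyetrackingR wraps this logic (with proper handling of within- vs. between-subjects designs) in make_boot_splines_data and analyze_boot_splines, which we use below.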
To begin, we need to use make_time_sequence_data to generate a time-binned dataframe. The bootstrap analysis we’ll be doing requires that we summarize our data; traditionally, we’d want a by-participants summary (but in some circumstances, you might want a by-items summary as well).
response_time <- make_time_sequence_data(response_window_clean,
                                         time_bin_size = 100,
                                         predictor_columns = c("Target"),
                                         aois = "Animate",
                                         summarize_by = "ParticipantName")
# visualize timecourse
plot(response_time, predictor_column = "Target") +
  theme_light() +
  coord_cartesian(ylim = c(0, 1))
We can then use make_boot_splines_data to resample (samples times) from this dataset and fit a smoother to each sample:
bootstrapped_familiar <- make_boot_splines_data(response_time,
                                                predictor_column = 'Target',
                                                within_subj = TRUE,
                                                samples = 1000,
                                                alpha = .05,
                                                smoother = "smooth.spline")
We can then plot this curve. Because this is a within-subjects design, we get a single curve (and CI) corresponding to the difference between conditions. In between-subjects designs, you’ll get two curves (and CIs), corresponding to the estimates for each condition.
plot(bootstrapped_familiar)
## Plotting within-subjects differences...
Finally, we can look at each timepoint to see whether the CIs include 0, and search for runs of significant timepoints (i.e., divergences).
bootstrap_analysis_familiar <- analyze_boot_splines(bootstrapped_familiar)
summary(bootstrap_analysis_familiar)
## Divergences:
## 1: 15900 - 21000
This analysis suggests that the effect is significant for virtually the entire time-window, starting as early as 15900ms.
Note that, for designs such as this one, it’s often good to examine the effect in terms of the random structure that comes with items, not just participants. While hierarchical approaches helped us do this in previous vignettes, those approaches aren’t applicable to the boot-splines analysis. What we can do instead is repeat the boot-splines analysis, but this time collapsing within items instead of subjects. Ideally, both analyses should give very similar results.
eyetrackingR makes this easy. We simply make a new time_sequence dataset, this time summarized by item. When we compute the boot-splines this time, we want to set within_subj to FALSE; this argument now essentially means within_items, and that is not true of this dataset (each item was either animate or inanimate).
response_time_item <- make_time_sequence_data(response_window_clean,
                                              time_bin_size = 100,
                                              predictor_columns = c("Target"),
                                              aois = "Animate",
                                              summarize_by = "Trial") # <--- "Trial" corresponds to both item and trial
bootstrapped_familiar_item <- make_boot_splines_data(response_time_item,
                                                     predictor_column = 'Target',
                                                     within_subj = FALSE,
                                                     samples = 1000,
                                                     alpha = .05,
                                                     smoother = "smooth.spline")
plot(bootstrapped_familiar_item)
bootstrap_analysis_familiar_item <- analyze_boot_splines(bootstrapped_familiar_item)
plot(bootstrap_analysis_familiar_item)
summary(bootstrap_analysis_familiar_item)
## Divergences:
## 1: 15900 - 21000
Our second approach is to perform a different type of bootstrapping analysis, referred to as a cluster-based permutation analysis (Maris & Oostenveld, 2007). This analysis takes a summed statistic for each cluster of time bins that pass some level of significance, and compares each to the “null” distribution of sum statistics (obtained by bootstrap resampling data within the largest of the clusters).
This type of analysis should often give similar results to the above, but it has two main advantages. First, because each observed cluster is compared against a resampled “null” distribution, it controls the family-wise false-alarm rate across the many time bins rather than simply looking for runs of significant bins. Second, it supports multiple types of statistical tests (t.test, wilcox.test, lm, and lmer), so that continuous predictors, covariates, etc. can also be included in the model being tested.
Here’s the procedure eyetrackingR implements under the hood:
1. Run the chosen statistical test on each time bin.
2. Take the time bins whose test statistic exceeds the threshold, and group adjacent bins into time-clusters.
3. For each time-cluster, sum the statistics of the time bins it contains.
4. Take the data within the largest cluster, shuffle/resample it so that any true relationship between predictor and outcome is removed, and recompute the summed statistic; repeat this many times to build a “null” distribution of sum statistics.
5. Compare each observed cluster’s summed statistic to this null distribution; clusters falling outside the middle 95% of the distribution (for alpha = .05, two-tailed) are considered significant.
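As a rough illustration of steps 2-5 (a conceptual sketch only: the toy t-statistics, the sign-flipping shuffle, and every variable name here are hypothetical simplifications, not eyetrackingR’s internals):
# Conceptual sketch of the cluster-based permutation idea -- NOT eyetrackingR's internals.
set.seed(1)
t_stats <- c(0.5, 2.4, 3.1, 2.8, 1.1, 2.3, 2.6, 0.4) # step 1 assumed done: one t per time bin
threshold_t <- 2.06
# steps 2-3: group above-threshold bins by adjacency and sum the statistic per cluster
above <- t_stats > threshold_t
runs <- rle(above)
cluster_id <- rep(cumsum(runs$values) * runs$values, runs$lengths)
observed_sums <- tapply(t_stats[cluster_id > 0], cluster_id[cluster_id > 0], sum)
# step 4: "null" distribution of the largest cluster's sum, here faked by sign-flipping
null_sums <- replicate(1000, {
  flipped <- t_stats * sample(c(-1, 1), length(t_stats), replace = TRUE)
  runs_p <- rle(flipped > threshold_t)
  ids <- rep(cumsum(runs_p$values) * runs_p$values, runs_p$lengths)
  if (all(ids == 0)) 0 else max(tapply(flipped[ids > 0], ids[ids > 0], sum))
})
# step 5: each observed cluster's probability is its position within the null distribution
p_values <- sapply(observed_sums, function(s) mean(null_sums >= s))
In practice, eyetrackingR does all of this for you, resampling the actual data rather than sign-flipping summary statistics.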
To perform this procedure using eyetrackingR, we’ll need a time_sequence_data dataframe. In this case, we already made one above (response_time).
Next, we’ll set the threshold for the t-statistic that will be considered a divergence. This can be a source of misconceptions: the exact value of the initial threshold doesn’t matter too much, but you should set it in a principled way (e.g., don’t run the cluster analysis, examine the result, and then decide you want to use a different threshold). Here, we’ll just set it based on the t-distribution: ~2.06 corresponds to the critical value we would use for a two-tailed t-test with this sample size.
num_sub <- length(unique(response_window_clean$ParticipantName))
threshold_t <- qt(p = 1 - .05/2,
                  df = num_sub - 1) # pick threshold t based on alpha = .05, two-tailed
levels(response_window_clean$Target)
## [1] "Animate" "Inanimate"
We can then look for initial clusters:
df_timeclust <- make_time_cluster_data(response_time,
                                       test = "t.test",
                                       paired = TRUE,
                                       predictor_column = "Target",
                                       threshold = threshold_t)
## Computing t.test for each time bin...
plot(df_timeclust) +
  ylab("T-Statistic")
summary(df_timeclust)
## Test Type: t.test
## Predictor: Target
## Formula: Prop ~ Target
## Summary of Clusters ======
## Cluster Direction SumStatistic StartTime EndTime
## 1 1 Positive 132.29900 16100 19300
## 2 2 Positive 42.31067 19400 20800
The above tells us there are two potential clusters. As described in the procedure above, eyetrackingR next bootstraps a “null” distribution, which can be visualized:
clust_analysis <- analyze_time_clusters(df_timeclust, within_subj = TRUE, paired = TRUE,
                                        samples = 100) # in practice, you should use a lot more
plot(clust_analysis)
How can we interpret these results?
summary(clust_analysis)
## Test Type: t.test
## Predictor: Target
## Formula: Prop ~ Target
## Null Distribution ======
## Mean: 1.6325
## 2.5%: -20.3665
## 97.5%: 39.5539
## Summary of Clusters ======
## Cluster Direction SumStatistic StartTime EndTime Probability
## 1 1 Positive 132.29900 16100 19300 0.00
## 2 2 Positive 42.31067 19400 20800 0.03
The probabilities listed above tell us the probability of seeing an effect as big as each cluster’s (or bigger) by chance. So we can report these as p-values with alpha = .05 (two-tailed). Each cluster that passes this criterion corresponds to a stretch of time that exhibited a reliable effect.
The two methods described here each have advantages and disadvantages.
The boot-splines method is especially well-suited to simple designs with large but noisy effects. The splines help smooth over noise, and when effects are this large, forgoing the sensitivity-sacrificing corrections that control the false-alarm rate is less of a concern.
The time-cluster method is better suited to subtle but less noisy effects. This method is extremely sensitive while setting an upper bound on the family-wise false-alarm rate (the percentage of spurious experiments, regardless of the number of time bins). It also allows for a wider range of experimental designs and predictors that simply cannot be captured by the boot-splines method (e.g., continuous predictors, covariates, or anything else available to hierarchical models). The main disadvantage of this method is that it does not gracefully deal with noisy data. Because time bins are grouped simply by adjacency, a single noisy time bin (perhaps one with missing data) can break up a time-cluster that otherwise reliably exhibits an effect. In simple experimental designs with noisy data (many infant studies), the boot-splines method can avoid this pitfall.
Maris, E., & Oostenveld, R. (2007). Nonparametric statistical testing of EEG- and MEG-data. Journal of Neuroscience Methods, 164(1), 177–190.
Wendt, D., Brand, T., & Kollmeier, B. (2014). An Eye-Tracking Paradigm for Analyzing the Processing Time of Sentences with Different Linguistic Complexities. PLoS ONE, 9(6), e100186. http://doi.org/10.1371/journal.pone.0100186.t003