Summary

The gosset package provides tools and methods to implement a workflow for analysing data from agricultural experiments, from data synthesis to model selection and visualisation. The package is named after W.S. Gosset, aka ‘Student’, a pioneer of modern statistics in small-sample experimental design and analysis.

In this example I show one of the possible workflows to assess trait prioritization and crop performance using decentralized on-farm data generated with the tricot approach [1]. I use the breadwheat dataset from gosset.

Trait prioritization

First I load the packages and data.

library("PlackettLuce")
library("gosset")
library("climatrends")
library("nasapower")

data("breadwheat", package = "gosset")

head(breadwheat)

dat <- breadwheat

Next, I compare trait prioritization, using all the traits except overall performance.

traits <- c("yield",
            "grainquality",
            "germination")

# names of the columns with the varieties
items <- paste0("variety_", letters[1:3])

# names of the varieties
itemnames <- sort(unique(unlist(dat[items])))

Then I build the rankings with the function rank_tricot(), which parses the data from the data.frame into an object of class rankings. Inside the loop I also validate the data, removing observations where the best and worst answers are the same and observations with missing values (NAs). From the list of rankings objects I fit one Plackett-Luce model per trait with PlackettLuce() [2]. This model estimates the worth of each variety, that is, the probability that it outperforms all the others for that trait, based on Luce’s axiom [3].
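To show what the validation in the loop keeps and drops, and why a best/worst answer is enough to rank three varieties, here is a base-R sketch on a small made-up data frame. It assumes, for illustration, that the best and worst answers are coded "A", "B" and "C" for variety_a, variety_b and variety_c. tricot_order() is a hypothetical helper, not part of gosset; rank_tricot() does this conversion for you.

```r
# Made-up data in the same layout as d_i inside the loop: three variety
# columns, then the best and worst answers for one trait, assumed to be
# coded "A", "B", "C" for variety_a, variety_b, variety_c.
toy <- data.frame(
  variety_a   = c("V1", "V1", "V2", "V3"),
  variety_b   = c("V2", "V3", "V3", "V1"),
  variety_c   = c("V3", "V2", "V1", "V2"),
  yield_best  = c("A", "B", "Not observed", "C"),
  yield_worst = c("C", "B", "A", "A"),
  stringsAsFactors = FALSE
)

# Same validation as in the loop: "Not observed" becomes NA, then rows with
# best == worst or a missing answer are dropped.
toy[toy == "Not observed"] <- NA
toy_keep <- toy[[4]] != toy[[5]] & !is.na(toy[[4]]) & !is.na(toy[[5]])
toy_valid <- toy[toy_keep, ]

# Hypothetical helper: with three varieties, best and worst fix the full
# order, because the remaining variety must be in the middle.
tricot_order <- function(row) {
  varieties <- c(A = row[["variety_a"]],
                 B = row[["variety_b"]],
                 C = row[["variety_c"]])
  best   <- row[["yield_best"]]
  worst  <- row[["yield_worst"]]
  middle <- setdiff(c("A", "B", "C"), c(best, worst))
  unname(varieties[c(best, middle, worst)])
}

toy_keep                       # TRUE FALSE FALSE TRUE
tricot_order(toy_valid[1, ])   # "V1" "V2" "V3"
tricot_order(toy_valid[2, ])   # "V2" "V1" "V3"
```

The second row is dropped because best and worst are equal, and the third because the best answer is missing, so only two complete rankings survive.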

# build the rankings and put them into a list, one element per trait
pldat <- list()

# run over the rankings of each trait
for(i in seq_along(traits)){
  
  # select the variety columns and the best/worst answers for trait i
  d_i <- dat[, c(items, paste0(traits[i], c("_best", "_worst")))]
  # treat "Not observed" as NA
  d_i[d_i == "Not observed"] <- NA
  # flag valid rows: best differs from worst and neither is missing
  keep <- d_i[[4]] != d_i[[5]] & !is.na(d_i[[4]]) & !is.na(d_i[[5]])
  # keep only the valid observations
  d_i <- d_i[keep, ]
  
  names(d_i)[4:5] <- c("best", "worst")
  
  R_i <- rank_tricot(d_i, items = 1:3, input = c("best", "worst"))
  
  pldat[[i]] <- R_i
  
}

pldat

# fit one Plackett-Luce model per trait
mod <- lapply(pldat, PlackettLuce)
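To make the model concrete, here is a base-R sketch of what a Plackett-Luce fit estimates. It is only an illustration: pl_prob() and negloglik() are hypothetical helpers, the rankings are made up, and the fit uses a generic optim() maximum-likelihood search, whereas PlackettLuce() uses its own estimation methods and also handles partial rankings and ties.

```r
# Illustrative sketch only (not the PlackettLuce() implementation).
toy_items <- c("A", "B", "C")

# Probability of a full ordering under Plackett-Luce: at each stage the next
# item is chosen with probability worth / sum(worth of the items still left),
# i.e. Luce's choice axiom applied repeatedly.
pl_prob <- function(ordering, worth) {
  p <- 1
  remaining <- ordering
  for (item in ordering[-length(ordering)]) {
    p <- p * worth[[item]] / sum(worth[remaining])
    remaining <- setdiff(remaining, item)
  }
  p
}

# Made-up complete rankings, each listed from best to worst.
toy_rankings <- list(
  c("A", "B", "C"), c("A", "B", "C"), c("A", "B", "C"),
  c("B", "A", "C"), c("B", "A", "C"),
  c("C", "A", "B")
)

# Negative log-likelihood in terms of log-worth; the log-worth of "A" is
# fixed at 0 so that the other two parameters are identifiable.
negloglik <- function(theta) {
  worth <- exp(setNames(c(0, theta), toy_items))
  -sum(log(vapply(toy_rankings, pl_prob, numeric(1), worth = worth)))
}

fit <- optim(c(0, 0), negloglik, method = "BFGS")
toy_worth <- exp(setNames(c(0, fit$par), toy_items))
toy_worth <- toy_worth / sum(toy_worth)   # rescale so that worth sums to 1
round(toy_worth, 3)

# Worked check of Luce's axiom: 0.5/1 * 0.3/0.5 = 0.3
pl_prob(c("A", "B", "C"), c(A = 0.5, B = 0.3, C = 0.2))
```

Once the worth values sum to 1, the worth of a variety is exactly its probability of being ranked first among all of them, which is the interpretation used in the text.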

The function worth_map() is a visualization tool that helps to identify variety performance across traits. It plots the log-worth (worth on the log scale) of each variety for each trait. Positive values indicate superior performance for a given trait, while negative values indicate underperformance.

worth_map(mod, labels = traits)
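Why log-worth can be read as above or below average can be sketched in base R with made-up worth values. This assumes, for illustration only, that log-worth is centred on the geometric mean of the varieties within each trait; worth_map() may use a different reference.

```r
# Made-up worth estimates: rows are traits, columns are varieties;
# each row sums to 1.
toy_w <- rbind(
  yield        = c(V1 = 0.5, V2 = 0.3, V3 = 0.2),
  grainquality = c(V1 = 0.2, V2 = 0.4, V3 = 0.4),
  germination  = c(V1 = 1/3, V2 = 1/3, V3 = 1/3)
)

# Centre log-worth within each trait: a variety at the trait's geometric
# mean gets 0, better varieties are positive and worse ones negative.
toy_lw <- log(toy_w) - rowMeans(log(toy_w))
round(toy_lw, 2)
```

With equal worth, as for germination here, every variety sits at 0, so no variety stands out for that trait.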

Crop performance with environmental data

Here I combine the overall performance of the varieties with environmental data to assess crop variety performance, an approach also used by van Etten et al. (2019) [4]. I use the function temperature() from the climatrends package [5] to compute temperature indices for the first 80 days after planting at each data point. The climate data are NASA POWER data, retrieved through the nasapower package [6].

temp <- temperature(dat[, c("lon","lat")], 
                    day.one = dat[, "planting_date"],
                    span = 80)
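The idea behind the two indices used in the tree below can be sketched in base R with a made-up daily series for a single site. This is not how temperature() works internally: it assumes that the window starts on the planting day and covers span consecutive days, and that maxDT and maxNT are the maxima of the daily maximum (day) and daily minimum (night) temperatures within that window.

```r
# Made-up daily temperatures for one site (not NASA POWER data).
set.seed(1)
toy_dates <- seq(as.Date("2020-01-01"), by = "day", length.out = 120)
toy_tmax  <- 25 + 5 * sin(seq_along(toy_dates) / 20) + rnorm(120)  # day
toy_tmin  <- toy_tmax - 10                                       # night

planting <- as.Date("2020-01-15")
toy_span <- 80

# Logical index of the first 80 days, counting the planting day as day 1.
in_window <- toy_dates >= planting & toy_dates < planting + toy_span

toy_idx <- c(maxDT = max(toy_tmax[in_window]),   # maximum day temperature
             maxNT = max(toy_tmin[in_window]))   # maximum night temperature
toy_idx
```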

Then I build the rankings with rank_tricot() again, but now as an object of class grouped_rankings by adding the argument group = TRUE. This allows the rankings to be linked to covariates, in this case the temperature indices. Then, with the function pltree(), I fit a Plackett-Luce tree using the maximum night temperature (maxNT) and the maximum day temperature (maxDT) as covariates.

R <- rank_tricot(dat, 
                 items = c("variety_a","variety_b","variety_c"), 
                 input = c("overall_best","overall_worst"),
                 group = TRUE)

pld <- cbind(R, temp)

pl <- pltree(R ~ maxNT + maxDT, 
             alpha = 0.1,
             gamma = TRUE,
             data = pld)

The functions below help to visualize the tree and to identify the varieties with the best performance in each node.

plot(pl)

node_rules(pl)

top_items(pl, top = 5)

worst_regret(pl)

worth_map(pl)
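The decision logic behind picking top varieties per node and ranking them by worst regret can be sketched in base R with made-up worth values for two tree nodes. This only illustrates the minimax-regret idea; top_items() and worst_regret() compute their own versions from the fitted tree, and their exact method may differ.

```r
# Made-up worth estimates: rows are tree nodes, columns are varieties.
node_worth <- rbind(
  node2 = c(V1 = 0.5, V2 = 0.3, V3 = 0.2),
  node3 = c(V1 = 0.1, V2 = 0.5, V3 = 0.4)
)

# Best variety in each node.
best_per_node <- colnames(node_worth)[apply(node_worth, 1, which.max)]

# Regret of a variety in a node: best worth in that node minus its own
# worth. Its worst regret is the largest regret over all nodes.
regret <- apply(node_worth, 1, max) - node_worth
toy_worst_regret <- apply(regret, 2, max)

best_per_node            # "V1" "V2"
sort(toy_worst_regret)   # V2 has the smallest worst regret (0.2)
```

Here V1 is the best variety in node2 but loses most in node3, while V2 never falls far behind the best variety in either node, so it would be the safer choice if the node a farm belongs to is uncertain.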

References

1. van Etten, J., Manners, R., Steinke, J., Matthus, E. & de Sousa, K. The tricot approach: Guide for large-scale participatory experiments. (Alliance of Bioversity International and CIAT, 2020).
2. Turner, H. L., van Etten, J., Firth, D. & Kosmidis, I. Modelling rankings in R: the PlackettLuce package. Computational Statistics 35, 1027–1057 (2020).
3. Luce, R. D. Individual Choice Behavior. 153 (Courier Corporation, 1959).
4. van Etten, J., de Sousa, K., Aguilar, A., Barrios, M., et al. Crop variety management for climate adaptation supported by citizen science. Proceedings of the National Academy of Sciences 116, 4194–4199 (2019).
5. de Sousa, K., van Etten, J. & Solberg, S. Ø. Climatrends: Climate variability indices for ecological modelling. (2020).
6. Sparks, A. H. nasapower: A NASA POWER Global Meteorology, Surface Solar Energy and Climatology Data Client for R. Journal of Open Source Software 3, 1035 (2018).