This example shows how to evaluate prediction accuracy with multiple environments and multiple traits, using random partitions for cross-validation. The WheatToy data set is available in the BMTME package and is loaded with the data() function.
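A minimal sketch of the loading step, assuming the BMTME package is installed and that the data set is registered under the name WheatToy:

```r
# Assumption: BMTME is installed, e.g. via install.packages("BMTME")
library(BMTME)

# Load the toy wheat data set shipped with the package
data("WheatToy")

# Quick look at the two objects that are now available
dim(genoWheatToy)
head(phenoWheatToy)
```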
Loading the data set places the following two objects in the R environment:
genoWheatToy: Genomic matrix of the data set.
phenoWheatToy: Phenotypic data from the data set.
Since the data set includes multiple environments and multiple traits, the BMORS model can be used. We will implement it with three different designs for the linear predictor: (1) the line effect only, (2) the environment and line effects, and (3) the environment and line effects plus the interaction between environments and lines. Remember that the first step is to order the data set.
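One way to do the ordering is sketched below. The sort keys are an assumption: the rows are sorted first by environment (Env) and then by line (Gid), the two identifier columns used in the design matrices later in this example.

```r
library(BMTME)
data("WheatToy")

# Assumption: order by environment first, then by line within environment
phenoWheatToy <- phenoWheatToy[order(phenoWheatToy$Env, phenoWheatToy$Gid), ]

head(phenoWheatToy)
```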
To use only the line effect in the linear predictor, we first need the design matrix of the lines. It is obtained by multiplying the incidence matrix of the lines by the Cholesky factor of the genomic matrix.
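The construction can be sketched as follows. It mirrors the line-effect code listed for the full model later in this example, and assumes the BMTME package and the WheatToy data are available:

```r
library(BMTME)
data("WheatToy")

# Factor of the genomic relationship matrix
LG <- cholesky(genoWheatToy)

# Incidence matrix linking each phenotypic record to its line
ZG <- model.matrix(~0 + as.factor(phenoWheatToy$Gid))

# Design matrix of the lines, carrying the genomic information
Z.G <- ZG %*% LG
```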
Once the design matrix has been generated, the linear predictor is constructed. For this purpose we create a new object called ETA, a list of nested lists in which each sublist holds one design matrix together with the model used for it. In this case it contains only the design matrix of the lines.
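A sketch of this object, assuming the same list layout (X for the design matrix, model for the prior) that the three-term predictor ETA3 uses later in this example:

```r
library(BMTME)
data("WheatToy")

# Design matrix of the lines (same construction as in the text)
Z.G <- model.matrix(~0 + as.factor(phenoWheatToy$Gid)) %*% cholesky(genoWheatToy)

# Linear predictor with a single term: the line effect
ETA <- list(Gen = list(X = Z.G, model = "BRR"))

names(ETA)
```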
Next, we prepare the data needed to build the cross-validation:
pheno <- phenoWheatToy[, c(1:3)] #Use only the first trait to do a cv
colnames(pheno) <- c('Line', 'Env', 'Response')
This is done because the cross-validation implemented here assumes that, for the lines selected as missing, all traits are missing. The cross-validation uses random partitions and is built with the CV.RandomPart() function.
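A minimal sketch of the call. The number of partitions, the testing proportion and the seed are illustrative assumptions, not values taken from this example:

```r
library(BMTME)
data("WheatToy")

# Identifier columns plus the first trait, renamed as in the text
pheno <- phenoWheatToy[, c(1:3)]
colnames(pheno) <- c("Line", "Env", "Response")

# Assumed settings: 10 random partitions, 20% of records used for testing
CrossValidation <- CV.RandomPart(pheno, NPartitions = 10, PTesting = 0.2,
                                 set_seed = 123)
```

A fixed seed makes the partitions reproducible, so the three linear predictors compared in this example are evaluated on the same testing sets.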
Before fitting the model, we extract the matrix of phenotypic responses.
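A sketch of this step, assuming that the first two columns of phenoWheatToy are the line and environment identifiers and that all remaining columns are traits:

```r
library(BMTME)
data("WheatToy")

# Drop the identifier columns (assumed to be columns 1 and 2); keep the traits
Y <- as.matrix(phenoWheatToy[, -c(1, 2)])

head(Y)
```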
Finally, the model is fitted; for demonstration purposes only 5000 iterations are used. The summary() function then reports the predictive capacity obtained for each trait in each evaluated environment.
pm <- BMORS(Y, ETA = ETA, nIter = 5000, burnIn = 2000, thin = 2, progressBar = TRUE,
testingSet = CrossValidation, digits = 4)
summary(pm)
#> Environment Trait Pearson SE_Pearson MAAPE SE_MAAPE
#> 1 Bed2IR DTHD 0.9673 0.0193 0.2844 0.0773
#> 2 Bed2IR PTHT -0.0140 0.2686 0.6591 0.0775
#> 3 Bed5IR DTHD 0.7978 0.1051 0.5031 0.0658
#> 4 Bed5IR PTHT -0.2917 0.2137 0.6165 0.0505
#> 5 Drip DTHD 0.9346 0.0383 0.8571 0.0945
#> 6 Drip PTHT 0.2360 0.1178 0.8872 0.0913
To use the environment and line effects in the linear predictor, we also need the design matrix of the environments.
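This matrix can be sketched as below. It mirrors the environment-effect code listed for the full model later in this example:

```r
library(BMTME)
data("WheatToy")

# Incidence matrix with one indicator column per environment
Z.E <- model.matrix(~0 + as.factor(phenoWheatToy$Env))

dim(Z.E)
```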
Once the design matrix has been generated, the linear predictor is constructed. For this purpose we create a new object called ETA2, which in this case contains the design matrix of the lines and the design matrix of the environments.
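A sketch of ETA2, assuming the same list layout and the same BRR prior that the three-term predictor ETA3 uses later in this example:

```r
library(BMTME)
data("WheatToy")

# Design matrices of the lines and of the environments
Z.G <- model.matrix(~0 + as.factor(phenoWheatToy$Gid)) %*% cholesky(genoWheatToy)
Z.E <- model.matrix(~0 + as.factor(phenoWheatToy$Env))

# Linear predictor with two terms: environment and line effects
ETA2 <- list(Env = list(X = Z.E, model = "BRR"),
             Gen = list(X = Z.G, model = "BRR"))
```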
Finally, the model is fitted; for demonstration purposes only 2000 iterations are used. Recall that the phenotype matrix and the cross-validation object were built in the previous section; both are reused here.
The summary() function again reports the predictive capacity obtained for each trait in each evaluated environment.
pm <- BMORS(Y, ETA = ETA2, nIter = 2000, burnIn = 1000, thin = 2, progressBar = TRUE,
testingSet = CrossValidation, digits = 4)
summary(pm)
#> Environment Trait Pearson SE_Pearson MAAPE SE_MAAPE
#> 1 Bed2IR DTHD 0.9656 0.0208 0.2787 0.0753
#> 2 Bed2IR PTHT -0.0340 0.2919 0.6849 0.0780
#> 3 Bed5IR DTHD 0.8046 0.1006 0.5081 0.0618
#> 4 Bed5IR PTHT -0.2845 0.2191 0.6127 0.0496
#> 5 Drip DTHD 0.9336 0.0383 0.8579 0.0954
#> 6 Drip PTHT 0.1912 0.1039 0.8903 0.0869
Finally, to use the environment and line effects together with the interaction between environments and lines in the linear predictor, we also need the design matrix of the interaction. For ease of understanding, the following code shows the construction of all three design matrices:
# Line effect (section 3.1.1)
LG <- cholesky(genoWheatToy)
ZG <- model.matrix(~0 + as.factor(phenoWheatToy$Gid))
Z.G <- ZG %*% LG
# Environment effect (section 3.1.2)
Z.E <- model.matrix(~0 + as.factor(phenoWheatToy$Env))
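# Interaction effect
# Incidence matrix with one column per line-by-environment combination
ZEG <- model.matrix(~0 + as.factor(phenoWheatToy$Gid):as.factor(phenoWheatToy$Env))
# Block-diagonal matrix: one copy of the genomic matrix per environment
G2 <- kronecker(diag(length(unique(phenoWheatToy$Env))), data.matrix(genoWheatToy))
LG2 <- cholesky(G2)
Z.EG <- ZEG %*% LG2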
Once the design matrices have been generated, the linear predictor is constructed. For this purpose we create a new object called ETA3, which in this case contains the effects of the environments, the lines, and their interaction.
ETA3 <- list(Env = list(X = Z.E, model = "BRR"),
Gen = list(X = Z.G, model = 'BRR'),
EnvGen = list(X = Z.EG, model = "BRR"))
Finally, the model is fitted; for demonstration purposes only 2000 iterations are used. Recall that the phenotype matrix and the cross-validation object were built in the first section; both are reused here.
The summary() function again reports the predictive capacity obtained for each trait in each evaluated environment.
pm <- BMORS(Y, ETA = ETA3, nIter = 2000, burnIn = 1000, thin = 2, progressBar = TRUE,
testingSet = CrossValidation, digits = 4)
summary(pm)
#> Environment Trait Pearson SE_Pearson MAAPE SE_MAAPE
#> 1 Bed2IR DTHD 0.9730 0.0148 0.2907 0.0551
#> 2 Bed2IR PTHT 0.2248 0.2717 0.7814 0.0956
#> 3 Bed5IR DTHD 0.8239 0.0875 0.3506 0.0456
#> 4 Bed5IR PTHT -0.0762 0.2346 0.5022 0.1277
#> 5 Drip DTHD 0.9341 0.0367 0.4744 0.1012
#> 6 Drip PTHT 0.2484 0.2570 0.6758 0.0438
The results can also be displayed as a boxplot.
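A hedged end-to-end sketch of the plotting step follows. It assumes that BMTME provides a boxplot() method for the object returned by BMORS() when a testing set is supplied, and that this method accepts a select argument naming the metric to plot; check ?BMORS and the package help before relying on it. To keep the run short, the sketch fits the lines-only predictor with fewer partitions and iterations than the text uses, so its figure will differ from the one discussed here.

```r
library(BMTME)
data("WheatToy")

# Assumed ordering: by environment, then by line
phenoWheatToy <- phenoWheatToy[order(phenoWheatToy$Env, phenoWheatToy$Gid), ]

# Lines-only linear predictor
Z.G <- model.matrix(~0 + as.factor(phenoWheatToy$Gid)) %*% cholesky(genoWheatToy)
ETA <- list(Gen = list(X = Z.G, model = "BRR"))

# Cross-validation object (illustrative settings)
pheno <- phenoWheatToy[, c(1:3)]
colnames(pheno) <- c("Line", "Env", "Response")
CrossValidation <- CV.RandomPart(pheno, NPartitions = 5, PTesting = 0.2,
                                 set_seed = 123)

# Response matrix and a short model fit
Y <- as.matrix(phenoWheatToy[, -c(1, 2)])
pm <- BMORS(Y, ETA = ETA, nIter = 500, burnIn = 200, thin = 2,
            progressBar = FALSE, testingSet = CrossValidation, digits = 4)

# Assumed interface: one box per trait-environment combination, MAAPE metric;
# las = 2 rotates the axis labels so the combination names stay readable
boxplot(pm, select = "MAAPE", las = 2)
```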
In this figure we can observe that the best average predictive accuracy is obtained with the trait-environment combination DTHD_Drip, while the lowest average predictive accuracy is obtained with the trait-environment combination PTHT_Bed5IR.