This example illustrates how to fit a model with only one environment and several traits, even though this may seem at odds with the multi-environment formulation of the model. To do this, the WheatMadaToy data set is used. To load the data set, use the data() function as shown below:
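A minimal sketch of the loading step, assuming the BMTME package (which ships the WheatMadaToy data set) is installed:

```r
library(BMTME)       # package that provides the BME model and the example data
data(WheatMadaToy)   # loads genoMada and phenoMada into the R environment
```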
With this command, the following two objects are loaded into the R environment:

- genoMada: genomic matrix of the data set.
- phenoMada: phenotypic data of the data set.

The phenotypic data has the following structure:
GID | PH | FL | FE | NS | SY | NP |
---|---|---|---|---|---|---|
9 | 29.7776 | -8.8882 | -4.93900 | 1.04100 | 169.06 | 28.8025 |
11 | 3.2210 | -7.1111 | -0.36940 | -3.88940 | -107.19 | 58.2516 |
12 | 6.1670 | -9.5337 | -12.43680 | 2.58250 | -160.54 | 17.1278 |
15 | 6.8117 | 4.6377 | 11.78860 | -0.03378 | 235.70 | -19.6571 |
20 | -14.4480 | 3.2525 | 6.40780 | -14.23460 | 131.87 | 42.2962 |
21 | -13.2185 | 3.8902 | 0.09722 | 5.35680 | 164.06 | 36.8239 |
Next we define the model to be fitted. Since the data set includes a single environment in which several traits were evaluated, the BME model is used. To implement it, we first need to order the data set as follows:
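The ordering step can be done with base R, sorting the phenotypic records by the line identifier:

```r
# Order the phenotypic records by the line identifier (GID)
phenoMada <- phenoMada[order(phenoMada$GID), ]
```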
This is the most important step in the analysis: if the data set is not ordered by the identifiers, mismatches between the phenotypic and genomic data can produce incorrect estimates. Next, the design matrix for the genetic effects should be generated, as shown below:
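A sketch of one way to build this design matrix, assuming the package's cholesky() helper for decomposing the genomic relationship matrix (an assumption based on the BMTME documentation; base R's chol() is an alternative):

```r
LG  <- cholesky(genoMada)                          # Cholesky factor of the genomic matrix
ZG  <- model.matrix(~0 + as.factor(phenoMada$GID)) # incidence matrix of the lines
Z.G <- ZG %*% LG                                   # design matrix for the genetic effects
```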
Then we extract the phenotypic responses and convert them to a matrix object, as shown in the following command:
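Since the first column of phenoMada holds the GID identifier and the remaining six columns hold the traits, the response matrix can be obtained as:

```r
# Drop the GID column and keep the six trait columns as a numeric matrix
Y <- as.matrix(phenoMada[, -1])
```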
Finally, the model is fitted; for demonstration purposes, only 1250 iterations are used.
fm <- BME(Y = Y, Z1 = Z.G, nIter = 1250, burnIn = 500, thin = 2, bs = 50)
fm
#> Multi-Environment Model Fitted with:
#> 1250 Iterations, burning the first 500 and thining every 2
#> We found 0 NA values
#>
#>
#> Use str() function to found more detailed information.
To see all the details of the fitted model's output, we use the str() function, which returns the following information:
str(fm)
#> List of 17
#> $ Y : num [1:50, 1:6] 29.78 3.22 6.17 6.81 -14.45 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:50] "1" "2" "3" "4" ...
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ nIter : num 1250
#> $ burnIn : num 500
#> $ thin : num 2
#> $ dfe : int 5
#> $ Se : num [1:6, 1:6] 527.9 -29.6 28.3 74.3 1523.2 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ yHat : num [1:50, 1:6] 13.75 4.96 2.8 5.32 -5.05 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ SD.yHat : num [1:50, 1:6] 5.55 4.3 4.75 3.64 4.47 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ beta : num [1, 1:6] -1.0257 -0.0218 2.123 0.5051 -15.3521 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ SD.beta : num [1, 1:6] 0.9735 0.4534 0.8844 0.9143 0.0802 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ b1 : num [1:50, 1:6] -11.62 12.92 1.41 5.97 -8.66 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ SD.b1 : num [1:50, 1:6] 2.51 3.57 3.7 3.77 4.05 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ vare : num [1:6, 1:6] 68.01 -2.64 -12.07 19.55 880.99 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ SD.vare : num [1:6, 1:6] 21.2 5.7 11 12.7 431.4 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ varTrait : num [1:6, 1:6] 65.3 -4.75 5.99 3.09 -87.99 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ SD.varTrait: num [1:6, 1:6] 24.13 5.83 11.22 14.97 358.05 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ NAvalues : int 0
#> - attr(*, "class")= chr "BME"
Here we can see that the returned object contains the observed values ($Y), the parameters provided for the model fit ($nIter, $burnIn, $thin, etc.), the predicted values ($yHat), and the estimates of the beta coefficients, the random effects of the lines, and the genetic and residual covariances ($beta, $SD.beta, $b1, $SD.b1, $varTrait, $vare, etc.). Since breeders are interested in the genetic covariance and correlation matrices, these can be extracted with the following command:
COV_TraitGenetic <- fm$varTrait
COV_TraitGenetic
#> PH FL FE NS SY NP
#> [1,] 65.2990 -4.7494 5.9894 3.0909 -87.9894 -114.4745
#> [2,] -4.7494 5.1916 2.4330 -0.6917 114.4614 -17.2455
#> [3,] 5.9894 2.4330 21.4647 2.4144 -121.1048 -38.7016
#> [4,] 3.0909 -0.6917 2.4144 46.5267 373.6931 -64.6450
#> [5,] -87.9894 114.4614 -121.1048 373.6931 24198.0906 743.9596
#> [6,] -114.4745 -17.2455 -38.7016 -64.6450 743.9596 890.0380
To convert this covariance matrix into a correlation matrix, it is suggested to use the following command:
COR_TraitGenetic <- cov2cor(COV_TraitGenetic)
COR_TraitGenetic
#> PH FL FE NS SY
#> [1,] 1.00000000 -0.25794961 0.15998072 0.05607646 -0.06999815
#> [2,] -0.25794961 1.00000000 0.23047779 -0.04450574 0.32293706
#> [3,] 0.15998072 0.23047779 1.00000000 0.07640041 -0.16803835
#> [4,] 0.05607646 -0.04450574 0.07640041 1.00000000 0.35218707
#> [5,] -0.06999815 0.32293706 -0.16803835 0.35218707 1.00000000
#> [6,] -0.47484429 -0.25370026 -0.28000271 -0.31767244 0.16030775
#> NP
#> [1,] -0.4748443
#> [2,] -0.2537003
#> [3,] -0.2800027
#> [4,] -0.3176724
#> [5,] 0.1603078
#> [6,] 1.0000000
Here we can see that no pair of traits shows a high genetic correlation (greater than 0.5). Below we show how to obtain the residual covariance (correlation) matrix:
COV_ResGenetic <- fm$vare
COV_ResGenetic
#> PH FL FE NS SY NP
#> [1,] 68.0140 -2.6405 -12.0747 19.5470 880.9854 -82.9520
#> [2,] -2.6405 12.3595 11.6720 -7.9005 208.5605 -12.6781
#> [3,] -12.0747 11.6720 42.3774 -22.1118 -451.6014 -37.8536
#> [4,] 19.5470 -7.9005 -22.1118 54.3776 394.4754 -63.2850
#> [5,] 880.9854 208.5605 -451.6014 394.4754 62629.4043 1908.1623
#> [6,] -82.9520 -12.6781 -37.8536 -63.2850 1908.1623 681.1306
To convert the residual covariance matrix into a correlation matrix, it is suggested to use the following command:
COR_ResGenetic <- cov2cor(COV_ResGenetic)
COR_ResGenetic
#> PH FL FE NS SY NP
#> [1,] 1.00000000 -0.09107235 -0.2249107 0.3214185 0.4268550 -0.3854007
#> [2,] -0.09107235 1.00000000 0.5100088 -0.3047503 0.2370514 -0.1381779
#> [3,] -0.22491066 0.51000881 1.0000000 -0.4606244 -0.2772037 -0.2228050
#> [4,] 0.32141855 -0.30475028 -0.4606244 1.0000000 0.2137572 -0.3288331
#> [5,] 0.42685504 0.23705137 -0.2772037 0.2137572 1.0000000 0.2921534
#> [6,] -0.38540071 -0.13817788 -0.2228050 -0.3288331 0.2921534 1.0000000
Here we can see that the residuals of the traits are not highly correlated (no correlation greater than 0.5). To extract the predicted values from the fitted model, we call the $yHat component of the fitted object; for demonstration purposes, we only extract the first 6 predictions for the 6 traits evaluated.
head(fm$yHat)
#> PH FL FE NS SY NP
#> [1,] 13.7525 -4.4776 2.5276 -5.0744 -135.7338 30.3665
#> [2,] 4.9640 -2.4968 2.6187 -3.5693 -72.3548 29.2320
#> [3,] 2.8000 -1.6734 0.3450 -3.3591 -72.5014 11.8348
#> [4,] 5.3230 1.4192 5.1879 2.0498 51.8079 -22.6639
#> [5,] -5.0480 0.6480 1.5899 -3.3110 34.9088 12.9358
#> [6,] -10.0816 1.3798 0.9706 2.9888 65.4464 25.8167
The software also allows plotting the observed values against the predicted values by trait, as shown below:
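A sketch of how such a plot can be produced with the package's plot method for fitted objects; the trait argument and its value are assumptions based on the BMTME documentation:

```r
plot(fm, trait = 'PH')  # observed vs. predicted values for the PH trait
```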
On the other hand, the package also allows cross-validation of the predictions; for this we require a data.frame object with the phenotypes, as illustrated below:
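A sketch of building this object, assuming the column structure (line identifier, environment label, response) that CV.RandomPart() expects according to the BMTME documentation; the choice of the third phenotype column is illustrative:

```r
# Phenotype data.frame for cross-validation: GID, environment and response
pheno <- data.frame(GID = phenoMada[, 1],
                    Env = '',                    # single environment, so an empty label
                    Response = phenoMada[, 3])   # illustrative choice of response column
```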
Once the object is generated, we use the CV.RandomPart() function to form the training and testing sets of each random partition of the cross-validation. It is suggested to provide the number of random partitions, the percentage of the data to be used for testing, and a seed to guarantee a reproducible analysis.
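A sketch of this call, assuming the parameter names documented for CV.RandomPart() in the BMTME package; the specific values are illustrative:

```r
# 10 random partitions, 20% of the data for testing, fixed seed for reproducibility
CrossV <- CV.RandomPart(pheno, NPartitions = 10, PTesting = 0.2, Set_seed = 123)
```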
Now that the partitions for cross-validation have been generated, we use the BME() function to fit the model, this time passing the object above to the testingSet parameter to implement the cross-validation. In addition, we use the summary() function to show the resulting predictive capacity per evaluated trait.
pm <- BME(Y = Y, Z1 = Z.G, nIter = 1000, burnIn = 500, thin = 2, bs = 50,
testingSet = CrossV, progressBar = FALSE)
summary(pm)
#> Environment Trait Pearson SE_Pearson MAAPE SE_MAAPE
#> 1 FE 0.4715 0.2096 0.7472 0.0714
#> 2 FL 0.1293 0.1624 0.7996 0.0739
#> 3 NP 0.4975 0.2827 0.7033 0.0334
#> 4 NS 0.6522 0.0950 0.6664 0.0382
#> 5 PH 0.5811 0.1315 0.7754 0.0757
#> 6 SY 0.1956 0.1446 0.7029 0.0613
We can observe that the results are reported per evaluated trait, showing for each one the predictive capacity obtained, measured as the Pearson correlation averaged over all the partitions used in the cross-validation, as well as the mean arctangent absolute percentage error (MAAPE). From these results, the traits with the best predictive capacity were NS and PH, while the SY and FL traits showed low predictive capacity. In addition, the package can generate graphs of the results obtained; for this we use the following function, which shows a boxplot of the results.
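A sketch of this step, assuming the package provides a boxplot method for the cross-validation object (an assumption based on the BMTME documentation):

```r
boxplot(pm)  # boxplot of the Pearson correlations per trait across partitions
```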
In this figure we can observe that the PH trait obtained the best predictive capacity on average, while the SY trait obtained the lowest. It is also possible to plot the predictive capacity of the traits under study using the MAAPE; just specify it through the select parameter, as shown below:
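The select parameter mentioned in the text can be used as follows (the value 'MAAPE' is an assumption based on the column name in the summary output):

```r
boxplot(pm, select = 'MAAPE')  # boxplot of the MAAPE per trait across partitions
```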
In this figure we can observe that the FL trait obtained the highest MAAPE index and therefore the worst predictions; however, its value is similar to that of the remaining traits, so the predictions from this analysis should be interpreted with caution.