This example illustrates how to fit a model with only one environment and several traits, even though this may seem to contradict the multi-environment nature of the model. To do this, the WheatMadaToy data set is used; it ships with the BMTME package, which also provides the BME() function used below. To load the data set, call the data() function as shown below:
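# Load the package and the example data set
# (assuming WheatMadaToy ships with the BMTME package)
library(BMTME)
data("WheatMadaToy")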
With this command, the following two objects are loaded into the R environment:

- genoMada: genomic matrix of the data set.
- phenoMada: phenotypic data of the data set.

The phenotypic data has the following structure:
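# Inspect the first records of the phenotypic data (shown in the table below)
head(phenoMada)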
GID | PH | FL | FE | NS | SY | NP |
---|---|---|---|---|---|---|
9 | 29.7776 | -8.8882 | -4.93900 | 1.04100 | 169.06 | 28.8025 |
11 | 3.2210 | -7.1111 | -0.36940 | -3.88940 | -107.19 | 58.2516 |
12 | 6.1670 | -9.5337 | -12.43680 | 2.58250 | -160.54 | 17.1278 |
15 | 6.8117 | 4.6377 | 11.78860 | -0.03378 | 235.70 | -19.6571 |
20 | -14.4480 | 3.2525 | 6.40780 | -14.23460 | 131.87 | 42.2962 |
21 | -13.2185 | 3.8902 | 0.09722 | 5.35680 | 164.06 | 36.8239 |
Then we proceed to define the model to be fitted. Since the data set includes only one environment in which several traits were evaluated, the BME model is used. For its implementation, we first need to order the data set as follows:
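# Order the phenotypic records by the line identifier (GID)
phenoMada <- phenoMada[order(phenoMada$GID), ]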
This is the most important step in the analysis, because if the data set is not ordered by the identifiers, conflicts may arise between the records and produce incorrect estimations. Then, the design matrix for the genetic effects should be generated as shown below; this sketch uses the cholesky() helper that the package provides to factorize the genomic matrix:
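# Factorize the genomic relationship matrix
LG <- cholesky(genoMada)
# Incidence matrix connecting each record to its line
ZG <- model.matrix(~0 + as.factor(phenoMada$GID))
# Design matrix for the genetic effects
Z.G <- ZG %*% LG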
Then, we extract the phenotypic responses and convert them into a matrix object, as shown in the following command:
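# Drop the GID column; keep the six trait columns as the response matrix
Y <- as.matrix(phenoMada[, -1])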
Finally, the model is fitted; for demonstration purposes, only 1250 iterations are used.
fm <- BME(Y = Y, Z1 = Z.G, nIter = 1250, burnIn = 500, thin = 2, bs = 50)
fm
#> Multi-Environment Model Fitted with:
#> 1250 Iterations, burning the first 500 and thining every 2
#> We found 0 NA values
#>
#>
#> Use str() function to found more detailed information.
To inspect all the details of the output of the fitted model, we use the str()
function, which returns the following information:
str(fm)
#> List of 17
#> $ Y : num [1:50, 1:6] 29.78 3.22 6.17 6.81 -14.45 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:50] "1" "2" "3" "4" ...
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ nIter : num 1250
#> $ burnIn : num 500
#> $ thin : num 2
#> $ dfe : int 5
#> $ Se : num [1:6, 1:6] 527.9 -29.6 28.3 74.3 1523.2 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ yHat : num [1:50, 1:6] 13.04 4.98 3.31 4.91 -4.63 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ SD.yHat : num [1:50, 1:6] 5.46 4.59 4.52 3.44 4.67 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ beta : num [1, 1:6] -1.0826 0.0654 2.2029 0.3619 -15.3457 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ SD.beta : num [1, 1:6] 0.917 0.4936 0.8701 1.0202 0.0834 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ b1 : num [1:50, 1:6] -11.53 12.14 1.89 5.41 -7.71 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ SD.b1 : num [1:50, 1:6] 2.2 3.22 3.55 3.3 4.13 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ vare : num [1:6, 1:6] 65.7 -1.79 -11.14 20.44 866.51 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ SD.vare : num [1:6, 1:6] 16.54 4.98 9.91 12.64 382.24 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ varTrait : num [1:6, 1:6] 61.43 -4.8 6.92 1.67 -142.06 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ SD.varTrait: num [1:6, 1:6] 22.33 5.29 10.63 15.27 371.9 ...
#> ..- attr(*, "dimnames")=List of 2
#> .. ..$ : NULL
#> .. ..$ : chr [1:6] "PH" "FL" "FE" "NS" ...
#> $ NAvalues : int 0
#> - attr(*, "class")= chr "BME"
Here we can observe that the returned object contains the observed values ($Y), the parameters provided for the model fit ($nIter, $burnIn, $thin, etc.), the predicted values ($yHat), and the estimates of the beta coefficients, the random effects of the lines, and the genetic and residual covariances ($beta, $SD.beta, $b1, $SD.b1, $varTrait, $vare, etc.). Since breeders are interested in the genetic covariance and correlation matrices, these can be extracted with the following commands:
COV_TraitGenetic <- fm$varTrait
COV_TraitGenetic
#> PH FL FE NS SY NP
#> [1,] 61.4285 -4.8039 6.9172 1.6715 -142.0596 -110.4548
#> [2,] -4.8039 5.8966 3.2135 -1.4313 147.0114 -18.5556
#> [3,] 6.9172 3.2135 25.0409 -0.2195 -161.7941 -49.5990
#> [4,] 1.6715 -1.4313 -0.2195 53.2704 468.4873 -58.3836
#> [5,] -142.0596 147.0114 -161.7941 468.4873 27543.1675 942.5792
#> [6,] -110.4548 -18.5556 -49.5990 -58.3836 942.5792 905.7937
To convert this covariance matrix into a correlation matrix, it is suggested to use the cov2cor() function, as shown below:
COR_TraitGenetic <- cov2cor(COV_TraitGenetic)
COR_TraitGenetic
#> PH FL FE NS SY
#> [1,] 1.00000000 -0.25241094 0.176368252 0.029219881 -0.1092141
#> [2,] -0.25241094 1.00000000 0.264455439 -0.080758242 0.3647903
#> [3,] 0.17636825 0.26445544 1.000000000 -0.006009891 -0.1948188
#> [4,] 0.02921988 -0.08075824 -0.006009891 1.000000000 0.3867657
#> [5,] -0.10921409 0.36479026 -0.194818759 0.386765655 1.0000000
#> [6,] -0.46825777 -0.25389814 -0.329331525 -0.265786699 0.1887106
#> NP
#> [1,] -0.4682578
#> [2,] -0.2538981
#> [3,] -0.3293315
#> [4,] -0.2657867
#> [5,] 0.1887106
#> [6,] 1.0000000
Here we can observe that none of the genetic correlations between traits is high (greater than 0.5 in absolute value). Below is an example of how to obtain the residual covariance matrix (and, from it, the correlation matrix):
COV_ResGenetic <- fm$vare
COV_ResGenetic
#> PH FL FE NS SY NP
#> [1,] 65.7048 -1.7895 -11.1416 20.4376 866.5089 -82.3637
#> [2,] -1.7895 11.6255 10.0978 -7.0416 206.6082 -11.0698
#> [3,] -11.1416 10.0978 39.7963 -19.4887 -421.9174 -27.1981
#> [4,] 20.4376 -7.0416 -19.4887 52.9486 318.3345 -78.9841
#> [5,] 866.5089 206.6082 -421.9174 318.3345 62014.9436 1942.2813
#> [6,] -82.3637 -11.0698 -27.1981 -78.9841 1942.2813 683.5397
To convert the residual covariance matrix into a correlation matrix, it is again suggested to use the cov2cor() function:
COR_ResGenetic <- cov2cor(COV_ResGenetic)
COR_ResGenetic
#> PH FL FE NS SY NP
#> [1,] 1.00000000 -0.06474815 -0.2178852 0.3465007 0.4292658 -0.3886471
#> [2,] -0.06474815 1.00000000 0.4694611 -0.2838169 0.2433288 -0.1241801
#> [3,] -0.21788517 0.46946108 1.0000000 -0.4245553 -0.2685703 -0.1649056
#> [4,] 0.34650069 -0.28381692 -0.4245553 1.0000000 0.1756743 -0.4151744
#> [5,] 0.42926581 0.24332882 -0.2685703 0.1756743 1.0000000 0.2983198
#> [6,] -0.38864710 -0.12418012 -0.1649056 -0.4151744 0.2983198 1.0000000
Here we can observe that the residuals of the traits are also not highly correlated (no value greater than 0.5 in absolute value). On the other hand, to extract the predicted values from the model, it is necessary to call the $yHat
component of the fitted model object; for demonstration purposes we extract only the first six predictions for the six traits evaluated.
head(fm$yHat)
#> PH FL FE NS SY NP
#> [1,] 13.0399 -4.6628 1.7420 -5.2667 -137.1660 33.2012
#> [2,] 4.9794 -2.6906 2.4134 -2.5778 -85.6570 26.4790
#> [3,] 3.3091 -2.0779 0.1708 -2.9434 -98.3687 10.0068
#> [4,] 4.9142 1.6687 5.9605 1.8267 33.6472 -25.3203
#> [5,] -4.6268 1.2114 2.0855 -3.1853 36.5101 7.2651
#> [6,] -8.8045 1.2874 0.6305 3.6720 98.5343 26.5961
The software also allows plotting the observed values against the predicted values by trait, as shown below:
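For example, for the SY trait (a sketch assuming the plot() method that the package provides for fitted BME objects accepts a trait argument to select the trait):

# Plot observed vs. predicted values for the SY trait
plot(fm, trait = "SY")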
On the other hand, the package also allows performing cross-validation of the predictions. For this, we require a data.frame object with the phenotypes, as illustrated below:
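A minimal sketch of this object: with a single environment, Env can be left constant, and a trait column (here FL, the third column of phenoMada) serves as the response used to define the partitions:

# Phenotypes in the long format expected by the cross-validation helper
pheno <- data.frame(GID = phenoMada[, 1], Env = "", Response = phenoMada[, 3])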
Once the object is generated, we use the CV.RandomPart() function to form the training and testing sets of each random partition of the cross-validation. It is suggested to provide the number of random partitions, the percentage of the data to be used for testing, and a seed to guarantee a reproducible analysis, as in the sketch below:
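# A sketch: 10 random partitions, 20% of the records for testing,
# and a fixed seed for reproducibility
CrossV <- CV.RandomPart(pheno, NPartitions = 10, PTesting = 0.2, set_seed = 123)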
Now that the partitions for the cross-validation have been generated, we use the BME() function to fit the model again, this time including the above object in the testingSet parameter to implement the cross-validation. In addition, we use the summary() function to show the resulting predictive capacity for each evaluated trait.
pm <- BME(Y = Y, Z1 = Z.G, nIter = 1000, burnIn = 500, thin = 2, bs = 50,
testingSet = CrossV, progressBar = FALSE)
summary(pm)
#> Environment Trait Pearson SE_Pearson MAAPE SE_MAAPE
#> 1 FE 0.5024 0.2181 0.7426 0.0701
#> 2 FL 0.1314 0.1737 0.7845 0.0700
#> 3 NP 0.5184 0.2819 0.7033 0.0360
#> 4 NS 0.6654 0.1033 0.6653 0.0404
#> 5 PH 0.5878 0.1335 0.7819 0.0764
#> 6 SY 0.2048 0.1283 0.7022 0.0571
We can observe that the results are reported by evaluated trait, showing for each of them the predictive capacity obtained as the Pearson correlation averaged over all the partitions used in the cross-validation, as well as the mean arctangent absolute percentage error (MAAPE). From the results obtained, we can emphasize that the NS and PH traits obtained the best predictive capacity, while the SY and FL traits show low predictive capacity. In addition, the package offers the ability to plot the results obtained; for this we use the function shown next to display a boxplot of the results.
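A sketch using the boxplot() method that the package provides for the cross-validation results (by default it displays the Pearson correlations):

boxplot(pm)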
In this figure we can observe that the PH trait obtained the best predictive capacity on average, while the SY trait obtained the lowest. It is also possible to plot the predictive capacity of the traits under study using the MAAPE; we just specify it through the select parameter, as shown below:
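# Display the MAAPE instead of the default Pearson correlation
boxplot(pm, select = "MAAPE")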
In this figure we can observe that the FL trait obtained the highest MAAPE and therefore the worst predictions; however, its error is similar to that of the remaining traits, so the predictions of this analysis should be interpreted with caution.