gm.cv {gmvalid}    R Documentation

Cross Validation for Graphical (Chain) Models

Description

Predicts a binary outcome variable in a given graphical (chain) model using k-fold cross validation.

Usage

gm.cv(k, data, outcome=1, strategy = c("backwards", "forwards"),
        chain = FALSE, options="", conf.level = 0.95)

Arguments

k Number of folds the data should be split into in order to estimate the success of prediction in the cross validation.
data Data frame or table (array). Variables should be named, and the data must be discrete.
outcome Variable index of the outcome variable. Default is 1.
strategy Type of model selection. "backwards" searches for non-significant edges to delete, starting by default from the saturated model. "forwards" adds significant edges, starting from the main effects model. The default strategy is "backwards". Selections may be abbreviated.
chain Character string specifying the block structure of a directed graphical model. The syntax is "vs1|vs2|vs3", where vs1, vs2 and vs3 are sets of variables and the variables in vs1 are prior to those in vs2, and so on. If no chain is given (the default, FALSE), the model is undirected. Only lowercase letters are allowed!
options Character string specifying further options for the search strategy. Possible options can be found in the MIM help by searching for "stepwise" (backwards, forwards) or "startsearch" (eh). See Details.
conf.level Confidence level of the interval (default is 0.95).
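
The chain syntax described above can be illustrated with a small sketch. The helper parse_chain below is hypothetical and not part of gmvalid; it assumes that every variable is denoted by a single lowercase letter, as in the examples on this page, and only shows how a chain string breaks down into ordered blocks.

```r
# Hypothetical helper (not part of gmvalid): split a chain string into
# its blocks and reject anything but lowercase letters and "|".
parse_chain <- function(chain) {
  if (!grepl("^[a-z]+(\\|[a-z]+)*$", chain)) {
    stop("chain must consist of lowercase letters separated by '|'")
  }
  # Split at "|" into blocks, then split each block into single letters.
  blocks <- strsplit(chain, "|", fixed = TRUE)[[1]]
  strsplit(blocks, "")
}

blocks <- parse_chain("cb|de|a")
blocks[[1]]  # variables in the first block, prior to all others
blocks[[3]]  # variables in the last block
```

An uppercase string such as "DBD|A" makes the helper stop with an error, mirroring the lowercase restriction on chain.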

Details

The outcome variable must be the first variable in data and must be coded as 1 = "unaffected" and 2 = "affected".

The procedure is as follows:

1. Data Processing:
The data set is split into k folds.

2. Model Selection:
A model is selected using the observations from all folds except the j-th (the k - j-th folds), j = 1,...,k. P-values of edges are stored.

3. Calculate Risk:
A ratio table of being affected rather than unaffected is calculated for the joint probability of all influences associated with the outcome variable. This is done for every fold.
If the ratio is greater than or equal to one, the risk is set to 2, otherwise it is set to 1. For each fold a risk table is generated.

4. Prediction:
The risk table from step (3), built from the k - j-th folds (all folds except the j-th), is used to predict the observations in the j-th fold. The prediction "PRED" is compared to the real outcome "OUT" in each fold by calculating the success probability using the following formula:

success.prob = 1/n * SUM(i = 1:n) (1 - |OUT_i - PRED_i|),
n = number of observations in the j-th fold
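
Steps (1), (3) and (4) can be sketched in base R as follows. This is only an illustration under simplifying assumptions, not the code used by gm.cv: the model selection of step (2) needs MIM and is skipped, so the set of influences (the made-up variables b and d) is fixed by hand, the data are simulated, and the success probability is taken as the fraction of observations in a fold for which PRED equals OUT.

```r
set.seed(1)
n <- 400
k <- 5

# Simulated discrete data (illustrative names): outcome "a" coded
# 1 = unaffected, 2 = affected, with two binary influences "b" and "d".
b <- sample(1:2, n, replace = TRUE)
d <- sample(1:2, n, replace = TRUE)
a <- ifelse(runif(n) < ifelse(b == 2, 0.8, 0.2), 2L, 1L)
dat <- data.frame(a = factor(a, levels = 1:2),
                  b = factor(b, levels = 1:2),
                  d = factor(d, levels = 1:2))

# Step 1: assign every observation to one of k equally sized folds.
fold <- sample(rep(seq_len(k), length.out = n))

success.prob <- numeric(k)
for (j in seq_len(k)) {
  train <- dat[fold != j, ]  # all folds except the j-th
  test  <- dat[fold == j, ]  # the j-th fold

  # Step 3: ratio of affected to unaffected for each combination of the
  # influences; the risk is 2 where the ratio is >= 1, otherwise 1.
  counts <- table(train$b, train$d, train$a)
  ratio  <- unclass(counts[, , "2"] / counts[, , "1"])
  risk   <- ifelse(ratio >= 1, 2, 1)

  # Step 4: look up the risk for each test observation and compute the
  # fraction of correct predictions in the j-th fold.
  pred <- risk[cbind(as.integer(test$b), as.integer(test$d))]
  out  <- as.integer(test$a)
  success.prob[j] <- mean(1 - abs(out - pred))
}
success.prob
```

Because OUT and PRED both take the values 1 or 2, |OUT - PRED| is 0 for a correct prediction and 1 for a wrong one, so the mean above is the proportion of correct predictions. A cell without any unaffected training observations would give an infinite or undefined ratio; the sketch does not handle that case.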

MIM options for stepwise procedures (backwards, forwards):
"A" - uses the AIC as selection criterion
"B" - uses the BIC as selection criterion
"J" - joggles between backward and forward
"N" - non-coherent mode
"U" - unrestricted, allows for non-decomposable models

Value

A list containing:

pvalue Matrix of the calculated p-values in each fold for each edge obtained by the model selection. NA's mark missing edges.
ratio List of ratio tables (see details, step (3)).
risk List of risk tables (see details, step (3)).
success Matrix with the best prediction model in each fold, the number of edges that point to the outcome variable and the probability of successful prediction (see details, step (4)). The initial block structure in the prediction step is "variable set of influences | outcome variable". A given chain is only used during the model selection step. The success probability is calculated using those clique structures in which the outcome variable is involved.

Note

The function requires the MIM program. Make sure that it is running before using the function. mimR will only work properly if every folder in the path of your temporary directory has a name of at most 8 characters. mimR needs the Rgraphviz package, so you will have to add "Bioconductor" to your R repositories.

Author(s)

Ronja Foraita, Fabian Sobotka
Bremen Institute for Prevention Research and Social Medicine
(BIPS) http://www.bips.uni-bremen.de

References

Foraita R (2008) Outcome prediction in graphical (chain) models using cross validation. Slides. Please contact foraita@bips.uni-bremen.de.

Edwards D (2000) An Introduction to Graphical Modelling. Second Edition, Springer Verlag.

See Also

gm.mim

Examples

  
  ABC <- gm.modelsim(500,"ABC,CD")
  out <- gm.cv(5,data=ABC, strategy="f")
  out
  
  ### DAG using a stepwise selection
  out.dag <- gm.cv(3,data=ABC,options="j",chain="d|b|c|a")
  
  ### Chain graph using BIC as selection criterion and allowing for 
  ### non-decomposable models
  cg <- gm.modelsim(1000,"ABD,BCE")
  out.cg <- gm.cv(3,data=cg,options="bu",chain="cb|de|a")
  
  ## Not run: 
  gm.cv(3,data=ABC,chain="DBD|A") # you have to use lowercase letters
  gm.cv(3,data=ABC,chain="dca|b") # a is the outcome variable and thus
                                  # has to be in the rightmost block
  ## End(Not run)

[Package gmvalid version 1.0 Index]