Once a model has been estimated, it can be used to predict choices for a set of alternatives. This vignette demonstrates how to do so using the predictChoices() function along with the results of an estimated model.
To predict choices, you first need to define a set of alternatives for which you want to make predictions. Each row should be an alternative, and each column should be an attribute. I will predict choices on the full yogurt data set, which was also used to estimate each of the models in this example.
This example uses the yogurt data set from Jain et al. (1994). The data set contains 2,412 choice observations from a series of yogurt purchases by a panel of 100 households in Springfield, Missouri, over a roughly two-year period. The data were collected by optical scanners and contain information about the price, brand, and a “feature” variable, which identifies whether a newspaper advertisement was shown to the customer. There are four brands of yogurt: Yoplait, Dannon, Weight Watchers, and Hiland, with market shares of 34%, 40%, 23% and 3%, respectively.
library(logitr)
head(yogurt)
#> # A tibble: 6 × 15
#> id obsID alt choice price feat brand dannon hiland weight yoplait
#> <dbl> <int> <int> <dbl> <dbl> <dbl> <chr> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 1 0 8.1 0 dannon 1 0 0 0
#> 2 1 1 2 0 6.10 0 hiland 0 1 0 0
#> 3 1 1 3 1 7.90 0 weight 0 0 1 0
#> 4 1 1 4 0 10.8 0 yoplait 0 0 0 1
#> 5 1 2 1 1 9.80 0 dannon 1 0 0 0
#> 6 1 2 2 0 6.40 0 hiland 0 1 0 0
#> # … with 4 more variables: brand_dannon <int>, brand_hiland <int>,
#> # brand_weight <int>, brand_yoplait <int>
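To make the row-per-alternative, column-per-attribute layout concrete, here is a minimal sketch of a hand-built set of alternatives that mirrors the columns of the yogurt data. The name alts_new and all prices and feature flags are hypothetical values chosen for illustration; they are not taken from the data set.

```r
# Hypothetical alternatives: two choice sets (obsID) with four yogurts each (alt).
# Prices and feature flags are made-up values for illustration only.
alts_new <- data.frame(
  obsID = rep(1:2, each = 4),
  alt   = rep(1:4, times = 2),
  price = c(8.1, 6.1, 7.9, 10.8, 7.5, 6.4, 7.9, 9.9),
  feat  = c(0, 0, 0, 0, 1, 0, 0, 0),
  brand = rep(c("dannon", "hiland", "weight", "yoplait"), times = 2),
  stringsAsFactors = FALSE
)

# Sanity checks: one row per alternative, four alternatives in every choice set
stopifnot(nrow(alts_new) == 8)
stopifnot(all(table(alts_new$obsID) == 4))
```

A data frame shaped like this could, in principle, stand in for the full yogurt data wherever a set of alternatives is needed, provided it contains every attribute the model was estimated with.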
In the example below, I estimate a preference space MNL model called mnl_pref. I can then use the predictChoices() function with the mnl_pref model to predict the choices for each set of alternatives in the yogurt data set:
# Estimate the model
mnl_pref <- logitr(
  data   = yogurt,
  choice = 'choice',
  obsID  = 'obsID',
  pars   = c('price', 'feat', 'brand')
)

# Predict choices
choices_mnl_pref <- predictChoices(
  model = mnl_pref,
  alts  = yogurt,
  altID = "alt",
  obsID = "obsID"
)

# Preview actual and predicted choices
head(choices_mnl_pref[c('obsID', 'choice', 'choice_predict')])
#> obsID choice choice_predict
#> 1.1 1 0 0
#> 1.2 1 0 0
#> 1.3 1 1 1
#> 1.4 1 0 0
#> 2.5 2 1 1
#> 2.6 2 0 0
The resulting choices_mnl_pref data frame is the alts data frame with one additional column, choice_predict, which contains the predicted choices. You can quickly compute the accuracy by dividing the number of correctly predicted choices by the total number of choices:
chosen <- subset(choices_mnl_pref, choice == 1)
chosen$correct <- chosen$choice == chosen$choice_predict
sum(chosen$correct) / nrow(chosen)
#> [1] 0.3897181
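The same calculation can be wrapped in a small function so that it can be reused for each model. This is only a sketch: prediction_accuracy() is a hypothetical helper, not part of the package, and the toy data frame below contains made-up values that only reuse the choice and choice_predict column names.

```r
# Hypothetical helper: share of chosen alternatives that were predicted correctly.
# Assumes the data frame has 0/1 columns named choice and choice_predict.
prediction_accuracy <- function(df) {
  chosen <- df[df$choice == 1, ]
  mean(chosen$choice_predict == chosen$choice)
}

# Toy data (made-up values): two choice sets with four alternatives each
toy <- data.frame(
  obsID          = rep(1:2, each = 4),
  choice         = c(0, 0, 1, 0, 1, 0, 0, 0),
  choice_predict = c(0, 0, 1, 0, 0, 1, 0, 0)
)

# The first choice set is predicted correctly, the second is not
prediction_accuracy(toy)
#> [1] 0.5
```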
You can also use WTP space models to predict choices. For example, here are the results from an equivalent model but in the WTP space:
# Estimate the model
mnl_wtp <- logitr(
  data           = yogurt,
  choice         = 'choice',
  obsID          = 'obsID',
  pars           = c('feat', 'brand'),
  price          = 'price',
  modelSpace     = 'wtp',
  numMultiStarts = 10
)

# Make predictions
choices_mnl_wtp <- predictChoices(
  model = mnl_wtp,
  alts  = yogurt,
  altID = "alt",
  obsID = "obsID"
)
#> NOTE: Using results from run 8 of 10 multistart runs
#> (the run with the largest log-likelihood value)
# Preview actual and predicted choices
head(choices_mnl_wtp[c('obsID', 'choice', 'choice_predict')])
#> obsID choice choice_predict
#> 1.1 1 0 0
#> 1.2 1 0 0
#> 1.3 1 1 0
#> 1.4 1 0 1
#> 2.5 2 1 1
#> 2.6 2 0 0
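To illustrate what "equivalent" means for the two model spaces, the sketch below converts a set of preference space coefficients into WTP space parameters and checks that both give the same utilities. It assumes the common parameterization in which WTP space utility is the scale parameter times (WTP-weighted attributes minus price); all coefficient and attribute values are hypothetical, not estimates from the yogurt data.

```r
# Hypothetical preference space coefficients (illustrative values only)
beta_price <- -0.4
beta_feat  <- 0.5

# Equivalent WTP space parameters under the assumed parameterization
lambda     <- -beta_price         # scale parameter
omega_feat <- beta_feat / lambda  # willingness to pay for feat, in price units

# One hypothetical choice set with four alternatives
price <- c(8.1, 6.1, 7.9, 10.8)
feat  <- c(0, 0, 1, 0)

# Observed utility in each space
v_pref <- beta_price * price + beta_feat * feat
v_wtp  <- lambda * (omega_feat * feat - price)

# Both parameterizations describe the same utilities
stopifnot(isTRUE(all.equal(v_pref, v_wtp)))
```

Estimated models in the two spaces need not return exactly equivalent parameters in practice, since the WTP space log-likelihood is nonconvex and the optimizer can converge to different solutions, which is why the multistart option is used above.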
You can also use mixed logit models to predict choices. Heterogeneity is modeled by simulating draws from the population estimates of the estimated model. Here is an example using a preference space mixed logit model:
# Estimate the model
mxl_pref <- logitr(
  data           = yogurt,
  choice         = 'choice',
  obsID          = 'obsID',
  pars           = c('price', 'feat', 'brand'),
  randPars       = c(feat = 'n', brand = 'n'),
  numMultiStarts = 5
)

# Make predictions
choices_mxl_pref <- predictChoices(
  model = mxl_pref,
  alts  = yogurt,
  altID = "alt",
  obsID = "obsID"
)
# Preview actual and predicted choices
head(choices_mxl_pref[c('obsID', 'choice', 'choice_predict')])
#> obsID choice choice_predict
#> 1.1 1 0 1
#> 1.2 1 0 0
#> 1.3 1 1 0
#> 1.4 1 0 0
#> 2.5 2 1 0
#> 2.6 2 0 0
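The idea of simulating draws from the population estimates can be sketched in a few lines of base R. This is a generic illustration of simulated mixed logit probabilities, not necessarily how the package implements it internally; the attribute values, the fixed price coefficient, and the mean and standard deviation of the normally distributed feat coefficient are all hypothetical.

```r
set.seed(123)

# Attributes for one choice set with four alternatives (hypothetical values)
X <- cbind(
  price = c(8.1, 6.1, 7.9, 10.8),
  feat  = c(0, 0, 1, 0)
)

# Hypothetical population parameters: a fixed price coefficient and a
# normally distributed feat coefficient
beta_price <- -0.4
feat_mean  <- 0.5
feat_sd    <- 0.6

# Draw feat coefficients from the population distribution
n_draws    <- 1000
feat_draws <- rnorm(n_draws, mean = feat_mean, sd = feat_sd)

# Logit probabilities for a single coefficient vector
logit_probs <- function(beta) {
  v    <- as.vector(X %*% beta)
  expv <- exp(v - max(v))  # subtract the max for numerical stability
  expv / sum(expv)
}

# One column of probabilities per draw, then average across draws
probs_by_draw <- sapply(feat_draws, function(b) logit_probs(c(beta_price, b)))
probs_sim     <- rowMeans(probs_by_draw)
probs_sim
```

Averaging over draws yields one probability per alternative, and these still sum to one because each individual draw produces a valid probability vector.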
Likewise, mixed logit WTP space models can also be used to predict choices:
# Estimate the model
mxl_wtp <- logitr(
  data           = yogurt,
  choice         = 'choice',
  obsID          = 'obsID',
  pars           = c('feat', 'brand'),
  price          = 'price',
  randPars       = c(feat = 'n', brand = 'n'),
  modelSpace     = 'wtp',
  numMultiStarts = 5
)

# Make predictions
choices_mxl_wtp <- predictChoices(
  model = mxl_wtp,
  alts  = yogurt,
  altID = "alt",
  obsID = "obsID"
)
# Preview actual and predicted choices
head(choices_mxl_wtp[c('obsID', 'choice', 'choice_predict')])
#> obsID choice choice_predict
#> 1.1 1 0 0
#> 1.2 1 0 0
#> 1.3 1 1 1
#> 1.4 1 0 0
#> 2.5 2 1 0
#> 2.6 2 0 0
library(dplyr)

# Combine models into one data frame
choices <- rbind(
  choices_mnl_pref, choices_mnl_wtp, choices_mxl_pref, choices_mxl_wtp)
choices$model <- c(
  rep("mnl_pref", nrow(choices_mnl_pref)),
  rep("mnl_wtp",  nrow(choices_mnl_wtp)),
  rep("mxl_pref", nrow(choices_mxl_pref)),
  rep("mxl_wtp",  nrow(choices_mxl_wtp)))

# Compute prediction accuracy by model
choices %>%
  filter(choice == 1) %>%
  mutate(predict_correct = (choice_predict == choice)) %>%
  group_by(model) %>%
  summarise(p_correct = sum(predict_correct) / n())
#> # A tibble: 4 × 2
#> model p_correct
#> <chr> <dbl>
#> 1 mnl_pref 0.390
#> 2 mnl_wtp 0.362
#> 3 mxl_pref 0.390
#> 4 mxl_wtp 0.379
The models all perform about the same, with roughly 36% to 39% correct predictions. This is well above the 25% that would be expected from choosing uniformly at random among the four alternatives.
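As a rough check on the comparison with random guessing, the sketch below runs a one-sided exact binomial test on the mnl_pref result, using the 2,412 choice observations and the 0.3897181 accuracy reported earlier (940 correct predictions). It assumes the observations are independent, which ignores the panel structure of the data, so the p-value should be read as indicative only.

```r
n_obs     <- 2412  # choice observations in the yogurt data
n_correct <- 940   # 0.3897181 * 2412, the mnl_pref accuracy reported earlier
n_alts    <- 4     # alternatives per choice set

# Expected accuracy when guessing uniformly at random
p_random <- 1 / n_alts

# One-sided exact binomial test of the observed accuracy against p_random
# (assumes independent observations, which is a simplification here)
test <- binom.test(n_correct, n_obs, p = p_random, alternative = "greater")

unname(test$estimate)  # observed share of correct predictions
test$p.value           # probability of doing at least this well by chance
```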