
Type: Package
Title: Creates Ideal Data for Generalized Linear Models
Version: 1.0.0
Author: Greg McMahan
Maintainer: Greg McMahan <gmcmacran@gmail.com>
Description: Creates ideal data for all distributions in the generalized linear model framework.
License: GPL-3
Encoding: UTF-8
Imports: assertthat, stats, stringr, dplyr, statmod, magrittr, MASS, tweedie, ggplot2, cplm
RoxygenNote: 7.2.3
Suggests: testthat (≥ 3.0.0), knitr, rmarkdown, covr
VignetteBuilder: knitr
Config/testthat/edition: 3
NeedsCompilation: no
Packaged: 2023-12-01 04:16:34 UTC; gmcma
Repository: CRAN
Date/Publication: 2023-12-01 10:50:04 UTC

GlmSimulatoR: Creates Ideal Data for Generalized Linear Models

Description

Creates ideal data for all distributions in the generalized linear model framework.


Pipe operator

Description

See magrittr::%>% for details.

Usage

lhs %>% rhs

Create ideal data for a generalized linear model.

Description

Create ideal data for a generalized linear model.

Usage

simulate_gaussian(
  N = 10000,
  link = "identity",
  weights = 1:3,
  x_range = 1,
  unrelated = 0,
  ancillary = 1
)

simulate_binomial(
  N = 10000,
  link = "logit",
  weights = c(0.1, 0.2),
  x_range = 1,
  unrelated = 0
)

simulate_gamma(
  N = 10000,
  link = "inverse",
  weights = 1:3,
  x_range = 1,
  unrelated = 0,
  ancillary = 0.05
)

simulate_poisson(
  N = 10000,
  link = "log",
  weights = c(0.5, 1),
  x_range = 1,
  unrelated = 0
)

simulate_inverse_gaussian(
  N = 10000,
  link = "1/mu^2",
  weights = 1:3,
  x_range = 1,
  unrelated = 0,
  ancillary = 0.3333
)

simulate_negative_binomial(
  N = 10000,
  link = "log",
  weights = c(0.5, 1),
  x_range = 1,
  unrelated = 0,
  ancillary = 1
)

simulate_tweedie(
  N = 10000,
  link = "log",
  weights = 0.02,
  x_range = 1,
  unrelated = 0,
  ancillary = 1.15
)

Arguments

N

Sample size. (Default: 10000)

link

Link function. See stats::family for details.

weights

Coefficients (betas) of the GLM, one per predictor.

x_range

Range of the x variables.

unrelated

Number of unrelated features to return. (Default: 0)

ancillary

Ancillary parameter for continuous families and negative binomial. See details.

Details

For many families, it is possible to pick weights that make the inverse link applied to X * weights mathematically invalid. For example, the log link for binomial regression defines P(Y=1) as exp(X * weights), which can exceed one. If this happens, the function stops with an error message that explains the problem.
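The invalid case described above can be reproduced by hand in base R. The following is a minimal sketch, not the package's own code: the predictor values and the helper inverse_log_link() are hypothetical, and the intercept is left out to match the exp(X * weights) expression in the text.

```r
# Hypothetical predictor values, chosen only for illustration.
X <- cbind(X1 = c(1.1, 1.5, 1.9), X2 = c(1.2, 1.4, 1.8))

# Inverse of the log link for a binomial response: P(Y = 1) = exp(X * weights).
# inverse_log_link() is a helper defined here, not a GlmSimulatoR function.
inverse_log_link <- function(X, weights) exp(as.vector(X %*% weights))

# Positive weights with positive predictors push every "probability" above one.
p_bad <- inverse_log_link(X, weights = c(0.1, 0.2))
any(p_bad > 1) # TRUE

# Negative weights keep the linear predictor below zero, so exp() stays in (0, 1).
p_ok <- inverse_log_link(X, weights = c(-0.1, -0.2))
all(p_ok > 0 & p_ok < 1) # TRUE
```

A check of this kind is presumably what triggers the error; the exact condition tested inside the package is not stated in this manual.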

The intercept in the underlying link(Y) = X * weights + intercept is always max(weights). In simulate_gaussian(link = "inverse", weights = 1:3), the model is (1/Y) = 1*X1 + 2*X2 + 3*X3 + 3.
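As a worked instance of the intercept rule, the sketch below evaluates the inverse-link model from the example for a single observation. The predictor values are hypothetical, and the result is read here as the mean of Y, which is the usual GLM interpretation and an assumption about the shorthand (1/Y).

```r
weights <- 1:3

# Hypothetical predictor values for one observation.
x <- c(X1 = 1.2, X2 = 1.5, X3 = 1.8)

# Documented rule: the intercept is always max(weights), here 3.
intercept <- max(weights)

# Linear predictor: 1*1.2 + 2*1.5 + 3*1.8 + 3 = 12.6
eta <- sum(weights * x) + intercept

# Inverse link: 1/mu = eta, so mu = 1/eta.
mu <- 1 / eta
```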

links

The default link is the first link listed for each family.

ancillary parameter

Value

A tibble with a response variable and predictors.
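To make the shape of the returned data concrete, the sketch below builds a comparable data set by hand for a Poisson response with a log link. It is not the package's implementation. The uniform predictors on [1, 2] and the use of a plain data.frame instead of a tibble are assumptions; the column names Y, X1 and X2 follow the Examples section.

```r
set.seed(1)
N <- 10000
weights <- c(0.5, 1)

# Assumption: predictors are uniform on [1, 2]; the manual does not state this.
X1 <- runif(N, min = 1, max = 2)
X2 <- runif(N, min = 1, max = 2)

# Linear predictor with the documented intercept, max(weights).
eta <- weights[1] * X1 + weights[2] * X2 + max(weights)

# Log link: the mean is exp(eta); the response is Poisson around that mean.
Y <- rpois(N, lambda = exp(eta))

simdata <- data.frame(Y = Y, X1 = X1, X2 = X2)

# Fitting the matching GLM should give estimates close to c(1, 0.5, 1),
# that is, the intercept followed by the two weights.
fit <- glm(Y ~ X1 + X2, data = simdata, family = poisson(link = "log"))
estimates <- unname(coef(fit))
```

Because the data are generated from exactly the model being fitted, the estimates land near the chosen weights, which is the sense in which the simulated data are "ideal".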

Examples

library(GlmSimulatoR)
library(ggplot2)
library(MASS)

# Do glm and lm estimate the same weights? Yes
set.seed(1)
simdata <- simulate_gaussian()
linear_model <- lm(Y ~ X1 + X2 + X3, data = simdata)
glm_model <- glm(Y ~ X1 + X2 + X3,
  data = simdata,
  family = gaussian(link = "identity")
)
summary(linear_model)
summary(glm_model)
rm(linear_model, glm_model, simdata)

# If the link is not identity, will the response
# variable still be normal? Yes
set.seed(1)
simdata <- simulate_gaussian(N = 1000, link = "log", weights = c(.1, .2))

ggplot(simdata, aes(x = Y)) +
  geom_histogram(bins = 30)
rm(simdata)

# Is AIC lower for the correct link? With ten thousand data points, it
# depends on the seed.
set.seed(1)
simdata <- simulate_gaussian(N = 10000, link = "inverse", weights = 1)
glm_correct_link <- glm(Y ~ X1,
  data = simdata,
  family = gaussian(link = "inverse")
)
glm_wrong_link <- glm(Y ~ X1,
  data = simdata,
  family = gaussian(link = "identity")
)
summary(glm_correct_link)$aic
summary(glm_wrong_link)$aic
rm(simdata, glm_correct_link, glm_wrong_link)


# Does a stepwise search find the correct model for logistic regression? Yes
# 3 related variables. 3 unrelated variables.
set.seed(1)
simdata <- simulate_binomial(
  N = 10000, link = "logit",
  weights = c(.3, .4, .5), unrelated = 3
)

scope_arg <- list(
  lower = Y ~ 1,
  upper = Y ~ X1 + X2 + X3 + Unrelated1 + Unrelated2 + Unrelated3
)

starting_model <- glm(Y ~ 1,
  data = simdata,
  family = binomial(link = "logit")
)
glm_model <- stepAIC(starting_model, scope_arg)
summary(glm_model)
rm(simdata, scope_arg, starting_model, glm_model)

# When the response is a gamma distribution, what does a scatter plot between
# X and Y look like?
set.seed(1)
simdata <- simulate_gamma(weights = 1)
ggplot(simdata, aes(x = X1, y = Y)) +
  geom_point()
rm(simdata)
