Standard parametric survival models force you to choose from a short list of hazard shapes: constant (exponential), monotone power-law (Weibull), exponential growth (Gompertz). But real failure patterns rarely cooperate: hardware often shows infant mortality followed by a stable useful life and then wear-out (the bathtub curve), and some risks rise to a peak and then diminish. Each of these demands a hazard function that standard families cannot express. You can reach for semi-parametric methods (Cox regression), but then you lose the ability to extrapolate, simulate, or compute closed-form quantities like mean time to failure.
flexhaz takes a different approach: you write the hazard function, and the package derives everything else.
Given a hazard (failure rate) function \(h(t)\), the full distribution follows:
\[ h(t) \;\xrightarrow{\;\int\;}\; H(t) = \int_0^t h(u)\,du \;\xrightarrow{\;\exp\;}\; S(t) = e^{-H(t)} \;\xrightarrow{\;\times h\;}\; f(t) = h(t)\,S(t) \]
If you can express \(h(t)\) as an R function of time and parameters, flexhaz gives you survival curves, CDFs, densities, quantiles, random sampling, log-likelihoods, MLE fitting, and residual diagnostics — automatically.
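The chain above can be sketched in a few lines of base R, with numerical integration standing in for the package's machinery (no flexhaz calls are used; this is just the math from the diagram):

```r
# h -> H -> S -> f, derived numerically from an arbitrary hazard function
h <- function(t) rep(0.1, length(t))       # constant hazard, lambda = 0.1

H <- function(t) integrate(h, 0, t)$value  # cumulative hazard H(t)
S <- function(t) exp(-H(t))                # survival S(t) = exp(-H(t))
f <- function(t) h(t) * S(t)               # density  f(t) = h(t) * S(t)

S(10)  # ~ 0.368, i.e. exp(-1)
f(10)  # ~ 0.0368
```

Swap any hazard you like into `h` and the same three lines produce the rest of the distribution; that is the whole idea behind the package.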
Let’s start with the simplest case: the exponential distribution with constant failure rate.
# Exponential with failure rate lambda = 0.1 (MTTF = 10 time units)
exp_dist <- dfr_exponential(lambda = 0.1)
print(exp_dist)
#> Dynamic failure rate (DFR) distribution with failure rate:
#> function (t, par, ...)
#> {
#> rep(par[[1]], length(t))
#> }
#> <bytecode: 0x6099ff6195a8>
#> <environment: 0x6099fd6f3d50>
#> It has a survival function given by:
#> S(t|rate) = exp(-H(t,...))
#> where H(t,...) is the cumulative hazard function.

All distribution functions follow a two-step pattern — the method returns a closure that you then call with time values:
# Get closures
S <- surv(exp_dist)
h <- hazard(exp_dist)
f <- density(exp_dist)
# Evaluate at specific times
S(10) # P(survive past t=10) = exp(-0.1 * 10) ~ 0.37
#> [1] 0.3678794
h(10) # Hazard at t=10 = 0.1 (constant for exponential)
#> [1] 0.1
f(10) # PDF at t=10
#> [1] 0.03678794

You can verify these analytically: \(S(10) = e^{-0.1 \times 10} = e^{-1} \approx 0.368\), the hazard is constant at \(\lambda = 0.1\), and \(f(10) = \lambda \, S(10) = 0.1 \times 0.368 \approx 0.0368\).
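The same numbers fall out of base R's exponential functions, using the identity \(h(t) = f(t)/S(t)\):

```r
# Cross-check against base R's exponential distribution
lambda <- 0.1
S10 <- pexp(10, rate = lambda, lower.tail = FALSE)  # survival, ~ 0.368
f10 <- dexp(10, rate = lambda)                      # density,  ~ 0.0368
h10 <- f10 / S10                                    # hazard = f / S, exactly 0.1
```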
Now let’s fit a model to survival data using maximum likelihood estimation.
set.seed(123)
n <- 50
df <- data.frame(
t = rexp(n, rate = 0.08), # True lambda = 0.08
delta = rep(1, n) # All exact observations
)
head(df)
#> t delta
#> 1 10.5432158 1
#> 2 7.2076284 1
#> 3 16.6131858 1
#> 4 0.3947170 1
#> 5 0.7026372 1
#> 6 3.9562652 1

# Template with no parameters (to be estimated)
exp_template <- dfr_exponential()
solver <- fit(exp_template)
result <- solver(df, par = c(0.1)) # Initial guess
coef(result)
#> [1] 0.07077333
# Compare to analytical MLE: lambda_hat = n / sum(t)
n / sum(df$t)
#> [1] 0.07077323

The numerical MLE matches the closed-form solution \(\hat\lambda = n / \sum t_i\) to five decimal places. Both estimate \(\lambda \approx 0.071\), which is consistent with the true value of 0.08 given the sampling variability at \(n = 50\).
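Under the hood, `fit()` is maximizing the log-likelihood numerically. For exact exponential observations that likelihood is \(n \log\lambda - \lambda \sum t_i\), and the same fit can be reproduced with base R alone:

```r
# Base-R sketch of the same MLE (no flexhaz needed)
set.seed(123)
t_obs <- rexp(50, rate = 0.08)

# Negative log-likelihood for exact exponential observations
negll <- function(lambda) -(length(t_obs) * log(lambda) - lambda * sum(t_obs))

opt <- optimize(negll, interval = c(1e-6, 1))
opt$minimum                  # numerical MLE
length(t_obs) / sum(t_obs)   # closed-form MLE: n / sum(t)
```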
Real survival data often includes censoring — observations where the exact failure time is not directly observed. The flexhaz package uses a delta column to encode three types of observation:
| `delta` | Meaning | Log-likelihood contribution |
|---|---|---|
| `1` | Exact failure | \(\log h(t) - H(t)\) |
| `0` | Right-censored (survived past \(t\)) | \(-H(t)\) |
| `-1` | Left-censored (failed before \(t\)) | \(\log(1 - e^{-H(t)})\) |
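These three contributions are easy to write down directly. As a concrete check, here is a base-R sketch for an exponential hazard, where \(h(t) = \lambda\) and \(H(t) = \lambda t\):

```r
# Per-observation log-likelihood contribution, exponential hazard
loglik_obs <- function(t, delta, lambda) {
  H <- lambda * t
  if (delta == 1) {
    log(lambda) - H        # exact failure:   log h(t) - H(t)
  } else if (delta == 0) {
    -H                     # right-censored:  -H(t)
  } else {
    log(1 - exp(-H))       # left-censored:   log(1 - exp(-H(t)))
  }
}

loglik_obs(10, 1, 0.1)   # exact failure at t = 10
loglik_obs(10, 0, 0.1)   # still running at t = 10
loglik_obs(10, -1, 0.1)  # found already failed at inspection time t = 10
```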
Right-censoring is the most common type — the subject was still alive (or the device was still running) when observation ended.
df_censored <- data.frame(
t = c(5, 8, 12, 15, 20, 25, 30, 30, 30, 30),
delta = c(1, 1, 1, 1, 1, 0, 0, 0, 0, 0) # Last 5 right-censored at t=30
)
solver <- fit(dfr_exponential())
result <- solver(df_censored, par = c(0.1))
coef(result)
#> [1] 0.02439454

The censored MLE properly accounts for the five units that survived past \(t = 30\). You can verify: the analytical MLE for censored exponential data is \(\hat\lambda = d / \sum t_i\) where \(d\) is the number of failures, giving \(5 / 205 \approx 0.024\) — exactly what the optimizer returns.
Left-censoring arises when we know failure occurred before an inspection time, but not exactly when. This is common in periodic-inspection studies: you check a device at time \(t\) and discover it has already failed.
df_left <- data.frame(
t = c(5, 10, 15, 20),
delta = c(-1, -1, 1, 0) # left-censored, left-censored, exact, right-censored
)
solver <- fit(dfr_exponential())
result <- solver(df_left, par = c(0.1))
coef(result)
#> [1] 0.07184419

This dataset mixes all three observation types: two left-censored, one exact failure at \(t = 15\), and one right-censored at \(t = 20\). The estimate \(\hat\lambda \approx 0.072\) is higher than a right-censored-only estimate because the left-censored observations provide evidence of earlier failures. All three types can coexist in the same dataset — the log-likelihood sums the appropriate contribution for each observation.
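You can reproduce this mixed-censoring fit with base R by summing the three contribution types from the table above (again assuming the exponential hazard, so \(H(t) = \lambda t\)):

```r
# Base-R sketch of the mixed-censoring log-likelihood
t_obs <- c(5, 10, 15, 20)
delta <- c(-1, -1, 1, 0)

negll <- function(lambda) {
  H <- lambda * t_obs
  ll <- ifelse(delta == 1, log(lambda) - H,      # exact failure
        ifelse(delta == 0, -H,                   # right-censored
                           log(1 - exp(-H))))    # left-censored
  -sum(ll)
}

optimize(negll, c(1e-6, 1))$minimum  # ~ 0.0718, matching the flexhaz fit
```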
By default, flexhaz looks for columns named t (observation time) and delta (event indicator). When your data uses different column names — as is common with clinical datasets — use the ob_col and delta_col arguments:
clinical_data <- data.frame(
time = c(5, 8, 12, 15, 20),
status = c(1, 1, 1, 0, 0)
)
dist <- dfr_exponential()
dist$ob_col <- "time"
dist$delta_col <- "status"
solver <- fit(dist)
result <- solver(clinical_data, par = c(0.1))
coef(result)
#> [1] 0.05

The estimate \(\hat\lambda = 0.05\) equals \(3 / 60 = d / \sum t_i\), confirming that the column mapping works correctly.
You can also set custom column names at construction time via dfr_dist().
The exponential assumes constant failure rate. For increasing or decreasing failure rates, use Weibull:
# Weibull with increasing failure rate (wear-out)
weib_dist <- dfr_weibull(shape = 2, scale = 20)
# Compare hazard functions
plot(weib_dist, what = "hazard", xlim = c(0, 30), col = "blue",
main = "Hazard Comparison")
plot(dfr_exponential(lambda = 0.05), what = "hazard", add = TRUE, col = "red")
legend("topleft", c("Weibull (shape=2)", "Exponential"),
col = c("blue", "red"), lwd = 2)

Weibull shape parameter interpretation:

- shape < 1: decreasing hazard (infant mortality)
- shape = 1: constant hazard (reduces to the exponential)
- shape > 1: increasing hazard (wear-out)
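The three shape regimes can be checked numerically from base R alone, using the identity \(h(t) = f(t)/S(t)\) with `dweibull()`/`pweibull()` (same shape/scale parameterization as `dfr_weibull()`):

```r
# Weibull hazard via h(t) = f(t) / S(t); closed form is (k/sigma)*(t/sigma)^(k-1)
weib_hazard <- function(t, shape, scale) {
  dweibull(t, shape, scale) / pweibull(t, shape, scale, lower.tail = FALSE)
}

weib_hazard(c(1, 10, 20), shape = 0.5, scale = 20)  # decreasing: infant mortality
weib_hazard(c(1, 10, 20), shape = 1,   scale = 20)  # constant 1/20: exponential
weib_hazard(c(1, 10, 20), shape = 2,   scale = 20)  # increasing: wear-out
```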
Where flexhaz truly shines is modeling non-standard hazard patterns that no built-in distribution can express.
\[h(t) = a\,e^{-bt} + c + d\,t^k\]
Three additive components: exponential decay (infant mortality), constant baseline (useful life), and power-law growth (wear-out).
bathtub <- dfr_dist(
rate = function(t, par, ...) {
a <- par[[1]] # infant mortality magnitude
b <- par[[2]] # infant mortality decay
c <- par[[3]] # baseline hazard
d <- par[[4]] # wear-out coefficient
k <- par[[5]] # wear-out exponent
a * exp(-b * t) + c + d * t^k
},
par = c(a = 0.5, b = 0.5, c = 0.01, d = 5e-5, k = 2.5)
)

h <- hazard(bathtub)
t_seq <- seq(0.01, 30, length.out = 300)
plot(t_seq, sapply(t_seq, h), type = "l", lwd = 2, col = "darkred",
xlab = "Time", ylab = "Hazard rate h(t)",
main = "Bathtub Hazard Function")

plot(bathtub, what = "survival", xlim = c(0, 30),
col = "darkblue", lwd = 2,
main = "Survival Function S(t)")

# Generate failure times, right-censored at t = 25
set.seed(42)
samp <- sampler(bathtub)
raw_times <- samp(80)
censor_time <- 25
df <- data.frame(
t = pmin(raw_times, censor_time),
delta = as.integer(raw_times <= censor_time)
)
cat("Observed failures:", sum(df$delta), "out of", nrow(df), "\n")
#> Observed failures: 68 out of 80
cat("Right-censored:", sum(1 - df$delta), "\n")
#> Right-censored: 12
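A sampler like this can be built by inverse-transform sampling: draw \(u \sim \mathrm{Uniform}(0,1)\) and solve \(S(t) = u\) for \(t\). Here is a base-R sketch of the idea for the exponential case, where the closed-form inverse \(t = -\log(u)/\lambda\) is available as a check (this illustrates the principle, not flexhaz's exact implementation):

```r
# Inverse-transform sampling by numerically inverting S(t)
set.seed(42)
S <- function(t) exp(-0.1 * t)                  # survival for lambda = 0.1

sample_one <- function(u) {
  uniroot(function(t) S(t) - u, c(0, 1e4))$root # solve S(t) = u
}

u <- runif(5)
t_num   <- vapply(u, sample_one, numeric(1))    # numerical inversion
t_exact <- -log(u) / 0.1                        # closed-form inverse
max(abs(t_num - t_exact))                       # small numerical error
```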
# Fit the model
solver <- fit(bathtub)
result <- solver(df,
par = c(0.3, 0.3, 0.02, 1e-4, 2),
method = "Nelder-Mead"
)

cat("Estimated parameters:\n")
#> Estimated parameters:
print(coef(result))
#> [1] 0.4828059584 0.6478481588 0.0128496924 0.0003329348 1.7388288598
cat("\nTrue parameters:\n")
#>
#> True parameters:
print(c(a = 0.5, b = 0.5, c = 0.01, d = 5e-5, k = 2.5))
#> a b c d k
#> 0.50000 0.50000 0.01000 0.00005 2.50000

With five parameters and only 80 observations, the optimizer recovers the general shape but not every parameter exactly. In particular, \(d\) and \(k\) trade off: a smaller exponent with a larger coefficient can produce a similar wear-out curve. The Q-Q plot below is the real test of fit quality.
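The idea behind that diagnostic is the Cox-Snell residual: if the model is correct, the fitted cumulative hazards \(\hat H(t_i)\) behave like (censored) draws from a unit exponential, so their Q-Q plot against \(\mathrm{Exp}(1)\) quantiles should be close to the identity line. A base-R sketch of the check, using simulated exponential data and the true \(H\) as a stand-in for the fitted one:

```r
# Cox-Snell residual idea: H(t_i) under a correct model is ~ Exp(1)
set.seed(1)
t_obs <- rexp(200, rate = 0.08)
cs <- 0.08 * t_obs                # cumulative hazards H(t_i)

# Numeric Q-Q comparison against unit-exponential quantiles
p <- ppoints(length(cs))
cor(sort(cs), qexp(p))            # close to 1 for a correct model
```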
For common parametric families, the package provides optimized constructors with analytical cumulative hazard and score functions:
| Constructor | Hazard Shape | Parameters | Use Case |
|---|---|---|---|
| `dfr_exponential(lambda)` | Constant | failure rate | Random failures, useful life |
| `dfr_weibull(shape, scale)` | Power-law | shape \(k\), scale \(\sigma\) | Wear-out, infant mortality |
| `dfr_gompertz(a, b)` | Exponential growth | initial rate, growth rate | Biological aging |
| `dfr_loglogistic(alpha, beta)` | Non-monotonic (hump) | scale, shape | Initial risk that diminishes |
Use these when a standard family fits your problem — they include analytical derivatives for faster, more accurate MLE.
| Function | Purpose |
|---|---|
| `hazard()` | Get hazard (failure rate) function |
| `surv()` | Get survival function \(S(t)\) |
| `density()` | Get PDF \(f(t)\) |
| `cdf()` | Get CDF \(F(t)\) |
| `inv_cdf()` | Get quantile function |
| `sampler()` | Generate random samples |
| `fit()` | Maximum likelihood estimation |
| `loglik()` | Get log-likelihood function |
| `residuals()` | Model diagnostics (Cox-Snell, Martingale) |
| `plot()` | Visualize distribution functions |
| `qqplot_residuals()` | Q-Q plot for goodness-of-fit |
- `vignette("reliability_engineering")` — Five real-world case studies
- `vignette("failure_rate")` — Mathematical foundations of hazard-based modeling
- `vignette("custom_distributions")` — The three-level optimization paradigm
- `vignette("custom_derivatives")` — Analytical score and Hessian functions