heavytails

Estimators, diagnostics, and goodness-of-fit tools for heavy-tailed distributions in R.

heavytails implements the estimators and algorithms from Chapters 8 and 9 of The Fundamentals of Heavy Tails: Properties, Emergence, and Estimation (Nair, Wierman & Zwart, 2022). It covers tail index estimation, visual diagnostics, Pareto model fitting, and a complete goodness-of-fit pipeline, using base R with no heavy dependencies.

Installation

From CRAN:

install.packages("heavytails")

Development version from GitHub:

# install.packages("remotes")
remotes::install_github("0diraf/heavytails")

Quick Example

library(heavytails)

set.seed(1)
x <- rpareto(n = 1000, alpha = 2, xm = 1)

# Tail index estimation
hill_estimator(x, k = 50)
mle_pareto(x)

# Visual diagnostics
hill_plot(x)
rank_plot(x)

# Goodness-of-fit pipeline
fit  <- plfit(x)
xmin <- fit$xmin
alph <- fit$alpha

ks_gof(x, alpha = alph, xm = xmin, n_boot = 500)
lr_test_pareto(x, alpha = alph, xmin = xmin)

Functions

Tail Index Estimators

| Function | Description | Reference |
| --- | --- | --- |
| hill_estimator() | Hill (1975) MLE on top-k order statistics | Ch. 9, §9.2 |
| moments_estimator() | Dekkers-Einmahl-de Haan (1989) moments estimator | Ch. 9, §9.3 |
| pickands_estimator() | Pickands (1975) spacing-based estimator | Ch. 9, §9.3 |
| pot_estimator() | GPD fit via MLE on excesses over threshold u | Ch. 9, §9.4 |
| plfit() | KS-minimization for optimal k̂ and α̂ (Clauset et al. 2009) | Ch. 9, §9.5 |
| doublebootstrap() | Automatic k selection via double bootstrap (Danielsson et al. 2001) | Ch. 9, §9.5 |
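
As a rough illustration of the two ξ-scale estimators in this table, the base R sketch below writes out the textbook formulas for the moments and Pickands estimators. The names moments_sketch and pickands_sketch are hypothetical and are not part of heavytails; the package's own functions may use different indexing or tie-handling conventions. The test data are exact Pareto quantiles on a regular grid (an assumption made to keep the example deterministic), so both estimates should land near ξ = 1/α = 0.5.

```r
# Base R sketch; function names are hypothetical, not the heavytails API.

# Dekkers-Einmahl-de Haan moments estimator of xi from the top k order statistics.
moments_sketch <- function(x, k) {
  xs <- sort(x, decreasing = TRUE)
  stopifnot(k >= 1, k < length(xs), xs[k + 1] > 0)
  d  <- log(xs[1:k]) - log(xs[k + 1])   # log-excesses over the (k+1)-th largest value
  m1 <- mean(d)                         # first empirical moment (the Hill estimate of xi)
  m2 <- mean(d^2)                       # second empirical moment
  m1 + 1 - 0.5 / (1 - m1^2 / m2)
}

# Pickands estimator of xi from the k-th, 2k-th and 4k-th largest values.
pickands_sketch <- function(x, k) {
  xs <- sort(x, decreasing = TRUE)
  stopifnot(k >= 1, 4 * k <= length(xs))
  log((xs[k] - xs[2 * k]) / (xs[2 * k] - xs[4 * k])) / log(2)
}

# Deterministic stand-in for a Pareto(xm = 1, alpha = 2) sample:
# exact quantiles on a regular probability grid.
n <- 10000
p <- (seq_len(n) - 0.5) / n
x <- (1 - p)^(-1 / 2)

xi_moments  <- moments_sketch(x, k = 500)
xi_pickands <- pickands_sketch(x, k = 500)
```

Both estimators target ξ rather than α, which is why the corresponding plots in the Visual Diagnostics section show ξ̂ against k.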

Pareto Estimation

| Function | Description |
| --- | --- |
| mle_pareto() | Parametric MLE for Pareto(xm, α) with optional bias correction |
| wls_pareto() | Weighted least-squares log-rank regression estimator |
| ks_xmin() | KS-based selection of the optimal xmin threshold |
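
For readers who want to see what a parametric Pareto MLE with bias correction can look like, here is a minimal base R sketch. It assumes the scale xm is known, in which case the closed-form estimate is n / Σ log(x_i / xm) and multiplying by (n - 1) / n removes its upward bias. The function pareto_mle_sketch and its bias_correct argument are hypothetical; mle_pareto() may define its correction differently, for example when xm is estimated from the data.

```r
# Base R sketch; pareto_mle_sketch is a hypothetical name, not the heavytails API.
# Assumes the scale xm is known and every observation satisfies x >= xm.
pareto_mle_sketch <- function(x, xm, bias_correct = FALSE) {
  stopifnot(xm > 0, all(x >= xm))
  n <- length(x)
  alpha_hat <- n / sum(log(x / xm))        # closed-form maximum likelihood estimate
  if (bias_correct) {
    alpha_hat <- alpha_hat * (n - 1) / n   # unbiased version when xm is known
  }
  alpha_hat
}

set.seed(1)
n <- 5000
x <- runif(n)^(-1 / 2)                     # inverse-transform draw from Pareto(xm = 1, alpha = 2)

alpha_raw <- pareto_mle_sketch(x, xm = 1)
alpha_bc  <- pareto_mle_sketch(x, xm = 1, bias_correct = TRUE)
```

With thousands of observations the two values are nearly identical; the correction matters mainly for small tail samples.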

Visual Diagnostics

| Function | Description |
| --- | --- |
| hill_plot() | Hill plot: α̂ vs. k to assess stability |
| moments_plot() | Moments estimator plot: ξ̂ vs. k |
| pickands_plot() | Pickands estimator plot: ξ̂ vs. k |
| rank_plot() | Log-rank vs. log-x plot for Pareto linearity |
| qq_pareto() | Pareto Q-Q plot |

All plot functions return the underlying data.frame invisibly, so results can be captured for custom plotting with ggplot2 or base R.

Goodness-of-Fit

| Function | Description |
| --- | --- |
| ks_gof() | Bootstrap KS test for Pareto fit |
| lr_test_pareto() | Likelihood-ratio test: Pareto vs. alternative distributions |
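
The idea behind a bootstrap KS test can be sketched in base R as follows. This is a simplified version under stated assumptions: xm is treated as known, only α is refitted on each synthetic sample, and the helper names (ks_distance, fit_alpha, boot_ks_pvalue) are hypothetical. It is not the implementation of ks_gof(), which may also re-estimate the threshold.

```r
# Base R sketch of a parametric-bootstrap KS test; all names are hypothetical.

# KS distance between the empirical CDF of x and a Pareto(xm, alpha) CDF.
ks_distance <- function(x, alpha, xm) {
  xs <- sort(x)
  n  <- length(xs)
  f  <- 1 - (xm / xs)^alpha        # fitted CDF at the sorted observations
  i  <- seq_len(n)
  max(f - (i - 1) / n, i / n - f)  # largest gap on either side of each ECDF jump
}

# Closed-form MLE of alpha for a known scale xm.
fit_alpha <- function(x, xm) length(x) / sum(log(x / xm))

# p-value: share of synthetic Pareto samples whose refitted KS distance
# is at least as large as the observed one.
boot_ks_pvalue <- function(x, xm, n_boot = 200) {
  n         <- length(x)
  alpha_hat <- fit_alpha(x, xm)
  d_obs     <- ks_distance(x, alpha_hat, xm)
  d_sim <- replicate(n_boot, {
    y <- xm * runif(n)^(-1 / alpha_hat)   # draw from the fitted Pareto model
    ks_distance(y, fit_alpha(y, xm), xm)  # refit before measuring the distance
  })
  mean(d_sim >= d_obs)
}

set.seed(1)
x_pareto <- runif(1000)^(-1 / 2)   # Pareto(xm = 1, alpha = 2)
x_exp    <- 1 + rexp(1000)         # light-tailed alternative on [1, Inf)

p_pareto <- boot_ks_pvalue(x_pareto, xm = 1)
p_exp    <- boot_ks_pvalue(x_exp, xm = 1)
```

A small p-value, as expected for the shifted exponential sample, indicates that the data are further from the fitted Pareto than synthetic Pareto samples typically are.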

Pareto Utilities

| Function | Description |
| --- | --- |
| rpareto() | Generate Pareto(xm, α) random variates |
| pareto_cdf() | Pareto CDF |
| dpareto() | Pareto density |
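
For reference, the Pareto(xm, α) formulas behind utilities of this kind fit in a few lines of base R. The sketch below uses hypothetical names with a _sketch suffix so they do not clash with the package functions; argument names and edge-case handling in heavytails may differ.

```r
# Base R sketch of the Pareto(xm, alpha) distribution; names are hypothetical.

# CDF: F(q) = 1 - (xm / q)^alpha for q >= xm, and 0 below xm.
pareto_cdf_sketch <- function(q, alpha, xm) {
  ifelse(q < xm, 0, 1 - (xm / q)^alpha)
}

# Density: f(x) = alpha * xm^alpha / x^(alpha + 1) for x >= xm, and 0 below xm.
pareto_pdf_sketch <- function(x, alpha, xm) {
  ifelse(x < xm, 0, alpha * xm^alpha / x^(alpha + 1))
}

# Sampler via inverse transform: if U ~ Uniform(0, 1), then xm * U^(-1 / alpha) is Pareto.
pareto_rng_sketch <- function(n, alpha, xm) {
  xm * runif(n)^(-1 / alpha)
}

set.seed(1)
draws <- pareto_rng_sketch(1000, alpha = 2, xm = 1)

cdf_at_2 <- pareto_cdf_sketch(2, alpha = 2, xm = 1)   # 1 - (1/2)^2
pdf_area <- integrate(pareto_pdf_sketch, lower = 1, upper = Inf,
                      alpha = 2, xm = 1)$value        # density should integrate to 1
```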

Example: Hill Plot

A Hill plot shows how the tail index estimate changes with k, the number of top order statistics used. A stable plateau across a range of k values is evidence that the tail follows a power law.

library(heavytails)

set.seed(42)
x <- rpareto(n = 2000, alpha = 1.5, xm = 1)

hill_plot(x, alpha_true = 1.5,
          main = "Hill Plot", col = "steelblue")

When alpha_true is supplied, a dashed red line marks the true value, which is useful in simulation studies.
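
To make the quantity on the vertical axis concrete, the base R sketch below computes Hill estimates for every k at once using a cumulative sum. The function hill_path_sketch and the column names of its result are hypothetical; the data.frame returned by hill_plot() may be laid out differently.

```r
# Base R sketch; hill_path_sketch is a hypothetical name, not the heavytails API.
hill_path_sketch <- function(x) {
  xs <- sort(x[x > 0], decreasing = TRUE)
  n  <- length(xs)
  lx <- log(xs)
  k  <- seq_len(n - 1)
  # Hill estimate of xi = 1 / alpha: mean of the top-k log values
  # minus the log of the (k+1)-th largest value.
  xi <- cumsum(lx)[k] / k - lx[k + 1]
  data.frame(k = k, alpha_hat = 1 / xi)
}

set.seed(42)
x <- runif(2000)^(-1 / 1.5)   # inverse-transform draw from Pareto(xm = 1, alpha = 1.5)

path   <- hill_path_sketch(x)
stable <- path$alpha_hat[path$k >= 100 & path$k <= 1000]   # a mid-range window of k
```

Plotting alpha_hat against k, for instance with plot(path$k, path$alpha_hat, type = "l"), gives the kind of curve a Hill plot displays. For exact Pareto data the curve is flat apart from noise at small k, while for other heavy-tailed data a plateau typically appears only over part of the range.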

Example: Full Clauset et al. Pipeline

library(heavytails)

set.seed(1)
x <- rpareto(n = 1000, alpha = 2, xm = 1)

# Step 1: estimate xmin and alpha
fit <- plfit(x)

# Step 2: goodness-of-fit test
gof <- ks_gof(x, alpha = fit$alpha, xm = fit$xmin, n_boot = 1000)
gof$p_value   # > 0.1 → cannot reject Pareto

# Step 3: compare against alternative distributions
lr_test_pareto(x, alpha = fit$alpha, xmin = fit$xmin)
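
The comparison in Step 3 can be illustrated with a base R sketch of a normalized log-likelihood ratio test in the style of Clauset et al. (2009). It compares a Pareto tail against a single alternative, a shifted exponential, chosen here only for simplicity. The function lr_pareto_vs_exp_sketch and its return fields are hypothetical, and the set of alternatives and the output format of lr_test_pareto() are not shown in this README.

```r
# Base R sketch; lr_pareto_vs_exp_sketch is a hypothetical name, not the heavytails API.
# Compares Pareto(xmin, alpha) with a shifted exponential on the tail x >= xmin.
lr_pareto_vs_exp_sketch <- function(x, xmin) {
  tail_x <- x[x >= xmin]
  n      <- length(tail_x)

  alpha_hat <- n / sum(log(tail_x / xmin))   # Pareto MLE on the tail
  rate_hat  <- 1 / mean(tail_x - xmin)       # exponential MLE on the excesses

  # Pointwise log-likelihoods under each fitted model.
  ll_pareto <- log(alpha_hat) - log(xmin) - (alpha_hat + 1) * log(tail_x / xmin)
  ll_exp    <- log(rate_hat) - rate_hat * (tail_x - xmin)

  d     <- ll_pareto - ll_exp
  ratio <- sum(d)                            # positive values favour the Pareto model
  z     <- ratio / (sd(d) * sqrt(n))         # normalized by the sample standard deviation
  list(ratio = ratio, z = z, p_value = 2 * pnorm(-abs(z)))
}

set.seed(1)
x   <- runif(1000)^(-1 / 2)                  # Pareto(xm = 1, alpha = 2)
res <- lr_pareto_vs_exp_sketch(x, xmin = 1)
```

The sign of ratio indicates which model fits better, and p_value indicates whether that sign is statistically distinguishable from zero. A large p-value means the data do not discriminate between the two models.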

Reference

Nair, J., Wierman, A., & Zwart, B. (2022). The Fundamentals of Heavy Tails: Properties, Emergence, and Estimation. Cambridge University Press. doi:10.1017/9781009053730

License

MIT
