The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Introduction to DEmixR

Farrokh Habibzadeh

2025-09-21

Introduction

The DEmixR package provides tools for fitting and evaluating two-component mixture models (normal and lognormal) using robust global optimization via Differential Evolution (DEoptim), with local refinement by quasi-Newton optimization.

It is designed to be more robust than standard EM-based methods, particularly in challenging situations where the mixture components overlap strongly or have complex likelihood surfaces.

Installation

You can install the released version of DEmixR from CRAN with:

install.packages("DEmixR")

Overview of Functions

prelim_plots() – Diagnostic plots (histogram, QQ, PP, log-QQ) select_best_mixture() – Compare lognormal vs normal mixtures using BIC fit_norm2() – Fit a two-component normal mixture fit_lognorm2() – Fit a two-component lognormal mixture bootstrap_mix2() – Bootstrap mixture parameters (parametric or nonparametric) evaluate_init() – Refine initial parameter values with local optimization

When to Use Which Function

Use prelim_plots() for initial data exploration and distribution assessment Use select_best_mixture() when unsure whether normal or lognormal distribution is appropriate Use fit_norm2()/fit_lognorm2() when you know the appropriate distribution family Use bootstrap_mix2() for uncertainty quantification and confidence intervals Use evaluate_init() for testing specific starting values or refining solutions

Two-Component Normal Mixture

We first generate synthetic data from a two-component normal mixture and draw the basic plots with DEmixR.

Diagnostic Plots

# Load the package
library(DEmixR)

# Simulation data
set.seed(123)
x <- c(rnorm(3000, mean = 0, sd = 1),
       rnorm(2000, mean = 3, sd = 1.2))

# Diagnostic plots
prelim_plots(x, which = c("hist", "qq"))

Model Selection

# Automatically choose the best mixture family
best_model <- select_best_mixture(x, n_runs = 3, NP = 50, itermax = 2000)

##   |                                                                              |                                                                      |   0%  |                                                                              |+++++++++++++++++++++++                                               |  33%  |                                                                              |+++++++++++++++++++++++++++++++++++++++++++++++                       |  67%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| 100%

best_model$best$family

## [1] "normal"

best_model$BICs

##   normal 
## 19547.24

Normal Mixture

# Fit a two-component normal mixture
fit <- fit_norm2(x, n_runs = 5, quiet = 2, parallelType = 0)

##   |                                                                              |                                                                      |   0%  |                                                                              |++++++++++++++                                                        |  20%  |                                                                              |++++++++++++++++++++++++++++                                          |  40%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++                            |  60%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++++++++++++++++              |  80%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| 100%

fit$par      # estimated parameters

##           p          m1          s1          m2          s2 
## 0.599738895 0.005086783 0.983210109 2.984878390 1.181022679

fit$logLik   # log-likelihood

## [1] -9752.326

fit$AIC      # Akaike Information Criterion

## [1] 19514.65

fit$BIC      # Bayesian Information Criterion

## [1] 19547.24

Lognormal Mixture

# Simulation data
set.seed(123)
y <- c(rlnorm(3000, meanlog = 0, sdlog = 0.5),
       rlnorm(2000, meanlog = 1, sdlog = 0.7))

# Fit a two-component lognormal mixture
fit_ln <- fit_lognorm2(y, n_runs = 5, parallelType = 0)

##   |                                                                              |                                                                      |   0%  |                                                                              |++++++++++++++                                                        |  20%  |                                                                              |++++++++++++++++++++++++++++                                          |  40%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++                            |  60%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++++++++++++++++              |  80%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| 100%

fit_ln$par

##           p          m1          s1          m2          s2 
##  0.57503793 -0.01180295  0.48857490  0.95267679  0.69951039

Bootstrap

# Bootstrap confidence intervals
boot_res <- bootstrap_mix2(fit_ln, B = 100, parametric = TRUE, parallelType = 0)
boot_res$central

##           p          m1          s1          m2          s2 
##  0.57428385 -0.01681622  0.48844615  0.95834689  0.69416098

boot_res$ci

##               p          m1        s1        m2        s2
## 2.5%  0.4115206 -0.06678695 0.4343919 0.7041057 0.6095682
## 97.5% 0.7041991  0.05896932 0.5272649 1.2061600 0.7729422

Evaluate Initialization Values

evaluate_init(par_init = c(0.5, 0, 0.6, 1.1, 0.8), y, family = "lognormal")

## $success
## [1] TRUE
## 
## $par
## [1]  0.57497232 -0.01185789  0.48853997  0.95258083  0.69950326
## 
## $logLik
## [1] -7547.12
## 
## $convergence
## [1] 0
## 
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"

Practical Example: Biomarker Data Analysis

# Simulated biomarker data with two populations
set.seed(456)
biomarker <- c(rlnorm(4000, meanlog = 2.5, sdlog = 0.4),  # Healthy group
               rlnorm(1000, meanlog = 3.8, sdlog = 0.6))  # Disease group

# Initial exploration
prelim_plots(biomarker, which = c("hist", "logqq"))

# Fit lognormal mixture
fit_bio <- try(fit_lognorm2(biomarker, n_runs = 5, parallelType = 0, quiet = 2))

##   |                                                                              |                                                                      |   0%  |                                                                              |++++++++++++++                                                        |  20%  |                                                                              |++++++++++++++++++++++++++++                                          |  40%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++                            |  60%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++++++++++++++++              |  80%  |                                                                              |++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| 100%

if (!inherits(fit_bio, "try-error")) {
  cat("Biomarker analysis results:\n")
  print(fit_bio$par)
  
  # Estimate proportion in each group
  cat("\nEstimated proportion in group 1 (healthy):", 
      round(fit_bio$par[1], 3), "\n")
  cat("Estimated proportion in group 2 (disease):", 
      round(1 - fit_bio$par[1], 3), "\n")
  
  # Bootstrap for confidence intervals
  boot_bio <- try(bootstrap_mix2(fit_bio, B = 100, quiet = 2))
  if (!inherits(boot_bio, "try-error")) {
    cat("\n95% CI for proportion in group 1:\n")
    print(boot_bio$ci[, "p"])
  }
}

## Biomarker analysis results:
##         p        m1        s1        m2        s2 
## 0.7942352 2.5045852 0.3877220 3.7999468 0.6123135 
## 
## Estimated proportion in group 1 (healthy): 0.794 
## Estimated proportion in group 2 (disease): 0.206 
## 
## 95% CI for proportion in group 1:
##      2.5%     97.5% 
## 0.7543291 0.8244741

Advanced Usage

This section illustrates some of the optional parameters for fine-tuning performance and diagnostics.

Fitting with `fit_*`

# Fit lognormal mixture with custom optimization settings
fit_ln <- fit_lognorm2(
  x,
  NP = 150,           # population size for DEoptim
  n_runs = 20,        # independent runs to avoid local optima
  itermax = 200,      # maximum iterations per run
  parallelType = 1,   # use parallelization across platforms
  quiet = 1,          # print minimal progress info
  par_init = NULL     # optionally supply initial values
)

Key options: - NP: larger = better exploration, but slower. - n_runs: increase for more robustness; with 1000, it is very slow, but have ultra robustness - parallelType: 0 = serial, 1 = socket clusters (Windows/Mac/Linux), 2 = forking (Mac/Linux only). - quiet: 0 = verbose, 1 = minimal, 2 = silent.

Bootstrapping with `bootstrap_mix2`

boot_res <- bootstrap_mix2(
  fit_ln,
  B = 500,            # number of bootstrap replications
  parallelType = 1,   # parallel computation
  parametric = TRUE,  # parametric bootstrap
  ci_level = 0.90,    # confidence interval level
  quiet = 1           # show progress bar
)

boot_res$central   # means for normal and medians for lognormal
boot_res$ci        # bootstrap confidence intervals

Notes: - Supports both parametric and nonparametric bootstrap. - Uses an internal function to avoid label switching.

Preliminary Plots with `prelim_plots`

prelim_plots(
  x,
  which = c("hist", "qq", "pp", "logqq"),
  hist_bins = 40,
  col_hist = "grey85",
  col_density = "darkorange",
  col_qq = "grey60",
  col_line = "darkorange"
)

Options: - which: choose one or more of "hist", "qq", "pp", "logqq". - hist_bins: number of bins for histogram. - col_*: customize plot colors.

Model Selection with `select_best_mixture`

sel <- select_best_mixture(
  x,
  n_runs = 5,
  NP = 100,
  itermax = 150,
  quiet = 2
)

sel$best$family
sel$best$par

What it does: - Fits both Normal and Lognormal mixtures. - Compares them via BIC. - Returns the best-performing family.

Evaluating Initial Values with `evaluate_init`

init_eval <- evaluate_init(
  x,
  family = "normal",
  n_starts = 10,      # number of random initializations
  NP = 50,            # DEoptim population size
  itermax = 100,      # maximum iterations
  quiet = 2           # suppress output
)

print(init_eval)

Usage: - Helps check the stability of optimization. - Useful to diagnose poor convergence.

Summary

The DEmixR package provides robust and flexible tools for mixture model analysis, making it a strong alternative to existing EM-based approaches. Its use of global optimization helps avoid local minima and improve reliability, especially in complex or overlapping mixtures.

Note: Harmless socket connection warnings may appear when using parallelization.

Key Features

Robust estimation: Differential Evolution optimization avoids local minima
Comprehensive diagnostics: Visualization, model selection, and bootstrap uncertainty
Flexible: Supports both normal and lognormal distributions
User-friendly: Sensible defaults with options for customization
Efficient: Optional parallel computation for faster results

Best Practices

Always start with prelim_plots() to understand your data distribution
Use select_best_mixture() when uncertain about distribution family
Increase n_runs for challenging optimization problems
Use bootstrap methods for uncertainty quantification
Test different initial values with evaluate_init() if convergence is problematic For more information, see the help files for each function: ?fit_norm2, ?bootstrap_mix2, etc.

References

Mullen, K.M., et al. (2011). DEoptim: An R Package for Global Optimization by Differential Evolution. Journal of Statistical Software, 40(6), 1-26.
R Core Team (2024). R: A Language and Environment for Statistical Computing.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.