The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
The DEmixR package provides tools for fitting and
evaluating two-component mixture models (normal and
lognormal) using robust global optimization via Differential Evolution
(DEoptim
), with local refinement by quasi-Newton
optimization.
It is designed to be more robust than standard EM-based methods, particularly in challenging situations where the mixture components overlap strongly or have complex likelihood surfaces.
You can install the released version of DEmixR
from CRAN
with:
prelim_plots()
– Diagnostic plots (histogram, QQ, PP,
log-QQ) select_best_mixture()
– Compare lognormal vs normal
mixtures using BIC fit_norm2()
– Fit a two-component
normal mixture fit_lognorm2()
– Fit a
two-component lognormal mixture
bootstrap_mix2()
– Bootstrap mixture parameters (parametric
or nonparametric) evaluate_init()
– Refine initial
parameter values with local optimization
Use prelim_plots()
for initial data exploration and
distribution assessment Use select_best_mixture()
when
unsure whether normal or lognormal
distribution is appropriate Use
fit_norm2()
/fit_lognorm2()
when you know the
appropriate distribution family Use bootstrap_mix2()
for
uncertainty quantification and confidence intervals Use
evaluate_init()
for testing specific starting values or
refining solutions
We first generate synthetic data from a two-component
normal mixture and draw the basic plots with
DEmixR
.
# Load the package
library(DEmixR)
# Simulation data
set.seed(123)
x <- c(rnorm(3000, mean = 0, sd = 1),
rnorm(2000, mean = 3, sd = 1.2))
# Diagnostic plots
prelim_plots(x, which = c("hist", "qq"))
# Automatically choose the best mixture family
best_model <- select_best_mixture(x, n_runs = 3, NP = 50, itermax = 2000)
## | | | 0% | |+++++++++++++++++++++++ | 33% | |+++++++++++++++++++++++++++++++++++++++++++++++ | 67% | |++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
## [1] "normal"
## normal
## 19547.24
## | | | 0% | |++++++++++++++ | 20% | |++++++++++++++++++++++++++++ | 40% | |++++++++++++++++++++++++++++++++++++++++++ | 60% | |++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | 80% | |++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
## p m1 s1 m2 s2
## 0.599738895 0.005086783 0.983210109 2.984878390 1.181022679
## [1] -9752.326
## [1] 19514.65
## [1] 19547.24
# Simulation data
set.seed(123)
y <- c(rlnorm(3000, meanlog = 0, sdlog = 0.5),
rlnorm(2000, meanlog = 1, sdlog = 0.7))
# Fit a two-component lognormal mixture
fit_ln <- fit_lognorm2(y, n_runs = 5, parallelType = 0)
## | | | 0% | |++++++++++++++ | 20% | |++++++++++++++++++++++++++++ | 40% | |++++++++++++++++++++++++++++++++++++++++++ | 60% | |++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | 80% | |++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
## p m1 s1 m2 s2
## 0.57503793 -0.01180295 0.48857490 0.95267679 0.69951039
# Bootstrap confidence intervals
boot_res <- bootstrap_mix2(fit_ln, B = 100, parametric = TRUE, parallelType = 0)
boot_res$central
## p m1 s1 m2 s2
## 0.57428385 -0.01681622 0.48844615 0.95834689 0.69416098
## p m1 s1 m2 s2
## 2.5% 0.4115206 -0.06678695 0.4343919 0.7041057 0.6095682
## 97.5% 0.7041991 0.05896932 0.5272649 1.2061600 0.7729422
## $success
## [1] TRUE
##
## $par
## [1] 0.57497232 -0.01185789 0.48853997 0.95258083 0.69950326
##
## $logLik
## [1] -7547.12
##
## $convergence
## [1] 0
##
## $message
## [1] "CONVERGENCE: REL_REDUCTION_OF_F <= FACTR*EPSMCH"
# Simulated biomarker data with two populations
set.seed(456)
biomarker <- c(rlnorm(4000, meanlog = 2.5, sdlog = 0.4), # Healthy group
rlnorm(1000, meanlog = 3.8, sdlog = 0.6)) # Disease group
# Initial exploration
prelim_plots(biomarker, which = c("hist", "logqq"))
# Fit lognormal mixture
fit_bio <- try(fit_lognorm2(biomarker, n_runs = 5, parallelType = 0, quiet = 2))
## | | | 0% | |++++++++++++++ | 20% | |++++++++++++++++++++++++++++ | 40% | |++++++++++++++++++++++++++++++++++++++++++ | 60% | |++++++++++++++++++++++++++++++++++++++++++++++++++++++++ | 80% | |++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++| 100%
if (!inherits(fit_bio, "try-error")) {
cat("Biomarker analysis results:\n")
print(fit_bio$par)
# Estimate proportion in each group
cat("\nEstimated proportion in group 1 (healthy):",
round(fit_bio$par[1], 3), "\n")
cat("Estimated proportion in group 2 (disease):",
round(1 - fit_bio$par[1], 3), "\n")
# Bootstrap for confidence intervals
boot_bio <- try(bootstrap_mix2(fit_bio, B = 100, quiet = 2))
if (!inherits(boot_bio, "try-error")) {
cat("\n95% CI for proportion in group 1:\n")
print(boot_bio$ci[, "p"])
}
}
## Biomarker analysis results:
## p m1 s1 m2 s2
## 0.7942352 2.5045852 0.3877220 3.7999468 0.6123135
##
## Estimated proportion in group 1 (healthy): 0.794
## Estimated proportion in group 2 (disease): 0.206
##
## 95% CI for proportion in group 1:
## 2.5% 97.5%
## 0.7543291 0.8244741
This section illustrates some of the optional parameters for fine-tuning performance and diagnostics.
fit_*
# Fit lognormal mixture with custom optimization settings
fit_ln <- fit_lognorm2(
x,
NP = 150, # population size for DEoptim
n_runs = 20, # independent runs to avoid local optima
itermax = 200, # maximum iterations per run
parallelType = 1, # use parallelization across platforms
quiet = 1, # print minimal progress info
par_init = NULL # optionally supply initial values
)
Key options: - NP
: larger = better
exploration, but slower. - n_runs
: increase for more
robustness; with 1000, it is very slow, but have ultra robustness -
parallelType
: 0 = serial
,
1 = socket clusters (Windows/Mac/Linux)
,
2 = forking (Mac/Linux only)
. - quiet
:
0 = verbose
, 1 = minimal
,
2 = silent
.
bootstrap_mix2
boot_res <- bootstrap_mix2(
fit_ln,
B = 500, # number of bootstrap replications
parallelType = 1, # parallel computation
parametric = TRUE, # parametric bootstrap
ci_level = 0.90, # confidence interval level
quiet = 1 # show progress bar
)
boot_res$central # means for normal and medians for lognormal
boot_res$ci # bootstrap confidence intervals
Notes: - Supports both parametric and nonparametric bootstrap. - Uses an internal function to avoid label switching.
prelim_plots
prelim_plots(
x,
which = c("hist", "qq", "pp", "logqq"),
hist_bins = 40,
col_hist = "grey85",
col_density = "darkorange",
col_qq = "grey60",
col_line = "darkorange"
)
Options: - which
: choose one or more of
"hist", "qq", "pp", "logqq"
. - hist_bins
:
number of bins for histogram. - col_*
: customize plot
colors.
select_best_mixture
sel <- select_best_mixture(
x,
n_runs = 5,
NP = 100,
itermax = 150,
quiet = 2
)
sel$best$family
sel$best$par
What it does: - Fits both Normal and Lognormal mixtures. - Compares them via BIC. - Returns the best-performing family.
evaluate_init
init_eval <- evaluate_init(
x,
family = "normal",
n_starts = 10, # number of random initializations
NP = 50, # DEoptim population size
itermax = 100, # maximum iterations
quiet = 2 # suppress output
)
print(init_eval)
Usage: - Helps check the stability of optimization. - Useful to diagnose poor convergence.
The DEmixR package provides robust and flexible tools for mixture model analysis, making it a strong alternative to existing EM-based approaches. Its use of global optimization helps avoid local minima and improve reliability, especially in complex or overlapping mixtures.
Note: Harmless socket connection warnings may appear when using parallelization.
prelim_plots()
to understand your
data distributionselect_best_mixture()
when uncertain about
distribution familyn_runs
for challenging optimization
problemsevaluate_init()
if
convergence is problematic For more information, see the help files for
each function: ?fit_norm2
, ?bootstrap_mix2
,
etc.These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.