Cellwise Robust Multi-Group Gaussian Mixture Model

The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

This vignette reproduces the weather example described in Puchhammer, Wilms and Filzmoser (2025). The original data is from Geosphere Austria (2022) and included in this package.

Data Preparation

The original data from GeoSphere Austria (2022) is pre-cleaned and saved in the data frame object weatherAUT2021. Additional information can be found on the helping page.

# get meta data for the data set
? weatherAUT2021

# load the data
data("weatherAUT2021")

# inspect the data
head(weatherAUT2021)
#>        p     s   vv    t rsum  rel             name      lon      lat  alt
#> 1 941.35 154.5 2.15 6.15 49.5 77.5    AIGEN/ENNSTAL 14.13833 47.53278  641
#> 2 945.80 172.5 2.95 6.25 39.5 76.0      ALLENTSTEIG 15.36694 48.69083  599
#> 3 985.50 152.0 2.55 8.20 46.5 78.0        AMSTETTEN 14.89500 48.10889  266
#> 4 893.20 123.5 1.85 5.10 66.0 74.5      BAD GASTEIN 13.13333 47.11055 1092
#> 5 984.50 176.0 1.50 8.45 40.5 75.0 BAD GLEICHENBERG 15.90361 46.87222  269
#> 6 992.05 188.5 2.20 8.95 60.5 77.0  BAD RADKERSBURG 15.99333 46.69222  207

# select variables, station names and number of observations
data = weatherAUT2021 %>% select(p:rel)
stations = weatherAUT2021$name
n = dim(data)[1]

The predefined groups are based in the underlying geographical landscape consisting of Alpine mountains, hills and flatter areas in Austria.

# build 5 groups of observations based on spatial proximity and geography
cut_lon = c(min(weatherAUT2021$lon)-0.2, 12, 16, max(weatherAUT2021$lon) + 0.2)
cut_lat = c(min(weatherAUT2021$lat)-0.2, 48, max(weatherAUT2021$lat) + 0.2)
groups = ssMRCD::groups_gridbased(weatherAUT2021$lon, 
                                  weatherAUT2021$lat, 
                                  cut_lon, 
                                  cut_lat)
N = length(unique(groups))
table(groups)
#> groups
#>  1  2  3  4  5 
#> 31 80 35 16 21

# calculate MG-GMM
model = cellMGGMM(X = data, groups = groups,
                  nsteps = 100, alpha = 0.5,
                  maxcond = 100)
#> Iteration finished due to computational inaccuracies (increase of objective function in last step of size: 0.0001193831180899). Check convergence plot.

# mixture probabilities
cat("Pi (in %):\n")
#> Pi (in %):
round(model$pi_groups*100, 2)
#>        1     2     3     4     5
#> 1 100.00  0.00  0.00  0.00  0.00
#> 2  24.97 71.95  0.00  3.08  0.00
#> 3   0.00  0.00 94.02  5.98  0.00
#> 4  12.48  0.00  5.83 64.57 17.11
#> 5   0.00  0.00  9.01 15.35 75.64

# percentage of outliers
cat("% Outliers per group and variable:\n")
#> % Outliers per group and variable:
round(sapply(1:N, function(x) colMeans(1-model$W[groups == x, ]))*100, 2)
#>      [,1]  [,2] [,3] [,4] [,5]
#> p    6.45  3.75 2.86 0.00 0.00
#> s    0.00  2.50 0.00 0.00 0.00
#> vv   6.45 12.50 8.57 6.25 9.52
#> t    6.45  3.75 0.00 0.00 4.76
#> rsum 0.00  0.00 2.86 0.00 4.76
#> rel  0.00  2.50 0.00 6.25 4.76

# calculate residuals
res = residuals_mggmm(X = data, 
                groups = groups,
                Sigma = model$Sigma,
                mu = model$mu, 
                probs = model$probs,
                W = model$W)

References

GeoSphere Austria (2022): https://data.hub.geosphere.at.

Puchhammer P., Wilms I. and Filzmoser P. (2025): A smooth multi-group Gaussian Mixture Model for cellwise robust covariance estimation. https://doi.org/10.48550/arXiv.2504.02547

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.