This vignette reproduces the weather example described in Puchhammer, Wilms and Filzmoser (2025). The original data come from GeoSphere Austria (2022) and are included in this package.
library(ssMRCD)
library(ggplot2)
library(dplyr)
The original data from GeoSphere Austria (2022) have been pre-cleaned and saved in the data frame object weatherAUT2021. Additional information can be found on its help page.
# get meta data for the data set
? weatherAUT2021
# load the data
data("weatherAUT2021")
# inspect the data
head(weatherAUT2021)
#>        p     s   vv    t rsum  rel             name      lon      lat  alt
#> 1 941.35 154.5 2.15 6.15 49.5 77.5    AIGEN/ENNSTAL 14.13833 47.53278  641
#> 2 945.80 172.5 2.95 6.25 39.5 76.0      ALLENTSTEIG 15.36694 48.69083  599
#> 3 985.50 152.0 2.55 8.20 46.5 78.0        AMSTETTEN 14.89500 48.10889  266
#> 4 893.20 123.5 1.85 5.10 66.0 74.5      BAD GASTEIN 13.13333 47.11055 1092
#> 5 984.50 176.0 1.50 8.45 40.5 75.0 BAD GLEICHENBERG 15.90361 46.87222  269
#> 6 992.05 188.5 2.20 8.95 60.5 77.0  BAD RADKERSBURG 15.99333 46.69222  207
# select variables, station names and number of observations
data = weatherAUT2021 %>% select(p:rel)
stations = weatherAUT2021$name
n = dim(data)[1]
The predefined groups are based on the underlying geographical landscape of Austria, which consists of Alpine mountains, hills and flatter areas.
# build 5 groups of observations based on spatial proximity and geography
cut_lon = c(min(weatherAUT2021$lon)-0.2, 12, 16, max(weatherAUT2021$lon) + 0.2)
cut_lat = c(min(weatherAUT2021$lat)-0.2, 48, max(weatherAUT2021$lat) + 0.2)
groups = ssMRCD::groups_gridbased(weatherAUT2021$lon,
                                  weatherAUT2021$lat,
                                  cut_lon,
                                  cut_lat)
N = length(unique(groups))
table(groups)
#> groups
#> 1 2 3 4 5
#> 31 80 35 16 21
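The grid-based grouping above can be illustrated with a minimal base R sketch. It is not the implementation of ssMRCD::groups_gridbased, whose internal numbering of cells may differ. The coordinates are made up for illustration, and the names col_id, row_id, cell and groups_sketch are hypothetical. The idea is that each point is assigned to the grid cell whose longitude and latitude intervals contain it, and only non-empty cells receive a group label, which is one possible reason why a 3 x 2 grid can yield fewer than 6 groups.

```r
# Synthetic coordinates (made up for illustration, not the weather data)
lon <- c(10.5, 13.0, 14.5, 16.5, 13.2, 15.0)
lat <- c(47.2, 47.5, 48.3, 47.9, 48.6, 46.8)

# Cut points built in the same style as in the vignette
cut_lon <- c(min(lon) - 0.2, 12, 16, max(lon) + 0.2)
cut_lat <- c(min(lat) - 0.2, 48, max(lat) + 0.2)

# Grid column and grid row of the cell each point falls into
col_id <- findInterval(lon, cut_lon)
row_id <- findInterval(lat, cut_lat)

# Combine both into one cell index, then relabel non-empty cells as 1, 2, ...
# (assumption: this ordering is illustrative only)
cell <- (row_id - 1) * (length(cut_lon) - 1) + col_id
groups_sketch <- as.integer(factor(cell))

table(groups_sketch)
```

In this toy example two of the six grid cells contain no points, so only four group labels are produced.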
# calculate MG-GMM
model = cellMGGMM(X = data, groups = groups,
                  nsteps = 100, alpha = 0.5,
                  maxcond = 100)
#> Iteration finished due to computational inaccuracies (increase of objective function in last step of size: 0.0001193831180899). Check convergence plot.
# mixture probabilities
cat("Pi (in %):\n")
#> Pi (in %):
round(model$pi_groups*100, 2)
#>        1     2     3     4     5
#> 1 100.00  0.00  0.00  0.00  0.00
#> 2  24.97 71.95  0.00  3.08  0.00
#> 3   0.00  0.00 94.02  5.98  0.00
#> 4  12.48  0.00  5.83 64.57 17.11
#> 5   0.00  0.00  9.01 15.35 75.64
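To help read the mixture probabilities, the following sketch rebuilds the printed matrix by hand and summarises it. It assumes that rows index the group an observation belongs to and that columns index the group-specific distributions, so that each row sums to about 100 %. The names pi_pct, row_totals, own_share and borrowed_share are hypothetical and not part of the package.

```r
# Mixture probabilities (in %), copied from the printed output above
pi_pct <- matrix(c(100.00,  0.00,  0.00,  0.00,  0.00,
                    24.97, 71.95,  0.00,  3.08,  0.00,
                     0.00,  0.00, 94.02,  5.98,  0.00,
                    12.48,  0.00,  5.83, 64.57, 17.11,
                     0.00,  0.00,  9.01, 15.35, 75.64),
                 nrow = 5, byrow = TRUE,
                 dimnames = list(1:5, 1:5))

# Each row should sum to about 100 (up to rounding of the printed values)
row_totals <- rowSums(pi_pct)

# Mass a group places on its own distribution versus on all other ones
own_share <- diag(pi_pct)
borrowed_share <- row_totals - own_share

round(cbind(own_share, borrowed_share), 2)
```

Under this reading, group 1 relies entirely on its own distribution, while group 4 places the smallest share on its own distribution.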
# percentage of outliers
cat("% Outliers per group and variable:\n")
#> % Outliers per group and variable:
round(sapply(1:N, function(x) colMeans(1-model$W[groups == x, ]))*100, 2)
#>      [,1]  [,2] [,3] [,4] [,5]
#> p    6.45  3.75 2.86 0.00 0.00
#> s    0.00  2.50 0.00 0.00 0.00
#> vv   6.45 12.50 8.57 6.25 9.52
#> t    6.45  3.75 0.00 0.00 4.76
#> rsum 0.00  0.00 2.86 0.00 4.76
#> rel  0.00  2.50 0.00 6.25 4.76
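The outlier percentages above come from a per-group column mean of 1 - W. A toy sketch of that computation follows. It assumes, as the call above suggests, that W is a 0/1 matrix in which 1 marks a cell that is kept and 0 a cell flagged as outlying. All values and the names W_toy, groups_toy, N_toy and out_pct are made up for illustration.

```r
# Toy indicator matrix (made up): 1 = cell kept, 0 = cell flagged as outlying
W_toy <- matrix(c(1, 1, 0,
                  1, 0, 1,
                  1, 1, 1,
                  0, 1, 1),
                nrow = 4, byrow = TRUE,
                dimnames = list(NULL, c("p", "s", "vv")))
groups_toy <- c(1, 1, 2, 2)
N_toy <- length(unique(groups_toy))

# Per group: column means of (1 - W) give the share of flagged cells
# per variable; drop = FALSE keeps a matrix even for a one-row group
out_pct <- sapply(1:N_toy, function(g) {
  colMeans(1 - W_toy[groups_toy == g, , drop = FALSE])
}) * 100

out_pct
```

The result has one row per variable and one column per group, matching the layout of the table printed above.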
# calculate residuals
res = residuals_mggmm(X = data,
                      groups = groups,
                      Sigma = model$Sigma,
                      mu = model$mu,
                      probs = model$probs,
                      W = model$W)
GeoSphere Austria (2022): https://data.hub.geosphere.at.
Puchhammer P., Wilms I. and Filzmoser P. (2025): A smooth multi-group Gaussian Mixture Model for cellwise robust covariance estimation. https://doi.org/10.48550/arXiv.2504.02547