We want to design a structure that incorporates all these features.
compare_methods() function, then unpacks what the results
mean.
We use the bundled steel_industry dataset: one full year
of 15-minute energy measurements from a Korean steel plant, including
reactive power, power factor, CO2 emissions, and time-of-day
indicators.
Three preprocessing steps:
steel_industry doesn’t ship with explicit class labels,
but Usage_kWh gives us natural ones: low, medium, and high
consumption regimes, defined by tertile cutoffs. We use these as a
yardstick for evaluating how meaningful each clustering result is.
true_labels <- label_by_quantile(data_thin$Usage_kWh,
probs = c(1/3, 2/3))
table(true_labels)
#> true_labels
#> 1 2 3
#> 1184 1152 1168
Each class has roughly N/3 observations (N = 3504, so about 1168 per class).
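label_by_quantile() is the package's own helper, and its exact rules (tie handling, interval closure) are not shown here. As a rough base-R sketch of the same idea, assuming classes are formed by cutting at the 1/3 and 2/3 sample quantiles, `quantile()` plus `findInterval()` is enough. `label_by_quantile_sketch()` and the toy data are hypothetical.

```r
# Hypothetical stand-in for label_by_quantile(); the package version may
# treat ties and interval boundaries differently.
label_by_quantile_sketch <- function(x, probs) {
  cuts <- quantile(x, probs = probs, names = FALSE)  # interior cut points
  # findInterval() gives 0 below the first cut, 1 between cuts, 2 above;
  # adding 1 turns that into class labels 1, 2, 3
  findInterval(x, cuts) + 1L
}

set.seed(1)
usage <- rexp(300)  # toy, right-skewed "consumption" values
tiers <- label_by_quantile_sketch(usage, probs = c(1/3, 2/3))
table(tiers)
```

On continuous data without ties each class receives exactly a third of the points. Tied values at a cut point all fall on the same side, so counts on real measurements can drift away from N/3.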
compare_methods() takes a named list of feature
extractors. Each is just a function that takes the input data (here data_scaled) and returns
a numeric feature matrix.
feature_methods <- list(
# Scores on the first three principal components only
pca_only = function(d) {
pca <- prcomp(d, center = FALSE, scale. = FALSE)
pca$x[, 1:3]
},
# The same three components, plus circular features computed from the phase
pca_circular = function(d) {
pca <- prcomp(d, center = FALSE, scale. = FALSE)
phase <- compute_phase(d, axis = "feature")
circ <- extract_circular_features(phase)
cbind(pca$x[, 1:3], circ)
}
)
We try DBSCAN with two different parameter settings: one with a larger neighborhood radius (loose) and one with a smaller one (tight). This is a parameter sweep disguised as a method comparison.
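To see why the radius matters, here is a minimal base-R DBSCAN written purely for illustration. It is not the implementation compare_methods() calls, and `dbscan_sketch()`, `eps`, `min_pts` and the two-blob toy data are assumptions. With `min_pts` fixed, shrinking `eps` can only shrink each point's neighborhood, so the set of noise points can only grow, and a dense region may also fragment into several clusters.

```r
# Minimal DBSCAN for illustration only. Label 0 = noise; clusters are 1, 2, ...
dbscan_sketch <- function(x, eps, min_pts) {
  d <- as.matrix(dist(x))                          # pairwise Euclidean distances
  n <- nrow(d)
  neighbors <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  is_core <- lengths(neighbors) >= min_pts         # neighborhood counts the point itself
  labels <- integer(n)                             # 0 = not yet assigned
  cluster <- 0L
  for (i in seq_len(n)) {
    if (labels[i] != 0L || !is_core[i]) next       # start clusters only from unassigned cores
    cluster <- cluster + 1L
    labels[i] <- cluster
    queue <- neighbors[[i]]
    while (length(queue) > 0) {                    # breadth-first expansion
      j <- queue[1]
      queue <- queue[-1]
      if (labels[j] == 0L) {
        labels[j] <- cluster                       # core or border point joins the cluster
        if (is_core[j]) queue <- c(queue, neighbors[[j]])  # only cores keep expanding
      }
    }
  }
  labels                                           # points never reached stay 0 (noise)
}

set.seed(42)
blobs <- rbind(
  matrix(rnorm(100, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(100, mean = 3, sd = 0.3), ncol = 2)
)
loose <- dbscan_sketch(blobs, eps = 1.0, min_pts = 5)
tight <- dbscan_sketch(blobs, eps = 0.15, min_pts = 5)
c(clusters = max(loose), noise = sum(loose == 0))
c(clusters = max(tight), noise = sum(tight == 0))
```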
compare_methods() runs every feature-method/cluster-method combination (here 2 × 2 = 4), evaluates each
with the requested metrics, and returns a single comparison table.
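Conceptually, that is a loop over the Cartesian product of the two named lists. The sketch below is a hypothetical stand-in, not the source of compare_methods(): `compare_grid_sketch()`, the toy extractors and the k-means clusterers are invented for illustration, and only an `n_clusters` column is computed. Each extractor maps the data to a feature matrix, and each clusterer maps that matrix to integer labels.

```r
# Hypothetical sketch of a feature-method x cluster-method grid.
compare_grid_sketch <- function(data, feature_methods, cluster_methods) {
  # expand.grid() varies its first argument fastest, so list cluster methods
  # first to get one block of rows per feature method
  grid <- expand.grid(
    cluster_method = names(cluster_methods),
    feature_method = names(feature_methods),
    stringsAsFactors = FALSE
  )
  grid <- grid[, c("feature_method", "cluster_method")]
  grid$n_clusters <- NA_integer_
  for (r in seq_len(nrow(grid))) {
    feats  <- feature_methods[[grid$feature_method[r]]](data)   # numeric matrix
    labels <- cluster_methods[[grid$cluster_method[r]]](feats)  # integer labels
    grid$n_clusters[r] <- length(unique(labels))
  }
  grid
}

set.seed(7)
toy <- matrix(rnorm(200), ncol = 4)
toy_features <- list(
  first_two = function(d) d[, 1:2],
  pca_two   = function(d) prcomp(d)$x[, 1:2]
)
toy_clusters <- list(
  kmeans_2 = function(f) kmeans(f, centers = 2)$cluster,
  kmeans_3 = function(f) kmeans(f, centers = 3)$cluster
)
res <- compare_grid_sketch(toy, toy_features, toy_clusters)
print(res)
```

Crossing the lists rather than pairing them element by element is what makes the table grow as (number of feature methods) × (number of cluster methods) rows.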
comparison <- compare_methods(
data = data_scaled,
feature_methods = feature_methods,
cluster_methods = cluster_methods,
metrics = c("dbi", "accuracy", "n_clusters", "n_noise"),
true_labels = true_labels,
normalize = NULL,
verbose = FALSE
)
print(comparison)
#> feature_method cluster_method dbi accuracy n_clusters n_noise
#> 1 pca_only dbscan_loose 0.9056772 0.5907534 4 15
#> 2 pca_only dbscan_tight 0.6312147 0.7285959 13 56
#> 3 pca_circular dbscan_loose 0.5957794 0.7619863 14 126
#> 4 pca_circular dbscan_tight 0.8596812 0.7796804 38 254
Four rows, one per combination, four metrics each. Now to read it.
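It helps to pin down what the `dbi` column measures. Assuming it is the Davies-Bouldin index (the usual meaning of the abbreviation), it averages over clusters i the worst-case ratio (S_i + S_j) / d(c_i, c_j), where S_i is the mean distance of cluster i's members to its centroid c_i. Lower values therefore mean tighter, better-separated clusters. Below is a from-scratch base-R sketch. `davies_bouldin_sketch()` is hypothetical, and dropping DBSCAN noise (label 0) before scoring is an assumption about how noise is handled.

```r
# Davies-Bouldin index from scratch. Assumes label 0 marks DBSCAN noise
# and excludes those points before scoring.
davies_bouldin_sketch <- function(x, labels) {
  keep <- labels != 0
  x <- as.matrix(x)[keep, , drop = FALSE]
  labels <- labels[keep]
  ids <- sort(unique(labels))
  k <- length(ids)
  if (k < 2) return(NA_real_)                  # undefined for a single cluster
  centroids <- do.call(rbind, lapply(ids, function(id)
    colMeans(x[labels == id, , drop = FALSE])))
  # S_i: mean Euclidean distance from each member to its centroid
  scatter <- vapply(seq_len(k), function(i) {
    members <- x[labels == ids[i], , drop = FALSE]
    mean(sqrt(rowSums(sweep(members, 2, centroids[i, ])^2)))
  }, numeric(1))
  sep <- as.matrix(dist(centroids))            # distances between centroids
  ratios <- outer(scatter, scatter, "+") / sep
  diag(ratios) <- -Inf                         # exclude j == i from the max
  mean(apply(ratios, 1, max))
}

set.seed(3)
pts <- rbind(matrix(rnorm(60, 0, 0.5), ncol = 2),
             matrix(rnorm(60, 5, 0.5), ncol = 2))
good <- rep(1:2, each = 30)   # labels that match the two blobs
shuffled <- sample(good)      # same cluster sizes, random membership
davies_bouldin_sketch(pts, good)      # low: compact, well-separated clusters
davies_bouldin_sketch(pts, shuffled)  # higher: the two "clusters" overlap
```

Under the noise-exclusion assumption, a run that discards many points can post a flattering index, so `dbi` is best read alongside `n_noise`.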