Getting Started with cyclicwave

Overview

cyclicwave is a modular toolkit for clustering time series data and detecting anomalies using classical, wavelet-based, Hilbert-based, and circular feature extraction methods. It supports DBSCAN and OPTICS clustering with consistent output formats, and it provides a comparison function that evaluates multiple feature/algorithm combinations in a single call.

We use the bundled power_consumption dataset, recorded at 10-minute intervals across three urban zones in Tetouan, Morocco.

library(cyclicwave)
data(power_consumption)

The data

Each row is a single time point. The first column is the timestamp, the next five are weather variables that we ignore in this example, and the last three are the zone-wise power consumption signals.

dim(power_consumption)
#> [1] 13906     9
head(power_consumption, 3)
#>        Datetime Temperature Humidity WindSpeed GeneralDiffuseFlows DiffuseFlows
#> 1 1/1/2017 0:00       6.559     73.8     0.083               0.051        0.119
#> 2 1/1/2017 0:10       6.414     74.5     0.083               0.070        0.085
#> 3 1/1/2017 0:20       6.313     74.5     0.080               0.062        0.100
#>   PowerConsumption_Zone1 PowerConsumption_Zone2 PowerConsumption_Zone3
#> 1               34055.70               16128.88               20240.96
#> 2               29814.68               19375.08               20131.08
#> 3               29128.10               19006.69               19668.43

For this walkthrough we will work with a 1000-row slice to keep everything fast. The exact same code runs on the full dataset; it just takes longer.

pwr <- power_consumption[1:1000, ]
zones_matrix <- as.matrix(pwr[, 7:9])

Step 1: reshape into long format

DBSCAN expects a 2D matrix in which each row is one observation. Our zones_matrix instead has one column per zone, so we flatten it into a single long vector and attach a zone identifier to each value.

flat <- flatten_with_zones(zones_matrix)
length(flat$values)   
#> [1] 3000
table(flat$zones)    
#> 
#>    1    2    3 
#> 1000 1000 1000

After this step we have a single long vector with 3000 values and a matching zones vector of identifiers.
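
To make the reshaping concrete, here is a base R sketch of a column-by-column flatten on a toy matrix. It assumes that flatten_with_zones stacks the columns in order, which is consistent with the zone counts above but is not taken from the package source. The names toy and flat_sketch are illustrative only.

```r
# Toy stand-in for zones_matrix: 4 time points (rows), 3 zones (columns).
toy <- matrix(c(10, 11, 12, 13,
                20, 21, 22, 23,
                30, 31, 32, 33),
              ncol = 3)

# as.vector() reads a matrix column by column, so all of zone 1 comes first,
# then all of zone 2, and so on.
flat_sketch <- list(
  values = as.vector(toy),
  zones  = rep(seq_len(ncol(toy)), each = nrow(toy))
)

flat_sketch$values
flat_sketch$zones
```

Each position in values is paired with the zone it came from, which is what lets later steps treat the zone as metadata.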

Step 2: extract rolling features

Each observation needs more than a single value to be informative. We compute the rolling mean and standard deviation over a 10-point window.

rolling <- rolling_stats(zones_matrix,
                         window_size = 10,
                         stats = c("mean", "sd"))

rolling_stats returns a list of matrices. We flatten each to align with our long-format values.
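
For intuition about what a rolling statistic computes, the following base R sketch handles a single series with a trailing window. The trailing alignment and the NA padding are assumptions made for illustration; rolling_stats may align windows or treat the first points differently. rolling_sketch is a hypothetical helper, not part of cyclicwave.

```r
# Trailing rolling mean and sd for one numeric series, using only base R.
# Assumption: trailing windows, NA where no complete window exists.
rolling_sketch <- function(x, window_size) {
  # embed() returns one row per complete window:
  # x[t], x[t-1], ..., x[t - window_size + 1]
  windows <- embed(x, window_size)
  pad <- rep(NA_real_, window_size - 1)
  list(
    mean = c(pad, rowMeans(windows)),
    sd   = c(pad, apply(windows, 1, sd))
  )
}

x <- c(2, 4, 6, 8, 10)
roll <- rolling_sketch(x, window_size = 3)
roll$mean
roll$sd
```

Applying such a helper to each column of a matrix would give one matrix per statistic, the same list-of-matrices shape described above.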

raw_features <- cbind(
  zone  = flat$zones,
  value = flat$values,
  mavg  = as.vector(rolling$mean),
  sd    = as.vector(rolling$sd)
)
head(raw_features, 3)
#>      zone    value     mavg       sd
#> [1,]    1 34055.70 29712.61 2601.235
#> [2,]    1 29814.68 29197.97 2646.171
#> [3,]    1 29128.10 28740.98 2701.318

The first column is the zone identifier; it is metadata, not a feature. We will exclude it from clustering and normalization.

Step 3: normalize

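DBSCAN is distance-based, so feature scales matter: without rescaling, the columns with the largest magnitudes would dominate every distance. We z-score the three feature columns and leave the zone identifier untouched.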

raw_features[, 2:4] <- normalize_features(raw_features[, 2:4],
                                          method = "zscore")
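
As a rough illustration of z-score normalization, the sketch below standardizes synthetic features with base R's scale(). It assumes that method = "zscore" means column-wise (x - mean) / sd; whether normalize_features uses the sample standard deviation, as scale() does, is not confirmed by this document. The data are synthetic.

```r
set.seed(1)
# Synthetic features on very different scales (illustrative only).
feats <- cbind(value = rnorm(50, mean = 30000, sd = 2500),
               sd    = rnorm(50, mean = 2, sd = 0.5))

# scale() centers each column to mean 0 and divides by its sample sd.
z <- scale(feats)

round(colMeans(z), 10)   # column means after scaling
apply(z, 2, sd)          # column standard deviations after scaling
```

After scaling, a one-unit difference means "one standard deviation" in every column, so Euclidean distances weigh the features equally.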

Step 4: choose epsilon (visual heuristic)

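DBSCAN needs an eps parameter: the neighborhood radius. The k-distance plot is the standard visual heuristic: we plot every point's distance to its k-th nearest neighbor in sorted order and look for an elbow in the curve. The distance at the elbow is a reasonable starting value for eps. Here k matches the min_pts value used in Step 5.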

plot_k_distance(raw_features[, 2:4], k = 7)
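
The quantity behind a k-distance plot can be sketched in base R as follows. This is an illustration on synthetic points, not the code behind plot_k_distance; in particular, whether a point counts as its own neighbor differs between implementations, and here it does not.

```r
set.seed(42)
# Synthetic 2D points: one dense blob plus a few scattered outliers.
pts <- rbind(matrix(rnorm(200, sd = 0.2), ncol = 2),
             matrix(runif(20, min = -3, max = 3), ncol = 2))

k <- 7
d <- as.matrix(dist(pts))   # full pairwise Euclidean distance matrix

# In each sorted row, position 1 is the point itself (distance 0),
# so position k + 1 is the distance to the k-th nearest neighbor.
k_dist <- apply(d, 1, function(row) sort(row)[k + 1])

plot(sort(k_dist), type = "l",
     xlab = "Points sorted by k-distance",
     ylab = paste0(k, "-NN distance"))
```

Points in dense regions form the flat part of the curve, while outliers produce the steep tail; the elbow between the two suggests a value for eps.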

Step 5: run DBSCAN

result <- run_dbscan(raw_features[, 2:4],
                     eps = 0.3,
                     min_pts = 7)

result$n_clusters
#> [1] 3
result$n_noise
#> [1] 63

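The result is a list with a standardized structure. The fields used in this walkthrough are n_clusters, n_noise, and cluster, which holds the per-observation labels passed to the evaluation and plotting functions in Step 6.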
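
For readers who want to see what eps and min_pts control, here is a minimal base R sketch of the DBSCAN procedure. It is not the implementation behind run_dbscan: simple_dbscan is a hypothetical helper, it counts a point as part of its own neighborhood, it labels noise as 0, and it builds a full distance matrix, so it only suits small inputs.

```r
# Minimal DBSCAN sketch in base R (illustrative, O(n^2) memory).
simple_dbscan <- function(x, eps, min_pts) {
  d <- as.matrix(dist(x))
  n <- nrow(d)
  # Neighborhood of each point: everything within eps, including itself.
  neighbors <- lapply(seq_len(n), function(i) which(d[i, ] <= eps))
  is_core <- lengths(neighbors) >= min_pts
  cluster <- integer(n)   # 0 = noise or not yet assigned
  current <- 0L
  for (i in seq_len(n)) {
    # Only an unassigned core point can start a new cluster.
    if (cluster[i] != 0L || !is_core[i]) next
    current <- current + 1L
    cluster[i] <- current
    queue <- neighbors[[i]]
    while (length(queue) > 0) {
      j <- queue[1]
      queue <- queue[-1]
      if (cluster[j] != 0L) next
      cluster[j] <- current
      # Core points extend the cluster; border points do not.
      if (is_core[j]) queue <- c(queue, neighbors[[j]])
    }
  }
  cluster
}

# Two tight groups of four points and one isolated point.
x <- rbind(
  cbind(c(0, 0.1, 0.2, 0.1), c(0, 0.1, 0, -0.1)),
  cbind(c(5, 5.1, 5.2, 5.1), c(5, 5.1, 5, 4.9)),
  c(10, 10)
)
labels <- simple_dbscan(x, eps = 0.3, min_pts = 3)
labels
```

A larger eps merges nearby groups, while a larger min_pts turns sparse groups into noise; both effects are easy to try on the toy data.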

Step 6: evaluate

The Davies-Bouldin Index summarizes how compact and separated the clusters are. Lower values are better.

davies_bouldin(raw_features[, 2:4], result$cluster)
#> [1] 0.4434717
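
The index can be sketched in base R from its usual definition: for each cluster, take the worst-case ratio of summed within-cluster scatter to centroid separation, then average over clusters. The helper below is hypothetical and makes two assumptions that may differ from davies_bouldin: points labeled 0 are treated as noise and dropped, and scatter is the mean Euclidean distance to the centroid.

```r
# Davies-Bouldin index sketch in base R (illustrative only).
davies_bouldin_sketch <- function(x, labels) {
  keep <- labels > 0                 # assumption: label 0 marks noise
  x <- x[keep, , drop = FALSE]
  labels <- labels[keep]
  ids <- sort(unique(labels))
  stopifnot(length(ids) >= 2)        # the index needs at least two clusters

  # One centroid per cluster, one row each.
  centroids <- do.call(rbind, lapply(ids, function(id) {
    colMeans(x[labels == id, , drop = FALSE])
  }))
  # Scatter: mean distance of a cluster's points to its centroid.
  scatter <- sapply(seq_along(ids), function(i) {
    pts <- x[labels == ids[i], , drop = FALSE]
    mean(sqrt(rowSums(sweep(pts, 2, centroids[i, ])^2)))
  })
  sep <- as.matrix(dist(centroids))  # distances between centroids
  ratios <- outer(scatter, scatter, "+") / sep
  diag(ratios) <- -Inf               # a cluster is never compared with itself
  mean(apply(ratios, 1, max))
}

# Two clusters with scatter 1 each and centroids 10 apart.
x <- rbind(c(0, 0), c(0, 2), c(10, 0), c(10, 2))
db <- davies_bouldin_sketch(x, labels = c(1, 1, 2, 2))
db
```

In this toy case the ratio is (1 + 1) / 10, which shows why tight, well-separated clusters drive the index toward zero.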

We can visualize the partition by projecting onto the first two principal components and coloring by cluster.

plot_clusters_pca(raw_features[, 2:4], result$cluster)
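
A comparable projection can be sketched with base R's prcomp(). This is an illustration on synthetic data and makes no claim about how plot_clusters_pca is implemented; the features are assumed to be standardized already, so no extra scaling is applied.

```r
set.seed(7)
# Synthetic 3-feature data with two groups (illustrative only).
feats <- rbind(matrix(rnorm(90, mean = 0), ncol = 3),
               matrix(rnorm(90, mean = 4), ncol = 3))
labels <- rep(c(1, 2), each = 30)

pca <- prcomp(feats)       # principal component analysis
scores <- pca$x[, 1:2]     # coordinates on the first two components

# One point per observation, colored by its cluster label.
plot(scores, col = labels, pch = 19, xlab = "PC1", ylab = "PC2")
```

Two components only approximate the full feature space, so clusters that overlap in the plot may still be separated along a third dimension.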

For function-level reference, see the help pages, e.g. ?run_dbscan.
