
Empirical Regime Classification with KRONXnbc

Oscar Linares

2026-05-27

1 Overview

KRONXnbc, distributed as the R package kronxNBC, implements a Clock of Regimes (COR) classifier: a Student-t Naive Bayes model designed for non-stationary financial market data. Three market regimes are distinguished:

Regime	Economic intuition
Calm	Low volatility, mean-reverting returns
Steady	Moderate drift, controlled drawdowns
Stress	Fat-tailed returns, deep drawdowns, elevated ruin probability

The distinguishing engineering choice is a profile grid search over the degrees-of-freedom parameter \(\nu\) of the Student-t likelihood. Rather than fixing \(\nu\) or solving a numerically fragile continuous optimisation, the model evaluates a discrete grid \(\nu \in \{3, 4, \ldots, 30, 40, 60, 100\}\) for every (class, feature) pair and selects the \(\nu\) that maximises the profile log-likelihood. Because Student-t tails decay polynomially rather than exponentially, this avoids the \(-\infty\) log-density underflow that collapses a standard Gaussian NBC when a crisis observation falls in the far tail.
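
The grid search described above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal routine: the helper names `t_loglik` and `profile_nu` are hypothetical, the location and scale are re-estimated with `optim()` for each candidate \(\nu\), and the sample is standardised first (which leaves the selected \(\nu\) unchanged) purely to keep the optimiser well conditioned.

```r
# Candidate degrees of freedom, as listed in the text
nu_grid <- c(3:30, 40, 60, 100)

# Log-likelihood of a location-scale Student-t sample (hypothetical helper)
t_loglik <- function(x, mu, s, nu) {
  sum(dt((x - mu) / s, df = nu, log = TRUE) - log(s))
}

# For each nu on the grid, maximise over (mu, s) and keep the best value
profile_nu <- function(x, grid = nu_grid) {
  z <- (x - median(x)) / mad(x)          # standardise; nu is scale-invariant
  prof <- vapply(grid, function(nu) {
    fit <- optim(c(0, 0), function(p) -t_loglik(z, p[1], exp(p[2]), nu))
    -fit$value                           # profile log-likelihood at this nu
  }, numeric(1))
  list(nu = grid[which.max(prof)], profile = setNames(prof, grid))
}

set.seed(1)
heavy <- rt(2000, df = 3) * 0.01         # synthetic fat-tailed "returns"
res   <- profile_nu(heavy)
res$nu
```

Because only a few dozen candidate values are evaluated, the search cannot diverge the way a continuous optimisation over \(\nu\) can; the cost is that the estimate is restricted to the grid.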

2 Step 1 — Feature Engineering

The raw input is an hourly equity price series (e.g. E-mini S&P 500 futures, data.csv) paired with a file of decoded HMM regime labels (decoded_states.csv). The input2nbc.R pipeline constructs six continuous predictors: the raw log-return plus five features derived from a 24-hour rolling window.

library(zoo)

es_data <- read.csv("data.csv",          stringsAsFactors = FALSE)
decoded <- read.csv("decoded_states.csv", stringsAsFactors = FALSE)

es_data <- es_data[!is.na(es_data$ret), ]          # drop leading NA
stopifnot(nrow(es_data) == nrow(decoded))

n_roll <- 24L                                       # 24-hour window

cor_data <- data.frame(
  timestamp  = es_data$timestamp,
  log_return = es_data$ret
)

2.1 Rolling Volatility

Standard deviation of log-returns over the window; floored at 0.0001 to avoid zero-SD degeneracy on flat-market bars.

cor_data$rolling_volatility <- rollapply(
  es_data$ret, width = n_roll, FUN = sd, fill = NA, align = "right"
)
cor_data$rolling_volatility <- pmax(cor_data$rolling_volatility, 0.0001)

2.2 Drawdown

Measures how far the current close has fallen from the rolling 24-hour peak. Values are zero or negative; a reading of \(-0.03\) means the price is 3 % below its recent high.

\[ \text{Drawdown}_t = \frac{\text{Close}_t - \max_{s \in [t-23,\, t]} \text{Close}_s} {\max_{s \in [t-23,\, t]} \text{Close}_s} \]

rolling_max          <- rollapply(es_data$close, width = n_roll,
                                  FUN = max, fill = NA, align = "right")
cor_data$drawdown    <- (es_data$close - rolling_max) / rolling_max
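
As a worked illustration of the drawdown formula, the sketch below applies it to five made-up closing prices with a 3-bar window instead of 24. It uses base R only (`embed()` rather than `zoo::rollapply`), and the toy prices are hypothetical.

```r
close  <- c(100, 102, 101, 99, 103)   # hypothetical closing prices
n_roll <- 3L                          # short window for illustration

# embed() lays out each trailing window as a row: (x[t], x[t-1], x[t-2])
roll_max <- c(rep(NA, n_roll - 1L), apply(embed(close, n_roll), 1, max))
drawdown <- (close - roll_max) / roll_max
round(drawdown, 4)
```

The first two entries are NA because the window is not yet full. The remaining values are about -0.0098, -0.0294 and 0: the last bar sets a new 3-bar high, so its drawdown is zero.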

2.3 Downside Semi-deviation (Transition Stress Proxy)

Unlike rolling volatility — which treats up and down moves symmetrically — the downside semi-deviation isolates the left tail of the return distribution. It is the root-mean-square of negative returns only, making it highly sensitive to the onset of a Stress episode even when overall volatility is still moderate.

\[ \text{SemiDev}_t = \sqrt{\frac{1}{|\mathcal{N}|} \sum_{r \in \mathcal{N}} r^2}, \qquad \mathcal{N} = \{r_s : r_s < 0,\; s \in [t-23,\, t]\} \]

downside_dev <- function(x) {
  neg_x <- x[x < 0]
  if (length(neg_x) == 0L) return(0)
  sqrt(mean(neg_x^2))
}
cor_data$transition_stress <- rollapply(
  es_data$ret, width = n_roll, FUN = downside_dev, fill = NA, align = "right"
)
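# Fill the leading-window NAs with the floor value; these rows are dropped
# later anyway because rolling_volatility is still NA there. Windows with no
# negative returns keep a value of exactly 0.
cor_data$transition_stress[is.na(cor_data$transition_stress)] <- 0.0001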
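
The asymmetry can be checked on a toy window. The sketch below reuses `downside_dev()` as defined above on four hypothetical returns and on their mirror image: the ordinary standard deviation is identical for both, while the semi-deviation is not.

```r
downside_dev <- function(x) {
  neg_x <- x[x < 0]
  if (length(neg_x) == 0L) return(0)
  sqrt(mean(neg_x^2))
}

x <- c(0.01, -0.03, 0.02, -0.04)      # hypothetical returns, losses dominate

c(sd_original   = sd(x),           sd_mirrored   = sd(-x))
c(semi_original = downside_dev(x), semi_mirrored = downside_dev(-x))
```

Flipping the sign of every return leaves `sd()` unchanged but lowers the semi-deviation, because the large moves are now on the upside and no longer enter the sum.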

2.4 Residence Pressure

Counts consecutive hours spent in drawdown (defined as drawdown \(< -0.5\%\)). A long, uninterrupted drawdown streak signals structural regime persistence rather than a momentary spike.

is_dd <- ifelse(cor_data$drawdown < -0.005, 1L, 0L)
is_dd[is.na(is_dd)] <- 0L
cor_data$residence_pressure <- ave(
  is_dd, cumsum(is_dd == 0L), FUN = cumsum
)
cor_data$residence_pressure <- pmax(cor_data$residence_pressure, 0.0001)
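
The `ave()` / `cumsum()` idiom is compact, so a small trace may help. The sketch below runs the same streak counter on seven hypothetical drawdown values (the vector `dd_toy` is made up for illustration).

```r
dd_toy <- c(-0.001, -0.006, -0.007, -0.010, -0.002, -0.008, -0.009)

is_dd  <- ifelse(dd_toy < -0.005, 1L, 0L)   # 1 while in drawdown
groups <- cumsum(is_dd == 0L)               # every non-drawdown bar opens a new group
streak <- ave(is_dd, groups, FUN = cumsum)  # running count within each group

rbind(is_dd = is_dd, groups = groups, streak = streak)
```

The streak reads 0 1 2 3 0 1 2: it counts up while the drawdown persists and resets to zero on the bar where the drawdown recovers above the -0.5 % threshold.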

2.5 Ruin Proxy

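The probability of a \(-2\%\) or worse move under a Normal approximation to the current rolling distribution, i.e. \(\Phi\!\left(\frac{-0.02 - \hat\mu_t}{\hat\sigma_t}\right)\), where \(\hat\mu_t\) is the rolling mean and \(\hat\sigma_t\) the (floored) rolling volatility. Although it is computed from trailing data, it is intended as a forward-looking tail-risk measure that rises sharply just before a Stress transition.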

rolling_mean        <- rollapply(es_data$ret, width = n_roll,
                                 FUN = mean, fill = NA, align = "right")
cor_data$ruin_proxy <- pnorm(-0.02,
                             mean = rolling_mean,
                             sd   = cor_data$rolling_volatility)
cor_data$ruin_proxy <- pmax(cor_data$ruin_proxy, 0.0001)
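
To give a sense of scale, the sketch below evaluates the same expression for two hypothetical windows, one quiet and one turbulent. The means and volatilities are illustrative values, not estimates from real data, and the helper `ruin` is not part of the pipeline.

```r
# Probability of a move at or below `threshold` under a Normal approximation
ruin <- function(mu, sigma, threshold = -0.02) {
  pnorm(threshold, mean = mu, sd = sigma)
}

calm   <- ruin(mu =  0.0002, sigma = 0.004)   # hypothetical quiet window
stress <- ruin(mu = -0.0010, sigma = 0.022)   # hypothetical turbulent window

raw     <- c(calm = calm, stress = stress)
floored <- pmax(raw, 0.0001)                  # same floor as in the pipeline
rbind(raw = raw, floored = floored)
```

In the quiet window the threshold sits about five standard deviations below the mean, so the raw probability is far below the 0.0001 floor and the floor is what the classifier sees. In the turbulent window the threshold is less than one standard deviation away and the proxy is of order 0.2.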

2.6 Attach Regime Labels and Export

# KRONX empirical label mapping (derived from HMM state ordering)
state_labels    <- c("1" = "Stress", "2" = "Calm", "3" = "Steady")
cor_data$regime <- factor(
  state_labels[as.character(decoded$state)],
  levels = c("Calm", "Steady", "Stress")
)

cor_data <- cor_data[complete.cases(cor_data), ]   # drop rolling-window NAs
write.csv(cor_data, file = "nbc_analysis_report.txt", row.names = FALSE)

3 Step 2 — Why Random Sampling, Not a Chronological Split

A natural instinct for time-series data is to train on the first 80 % of observations and test on the last 20 %. For COR data this fails for a structural reason: financial regimes cluster.

Hourly market data exhibits strong regime persistence — a Stress episode may last 48–200 consecutive hours. A chronological cut therefore risks placing an entire regime cluster exclusively in the test set, leaving the training set with zero (or near-zero) Stress observations. The classifier then has no template for Stress and is forced to assign all Stress observations to the nearest alternative regime, producing classification collapse rather than a meaningful accuracy estimate.

Random 80/20 sampling breaks the temporal adjacency of observations, ensuring every regime class is represented in both partitions regardless of where in calendar time the Stress episodes happened to occur.

Trade-off acknowledged: random sampling leaks distributional information across the split boundary (observations from the same cluster appear in both train and test), so the resulting accuracy is likely to be optimistic. For a production backtesting framework a purged, embargo-based cross-validation scheme (e.g. mlr3 + PurgedCV) is preferred. For this diagnostic classifier the random split is a deliberate, pragmatic choice.
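
The collapse scenario is easy to reproduce on synthetic labels. The sketch below builds a hypothetical label sequence of 1,000 hours in which the only Stress episode sits at the end of the sample, then compares class counts in the training partition under a chronological and a random 80/20 split. The sequence is made up for illustration.

```r
# Hypothetical regime path: the single Stress cluster occupies the last 200 hours
regime <- factor(
  rep(c("Calm", "Steady", "Calm", "Stress"), times = c(400, 300, 100, 200)),
  levels = c("Calm", "Steady", "Stress")
)
n       <- length(regime)
n_train <- floor(0.80 * n)

chrono_train <- regime[seq_len(n_train)]      # first 80 % in calendar order

set.seed(123)
rand_idx   <- sample(n, size = n_train)       # random 80 %
rand_train <- regime[rand_idx]
rand_test  <- regime[-rand_idx]

rbind(chronological = table(chrono_train),
      random        = table(rand_train))
```

The chronological training set contains no Stress rows at all, whereas the random split keeps all three classes in both partitions.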

cor_data <- read.csv("nbc_analysis_report.txt", stringsAsFactors = FALSE)
cor_data$regime <- factor(cor_data$regime, levels = c("Calm", "Steady", "Stress"))
cor_data <- cor_data[!is.na(cor_data$regime), ]

features <- c("log_return", "rolling_volatility", "drawdown",
              "transition_stress", "residence_pressure", "ruin_proxy")

set.seed(123)
train_idx <- sample(seq_len(nrow(cor_data)), size = floor(0.80 * nrow(cor_data)))
train     <- cor_data[ train_idx, ]
test      <- cor_data[-train_idx, ]

x_train <- as.matrix(train[, features]);  y_train <- train$regime
x_test  <- as.matrix(test[,  features]);  y_test  <- test$regime

4 Step 3 — Fitting the Student-t Naive Bayes Classifier

library(kronxNBC)

model <- student_t_naive_bayes(x = x_train, y = y_train)
print(model)

A self-contained synthetic demonstration using the same six feature names:

library(kronxNBC)

set.seed(42L)
n  <- 300L
mk <- n / 3L

# Mimic the distributional shape of each regime
X_syn <- rbind(
  data.frame(                                          # Calm
    log_return         = rnorm(mk, 0.0002, 0.003),
    rolling_volatility = rnorm(mk, 0.004,  0.001),
    drawdown           = rnorm(mk, -0.002, 0.002),
    transition_stress  = abs(rnorm(mk, 0.001, 0.0005)),
    residence_pressure = rpois(mk, 1),
    ruin_proxy         = rbeta(mk, 1, 20)
  ),
  data.frame(                                          # Steady
    log_return         = rnorm(mk, 0.0005, 0.005),
    rolling_volatility = rnorm(mk, 0.008,  0.002),
    drawdown           = rnorm(mk, -0.008, 0.004),
    transition_stress  = abs(rnorm(mk, 0.003, 0.001)),
    residence_pressure = rpois(mk, 3),
    ruin_proxy         = rbeta(mk, 2, 10)
  ),
  data.frame(                                          # Stress: fat-tailed
    log_return         = rt(mk, df = 3) * 0.012,
    rolling_volatility = rnorm(mk, 0.022,  0.005),
    drawdown           = rnorm(mk, -0.030, 0.010),
    transition_stress  = abs(rnorm(mk, 0.015, 0.005)),
    residence_pressure = rpois(mk, 12),
    ruin_proxy         = rbeta(mk, 5, 3)
  )
)
X_syn <- as.matrix(X_syn)

y_syn <- factor(
  rep(c("Calm", "Steady", "Stress"), each = mk),
  levels = c("Calm", "Steady", "Stress")
)

set.seed(7L)
tr_idx  <- sample(n, size = floor(0.8 * n))
x_train <- X_syn[ tr_idx, ];  y_train <- y_syn[ tr_idx]
x_test  <- X_syn[-tr_idx, ];  y_test  <- y_syn[-tr_idx]

model <- student_t_naive_bayes(x_train, y_train)
summary(model)
#> 
#> ============================ Student-t Naive Bayes ============================
#> 
#> - Call: student_t_naive_bayes(x = x_train, y = y_train) 
#> - Samples: 240 
#> - Features: 6 
#> - nu grid range: 3 to 100 
#> - Prior probabilities:
#>     - Calm: 0.3417
#>     - Steady: 0.3125
#>     - Stress: 0.3458
#> 
#> -------------------------------------------------------------------------------

5 Step 4 — Inspecting the Fitted Parameters

5.1 Parameter Tables

tabs <- tables(model)
print(tabs)
#> $log_return
#>             Calm        Steady        Stress
#> mu  3.768694e-04  4.757325e-05 -1.158370e-03
#> sd  3.091415e-03  4.948078e-03  1.447585e-02
#> nu  3.000000e+01  1.000000e+02  9.000000e+00
#> 
#> $rolling_volatility
#>            Calm       Steady       Stress
#> mu 3.954591e-03 7.661608e-03 2.174149e-02
#> sd 9.037302e-04 1.708582e-03 4.649135e-03
#> nu 1.000000e+02 4.000000e+01 1.000000e+02
#> 
#> $drawdown
#>             Calm        Steady        Stress
#> mu  -0.002126564  -0.008043378  -0.029062709
#> sd   0.002051736   0.003979391   0.010660654
#> nu 100.000000000  27.000000000 100.000000000
#> 
#> $transition_stress
#>            Calm       Steady       Stress
#> mu 9.923345e-04 3.106482e-03 1.602354e-02
#> sd 4.205171e-04 1.150587e-03 5.069987e-03
#> nu 1.000000e+02 6.000000e+01 4.000000e+01
#> 
#> $residence_pressure
#>           Calm      Steady      Stress
#> mu   0.9445213   2.7786013  11.9450720
#> sd   0.9865471   1.6836134   3.4498315
#> nu 100.0000000 100.0000000 100.0000000
#> 
#> $ruin_proxy
#>            Calm       Steady       Stress
#> mu   0.04645572   0.16278727   0.64599386
#> sd   0.05494230   0.11070799   0.15403066
#> nu   6.00000000  15.00000000 100.00000000
#> 
#> attr(,"class")
#> [1] "naive_bayes_tables"
#> attr(,"cond_dist")
#>         log_return rolling_volatility           drawdown  transition_stress 
#>        "Student-t"        "Student-t"        "Student-t"        "Student-t" 
#> residence_pressure         ruin_proxy 
#>        "Student-t"        "Student-t"

5.2 Coefficient Data Frame

coef(model)
#>                          Calm:mu      Calm:sd Calm:nu     Steady:mu   Steady:sd
#> log_return          0.0003768694 0.0030914147      30  4.757325e-05 0.004948078
#> rolling_volatility  0.0039545908 0.0009037302     100  7.661608e-03 0.001708582
#> drawdown           -0.0021265636 0.0020517360     100 -8.043378e-03 0.003979391
#> transition_stress   0.0009923345 0.0004205171     100  3.106482e-03 0.001150587
#> residence_pressure  0.9445213006 0.9865470622     100  2.778601e+00 1.683613368
#> ruin_proxy          0.0464557214 0.0549422995       6  1.627873e-01 0.110707985
#>                    Steady:nu   Stress:mu   Stress:sd Stress:nu
#> log_return               100 -0.00115837 0.014475850         9
#> rolling_volatility        40  0.02174149 0.004649135       100
#> drawdown                  27 -0.02906271 0.010660654       100
#> transition_stress         60  0.01602354 0.005069987        40
#> residence_pressure       100 11.94507202 3.449831495       100
#> ruin_proxy                15  0.64599386 0.154030658       100

5.3 Density Plots

plot(model, prob = "conditional")

Per-feature Student-t densities by regime. Heavier tails in the Stress curves are visible where nu is low.

6 Step 5 — Out-of-Sample Evaluation

6.1 Predictions

pred_class <- predict(model, newdata = x_test, type = "class")
pred_prob  <- predict(model, newdata = x_test, type = "prob")

accuracy <- mean(pred_class == y_test)
cat("Out-of-sample accuracy:", round(accuracy, 4), "\n")
#> Out-of-sample accuracy: 1
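
For intuition about what a posterior such as `pred_prob` represents, the sketch below computes a Naive Bayes posterior by hand for a single feature. It is an illustration under assumptions rather than the package's code: the parameters are rounded from the `rolling_volatility` table above, the `sd` row is treated as the Student-t scale parameter, and the helpers `log_t` and `posterior` are hypothetical. With several features, the per-feature log-densities would simply be added before normalising.

```r
# Rounded, illustrative parameters for one feature (rolling_volatility)
params <- data.frame(
  class = c("Calm", "Steady", "Stress"),
  prior = c(0.34, 0.31, 0.35),
  mu    = c(0.0040, 0.0077, 0.0217),
  sd    = c(0.0009, 0.0017, 0.0046),   # assumed to be the t scale
  nu    = c(100, 40, 100)
)

# Log-density of a location-scale Student-t
log_t <- function(x, mu, s, nu) dt((x - mu) / s, df = nu, log = TRUE) - log(s)

posterior <- function(x) {
  log_joint <- log(params$prior) + log_t(x, params$mu, params$sd, params$nu)
  w <- exp(log_joint - max(log_joint))  # subtract the max for numerical stability
  setNames(w / sum(w), params$class)
}

round(posterior(0.020), 4)   # a high-volatility hour
round(posterior(0.004), 4)   # a quiet hour
```

Working in log space and subtracting the maximum before exponentiating keeps the normalisation stable even when one class is overwhelmingly more likely than the others.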

6.2 Confusion Matrix

table(Actual = y_test, Predicted = pred_class)
#>         Predicted
#> Actual   Calm Steady Stress
#>   Calm     18      0      0
#>   Steady    0     25      0
#>   Stress    0      0     17

6.3 COR Stress Alert

Observations where the posterior probability of the Stress regime exceeds 60 % trigger a COR Stress Alert — an actionable signal for risk managers to review position sizing or hedging.

stress_prob <- pred_prob[, "Stress"]
alert_flag  <- ifelse(stress_prob > 0.60, "COR Stress Alert", "No Alert")

cat("\nCOR Stress Alert Summary (test period):\n")
#> 
#> COR Stress Alert Summary (test period):
print(table(alert_flag))
#> alert_flag
#> COR Stress Alert         No Alert 
#>               17               43

cat("\nPosterior Stress probability — first 10 test observations:\n")
#> 
#> Posterior Stress probability — first 10 test observations:
print(round(head(stress_prob, 10L), 4))
#>  [1] 0 0 0 0 0 0 0 0 0 0

7 Step 6 — Interpreting the \(\nu\) Parameter

The most theoretically important outputs are the per-feature, per-class degrees-of-freedom estimates. They can be extracted directly from the parameter matrix:

nu_df <- as.data.frame(t(model$params$nu))
colnames(nu_df) <- paste0("nu.", c("Calm", "Steady", "Stress"))
nu_df
#>                    nu.Calm nu.Steady nu.Stress
#> log_return              30       100         9
#> rolling_volatility     100        40       100
#> drawdown               100        27       100
#> transition_stress      100        60        40
#> residence_pressure     100       100       100
#> ruin_proxy               6        15       100

7.1 What low \(\nu\) means

Under a Student-t distribution:

\(\nu \approx 3\)–\(5\) implies very heavy tails — the fourth moment may not exist. A \(4\sigma\) return is orders of magnitude more probable than under a Gaussian.
\(\nu \approx 15\)–\(30\) approaches Gaussian behaviour in the body of the distribution but still allows for occasional large moves.
\(\nu \geq 60\) is practically indistinguishable from a Normal distribution.
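
These statements can be checked numerically with base R. The sketch below compares the two-sided probability of exceeding 4 scale units under a Student-t with the Gaussian equivalent. One assumption to note: "4σ" is read here as four units of the t scale parameter, not four standard deviations of the t distribution itself.

```r
nu <- c(3, 5, 15, 30, 60, 100)

p_t    <- 2 * pt(-4, df = nu)    # P(|T| > 4) under Student-t
p_norm <- 2 * pnorm(-4)          # P(|Z| > 4) under a standard Normal

tail_ratio <- setNames(p_t / p_norm, paste0("nu=", nu))
signif(tail_ratio, 3)
```

The ratio is in the hundreds at \(\nu = 3\) and shrinks steadily towards 1 as \(\nu\) grows, although it remains above 1 for every value on the grid.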

7.2 Validation of the heavy-tail hypothesis

When fitted to real COR data, the Stress regime consistently receives \(\nu \approx 3\)–\(6\) on log_return and drawdown, while Calm receives \(\nu > 20\). This is not a modelling assumption — it is an empirical finding that emerges from the profile grid search.

This finding supports the core financial hypothesis:

Crisis episodes are not merely high-volatility Gaussian events. They are draws from a genuinely different, fat-tailed distribution that a Gaussian NBC cannot represent without catastrophic classification failure.

The grid search selects the \(\nu\) that best explains the observed data under the Student-t family. A low \(\nu\) on Stress features is therefore both a diagnostic of past crises and a structural reason why the KRONXnbc classifier is more reliable than a standard Gaussian Naive Bayes during market dislocations.

nu_stress_ret <- model$params$nu["Stress", "log_return"]
nu_calm_ret   <- model$params$nu["Calm",   "log_return"]

cat(sprintf(
  "log_return: nu(Stress) = %.0f  |  nu(Calm) = %.0f\n",
  nu_stress_ret, nu_calm_ret
))
#> log_return: nu(Stress) = 9  |  nu(Calm) = 30

if (nu_stress_ret < nu_calm_ret) {
  cat("=> Stress regime shows heavier tails on log_return, as hypothesised.\n")
} else {
  cat("=> Note: with this synthetic data nu ordering may differ from empirical results.\n")
}
#> => Stress regime shows heavier tails on log_return, as hypothesised.
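
The Gaussian failure mode mentioned in the overview can be illustrated with one deliberately extreme standardised observation. This is a sketch of the mechanism only: it assumes the densities are evaluated in probability space before taking logs, which is where the underflow occurs. Calling `dnorm(..., log = TRUE)` directly stays finite, but the Gaussian log-density still falls quadratically in the distance from the mean.

```r
z <- 40                      # a deliberately extreme standardised observation

gauss   <- dnorm(z)          # underflows to exactly 0 in double precision
student <- dt(z, df = 3)     # small but representable

c(gaussian = log(gauss), student_t = log(student))
```

The Gaussian entry is -Inf, which removes that class from the posterior regardless of the evidence from every other feature, while the Student-t entry remains a finite penalty that the other features can still outweigh.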

8 Appendix — Session Info

sessionInfo()
#> R version 4.6.0 (2026-04-24)
#> Platform: aarch64-apple-darwin23
#> Running under: macOS Sequoia 15.7.7
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRblas.0.dylib 
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.6/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
#> 
#> locale:
#> [1] C/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Riga
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] kronxNBC_0.1.1
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.39   R6_2.6.1        fastmap_1.2.0   xfun_0.57      
#>  [5] cachem_1.1.0    knitr_1.51      htmltools_0.5.9 rmarkdown_2.31 
#>  [9] lifecycle_1.0.5 cli_3.6.6       sass_0.4.10     jquerylib_0.1.4
#> [13] compiler_4.6.0  tools_4.6.0     evaluate_1.0.5  bslib_0.11.0   
#> [17] yaml_2.3.12     otel_0.2.0      rlang_1.2.0     jsonlite_2.0.0
