The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

palimpsestr: Probabilistic Decomposition of Archaeological Palimpsests Using Stratigraphic Entanglement Fields

Enzo Cocca

2026-04-03

Archaeological palimpsests — deposits in which material from multiple occupation phases is superimposed and partially intermixed — represent one of the most persistent analytical challenges in field archaeology. The palimpsestr R package implements the Stratigraphic Entanglement Field (SEF) framework, a probabilistic method for decomposing palimpsests into latent depositional phases by jointly modelling four evidence domains: spatial proximity, vertical distribution, chronological overlap, and cultural similarity. The package provides a diagonal-covariance Gaussian mixture model fitted via Expectation-Maximisation, augmented with taphonomic weighting and stratigraphic entanglement penalties. Three interpretable diagnostics — the Stratigraphic Entanglement Index (SEI), Excavation Stratigraphic Energy (ESE), and Palimpsest Dissolution Index (PDI) — allow practitioners to assess deposit coherence, detect intrusive finds, and evaluate overall phase separability. This article describes the methodology and demonstrates a complete analytical workflow on a realistic multi-period Roman villa dataset.

Introduction

Archaeological palimpsests arise when successive episodes of human occupation leave material traces within the same sedimentary matrix, producing deposits in which artefacts, ecofacts, and features from chronologically distinct phases co-occur in close spatial and vertical proximity. The phenomenon is ubiquitous: from deeply stratified Near Eastern tells to shallow open-air Palaeolithic sites where millennia of activity are compressed into a few centimetres of deposit. Disentangling the depositional history of such contexts is essential for any behavioural, technological, or settlement-pattern inference, yet the analytical toolkit available to excavators has remained surprisingly limited.

The Harris Matrix (Harris, 1989) provides a rigorous formalism for recording observed stratigraphic relationships, but its relationships are deterministic and binary — a unit either overlies another or it does not. It cannot express the gradual, probabilistic mixing that characterises many palimpsests. Conversely, spatial clustering methods such as k-means (Kintigh and Ammerman, 1982) operate on coordinate data alone, ignoring vertical stratigraphy, chronological evidence from radiocarbon or ceramic seriation, and cultural-typological information. Neither approach offers a unified probabilistic framework, and neither produces quantitative diagnostics for deposit integrity or mixing intensity.

palimpsestr addresses these limitations by integrating four evidence domains — horizontal coordinates, vertical elevation, chronological range, and cultural class — into a single model that produces soft (probabilistic) phase assignments together with three diagnostic statistics. The Stratigraphic Entanglement Index (SEI) quantifies pairwise depositional coherence; Excavation Stratigraphic Energy (ESE) captures local disruption; and the Palimpsest Dissolution Index (PDI) summarises global phase separability. All three are interpretable by field practitioners and can be exported to GIS for spatial interrogation.

Methodology

The Stratigraphic Entanglement Index (SEI)

The SEI quantifies how strongly two finds \(i\) and \(j\) are “entangled” — that is, how much evidence supports their co-occurrence within the same depositional event. It is defined as:

\[SEI_{ij} = \frac{w_s}{d_{xy}(i,j) + \epsilon} + \frac{w_z}{\max(|z_i - z_j|, z_{floor})} + w_t \cdot O_t(i,j) + w_c \cdot \mathbb{1}[c_i = c_j]\]

where:

Each SEI component is normalised to [0, 1] by dividing by its within-dataset maximum, making weights directly interpretable as relative importance. The SEI matrix is row-normalised and used both as a regularisation penalty during model fitting and as the basis for the ESE diagnostic.

The Diagonal Gaussian Mixture EM

Phase decomposition proceeds via a Gaussian mixture model fitted by Expectation-Maximisation (EM). The feature matrix for each find is constructed by concatenating: scaled horizontal coordinates \((x, y)\), vertical coordinate \((z)\), temporal midpoint and temporal span of the date range, and a one-hot encoding of the cultural class variable. This multi-domain feature representation ensures that the clustering draws on all available evidence simultaneously.

The algorithm is initialised with K-means, and the resulting hard assignments are converted to softmax probabilities. The EM procedure then iterates between an E-step (computing posterior phase membership probabilities given current parameters) and an M-step (updating mixture weights, means, and diagonal covariance matrices given current memberships). Two modifications distinguish SEF from a standard Gaussian mixture:

  1. Taphonomic weighting: if a taphonomic integrity score is available for each find, it is used to down-weight poorly preserved or disturbed items during parameter estimation, reducing their influence on cluster centroids.
  2. Stratigraphic entanglement penalties: the SEI matrix is incorporated as a soft constraint, encouraging finds with high mutual entanglement to receive similar phase assignments.

Convergence is assessed by monitoring the log-likelihood; the algorithm terminates when the relative change falls below a tolerance threshold or a maximum number of iterations is reached.

Diagnostic Statistics

Three statistics summarise different aspects of the decomposition:

The Dataset: Villa Romana

We demonstrate the complete analytical workflow using villa_romana, a realistic multi-period Roman villa dataset included in the package. It simulates 300 finds from a villa with 4 occupation phases across 18 stratigraphic units, incorporating realistic post-depositional disturbances.

library(palimpsestr)
data(villa_romana)
cat("Finds:", nrow(villa_romana), "\n")
#> Finds: 300
cat("Stratigraphic units:", length(unique(villa_romana$context)), "\n")
#> Stratigraphic units: 18
cat("Material classes:", length(unique(villa_romana$class)), "\n")
#> Material classes: 13
cat("Date range:", min(villa_romana$date_min), "to", max(villa_romana$date_max), "\n")
#> Date range: -269 to 628

The four phases represent:

Phase Period Chronology
1 Republican 2nd–1st c. BCE
2 Early Imperial 1st–2nd c. CE
3 Late Imperial 3rd–4th c. CE
4 Late Antique 5th–6th c. CE

The dataset includes realistic disturbances: bioturbation (8% vertical displacement), construction cuts (5% stratigraphic intrusions), and residual pottery (3% old material in younger contexts).

cat("Material classes:\n")
#> Material classes:
sort(table(villa_romana$class), decreasing = TRUE)
#> 
#>      ceramic_coarse                bone    ceramic_amphorae          metal_iron 
#>                  85                  52                  28                  25 
#>   ceramic_sigillata               glass              lithic     ceramic_campana 
#>                  21                  19                  19                  16 
#> ceramic_sigillata_d        metal_bronze                coin              marble 
#>                  14                   8                   6                   4 
#>                tile 
#>                   3
str(villa_romana)
#> 'data.frame':    300 obs. of  10 variables:
#>  $ id        : chr  "VR_0001" "VR_0002" "VR_0003" "VR_0004" ...
#>  $ x         : num  45.9 42.8 39.4 38.7 47 ...
#>  $ y         : num  49.2 38 54.3 53 37.3 ...
#>  $ z         : num  3.56 3.69 3.64 3.81 4.22 ...
#>  $ context   : chr  "US_103" "US_102" "US_104" "US_103" ...
#>  $ date_min  : num  -172 -172 -173 -153 -242 -198 -168 -212 -165 -156 ...
#>  $ date_max  : num  -100 -67 -104 -37 -145 -102 -110 -126 -125 -95 ...
#>  $ class     : chr  "bone" "ceramic_coarse" "ceramic_coarse" "bone" ...
#>  $ taf_score : num  0.063 0.299 0.299 0.103 0.079 0.338 0.271 0.087 0.419 0.048 ...
#>  $ true_phase: int  1 1 1 1 1 1 1 1 1 1 ...

Phase Count Selection

When the number of phases is unknown, compare_k() fits models across a range of \(K\) values and reports BIC, ICL, and PDI for each. The optimal \(K\) minimises BIC while maximising PDI.

ck <- compare_k(
  villa_romana,
  k_values = 2:7,
  tafonomy = "taf_score",
  context  = "context",
  seed     = 42
)
print(ck)
#>   k       pdi mean_entropy mean_local_sei mean_energy    loglik       bic
#> 1 2 1.0000000 0.000000e+00        207.028    10.30843  8996.519 -17576.66
#> 2 3 1.0000000 0.000000e+00        207.028    10.30843 12637.253 -24647.09
#> 3 4 0.9701078 4.143940e-02        207.028    10.30843 13982.562 -27126.67
#> 4 5 1.0000000 0.000000e+00        207.028    10.30843 15580.391 -30111.29
#> 5 6 0.9999949 9.076127e-06        207.028    10.30843 14334.154 -27407.77
#> 6 7 0.9999953 9.066056e-06        207.028    10.30843 14598.013 -27724.45
#>         icl tot_withinss pseudo_bic
#> 1 -17576.66    1297.8397   644.7383
#> 2 -24647.09    1101.2130   698.1198
#> 3 -27151.53     984.1808   767.0805
#> 4 -30111.29     852.3565   826.6071
#> 5 -27407.78     769.4746   898.5861
#> 6 -27724.46     734.8830   987.4553
if (requireNamespace("ggplot2", quietly = TRUE)) {
  gg_compare_k(ck)
}
Model selection diagnostics: BIC, PDI, and mean entropy across candidate phase counts. The dashed line marks the optimal K.
Model selection diagnostics: BIC, PDI, and mean entropy across candidate phase counts. The dashed line marks the optimal K.

Fitting the Model (K = 4)

Based on the selection diagnostics and archaeological knowledge of the site (4 occupation periods), we fit with K = 4 and multiple random initialisations.

fit <- fit_sef(
  villa_romana,
  k        = 4,
  tafonomy = "taf_score",
  context  = "context",
  n_init   = 10,
  seed     = 42
)
print(fit)
#> <sef_fit>
#> Observations: 300
#> Estimated phases: 4
#> Mean entropy: 0.000
#> Mean local SEI: 207.028
#> Mean energy: 10.308
#> LogLik: 14635.455
#> BIC: -28432.453
#> ICL: -28432.453
#> PDI: 1.000
#> Converged: yes
#> Initialisations: 10
summary(fit)
#> $n
#> [1] 300
#> 
#> $k
#> [1] 4
#> 
#> $pdi
#> [1] 1
#> 
#> $mean_entropy
#> [1] 0
#> 
#> $mean_energy
#> [1] 10.30843
#> 
#> $loglik
#> [1] 14635.45
#> 
#> $bic
#> [1] -28432.45
#> 
#> $phase_count
#> 
#>   1   2   3   4 
#> 113  43  72  72

Model Overview

cat("PDI:", pdi(fit), "\n")
#> PDI: 1
sef_summary(fit)
#> $n
#> [1] 300
#> 
#> $k
#> [1] 4
#> 
#> $pdi
#> [1] 1
#> 
#> $mean_entropy
#> [1] 0
#> 
#> $mean_energy
#> [1] 10.30843
#> 
#> $loglik
#> [1] 14635.45
#> 
#> $bic
#> [1] -28432.45
#> 
#> $phase_count
#> 
#>   1   2   3   4 
#> 113  43  72  72

EM Convergence

if (requireNamespace("ggplot2", quietly = TRUE)) {
  gg_convergence(fit)
}
EM convergence trace showing log-likelihood stabilisation across iterations.
EM convergence trace showing log-likelihood stabilisation across iterations.

Spatial Diagnostics

Phase-field Map

if (requireNamespace("ggplot2", quietly = TRUE)) {
  gg_phasefield(fit)
}
Dominant phase assignment. Point size reflects assignment confidence (inverse entropy). Larger points indicate higher confidence.
Dominant phase assignment. Point size reflects assignment confidence (inverse entropy). Larger points indicate higher confidence.

The phase-field map reveals the spatial structure of the decomposition. The model assigns each find to one of four chronologically coherent phases despite the presence of post-depositional disturbance.

Entropy Map

if (requireNamespace("ggplot2", quietly = TRUE)) {
  gg_entropy(fit)
}
Spatial distribution of Shannon entropy. High-entropy zones (bright) mark areas of uncertain phase assignment, typically at phase boundaries or in heavily mixed deposits.
Spatial distribution of Shannon entropy. High-entropy zones (bright) mark areas of uncertain phase assignment, typically at phase boundaries or in heavily mixed deposits.

Entropy mapping is particularly useful for identifying the spatial extent of mixing. Zones of elevated entropy correspond to areas where multiple depositional phases are interleaved and cannot be cleanly separated.

Energy Map (ESE)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  gg_energy(fit)
}
Excavation Stratigraphic Energy field. Elevated ESE values (bright) highlight zones of depositional disruption where neighbouring finds belong to different phases.
Excavation Stratigraphic Energy field. Elevated ESE values (bright) highlight zones of depositional disruption where neighbouring finds belong to different phases.

The energy map highlights local disruptions. Clusters of high-ESE finds may indicate bioturbation channels, pit cuts, or other post-depositional disturbances.

Intrusion Detection Map

if (requireNamespace("ggplot2", quietly = TRUE)) {
  gg_intrusions(fit, top_n = 10)
}
Candidate intrusions overlaid on the phase-field map. Circled finds exhibit high entropy, high stratigraphic energy, and low entanglement with their neighbours. Top 10 suspects are labelled.
Candidate intrusions overlaid on the phase-field map. Circled finds exhibit high entropy, high stratigraphic energy, and low entanglement with their neighbours. Top 10 suspects are labelled.

Vertical Phase Profile

if (requireNamespace("ggplot2", quietly = TRUE)) {
  gg_phase_profile(fit)
}
Vertical phase profile: depth vs horizontal position. Phase 1 (deepest) corresponds to the earliest occupation; Phase 4 (shallowest) to the most recent.
Vertical phase profile: depth vs horizontal position. Phase 1 (deepest) corresponds to the earliest occupation; Phase 4 (shallowest) to the most recent.

The vertical profile confirms the expected stratigraphic ordering: earlier phases occupy deeper strata, later phases are shallower.

Intrusion Analysis

The detect_intrusions() function computes a composite intrusion probability for each find, combining entropy, ESE, and SEI connectivity into a single score.

di <- detect_intrusions(fit)
suspects <- di[di$intrusion_prob > 0.5, ]
cat("Suspected intrusions:", nrow(suspects), "out of", nrow(villa_romana), "finds",
    sprintf("(%.1f%%)\n", 100 * nrow(suspects) / nrow(villa_romana)))
#> Suspected intrusions: 19 out of 300 finds (6.3%)
if (nrow(suspects) > 0) {
  # Merge with source data and phase assignments
  phase_tab <- as_phase_table(fit)
  susp_data <- merge(suspects, villa_romana[, c("id", "context", "class")], by = "id")
  susp_data <- merge(susp_data, phase_tab[, c("id", "dominant_phase")], by = "id")
  susp_data <- susp_data[order(susp_data$intrusion_prob, decreasing = TRUE), ]
  susp_show <- susp_data[, c("id", "context", "class", "dominant_phase", "intrusion_prob")]
  names(susp_show) <- c("Find", "US", "Class", "Phase", "Intrusion Prob.")
  knitr::kable(head(susp_show, 15), row.names = FALSE, digits = 3,
               caption = "Top suspected intrusions ranked by intrusion probability.")
}
Top suspected intrusions ranked by intrusion probability.
Find US Class Phase Intrusion Prob.
VR_0295 US_403 tile 1 0.624
VR_0280 US_403 lithic 2 0.611
VR_0266 US_404 lithic 2 0.592
VR_0184 US_304 ceramic_amphorae 3 0.584
VR_0292 US_404 lithic 2 0.580
VR_0260 US_402 lithic 2 0.559
VR_0255 US_402 lithic 2 0.544
VR_0031 US_101 metal_iron 1 0.533
VR_0297 US_401 bone 4 0.529
VR_0005 US_102 bone 4 0.522
VR_0015 US_101 bone 4 0.521
VR_0258 US_403 metal_iron 1 0.519
VR_0287 US_403 ceramic_sigillata_d 4 0.517
VR_0261 US_402 lithic 2 0.512
VR_0220 US_304 ceramic_sigillata_d 4 0.511

Flagged finds warrant closer examination: they may represent genuinely intrusive material (e.g., items displaced by animal burrowing or construction cuts), or they may sit at genuine phase boundaries where assignment is inherently ambiguous.

Stratigraphic Unit Purity

The us_summary_table() function aggregates diagnostics per stratigraphic unit, reporting dominant phase, purity, mean entropy, energy, and intrusion count.

us_tab <- us_summary_table(fit)
knitr::kable(us_tab, row.names = FALSE, digits = 3,
             caption = "Per-US diagnostics: dominant phase, purity, mean entropy, mean ESE, and number of intrusions.")
Per-US diagnostics: dominant phase, purity, mean entropy, mean ESE, and number of intrusions.
context n_finds dominant_phase purity mean_entropy mean_energy mean_local_sei n_intrusions
US_101 20 2 0.400 0 10.436 182.756 2
US_102 16 2 0.688 0 10.490 165.276 1
US_103 13 4 0.462 0 10.124 181.638 0
US_104 14 2 0.429 0 10.322 170.486 1
US_201 18 3 0.444 0 9.687 220.272 0
US_202 12 3 0.583 0 9.621 218.802 0
US_203 21 3 0.571 0 9.961 220.554 0
US_204 18 3 0.556 0 9.433 226.365 0
US_205 19 3 0.579 0 9.352 219.409 0
US_301 19 1 0.421 0 10.649 226.346 0
US_302 19 1 0.421 0 10.541 232.135 0
US_303 20 1 0.600 0 10.394 233.207 0
US_304 19 4 0.421 0 10.414 221.838 2
US_305 18 1 0.667 0 10.404 227.608 0
US_401 14 1 0.714 0 11.026 198.514 2
US_402 19 1 0.474 0 10.869 186.837 3
US_403 11 1 0.545 0 11.456 161.082 5
US_404 10 1 0.500 0 11.084 180.713 3

Units with purity > 90% indicate consistent phase assignment within the context. Lower purity and higher mean entropy correspond to disturbed contexts where bioturbation or construction cuts have introduced material from adjacent phases.

n_pure <- sum(us_tab$purity >= 0.9)
cat(n_pure, "out of", nrow(us_tab), "units have purity >= 90%\n")
#> 0 out of 18 units have purity >= 90%

Phase Transition Matrix

Shows how often finds from phase \(i\) sit directly above finds from phase \(j\), revealing spatial adjacency between phases:

tm <- phase_transition_matrix(fit)
print(tm)
#>        phase1 phase2 phase3 phase4
#> phase1     37     14     30     32
#> phase2     16     14      6      7
#> phase3     28      5     22     16
#> phase4     31     10     14     17

High off-diagonal values indicate zones of contact between phases. This is expected at the boundary between successive occupation periods.

Model Validation

When true phase assignments are known (e.g., from simulation or independently dated contexts), palimpsestr provides two validation metrics:

ari <- adjusted_rand_index(fit, villa_romana$true_phase)
cat("Adjusted Rand Index:", round(ari, 3), "\n")
#> Adjusted Rand Index: 0.133
confusion_matrix(fit, villa_romana$true_phase)
#>          true
#> estimated  3  1  2  4
#>         1 44 15 24 30
#>         2  1 31  3  8
#>         3 16  2 52  2
#>         4 29 17 16 10
if (requireNamespace("ggplot2", quietly = TRUE)) {
  gg_confusion(fit, villa_romana$true_phase)
}
Confusion matrix heatmap. Strong diagonal = correct phase recovery. Off-diagonal cells indicate misattributed finds, often corresponding to bioturbated or redeposited material.
Confusion matrix heatmap. Strong diagonal = correct phase recovery. Off-diagonal cells indicate misattributed finds, often corresponding to bioturbated or redeposited material.

Sensitivity Analysis of SEI Weights

The SEI combines four evidence domains (spatial, vertical, temporal, cultural) with configurable weights. To assess how weight choices affect the results, we compare three configurations:

configs <- list(
  equal    = c(ws = 1, wz = 1, wt = 1, wc = 1),
  spatial  = c(ws = 2, wz = 1, wt = 0.5, wc = 0.5),
  temporal = c(ws = 0.5, wz = 0.5, wt = 2, wc = 1)
)

sens <- do.call(rbind, lapply(names(configs), function(nm) {
  w <- configs[[nm]]
  f <- fit_sef(villa_romana, k = 4, weights = w, seed = 42,
               tafonomy = "taf_score", context = "context")
  data.frame(
    config = nm,
    pdi = pdi(f),
    ari = adjusted_rand_index(f, villa_romana$true_phase),
    mean_entropy = mean(f$entropy, na.rm = TRUE),
    stringsAsFactors = FALSE
  )
}))
knitr::kable(sens, digits = 4,
             caption = "Sensitivity of PDI, ARI, and mean entropy to weight configuration.")
Sensitivity of PDI, ARI, and mean entropy to weight configuration.
config pdi ari mean_entropy
equal 0.9701 0.2279 0.0414
spatial 0.9701 0.2279 0.0414
temporal 0.9701 0.2279 0.0414

The results show that the model is reasonably robust to weight variation, though emphasising the most informative domain for a given site can improve performance. The optimize_weights() function provides a data-driven approach to weight selection via cross-validated log-likelihood.

Bootstrap Confidence Intervals

Uncertainty on key statistics can be quantified via bootstrap resampling:

bs <- bootstrap_sef(fit, n_boot = 50, true_labels = villa_romana$true_phase, verbose = FALSE)
knitr::kable(bs, digits = 4,
             caption = "Bootstrap confidence intervals (50 replicates) for key model statistics.")
Bootstrap confidence intervals (50 replicates) for key model statistics.
statistic estimate lower upper se
pdi 1.0000 0.9937 1.0000 0.0017
mean_entropy 0.0000 0.0000 0.0088 0.0023
mean_energy 10.3084 9.4868 10.5417 0.3201
loglik 14635.4547 13261.3458 14842.8517 422.1786
ari 0.1330 0.0424 0.1892 0.0392
if (requireNamespace("ggplot2", quietly = TRUE)) {
  gg_bootstrap(bs)
}
Bootstrap confidence intervals for PDI, mean entropy, mean energy, log-likelihood, and ARI.
Bootstrap confidence intervals for PDI, mean entropy, mean energy, log-likelihood, and ARI.

Harris Matrix Integration

palimpsestr can incorporate stratigraphic information from a Harris Matrix as a penalty during model fitting. The harris_from_contexts() function auto-generates a penalty matrix from the mean depth of finds in each context. Alternatively, read_harris() imports an external Harris Matrix from a CSV edge list.

H <- harris_from_contexts(villa_romana, z_col = "z", context_col = "context")
cat("Harris penalty matrix:", nrow(H), "x", ncol(H), "\n")
#> Harris penalty matrix: 300 x 300
fit_h <- fit_sef(villa_romana, k = 4, harris = H,
                 tafonomy = "taf_score", context = "context",
                 n_init = 5, seed = 42)
cat("PDI without Harris:", round(pdi(fit), 4), "\n")
#> PDI without Harris: 1
cat("PDI with Harris:   ", round(pdi(fit_h), 4), "\n")
#> PDI with Harris:    1
cat("ARI without Harris:", round(adjusted_rand_index(fit, villa_romana$true_phase), 4), "\n")
#> ARI without Harris: 0.133
cat("ARI with Harris:   ", round(adjusted_rand_index(fit_h, villa_romana$true_phase), 4), "\n")
#> ARI with Harris:    0.133

Validate phases against stratigraphy

validate_phases_harris(fit)
#>     upper  lower upper_phase lower_phase consistent
#> 1  US_104 US_102           2           2       TRUE
#> 2  US_102 US_101           2           2       TRUE
#> 3  US_101 US_103           2           4       TRUE
#> 4  US_103 US_205           4           3      FALSE
#> 5  US_205 US_201           3           3       TRUE
#> 6  US_201 US_203           3           3       TRUE
#> 7  US_203 US_202           3           3       TRUE
#> 8  US_202 US_204           3           3       TRUE
#> 9  US_204 US_303           3           1      FALSE
#> 10 US_303 US_301           1           1       TRUE
#> 11 US_301 US_304           1           4       TRUE
#> 12 US_304 US_302           4           1      FALSE
#> 13 US_302 US_305           1           1       TRUE
#> 14 US_305 US_401           1           1       TRUE
#> 15 US_401 US_402           1           1       TRUE
#> 16 US_402 US_404           1           1       TRUE
#> 17 US_404 US_403           1           1       TRUE

The consistent column flags whether the dominant phase ordering respects stratigraphic depth. FALSE indicates an inversion that may warrant investigation.

Interpretive Report

The report_sef() function generates a plain-language summary of the analysis, suitable for inclusion in excavation reports. It is available in English and Italian.

report_sef(fit, lang = "en")
#> # SEF Analysis Report
#> 
#> ## Overview
#> 
#> The Stratigraphic Entanglement Field model with **K = 4 phases** was fitted to a dataset of **300 finds** distributed across **18 stratigraphic contexts**.
#> 
#> The Palimpsest Dissolution Index (PDI) is **1.000**, indicating **excellent phase separation**. Mean entropy is 0.0000 and mean ESE is 10.308.
#> 
#> Model diagnostics: LogLik = 14635.5, BIC = -28432.5.
#> 
#> ## Phase Composition
#> 
#> ### Phase 1 (113 finds, 37.7%)
#> 
#> - **Dating**: average 280 CE (range: -236 to 628)
#> - **Material classes**: ceramic_coarse (85), metal_iron (25), tile (3)
#> - **Mean entropy**: 0.0000; **Mean energy**: 10.475
#> - **Contexts**: US_101, US_102, US_103, US_104, US_201, US_202, ... (18 total)
#> - **Assessment**: Entropy is very low, suggesting this phase is internally coherent and well-defined.
#> 
#> ### Phase 2 (43 finds, 14.3%)
#> 
#> - **Dating**: average 5 CE (range: -223 to 586)
#> - **Material classes**: ceramic_campana (16), lithic (19), metal_bronze (8)
#> - **Mean entropy**: 0.0000; **Mean energy**: 10.578
#> - **Contexts**: US_101, US_102, US_103, US_104, US_202, US_205, ... (11 total)
#> - **Assessment**: Entropy is very low, suggesting this phase is internally coherent and well-defined.
#> 
#> ### Phase 3 (72 finds, 24.0%)
#> 
#> - **Dating**: average 163 CE (range: -193 to 609)
#> - **Material classes**: ceramic_amphorae (28), ceramic_sigillata (21), glass (19), marble (4)
#> - **Mean entropy**: 0.0000; **Mean energy**: 9.843
#> - **Contexts**: US_104, US_201, US_202, US_203, US_204, US_205, ... (13 total)
#> - **Assessment**: Entropy is very low, suggesting this phase is internally coherent and well-defined.
#> 
#> ### Phase 4 (72 finds, 24.0%)
#> 
#> - **Dating**: average 202 CE (range: -269 to 618)
#> - **Material classes**: bone (52), ceramic_sigillata_d (14), coin (6)
#> - **Mean entropy**: 0.0000; **Mean energy**: 10.352
#> - **Contexts**: US_101, US_102, US_103, US_104, US_201, US_202, ... (18 total)
#> - **Assessment**: Entropy is very low, suggesting this phase is internally coherent and well-defined.
#> 
#> ## Intrusion Detection
#> 
#> **19 finds** (6.3%) have an intrusion probability above 0.5, flagging them as potentially displaced or redeposited.
#> 
#> Top suspects:
#> 
#> | ID | Context | Intrusion Prob. |
#> |---|---|---|
#> | VR_0295 | US_403 | 0.624 |
#> | VR_0280 | US_403 | 0.611 |
#> | VR_0266 | US_404 | 0.592 |
#> | VR_0184 | US_304 | 0.584 |
#> | VR_0292 | US_404 | 0.580 |
#> | VR_0260 | US_402 | 0.559 |
#> | VR_0255 | US_402 | 0.544 |
#> | VR_0031 | US_101 | 0.533 |
#> | VR_0297 | US_401 | 0.529 |
#> | VR_0005 | US_102 | 0.522 |
#> 
#> ## Stratigraphic Unit Purity
#> 
#> Of **18 stratigraphic units**: **0 pure** (>90%), **1 mixed** (70-90%), **17 palimpsest** (<70%).
#> 
#> Units classified as palimpsest require special attention, as material from multiple depositional episodes co-occurs within them.
#> 
#> ## Recommendations
#> 
#> The deposit shows good phase separation. The model can be used with confidence for phase attribution. Focus review on flagged intrusions and boundary zones.
#> 
#> ---
#> *Report generated by palimpsestr 0.10.0*
report_sef(fit, lang = "it")
#> # Rapporto Analisi SEF
#> 
#> ## Panoramica
#> 
#> Il modello Stratigraphic Entanglement Field con **K = 4 fasi** e' stato applicato a un dataset di **300 reperti** distribuiti in **18 contesti stratigrafici**.
#> 
#> Il Palimpsest Dissolution Index (PDI) e' **1.000**, indicando una separazione **eccellente** delle fasi. Entropia media: 0.0000. Energia media: 10.308.
#> 
#> Diagnostiche: LogLik = 14635.5, BIC = -28432.5.
#> 
#> ## Composizione delle Fasi
#> 
#> ### Fase 1 (113 reperti, 37.7%)
#> 
#> - **Datazione media**: 280 d.C.
#> - **Classi materiali**: ceramic_coarse (85), metal_iron (25), tile (3)
#> - **Entropia media**: 0.0000; **Energia media**: 10.475
#> - **US**: US_101, US_102, US_103, US_104, US_201, US_202, ... (18 totali)
#> 
#> ### Fase 2 (43 reperti, 14.3%)
#> 
#> - **Datazione media**: 5 d.C.
#> - **Classi materiali**: ceramic_campana (16), lithic (19), metal_bronze (8)
#> - **Entropia media**: 0.0000; **Energia media**: 10.578
#> - **US**: US_101, US_102, US_103, US_104, US_202, US_205, ... (11 totali)
#> 
#> ### Fase 3 (72 reperti, 24.0%)
#> 
#> - **Datazione media**: 163 d.C.
#> - **Classi materiali**: ceramic_amphorae (28), ceramic_sigillata (21), glass (19), marble (4)
#> - **Entropia media**: 0.0000; **Energia media**: 9.843
#> - **US**: US_104, US_201, US_202, US_203, US_204, US_205, ... (13 totali)
#> 
#> ### Fase 4 (72 reperti, 24.0%)
#> 
#> - **Datazione media**: 202 d.C.
#> - **Classi materiali**: bone (52), ceramic_sigillata_d (14), coin (6)
#> - **Entropia media**: 0.0000; **Energia media**: 10.352
#> - **US**: US_101, US_102, US_103, US_104, US_201, US_202, ... (18 totali)
#> 
#> ## Intrusioni
#> 
#> **19 reperti** (6.3%) presentano probabilita' di intrusione superiore a 0.5.
#> 
#> ## Purezza delle US
#> 
#> Su **18 US**: **0 pure** (>90%), **1 miste** (70-90%), **17 palinsesto** (<70%).
#> 
#> ---
#> *Rapporto generato da palimpsestr 0.10.0*

Structured Exports

All results can be exported to CSV for integration with external databases and GIS:

export_results(fit, dir = "results/", format = "csv", prefix = "villa_romana")

This creates 4 files:

GIS Export

For integration with GIS workflows, palimpsestr can export results as sf spatial objects:

if (requireNamespace("sf", quietly = TRUE)) {
  sf_pts   <- as_sf_phase(fit, crs = 32632L, dims = "XYZ")
  sf_links <- as_sf_links(fit, quantile_threshold = 0.90, crs = 32632L)
  sf::st_write(sf_pts,   "villa_romana.gpkg", layer = "phases")
  sf::st_write(sf_links, "villa_romana.gpkg", layer = "sei_links")
}

Interactive Visualization

When plotly is available, any gg_* plot can be converted to an interactive visualization using as_plotly():

as_plotly(gg_phasefield(fit))
as_plotly(gg_intrusions(fit))

Comparison with Traditional Approaches

Aspect Harris Matrix k-means palimpsestr (SEF)
Evidence domains Stratigraphy only Spatial only Spatial + vertical + temporal + cultural
Assignment type Deterministic Hard Probabilistic (soft)
Mixing detection No No Yes (entropy, ESE)
Intrusion detection Manual No Automatic
Scalability Manual Good Good
Interpretability High Low High (PDI, reports)

The SEF framework does not replace the Harris Matrix. Rather, it extends it. The Harris Matrix remains indispensable for documenting observed physical relationships between depositional units. What palimpsestr adds is a quantitative, probabilistic layer of analysis that operates on find-level data within those units — precisely the scale at which palimpsest formation is most problematic.

Model Limitations and Methodological Notes

Diagonal Covariance Assumption

The EM uses diagonal Gaussians, assuming conditional independence between features within each phase. This is a deliberate parsimony choice: for \(p\) features and \(k\) phases, diagonal covariance requires \(k(2p+1)-1\) parameters versus \(k(p(p+3)/2+1)-1\) for full covariance. With typical archaeological datasets (n = 50–500, p = 7–10), full covariance risks overfitting.

SEI Component Normalisation

Each SEI component is normalised to [0, 1] by dividing by its within-dataset maximum. This makes weights interpretable but means absolute SEI values are not comparable across datasets. Cross-site comparisons should use derived statistics (PDI, mean entropy) which are scale-invariant.

PDI Interpretation

The PDI is a descriptive measure, not a formal test statistic. The interpretive guideline (PDI > 0.7 = well-resolved) is empirically derived from simulation studies and should not be treated as a formal decision boundary. Bootstrap confidence intervals provide uncertainty quantification.

Intrusion Detection

The intrusion probability score is a heuristic combining rescaled entropy, ESE, and inverse local SEI. The 0.5 threshold is an exploratory guideline. Users should examine flagged observations individually using domain knowledge.

Phase Labelling

Phase numbers are arbitrary in mixture models (label switching). Use reorder_phases() to ensure phase 1 = deepest (oldest) stratum. The bootstrap procedure applies phase reordering to each replicate.

Cross-Validation

The cv_sef() and optimize_weights() functions standardise test data using the training set’s center and scale, ensuring consistent feature scales across folds.

Scalability

The SEI and ESE matrices are \(O(n^2)\) in memory. For datasets exceeding ~2000 finds, use sei_sparse() or the max_dist parameter in sei_matrix() to limit computation to spatial neighbours.

Conclusion

The palimpsestr package offers a unified probabilistic framework for decomposing archaeological palimpsests, addressing a long-standing gap between qualitative stratigraphic reasoning and quantitative spatial analysis. By jointly modelling spatial proximity, vertical distribution, chronological overlap, and cultural similarity, the Stratigraphic Entanglement Field approach produces soft phase assignments that acknowledge — rather than suppress — the inherent uncertainty of mixed deposits. The three diagnostic statistics (SEI, ESE, PDI) provide interpretable, actionable summaries at the find, deposit, and assemblage levels respectively.

The package is available on GitHub at https://github.com/enzococca/palimpsestr.

# install.packages("remotes")
remotes::install_github("enzococca/palimpsestr")

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.