The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Provenance-weighted classification uncertainty

A deterministic key gives one answer. But the inputs to that key carry error – analytical error on a lab value, estimation error on a VLM-extracted colour, regional-average error on a SoilGrids prior. classify_with_uncertainty() propagates that error into a probability distribution over classes, and – crucially – it scales the error to how each value was actually obtained.

The idea

classification_robustness() (since v0.9.42) perturbs every horizon cell by the same fixed fraction and reports one number: the share of Monte-Carlo runs that kept the baseline class. That treats a lab-measured clay value and a photo-guessed one as equally trustworthy.

classify_with_uncertainty() does two things differently:

  1. It returns the whole posteriorP(class = x) for every class the profile reached – not just a single robustness percentage.

  2. It weights the perturbation by the provenance ledger. Each (horizon, attribute) cell is perturbed by an amount set by its evidence grade:

    Grade Source Multiplicative half-width
    A measured 3 %
    B spectra-predicted 7 %
    C prior-inferred 10 %
    D VLM-extracted 17 %
    E user-assumed 30 %

    (pH and Munsell columns get additive half-widths instead; see get_perturbation_scale().)

So a classification that rests on assumptions is reported as genuinely uncertain, while one resting on measurements is reported as firm – even when both sit the same distance from a key boundary.

A first run

library(soilKey)

p <- make_ferralsol_canonical()
u <- classify_with_uncertainty(p, n = 200, system = "wrb2022")
u
#> <soilkey_uncertainty>  system=wrb2022  level=rsg
#>   baseline    : Ferralsols
#>   MC runs     : 200 (200 successful)
#>   entropy     : 0.000
#>   posterior   :
#>     Ferralsols                       100.0%
#>   most decisive attribute: ...

The fields of the returned soilkey_uncertainty object:

u$posterior     # named numeric vector, P(class), sums to 1
u$top1          # the modal class
u$entropy       # Shannon entropy -- 0 for a certain result
u$sensitivity   # data.table(attribute, importance)

Provenance changes the answer

The same profile, classified twice – once with its clay values measured, once with them merely assumed:

# Mark every clay value as a bare assumption (grade E).
p_assumed <- make_ferralsol_canonical()
for (i in seq_len(nrow(p_assumed$horizons))) {
  p_assumed$add_measurement(i, "clay_pct",
                            p_assumed$horizons$clay_pct[i],
                            source = "user_assumed", confidence = 0.2,
                            overwrite = TRUE)
}

classify_with_uncertainty(p,         n = 200)$entropy   # low  -- measured
classify_with_uncertainty(p_assumed, n = 200)$entropy   # high -- assumed

Nothing about the soil changed; only what we know about it did. That is exactly what a pedometric uncertainty estimate should capture.

The sensitivity ranking

classify_with_uncertainty() also runs a leave-one-attribute-out pass: each attribute is held fixed in turn, and the drop in flip rate measures how much that attribute drives the instability.

u$sensitivity
#>      attribute importance
#> 1     clay_pct      0.18
#> 2     cec_cmol      0.06
#> ...

The top row is the measurement that would most sharpen the classification – a direct, defensible answer to “what should I measure next?”.

Backward compatibility

classification_robustness() is unchanged by default. It gains one new argument, provenance_aware, off by default:

# Identical to every previous release:
classification_robustness(p, system = "wrb2022", n = 100)

# Opt in to grade-scaled perturbation:
classification_robustness(p, system = "wrb2022", n = 100,
                          provenance_aware = TRUE)

In the Shiny app

The Uncertainty tab of the Pro Shiny app (run_classify_app()) wraps all of this: pick a system, set the number of Monte-Carlo runs, and read off the posterior bar chart, the entropy, and the attribute-sensitivity table.

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.