
Type: Package
Title: Computed ABC Analysis
Version: 1.0
Description: Identify the most relevant data points by dividing a numeric data set into three classes A, B, and C, where class A items are the "important few", class C items are the "trivial many", and class B items lie in between, resembling the idea of the Pareto principle. This ABC classification is done using an ABC curve, which plots cumulative "Yield" against "Effort", similar to a Lorenz curve. Class borders are then defined mathematically on that curve, which aids interpretation. Based on: Ultsch A, Lotsch J (2015) "Computed ABC Analysis for rational Selection of most informative Variables in multivariate Data". PLoS ONE 10(6): e0129767. <doi:10.1371/journal.pone.0129767>.
Depends: R (≥ 2.10.0)
Imports: ggplot2, plotrix, grDevices, graphics, stats, utils
LazyData: true
Suggests: datasets, testthat (≥ 3.0.0)
License: GPL-3
URL: https://github.com/AndreHDev/cABC_Analysis
Encoding: UTF-8
Date: 2026-04-20
RoxygenNote: 7.3.3
NeedsCompilation: no
Config/testthat/edition: 3
Packaged: 2026-04-24 14:03:32 UTC; andre
Author: Jorn Lotsch [aut], André Himmelspach [aut, cre]
Maintainer: André Himmelspach <himmelspach@med.uni-frankfurt.de>
Repository: CRAN
Date/Publication: 2026-04-28 19:00:33 UTC

SwissInhabitants in 1900

Description

Number of inhabitants in the 2896 communes (cities and villages) of Switzerland in the year 1900.

Usage

data("SwissInhabitants")

Format

A numeric vector of length 2896 containing the population counts.

Details

This data set consists of the number of inhabitants in the 2896 communes (cities and villages) in Switzerland in 1900. The data is unordered for anonymity reasons.

Source

Schuler, M., Ullmann, D. (2002). Eidgenössische Volkszählung: Bevölkerungsentwicklung der Gemeinden. Bundesamt für Statistik, Neuchâtel, Switzerland.

References

Behnisch, M., Ultsch, A. (2010). Population Patterns in Switzerland 1850-2000. In: Gaul, W. et al. (Eds.), Advances in Data Analysis, Data Handling and Business Intelligence, Springer, Heidelberg, pp. 163-173.

Examples

data(SwissInhabitants)
summary(SwissInhabitants)

ABC Classification

Description

Divides a numeric dataset into three classes (A, B, and C) using ABC analysis. The classification is based on geometric properties of the ABC curve and identifies regions of high, balanced, and low efficiency. Class interpretation:

A: Low effort, high yield (Pareto items)
B: Balanced effort and yield
C: High effort, low yield (submarginal items)

Usage

cABC_analysis(Data, PlotIt = FALSE, useGGPlot = TRUE)

Arguments

Data

A positive numeric vector that is not uniformly distributed. If a matrix or data frame is supplied, the first column is used.

PlotIt

Logical. If TRUE, an ABC plot is generated.

useGGPlot

Logical, default TRUE. If TRUE, a ggplot2 plot is produced; if FALSE, a base-R plot is produced. Only relevant when PlotIt = TRUE.

Details

The class boundaries are computed on the ABC curve (see cABC_curve) as follows:

Pareto Point: the point with minimal distance to (0,1) -> A|B boundary
Breakeven Point: the point where the slope of the curve equals 1
Juren Point (the Submarginal point): the point with minimal distance to (BreakevenPoint_x,1) -> B|C boundary

For more calculation details see: Ultsch A, Lotsch J (2015) "Computed ABC Analysis for rational Selection of most informative Variables in multivariate Data". PLoS ONE 10(6): e0129767. <doi:10.1371/journal.pone.0129767>.
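A minimal sketch of the three boundary computations, assuming the ABC curve is already sampled as vectors p (effort) and ABC (yield). The helper name abc_points_sketch is hypothetical, the slope is approximated by finite differences rather than by the package's spline derivative, and the Submarginal point is measured from the break-even x before the optional A/B swap, following the list above. The package's internals may differ.

```r
# Hypothetical helper (not the package's internal code): locate the
# Pareto, break-even and submarginal points on a sampled ABC curve.
abc_points_sketch <- function(p, ABC) {
  # Pareto point: curve point with minimal distance to (0, 1)
  iPareto <- which.min(p^2 + (1 - ABC)^2)
  # Break-even point: finite-difference slope closest to 1 (assumption)
  slope <- diff(ABC) / diff(p)
  iBreakeven <- which.min(abs(slope - 1))
  # Submarginal point: minimal distance to (x of break-even point, 1)
  bx <- p[iBreakeven]
  iSub <- which.min((p - bx)^2 + (1 - ABC)^2)

  A <- c(p[iPareto], ABC[iPareto])
  B <- c(p[iBreakeven], ABC[iBreakeven])
  # Swap so that A is the left-most point (cf. ABexchanged)
  exchanged <- B[1] < A[1]
  if (exchanged) {
    tmp <- A
    A <- B
    B <- tmp
  }
  list(A = A, B = B, C = c(p[iSub], ABC[iSub]), ABexchanged = exchanged)
}

# Usage on an analytic concave curve: yield = sqrt(effort)
p <- seq(0, 1, by = 0.001)
pts <- abc_points_sketch(p, sqrt(p))
```

For yield = sqrt(effort) the break-even point lies at effort 0.25, to the left of the Pareto point (about 0.35), so the two points are swapped and ABexchanged is TRUE.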

Data cleaning: Before classification, non-numeric values and NAs are coerced to 0 and negative values are set to 0; a warning is issued when items are converted. Only positive values enter the ABC curve. If a matrix or data frame is supplied, only the first column is used.
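The cleaning rules can be sketched as follows. clean_abc_input_sketch is a hypothetical helper, not an exported function, and the warning text is illustrative. The sketch only sets offending values to 0; dropping the zeros before the curve is computed is left to later steps.

```r
# Hypothetical helper mirroring the cleaning rules described above
clean_abc_input_sketch <- function(Data) {
  # Matrices and data frames: keep only the first column
  if (is.matrix(Data) || is.data.frame(Data)) Data <- Data[, 1]
  # Non-numeric entries become NA here; silence R's coercion warning
  x <- suppressWarnings(as.numeric(Data))
  names(x) <- names(Data)
  bad <- is.na(x) | x < 0
  if (any(bad)) {
    warning(sum(bad), " item(s) were NA, non-numeric or negative and set to 0")
    x[bad] <- 0
  }
  x
}

cleaned <- suppressWarnings(clean_abc_input_sketch(c("3", "a", NA, -2, 5)))
```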

Degenerate inputs (a single point, all-identical values, very small datasets) are caught before curve fitting; see cABC_handle_specials for the full behavior. Duplicate values that end up split across two or more classes at a boundary are resolved by cABC_postprocess_classes. In both cases a warning is issued when a special case is triggered.

Value

A list containing:

Aind, Bind, Cind

Integer vectors of indices (into the original Data) for items assigned to classes A, B, and C respectively. In special-case returns (single point or all-identical), only Aind is populated; Bind and Cind are integer(0).

ABexchanged

Logical; TRUE if the Pareto point and Break-even point were swapped to maintain coordinate logic (i.e. the Break-even point was to the left of the Pareto point on the curve).

A, B, C

c(x, y) coordinates for the Pareto point (A), the Break-even point (B), and the Submarginal point (C). NULL in special-case returns.

smallestAData

Cumulative yield at the boundary of Class A. NULL in special-case returns.

smallestBData

Cumulative yield at the boundary of Class B. NULL in special-case returns.

AlimitIndInInterpolation

Index of the A|B boundary in the interpolated [p, ABC] curve. NULL in special-case returns.

BlimitIndInInterpolation

Index of the B|C boundary in the interpolated [p, ABC] curve. NULL in special-case returns.

p

Numeric vector of effort values (x-axis) of the interpolation curve. NULL in special-case returns.

ABC

Numeric vector of yield values (y-axis) of the interpolation curve. NULL in special-case returns.

ABLimit

Data value closest to the threshold separating Class A from Class B. NULL in special-case returns.

BCLimit

Data value closest to the threshold separating Class B from Class C. NULL in special-case returns.

Author(s)

André Himmelspach (01/2026)

Examples

data("SwissInhabitants")
abc <- cABC_analysis(SwissInhabitants, PlotIt = TRUE)

# Extract the data belonging to each class
A <- abc$Aind; B <- abc$Bind; C <- abc$Cind
Agroup <- SwissInhabitants[A]
Bgroup <- SwissInhabitants[B]
Cgroup <- SwissInhabitants[C]


cABC Curve Computation

Description

Computes the ABC curve: the cumulative fraction of the largest data values (effort) against the cumulative fraction of their sum (yield). Intermediate points are generated by monotone Hyman spline interpolation.
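A sketch of the curve construction, assuming items are ranked from largest to smallest and that stats::splinefun() with method = "hyman" is an acceptable stand-in for the package's interpolation. The function name abc_curve_sketch and the default grid p = seq(0, 1, by = 0.01) are assumptions; the returned element and column names only mirror the Value section of this help page.

```r
# Hypothetical sketch of an ABC curve with monotone Hyman interpolation
abc_curve_sketch <- function(Data, p = seq(0, 1, by = 0.01)) {
  # Keep positive values only, largest first
  x <- sort(Data[Data > 0], decreasing = TRUE)
  n <- length(x)
  # Effort: fraction of items considered; yield: fraction of the total sum
  effort <- c(0, seq_len(n) / n)
  yield <- c(0, cumsum(x) / sum(x))
  # Monotone cubic spline through the (effort, yield) points
  f <- stats::splinefun(effort, yield, method = "hyman")
  list(
    Curve = data.frame(Effort = p, Yield = f(p)),
    Slope = data.frame(p = p, cABC = f(p, deriv = 1))
  )
}

abc <- abc_curve_sketch(c(50, 30, 10, 5, 5))
```

For c(50, 30, 10, 5, 5) the top 20% of items (one of five) carries 50% of the total sum, so the curve passes through (0.2, 0.5).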

Usage

cABC_curve(Data, p)

Arguments

Data

Numeric vector or matrix; if a matrix, the first column is used. Only positive values are used.

p

Optional x-values (effort) at which the spline interpolation is evaluated. If omitted, a default grid is used, which is finer for large datasets.

Value

A list containing:

Curve

Data frame with the Effort (x) and Yield (y) values of the interpolated curve.

Slope

Data frame with p (x-values) and cABC (the first derivative of the curve).


Handle Special Cases Before ABC Classification

Description

Checks for degenerate input conditions that would make a standard ABC analysis undefined or unreliable, and returns an early result or warning where appropriate. This function is called by cABC_analysis before the ABC curve is computed.

Usage

cABC_handle_specials(Data)

Arguments

Data

Named numeric vector of positive values (already cleaned by cABC_analysis: no NAs, no non-positives, names preserved).

Details

The following special cases are handled:

Single data point

If only one positive value remains after cleaning, it is assigned to Class A and a warning is issued. The returned list has Aind = 1 and all other fields empty/NULL.

All values identical

If every data point has the same value, all items are considered equally important. They are all assigned to Class A (not Class B as the warning text historically stated) and a warning is issued. The returned list has Aind set to all indices and all other fields empty/NULL.

Very small dataset

If three or fewer positive values remain after cleaning, a warning is issued that the ABC classification may be unstable, but processing continues normally and NULL is returned so that cABC_analysis proceeds with the standard algorithm.
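The three checks can be sketched as follows. handle_specials_sketch is hypothetical, its warning messages are illustrative, and the curve-related fields of the real return value are omitted for brevity.

```r
# Hypothetical sketch of the special-case checks described above
handle_specials_sketch <- function(Data) {
  n <- length(Data)
  # Single data point: assign it to class A
  if (n == 1) {
    warning("Only one data point: assigned to class A")
    return(list(Aind = 1L, Bind = integer(0), Cind = integer(0)))
  }
  # All values identical: every item goes to class A
  if (length(unique(Data)) == 1) {
    warning("All values identical: all items assigned to class A")
    return(list(Aind = seq_len(n), Bind = integer(0), Cind = integer(0)))
  }
  # Very small dataset: warn, then continue with the standard algorithm
  if (n <= 3) warning("Three or fewer values: ABC classification may be unstable")
  NULL
}
```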

Value

NULL if no special case applies (normal processing should continue). Otherwise a named list with the same structure as the return value of cABC_analysis, where only Aind (and optionally Bind, Cind) are populated and all curve-related fields are NULL or empty.


cABC Plot

Description

Draws an ABC curve together with identity and optional uniform reference curves, ABC set boundaries (A-B and B-C), labels, and point counts.

Usage

cABC_plot(
  CurveData,
  CleanData,
  Boundaries,
  Set_counts,
  x_vals,
  y_vals,
  LineType = 0,
  LineWidth = 3,
  ShowUniform = TRUE,
  Plot_title = "ABC plot",
  defaultAxes = FALSE
)

Arguments

CurveData

Data about the ABC curve as returned by cABC_curve.

CleanData

Clean original input data.

Boundaries

A list with numeric vectors A, B, and C, each of length 2, giving the x/y coordinates of the ABC boundaries.

Set_counts

A list with elements nA, nB, nC giving the number of observations in sets A, B, and C.

x_vals

Numeric vector of x coordinates of original data points.

y_vals

Numeric vector of y coordinates of original data points.

LineType

Integer. If 0 (default), the ABC curve is drawn as a line.

LineWidth

Numeric. Line width for the ABC curve. Default is 3.

ShowUniform

Logical. If TRUE (default), the uniform reference curve is drawn in addition to the identity and ABC curves.

Plot_title

Character string. Title of the plot. Default is "ABC plot".

defaultAxes

Logical, default FALSE. If TRUE, base R axes are drawn by plot(); if FALSE, custom axes with ticks from 0 to 1 in steps of 0.1 are drawn.

Details

The plot always uses a square coordinate system with both axes ranging from 0 to 1. The diagonal y = 1 - x (equilibrium line) and the identity line y = x are drawn as references. ABC set boundaries (A|B and B|C) are visualized with stars and orthogonal boundary lines.

Value

Base R Plot


cABC_plotGG

Description

ggplot2 version of the base R cABC_plot, producing a matching plot.

Usage

cABC_plotGG(
  CurveData,
  CleanData,
  Boundaries,
  Set_counts,
  x_vals,
  y_vals,
  LineWidth = 1.25,
  ShowUniform = TRUE,
  Plot_title = "ABC plot"
)

Arguments

CurveData

Data about the ABC curve as returned by cABC_curve.

CleanData

Clean original input data.

Boundaries

A list with numeric vectors A, B, and C, each of length 2, giving the x/y coordinates of the ABC boundaries.

Set_counts

A list with elements nA, nB, nC giving the number of observations in sets A, B, and C.

x_vals

Numeric vector of x coordinates of original data points.

y_vals

Numeric vector of y coordinates of original data points.

LineWidth

Numeric. Line width for the ABC curve. Default is 1.25.

ShowUniform

Logical. If TRUE (default), the uniform reference curve is drawn in addition to the identity and ABC curves.

Plot_title

Character string. Title of the plot. Default is "ABC plot".

Details

The plot always uses a square coordinate system with both axes ranging from 0 to 1. The diagonal y = 1 - x (equilibrium line) and the identity line y = x are drawn as references. ABC set boundaries (A|B and B|C) are visualized with stars and orthogonal boundary lines. Individual data points are shown when there are fewer than 20 of them.

Value

ggplot2 object


Post-process ABC Classes to Resolve Boundary Duplicates

Description

After the initial class assignment in cABC_analysis, it is possible for data points with the same value to be split across two or even all three classes (A, B, C) because the geometric boundary cuts through a run of identical values. This function detects such duplicates and consolidates all occurrences of an ambiguous value into a single class using a deterministic tie-breaking strategy.

Usage

cABC_postprocess_classes(Aind, Bind, Cind, Data, sorted_data, ABLimit, BCLimit)

Arguments

Aind

Integer vector of indices currently assigned to Class A.

Bind

Integer vector of indices currently assigned to Class B.

Cind

Integer vector of indices currently assigned to Class C.

Data

Named numeric vector of the (unsorted) input data, as cleaned by cABC_analysis.

sorted_data

Numeric vector; Data sorted in decreasing order (used internally for boundary reference).

ABLimit

Numeric scalar; the data value closest to the A/B boundary threshold (as computed in cABC_analysis).

BCLimit

Numeric scalar; the data value closest to the B/C boundary threshold (as computed in cABC_analysis).

Details

Tie-breaking rules:

  1. The class that contains the most occurrences of the duplicate value wins outright.

  2. If all three classes are tied, the duplicate value (dup_val) is compared to both boundary limits and attributed to whichever boundary (ABLimit or BCLimit) it is closest to. It is then placed on its own side of that boundary: if closest to ABLimit, it goes to A when dup_val >= ABLimit and to B otherwise; if closest to BCLimit, it goes to B when dup_val >= BCLimit and to C otherwise. If it is equidistant from both boundaries, it is assigned to B.

  3. If exactly two classes are tied, the pair determines the rule:

    • A vs B: compare to ABLimit; >= ABLimit → A, otherwise → B.

    • B vs C: compare to BCLimit; >= BCLimit → B, otherwise → C.

    • A vs C: always assign to A, since the value was already deemed important enough to appear in the top class.

A warning is issued whenever at least one duplicate boundary value is found, prompting the user to inspect the data and the ABC plot.
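The tie-breaking rules for a single duplicated value can be sketched as a small decision function. resolve_tie_sketch and its counts argument (a named vector c(A = ..., B = ..., C = ...) of occurrences per class) are hypothetical; the package applies the rules inside cABC_postprocess_classes and also rewrites the index vectors, which this sketch leaves out.

```r
# Hypothetical helper: pick the class for one duplicated boundary value
resolve_tie_sketch <- function(dup_val, counts, ABLimit, BCLimit) {
  top <- names(counts)[counts == max(counts)]
  # Rule 1: a single class holds the most occurrences
  if (length(top) == 1) return(top)
  # Rule 2: three-way tie, use the nearest boundary
  if (length(top) == 3) {
    dAB <- abs(dup_val - ABLimit)
    dBC <- abs(dup_val - BCLimit)
    if (dAB == dBC) return("B")
    if (dAB < dBC) return(if (dup_val >= ABLimit) "A" else "B")
    return(if (dup_val >= BCLimit) "B" else "C")
  }
  # Rule 3: two-way tie, the pair of tied classes selects the rule
  pair <- paste(sort(top), collapse = "")
  switch(pair,
    AB = if (dup_val >= ABLimit) "A" else "B",
    BC = if (dup_val >= BCLimit) "B" else "C",
    AC = "A"
  )
}
```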

Value

A named list with three elements:

Aind

Sorted integer vector of indices for Class A after deduplication.

Bind

Sorted integer vector of indices for Class B after deduplication.

Cind

Sorted integer vector of indices for Class C after deduplication.
