deaviz turns the numbers behind a Data Envelopment
Analysis (DEA), whether they are input/output profiles or the
computational outcomes of various DEA models, into plots of efficiency
distributions, input/output relationships, efficient frontier
representations, projection biplots, benchmarking networks,
cross-efficiency maps, and multi-period trajectories for panel data.
This vignette walks through a typical workflow using two bundled
datasets.
The workflow has three steps that map onto the package’s naming convention:
1. wrap your data in a dea_data() object (inputs, outputs, DMU labels);
2. compute_*() the quantities you need (efficiency, cross-efficiency,
weights, self-organising maps);
3. plot_*() the result.
Most plots compute efficiency scores internally, which relies on the
Benchmarking package; a few
embeddings/layouts use smacof,
igraph/graphlayouts, or kohonen.
These are all listed under Suggests, so install them to reproduce every
figure below. (Where a suggested package is missing, the corresponding
chunk is simply skipped, so this vignette always builds.)
deaviz ships with chinese_cities, a classic
cross-sectional DEA benchmark of 35 cities with three inputs and three
outputs (Sueyoshi, 1992). dea_data() records which columns
are inputs, which are outputs, and which identifies the DMU. Columns can
be given by name or by position.
The dea_data object can be defined by the input and output
variable names:
d <- dea_data(
chinese_cities,
inputs = c("industrial_labour_force", "working_funds", "investments"),
outputs = c("gross_industrial_output", "profit_and_tax", "retail_sales"),
id = "DMU"
)
d
#> <dea_data>
#> DMUs : 35
#> Inputs : 3 (industrial_labour_force, working_funds, investments)
#> Outputs : 3 (gross_industrial_output, profit_and_tax, retail_sales)
Equivalently, the same columns can be given by their positions in the
data frame.
Every downstream function accepts this d object. If your
input/output columns are prefixed i_ / o_,
dea_data() will detect them automatically, allowing you to
omit the inputs and outputs arguments.
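To make the name/position equivalence and the prefix convention concrete, here is a minimal base-R sketch on a hypothetical toy data frame (the columns i_labour, i_capital and o_output are invented for illustration). It shows one plausible way prefix detection could work; it is not the dea_data() implementation.

```r
# Hypothetical toy data frame using the i_ / o_ prefix convention.
toy <- data.frame(
  DMU       = c("A", "B", "C"),
  i_labour  = c(10, 20, 15),
  i_capital = c(5, 8, 6),
  o_output  = c(100, 150, 90)
)

# Detect inputs and outputs from their prefixes (an assumed approach,
# not necessarily what dea_data() does internally).
input_cols  <- grep("^i_", names(toy), value = TRUE)
output_cols <- grep("^o_", names(toy), value = TRUE)

# match() turns column names into positions; indexing names() by those
# positions turns them back, so both specifications select the same columns.
input_pos  <- match(input_cols, names(toy))
output_pos <- match(output_cols, names(toy))
stopifnot(identical(names(toy)[input_pos], input_cols))

input_pos   # positions of the input columns
output_pos  # position of the output column
```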
The compute_efficiency() function returns radial
efficiency scores, along with peer and multiplier weights.
Returns-to-scale (RTS) and orientation can be specified as
arguments.
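compute_efficiency() solves the DEA linear programs through Benchmarking. To see what a radial score means, the single-input, single-output case under constant returns to scale has a closed form: each DMU's productivity divided by the best productivity in the sample. The sketch below uses hypothetical data and this simplified formula in place of the LP; it is not deaviz code.

```r
# Hypothetical single-input, single-output data for five DMUs.
x <- c(A = 2, B = 4, C = 3, D = 5, E = 6)   # input
y <- c(A = 2, B = 8, C = 3, D = 5, E = 9)   # output

# Under constant returns to scale with one input and one output, the
# radial (CCR) efficiency is productivity relative to the best observed.
productivity <- y / x
eff_crs <- productivity / max(productivity)
round(eff_crs, 3)
```

Here B defines the frontier (score 1), and A's score of 0.5 means that, in the input orientation, A could produce its current output with half its input.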
Start by looking at how the scores and the variables are distributed.
plot_efficiency_distributions() shows the distribution of
efficiency scores, while plot_io_distributions() shows the
raw input/output variables.
plot_efficiency_distributions(d, rts = "vrs", title = "Chinese Cities Efficiency Scores", subtitle = "Variable Returns to Scale")
Because the input and output names are long, tilting the x-axis tick
labels (for example with x_angle = 30) keeps them readable. Every plot
that displays variable or DMU names on the x-axis accepts the
x_angle argument.
The plot_io_efficients() function compares the number of
efficient versus inefficient DMUs.
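Counting efficient units sounds trivial, but scores returned by an LP solver can fall a hair below 1. A minimal sketch, assuming scores in (0, 1] with 1 marking the frontier and using hypothetical values, classifies with a small tolerance (the 1e-6 threshold is an assumption, not a deaviz setting):

```r
# Hypothetical efficiency scores; D is numerically on the frontier.
eff <- c(A = 1, B = 0.82, C = 1, D = 0.999999999, E = 0.64)

# Compare against 1 with a tolerance instead of testing eff == 1 exactly.
tol <- 1e-6
status <- ifelse(eff >= 1 - tol, "efficient", "inefficient")
table(status)
```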
plot_io_scatter() lays out every input-against-output
pair if no vector of inputs and/or outputs is assigned to
vars. If a vector is provided, then the scatterplots will
be limited to the pairwise combinations of those variables. Regardless,
the visual marks are colored by efficiency.
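Because a supplied vars vector is expanded into pairwise combinations, the number of panels grows quadratically with its length. This base-R sketch (not deaviz code) enumerates the unordered pairs for the three variables used below; whether plot_io_scatter() orders its panels the same way is an assumption.

```r
vars <- c("industrial_labour_force", "gross_industrial_output", "retail_sales")

# Each column of the result is one unordered pair, i.e. one scatterplot.
pairs <- combn(vars, 2)
ncol(pairs)                          # choose(3, 2) panels
apply(pairs, 2, paste, collapse = " vs ")
```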
plot_io_scatter(d, vars = c("industrial_labour_force", "gross_industrial_output", "retail_sales"), color = "vrs")
You can also plot each variable against the DMUs’ efficiency scores:
plot_io_scatter(d, vars = c("industrial_labour_force", "gross_industrial_output", "retail_sales"), efficiency = "vrs", color = "vrs")
The only frontier visualization available is
plot_io_costa_frontier(), which collapses all inputs and
outputs onto a single aggregated frontier (Bana e Costa et al.,
2016).
The package offers two ways to project the multidimensional
input/output space onto a readable plane. First,
plot_io_pca_biplot() uses Principal Component Analysis
(PCA) to draw the input/output loading vectors. Second,
plot_io_mds() uses metric (ratio) multidimensional scaling
via the smacof majorization algorithm (de Leeuw &
Mair, 2009). The graphical application of these projections to DEA
follows Adler & Raveh (2008).
Let’s have a look at the PCA biplot. The arrows represent the dataset’s
inputs and outputs; each one points in the direction in which the
corresponding variable increases across the 2D plane.
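The loading vectors and DMU coordinates of a PCA biplot can be reproduced with base R's prcomp(). This sketch uses hypothetical data and assumes the variables are standardised first; whether plot_io_pca_biplot() scales the data the same way is not stated here.

```r
# Hypothetical data: six DMUs, two inputs (x1, x2), two outputs (y1, y2).
io <- data.frame(
  x1 = c(4, 7, 8, 4, 2, 10),
  x2 = c(3, 3, 1, 2, 4, 1),
  y1 = c(5, 9, 7, 4, 3, 12),
  y2 = c(2, 4, 3, 3, 1, 5),
  row.names = paste0("DMU", 1:6)
)

# Standardise so that units of measurement do not dominate the components.
pca <- prcomp(io, center = TRUE, scale. = TRUE)

# Each row of the rotation is a variable's arrow on the PC1/PC2 plane.
loadings <- pca$rotation[, 1:2]
# DMU positions (scores) on the same plane.
scores <- pca$x[, 1:2]
# Share of total variance the 2D plane retains.
summary(pca)$importance["Proportion of Variance", 1:2]
```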
MDS, in contrast, reduces the dimensionality of the dataset by placing the DMUs themselves on a 2D map, where units with similar input/output profiles lie close together.
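The idea behind an MDS map can be sketched in base R. Note the swap: cmdscale() performs classical (Torgerson) scaling, not the SMACOF ratio MDS that plot_io_mds() uses, so the coordinates will differ; the data are hypothetical.

```r
# Hypothetical data: six DMUs, two inputs and two outputs.
io <- cbind(
  x1 = c(4, 7, 8, 4, 2, 10),
  x2 = c(3, 3, 1, 2, 4, 1),
  y1 = c(5, 9, 7, 4, 3, 12),
  y2 = c(2, 4, 3, 3, 1, 5)
)
rownames(io) <- paste0("DMU", 1:6)

# Euclidean distances between standardised DMU profiles.
d_io <- dist(scale(io))

# Classical MDS into two dimensions: similar DMUs end up close together.
coords <- cmdscale(d_io, k = 2)

# How well the 2D map preserves the original distances (1 = perfectly).
cor(as.vector(d_io), as.vector(dist(coords)))
```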
What to do with overcrowded plots? One solution offered by the
deaviz package is to make the plot interactive so that you
can zoom in and hover over the visual marks to get information about
them.
For inefficient DMUs, DEA identifies the efficient peers they are
benchmarked against. plot_io_lambda_network() draws those
peer relationships weighted by the envelopment (\(\lambda\)) weights, laid out with Sammon
mapping (Sammon, 1969) as in Porembski et al. (2005). Meanwhile,
plot_io_peer_network() shows who is a peer to whom, with
edges directed from the inefficient units to their targets.
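Both networks are derived from the matrix of envelopment weights. A minimal base-R sketch, using a hypothetical \(\lambda\) matrix, turns it into a weighted edge list from each inefficient unit to its peers; it illustrates the idea and is not the deaviz implementation.

```r
# Hypothetical lambda matrix: rows are evaluated DMUs, columns are the
# DMUs they reference. Efficient DMUs (A, B) reference only themselves.
lambda <- matrix(
  c(1,   0,   0, 0,
    0,   1,   0, 0,
    0.6, 0.4, 0, 0,
    0,   0.9, 0, 0),
  nrow = 4, byrow = TRUE,
  dimnames = list(c("A", "B", "C", "D"), c("A", "B", "C", "D"))
)

# Keep positive weights as directed edges: evaluated DMU -> peer.
idx <- which(lambda > 0, arr.ind = TRUE)
edges <- data.frame(
  from   = rownames(lambda)[idx[, "row"]],
  to     = colnames(lambda)[idx[, "col"]],
  weight = lambda[idx]
)

# Drop self-references of efficient units.
edges <- edges[edges$from != edges$to, ]
edges[order(edges$from, edges$to), ]
```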
It is sometimes important to highlight and focus on a specific DMU
within a network. The deaviz package addresses this need
via the labels argument.
Cross-efficiency scores every DMU using every other DMU’s optimal
weights (Doyle & Green, 1994).
compute_cross_efficiency() builds the matrix, which
plot_cem_heatmap() displays.
plot_cem_unfolding() unfolds the same matrix into a map of
who rates whom favorably (Ashkiani & Mar-Molinero, 2017), and
plot_cem_weights_heatmap() shows the underlying weight
profiles.
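Each cell of the cross-efficiency matrix is \(E_{dk} = (u_d^\top y_k)/(v_d^\top x_k)\), the score of DMU \(k\) under rater \(d\)'s optimal weights. The sketch below uses a hypothetical three-DMU dataset whose CCR multiplier weights were worked out by hand (A's and B's optima are not unique, and one choice is shown); it illustrates the Doyle and Green definition, not compute_cross_efficiency() itself.

```r
# Hypothetical data: 3 DMUs, 2 inputs, 1 output.
X <- matrix(c(2, 3,
              4, 1,
              3, 3), nrow = 3, byrow = TRUE,
            dimnames = list(c("A", "B", "C"), c("x1", "x2")))
Y <- matrix(1, nrow = 3, ncol = 1, dimnames = list(c("A", "B", "C"), "y"))

# Optimal CCR multiplier weights, one row per rating DMU (hand-derived).
V <- matrix(c(0.2, 0.2,
              0.1, 0.6,
              1/6, 1/6), nrow = 3, byrow = TRUE, dimnames = dimnames(X))
U <- matrix(c(1, 1, 5/6), ncol = 1, dimnames = dimnames(Y))

# cem[d, k] = (u_d . y_k) / (v_d . x_k): DMU k scored with d's weights.
cem <- (U %*% t(Y)) / (V %*% t(X))
round(cem, 3)

# The diagonal holds self-efficiency; column means summarise peer appraisal.
round(colMeans(cem), 3)
```

C is self-rated at 5/6 and its peer-appraised mean drops to about 0.714 because B's weights rate it harshly, which is the kind of disagreement a heatmap makes visible.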
What if you want to highlight a specific DMU? Just as before, you can
use the labels argument.
plot_io_radar() and plot_io_parcoo() show
each DMU’s full input/output profile as a radar polygon or a
parallel-coordinates line.
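Radar and parallel-coordinates views only compare well when variables share a common scale. A common choice is min-max rescaling to [0, 1]; the sketch below assumes that approach (deaviz's own scaling is not stated here) and uses hypothetical profiles.

```r
# Hypothetical input/output profiles for three DMUs.
profiles <- data.frame(
  labour = c(120, 80, 200),
  funds  = c(30, 45, 25),
  output = c(900, 600, 1500),
  row.names = c("A", "B", "C")
)

# Min-max rescale a variable to [0, 1]; a constant column maps to 0.
rescale01 <- function(v) {
  r <- max(v) - min(v)
  if (r == 0) return(rep(0, length(v)))
  (v - min(v)) / r
}

# Assigning into scaled[] keeps the data frame's row names.
scaled <- profiles
scaled[] <- lapply(profiles, rescale01)
round(scaled, 3)
```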
Focusing on a single DMU is powerful enough to warrant its own section.
When you pass a single DMU name to the
labels argument, deaviz puts it center stage:
the target DMU is ringed and highlighted with a label, while all other
units fade into the background. In network plots, the focus restricts
the view to the chosen DMU’s immediate sub-network; in panel biplots, it
isolates that specific DMU’s trajectory.
The amount of fade is tunable through the fade argument:
TRUE (default) uses a sensible level, FALSE
turns it off, and a number sets the alpha of the faded marks directly (a
larger number keeps them more visible).
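The three fade modes map onto a per-mark transparency. The helper below, fade_alpha(), is hypothetical and only mirrors the behaviour described above; in particular, the default of 0.2 is an assumption, not deaviz's actual setting.

```r
# Hypothetical helper: alpha for each mark given the focused DMU.
# TRUE -> assumed default alpha, FALSE -> no fading, number -> that alpha.
fade_alpha <- function(ids, focus, fade = TRUE, default = 0.2) {
  background <- if (isTRUE(fade)) {
    default
  } else if (isFALSE(fade)) {
    1
  } else {
    as.numeric(fade)
  }
  # The focused DMU stays fully opaque; everyone else gets the background alpha.
  ifelse(ids == focus, 1, background)
}

ids <- c("Cathay", "BankA", "BankB")
fade_alpha(ids, "Cathay", fade = 0.25)
```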
Other labels modes are "all" (label
everyone), "id" (number each marker), and
"max.overlaps" (label as many as fit without
collision).
compute_som() trains a self-organising map (Kohonen,
2001) on the input/output profiles, via the kohonen
package (Wehrens & Kruisselbrink, 2018); plot_io_som()
colours the map by mean efficiency per node.
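Colouring by mean efficiency per node is a grouped average over each DMU's winning node. With kohonen, the winning node of each observation is stored in the unit.classif component of a fitted som object. The sketch below uses hypothetical node assignments and scores, and includes an empty node to show that it yields NA.

```r
# Hypothetical winning node for each DMU and its efficiency score.
node <- c(A = 1, B = 1, C = 2, D = 3, E = 3, F = 3)
eff  <- c(A = 1, B = 0.8, C = 0.6, D = 1, E = 0.9, F = 0.5)

# Use every node of a 4-node map as a level so empty nodes are kept (NA).
n_nodes <- 4
mean_eff <- tapply(eff, factor(node, levels = seq_len(n_nodes)), mean)
mean_eff
```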
For panel data, plot_panel_io_biplot() projects every
DMU-period combination onto a shared PCA biplot and connects each DMU’s
data points to form a trajectory over time. The bundled
taiwanese_banks dataset provides a balanced panel of 22
commercial banks from 2009 to 2011 (Kao & Liu, 2014). Here it serves
only as a reproducible example of the package’s panel functionality,
independent of any particular input/output specification. The
following call draws the trajectories and numbers every DMU marker
(labels = "id"):
plot_panel_io_biplot(
taiwanese_banks, id = "DMU", period = "Year",
inputs = 3:5, outputs = 6:8, labels = "id"
)
Each trajectory, the path a DMU traces through the input/output space, is drawn as a segmented line connecting its positions in successive periods.
You might want to draw attention to one specific DMU. For instance, here the focus view keeps only Cathay’s three-year path lit while the other banks recede, and the loading vectors are spread apart so their labels stay legible.
plot_panel_io_biplot(
taiwanese_banks, id = "DMU", period = "Year",
inputs = 3:5, outputs = 6:8, labels = "Cathay", fade = 0.25
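)
Note that the PCA is computed on the pooled data (all DMU-period rows together), so every period is projected onto the same axes and positions are comparable across years.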
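Pooling can be sketched in base R: run one PCA on all DMU-period rows, then order each DMU's scores by period to obtain its trajectory. The balanced panel below is hypothetical, and standardising the variables is an assumption about how plot_panel_io_biplot() prepares the data.

```r
# Hypothetical balanced panel: 3 DMUs over 3 years, 2 inputs, 1 output.
panel <- data.frame(
  DMU  = rep(c("A", "B", "C"), times = 3),
  Year = rep(2009:2011, each = 3),
  x1 = c(5, 8, 6, 5.5, 7.5, 6.5, 6, 7, 7),
  x2 = c(3, 2, 4, 3.2, 2.1, 3.8, 3.5, 2.3, 3.5),
  y1 = c(10, 12, 9, 11, 12.5, 9.5, 12, 13, 10)
)

# One PCA on the pooled rows, so all years share the same axes.
pca <- prcomp(panel[, c("x1", "x2", "y1")], center = TRUE, scale. = TRUE)
panel$PC1 <- pca$x[, 1]
panel$PC2 <- pca$x[, 2]

# A trajectory is a DMU's points ordered by period.
ordered <- panel[order(panel$DMU, panel$Year), ]
trajectories <- split(ordered[, c("Year", "PC1", "PC2")], ordered$DMU)
trajectories$A
```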
Many plots accept the interactive = TRUE argument, which
returns a plotly widget with hover tooltips instead of a
static ggplot object. This feature requires the
plotly package and is best viewed in HTML output.
Adler, N., & Raveh, A. (2008). Presenting DEA graphically. Omega, 36(5), 715–729.
Ashkiani, S. (2019). Four Essays on Data Visualization and Anomaly Detection of Data Envelopment Analysis Problems (PhD thesis). Universitat Autònoma de Barcelona. https://ddd.uab.cat/record/240333
Ashkiani, S., & Mar-Molinero, C. (2017). Visualization of cross-efficiency matrices using multidimensional unfolding. In Recent Applications of Data Envelopment Analysis.
Bana e Costa, C. A., Soares de Mello, J. C. C. B., & Angulo Meza, L. (2016). A new approach to the bi-dimensional representation of the DEA efficient frontier with multiple inputs and outputs. European Journal of Operational Research, 255(1), 175–186. https://doi.org/10.1016/j.ejor.2016.05.012
Charnes, A., Cooper, W. W., & Rhodes, E. (1978). Measuring the efficiency of decision making units. European Journal of Operational Research, 2(6), 429–444. https://doi.org/10.1016/0377-2217(78)90138-8
de Leeuw, J., & Mair, P. (2009). Multidimensional scaling using majorization: SMACOF in R. Journal of Statistical Software, 31(3), 1–30. https://doi.org/10.18637/jss.v031.i03
Doyle, J., & Green, R. (1994). Efficiency and cross-efficiency in DEA: Derivations, meanings and uses. Journal of the Operational Research Society, 45(5), 567–578. https://doi.org/10.1057/jors.1994.84
Kao, C., & Liu, S.-T. (2014). Multi-period efficiency measurement in data envelopment analysis: The case of Taiwanese commercial banks. Omega, 47, 90–98. https://doi.org/10.1016/j.omega.2013.09.001
Kohonen, T. (2001). Self-Organizing Maps (3rd ed.). Springer.
Porembski, M., Breitenstein, K., & Alpar, P. (2005). Visualizing efficiency and reference relations in data envelopment analysis with an application to the branches of a German bank. Journal of Productivity Analysis, 23(2), 203–221. https://doi.org/10.1007/s11123-005-1328-5
Sammon, J. W. (1969). A nonlinear mapping for data structure analysis. IEEE Transactions on Computers, C-18(5), 401–409. https://doi.org/10.1109/T-C.1969.222678
Sueyoshi, T. (1992). Measuring the industrial performance of Chinese cities by data envelopment analysis. Socio-Economic Planning Sciences, 26(2), 75–88. https://doi.org/10.1016/0038-0121(92)90015-W
Wehrens, R., & Kruisselbrink, J. (2018). Flexible self-organizing maps in kohonen 3.0. Journal of Statistical Software, 87(7), 1–18. https://doi.org/10.18637/jss.v087.i07
Every function has its own help page with a full argument list and
examples (e.g. ?plot_panel_io_biplot). To cite the package,
see citation("deaviz").