The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
Download a copy of the vignette to follow along here: alluvial_plots.Rmd
Alluvial plots can be generated to visualize how changing the number of clusters influences the distribution of observations according to one (or a few) features of interest.
First, some data setup just as was done in the previous vignettes.
library(metasnf)
# Generate data_list
my_dl <- data_list(
list(
data = expression_df,
name = "genes_1_and_2_exp",
domain = "gene_expression",
type = "continuous"
),
list(
data = methylation_df,
name = "genes_1_and_2_meth",
domain = "gene_methylation",
type = "continuous"
),
list(
data = gender_df,
name = "gender",
domain = "demographics",
type = "categorical"
),
list(
data = diagnosis_df,
name = "diagnosis",
domain = "clinical",
type = "categorical"
),
uid = "patient_id"
)
set.seed(42)
my_sc <- snf_config(
my_dl,
n_solutions = 1,
max_k = 40
)
## ℹ No distance functions specified. Using defaults.
## ℹ No clustering functions specified. Using defaults.
sol_df <- batch_snf(
dl = my_dl,
sc = my_sc,
return_sim_mats = TRUE
)
sim_mats <- sim_mats_list(sol_df)
similarity_matrix <- sim_mats[[1]]
Next, assemble a list clustering algorithm functions that cover the
range of the number of clusters you’d like to visualize. The example
below uses spectral_two
to spectral_six
, which
are spectral clustering functions covering 2 clusters to 6 clusters
respectively.
# Spectral clustering functions ranging from 2 to 6 clusters
cluster_sequence <- list(
spectral_two,
spectral_three,
spectral_four
)
Then, we can either generate an alluvial plot covering our similarity
matrix over these clustering algorithms for data in a
data_list
:
alluvial_cluster_plot(
cluster_sequence = cluster_sequence,
similarity_matrix = similarity_matrix,
dl = my_dl,
key_outcome = "gender", # the name of the feature of interest
key_label = "Gender", # how the feature of interest should be displayed
extra_outcomes = "diagnosis", # more features to plot but not colour by
title = "Gender Across Cluster Counts"
)
Or in an external data frame:
extra_data <- dplyr::inner_join(
gender_df,
diagnosis_df,
by = "patient_id"
) |>
dplyr::mutate(uid = paste0("uid_", patient_id))
head(extra_data)
## patient_id gender diagnosis uid
## 1 660 female definite asthma uid_660
## 2 420 female possible asthma uid_420
## 3 252 female definite asthma uid_252
## 4 173 female no asthma uid_173
## 5 327 female definite asthma uid_327
## 6 245 female definite asthma uid_245
alluvial_cluster_plot(
cluster_sequence = cluster_sequence,
similarity_matrix = similarity_matrix,
data = extra_data,
key_outcome = "gender",
key_label = "Gender",
extra_outcomes = "diagnosis",
title = "Gender Across Cluster Counts"
)
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.