The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

Sequence Plots: heatmap, index, and distribution

library(Nestimate)

The dataset

trajectories ships with Nestimate: 138 student sequences × 15 weeks, three states (Active, Average, Disengaged) plus NA for missed weeks.

data(trajectories)
dim(trajectories)
#> [1] 138  15
head(trajectories[, 1:8])
#>      1         2            3            4            5            6           
#> [1,] "Active"  "Disengaged" "Disengaged" "Disengaged" "Active"     "Active"    
#> [2,] "Average" "Average"    "Average"    "Average"    "Average"    "Average"   
#> [3,] "Average" "Active"     "Active"     "Active"     "Active"     "Active"    
#> [4,] "Active"  "Active"     "Active"     "Active"     "Active"     "Average"   
#> [5,] "Active"  "Active"     "Active"     "Average"    "Active"     "Average"   
#> [6,] "Average" "Average"    "Disengaged" "Average"    "Disengaged" "Disengaged"
#>      7         8        
#> [1,] "Active"  "Active" 
#> [2,] "Average" "Average"
#> [3,] "Active"  "Active" 
#> [4,] "Average" "Active" 
#> [5,] "Active"  "Active" 
#> [6,] "Average" "Average"
sort(unique(as.vector(trajectories)), na.last = NA)
#> [1] "Active"     "Average"    "Disengaged"

sequence_plot() is the single entry point for three views of this data:

type What it shows Uses dendrogram? Facets?
"heatmap" (default) dense carpet, rows sorted by a distance/dendrogram yes no
"index" carpet without dendrogram, row-gap optional no yes
"distribution" stacked area / bar of state proportions over time no yes

Defaults: legend = "right", frame = FALSE.

1. type = "heatmap" — clustered carpet

1.1 Default — LCS distance, ward.D2 dendrogram

sequence_plot(trajectories)

1.2 Switch the sort strategy

sequence_plot(trajectories, sort = "frequency",
              main = "sort = 'frequency'")

sequence_plot(trajectories, sort = "hamming",
              main = "sort = 'hamming'")

sequence_plot(trajectories, sort = "start",
              main = "sort = 'start' (no dendrogram)")

Available sorts: lcs (default), frequency, start, end, plus any build_clusters() distance — hamming, osa, lv, dl, qgram, cosine, jaccard, jw.

1.3 Cluster separators with k

Cut the dendrogram into k groups and overlay thin horizontal lines at the cluster boundaries in the ordered rows. Tune with k_color and k_line_width.

sequence_plot(trajectories, k = 3,
              main = "k = 3 — white separators")

sequence_plot(trajectories, k = 5,
              k_color = "black", k_line_width = 1.2,
              main = "k = 5 — thin black")

1.4 Legend position, custom palette, title

sequence_plot(trajectories,
              legend = "bottom",
              legend_title = "Engagement",
              state_colors = c("#2a9d8f", "#e9c46a", "#e76f51"),
              main = "Custom palette + bottom legend")

1.5 Cell borders + tick thinning

sequence_plot(trajectories,
              cell_border = "grey60", tick = 3,
              main = "Cell grid + every-3rd tick")

1.6 frame = TRUE brings back the outer box

sequence_plot(trajectories, frame = TRUE,
              main = "frame = TRUE")

2. type = "index" — gap-ready carpet with facets

No dendrogram. Rows are sorted within each panel by sort. Supports group (vector or auto from a net_clustering) plus ncol / nrow facet grids.

2.1 Single panel

sequence_plot(trajectories, type = "index",
              main = "index — single panel")

2.2 Visible row gaps

sequence_plot(trajectories, type = "index", row_gap = 0.25,
              main = "index with row_gap = 0.25")

2.3 Faceted by net_clustering (auto 2×2 for k = 3)

cl <- build_clusters(as.data.frame(trajectories), k = 3L,
                   dissimilarity = "hamming", method = "ward.D2")
sequence_plot(cl, type = "index",
              main = "index faceted by build_clusters(k = 3)")

2.4 Force a 1×3 row

sequence_plot(cl, type = "index", ncol = 3, nrow = 1,
              main = "index — ncol = 3, nrow = 1")

3. type = "distribution" — state proportions over time

Stacked area or bar chart of state frequencies per time column.

3.1 Default stacked area

sequence_plot(trajectories, type = "distribution",
              main = "distribution — stacked area")

3.2 Stacked bars, count scale

sequence_plot(trajectories, type = "distribution",
              geom = "bar", scale = "count",
              main = "distribution — bars, count scale")

3.3 NA band on/off

sequence_plot(trajectories, type = "distribution", na = TRUE,
              main = "na = TRUE")

sequence_plot(trajectories, type = "distribution", na = FALSE,
              main = "na = FALSE")

3.4 Faceted by cluster

sequence_plot(cl, type = "distribution",
              main = "distribution by cluster (k = 3)")

Cheat sheet

# Always explore first with the default:
sequence_plot(trajectories)

# Zoom in on cluster structure:
sequence_plot(trajectories, k = 3)
sequence_plot(trajectories, sort = "hamming", k = 4)

# Compare cluster compositions:
cl <- build_clusters(as.data.frame(trajectories), k = 3,
                   dissimilarity = "hamming", method = "ward.D2")
sequence_plot(cl, type = "index")
sequence_plot(cl, type = "distribution")

# Polish for a paper:
sequence_plot(trajectories, k = 3,
              state_colors = c("#2a9d8f", "#e9c46a", "#e76f51"),
              legend_title = "Engagement",
              legend = "bottom",
              cell_border = "grey70",
              main = "Student engagement trajectories")

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.