The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.
smallsets
website: lydialucchesi.github.io/smallsets/
Do you use R or Python to preprocess datasets for analyses?
smallsets
is an R package (https://CRAN.R-project.org/package=smallsets) that
transforms the preprocessing code in your R, R Markdown, Python, or
Jupyter Notebook file into a Smallset Timeline. A Smallset Timeline is a
static, compact visualisation composed of small data snapshots of
different preprocessing steps. A full description of the Smallset
Timeline can be found in the paper Smallset
Timelines: A Visual Representation of Data Preprocessing
Decisions in the proceedings of ACM FAccT β22. A short (3 min) and long
(15 min)
YouTube video provide an introduction to the project.
The smallsets
user guide is available here
and in the package in vignette("smallsets")
. If you have
questions or would like help building a Smallset Timeline, please
email Lydia.
Download the smallsets cheatsheet (1-page PDF)
install.packages("smallsets")
Run this snippet of code to build your first Smallset Timeline! Itβs based on the synthetic dataset s_data, with 100 observations and eight variables (C1-C8), and the preprocessing script s_data_preprocess.R, discussed in the following section.
library(smallsets)
set.seed(145)
Smallset_Timeline(data = s_data,
code = system.file("s_data_preprocess.R", package = "smallsets"))
The Smallset Timeline above is based on the R preprocessing script
below, s_data_preprocess.R. Structured comments were added to it,
informing smallsets
what to do.
# smallsets snap s_data caption[Remove rows where C2 is FALSE.]caption
<- s_data[s_data$C2 == TRUE, ]
s_data
# smallsets snap +2 s_data caption[Replace missing values in C6 and C8 with
# column means. Drop C7 because there are too many missing values.]caption
$C6[is.na(s_data$C6)] <- mean(s_data$C6, na.rm = TRUE)
s_data$C8[is.na(s_data$C8)] <- mean(s_data$C8, na.rm = TRUE)
s_data$C7 <- NULL
s_data
# smallsets snap +1 s_data caption[Create a new column, C9, by summing C3 and
# C4.]caption
$C9 <- s_data$C3 + s_data$C4 s_data
smallsets
If you use the smallsets
software, please cite the
Smallset Timeline paper.
Lydia R. Lucchesi, Petra M. Kuhnert, Jenny L. Davis, and Lexing Xie. 2022. Smallset Timelines: A Visual Representation of Data Preprocessing Decisions. In 2022 ACM Conference on Fairness, Accountability, and Transparency (FAccT β22). Association for Computing Machinery, New York, NY, USA, 1136β1153. https://doi.org/10.1145/3531146.3533175
@inproceedings{SmallsetTimelines,
author = {Lucchesi, Lydia R. and Kuhnert, Petra M. and Davis, Jenny L. and Xie, Lexing},
title = {Smallset Timelines: A Visual Representation of Data Preprocessing Decisions},
year = {2022},
isbn = {9781450393522},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
url = {https://doi.org/10.1145/3531146.3533175},
doi = {10.1145/3531146.3533175},
location = {Seoul, Republic of Korea},
series = {FAccT '22}
}
These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.