Datasets used in package examples are an important part of making a package understandable and usable, but they are often overlooked.
In developing the heplots package, I collected a large number of datasets illustrating a variety of multivariate linear models, with some analyses and graphical displays. Each of these has much more than the usual stub examples, which often look like:
```r
data(dataset)
# str(dataset); plot(dataset)
```
But `.Rd` files, and now roxygen, don't make it easy to work with numerous datasets in a package or, more importantly, to document what they illustrate. I'm showing the work behind this vignette in case these ideas are useful to others.
In this release, I started with a file generated by:
```r
vcdExtra::datasets("heplots") |> head(4)
#>        Item      class    dim                          Title
#> 1 AddHealth data.frame 4344x3 Adolescent Mental Health Data
#> 2   Adopted data.frame   62x6               Adopted Children
#> 3      Bees data.frame  246x6    Captive and maltreated bees
#> 4  Diabetes data.frame  145x6               Diabetes Dataset
```
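If vcdExtra is not at hand, base R can produce a similar, thinner listing. This is a sketch and not what was used here. `list_datasets()` is a hypothetical helper built on `utils::data(package = )`, whose `results` matrix has `Item` and `Title` columns. Unlike `vcdExtra::datasets()`, it does not report each dataset's class or dimensions. It is run on the always-installed datasets package so it works anywhere; substitute `"heplots"` if that is installed.

```r
# Hypothetical helper: list the datasets in an installed package, with titles,
# using only base R. data(package = ...) returns a "packageIQR" object whose
# $results matrix includes the columns "Item" (dataset name) and "Title".
list_datasets <- function(pkg) {
  res <- utils::data(package = pkg)$results
  as.data.frame(res[, c("Item", "Title"), drop = FALSE], stringsAsFactors = FALSE)
}

ds <- list_datasets("datasets")
head(ds, 4)
```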
Then, in the roxygen documentation, I added `@concept` tags to classify these datasets according to the methods used. (`@concept` entries are indexed with the package, so they can be found via `help.search()`.)
For example, the documentation for the `AddHealth` data contains these lines:
```r
#' @name AddHealth
#' @docType data
...
#' @keywords datasets
#' @concept MANOVA
#' @concept ordered
```
With standard processing, these concepts, along with the keywords, appear in the Index section of the manual built by `devtools::build_manual()`. On the pkgdown site for this package, they are also searchable in the search box.
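Because the `@concept` tags end up as `\concept` entries in the installed help, a search can be restricted to them. Here is a minimal sketch, assuming heplots is installed. `find_concept()` is a hypothetical helper. It calls `utils::help.search()` with `fields = "concept"` and `types = "help"` and returns the names of the matching help topics.

```r
# Hypothetical helper: help topics in an installed package whose \concept
# entries (written from roxygen @concept tags) match a pattern.
find_concept <- function(concept, pkg) {
  hits <- utils::help.search(concept,
                             fields  = "concept",  # search only \concept entries
                             package = pkg,
                             types   = "help",     # skip vignettes and demos
                             agrep   = FALSE)      # regular, not fuzzy, matching
  unique(as.character(hits$matches[, "Topic"]))
}

# Example, run only if heplots is installed
if (requireNamespace("heplots", quietly = TRUE)) {
  find_concept("candisc", "heplots")
}
```

From the console, `??` offers a quicker but less targeted search across all installed packages.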
With a bit of extra processing, I created a file, `datasets.csv`, that is used below.
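The extra processing is not shown here. One way it might be done, assuming the package is installed, is to read the `\concept` entries from the installed Rd database with `tools::Rd_db()`. The result could then be joined to the dataset listing. `concept_tags()` is a hypothetical helper, not part of heplots, and multi-word concepts would be run together in its space-separated output.

```r
# Hypothetical helper: for each help topic in an installed package, collect its
# \concept entries into one space-separated 'tags' string.
concept_tags <- function(pkg) {
  db <- tools::Rd_db(pkg)  # list of parsed Rd objects, one per help file
  tags <- vapply(db, function(rd) {
    # keep only the top-level \concept sections of this Rd file
    concepts <- Filter(function(x) identical(attr(x, "Rd_tag"), "\\concept"), rd)
    vals <- vapply(concepts,
                   function(x) trimws(paste(unlist(x), collapse = "")),
                   character(1))
    paste(vals, collapse = " ")
  }, character(1))
  data.frame(dataset = sub("\\.Rd$", "", basename(names(db))),
             tags = unname(tags), stringsAsFactors = FALSE)
}
```

With heplots installed, the result might be combined with the earlier listing, for example `merge(vcdExtra::datasets("heplots"), concept_tags("heplots"), by.x = "Item", by.y = "dataset")`.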
The main methods used in the example datasets are shown in the table below:
In addition, a few examples illustrate special handling for linear hypotheses concerning factors:
The dataset names are linked to their documentation, with graphical output, on the pkgdown website, <http://friendly.github.io/heplots/>.
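Each name becomes a Markdown link of the form `[name](url)` pointing to the dataset's reference page. The code below does this with `glue::glue()`. For comparison, here is a base-R sketch using `sprintf()`; the two dataset names are just examples taken from the table.

```r
# Build Markdown links to pkgdown reference pages with base R's sprintf().
refurl <- "http://friendly.github.io/heplots/reference/"
ds_names <- c("AddHealth", "Adopted")
links <- sprintf("[%s](%s%s.html)", ds_names, refurl, ds_names)
links
```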
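```r
library(here)
library(dplyr)
library(tinytable)

# Read the table of datasets and their concept tags.
# A local path via here() doesn't work in a vignette, so read it from GitHub:
# dsets <- read.csv(here::here("extra", "datasets.csv"))
dsets <- read.csv("https://raw.githubusercontent.com/friendly/heplots/master/extra/datasets.csv")

# Drop the row-index column X (read.csv() names an unnamed first column "X"),
# then sort by dataset name, ignoring case
dsets <- dsets |>
  dplyr::select(-X) |>
  arrange(tolower(dataset))

# Link each dataset name to its pkgdown reference page
refurl <- "http://friendly.github.io/heplots/reference/"
dsets <- dsets |>
  mutate(dataset = glue::glue("[{dataset}]({refurl}{dataset}.html)"))

# Render the table; markdown = TRUE turns the links into hyperlinks
# knitr::kable(dsets)
tinytable::tt(dsets) |> format_tt(markdown = TRUE)
```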
dataset | rows | cols | title | tags |
---|---|---|---|---|
AddHealth | 4344 | 3 | Adolescent Health Data | MANOVA ordered |
Adopted | 62 | 6 | Adopted Children | MMRA repeated |
Bees | 246 | 6 | Captive and maltreated bees | MANOVA |
Diabetes | 145 | 6 | Diabetes Dataset | MANOVA |
dogfood | 16 | 3 | Dogfood Preferences | MANOVA contrasts candisc |
FootHead | 90 | 7 | Head measurements of football players | MANOVA contrasts |
Headache | 98 | 6 | Treatment of Headache Sufferers for Sensitivity to Noise | MANOVA repeated |
Hernior | 32 | 9 | Recovery from Elective Herniorrhaphy | MMRA candisc |
Iwasaki_Big_Five | 203 | 7 | Personality Traits of Cultural Groups | MANOVA |
mathscore | 12 | 3 | Math scores for basic math and word problems | MANOVA |
MockJury | 114 | 17 | Effects Of Physical Attractiveness Upon Mock Jury Decisions | MANOVA candisc |
NeuroCog | 242 | 10 | Neurocognitive Measures in Psychiatric Groups | MANOVA candisc |
NLSY | 243 | 6 | National Longitudinal Survey of Youth Data | MMRA |
oral | 56 | 5 | Effect of Delay in Oral Practice in Second Language Learning | MANOVA |
Oslo | 332 | 14 | Oslo Transect Subset Data | MANOVA candisc |
Overdose | 17 | 7 | Overdose of Amitriptyline | MMRA cancor |
Parenting | 60 | 4 | Father Parenting Competence | MANOVA contrasts |
peng | 333 | 8 | Size measurements for adult foraging penguins near Palmer Station | MANOVA |
Plastic | 20 | 5 | Plastic Film Data | MANOVA |
Pottery2 | 48 | 12 | Chemical Analysis of Romano-British Pottery | MANOVA candisc |
Probe | 11 | 5 | Response Speed in a Probe Experiment | MANOVA repeated |
RatWeight | 27 | 6 | Weight Gain in Rats Exposed to Thiouracil and Thyroxin | MANOVA repeated |
ReactTime | 10 | 6 | Reaction Time Data | repeated |
Rohwer | 69 | 10 | Rohwer Data Set | MMRA MANCOVA |
RootStock | 48 | 5 | Growth of Apple Trees from Different Root Stocks | MANOVA contrasts |
Sake | 30 | 10 | Taste Ratings of Japanese Rice Wine (Sake) | MMRA |
schooldata | 70 | 8 | School Data | MMRA robust |
Skulls | 150 | 5 | Egyptian Skulls | MANOVA contrasts |
SocGrades | 40 | 10 | Grades in a Sociology Course | MANOVA candisc |
SocialCog | 139 | 5 | Social Cognitive Measures in Psychiatric Groups | MANOVA candisc |
TIPI | 1799 | 16 | Data on the Ten Item Personality Inventory | MANOVA candisc |
VocabGrowth | 64 | 4 | Vocabulary growth data | repeated |
WeightLoss | 34 | 7 | Weight Loss Data | repeated |
This table can be inverted to list the datasets that illustrate each concept:
```r
# One row per (dataset, tag) pair, then collapse the datasets for each tag
concepts <- dsets |>
  select(dataset, tags) |>
  tidyr::separate_longer_delim(tags, delim = " ") |>
  arrange(tags, dataset) |>
  summarize(datasets = toString(dataset), .by = tags) |>
  rename(concept = tags)
# knitr::kable(concepts)
tinytable::tt(concepts) |> format_tt(markdown = TRUE)
```
concept | datasets |
---|---|
MANCOVA | Rohwer |
MANOVA | AddHealth, Bees, Diabetes, FootHead, Headache, Iwasaki_Big_Five, MockJury, NeuroCog, Oslo, Parenting, Plastic, Pottery2, Probe, RatWeight, RootStock, Skulls, SocGrades, SocialCog, TIPI, dogfood, mathscore, oral, peng |
MMRA | Adopted, Hernior, NLSY, Overdose, Rohwer, Sake, schooldata |
cancor | Overdose |
candisc | Hernior, MockJury, NeuroCog, Oslo, Pottery2, SocGrades, SocialCog, TIPI, dogfood |
contrasts | FootHead, Parenting, RootStock, Skulls, dogfood |
ordered | AddHealth |
repeated | Adopted, Headache, Probe, RatWeight, ReactTime, VocabGrowth, WeightLoss |
robust | schooldata |
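The same inversion can be sketched in base R, without tidyr or dplyr. This is an alternative, not the code used for the table above. The four input rows are copied from the dataset table, and `dsets_small` and `concepts_small` are illustrative names. Base R's `order()` sorts strings in the current locale, so the row order of the result may differ from the table above.

```r
# Base-R sketch: invert dataset -> tags into concept -> datasets.
# Input rows are copied from the dataset table above.
dsets_small <- data.frame(
  dataset = c("Adopted", "Overdose", "Rohwer", "schooldata"),
  tags    = c("MMRA repeated", "MMRA cancor", "MMRA MANCOVA", "MMRA robust"),
  stringsAsFactors = FALSE
)

# Split each space-separated tag string: one row per (dataset, concept) pair
tag_list <- strsplit(dsets_small$tags, " ", fixed = TRUE)
long <- data.frame(
  dataset = rep(dsets_small$dataset, lengths(tag_list)),
  concept = unlist(tag_list),
  stringsAsFactors = FALSE
)

# Sort, then collapse the datasets for each concept into one comma-separated string
long <- long[order(long$concept, long$dataset), ]
concepts_small <- aggregate(dataset ~ concept, data = long, FUN = toString)
concepts_small
```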