CASIdata provides the datasets from Efron & Hastie (2016, ISBN: 9781108107952), Computer Age Statistical Inference: Algorithms, Evidence, and Data Science, in an accessible R format for teaching, study, or attempts to reproduce or extend analyses from the book. They were downloaded from Trevor Hastie’s website, https://hastie.su.domains/CASI_files/DATA/, but quite a few files were messy and required some processing to turn into R datasets.
Even so, some of the datasets may require data cleaning, renaming of variables, re-shaping or other tidying steps to be useful for analysis. But that’s part of learning.
This package is not yet on CRAN. You can install it from this GitHub repo or from R-universe:
remotes::install_github("friendly/CASIdata")
install.packages('CASIdata', repos = c('https://friendly.r-universe.dev'))
| Dataset | dim | Title |
|---|---|---|
| DTI | 15443x4 | DTI Brain Imaging Data |
| als | 1822x371 | ALS Data |
| baseball | 18x3 | Baseball Batting Averages |
| bivnorm | 40x2 | Bivariate Normal Data |
| butterfly | 24x2 | Butterfly Species Data |
| cellinfusion | 25x4 | Cell Infusion Data |
| cholesterol | 164x2 | Cholesterol Data |
| diabetes | 442x12 | Diabetes Data |
| doseresponse | 11x2 | Dose Response Data |
| galaxy | 270x3 | Galaxy Data |
| haplotype | 197x102 | Human Ancestry Haplotype Data |
| insurance | 60x3 | Insurance Life Table Data |
| leukemia_small | 3571x72 | Leukemia Gene Expression Data (Small) |
| ncog | 96x6 | NCOG Head and Neck Cancer Data |
| nodes | 844x2 | Lymph Nodes Cancer Data |
| pediatric | 1620x7 | Pediatric Cancer Survival Data |
| police | 2748x1 | Police Racial Bias Data |
| prostz | 6032x1 | Prostate Cancer Z-values |
| student_score | 22x5 | Student Score Data |
| supernova | 39x11 | Type Ia Supernova Data |
| vasoconstriction | 39x2 | Vasoconstriction Data |
The following dataset appears in data-raw/CASI-save.R but is not (yet) included in the package:
| Dataset | Reason |
|---|---|
| SPAM | Variable names need cleanup; requires mapping from UCI Spambase documentation |
See data-raw/missing-datasets.md for details on resolving this.
These large datasets are referenced in the book but not included in the package due to size constraints. They can be downloaded directly from the sources listed below.
protein_kernel <- matrix(scan("https://hastie.su.domains/CASI_files/DATA/protein_kernel.txt", what=0), 1708, 1708)
protein_label <- scan("https://hastie.su.domains/CASI_files/DATA/protein_label.txt", what=0)
prostmat <- read.csv("https://hastie.su.domains/CASI_files/DATA/prostmat.csv")
(See data-raw/missing-datasets.md for renaming code.)
leukemia_big <- read.csv("https://hastie.su.domains/CASI_files/DATA/leukemia_big.csv")
(Compare leukemia_small, which is included in the package.)

Some datasets had variables renamed for clarity:
| Dataset | Original | Renamed |
|---|---|---|
| butterfly | x, y | k, count |
| police | X2.411 | z |
| prostz | X1.47236666651029 | z |
| galaxy | | Reshaped from wide to long format with mag, red, freq |
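
The two kinds of change listed above can be illustrated with base R on toy data. This is a sketch under assumptions, not the package's actual processing code. The original names `X2.411` and `X1.47236666651029` look like what `read.csv()` produces when a file without a header row is read with the default `header = TRUE`, but that is a guess about how they arose. The toy values, the object names (`raw_text`, `bad`, `good`, `wide`, `long`) and the use of `as.data.frame(as.table(...))` for the reshape are all illustrative choices.

```r
# (1) Reading a headerless file: toy values, one number per line.
raw_text <- "2.411\n0.52\n-1.3\n"

# With the default header = TRUE, the first value becomes the column name
# (made syntactic by make.names()) and is lost as data.
bad <- read.csv(text = raw_text)
bad_name <- names(bad)
print(bad_name)
print(nrow(bad))

# header = FALSE keeps every value; col.names sets the name directly.
good <- read.csv(text = raw_text, header = FALSE, col.names = "z")
print(nrow(good))

# Renaming afterwards fixes the name but does not recover the lost value.
names(bad)[names(bad) == bad_name] <- "z"

# (2) Wide to long: a toy count table with named dimensions.
wide <- matrix(1:6, nrow = 2,
               dimnames = list(mag = c("m1", "m2"),
                               red = c("r1", "r2", "r3")))

# One row per (mag, red) cell, with the cell count in a column named freq.
long <- as.data.frame(as.table(wide), responseName = "freq")
print(long)
```

Whether the packaged `police` and `prostz` data retain or drop such a first value is not stated here, so it is worth checking against the source files before relying on exact row counts.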
No examples yet.
library(CASIdata)
## basic example code
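
Until the package ships its own examples, the following is a minimal sketch of listing and loading the datasets. It assumes the package is installed and that its datasets are reachable through `data()`. The `requireNamespace()` guard and the names `pkg_available`, `available` and `env` are illustrative, not part of the package.

```r
# Guard so the sketch does nothing when CASIdata is not installed.
pkg_available <- requireNamespace("CASIdata", quietly = TRUE)

if (pkg_available) {
  # List the datasets provided by the installed package.
  available <- data(package = "CASIdata")$results[, c("Item", "Title")]
  print(available)

  # Load one dataset into a separate environment and inspect its structure.
  env <- new.env()
  data("baseball", package = "CASIdata", envir = env)
  str(env$baseball)
}
```

Loading into a separate environment keeps the global workspace clean. A plain `data(baseball, package = "CASIdata")` loads into the workspace instead.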