RF100 Dataset Catalog

Overview

This catalog covers 34 object detection datasets from the Roboflow 100 (RF100) benchmark, organized into 6 collections. Use it to find the right dataset for your task.

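The RF100 datasets cover a wide range of domains, including biology, medical imaging, thermal and infrared imaging, infrastructure damage, underwater scenes, and document analysis.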

Example: Finding a Photovoltaic Dataset

One of the motivations for this catalog was answering questions like: “Is there a photovoltaic dataset in torchvision?”

library(torchvision)

# Search for solar/photovoltaic datasets
search_rf100("solar")
search_rf100("photovoltaic")

# Result shows:
# - solar_panel in infrared collection
# - solar_panel in damage collection
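
To make the idea of a keyword search concrete, here is a minimal sketch of the same kind of lookup done by hand on a data frame. It assumes nothing about how `search_rf100()` matches keywords internally. The `toy_catalog` rows, their descriptions, and the `search_catalog()` helper are all hypothetical stand-ins for illustration.

```r
# Hypothetical stand-in for the catalog; descriptions are illustrative only
toy_catalog <- data.frame(
  collection  = c("infrared", "damage", "biology"),
  dataset     = c("solar_panel", "solar_panel", "blood_cell"),
  description = c("Thermal images of solar panels",
                  "Defects on photovoltaic modules",
                  "Blood cell microscopy"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive match on dataset name or description
search_catalog <- function(catalog, keyword) {
  hit <- grepl(keyword, catalog$dataset, ignore.case = TRUE) |
    grepl(keyword, catalog$description, ignore.case = TRUE)
  catalog[hit, c("collection", "dataset")]
}

solar_hits <- search_catalog(toy_catalog, "solar")
pv_hits    <- search_catalog(toy_catalog, "photovoltaic")
```

Matching on both the name and the description is a design choice here: a dataset called `solar_panel` would otherwise be missed by a search for "photovoltaic".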

Complete Catalog

Here’s the complete catalog of all RF100 datasets:

library(torchvision)
library(knitr)

catalog <- get_rf100_catalog()

# Display key columns
kable(catalog[, c("collection", "dataset", "description", "total_size_mb", "estimated_images")])

Collections

Biology Collection (9 datasets)

Microscopy and biological imaging datasets for research and diagnostics:

search_rf100(collection = "biology")

Available datasets:

Medical Collection (8 datasets)

Medical imaging datasets for clinical and research applications:

search_rf100(collection = "medical")

Available datasets:

Infrared Collection (4 datasets)

Thermal and infrared imaging datasets:

search_rf100(collection = "infrared")

Available datasets:

Damage Collection (3 datasets)

Infrastructure damage and defect detection:

search_rf100(collection = "damage")

Available datasets:

Underwater Collection (4 datasets)

Marine and underwater imaging datasets:

search_rf100(collection = "underwater")

Available datasets:

Document Collection (6 datasets)

Document analysis and OCR datasets:

search_rf100(collection = "document")

Available datasets:

Usage Example

Once you’ve found a dataset, loading it is straightforward:

library(torchvision)

# Search for blood cell dataset
search_rf100("blood")

# Load the dataset
ds <- rf100_biology_collection(
  dataset = "blood_cell",
  split = "train",
  download = TRUE
)

# Inspect a sample
item <- ds[1]
print(item$y$labels)  # Object classes
print(item$y$boxes)   # Bounding boxes

# Visualize with bounding boxes
boxed <- draw_bounding_boxes(item)
tensor_image_browse(boxed)
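
Once you have the boxes, simple geometry such as box width, height, and area is often useful for sanity checks. The sketch below works on a plain R matrix with made-up coordinates and assumes boxes are stored one per row as (xmin, ymin, xmax, ymax) in pixels. That layout is an assumption, not something this vignette confirms, so check it against your data. If `item$y$boxes` is a torch tensor, you would likely need to convert it to a base R matrix first (for instance with `as.array()`).

```r
# Hypothetical boxes, one row per object, assumed layout: xmin, ymin, xmax, ymax
boxes <- matrix(
  c(10, 20,  50, 80,
     0,  0, 100, 40),
  ncol = 4, byrow = TRUE,
  dimnames = list(NULL, c("xmin", "ymin", "xmax", "ymax"))
)

# Width, height, and area of each box in pixels
widths  <- boxes[, "xmax"] - boxes[, "xmin"]
heights <- boxes[, "ymax"] - boxes[, "ymin"]
areas   <- widths * heights

# Flag degenerate boxes (zero or negative extent), a common annotation problem
degenerate <- widths <= 0 | heights <= 0
```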

Dataset Statistics

catalog <- get_rf100_catalog()

# Total size of all datasets
sum(catalog$total_size_mb) / 1024  # In GB

# Datasets by size
catalog[order(-catalog$total_size_mb), c("dataset", "collection", "total_size_mb")]

# Smallest and largest datasets
catalog[which.min(catalog$total_size_mb), ]
catalog[which.max(catalog$total_size_mb), ]

# Average size by collection
aggregate(total_size_mb ~ collection, data = catalog, FUN = mean)
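
The same base R tools also give per-collection counts and each collection's share of the total size. The sketch below uses a small hypothetical data frame (`toy_catalog`, with made-up dataset names and sizes) that only mimics the `collection` and `total_size_mb` columns, so it runs without downloading anything. Substitute the real `catalog` to get actual figures.

```r
# Hypothetical stand-in for the catalog; names and sizes are made up
toy_catalog <- data.frame(
  collection    = c("biology", "biology", "medical", "damage"),
  dataset       = c("a", "b", "c", "d"),
  total_size_mb = c(10, 30, 200, 60),
  stringsAsFactors = FALSE
)

# Number of datasets in each collection
n_per_collection <- table(toy_catalog$collection)

# Total size per collection and its share of the overall size, in percent
size_per_collection <- tapply(toy_catalog$total_size_mb,
                              toy_catalog$collection, sum)
share_pct <- round(100 * size_per_collection / sum(size_per_collection), 1)
```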

Filtering and Exploration

The catalog is a regular data frame, so you can use standard R operations:

# Find small datasets (< 20 MB total)
subset(catalog, total_size_mb < 20)

# Find large datasets (> 200 MB total)
subset(catalog, total_size_mb > 200)

# Find datasets with specific keywords
subset(catalog, grepl("tumor|cancer|disease", description, ignore.case = TRUE))

# Datasets with all three splits
subset(catalog, has_train & has_test & has_valid)
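
Filters like these combine naturally with sorting. As one possible use, the sketch below picks the smallest datasets that together fit within a download budget. The `toy_catalog` rows and the `budget_mb` value are hypothetical. Only the `total_size_mb` column name is taken from the catalog shown above.

```r
# Hypothetical stand-in for the catalog; names and sizes are made up
toy_catalog <- data.frame(
  dataset       = c("w", "x", "y", "z"),
  total_size_mb = c(150, 12, 45, 80),
  stringsAsFactors = FALSE
)

budget_mb <- 100  # hypothetical download budget in MB

# Sort from smallest to largest, then keep rows while the running total fits
by_size       <- toy_catalog[order(toy_catalog$total_size_mb), ]
within_budget <- by_size[cumsum(by_size$total_size_mb) <= budget_mb, ]
```

Taking the smallest datasets first maximizes the number of datasets that fit, which is handy for quick experiments. It does not maximize the amount of data downloaded.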

Additional Resources

Citation

If you use RF100 datasets in your research, please cite:

@article{roboflow100,
  title={Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark},
  author={Roboflow},
  journal={arXiv preprint},
  year={2022}
}
