RF100 Dataset Catalog


Overview

This vignette catalogs 34 diverse object detection datasets from the Roboflow 100 (RF100) benchmark, organized into 6 collections, to help you find the right dataset for your task.

The RF100 datasets cover a wide range of domains including:

Biology: Microscopy, cells, bacteria, parasites (9 datasets)
Medical: X-rays, MRI, pathology (8 datasets)
Infrared: Thermal imaging, FLIR cameras (4 datasets)
Damage: Defect detection, infrastructure inspection (3 datasets)
Underwater: Marine life, coral, infrastructure (4 datasets)
Document: OCR, document parsing, diagrams (6 datasets)
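
As a quick sanity check, the per-collection counts in the list above can be tallied in base R. This is a minimal sketch that only restates the numbers from that list; the variable names are illustrative and are not part of torchvision.

```r
# Dataset counts per collection, copied from the list above.
collection_counts <- c(
  biology = 9, medical = 8, infrared = 4,
  damage = 3, underwater = 4, document = 6
)

# Totals that should match the overview (34 datasets, 6 collections).
total_datasets <- sum(collection_counts)
n_collections <- length(collection_counts)

# Share of the catalog contributed by each collection, in percent.
share_pct <- round(100 * collection_counts / total_datasets, 1)

print(total_datasets)
print(n_collections)
print(share_pct)
```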

Quick Search

The easiest way to find a dataset is to use the search functions:

library(torchvision)

# Search for specific topics
search_rf100("cell")        # Find cell-related datasets
search_rf100("solar")       # Find solar panel datasets
search_rf100("x-ray")       # Find X-ray datasets

# List all datasets in a collection
search_rf100(collection = "biology")
search_rf100(collection = "medical")

# View complete catalog
catalog <- get_rf100_catalog()
View(catalog)

Example: Finding a Photovoltaic Dataset

One of the motivations for this catalog was answering questions like: “Is there a photovoltaic dataset in torchvision?”

# Search for solar/photovoltaic datasets
search_rf100("solar")
search_rf100("photovoltaic")

# Result shows:
# - solar_panel in infrared collection
# - solar_panel in damage collection

Complete Catalog

The following code displays the key columns for every dataset in the catalog:

library(torchvision)
library(knitr)

catalog <- get_rf100_catalog()

# Display key columns
kable(catalog[, c("collection", "dataset", "description", "total_size_mb", "estimated_images")])

Collections

Biology Collection (9 datasets)

Microscopy and biological imaging datasets for research and diagnostics:

search_rf100(collection = "biology")

Available datasets:

stomata_cell: Plant stomata cells for biology research
blood_cell: Blood cell detection (RBC, WBC, platelets)
parasite: Parasite detection in microscopy images
cell: General cell detection in microscopy
bacteria: Bacteria detection in microscopy images
cotton_disease: Cotton plant disease detection
mitosis: Mitosis phase detection in cell images
phage: Bacteriophage detection in microscopy
liver_disease: Liver disease pathology detection

Medical Collection (8 datasets)

Medical imaging datasets for clinical and research applications:

search_rf100(collection = "medical")

Available datasets:

radio_signal: Radio signal detection in medical imaging
rheumatology: Rheumatology X-ray abnormality detection
knee: ACL and knee X-ray analysis
abdomen_mri: Abdomen MRI organ detection
brain_axial_mri: Brain axial MRI structure detection
gynecology_mri: Gynecology MRI structure detection
brain_tumor: Brain tumor detection in MRI scans
fracture: Bone fracture detection in X-rays

Infrared Collection (4 datasets)

Thermal and infrared imaging datasets:

search_rf100(collection = "infrared")

Available datasets:

thermal_dog_and_people: Thermal imaging of dogs and people
solar_panel: Solar panel detection in infrared imagery
thermal_cheetah: Thermal imaging of cheetahs
ir_object: FLIR camera object detection

Damage Collection (3 datasets)

Infrastructure damage and defect detection:

search_rf100(collection = "damage")

Available datasets:

liquid_crystals: 4-fold defect detection in LCD displays
solar_panel: Solar panel defect and damage detection
asbestos: Asbestos detection for safety inspection

Underwater Collection (4 datasets)

Marine and underwater imaging datasets:

search_rf100(collection = "underwater")

Available datasets:

pipes: Underwater pipe detection for infrastructure
aquarium: Aquarium fish and species detection
objects: Underwater object detection
coral: Coral reef detection and monitoring

Document Collection (6 datasets)

Document analysis and OCR datasets:

search_rf100(collection = "document")

Available datasets:

tweeter_post: Twitter post element detection
tweeter_profile: Twitter profile element detection
document_part: Document structure and part detection
activity_diagram: Activity diagram element detection
signature: Signature detection in documents
paper_part: Academic paper structure detection

Usage Example

Once you’ve found a dataset, loading it is straightforward:

library(torchvision)

# Search for blood cell dataset
search_rf100("blood")

# Load the dataset
ds <- rf100_biology_collection(
  dataset = "blood_cell",
  split = "train",
  download = TRUE
)

# Inspect a sample
item <- ds[1]
print(item$y$labels)  # Object classes
print(item$y$boxes)   # Bounding boxes

# Visualize with bounding boxes
boxed <- draw_bounding_boxes(item)
tensor_image_browse(boxed)

Dataset Statistics

catalog <- get_rf100_catalog()

# Total size of all datasets
sum(catalog$total_size_mb) / 1024  # In GB

# Datasets by size
catalog[order(-catalog$total_size_mb), c("dataset", "collection", "total_size_mb")]

# Smallest and largest datasets
catalog[which.min(catalog$total_size_mb), ]
catalog[which.max(catalog$total_size_mb), ]

# Average size by collection
aggregate(total_size_mb ~ collection, data = catalog, FUN = mean)
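
To see what these operations return without downloading anything, the sketch below runs them on a four-row stand-in. The column names match those used above, but the sizes are round placeholder numbers chosen for easy arithmetic; they are not real RF100 dataset sizes.

```r
# Stand-in catalog: sizes are placeholders, NOT real RF100 sizes.
toy_sizes <- data.frame(
  collection = c("biology", "medical", "biology", "medical"),
  dataset = c("blood_cell", "knee", "cell", "fracture"),
  total_size_mb = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Total size in GB (1 GB = 1024 MB).
total_gb <- sum(toy_sizes$total_size_mb) / 1024

# Mean size per collection; aggregate() returns one row per collection.
mean_by_collection <- aggregate(
  total_size_mb ~ collection,
  data = toy_sizes,
  FUN = mean
)

# Largest-first ordering, as in the listing above.
by_size <- toy_sizes[order(-toy_sizes$total_size_mb), ]

print(total_gb)
print(mean_by_collection)
print(by_size$dataset)
```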

Filtering and Exploration

The catalog is a regular data frame, so you can use standard R operations:

# Find small datasets (< 20 MB total)
subset(catalog, total_size_mb < 20)

# Find large datasets (> 200 MB total)
subset(catalog, total_size_mb > 200)

# Find datasets with specific keywords
subset(catalog, grepl("tumor|cancer|disease", description, ignore.case = TRUE))

# Datasets with all three splits
subset(catalog, has_train & has_test & has_valid)
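
The keyword filter above relies on grepl() with a regular-expression alternation. As an illustration, the sketch below applies the same pattern to six entries copied from the biology and medical lists in this vignette. mini_catalog is a hand-built subset, not the output of get_rf100_catalog().

```r
# Six entries copied from the biology and medical lists in this vignette.
mini_catalog <- data.frame(
  collection = c("biology", "biology", "biology", "medical", "medical", "medical"),
  dataset = c(
    "blood_cell", "cotton_disease", "liver_disease",
    "knee", "brain_tumor", "fracture"
  ),
  description = c(
    "Blood cell detection (RBC, WBC, platelets)",
    "Cotton plant disease detection",
    "Liver disease pathology detection",
    "ACL and knee X-ray analysis",
    "Brain tumor detection in MRI scans",
    "Bone fracture detection in X-rays"
  ),
  stringsAsFactors = FALSE
)

# "|" means "or": keep rows whose description mentions any of the three words.
disease_hits <- subset(
  mini_catalog,
  grepl("tumor|cancer|disease", description, ignore.case = TRUE)
)
print(disease_hits$dataset)

# Count the matches per collection.
hits_per_collection <- table(disease_hits$collection)
print(hits_per_collection)
```

None of the sample descriptions mention "cancer", so that alternative simply contributes no rows.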

Additional Resources

Roboflow Universe: Browse datasets at https://universe.roboflow.com/browse/
Collection Functions: See ?rf100_biology_collection, ?rf100_medical_collection, etc.
Visualization: See ?draw_bounding_boxes for visualizing detections

Citation

If you use RF100 datasets in your research, please cite:

@article{roboflow100,
  title={Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark},
  author={Roboflow},
  journal={arXiv preprint},
  year={2022}
}
