The Roboflow 100 (RF100) benchmark consists of 34 diverse object detection datasets organized into 6 collections. This vignette provides a catalog to help you find the right dataset for your task.
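The RF100 datasets cover a wide range of domains, including microscopy and biological imaging, medical imaging, thermal and infrared imaging, infrastructure damage and defect detection, marine and underwater imaging, and document analysis and OCR.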
The easiest way to find datasets is using the search functions:
library(torchvision)
# Search for specific topics
search_rf100("cell") # Find cell-related datasets
search_rf100("solar") # Find solar panel datasets
search_rf100("x-ray") # Find X-ray datasets
# List all datasets in a collection
search_rf100(collection = "biology")
search_rf100(collection = "medical")
# View complete catalog
catalog <- get_rf100_catalog()
View(catalog)

One of the motivations for this catalog was answering questions like: “Is there a photovoltaic dataset in torchvision?”
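To make that kind of question concrete, here is a minimal base R sketch of keyword matching over a catalog-like data frame. It is not how search_rf100() is implemented. The `mock_catalog` data frame and the `find_datasets()` helper are hypothetical and exist only for this illustration. The three rows reuse dataset names and descriptions listed in this vignette.

```r
# Mock catalog (hypothetical): three rows copied from the listings in this
# vignette. The real catalog comes from get_rf100_catalog().
mock_catalog <- data.frame(
  dataset = c("solar_panel", "blood_cell", "brain_tumor"),
  description = c(
    "Solar panel detection in infrared imagery",
    "Blood cell detection (RBC, WBC, platelets)",
    "Brain tumor detection in MRI scans"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: case-insensitive regex match against either the
# dataset name or its description.
find_datasets <- function(catalog, keyword) {
  hit <- grepl(keyword, catalog$dataset, ignore.case = TRUE) |
    grepl(keyword, catalog$description, ignore.case = TRUE)
  catalog[hit, , drop = FALSE]
}

# "photovoltaic" never appears in the mock descriptions, so the pattern
# also tries the synonym "solar".
solar_hits <- find_datasets(mock_catalog, "solar|photovoltaic")
print(solar_hits$dataset)
```

Searching for several synonyms at once is useful because the descriptions use "solar panel" rather than "photovoltaic", so a single literal keyword can miss a relevant dataset.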
Here’s the complete catalog of all RF100 datasets:
Microscopy and biological imaging datasets for research and diagnostics:
Available datasets:
stomata_cell: Plant stomata cells for biology research
blood_cell: Blood cell detection (RBC, WBC, platelets)
parasite: Parasite detection in microscopy images
cell: General cell detection in microscopy
bacteria: Bacteria detection in microscopy images
cotton_disease: Cotton plant disease detection
mitosis: Mitosis phase detection in cell images
phage: Bacteriophage detection in microscopy
liver_disease: Liver disease pathology detection

Medical imaging datasets for clinical and research applications:
Available datasets:
radio_signal: Radio signal detection in medical imaging
rheumatology: Rheumatology X-ray abnormality detection
knee: ACL and knee X-ray analysis
abdomen_mri: Abdomen MRI organ detection
brain_axial_mri: Brain axial MRI structure detection
gynecology_mri: Gynecology MRI structure detection
brain_tumor: Brain tumor detection in MRI scans
fracture: Bone fracture detection in X-rays

Thermal and infrared imaging datasets:
Available datasets:
thermal_dog_and_people: Thermal imaging of dogs and people
solar_panel: Solar panel detection in infrared imagery
thermal_cheetah: Thermal imaging of cheetahs
ir_object: FLIR camera object detection

Infrastructure damage and defect detection:
Available datasets:
liquid_crystals: 4-fold defect detection in LCD displays
solar_panel: Solar panel defect and damage detection
asbestos: Asbestos detection for safety inspection

Marine and underwater imaging datasets:
Available datasets:
pipes: Underwater pipe detection for infrastructure
aquarium: Aquarium fish and species detection
objects: Underwater object detection
coral: Coral reef detection and monitoring

Document analysis and OCR datasets:
Available datasets:
tweeter_post: Twitter post element detection
tweeter_profile: Twitter profile element detection
document_part: Document structure and part detection
activity_diagram: Activity diagram element detection
signature: Signature detection in documents
paper_part: Academic paper structure detection

Once you’ve found a dataset, loading it is straightforward:
library(torchvision)
# Search for blood cell dataset
search_rf100("blood")
# Load the dataset
ds <- rf100_biology_collection(
  dataset = "blood_cell",
  split = "train",
  download = TRUE
)
# Inspect a sample
item <- ds[1]
print(item$y$labels) # Object classes
print(item$y$boxes) # Bounding boxes
# Visualize with bounding boxes
boxed <- draw_bounding_boxes(item)
tensor_image_browse(boxed)

catalog <- get_rf100_catalog()
# Total size of all datasets
sum(catalog$total_size_mb) / 1024 # In GB
# Datasets by size
catalog[order(-catalog$total_size_mb), c("dataset", "collection", "total_size_mb")]
# Smallest and largest datasets
catalog[which.min(catalog$total_size_mb), ]
catalog[which.max(catalog$total_size_mb), ]
# Average size by collection
aggregate(total_size_mb ~ collection, data = catalog, FUN = mean)

The catalog is a regular data frame, so you can use standard R operations:
# Find small datasets (< 20 MB total)
subset(catalog, total_size_mb < 20)
# Find large datasets (> 200 MB total)
subset(catalog, total_size_mb > 200)
# Find datasets with specific keywords
subset(catalog, grepl("tumor|cancer|disease", description, ignore.case = TRUE))
# Datasets with all three splits
subset(catalog, has_train & has_test & has_valid)

?rf100_biology_collection, ?rf100_medical_collection, etc.
?draw_bounding_boxes for visualizing detections

If you use RF100 datasets in your research, please cite:
@article{roboflow100,
title={Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark},
author={Roboflow},
journal={arXiv preprint},
year={2022}
}