crawlee: Tidy Interface for Reproducible Web Crawling

A tidy, pipe-friendly toolkit for reproducible web crawling and structured data collection, inspired by the architecture of the 'Crawlee' library. Provides a unified crawler with a deduplicating, resumable request queue, content-type-aware handlers, structured storage backends, and rich console logging via 'cli'. Supports crawling HTML pages, sitemaps, RSS and Atom feeds, and PDF documents, with optional headless-browser rendering and helpers for retrieval-augmented generation.

Version: 0.1.0
Depends: R (≥ 4.1.0)
Imports: cli, httr2, R6, rlang, rvest, tibble, vctrs, xml2
Suggests: arrow, chromote, DBI, dplyr, duckdb, httptest2, jsonlite, knitr, later, nanoparquet, pdftools, polite, promises, rmarkdown, testthat (≥ 3.0.0)
Published: 2026-07-03
DOI: 10.32614/CRAN.package.crawlee (may not be active yet)
Author: Andre Leite [aut, cre], Marcos Wasilew [aut], Hugo Vasconcelos [aut], Carlos Amorin [aut], Diogo Bezerra [aut]
Maintainer: Andre Leite <leite at castlab.org>
BugReports: https://github.com/StrategicProjects/crawlee/issues
License: MIT + file LICENSE
URL: https://github.com/StrategicProjects/crawlee, https://strategicprojects.github.io/crawlee/
NeedsCompilation: no
Language: en-US
Materials: README, NEWS
CRAN checks: crawlee results

Documentation:

Reference manual: crawlee.html, crawlee.pdf
Vignettes: Getting started with crawlee (source, R code)
           Crawling a website (source, R code)
           A RAG pipeline (source, R code)
           Scaling and politeness (source, R code)
           Storage and resumable runs (source, R code)

Downloads:

Package source: crawlee_0.1.0.tar.gz
Windows binaries: r-devel: not available, r-release: not available, r-oldrel: not available
macOS binaries: r-release (arm64): crawlee_0.1.0.tgz, r-oldrel (arm64): crawlee_0.1.0.tgz, r-release (x86_64): not available, r-oldrel (x86_64): crawlee_0.1.0.tgz

Linking:

Please use the canonical form https://CRAN.R-project.org/package=crawlee to link to this page.
