A tidy, pipe-friendly toolkit for reproducible web crawling and structured data collection, inspired by the architecture of the 'Crawlee' library. Provides a unified crawler with a deduplicating, resumable request queue, content-type-aware handlers, structured storage backends, and rich console logging via 'cli'. Supports crawling HTML pages, sitemaps, RSS and Atom feeds, and PDF documents, with optional headless-browser rendering and helpers for retrieval-augmented generation.
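To make the phrase "deduplicating, resumable request queue" concrete, here is a minimal, language-neutral sketch written in Python. It is not the package's R API: the class `RequestQueue`, its methods, and the fragment-stripping normalization rule are all hypothetical, chosen only to illustrate the idea under simple assumptions (FIFO order, state serialized as JSON).

```python
import json
from collections import deque
from urllib.parse import urldefrag


class RequestQueue:
    """Hypothetical FIFO queue that never enqueues the same URL twice."""

    def __init__(self):
        self.pending = deque()  # URLs still waiting to be fetched
        self.seen = set()       # every URL ever enqueued (the dedup key set)

    @staticmethod
    def normalize(url):
        # Assumption: URLs differing only by #fragment are the same request.
        return urldefrag(url).url

    def add(self, url):
        """Enqueue url; return False if it was already seen."""
        key = self.normalize(url)
        if key in self.seen:
            return False
        self.seen.add(key)
        self.pending.append(key)
        return True

    def pop(self):
        """Return the next URL to fetch, or None when the queue is empty."""
        return self.pending.popleft() if self.pending else None

    def dump(self):
        # Serialize both collections so an interrupted crawl can resume.
        return json.dumps(
            {"pending": list(self.pending), "seen": sorted(self.seen)}
        )

    @classmethod
    def load(cls, text):
        # Rebuild a queue from the JSON produced by dump().
        state = json.loads(text)
        queue = cls()
        queue.pending = deque(state["pending"])
        queue.seen = set(state["seen"])
        return queue
```

Persisting the `seen` set alongside the pending URLs is what makes a resumed run skip pages that were already handled before the interruption. For the package's actual interface, consult the reference manual and vignettes listed below.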
| Version: | 0.1.0 |
| Depends: | R (≥ 4.1.0) |
| Imports: | cli, httr2, R6, rlang, rvest, tibble, vctrs, xml2 |
| Suggests: | arrow, chromote, DBI, dplyr, duckdb, httptest2, jsonlite, knitr, later, nanoparquet, pdftools, polite, promises, rmarkdown, testthat (≥ 3.0.0) |
| Published: | 2026-07-03 |
| DOI: | 10.32614/CRAN.package.crawlee (may not be active yet) |
| Author: | Andre Leite [aut, cre], Marcos Wasilew [aut], Hugo Vasconcelos [aut], Carlos Amorin [aut], Diogo Bezerra [aut] |
| Maintainer: | Andre Leite <leite at castlab.org> |
| BugReports: | https://github.com/StrategicProjects/crawlee/issues |
| License: | MIT + file LICENSE |
| URL: | https://github.com/StrategicProjects/crawlee, https://strategicprojects.github.io/crawlee/ |
| NeedsCompilation: | no |
| Language: | en-US |
| Materials: | README, NEWS |
| CRAN checks: | crawlee results |
| Reference manual: | crawlee.html, crawlee.pdf |
| Vignettes: | Getting started with crawlee (source, R code); Crawling a website (source, R code); A RAG pipeline (source, R code); Scaling and politeness (source, R code); Storage and resumable runs (source, R code) |
| Package source: | crawlee_0.1.0.tar.gz |
| Windows binaries: | r-devel: not available, r-release: not available, r-oldrel: not available |
| macOS binaries: | r-release (arm64): crawlee_0.1.0.tgz, r-oldrel (arm64): crawlee_0.1.0.tgz, r-release (x86_64): not available, r-oldrel (x86_64): crawlee_0.1.0.tgz |
Please use the canonical form https://CRAN.R-project.org/package=crawlee to link to this page.