easyScieloPak is an R package that allows you to search and access academic articles from SciELO programmatically.
The main goal of easyScieloPak is to simplify the process of querying SciELO from R by:
- Making queries readable and reproducible.
- Allowing filters like year, collection (country), language, journal, and subject category.
- Handling pagination, data parsing, and cleaning automatically.
- Providing clear and validated feedback when a query is incorrect.
- Minimizing errors due to anti-scraping measures (e.g., 403 HTTP errors).
You can install the development version of easyScieloPak from GitHub using either devtools or remotes:
# Option 1: devtools
install.packages("devtools")
devtools::install_github("https://github.com/PabloIxcamparij/easyScieloPack.git")
# Option 2: remotes
install.packages("remotes")
remotes::install_github("https://github.com/PabloIxcamparij/easyScieloPack.git")
library(easyScieloPak)
df <- search_scielo("salud ambiental", collections = "Ecuador", languages = "es", n_max = 5)
head(df)
df <- search_scielo("ecology", collections = "Chile", languages = "en", n_max = 8)
View(df) # View results in RStudio
Each filter only supports one value at a time (e.g., only one country, language, journal, or category).
Web scraping may be sensitive to structural changes in the SciELO website.
The number of fetched articles is limited by n_max (default fallback is 100).
No official API is available, so the package depends on website scraping.
Rate-limiting / Blocking (403 errors): In some cases, SciELO may detect automated access and temporarily block the search, resulting in a 403 HTTP error. This is a common limitation of scraping. If this occurs, try the following:
Note: Reinstalling the package has no direct effect on the block.
- Default fallback limit: If the total number of available results cannot be determined, the query will default to fetching a maximum of 100 articles.
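Because each filter accepts only one value, a possible workaround is to run one query per value and stack the results. The sketch below is an illustration, not part of the package: `search_many_languages` is a hypothetical helper, the search function is passed in as `search_fn` (in real use this would be `search_scielo`), and it assumes every call returns a data frame with the same columns.

```r
# Hypothetical helper (not provided by easyScieloPak): one query per language,
# results stacked into a single data frame.
search_many_languages <- function(query, languages, search_fn, ...) {
  results <- lapply(languages, function(lang) {
    df <- search_fn(query, languages = lang, ...)
    if (is.null(df) || nrow(df) == 0) return(NULL)
    df$query_language <- lang  # record which filter value produced each row
    df
  })
  results <- Filter(Negate(is.null), results)  # drop empty queries
  if (length(results) == 0) return(data.frame())
  combined <- do.call(rbind, results)  # assumes identical columns in every result
  rownames(combined) <- NULL
  combined
}

# Possible usage (requires the package and network access):
# df <- search_many_languages("ecology", c("es", "en"), search_scielo,
#                             collections = "Chile", n_max = 5)
```

The same article may be returned under more than one filter value, so the combined data frame can contain duplicate rows that need removing afterwards.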
Recent Improvements
- Rotating User-Agents: Each request uses a different User-Agent string (Chrome, Firefox, Safari variants) to appear more like a real browser and avoid blocking.
- Random delays between requests reduce server load and minimize scraping detection.
- Retry logic: If a request fails, the package retries automatically with a different User-Agent.
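The three mechanisms above can be combined roughly as in the following sketch. It is not the package's internal code: `fetch_with_retry`, its parameters, and the User-Agent strings are assumptions chosen for illustration, and the actual HTTP call is abstracted behind a `request_fn` argument that receives the User-Agent to use.

```r
# Illustrative User-Agent pool (example strings, not the package's actual list).
user_agents <- c(
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15"
)

# Hypothetical helper: call request_fn(agent), retrying on error with the next
# User-Agent in the pool and a random pause between attempts.
fetch_with_retry <- function(request_fn, agents = user_agents, max_tries = 3,
                             min_delay = 1, max_delay = 3) {
  last_error <- NULL
  for (attempt in seq_len(max_tries)) {
    agent <- agents[(attempt - 1) %% length(agents) + 1]  # rotate through the pool
    result <- tryCatch(request_fn(agent), error = function(e) e)
    if (!inherits(result, "error")) return(result)
    last_error <- result
    if (attempt < max_tries) {
      Sys.sleep(runif(1, min_delay, max_delay))  # random delay before retrying
    }
  }
  stop("All ", max_tries, " attempts failed: ", conditionMessage(last_error))
}

# Possible request_fn using httr (url is a placeholder you would supply):
# request_fn <- function(agent) {
#   httr::stop_for_status(httr::GET(url, httr::user_agent(agent)))
# }
```

Passing the request as a function keeps the retry policy separate from the HTTP library, so the same wrapper can be exercised with a stub.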
The current version of easyScieloPak is fully functional for basic academic exploration through SciELO. However, the following enhancements are planned for future versions:
Support for multiple filter values: Currently, each filter (e.g., language, category, journal) only accepts a single value. Future versions aim to support multiple values for broader and more flexible queries (e.g., languages("es", "en", "pt")).
Improved scraping resistance: We plan to implement smarter mechanisms to reduce the chances of triggering SciELO’s anti-scraping protections (e.g., rotating user agents, request throttling, caching mechanisms).
Caching and offline mode: Possibility to cache previous search results locally for offline use or repeated queries.
Enhanced error diagnostics: Provide clearer messages and helper functions when 403 or parsing issues occur.
Journal/code normalization functions: Automatic mapping of journal names to their normalized internal identifiers.
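Until caching is built in, repeated queries can be cached on the user side. The sketch below is one possible approach, not a planned API: `cached_search` and `cache_dir` are hypothetical names, the search function is passed in as `search_fn` (for example `search_scielo`), and results are stored as RDS files keyed by an MD5 hash of the query arguments.

```r
# Hypothetical helper: return a stored result when the same arguments were
# used before, otherwise run the search and store its result.
cached_search <- function(search_fn, ...,
                          cache_dir = file.path(tempdir(), "scielo_cache")) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)

  # Build a stable key from the arguments, then hash it for a safe file name.
  key_text <- paste(deparse(list(...)), collapse = "")
  key_file <- tempfile()
  writeLines(key_text, key_file)
  hash <- unname(tools::md5sum(key_file))
  unlink(key_file)

  cache_path <- file.path(cache_dir, paste0(hash, ".rds"))
  if (file.exists(cache_path)) return(readRDS(cache_path))  # cache hit

  result <- search_fn(...)  # cache miss: run the real query
  saveRDS(result, cache_path)
  result
}

# Possible usage (requires the package and network access):
# df <- cached_search(search_scielo, "ecology", collections = "Chile",
#                     languages = "en", n_max = 8, cache_dir = "scielo_cache")
```

The default `cache_dir` lives under `tempdir()` and disappears when the R session ends; pointing it at a permanent directory keeps results available across sessions.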
SciELO is a multidisciplinary open-access platform hosting scientific journals from over 15 countries. It plays a vital role in disseminating research output from Latin America and beyond.
This package provides a lightweight, unofficial method to interact with SciELO’s search interface.
Feel free to open issues or submit pull requests to improve functionality, usability, or documentation.