The hardware and bandwidth for this mirror is donated by dogado GmbH, the Webhosting and Full Service-Cloud Provider. Check out our Wordpress Tutorial.
If you wish to report a bug, or if you are interested in having us mirror your free-software or open-source project, please feel free to contact us at mirror[@]dogado.de.

puremoe

PubMed Unified REtrieval for Multi-Output Exploration

An R package that provides a single interface for accessing a range of NLM/PubMed databases, including PubMed abstract records, iCite bibliometric data, PubTator3 named entity annotations, and full-text entries from PubMed Central (PMC). This unified interface simplifies the data retrieval process, allowing users to interact with multiple PubMed services/APIs/output formats through a single R function.

The package also includes MeSH thesaurus resources as simple data frames, including Descriptor Terms, Descriptor Tree Structures, Supplementary Concept Terms, and Pharmacological Actions; it also includes descriptor-level word embeddings (Noh & Kavuluru 2021). Via the mesh-resources library.

Installation

Get the released version from CRAN:

install.packages('puremoe')

Or the development version from GitHub with:

remotes::install_github("jaytimm/puremoe")

Usage

The package has two basic functions: search_pubmed and get_records. The former fetches PMIDs from the PubMed API based on user search; the latter scrapes PMID records from a user-specified PubMed endpoint – pubmed_abstracts, pubmed_affiliations, pubtations, icites, or pmc_fulltext.

Search syntax is the same as that implemented in standard PubMed search.

pmids <- puremoe::search_pubmed('("political ideology"[TiAb])',
                                 use_pub_years = F)

# pmids <- puremoe::search_pubmed('immunity', 
#                                  use_pub_years = T,
#                                  start_year = 2022,
#                                  end_year = 2024) 

Get record-level data

pubmed <- pmids |> 
  puremoe::get_records(endpoint = 'pubmed_abstracts', 
                       cores = 3, 
                       sleep = 1) 

affiliations <- pmids |> 
  puremoe::get_records(endpoint = 'pubmed_affiliations', 
                       cores = 1, 
                       sleep = 0.5)

icites <- pmids |>
  puremoe::get_records(endpoint = 'icites',
                       cores = 3,
                       sleep = 0.25)

pubtations <- pmids |> 
  puremoe::get_records(endpoint = 'pubtations',
                       cores = 2)

When the endpoint is PMC, the get_records() function takes a vector of filepaths (from the PMC Open Access list) instead of PMIDs.

pmclist <- puremoe::data_pmc_list(use_persistent_storage = T)
pmc_pmids <- pmclist[PMID %in% pmids]

pmc_fulltext <- pmc_pmids$fpath[1:5] |> 
  puremoe::get_records(endpoint = 'pmc_fulltext', cores = 1)

Summary

These binaries (installable software) and packages are in development.
They may not be fully stable and should be used with caution. We make no claims about them.
Health stats visible at Monitor.