rplos tutorial

The rplos package interacts with the API services of PLoS (Public Library of Science) Journals. In order to use rplos, you need to obtain your own key to their API services. Instruction for obtaining and installing keys so they load automatically when you launch R are on our GitHub Wiki page Installation and use of API keys.

This tutorial will go through three use cases to demonstrate the kinds of things possible in rplos.

Load package from CRAN

install.packages("rplos")
library(rplos)

Search across PLoS papers in various sections of papers

searchplos is a general search, and in this case searches for the term Helianthus and returns the DOI's of matching papers

searchplos(q= "Helianthus", fl= "id", limit = 5)
##                             id
## 1 10.1371/journal.pone.0057533
## 2 10.1371/journal.pone.0045899
## 3 10.1371/journal.pone.0037191
## 4 10.1371/journal.pone.0051360
## 5 10.1371/journal.pone.0070347

Get only full article DOIs

searchplos(q="*:*", fl='id', fq='doc_type:full', start=0, limit=5)
##                             id
## 1 10.1371/journal.pntd.0001738
## 2 10.1371/journal.pone.0035044
## 3 10.1371/journal.pone.0067070
## 4 10.1371/journal.pone.0095362
## 5 10.1371/journal.pone.0031015

Get DOIs for only PLoS One articles

searchplos(q="*:*", fl='id', fq='cross_published_journal_key:PLoSONE', start=0, limit=5)
##                                        id
## 1      10.1371/journal.pone.0067079/title
## 2   10.1371/journal.pone.0067079/abstract
## 3 10.1371/journal.pone.0067079/references
## 4       10.1371/journal.pone.0067079/body
## 5       10.1371/journal.pone.0095366/body

Get DOIs for full article in PLoS One

searchplos(q="*:*", fl='id',
   fq=list('cross_published_journal_key:PLoSONE', 'doc_type:full'),
   start=0, limit=5)
##                             id
## 1 10.1371/journal.pone.0035044
## 2 10.1371/journal.pone.0067070
## 3 10.1371/journal.pone.0095362
## 4 10.1371/journal.pone.0031015
## 5 10.1371/journal.pone.0059231

Serch for many terms

q <- c('ecology','evolution','science')
lapply(q, function(x) searchplos(x, limit=2))
## [[1]]
##                             id
## 1 10.1371/journal.pone.0059813
## 2 10.1371/journal.pone.0001248
## 
## [[2]]
##                                                        id
## 1 10.1371/annotation/c55d5089-ba2f-449d-8696-2bc8395978db
## 2 10.1371/annotation/9773af53-a076-4946-a3f1-83914226c10d
## 
## [[3]]
##                             id
## 1 10.1371/journal.pbio.0020122
## 2 10.1371/journal.pbio.1001166

Search on specific sections

A suite of functions were created as light wrappers around searchplos as a shorthand to search specific sections of a paper.

plosauthor searches across authors, and in this case returns the authors of the matching papers. the fl parameter determines what is returned

plosauthor(q = "Eisen", fl = "author", limit = 5)
##             author
## 1 Jonathan A Eisen
## 2 Jonathan A Eisen
## 3 Jonathan A Eisen
## 4 Jonathan A Eisen
## 5 Jonathan A Eisen

plosabstract searches across abstracts, and in this case returns the id and title of the matching papers

plosabstract(q = 'drosophila', fl='id,title', limit = 5)
##                             id
## 1 10.1371/journal.pbio.0040198
## 2 10.1371/journal.pbio.0030246
## 3 10.1371/journal.pone.0012421
## 4 10.1371/journal.pbio.0030389
## 5 10.1371/journal.pone.0002817
##                                                                             title
## 1                                                                     All for All
## 2                                     School Students as Drosophila Experimenters
## 3                            Host Range and Specificity of the Drosophila C Virus
## 4                     New Environments Set the Stage for Changing Tastes in Mates
## 5 High-Resolution, In Vivo Magnetic Resonance Imaging of Drosophila at 18.8 Tesla

plostitle searches across titles, and in this case returns the title and journal of the matching papers

plostitle(q='drosophila', fl='title,journal', limit=5)
##         journal                                                 title
## 1  PLoS Biology           Combinatorial Coding for Drosophila Neurons
## 2  PLoS Biology           School Students as Drosophila Experimenters
## 3      PLoS ONE              A Tripartite Synapse Model in Drosophila
## 4      PLoS ONE                             A DNA Virus of Drosophila
## 5 PLoS Genetics Phenotypic Plasticity of the Drosophila Transcriptome

Faceted search

Facet by journal

facetplos(q='*:*', facet.field='journal')
## $facet_queries
## NULL
## 
## $facet_fields
## $facet_fields$journal
##                                  X1     X2
## 1                          plos one 792679
## 2                     plos genetics  36699
## 3                    plos pathogens  32259
## 4        plos computational biology  26944
## 5                      plos biology  25123
## 6  plos neglected tropical diseases  21071
## 7                     plos medicine  17693
## 8              plos clinical trials    521
## 9                  plos collections     20
## 10                     plos medicin      9
## 
## 
## $facet_dates
## NULL
## 
## $facet_ranges
## NULL

Using facet.query to get counts

facetplos(q='*:*', facet.field='journal', facet.query='cell,bird')
## $facet_queries
##        term value
## 1 cell,bird    14
## 
## $facet_fields
## $facet_fields$journal
##                                  X1     X2
## 1                          plos one 792679
## 2                     plos genetics  36699
## 3                    plos pathogens  32259
## 4        plos computational biology  26944
## 5                      plos biology  25123
## 6  plos neglected tropical diseases  21071
## 7                     plos medicine  17693
## 8              plos clinical trials    521
## 9                  plos collections     20
## 10                     plos medicin      9
## 
## 
## $facet_dates
## NULL
## 
## $facet_ranges
## NULL

Date faceting

facetplos(q='*:*', url=url, facet.date='publication_date',
  facet.date.start='NOW/DAY-5DAYS', facet.date.end='NOW', facet.date.gap='+1DAY')
## $facet_queries
## NULL
## 
## $facet_fields
## NULL
## 
## $facet_dates
## $facet_dates$publication_date
##                   date value
## 1 2014-05-08T00:00:00Z  2872
## 2 2014-05-09T00:00:00Z  1208
## 3 2014-05-10T00:00:00Z     0
## 4 2014-05-11T00:00:00Z  1111
## 5 2014-05-12T00:00:00Z  2379
## 6 2014-05-13T00:00:00Z  1268
## 
## 
## $facet_ranges
## NULL

Highlighted search

Search for the term alcohol in the abstracts of articles, return only 10 results

highplos(q='alcohol', hl.fl = 'abstract', rows=2)
## $`10.1371/journal.pmed.0040151`
## $`10.1371/journal.pmed.0040151`$abstract
## [1] "Background: <em>Alcohol</em> consumption causes an estimated 4% of the global disease burden, prompting"
## 
## 
## $`10.1371/journal.pone.0027752`
## $`10.1371/journal.pone.0027752`$abstract
## [1] "Background: The negative influences of <em>alcohol</em> on TB management with regard to delays in seeking"

Search for the term alcohol in the abstracts of articles, and return fragment size of 20 characters, return only 5 results

highplos(q='alcohol', hl.fl='abstract', hl.fragsize=20, rows=2)
## $`10.1371/journal.pmed.0040151`
## $`10.1371/journal.pmed.0040151`$abstract
## [1] "Background: <em>Alcohol</em>"
## 
## 
## $`10.1371/journal.pone.0027752`
## $`10.1371/journal.pone.0027752`$abstract
## [1] " of <em>alcohol</em> on TB management"

Search for the term experiment across all sections of an article, return id (DOI) and title fl only, search in full articles only (via fq='doc_type:full'), and return only 10 results

highplos(q='everything:"experiment"', fl='id,title', fq='doc_type:full',
   rows=2)
## $`10.1371/journal.pone.0039681`
## $`10.1371/journal.pone.0039681`$everything
## [1] " Selection of Transcriptomics <em>Experiments</em> Improves Guilt-by-Association Analyses Transcriptomics <em>Experiment</em>"
## 
## 
## $`10.1371/annotation/9b8741e2-0f5f-49f9-9eaa-1b0cb9b8d25f`
## $`10.1371/annotation/9b8741e2-0f5f-49f9-9eaa-1b0cb9b8d25f`$everything
## [1] " in the labels under the bars. The labels should read <em>Experiment</em> 3 / <em>Experiment</em> 4 instead of <em>Experiment</em>"

Search for terms and visualize results as a histogram OR as a plot through time

plosword allows you to search for 1 to K words and visualize the results as a histogram, comparing number of matching papers for each word

out <- plosword(list("monkey", "Helianthus", "sunflower", "protein", "whale"),
    vis = "TRUE")
out$table
##   No_Articles       Term
## 1        7340     monkey
## 2         245 Helianthus
## 3         673  sunflower
## 4       79470    protein
## 5         876      whale
out$plot

plot of chunk plosword1plot

You can also pass in curl options, in this case get verbose information on the curl call.

plosword('Helianthus', callopts=list(verbose=TRUE))
## Number of articles with search term 
##                                 245

Visualize terms

plot_througtime allows you to search for up to 2 words and visualize the results as a line plot through time, comparing number of articles matching through time. Visualize with the ggplot2 package, only up to two terms for now.

plot_throughtime(terms = "phylogeny", limit = 200)

plot of chunk throughtime1

OR using google visualizations through the googleVis package, check it your self using, e.g. (not shown here)

plot_throughtime(terms = list("drosophila", "flower"), limit = 200, gvis = TRUE)

…And a google visualization will render on your local browser and you can play with three types of plots (point, histogram, line), all through time. The plot is not shown here, but try it out for yourself!!