Introduction to drughelper

Javier García & Fernando Carazo

2021-06-16

Drughelper is an R package that identifies and corrects drug names of interest to the user so that they are easier to work with. Its dataset is updated once a month from the ChEMBL database.

Installation

install.packages("drughelper")

Drughelper functionality

Drughelper is designed to be as straightforward as possible: a single function, checkDrugSynonym, returns the main information about the input drugs. Its argument is a character vector with the names or synonyms of the drugs of interest:

library(drughelper)

vectorofdrugs <- c("Procaine", "Furazosin", "Embelin", "NotADrug")

What drughelper returns

checkDrugSynonym looks up possible synonyms for each drug in the input and returns a data frame with the best-matching synonym for each one. The Matching column takes one of three values: “Exact match” when the input matches the drug name itself or any of its synonyms; “Approximate match” when there is no exact match but a close name is found; and “No match” when the drug is not found (shown as “No match / clinical phase 0” in the output below). The remaining columns give the input name (x), whether the drug is approved (Approved), its Drughelper identifier (DrugHelperID), the suggested synonym (Suggested.Synonym) and its clinical phase (Cl.Phase).

checkDrugSynonym(vectorofdrugs)
#>           x Approved DrugHelperID     Suggested.Synonym Cl.Phase
#> 1  Procaine     TRUE       DH0250              PROCAINE        4
#> 2 Furazosin     TRUE         DH01              PRAZOSIN        4
#> 3   Embelin     TRUE      DH01648 CLOBETASOL PROPIONATE        4
#> 4  NotADrug    FALSE         <NA>              NOTADRUG       NA
#>                      Matching
#> 1                 Exact match
#> 2                 Exact match
#> 3           Approximate match
#> 4 No match / clinical phase 0
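The three matching tiers described above can be illustrated with a short base-R sketch. This is not drughelper's implementation: the toy lookup table `synonyms`, the helper `match_drug()` and the cutoff `max_dist` are hypothetical, and the approximate step uses generalized Levenshtein distance via `utils::adist()`, which may differ from whatever method the package actually uses.

```r
# Hypothetical illustration of exact / approximate / no-match tiers.
# `synonyms` is a toy table, not drughelper's ChEMBL-derived dataset.
synonyms <- data.frame(
  synonym = c("PROCAINE", "NOVOCAINE", "PRAZOSIN", "FURAZOSIN"),
  drug    = c("PROCAINE", "PROCAINE",  "PRAZOSIN", "PRAZOSIN"),
  stringsAsFactors = FALSE
)

# Hypothetical helper; max_dist is an assumed cutoff for approximate matches
match_drug <- function(query, table, max_dist = 2) {
  q <- toupper(query)
  # Tier 1: exact match against the drug name or any of its synonyms
  hit <- match(q, table$synonym)
  if (!is.na(hit)) {
    return(data.frame(x = query, Suggested = table$drug[hit],
                      Matching = "Exact match", stringsAsFactors = FALSE))
  }
  # Tier 2: closest synonym by edit distance, accepted if within max_dist
  d <- utils::adist(q, table$synonym)[1, ]
  best <- which.min(d)
  if (d[best] <= max_dist) {
    return(data.frame(x = query, Suggested = table$drug[best],
                      Matching = "Approximate match", stringsAsFactors = FALSE))
  }
  # Tier 3: nothing close enough was found
  data.frame(x = query, Suggested = NA_character_,
             Matching = "No match", stringsAsFactors = FALSE)
}

queries <- c("Procaine", "Novocaine", "Prazosine", "NotADrug")
result <- do.call(rbind, lapply(queries, match_drug, table = synonyms))
print(result)
```

In practice checkDrugSynonym(vectorofdrugs) does this work against the full dataset; the sketch only shows why an alternative or slightly misspelled name can still resolve to a known drug.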

Download data manually

The dataset is downloaded automatically when checkDrugSynonym is called, but it can also be downloaded manually:

downloadAbsentFile()

If the data have already been downloaded, downloadAbsentFile() does not download anything again.
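This behaviour follows a common download-if-absent pattern, sketched below in base R. The helper `download_if_absent()` and its `url` and `dest` arguments are hypothetical and do not reflect downloadAbsentFile()'s actual arguments, data source or file locations.

```r
# Hypothetical sketch of a download-if-absent helper (not drughelper code).
download_if_absent <- function(url, dest) {
  # Skip the download when a local copy already exists
  if (file.exists(dest)) {
    return(invisible(FALSE))
  }
  # mode = "wb" keeps binary files intact on Windows
  utils::download.file(url, destfile = dest, mode = "wb")
  invisible(TRUE)
}

# Usage: an existing file is left untouched and nothing is downloaded
cached <- tempfile(fileext = ".rds")
saveRDS(data.frame(drug = "PROCAINE"), cached)
downloaded <- download_if_absent("https://example.org/data.rds", cached)
```

Returning a logical flag makes it easy for a caller to report whether fresh data were fetched.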

Case studies

We present two case studies that compare the number of drugs shared between different studies with and without Drughelper. The goal is to check whether comparing each drug name together with all its synonyms yields more matches than comparing the names alone.
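The comparison behind these case studies can be sketched in base R: count the drugs two studies share when names are compared literally, and again after mapping every name to a canonical drug. The `lookup` vector, the `canonicalize()` helper and both drug lists are hypothetical toy data; in a real analysis the mapping would come from checkDrugSynonym(), for example from its DrugHelperID column.

```r
# Toy synonym -> canonical name lookup (hypothetical, not drughelper's data)
lookup <- c(PROCAINE = "PROCAINE", NOVOCAINE = "PROCAINE",
            PRAZOSIN = "PRAZOSIN", FURAZOSIN = "PRAZOSIN")

# Map names to canonical names; unknown names are kept as they are
canonicalize <- function(x) {
  u <- toupper(x)
  out <- unname(lookup[u])
  out[is.na(out)] <- u[is.na(out)]
  out
}

study_a <- c("Procaine", "Furazosin", "Embelin")
study_b <- c("Novocaine", "Prazosin", "Imatinib")

# Without synonym resolution: only literal (case-insensitive) matches count
raw_shared <- intersect(toupper(study_a), toupper(study_b))

# With synonym resolution: compare canonical names instead
resolved_shared <- intersect(canonicalize(study_a), canonicalize(study_b))

length(raw_shared)      # 0
length(resolved_shared) # 2
```

With these toy lists no drug is shared literally, while two are shared once synonyms are resolved, which is the kind of difference the case studies set out to measure.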

Case 1:

In the first case study, we compare four drug datasets: three from the PharmacoGx project and one from the BeatAML functional genomics study.

PharmacoGx is an R package that provides data from the Cancer Cell Line Encyclopedia (CCLE), the Genomics of Drug Sensitivity in Cancer project (GDSC) and the Connectivity Map (CMAP) from the Broad Institute. These datasets contain 24, 139 and 5 unique drugs, respectively.

BeatAML is a program that provides several datasets on acute myeloid leukemia (AML). Here we use its drug response dataset, which contains 122 unique drugs.

Case 2:

In the second case study, a larger dataset of 1996 drugs from #project name# is compared with the previous ones.