Drughelper is an R package that identifies and corrects drug names supplied by the user so that they are easier to work with. Its dataset is updated once a month from the ChEMBL database.
install.packages("drughelper")
Drughelper is designed to be simple to use: a single function returns the main information for the input drugs. That function takes one argument, a character vector with the names or synonyms of the drugs of interest:
library(drughelper)
vectorofdrugs <- c("Procaine", "Furazosin", "Embelin", "NotADrug")
checkDrugSynonym finds possible synonyms for each drug in the input and returns a data frame with the best-matching synonym for each one. The “Matching” column can take three values. An exact match is reported when the input matches the drug’s own name or any of its synonyms. If there is no exact match but a close one is found, an approximate match is reported. Finally, if the drug is not found, “No match” is returned.
checkDrugSynonym(vectorofdrugs)
#> x Approved DrugHelperID Suggested.Synonym Cl.Phase
#> 1 Procaine TRUE DH0250 PROCAINE 4
#> 2 Furazosin TRUE DH01 PRAZOSIN 4
#> 3 Embelin TRUE DH01648 CLOBETASOL PROPIONATE 4
#> 4 NotADrug FALSE <NA> NOTADRUG NA
#> Matching
#> 1 Exact match
#> 2 Exact match
#> 3 Approximate match
#> 4 No match / clinical phase 0
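The Matching column can be used to split the result into entries that can be trusted as they are and entries that deserve a manual look. The following is a minimal sketch in base R. It assumes the returned data frame has the columns shown in the output above; the data frame here is typed in by hand from that output so that the example runs without the package or its dataset.

```r
# Hand-built copy of part of the output shown above (an assumption made so
# the example runs without drughelper installed).
res <- data.frame(
  x = c("Procaine", "Furazosin", "Embelin", "NotADrug"),
  Approved = c(TRUE, TRUE, TRUE, FALSE),
  Suggested.Synonym = c("PROCAINE", "PRAZOSIN",
                        "CLOBETASOL PROPIONATE", "NOTADRUG"),
  Matching = c("Exact match", "Exact match", "Approximate match",
               "No match / clinical phase 0"),
  stringsAsFactors = FALSE
)

# Count how many inputs fall into each matching category.
match_counts <- table(res$Matching)

# Keep only the rows whose input matched a name or synonym exactly.
exact <- res[res$Matching == "Exact match", ]

# Collect the inputs that were not matched exactly, for manual review.
needs_review <- res$x[res$Matching != "Exact match"]
```

Filtering on "anything other than an exact match" avoids depending on the precise wording of the approximate and no-match labels.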
The dataset is downloaded automatically when checkDrugSynonym is called, but it can also be downloaded manually:
downloadAbsentFile()
If the data have already been downloaded, the function does not download anything.
Two case studies follow. In each, we compare the number of drug matches found across different studies with and without Drughelper. The aim is to check whether comparing each drug name against all of its synonyms yields more matches.
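The idea behind the comparison can be illustrated with a small base R sketch. Everything in it is a toy assumption: the synonym table, the two study vectors and the normalize helper are hypothetical and are not part of Drughelper. In a real analysis, the Suggested.Synonym column returned by checkDrugSynonym would play the role of the normalized name.

```r
# Toy synonym table (hypothetical): each name variant maps to one canonical name.
synonyms <- data.frame(
  variant   = c("PROCAINE", "NOVOCAIN", "PRAZOSIN", "FURAZOSIN"),
  canonical = c("PROCAINE", "PROCAINE", "PRAZOSIN", "PRAZOSIN"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: replace each drug name by its canonical name when the
# table knows it, otherwise keep the upper-cased input unchanged.
normalize <- function(drugs, table) {
  key <- toupper(drugs)
  idx <- match(key, table$variant)
  ifelse(is.na(idx), key, table$canonical[idx])
}

# Two made-up studies that name the same drugs differently.
study_a <- c("Procaine", "Furazosin", "Embelin")
study_b <- c("Novocain", "Prazosin", "Embelin")

# Overlap when names are compared literally (case-insensitive only).
raw_overlap <- intersect(toupper(study_a), toupper(study_b))

# Overlap after mapping every name to its canonical form.
norm_overlap <- intersect(normalize(study_a, synonyms),
                          normalize(study_b, synonyms))

length(raw_overlap)   # 1 shared drug without synonym handling
length(norm_overlap)  # 3 shared drugs once synonyms are resolved
```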
In this first approach, we compare four studies: three from the PharmacoGx project and one from the BeatAML functional genomic study.
PharmacoGx is an R package that provides data from the Cancer Cell Line Encyclopedia (CCLE), the Genomics of Drug Sensitivity in Cancer project (GDSC) and the Connectivity Map (CMAP) from the Broad Institute. These contain 24, 139 and 5 unique drugs, respectively.
BeatAML is a program that provides several datasets on acute myeloid leukemia (AML). Here we use its drug response dataset, which contains 122 unique drugs.
Now a large dataset of 1996 drugs from #project name# is compared with the previous ones.