Drughelper is an R package that identifies and corrects the drug names supplied by the user so that they are easier to work with. Its dataset is updated once a month from the ChEMBL database.
install.packages("drughelper")
Drughelper is designed to be as simple to use as possible: a single function returns the main information for the input drugs. Its only required argument is a character vector with the names or synonyms of the drugs of interest:
library(drughelper)
vectorofdrugs <- c("Procaine", "Furazosin", "Embelin", "NotADrug")
The dataset is downloaded automatically when checkDrugSynonym is called, but it can also be downloaded manually:
downloadAbsentFile()
If data has already been downloaded, the function will not download anything.
checkDrugSynonym finds possible synonyms for each drug in the input and returns a data frame with the best-matching synonym for each one. The “Matching” column can take three kinds of value. An exact match is reported when the input matches the name of the drug itself or any of its synonyms. If the input does not appear exactly but a close candidate is found, an approximate match is returned. Finally, if the drug is not found at all, “No match” is returned.
checkDrugSynonym(vectorofdrugs)
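The three outcomes can be illustrated with a minimal base-R sketch. It is not how checkDrugSynonym is implemented. The helper classify_name, the tiny reference vector (names taken from the example output further down this document), the misspelled input "Lapatanib", the label "Approximate match" and the edit-distance threshold of 2 are all assumptions made for illustration.

```r
# Hypothetical reference names; the real package uses its ChEMBL-derived dataset.
reference <- c("MIRDAMETINIB", "TANESPIMYCIN", "NILOTINIB", "LAPATINIB")

# Illustrative helper (not part of drughelper): classify a single input name.
classify_name <- function(name, reference, max_dist = 2) {
  query <- toupper(name)

  # 1. Exact match, ignoring case
  if (query %in% reference) {
    return("Exact match")
  }

  # 2. Approximate match: smallest Levenshtein distance within the threshold
  distances <- utils::adist(query, reference)
  if (min(distances) <= max_dist) {
    return("Approximate match")  # label assumed for this sketch
  }

  # 3. Nothing close enough
  "No match"
}

inputs <- c("Nilotinib", "Lapatanib", "NotADrug")
result <- vapply(inputs, classify_name, character(1), reference = reference)
print(result)
```

The threshold controls how tolerant the approximate step is: a larger value catches more misspellings but also increases the risk of pairing a name with an unrelated drug.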
Two case studies are presented, in which we compare the number of drugs shared between different studies both with and without applying Drughelper. The objective is to check whether comparing each drug name against all of its synonyms yields more matches.
In this first approach, we have compared four different studies, three of them from the PharmacoGx project and the other one from the BeatAML functional genomic study.
PharmacoGx is an R package that provides data from the Cancer Cell Line Encyclopedia (CCLE), the Genomics of Drug Sensitivity in Cancer project (GDSC) and the Connectivity Map (CMAP) from the Broad Institute. These datasets contain 24, 139 and 5 unique drugs, respectively.
BeatAML is a program that provides several datasets on acute myeloid leukemia (AML). Here we use its drug response dataset, which contains 122 unique drugs.
head(checkDrugSynonym(vector_CCLE))
#> x Approved DrugHelperID Suggested.Synonym Cl.Phase Matching
#> 1 PD-0325901 TRUE DH05859 MIRDAMETINIB 2 Exact match
#> 2 17-AAG TRUE DH04193 TANESPIMYCIN 3 Exact match
#> 3 AEW541 TRUE DH07870 AEW-541 1 Exact match
#> 4 Nilotinib TRUE DH01297 NILOTINIB 4 Exact match
#> 5 PHA-665752 FALSE <NA> PHA-665752 0 or NA No match
#> 6 lapatinib TRUE DH0223 LAPATINIB 4 Exact match
head(checkDrugSynonym(vector_GDSC))
#> x Approved DrugHelperID Suggested.Synonym Cl.Phase Matching
#> 1 Doxorubicin TRUE DH0619 DOXORUBICIN 4 Exact match
#> 2 etoposide TRUE DH0586 ETOPOSIDE 4 Exact match
#> 3 Gemcitabine TRUE DH0604 GEMCITABINE 4 Exact match
#> 4 Mitomycin-C TRUE DH0210 MITOMYCIN 4 Exact match
#> 5 Vinorelbine TRUE DH01559 VINORELBINE 4 Exact match
#> 6 NSC-87877 FALSE <NA> NSC-87877 0 or NA No match
head(checkDrugSynonym(vector_CMAP))
#> x Approved DrugHelperID Suggested.Synonym Cl.Phase
#> 1 acetylsalicylic acid TRUE DH042 ASPIRIN 4
#> 2 rosiglitazone TRUE DH0286 ROSIGLITAZONE 4
#> 3 alvespimycin TRUE DH05770 ALVESPIMYCIN 4
#> 4 vorinostat TRUE DH0182 VORINOSTAT 4
#> 5 pioglitazone TRUE DH0284 PIOGLITAZONE 4
#> Matching
#> 1 Exact match
#> 2 Exact match
#> 3 Exact match
#> 4 Exact match
#> 5 Exact match
head(checkDrugSynonym(vBeatAML))
#> x Approved DrugHelperID Suggested.Synonym Cl.Phase
#> 1 17-AAG (Tanespimycin) TRUE DH04193 TANESPIMYCIN 3
#> 2 A-674563 FALSE <NA> A-674563 0 or NA
#> 3 ABT-737 TRUE DH07797 ABT 737 1
#> 4 AT7519 TRUE DH05915 AT-7519 2
#> 5 AZD1480 TRUE DH06030 AZD-1480 2
#> 6 Afatinib (BIBW-2992) TRUE DH01656 AFATINIB 4
#> Matching
#> 1 Exact match
#> 2 No match
#> 3 Exact match
#> 4 Exact match
#> 5 Exact match
#> 6 Exact match
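To see at a glance how many names were resolved, the Matching column can be tabulated. The sketch below rebuilds by hand the six CCLE rows printed above (only the x, Suggested.Synonym and Matching columns) so that it runs without the package. With real data, the data frame would be the value returned by checkDrugSynonym.

```r
# Hand-typed copy of the six CCLE rows shown above (subset of columns).
res <- data.frame(
  x = c("PD-0325901", "17-AAG", "AEW541",
        "Nilotinib", "PHA-665752", "lapatinib"),
  Suggested.Synonym = c("MIRDAMETINIB", "TANESPIMYCIN", "AEW-541",
                        "NILOTINIB", "PHA-665752", "LAPATINIB"),
  Matching = c("Exact match", "Exact match", "Exact match",
               "Exact match", "No match", "Exact match"),
  stringsAsFactors = FALSE
)

# Count how many inputs fall into each matching category
match_counts <- table(res$Matching)
print(match_counts)

# List the input names that could not be resolved
unmatched <- res$x[res$Matching == "No match"]
print(unmatched)
```

Listing the unmatched names separately makes it easy to review them by hand before comparing studies.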
A larger dataset of 1996 drugs from DrugSniper is also compared with the others. DrugSniper is a tool for exploiting loss-of-function screens, created by the bioengineering department at Tecnun, University of Navarra. The data used, called “Gene Info”, contains information about both approved and investigational drugs aimed at protein inhibition, retrieved from the publicly available ChEMBL and DrugBank online repositories.
Drughelper improves drug name identification. For example, when comparing GDSC with DrugSniper (Gene Info), 24 drugs appear in both studies without the function, whereas applying Drughelper raises that number to 64. Other cross-study comparisons show little improvement. The CMAP dataset, for instance, consists of only 5 entries, so it cannot provide significant results.
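A cross-study comparison of this kind amounts to intersecting two drug lists twice: once on the raw names and once on the identifiers that Drughelper assigns. The following is a minimal sketch that uses only rows shown in the CCLE and BeatAML outputs earlier in this document. The two data frames are typed in by hand and stand in for real checkDrugSynonym results, so the counts here are illustrative and are not the study figures.

```r
# Hand-typed subsets of the CCLE and BeatAML outputs shown earlier.
ccle <- data.frame(
  x = c("17-AAG", "Nilotinib", "lapatinib"),
  DrugHelperID = c("DH04193", "DH01297", "DH0223"),
  stringsAsFactors = FALSE
)
beataml <- data.frame(
  x = c("17-AAG (Tanespimycin)", "AT7519", "Afatinib (BIBW-2992)"),
  DrugHelperID = c("DH04193", "DH05915", "DH01656"),
  stringsAsFactors = FALSE
)

# Without Drughelper: compare the names exactly as each study wrote them
raw_overlap <- intersect(ccle$x, beataml$x)

# With Drughelper: compare the assigned identifiers, ignoring unmatched drugs
ids_ccle <- ccle$DrugHelperID[!is.na(ccle$DrugHelperID)]
ids_beataml <- beataml$DrugHelperID[!is.na(beataml$DrugHelperID)]
id_overlap <- intersect(ids_ccle, ids_beataml)

cat("Shared drugs by raw name:", length(raw_overlap), "\n")
cat("Shared drugs by DrugHelperID:", length(id_overlap), "\n")
```

In this toy case "17-AAG" and "17-AAG (Tanespimycin)" differ as strings but share the identifier DH04193, so the drug is only recognised as common to both studies after normalisation.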