Introduction to drughelper

Javier García & Fernando Carazo

2021-07-06

Drughelper is an R package that identifies and corrects drug names supplied by the user so that they are easier to work with. Its dataset is updated once a month from the ChEMBL database.

Installation

install.packages("drughelper")

Drughelper functionality

Drughelper is designed to be as simple to use as possible: a single function returns the main information for the input drugs. That function takes one argument, a character vector containing the names or synonyms of the drugs of interest:

library(drughelper)

vectorofdrugs <- c("Procaine", "Furazosin", "Embelin", "NotADrug")

Download data manually

The dataset used is downloaded automatically when checkDrugSynonym is called, but can also be downloaded manually:

downloadAbsentFile()

If data has already been downloaded, the function will not download anything.
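
The "download only if absent" behaviour described above can be illustrated with a minimal base R sketch. This is not how downloadAbsentFile is implemented; the helper ensure_file, the stand-in fake_fetch and the temporary path are all hypothetical names introduced for illustration.

```r
# Hypothetical helper: run `fetch(path)` only when `path` does not exist yet.
ensure_file <- function(path, fetch) {
  if (file.exists(path)) {
    return(invisible(FALSE))  # already cached, nothing to do
  }
  fetch(path)
  invisible(TRUE)
}

# Stand-in for a real download: writes a tiny placeholder file.
fake_fetch <- function(path) writeLines("placeholder drug table", path)

cache_path  <- tempfile(fileext = ".txt")
first_call  <- ensure_file(cache_path, fake_fetch)  # file is created
second_call <- ensure_file(cache_path, fake_fetch)  # existing file is reused
```

The second call returns without touching the file, which mirrors the statement that nothing is downloaded when the data are already present.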

What does drughelper return

checkDrugSynonym looks up possible synonyms for each drug in the input and returns a data frame with the best-matching synonym for each one. The “Matching” column reports one of three outcomes. An exact match is returned if the input matches either the name of a drug or any of its synonyms. If there is no exact match but a similar name is found, an approximate match is returned. Finally, if the drug is not found at all, “No match” is returned.
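
The three outcomes can be illustrated with a small base R sketch. This is an assumption-laden toy, not the package's actual algorithm: the synonym table toy, the function match_drug, the edit-distance threshold max_dist and the label "Approximate match" are all hypothetical, and the approximate step uses utils::adist (generalized Levenshtein distance), which may differ from what drughelper does internally.

```r
# Toy synonym table (illustrative only; not the drughelper dataset).
toy <- data.frame(
  name    = c("ASPIRIN", "TANESPIMYCIN", "MITOMYCIN"),
  synonym = c("ACETYLSALICYLIC ACID", "17-AAG", "MITOMYCIN-C"),
  stringsAsFactors = FALSE
)

# Hypothetical matcher: exact lookup first, then edit distance, else no match.
match_drug <- function(x, table, max_dist = 2) {
  key        <- toupper(trimws(x))
  candidates <- c(table$name, table$synonym)  # every known spelling
  preferred  <- c(table$name, table$name)     # preferred name for each spelling

  hit <- match(key, candidates)
  if (!is.na(hit)) {
    return(data.frame(x = x, suggested = preferred[hit],
                      matching = "Exact match", stringsAsFactors = FALSE))
  }

  d    <- utils::adist(key, candidates)[1, ]
  best <- which.min(d)
  if (d[best] <= max_dist) {
    # "Approximate match" is an assumed label, not taken from the package.
    return(data.frame(x = x, suggested = preferred[best],
                      matching = "Approximate match", stringsAsFactors = FALSE))
  }

  data.frame(x = x, suggested = NA_character_,
             matching = "No match", stringsAsFactors = FALSE)
}

inputs <- c("aspirin", "17-AAG", "Mitomycn", "NotADrug")
result <- do.call(rbind, lapply(inputs, match_drug, table = toy))
```

Here "aspirin" and "17-AAG" are resolved exactly (by name and by synonym respectively), the misspelled "Mitomycn" falls within the distance threshold of "MITOMYCIN", and "NotADrug" is left unmatched.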

checkDrugSynonym(vectorofdrugs)

Case studies

Two case studies are presented, in which we compare the number of drugs shared between different studies with and without the Drughelper function. The aim is to check whether matching each drug name against all of its synonyms yields more matches.

Case 1:

In this first approach, we compare four different studies: three from the PharmacoGx project and one from the BeatAML functional genomic study.

PharmacoGx is an R package that provides data from the Cancer Cell Line Encyclopedia (CCLE), the Genomics of Drug Sensitivity in Cancer project (GDSC) and the Connectivity Map (CMAP) from the Broad Institute. They contain 24, 139 and 5 unique drugs, respectively.

BeatAML is a program that comprises several datasets on acute myeloid leukemia (AML). Here we use its drug response dataset, which contains 122 unique drugs.

head(checkDrugSynonym(vector_CCLE))
#>            x Approved DrugHelperID Suggested.Synonym Cl.Phase    Matching
#> 1 PD-0325901     TRUE      DH05859      MIRDAMETINIB        2 Exact match
#> 2     17-AAG     TRUE      DH04193      TANESPIMYCIN        3 Exact match
#> 3     AEW541     TRUE      DH07870           AEW-541        1 Exact match
#> 4  Nilotinib     TRUE      DH01297         NILOTINIB        4 Exact match
#> 5 PHA-665752    FALSE         <NA>        PHA-665752  0 or NA    No match
#> 6  lapatinib     TRUE       DH0223         LAPATINIB        4 Exact match
head(checkDrugSynonym(vector_GDSC))
#>             x Approved DrugHelperID Suggested.Synonym Cl.Phase    Matching
#> 1 Doxorubicin     TRUE       DH0619       DOXORUBICIN        4 Exact match
#> 2   etoposide     TRUE       DH0586         ETOPOSIDE        4 Exact match
#> 3 Gemcitabine     TRUE       DH0604       GEMCITABINE        4 Exact match
#> 4 Mitomycin-C     TRUE       DH0210         MITOMYCIN        4 Exact match
#> 5 Vinorelbine     TRUE      DH01559       VINORELBINE        4 Exact match
#> 6   NSC-87877    FALSE         <NA>         NSC-87877  0 or NA    No match
head(checkDrugSynonym(vector_CMAP))
#>                      x Approved DrugHelperID Suggested.Synonym Cl.Phase
#> 1 acetylsalicylic acid     TRUE        DH042           ASPIRIN        4
#> 2        rosiglitazone     TRUE       DH0286     ROSIGLITAZONE        4
#> 3         alvespimycin     TRUE      DH05770      ALVESPIMYCIN        4
#> 4           vorinostat     TRUE       DH0182        VORINOSTAT        4
#> 5         pioglitazone     TRUE       DH0284      PIOGLITAZONE        4
#>      Matching
#> 1 Exact match
#> 2 Exact match
#> 3 Exact match
#> 4 Exact match
#> 5 Exact match
head(checkDrugSynonym(vBeatAML))
#>                       x Approved DrugHelperID Suggested.Synonym Cl.Phase
#> 1 17-AAG (Tanespimycin)     TRUE      DH04193      TANESPIMYCIN        3
#> 2              A-674563    FALSE         <NA>          A-674563  0 or NA
#> 3               ABT-737     TRUE      DH07797           ABT 737        1
#> 4                AT7519     TRUE      DH05915           AT-7519        2
#> 5               AZD1480     TRUE      DH06030          AZD-1480        2
#> 6  Afatinib (BIBW-2992)     TRUE      DH01656          AFATINIB        4
#>      Matching
#> 1 Exact match
#> 2    No match
#> 3 Exact match
#> 4 Exact match
#> 5 Exact match
#> 6 Exact match
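
Because the result is an ordinary data frame, it can be inspected with base R. The sketch below rebuilds the six CCLE rows printed above by hand so that it runs without the package; storing Cl.Phase as character is an assumption made here because of the "0 or NA" entry, and the variable names are illustrative.

```r
# Hand-copied from the CCLE output above (column types are assumed).
res <- data.frame(
  x = c("PD-0325901", "17-AAG", "AEW541", "Nilotinib", "PHA-665752", "lapatinib"),
  Approved = c(TRUE, TRUE, TRUE, TRUE, FALSE, TRUE),
  DrugHelperID = c("DH05859", "DH04193", "DH07870", "DH01297", NA, "DH0223"),
  Suggested.Synonym = c("MIRDAMETINIB", "TANESPIMYCIN", "AEW-541",
                        "NILOTINIB", "PHA-665752", "LAPATINIB"),
  Cl.Phase = c("2", "3", "1", "4", "0 or NA", "4"),
  Matching = c("Exact match", "Exact match", "Exact match",
               "Exact match", "No match", "Exact match"),
  stringsAsFactors = FALSE
)

# Inputs that could not be resolved.
unmatched <- res[res$Matching == "No match", "x"]

# Number of rows per matching type.
match_counts <- table(res$Matching)

# Matched inputs whose suggested name differs from the input beyond letter case.
renamed <- res[res$Matching != "No match" &
                 toupper(res$x) != res$Suggested.Synonym,
               c("x", "Suggested.Synonym")]
```

With a real call, res would simply be the value returned by checkDrugSynonym.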

Case 2:

A large dataset of 1996 drugs from DrugSniper is also compared with the other ones. DrugSniper is a tool to exploit loss-of-function screens, created by the bioengineering department at Tecnun, University of Navarra. The data used, called “Gene Info”, contain information on both approved and investigational drugs targeted at protein inhibition, retrieved from the publicly available ChEMBL and DrugBank online repositories.

Results

Drughelper improves drug name identification. For example, when GDSC is compared with DrugSniper (Gene Info) without the function, 24 drugs appear in both studies; when Drughelper is applied, the number increases to 64. Other cross-study comparisons show little improvement. CMAP, for instance, consists of only 5 entries and therefore cannot provide significant results.
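
The kind of comparison behind these counts can be sketched in base R as follows. The two study vectors and the lookup table are toy data assembled from name and synonym pairs shown in the outputs above, and harmonise is a hypothetical stand-in for taking the Suggested.Synonym column of checkDrugSynonym; the real counts (24 and 64) come from the full datasets, not from this example.

```r
# Toy drug lists for two studies (illustrative only).
study_a <- c("acetylsalicylic acid", "17-AAG", "Nilotinib", "Mitomycin-C")
study_b <- c("Aspirin", "Tanespimycin", "nilotinib", "Doxorubicin")

# Toy lookup from any known spelling to one preferred name.
lookup <- c(
  "ACETYLSALICYLIC ACID" = "ASPIRIN",
  "ASPIRIN"              = "ASPIRIN",
  "17-AAG"               = "TANESPIMYCIN",
  "TANESPIMYCIN"         = "TANESPIMYCIN",
  "NILOTINIB"            = "NILOTINIB",
  "MITOMYCIN-C"          = "MITOMYCIN",
  "DOXORUBICIN"          = "DOXORUBICIN"
)

# Hypothetical stand-in for checkDrugSynonym(x)$Suggested.Synonym.
harmonise <- function(x) unname(lookup[toupper(x)])

# Without harmonisation: only identical strings count as shared drugs.
raw_overlap <- intersect(study_a, study_b)

# With harmonisation: compare the preferred names instead.
harmonised_overlap <- intersect(harmonise(study_a), harmonise(study_b))

n_raw        <- length(raw_overlap)
n_harmonised <- length(harmonised_overlap)
```

In this toy case the raw lists share no identical strings, because even "Nilotinib" and "nilotinib" differ in case, while the harmonised lists share three drugs.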