drugfindR

library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr     1.1.4     ✔ readr     2.1.5
#> ✔ forcats   1.0.0     ✔ stringr   1.5.1
#> ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
#> ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
#> ✔ purrr     1.0.2     
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag()    masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(drugfindR)

Introduction

drugfindR provides end-users with a convenient method for accessing the Library of Integrated Network-Based Cellular Signatures (LINCS). The LINCS project aims to create a network-based understanding of biology by systematically cataloging changes in cellular processes, namely gene expression, that occur when cells are exposed to a variety of perturbing agents. iLINCS is an integrated web-based platform designed for the analysis of omics data and signatures of cellular perturbagens. While the iLINCS analysis workflows integrate vast omics data resources and a range of analytic and visual tools into a comprehensive platform, drugfindR is advantageous in that it is scriptable and usable from within R without relying on the iLINCS web platform. drugfindR also possesses the capability of running all input signatures simultaneously, which makes investigating a particular gene or drug extremely efficient. From the output data generated by drugfindR, end-users may understand how the overexpression or knockdown of a specific gene affects the expression of genes within the same cellular system, identify downstream molecular consequences of gene perturbation within a system, and investigate candidate drugs that may be repurposed for other physiological reasons.

Installation

drugfindR can be installed from GitHub using the devtools package:

devtools::install_github("CogDisResLab/drugfindR")

Use Cases

drugfindR has multiple features that make interfacing with the iLINCS database and analyzing LINCS data simple and efficient. However, the package is explicitly designed for two primary use cases:

  1. Using an input transcriptomic signature to identify candidate drugs in the iLINCS database
  2. Identifying drugs or other genes that are similar (or opposite) in function to a given drug or gene.

Package Design

This package provides two different ways to achieve these use cases. First, there is a set of five functions that can be deployed in a pipeline for the results. Then, there are two functions investigateTarget() and investigateSignature() that perform the entire pipeline in one function call with sensible defaults.

Pipeline Components

The five pipeline functions are:

  1. getSignature(): This function takes a LINCS ID and returns the corresponding signature.
  2. prepareSignature(): This function takes a transcriptomic signature and prepares it for analysis by drugfindR.
  3. filterSignature(): This function takes a signature and filters it to given thresholds.
  4. getConcordants(): This function takes a signature and returns the concordant signatures from the iLINCS database.
  5. consensusConcordants(): This function takes a list of concordant signatures and returns a list of consensus signature.

Use Case 1: Identifying Candidate Drugs from an Input Signature

For this case, we will use one of the signatures that was used in the paper [“Identification of candidate repurposable drugs to combat COVID - 19 using a signature - based approach” by O’Donovan, Imami, et al] (https://www.nature.com/articles/s41598-021-84044-9).

In that paper, the authors used the available gene expression data from cells infected with SARS-CoV-2 to identify potential drugs that could be repurposed to treat COVID-19. We will use one of the signatures that they have provided in their paper to showcase how drugfindR can be used to identify candidate drugs from an input signature. We will use the dCovid_diffexp.tsv signature from the paper.

Step 1: Get the Signature

This signature is available with the package. Our first step is to download the signature so we can work with it. The read_tsv() function from the readr package can be used to read the signature into R from a remote URL or a local file.

# Load the signature from the paper

diffexp <- read_tsv(
    system.file("extdata", "dCovid_diffexp.tsv",
        package = "drugfindR"
    )
)
#> Rows: 4090 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (1): hgnc_symbol
#> dbl (2): logFC, PValue
#> 
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Take a look at the signature

head(diffexp) |>
    knitr::kable()
hgnc_symbol logFC PValue
CCL4L2 -3.976903 1.80e-06
IL5RA -4.830082 8.70e-06
FN1 4.942690 1.17e-05
GSTM1 -8.211472 1.53e-05
CD180 2.153878 2.02e-05
FAM20C 3.112871 2.55e-05

We can see that the signature has ncol(diffexp) columns and nrow(diffexp) rows. The names of the columns are typical of what you would get from edgeR or DESeq2.

Step 2: Prepare the Signature

The next step is to prepare the signature for analysis by drugfindR. This step is necessary because the signature can be in many different formats, with different names for columns. iLICNS needs columns to be in a specific order and with specific names. The prepareSignature() function takes care of this for us.

prepareSignature() takes three optional arguments:

  1. geneColumn: The name of the column in the input that contains the gene names. The default is "Symbol".
  2. logfcColumn: The name of the column in the input that contains the log fold change values. The default is "logFC".
  3. pvalColumn: The name of the column in the input that contains the p-values. The default is "PValue".
# Prepare the signature for analysis
# The only thing that is different from the defaults is the gene_column
# However, we will specify all three arguments for clarity

signature <- prepareSignature(diffexp,
    geneColumn = "hgnc_symbol",
    logfcColumn = "logFC", pvalColumn = "PValue"
)

# Take a look at the signature

head(signature) |>
    knitr::kable()
signatureID ID_geneid Name_GeneSymbol Value_LogDiffExp Significance_pvalue
1 InputSig 4860 PNP 1.709692 0.0024364
5 InputSig 4125 MAN2B1 1.100506 0.0030275
8 InputSig 2274 FHL2 1.330287 0.1423757
14 InputSig 351 APP 1.050513 0.0108441
26 InputSig 7077 TIMP2 1.795990 0.0124167
29 InputSig 23659 PLA2G15 1.113303 0.0495367

We can see that the signature has been reordered and renamed. The first column is now names(signature)[1], the second column is now names(signature)[2], and the third column is now names(signature)[3], which is what iLINCS expects.

Step 3: Filter the Signature

Now that we have the signature in the correct format and filtered to the L1000 genes, we can filter it to the thresholds that we want. This filter step is necessary because we would like to use the genes that have a high enough change for it to matter.

The filterSignature() function can filter based on logFC values in two ways:

  1. Absolute Threshold: You can give an absolute threshold ( or a pair of absolute thresholds) for the logFC values. Any genes that do not meet the threshold will be removed from the signature.

  2. Percentile Threshold: You can give a percentile threshold (or a pair of percentile thresholds) for the logFC values. Any genes that do not meet the threshold will be removed from the signature.

The filterSignature() function takes three arguments:

  1. signature: The signature to filter.
  2. direction: This argument specifies whether to filter for upregulated genes, downregulated genes, or both. The default is "any".
  3. One of threshold or prop: The threshold argument is used to specify an absolute threshold (or a pair of absolute thresholds) for the logFC values. The prop argument is used to specify a percentile threshold (or a pair of percentile thresholds) for the logFC values. They can not be specified together.
# Filter the signature to only include genes that are upregulated by at least
# 1.5 logFC

filteredSignatureUp <- filterSignature(signature,
    direction = "up",
    threshold = 1.5
)

filteredSignatureUp |>
    head() |>
    knitr::kable()
signatureID ID_geneid Name_GeneSymbol Value_LogDiffExp Significance_pvalue
InputSig 4860 PNP 1.709692 0.0024364
InputSig 7077 TIMP2 1.795990 0.0124167
InputSig 890 CCNA2 1.750080 0.0163805
InputSig 991 CDC20 3.399809 0.0003236
InputSig 9093 DNAJA3 2.340929 0.0199756
InputSig 1509 CTSD 1.547209 0.0019360
# Filter the signature to only include genes that are downregulated by at least
# 1.5 logFC
filteredSignatureDn <- filterSignature(signature,
    direction = "down",
    threshold = 1.5
)

filteredSignatureDn |>
    head() |>
    knitr::kable()
signatureID ID_geneid Name_GeneSymbol Value_LogDiffExp Significance_pvalue
InputSig 2810 SFN -2.056475 0.0151082
InputSig 3815 KIT -1.635917 0.0225406
InputSig 2625 GATA3 -2.310024 0.0018368
InputSig 10123 ARL4C -1.638483 0.0005734
InputSig 2624 GATA2 -2.728454 0.0110850
InputSig 2946 GSTM2 -1.666703 0.0055149

Step 4: Get the Concordant Signatures

Now that we have the filtered signatures for both upregulated and downregulated genes, we can get the concordant signatures from the iLINCS database. The getConcordants() function takes a signature and returns the concordant signatures from the iLINCS database. It also requires specification of the database to target for the concordant signatures.

The getConcordants() function takes the following arguments:

  1. signature: The signature to get concordant signatures for.
  2. ilincsLibrary: The iLINCS library to target for concordant signatures. This can be one of c(“OE”, “KD”, “CP”), standing for overexpression, knockdown, and chemical perturbagens, respectively.
  3. direction: This argument specifies whether the input signature is upregulated or downregulated. This is useful to annotate the output. This is NULL by default.
# Get the concordant signatures for the upregulated signature

upConcordants <- getConcordants(filteredSignatureUp, ilincsLibrary = "CP")

upConcordants |>
    head() |>
    knitr::kable()
signatureid compound concentration time cellline similarity pValue sig_direction
LINCSCP_99767 Mitoxantrone 0.04uM 24h HCC515 0.6259367 0e+00 Up
LINCSCP_207881 Navitoclax 10uM 6h H1299 0.6121090 0e+00 Up
LINCSCP_207785 Myriocin 10uM 6h H1299 0.5997347 0e+00 Up
LINCSCP_285868 CHEMBL259628 10uM 6h SKM1 0.5835506 1e-07 Up
LINCSCP_163143 Crizotinib 0.12uM 3h SKBR3 0.5759650 1e-07 Up
LINCSCP_64976 PP-242 0.5uM 24h VCAP -0.5674210 2e-07 Up

# Get the concordant signatures for the downregulated signature

dnConcordants <- getConcordants(filteredSignatureDn, ilincsLibrary = "CP")

dnConcordants |>
    head() |>
    knitr::kable()
signatureid compound concentration time cellline similarity pValue sig_direction
LINCSCP_260147 PD-98059 10uM 6h NEU 0.9276283 6.00e-07 Down
LINCSCP_149954 848193-68-0 0.12uM 24h NPC.TAK -0.9235333 9.00e-07 Down
LINCSCP_40982 Devazepide 10uM 24h NEU 0.9133641 2.00e-06 Down
LINCSCP_292245 MLS003568137 10uM 24h VCAP -0.8890042 9.30e-06 Down
LINCSCP_150380 VX-745 0.04uM 24h NPC.TAK 0.8854791 1.13e-05 Down
LINCSCP_150801 Mitoxantrone 1.11uM 24h PC3 0.8736419 2.09e-05 Down

Step 5: Get the list of Consensus Concordant Signatures

Now that we have the concordant signatures for both the upregulated and downregulated signatures, we can get the list of consensus concordant signatures. The consensusConcordants() function takes a list of concordant signatures and returns a list of consensus signatures. This function also takes a number of optional arguments that can be used to control the consensus list generation.

By default the consensus list performs the following steps:

  1. Combine the list of concordant signatures into a single data frame.
  2. For each individual signature origin (Gene or Drug), choose the one with the largest absolute concordance value.

Additionally, we can filter by the cell line to only include the cell lines of interest.

The consensusConcordants() function takes the following arguments:

  1. ...: One or Two (see paired) Data Frames with the concordants
  2. paired: A logical value indicating whether the input is a single data frame with paired signatures or two data frames with unpaired signatures. The default is FALSE.
  3. cellLines: A character vector of cell lines to filter the consensus list to. The default is NULL, which means no filtering.
  4. cutoff: The absolute cutoff value of similarity to use when filtering the consensus list. The default is 0.321.
# Get the consensus concordant signatures for the upregulated signature

consensus <- consensusConcordants(upConcordants, dnConcordants,
    paired = TRUE, cutoff = 0.2
)

consensus |>
    head() |>
    knitr::kable()
TargetSignature Target TargetCellLine TargetTime TargetConcentration Similarity SignatureDirection pValue
LINCSCP_260147 PD-98059 NEU 6h 10uM 0.9276283 Down 6.00e-07
LINCSCP_149954 848193-68-0 NPC.TAK 24h 0.12uM -0.9235333 Down 9.00e-07
LINCSCP_40982 Devazepide NEU 24h 10uM 0.9133641 Down 2.00e-06
LINCSCP_292245 MLS003568137 VCAP 24h 10uM -0.8890042 Down 9.30e-06
LINCSCP_150380 VX-745 NPC.TAK 24h 0.04uM 0.8854791 Down 1.13e-05
LINCSCP_150801 Mitoxantrone PC3 24h 1.11uM 0.8736419 Down 2.09e-05

Alternate One-Step Method

The above method breaks down the entire method into five steps. However, drugfindR also provides two functions that perform the entire pipeline in one function call with sensible defaults. These functions are investigateTarget() and investigateSignature().

For this use case, investigateSignature() is the function that we want to use. It takes the following required arguments:

  1. expr: The signature to investigate.
  2. outputLib: The iLINCS library to target for concordant signatures. This can be one of c(“OE”, “KD”, “CP”), standing for overexpression, knockdown, and chemical perturbagens, respectively.
  3. filterThreshold: The absolute threshold (or a pair of absolute thresholds) for the logFC values. Any genes that do not meet the threshold will be removed from the signature.
  4. filterProp: The percentile threshold (or a pair of percentile thresholds) for the logFC values. Any genes that do not meet the threshold will be removed from the signature.

Other arguments that have sensible defaults are:

  1. similarityThreshold: The absolute cutoff value of similarity to use when filtering the consensus list. The default is 0.2.
  2. paired: A logical value indicating whether the to split the input dataframe in up and downregulated signatures. The default is TRUE.
  3. outputCellLines: A character vector of cell lines to filter the consensus list to. The default is NULL, which means no filtering.
  4. geneColumn: The name of the column in the input that contains the gene names. The default is "Symbol".
  5. logfcColumn: The name of the column in the input that contains the log fold change values. The default is "logFC".
  6. pvalColumn: The name of the column in the input that contains the p-values. The default is "PValue".
  7. sourceName: The name of the source of the signature. The default is "Input".
  8. sourceCellLine: The cell line of the source of the signature. The default is "NA".
  9. sourceTime: The time of the source of the signature. The default is "NA".
  10. sourceConcentration: The concentration of the source of the signature. The default is "NA".
investigated <- investigateSignature(diffexp,
    outputLib = "CP", filterThreshold = 1.5,
    geneColumn = "hgnc_symbol", logfcColumn = "logFC",
    pvalColumn = "PValue"
)

investigated |>
    head() |>
    knitr::kable()
Source Target Similarity SourceSignature SourceCellLine SourceTime TargetSignature TargetCellLine TargetConcentration TargetTime
Input PD-98059 0.9276283 InputSig NA NA LINCSCP_260147 NEU 10uM 6h
Input 848193-68-0 -0.9235333 InputSig NA NA LINCSCP_149954 NPC.TAK 0.12uM 24h
Input Devazepide 0.9133641 InputSig NA NA LINCSCP_40982 NEU 10uM 24h
Input MLS003568137 -0.8890042 InputSig NA NA LINCSCP_292245 VCAP 10uM 24h
Input VX-745 0.8854791 InputSig NA NA LINCSCP_150380 NPC.TAK 0.04uM 24h
Input Mitoxantrone 0.8736419 InputSig NA NA LINCSCP_150801 PC3 1.11uM 24h

Environment Setup

devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.4.0 (2024-04-24)
#>  os       Ubuntu 24.04 LTS
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language (EN)
#>  collate  C
#>  ctype    en_US.UTF-8
#>  tz       Etc/UTC
#>  date     2024-06-17
#>  pandoc   3.1.12.1 @ /usr/local/bin/ (via rmarkdown)
#> 
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package      * version  date (UTC) lib source
#>  BiocGenerics   0.51.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  BiocManager    1.30.23  2024-05-04 [2] RSPM (R 4.4.0)
#>  BiocStyle    * 2.33.1   2024-06-12 [2] Bioconductor 3.20 (R 4.4.0)
#>  bit            4.0.5    2022-11-15 [2] RSPM (R 4.4.0)
#>  bit64          4.0.5    2020-08-30 [2] RSPM (R 4.4.0)
#>  bslib          0.7.0    2024-03-29 [2] RSPM (R 4.4.0)
#>  buildtools     1.0.0    2024-06-08 [3] local (/pkg)
#>  cachem         1.1.0    2024-05-16 [2] RSPM (R 4.4.0)
#>  cli            3.6.2    2023-12-11 [2] RSPM (R 4.4.0)
#>  colorspace     2.1-0    2023-01-23 [2] RSPM (R 4.4.0)
#>  crayon         1.5.2    2022-09-29 [2] RSPM (R 4.4.0)
#>  curl           5.2.1    2024-03-01 [2] RSPM (R 4.4.0)
#>  devtools       2.4.5    2022-10-11 [2] RSPM (R 4.4.0)
#>  digest         0.6.35   2024-03-11 [2] RSPM (R 4.4.0)
#>  dplyr        * 1.1.4    2023-11-17 [2] RSPM (R 4.4.0)
#>  drugfindR    * 0.99.657 2024-06-17 [1] https://cogdisreslab.r-universe.dev (R 4.4.0)
#>  ellipsis       0.3.2    2021-04-29 [2] RSPM (R 4.4.0)
#>  evaluate       0.24.0   2024-06-10 [2] RSPM (R 4.4.0)
#>  fansi          1.0.6    2023-12-08 [2] RSPM (R 4.4.0)
#>  fastmap        1.2.0    2024-05-15 [2] RSPM (R 4.4.0)
#>  forcats      * 1.0.0    2023-01-29 [2] RSPM (R 4.4.0)
#>  fs             1.6.4    2024-04-25 [2] RSPM (R 4.4.0)
#>  generics       0.1.3    2022-07-05 [2] RSPM (R 4.4.0)
#>  ggplot2      * 3.5.1    2024-04-23 [2] RSPM (R 4.4.0)
#>  glue           1.7.0    2024-01-09 [2] RSPM (R 4.4.0)
#>  gtable         0.3.5    2024-04-22 [2] RSPM (R 4.4.0)
#>  hms            1.1.3    2023-03-21 [2] RSPM (R 4.4.0)
#>  htmltools      0.5.8.1  2024-04-04 [2] RSPM (R 4.4.0)
#>  htmlwidgets    1.6.4    2023-12-06 [2] RSPM (R 4.4.0)
#>  httpuv         1.6.15   2024-03-26 [2] RSPM (R 4.4.0)
#>  httr           1.4.7    2023-08-15 [2] RSPM (R 4.4.0)
#>  jquerylib      0.1.4    2021-04-26 [2] RSPM (R 4.4.0)
#>  jsonlite       1.8.8    2023-12-04 [2] RSPM (R 4.4.0)
#>  knitr          1.47     2024-05-29 [2] RSPM (R 4.4.0)
#>  later          1.3.2    2023-12-06 [2] RSPM (R 4.4.0)
#>  lifecycle      1.0.4    2023-11-07 [2] RSPM (R 4.4.0)
#>  lubridate    * 1.9.3    2023-09-27 [2] RSPM (R 4.4.0)
#>  magrittr       2.0.3    2022-03-30 [2] RSPM (R 4.4.0)
#>  maketools      1.3.0    2024-02-25 [3] RSPM (R 4.4.0)
#>  memoise        2.0.1    2021-11-26 [2] RSPM (R 4.4.0)
#>  mime           0.12     2021-09-28 [2] RSPM (R 4.4.0)
#>  miniUI         0.1.1.1  2018-05-18 [2] RSPM (R 4.4.0)
#>  munsell        0.5.1    2024-04-01 [2] RSPM (R 4.4.0)
#>  pillar         1.9.0    2023-03-22 [2] RSPM (R 4.4.0)
#>  pkgbuild       1.4.4    2024-03-17 [2] RSPM (R 4.4.0)
#>  pkgconfig      2.0.3    2019-09-22 [2] RSPM (R 4.4.0)
#>  pkgload        1.3.4    2024-01-16 [2] RSPM (R 4.4.0)
#>  profvis        0.3.8    2023-05-02 [2] RSPM (R 4.4.0)
#>  promises       1.3.0    2024-04-05 [2] RSPM (R 4.4.0)
#>  purrr        * 1.0.2    2023-08-10 [2] RSPM (R 4.4.0)
#>  R6             2.5.1    2021-08-19 [2] RSPM (R 4.4.0)
#>  Rcpp           1.0.12   2024-01-09 [2] RSPM (R 4.4.0)
#>  readr        * 2.1.5    2024-01-10 [2] RSPM (R 4.4.0)
#>  remotes        2.5.0    2024-03-17 [2] RSPM (R 4.4.0)
#>  rlang          1.1.4    2024-06-04 [2] RSPM (R 4.4.0)
#>  rmarkdown      2.27     2024-05-17 [2] RSPM (R 4.4.0)
#>  S4Vectors      0.43.0   2024-05-01 [2] Bioconductor 3.20 (R 4.4.0)
#>  sass           0.4.9    2024-03-15 [2] RSPM (R 4.4.0)
#>  scales         1.3.0    2023-11-28 [2] RSPM (R 4.4.0)
#>  sessioninfo    1.2.2    2021-12-06 [2] RSPM (R 4.4.0)
#>  shiny          1.8.1.1  2024-04-02 [2] RSPM (R 4.4.0)
#>  stringi        1.8.4    2024-05-06 [2] RSPM (R 4.4.0)
#>  stringr      * 1.5.1    2023-11-14 [2] RSPM (R 4.4.0)
#>  sys            3.4.2    2023-05-23 [2] RSPM (R 4.4.0)
#>  tibble       * 3.2.1    2023-03-20 [2] RSPM (R 4.4.0)
#>  tidyr        * 1.3.1    2024-01-24 [2] RSPM (R 4.4.0)
#>  tidyselect     1.2.1    2024-03-11 [2] RSPM (R 4.4.0)
#>  tidyverse    * 2.0.0    2023-02-22 [2] RSPM (R 4.4.0)
#>  timechange     0.3.0    2024-01-18 [2] RSPM (R 4.4.0)
#>  tzdb           0.4.0    2023-05-12 [2] RSPM (R 4.4.0)
#>  urlchecker     1.0.1    2021-11-30 [2] RSPM (R 4.4.0)
#>  usethis        2.2.3    2024-02-19 [2] RSPM (R 4.4.0)
#>  utf8           1.2.4    2023-10-22 [2] RSPM (R 4.4.0)
#>  vctrs          0.6.5    2023-12-01 [2] RSPM (R 4.4.0)
#>  vroom          1.6.5    2023-12-05 [2] RSPM (R 4.4.0)
#>  withr          3.0.0    2024-01-16 [2] RSPM (R 4.4.0)
#>  xfun           0.45     2024-06-16 [2] RSPM (R 4.4.0)
#>  xtable         1.8-4    2019-04-21 [2] RSPM (R 4.4.0)
#>  yaml           2.3.8    2023-12-11 [2] RSPM (R 4.4.0)
#> 
#>  [1] /tmp/Rtmp6bXqJc/Rinst26d0400f2893
#>  [2] /github/workspace/pkglib
#>  [3] /usr/local/lib/R/site-library
#>  [4] /usr/lib/R/site-library
#>  [5] /usr/lib/R/library
#> 
#> ──────────────────────────────────────────────────────────────────────────────