library(tidyverse)
#> ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
#> ✔ dplyr 1.1.4 ✔ readr 2.1.5
#> ✔ forcats 1.0.0 ✔ stringr 1.5.1
#> ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
#> ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
#> ✔ purrr 1.0.2
#> ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
#> ✖ dplyr::filter() masks stats::filter()
#> ✖ dplyr::lag() masks stats::lag()
#> ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(drugfindR)
drugfindR
provides end-users with a convenient method
for accessing the Library of Integrated Network-Based Cellular
Signatures (LINCS). The LINCS project aims to create a network-based
understanding of biology by systematically cataloging changes in
cellular processes, namely gene expression, that occur when cells are
exposed to a variety of perturbing agents. iLINCS is an integrated
web-based platform designed for the analysis of omics data and
signatures of cellular perturbagens. While the iLINCS analysis workflows
integrate vast omics data resources and a range of analytic and visual
tools into a comprehensive platform, drugfindR
is
advantageous in that it is scriptable and usable from within R without
relying on the iLINCS web platform. drugfindR
also
possesses the capability of running all input signatures simultaneously,
which makes investigating a particular gene or drug extremely efficient.
From the output data generated by drugfindR
, end-users may
understand how the overexpression or knockdown of a specific gene
affects the expression of genes within the same cellular system,
identify downstream molecular consequences of gene perturbation within a
system, and investigate candidate drugs that may be repurposed for other
physiological reasons.
drugfindR
can be installed from GitHub using the
devtools
package:
drugfindR
has multiple features that make interfacing
with the iLINCS database and analyzing LINCS data simple and efficient.
However, the package is explicitly designed for two primary use
cases:
This package provides two different ways to achieve these use cases.
First, there is a set of five functions that can be deployed in a
pipeline for the results. Then, there are two functions
investigateTarget()
and investigateSignature()
that perform the entire pipeline in one function call with sensible
defaults.
The five pipeline functions are:
getSignature()
: This function takes a LINCS ID and
returns the corresponding signature.prepareSignature()
: This function takes a
transcriptomic signature and prepares it for analysis by
drugfindR
.filterSignature()
: This function takes a signature and
filters it to given thresholds.getConcordants()
: This function takes a signature and
returns the concordant signatures from the iLINCS database.consensusConcordants()
: This function takes a list of
concordant signatures and returns a list of consensus signature.For this case, we will use one of the signatures that was used in the paper [“Identification of candidate repurposable drugs to combat COVID - 19 using a signature - based approach” by O’Donovan, Imami, et al] (https://www.nature.com/articles/s41598-021-84044-9).
In that paper, the authors used the available gene expression data
from cells infected with SARS-CoV-2 to identify potential drugs that
could be repurposed to treat COVID-19. We will use one of the signatures
that they have provided in their paper to showcase how
drugfindR
can be used to identify candidate drugs from an
input signature. We will use the dCovid_diffexp.tsv
signature from the paper.
This signature is available with the package. Our first step is to
download the signature so we can work with it. The
read_tsv()
function from the readr
package can
be used to read the signature into R from a remote URL or a local
file.
# Load the signature from the paper
diffexp <- read_tsv(
system.file("extdata", "dCovid_diffexp.tsv",
package = "drugfindR"
)
)
#> Rows: 4090 Columns: 3
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: "\t"
#> chr (1): hgnc_symbol
#> dbl (2): logFC, PValue
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Take a look at the signature
head(diffexp) |>
knitr::kable()
hgnc_symbol | logFC | PValue |
---|---|---|
CCL4L2 | -3.976903 | 1.80e-06 |
IL5RA | -4.830082 | 8.70e-06 |
FN1 | 4.942690 | 1.17e-05 |
GSTM1 | -8.211472 | 1.53e-05 |
CD180 | 2.153878 | 2.02e-05 |
FAM20C | 3.112871 | 2.55e-05 |
We can see that the signature has ncol(diffexp)
columns
and nrow(diffexp)
rows. The names of the columns are
typical of what you would get from edgeR
or DESeq2.
The next step is to prepare the signature for analysis by
drugfindR
. This step is necessary because the signature can
be in many different formats, with different names for columns. iLICNS
needs columns to be in a specific order and with specific names. The
prepareSignature()
function takes care of this for us.
prepareSignature()
takes three optional arguments:
geneColumn
: The name of the column in the input that
contains the gene names. The default is "Symbol"
.logfcColumn
: The name of the column in the input that
contains the log fold change values. The default is
"logFC"
.pvalColumn
: The name of the column in the input that
contains the p-values. The default is "PValue"
.# Prepare the signature for analysis
# The only thing that is different from the defaults is the gene_column
# However, we will specify all three arguments for clarity
signature <- prepareSignature(diffexp,
geneColumn = "hgnc_symbol",
logfcColumn = "logFC", pvalColumn = "PValue"
)
# Take a look at the signature
head(signature) |>
knitr::kable()
signatureID | ID_geneid | Name_GeneSymbol | Value_LogDiffExp | Significance_pvalue | |
---|---|---|---|---|---|
1 | InputSig | 4860 | PNP | 1.709692 | 0.0024364 |
5 | InputSig | 4125 | MAN2B1 | 1.100506 | 0.0030275 |
8 | InputSig | 2274 | FHL2 | 1.330287 | 0.1423757 |
14 | InputSig | 351 | APP | 1.050513 | 0.0108441 |
26 | InputSig | 7077 | TIMP2 | 1.795990 | 0.0124167 |
29 | InputSig | 23659 | PLA2G15 | 1.113303 | 0.0495367 |
We can see that the signature has been reordered and renamed. The
first column is now names(signature)[1]
, the second column
is now names(signature)[2]
, and the third column is now
names(signature)[3]
, which is what iLINCS expects.
Now that we have the signature in the correct format and filtered to the L1000 genes, we can filter it to the thresholds that we want. This filter step is necessary because we would like to use the genes that have a high enough change for it to matter.
The filterSignature()
function can filter based on logFC
values in two ways:
Absolute Threshold: You can give an absolute threshold ( or a pair of absolute thresholds) for the logFC values. Any genes that do not meet the threshold will be removed from the signature.
Percentile Threshold: You can give a percentile threshold (or a pair of percentile thresholds) for the logFC values. Any genes that do not meet the threshold will be removed from the signature.
The filterSignature()
function takes three
arguments:
signature
: The signature to filter.direction
: This argument specifies whether to filter
for upregulated genes, downregulated genes, or both. The default is
"any"
.threshold
or prop
: The threshold
argument is used to specify an absolute threshold (or a pair of absolute
thresholds) for the logFC values. The prop argument is used to specify a
percentile threshold (or a pair of percentile thresholds) for the logFC
values. They can not be specified together.# Filter the signature to only include genes that are upregulated by at least
# 1.5 logFC
filteredSignatureUp <- filterSignature(signature,
direction = "up",
threshold = 1.5
)
filteredSignatureUp |>
head() |>
knitr::kable()
signatureID | ID_geneid | Name_GeneSymbol | Value_LogDiffExp | Significance_pvalue |
---|---|---|---|---|
InputSig | 4860 | PNP | 1.709692 | 0.0024364 |
InputSig | 7077 | TIMP2 | 1.795990 | 0.0124167 |
InputSig | 890 | CCNA2 | 1.750080 | 0.0163805 |
InputSig | 991 | CDC20 | 3.399809 | 0.0003236 |
InputSig | 9093 | DNAJA3 | 2.340929 | 0.0199756 |
InputSig | 1509 | CTSD | 1.547209 | 0.0019360 |
# Filter the signature to only include genes that are downregulated by at least
# 1.5 logFC
filteredSignatureDn <- filterSignature(signature,
direction = "down",
threshold = 1.5
)
filteredSignatureDn |>
head() |>
knitr::kable()
signatureID | ID_geneid | Name_GeneSymbol | Value_LogDiffExp | Significance_pvalue |
---|---|---|---|---|
InputSig | 2810 | SFN | -2.056475 | 0.0151082 |
InputSig | 3815 | KIT | -1.635917 | 0.0225406 |
InputSig | 2625 | GATA3 | -2.310024 | 0.0018368 |
InputSig | 10123 | ARL4C | -1.638483 | 0.0005734 |
InputSig | 2624 | GATA2 | -2.728454 | 0.0110850 |
InputSig | 2946 | GSTM2 | -1.666703 | 0.0055149 |
Now that we have the filtered signatures for both upregulated and
downregulated genes, we can get the concordant signatures from the
iLINCS database. The getConcordants()
function takes a
signature and returns the concordant signatures from the iLINCS
database. It also requires specification of the database to target for
the concordant signatures.
The getConcordants()
function takes the following
arguments:
signature
: The signature to get concordant signatures
for.ilincsLibrary
: The iLINCS library to target for
concordant signatures. This can be one of c(“OE”, “KD”, “CP”), standing
for overexpression, knockdown, and chemical perturbagens,
respectively.direction
: This argument specifies whether the input
signature is upregulated or downregulated. This is useful to annotate
the output. This is NULL
by default.# Get the concordant signatures for the upregulated signature
upConcordants <- getConcordants(filteredSignatureUp, ilincsLibrary = "CP")
upConcordants |>
head() |>
knitr::kable()
signatureid | compound | concentration | time | cellline | similarity | pValue | sig_direction |
---|---|---|---|---|---|---|---|
LINCSCP_99767 | Mitoxantrone | 0.04uM | 24h | HCC515 | 0.6259367 | 0e+00 | Up |
LINCSCP_207881 | Navitoclax | 10uM | 6h | H1299 | 0.6121090 | 0e+00 | Up |
LINCSCP_207785 | Myriocin | 10uM | 6h | H1299 | 0.5997347 | 0e+00 | Up |
LINCSCP_285868 | CHEMBL259628 | 10uM | 6h | SKM1 | 0.5835506 | 1e-07 | Up |
LINCSCP_163143 | Crizotinib | 0.12uM | 3h | SKBR3 | 0.5759650 | 1e-07 | Up |
LINCSCP_64976 | PP-242 | 0.5uM | 24h | VCAP | -0.5674210 | 2e-07 | Up |
# Get the concordant signatures for the downregulated signature
dnConcordants <- getConcordants(filteredSignatureDn, ilincsLibrary = "CP")
dnConcordants |>
head() |>
knitr::kable()
signatureid | compound | concentration | time | cellline | similarity | pValue | sig_direction |
---|---|---|---|---|---|---|---|
LINCSCP_260147 | PD-98059 | 10uM | 6h | NEU | 0.9276283 | 6.00e-07 | Down |
LINCSCP_149954 | 848193-68-0 | 0.12uM | 24h | NPC.TAK | -0.9235333 | 9.00e-07 | Down |
LINCSCP_40982 | Devazepide | 10uM | 24h | NEU | 0.9133641 | 2.00e-06 | Down |
LINCSCP_292245 | MLS003568137 | 10uM | 24h | VCAP | -0.8890042 | 9.30e-06 | Down |
LINCSCP_150380 | VX-745 | 0.04uM | 24h | NPC.TAK | 0.8854791 | 1.13e-05 | Down |
LINCSCP_150801 | Mitoxantrone | 1.11uM | 24h | PC3 | 0.8736419 | 2.09e-05 | Down |
Now that we have the concordant signatures for both the upregulated
and downregulated signatures, we can get the list of consensus
concordant signatures. The consensusConcordants()
function
takes a list of concordant signatures and returns a list of consensus
signatures. This function also takes a number of optional arguments that
can be used to control the consensus list generation.
By default the consensus list performs the following steps:
Additionally, we can filter by the cell line to only include the cell lines of interest.
The consensusConcordants()
function takes the following
arguments:
...
: One or Two (see paired) Data Frames with the
concordantspaired
: A logical value indicating whether the input is
a single data frame with paired signatures or two data frames with
unpaired signatures. The default is FALSE
.cellLines
: A character vector of cell lines to filter
the consensus list to. The default is NULL
, which means no
filtering.cutoff
: The absolute cutoff value of similarity to use
when filtering the consensus list. The default is
0.321
.# Get the consensus concordant signatures for the upregulated signature
consensus <- consensusConcordants(upConcordants, dnConcordants,
paired = TRUE, cutoff = 0.2
)
consensus |>
head() |>
knitr::kable()
TargetSignature | Target | TargetCellLine | TargetTime | TargetConcentration | Similarity | SignatureDirection | pValue |
---|---|---|---|---|---|---|---|
LINCSCP_260147 | PD-98059 | NEU | 6h | 10uM | 0.9276283 | Down | 6.00e-07 |
LINCSCP_149954 | 848193-68-0 | NPC.TAK | 24h | 0.12uM | -0.9235333 | Down | 9.00e-07 |
LINCSCP_40982 | Devazepide | NEU | 24h | 10uM | 0.9133641 | Down | 2.00e-06 |
LINCSCP_292245 | MLS003568137 | VCAP | 24h | 10uM | -0.8890042 | Down | 9.30e-06 |
LINCSCP_150380 | VX-745 | NPC.TAK | 24h | 0.04uM | 0.8854791 | Down | 1.13e-05 |
LINCSCP_150801 | Mitoxantrone | PC3 | 24h | 1.11uM | 0.8736419 | Down | 2.09e-05 |
The above method breaks down the entire method into five steps.
However, drugfindR
also provides two functions that perform
the entire pipeline in one function call with sensible defaults. These
functions are investigateTarget()
and
investigateSignature()
.
For this use case, investigateSignature()
is the
function that we want to use. It takes the following required
arguments:
expr
: The signature to investigate.outputLib
: The iLINCS library to target for concordant
signatures. This can be one of c(“OE”, “KD”, “CP”), standing for
overexpression, knockdown, and chemical perturbagens, respectively.filterThreshold
: The absolute threshold (or a pair of
absolute thresholds) for the logFC values. Any genes that do not meet
the threshold will be removed from the signature.filterProp
: The percentile threshold (or a pair of
percentile thresholds) for the logFC values. Any genes that do not meet
the threshold will be removed from the signature.Other arguments that have sensible defaults are:
similarityThreshold
: The absolute cutoff value of
similarity to use when filtering the consensus list. The default is
0.2
.paired
: A logical value indicating whether the to split
the input dataframe in up and downregulated signatures. The default is
TRUE
.outputCellLines
: A character vector of cell lines to
filter the consensus list to. The default is NULL
, which
means no filtering.geneColumn
: The name of the column in the input that
contains the gene names. The default is "Symbol"
.logfcColumn
: The name of the column in the input that
contains the log fold change values. The default is
"logFC"
.pvalColumn
: The name of the column in the input that
contains the p-values. The default is "PValue"
.sourceName
: The name of the source of the signature.
The default is "Input"
.sourceCellLine
: The cell line of the source of the
signature. The default is "NA"
.sourceTime
: The time of the source of the signature.
The default is "NA"
.sourceConcentration
: The concentration of the source of
the signature. The default is "NA"
.investigated <- investigateSignature(diffexp,
outputLib = "CP", filterThreshold = 1.5,
geneColumn = "hgnc_symbol", logfcColumn = "logFC",
pvalColumn = "PValue"
)
investigated |>
head() |>
knitr::kable()
Source | Target | Similarity | SourceSignature | SourceCellLine | SourceTime | TargetSignature | TargetCellLine | TargetConcentration | TargetTime |
---|---|---|---|---|---|---|---|---|---|
Input | PD-98059 | 0.9276283 | InputSig | NA | NA | LINCSCP_260147 | NEU | 10uM | 6h |
Input | 848193-68-0 | -0.9235333 | InputSig | NA | NA | LINCSCP_149954 | NPC.TAK | 0.12uM | 24h |
Input | Devazepide | 0.9133641 | InputSig | NA | NA | LINCSCP_40982 | NEU | 10uM | 24h |
Input | MLS003568137 | -0.8890042 | InputSig | NA | NA | LINCSCP_292245 | VCAP | 10uM | 24h |
Input | VX-745 | 0.8854791 | InputSig | NA | NA | LINCSCP_150380 | NPC.TAK | 0.04uM | 24h |
Input | Mitoxantrone | 0.8736419 | InputSig | NA | NA | LINCSCP_150801 | PC3 | 1.11uM | 24h |
devtools::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 4.4.2 (2024-10-31)
#> os Ubuntu 24.04.1 LTS
#> system x86_64, linux-gnu
#> ui X11
#> language (EN)
#> collate C
#> ctype en_US.UTF-8
#> tz Etc/UTC
#> date 2024-11-16
#> pandoc 3.2.1 @ /usr/local/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> BiocGenerics 0.53.3 2024-11-15 [2] https://bioc.r-universe.dev (R 4.4.2)
#> BiocManager 1.30.25 2024-08-28 [2] RSPM (R 4.4.0)
#> BiocStyle * 2.35.0 2024-10-30 [2] https://bioc.r-universe.dev (R 4.4.1)
#> bit 4.5.0 2024-09-20 [2] RSPM (R 4.4.0)
#> bit64 4.5.2 2024-09-22 [2] RSPM (R 4.4.0)
#> bslib 0.8.0 2024-07-29 [2] RSPM (R 4.4.0)
#> buildtools 1.0.0 2024-11-11 [3] local (/pkg)
#> cachem 1.1.0 2024-05-16 [2] RSPM (R 4.4.0)
#> cli 3.6.3 2024-06-21 [2] RSPM (R 4.4.0)
#> colorspace 2.1-1 2024-07-26 [2] RSPM (R 4.4.0)
#> crayon 1.5.3 2024-06-20 [2] RSPM (R 4.4.0)
#> curl 6.0.1 2024-11-14 [2] RSPM (R 4.4.0)
#> devtools 2.4.5 2022-10-11 [2] RSPM (R 4.4.0)
#> digest 0.6.37 2024-08-19 [2] RSPM (R 4.4.0)
#> dplyr * 1.1.4 2023-11-17 [2] RSPM (R 4.4.0)
#> drugfindR * 0.99.1017 2024-11-16 [1] https://cogdisreslab.r-universe.dev (R 4.4.2)
#> ellipsis 0.3.2 2021-04-29 [2] RSPM (R 4.4.0)
#> evaluate 1.0.1 2024-10-10 [2] RSPM (R 4.4.0)
#> fansi 1.0.6 2023-12-08 [2] RSPM (R 4.4.0)
#> fastmap 1.2.0 2024-05-15 [2] RSPM (R 4.4.0)
#> forcats * 1.0.0 2023-01-29 [2] RSPM (R 4.4.0)
#> fs 1.6.5 2024-10-30 [2] RSPM (R 4.4.0)
#> generics 0.1.3 2022-07-05 [2] RSPM (R 4.4.0)
#> ggplot2 * 3.5.1 2024-04-23 [2] RSPM (R 4.4.0)
#> glue 1.8.0 2024-09-30 [2] RSPM (R 4.4.0)
#> gtable 0.3.6 2024-10-25 [2] RSPM (R 4.4.0)
#> hms 1.1.3 2023-03-21 [2] RSPM (R 4.4.0)
#> htmltools 0.5.8.1 2024-04-04 [2] RSPM (R 4.4.0)
#> htmlwidgets 1.6.4 2023-12-06 [2] RSPM (R 4.4.0)
#> httpuv 1.6.15 2024-03-26 [2] RSPM (R 4.4.0)
#> httr2 1.0.6 2024-11-04 [2] RSPM (R 4.4.0)
#> jquerylib 0.1.4 2021-04-26 [2] RSPM (R 4.4.0)
#> jsonlite 1.8.9 2024-09-20 [2] RSPM (R 4.4.0)
#> knitr 1.49 2024-11-08 [2] RSPM (R 4.4.0)
#> later 1.3.2 2023-12-06 [2] RSPM (R 4.4.0)
#> lifecycle 1.0.4 2023-11-07 [2] RSPM (R 4.4.0)
#> lubridate * 1.9.3 2023-09-27 [2] RSPM (R 4.4.0)
#> magrittr 2.0.3 2022-03-30 [2] RSPM (R 4.4.0)
#> maketools 1.3.1 2024-10-04 [3] RSPM (R 4.4.0)
#> memoise 2.0.1 2021-11-26 [2] RSPM (R 4.4.0)
#> mime 0.12 2021-09-28 [2] RSPM (R 4.4.0)
#> miniUI 0.1.1.1 2018-05-18 [2] RSPM (R 4.4.0)
#> munsell 0.5.1 2024-04-01 [2] RSPM (R 4.4.0)
#> pillar 1.9.0 2023-03-22 [2] RSPM (R 4.4.0)
#> pkgbuild 1.4.5 2024-10-28 [2] RSPM (R 4.4.0)
#> pkgconfig 2.0.3 2019-09-22 [2] RSPM (R 4.4.0)
#> pkgload 1.4.0 2024-06-28 [2] RSPM (R 4.4.0)
#> profvis 0.4.0 2024-09-20 [2] RSPM (R 4.4.0)
#> promises 1.3.0 2024-04-05 [2] RSPM (R 4.4.0)
#> purrr * 1.0.2 2023-08-10 [2] RSPM (R 4.4.0)
#> R6 2.5.1 2021-08-19 [2] RSPM (R 4.4.0)
#> rappdirs 0.3.3 2021-01-31 [2] RSPM (R 4.4.0)
#> Rcpp 1.0.13-1 2024-11-02 [2] RSPM (R 4.4.0)
#> readr * 2.1.5 2024-01-10 [2] RSPM (R 4.4.0)
#> remotes 2.5.0 2024-03-17 [2] RSPM (R 4.4.0)
#> rlang 1.1.4 2024-06-04 [2] RSPM (R 4.4.0)
#> rmarkdown 2.29 2024-11-04 [2] RSPM (R 4.4.0)
#> S4Vectors 0.45.2 2024-11-16 [2] https://bioc.r-universe.dev (R 4.4.2)
#> sass 0.4.9 2024-03-15 [2] RSPM (R 4.4.0)
#> scales 1.3.0 2023-11-28 [2] RSPM (R 4.4.0)
#> sessioninfo 1.2.2 2021-12-06 [2] RSPM (R 4.4.0)
#> shiny 1.9.1 2024-08-01 [2] RSPM (R 4.4.0)
#> stringi 1.8.4 2024-05-06 [2] RSPM (R 4.4.0)
#> stringr * 1.5.1 2023-11-14 [2] RSPM (R 4.4.0)
#> sys 3.4.3 2024-10-04 [2] RSPM (R 4.4.0)
#> tibble * 3.2.1 2023-03-20 [2] RSPM (R 4.4.0)
#> tidyr * 1.3.1 2024-01-24 [2] RSPM (R 4.4.0)
#> tidyselect 1.2.1 2024-03-11 [2] RSPM (R 4.4.0)
#> tidyverse * 2.0.0 2023-02-22 [2] RSPM (R 4.4.0)
#> timechange 0.3.0 2024-01-18 [2] RSPM (R 4.4.0)
#> tzdb 0.4.0 2023-05-12 [2] RSPM (R 4.4.0)
#> urlchecker 1.0.1 2021-11-30 [2] RSPM (R 4.4.0)
#> usethis 3.0.0 2024-07-29 [2] RSPM (R 4.4.0)
#> utf8 1.2.4 2023-10-22 [2] RSPM (R 4.4.0)
#> vctrs 0.6.5 2023-12-01 [2] RSPM (R 4.4.0)
#> vroom 1.6.5 2023-12-05 [2] RSPM (R 4.4.0)
#> withr 3.0.2 2024-10-28 [2] RSPM (R 4.4.0)
#> xfun 0.49 2024-10-31 [2] RSPM (R 4.4.0)
#> xtable 1.8-4 2019-04-21 [2] RSPM (R 4.4.0)
#> yaml 2.3.10 2024-07-26 [2] RSPM (R 4.4.0)
#>
#> [1] /tmp/RtmpJndRLp/Rinst1fc5437a0e02
#> [2] /github/workspace/pkglib
#> [3] /usr/local/lib/R/site-library
#> [4] /usr/lib/R/site-library
#> [5] /usr/lib/R/library
#>
#> ──────────────────────────────────────────────────────────────────────────────