Title: | Network-based interpretation of high-throughput data |
---|---|
Description: | The PCSF package performs an integrated analysis of high-throughput data using interaction networks as a template, and interprets the biological landscape of those networks with respect to the data, which can lead to predictions of functional units. It also interactively visualizes the resulting subnetwork together with functional enrichment analysis. |
Authors: | Murodzhon Akhmedov, Amanda Kedaigle, Renan Escalante, Roberto Montemanni, Francesco Bertoni, Ernest Fraenkel, Ivo Kwee |
Maintainer: | Murodzhon Akhmedov <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.99.1 |
Built: | 2024-12-02 02:58:26 UTC |
Source: | https://github.com/CogDisResLab/PCSF |
Given a list of edges, construct_interactome generates an interaction network that is used as a template network to interpret the high-throughput data.
construct_interactome(ppi)
ppi |
A list of edges. A |
An interaction network as an igraph object.
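As a rough illustration of the expected input, the sketch below builds a toy edge list in the three-column layout described for the bundled STRING data (head, tail, cost) and runs a few sanity checks before handing it to construct_interactome. The gene pairs, costs, and column names are made up for illustration, and the call is skipped when the PCSF package is not installed.

```r
# Toy edge list: one row per interaction (hypothetical genes and costs).
toy_edges <- data.frame(
  head = c("TP53", "TP53", "MDM2", "EGFR"),
  tail = c("MDM2", "EP300", "EP300", "GRB2"),
  cost = c(0.20, 0.45, 0.60, 0.30),
  stringsAsFactors = FALSE
)

# Sanity checks: three columns, numeric costs without gaps, no self-loops.
stopifnot(
  ncol(toy_edges) == 3,
  is.numeric(toy_edges$cost),
  !anyNA(toy_edges$cost),
  all(toy_edges$head != toy_edges$tail)
)

# Build the template network only if the PCSF package is available.
if (requireNamespace("PCSF", quietly = TRUE)) {
  toy_ppi <- PCSF::construct_interactome(toy_edges)
}
```

Checking the edge list up front makes it easier to trace problems back to the input rather than to the network construction step.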
Murodzhon Akhmedov
## Not run: 
library("PCSF")
data("STRING")
ppi <- construct_interactome(STRING)
## End(Not run)
enrichment_analysis performs functional enrichment analysis on the subnetwork obtained by PCSF_rand, and returns an annotated subnetwork with the top 15 functional enrichments and a list of tables with a complete enrichment analysis for each cluster.
enrichment_analysis(subnet, mode = NULL, gene_universe)
subnet |
A subnetwork provided by |
mode |
A binary variable to choose the method for enrichment analysis, where 0 is for EnrichR API and 1 is for topGO package. |
gene_universe |
A complete list of genes (vector of gene symbols) used as background in enrichment analysis by topGO package. |
An enrichment analysis of the final subnetwork obtained by multiple runs of the PCSF (with random noise added to the edge costs) is performed for functional interpretation. The subnetwork is clustered using an edge betweenness clustering algorithm from the igraph package, and functional enrichment is computed for each cluster with either the EnrichR API (Chen et al., 2013) or the topGO package (Alexa and Rahnenfuhrer, 2009), as specified by the user. Note that the EnrichR API requires a working Internet connection. If the user does not specify a tool, the package uses EnrichR by default when an Internet connection is available and falls back to topGO otherwise.
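The tool-selection rule described above can be sketched as a small helper. This is only an illustration of the decision logic, not the package's internal code: choose_enrichment_mode and its online argument are hypothetical names introduced here, and how the package actually detects connectivity is not shown.

```r
# Hypothetical helper mirroring the documented rule:
#   mode = 0 -> EnrichR API, mode = 1 -> topGO,
#   mode = NULL -> EnrichR when online, topGO otherwise.
choose_enrichment_mode <- function(mode = NULL, online = TRUE) {
  if (!is.null(mode)) {
    stopifnot(mode %in% c(0, 1))  # only the two documented values are valid
    return(mode)
  }
  if (online) 0 else 1
}

choose_enrichment_mode(NULL, online = TRUE)   # EnrichR
choose_enrichment_mode(NULL, online = FALSE)  # topGO fallback
choose_enrichment_mode(1, online = TRUE)      # explicit choice wins
```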
An interactive visualization of the final subnetwork is plotted, where the node sizes and edge widths are proportional to how often each node or edge appears across all runs. Nodes are colored according to cluster membership, and the top 15 functional enrichment terms for a cluster are displayed in tabular format when hovering over a node in that cluster.
A list composed of an interactive subnetwork and the enrichment analysis results. The interactive subnetwork annotated with the enrichment analysis is accessible via $subnet. The full enrichment analysis for each cluster is accessible via $enrichment.
Murodzhon Akhmedov
Chen E.Y., Tan C.M., Kou Y., Duan Q., Wang Z., Meirelles G.V., Clark N.R., and Ma'ayan A. (2013) Enrichr: Interactive and Collaborative HTML5 Gene List Enrichment Analysis Tool. BMC Bioinformatics 14: 128.
Alexa A. and Rahnenfuhrer J. (2009). topGO: Enrichment Analysis for Gene Ontology. R package version 2.28.0.
## Not run: 
library("PCSF")
data("STRING")
data("Tgfb_phospho")
terminals <- Tgfb_phospho
ppi <- construct_interactome(STRING)
subnet <- PCSF_rand(ppi, terminals, n = 10, r = 0.1, w = 2, b = 1, mu = 0.0005)
res <- enrichment_analysis(subnet)
res <- enrichment_analysis(subnet, mode=0)
## End(Not run)

## Not run: 
library(topGO)
gene_universe <- V(ppi)$name
res <- enrichment_analysis(subnet, mode=1, gene_universe)
## End(Not run)

## Not run: 
plot(res$subnet)
write.table(res$enrichment[[1]],file="cluster1_complete_enrichment.txt", append = FALSE, quote = FALSE, sep ="\t", row.names=FALSE)
## End(Not run)
PCSF returns a subnetwork obtained by solving the PCSF on the given interaction network.
PCSF(ppi, terminals, w = 2, b = 1, mu = 5e-04)
ppi |
An interaction network, an igraph object. |
terminals |
A list of terminal genes with prizes to be analyzed in the PCSF context.
A named |
w |
A |
b |
A |
mu |
A |
The Prize-Collecting Steiner Forest (PCSF) is a well-known problem in graph theory.
Given an undirected graph $G = (V, E)$, where the vertices are labeled with prizes $p_v$
and the edges are labeled with costs $c_e$, the goal is to identify
a subnetwork $G' = (V', E')$ with a forest structure. The target is to minimize
the total edge costs in $E'$, the total node prizes left out of $V'$, and the
number of trees in $G'$. This is equivalent to minimization of the following
objective function:

$$F(G') = \sum_{e \in E'} c_e + \beta \sum_{v \notin V'} p_v + \omega k$$

where $k$ is the number of trees in the forest, which is regulated by the parameter $\omega$ (argument w).
The parameter $\beta$ (argument b) is used to tune the prizes of nodes.
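To make the trade-off concrete, the sketch below scores three candidate forests on a toy graph. It assumes the objective is the total cost of the selected edges, plus b times the prizes of the vertices left out, plus w times the number of trees; the graph, prizes, costs, and the helper pcsf_objective are hypothetical and introduced only for this illustration.

```r
# Toy graph (hypothetical vertex prizes and edge costs).
prizes <- c(A = 5, B = 0, C = 3, D = 4, E = 1)
edges <- data.frame(
  head = c("A", "B", "C", "D"),
  tail = c("B", "C", "D", "E"),
  cost = c(1.0, 1.5, 6.0, 2.0),
  stringsAsFactors = FALSE
)

# Assumed objective: edge costs + b * prizes left out + w * number of trees.
pcsf_objective <- function(sel_edges, sel_nodes, k, w = 2, b = 1) {
  edge_cost <- sum(edges$cost[sel_edges])
  missed    <- sum(prizes[setdiff(names(prizes), sel_nodes)])
  edge_cost + b * missed + w * k
}

# One tree A-B-C; D and E are left out.
f_one  <- pcsf_objective(c(1, 2), c("A", "B", "C"), k = 1)
# Two trees: A-B-C plus D on its own; only E is left out.
f_two  <- pcsf_objective(c(1, 2), c("A", "B", "C", "D"), k = 2)
# One tree A-B-C-D that pays for the expensive C-D edge.
f_span <- pcsf_objective(c(1, 2, 3), c("A", "B", "C", "D"), k = 1)

c(one_tree = f_one, two_trees = f_two, spanning = f_span)
```

With these toy numbers the two-tree forest scores lowest: paying w for an extra tree is cheaper than either forfeiting the prize of D or buying the costly C-D edge.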
This optimization problem maps naturally onto the problem of finding differentially
enriched subnetworks in the cell protein-protein interaction (PPI) network.
The vertices of the interaction network correspond to genes or proteins, and the edges
represent the interactions among them. We can assign prizes
to vertices based on measurements of differential expression, copy number, or
mutation, and costs to edges based on confidence scores for those intra-cellular
interactions from experimental observation, yielding a proper input to the PCSF
problem. Vertices that are assigned a prize are referred to as terminal nodes,
whereas vertices that are not observed in patient data are not assigned a
prize and are called Steiner nodes. After scoring the interactome, the
PCSF is used to detect a relevant subnetwork (forest), which corresponds to a
portion of the interactome where many genes are highly correlated in terms of
their functions and may regulate the differentially active biological process
of interest. The PCSF aims to identify neighborhoods in interaction networks
potentially belonging to the key dysregulated pathways of a disease.
To avoid a bias towards hub nodes of the PPI network appearing in the PCSF solution,
we penalize the prizes of Steiner nodes according to their degree
in the PPI network, regulated by the parameter $\mu$ (argument mu):

$$p'_v = p_v - \mu \cdot \mathrm{degree}(v)$$

The parameter $\mu$ also affects the total number of Steiner nodes in the solution.
The higher the value of $\mu$, the smaller the number of Steiner nodes in the subnetwork,
and vice versa. Based on our previous analysis, the recommended range of $\mu$
for biological networks is between 1e-4 and 5e-2, and users can choose values
that result in subnetworks whose vertex sets have a desirable Steiner/terminal
node ratio and average Steiner/terminal in-degree ratio
in the template interaction network.
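The effect of the hub penalty can be sketched on a toy network. The example assumes the penalty is linear in the degree, that is, the adjusted prize is the original prize minus mu times the vertex degree, and for simplicity applies it to every vertex; the vertex names, prizes, and edges are hypothetical.

```r
# Toy network: HUB touches three edges, the other vertices fewer.
edges <- data.frame(
  head = c("HUB", "HUB", "HUB", "X"),
  tail = c("X", "Y", "Z", "Y"),
  stringsAsFactors = FALSE
)

# X is a terminal with a prize; the Steiner nodes start with prize 0.
prizes <- c(HUB = 0, X = 1, Y = 0, Z = 0)
mu <- 0.05  # upper end of the recommended range

# Degree = number of edges incident to each vertex.
deg <- table(c(edges$head, edges$tail))[names(prizes)]

# Assumed linear penalty: adjusted prize = prize - mu * degree.
penalized <- prizes - mu * as.numeric(deg)
penalized
```

The highest-degree vertex receives the most negative adjusted prize, so including it in the solution is the least attractive unless it connects valuable terminals.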
The final subnetwork obtained by the PCSF. It returns an igraph object with the node prize and edge cost attributes.
Murodzhon Akhmedov
Akhmedov M., LeNail A., Bertoni F., Kwee I., Fraenkel E., and Montemanni R. (2017) A Fast Prize-Collecting Steiner Forest Algorithm for Functional Analyses in Biological Networks. Lecture Notes in Computer Science, to appear.
## Not run: 
library("PCSF")
data("STRING")
data("Tgfb_phospho")
terminals <- Tgfb_phospho
ppi <- construct_interactome(STRING)
subnet <- PCSF(ppi, terminals, w = 2, b = 1, mu = 0.0005)
## End(Not run)
PCSF_rand returns a union of subnetworks obtained by solving the PCSF on the given interaction network several times, adding random noise to the edge costs each time.
PCSF_rand(ppi, terminals, n = 10, r = 0.1, w = 2, b = 1, mu = 5e-04)
ppi |
An interaction network as an igraph object. |
terminals |
A list of terminal genes with prizes to be analyzed in the PCSF context.
A named |
n |
An |
r |
A |
w |
A |
b |
A |
mu |
A |
In order to increase the robustness of the resulting structure, it is recommended to solve the PCSF several times on the same network while adding some noise to the edge costs each time, and combine all results in a final subnetwork. The union of all outputs may explain the underlying biology better.
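The perturb, solve, and union loop described above can be sketched in base R. Two things here are assumptions rather than the package's behavior: the noise is modeled as a uniform multiplicative increase of up to a fraction r of each edge cost, and toy_solver is a placeholder that simply keeps edges below a cutoff. It is not the PCSF algorithm and stands in only so that the loop can run.

```r
set.seed(1)

# Toy edge list with hypothetical costs.
edges <- data.frame(
  head = c("A", "B", "C", "D"),
  tail = c("B", "C", "D", "E"),
  cost = c(0.20, 0.48, 0.52, 0.90),
  stringsAsFactors = FALSE
)
n <- 10   # number of randomized runs
r <- 0.1  # noise level (assumed: up to 10% of the edge cost)

# Placeholder for a real PCSF solve: keeps edges cheaper than a cutoff.
toy_solver <- function(cost, cutoff = 0.5) cost < cutoff

# Count how often each edge is selected across the noisy runs.
counts <- integer(nrow(edges))
for (i in seq_len(n)) {
  noisy  <- edges$cost * (1 + runif(nrow(edges), min = 0, max = r))
  counts <- counts + toy_solver(noisy)
}

# Union of all runs, annotated with the number of appearances.
union_net <- cbind(edges[counts > 0, ], runs = counts[counts > 0])
union_net
```

Edges whose selection does not depend on the noise appear in every run, while borderline edges appear only in some, which is the information the appearance counts are meant to convey.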
The final subnetwork obtained by taking the union of the PCSF outputs generated by adding random noise to the edge costs each time. It returns an igraph object whose node prize and edge cost attributes represent the total number of appearances across all runs.
Murodzhon Akhmedov
Akhmedov M., LeNail A., Bertoni F., Kwee I., Fraenkel E., and Montemanni R. (2017) A Fast Prize-Collecting Steiner Forest Algorithm for Functional Analyses in Biological Networks. Lecture Notes in Computer Science, to appear.
## Not run: 
library("PCSF")
data("STRING")
data("Tgfb_phospho")
terminals <- Tgfb_phospho
ppi <- construct_interactome(STRING)
subnet <- PCSF_rand(ppi, terminals, n = 10, r = 0.1, w = 2, b = 2, mu = 0.0005)
## End(Not run)
plot.PCSF plots an interactive figure of the subnetwork obtained by the PCSF method.
## S3 method for class 'PCSF'
plot(x, style = 0, edge_width = 5, node_size = 40, node_label_cex = 30,
  Steiner_node_color = "lightblue", Terminal_node_color = "lightgreen",
  Terminal_node_legend = "Terminal", Steiner_node_legend = "Steiner", ...)
x |
A subnetwork obtained by the PCSF method. It is a "PCSF" object derived from igraph class and it has the edge cost and vertex prize attributes. |
style |
A |
edge_width |
A |
node_size |
A |
node_label_cex |
A |
Steiner_node_color |
A |
Terminal_node_color |
A |
Terminal_node_legend |
A |
Steiner_node_legend |
A |
... |
Ignored. |
This function plots an interactive subnetwork obtained by PCSF or PCSF_rand.
When plotting a subnetwork from PCSF, the node sizes and edge widths are proportional to the node prizes and edge costs, respectively.
In contrast, when plotting a subnetwork from PCSF_rand, the node sizes and edge widths are
proportional to the number of times each node or edge appears across the randomized runs.
The node names are displayed on hover.
Murodzhon Akhmedov
## Not run: 
library("PCSF")
data("STRING")
data("Tgfb_phospho")
terminals <- Tgfb_phospho
ppi <- construct_interactome(STRING)
subnet <- PCSF(ppi, terminals, w = 2, b = 1, mu = 0.0005)
plot(subnet)
## End(Not run)
plot.PCSFe plots an interactive figure of the subnetwork to display the functional enrichment analysis, which is obtained by applying enrichment_analysis to the subnetwork.
## S3 method for class 'PCSFe'
plot(x, edge_width = 5, node_size = 30, node_label_cex = 1,
  Terminal_node_legend = "Terminal", Steiner_node_legend = "Steiner", ...)
x |
An output subnetwork provided by the |
edge_width |
A |
node_size |
A |
node_label_cex |
A |
Terminal_node_legend |
A |
Steiner_node_legend |
A |
... |
Ignored. |
An enrichment analysis of the final subnetwork obtained by multiple runs of the PCSF
(with random noise added to the edge costs) is performed by using enrichment_analysis.
The subnetwork is clustered using an edge betweenness clustering algorithm from the
igraph package, and for each cluster functional enrichment is done by employing the
EnrichR API (Chen et al., 2013). An interactive visualization of the final subnetwork
is plotted, where the node sizes and edge widths are proportional to how often each node
or edge appears across the randomized runs. Nodes are colored according to cluster membership, and
the top 15 functional enrichment terms for a cluster are displayed in tabular format when hovering over
a node in that cluster. A specific cluster can be displayed separately in the figure
by selecting it from the icon list at the top left side of the figure.
Murodzhon Akhmedov
Chen E.Y., Tan C.M., Kou Y., Duan Q., Wang Z., Meirelles G.V., Clark N.R., and Ma'ayan A. (2013) Enrichr: Interactive and Collaborative HTML5 Gene List Enrichment Analysis Tool. BMC Bioinformatics 14: 128.
enrichment_analysis, PCSF_rand, plot.PCSF
## Not run: 
library("PCSF")
data("STRING")
data("Tgfb_phospho")
terminals <- Tgfb_phospho
ppi <- construct_interactome(STRING)
subnet <- PCSF_rand(ppi, terminals, n = 10, r = 0.1, w = 2, b = 1, mu = 0.0005)
res <- enrichment_analysis(subnet)
plot(res$subnet)
## End(Not run)
An interactome data set in which the nodes are named with gene symbols
STRING
A data frame with three variables, where each row corresponds to an edge in which the first element is a head, the second element is a tail, and the last element represents the cost of the edge.
iref_mitab_miscore_2013_08_12_interactome.txt https://github.com/fraenkel-lab/OmicsIntegrator/tree/master/data
This dataset contains differential phosphoproteomic data derived from H358 cells, a model of lung cancer, that were stimulated with TGF-b.
Tgfb_phospho
A named numeric vector, where terminal genes are named the same as in the interaction network and the numeric values correspond to the importance of the gene within the study.
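As a small illustration of this format, the sketch below builds a named numeric vector of terminals by hand and checks which names also occur in a network. The gene symbols, prize values, and the network_genes vector are hypothetical; with a real interactome the vertex names of the igraph object would take the place of network_genes.

```r
# Hypothetical terminals: names are gene symbols, values are prizes.
terminals <- c(TP53 = 3.2, EGFR = 1.7, SMAD2 = 2.4)

# Hypothetical set of gene symbols present in the interaction network.
network_genes <- c("TP53", "EGFR", "GRB2", "MDM2")

# Terminals can only be matched to the network by name.
in_network <- names(terminals) %in% network_genes

terminals[in_network]          # terminals found in the network
names(terminals)[!in_network]  # terminals with no matching vertex
```

Checking the overlap before solving helps spot naming mismatches, for example aliases or a different identifier system, that would otherwise leave prizes unmatched.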
Tgfb_phos.txt https://github.com/fraenkel-lab/OmicsIntegrator/tree/master/example/a549