Title: | creedenzymatic |
---|---|
Description: | Combine kinome results from KRSA, UKA, and other tools. A package for integrating upstream kinase analyses. |
Authors: | Ali Sajid Imami [aut, cre], Khaled Alganem [aut], Justin Creeden [aut], Abdul-Rizaq Hamoud [aut] |
Maintainer: | Ali Sajid Imami <[email protected]> |
License: | MIT + file LICENSE |
Version: | 6.1.0 |
Built: | 2024-10-26 04:26:29 UTC |
Source: | https://github.com/CogDisResLab/creedenzymatic |
Align the rows and columns of two (or more) matrices
.align_matrices(m1, m2, ..., L = NULL, na.pad = TRUE, as.3D = TRUE)
m1 |
a matrix with unique row and column names |
m2 |
a matrix with unique row and column names |
... |
additional matrices with unique row and column names |
L |
a list of matrix objects. If this is given, m1, m2, and ... are ignored |
na.pad |
boolean indicating whether to pad the combined matrix with NAs for rows/columns that are not shared by m1 and m2. |
as.3D |
boolean indicating whether to return the result as a 3D array. If FALSE, will return a list. |
an object containing the aligned matrices. Will either be a list or a 3D array
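The alignment described above can be illustrated with a minimal base R sketch. The helper `align_two()` is hypothetical and is not the package's implementation; it only assumes two matrices with unique row and column names, pads each to the union of names with NA, and stacks them into a 3D array.

```r
# Hypothetical helper for illustration only; not the package's own code.
align_two <- function(m1, m2) {
  rows <- union(rownames(m1), rownames(m2))
  cols <- union(colnames(m1), colnames(m2))
  # Pad one matrix with NA so it covers the full universe of names.
  pad <- function(m) {
    out <- matrix(NA_real_, nrow = length(rows), ncol = length(cols),
                  dimnames = list(rows, cols))
    out[rownames(m), colnames(m)] <- m
    out
  }
  # Stack the padded matrices along a third dimension.
  array(c(pad(m1), pad(m2)),
        dim = c(length(rows), length(cols), 2),
        dimnames = list(rows, cols, c("m1", "m2")))
}

m1 <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), c("x", "y")))
m2 <- matrix(5:8, nrow = 2, dimnames = list(c("b", "c"), c("y", "z")))
aligned <- align_two(m1, m2)
```

Cells that exist in only one input are NA in the other slice, which mirrors the behaviour described for na.pad = TRUE with as.3D = TRUE.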
Check whether test_names are columns in the data.frame df
.check_colnames(test_names, df, throw_error = T)
test_names |
a vector of column names to test |
df |
the |
throw_error |
boolean indicating whether to throw an error if
any |
boolean indicating whether or not all test_names are columns of df
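A column check of this kind can be sketched in base R as follows. The function name `check_colnames_sketch()` and the error message are assumptions made for illustration; the package's own behaviour may differ in detail.

```r
# Hypothetical re-implementation for illustration only.
check_colnames_sketch <- function(test_names, df, throw_error = TRUE) {
  missing_names <- setdiff(test_names, colnames(df))
  ok <- length(missing_names) == 0
  if (!ok && throw_error) {
    stop("missing columns: ", paste(missing_names, collapse = ", "))
  }
  ok
}

df <- data.frame(Kinase = "AKT1", Score = 1.2)
ok_all <- check_colnames_sketch(c("Kinase", "Score"), df)
ok_some <- check_colnames_sketch(c("Kinase", "Z"), df, throw_error = FALSE)
```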
Check for duplicates in a vector
.check_dups(x, name = "")
x |
the vector |
name |
the name of the object to print in an error message if duplicates are found |
Extract the elements from a GCT object where the values of row_field and col_field are the same. A concrete example is if g represents a matrix of signatures of genetic perturbations, and you want to extract all the values of the targeted genes.
.extract.gct( g, row_field, col_field, rdesc = NULL, cdesc = NULL, row_keyfield = "id", col_keyfield = "id" )
g |
the GCT object |
row_field |
the column name in rdesc to search on |
col_field |
the column name in cdesc to search on |
rdesc |
a |
cdesc |
a |
row_keyfield |
the column name of |
col_keyfield |
the column name of |
a list of the following elements:
- a logical matrix of the same dimensions as ds@mat indicating which matrix elements have been extracted
- an array index into ds@mat representing which elements have been extracted
- a vector of the extracted values
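The three returned pieces can be illustrated on a plain matrix. This is a sketch under stated assumptions: `m`, `row_vals`, and `col_vals` are toy stand-ins for the GCT matrix and for the row_field and col_field annotation columns, and the variable names `mask`, `idx`, and `vals` are chosen here for illustration.

```r
# Toy stand-ins for the matrix and the two annotation columns.
m <- matrix(1:6, nrow = 2,
            dimnames = list(c("r1", "r2"), c("c1", "c2", "c3")))
row_vals <- c("TP53", "EGFR")          # annotation value for each row
col_vals <- c("EGFR", "TP53", "MYC")   # annotation value for each column

mask <- outer(row_vals, col_vals, "==")  # logical matrix, same dimensions as m
idx  <- which(mask, arr.ind = TRUE)      # array index of the matching cells
vals <- m[idx]                           # the extracted values
```

Here the matching cells are (r2, c1) and (r1, c2), so `vals` holds the two entries where the row and column annotations agree.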
Check if x is a whole number
.is.wholenumber(x, tol = .Machine$double.eps^0.5)
x |
number to test |
tol |
the allowed tolerance |
boolean indicating whether x is within tol of a whole number value
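A tolerance-based whole-number test is commonly written as below. The name `is_whole()` is hypothetical; the sketch only assumes the same default tolerance shown in the usage line.

```r
# Hypothetical helper: TRUE when x is within tol of the nearest integer.
is_whole <- function(x, tol = .Machine$double.eps^0.5) {
  abs(x - round(x)) < tol
}

a <- is_whole(3)
b <- is_whole(2.5)
d <- is_whole(sqrt(2)^2)  # floating-point noise is absorbed by the tolerance
```

The tolerance matters because a value such as `sqrt(2)^2` is not exactly equal to 2 in floating-point arithmetic.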
Pad a matrix with additional rows/columns of NA values
.na_pad_matrix(m, row_universe = NULL, col_universe = NULL)
m |
a matrix with unique row and column names |
row_universe |
a vector with the universe of possible row names |
col_universe |
a vector with the universe of possible column names |
a matrix
Parse a GCTX file into the workspace as a GCT object
.parse.gctx(fname, rid = NULL, cid = NULL, matrix_only = FALSE)
fname |
path to the GCTX file on disk |
rid |
either a vector of character or integer row indices or a path to a grp file containing character row indices. Only these indices will be parsed from the file. |
cid |
either a vector of character or integer column indices or a path to a grp file containing character column indices. Only these indices will be parsed from the file. |
matrix_only |
boolean indicating whether to parse only the matrix (ignoring row and column annotations) |
parse.gctx also supports parsing of plain text GCT files, so this function can be used as a general GCT parser.
Other GCTX parsing functions: .append.dim(), .fix.datatypes(), .process_ids(), .read.gctx.ids(), .read.gctx.meta(), .write.gct(), .write.gctx(), .write.gctx.meta()
Read a GMT file and return a list
.parse.gmt(fname)
fname |
the file path to be parsed |
parse.gmt returns a nested list object. The top level contains one list per row in fname. Each of these is itself a list with the following fields:
- head: the name of the data (row in fname)
- desc: description of the corresponding data
- len: the number of data items
- entry: a vector of the data items
a list of the contents of fname. See details.
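The nested-list shape described in the details can be sketched with base R. `parse_gmt_sketch()` is a hypothetical stand-in, not the package's parser; it assumes the usual GMT layout of one tab-separated set per row (name, description, then the items).

```r
# Hypothetical parser for illustration only.
parse_gmt_sketch <- function(fname) {
  lines <- readLines(fname)
  lapply(strsplit(lines, "\t", fixed = TRUE), function(fields) {
    entry <- fields[-(1:2)]  # everything after name and description
    list(head = fields[1], desc = fields[2],
         len = length(entry), entry = entry)
  })
}

# Write a tiny two-row GMT file to a temporary path and parse it.
tmp <- tempfile(fileext = ".gmt")
writeLines(c("set1\tfirst set\tA\tB\tC", "set2\tsecond set\tD"), tmp)
gmt <- parse_gmt_sketch(tmp)
```

Each top-level element corresponds to one row of the file and carries the head, desc, len, and entry fields.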
http://clue.io/help for details on the GMT file format
Other CMap parsing functions: .parse.gmx(), .parse.grp(), .write.gmt(), .write.grp()
Read a GMX file and return a list
.parse.gmx(fname)
fname |
the file path to be parsed |
parse.gmx returns a nested list object. The top level contains one list per column in fname. Each of these is itself a list with the following fields:
- head: the name of the data (column in fname)
- desc: description of the corresponding data
- len: the number of data items
- entry: a vector of the data items
a list of the contents of fname. See details.
http://clue.io/help for details on the GMX file format
Other CMap parsing functions: .parse.gmt(), .parse.grp(), .write.gmt(), .write.grp()
Read a GRP file and return a vector of its contents
.parse.grp(fname)
fname |
the file path to be parsed |
a vector of the contents of fname
http://clue.io/help for details on the GRP file format
Other CMap parsing functions: .parse.gmt(), .parse.gmx(), .write.gmt(), .write.grp()
Read GCTX row or column ids
.read.gctx.ids(gctx_path, dimension = "row")
gctx_path |
path to the GCTX file |
dimension |
which ids to read (row or column) |
a character vector of row or column ids from the provided file
Other GCTX parsing functions: .append.dim(), .fix.datatypes(), .parse.gctx(), .process_ids(), .read.gctx.meta(), .write.gct(), .write.gctx(), .write.gctx.meta()
Parse row or column metadata from GCTX files
.read.gctx.meta(gctx_path, dimension = "row", ids = NULL)
gctx_path |
the path to the GCTX file |
dimension |
which metadata to read (row or column) |
ids |
a character vector of a subset of row/column ids for which to read the metadata |
a data.frame of metadata
Other GCTX parsing functions: .append.dim(), .fix.datatypes(), .parse.gctx(), .process_ids(), .read.gctx.ids(), .write.gct(), .write.gctx(), .write.gctx.meta()
Update the matrix of an existing GCTX file
.update.gctx(x, ofile, rid = NULL, cid = NULL)
x |
an array of data |
ofile |
the filename of the GCTX to update |
rid |
integer indices or character ids of the rows to update |
cid |
integer indices or character ids of the columns to update |
Overwrite the rows and columns of ofile as indicated by rid and cid respectively. rid and cid can either be integer indices or character ids corresponding to the row and column ids in ofile.
Write a GCT object to disk in GCT format
.write.gct(ds, ofile, precision = 4, appenddim = T, ver = 3)
ds |
the GCT object |
ofile |
the desired output filename |
precision |
the numeric precision at which to
save the matrix. See |
appenddim |
boolean indicating whether to append matrix dimensions to filename |
ver |
the GCT version to write. See |
Since GCT is a text format, the higher the precision you choose, the larger the file size. ver is assumed to be 3, aka GCT version 1.3, which supports embedded row and column metadata in the GCT file. Any other value passed to ver will result in a GCT version 1.2 file which contains only the matrix data and no annotations.
Other GCTX parsing functions: .append.dim(), .fix.datatypes(), .parse.gctx(), .process_ids(), .read.gctx.ids(), .read.gctx.meta(), .write.gctx(), .write.gctx.meta()
Write a GCT object to disk in GCTX format
.write.gctx( ds, ofile, appenddim = T, compression_level = 0, matrix_only = F, max_chunk_kb = 1024 )
ds |
a GCT object |
ofile |
the desired file path for writing |
appenddim |
boolean indicating whether the resulting filename will have dimensions appended (e.g. my_file_n384x978.gctx) |
compression_level |
integer between 1-9 indicating how much to compress data before writing. Higher values result in smaller files but slower read times. |
matrix_only |
boolean indicating whether to write only the matrix data (and skip row, column annotations) |
max_chunk_kb |
for chunking, the maximum number of KB a given chunk will occupy |
Other GCTX parsing functions: .append.dim(), .fix.datatypes(), .parse.gctx(), .process_ids(), .read.gctx.ids(), .read.gctx.meta(), .write.gct(), .write.gctx.meta()
Write a nested list to a GMT file
.write.gmt(lst, fname)
lst |
the nested list to write. See |
fname |
the desired file name |
lst needs to be a nested list where each sub-list is itself a list with the following fields:
- head: the name of the data
- desc: description of the corresponding data
- len: the number of data items
- entry: a vector of the data items
http://clue.io/help for details on the GMT file format
Other CMap parsing functions: .parse.gmt(), .parse.gmx(), .parse.grp(), .write.grp()
Write a vector to a GRP file
.write.grp(vals, fname)
vals |
the vector of values to be written |
fname |
the desired file name |
http://clue.io/help for details on the GRP file format
Other CMap parsing functions: .parse.gmt(), .parse.gmx(), .parse.grp(), .write.gmt()
Write a data.frame to a tab-delimited text file
.write.tbl(tbl, ofile, ...)
tbl |
the |
ofile |
the desired file name |
... |
additional arguments passed on to |
This method simply calls write.table with some preset arguments that generate an unquoted, tab-delimited file without row names.
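A wrapper of this kind might look like the sketch below. The name `write_tbl_sketch()` is hypothetical, and the exact preset arguments used by the package are an assumption; the three shown are the standard write.table settings for an unquoted, tab-delimited file without row names.

```r
# Hypothetical wrapper for illustration only.
write_tbl_sketch <- function(tbl, ofile, ...) {
  write.table(tbl, file = ofile, sep = "\t",
              quote = FALSE, row.names = FALSE, ...)
}

tmp <- tempfile(fileext = ".txt")
write_tbl_sketch(
  data.frame(Kinase = c("AKT1", "SRC"), Score = c(1.5, -2)),
  tmp
)
out <- readLines(tmp)  # header line followed by one line per row
```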
Given a GCT object and either a data.frame or a path to an annotation table, apply the annotations to the GCT using the given keyfield.
annotate.gct(g, annot, dimension = "row", keyfield = "id")
g |
a GCT object |
annot |
a |
dimension |
either 'row' or 'column' indicating which dimension
of |
keyfield |
the character name of the column in |
a GCT object with annotations applied to the specified dimension
Other GCT utilities: melt.gct(), merge.gct(), rank.gct(), subset.gct()
reads ranked tables from the different tools (KRSA, UKA, ... etc)
combine_tools( KRSA_df = NULL, UKA_df = NULL, KEA3_df = NULL, PTM_SEA_df = NULL, mapping_df = kinome_mp_file )
KRSA_df |
dataframe, KRSA table output (requires at least Kinase and Score columns) |
UKA_df |
dataframe, UKA table output (requires at least Kinase and Score columns) |
KEA3_df |
dataframe, KEA table output (requires at least Kinase and Score columns) |
PTM_SEA_df |
dataframe, PTM_SEA table output (requires at least Kinase and Score columns) |
mapping_df |
kinome mapping df (default is kinome_mp_file) |
This function takes in ranked tables from the different tools (KRSA, UKA, etc.), maps them to the kinome mapping file, and returns a dataframe ready for the quartile figure
dataframe, ready for quartile figure
reads KRSA, UKA, LFC tables and run creedenzymatic
creedenzymatic( KRSA_table, UKA_table, LFC_table, avg_krsa = T, avg_lfc = T, prefix = "Comp1", ... )
KRSA_table |
dataframe, KRSA table output |
UKA_table |
dataframe, UKA table output |
LFC_table |
dataframe, LFC table output |
... |
arguments passed to other functions |
This function takes in a table, then ranks and quartiles kinases based on the absolute Score values
dataframe, Ranked and quartiled table
reads combined dataframe (ranked and quartiled) and extracts top kinases based on adjustable criteria
extract_top_kinases(combined_df, min_qrt, min_counts)
combined_df |
dataframe, Ranked and quartiled dataframe |
min_qrt |
integer, minimum quartile to count |
min_counts |
integer, number of minimum hits |
This function takes in the combined dataframe (ranked and quartiled) and extracts top kinases based on adjustable criteria
vector, top kinases
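One plausible reading of the two criteria is sketched below: keep rows whose quartile is at least min_qrt, count how many such hits each kinase has, and return the kinases with at least min_counts hits. This is an assumption about the semantics, not the package's confirmed logic, and the column names `Kinase`, `Tool`, and `Qrt` as well as the function name are hypothetical.

```r
# Hypothetical illustration of quartile-and-count filtering.
top_kinases_sketch <- function(combined_df, min_qrt, min_counts) {
  hits <- combined_df[combined_df$Qrt >= min_qrt, ]
  counts <- table(hits$Kinase)          # number of qualifying hits per kinase
  names(counts)[counts >= min_counts]
}

combined <- data.frame(
  Kinase = c("AKT1", "AKT1", "SRC", "SRC", "LCK"),
  Tool   = c("KRSA", "UKA", "KRSA", "UKA", "KRSA"),
  Qrt    = c(4, 4, 4, 2, 3)
)
top <- top_kinases_sketch(combined, min_qrt = 4, min_counts = 2)
```

In this toy table only AKT1 reaches the top quartile in two tools, so it is the only kinase returned.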
The GCT class serves to represent annotated matrices. The mat slot contains said data and the rdesc and cdesc slots contain data frames with annotations about the rows and columns, respectively.
- mat: a numeric matrix
- rid: a character vector of row ids
- cid: a character vector of column ids
- rdesc: a data.frame of row descriptors
- cdesc: a data.frame of column descriptors
- src: a character indicating the source (usually file path) of the data
parse.gctx, write.gctx, read.gctx.meta, read.gctx.ids
http://clue.io/help for more information on the GCT format
A data frame of CDRL Complete mapping file (UKA+KRSA+KEA3+PTM-SEA) (Latest Version)
kinome_mp_file
A data frame with 527 rows and 26 variables:
A data frame of CDRL Complete mapping file (UKA+KRSA+KEA3+PTM-SEA) (Version 1)
kinome_mp_file_v1
A data frame with 503 rows and 14 variables:
A data frame of CDRL Complete mapping file (UKA+KRSA+KEA3+PTM-SEA) (Version 2)
kinome_mp_file_v2
A data frame with 514 rows and 12 variables:
A data frame of CDRL Complete mapping file (UKA+KRSA+KEA3+PTM-SEA) (Version 3)
kinome_mp_file_v3
A data frame with 530 rows and 26 variables:
A data frame of CDRL Complete mapping file (UKA+KRSA+KEA3+PTM-SEA) (Version 4)
kinome_mp_file_v4
A data frame with 527 rows and 26 variables:
data.table (aka 'melt')
Utilizes the data.table::melt function to transform the matrix into long form. Optionally can include the row and column annotations in the transformed data.table.
melt.gct( g, suffixes = NULL, remove_symmetries = F, keep_rdesc = T, keep_cdesc = T, ... )
g |
the GCT object |
suffixes |
the character suffixes to be applied if there are collisions between the names of the row and column descriptors |
remove_symmetries |
boolean indicating whether to remove
the lower triangle of the matrix (only applies if |
keep_rdesc |
boolean indicating whether to keep the row descriptors in the final result |
keep_cdesc |
boolean indicating whether to keep the column descriptors in the final result |
... |
further arguments passed along to |
a data.table object with the row and column ids and the matrix values and (optionally) the row and column descriptors
Other GCT utilities: annotate.gct(), merge.gct(), rank.gct(), subset.gct()
Merge two GCT objects together
merge.gct(g1, g2, dimension = "row", matrix_only = F)
g1 |
the first GCT object |
g2 |
the second GCT object |
dimension |
the dimension on which to merge (row or column) |
matrix_only |
boolean indicating whether to keep only the data matrices from |
Other GCT utilities: annotate.gct(), melt.gct(), rank.gct(), subset.gct()
A data frame of the CDRL complete mapping of peptides, used for PTM-SEA (PTK PamChip 86402)
ptk_pamchip_86402_array_layout_ptmsea
A data frame with x rows and x variables:
A data frame of the CDRL complete mapping of peptides to HGNC symbols (PTK PamChip 86402)
ptk_pamchip_86402_mapping
A data frame with 193 rows and 2 variables:
Takes the combined ranked dataframe (KRSA, UKA, etc.) and generates a quartile figure
quartile_figure(df, grouping = "KinaseFamily")
df |
dataframe, combined mapped tables |
grouping |
character to choose grouping (KinaseFamily, subfamily, or group). Default is KinaseFamily |
ggplot figure
This function will scale the scores on percentile and quartile scales
rank_kinases( df, trns = c("raw", "abs"), sort = c("desc", "asc"), tool = c("KRSA", "UKA") )
df |
dataframe with 2 columns: Kinase, Score |
trns |
for transformation of the score, the values accepted for this argument are abs and raw (abs: use absolute values of scores, raw: no transformation) |
sort |
accepts either asc or desc (ascending and descending) |
tool |
specifying the name of the tool |
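To make the percentile and quartile scaling concrete, here is a minimal base R sketch. It is not the package's implementation: the function name, the output column names `Perc` and `Qrt`, and the exact formulas are assumptions, and the sort and tool arguments are left out for brevity.

```r
# Hypothetical illustration of percentile and quartile scaling.
rank_sketch <- function(df, trns = c("raw", "abs")) {
  trns <- match.arg(trns)
  score <- if (trns == "abs") abs(df$Score) else df$Score
  n <- length(score)
  # Percentile in [0, 1]: share of the other kinases with a lower score.
  df$Perc <- (rank(score, ties.method = "min") - 1) / (n - 1)
  # Quartile 1 to 4: equal-sized bins over the score ordering.
  df$Qrt <- ceiling(rank(score, ties.method = "first") * 4 / n)
  df
}

df <- data.frame(Kinase = c("AKT1", "SRC", "LCK", "FYN"),
                 Score = c(-3, 1, 2, 0.5))
ranked <- rank_sketch(df, trns = "abs")
```

With trns = "abs", AKT1 has the largest absolute score and therefore lands in the top quartile even though its raw score is negative.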
Convert a GCT object's matrix to ranks
rank.gct(g, dim = "col", decreasing = T)
g |
the |
dim |
the dimension along which to rank (row or column) |
decreasing |
boolean indicating whether higher values should get lower ranks |
a modified version of g, with the values in the matrix converted to ranks
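The effect on the matrix can be sketched on a plain numeric matrix. This only illustrates column-wise ranking with higher values receiving lower ranks (the decreasing = TRUE case); it operates on a bare matrix rather than a GCT object, and the package's tie handling is not shown.

```r
m <- matrix(c(10, 30, 20, 5, 1, 9), nrow = 3,
            dimnames = list(c("r1", "r2", "r3"), c("c1", "c2")))

# Negate the values so that the largest value in each column gets rank 1.
ranked <- apply(m, 2, function(x) rank(-x))
```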
Other GCT utilities: annotate.gct(), melt.gct(), merge.gct(), subset.gct()
reads a dataframe of Peptides IDs and their Scores (LFC, p-value, ... etc) and run KEA3 on a subset of these peptides or all of them
read_kea( df, filter = T, cutoff = 0.2, cutoff_abs = T, direction = "higher", rm_duplicates = T, method = "MeanRank", lib = c("kinase-substrate"), ... )
df |
dataframe, must have at least Peptide and Score columns |
filter |
boolean to subset peptides or not |
cutoff |
numeric to act as the cutoff to filter out peptides |
cutoff_abs |
boolean (use absolute value or not) default is TRUE |
direction |
("lower", "higher") filter based on lower than or higher than the cutoff values (defaults to "higher") |
rm_duplicates |
boolean (TRUE or FALSE) remove genes duplicates |
method |
"MeanRank" takes the mean rank across all libraries or "MeanFDR" takes the mean of FDR across all libraries (default is "MeanRank") |
lib |
searched kea libraries "kinase-substrate" or "all" (default is "kinase-substrate" which will return only kinase libraries like ChengKSIN, PTMsigDB, PhosDAll) |
... |
arguments passed to rank_kinases function |
This function takes a dataframe of Peptide IDs and their Scores (LFC, p-value, etc.) and runs KEA3 on a subset of these peptides or all of them
dataframe, Ranked and quartiled table
reads KRSA table and checks for correct format
read_krsa(df, ...)
df |
dataframe, table output (requires at least Kinase and Score columns) |
... |
arguments passed to rank_kinases function |
This function takes in a table, then ranks and quartiles kinases based on the absolute Score values
dataframe, Ranked and quartiled table
reads a dataframe of Peptides IDs and their Scores (LFC, p-value, ... etc) and run PTM-SEA
read_ptmsea(df, ...)
df |
dataframe, must have at least Peptide and Score columns |
... |
arguments passed to run ptm-sea function |
lib |
searched PTM-SEA libraries "kinase-substrate" or "all" (default is "kinase-substrate" which will return only kinase libraries like ChengKSIN, PTMsigDB, PhosDAll) |
This function takes a dataframe of Peptide IDs and their Scores (LFC, p-value, etc.) and runs PTM-SEA
dataframe, Ranked and quartiled table
reads UKA table and checks for correct format
read_uka(df, ...)
df |
dataframe, UKA table output (requires at least Kinase and Z columns) |
... |
arguments passed to rank_kinases function |
This function takes in a UKA table, then ranks and quartiles kinases based on the absolute Score values
dataframe, Ranked and quartiled UKA table
This function takes in HGNC gene symbols, connects to the KEA3 API, and returns the results
run_kea(gene_set, lib = "kinase-substrate")
gene_set |
vector, HGNC gene symbols based on the differentially phosphorylated peptides |
lib |
searched kea libraries "kinase-substrate" or "all" (default is "kinase-substrate" which will return only kinase libraries like ChengKSIN, PTMsigDB, PhosDAll) |
list, tables from each KEA3 library
This function takes in a GCT object (created by the read_ptmsea function), runs the PTM-SEA API, and returns the results
run_ptmsea(gct_object, lib = "iptmnet", nperm = 1000, min.overlap = 1, ...)
lib |
searched PTM-SEA libraries "iptmnet" or "ptm-sea" or "all" (default is "iptmnet" which uses the iptmnet mapping) |
nperm |
number of permutations |
min.overlap |
minimum overlap of target peptides with reference peptide sets |
... |
additional arguments passed to the ssGSEA_ce function |
gene_set |
vector, HGNC gene symbols based on the differentially phosphorylated peptides |
list
A data frame of the CDRL complete mapping of peptides, used for PTM-SEA (STK PamChip 87102)
stk_pamchip_87102_array_layout_ptmsea
A data frame with x rows and x variables:
A data frame of the CDRL complete mapping of peptides to HGNC symbols (STK PamChip 87102)
stk_pamchip_87102_mapping
A data frame with 141 rows and 2 variables:
Subset a gct object using the provided row and column ids
subset.gct(g, rid = NULL, cid = NULL)
g |
a gct object |
rid |
a vector of character ids or integer indices for ROWS |
cid |
a vector of character ids or integer indices for COLUMNS |
Other GCT utilities: annotate.gct(), melt.gct(), merge.gct(), rank.gct()
Transpose a GCT object
transpose.gct(g)
g |
the |
a modified version of the input GCT object where the matrix has been transposed and the row and column ids and annotations have been swapped.
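The swap described above can be sketched with a plain list standing in for a GCT object. The list layout and the function name `transpose_sketch()` are assumptions made for illustration; the real GCT class is an S4 object whose slots carry these names.

```r
# A plain list standing in for a GCT object (assumption for illustration).
g <- list(
  mat = matrix(1:6, nrow = 2,
               dimnames = list(c("r1", "r2"), c("c1", "c2", "c3"))),
  rid = c("r1", "r2"),
  cid = c("c1", "c2", "c3"),
  rdesc = data.frame(id = c("r1", "r2")),
  cdesc = data.frame(id = c("c1", "c2", "c3"))
)

# Transpose the matrix and swap the row and column ids and annotations.
transpose_sketch <- function(g) {
  list(mat = t(g$mat),
       rid = g$cid, cid = g$rid,
       rdesc = g$cdesc, cdesc = g$rdesc)
}

gt <- transpose_sketch(g)
```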
A data frame of the UKA Complete DB mapping File (STK + PTK)
uka_db_full
A data frame with 11385 rows and 17 variables: