creedenzymatic

Intro

creedenzymatic is a pipeline R package to combine, standardize and visualize kinomic analyses from different tools (KRSA, UKA, KEA3, PTM-SEA)

installation

# install.packages("devtools")
#devtools::install_github("kalganem/creedenzymatic")
library(creedenzymatic)
library(tidyverse)

theme_set(theme_bw())

Input

The different kinomic tools (KRSA, UKA, KEA3, PTM-SEA) produce different outputs with various scoring metrics. To standardize the input in this package, we want to extract only two variables from each tool: Kinase and Score.

Kinase: will be the main identifier used in each tool. For example, in KRSA this variable will be indicating kinase family, but with UKA it will be individual kinases.

Score: The main metric used in the corresponding tool. For example, in KRSA this will be a Z score, in UKA it will be either kinase statistic or kinase score.

Input Format

The format of the input is a dataframe with two columns: Kinase and Score. Many of these tools give outputs with many columns beyond these two. So, we need to convert the output from these tools to fit the input format needed here. An example from KRSA and UKA is shown below.

KRSA Example

Usually the final KRSA output has many columns like: Kinase, SamplingAverage, SD, Z, … etc. So, we need to read that file and only select Kinase and Z score columns, and then rename the Z score column to “Score”


# reading an example of KRSA output
krsa_ex <- read_delim("data_files/KRSA_ex.txt", delim = "\t")

krsa_ex
#> # A tibble: 24 × 5
#>    Kinase Observed SamplingAvg    SD     Z
#>    <chr>     <dbl>       <dbl> <dbl> <dbl>
#>  1 DDR          31       21.1   2.65 3.71 
#>  2 TEC          92       81.9   3.09 3.26 
#>  3 SRC         101       94.6   2.70 2.36 
#>  4 ABL          44       37.6   3.14 2.03 
#>  5 PDGFR        86       80.0   3.19 1.89 
#>  6 INSR         24       19.4   2.51 1.83 
#>  7 FRK          11        7.90  1.70 1.83 
#>  8 MET          23       20.5   2.61 0.945
#>  9 SYK          78       75.0   3.23 0.930
#> 10 CSK          45       42.0   3.34 0.912
#> # ℹ 14 more rows

Now we select the Kinase and Z score columns, and then rename the Z score column to “Score”


# selecting the Kinase and Z columns, and renaming Z to Score
krsa_ex %>% select(Kinase, Z) %>% 
  rename(Score = Z) -> krsa_ex


krsa_ex
#> # A tibble: 24 × 2
#>    Kinase Score
#>    <chr>  <dbl>
#>  1 DDR    3.71 
#>  2 TEC    3.26 
#>  3 SRC    2.36 
#>  4 ABL    2.03 
#>  5 PDGFR  1.89 
#>  6 INSR   1.83 
#>  7 FRK    1.83 
#>  8 MET    0.945
#>  9 SYK    0.930
#> 10 CSK    0.912
#> # ℹ 14 more rows

UKA Exmaple

We do the same with the UKA output, take raw output and select kinase and either the kinase statistic or final score as the scoring metric


# reading an example of UKA output
uka_ex <- read_delim("data_files/UKA_ex.txt", delim = "\t")

uka_ex
#> # A tibble: 76 × 10
#>    `Kinase Uniprot ID` `Kinase Name` `Mean Specificity Score`
#>    <chr>               <chr>                            <dbl>
#>  1 P06239              Lck                              1.82 
#>  2 P07948              Lyn                              1.54 
#>  3 Q04912              Ron                              1.07 
#>  4 P42685              FRK                              1.30 
#>  5 P00519              Abl                              1.33 
#>  6 Q06418              Tyro3/Sky                        1.12 
#>  7 P12931              Src                              0.994
#>  8 P16234              PDGFR[alpha]                     0.910
#>  9 P42684              Arg                              0.959
#> 10 P06241              Fyn                              1.04 
#> # ℹ 66 more rows
#> # ℹ 7 more variables: `Mean Significance Score` <dbl>,
#> #   `Mean Final Score` <dbl>, `Median Final score` <dbl>,
#> #   `Mean Kinase Statistic` <dbl>, `Median Kinase Statistic` <dbl>,
#> #   `SD Kinase Statitistic` <dbl>, `Mean peptide set size` <dbl>

Let’s say we want the final score as the metric we want to use, so we select the two columns: kinase and median final score


# selecting the Kinase and kinase statistic columns, and renaming kinase statistic to Score
uka_ex %>% select(`Kinase Name`, `Median Final score`) %>% 
  rename(Kinase = `Kinase Name`, Score = `Median Final score`) -> uka_ex

uka_ex
#> # A tibble: 76 × 2
#>    Kinase       Score
#>    <chr>        <dbl>
#>  1 Lck           2.77
#>  2 Lyn           2.52
#>  3 Ron           2.50
#>  4 FRK           2.35
#>  5 Abl           2.29
#>  6 Tyro3/Sky     2.19
#>  7 Src           2.07
#>  8 PDGFR[alpha]  2.06
#>  9 Arg           1.97
#> 10 Fyn           1.95
#> # ℹ 66 more rows

Using the creedenzymatic package

There are dedicated functions to read and rank KRSA and UKA tables in the creedenzymatic package (read_krsa, read_uka). These two functions will check the input format (must include two columns: Kinase and Score) and then will the rank_kinases function with the specified arguments. The argument are trns and sort. The trns argument is for transformation of the score, the values accepted for this argument are abs and raw (abs: use absolute values of scores, raw: no transformation). The sort argument accepts either asc or desc (ascending and descending)


# read and rank the KRSA table and use absolute values and descending sorting
read_krsa(krsa_ex, trns = "abs", sort = "desc") -> krsa_table_ranked

# read and rank the UKA table and use absolute values and descending sorting
read_uka(uka_ex, trns = "abs", sort = "desc") -> uka_table_ranked


# preview of a ranked KRSA table
krsa_table_ranked
#> # A tibble: 24 × 6
#>    Kinase Score  Rank Method  Perc   Qrt
#>    <chr>  <dbl> <int> <chr>  <dbl> <dbl>
#>  1 DDR    3.71      1 KRSA   1         4
#>  2 TEC    3.26      2 KRSA   0.957     4
#>  3 SRC    2.36      3 KRSA   0.913     4
#>  4 ABL    2.03      4 KRSA   0.870     4
#>  5 PDGFR  1.89      5 KRSA   0.826     4
#>  6 INSR   1.83      6 KRSA   0.783     4
#>  7 FRK    1.83      7 KRSA   0.739     3
#>  8 MET    0.945     8 KRSA   0.696     3
#>  9 SYK    0.930     9 KRSA   0.652     3
#> 10 CSK    0.912    10 KRSA   0.609     3
#> # ℹ 14 more rows

Now we combine the ranked tables into one dataframe using the combine_tools() function.


# combine ranked tables
combine_tools(KRSA_df = krsa_table_ranked, UKA_df = uka_table_ranked) -> combined_df


combined_df
#> # A tibble: 153 × 10
#>    Kinase       Score  Rank Method  Perc   Qrt hgnc_symbol group KinaseFamily
#>    <chr>        <dbl> <int> <chr>  <dbl> <dbl> <chr>       <chr> <chr>       
#>  1 LCK           2.77     1 UKA    1         4 LCK         TK    SRC         
#>  2 LYN           2.52     2 UKA    0.987     4 LYN         TK    SRC         
#>  3 RON           2.50     3 UKA    0.973     4 MST1R       TK    MET         
#>  4 FRK           2.35     4 UKA    0.96      4 FRK         TK    SRC         
#>  5 ABL           2.29     5 UKA    0.947     4 ABL1        TK    ABL         
#>  6 TYRO3/SKY     2.19     6 UKA    0.933     4 TYRO3       TK    AXL         
#>  7 SRC           2.07     7 UKA    0.92      4 SRC         TK    SRC         
#>  8 PDGFR[ALPHA]  2.06     8 UKA    0.907     4 PDGFRA      TK    PDGFR       
#>  9 ARG           1.97     9 UKA    0.893     4 ABL2        TK    ABL         
#> 10 FYN           1.95    10 UKA    0.88      4 FYN         TK    SRC         
#> # ℹ 143 more rows
#> # ℹ 1 more variable: subfamily <chr>

# to save file
# write_delim(combined_df,"ce_combined_ranked_file.txt", delim = "\t")

We can visualize the results using the quartile_figure() function (which will return a ggplot). Let’s we want to plot all of the kinases found in quartile 3 or 4 either in KRSA or UKA.


# filter out kinases found in quartile 3 or 4 either in KRSA or UKA and use the quartile_figure() for visualization

combined_df %>% filter(Qrt >= 3) %>% 
  pull(hgnc_symbol) %>% unique() -> sig_kinases

combined_df %>% 
  filter(hgnc_symbol %in% sig_kinases) %>% 
  quartile_figure()

KEA3

To run KEA3, we need a subset of peptides as an input (this subset of peptides could represent top differentially phosphorylated peptides, peptides mapped to a recombinant kinase, … etc). Both KRSA and UKA calculate log2 fold change (LFC) values of peptide phosphorylation levels. These LFC values could be used to determine the top peptides using either an LFC cutoff or rank order of peptides.


# an example of file with peptide IDs and LFC
kea_ex <- read_delim("data_files/KEA3_ex.txt", delim = "\t")

kea_ex %>% 
  select(Peptide, Score = totalMeanLFC) -> kea_ex

# run kea3, using the peptide list with a cutoff value
read_kea(kea_ex, filter = T, cutoff = 0.4, cutoff_abs = T, sort = "asc", trns = "abs",
        rm_duplicates = T, method = "MeanRank", lib = "kinase-substrate") -> kea_table_ranked

# quartile figure using kinase family as the grouping factor
combine_tools(KRSA_df = krsa_table_ranked, UKA_df = uka_table_ranked, KEA3_df = kea_table_ranked) %>% 
  filter(hgnc_symbol %in% sig_kinases) %>% 
  quartile_figure()


# quartile figure using kinase groups as the grouping factor (more useful in STK runs)
combine_tools(KRSA_df = krsa_table_ranked, UKA_df = uka_table_ranked, KEA3_df = kea_table_ranked) %>% 
  filter(hgnc_symbol %in% sig_kinases) %>% 
  quartile_figure(grouping = "group")

PTM-SEA

To run PTM-SEA, we need the LFC table of peptides as an input (Same input as KEA3)



read_ptmsea(kea_ex) -> ptm_table_ranked


# quartile figure using kinase family as the grouping factor
combine_tools(KRSA_df = krsa_table_ranked, UKA_df = uka_table_ranked, 
              KEA3_df = kea_table_ranked, PTM_SEA_df =ptm_table_ranked ) %>%
  filter(hgnc_symbol %in% sig_kinases) %>% 
  quartile_figure()


# load("data_files/runData.RData")
# 
# aFull %>% 
#   select(ClassName, SetStat) %>% 
#   group_by(ClassName) %>% 
#   mutate(meanSetStat = mean(SetStat,na.rm = T)) %>% ungroup() %>% 
#   ggplot(aes(reorder(ClassName, meanSetStat), SetStat)) +
#   geom_boxplot(aes(fill = meanSetStat)) +
#   coord_flip()