Wednesday, May 14, 2008

Prediction of phospho-proteins from sequence

I want to be able to predict which proteins in a proteome are more likely to be regulated by phosphorylation, ideally using mostly sequence information. This post is a quick note to show what I have tried and maybe get some feedback from people who might have tried this before.

The most straightforward way to predict phospho-proteins is to use existing phospho-site predictors in some way. I have used the GPS 2.0 predictor on the S. cerevisiae proteome with a medium cutoff, including only Serine/Threonine kinases. The fraction of tyrosine phosphosites in S. cerevisiae is very low, so for now I decided not to try to predict tyrosine phosphorylation.

This produces a ranked list of 4E6 putative phosphosites for the roughly 6000 proteins, scored according to the predictor (each site is scored for multiple kinases). My question is how best to use these predictions if I mostly want to know which proteins are phosphorylated and not the exact sites. Using a set of known phosphorylated proteins in S. cerevisiae (mostly taken from ExPASy), I computed different final scores for each protein as a function of all of its phospho-site scores:
1) the sum
2) the highest value
3) the average
4) the sum of putative site scores above a threshold (4, 6 or 10)
5) the sum of putative site scores above a threshold, counting only sites outside ordered protein segments as defined by a secondary structure predictor
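
The five aggregation schemes above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the GPS 2.0 output is assumed to be already parsed into hypothetical (protein, position, kinase, score) tuples, "above a threshold" is taken to mean strictly greater, and the hypothetical `disordered` dict maps each protein to the residue positions lying outside predicted ordered segments.

```python
from collections import defaultdict
from statistics import mean


def aggregate_scores(predictions, threshold=6.0, disordered=None):
    """Per-protein aggregate scores for the five schemes listed above.

    predictions: iterable of (protein, position, kinase, score) tuples
    (hypothetical layout). A site scored for several kinases appears
    several times, so every site-kinase score contributes.
    disordered: optional dict protein -> set of positions outside
    ordered segments (hypothetical input from a structure predictor).
    """
    by_protein = defaultdict(list)
    for protein, position, _kinase, score in predictions:
        by_protein[protein].append((position, score))

    results = {}
    for protein, sites in by_protein.items():
        scores = [s for _, s in sites]
        outside = disordered.get(protein, set()) if disordered else set()
        results[protein] = {
            "sum": sum(scores),                                       # 1)
            "max": max(scores),                                       # 2)
            "mean": mean(scores),                                     # 3)
            "sum_above": sum(s for s in scores if s > threshold),     # 4)
            "sum_above_disordered": sum(                              # 5)
                s for p, s in sites if s > threshold and p in outside
            ),
        }
    return results
```

Running it once per threshold (4, 6 and 10) gives the thresholded variants; proteins with no predicted sites simply do not appear in the output and would need a score of zero before ranking.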

The results are summarized by the area under the ROC curve (known phosphoproteins were considered positives and all others negatives):


In summary, the sum of all phospho-site scores is the best way I have found so far to predict which proteins are phospho-regulated. My interpretation is that phospho-regulated proteins tend to be multiply phosphorylated and/or regulated by multiple kinases, so the maximum site score does not work as well as the sum. As a side note, although there are abundance biases in mass-spec data (the source of most of the phospho-data), protein abundance is a very poor predictor of phospho-regulation (AROC = 0.55).
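
For reference, the area under the ROC curve can be computed directly from the ranks (the Mann-Whitney formulation): it is the probability that a randomly chosen positive protein scores higher than a randomly chosen negative one, with ties counting one half. A minimal sketch, assuming `scores` maps protein IDs to any per-protein value and `positives` is the set of known phosphoproteins; the function name is hypothetical.

```python
def roc_auc(scores, positives):
    """AROC via the Mann-Whitney U statistic, using average ranks for ties.

    scores: dict protein -> score. positives: set of known phosphoproteins;
    every other scored protein is treated as a negative.
    """
    items = sorted(scores.items(), key=lambda kv: kv[1])
    n = len(items)
    n_pos = sum(1 for protein in scores if protein in positives)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative protein")

    rank_sum = 0.0
    i = 0
    while i < n:
        # items[i..j] is a block of proteins sharing the same score
        j = i
        while j + 1 < n and items[j + 1][1] == items[i][1]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        n_pos_in_block = sum(1 for k in range(i, j + 1) if items[k][0] in positives)
        rank_sum += avg_rank * n_pos_in_block
        i = j + 1

    u = rank_sum - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)
```

Because it only needs a dict of per-protein values, the same function can score each aggregation scheme or protein abundance against the same positive set.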

Disregarding putative sites inside predicted secondary-structure segments did not improve the predictions, contrary to what I expected, but I should try a few disorder predictors.

Ideas for improvements are welcome, in particular sequence-based methods. I would also like to avoid comparative genomics for now.

Wednesday, May 07, 2008

Drug-drug interactions and network connectivity

How does the effect of drug-drug combinations relate to the cellular interactions of their targets? Last year, Joseph Lehár and colleagues published a paper in MSB looking into this question.

One way to study the effect of drug combinations on, for example, bacterial growth is to measure growth inhibition for all combinations of serially diluted doses of two drugs and to plot dose-matrices like the ones shown in figure 1 of the paper (shown here, adapted from the paper). In fig1A the authors show how the combined effect of increasing doses of two drugs inhibits the growth of a methicillin-resistant Staphylococcus aureus strain. Light colors correspond to strong growth inhibition. One observation from this figure is that the two drugs can inhibit the growth of this strain in an additive fashion. The question the authors tried to address in this paper is how much this sort of dose-matrix tells us about the possible interactions of the targets. The drugs could be acting on the same target, on different targets in the same pathway/complex, on targets in different pathways that are both required for growth, etc.
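
To make the idea of a dose-matrix concrete, here is a minimal sketch that fills one in over serial dilutions of two drugs under two standard reference ("null") models: Bliss independence and Loewe dose additivity. These are not meant to be exactly the four models of Fig2 in the paper; the single-drug Hill curves, the (ec50, hill) parameter tuples and the dilution series are all assumptions for illustration. One useful property of Loewe additivity is that a drug combined with itself gives the same inhibition as the summed dose, which is why drug-with-itself matrices are expected to look additive.

```python
def hill(dose, ec50, h=1.0):
    """Fractional growth inhibition (0..1) of a single drug, Hill curve."""
    return dose ** h / (dose ** h + ec50 ** h)


def inverse_hill(inhibition, ec50, h=1.0):
    """Single-drug dose giving the requested inhibition (0 < inhibition < 1)."""
    return ec50 * (inhibition / (1.0 - inhibition)) ** (1.0 / h)


def bliss(da, db, a, b):
    """Bliss independence: the drugs act as independent probabilistic events."""
    ia, ib = hill(da, *a), hill(db, *b)
    return ia + ib - ia * ib


def loewe(da, db, a, b, tol=1e-10):
    """Loewe additivity: solve da/Da(I) + db/Db(I) = 1 for I by bisection."""
    if da == 0 and db == 0:
        return 0.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)  # always strictly between 0 and 1
        total = da / inverse_hill(mid, *a) + db / inverse_hill(mid, *b)
        if total > 1.0:  # doses more than suffice for inhibition mid: true I is higher
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def dose_matrix(doses_a, doses_b, model, a, b):
    """Rows follow drug A doses, columns follow drug B doses."""
    return [[model(da, db, a, b) for db in doses_b] for da in doses_a]


# Hypothetical twofold serial dilution around EC50 = 1, plus a zero dose.
doses = [0.0] + [2.0 ** k for k in range(-3, 4)]
params = (1.0, 1.0)  # assumed (ec50, hill) for both drugs
bliss_matrix = dose_matrix(doses, doses, bliss, params, params)
loewe_matrix = dose_matrix(doses, doses, loewe, params, params)
```

Comparing `bliss_matrix` and `loewe_matrix` entry by entry shows that the choice of null model already changes what "no interaction" looks like at the same doses.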

To study this, they first simulated an abstract metabolic network (using ODEs, see model file in Sup) with two different pathways required for growth, with branched and linear blocks and one negative feedback loop (see Fig3 in the paper). They simulated the effect of increasing drug doses by decreasing the enzyme activities of the simulated targets. For each possible drug-drug combination they then calculated the predicted dose-matrix effect on growth (pathway output). They observed that, by fitting the resulting dose-matrices to 4 types of classical dose-matrix models (described in Fig2), they could predict where in this network the two targets were most likely to be.
As an example, two sequential targets in an unbranched section of the network embedded in a negative feedback loop produce a dose-matrix that best fits a potentiation model (shown here, adapted from Fig3).
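
The simulation strategy can be illustrated with a much smaller toy pathway than the one in the paper's supplementary model file: two sequential enzymes, where the end product feeds back to inhibit the first (reversible) step, and each drug is assumed to reduce the activity of one enzyme hyperbolically. The rate laws, rate constants and function names are all hypothetical, and steady state is approximated by plain forward-Euler integration. The sketch only shows how a simulated dose-matrix can be generated from an ODE model; it does not demonstrate that this toy reproduces the potentiation shape in Fig3.

```python
def steady_output(e1, e2, k1=1.0, km1=1.0, k2=1.0, k3=1.0, s=1.0, ki=1.0,
                  dt=0.05, t_end=300.0):
    """Integrate the toy pathway S -> X1 -> X2 -> output and return the output flux.

    e1, e2: remaining activities (0..1) of enzyme 1 and enzyme 2.
    X2 inhibits step 1 (negative feedback), so both enzymes sit inside the loop.
    """
    x1 = x2 = 0.0
    for _ in range(int(t_end / dt)):
        v1 = e1 * (k1 * s - km1 * x1) / (1.0 + x2 / ki)  # reversible step, feedback-inhibited
        v2 = e2 * k2 * x1                                 # second enzymatic step
        v3 = k3 * x2                                      # output flux ("growth")
        x1 += dt * (v1 - v2)
        x2 += dt * (v2 - v3)
    return k3 * x2


def activity(dose, ic50=1.0):
    """Remaining enzyme activity under a drug dose (assumed hyperbolic inhibition)."""
    return 1.0 / (1.0 + dose / ic50)


def simulated_dose_matrix(doses_a, doses_b):
    """Drug A inhibits enzyme 1 and drug B inhibits enzyme 2; entries are output fluxes."""
    return [[steady_output(activity(da), activity(db)) for db in doses_b]
            for da in doses_a]
```

With all constants set to 1 and no drug, the steady-state output solves J(1 + J) = 1 - J, i.e. J = sqrt(2) - 1, which is a handy check on the integration. The resulting matrix could then be compared against reference models such as the ones sketched earlier.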

Having established by simulation that dose-matrices carry information about the interactions of the drugs' targets, they then tested the effect of 10 known antifungal drugs that target the (also well established) sterol pathway of Candida glabrata. For each drug-drug combination they tried to fit the experimental dose-matrices to the same 4 models and compared the best-fitting model to the one expected from the position of the targets in the sterol pathway. In many cases (72%) the best model fit was the same as predicted from the sterol pathway model, but only 54% of the best-fit models were unambiguous. There were some cases where drug-with-itself dose-matrices (positive controls) did not appear additive as expected. The authors mention that this is due to the "instability in the measured potency of a drug", but I am not sure why a drug-with-itself matrix would not be reproducible.
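
The model-selection step can be sketched as picking the candidate whose predicted dose-matrix has the lowest sum of squared errors against the measured one. This assumes each candidate model has already been parameterised into a predicted matrix of the same shape; the ambiguity rule (runner-up error within a factor `ambiguity_ratio` of the best) is a hypothetical criterion, not necessarily the one the authors used.

```python
def sse(observed, predicted):
    """Sum of squared differences between two equally shaped dose-matrices."""
    return sum((o - p) ** 2
               for row_o, row_p in zip(observed, predicted)
               for o, p in zip(row_o, row_p))


def best_model(observed, candidates, ambiguity_ratio=1.2):
    """Return (best model name, ambiguous flag).

    candidates: dict model name -> predicted matrix.
    The fit is flagged ambiguous when the second-best model's error is within
    ambiguity_ratio times the best error (hypothetical criterion).
    """
    ranked = sorted((sse(observed, matrix), name) for name, matrix in candidates.items())
    best_err, best_name = ranked[0]
    ambiguous = len(ranked) > 1 and ranked[1][0] <= ambiguity_ratio * best_err
    return best_name, ambiguous
```

Counting how often the returned name matches the model expected from pathway position, and how often `ambiguous` is False, gives the two kinds of percentages quoted above; the actual values depend entirely on the fitting and ambiguity criteria chosen.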

Finally, the authors further tested this relation between drug combinations and target interactions by experimentally measuring drug dose-matrices for 94 drugs/compounds in human (HCT116) tumor cells (see text for details).

In summary, even if the prediction accuracy is far from perfect, this work shows that it should be possible to either:
1 - use known pathway models plus drug dose-matrices to improve prediction of the most likely targets of the drugs
2 - use known drug-target relationships plus the drug dose-matrices to predict the network connectivity

One obvious complication is that a single compound can have multiple targets, which would reduce the usefulness of the predictions. Some interesting extensions could be to test drug-drug interactions in KO strains or in combination with RNAi knock-downs or protein over-expression.