Friday, August 21, 2009

PLoS Currents - rapid dissemination of knowledge

PLoS recently unveiled an initiative they call PLoS Currents. It is an experiment in rapid dissemination of research built on top of Google Knol. Essentially, a community of people dedicated to a specific topic could use PLoS Currents to describe their ongoing work before submitting it to a peer-reviewed journal. They have focused their initial efforts on influenza research, where the speed of dissemination of information might be crucial.

The content of PLoS Currents: Influenza is not peer reviewed but is moderated by a panel of scientists who will strive to keep it on topic. An FAQ explains the initiative in more detail. The articles are archived and citable, and they can be revised, but they should not be considered peer-reviewed publications. For this reason, PLoS encourages authors to eventually submit these works to a peer-reviewed journal. It remains to be seen how other publishers will react to submissions that are available in these rapid dissemination portals.

PLoS Currents vs Nature Precedings
This initiative is somewhat related to preprint archives like Nature Precedings and arXiv. The main differences seem to be a stronger emphasis on community moderators and the use of third-party technology (Google Knol). The community moderators, who I assume are researchers working on influenza, could be a decisive factor in ensuring that other researchers in the field at least know about the project. Using Google Knol lets PLoS focus on the community and, hopefully, get technical support from Google to develop new tools as they are needed. However, the website currently looks a little like a hack, which is the downside of using third-party technology. For example, we can click the edit button and see options to change the main website, although the permissions obviously do not allow us to save these changes.

I think it is an interesting experiment, and hopefully more bio-related researchers will get comfortable with sharing and discussing ongoing research before publication. I still believe this would reduce wasteful overlaps. As usual, my only fear is that multiplying these experiments will fragment the critical mass required for such a community site to work.

Tuesday, August 11, 2009

Translationally optimal codons do not appear to significantly associate with phosphorylation sites

I recently read an interesting paper about codon bias at structurally important sites that sent me on a small detour from my usual activities. Tong Zhou, Mason Weems and Claus Wilke described how translationally optimal codons are associated with structurally important sites in proteins, such as the protein core (Zhou et al. MBE 2009). This work continues research from the same lab on what constrains protein evolution. I have previously written a short review of the literature on the subject here. As a reminder, expression level was observed to be the strongest constraint on a protein's rate of change, with highly expressed genes coding for proteins that diverge more slowly than those coded by lowly expressed genes (Drummond et al. MBE 2006). It is currently believed that selection against translation errors is the main driving force restricting this rate of change (Drummond et al. PNAS 2005, Drummond et al. Cell 2008). It has also been shown that translation errors are introduced, on average, at a rate of about 1 to 5 per 10,000 codons, and that different codons can differ in their error rates by 4 to 9 fold, influenced by translational properties such as the availability of their tRNAs (Kramer et al. RNA 2007).

Against this background, Zhou and colleagues set out to test whether codons associated with highly expressed genes tend to be over-represented at structurally important sites. The idea is that such codons, defined as "optimal codons", are less error prone and should therefore be preferred at positions that, when mistranslated, could destabilize proteins. In this work they defined a measure of codon optimality as the odds ratio of codon usage between highly and lowly expressed genes. Without going into too much detail, they showed, in different ways and for different species, that codon optimality is indeed correlated with the odds of being at a structurally important site.
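To make the odds-ratio idea concrete, here is a minimal sketch of one way to compute it. It assumes the odds for a codon are its count divided by the count of its synonymous alternatives, compared between a highly and a lowly expressed gene set, with a small pseudocount to avoid division by zero. The truncated `SYNONYMS` table and the function names are hypothetical, and this is not necessarily the exact definition used by Zhou and colleagues.

```python
from collections import Counter

# Hypothetical, truncated genetic-code table (only Ala and Lys codons).
SYNONYMS = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
}

def codon_counts(seqs):
    """Count in-frame codons across a list of coding sequences."""
    counts = Counter()
    for s in seqs:
        for i in range(0, len(s) - len(s) % 3, 3):
            counts[s[i:i + 3]] += 1
    return counts

def codon_optimality(high_seqs, low_seqs, pseudo=0.5):
    """Odds ratio of using each codon (vs. its synonyms) in high vs. low expression genes."""
    high, low = codon_counts(high_seqs), codon_counts(low_seqs)
    scores = {}
    for codon, aa in SYNONYMS.items():
        synonyms = [c for c, a in SYNONYMS.items() if a == aa and c != codon]
        odds_high = (high[codon] + pseudo) / (sum(high[c] for c in synonyms) + pseudo)
        odds_low = (low[codon] + pseudo) / (sum(low[c] for c in synonyms) + pseudo)
        scores[codon] = odds_high / odds_low
    return scores
```

A score above 1 marks a codon used relatively more often in highly expressed genes (a putatively "optimal" codon); codons absent from both sets get a neutral score of 1.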

I decided to test whether I could also see a significant association between codon optimality and sites of post-translational modification. I defined a window of plus or minus 2 amino acids surrounding each S. cerevisiae phosphorylation site as the region associated with the modification. The rationale is that selection for translational robustness could constrain codon usage near a phosphorylation site when compared with other Serine or Threonine sites. For simplicity, I mostly ignored tyrosine phosphorylation, which in S. cerevisiae represents a very small fraction of the total phosphorylation observed to date.
For each codon, I calculated its over-representation in these phosphorylation windows compared to similar windows around all other S/T sites and plotted this value against the log of the codon optimality score calculated by Zhou and colleagues.
Figure 1 - Over-representation of optimal codons at phosphosites
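The windowing and over-representation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: phosphosites are given as hypothetical 0-based residue indices, the coding sequence is in frame, over-representation is the ratio of a codon's frequency in phosphosite windows to its frequency in windows around other S/T sites, and the fit is a plain Pearson R-squared against log optimality. All function names are hypothetical.

```python
import math
from collections import Counter

def st_window_codons(protein, cds, phospho_sites, flank=2):
    """Collect codons in +/- flank windows around S/T residues.
    Returns (codons near phosphosites, codons near other S/T sites)."""
    phospho, other = [], []
    for i, aa in enumerate(protein):
        if aa not in "ST":  # tyrosine is ignored, as in the post
            continue
        lo, hi = max(0, i - flank), min(len(protein), i + flank + 1)
        codons = [cds[3 * j:3 * j + 3] for j in range(lo, hi)]
        (phospho if i in phospho_sites else other).extend(codons)
    return phospho, other

def over_representation(phospho_codons, other_codons, pseudo=0.5):
    """Codon frequency in phosphosite windows divided by frequency in other S/T windows."""
    p, o = Counter(phospho_codons), Counter(other_codons)
    n_p, n_o = sum(p.values()), sum(o.values())
    return {c: ((p[c] + pseudo) / n_p) / ((o[c] + pseudo) / n_o) for c in set(p) | set(o)}

def r_squared(x, y):
    """Squared Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

def r_squared_vs_log_optimality(enrichment, optimality):
    """R-squared of over-representation against log codon optimality over shared codons."""
    shared = sorted(set(enrichment) & set(optimality))
    x = [math.log(optimality[c]) for c in shared]
    y = [enrichment[c] for c in shared]
    return r_squared(x, y)
```

Note that windows around nearby S/T residues can overlap, so a codon may be counted in both sets; the post does not say how this was handled.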
At first glance, there appears to be a significant correlation between codon optimality and phosphorylation sites. However, as I try to describe below, this is mostly due to differences in gene expression. Given the relatively small number of phosphorylation sites per protein, it is hard to test this association for each protein independently, as Zhou and colleagues did for the structurally important sites. The alternative is to take differences in gene expression into account. I first checked whether phosphorylated proteins tend to be coded by highly expressed genes.
Figure 2 - Distribution of gene expression of phosphorylated proteins

In figure 2 I plot the distribution of gene expression for phosphorylated and non-phosphorylated proteins. There is only a very small difference, with phosphoproteins having a marginally higher median gene expression than other proteins. This difference is small, and a Kolmogorov-Smirnov (KS) test does not rule out that both sets are drawn from the same distribution.
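In practice one would likely call `scipy.stats.ks_2samp` for this comparison. As a dependency-free sketch, the two-sample KS statistic is the largest gap between the two empirical cumulative distributions, and its p-value can be approximated with the asymptotic Kolmogorov series. The function names are hypothetical, and the p-value is only a large-sample approximation.

```python
import bisect
import math

def ks_statistic(a, b):
    """Two-sample KS statistic: max |F_a(x) - F_b(x)| over the pooled data points."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    for x in a + b:
        fa = bisect.bisect_right(a, x) / len(a)  # empirical CDF of a at x
        fb = bisect.bisect_right(b, x) / len(b)  # empirical CDF of b at x
        d = max(d, abs(fa - fb))
    return d

def ks_pvalue(d, n, m, max_terms=100):
    """Asymptotic p-value from the Kolmogorov distribution (large-sample approximation)."""
    lam = math.sqrt(n * m / (n + m)) * d
    total = 0.0
    for j in range(1, max_terms + 1):
        term = 2 * (-1) ** (j - 1) * math.exp(-2 * j * j * lam * lam)
        total += term
        if abs(term) < 1e-12:
            return min(1.0, max(0.0, total))
    return 1.0  # series did not converge: lam is tiny, so p is close to 1
```

Used on, for example, log expression values of phosphoproteins versus all other proteins, a large p-value means the test cannot distinguish the two distributions.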

The next possible expression-related explanation for the observed correlation would be that highly expressed genes tend to have more phosphorylation sites. Although there is no significant correlation between gene expression level and the absolute number of phosphorylation sites, I observed that highly expressed proteins tend to be shorter. As a result, there is a significant positive correlation between gene expression and the fraction of Serine and Threonine sites that are phosphorylated.
Figure 3 - Expression level correlates with fraction of phosphorylated ST sites
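The per-protein quantity behind figure 3 is simple to compute. The sketch below derives the fraction of S/T residues that are phosphorylated for a protein and measures a rank (Spearman) correlation with expression, using average ranks for ties. Choosing Spearman here is an assumption, since the post does not say which correlation measure was used; the function names and the 0-based site indices are hypothetical.

```python
import math

def fraction_phospho_st(protein, phospho_sites):
    """Fraction of S/T residues whose 0-based index is in phospho_sites."""
    st = [i for i, aa in enumerate(protein) if aa in "ST"]
    if not st:
        return float("nan")
    return sum(1 for i in st if i in phospho_sites) / len(st)

def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))
```

Because shorter proteins have fewer S/T residues, the same absolute number of sites gives a larger fraction, which is how length can link expression to this score.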

Unfortunately, I believe this correlation explains the result observed in figure 1. To properly control for it, I recalculated the correlation from figure 1 after randomizing the phosphorylation sites within each phosphoprotein. For comparison, I also randomized the phosphorylation sites keeping the total number of sites fixed, but without restricting the number of sites within each specific phosphoprotein.

Figure 4 - Distribution of R-squared for randomized phosphorylation sites
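The two randomization schemes differ only in what is held fixed, which is easy to see in code. In this minimal sketch (hypothetical names and input format: S/T positions and observed sites keyed by protein), the first scheme preserves the number of sites per phosphoprotein, while the second only preserves the total and draws sites from the pooled S/T residues of all phosphoproteins. Each randomized assignment would then be pushed through the figure 1 pipeline (not shown) to obtain one R-squared value, and an empirical p-value compares the observed R-squared to that null distribution.

```python
import random

def shuffle_within_protein(st_sites, phospho, rng):
    """Keep the number of sites in each phosphoprotein; pick new S/T positions at random."""
    return {p: set(rng.sample(st_sites[p], len(phospho[p]))) for p in phospho}

def shuffle_global(st_sites, phospho, rng):
    """Keep only the total number of sites; draw them from any phosphoprotein's S/T residues."""
    pool = [(p, i) for p in phospho for i in st_sites[p]]
    total = sum(len(sites) for sites in phospho.values())
    randomized = {p: set() for p in phospho}
    for p, i in rng.sample(pool, total):
        randomized[p].add(i)
    return randomized

def empirical_p(observed, null_values):
    """One-sided empirical p-value with the usual +1 correction."""
    return (1 + sum(1 for v in null_values if v >= observed)) / (1 + len(null_values))
```

Under the global scheme, sites can pile up in long proteins, which breaks the expression-linked fraction of phosphorylated S/T residues and can make the observed correlation look significant.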

When the phosphorylation sites are randomized within each phosphoprotein, keeping the number of sites in each phosphoprotein constant, the average R-squared is higher than the one observed with the experimentally determined phosphorylation sites (pink curve). This suggests that the correlation observed in figure 1 is not due to functional constraints acting on the phosphorylation sites but is instead probably due to the correlation observed in figure 3 between expression level and the fraction of phosphorylated S/T residues.
The observed correlation would appear to be significantly higher than random if we allowed the random phosphorylation sites to be drawn from any phosphoprotein without constraining the number of sites in each specific protein (blue curve). I added this because I thought it was a striking example of how a relatively subtle change in assumptions can change the significance of a score.

I also tested whether conserved phosphorylation sites tend to be coded by optimal codons when compared with non-conserved phosphorylation sites. For each phosphorylation site, I summed the codon optimality scores in a window around the site and compared the distribution of this sum for phosphorylation sites that are conserved in zero, one, or more than one species. Conservation was defined using an alignment window of +/- 10 amino acids of S. cerevisiae proteins against orthologs in C. albicans, S. pombe, D. melanogaster and H. sapiens.
Figure 5 - Distribution of codon optimality scores versus phospho-site conservation

I observe a higher sum of codon optimality for conserved phosphorylation sites (fig 5A), but this difference is not maintained when the codon optimality score of each peptide is normalized by the expression level of the source protein (fig 5B).
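The grouping behind figure 5 can be sketched as below. This assumes each site is a record with hypothetical keys `codons`, `n_conserved` (number of the four species in which it is conserved) and `expression`; codons without an optimality score are skipped, and normalization is done by simple division by expression. The post does not specify the normalization, so the division is an assumption.

```python
from statistics import median

def conservation_class(n_species):
    """Bin sites as conserved in 0, 1 or more than 1 of the compared species."""
    if n_species == 0:
        return "0"
    return "1" if n_species == 1 else ">1"

def group_scores(sites, optimality, normalize=False):
    """Summed codon optimality per site, grouped by conservation class."""
    groups = {"0": [], "1": [], ">1": []}
    for site in sites:
        score = sum(optimality[c] for c in site["codons"] if c in optimality)
        if normalize:
            score /= site["expression"]  # assumed normalization: divide by expression
        groups[conservation_class(site["n_conserved"])].append(score)
    return groups

def summarize(groups):
    """Median score of each non-empty conservation class."""
    return {k: median(v) for k, v in groups.items() if v}
```

Comparing `summarize(group_scores(sites, opt))` with `summarize(group_scores(sites, opt, normalize=True))` mirrors the contrast between panels A and B.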

In summary, when gene expression levels are taken into account, there does not appear to be an association between translationally optimal codons and the regions around phosphorylation sites. This is consistent with the weak functional constraints observed in the analysis performed by Landry and colleagues.

Saturday, August 01, 2009

Drug synergies tend to be context specific

A little over a year ago I mentioned a paper published in MSB on how drug combinations could be used to study pathways. Some of the same authors have now published a study in Nature Biotechnology analyzing drug combinations in different contexts (e.g. different tissues, different species, different outputs).

The underlying methodology of the study is essentially the same as in the paper mentioned above. The authors study the effect of combining drugs on specific phenotypes. One example of a phenotype could be the inhibition of growth of a pathogenic strain. Different concentrations of two drugs are combined in a matrix, as described in figure 1a of the paper, and the phenotype is measured for each combination. Two drugs are said to be synergistic if the measured impact of the combination on the phenotype is greater than expected under a neutral model.
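To illustrate what "greater than expected under a neutral model" means on a dose matrix, here is a minimal sketch. The post does not name the paper's neutral model, so Bliss independence is swapped in here purely for simplicity: two non-interacting drugs with fractional inhibitions `ex` and `ey` are expected to give `ex + ey - ex * ey`. The matrix layout (row 0 and column 0 holding single-agent responses) and the mean-excess score are hypothetical choices.

```python
def bliss_excess(matrix):
    """Mean excess inhibition over Bliss independence across all combination cells.
    matrix[i][j]: fractional inhibition (0..1) at dose i of drug X and dose j of drug Y;
    row 0 and column 0 hold the single-agent responses (other drug at zero dose)."""
    excess = []
    for i in range(1, len(matrix)):
        for j in range(1, len(matrix[0])):
            ex, ey = matrix[i][0], matrix[0][j]  # single-agent effects
            expected = ex + ey - ex * ey  # Bliss independence prediction
            excess.append(matrix[i][j] - expected)
    return sum(excess) / len(excess)
```

A score near zero means the combination behaves as the neutral model predicts; positive values indicate synergy and negative values antagonism.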
The authors ask whether drug synergy is context dependent. This is an important question for combinatorial therapeutics, since we would like treatments that are context dependent (i.e. specific). The most straightforward example would be drug treatments against pathogens: ideally, combinations of drugs would act synergistically against the pathogen but not against the host. Another example would be drug combinations targeting the expression of a particular gene (e.g. TNF-alpha) without showing synergy against general cell viability.

To test this, the authors performed simulations of E. coli metabolism under different growth conditions and analyzed an astonishing panel of ~94,000 experimental dose matrices covering several types of therapeutic conditions. In each experiment, two drugs are tested against a control and a test phenotype, and the synergy is measured and compared. The results are summarized as the synergy of the two drugs in the test case and the selectivity of this synergy towards the test phenotype. In other words, for each experiment the authors tested whether drug pairs that are synergistic on the test phenotype (e.g. inhibition of growth of the pathogen) also act synergistically on the control phenotype (e.g. inhibition of growth of host cells).
Figure 2b of the paper shows the results from the flux balance simulations of E. coli metabolism. In these simulations, "drugs" were implemented as ideal enzyme inhibitors that reduced the flux of their targets. Each cross in the figure represents a "drug" pair targeting two enzymes of E. coli metabolism. The test and control phenotypes are, in this case, fermentation versus aerobic conditions. In this plot the authors show that drug pairs that are synergistic under fermentation tend to have a high selectivity for that condition when compared to aerobic conditions.
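The test-versus-control comparison can be summarized with a few lines of code. The post does not give the paper's formula for selectivity, so this sketch uses a hypothetical stand-in: selectivity is the synergy score on the test phenotype minus the synergy score on the control phenotype. It then asks what fraction of pairs that are synergistic in the test condition are also selective for it.

```python
def selectivity(synergy_test, synergy_control):
    """Hypothetical selectivity: how much more synergy a pair shows on the test phenotype."""
    return synergy_test - synergy_control

def fraction_selective(pairs, synergy_threshold=0.0):
    """Among pairs synergistic on the test phenotype, the fraction with positive selectivity.
    pairs: list of (synergy_test, synergy_control) tuples."""
    synergistic = [(t, c) for t, c in pairs if t > synergy_threshold]
    if not synergistic:
        return float("nan")
    return sum(1 for t, c in synergistic if selectivity(t, c) > 0) / len(synergistic)
```

A fraction close to 1 would correspond to the pattern described for figure 2b, where synergy in one condition rarely carries over to the other.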

The authors then showed that this was also the case for most of the experimental cases studied. Some of these included cell lines derived from different tissues, highlighting the complexity of drug interactions in multicellular organisms. These results are consistent with the observation that negative genetic interactions are poorly conserved across species (Tischler et al. Nat Genet. 2008, Roguev et al. Science 2008). Although these results are promising with respect to the usefulness of combinatorial therapeutic strategies, they emphasize the degree of divergence of cellular interaction networks across species and perhaps even across tissues. I am obviously biased, but I think that fundamental studies of chemogenomics across species will help us better understand the potential of combinatorial therapeutics.

The paper describes several interesting cases of specific drug synergies, but most of the results are in the supplementary materials. Given that most of the authors are affiliated with a company, I expect that there will be little real therapeutic value in the released data. Still, it looks like an interesting dataset for anyone interested in studying drug-gene networks.

Lehár, J., Krueger, A., Avery, W., Heilbut, A., Johansen, L., Price, E., Rickles, R., Short III, G., Staunton, J., Jin, X., Lee, M., Zimmermann, G., & Borisy, A. (2009). Synergistic drug combinations tend to improve therapeutically relevant selectivity. Nature Biotechnology, 27 (7), 659-666. DOI: 10.1038/nbt.1549