Friday, June 26, 2009

Reply: On the evolution of protein length and phosphorylation sites

Lars just pointed out in a blog post that the average protein length of a group of proteins is a strong predictor of the average number of phosphorylation sites. Although this is intuitive, it is something that I honestly had not fully considered. As Lars mentions, this has potential implications for some of the calculations in our recently published study on the evolution of phosphorylation in yeast species.

One potential concern relates to figure 1a. We found that, although protein phosphorylation appears to diverge quickly, there is a high conservation of the relative number of phosphosites per protein for different GO groups. Lars suggests that, at least in part, this could be due to relative differences in average protein size between these groups, which in turn are highly conserved across species.

To test this hypothesis more directly I tried to correct for differences in the average protein size of different functional groups by calculating the average number of phosphorylation sites per amino-acid, instead of psites per protein. These values were then corrected for the average number of phosphorylation sites per AA in the proteome.
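
For anyone who wants to try this on their own data, the normalization can be sketched roughly as below. This is an illustration and not the script used for the analysis. The function name, the input layout (protein id mapped to length and psite count) and the toy numbers are all hypothetical. It also assumes that the group value is the pooled ratio (total psites over total residues) and that "corrected for" means dividing by the proteome-wide rate; a difference or log-ratio would be equally reasonable readings.

```python
def psites_per_residue_enrichment(proteins, go_groups):
    """Psites per amino acid for each group, relative to the whole proteome.

    proteins:  dict of protein id -> (length_in_residues, n_psites)
    go_groups: dict of group name -> list of protein ids
    Assumption: "corrected for" the proteome average is taken as a ratio.
    """
    total_sites = sum(n_sites for _, n_sites in proteins.values())
    total_length = sum(length for length, _ in proteins.values())
    proteome_rate = total_sites / total_length  # psites per residue, proteome-wide

    enrichment = {}
    for group, members in go_groups.items():
        group_sites = sum(proteins[p][1] for p in members)
        group_length = sum(proteins[p][0] for p in members)
        # Pooled psites per residue in the group, divided by the proteome rate
        enrichment[group] = (group_sites / group_length) / proteome_rate
    return enrichment


# Made-up toy data, only to show the call
toy_proteins = {"A": (100, 2), "B": (300, 2), "C": (100, 0), "D": (500, 1)}
toy_groups = {"g1": ["A", "B"], "g2": ["C", "D"]}
toy_enrichment = psites_per_residue_enrichment(toy_proteins, toy_groups)
```

Values above 1 mean that a group carries more psites per residue than the proteome as a whole, independently of how long its proteins are.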

As before, there is still a high cross-species correlation for the average number of psites per amino-acid for different GO groups. The correlations are only somewhat smaller than before. The individual correlation coefficients among the three species changed as follows: S. cerevisiae versus C. albicans, R~0.90 to 0.80; S. cerevisiae versus S. pombe, R~0.91 to 0.84; S. pombe versus C. albicans, R~0.88 to 0.83. It would seem that differences in protein length explain only a small part of the observed correlations. Results in figure 1b are also not qualitatively affected by this normalization, suggesting that the observed differences are not due to potential changes in the average size of proteins. In fact, the number of amino acids per GO group is almost perfectly correlated across species.

Another potential concern relates to the sequence based prediction of phosphorylation. As explained in the methods, one of the two approaches used to predict if a protein was phosphorylated was the sum over multiple phosphorylation site predictors for the same sequence. Given the correlation shown by Lars, could it be that, at least for one of the methods, we are mostly predicting the average protein size ? To test this I normalized the phosphorylation prediction for each S. cerevisiae protein by its length. I re-tested the predictive power of this normalized value using ROC curves and the known phosphoproteins of S. cerevisiae as positives. The AROC values changed from 0.73 to 0.68. This shows that the phosphorylation propensity is not just predicting protein size although, as expected from Lars' blog post, size alone is actually a decent predictor for phosphorylation (AROC=0.66). The normalized phosphorylation propensity does not correlate with the protein size (CC~0.05), suggesting that there might be ways to improve the predictors we used.
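
The comparison above can be reproduced in spirit with a few lines of code. The sketch below is not the code used for the paper: the `auroc` helper and the toy scores, lengths and labels are hypothetical, and the area under the ROC curve is computed through its rank interpretation (the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting one half) rather than by drawing the curve.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(positives) * len(negatives))


# Made-up toy data: summed predictor score, protein length, known phosphoprotein?
raw_scores = [2.0, 12.0, 3.0, 9.0, 6.0, 4.0]
lengths = [200, 800, 400, 1000, 300, 600]
is_phospho = [False, True, False, True, True, False]

# Length-normalized propensity: summed score divided by protein length
normalized = [score / length for score, length in zip(raw_scores, lengths)]

auc_raw = auroc(raw_scores, is_phospho)          # summed predictor score
auc_normalized = auroc(normalized, is_phospho)   # score per residue
auc_length = auroc(lengths, is_phospho)          # protein size alone
```

Comparing the three values side by side is the whole test: if the normalized score kept no predictive power beyond chance (0.5), the original predictor would have been little more than a proxy for protein size.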

Nature or method bias ?
Are larger proteins more likely to be phosphorylated in a cell or are they more likely to be detected in a mass-spec experiment ? It is likely that what we are observing is a combination of both effects but it would be nice to know how much of this observed correlation is due to potential MS bias. I am open to suggestions for potential tests.
This is also important for what I am planning to work on next. A while ago I had noticed that prediction of phosphorylation propensity could also predict ubiquitination and vice-versa. It is possible that they are mostly related by protein size. I will try to look at this in future posts.

Tuesday, June 23, 2009

Comparative analysis of phosphoproteins in yeast species

My first postdoctoral project has just appeared online in PLoS Biology. It is about the evolution of phosphoregulation in yeast species. This analysis follows from previous work I did during my PhD on the evolution of protein-protein interactions after gene duplication (paper / blog post). One of the conclusions from that previous work was that interactions of lower specificity, such as those mediated by short peptides, would be more prone to change. In fact, one of the protein domains that we found associated with high rates of change of protein-protein interactions was the kinase domain.
Given that the substrate specificity of a kinase is usually determined by a few key amino-acids surrounding the target phosphosite, it is easy to imagine how kinase-substrate interactions can be created and destroyed with few mutations. It is also well known that these phosphorylation events can have important functional consequences. We therefore postulated that changes in phosphorylation are an important source of phenotypic diversity.

To test this, we collected by mass-spectrometry in vivo phosphorylation sites for 3 yeast species (S. cerevisiae, C. albicans and S. pombe). These were compared in order to estimate the rate of change of kinase-substrate interactions. Since changes in gene expression are generally regarded as one of the main sources of phenotypic diversity we compared these estimates with similar calculations for the rate of change of transcription factor (TF) interactions to promoters. Depending on how we define a divergence of phosphorylation we estimate that kinase-substrate interactions change either at similar rates or at most 2 orders of magnitude slower than TF-promoter interactions.

Although these changes in kinase-substrate interactions appear to be fast, groups of functionally related proteins tend to maintain the same levels of phosphorylation across broad time scales. We could identify a few functional groups and protein complexes with a significant divergence in phosphorylation and we tried to predict the most likely kinases responsible for these changes.

Finally we compiled recently published genetic interaction data for S. pombe (from Assen Roguev's work) and for S. cerevisiae (from Dorothea Fiedler's work) in addition to some novel genetic data produced for this work. We used this information to study the relative conservation of genetic interactions for protein kinases and transcription factors. We observed that both protein kinases and TFs show a lower than average conservation of genetic interactions.

We think these observations strongly support the initial hypothesis that divergence in kinase-substrate interactions contributes significantly to phenotypic diversity.

Technology opening doors
For me personally it really feels like I was in the right place at the right time. Many of the experimental methods we used are still under heavy development but I was lucky to be very literally next door to the right people. I had the chance to collaborate with Jonathan Trinidad who works for the UCSF Mass Spectrometry Facility directed by Alma Burlingame. I also arrived at a time when the Krogan lab, more specifically Assen Roguev (twitter feed), has been working to develop genetic interaction assays for S. pombe (Roguev A 2007). As we describe in the introduction, these technological developments really allow us to map out the functional and physical interactions of a cell at an incredible rate. What I am hoping for is that soon they are seen in much the same light as genome sequencing. We can and should be using these tools to study, simultaneously, groups of species and not just the same usual model organisms that diverged from each other more than 1 billion years ago.

Evolution of signalling
There are many more protein interactions that are determined by short linear peptide motifs (Neduva PLoS Bio 2005). A large fraction of these determine protein post-translational modifications and are crucial for signal transduction systems. For the next couple of years I will try to continue to study the evolution of signal transduction systems. There are certainly many experimental and computational challenges to address. I am particularly interested in looking at the co-regulation by combinations of post-translational modifications and their co-evolution. I will do my best to share some of that work as it happens here in the blog.

Thursday, June 11, 2009

HFSP fellows meeting (Tokyo 2009)

I spent last week in Japan attending the fellows meeting of the Human Frontier Science Program. I was fortunate enough to get a postdoc fellowship from HFSP to support my current interest in the evolution of signalling systems. The meeting took place in Tokyo and brought together people from all sorts of different fields and at different stages of their careers. This program funds postdocs but also provides funding to young investigators setting up their labs and for teams of PIs working on interdisciplinary projects.

This year marks the 20th anniversary of the program that also coincides with a period of change in leadership. Ernst-Ludwig Winnacker, current Secretary General of the European Research Council, will take over the role of Secretary General of the HFSP organization from Torsten Wiesel. Also, Akito Arima will replace Masao Ito as the president of HFSPO (press release). Probably because of this the meeting had plenty of political moments and speeches. Thankfully most of the people involved in this organization appear to be very lighthearted so these moments were not a burden.

The curse of specialization ? 

A core focus of HFSP is to fund interdisciplinary projects that involve people from different areas or that help researchers significantly change their field of research. There was some time for discussions about the future of the organization as well as the future of "systems biology". For me personally, these debates helped to crystallize many of my own doubts. I am a biochemist but spent 90% of my PhD doing computational work. At this point I feel very much like a jack of all trades and master of none. In my previous work I have mostly hit walls due to lack of data so I plan to spend the next few years learning a lot more about experimental work. Still, it is hard to be sure of what is best for the future. How much should I sacrifice in productivity to learn new skills ? Is it best to work as a specialist in interdisciplinary teams or be trained as an interdisciplinary person (Eddy SR, PLoS Comp Bio 2005) ?

The broad scope of HFSP was well reflected in the topics presented in the meeting (PDF of program). There were many interesting talks, like the keynote by Takao Hensch about "How experience shapes the brain", in particular during the very early stages of life. He showed amazing work about "windows of opportunity" in learning and how these can be manipulated genetically or pharmacologically. Still, when I was looking around in the poster session I could not help but feel a certain lack of interest since most of the topics were outside my previous work experience. This brings me back to the topic of specialization. Isn't it upsetting that we have to specialize so ? I don't think I can read and enjoy more than a third of a typical issue of Nature. This is for me the curse of specialization: it narrows not only your skills but also your interests and curiosity.


Tokyo/Kyoto

Aside from the science, this was my first trip to Japan. I really liked it and hope to come back one day with more time to explore. I loved the temples, gardens, food, colors and all the differences.



Sunday, April 26, 2009

Guestimating PLoS ONE impact factor

Abhishek Tiwari did some analysis on the number of citations that PLoS ONE is getting so far using the Scopus database. We had a small discussion over the numbers on FriendFeed and I ended up looking at a different set of values, also from Scopus. I tried to predict the first Impact Factor for PLoS ONE, which might be out sometime this year.

Before showing the numbers I will repeat that I think the IF of the journal where a paper is published is a very poor measure of a paper's importance. Although it is probably a good measure of the relative value of a journal (within a given field) we should be striving to pick what we read based on the value of a paper instead of the journal.

The Impact Factors that will be published this year are calculated as the total number of citations from 2008 to papers published in 2006 and 2007, divided by the number of citable units in 2006-2007 (articles and reviews). The data I am looking at is from Scopus so it varies a bit from the one in ISI. The variability comes from the decision of what to include as "citable" articles and from the journals that are covered in Scopus versus ISI.
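
Written out as code, the definition above is just a ratio. The sketch below is only an illustration: the function name and the example counts are made up and are not the Scopus or ISI numbers discussed in this post.

```python
def impact_factor(citations_in_year, citable_items):
    """Impact factor for year Y.

    citations_in_year: citations made in year Y to anything the journal
                       published in years Y-1 and Y-2
    citable_items:     articles and reviews published in years Y-1 and Y-2
    """
    if citable_items <= 0:
        raise ValueError("the journal needs at least one citable item")
    return citations_in_year / citable_items


# Hypothetical journal: 1200 citations in 2008 to 400 citable items from 2006-2007
example_if = impact_factor(1200, 400)
```

Note that the numerator counts citations to everything published in the two-year window, while the denominator counts only articles and reviews, which is one reason why the choice of "citable" items matters so much.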

One problem I found with Scopus data was that, for some journals, the database has multiple entries due to small variations in article titles. For PLoS Biology, PLoS Computational Biology and PLoS Genetics the number of articles published should be less than half of what is reported. This does not appear to be the case for PLoS ONE.
I downloaded the tables of published articles and tried to remove redundancies by looking at the titles and authors. I counted only articles and reviews as citable items but used all articles published in 2006-2007 to get the number of citations in the year 2008. I also did the same calculations for the impact factor of the previous year to be able to compare with the data from ISI. The results were comparable but not the same.
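
The clean-up step can be sketched along these lines. This is a rough reconstruction of the idea and not the procedure I actually followed: the record fields (`title`, `first_author`, `doc_type`, `citations`) are hypothetical and do not match the real Scopus export columns, and summing the citations of duplicated entries is an assumption that should be checked against the raw data.

```python
import re


def normalize_title(title):
    """Lowercase a title and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def deduplicate(records):
    """Merge records that share a normalized title and first author."""
    merged = {}
    for rec in records:
        key = (normalize_title(rec["title"]), rec["first_author"].lower())
        if key not in merged:
            merged[key] = dict(rec)
        else:
            # Assumption: citations are split between duplicate entries, so add them
            merged[key]["citations"] += rec["citations"]
    return list(merged.values())


def estimate_impact_factor(records, citable_types=("Article", "Review")):
    """Citations to all items divided by the number of citable items."""
    unique = deduplicate(records)
    total_citations = sum(rec["citations"] for rec in unique)
    n_citable = sum(1 for rec in unique if rec["doc_type"] in citable_types)
    if n_citable == 0:
        raise ValueError("no citable items found")
    return total_citations / n_citable


# Made-up records: the first two differ only by punctuation, spacing and case
toy_records = [
    {"title": "Protein length and phosphorylation.", "first_author": "Smith",
     "doc_type": "Article", "citations": 3},
    {"title": "Protein length and  phosphorylation", "first_author": "smith",
     "doc_type": "Article", "citations": 1},
    {"title": "A reply", "first_author": "Jones",
     "doc_type": "Letter", "citations": 2},
]
toy_estimate = estimate_impact_factor(toy_records)
```

Matching on normalized title plus first author is deliberately conservative. It will miss duplicates whose titles differ by more than punctuation or spacing, so a manual check of the remaining near-matches is still worthwhile.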



In summary, PLoS ONE might get an impact factor of about half of that expected for PLoS Computational Biology. The usual disclaimers apply: I have no idea how complete the Scopus data is or how exactly it relates to ISI.

Sunday, March 22, 2009

Thank you Nature

A while ago Euan Adie from Nature asked for help to categorize comments in PLoS ONE for analysis. A lot of people took some time to read some of the comments and the final results of this crowdsourcing effort were made available here. They randomly selected two people from the users that contributed some time for this to get some Nature branded ... stuff. I was one of the two lucky recipients. It took a while, but it arrived today:

Thank you NPG for the kind gifts, next time .. white t-shirt ?! :)

Monday, November 17, 2008

Why do we blog?

Martin Fenner asked science bloggers in Nature Networks some questions that I think are interesting. Plus, the meme is going around my blogging neighbourhood so I thought I would join in as well:

1. What is your blog about?
It is mostly about science and technology with a particular focus on evolution, bioinformatics and the use of the web in science.

2. What will you never write about?
I will never blog about blog memes like this one. I tend to stay away from religion and politics but never is a very strong word.

3. Have you ever considered leaving science?
Does this mean academic research, research in general or science in general ? In any case no. I love problem solving and the freedom of academic research. The only thing I dislike about it is not being sure that I can keep doing this for as long as I wish.

4. What would you do instead?
If I could not do research I would probably try to work in scientific publishing. Doing research usually means that we have to focus on a very narrow field. Editors on the other hand are almost forced to broaden their scope and I think I would like this. I would also be interested in the use of new technologies in publishing.

5. What do you think will science blogging be like in 5 years?
Five years is a lot of time for the pace of technological development but not a long time for cultural change. I could be wrong but, if anything, there will be only a small increase in adoption of blogging as part of personal and group online presence along with the already existing web pages. I wish blogging (and other tools) would be used to further decentralize research agendas from physical location but I don't think that will happen in 5 years.

6. What is the most extraordinary thing that happened to you because of blogging?
I have gained a lot from blogging. The most concrete example was an invitation to attend SciFoo but there are many other things that are harder to evaluate. In some ways it is related to the benefits of attending conferences. It is useful because you get to interact with other scientists, exchange ideas, and are forced to think through different perspectives, etc.

7. Did you write a blog post or comment you later regretted?
I probably did but I don't remember an example right now.

8. When did you first learn about science blogging?
Like many other bioinformatics bloggers I started blogging in Nodalpoint, according to the archives in November 2001. I started this blog some two years after that.

9. What do your colleagues at work say about your blogging?
Not much really, I don't think many of them are aware of it. If anything, the responses have been generally positive but I don't usually find many people interested in knowing more about blogging in science.

Wednesday, November 12, 2008

Open Science - just do it

My blog is 5 years old today and to celebrate I am trying to actually do some blogging. There are a couple of reasons why I have blogged less in the past months. In part it was due to FriendFeed and also in part because I was trying to finish a project on the evolution of phospho-regulation in yeast species. Nearing the end of a project should actually provide some of the most interesting blogging material but I did not ask for permission from everyone involved to write about ongoing work.

I have to admit that although I have been discussing and evangelizing open science for over two years I have done very little of it. I have used this blog sometimes to put up small analyses or mini-reviews but never to describe ongoing projects. I have tried to start a side-project online but I over-estimated the amount of "spare cycles" I have for this. So, I have talked it over with my supervisor and I am now free to "risk" as much as I want in trying out Open Science. The first project I will be trying to work on will be on E3 target prediction and evolution.

Prediction and evolution of E3 ubiquitin ligase targets
As I have mentioned above, I have been working in the past months on the evolution of phosphorylation and kinase-substrate interactions in yeast species. I am interested in the evolution of regulatory interactions in general because I believe that they are important for the evolution of novel phenotypes. This is why I will be trying to study the evolution of E3 target interactions. In order to get there I will try first to develop some methods to predict ubiquitination and E3 targets. Since a lot of the ideas and methodology apply to other post-translational modifications and even localization signals I will in the future try to generalize the findings to other types of interactions.

Some of the questions that I will try to address:
- How accurately can we predict E3 substrates ?
- How quickly in evolution do E3-targets change ?
- Is there co-regulation by kinases and E3s on the same targets (and how does this evolve) ?

Once I have something substantial I will open a code repository on Google Code.