Tuesday, November 27, 2007

Bio::Blogs #17 - call for submissions

The 17th edition of Bio::Blogs will be hosted by Paulo Nuin at Blind.Scientist. Submissions of interesting bioinformatics-related blog posts from this month can be sent, until the end of November, to the usual address (bioblogs at gmail dot com) or to nuin at genedrift dot org.

There is also still time to submit blog posts to the OpenLab 2007 compilation.

Monday, November 19, 2007

Linking Out - Open Science and a new blog

Cameron Neylon posted a request for collaboration in his blog:
...we are using the S. aureus Sortase enzyme to attach a range of molecules to proteins. We have found that this provides a clean, easy, and most importantly general method for attaching things to proteins.
(...)
We are confident that it is possible to get reasonable yields of these conjugates and that the method is robust and easy to apply. This is an exciting result with some potentially exciting applications. However to publish we need to generate some data on applications of these conjugates.


They are looking for collaborators interested in applying this method. Go check the blog posts if you are interested or know someone that works on something similar.

(via Open Access News) Liz Lyon, Associate Director of the UK Digital Curation Centre, posted an interesting presentation on Open Science: "Open Science and the Research Library: Roles, Challenges and Opportunities?".

(via Fungal Genomes) I found a new blog related to evolution called Thirst for Science with a lot of insightful posts.

Linking out - Personalized medicine

Personalized medicine continues to climb the hype cycle. I have been getting most of the best news coverage on the subject from blogs.

- Bertalan Meskó reviews companies focused on personalized medicine (see part I and II)

- Attila Csordas and Deepak Singh cover the social aspects of personal health and the tie-in to 23andMe

- Gareth Palidwor reads into the details to speculate that the business model of 23andMe might be to sell the aggregated user data.

- Gene Sherpas puts on the brakes, describing the hype as Genomic Voyeurism

I am concerned that all the attention given to the genomics side of personalized medicine will distort the relative importance of nature versus nurture. Everyone craves a peek at their own destiny and at their roots. These services hope to provide both by looking at our DNA. I don't think they can really do this reliably, but nothing stops them from luring people.

Tuesday, November 13, 2007

Last call for Open Laboratory 2007

Bora has issued a last call for submissions to the Science Blogging anthology of 2007. As last year, the objective is to collect some of the best science blog posts of the year and compile them into a book to print on demand (deadline on December 20th 2007). Submissions can be sent using an online form and they will be reviewed by a panel that will compile the final list.
Anyone interested in participating can send in links to their favorite blog posts of the year and also volunteer to be part of the reviewing process (see instructions here).

Monday, November 12, 2007

4th year blog anniversary

It is hard to believe that it has been 4 years since I started blogging here. Not that I am a very prolific blogger, with only 328 blog posts in this time. These are not very evenly distributed, with more than 200 blog posts in the last two years. The style of the blog posts also changed a lot, from a link blog with a few sentences to longer, more opinionated posts.

Having a glance at the blog posts, it is easy to find some very weird ones :)
Your Identity Aura (2005)
Our Collective Mind (2005)
The Human Puppet (2005)
Social Network Dynamics in a Conference Setting (2006)
The Fortune Cookie Genome (2007)

There are a lot of serious ones too but I will leave that list for some other time.

Thanks to Nodalpoint and the Nodalpoint regulars (Greg, Neil, Alf and Chris) for introducing me to blogging some 6 years ago and to everyone else that joined in along the way with their blogs and/or comments. It sure makes blogging more enjoyable.

(Image Credit: Picture taken by mattnjuzz and published under CC by-nc-sa. Originally taken from Flickr)

Saturday, November 10, 2007

Predicting functional association using mRNA localization

About a month ago Lécuyer and colleagues published a paper in Cell describing an extensive study of mRNA localization in Drosophila embryos during development. The main conclusion of this study was that a very large fraction (71%) of the genes they analyzed (2314) had localization patterns during some stage of embryonic development. This includes both embryonic and sub-cellular localizations.

A lot of information was gathered in this analysis and it should serve as a resource for further studies. There is information for different developmental stages, so it should also be possible to look at the dynamics of mRNA localization. Another application of this data would be to use it as an information source to predict functional association between genes.

Protein localization information has been used in the past for prediction of protein-protein interactions (both physical and genetic interactions). Typically this is done by integrating localization with other data sources in a probabilistic analysis [Jansen R et al. 2003, Rhodes DR et al. 2005, Zhong W & Sternberg PW, 2006].

To test if mRNA localization could be used in the same way, I took from this website the localization information gathered in the Cell paper, along with available genetic and protein interaction information for D. melanogaster genes/proteins (this can be obtained, for example, from BioGRID, among others). For this analysis I grouped physical and genetic interactions together to have a larger number of interactions to test. The underlying assumption is that both should imply some functional association of the gene pair.

The very first simple test is to have a look at all pairs of genes (with available localization information) and test how the likelihood that they interact depends on the number of cases where they were found to co-localize (see figure below). I discarded any gene for which no interaction was known.
As seen in the figure, there is a significant correlation (r=0.63, N=21, p<0.01) between the likelihood of interaction and the number of co-localizations observed for the pair. At this point I did not exclude any localization term, but since images were annotated using a hierarchical structure these terms are in some cases very broad.
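The test described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the gene names, localization terms and interactions are made-up toy data, the function names are hypothetical, and the code counts shared terms per pair and reports the fraction of interacting pairs for each count rather than reproducing the exact procedure behind the figure.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def colocalization_counts(gene_terms, interactions):
    """Count shared localization terms for every pair of genes.

    Genes without any known interaction are discarded first.
    """
    with_interaction = {g for pair in interactions for g in pair}
    genes = sorted(g for g in gene_terms if g in with_interaction)
    return {
        (a, b): len(gene_terms[a] & gene_terms[b])
        for a, b in combinations(genes, 2)
    }


def interaction_fraction_by_count(counts, interactions):
    """Fraction of pairs that interact, grouped by co-localization count."""
    known = {tuple(sorted(pair)) for pair in interactions}
    total = defaultdict(int)
    hits = defaultdict(int)
    for pair, n_shared in counts.items():
        total[n_shared] += 1
        if pair in known:
            hits[n_shared] += 1
    return {n: hits[n] / total[n] for n in sorted(total)}


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy data (hypothetical): gene -> set of localization terms.
gene_terms = {
    "g1": {"a", "b", "c"},
    "g2": {"a", "b", "c"},
    "g3": {"a", "b"},
    "g4": {"d"},
    "g5": {"a"},  # no known interaction, so it is dropped
}
interactions = [("g1", "g2"), ("g2", "g3"), ("g3", "g4")]

counts = colocalization_counts(gene_terms, interactions)
fractions = interaction_fraction_by_count(counts, interactions)
r = pearson_r(list(fractions.keys()), list(fractions.values()))
print(fractions, r)
```

With real data the counts would be binned (as in the figures) before computing the fraction, since high co-localization counts are rare and the per-count fractions become noisy.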

More specific patterns should be more informative, so I removed very broad terms by checking the fraction of genes annotated to each term. I created two groups of narrower scope, one excluding all terms annotated to more than 50% of genes (denominated "localizations 50") and a second excluding all terms annotated to more than 30% of genes ("localizations 30"). In the figure below I binned gene pairs according to the number of co-localizations observed in the three groups of localization terms and for each bin calculated the fraction that interact.
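As a rough sketch of this filtering step, assuming annotations are held as a mapping from gene to a set of terms: the term names and the helper functions below are hypothetical and only illustrate the 50% and 30% cut-offs on toy data.

```python
from collections import Counter


def term_frequencies(gene_terms):
    """Fraction of genes annotated to each localization term."""
    n_genes = len(gene_terms)
    term_counts = Counter(t for terms in gene_terms.values() for t in terms)
    return {term: count / n_genes for term, count in term_counts.items()}


def drop_broad_terms(gene_terms, max_fraction):
    """Exclude terms annotated to more than max_fraction of the genes."""
    freq = term_frequencies(gene_terms)
    keep = {term for term, f in freq.items() if f <= max_fraction}
    return {gene: terms & keep for gene, terms in gene_terms.items()}


# Toy annotations (hypothetical term names).
gene_terms = {
    "g1": {"embryo", "posterior", "pole_cell"},
    "g2": {"embryo", "posterior"},
    "g3": {"embryo"},
    "g4": {"embryo"},
}

localizations_50 = drop_broad_terms(gene_terms, 0.50)
localizations_30 = drop_broad_terms(gene_terms, 0.30)
print(localizations_50["g1"], localizations_30["g1"])
```

Here "embryo" (all genes) is removed from both groups, while "posterior" (exactly half of the genes) survives the 50% cut-off but not the 30% one.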

As expected, more specific mRNA localization terms (localizations 30) are more informative for prediction of functional association, since fewer terms are required to obtain the same or higher likelihood of interaction. The increased likelihood does not come at the cost of fewer pairs annotated. For example, there is a similar number of gene pairs in bin "10-14" of the more specific localization terms (localizations 30) as in the bin ">20" for all localization terms (see figure below).
It is important to keep in mind that mRNA localization alone is a very poor predictor of genetic or physical interaction. I took the number of co-localizations of each pair (using the terms in "localizations 30") and plotted a ROC curve to determine the area under the ROC curve (AROC or AUC). The AROC value calculated was 0.54, with a 95% confidence lower bound of 0.52 and a p-value of 6E-7 for the null hypothesis that the true area is 0.5. So it is not random (that would be 0.5), but by itself it is a very poor predictor.
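For readers who want to repeat this kind of check, the area under the ROC curve can be computed directly from the scores, because it equals the probability that a randomly chosen interacting pair has a higher co-localization count than a randomly chosen non-interacting pair (ties counting as one half). The sketch below uses that equivalence on made-up toy data; it is not the tool used for the numbers above, and it does not compute the confidence bound or p-value.

```python
def roc_auc(scores, labels):
    """AUC as P(score of positive > score of negative), ties count 0.5.

    Simple O(n_pos * n_neg) version; fine for a sketch, slow for
    millions of gene pairs (a rank-based version scales better).
    """
    positives = [s for s, lab in zip(scores, labels) if lab]
    negatives = [s for s, lab in zip(scores, labels) if not lab]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data (hypothetical): co-localization count per gene pair and
# whether the pair has a known genetic or physical interaction.
colocalizations = [3, 2, 0, 2, 0, 0]
interacts = [1, 0, 0, 1, 0, 1]

auc = roc_auc(colocalizations, interacts)
print(round(auc, 3))
```

A value of 0.5 means the score carries no information about interaction, which is why an AUC of 0.54 counts as a weak, if non-random, signal.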

In summary:
1) the degree of mRNA co-localization significantly correlates with the likelihood of genetic or physical association.
2) less ubiquitous mRNA localization patterns should be more informative for interaction prediction
3) the degree of mRNA co-localization is by itself a poor predictor of interaction but it should be possible to use this information to improve statistical methods to predict genetic/physical interactions.

This was a quick analysis, not thoroughly tested and just meant to confirm that mRNA localization should be useful for genetic/physical interaction predictions. I am not going to pursue this, but if anyone is interested, it could be worthwhile to see which terms have more predictive power, with the idea of integrating this information with other data sources or possibly directing future localization studies. Perhaps there is little point in tracking different developmental stages, or maybe embryonic localization patterns are not as informative as sub-cellular localizations for predicting functional association.


Jansen R, Yu H, Greenbaum D, Kluger Y, Krogan NJ, Chung S, Emili A, Snyder M, Greenblatt JF, Gerstein M. A Bayesian networks approach for predicting protein-protein interactions from genomic data. Science. 2003 Oct 17;302(5644):449-53.
Rhodes DR, Tomlins SA, Varambally S, Mahavisno V, Barrette T, Kalyana-Sundaram S, Ghosh D, Pandey A, Chinnaiyan AM. Probabilistic model of the human protein-protein interaction network. Nat Biotechnol. 2005 Aug;23(8):951-9.
Zhong W, Sternberg PW. Genome-wide prediction of C. elegans genetic interactions. Science. 2006 Mar 10;311(5766):1481-4.

Thursday, November 08, 2007

What I don't like about BPR3

For those who have not heard about it before, BPR3 stands for Bloggers for Peer-Reviewed Research Reporting. From their website:

"Bloggers for Peer-Reviewed Research Reporting strives to identify serious academic blog posts about peer-reviewed research by offering an icon and an aggregation site where others can look to find the best academic blogging on the Net."

It is all great, except that this already exists and has for a long time before BPR3. You can go to the papers section in Postgenomic and select papers by the date they were published or were blogged about, by how many bloggers mentioned the paper, or limit the search to a particular journal. I even used this earlier this year to suggest that the number of citations increases with the number of blog posts mentioning the paper.

In this case I think that unless they really aim to develop something that is better than what Postgenomic already offers, the added competition will only fragment an already poor market. The value of a tracking site like Postgenomic, Techmeme or what BPR3 is proposing to create increases with the user base in a non-linear way. This is what people usually refer to as network effects in social web applications. An increasing number of users makes the sites more useful, reinforcing the importance of the social application. I suspect Postgenomic is not closed in any way to discussions. The code is even available here for re-use. So, why can't BPR3 and Postgenomic work this out and have a single tracking database and presentation? Let's say that BPR3 could be a mirror for the Postgenomic papers section (why re-invent the wheel?).


I am not in favor of any particular site (sorry Euan :). What I think would be useful is:
1) common standards for everyone (publishers, bloggers, etc) to carry information on published literature (number of times a paper was read, ratings, comments, blog posts, e-notebook data, etc) attached to a single identifier (DOI sounds fine)
2) one independent tracking site with enough users to gain hub status such that everyone gains from high exposure to the science crowd.

Thursday, November 01, 2007

The right to equivalent response

(disclaimer: I worked for Molecular Systems Biology)

The last issue of PLoS Biology carries an editorial about Open Access written by Catriona J. MacCallum. It addresses the definition of Open Access and what the author considers an "insidious" trend of obscuring "the true meaning of open access by confusing it with free access".

I agree with the main point of the editorial, that we should keep in mind the definition of open access and that the capacity to re-use a published work should have more value to the readers.

However, it is very unfortunate that the very first example MacCallum picks on is the Molecular Systems Biology journal, for the simple fact that very recently they have changed their publishing policies to address exactly this issue. Authors can choose one of two CC licenses, deciding for themselves if they want to allow derivatives of their work or not. See the post at the MSB blog. As explained in the blog post, the discussions about the licenses actually started several months ago and I think the final implementation is a very balanced decision on their part.

Thomas Lemberger, editor at MSB, wrote a reply to the editorial that PLoS decided to publish as a response from the readers. These can only be seen if readers decide to click the link "Read Other Responses" on the right side of the online version.

I am obviously biased but for me this is not really giving the right to equivalent response. It would not have cost them much to issue a correction or publish the letter as correspondence where it would have the same visibility as the editorial. This would signal that they are indeed committed to collaborating with other publishers and journals that support open access (as stated in PLoS core principles).

Bio::Blogs #16, The one with a Halloween theme

The 16th edition of Bio::Blogs is now available at Freelancing science. Jump over there for a summary of what has been going on during this month in the bioinformatics-related blogs, if for nothing else then just to have a look at the pumpkin. Thanks again to everyone who participated.
Paulo Nuin from Blind.Scientist has volunteered to host the 17th edition, which is scheduled to appear as usual on the 1st of December.