Monday, January 22, 2007

Social gene annotation in Connotea

There has been a lot of excitement over recent developments in web technology. Time magazine recognized this by announcing that, instead of profiling an individual in its annual Person of the Year issue, it had selected "You" as the most influential group of last year. This "You" refers to everyone out there on the web building, interacting, blogging, and uploading videos and pictures for the world to see. As with almost every rising meme, a backlash is inevitable. Some see this web euphoria as little more than global narcissism.

This social web holds the promise of more efficient collaboration, but clear examples may still be lacking. Scientists, given our need to communicate and collaborate, are a group that could do more to take advantage of these tools. Unfortunately, we seem to be too unaware of them and too slow to pick them up.

I have shown before that the accumulating body of knowledge in Connotea, in the form of simple tagging of science papers, can in principle be used to highlight papers of higher impact.

I thought that it might also be possible to mine Connotea to retrieve gene annotations. I tested whether manuscripts tagged as "cell-cycle" and "yeast" would contain, in their abstracts, mostly gene names related to the cell cycle in yeast. There are currently 38 papers in Connotea tagged as cell-cycle and yeast with an associated PubMed ID. I used a dictionary of S. cerevisiae gene names obtained from SGD and retrieved the abstracts for the 38 manuscripts using eUtils.

Within these abstracts, a simple pattern match against the dictionary found 38 gene names. To evaluate the performance of this social gene annotation, I took the functions and processes associated with these genes from SGD's GO Slim mapping. I also included the gene description from the gene name registry.
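The pattern-matching step can be sketched roughly as follows. This is a minimal illustration and not the script I actually ran: the function name `find_gene_names`, the tokenization rule, and the handling of the trailing "p" in protein names are all assumptions, and the abstracts and dictionary are passed in as plain Python collections rather than fetched from eUtils or SGD.

```python
import re


def find_gene_names(abstracts, gene_dictionary):
    """Return the dictionary gene names mentioned in at least one abstract."""
    known = {name.upper() for name in gene_dictionary}
    found = set()
    for text in abstracts:
        # Split into alphanumeric tokens, so "Swi4p-Swi6p" yields two tokens.
        # Simplification: hyphenated names would need extra handling.
        for token in re.findall(r"[A-Za-z][A-Za-z0-9]*", text):
            candidate = token.upper()
            if candidate in known:
                found.add(candidate)
            # Assumption: a trailing "p" marks the protein form (Swi5p -> SWI5).
            elif candidate.endswith("P") and candidate[:-1] in known:
                found.add(candidate[:-1])
    return found


if __name__ == "__main__":
    # Toy inputs for illustration only, not the real Connotea abstracts.
    toy_abstracts = [
        "Cln3 activates Cdc28 in late G1.",
        "Expression is controlled by Swi4p-Swi6p and Swi5p.",
    ]
    toy_dictionary = ["CDC28", "CLN3", "SWI4", "SWI5", "SWI6", "ASH1"]
    print(sorted(find_gene_names(toy_abstracts, toy_dictionary)))
```

A match this naive will also pick up gene names that happen to look like ordinary words or abbreviations, which is one reason the hits need to be checked against known annotations.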

Table 1 - Known GO process/function annotations and gene function descriptions associated with the genes predicted by social annotation to participate in the cell cycle in yeast.

Of the 38 genes, 14 (~37%) are annotated in the GO Slim annotation as participating in cell cycle, meiosis or cytokinesis. Of the remaining genes, 15 (~39% of the 38) have a described function associated with the cell cycle (e.g. G1 cyclin involved in cell cycle progression, expression restricted to mother cells in late G1 as controlled by Swi4p-Swi6p, Swi5p and Ash1p, etc.). In total, roughly 76% of the gene names obtained are associated with the cell cycle in S. cerevisiae.
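For anyone who wants to check the arithmetic, the percentages come from simple fractions of the 38 matched genes. The sketch below only restates the counts given above; the function name `percent_of` is made up for illustration.

```python
def percent_of(count, total):
    """Return count as a percentage of total."""
    return 100.0 * count / total


TOTAL_GENES = 38       # gene names matched in the abstracts
GO_SLIM_HITS = 14      # annotated as cell cycle, meiosis or cytokinesis
DESCRIPTION_HITS = 15  # cell-cycle related by gene description only

if __name__ == "__main__":
    print(round(percent_of(GO_SLIM_HITS, TOTAL_GENES)))      # 37
    print(round(percent_of(DESCRIPTION_HITS, TOTAL_GENES)))  # 39
    print(round(percent_of(GO_SLIM_HITS + DESCRIPTION_HITS, TOTAL_GENES)))  # 76
```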


This simple test highlights the potential usefulness of social bookmarking of science papers. However, it was limited to a very specific field and to a very small number of annotated manuscripts. Hopefully someone can come up with a better way of testing this :).