Thursday, July 17, 2008

ISMB 2008


I am leaving soon for Toronto to attend ISMB 2008. I usually stay away from big conferences since it is typically easier to find time to talk to everyone at small ones. The nice thing about attending a big conference is that it seems like everyone is there. There is no shortage of science bloggers attending, and it is going to be nice to meet the people behind some of the blogs for the first time.

There is a room in FriendFeed where several of the people attending are gathered, and for those not going it will probably be a good place to check for coverage of the conference. Alternatively, here is a list of bloggers who are attending ISMB or some of the conferences before/after it:

Saturday, July 05, 2008

On the PLoS business model

Declan Butler wrote a news article about PLoS' business model that has generated a lot of discussion. A good summary of blog reactions is available from Bora's blog, and there is a long discussion thread at FriendFeed.

It is hard to read the piece as impartial reporting due to its generally negative undertone. The article describes PLoS ONE as a database and refers to PLoS ONE and the other lower-impact PLoS journals as "bulk, cheap publishing of lower quality papers". I have nothing against the factual content of the news piece, and from that perspective it is an interesting report on the PLoS business model. According to the story, PLoS is on track to become economically self-sustaining within two years. We learn that this is possible because PLoS has expanded as a publisher to cover a broader range of subjects and different degrees of perceived impact. This is hardly surprising. I wrote a year ago:
"On an author pays model, the most obvious way to limit the cost per paper and still provide a solid evaluation of perceived impact, is to have journals that cover the broad spectrum of perceived impact. In this way, for the publisher, the overall rejection rates decrease, the papers are evaluated and directed to the appropriate "level" of perceived impact."

Most people agree that, in principle, Open Access publishing would benefit science. Up until now publishers have been reluctant to admit that there is a viable business model based on author fees. Some open access publishers (including BioMedCentral) were already showing that this was a viable business model, but PLoS will be the first to have a viable business model with high impact factor journals among the journals it publishes.
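To make the argument in the quote above a little more concrete, here is a toy calculation. It assumes, hypothetically, that every evaluation of a submission costs the same fixed amount and that author fees have to cover the evaluation cost of all submissions, including the rejected ones. The acceptance rates and the cost per evaluation are invented for illustration; they are not PLoS figures.

```python
# Toy cost model for an author-pays publisher. All numbers are hypothetical.

def cost_per_published_paper(submissions, acceptance_rate, cost_per_evaluation):
    """Single journal: every submission is evaluated, only accepted papers pay."""
    accepted = submissions * acceptance_rate
    return submissions * cost_per_evaluation / accepted


def cascade_cost_per_paper(submissions, acceptance_rates, cost_per_evaluation):
    """Journals ordered from highest to lowest perceived impact.

    A paper rejected by one journal is evaluated again by the next one, so
    this ignores any savings from passing reviews along the cascade.
    """
    remaining = float(submissions)
    total_cost = 0.0
    published = 0.0
    for rate in acceptance_rates:
        total_cost += remaining * cost_per_evaluation
        accepted = remaining * rate
        published += accepted
        remaining -= accepted
    return total_cost / published


single = cost_per_published_paper(1000, 0.10, 200)
cascade = cascade_cost_per_paper(1000, [0.10, 0.70], 200)
print(f"single journal: {single:.0f} per paper, two-tier cascade: {cascade:.0f} per paper")
```

With these made-up numbers the single selective journal needs about 2000 per published paper, while the two-tier cascade needs about 520, simply because a much smaller fraction of the evaluated papers ends up unpublished.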

Two of the most interesting comments on this discussion so far have come from Timo Hannay at Nascent and from Lars Juhl Jensen.
Timo argues that PLoS has failed to show that a publisher can sustain a business model with only journals of high editorial input (high rejection rates and high perceived impact). He also argues that the existence of PLoS creates a barrier to entry for other science publishers interested in publishing under an open access (OA) model. There is no argument against the first statement: so far I have not seen any publisher that has managed to reduce the costs of maintaining such "high impact" journals to the point where author fees would be sufficient. I think this is possible, and the PLoS Community journals are the closest to it, but that is another discussion.
Where I disagree with Timo is on the claim that PLoS somehow creates barriers to entry for other OA publishers. PLoS did require (and still requires) philanthropic grants to establish itself, but pioneers typically have a harder time than creative followers. Anyone trying to follow PLoS has access to its record of successes and failures, detailed financial reports and (I think) even the publishing infrastructure it has developed.

Most people know that the strongest barrier to entry in scientific publishing is the perception of quality. NPG has used this to its advantage many times. Journals with the Nature brand typically establish themselves quickly among the top journals in their field. I am sure Nature invests a lot in excellent professional editors, but without the Nature brand supporting these journals the editors would have few strong submissions to choose from in the first place. NPG also publishes many more journals than the Nature-branded ones and, as Lars has pointed out, the majority of these have lower impact factors. I don't think financial information is available, so it is hard to know what fraction of NPG's income comes from the high impact versus the lower impact journals.

Going back to one of Timo's main points, I don't agree that PLoS creates barriers to market entry for other OA publishers, and certainly not because it used philanthropic grants until it reached the break-even point. If there are barriers in the market, they are due to the perception of quality and strong brand names. Here OA publishers have an added advantage: creating a strong brand is easier when most people perceive OA as something good. From the example of PLoS and, to some extent, BMC, there are now clear paths for any publisher (especially one with a strong brand name) to set up a viable OA business model.

Tuesday, July 01, 2008

Bioinformatics around the globe

Did you ever want to have a global impression of the field of bioinformatics? What types of tools do people use, or how different is the work in academia versus industry? Michael Barton from Bioinformatics Zen created a survey that will be running for the next month (until the 1st of August) and tries to address some of these questions. The more people complete the survey, the more informative the picture will be. The survey is anonymous and all the information will be made available to those interested in analyzing it.
If you have a blog you can re-post the survey there (see instructions here) or send a link to any of the blog pages hosting it to other bioinformatics/computational biology researchers.

Saturday, June 28, 2008

Capturing biology one model at a time

Mathematical and computational modeling is (I hope) a well accepted requirement in biology. These tools allow us to formalize and study complex systems that are hard to reason about intuitively. There have been great advances in our capacity to model different biological systems, from single components to cellular functions and tissues. Many of these efforts have progressed separately, each one dealing with a particular layer of abstraction (atoms, interactions, cells, etc.), and some are now reaching a level of accuracy that rivals some experimental methods. In a series of blog posts I will try to summarize the main advances behind some of these models and examples of integration between them, with particular emphasis on proteins and cellular networks. I invite others to post about models in their areas of interest to be collected for a review.

From sequence to fold
Once produced, RNA and proteins adopt structures that have different functional roles. In principle, all the information required to determine the structure is in the DNA sequence that encodes the RNA/protein. Although there has been some success in predicting RNA structure from sequence, ab-initio protein folding remains a difficult challenge (see review by R. Das and D. Baker). A more pragmatic approach has been to use the growing structural and sequence data in public databases to develop sequence-based models for protein domains. In this way, for well-studied protein folds it is possible to ask the reverse question: which sequences are likely to adopt this fold?
(To be expanded in a future post, volunteers welcome)
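As a minimal illustration of a sequence-based model for a domain family, the sketch below builds a position-specific scoring matrix (PSSM) from a toy ungapped alignment and scores new sequences against it. Resources such as Pfam use profile hidden Markov models, which also handle insertions and deletions; the PSSM here is a simpler stand-in, and the alignment, the uniform background and the pseudocount are all assumptions.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Log-odds position-specific scoring matrix from an ungapped alignment.

    Assumes a uniform background (1/20 per amino acid) and adds a pseudocount
    so residues never seen at a position get a finite, negative score.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("alignment rows must have equal length")
    background = 1.0 / len(AMINO_ACIDS)
    total = len(alignment) + pseudocount * len(AMINO_ACIDS)
    pssm = []
    for i in range(length):
        counts = Counter(seq[i] for seq in alignment)
        pssm.append({aa: math.log2(((counts[aa] + pseudocount) / total) / background)
                     for aa in AMINO_ACIDS})
    return pssm


def score(pssm, sequence):
    """Sum of per-position log-odds scores; higher means more family-like."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence length must match the PSSM")
    return sum(column[aa] for column, aa in zip(pssm, sequence))


family = ["ACDE", "ACDE", "ACDF", "SCDE"]  # hypothetical toy alignment
pssm = build_pssm(family)
print(score(pssm, "ACDE"), score(pssm, "WWWW"))
```

A sequence resembling the family members scores well above an unrelated one, which is the "reverse question" in its simplest form: how well does this sequence fit what we know folds this way?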

Protein binding models

I am particularly interested in how proteins interact with other components (mainly other proteins and DNA) and in trying to model these interactions from sequence to function. I will leave protein-compound interactions and metabolic networks to more knowledgeable people.
As mentioned above, even without a complete ab-initio folding model it is possible, for some sequences, to predict their structure or to determine from comparative genomics analysis which protein/domain family they belong to. By itself this might not be very informative from a cellular perspective. We need to know how cellular components interact and how these interconnected components create useful functions in a cell.

Docking
Understanding and predicting how two proteins interact in a complex has been a challenge of structural computational biology for more than two decades. The first attempt to understand protein interactions through computational analysis of structural data (what is known today as docking) was published by Wodak and Janin in 1978. In this seminal study, the authors established a computational procedure to reconstitute a protein complex from simplified models of the two interacting proteins. In the decades that followed, the complexity and accuracy of docking methods have steadily increased, but docking still faces difficult hurdles (see reviews Bonvin et al. 2006, Gray, 2006). Docking methods start from the knowledge that two proteins interact and aim to predict the most likely binding interfaces and the conformation of these proteins in a 3D model of the complex. Docking approaches might one day also predict new interactions for a protein by exhaustively docking it against all other proteins in the species' proteome, but at the moment this is still not feasible.
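To give a flavor of what a docking search does, here is a toy rigid-body search: each protein is reduced to a few residue centroids, candidate poses are generated by rotating and translating one partner, and each pose gets a simple contact score. This is not the Wodak and Janin procedure; the scoring weights, the distance cut-offs and the restriction to rotations about a single axis are assumptions made to keep the example short.

```python
import itertools
import math

# Toy rigid-body docking: each protein is a list of (x, y, z) residue centroids.
# Scoring weights and cut-offs are illustrative assumptions, not a published potential.


def rotate_z(points, angle):
    """Rotate points about the z axis (a real search samples all 3D rotations)."""
    c, s = math.cos(angle), math.sin(angle)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def translate(points, shift):
    dx, dy, dz = shift
    return [(x + dx, y + dy, z + dz) for x, y, z in points]


def pose_score(receptor, ligand, contact=7.0, clash=3.0):
    """Reward residue pairs in contact and heavily penalize steric clashes."""
    score = 0
    for p in receptor:
        for q in ligand:
            d = math.dist(p, q)
            if d < clash:
                score -= 10
            elif d < contact:
                score += 1
    return score


def dock(receptor, ligand, n_angles=12, grid=range(-10, 11, 2)):
    """Exhaustively score rotations x translations; return (score, angle_index, shift)."""
    best = None
    for k in range(n_angles):
        rotated = rotate_z(ligand, 2 * math.pi * k / n_angles)
        for shift in itertools.product(grid, repeat=3):
            s = pose_score(receptor, translate(rotated, shift))
            if best is None or s > best[0]:
                best = (s, k, shift)
    return best


receptor = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
ligand = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0)]
print(dock(receptor, ligand))
```

Even this toy version scales with the number of rotations times translations times residue pairs; real programs sample all rotational degrees of freedom with much richer energy functions, which gives a sense of why docking every protein against a whole proteome is so demanding.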

Interaction types
It should still be possible to use the 3D structures of protein complexes to understand at least particular types of interactions. In a recent study, Aloy and Russell showed that it is possible to transfer structural information on protein-protein interactions by homology to other proteins with similar sequences (Aloy and Russell 2002). In this approach the homologous proteins are aligned to the sequences of the proteins in the 3D complex structure. Mutations in the homologous sequences are evaluated with an empirical potential to determine the likelihood of binding. A similar approach was described soon after by Lu and colleagues, and both have been applied in large-scale genomic studies (Aloy and Russell 2003; Lu et al. 2003). As with any other functional annotation by homology, this method is limited by how much the target proteins have diverged from the templates. Aloy and Russell estimated that interaction modeling is reliable above 30% sequence identity (Aloy et al. 2003). Substitutions can also be evaluated with more sophisticated energy potentials after a homology model of the interface under study is created. Examples of tools that can be used to evaluate the impact of mutations on binding propensity include Rosetta and FoldX.
Although the methods described above were mostly developed for domain-domain protein interactions, similar approaches have been developed for protein-peptide interactions (see for example McLaughlin et al. 2006) and protein-DNA interactions (see for example Kaplan et al. 2005).
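The homology-transfer idea can be sketched as follows: align the target pair to the template complex, check that sequence identity is above the roughly 30% threshold mentioned above, and re-score the template interface contacts using the target's residues. The residue-pair potential, the example sequences and the function names are invented for illustration; Aloy and Russell's empirical potential is derived from interfaces of known structure and their method is more involved than the simple energy difference reported here.

```python
# Sketch of interaction transfer by homology, loosely in the spirit of Aloy and Russell.
# The pair potential below is made up for illustration; real methods derive
# empirical potentials from many interfaces of known structure.


def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns where neither sequence has a gap."""
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def pair_energy(r1, r2, potential, default=0.0):
    """Symmetric lookup in a residue-pair potential (lower = more favorable)."""
    return potential.get((r1, r2), potential.get((r2, r1), default))


def interface_energy(seq_a, seq_b, contacts, potential):
    """Sum the potential over template contacts (column i in A, column j in B)."""
    return sum(pair_energy(seq_a[i], seq_b[j], potential) for i, j in contacts)


def assess_target(template_a, template_b, target_a, target_b, contacts, potential,
                  min_identity=30.0):
    """Compare a target pair with the template complex it is aligned to.

    Sequences are assumed to be pre-aligned (same length, '-' for gaps), and
    `contacts` are alignment columns in contact in the template structure.
    """
    identity = min(percent_identity(template_a, target_a),
                   percent_identity(template_b, target_b))
    delta = (interface_energy(target_a, target_b, contacts, potential)
             - interface_energy(template_a, template_b, contacts, potential))
    return {"identity": identity,
            "reliable": identity >= min_identity,  # threshold quoted in the text
            "delta_energy": delta}  # > 0 means a less favorable interface


potential = {("D", "K"): -2.0, ("E", "R"): -1.5, ("D", "D"): 1.0}
result = assess_target("ADE", "KLR", "ADE", "DLR", [(1, 0), (2, 2)], potential)
print(result)
```

In the usage line the target replaces a lysine with an aspartate at an interface contact, so the toy potential reports a less favorable interface, while identity stays above the threshold.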

In summary, the accumulation of protein-protein and protein-DNA interaction data, along with structures of complexes and the ever-increasing coverage of sequence space, allows us to develop models that describe binding for some domain families. In a future blog post I will try to review the different domain families that are well covered by these binding models.

Previous mini-reviews
Protein sequence evolution

Thursday, June 12, 2008

@World

(caution, fiction ahead)


I wake up in the middle of the night startled by some noise. Pulse racing, I try to focus my attention outwards. Something breaking, glass shattering? Is someone out there? I reach out with my senses, but an awkward feeling nags at me, bubbling up to my consciousness. I try hard to focus; it is coming from outside the room, someone is inside my house. I close my eyes but vertigo takes over and weightlessness empowers me. I am in the living room cleaning the floor, picking up a broken glass. The nagging feeling finally assaults me fully. I am moving but I am not in control. Panic rises quickly as I helplessly watch the simple and quiet actions of someone else. I stop picking up glass and I feel curious, only it is not exactly me; the feeling is there beside me.
- Hi, who are you?
The voice catches me by surprise and my fear goes beyond rational control. All I can think of is escaping, getting away from here. For a second time I find myself floating as if searching for a way out. When I open my eyes again I am by the beach and I breathe a sigh of relief. The constant sound of the waves calms me down for a few seconds until my eyes start drifting to the side. No, stay there, I am in control! I look into the eyes of a total stranger who smiles back at me in recognition. Two voices ask me if I am enjoying the view and I can only scream back in confusion.

I wake up in the middle of the night startled by some noise. I immediately flex my hands in front of my eyes to make sure it was nothing but a nightmare, trying hard to calm down. What a dream. I get up and check on the noise coming from the living room, realizing that it was just the storm outside. Feeling better, I fire up my laptop and grab a glass of water from the kitchen. I open twitter and type away:
- I had the strangest dream! (cursor blinking) Our senses were all connected (enter)
I get up to open the window, taking another sip of water. After a couple of steps a jabbing headache forces me to stop and bright spots of light blur my vision. I close my eyes in pain and the voices of some unseen crowd thunder in my ears:
- I had the same dream - they all say in unison
The sound of glass shattering on the floor is the last thing I remember before collapsing.

I wake up in the middle of the night startled by some noise (...)

(Twistori was the main motivation for this post)

Previous fiction:
The Fortune Cookie Genome

Tuesday, June 10, 2008

Why does FriendFeed work?

I have been using FriendFeed for a while and I have to say that it works surprisingly well. It is hard to define what FriendFeed is, so the only real way to understand it is to try it for a while.

One common way to define FF would be as a life-stream aggregator. Each user defines a set of feeds (blog, Flickr, Twitter, bookmarks, comments, etc.), providing all other users with a single view of all of that user's online activities. Anyone can select how much to share (even nothing at all) and subscribe to any number of other users. Each item (photo, blog post, bookmark) can then serve as a spark for discussion. Users can mark items as interesting or comment on them, and this propagates to everyone who subscribes to them. In addition, you can hide particular sources if there is a part of a user's activities you don't enjoy. All of this creates a very personalized view of whoever you choose to interact with online.
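A minimal sketch of the aggregation just described, under assumed data structures: the Item fields and the build_view function are hypothetical, not FriendFeed's API. Your view contains items from people you subscribe to, plus items they marked as interesting, minus any (user, source) pairs you chose to hide.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    user: str       # who shared it
    source: str     # e.g. "blog", "flickr", "twitter"
    title: str
    timestamp: int
    liked_by: set = field(default_factory=set)


def build_view(items, subscriptions, hidden=frozenset()):
    """Personal view: items from people you follow plus items they liked,
    minus any (user, source) pairs you chose to hide, newest first."""
    subscriptions = set(subscriptions)
    visible = []
    for item in items:
        if (item.user, item.source) in hidden:
            continue
        if item.user in subscriptions or item.liked_by & subscriptions:
            visible.append(item)
    return sorted(visible, key=lambda item: item.timestamp, reverse=True)


items = [
    Item("alice", "blog", "a1", 3),
    Item("alice", "twitter", "a2", 5),
    Item("bob", "flickr", "b1", 4, {"alice"}),
    Item("carol", "blog", "c1", 6),
]
print([i.title for i in build_view(items, {"alice"}, {("alice", "twitter")})])
```

Here Bob's photo shows up even though the reader does not follow Bob, because Alice marked it as interesting, while Alice's tweets are hidden.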

I still find it striking that there are so many long discussion threads around items that we share in FriendFeed, sometimes more than on the original site. A couple of examples:
Google code as a science repository (discussion in FF, blog post)
Into the Wonderful (discussion in FF, slideshare site)
Bursty work (discussion in FF, blog post)

Why does it work so well? One possible reason could be that a group of early-adopter scientists happened to gather around this website, creating the critical mass required to start the discussions. Still, most of those commenting were already participating on blogs, so that might not be it. There might be something about the interface: maybe it is the ease of adding comments, and the fact that these comments can be edited, that increases participation. Ongoing discussions get bumped higher in the view, so every new comment brings the item back to your attention. In this way you know who saw the item and who is thinking about it. It is a bit like talking about a movie you saw or a book you read with a bunch of friends.
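The bumping behavior can be sketched by ordering items by their most recent activity rather than by posting time. The Entry class and bumped_order function below are a hypothetical illustration, not how FriendFeed actually stores or ranks items.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    title: str
    posted: int
    comments: list = field(default_factory=list)  # (author, timestamp) pairs

    def last_activity(self):
        """Most recent of the original post and any comment on it."""
        return max([self.posted] + [t for _, t in self.comments])

    def participants(self):
        """Who is discussing the item."""
        return {author for author, _ in self.comments}


def bumped_order(entries):
    """Most recently active discussion first, so a new comment bumps an old item."""
    return sorted(entries, key=lambda e: e.last_activity(), reverse=True)


old = Entry("older item", posted=1, comments=[("ana", 10)])
new = Entry("newer item", posted=5)
print([e.title for e in bumped_order([old, new])])
```

The older item comes first because Ana's comment at time 10 is more recent than the newer item's posting time.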

Anyone interested in the science side of it should check out the Life Scientists room, which currently has around 85 subscribers. Here is an introduction to some of these people, in particular to what they work on. Connecting to other scientists in this way lets you see which articles they find interesting and discuss current scientific news, and maybe even start a couple of side projects for the fun of it.

Monday, June 09, 2008

Evaluation metrics and Pubmed Faceoff

Recently I have been reading a lot about evaluation metrics for papers and authors. It started with a blog post in Action Potential (Nature Neuroscience's blog) showing a correlation between the number of downloads of a paper and its citations. From the comments on that blog post I found out about a forum in Nature Network about Citation in Science and also about the recently published group of perspectives on "The use and misuse of bibliometric indices in evaluating scholarly performance".
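The statistic used in that post is not described here, so as an assumption here is a Spearman rank correlation, a common choice for skewed counts like downloads and citations. It ranks both variables (ties get the average of their ranks) and computes the Pearson correlation of those ranks. No real download or citation data is included; the inputs in the usage line are placeholders.

```python
import math


def ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    return pearson(ranks(x), ranks(y))


downloads = [120, 340, 90, 800]   # placeholder values, not real data
citations = [3, 10, 1, 25]        # placeholder values, not real data
print(spearman(downloads, citations))
```

Using ranks means a handful of papers with enormous download or citation counts cannot dominate the result the way they would in a plain Pearson correlation.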

It could have been a coincidence, but Pierre sparked a long discussion in FriendFeed when he suggested it would be nice to be able to sort Pubmed queries by the impact factor of the journal. In reaction to this, Euan set up a very creative interface to Pubmed that he named Pubmed Faceoff. He took several different factors into account (citations from Scopus, the eigenfactor of the journal, the time the paper was published) and, for each paper returned by a Pubmed query, created a face that describes the paper. The idea for the visualization is based on Chernoff Faces. It is a really creative idea and I wish Pubmed would spend more resources on coming up with alternative interfaces like this, something like a "labs" section where they could play with ideas or allow others to create interfaces that they would host.
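The core idea of a Chernoff-face view is to map each numeric attribute of a record to a facial feature. The sketch below normalizes citations, journal eigenfactor and paper age across the papers returned by a query and maps them to three face parameters. Which metric drives which feature, and the field names, are assumptions for illustration; this is not how Pubmed Faceoff itself is implemented, and drawing the faces is left out.

```python
def normalize(values):
    """Min-max scale to [0, 1]; a constant column maps to the midpoint."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def face_parameters(papers):
    """papers: list of dicts with hypothetical keys 'citations', 'eigenfactor', 'age_years'."""
    cites = normalize([p["citations"] for p in papers])
    eigen = normalize([p["eigenfactor"] for p in papers])
    age = normalize([p["age_years"] for p in papers])
    faces = []
    for c, e, a in zip(cites, eigen, age):
        faces.append({
            "smile": c,            # more citations -> bigger smile
            "eye_size": e,         # higher journal eigenfactor -> larger eyes
            "face_width": 1 - a,   # older papers -> narrower face
        })
    return faces


papers = [
    {"citations": 10, "eigenfactor": 0.5, "age_years": 1},
    {"citations": 0, "eigenfactor": 0.1, "age_years": 3},
]
print(face_parameters(papers))
```

Normalizing within each query keeps the faces comparable among the returned papers; Pierre's original suggestion would simply be `sorted(papers, key=lambda p: p["eigenfactor"], reverse=True)`.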

I won't go into the whole debate about evaluation metrics here since there is already a lot of discussion going on in some of the links I mentioned.