Tuesday, August 30, 2005

A reflection on scientific journal publication by Philip Bourne

Philip Bourne wrote a very interesting perspective in the latest issue of PLoS Computational Biology. He starts by comparing the somewhat convergent evolution of journal publishing and database submission: "The daily work of any high-throughput scientific journal or biological database consists of information input, information processing, and information output." He also mentions the obvious difficulty of retrieving information from scientific publications compared with database records, and raises some possibilities for improving this. One would be to assign digital object identifiers (DOIs) to the items of content within biological databases. This would make it possible to track the different publications that refer to the same item, and to retrieve information from a paper automatically more easily. Barend Mons recently wrote a commentary on exactly the same subject in BMC Bioinformatics, so there might be some grounds for agreement between journals.
Phil Bourne also proposes that data in a publication should be more "alive". This reminded me of a recent discussion we had on Nodalpoint about reproducible research. Lastly, Bourne suggests that more data should be attached to the paper as metadata. As an example, he suggests that gene names could be retrieved automatically, reviewed by the author with minimal effort, and integrated with the paper as metadata. This point seems to me quite similar to the first one: in essence, it would make it easier to retrieve information from a scientific publication.

It is nice to see these discussions coming up. Now we just need to find a way to get some journals together to agree on formats and implementations. Make them opt-out: authors should either do the work themselves or pay extra to have the publishing house do it. Start making the papers on the web more connected to the databases, and vice versa. I want to click a protein name and get a list of possible databases to visit :). Add some kind of "trackback" to the databases: every paper mentioning a protein sends a trackback ping to the protein database, which is then updated automatically.
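
The trackback idea could look roughly like the sketch below. It assumes the database exposes a ping URL per protein entry that accepts the same form-encoded POST fields as a blog TrackBack ping (title, url, excerpt, blog_name). The endpoint, the paper URL, the protein identifier and the function name are all hypothetical, and the code only builds the request without sending it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_trackback_ping(ping_url, paper_url, title, excerpt, journal):
    """Build (but do not send) a TrackBack-style ping for one paper."""
    form = urlencode({
        "title": title,
        "url": paper_url,
        "excerpt": excerpt,
        "blog_name": journal,  # the journal plays the role of the "blog"
    }).encode("utf-8")
    return Request(
        ping_url,
        data=form,
        headers={"Content-Type": "application/x-www-form-urlencoded; charset=utf-8"},
        method="POST",
    )


# Hypothetical ping URL for one protein entry in a hypothetical database.
ping = build_trackback_ping(
    ping_url="https://proteindb.example.org/trackback/P12345",
    paper_url="https://journal.example.org/paper/1",
    title="A paper mentioning protein P12345",
    excerpt="Sentence from the paper in which the protein is mentioned.",
    journal="PLoS Computational Biology",
)
print(ping.get_method(), ping.full_url)
print(ping.data.decode("utf-8"))
```

On receiving such a ping, the database could add the paper to the list of publications shown on the protein's page.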

Friday, August 26, 2005

Celebrating my first paper as a first author

The third issue of PLoS Computational Biology is out, and in it is the first paper I have published as a first author. It took some time to get it published, but I am grateful for the useful reviewer comments. The two main points of this work are that SH3 domains preferentially bind unstructured protein regions, and that there is an "optimal" divergence time to consider when selecting the species to use when looking for conservation of SH3 target sites. If you are interested in this last point, I encourage you to read the excellent paper by Sean R. Eddy, "A Model of the Statistical Power of Comparative Genome Sequence Analysis", in PLoS Biology.

The future of medicine

From the Personal Genome and Bioinformatics blog, here is a link to some predictions on the future of medicine by Leroy Hood.
I would summarize his ideas as: 1) you will have your own genome sequenced cheaply, and 2) we will have nanotech tools in medicine.
These two things will make medicine more preventive and will allow humans to live longer. You can say that the ideas were presented in an overly exaggerated way, but both have been going around for a long time and I personally see them as very probable. It would be useful to know your own genetic predispositions, and I'm sure that engineered proteins and protein complexes will play a role in future treatments.

Another important point is that science should be visible to society. People like Leroy Hood and Craig Venter can sometimes come up with "funny" ways of promoting their investments/research but they also bring science closer to society.

Tuesday, August 23, 2005

Google Talk

There is a big buzz going around the web that tomorrow Google is going to release an IM service called Google Talk. The interesting thing is actually watching the news spread around the internet and geeks trying to connect to the service before it is available to the public. The image that came to mind was of a fire or a virus spreading fast. When they tried to get email users to switch they offered 1 GB of space; what are they going to offer tomorrow to make users switch to their IM?

Update: A beta version is available here.
Update 2: So there is nothing very special about it. The client is very simple and clean. The only interesting thing here is not the client but the intention of making IM networks independent of the clients, which makes a lot more sense for users. However, I don't see the Yahoo and MSN messengers easily changing to a system like this, and without them it is going to be very difficult.

Thursday, August 18, 2005

The h index

Through a Nature news article I found this paper on the h index. J. E. Hirsch proposes that the scientific performance of a researcher could be measured by this index h, which he defines as: "the number of papers with citation number higher or equal to h".
From the news: "An h-index of 50, for example, means someone has written 50 papers that have each had at least 50 citations."
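
The definition quoted above can be turned into a few lines of code. This is only a minimal sketch of the calculation, not code from Hirsch's paper. The function name and the citation counts are made up for illustration.

```python
def h_index(citations):
    """Return the largest h such that h papers have at least h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` papers all have at least `rank` citations
        else:
            break
    return h


# Made-up citation counts for five papers.
print(h_index([10, 8, 5, 4, 3]))  # → 4
print(h_index([25, 8, 5, 3, 3]))  # → 3
```

The second example shows that a single highly cited paper does not raise the index on its own.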

Such a measure would take the journal out of the rating and stop the current "rich get richer" trend driven by the impact factor. This way authors could focus more on the audience of the journal.

Saturday, August 06, 2005

On patent applications

In the latest issue of Nature Biotech there was a list of recent patent applications in systems biology. Most of them cover general ideas that have been published several times with different methods. I wonder how a patent like this can be approved and enforced when there are so many research papers available on the subject.

An example:
"A tool providing interactive capabilities for user involvement in extracting and disambiguating biological information in text; useful in generating a biological diagram."
A simple search for "text mining" in PubMed, for example, will point you to more than 100 papers.

The same holds true for most of the other applications you can find in the table.

Monday, August 01, 2005

Here is the provisional abstract for the work of the first year of my PhD, to be published soon in PLoS Computational Biology.

Reproducible research with the publication of a "compendium"

Via Faculty1000 (subscription only) I found this paper from the people of the Bioconductor project on a new way to publish results: a document in which code and words are woven together to create a "compendium". The document can then be browsed and the code changed interactively. The authors give a concrete example using R packages and data from Golub et al. on cancer classification by expression data analysis.

I have never used R, so I'm still messing around trying to install everything to try this properly, but at first glance it looks like an interesting concept. With the publication of results you would immediately get the methods as well, and you could change parameters to check a hypothesis, etc. It would certainly help referees to check some ideas quickly. Something like this could also be used in-house as a personal e-lab book to keep track of code, data and ideas.