Friday, March 16, 2007

Bioinformatic web scraping/mash-ups made easy with kapow

In bioinformatics it is common that we might need to use a web service multiple times. Ideally, whoever built the web service provided a way to automatically query the site via an API. Unfortunately, Lincoln Stein's dream of a bioinformatics nation is still not a reality. When there is no programmable interface available and the underlying database information is not available it's usually necessary to write some code to scape the content from the web service.

In come openKapow, a free tool to (easily) build and publish robots to turn any website into a real web service. To illustrate how easy it is to use it I have built a Kapow robot to get, for any human geneID, a list of orthologs (with species and IDs). I downloaded the robotmaker and tried it on the Ensembl database. To be fair Ensembl is probably one of the best bioinformatics resources with available API and easy data mining tools like Biomart. This was just to give an example.

You start the robot by defining the initial webpage and the service inputs and outputs. I decided to create a REST service that would take an Ensembl gene ID and output pairs of gene ID/species name. The robotmaker application is intuitive to use for anyone with a moderate experience with HTML. The robot is created by setting up the steps that should occur to transform the input into the desired output. For example, we have to define were the input should be entered by clicking on the search box:
From here there are a set of loops and conditional statements that you can include to get the list of orthologs:

We can run through the robot steps with a test input and debug it graphically. Once the robot is running it is possible to host it on the openKapow web page, apparently also free of charge. Here is the link for this simple robot (this link might go down in the future). Of course it is also possible to build new robots that use robots that are published on openKapow. Also this example uses a single webpage but it would be more interesting to use this to mash up different services together.