Cellular Consequences of Genetic variation

Sunday, March 16, 2025

State of the lab 12 - Becoming an established scientist

This blog post is part of a (nearly) yearly series on running a research group in academia. This post summarizes year 12, the 3rd year after moving to ETH Zurich. In the last blog post I wrote down some of our overall research directions for the first 5 years of the group at ETH and I will wait another year or two before reflecting back on those commitments. This time, I wanted to try to write down some thoughts I have been having about essentially becoming more established in academia. This includes a longer term perception of group turnover, the time and resources needed to achieve research objectives and some activities that go beyond the management of the research group.

Group member turnover cycles

With 12 years of managing a research group, I have gotten used to some of the broader rhythms of turnover of the lab. Our lab is now almost totally renewed with just 1 lab member that came with the lab from EMBL. While this turnover was somewhat enforced by the move from EMBL to ETH, the turnover of lab members is a constant in academia given the short term nature of the lab members’ positions. In our group PhD students have typically stayed for around 4 years and postdocs have typically stayed for up to 5 years. Since there is some degree of clustering of the hires there tend to be some periods of higher turnover. We have had something like 2 to 3 periods where the lab has seen a large change. In the group, I try to hire from diverse backgrounds (e.g. biology, CS and math) and we work with a range of experimental and computational approaches, including for example yeast genetics, proteomics, structural bioinformatics, machine learning, etc. This creates a nice dynamic of group members building up their projects, while at the same time learning about the capabilities of the rest of the lab. The projects are usually meant to be somewhat synergistic, trying to address bigger goals from the individual problems (see past blog post on this). This means we have had windows of around 3 years when things click together before the turnover starts again. We are just around that exciting stage in the cycle and I am really looking forward to making the best of it. I still don’t enjoy what comes next, when the group will inevitably turn over again. I have accepted that it is an opportunity to steer the ship into new directions but sometimes it is disappointing to change the group just around the time it feels like we can take on almost any challenge.

Longer term view of science

One thing that has been on my mind is that I am sometimes weary of the time it can take to achieve a research goal. I am not talking here about an individual research project which tends to take on the order of 2 to 3 years on average. In our group we have tried to address some bigger research goals, such as trying to understand the evolution of protein phosphorylation or the functional relevance of individual phosphosites. These kinds of challenges take multiple independent projects and over 10 years of time to make a meaningful dent in. These days I will look at a potential long term research goal and I will think about the many different types of methods and steps that will be needed and this can distract me from the excitement of figuring those things out. I should say that I am by no means jaded about doing research. I still get such a thrill discussing the day-to-day results with lab members, being at the frontier and trying to figure things out. It is just when I pause to think about the longer term view, either in the past or trying to project into the future, that I sometimes wish things could just move faster. I have taken part in a couple of large multi-PI projects that have moved very quickly and from these I can see the temptation of trying to have large labs.

From junior to “established” PI

There is no point in time when a switch happens and someone is no longer considered a junior PI but after 12 years I can safely assume that label no longer applies to me. This has brought some relatively small changes in my job, one simple one being that I no longer think about tenure. For most of my career I was on fixed term positions, including my first group leader position at EMBL which had a time limit of 9 years. I joined ETH 3 years ago on a tenured contract and not having to think about my next job has left me with a tiny post-tenure slump - what am I aiming for? Related to the previous section, I have considered that I could enjoy overseeing science at a higher level than as a group leader. As one example, I organized an application for a National Centre of Competence in Research (NCCR) with 19 PIs interested in human genetics in Switzerland. While the application failed, I was really keen and excited to co-direct the center had it been funded.

Another aspect of my job that has changed somewhat is a higher commitment to activities outside the lab, such as taking part in committees, advisory panels or formal and informal mentorship of junior PIs. I don’t feel particularly overwhelmed by these activities but that might change if I am required to take part in more committees within ETH. Not everything is an additional burden to an already busy job. I have felt that being more visible and connected in international science comes with benefits, including finding it easier to at least discuss collaborations and having more labs interested in joint grant applications.

Scientists that have worked in academia for longer than I have might find some of these things funny and I am certainly curious about what it will feel like reading this 10 years and more from now. In fact, the blog is now a bit over 20 years old with posts starting in my PhD. While I don’t post much these days I aim to continue at least this yearly series while I feel there are some new things to say beyond the progress in our science.

Monday, November 13, 2023

State of the lab 10 and 11 - the first years at ETH Zurich

Yet another lake by a mountain in Switzerland

This blog post is part of a (nearly) yearly series on running a research group in academia. This post summarizes years 10 and 11, the first 2 years after moving to ETH Zurich. It also marks the end of the first decade as a research group leader, which is meaningful only because we have ten fingers and use 10 as a base for counting but I digress. There has been a lot to adapt to in moving to a new country including all the basics of moving, re-building the group and starting teaching. It was a lot easier than the first time around since I didn't have to set up the group from zero. Some people came with me, some stayed at EMBL-EBI with funding that couldn't be moved and generally speaking we could continue several computational related projects without much interruption. If we were primarily lab based then I think the interruption would have been more dramatic. Unexpectedly, there were more periods of high stress than I typically have. There was no particular reason for the stress but just a combination of multiple small things and probably due mostly to the adaptation to a new place. I will cover here some of the biggest things I am having to adapt to and also some of the research directions planned for the first 5 years of the group at ETH. One aspect that I will not cover is networking and getting to know the Swiss research landscape, but I will come to it in a later post.

The Swiss style of leadership

The EMBL, where I was before, has a very top-down leadership. EMBL is funded by different countries that are represented in the EMBL council. There is a director general who is appointed by the council and has a lot of control. Of course, there is a hierarchical support structure with a senior management team, heads of research units and a group of "senior scientists" that support the director in decision making. I am still figuring out ETH but there is a very different feel to it, both in size and style of leadership. EMBL employs around 2000 people while ETH has around 12,000. Organizationally, ETH is divided into 16 departments, and each department is further split into different institutes. For example, I am in the Department of Biology, which has 6 institutes, and I am in the Institute of Molecular Systems Biology (IMSB). As leadership, there is an executive board, including the president of ETH, then the Department heads, and in each department there is the meeting of heads of institute and the professorial conferences (i.e. all votes from professors). At least in the Department of Biology the heads of the institutes and the leadership of the Department are meant to rotate every 2 years. At these levels - institute and department - the leadership feels highly representative with lots and lots (!) of voting. This representative rotational leadership feels very different from EMBL and I think mirrors more broadly a Swiss way of doing things. The obvious consequence of this is that any change requires deep consensus and therefore radical change is less likely but it is too early to say much more.

Teaching at undergraduate level

During 9 years at EMBL I had almost zero teaching duties. I voluntarily taught some classes in the GABBA PhD program in Portugal and not much more. At ETH teaching is now an important part of my job. I am teaching courses in Bioinformatics and Systems Biology, primarily to biology students, which are all very familiar topics and close to my area of research. I don't particularly enjoy the act of teaching, in particular standing in front of 70-100 students and trying to explain things. As an introvert I am more comfortable with 1-on-1 or small group discussions and I get very tired with the interaction of teaching in a classroom setting. I have always said that Biology students should learn more computational skills so at least I have the opportunity now to influence that at ETH. In fact, the biology curriculum was changed right when I was joining to add more bioinformatics and they do have the chance to learn it with multiple lectures that cover bioinformatics and machine learning. Despite it being a mixed bag for me I am privileged in that I have a very low teaching load in topics that I like. Teaching is an area that I feel I could do more for and it could have an impact, in particular if we made it open to anyone. However, it is still something that I find difficult to fully devote to given the research role.

Our research at ETH during the first 5 years

The start of the research group at ETH has been fantastic. There was another big turnover of the group members during the transition, the second major turnover since the group started 11 years ago. I am really happy with the team we have here and having done this sort of turnover before, I can already see the growing potential of many projects that have started here. So the next 2-3 years is going to be about building up these projects and trying to coordinate them such that they interact and feed off each other. We have very generous stable funding as all other tenured prof positions at ETH - so called endowed professorships in the US or positions with core funding for the European researchers. Surprisingly, there is not a lot of oversight on this research funding which is a big difference from EMBL where the units, and their group leaders, are reviewed every 4 years. So I thought I could at least write down our commitment for research over the first 5 years here, in the spirit of disclosing what we are doing with this public research funding.

Human genetics research - mechanisms linking genotype to phenotype

Human genetics is an area that we started working on in the last 3-4 years or so of EMBL. Some of these things are already visible in recently published articles, including some protein-interaction network-based analyses of trait-associated genes. We continue to actively work on this and one direction of focus is to try to build interaction networks that are specific to different tissues or cell types. We are working on a manuscript on this and it is an area to continue to build upon, to be able to study the differences in cell biology of different cells/tissues and how genetic changes manifest differently in these. A second direction of focus here is to study the relation between common and rare variants linked to related traits using networks.

From cells to proteins - we are finishing a project where we are using protein structures to annotate functional residues in proteins to study mechanisms of pathogenicity. One aspect of this that will need further development is expanding on the prediction of structural modelling of protein interactions with other proteins and other molecules. Finally, we are interested in how genetic variation controls protein levels and ideally how to build computational models that can integrate the impact of genetic variation through control of protein levels, interactions, organs and organismal traits, ideally without a black-box modelling approach. All of these things are actively ongoing and I expect to have progress to report in the coming years.

Post-translational regulation - large scale studies of kinase signalling

There are over 100,000 phosphosites discovered in human proteins and over 20,000 found in budding yeast proteins. We don't have good methods to study the functional role of these phosphosites nor to reconstruct the kinase/phosphatase-substrate signalling network of different cells. About half of the group is continuing to work on these problems and here at ETH we managed to consolidate the computational and experimental parts of our group which used to run in different locations while I was at EMBL. Because we are doing more of the experimental work now, this part of the group had a slower start but things are now moving along very well. Some of the problems that we are working on include the prediction of the biological process regulated by phosphosites; studying the impact of phosphorylation on protein conformational change; experimental methods to map kinase-substrate interactions and large scale mutational studies of PTMs. The thought has crossed my mind to phase down this area of research a bit, or at least to move more into mammalian systems in our experimental work to make it more complementary to the human genetics side of the lab.

Structural bioinformatics, protein evolution and other

We have been having a lot of fun with AlphaFold2! With the current fast pace of change in protein related bioinformatics methods I am sure we will continue to play with these methods as they come. It is not likely that we will do a lot of method development ourselves, as it is not our way, but I think we are very good partners for method developers to help make the bridge to applications. Protein structures, protein design and evolution models are all things we will likely be playing around with in the coming years.

Wednesday, November 16, 2022

20 years of open science or how we haven't radically changed the way we do science online

Around 20 years ago I was a starting PhD student and it was an exciting time for the internet. It was the time of blogs, wikis and a large increase in public participation with more user generated content in what is commonly known as the start of Web 2.0. These were the times of web based online communities such as the now defunct Kuro5hin or the great survivor slashdot.org. I started this blog 19 years ago and I was also "hanging out" in an online community called Nodalpoint. Nodalpoint no longer exists but it was a discussion forum/wiki for bioinformatics with some of these discussions still preserved thanks to the magic of the Wayback Machine.

Around the time of 2002-2006 all of the excitement around Web 2.0 was also infecting academia with many discussions around open science. I know that open science is a vague term that can mean many different things including open access, citizen science, open source and many others. One specific aspect that I want to focus on is the idea of organizing research in a way that is not based on local group structures. In 2005 I wrote a Nodalpoint post on "Virtual collaborative research" which is similar in spirit to open source software development but with a focus on discovery not tool development. Part of this would mean surfacing more of our ongoing research and taking part in research projects that are not organized by traditional research group structures. The idea of being extremely open about ongoing research activities was advocated by others under the term of "open notebook science".

Over the following years I made a few attempts at starting such open research projects with blog posts where I tried to set up tools and ideas that others could take part in (see posts from 2007, 2008 and 2010). The last project idea I tried to propose in such a way ended up being one of the major projects from my postdoc and basically one of the research lines I am still working on. In the end, none of these attempts really took off as open collaborative research projects. In hindsight, I am not surprised it didn't work. Even within local structures of research institutes and university departments there is so much discussion on incentives for local collaborations. While I think the traditional structures for organizing research do work, as a PhD student and postdoc I was very frustrated by the apparent difficulty of making the most of everyone's expertise. As a group leader I have more capacity to establish collaborations but I still think we aren't using the internet to its full capacity.

So what happened in the decade from 2010 to 2020? Blogs and online communities mostly died out and Web 2.0 was swallowed by corporations. One major change was the rise of large social networks and the standardization of the stream as a way for people to share information and interact. Academia started participating in social networks around the time of Friendfeed (2007-2015) and such participation became mainstream with the popularization of Twitter. I honestly would never have predicted the rise of academic Twitter and it is truly a sign of how the geeks have inherited the earth.

The reason I am even thinking about open science these days is that over the past couple of years we have been involved in projects that have illustrated this potential of large collaborations empowered by the internet. I wanted to write this down also to have something to come back to in the future. The first project was a study of phosphorylation changes during SARS-CoV-2 infection. Like many others, when the pandemic sent our research group home, I thought about what we could do to help and sent emails to a few people that could be working on the topic. Nevan Krogan, my former postdoc supervisor, was very keen to involve us which led to several projects including this study of protein phosphorylation. This was probably one of the most exciting projects I have been involved with and included a very spontaneous collaboration among a large international team coordinated by a few people through Slack. In this case the network of interactions was provided by Nevan and it was possible because everyone was pushing in the same direction triggered by a catastrophe. I wish everyone could feel the sense of power that I think we felt during this project. There was so much scientific capacity at the disposal of this single project and we could iterate through experiments and data analysis at an incredible pace. It is even hard to express how it felt to be able to just get things done when you had the world experts for what was required at every step.

A second even more interesting example was a community effort to study the value of AlphaFold2 in a series of applications. When AlphaFold2 was released, several scientists started sharing their early observations of how AlphaFold2 and predicted structures could be used for different applications. I thought all of these examples were really exciting and that we could structure this output into a manuscript. So I just contacted people that were doing this and also asked on social media if anyone else wanted to participate. In the end every contribution to this was quite modular and it was easy to integrate this into a manuscript with a few meetings and a Google doc to put things together. Perhaps the less usual thing that happened was receiving actual results through Twitter chat.

I think both of these examples required a trigger - the pandemic and the release of AlphaFold2 - that led to many scientists moving in the same direction. In both of these cases I think we achieved in a few months what would take a single group potentially one to several years to do. Yet, these interactions remain difficult to make. Perhaps simply because we are just too busy with our own research questions or more likely because of the importance of credit and evaluation systems in academia. These days I am actually less in favor of radical sharing of ongoing research, in the spirit of open notebook science. I don't think we have the attention span for it. It would be too difficult to navigate and may lead to more "group think" instead of divergent thinking and ideas. Maybe the simple existence of social networks like Twitter is already a good step forward. I certainly get to know more people and what they may be up to via this. Let's see what the next 20 years bring.

Tuesday, March 08, 2022

Independent evaluation of AlphaFold-Multimer

AlphaFold2 has been widely reported as a fantastic leap forward in the prediction of protein structures from sequence, when the sequence has enough homologs to build a reasonable multiple sequence alignment. When AlphaFold2 was released (Jumper et al. 2021) there were several independent reports of how it could also be used for the prediction of structures of protein complexes despite the fact that it was not trained to do so (Bryant et al., 2021; Ko and Lee, 2021; Mirdita et al. 2022). Together with the lab of Arne Elofsson, in work led by David Burke in our group and Patrick Bryant in Arne's group, we have shown that it can be applied at reasonably large scale to predict structures of protein complexes for known human interactions (Burke et al. 2021). There is a lot to investigate still but it is clear that this is an extremely exciting direction of research since it could lead to major advances in the structural analysis of cell biology, evolution, biotechnology, etc.

Soon after these first reports, DeepMind released an AlphaFold version that was re-trained specifically for prediction of structures of protein complexes - AlphaFold-Multimer (Evans et al. 2021). Given that they reported an even higher success rate with this specifically trained model we were quite excited to give this a try. David Burke selected a set of 650 pairs of human proteins from the Hu.MAP dataset, known to physically interact and for which the experimental structure has been solved. A structure was predicted using AF v2.1.1 (AF-multimer) using default settings and the model_1_multimer parameter set. A second model was predicted using AF using the model1 monomer parameter set and the FoldDock pipeline. For each model, DockQ scores were produced which reflect the similarity of the predicted structure with the experimental structure with a specific focus on the interaction interface residues. A DockQ score value below 0.23 can be considered essentially an incorrect or random model.

Below we show a direct comparison between the two AlphaFold2 models with the AF2 Multimer showing a very significant improvement based on DockQ scores. Of all predictions tested, 51% had DockQ>0.23 with AF2 Multimer and 40% had DockQ>0.23 with "standard" AlphaFold2. This improvement (+11 percentage points) is not as large as that reported by the DeepMind team (+25%) on their own test set. There could be several reasons for the difference but more importantly this would be more than enough to justify using Multimer for the prediction of protein complexes.
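For readers who want to repeat this kind of comparison on their own predictions, here is a minimal Python sketch of the success-rate calculation. It assumes you already have one DockQ score per predicted complex for each method. The function name `success_rate` and the toy scores are hypothetical and are not the values from the benchmark described here; only the 0.23 threshold comes from the text.

```python
def success_rate(dockq_scores, threshold=0.23):
    """Return the fraction of models with a DockQ score above the threshold."""
    if not dockq_scores:
        raise ValueError("need at least one DockQ score")
    return sum(score > threshold for score in dockq_scores) / len(dockq_scores)


# Hypothetical toy scores for illustration only, NOT data from the benchmark.
toy_multimer = [0.85, 0.10, 0.40, 0.02, 0.60]
toy_monomer = [0.70, 0.05, 0.20, 0.01, 0.55]

rate_multimer = success_rate(toy_multimer)
rate_monomer = success_rate(toy_monomer)

# The difference between two success rates is best reported in percentage points.
difference_points = (rate_multimer - rate_monomer) * 100

print(f"Multimer: {rate_multimer:.0%}, monomer: {rate_monomer:.0%}")
print(f"Difference: {difference_points:+.0f} percentage points")
```

Reporting the difference in percentage points avoids confusion with a relative improvement, which would be a different number for the same pair of rates.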

However, David quickly realised that there were many examples of clashes at the predicted interface with the AF2 Multimer model. In the figure below we show just an example of this which, despite the high DockQ score (0.85) clearly has several overlapping residues. That is, while the interface region is likely to be correct, the model at the interface has serious errors.

These clashes in predicted structures are quite frequent with 69% of predictions having some clash. The clashes can be quite extreme with several involving a very high fraction of the total length of the protein as shown in the distribution below. Such clashes are essentially not seen in the predictions made with the earlier version of AlphaFold2.

While there may be some cases where the clashes could be minimised, as it stands the models produced by AF-multimer may not be usable for a large fraction of cases. However, these issues are of course easy to spot. DeepMind is in fact aware of this bug since around November and have said they are working on it. From the point of view of predicting the regions of the proteins where the interaction will occur AF-multimer may still be usable as it is and hopefully DeepMind will find a fix for this problem.

Wednesday, February 02, 2022

A closer look at the costs of EMBO publishing

There has been a lot of discussion on social media about the prices that some publishers are charging for publishing a paper in their journals - the so called article processing charges (APC). Some journals are asking for values on the order of 10k and many scientists find these values to be outrageous. Given that journals don't work to produce the research articles and get academics to do the evaluation, how can these journals claim the costs of publishing a paper to be anywhere close to 10k? While I agree that these are outrageous values, I don't really believe that the price is mostly profit. A good source of information on the costs associated with running a publisher is the set of numbers disclosed by EMBO Publishing. Before we go into these I need to disclose that I serve on the Publications Advisory Board of EMBO publishing. I don't receive anything from EMBO and this is merely an advisory committee but it has given me some insight into what is a very real attempt from a non-profit publisher to come up with an APC that is low and what they could compromise on in their current set-up to achieve it.

With that out of the way let's just look at the most recent numbers that EMBO has disclosed which were for 2019 (see here). EMBO has (or had in 2019) 17 professional scientific editors and 6 support staff, that handled a total of 5,766 submissions in 2019. That is on the order of 28 submissions handled per month per editor, 1.3 per working day. I don't know about you but making a call on 1 paper per day plus finding/chasing reviewers is not easy if you try to do it properly, even if you can make some rejections fairly quickly. From these they ended up publishing 472 (8%). This part is not totally transparent, for example maybe some of the submissions included the reviews and news&views articles that were ultimately also published. If that is the case then the total number published would be 681 (12%). It is also not totally clear if the submissions also include revision submissions. Regardless, this shows that EMBO publishing as a whole ends up having acceptance rates that are quite low (10-20%). I should stress that I truly don't know the actual number. As we can easily see, this rejection rate is really key for the high estimated cost per paper.
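As a quick check of the workload and acceptance numbers above, here is a minimal Python sketch. The submission, editor and publication counts are the ones quoted in the text. The figure of 260 working days per year is an assumption made for this sketch and is not taken from the EMBO report.

```python
# Figures quoted in the text for EMBO publishing in 2019.
SUBMISSIONS = 5766
EDITORS = 17
PUBLISHED_RESEARCH = 472   # research papers published
PUBLISHED_ALL = 681        # if reviews and news&views are counted as well

# Assumption for this sketch only: about 260 working days in a year.
WORKING_DAYS_PER_YEAR = 260

per_editor_per_year = SUBMISSIONS / EDITORS
per_editor_per_month = per_editor_per_year / 12
per_editor_per_day = per_editor_per_year / WORKING_DAYS_PER_YEAR

acceptance_research = PUBLISHED_RESEARCH / SUBMISSIONS
acceptance_all = PUBLISHED_ALL / SUBMISSIONS

print(f"Submissions per editor per month: {per_editor_per_month:.1f}")
print(f"Submissions per editor per working day: {per_editor_per_day:.1f}")
print(f"Acceptance rate, research papers only: {acceptance_research:.0%}")
print(f"Acceptance rate, all article types: {acceptance_all:.0%}")
```

Changing the working-day assumption moves the daily figure only slightly, so the conclusion of roughly one editorial decision per editor per day holds either way.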

The costs that they have disclosed include ~2.5 million euro for the EMBO Press office, of which around 2 million is listed as salaries and benefits. The number of staff is there as well so you can guesstimate the average salary for the 23 staff and you can also look up EMBO editor salary on Glassdoor to get an idea. I truly don't know what the salary is but I guess on average it could be on the order of 4-6k net per month. The other costs include 1,723,639 euro that EMBO Publishing pays to Wiley which in fact does the actual publishing. The majority of this cost is listed as "Wiley publishing services (incl. production, sales and marketing)" (1,281,552 euro). This is certainly a place where costs are not very transparent, at least to me, and where profit to Wiley is included, likely with a decent margin. I certainly don't know enough about finances to figure this out, but Wiley is claimed to have around a 30% operating profit margin. For the purposes of some later calculations, let's assume that maybe 50% of these costs are profit that could be magically removed (e.g. EMBO sets up their own publishing infrastructure). Finally, EMBO also lists 1,342,374 euro in "surplus" which is re-invested into some publishing related activities like the EMBO Source Data project, other pilots trying to innovate on the publishing side and back to EMBO itself which further supports EMBO program activities (fellowships, etc).

With these numbers then the total cost includes the 4,225,920 of actual cost and the 1,342,374 for EMBO activities (5,568,294 euro total). So if you don't take anything out of this, you would need a price of 11797 euro for each of the 472 papers published in 2019 to finance this. If you exclude the EMBO surplus that would be 8953 per paper and excluding 50% of Wiley costs it would get down to 7127 per paper. Even without anything from Wiley you would only get to 5301 per paper. Of course, you can also argue that the salary costs could be lower but what can't really be argued is that academic editors can do this for "free" since that is time that most likely is even more expensive and less efficient.
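The per-paper figures above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic using the numbers quoted in the text. The press office cost is not given exactly, so it is derived here as the total actual cost minus the Wiley payment, which is an assumption that matches the "~2.5 million" figure.

```python
# Figures quoted in the text (euro, 2019).
TOTAL_ACTUAL_COST = 4_225_920   # press office plus payments to Wiley
WILEY = 1_723_639               # paid to Wiley
SURPLUS = 1_342_374             # surplus re-invested by EMBO
PAPERS = 472                    # research papers published in 2019

# Derived for this sketch: implied press office cost (about 2.5 million).
PRESS_OFFICE = TOTAL_ACTUAL_COST - WILEY

# Total to be financed under each scenario discussed in the text.
scenarios = {
    "everything included": TOTAL_ACTUAL_COST + SURPLUS,
    "without EMBO surplus": TOTAL_ACTUAL_COST,
    "without surplus and 50% of Wiley": TOTAL_ACTUAL_COST - 0.5 * WILEY,
    "without surplus and without Wiley": PRESS_OFFICE,
}

cost_per_paper = {name: total / PAPERS for name, total in scenarios.items()}

for name, cost in cost_per_paper.items():
    print(f"{name}: {cost:,.0f} euro per paper")
```

Because every scenario divides by the same 472 papers, the number of published papers is the term with the most leverage on the final price.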

So the 10k APC number certainly contains parts that can be reduced but we are not talking about a 1k per paper cost. For that you would need to change the rejection rates and this is what really starts mattering in the end. If you go to maybe something like 50% acceptance rates which could correspond to something like 2000 papers published in this case, then the APC could be somewhere on the order of 1500-2500 euro. Keep also in mind that submission numbers would tend to decrease over time if the impact factors go down with higher acceptance rates (yes, some people still care about those). Of course, this scales across multiple journals and this is where the big publishers are just taking advantage since the overall acceptance rate across the large portfolio of journals is much higher than 10% and high acceptance rate journals (e.g. Scientific Reports) can cross-subsidise low acceptance rate journals (Nature).
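To see how strongly the number of accepted papers drives the APC, here is a minimal Python sketch of the scenario above. It makes one strong simplifying assumption that is not in the EMBO numbers: the total cost stays fixed as more papers are published, whereas in practice production costs would grow with volume. The helper `apc` is a hypothetical name used only for this illustration.

```python
# Figures quoted earlier in the post (euro, 2019).
TOTAL_ACTUAL_COST = 4_225_920   # press office plus payments to Wiley
WILEY = 1_723_639               # paid to Wiley

# Scenario from the text: roughly 2000 papers published at ~50% acceptance.
PAPERS_AT_HIGH_ACCEPTANCE = 2000


def apc(total_cost, papers_published):
    """Cost per published paper, assuming the total cost does not change."""
    if papers_published <= 0:
        raise ValueError("papers_published must be positive")
    return total_cost / papers_published


# Upper estimate: all actual costs, without the EMBO surplus.
apc_full_cost = apc(TOTAL_ACTUAL_COST, PAPERS_AT_HIGH_ACCEPTANCE)

# Lower estimate: additionally remove 50% of the Wiley costs.
apc_half_wiley = apc(TOTAL_ACTUAL_COST - 0.5 * WILEY, PAPERS_AT_HIGH_ACCEPTANCE)

print(f"APC with full actual cost: {apc_full_cost:,.0f} euro")
print(f"APC with 50% of Wiley costs removed: {apc_half_wiley:,.0f} euro")
```

Both estimates fall inside the 1500-2500 euro range mentioned above, but only under the fixed-cost assumption stated in the lead-in.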

It is important again to keep in mind that all of these prices per paper have been there for decades but were paid via journals subscription charges instead of APCs and therefore they were not transparent and people were not really paying attention. In the end, the discussion for me is not really around the 30% savings we could have by pushing the publishers to lower their prices, but more about how we go about doing the filtering (i.e. target audience) and subjective evaluation of value to science (i.e. impact). Revolutions are not real solutions in academic publishing. If you propose a solution that requires a majority of people to change their habits in the span of 3 years it is dead on arrival.

Wednesday, January 19, 2022

State of the lab 9 - an informal report on the 9 years of EMBL-EBI

This blog post is part of a yearly series (or close to yearly) on running a research lab in academia. 2021 was the last of 9 years as a group leader at EMBL-EBI, which is the standard time given to group leaders to establish and run their labs at EMBL. For this year's blog post I thought it was a good time to look back at the full 9 years and I am going to (briefly) cover the time at EMBL with some numbers including giving an approximate account of the finances. This is something that I do with the group at the start of every year but it still feels strange to make financial numbers public.

The scientists

A lot has happened during 9 years. Starting with the people, we have had 7 PhD students, 1 of whom was co-supervised, 13 postdocs and 10 interns/visiting lab members. The total group size was around 10 for the majority of the time which, as a manager, feels about right for what I can do as a direct line manager. It is fair to say that science is a very social activity and working with different people with different personalities, through the good and bad, is really enriching. Not to get all corny but the personal interactions are some of the things that have stuck with me the most over time. It is always in those extremes - the "unfairly" rejected paper or unexpected positive response, individual personal and work difficulties that are overcome or sometimes not. Mental well-being is an example of such difficulties: across broader society we are not good at dealing with it, and it has also not always been easy to handle as a manager.

Of these 30 lab members there are 7 that will continue with the group over the next few years: Cristina (senior scientist), Jurgen (postdoc) and Miguel (postdoc) have joined me at ETH and Eirini (PhD student), David (postdoc), Inigo (postdoc) and Danish (postdoc) will remain at EMBL-EBI with funding that cannot be moved. Of the PhD students and postdocs that have left, all but 2 have left with published papers as first or co-first authors. One PhD student decided not to continue the PhD and one postdoc left after several years without a first author paper. In both cases I feel some blame as the project ended up being difficult and the results were just not very positive.

The publications and science

In total we published 45 original research papers, 3 review articles and 2 news&views over the course of 9 years. This includes only research that was really done after starting the group and also includes 8 preprints that have not yet been published in a journal after peer review. This is split into 27 papers where I am listed as co-corresponding author and where I also think our group played an important role in the final outcome, plus 18 to which our group had some input. I am showing on the figure the distribution of these papers along the 9 years. The first paper from our group only came at year 3, with the first really significant set of publications coming at years 4 and 5. With regard to the non-tenure track system, even by this crude metric it is easy to see how different it would be if I had to apply to the job market at year 6-7 vs year 8-9. Of course, note that the numbers for 2021 in particular are inflated by preprints that will ultimately be published in a journal, most likely in 2022. Another clear trend that feels true to me is the increase of small collaboration efforts where our group just helped out in some modest way. I think this is a reflection of just being more integrated into the local and broader academic networks.

I am not going to go into the scientific outcomes of the 9 years in any detail. I think some of the strongest work we did was on the evolution and functional importance of protein phosphorylation, with multiple publications that have built on each other and where I think our contributions moved this field forward. There was also a smaller line of research on the genetics of trait variation that I wouldn't consider to be at the cutting edge but it has been fun to work on. In particular it has been interesting to step closer to the fields of human genetics and genetics of human disease, where making advances requires interactions between people with very different ways of viewing science. Just the language barriers between human genetics, cell biology, biochemistry and chemical biology have been fascinating to get into.

The funding

So now something that feels less comfortable or at least less common to discuss - the funding. Before going into any numbers, I should caveat this by saying that these are very rough approximations that of course should not be considered an actual financial statement. These numbers also don't take into account the money spent on the whole infrastructure (administration, grants, IT, etc) but are just the funding spent on research lab members, including my salary, and consumables. With that out of the way, over 9 years we spent approximately 5.7 million euros, as broken down per year on the figure. Although we have had a small wet lab running in the last 6 years, I would say that 90% of this was on salaries. Of this total, around 2.7 million were from external grant funding, plus ~450k from competitive internal postdoc fellowships. This of course just shows how amazing it is to work in a place with core funding. I ended up being very successful early on, with 2 million funded in years 2 and 3, and this made me too careless about applying for grants later on, which I now consider a real error on my part. I applied in total for 13 external grants with 6 being successful.

So a number that is immediately easy to get but that is probably quite meaningless is the money spent per research paper. We spent a total of ~127k euros per paper or 210k if we only count those where I am listed as co-corresponding. Of course this really varies a lot per paper, with my very rough estimates of the bounds being something like between 25k and 1 million. Given that we mostly spend the budget on salaries, this simply reflects the amount of people's time spent on a project.
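As a rough check on the per-paper figures, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the constant and function names are made up for the example, and it assumes the denominators are the 45 original research papers (preprints included) and the 27 co-corresponding papers mentioned earlier.

```python
# Rough figures taken from the text (all approximate, in euros).
# Assumption: the per-paper numbers divide total spend by the count of
# original research papers, not by reviews or news&views.
TOTAL_SPEND = 5.7e6          # approximate total spend over 9 years
ORIGINAL_PAPERS = 45         # original research papers, preprints included
CORRESPONDING_PAPERS = 27    # papers with co-corresponding authorship


def cost_per_paper(total_spend: float, n_papers: int) -> float:
    """Return the average spend per paper in euros."""
    if n_papers <= 0:
        raise ValueError("n_papers must be positive")
    return total_spend / n_papers


per_paper_all = cost_per_paper(TOTAL_SPEND, ORIGINAL_PAPERS)
per_paper_corresponding = cost_per_paper(TOTAL_SPEND, CORRESPONDING_PAPERS)

print(f"All papers: ~{per_paper_all / 1e3:.0f}k euros per paper")
print(f"Co-corresponding only: ~{per_paper_corresponding / 1e3:.0f}k euros per paper")
```

With these inputs the first value rounds to about 127k and the second to about 211k, in line with the rounded numbers quoted above.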

To new beginnings

This is a somewhat dry recap of the 9 years at EMBL but I thought it would be interesting, at least to me, to have these things written down. Even if these are just numbers, I am curious to see what the next 9-10 years look like. I am sitting in my new office at ETH, just close to two weeks after arriving in Zurich. There is a lot to adapt to, including teaching material that I should be preparing right now. I am curious to see how long it will take me to get into the local academic network and how much the move will impact our capacity to do work. The lab work is really the part that will take the longest, as I don't think we will run any experiments before the middle of the year, and although we have the budget for an MS instrument, that will take even longer to get going. In any case, I am excited about the new beginning here.

Thursday, June 10, 2021

A not so bold proposal for the future of scientific publishing

Around 15 years ago I wrote a blog post about how we could open up more of the scientific process. The particular emphasis that I had in mind was to increase the modularity of the process in order to make it easier to change parts of it without needing a revolution. The idea would be that manuscripts would be posted to preprint servers where they could accumulate comments and be revised until they are considered suitable for accreditation as a peer-reviewed publication. At the time I also thought we could be even more extreme and have all of the lab notebooks open to anyone, which I no longer consider to be necessarily useful.

Around 15 years have passed and while I was on point with the direction of travel I was very off the mark in terms of how long it would take us to get there. Quite a lot has happened in the last 15 years with the biggest changes being the rise of open access, preprint servers and social media. PLoS One started as a journal that wanted us to do post-publication peer review. It started with peer review focused on accuracy, wanting then to leverage the magic of internet 2.0 to rank articles by how important they were through likes and active commenting by other scientists. The post-publication peer review aspect was a total failure but the journal was an economic success that led to the great PLoS One Clone Wars with consequences that are still being felt today - just go and see how many new journals your favourite publisher opened this year.

The rise of preprint servers has been the real magic for me. We live in each other's scientific past by at least 2 years or so. If you sit down and have a science chat with me I can tell you about all of the work that we are doing which won't be public for some 2 years. If I didn't put our group's papers out as preprints you would be waiting at least 6-12 months longer to know about them. Preprint servers are a time machine: they move everyone forward in time by 12 months and speed up the exchange of ideas as they are being generated around the globe. If you don't post your manuscripts as preprints you are letting others live in the past and you are missing out on increased visibility of your own research.

Preprint servers also serve the crucial need to dissociate the act of making a manuscript public from the process of peer review, certification as a peer-reviewed paper and dissemination. This is important because it allows the whole scientific publishing system to innovate. This is needed because we waste too much money and time on a system that is currently not working to serve the authors or readers efficiently.

So after nearly 15 years the updated version of the proposal is almost unchanged:

I no longer think it would be that useful to have lab notebooks freely available to anyone to read. There are parts of research that are too unclear and I suspect that the noise to information ratio would be too high for this to be of value. However, useful datasets that are not yet published could be more readily made available prior to publication. Along these lines, the ideas in the form of funded grant proposals should be disclosed after the funding period has lapsed. As for the flow from manuscript to publication, the main ideas remain and the systems already exist to make these more than just ideas. There are already independent peer review systems like Review Commons. Such systems could eventually be paid and could lead to the establishment of professional paid peer reviewers. Such costs would then be deducted from other publishing costs depending on how the accreditation was done. Eventually "traditional" publishing could be replaced by overlay journals, like preLights, whose job would be to identify peer reviewed preprints that are of interest to a certain community.

Social media for me has been the most surprising change in scientific communication. I didn't expect so many scientists to join online discussions via social media. Then again, I didn't foresee the geekification of society. In many ways social media is already acting as a "publishing" system in the sense of distribution. Most of the articles I read today I find through Twitter or Google Scholar recommendations. As we are all limited by the attention we can give, I think one day, instead of complaining about how impact factors distort hiring decisions, we will be complaining about how social media biases distort what we think is high value science.

So finally, what can you do to move things along if you feel it is important? Perhaps you think we have too many wasteful rounds of peer reviewing across different journals, that the cost of open access publishing is too high, or even simply that publicly funded research should be free to read and openly available to mine. Then the best single thing you can do today is make your manuscripts available via preprint servers.