This post first appeared on pygaze.org, in March 2016.
Open Science (#openscience) is great! It entails sharing data and code between scientists, so that we can all benefit from each other’s efforts. However, there is a downside to sharing your stuff: You become a helpdesk for people who would like to use it, and sharing distracts from a core part of the job: publishing papers! Because research positions are offered to those who publish a lot, distracting yourself from doing so might put you out of a job in the long run. To solve this problem, publishing open data and software should be valued as much as publishing papers.
Science is not as noble as you might think it is. Research groups compete for a limited pool of funding, and individual scientists compete for jobs. Research funding and jobs are granted to those who perform better than others. Usually, how well (groups of) researchers perform is measured by how many papers they publish, in which academic journals they publish them, and how often their papers are cited by other scientists. This is what some describe with the expression Publish or Perish: if you don’t publish enough papers, you are unlikely to find another job, or another source of funding.
The first obvious thing to note is that scientists don’t like it if their existing papers get retracted. Retractions happen when mistakes are found in papers, which often means the conclusions in the paper are no longer supported by the data. To err is human, but most consider it not to be scientific. So when a retraction happens, this is big news (in fact, there is a popular blog that is entirely devoted to retraction news). It hurts the scientist, because their reputation is damaged, and because their publication output is reduced. Clearly, the scientific community benefits when an error is removed from the scientific record, but the individual scientist does not.
Another problem is that science is hard. It takes a lot of time to collect data, and scientists put a lot of effort into gathering datasets. Researchers can have a strong attachment to their data, protecting it like it’s their own baby. This is not completely fair, because they spent public money on collecting this data, and one could argue that this imposes a moral obligation to return the data to the public. This would benefit the scientific community, because it would prevent us from having to collect a completely new dataset if we wanted to run new analyses on someone else’s existing data.
From the point of view of an individual scientist, this might seem a bit unfair: Why should they spend so much time on collecting data, while others can mooch off their efforts? This sentiment, recently described in the New England Journal of Medicine (NEJM) as a fear of “research parasites”, might be understandable, but it does not aid science as a whole. We would be so much more efficient if we pooled all our efforts! (To be clear: collaboration is what the NEJM editors argued for, even though some have taken their “research parasite” description out of context a bit.)
Another thing that scientists guard closely is their own software. We write custom scripts for our data collection, for our analyses, and for simulations. We rely heavily on our software, up to the point where most (if not all) research would be completely impossible without it. As a consequence, a lot of researchers know how to program, and some have become coding experts.
Now, sharing your software does not seem as bad as sharing your data. The usual process is that you publish a paper on your software, in an academic journal that is specialised in this (Behavior Research Methods, for example). You then publish your software’s source code online, for free, and you allow other people to use it however they please. This is called open-source software.
Having open-source software is good for the researcher who published it, because when colleagues use the software, they will cite the associated paper. How often your papers are cited by other scientists is something that funding organisations and universities consider to be important. So having more citations is good for your career, and open-source software can boost this!
From helpdesking to headdesking
What nobody will tell you when they talk about how great open-source software is, is that you will spend the next years of your life maintaining your software, and answering questions from people who use it. This might not seem like a big deal, right? How many people do you really expect to use your niche product? And if you did your job well, there shouldn’t be many bugs to fix in maintenance. This is true in some respects, but untrue enough to impose a rather large burden on the software-producing scientist.
Code maintenance isn’t just necessary for fixing bugs. It’s also required to make sure your software keeps working with new operating systems, and with other hardware and software that it might rely on. In addition, users will request new features, and you will feel obliged to add these. Depending on the size of your package, maintaining it can become a rather large workload.
The second problem of publishing software for others to use is that others will use it. This means that they will require enough documentation to be able to do so, and that they will ask you questions when things are unclear to them. Now this is perfectly normal, and completely fine. It’s a great feeling when your software becomes popular, and helping other people is very rewarding. However, answering a lot of questions is rather time-consuming, which means the software-producing scientist can spend less time on their other work. Or worse: maybe they decide to spend less time at home, and more time at work, thereby neglecting their friends and family (and not in exchange for additional salary, mind you).
In addition, there is a programmer’s proverb that applies here: If you make your software idiot-proof, idiots will use it. That sounds a bit degrading, and quite unpleasant towards the people that are enthusiastic about your work. I assure you that it is not intended that way, as developers are normally completely happy to help a person, whatever their question is. However, unfortunately, some people have unrealistic expectations of how much support a software developer can provide, and how much specialised knowledge is required to use some software. Not infrequently, a developer will receive questions along the lines of this: “I need your software to do this, but I have no knowledge of this topic. Also, I didn’t read your documentation, and I didn’t familiarise myself with your software (or programming language, if applicable). And did I mention that I have a strict deadline? I want to start testing tomorrow, so I need you to help me right away.” This is not too bad if it happens once or twice (heck, I remember asking others such things when I just learned how to code), but imagine being asked these questions on a daily basis. It does not make you dislike the people who ask, but it does make you dislike the general concept of these questions.
As a developer, I would love to help you with any issues, right away. But as a scientist, I have a lot of research to do, and students to teach. I don’t have time to do everything, and the current system emphasises my research efforts over everything else. So what should I choose?
At this point, it’s clear that open science is not necessarily beneficial for the open scientist, even though it is beneficial to science in general. This is not because data sharing or open-source software are bad ideas, but because the current system gives incentives for a very narrow definition of research output: publications are all that counts. In order to fix this, we need to align the needs of science as a whole with the needs of individual scientists. This means we need to incentivise cooperative behaviour in academia, for example by valuing published data and software as much as published papers.
Bonus rant: Open Access
Science is not as open as a lot of people think. When scientists do experiments and get results, they report on this in academic journals. These journals are usually not open to the public, but only to those who pay for an article or a subscription. This doesn’t seem unfair, because journals have publishing costs to pay for the services they provide: typesetting, printing, and hosting a website. Interestingly, journals outsource some crucial parts of publishing to scientists, who perform these tasks for free.
Most importantly, journal articles are written for free by scientists. In addition, journal editors are senior scientists who choose what papers are accepted for publication, and they do this for free. Then you have peer reviewers, who are scientists that check whether an article is scientifically sound and accurate, and they also do this for free. So an academic journal really only has to provide the infrastructure: a website that allows for article submission and review, and that hosts the published articles (which might also appear in a printed version).
I don’t want to make it seem like publishers of academic journals have easy jobs, but I do want to make it clear that their costs should be relatively low. So you would expect the subscription and article fees they charge to be relatively low too. Unfortunately, that is not the case: Individual articles usually cost around 30 to 40 dollars, and the subscription rates that publishers charge to university libraries are ridiculously high. How high exactly is kept secret, but the rates are so high that a lot of universities cannot afford to pay them. The solution, according to many, is open access: Free access to all articles that scientists produce, for everyone and their mothers.
In practice, open access is being abused by publishers that charge inappropriate sums of money to researchers who wish to publish their work. However, there are new publishers who charge fairer fees, such as eLife, PeerJ, and Collabra. Ideally, universities would collaborate to form their own publishing platform, so that journals become accessible to everyone at a relatively low cost. (And without money leaking away into the private market.)
What have we learned?
Scientists compete with each other for funding and jobs, which makes it a potentially bad idea to share the data that you worked so hard for. In addition, scientists who publish their software often spend a lot of time on helping people who want to use it, which takes time away from actually doing research. This reduced research output can make scientists less likely to be awarded funding, and reduces job security. Finally, scientists pay academic journals extraordinary amounts of money for publishing their work open access, but they also pay publishers ridiculous sums to read closed-access articles. In sum, open science isn’t particularly attractive for individual scientists!
The solution isn’t to stop being open. Openness improves the accuracy and speed of scientific progress in general, which in turn is beneficial to society at large. Think of all the diseases we could have cured if we didn’t waste so much time on competing with each other! Instead, openness should be promoted and supported. Funding bodies and senior scientists shouldn’t over-emphasise publications in flashy journals, and should take into account other forms of research output too. These include open software and open data, as those greatly benefit the academic community. Finally, universities should organise a publishing platform that is both sustainable, and not in the hands of corporate players that seek only to make profit. Science is a not-for-profit business, and it’s time we started behaving like that. We’re not here to boost our own ego or to make money, but to quench humanity’s everlasting thirst for knowledge, to cure diseases, and to create technology.