Code Ocean: Tackling Reproducibility and Transparency in Scientific Research
A startup incubated at the Jacobs Technion-Cornell Institute has launched a global platform to address a challenging topic in scientific research — the crisis of reproducibility and transparency.
Code Ocean is a cloud-based platform that makes the computational code used in research both accessible and usable. Researchers and software engineers across the planet can now share and run code with a single click.
Code Ocean CEO, Simon Adar, was part of the 2014 cohort of the Runway Startup Postdoc Program at the Jacobs Technion-Cornell Institute. Over a two-year period, Adar developed his idea with fellow postdocs, as well as business and academic experts. He assembled a dedicated team to build Code Ocean and the platform launched last month.
The goal of Code Ocean is to share code and algorithms more easily via an embedded link, just like a YouTube video, explains Adar. Researchers simply upload code to the platform, then link it to the associated article in an academic journal.
Code Ocean not only makes the code accessible, it also allows other researchers to run it at the press of a button.
“You can change parameters, modify the code, upload your own data, run it again, and see how the results change – without installing anything on your personal computer. Everything runs in the cloud,” says Adar.
This means that research can be reproduced, and even more importantly reused, by others with ease. This presents a unique opportunity to solve a long-standing problem facing the scientific community.
The crisis of reproducibility
In a 2016 paper published in the journal Science, Victoria Stodden et al. note that “Access to the computational steps taken to process data and generate findings is as important as access to data themselves.” Unfortunately, access to these steps has not been routinely available.
When code is not accessible, research cannot be easily replicated; it becomes less accountable and reliable. The effects can be staggering. According to a 2013 article in The Economist, “A rule of thumb among biotechnology venture-capitalists is that half of published research cannot be replicated.”
Traditionally, the code associated with research is in printed form. It is not dynamic and cannot be used in real-time, making it time-consuming and complicated to replicate.
When tackling a problem, researchers routinely look to the work of those who have gone before them. In doing so, they often encounter code that could help them.
“If I come across a piece of code that I think has application to solving what I’m working on currently, the only way I’m going to be absolutely sure is by running the code with my own data set or with my own input,” explains Director of Business Development, Pierre Montagano.
Code Ocean allows that work to take place at the press of a button. “We are speeding up the pace at which science can move,” Montagano said.
Credit Where Credit Is Due
Traditionally, previously published papers are credited in new research, but the code underlying the work is not. Code Ocean rectifies this by allocating digital object identifiers (DOIs) to code.
“By assigning a digital object identifier to that code, to the actual algorithm, the package, then researchers and authors can start getting credit for the actual algorithm itself. It can be cited in other work,” says Montagano.
Each piece of code also has an associated license, allowing researchers to define how it is used; the model is similar to Creative Commons.
When a user encounters interesting code on the platform, they can easily find out how it can be used. “I can go to the details page, and find out what are the associated software and data licenses,” explains Adar. “I can read what that license permits me to do, or not do, with a commercial product or new research. New improvements and discoveries can easily be shared and published back to the community by anyone, which is what science is all about.”
Code Ocean also functions as a collaborative workspace that supports different software and systems. Nowadays, research is often carried out by people in different countries and institutions. The new platform allows teams to share and modify their code instantly.
Incubating Startups at the Jacobs Institute at Cornell Tech
The Jacobs Institute’s Runway Startup Postdoc Program gave Adar access to a vast network of talented people who could support the development of Code Ocean, including faculty, industry specialists, and investors.
“You get to hear a lot of ideas, and we had discussions with entrepreneurs on a weekly basis,” he explains. Sessions could be one-on-one or between small groups. Adar said access to expertise and the chance to get down to the “nitty-gritty” of an idea is what makes the experience unique.
Code Ocean envisions its customers as end-users who consume research and authors who want to disseminate their work. The IEEE, the world’s largest association of technical engineering professionals, has recently announced a partnership with Code Ocean. According to that announcement, IEEE authors will be able to link their published articles with their executable algorithms on the Code Ocean platform for free. The company is planning more integrations with other publishers in the near future.
Going forward, the company’s mission is to make science more open and accountable. As Adar points out, scientists often work for years before research is published. If people are not able to easily access and use that work, it is a waste of researchers’ time and billions in public funding.
For his part, Montagano stresses the sheer excitement of working with an “exceptional” team on such a pioneering project. “We really feel like we are adding to the better good of science and openness,” he says.