Rare Disease Day Spotlight on PLOS Authors: Open Data Repositories in Practice
Science increasingly involves collaborative research groups, program partnerships and shared learnings to encourage transparency, reproducibility and a responsible transition to a more open way of doing science. Open Science policies and best practices are currently under discussion, definition and development across the wide spectrum of activities that make up the research cycle, from open notebooks, open data and transparent peer review to the interoperability of metadata and digital identifiers. For open research practices in particular, adoption of recent and emerging policies (e.g. the PLOS Data Policy) could be strengthened if accompanied by examples of successful implementation. Examples can serve as a powerful motivator for improved understanding and behavioral change for those confronted with the uncertainties of a more open landscape for the practice and communication of science.
Perhaps it’s a question of making clear to the broad stakeholder community, at all stages and across multiple disciplines, the practical benefit of these policies in moving us all toward a more open science. It’s not just a theoretical pursuit of Open Science for the sake of being open. The current energy behind Open Science in the European Union, as well as in the United States, also stems from frustration over wasted resources, time and talent. Practicing Open Science well enhances reproducibility through improved clarity of methods and reagents, and accelerates the reuse of data and code by others.
A Celebration of Open Data
A major benefit of open data is that data can be reused, not only for validation work but also for pushing science forward. In the US National Cancer Institute’s Up For A Challenge (U4C) Contest, teams of scientists with diverse expertise collaborated to explore preexisting datasets to advance breast cancer research. Finalists in the US National Institutes of Health, Howard Hughes Medical Institute and Wellcome Trust Open Science Prize competition (which included projects by PLOS authors and their related publications) “demonstrated the huge potential for data to be reused to develop new applications and uncover new knowledge,” wrote Robert Kiley, Head of Open Research, and David Carr, Programme Manager, Wellcome Trust, in Figshare’s State of Open Data Report 2017. The report provided insight into how researchers approach publishing their data. Asked in surveys where they published their data, researchers most commonly reported doing so as an appendix to an article (slightly over 30%) or in a data repository (slightly under 30%), with 20% having published data in a data journal (see the summary infographic).
Open Data Day (March 3, 2018) is an opportunity to showcase the benefits of open data and open data systems, and, according to the grassroots collective’s website, “to encourage the adoption of open data policies in government, business and civil society around the world.” This year, the focus is on four key areas where open data can help solve universal problems: opening research data, tracking public money flows, informing open mapping projects and providing open data for equal development. In Copenhagen, Open Data Day will include the announcement of the Danish Open Data Award, and in London activities are planned related to Open Science and reproducible research. Participants in the Philippines will benefit from a roundtable discussion on open research as it applies locally and globally. There is no shortage of ideas and data sources for Open Data Day.
PLOS took a leadership position in open data in 2014 with our strengthened Data Policy, and since 2015 our journals have maintained a list of recommended repositories to help authors share their data. When we assess repositories for inclusion in our list we are guided by criteria that meet the FAIR principles on open data. We consider this our responsibility as a publisher. The FAIR guiding principles state that openness is only one important component of the data ecosystem: data also need to be Findable, Accessible, Interoperable and Reusable. For inclusion on PLOS’ list of recommended repositories, several criteria were developed, some of which are listed below. For a more complete description of repository criteria, visit the EveryONE blog on Open Data Day!
- Datasets should be available at no cost. All PLOS articles are available to readers free of charge and we believe cost should not be a barrier to access either the scientific literature or accompanying datasets. Repositories are not considered for our recommended list if they charge readers access or subscription fees.
- Repositories with stated licensing policies should offer CC0 or CC BY licenses (or equivalents), for maximum reproducibility and reuse.
- To ensure that datasets will be permanently accessible at the specified location, repositories must issue a stable identifier at publication, such as a digital object identifier (DOI) or an equally robust accession number.
- FAIRsharing.org works with a community of journals, funders and databases in support of standards, policies and educational material to enable funders, librarians, journals, researchers and developers to thrive in the open data world. The repository chosen by PLOS authors should have an entry created in the FAIRsharing database, to allow it to be linked to the PLOS entry.
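For readers who like to see a checklist written down, the four criteria listed above can be sketched as a small Python check. This is only an illustration under stated assumptions, not a PLOS tool: the `Repository` fields, the `unmet_criteria` function and the `ACCEPTED_LICENSES` set are hypothetical names invented here, and the "or equivalents" clause for licenses is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumption: only the two licenses named in the criteria are accepted here;
# "equivalent" licenses would need case-by-case human judgment.
ACCEPTED_LICENSES = {"CC0", "CC BY"}


@dataclass
class Repository:
    """Hypothetical description of a data repository (illustrative fields only)."""
    name: str
    charges_access_fees: bool
    licenses: Optional[List[str]]  # None means no stated licensing policy
    issues_stable_identifier: bool  # e.g. a DOI or robust accession number
    has_fairsharing_entry: bool


def unmet_criteria(repo: Repository) -> List[str]:
    """Return the listed criteria that the repository does not meet."""
    problems = []
    if repo.charges_access_fees:
        problems.append("charges access or subscription fees")
    # The license criterion applies only to repositories with a stated policy.
    if repo.licenses is not None and not ACCEPTED_LICENSES.intersection(repo.licenses):
        problems.append("offers neither CC0 nor CC BY")
    if not repo.issues_stable_identifier:
        problems.append("does not issue a stable identifier at publication")
    if not repo.has_fairsharing_entry:
        problems.append("has no FAIRsharing entry")
    return problems


# Usage with a made-up repository: an empty list means all four checks pass.
example = Repository("Example Archive", False, ["CC0"], True, True)
print(unmet_criteria(example))
```

A real assessment involves more criteria than these four and relies on editorial judgment; the sketch only makes the listed conditions explicit.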
In addition to considering the PLOS Data Policy and providing a Data Availability Statement for their individual data and datasets, selecting the appropriate data repository is an important part of a researcher’s overall experimental and data plan. To assist authors in choosing the best repository, in addition to the current list of recommended repositories, the complete list of repository criteria will soon be available on PLOS journal websites.
What is the practical importance of open data? As one specific example we can look to a coincidence of timing: February 28 is Rare Disease Day. Rare diseases constitute a group of more than 6,000 different diseases and affect more than 300 million people worldwide. To put this number in perspective, 1 in 20 people will live with a rare disease at some point in their life, according to EURORDIS, an alliance of over 700 patient organizations from nearly 70 countries in Europe. In light of Rare Disease Day’s close temporal alignment with Open Data Day, we highlight a selection of articles on rare diseases published at PLOS that use a variety of repository options to make their associated data available. These are examples of authors doing the right thing to advance rare disease research, collective knowledge, and future therapeutic interventions.
- Sorenson et al. (2017) used the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) repository to store their genomic and transcriptomic data relating to fibrolamellar hepatocellular carcinoma (a rare variant of liver cancer).
- Guilhem et al. (2017) used the Data Archiving and Networking Services (DANS) EASY repository to deposit data files from their research on hereditary hemorrhagic telangiectasia (a disease leading to abnormal blood vessel formation). The EASY repository is one of PLOS’ recommended repositories.
- Andersen et al. (2017) carried out a bibliometric analysis of research on multiple myeloma (a cancer of white blood cells). Few, if any, dedicated repositories exist exclusively for bibliometric work, so data underlying work like this can be deposited in a discipline-independent repository, in this case Figshare. While subject-specific repositories are preferred, authors may use a cross-disciplinary repository where none is available.
- Hytönen et al. (2016) published genome data relating to their work on three rare bone diseases in the European Nucleotide Archive (ENA) hosted by EMBL-EBI.
- Piersanti et al. (2015) also used the GEO repository to store their microarray data on gene expression changes in brain cells following infection with viral vectors. This work contributes to the development of gene therapy that could be used in the treatment of several rare diseases affecting the brain.
The theme for Rare Disease Day this year carries over from last year: research. If scientists working in these disease areas make their data open and available for reuse and re-examination, they can extend the impact of their efforts and may open a window to unrealized diagnoses, therapies and perhaps even cures.
In the pursuit of Open Science, practical and even incremental change has the power and potential to bolster momentum and encourage a spirit of collaboration that ultimately brings about a large-scale cultural shift. We have seen evidence of this most recently with the preprint movement in biomedical and life sciences. Making open data the norm, whenever possible, and following FAIR sharing principles are additional practices that, like preprints, have the capacity to transform the work and culture of science.