I dream of data: How to prepare and share research data to meet increasing institutional, funder, and journal requirements

June 8, 2023 PLOS Open Data Open Science

Written by Lindsay Morton

Data sharing has long been a hallmark of high-quality reproducible research. Now, Open Data is becoming the norm, an increasingly expected component of the publication process, encouraged and rewarded by top institutions and major funders. PLOS’ pioneering data policy helps researchers share data in an ethical way that supports reproducibility while honoring privacy. Learn how you can prepare to meet new and increasing data-sharing requirements with ease, and how we can help.

The rise of data sharing

For nearly a decade, PLOS’ data availability policy has required authors to share the research data underlying their findings alongside published articles—with sensible exceptions respecting privacy, vulnerable populations, and dual use research of concern (DURC). That’s hundreds of thousands of datasets available for validation, reuse, reanalysis, cross reference, database inclusion, and more.

In that same time period, much of our scholarly research community has embraced similar policies. Institutions and funders such as CERN, Wellcome Trust, the Gates Foundation and, soon, all US government funding bodies require data sharing, while others, including the Plan S signatories, strongly encourage it.

In just the last four years, we’ve seen an increase in public data sharing among PLOS comparators from 41% in 2019 to 47% in 2022. Repository use among PLOS authors has risen from 23% to 28% in the same period, and from 9% to 15% among comparators.

It’s highly likely that data-sharing requirements will continue to increase, and that the rates of both data sharing and repository usage will continue to trend upward in the coming years.

When it comes to data, how Open is Open enough?

Open Data is key to the verification and replication of published research; it demonstrates reliability, makes reuse possible, and enhances understanding of the work. When handled properly, publicly available data has also been associated with a 25% increase in citations on average.

“Open” doesn’t mean simply “available.” Just like Open Access research articles, truly Open Datasets are accessible, shareable, and reusable without restriction. But, importantly, Open is just a type of license. Data sharing doesn’t work in practice unless datasets also meet accepted standards for findability, accessibility, interoperability, and reusability, better known as FAIR. Introduced in 2016, the FAIR principles outline a framework for good data management that can be used as a guide in nearly any circumstance, whether research is published Open Access or in a subscription journal.

Setting yourself up for data-sharing success

Data sharing does require some extra effort, but it doesn’t have to be demanding. By thinking about data presentation early in the research process, you can make sharing your data a natural part of communicating your research and reap dividends in readership, citations, and attention.

Plan ahead

Before you begin your research, create a data management plan. If you’re applying for funding, you may already have given this some thought. Especially when you’re working with a team, it can be incredibly helpful to articulate expectations and establish norms for things like data types and formats, storage, backup, naming conventions, and file structures.

Think about future users as you go

Often, when you are deep in a research project, it’s easy to use shorthand and temporary solutions: file names that make sense to you and no one else, column headers that don’t specify units of measurement, dates that don’t include the year. It’s difficult and time-consuming to go back and fix these things at the time of publication, long after the research has been completed. Instead, take a little time as you complete each element to think about what another researcher (another postdoc in your own lab, for example, or even you yourself a year or two in the future) might need to know in order to use the data, and make those changes in the moment. When in doubt, refer back to your data management plan.
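
As a rough illustration of catching these shortcuts early, the sketch below checks a file name, a set of column headers, and a date value. The naming pattern, the list of unit suffixes, and the column names are all hypothetical conventions chosen for this example; substitute whatever your own data management plan specifies.

```python
import re
from datetime import date

# Hypothetical conventions for illustration only; substitute the ones
# agreed in your own data management plan.
FILENAME_RULE = re.compile(r"[a-z0-9]+(_[a-z0-9]+)*_\d{4}-\d{2}-\d{2}\.csv")
UNIT_SUFFIX = re.compile(r"_(mg|g|kg|ml|cm|m|s|celsius)$")
COLUMNS_WITHOUT_UNITS = {"sample_id", "collected_on", "site"}


def filename_follows_convention(name):
    """True for names like 'soil_cores_2023-06-08.csv' (lowercase words plus a full date)."""
    return FILENAME_RULE.fullmatch(name) is not None


def headers_missing_units(headers):
    """Return the measurement columns whose header does not end in a known unit."""
    return [
        h for h in headers
        if h not in COLUMNS_WITHOUT_UNITS and UNIT_SUFFIX.search(h) is None
    ]


def is_full_date(value):
    """True if the value parses as an ISO 8601 date, which always includes the year."""
    try:
        date.fromisoformat(value)
    except ValueError:
        return False
    return True


# Example checks on made-up names and values.
print(filename_follows_convention("soil_cores_2023-06-08.csv"))
print(headers_missing_units(["sample_id", "mass", "depth_cm", "collected_on"]))
print(is_full_date("08/06"))
```

Running a check like this each time a file is saved is one way to make the fixes "in the moment" rather than at publication time.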

Prepare a readme file

Including a file contextualizing your data can make reuse much easier. Your readme file need not be complicated. Try imagining that you are writing a cover email to a collaborator, and include all the details you would want them to have when reviewing your data for the first time. Cornell University Library provides a useful template for organizing data readme files.
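
If it helps to start from a skeleton, a readme can be generated from a few structured notes. The sketch below is only one possible layout: the section names, the `build_readme` function, and the example values are hypothetical and are not taken from the Cornell template.

```python
def build_readme(title, contact, collected, files, variables):
    """Assemble a plain-text readme from a few structured notes.

    files maps file name -> short description.
    variables maps column name -> (unit, description).
    """
    lines = [
        f"Title: {title}",
        f"Contact: {contact}",
        f"Date of collection: {collected}",
        "",
        "Files:",
    ]
    for name, description in files.items():
        lines.append(f"  {name}: {description}")
    lines.append("")
    lines.append("Variables:")
    for name, (unit, description) in variables.items():
        lines.append(f"  {name} [{unit}]: {description}")
    return "\n".join(lines) + "\n"


# Made-up example values, for illustration only.
readme_text = build_readme(
    title="Example soil core measurements",
    contact="Name and email of the person to ask about the data",
    collected="2023-06-08",
    files={"soil_cores_2023-06-08.csv": "One row per soil core"},
    variables={"depth_cm": ("cm", "Depth of the core below the surface")},
)
print(readme_text)
```

The generated text is a starting point; the context a collaborator would need (methods, known gaps, how files relate to each other) still has to be written by hand.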

Choose a repository

Depositing research data in a purpose-built repository is the best way to meet FAIR standards, especially the standards for findability and reusability. In choosing the right repository for your research, consider the content of your dataset and your audience. Does your project span several disciplines, making it suitable for a general data repository (like Dryad or figshare), or does it occupy a specific niche with a database all its own (like GenBank, Crystallography Open Database, or ArrayExpress)? Where are readers most likely to look for a dataset like yours? If you are targeting a specific journal, does it offer any partner integrations you might take advantage of?

Make sure your metadata makes sense

The ‘F’ in FAIR stands for ‘findability’, a quality that depends almost entirely on whether your data has been uploaded to a repository (see “Choose a repository” above) and thoroughly described using metadata. Some metadata may be generated automatically based on factors like your dataset title, field labels, descriptions, and abstract. In addition, most repositories will invite you to enter metadata as part of the upload process. You can prepare in advance by taking some time to consider keywords and search terms that are particularly relevant to your work. And of course, don’t forget to switch your dataset from private to public, and include the correct link in any related publications.
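
One way to prepare metadata in advance is to draft it as a small structured record and check it for gaps before you start the upload form. The field names below are hypothetical and do not correspond to any particular repository's schema; each repository defines its own.

```python
import json

# Hypothetical field names for illustration; each repository defines its own form.
REQUIRED_FIELDS = ("title", "description", "keywords", "license", "related_publication")


def missing_fields(record):
    """Return required fields that are absent or empty in the draft record."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


# Made-up draft record, for illustration only.
draft = {
    "title": "Example soil core measurements",
    "description": "Mass and depth of soil cores from a hypothetical field site.",
    "keywords": ["soil", "core sampling"],
    "license": "CC0-1.0",
    "related_publication": "",  # fill in once the article link is known
}

print(missing_fields(draft))
print(json.dumps(draft, indent=2))
```

Keeping the draft alongside the data also gives co-authors a chance to suggest keywords before the dataset goes public.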

Get started today

Open Data is well established, and becoming increasingly normalized, but it is not quite universally accepted—yet. Now is the perfect time to get ahead of the curve and take advantage of the benefits of Open Data. Your future self will thank you.