Author: Iain Hrynaszkiewicz, Director, Open Research Solutions, PLOS
Extending Accessible Data to more articles, repositories, and outputs
Written by Iain Hrynaszkiewicz
In March 2022, with support from the Wellcome Trust, we launched an experimental “Accessible Data” feature designed to increase research data sharing and reuse. Having observed some interesting preliminary results, we’re continuing the “Accessible Data” experiment and broadening its scope.
What are we trying to achieve in the next phase of the Accessible Data experiment?
We began the Accessible Data experiment with two goals, to:
- Increase reuse of datasets linked to PLOS articles and
- Increase use of data repositories by offering a visual cue/reward on articles.
In the next phase of the experiment we have an additional goal: to understand if there are meaningful differences in how readers engage with different types of data, and different types of research output.
To achieve these goals we are increasing the number of articles that will qualify for the icon, by diversifying the repository and output types that are included. In this next phase, articles will display the icon if they:
- Were published from 2016 onwards
- Include a link to a research output in a repository in their Data Availability Statement, and
- Use a link that directs to a unique record in Dryad, Figshare, Open Science Framework (OSF), GitHub, Zenodo, Gene Expression Omnibus, Sequence Read Archive, BioProject, or Demographic and Health Surveys
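The three criteria above can be sketched as a simple check. This is a minimal illustration, not the code PLOS runs: the hostnames and path prefixes are assumptions about where each repository's records live, the names `match_repository` and `qualifies_for_icon` are hypothetical, and "unique record" is approximated as "anything beyond the repository's landing page".

```python
from typing import List, Optional
from urllib.parse import urlparse

# Assumed (name, hostname, path prefix) for each of the nine repositories.
# Illustrative only; real records may also be reached via DOIs or other hosts.
REPOSITORIES = [
    ("Dryad", "datadryad.org", "/"),
    ("Figshare", "figshare.com", "/"),
    ("Open Science Framework", "osf.io", "/"),
    ("GitHub", "github.com", "/"),
    ("Zenodo", "zenodo.org", "/"),
    ("Gene Expression Omnibus", "www.ncbi.nlm.nih.gov", "/geo/"),
    ("Sequence Read Archive", "www.ncbi.nlm.nih.gov", "/sra/"),
    ("BioProject", "www.ncbi.nlm.nih.gov", "/bioproject/"),
    ("Demographic and Health Surveys", "dhsprogram.com", "/"),
]


def match_repository(url: str) -> Optional[str]:
    """Return the repository name if the URL points past a repository's landing page."""
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path
    for name, repo_host, prefix in REPOSITORIES:
        host_ok = host == repo_host or host.endswith("." + repo_host)
        # Require something after the prefix so a bare homepage does not count.
        if host_ok and path.startswith(prefix) and len(path) > len(prefix):
            return name
    return None


def qualifies_for_icon(publication_year: int, das_links: List[str]) -> bool:
    """Apply the criteria: published 2016 or later, with a supported repository link."""
    if publication_year < 2016:
        return False
    return any(match_repository(link) is not None for link in das_links)
```

For example, `qualifies_for_icon(2018, ["https://github.com/example/repo"])` would return `True` under these assumptions, while the same link on a 2015 article would not qualify.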
Extending the icon to articles that link to six additional repositories (GitHub, Zenodo, Gene Expression Omnibus, Sequence Read Archive, BioProject, and Demographic and Health Surveys) achieves two things. First, it increases the number of articles that qualify for the feature three-fold, to more than 15,000 articles, rewarding more researchers and increasing our potential to promote discovery of research data and code. Second, we’re adding different types of repositories, which increases our potential for learning. Dryad, Figshare and OSF, from the first phase, are similar in that they are all generalist data repositories. Gene Expression Omnibus, Sequence Read Archive, and BioProject are domain-specific resources commonly used in the life sciences, and Demographic and Health Surveys is a key resource in social science and medicine. Domain-focused data repositories tend to have more specific requirements for the structure of data and/or metadata, and we want to understand if readers engage differently with domain-specific resources compared to content in generalist repositories.
GitHub is best known for sharing and versioning code and software*, but is often also used for other content, including research data. With rates of code sharing increasing, and code being used or produced in around half of studies reported in PLOS articles, we want to better understand the value of code linked to PLOS articles, and to support a popular resource for sharing research outputs. Indeed, all the newly added repositories are popular among PLOS authors: the nine supported repositories collectively host about three quarters of the outputs that PLOS authors share in repositories. They are also compatible with the creation of simple tooling that enables us to create links automatically and, in some cases, “on the fly” from accession numbers rather than URLs. See an example here.
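To illustrate what creating a link “on the fly” from an accession number might look like, here is a minimal sketch. It is not PLOS's tooling: the accession patterns cover only a few common prefixes, the URL templates are assumptions about NCBI's public page addresses, and `accession_to_url` is a hypothetical name.

```python
import re
from typing import Optional

# Assumed accession patterns and URL templates; illustrative, not exhaustive.
ACCESSION_RULES = [
    # Gene Expression Omnibus series, e.g. GSE followed by digits
    (re.compile(r"^GSE\d+$"),
     "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc={acc}"),
    # BioProject, e.g. PRJNA / PRJEB / PRJDB followed by digits
    (re.compile(r"^PRJ(NA|EB|DB)\d+$"),
     "https://www.ncbi.nlm.nih.gov/bioproject/{acc}"),
    # Sequence Read Archive studies, runs, experiments, samples
    (re.compile(r"^[SED]R[PRXS]\d+$"),
     "https://www.ncbi.nlm.nih.gov/sra/{acc}"),
]


def accession_to_url(accession: str) -> Optional[str]:
    """Build a repository link from an accession number, or None if unrecognized."""
    acc = accession.strip().upper()
    for pattern, template in ACCESSION_RULES:
        if pattern.match(acc):
            return template.format(acc=acc)
    return None
```

Returning `None` for anything unrecognized keeps the sketch conservative: an article would simply not get an automatically generated link rather than a wrong one.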
The results of the next phase of the experiment will inform future strategies to support discovery and reuse of research outputs produced by PLOS authors.
What have we learned so far?
Readers are engaging with the Accessible Data icon
In the first 12 months of the experiment (to March 2023) we recorded more than 20,000 reader clicks on the icon, which was displayed on 3,335 articles at initial launch in March 2022 and was automatically added to more than 1,200 further articles published in the 12 months after launch. Through analysis of 543 Figshare datasets linked to PLOS articles, we observed that the average number of views received per month was 2.5 in the 12 months prior to the launch of the feature and 3.0 in the 12 months following the launch (a statistically significant relative increase of 20%).
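The 20% figure is the relative change between the two monthly averages. A short sketch of the arithmetic follows; the function name `relative_change` is hypothetical, and the snippet reproduces only the percentage, not the significance test.

```python
def relative_change(before: float, after: float) -> float:
    """Relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Average monthly views per dataset, before and after launch (from the text above).
increase = relative_change(before=2.5, after=3.0)
print(f"{increase:.0%}")
```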
The icon could influence future data sharing practice
Rates of repository use among published PLOS authors have risen from 2019 to the present. But it’s not yet possible to measure whether the Accessible Data icon has influenced the rate of data sharing in repositories by PLOS authors, because the experiment has not been running long enough to measure effects in published papers. However, our research suggests that the availability of the icon may help normalize data sharing and influence which repository researchers select.
Getting the “right” data links in articles remains a challenge for publishers and authors
The Accessible Data icon rewards sharing data (and code) in a repository via a weblink. Best practice is sharing via a linkable persistent identifier, such as a DOI, but many PLOS articles link to data in other ways, such as via URLs or private links that are intended to be used for peer review only (a common problem for publishers). There is clearly work to do to improve the consistency and practice of how data links are shared, but we decided to be inclusive in how we deploy the Accessible Data icon. It displays as long as readers can access the data. We decided it was more important to help researchers as authors, who may be unaware of the nuances of DOIs and private links, and also to help them as readers, by including imperfect but functional links to data in our articles.
We are preparing a fuller report of the qualitative and quantitative insights we’ve gained from the experiment, to be shared in the future. Meanwhile, a summary is available as a poster in Figshare, which was presented at the MetaScience conference, along with the supporting data from our surveys.
Footnote
*We are aware that “Accessible Data” may not be the best term for articles sharing code and software but, right now, we think enabling rapid experimentation and increasing access to code are more important than the potential downsides of an imperfect experimental feature name.