PLOS partners with DataSeer to develop Open Science Indicators

September 14, 2022

Author: Iain Hrynaszkiewicz, Director, Open Research Solutions, PLOS

To provide richer and more transparent information on how PLOS journals support best practice in Open Science, we’re going to begin publishing data on ‘Open Science Indicators’ observed in PLOS articles. These Open Science Indicators will initially cover (i) sharing of research data in repositories, (ii) public sharing of code, and (iii) preprint posting, for all PLOS articles from 2019 to the present. These indicators were conceptualized by PLOS and developed with DataSeer using an artificial intelligence-driven approach, and they are increasingly important to PLOS achieving its mission. We plan to share the results openly to support Open Science initiatives across the wider community.

Why now?

PLOS has been a proponent of Open Science practices since its inception, with landmark initiatives such as the introduction of an enhanced data availability policy in 2014. Our data availability policy and consistent use of Data Availability Statements in publications help ensure that data supporting publications are shared in some form. More recently, we’ve focused on how to increase adoption of best practice in data sharing: the use of data repositories. We’ve used information from published articles to understand which methods of data sharing researchers use when publishing with PLOS, and to find potential solutions that support improvements in practice. The focus on improvement is important. We are not aiming to audit publications for compliance, but to understand how we can facilitate best practice and support research.

Since 2020, we have established a renewed programme of activity – an Open Research Solutions programme – with the goal of measurably increasing adoption of multiple Open Science practices. The introduction of new solutions for sharing protocols, code, research data, and preprints is intended to support this goal.

As a prerequisite to achieving this goal, we need to better understand the needs and practices of the researchers we serve. More specifically, we need to regularly investigate:

What are the levels of, and trends in, adoption of Open Science practices?
What is the potential for adoption of Open Science practices?
What is the impact of solutions we co-create with communities?
How do Open Science practices vary between disciplines and regions (and why)?
What are the barriers to adoption and what solutions could support adoption?

And we need to answer these questions in increasingly diverse communities, for increasingly diverse research outputs, and across increasing amounts of content. Measurement of Open Science practices at this scale cannot be done manually. To this end, earlier this year, we shared the results of a successful pilot with DataSeer to measure rates of code sharing at PLOS Computational Biology. As well as demonstrating the effectiveness of the journal’s mandatory code-sharing policy at increasing the availability of code, this pilot demonstrated the effectiveness of DataSeer’s natural language processing and artificial intelligence-driven technology at efficiently assessing Open Science practices in PLOS content.

Plan to deliver Open Science Indicators with DataSeer

We’ve extended our partnership with DataSeer, who are helping to create and deliver an initial suite of three Open Science Indicators. These will help us understand how Open Science is practiced across all PLOS content from 2019 onwards (>66,000 articles). Starting later in 2022, we’ll share information for all PLOS research articles on three indicators of best practice in Open Science, practices which are largely optional in PLOS journals:

Rates of data sharing in data repositories
Rates of code sharing
Rates of preprint posting, on any preprint server, before publication

A mock-up of these Open Science Indicators is shown in Figure 1 (these are preliminary results subject to change).

Having collaborated with us to adapt their technology to measure code sharing, DataSeer are now developing the capability to detect PLOS articles that were previously posted as preprints. While PLOS has a solid understanding of the usage of its integrated preprint servers, bioRxiv and medRxiv, we have a less complete picture of the usage of other preprint servers. As our portfolio expands to new research communities, in particular outside of biology and medicine, we need to support more researchers in other fields.

Figure 1: Mock-up showing rates of code sharing and rates of data sharing in repositories detected in PLOS articles 2019-2022, and a work-in-progress estimate of preprint posting

And we’re not stopping at PLOS content. We also need to understand how these indicators compare to relevant content outside of PLOS. As such, our approach will involve assessing the same Open Science Indicators in a topic-matched, randomized sample of Open Access articles from PubMed Central. The greater understanding we have of the adoption of Open Science across disciplines and communities over time, the more knowledge we have to inform the co-creation of solutions that support researcher needs – and the practice of Open Science.

Plan for further engagement

We will be the first publisher to transparently share Open Science Indicators in this way, and to do so repeatedly: the Indicators will be updated regularly after launch to show trends over time. Open Science practices have, of course, been assessed by researchers before. Examples include the seminal work of Serghiou et al., who assessed transparency indicators across all of PubMed Central, offering a comprehensive snapshot of multiple open practices up until 2020. We are also inspired by the work at BIH QUEST on the Charité Metrics dashboard. And we applaud the work of the growing community of tool developers, policy makers and researchers involved in better understanding Open Science practices. Indeed, community alignment on whether and how we can measure Open Science practices will be essential to long-term progress.

We’ve outlined above PLOS’ motivations for measuring Open Science Indicators, but we know from our own research that many institutions and funders also want to better understand the extent to which Open Science is being practiced. These stakeholders lack the solutions and/or resources needed to do this effectively. While we can’t claim to solve these problems with Open Science Indicators on PLOS articles alone, we will make our data openly accessible to inform further work in this area.

There are, of course, caveats to measuring anything (“When a measure becomes a target, it ceases to be a good measure”, Goodhart’s Law). Importantly, we are not creating new “metrics” or “rankings” of journals, individuals, funders or institutions. Open Science Indicators are just that: indications of Open Science practices. But, when combined with qualitative insights and other responsibly used metrics, Open Science Indicators could help provide a more complete understanding of the impact and credibility of research. This is highlighted by numerous initiatives, such as those at the European Commission, UNESCO, HELIOS, FOLEC and DORA, that aim to better recognise Open Science practices in how research is assessed. To date, what we tend to hear about Open Science behaviors is either facts about specific journals or critiques of where an activity is lacking. We see an opportunity for these Indicators to neutrally measure such activities wherever they are happening, and to reward the activity.

What’s next?

We’re currently preparing the first major release of the data for later in 2022. The data will, naturally, be available for reuse under an open license and we’ll be sharing insights from the data with editors who lead our journals in advance of public release.

After the first data release, the indicators for research data, code and preprints will be updated regularly as new PLOS content is published. We will also be developing additional Open Science Indicators in partnership with DataSeer. Sharing detailed methods information in the form of protocols is another Open Science practice we have focused on recently, and creating an appropriate indicator for this practice is next on our roadmap.

How we develop our approach to presenting and sharing the data for external users will be driven by what we learn from community feedback in the build-up to, and after, the first release. As we’ve noted, the insights from Open Science Indicators are already highly valuable to PLOS, and we’ll be sharing the data publicly to give researchers, policy makers, institutions and other publishers reliable information about Open Science practices. We hope this will empower them to make more informed decisions about their own policies and practices, and stimulate collaborations. With better quantitative, observational information about Open Science practices, coupled with the growing body of attitudinal data from surveys, we see potential for a more evidence-based understanding of the opportunities and challenges of making Open Science practices the norm.