Written by Lauren Cadwallader, Lindsay Morton, and Iain Hrynaszkiewicz
This week, PLOS shares the latest update to our Open Science Indicators (OSIs)…
Different research communities have different priorities, needs, norms, and challenges in communicating their research—it’s no surprise, therefore, that adoption of Open Science practices also differs across topics and disciplines. Open Science Indicators (OSIs) illuminate patterns in Open Science adoption across disciplines in a way that hasn’t been possible before. We’re excited to share preliminary observations on disciplinary differences in the adoption of Open Science practices based on the most recent OSI dataset.
About this analysis
Developed in partnership with DataSeer, Open Science Indicators (OSIs) use Natural Language Processing and Artificial Intelligence to identify and quantify Open Science practices. The most recent dataset encompasses all 74,130 PLOS research articles published between January 1, 2019 and March 31, 2023, plus a comparator set of 8,186 Open Access articles drawn from PMC. The Open Science practices measured are data sharing, code sharing, and preprint posting.
The OSI dataset includes the DOI of every article analyzed, making it easy to match against the range of taxonomies in use at indexing and archiving services that capture disciplinary data. For this analysis, we’ve chosen the Australian and New Zealand Standard Research Classification (ANZSRC) taxonomy, as applied to Dimensions data, because it is an Open classification system and so can be extracted and recreated easily.
A single paper may fit within more than one topic definition, and so may be counted more than once in the analysis that follows.
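As a rough illustration of the DOI matching described above, the sketch below joins OSI records to topic assignments by DOI and computes a per-topic rate, counting a paper once in every topic it is assigned to. It is a minimal sketch under stated assumptions: the record layout (a `doi` key plus a boolean `repository` flag), the `topics_by_doi` mapping, and the `rates_by_topic` function are all hypothetical stand-ins, not the actual OSI columns or the Dimensions export format.

```python
from collections import defaultdict


def rates_by_topic(osi_records, topics_by_doi, flag="repository"):
    """Return {topic: (n_articles, share_of_articles_with_flag)}.

    osi_records: iterable of dicts with a "doi" key and a boolean flag
        column (hypothetical layout, not the real OSI schema).
    topics_by_doi: dict mapping a DOI to a list of topic names
        (hypothetical stand-in for ANZSRC codes taken from Dimensions).
    An article assigned to several topics is counted once in each topic;
    articles whose DOI has no topic assignment are skipped.
    """
    # DOI names are case-insensitive, so normalise keys before matching.
    lookup = {doi.strip().lower(): names for doi, names in topics_by_doi.items()}
    totals = defaultdict(int)
    hits = defaultdict(int)
    for record in osi_records:
        for topic in set(lookup.get(record["doi"].strip().lower(), [])):
            totals[topic] += 1
            if record[flag]:
                hits[topic] += 1
    return {topic: (totals[topic], hits[topic] / totals[topic]) for topic in totals}


if __name__ == "__main__":
    # Toy records for illustration only; these are not OSI results.
    records = [
        {"doi": "10.1234/example.1", "repository": True},
        {"doi": "10.1234/example.2", "repository": False},
    ]
    topics = {
        "10.1234/example.1": ["Topic A", "Topic B"],
        "10.1234/example.2": ["Topic B"],
    }
    print(rates_by_topic(records, topics))
```

Because a multi-topic article contributes to several topics, the per-topic article counts can add up to more than the number of articles analyzed, which is the double counting noted above.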
Data repository use
Data shared in any format has value, but for maximum discoverability, access, and utility, data repositories are considered the gold standard. We focus our analysis on repository use, but rates of data sharing broadly may be calculated from the OSI dataset as well.
Data repository use varies widely, both across disciplines and between PLOS and comparator articles within the same discipline. For PLOS articles, the disciplines with the lowest rates of data repository use are Health Sciences (19% for PLOS, 6% for comparators) and Biomedical & Clinical Sciences (19% for PLOS, 10% for comparators), both fields likely to be affected by privacy considerations. Other disciplines with below-average rates of repository use include Engineering, Agriculture, and Earth Sciences, where proprietary data is common. The comparator dataset shows similar but not identical results, with Engineering (5%) and Health Sciences (6%) having the lowest adoption rates.
At the opposite end of the spectrum are Information & Computing Sciences, Psychology, and Biological Sciences. The high rate of adoption in the Biological Sciences, and the similarity between PLOS and comparator rates in that field, suggest that both disciplinary norms and mandated data deposition requirements are at work.
Figure 1. Data shared in a repository for topics with more than 100 articles in the comparator sample. Note that each article can have more than one discipline assigned to it. Dashed lines show the overall average repository use for all topics in each cohort (PLOS = 26%; comparators = 13%).
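Figure 1 includes only topics with more than 100 articles in the comparator sample. A minimal sketch of that kind of filter, assuming per-topic article counts and rates have already been tallied for each cohort, pairs the two cohorts' rates and reports the gap between them. The `comparable_topics` function, its input format, and its sort order are illustrative assumptions; the post does not describe how the figures were produced.

```python
def comparable_topics(plos, comparator, min_comparator=100):
    """Pair per-topic rates for two cohorts, keeping well-sampled topics.

    plos, comparator: dicts of {topic: (n_articles, rate)} for each cohort
        (hypothetical input format).
    A topic is kept only if the comparator cohort has MORE than
    `min_comparator` articles in it and the topic also appears for PLOS.
    Returns {topic: (plos_rate, comparator_rate, difference)}, ordered
    from the lowest to the highest comparator rate.
    """
    rows = {}
    for topic, (n_comp, comp_rate) in comparator.items():
        if n_comp <= min_comparator or topic not in plos:
            continue
        plos_rate = plos[topic][1]
        rows[topic] = (plos_rate, comp_rate, plos_rate - comp_rate)
    # Sort by the comparator rate (second element of each value tuple).
    return dict(sorted(rows.items(), key=lambda item: item[1][1]))
```

A topic with exactly 100 comparator articles is dropped, matching the "more than 100" wording in the caption.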
When we examine the types of repositories used most often in each discipline, Biological Sciences, Agriculture, and Biomedical & Clinical Sciences are the most likely to use a specialist repository with a narrower, discipline-specific scope. Again, this is probably influenced by mandated deposition of sequencing, genetic, crystallography, and macromolecular data.
Figure 2. Repository usage by discipline for PLOS and comparator data combined, for topics where the comparator sample exceeds 100 articles (i.e., the same topic areas as in Figure 1). This includes data from the 15 most common repositories referenced in the PLOS dataset, which account for 94% of repositories used in PLOS articles and 84% in comparator articles. Note that each article may have more than one repository associated with it, so totals for some topics exceed 100%.
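Totals above 100% follow from counting an article once for every repository it cites. A minimal sketch shows how that happens, assuming articles arrive as (topics, repositories) pairs and that the denominator is the number of articles in a topic that cite at least one repository; the post does not state the denominator, and the `repository_shares` function is hypothetical. Note that this sketch ranks the top repositories within whatever input is passed in, whereas the caption describes ranking them within the PLOS dataset.

```python
from collections import Counter, defaultdict


def repository_shares(articles, top_n=15):
    """Share of each topic's repository-using articles citing each repository.

    articles: iterable of (topics, repositories) pairs, each a collection
        of names (hypothetical input format).
    Only the `top_n` repositories cited by the most articles are tallied.
    One article can cite several repositories, so the shares within a
    topic can add up to more than 1.0 (that is, more than 100%).
    """
    articles = list(articles)
    # Rank repositories by how many articles cite them.
    overall = Counter(repo for _, repos in articles for repo in set(repos))
    common = {repo for repo, _ in overall.most_common(top_n)}

    n_by_topic = Counter()
    repo_counts = defaultdict(Counter)
    for topics, repos in articles:
        repos = set(repos)
        if not repos:
            # Assumption: only articles that used a repository form the denominator.
            continue
        for topic in set(topics):
            n_by_topic[topic] += 1
            for repo in repos & common:
                repo_counts[topic][repo] += 1
    return {
        topic: {repo: count / n_by_topic[topic] for repo, count in repo_counts[topic].items()}
        for topic in n_by_topic
    }
```

With two repository-using articles in a topic, one citing `repo_a` and `repo_b` and the other citing only `repo_a`, the shares are 100% and 50%, for a topic total of 150%.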
Code sharing
Augmenting a research article with publicly available code enhances understanding, facilitates reproducibility and reanalysis, promotes trust, and saves other researchers time and effort.
Interestingly, code generation does not appear to be strongly related to code sharing. Among topic areas with high rates of code generation, code-sharing rates vary widely. For example, Mathematics, Information & Computing Sciences, and the Physical Sciences exhibit high rates of both code generation and code sharing, while the Biological Sciences, Environmental Sciences, and Chemical Sciences have similarly high rates of code generation but much lower rates of code sharing.
Instead, code-sharing rates appear to be highly correlated with rates of data sharing in a repository, suggesting that established data-sharing norms, rather than the prevalence or relevance of code, are the deciding factor.
Figure 3. Rates of data shared in a repository plotted alongside the rates of code shared and generated for each topic. Data from the PLOS and comparator datasets have been combined for each topic.
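The post does not say which statistic underlies "highly correlated." Assuming Pearson's correlation across per-topic rates, a minimal sketch is below; computing it once against the repository rate and once against the code-generation rate would let the two relationships described above be compared side by side. The `pearson` helper is hypothetical; Python 3.10+ also ships `statistics.correlation`, which computes the same coefficient.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy per-topic rates for illustration only; these are not OSI results.
    repository_rate = [0.10, 0.20, 0.30, 0.40]
    code_shared_rate = [0.05, 0.09, 0.16, 0.20]
    code_generated_rate = [0.60, 0.30, 0.55, 0.35]
    print("repository vs code shared:", pearson(repository_rate, code_shared_rate))
    print("code generated vs code shared:", pearson(code_generated_rate, code_shared_rate))
```

Because articles can be counted under several topics, per-topic rates are not independent observations, so a coefficient like this describes the pattern rather than formally testing it.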
Preprint posting
Preprints empower researchers to take more control over their scientific communications, sharing early to establish priority, broadcast results, seek community feedback, increase readership, and bolster grant, job, or tenure applications.
We explored preprint posting patterns over time and by region as part of our 2022 year-end analysis. Parsing the data by discipline shows that the Biological Sciences display the highest rates of preprint posting for both the PLOS and comparator cohorts. This makes sense, given that biology is one of the largest topic areas published at PLOS, and that bioRxiv, a biology preprint server, was our first official partner server.
PLOS has also established preprint server partnerships in the Biomedical and Clinical Sciences with medRxiv and, more recently, in the Earth and Environmental Sciences with EarthArXiv; PLOS publishes relatively high volumes of papers in both topic areas. Interestingly, in the Biomedical and Clinical Sciences, preprint rates for PLOS and comparator articles are very similar. This parity may indicate, for example, that the convenience of facilitated posting has less influence on the decision to preprint in this discipline. In the Earth and Environmental Sciences, rates are generally lower than the portfolio average and vary by subdiscipline relative to comparators. Future avenues of exploration might include the most popular servers by discipline, and when preprints are posted relative to the journal submission date.
Figure 4. Rates of preprint posting by discipline for PLOS and comparator articles, for topic areas with more than 100 relevant research articles.
Where will you take discipline-level Open Science data in the future?
This is just an early example of the kinds of exploration that OSIs make possible. Let us know what you’d like to learn about discipline-level Open Science behaviors, and share your own investigations with us in the comments or by writing to email@example.com.