By Thomas E. Nichols
Studying the brain with Functional Magnetic Resonance Imaging (fMRI) is a laborious and highly multidisciplinary task. In a typical study, psychologists or neuroscientists design the study and recruit the participants, programmers develop the experimental paradigms, MRI physicists develop the acquisition sequences and ensure good data quality from the magnet, and, finally, statisticians fit models to the complex data and find the signal in the noise.
All told, an fMRI study requires hundreds of man-hours, costly scanner time, and laborious data analysis to process gigabytes of image data. Yet, what is the quantitative result that is the core of a published paper? A list of x, y, z brain atlas coordinates of activation, a dataset that can be recorded on a Post-it note! While figures may show the pattern of brain activation, if any quantitative result (point estimate +/- standard error) is given, it is only for a selected region of the brain. Is it really acceptable practice that, of the gigabytes of raw and processed data that are generated, only centibytes of data are ultimately shared? The issue of reproducibility is everywhere right now (type “social psychology reproducibility” into a search engine to see particularly spirited discussions), but it brings attention to some particular weaknesses of neuroimaging science, and fMRI in particular.
Let’s face it: In brain imaging, we don’t just have a data sharing problem, we have a results reporting problem.
In virtually every discipline of science it is expected that your measurement of interest will be reported with a point estimate (e.g. a mean), a measure of uncertainty (e.g. standard error of the mean, or estimated population standard deviation) and a P-value and/or confidence interval. In fMRI, the point estimate is a picture, a 3D volume image, as is the standard error, and we spend hours studying their ratio, a t-statistic image. But these images rarely leave the lab, and only the scrap-paper-sized summary, the x,y,z locations, make it into publication. And the images are just the start of it.
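To make the "point estimate image, standard error image, and their ratio" idea concrete, here is a minimal sketch of a voxelwise one-sample t-test across subjects. It assumes each subject contributes one contrast (effect estimate) image already aligned to a common space; the function name `one_sample_t_image` and the simulated data are hypothetical, standing in for real processed images rather than any particular package's pipeline.

```python
import numpy as np

def one_sample_t_image(contrasts: np.ndarray):
    """Voxelwise one-sample t-test (hypothetical helper, for illustration).

    contrasts: array of shape (n_subjects, X, Y, Z), one aligned contrast
    image per subject.
    Returns (mean_img, se_img, t_img), each of shape (X, Y, Z).
    """
    n = contrasts.shape[0]
    mean_img = contrasts.mean(axis=0)         # point estimate image
    sd_img = contrasts.std(axis=0, ddof=1)    # estimated population SD image
    se_img = sd_img / np.sqrt(n)              # standard error of the mean image
    t_img = mean_img / se_img                 # their ratio: the t-statistic image
    return mean_img, se_img, t_img

# Simulated stand-in for 20 subjects' contrast images on a tiny 4x5x6 grid.
# Real data would be masked so background voxels (zero SD) are excluded.
rng = np.random.default_rng(0)
contrasts = rng.normal(loc=0.5, scale=1.0, size=(20, 4, 5, 6))
mean_img, se_img, t_img = one_sample_t_image(contrasts)
print(t_img.shape)  # (4, 5, 6)
```

All three outputs are full images; reporting only peak x,y,z coordinates of `t_img` discards the mean and standard error images entirely. With a library such as nibabel, each of these arrays could be written out as a NIfTI file and shared alongside the paper.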
Everyone can step up
Everyone involved in brain imaging can help increase reproducibility and transparency of the research they conduct.
- Investigators should plan on data sharing from the outset, ensuring that their ethics paperwork allows for them to share suitably anonymised versions of their data.
- The trainees or staff actually collecting the data can make sure it is organized in a way that will make sharing easy; see e.g. OpenfMRI.org and Data Organization.
- The software developers creating the analysis software should make data export, along with relevant provenance of what was done to the data, as easy and interoperable as possible.
Two important initiatives in this area are Nipype, which glues together multiple analysis packages and includes provenance tracking, and NIDM, an ongoing effort to establish standards for communicating neuroimaging analyses and results.
- Data repositories can make it as easy as possible for users to upload their data and provide stable URLs to reference the data; OpenfMRI and Neurovault are two places to upload individual studies, while LORIS and COINS offer more comprehensive project-level solutions for sharing data.
- Finally, journals can provide guidelines that require minimal data sharing and encourage sharing as much data as possible; in this, PLOS has led the way with its policy on sharing of data, materials, and software.
Neuroimaging data are big, but they almost don’t qualify as “big data”.
Brain image data are highly structured, and the processed data that go into group-level analysis are measured in megabytes (MB), not gigabytes (GB). Given that electrophysiology experiments can generate terabytes (TB) in a single session, we really can’t use the size of the data as an excuse. The fMRI community is also fortunate to have a widely accepted file format, NIfTI, to store the image data.
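A quick back-of-the-envelope calculation supports the "MB, not GB" point. The sketch below assumes images on the standard 2 mm MNI template grid (91 × 109 × 91 voxels) stored as uncompressed 32-bit floats, and ignores the small NIfTI header and any gzip compression; the function name and the 30-subject study size are hypothetical.

```python
def group_data_size_mb(n_subjects, dims=(91, 109, 91), bytes_per_voxel=4):
    """Approximate size in MB of one contrast image per subject.

    Assumes uncompressed float32 voxels on a 91x109x91 (2 mm MNI) grid;
    header bytes and compression are ignored.
    """
    n_voxels = dims[0] * dims[1] * dims[2]   # 902,629 voxels per volume
    return n_subjects * n_voxels * bytes_per_voxel / 1e6

# One subject's image is roughly 3.6 MB; a hypothetical 30-subject study
# needs on the order of 100 MB for its group-level inputs.
per_subject = group_data_size_mb(1)
whole_study = group_data_size_mb(30)
print(f"{per_subject:.1f} MB per image, {whole_study:.0f} MB for 30 subjects")
```

Even allowing for several contrasts per subject, the group-level inputs stay within what a typical repository upload can handle.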
While modest data sizes and a common file format work in our favor, we suffer from a lack of standards to describe the metadata, that is, everything that surrounds the data: the experimental design, the precise details of the statistical model fit, and how the inference (aka thresholding) was conducted.
From the outset, think: The day my paper appears in print I will get an email asking for my full image data, and analysis details; how will I respond to this?
If we all do our part, and plan on data sharing from the get-go, we will make reproducible science happen.
Thomas E. Nichols is a Professor at the University of Warwick, holding a joint position between the Department of Statistics and WMG. He is a statistician with a 20-year focus on neuroimaging, known for his contributions to the modelling of functional and structural MRI data. His current focus is on the meta-analysis of neuroimaging data and tools for data sharing. He is also a PLOS ONE Academic Editor.