
Multisensory integration and causal inference in the brain

Amidst the incessant onslaught of signals that reach our senses every second, how does the brain determine which auditory and visual signals originate from a common source and should be integrated, and which ones reflect separate objects and should be kept apart? An influential idea is that the brain solves this problem by performing optimal probabilistic inference, known as Bayesian causal inference. In a recent study published in PLOS Biology, Drs. Tim Rohe and Uta Noppeney, from the Max Planck Institute in Tübingen, Germany, and the University of Birmingham, UK, combined behavior and functional MRI with detailed computational modeling to shed light on the neural underpinnings of Bayesian causal inference in the cerebral cortex.

An everyday life example

Picture yourself as you are about to cross a busy street. Both the visual shape and the engine noise coming from your right surely signal the same car to you; but what about that sudden honking horn? If the sound is coming from the left, chances are it is another vehicle, potentially closer and more dangerous to you. If, on the other hand, your ears tell you that it is coming from the right, then it most likely originated from the same car you had already seen. How and where in the brain does such processing of the sensory scene take place?

A model for multisensory processing

Bayesian causal inference is a framework for modeling multisensory integration. It postulates that the attributes of a sensory object (e.g. the location of the honking horn in space) are represented in the brain probabilistically, with the width of each probability distribution reflecting the reliability of the corresponding sensory modality. Unisensory signals are then combined into a single percept only if their distributions overlap enough to suggest a common cause. Behavioral experiments have established that this model accurately captures the way we handle multisensory stimuli, but its neural basis had remained largely unexplored.
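To make the idea concrete, here is a minimal numerical sketch of Bayesian causal inference for audio-visual localization, in the spirit of the generative model of Körding et al. (2007). The parameter values (sensory noise levels, spatial prior, prior probability of a common cause) are illustrative assumptions, not the settings used by Rohe and Noppeney.

```python
# A minimal sketch of Bayesian causal inference for audio-visual localization,
# in the spirit of Koerding et al. (2007). All parameter values below are
# illustrative assumptions, not the settings used by Rohe and Noppeney.
import math

def causal_inference(x_a, x_v, sigma_a=8.0, sigma_v=2.0,
                     sigma_p=30.0, mu_p=0.0, p_common=0.5):
    """Return p(common cause | signals) and the model-averaged auditory estimate.

    x_a, x_v  : internal auditory and visual location signals (e.g. degrees)
    sigma_a/v : sensory noise of each modality (vision assumed more reliable)
    sigma_p   : width of the spatial prior centred on mu_p
    p_common  : prior probability that the two signals share one cause
    """
    va, vv, vp = sigma_a**2, sigma_v**2, sigma_p**2

    # Likelihood of both signals under a single common source (C = 1),
    # with the unknown source location integrated out analytically.
    var1 = va * vv + va * vp + vv * vp
    like_c1 = (math.exp(-0.5 * ((x_a - x_v)**2 * vp + (x_a - mu_p)**2 * vv
                                + (x_v - mu_p)**2 * va) / var1)
               / (2 * math.pi * math.sqrt(var1)))

    # Likelihood under two independent sources (C = 2).
    like_c2 = (math.exp(-0.5 * ((x_a - mu_p)**2 / (va + vp)
                                + (x_v - mu_p)**2 / (vv + vp)))
               / (2 * math.pi * math.sqrt((va + vp) * (vv + vp))))

    # Posterior probability that the signals share a common cause.
    post_c1 = like_c1 * p_common / (like_c1 * p_common + like_c2 * (1 - p_common))

    # Location estimates: reliability-weighted fusion if C = 1,
    # audition alone if C = 2, then average the two, weighted by post_c1.
    s_fused = (x_a / va + x_v / vv + mu_p / vp) / (1 / va + 1 / vv + 1 / vp)
    s_aud = (x_a / va + mu_p / vp) / (1 / va + 1 / vp)
    s_hat_a = post_c1 * s_fused + (1 - post_c1) * s_aud
    return post_c1, s_hat_a

# Nearby signals are attributed to one source and fused; distant ones are not.
print(causal_inference(x_a=5.0, x_v=4.0))    # p(common) high, estimate pulled toward vision
print(causal_inference(x_a=5.0, x_v=-20.0))  # p(common) low, only a weak visual pull
```

When the two signals roughly agree, the model infers a common cause and the perceived sound location is captured by the more reliable visual estimate; when they are far apart, the inferred probability of a common cause collapses and the visual influence largely disappears.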

The ventriloquist illusion

In the case of spatial localization, vision is more accurate than audition, so our perception of the location of a sound can be biased toward that of a visual stimulus presented nearby at the same time. Rohe and Noppeney exploited this experimental paradigm, the “ventriloquist illusion”, in extensive functional MRI sessions (participants were scanned for 18 hours each!). Importantly, they manipulated not only the spatial disparity between the auditory and visual stimuli but also the reliability of the visual input, to probe how closely the brain conforms to the predictions of Bayesian causal inference.
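For intuition on why the reliability manipulation matters, one can vary the visual noise parameter in the hypothetical causal_inference sketch above (again, purely illustrative numbers): as the visual signal becomes less reliable, the ventriloquist pull it exerts on the perceived sound location weakens in a graded way.

```python
# Using the causal_inference sketch above: increasing the visual noise sigma_v
# (illustrative values) weakens the visual bias on the perceived sound location.
for sigma_v in (2.0, 8.0, 16.0):
    p_c1, s_hat = causal_inference(x_a=5.0, x_v=-5.0, sigma_v=sigma_v)
    print(f"sigma_v={sigma_v:4.1f}  p(common)={p_c1:.2f}  auditory estimate={s_hat:+.1f}")
```

Comparing behavior and brain responses against such graded predictions, rather than against all-or-none integration, is what makes the paradigm a stringent test of the model.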

Panels A-C illustrate the principle of the experiment used by Rohe and Noppeney; panel D summarizes the main findings. From: Kayser C, Shams L, Multisensory Causal Inference in the Brain, PLOS Biol 2015.

A hierarchy of cortical multisensory processing

Their results, fascinatingly, point toward an organizational hierarchy in the multisensory processing of spatial information: whereas primary visual and auditory cortices mostly processed their respective inputs separately from any concurrent input in the other modality (forced segregation), cortical areas further up the processing hierarchy (the posterior intraparietal sulcus) systematically integrated sensory inputs regardless of their spatial provenance (forced fusion). Only at the highest stage of sensory processing, in the anterior intraparietal sulcus, did neural activity take into account the uncertainty about whether the sound and the flash came from a common source, as Bayesian causal inference prescribes. Drs. Rohe and Noppeney’s complex study is put into perspective in a very informative and beautifully illustrated Primer by Drs. Christoph Kayser and Ladan Shams, also published in PLOS Biology.

A few questions to the authors

These findings in turn raise a series of questions: how and when does the brain “learn” the natural statistics of the outside world and the reliability of its own senses? Is the situation in the temporal domain similar to that in space? I asked Drs. Rohe and Noppeney a few questions about the perspectives opened by their exciting work.

The brain uses a Bayesian framework, incorporating prior knowledge about the world when it processes sensory information. In your view, how does the brain acquire that knowledge?

The brain may acquire prior knowledge about the statistical structure of the world at multiple timescales. Some priors may have evolved through evolutionary selection and be innately specified. Other priors may slowly evolve during neurodevelopment, when children are exposed to the statistical structure of sensory signals.

Yet numerous studies have demonstrated that even low-level sensory priors can be modified across experimental sessions, suggesting that in many cases the brain constantly adapts its prior expectations to the current environmental statistics (Sotiropoulos et al., 2011).

What would be the neuronal underpinnings of that knowledge?

It is largely unknown how the brain implements prior knowledge and expectations. Various mechanisms have been proposed, such as patterns of spontaneous activity (Berkes et al., 2011), the fraction of neurons encoding a particular feature (Girshick et al., 2011), their response gain and tuning curves, or connectivity and top-down projections from higher-order areas (Rao and Ballard, 1999).

Potentially, the brain may use different mechanisms depending on the particular prior and the timescale of learning. In the multisensory context, it is still unknown whether the brain encodes modality-specific or supramodal priors. Again, this may depend on the particular type of prior (e.g. spatial vs. temporal).

The ventriloquist effect is based on the notion that the visual system provides the brain with a more accurate readout of the space around us than the auditory system. This is in contrast to the time dimension, where audition is generally thought to be more precise than vision. Could there be some sort of “temporal ventriloquism” effect, where timing judgments based on visual inputs are biased by conflicting auditory inputs?

In the temporal domain, the sound is indeed more likely to bias the visual percept. This is illustrated by the classical “auditory flutter driving visual flicker” phenomenon, in which participants’ judgments of a flicker rate are biased by a concurrent fluttering sound (e.g. Gebhard and Mowbray, 1959). More recent studies have demonstrated that even a single sound presented at a temporal offset from a flash can attract the perceived timing of the flash (Vroomen and de Gelder, 2004).

Would you predict that the same set of brain areas that you investigated here would show similar activations?

For spatial ventriloquism we focused on the dorsal visual processing stream, which is known to be involved in spatial processing. A temporal ventriloquist effect may emerge along a temporal processing stream that is thought to culminate in the right temporoparietal junction (Battelli et al., 2007). However, the temporal ventriloquist effect may not be confined to a functionally specialized and segregated temporal processing system, but may instead emerge in multiple regions, affecting the temporal features of the neural response.

MEG and EEG studies, with their greater temporal resolution, may thus provide better insights into the neural mechanisms of temporal ventriloquism.

I was wondering if you also looked at reaction times in your study. I would expect that multisensory integration affects reaction times as well as accuracy.

In this particular study, we did not look at response times. However, multisensory integration and Bayesian causal inference will indeed also affect response times. Most research to date has focused on either response choices/accuracy or response times. Future research will need to develop models of multisensory integration and segregation that make joint predictions for both response choices and response times (see, e.g., Drugowitsch et al., 2014; Noppeney et al., 2010).

Any views expressed are those of the author, and do not necessarily reflect those of PLOS.
