Impact of action-object congruency on the integration of auditory and visual stimuli in extended reality

May, Keenan Russell
Walker, Bruce N.
Extended Reality (XR) systems are currently of interest to both academic and commercial communities. XR systems may involve interacting with many objects in three-dimensional space. The usability of such systems could be improved by playing sounds that are perceptually integrated with visual representations of objects. During multisensory integration, humans draw on several types of crossmodal congruency to determine whether auditory and visual stimuli should be bound into unified percepts. In XR environments, spatial and temporal congruency may be unreliable. The present research therefore builds on associative congruency: content congruency effects that are acquired through perceptual learning from exposure to co-occurring stimuli or features. A new type of associative congruency, called action-object congruency, is proposed. Research in ecological sound perception has identified a number of object and action features that humans can discern from the sounds of sound-producing events. Because humans can infer this information from sound, it should also inform the integration of auditory and visual stimuli. When an observer sees a realistic depiction of a sound-producing event such as a strike, scrape, or rub, integration should be more likely if a concurrently presented sound is congruent with the objects and the action that are seen. These effects should occur even if the visual objects and the sound are novel and unrecognizable, as long as the relevant features can be ascertained both from the visual display and from the sound.
To evaluate this, the temporal and spatial ventriloquism illusions were used to assess the effects of action congruency and object congruency on multisensory integration. Visual depictions of interacting objects were displayed in virtual reality, and congruent or incongruent sounds were played over speakers.
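In a spatial ventriloquism task, the strength of integration is often expressed as the fraction of the audiovisual offset by which the reported sound location is pulled toward the visual event. The abstract does not specify the measure used in this study, so the following Python sketch simply assumes this normalized bias. The function names and the convention of giving locations as azimuths in degrees are hypothetical.

```python
def localization_bias(pointed_deg: float, sound_deg: float, visual_deg: float) -> float:
    """Normalized ventriloquism bias for one trial (assumed measure, not the study's).

    0.0 means the sound was localized at its true position;
    1.0 means it was localized at the visual event (full visual capture).
    """
    offset = visual_deg - sound_deg
    if offset == 0:
        # With no audiovisual discrepancy, a shift "toward the visual event" is undefined.
        raise ValueError("bias is undefined when the visual and auditory locations coincide")
    return (pointed_deg - sound_deg) / offset


def mean_bias(trials: list[tuple[float, float, float]]) -> float:
    """Average bias over (pointed, sound, visual) tuples, e.g. all trials in one condition."""
    values = [localization_bias(p, s, v) for p, s, v in trials]
    return sum(values) / len(values)
```

For example, `localization_bias(5.0, 0.0, 10.0)` returns `0.5`: the pointed location moved halfway from the sound toward the visual event. Normalizing by the offset makes trials with visual events on either side of the sound directly comparable, which is why a larger mean value in congruent conditions can be read as stronger integration.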
In two types of trials, participants either localized the sounds by pointing or judged whether the sounds and visual events were simultaneous. Action-object congruent audiovisual pairings produced greater localization bias and higher rates of perceived simultaneity, reflecting stronger integration of the stimuli. Both action congruency and object congruency affected integration, but action congruency had the larger effect. The effects of action and object congruency were additive, supporting the linear summation model of congruency type combination. These results suggest that action-object congruency can help explain how humans perform multisensory integration and could be used to strengthen multisensory integration in future XR environments.
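Additivity in a 2 × 2 action × object congruency design can be illustrated with an interaction contrast: under a linear summation model, the benefit of action congruency is the same whether or not the objects are congruent, so the contrast is zero. The Python sketch below is only an illustration under that assumption; it is not the statistical analysis reported in the study, and the class, field, and function names are hypothetical. The cell values could be any integration measure, such as mean localization bias or the proportion of "simultaneous" responses.

```python
from dataclasses import dataclass


@dataclass
class CellMeans:
    """Hypothetical condition means in a 2 x 2 action x object congruency design."""
    both_congruent: float  # action congruent, object congruent
    action_only: float     # action congruent, object incongruent
    object_only: float     # object congruent, action incongruent
    neither: float         # both incongruent


def main_effects(m: CellMeans) -> tuple[float, float]:
    """Average gain from action congruency and from object congruency."""
    action = ((m.both_congruent + m.action_only) - (m.object_only + m.neither)) / 2
    obj = ((m.both_congruent + m.object_only) - (m.action_only + m.neither)) / 2
    return action, obj


def interaction_contrast(m: CellMeans) -> float:
    """Action effect with congruent objects minus action effect with incongruent objects.

    A value of zero is what a purely additive (linear summation) model predicts.
    """
    return (m.both_congruent - m.object_only) - (m.action_only - m.neither)


def additive_prediction(m: CellMeans) -> float:
    """Both-congruent mean predicted by adding the two single-congruency gains."""
    return m.neither + (m.action_only - m.neither) + (m.object_only - m.neither)
```

By construction, `m.both_congruent - additive_prediction(m)` equals `interaction_contrast(m)`, so the two functions express the same additivity check from different angles. A larger first element than second element from `main_effects` would correspond to action congruency having the larger effect.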