Auditory Stream Disruption in Human Computer Interaction

This paper presents results of the first phase of a study evaluating disruptive auditory features during a visual memory task in an HCI context. The disruption of primary tasks by secondary information streams is an important consideration in the design of multimodal interfaces. Although no statistically significant difference in task performance was found between the various auditory streams, a TLX questionnaire suggests that auditory streams comprising complex rhythmic features are perceived by users as being notably disruptive.


INTRODUCTION
Mobile, wearable and IoT devices now dominate the consumer computer market. This has resulted in interface designers contending with significantly reduced screen real-estate and needing to utilise other sensory modalities to present information to the user, such as auditory cues and haptic feedback. Such interfaces call for careful consideration of the idiosyncrasies inherent in the human perceptual system. Of particular importance is the possibility of introducing crossmodal interference in interfaces that concurrently present information both visually and aurally. Appropriate auditory cue design can avoid excessive disruption during a primary visual task while allowing for peripheral awareness of secondary information. Furthermore, user engagement and interpretation of data streams can be facilitated by sonification. Sonification is the use of non-speech audio to convey data.
As the capabilities of small-screen devices increase with technological advances, sonification may be utilised to relay complex system states and multivariate data (Hussain, Kang, & Lee, 2014; Lemmelä et al., 2008). However, if sonified streams are to effectively relay information to the user without negatively impacting on visual information streams, it is necessary to understand the complex interplay between sensory streams in human attention mechanisms and working memory.
To this end, the authors present findings from a study investigating the potential interference elicited by two features of a secondary auditory stream on a concurrent visual working-memory task. The features investigated are rhythmic variations in the presentation of sine tones and the addition of reverberant spatial cues.

Design Considerations for the Presentation of Sonified Data
Sonification is a prospective method for relaying data when using visually restrictive devices such as wearable, mobile and IoT devices (Lemmelä et al., 2008). The integration of sonification in multimodal interface design requires consideration beyond simply choosing which sonification technique to use. Various techniques have been developed and reviewed in the literature (Gaver, 1986; Blattner, Sumikawa, & Greenberg, 1989; Gavin, Jedir, & Neff, 2016). These techniques are generally categorised as auditory icons, earcons, parameter mapping, spearcons and hybrids (Neuhoff, 2011).
Depending on the task involved, consideration needs to be given to complex constituents intrinsic to how auditory information is processed in the human auditory system (Hussein et al., 2009). These include attention mechanisms; how sonic dimensions such as timbre are processed; the constraints associated with cognitive load; and how visual and auditory modalities interact on a perceptual level. These factors are important in scenarios requiring the user to maintain a certain level of task performance. Therefore, the inclusion of sonified data that functions to augment visualised information needs to be examined from these perspectives.

Working Memory Constraints and Attention Mechanisms
Working Memory (WM) is an established multicomponent model of human memory, proposed and refined through continuous research by Alan D. Baddeley and colleagues (Baddeley & Hitch, 1974). The model itself consists of three subsystems (Salamé & Baddeley, 1982; 1986; 1987; Baddeley, 2015). These are termed the phonological loop; the visuospatial sketchpad; and the episodic buffer. The three subsystems engage congruently to serve WM's 'central executive', which controls all subsystems and defines links to long-term memory. The study described by the authors relates directly to the function of the phonological loop, which determines a user's capacity to encode to-be-remembered (TBR) items in short-term memory (STM). The phonological loop provides a short-term store which is reinforced by a deliberate articulatory rehearsal process to account for a capacity limit. When an individual is presented with a visual-recall task, two processes are immediately engaged, allowing the order of presentation of visual items to be reinforced as they deliberately attempt to memorise them.
Auditory attention can be involuntarily elicited using intelligent auditory stimulus design (Jones, Miles, & Page, 1990; Jones & Macken, 1993). This shift in attention can disrupt the performance of visually demanding cognitive tasks (Alho et al., 1997; Escera et al., 1998; McCloy et al., 2017). The Irrelevant Sound Effect (ISE) is a theory associated with eliciting auditory attention, which warrants consideration in sonification design. The ISE is associated with a user's retention of visually presented information being disrupted by inconsequential background sound present in the environment (Jones & Macken, 1993). Researchers have consistently demonstrated that WM processes are reliably disrupted by irrelevant sound (Jones, Miles, & Page, 1990; Morris & Jones, 1990a; Morris & Jones, 1990b; Jones, Madden, & Miles, 1992; Jones & Macken, 1993; Parmentier & Beaman, 2015; Macken, Mosdell, & Jones, 1999). Furthermore, research by Salamé and Baddeley spawned interest in links to verbal STM mechanisms with the assumption that confusion occurs between phonologically similar material in both the auditory and visual domains (Salamé & Baddeley, 1982; Baddeley, 2015).
The authors assert that the identification of specific sonic properties that may be responsible for WM disruption is paramount in sonification design for multimodal interfaces, as it allows for the elimination of stimuli that draw on primary attention mechanisms during data presentation. The principal theory expressing this process is the Changing-State Hypothesis (Jones, Madden, & Miles, 1992).

The Changing-State Hypothesis
The Changing-State Hypothesis (CSH) examines the disruption of WM tasks by auditory stimuli. The CSH is based on the theory that dynamic elements within auditory stimuli are the main cause of WM disruption (Jones, Miles, & Page, 1990; Morris & Jones, 1990a; Morris & Jones, 1990b; Jones, Madden, & Miles, 1992; Jones & Macken, 1993). The theory suggests that this disruption is due to a conflict in memory whereby two similar processes occur, each involving seriation and each sharing a representational space (Jones & Macken, 1993). Parmentier and Beaman (2015) were the first to test the CSH as it relates to rhythmic change. The disruption of a serial recall task was examined using speech content with two types of variance: changes in temporal regularity; and changes in speech content. Findings consistently demonstrated that variations of the content, but not the rhythm of the irrelevant speech, reliably disrupted performance (Parmentier & Beaman, 2015).
In 1992, Jones suggested a new theory, 'interference at the attention level', which implied that the passing of sonic information to STM is reliant on auditory characteristics. Jones' model implies that a perceptual filter operates in response to the simultaneous presentation of visual and auditory stimuli, and that it fails to portray exact processes occurring in memory. Jones suggests in this model that speech content has privileged access to STM, based on the differences in disruption caused by non-speech signals in comparison to speech (Jones, Madden, & Miles, 1992). The effect is observable in a number of studies where disruption is compared between stream formats, with the effect being most prominently exhibited when contrasting speech content to broadband noise (Tremblay, Macken, & Jones, 2001).
This study examines the CSH as it relates to rhythmic and reverberant characteristics of non-speech auditory streams, as well as fixed versus dynamic reverberation in sine-tone auditory streams. Reverberation can be described as the time-domain response of a space to audio-spectral excitation, and imposes influence on other sonic dimensions, such as rhythm. A potential benefit of using reverberation in sonification applications is its capacity to lessen the disruptive effects of seriation between sequential auditory objects while adding a sense of depth or source distance. However, excessive use of reverberation can obstruct the delivery of information (Lee & Shinn-Cunningham, 2008; Shinn-Cunningham, 2000a; 2000b; 2004). Perham et al. (2007) examined the influence of reverberation levels applied to recordings of office background noise, which were presented as a distractive auditory stream during a serial recall task. Their findings indicated that increasing levels of reverberation lessened the disruptive effect of a changing-state auditory stream but did not differ significantly from non-reverberant office noise in terms of recall task performance. In a similar approach, the study presented in this paper applies reverberation to a sine-tone auditory stream to eliminate potential confounding influences found in complex sound sources.

USER STUDY DESIGN
The study was designed to allow participants to navigate fully through the presented conditions with no time limits applied. Participants were required to recall a sequence of seven visually-displayed digits (between 1 and 9), presented in random order. With the exception of the first condition, once the serial display was complete, auditory stimuli were presented via headphones during the serial rehearsal phase. No training or trial phase was implemented in order to eliminate potential conditioning through exposure or practice.
Each participant was given three attempts per condition. A list of instructions was also presented to the participants in order to reinforce the process. The participants were first presented with a quiet condition (no auditory stimulus during the serial rehearsal phase), followed by four irrelevant auditory streams presented in random order for the remaining conditions (see table 1).
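The session structure described above can be sketched as follows. This is an illustrative Python outline, not the authors' Max 7 implementation. The condition labels are descriptive placeholders rather than the wording of table 1, and the sketch assumes that digits may repeat within a sequence, which the text does not specify.

```python
import random

QUIET = "quiet"
# Placeholder labels for the four auditory streams (hypothetical names).
AUDITORY = [
    "regular rhythm",
    "irregular rhythm",
    "regular rhythm, fixed reverb",
    "regular rhythm, changing reverb",
]
ATTEMPTS_PER_CONDITION = 3
SEQUENCE_LENGTH = 7


def condition_order(rng):
    """Quiet condition first, then the four auditory streams in random order."""
    shuffled = AUDITORY[:]
    rng.shuffle(shuffled)
    return [QUIET] + shuffled


def digit_sequence(rng):
    """Seven digits between 1 and 9 (assumption: repeats are allowed)."""
    return [rng.randint(1, 9) for _ in range(SEQUENCE_LENGTH)]


def session_plan(seed=None):
    """List of (condition, attempt number, digit sequence) for one participant."""
    rng = random.Random(seed)
    return [
        (condition, attempt, digit_sequence(rng))
        for condition in condition_order(rng)
        for attempt in range(1, ATTEMPTS_PER_CONDITION + 1)
    ]
```

Calling `session_plan(seed=1)` yields 15 entries, three per condition, with the quiet condition always first.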

Methodology
Two potential distractors were examined: rhythmic change in the auditory content, and reverberant timbral change. The effects of rhythmic change were examined using a stream of sine-tone pulses, presented in a sequence with random temporal intervals, creating a random and irregular rhythmic stream. To examine the effects of this irregular pattern, a simple stream of repeated sine-tone pulses in precisely equal temporal intervals was also presented to allow for a direct comparison. All sine-tone pulses were presented at a frequency of 391 Hz, with an identical amplitude envelope applied to all items. To examine the potential effects of reverberant timbral change, a fixed, digitally synthesised impulse-response reverb with a 750 ms reverberation time was applied to the simple, repetitive rhythmic sine-tone stream. For the final condition, a randomised reverberation time between 150 ms and 2000 ms was dynamically applied to the aforementioned stream in real-time to investigate the effects of changing-state reverb.
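To make the distinction between the regular and irregular streams concrete, the following Python sketch generates pulse onset times for both and synthesises a single 391 Hz pulse. It is a minimal illustration only: the sample rate, pulse duration, interval range and Hann-shaped envelope are assumptions, as the paper specifies only the tone frequency and that the envelope was identical across items.

```python
import math
import random

SAMPLE_RATE = 44100  # assumed; not stated in the paper
TONE_HZ = 391.0      # from the paper
PULSE_S = 0.2        # assumed pulse duration


def regular_onsets(n_pulses, interval_s):
    """Onset times (seconds) for pulses at precisely equal intervals."""
    return [i * interval_s for i in range(n_pulses)]


def irregular_onsets(n_pulses, min_gap_s, max_gap_s, rng):
    """Onset times (seconds) for pulses separated by random intervals."""
    onsets, t = [], 0.0
    for _ in range(n_pulses):
        onsets.append(t)
        t += rng.uniform(min_gap_s, max_gap_s)
    return onsets


def sine_pulse(duration_s=PULSE_S, freq_hz=TONE_HZ, sample_rate=SAMPLE_RATE):
    """One sine pulse shaped by the same Hann envelope on every call."""
    n = round(duration_s * sample_rate)
    return [
        0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
        * math.sin(2 * math.pi * freq_hz * i / sample_rate)
        for i in range(n)
    ]
```

For example, `regular_onsets(4, 0.5)` gives onsets at 0.0, 0.5, 1.0 and 1.5 s, whereas `irregular_onsets(4, 0.2, 1.0, random.Random(0))` gives gaps drawn uniformly from the assumed 0.2 s to 1.0 s range.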
Participants were specifically instructed to ignore the auditory stimuli to the best of their abilities while completing the recall tasks. A typical trial began with participants clicking on a 'start' toggle, with 7 digits visually presented 2 seconds (s) later. Each digit was presented for 1 s. An additional 1 s gap followed the last digit, after which a submission box appeared prompting the user to recall the sequence. Following 3 trial iterations, the participants were presented with a software replication of a Task Load Index (TLX) Questionnaire designed by NASA (Hart, 1986). The TLX is used to examine various categories of cognitive load and to gather perceptual feedback from participants. The categories examined were mental demand; physical demand; temporal demand; performance; effort; and frustration. These categories were summed to provide a total rating. In total, 15 trial iterations (3 per auditory condition) and 5 TLX questionnaires were completed by each participant.
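The timing of a single trial can be worked through with a short Python sketch. It assumes the digits are shown back to back with no gap between them, which the description does not state explicitly; the event labels are illustrative.

```python
START_DELAY_S = 2.0     # delay between clicking 'start' and the first digit
DIGIT_DURATION_S = 1.0  # display time per digit
FINAL_GAP_S = 1.0       # gap after the last digit


def trial_timeline(n_digits=7):
    """Return (time in seconds, event) pairs for one trial.

    Assumption: digits follow one another with no inter-digit gap.
    """
    events = [(0.0, "start clicked")]
    t = START_DELAY_S
    for i in range(n_digits):
        events.append((t, f"digit {i + 1} shown"))
        t += DIGIT_DURATION_S
    t += FINAL_GAP_S
    events.append((t, "submission box shown"))
    return events
```

Under these assumptions the submission box appears 2 + 7 × 1 + 1 = 10 s after the start toggle is clicked.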
Fifty-two undergraduate students voluntarily participated in the user study. The age of participants ranged from 18 to 49 years old. 82.35% of participants were male and 17.65% were female. The application used to present the stimuli, as well as the follow-on questionnaire, was developed using the Max 7™ visual programming language on a 2012 Apple® MacBook Pro® running macOS® Sierra. The study took place in a controlled lab environment.
Participants were seated at individual workstations in the lab. The auditory stimuli were rendered using Csound, an audio DSL (domain-specific language). Reverb was applied to auditory stimuli using Space Designer® in Apple Logic Pro X®.

RESULTS
The disruptive effects on serial recall of the auditory conditions outlined in table 1 were compared. The conditions were: silence; sine-tone pulses presented in a repetitive rhythmic pattern; sine-tone pulses presented arrhythmically; sine-tone pulses (repetitive rhythmic pattern) with a fixed reverb setting applied; and finally, sine-tone pulses (repetitive rhythmic pattern) with randomised changes in the reverb time across items. Serial-recall data were scored according to the serial order entered by the participant. The user-study application was programmed to output a percentage error calculation, which compared each display list of TBR items (randomly generated) to the user's response that followed.
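One plausible form of this percentage error calculation is sketched below in Python. It assumes strict serial-position scoring, in which a digit counts as correct only if it appears in the same position as in the presented list; the paper does not give the exact scoring rule, so this is an assumption. The function names are illustrative.

```python
def percent_error(presented, response):
    """Percentage of serial positions recalled incorrectly.

    Assumption: strict position-by-position scoring. Missing responses
    count as errors; responses beyond the list length are ignored.
    """
    if not presented:
        raise ValueError("presented list must not be empty")
    correct = sum(1 for p, r in zip(presented, response) if p == r)
    return 100.0 * (len(presented) - correct) / len(presented)


def mean_percent_error(trials):
    """Mean percentage error over (presented, response) pairs."""
    errors = [percent_error(p, r) for p, r in trials]
    return sum(errors) / len(errors)
```

For instance, recalling `[3, 1, 4, 5, 1, 9, 2]` when `[3, 1, 4, 1, 5, 9, 2]` was presented swaps two positions, so two of the seven items are scored as errors.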
A mean error rate was derived from the percentage error rate for each condition, incorporating the percentage error from three iterations. Inter-quartile functions were applied to eliminate outliers in terms of serial recall performance. No outliers were identified. Further analysis and calculation of the confidence interval (CI = 95%) demonstrated no statistically significant difference in mean (M) error rates across presented conditions (see figure 1).
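The outlier screening and confidence interval steps can be illustrated with the Python sketch below. It assumes the conventional 1.5 × IQR fences for outlier removal and a normal-approximation interval around the mean; the paper does not state which inter-quartile rule or interval method was used, so both are assumptions.

```python
import math
import statistics


def iqr_filter(values, k=1.5):
    """Keep values inside [Q1 - k*IQR, Q3 + k*IQR] (assumed fence rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


def mean_ci(values, confidence=0.95):
    """Mean with a normal-approximation confidence interval.

    Returns (mean, lower bound, upper bound).
    """
    m = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    return m, m - z * sem, m + z * sem
```

With small samples a t-based interval would be wider than this normal approximation; the sketch uses the simpler form to stay within the standard library.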
TLX rating data were then analysed to determine perceived cognitive load demand. Inter-quartile functions were calculated to eliminate outliers (see table 3). The physical demand category was the prominent source of outliers. Further analysis of individual conditions across all TLX categories demonstrated statistical differences between conditions 1 and 3. This indicates increased perceived cognitive load demand amongst users when comparing silence to sine tones presented in a rhythmically complex manner. The categories in which statistical differences occurred were: mental demand, effort, frustration and total rating (see table 4). Additionally, a statistical difference in frustration ratings was evident between condition 1 (silence) and condition 4 (simple rhythm, fixed reverb). Frustration ratings for condition 1 were M = 45.30, 95% CI [38.55, 52.05], compared with M = 59.9, 95% CI [52.77, 67.03] for condition 4.
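The reported frustration intervals can be checked for overlap with a few lines of Python. This assumes that the comparison rests on whether the two 95% confidence intervals overlap, which the text implies but does not state explicitly; the variable names are illustrative.

```python
def intervals_overlap(a, b):
    """True if closed intervals a = (low, high) and b = (low, high) overlap."""
    return a[0] <= b[1] and b[0] <= a[1]


# Frustration ratings as reported in the text (95% CI bounds).
frustration_silence = (38.55, 52.05)       # condition 1
frustration_fixed_reverb = (52.77, 67.03)  # condition 4

# Gap between the upper bound for silence and the lower bound for fixed reverb.
gap = frustration_fixed_reverb[0] - frustration_silence[1]
```

The two intervals do not overlap, and the midpoint of each interval matches the reported mean. Non-overlap of 95% intervals is a conservative criterion, since two means can differ significantly even when their intervals overlap slightly.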

DISCUSSION
The user study presented examines the potential disruptive effects of constant variation in rhythmic and reverberant characteristics in auditory streams on the performance of a cognitive task involving both WM and auditory attention mechanisms. Analyses support a null hypothesis over the hypothesis that irregular rhythms would disrupt serial recall to a greater extent than regular rhythms, and that dynamic reverb would disrupt performance to a greater extent than fixed reverb. Combined analysis from all conditions revealed that none of the auditory streams presented were statistically more disruptive to task performance compared to the silent condition. However, TLX data exhibit interesting results in relation to the perceived difficulty of the task across each auditory condition. In particular, differences were observed when comparing condition 1 (silence) with auditory condition 3 (arrhythmic sine tone pulse pattern). This indicates a difference in the perceived disruption caused by these two conditions, with an increased cognitive load under arrhythmic conditions (see table 4).
Additionally, in the frustration category there was a statistical difference between conditions 1 and 4 (silence vs. simple rhythm with fixed reverb). No differences were found in task performance for either the fixed or the changing-state reverberant condition. Interestingly, the fixed reverb stream was perceived to add to the frustration in completing the task. This opposes the view that reverberant streams reduce perceived distraction.
Based on the results of the user study, the use of both rhythmic and reverberant dimensions in multimodal interfaces does not negatively impact on actual task performance when using simple auditory streams. However, TLX results indicate that users perceive a higher cognitive load, especially in relation to arrhythmic sonification, which has the potential to impact on the user experience.

CONCLUSION
No significant difference in participant performance of the serial recall task was found. TLX results indicate that participant cognitive load demand was influenced by the irregular rhythmic presentation of sine-tones in the following categories: mental demand, effort, frustration and total rating. Furthermore, the addition of reverb was perceived to add to frustration in comparison to silence. A future study will incorporate harmonically complex tones to evaluate potential changes to error rate results and perceived cognitive load.

Fig 1: Mean % Error Rates for conditions presented.

Table 3: Number of TLX Result Outliers Removed