Sound as Computer Feedback
A Sound Feedback Style Scenario
Interactivity and System Responsiveness
HUMAN INTERFACE DESIGN
A Transparent Interface
Communicating through Metaphor
Intrinsically Motivating Instruction
This column describes our experiences in the Michigan State University Comm Tech Lab with the use of sound in hypermedia software. Theoretical and artistic reasoning and empirical and anecdotal evidence will be offered in support of the premise that it is time to fully integrate sound into computing environments; that someday we will look back on the anachronistically silent word processors and spreadsheets of today in much the same way that we appreciate a silent movie for its historical value. Commercial software is already beginning to incorporate this style of feedback. Kidpix, a paint program for children by Broderbund, is an excellent example. Each tool makes a different sound when used. But kids shouldn't have all the fun. Virtus Walkthrough uses soft clicks when tools are selected from the toolbox. This is just the beginning.
One of many interface decisions in hypermedia design is how the application program responds and reacts to user input. User inputs today most commonly take the form of keyboard keystrokes, mouse or trackball clicks and drags, or touches on a touch screen. Responses to those inputs by a hypermedia application may take several forms:
With touch screens where a certain firmness of touch is necessary to send a command, it is important for the user to have feedback about when they have pressed hard enough. Mainframes and videotex systems frequently use feedback to let the user know that a command -- which may take some time to execute -- has been received.
Applications also use auditory cues (usually beeps) to indicate that something wrong or not executable has been requested. Initially, "beeps" were the primary form of audio feedback (both positive and negative), mainly because computers had limited sound capabilities. But recently the sound capabilities of microcomputers have expanded dramatically, to the point where digitized sound of all forms can be stored on a computer and played back at will. Speech synthesis is also possible. The new capabilities far exceed our experience and understanding of what are desirable and effective uses of sound in hypermedia.
The human environment is full of meaningful auditory information: doors closing, keys turning, pen caps opening, feet hitting the ground, cars accelerating, typewriters clacking, phones dialing and ringing, water running, etc. These sounds provide feedback and information about human actions and the external world. Turning off all sound creates a world that feels more distant and requires greater visual processing to confirm, interpret and notice events which normally wouldn't require much thought or attention.
Common computer applications have focussed almost exclusively on visual channels. In designing prototypical hypermedia applications to explore the potential of the medium, the Comm Tech Lab has developed and studied a style of incorporating sound into computer environments based on five general guidelines:
Sounds used may be short clips of music, recognizable real-world sound effects, interesting noises, or brief speech. In general they should be short and unobtrusive, but distinctive.
Theories from the fields of communication, psychology, education and human interface design can all be applied to justify and explain expected benefits of the proposed sound feedback style of hypermedia design.
Daft and Lengel (1984) introduced the concept of "information richness," which locates communication media on a continuum. The least rich include numeric data communication and ASCII text files, while the richest mode is interpersonal, face-to-face communication. Applying Daft and Lengel's construct, adding sound to a previously visual-only medium increases its information richness, bringing it closer to their "ideal" mode of face-to-face interaction.
Channel redundancy, or offering the same information over more than one communication channel (sight, sound, touch, smell, taste), increases the likelihood that the information will be received and understood. Thus, feedback offered both visually and in audio strengthens the message. Education researchers have found that individuals have different orientations toward text, sound and images (Kampwirth and Bates, 1980; Dunn, 1988). Students with extreme learning style characteristics (strong preferences for visual versus tactile versus auditory modes of learning) do achieve higher test scores when instructional resources complement how they like to learn. From this standpoint, the use of multiple channels reaches a wider audience more effectively.
Rafaeli (1985) defines interactivity as a continuous variable measuring how "actively responsive" a medium is to users. Both he and Rice (1984) consider responsiveness from a user's perspective. Rafaeli considers human-like communication the ultimate form of interactivity, when the roles of human and computer are interchangeable and "any third exchange in communication transmissions is predicated on the bearings of the first two exchanges." Although sound feedback does not approach this standard, we presume that it affects the perceived responsiveness of the system for the user. A hypermedia application with sound feedback will probably be rated more responsive than a silent hypermedia application.
Direct manipulation is a compound construct which describes a particular style of interface design conceived by theoreticians and designers as the type of interface that is easiest to use. Three components of direct manipulation are most relevant to sound: visibility, transparency and metaphor.
Shneiderman (1987) identifies visibility of objects and actions of interest as a key element for direct manipulation. When objects and actions are visible, they need not be asked about or remembered. Applying the most straightforward interpretation, one applies this criterion to an interface by asking: what are the objects; what are the actions; and can the user see them? The organization and presentation of visual information affect visibility. The concept of visibility can be extended appropriately to include audibility as a substitute for visibility.
Rafaeli (1990) asserts that "DM is system transparency, a reduced imposition of system-centered structure." He describes DM as "allowing the semantics to upstage the syntax" of the task. Norman (1988) concurs. "The point cannot be overstressed: make the computer system invisible." An invisible interface is one in which the computer does not hinder the process being undertaken. This is an interesting directive for designers. It describes the absence of an attribute rather than the presence of one. Basically, in DM, the computer is supposed to stop getting in the way. Auditory cues free the user's eyes to concentrate on process or content, making use of the system easier and more transparent.
Metaphor is a technique for capitalizing on familiarity. "Metaphors, if they are appropriate, are very quickly understood" (Rosendahl-Kreitman, 1990). The design of a screen can suggest "this is like a newspaper" or "this is like flying a space ship." Metaphors in hypermedia are approximations. In each case, the question is whether they serve to enhance or impair the users' affective and cognitive appreciation of the interface. Metaphor may be accomplished through graphics, text or sound, in combination or alone. For example, a "start over" button yields the sound of water washing away previous choices. A "forward" button which looks like a lever on a space ship pulls forward and makes a lever-like sound when pressed.
"Educational activities can evoke sensory curiosity by including audio and visual effects, such as music and graphics. They can evoke cognitive curiosity by leading learners into situations in which they are surprised" (Malone and Levin, 1984).
Thomas Malone's (1981) theory of intrinsically motivating instruction distinguishes between two kinds of curiosity -- sensory and cognitive -- depending on the level of mental processing involved. It is sensory curiosity that will be studied here. Sensory curiosity involves the attention-attracting value of changes in the light, sound or other sensory stimuli of the environment. There is no reason why educational environments have to be impoverished sensory environments. Colorfully illustrated textbooks, television programs like Sesame Street, and tactile teaching devices like those used in Montessori schools demonstrate this point.
The presence and the type of immediate audio feedback in hypermedia may have a major impact on the amount of time spent and information retained by users. Some aspects of this proposition have been studied. According to Berlyne (1968), motivational attributes include notions of curiosity or exploration of novelty. However, these factors are derived primarily from print learning and do not include aspects of the microcomputer, such as animation, visual effects, and sound. Malone (1984) formulated broader explanations of motivational qualities based on research with arcade games. Malone incorporates these game aspects into a framework for a theory of motivation.
Maddi (1968), Pearson (1970), Zuckerman (1979) and Kozielecki (1981) have each studied "novelty-seeking" as a personality trait, positing and documenting that individuals have characteristic optimal levels of sensory stimulation. When actual stimulation falls below that point, the person starts searching for additional stimuli. When actual stimulation exceeds optimal levels, the person seeks to reduce it. Like Malone, these researchers found that different optimal levels can exist for cognitive versus sensory stimulation within an individual. Sound can be used to provide sensory stimulation without detracting from the cognitive visual or verbal content, keeping the user focussed on the hypermedia application and alleviating their need to turn away from the software to seek stimulation.
We designed a prototypical application which offered users 45 megabytes (1000+ screens) of sounds and images about Mars, U.S. and Soviet planetary exploration and past and future missions to Mars. About 5 "buttons" (Mission to Mars main menu, list of topics, forward, reverse, and backtrack) appeared regularly throughout the 1000 screens, and each had a unique but consistent sound when pressed. Hundreds of other buttons appeared only in one or a few places. The sounds that accompanied these buttons were unique to the button function and related in some way to other buttons in that sector (e.g., all were short clips of classical music, or sound effects, variations on a theme, simulated Martians talking, etc.). The intent was to use variable sound feedback as a reward and motivator to encourage exploration and exposure to as much of the available content as we could entice the user to explore. A repeated-measures experimental design was planned to compare the "click motivation" impacts of silent feedback, beeping feedback (as has been customary for many applications) and variable sound feedback.
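The button scheme described above can be sketched as a simple lookup: a handful of global buttons that always sound the same, plus sector-local buttons whose clips share a theme. This is only an illustration, not the Comm Tech Lab implementation. The file names, the `Sector` class, the `sound_for` function, and the example sector and button names are all hypothetical.

```python
from dataclasses import dataclass, field

# The five recurring buttons named in the text; the file names are hypothetical.
GLOBAL_SOUNDS = {
    "main menu": "global/main_menu.aiff",
    "list of topics": "global/topics.aiff",
    "forward": "global/forward.aiff",
    "reverse": "global/reverse.aiff",
    "backtrack": "global/backtrack.aiff",
}


@dataclass
class Sector:
    """A region of the application whose local buttons share a sound theme."""
    name: str
    theme: str  # e.g. "classical_music" or "sound_effects"
    local_sounds: dict = field(default_factory=dict)

    def add_button(self, button, clip):
        # Each local button gets its own clip, filed under the sector's theme,
        # so sounds stay unique per function but related within the sector.
        path = f"{self.theme}/{clip}"
        if path in self.local_sounds.values():
            raise ValueError(f"{path!r} is already used in sector {self.name!r}")
        self.local_sounds[button] = path


def sound_for(button, sector):
    """Global buttons sound the same everywhere; others use the sector's clip."""
    if button in GLOBAL_SOUNDS:
        return GLOBAL_SOUNDS[button]
    return sector.local_sounds[button]


# Hypothetical sector and button names, for illustration only.
missions = Sector("past missions", theme="sound_effects")
missions.add_button("Viking", "launch.aiff")
missions.add_button("Mariner", "radio_static.aiff")

print(sound_for("forward", missions))  # global/forward.aiff
print(sound_for("Viking", missions))   # sound_effects/launch.aiff
```

Keeping the global table separate from the sector tables is one way to guarantee that navigation sounds stay consistent across all screens while local sounds can vary by theme.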
The general hypothesis is that: Variable sound buttons enhance click motivation; Silent buttons have no effect on click motivation; Beeping buttons inhibit click motivation. The last assumption is based on the supposition that traditional beeping feedback is annoying, repetitive and monotone.
Three versions of six segments from Mission to Mars! were created: one which was silent when buttons were clicked on; one that beeped when buttons were clicked; and one that made all sorts of different sounds, both musical and sound effects, when buttons were clicked. Eighteen subjects used all 6 segments. Thus, there were 108 observations for analysis. For each subject, 2 segments were silent, 2 beeped and 2 contained all different sounds.
Conditions were balanced across segments so that each segment was used in each audio feedback condition by 6 subjects. The evidence of user preference for variable sound feedback was stronger than the evidence of increased motivation or learning, which did not reach significance.
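One way to produce a design with these properties (18 subjects, 6 segments, 3 conditions, 2 segments per condition for each subject, and 6 subjects per condition for each segment) is a cyclic rotation. The text does not say how the actual assignment was made, so the sketch below is an assumption, and the function name `assign_conditions` is hypothetical.

```python
from collections import Counter

CONDITIONS = ("SILENT", "BEEP", "ALL DIFFERENT")
N_SUBJECTS = 18
N_SEGMENTS = 6


def assign_conditions(n_subjects=N_SUBJECTS, n_segments=N_SEGMENTS):
    """Return {(subject, segment): condition} using a cyclic rotation."""
    plan = {}
    for subject in range(n_subjects):
        for segment in range(n_segments):
            # Segments are taken in pairs; the condition given to each pair
            # rotates with the subject index, so conditions cycle evenly.
            index = (segment // 2 + subject) % len(CONDITIONS)
            plan[(subject, segment)] = CONDITIONS[index]
    return plan


plan = assign_conditions()

# Tally how often each condition occurs per subject and per segment.
per_subject = Counter((subject, cond) for (subject, _), cond in plan.items())
per_segment = Counter((segment, cond) for (_, segment), cond in plan.items())

print(len(plan))                  # 108 observations
print(set(per_subject.values()))  # {2}: two segments per condition per subject
print(set(per_segment.values()))  # {6}: six subjects per condition per segment
```

This simple rotation always gives paired segments the same condition; a real study would more likely randomize which segments share a condition while keeping the same counts.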
Overall, the experimental style of ALL DIFFERENT sound feedback was strongly preferred by the experimental subjects. All 18 students chose the ALL DIFFERENT condition over the BEEP condition. Eighty-nine percent chose the ALL DIFFERENT condition over the SILENT condition. Fewer than half (44%) preferred beeping to silence, even though they felt varied sounds would have been better. A number of students, once they had tried an ALL DIFFERENT segment and then came upon a silent segment, turned to the study administrator to ask whether the computer was broken. Learning and motivation differences did not achieve statistical significance, although they were consistently favorable in the direction of all different sound conditions.
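The percentages above can be read back as head counts. The short check below assumes that all 18 subjects answered each comparison and that percentages were rounded to the nearest whole number; neither is stated in the text. The helper name `counts_matching` is hypothetical.

```python
def counts_matching(percent, n_subjects=18):
    """List the head counts out of n_subjects that round to the given percent."""
    return [
        k for k in range(n_subjects + 1)
        if round(100 * k / n_subjects) == percent
    ]


print(counts_matching(100))  # [18]
print(counts_matching(89))   # [16]
print(counts_matching(44))   # [8]
```

Under those assumptions, 89% corresponds to 16 of 18 subjects and 44% to 8 of 18.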
After completing this study, the Comm Tech Lab designed a hypermedia database application for a Fortune 500 corporation. We applied the same sound criteria identified in the introduction to this corporate setting. For the corporate database, the goal was not to motivate exploration and learning; it was to provide a rich computing environment which offered orienting sounds at each mouseClick to aid in information entry and search. Corporate officials were initially skeptical of the "cute" sounds, asking how to turn them off. Now that they are regularly using the database, we get nothing but glowing reports about how important the sounds are, how much visitors love the sounds and even a request for a "more interesting sound for QUIT." Once a user has used a piece of software with variable sound feedback, the application feels broken with the sound off. Mission to Mars experimental subjects, after using a variable sound condition segment and encountering a silent segment, not infrequently turned to the experimenter to ask "is this one broken, or what?"
Even IBM at one point introduced a silent typewriter. Users complained, and they added the sound back. Think about a keyboard that had slightly different sounds for each key pressed. A typist would intuitively learn the sounds, and detect misspellings and typos without needing to look.
Virtual reality is a new stage for human-computer interaction. In experimental laboratories, virtual reality typically involves human interfaces such as goggles ("eyephones"), datagloves, data suits, etc. with expensive computers generating a 3-D sound and image sensory stimuli environment which responds to head, hand or body movement. In heralding future computing environments, these scenarios help emphasize that even today hypermedia designers can create interactive sensory environments, as opposed to mere "applications."
A common objection to our proposed sound scenarios is that they would make classrooms and offices too noisy. Offices are not so much quiet as full of sounds we have come to accept as normal: ringing phones, people talking on the phone, typewriting, photocopying, pencil sharpening, printing, etc. The sounds can be soft. Perhaps the biggest problem will be privacy. Those within earshot would know what commands are being issued. In debugging the corporate database by telephone, we could tell where the users were within the program just by listening to the sound feedback over the phone. If letters on the keyboard had different sounds, some people would know what words were being typed.
From both an aesthetic and a linguistic perspective, our vocabulary of nonverbal sound is limited to musical expression (generally in long compositions) and short human-generated noises. In the lab, given our commitment to this sound style, we have numerous discussions of "what does X sound like?" (What should the College of Engineering button sound like? How about the Honors College? NASA? The library? Continue? Help?)
Should users be able to customize their sounds, or are there certain types of sounds that are most appropriate, efficient and meaningful for different functions? Nonverbal sound is one of many new languages it is time to invent.
Berlyne, D.E. (1968). "Curiosity and exploration," Science, volume 153, pp. 25-33.
Daft, Richard and Lengel, Robert (1984). "Information richness: A new approach to managerial behavior and organizational design," Research in Organizational Behavior, volume 6, pp. 191-233.
Kozielecki (1981). Psychological Decision Theory, Warsaw, Poland: PWN-- Polish Scientific Publishers.
Maddi, S. (1968). "The pursuit of consistency and variety," in R. Abelson (Ed.) Theories of Cognitive Consistency, Rand McNally.
Malone, Thomas (1980). "What makes things fun to learn? A study of intrinsically motivating computer games." Technical Report # CIS7, Palo Alto, Xerox Palo Alto Research Center.
Malone, Thomas (1984). "Toward a theory of intrinsically motivating instruction," in Instructional Software -- Principles and Perspective for Design and Use, Walker and Hess, Eds., Wadsworth.
Malone, Thomas and Levin, James (1984). "Microcomputers in education: Cognitive and social design principles," in Instructional Software -- Principles and Perspective for Design and Use, Walker and Hess, Eds., Wadsworth.
Norman, Donald (1988). The Psychology of Everyday Things. Basic Books, Inc., Publishers: New York.
Pearson, P. (1970). Relationships between global and specified measures of novelty-seeking, in Journal of Consulting and Clinical Psychology, 34, 199-204.
Rafaeli, Sheizaf (1985). "If the computer is the medium, what is the message: Explicating interactivity" presented at the International Communication Association annual convention, Honolulu, May.
Rafaeli, Sheizaf (1990). "Semantics over syntax: Construct-validating direct manipulation in human-computer interaction," presented at the International Communication Association annual convention, Dublin, Ireland.
Rice, Ronald and Williams, Fred (1984). Theories old and new: The study of new media. In The New Media, (pp. 55-80), Sage: Beverly Hills.
Rosendahl-Kreitman, Kristee (1990). User Interface Design, Multimedia Computing Corp.: Santa Clara, CA.
Shneiderman, Ben (1987). Designing the User Interface: Strategies for Effective Human-Computer Interaction. Addison-Wesley Publishing Company: Reading, MA.
Zuckerman, M. (1979). Sensation-seeking: Beyond the optimal level of arousal, Erlbaum: Hillsdale, NJ.