Sound Research
Background
In developing the Screen-to-Soundscape project, our aim was to create an auditory experience that was not only functional but also engaging and intuitive for blind and visually impaired users. Central to this endeavor was the design of sound cues that would guide users through the digital landscape, enhancing their understanding and interaction with information presented on a webpage. These sounds were crafted to coexist harmoniously within the soundscape, each with a distinct personality and role in conveying the introduction, transition, or passing of information. Through this report, we outline the process, theory, and insights gained from our sound research and how it informed the overall design and functionality of the Screen-to-Soundscape prototype.
Developing Sound Cues for Information Flow
The primary goal of our sound design was to create cues that complemented one another when played simultaneously yet retained distinct identities to convey various types of information effectively. These sounds needed to function in both the foreground and background, providing informative cues about different groupings of content while supporting the broader auditory experience. We recognized that the relationship between sounds was crucial; they had to be both distinguishable and cohesive to maintain a fluid and intuitive soundscape.
To achieve this, the length and timing of the sounds were carefully considered. Certain sounds were designed to bleed into vocalized text, while others were introduced just before text was vocalized, providing a preparatory signal that new information was forthcoming. This layering approach allowed us to create a dynamic and responsive auditory environment where sounds provided context, transitions, and emphasis on critical information.
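As a minimal illustration of this timing logic, the start time of a preparatory cue can be computed so the cue ends just as, or bleeds slightly into, the vocalized text. The function name and default overlap value below are hypothetical, not taken from the project's implementation:

```python
def cue_start_time(speech_start_s: float, cue_duration_s: float,
                   overlap_s: float = 0.1) -> float:
    """Schedule a preparatory cue so it finishes right as speech begins.

    overlap_s > 0 lets the cue's tail bleed into the vocalized text;
    overlap_s = 0 makes the cue end exactly at speech onset.
    """
    return speech_start_s - cue_duration_s + overlap_s

# A 0.5 s cue with a 0.1 s bleed into speech starting at t = 5.0 s
# would begin at t = 4.6 s.
```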
Integrating Theories of Space and Scale
Image description: A brainstorming board filled with various sticky notes and hand-drawn sketches is displayed. The board features a mix of pink, yellow, and green sticky notes, each containing handwritten words and diagrams. On the left side, pink sticky notes include words like "Vibe," "Interconnectedness," "Material," and "Interaction." In the center, yellow sticky notes are arranged in rows with terms such as "Distance," "Width/Breadth," "Rolloff," "Orientation," "Envelope," "Pitch," "Tempo," "Harmonics," "Voice Character," "Volume," and "Looping." Hand-drawn lines and diagrams connect some of the notes, indicating relationships between concepts. At the top, there are sketches of nodes and connections, labeled with terms like "Immediate," "Vista," "Environmental," and "Geographical," which appear to align with concepts related to spatial sound design. The overall board captures an active brainstorming session with ideas about sound parameters, user interaction, and project goals.
While developing these sound cues, our team drew inspiration from the text "Scale and Multiple Psychologies of Space" by Daniel R. Montello, which Alyssa had shared with the group. Montello's work provided valuable insights into how humans experience scale and space, offering a theoretical framework that helped guide the design of our soundscapes. The text breaks down space into four distinct categories: immediate (figural) space, vista space, environmental space, and geographical space, each of which is nested within the other.
We used these spatial concepts to inform how we translated two-dimensional websites into four-dimensional soundscapes. The definitions of these spatial categories helped us understand how to create sound cues that mirrored the ways individuals navigate physical spaces:
Immediate (Figural) Space: Represents small spaces perceived without movement, like objects or pictures. In our sound design, this translated into quick, concise sound cues that highlighted immediate, static information on a webpage.
Vista Space: Represents larger spaces seen from a single vantage point, such as rooms or valleys. For this, we used sound cues that were slightly more prolonged and had more depth, allowing users to grasp a sense of layout and the arrangement of information from one 'position.'
Environmental Space: Involves spaces that require movement to be fully perceived, such as buildings or cities. To convey this, we created more expansive soundscapes that evolved as users navigated through different sections of a webpage, providing auditory feedback that indicated movement and changes in context.
Geographical Space: Refers to vast spaces understood through maps, like countries or planets. These sounds were the most complex and layered, guiding users through extensive sections of information and allowing them to grasp the overarching structure of the webpage.
By implementing these spatial concepts into our sound design, we ensured that the auditory experience was intuitive and reflective of how users naturally understand and explore physical spaces.
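One way to make this mapping concrete is a small lookup from Montello's four scales to cue parameters. The class, field names, and numeric values below are illustrative assumptions, not the project's actual settings; they only encode the relative ordering described above (quicker, drier cues for immediate space; longer, more layered cues for geographical space):

```python
from dataclasses import dataclass

@dataclass
class CueProfile:
    duration_s: float   # how long the cue plays
    layers: int         # number of simultaneous sound layers
    reverb_mix: float   # 0.0 (dry) to 1.0 (wet), suggesting depth

# Hypothetical values chosen only to preserve the ordering of scales.
SPATIAL_CUES = {
    "immediate":     CueProfile(duration_s=0.2, layers=1, reverb_mix=0.1),
    "vista":         CueProfile(duration_s=0.8, layers=2, reverb_mix=0.3),
    "environmental": CueProfile(duration_s=2.5, layers=3, reverb_mix=0.5),
    "geographical":  CueProfile(duration_s=5.0, layers=4, reverb_mix=0.7),
}

def cue_for(scale: str) -> CueProfile:
    """Look up the cue profile for one of Montello's spatial scales."""
    return SPATIAL_CUES[scale]
```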
Creating a Harmonious and Informative Soundscape
One of the most significant challenges was ensuring that the sounds were both informative and aesthetically pleasing when played together. We approached this by carefully considering the tone, pitch, and rhythm of each sound cue, ensuring that they did not clash or compete for attention but instead worked together to create a cohesive auditory environment.
The sounds were designed to serve as navigational aids, helping users identify the introduction of new information, changes in content, or transitions between different sections of a webpage. For example, a short, high-pitched sound might indicate the appearance of a new paragraph, while a longer, more mellow tone could signal the end of a section. This layered approach allowed the sounds to provide users with both micro and macro-level information about the digital content, enhancing their overall understanding and navigation experience.
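The paragraph/section example above can be sketched as a tiny cue table plus a sine-tone renderer. The specific pitches and durations are illustrative assumptions; only the contrast (short and high versus longer and mellow) comes from the design described:

```python
import math

# Illustrative cue vocabulary: a short, high-pitched tone for a new
# paragraph; a longer, lower, mellower tone for the end of a section.
EVENT_CUES = {
    "new_paragraph": {"pitch_hz": 880.0, "duration_s": 0.15},
    "section_end":   {"pitch_hz": 220.0, "duration_s": 0.8},
}

def render_tone(pitch_hz: float, duration_s: float,
                sample_rate: int = 44100, fade_s: float = 0.02) -> list[float]:
    """Render a sine tone with short linear fades to avoid clicks."""
    n = int(duration_s * sample_rate)
    out = []
    for i in range(n):
        t = i / sample_rate
        # Linear fade in at the start and fade out at the end.
        env = min(1.0, t / fade_s, (duration_s - t) / fade_s)
        out.append(env * math.sin(2 * math.pi * pitch_hz * t))
    return out
```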
Adapting to User Feedback and Real-World Scenarios
Throughout the development process, we engaged in co-creation sessions with blind and visually impaired users to gain valuable feedback on how these sound cues were experienced in practice. Their insights guided adjustments to the soundscape, such as refining the timing and layering of sounds to prevent auditory overload and ensuring that essential cues were clear and easily distinguishable.
For instance, users noted the importance of having a clear sense of boundaries within the soundscape, especially when navigating complex websites. In response, we introduced boundary sounds that indicated the edges of the webpage or a specific section, helping users maintain their orientation within the digital space. This iterative process ensured that our sound design remained user-centered and responsive to real-world needs.
Conclusion
By integrating carefully designed sound cues and drawing inspiration from Daniel R. Montello’s concept of psychological spaces, we were able to transform two-dimensional web content into a rich, four-dimensional auditory experience. This approach allowed us to map digital information into layers of sound that correspond to different spatial scales—immediate, vista, environmental, and geographical. By doing so, we have laid the groundwork for a more accessible and immersive navigation experience, where users can perceive and explore digital content through an auditory interface that reflects the complexity and richness of real-world spaces. As we continue to refine and develop the Screen-to-Soundscape project, this foundational sound research will be crucial in guiding the creation of a more intuitive, responsive, and inclusive digital experience for blind and visually impaired users.
Phase 2 Sounds
In this phase we built a focused sound set that makes a place audible without overwhelming you. The backbone is a family of ambient beds (traffic hush, canal water, park birds and wind, rail hum, market murmur), mixed with distance attenuation and gentle occlusion so narrow streets feel close and open squares feel airy. On top of that sit symbolic auditory icons for common points of interest: a doorbell for small shops, a revolving-door whoosh for malls and offices, scissors for a hairdresser, a freewheel click for bike shops, plus public-transport motifs (bus-stop ping, tram ring, station PA blur). We soften emergency tones and keep a quiet ambient floor at all times, because "silence feels like being lost." Footsteps are surface-aware (asphalt, cobblestone, paving stones, gravel, grass, wood, polished floor), and we have started adding spatial "life" to them with light echo and reverb so walls and passages read in your ears.
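The distance attenuation mentioned above can be sketched with a standard inverse-distance gain model plus a simple occlusion factor. This is a generic game-audio formula, not the project's actual engine code, and the parameter names are assumptions:

```python
def source_gain(distance: float, ref_distance: float = 1.0,
                rolloff: float = 1.0, occlusion: float = 0.0) -> float:
    """Inverse-distance attenuation (as in common 3D-audio models),
    scaled down by an occlusion factor in [0, 1] for sounds heard
    through walls or around corners."""
    d = max(distance, ref_distance)  # clamp inside the reference radius
    g = ref_distance / (ref_distance + rolloff * (d - ref_distance))
    return g * (1.0 - occlusion)
```

With the defaults, a source at the reference distance plays at full gain, one at twice the distance at half gain, and occlusion reduces it further.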
For interaction, we composed a small, consistent set of earcons: a spawn/landing chime, a short command-mode “open” blip (on Enter), loading/ready cues, teleport/zoom whooshes, left/right orientation pips, and clear wall feedback for bumps vs. slides (“thud” vs. gentle scrape). When the AI speaks, a longer, low-key bed sits underneath and the mix ducks other layers so answers are easy to catch. The overall palette is intentionally compact and learnable: a few well-chosen ambiences, a handful of memorable icons, and clean UI tones—enough to convey space, direction, and nearby places, without turning into audio clutter.
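The ducking behaviour, lowering the other layers while the AI voice plays, can be expressed as a simple gain reduction in decibels. The −12 dB figure and the function name are illustrative assumptions, not the project's actual mix settings:

```python
def duck_layers(levels: dict[str, float], speaking: bool,
                duck_db: float = -12.0) -> dict[str, float]:
    """Scale every non-voice layer down by duck_db while speech plays;
    pass the layers through unchanged otherwise."""
    factor = 10 ** (duck_db / 20) if speaking else 1.0
    return {name: level * factor for name, level in levels.items()}
```

A −12 dB duck brings each layer down to roughly a quarter of its level, which keeps the ambience audible while the spoken answer stays easy to catch.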
Surface Sounds
As you move, you hear footsteps that match the surface under you: grass, concrete, asphalt, wood, gravel, and even near water (with a light splash at the edge). The cursor/character movement is "voiced" as steps, so your ears always know if you're on a path, a square, or a softer natural surface.
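A minimal sketch of the surface-aware footstep lookup, where the sample filenames and the near-water overlay are hypothetical placeholders:

```python
# Hypothetical sample names; the real asset names are not specified.
FOOTSTEP_SAMPLES = {
    "grass": "steps_grass.wav",
    "concrete": "steps_concrete.wav",
    "asphalt": "steps_asphalt.wav",
    "wood": "steps_wood.wav",
    "gravel": "steps_gravel.wav",
}

def footstep_layers(surface: str, near_water: bool = False) -> list[str]:
    """Pick the footstep sample for the current surface; add a light
    splash layer when walking along a water's edge."""
    layers = [FOOTSTEP_SAMPLES.get(surface, "steps_default.wav")]
    if near_water:
        layers.append("light_splash.wav")
    return layers
```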
Symbol Sounds
Symbols are immediate, recognizable icons for specific places or objects, like scissors for a hairdresser, a school bell for a school, or a leafy rustle for trees. They're quick to learn and help you identify what's nearby without long speech.
Feature Sounds
Feature sounds paint the atmosphere of larger elements: the hush and whoosh of a highway, birds and wind in parks, station PA for rail hubs, or the doorbell/revolving-door motif for shops and malls. They give you a sense of space and activity level before you get there.
Interaction Sounds
Interaction sounds tell you what the system is doing: a short cue when AI is listening or replying, a distinct boundary sound when you bump a wall, hedge, or gate, and clear earcons for states like landing, command mode, loading, ready, zoom, or teleport. These make the interface predictable and reduce guesswork.
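The bump-versus-slide distinction can be sketched as a function of how directly the user hits the boundary. The 30° threshold and function name are illustrative assumptions:

```python
def wall_feedback(approach_angle_deg: float) -> str:
    """Return the earcon for boundary contact: a head-on hit reads as
    a 'thud', a glancing contact as a gentle 'scrape'.

    approach_angle_deg is the angle between the movement direction and
    the wall's normal: 0 means walking straight into the wall.
    """
    return "thud" if abs(approach_angle_deg) < 30.0 else "scrape"
```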