Hieronymus Bosch, The Garden of Earthly Delights, oil on oak panels, 205.5 cm × 384.9 cm (81 in × 152 in), Museo del Prado, Madrid
Prototype
Current Prototype
The executable is available via GitHub at version 1.0.5. The tutorial launches automatically on first run.
Phase 1B Prototype: Wikipedia Soundscape Generator
Explore any Wikipedia article as a 3D spatial audio soundscape. Enter a URL, and the system fetches the article, converts text to speech, and builds an interactive scene in real time. Available in English and French.
Open full-screen for the best experience. Best with headphones.
Sound Guide — What You Hear in the Soundscape
The real-time demo uses several layers of spatial audio to help you navigate and understand the article structure:
Core Audio
- Section speech — When you approach a sphere, the section's text is read aloud using text-to-speech. Volume scales with distance — closer sections are louder, distant ones are quieter.
- Singing bowl beacons — All unvisited elements emit a gentle, looping singing bowl tone from their 3D position, like overhearing conversations at a party. Each hierarchy level has a distinct pitch:
  - Deep bowl (174 Hz) — Article title
  - Mid bowl (264 Hz) — Main sections (H2 headings)
  - Bright bowl (396 Hz) — Subsections (H3 headings)
  - Gentle bowl (480 Hz) — Paragraphs
- Spatial summary on arrival — After the introductory audio, a spoken overview announces how many sections the article has and names the first few, giving you orientation before you start exploring.
- Content sonification — Images and tables have distinct sonic markers:
  - Images (orange diamond shapes) — A camera shutter click plays before the image description is read
  - Tables (blue flat shapes) — Three ascending beeps play before the table data is read
- Auto-announce — As you approach a section sphere (within about 6 units), its heading is spoken automatically.
- Ambient background — A soft, layered drone plays continuously, combining low-frequency tones (55 Hz, 82.5 Hz, 110 Hz) with gentle filtered noise.
- Section ambiences — Each major section has its own subtle ambient texture, creating audio "neighborhoods" that change as you move through the article.
- Boundary sound — A percussive "bump" sound plays when you reach the edge of the scene.
- Welcome audio — On first entry, an instructional audio clip plays. Double-tap Space (or the pause button on mobile) to hear a welcome message.
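As a rough illustration of the core audio rules above, the following Python sketch maps hierarchy levels to the beacon pitches listed in the guide, applies a distance-based gain, and checks the auto-announce radius. The level names, the inverse-distance formula (the same shape as the Web Audio API's "inverse" distance model), and the reference distance and rolloff values are assumptions made for illustration; the guide only states that closer sections are louder and that headings are announced within about 6 units. The drone function sums just the three listed tones and leaves out the filtered noise.

```python
import math

# Beacon pitches from the sound guide; the dictionary keys are hypothetical names.
BEACON_HZ = {
    "title": 174.0,       # deep bowl
    "section": 264.0,     # mid bowl (H2 headings)
    "subsection": 396.0,  # bright bowl (H3 headings)
    "paragraph": 480.0,   # gentle bowl
}

DRONE_HZ = (55.0, 82.5, 110.0)   # ambient background tones from the guide
AUTO_ANNOUNCE_RADIUS = 6.0       # "within about 6 units"


def speech_gain(listener, source, ref_distance=1.0, rolloff=1.0):
    """Assumed inverse-distance gain: 1.0 up to ref_distance, then falling off."""
    d = max(math.dist(listener, source), ref_distance)
    return ref_distance / (ref_distance + rolloff * (d - ref_distance))


def should_announce(listener, source, radius=AUTO_ANNOUNCE_RADIUS):
    """True when the listener is close enough for the heading to be spoken."""
    return math.dist(listener, source) <= radius


def drone_sample(t):
    """One sample of the drone at time t (seconds): the average of three sines."""
    return sum(math.sin(2 * math.pi * f * t) for f in DRONE_HZ) / len(DRONE_HZ)
```

With these assumed defaults, a section 3 units away plays at one third of full volume, and a heading exactly 6 units away is still announced.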
Footstep Audio
- Surface-responsive footsteps — As you move, footstep sounds play at regular intervals. The sound character changes based on your position: softer near the introduction (like grass), harder in deeper sections (like stone).
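One way to picture the footstep behaviour is a function that picks a surface from how far into the article the listener has walked, plus a timer check for the regular step interval. This is a minimal sketch, not the prototype's code: the function names, the 0.5 threshold, the 0.45-second interval, and the two-surface split are all assumptions.

```python
def footstep_surface(depth, max_depth, threshold=0.5):
    """Return 'grass' near the introduction and 'stone' deeper into the article.

    depth is the distance walked along the article path; max_depth is the
    length of the whole path. Both are hypothetical inputs.
    """
    if max_depth <= 0:
        return "grass"
    ratio = min(max(depth / max_depth, 0.0), 1.0)  # clamp to the range 0..1
    return "grass" if ratio < threshold else "stone"


def footstep_due(now, last_step_time, interval=0.45):
    """True when enough time has passed to play the next footstep."""
    return now - last_step_time >= interval
```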
Navigation & Orientation
- "Where am I?" (Tab key) — Press Tab to hear a spoken summary of your position: which section is nearest, and how many sections are to your left, right, and ahead.
- Return to start (Escape key) — Press Escape to teleport back to where you first landed. Useful if you get lost in a large article.
- Breadcrumb trail — Visited spheres change color (dimmed) so you can visually track where you've been. A quiet click plays when you revisit an already-visited section.
- Dynamic floor — The green floor plane resizes to match the article's element layout, giving a visual boundary for the soundscape.
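The "Where am I?" summary can be approximated with a little vector arithmetic on the ground plane. The sketch below is an assumption about how such a summary could be computed, not a description of the prototype: it treats anything within a 45-degree cone in front of the listener as "ahead", uses the A-Frame convention that the default view looks down the negative z axis with positive x to the right, and all function and parameter names are hypothetical.

```python
import math


def where_am_i(listener, forward, sections):
    """Summarise the listener's surroundings on the ground plane.

    listener: (x, z) position; forward: unit (x, z) facing vector;
    sections: list of (name, (x, z)) pairs.
    """
    fx, fz = forward
    rx, rz = -fz, fx  # right-hand vector: forward turned 90 degrees to the right
    counts = {"left": 0, "right": 0, "ahead": 0, "behind": 0}
    nearest_name, nearest_dist = None, math.inf
    for name, (x, z) in sections:
        dx, dz = x - listener[0], z - listener[1]
        dist = math.hypot(dx, dz)
        if dist < nearest_dist:
            nearest_name, nearest_dist = name, dist
        ahead = dx * fx + dz * fz   # component along the facing direction
        side = dx * rx + dz * rz    # component toward the listener's right
        if ahead > 0 and abs(side) <= ahead:
            counts["ahead"] += 1
        elif side > 0:
            counts["right"] += 1
        elif side < 0:
            counts["left"] += 1
        else:
            counts["behind"] += 1
    return nearest_name, counts


def summary(nearest_name, counts):
    """Build the sentence that would be handed to text-to-speech."""
    return (f"Nearest section: {nearest_name}. "
            f"{counts['left']} to your left, {counts['right']} to your right, "
            f"{counts['ahead']} ahead.")
```

Sections directly behind the listener are counted separately and left out of the spoken sentence, since the guide only mentions left, right, and ahead.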
Layout & Structure
- Linear path layout — Elements are placed along a forward path going into the scene. Walk forward to progress through the article in reading order. Headers are centered, paragraphs offset to the right, subsections slightly to the left, images further right, and tables further left.
- All at ear level — All elements (title, sections, subsections, paragraphs, images, tables) are placed at the same height (y=1.6) for consistent audio.
- Content-type shapes — Headers appear as spheres, paragraphs as horizontal cylinders (length reflects text amount), images as orange rotated diamond boxes, and tables as blue flat wide boxes.
- Wikipedia article panel — The original Wikipedia article is displayed in a panel at the top of the screen. As you approach and play sections, the corresponding text is highlighted in green and auto-scrolled into view.
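The layout rules above can be sketched as a function that walks the elements in reading order and assigns each one a position. The ear height (y=1.6) and the left/right ordering come from the description; the spacing, the exact offset values, the element type names, and the floor margin are illustrative assumptions. Forward is taken to be the negative z direction.

```python
EAR_HEIGHT = 1.6  # all elements share this height

# Sideways offsets per element type (negative = left, positive = right).
# Only the ordering is taken from the description; the numbers are assumed.
X_OFFSET = {
    "title": 0.0,
    "section": 0.0,
    "subsection": -1.0,  # slightly left
    "paragraph": 2.0,    # right
    "image": 4.0,        # further right
    "table": -4.0,       # further left
}


def layout(element_types, spacing=3.0):
    """Place elements along a forward path, one step of `spacing` per element."""
    return [
        (X_OFFSET[kind], EAR_HEIGHT, -spacing * (index + 1))
        for index, kind in enumerate(element_types)
    ]


def floor_bounds(positions, margin=2.0):
    """Rectangle (min_x, max_x, min_z, max_z) that a resizable floor could cover."""
    xs = [p[0] for p in positions]
    zs = [p[2] for p in positions]
    return (min(xs) - margin, max(xs) + margin, min(zs) - margin, max(zs) + margin)
```

For example, `layout(["title", "section", "paragraph"])` puts the title 3 units ahead of the start, the section 6 units ahead, and the paragraph 9 units ahead and 2 units to the right.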
Interactive Features
- Auto-advance — When the current element finishes speaking and you haven't moved, the camera gently drifts toward the next element in reading order. Press any arrow key or touch the screen to cancel the drift and take manual control.
- Play all by distance (P key) — Press P to hear all text elements read aloud sequentially, starting with the nearest. Volume is based on distance. Press P again to stop.
- Link portals — Wikipedia links from the article appear as magenta spinning spheres at the edges of the scene. Walking into a portal announces the linked article title. Press Enter to load it as a new soundscape.
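The ordering behind "Play all by distance" and the target selection for auto-advance can be illustrated in a few lines. This is a sketch under assumptions: elements are represented as hypothetical (name, position) pairs, and the function names are invented for the example.

```python
import math


def play_all_order(listener, elements):
    """Return element names sorted nearest-first.

    listener: (x, y, z); elements: list of (name, (x, y, z)) pairs.
    """
    ranked = sorted(elements, key=lambda item: math.dist(listener, item[1]))
    return [name for name, _ in ranked]


def next_in_reading_order(current_index, element_count):
    """Index of the element auto-advance would drift toward, or None at the end."""
    nxt = current_index + 1
    return nxt if nxt < element_count else None
```

Note the difference between the two features: "Play all" follows distance from the listener, while auto-advance follows the article's reading order regardless of distance.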
Keyboard Controls
- Arrow keys — Move around the 3D space
- Space — Play/pause audio
- Double-tap Space — Play welcome message
- Shift — Play nearest sound
- Tab — "Where am I?" position summary
- P — Play all elements by distance (toggle on/off)
- Enter — Load a portal link if near one
- Escape — Return to starting position
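Because Space serves both play/pause and, when double-tapped, the welcome message, the input handler has to tell the two gestures apart. A minimal sketch of one common approach follows; the class name and the 0.3-second window are assumptions, not values taken from the prototype.

```python
class DoubleTapDetector:
    """Report a double tap when two taps arrive within `window` seconds."""

    def __init__(self, window=0.3):
        self.window = window
        self._last_tap = None  # time of the previous unpaired tap, if any

    def tap(self, now):
        """Register a tap at time `now` (seconds); return True on a double tap."""
        if self._last_tap is not None and now - self._last_tap <= self.window:
            self._last_tap = None  # consume the pair so a third tap starts fresh
            return True
        self._last_tap = now
        return False
```

A caller would route a `True` result to the welcome message and treat other taps as play/pause.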
Phase 1A Prototype
The original Phase 1A prototype uses pre-recorded audio from the Galaxy Wikipedia article with a semicircular spatial layout. Try it here: https://www.screentosoundscape.com/scripts/phase1aprototype.html
Operational Model
The application runs as a local desktop prototype built in Godot that typically connects to a DigitalOcean server for map data and AI assistance. It can also run offline for demonstrations, handling movement, spatial audio, and tutorials locally.
Core Experience
Users explore urban environments through audio rather than visuals. The system provides ambient soundscapes, auditory icons for nearby locations, and surface-responsive footsteps. Navigation uses arrow keys with periodic AI assistant access.
Technical Architecture
The system integrates OpenStreetMap data with spatial audio processing, covering the Netherlands and Belgium. Voice synthesis and AI models run server-side to optimize client performance.
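To place map features in a spatial audio scene, geographic coordinates have to be turned into local distances. The sketch below uses a simple equirectangular approximation, which is reasonable over city-sized areas; it is offered as an illustration only, since the page does not say which projection the prototype uses. The function name, the spherical Earth radius, and the choice of north as the negative z direction are assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in metres (spherical approximation)


def latlon_to_local(lat, lon, origin):
    """Convert latitude/longitude in degrees to (x, z) metres around an origin.

    origin is a (lat, lon) pair. East is +x and north is -z in this sketch.
    """
    lat0, lon0 = origin
    x = EARTH_RADIUS_M * math.radians(lon - lon0) * math.cos(math.radians(lat0))
    z = -EARTH_RADIUS_M * math.radians(lat - lat0)
    return x, z
```

Under this approximation, one degree of latitude corresponds to roughly 111 km, so a point one degree north of the origin lands about 111,000 units along negative z.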
Prototype Development
Initial iterations addressed navigation challenges identified through co-creation sessions. Developers implemented boundary audio cues and keyboard controls to enhance user orientation and control.
Design of the first prototype
The initial prototype was built using the A-Frame framework. This web-based prototype featured keyboard navigation and audio triggers at spatial points.
Feedback from first co-creation
Users appreciated the spatial layout but needed stronger auditory boundary indicators. Key findings included the need for clearer navigation cues and better orientation feedback.
Reflection on the first co-creation
The team learned that traditional screen readers often "flatten" web experiences by reducing content to linear lists, eliminating spatial context crucial for understanding complex information like maps or images. This insight drove the development of enhanced spatial audio features.
Design of the second prototype
The second prototype added enhanced audio boundaries, refined sound parameters, and adjusted distance modeling. Additional keyboard controls were implemented to give users more agency in navigation.
Feedback from the second co-creation
The refined prototype received positive feedback for its improved boundary audio and control options. Users noted that the enhanced sound design made navigation more intuitive and less disorienting.
Reflection on the second co-creation
Co-creators valued control over voice characteristics, sound localization, and movement within soundscapes. While exploration appealed to participants, they highlighted difficulties navigating without clear auditory cues and complexity from multiple layered voices.
Alt-Text Generation Examples
Using AI-powered image analysis, Screen-to-Soundscape can generate customized alt-text descriptions tailored to different audiences and contexts. Below are examples using Hieronymus Bosch's "The Garden of Earthly Delights":
Garden of Earthly Delights - Custom Alt-Text for Art Curator
Detailed art-historical description from an art curator's perspective
Garden of Earthly Delights - Custom Alt-Text for a Child
Child-friendly description with simpler language
Garden of Earthly Delights - Custom Alt-Text for a Child (Upbeat tone)
Child-friendly description with an upbeat, enthusiastic tone
Garden of Earthly Delights - Custom Alt-Text for a Child (Upbeat tone and Soundscape)
Child-friendly description with upbeat tone and immersive soundscape
Plan for future co-creation
Future development will focus on:
- Expanding co-creation sessions with diverse visual content (charts, infographics, complex materials)
- Promoting open-source participation from developers, sound designers, and accessibility advocates
- Documenting co-creation guidelines for future inclusive design projects
- Enhancing spatial audio with echoic footsteps and clearer state transitions
- Supporting community sound packs and offline city bundles