Phase 2 Report
Image: A sandy trail winds through coastal dunes under a cloudy sky. Three people stand near a wooden signpost that reads “Noordduinen – Gele wandelroute” (Noordduinen – yellow walking route) with an arrow; one person holds a small dog on a leash. Tall grasses cover the rolling dunes, and a metal transmission tower rises in the distance. A long strap or lead lies on the sand in the foreground, adding to the casual, mid-hike feel of the scene.
Screen-to-Soundscape (STS) evolved from a speculative concept into a working, downloadable prototype that lets people explore cities by ear. In Phase 2, our goal was to prove that open geographic data (OpenStreetMap) and spatial audio could create a learnable, navigable sound world—prioritizing exploration over step-by-step routing. We delivered a Godot-based desktop app that auto-starts with a tutorial, renders ambient “beds,” symbolic point-of-interest icons, and surface-aware footsteps, and keeps the world loading continuously within the Netherlands and Belgium so users don’t “fall off the edge.” We also integrated a lightweight, location-aware AI assistant for “Where am I?” and “What’s nearby?”, while keeping the stack open (Piper TTS) and the download small by hosting heavier models and OSM services on DigitalOcean.
Phase 2 Methodology
We kept an iterative, co-creation-led rhythm: short build → test → learn cycles with blind and visually impaired co-creators as decision-makers. Two structured co-creation sessions informed controls, the sound vocabulary, and scope; a listening party validated real-world use and produced targeted fixes. We emphasized keyboard-first interaction (hotkeys for help, restart, where-am-I, etc.), a first-run tutorial that can be re-started any time, and a compact, learnable audio grammar mapped to OSM tags.
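To make the audio grammar concrete, here is a minimal GDScript sketch of how OSM tags can map to sound icons; the specific tags, file paths, and fallback below are illustrative assumptions, not the shipped vocabulary.

```gdscript
# Hypothetical sketch of a compact audio grammar: OSM "key=value" tags map to
# symbolic sound icons, and anything unmatched falls back to a generic ambience.
const POI_SOUNDS := {
	"shop=hairdresser": "res://sfx/icons/scissors.ogg",
	"amenity=toilets": "res://sfx/icons/toilet_flush.ogg",
	"shop=convenience": "res://sfx/icons/doorbell.ogg",
	"leisure=park": "res://sfx/beds/park_birds.ogg",
}
const FALLBACK_SOUND := "res://sfx/beds/generic_street.ogg"

# Given the tag dictionary of one OSM element, return the first matching icon.
func sound_for_tags(tags: Dictionary) -> String:
	for key in tags:
		var tag := "%s=%s" % [key, str(tags[key])]
		if POI_SOUNDS.has(tag):
			return POI_SOUNDS[tag]
	return FALLBACK_SOUND
```

Keeping this table small and stable is the point: a handful of learnable icons rather than an ever-growing catalogue of realistic sounds.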
Phase 2 Co-Creation
Image: A small co-creation/work session in a bright room. Three participants wearing headphones work on laptops at a long table while another person stands and points, offering guidance. Mugs, a water carafe, and notes on a corkboard set an informal, collaborative atmosphere.
We ran two co-creation workshops and one listening party. Sessions covered device habits (screen readers, headphones), exploration tasks (finding a toilet, following a busy street), and comfort (fatigue, silence). We co-wrote the tutorial, refined sound density, added borders/street “walls,” and prioritized typed commands after testers noted typing is often faster and more reliable than speech across languages.
Prototype Development and Testing
Our team developed an initial prototype using A-Frame, a web framework for creating virtual reality (VR) experiences, and hosted it on Glitch. This prototype converted a basic webpage into a virtual soundscape, where users could navigate using keyboard commands, triggering audio elements as they approached specific points. The first co-creation session revealed several insights: while participants appreciated the spatial layout and freedom of exploration, they also expressed concerns about the lack of auditory cues to indicate their location and boundaries within the virtual space.
Based on this feedback, we developed a second prototype with enhanced features such as audio boundaries, additional keyboard controls, and refined sound parameters to provide clearer auditory cues. We adjusted the distance model to ensure users could differentiate between nearby and distant sounds, making navigation more intuitive. This iterative process, informed by co-creator feedback, allowed us to refine the tool in a way that balanced functionality, aesthetics, and usability.
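As a concept sketch of that distance model (the A-Frame prototype itself relied on the Web Audio API's built-in rolloff), near/far differentiation comes down to a curve like the one below, written in GDScript for consistency with the later Godot client; the reference distance and rolloff factor are placeholder values.

```gdscript
# Inverse-distance rolloff: gain stays near 1.0 inside ref_distance, then falls
# off smoothly, so nearby and distant sources remain clearly distinguishable.
# The default constants are illustrative, not the tuned values from the prototype.
func distance_gain(distance: float, ref_distance := 2.0, rolloff := 1.5) -> float:
	var d: float = max(distance, ref_distance)
	return ref_distance / (ref_distance + rolloff * (d - ref_distance))
```

Raising the rolloff factor makes distant sounds drop away faster; lowering the reference distance tightens the “close” zone around the listener.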
Phase 2 Results
Image: During our listening party, our co-creators tested our latest demo on a MacBook.
Stack & hosting. The prototype is a local Godot app; spatial audio, movement, and UI run on-device. We query OpenStreetMap for nearby features and surfaces; to keep the client small and responsive, the OSM/Overpass endpoint and AI models (several GB) run on a DigitalOcean droplet we will maintain through December 2025. The app also supports offline city bundles (local OSM extracts) for demos; when the AI endpoint isn’t available, “Where am I?” falls back to a local summary (heading, spawn distance, nearest cached features).
Sound model. We implemented distance attenuation, stereo/HRTF panning, gentle occlusion, a constant ambient floor (“silence feels like being lost”), surface-aware footsteps (asphalt, cobble, gravel, grass, wood, polished floor), symbolic POI icons (e.g., scissors for hairdresser, doorbell for small shops), and clean earcons for system states (spawn/landing, command mode, loading/ready, wall hit/slide, boundary). Early tests showed users could feel enclosure on narrow streets and anticipate openings at corners.
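As a rough sketch, one way a symbolic POI icon can be placed as a positional emitter in Godot is shown below; the attenuation values are illustrative rather than the tuned parameters from the prototype, and occlusion and footstep echo are handled separately.

```gdscript
# Spawn a looping symbolic icon (e.g., scissors for a hairdresser) at a world
# position, with distance attenuation and a hard cutoff so far-away icons stay silent.
func spawn_poi_icon(parent: Node3D, pos: Vector3, stream_path: String) -> AudioStreamPlayer3D:
	var player := AudioStreamPlayer3D.new()
	player.stream = load(stream_path) as AudioStream
	player.attenuation_model = AudioStreamPlayer3D.ATTENUATION_INVERSE_DISTANCE
	player.unit_size = 4.0         # Illustrative: scales how quickly loudness falls off.
	player.max_distance = 60.0     # Illustrative: beyond this distance the icon is inaudible.
	player.panning_strength = 1.0  # Full left/right panning for directionality.
	parent.add_child(player)
	player.position = pos
	player.play()
	return player
```

The hard max_distance is a deliberate design lever: it keeps sound density manageable so the city does not collapse into a wall of overlapping icons.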
Tutorial. The tutorial auto-starts on first run, can be toggled with H, advanced with Tab, and re-started with F1. It teaches movement, turning, “Where am I?” and push-to-talk (T)—with conditional prompts for bumping/sliding along walls, and distinct state sounds so the interface is predictable.
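A minimal sketch of the hotkey routing under Godot 4 input handling; the print placeholders stand in for the real tutorial, help, and push-to-talk handlers.

```gdscript
extends Node

# Hypothetical hotkey routing for the tutorial and assistant controls.
func _unhandled_key_input(event: InputEvent) -> void:
	var key := event as InputEventKey
	if key == null or not key.pressed or key.echo:
		return
	match key.keycode:
		KEY_H:
			print("toggle tutorial help")   # Pause/resume contextual help.
		KEY_TAB:
			print("advance tutorial step")  # Move to the next tutorial prompt.
		KEY_F1:
			print("restart tutorial")       # Start the tutorial over from the beginning.
		KEY_T:
			print("push-to-talk")           # Hold to speak to the assistant.
```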
We delivered a working, downloadable prototype (NL/BE), a compact and reusable audio grammar mapped to OSM, and an always-available tutorial that reduced onboarding friction and increased willingness to wander. The continuous world (no tile edges), clearer state cues, and gentle ambient bed improved comfort and confidence; participants oriented themselves, turned toward salient ambiences (parks, busy streets), and discovered nearby amenities without visual cues. From an engineering standpoint, we proved the OSM → audio pipeline, validated the client-light/server-heavy packaging, and confirmed that modest 3D audio (plus simple echo on footsteps) produces a larger-than-expected boost in spatial understanding—on ordinary laptops.
Unexpected Outcomes
Beyond the strong 3D audio results, three surprises stood out. First, the tutorial became scaffolding for independence—not just instructions. Second, a constant ambient floor materially reduced anxiety; silence consistently read as “lost.” Third, symbolic sounds were learned instantly and remembered (e.g., scissors = hairdresser), supporting a small, stable icon set over ever-increasing realism.
Limitations & What’s Pending
Cognitive load remains a design challenge: some users still “construct a picture” as they move, so level-of-detail and scale presets (Quiet/Normal/Rich; Street/Neighborhood/City) need to go further. A few state cues arrived late (spawn/landing, stronger command-mode, explicit “wall”), and testers requested more shortcuts (reset/exit building, fast “Where am I?”). Our alt-alt-text feature—a heading-aware computer-vision layer for “what’s in front/left/right?”—reached prototype stage but was not stable/low-latency enough for a user interface within the grant window; we’ve documented it for Phase 3. Global scale remains out of scope; NL/BE is deliberate for performance.
Tools, Software, Workshops (Phase 2 outputs)
Title: Screen-to-Soundscape (Prototype Software, NL/BE)
Launch: September 2025 (public demo 26 Sep 2025)
URL: https://github.com/vladimirnani/screen2soundscape/releases/tag/1.0.5
Title: Co-creation Workshop #1
Date: February 2025
Title: Co-creation Workshop #2
Date: May 2025
Title: Listening Party / Public Prototype Demo
Date: 26 September 2025 • Participants: ~10
The Future
Next steps
Finish the alt-alt-text interface, deepen spatial audio (echoic footsteps as a true 3D emitter; clearer spawn/command; stronger borders/street walls), expand detail/scale presets and quiet mode, add time-of-day and city-specific ambiences, support community sound packs, and ship offline city bundles. We’ll run a longitudinal co-creation study (orientation time, task success, recovery without help, confidence/fatigue).
Sharing
Maintain the website, release builds on GitHub, publish a short technical write-up (OSM→audio grammar, interaction rules) and a demo video with transcript/AD, and propose workshops/talks (e.g., ISEA and related venues) with community partners.
Funding
We do not have follow-up funding secured. We seek €50,000–€100,000 to deliver a fully functional application, pay co-creators, complete alt-alt-text, deepen spatial audio, package offline bundles, and run studies. We’ll explore accessibility/digital-culture grants, civic partnerships (cities, transit, museums), and small sponsorships—keeping the core stack open. If partial funds arrive, we’ll ship by city bundles (Leiden/Leuven, Amsterdam, Brussels).