Game Audio Without Libraries: Building a Sound Engine from Scratch
📅 June 15, 2026 · ✍️ Tom Reeves · 🏷️ Technical · ⏱️ 5 min read
All Gerk Games use the Web Audio API for sound effects. No library, no framework: just raw oscillator nodes and gain envelopes. Here's what we learned building a full sound engine in 50 lines.
Why Web Audio API Beats Audio Files
Audio files (MP3, OGG, WAV) have to be downloaded, decoded, and buffered before they can play. On a slow connection, this can take 2-5 seconds per sound. The Web Audio API can generate sounds procedurally instead, so the sounds themselves add no network requests, no asset files, and no loading time. A game with 10 procedural sound effects can play them as soon as its script has loaded.
Our sound engine is a single function that takes a frequency, a duration, a waveform type (sine, square, or sawtooth), and a starting volume. It creates an oscillator and a gain envelope that starts at that volume and fades to zero. That's it. Everything from the Snake Arena eating sound to the Pixel Jumper jump sound is just a different set of parameters to the same function.
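A minimal sketch of what such a function could look like. The name `playTone`, its option names, and the default values are assumptions for illustration, not the actual Gerk Games code. It takes the audio context as a parameter, so it makes no assumptions about how the context was created.

```javascript
// Hypothetical helper: name, options, and defaults are illustrative.
// `ctx` is an AudioContext (or anything exposing the same methods).
function playTone(ctx, { frequency = 440, duration = 0.15, type = 'sine', volume = 0.2 } = {}) {
  const now = ctx.currentTime;
  const osc = ctx.createOscillator();
  const gain = ctx.createGain();

  // Waveform and pitch.
  osc.type = type; // 'sine' | 'square' | 'sawtooth' | 'triangle'
  osc.frequency.setValueAtTime(frequency, now);

  // Envelope: start at the requested volume, fade linearly to silence.
  gain.gain.setValueAtTime(volume, now);
  gain.gain.linearRampToValueAtTime(0, now + duration);

  // Signal path: oscillator -> gain -> speakers.
  osc.connect(gain);
  gain.connect(ctx.destination);

  // Schedule the burst; the node is single-use and is discarded after stop.
  osc.start(now);
  osc.stop(now + duration);
  return osc;
}
```

In a browser, this would typically be called as `playTone(new AudioContext(), { frequency: 880, type: 'square' })`, with the context created or resumed in response to a user gesture, since browsers generally block audio before the user has interacted with the page.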
MP3 audio runs at roughly 1 MB per minute. A game with 10 sound effects at 2 seconds each would need roughly 350 KB of audio files. On a 3G connection, that adds 2-3 seconds of loading time. Worse, the browser must decode each audio file before playing it, adding another 200-500 ms of latency. Procedural audio with the Web Audio API avoids both costs. Our entire sound engine is about 600 bytes of JavaScript, so it loads almost instantly and plays sounds with no decoding step.
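The estimate above can be reproduced with a few lines of arithmetic. The bitrate (128 kbps, which works out to just under 1 MB per minute) and the effective 3G throughput (about 1 Mbps) are assumptions chosen to match the rough figures in the text, not measurements.

```javascript
// Assumed values, not measurements.
const MP3_BITRATE_KBPS = 128; // kilobits per second of audio
const LINK_KBPS = 1000;       // effective 3G throughput in kilobits per second

// Size of the audio assets in kilobytes.
function audioPayloadKB(effectCount, secondsEach, bitrateKbps = MP3_BITRATE_KBPS) {
  const kilobits = effectCount * secondsEach * bitrateKbps;
  return kilobits / 8; // 8 bits per byte
}

// Time to download that payload, ignoring latency and request overhead.
function downloadSeconds(payloadKB, linkKbps = LINK_KBPS) {
  return (payloadKB * 8) / linkKbps;
}

const payloadKB = audioPayloadKB(10, 2);      // 320 KB, before container overhead
const seconds = downloadSeconds(payloadKB);   // 2.56 s
```

Both results land in the ranges quoted above: a few hundred kilobytes and two to three seconds, before any decoding time is counted.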
The specific technique: an OscillatorNode generates a waveform (sine for musical tones, square for impact sounds, sawtooth for "danger" sounds). A GainNode controls the volume envelope (attack, sustain, decay). By varying these parameters, we can produce a surprising range of sounds. A sine wave at 440Hz with 50ms attack and 200ms decay sounds like a soft button press. A square wave at 200Hz with 5ms attack and 100ms decay sounds like a game over thud. A sawtooth wave at 800Hz with rapid frequency modulation sounds like a power-up activation. All of these come from the same 600-byte function, just called with different parameters.
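The three examples above could be expressed as data plus one player function, sketched below. The preset and function names are hypothetical. The text gives no envelope timings or modulation settings for the power-up sound and no sustain times, so those values are assumed placeholders (and sustain is left out). The "rapid frequency modulation" is implemented here with a low-frequency oscillator connected to the main oscillator's frequency parameter, which is one common way to do it and not necessarily the one the engine uses.

```javascript
// Presets transcribed from the text; values marked "assumed" are not in the article.
const PRESETS = {
  buttonPress: { type: 'sine', frequency: 440, attack: 0.05, decay: 0.2 },
  gameOver: { type: 'square', frequency: 200, attack: 0.005, decay: 0.1 },
  powerUp: {
    type: 'sawtooth', frequency: 800,
    attack: 0.005, decay: 0.3,        // assumed
    wobbleRate: 30, wobbleDepth: 200, // assumed: 30 Hz LFO, +/-200 Hz swing
  },
};

// Hypothetical helper: plays one preset and returns the time the sound ends.
function playPreset(ctx, preset, volume = 0.2) {
  const t0 = ctx.currentTime;
  const end = t0 + preset.attack + preset.decay;

  const osc = ctx.createOscillator();
  osc.type = preset.type;
  osc.frequency.setValueAtTime(preset.frequency, t0);

  // Envelope: silence -> peak over `attack`, then peak -> silence over `decay`.
  const amp = ctx.createGain();
  amp.gain.setValueAtTime(0, t0);
  amp.gain.linearRampToValueAtTime(volume, t0 + preset.attack);
  amp.gain.linearRampToValueAtTime(0, end);

  osc.connect(amp);
  amp.connect(ctx.destination);

  // Optional frequency modulation: the LFO output, scaled by wobbleDepth,
  // is added to the main oscillator's frequency.
  if (preset.wobbleRate) {
    const lfo = ctx.createOscillator();
    const depth = ctx.createGain();
    lfo.frequency.setValueAtTime(preset.wobbleRate, t0);
    depth.gain.setValueAtTime(preset.wobbleDepth, t0);
    lfo.connect(depth);
    depth.connect(osc.frequency);
    lfo.start(t0);
    lfo.stop(end);
  }

  osc.start(t0);
  osc.stop(end);
  return end;
}
```

With a real `AudioContext` in `ctx`, `playPreset(ctx, PRESETS.gameOver)` would play the thud and `playPreset(ctx, PRESETS.powerUp)` the wobbling sawtooth.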
The Web Audio Oscillator as an Instrument
Building sound from scratch means treating the Web Audio API's oscillator as a primitive instrument. A square wave gives you the classic retro blip; a sine wave gives you a soft tone; a sawtooth gives you a harsh buzz. By scheduling short oscillator bursts at specific frequencies and shaping their volume envelope, you can synthesize every sound effect a casual game needs without loading a single audio file. The advantage is enormous: zero network requests for audio, instant playback, and a total footprint of a few hundred bytes of code instead of hundreds of kilobytes or more of sample files.
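Scheduling several short bursts in a row might look like the sketch below. The function name `playSequence` and its defaults are assumptions for illustration. Each note gets its own oscillator because an `OscillatorNode` can only be started once.

```javascript
// Hypothetical helper: schedules one short burst per frequency, back to back.
function playSequence(ctx, frequencies, { type = 'square', noteLength = 0.08, gap = 0.02, volume = 0.15 } = {}) {
  const start = ctx.currentTime;
  const step = noteLength + gap; // time from one note's start to the next

  frequencies.forEach((frequency, i) => {
    const t = start + i * step;
    const osc = ctx.createOscillator();
    const amp = ctx.createGain();

    osc.type = type;
    osc.frequency.setValueAtTime(frequency, t);

    // Each burst starts at full volume and fades to silence.
    amp.gain.setValueAtTime(volume, t);
    amp.gain.linearRampToValueAtTime(0, t + noteLength);

    osc.connect(amp);
    amp.connect(ctx.destination);

    // Scheduling against the audio clock keeps the timing sample-accurate.
    osc.start(t);
    osc.stop(t + noteLength);
  });

  // Time at which the last note's slot ends.
  return start + frequencies.length * step;
}
```

For example, `playSequence(ctx, [523.25, 659.25, 783.99])` would play a rising C5, E5, G5 arpeggio, the kind of three-note blip often used for a pickup sound.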
Envelopes Are Everything
The difference between a pleasant sound and an ear-splitting one is the envelope: how the volume ramps up and fades out. A raw oscillator switched on and off produces an unpleasant click at both ends because of the abrupt amplitude change. The fix is a quick fade-in over a few milliseconds and a smooth fade-out, which removes the click and gives the sound a natural shape. Most of the work in scratch-built game audio is not choosing frequencies; it is sculpting these tiny envelopes so the sounds feel intentional rather than harsh.
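One way to sketch such an envelope is a helper that writes the fade-in and fade-out onto a gain parameter. The name `shapeEnvelope` and the default timings are assumptions for illustration. The fade-out uses an exponential ramp, which tends to sound more natural than a linear one; because an exponential ramp cannot target exactly zero, it aims at a very small value and then sets the gain to zero.

```javascript
// Hypothetical helper: applies a click-free envelope to an AudioParam
// (for example, gainNode.gain) and returns the time the sound ends.
function shapeEnvelope(gainParam, startTime, { peak = 0.2, fadeIn = 0.005, hold = 0.05, fadeOut = 0.1 } = {}) {
  const FLOOR = 0.0001; // exponential ramps need a nonzero target
  const peakTime = startTime + fadeIn;
  const releaseTime = peakTime + hold;
  const endTime = releaseTime + fadeOut;

  gainParam.setValueAtTime(0, startTime);                 // start silent
  gainParam.linearRampToValueAtTime(peak, peakTime);      // quick fade-in, no click
  gainParam.setValueAtTime(peak, releaseTime);            // hold at peak until release
  gainParam.exponentialRampToValueAtTime(FLOOR, endTime); // smooth fade-out
  gainParam.setValueAtTime(0, endTime);                   // land on true silence
  return endTime;
}
```

A caller would create an oscillator and gain node as usual, then do something like `const end = shapeEnvelope(gain.gain, ctx.currentTime); osc.start(ctx.currentTime); osc.stop(end);` so the oscillator stops only after the gain has reached silence.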
Why We Avoid Audio Files Entirely
The honest reason we synthesize everything is performance discipline. The moment you add audio files, you add loading time, decoding overhead, and a dependency on the network. For games that promise to load in under two seconds on any connection, audio files are a liability. Synthesized sound keeps the entire game self-contained and instant. The constraint forces creativity, and the result is games that play their first sound effect the instant the page is interactive, with nothing to download.