Terms, plainly defined.

A working glossary of audio, AI, DSP, accessibility, and live-audio terms used across the labs. Updated whenever a doc surfaces a new term that needs one.

76 terms

AAudio Engineering: Android’s native low-latency audio API, introduced in Android 8.0 and matured significantly in 12+. The recommended path for realtime audio on modern Android.
ABX test SkillLab: A double-blind listening test: the subject is presented with reference A, reference B, and unknown X — and must identify whether X = A or X = B. The gold standard for proving an audible difference exists.
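
To make "proving an audible difference" concrete, here is a minimal sketch of how an ABX score can be checked against chance with a one-sided binomial test. The function name `abx_p_value` is hypothetical, and the sketch assumes independent trials with a 50% guess rate; it is not necessarily how SkillLab scores its tests.

```python
from math import comb


def abx_p_value(correct: int, trials: int) -> float:
    """One-sided binomial p-value: chance of scoring at least `correct`
    out of `trials` by pure guessing (probability 0.5 per trial)."""
    if not 0 <= correct <= trials:
        raise ValueError("correct must be between 0 and trials")
    favourable = sum(comb(trials, k) for k in range(correct, trials + 1))
    return favourable / 2 ** trials


# 12 correct answers out of 16 trials
print(round(abx_p_value(12, 16), 4))  # 0.0384
```

A small p-value means the score is unlikely to come from guessing. The significance threshold (0.05 is a common choice) is a convention to be fixed before the test, not something the test itself provides.
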
AGC VoiceLab: Automatic Gain Control — adaptive amplification that keeps signal levels consistent without operator intervention. Standard in conferencing apps; used carefully in podcast post-production.
Aliasing Engineering: The spurious artefacts produced when a signal contains frequencies above half the sample rate (the Nyquist frequency): those components fold back into the representable band and appear at the wrong, lower frequency. Prevented by anti-aliasing filters before sampling and by oversampling in nonlinear DSP.
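
As an illustration of where an out-of-band tone lands, the following sketch computes the apparent frequency of a pure tone sampled with no anti-aliasing filter. The helper name `alias_frequency` is an assumption made for this example.

```python
def alias_frequency(f_hz: float, sample_rate: float) -> float:
    """Apparent frequency of a pure tone at f_hz after sampling at
    sample_rate with no anti-aliasing filter in front."""
    # Sampling cannot distinguish f from f + k * sample_rate.
    folded = f_hz % sample_rate
    nyquist = sample_rate / 2
    # Anything in the upper half of the band mirrors around Nyquist.
    return sample_rate - folded if folded > nyquist else folded


print(alias_frequency(30_000, 48_000))  # 18000
print(alias_frequency(10_000, 48_000))  # 10000
```

A 30 kHz component sampled at 48 kHz therefore shows up as an 18 kHz tone, while anything already below Nyquist is unchanged.
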
ASHA HearLab: Audio Streaming for Hearing Aids: the Google-defined Bluetooth Low Energy protocol used by Android to stream low-latency audio directly to compatible hearing aids.
ASR VoiceLab: Automatic Speech Recognition, the speech-to-text layer. Examples: Whisper, Deepgram, Amazon Transcribe, Google Speech-to-Text. Mostly a commodity in 2026, at high-90s % accuracy on clean material.
Audiogram HearLab: A graph of hearing threshold (dB HL) versus frequency (Hz), per ear. The standard diagnostic output of an audiology test. Drives the prescription of any modern hearing aid.
AudioWorklet MixLab: The browser API for running audio processing code on a dedicated thread, separate from the main thread. The foundation for any serious WebAudio analysis or synthesis.
Auracast HearLab: Bluetooth LE Audio broadcast mode. A single source streams audio that any nearby compatible device — including hearing aids — can tune to without pairing. The basis for next-generation assisted-listening systems.
Bit depth General: The number of bits per sample. 16-bit covers most consumer use; 24-bit is the production standard; 32-bit float is the working format inside DAWs and analysis tools.
Brickwall limiter MixLab: A look-ahead peak limiter with an absolute ceiling — set the threshold and no sample passes above it. The last stage of nearly every loudness-normalised master.
BS.1770-4 MixLab: The ITU recommendation defining how to measure loudness in audio. The basis for LUFS metering, EBU R128, and streaming-platform normalisation targets.
Buffer size Engineering: The number of audio samples processed per callback. Smaller buffers mean lower latency but more callbacks per second, and so more CPU overhead; larger buffers mean higher latency but better efficiency. 128 or 256 samples is the modern realtime sweet spot.
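
The latency cost of a buffer is simple arithmetic: buffer size divided by sample rate. A minimal sketch follows, with the hypothetical helper `buffer_duration_ms`. It gives the duration of one buffer only; real round-trip latency adds input, output, and driver buffering on top.

```python
def buffer_duration_ms(buffer_size: int, sample_rate: int) -> float:
    """Time span covered by one audio callback, in milliseconds."""
    return 1000.0 * buffer_size / sample_rate


for size in (128, 256, 1024):
    print(size, round(buffer_duration_ms(size, 48_000), 2))
# 128 2.67
# 256 5.33
# 1024 21.33
```
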
Codec General: A coder-decoder pair that compresses and decompresses audio. Lossy: MP3, AAC, Opus, LC3. Lossless: FLAC, ALAC, WAV/PCM (uncompressed). The choice of codec sets the floor for what any downstream tool can achieve.
Crest factor MixLab: The ratio between peak and RMS levels in a signal, expressed in dB. Higher numbers indicate more dynamic range. See the crest factor doc.
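
The definition above translates directly into code. This is a minimal sketch over a plain list of float samples; the name `crest_factor_db` is an assumption for illustration and is not taken from the crest factor doc.

```python
import math


def crest_factor_db(samples: list[float]) -> float:
    """Peak-to-RMS ratio of a block of samples, in dB."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        raise ValueError("crest factor is undefined for silence")
    return 20.0 * math.log10(peak / rms)


# Ten full cycles of a sine wave, and a square wave of the same length
sine = [math.sin(2.0 * math.pi * 10 * n / 1000) for n in range(1000)]
square = [1.0, -1.0] * 500

print(round(crest_factor_db(sine), 2))    # 3.01
print(round(crest_factor_db(square), 2))  # 0.0
```

A sine sits at about 3 dB and a square wave at 0 dB; heavily limited masters trend toward the low end, while unprocessed acoustic recordings sit much higher.
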
Critical listening SkillLab: Trained, focused listening for specific audio attributes — frequency balance, dynamics, distortion, stereo width. The skill SkillLab drills systematically.
Cue list CueLab: An ordered sequence of show-runtime triggers — audio, lighting, video, scenes. The data structure underneath every live-event production tool. See CueLab’s cuelist schema doc.
De-esser VoiceLab: A frequency-selective compressor that targets the 5–9 kHz sibilance band. Reduces harsh "s" and "t" sounds in vocal recordings without dulling the overall top end.
Diarisation SignalLab: Speaker turn segmentation — identifying who spoke when in an audio file. A core capability for the SignalLab indexer.
Dither MixLab: Low-level noise added before bit-depth reduction to randomise quantisation error. Required when converting from 24/32-bit to 16-bit for distribution.
Ducking VoiceLab: Reducing the level of one signal whenever a higher-priority signal is present. The classic example: music ducking under voice in a podcast or radio show.
Dynamic EQ MixLab: An EQ band whose gain reacts to level. Conceptually a single-band compressor with a parametric frequency control. Less destructive than multiband for surgical tonal balancing.
Embedding SignalLab: A learned dense vector representation of audio (or a chunk of audio). The core of modern audio retrieval, similarity search, and zero-shot classification. AudioLab signal indexing uses embeddings under the hood.
FFT MixLab: Fast Fourier Transform — the algorithm at the heart of every frequency-domain operation in audio software, from spectrum analysers to convolution reverbs.
Formant VoiceLab: A resonant frequency band in the human vocal tract. Formant patterns distinguish vowels and reveal speaker identity. Preserved or shifted intentionally by pitch-correction tools.
Hann window MixLab: A common windowing function applied before FFT to reduce spectral leakage. The default windowing choice for most analysis-oriented audio code.
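
For reference, the Hann window is one raised-cosine cycle. The sketch below generates it in pure Python; the function name and the `periodic` flag are assumptions for this example. The periodic form is the one usually paired with overlapping FFT frames, and the symmetric form is the textbook filter-design variant.

```python
import math


def hann_window(length: int, periodic: bool = True) -> list[float]:
    """Hann window coefficients: 0.5 - 0.5 * cos(2 * pi * n / denom)."""
    if length < 2:
        raise ValueError("length must be at least 2")
    # Periodic windows divide by N, symmetric windows by N - 1.
    denom = length if periodic else length - 1
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / denom) for n in range(length)]


print([round(w, 3) for w in hann_window(5, periodic=False)])
# [0.0, 0.5, 1.0, 0.5, 0.0]
```

Each analysis frame is multiplied sample by sample with these coefficients before the FFT, so the frame tapers to zero at its edges instead of being cut off abruptly.
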
Headroom MixLab: The dB margin between the loudest peak in a mix (sample peak or true peak) and 0 dBFS. Streaming masters typically target –1 dBTP / –14 LUFS; broadcast targets are more conservative. See the methodology page for AudioLab’s defaults.
HRTF HearLab: Head-Related Transfer Function — the frequency-domain filter describing how a sound from a specific direction is altered by the listener’s head, torso, and outer ear. The basis for binaural spatial audio.
Inference SignalLab: Running a trained model to produce outputs. Distinct from training. The phase that needs to be fast, predictable, and small enough to fit on-device. The user-visible part of any audio AI product.
Integrated LUFS MixLab: A perceptually-weighted, time-gated measure of loudness across an entire piece of audio. The number streaming platforms use to normalise playback.
JND SkillLab: Just Noticeable Difference. The smallest change in a stimulus that a listener can reliably detect. Drives the calibration of every audio-perception threshold in SkillLab’s drills.
JUCE Engineering: Cross-platform C++ audio framework. The path of least resistance for shipping the same DSP code on Android, iOS, macOS, Windows, and as VST/AU/AAX plugins.
K-weighting MixLab: The pre-filter chain (high-shelf at ~1.5 kHz + high-pass at ~38 Hz) defined in BS.1770-4. Applied before RMS energy summation so the resulting LUFS value tracks perceived loudness rather than raw signal energy.
LC3 HearLab: Low Complexity Communication Codec. The default codec for Bluetooth LE Audio. Materially better than SBC and competitive with AAC at lower bitrates.
LE Audio HearLab: The Bluetooth audio specification introduced in 5.2, including LC3 codec and multi-stream profiles. Material improvement for hearing devices and low-power audio.
Lissajous figure MixLab: A plot of one signal against another, producing a curve whose shape reveals their phase relationship. The classical visualisation behind every "phase scope" in pro audio.
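LRA (Loudness Range) MixLab: The EBU Tech 3342 metric expressing the spread between the loud and quiet sections of a piece of audio, measured statistically as the distance between the 10th and 95th percentiles of the gated short-term loudness distribution. Distinct from peak-to-RMS dynamics; closer to perceived "how much the music breathes".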
Mel spectrogram SignalLab: A spectrogram on the mel frequency scale — perceptually-weighted bins that more closely match human pitch perception than linear frequency. The standard input representation for audio neural networks.
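
The mel scale itself is a simple warping of frequency. The sketch below uses the widely used HTK-style formula; other variants exist (for example the Slaney form), so treat the constants as one common convention rather than the definition. All function names here are hypothetical.

```python
import math


def hz_to_mel(f_hz: float) -> float:
    """HTK-style Hz to mel conversion."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(mel: float) -> float:
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_spaced_frequencies(f_min: float, f_max: float, count: int) -> list[float]:
    """`count` frequencies in Hz, evenly spaced on the mel scale."""
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    step = (hi - lo) / (count - 1)
    return [mel_to_hz(lo + i * step) for i in range(count)]


edges = mel_spaced_frequencies(0.0, 8000.0, 10)
print([round(f) for f in edges])
```

The printed frequencies are packed closely at the low end and spread out toward the top, which is the perceptual weighting the entry describes.
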
MFCC SignalLab: Mel-Frequency Cepstral Coefficients. The classical compact representation of timbral content — 13–40 coefficients summarising the spectral envelope. Still useful as features even in the deep-learning era.
Mid/Side MixLab: A stereo representation where Mid = (L+R)/2 (centre content) and Side = (L−R)/2 (stereo difference). The basis for measuring true stereo width.
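
The two formulas above, plus their inverse, can be sketched as follows. The function names are assumptions for illustration; the decode step (L = Mid + Side, R = Mid − Side) follows directly from the definitions.

```python
def ms_encode(left: list[float], right: list[float]) -> tuple[list[float], list[float]]:
    """Convert left/right samples to mid/side."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    return mid, side


def ms_decode(mid: list[float], side: list[float]) -> tuple[list[float], list[float]]:
    """Convert mid/side samples back to left/right."""
    left = [m + s for m, s in zip(mid, side)]
    right = [m - s for m, s in zip(mid, side)]
    return left, right


left = [0.5, -0.25, 1.0]
right = [0.5, 0.25, -1.0]
mid, side = ms_encode(left, right)
print(mid)   # [0.5, 0.0, 0.0]
print(side)  # [0.0, -0.25, 1.0]
```

Identical channels produce an all-zero Side signal and polarity-inverted channels produce an all-zero Mid signal, which is why the Side-to-Mid energy balance works as a width measure.
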
Mix-minus CueLab: A feed sent to a remote guest containing the full programme audio except their own voice — preventing the “talking-into-a-tunnel” feedback experience. Standard in broadcast and remote-podcast routing.
Momentary LUFS MixLab: A 400 ms sliding-window loudness measurement. The fast-twitch metric — used to catch transient loud passages. Calibrated against the same K-weighting filter as integrated LUFS but without gating.
Multiband compression MixLab: Splitting the signal into 2–6 frequency bands and compressing each independently. The standard tool for taming a single problematic band (boomy lows, harsh upper-mids) without dulling the whole mix.
MUSHRA SignalLab: MUlti Stimulus test with Hidden Reference and Anchor. The standard listening-test methodology (ITU-R BS.1534) for evaluating perceived audio quality at intermediate quality levels, such as comparing lossy codecs.
N-1 feed CueLab: Another name for mix-minus. Literally: a mix of all N sources minus the one going to that destination. Built natively into proper broadcast consoles; emulated via routing matrices in software.
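
The "all N sources minus one" idea can be sketched as a small routing function. The name `mix_minus` and the dictionary layout are assumptions for this example, with unity gain on every source; a real console would apply per-source gain as well.

```python
def mix_minus(sources: dict[str, list[float]]) -> dict[str, list[float]]:
    """For each participant, build a feed that sums every source except their own."""
    feeds = {}
    for destination in sources:
        others = [samples for name, samples in sources.items() if name != destination]
        # zip(*others) walks the remaining sources one sample frame at a time.
        feeds[destination] = [sum(frame) for frame in zip(*others)]
    return feeds


sources = {
    "host": [0.5, 0.25],
    "guest": [0.25, 0.0],
    "music": [0.125, 0.125],
}
print(mix_minus(sources)["guest"])  # [0.625, 0.375]
```

The guest hears host plus music but never their own delayed voice coming back.
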
Neural vocoder VoiceLab: A model that converts an intermediate audio representation (typically a mel spectrogram) into a waveform. WaveNet, HiFi-GAN, BigVGAN. The “last mile” of any modern TTS pipeline.
Nyquist frequency Engineering: Half the sample rate — the theoretical upper bound of representable frequencies in a digital signal. 22.05 kHz at 44.1 kHz; 24 kHz at 48 kHz. The reason mastering chains often work at 96 kHz or higher.
Oboe Engineering: Google’s C++ wrapper around AAudio (with OpenSL ES fallback for legacy devices). The recommended starting point for Android audio code in 2026.
OfflineAudioContext MixLab: WebAudio API variant for offline, faster-than-realtime audio processing. The right choice for batch analysis and rendering. Powers the AudioLab sample-synthesis pipeline.
Onset detection SignalLab: Identifying the start times of musical events — drum hits, note attacks, percussive transients. Foundational for beat tracking, tempo estimation, and segment-level audio analysis.
Pearson correlation MixLab: The statistical measure of linear correlation between two signals, returning a value from −1 to +1. Used in MixLab to compute stereo correlation between L and R channels.
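
A minimal sketch of the calculation on two equal-length sample blocks follows. The function name is hypothetical and this is not MixLab's actual code. Many hardware phase meters skip the mean subtraction; for audio with no DC offset the two versions agree.

```python
import math


def pearson_correlation(left: list[float], right: list[float]) -> float:
    """Pearson correlation of two equal-length sample blocks, from -1 to +1."""
    n = len(left)
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    cov = sum((l - mean_l) * (r - mean_r) for l, r in zip(left, right))
    var_l = sum((l - mean_l) ** 2 for l in left)
    var_r = sum((r - mean_r) ** 2 for r in right)
    if var_l == 0.0 or var_r == 0.0:
        return 0.0  # a silent or constant channel has no defined correlation
    return cov / math.sqrt(var_l * var_r)


block = [0.1, -0.4, 0.3, 0.0]
inverted = [-s for s in block]
print(round(pearson_correlation(block, block), 3))     # 1.0
print(round(pearson_correlation(block, inverted), 3))  # -1.0
```

Values near +1 indicate mono-compatible material, values near 0 indicate wide or decorrelated stereo, and sustained negative values warn of cancellation when the mix is summed to mono.
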
Phantom power VoiceLab: +48 V DC supplied to condenser microphones via the same XLR cable that carries audio. Required to operate the mic capsule; supplied by audio interfaces and mixers.
Phase / polarity Engineering: Polarity is binary (signal inverted or not). Phase is continuous and frequency-dependent. Conflating the two is the most common audio-engineering vocabulary error — and the cause of many comb-filter mysteries.
Phoneme VoiceLab: The smallest unit of speech sound that distinguishes one word from another in a given language. English has roughly 44; phoneme-level alignment underpins every modern TTS and ASR system.
Plosive VoiceLab: Low-frequency burst from "p" and "b" consonants hitting the microphone. Mitigated by pop filters in capture and high-pass filtering or transient editing in post.
Pre-show checklist CueLab: The reusable set of pre-broadcast / pre-event verifications that catches preventable failures. Hardware, routing, levels, stream, backup, in that order. See the pre-show checklist doc.
Prosody VoiceLab: The rhythm, stress, and intonation of speech. The hardest thing for synthetic voices to get right — and the most common giveaway when they don’t. Drives intelligibility and perceived emotion.
Round-trip latency Engineering: The total time from sound capture to sound output through a processing chain. The headline number for any realtime audio app — under 15 ms is the threshold for monitoring-while-tracking.
RT60 VoiceLab: The time required for reverberant sound in a room to decay by 60 dB after the source stops. The standard metric for room acoustics. Practical voice-QA tools estimate it implicitly via envelope-decay analysis.
Sample rate General: The number of samples per second. 44.1 kHz for CD; 48 kHz for broadcast and most modern production; 96 kHz / 192 kHz for high-resolution mastering.
Short-term LUFS MixLab: A 3-second sliding-window loudness measurement. The middle layer between momentary (400 ms) and integrated (whole-program). Drives the Loudness Range (LRA) calculation in EBU Tech 3342.
Sibilance VoiceLab: High-frequency consonant energy from "s", "sh", "t", and "z" sounds. Concentrated around 5–9 kHz. A common source of listener fatigue when over-emphasised.
Sidechain MixLab: An external control signal that triggers a dynamics processor (typically a compressor). The technique behind pumping kick-drum mixes, voice-ducked music beds, and modern de-essing.
SNR General: Signal-to-Noise Ratio. The dB margin between desired signal and background noise. The single number that predicts intelligibility, recording quality, and ML model performance better than any other.
Source separation SignalLab: Splitting a mixed audio signal back into individual stems (vocals, drums, bass, other). Demucs, Spleeter, Open-Unmix. State of the art in 2026 produces clean isolation on most pop music.
Speaker verification VoiceLab: Confirming a claimed speaker identity from a voice sample. The defensive counterpart to voice cloning. Modern systems use d-vector or x-vector embeddings compared via cosine similarity.
Spectral centroid MixLab: The "centre of mass" of the spectrum — a weighted average frequency that corresponds roughly to perceived brightness.
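
The "centre of mass" description corresponds to a magnitude-weighted mean of the bin frequencies. The sketch below assumes the magnitudes come from the non-negative-frequency half of an FFT (bin k sits at k × sample_rate / fft_size); the function name is hypothetical.

```python
def spectral_centroid(magnitudes: list[float], sample_rate: float, fft_size: int) -> float:
    """Magnitude-weighted mean frequency in Hz of a one-sided spectrum."""
    total = sum(magnitudes)
    if total == 0.0:
        return 0.0  # silence: no meaningful centroid
    weighted = sum(k * sample_rate / fft_size * m for k, m in enumerate(magnitudes))
    return weighted / total


# Equal energy at 1 kHz and 3 kHz (8-point FFT at 8 kHz, bins 0..4)
print(spectral_centroid([0.0, 1.0, 0.0, 1.0, 0.0], 8000.0, 8))  # 2000.0
```

Adding high-frequency energy pulls the result upward, which is why it tracks perceived brightness.
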
Speech-in-noise HearLab: A class of metrics and tests (SIN, QuickSIN, HINT) measuring speech intelligibility against varying noise floors. The real-world predictor of hearing-aid benefit, more so than pure-tone audiometry.
Streaks SkillLab: Practice-consistency markers used in SkillLab. Designed to survive a missed day to avoid slot-machine progression dynamics. See progression design.
Tag schema SignalLab: The controlled vocabulary used to describe audio content for downstream retrieval. SignalLab’s schema covers content type, brightness, dynamics, and issue flags. See the tag schema doc.
True peak MixLab: The peak of the reconstructed analog waveform, which can exceed the highest sample value because the waveform may crest between samples (inter-sample peaks). Measured via oversampling.
TTS (Text-to-Speech) VoiceLab: Generating speech audio from text. State of the art in 2026 is neural — typically a text encoder + acoustic model + vocoder pipeline, often distilled into a single model for realtime inference.
VAD (Voice Activity Detection) VoiceLab: A classifier that returns 1 for speech and 0 for non-speech, frame by frame. The first stage of nearly every voice pipeline — from conferencing apps to podcast editors to the VoiceLab QA demo.
Voice cloning VoiceLab: Synthesising speech in the voice of a target speaker from a reference recording — modern models work from 3–30 seconds of reference audio. The basis for both legitimate dubbing/accessibility tools and modern impersonation attacks.
WDRC HearLab: Wide Dynamic Range Compression, the multiband, level-dependent gain processing inside nearly every modern hearing aid. Maps the wide dynamic range of incoming sound into the user’s narrower residual hearing range.
WebAudio API MixLab: The W3C standard for audio processing in browsers. Includes AudioContext, AudioNode, AudioBuffer, AnalyserNode, AudioWorklet, and OfflineAudioContext. The foundation under every browser-side audio tool.