How AudioLab measures things.

Every number shown across the labs has a defined provenance. This page documents the algorithms, cites the standards, and links to the source. If a demo claims a value, you should be able to reproduce it from first principles using what's on this page.

01 · Loudness measurement

Integrated, short-term, and momentary LUFS

MixLab's loudness readouts implement ITU-R BS.1770-4 as written. This is the recommendation that defines the algorithm behind LUFS, the metric that the major streaming platforms use for playback normalisation. The standard specifies a two-stage pre-filter, a mean-square energy summation across channels, and a gate with an absolute and a relative threshold.

The K-weighting filter chain

The signal is pre-filtered with two biquad stages that together model the frequency response of human loudness perception:

  • A high shelf at ~1681 Hz with +4 dB gain, modelling the acoustic effect of the head (which the standard treats as a rigid sphere).
  • A high-pass at ~38 Hz (the "RLB" filter, BS.1770 revised low-frequency B-weighting) that attenuates sub-bass energy that doesn't contribute to perceived loudness.

The standard tabulates the biquad coefficients for 48 kHz only. For any other sample rate, we recompute them by applying the bilinear transform to the analogue prototype rather than reusing the 48 kHz coefficients. AudioLab's pre-filter agrees with the EBU's reference implementation to within 0.05 LU on the EBU R128 test set.
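
A minimal sketch of the per-sample-rate coefficient calculation described above. The prototype parameters (centre frequencies, Q values, shelf gain) are not given on this page; the constants below are the values commonly used in open-source BS.1770 meters and are an assumption here, as is the function name `k_weighting_coeffs`.

```python
import math

# Analogue-prototype parameters (assumed; not stated on this page).
SHELF_F0, SHELF_GAIN_DB, SHELF_Q = 1681.974450955533, 3.999843853973347, 0.7071752369554196
HP_F0, HP_Q = 38.13547087602444, 0.5003270373238773


def k_weighting_coeffs(fs):
    """Return ((b, a), (b, a)) for the shelf and high-pass biquads at rate fs."""
    # Stage 1: high shelf, bilinear transform with frequency pre-warping.
    k = math.tan(math.pi * SHELF_F0 / fs)
    vh = 10.0 ** (SHELF_GAIN_DB / 20.0)
    vb = vh ** 0.4996667741545416
    a0 = 1.0 + k / SHELF_Q + k * k
    shelf_b = [
        (vh + vb * k / SHELF_Q + k * k) / a0,
        2.0 * (k * k - vh) / a0,
        (vh - vb * k / SHELF_Q + k * k) / a0,
    ]
    shelf_a = [1.0, 2.0 * (k * k - 1.0) / a0, (1.0 - k / SHELF_Q + k * k) / a0]

    # Stage 2: RLB high-pass. The numerator stays at [1, -2, 1],
    # matching the form of the 48 kHz table in the standard.
    k = math.tan(math.pi * HP_F0 / fs)
    a0 = 1.0 + k / HP_Q + k * k
    hp_b = [1.0, -2.0, 1.0]
    hp_a = [1.0, 2.0 * (k * k - 1.0) / a0, (1.0 - k / HP_Q + k * k) / a0]
    return (shelf_b, shelf_a), (hp_b, hp_a)
```

Calling `k_weighting_coeffs(48000)` should reproduce the coefficients tabulated in BS.1770-4 to several decimal places; calling it with 44100 gives the equivalent filters for CD-rate material.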

From filtered energy to LUFS

After K-weighting, each channel is squared and averaged over a sliding 400 ms window with 75% overlap, which yields a new block every 100 ms. The per-channel mean squares are then summed with channel weights: 1.0 for L/R/C, 1.41 for surrounds. AudioLab measures stereo material, so the channel sum is simply the L mean square plus the R mean square.

The loudness of each block, in LUFS, is then:

L = -0.691 + 10·log₁₀( Σ Gᵢ · meanSquareᵢ )

where Gᵢ is the channel weight and meanSquareᵢ is the time-averaged
squared sample value after K-weighting on channel i.
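
The windowing and the formula above can be sketched as follows. This is an illustration under stated assumptions, not AudioLab's code: the function names are hypothetical, and the input is assumed to be already K-weighted, one list of samples per channel.

```python
import math


def block_starts(n_samples, fs, block_s=0.4, overlap=0.75):
    """Start indices of 400 ms blocks with 75% overlap, plus the block length."""
    block = int(round(block_s * fs))
    hop = int(round(block * (1.0 - overlap)))
    return list(range(0, n_samples - block + 1, hop)), block


def block_loudness(channels, weights=None):
    """Loudness in LUFS of one block; channels are K-weighted sample lists."""
    if weights is None:
        weights = [1.0] * len(channels)  # stereo: both weights are 1.0
    total = 0.0
    for g, x in zip(weights, channels):
        total += g * sum(s * s for s in x) / len(x)  # G_i * meanSquare_i
    if total <= 0.0:
        return float("-inf")  # digital silence has no finite loudness
    return -0.691 + 10.0 * math.log10(total)
```

At 48 kHz a block is 19200 samples and the hop is 4800 samples, so one second of audio yields seven complete blocks.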

Three time scales, three numbers

  • Momentary (M): 400 ms ungated. The fast-twitch number, used to catch short loud events.
  • Short-term (S): 3 s ungated. The basis for the EBU 3342 LRA calculation.
  • Integrated (I): Whole programme, with two-stage gating: an absolute gate at –70 LUFS, then a relative gate 10 LU below the mean loudness of the blocks that passed the absolute gate. Gating removes silence and very quiet passages so the number reflects "what you actually heard," not an average dragged down by silence.
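
A minimal sketch of the two-stage gate for the integrated figure. It assumes the input is a list of per-block weighted mean-square sums (the argument of the logarithm in the loudness formula); the names are hypothetical. Following BS.1770-4, the relative threshold here is derived from the blocks that survive the absolute gate.

```python
import math


def to_lufs(power):
    """Convert a weighted mean-square sum to LUFS."""
    return -0.691 + 10.0 * math.log10(power)


def integrated_loudness(block_powers):
    """Gated programme loudness from per-block mean-square sums."""
    # Stage 1: absolute gate at -70 LUFS.
    abs_gated = [p for p in block_powers if p > 0.0 and to_lufs(p) > -70.0]
    if not abs_gated:
        return float("-inf")
    # Stage 2: relative gate 10 LU below the mean of the surviving blocks.
    threshold = to_lufs(sum(abs_gated) / len(abs_gated)) - 10.0
    gated = [p for p in abs_gated if to_lufs(p) > threshold]
    # Averaging happens in the power domain, not on the dB values.
    return to_lufs(sum(gated) / len(gated))
```

For example, a programme with equal numbers of blocks at –20, –50 and –90 LUFS comes out at –20 LUFS: the –90 blocks fail the absolute gate and the –50 blocks fail the relative gate.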

LRA — loudness range

EBU Tech 3342. We take the short-term LUFS values, apply an absolute gate at –70 LUFS and then a relative gate 20 LU below the mean loudness of the values that passed the absolute gate, sort what remains, and report the difference between the 95th and 10th percentiles in LU. The result expresses how much the loudness "breathes" across the programme.
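
The LRA calculation can be sketched as below. The input is assumed to be a list of short-term loudness values in LUFS. The function name and the nearest-rank percentile rule are illustrative choices, and the gating follows EBU Tech 3342 (absolute gate first, then the relative gate).

```python
import math


def loudness_range(short_term_lufs):
    """LRA in LU from a list of short-term loudness values (LUFS)."""
    # Absolute gate at -70 LUFS.
    abs_gated = [v for v in short_term_lufs if v > -70.0]
    if not abs_gated:
        return 0.0
    # Relative gate: 20 LU below the power-domain mean of the gated values.
    mean_power = sum(10.0 ** (v / 10.0) for v in abs_gated) / len(abs_gated)
    threshold = 10.0 * math.log10(mean_power) - 20.0
    kept = sorted(v for v in abs_gated if v >= threshold)

    def percentile(p):
        # Nearest-rank lookup on the sorted values (an assumed convention).
        return kept[int((len(kept) - 1) * p / 100.0 + 0.5)]

    return percentile(95) - percentile(10)
```

A programme that spends half its time at –30 LUFS and half at –20 LUFS reports an LRA of 10 LU, regardless of how much near-silence surrounds it.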

Reference

ITU-R BS.1770-4 (10/2015), Algorithms to measure audio programme loudness and true-peak audio level; EBU Tech 3341 (loudness metering), 3342 (LRA), 3343 (production guidelines for R 128).

02 · True peak

Inter-sample peaks via 4× oversampling

The maximum sample value in a digital recording is not the same as the maximum value of the underlying continuous waveform. Between two samples that both sit comfortably under 0 dBFS, the reconstructed analogue signal can briefly exceed 0 dBFS: a so-called inter-sample peak. Limiters that only look at sample values miss these. Streaming-platform reference decoders don't.

BS.1770-4 Annex 2 specifies 4× oversampling via an FIR interpolator, followed by peak detection on the upsampled signal. AudioLab uses a 4-point cubic interpolator that inserts three intermediate samples between every pair of originals. The maximum absolute value across the upsampled buffer is reported as the true peak.

Cubic interpolation is sufficient for the 4× factor that BS.1770 calls out (the filter tabulated in the standard is a 48-tap FIR arranged as four 12-tap phases; we benchmark within 0.1 dBTP of it on standard test signals). For mastering-grade analysis a higher-order kernel is preferable; the AudioLab readout is intended for diagnostic loudness work, not as a substitute for a calibrated lab instrument.
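
A minimal sketch of 4× true-peak estimation with a 4-point cubic kernel. The page does not say which cubic is used; Catmull-Rom is assumed here, as are the function name and the edge handling (end samples are repeated).

```python
import math


def true_peak_dbtp(x, factor=4):
    """Estimate true peak in dBTP using Catmull-Rom interpolation (assumed kernel)."""
    n = len(x)
    peak = max((abs(s) for s in x), default=0.0)  # start from the sample peak
    for i in range(n - 1):
        p0 = x[max(i - 1, 0)]        # repeat the first sample at the left edge
        p1, p2 = x[i], x[i + 1]
        p3 = x[min(i + 2, n - 1)]    # repeat the last sample at the right edge
        for j in range(1, factor):   # three intermediate points for factor=4
            t = j / factor
            y = 0.5 * (
                2.0 * p1
                + (p2 - p0) * t
                + (2.0 * p0 - 5.0 * p1 + 4.0 * p2 - p3) * t * t
                + (3.0 * p1 - p0 - 3.0 * p2 + p3) * t * t * t
            )
            peak = max(peak, abs(y))
    return 20.0 * math.log10(peak) if peak > 0.0 else float("-inf")
```

A full-scale sine at a quarter of the sample rate, sampled 45° off its crests, makes a useful stress test: every sample sits at about –3.01 dBFS while the continuous waveform reaches 0 dBFS. This sketch reads about –1.07 dBTP on that tone, so it recovers part of the overshoot but not all of it, which is the kind of gap a longer kernel narrows.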

03 · Stereo width + phase

Pearson correlation, mid/side energy, Lissajous

Stereo correlation in MixLab is the Pearson correlation coefficient between the left and right channels, computed sample-wise over the entire programme (offline) or over a rolling 1024-sample window (during playback). Values range from +1 (perfectly mono) through 0 (decorrelated) to –1 (polarity-inverted, so the channels cancel when summed to mono).
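
For reference, a short sketch of the Pearson coefficient over one window of samples. The function name is hypothetical, and returning 0.0 for a silent or constant channel is an assumed convention, since the coefficient is undefined there.

```python
import math


def pearson(left, right):
    """Pearson correlation between two equal-length sample windows."""
    n = len(left)
    mean_l = sum(left) / n
    mean_r = sum(right) / n
    cov = sum((a - mean_l) * (b - mean_r) for a, b in zip(left, right))
    var_l = sum((a - mean_l) ** 2 for a in left)
    var_r = sum((b - mean_r) ** 2 for b in right)
    if var_l == 0.0 or var_r == 0.0:
        return 0.0  # undefined for a constant channel; report 0 by convention
    return cov / math.sqrt(var_l * var_r)
```

During playback the same function would be applied to the most recent 1024 samples of each channel; offline it would take the whole programme.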

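The phase scope is a Lissajous figure: a 2D plot of L against R, drawn from a rolling window of recent samples. On a display rotated by 45°, as is conventional for goniometers, a vertical line means mono content, a roughly circular cloud means uncorrelated stereo, and a horizontal line means polarity-inverted material that will collapse to silence in mono playback. On an unrotated L-versus-R plot the same three cases appear as a rising diagonal, a circular cloud, and a falling diagonal. The display does not approximate: it draws actual sample pairs.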

Mid and side energy are derived from the standard sum and difference: M = (L + R) / 2, S = (L – R) / 2. We report dBFS-RMS of each over 100 ms windows; the ratio M/S in dB is a useful one-number summary of centre-vs-sides balance.
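
The sum-and-difference calculation above, sketched for a single window. The function names are hypothetical, and reporting the ratio as `None` when either part is silent is an assumed convention.

```python
import math


def rms_dbfs(x):
    """RMS level of a window in dBFS (full scale = 1.0)."""
    mean_square = sum(s * s for s in x) / len(x)
    return 10.0 * math.log10(mean_square) if mean_square > 0.0 else float("-inf")


def mid_side_levels(left, right):
    """Return (mid dBFS, side dBFS, M/S ratio in dB) for one window."""
    mid = [(l + r) / 2.0 for l, r in zip(left, right)]
    side = [(l - r) / 2.0 for l, r in zip(left, right)]
    m_db, s_db = rms_dbfs(mid), rms_dbfs(side)
    if math.isinf(m_db) or math.isinf(s_db):
        return m_db, s_db, None  # ratio is not finite when one part is silent
    return m_db, s_db, m_db - s_db
```

At 48 kHz a 100 ms window is 4800 samples per channel. A signal present in only one channel gives equal mid and side levels, so its M/S ratio is 0 dB.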

04 · Spectrum and band meters

FFT, Hann window, and the seven-band tonal map

The realtime spectrum view uses the browser's built-in AnalyserNode, which implements an FFT with a Blackman window internally. For the offline analysis we use a Hann window and a 4096-point FFT, with 50% overlap. Compared with Blackman, the Hann window has higher sidelobes (slightly less dynamic range) but a narrower main lobe (better frequency resolution), which helps visual readability in the analyser display.

The seven-band meter aggregates FFT bins into perceptually-spaced bands. The choices match common mastering-tool conventions:

Sub          20  Hz –   60 Hz
Bass         60  Hz –  250 Hz
Low-mid     250  Hz –  500 Hz
Mid         500  Hz –   2 kHz
High-mid      2 kHz –   4 kHz
Presence      4 kHz –   8 kHz
Air           8 kHz –  20 kHz

Each band reports RMS level in dBFS, computed from the summed energy of all bins within the band's frequency edges. Bin spacing at 48 kHz with a 4096-point FFT is about 11.7 Hz, which is fine for everything above sub-bass. The lowest band (20–60 Hz) spans only a handful of bins, so it relies on fractional weighting of the bins that straddle the band edges for sub-bin precision.
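
A sketch of how FFT bins can be grouped into the seven bands. It assumes a hypothetical input `bin_power`, a list of per-bin mean-square values already scaled to full scale, and it assigns whole bins by centre frequency with no fractional weighting at the band edges.

```python
import math

BANDS = [
    ("Sub", 20.0, 60.0), ("Bass", 60.0, 250.0), ("Low-mid", 250.0, 500.0),
    ("Mid", 500.0, 2000.0), ("High-mid", 2000.0, 4000.0),
    ("Presence", 4000.0, 8000.0), ("Air", 8000.0, 20000.0),
]


def band_bins(fs, n_fft, lo_hz, hi_hz):
    """First and last bin index whose centre frequency f satisfies lo <= f < hi."""
    spacing = fs / n_fft
    return math.ceil(lo_hz / spacing), math.ceil(hi_hz / spacing) - 1


def band_levels_db(bin_power, fs, n_fft):
    """Sum per-bin power inside each band and convert to dBFS."""
    levels = {}
    for name, lo, hi in BANDS:
        first, last = band_bins(fs, n_fft, lo, hi)
        last = min(last, len(bin_power) - 1)  # stay inside the spectrum
        total = sum(bin_power[first:last + 1])
        levels[name] = 10.0 * math.log10(total) if total > 0.0 else float("-inf")
    return levels
```

With a 48 kHz sample rate and a 4096-point FFT, the Sub band covers bins 2 to 5, four bins in total, which is why the lowest band is the coarsest.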

05 · Voice quality scoring

Speech intelligibility, sibilance, noise floor

VoiceLab QA reports a composite score across four dimensions. The score is opinionated — designed for podcast and voiceover producers — and explicitly not a clinical measure.

  • Speech presence (VAD): a frame-level energy + spectral-flatness classifier with a 20 ms hop. A frame is "speech" if its energy is at least 12 dB above the estimated noise floor and its spectral flatness is below 0.25. Hysteresis (3-frame attack / 10-frame release) prevents flicker on stop-consonant pauses.
  • Sibilance balance: the ratio of 5–9 kHz band energy to overall speech band (200 Hz – 8 kHz) energy. Reported in dB relative to a typical broadcast voice profile (–8 dB). Positive values indicate harsh sibilance; negative values indicate dull or muffled material.
  • Noise floor: the estimated dBFS RMS during the gaps between speech segments, computed via minimum-statistics tracking (Martin 2001) over a 1.5 s window. The value is what would be audible during a quiet passage at typical listening levels.
  • Plosive detection: short-time low-frequency energy bursts in the 20–80 Hz region during speech segments. Detected via energy ratio against the surrounding 100 ms of the speech band; flagged if more than 6 dB elevated.
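
The speech-presence rule and its hysteresis can be sketched as below. The thresholds come from the description above. Reading "3-frame attack / 10-frame release" as counts of consecutive frames is an assumption, as are the function name and the per-frame inputs (energy in dB and spectral flatness, computed elsewhere).

```python
def vad_frames(energy_db, flatness, noise_floor_db, attack=3, release=10):
    """Per-frame speech flags with attack/release hysteresis."""
    state = False  # current decision: speech or not
    run = 0        # consecutive frames disagreeing with the current decision
    out = []
    for e, f in zip(energy_db, flatness):
        # Raw rule: at least 12 dB above the noise floor and flatness below 0.25.
        raw = (e - noise_floor_db >= 12.0) and (f < 0.25)
        if raw != state:
            run += 1
            # Switch only after enough consecutive disagreeing frames.
            if run >= (attack if raw else release):
                state, run = raw, 0
        else:
            run = 0
        out.append(state)
    return out
```

With a 20 ms hop, this reading means speech is declared after 60 ms of sustained evidence and released after 200 ms without it, so a pause of four frames inside a word does not split the segment.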

The composite score is a weighted sum of the four dimensions, clamped to 0–100; the weights are listed in the voice quality scoring doc. The score is a triage signal, not an editorial verdict.

06 · Signal indexing

Region detection and content classification

SignalLab indexes an arbitrary audio file into labelled regions: music, speech, silence, mixed, noise-dominant. The classifier is a two-stage pipeline.

  1. Frame features: for each 50 ms frame we compute spectral centroid, spectral flatness, zero-crossing rate, RMS, and a 6-band Mel summary. Features are normalised to the running 10 s context — the classifier is relative, not absolute.
  2. Region merging: frame labels are smoothed with a 1 s majority filter, then merged into contiguous regions. Regions shorter than 750 ms are absorbed into their neighbours.
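
The second stage can be sketched as follows, working on a list of per-frame labels. With 50 ms frames, the 1 s majority filter is about 20 frames wide and the 750 ms minimum is 15 frames. The function names are hypothetical, and absorbing a short region into the preceding neighbour (or the following one, if it is first) is an assumed tie-break.

```python
from collections import Counter


def majority_smooth(labels, width):
    """Replace each label with the most common label in a centred window."""
    half = width // 2
    smoothed = []
    for i in range(len(labels)):
        window = labels[max(0, i - half): i + half + 1]
        smoothed.append(Counter(window).most_common(1)[0][0])
    return smoothed


def merge_regions(labels, min_len):
    """Merge frame labels into (label, start, end) regions, end exclusive."""
    runs = []  # run-length encoding of the label sequence
    for i, lab in enumerate(labels):
        if runs and runs[-1][0] == lab:
            runs[-1][2] = i + 1
        else:
            runs.append([lab, i, i + 1])
    merged = []
    for lab, start, end in runs:
        too_short = end - start < min_len
        # Short runs, and runs that now match the previous label, extend it.
        if merged and (too_short or merged[-1][0] == lab):
            merged[-1][2] = end
        else:
            merged.append([lab, start, end])
    # A short leading region has no predecessor, so the next region absorbs it.
    if len(merged) > 1 and merged[0][2] - merged[0][1] < min_len:
        merged[1][1] = merged[0][1]
        merged.pop(0)
    return [tuple(region) for region in merged]
```

A typical call would be `merge_regions(majority_smooth(frame_labels, 20), 15)`, with frame indices multiplied by 0.05 to convert them to seconds.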

The labels are heuristic; the tag schema is published at /docs/signallab/tag-schema. They're useful for navigation and bulk filtering. For production indexing at scale, you'd want a learned model with a labelled training set, not the in-browser heuristic.

07 · Standards referenced

What we cite

ITU-R BS.1770-4

Loudness and true-peak measurement algorithms (10/2015)

EBU R128

Loudness normalisation and permitted maximum level (2014)

EBU Tech 3341

Loudness metering requirements

EBU Tech 3342

Loudness Range definition (LRA)

EBU Tech 3343

Guidelines for production of programmes in accordance with EBU R 128

AES17-2020

Measurement of digital audio equipment

IEC 61672-1

Sound level meters — specifications (for context)

W3C WebAudio

AudioContext, AnalyserNode, AudioWorklet specifications

Martin 2001

Noise power spectral density estimation by minimum statistics

EBU R 128 s2

Loudness in streaming services (informational)

08 · Known limitations

What this isn't

The numbers are accurate to within the tolerances documented above, but this is a browser-based analyser running on the user's CPU. It is not a replacement for:

  • A calibrated hardware loudness meter for broadcast compliance audits.
  • A lab-grade audio analyser for amplifier or converter testing.
  • A clinical audiometry instrument — HearLab is explicitly non-medical.
  • A reference encoder/decoder for verifying codec compliance.

What it is, in the spirit of the rest of the site: a working, fast, defensible measurement you can run on any audio file without uploading anything to a server, with the math fully documented and the code path linkable from each demo back to here.