Estimating room echo without RT60

You don’t need an impulse response to know your room is too live. Here are three pragmatic measurements.

The textbook way to measure room reverberation is RT60 — the time it takes for a known impulse to decay 60 dB. It’s exact, it’s standardised, and it’s entirely unhelpful for QA on a podcast someone already recorded.

For practical voice QA, three measurements get you most of the way there using only the recording itself:

1. Envelope decay after speech offsets

When a speaker stops talking, their voice doesn’t cut to zero — the room rings down. The shape of that ringdown is essentially the room’s impulse response convolved with the voice.

The cheap measurement:

Detect speech-to-silence transitions using simple VAD (energy threshold on a smoothed envelope).
For each transition, measure the mean envelope level over the tail that follows (up to 600 ms, capped at 30 frames in the function below) relative to the level in the last speech frame.
Average the ratio across all transitions in the recording.

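function decayRatio(envelope, mask, hopSec) {
  // Tail length in frames: up to 0.6 s after the offset, capped at 30 frames.
  const tailFrames = Math.max(1, Math.min(30, Math.floor(0.6 / hopSec)));
  const decays = [];
  for (let i = 1; i < mask.length - tailFrames; i++) {
    // Speech-to-silence transition: previous frame is speech, this one is not.
    if (mask[i - 1] && !mask[i]) {
      const peak = envelope[i - 1]; // level in the last speech frame
      if (peak < 1e-5) continue; // too quiet to give a meaningful ratio
      let tail = 0;
      for (let k = 0; k < tailFrames; k++) tail += envelope[i + k];
      decays.push(tail / tailFrames / peak);
    }
  }
  // No usable offsets: return NaN explicitly instead of dividing 0 by 0.
  if (decays.length === 0) return NaN;
  return decays.reduce((a, b) => a + b, 0) / decays.length;
}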
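decayRatio takes an envelope and a speech mask as given. A minimal sketch of one way to build both from raw samples follows, assuming mono floating-point audio in the range -1 to 1. The helper name envelopeAndMask, the -40 dB threshold and the smoothing factor are illustrative assumptions, not values from VoiceLab.

```javascript
// Hypothetical helper (not part of VoiceLab): frame-level RMS envelope with
// one-pole smoothing, plus an energy-threshold VAD mask.
// thresholdDb and smooth are assumed defaults chosen for illustration.
function envelopeAndMask(samples, sampleRate, hopSec = 0.02, thresholdDb = -40, smooth = 0.5) {
  const hop = Math.max(1, Math.round(hopSec * sampleRate)); // samples per frame
  const frames = Math.floor(samples.length / hop);
  const envelope = new Float32Array(frames);
  const mask = new Array(frames).fill(false);
  const threshold = Math.pow(10, thresholdDb / 20); // dB to linear amplitude
  let prev = 0;
  for (let f = 0; f < frames; f++) {
    let sumSq = 0;
    for (let n = f * hop; n < (f + 1) * hop; n++) sumSq += samples[n] * samples[n];
    const rms = Math.sqrt(sumSq / hop);
    // One-pole smoothing: each frame blends the previous envelope with the new RMS.
    prev = smooth * prev + (1 - smooth) * rms;
    envelope[f] = prev;
    mask[f] = prev > threshold; // true = speech, false = silence
  }
  // Return the effective hop so it can be passed straight to decayRatio.
  return { envelope, mask, hopSec: hop / sampleRate };
}
```

The smoothing has a decay tail of its own, and that tail leaks into the measured ratio. Keep the smoothing short relative to the tail window, or dry rooms will read as livelier than they are.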

Values typically fall into these ranges:

Mean decay ratio	Room character
< 0.05	Dry / treated booth
0.05–0.15	Tight room
0.15–0.35	Live room
> 0.35	Reverberant

This is exactly the signal MixLab and VoiceLab use to bucket “room” character.
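The table can be sketched as a lookup. This is only an illustration of the thresholds above, not the MixLab or VoiceLab implementation. The function name roomCharacter, the handling of values that sit exactly on a boundary, and the "Unknown" label for non-finite input are assumptions.

```javascript
// Hypothetical mapping from mean decay ratio to the room labels in the table.
// Boundary handling (which bucket 0.05, 0.15 and 0.35 fall into) is an assumption.
function roomCharacter(meanDecayRatio) {
  if (!Number.isFinite(meanDecayRatio)) return "Unknown"; // e.g. no speech offsets found
  if (meanDecayRatio < 0.05) return "Dry / treated booth";
  if (meanDecayRatio < 0.15) return "Tight room";
  if (meanDecayRatio <= 0.35) return "Live room";
  return "Reverberant";
}
```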

2. Spectral autocorrelation

Reverberant rooms colour the spectrum in characteristic ways: comb-filter notches from early reflections, and a slow tilt from the room mode distribution. A spectral autocorrelation on a 4-second speech window reveals both.

In practice the spectral autocorrelation is harder to interpret than the envelope decay, but it’s the right tool when you want to detect which kind of room problem you have (early reflections vs late reverb). For QA purposes, envelope decay alone is usually enough.
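As a rough sketch of the idea: a single reflection delayed by τ seconds adds a ripple to the spectrum that repeats every 1/τ Hz, so autocorrelating the log-magnitude spectrum across frequency bins shows a peak at the lag matching that spacing. The code assumes you already have the log-magnitude spectrum of the speech window from an FFT (not shown). All function names here are hypothetical.

```javascript
// Normalised autocorrelation of a log-magnitude spectrum across frequency bins.
// logMag: array of log-magnitude values, one per FFT bin (assumed precomputed).
function spectralAutocorrelation(logMag, maxLag) {
  const n = logMag.length;
  const mean = logMag.reduce((a, b) => a + b, 0) / n;
  const centred = logMag.map((v) => v - mean); // remove the overall level
  const energy = centred.reduce((a, v) => a + v * v, 0);
  const acf = new Array(maxLag + 1).fill(0);
  if (energy === 0) return acf; // perfectly flat spectrum: nothing to correlate
  for (let lag = 0; lag <= maxLag; lag++) {
    let sum = 0;
    for (let b = 0; b + lag < n; b++) sum += centred[b] * centred[b + lag];
    acf[lag] = sum / energy; // acf[0] is 1 by construction
  }
  return acf;
}

// Lag (in bins) of the strongest peak at or above minLag.
// minLag skips the trivial peak around lag 0.
function strongestRippleLag(acf, minLag) {
  let best = minLag;
  for (let lag = minLag; lag < acf.length; lag++) {
    if (acf[lag] > acf[best]) best = lag;
  }
  return best;
}

// Convert a ripple period in bins to a reflection delay in seconds.
// Bin spacing is sampleRate / fftSize, so delay = 1 / (lag * binSpacing).
function lagToDelaySec(lag, sampleRate, fftSize) {
  return fftSize / (lag * sampleRate);
}
```

A strong peak at a short lag suggests a distinct early reflection. Late reverb has no single delay, so it tends not to produce one clean peak.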

3. Vowel-segment voice quality

The cleanest measurement comes from analysing sustained vowel segments. Pick the longest steady-state vowel in the recording (usually around 200 ms of voiced energy with a low zero-crossing rate, or ZCR), and look at its spectral flatness. A clean voice in a dry room has a tonal vowel with distinct harmonic peaks. A reverberant room smears those peaks into a noisier spectrum.

This requires more work (vowel detection isn’t trivial without a phoneme model) but it’s the most listener-relevant measurement because it correlates directly with how “tubby” the voice sounds.
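Spectral flatness itself is cheap once you have a vowel segment. It is the ratio of the geometric mean to the arithmetic mean of the power spectrum: close to 0 for a tonal spectrum with sharp harmonic peaks, close to 1 for a noise-like one. A minimal sketch, assuming the power spectrum of the vowel segment is already computed; the function name and the floor value are illustrative.

```javascript
// Spectral flatness: geometric mean / arithmetic mean of a power spectrum.
// powerSpectrum: non-negative power per FFT bin (assumed precomputed).
// floor is an assumed small constant that keeps Math.log away from zero.
function spectralFlatness(powerSpectrum, floor = 1e-12) {
  const n = powerSpectrum.length;
  if (n === 0) return NaN;
  let logSum = 0;
  let sum = 0;
  for (const p of powerSpectrum) {
    const v = Math.max(p, floor);
    logSum += Math.log(v); // accumulate in the log domain to avoid underflow
    sum += v;
  }
  return Math.exp(logSum / n) / (sum / n);
}
```

Compare the value across recordings of the same speaker rather than against a fixed threshold, since voices differ in how tonal their vowels are to begin with.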

Why RT60 is wrong for QA

RT60 wants you to inject a known stimulus into the room. You can’t do that on a recording someone already made. Even if you could, RT60 measures the room, not the recording — and the recording is what your listeners will hear.

For QA, the question isn’t “what is this room’s RT60?” It’s “does this recording sound like it was made in a controlled space?” The envelope-decay measurement answers exactly that question, with no calibration, no specialised equipment, and no microphone-position assumptions.
