Increasing Intelligibility

In any worship environment, spoken word clarity is essential. Photo courtesy Sennheiser

In public spaces where people speak to an audience, the speakers’ voices must be heard clearly. That is the key purpose of the space, and houses of worship are a perfect example. We want every syllable of every word heard clearly by the congregation. There are two approaches we can take to accomplish this goal: electronic and acoustic. Both are important, and both should be considered. Electronic solutions tend to be less expensive and less disruptive, so it’s only natural to gravitate toward them, but we should be prepared to look into acoustical solutions nevertheless.

It All Starts with the Mic

For our purposes, we’ll presume that a P.A. system will be part of our consideration. Choosing the proper kind of microphone is a fairly important factor in determining intelligibility. Mics mounted in the pulpit, whether a slender gooseneck or a rock ‘n’ roll dynamic on a mic stand, can work, but are susceptible to level fluctuations, as our speaker may become animated on occasion, bobbing and weaving in and out of the mic’s pickup pattern. We can reduce these level fluctuations by choosing an omnidirectional pattern over a cardioid, but in so doing, we increase the likelihood of feedback. It’s a risk that must be weighed if choosing a stationary mic in the pulpit.

Fluctuations in level caused by an animated speaker can be virtually eliminated by choosing a mic that remains close to the speaker’s mouth no matter how much they move around. A lapel mic is superior to a pulpit mic in this regard, but is still less than optimal. If its polar pattern is cardioid, level can fluctuate if our speaker turns his or her head to the side. If its polar pattern is omnidirectional, we suffer more risk of feedback. A headset mic helps to ensure that the mic is kept in close proximity to the speaker’s mouth no matter how they move. We still have to choose between omni and cardioid polar patterns, but with a headset the feedback risk of an omni is mitigated a bit: because the capsule sits so close to the speaker’s mouth, it needs less gain. Either way, a headset mic is probably the best option in terms of increasing intelligibility.
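To put a rough number on how much gain a closer capsule can save, here is a minimal sketch based on the inverse-square law. It treats the mouth as an ideal point source in a free field, which is only an approximation at very short distances, and the two mic distances are hypothetical values chosen for illustration.

```python
import math

def level_change_db(old_distance_m: float, new_distance_m: float) -> float:
    """Change in direct-sound level at the capsule when the mic moves from
    old_distance_m to new_distance_m (inverse-square law, free field)."""
    return 20.0 * math.log10(old_distance_m / new_distance_m)

# Hypothetical distances: a lapel mic about 20 cm from the mouth,
# a headset capsule about 2.5 cm from the mouth.
gain_saved_db = level_change_db(0.20, 0.025)
print(f"Headset capsule picks up roughly {gain_saved_db:.1f} dB more direct level")
```

Under these assumptions the headset capsule receives about 18 dB more direct sound, so the preamp can be turned down by a similar amount, which is the extra margin against feedback described above.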

Knob Turning

Once we have chosen and placed a microphone, we move on to the electronic segment of the system. Equalization is a really important part of this segment. Try this experiment (it’s most easily done with a recorded voice in a digital audio workstation): Filter out everything above 500 Hz with a pretty steep LPF (24 dB/oct or greater). Intelligibility will be very low, as there’s nothing but vowel sounds down there. Now move the cutoff frequency up to 1 kHz, and clarity improves a bit. At 2 kHz it’s even better. At 4 kHz it should be pretty solid. At 8 kHz, even better! Finally, at 12 kHz, there’s not much notable difference from our 8 kHz filter. The conclusion: we can’t get by with only the mids and low-mids, because vowel sounds alone won’t cut it. We have also learned that we don’t gain that much additional clarity above a certain point. The sibilant frequencies from the high-mids into the highs are the ones that give us clarity. Sibilance is not all bad, and it’s not necessarily the enemy we’ve all learned to fear and loathe. Of course there is such a thing as too much, hence the existence of de-essers, but there is also such a thing as too little. We just need to find the balance. I recommend using a moderately steep HPF to attenuate thumps, bumps, and plosives, typically around 75 Hz, but possibly higher for female speakers.
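For anyone who would rather try the low-pass experiment in code than in a DAW, the following is a minimal sketch using only the Python standard library. The design choices are assumptions, not something this article prescribes: a 4th-order (24 dB/oct) Butterworth response built from two biquad sections using the widely published "Audio EQ Cookbook" formulas, a 48 kHz sample rate, and a 6 kHz probe frequency standing in for consonant energy.

```python
import cmath
import math

# Q values of the two second-order sections of a 4th-order Butterworth filter.
BUTTERWORTH_4_Q = (0.5411961, 1.3065630)

def lowpass_biquad(cutoff_hz, q, fs):
    """Normalized (b, a) coefficients for one 12 dB/oct low-pass section."""
    w0 = 2.0 * math.pi * cutoff_hz / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha
    b = ((1.0 - cos_w0) / 2.0 / a0, (1.0 - cos_w0) / a0, (1.0 - cos_w0) / 2.0 / a0)
    a = (1.0, -2.0 * cos_w0 / a0, (1.0 - alpha) / a0)
    return b, a

def lowpass_24db(cutoff_hz, fs):
    """Two cascaded sections give a 24 dB/oct slope."""
    return [lowpass_biquad(cutoff_hz, q, fs) for q in BUTTERWORTH_4_Q]

def apply_filter(sections, samples):
    """Run a list of samples through each section in series (direct form I)."""
    out = list(samples)
    for b, a in sections:
        x1 = x2 = y1 = y2 = 0.0
        filtered = []
        for x in out:
            y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
            x2, x1 = x1, x
            y2, y1 = y1, y
            filtered.append(y)
        out = filtered
    return out

def gain_db(sections, freq_hz, fs):
    """Steady-state gain of the cascade at one frequency, in dB."""
    z1 = cmath.exp(-2j * math.pi * freq_hz / fs)  # z^-1 on the unit circle
    h = 1.0 + 0j
    for b, a in sections:
        h *= (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20.0 * math.log10(abs(h))

FS = 48000        # assumed sample rate
PROBE_HZ = 6000   # hypothetical stand-in for consonant energy
for cutoff in (500, 1000, 2000, 4000, 8000, 12000):
    sections = lowpass_24db(cutoff, FS)
    print(f"cutoff {cutoff:>5} Hz: {gain_db(sections, PROBE_HZ, FS):7.1f} dB at {PROBE_HZ} Hz")
```

To hear the effect rather than just read the numbers, pass decoded voice samples (a list of floats at the assumed sample rate) through `apply_filter` for each cutoff and compare the results by ear.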

I also recommend the age-old, tried-and-true method of cutting before boosting. The elimination of undesirable frequency components can leave us with a nice, intelligible signal. There’s typically mud from about 150 to 250 Hz, and cutting there can have a profound clarifying effect. If there’s any nasal resonance, it’s likely in the 500 to 1,000 Hz domain, and attenuating a frequency or two in that range can improve clarity (and may sound more flattering for speakers who tend toward a nasal thinness). Getting up to 1.5 to 2.5 kHz, we enter the domain of the intelligibility frequencies. Here’s where balance is the key word. Too much can sound overly sibilant, but too little sounds muffled and unclear. We also need to sufficiently present the segment from 2.5 kHz up to the limit of frequencies audible to humans, again in a balanced way: too much is overly crisp and too little is muffled. Equalizing for intelligibility requires a little extra focus on the domain north of about 1.5 kHz.
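As an illustration of a "cut" in the mud range, here is a minimal sketch of a single peaking (bell) filter, again using the published "Audio EQ Cookbook" biquad formulas. The center frequency, the 3 dB depth, the Q of 1.4, and the 48 kHz sample rate are all hypothetical starting points; the right values depend on the voice and the room.

```python
import cmath
import math

def peaking_biquad(center_hz, gain_db, q, fs):
    """Normalized (b, a) coefficients for a bell filter; negative gain_db cuts."""
    a_lin = 10.0 ** (gain_db / 40.0)
    w0 = 2.0 * math.pi * center_hz / fs
    alpha = math.sin(w0) / (2.0 * q)
    cos_w0 = math.cos(w0)
    a0 = 1.0 + alpha / a_lin
    b = ((1.0 + alpha * a_lin) / a0, -2.0 * cos_w0 / a0, (1.0 - alpha * a_lin) / a0)
    a = (1.0, -2.0 * cos_w0 / a0, (1.0 - alpha / a_lin) / a0)
    return b, a

def response_db(b, a, freq_hz, fs):
    """Steady-state gain of the filter at one frequency, in dB."""
    z1 = cmath.exp(-2j * math.pi * freq_hz / fs)  # z^-1 on the unit circle
    h = (b[0] + b[1] * z1 + b[2] * z1 * z1) / (a[0] + a[1] * z1 + a[2] * z1 * z1)
    return 20.0 * math.log10(abs(h))

FS = 48000  # assumed sample rate
# Hypothetical mud cut: 3 dB down, centered at 200 Hz.
b, a = peaking_biquad(center_hz=200.0, gain_db=-3.0, q=1.4, fs=FS)
for freq in (100, 200, 400, 2000):
    print(f"{freq:>5} Hz: {response_db(b, a, freq, FS):6.2f} dB")
```

The cut is deepest at the center frequency and fades toward 0 dB on either side, so the intelligibility range above 1.5 kHz is left essentially untouched.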

We may discover in the course of ensuring that these higher frequencies are well-represented that some ranges are over-represented. This is the bad sibilance that your mother warned you about, and we have developed means by which to keep it under control — the venerable de-esser. Keeping our mic in close proximity to the speaker’s mouth has a multitude of benefits, but it does come with a potential negative — harshness and excessive sibilance. We can diminish this excess with frequency-limited compression — a process known as de-essing. It’s important to find a proper balance here — too much de-essing reduces intelligibility, but too little fails to control harshness. Spend some time making sure de-essers are properly configured for each speaker who may take to the pulpit.

Consistency of level is a very important attribute of intelligibility — and our go-to tool to increase it is the compressor. Reduction of dynamic range can literally be defined as making level more consistent, and it’s right in line with our goal of making every syllable of every word clear and audible. As with all tools, we must be careful — we don’t want the compressor constantly hammering away — crushing the dynamic range of the speaker into oblivion. Gentle, transparent compression sounds more natural. If it’s possible to do so, using more than one stage of compression in a serial signal path is a nice way to accomplish this. The first compressor tames the loudest peaks, feeding its output into the next compressor, which further smooths out the level. A limiter at the end of the entire vocal chain can be helpful, but easy does it — we’re not going for a hard-sell broadcast ad voiceover sound here — our speaker needs to have sufficient dynamic range.
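The serial approach can be sketched with a static compressor curve, ignoring attack and release behavior entirely. The thresholds and ratios below are hypothetical settings chosen to show the idea of a peak-taming stage followed by a gentler smoothing stage; they are not recommended values.

```python
def compress_db(level_db, threshold_db, ratio):
    """Static compressor curve: above the threshold, output level rises
    only 1/ratio dB for every 1 dB of input."""
    if level_db <= threshold_db:
        return level_db
    return threshold_db + (level_db - threshold_db) / ratio

def two_stage(level_db):
    """Two compressors in series (hypothetical settings)."""
    # Stage 1: high threshold, firmer ratio, catches only the loudest peaks.
    peak_tamed = compress_db(level_db, threshold_db=-6.0, ratio=4.0)
    # Stage 2: lower threshold, gentle ratio, smooths the overall level.
    return compress_db(peak_tamed, threshold_db=-20.0, ratio=2.0)

for level in (-30.0, -20.0, -10.0, 0.0):
    print(f"in {level:6.1f} dB -> out {two_stage(level):6.2f} dB")
```

With these settings a 30 dB spread between the quietest and loudest input levels comes out as a spread of a little under 18 dB, and neither stage has to work very hard on its own.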

The Acoustics Side

Moving beyond electronic methods of increasing intelligibility, we need to look at acoustics. A principal enemy of intelligibility is natural reverberation. When we hear speech in a sufficiently large and reverberant space, amplified or not, the clean, clear source right from the speaker’s mouth competes with reflections from the room’s surfaces. Every room where church services happen will exhibit some kind of reverberance. Our goal is to maximize the ratio of the level of the original source of the voice to the level of the reflections from the room. Because the SPL of the reflections rises along with the level of the original source, we can’t just turn the voice up louder. We have to deal with the reflections.
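The arithmetic behind that point is simple enough to sketch. The levels below are hypothetical values at a single listening seat, and the 4 dB reduction from acoustic treatment is an assumed figure for illustration only.

```python
def direct_to_reverberant_db(direct_db, reverberant_db):
    """Ratio of direct sound to reflected sound, expressed in dB."""
    return direct_db - reverberant_db

# Hypothetical levels at one seat, in dB SPL.
direct, reverberant = 70.0, 68.0
baseline = direct_to_reverberant_db(direct, reverberant)            # 2 dB

# Turning the voice up 6 dB raises direct sound and reflections alike.
louder = direct_to_reverberant_db(direct + 6.0, reverberant + 6.0)  # still 2 dB

# Assumed effect of absorption: reflections drop 4 dB, direct sound unchanged.
treated = direct_to_reverberant_db(direct, reverberant - 4.0)       # 6 dB

print(baseline, louder, treated)
```

Only the last case changes the ratio, which is why the reflections themselves have to be addressed.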

There are volumes of information written about how to tamp down the reflections in an overly reverberant space, so we won’t go into excruciating detail on the how-to… other than to say that we’ll want to accomplish two main things: attenuate reflections by covering large, reflective surfaces with absorptive materials, and ensure that our loudspeakers are placed in such a way that they’re directing the important intelligibility frequencies toward the ears of our congregation, and (hopefully) not toward reflective surfaces. We’ll want loudspeakers with controlled directivity, which gives us the capability of directing sound energy toward our congregation and away from reflective surfaces. By taking these points into consideration, we can further increase intelligibility beyond the improvements made possible by electronic solutions alone.

If all these suggestions are deployed, substantial increases in intelligibility can be achieved, and the congregation will be thankful that they can hear the words of our speakers clearly.

John McJunkin is the chief engineer and staff producer in the studio at Grand Canyon University.