
I am using Asterisk to interact with analog telephony devices that can be programmed and tested with DTMF interaction.

Some of these guys speak rather quickly. Too quickly, you could convincingly argue; I'd be right with you there. And yet, Asterisk is perfectly capable of hearing the tones, and if I'm lucky enough to get a pure stream with in-band DTMF audio, I can recognize even really fast tones very successfully.

The problem arises when Asterisk (or another telephony system) decides it needs to recognize and regenerate the DTMF. I realize this is important when translating e.g. to/from out-of-band DTMF, but I'm not sure why this seems to be the default behavior, and in particular why the tones are often regenerated with lengthy durations (e.g. 100ms; thankfully, in Asterisk, this can be changed, although it can involve a recompile) that are almost guaranteed to mean loss of digits. Others have reported issues where in-band conversion to out-of-band resulted in duplicated digits, even though the conversion was not necessary.
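To make the digit loss concrete, here's a rough back-of-the-envelope sketch in Python. All of the timings are hypothetical, purely to illustrate the arithmetic: if digits arrive faster than the regenerated tones can be replayed, the backlog grows with every digit and something eventually has to be dropped.

    # Hypothetical timings, just to illustrate why long regenerated tones lose digits.
    incoming_tone_ms = 40    # device keys each tone for 40 ms (assumed)
    incoming_gap_ms = 40     # 40 ms of silence between tones (assumed)
    regen_tone_ms = 100      # regenerated tone length (the lengthy default)
    regen_gap_ms = 50        # minimum inter-digit gap on regeneration (assumed)

    incoming_period = incoming_tone_ms + incoming_gap_ms   # 80 ms per arriving digit
    regen_period = regen_tone_ms + regen_gap_ms            # 150 ms per replayed digit

    digits = 20
    print(f"{digits} digits arrive in {digits * incoming_period} ms,")
    print(f"but take {digits * regen_period} ms to replay;")
    print(f"the backlog grows by {regen_period - incoming_period} ms per digit.")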

So my question is: why is this the M.O. for telephony systems? Why not leave in-call DTMF alone unless translation is explicitly required?

zigg
  • Interesting question, though "why does X do Y in this manner?" type questions are typically more appropriate for topic-specific venues, perhaps the asterisk-users mailing list. – EEAA Feb 05 '13 at 13:44

1 Answer


Take a high-fidelity CD recording of your favorite song.
Record it using the cheapest microphone you can find.
Encode the recording with a lousy 8-bit audio codec optimized for spoken words.
Play the recording back through a cheap speaker (and wiggle the wires).

If you listen to the CD and the chain above side-by-side you'll hear how badly mangled things get in telephony. Now imagine that instead of a song you recorded DTMF tones and were trying to play them back and get a computer to recognize them.

This is why most VoIP systems re-encode DTMF tones using an out-of-band channel (like RFC 2833) -- the compression, network jitter, latency, and potential packet loss make audio-encoded DTMF prone to failure.
By sending the DTMF tones as out-of-band data they can be reinserted into the audio stream at the endpoint closest to the PSTN, minimizing the risk that the tones will be mangled.
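For a sense of how small and robust the out-of-band representation is, here is a minimal Python sketch (not a full RTP stack) of the 4-byte telephone-event payload defined by RFC 2833 (now RFC 4733). The digit travels as a small integer plus a duration, so no audio codec ever touches it; the volume and duration values below are just illustrative.

    import struct

    # Event codes per RFC 4733: 0-9 -> digits, 10 -> '*', 11 -> '#', 12-15 -> A-D
    EVENTS = {**{str(d): d for d in range(10)}, "*": 10, "#": 11,
              "A": 12, "B": 13, "C": 14, "D": 15}

    def pack_telephone_event(digit, duration_samples, end=False, volume=10):
        """Build the 4-byte payload: event, E-bit/volume byte, 16-bit duration."""
        flags = (0x80 if end else 0x00) | (volume & 0x3F)   # E bit + 6-bit volume
        return struct.pack("!BBH", EVENTS[digit], flags, duration_samples)

    def unpack_telephone_event(payload):
        event, flags, duration = struct.unpack("!BBH", payload)
        digit = {v: k for k, v in EVENTS.items()}[event]
        return digit, bool(flags & 0x80), flags & 0x3F, duration

    # A 50 ms '5' at an 8 kHz clock is 400 timestamp units, with the end flag set.
    payload = pack_telephone_event("5", 400, end=True)
    print(unpack_telephone_event(payload))   # ('5', True, 10, 400)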

Why 100ms? Because some telephone lines or remote ends have trouble with shorter tone durations (if you've ever called a touch-tone system over a noisy land line you've probably held a button for a few seconds in frustration to get the system to recognize the tone).
(100ms is probably too long - 20-50ms is more than adequate)
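The "20-50ms is more than adequate" claim is easy to sanity-check with some rough arithmetic (illustrative numbers only): at the 8 kHz telephony sample rate, a detector looking at one tone's worth of samples gets a frequency resolution of roughly the sample rate divided by the sample count, and the closest DTMF tones are only about 70 Hz apart.

    RATE = 8000                                   # telephony sample rate, Hz
    for tone_ms in (20, 50, 100):
        n = RATE * tone_ms // 1000                # samples available in one tone
        resolution = RATE / n                     # rough frequency resolution, Hz
        print(f"{tone_ms} ms tone -> {n} samples -> ~{resolution:.0f} Hz resolution")

    # Even 20 ms gives 160 samples and ~50 Hz resolution, comfortably finer than
    # the ~70 Hz spacing between the closest DTMF row tones (697 Hz vs 770 Hz).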


You don't have to use out-of-band signaling -- most VoIP systems will deal with in-band signaling (you typically have to set a parameter on your phone and your server to do so), and you must use high-quality codecs or disable compression entirely if you want a real shot at reliability.
Most people deploying them elect to use RFC 2833 (and re-encode DTMF received in-band) instead because it is substantially more reliable.
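To illustrate what "dealing with in-band signaling" actually involves, here is a minimal Python sketch of a Goertzel-based detector, the technique most in-band DTMF recognizers are built on (this is a toy, not Asterisk's implementation); it only works when the audio reaches it reasonably intact, which is exactly why compressed codecs are a problem.

    import math

    RATE = 8000                              # telephony sample rate, Hz
    ROWS = (697, 770, 852, 941)              # standard DTMF row frequencies
    COLS = (1209, 1336, 1477, 1633)          # standard DTMF column frequencies
    KEYPAD = ["123A", "456B", "789C", "*0#D"]

    def goertzel_power(samples, freq):
        """Signal power at one target frequency over the given block."""
        coeff = 2.0 * math.cos(2.0 * math.pi * freq / RATE)
        s1 = s2 = 0.0
        for x in samples:
            s1, s2 = x + coeff * s1 - s2, s1
        return s1 * s1 + s2 * s2 - coeff * s1 * s2

    def detect_digit(samples):
        """Pick the strongest row and column tone and map them to a key."""
        row = max(ROWS, key=lambda f: goertzel_power(samples, f))
        col = max(COLS, key=lambda f: goertzel_power(samples, f))
        return KEYPAD[ROWS.index(row)][COLS.index(col)]

    # 40 ms of a clean '8' (852 Hz + 1336 Hz) is detected without trouble.
    tone = [0.45 * (math.sin(2 * math.pi * 852 * n / RATE) +
                    math.sin(2 * math.pi * 1336 * n / RATE)) for n in range(320)]
    print(detect_digit(tone))                # '8'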

voretaq7
  • So the reason that some systems always do this regeneration, even if VoIP technologies are not actually involved, is that they are engineered to treat everything out-of-band? – zigg Feb 13 '13 at 16:53
  • @zigg I don't know of any non-VoIP system that regenerates DTMF tones. A POTS PBX simply passes the audio (including DTMF) that it hears. VoIP systems regenerate for the reasons I mentioned (compression, jitter, latency, packet loss mangling the audio channel) – voretaq7 Feb 13 '13 at 17:13
  • I've seen Asterisk do this if local channels are involved in a call (DAHDI-to-DAHDI may native bridge and then this does not happen.) I don't have hands-on experience with other PBX systems, but at times it seems we're dealing with systems that are regenerating. Cellular networks are also a culprit, though their architecture could include out-of-band DTMF as you describe. I'm not sure. – zigg Feb 13 '13 at 21:06