7

Assume sensitive audio emissions from a mechanical keyboard. These audio emissions are often sufficient to reconstruct the actual key presses that generated the sound. If the audio is compressed using a narrowband audio codec such as G.711, how much of the information is destroyed?

Put another way, can acoustic side-channel attacks ever be done using modern telephony?

forest
  • 2
    From [Don't Skype & Type! Acoustic Eavesdropping in Voice-Over-IP](https://www.researchgate.net/publication/308744204_Don't_Skype_Type_Acoustic_Eavesdropping_in_Voice-Over-IP): *"... In fact, we show that very popular VoIP software (Skype) conveys enough audio information to reconstruct the victim's input -- keystrokes typed on the remote keyboard. ..."*. – Steffen Ullrich Apr 27 '21 at 04:10
  • 2
    @SteffenUllrich The attack is against Skype which uses the SILK codec, which is much more capable than G.711, G.729, or other common codecs used in cellular telephony. Most VoIP software tends to use higher quality codecs than your local cell tower, SIP trunk, or whatever. – forest Apr 27 '21 at 06:49
  • @forest shouldn't you be comparing to AMR-WB and EVS in 2021? – hobbs Apr 27 '21 at 21:46
  • @hobbs I have no idea. My knowledge of cellular telephony is very primitive. – forest Apr 27 '21 at 23:07
  • @forest was about to say that: even if PHONE conversations are safe, most telecommunication is done using a number of other programs, almost all of which use a codec with more fidelity (and more potential info leak). And since OP said "modern telephony", a WhatsApp call or so doesn't seem too much out of scope. – Hobbamok Apr 29 '21 at 13:55
  • 1
    @Hobbamok I suppose I should have said typical cellular telephony that can go over PSTN. I'm sure there are VoIP clients which support significantly higher sample rates, etc. – forest Apr 30 '21 at 00:07
  • @forest well, my point was that modern cell phones on modern networks *also* support codecs that are more wideband than ye olde 1980s GSM... and in fact more wideband than conventional PSTN (with the odd-to-oldtimers result that you might get better audio quality on a cell-to-cell call than a cell-to-landline one). – hobbs Jun 09 '21 at 05:53

1 Answer

9

According to Wikipedia:

"... G.711 passes audio signals in the range of 300–3400 Hz and samples them at the rate of 8,000 [samples per second]"

The Nyquist criterion limits the highest representable frequency to less than half the sample rate, or less than 4 kHz in this case. G.711's band-limiting filter apparently cuts this down further, to about 3.4 kHz.
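
As a rough illustration of that arithmetic, here is a minimal Python sketch. The function names are made up for this example, and the passband check assumes an idealised brick-wall filter, which a real G.711 front end is not; the 8,000 and 300–3400 Hz figures come from the quote above.

```python
# Figures taken from the Wikipedia quote above.
G711_SAMPLE_RATE_HZ = 8000
G711_PASSBAND_HZ = (300, 3400)


def nyquist_limit_hz(sample_rate_hz: float) -> float:
    """Upper bound on representable frequency: half the sample rate."""
    return sample_rate_hz / 2.0


def passes_g711(freq_hz: float) -> bool:
    """Idealised check (assumption): treats the passband as a brick wall.

    A real filter rolls off gradually, so this is only a first approximation.
    """
    low, high = G711_PASSBAND_HZ
    return low <= freq_hz <= high


print(nyquist_limit_hz(G711_SAMPLE_RATE_HZ))  # 4000.0
for f in (1000, 3400, 5000, 20000):
    print(f, passes_g711(f))
```

Under these assumptions, 1 kHz and 3.4 kHz components pass, while anything at 5 kHz or above does not.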

A quick impromptu experiment

Laying the microphone of a USB headset next to my very clicky old Dell keyboard and recording the sound gives me this time-domain (amplitude versus time) recording: [image: Keyboard recording]

Running a Fourier transform to move to the frequency domain yields: [image]

It looks to me like all of the subtle key impulse differentiation lies in the 5-20 kHz region, which by definition cannot be passed by G.711.

This was a quick and dirty experiment; take it for what it's worth.
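
For anyone who wants to repeat this kind of check, the sketch below estimates what fraction of a signal's spectral energy falls inside the 300–3400 Hz band. It is not the code used for the plots above: the input is a synthetic two-tone signal standing in for a real recording, the helper names are hypothetical, the band is treated as an ideal brick wall, and the DFT is a naive pure-Python one that is only practical for short clips.

```python
import cmath
import math


def dft_power(samples):
    """Naive DFT; returns |X[k]|^2 for bins 1 .. N/2 - 1 (DC and Nyquist skipped)."""
    n = len(samples)
    power = {}
    for k in range(1, n // 2):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(samples))
        power[k] = abs(acc) ** 2
    return power


def band_energy_fraction(samples, sample_rate_hz, low_hz, high_hz):
    """Fraction of spectral energy whose bin frequency lies in [low_hz, high_hz]."""
    bin_hz = sample_rate_hz / len(samples)
    power = dft_power(samples)
    total = sum(power.values())
    in_band = sum(p for k, p in power.items() if low_hz <= k * bin_hz <= high_hz)
    return in_band / total


# Synthetic stand-in for a recording (assumption): a 2 kHz tone plus a
# weaker 10 kHz tone, 480 samples at 48 kHz so both tones land on exact bins.
SAMPLE_RATE_HZ = 48000
signal = [math.sin(2 * math.pi * 2000 * i / SAMPLE_RATE_HZ)
          + 0.5 * math.sin(2 * math.pi * 10000 * i / SAMPLE_RATE_HZ)
          for i in range(480)]

fraction = band_energy_fraction(signal, SAMPLE_RATE_HZ, 300, 3400)
print(round(fraction, 3))  # 0.8
```

Running the same measurement on real keystroke audio would show how much energy an idealised 300–3400 Hz channel keeps. It would not show whether what remains is enough to tell keys apart.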

user10216038
  • 3
    "It looks to me like all of the subtle key impulse differentiation lies in the 5-20 kHz region" I fail to see why this is the case. The 5-20 kHz region is less smooth, but this is a spectrum; it's less intuitive than the time-domain signal. E.g. think of the spectrum of a transmission of a secret phrase encoded in Morse code using an ideal tuning fork. Also, G.711 surely isn't an infinite-order bandpass filter (but to be honest I don't know its details). – Margaret Bloom Apr 27 '21 at 18:23