
What technical measures are used to prevent recorded voice samples from being used for authentication on voice recognition/ID systems?

A client wanted to simplify user logins and mentioned the ATO as an example. I am concerned about how easy it could be to bypass voice authentication.

schroeder
David
  • The ATO system was raised as an example by a client wanting to "simplify" customer logins. My background is network security, including authentication systems, but I have no experience with voice authentication. It seems to me that it may have some weaknesses, as I assume it needs to allow for natural variations in human voice quality, phone line quality, background noise, etc. That seems to require a broader tolerance of what is acceptable for authentication (i.e. less security); those are my initial thoughts, anyway. I am grateful for input from those with more expertise in the area. – David Jan 29 '18 at 01:36
  • @David a little more context and a little more focus to the question would have been nice. I added the details in your comment to your question. – schroeder Jan 29 '18 at 12:19
  • IMO, it is helpless even against synthesized voices. As @schroeder mentioned, you need to define the context, because future research could throw everything including the kitchen sink at verifying that the voice is not coming from a machine (maybe the machine asks you to pick a song and checks that you hit the same bad note, to make sure you are you?) – mootmoot Jan 29 '18 at 12:41

1 Answer


Nothing, except the loss of fidelity introduced by recording and playback, if the system is vulnerable to a replay attack.

And if you record and play back at a higher fidelity than the voice recognition system was designed to handle, the system won't have a clue.

It might be possible to analyze echoes and harmonics: the human phonatory system does not produce sound from a single point in space, while a loudspeaker does. This would require several sensitive microphones placed at different positions, so that the time of flight of different phonemes could be calculated.
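The basic building block for such an analysis is estimating the time difference of arrival (TDOA) of a sound between two microphones. Below is a minimal, brute-force cross-correlation sketch; the toy signals, the 48 kHz sample rate and the 343 m/s speed of sound are illustrative assumptions, and a real system would use FFT-based correlation on much longer signals. It only measures delays; it does not show that a mouth and a loudspeaker can actually be told apart this way.

```python
from typing import Sequence

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees C


def estimate_delay(ref: Sequence[float], other: Sequence[float], max_lag: int) -> int:
    """Return the lag (in samples) that maximizes sum(ref[i] * other[i + lag]).

    A positive result means the sound reached the `other` microphone
    later than the `ref` microphone.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        # Only sum over indices where both signals have a sample.
        score = sum(
            ref[i] * other[i + lag]
            for i in range(len(ref))
            if 0 <= i + lag < len(other)
        )
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def path_difference_m(lag: int, sample_rate: float) -> float:
    """Convert a lag in samples into a difference in path length (metres)."""
    return lag / sample_rate * SPEED_OF_SOUND


# Toy example: the same short burst arrives 2 samples later at the second mic.
mic_a = [0, 0, 1, 3, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 1, 3, 1, 0, 0, 0]
lag = estimate_delay(mic_a, mic_b, max_lag=4)
print(lag)                             # 2
print(path_difference_m(lag, 48000))  # about 0.0143 m, i.e. roughly 1.4 cm
```

With three or more microphones, per-phoneme delays like this could then be compared: a single point source should give the same delays for every phoneme.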

Challenge/Response

Another possibility applies if the attacker only has access to a fixed recording and the system can also do speech recognition (recognizing what is said, not just who is saying it).

I think I saw this in some 007 film: the guy approaches a voice-activated door and fiddles with his watch, which plays back 'Nice party. I recommend you the shrimp salad...', recorded in the villain's voice the evening before, and the door unlocks.

But what if the door had asked, 'Repeat after me: horse battery staple correct'? The shrimp salad wouldn't have cut it.

So:

  • the voice is enrolled
  • the user is asked to pronounce a certain sequence, different every time
  • the sequence and the voiceprint must match.
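The three steps above can be sketched as follows. This is a minimal illustration, not any real product's API: `transcript` is assumed to come from some speech recognizer, `speaker_score` from some speaker-verification model comparing the audio with the enrolled voiceprint, and the 0.8 threshold is an arbitrary placeholder.

```python
import secrets

DIGITS = "0123456789"


def new_challenge(length: int = 6) -> str:
    """Generate a fresh, unpredictable digit sequence for each login attempt."""
    return "".join(secrets.choice(DIGITS) for _ in range(length))


def verify(challenge: str, transcript: str, speaker_score: float,
           threshold: float = 0.8) -> bool:
    """Accept only if the caller said the requested sequence AND sounds
    like the enrolled voice.

    `transcript` and `speaker_score` are outputs of hypothetical speech
    recognition and speaker verification components; the threshold is an
    assumed placeholder, not a recommended value.
    """
    said_the_right_thing = transcript == challenge
    sounds_like_the_user = speaker_score >= threshold
    return said_the_right_thing and sounds_like_the_user


challenge = new_challenge()
print(len(challenge))                       # 6
# A replayed recording of an earlier login says the wrong digits,
# so it fails even though the voice itself matches very well.
print(verify("297779", "577892", 0.97))     # False
print(verify("297779", "297779", 0.90))     # True
```

Using `secrets` rather than `random` matters here: the whole point is that the attacker cannot predict the next challenge.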

This reduces the chances of a replay attack: even if someone recorded my voice saying '577892', they wouldn't be able to make my voice say '297779'. Or would they? With a large enough sample and voice synthesis technology similar to Loquendo TTS, it is possible to have a computer say anything in my voice. And if the challenges use only a few words or digits, the attacker doesn't even need that much technology: splicing together recordings of the individual digits may be enough.
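To make the splicing point concrete, here is a small illustrative sketch that checks whether a digit challenge can be assembled from digits heard in one captured recording, and computes the chance that a random challenge is spliceable. It assumes each challenge digit is drawn independently and uniformly from 0-9, and (optimistically for the attacker) that individual digits can be cut out cleanly.

```python
from fractions import Fraction


def spliceable(recorded: str, challenge: str) -> bool:
    """True if every digit in the challenge also occurs in the recording,
    so the attacker could cut out and splice the digits needed."""
    return set(challenge) <= set(recorded)


def splice_probability(recorded: str, length: int) -> Fraction:
    """Probability that a uniformly random digit challenge of `length`
    digits uses only digits present in the recording."""
    available = len(set(recorded) & set("0123456789"))
    return Fraction(available, 10) ** length


print(spliceable("577892", "297779"))       # True: 2, 9 and 7 were all recorded
print(splice_probability("577892", 6))      # 1/64
```

The recording '577892' contains five distinct digits, so a single capture already covers one challenge in 64; once the attacker has heard all ten digits, the probability becomes 1.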

The need to avoid both false negatives and false positives, compounded by background noise, means threading a very difficult needle. You could reject exactly identical phonemes as recordings, but background noise (real or faked) would make this very hard: two playbacks of the same recording would be acquired as different, while the same person's voice naturally produces almost identical phonemes.
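A naive "reject anything too similar to a previous login" rule shows the problem. In this sketch each utterance is assumed to have already been reduced to a feature vector by some hypothetical front end; the four-element vectors and the 0.999 cosine-similarity threshold are made up for illustration.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors (1.0 = same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def looks_like_replay(sample: Sequence[float], history: List[Sequence[float]],
                      threshold: float = 0.999) -> bool:
    """Flag the sample if it is almost identical to any previously accepted one."""
    return any(cosine_similarity(sample, past) >= threshold for past in history)


history = [[1.0, 0.0, 1.0, 0.0]]          # features of an earlier, accepted login
exact_replay = [1.0, 0.0, 1.0, 0.0]       # same recording played back cleanly
noisy_replay = [1.0, 0.3, 1.0, 0.3]       # same recording with noise mixed in
genuine_repeat = [1.0, 0.01, 1.0, 0.0]    # the real user, sounding nearly identical

print(looks_like_replay(exact_replay, history))    # True  (replay caught)
print(looks_like_replay(noisy_replay, history))    # False (replay gets through)
print(looks_like_replay(genuine_repeat, history))  # True  (real user rejected)
```

Lowering the threshold catches more noisy replays but rejects even more genuine logins, which is exactly the trade-off described above.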

Over a telephone, any attempt to distinguish between "real" and "artificial" phonation will fail, because the phonation will always be artificially flattened by the sender's microphone.

I am in no way an expert in artificial voice fakery, but I'm quite confident that a fairly modest budget for acquiring voice samples, recording equipment and a voice synthesis framework would let you bypass any such voice authentication over the phone. Against an unprepared opponent armed with just a tape recorder, voiceprint plus challenge/handshake will probably always win.

LSerni