Nothing, except some loss of fidelity from the recording and playback, provided the system is susceptible to a replay attack.
But if you capture and play back at a higher fidelity than the voice recognition system was built for, the latter won't have a clue.
It might be possible to analyze echoes and harmonics: a human phonatory system does not produce sound from a single point in space, while a loudspeaker does. This would require several sensitive microphones placed in different positions, so that time-of-flight could be calculated for different phonemes.
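A minimal sketch of the underlying measurement, assuming just two microphones and a known speed of sound (all signals and values here are illustrative, not from any real product): the delay between the two acquisitions of the same phoneme gives the extra path length to the farther microphone, and a point source such as a loudspeaker would yield consistent delays across all phonemes, while a human vocal tract would not.

```python
# Sketch: estimate the time-difference-of-arrival (TDOA) between two
# microphones by brute-force cross-correlation, then convert the lag
# into an extra path length in metres.

SPEED_OF_SOUND = 343.0  # m/s at room temperature
SAMPLE_RATE = 48_000    # Hz

def tdoa_samples(sig_a, sig_b):
    """Lag (in samples) at which sig_b best overlaps sig_a."""
    n = len(sig_a)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-n + 1, n):
        score = sum(sig_a[i] * sig_b[i + lag]
                    for i in range(n) if 0 <= i + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

def extra_path_metres(lag_samples):
    return lag_samples / SAMPLE_RATE * SPEED_OF_SOUND

# A pulse reaching microphone B three samples after microphone A:
a = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
b = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]
lag = tdoa_samples(a, b)
print(lag)                       # 3
print(extra_path_metres(lag))    # about 2.1 cm of extra path
```

A real system would use an FFT-based correlation over many more samples, but the comparison it relies on is just this lag.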
Challenge/Response
Another possibility applies if the attacker only has access to a fixed recording and we can also do speech recognition.
I think I saw this in some 007 film, with the guy approaching a voice-activated door and fiddling with his watch, from which the 'Nice party. I recommend you the shrimp salad...' captured the evening before in the villain's voice issues forth, unlocking the door.
But what if the door had asked, 'Repeat after me: horse battery staple correct'? The shrimp salad wouldn't have cut it.
So:
- the voice is enrolled
- the user is asked to pronounce a certain sequence, different every time
- the sequence and the voiceprint must match.
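The flow above can be sketched as follows; `recognize_speech` and `match_voiceprint` are made-up placeholder names standing in for real speech-recognition and speaker-verification backends:

```python
import secrets

def make_challenge(n_digits=6):
    """A fresh random digit sequence, different every time."""
    return "".join(secrets.choice("0123456789") for _ in range(n_digits))

def authenticate(audio, enrolled_voiceprint, challenge,
                 recognize_speech, match_voiceprint):
    """Accept only if BOTH the spoken content and the speaker match."""
    said = recognize_speech(audio)
    if said != challenge:
        return False  # wrong sequence: possibly a stale recording
    return match_voiceprint(audio, enrolled_voiceprint)

# Toy demonstration with stub backends: "audio" is just a tuple of
# (speaker_id, spoken_text).
recognize = lambda audio: audio[1]
match = lambda audio, enrolled: audio[0] == enrolled

challenge = "297779"
print(authenticate(("alice", "297779"), "alice", challenge, recognize, match))    # True
print(authenticate(("alice", "577892"), "alice", challenge, recognize, match))    # False: replayed old recording
print(authenticate(("mallory", "297779"), "alice", challenge, recognize, match))  # False: right digits, wrong voice
```

The key design point is the conjunction: either check alone (content only, or voiceprint only) leaves the door open to a straight replay.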
This reduces the chances of a replay attack because even if someone recorded my voice saying '577892', they wouldn't be able to say '297779' in my voice. Or would they? With a large enough sample and voice-synthesis technology along the lines of Loquendo TTS, a computer can be made to say anything in my voice. And when the challenge is only a few words or digits, the attacker doesn't even need that much technology.
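A crude illustration of that last point: once the attacker has one clean recording of each digit, answering any numeric challenge is mere concatenation, no synthesis required. The strings below are placeholders standing in for audio clips; real splicing would need some care at the joins, but the principle is this simple.

```python
# Hypothetical attack sketch: stitch recorded digit clips together to
# answer an arbitrary numeric challenge.
recorded_digits = {d: f"<clip:{d}>" for d in "0123456789"}

def splice_response(challenge):
    """Build a 'spoken' response from previously captured digit clips."""
    return "".join(recorded_digits[d] for d in challenge)

print(splice_response("297779"))
# <clip:2><clip:9><clip:7><clip:7><clip:7><clip:9>
```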
The need to avoid both false negatives and false positives, on top of background noise, requires threading a very difficult needle. You could reject exactly identical phonemes as being recordings, but background noise (either real or faked) makes this very hard: two playbacks of the same recording will be acquired as different signals, while the same person's voice will naturally produce almost identical phonemes.
Over a telephone, any attempt to distinguish between "real" and "artificial" phonation will fail, because the phonation will always be artificially flattened by the sender's microphone.
I am in no way an expert in artificial voice fakery, but I'm quite confident that a very reasonable budget for acquiring voice samples, recording equipment and a voice-synthesis framework will let you bypass any such voice authentication over the phone. Against an unprepared opponent armed with just a tape recorder, voiceprint plus challenge/response will probably always win.