Methods to analyse voice recordings extending 10+ hours?

1

I have a large MP3 file that comes from a journalist's recording device. I leave the device running for long stretches because I tend to forget to press the record button whenever I speak to people at the sites I visit to collect reports of incidents and the like.

So basically I have an 11-hour MP3 file, and I am currently going through it to find the recordings I made. This sometimes takes 4-5 hours.

So is there a way to:
1. Get the voice recordings alone from this file
2. Eliminate or reduce the volume of traffic or other background noises, such as machine sounds, so that only the voices are extracted from the MP3

Not sure if this is possible

Thanks

Siva

Posted 2011-04-27T14:09:12.510

Reputation: 145

I'm afraid I can't help you with your current situation (although maybe if you play it back at double or more speed you can more quickly find your interviews?), but maybe in the future you should carry a notepad with you and just jot down approximately what time you talk to people. Also jot down what time you start recording, and then you can use that to jump to each voice recording almost instantly. – Kromey – 2011-04-27T16:36:13.257

Answers

2

To be clear from the start: automatically analyzing audio recordings is a highly complicated task. Differentiating between speech and noise is theoretically possible, but I doubt there is a one-click solution available on the Internet. This sounds more like research work.

Also, your recording will probably not have passages of complete silence. If it did, one could split the file at the points where there is absolutely no sound. That would involve some programming as well; I can't recall any program that does it.

Finding significant parts or parts with voice

You might want to use a (free, cross-platform) program like Audacity to view the waveform of the MP3. From the waveform you can see where most of the action is.

[Image: waveform in Audacity, with the loudest sections marked in brown]

For example, the brownish sections I marked are the ones that exceed a certain threshold. They are most likely the ones with the voice data you are trying to find.

The other (blue) parts might not contain any relevant information or speech as they aren't as loud as the others.

Also see the gaps in between - these will help you to identify parts where really nothing is going on. You could cut the file there and split it in order to get different "interviews" (or whatever you were recording).
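The threshold idea described above can also be sketched in a few lines of Python. This is only an illustration, not what Audacity does internally. It assumes the audio has already been decoded into a list of float samples between -1 and 1 (for example by exporting to WAV first). The function names `frame_rms` and `loud_segments`, and the default values for frame length, threshold, and allowed gap, are made up for this example and would need tuning for a real recording.

```python
import math

def frame_rms(samples, frame_len):
    """Return the RMS level of each consecutive, non-overlapping frame."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels

def loud_segments(samples, sample_rate, frame_sec=0.5,
                  threshold=0.1, max_gap_sec=2.0):
    """Return (start_sec, end_sec) spans whose frame RMS reaches threshold.

    Loud frames separated by at most max_gap_sec of quiet are merged
    into one span, so short pauses do not split a conversation.
    All default values are assumptions chosen for illustration.
    """
    frame_len = int(sample_rate * frame_sec)
    segments = []
    for i, level in enumerate(frame_rms(samples, frame_len)):
        if level < threshold:
            continue  # quiet frame, not part of any span
        start, end = i * frame_sec, (i + 1) * frame_sec
        if segments and start - segments[-1][1] <= max_gap_sec:
            # Close enough to the previous span: extend it.
            segments[-1] = (segments[-1][0], end)
        else:
            segments.append((start, end))
    return segments
```

The returned spans can serve as jump points when listening, and the quiet stretches between them are candidate places to cut. If speech is barely louder than the background, a plain level threshold like this one will miss quiet passages, just as it does when judging the waveform by eye.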

Noise elimination

To eliminate noise, you can try to use the Equalizer effect and filter out certain frequencies. You will need to experiment with that, as not every recording device is the same and noise conditions change.

That being said, you can try to boost frequencies between 500 Hz and 1 kHz (or even up to 4 kHz), and cut frequencies below 500 Hz and above 8 kHz.
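For anyone who prefers to script this, the cutting part of that suggestion can be approximated with two very simple filters. The sketch below is a rough stand-in, not Audacity's Equalizer: it chains a first-order high-pass and a first-order low-pass, which roll off far more gently than a real equalizer curve, and it does not boost anything. It assumes the audio is already decoded into a list of float samples and that the sample rate is well above 16 kHz. The names `high_pass`, `low_pass`, and `voice_band` are made up for this example, and the 500 Hz and 8 kHz defaults simply mirror the values mentioned above.

```python
import math

def high_pass(samples, sample_rate, cutoff_hz):
    """First-order RC high-pass: attenuates content below cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = rc / (rc + dt)
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        prev_y = alpha * (prev_y + x - prev_x)
        prev_x = x
        out.append(prev_y)
    return out

def low_pass(samples, sample_rate, cutoff_hz):
    """First-order RC low-pass: attenuates content above cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate
    alpha = dt / (rc + dt)
    out, prev_y = [], 0.0
    for x in samples:
        prev_y += alpha * (x - prev_y)
        out.append(prev_y)
    return out

def voice_band(samples, sample_rate, low_hz=500.0, high_hz=8000.0):
    """Keep roughly the low_hz..high_hz band (defaults are assumptions)."""
    return low_pass(high_pass(samples, sample_rate, low_hz),
                    sample_rate, high_hz)
```

Low rumble such as mains hum or distant traffic is reduced noticeably by this, while a 1 kHz tone passes almost unchanged. Noise that overlaps the voice band, such as nearby machinery, will still come through, so this is a starting point for experiments rather than a fix.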

Audacity also has certain noise elimination filters to remove static, hiss, hum, or other constant background noises. Experiment with those.

slhck

Posted 2011-04-27T14:09:12.510

Reputation: 182 472

I have already tried Audacity's waveform method, and in my case there is not much differentiation between noise and speech, only a subtle difference. If I remove parts based on it, I end up losing speech recordings too by mistake. Is any other way possible? Filters remove very quiet voice recordings too. – Siva – 2011-04-27T14:30:55.043

Also, the recordings contain very quiet voices, as I use the device in meetings where people may speak casually in a low voice. – Siva – 2011-04-27T14:34:42.837

@Siva That's bad. If you can't even spot the difference between speech and noise in your recordings then I guess no algorithm will be able to do it automatically. – slhck – 2011-04-27T14:54:22.257

I am keeping the question open for more input for a couple of days. If nothing comes, I will mark yours as the answer. – Siva – 2011-04-28T06:44:05.327

@Siva No problem, maybe somebody will come up with some ideas. – slhck – 2011-04-28T07:16:38.283