As already mentioned, recording at 22.05kHz for spoken word isn't in itself 'bad'; but neither can it really be 'fixed', because there's no high-frequency information left in the recording to bring out. You can only work with what's already there.
Some explanation...
The human voice is really at its most distinct at around 2 - 6 kHz. That's where all the consonants are & what really helps the listener to decide what's actually being said. It's also why putting your fingers in your ears reduces comprehensibility: doing so mainly blocks those higher frequencies.
There is information in speech above 6kHz, but it trails away a lot above that & by 11kHz there's really very little useful information left.
So - for spoken word, 22.05kHz is commonly used as the sample frequency.
There's a result from signal processing called the Nyquist-Shannon sampling theorem, often just referred to as the Nyquist limit, which basically boils down to
"The highest audio frequency that can be recorded in an audio file is half the sampling frequency."
That equates to 11.025kHz - call it 11kHz - on a 22.05kHz recording.
That's plenty for a human voice.
It also means there is no longer any information above that to work with, even if you do change the sampling frequency up to 44.1kHz [CD audio quality].
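The arithmetic above is simple enough to sketch in a couple of lines of Python - the only point being that the limit is a hard ceiling set at record time, so upsampling later can't bring anything back:

```python
def nyquist_limit(sample_rate_hz: float) -> float:
    """Highest frequency a recording at this sample rate can contain."""
    return sample_rate_hz / 2.0

# Spoken-word rate: everything above 11.025 kHz is simply absent.
print(nyquist_limit(22050))   # 11025.0

# CD rate: capable of holding up to 22.05 kHz, but resampling a
# 22.05 kHz file up to 44.1 kHz cannot restore content that was
# never captured in the first place.
print(nyquist_limit(44100))   # 22050.0
```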
On to your audio book.
The problem, as I hear it, is that the reader was a bit close to the mic. This emphasises lower frequencies, due to something called the proximity effect. No need to go into that in full here, but overall it's made the recording a bit bassy.
It's also been somewhat compressed - it's had its dynamic range reduced, so the quiet bits are louder & the loud bits are quieter. This ought to help intelligibility, but it wasn't done quite as well as it could have been, & it tends to emphasise the bass even more. The only reason I can think of for doing this is that it makes the reader sound "more manly, more authoritative".. but it doesn't actually help intelligibility in the slightest :/
What we need to do then is reduce the bass, emphasise the highs & try to de-emphasise some of the heavy compression.
Most of this could be done in Audacity, to greater or lesser degree, but I'm more comfortable in Cubase, so let me show you in there...
Most people would tell you to Normalise the file first.
Don't do this first - you will kill your potential headroom.
If you need to do it at all, do it last.
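To see why the order matters, here's a toy peak-normalise in pure Python (the sample values are invented; real audio would come from your editor). Normalising first pushes the loudest sample to full scale, so any later boost clips; doing it last keeps everything in range:

```python
def peak_normalise(samples, target_peak=1.0):
    """Scale so the loudest sample hits target_peak (1.0 = 0 dBFS)."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)
    return [s * (target_peak / peak) for s in samples]

audio = [0.1, -0.4, 0.25, -0.05]    # toy samples, peak 0.4

# Normalise first, then boost by 6 dB (x2): samples now exceed
# full scale & will clip on export.
boosted = [s * 2.0 for s in peak_normalise(audio)]
print(max(abs(s) for s in boosted))   # 2.0 -> clipped

# Boost first, normalise last: everything stays within range.
final = peak_normalise([s * 2.0 for s in audio])
print(max(abs(s) for s in final))     # 1.0 -> safe
```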
Also note you cannot "undo" the compression that has already been applied - that would be the equivalent of getting the eggs & flour back from a baked cake - instead you can only try to mitigate it in the most heavily-affected areas.
If all you have to work with is equalisation, then you could try reducing levels below about 250Hz, with a gentle roll-off below that. You can then try to gain some consonants back by adding an opposite slope above maybe 2 or 3kHz.
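If you're curious what that bass roll-off amounts to, here's a minimal first-order high-pass filter in pure Python - a crude 6 dB/octave cut below the cutoff, nothing like the quality of a real EQ plugin, but the same idea. The 100 Hz / 1 kHz test tones are just illustrative:

```python
import math

def first_order_highpass(x, cutoff_hz, sample_rate_hz):
    """Gentle 6 dB/octave roll-off below cutoff_hz (a crude bass cut)."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = rc / (rc + dt)
    y = [x[0]]
    for i in range(1, len(x)):
        y.append(alpha * (y[-1] + x[i] - x[i - 1]))
    return y

sr = 22050
tone = lambda f: [math.sin(2 * math.pi * f * n / sr) for n in range(sr // 2)]

bass  = first_order_highpass(tone(100),  250, sr)   # boomy region
voice = first_order_highpass(tone(1000), 250, sr)   # consonant region

# Skip the first ~45 ms so the filter has settled:
print(max(abs(s) for s in bass[1000:]))   # well under 1.0: bass cut
print(max(abs(s) for s in voice[1000:]))  # near 1.0: voice band kept
```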
I spotted an irritating click, or lip-smack at about 3:40, which I simply selected & turned down to zero - you could get all clever with a de-clicker, but it wasn't worth the effort.
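"Select & turn down to zero" is trivial to do by hand in any editor; as a sketch, it's just zeroing a sample range (the times & sample values below are made up, not the actual 3:40 click):

```python
def silence_region(samples, start_s, end_s, sample_rate_hz):
    """Zero out a short region, e.g. a click or lip-smack."""
    out = list(samples)
    a = int(start_s * sample_rate_hz)
    b = int(end_s * sample_rate_hz)
    for i in range(max(a, 0), min(b, len(out))):
        out[i] = 0.0
    return out

sr = 22050
audio = [0.1] * sr                              # one second of toy audio
fixed = silence_region(audio, 0.50, 0.51, sr)   # kill a 10 ms click
print(fixed[int(0.505 * sr)])                   # 0.0 - silenced
print(fixed[int(0.2 * sr)])                     # 0.1 - untouched
```

In practice an abrupt cut to zero can itself produce a small click, so a short fade in/out at the edges is safer - but for a gap this short in spoken word it's rarely audible.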
My weapon of choice for any rescue operation like this is a multi-band compressor.
I found a free multi band comp for Audacity, though I haven't tried it myself, so YMMV - https://www.gvst.co.uk/gmulti.htm
I use the considerably more expensive Waves LinMB but the general idea is the same. This is how I have it set up...
From the image, you can see I'm hitting the low end really hard, to try to remove that excessive boom. The middle I'm pretty much leaving untouched. For the highs, I've increased the output level whilst applying a slight compression, just so some of the heavier S's etc. don't get too punchy. Also, at this point I haven't increased the overall volume at all - we still have plenty of headroom to play with, & when you switch your effect in & out for comparison it's best that you're not just fooling yourself with a volume change.
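The per-band maths behind any downward compressor is the same simple dB calculation; a real multiband comp first splits the signal into low/mid/high with crossover filters, then applies this independently per band. The thresholds & ratios below are invented to loosely mirror the setup described above, not LinMB's actual settings:

```python
def compressor_gain_db(level_db, threshold_db, ratio):
    """dB of gain change applied above the threshold (downward compression)."""
    if level_db <= threshold_db:
        return 0.0
    compressed = threshold_db + (level_db - threshold_db) / ratio
    return compressed - level_db          # negative: turn it down

# A -6 dBFS signal hitting three hypothetical band settings:
print(compressor_gain_db(-6.0, -24.0, 8.0))   # lows:  -15.75 dB, hit hard
print(compressor_gain_db(-6.0,   0.0, 1.0))   # mids:   0.0 dB, untouched
print(compressor_gain_db(-6.0, -12.0, 2.0))   # highs: -3.0 dB, gently tamed
```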
Quick examples -
before...
https://soundcloud.com/graham-lee-15/antal-vegh-orig?in=graham-lee-15/sets/intelligibility-fix
after...
https://soundcloud.com/graham-lee-15/antal-vegh-linmb?in=graham-lee-15/sets/intelligibility-fix
Once you're happy with how it sounds, now you can normalise.
Note my examples are at a higher sample rate purely because I cannot export directly at 22.05kHz. This does not materially affect the result in any way.
Can you share a sample of the audio? – Attie – 2018-07-30T18:20:33.323

Yes, of course: https://drive.google.com/open?id=1Sz8YF-fbDI5MoCnXuVNYyPq6-7O_rAD8 – Konstantin – 2018-07-30T18:32:26.653

Maybe you can run it through a super sophisticated speech reconstruction model, as described here. I'm not at all familiar with the requirements though. – Daniel B – 2018-07-31T08:29:31.883