I would like to produce a numeric list of amplitudes from an audio file. I should be able to:
- Specify the sampling rate (16kHz, 44.1kHz, etc)
- Specify the data type of the amplitude samples (8 bit integers, 32 bit floats, etc)
- Easily parse the list so that I can import it into other tools, like Python's numpy (newline delimited, csv, etc)
- Conversely, I would also like a method to re-encode such a list into an arbitrary audio format.
I believe I have used ffmpeg to do this before, but haven't been able to find a solution. (Or maybe it was Audacity?)
I think I'm hot on the trail when I look at the set of codecs that my recent-ish ffmpeg supports (edited excerpt from ffmpeg -codecs):
DEA..S pcm_f64be PCM 64-bit floating point big-endian
DEA..S pcm_s24be PCM signed 24-bit big-endian
DEA..S pcm_s64be PCM signed 64-bit big-endian
DEA..S pcm_s8 PCM signed 8-bit
DEA..S pcm_u32be PCM unsigned 32-bit big-endian
DEA..S pcm_u8 PCM unsigned 8-bit
The above "PCM" method seems to describe exactly what I'm trying to do, but I just need to know how to extract the samples in a parseable format.
All the commands that I've tried create files in some binary encoding that seems to require some kind of decoder to understand. Here's an example:
ffmpeg -i audio.wav -f u8 -c:a pcm_u8 -ar 16000 out.raw
ffmpeg completes this command without issue, but the output is indecipherable.
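The file that command writes is probably less opaque than it looks: a raw u8 stream has no header, and each byte is one unsigned sample between 0 and 255. Below is a minimal Python sketch of turning such bytes into newline-delimited text and back, assuming 8-bit unsigned samples. The function names and the demo bytes are made up for illustration, and with more than one channel the samples would be interleaved.

```python
def u8_raw_to_text(raw: bytes) -> str:
    """Render raw unsigned 8-bit PCM as newline-delimited integers."""
    # Iterating over bytes yields ints in 0..255; 128 is the silence midpoint.
    return "\n".join(str(sample) for sample in raw)


def text_to_u8_raw(text: str) -> bytes:
    """Inverse of u8_raw_to_text: parse integers back into raw bytes."""
    # bytes() raises ValueError if any value falls outside 0..255.
    return bytes(int(token) for token in text.split())


# Hypothetical stand-in for the contents of out.raw.
demo = bytes([128, 130, 255, 0])
text = u8_raw_to_text(demo)
print(text)

# The round trip reproduces the original bytes.
assert text_to_u8_raw(text) == demo
```

To convert a real file, read it in binary mode, for example open("out.raw", "rb").read(). Going the other way, ffmpeg has to be told what the headerless data contains, along the lines of ffmpeg -f u8 -ar 16000 -ac 1 -i samples.raw out.wav (mono is assumed here, and samples.raw is a placeholder name).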
@t-mart: It just occurred to me that Numpy has its own binary data loader, numpy.frombuffer(), which could be used for the same purpose. – user1686 – 2019-09-12T09:02:20.693

As an improvement, from things I just read in the struct package:
- struct's unpack functions return "a tuple even if it contains exactly one item", so list.extend() might be better than append to avoid nested data
- Struct objects expose a size property, which is simpler than 32 // 8

Yes, the struct-using code could be greatly improved for correctness and efficiency; it was mostly meant to demonstrate that it's still a very simple parser (literally a list of samples). The best option seems to be either array.frombuffer() which I forgot or numpy.frombuffer() which I didn't know previously, either of which should completely avoid creating objects for every sample. – user1686 – 2019-09-12T09:15:07.573

array.frombuffer() unfortunately only uses the native endianness of the current machine. I suppose you could fix this at ffmpeg-time to match your own. https://stackoverflow.com/a/23320951/235992 Numpy types (https://docs.scipy.org/doc/numpy/user/basics.types.html) also seem to lack specification of endianness, however you could do something like https://docs.scipy.org/doc/numpy/user/basics.byteswapping.html#changing-byte-ordering – t-mart – 2019-09-12T09:29:02.543

Yes, but it still allows you to do if sys.byteorder != "big": arr.byteswap(), which is a bit more annoying than having the module take care of it, but ultimately performs the same thing. (Hopefully.) – user1686 – 2019-09-12T09:40:18.343

As for numpy types, according to the frombuffer docs page, it uses the same way to specify endianness (byte-order) as 'struct' does, i.e. > for big-endian. The page shows an example using newbyteorder(">"), but it seems you get the same result by directly calling dtype(">u4"). – user1686 – 2019-09-12T09:42:12.203
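The byte-swapping idea from the comments can be sketched with the standard library alone. This is a rough illustration, not the commenters' code: it assumes signed 32-bit big-endian samples (pcm_s32be), uses array.array.frombytes() as the stdlib loading method, and the function name and demo bytes are invented for the example.

```python
import sys
from array import array


def decode_s32be(raw: bytes) -> list[int]:
    """Decode raw signed 32-bit big-endian PCM into a list of ints."""
    samples = array("i")  # C int, native byte order
    if samples.itemsize != 4:
        # The C standard only guarantees at least 2 bytes for int.
        raise RuntimeError("expected a 4-byte C int on this platform")
    # Raises ValueError if len(raw) is not a multiple of 4.
    samples.frombytes(raw)
    if sys.byteorder != "big":
        # The data is big-endian; swap in place on little-endian hosts.
        samples.byteswap()
    return samples.tolist()


# Hypothetical demo data: the samples 1 and -2 encoded as big-endian.
raw = (1).to_bytes(4, "big", signed=True) + (-2).to_bytes(4, "big", signed=True)
print(decode_s32be(raw))  # [1, -2]
```

With NumPy installed, numpy.frombuffer(raw, dtype=">i4") should yield the same values without the manual swap, since the ">" prefix declares the byte order in the dtype itself.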