ffmpeg - Encode audio as list of samples

All formats require some kind of parser/decoder; however, the parser needed for the PCM format in your example is actually even simpler than the one needed for CSV. -f u8 is a very straightforward format: "PCM" does not involve any compression, and in your case it is literally 1 byte per sample.

This means that various built-in Python and/or Numpy functions can be used to read it. With u8 (1 byte per sample), you don't need anything extra: reading the file in binary mode already gives you a bytes object, and iterating over it yields unsigned integer values (0 to 255):

Note: All examples are for Python 3.

with open("out_pcm_u8.raw", "rb") as fh:
    samples = list(fh.read())
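
As a quick check of that behaviour, here is a minimal sketch that uses a hypothetical in-memory byte string instead of a real file (the three sample values are made up for illustration):

```python
# Hypothetical three-sample u8 buffer standing in for the contents of a raw file.
raw = bytes([0, 128, 255])

# Iterating over a bytes object yields plain ints, so list() is the whole "parser".
samples = list(raw)
print(samples)
```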

With formats like u32be, you can use the 'struct' or 'array' modules, as well as numpy.frombuffer(). All necessary information is already in the format's name and you just use help(struct) to find the matching type (> for big-endian, I for u32, i for s32). For example:

import struct

with open("out_pcm_u32be.raw", "rb") as fh:
    buf = fh.read()
    samples = [t[0] for t in struct.iter_unpack(">I", buf)]

import numpy

dt = numpy.dtype(">u4")
with open("out_pcm_u32be.raw", "rb") as fh:
    buf = fh.read()
    samples = numpy.frombuffer(buf, dtype=dt)
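
The 'array' module is mentioned above but has no example of its own. A minimal sketch follows, assuming the data is u32be; the helper name decode_u32be and the two-sample test buffer are hypothetical. Because array always reads in the machine's native byte order, the sketch swaps bytes afterwards on little-endian machines:

```python
import array
import sys

def decode_u32be(buf):
    """Decode big-endian unsigned 32-bit samples using the array module."""
    # Pick an unsigned type code that is 4 bytes wide on this platform
    # ("I" usually is; "L" is tried as a fallback).
    typecode = next(c for c in ("I", "L") if array.array(c).itemsize == 4)
    arr = array.array(typecode)
    arr.frombytes(buf)  # interprets the bytes in native byte order
    if sys.byteorder != "big":
        arr.byteswap()  # swap each item's bytes so big-endian data reads correctly
    return arr

# Hypothetical two-sample buffer in place of out_pcm_u32be.raw.
samples = decode_u32be(bytes.fromhex("00000001" "000000ff"))
print(list(samples))
```

Like numpy.frombuffer(), this stores the samples compactly rather than creating a separate Python object for each one.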

For completeness, the expanded version of the earlier struct example:

import struct

samples = []
with open("out_pcm_u32be.raw", "rb") as fh:
    while True:
        buf = fh.read(32 // 8)
        if buf:
            (samp,) = struct.unpack(">I", buf)
            samples.append(samp)
        else:
            break
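
A possible variant of the loop above, following the struct documentation's advice to compile the format once with a Struct object and to use its size instead of 32 // 8. This is only a sketch: the function name read_samples and the in-memory test stream are hypothetical, and silently dropping a truncated trailing sample is an assumption about the desired behaviour.

```python
import io
import struct

def read_samples(fh, fmt=">I"):
    """Read fixed-size samples from a binary file object until EOF."""
    sample = struct.Struct(fmt)       # compile the format string once
    samples = []
    while True:
        buf = fh.read(sample.size)    # 4 bytes for ">I"
        if len(buf) < sample.size:    # EOF, or a truncated trailing sample
            break
        samples.extend(sample.unpack(buf))  # unpack() returns a 1-tuple here
    return samples

# Hypothetical in-memory stream standing in for open("out_pcm_u32be.raw", "rb").
stream = io.BytesIO(struct.pack(">3I", 1, 2, 4294967295))
print(read_samples(stream))
```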

user1686

Posted 2019-09-12T05:23:46.753

Reputation: 283 655

@t-mart: It just occurred to me that Numpy has its own binary data loader, numpy.frombuffer(), which could be used for the same purpose. – user1686 – 2019-09-12T09:02:20.693

As an improvement, from things I just read in the struct package:

The returned values from struct functions are "a tuple even if it contains exactly one item", so list.extend() might be better than append to avoid nested data
"Creating a Struct object once and calling its methods is more efficient than calling the struct functions with the same format since the format string only needs to be compiled once." Since we're repeating this call for a (large?) file, seems like a good optimization.
Also, Struct objects expose a size property, which is simpler than 32 // 8

2019-09-12T09:10:26.550

Yes, the struct-using code could be greatly improved for correctness and efficiency; it was mostly meant to demonstrate that it's still a very simple parser (literally a list of samples). The best option seems to be either the array module's frombytes(), which I forgot, or numpy.frombuffer(), which I didn't know previously; either of these should completely avoid creating objects for every sample. – user1686 – 2019-09-12T09:15:07.573

The array module's frombytes() unfortunately only uses the native endianness of the current machine. I suppose you could fix this at ffmpeg-time to match your own. https://stackoverflow.com/a/23320951/235992 Numpy types (https://docs.scipy.org/doc/numpy/user/basics.types.html) also seem to lack specification of endianness; however, you could do something like https://docs.scipy.org/doc/numpy/user/basics.byteswapping.html#changing-byte-ordering – t-mart – 2019-09-12T09:29:02.543

Yes, but it still allows you to do if sys.byteorder != "big": arr.byteswap(), which is a bit more annoying than having the module take care of it, but ultimately performs the same thing. (Hopefully.) – user1686 – 2019-09-12T09:40:18.343

As for numpy types, according to the frombuffer docs page, it uses the same way to specify endianness (byte-order) as 'struct' does, i.e. > for big-endian. The page shows an example using newbyteorder(">"), but it seems you get the same result by directly calling dtype(">u4").

– user1686 – 2019-09-12T09:42:12.203
