I'm storing large datasets in S3, but on any given machine in my cluster my program only needs to read a small subset of the data.
I first tried s3fs, but it downloads the entire file first, which takes a really long time.
Are there any S3-backed file systems that make use of the S3 API's Range parameter (bytes=...), so that internal read (and seek) calls only fetch the desired part of the file?
As a practical example, if I run:
tail -c 1024 huge_file_on_s3
only the last 1 KB should be requested (via a ranged GET), so the result should come back very quickly.
(I am not concerned with writing back to S3; only reading from it)
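For concreteness, here is a rough sketch (Python with boto3; the bucket and key names are placeholders) of the kind of ranged request I'd expect such a file system to issue under the hood for that tail call:

    import boto3

    s3 = boto3.client("s3")

    # The suffix range "bytes=-1024" asks S3 for only the last 1024 bytes
    # of the object, rather than downloading the whole file.
    resp = s3.get_object(
        Bucket="my-bucket",        # placeholder bucket name
        Key="huge_file_on_s3",     # placeholder key
        Range="bytes=-1024",
    )
    last_kb = resp["Body"].read()

I'm looking for a mountable file system that does this kind of ranged read transparently, so existing tools like tail just work.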