I have large numbers of small log files that are essentially write-only, unless I have to look at them for some reason. Right now, they accumulate in day-specific subdirectories of a logging folder (e.g. 2018-12-29 for yesterday, 2018-12-30 for today, etc.) and I end up tar/bzip2'ing them later into a single archive per day.

That's not terribly convenient for me, and I was thinking that if I could create a compressed filesystem for each day, I could write directly to those filesystems, use less disk space and not have to "go back" and compress each directory into a tarball. It also makes inspecting individual files later easier because I could mount the filesystem and use it however -- use grep, find, less, etc. rather than trying to use tar to stream the data through some command pipeline.

I know I can create a loopback device of arbitrary size, but I have to know that size in advance: if I guess too high, I waste disk space on an image that is mostly empty, and if I guess too low, I'll run out of space and my software will fail (or at the very least complain very loudly).

I know I can create a sparse file, but I'm not exactly sure how that will interact with a filesystem such as ext4 or the other filesystems available on Linux; it may end up expanding far larger than necessary due to backup superblocks and the like.
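
For concreteness, the sparse-file variant would look something like this (paths and sizes are illustrative; needs root). Comparing `du` with and without `--apparent-size` shows how much the backing file has actually grown:

```
truncate -s 10G /srv/logs/2018-12-30.img          # sparse: 10G apparent size, ~0 blocks allocated
mkfs.ext4 -F -m 0 /srv/logs/2018-12-30.img        # mkfs allocates some metadata blocks up front
mkdir -p /mnt/logs/2018-12-30
mount -o loop /srv/logs/2018-12-30.img /mnt/logs/2018-12-30

du -h --apparent-size /srv/logs/2018-12-30.img    # nominal size (10G)
du -h /srv/logs/2018-12-30.img                    # blocks actually consumed on disk
```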

Is there a way to create a loop-device that can take up a minimal amount of physical space on the disk?

Christopher Schultz
    Why aren't you already using something like `logrotate` which can automatically rotate and compress your logs for you? – Michael Hampton Dec 31 '18 at 00:13
  • The files are already being rotated and compressed, automatically. But if I need to go back into the archive to find something, `.tar.gz` isn't exactly the most convenient format. These archives also must be encrypted, so it's even less convenient. An encrypted, compressed, mountable filesystem is the most convenient package for me. – Christopher Schultz Jan 02 '19 at 15:31

3 Answers

You might create a gzip-compressed ZFS pool backed by plain files and store your logs on it. There would be no need to do anything other than write the logs there.
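
A minimal sketch, assuming ZFS on Linux is installed and using illustrative paths and pool/dataset names:

```
truncate -s 10G /srv/zpool/logs0.img                  # sparse backing file
zpool create -o autoexpand=on -O compression=gzip logpool /srv/zpool/logs0.img
zfs create logpool/2018-12-30                         # e.g. one dataset per day
zfs get compressratio logpool                         # see how well the logs compress
```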

They will, from the outset, only use their compressed size in the ZFS file systems. You will be able to read the data afterwards (grep, find, less, and so on), and even modify or delete the files, even if that's not part of your requirements.

Should the pool become full, you can either grow the back-end file (with the autoexpand property set to on) or add new back-end files, and the file systems' capacity will grow accordingly.
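
Roughly (again, paths are illustrative):

```
truncate -s 20G /srv/zpool/logs0.img                  # enlarge the sparse backing file...
zpool online -e logpool /srv/zpool/logs0.img          # ...and let the vdev expand to the new size

truncate -s 10G /srv/zpool/logs1.img                  # or add a second backing file
zpool add logpool /srv/zpool/logs1.img
```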

jlliagre
  • I'll have to look at this. I'm not sure how good ZFS-on-Linux is right now (or, at least, on *my* Linux which happens to be Debian Stretch). Any pointers for commands/man pages I should read? I have no experience with ZFS on *any* OS. – Christopher Schultz Jan 02 '19 at 15:35
  • You can have a look at this reply: https://unix.stackexchange.com/a/396160/2594 to start familiarizing yourself with ZFS. – jlliagre Jan 02 '19 at 15:53

You should investigate the use of logrotate(8) to help manage your log files. It can be configured to rename your files to a specific date format and compress them automatically. You can also configure it to keep a specified number of logs (and many other things). Once you have it set up the way you want, you can basically forget about it.
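
A hypothetical /etc/logrotate.d/ entry (file names and retention are illustrative) might look like this:

```
/var/log/myapp/*.log {
    daily
    rotate 180            # keep roughly six months of rotated logs
    dateext               # name rotated files by date, e.g. app.log-20181230
    compress              # gzip the rotated files
    delaycompress         # leave the most recent rotation uncompressed
    missingok
    notifempty
}
```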

Also, take a look at the tools that come with gzip/bzip2, e.g. zgrep, zless, bzgrep, bzless, etc. They allow you to work with compressed files without having to build command pipelines.
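
For example (file names are illustrative):

```
zgrep -i 'connection reset' app.log.gz      # grep inside a gzip'ed file
zless app.log.gz                            # page through it
bzgrep -c ERROR app.log.bz2                 # same idea for bzip2
```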

user9517

I know logrotate has been suggested to you here, but if you'd still like to go ahead with the compressed filesystem idea, why not create those filesystems only after the day is over? Your shell script would then calculate the size of the logging folder, create a loopback device file of the needed size, put a filesystem on it, mount the loopback image, move the log files there, and finally unmount the loopback image -- something like the sketch below.
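
A rough sketch of such a script, assuming illustrative paths and a simple "measured size plus headroom" sizing rule (plain ext4 here; pick whatever filesystem suits you):

```
#!/bin/sh
# Sketch only: pack yesterday's log directory into an ext4 loopback image.
DAY=$(date -d yesterday +%F)                  # e.g. 2018-12-29
SRC=/var/log/myapp/$DAY
IMG=/srv/log-images/$DAY.img

SIZE_KB=$(du -sk "$SRC" | awk '{print $1}')   # current size of the log directory
SIZE_KB=$((SIZE_KB + SIZE_KB / 10 + 10240))   # ~10% headroom plus 10 MB for fs overhead

truncate -s "${SIZE_KB}K" "$IMG"              # sparse image file of that size
mkfs.ext4 -F -m 0 "$IMG"
mkdir -p /mnt/log-image
mount -o loop "$IMG" /mnt/log-image
cp -a "$SRC"/. /mnt/log-image/
umount /mnt/log-image
rm -rf "$SRC"
```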

I can feel the pain if some stupid application you cannot (or are not allowed to) do anything about creates millions of log files per day under some directory and you still need to keep those on disk for half a year or so. In that case a loopback image might be a good idea, as the active number of small files on the partition would come down dramatically.

Janne Pikkarainen
  • Post-day filesystem creation would be fine. I'm already delaying the tar process until after the fact, so it would just be a replacement. The problem is that I don't know how much space the compressed files will take without actually compressing them. Also, I'm not sure what the filesystem overhead will be, so I think it's not an exactly-knowable target size. "Good enough" would indeed be good enough. So a process for computing a "good enough" size is perfectly acceptable. – Christopher Schultz Jan 02 '19 at 15:33
  • The good old Stetson-Harrison method can work surprisingly well. If you take a look back at the log history, and find out for example that on average your log files do compress to about 33% of their original size, then use that as your starting number, perhaps with some slight headroom, and be happy with that. :) – Janne Pikkarainen Jan 02 '19 at 16:53
  • I've only made the briefest of tests, but mksquashfs doesn't require you to give a target size. Sure, the result is read-only, but for log files, that might not be a big problem. – Ulrich Schwarz Dec 22 '21 at 09:57
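
A quick sketch of that mksquashfs route (paths are illustrative); the result is a compressed, read-only image sized to fit, which can then be loop-mounted:

```
mksquashfs /var/log/myapp/2018-12-29 /srv/log-images/2018-12-29.sqsh -comp xz
mkdir -p /mnt/log-image
mount -o loop,ro /srv/log-images/2018-12-29.sqsh /mnt/log-image
```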