9

When I run journalctl --disk-usage it reports a journal size of about 300MB, but when I measure the actual text with journalctl | wc -c it comes to only about 28MB. journald has compression, and even allowing for metadata such as timestamp, UID, message hash and the like, this seems to me like a ridiculous waste of disk space.

Can someone tell me why the journal files are so big compared to the actual text inside?

Andrew Schulman
Smith_33

3 Answers

7

There are two reasons. First, as @Mella mentioned, there is the difference between the current log and all logs.

Second, as documented in man journalctl, there are a number of output formats. You were measuring the size of the most compact, least detailed one. To see the maximal data in the systemd journal, use:

journalctl --output=verbose

In my case, the compact journal output returns 32 MB of data, while 128 MB are returned with --output=verbose and 152 MB are reported by journalctl --disk-usage, which covers both active and archived journals.
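The three measurements above can be reproduced with a small script along these lines. This is only a sketch: the helper names `overhead_ratio` and `measure_journal` are made up for illustration, and the journal directory is assumed to be /var/log/journal (volatile journals live under /run/log/journal instead).

```shell
#!/bin/sh
# Sketch: compare journal text size with on-disk size.
# JOURNAL_DIR is an assumption; adjust it if your journal is stored elsewhere.
JOURNAL_DIR=${JOURNAL_DIR:-/var/log/journal}

# overhead_ratio DISK TEXT: print how many times larger DISK is than TEXT.
overhead_ratio() {
    awk -v d="$1" -v t="$2" \
        'BEGIN { if (t == 0) print "n/a"; else printf "%.1f\n", d / t }'
}

# measure_journal: print the size of the short output, the verbose output,
# and the journal directory on disk. Reading all journals may need root.
measure_journal() {
    short_bytes=$(journalctl --output=short | wc -c)
    verbose_bytes=$(journalctl --output=verbose | wc -c)
    disk_kib=$(du -sk "$JOURNAL_DIR" | cut -f1)
    disk_bytes=$((disk_kib * 1024))
    echo "short output:   $short_bytes bytes"
    echo "verbose output: $verbose_bytes bytes"
    echo "on disk:        $disk_bytes bytes"
    echo "disk / short:   $(overhead_ratio "$disk_bytes" "$short_bytes")x"
}
```

Call `measure_journal` to print the numbers for your own system. With the figures from the question, `overhead_ratio 300 28` prints 10.7.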

See man journald.conf to learn how to limit how much disk space journald uses if you are concerned.
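As a rough sketch of what that could look like: the snippet below generates a drop-in file that sets `SystemMaxUse=`, one of the size limits described in man journald.conf. The function name `journald_limit_conf`, the file name `size.conf` and the 100M value are arbitrary choices for illustration.

```shell
#!/bin/sh
# journald_limit_conf SIZE: print a journald drop-in that caps the disk
# space used by persistent journals to SIZE (e.g. 100M).
journald_limit_conf() {
    printf '[Journal]\nSystemMaxUse=%s\n' "$1"
}

# Example usage (as root); file name and size are illustrative only:
#   mkdir -p /etc/systemd/journald.conf.d
#   journald_limit_conf 100M > /etc/systemd/journald.conf.d/size.conf
#   systemctl restart systemd-journald
#
# To shrink the archived journals that already exist:
#   journalctl --vacuum-size=100M
```

The drop-in only limits future growth; `journalctl --vacuum-size=` is what removes archived journal files until they fit within the given size.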

Mark Stosberg
    IMHO this answer is useful, but somewhat misleading. While it is true that the *verbose* output is larger, this cannot explain the even larger on-disk files. I checked and see `journald` `.journal` files of 8MiB that yield only about 56kiB of **verbose** output. Clearly there has to be another explanation for why `journald` is so wasteful with disk storage (see also https://unix.stackexchange.com/q/462266/24394) – humanityANDpeace Aug 13 '18 at 11:20
5
  1. They are huge because it's kind of a bug:

As indicated upstream, and hence known to the developers of journald, the space efficiency of the binary log format is not very good (yet?).

  2. They are huge because compression may not be activated

There is also an option in /etc/systemd/journald.conf, Compress=yes, which might not be active on your system, in which case there is effectively no compression.
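One way to check this, sketched here with a hypothetical helper named `compress_setting`, is to look for an uncommented Compress= line in the configuration file. Note that the sketch only inspects the single file it is given; drop-ins under /etc/systemd/journald.conf.d/ can override it and are not considered.

```shell
#!/bin/sh
# compress_setting FILE: print the value of the last uncommented
# Compress= line in FILE, or "unset" if there is none (in which case
# journald falls back to its built-in default).
compress_setting() {
    value=$(sed -n 's/^[[:space:]]*Compress=[[:space:]]*//p' "$1" | tail -n 1)
    if [ -n "$value" ]; then
        echo "$value"
    else
        echo "unset"
    fi
}

# Example usage:
#   compress_setting /etc/systemd/journald.conf
```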

  3. The issue of archived journals does not matter here.

While it is true in principle that journald distinguishes between active and archived journal logs, the other answers are misleading on this point, as man journalctl states unequivocally:

Output is interleaved from all accessible journal files, whether they are rotated or currently being written, and regardless of whether they belong to the system itself or are accessible user journals.

The other answers are hence misleading here.

  4. The disk usage of journald is huge (i.e. greater than plain text files with a comparable level of information, that is, fields) because of file allocation, fragmentation, and anti-corruption measures.

"file fragmentation/allocation issues"

On my box (journalctl --version reports "systemd 239[...]") the journal files that contain the data come in sizes that are multiples of 8MiB. As a consequence, a journal file on my system will be 8MiB big even when only a fraction of that (in one case 56kiB) is actually stored in it.
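To see how full each journal file actually is, one could compare its size with the size of its verbose output, roughly as sketched below. The helper names `fill_percent` and `journal_fill_report` are invented for this example, and /var/log/journal is assumed as the default journal location. Because of compression and binary framing, verbose output is only an approximation of the stored payload.

```shell
#!/bin/sh
# fill_percent FILE_BYTES PAYLOAD_BYTES: payload as a percentage of file size.
fill_percent() {
    awk -v f="$1" -v p="$2" \
        'BEGIN { if (f == 0) print "n/a"; else printf "%.1f\n", 100 * p / f }'
}

# journal_fill_report [DIR]: for every *.journal file below DIR, print its
# size and the size of its verbose output. May need root to read the files.
journal_fill_report() {
    dir=${1:-/var/log/journal}
    find "$dir" -name '*.journal' | while read -r f; do
        file_bytes=$(wc -c < "$f")
        payload_bytes=$(journalctl --file="$f" --output=verbose | wc -c)
        pct=$(fill_percent "$file_bytes" "$payload_bytes")
        echo "$f: $file_bytes bytes on disk, $payload_bytes bytes verbose ($pct%)"
    done
}
```

For the case described above, an 8MiB file holding 56kiB of verbose output, `fill_percent 8388608 57344` prints 0.7.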

"anti corruption issue" According to Poettering, one of the developers of journald and systemd, when journald considers a journal file to have become corrupted, the file won't be "fixed" but is instead left as is, to prevent further problems. (see https://bugs.freedesktop.org/show_bug.cgi?id=64116#c3)

This of course means that there is a good chance that uncompressed, almost empty binary journal files are sitting around in /var/log, making the journal much, much larger than a sane plaintext alternative.

0

journalctl without parameters (whose output you're measuring with wc -c) is only displaying journal entries in the active journal (I'm not sure offhand what prompts a turnover). journalctl --disk-usage is displaying the space used by the archived journals as well as the active journal.

Kefka