15

I'm trying to figure out why kjournald is going crazy on my machine. It's an 8-core box with loads of memory, and it's at ~50% CPU load.

iotop doesn't seem to point at any specific process, just some bursts of writes here and there (mostly cron starting, some monitoring stats being generated, etc.). When I used /proc/sys/vm/block_dump to gather write statistics, I got lists like this:

kjournald(1352): 1909
sendmail(28934): 13
cron(28910): 12
cron(28912): 11
munin-node(29015): 3
cron(28913): 3
check_asterisk_(28917): 3
sh(28917): 2
munin-node(29022): 2
munin-node(29021): 2

All of the kjournald entries are WRITEs.
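For anyone wanting to reproduce a per-process summary like the one above, here is a minimal sketch. It assumes the kernel log lines look like `name(pid): WRITE block N on dev` (optionally prefixed with a dmesg timestamp), which is the format block_dump used on kernels that still provide it. The function name `count_block_dump` is made up for this example.

```shell
# Summarise block_dump log lines per process and operation.
# Reads kernel log text on stdin, prints "count name(pid): OP", largest first.
count_block_dump() {
    awk '
        {
            # Find the "name(pid):" field; it may follow a dmesg timestamp.
            for (i = 1; i < NF; i++) {
                op = $(i + 1)
                if ($i ~ /\([0-9]+\):$/ && (op == "WRITE" || op == "READ" || op == "dirtied")) {
                    count[$i " " op]++
                    break
                }
            }
        }
        END { for (k in count) print count[k], k }
    ' | sort -rn
}

# Possible use, as root (the 30-second window is an arbitrary choice):
#   echo 1 > /proc/sys/vm/block_dump
#   sleep 30
#   dmesg | count_block_dump
#   echo 0 > /proc/sys/vm/block_dump
```

Splitting the counts by operation makes it easy to confirm that a process such as kjournald is only issuing WRITEs.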

Why is that happening? What else should I look at to limit the kjournald activity a bit? It seems disproportionate to what's actually being written.

viraptor
  • 1,264
  • 6
  • 21
  • 40

4 Answers

15

kjournald is responsible for the journal of ext3 (journaling filesystem). It's known to use a lot of CPU under certain loads. There's not much to do except use another filesystem or disable journaling (effectively making the fs ext2).

Theoretically you can use one of the other ext3 journaling modes and check whether the CPU usage goes down, but remember that each mode is a compromise on the safety of the data being written to the disk. You have ordered mode, writeback mode and journal ('everything') mode.

  1. ordered: journals only metadata, but ensures that the data related to a metadata change is saved before committing the metadata change to the journal.
  2. writeback: journals only metadata, with no guarantee that the data is saved before the journal commit.
  3. journal: everything is journaled, data and metadata. It may be slow, but YMMV.

You set the mode using the data= option when mounting the filesystem, for example data=ordered.
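As a rough sketch of how to check which mode is currently in effect: the helper below (the name `ext3_data_modes` is invented here) reads /proc/mounts-style lines and prints the data= value for each ext3 mount. It assumes the mode appears in the options column, which is not guaranteed on every kernel. As far as I know ext3 refuses to change the data mode on a live remount, so the option normally goes into /etc/fstab.

```shell
# Print "mountpoint mode" for every ext3 filesystem.
# Input format (as in /proc/mounts): device mountpoint fstype options dump pass
ext3_data_modes() {
    awk '$3 == "ext3" {
        mode = "not-listed"              # no data= option shown for this mount
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++)
            if (opts[i] ~ /^data=/)
                mode = substr(opts[i], 6)   # text after "data="
        print $2, mode
    }'
}

# Possible use:
#   ext3_data_modes < /proc/mounts
#
# Example /etc/fstab line (device and mount point are placeholders):
#   /dev/sdb1  /srv  ext3  defaults,data=writeback  0  2
```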

coredump
  • There's little point in changing the journaling mode rather than turning journaling off completely, and turning it off makes even less sense. So describing what the journal options do is kind of pointless. – poige Feb 17 '11 at 17:37
  • 3
    Different journal modes exhibit different CPU behaviors. Some tests [here](http://www.ibm.com/developerworks/linux/linux390/perf/tuning_res_journaling.html). – coredump Feb 17 '11 at 18:45
  • 1
    @coredump, *still pointless*. There are no graphs showing CPU usage for different journaling modes, only throughput. The CPU usage graph actually shows differences between filesystems only. Also, taking into consideration the rather noticeable difference between EXT3 and Reiser3 on that graph, it's clear that the *overall and average* CPU footprint is analyzed, whereas @viraptor has spikes of kjournald activity. – poige Feb 18 '11 at 02:23
  • We will agree to disagree then. Only testing in his environment will show whether there's a difference in CPU usage or not. Also I would not recommend ReiserFS, since the government got a permanent lock on the FS author :). – coredump Feb 18 '11 at 03:01
  • @coredump, I would recommend using your brain more often when making statements. "Locking the author" could possibly make the way forward for **Reiser4** tougher, but I was talking about *Reiser3*. (*Feel the difference*.) **Reiser3** was primarily supported by SuSe and it's quite a mature FS nowadays. – poige Feb 18 '11 at 10:44
  • 8
    Here, take this cup of humor :\ – coredump Feb 18 '11 at 10:49
4

By default your ext3 filesystem is going to be mounted with atimes turned on. Each time a file or directory is read/accessed the filesystem will have to write back to the disks to update this atime record. This means that even if your workload is mostly read based you'll still need to hit the disks to update the access times of each file & directory, and this is my guess as to why your kjournald process was writing so many blocks.

Turning off atimes will yield a large boost to performance but will break POSIX compliance. Check out this Wikipedia article for some discussion around the criticism of atimes.

To turn off atimes just add noatime to the mount options for your filesystem, or you can remount as suggested by poige. Here's an example for your root filesystem:

mount -o remount,noatime /
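To see whether reads really trigger atime updates on a given filesystem, something like the following sketch can help. It assumes GNU stat (for `-c %X`), and the function name `atime_updates_on_read` is made up for this example. Note that on a relatime mount the atime is only updated under certain conditions, so an "unchanged" result does not necessarily mean noatime is active.

```shell
# Report whether reading a file changes its access time.
# Prints "updated" or "unchanged".
atime_updates_on_read() {
    file=$1
    before=$(stat -c %X "$file")   # atime in seconds since the epoch (GNU stat)
    sleep 1                        # make sure a new timestamp would differ
    cat "$file" > /dev/null        # read the file
    after=$(stat -c %X "$file")
    if [ "$before" = "$after" ]; then
        echo "unchanged"
    else
        echo "updated"
    fi
}

# Possible use (the path is a placeholder):
#   atime_updates_on_read /var/log/syslog
```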
  • 3
    Note that more recent kernels default to `relatime` which seems to be an acceptable compromise between `noatime` and `atime`. – Oliver May 02 '12 at 15:26
1

If data integrity is not critical, do this:

iotop -o -a

Make sure that it's really kjournald. It is what causes my server to crash.

Changing the hard drive to an SSD would work.

When you see kjournald writing 5-10 MB of data, do this:

http://ubuntuforums.org/showthread.php?t=56621

sudo tune2fs -O ^has_journal /dev/sda1
sudo e2fsck /dev/sda1

where sda1 is the name of your partition
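Before and after removing the journal it may be worth checking whether the filesystem actually has one. A minimal sketch, assuming the usual `Filesystem features:` line printed by `tune2fs -l`; the helper name `has_journal` is invented for this example, and the filesystem should normally be unmounted (or mounted read-only) before the journal is removed.

```shell
# Read `tune2fs -l` output on stdin and report whether the
# has_journal feature is listed. Prints "yes" or "no".
has_journal() {
    if grep '^Filesystem features:' | grep -qw 'has_journal'; then
        echo yes
    else
        echo no
    fi
}

# Possible use (sda1 is a placeholder for your partition):
#   sudo tune2fs -l /dev/sda1 | has_journal
```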

Report result in comment so I can further check.

user4234
0

In no particular order, just to mention:

  1. mount -oremount,noatime /fs/being_over/journaled (as a quick guess-shot; you didn't show us what your mount looks like anyway)
  2. Try reducing the journal size (tune2fs -J …)
  3. Switch to Reiser3 (it has been robust for quite a long time, and has no such nasty journaling, ever).
poige
  • 9,171
  • 2
  • 24
  • 50