I've got a SUSE box with 8 GB of RAM and a ReiserFS filesystem that has been running smoothly for over 4 years with no OS or hardware problems. The box serves a couple of database-driven sites with low to moderate traffic, which results in low I/O, CPU and memory utilization.
Recently the machine hung 3 times within a span of 10 days. The hangs happened at irregular times (e.g. not always at 00:00). CPU, memory and disk are heavily underutilized, and I've verified that they were also underutilized at the time of each hang, so the sites do not appear to be responsible.
Every time the box hangs it still responds to ping, but no other service is usable (ssh, www etc.). I then reboot the box and everything returns to normal (until the next hang).
What I've found in /var/log/boot.msg in all 3 incidents (possibly logged before and during the hang) is a "Filesystem is NOT clean" message, followed by a "Replaying journal" step which seems to do a lot of work but never gets to 100%:
Reiserfs super block in block 16 on 0xfd03 of format 3.6 with standard journal
Blocks (total/free): 786432/540858 by 4096 bytes
Filesystem is NOT clean
Replaying journal: Trans replayed: mountid 39, transid 12424272, desc 7381, len 9, commit 7391, next trans offset 7374
Replaying journal: | | 0.1% 1 trans
Trans replayed: mountid 39, transid 12424273, desc 7392, len 9, commit 7402, next trans offset 7385
Trans replayed: mountid 39, transid 12424274, desc 7403, len 9, commit 7413, next trans offset 7396
Trans replayed: mountid 39, transid 12424275, desc 7414, len 9, commit 7424, next trans offset 7407
Replaying journal: | / 0.5% 4 trans
Trans replayed: mountid 39, transid 12424276, desc 7425, len 8, commit 7434, next trans offset 7417
Trans replayed: mountid 39, transid 12424277, desc 7435, len 9, commit 7445, next trans offset 7428
Trans replayed: mountid 39, transid 12424278, desc 7446, len 9, commit 7456, next trans offset 7439
Replaying journal: | - 1.0% 7 trans
The replay got as far as 33% in the first incident and 58% in the third.
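To compare the incidents more easily, the replay output can be summarized with a small script. This is only a rough Python sketch, assuming the lines in boot.msg look exactly like the excerpt above; the function name `summarize_replay` and the two patterns are made up for illustration and are not part of any ReiserFS tool.

```python
import re

# Progress lines, e.g. "Replaying journal: | - 1.0% 7 trans"
PROGRESS_RE = re.compile(r"Replaying journal:.*?(\d+(?:\.\d+)?)%\s+(\d+)\s+trans")
# Transaction lines, e.g. "Trans replayed: mountid 39, transid 12424272, ..."
TRANSID_RE = re.compile(r"Trans replayed:.*?transid (\d+)")


def summarize_replay(lines):
    """Return (last_percent, trans_count, first_transid, last_transid).

    Any value is None if no matching line was seen. The line format is
    an assumption based on the log excerpt in this post.
    """
    last_percent = None
    trans_count = None
    transids = []
    for line in lines:
        progress = PROGRESS_RE.search(line)
        if progress:
            # Keep overwriting so the final values reflect the last progress line.
            last_percent = float(progress.group(1))
            trans_count = int(progress.group(2))
        trans = TRANSID_RE.search(line)
        if trans:
            transids.append(int(trans.group(1)))
    first_transid = transids[0] if transids else None
    last_transid = transids[-1] if transids else None
    return last_percent, trans_count, first_transid, last_transid
```

Passing it the lines of /var/log/boot.msg (for example from `open("/var/log/boot.msg", errors="replace")`) after each incident would show how far the replay got and which transaction ids it covered.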
Could the hangs be ReiserFS related?
Any ideas on where I should look next?
Thanks a lot.