
Has anyone had issues with logrotate before that cause a log file to get rotated and then grow back to the same size it originally was? Here are my findings:

Logrotate Script:

/var/log/mylogfile.log {
    rotate 7
    daily
    compress
    olddir /log_archives
    missingok
    notifempty
    copytruncate
}
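
For reference, verbose output like the one below can be reproduced by running logrotate by hand. A minimal invocation, assuming the snippet above lives in its own file under /etc/logrotate.d/ (that path is an assumption):

# -v prints each step, -f forces a rotation even if one isn't due yet
/usr/sbin/logrotate -vf /etc/logrotate.d/mylogfile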

Verbose Output of Logrotate:

copying /var/log/mylogfile.log to /log_archives/mylogfile.log.1
truncating /var/log/mylogfile.log
compressing log with: /bin/gzip
removing old log /log_archives/mylogfile.log.8.gz

Log file after truncate happens:

[root@server ~]# ls -lh /var/log/mylogfile.log
-rw-rw-r--  1 part1 part1 0 Jan 11 17:32 /var/log/mylogfile.log

Literally Seconds Later:

[root@server ~]# ls -lh /var/log/mylogfile.log
-rw-rw-r--  1 part1 part1 3.5G Jan 11 17:32 /var/log/mylogfile.log

RHEL Version:

[root@server ~]# cat /etc/redhat-release 
Red Hat Enterprise Linux ES release 4 (Nahant Update 4)

Logrotate Version:

[root@DAA21529WWW370 ~]# rpm -qa | grep logrotate
logrotate-3.7.1-10.RHEL4

A Few Notes:

  • Service can't be restarted on the fly, which is why I'm using copytruncate
  • Logs are rotating every night; the olddir directory contains log files from each night.
drewrockshard

4 Answers


This is probably because even though you truncate the file, the process writing to the file will continue writing at whatever offset it was at last. So what's happening is: logrotate truncates the file, the size drops to zero, the process writes to the file again, continuing at the offset it left off at, and you now have a file with NUL bytes up to the point where you truncated it, plus the new entries written to the log.
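
A minimal sketch that reproduces the effect (file names are illustrative): open a file descriptor once without append mode, truncate the file out from under it, then keep writing.

# Open fd 3 once, WITHOUT O_APPEND, and write a chunk of data.
exec 3> /tmp/growing.log
seq 1 100000 >&3
# Truncate the file out from under the writer, as copytruncate does.
: > /tmp/growing.log
# The next write lands at the old offset, leaving a hole of NUL bytes.
echo "post-truncate entry" >&3
exec 3>&-
# Apparent size jumps back near the old size; od shows the leading NULs.
ls -ls /tmp/growing.log
od -c /tmp/growing.log | head -n 5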

Running od -c against the file after the truncate + sudden growth generated output along the lines of:

0000000  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0  \0
*
33255657600  \0   C   K   B   -   s   e   r   v   e   r       [   h   t   t
33255657620 <more log output>

What this says is that from offset 0 to 33255657600 your file consists of null bytes, followed by some legible data. Getting to this state doesn't take the same amount of time it would take to actually write all those null bytes. The ext{2,3,4} filesystems support something called sparse files: if you seek past a region of a file that doesn't contain anything, that region will be assumed to contain null bytes and won't take up space on disk. Those null bytes won't actually be written, just assumed to be there, so going from 0 to 3.5GB takes hardly any time. (You can test this yourself by doing something like dd if=${HOME}/.bashrc of=largefile.bin seek=3432343264 bs=1, which should generate a file of over 3GB in a few milliseconds.)

If you run ls -ls on your logfiles after they've been truncated and had a sudden growth again, it should now report a number at the beginning of the line which represents the actual size (in blocks occupied on disk), which is probably orders of magnitude smaller than the size reported by plain ls -l.
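
Expanding the dd example from the paragraph above into a quick check you can run (paths are illustrative):

# Seek ~3.2GB past EOF, then write a few real bytes; the hole costs nothing.
dd if=${HOME}/.bashrc of=/tmp/largefile.bin bs=1 seek=3432343264
ls -lh /tmp/largefile.bin   # apparent size: over 3G
ls -ls /tmp/largefile.bin   # first column: blocks actually allocated (tiny)
du -h /tmp/largefile.bin    # actual disk usage: a few KB at most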

Kjetil Joergensen
  • I don't think so - in seconds, the log file goes from 0 bytes to 3.5GB. The total time it takes for logrotate to complete is about 5 minutes. The log file isn't written to that fast AND the size is always the original size, which seems too coincidental. – drewrockshard Jan 12 '11 at 00:18
  • There should be an easy way of verifying this, inspect the file and see if there's actually anything in it. Start out by running od -c filename | head -n 100. Chances are it'll tell you in a few lines that there's a truckload of null-bytes, plus the latest log-entries. – Kjetil Joergensen Jan 12 '11 at 00:35
  • If this is the case then can we test whether cat /dev/null > /var/log/mylogfile.log truncates the file in a way that solves this issue, rather than using logrotate built in copytruncate? – mtinberg Jan 12 '11 at 00:43
  • 2
    The problem doesn't lie with the way the file is truncated, the problem lies with how the program writes the logfile. For truncating the logfile to be a viable approach, you'll have to get the program writing to the logfile somehow seek to position 0. It's possible that a filesystem that doesn't support sparse files wouldn't have this proble, although I don't have any way of testing that right now. – Kjetil Joergensen Jan 12 '11 at 01:03
  • These were my findings; I'm still not sure why this is having issues. http://sprunge.us/LAXY - The first column goes from 33255530660 -> 0000000 -> 17362070000 -> 0000000 -> 33255657600. The command I ended up running was `while true; do od -c /home/part1/kbServer/ckbServer/CKBServer.log | head -n 100;done;` so that I could continue reading in the file. – drewrockshard Jan 12 '11 at 02:45
  • The left column is the offset. What would be nice would be some delimiter between each run. 33255530660 looks to be "dangling", that is, the beginning of that od -c run is gone. However, if you look at the 0000000 -> 33255657600 run, what it tells you is this: the first 16 bytes consist of NULs; then there's an asterisk in the left column saying that the entries from the last offset are repeated; then you get to 33255657600, which actually contains legible data. So, yes, whatever writes to the file continues at the offset it was at. – Kjetil Joergensen Jan 12 '11 at 08:46
  • Correct, but that still does not explain how the file goes from 0 bytes to 3.5GB in less than 5 minutes. Additionally, it does not explain why it goes back to the same original size. It's ALWAYS the same original size. – drewrockshard Jan 12 '11 at 14:02
  • 0 -> 3.5GB in less than 5 minutes is now incorporated into the answer, as it's a bit too large for a comment. When you say "the same original size", is this as reported by ls -l or ls -lh? The latter does an awful lot of rounding, which would hide a few lines appended to a file within the rounding error. – Kjetil Joergensen Jan 12 '11 at 16:18
  • 1
    This final answer actually made more sense and the fix will probably end up being to restart the application. – drewrockshard Jan 12 '11 at 21:27
  • Please also note that opening the file in append mode is the solution here, in case you have that much control over the application, as detailed in http://serverfault.com/a/522687/387313. O_APPEND triggers a seek to the end of the file before each write (see http://man7.org/linux/man-pages/man2/open.2.html); a quick demonstration follows below. – turkenh Nov 24 '16 at 12:39
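
To illustrate the O_APPEND point from that last comment, here is the same sketch as above but with the descriptor opened in append mode (the file name is again illustrative). The kernel seeks to EOF before every write, so an external truncate leaves no hole:

# Open fd 3 WITH O_APPEND (>>), write, truncate, write again.
exec 3>> /tmp/appending.log
seq 1 100000 >&3
: > /tmp/appending.log
echo "post-truncate entry" >&3
exec 3>&-
# The file now contains only the last entry: no NUL hole, no sudden growth.
ls -ls /tmp/appending.log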

I am extremely confident that Kjetil has hit it. Drew, you may not yet be convinced by his explanation but I urge you to read carefully what he's said.

If you accept it, the fix is either to stop and restart your application when the logs are rotated, or to use a tool like Apache's rotatelogs, where you feed the log output to the tool via a pipe and the tool takes care of rotating the logfile every so often. For example, one of my Apache instances logs with

ErrorLog "|/usr/sbin/rotatelogs /www/logs/error_log 604800"

which causes a lot of logfiles with names like

-rw-r--r--    1 root     root         4078 Dec 21 01:04 error_log.1292457600
-rw-r--r--    1 root     root         4472 Dec 29 08:41 error_log.1293062400
-rw-r--r--    1 root     root        78630 Jan  4 12:57 error_log.1293667200
-rw-r--r--    1 root     root        15753 Jan 12 01:10 error_log.1294272000

to appear without restarting apache; I can then compress them manually after the fact. Note how the rotation is done every week, which is every 604800 seconds, that being the argument passed to rotatelogs.
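
The same trick works for things other than Apache: any program that can write its log to stdout/stderr can be piped through rotatelogs. A hypothetical sketch (the program path and the daily interval are illustrative):

# Rotate the app's log every 86400 seconds (daily) without restarting the app.
/opt/myapp/bin/myapp 2>&1 | /usr/sbin/rotatelogs /var/log/myapp/app_log 86400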

If you can't stop and restart the app, and it can't log via a pipe, then I think you've got a real problem. Perhaps others will have suggestions.

MadHatter

It would be really helpful if you could post the whole logrotate configuration.

Why not try the kill -HUP method? (Classical reloading, not restarting.)

Also, check with lsof who is accessing the file.
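
For instance, something along these lines (the path matches the question; -o is the lsof flag that reports each process's current file offset, which is the number to watch here):

# List processes holding the log open; with -o, the SIZE/OFF column shows the offset.
lsof /var/log/mylogfile.log
lsof -o /var/log/mylogfile.log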

Nikolaidis Fotis
  • Can't use `kill -HUP` as this application cannot be touched in any way - it's a sensitive application that I do not own (I don't even manage it - I only manage the OS side), so I need to be able to do the log rotations this way. – drewrockshard Jan 12 '11 at 14:04
  • 1
    **Sorry, original message got submitted early, here's the rest of my original message:** I logged into the system this morning and the file is now back to normal after the scheduled logrotate in `/etc/cron.daily` was initiated. Question to all: Is there something that the logrotate script does differently than running logrotate manually? My logrotate script literally looks like `/usr/sbin/logrotate /etc/logrotate.conf`. This is quite baffling. – drewrockshard Jan 12 '11 at 14:17

Just use ">>", which means append, instead of ">", which means create/truncate, in the scripts that write to this file. I had the exact same issue and I fixed it by using append in my script.

SomeScript.sh >> output.txt

Hope that is clearer.
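
If you want to verify which mode a writer actually opens its log with, strace can show the open flags. A hypothetical check (SomeScript.sh and output.txt as above; the exact syscall shown may vary by platform):

# O_APPEND in the flags means every write goes to EOF (safe to truncate);
# plain O_WRONLY/O_TRUNC means the writer keeps its own private offset.
strace -f -e trace=open sh -c './SomeScript.sh >> output.txt' 2>&1 | grep output.txt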

user578558
  • There is no indication that writing to the log file is done using `>` from a script. Moreover, this answer is confusing because there is a major difference between `>` and `>(` `)` in scripts. Finally, if the code doing the writing is updated, it would be much better to have it simply start writing to the new logfile after `logrotate` has done its thing. – kasperd Apr 09 '16 at 17:13
  • I edited my answer so it's clearer. – user578558 Apr 10 '16 at 22:02