
CentOS 4.x

I apologize in advance if this is not the appropriate place to ask this question. It pertains to a Linux server / IT admin task.

I've got a log file on an old CentOS 4.x server and I want to remove log entries older than a certain date and place them in a new file for archive.

Here's an example of the log format:

2012-06-07 22:32:01,289 ABC:0|Foo|Foo2|4.4|1234|Some Event|123|blah blah blah
2012-06-07 22:32:03,289 ABC:0|Foo|Foo2|4.4|1234|Some Event|123|blah blah blah
2012-06-07 22:32:04,289 ABC:0|Foo|Foo2|4.4|1234|Some Event|123|
2012-06-07 22:32:10,289 ABC:0|Foo|Foo2|4.4|1234|Some Event|123|blah blah blah
2012-06-07 22:32:12,289 ABC:0|Foo|Foo2|4.4|1234|Some Event|123|blah blah blah
2012-06-07 22:32:15,289 ABC:0|Foo|Foo2|4.4|1234|Some Event|123|
2012-06-07 22:32:40,289 ABC:0|Foo|Foo2|4.4|1234|Some Event|123|blah blah blah
2012-06-07 22:32:58,289 ABC:0|Foo|Foo2|4.4|1234|Some Event|123|blah blah blah
2012-06-07 22:33:01,289 ABC:0|Foo|Foo2|4.4|1234|Some Event|123|
2012-06-07 22:33:01,289 ABC:0|Foo|Foo2|4.4|1234|Some Event|123|blah blah blah
2012-06-07 22:33:02,289 ABC:0|Foo|Foo2|4.4|1234|Some Event|123|

Essentially, I'm looking for a one-liner that will do the following:

  1. Find any events older than a provided YYYY-MM-DD and remove them from the primary log file.
  2. Take the deleted events from step 1 and put them in a new log file
  3. (Optional) Compress the new archive log file holding the deleted events.
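Because every line starts with an ISO-style YYYY-MM-DD date, plain string comparison orders lines the same way date comparison does, so all three steps can be done in one pass with awk. This is a minimal sketch, not a tested production script. It assumes that nothing writes to the log while it runs and that there is enough free disk space for a temporary copy of the kept lines. The function name archive_before and its arguments are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper (not from the original post): move every line whose
# leading YYYY-MM-DD is older than CUTOFF into ARCHIVE, keep the rest in
# LOGFILE, then gzip ARCHIVE. Assumes no process writes to LOGFILE meanwhile.
archive_before() {
    local logfile=$1 cutoff=$2 archive=$3
    local tmp="$logfile.keep.$$"

    # Create both outputs up front so later steps work even if one stays empty.
    : > "$archive" || return 1
    : > "$tmp" || return 1

    # YYYY-MM-DD sorts lexically, so a string comparison on field 1 is a date
    # comparison. Fields split on whitespace, so '|' is just ordinary text.
    awk -v cutoff="$cutoff" -v old="$archive" -v keep="$tmp" '
        $1 < cutoff { print > old; next }
        { print > keep }
    ' "$logfile" || return 1

    # Write the kept lines back in place (keeps the file's owner and mode).
    cat "$tmp" > "$logfile" && rm -f "$tmp" || return 1

    # Optional step 3: compress the archive into "$archive.gz".
    gzip -f "$archive"
}
```

For example, with placeholder paths, `archive_before /path/to/app.log 2012-06-01 /path/to/app-before-2012-06-01.log` would leave entries dated 2012-06-01 and later in app.log and produce app-before-2012-06-01.log.gz. Copying the kept lines back with cat instead of using mv preserves the original file's ownership and permissions, but it rewrites the kept portion a second time.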

I'm aware that there are log rotation tools that do this, but this should just be a one-time task, so I'd prefer not to set one up.

Additional notes:

  • If the date part is tricky or too resource-intensive, an alternative would be to just keep the last X lines and move the rest. I was originally thinking of something like tail -n 10000 > newfile.txt, but that would mean moving the "good" logs to a new file and then doing a name swap... and then I'd still need to remove the "good" entries from the archive.
  • This particular log file is pretty large (1 GB) so I'd prefer the task to be as resource and time efficient as possible.
  • The pipe characters in the log concern me, and I'm not sure whether I'd need extra protection in the commands to keep them from causing problems.
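For the keep-the-last-X-lines alternative, counting lines first avoids the rename-and-swap problem. head writes everything before the last X lines to the archive, and tail keeps the rest. This is a sketch under stated assumptions: no concurrent writer, and the file ends with a newline so that wc -l counts every line. The name keep_last_lines is hypothetical. As for the pipes, a | inside the file is only data to awk, head, tail, and read. The shell treats | as a pipe only when it appears in the command text itself. Quoting variables such as "$logfile" still matters, because unquoted expansions undergo word splitting and glob expansion.

```shell
#!/bin/bash
# Hypothetical helper: keep only the last KEEP lines of LOGFILE and move every
# earlier line into ARCHIVE. Assumes LOGFILE ends with a newline and that no
# process writes to it while this runs.
keep_last_lines() {
    local logfile=$1 keep=$2 archive=$3
    local total old tmp="$logfile.keep.$$"

    total=$(wc -l < "$logfile") || return 1
    old=$(( total - keep ))
    [ "$old" -gt 0 ] || return 0     # KEEP or fewer lines: nothing to move

    # The first OLD lines go to the archive...
    head -n "$old" "$logfile" > "$archive" || return 1
    # ...and the last KEEP lines replace the original content in place.
    tail -n "$keep" "$logfile" > "$tmp" || return 1
    cat "$tmp" > "$logfile" && rm -f "$tmp"
}
```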
Mike B
  • The `logrotate` package neatly solves this problem, and it should be available in your base repositories. You really should use it, since if you have to do this once, you'll almost certainly have to do it again. – Michael Hampton Dec 07 '12 at 22:00
  • What program is generating the logs? It doesn't look like a standard syslog. It seems like it would be trivial to hack something together with a python/perl script to parse and split the log file. If this was an Apache log file, then I would suggest `cronosplit`. – Zoredache Dec 07 '12 at 22:00
  • @MichaelHampton I appreciate the suggestion but would like to pursue other options (if nothing else than to expand on my knowledge of bash). :-) – Mike B Dec 07 '12 at 22:03
  • @Zoredache It's a custom app/log so unfortunately I'm not sure what it's using in the background. Not Apache. – Mike B Dec 07 '12 at 22:04
  • Are you certain there is always exactly one log entry per line, and that no log entry spans multiple lines? Do all lines start with the date in `YYYY-MM-DD`? If so, that should be pretty easy. – Zoredache Dec 07 '12 at 22:08
  • @Zoredache Yes, I'm sure that all lines start with YYYY-MM-DD – Mike B Dec 07 '12 at 22:29
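Before relying on any date-based split, it is cheap to check Zoredache's question against the actual file. The sketch below counts lines that do not start with YYYY-MM-DD followed by a space. A result of 0 means every line carries a leading date. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print how many lines of the given file do NOT start
# with a YYYY-MM-DD date followed by a space.
count_undated_lines() {
    grep -c -v '^[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9] ' "$1"
    # grep exits 1 when the count is 0, so only status 2 (an error) is a failure.
    [ $? -le 1 ]
}
```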

1 Answer


Something simple might work for you.

Assuming each log entry is on a single line and every line starts with YYYY-MM-DD, a simple script like this would split the log file by date.

logsplit (usage: cat logfile | logsplit)

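#!/bin/bash
# logsplit: append each line of stdin to $LOGBASEPATH/logname.YYYY-MM-DD
LOGBASEPATH=/logfilepath/logfile   # directory for the split files; must exist
while IFS= read -r LOGLINE ; do
  [[ -z ${LOGLINE} ]] && continue  # skip empty lines
  dayprefix=$(printf '%s\n' "$LOGLINE" | cut -d ' ' -f 1)   # YYYY-MM-DD
  # Append (>>) so earlier lines for the same day are not overwritten, and
  # quote so the line is written exactly as read.
  printf '%s\n' "$LOGLINE" >> "$LOGBASEPATH/logname.$dayprefix"
done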

This would match up nicely with the dateext option of logrotate, so you can have one log file per day.
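On a 1 GB file the logsplit loop is slow, because bash's read handles one line at a time and the script starts a cut process for every line. A one-pass awk equivalent is sketched below. It assumes the lines are already in date order, so each day's file can be closed as soon as the date changes. The function name split_by_day is hypothetical, and the output naming mirrors the script above.

```shell
#!/bin/bash
# Hypothetical awk version of logsplit: write each line of LOGFILE to
# OUTDIR/logname.YYYY-MM-DD. Assumes lines are in date order.
split_by_day() {
    local logfile=$1 outdir=$2
    awk -v dir="$outdir" '
        NF == 0 { next }                  # skip empty lines, like the loop
        $1 != day {                       # first line of a new day
            if (out != "") close(out)     # keep only one output file open
            day = $1
            out = dir "/logname." day
        }
        { print >> out }                  # append, never truncate
    ' "$logfile"
}
```

Because the output files are opened in append mode, running it twice duplicates lines, so remove earlier output before a rerun.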

Zoredache
  • Thanks @Zoredache, I deeply appreciate the prompt response, but I have a few questions. 1) The log file is large (1 GB) and could potentially span a couple of years. If that's the case, wouldn't I end up with a lot of small log files? 2) Ideally I'm looking for a way to purge anything older than X days and save the purged data somewhere for safekeeping. Is that possible? 3) I thought double quotes were needed to protect scripts from injection. Is this safe considering the pipe symbols in the log? – Mike B Dec 07 '12 at 22:36
  • 1) Yes, lots of small files; this was just an example. You could probably update the cut command to return `YYYYMM` pretty easily and summarize per month. 2) Once you have the files broken up, move the old ones into your archive location. 3) Yes, quotes may be needed; I was just hacking something together quickly and didn't really test much. See http://tldp.org/LDP/abs/html/quotingvar.html – Zoredache Dec 07 '12 at 23:04
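The two follow-up steps in the last comment, splitting per month (`YYYYMM`) and then moving the old files into an archive location, could be sketched as below. The function names, the YYYYMM cutoff argument, and the archive directory are hypothetical. The same date-ordering assumption applies.

```shell
#!/bin/bash
# Hypothetical step 1: split LOGFILE into OUTDIR/logname.YYYYMM, one file per
# month. Assumes lines are in date order.
split_by_month() {
    local logfile=$1 outdir=$2
    awk -v dir="$outdir" '
        NF == 0 { next }
        {
            month = substr($1, 1, 4) substr($1, 6, 2)   # 2012-06-07 -> 201206
            if (month != cur) {
                if (out != "") close(out)
                cur = month
                out = dir "/logname." month
            }
            print >> out
        }
    ' "$logfile"
}

# Hypothetical step 2: move each monthly file older than CUTOFF (YYYYMM) from
# OUTDIR into ARCHDIR and gzip it there.
archive_months_before() {
    local outdir=$1 cutoff=$2 archdir=$3 f month
    mkdir -p "$archdir" || return 1
    for f in "$outdir"/logname.[0-9][0-9][0-9][0-9][0-9][0-9]; do
        [ -e "$f" ] || continue          # the glob matched nothing
        month=${f##*.}                   # text after the last dot: YYYYMM
        if [ "$month" -lt "$cutoff" ]; then
            mv "$f" "$archdir"/ && gzip -f "$archdir/${f##*/}"
        fi
    done
}
```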