1

I have the following script, that runs normally when I type the script name at the prompt (logscript):

#!/bin/sh
dvar=`date +"%m\/%d\/%y"`
filedate=`date +%b%d%Y`
echo DSS1 > serverlog_${filedate}.txt
grep "^$dvar" oasErrLog >> serverlog_${filedate}.txt
echo CMX1 >> serverlog_${filedate}.txt
ssh GVECMX1 grep "^$dvar" /home/gve/log/oasErrLog >> serverlog_${filedate}.txt
echo CMX2 >> serverlog_${filedate}.txt
ssh GVECMX2 grep "^$dvar" /home/gve/log/oasErrLog >> serverlog_${filedate}.txt
echo XIS1 >> serverlog_${filedate}.txt
ssh GVEXIS1 grep "^$dvar" /home/gve/log/oasErrLog >> serverlog_${filedate}.txt
echo XIS2 >> serverlog_${filedate}.txt
ssh GVEXIS2 grep "^$dvar" /home/gve/log/oasErrLog >> serverlog_${filedate}.txt
scp serverlog_${filedate}.txt "GVEXOSR2:C:/Documents\ and\ Settings/gve/Desktop/logs"
#rm serverlog_${filedate}.txt

Sample normal output is:

DSS1
01/11/10 03:00:08.139 XIS run_backupserver - Startup of XISBACKUP backupserver complete.
CMX1
01/11/10 04:30:05.710 OMNICOMM 30274 - boatimesync: error 3 from boaRx rtu 35 pretry 1 {protocols/boa/boa_command.c:601}
01/11/10 04:30:12.117 OMNICOMM 30274 - CRC (0FFC) does not match calculated CRC (03B0) for remote JRS. headerLength=5 dataLength=0 crcByteOffset=2  functionCode=1  {protocols/boa/boa_io.c:1177}    
CMX2
XIS1
XIS2
01/11/10 03:00:10.563 XIS run_backupserver - Startup of XISBACKUP backupserver complete.

BUT when I set up a CRON job, it runs and scps the file, but the contents are wrong, AND the file isn't on the server (when the rm line is commented out like I showed above). Here is the output I'm getting, but NOTE: the output changes, it varys in what it outputs:

DSS1
CMX1
01/11/10 001/11/10 04:30:05.710 OMNICOMM 30274 - boatimesync: error 3 from boaRx rtu 35 pretry 1 {protocols/boa/boa_command.c:601}
01/11/10 04:30:12.117 OMNICOMM 30274 - CRC (0FFC) does not match calculated CRC (03B0) for remote JRS. headerLength=5 dataLength=0 crcByteOffset=2  functionCode=1  {protocols/boa/boa_io.c:1177}
CMX2
CMX2
CMX2
CMX2
XIS1
XIS1
XIS1
XIS1
XIS2
01/1101/11/10 001/11/10 03:00:10.563 XIS run_backupserver - Startup of XISBACKUP backupserver complete.

Any ideas on why the CRON job is not running this exactly as the system runs it when the command is typed in manually?

EDIT: I have modified the script to loop and with all absolute addressing and modified the CRON file with SHELL, PATH and HOME variables, but the output is still erratic, here's the script now:

#!/bin/sh

### internal variable definitions
dvar=`date +"%m\/%d\/%y"`
filedate=`date +%b%d%Y`

# add the prefix of new hosts into the string below
# which will be expanded later into GVE(whatever) while looping
HOSTLIST="DSS1 CMX1 CMX2 XIS1 XIS2"

# main Loop
for SUFFIX in $HOSTLIST
do
  echo $SUFFIX >> /home/gve/log/serverlog_${filedate}.txt
  ssh GVE$SUFFIX grep "^$dvar" /home/gve/log/oasErrLog \
    >> /home/gve/log/serverlog_${filedate}.txt
  echo "\n" >> /home/gve/log/serverlog_${filedate}.txt
done

# transfer and delete file
scp /home/gve/log/serverlog_${filedate}.txt \
  "GVEXOSR2:C:/Documents\ and\ Settings/gve/Desktop/logs"
#rm serverlog_${filedate}.txt

and here is the output:

DSS1
01/1201/12/10 03:00:08.323 XIS run_backupserver - Startup of XISBACKUP backupserver complete.
1
01/12/101/12/10 00:00:37.003 agc - dbioLower: DNP prev cmd may still in prog, name NPC_GT3_GOV raiseTimeout 1250 lowerTimeout 2500 curtime(1263286837:3) cmd_time(1263286834:807)
01/12/10 02:14:57.545 OMNICOMM 1562 - CRC (F110) does not match calculated CRC (1110) for remote ARS. headerLength=5 dataLength=10 crcByteOffset=7  functionCode=2  {protocols/boa/boa_io.c:1177}


CMX2


XIS1


XIS1


XIS2
01/12/101/12/10 03:00:10.408 XIS run_backupserver - Startup of XISBACKUP backupserver complete.

Notice the messed up dates on some lines, the '1' instead of 'CMX1' and the duped 'XIS1'.

FINAL EDIT:

It looks like that somehow CRON has spawned multiple processes that were tripping over each other. After kill all applicable processes it straightened out. CRON has a history (if you do some web-searching on it) of buggy multiple-process spawns, so beware.

Lance Roberts
  • 401
  • 3
  • 12
  • 29

2 Answers2

3

CRON does have its own environment.

Did you install the job with crontab -e as the user running the job? How was the job added?

Also, a little re-do of the script, with looping; this should work OK on your setup.

#!/bin/sh

### internal variable definitions
dvar=`date +"%m\/%d\/%y"`
filedate=`date +%b%d%Y`

# add the prefix of new hosts into the string below,
# which will be expanded later into GVE{whatever} while looping
HOSTLIST="DSS1 CMX1 CMX2 XIS1 XIS2"

# main processing loop
for SUFFIX in $HOSTLIST
do
  echo $SUFFIX >> serverlog_${filedate}.txt     
  ssh GVE$SUFFIX grep "^$dvar" /home/gve/log/oasErrLog \
    >> serverlog_${filedate}.txt
done

scp serverlog_${filedate}.txt \
  "GVEXOSR2:C:/Documents\ and\ Settings/gve/Desktop/logs"

Follow-up to 2nd attempt:

Ok, so something is definitely horked. The fact that you got 2x XIS1 is a good indication that either buffers are not being written correctly, or the shell itself is the culprit. The loop should be isolating each host as it runs, so unless you've got unflushed pipes/buffers/what-have-you lying around, it should not show XIS1 two times in a row. Try explicitly using #!/bin/bash as the shell instead of sh, sometimes vendors re-hook sh to something other than bash (and the loop is a bash-ism, so it could cause problems). Also, put a sync just before the done in the script, to see if it is a buffering issue.

Avery Payne
  • 14,326
  • 1
  • 48
  • 87
  • I didn't realize I could loop like that; I'll convert my code over to that. The testing takes a while since CRON won't immediately act on my updates, I seem to have to give it a lot of time to realize that it has a different time to act on. – Lance Roberts Jan 11 '10 at 23:17
  • `sudo service cron restart` should address that. – Avery Payne Jan 12 '10 at 00:38
  • Thanks, I don't have sudo, but su'ed in and did the cron restart, that speeds up testing a lot. See my edit above on current status. – Lance Roberts Jan 12 '10 at 18:11
  • I don't have bash to use, just csh and the Bourne shell. I put the sync in, but it didn't solve any of the output problems (though I'm sure you're right about it being a buffering problem, since the garbage that get's output varies). – Lance Roberts Jan 12 '10 at 22:10
3

First rule of cron is to set up several things you expect. One, what directory you are in (explicitly 'cd' to it). Two, the path you expect (in the crontab, PATH=...) and three, where the mail goes (if you want to change it.)

For example:

SHELL=/bin/sh
PATH=/bin:/sbin:/usr/bin:/usr/sbin
HOME=/var/log

I would then also have each script either set up additional paths if needed, and always

cd $HOME

or to another explicit path.

Michael Graff
  • 6,588
  • 1
  • 23
  • 36
  • Thanks, I've added those in, but the output is still messed up. See my edit in the question for current status. – Lance Roberts Jan 12 '10 at 18:11
  • Does using ssh -T make a difference? This will disable pseudo-tty allocation. It should not be doing this in batch mode but will in interactive, so perhaps making this explicit will at least make it act identically in both manual testing and cron. Also, I'd change 'echo "\n"' to just 'echo' or 'echo ""' as \n is not supported by all versions, so is not portable. – Michael Graff Jan 12 '10 at 21:46
  • Tried -T, didn't help. – Lance Roberts Jan 12 '10 at 22:10
  • It is almost as if your cron script is running more than once... That is the only possible way I can see that lines are repeated like in your original example and this one. How long does this script take to run and how often do you run it? What is the crontab line you used? I bet it starts a second copy before the first one finishes. – Michael Graff Jan 12 '10 at 22:24
  • My crontab line (only other thing in there is what you had me enter): 15 13 * * * /home/gve/log/logscript – Lance Roberts Jan 12 '10 at 22:28
  • Hmm. It doesn't seem like this should cause it to run more than once, but if I had to place money that's what I'd guess is happening. This will run it daily at 13:15. Try adding an echo STARTED AT ``date`` >> serverlog_${filedate}.txt near the top and one that says it ENDED AT ``date`` at the bottom. I'd add them just before the loop starts and just after it ends. If you see two STARTED without an END between, you know your script is running more than once. – Michael Graff Jan 12 '10 at 22:35
  • I only got one Start, but I got 2 Ends. – Lance Roberts Jan 12 '10 at 23:32
  • I tried limiting the number of jobs in 'queuedefs' but that didn't help (in fact I ended up with 3 ENDS). – Lance Roberts Jan 12 '10 at 23:56
  • How many cron processes are running? More than one? :) – Michael Graff Jan 13 '10 at 07:42
  • There is supposed to be only one, but I found that the original I had setup is under 'su', but the one I've been modifying is under 'gve' (the only other account). They're set at different times so they shouldn't be tripping over each other. Having said that, when I came in this morning, the job had run at 1:06 am (completely unscheduled) and had run perfect. I guess I'll have to achieve a higher level of CRON consciousness to figure this one out. – Lance Roberts Jan 13 '10 at 17:05
  • I checked out the ps -ef, and saw that there were a ton of 'cron restart' processes and also a lot of the old ssh commands were hanging around in there also. I killed all those processes, did a new 'cron restart' (verified that there was only one running) and tried again. The results were similar, duplication of lines and two ENDS. – Lance Roberts Jan 13 '10 at 18:02
  • OK, I looked some more and found one more process to kill. This time CRON ran fine, I'll have to see how it does over the next week. – Lance Roberts Jan 13 '10 at 18:14
  • Well today it ran fine also, so I've edited my question to let everyone know what happened. Thanks for the help. – Lance Roberts Jan 14 '10 at 18:36